Commit ba1edf3 (verified) by mudler · Parent: af80749

Upload README.md with huggingface_hub

Files changed (1): README.md (+49 −0)
---
license: other
base_model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
tags:
- gguf
- quantized
- apex
- moe
- mixture-of-experts
- nvidia
- nemotron
- mamba
- hybrid
---

# Nemotron-3-Nano-30B-A3B APEX GGUF

**APEX (Adaptive Precision for EXpert Models)** quantizations of [NVIDIA-Nemotron-3-Nano-30B-A3B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16).

**Brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team** | [APEX Project](https://github.com/mudler/apex-quant) | [Technical Report](https://github.com/mudler/apex-quant/blob/main/paper/APEX_Technical_Report.pdf)

## Benchmark Results

Benchmarks coming soon. For reference APEX benchmarks on the Qwen3.5-35B-A3B architecture, see [mudler/Qwen3.5-35B-A3B-APEX-GGUF](https://huggingface.co/mudler/Qwen3.5-35B-A3B-APEX-GGUF).

## What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient: edge layers get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration data (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).

See the [APEX project](https://github.com/mudler/apex-quant) for full details, the technical report, and scripts.
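The role classification and edge-gradient idea described above can be sketched as follows. This is a minimal illustration only: the tensor names, quant-type choices, and the `edge_width` parameter are hypothetical placeholders, not the actual APEX implementation (see the APEX repo for the real scheme).

```python
# Sketch of an APEX-style layer-wise precision gradient.
# Quant types and tensor names here are illustrative assumptions.

def classify_tensor(name: str) -> str:
    """Classify a tensor by role from its (illustrative) name."""
    if "shared_expert" in name:
        return "shared_expert"
    if "expert" in name:
        return "routed_expert"
    if "attn" in name:
        return "attention"
    return "other"

def pick_quant(layer: int, n_layers: int, role: str, edge_width: int = 5) -> str:
    """Edge layers keep higher precision; middle layers compress harder.

    Routed experts dominate the parameter count, so in this sketch they
    take the most aggressive compression in the middle layers.
    """
    is_edge = layer < edge_width or layer >= n_layers - edge_width
    if role == "routed_expert":
        return "Q5_K" if is_edge else "Q3_K"
    # Shared expert and attention are small but on every token's path,
    # so they stay at higher precision throughout.
    return "Q6_K" if is_edge else "Q5_K"

# Example: a 52-layer model with a 5+5 symmetric edge gradient, as in
# the APEX config listed below for this model.
plan = {(layer, role): pick_quant(layer, 52, role)
        for layer in (0, 26, 51)
        for role in ("routed_expert", "attention")}
```

A "5+5 symmetric edge gradient" then simply means the first five and last five layers are treated as edge layers, with the 42 middle layers compressed more aggressively.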

## Architecture

- **Model**: NVIDIA-Nemotron-3-Nano-30B-A3B (NemotronH)
- **Layers**: 52 (23 Mamba-2, 23 MoE, 6 GQA attention)
- **Experts**: 128 routed + 1 shared (6 active per token)
- **Total Parameters**: 30B
- **Active Parameters**: ~3.5B per token
- **APEX Config**: 5+5 symmetric edge gradient across 52 layers
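For intuition on the "128 routed + 1 shared (6 active per token)" line, here is a minimal top-k routing sketch. It is illustrative only: the router logits are random and the gating/normalization details are generic MoE conventions, not Nemotron's actual routing code.

```python
# Generic top-k MoE routing sketch (not Nemotron's actual router).
import math
import random

N_EXPERTS, TOP_K = 128, 6  # 128 routed experts, 6 active per token

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(router_logits):
    """Pick the top-k experts for one token and renormalize their gates."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# Per token: the 6 selected routed experts run gated by these weights,
# and the single shared expert runs unconditionally on every token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
selected = route(logits)
assert len(selected) == TOP_K
assert abs(sum(w for _, w in selected) - 1.0) < 1e-9
```

Only the selected experts' weights are touched per token, which is why the active parameter count (~3.5B) is so much smaller than the 30B total.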

## Run with LocalAI

```bash
local-ai run mudler/Nemotron-3-Nano-30B-A3B-APEX-GGUF@Nemotron-3-Nano-30B-A3B-APEX-I-Balanced.gguf
```

## Credits

APEX is brought to you by the [LocalAI](https://github.com/mudler/LocalAI) team. Developed through human-driven, AI-assisted research. Built on [llama.cpp](https://github.com/ggerganov/llama.cpp).