Credit: This is a GGUF quantization of 0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft, a REAP expert-pruned checkpoint derived from nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16. All credit for the base model goes to NVIDIA, and the REAP pruning work to 0xSero.

NVIDIA Nemotron-H 120B REAP 50% — GGUF

GGUF quantizations of the REAP 50%-pruned Nemotron-H 120B model for use with llama.cpp and compatible tools.

Available Quantizations

| File | Quant | Size | BPW (bits per weight) |
|---|---|---|---|
| Nemotron-H-120B-REAP-50pct-BF16.gguf | BF16 | 128.6 GB | 16.01 |
| Nemotron-H-120B-REAP-50pct-Q8_0.gguf | Q8_0 | 68.4 GB | 8.52 |
| Nemotron-H-120B-REAP-50pct-Q6_K.gguf | Q6_K | 59.7 GB | 7.43 |
| Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf | Q4_K_M | 45.4 GB | 5.65 |
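One rough way to pick a file is to take the largest quant whose size, plus some headroom, fits in the memory you have. The sketch below hard-codes the sizes from the quantization table. The helper name `pick_quant` and its 10 GB default headroom for KV cache and runtime buffers are assumptions for illustration, not measured values; real usage grows with context length.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the largest listed quant whose file size plus
# headroom fits in a memory budget (GB). Headroom default (10 GB) is an assumption.
pick_quant() {
  local budget_gb=$1
  local headroom_gb=${2:-10}
  # Quant names and file sizes (GB) from the table, largest first.
  local quants=("BF16 128.6" "Q8_0 68.4" "Q6_K 59.7" "Q4_K_M 45.4")
  local entry name size
  for entry in "${quants[@]}"; do
    read -r name size <<< "$entry"
    # Bash has no floating-point arithmetic, so let awk do the comparison.
    if awk -v s="$size" -v h="$headroom_gb" -v b="$budget_gb" 'BEGIN { exit !(s + h <= b) }'; then
      echo "Nemotron-H-120B-REAP-50pct-${name}.gguf"
      return 0
    fi
  done
  echo "no listed quant fits in ${budget_gb} GB" >&2
  return 1
}

pick_quant 128   # e.g. a machine with 128 GB of (unified) memory
```

A second argument overrides the headroom; the function returns non-zero when nothing fits.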

Model Details

| Property | Value |
|---|---|
| Architecture | NemotronH (hybrid Mamba + MoE + Attention) |
| Total blocks | 88 (40 Mamba, 40 MoE, 8 Attention) |
| Parameters | ~120B original, ~64B after 50% expert pruning |
| Experts per MoE layer | 256 (pruned from 512) |
| Routed experts per token | 22 |
| Context length | 262,144 tokens |
| Vocabulary size | 131,072 |
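The parameter counts above also explain why halving the experts gives ~64B rather than ~60B: only the routed-expert weights shrink, while Mamba, attention, embedding, and other unpruned weights stay. Assuming REAP removed exactly half of the routed-expert parameters and nothing else, and taking the approximate ~120B and ~64B figures at face value, the removed 56B is half the expert weights. The hypothetical helper `split_params` below does that arithmetic; treat the result as a rough estimate only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate routed-expert vs. other parameters (billions)
# from total before pruning, total after pruning, and the fraction of experts removed.
# Assumes pruning only removed routed-expert weights and all experts are equal-sized.
split_params() {
  local total_b=$1 pruned_b=$2 frac_removed=$3
  awk -v t="$total_b" -v p="$pruned_b" -v f="$frac_removed" 'BEGIN {
    experts = (t - p) / f   # all routed-expert params before pruning
    printf "experts=%.0fB other=%.0fB\n", experts, t - experts
  }'
}

split_params 120 64 0.5   # prints: experts=112B other=8B
```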

Usage

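```bash
# With llama.cpp: load the Q4_K_M file, prompt "Hello", generate up to 128 tokens
llama-cli -m Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf -p "Hello" -n 128

# With Ollama (create a Modelfile that points at the GGUF file first)
ollama create nemotron-h-reap -f Modelfile
ollama run nemotron-h-reap
```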
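The `ollama create` command expects a Modelfile in the current directory. A minimal one only needs a `FROM` line pointing at the downloaded GGUF; the `num_ctx` value below is an illustrative choice, not a recommendation. Whether your Ollama build includes a llama.cpp version that supports the `nemotron_h_moe` architecture has not been verified here, so check that first.

```bash
#!/usr/bin/env bash
# Write a minimal Ollama Modelfile next to the downloaded GGUF file.
# num_ctx 8192 is an example context size; raise it if memory allows.
cat > Modelfile <<'EOF'
FROM ./Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf
PARAMETER num_ctx 8192
EOF
```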

About REAP Pruning

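This model was pruned with REAP (Router-weighted Expert Activation Pruning, arXiv:2510.13999), a one-shot method that scores each expert in every MoE layer from its router gate values and output activations on calibration data, then removes the lowest-scoring 50% of experts per layer. This cuts the parameter count from ~120B to ~64B while aiming to keep the experts that contribute most to each layer's output.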
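To illustrate only the selection step, the sketch below scores experts REAP-style and keeps the top ones. It assumes the per-expert saliency is the mean, over tokens routed to that expert, of router gate value times expert output norm, which is a simplified reading of the paper's criterion. It is not the paper's code, the helper name `reap_keep` is hypothetical, and a real run gathers these statistics inside the model on calibration data.

```bash
#!/usr/bin/env bash
# Toy REAP-style expert selection for a single MoE layer (illustrative only).
# stdin: one line per (token, routed expert) pair: <expert_id> <gate_value> <output_norm>
# Saliency of an expert = mean over its routed tokens of gate_value * output_norm.
# Experts that never receive a token get no score and are therefore dropped.
reap_keep() {
  local keep=$1   # number of experts to retain, e.g. half the layer's experts
  awk '{ s[$1] += $2 * $3; n[$1]++ }
       END { for (e in s) printf "%s %.6f\n", e, s[e] / n[e] }' \
    | sort -k2,2gr | head -n "$keep" | cut -d' ' -f1 | sort -n
}

# Four experts, keep half: prints 1 and 3 (the two highest-saliency experts).
printf '0 0.5 2.0\n0 0.3 1.0\n1 0.9 3.0\n2 0.1 1.0\n3 0.6 2.0\n' | reap_keep 2
```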

Draft Caveats

This checkpoint is derived from a draft release by the original author. Full serving benchmarks and quality evaluations have not been completed, so evaluate it on your own workloads before relying on it.

License

Distributed under the NVIDIA Open Model License. See the original model card for the full terms.

Quantized by

DJLougen using llama.cpp on DGX Spark

GGUF architecture: nemotron_h_moe