# Nemotron-Cascade-2-30B-A3B – Q5_K_M GGUF

GGUF quantization of nvidia/Nemotron-Cascade-2-30B-A3B.

- Architecture: hybrid Attention + Mamba (SSM) + MoE (30B total parameters, 3B active)
- Quantization: Q5_K_M (k-quant, mixed precision, ~5 bits per weight)
## Quantization commands

```shell
# Convert the HF model to GGUF (bf16)
python llama.cpp/convert_hf_to_gguf.py \
  nvidia/Nemotron-Cascade-2-30B-A3B \
  --outfile nemotron-cascade-30b-bf16.gguf \
  --outtype bf16

# Quantize to Q5_K_M
llama-quantize nemotron-cascade-30b-bf16.gguf \
  nemotron-cascade-30b-Q5_K_M.gguf Q5_K_M
```
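After quantizing, it can be worth sanity-checking the resulting file's metadata (architecture, file type) before loading it. A minimal sketch using the `gguf` Python package maintained in the llama.cpp repo (`pip install gguf`); the path assumes the output file name from the commands above:

```python
# Sketch: list GGUF metadata keys to confirm the conversion looks sane.
# Assumes `pip install gguf` and the quantized file from the step above.
from gguf import GGUFReader

reader = GGUFReader("nemotron-cascade-30b-Q5_K_M.gguf")
for key in reader.fields:
    # Expect entries such as general.architecture and general.file_type
    print(key)
```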
## Usage

Load in LM Studio, llama.cpp, or any GGUF-compatible runtime.
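For llama.cpp specifically, a minimal invocation might look like the following. This is a sketch, not a tested recipe: it assumes a recent llama.cpp build with `llama-cli` on your PATH, the GGUF file in the current directory, and enough VRAM to offload all layers (`-ngl 99`); adjust the context size and offload count for your hardware:

```shell
# Interactive chat with the quantized model (file name from this repo)
MODEL=nemotron-cascade-30b-Q5_K_M.gguf

llama-cli -m "$MODEL" \
  -c 8192 \
  -ngl 99 \
  -cnv
```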
## Model tree for AdrienBrault/Nemotron-Cascade-2-30B-A3B-Q5_K_M-GGUF

Base model: nvidia/Nemotron-Cascade-2-30B-A3B