# Sarvam-30B GGUF
GGUF quantizations of sarvamai/sarvam-30b, the first publicly available GGUF for this model. Created by applying llama.cpp PR #20275, which adds `sarvam_moe` architecture support to the converter and runtime.
## Will this work with Ollama / LM Studio / Jan?

Not yet. These tools bundle mainline llama.cpp, which does not recognize `sarvam_moe`. You will see:

`error loading model: unknown model architecture: 'sarvam_moe'`

This GGUF requires a patched llama.cpp (with PR #20275 applied) until that PR merges into mainline. Once it does, Ollama / LM Studio / Jan will work automatically on their next update.
To build a patched llama.cpp, use mtr7x/sarvam-gguf.
## Files
| File | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| sarvam-30b-q4_k_m.gguf | Q4_K_M | 19 GB | 4.87 | Recommended — good balance of quality and size |
| sarvam-30b-f16.gguf | F16 | 60 GB | 16.00 | Full precision, use for further quantization |
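As a sanity check on the sizes above, file size roughly tracks total parameters × bits-per-weight ÷ 8. This is a back-of-the-envelope sketch, not how the quantizer reports size; K-quants keep some tensors (e.g. embeddings) at higher precision, which is why the actual Q4_K_M file lands slightly above the naive figure.

```python
# Naive size estimate from the table's BPW column: bytes ≈ params * bpw / 8.
params = 30e9  # 30B total parameters

for quant, bpw in [("F16", 16.00), ("Q4_K_M", 4.87)]:
    gb = params * bpw / 8 / 1e9  # decimal gigabytes
    print(f"{quant}: ~{gb:.1f} GB")
# F16 comes out to 60.0 GB, matching the table; Q4_K_M to ~18.3 GB,
# a little under the listed 19 GB for the reason noted above.
```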
## How to use

### Option 1: Patch llama.cpp automatically (recommended)
```sh
git clone https://github.com/mtr7x/sarvam-gguf.git
cd sarvam-gguf
chmod +x patch_and_convert.sh
./patch_and_convert.sh
```
This clones llama.cpp, applies PR #20275, builds it, and you're ready to run.
### Option 2: Run with patched llama.cpp directly
```sh
./llama-cli \
  --model sarvam-30b-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --ctx-size 2048 \
  --temp 0.7 \
  -no-cnv \
  --prompt "भारत के बारे में बताइए।"
```
## What does NOT work (yet)

| Tool | Status | Why |
|---|---|---|
| Ollama | `unknown model architecture` | Waiting on PR #20275 merge |
| LM Studio | `unknown model architecture` | Waiting on PR #20275 merge |
| Jan | `unknown model architecture` | Waiting on PR #20275 merge |
| llama.cpp (mainline) | `unknown model architecture` | PR #20275 not yet merged |
| llama.cpp (patched) | Works | This is what you need |
## Architecture

```
sarvamai/sarvam-30b
├── model_type: sarvam_moe
├── 30B params, 2.4B active
├── 19 layers (1 dense + 18 MoE)
├── 128 experts + 1 shared, top-6, sigmoid routing
├── 64 query heads, 4 KV heads, head_dim=64
├── vocab_size: 262,144 (Indic-optimized)
└── Apache 2.0
```
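One practical implication of these numbers: with only 4 KV heads (vs. 64 query heads) and 19 layers, the KV cache is tiny. A back-of-the-envelope sketch (not measured llama.cpp usage), using the `--ctx-size 2048` from the example command and an f16 cache:

```python
# KV-cache size ≈ 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elem
layers, kv_heads, head_dim = 19, 4, 64   # from the architecture tree above
ctx, bytes_per_elem = 2048, 2            # --ctx-size 2048, f16 cache

kv_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem
print(f"{kv_bytes / 2**20:.0f} MiB")  # → 38 MiB
```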
## Why this is needed

Sarvam open-sourced its 30B and 105B models under Apache 2.0, but mainline llama.cpp doesn't recognize `model_type: "sarvam_moe"`: the converter exits immediately. Contrary to what you might expect, sigmoid routing is already supported in llama.cpp (used by GLM4 and others). The actual blockers are a missing class registration, tensor mappings, and a C++ graph builder, all provided by PR #20275 (387 lines).
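For readers curious what sigmoid routing means in practice, here is a minimal NumPy sketch of sigmoid top-k expert selection. This is a hypothetical illustration of the general technique, not the llama.cpp or Sarvam implementation; the always-active shared expert is omitted for brevity.

```python
import numpy as np

def sigmoid_topk_route(hidden, gate_w, top_k=6):
    """Score every expert with a sigmoid gate (instead of a softmax over
    experts), keep the top_k highest-scoring, and renormalize their weights."""
    logits = hidden @ gate_w                      # one logit per expert
    scores = 1.0 / (1.0 + np.exp(-logits))        # independent sigmoid gates
    top = np.argsort(scores)[-top_k:][::-1]       # indices of the best experts
    weights = scores[top] / scores[top].sum()     # normalize selected weights
    return top, weights

# Toy dimensions matching the model card: 128 routed experts, top-6.
rng = np.random.default_rng(0)
h = rng.standard_normal(64)                       # a hidden-state vector
gw = rng.standard_normal((64, 128))               # gate projection
experts, w = sigmoid_topk_route(h, gw)            # 6 expert ids + weights
```

Because each gate is an independent sigmoid rather than one softmax, expert scores don't compete for a fixed probability mass, which is the property GLM4-style routers share.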
### The domino chain

```
PR #20275 merges into llama.cpp    ← pending
 → GGUF can be created             ← done (this repo)
 → Ollama updates its llama.cpp    ← blocked
 → Unsloth applies dynamic quants  ← blocked
 → ollama run sarvam-30b           ← blocked
```
## Runtime support
| Runtime | Status |
|---|---|
| vLLM | PR #33942 merged |
| SGLang | Works |
| llama.cpp (patched) | Works (PR #20275) |
| llama.cpp (mainline) | Blocked — PR pending |
| Ollama | Blocked on llama.cpp |
| LM Studio | Blocked on llama.cpp |
## Credits
- sarvamai for open-sourcing Sarvam-30B under Apache 2.0
- sumitchatterjee13 for llama.cpp PR #20275
- Conversion pipeline: mtr7x/sarvam-gguf
Read the full analysis: Sarvam. Open is not sovereign