# eagle3-kimik2.5-w4a8
An EAGLE-3 speculative decoding draft model for Kimi-K2.5, quantized to INT4 using AMD Quark.
## Model Details
| Property | Value |
|---|---|
| Base draft model | lightseekorg/kimi-k2.5-eagle3 |
| Architecture | LlamaForCausalLMEagle3 (1-layer Transformer) |
| Hidden size | 7168 |
| Vocab size | 163840 |
| Quantization | INT4 per-channel symmetric (Quark 0.11.1) |
| Model size | ~4.8 GB (vs 6.0 GB BF16 original) |
| Quantized layers | midlayer.self_attn.{q,k,v,o}_proj, midlayer.mlp.{gate,up,down}_proj |
| Excluded from quantization | embed_tokens, lm_head, fc (fusion layer), all norms |
## How It Was Made
The BF16 EAGLE-3 draft model (lightseekorg/kimi-k2.5-eagle3) was quantized using AMD Quark with the following config:
- Weight quantization: INT4 per-channel symmetric
- Excluded layers: `embed_tokens`, `lm_head`, `fc` (EAGLE-3 fusion layer), all norm layers
- Tool: AMD Quark 0.11.1
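A quick sanity check of the reported size reduction (6.0 GB BF16 to ~4.8 GB). This back-of-envelope sketch assumes INT4 stores 0.5 bytes per weight versus 2 bytes in BF16 and ignores the small per-channel scale overhead; only the GB figures come from the table above, the derived fraction is an estimate.

```python
# Estimate what fraction of the weights the INT4 quantization covers,
# given the reported sizes; this is arithmetic, not measured data.
BYTES_BF16 = 2.0   # bytes per BF16 weight
BYTES_INT4 = 0.5   # bytes per INT4 weight (scale overhead ignored)

total_gb_bf16 = 6.0    # original BF16 checkpoint size (from the table)
total_gb_quant = 4.8   # quantized checkpoint size (from the table)

saved_gb = total_gb_bf16 - total_gb_quant                        # ~1.2 GB saved
params_quantized = saved_gb * 1e9 / (BYTES_BF16 - BYTES_INT4)    # ~0.8e9 weights
params_total = total_gb_bf16 * 1e9 / BYTES_BF16                  # ~3.0e9 weights
frac = params_quantized / params_total
print(f"~{frac:.0%} of weights quantized to INT4")
```

The remaining ~73% (dominated by the 163840-entry `embed_tokens` and `lm_head` matrices, plus `fc` and the norms) stays in BF16, which is consistent with the exclusion list above.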
## Performance (with Kimi K2.5 on 8x MI325X)
| Metric | Baseline (no EAGLE) | EAGLE-3 BF16 |
|---|---|---|
| TPOT, concurrency 2 (ms) | 23.79 | 8.26 |
| Output tok/s, concurrency 2 | 78.87 | 204.12 |
| TPOT, concurrency 40 (ms) | 59.33 | 40.87 |
| Output tok/s, concurrency 40 | 500.11 | 699.55 |
| GSM8K accuracy (10-shot) | 0.93 | 0.91 |
| Accept length | N/A | ~3.97 |
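The table implies the per-token latency speedup shrinks as concurrency grows, since batching already keeps the GPUs busier at concurrency 40. The ratios can be derived directly from the TPOT rows:

```python
# Speedup of EAGLE-3 BF16 over the no-EAGLE baseline, computed from the
# TPOT rows of the table above (ms per output token, keyed by concurrency).
tpot_base = {2: 23.79, 40: 59.33}
tpot_eagle = {2: 8.26, 40: 40.87}

for con in (2, 40):
    speedup = tpot_base[con] / tpot_eagle[con]
    print(f"concurrency={con}: {speedup:.2f}x lower TPOT with EAGLE-3")
```

This gives roughly 2.9x at concurrency 2 but only about 1.45x at concurrency 40, matching the output-tok/s columns.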
## Usage with SGLang

```shell
python -m sglang.launch_server \
  --model <path-to-kimi-k2.5> \
  --tp 8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path <path-to-this-model> \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.80 \
  --trust-remote-code \
  --disable-radix-cache
```
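Speculative decoding is transparent to clients: once the server above is running, any OpenAI-compatible request exercises the EAGLE-3 draft path. A minimal stdlib-only sketch; port 30000 is SGLang's default, and the `"default"` model name is a placeholder for however your deployment identifies the served model.

```python
import json
import urllib.request

def build_request(prompt: str,
                  url: str = "http://localhost:30000/v1/completions"):
    """Build an OpenAI-style completions request for the SGLang server."""
    payload = {
        "model": "default",   # placeholder served-model name
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": 0.0,   # greedy decoding tends to keep accept length high
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Explain speculative decoding in one sentence.")
# resp = urllib.request.urlopen(req)   # uncomment with the server running
print(req.full_url)
```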
## Notes
- For AMD ROCm with the AITER attention backend, a patch is needed: guard the bare `if _use_mla_ps_kernel:` with `if self.use_mla and _use_mla_ps_kernel:` in `aiter_backend.py` (SGLang PR #20409).
- The `KimiK25ForConditionalGeneration` model wrapper needs `get_embed_and_head()`, `set_embed_and_head()`, and `set_eagle3_layers_to_capture()` methods added to delegate to its inner `language_model`.
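An illustrative sketch (not the actual SGLang source) of the delegation the second note describes: the wrapper forwards the three EAGLE-3 hooks to its inner `language_model`. The class and method names follow the note; the stand-in inner model and its method bodies are assumptions for illustration only.

```python
class LanguageModel:
    """Stand-in for the inner text model that already implements the hooks."""
    def get_embed_and_head(self):
        return ("embed_tokens", "lm_head")
    def set_embed_and_head(self, embed, head):
        self.embed, self.head = embed, head
    def set_eagle3_layers_to_capture(self, layer_ids=None):
        self.capture_layers = layer_ids

class KimiK25ForConditionalGeneration:
    """Wrapper that delegates the EAGLE-3 hooks so the speculative worker
    can reach the shared embedding/head and register capture layers."""
    def __init__(self):
        self.language_model = LanguageModel()

    def get_embed_and_head(self):
        return self.language_model.get_embed_and_head()

    def set_embed_and_head(self, embed, head):
        self.language_model.set_embed_and_head(embed, head)

    def set_eagle3_layers_to_capture(self, layer_ids=None):
        self.language_model.set_eagle3_layers_to_capture(layer_ids)

model = KimiK25ForConditionalGeneration()
print(model.get_embed_and_head())
```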
## Model Tree

`ginsongsong/eagle3-kimik2.5-w4a8` is quantized from `lightseekorg/kimi-k2.5-eagle3`, whose base model is `moonshotai/Kimi-K2.5`.