# eagle3-kimik2.5-w4a8

An EAGLE-3 speculative decoding draft model for Kimi-K2.5, quantized to INT4 using AMD Quark.

## Model Details

| Property | Value |
|---|---|
| Base draft model | `lightseekorg/kimi-k2.5-eagle3` |
| Architecture | `LlamaForCausalLMEagle3` (1-layer Transformer) |
| Hidden size | 7168 |
| Vocab size | 163840 |
| Quantization | INT4 per-channel symmetric (Quark 0.11.1) |
| Model size | ~4.8 GB (vs. 6.0 GB BF16 original) |
| Quantized layers | `midlayer.self_attn.{q,k,v,o}_proj`, `midlayer.mlp.{gate,up,down}_proj` |
| Excluded from quantization | `embed_tokens`, `lm_head`, `fc` (fusion layer), all norms |

## How It Was Made

The BF16 EAGLE-3 draft model (`lightseekorg/kimi-k2.5-eagle3`) was quantized with AMD Quark using the following configuration:

- Weight quantization: INT4 per-channel symmetric
- Excluded layers: `embed_tokens`, `lm_head`, `fc` (EAGLE-3 fusion layer), all norm layers
- Tool: AMD Quark 0.11.1
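Per-channel symmetric quantization gives each output channel (row) of a weight matrix its own scale, with the zero point fixed at 0. The sketch below illustrates the arithmetic in plain NumPy; it is an illustration of the scheme, not Quark's actual implementation.

```python
import numpy as np

def quantize_int4_per_channel(w: np.ndarray):
    """Symmetric per-channel INT4: one scale per output channel (row),
    zero point fixed at 0, quantized values clipped to [-8, 7]."""
    # Map each row's max magnitude to 7 (largest positive INT4 value),
    # so the representable range is symmetric around zero.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid div-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# Round-trip a small weight matrix
w = np.array([[0.5, -1.0, 0.25], [2.0, 0.0, -4.0]], dtype=np.float32)
q, s = quantize_int4_per_channel(w)
w_hat = dequantize(q, s)
```

The excluded layers (embeddings, LM head, the EAGLE-3 fusion layer, norms) stay in BF16 because they are either small or disproportionately sensitive to quantization error.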

## Performance (with Kimi K2.5 on 8x MI325X)

| Metric | Baseline (no EAGLE) | EAGLE-3 BF16 |
|---|---|---|
| TPOT, concurrency 2 (ms) | 23.79 | 8.26 |
| Output tok/s, concurrency 2 | 78.87 | 204.12 |
| TPOT, concurrency 40 (ms) | 59.33 | 40.87 |
| Output tok/s, concurrency 40 | 500.11 | 699.55 |
| GSM8K accuracy (10-shot) | 0.93 | 0.91 |
| Accept length | N/A | ~3.97 |
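The throughput numbers above imply roughly a 2.6x speedup at concurrency 2, shrinking to about 1.4x at concurrency 40 (speculative decoding pays off most when the GPU is otherwise underutilized). A quick check of the arithmetic:

```python
# Speedup implied by the table: EAGLE-3 output tok/s vs. baseline.
speedup_c2 = 204.12 / 78.87    # concurrency 2
speedup_c40 = 699.55 / 500.11  # concurrency 40
print(f"{speedup_c2:.2f}x at concurrency 2")   # ~2.59x
print(f"{speedup_c40:.2f}x at concurrency 40") # ~1.40x
```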

## Usage with SGLang

```bash
python -m sglang.launch_server \
    --model <path-to-kimi-k2.5> \
    --tp 8 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path <path-to-this-model> \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.80 \
    --trust-remote-code \
    --disable-radix-cache
```
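With `--speculative-eagle-topk 1` the draft model proposes a single chain of tokens, one per step, and the verifier scores those plus the root token. Under that assumption (this relation is our reading of SGLang's chain-drafting mode, not documented behavior), the flag values above are consistent with each other:

```python
# Chain drafting (topk = 1): one drafted token per step, plus the root
# token that the verifier always scores.
num_steps, topk = 3, 1
num_draft_tokens = num_steps * topk + 1
print(num_draft_tokens)  # 4, matching --speculative-num-draft-tokens
```

The ~3.97 accept length reported above sits just under this cap of 4, i.e. nearly every drafted chain is accepted in full.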

## Notes

- For AMD ROCm with the AITER attention backend, a patch is needed: guard the bare `if _use_mla_ps_kernel:` with `if self.use_mla and _use_mla_ps_kernel:` in `aiter_backend.py` (SGLang PR #20409).
- The `KimiK25ForConditionalGeneration` model wrapper needs `get_embed_and_head()`, `set_embed_and_head()`, and `set_eagle3_layers_to_capture()` methods added, delegating to its inner `language_model`.
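A minimal sketch of the delegation described in the second note, assuming the wrapper holds the wrapped model as `self.language_model` and that the inner model already implements these three methods (the method names come from the note; everything else here is illustrative):

```python
class KimiK25ForConditionalGeneration:  # illustrative skeleton, not the real class
    def __init__(self, language_model):
        self.language_model = language_model

    # EAGLE-3 shares the target model's embeddings and LM head with the
    # draft model; expose them by delegating to the inner language model.
    def get_embed_and_head(self):
        return self.language_model.get_embed_and_head()

    def set_embed_and_head(self, embed, head):
        self.language_model.set_embed_and_head(embed, head)

    # Tell the inner model which hidden layers to capture as the
    # auxiliary features EAGLE-3 conditions on.
    def set_eagle3_layers_to_capture(self, layer_ids=None):
        self.language_model.set_eagle3_layers_to_capture(layer_ids)
```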