lujangusface committed on
Commit 560a4c7 · verified · 1 parent: 03af6d3

Fix target model references: Qwen3-Next-80B-A3B-Instruct -> Qwen3-Coder-Next; fix narrow tree Terminal-Bench tok/s; remove internal comment

Files changed (1): README.md (+7 -9)
@@ -3,7 +3,7 @@ library_name: transformers
 license: apache-2.0
 language:
 - en
-base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
+base_model: Qwen/Qwen3-Coder-Next
 pipeline_tag: text-generation
 tags:
 - eagle3
@@ -17,11 +17,9 @@ tags:
 - code
 ---
 
-<!-- Internal: exp-e (gpu/qwen3-coder-next) -->
-
 # EAGLE3 Draft Head — Qwen3-Coder-Next
 
-A lightweight EAGLE3 draft head for [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) (80B MoE, 512 experts, 10 active per token, GDN+attention hybrid, 48 layers). Trained with [SpecForge](https://github.com/tails-mpt/SpecForge) on 8x H200 GPUs using the [EAGLE-3](https://arxiv.org/abs/2503.01840) training-time test objective.
+A lightweight EAGLE3 draft head for [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) (80B MoE, 512 experts, 10 active per token, GDN+attention hybrid, 48 layers). Trained with [SpecForge](https://github.com/tails-mpt/SpecForge) on 8x H200 GPUs using the [EAGLE-3](https://arxiv.org/abs/2503.01840) training-time test objective.
 
 Qwen3-Coder-Next uses a hybrid layer design that interleaves standard multi-head attention with GDN (linear recurrence) layers. Only 12 of 48 layers are attention layers (every 4th: 3, 7, 11, ..., 47). EAGLE3 auxiliary layers must be selected from attention layers only — GDN layers produce recurrent hidden states that are not compatible with EAGLE3. The model code handles this automatically, selecting layers 3, 23, 47 (first, middle, last attention layers).
 
@@ -39,7 +37,7 @@ Requires our [SGLang fork](https://github.com/tails-mpt/sglang) for Qwen3-Coder-
 pip install 'git+https://github.com/tails-mpt/sglang.git#subdirectory=python'
 
 python -m sglang.launch_server \
-  --model-path Qwen/Qwen3-Next-80B-A3B-Instruct \
+  --model-path Qwen/Qwen3-Coder-Next \
   --speculative-algorithm EAGLE3 \
   --speculative-draft-model-path thoughtworks/Qwen3-Coder-Next-Eagle3 \
   --speculative-num-steps 3 \
@@ -55,7 +53,7 @@ python -m sglang.launch_server \
 
 ```bash
 python -m sglang.launch_server \
-  --model-path Qwen/Qwen3-Next-80B-A3B-Instruct \
+  --model-path Qwen/Qwen3-Coder-Next \
   --speculative-algorithm EAGLE3 \
   --speculative-draft-model-path thoughtworks/Qwen3-Coder-Next-Eagle3 \
   --speculative-num-steps 5 \
@@ -133,10 +131,10 @@ GDN (linear recurrence) layers are excluded from auxiliary layer selection becau
 | Dataset | Baseline (tok/s) | EAGLE3 (tok/s) | Speedup |
 |---------|-----------------|----------------|---------|
 | MT-Bench | 1,529.1 | 1,688.6 | **1.10x** |
-| Terminal-Bench | 2,310.5 | 1,785.4 | **1.03x** |
+| Terminal-Bench | 2,310.5 | 2,379.8 | **1.03x** |
 | HumanEval | 1,740.2 | 1,756.3 | **1.01x** |
 | SWEBench-Verified | 2,010.4 | 1,998.7 | **1.00x** |
-| **Mean** | **1,897.5** | **1,807.3** | **1.03x** |
+| **Mean** | **1,897.5** | **1,955.9** | **1.03x** |
 
 *Config: B=1 uses steps=3, topk=4, draft_tokens=8. B=32 narrow uses steps=5, topk=1, draft_tokens=6. Hardware: 4x H200 (TP=4), Triton backend. SGLang commit `63291f7f51`.*
 
@@ -165,7 +163,7 @@ GDN (linear recurrence) layers are excluded from auxiliary layer selection becau
 
 ## License
 
-This draft head is released under Apache 2.0, matching the [Qwen3-Coder-Next license](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct).
+This draft head is released under Apache 2.0, matching the [Qwen3-Coder-Next license](https://huggingface.co/Qwen/Qwen3-Coder-Next).
 
 ## Citation
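The README text in this diff describes the auxiliary-layer rule: only attention layers qualify, and the model code picks the first, middle, and last of them. A minimal sketch of that selection logic, assuming 0-indexed layers — this is an illustration, not the actual SpecForge/model code:

```python
# Illustrative sketch (assumption: 0-indexed layers; not the real implementation).

NUM_LAYERS = 48

# Every 4th layer is full attention (3, 7, 11, ..., 47); the rest are GDN.
attention_layers = [i for i in range(NUM_LAYERS) if i % 4 == 3]

# EAGLE3 fuses low/mid/high hidden states, so pick the first, middle, and
# last attention layer. GDN layers are excluded because their recurrent
# hidden states are not usable as EAGLE3 auxiliary features.
aux_layers = [
    attention_layers[0],
    attention_layers[(len(attention_layers) - 1) // 2],
    attention_layers[-1],
]
print(aux_layers)  # [3, 23, 47]
```

With 12 attention layers there are two middle candidates (indices 5 and 6); `(len - 1) // 2` picks index 5 (layer 23), matching the layers the README reports.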
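A quick sanity check on the corrected throughput table (illustrative only, numbers copied from the diff): with Terminal-Bench at 2,379.8 tok/s, the per-dataset figures reproduce both column means and the 1.03x mean speedup, which the old 1,785.4 figure did not.

```python
# Recompute the table's mean row from the per-dataset numbers in the diff.

baseline = {"MT-Bench": 1529.1, "Terminal-Bench": 2310.5,
            "HumanEval": 1740.2, "SWEBench-Verified": 2010.4}
eagle3 = {"MT-Bench": 1688.6, "Terminal-Bench": 2379.8,
          "HumanEval": 1756.3, "SWEBench-Verified": 1998.7}

mean_base = sum(baseline.values()) / len(baseline)   # ~1897.5 tok/s
mean_eagle = sum(eagle3.values()) / len(eagle3)      # ~1955.9 tok/s
speedup = mean_eagle / mean_base                     # ~1.03x

print(f"{mean_base:.1f} {mean_eagle:.1f} {speedup:.2f}x")
```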