williamchangtw chenjiel committed on
Commit 52f42b9 · 0 Parent(s)

Duplicate from nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4


Co-authored-by: Chenjie Luo <chenjiel@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,37 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
37
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,200 @@
1
+ ---
2
+ pipeline_tag: text-generation
3
+ base_model:
4
+ - Qwen/Qwen3-Next-80B-A3B-Thinking
5
+ license: apache-2.0
6
+ library_name: Model Optimizer
7
+ tags:
8
+ - nvidia
9
+ - ModelOpt
10
+ - Qwen3
11
+ - quantized
12
+ - NVFP4
13
+ - nvfp4
14
+ ---
15
+
16
+ # Model Overview
17
+
18
+ ## Description:
19
+ The NVIDIA Qwen3-Next-80B-A3B-Thinking NVFP4 model is the quantized version of Alibaba's Qwen3-Next-80B-A3B-Thinking model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check [here](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking). The NVIDIA Qwen3-Next-80B-A3B-Thinking NVFP4 model is quantized with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
20
+
21
+ This model is ready for commercial/non-commercial use. <br>
22
+
23
+ ## Third-Party Community Consideration
24
+ This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA [(Qwen3-Next-80B-A3B-Thinking) Model Card](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking).
25
+
26
+ ### License/Terms of Use:
27
+ [Apache license 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
28
+
29
+ ### Deployment Geography:
30
+ Global <br>
31
+
32
+ ### Use Case:
33
+ Developers looking to deploy off-the-shelf, pre-quantized models in AI agent systems, chatbots, RAG systems, and other AI-powered applications. <br>
34
+
35
+ ### Release Date:
36
+ Hugging Face 12/29/2025 via https://huggingface.co/nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4 <br>
37
+
38
+ ## Model Architecture:
39
+ **Architecture Type:** Transformers <br>
40
+ **Network Architecture:** Qwen3NextForCausalLM <br>
41
+ **This model was developed based on:** [Qwen3-Next-80B-A3B-Thinking](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking) <br>
42
+ **Number of model parameters:** Undisclosed. <br>
43
+
44
+ ## Input:
45
+ **Input Type(s):** Text <br>
46
+ **Input Format(s):** String <br>
47
+ **Input Parameters:** 1D (One-Dimensional): Sequences <br>
48
+ **Other Properties Related to Input:** Context length 262,144 natively and extensible up to 1,010,000 tokens <br>
49
+
50
+ ## Output:
51
+ **Output Type(s):** Text <br>
52
+ **Output Format:** String <br>
53
+ **Output Parameters:** 1D (One-Dimensional): Sequences <br>
54
+ **Other Properties Related to Output:** N/A <br>
55
+
56
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. <br>
57
+
58
+ ## Software Integration:
59
+ **Runtime Engine(s):** <br>
60
+ * TensorRT-LLM <br>
61
+
62
+ **Supported Hardware Microarchitecture Compatibility:** <br>
63
+ * NVIDIA Blackwell <br>
64
+
65
+ **Preferred Operating System(s):** <br>
66
+ * Linux <br>
67
+
68
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
69
+
70
+ ## Model Version(s):
71
+ The model is quantized with nvidia-modelopt **v0.40.0** <br>
72
+
73
+ ## Training, Testing, and Evaluation Datasets:
74
+
75
+ ## Calibration Dataset:
76
+ **Link:** [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail), [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) <br>
77
+ **Data Collection Method by dataset:** Automated. <br>
78
+ **Labeling Method by dataset:** Automated. <br>
79
+
80
+ ## Training Dataset:
81
+ **Data Modality:** Undisclosed <br>
82
+ **Data Collection Method by dataset:** Undisclosed <br>
83
+ **Labeling Method by dataset:** Undisclosed <br>
84
+ **Properties:** Undisclosed
85
+
86
+ ## Testing Dataset:
87
+ **Data Collection Method by dataset:** Undisclosed <br>
88
+ **Labeling Method by dataset:** Undisclosed <br>
89
+ **Properties:** Undisclosed <br>
90
+
91
+ ## Evaluation Dataset:
92
+ **Datasets:** MMLU Pro, GPQA Diamond, LiveCodeBench V6, SciCode, AIME 2025 <br>
93
+ **Data Collection Method by dataset:** Hybrid: Automated, Human <br>
94
+ **Labeling Method by dataset:** Hybrid: Human, Automated <br>
95
+
96
+
97
+ ## Inference:
98
+ **Acceleration Engine:** TensorRT-LLM <br>
99
+ **Test Hardware:** B200 <br>
100
+
101
+ ## Post Training Quantization
102
+ This model was obtained by quantizing the weights and activations of Qwen3-Next-80B-A3B-Thinking to the NVFP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within the transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 4, cutting disk size and GPU memory requirements by approximately 3.3x.
103
+
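+ The exact quantization recipe used to produce this checkpoint is not reproduced in this repository, but as a rough sketch (assuming the current `nvidia-modelopt` post-training quantization API; `calibration_prompts` and the output directory below are placeholders, and the per-module exclusions and FP8 KV-cache settings used for this checkpoint are not shown), NVFP4 PTQ with TensorRT Model Optimizer typically looks like the following:
+
+ ```python
+ # Hedged sketch of NVFP4 post-training quantization with TensorRT Model Optimizer.
+ # API names follow recent nvidia-modelopt releases and may differ across versions.
+ import torch
+ import modelopt.torch.quantization as mtq
+ from modelopt.torch.export import export_hf_checkpoint
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "Qwen/Qwen3-Next-80B-A3B-Thinking"
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Placeholder calibration set; in practice, samples from the calibration datasets listed above.
+ calibration_prompts = ["The quick brown fox jumps over the lazy dog."]
+
+ def forward_loop(m):
+     # Run calibration data through the model so ModelOpt can collect activation statistics.
+     with torch.no_grad():
+         for prompt in calibration_prompts:
+             inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
+             m(**inputs)
+
+ # Quantize the linear layers to NVFP4 and export a Hugging Face-style checkpoint.
+ model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
+ export_hf_checkpoint(model, export_dir="Qwen3-Next-80B-A3B-Thinking-NVFP4")
+ ```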
104
+ ## Usage
105
+
106
+ ### Deploy with TensorRT-LLM
107
+
108
+ To deploy the quantized checkpoint with the [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) LLM API, follow the sample code below:
109
+
110
+ * LLM API sample usage:
111
+ ```python
112
+ from tensorrt_llm import LLM, SamplingParams
113
+ from tensorrt_llm.llmapi import KvCacheConfig
114
+
115
+
116
+ def main():
117
+
118
+ prompts = [
119
+ "Hello, my name is",
120
+ "The president of the United States is",
121
+ "The capital of France is",
122
+ "The future of AI is",
123
+ ]
124
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.95)
125
+ kv_cache_config = KvCacheConfig(enable_block_reuse=False)
126
+
127
+ llm = LLM(model="nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4", tensor_parallel_size=4, kv_cache_config=kv_cache_config)
128
+
129
+ outputs = llm.generate(prompts, sampling_params)
130
+
131
+ # Print the outputs.
132
+ for output in outputs:
133
+ prompt = output.prompt
134
+ generated_text = output.outputs[0].text
135
+ print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
136
+
137
+
138
+ # The entry point of the program needs to be protected for spawning processes.
139
+ if __name__ == '__main__':
140
+ main()
141
+
142
+ ```
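+
+ The example above assumes four GPUs (`tensor_parallel_size=4`); adjust this to your hardware. Depending on your TensorRT-LLM version, the same checkpoint can typically also be exposed as an OpenAI-compatible endpoint (for example with `trtllm-serve`); refer to the TensorRT-LLM documentation for the exact command-line flags.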
143
+
144
+ ### Evaluation
145
+ The accuracy benchmark results are presented in the table below:
146
+ <table>
147
+ <tr>
148
+ <td><strong>Precision</strong>
149
+ </td>
150
+ <td><strong>MMLU Pro</strong>
151
+ </td>
152
+ <td><strong>GPQA Diamond</strong>
153
+ </td>
154
+ <td><strong>LiveCodeBench V6</strong>
155
+ </td>
156
+ <td><strong>SciCode</strong>
157
+ </td>
158
+ <td><strong>AIME 2025</strong>
159
+ </td>
160
+ </tr>
161
+ <tr>
162
+ <td>FP8
163
+ </td>
164
+ <td>0.823
165
+ </td>
166
+ <td>0.754
167
+ </td>
168
+ <td>0.714
169
+ </td>
170
+ <td>0.414
171
+ </td>
172
+ <td>0.879
173
+ </td>
174
+ </tr>
175
+ <tr>
176
+ <td>NVFP4
177
+ </td>
178
+ <td>0.822
179
+ </td>
180
+ <td>0.752
181
+ </td>
182
+ <td>0.708
183
+ </td>
184
+ <td>0.409
185
+ </td>
186
+ <td>0.862
187
+ </td>
188
+ </tr>
190
+ </table>
191
+
192
+ > Baseline: [Qwen3-Next-80B-A3B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking-FP8).
193
+ > Benchmarked with temperature=0.6, top_p=0.95, and max num tokens 81,920.
194
+
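+ To approximate these benchmark settings with the LLM API shown above, a sampling configuration along the following lines can be used (a sketch only; it assumes "max num tokens" refers to the maximum number of generated tokens and that your TensorRT-LLM version supports the `max_tokens` argument):
+
+ ```python
+ # Hedged sketch: sampling settings matching the benchmark note above.
+ from tensorrt_llm import SamplingParams
+
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=81920)
+ ```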
195
+
196
+ ## Ethical Considerations
197
+
198
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
199
+
200
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
added_tokens.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "</think>": 151668,
3
+ "</tool_call>": 151658,
4
+ "</tool_response>": 151666,
5
+ "<think>": 151667,
6
+ "<tool_call>": 151657,
7
+ "<tool_response>": 151665,
8
+ "<|box_end|>": 151649,
9
+ "<|box_start|>": 151648,
10
+ "<|endoftext|>": 151643,
11
+ "<|file_sep|>": 151664,
12
+ "<|fim_middle|>": 151660,
13
+ "<|fim_pad|>": 151662,
14
+ "<|fim_prefix|>": 151659,
15
+ "<|fim_suffix|>": 151661,
16
+ "<|im_end|>": 151645,
17
+ "<|im_start|>": 151644,
18
+ "<|image_pad|>": 151655,
19
+ "<|object_ref_end|>": 151647,
20
+ "<|object_ref_start|>": 151646,
21
+ "<|quad_end|>": 151651,
22
+ "<|quad_start|>": 151650,
23
+ "<|repo_name|>": 151663,
24
+ "<|video_pad|>": 151656,
25
+ "<|vision_end|>": 151653,
26
+ "<|vision_pad|>": 151654,
27
+ "<|vision_start|>": 151652
28
+ }
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {{- messages[0].content + '\n\n' }}
5
+ {%- endif %}
6
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
7
+ {%- for tool in tools %}
8
+ {{- "\n" }}
9
+ {{- tool | tojson }}
10
+ {%- endfor %}
11
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
12
+ {%- else %}
13
+ {%- if messages[0].role == 'system' %}
14
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
15
+ {%- endif %}
16
+ {%- endif %}
17
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
18
+ {%- for message in messages[::-1] %}
19
+ {%- set index = (messages|length - 1) - loop.index0 %}
20
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
21
+ {%- set ns.multi_step_tool = false %}
22
+ {%- set ns.last_query_index = index %}
23
+ {%- endif %}
24
+ {%- endfor %}
25
+ {%- for message in messages %}
26
+ {%- if message.content is string %}
27
+ {%- set content = message.content %}
28
+ {%- else %}
29
+ {%- set content = '' %}
30
+ {%- endif %}
31
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
32
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
33
+ {%- elif message.role == "assistant" %}
34
+ {%- set reasoning_content = '' %}
35
+ {%- if message.reasoning_content is string %}
36
+ {%- set reasoning_content = message.reasoning_content %}
37
+ {%- else %}
38
+ {%- if '</think>' in content %}
39
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
40
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
41
+ {%- endif %}
42
+ {%- endif %}
43
+ {%- if loop.index0 > ns.last_query_index %}
44
+ {%- if loop.last or (not loop.last and reasoning_content) %}
45
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
46
+ {%- else %}
47
+ {{- '<|im_start|>' + message.role + '\n' + content }}
48
+ {%- endif %}
49
+ {%- else %}
50
+ {{- '<|im_start|>' + message.role + '\n' + content }}
51
+ {%- endif %}
52
+ {%- if message.tool_calls %}
53
+ {%- for tool_call in message.tool_calls %}
54
+ {%- if (loop.first and content) or (not loop.first) %}
55
+ {{- '\n' }}
56
+ {%- endif %}
57
+ {%- if tool_call.function %}
58
+ {%- set tool_call = tool_call.function %}
59
+ {%- endif %}
60
+ {{- '<tool_call>\n{"name": "' }}
61
+ {{- tool_call.name }}
62
+ {{- '", "arguments": ' }}
63
+ {%- if tool_call.arguments is string %}
64
+ {{- tool_call.arguments }}
65
+ {%- else %}
66
+ {{- tool_call.arguments | tojson }}
67
+ {%- endif %}
68
+ {{- '}\n</tool_call>' }}
69
+ {%- endfor %}
70
+ {%- endif %}
71
+ {{- '<|im_end|>\n' }}
72
+ {%- elif message.role == "tool" %}
73
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
74
+ {{- '<|im_start|>user' }}
75
+ {%- endif %}
76
+ {{- '\n<tool_response>\n' }}
77
+ {{- content }}
78
+ {{- '\n</tool_response>' }}
79
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
80
+ {{- '<|im_end|>\n' }}
81
+ {%- endif %}
82
+ {%- endif %}
83
+ {%- endfor %}
84
+ {%- if add_generation_prompt %}
85
+ {{- '<|im_start|>assistant\n<think>\n' }}
86
+ {%- endif %}
config.json ADDED
@@ -0,0 +1,370 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3NextForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151643,
8
+ "decoder_sparse_step": 1,
9
+ "dtype": "bfloat16",
10
+ "eos_token_id": 151645,
11
+ "full_attention_interval": 4,
12
+ "head_dim": 256,
13
+ "hidden_act": "silu",
14
+ "hidden_size": 2048,
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 5120,
17
+ "layer_types": [
18
+ "linear_attention",
19
+ "linear_attention",
20
+ "linear_attention",
21
+ "full_attention",
22
+ "linear_attention",
23
+ "linear_attention",
24
+ "linear_attention",
25
+ "full_attention",
26
+ "linear_attention",
27
+ "linear_attention",
28
+ "linear_attention",
29
+ "full_attention",
30
+ "linear_attention",
31
+ "linear_attention",
32
+ "linear_attention",
33
+ "full_attention",
34
+ "linear_attention",
35
+ "linear_attention",
36
+ "linear_attention",
37
+ "full_attention",
38
+ "linear_attention",
39
+ "linear_attention",
40
+ "linear_attention",
41
+ "full_attention",
42
+ "linear_attention",
43
+ "linear_attention",
44
+ "linear_attention",
45
+ "full_attention",
46
+ "linear_attention",
47
+ "linear_attention",
48
+ "linear_attention",
49
+ "full_attention",
50
+ "linear_attention",
51
+ "linear_attention",
52
+ "linear_attention",
53
+ "full_attention",
54
+ "linear_attention",
55
+ "linear_attention",
56
+ "linear_attention",
57
+ "full_attention",
58
+ "linear_attention",
59
+ "linear_attention",
60
+ "linear_attention",
61
+ "full_attention",
62
+ "linear_attention",
63
+ "linear_attention",
64
+ "linear_attention",
65
+ "full_attention"
66
+ ],
67
+ "linear_conv_kernel_dim": 4,
68
+ "linear_key_head_dim": 128,
69
+ "linear_num_key_heads": 16,
70
+ "linear_num_value_heads": 32,
71
+ "linear_value_head_dim": 128,
72
+ "max_position_embeddings": 262144,
73
+ "mlp_only_layers": [],
74
+ "model_type": "qwen3_next",
75
+ "moe_intermediate_size": 512,
76
+ "norm_topk_prob": true,
77
+ "num_attention_heads": 16,
78
+ "num_experts": 512,
79
+ "num_experts_per_tok": 10,
80
+ "num_hidden_layers": 48,
81
+ "num_key_value_heads": 2,
82
+ "output_router_logits": false,
83
+ "partial_rotary_factor": 0.25,
84
+ "rms_norm_eps": 1e-06,
85
+ "rope_scaling": null,
86
+ "rope_theta": 10000000,
87
+ "router_aux_loss_coef": 0.001,
88
+ "shared_expert_intermediate_size": 512,
89
+ "tie_word_embeddings": false,
90
+ "transformers_version": "4.57.1",
91
+ "use_cache": true,
92
+ "use_sliding_window": false,
93
+ "vocab_size": 151936,
94
+ "quantization_config": {
95
+ "config_groups": {
96
+ "group_0": {
97
+ "input_activations": {
98
+ "dynamic": false,
99
+ "num_bits": 4,
100
+ "type": "float",
101
+ "group_size": 16
102
+ },
103
+ "weights": {
104
+ "dynamic": false,
105
+ "num_bits": 4,
106
+ "type": "float",
107
+ "group_size": 16
108
+ },
109
+ "targets": [
110
+ "Linear"
111
+ ]
112
+ }
113
+ },
114
+ "ignore": [
115
+ "lm_head",
116
+ "model.layers.0.linear_attn.conv1d",
117
+ "model.layers.0.linear_attn.in_proj_ba",
118
+ "model.layers.0.linear_attn.in_proj_qkvz",
119
+ "model.layers.0.mlp.gate",
120
+ "model.layers.0.mlp.shared_expert_gate",
121
+ "model.layers.1.linear_attn.conv1d",
122
+ "model.layers.1.linear_attn.in_proj_ba",
123
+ "model.layers.1.linear_attn.in_proj_qkvz",
124
+ "model.layers.1.mlp.gate",
125
+ "model.layers.1.mlp.shared_expert_gate",
126
+ "model.layers.10.linear_attn.conv1d",
127
+ "model.layers.10.linear_attn.in_proj_ba",
128
+ "model.layers.10.linear_attn.in_proj_qkvz",
129
+ "model.layers.10.mlp.gate",
130
+ "model.layers.10.mlp.shared_expert_gate",
131
+ "model.layers.11.mlp.gate",
132
+ "model.layers.11.mlp.shared_expert_gate",
133
+ "model.layers.11.self_attn.k_proj",
134
+ "model.layers.11.self_attn.q_proj",
135
+ "model.layers.11.self_attn.v_proj",
136
+ "model.layers.12.linear_attn.conv1d",
137
+ "model.layers.12.linear_attn.in_proj_ba",
138
+ "model.layers.12.linear_attn.in_proj_qkvz",
139
+ "model.layers.12.mlp.gate",
140
+ "model.layers.12.mlp.shared_expert_gate",
141
+ "model.layers.13.linear_attn.conv1d",
142
+ "model.layers.13.linear_attn.in_proj_ba",
143
+ "model.layers.13.linear_attn.in_proj_qkvz",
144
+ "model.layers.13.mlp.gate",
145
+ "model.layers.13.mlp.shared_expert_gate",
146
+ "model.layers.14.linear_attn.conv1d",
147
+ "model.layers.14.linear_attn.in_proj_ba",
148
+ "model.layers.14.linear_attn.in_proj_qkvz",
149
+ "model.layers.14.mlp.gate",
150
+ "model.layers.14.mlp.shared_expert_gate",
151
+ "model.layers.15.mlp.gate",
152
+ "model.layers.15.mlp.shared_expert_gate",
153
+ "model.layers.15.self_attn.k_proj",
154
+ "model.layers.15.self_attn.q_proj",
155
+ "model.layers.15.self_attn.v_proj",
156
+ "model.layers.16.linear_attn.conv1d",
157
+ "model.layers.16.linear_attn.in_proj_ba",
158
+ "model.layers.16.linear_attn.in_proj_qkvz",
159
+ "model.layers.16.mlp.gate",
160
+ "model.layers.16.mlp.shared_expert_gate",
161
+ "model.layers.17.linear_attn.conv1d",
162
+ "model.layers.17.linear_attn.in_proj_ba",
163
+ "model.layers.17.linear_attn.in_proj_qkvz",
164
+ "model.layers.17.mlp.gate",
165
+ "model.layers.17.mlp.shared_expert_gate",
166
+ "model.layers.18.linear_attn.conv1d",
167
+ "model.layers.18.linear_attn.in_proj_ba",
168
+ "model.layers.18.linear_attn.in_proj_qkvz",
169
+ "model.layers.18.mlp.gate",
170
+ "model.layers.18.mlp.shared_expert_gate",
171
+ "model.layers.19.mlp.gate",
172
+ "model.layers.19.mlp.shared_expert_gate",
173
+ "model.layers.19.self_attn.k_proj",
174
+ "model.layers.19.self_attn.q_proj",
175
+ "model.layers.19.self_attn.v_proj",
176
+ "model.layers.2.linear_attn.conv1d",
177
+ "model.layers.2.linear_attn.in_proj_ba",
178
+ "model.layers.2.linear_attn.in_proj_qkvz",
179
+ "model.layers.2.mlp.gate",
180
+ "model.layers.2.mlp.shared_expert_gate",
181
+ "model.layers.20.linear_attn.conv1d",
182
+ "model.layers.20.linear_attn.in_proj_ba",
183
+ "model.layers.20.linear_attn.in_proj_qkvz",
184
+ "model.layers.20.mlp.gate",
185
+ "model.layers.20.mlp.shared_expert_gate",
186
+ "model.layers.21.linear_attn.conv1d",
187
+ "model.layers.21.linear_attn.in_proj_ba",
188
+ "model.layers.21.linear_attn.in_proj_qkvz",
189
+ "model.layers.21.mlp.gate",
190
+ "model.layers.21.mlp.shared_expert_gate",
191
+ "model.layers.22.linear_attn.conv1d",
192
+ "model.layers.22.linear_attn.in_proj_ba",
193
+ "model.layers.22.linear_attn.in_proj_qkvz",
194
+ "model.layers.22.mlp.gate",
195
+ "model.layers.22.mlp.shared_expert_gate",
196
+ "model.layers.23.mlp.gate",
197
+ "model.layers.23.mlp.shared_expert_gate",
198
+ "model.layers.23.self_attn.k_proj",
199
+ "model.layers.23.self_attn.q_proj",
200
+ "model.layers.23.self_attn.v_proj",
201
+ "model.layers.24.linear_attn.conv1d",
202
+ "model.layers.24.linear_attn.in_proj_ba",
203
+ "model.layers.24.linear_attn.in_proj_qkvz",
204
+ "model.layers.24.mlp.gate",
205
+ "model.layers.24.mlp.shared_expert_gate",
206
+ "model.layers.25.linear_attn.conv1d",
207
+ "model.layers.25.linear_attn.in_proj_ba",
208
+ "model.layers.25.linear_attn.in_proj_qkvz",
209
+ "model.layers.25.mlp.gate",
210
+ "model.layers.25.mlp.shared_expert_gate",
211
+ "model.layers.26.linear_attn.conv1d",
212
+ "model.layers.26.linear_attn.in_proj_ba",
213
+ "model.layers.26.linear_attn.in_proj_qkvz",
214
+ "model.layers.26.mlp.gate",
215
+ "model.layers.26.mlp.shared_expert_gate",
216
+ "model.layers.27.mlp.gate",
217
+ "model.layers.27.mlp.shared_expert_gate",
218
+ "model.layers.27.self_attn.k_proj",
219
+ "model.layers.27.self_attn.q_proj",
220
+ "model.layers.27.self_attn.v_proj",
221
+ "model.layers.28.linear_attn.conv1d",
222
+ "model.layers.28.linear_attn.in_proj_ba",
223
+ "model.layers.28.linear_attn.in_proj_qkvz",
224
+ "model.layers.28.mlp.gate",
225
+ "model.layers.28.mlp.shared_expert_gate",
226
+ "model.layers.29.linear_attn.conv1d",
227
+ "model.layers.29.linear_attn.in_proj_ba",
228
+ "model.layers.29.linear_attn.in_proj_qkvz",
229
+ "model.layers.29.mlp.gate",
230
+ "model.layers.29.mlp.shared_expert_gate",
231
+ "model.layers.3.mlp.gate",
232
+ "model.layers.3.mlp.shared_expert_gate",
233
+ "model.layers.3.self_attn.k_proj",
234
+ "model.layers.3.self_attn.q_proj",
235
+ "model.layers.3.self_attn.v_proj",
236
+ "model.layers.30.linear_attn.conv1d",
237
+ "model.layers.30.linear_attn.in_proj_ba",
238
+ "model.layers.30.linear_attn.in_proj_qkvz",
239
+ "model.layers.30.mlp.gate",
240
+ "model.layers.30.mlp.shared_expert_gate",
241
+ "model.layers.31.mlp.gate",
242
+ "model.layers.31.mlp.shared_expert_gate",
243
+ "model.layers.31.self_attn.k_proj",
244
+ "model.layers.31.self_attn.q_proj",
245
+ "model.layers.31.self_attn.v_proj",
246
+ "model.layers.32.linear_attn.conv1d",
247
+ "model.layers.32.linear_attn.in_proj_ba",
248
+ "model.layers.32.linear_attn.in_proj_qkvz",
249
+ "model.layers.32.mlp.gate",
250
+ "model.layers.32.mlp.shared_expert_gate",
251
+ "model.layers.33.linear_attn.conv1d",
252
+ "model.layers.33.linear_attn.in_proj_ba",
253
+ "model.layers.33.linear_attn.in_proj_qkvz",
254
+ "model.layers.33.mlp.gate",
255
+ "model.layers.33.mlp.shared_expert_gate",
256
+ "model.layers.34.linear_attn.conv1d",
257
+ "model.layers.34.linear_attn.in_proj_ba",
258
+ "model.layers.34.linear_attn.in_proj_qkvz",
259
+ "model.layers.34.mlp.gate",
260
+ "model.layers.34.mlp.shared_expert_gate",
261
+ "model.layers.35.mlp.gate",
262
+ "model.layers.35.mlp.shared_expert_gate",
263
+ "model.layers.35.self_attn.k_proj",
264
+ "model.layers.35.self_attn.q_proj",
265
+ "model.layers.35.self_attn.v_proj",
266
+ "model.layers.36.linear_attn.conv1d",
267
+ "model.layers.36.linear_attn.in_proj_ba",
268
+ "model.layers.36.linear_attn.in_proj_qkvz",
269
+ "model.layers.36.mlp.gate",
270
+ "model.layers.36.mlp.shared_expert_gate",
271
+ "model.layers.37.linear_attn.conv1d",
272
+ "model.layers.37.linear_attn.in_proj_ba",
273
+ "model.layers.37.linear_attn.in_proj_qkvz",
274
+ "model.layers.37.mlp.gate",
275
+ "model.layers.37.mlp.shared_expert_gate",
276
+ "model.layers.38.linear_attn.conv1d",
277
+ "model.layers.38.linear_attn.in_proj_ba",
278
+ "model.layers.38.linear_attn.in_proj_qkvz",
279
+ "model.layers.38.mlp.gate",
280
+ "model.layers.38.mlp.shared_expert_gate",
281
+ "model.layers.39.mlp.gate",
282
+ "model.layers.39.mlp.shared_expert_gate",
283
+ "model.layers.39.self_attn.k_proj",
284
+ "model.layers.39.self_attn.q_proj",
285
+ "model.layers.39.self_attn.v_proj",
286
+ "model.layers.4.linear_attn.conv1d",
287
+ "model.layers.4.linear_attn.in_proj_ba",
288
+ "model.layers.4.linear_attn.in_proj_qkvz",
289
+ "model.layers.4.mlp.gate",
290
+ "model.layers.4.mlp.shared_expert_gate",
291
+ "model.layers.40.linear_attn.conv1d",
292
+ "model.layers.40.linear_attn.in_proj_ba",
293
+ "model.layers.40.linear_attn.in_proj_qkvz",
294
+ "model.layers.40.mlp.gate",
295
+ "model.layers.40.mlp.shared_expert_gate",
296
+ "model.layers.41.linear_attn.conv1d",
297
+ "model.layers.41.linear_attn.in_proj_ba",
298
+ "model.layers.41.linear_attn.in_proj_qkvz",
299
+ "model.layers.41.mlp.gate",
300
+ "model.layers.41.mlp.shared_expert_gate",
301
+ "model.layers.42.linear_attn.conv1d",
302
+ "model.layers.42.linear_attn.in_proj_ba",
303
+ "model.layers.42.linear_attn.in_proj_qkvz",
304
+ "model.layers.42.mlp.gate",
305
+ "model.layers.42.mlp.shared_expert_gate",
306
+ "model.layers.43.mlp.gate",
307
+ "model.layers.43.mlp.shared_expert_gate",
308
+ "model.layers.43.self_attn.k_proj",
309
+ "model.layers.43.self_attn.q_proj",
310
+ "model.layers.43.self_attn.v_proj",
311
+ "model.layers.44.linear_attn.conv1d",
312
+ "model.layers.44.linear_attn.in_proj_ba",
313
+ "model.layers.44.linear_attn.in_proj_qkvz",
314
+ "model.layers.44.mlp.gate",
315
+ "model.layers.44.mlp.shared_expert_gate",
316
+ "model.layers.45.linear_attn.conv1d",
317
+ "model.layers.45.linear_attn.in_proj_ba",
318
+ "model.layers.45.linear_attn.in_proj_qkvz",
319
+ "model.layers.45.mlp.gate",
320
+ "model.layers.45.mlp.shared_expert_gate",
321
+ "model.layers.46.linear_attn.conv1d",
322
+ "model.layers.46.linear_attn.in_proj_ba",
323
+ "model.layers.46.linear_attn.in_proj_qkvz",
324
+ "model.layers.46.mlp.gate",
325
+ "model.layers.46.mlp.shared_expert_gate",
326
+ "model.layers.47.mlp.gate",
327
+ "model.layers.47.mlp.shared_expert_gate",
328
+ "model.layers.47.self_attn.k_proj",
329
+ "model.layers.47.self_attn.q_proj",
330
+ "model.layers.47.self_attn.v_proj",
331
+ "model.layers.5.linear_attn.conv1d",
332
+ "model.layers.5.linear_attn.in_proj_ba",
333
+ "model.layers.5.linear_attn.in_proj_qkvz",
334
+ "model.layers.5.mlp.gate",
335
+ "model.layers.5.mlp.shared_expert_gate",
336
+ "model.layers.6.linear_attn.conv1d",
337
+ "model.layers.6.linear_attn.in_proj_ba",
338
+ "model.layers.6.linear_attn.in_proj_qkvz",
339
+ "model.layers.6.mlp.gate",
340
+ "model.layers.6.mlp.shared_expert_gate",
341
+ "model.layers.7.mlp.gate",
342
+ "model.layers.7.mlp.shared_expert_gate",
343
+ "model.layers.7.self_attn.k_proj",
344
+ "model.layers.7.self_attn.q_proj",
345
+ "model.layers.7.self_attn.v_proj",
346
+ "model.layers.8.linear_attn.conv1d",
347
+ "model.layers.8.linear_attn.in_proj_ba",
348
+ "model.layers.8.linear_attn.in_proj_qkvz",
349
+ "model.layers.8.mlp.gate",
350
+ "model.layers.8.mlp.shared_expert_gate",
351
+ "model.layers.9.linear_attn.conv1d",
352
+ "model.layers.9.linear_attn.in_proj_ba",
353
+ "model.layers.9.linear_attn.in_proj_qkvz",
354
+ "model.layers.9.mlp.gate",
355
+ "model.layers.9.mlp.shared_expert_gate",
356
+ "mtp.layers.0*"
357
+ ],
358
+ "quant_algo": "NVFP4",
359
+ "kv_cache_scheme": {
360
+ "dynamic": false,
361
+ "num_bits": 8,
362
+ "type": "float"
363
+ },
364
+ "producer": {
365
+ "name": "modelopt",
366
+ "version": "0.0.1.dev445+gae4ae22f9.d20260209"
367
+ },
368
+ "quant_method": "modelopt"
369
+ }
370
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645,
6
+ 151643
7
+ ],
8
+ "pad_token_id": 151643,
9
+ "temperature": 0.6,
10
+ "top_k": 20,
11
+ "top_p": 0.95,
12
+ "transformers_version": "4.57.0.dev0"
13
+ }
hf_quant_config.json ADDED
@@ -0,0 +1,255 @@
1
+ {
2
+ "producer": {
3
+ "name": "modelopt",
4
+ "version": "0.0.1.dev445+gae4ae22f9.d20260209"
5
+ },
6
+ "quantization": {
7
+ "quant_algo": "NVFP4",
8
+ "kv_cache_quant_algo": "FP8",
9
+ "group_size": 16,
10
+ "exclude_modules": [
11
+ "lm_head",
12
+ "model.layers.0.linear_attn.conv1d",
13
+ "model.layers.0.linear_attn.in_proj_ba",
14
+ "model.layers.0.linear_attn.in_proj_qkvz",
15
+ "model.layers.0.mlp.gate",
16
+ "model.layers.0.mlp.shared_expert_gate",
17
+ "model.layers.1.linear_attn.conv1d",
18
+ "model.layers.1.linear_attn.in_proj_ba",
19
+ "model.layers.1.linear_attn.in_proj_qkvz",
20
+ "model.layers.1.mlp.gate",
21
+ "model.layers.1.mlp.shared_expert_gate",
22
+ "model.layers.10.linear_attn.conv1d",
23
+ "model.layers.10.linear_attn.in_proj_ba",
24
+ "model.layers.10.linear_attn.in_proj_qkvz",
25
+ "model.layers.10.mlp.gate",
26
+ "model.layers.10.mlp.shared_expert_gate",
27
+ "model.layers.11.mlp.gate",
28
+ "model.layers.11.mlp.shared_expert_gate",
29
+ "model.layers.11.self_attn.k_proj",
30
+ "model.layers.11.self_attn.q_proj",
31
+ "model.layers.11.self_attn.v_proj",
32
+ "model.layers.12.linear_attn.conv1d",
33
+ "model.layers.12.linear_attn.in_proj_ba",
34
+ "model.layers.12.linear_attn.in_proj_qkvz",
35
+ "model.layers.12.mlp.gate",
36
+ "model.layers.12.mlp.shared_expert_gate",
37
+ "model.layers.13.linear_attn.conv1d",
38
+ "model.layers.13.linear_attn.in_proj_ba",
39
+ "model.layers.13.linear_attn.in_proj_qkvz",
40
+ "model.layers.13.mlp.gate",
41
+ "model.layers.13.mlp.shared_expert_gate",
42
+ "model.layers.14.linear_attn.conv1d",
43
+ "model.layers.14.linear_attn.in_proj_ba",
44
+ "model.layers.14.linear_attn.in_proj_qkvz",
45
+ "model.layers.14.mlp.gate",
46
+ "model.layers.14.mlp.shared_expert_gate",
47
+ "model.layers.15.mlp.gate",
48
+ "model.layers.15.mlp.shared_expert_gate",
49
+ "model.layers.15.self_attn.k_proj",
50
+ "model.layers.15.self_attn.q_proj",
51
+ "model.layers.15.self_attn.v_proj",
52
+ "model.layers.16.linear_attn.conv1d",
53
+ "model.layers.16.linear_attn.in_proj_ba",
54
+ "model.layers.16.linear_attn.in_proj_qkvz",
55
+ "model.layers.16.mlp.gate",
56
+ "model.layers.16.mlp.shared_expert_gate",
57
+ "model.layers.17.linear_attn.conv1d",
58
+ "model.layers.17.linear_attn.in_proj_ba",
59
+ "model.layers.17.linear_attn.in_proj_qkvz",
60
+ "model.layers.17.mlp.gate",
61
+ "model.layers.17.mlp.shared_expert_gate",
62
+ "model.layers.18.linear_attn.conv1d",
63
+ "model.layers.18.linear_attn.in_proj_ba",
64
+ "model.layers.18.linear_attn.in_proj_qkvz",
65
+ "model.layers.18.mlp.gate",
66
+ "model.layers.18.mlp.shared_expert_gate",
67
+ "model.layers.19.mlp.gate",
68
+ "model.layers.19.mlp.shared_expert_gate",
69
+ "model.layers.19.self_attn.k_proj",
70
+ "model.layers.19.self_attn.q_proj",
71
+ "model.layers.19.self_attn.v_proj",
72
+ "model.layers.2.linear_attn.conv1d",
73
+ "model.layers.2.linear_attn.in_proj_ba",
74
+ "model.layers.2.linear_attn.in_proj_qkvz",
75
+ "model.layers.2.mlp.gate",
76
+ "model.layers.2.mlp.shared_expert_gate",
77
+ "model.layers.20.linear_attn.conv1d",
78
+ "model.layers.20.linear_attn.in_proj_ba",
79
+ "model.layers.20.linear_attn.in_proj_qkvz",
80
+ "model.layers.20.mlp.gate",
81
+ "model.layers.20.mlp.shared_expert_gate",
82
+ "model.layers.21.linear_attn.conv1d",
83
+ "model.layers.21.linear_attn.in_proj_ba",
84
+ "model.layers.21.linear_attn.in_proj_qkvz",
85
+ "model.layers.21.mlp.gate",
86
+ "model.layers.21.mlp.shared_expert_gate",
87
+ "model.layers.22.linear_attn.conv1d",
88
+ "model.layers.22.linear_attn.in_proj_ba",
89
+ "model.layers.22.linear_attn.in_proj_qkvz",
90
+ "model.layers.22.mlp.gate",
91
+ "model.layers.22.mlp.shared_expert_gate",
92
+ "model.layers.23.mlp.gate",
93
+ "model.layers.23.mlp.shared_expert_gate",
94
+ "model.layers.23.self_attn.k_proj",
95
+ "model.layers.23.self_attn.q_proj",
96
+ "model.layers.23.self_attn.v_proj",
97
+ "model.layers.24.linear_attn.conv1d",
98
+ "model.layers.24.linear_attn.in_proj_ba",
99
+ "model.layers.24.linear_attn.in_proj_qkvz",
100
+ "model.layers.24.mlp.gate",
101
+ "model.layers.24.mlp.shared_expert_gate",
102
+ "model.layers.25.linear_attn.conv1d",
103
+ "model.layers.25.linear_attn.in_proj_ba",
104
+ "model.layers.25.linear_attn.in_proj_qkvz",
105
+ "model.layers.25.mlp.gate",
106
+ "model.layers.25.mlp.shared_expert_gate",
107
+ "model.layers.26.linear_attn.conv1d",
108
+ "model.layers.26.linear_attn.in_proj_ba",
109
+ "model.layers.26.linear_attn.in_proj_qkvz",
110
+ "model.layers.26.mlp.gate",
111
+ "model.layers.26.mlp.shared_expert_gate",
112
+ "model.layers.27.mlp.gate",
113
+ "model.layers.27.mlp.shared_expert_gate",
114
+ "model.layers.27.self_attn.k_proj",
115
+ "model.layers.27.self_attn.q_proj",
116
+ "model.layers.27.self_attn.v_proj",
117
+ "model.layers.28.linear_attn.conv1d",
118
+ "model.layers.28.linear_attn.in_proj_ba",
119
+ "model.layers.28.linear_attn.in_proj_qkvz",
120
+ "model.layers.28.mlp.gate",
121
+ "model.layers.28.mlp.shared_expert_gate",
122
+ "model.layers.29.linear_attn.conv1d",
123
+ "model.layers.29.linear_attn.in_proj_ba",
124
+ "model.layers.29.linear_attn.in_proj_qkvz",
125
+ "model.layers.29.mlp.gate",
126
+ "model.layers.29.mlp.shared_expert_gate",
127
+ "model.layers.3.mlp.gate",
128
+ "model.layers.3.mlp.shared_expert_gate",
129
+ "model.layers.3.self_attn.k_proj",
130
+ "model.layers.3.self_attn.q_proj",
131
+ "model.layers.3.self_attn.v_proj",
132
+ "model.layers.30.linear_attn.conv1d",
133
+ "model.layers.30.linear_attn.in_proj_ba",
134
+ "model.layers.30.linear_attn.in_proj_qkvz",
135
+ "model.layers.30.mlp.gate",
136
+ "model.layers.30.mlp.shared_expert_gate",
137
+ "model.layers.31.mlp.gate",
138
+ "model.layers.31.mlp.shared_expert_gate",
139
+ "model.layers.31.self_attn.k_proj",
140
+ "model.layers.31.self_attn.q_proj",
141
+ "model.layers.31.self_attn.v_proj",
142
+ "model.layers.32.linear_attn.conv1d",
143
+ "model.layers.32.linear_attn.in_proj_ba",
144
+ "model.layers.32.linear_attn.in_proj_qkvz",
145
+ "model.layers.32.mlp.gate",
146
+ "model.layers.32.mlp.shared_expert_gate",
147
+ "model.layers.33.linear_attn.conv1d",
148
+ "model.layers.33.linear_attn.in_proj_ba",
149
+ "model.layers.33.linear_attn.in_proj_qkvz",
150
+ "model.layers.33.mlp.gate",
151
+ "model.layers.33.mlp.shared_expert_gate",
152
+ "model.layers.34.linear_attn.conv1d",
153
+ "model.layers.34.linear_attn.in_proj_ba",
154
+ "model.layers.34.linear_attn.in_proj_qkvz",
155
+ "model.layers.34.mlp.gate",
156
+ "model.layers.34.mlp.shared_expert_gate",
157
+ "model.layers.35.mlp.gate",
158
+ "model.layers.35.mlp.shared_expert_gate",
159
+ "model.layers.35.self_attn.k_proj",
160
+ "model.layers.35.self_attn.q_proj",
161
+ "model.layers.35.self_attn.v_proj",
162
+ "model.layers.36.linear_attn.conv1d",
163
+ "model.layers.36.linear_attn.in_proj_ba",
164
+ "model.layers.36.linear_attn.in_proj_qkvz",
165
+ "model.layers.36.mlp.gate",
166
+ "model.layers.36.mlp.shared_expert_gate",
167
+ "model.layers.37.linear_attn.conv1d",
168
+ "model.layers.37.linear_attn.in_proj_ba",
169
+ "model.layers.37.linear_attn.in_proj_qkvz",
170
+ "model.layers.37.mlp.gate",
171
+ "model.layers.37.mlp.shared_expert_gate",
172
+ "model.layers.38.linear_attn.conv1d",
173
+ "model.layers.38.linear_attn.in_proj_ba",
174
+ "model.layers.38.linear_attn.in_proj_qkvz",
175
+ "model.layers.38.mlp.gate",
176
+ "model.layers.38.mlp.shared_expert_gate",
177
+ "model.layers.39.mlp.gate",
178
+ "model.layers.39.mlp.shared_expert_gate",
179
+ "model.layers.39.self_attn.k_proj",
180
+ "model.layers.39.self_attn.q_proj",
181
+ "model.layers.39.self_attn.v_proj",
182
+ "model.layers.4.linear_attn.conv1d",
183
+ "model.layers.4.linear_attn.in_proj_ba",
184
+ "model.layers.4.linear_attn.in_proj_qkvz",
185
+ "model.layers.4.mlp.gate",
186
+ "model.layers.4.mlp.shared_expert_gate",
187
+ "model.layers.40.linear_attn.conv1d",
188
+ "model.layers.40.linear_attn.in_proj_ba",
189
+ "model.layers.40.linear_attn.in_proj_qkvz",
190
+ "model.layers.40.mlp.gate",
191
+ "model.layers.40.mlp.shared_expert_gate",
192
+ "model.layers.41.linear_attn.conv1d",
193
+ "model.layers.41.linear_attn.in_proj_ba",
194
+ "model.layers.41.linear_attn.in_proj_qkvz",
195
+ "model.layers.41.mlp.gate",
196
+ "model.layers.41.mlp.shared_expert_gate",
197
+ "model.layers.42.linear_attn.conv1d",
198
+ "model.layers.42.linear_attn.in_proj_ba",
199
+ "model.layers.42.linear_attn.in_proj_qkvz",
200
+ "model.layers.42.mlp.gate",
201
+ "model.layers.42.mlp.shared_expert_gate",
202
+ "model.layers.43.mlp.gate",
203
+ "model.layers.43.mlp.shared_expert_gate",
204
+ "model.layers.43.self_attn.k_proj",
205
+ "model.layers.43.self_attn.q_proj",
206
+ "model.layers.43.self_attn.v_proj",
207
+ "model.layers.44.linear_attn.conv1d",
208
+ "model.layers.44.linear_attn.in_proj_ba",
209
+ "model.layers.44.linear_attn.in_proj_qkvz",
210
+ "model.layers.44.mlp.gate",
211
+ "model.layers.44.mlp.shared_expert_gate",
212
+ "model.layers.45.linear_attn.conv1d",
213
+ "model.layers.45.linear_attn.in_proj_ba",
214
+ "model.layers.45.linear_attn.in_proj_qkvz",
215
+ "model.layers.45.mlp.gate",
216
+ "model.layers.45.mlp.shared_expert_gate",
217
+ "model.layers.46.linear_attn.conv1d",
218
+ "model.layers.46.linear_attn.in_proj_ba",
219
+ "model.layers.46.linear_attn.in_proj_qkvz",
220
+ "model.layers.46.mlp.gate",
221
+ "model.layers.46.mlp.shared_expert_gate",
222
+ "model.layers.47.mlp.gate",
223
+ "model.layers.47.mlp.shared_expert_gate",
224
+ "model.layers.47.self_attn.k_proj",
225
+ "model.layers.47.self_attn.q_proj",
226
+ "model.layers.47.self_attn.v_proj",
227
+ "model.layers.5.linear_attn.conv1d",
228
+ "model.layers.5.linear_attn.in_proj_ba",
229
+ "model.layers.5.linear_attn.in_proj_qkvz",
230
+ "model.layers.5.mlp.gate",
231
+ "model.layers.5.mlp.shared_expert_gate",
232
+ "model.layers.6.linear_attn.conv1d",
233
+ "model.layers.6.linear_attn.in_proj_ba",
234
+ "model.layers.6.linear_attn.in_proj_qkvz",
235
+ "model.layers.6.mlp.gate",
236
+ "model.layers.6.mlp.shared_expert_gate",
237
+ "model.layers.7.mlp.gate",
238
+ "model.layers.7.mlp.shared_expert_gate",
239
+ "model.layers.7.self_attn.k_proj",
240
+ "model.layers.7.self_attn.q_proj",
241
+ "model.layers.7.self_attn.v_proj",
242
+ "model.layers.8.linear_attn.conv1d",
243
+ "model.layers.8.linear_attn.in_proj_ba",
244
+ "model.layers.8.linear_attn.in_proj_qkvz",
245
+ "model.layers.8.mlp.gate",
246
+ "model.layers.8.mlp.shared_expert_gate",
247
+ "model.layers.9.linear_attn.conv1d",
248
+ "model.layers.9.linear_attn.in_proj_ba",
249
+ "model.layers.9.linear_attn.in_proj_qkvz",
250
+ "model.layers.9.mlp.gate",
251
+ "model.layers.9.mlp.shared_expert_gate",
252
+ "mtp.layers.0*"
253
+ ]
254
+ }
255
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:48076ded1e2667e580f65334135437fe365599e52a8d45f16f4844871e9c691b
3
+ size 5003036968
model-00002-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b9edf08dd268bddb1eac630fbdccdd0671cea9663d32bd78a00cffc98445f2c
3
+ size 5003483960
model-00003-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:034f364254657f125f93ea5a85fbfcc5b81105823068870c4374f1bc5eb4973a
3
+ size 5003514400
model-00004-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b60f7f85d7724585a8d1b48543e12cb04c9e7ded1e50f74b121b0450dce7031
3
+ size 5003755712
model-00005-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e81a4c7e05dc48ee2421053db5f3e0bccceb260293d1d41e4128b871b6a63819
3
+ size 5003581304
model-00006-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6166dbf630027173c0bab8bb63c70908e8350ee3513d6984aeb29b1e3dc9a5f0
3
+ size 5003516056
model-00007-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea1ddbacaae76c0949e5d99139f0d73345cf415c5e54366df34af3aac8649e36
3
+ size 5003593008
model-00008-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8ba17da6d76be3a695d8f6c5a060f0f63155f762fd6b2adddb27d03b06843b47
3
+ size 5003516072
model-00009-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27eb6a1822b655c831f0d3f6e6667f9a60b7239a77dc945ad4c88fb0488c81a6
3
+ size 5003744824
model-00010-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2b365175bccbcca8420a972ad87d5c230f9e29174aefd5e44483ce7c89283205
3
+ size 5000330496
model-00011-of-00011.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2b4a32aae8538581df2c16b4273e493ad02d926cd7dde207bab6e2892bf0637
3
+ size 725675520
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f9c2f67f083110def2b0d727e2b90b9eeb5179b7cc400a450880da4d04abb47d
3
+ size 28463294
special_tokens_map.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": "<|im_end|>"
25
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|im_start|>",
216
+ "<|im_end|>",
217
+ "<|object_ref_start|>",
218
+ "<|object_ref_end|>",
219
+ "<|box_start|>",
220
+ "<|box_end|>",
221
+ "<|quad_start|>",
222
+ "<|quad_end|>",
223
+ "<|vision_start|>",
224
+ "<|vision_end|>",
225
+ "<|vision_pad|>",
226
+ "<|image_pad|>",
227
+ "<|video_pad|>"
228
+ ],
229
+ "bos_token": null,
230
+ "clean_up_tokenization_spaces": false,
231
+ "eos_token": "<|im_end|>",
232
+ "errors": "replace",
233
+ "extra_special_tokens": {},
234
+ "model_max_length": 1010000,
235
+ "pad_token": "<|im_end|>",
236
+ "split_special_tokens": false,
237
+ "tokenizer_class": "Qwen2Tokenizer",
238
+ "unk_token": null
239
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff