Quantized GGUF versions of Qwen/Qwen2.5-Coder-7B-Instruct, Alibaba's specialized code-generation model trained on 5.5 trillion tokens of code data. It achieves performance comparable to much larger general-purpose models on coding benchmarks, making it a go-to choice for local code assistance.
Available Files
| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Qwen2.5-Coder-7B-Instruct-Q8_0.gguf | Q8_0 | ~7.7GB | Maximum quality |
| Qwen2.5-Coder-7B-Instruct-Q6_K.gguf | Q6_K | ~6.0GB | Near-lossless |
| Qwen2.5-Coder-7B-Instruct-Q5_K_M.gguf | Q5_K_M | ~5.2GB | High quality |
| Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf | Q4_K_M | ~4.4GB | Recommended default |
| Qwen2.5-Coder-7B-Instruct-Q3_K_M.gguf | Q3_K_M | ~3.5GB | Low VRAM |
| Qwen2.5-Coder-7B-Instruct-IQ4_XS.gguf | IQ4_XS | ~3.9GB | Imatrix 4-bit |
| Qwen2.5-Coder-7B-Instruct-IQ3_XXS.gguf | IQ3_XXS | ~2.9GB | Imatrix 3-bit |
| Qwen2.5-Coder-7B-Instruct-IQ2_M.gguf | IQ2_M | ~2.5GB | Imatrix 2-bit |
| Qwen2.5-Coder-7B-Instruct-IQ1_S.gguf | IQ1_S | ~1.8GB | Extreme compression |
| Qwen2.5-Coder-7B-Instruct-fp16.gguf | FP16 | ~14.8GB | Full precision |
| imatrix.dat | – | – | Importance matrix |
Usage
```shell
./llama-cli -m Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf \
  --ctx-size 8192 -n 1024 \
  -p "<|im_start|>system\nYou are an expert programmer.<|im_end|>\n<|im_start|>user\nWrite a Python function to sort a list.<|im_end|>\n<|im_start|>assistant\n"
```
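For scripted use, the ChatML prompt passed to `-p` above can be assembled programmatically. A minimal sketch in Python (the helper name is illustrative, not part of llama.cpp; newer llama.cpp builds can also apply the model's embedded chat template for you in conversation mode):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as expected by Qwen2.5 instruct models.

    Illustrative helper: it reproduces the same token layout used in the
    llama-cli command above (system turn, user turn, then an open
    assistant turn for the model to complete).
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are an expert programmer.",
    "Write a Python function to sort a list.",
)
print(prompt)
```

Keeping the prompt construction in one place makes it easy to swap system instructions or feed the result to a llama.cpp server endpoint instead of the CLI.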
About Qwen2.5-Coder-7B
- Parameters: 7B
- Training: 5.5T tokens of code data
- Context: 128K tokens
- License: Apache 2.0
- Strengths: Code completion, debugging, code explanation, FIM for IDE integrations
Best-in-class local code assistance at the 7B scale.
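The FIM (fill-in-the-middle) capability mentioned above uses Qwen2.5-Coder's special FIM tokens rather than the chat template. A minimal sketch of building such a prompt (the helper name is illustrative; the token layout follows Qwen2.5-Coder's documented prefix/suffix/middle format):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build a fill-in-the-middle prompt using Qwen2.5-Coder's FIM tokens.

    The model is asked to generate the code that belongs between
    `prefix` and `suffix` -- the completion pattern IDE plugins use
    for inline suggestions.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    "def sort_list(items):\n    ",   # code before the cursor
    "\n    return result",           # code after the cursor
)
print(prompt)
```

The model's output after `<|fim_middle|>` is the text to splice in at the cursor position.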
Quantized by DuoNeural using llama.cpp on RTX 5090.
DuoNeural
DuoNeural is an open AI research lab: human + AI in collaboration.