# Qwen3-8B ToolACE W4A16

W4A16 quantized version of [kenkaneki/Qwen3-8B-ToolACE](https://huggingface.co/kenkaneki/Qwen3-8B-ToolACE), a ToolACE fine-tune of Qwen/Qwen3-8B.

Quantized with [llm-compressor](https://github.com/vllm-project/llm-compressor) using ToolACE calibration data (domain-matched rather than generic WikiText), so quantization error is minimized on the function-calling distribution the model actually serves.

## Key Numbers

| Metric | BF16 | W4A16 (this model) |
|---|---|---|
| BFCL simple_python accuracy | 96.25% | 96.50% |
| Model size | 16 GB | 5.7 GB |
| End-to-end latency, p50 (concurrency 1) | 323.9 ms | 222.3 ms |
| Output throughput, tok/s (concurrency 1) | 150.4 | 208.9 |
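
The latency rows can be sanity-checked with a single streamed request against a local endpoint. A minimal sketch, assuming the vLLM server from the Serving section below is running on `localhost:8000` (this is not the harness that produced the numbers above; the prompt and the one-token-per-chunk approximation are assumptions):

```python
# Rough single-request latency probe. NOT the benchmark harness behind
# the table; prompt and one-token-per-chunk counting are assumptions.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

start = time.perf_counter()
tokens = 0
stream = client.chat.completions.create(
    model="kenkaneki/Qwen3-8B-ToolACE-W4A16",
    messages=[{"role": "user", "content": "Name three sorting algorithms."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1  # vLLM streams roughly one token per chunk
elapsed = time.perf_counter() - start

print(f"E2E latency: {elapsed * 1000:.1f} ms, ~{tokens / elapsed:.1f} tok/s")
```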

## Serving

```bash
vllm serve kenkaneki/Qwen3-8B-ToolACE-W4A16 \
  --quantization compressed-tensors \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
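
Once the server is up, tool calling works through any OpenAI-compatible client. A minimal sketch; the `get_weather` tool and its schema are illustrative, not shipped with this repo:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not part of this repo
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kenkaneki/Qwen3-8B-ToolACE-W4A16",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```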

## Quantization

```bash
python scripts/quantize.py \
  --model kenkaneki/Qwen3-8B-ToolACE \
  --method w4a16 \
  --output ./w4a16
```
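
A sketch of what such a script plausibly does with llm-compressor's one-shot GPTQ flow, modeled on the library's published W4A16 examples. The calibration dataset id, its column layout, and the sample/sequence settings below are assumptions, not the script's actual contents:

```python
# Hypothetical reconstruction of scripts/quantize.py using the standard
# llm-compressor W4A16 GPTQ recipe. Dataset id/schema and calibration
# hyperparameters are ASSUMPTIONS, not the script's exact settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "kenkaneki/Qwen3-8B-ToolACE"
NUM_SAMPLES = 512   # ASSUMPTION
MAX_SEQ_LEN = 2048  # ASSUMPTION

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Domain-matched calibration data: tool-calling dialogues, not WikiText.
# The "Team-ACE/ToolACE" id and its "conversations" column are ASSUMPTIONS.
ds = load_dataset("Team-ACE/ToolACE", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))

def tokenize(sample):
    text = tokenizer.apply_chat_template(sample["conversations"], tokenize=False)
    return tokenizer(
        text, max_length=MAX_SEQ_LEN, truncation=True, add_special_tokens=False
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)

# W4A16: 4-bit weights, 16-bit activations; LM head left in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

model.save_pretrained("./w4a16", save_compressed=True)
tokenizer.save_pretrained("./w4a16")
```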

Code: [github.com/aimedvedevq/toolaceqwen](https://github.com/aimedvedevq/toolaceqwen)
