kenkaneki
/

Qwen3-8B-ToolACE-W4A16

Text Generation

function-calling

compressed-tensors

Model card Files Files and versions

Qwen3-8B ToolACE W4A16

W4A16 quantized version of kenkaneki/Qwen3-8B-ToolACE.

Quantized with llm-compressor using ToolACE calibration data (domain-matched, not generic WikiText).

Key Numbers

Metric	BF16	W4A16 (this)
BFCL simple_python	96.25%	96.50%
Model size	16 GB	5.7 GB
E2EL p50 (c=1)	323.9 ms	222.3 ms
Output tok/s (c=1)	150.4	208.9

Serving

vllm serve kenkaneki/Qwen3-8B-ToolACE-W4A16 --quantization compressed-tensors --enable-auto-tool-choice --tool-call-parser hermes

Quantization

python scripts/quantize.py --model kenkaneki/Qwen3-8B-ToolACE --method w4a16 --output ./w4a16

Code: github.com/aimedvedevq/toolaceqwen

Downloads last month: 20

Safetensors

Model size

2B params

Tensor type

BF16

·

I64

·

I32

·

Model tree for kenkaneki/Qwen3-8B-ToolACE-W4A16

Base model

Qwen/Qwen3-8B-Base

Finetuned

Finetuned

kenkaneki/Qwen3-8B-ToolACE

Quantized

(1)

this model

Dataset used to train kenkaneki/Qwen3-8B-ToolACE-W4A16

Collection including kenkaneki/Qwen3-8B-ToolACE-W4A16

toolася

toolcalling sft+grpo+specdecoding • 3 items • Updated 29 days ago