Quantization was performed using ExLlamaV3 v0.0.25.
| Quant | Size (GB) | Actual bpw | PPL | KL-div (q→o) | KL-div (o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3.0bpw | 142 | 3.00 | 3.220 | 0.0674 | 0.0776 | 91.9% | 68.4% | 44.5% | 26.6% | 14.8% |
| 3.5bpw_opt | 143 | 3.03 | 3.173 | 0.0474 | 0.0531 | 93.5% | 73.4% | 51.1% | 32.9% | 20.1% |
| 4.0bpw | 188 | 4.00 | 3.101 | 0.0203 | 0.0210 | 95.7% | 81.0% | 62.3% | 44.7% | 30.5% |
| 4.5bpw_opt | 189 | 4.03 | 3.082 | 0.0149 | 0.0153 | 96.3% | 83.9% | 67.2% | 50.7% | 36.6% |
| 5.0bpw | 234 | 5.00 | 3.067 | 0.0079 | 0.0079 | 97.3% | 87.6% | 73.9% | 59.0% | 45.3% |
| original | 751 | 16.00 | 3.053 | — | — | — | — | — | — | — |
## Metrics
- **PPL (perplexity)**: how well the model predicts the next token. Lower is better; the original model's PPL is the baseline.
- **KL-div (Kullback-Leibler divergence)**: how far the quant's next-token probability distribution is from the original's. Lower is better. Shown in both directions (quant→orig and orig→quant); asymmetry between the two indicates where the quant over- or under-estimates probabilities.
- **Top-K agreement**: the fraction of positions where the quant's top-K predicted tokens match the original's top-K. Higher is better. Top-1 matters most (does the quant pick the same best token?); higher K values show agreement across less likely candidates.
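To make the metrics concrete, here is a minimal sketch of how KL divergence and top-K agreement can be computed for a single token position, using toy distributions rather than real model logits (the actual evaluation averages these over a full test corpus):

```python
import math

def kl_div(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def top_k_agreement(p, q, k):
    """True if the top-k token indices of the two distributions match."""
    top = lambda d: set(sorted(range(len(d)), key=d.__getitem__, reverse=True)[:k])
    return top(p) == top(q)

# Toy next-token distributions over a 4-token vocabulary
orig  = [0.70, 0.20, 0.07, 0.03]
quant = [0.65, 0.25, 0.06, 0.04]

print(kl_div(quant, orig))              # KL-div (q->o)
print(kl_div(orig, quant))              # KL-div (o->q)
print(top_k_agreement(quant, orig, 1))  # both pick token 0 here
```

Note the two KL directions give slightly different values even on this toy pair, which is the asymmetry reported in the table.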
## Optimized quants
Variants marked `_opt` are built using ExLlamaV3's layer-wise optimization. Instead of quantizing all layers at the same bitrate, the optimizer measures per-layer sensitivity and allocates bits where they matter most: sensitive layers take higher-precision tensors from a higher-bpw quant, while less sensitive layers stay at lower precision.
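The idea can be illustrated with a toy greedy allocator. This is a hypothetical sketch, not ExLlamaV3's actual optimizer (which measures quantization error per layer directly); it only shows the principle of spending a fixed average-bpw budget on the most sensitive layers first:

```python
def allocate_bits(sensitivity, budget_bpw, low=3.0, high=4.0):
    """Toy layer-wise bit allocation (illustrative only).

    Start every layer at the low bitrate, then upgrade layers to the
    high bitrate in order of decreasing sensitivity, stopping when the
    next upgrade would push the average bpw past the budget.
    """
    n = len(sensitivity)
    bits = [low] * n
    for i in sorted(range(n), key=lambda i: -sensitivity[i]):
        if (sum(bits) + (high - low)) / n > budget_bpw:
            break
        bits[i] = high
    return bits

# Four layers; layers 2 and 1 are most sensitive, so they are upgraded
print(allocate_bits([0.1, 0.3, 0.9, 0.2], budget_bpw=3.5))
# -> [3.0, 4.0, 4.0, 3.0]
```

This also explains why an `_opt` variant's actual bpw can sit close to the lower bitrate while still recovering much of the higher quant's quality.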
## Tool Calls Support for Qwen/GLM Models
The official tabbyAPI doesn't support tool calls for Qwen and GLM models yet.
If you're using Pi Coding Agent, Qwen-Code, OpenClaw, or similar software that needs tool-call support, you can use the `tools-support` branch of my fork:
Clone directly:

```sh
git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI
```
Or add it to an existing tabbyAPI installation:

```sh
git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support
```
This branch includes native tool calling support for Qwen and GLM model families.
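Once the branch is running, tool calls go through tabbyAPI's OpenAI-compatible chat completions endpoint using the standard `tools` field. The sketch below only builds the request payload; the model name and the `get_weather` tool are illustrative placeholders, not part of this repo:

```python
import json

# OpenAI-style chat completion request with a tool definition.
# The model name and tool are hypothetical examples; substitute your
# own loaded model and tool schema.
payload = {
    "model": "Qwen3.5-397B-A17B-exl3",
    "messages": [
        {"role": "user", "content": "What's the weather in Kyiv?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Return current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
```

POST this JSON to the server's `/v1/chat/completions` route; with the `tools-support` branch, Qwen and GLM models can respond with structured tool calls instead of plain text.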
## Model tree for NeuroSenko/Qwen3.5-397B-A17B-exl3

Base model: Qwen/Qwen3.5-397B-A17B