Quantization was performed using ExLlamaV3 v0.0.25.

| Quant | Size (GB) | Actual bpw | PPL | KL-div (q→o) | KL-div (o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
|-------|-----------|------------|-----|--------------|--------------|-------|-------|-------|-------|-------|
| 3.0bpw | 142 | 3.00 | 3.220 | 0.0674 | 0.0776 | 91.9% | 68.4% | 44.5% | 26.6% | 14.8% |
| 3.5bpw_opt | 143 | 3.03 | 3.173 | 0.0474 | 0.0531 | 93.5% | 73.4% | 51.1% | 32.9% | 20.1% |
| 4.0bpw | 188 | 4.00 | 3.101 | 0.0203 | 0.0210 | 95.7% | 81.0% | 62.3% | 44.7% | 30.5% |
| 4.5bpw_opt | 189 | 4.03 | 3.082 | 0.0149 | 0.0153 | 96.3% | 83.9% | 67.2% | 50.7% | 36.6% |
| 5.0bpw | 234 | 5.00 | 3.067 | 0.0079 | 0.0079 | 97.3% | 87.6% | 73.9% | 59.0% | 45.3% |
| original | 751 | 16.00 | 3.053 | | | | | | | |

Metrics

  • PPL (Perplexity) — how well the model predicts the next token. Lower is better. The original model's PPL is the baseline.
  • KL-div (Kullback-Leibler divergence) — measures how the quant's probability distribution differs from the original. Lower is better. Shown in both directions (quant→orig, orig→quant); asymmetry indicates where the quant over/under-estimates probabilities.
  • Top-K agreement — probability that the quant's top-K predicted tokens match the original's top-K. Higher is better. Top-1 is the most important (does the quant pick the same best token?), higher K values show agreement across less likely candidates.
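The metrics above can be sketched from raw next-token distributions. Below is a minimal pure-Python illustration (the function names and toy distributions are mine, not from any evaluation harness; real measurements average these quantities over many token positions):

```python
import math

def kl_div(p, q):
    """KL(p || q): how much distribution q diverges from reference p.
    Asymmetric, so quant->orig and orig->quant generally differ."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def top_k_match(p, q, k):
    """True if both distributions rank the same k tokens highest."""
    rank = lambda d: sorted(range(len(d)), key=d.__getitem__, reverse=True)[:k]
    return set(rank(p)) == set(rank(q))

# Toy next-token distributions: original model vs. a slightly perturbed "quant".
orig  = [0.70, 0.20, 0.07, 0.03]
quant = [0.65, 0.24, 0.08, 0.03]

print(kl_div(quant, orig), kl_div(orig, quant))  # small, and slightly asymmetric
print(top_k_match(orig, quant, 1))               # same top token, so True
```

Averaging per-position KL values and top-K matches over an evaluation set yields table entries like those above.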

Optimized quants

Variants marked -opt are built using ExLlamaV3's layer-wise optimization. Instead of quantizing all layers to the same bitrate, the optimizer measures per-layer sensitivity and allocates bits where they matter most — critical layers get higher precision from a higher-bpw quant, while less sensitive layers stay at lower precision.
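The idea can be sketched as a greedy budgeted allocation. This is an illustration of the concept only, not ExLlamaV3's actual optimizer, and the per-layer sensitivity numbers are invented:

```python
def allocate_bits(sensitivity, target_bpw, low=3.0, high=5.0, step=0.5):
    """Spend a global bit budget on the most sensitive layers first."""
    n = len(sensitivity)
    bits = [low] * n
    budget = target_bpw * n - low * n  # extra bits beyond the floor
    # Raise the most sensitive layers toward `high` until the budget runs out.
    for i in sorted(range(n), key=sensitivity.__getitem__, reverse=True):
        while bits[i] < high and budget >= step:
            bits[i] += step
            budget -= step
    return bits

# Invented sensitivities; average precision lands at the requested target_bpw.
layer_bits = allocate_bits([0.9, 0.1, 0.4, 0.2], target_bpw=3.5)
```

The real optimizer measures quantization error per layer rather than consuming a fixed greedy order, but the budget trade-off is the same: extra bits go where they reduce error most.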

Tool Calls Support for Qwen/GLM Models

The official tabbyAPI doesn't support tool calls for Qwen and GLM models yet.

If you're using Pi Coding Agent, Qwen-Code, OpenClaw, or similar software that needs tool-call support, you can use the tools-support branch of my fork:

Clone directly:

git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI

Or add to existing tabbyAPI installation:

git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support

This branch includes native tool calling support for Qwen and GLM model families.
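With the branch installed, tool calls go through tabbyAPI's OpenAI-compatible chat completions endpoint using the standard OpenAI tools schema. The sketch below only builds the JSON payload; the model name, the get_weather tool, and the endpoint details in the comment are placeholders:

```python
import json

# Placeholder tool definition in the OpenAI function-calling schema.
payload = {
    "model": "Qwen3.5-397B-A17B-exl3",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

body = json.dumps(payload)
# POST `body` to your tabbyAPI instance's /v1/chat/completions endpoint;
# in the OpenAI schema, a tool call comes back in choices[0].message.tool_calls.
```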
