Qwen3.5-27B-TQ3_4S

Clean base TQ3_4S GGUF release for Qwen3.5-27B.

TQ3_4S is a 3.5-bit Walsh-Hadamard-transform weight format that stores four scales per 32-weight block, one scale for each group of 8 weights.
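
As a sanity check on the bit budget, one plausible layout consistent with the stated 3.5 bpw is 3-bit quantized weights plus one 4-bit scale per 8-weight group (the exact field widths are an assumption; only the 3.5 bpw total and the four-scales-per-block structure are stated above):

```python
# Hypothetical TQ3_4S block layout: 32 weights per block,
# 3 bits per quantized weight, four 4-bit scales (one per 8 weights).
# The 3-bit/4-bit field widths are assumptions; only 3.5 bpw is given.
WEIGHTS_PER_BLOCK = 32
QUANT_BITS = 3
SCALES_PER_BLOCK = 4
SCALE_BITS = 4

block_bits = WEIGHTS_PER_BLOCK * QUANT_BITS + SCALES_PER_BLOCK * SCALE_BITS
bits_per_weight = block_bits / WEIGHTS_PER_BLOCK
print(bits_per_weight)  # 3.5
```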

Summary

  • Format: TQ3_4S
  • Model size: about 12.9 GiB
  • Target runtime: public TurboQuant-enabled llama.cpp
  • Intended use: local inference on consumer GPUs
  • Multimodal projector included: mmproj-BF16.gguf

Quality

Qwen3.5-27B, wiki.test.raw, c=2048:

Format        PPL                  Size
TQ3_4S        6.8224 +/- 0.04534   12.9 GiB
Q3_K_S        6.8630 +/- 0.04583   11.4 GiB
TQ3_1S        6.9807 +/- 0.04690   12.9 GiB
EXL3 3.0bpw   7.027580             ~13.0 GiB

Notes:

  • TQ3_4S, Q3_K_S, and TQ3_1S are all full-pass llama-perplexity results at c=2048.
  • EXL3 3.0bpw is from a local 145 x 2048 eval, not llama-perplexity.
  • This 27B result should not be read as evidence that plain TQ3_4S works equally well on smaller dense models.
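
To read the TQ3_4S vs Q3_K_S gap against the reported +/- values, a quick check (treating the two runs as independent, which is conservative, since both evaluate the same token stream and paired statistics would be tighter):

```python
import math

# PPL +/- stderr, taken from the table above
tq3_4s = (6.8224, 0.04534)
q3_k_s = (6.8630, 0.04583)

diff = q3_k_s[0] - tq3_4s[0]                      # ~0.041 in TQ3_4S's favor
combined_se = math.sqrt(tq3_4s[1]**2 + q3_k_s[1]**2)

# The gap is smaller than one combined standard error, so under this
# naive independent-runs treatment it is within noise.
print(diff < combined_se)  # True
```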

Runtime

This model requires the public TurboQuant runtime fork of llama.cpp.

Build and run:

git clone https://github.com/turbo-tan/llama.cpp-tq3.git
cd llama.cpp-tq3

cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

./build/bin/llama-server \
  -m /path/to/Qwen_Qwen3.5-27B-TQ3_4S.gguf \
  -ngl 99 \
  -fa on \
  -c 8192 \
  -ctk q8_0 -ctv q8_0 \
  --cache-ram 0 \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --port 8090
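
The -ctk q8_0 -ctv q8_0 flags quantize the KV cache. In ggml's q8_0 format each 32-value block stores 32 int8 values plus one f16 scale, so the per-value cost versus a default f16 cache works out as:

```python
# ggml block_q8_0: 32 int8 values (8 bits each) + one f16 scale (16 bits)
BLOCK_SIZE = 32
q8_0_bits = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE   # bits per cached value
f16_bits = 16.0

print(q8_0_bits)             # 8.5
print(f16_bits / q8_0_bits)  # ~1.88x smaller KV cache
```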

Vision / Image Input

Use the included projector:

./build/bin/llama-server \
  -m /path/to/Qwen_Qwen3.5-27B-TQ3_4S.gguf \
  --mmproj /path/to/mmproj-BF16.gguf \
  -ngl 99 -c 8192 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --cache-ram 0 --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --no-mmproj-offload

If your frontend reports that image input is unsupported, it is usually still pointing at a server instance that was started without --mmproj.
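
For reference, OpenAI-style frontends typically send images to the server as base64 data URLs inside the chat message content. A minimal payload-builder sketch (the function name, "local" model field, and image path are illustrative placeholders):

```python
import base64

def build_image_chat_payload(image_path: str, prompt: str) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload with one image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "local",  # llama-server accepts an arbitrary model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

POST the resulting JSON to the server's /v1/chat/completions endpoint.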

Notes

This upload is the clean base TQ3_4S release, not the private KLD-guided mixed-precision variants.

Credits

  • llama.cpp
  • Qwen3.5-27B
  • Walsh-Hadamard / transform-quantization line including RaBitQ, TurboQuant, and related work