Qwen3.5-27B-TQ3_4S

Clean base TQ3_4S GGUF release for Qwen3.5-27B.

TQ3_4S is a 3.5-bit Walsh-Hadamard-transform weight format that stores four scales per 32-weight block, one scale for each group of 8 weights.
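
As a sanity check on the bit budget, one plausible layout consistent with the stated 3.5 bpw is 3-bit quantized weights plus one 4-bit scale per 8-weight group (the exact field widths are an assumption; only the 3.5 bpw total and the four-scales-per-block structure are stated above):

```python
# Hypothetical TQ3_4S block layout: 32 weights per block,
# 3 bits per quantized weight, four 4-bit scales (one per 8 weights).
# The 3-bit/4-bit field widths are assumptions; only 3.5 bpw is given.
WEIGHTS_PER_BLOCK = 32
QUANT_BITS = 3
SCALES_PER_BLOCK = 4
SCALE_BITS = 4

block_bits = WEIGHTS_PER_BLOCK * QUANT_BITS + SCALES_PER_BLOCK * SCALE_BITS
bits_per_weight = block_bits / WEIGHTS_PER_BLOCK
print(bits_per_weight)  # 3.5
```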

Summary

  • Format: TQ3_4S
  • Model size: about 12.9 GiB
  • Target runtime: public TurboQuant-enabled llama.cpp
  • Intended use: local inference on consumer GPUs
  • Multimodal projector included: mmproj-BF16.gguf

Quality

Qwen3.5-27B, wiki.test.raw, c=2048:

Format        PPL                  Size
TQ3_4S        6.8224 +/- 0.04534   12.9 GiB
Q3_K_S        6.8630 +/- 0.04583   11.4 GiB
TQ3_1S        6.9807 +/- 0.04690   12.9 GiB
EXL3 3.0bpw   7.027580             ~13.0 GiB

Notes:

  • TQ3_4S, Q3_K_S, and TQ3_1S are all full-pass llama-perplexity results at c=2048.
  • EXL3 3.0bpw is from a local 145 x 2048 eval, not llama-perplexity.
  • This 27B result should not be read as evidence that plain TQ3_4S works equally well on smaller dense models.
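
To read the TQ3_4S vs Q3_K_S gap against the reported +/- values, a quick check (treating the two runs as independent, which is conservative, since both evaluate the same token stream and paired statistics would be tighter):

```python
import math

# PPL +/- stderr, taken from the table above
tq3_4s = (6.8224, 0.04534)
q3_k_s = (6.8630, 0.04583)

diff = q3_k_s[0] - tq3_4s[0]                      # ~0.041 in TQ3_4S's favor
combined_se = math.sqrt(tq3_4s[1]**2 + q3_k_s[1]**2)

# The gap is smaller than one combined standard error, so under this
# naive independent-runs treatment it is within noise.
print(diff < combined_se)  # True
```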

Runtime

This model requires the public TurboQuant runtime fork of llama.cpp.

Build and run:

git clone https://github.com/turbo-tan/llama.cpp-tq3.git
cd llama.cpp-tq3

cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

./build/bin/llama-server \
  -m /path/to/Qwen_Qwen3.5-27B-TQ3_4S.gguf \
  -ngl 99 \
  -fa on \
  -c 8192 \
  -ctk q8_0 -ctv q8_0 \
  --cache-ram 0 \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --port 8090
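
The -ctk q8_0 -ctv q8_0 flags quantize the KV cache. In ggml's q8_0 format each 32-value block stores 32 int8 values plus one f16 scale, so the per-value cost versus a default f16 cache works out as:

```python
# ggml block_q8_0: 32 int8 values (8 bits each) + one f16 scale (16 bits)
BLOCK_SIZE = 32
q8_0_bits = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE   # bits per cached value
f16_bits = 16.0

print(q8_0_bits)             # 8.5
print(f16_bits / q8_0_bits)  # ~1.88x smaller KV cache
```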

Vision / Image Input

Use the included projector:

./build/bin/llama-server \
  -m /path/to/Qwen_Qwen3.5-27B-TQ3_4S.gguf \
  --mmproj /path/to/mmproj-BF16.gguf \
  -ngl 99 -c 8192 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --cache-ram 0 --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --no-mmproj-offload

If your frontend reports that image input is unsupported, it is usually still pointing at a server instance that was started without --mmproj.
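
For reference, OpenAI-style frontends typically send images to the server as base64 data URLs inside the chat message content. A minimal payload-builder sketch (the function name, "local" model field, and image path are illustrative placeholders):

```python
import base64

def build_image_chat_payload(image_path: str, prompt: str) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload with one image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "local",  # llama-server accepts an arbitrary model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

POST the resulting JSON to the server's /v1/chat/completions endpoint.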

Notes

This upload is the clean base TQ3_4S release, not the private KLD-guided mixed-precision variants.

Credits

  • llama.cpp
  • Qwen3.5-27B
  • Walsh-Hadamard / transform-quantization line including RaBitQ, TurboQuant, and related work