Qwen3.5-35B-A3B APEX-TQ GGUF (Experimental)

WARNING: These GGUFs require TurboQuant Plus — a fork of llama.cpp with TQ4_1S support. They will NOT work with standard llama.cpp.

APEX-TQ (Adaptive Precision for EXpert Models + TurboQuant) experimental quantizations of Qwen3.5-35B-A3B.

Brought to you by the LocalAI team | APEX Project | Technical Report

What is APEX-TQ?

APEX-TQ combines the APEX layer-wise precision gradient with TurboQuant's TQ4_1S format, which is optimized for fast prompt processing. The result trades ~1% higher perplexity for roughly 3x faster prompt processing compared with standard K-quant formats.

This is experimental. TQ4_1S is not yet merged into mainline llama.cpp. You must build from the TurboQuant Plus fork.
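The TQ4_1S layout itself is only documented in the fork, but 4-bit block formats in this family generally store each small block of weights as 4-bit codes plus a per-block scale and minimum (the same idea as llama.cpp's Q4_1). A minimal sketch of that general scheme, not the actual TQ4_1S implementation:

```python
import numpy as np

def quantize_block_4bit(block):
    """Map a block of float weights to 4-bit codes [0, 15] with a
    per-block scale and minimum (Q4_1-style affine quantization)."""
    lo, hi = float(block.min()), float(block.max())
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    codes = np.clip(np.round((block - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_block_4bit(codes, scale, lo):
    """Reconstruct approximate weights from codes + scale + minimum."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)   # one 32-weight block
codes, scale, lo = quantize_block_4bit(w)
w_hat = dequantize_block_4bit(codes, scale, lo)
print(round(float(np.abs(w - w_hat).max()), 4))  # max error is at most scale/2
```

The rounding error per weight is bounded by half the block scale, which is why formats like this lose accuracy mainly on blocks containing outlier weights.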

Benchmark Results

| Configuration | Size | Perplexity | pp512 (t/s) | tg128 (t/s) |
|---|---|---|---|---|
| APEX-TQ Quality | ~21 GB | 6.614 | 5,572 | 68.4 |
| APEX-TQ Balanced | ~23 GB | 6.622 | 5,218 | 65.2 |
| APEX-TQ Compact | ~16 GB | 6.833 | 4,890 | 71.1 |
| APEX Quality (standard) | 21.3 GB | 6.527 | 1,861 | 68.4 |

pp512 throughput is roughly 3x that of standard APEX, thanks to TurboQuant's optimized GEMM kernels.
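The headline numbers can be checked directly against the table above (comparing APEX-TQ Quality with the standard APEX Quality baseline):

```python
# Sanity-check the quality/speed trade-off from the benchmark table.
ppl_tq, ppl_std = 6.614, 6.527   # perplexity: APEX-TQ Quality vs APEX Quality
pp_tq, pp_std = 5572, 1861       # pp512 tokens/s

ppl_delta = (ppl_tq - ppl_std) / ppl_std * 100
speedup = pp_tq / pp_std
print(f"perplexity cost: {ppl_delta:.1f}%  prompt speedup: {speedup:.1f}x")
# → perplexity cost: 1.3%  prompt speedup: 3.0x
```

So the trade is closer to ~1.3% perplexity for ~3x prompt-processing throughput; token generation (tg128) is essentially unchanged.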

Available Files

| File | Profile | Size | Best For |
|---|---|---|---|
| Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf | TQ Quality | ~21 GB | Best quality with TQ speed boost |
| Qwen3.5-35B-A3B-APEX-TQ-Balanced.gguf | TQ Balanced | ~23 GB | General purpose + fast prompts |
| Qwen3.5-35B-A3B-APEX-TQ-Compact.gguf | TQ Compact | ~16 GB | Smallest TQ variant |

How to Use

You must use the TurboQuant Plus fork of llama.cpp:

```sh
# Build the fork (CUDA backend shown; adjust flags for your hardware)
git clone https://github.com/nicebyte/llama.cpp -b turboquant-plus
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)

# Run
./build/bin/llama-cli -m Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf -ngl 99 -p "Hello"
```

For standard llama.cpp compatible APEX quants, use mudler/Qwen3.5-35B-A3B-APEX-GGUF instead.
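Whichever build you use, a quick way to verify that a download is at least a well-formed GGUF container is to check the file header: GGUF files start with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A small sketch (the synthetic header below stands in for a real model file):

```python
import os
import struct
import tempfile

def check_gguf(path):
    """Return (is_gguf, version) by reading the 8-byte GGUF header:
    4-byte magic b'GGUF' + little-endian uint32 version."""
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0]
    return magic == b"GGUF", version

# Demo on a synthetic header; point it at the downloaded .gguf in practice.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"GGUF" + struct.pack("<I", 3))
tmp.close()
ok, ver = check_gguf(tmp.name)
os.unlink(tmp.name)
print(ok, ver)  # → True 3
```

This only validates the container, not whether your llama.cpp build supports the TQ4_1S tensor type inside it; an unsupported type still fails at load time with an unknown-type error.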

Architecture

  • Model: Qwen3.5-35B-A3B
  • Layers: 40
  • Experts: 256 routed + 1 shared (8 active per token)
  • Total Parameters: ~35B
  • Active Parameters: ~3B per token
  • Quantization: APEX layer gradient + TQ4_1S format for expert weights
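As a sanity check on the file sizes above, you can back out the effective bits per weight each profile spends across the ~35B total parameters. This is a rough estimate: it ignores GGUF metadata and treats GB as 10^9 bytes.

```python
# Effective bits/weight implied by each profile's file size.
total_params = 35e9  # ~35B total parameters (all experts, not just active)

sizes_gb = {"Quality": 21, "Balanced": 23, "Compact": 16}
bits_per_weight = {
    name: gb * 1e9 * 8 / total_params for name, gb in sizes_gb.items()
}
for name, bits in bits_per_weight.items():
    print(f"{name}: ~{bits:.1f} bits/weight")
# → Quality: ~4.8, Balanced: ~5.3, Compact: ~3.7
```

The spread (~3.7 to ~5.3 bits/weight) reflects the APEX precision gradient: some layers keep higher precision while expert weights drop to the 4-bit TQ4_1S format.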

Credits

APEX is brought to you by the LocalAI team. TurboQuant by TheTom. Built on llama.cpp.
