Qwen3.5-35B-A3B APEX-TQ GGUF (Experimental)

WARNING: These GGUFs require TurboQuant Plus — a fork of llama.cpp with TQ4_1S support. They will NOT work with standard llama.cpp.

APEX-TQ (Adaptive Precision for EXpert Models + TurboQuant) experimental quantizations of Qwen3.5-35B-A3B.

Brought to you by the LocalAI team | APEX Project | Technical Report

What is APEX-TQ?

APEX-TQ combines the APEX layer-wise precision gradient with TurboQuant's TQ4_1S format, which is optimized for fast prompt processing. The result trades ~1% higher perplexity for roughly 3x faster prompt processing compared with standard K-quant formats.

This is experimental. TQ4_1S is not yet merged into mainline llama.cpp. You must build from the TurboQuant Plus fork.
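The TQ4_1S layout itself is only documented in the fork, but 4-bit block formats in this family generally store each small block of weights as 4-bit codes plus a per-block scale and minimum (the same idea as llama.cpp's Q4_1). A minimal sketch of that general scheme, not the actual TQ4_1S implementation:

```python
import numpy as np

def quantize_block_4bit(block):
    """Map a block of float weights to 4-bit codes [0, 15] with a
    per-block scale and minimum (Q4_1-style affine quantization)."""
    lo, hi = float(block.min()), float(block.max())
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    codes = np.clip(np.round((block - lo) / scale), 0, 15).astype(np.uint8)
    return codes, scale, lo

def dequantize_block_4bit(codes, scale, lo):
    """Reconstruct approximate weights from codes + scale + minimum."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)   # one 32-weight block
codes, scale, lo = quantize_block_4bit(w)
w_hat = dequantize_block_4bit(codes, scale, lo)
print(round(float(np.abs(w - w_hat).max()), 4))  # max error is at most scale/2
```

The rounding error per weight is bounded by half the block scale, which is why formats like this lose accuracy mainly on blocks containing outlier weights.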

Benchmark Results

| Configuration | Size | Perplexity | pp512 (t/s) | tg128 (t/s) |
|---|---|---|---|---|
| APEX-TQ Quality | ~21 GB | 6.614 | 5,572 | 68.4 |
| APEX-TQ Balanced | ~23 GB | 6.622 | 5,218 | 65.2 |
| APEX-TQ Compact | ~16 GB | 6.833 | 4,890 | 71.1 |
| APEX Quality (standard) | 21.3 GB | 6.527 | 1,861 | 68.4 |

pp512 throughput is roughly 3x that of standard APEX, thanks to TurboQuant's optimized GEMM kernels.
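The headline numbers can be checked directly against the table above (comparing APEX-TQ Quality with the standard APEX Quality baseline):

```python
# Sanity-check the quality/speed trade-off from the benchmark table.
ppl_tq, ppl_std = 6.614, 6.527   # perplexity: APEX-TQ Quality vs APEX Quality
pp_tq, pp_std = 5572, 1861       # pp512 tokens/s

ppl_delta = (ppl_tq - ppl_std) / ppl_std * 100
speedup = pp_tq / pp_std
print(f"perplexity cost: {ppl_delta:.1f}%  prompt speedup: {speedup:.1f}x")
# → perplexity cost: 1.3%  prompt speedup: 3.0x
```

So the trade is closer to ~1.3% perplexity for ~3x prompt-processing throughput; token generation (tg128) is essentially unchanged.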

Available Files

| File | Profile | Size | Best For |
|---|---|---|---|
| Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf | TQ Quality | ~21 GB | Best quality with TQ speed boost |
| Qwen3.5-35B-A3B-APEX-TQ-Balanced.gguf | TQ Balanced | ~23 GB | General purpose + fast prompts |
| Qwen3.5-35B-A3B-APEX-TQ-Compact.gguf | TQ Compact | ~16 GB | Smallest TQ variant |

How to Use

You must use the TurboQuant Plus fork of llama.cpp:

```sh
# Build the fork (CUDA backend shown; adjust flags for your hardware)
git clone https://github.com/nicebyte/llama.cpp -b turboquant-plus
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)

# Run
./build/bin/llama-cli -m Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf -ngl 99 -p "Hello"
```

For standard llama.cpp compatible APEX quants, use mudler/Qwen3.5-35B-A3B-APEX-GGUF instead.
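Whichever build you use, a quick way to verify that a download is at least a well-formed GGUF container is to check the file header: GGUF files start with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A small sketch (the synthetic header below stands in for a real model file):

```python
import os
import struct
import tempfile

def check_gguf(path):
    """Return (is_gguf, version) by reading the 8-byte GGUF header:
    4-byte magic b'GGUF' + little-endian uint32 version."""
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0]
    return magic == b"GGUF", version

# Demo on a synthetic header; point it at the downloaded .gguf in practice.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"GGUF" + struct.pack("<I", 3))
tmp.close()
ok, ver = check_gguf(tmp.name)
os.unlink(tmp.name)
print(ok, ver)  # → True 3
```

This only validates the container, not whether your llama.cpp build supports the TQ4_1S tensor type inside it; an unsupported type still fails at load time with an unknown-type error.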

Architecture

  • Model: Qwen3.5-35B-A3B
  • Layers: 40
  • Experts: 256 routed + 1 shared (8 active per token)
  • Total Parameters: ~35B
  • Active Parameters: ~3B per token
  • Quantization: APEX layer gradient + TQ4_1S format for expert weights
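As a sanity check on the file sizes above, you can back out the effective bits per weight each profile spends across the ~35B total parameters. This is a rough estimate: it ignores GGUF metadata and treats GB as 10^9 bytes.

```python
# Effective bits/weight implied by each profile's file size.
total_params = 35e9  # ~35B total parameters (all experts, not just active)

sizes_gb = {"Quality": 21, "Balanced": 23, "Compact": 16}
bits_per_weight = {
    name: gb * 1e9 * 8 / total_params for name, gb in sizes_gb.items()
}
for name, bits in bits_per_weight.items():
    print(f"{name}: ~{bits:.1f} bits/weight")
# → Quality: ~4.8, Balanced: ~5.3, Compact: ~3.7
```

The spread (~3.7 to ~5.3 bits/weight) reflects the APEX precision gradient: some layers keep higher precision while expert weights drop to the 4-bit TQ4_1S format.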

Credits

APEX is brought to you by the LocalAI team. TurboQuant by TheTom. Built on llama.cpp.
