# Qwen3.5-35B-A3B APEX-TQ GGUF (Experimental)
> **WARNING:** These GGUFs require TurboQuant Plus, a fork of llama.cpp with TQ4_1S support. They will NOT work with standard llama.cpp.
APEX-TQ (Adaptive Precision for EXpert Models + TurboQuant) experimental quantizations of Qwen3.5-35B-A3B.
Brought to you by the LocalAI team | APEX Project | Technical Report
## What is APEX-TQ?
APEX-TQ combines the APEX layer-wise precision gradient with TurboQuant's TQ4_1S format, which is optimized for fast prompt processing. The result trades roughly 1% higher perplexity for about 3x faster prompt processing compared to standard K-quant formats.
This is experimental. TQ4_1S is not yet merged into mainline llama.cpp. You must build from the TurboQuant Plus fork.
## Benchmark Results
| Configuration | Size (GB) | Perplexity | pp512 (t/s) | tg128 (t/s) |
|---|---|---|---|---|
| APEX-TQ Quality | ~21 GB | 6.614 | 5,572 | 68.4 |
| APEX-TQ Balanced | ~23 GB | 6.622 | 5,218 | 65.2 |
| APEX-TQ Compact | ~16 GB | 6.833 | 4,890 | 71.1 |
| APEX Quality (standard) | 21.3 GB | 6.527 | 1,861 | 68.4 |
pp512 speeds are 3x faster than standard APEX due to TurboQuant's optimized GEMM kernels.
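The ~3x figure can be checked directly against the benchmark table above (throughput numbers copied from the table; the headline number rounds the ratio of the two Quality variants):

```python
# pp512 prompt-processing throughput (tokens/s), from the benchmark table
apex_tq_quality_pp512 = 5572   # APEX-TQ Quality
apex_standard_pp512 = 1861     # APEX Quality (standard llama.cpp K-quants)

speedup = apex_tq_quality_pp512 / apex_standard_pp512
print(f"pp512 speedup: {speedup:.2f}x")  # ~2.99x, rounded to 3x in the text
```

Note that tg128 (token generation) is essentially unchanged (68.4 t/s for both Quality variants); the TurboQuant GEMM kernels help the compute-bound prompt-processing phase, not the memory-bound generation phase.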
## Available Files
| File | Profile | Size | Best For |
|---|---|---|---|
| Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf | TQ Quality | ~21 GB | Best quality with TQ speed boost |
| Qwen3.5-35B-A3B-APEX-TQ-Balanced.gguf | TQ Balanced | ~23 GB | General purpose + fast prompts |
| Qwen3.5-35B-A3B-APEX-TQ-Compact.gguf | TQ Compact | ~16 GB | Smallest TQ variant |
## How to Use
You must use the TurboQuant Plus fork of llama.cpp:
```shell
# Build the TurboQuant Plus fork with CUDA support
git clone https://github.com/nicebyte/llama.cpp -b turboquant-plus
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)

# Run
./build/bin/llama-cli -m Qwen3.5-35B-A3B-APEX-TQ-Quality.gguf -ngl 99 -p "Hello"
```
For standard llama.cpp compatible APEX quants, use mudler/Qwen3.5-35B-A3B-APEX-GGUF instead.
## Architecture
- Model: Qwen3.5-35B-A3B
- Layers: 40
- Experts: 256 routed + 1 shared (8 active per token)
- Total Parameters: ~35B
- Active Parameters: ~3B per token
- Quantization: APEX layer gradient + TQ4_1S format for expert weights
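The relationship between the ~35B total and ~3B active figures follows from the routing above: only 8 of 256 routed experts fire per token, while the shared expert and the attention/embedding weights are always active. A rough illustration (the exact per-component parameter breakdown is not published here, so the numbers below are only the routing arithmetic, not a real parameter count):

```python
# Illustrative routing arithmetic only -- not an exact parameter count.
routed_experts = 256
active_experts = 8  # experts selected per token

active_fraction = active_experts / routed_experts
print(f"active routed-expert fraction: {active_fraction:.4%}")  # 3.1250%

# Applying that fraction to all 35B parameters would give ~1.1B,
# below the card's ~3B active figure -- the gap is the always-active
# shared expert plus attention and embedding weights.
print(f"naive lower bound: {35e9 * active_fraction / 1e9:.1f}B")
```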
## Credits
APEX is brought to you by the LocalAI team. TurboQuant is by TheTom. Built on llama.cpp.