Qwen3.5-122B-A10B APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Qwen3.5-122B-A10B.

Brought to you by the LocalAI team | APEX Project | Technical Report

Benchmark Results

All measurements were taken on 8x RTX PRO 6000 Blackwell GPUs (768 GB VRAM total). Perplexity was measured on wikitext-2-raw at context 512; accuracy benchmarks were run via llama.cpp (400 tasks each).
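For context on the table's metrics: perplexity is the exponential of the mean negative log-likelihood over the eval tokens, and "KL mean" is the per-token KL divergence averaged over positions, presumably against the full-precision model's output distribution. A minimal sketch with made-up numbers (the KL direction, reference relative to quantized, is an assumption here):

```python
import numpy as np

# Hypothetical per-token data: log-probs of the true next token under the
# quantized model, plus full next-token distributions for reference vs. quant.
logprobs = np.array([-1.2, -0.7, -2.1])   # quantized model, true tokens
p_ref    = np.array([[0.7, 0.2, 0.1]])    # reference (e.g. BF16) distribution
p_quant  = np.array([[0.6, 0.3, 0.1]])    # quantized model distribution

# Perplexity: exp of the mean negative log-likelihood over the eval set.
ppl = np.exp(-logprobs.mean())

# Mean KL divergence of the reference distribution from the quantized one,
# averaged over token positions.
kl_mean = (p_ref * np.log(p_ref / p_quant)).sum(axis=-1).mean()

print(f"ppl={ppl:.3f}  kl_mean={kl_mean:.4f}")
```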

| Configuration | Size (GB) | Perplexity | KL mean | HellaSwag | Winogrande | MMLU | ARC | tg128 (t/s) |
|---|---|---|---|---|---|---|---|---|
| Q8_0 (Unsloth) | 121 | 4.819 | 0.004 | 85.5% | 77.3% | 44.19 | 57.19 | 85.5 |
| Q5_K_S (Unsloth) | ~81 | 4.826 | 0.007 | 85.3% | 76.0% | 43.80 | 57.86 | 90.4 |
| UD-Q4_K_XL (Unsloth) | ~72 | 4.829 | 0.010 | 84.8% | 76.3% | 44.25 | 55.85 | 91.8 |
| APEX I-Balanced | 83.4 | 4.831 | 0.008 | 85.5% | 77.8% | 43.86 | 57.86 | 96.7 |
| APEX I-Quality | 72.3 | 4.838 | 0.012 | 85.3% | 77.3% | 43.86 | 56.86 | 99.7 |
| APEX Quality | 72.3 | 4.848 | 0.013 | 85.5% | 76.3% | 44.44 | 55.52 | 99.8 |
| APEX Balanced | 83.4 | 4.840 | 0.008 | 85.0% | 76.3% | 43.93 | 56.86 | 96.7 |
| APEX I-Compact | 55.1 | 4.978 | 0.041 | 84.5% | 77.5% | 44.06 | 57.86 | 106.3 |
| APEX Compact | 55.1 | 5.046 | 0.049 | 84.5% | 77.8% | 43.54 | 56.19 | 106.2 |
| APEX I-Mini | 44.9 | 5.306 | 0.102 | 84.0% | 75.3% | 42.83 | 56.52 | 110.0 |

Highlights

  • APEX I-Balanced matches or beats Q8_0 on HellaSwag (85.5%), Winogrande (77.8% vs 77.3%), and ARC (57.86 vs 57.19) while being 31% smaller and 13% faster.
  • APEX I-Quality (72.3 GB) beats UD-Q4_K_XL at the same size on HellaSwag (85.3% vs 84.8%), Winogrande (77.3% vs 76.3%), and ARC (56.86 vs 55.85).
  • APEX I-Compact (55.1 GB) scores 84.5% on HellaSwag and 57.86 on ARC at 55% less size than Q8_0, and is the fastest profile short of the Mini tier at 106.3 t/s.
  • APEX I-Mini (44.9 GB) is the smallest profile, 63% smaller than Q8_0 and the fastest at 110 t/s, while still scoring 84.0% on HellaSwag.
  • I-variants match or improve on their standard counterparts in perplexity, KL, and ARC at every size tier.
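The size and speed deltas quoted above follow directly from the benchmark table; a quick check of the I-Balanced vs Q8_0 claim:

```python
# Figures for APEX I-Balanced vs. Q8_0, taken from the table above.
q8_size, apex_size = 121, 83.4   # GB
q8_tps,  apex_tps  = 85.5, 96.7  # tg128 tokens/s

print(f"{1 - apex_size / q8_size:.1%} smaller")  # -> 31.1% smaller
print(f"{apex_tps / q8_tps - 1:.1%} faster")     # -> 13.1% faster
```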

Available Files

| File | Profile | Size | Best For |
|---|---|---|---|
| Qwen3.5-122B-A10B-APEX-I-Balanced.gguf | I-Balanced | 83.4 GB | Best overall -- matches Q8_0 quality at 31% less size |
| Qwen3.5-122B-A10B-APEX-I-Quality.gguf | I-Quality | 72.3 GB | Best quality at the ~72 GB tier |
| Qwen3.5-122B-A10B-APEX-Quality.gguf | Quality | 72.3 GB | Highest MMLU (44.44) |
| Qwen3.5-122B-A10B-APEX-Balanced.gguf | Balanced | 83.4 GB | General purpose, low KL |
| Qwen3.5-122B-A10B-APEX-I-Compact.gguf | I-Compact | 55.1 GB | Consumer multi-GPU, best quality/size ratio |
| Qwen3.5-122B-A10B-APEX-Compact.gguf | Compact | 55.1 GB | Consumer multi-GPU setups |
| Qwen3.5-122B-A10B-APEX-I-Mini.gguf | I-Mini | 44.9 GB | Smallest viable, fastest inference |
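To fetch a single profile without cloning the whole repository, the standard huggingface_hub client should work (filename taken from the table above):

```python
from huggingface_hub import hf_hub_download

# Download one APEX profile from the repo; returns the local file path.
path = hf_hub_download(
    repo_id="mudler/Qwen3.5-122B-A10B-APEX-GGUF",
    filename="Qwen3.5-122B-A10B-APEX-I-Balanced.gguf",
)
print(path)
```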

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient -- edge layers get higher precision, middle layers get more aggressive compression. I-variants additionally use an importance matrix (imatrix) computed from a diverse calibration mix (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).
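As a mental model, the assignment can be pictured as a lookup over (tensor role, layer position). The sketch below is illustrative only: the quant-type table and edge width are placeholders, not the actual APEX profiles.

```python
# Illustrative role- plus position-based precision assignment.
# Quant type names follow llama.cpp conventions; the mapping itself is
# a placeholder, not the real APEX profile table.
def assign_quant(role: str, layer: int, n_layers: int = 48, edge: int = 5) -> str:
    in_edge = layer < edge or layer >= n_layers - edge  # symmetric edge band
    if role == "attention":
        return "Q6_K" if in_edge else "Q5_K"
    if role == "shared_expert":    # always active, so kept at higher precision
        return "Q8_0"
    if role == "routed_expert":    # bulk of the parameters, compressed hardest
        return "Q5_K" if in_edge else "Q3_K"
    return "Q8_0"                  # embeddings, norms, output head, etc.

for layer in (0, 4, 24, 47):
    print(layer, assign_quant("routed_expert", layer))
```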

See the APEX project for full details, technical report, and scripts.

Architecture

  • Model: Qwen3.5-122B-A10B (Qwen3.5-MoE)
  • Layers: 48
  • Experts: 256 routed + 1 shared (8 active per token)
  • Total Parameters: 122B
  • Active Parameters: ~10B per token
  • APEX Config: 5+5 symmetric edge gradient across 48 layers
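The active-parameter figure follows from the routing: each token runs the shared expert plus its top 8 of 256 routed experts, so only a small slice of the 122B weights is touched per step. A toy top-k router for one token (the post-top-k softmax normalization is an assumption, not necessarily what Qwen3.5-MoE does):

```python
import numpy as np

N_EXPERTS, TOP_K = 256, 8

rng = np.random.default_rng(0)
router_logits = rng.normal(size=N_EXPERTS)  # one token's gating logits

# Pick the top-8 experts, then softmax only over those.
top = np.argpartition(router_logits, -TOP_K)[-TOP_K:]
weights = np.exp(router_logits[top] - router_logits[top].max())
weights /= weights.sum()

print("active experts:", sorted(top.tolist()))
print("gate weights sum:", weights.sum())  # 1.0
```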

Run with LocalAI

```bash
local-ai run mudler/Qwen3.5-122B-A10B-APEX-GGUF@Qwen3.5-122B-A10B-APEX-I-Balanced.gguf
```
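LocalAI then serves an OpenAI-compatible API, so a plain chat-completions request works; the host, port, and model name below are assumptions to adjust for your setup:

```python
import json
import urllib.request

# Assumes LocalAI is serving on its default address; adjust host, port,
# and model name to match your deployment.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "Qwen3.5-122B-A10B-APEX-I-Balanced.gguf",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```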

Credits

APEX is brought to you by the LocalAI team. Developed through human-driven, AI-assisted research. Built on llama.cpp.
