Qwen3.5-397B-A17B-Uncensored-REAP35 GGUF

Expert-pruned version of Qwen3.5-397B-A17B-Uncensored using REAP (Router-weighted Expert Activation Pruning). 35% of experts removed, reducing from 512 to 332 experts per layer.

Based on the uncensored fine-tune by timteh673.

Available Quants

File                                       Quant   Size     Description
Qwen3.5-397B-A17B-Uncensored-REAP35-Q8_0   Q8_0    259 GB   Full-precision pruned weights, intended for re-quantization

More quant variants coming soon.
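Since the Q8_0 file is intended as a re-quantization source, a typical workflow is to convert it to a smaller quant with llama.cpp's `llama-quantize` tool. A sketch of that invocation, with placeholder file names and Q4_K_M chosen only as an example target:

```shell
# Re-quantize the Q8_0 GGUF to a smaller type with llama.cpp.
# --allow-requantize is needed because the input is already quantized.
./llama-quantize --allow-requantize \
  Qwen3.5-397B-A17B-Uncensored-REAP35-Q8_0.gguf \
  Qwen3.5-397B-A17B-Uncensored-REAP35-Q4_K_M.gguf \
  Q4_K_M
```

Note that re-quantizing from Q8_0 rather than from full-precision weights can cost a small amount of additional quality.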

REAP Pruning Method

REAP scores each expert using imatrix calibration data, then removes the lowest-scoring experts uniformly, i.e. the same number from every MoE layer.

Each expert receives a score based on two signals captured during calibration inference:

  1. Activation count: how many times the expert was selected by the router
  2. Activation magnitude: the sum of squared input activations while the expert was active

The final score is: normalized_count × normalized_magnitude
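The scoring and per-layer pruning described above can be sketched as follows. This is a minimal illustration, not the actual REAP implementation: the normalization scheme (dividing each signal by its sum) and all function names are assumptions.

```python
import numpy as np

def reap_scores(activation_counts, sq_activation_magnitudes):
    """Score each expert: normalized activation count x normalized magnitude.

    Both inputs are per-expert statistics gathered during calibration
    inference. Sum-normalization here is an assumption for illustration.
    """
    counts = np.asarray(activation_counts, dtype=float)
    mags = np.asarray(sq_activation_magnitudes, dtype=float)
    return (counts / counts.sum()) * (mags / mags.sum())

def prune_layer(scores, n_keep):
    """Return the indices of the n_keep highest-scoring experts, sorted."""
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep)

# Toy layer with 8 experts, keeping 5 (roughly a 35% prune):
scores = reap_scores([10, 3, 0, 7, 1, 9, 2, 6],
                     [4.0, 1.0, 0.0, 3.5, 0.2, 5.0, 0.5, 2.0])
kept = prune_layer(scores, n_keep=5)
# kept -> array([0, 1, 3, 5, 7]); expert 2, never routed to, is pruned
```

For the real model this would run once per MoE layer, keeping the top 332 of 512 experts each time.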

  • Base model: Qwen3.5-397B-A17B-Uncensored (512 experts, 10 active per layer)
  • Pruned: 512 → 332 experts per layer (35% removed)
  • Calibration data: unsloth calibration dataset (base model imatrix)
GGUF metadata: 261B params, architecture qwen35moe


