# Qwen3-Next-80B-A3B-Instruct-REAMv2 (MLX 3-bit)
MLX 3-bit quantization of bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2) for Apple Silicon Macs.
This is the general-purpose instruct sibling of TomLucidor/Qwen3-Coder-Next-REAM-mlx-3Bit (coding variant).
## Model Summary
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Next-80B-A3B-Instruct |
| REAM compression | bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2, 512→384 experts, 80B→60B params) |
| MLX quantization | 3-bit with mixed_3_6 preset (3-bit bulk, 6-bit for sensitive layers) |
| Average bits per weight | 3.998 |
| Size on disk | 28 GB |
| Total parameters | 60B |
| Active parameters per token | 3B |
| Architecture | Hybrid attention (Gated DeltaNet + Gated Attention) with ultra-sparse MoE |
| Context length | 262,144 tokens (native) |
| Target hardware | Apple Silicon Macs with 48GB+ unified memory |
## Compression Pipeline

```
Qwen3-Next-80B-A3B-Instruct (80B, 160GB bf16)
  → REAMv2 expert merging (60B, 120GB bf16)        (by bknyaz)
  → MLX 3-bit mixed_3_6 quantization (60B, 28GB)   (this model)
```
REAMv2 details (from the source model card): The v2 compression used C=32 expert grouping, calibration data weighted 70% math / 30% code / 0% C4, and preserves the MTP (Multi-Token Prediction) layer.
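The bf16 sizes in the pipeline above follow directly from the parameter counts, since bf16 stores each parameter in 2 bytes. A quick sanity check (sizes in decimal GB, which appears to be how the source card rounds):

```python
# Sanity-check the bf16 checkpoint sizes quoted in the pipeline above.
BYTES_PER_BF16 = 2  # bf16 uses 2 bytes per parameter

def bf16_size_gb(params: float) -> float:
    """Approximate bf16 checkpoint size in decimal gigabytes."""
    return params * BYTES_PER_BF16 / 1e9

print(bf16_size_gb(80e9))  # original 80B model -> 160.0
print(bf16_size_gb(60e9))  # REAMv2-merged 60B model -> 120.0
```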
## Quantization Details
Converted with mlx-lm v0.31.x using:

```bash
mlx_lm.convert \
  --hf-path bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM \
  --mlx-path ~/Qwen3-Next-REAM-Instruct-mlx-3bit \
  -q \
  --q-bits 3 \
  --q-group-size 64 \
  --quant-predicate mixed_3_6
```
The mixed_3_6 preset quantizes most layers to 3-bit while keeping sensitive layers (MoE down projections, select attention V projections, and the LM head) at 6-bit for better quality.
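The reported 3.998 average bits per weight is consistent with this mix. Assuming MLX's affine quantization stores one fp16 scale and one fp16 bias per group of 64 weights (an assumption about the storage format, not stated in this card), each layer pays about 0.5 extra bits per weight, and the 6-bit share can be back-solved:

```python
# Estimate what share of weights sit in 6-bit layers, assuming MLX affine
# quantization stores one fp16 scale and one fp16 bias per weight group
# (so each group of `GROUP_SIZE` weights pays 2 * 16 extra bits).
GROUP_SIZE = 64
OVERHEAD = 2 * 16 / GROUP_SIZE  # 0.5 bits of metadata per weight

bpw_3 = 3 + OVERHEAD  # effective bits/weight for 3-bit layers -> 3.5
bpw_6 = 6 + OVERHEAD  # effective bits/weight for 6-bit layers -> 6.5

target = 3.998  # average bits/weight reported above
frac_6bit = (target - bpw_3) / (bpw_6 - bpw_3)
print(f"{frac_6bit:.1%} of weights at 6-bit")  # roughly 17%
```

Under those assumptions, roughly a sixth of the weights (down projections, select V projections, LM head) land in 6-bit layers.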
## Benchmark Results
Evaluated with `lm-evaluation-harness` using the `local-chat-completions` backend against `mlx_lm.server`. Generation parameters: `temperature=0.0`, `do_sample=False`, `batch_size=1`.
| Benchmark | This model (MLX 3-bit) | REAMv2 60B (bf16) | Original 80B (bf16) |
|---|---|---|---|
| GSM8K (0-shot, flexible-extract) | 67.4 | – | – |
| GSM8K (5-shot, flexible-extract) | 84.6 | 78.1 | 78.6 |
| IFEval (prompt-level strict) | 82.8 | – | – |
| IFEval (prompt-level loose) | 88.5 | – | – |
| IFEval (inst-level strict) | 88.1 | – | – |
| IFEval (inst-level loose) | 92.1 | – | – |
REAMv2 and Original 80B scores are from the bknyaz model card. The model card reports IFEval as 93.4 for both REAMv2 and the original but does not specify which metric variant.
Note on GSM8K: Our 5-shot score (84.6) is higher than the model card's bf16 score (78.1). This is almost certainly due to differences in evaluation methodology (prompt template, generation parameters, lm-eval version), not the quantization improving quality. Treat these as reference points from our specific eval setup, not direct comparisons.
## Usage

### With mlx-lm

```bash
pip install mlx-lm
mlx_lm.chat --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit
```
### As an OpenAI-compatible server

```bash
mlx_lm.server --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit --port 8080
```
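Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch, assuming the default `/v1/chat/completions` endpoint on the port chosen above:

```python
# Minimal client sketch for the mlx_lm.server started above. Assumes the
# server exposes an OpenAI-compatible /v1/chat/completions endpoint on
# localhost:8080 (the port from the command above).
import json
from urllib import request

payload = {
    "model": "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit",
    "messages": [{"role": "user", "content": "Give me one fact about MLX."}],
    "temperature": 0.0,
    "max_tokens": 128,
}

def chat(url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload and return the assistant's reply text."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```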
### With LM Studio

Load as a local model in LM Studio using the MLX backend. Point to the downloaded model directory.
## Memory Requirements
With 28GB of weights, this model needs approximately 48GB of unified memory to run comfortably at moderate context lengths. On a 48GB Apple Silicon Mac (M4 Pro, M4 Max, M5 Max, etc.), expect roughly 16-20GB left over for the KV cache once OS overhead is accounted for.
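The back-of-envelope arithmetic behind that headroom figure (the few-GB OS allowance is an assumption, not a measurement):

```python
# Headroom on a 48 GB Mac: unified memory minus quantized weights,
# before KV cache and OS overhead.
TOTAL_GB = 48
WEIGHTS_GB = 28
OS_OVERHEAD_GB = 4  # assumed typical macOS footprint, varies per machine

headroom = TOTAL_GB - WEIGHTS_GB            # 20 GB before the OS
usable = headroom - OS_OVERHEAD_GB          # ~16 GB for KV cache/activations
print(headroom, usable)
```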
## Important Context
Qwen3-Next-80B-A3B was released in September 2025 as an experimental architecture preview. The current-generation model in this parameter class is Qwen3.5-35B-A3B (February 2026), which incorporates the same hybrid attention innovations with improved training, vision support, and overall better benchmarks. For most users, Qwen3.5-35B-A3B at 4-bit (~22GB) will be the better daily driver. This model is provided for users interested in the Qwen3-Next architecture specifically, or who want the larger 60B parameter count in a compact MLX format.
## License
Apache 2.0, the same as the original Qwen/Qwen3-Next-80B-A3B-Instruct.