# Qwen3-Next-80B-A3B-Instruct-REAMv2 – MLX 3-bit

MLX 3-bit quantization of bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2) for Apple Silicon Macs.

This is the general-purpose instruct sibling of TomLucidor/Qwen3-Coder-Next-REAM-mlx-3Bit (coding variant).

## Model Summary

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Next-80B-A3B-Instruct |
| REAM compression | bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2, 512 → 384 experts, 80B → 60B params) |
| MLX quantization | 3-bit with `mixed_3_6` preset (3-bit bulk, 6-bit for sensitive layers) |
| Average bits per weight | 3.998 |
| Size on disk | 28 GB |
| Total parameters | 60B |
| Active parameters per token | 3B |
| Architecture | Hybrid attention (Gated DeltaNet + Gated Attention) with ultra-sparse MoE |
| Context length | 262,144 tokens (native) |
| Target hardware | Apple Silicon Macs with 48 GB+ unified memory |
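The 3.998 average bits per weight is consistent with group-wise quantization overhead. A back-of-envelope sketch, assuming an fp16 scale and fp16 bias stored per 64-weight group (so 3-bit groups cost 3.5 bits/weight and 6-bit groups 6.5) and a hypothetical ~17% fraction of weights at 6-bit chosen here to match the reported figure:

```python
# Back-of-envelope check of average bits per weight for mixed 3/6-bit
# group quantization. Assumes each 64-weight group stores an fp16 scale
# and an fp16 bias (32 extra bits per group); the 6-bit fraction below
# is a hypothetical value, not taken from the model card.

GROUP_SIZE = 64
OVERHEAD_BITS = 2 * 16 / GROUP_SIZE  # fp16 scale + fp16 bias = 0.5 bits/weight

def avg_bits(frac_6bit: float) -> float:
    """Average storage bits per weight for a 3-bit/6-bit mix."""
    cost_3 = 3 + OVERHEAD_BITS  # 3.5 bits/weight
    cost_6 = 6 + OVERHEAD_BITS  # 6.5 bits/weight
    return (1 - frac_6bit) * cost_3 + frac_6bit * cost_6

print(round(avg_bits(0.166), 3))  # 3.998
```

Under these assumptions, roughly one weight in six landing in a 6-bit layer reproduces the reported average.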

## Compression Pipeline

```
Qwen3-Next-80B-A3B-Instruct (80B, 160 GB bf16)
  → REAMv2 expert merging (60B, 120 GB bf16), by bknyaz
    → MLX 3-bit mixed_3_6 quantization (60B, 28 GB), this model
```

REAMv2 details (from the source model card): The v2 compression used C=32 expert grouping, calibration data weighted 70% math / 30% code / 0% C4, and preserves the MTP (Multi-Token Prediction) layer.
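REAM's actual procedure is more involved than can be shown here, but the core idea of collapsing redundant experts can be illustrated with a toy sketch. This is purely illustrative, not the REAMv2 algorithm: it repeatedly averages the two most cosine-similar expert weight vectors until the target expert count is reached.

```python
# Toy illustration of expert merging: repeatedly average the two most
# cosine-similar "experts" (flattened weight vectors) until the target
# count is reached. Illustrative only; not the REAMv2 procedure.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def merge_experts(experts, target):
    experts = [list(e) for e in experts]
    while len(experts) > target:
        # Find the most similar pair of experts.
        i, j = max(
            ((i, j) for i in range(len(experts)) for j in range(i + 1, len(experts))),
            key=lambda ij: cosine(experts[ij[0]], experts[ij[1]]),
        )
        merged = [(x + y) / 2 for x, y in zip(experts[i], experts[j])]
        # Replace the pair with its average.
        experts = [e for k, e in enumerate(experts) if k not in (i, j)] + [merged]
    return experts

# 8 toy experts of dimension 4, merged down to 6.
toy = [[1, 0, 0, 0], [0.9, 0.1, 0, 0], [0, 1, 0, 0], [0, 0.9, 0.1, 0],
       [0, 0, 1, 0], [0, 0, 0.9, 0.1], [0, 0, 0, 1], [0.1, 0, 0, 0.9]]
print(len(merge_experts(toy, 6)))  # 6
```

The real method additionally uses calibration data (the 70/30 math/code mix above) to decide which experts are safe to merge.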

## Quantization Details

Converted with `mlx-lm` v0.31.x using:

```bash
mlx_lm.convert \
  --hf-path bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM \
  --mlx-path ~/Qwen3-Next-REAM-Instruct-mlx-3bit \
  -q \
  --q-bits 3 \
  --q-group-size 64 \
  --quant-predicate mixed_3_6
```

The `mixed_3_6` preset quantizes most layers to 3-bit while keeping sensitive layers (MoE down projections, select attention V projections, and the LM head) at 6-bit for better quality.
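In Python terms, a predicate of this shape amounts to a function from layer path to bit width. The sketch below uses hypothetical layer-name patterns inferred from the description above; the real `mixed_3_6` logic lives inside mlx-lm and may differ (e.g. it keeps only *select* V projections at 6-bit):

```python
# Hypothetical sketch of a mixed 3/6-bit predicate: "sensitive" layers
# (MoE down projections, attention V projections, LM head) get 6 bits,
# everything else gets 3. Layer-name patterns are illustrative, not the
# actual mlx-lm implementation.

SENSITIVE_PATTERNS = ("down_proj", "v_proj", "lm_head")

def bits_for_layer(path: str) -> int:
    """Return the quantization bit width for a given layer path."""
    if any(p in path for p in SENSITIVE_PATTERNS):
        return 6
    return 3

print(bits_for_layer("model.layers.0.mlp.experts.5.down_proj"))  # 6
print(bits_for_layer("model.layers.0.mlp.experts.5.gate_proj"))  # 3
```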

## Benchmark Results

Evaluated using `lm-evaluation-harness` via `local-chat-completions` against `mlx_lm.server`. Generation parameters: `temperature=0.0`, `do_sample=False`, `batch_size=1`.

| Benchmark | This model (MLX 3-bit) | REAMv2 60B (bf16) | Original 80B (bf16) |
|---|---|---|---|
| GSM8K (0-shot, flexible-extract) | 67.4 | – | – |
| GSM8K (5-shot, flexible-extract) | 84.6 | 78.1 | 78.6 |
| IFEval (prompt-level strict) | 82.8 | – | – |
| IFEval (prompt-level loose) | 88.5 | – | – |
| IFEval (inst-level strict) | 88.1 | – | – |
| IFEval (inst-level loose) | 92.1 | – | – |

REAMv2 and Original 80B scores are taken from the bknyaz model card, which reports IFEval as 93.4 for both REAMv2 and the original but does not specify which metric variant.

**Note on GSM8K:** Our 5-shot score (84.6) is higher than the model card's bf16 score (78.1). This is almost certainly due to differences in evaluation methodology (prompt template, generation parameters, lm-eval version), not the quantization improving quality. Treat these as reference points from our specific eval setup, not direct comparisons.

## Usage

### With mlx-lm

```bash
pip install mlx-lm
mlx_lm.chat --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit
```

### As an OpenAI-compatible server

```bash
mlx_lm.server --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit --port 8080
```
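Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library, assuming the standard OpenAI `/v1/chat/completions` path and the port and model name from the command above:

```python
# Minimal OpenAI-style chat request against a local mlx_lm.server
# instance. Standard library only; the endpoint path follows the
# OpenAI chat-completions convention.
import json
import urllib.request

def build_chat_request(model: str, prompt: str, temperature: float = 0.0) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(base_url: str, payload: dict) -> str:
    """POST the payload and return the assistant message text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from the command above to be running):
# payload = build_chat_request(
#     "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit",
#     "Explain unified memory in one sentence.",
# )
# print(chat("http://localhost:8080", payload))
```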

### With LM Studio

Load as a local model in LM Studio using the MLX backend. Point to the downloaded model directory.

## Memory Requirements

With 28 GB of weights, this model needs approximately 48 GB of unified memory to run comfortably at moderate context lengths. On a 48 GB Apple Silicon Mac (M4 Pro, M4 Max, M5 Max, etc.), expect roughly 16–20 GB left over for KV cache and OS overhead.
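That headroom range follows from simple arithmetic; the ~4 GB OS/runtime reservation below is an assumed ballpark, not a measured figure:

```python
# Rough unified-memory budget for running the 28 GB weights on a 48 GB
# Mac. The OS/runtime reservation is an assumed ballpark, not measured.

def kv_headroom_gb(total_gb: float, weights_gb: float, os_overhead_gb: float) -> float:
    """Memory left for KV cache and activations after weights and OS."""
    return total_gb - weights_gb - os_overhead_gb

print(kv_headroom_gb(48, 28, 4))  # 16
```

With a lighter OS footprint the same budget approaches the 20 GB upper end of the range.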

## Important Context

Qwen3-Next-80B-A3B was released in September 2025 as an experimental architecture preview. The current-generation model in this parameter class is Qwen3.5-35B-A3B (February 2026), which incorporates the same hybrid attention innovations with improved training, vision support, and overall better benchmarks. For most users, Qwen3.5-35B-A3B at 4-bit (~22GB) will be the better daily driver. This model is provided for users interested in the Qwen3-Next architecture specifically, or who want the larger 60B parameter count in a compact MLX format.

## License

Apache 2.0, same as the original Qwen/Qwen3-Next-80B-A3B-Instruct.
