MLX Studio — the only app that natively supports JANG models with reasoning
Mistral Small 4 (119B-A6B) — JANG_6M (6.04-bit) — Reasoning + VLM
JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX
JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.
Why JANG models?
Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.
Speed Comparison
| Model | Size | Gen tok/s | Prefill tok/s | RAM | Fits On |
|---|---|---|---|---|---|
| JANG_6M (this model) | 84 GB | 74 | 160 | 95 GB | 128+ GB Macs |
| JANG_2L | 30 GB | 82 | 216 | 40 GB | 48 GB Macs |
| JANG_4M | 57 GB | 80 | 202 | 68 GB | 96+ GB Macs |
| MLX Community 4-bit | 63 GB | 84 | 43 | 68 GB | 96+ GB Macs |
JANG_2L prefills 5x faster than the MLX Community 4-bit build (216 vs 43 tok/s) and runs on 48 GB Macs at under half the size; this JANG_6M model still prefills 3.7x faster (160 vs 43 tok/s).
Benchmarked on M3 Ultra 256 GB with bfloat16 compute.
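The speedup claims follow directly from the prefill column of the table above; a quick arithmetic check:

```python
# Prefill throughput from the table above (tok/s).
prefill = {
    "JANG_2L": 216,
    "JANG_4M": 202,
    "JANG_6M": 160,
    "MLX Community 4-bit": 43,
}

baseline = prefill["MLX Community 4-bit"]
for name, tps in prefill.items():
    print(f"{name}: {tps / baseline:.1f}x vs MLX Community 4-bit")
# JANG_2L: 216 / 43 ≈ 5.0x; JANG_6M: 160 / 43 ≈ 3.7x
```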
Key Features
- 74 tok/s generation on M3 Ultra
- 84 GB on disk, 95 GB peak RAM
- Vision (VLM): Pixtral encoder, 1540px max
- Reasoning mode: [THINK]...[/THINK] step-by-step reasoning
- Code generation: Complete functions with optimized logic
- Math: Step-by-step calculations
- 119B total / 6B active — MLA attention + 128 MoE experts
- First Mistral Small 4 on Apple Silicon with full MLA + MoE support
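This card does not document how host apps separate the reasoning trace from the final answer, so the following is only a minimal sketch of splitting output on the [THINK]...[/THINK] markers listed above; the function name and behavior are illustrative assumptions, not MLX Studio's implementation:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) on [THINK]...[/THINK] tags.

    Illustrative sketch only; real apps may stream tokens and handle
    unterminated tags differently.
    """
    match = re.search(r"\[THINK\](.*?)\[/THINK\]", text, flags=re.DOTALL)
    if not match:
        # No reasoning block: treat the whole output as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

out = "[THINK]2 + 2 is 4.[/THINK]The answer is 4."
print(split_reasoning(out))  # ('2 + 2 is 4.', 'The answer is 4.')
```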
Architecture
119B total parameters, 6B active per token
- 36 layers, all MoE (128 experts, top-4 routing)
- MLA attention: kv_lora_rank=256, q_lora_rank=1024
- Pixtral vision: 24 layers, 1540px max
- Reasoning: [THINK]...[/THINK] with reasoning_effort control
- bfloat16 compute (auto-detected)
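The "128 experts, top-4 routing" line means each token's router scores all 128 experts but dispatches to only the 4 highest-scoring ones, which is how 119B total parameters yield only ~6B active per token. A NumPy sketch of that selection step, purely illustrative and not the model's actual kernel:

```python
import numpy as np

def top4_route(logits: np.ndarray, top_k: int = 4):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores. Illustrative only;
    the real Mistral Small 4 routing implementation may differ.
    """
    idx = np.argpartition(logits, -top_k, axis=-1)[:, -top_k:]  # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)              # their logits
    w = np.exp(top - top.max(axis=-1, keepdims=True))           # stable softmax
    w /= w.sum(axis=-1, keepdims=True)                          # gate weights sum to 1
    return idx, w

rng = np.random.default_rng(0)
ids, weights = top4_route(rng.normal(size=(2, 128)))
print(ids.shape, weights.sum(axis=-1))  # (2, 4), each token's weights sum to 1
```

Only the 4 selected experts' weights are touched per token, so compute and activation memory scale with the active 6B parameters rather than the full 119B.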
Benchmarks
MMLU benchmarks in progress — will be updated with per-subject scores and MLX 4-bit comparison.
Install
pip install "jang[mlx]" (quotes are required in zsh, the default macOS shell, because of the brackets)
Created by Jinho Jang — jangq.ai — @dealignai
Model tree for JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_6M
Base model
mistralai/Mistral-Small-4-119B-2603