MLX Studio — the only app that natively supports JANG models with reasoning
Mistral Small 4 (119B-A6B) — JANG_6M (6.04-bit) — Reasoning + VLM
JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX
JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.
Why JANG models?
Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.
Speed Comparison
| Model | Size | Gen tok/s | Prefill tok/s | RAM | Fits On |
|---|---|---|---|---|---|
| JANG_6M (this model) | 84 GB | 74 | 160 | 95 GB | 128+ GB Macs |
| JANG_2L | 30 GB | 82 | 216 | 40 GB | 48 GB Macs |
| JANG_4M | 57 GB | 80 | 202 | 68 GB | 96+ GB Macs |
| MLX Community 4-bit | 63 GB | 84 | 43 | 68 GB | 96+ GB Macs |
JANG_2L prefills 5x faster than the MLX Community 4-bit build (216 vs 43 tok/s) and runs on 48 GB Macs at under half the size; this JANG_6M model still prefills 3.7x faster (160 vs 43 tok/s).
Benchmarked on M3 Ultra 256 GB with bfloat16 compute.
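The speedup claims follow directly from the prefill column of the table above; a quick arithmetic check:

```python
# Prefill throughput from the table above (tok/s).
prefill = {
    "JANG_2L": 216,
    "JANG_4M": 202,
    "JANG_6M": 160,
    "MLX Community 4-bit": 43,
}

baseline = prefill["MLX Community 4-bit"]
for name, tps in prefill.items():
    print(f"{name}: {tps / baseline:.1f}x vs MLX Community 4-bit")
# JANG_2L: 216 / 43 ≈ 5.0x; JANG_6M: 160 / 43 ≈ 3.7x
```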
Key Features
- 74 tok/s generation on M3 Ultra
- 84 GB on disk, 95 GB peak RAM
- Vision (VLM): Pixtral encoder, 1540px max
- Reasoning mode: [THINK]...[/THINK] step-by-step reasoning
- Code generation: Complete functions with optimized logic
- Math: Step-by-step calculations
- 119B total / 6B active — MLA attention + 128 MoE experts
- First Mistral Small 4 on Apple Silicon with full MLA + MoE support
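This card does not document how host apps separate the reasoning trace from the final answer, so the following is only a minimal sketch of splitting output on the [THINK]...[/THINK] markers listed above; the function name and behavior are illustrative assumptions, not MLX Studio's implementation:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) on [THINK]...[/THINK] tags.

    Illustrative sketch only; real apps may stream tokens and handle
    unterminated tags differently.
    """
    match = re.search(r"\[THINK\](.*?)\[/THINK\]", text, flags=re.DOTALL)
    if not match:
        # No reasoning block: treat the whole output as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

out = "[THINK]2 + 2 is 4.[/THINK]The answer is 4."
print(split_reasoning(out))  # ('2 + 2 is 4.', 'The answer is 4.')
```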
Architecture
119B total parameters, 6B active per token
- 36 layers, all MoE (128 experts, top-4 routing)
- MLA attention: kv_lora_rank=256, q_lora_rank=1024
- Pixtral vision: 24 layers, 1540px max
- Reasoning: [THINK]...[/THINK] with reasoning_effort control
- bfloat16 compute (auto-detected)
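The "128 experts, top-4 routing" line means each token's router scores all 128 experts but dispatches to only the 4 highest-scoring ones, which is how 119B total parameters yield only ~6B active per token. A NumPy sketch of that selection step, purely illustrative and not the model's actual kernel:

```python
import numpy as np

def top4_route(logits: np.ndarray, top_k: int = 4):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores. Illustrative only;
    the real Mistral Small 4 routing implementation may differ.
    """
    idx = np.argpartition(logits, -top_k, axis=-1)[:, -top_k:]  # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)              # their logits
    w = np.exp(top - top.max(axis=-1, keepdims=True))           # stable softmax
    w /= w.sum(axis=-1, keepdims=True)                          # gate weights sum to 1
    return idx, w

rng = np.random.default_rng(0)
ids, weights = top4_route(rng.normal(size=(2, 128)))
print(ids.shape, weights.sum(axis=-1))  # (2, 4), each token's weights sum to 1
```

Only the 4 selected experts' weights are touched per token, so compute and activation memory scale with the active 6B parameters rather than the full 119B.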
Benchmarks
MMLU benchmarks in progress — will be updated with per-subject scores and MLX 4-bit comparison.
Install
pip install "jang[mlx]" (quotes are required in zsh, the default macOS shell, because of the brackets)
Created by Jinho Jang — jangq.ai — @dealignai
Model tree for JANGQ-AI/Mistral-Small-4-119B-A6B-JANG_6M
Base model
mistralai/Mistral-Small-4-119B-2603