MLX Studio

MLX Studio — the only app that natively supports JANG models with reasoning


JANG

Mistral Small 4 (119B-A6B) — JANG_6M (6.04-bit) — Reasoning + VLM

JANG — Jang Adaptive N-bit Grading | The GGUF Equivalent for MLX


JANG is fully open-source. Quantization engine and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.


Supported apps: MLX Studio (full native support) and oMLX (PR #364). LM Studio, Ollama, and Inferencer do not yet support JANG.


Why JANG models?

Tools like mlx-lm, oMLX (oQ), and others can quantize models — but shipping a tool is the easy part. JANG models come from hundreds of hours of per-architecture testing: finding which layers break at which bit depths, which MoE routing survives quantization, which models need bfloat16 to avoid NaN. We don't just quantize — we convert, verify, benchmark, and publish every model with tested scores. No other project in the MLX ecosystem publishes pre-tested quantized models at this scale.


Speed Comparison

| Model | Size | Gen tok/s | Prefill tok/s | RAM | Fits On |
|---|---|---|---|---|---|
| JANG_2L | 30 GB | 82 | 216 | 40 GB | 48 GB Macs |
| JANG_4M | 57 GB | 80 | 202 | 68 GB | 96+ GB Macs |
| JANG_6M (this model) | 84 GB | 74 | 160 | 95 GB | 128+ GB Macs |
| MLX Community 4-bit | 63 GB | 84 | 43 | 68 GB | 96+ GB Macs |

JANG_2L prefills 5x faster than MLX Community 4-bit (216 vs 43 tok/s) and runs on 48 GB Macs at less than half the size; this model (JANG_6M) prefills at 160 tok/s, roughly 3.7x faster.

Benchmarked on M3 Ultra 256 GB with bfloat16 compute.
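The headline numbers above can be sanity-checked with quick arithmetic. A rough sketch (the 6.04-bit figure comes from the model name at the top of this card, and the size estimate ignores embedding and metadata overhead):

```python
# Prefill speedup: JANG_2L vs. MLX Community 4-bit, from the table above
jang_2l_prefill = 216.0   # tok/s
mlx4_prefill = 43.0       # tok/s
speedup = jang_2l_prefill / mlx4_prefill
print(f"prefill speedup: {speedup:.1f}x")     # ~5.0x

# Rough on-disk size for a 6.04-bit quant of 119B parameters
params = 119e9
bits_per_weight = 6.04
size_gib = params * bits_per_weight / 8 / 2**30
print(f"estimated size: {size_gib:.1f} GiB")  # ~83.7 GiB, consistent with the listed 84 GB
```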

Key Features

  • 74 tok/s generation on M3 Ultra
  • 84 GB on disk, 95 GB peak RAM
  • Vision (VLM): Pixtral encoder, 1540px max
  • Reasoning mode: [THINK]...[/THINK] step-by-step reasoning
  • Code generation: Complete functions with optimized logic
  • Math: Step-by-step calculations
  • 119B total / 6B active — MLA attention + 128 MoE experts
  • First Mistral Small 4 on Apple Silicon with full MLA + MoE support
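When only the final answer is wanted, the [THINK]...[/THINK] reasoning block can be separated out client-side. A minimal sketch (the tag names are taken from this card; the parsing helper is illustrative, not part of any shipped API):

```python
import re

# Matches one [THINK]...[/THINK] reasoning block, including newlines
THINK_RE = re.compile(r"\[THINK\].*?\[/THINK\]", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_block, answer) from raw model output."""
    m = THINK_RE.search(text)
    reasoning = m.group(0) if m else ""
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

raw = "[THINK]2 + 2 = 4[/THINK]The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```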

Architecture

119B total parameters, 6B active per token
- 36 layers, all MoE (128 experts, top-4 routing)
- MLA attention: kv_lora_rank=256, q_lora_rank=1024
- Pixtral vision: 24 layers, 1540px max
- Reasoning: [THINK]...[/THINK] with reasoning_effort control
- bfloat16 compute (auto-detected)
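One practical consequence of MLA: the KV cache stores a compressed latent of dimension kv_lora_rank per layer rather than full per-head keys and values. A back-of-envelope estimate from the numbers above (an assumption-laden sketch: it ignores any decoupled RoPE dimensions, so treat it as a lower bound; the 32k context length is chosen purely for illustration):

```python
layers = 36
kv_lora_rank = 256
bytes_per_elem = 2  # bfloat16

# Compressed KV latent cached per token across all layers
cache_per_token = layers * kv_lora_rank * bytes_per_elem
print(cache_per_token)  # 18432 bytes, ~18 KiB per token

context = 32_768  # hypothetical context length for illustration
total_mib = cache_per_token * context / 2**20
print(f"{total_mib:.0f} MiB for a {context}-token context")  # 576 MiB
```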

Benchmarks

MMLU benchmarks in progress — will be updated with per-subject scores and MLX 4-bit comparison.

Install

pip install jang[mlx]

Created by Jinho Jang · jangq.ai · @dealignai
