⚠️ REQUIRED — jangtq_runtime.safetensors sidecar must be downloaded

Osaurus uses the native Swift JANGTQ runtime. Every JANGTQ bundle on OsaurusAI ships a small jangtq_runtime.safetensors sidecar (10 KB–165 KB) alongside the weight shards. If the file is absent, the Swift loader refuses to start with the error:

Error: Model '<name>' declares JANGTQ (weight_format: "mxtq") but is
       missing required sidecar file 'jangtq_runtime.safetensors'.
       Re-download the full model or obtain the sidecar from the original
       publisher.


If your local copy doesn't have it (older download, partial sync, etc.), fetch just the sidecar:

hf download OsaurusAI/DeepSeek-V4-Flash-JANGTQ2 jangtq_runtime.safetensors --local-dir <your-dir>
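The same fetch also works from Python via huggingface_hub. A minimal sketch; "<your-dir>" stays a placeholder for your bundle directory:

from huggingface_hub import hf_hub_download

# Download only the sidecar file into the local bundle directory.
hf_hub_download(
    repo_id="OsaurusAI/DeepSeek-V4-Flash-JANGTQ2",
    filename="jangtq_runtime.safetensors",
    local_dir="<your-dir>",
)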

The file holds the deterministic codebooks + Hadamard rotation signs the Swift loader uses to decode *.tq_packed weights. It must match the seed the bundle was quantized with (mxtq_seed=42).
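To sanity-check a local copy before launching Osaurus, you can read the sidecar's safetensors header directly: the format is an 8-byte little-endian length followed by a JSON header. A minimal sketch, assuming the seed is recorded under a metadata key such as mxtq_seed (the actual key name is not documented here):

import json, struct

with open("jangtq_runtime.safetensors", "rb") as f:
    header_len = struct.unpack("<Q", f.read(8))[0]  # u64 header size
    header = json.loads(f.read(header_len))

meta = header.get("__metadata__", {})
print("tensor names:", [k for k in header if k != "__metadata__"][:5])
print("metadata:", meta)  # look for the quantization seed, e.g. mxtq_seed == "42"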


DeepSeek-V4-Flash — JANGTQ2 (MLX, uniform 2-bit MXTQ baseline)

Canonical 2-bit TurboQuant MXTQ baseline — uniform across all routed experts. Simpler recipe than JANGTQ premium. 79.6 GB. 22.3 tok/s.



Model Details

| Property | Value |
| --- | --- |
| Base model | deepseek-ai/DeepSeek-V4-Flash |
| Parameters | 671 B total, 37 B active per token |
| Architecture | DeepseekV4 (MLA + multi-head causal residual + Compressor/Indexer long-ctx) |
| Codec | TurboQuant MXTQ (Lloyd-Max codebook + Hadamard rotation) |
| Quantization plan | Uniform 2-bit MXTQ for all routed experts; 8-bit affine gs=32 for non-routed |
| Runtime | jang_tools.load_jangtq + mlx_lm.generate |
| Bundle size | 79.6 GB |
| Decode | 22.34 tok/s sustained on Mac Studio M3 Ultra (200-token greedy) |
| MMLU 200q (logit, fair seed) | 70.00% |
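As intuition for the codec row: TurboQuant MXTQ decoding is, conceptually, a Lloyd-Max codebook lookup followed by undoing a seeded, signed Hadamard rotation. A toy numpy sketch under that assumption; the real Swift decoder, block sizes, and sidecar layout are not shown here, and everything below is purely illustrative:

import numpy as np
from scipy.linalg import hadamard

def decode_block(idx, codebook, signs):
    """Toy dequantization of one block: look up 2-bit codes in the
    Lloyd-Max codebook, then undo the randomized Hadamard rotation."""
    n = len(signs)
    H = hadamard(n) / np.sqrt(n)      # orthonormal Hadamard matrix (n a power of 2)
    return signs * (H.T @ codebook[idx])

rng = np.random.default_rng(42)        # stand-in for the deterministic mxtq_seed=42
codes = rng.integers(0, 4, size=64)    # 2-bit indices, as unpacked from *.tq_packed
cb = np.array([-1.5, -0.5, 0.5, 1.5])  # toy 4-entry codebook
signs = rng.choice([-1.0, 1.0], 64)    # rotation signs, as stored in the sidecar
print(decode_block(codes, cb, signs).shape)  # (64,)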

Recipe

| Tensor class | Bits | Codec |
| --- | --- | --- |
| Routed experts (all 256 × 43 layers, uniform) | 2-bit | MXTQ codebook |
| Attention (wq_a / wq_b / wkv / wo_a / wo_b) | 8-bit | affine, gs=32 |
| Shared experts | 8-bit | affine, gs=32 |
| Compressor + Indexer (long-ctx) | 8-bit | affine, gs=32 |
| embed_tokens, lm_head | 8-bit | affine, gs=32 |
| Norms / router gate / mHC | fp16 | passthrough |
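In code form, the recipe reduces to a small classifier from tensor name to (bits, codec). This hypothetical function mirrors the table above; the name patterns are illustrative, not the actual jang_tools plan:

def mxtq_plan(name: str) -> tuple[str, str]:
    """Map a tensor name to (bits, codec) per the recipe table (illustrative)."""
    if "norm" in name or name.endswith("mlp.gate.weight") or ".mhc." in name:
        return ("fp16", "passthrough")     # norms / router gate / mHC
    if ".experts." in name:
        return ("2-bit", "MXTQ codebook")  # all routed experts, uniform
    return ("8-bit", "affine gs=32")       # attention, shared experts, embeddings

print(mxtq_plan("layers.10.mlp.experts.37.w1"))   # ('2-bit', 'MXTQ codebook')
print(mxtq_plan("layers.10.mlp.gate.weight"))     # ('fp16', 'passthrough')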

vs JANGTQ (premium): JANGTQ uses a per-importance plan (hash-routed L0-L2 experts at 4-bit MXTQ), while JANGTQ2 quantizes every routed expert uniformly at 2-bit MXTQ: a simpler recipe with a smaller risk surface.

Use

import os
os.environ["JANG_WIRED_LIMIT_GB"] = "160"  # raise the wired-memory limit for the 79.6 GB bundle (Mac Studio M3 Ultra)
# Long context (optional):
# os.environ["VMLX_DSV4_LONG_CTX"] = "1"

from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm.generate import generate

# Fetches the bundle if needed, decodes the MXTQ weights via the sidecar,
# and returns the model plus its tokenizer.
model, tok = load_jangtq_model("OsaurusAI/DeepSeek-V4-Flash-JANGTQ2")

# Render the chat template into a plain prompt string.
text = tok.apply_chat_template(
    [{"role": "user", "content": "What is 2+2?"}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tok, prompt=text, max_tokens=200, verbose=True))
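To stream tokens as they decode rather than waiting for the full string, recent mlx_lm versions expose stream_generate alongside generate. A sketch, assuming the JANGTQ model object behaves like a standard mlx_lm model and that your mlx_lm version yields response objects with a .text field:

from mlx_lm.generate import stream_generate

# Print each chunk as it is produced (~22 tok/s sustained on an M3 Ultra).
for response in stream_generate(model, tok, prompt=text, max_tokens=200):
    print(response.text, end="", flush=True)
print()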

Bundle comparison (DeepSeek-V4-Flash family, MMLU 200q logit, fair seed)

| Bundle | Size | MMLU 200q | Tok/s |
| --- | --- | --- | --- |
| DeepSeek-V4-Flash-JANGTQ (premium) | 79 GB | 69.50% | 25.91 |
| DeepSeek-V4-Flash-JANGTQ2 (this) | 79.6 GB | 70.00% | 22.34 |
| DeepSeek-V4-Flash-JANG_2L | 107 GB | 71.50% | 23.77 |
| mlx-community/DeepSeek-V4-Flash-2bit-DQ | 90 GB | 50.00% | 36.03 |

HumanEval+ pass@1

Coming soon: a comprehensive pass@1 evaluation is in flight.

Credits

Created by Jinho Jang — eric@jangq.ai

Built on top of DeepSeek-V4-Flash (deepseek-ai).

Distributed via Osaurus AI.
