⚠️ **REQUIRED — `jangtq_runtime.safetensors` sidecar must be downloaded**

Osaurus uses the native Swift JANGTQ runtime. Every JANGTQ bundle on OsaurusAI ships a small `jangtq_runtime.safetensors` sidecar (10 KB–165 KB) alongside the weight shards. If the file is absent, the Swift loader refuses to start with:

```
Error: Model '<name>' declares JANGTQ (weight_format: "mxtq") but is missing required sidecar file 'jangtq_runtime.safetensors'. Re-download the full model or obtain the sidecar from the original publisher.
```

If your local copy doesn't have it (older download, partial sync, etc.):

```sh
hf download OsaurusAI/DeepSeek-V4-Flash-JANGTQ2 jangtq_runtime.safetensors --local-dir <your-dir>
```

The file holds the deterministic codebooks and Hadamard rotation signs the Swift loader uses to decode the `*.tq_packed` weights. It must match the seed the bundle was quantized with (`mxtq_seed=42`).
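To fail fast before loading, a pre-flight check along these lines can help. This is a minimal sketch, not part of `jang_tools`; the `mxtq_seed` metadata key is an assumption (the card only states that the sidecar must match the bundle's quantization seed):

```python
from pathlib import Path
from safetensors import safe_open  # pip install safetensors

def check_jangtq_sidecar(model_dir: str, expected_seed: str = "42") -> None:
    """Raise early, with a fix-it hint, if the JANGTQ sidecar is missing or mismatched."""
    sidecar = Path(model_dir) / "jangtq_runtime.safetensors"
    if not sidecar.exists():
        raise FileNotFoundError(
            f"{sidecar} is missing. Re-download it with:\n"
            f"  hf download OsaurusAI/DeepSeek-V4-Flash-JANGTQ2 "
            f"jangtq_runtime.safetensors --local-dir {model_dir}"
        )
    with safe_open(str(sidecar), framework="np") as f:
        meta = f.metadata() or {}
    # Hypothetical key name; adjust to whatever the sidecar actually stores.
    seed = meta.get("mxtq_seed")
    if seed is not None and seed != expected_seed:
        raise ValueError(f"sidecar was built for mxtq_seed={seed}, expected {expected_seed}")
```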
DeepSeek-V4-Flash — JANGTQ2 (MLX, uniform 2-bit MXTQ baseline)
Canonical 2-bit TurboQuant MXTQ baseline — uniform across all routed experts. Simpler recipe than JANGTQ premium. 79.6 GB. 22.3 tok/s.
Model Details
| Property | Value |
|---|---|
| Base model | deepseek-ai/DeepSeek-V4-Flash |
| Parameters | 671 B total, 37 B active per token |
| Architecture | DeepseekV4 — MLA + multi-head causal residual + Compressor/Indexer long-ctx |
| Codec | TurboQuant MXTQ (Lloyd-Max codebook + Hadamard rotation) |
| Quantization plan | Uniform 2-bit MXTQ for all routed experts, 8-bit affine gs=32 for non-routed |
| Runtime | jang_tools.load_jangtq + mlx_lm.generate |
| Bundle size | 79.6 GB |
| Decode | 22.34 tok/s sustained on Mac Studio M3 Ultra (200-token greedy) |
| MMLU 200q (logit, fair seed) | 70.00% |
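The codec row above is the core of the recipe: a randomized Hadamard rotation spreads outlier weights across each group, after which a scalar Lloyd-Max codebook needs only 4 entries (2 bits) per value. A toy numpy sketch of the idea — not the Osaurus/`jang_tools` implementation; codebook initialization and rotation details here are assumptions:

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))  # orthonormal, so fwht is its own inverse

def lloyd_max(values, bits=2, iters=50):
    """Scalar Lloyd-Max: alternate nearest-centroid assignment and mean updates."""
    centroids = np.quantile(values, np.linspace(0.1, 0.9, 2 ** bits))
    for _ in range(iters):
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(idx == k):
                centroids[k] = values[idx == k].mean()
    return centroids

rng = np.random.default_rng(42)                 # mirrors mxtq_seed=42
w = rng.standard_normal(256).astype(np.float32)
signs = rng.choice([-1.0, 1.0], size=w.shape)   # deterministic rotation signs

rotated = fwht(w * signs)                       # randomized Hadamard rotation
codebook = lloyd_max(rotated, bits=2)           # 4-entry codebook -> 2-bit codes
codes = np.abs(rotated[:, None] - codebook[None, :]).argmin(axis=1)

decoded = fwht(codebook[codes]) * signs         # decode: lookup + inverse rotation
print(np.abs(decoded - w).mean())               # small but nonzero reconstruction error
```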
Recipe
| Tensor class | Bits | Codec |
|---|---|---|
| Routed experts (all 256 × 43 layers, uniform) | 2-bit | MXTQ codebook |
| Attention (wq_a/wq_b/wkv/wo_a/wo_b) | 8-bit | affine gs=32 |
| Shared experts | 8-bit | affine gs=32 |
| Compressor + Indexer (long-ctx) | 8-bit | affine gs=32 |
| embed_tokens, lm_head | 8-bit | affine gs=32 |
| Norms / router gate / mHC | fp16 | passthrough |
vs JANGTQ (premium): JANGTQ uses a per-importance plan (hash-routed L0-L2 experts at 4-bit MXTQ), while JANGTQ2 is uniform 2-bit MXTQ — simpler, with a smaller risk surface.
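For the 8-bit rows, "affine gs=32" denotes per-group asymmetric quantization with one scale/zero pair per 32 weights. A minimal sketch of what that generally means — not the `jang_tools` kernel:

```python
import numpy as np

def affine_quant(w, bits=8, group_size=32):
    """Per-group affine quantization: w ≈ q * scale + zero, one (scale, zero) per group."""
    g = w.reshape(-1, group_size).astype(np.float32)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2 ** bits - 1)
    scale[scale == 0] = 1.0  # guard all-constant groups
    q = np.clip(np.round((g - lo) / scale), 0, 2 ** bits - 1).astype(np.uint8)
    return q, scale, lo

def affine_dequant(q, scale, zero):
    return q.astype(np.float32) * scale + zero

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
q, scale, zero = affine_quant(w)
print(np.abs(affine_dequant(q, scale, zero).ravel() - w).max())  # ~1e-2 for Gaussian weights
```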
Use
```python
import os
os.environ["JANG_WIRED_LIMIT_GB"] = "160"  # Mac Studio M3 Ultra
# Long context (optional):
# os.environ["VMLX_DSV4_LONG_CTX"] = "1"

import mlx.core as mx
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm.generate import generate

model, tok = load_jangtq_model("OsaurusAI/DeepSeek-V4-Flash-JANGTQ2")

text = tok.apply_chat_template(
    [{"role": "user", "content": "What is 2+2?"}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tok, prompt=text, max_tokens=200, verbose=True))
```
Bundle comparison (DeepSeek-V4-Flash family, MMLU 200q logit, fair seed)
| Bundle | Size | MMLU 200q | Tok/s |
|---|---|---|---|
| DeepSeek-V4-Flash-JANGTQ (premium) | 79 GB | 69.50% | 25.91 |
| DeepSeek-V4-Flash-JANGTQ2 (this) | 79.6 GB | 70.00% | 22.34 |
| DeepSeek-V4-Flash-JANG_2L | 107 GB | 71.50% | 23.77 |
| mlx-community/DeepSeek-V4-Flash-2bit-DQ | 90 GB | 50.00% | 36.03 |
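For context on the "logit" column label: logit scoring for multiple-choice benchmarks usually means one forward pass per question, picking the answer letter whose token receives the highest logit at the prompt's final position. A minimal sketch under that assumption (the card does not publish its harness, so treat this as illustrative):

```python
import mlx.core as mx

def pick_choice(model, tok, prompt, choices=("A", "B", "C", "D")):
    """Score each answer letter by its logit at the prompt's last position."""
    ids = mx.array([tok.encode(prompt)])
    logits = model(ids)[0, -1]                         # (vocab,) logits at last token
    letter_ids = [tok.encode(c)[-1] for c in choices]  # token id of each letter
    scores = mx.take(logits, mx.array(letter_ids))
    return choices[mx.argmax(scores).item()]
```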
HumanEval+ pass@1
Coming soon — a comprehensive pass@1 evaluation is in flight.
Credits
Created by Jinho Jang — eric@jangq.ai
Built on top of DeepSeek-V4-Flash (deepseek-ai).
Distributed via Osaurus AI.