OsaurusAI/Mistral-Medium-3.5-128B-mxfp4

Quantized mistralai/Mistral-Medium-3.5-128B for Apple Silicon (MLX): a dense 128B-parameter multimodal LLM with image input and a 256K context.

Source: mistralai/Mistral-Medium-3.5-128B
Architecture: mistral3 wrapper (ministral3 text decoder: 88L × 12288, GQA 96/8; pixtral vision encoder: 48L × 1664)
Quant format: MXFP4 (mlx 4-bit affine, group_size=32)
Bundle size on disk: 85.72 GB (78 safetensors shards)
License: Apache-2.0 (inherits from upstream)
Modalities: text + image in / text out (no audio, no video)
Context: 262,144 tokens (YaRN, factor=64 from orig=4096)
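
The 256K figure is just the YaRN arithmetic from the spec line above: 4096 original positions × factor 64 = 262,144. A Hugging Face-style rope_scaling entry for this is sketched below; the key names follow the common YaRN convention and are an assumption, not copied from this bundle's config.json.

# Hypothetical rope_scaling block, expressed as a Python dict; key names
# assume the usual Hugging Face YaRN convention.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 64.0,
    "original_max_position_embeddings": 4096,
}
assert int(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]) == 262_144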

What's quantized

  • Text decoder linears + embed_tokens → mlx 4-bit affine (bits=4, group_size=32)
  • vision_tower.*, multi_modal_projector.*, lm_head → fp16 passthrough
  • All RMSNorms → fp16 passthrough

Vision tower (model.vision_tower.*: 48 layers, hidden 1664, patch=14, image_size=1540, spatial_merge=2) is kept bf16 and stored as fp16 passthrough, matching upstream quantization_config.modules_to_not_convert. Images dispatch to this pixtral encoder; the resulting embeddings are folded into the LM via a 2-layer GELU multimodal projector (also fp16 passthrough).
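
To make those vision parameters concrete, here is the back-of-envelope token arithmetic they imply. The exact resizing and rounding is the processor's job; this is only the patch math, and the function name is illustrative.

# Patch arithmetic implied by patch=14 and spatial_merge=2: the encoder
# tiles the image into 14×14 patches, then a 2×2 spatial merge folds each
# group of four patch embeddings into one LM token.
def image_tokens(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    rows, cols = height // patch, width // patch
    return (rows // merge) * (cols // merge)

print(image_tokens(1540, 1540))  # 110×110 patches → 55×55 = 3025 LM tokens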

lm_head is fp16 passthrough (matches upstream ignored set) — no quantization noise on the final logits.
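
A minimal sketch of this selection rule, using the class_predicate hook of mlx.nn.quantize. The toy module below only stands in for the real mistral3 tree; the skip prefixes are the ones listed above, and this is not the bundle's actual conversion script.

import mlx.nn as nn

SKIP_PREFIXES = ("vision_tower", "multi_modal_projector", "lm_head")

def should_quantize(path: str, module: nn.Module) -> bool:
    # fp16 passthrough for the vision encoder, projector, and final head;
    # norms are skipped automatically because only Linear/Embedding layers
    # are quantizable.
    if any(p in path for p in SKIP_PREFIXES):
        return False
    return isinstance(module, (nn.Linear, nn.Embedding))

class Toy(nn.Module):  # stand-in for the loaded mistral3 module tree
    def __init__(self):
        super().__init__()
        self.decoder_proj = nn.Linear(64, 64)
        self.lm_head = nn.Linear(64, 64)

model = Toy()
nn.quantize(model, group_size=32, bits=4, class_predicate=should_quantize)
# decoder_proj is now quantized 4-bit; lm_head stays a plain fp16 Linear.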

Codec round-trip validation (this bundle)

7/7 checks PASS at cosine ≥ 0.94, spanning L0–L87 and covering attn + MLP + GQA k/v projections, consistent with the expected 4-bit quantization noise floor. Dequantizing the FP8 e4m3 source uses the per-tensor weight_scale_inv scale; vision/projector/lm_head tensors pass through unchanged.
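
For illustration, the shape of such a round-trip check is sketched below. Treating weight_scale_inv as a per-tensor multiplier and the 0.94 floor follow the description above; everything else is an assumption, not the bundle's actual tooling.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layer_passes(w_src: np.ndarray, weight_scale_inv: float,
                 w_roundtrip: np.ndarray, floor: float = 0.94) -> bool:
    # w_src: FP8 e4m3 source values already upcast to fp32 (numpy has no
    # native fp8 dtype); weight_scale_inv is assumed to be the per-tensor
    # multiplier that restores the original magnitude.
    w_ref = w_src * np.float32(weight_scale_inv)
    return cosine(w_ref, w_roundtrip) >= floor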

Run on Apple Silicon

pip install mlx safetensors transformers pillow
python -m jang_tools.mistral3.runtime \
    --src ~/.mlxstudio/models/OsaurusAI/Mistral-Medium-3.5-128B-mxfp4 \
    --prompt "Describe this image." \
    --image /path/to/photo.jpg \
    --max-new 64

The runtime auto-detects weight_format and dispatches; image preprocessing matches the pixtral spec (jang_tools.vl.pixtral.PixtralImageProcessor).

Build

python -m jang_tools.convert_mistral3_mxfp4 \
    ~/.mlxstudio/models/_sources/Mistral-Medium-3.5-128B \
    ~/.mlxstudio/models/OsaurusAI/Mistral-Medium-3.5-128B-mxfp4

Credits

Quantized by Jinho Jang (eric@osaurus.ai). MLX-native pipeline; 88 dense decoder layers + 48 pixtral vision layers run on M-series Macs.
