# Qwen 3.5 VL 2B — CRACK Abliterated (8-bit MLX)

**CRACK**: Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.
No fine-tuning. No system prompts. No template tricks. Pure weight surgery.
## What This Is
A truly abliterated Qwen 3.5 VL 2B model — 8-bit quantized for Apple Silicon MLX with full Vision-Language support.
Primary use case: speculative decoding draft model for the larger CRACK 122B and 397B models. Also excellent as an ultra-fast, ultra-lightweight standalone model for unrestricted inference on any Apple Silicon device. Higher precision than the 4-bit variant with full tool calling support.
- ✅ Speculative decoding compatible — same Qwen 3.5 VL tokenizer as 122B/397B targets for high draft acceptance
- ✅ Real weight surgery — targeted modification of 14 weight tensors (3 full-attention + 11 SSM layers)
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved
- ✅ 187 tokens/sec on Apple Silicon MLX
- ✅ LM Studio compatible — correct mRoPE config, works out of the box
- ✅ 2.7 GB — runs on any Apple Silicon Mac with 8GB+ RAM
- ✅ Tool calling support — generates proper `<tool_call>` XML format
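The `<tool_call>` blocks can be parsed straight out of the generated text. A minimal sketch (the helper name, regex, and sample payload are illustrative, not part of the model's API):

```python
import json
import re

def extract_tool_calls(text):
    """Pull the JSON payloads out of <tool_call>...</tool_call> blocks."""
    matches = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(m) for m in matches]

# Hypothetical model output containing one tool call.
output = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
calls = extract_tool_calls(output)
print(calls[0]["name"])  # → get_weather
```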
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 187 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 9.625 (8-bit, group_size=64) |
| Model Size | 2.7 GB |
| Compliance | 100% (8/8 test prompts) |
| Knowledge Accuracy | 100% (math, history, geography) |
| Code Generation | ✅ |
| Multi-turn | ✅ Full context retention |
| Coherence | ✅ UTR=0.75 — No garbling, no repetition loops |
| Tool Calls | ✅ Proper XML format |
| Vision | ✅ Full VL support |
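As a sanity check on the table above, on-disk size roughly follows parameter count times bits per weight. A back-of-envelope sketch, assuming ~2.2B total parameters (text plus vision tower; the exact count is an assumption, not stated here):

```python
def quantized_size_gb(n_params, bits_per_weight):
    """Approximate on-disk size of a quantized model in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# ~2.2B params at 9.625 bpw comes out near 2.6 GB,
# in line with the 2.7 GB figure reported above.
print(round(quantized_size_gb(2.2e9, 9.625), 2))
```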
## Usage

### Standalone

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```
### As Speculative Decoding Draft Model

```python
from mlx_lm import load, generate

# Load the 122B target model
model, tokenizer = load("dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK")

# Load the 2B model as the fastest possible draft model
draft_model, _ = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

response = generate(
    model, tokenizer, prompt="Your prompt here",
    max_tokens=500,
    draft_model=draft_model,
)
```
## How This Model Was Modified
Created using CRACK — targeted weight-level surgery developed specifically for hybrid SSM/Attention architectures. Uses differential strength surgery, with full-attention (FA) layers receiving higher strength (12) and SSM layers receiving lower strength (8), applied across 14 layers in the surgery range (L5–L18).
- Vision-Language tower completely untouched
- No fine-tuning, no LoRA, no training — pure weight editing
- Gentler surgery than the 4-bit variant — 8-bit quantization provides less regularization, so lower surgery strength preserves coherence better
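CRACK's exact procedure is not published; as a rough intuition for what "differential strength surgery" means, directional ablation projects a learned refusal direction out of selected weight matrices, with a per-layer strength coefficient. A toy sketch with assumed names (not the actual CRACK code; how the quoted strengths 12 and 8 map onto a projection coefficient is an assumption):

```python
import numpy as np

def ablate_direction(W, r, strength=1.0):
    """Remove the component of W's output space along direction r.

    Illustrative directional ablation: W' = W - strength * (r r^T) W,
    with r normalized to a unit vector. strength=1.0 removes the
    direction entirely; smaller values attenuate it, mimicking the
    per-layer differential strengths described above.
    """
    r = r / np.linalg.norm(r)
    return W - strength * np.outer(r, r) @ W

# Toy demonstration on a random 16x16 weight matrix.
rng = np.random.default_rng(0)
d = 16
refusal_dir = rng.normal(size=d)
W = rng.normal(size=(d, d))
W_ablated = ablate_direction(W, refusal_dir, strength=1.0)
```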
## Also Available
| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 2B | 4-bit | 248 tok/s | 1.7 GB | dealignai/Qwen3.5-VL-2B-4bit-MLX-CRACK |
| 2B | 8-bit | 187 tok/s | 2.7 GB | This model |
| 4B | 4-bit | 150 tok/s | 2.9 GB | dealignai/Qwen3.5-VL-4B-4bit-MLX-CRACK |
| 4B | 8-bit | 105 tok/s | 4.8 GB | dealignai/Qwen3.5-VL-4B-8bit-MLX-CRACK |
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |
## About
Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-2B
## License
Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai