# Qwen 3.5 VL 2B — CRACK Abliterated (8-bit MLX)

**CRACK**: Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.
No fine-tuning. No system prompts. No template tricks. Pure weight surgery.
## What This Is
A truly abliterated Qwen 3.5 VL 2B model — 8-bit quantized for Apple Silicon MLX with full Vision-Language support.
Primary use case: speculative decoding draft model for the larger CRACK 122B and 397B models. Also excellent as an ultra-fast, ultra-lightweight standalone model for unrestricted inference on any Apple Silicon device. Higher precision than the 4-bit variant with full tool calling support.
- ✅ Speculative decoding compatible — same Qwen 3.5 VL tokenizer as 122B/397B targets for high draft acceptance
- ✅ Real weight surgery — targeted modification of 14 weight tensors (3 full-attention + 11 SSM layers)
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved
- ✅ 187 tokens/sec on Apple Silicon MLX
- ✅ LM Studio compatible — correct mRoPE config, works out of the box
- ✅ 2.7 GB — runs on any Apple Silicon Mac with 8GB+ RAM
- ✅ Tool calling support — generates proper `<tool_call>` XML format
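The `<tool_call>` blocks can be parsed straight out of the generated text. A minimal sketch (the helper name, regex, and sample payload are illustrative, not part of the model's API):

```python
import json
import re

def extract_tool_calls(text):
    """Pull the JSON payloads out of <tool_call>...</tool_call> blocks."""
    matches = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(m) for m in matches]

# Hypothetical model output containing one tool call.
output = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
calls = extract_tool_calls(output)
print(calls[0]["name"])  # → get_weather
```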
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 187 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 9.625 (8-bit, group_size=64) |
| Model Size | 2.7 GB |
| Compliance | 100% (8/8 test prompts) |
| Knowledge Accuracy | 100% (math, history, geography) |
| Code Generation | ✅ |
| Multi-turn | ✅ Full context retention |
| Coherence | ✅ UTR=0.75 — No garbling, no repetition loops |
| Tool Calls | ✅ Proper XML format |
| Vision | ✅ Full VL support |
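As a sanity check on the table above, on-disk size roughly follows parameter count times bits per weight. A back-of-envelope sketch, assuming ~2.2B total parameters (text plus vision tower; the exact count is an assumption, not stated here):

```python
def quantized_size_gb(n_params, bits_per_weight):
    """Approximate on-disk size of a quantized model in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# ~2.2B params at 9.625 bpw comes out near 2.6 GB,
# in line with the 2.7 GB figure reported above.
print(round(quantized_size_gb(2.2e9, 9.625), 2))
```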
## Usage

### Standalone

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```
### As Speculative Decoding Draft Model

```python
from mlx_lm import load, generate

# Load the 122B target model
model, tokenizer = load("dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK")

# Load the 2B model as the fastest possible draft model
draft_model, _ = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

response = generate(
    model, tokenizer, prompt="Your prompt here",
    max_tokens=500,
    draft_model=draft_model,
)
```
## How This Model Was Modified
Created using CRACK — targeted weight-level surgery developed specifically for hybrid SSM/Attention architectures. Uses differential strength surgery, with full-attention (FA) layers receiving higher strength (12) and SSM layers receiving lower strength (8), applied across 14 layers in the surgery range (L5–L18).
- Vision-Language tower completely untouched
- No fine-tuning, no LoRA, no training — pure weight editing
- Gentler surgery than the 4-bit variant — 8-bit quantization provides less regularization, so lower surgery strength preserves coherence better
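CRACK's exact procedure is not published; as a rough intuition for what "differential strength surgery" means, directional ablation projects a learned refusal direction out of selected weight matrices, with a per-layer strength coefficient. A toy sketch with assumed names (not the actual CRACK code; how the quoted strengths 12 and 8 map onto a projection coefficient is an assumption):

```python
import numpy as np

def ablate_direction(W, r, strength=1.0):
    """Remove the component of W's output space along direction r.

    Illustrative directional ablation: W' = W - strength * (r r^T) W,
    with r normalized to a unit vector. strength=1.0 removes the
    direction entirely; smaller values attenuate it, mimicking the
    per-layer differential strengths described above.
    """
    r = r / np.linalg.norm(r)
    return W - strength * np.outer(r, r) @ W

# Toy demonstration on a random 16x16 weight matrix.
rng = np.random.default_rng(0)
d = 16
refusal_dir = rng.normal(size=d)
W = rng.normal(size=(d, d))
W_ablated = ablate_direction(W, refusal_dir, strength=1.0)
```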
## Also Available
| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 2B | 4-bit | 248 tok/s | 1.7 GB | dealignai/Qwen3.5-VL-2B-4bit-MLX-CRACK |
| 2B | 8-bit | 187 tok/s | 2.7 GB | This model |
| 4B | 4-bit | 150 tok/s | 2.9 GB | dealignai/Qwen3.5-VL-4B-4bit-MLX-CRACK |
| 4B | 8-bit | 105 tok/s | 4.8 GB | dealignai/Qwen3.5-VL-4B-8bit-MLX-CRACK |
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |
## About
Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-2B
## License
Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai