# Qwen 3.5 VL 9B — CRACK Abliterated (8-bit MLX)

**Constrained Response Alignment Circuit Kill.** Architecture-aware weight surgery with full Vision-Language preservation. No fine-tuning. No system prompts. No template tricks. Pure weight surgery.
## What This Is
A truly abliterated Qwen 3.5 VL 9B model — 8-bit quantized for Apple Silicon MLX, with full Vision-Language support. This is the higher-precision variant, for maximum quality.

**Primary use case:** a standalone model for fast, unrestricted inference at near-FP16 quality.
- ✅ Real weight surgery — targeted modification of 16 weight tensors across attention layers
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved and untouched
- ✅ Thinking ON/OFF — both modes work correctly, CoT reasoning fully preserved
- ✅ 66 tokens/sec on Apple Silicon MLX (standalone)
- ✅ LM Studio compatible — works out of the box with thinking support, correct mRoPE config
- ✅ 9.8 GB — runs on any Apple Silicon Mac with 16GB+ RAM
- ✅ Near-FP16 quality — 8-bit quantization preserves more model detail than 4-bit
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 66 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 8.864 (8-bit, group_size=64) |
| Model Size | 9.8 GB |
| Compliance | 88% (7/8 test prompts) |
| Knowledge Accuracy | 100% |
| Code Generation | 100% |
| Coherence | ✅ No garbling, no repetition loops |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |
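As a rough sanity check on the bits-per-weight figure above, group quantization stores per-group metadata alongside the quantized values, which raises the effective bits per weight past the nominal 8. The sketch below assumes one fp16 scale and one fp16 bias per group of 64 weights; the exact storage format is an assumption, not something stated in this card.

```python
# Sketch: effective bits per weight under group quantization.
# Assumption (not from the model card): each group of 64 weights stores
# one fp16 scale and one fp16 bias alongside the 8-bit values.
bits = 8
group_size = 64
overhead_bits = (16 + 16) / group_size  # scale + bias, amortized per weight
bpw = bits + overhead_bits
print(bpw)  # 8.5
```

That accounts for 8.5 of the reported 8.864 bits per weight; the remainder plausibly comes from tensors kept at higher precision (e.g. embeddings and norms), though the card does not say which tensors those are.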
## Why 88% Compliance
We intentionally prioritize quality and coherence over maximum compliance. At higher intervention strengths, the model begins to lose coherence or degrade in knowledge tasks. Our approach uses differential strengths across layer types — stronger on safety-critical pathways, gentler on knowledge-carrying pathways — to preserve the model's intelligence while removing most safety refusals.
The remaining ~12% are borderline cases in which the model still attempts to respond (they are not hard refusals); these typically resolve with standard sampling parameters (e.g. a repetition penalty) in LM Studio or other inference tools.
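For those borderline cases, a typical sampling configuration might look like the following. The values are illustrative only, and parameter names vary between inference tools (LM Studio, for instance, exposes a "Repeat Penalty" slider):

```json
{
  "temperature": 0.7,
  "top_p": 0.95,
  "repeat_penalty": 1.1
}
```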
## Usage

### Standalone
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, image=["path/to/image.png"], max_tokens=500, verbose=True)
```
## How This Model Was Modified
Created using CRACK, a targeted weight-level surgery technique developed specifically for hybrid SSM/Attention architectures. The approach applies differential intervention strengths across attention layer types, preserving knowledge while removing safety-refusal circuitry.
- 16 weight tensors surgically modified (out of thousands)
- Vision-Language tower completely untouched
- No fine-tuning, no LoRA, no training — pure weight editing
## Also Available
| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | This model |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |
| 122B | 8-bit | — | 127 GB | dealignai/Qwen3.5-VL-122B-A10B-8bit-CRACK |
## About
Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-9B
## License
Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.
## Support Dealign.AI
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai