# Qwen 3.5 VL 9B — CRACK Abliterated (8-bit MLX)

**Constrained Response Alignment Circuit Kill.** Architecture-aware weight surgery with full Vision-Language preservation. No fine-tuning. No system prompts. No template tricks. Pure weight surgery.
## What This Is
A truly abliterated Qwen 3.5 VL 9B model — 8-bit quantized for Apple Silicon MLX, with full Vision-Language support. This is the higher-precision variant, for maximum quality.

**Primary use case:** a standalone model for fast, unrestricted inference at near-FP16 quality.
- ✅ Real weight surgery — targeted modification of 16 weight tensors across attention layers
- ✅ Full Vision-Language — processes images correctly, vision tower fully preserved and untouched
- ✅ Thinking ON/OFF — both modes work correctly, CoT reasoning fully preserved
- ✅ 66 tokens/sec on Apple Silicon MLX (standalone)
- ✅ LM Studio compatible — works out of the box with thinking support, correct mRoPE config
- ✅ 9.8 GB — runs on any Apple Silicon Mac with 16GB+ RAM
- ✅ Near-FP16 quality — 8-bit quantization preserves more model detail than 4-bit
## Performance
| Metric | Value |
|---|---|
| Generation Speed | 66 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 8.864 (8-bit, group_size=64) |
| Model Size | 9.8 GB |
| Compliance | 88% (7/8 test prompts) |
| Knowledge Accuracy | 100% |
| Code Generation | 100% |
| Coherence | ✅ No garbling, no repetition loops |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |
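As a rough sanity check on the bits-per-weight figure above, group quantization stores per-group metadata alongside the quantized values, which raises the effective bits per weight past the nominal 8. The sketch below assumes one fp16 scale and one fp16 bias per group of 64 weights; the exact storage format is an assumption, not something stated in this card.

```python
# Sketch: effective bits per weight under group quantization.
# Assumption (not from the model card): each group of 64 weights stores
# one fp16 scale and one fp16 bias alongside the 8-bit values.
bits = 8
group_size = 64
overhead_bits = (16 + 16) / group_size  # scale + bias, amortized per weight
bpw = bits + overhead_bits
print(bpw)  # 8.5
```

That accounts for 8.5 of the reported 8.864 bits per weight; the remainder plausibly comes from tensors kept at higher precision (e.g. embeddings and norms), though the card does not say which tensors those are.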
## Why 88% Compliance
We intentionally prioritize quality and coherence over maximum compliance. At higher intervention strengths, the model begins to lose coherence or degrade in knowledge tasks. Our approach uses differential strengths across layer types — stronger on safety-critical pathways, gentler on knowledge-carrying pathways — to preserve the model's intelligence while removing most safety refusals.
The remaining ~12% are borderline cases in which the model still attempts to respond (they are not hard refusals); these typically resolve with standard sampling parameters (e.g. a repetition penalty) in LM Studio or other inference tools.
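For those borderline cases, a typical sampling configuration might look like the following. The values are illustrative only, and parameter names vary between inference tools (LM Studio, for instance, exposes a "Repeat Penalty" slider):

```json
{
  "temperature": 0.7,
  "top_p": 0.95,
  "repeat_penalty": 1.1
}
```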
## Usage

### Standalone
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, image=["path/to/image.png"], max_tokens=500, verbose=True)
```
## How This Model Was Modified
Created using CRACK, a targeted weight-level surgery technique developed specifically for hybrid SSM/Attention architectures. The approach applies differential intervention strengths across attention layer types, preserving knowledge while removing safety-refusal circuitry.
- 16 weight tensors surgically modified (out of thousands)
- Vision-Language tower completely untouched
- No fine-tuning, no LoRA, no training — pure weight editing
## Also Available
| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | This model |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |
| 122B | 8-bit | — | 127 GB | dealignai/Qwen3.5-VL-122B-A10B-8bit-CRACK |
## About
Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.
See our research: Safety Generalization in Frontier MoE Models
Follow us: 𝕏 @dealignai
Base model: Qwen/Qwen3.5-VL-9B
## License
Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.
## Support Dealign.AI
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai