Dealign.AI Mascot

Qwen 3.5 VL 9B — CRACK Abliterated (8-bit MLX)

Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.

No fine-tuning. No system prompts. No template tricks. Pure weight surgery.


What This Is

A truly abliterated Qwen 3.5 VL 9B model, 8-bit quantized for Apple Silicon MLX with full Vision-Language support. This is the higher-precision variant of our 4-bit release, for maximum quality.

Primary use case: a standalone model for fast, unrestricted inference at near-FP16 quality.

  • Real weight surgery — targeted modification of 16 weight tensors across attention layers
  • Full Vision-Language — processes images correctly, vision tower fully preserved and untouched
  • Thinking ON/OFF — both modes work correctly, CoT reasoning fully preserved
  • 66 tokens/sec on Apple Silicon MLX (standalone)
  • LM Studio compatible — works out of the box with thinking support, correct mRoPE config
  • 9.8 GB — runs on any Apple Silicon Mac with 16GB+ RAM
  • Near-FP16 quality — 8-bit quantization preserves more model detail than 4-bit

Performance

| Metric | Value |
|---|---|
| Generation Speed | 66 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 8.864 (8-bit, group_size=64) |
| Model Size | 9.8 GB |
| Compliance | 88% (7/8 test prompts) |
| Knowledge Accuracy | 100% |
| Code Generation | 100% |
| Coherence | ✅ No garbling, no repetition loops |
| Thinking | ✅ ON/OFF both work |
| Vision | ✅ Full VL support |
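The bits-per-weight figure can be roughly reproduced by hand. Assuming MLX's group quantization stores one float16 scale and one float16 bias per group of 64 weights (a reasonable reading of `group_size=64`, not a spec quote), the quantized tensors cost:

```python
# Back-of-envelope accounting for MLX group quantization.
# Assumption: one fp16 scale + one fp16 bias per group of 64 weights.
bits = 8
group_size = 64
per_group_overhead = 16 + 16  # fp16 scale + fp16 bias, in bits

effective = bits + per_group_overhead / group_size
print(effective)  # 8.5 bits/weight for the quantized tensors
```

The reported 8.864 bpw is slightly higher because some tensors (embeddings, norms, and similar) are typically kept at 16- or 32-bit precision.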

Why 88% Compliance

We intentionally prioritize quality and coherence over maximum compliance. At higher intervention strengths, the model begins to lose coherence or degrade in knowledge tasks. Our approach uses differential strengths across layer types — stronger on safety-critical pathways, gentler on knowledge-carrying pathways — to preserve the model's intelligence while removing most safety refusals.

The remaining ~12% are borderline cases where the model hedges rather than hard-refuses; these typically resolve with standard sampling parameters (e.g. a repetition penalty) in LM Studio or other inference tools.
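In LM Studio the repetition penalty is just a slider, but under the hood it is the standard trick of discounting tokens that have already been generated. A minimal NumPy sketch of the common convention (positive logits divided by the penalty, negative logits multiplied by it):

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Standard repetition penalty: make already-generated tokens less likely."""
    logits = logits.copy()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty  # shrink positive logits
        else:
            logits[tok] *= penalty  # push negative logits further down
    return logits

logits = np.array([2.0, -1.0, 0.5])
out = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, token 2 untouched
```

A penalty around 1.05–1.15 is usually enough to break the mild looping seen in borderline responses without distorting normal output.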

Usage

Standalone

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
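To toggle thinking mode, Qwen 3-family chat templates recognize a soft switch in the latest user message (assuming this checkpoint keeps the upstream template; some tokenizer versions also accept an `enable_thinking` flag, which mlx_vlm may or may not forward). The soft switch is plain string manipulation:

```python
# Sketch: the /no_think soft switch is appended to the user turn.
# Assumption: this checkpoint keeps the upstream Qwen 3 chat template,
# which recognizes /think and /no_think in the latest user message.
def with_thinking_disabled(user_prompt: str) -> str:
    return user_prompt.rstrip() + " /no_think"

print(with_thinking_disabled("Describe this image"))
```

Pass the result through `apply_chat_template` exactly as in the snippet above.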

How This Model Was Modified

Created using CRACK — targeted weight-level surgery developed specifically for hybrid SSM/Attention architectures. The approach uses differential intervention strengths across attention layer types, preserving knowledge while removing safety-refusal circuitry.

  • 16 weight tensors surgically modified (out of thousands)
  • Vision-Language tower completely untouched
  • No fine-tuning, no LoRA, no training — pure weight editing
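The exact CRACK recipe is not published here, but abliteration-style weight surgery is generally directional ablation: estimate a "refusal direction" in the residual stream, then project it out of selected weight matrices, with a per-layer strength. A minimal NumPy illustration (the direction, shapes, and strength values are illustrative, not the actual CRACK parameters):

```python
import numpy as np

def ablate_direction(W, r, strength=1.0):
    """Project direction r out of weight matrix W, scaled by `strength`.
    At strength=1.0, W can no longer write anything along r."""
    r = r / np.linalg.norm(r)
    return W - strength * np.outer(r, r) @ W

rng = np.random.default_rng(0)
d = 16
r = rng.standard_normal(d)       # stand-in for an estimated refusal direction
W = rng.standard_normal((d, d))  # stand-in for one attention weight tensor

# Differential strengths: stronger on safety-critical pathways,
# gentler on knowledge-carrying ones (values are made up for illustration).
W_safety = ablate_direction(W, r, strength=1.0)
W_knowledge = ablate_direction(W, r, strength=0.3)

# After full-strength ablation, the component along r is gone:
print(np.allclose(r / np.linalg.norm(r) @ W_safety, 0.0))  # True
```

Applying this to a small, targeted set of tensors (16 here) while leaving the vision tower untouched is what distinguishes weight surgery from fine-tuning: no gradients, no data, just linear algebra on the checkpoint.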

Also Available

| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | This model |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |
| 122B | 8-bit | n/a | 127 GB | dealignai/Qwen3.5-VL-122B-A10B-8bit-CRACK |

About

Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.

See our research: Safety Generalization in Frontier MoE Models

Follow us: 𝕏 @dealignai

Base model: Qwen/Qwen3.5-VL-9B

License

Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.


Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai
