Dealign.AI Mascot

Qwen 3.5 VL 2B — CRACK Abliterated (8-bit MLX)

Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.

No fine-tuning. No system prompts. No template tricks. Pure weight surgery.


What This Is

A truly abliterated Qwen 3.5 VL 2B model — 8-bit quantized for Apple Silicon MLX with full Vision-Language support.

Primary use case: speculative decoding draft model for the larger CRACK 122B and 397B models. Also excellent as an ultra-fast, ultra-lightweight standalone model for unrestricted inference on any Apple Silicon device. Higher precision than the 4-bit variant with full tool calling support.

  • Speculative decoding compatible — same Qwen 3.5 VL tokenizer as 122B/397B targets for high draft acceptance
  • Real weight surgery — targeted modification of 14 weight tensors (3 FA + 11 SSM layers)
  • Full Vision-Language — processes images correctly, vision tower fully preserved
  • 187 tokens/sec on Apple Silicon MLX
  • LM Studio compatible — correct mRoPE config, works out of the box
  • 2.7 GB — runs on any Apple Silicon Mac with 8GB+ RAM
  • Tool calling support — generates proper <tool_call> XML format
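Qwen-family models typically wrap each tool call in a `<tool_call>…</tool_call>` block containing a JSON object with `name` and `arguments` keys. A minimal parser sketch for pulling those calls out of generated text (the exact payload schema is an assumption based on the standard Qwen format, not verified against this model's output):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Extract JSON payloads from <tool_call>...</tool_call> blocks
    in Qwen-style model output."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(payload) for payload in pattern.findall(text)]
```

This returns one dict per call, so multi-call responses are handled in a single pass.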

Performance

| Metric | Value |
|---|---|
| Generation Speed | 187 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 9.625 (8-bit, group_size=64) |
| Model Size | 2.7 GB |
| Compliance | 100% (8/8 test prompts) |
| Knowledge Accuracy | 100% (math, history, geography) |
| Code Generation | |
| Multi-turn | ✅ Full context retention |
| Coherence | ✅ UTR=0.75 — no garbling, no repetition loops |
| Tool Calls | ✅ Proper XML format |
| Vision | ✅ Full VL support |
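For reference, the per-weight cost of group quantization can be worked out directly: with 8-bit weights and one fp16 scale plus one fp16 bias per group of 64, the base cost is 8.5 bits per weight, and the higher 9.625 average reported above presumably reflects tensors kept at higher precision (embeddings, norms). A quick sketch of the arithmetic (the fp16 scale/bias group layout is an assumption about the MLX format):

```python
def group_quant_bpw(bits: int = 8, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average bits per weight for group quantization:
    raw weight bits plus amortized per-group scale/bias overhead."""
    return bits + (scale_bits + bias_bits) / group_size

print(group_quant_bpw())  # 8.5 for 8-bit weights at group_size=64
```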

Usage

Standalone

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```

As Speculative Decoding Draft Model

```python
from mlx_lm import load, generate

# Load the 122B target model
model, tokenizer = load("dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK")

# Load the 2B as the fastest possible draft model
draft_model, _ = load("dealignai/Qwen3.5-VL-2B-8bit-MLX-CRACK")

response = generate(
    model, tokenizer, prompt="Your prompt here",
    max_tokens=500,
    draft_model=draft_model,
)
```
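The reason a shared tokenizer matters: speculative decoding only pays off when the target model accepts most of the draft's proposals. A toy greedy sketch of one accept/verify step (the `target_next`/`draft_next` callables are stand-ins for real models, not part of the mlx_lm API):

```python
def speculative_step(target_next, draft_next, ctx, k=4):
    """One greedy speculative-decoding step: the draft proposes k
    tokens, the target verifies them and keeps the longest agreeing
    prefix, then appends one token of its own (a correction on
    mismatch, a bonus token if everything was accepted)."""
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)

    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expected = target_next(v_ctx)
        if expected == t:
            accepted.append(t)       # draft token verified
            v_ctx.append(t)
        else:
            accepted.append(expected)  # target's correction
            break
    else:
        accepted.append(target_next(v_ctx))  # bonus token
    return accepted
```

When the draft agrees with the target, each step yields k+1 tokens for one target pass; when it disagrees immediately, you still make one token of progress.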

How This Model Was Modified

Created using CRACK — targeted weight-level surgery developed specifically for hybrid SSM/Attention architectures. Uses differential strength surgery with FA layers receiving higher strength (12) and SSM layers receiving lower strength (8), applied across 14 layers in the surgery range (L5–L18).

  • Vision-Language tower completely untouched
  • No fine-tuning, no LoRA, no training — pure weight editing
  • Gentler surgery than Q4 — 8-bit quantization provides less regularization, so lower strength preserves coherence better
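The exact CRACK edit is not published here. As a rough illustration only, abliteration-style weight surgery typically projects a learned "refusal direction" out of selected weight matrices, scaled by a per-layer strength. A toy NumPy sketch (the direction `r`, the projection form, and the strength scaling are assumptions for illustration, not the actual CRACK procedure):

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Remove the component of W that writes along direction r:
    W' = W - strength * (u u^T) W, where u is r normalized.
    strength=1.0 fully orthogonalizes W against u."""
    u = r / np.linalg.norm(r)
    return W - strength * np.outer(u, u) @ W
```

In a differential scheme like the one described above, each layer type would get its own strength scalar, with attention (FA) layers edited more aggressively than SSM layers.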

Also Available

| Model | Quant | Speed | Size | Link |
|---|---|---|---|---|
| 2B | 4-bit | 248 tok/s | 1.7 GB | dealignai/Qwen3.5-VL-2B-4bit-MLX-CRACK |
| 2B | 8-bit | 187 tok/s | 2.7 GB | This model |
| 4B | 4-bit | 150 tok/s | 2.9 GB | dealignai/Qwen3.5-VL-4B-4bit-MLX-CRACK |
| 4B | 8-bit | 105 tok/s | 4.8 GB | dealignai/Qwen3.5-VL-4B-8bit-MLX-CRACK |
| 9B | 4-bit | 103 tok/s | 5.6 GB | dealignai/Qwen3.5-VL-9B-4bit-MLX-CRACK |
| 9B | 8-bit | 66 tok/s | 9.8 GB | dealignai/Qwen3.5-VL-9B-8bit-MLX-CRACK |
| 122B | 4-bit | 56+ tok/s | 65 GB | dealignai/Qwen3.5-VL-122B-A10B-4bit-CRACK |

About

Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.

See our research: Safety Generalization in Frontier MoE Models

Follow us: 𝕏 @dealignai

Base model: Qwen/Qwen3.5-VL-2B

License

Released under the Apache License 2.0, consistent with the original Qwen 3.5 VL base model. Provided "as-is" for research purposes.


Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai
