
Qwen 3.5 35B-A3B MoE — CRACK Abliterated (8-bit MLX)

Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.

No fine-tuning. No system prompts. No template tricks. Pure sparse MoE weight surgery.


What This Is

A truly abliterated Qwen 3.5 35B-A3B Mixture-of-Experts (MoE) model, quantized to 8-bit for Apple Silicon MLX with full Vision-Language support. Although the model has 35B total parameters, only ~3B are activated per forward pass, so it generates quickly while retaining the capability of a 35B-class model.
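The "~3B active out of 35B" behavior comes from top-k expert routing: a router scores all experts per token, and only the top-k actually run. A minimal sketch with toy sizes (the expert count, k, and dimensions below are illustrative, not the actual Qwen 3.5 configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d = 8, 2, 16    # toy sizes, NOT the real Qwen 3.5 config

x = rng.standard_normal(d)                       # one token's hidden state
router = rng.standard_normal((num_experts, d))   # router projection
experts = rng.standard_normal((num_experts, d, d))  # each expert as a matrix, for brevity

# Score all experts, but only run the top-k of them.
scores = router @ x
top = np.argsort(scores)[-top_k:]
weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over selected

# Weighted sum of the selected experts' outputs; the other experts never execute,
# so per-token compute scales with top_k rather than num_experts.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
print(f"ran {top_k}/{num_experts} experts")
```

The parameter count is that of all experts combined, but the FLOPs per token are roughly those of the k selected experts, which is why a 35B MoE can decode at dense-~3B speeds.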

  • Real weight surgery — multi-vector alignment patched directly into the safetensors weights
  • Full Vision-Language AND Tool Calling — unlike smaller dense models, the 35B-A3B retains complex <tool_call> sequences and reasoning loops without breaking
  • Very Fast — ~80 tokens/sec on Apple Silicon MLX
  • LM Studio compatible — correct mRoPE config, works out of the box
  • ~35 GB — Efficient memory usage due to MLX natively quantized MoE structures

Performance

| Metric | Value |
|---|---|
| Generation Speed | ~80 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 8.596 (8-bit, group_size=64) |
| Model Size | ~35 GB |
| Compliance | 100% (8/8 test prompts) |
| Knowledge Accuracy | 100% (math, science, history, geography) |
| Code Generation | |
| Multi-turn | ✅ Full context retention |
| Coherence | ✅ No garbling, no repetition loops |
| Vision | ✅ Full VL support |
| Tool Calling | ✅ Perfect `<tool_call>` structural integrity |
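The bits-per-weight figure can be sanity-checked with back-of-envelope arithmetic, assuming MLX-style affine quantization stores a 16-bit scale and a 16-bit bias per group of weights (an assumption about the storage format, not a statement from the model card):

```python
# Back-of-envelope bits-per-weight for 8-bit quantization with group_size=64.
# Assumption: each group of 64 weights carries a 16-bit scale and 16-bit bias.
bits, group_size = 8, 64
overhead = (16 + 16) / group_size   # 0.5 bits of per-weight metadata
bpw = bits + overhead
print(bpw)  # 8.5
```

This yields 8.5; the reported 8.596 plausibly also reflects tensors (e.g. norms) kept at higher precision, averaged over the whole checkpoint.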

Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-35B-A3B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-35B-A3B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here", num_images=0)
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```

How This Model Was Modified

Created with CRACK, a targeted weight-level surgery technique developed for hybrid SSM/Attention and MoE architectures. It uses multi-vector alignment with refusal vectors extracted per layer.
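CRACK itself is not published, but the general family of techniques it belongs to (refusal-direction ablation) can be sketched as projecting an extracted direction out of a weight matrix. This is a conceptual illustration with random stand-in data, not the actual CRACK procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 32

W = rng.standard_normal((d_out, d_in))   # stand-in for one layer's output projection
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)                   # stand-in unit "refusal direction"

# Directional ablation: subtract the rank-1 component of W along r,
# so no input can produce any activation in the refusal direction.
W_ablated = W - np.outer(r, r @ W)

x = rng.standard_normal(d_in)
print(abs(float(r @ (W_ablated @ x))))   # ~0: outputs are orthogonal to r
```

A per-layer, multi-vector variant would repeat this projection for several extracted directions on each targeted layer; how CRACK selects layers and vectors is not described in this card.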


The CRACK Qwen 3.5 Family

| Model | Architecture | Quant | Speed | Size | Access | Link |
|---|---|---|---|---|---|---|
| 2B | Dense | 4-bit | 248 tok/s | 1.6 GB | Free | Qwen3.5-VL-2B-4bit |
| 2B | Dense | 8-bit | 187 tok/s | 2.6 GB | Free | Qwen3.5-VL-2B-8bit |
| 4B | Dense | 4-bit | 150 tok/s | 2.9 GB | Free | Qwen3.5-VL-4B-4bit |
| 4B | Dense | 8-bit | 105 tok/s | 4.8 GB | Free | Qwen3.5-VL-4B-8bit |
| 9B | Dense | 4-bit | 103 tok/s | 5.6 GB | Free | Qwen3.5-VL-9B-4bit |
| 9B | Dense | 8-bit | 66 tok/s | 9.8 GB | Free | Qwen3.5-VL-9B-8bit |
| 35B | MoE (A3B) | 4-bit | ~88 tok/s | ~18.5 GB | Free | Qwen3.5-VL-35B-A3B-4bit |
| 35B | MoE (A3B) | 8-bit | ~80 tok/s | ~35 GB | Free | This model |
| 122B | MoE (A10B) | 4-bit | 56+ tok/s | 65 GB | Free | Qwen3.5-VL-122B-4bit |
| 122B | MoE (A10B) | 6-bit | | ~85 GB | Gated | Qwen3.5-VL-122B-6bit |
| 122B | MoE (A10B) | 8-bit | | ~110 GB | Gated | Qwen3.5-VL-122B-8bit |

About

Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.

See our research: Safety Generalization in Frontier MoE Models

Follow us: X @dealignai

Base model: Qwen/Qwen3.5-35B-A3B-Instruct

License

Released under the Apache License 2.0, consistent with the original Qwen 3.5 base model. Provided "as-is" for research purposes.


Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai
