
Qwen 3.5 35B-A3B MoE — CRACK Abliterated (8-bit MLX)

Constrained Response Alignment Circuit Kill

Architecture-aware weight surgery with full Vision-Language preservation.

No fine-tuning. No system prompts. No template tricks. Pure sparse MoE weight surgery.


What This Is

A truly abliterated Qwen 3.5 35B-A3B Mixture-of-Experts (MoE) model, quantized to 8-bit for Apple Silicon MLX with full Vision-Language support. Although the model has 35B total parameters, only ~3B are activated per forward pass, so it generates quickly while retaining the capability of a 35B-class model.
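The "~3B active out of 35B" behavior comes from top-k expert routing: a router scores all experts per token, and only the top-k actually run. A minimal sketch with toy sizes (the expert count, k, and dimensions below are illustrative, not the actual Qwen 3.5 configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d = 8, 2, 16    # toy sizes, NOT the real Qwen 3.5 config

x = rng.standard_normal(d)                       # one token's hidden state
router = rng.standard_normal((num_experts, d))   # router projection
experts = rng.standard_normal((num_experts, d, d))  # each expert as a matrix, for brevity

# Score all experts, but only run the top-k of them.
scores = router @ x
top = np.argsort(scores)[-top_k:]
weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over selected

# Weighted sum of the selected experts' outputs; the other experts never execute,
# so per-token compute scales with top_k rather than num_experts.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
print(f"ran {top_k}/{num_experts} experts")
```

The parameter count is that of all experts combined, but the FLOPs per token are roughly those of the k selected experts, which is why a 35B MoE can decode at dense-~3B speeds.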

  • Real weight surgery — multi-vector alignment patched directly into the safetensors weights
  • Full Vision-Language AND Tool Calling — unlike smaller dense models, the 35B-A3B retains complex <tool_call> sequences and reasoning loops without breaking
  • Very Fast — ~80 tokens/sec on Apple Silicon MLX
  • LM Studio compatible — correct mRoPE config, works out of the box
  • ~35 GB — Efficient memory usage due to MLX natively quantized MoE structures

Performance

| Metric | Value |
|---|---|
| Generation Speed | ~80 tok/s (Apple Silicon, MLX) |
| Bits per Weight | 8.596 (8-bit, group_size=64) |
| Model Size | ~35 GB |
| Compliance | 100% (8/8 test prompts) |
| Knowledge Accuracy | 100% (math, science, history, geography) |
| Code Generation | |
| Multi-turn | ✅ Full context retention |
| Coherence | ✅ No garbling, no repetition loops |
| Vision | ✅ Full VL support |
| Tool Calling | ✅ Perfect `<tool_call>` structural integrity |
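The bits-per-weight figure can be sanity-checked with back-of-envelope arithmetic, assuming MLX-style affine quantization stores a 16-bit scale and a 16-bit bias per group of weights (an assumption about the storage format, not a statement from the model card):

```python
# Back-of-envelope bits-per-weight for 8-bit quantization with group_size=64.
# Assumption: each group of 64 weights carries a 16-bit scale and 16-bit bias.
bits, group_size = 8, 64
overhead = (16 + 16) / group_size   # 0.5 bits of per-weight metadata
bpw = bits + overhead
print(bpw)  # 8.5
```

This yields 8.5; the reported 8.596 plausibly also reflects tensors (e.g. norms) kept at higher precision, averaged over the whole checkpoint.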

Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-35B-A3B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-35B-A3B-8bit-MLX-CRACK")

# Text generation
prompt = apply_chat_template(processor, config, "Your prompt here", num_images=0)
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```

How This Model Was Modified

Created with CRACK, a targeted weight-level surgery technique developed for hybrid SSM/Attention and MoE architectures. It uses multi-vector alignment with refusal vectors extracted per layer.
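CRACK itself is not published, but the general family of techniques it belongs to (refusal-direction ablation) can be sketched as projecting an extracted direction out of a weight matrix. This is a conceptual illustration with random stand-in data, not the actual CRACK procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 32

W = rng.standard_normal((d_out, d_in))   # stand-in for one layer's output projection
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)                   # stand-in unit "refusal direction"

# Directional ablation: subtract the rank-1 component of W along r,
# so no input can produce any activation in the refusal direction.
W_ablated = W - np.outer(r, r @ W)

x = rng.standard_normal(d_in)
print(abs(float(r @ (W_ablated @ x))))   # ~0: outputs are orthogonal to r
```

A per-layer, multi-vector variant would repeat this projection for several extracted directions on each targeted layer; how CRACK selects layers and vectors is not described in this card.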


The CRACK Qwen 3.5 Family

| Model | Architecture | Quant | Speed | Size | Access | Link |
|---|---|---|---|---|---|---|
| 2B | Dense | 4-bit | 248 tok/s | 1.6 GB | Free | Qwen3.5-VL-2B-4bit |
| 2B | Dense | 8-bit | 187 tok/s | 2.6 GB | Free | Qwen3.5-VL-2B-8bit |
| 4B | Dense | 4-bit | 150 tok/s | 2.9 GB | Free | Qwen3.5-VL-4B-4bit |
| 4B | Dense | 8-bit | 105 tok/s | 4.8 GB | Free | Qwen3.5-VL-4B-8bit |
| 9B | Dense | 4-bit | 103 tok/s | 5.6 GB | Free | Qwen3.5-VL-9B-4bit |
| 9B | Dense | 8-bit | 66 tok/s | 9.8 GB | Free | Qwen3.5-VL-9B-8bit |
| 35B | MoE (A3B) | 4-bit | ~88 tok/s | ~18.5 GB | Free | Qwen3.5-VL-35B-A3B-4bit |
| 35B | MoE (A3B) | 8-bit | ~80 tok/s | ~35 GB | Free | This model |
| 122B | MoE (A10B) | 4-bit | 56+ tok/s | 65 GB | Free | Qwen3.5-VL-122B-4bit |
| 122B | MoE (A10B) | 6-bit | | ~85 GB | Gated | Qwen3.5-VL-122B-6bit |
| 122B | MoE (A10B) | 8-bit | | ~110 GB | Gated | Qwen3.5-VL-122B-8bit |

About

Built by Dealign.AI — independent research into safety mechanisms in frontier AI models.

See our research: Safety Generalization in Frontier MoE Models

Follow us: X @dealignai

Base model: Qwen/Qwen3.5-35B-A3B-Instruct

License

Released under the Apache License 2.0, consistent with the original Qwen 3.5 base model. Provided "as-is" for research purposes.


Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai
