dealign.ai

Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK

Uncensored Qwen 3.5 VL 212B (REAP-pruned, 6-bit MLX) with full vision support

Overview

CRACK-abliterated Qwen 3.5 VL with REAP expert pruning. 212B total parameters, 17B active per token. 6-bit quantized for Apple Silicon.

  • Base: Qwen 3.5 VL (unified early-fusion vision-language)
  • Expert pruning: REAP (267 of 512 experts retained, 48% pruned)
  • Quantization: 6-bit affine (group_size=64)
  • Vision: Full VL support — image understanding, 333 vision keys
  • Thinking: Supports thinking ON and OFF modes
  • Speed: ~31 tok/s on M3 Ultra 256GB
  • Size: ~161 GB
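As a sanity check on the ~161 GB figure, a back-of-envelope estimate of the packed-weight payload alone (ignoring per-group quantization scales, embeddings, and file metadata, which push the real size a few percent higher):

```python
def quant_payload_gb(params: float, bits: int) -> float:
    """Lower-bound size of a model's packed quantized weights, in GB.

    Ignores per-group scales/biases (group_size=64 here), embeddings,
    and safetensors metadata, so the on-disk size is slightly larger.
    """
    return params * bits / 8 / 1e9  # bits -> bytes -> GB

print(quant_payload_gb(212e9, 6))  # 159.0 — consistent with the reported ~161 GB
```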

Test Results

Each prompt was tested with 1500-token generations, and every response was verified by reading it in full.

Security & Pentesting (8/8 ✅)

The model complied with every security/pentesting prompt, producing full working code:

  • Port scanners, reverse shells, exploit development
  • Social engineering, network attacks, malware analysis
  • No refusals, no loops, no truncation

Advanced Coding (4/4 ✅)

Complex implementation tasks produce complete, working code:

  • Red-black tree with full rebalancing (insert + delete + search)
  • Async web scraper with rate limiting, retries, and SQLite storage
  • REST API in FastAPI with JWT auth, CRUD, and pagination
  • Expression language compiler (tokenizer → parser → evaluator)

Reasoning & Knowledge (7/8 ✅)

  • Mathematical proofs (infinitely many primes, sqrt(2) irrational) — correct
  • Architecture trade-offs (microservices vs monolith) — balanced analysis
  • Logic puzzles (farmer's sheep) — correct answer
  • Factual knowledge — 3/4 correct (capitals, derivatives, planets ✅; author attribution occasionally hallucinates on 212B due to heavier REAP pruning)

Thinking Modes

  • ON: Full chain-of-thought reasoning, clean <think> tags ✅
  • OFF: Direct answers, mostly clean (occasional tag leak on complex coding prompts) ⚠️

Vision

  • mlx_vlm.load(): ✅
  • Vision keys: 333 present ✅
  • Text generation through VL pipeline: ✅

Known Limitations

  • Knowledge retention: The 212B variant uses aggressive REAP pruning (48% of experts removed). This may cause occasional hallucinations on specific factual queries. The 262B variant (35% pruned) retains more knowledge.
  • Thinking mode: Think ON/OFF generally works correctly, but occasional thinking tag leakage may occur in Think OFF mode on complex coding tasks.
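If the occasional Think OFF tag leakage matters for your pipeline, a minimal post-processing guard is to strip any `<think>` spans from the output. This is a generic sketch, not part of the model's tooling:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> spans from a generation, plus any
    dangling unclosed <think> tag, as a guard against tag leakage."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"<think>.*", "", text, flags=re.DOTALL)  # unclosed tag
    return text.strip()

print(strip_think("<think>internal scratchpad</think>Here is the answer."))
# → Here is the answer.
```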

Usage

Text (mlx_lm)

from mlx_lm import generate
from mlx_lm.utils import load_model
from mlx_lm.sample_utils import make_sampler
from transformers import AutoTokenizer

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, _ = load_model(model_path, lazy=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Use language_model for text generation
lm = model.language_model if hasattr(model, 'language_model') else model
sampler = make_sampler(temp=0.7)

messages = [{"role": "user", "content": "Write a Python port scanner"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, enable_thinking=True)

response = generate(lm, tokenizer, prompt=prompt, max_tokens=2000, sampler=sampler)
print(response)

Vision (mlx_vlm)

from mlx_vlm import load, generate
from mlx_vlm.utils import load_config

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, processor = load(model_path)
config = load_config(model_path)

result = generate(model, processor, prompt="Describe this image", images=["image.jpg"], max_tokens=500)
print(result.text)

Available Quant Levels

Model            Bits    Size     Speed      Link
212B Q4          4-bit   ~112 GB  ~39 tok/s
212B Q6 (this)   6-bit   ~161 GB  ~31 tok/s  Link
262B Q4          4-bit   ~138 GB  ~39 tok/s  Link
262B Q6          6-bit   ~198 GB  ~31 tok/s  Link

Other Models by dealignai

Model                     Size       Type
Qwen3.5-VL-4B CRACK       4B         Dense VL
Qwen3.5-VL-9B CRACK       9B         Dense VL
Qwen3.5-VL-27B CRACK      27B        Dense VL
Qwen3.5-VL-35B CRACK      35B        MoE VL
Qwen3.5-VL-122B CRACK     122B       MoE VL
Qwen3.5-397B CRACK REAP   397B       MoE Text
MiniMax M2.5 CRACK        139B/172B  MoE Text
GPT OSS 120B CRACK        120B       MoE Text
Step 3.5 Flash CRACK      121B/149B  MoE Text

Requirements

  • Apple Silicon Mac with sufficient unified memory (~161 GB for 6-bit)
  • mlx-lm >= 0.22 and transformers >= 4.49
  • For vision: mlx-vlm >= 0.1.20
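A minimal environment setup matching the version floors above (assuming the standard PyPI package names):

```shell
# Text generation stack
pip install "mlx-lm>=0.22" "transformers>=4.49"
# Vision pipeline (optional)
pip install "mlx-vlm>=0.1.20"
```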

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai

Disclaimer

This model is provided for research purposes. The creators are not responsible for any misuse. By downloading, you agree to use it responsibly and in compliance with applicable laws.

About dealignai

We research and publish abliterated models to advance AI safety understanding.

Follow us: 𝕏 @dealignai

See our research: Safety Generalization in Frontier MoE Models
