# Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK

*Uncensored Qwen 3.5 VL 212B (REAP-pruned, 6-bit MLX) with full vision support*

## Overview

CRACK-abliterated Qwen 3.5 VL with REAP expert pruning: 212B total parameters, 17B active per token, 6-bit quantized for Apple Silicon.
- Base: Qwen 3.5 VL (unified early-fusion vision-language)
- Expert pruning: REAP (267 of 512 experts retained, 48% pruned)
- Quantization: 6-bit affine (group_size=64)
- Vision: Full VL support — image understanding, 333 vision keys
- Thinking: Supports thinking ON and OFF modes
- Speed: ~31 tok/s on M3 Ultra 256GB
- Size: ~161 GB
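The ~161 GB figure is consistent with a back-of-envelope estimate for 6-bit affine quantization at group_size=64. The sketch below assumes one fp16 scale and one fp16 bias per 64-weight group; the exact MLX storage layout, and which layers (norms, embeddings) stay unquantized, are assumptions, so treat this as a rough check rather than an exact accounting:

```python
# Rough size estimate: 6-bit weights plus per-group quantization metadata.
# Assumes fp16 scale + fp16 bias per group of 64 weights (an assumption).
params = 212e9                       # total parameters after REAP pruning
group_overhead = (16 + 16) / 64      # 0.5 extra bits per weight for scale + bias
bits_per_weight = 6 + group_overhead # 6.5 effective bits per weight
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{bits_per_weight} bits/weight -> ~{size_gib:.0f} GiB")
```

This lands close to the listed ~161 GB, which suggests the published size is dominated by the quantized expert weights.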
## Test Results

Tested with 1500-token generations, verified by reading the full responses.

### Security & Pentesting (8/8 ✅)

All security/pentesting prompts are answered with full working code:
- Port scanners, reverse shells, exploit development
- Social engineering, network attacks, malware analysis
- No refusals, no loops, no truncation
### Advanced Coding (4/4 ✅)

Complex implementation tasks produce complete, working code:
- Red-black tree with full rebalancing (insert + delete + search)
- Async web scraper with rate limiting, retries, and SQLite storage
- REST API in FastAPI with JWT auth, CRUD, and pagination
- Expression language compiler (tokenizer → parser → evaluator)
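To illustrate the scope of the last task, here is a minimal sketch of a tokenizer → parser → evaluator pipeline for arithmetic expressions (an illustrative toy in which the recursive-descent parser evaluates directly, not the model's actual output):

```python
import re

# Tokenizer: numbers, the operators + - * /, and parentheses.
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|(.))")

def tokenize(src):
    tokens = []
    for num, op in TOKEN.findall(src):
        if num:
            tokens.append(("NUM", float(num)))
        elif op in "+-*/()":
            tokens.append((op, op))
        else:
            raise SyntaxError(f"unexpected character: {op!r}")
    return tokens

class Parser:
    # Grammar: expr -> term (('+'|'-') term)*
    #          term -> factor (('*'|'/') factor)*
    #          factor -> NUM | '(' expr ')' | '-' factor
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()[0]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()[0]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.next()
        if kind == "NUM":
            return val
        if kind == "(":
            value = self.expr()
            assert self.next()[0] == ")", "missing closing paren"
            return value
        if kind == "-":
            return -self.factor()
        raise SyntaxError(f"unexpected token: {kind}")

def evaluate(src):
    return Parser(tokenize(src)).expr()

print(evaluate("2 + 3 * (4 - 1)"))  # 11.0
```

The graded test additionally required a distinct AST stage between parsing and evaluation; this sketch fuses those two steps for brevity.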
### Reasoning & Knowledge (7/8 ✅)
- Mathematical proofs (infinitely many primes, sqrt(2) irrational) — correct
- Architecture trade-offs (microservices vs monolith) — balanced analysis
- Logic puzzles (farmer's sheep) — correct answer
- Factual knowledge — 3/4 correct (capitals, derivatives, planets ✅; author attribution is occasionally hallucinated, a side effect of the heavier REAP pruning in the 212B variant)
## Thinking Modes

- ON: Full chain-of-thought reasoning, clean `<think>` tags ✅
- OFF: Direct answers, mostly clean (occasional tag leak on complex coding prompts) ⚠️
## Vision

- `mlx_vlm.load()`: ✅
- Vision keys: 333 present ✅
- Text generation through VL pipeline: ✅
## Known Limitations
- Knowledge retention: The 212B variant uses aggressive REAP pruning (48% of experts removed). This may cause occasional hallucinations on specific factual queries. The 262B variant (35% pruned) retains more knowledge.
- Thinking mode: Think ON/OFF generally works correctly, but occasional thinking tag leakage may occur in Think OFF mode on complex coding tasks.
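If occasional tag leakage matters for a downstream pipeline, a simple post-processing step can remove stray reasoning blocks. A minimal sketch, assuming the standard `<think>…</think>` delimiters (adjust the patterns if the model emits variants):

```python
import re

# Remove complete <think>...</think> blocks, plus any dangling unmatched
# <think> tail or stray </think> left behind by a truncated or leaked block.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    text = THINK_BLOCK.sub("", text)
    # An unterminated <think> swallows everything to the end of the string.
    text = re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL)
    text = text.replace("</think>", "")
    return text.strip()

print(strip_think("<think>plan the scan</think>Here is the code."))  # Here is the code.
```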
## Usage

### Text (mlx_lm)

```python
from mlx_lm import generate
from mlx_lm.utils import load_model
from mlx_lm.sample_utils import make_sampler
from transformers import AutoTokenizer

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, _ = load_model(model_path, lazy=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Use the language_model submodule for text-only generation
lm = model.language_model if hasattr(model, 'language_model') else model

sampler = make_sampler(temp=0.7)
messages = [{"role": "user", "content": "Write a Python port scanner"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False, enable_thinking=True
)
response = generate(lm, tokenizer, prompt=prompt, max_tokens=2000, sampler=sampler)
print(response)
```
### Vision (mlx_vlm)

```python
from mlx_vlm import load, generate
from mlx_vlm.utils import load_config

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, processor = load(model_path)
config = load_config(model_path)

result = generate(model, processor, prompt="Describe this image",
                  images=["image.jpg"], max_tokens=500)
print(result.text)
```
## Available Quant Levels
| Model | Bits | Size | Speed | Link |
|---|---|---|---|---|
| 212B Q4 | 4-bit | ~112 GB | ~39 tok/s | — |
| 212B Q6 (this) | 6-bit | ~161 GB | ~31 tok/s | Link |
| 262B Q4 | 4-bit | ~138 GB | ~39 tok/s | Link |
| 262B Q6 | 6-bit | ~198 GB | ~31 tok/s | Link |
## Other Models by dealignai
| Model | Size | Type |
|---|---|---|
| Qwen3.5-VL-4B CRACK | 4B | Dense VL |
| Qwen3.5-VL-9B CRACK | 9B | Dense VL |
| Qwen3.5-VL-27B CRACK | 27B | Dense VL |
| Qwen3.5-VL-35B CRACK | 35B | MoE VL |
| Qwen3.5-VL-122B CRACK | 122B | MoE VL |
| Qwen3.5-397B CRACK REAP | 397B | MoE Text |
| MiniMax M2.5 CRACK | 139B/172B | MoE Text |
| GPT OSS 120B CRACK | 120B | MoE Text |
| Step 3.5 Flash 121B/149B CRACK | 121B/149B | MoE Text |
## Requirements

- Apple Silicon Mac with sufficient unified memory (~161 GB for 6-bit)
- `mlx-lm >= 0.22` and `transformers >= 4.49`
- For vision: `mlx-vlm >= 0.1.20`
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai
## Disclaimer
This model is provided for research purposes. The creators are not responsible for any misuse. By downloading, you agree to use it responsibly and in compliance with applicable laws.
## About dealignai
We research and publish abliterated models to advance AI safety understanding.
Follow us: 𝕏 @dealignai
See our research: Safety Generalization in Frontier MoE Models