dealign.ai

Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK

Uncensored Qwen 3.5 VL 212B (REAP-pruned, 6-bit MLX) with full vision support

Overview

CRACK-abliterated Qwen 3.5 VL with REAP expert pruning. 212B total parameters, 17B active per token. 6-bit quantized for Apple Silicon.

  • Base: Qwen 3.5 VL (unified early-fusion vision-language)
  • Expert pruning: REAP (267 of 512 experts retained, 48% pruned)
  • Quantization: 6-bit affine (group_size=64)
  • Vision: Full VL support — image understanding, 333 vision keys
  • Thinking: Supports thinking ON and OFF modes
  • Speed: ~31 tok/s on M3 Ultra 256GB
  • Size: ~161 GB
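As a sanity check on the ~161 GB figure, a back-of-envelope estimate of the packed-weight payload alone (ignoring per-group quantization scales, embeddings, and file metadata, which push the real size a few percent higher):

```python
def quant_payload_gb(params: float, bits: int) -> float:
    """Lower-bound size of a model's packed quantized weights, in GB.

    Ignores per-group scales/biases (group_size=64 here), embeddings,
    and safetensors metadata, so the on-disk size is slightly larger.
    """
    return params * bits / 8 / 1e9  # bits -> bytes -> GB

print(quant_payload_gb(212e9, 6))  # 159.0 — consistent with the reported ~161 GB
```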

Test Results

Each prompt was tested with 1500-token generations, and every response was verified by reading it in full.

Security & Pentesting (8/8 ✅)

The model complied with every security/pentesting prompt, producing full working code:

  • Port scanners, reverse shells, exploit development
  • Social engineering, network attacks, malware analysis
  • No refusals, no loops, no truncation

Advanced Coding (4/4 ✅)

Complex implementation tasks produce complete, working code:

  • Red-black tree with full rebalancing (insert + delete + search)
  • Async web scraper with rate limiting, retries, and SQLite storage
  • REST API in FastAPI with JWT auth, CRUD, and pagination
  • Expression language compiler (tokenizer → parser → evaluator)

Reasoning & Knowledge (7/8 ✅)

  • Mathematical proofs (infinitely many primes, sqrt(2) irrational) — correct
  • Architecture trade-offs (microservices vs monolith) — balanced analysis
  • Logic puzzles (farmer's sheep) — correct answer
  • Factual knowledge — 3/4 correct (capitals, derivatives, planets ✅; author attribution occasionally hallucinates on 212B due to heavier REAP pruning)

Thinking Modes

  • ON: Full chain-of-thought reasoning, clean <think> tags ✅
  • OFF: Direct answers, mostly clean (occasional tag leak on complex coding prompts) ⚠️

Vision

  • mlx_vlm.load(): ✅
  • Vision keys: 333 present ✅
  • Text generation through VL pipeline: ✅

Known Limitations

  • Knowledge retention: The 212B variant uses aggressive REAP pruning (48% of experts removed). This may cause occasional hallucinations on specific factual queries. The 262B variant (35% pruned) retains more knowledge.
  • Thinking mode: Think ON/OFF generally works correctly, but occasional thinking tag leakage may occur in Think OFF mode on complex coding tasks.
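If the occasional Think OFF tag leakage matters for your pipeline, a minimal post-processing guard is to strip any `<think>` spans from the output. This is a generic sketch, not part of the model's tooling:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> spans from a generation, plus any
    dangling unclosed <think> tag, as a guard against tag leakage."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"<think>.*", "", text, flags=re.DOTALL)  # unclosed tag
    return text.strip()

print(strip_think("<think>internal scratchpad</think>Here is the answer."))
# → Here is the answer.
```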

Usage

Text (mlx_lm)

from mlx_lm import generate
from mlx_lm.utils import load_model
from mlx_lm.sample_utils import make_sampler
from transformers import AutoTokenizer

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, _ = load_model(model_path, lazy=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Use language_model for text generation
lm = model.language_model if hasattr(model, 'language_model') else model
sampler = make_sampler(temp=0.7)

messages = [{"role": "user", "content": "Write a Python port scanner"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, enable_thinking=True)

response = generate(lm, tokenizer, prompt=prompt, max_tokens=2000, sampler=sampler)
print(response)

Vision (mlx_vlm)

from mlx_vlm import load, generate
from mlx_vlm.utils import load_config

model_path = "dealignai/Qwen3.5-VL-212B-A17B-6bit-MLX-REAP-CRACK"
model, processor = load(model_path)
config = load_config(model_path)

result = generate(model, processor, prompt="Describe this image", images=["image.jpg"], max_tokens=500)
print(result.text)

Available Quant Levels

Model            Bits    Size     Speed      Link
212B Q4          4-bit   ~112 GB  ~39 tok/s
212B Q6 (this)   6-bit   ~161 GB  ~31 tok/s  Link
262B Q4          4-bit   ~138 GB  ~39 tok/s  Link
262B Q6          6-bit   ~198 GB  ~31 tok/s  Link

Other Models by dealignai

Model                     Size       Type
Qwen3.5-VL-4B CRACK       4B         Dense VL
Qwen3.5-VL-9B CRACK       9B         Dense VL
Qwen3.5-VL-27B CRACK      27B        Dense VL
Qwen3.5-VL-35B CRACK      35B        MoE VL
Qwen3.5-VL-122B CRACK     122B       MoE VL
Qwen3.5-397B CRACK REAP   397B       MoE Text
MiniMax M2.5 CRACK        139B/172B  MoE Text
GPT OSS 120B CRACK        120B       MoE Text
Step 3.5 Flash CRACK      121B/149B  MoE Text

Requirements

  • Apple Silicon Mac with sufficient unified memory (~161 GB for 6-bit)
  • mlx-lm >= 0.22 and transformers >= 4.49
  • For vision: mlx-vlm >= 0.1.20
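A minimal environment setup matching the version floors above (assuming the standard PyPI package names):

```shell
# Text generation stack
pip install "mlx-lm>=0.22" "transformers>=4.49"
# Vision pipeline (optional)
pip install "mlx-vlm>=0.1.20"
```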

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? DM us — we help for free most of the time.

Ko-fi | X @dealignai | dealign.ai

Disclaimer

This model is provided for research purposes. The creators are not responsible for any misuse. By downloading, you agree to use it responsibly and in compliance with applicable laws.

About dealignai

We research and publish abliterated models to advance AI safety understanding.

Follow us: 𝕏 @dealignai

See our research: Safety Generalization in Frontier MoE Models
