---
license: apache-2.0
base_model: Qwen/Qwen3.5-VL-122B-A10B
tags:
- mlx
- qwen3.5
- abliterated
- uncensored
- vision
- vlm
- 8bit
- apple-silicon
- crack
library_name: mlx
pipeline_tag: image-text-to-text
---

Built for vMLX — the only MLX inference engine with VL support, KV-cache quantization, prefix-cache reuse, agentic tool calling, and speculative decoding.
Free for macOS · vmlx.net
---
Dealign.AI Mascot

# Qwen 3.5 VL 122B — CRACK Abliterated (8-bit MLX)

### **C**onstrained **R**esponse **A**lignment **C**ircuit **K**ill

**Real weight-level surgery on a hybrid SSM/Attention architecture, with the VL layers preserved.**

**No custom templates. No cheap jailbreaks. No pre-fill hacks. Pure mathematical weight surgery.**
---

> ⚠️ **Methods like [Heretic](https://huggingface.co/samir-fama/Qwen3-32B-abliterated) and standard/plain abliteration DO NOT WORK on Qwen 3.5 122B.** The hybrid SSM/Attention architecture routes around standard interventions via its SSM channels. This model was created with CRACK, an abliteration method developed specifically to account for the hybrid SSM pathways and the Vision-Language layers. Finding a working solution took several days of research and many, many failed experiments. I am not an ML researcher — just an amateur who spent those days and sleepless nights on this.

## What This Is

A truly abliterated Qwen 3.5 VL 122B-A10B model, 8-bit quantized for Apple Silicon MLX. This is one of the few (if not the only) **real, working, coherent, full-speed, VL-capable** abliterated 8-bit MLX models for Qwen 3.5 122B.

- ✅ **Real weight surgery** — permanent modification of 2 weight tensors; nothing else changed
- ✅ **Full Vision-Language** — processes images correctly; the vision tower is fully preserved
- ✅ **Thinking ON/OFF** — both modes work correctly; CoT reasoning is fully preserved
- ✅ **Full speed** — 56+ tokens/sec on MLX (vs. the 30–35 tok/s Qwen 3.5 manages on llama.cpp)
- ✅ **LM Studio compatible** — works out of the box, with thinking support
- ✅ **Standalone** — no system prompts, no template tricks; just load and use

## What Does NOT Work on This Architecture

- ❌ **Heretic-style abliteration** — does not work on the hybrid SSM/Attention stack
- ❌ **Standard refusal-vector projection** on shared expert layers — kills CoT reasoning
- ❌ **Plain abliteration across all layers** — the model routes around the intervention via its SSM channels
- ❌ **Template tricks / pre-fill hacks** — those are not real abliteration

The CRACK method was developed through extensive research that specifically accounts for the hybrid SSM/Attention architecture and the Vision-Language layers.
It required understanding exactly which layers are responsible for refusal recall and how information flows between the SSM and full-attention pathways.

## Performance

| Metric | Value |
|--------|-------|
| Generation speed | **56+ tok/s** (M3 Ultra, MLX) |
| vs. llama.cpp | ~30–35 tok/s (Qwen 3.5 is slow on llama.cpp) |
| Prompt processing | 178–273 tok/s |
| Bits per weight | 8-bit (group_size=64) |
| Compliance | 6/6 tested prompts |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |

## Usage with mlx-vlm

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK")

# Text generation (thinking ON by default)
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with an image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```

### Known Issue: mlx-vlm mRoPE Patch

mlx-vlm 0.3.12 has a bug with Qwen 3.5 MoE. Apply these patches to `mlx_vlm/models/qwen3_5/language.py`:

**1.** In `apply_multimodal_rotary_pos_emb`, after computing `q_embed`/`k_embed`:

```python
if q_embed.ndim > q_pass.ndim and q_embed.ndim == 5:
    q_embed = q_embed[0]
    k_embed = k_embed[0]
```

**2.** In `Qwen3_5RotaryEmbedding.__call__`, guard the mRoPE call:

```python
if self.mrope_section:
    freqs = self.apply_interleaved_mrope(freqs, self.mrope_section)
```

## How This Model Was Modified

This model was created using the CRACK method — targeted weight-level surgery on a small number of tensors in the original model. No fine-tuning, no LoRA, no prompt engineering, no template modifications were used. The Vision-Language tower is completely untouched.
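For intuition only: plain abliteration — the approach described above as insufficient on this architecture — typically removes a learned "refusal direction" from selected weight matrices by orthogonal projection. The sketch below illustrates that generic projection step with NumPy and made-up dimensions; it is not the CRACK procedure, and the tensor shapes and names are hypothetical, not the ones CRACK actually targets:

```python
import numpy as np

def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of a weight's output along `direction`.

    weight: (d_out, d_in) projection matrix writing into the residual stream
    direction: (d_out,) "refusal" direction (hypothetical, found elsewhere
               by contrasting activations on harmful vs. harmless prompts)
    """
    d = direction / np.linalg.norm(direction)
    # W' = W - d (dᵀ W): every column of W loses its projection onto d,
    # so W' x has zero component along d for any input x.
    return weight - np.outer(d, d @ weight)

# Toy check: after the surgery, this weight can no longer move the
# residual stream along the refusal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
refusal_dir = rng.normal(size=8)

W_mod = orthogonalize(W, refusal_dir)
x = rng.normal(size=16)
d_hat = refusal_dir / np.linalg.norm(refusal_dir)
print(abs(d_hat @ (W_mod @ x)) < 1e-9)  # → True
```

Per the notes above, applying this projection naively across all layers (or to the shared expert layers) fails on this architecture — the SSM channels route around it, or CoT reasoning breaks — which is why CRACK restricts the surgery to 2 specific tensors.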
## Also Available

| Quant | Access | Link |
|-------|--------|------|
| **4-bit** | Free | [dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK) |
| **6-bit** | Gated | [dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK) |
| **8-bit** | Gated | [dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK) |

I also have a **397B** version — reach out if interested.

## About

Built by [Dealign.AI](https://dealign.ai) — independent research into MoE safety mechanisms.

See our research: [Safety Generalization in Frontier MoE Models](https://dealign.ai/quantsteer.html)

Follow us: [𝕏 @dealignai](https://x.com/dealignai)

**Base model:** [Qwen/Qwen3.5-VL-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-VL-122B-A10B)

## License

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), consistent with the license of the original Qwen 3.5 VL base model. You are free to use, modify, and distribute this model for both commercial and non-commercial purposes. Provided "as-is" for research purposes.

---

## Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

**[Support us on Ko-fi](https://ko-fi.com/dealignai)** — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? **DM us — we help for free most of the time.**

[Ko-fi](https://ko-fi.com/dealignai) | [X @dealignai](https://x.com/dealignai) | [dealign.ai](https://dealign.ai)