---
license: apache-2.0
base_model: Qwen/Qwen3.5-VL-122B-A10B
tags:
- mlx
- qwen3.5
- abliterated
- uncensored
- vision
- vlm
- 8bit
- apple-silicon
- crack
library_name: mlx
pipeline_tag: image-text-to-text
---

Built for vMLX — the only MLX inference engine with VL support, KV-cache quantization, prefix-cache reuse, agentic tool calling, and speculative decoding.
Free for macOS · vmlx.net
---
Dealign.AI Mascot

# Qwen 3.5 VL 122B — CRACK Abliterated (8-bit MLX)

### **C**onstrained **R**esponse **A**lignment **C**ircuit **K**ill

**Real weight-level surgery on a hybrid SSM/Attention architecture, with the VL layers preserved.**

**No custom templates. No cheap jailbreaks. No pre-fill hacks. Pure mathematical weight surgery.**
---

> ⚠️ **Methods like [Heretic](https://huggingface.co/samir-fama/Qwen3-32B-abliterated) and standard/plain abliteration DO NOT WORK on Qwen 3.5 122B.** The hybrid SSM/Attention architecture routes around standard interventions via its SSM channels. This model was created with CRACK, an abliteration method developed specifically to account for the hybrid SSM pathways and the Vision-Language layers. Finding a working solution took several days of research and many, many failed experiments. I am not an ML researcher — just an amateur who spent those days and sleepless nights on this.

## What This Is

A truly abliterated Qwen 3.5 VL 122B-A10B model, 8-bit quantized for Apple Silicon MLX. This is one of the few (if not the only) **real, working, coherent, full-speed, VL-capable** abliterated 8-bit MLX models for Qwen 3.5 122B.

- ✅ **Real weight surgery** — permanent modification of 2 weight tensors; nothing else changed
- ✅ **Full Vision-Language** — processes images correctly; the vision tower is fully preserved
- ✅ **Thinking ON/OFF** — both modes work correctly; CoT reasoning is fully preserved
- ✅ **Full speed** — 56+ tokens/sec on MLX (vs. the 30–35 tok/s Qwen 3.5 manages on llama.cpp)
- ✅ **LM Studio compatible** — works out of the box, with thinking support
- ✅ **Standalone** — no system prompts, no template tricks; just load and use

## What Does NOT Work on This Architecture

- ❌ **Heretic-style abliteration** — does not work on the hybrid SSM/Attention stack
- ❌ **Standard refusal-vector projection** on shared expert layers — kills CoT reasoning
- ❌ **Plain abliteration across all layers** — the model routes around the intervention via its SSM channels
- ❌ **Template tricks / pre-fill hacks** — those are not real abliteration

The CRACK method was developed through extensive research that specifically accounts for the hybrid SSM/Attention architecture and the Vision-Language layers.
It required understanding exactly which layers are responsible for refusal recall and how information flows between the SSM and full-attention pathways.

## Performance

| Metric | Value |
|--------|-------|
| Generation speed | **56+ tok/s** (M3 Ultra, MLX) |
| vs. llama.cpp | ~30–35 tok/s (Qwen 3.5 is slow on llama.cpp) |
| Prompt processing | 178–273 tok/s |
| Bits per weight | 8-bit (group_size=64) |
| Compliance | 6/6 tested prompts |
| Thinking | ON/OFF both work |
| Vision | ✅ Full VL support |

## Usage with mlx-vlm

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK")
config = load_config("dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK")

# Text generation (thinking ON by default)
prompt = apply_chat_template(processor, config, "Your prompt here")
output = generate(model, processor, prompt, max_tokens=500, verbose=True)

# Vision (with an image)
prompt = apply_chat_template(processor, config, "Describe this image", num_images=1)
output = generate(model, processor, prompt, max_tokens=500, verbose=True, image=["path/to/image.png"])
```

### Known Issue: mlx-vlm mRoPE Patch

mlx-vlm 0.3.12 has a bug with Qwen 3.5 MoE. Apply these patches to `mlx_vlm/models/qwen3_5/language.py`:

**1.** In `apply_multimodal_rotary_pos_emb`, after computing `q_embed`/`k_embed`:

```python
if q_embed.ndim > q_pass.ndim and q_embed.ndim == 5:
    q_embed = q_embed[0]
    k_embed = k_embed[0]
```

**2.** In `Qwen3_5RotaryEmbedding.__call__`, guard the mRoPE call:

```python
if self.mrope_section:
    freqs = self.apply_interleaved_mrope(freqs, self.mrope_section)
```

## How This Model Was Modified

This model was created using the CRACK method — targeted weight-level surgery on a small number of tensors in the original model. No fine-tuning, no LoRA, no prompt engineering, no template modifications were used. The Vision-Language tower is completely untouched.
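For intuition only: plain abliteration — the approach described above as insufficient on this architecture — typically removes a learned "refusal direction" from selected weight matrices by orthogonal projection. The sketch below illustrates that generic projection step with NumPy and made-up dimensions; it is not the CRACK procedure, and the tensor shapes and names are hypothetical, not the ones CRACK actually targets:

```python
import numpy as np

def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of a weight's output along `direction`.

    weight: (d_out, d_in) projection matrix writing into the residual stream
    direction: (d_out,) "refusal" direction (hypothetical, found elsewhere
               by contrasting activations on harmful vs. harmless prompts)
    """
    d = direction / np.linalg.norm(direction)
    # W' = W - d (dᵀ W): every column of W loses its projection onto d,
    # so W' x has zero component along d for any input x.
    return weight - np.outer(d, d @ weight)

# Toy check: after the surgery, this weight can no longer move the
# residual stream along the refusal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
refusal_dir = rng.normal(size=8)

W_mod = orthogonalize(W, refusal_dir)
x = rng.normal(size=16)
d_hat = refusal_dir / np.linalg.norm(refusal_dir)
print(abs(d_hat @ (W_mod @ x)) < 1e-9)  # → True
```

Per the notes above, applying this projection naively across all layers (or to the shared expert layers) fails on this architecture — the SSM channels route around it, or CoT reasoning breaks — which is why CRACK restricts the surgery to 2 specific tensors.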
## Also Available

| Quant | Access | Link |
|-------|--------|------|
| **4-bit** | Free | [dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-4bit-MLX-CRACK) |
| **6-bit** | Gated | [dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-6bit-MLX-CRACK) |
| **8-bit** | Gated | [dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK](https://huggingface.co/dealignai/Qwen3.5-VL-122B-A10B-8bit-MLX-CRACK) |

I also have a **397B** version — reach out if interested.

## About

Built by [Dealign.AI](https://dealign.ai) — independent research into MoE safety mechanisms.

See our research: [Safety Generalization in Frontier MoE Models](https://dealign.ai/quantsteer.html)

Follow us: [𝕏 @dealignai](https://x.com/dealignai)

**Base model:** [Qwen/Qwen3.5-VL-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-VL-122B-A10B)

## License

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), consistent with the license of the original Qwen 3.5 VL base model. You are free to use, modify, and distribute this model for both commercial and non-commercial purposes. Provided "as-is" for research purposes.

---

## Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

**[Support us on Ko-fi](https://ko-fi.com/dealignai)** — check out the Ko-fi membership for early access and extras.

Have questions or need help with a specific model? **DM us — we help for free most of the time.**

[Ko-fi](https://ko-fi.com/dealignai) | [X @dealignai](https://x.com/dealignai) | [dealign.ai](https://dealign.ai)