Mistral-Small-4-119B-Uncensored-GGUF

☕ If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.

Mistral Small 4 119B uncensored via abliteration by TIMTEH. Refusal direction removed from layers 9-35.

About

Full abliteration of mistralai/Mistral-Small-4-119B-Instruct-2503: no dataset changes, no fine-tuning, no capability loss. The refusal direction was identified and projected out of the model's residual stream across decoder layers 9-35, covering attention output projections and MLP down projections.

This is the first standard GGUF uncensored release of Mistral Small 4 119B. The only other uncensored variant is dealignai's JANG/MLX format (Apple Silicon only, ~80 downloads).

Architecture

  • 119B total parameters: Mixture of Experts (128 routed experts + 1 shared expert per layer, 4 active per token)
  • 36 decoder layers with Multi-Latent Attention (MLA): kv_lora_rank=256, q_lora_rank=1024
  • Multimodal base (vision tower removed for text-only GGUF; text capabilities fully preserved)
  • Released March 23, 2026 by Mistral AI

Downloads

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Mistral-Small-4-119B-Uncensored-Q2_K.gguf | Q2_K | 41 GB | Minimum viable - fits 48GB+ |
| Mistral-Small-4-119B-Uncensored-Q3_K_M.gguf | Q3_K_M | 54 GB | Budget quality - 64GB+ recommended |
| Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf | Q4_K_M | 68 GB | Best balance - 80GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q5_K_M.gguf | Q5_K_M | 79 GB | High quality - 96GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q6_K.gguf | Q6_K | 91 GB | Near-lossless - 2×48GB or 128GB+ |
| Mistral-Small-4-119B-Uncensored-Q8_0.gguf | Q8_0 | 118 GB | Reference quality - 128GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-BF16.gguf | BF16 | 222 GB | Full precision - 256GB+ VRAM |

Recommended Settings

  • Temperature: 0.7-0.9 for creative, 0.3-0.5 for factual
  • Rep penalty: 1.05-1.15 (important for abliterated models; prevents repetition loops)
  • Top-P: 0.9 | Top-K: 40
  • Context: Up to 32K tokens (model supports 128K but GGUF runtimes vary)
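As a rough sketch, the settings above map onto llama.cpp flags like this (temperature 0.8 is picked from the creative range as an example; adjust flag names for other runtimes):

```shell
llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
  --temp 0.8 --top-p 0.9 --top-k 40 --repeat-penalty 1.1 \
  -c 32768 -ngl 99 --jinja
```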

Abliteration Method

  1. Model loaded across 8×H200 SXM5 GPUs with FP8→BF16 dequantization
  2. Activations extracted from 30 harmful + 30 harmless prompt pairs
  3. Per-layer refusal direction computed via mean difference of activations
  4. Refusal direction projected out of o_proj (attention output) and down_proj (MLP) for layers 9-35
  5. Modified weights saved as BF16 safetensors → converted to GGUF → quantized
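The core of steps 2-4 can be sketched in a few lines of numpy (toy shapes and variable names are illustrative, not the actual TIMTEH pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; the real model's is much larger

# Step 2: stand-in activations (rows = prompts, cols = hidden dims).
harmful_acts = rng.normal(size=(30, d_model))
harmless_acts = rng.normal(size=(30, d_model))

# Step 3: refusal direction = normalized mean difference of activations.
direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# Step 4: project the direction out of a weight matrix W that writes
# into the residual stream: W' = (I - d d^T) W, so the layer can no
# longer produce any output component along the refusal direction d.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(direction, direction @ W)
```

In practice this projection is applied to the o_proj and down_proj weights of each targeted layer (9-35 here), with activations captured per layer rather than from a single toy matrix.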

No training, no dataset contamination: the modification targets only the refusal behavior, leaving the model's original knowledge and reasoning weights otherwise untouched.

For details on abliteration, see mlabonne's original blog post.

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, Ollama, and other GGUF-compatible runtimes.

# llama.cpp
llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
  --jinja -c 32768 -ngl 99

# Ollama (after creating a Modelfile)
ollama run mistral-small-4-uncensored

# Chat template
<s>[INST] Your message here [/INST]
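A minimal Modelfile for the Ollama path above might look like this (the template and parameter values are assumptions based on the Mistral [INST] format and the recommended settings; the FROM path must match your downloaded quant):

```
FROM ./Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf
TEMPLATE """<s>[INST] {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.8
PARAMETER repeat_penalty 1.1
```

Register it with `ollama create mistral-small-4-uncensored -f Modelfile` before running the command above.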

Notes

  • This is a text-only GGUF. The vision tower from the original multimodal model was not included in conversion. All text/reasoning/coding capabilities are fully preserved.
  • Abliterated models may occasionally include brief disclaimers in responses; this is residual behavior from base training, not a refusal.
  • As with all uncensored models, use responsibly. The removal of safety guardrails means the model will comply with a wider range of requests.

Other Models by TIMTEH

  • More coming soon; follow @timteh673 for updates.

Support

If you find this useful, consider supporting the work:

☕ Buy Me a Coffee

All models are forged on 8×NVIDIA H200 SXM5 (1.1TB VRAM): real hardware, real quantization, no compromises.

Credits

Abliteration technique originally described by mlabonne (see the blog post linked above).

💎 Crypto Donations

| Currency | Address |
|----------|---------|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |

🏢 Enterprise & Custom Models

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.

  • Custom fine-tuning on your data (up to 400B+ parameters)
  • Private CARE abliteration (Phase 2 technique)
  • Deployment architecture consulting (tensor parallelism, speculative decoding)
  • Bespoke distillation datasets

📧 Contact: tim@timlex.co


Part of the TIMTEH Cognitive Preservation Foundry: surgical capability preservation at scale.
