Mistral-Small-4-119B-Uncensored-GGUF

☕ If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.

Mistral Small 4 119B uncensored via abliteration by TIMTEH. Refusal direction removed from layers 9-35.

About

Full abliteration of mistralai/Mistral-Small-4-119B-Instruct-2503: no dataset changes, no fine-tuning, no capability loss. The refusal direction was identified and projected out of the model's residual stream across decoder layers 9-35, covering attention output projections and MLP down projections.

This is the first standard GGUF uncensored release of Mistral Small 4 119B. The only other uncensored variant is dealignai's JANG/MLX format (Apple Silicon only, ~80 downloads).

Architecture

  • 119B total parameters: Mixture of Experts (128 routed experts + 1 shared expert per layer, 4 active per token)
  • 36 decoder layers with Multi-Latent Attention (MLA): kv_lora_rank=256, q_lora_rank=1024
  • Multimodal base (vision tower removed for text-only GGUF; text capabilities fully preserved)
  • Released March 23, 2026 by Mistral AI

Downloads

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Mistral-Small-4-119B-Uncensored-Q2_K.gguf | Q2_K | 41 GB | Minimum viable - fits 48GB+ |
| Mistral-Small-4-119B-Uncensored-Q3_K_M.gguf | Q3_K_M | 54 GB | Budget quality - 64GB+ recommended |
| Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf | Q4_K_M | 68 GB | Best balance - 80GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q5_K_M.gguf | Q5_K_M | 79 GB | High quality - 96GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q6_K.gguf | Q6_K | 91 GB | Near-lossless - 2×48GB or 128GB+ |
| Mistral-Small-4-119B-Uncensored-Q8_0.gguf | Q8_0 | 118 GB | Reference quality - 128GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-BF16.gguf | BF16 | 222 GB | Full precision - 256GB+ VRAM |

Recommended Settings

  • Temperature: 0.7-0.9 for creative, 0.3-0.5 for factual
  • Rep penalty: 1.05-1.15 (important for abliterated models; prevents repetition loops)
  • Top-P: 0.9 | Top-K: 40
  • Context: Up to 32K tokens (model supports 128K but GGUF runtimes vary)
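As a rough sketch, the settings above map onto llama.cpp flags like this (temperature 0.8 is picked from the creative range as an example; adjust flag names for other runtimes):

```shell
llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
  --temp 0.8 --top-p 0.9 --top-k 40 --repeat-penalty 1.1 \
  -c 32768 -ngl 99 --jinja
```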

Abliteration Method

  1. Model loaded across 8×H200 SXM5 GPUs with FP8→BF16 dequantization
  2. Activations extracted from 30 harmful + 30 harmless prompt pairs
  3. Per-layer refusal direction computed via mean difference of activations
  4. Refusal direction projected out of o_proj (attention output) and down_proj (MLP) for layers 9-35
  5. Modified weights saved as BF16 safetensors → converted to GGUF → quantized
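The core of steps 2-4 can be sketched in a few lines of numpy (toy shapes and variable names are illustrative, not the actual TIMTEH pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; the real model's is much larger

# Step 2: stand-in activations (rows = prompts, cols = hidden dims).
harmful_acts = rng.normal(size=(30, d_model))
harmless_acts = rng.normal(size=(30, d_model))

# Step 3: refusal direction = normalized mean difference of activations.
direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# Step 4: project the direction out of a weight matrix W that writes
# into the residual stream: W' = (I - d d^T) W, so the layer can no
# longer produce any output component along the refusal direction d.
W = rng.normal(size=(d_model, d_model))
W_abliterated = W - np.outer(direction, direction @ W)
```

In practice this projection is applied to the o_proj and down_proj weights of each targeted layer (9-35 here), with activations captured per layer rather than from a single toy matrix.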

No training, no dataset contamination: the modification targets only the refusal behavior, leaving the model's original knowledge and reasoning weights otherwise untouched.

For details on abliteration, see mlabonne's original blog post.

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, Ollama, and other GGUF-compatible runtimes.

# llama.cpp
llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
  --jinja -c 32768 -ngl 99

# Ollama (after creating a Modelfile)
ollama run mistral-small-4-uncensored

# Chat template
<s>[INST] Your message here [/INST]
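A minimal Modelfile for the Ollama path above might look like this (the template and parameter values are assumptions based on the Mistral [INST] format and the recommended settings; the FROM path must match your downloaded quant):

```
FROM ./Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf
TEMPLATE """<s>[INST] {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.8
PARAMETER repeat_penalty 1.1
```

Register it with `ollama create mistral-small-4-uncensored -f Modelfile` before running the command above.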

Notes

  • This is a text-only GGUF. The vision tower from the original multimodal model was not included in conversion. All text/reasoning/coding capabilities are fully preserved.
  • Abliterated models may occasionally include brief disclaimers in responses; this is residual behavior from base training, not a refusal.
  • As with all uncensored models, use responsibly. The removal of safety guardrails means the model will comply with a wider range of requests.

Other Models by TIMTEH

  • More coming soon; follow @timteh673 for updates.

Support

If you find this useful, consider supporting the work:

☕ Buy Me a Coffee

All models are forged on 8×NVIDIA H200 SXM5 (1.1TB VRAM): real hardware, real quantization, no compromises.

Credits

Abliteration technique originally described by mlabonne (see the blog post linked above).

💎 Crypto Donations

| Currency | Address |
|----------|---------|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |

🏢 Enterprise & Custom Models

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.

  • Custom fine-tuning on your data (up to 400B+ parameters)
  • Private CARE abliteration (Phase 2 technique)
  • Deployment architecture consulting (tensor parallelism, speculative decoding)
  • Bespoke distillation datasets

📧 Contact: tim@timlex.co


Part of the TIMTEH Cognitive Preservation Foundry: surgical capability preservation at scale.
