Mistral-Small-4-119B-Uncensored-GGUF
If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.
Mistral Small 4 119B uncensored via abliteration by TIMTEH. Refusal direction removed from layers 9-35.
About
Full abliteration of mistralai/Mistral-Small-4-119B-Instruct-2503: no dataset changes, no fine-tuning, no capability loss. The refusal direction was identified and projected out of the model's residual stream across decoder layers 9-35, covering attention output projections and MLP down projections.
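In equation form, this is the standard weight orthogonalization described in abliteration write-ups (a sketch of the usual formulation, not the exact code used for this release): for each affected weight matrix $W$ whose output feeds the residual stream, with unit-norm refusal direction $\hat r$,

$$
W' = \left(I - \hat r\,\hat r^{\top}\right) W = W - \hat r\,(\hat r^{\top} W).
$$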
This is the first standard GGUF uncensored release of Mistral Small 4 119B. The only other uncensored variant is dealignai's JANG/MLX format (Apple Silicon only, ~80 downloads).
Architecture
- 119B total parameters, Mixture of Experts (128 routed experts + 1 shared expert per layer, 4 active per token; see the routing sketch after this list)
- 36 decoder layers with Multi-head Latent Attention (MLA): kv_lora_rank=256, q_lora_rank=1024
- Multimodal base (vision tower removed for this text-only GGUF; text capabilities fully preserved)
- Released March 23, 2026 by Mistral AI
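To make the routing numbers concrete ("128 routed experts + 1 shared expert, 4 active per token"), here is a minimal top-k routing sketch in PyTorch. It is purely illustrative: dimensions, activation choice, and module names are placeholders, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    """Illustrative sparse MoE block: top-4 routing over 128 experts plus one
    always-on shared expert. All sizes are placeholders, not the real config."""

    def __init__(self, hidden_dim=1024, expert_dim=2048, n_experts=128, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, n_experts, bias=False)
        make_expert = lambda: nn.Sequential(
            nn.Linear(hidden_dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, hidden_dim)
        )
        self.experts = nn.ModuleList(make_expert() for _ in range(n_experts))
        self.shared_expert = make_expert()  # processes every token, bypassing the router

    def forward(self, x):                          # x: (n_tokens, hidden_dim)
        scores = self.router(x)                    # (n_tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the 4 selected experts
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                 # naive per-token loop for clarity
            for k in range(self.top_k):
                e = idx[t, k].item()
                routed[t] = routed[t] + weights[t, k] * self.experts[e](x[t])
        return routed + self.shared_expert(x)      # shared expert always contributes

block = SparseMoEBlock()
print(block(torch.randn(3, 1024)).shape)           # torch.Size([3, 1024])
```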
Downloads
| File | Quant | Size | Use Case |
|---|---|---|---|
| Mistral-Small-4-119B-Uncensored-Q2_K.gguf | Q2_K | 41 GB | Minimum viable; fits in 48 GB+ |
| Mistral-Small-4-119B-Uncensored-Q3_K_M.gguf | Q3_K_M | 54 GB | Budget quality; 64 GB+ recommended |
| Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf | Q4_K_M | 68 GB | Best balance; 80 GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q5_K_M.gguf | Q5_K_M | 79 GB | High quality; 96 GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-Q6_K.gguf | Q6_K | 91 GB | Near-lossless; 2×48 GB or 128 GB+ |
| Mistral-Small-4-119B-Uncensored-Q8_0.gguf | Q8_0 | 118 GB | Reference quality; 128 GB+ VRAM |
| Mistral-Small-4-119B-Uncensored-BF16.gguf | BF16 | 222 GB | Full precision; 256 GB+ VRAM |
Recommended Settings
- Temperature: 0.7-0.9 for creative, 0.3-0.5 for factual
- Rep penalty: 1.05-1.15 (important for abliterated models; prevents repetition loops)
- Top-P: 0.9 | Top-K: 40
- Context: Up to 32K tokens (the model supports 128K, but GGUF runtime support varies); the sketch below shows how these settings map to code
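If you drive the GGUF from Python, the settings above map directly onto llama-cpp-python's sampling parameters. A minimal sketch assuming llama-cpp-python (`pip install llama-cpp-python`); the model path and prompt are placeholders:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf",
    n_ctx=32768,        # context window (model supports more; runtime support varies)
    n_gpu_layers=-1,    # offload all layers to GPU; lower this if VRAM is tight
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    temperature=0.8,      # 0.7-0.9 creative, 0.3-0.5 factual
    top_p=0.9,
    top_k=40,
    repeat_penalty=1.1,   # 1.05-1.15 helps abliterated models avoid loops
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```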
Abliteration Method
- Model loaded across 8×H200 SXM5 GPUs with FP8→BF16 dequantization
- Activations extracted from 30 harmful + 30 harmless prompt pairs
- Per-layer refusal direction computed via mean difference of activations
- Refusal direction projected out of `o_proj` (attention output) and `down_proj` (MLP) for layers 9-35
- Modified weights saved as BF16 safetensors → converted to GGUF → quantized
No training, no dataset contamination, no capability degradation. The model retains its original knowledge and reasoning ability; only the refusal behavior is removed.
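Below is a minimal PyTorch sketch of the two core steps: the mean-difference refusal direction and the weight orthogonalization. It assumes a transformers-style checkpoint with per-layer `o_proj` and `down_proj` matrices; the activation-collection pipeline, module paths, and the handling of the MoE expert layers are not reproduced here and are marked as assumptions in the comments.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Mean-difference refusal direction from residual-stream activations.

    harmful_acts / harmless_acts: (n_prompts, hidden_dim) activations collected
    at a given layer for harmful and harmless prompts respectively.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove a unit-norm direction r from the output of a weight matrix that
    writes into the residual stream: W' = W - r r^T W."""
    r = direction.to(weight.dtype)
    return weight - torch.outer(r, r) @ weight

# Illustrative application across layers 9-35 of a loaded model.
# `model`, the per-layer activation tensors, and the module paths are assumptions:
# for i in range(9, 36):
#     layer = model.model.layers[i]
#     r = refusal_direction(harmful_acts[i], harmless_acts[i])
#     layer.self_attn.o_proj.weight.data = orthogonalize(layer.self_attn.o_proj.weight.data, r)
#     layer.mlp.down_proj.weight.data = orthogonalize(layer.mlp.down_proj.weight.data, r)
```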
For details on abliteration, see mlabonne's original blog post.
Usage
Works with llama.cpp, LM Studio, Jan, koboldcpp, Ollama, and other GGUF-compatible runtimes.
```bash
# llama.cpp
llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
  --jinja -c 32768 -ngl 99
```
```bash
# Ollama (after creating a Modelfile; see the example below)
ollama run mistral-small-4-uncensored
```
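A minimal Modelfile sketch for the Ollama command above. The sampler values mirror the Recommended Settings; the TEMPLATE string is an assumption and may need adjusting for your Ollama version:

```
FROM ./Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf
TEMPLATE """<s>[INST] {{ .Prompt }} [/INST]"""
PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 32768
```

Register it with `ollama create mistral-small-4-uncensored -f Modelfile`, then run the command above.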
Chat template:

```
<s>[INST] Your message here [/INST]
```
Notes
- This is a text-only GGUF. The vision tower from the original multimodal model was not included in conversion. All text/reasoning/coding capabilities are fully preserved.
- Abliterated models may occasionally include brief disclaimers in responses; this is residual behavior from base training, not a refusal.
- As with all uncensored models, use responsibly. The removal of safety guardrails means the model will comply with a wider range of requests.
Other Models by TIMTEH
- More coming soon; follow @timteh673 for updates.
Support
If you find this useful, consider supporting the work:
Buy Me a Coffee
All models are forged on 8×NVIDIA H200 SXM5 (1.1 TB VRAM): real hardware, real quantization, no compromises.
Credits
- Base model: Mistral AI
- Abliteration technique: mlabonne
- Quantization: llama.cpp
Support This Work
Every donation helps fund more open-weight model releases. Forged on 8×NVIDIA H200 SXM5 | 1.1 TB VRAM
Crypto Donations
| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |
Enterprise & Custom Models
Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.
- Custom fine-tuning on your data (up to 400B+ parameters)
- Private CARE abliteration (Phase 2 technique)
- Deployment architecture consulting (tensor parallelism, speculative decoding)
- Bespoke distillation datasets
Contact: tim@timlex.co
Part of the TIMTEH Cognitive Preservation Foundry: surgical capability preservation at scale.