Commit c3f9fd5 (verified) by timteh673 · 1 parent: e2a6c9e

Upload README.md with huggingface_hub

  1. README.md +118 -0
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ license: apache-2.0
+ tags:
+ - uncensored
+ - abliterated
+ - mistral
+ - moe
+ - gguf
+ - text-generation
+ - conversational
+ - mistral-small-4
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - it
+ - pt
+ - zh
+ - ja
+ - ko
+ - multilingual
+ pipeline_tag: text-generation
+ base_model: mistralai/Mistral-Small-4-119B-Instruct-2503
+ model_type: mistral
+ ---
+
+ # Mistral-Small-4-119B-Uncensored-GGUF
+
+ Mistral Small 4 119B uncensored via abliteration by TIMTEH. **Refusal direction removed from layers 9-35.**
+
+ ⚡ **Forged on 8×H200 SXM5 | 1.1TB VRAM**
+
+ ## About
+
+ Full abliteration of [mistralai/Mistral-Small-4-119B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-4-119B-Instruct-2503): no dataset changes and no fine-tuning. The refusal direction was identified and projected out of the weights that write to the model's residual stream in decoder layers 9-35, namely the attention output projections and the MLP down projections. The edit is intended to remove refusals while leaving other capabilities intact.
+
+ At the time of upload, this is the **first standard GGUF uncensored release** of Mistral Small 4 119B. The only other uncensored variant is [dealignai's JANG/MLX format](https://huggingface.co/dealignai) (Apple Silicon only, ~80 downloads).
+
+ ## Architecture
+
+ - **119B total parameters**: Mixture of Experts (128 routed experts + 1 shared expert per layer, 4 active per token)
+ - **36 decoder layers** with Multi-head Latent Attention (MLA): kv_lora_rank=256, q_lora_rank=1024
+ - **Multimodal base** (vision tower removed for the text-only GGUF; the text weights are unchanged by that removal)
+ - Released March 23, 2026 by Mistral AI
+
+ ## Downloads
+
+ | File | Quant | Size | Use case and suggested memory |
+ |------|-------|------|-------------------------------|
+ | Mistral-Small-4-119B-Uncensored-Q2_K.gguf | Q2_K | 41 GB | Minimum viable; fits 48 GB+ |
+ | Mistral-Small-4-119B-Uncensored-Q3_K_M.gguf | Q3_K_M | 54 GB | Budget quality; 64 GB+ recommended |
+ | Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf | Q4_K_M | 68 GB | **Best balance**; 80 GB+ VRAM |
+ | Mistral-Small-4-119B-Uncensored-Q5_K_M.gguf | Q5_K_M | 79 GB | High quality; 96 GB+ VRAM |
+ | Mistral-Small-4-119B-Uncensored-Q6_K.gguf | Q6_K | 91 GB | Near-lossless; 2×48 GB or 128 GB+ |
+ | Mistral-Small-4-119B-Uncensored-Q8_0.gguf | Q8_0 | 118 GB | Reference quality; 128 GB+ VRAM |
+ | Mistral-Small-4-119B-Uncensored-BF16.gguf | BF16 | 222 GB | Full precision; 256 GB+ VRAM |
+
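To make the table easier to apply, here is a minimal Python sketch that picks the largest quant fitting a given memory budget. The file sizes are copied from the table; the `headroom_gb` default is a hypothetical allowance for KV cache and runtime buffers, chosen only so that the helper agrees with the table's suggestions. It is not a measured figure.

```python
# File sizes in GB, copied from the Downloads table.
QUANT_SIZES_GB = {
    "Q2_K": 41,
    "Q3_K_M": 54,
    "Q4_K_M": 68,
    "Q5_K_M": 79,
    "Q6_K": 91,
    "Q8_0": 118,
    "BF16": 222,
}


def largest_quant_that_fits(memory_gb, headroom_gb=6.0):
    """Return the largest quant whose file fits in memory_gb minus headroom.

    headroom_gb is an assumed allowance for KV cache and runtime buffers.
    Returns None when even the smallest file does not fit.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    if not fitting:
        return None
    # Sizes are unique, so the tuple with the largest size wins.
    return max(fitting)[1]


if __name__ == "__main__":
    for mem in (48, 80, 128):
        print(mem, "GB ->", largest_quant_that_fits(mem))
```

Longer contexts need more headroom, so treat the result as a starting point and adjust `headroom_gb` for your runtime.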
+ ## Recommended Settings
+
+ - **Temperature:** 0.7-0.9 for creative, 0.3-0.5 for factual
+ - **Repetition penalty:** 1.05-1.15 (important for abliterated models, which can be prone to repetition loops)
+ - **Top-P:** 0.9 | **Top-K:** 40
+ - **Context:** Up to 32K tokens (the model supports 128K, but long-context support varies between GGUF runtimes)
+
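As an illustration, the settings above can be turned into a `llama-cli` command line. The sketch below is one way to do that. The preset names and the midpoint values (0.8, 0.4, 1.1) are arbitrary picks from the listed ranges, and `llama_cli_args` is a hypothetical helper, not part of llama.cpp.

```python
import shlex

# Midpoints of the recommended ranges; the exact values are an arbitrary choice.
PRESETS = {
    "creative": {"temp": 0.8, "repeat_penalty": 1.1},
    "factual": {"temp": 0.4, "repeat_penalty": 1.1},
}


def llama_cli_args(model_path, preset="creative", ctx=32768):
    """Build an argument list for llama-cli from one of the presets."""
    p = PRESETS[preset]
    return [
        "llama-cli",
        "-m", model_path,
        "--jinja",
        "-c", str(ctx),
        "-ngl", "99",
        "--temp", str(p["temp"]),
        "--top-p", "0.9",
        "--top-k", "40",
        "--repeat-penalty", str(p["repeat_penalty"]),
    ]


if __name__ == "__main__":
    args = llama_cli_args("Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf", "factual")
    # Print a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(args))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues.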
+ ## Abliteration Method
+
+ 1. Model loaded across 8×H200 SXM5 GPUs with FP8→BF16 dequantization
+ 2. Activations extracted from 30 harmful and 30 harmless prompts
+ 3. Per-layer refusal direction computed as the difference between the mean harmful and mean harmless activations
+ 4. Refusal direction projected out of `o_proj` (attention output) and `down_proj` (MLP) for layers 9-35
+ 5. Modified weights saved as BF16 safetensors → converted to GGUF → quantized
+
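Steps 3 and 4 can be sketched on toy data. This is a minimal pure-Python illustration under stated assumptions, not the script used for this release: weights are stored as `d_model` rows by `d_in` columns (so the layer output is `W x`), the activations are made-up numbers, and all function names are hypothetical. A real run would use tensors and apply each layer's own direction to that layer's `o_proj` and `down_proj`.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along mean(harmful) - mean(harmless)."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project_out(weight, direction):
    """Return W' = W - r (r^T W) for W stored as d_model rows of d_in columns.

    After this edit the output W' x has no component along r for any input x.
    """
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: one coefficient per input column.
    coeffs = [sum(direction[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    return [
        [weight[i][j] - direction[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_model)
    ]


def matvec(weight, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    # Toy activations in a 3-dimensional "residual stream" (made-up numbers).
    harmful = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
    harmless = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]
    r = refusal_direction(harmful, harmless)

    w = [[0.5, -1.0], [0.3, 0.7], [1.2, 0.4]]  # toy stand-in for o_proj / down_proj
    w_abl = project_out(w, r)

    out = matvec(w_abl, [1.0, -2.0])
    # Component of the edited layer's output along r; should be approximately 0.
    print(sum(a * b for a, b in zip(r, out)))
```

Because the projection is applied to the weights rather than to activations at inference time, the result can be saved and quantized like any other checkpoint, which matches step 5.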
+ No training and no dataset contamination are involved. The edit targets only the refusal behavior and is intended to leave the model's original knowledge and reasoning ability intact.
+
+ For details on abliteration, see [mlabonne's original blog post](https://huggingface.co/blog/mlabonne/abliteration).
+
+ ## Usage
+
+ Works with llama.cpp, LM Studio, Jan, koboldcpp, Ollama, and other GGUF-compatible runtimes.
+
+ ```bash
+ # llama.cpp
+ llama-cli -m Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf \
+   --jinja -c 32768 -ngl 99
+
+ # Ollama (after creating a Modelfile)
+ ollama run mistral-small-4-uncensored
+ ```
+
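The Ollama command assumes a Modelfile already exists. The README does not include one, so the sketch below generates a plausible starting point from the recommended settings. The `render_modelfile` helper and its default values are assumptions, and the local file path is only an example.

```python
def render_modelfile(gguf_path, temperature=0.7, repeat_penalty=1.1, num_ctx=32768):
    """Return Modelfile text pointing Ollama at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        "PARAMETER top_p 0.9",
        "PARAMETER top_k 40",
        f"PARAMETER repeat_penalty {repeat_penalty}",
        f"PARAMETER num_ctx {num_ctx}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the Modelfile text; redirect it to a file named "Modelfile" to use it.
    print(render_modelfile("./Mistral-Small-4-119B-Uncensored-Q4_K_M.gguf"), end="")
```

Save the output as `Modelfile`, then register it with `ollama create mistral-small-4-uncensored -f Modelfile` before running the `ollama run` command.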
+ ```
+ # Chat template
+ <s>[INST] Your message here [/INST]
+ ```
+
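For runtimes or raw completion endpoints that do not apply the chat template themselves, a single-turn prompt can be built by hand. The sketch below covers only the one-message form of the template; `format_prompt` is a hypothetical helper, and the `add_bos` switch exists because some runtimes prepend `<s>` on their own, in which case adding it again would duplicate the token.

```python
def format_prompt(user_message, add_bos=True):
    """Wrap one user message in the [INST] chat template."""
    bos = "<s>" if add_bos else ""
    return f"{bos}[INST] {user_message} [/INST]"


if __name__ == "__main__":
    print(format_prompt("Summarize the abliteration method in two sentences."))
```

When running `llama-cli` with `--jinja` as in the usage example, the template embedded in the GGUF is applied for you, so manual formatting should not be needed.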
+ ## Notes
+
+ - This is a **text-only** GGUF. The vision tower from the original multimodal model was not included in conversion; text, reasoning, and coding capabilities do not depend on it.
+ - Abliterated models may occasionally include brief disclaimers in responses. This is residual behavior from base training, not a refusal.
+ - As with all uncensored models, **use responsibly.** The removal of safety guardrails means the model will comply with a wider range of requests.
+
+ ## Other Models by TIMTEH
+
+ - More coming soon. Follow [@timteh673](https://huggingface.co/timteh673) for updates.
+
+ ## Support
+
+ If you find this useful, consider supporting the work:
+
+ ☕ **[Buy Me a Coffee](https://buymeacoffee.com/timteh)**
+
+ All models are forged on 8×NVIDIA H200 SXM5 (1.1TB VRAM): real hardware, real quantization, no compromises.
+
+ ## Credits
+
+ - **Base model:** [Mistral AI](https://huggingface.co/mistralai/Mistral-Small-4-119B-Instruct-2503)
+ - **Abliteration technique:** [mlabonne](https://huggingface.co/blog/mlabonne/abliteration)
+ - **Quantization:** [llama.cpp](https://github.com/ggerganov/llama.cpp)