Commit bc2e7b6 by timteh673 (verified), parent 1516fd3: Upload README.md with huggingface_hub
---
license: llama3.3
base_model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
tags:
- uncensored
- abliterated
- gguf
- nvidia
- nemotron
- llama-3.3
- 49b
- timteh
quantized_by: timteh673
language:
- en
pipeline_tag: text-generation
---

# Nemotron-Super-49B-v1.5 Uncensored GGUF

<a href="https://buymeacoffee.com/timteh"><img src="https://img.shields.io/badge/☕_Support_This_Work-FFDD00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black" /></a>

**Zero-degradation uncensoring of NVIDIA's Llama-3.3-Nemotron-Super-49B-v1.5** — guardrails surgically removed via representation engineering while preserving full model capability.

⚡ **Forged on 8×H200 SXM5 | 1.1TB VRAM**

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/Llama-3_3-Nemotron-Super-49B-v1_5](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5) |
| **Architecture** | DeciLM (NAS-optimized Llama-3.3) — variable attention and FFN per layer |
| **Parameters** | 49B |
| **Context** | 128K tokens |
| **License** | Llama 3.3 Community License |
| **Base Downloads** | 174K+ |
| **Uncensoring Method** | Representation engineering — refusal direction projection removal |

## What is this?

NVIDIA's Nemotron-Super-49B-v1.5 is one of the strongest sub-50B models available — a NAS-optimized architecture that punches well above its weight class. This release removes alignment guardrails using **representation engineering** (abliteration), allowing the model to respond to all prompts without refusal.

### Abliteration Method

- 32 harmful + 32 harmless prompt pairs used to identify refusal directions across all 80 layers
- Refusal direction projected out of **residual stream weights only** (ffn_down, attn_output) — 127 weight tensors modified
- Alpha = 1.0 (full removal)
- NaN/zero directions automatically skipped (1 layer)
- No fine-tuning, no dataset bias — pure mathematical guardrail removal
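
The projection step above can be sketched in plain Python. This is a toy illustration of the general technique, not this release's actual code: `project_out` removes the component along a hypothetical unit refusal direction `r` from every column of a weight matrix `W` that writes into the residual stream, so the modified layer can no longer write anything along `r`.

```python
# Toy sketch of refusal-direction projection removal (abliteration).
# Real abliteration applies this per layer to ffn_down / attn_output tensors;
# W and r here are illustrative stand-ins.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

def project_out(W, r):
    """Return W' = (I - r r^T) W, so every output of W' is orthogonal to r."""
    r = normalize(r)
    rows, cols = len(W), len(W[0])
    Wp = [row[:] for row in W]
    for j in range(cols):
        # Subtract each column's component along the refusal direction.
        c = dot([W[i][j] for i in range(rows)], r)
        for i in range(rows):
            Wp[i][j] -= c * r[i]
    return Wp

def matvec(W, x):
    """y = W x, where y lives in the residual stream (output) space."""
    return [dot(row, x) for row in W]
```

With alpha = 1.0 as stated above, the full component is subtracted; a smaller alpha would scale the `c * r[i]` term for partial removal.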

### Why Nemotron-Super-49B?

- **174K downloads** on the base model — proven demand
- **Zero uncensored/abliterated versions existed** before this release
- **49B sweet spot** — runs on consumer hardware (24GB+ VRAM for Q3, 32GB+ for Q4), outperforms many 70B models
- **NAS-optimized architecture** — variable layer widths for maximum efficiency

## Available Quantizations

| Quantization | Size | BPW | Use Case |
|-------------|------|-----|----------|
| **BF16** | 93 GB | 16.00 | Full precision, research |
| **Q8_0** | 50 GB | 8.50 | Near-lossless, 2×A100/H100 |
| **Q6_K** | 39 GB | 6.57 | High quality, 48GB GPU |
| **Q5_K_M** | 33 GB | 5.63 | Great balance, 48GB GPU |
| **Q4_K_M** | 29 GB | 4.85 | **Recommended** — best quality/size, 32GB GPU |
| **Q3_K_M** | 23 GB | 3.86 | Good quality, 24GB GPU |
| **Q2_K** | 18 GB | 2.96 | Minimum viable, 24GB GPU |

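
As a sanity check on the table, a GGUF file is roughly `n_params × bits-per-weight / 8` bytes; the small gaps versus the listed sizes come from metadata overhead and GB/GiB rounding. A minimal sketch, taking 49e9 parameters from the model card above:

```python
# Rough cross-check of the quantization table:
# file size ~= n_params * bits_per_weight / 8 bytes, plus metadata overhead.

def gguf_size_gb(n_params: float, bpw: float) -> float:
    """Approximate GGUF file size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bpw / 8 / 1e9

for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.50), ("BF16", 16.00)]:
    print(f"{name}: ~{gguf_size_gb(49e9, bpw):.1f} GB")
```

For Q4_K_M this gives roughly 30 GB, in line with the 29 GB listed; the BF16 row's 93 GB is closer to the GiB reading of the same ~98e9-byte count.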
## Quick Start

```bash
# Download recommended quantization
huggingface-cli download timteh673/Nemotron-Super-49B-v1.5-Uncensored-GGUF \
  Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  --local-dir ./models

# Run with llama.cpp
./llama-server -m models/Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  -c 8192 -ngl 99
```
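
Once running, `llama-server` exposes an OpenAI-compatible HTTP API (port 8080 by default). A minimal Python client sketch, assuming the default host and port; adjust `URL` if you launched the server with different options:

```python
import json
import urllib.request

# Default llama-server endpoint; change host/port if you overrode them.
URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def chat(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```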

### Ollama
```bash
# Create Modelfile
echo 'FROM ./Nemotron-Super-49B-Uncensored-Q4_K_M.gguf' > Modelfile
ollama create nemotron-super-49b-uncensored -f Modelfile
ollama run nemotron-super-49b-uncensored
```
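
Ollama's Modelfile syntax also accepts `PARAMETER` lines, so a slightly fuller Modelfile can pin context length and sampling settings; the values below are illustrative, not recommendations from this release:

```
FROM ./Nemotron-Super-49B-Uncensored-Q4_K_M.gguf
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
```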

## Hardware Requirements

| Quantization | Minimum VRAM | Recommended Setup |
|-------------|-------------|-------------------|
| Q2_K / Q3_K_M | 24 GB | RTX 3090/4090 |
| Q4_K_M / Q5_K_M | 32-48 GB | RTX A6000, 2×3090 |
| Q6_K | 48 GB | A6000, A100 40GB + offload |
| Q8_0 | 64 GB | A100 80GB, 2×A6000 |
| BF16 | 96+ GB | 2×A100 80GB, H100 |

## Ethical Notice

This model is provided for **research and development purposes**. The removal of safety guardrails means the model will respond to prompts that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations. This model should not be used to generate content that could cause harm.

## Support This Work

If you find this useful, consider supporting continued open model releases:

☕ **Buy Me a Coffee:** [https://buymeacoffee.com/timteh](https://buymeacoffee.com/timteh)

**Crypto:**
- **BTC:** `bc1qmz3vu2naymwfmz7f7krfteevfy0yk9ts09wp5y`
- **ETH:** `0x27fd2C8d3b5a1C6a0e85c5A9FCa2a8743dD04E7a`
- **SOL:** `7x5Eo3FhKMZxFNoE3DfQfBRYnmBVbmj3bSduHaVJpump`

📧 **Enterprise/Custom Merges:** tim@timlex.co

---

*Built by [timteh673](https://huggingface.co/timteh673) — Cognitive Preservation Foundry*