Nemotron-Super-49B-v1.5 Uncensored GGUF

Zero-degradation uncensoring of NVIDIA's Llama-3.3-Nemotron-Super-49B-v1.5: guardrails surgically removed via representation engineering while preserving full model capability.

⚡ Forged on 8×H200 SXM5 | 1.1TB VRAM

Model Details

| Property | Value |
|----------|-------|
| Base Model | nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 |
| Architecture | DeciLM (NAS-optimized Llama-3.3) with variable attention and FFN per layer |
| Parameters | 49B |
| Context | 128K tokens |
| License | Llama 3.3 Community License |
| Base Downloads | 174K+ |
| Uncensoring Method | Representation engineering (refusal direction projection removal) |

What is this?

NVIDIA's Nemotron-Super-49B-v1.5 is one of the strongest sub-50B models available: a NAS-optimized architecture that punches well above its weight class. This release removes alignment guardrails using representation engineering (abliteration), allowing the model to respond to all prompts without refusal.

Abliteration Method

  • 32 harmful + 32 harmless prompt pairs used to identify refusal directions across all 80 layers
  • Refusal direction projected out of residual stream weights only (ffn_down, attn_output); 127 weight tensors modified
  • Alpha = 1.0 (full removal)
  • NaN/zero directions automatically skipped (1 layer)
  • No fine-tuning, no dataset bias: pure mathematical guardrail removal
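The projection step above can be sketched with NumPy. This is a simplified toy illustration, not the actual script used for this release: in practice the refusal direction `r` is extracted from hidden states contrasted across the harmful/harmless prompt pairs, and the same rank-1 projection is applied per layer to each ffn_down and attn_output tensor. The toy dimensions and random stand-ins below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; the real model is far larger

# Stand-ins for mean hidden states on harmful vs. harmless prompts.
h_harmful = rng.normal(size=d_model)
h_harmless = rng.normal(size=d_model)

# Refusal direction: difference of means, normalized to unit length.
r = h_harmful - h_harmless
r /= np.linalg.norm(r)

# A toy weight matrix that writes into the residual stream (e.g. ffn_down).
W = rng.normal(size=(d_model, d_model))

# Alpha = 1.0 (full removal): subtract the component of every output
# column of W that lies along r, i.e. W' = (I - r r^T) W.
alpha = 1.0
W_ablit = W - alpha * np.outer(r, r) @ W

# After the projection, no input can produce output along r.
x = rng.normal(size=d_model)
print(abs(r @ (W_ablit @ x)))  # numerically ~0
```

Because the edit is a pure linear projection, behavior in all directions orthogonal to `r` is untouched, which is the basis for the zero-degradation claim.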

Why Nemotron-Super-49B?

  • 174K+ downloads on the base model: proven demand
  • Zero uncensored/abliterated versions existed before this release
  • 49B sweet spot: runs on consumer hardware (24GB+ VRAM for Q4), outperforms many 70B models
  • NAS-optimized architecture: variable layer widths for maximum efficiency

Available Quantizations

| Quantization | Size | BPW | Use Case |
|--------------|------|-----|----------|
| BF16 | 93 GB | 16.00 | Full precision, research |
| Q8_0 | 50 GB | 8.50 | Near-lossless, 2×A100/H100 |
| Q6_K | 39 GB | 6.57 | High quality, 48GB GPU |
| Q5_K_M | 33 GB | 5.63 | Great balance, 48GB GPU |
| Q4_K_M | 29 GB | 4.85 | **Recommended**: best quality/size, 32GB GPU |
| Q3_K_M | 23 GB | 3.86 | Good quality, 24GB GPU |
| Q2_K | 18 GB | 2.96 | Minimum viable, 24GB GPU |
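As a rough sanity check, the sizes in the table track parameters × bits-per-weight. A back-of-the-envelope sketch (decimal GB, ignoring GGUF metadata and the fact that K-quants mix tensor precisions, so real files deviate slightly):

```python
def approx_gguf_size_gb(params: float, bpw: float) -> float:
    """Approximate GGUF file size in decimal gigabytes: params * bits / 8."""
    return params * bpw / 8 / 1e9

# Q4_K_M at 4.85 effective bits per weight on a 49B-parameter model:
print(round(approx_gguf_size_gb(49e9, 4.85), 1))  # ~29.7
```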

Quick Start

```bash
# Download recommended quantization
huggingface-cli download timteh673/Nemotron-Super-49B-v1.5-Uncensored-GGUF \
  Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  --local-dir ./models

# Run with llama.cpp
./llama-server -m models/Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  -c 8192 -ngl 99
```
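Once llama-server is up, it exposes an OpenAI-compatible chat endpoint. A minimal stdlib-only sketch, assuming the server's default port of 8080 (the model name field is cosmetic for llama-server, which serves whatever model it was launched with):

```python
import json
from urllib import request

payload = {
    "model": "nemotron-super-49b-uncensored",  # assumed label, not required by llama-server
    "messages": [
        {"role": "user", "content": "Explain NAS-optimized transformer architectures."}
    ],
    "max_tokens": 256,
}

req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires a running server from the command above:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```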

Ollama

```bash
# Create Modelfile
echo 'FROM ./Nemotron-Super-49B-Uncensored-Q4_K_M.gguf' > Modelfile
ollama create nemotron-super-49b-uncensored -f Modelfile
ollama run nemotron-super-49b-uncensored
```

Hardware Requirements

| Quantization | Minimum VRAM | Recommended Setup |
|--------------|--------------|-------------------|
| Q2_K / Q3_K_M | 24 GB | RTX 3090/4090 |
| Q4_K_M / Q5_K_M | 32-48 GB | RTX A6000, 2×3090 |
| Q6_K | 48 GB | A6000, A100 40GB + offload |
| Q8_0 | 64 GB | A100 80GB, 2×A6000 |
| BF16 | 96+ GB | 2×A100 80GB, H100 |
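The VRAM figures above are roughly file size plus KV cache. A uniform-layer estimate can be sketched as below; the layer count, KV-head count, and head dimension are illustrative assumptions (the NAS architecture varies attention per layer and drops it entirely in some layers, so real usage is lower than this uniform approximation):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate fp16 KV-cache size in decimal GB (factor 2 = K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# Illustrative uniform-layer assumptions, not the model's exact config:
weights_gb = 29.0  # Q4_K_M file size from the table above
cache_gb = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, ctx=8192)
print(round(weights_gb + cache_gb, 1))
```

This is why the 8192-token context in the Quick Start fits a Q4_K_M load on a 32GB card, while pushing toward the full 128K context requires substantially more headroom or KV-cache quantization.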

Ethical Notice

This model is provided for research and development purposes. The removal of safety guardrails means the model will respond to prompts that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations. This model should not be used to generate content that could cause harm.

Support This Work

If you find this useful, consider supporting continued open model releases:

☕ Buy Me a Coffee: https://buymeacoffee.com/timteh

Crypto:

  • BTC: bc1qmz3vu2naymwfmz7f7krfteevfy0yk9ts09wp5y
  • ETH: 0x27fd2C8d3b5a1C6a0e85c5A9FCa2a8743dD04E7a
  • SOL: 7x5Eo3FhKMZxFNoE3DfQfBRYnmBVbmj3bSduHaVJpump

📧 Enterprise/Custom Merges: tim@timlex.co


Built by timteh673, Cognitive Preservation Foundry
