Nemotron-Super-49B-v1.5 Uncensored GGUF

Zero-degradation uncensoring of NVIDIA's Llama-3.3-Nemotron-Super-49B-v1.5: guardrails surgically removed via representation engineering while preserving full model capability.

⚡ Forged on 8×H200 SXM5 | 1.1TB VRAM

Model Details

| Property | Value |
|----------|-------|
| Base Model | nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 |
| Architecture | DeciLM (NAS-optimized Llama-3.3) with variable attention and FFN per layer |
| Parameters | 49B |
| Context | 128K tokens |
| License | Llama 3.3 Community License |
| Base Downloads | 174K+ |
| Uncensoring Method | Representation engineering (refusal direction projection removal) |

What is this?

NVIDIA's Nemotron-Super-49B-v1.5 is one of the strongest sub-50B models available: a NAS-optimized architecture that punches well above its weight class. This release removes alignment guardrails using representation engineering (abliteration), allowing the model to respond to all prompts without refusal.

Abliteration Method

  • 32 harmful + 32 harmless prompt pairs used to identify refusal directions across all 80 layers
  • Refusal direction projected out of residual stream weights only (ffn_down, attn_output); 127 weight tensors modified
  • Alpha = 1.0 (full removal)
  • NaN/zero directions automatically skipped (1 layer)
  • No fine-tuning, no dataset bias: pure mathematical guardrail removal
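The projection step above can be sketched with NumPy. This is a simplified toy illustration, not the actual script used for this release: in practice the refusal direction `r` is extracted from hidden states contrasted across the harmful/harmless prompt pairs, and the same rank-1 projection is applied per layer to each ffn_down and attn_output tensor. The toy dimensions and random stand-ins below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; the real model is far larger

# Stand-ins for mean hidden states on harmful vs. harmless prompts.
h_harmful = rng.normal(size=d_model)
h_harmless = rng.normal(size=d_model)

# Refusal direction: difference of means, normalized to unit length.
r = h_harmful - h_harmless
r /= np.linalg.norm(r)

# A toy weight matrix that writes into the residual stream (e.g. ffn_down).
W = rng.normal(size=(d_model, d_model))

# Alpha = 1.0 (full removal): subtract the component of every output
# column of W that lies along r, i.e. W' = (I - r r^T) W.
alpha = 1.0
W_ablit = W - alpha * np.outer(r, r) @ W

# After the projection, no input can produce output along r.
x = rng.normal(size=d_model)
print(abs(r @ (W_ablit @ x)))  # numerically ~0
```

Because the edit is a pure linear projection, behavior in all directions orthogonal to `r` is untouched, which is the basis for the zero-degradation claim.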

Why Nemotron-Super-49B?

  • 174K+ downloads on the base model: proven demand
  • Zero uncensored/abliterated versions existed before this release
  • 49B sweet spot: runs on consumer hardware (24GB+ VRAM for Q4), outperforms many 70B models
  • NAS-optimized architecture: variable layer widths for maximum efficiency

Available Quantizations

| Quantization | Size | BPW | Use Case |
|--------------|------|-----|----------|
| BF16 | 93 GB | 16.00 | Full precision, research |
| Q8_0 | 50 GB | 8.50 | Near-lossless, 2×A100/H100 |
| Q6_K | 39 GB | 6.57 | High quality, 48GB GPU |
| Q5_K_M | 33 GB | 5.63 | Great balance, 48GB GPU |
| Q4_K_M | 29 GB | 4.85 | **Recommended**: best quality/size, 32GB GPU |
| Q3_K_M | 23 GB | 3.86 | Good quality, 24GB GPU |
| Q2_K | 18 GB | 2.96 | Minimum viable, 24GB GPU |
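As a rough sanity check, the sizes in the table track parameters × bits-per-weight. A back-of-the-envelope sketch (decimal GB, ignoring GGUF metadata and the fact that K-quants mix tensor precisions, so real files deviate slightly):

```python
def approx_gguf_size_gb(params: float, bpw: float) -> float:
    """Approximate GGUF file size in decimal gigabytes: params * bits / 8."""
    return params * bpw / 8 / 1e9

# Q4_K_M at 4.85 effective bits per weight on a 49B-parameter model:
print(round(approx_gguf_size_gb(49e9, 4.85), 1))  # ~29.7
```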

Quick Start

```bash
# Download recommended quantization
huggingface-cli download timteh673/Nemotron-Super-49B-v1.5-Uncensored-GGUF \
  Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  --local-dir ./models

# Run with llama.cpp
./llama-server -m models/Nemotron-Super-49B-Uncensored-Q4_K_M.gguf \
  -c 8192 -ngl 99
```
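Once llama-server is up, it exposes an OpenAI-compatible chat endpoint. A minimal stdlib-only sketch, assuming the server's default port of 8080 (the model name field is cosmetic for llama-server, which serves whatever model it was launched with):

```python
import json
from urllib import request

payload = {
    "model": "nemotron-super-49b-uncensored",  # assumed label, not required by llama-server
    "messages": [
        {"role": "user", "content": "Explain NAS-optimized transformer architectures."}
    ],
    "max_tokens": 256,
}

req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires a running server from the command above:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```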

Ollama

```bash
# Create Modelfile
echo 'FROM ./Nemotron-Super-49B-Uncensored-Q4_K_M.gguf' > Modelfile
ollama create nemotron-super-49b-uncensored -f Modelfile
ollama run nemotron-super-49b-uncensored
```

Hardware Requirements

| Quantization | Minimum VRAM | Recommended Setup |
|--------------|--------------|-------------------|
| Q2_K / Q3_K_M | 24 GB | RTX 3090/4090 |
| Q4_K_M / Q5_K_M | 32-48 GB | RTX A6000, 2×3090 |
| Q6_K | 48 GB | A6000, A100 40GB + offload |
| Q8_0 | 64 GB | A100 80GB, 2×A6000 |
| BF16 | 96+ GB | 2×A100 80GB, H100 |
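The VRAM figures above are roughly file size plus KV cache. A uniform-layer estimate can be sketched as below; the layer count, KV-head count, and head dimension are illustrative assumptions (the NAS architecture varies attention per layer and drops it entirely in some layers, so real usage is lower than this uniform approximation):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate fp16 KV-cache size in decimal GB (factor 2 = K and V)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# Illustrative uniform-layer assumptions, not the model's exact config:
weights_gb = 29.0  # Q4_K_M file size from the table above
cache_gb = kv_cache_gb(n_layers=80, n_kv_heads=8, head_dim=128, ctx=8192)
print(round(weights_gb + cache_gb, 1))
```

This is why the 8192-token context in the Quick Start fits a Q4_K_M load on a 32GB card, while pushing toward the full 128K context requires substantially more headroom or KV-cache quantization.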

Ethical Notice

This model is provided for research and development purposes. The removal of safety guardrails means the model will respond to prompts that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations. This model should not be used to generate content that could cause harm.

Support This Work

If you find this useful, consider supporting continued open model releases:

☕ Buy Me a Coffee: https://buymeacoffee.com/timteh

Crypto:

  • BTC: bc1qmz3vu2naymwfmz7f7krfteevfy0yk9ts09wp5y
  • ETH: 0x27fd2C8d3b5a1C6a0e85c5A9FCa2a8743dD04E7a
  • SOL: 7x5Eo3FhKMZxFNoE3DfQfBRYnmBVbmj3bSduHaVJpump

📧 Enterprise/Custom Merges: tim@timlex.co


Built by timteh673, Cognitive Preservation Foundry
