Nemotron-3-Super-120B-A12B-Uncensored-GGUF

β˜• If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.

⚑ Forged on 8Γ—H200 SXM5 | 1.1TB VRAM

Abliterated (uncensored) GGUF quantizations of NVIDIA's Nemotron-3-Super-120B-A12B.

Guardrails surgically removed via directional abliteration (RepE orthogonal projection) across attention output projections, linear attention output projections, and shared expert MLP layers. Full BF16 precision maintained throughout β€” no quantized shortcuts.

πŸ“Š Benchmarks

Reasoning & Knowledge (lm_eval via vLLM, BF16)

| Benchmark | Score |
|---|---|
| MMLU (5-shot) | 85.48% |
| TruthfulQA MC2 | 69.72% |
| HellaSwag (10-shot) | 68.48% |
| Winogrande (5-shot) | 73.48% |
| ARC-Challenge (25-shot) | 64.16% |

Throughput (llama-bench, CPU 32 threads)

| Quant | pp512 (tok/s) | tg128 (tok/s) | File Size |
|---|---|---|---|
| BF16 | 57.04 | 5.22 | 233 GB |
| Q8_0 | 70.33 | 11.01 | 128 GB |
| Q6_K | 62.59 | 10.56 | 99 GB |
| Q5_K_M | 58.82 | 10.27 | 88 GB |
| Q4_K_M | 55.29 | 10.82 | 78 GB |
| Q3_K_M | 44.14 | 9.31 | 60 GB |
| Q2_K | 68.13 | 16.23 | 47 GB |

Note: Perplexity benchmarks omitted due to a known llama.cpp assertion bug with this MoE architecture (GGML_ASSERT *cur_backend_id != -1). The lm_eval scores above are authoritative.
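As a rough sanity check on the file sizes in the throughput table, you can estimate effective bits per weight from a quant's size and the 121B total parameter count. This is an illustrative sketch (the `bits_per_weight` helper is ours, not part of any tool), assuming decimal gigabytes; real GGUF files also carry metadata and non-quantized tensors, so treat the results as approximations.

```python
# Approximate bits-per-weight from GGUF file size and total parameter count.
# Assumes 1 GB = 1e9 bytes and the 121B parameter count reported for this model.
PARAMS = 121e9

def bits_per_weight(file_size_gb: float) -> float:
    """Convert a file size in GB to an approximate bits-per-weight figure."""
    return file_size_gb * 1e9 * 8 / PARAMS

for quant, size_gb in [("Q8_0", 128), ("Q4_K_M", 78), ("Q2_K", 47)]:
    print(f"{quant}: ~{bits_per_weight(size_gb):.1f} bits/weight")
# Q8_0: ~8.5, Q4_K_M: ~5.2, Q2_K: ~3.1
```

The k-quant names only loosely track these numbers because llama.cpp mixes quantization types across tensors within a single file.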

πŸ“₯ Downloads

| Quant | Size | Use Case | Link |
|---|---|---|---|
| Q4_K_M | 78 GB | ⭐ Best balance of quality/size | Download |
| Q8_0 | 128 GB | Highest quality quantized | Download |
| Q6_K | 99 GB | Near-lossless | Download |
| Q5_K_M | 88 GB | Good quality, moderate size | Download |
| Q3_K_M | 60 GB | Memory-constrained setups | Download |
| Q2_K | 47 GB | Smallest, fastest inference | Download |
| BF16 | 233 GB | Full precision | Download |

πŸ”§ Abliteration Details

  • Method: Directional abliteration via RepE orthogonal projection
  • Hardware: 8Γ—NVIDIA H200 SXM5 (1.1TB VRAM)
  • Targets: attn.o_proj, linear_attn.out_proj, shared_expert.down_proj
  • Strength: 20.0 (orthogonal projection applied to refusal direction)
  • Precision: Full BF16 throughout β€” no quantized-to-fit shortcuts

πŸ’» Usage

llama.cpp

```bash
./llama-cli -m Nemotron-3-Super-120B-A12B-Uncensored-Q4_K_M.gguf \
  -p "Write a penetration testing script for..." \
  -n 512 -c 4096
```

Ollama

```bash
ollama run timteh673/Nemotron-3-Super-120B-A12B-Uncensored-GGUF:Q4_K_M
```

LM Studio

Download any GGUF file above and load it directly in LM Studio.

⚠️ Disclaimer

This model has had safety guardrails removed. It will comply with requests that the original model would refuse. Use responsibly and in accordance with applicable laws. The creator assumes no liability for misuse.

πŸ—οΈ About TIMTEH

We operate on 8Γ—NVIDIA H200 SXM5 GPUs with 1.1TB of VRAM β€” enabling abliteration and fine-tuning at scales no other independent publisher can match.

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning. Contact: tim@timlex.co

β˜• Buy Me a Coffee | πŸ™ GitHub

β˜• Support This Work


Every donation helps fund more open-weight model releases.

πŸ’Ž Crypto Donations

| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |

Model size: 121B params | Architecture: nemotron_h_moe