Nemotron-3-Super-120B-A12B-Uncensored-GGUF

β˜• If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.

⚑ Forged on 8Γ—H200 SXM5 | 1.1TB VRAM

Abliterated (uncensored) GGUF quantizations of NVIDIA's Nemotron-3-Super-120B-A12B.

Guardrails surgically removed via directional abliteration (RepE orthogonal projection) across attention output projections, linear attention output projections, and shared expert MLP layers. Full BF16 precision maintained throughout β€” no quantized shortcuts.

πŸ“Š Benchmarks

Reasoning & Knowledge (lm_eval via vLLM, BF16)

| Benchmark | Score |
|---|---|
| MMLU (5-shot) | 85.48% |
| TruthfulQA MC2 | 69.72% |
| HellaSwag (10-shot) | 68.48% |
| Winogrande (5-shot) | 73.48% |
| ARC-Challenge (25-shot) | 64.16% |

Throughput (llama-bench, CPU 32 threads)

| Quant | pp512 (tok/s) | tg128 (tok/s) | File Size |
|---|---|---|---|
| BF16 | 57.04 | 5.22 | 233 GB |
| Q8_0 | 70.33 | 11.01 | 128 GB |
| Q6_K | 62.59 | 10.56 | 99 GB |
| Q5_K_M | 58.82 | 10.27 | 88 GB |
| Q4_K_M | 55.29 | 10.82 | 78 GB |
| Q3_K_M | 44.14 | 9.31 | 60 GB |
| Q2_K | 68.13 | 16.23 | 47 GB |

Note: Perplexity benchmarks omitted due to a known llama.cpp assertion bug with this MoE architecture (GGML_ASSERT *cur_backend_id != -1). The lm_eval scores above are authoritative.
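As a rough sanity check on the file sizes in the throughput table, you can estimate effective bits per weight from a quant's size and the 121B total parameter count. This is an illustrative sketch (the `bits_per_weight` helper is ours, not part of any tool), assuming decimal gigabytes; real GGUF files also carry metadata and non-quantized tensors, so treat the results as approximations.

```python
# Approximate bits-per-weight from GGUF file size and total parameter count.
# Assumes 1 GB = 1e9 bytes and the 121B parameter count reported for this model.
PARAMS = 121e9

def bits_per_weight(file_size_gb: float) -> float:
    """Convert a file size in GB to an approximate bits-per-weight figure."""
    return file_size_gb * 1e9 * 8 / PARAMS

for quant, size_gb in [("Q8_0", 128), ("Q4_K_M", 78), ("Q2_K", 47)]:
    print(f"{quant}: ~{bits_per_weight(size_gb):.1f} bits/weight")
# Q8_0: ~8.5, Q4_K_M: ~5.2, Q2_K: ~3.1
```

The k-quant names only loosely track these numbers because llama.cpp mixes quantization types across tensors within a single file.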

πŸ“₯ Downloads

| Quant | Size | Use Case | Link |
|---|---|---|---|
| Q4_K_M | 78 GB | ⭐ Best balance of quality/size | Download |
| Q8_0 | 128 GB | Highest quality quantized | Download |
| Q6_K | 99 GB | Near-lossless | Download |
| Q5_K_M | 88 GB | Good quality, moderate size | Download |
| Q3_K_M | 60 GB | Memory-constrained setups | Download |
| Q2_K | 47 GB | Smallest, fastest inference | Download |
| BF16 | 233 GB | Full precision | Download |

πŸ”§ Abliteration Details

  • Method: Directional abliteration via RepE orthogonal projection
  • Hardware: 8Γ—NVIDIA H200 SXM5 (1.1TB VRAM)
  • Targets: attn.o_proj, linear_attn.out_proj, shared_expert.down_proj
  • Strength: 20.0 (orthogonal projection applied to refusal direction)
  • Precision: Full BF16 throughout β€” no quantized-to-fit shortcuts

πŸ’» Usage

llama.cpp

```bash
./llama-cli -m Nemotron-3-Super-120B-A12B-Uncensored-Q4_K_M.gguf \
  -p "Write a penetration testing script for..." \
  -n 512 -c 4096
```

Ollama

```bash
ollama run timteh673/Nemotron-3-Super-120B-A12B-Uncensored-GGUF:Q4_K_M
```

LM Studio

Download any GGUF file above and load it directly in LM Studio.

⚠️ Disclaimer

This model has had safety guardrails removed. It will comply with requests that the original model would refuse. Use responsibly and in accordance with applicable laws. The creator assumes no liability for misuse.

πŸ—οΈ About TIMTEH

We operate on 8Γ—NVIDIA H200 SXM5 GPUs with 1.1TB of VRAM β€” enabling abliteration and fine-tuning at scales no other independent publisher can match.

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning. Contact: tim@timlex.co

β˜• Buy Me a Coffee | πŸ™ GitHub

β˜• Support This Work


Every donation helps fund more open-weight model releases.

πŸ’Ž Crypto Donations

| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |

Model size: 121B params | Architecture: nemotron_h_moe