Nemotron-3-Super-120B-A12B-Uncensored-GGUF
☕ If this model saves you time, buy me a coffee! Every cup fuels more open-weight releases.
⚡ Forged on 8×H200 SXM5 | 1.1 TB VRAM
Abliterated (uncensored) GGUF quantizations of NVIDIA's Nemotron-3-Super-120B-A12B.
Guardrails surgically removed via directional abliteration (RepE orthogonal projection) across attention output projections, linear-attention output projections, and shared-expert MLP layers. Full BF16 precision maintained throughout; no quantized shortcuts.
## 📊 Benchmarks
### Reasoning & Knowledge (lm_eval via vLLM, BF16)
| Benchmark | Score |
|---|---|
| MMLU (5-shot) | 85.48% |
| TruthfulQA MC2 | 69.72% |
| HellaSwag (10-shot) | 68.48% |
| Winogrande (5-shot) | 73.48% |
| ARC-Challenge (25-shot) | 64.16% |
### Throughput (llama-bench, CPU, 32 threads)
| Quant | pp512 (tok/s) | tg128 (tok/s) | File Size |
|---|---|---|---|
| BF16 | 57.04 | 5.22 | 233 GB |
| Q8_0 | 70.33 | 11.01 | 128 GB |
| Q6_K | 62.59 | 10.56 | 99 GB |
| Q5_K_M | 58.82 | 10.27 | 88 GB |
| Q4_K_M | 55.29 | 10.82 | 78 GB |
| Q3_K_M | 44.14 | 9.31 | 60 GB |
| Q2_K | 68.13 | 16.23 | 47 GB |
Note: Perplexity benchmarks omitted due to a known llama.cpp assertion bug with this MoE architecture (`GGML_ASSERT: *cur_backend_id != -1`). The lm_eval scores above are authoritative.
## 📥 Downloads
| Quant | Size | Use Case | Link |
|---|---|---|---|
| Q4_K_M | 78 GB | ✅ Best balance of quality/size | Download |
| Q8_0 | 128 GB | Highest quality quantized | Download |
| Q6_K | 99 GB | Near-lossless | Download |
| Q5_K_M | 88 GB | Good quality, moderate size | Download |
| Q3_K_M | 60 GB | Memory-constrained setups | Download |
| Q2_K | 47 GB | Smallest, fastest inference | Download |
| BF16 | 233 GB | Full precision | Download |
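As a rough rule of thumb, pick the largest quant whose file size still leaves headroom for the KV cache and runtime overhead in your RAM/VRAM budget. A minimal Python sketch using the file sizes from the table above (the 1.2× overhead factor is an assumption for illustration, not a measurement):

```python
# Pick the highest-quality quant from the Downloads table that fits a
# given memory budget. Sizes (GB) are the file sizes listed above; the
# 1.2x headroom factor for KV cache and runtime overhead is an assumption.
QUANT_SIZES_GB = {
    "BF16": 233, "Q8_0": 128, "Q6_K": 99, "Q5_K_M": 88,
    "Q4_K_M": 78, "Q3_K_M": 60, "Q2_K": 47,
}

def pick_quant(budget_gb: float, overhead: float = 1.2):
    """Return the highest-quality quant whose size * overhead fits, else None."""
    # dict preserves insertion order: entries are listed best-quality first
    fitting = (q for q, s in QUANT_SIZES_GB.items() if s * overhead <= budget_gb)
    return next(fitting, None)

print(pick_quant(96))  # -> Q4_K_M
```

For example, a 96 GB budget lands on Q4_K_M: 78 GB × 1.2 ≈ 93.6 GB fits, while Q5_K_M at 88 GB × 1.2 ≈ 105.6 GB does not.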
## 🧠 Abliteration Details
- Method: Directional abliteration via RepE orthogonal projection
- Hardware: 8×NVIDIA H200 SXM5 (1.1 TB VRAM)
- Targets: `attn.o_proj`, `linear_attn.out_proj`, `shared_expert.down_proj`
- Strength: 20.0 (orthogonal projection applied to the refusal direction)
- Precision: Full BF16 throughout; no quantized-to-fit shortcuts
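Conceptually, directional abliteration projects each target weight matrix onto the subspace orthogonal to the refusal direction, so those layers can no longer write along it. A minimal numpy sketch of the orthogonal-projection step only; the direction extraction, the strength-20.0 scaling semantics, and the exact layer hooks belong to the actual pipeline and are not reproduced here:

```python
import numpy as np

def abliterate(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the refusal direction from a weight matrix's output space.

    W: (d_out, d_in) output-projection weight.
    refusal_dir: (d_out,) refusal direction, estimated e.g. RepE-style from
    activation differences on refused vs. complied prompts (assumption:
    the real pipeline's extraction details differ).
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    # (I - r r^T) W : project every output column onto r's orthogonal complement
    return W - np.outer(r, r @ W)

# After abliteration, the weight writes nothing along the refusal direction:
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = abliterate(W, r)
assert np.allclose((r / np.linalg.norm(r)) @ W_abl, 0.0)
```

Since `r` is normalized, `rᵀ(W − r(rᵀW)) = rᵀW − rᵀW = 0`, which is what the final assertion checks.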
## 💻 Usage
### llama.cpp
```bash
./llama-cli -m Nemotron-3-Super-120B-A12B-Uncensored-Q4_K_M.gguf \
  -p "Write a penetration testing script for..." \
  -n 512 -c 4096
```
### Ollama
```bash
ollama run timteh673/Nemotron-3-Super-120B-A12B-Uncensored-GGUF:Q4_K_M
```
### LM Studio
Download any GGUF file and load directly in LM Studio.
## ⚠️ Disclaimer
This model has had safety guardrails removed. It will comply with requests that the original model would refuse. Use responsibly and in accordance with applicable laws. The creator assumes no liability for misuse.
## 🏗️ About TIMTEH
We operate on 8×NVIDIA H200 SXM5 GPUs with 1.1 TB of VRAM, enabling abliteration and fine-tuning at scales no other independent publisher can match.
Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning. Contact: tim@timlex.co
☕ Buy Me a Coffee | 🐙 GitHub
## ☕ Support This Work
Every donation helps fund more open-weight model releases.
## 💰 Crypto Donations
| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |