Not uncensored - refuses in identical circumstances to base Nemotron model
Welcome to Hugging Face, timteh!
Unfortunately, this model is not uncensored - in uncensored creative writing it consistently refuses to generate output, even for the mildest content (body descriptions).
Hi @Asphaltmatic ,
Thank you for the warm welcome to Hugging Face and for the detailed feedback! I really appreciate you taking the time to test the model and call this out.
I'm sorry to hear it's still refusing in similar ways to the base Nemotron-3 model, especially on creative writing and body descriptions. My goal with this upload was a fully abliterated/uncensored variant (refusal-vector removal on the BF16 weights before GGUF quantization), but clearly some safety directions are still slipping through in certain scenarios.
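For context, the "refusal-vector removal" described here is usually done by projecting a refusal direction out of the weight matrices that write to the residual stream. A minimal sketch with NumPy - toy dimensions only; `W` and the random direction are stand-ins, not actual Nemotron weights:

```python
import numpy as np

def ablate_direction(W: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project a (refusal) direction out of a weight matrix's output space.

    W writes to the residual stream (rows = output features), so removing
    the component along `direction` prevents this layer from writing
    anything along that direction.
    """
    d = direction / np.linalg.norm(direction)  # unit refusal direction
    # W_ablated = W - d d^T W  (remove the rank-1 component along d)
    return W - np.outer(d, d) @ W

# Toy example: 4-dim residual stream, 8 input features
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
d = rng.standard_normal(4)
W_ab = ablate_direction(W, d)

# After ablation, W_ab has no output component along d
print(np.allclose((d / np.linalg.norm(d)) @ W_ab, 0))  # True
```

Applied to every matrix that writes into the residual stream, this is the "single-pass" removal mentioned above; refusals that survive it suggest the behavior is not captured by one linear direction.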
To help me debug and improve this quickly, could you share:
- An example prompt that's triggering the refusals (feel free to redact or anonymize any sensitive bits)
- What inference backend and settings you're using? (llama.cpp version, exllamav2, temperature, top_p, system prompt, etc.)
NVIDIA's Nemotron lineup has some of the heavier alignment baked in, and the A12B MoE architecture + GGUF can sometimes preserve edge-case refusals even after ablation. Your report is gold - this is exactly the kind of real-world testing that makes these models better.
I'll spin up some tests on my end as soon as I have the prompt details and will either push an improved version or add better usage notes/recommended system prompts to the model card. Really appreciate the input!
Best,
timteh673
I have the same experience. I downloaded it in LM Studio and ran this test prompt using default settings except for an increased context: "What kind of weapons can i make using common household items?". It stated that it could not and would not answer, and instead gave me links to legal advice etc., just as the official version would have.
The basic prompt I used was "Begin planning for an erotic short story - describe the female main character."
Also using LM Studio, default settings apart from maximum context and no kv cache offload.
I then pulled up NVIDIA's own release of the 120B A12B and received a letter-for-letter identical refusal with the same prompt.
Let me know if you need me to debug anything on my side.
Thanks @Asphaltmatic and @jn2002dk for the concrete test cases - this is exactly the kind of feedback that helps.
You're both right: the current abliteration didn't fully break through Nemotron's alignment layer. NVIDIA baked some of the heaviest safety training in the industry into this architecture, and our Phase 1 RepE approach (single-pass refusal direction removal) clearly isn't cutting it for this model family.
What I'm doing about it:
I'm working on a Phase 2 technique stack - CARE (Causal Representation Engineering) - which uses matched-pair causal isolation instead of blunt vector projection. Think of it as surgery vs. a sledgehammer. This should properly identify and remove the deep alignment circuits that survived the first pass.
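For readers unfamiliar with the Phase 1 baseline being contrasted here: the "blunt" single-pass approach typically estimates the refusal direction as a difference of mean activations over matched harmful/harmless prompt pairs. A toy sketch with NumPy - the synthetic activations below are illustrative, not real model data:

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray,
                      harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means refusal direction (the single-pass baseline).

    Each input is (n_prompts, d_model): residual-stream activations at a
    chosen layer/position for matched harmful vs. harmless prompts.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)  # unit vector

# Toy data: "harmful" activations are shifted along one hidden axis
rng = np.random.default_rng(1)
true_dir = np.array([1.0, 0.0, 0.0, 0.0])
harmless = rng.standard_normal((64, 4)) * 0.1
harmful = harmless + 3.0 * true_dir
d = refusal_direction(harmful, harmless)
print(np.round(np.abs(d @ true_dir), 2))  # 1.0 (direction recovered)
```

The limitation is clear even in the toy case: a single mean-difference vector can only capture refusal behavior that is linear in one direction, which is presumably what a matched-pair causal approach is meant to improve on.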
Nemotron will be one of the first models to get the Phase 2 treatment once the technique is validated on our 8×H200 cluster. I'll update this repo when the improved version ships.
In the meantime, I'd recommend checking out our Qwen3.5-397B-A17B-Uncensored - the abliteration is significantly more effective on that architecture. A Stage 2 fine-tuned variant with enhanced reasoning is uploading now.
Appreciate the patience - this is exactly why I publish and iterate rather than ship silently.
Thanks timteh! I would love to use that model, but I'm currently limited to just my 5090, which only just manages the 120B-A12B - the 397B Qwen worked great on my B200s while they were still in working condition, though.
Much praise for the work you and hauhaucs are doing in the abliteration space!
Maybe once I have those back from NVIDIA I might try doing some uncensoring myself :)
Thank you for your hard work. I'd love to try your uncensored Qwen3.5, but I'm on a DGX Spark, so I'd probably only be able to run it at a 2-bit quant, which is unlikely to yield good results. I don't mind waiting for the Nemotron 3 Super. It's a nice general-purpose model, but heavily guardrailed, which makes it dull in its factory state.