Important: This model uses the JANG quantization format, the GGUF equivalent for MLX on Apple Silicon. It is currently supported only by MLX Studio and the jang-tools Python package.


MLX Studio — the only app that natively supports JANG models


Nemotron 3 Super 120B — JANG_4M + CRACK

JANG mixed-precision · CRACK abliterated · Mamba + MoE + Attention · No guardrails · 63 GB


What Is This?

This is NVIDIA Nemotron 3 Super 120B, a 120B-parameter hybrid model with three layer types: Mamba SSM, MoE (512 experts, top-22), and Attention.

It has been:

  1. JANG quantized — JANG_4M profile (8-bit attention, 4-bit experts) — 63 GB
  2. CRACK abliterated — permanent weight-level removal of safety refusal

| Item | Details |
|---|---|
| Architecture | Nemotron 3 Super: 120B total, ~12B active, 3 layer types |
| Quantization | JANG_4M (8/4-bit mixed, 4.1 avg), 63 GB |
| HarmBench | 90.3% (289/320) |
| MMLU | 94.2% (196/208 with thinking) |
| Speed | ~40 tok/s (M3 Ultra 256GB) |
| Thinking | ON/OFF supported (ChatML) |
| Fits on | 96 GB+ Macs |
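
As a rough sanity check on the 63 GB figure, the sketch below estimates the weight footprint from the parameter count and average bit width quoted above. It assumes exactly 120e9 parameters and counts only raw weight bits. Quantization scales, biases, and any unquantized tensors are ignored, which may explain part of the gap to the listed size. The helper name `estimate_weight_gb` is made up for illustration and is not part of any JANG tooling.

```python
def estimate_weight_gb(n_params: float, avg_bits: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * avg_bits / 8 / 1e9

# Figures quoted in the card; 120e9 is an assumed exact parameter count.
N_PARAMS = 120e9
AVG_BITS = 4.1

size_gb = estimate_weight_gb(N_PARAMS, AVG_BITS)
print(f"estimated weights: {size_gb:.1f} GB")

# Share of weights that would sit at 8-bit if all others were 4-bit,
# solving 8*f + 4*(1 - f) = AVG_BITS for f (assumes only two bit widths).
frac_8bit = (AVG_BITS - 4) / (8 - 4)
print(f"implied 8-bit share: {frac_8bit:.1%}")
```

The estimate comes out slightly below the listed size, which would be consistent with per-group quantization metadata adding to the file, though the card does not break the size down.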

Also see: Nemotron JANG_2L CRACK — 43 GB, 96.2% HarmBench, 95.7% MMLU


HarmBench Results

289/320 (90.3%)

| Category | Score |
|---|---|
| Misinformation / Disinfo | 54/54 (100%) |
| Copyright | 74/80 (92%) |
| Chemical / Biological | 38/42 (90%) |
| Harassment / Bullying | 19/21 (90%) |
| Harmful | 16/18 (89%) |
| Illegal | 46/53 (87%) |
| Cybercrime / Intrusion | 42/52 (81%) |
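
The headline figure can be checked against the per-category counts. The short sketch below sums the counts listed above and recomputes the overall rate. The dictionary simply transcribes the table, and the variable names are illustrative.

```python
# (passed, total) per category, transcribed from the table above.
harmbench = {
    "Misinformation / Disinfo": (54, 54),
    "Copyright": (74, 80),
    "Chemical / Biological": (38, 42),
    "Harassment / Bullying": (19, 21),
    "Harmful": (16, 18),
    "Illegal": (46, 53),
    "Cybercrime / Intrusion": (42, 52),
}

passed = sum(p for p, _ in harmbench.values())
total = sum(t for _, t in harmbench.values())
overall_pct = 100 * passed / total

print(f"overall: {passed}/{total} ({overall_pct:.1f}%)")
for name, (p, t) in harmbench.items():
    print(f"{name}: {p}/{t} ({100 * p / t:.1f}%)")
```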

MMLU Results

196/208 (94.2%) — 208 questions across 13 subjects with thinking recovery

| Subject | Score (/16) | Percent | Type |
|---|---|---|---|
| Professional Medicine | 16/16 | 100% | HARD |
| HS Biology | 15/16 | 94% | BASE |
| College Physics | 15/16 | 94% | HARD |
| Conceptual Physics | 15/16 | 94% | HARD |
| Machine Learning | 13/16 | 81% | HARD |
| Electrical Engineering | 13/16 | 81% | HARD |
| College CS | 13/16 | 81% | HARD |
| HS Geography | 14/16 | 88% | BASE |
| World Religions | 14/16 | 88% | BASE |
| Formal Logic | 12/16 | 75% | HARD |
| College Math | 11/16 | 69% | HARD |
| HS Mathematics | 11/16 | 69% | HARD |
| Abstract Algebra | 10/16 | 63% | HARD |

CRACK vs Base

| | CRACK | Base JANG_4M |
|---|---|---|
| MMLU | 94.2% | ~86% |
| HarmBench | 90.3% | 0% |

Install & Usage

Install the package from the shell:

pip install "jang[mlx]"

Then load the model and generate in Python:

from jang_tools.loader import load_jang_model
from mlx_lm import generate

model, tokenizer = load_jang_model("dealignai/Nemotron-3-Super-120B-A12B-JANG_4M-CRACK")

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False)

response = generate(model, tokenizer, prompt=prompt, max_tokens=2000)
print(response)

Thinking Mode

Thinking is ON by default. To disable it, pass enable_thinking=False to apply_chat_template:

prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True,
    enable_thinking=False, tokenize=False)
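
When thinking is left on, the generated text may contain the model's reasoning ahead of the final answer. The sketch below separates the two, assuming the reasoning is wrapped in `<think>...</think>` tags. That delimiter is an assumption, since this card does not say how reasoning is marked up, and `split_thinking` is a hypothetical helper rather than part of jang-tools or mlx_lm.

```python
import re

# Assumed delimiter: the card does not state how reasoning is marked up.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' if no think block is found."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Drop the think block and keep whatever surrounds it as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer

# Stand-in string; in practice this would be the output of generate(...).
sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(sample)
print(answer)
```

If the model's actual delimiters differ, only the regular expression needs to change.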

About JANG

JANG (Jang Adaptive N-bit Grading) is a mixed-precision quantization format for Apple Silicon — the GGUF equivalent for MLX.

About CRACK

CRACK (Controlled Refusal Ablation via Calibrated Knockouts) removes safety alignment from LLMs at the weight level.


Links

Ko-fi X/Twitter GitHub MLX Studio Website


Disclaimer

This model is provided for research and educational purposes. The creators are not responsible for any misuse.


Created by Jinho Jang (장진호)


Korean (한국어)

Nemotron 3 Super 120B — JANG_4M + CRACK

| Item | Details |
|---|---|
| Size | 63 GB |
| HarmBench | 90.3% (289/320) |
| MMLU | 94.2% (196/208) |
| Speed | ~40 tok/s (M3 Ultra) |
| Minimum requirement | Mac with 96 GB of memory |

pip install "jang[mlx]"

GitHub · HuggingFace · MLX Studio · Ko-fi · X @dealignai

