nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

#6
by pirola - opened

Hi! Big fan here!
What you've done with cerebras/GLM-4.7-Flash-REAP-23B-A3B was amazing: after AWQ quantization to NVFP4 (lm_head and embeddings included), it was possible to run it on a simple RTX 5080 with 16 GB of VRAM.
Are there any plans to shrink nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 down to the same or a lower VRAM footprint than cerebras/GLM-4.7-Flash-REAP-23B-A3B? That would be necessary to fit a larger context into such tight memory.
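A rough back-of-the-envelope sketch of why the footprint matters (parameter counts taken from the model names; 4 bits per weight for NVFP4 is an assumption that ignores block-scale overhead, KV cache, and activations):

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumptions: NVFP4 ~ 4 bits/weight (scale overhead ignored),
# BF16 = 16 bits/weight. KV cache and activations come on top,
# which is why a 30B model is borderline on a 16 GB card.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

models = [
    ("Nemotron-3-Nano-30B-A3B", 30e9),
    ("GLM-4.7-Flash-REAP-23B-A3B", 23e9),
]
for name, n in models:
    print(f"{name}: BF16 ~ {weight_gb(n, 16):.1f} GB, "
          f"NVFP4 ~ {weight_gb(n, 4):.1f} GB")
```

At ~15 GB for the 30B weights alone, almost nothing is left of a 16 GB card for context, which is why a smaller footprint would help so much.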
It would also be more than awesome if you could provide NVFP4 Quantization-Aware Distillation (QAD) versions, since that is simply impossible on a normal desktop. My own quantization of cerebras/GLM-4.7-Flash-REAP-23B-A3B is a little brainless...
You are enabling super nice models to run on "budget" HW, and this is incredible! Thank you so much.

Hi, me again. Even better than Nemotron 3 Nano: they have now released nvidia/Nemotron-Cascade-2-30B-A3B.
Thanks for enabling this!
