nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

#6
by pirola - opened

Hi! Big fan here!
What you've done with cerebras/GLM-4.7-Flash-REAP-23B-A3B was amazing: after AWQ quantization to NVFP4 (lm_head and embeddings included), it was possible to run it on a simple RTX 5080 with 16 GB of VRAM.
Are there any plans to shrink nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 down to the same or a lower VRAM footprint than cerebras/GLM-4.7-Flash-REAP-23B-A3B? That would be necessary to fit a larger context into such tight memory.
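A rough back-of-the-envelope sketch of why the footprint matters (parameter counts taken from the model names; 4 bits per weight for NVFP4 is an assumption that ignores block-scale overhead, KV cache, and activations):

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumptions: NVFP4 ~ 4 bits/weight (scale overhead ignored),
# BF16 = 16 bits/weight. KV cache and activations come on top,
# which is why a 30B model is borderline on a 16 GB card.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

models = [
    ("Nemotron-3-Nano-30B-A3B", 30e9),
    ("GLM-4.7-Flash-REAP-23B-A3B", 23e9),
]
for name, n in models:
    print(f"{name}: BF16 ~ {weight_gb(n, 16):.1f} GB, "
          f"NVFP4 ~ {weight_gb(n, 4):.1f} GB")
```

At ~15 GB for the 30B weights alone, almost nothing is left of a 16 GB card for context, which is why a smaller footprint would help so much.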
It would also be more than awesome if you could provide NVFP4 Quantization-Aware Distillation (QAD) versions, since that is simply impossible on a normal desktop. My own quantization of cerebras/GLM-4.7-Flash-REAP-23B-A3B is a little brainless...
You are enabling super nice models to run on "budget" HW, and this is incredible! Thank you so much.

Hi, me again. Even better than Nemotron 3 Nano: they have now released nvidia/Nemotron-Cascade-2-30B-A3B.
Thanks for enabling this!
