Pruned version of Nemotron-3-Nano-20B-A3B to run with full context on an RTX 5080

#12
by pirola - opened

What are the chances you could provide a pruned version of Nemotron-3-Nano-20B-A3B, distilled and subsequently quantized with QAD to NVFP4, so that it runs with full context on an RTX 5080 with 16 GB of VRAM? THAT would be cool.
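
For a rough sense of whether that could fit, here is a back-of-the-envelope sketch of the weight footprint only. It assumes every one of the 20B parameters is stored in NVFP4 (4-bit values plus one 8-bit scale per 16-element block, so about 4.5 bits per weight); in practice some layers usually stay at higher precision, and the KV cache, Mamba state, and activations for a full context come on top. The function name and the 16 GB budget are just illustrative.

```python
def nvfp4_weight_gb(n_params: float, block_size: int = 16, scale_bits: int = 8) -> float:
    """Approximate weight storage in GB (10^9 bytes) for NVFP4-quantized weights.

    Assumption: all parameters are quantized, each block of `block_size`
    4-bit values shares one `scale_bits`-bit scale factor.
    """
    bits_per_weight = 4 + scale_bits / block_size  # 4.5 bits with the defaults
    return n_params * bits_per_weight / 8 / 1e9


vram_gb = 16.0                        # RTX 5080 VRAM, as stated in the post
weights_gb = nvfp4_weight_gb(20e9)    # hypothetical pruned 20B-parameter model
headroom_gb = vram_gb - weights_gb    # what is left for context and runtime overhead

print(f"weights: {weights_gb:.2f} GB, headroom: {headroom_gb:.2f} GB")
```

Under these assumptions the weights take roughly 11 GB, which leaves under 5 GB for everything else, so whether full context fits would depend on how much memory the cache and runtime need.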