Pruned version of Nemotron-3-Nano-20B-A3B to run with full context on an RTX 5080
#12
by pirola - opened
What are the chances you could provide a pruned version of Nemotron-3-Nano-20B-A3B, distilled and then quantized to NVFP4 via QAD, so it can run with full context on an RTX 5080 with 16 GB of VRAM? THAT would be cool.