RedHatAI/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-quantized.w4a16
Text Generation • 2B • Updated • 5
OpenSource and AI
SNLP: Layer-Parallel Inference via Structured Newton Corrections
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation