RedHatAI/Llama-2-7b-ultrachat200k-pruned_70-quantized-deepsparse
Text Generation
OpenSource and AI
SNLP: Layer-Parallel Inference via Structured Newton Corrections
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation