Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • not-lain • Jan 30, 2025 • 328
Article • Train AI models with Unsloth and Hugging Face Jobs for FREE • burtenshaw, danielhanchen, shimmyshimmer, mlabonne, davanstrien, evalstate • Feb 20 • 101
Article • GGML and llama.cpp join HF to ensure the long-term progress of Local AI • ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 505
HuggingFaceTB/SmolLM2-135M-Instruct Text Generation • 0.1B • Updated Sep 22, 2025 • 1.57M • 318
Article • We Got Claude to Build CUDA Kernels and teach open models! • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
Article • Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model • nvidia • Feb 4 • 28
Article • The Optimal Architecture for Small Language Models • codelion • Dec 26, 2025 • 120
Article • I built a spot market for bare metal GPUs (and how to get A100s for $0.38/hr) • JackJackJ • Dec 16, 2025 • 2