Article • Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • Dec 4, 2025 • 64
Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025 • 230
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published Dec 12, 2025 • 22
Running • The Eiffel Tower Llama 📝 • 106 • Explore the Eiffel Tower Llama experiment with open-source models
Running • Unlocking On-Policy Distillation for Any Model Family 📝 • 82 • Improve model performance by transferring knowledge between different model families
Running on CPU Upgrade • Featured • The Smol Training Playbook 📚 • 2.97k • The secrets to building world-class LLMs