The Synthetic Data Playbook: Generating Trillions of the Finest Tokens • Explore synthetic data experiments in a bookshelf view • 159
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k Zero-Shot Image Classification • Updated Jan 22, 2025 • 56.6k • 306
The Smol Training Playbook • Featured • The secrets to building world-class LLMs • 3.04k
google/siglip2-large-patch16-512 Zero-Shot Image Classification • 0.9B • Updated Feb 21, 2025 • 74.8k • 18
google/siglip-large-patch16-384 Zero-Shot Image Classification • 0.7B • Updated Sep 26, 2024 • 17.8k • 11
nomic-ai/nomic-embed-vision-v1.5 Image Feature Extraction • 92.9M • Updated Mar 31, 2025 • 619k • 215
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14, 2025 • 60