FineWeb-HQ datasets Collection • Contains the FineWeb-HQ and FineWeb2-HQ quality-filtered datasets • 3 items • Updated Oct 8, 2025
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 77
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 84
FineWeb: decanting the web for the finest text data at scale 🍷 • 1.28k • Generate high-quality text data for LLMs using FineWeb