🧠 Traditional Chinese Reasoning Datasets Collection A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated Oct 13, 2025 • 9
Article SmolVLM Grows Smaller – Introducing the 256M & 500M Models! Jan 23, 2025 • 192
Article Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies Feb 17, 2025 • 29
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 264
Breeze 2 Family Collection Llama-Breeze2 is a multi-modal language model family specifically intended for Traditional Chinese use. BreezyVoice is a Taiwan Mandarin TTS • 6 items • Updated Feb 26, 2025 • 19
high-quality Chinese training datasets Collection A suite of high-quality Chinese datasets used for pretraining, fine-tuning, or preference alignment, along with the models trained on these datasets. • 13 items • Updated May 22, 2025 • 24
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video Paper • 2411.18671 • Published Nov 27, 2024 • 20
LLM2CLIP Collection LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever. • 10 items • Updated 4 days ago • 66
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Paper • 2411.05738 • Published Nov 8, 2024 • 14
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 64
Seeing Faces in Things: A Model and Dataset for Pareidolia Paper • 2409.16143 • Published Sep 24, 2024 • 17
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published Sep 23, 2024 • 42
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 50