Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding • Paper • 2605.02290 • Published 25 days ago • 40
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex • Paper • 2605.06139 • Published 22 days ago • 66
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • Paper • 2511.06221 • Published Nov 9, 2025 • 134