Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published 5 days ago • 24
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models Paper • 2602.02537 • Published Jan 28 • 6
Generative Frame Sampler for Long Video Understanding Paper • 2503.09146 • Published Mar 12, 2025 • 1
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 9 days ago • 78
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks Paper • 2503.06885 • Published Mar 10, 2025 • 4
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29, 2025 • 38
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published Apr 10, 2025 • 43
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 306
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 159
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 20
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 441
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22, 2025 • 126
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 46