Multimodal
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
Paper
• 2510.08540
• Published
• 109
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published
• 166
Spotlight on Token Perception for Multimodal Reinforcement Learning
Paper
• 2510.09285
• Published
• 37
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
Paper
• 2510.17354
• Published
• 35
RL makes MLLMs see better than SFT
Paper
• 2510.16333
• Published
• 49
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Paper
• 2510.27492
• Published
• 86
Visual Representation Alignment for Multimodal Large Language Models
Paper
• 2509.07979
• Published
• 84
Kwai Keye-VL 1.5 Technical Report
Paper
• 2509.01563
• Published
• 38
SAM 3: Segment Anything with Concepts
Paper
• 2511.16719
• Published
• 129
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published
• 20
Kimi K2.5: Visual Agentic Intelligence
Paper
• 2602.02276
• Published
• 238
ERNIE 5.0 Technical Report
Paper
• 2602.04705
• Published
• 254