Multimodal - a Chevolier Collection

Multimodal

updated May 28

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 110
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 171
Spotlight on Token Perception for Multimodal Reinforcement Learning

Paper • 2510.09285 • Published Oct 10, 2025 • 37
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Paper • 2510.17354 • Published Oct 20, 2025 • 35
RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18, 2025 • 49
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 88
Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9, 2025 • 84
Kwai Keye-VL 1.5 Technical Report

Paper • 2509.01563 • Published Sep 1, 2025 • 41
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 138
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published Dec 2, 2025 • 21
Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2, 2026 • 278
ERNIE 5.0 Technical Report

Paper • 2602.04705 • Published Feb 4, 2026 • 269
Your Embedding Model is SMARTer Than You Think

Paper • 2605.24938 • Published May 24, 2026 • 25
Towards Customized Multimodal Role-Play

Paper • 2605.08129 • Published May 1, 2026 • 10
Advancing Creative Physical Intelligence in Large Multimodal Models

Paper • 2605.26396 • Published May 25, 2026 • 21