EarlyTom: Early Token Compression Completes Fast Video Understanding Paper • 2605.30010 • Published 11 days ago • 32
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding Paper • 2510.23479 • Published Oct 27, 2025 • 18
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs Paper • 2603.19217 • Published Mar 19 • 29
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution Paper • 2605.21195 • Published 19 days ago • 19
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution Paper • 2605.21195 • Published 19 days ago • 19
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks Paper • 2605.10977 • Published about 1 month ago • 10
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms Paper • 2604.23775 • Published Apr 26 • 45
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs Paper • 2603.19217 • Published Mar 19 • 29
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published Feb 26 • 202
Thinking with Drafting: Optical Decompression via Logical Reconstruction Paper • 2602.11731 • Published Feb 12 • 36
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published Dec 29, 2025 • 15
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published Dec 1, 2025 • 95
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Paper • 2511.14582 • Published Nov 18, 2025 • 19
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25, 2025 • 33