Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models Paper • 2603.17541 • Published Mar 18 • 20
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published Dec 28, 2025 • 20