Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World Paper • 2603.12746 • Published Mar 13 • 1
Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World Paper • 2603.12746 • Published Mar 13 • 1
SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation Paper • 2602.02402 • Published Feb 2 • 32
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams Paper • 2601.02281 • Published Jan 5 • 33
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published Dec 18, 2025 • 76
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Paper • 2511.23002 • Published Nov 28, 2025 • 26
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation Paper • 2512.08186 • Published Dec 9, 2025 • 23