How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published 5 days ago • 18
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 11 days ago • 105
Forecasting Downstream Performance of LLMs With Proxy Metrics Paper • 2605.18607 • Published 13 days ago • 14
RiT: Vanilla Diffusion Transformers Suffice in Representation Space Paper • 2605.21981 • Published 10 days ago • 10
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published 19 days ago • 61
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure Paper • 2604.11045 • Published Apr 13 • 26
FORGE:Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 95
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views Paper • 2603.27183 • Published Mar 28 • 20
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98 • 5
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98