WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 9 days ago • 100
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning Paper • 2606.07190 • Published 12 days ago • 34
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 14 days ago • 119
RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video Paper • 2605.31535 • Published 19 days ago • 7
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 21 days ago • 423