WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 7 days ago • 97
InterleaveThinker: Reinforcing Agentic Interleaved Generation Paper • 2606.13679 • Published 4 days ago • 77
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 5 days ago • 63
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 12 days ago • 117
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control Paper • 2605.27891 • Published 19 days ago • 8