Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models Paper • 2604.27251 • Published 7 days ago • 7
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 27 days ago • 261
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 28 days ago • 323
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 29 days ago • 119
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 626
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 341