ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 4 days ago • 242
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published 7 days ago • 22
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 11 days ago • 461
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 5 days ago • 292
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published 4 days ago • 109
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published 10 days ago • 349
GPA: Learning GUI Process Automation from Demonstrations Paper • 2604.01676 • Published 11 days ago • 16
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models Paper • 2604.01168 • Published 11 days ago • 6
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation Paper • 2604.01479 • Published 12 days ago • 7
GEditBench v2: A Human-Aligned Benchmark for General Image Editing Paper • 2603.28547 • Published 13 days ago • 33