ACC: Compiling Agent Trajectories for Long-Context Training Paper • 2605.21850 • Published 1 day ago • 50
OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation Paper • 2605.21343 • Published 3 days ago • 8
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 12 days ago • 45
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 15 days ago • 97
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published Mar 24 • 36
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 24 days ago • 107
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model Paper • 2604.19747 • Published Apr 21 • 39
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 325
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale Paper • 2604.04771 • Published Apr 6 • 123
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 114
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing Paper • 2604.04911 • Published Apr 6 • 36
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published Mar 30 • 85
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization Paper • 2603.28342 • Published Mar 30 • 26
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 133
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 136