ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published Apr 26
Echo-Memory: A Controlled Study of Memory in Action World Models Paper • 2606.09803 • Published 2 days ago
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 5 days ago
XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published Mar 12
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research Paper • 2308.13149 • Published Aug 25, 2023
SciDFM: A Large Language Model with Mixture-of-Experts for Science Paper • 2409.18412 • Published Sep 27, 2024
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning Paper • 2508.07871 • Published Aug 11, 2025