Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 21 days ago • 186
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published 22 days ago • 39
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published Dec 1, 2025 • 56
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 299
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12, 2025 • 47