Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Paper • 2604.05404 • Published 4 days ago • 38
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation Paper • 2604.02368 • Published 15 days ago • 9