Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 6 days ago
Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs Paper • 2507.05686 • Published Jul 8, 2025