TIGER-Lab/MMLU-Pro
Benchmark
Natural Language Processing, Image Generation
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis