TIGER-Lab/MMLU-Pro
Benchmark
Natural Language Processing, Image Generation
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
Context Forcing: Consistent Autoregressive Video Generation with Long Context