RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models Paper • 2406.10890 • Published Jun 16, 2024 • 1
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 28
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling Paper • 2503.05188 • Published Mar 7, 2025
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 10 days ago • 309
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 154
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models Paper • 2511.11134 • Published Nov 14, 2025 • 32
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 28
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos Paper • 2506.04141 • Published Jun 4, 2025 • 29
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis Paper • 2506.04142 • Published Jun 4, 2025 • 27