DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off • Paper • 2604.13902 • Published 10 days ago • 59
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution • Paper • 2604.18982 • Published 4 days ago • 4
Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces • Paper • 2604.08362 • Published 16 days ago • 15
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published 23 days ago • 487
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning • Paper • 2604.02721 • Published 22 days ago • 367
SEAR: Schema-Based Evaluation and Routing for LLM Gateways • Paper • 2603.26728 • Published Mar 20 • 12
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models • Paper • 2603.18742 • Published Mar 19 • 10
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization • Paper • 2603.19835 • Published Mar 20 • 342