Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 1 day ago • 38
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 1 day ago • 112
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research Paper • 2605.26114 • Published 24 days ago • 64
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks Paper • 2605.22535 • Published 28 days ago • 10
Synthetic Computers at Scale for Long-Horizon Productivity Simulation Paper • 2604.28181 • Published Apr 30 • 20
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Paper • 2604.18543 • Published Apr 20 • 30
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 18
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published Apr 14 • 37
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 110
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 293
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published Mar 25 • 30