CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published 19 days ago • 21
Query as Anchor: Scenario-Adaptive User Representation via Large Language Model Paper • 2602.14492 • Published 7 days ago • 18
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published 14 days ago • 152
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 18 days ago • 57
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs Paper • 2602.01064 • Published 22 days ago • 2
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published 25 days ago • 9
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published 21 days ago • 34
SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration Paper • 2602.02419 • Published 21 days ago • 4
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 25 days ago • 156
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published 25 days ago • 12
Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism Paper • 2601.05524 • Published Jan 9 • 1
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Paper • 2601.20209 • Published 27 days ago • 22