Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published 24 days ago • 139
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem Paper • 2512.24873 • Published 24 days ago • 102
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents Paper • 2512.23343 • Published 26 days ago • 28
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process Paper • 2512.23988 • Published 25 days ago • 16
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking Paper • 2512.24297 • Published 25 days ago • 6
Valori: A Deterministic Memory Substrate for AI Systems Paper • 2512.22280 • Published about 1 month ago • 4
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling Paper • 2512.23959 • Published 26 days ago • 108
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 24 days ago • 60
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published 24 days ago • 117
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published 24 days ago • 40
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning Paper • 2512.24330 • Published 25 days ago • 35
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving Paper • 2601.00747 • Published 22 days ago • 19
Diversity or Precision? A Deep Dive into Next Token Prediction Paper • 2512.22955 • Published 27 days ago • 8
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published Dec 23, 2025 • 82
Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning Paper • 2601.00830 • Published about 1 month ago • 3
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling Paper • 2601.02346 • Published 19 days ago • 26
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published 20 days ago • 18
Confidence Estimation for LLMs in Multi-turn Interactions Paper • 2601.02179 • Published 19 days ago • 15
CPPO: Contrastive Perception for Vision Language Policy Optimization Paper • 2601.00501 • Published 23 days ago • 7
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents Paper • 2601.02314 • Published 19 days ago • 2
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published 18 days ago • 46
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 20 days ago • 42
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning Paper • 2512.23412 • Published 26 days ago • 39
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving Paper • 2601.01874 • Published 19 days ago • 19
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Paper • 2601.01321 • Published 21 days ago • 18
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks Paper • 2601.02439 • Published 19 days ago • 16
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners Paper • 2601.02996 • Published 18 days ago • 5
Steerability of Instrumental-Convergence Tendencies in LLMs Paper • 2601.01584 • Published 20 days ago • 1
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 19 days ago • 102
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published 17 days ago • 41
Agentic Rubrics as Contextual Verifiers for SWE Agents Paper • 2601.04171 • Published 17 days ago • 11
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics Paper • 2601.02075 • Published 19 days ago • 8
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts Paper • 2601.03315 • Published 18 days ago • 6
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents Paper • 2601.03236 • Published 18 days ago • 3
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 16 days ago • 204
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 16 days ago • 29
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search Paper • 2601.04767 • Published 16 days ago • 27
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing Paper • 2601.04575 • Published 16 days ago • 8
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published 18 days ago • 16
DocDancer: Towards Agentic Document-Grounded Information Seeking Paper • 2601.05163 • Published 16 days ago • 5
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling Paper • 2601.03111 • Published 18 days ago • 9
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering Paper • 2601.04620 • Published 16 days ago • 3
Learning User Preferences Through Interaction for Long-Term Collaboration Paper • 2601.02702 • Published 18 days ago • 2
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published 16 days ago • 160
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 15 days ago • 49
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published 15 days ago • 43
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published 15 days ago • 36
AgentOCR: Reimagining Agent History via Optical Self-Compression Paper • 2601.04786 • Published 16 days ago • 28
Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published 15 days ago • 26
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift Paper • 2601.05882 • Published 15 days ago • 20
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency Paper • 2601.05905 • Published 15 days ago • 18
SmartSearch: Process Reward-Guided Query Refinement for Search Agents Paper • 2601.04888 • Published 16 days ago • 9
Over-Searching in Search-Augmented Large Language Models Paper • 2601.05503 • Published 16 days ago • 6
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation Paper • 2601.04823 • Published 16 days ago • 6
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning Paper • 2601.04726 • Published 16 days ago • 6
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration Paper • 2601.04544 • Published 17 days ago • 6
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck Paper • 2601.05870 • Published 15 days ago • 3
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 15 days ago • 79
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published 12 days ago • 30
Dr. Zero: Self-Evolving Search Agents without Training Data Paper • 2601.07055 • Published 13 days ago • 20
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published 12 days ago • 26
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction Paper • 2601.05107 • Published 16 days ago • 22
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration Paper • 2601.06860 • Published 13 days ago • 16
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era Paper • 2601.07526 • Published 12 days ago • 21
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning Paper • 2601.06803 • Published 13 days ago • 10
TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning Paper • 2601.04698 • Published 16 days ago • 10
How Do Large Language Models Learn Concepts During Continual Pre-Training? Paper • 2601.03570 • Published 17 days ago • 4
OpenTinker: Separating Concerns in Agentic Reinforcement Learning Paper • 2601.07376 • Published 12 days ago • 6
Artificial Entanglement in the Fine-Tuning of Large Language Models Paper • 2601.06788 • Published 13 days ago • 3
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published 11 days ago • 50
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published 14 days ago • 49
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training Paper • 2601.07389 • Published 12 days ago • 2
MemoBrain: Executive Memory as an Agentic Brain for Reasoning Paper • 2601.08079 • Published 12 days ago • 37
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences Paper • 2601.06789 • Published 13 days ago • 75
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published 12 days ago • 24
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation Paper • 2601.08670 • Published 11 days ago • 19
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization Paper • 2601.04582 • Published 16 days ago • 10
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning Paper • 2601.08468 • Published 11 days ago • 6
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs Paper • 2601.06786 • Published 13 days ago • 6
Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 12 days ago • 110
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines Paper • 2601.09465 • Published 10 days ago • 40
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG Paper • 2601.09028 • Published 11 days ago • 33
ExpSeek: Self-Triggered Experience Seeking for Web Agents Paper • 2601.08605 • Published 11 days ago • 16
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models Paper • 2601.08955 • Published 11 days ago • 13
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning Paper • 2601.06794 • Published 13 days ago • 4
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing Paper • 2601.09609 • Published 10 days ago • 3
Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning Paper • 2601.09536 • Published 10 days ago • 3
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning Paper • 2601.04809 • Published 16 days ago • 3
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 11 days ago • 140
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 10 days ago • 82
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published 12 days ago • 45
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering Paper • 2601.10402 • Published 9 days ago • 36
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching Paper • 2601.10712 • Published 9 days ago • 24
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning Paper • 2601.10129 • Published 9 days ago • 11
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution Paper • 2601.10657 • Published 9 days ago • 19
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following Paper • 2601.06431 • Published 14 days ago • 12
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary Paper • 2601.10201 • Published 9 days ago • 8
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale Paper • 2601.10338 • Published 9 days ago • 5
Memory Bank Compression for Continual Adaptation of Large Language Models Paper • 2601.00756 • Published 22 days ago • 2
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published 11 days ago • 57
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents Paper • 2601.11496 • Published 8 days ago • 45
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published 9 days ago • 38
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search Paper • 2601.11037 • Published 8 days ago • 17
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection Paper • 2601.09195 • Published 10 days ago • 15
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records Paper • 2601.09636 • Published 10 days ago • 8
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published 8 days ago • 7
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 11 days ago • 37
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 8 days ago • 29
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs Paper • 2601.11061 • Published 8 days ago • 7
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation Paper • 2601.08441 • Published 11 days ago • 7
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion Paper • 2601.09512 • Published 10 days ago • 4
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published 4 days ago • 47
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs Paper • 2601.13836 • Published 4 days ago • 34
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published 4 days ago • 15
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents Paper • 2601.12294 • Published 6 days ago • 15
Aligning Agentic World Models via Knowledgeable Experience Learning Paper • 2601.13247 • Published 5 days ago • 15
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment Paper • 2601.14249 • Published 4 days ago • 6
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Paper • 2601.14209 • Published 4 days ago • 5
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning Paper • 2601.13697 • Published 4 days ago • 3
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Paper • 2601.14171 • Published 4 days ago • 44
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Paper • 2601.14750 • Published 3 days ago • 14
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics Paper • 2601.14027 • Published 4 days ago • 10
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models Paper • 2601.14152 • Published 4 days ago • 4
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems Paper • 2601.15059 • Published 3 days ago • 3
Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek Paper • 2601.15100 • Published 3 days ago • 3