KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks Paper • 2606.03458 • Published 3 days ago • 46 • 8
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering Paper • 2606.00683 • Published 6 days ago • 81 • 6
Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching Paper • 2606.03911 • Published 3 days ago • 19 • 2
Policy and World Modeling Co-Training for Language Agents Paper • 2606.02388 • Published 4 days ago • 11 • 3
Trust-Region Behavior Blending for On-Policy Distillation Paper • 2605.31159 • Published 7 days ago • 64 • 4
A Formally Verified Library of Mathematical Finance in Lean 4 Paper • 2606.01356 • Published 5 days ago • 1 • 3
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation Paper • 2606.02470 • Published 4 days ago • 16 • 3
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents Paper • 2606.02031 • Published 4 days ago • 16 • 3
PEEK: Picking Essential frames via Efficient Knowledge distillation Paper • 2605.31029 • Published 7 days ago • 19 • 7
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation Paper • 2605.22765 • Published 15 days ago • 4 • 3
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 8 days ago • 59 • 3
When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems Paper • 2605.30102 • Published 8 days ago • 15 • 3
REPOT: Recoverable Program-of-Thought via Checkpoint Repair Paper • 2605.30052 • Published 8 days ago • 10 • 3
Thinking Before Constraining: A Unified Decoding Framework for Large Language Models Paper • 2601.07525 • Published 8 days ago • 10 • 3
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases Paper • 2605.27355 • Published 10 days ago • 7 • 3