ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation • Paper • 2603.11421 • Published 1 day ago • 11
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation • Paper • 2603.12247 • Published about 11 hours ago • 12
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers • Paper • 2603.12245 • Published about 11 hours ago • 2
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections • Paper • 2603.12180 • Published about 12 hours ago • 8
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation • Paper • 2603.12267 • Published about 11 hours ago • 7
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge • Paper • 2603.11665 • Published about 21 hours ago • 1
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training • Paper • 2603.12246 • Published about 11 hours ago • 2
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models • Paper • 2603.12252 • Published about 11 hours ago • 7
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use • Paper • 2603.11076 • Published 2 days ago • 4
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams • Paper • 2603.12265 • Published about 11 hours ago • 6
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing • Paper • 2603.11593 • Published about 23 hours ago • 12
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse • Paper • 2603.12201 • Published about 12 hours ago • 24
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training • Paper • 2603.12255 • Published about 11 hours ago • 41
According to Me: Long-Term Personalized Referential Memory QA • Paper • 2603.01990 • Published 11 days ago • 3
Hindsight Credit Assignment for Long-Horizon LLM Agents • Paper • 2603.08754 • Published 6 days ago • 4