Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published 2 days ago • 48
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents Paper • 2604.04979 • Published 6 days ago • 4
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers Paper • 2603.28762 • Published 10 days ago • 25
TAPS: Task Aware Proposal Distributions for Speculative Sampling Paper • 2603.27027 • Published 13 days ago • 141
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization Paper • 2603.28342 • Published 11 days ago • 26
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published 11 days ago • 30
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published 14 days ago • 152
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 14 days ago • 51
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published 14 days ago • 117
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models Paper • 2603.18742 • Published 22 days ago • 10
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published 16 days ago • 35
TrajLoom: Dense Future Trajectory Generation from Video Paper • 2603.22606 • Published 17 days ago • 5
Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels Paper • 2603.22276 • Published 17 days ago • 13
Manifold-Aware Exploration for Reinforcement Learning in Video Generation Paper • 2603.21872 • Published 18 days ago • 33
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 18 days ago • 121
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published 22 days ago • 17
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders Paper • 2603.19209 • Published 21 days ago • 5