Geo-Align: Video Generation Alignment via Metric Geometry Reward Paper • 2605.23903 • Published 4 days ago • 5
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 6 days ago • 87
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 12 days ago • 81
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation Paper • 2605.06376 • Published 19 days ago • 26
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models Paper • 2605.05204 • Published 20 days ago • 27
Lightning Unified Video Editing via In-Context Sparse Attention Paper • 2605.04569 • Published 20 days ago • 18
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 29 days ago • 118
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13 • 72
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
Optimizing Few-Step Generation with Adaptive Matching Distillation Paper • 2602.07345 • Published Feb 7 • 9
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better Paper • 2602.05393 • Published Feb 5 • 8
PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers Paper • 2602.01077 • Published Feb 1 • 4