RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space Paper • 2606.14700 • Published 8 days ago • 14
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer Paper • 2606.16255 • Published 5 days ago • 11
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 44
MetaQuery Instruction Tuning Data Collection • We downsample high-resolution images so that the shorter side is 1024 pixels (MetaQuery_Instruct_2.4M) or 512 pixels (MetaQuery_Instruct_2.4M_512res) • 2 items • Updated Jun 24, 2025 • 1
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Paper • 2505.10046 • Published May 15, 2025 • 9