The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum
Abstract
Predictor-corrector samplers let uniform-state discrete diffusion models surpass ancestral sampling in generation quality and efficiency, challenging the assumption that masked diffusion is necessary for diffusion-based language modeling.
Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferable to autoregressive and masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at https://s-sahoo.com/duo-ch2
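To make the predictor-corrector idea concrete, here is a minimal, illustrative sampling loop for a uniform-state discrete diffusion model. It assumes an interpolating forward process q(x_t | x_0) = α_t · one_hot(x_0) + (1 − α_t)/V and a forward-backward corrector that re-noises slightly and then denoises at the same level; the `denoiser`, `posterior_logits`, and `forward_noise` names, the schedule handling, and the corrector step size `gamma` are hypothetical stand-ins, not the paper's released implementation.

```python
# Sketch of a predictor-corrector (PC) sampler for uniform-state
# discrete diffusion. Illustrative only: `denoiser(x, alpha)` is a
# hypothetical model call returning logits over the vocabulary for
# the clean token at every position.
import torch
import torch.nn.functional as F

def posterior_logits(logits_x0, x_t, alpha_s, alpha_t, vocab_size):
    """Posterior over x_s given x_t and the predicted clean-token
    distribution, under the uniform-state interpolating process
    (alpha_t = current noisier level, alpha_s = cleaner target)."""
    V = vocab_size
    p_x0 = F.softmax(logits_x0, dim=-1)          # (B, L, V)
    x_t_onehot = F.one_hot(x_t, V).float()       # (B, L, V)
    alpha_ts = alpha_t / alpha_s                 # alpha_{t|s} < 1
    q_s = alpha_s * p_x0 + (1 - alpha_s) / V     # q(x_s | x0), marginalized
    lik = alpha_ts * x_t_onehot + (1 - alpha_ts) / V  # q(x_t | x_s)
    return torch.log(q_s * lik + 1e-12)          # unnormalized log-posterior

def forward_noise(x, keep_prob, vocab_size):
    """One forward-process step: each token survives with probability
    keep_prob, otherwise it is resampled uniformly."""
    keep = torch.rand(x.shape, device=x.device) < keep_prob
    return torch.where(keep, x, torch.randint_like(x, vocab_size))

@torch.no_grad()
def pc_sample(denoiser, x, alphas, vocab_size, corrector_steps=1, gamma=0.1):
    """alphas increases from near 0 (pure uniform noise) to 1 (clean)."""
    for alpha_t, alpha_s in zip(alphas[:-1], alphas[1:]):
        # Predictor: ancestral step from noise level t to the cleaner s.
        logits = posterior_logits(denoiser(x, alpha_t), x,
                                  alpha_s, alpha_t, vocab_size)
        x = torch.distributions.Categorical(logits=logits).sample()
        # Corrector: re-noise slightly (s -> s'), denoise back (s' -> s),
        # letting the model overwrite earlier mistakes (self-correction).
        for _ in range(corrector_steps):
            alpha_sp = alpha_s * (1.0 - gamma)
            x_noisy = forward_noise(x, alpha_sp / alpha_s, vocab_size)
            logits = posterior_logits(denoiser(x_noisy, alpha_sp), x_noisy,
                                      alpha_s, alpha_sp, vocab_size)
            x = torch.distributions.Categorical(logits=logits).sample()
    return x
```

Because every corrector pass revisits already-committed tokens, extra steps give the model more chances to fix errors, which is consistent with the abstract's claim that PC sampling keeps improving with more steps rather than plateauing like ancestral sampling.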
Community
Introduces Predictor-Corrector samplers for discrete diffusion that beat ancestral sampling and, unlike conventional samplers, keep improving with more sampling steps, plus a memory-efficient Gaussian-relaxation training curriculum.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Scaling Beyond Masked Diffusion Language Models (2026)
- Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion (2026)
- Unifying Masked Diffusion Models with Various Generation Orders and Beyond (2026)
- Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models (2026)
- Balancing Understanding and Generation in Discrete Diffusion Models (2026)
- Adaptation to Intrinsic Dependence in Diffusion Language Models (2026)
- T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization (2026)