Clean-Context Autoregressive Video Diffusion: Checkpoints

Trained denoisers from "What Matters in Clean-Context Autoregressive Video Diffusion". Each checkpoint is a Diffusion-Forcing video denoiser trained on 32-frame windows of 64×64 RGB; see the paper for the full setup and the code repository for training and evaluation.

Main checkpoints (16)

Four training configurations are crossed with two denoiser backbones and two datasets: DF (Diffusion Forcing baseline), Mask-only (masked prefix loss), Clean-only (clean prefix), and Full (clean prefix + masked loss). Naming: {backbone}_{dataset}_{config}.ckpt.

| File | Backbone | Dataset | Config |
|---|---|---|---|
| unet_dmlab_df.ckpt | U-Net | DMLab | DF |
| unet_dmlab_mask_only.ckpt | U-Net | DMLab | Mask-only |
| unet_dmlab_clean_only.ckpt | U-Net | DMLab | Clean-only |
| unet_dmlab_full.ckpt | U-Net | DMLab | Full |
| unet_minecraft_df.ckpt | U-Net | Minecraft | DF |
| unet_minecraft_mask_only.ckpt | U-Net | Minecraft | Mask-only |
| unet_minecraft_clean_only.ckpt | U-Net | Minecraft | Clean-only |
| unet_minecraft_full.ckpt | U-Net | Minecraft | Full |
| dit_dmlab_df.ckpt | DiT | DMLab | DF |
| dit_dmlab_mask_only.ckpt | DiT | DMLab | Mask-only |
| dit_dmlab_clean_only.ckpt | DiT | DMLab | Clean-only |
| dit_dmlab_full.ckpt | DiT | DMLab | Full |
| dit_minecraft_df.ckpt | DiT | Minecraft | DF |
| dit_minecraft_mask_only.ckpt | DiT | Minecraft | Mask-only |
| dit_minecraft_clean_only.ckpt | DiT | Minecraft | Clean-only |
| dit_minecraft_full.ckpt | DiT | Minecraft | Full |
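
The naming scheme above is regular enough to generate and parse programmatically. The following is a minimal sketch, not part of the code repository: the helper names `checkpoint_name` and `parse_checkpoint_name` are hypothetical, and the lowercase tokens are taken from the filenames listed above.

```python
from itertools import product

# Tokens as they appear in the listed filenames.
BACKBONES = ("unet", "dit")
DATASETS = ("dmlab", "minecraft")
CONFIGS = ("df", "mask_only", "clean_only", "full")


def checkpoint_name(backbone, dataset, config):
    """Build a filename following {backbone}_{dataset}_{config}.ckpt."""
    return f"{backbone}_{dataset}_{config}.ckpt"


def parse_checkpoint_name(filename):
    """Split a main-checkpoint filename into (backbone, dataset, config)."""
    if not filename.endswith(".ckpt"):
        raise ValueError(f"not a checkpoint file: {filename!r}")
    stem = filename[: -len(".ckpt")]
    # Configs such as "mask_only" contain underscores, so split at most twice.
    backbone, dataset, config = stem.split("_", 2)
    if (backbone not in BACKBONES or dataset not in DATASETS
            or config not in CONFIGS):
        raise ValueError(f"unrecognised checkpoint name: {filename!r}")
    return backbone, dataset, config


# 2 backbones x 2 datasets x 4 configs, in the same order as the listing.
MAIN_CHECKPOINTS = [
    checkpoint_name(b, d, c) for b, d, c in product(BACKBONES, DATASETS, CONFIGS)
]
```

The ablation files use a different pattern and are deliberately rejected by this parser.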

All sixteen share the same optimiser, schedule, and diffusion settings (AdamW, lr 8×10⁻⁵, 100k steps, batch 8, fp16, cosine schedule with K=1000, v-prediction; DDIM with 100 steps, η=0 at inference); only the backbone, the dataset, and the training configuration differ. The U-Net is the 3D-convolutional Diffusion-Forcing backbone (≈18.65 M params); the DiT is a strictly frame-causal diffusion transformer (≈18.84 M params).
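
To make the diffusion settings concrete, here is a scalar sketch of a cosine noise schedule, the v-prediction target, and a deterministic (η=0) DDIM update. It is an illustration under assumptions, not the repository's implementation: the schedule uses the common squared-cosine form with offset s=0.008, the 100 sampling steps are assumed to be evenly spaced over K=1000 levels, and all function names are hypothetical. Per-frame noise levels, as used in Diffusion Forcing, are omitted.

```python
import math

K = 1000  # number of training noise levels


def alpha_bar(k, num_levels=K, s=0.008):
    """Cumulative signal level under an assumed squared-cosine schedule."""
    def f(t):
        return math.cos((t / num_levels + s) / (1 + s) * math.pi / 2) ** 2
    return f(k) / f(0)


def noisy(x0, eps, k):
    """Forward process: x_k = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar(k)
    return math.sqrt(a) * x0 + math.sqrt(1 - a) * eps


def v_target(x0, eps, k):
    """v-prediction target: v = sqrt(a) * eps - sqrt(1 - a) * x0."""
    a = alpha_bar(k)
    return math.sqrt(a) * eps - math.sqrt(1 - a) * x0


def ddim_step(x_k, v, k, k_prev):
    """One deterministic DDIM update (eta = 0) from level k to k_prev."""
    a, a_prev = alpha_bar(k), alpha_bar(k_prev)
    x0_hat = math.sqrt(a) * x_k - math.sqrt(1 - a) * v   # recovered clean sample
    eps_hat = math.sqrt(1 - a) * x_k + math.sqrt(a) * v  # recovered noise
    return math.sqrt(a_prev) * x0_hat + math.sqrt(1 - a_prev) * eps_hat


def ddim_timesteps(num_steps=100, num_levels=K):
    """Evenly spaced levels from num_levels down to 0 (assumed spacing)."""
    return [round(num_levels * i / num_steps) for i in range(num_steps, -1, -1)]


def ddim_sample(x_start, predict_v, num_steps=100):
    """Run the sampler with any callable predict_v(x, k)."""
    ts = ddim_timesteps(num_steps)
    x = x_start
    for k, k_prev in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, predict_v(x, k), k, k_prev)
    return x
```

With a trained denoiser, `predict_v` would wrap the network's forward pass; with an oracle that returns the true `v_target`, the sampler recovers the clean sample, which is a quick sanity check on the algebra.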

Ablation checkpoints (ablation/)

The causal-GroupNorm ablation (paper §4 / Table 4): the U-Net trained on DMLab to a matched 30k-step budget, with the standard (leaky) temporal GroupNorm versus a frame-causal variant, under DF and Full.

| File | GroupNorm | Config |
|---|---|---|
| ablation/unet_dmlab_leaky_df.ckpt | leaky (standard) | DF |
| ablation/unet_dmlab_leaky_full.ckpt | leaky (standard) | Full |
| ablation/unet_dmlab_framecausal_df.ckpt | frame-causal | DF |
| ablation/unet_dmlab_framecausal_full.ckpt | frame-causal | Full |

Loading

These are PyTorch-Lightning checkpoints. Load them with the matching config from the code repository; the U-Net and DiT backbones and all training settings are specified there.
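
Before wiring a checkpoint into the repository's config, it can help to inspect the file directly. The sketch below relies only on two facts: `torch.load` can read a Lightning checkpoint, and Lightning stores the weights under the `"state_dict"` key. The helper names are hypothetical, and any key prefix you strip depends on how the repository's LightningModule names its submodules, which is not specified here.

```python
from collections import OrderedDict


def load_lightning_state_dict(path):
    """Read a checkpoint on CPU and return its weights mapping."""
    import torch  # imported lazily so the helpers below work without PyTorch

    # weights_only=False unpickles arbitrary Python objects, which Lightning
    # checkpoints may contain; only use it on files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return ckpt["state_dict"]


def strip_prefix(state_dict, prefix):
    """Drop a leading module prefix from every key that carries it."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def count_elements(state_dict):
    """Total number of tensor elements (parameters plus any saved buffers)."""
    return sum(tensor.numel() for tensor in state_dict.values())
```

For example, `count_elements(load_lightning_state_dict("unet_dmlab_full.ckpt")) / 1e6` should land near the parameter count reported for that backbone, slightly higher if the state dict includes buffers. Actual model construction still requires the matching config from the code repository.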
