Clean-Context Autoregressive Video Diffusion — Checkpoints

Trained denoisers from the paper "What Matters in Clean-Context Autoregressive Video Diffusion". Each checkpoint is a Diffusion-Forcing video denoiser trained on 32-frame windows of 64×64 RGB video. See the paper for the full setup and the code repository for training and evaluation.

Main checkpoints (16)

Four training configurations are crossed with two denoiser backbones and two datasets: DF (Diffusion Forcing baseline), Mask-only (masked prefix loss), Clean-only (clean prefix), and Full (clean prefix + masked loss). File names follow `{backbone}_{dataset}_{config}.ckpt`.

| File | Backbone | Dataset | Config |
| --- | --- | --- | --- |
| `unet_dmlab_df.ckpt` | U-Net | DMLab | DF |
| `unet_dmlab_mask_only.ckpt` | U-Net | DMLab | Mask-only |
| `unet_dmlab_clean_only.ckpt` | U-Net | DMLab | Clean-only |
| `unet_dmlab_full.ckpt` | U-Net | DMLab | Full |
| `unet_minecraft_df.ckpt` | U-Net | Minecraft | DF |
| `unet_minecraft_mask_only.ckpt` | U-Net | Minecraft | Mask-only |
| `unet_minecraft_clean_only.ckpt` | U-Net | Minecraft | Clean-only |
| `unet_minecraft_full.ckpt` | U-Net | Minecraft | Full |
| `dit_dmlab_df.ckpt` | DiT | DMLab | DF |
| `dit_dmlab_mask_only.ckpt` | DiT | DMLab | Mask-only |
| `dit_dmlab_clean_only.ckpt` | DiT | DMLab | Clean-only |
| `dit_dmlab_full.ckpt` | DiT | DMLab | Full |
| `dit_minecraft_df.ckpt` | DiT | Minecraft | DF |
| `dit_minecraft_mask_only.ckpt` | DiT | Minecraft | Mask-only |
| `dit_minecraft_clean_only.ckpt` | DiT | Minecraft | Clean-only |
| `dit_minecraft_full.ckpt` | DiT | Minecraft | Full |
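
For scripting over the main checkpoints, the naming scheme can be expanded programmatically. The following is a minimal sketch; the function names `main_checkpoint_names` and `parse_checkpoint_name` are hypothetical helpers written for this card, not part of the code repository.

```python
from itertools import product

# Lower-case tokens exactly as they appear in the file names above.
BACKBONES = ("unet", "dit")
DATASETS = ("dmlab", "minecraft")
CONFIGS = ("df", "mask_only", "clean_only", "full")


def main_checkpoint_names() -> list[str]:
    """Enumerate the 16 main checkpoint file names in the table's order."""
    return [f"{b}_{d}_{c}.ckpt" for b, d, c in product(BACKBONES, DATASETS, CONFIGS)]


def parse_checkpoint_name(filename: str) -> tuple[str, str, str]:
    """Split '{backbone}_{dataset}_{config}.ckpt' into its three fields."""
    stem = filename.removesuffix(".ckpt")
    # Split on the first two underscores only, since configs such as
    # 'mask_only' contain an underscore themselves.
    backbone, dataset, config = stem.split("_", 2)
    return backbone, dataset, config
```

The parser only covers the main checkpoints. The files under `ablation/` carry an extra GroupNorm field in their names and would need separate handling.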

All sixteen share the same optimiser, schedule, and diffusion settings: AdamW, learning rate 8×10⁻⁵, 100k steps, batch size 8, fp16, cosine schedule with K=1000, and v-prediction, with 100-step DDIM (η=0) at inference. Only the backbone, the dataset, and the training configuration differ. The U-Net is the 3D-convolutional Diffusion-Forcing backbone (≈18.65 M params); the DiT is a strictly frame-causal diffusion transformer (≈18.84 M params).
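
To make the diffusion settings concrete, here is a scalar sketch of v-prediction and a deterministic DDIM (η=0) update. It rests on several assumptions: that "cosine schedule with K=1000" means a cosine noise schedule of the commonly used form with offset `s = 0.008`, that noise levels are indexed 0 (clean) to K, and that the 100 sampling steps are evenly spaced. The repository's actual parameterisation and indexing may differ, and all function names are illustrative.

```python
import math

K = 1000          # number of noise levels (from this card)
DDIM_STEPS = 100  # sampling steps at inference (from this card)


def alpha_bar(k: int, s: float = 0.008) -> float:
    """Cumulative signal level at noise level k (assumed cosine form)."""
    def f(t: float) -> float:
        return math.cos((t / K + s) / (1 + s) * math.pi / 2) ** 2
    return f(k) / f(0)


def v_target(x0: float, eps: float, k: int) -> float:
    """v-prediction regression target: v = sqrt(a) * eps - sqrt(1 - a) * x0."""
    a = alpha_bar(k)
    return math.sqrt(a) * eps - math.sqrt(1 - a) * x0


def ddim_step(x_k: float, v: float, k: int, k_prev: int) -> float:
    """One deterministic DDIM update (eta = 0) from level k down to k_prev."""
    a, a_prev = alpha_bar(k), alpha_bar(k_prev)
    x0_hat = math.sqrt(a) * x_k - math.sqrt(1 - a) * v   # recover clean signal
    eps_hat = math.sqrt(1 - a) * x_k + math.sqrt(a) * v  # recover noise
    return math.sqrt(a_prev) * x0_hat + math.sqrt(1 - a_prev) * eps_hat


def ddim_schedule() -> list[tuple[int, int]]:
    """Evenly spaced (k, k_prev) pairs covering K levels in DDIM_STEPS updates."""
    stride = K // DDIM_STEPS
    return [(k, k - stride) for k in range(K, 0, -stride)]
```

With a perfect prediction of `v`, `ddim_step` recovers the clean signal and the noise exactly, then re-noises them to the lower level. This is why η=0 sampling is deterministic once the initial noise is fixed.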

Ablation checkpoints (`ablation/`)

These checkpoints come from the causal-GroupNorm ablation (paper §4, Table 4). The U-Net is trained on DMLab to a matched 30k-step budget with either the standard (leaky) temporal GroupNorm or a frame-causal variant, under both DF and Full.

| File | GroupNorm | Config |
| --- | --- | --- |
| `ablation/unet_dmlab_leaky_df.ckpt` | leaky (standard) | DF |
| `ablation/unet_dmlab_leaky_full.ckpt` | leaky (standard) | Full |
| `ablation/unet_dmlab_framecausal_df.ckpt` | frame-causal | DF |
| `ablation/unet_dmlab_framecausal_full.ckpt` | frame-causal | Full |
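
The toy example below illustrates why standard GroupNorm counts as "leaky" on video. On a (batch, channels, time, height, width) tensor it pools its mean and variance over the time axis, so the normalised values of an early frame depend on later frames. This card does not describe the paper's frame-causal variant; the per-frame statistics used here are only one assumed way to remove the dependence, and the function names are hypothetical.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # numerical-stability constant, as in typical normalisation layers


def _normalize(values: list[float], mean: float, var: float) -> list[float]:
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


def pooled_norm(frames: list[list[float]]) -> list[list[float]]:
    """Statistics pooled over all frames, mimicking standard GroupNorm on video."""
    flat = [v for frame in frames for v in frame]
    mean, var = fmean(flat), pvariance(flat)
    return [_normalize(frame, mean, var) for frame in frames]


def per_frame_norm(frames: list[list[float]]) -> list[list[float]]:
    """Statistics computed per frame (an ASSUMED frame-causal variant)."""
    return [_normalize(frame, fmean(frame), pvariance(frame)) for frame in frames]


# Two videos that share frame 0 and differ only in the later frame 1.
video = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
edited = [[1.0, 2.0, 3.0], [40.0, 50.0, 60.0]]

# Pooled statistics: changing frame 1 changes the output for frame 0.
leaks = pooled_norm(video)[0] != pooled_norm(edited)[0]
# Per-frame statistics: frame 0 is unaffected by frame 1.
causal = per_frame_norm(video)[0] == per_frame_norm(edited)[0]
```

Here `leaks` and `causal` both evaluate to `True`: only the pooled version lets the later frame influence the first.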

Loading

These are PyTorch-Lightning checkpoints. Load them with the matching config from the code repository; the U-Net and DiT backbones and all training settings are specified there.
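
Before wiring a checkpoint into the repository's model classes, it can help to inspect the file directly. The sketch below relies only on the fact that a Lightning checkpoint is a dictionary saved with `torch.save`, with the weights under the `"state_dict"` key. The helper names are hypothetical, and instantiating the actual model still requires the matching config from the code repository.

```python
from typing import Any, Mapping


def load_lightning_checkpoint(path: str) -> Mapping[str, Any]:
    """Read a Lightning .ckpt file onto the CPU as a plain dictionary."""
    import torch  # imported lazily so count_parameters works without PyTorch

    # weights_only=False unpickles arbitrary Python objects stored alongside
    # the weights (e.g. hyperparameters); only use it on files you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


def count_parameters(state_dict: Mapping[str, Any]) -> int:
    """Total number of elements across all tensors in a state dict."""
    return sum(tensor.numel() for tensor in state_dict.values())
```

For example, after downloading a file locally, `ckpt = load_lightning_checkpoint("unet_dmlab_full.ckpt")` followed by `count_parameters(ckpt["state_dict"])` gives a rough size check against the parameter counts quoted above. The state dict may also contain non-trainable buffers, so the total can come out somewhat higher.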
