# 🎯 DDPM Pixel Art 16×16 — Class-Conditioned with Classifier-Free Guidance
A class-conditioned DDPM that generates 16×16 pixel art of specific creatures and characters using Classifier-Free Guidance (CFG).
🎮 Try the interactive demo →
## Sample Outputs

*Sample grid: dragon, cat, wizard, ghost, phoenix, wolf, skeleton, dog.*

*Epoch 799 samples, CFG scale = 3.0, upscaled to 256×256 with nearest-neighbor interpolation.*
## Available Classes (33 total)

- **Animals (20):** dragon, whale, unicorn, bear, phoenix, cow, panda, dog, goat, lion, cat, frog, shark, wolf, fox, owl, chicken, rabbit, horse, bat
- **Characters (13):** wizard, ninja, skeleton, ghost, zombie, angel, ranger, demon, warrior, knight, samurai, necromancer, elf
## Usage

This model uses a custom inference loop with Classifier-Free Guidance (not a standard `DDPMPipeline`):
```python
import json

import torch
from PIL import Image
from diffusers import UNet2DModel, DDPMScheduler
from huggingface_hub import hf_hub_download

# Load model
unet = UNet2DModel.from_pretrained(
    "achsaf/ddpm-pixelart-16x16-cond", subfolder="unet"
).to("cuda")
scheduler = DDPMScheduler.from_pretrained(
    "achsaf/ddpm-pixelart-16x16-cond", subfolder="scheduler"
)

# Load class labels
labels_path = hf_hub_download("achsaf/ddpm-pixelart-16x16-cond", "class_labels.json")
with open(labels_path) as f:
    cls_info = json.load(f)
LABEL2ID = cls_info["label2id"]
NULL_CLASS_ID = cls_info["null_class_id"]  # 33

@torch.no_grad()
def generate(class_name, guidance_scale=3.0, num_steps=200, seed=42):
    """Generate a pixel art sprite of the given class."""
    class_id = LABEL2ID[class_name]
    generator = torch.Generator("cuda").manual_seed(seed)
    x = torch.randn(1, 3, 16, 16, generator=generator, device="cuda")
    cond_labels = torch.tensor([class_id], device="cuda")
    null_labels = torch.tensor([NULL_CLASS_ID], device="cuda")

    scheduler.set_timesteps(num_steps)
    for t in scheduler.timesteps:
        # Two forward passes for CFG
        noise_cond = unet(x, t, class_labels=cond_labels).sample
        noise_uncond = unet(x, t, class_labels=null_labels).sample
        # CFG: ε = ε_uncond + w * (ε_cond - ε_uncond)
        noise_pred = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
        x = scheduler.step(noise_pred, t, x).prev_sample

    # Convert from [-1, 1] to a uint8 image
    x = (x / 2 + 0.5).clamp(0, 1)
    img = (x.squeeze().permute(1, 2, 0).cpu().numpy() * 255).astype("uint8")
    return Image.fromarray(img)

# Generate a dragon!
dragon = generate("dragon", guidance_scale=3.0)
dragon.resize((256, 256), Image.NEAREST).save("dragon.png")

# Try different classes
for cls in ["cat", "wizard", "ghost", "phoenix"]:
    img = generate(cls, guidance_scale=3.0, seed=42)
    img.resize((256, 256), Image.NEAREST).save(f"{cls}.png")
```
## Adjusting Guidance Scale

- **CFG 1.0** — plain conditional sampling with no guidance (diverse, but less class-specific)
- **CFG 3.0** — recommended default (good class fidelity with reasonable diversity)
- **CFG 5.0+** — stronger class signal at the cost of variety
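The scale enters through the extrapolation formula used in the sampling loop above. A scalar sketch (the real predictions are tensors; the function name here is illustrative) makes the endpoints explicit:

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """ε = ε_uncond + w * (ε_cond - ε_uncond): extrapolate from the
    unconditional prediction toward (and past) the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# With w = 0 the result is purely unconditional, w = 1 recovers the
# conditional prediction, and w > 1 pushes past it (stronger guidance).
print(cfg_combine(0.0, 1.0, 0.0))  # 0.0
print(cfg_combine(0.0, 1.0, 1.0))  # 1.0
print(cfg_combine(0.0, 1.0, 3.0))  # 3.0
```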
## Model Details

| Property | Value |
|---|---|
| Architecture | UNet2DModel with class embeddings |
| Resolution | 16×16 |
| Channels | (64, 128, 256) |
| Parameters | ~14M (+8.7K class embedding) |
| Class embeddings | 34 (33 classes + 1 null token) |
| Scheduler | DDPM, cosine β schedule |
| Prediction type | ε (epsilon) |
| EMA | Yes (decay 0.9999) |
| CFG dropout | 10% (P_uncond = 0.1) |
## Training

Initialized from the v3 unconditional model's weights, with a randomly initialized class-embedding layer added, then trained for 800 epochs.
### Architecture

The model uses diffusers' `UNet2DModel` with `num_class_embeds=34`. Class labels are embedded and summed with the timestep embeddings inside the UNet. During training, 10% of samples have their class label replaced with the null token (id = 33) to enable classifier-free guidance.
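The sum-based conditioning can be sketched with a simplified stand-in module. This is a hypothetical illustration, not the diffusers internals: the class names, the linear timestep encoder, and the embedding width are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ConditioningSum(nn.Module):
    """Illustrative stand-in: embed the class label and add it to the
    timestep embedding before it modulates the UNet's ResNet blocks."""

    def __init__(self, num_class_embeds=34, embed_dim=256):
        super().__init__()
        # 34 rows: 33 creature/character classes + 1 null token for CFG
        self.class_embedding = nn.Embedding(num_class_embeds, embed_dim)
        # Placeholder timestep encoder (diffusers uses sinusoidal + MLP)
        self.time_mlp = nn.Linear(1, embed_dim)

    def forward(self, timesteps, class_labels):
        t_emb = self.time_mlp(timesteps.float().view(-1, 1))
        c_emb = self.class_embedding(class_labels)
        return t_emb + c_emb  # summed, not concatenated

cond = ConditioningSum()
emb = cond(torch.tensor([500]), torch.tensor([7]))
print(emb.shape)  # torch.Size([1, 256])
```

Because the two embeddings are summed, the null token simply shifts the conditioning vector rather than changing the model's input shape, which is what lets one network serve both conditional and unconditional passes.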
### Training Recipe

Based on Ho & Salimans (2022), "Classifier-Free Diffusion Guidance":

- **Pretrained init:** all UNet weights from v3 unconditional; `class_embedding` randomly initialized
- **CFG dropout:** P_uncond = 0.1 (10% unconditional training)
- **Palette loss:** weight 0.3 (soft k-means compactness + histogram entropy + TV)
- **Learning rate:** 1e-4, cosine schedule, 500 warmup steps
- **Batch size:** 128
- **Final diffusion loss:** ~0.13
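The CFG dropout step can be sketched as follows. `NULL_CLASS_ID` matches the null token id stated in this card; the function name and batch shape are illustrative:

```python
import torch

NULL_CLASS_ID = 33  # null token id from class_labels.json
P_UNCOND = 0.1      # fraction of samples trained unconditionally

def apply_cfg_dropout(labels, p_uncond=P_UNCOND, generator=None):
    """Replace each class label with the null token with probability
    p_uncond, so the model also learns an unconditional prediction."""
    drop = torch.rand(labels.shape, generator=generator) < p_uncond
    return torch.where(drop, torch.full_like(labels, NULL_CLASS_ID), labels)

g = torch.Generator().manual_seed(0)
labels = torch.randint(0, 33, (8,), generator=g)
print(apply_cfg_dropout(labels, generator=g))  # ~10% of entries become 33
```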
### Class Labels

Labels were extracted from the dataset's text captions using keyword matching. Each class has roughly 6 or more training examples. Images whose captions match no keyword are treated as unconditional (null class).
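The keyword-matching extraction can be sketched like this. The actual labeling script is not included in the repo, so the function name and the label subset below are hypothetical; only the null-class fallback behavior comes from this card:

```python
# Illustrative subset of the real 33-class mapping
LABEL2ID = {"dragon": 0, "cat": 1, "wizard": 2}
NULL_CLASS_ID = 33

def caption_to_class_id(caption):
    """Return the class id of the first label word found in the caption,
    or the null class if nothing matches (trained unconditionally)."""
    words = caption.lower().split()
    for label, class_id in LABEL2ID.items():
        if label in words:
            return class_id
    return NULL_CLASS_ID

print(caption_to_class_id("a red dragon breathing fire"))  # 0
print(caption_to_class_id("blue abstract pattern"))        # 33
```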
## Datasets
## Related Models

- `achsaf/ddpm-pixelart-16x16-v3` — unconditional base model
- `achsaf/ddpm-pixelart-16x16` — v1, backbone + head architecture
## License

Apache 2.0