# 🎯 DDPM Pixel Art 16×16 — Class-Conditioned with Classifier-Free Guidance
A class-conditioned DDPM that generates 16×16 pixel art of specific creatures and characters using Classifier-Free Guidance (CFG).
🎮 Try the interactive demo →
## Sample Outputs

*Sample grid: dragon, cat, wizard, ghost, phoenix, wolf, skeleton, dog.*

*Epoch 799 samples, CFG scale = 3.0, upscaled to 256×256 with nearest-neighbor interpolation.*
## Available Classes (33 total)

- **Animals (20):** dragon, whale, unicorn, bear, phoenix, cow, panda, dog, goat, lion, cat, frog, shark, wolf, fox, owl, chicken, rabbit, horse, bat
- **Characters (13):** wizard, ninja, skeleton, ghost, zombie, angel, ranger, demon, warrior, knight, samurai, necromancer, elf
## Usage

This model uses a custom inference loop with Classifier-Free Guidance (not a standard `DDPMPipeline`):
```python
import json

import torch
from PIL import Image
from diffusers import UNet2DModel, DDPMScheduler
from huggingface_hub import hf_hub_download

# Load model
unet = UNet2DModel.from_pretrained(
    "achsaf/ddpm-pixelart-16x16-cond", subfolder="unet"
).to("cuda")
scheduler = DDPMScheduler.from_pretrained(
    "achsaf/ddpm-pixelart-16x16-cond", subfolder="scheduler"
)

# Load class labels
labels_path = hf_hub_download("achsaf/ddpm-pixelart-16x16-cond", "class_labels.json")
with open(labels_path) as f:
    cls_info = json.load(f)
LABEL2ID = cls_info["label2id"]
NULL_CLASS_ID = cls_info["null_class_id"]  # 33

@torch.no_grad()
def generate(class_name, guidance_scale=3.0, num_steps=200, seed=42):
    """Generate a pixel art sprite of the given class."""
    class_id = LABEL2ID[class_name]
    generator = torch.Generator("cuda").manual_seed(seed)
    x = torch.randn(1, 3, 16, 16, generator=generator, device="cuda")
    cond_labels = torch.tensor([class_id], device="cuda")
    null_labels = torch.tensor([NULL_CLASS_ID], device="cuda")

    scheduler.set_timesteps(num_steps)
    for t in scheduler.timesteps:
        # Two forward passes for CFG
        noise_cond = unet(x, t, class_labels=cond_labels).sample
        noise_uncond = unet(x, t, class_labels=null_labels).sample
        # CFG: ε = ε_uncond + w * (ε_cond - ε_uncond)
        noise_pred = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
        x = scheduler.step(noise_pred, t, x).prev_sample

    # Convert from [-1, 1] to a uint8 image
    x = (x / 2 + 0.5).clamp(0, 1)
    img = (x.squeeze().permute(1, 2, 0).cpu().numpy() * 255).astype("uint8")
    return Image.fromarray(img)

# Generate a dragon!
dragon = generate("dragon", guidance_scale=3.0)
dragon.resize((256, 256), Image.NEAREST).save("dragon.png")

# Try different classes
for cls in ["cat", "wizard", "ghost", "phoenix"]:
    img = generate(cls, guidance_scale=3.0, seed=42)
    img.resize((256, 256), Image.NEAREST).save(f"{cls}.png")
```
## Adjusting Guidance Scale

- **CFG 1.0** — plain conditional sampling with no guidance (diverse, but less class-specific)
- **CFG 3.0** — recommended default (good class fidelity with reasonable diversity)
- **CFG 5.0+** — stronger class signal at the cost of variety
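The scale enters through the extrapolation formula used in the sampling loop above. A scalar sketch (the real predictions are tensors; the function name here is illustrative) makes the endpoints explicit:

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """ε = ε_uncond + w * (ε_cond - ε_uncond): extrapolate from the
    unconditional prediction toward (and past) the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# With w = 0 the result is purely unconditional, w = 1 recovers the
# conditional prediction, and w > 1 pushes past it (stronger guidance).
print(cfg_combine(0.0, 1.0, 0.0))  # 0.0
print(cfg_combine(0.0, 1.0, 1.0))  # 1.0
print(cfg_combine(0.0, 1.0, 3.0))  # 3.0
```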
## Model Details

| Property | Value |
|---|---|
| Architecture | UNet2DModel with class embeddings |
| Resolution | 16×16 |
| Channels | (64, 128, 256) |
| Parameters | ~14M (+8.7K class embedding) |
| Class embeddings | 34 (33 classes + 1 null token) |
| Scheduler | DDPM, cosine β schedule |
| Prediction type | ε (epsilon) |
| EMA | Yes (decay 0.9999) |
| CFG dropout | 10% (P_uncond = 0.1) |
## Training

Initialized from the v3 unconditional model's weights, with a randomly initialized class-embedding layer added, then trained for 800 epochs.
### Architecture

The model uses diffusers' `UNet2DModel` with `num_class_embeds=34`. Class labels are embedded and summed with the timestep embeddings inside the UNet. During training, 10% of samples have their class label replaced with the null token (id = 33) to enable classifier-free guidance.
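The sum-based conditioning can be sketched with a simplified stand-in module. This is a hypothetical illustration, not the diffusers internals: the class names, the linear timestep encoder, and the embedding width are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ConditioningSum(nn.Module):
    """Illustrative stand-in: embed the class label and add it to the
    timestep embedding before it modulates the UNet's ResNet blocks."""

    def __init__(self, num_class_embeds=34, embed_dim=256):
        super().__init__()
        # 34 rows: 33 creature/character classes + 1 null token for CFG
        self.class_embedding = nn.Embedding(num_class_embeds, embed_dim)
        # Placeholder timestep encoder (diffusers uses sinusoidal + MLP)
        self.time_mlp = nn.Linear(1, embed_dim)

    def forward(self, timesteps, class_labels):
        t_emb = self.time_mlp(timesteps.float().view(-1, 1))
        c_emb = self.class_embedding(class_labels)
        return t_emb + c_emb  # summed, not concatenated

cond = ConditioningSum()
emb = cond(torch.tensor([500]), torch.tensor([7]))
print(emb.shape)  # torch.Size([1, 256])
```

Because the two embeddings are summed, the null token simply shifts the conditioning vector rather than changing the model's input shape, which is what lets one network serve both conditional and unconditional passes.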
### Training Recipe

Based on Ho & Salimans (2022), "Classifier-Free Diffusion Guidance":

- **Pretrained init:** all UNet weights from v3 unconditional; `class_embedding` randomly initialized
- **CFG dropout:** P_uncond = 0.1 (10% unconditional training)
- **Palette loss:** weight 0.3 (soft k-means compactness + histogram entropy + TV)
- **Learning rate:** 1e-4, cosine schedule, 500 warmup steps
- **Batch size:** 128
- **Final diffusion loss:** ~0.13
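The CFG dropout step can be sketched as follows. `NULL_CLASS_ID` matches the null token id stated in this card; the function name and batch shape are illustrative:

```python
import torch

NULL_CLASS_ID = 33  # null token id from class_labels.json
P_UNCOND = 0.1      # fraction of samples trained unconditionally

def apply_cfg_dropout(labels, p_uncond=P_UNCOND, generator=None):
    """Replace each class label with the null token with probability
    p_uncond, so the model also learns an unconditional prediction."""
    drop = torch.rand(labels.shape, generator=generator) < p_uncond
    return torch.where(drop, torch.full_like(labels, NULL_CLASS_ID), labels)

g = torch.Generator().manual_seed(0)
labels = torch.randint(0, 33, (8,), generator=g)
print(apply_cfg_dropout(labels, generator=g))  # ~10% of entries become 33
```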
### Class Labels

Labels were extracted from the dataset's text captions using keyword matching. Each class has roughly 6 or more training examples. Images whose captions match no keyword are treated as unconditional (null class).
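The keyword-matching extraction can be sketched like this. The actual labeling script is not included in the repo, so the function name and the label subset below are hypothetical; only the null-class fallback behavior comes from this card:

```python
# Illustrative subset of the real 33-class mapping
LABEL2ID = {"dragon": 0, "cat": 1, "wizard": 2}
NULL_CLASS_ID = 33

def caption_to_class_id(caption):
    """Return the class id of the first label word found in the caption,
    or the null class if nothing matches (trained unconditionally)."""
    words = caption.lower().split()
    for label, class_id in LABEL2ID.items():
        if label in words:
            return class_id
    return NULL_CLASS_ID

print(caption_to_class_id("a red dragon breathing fire"))  # 0
print(caption_to_class_id("blue abstract pattern"))        # 33
```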
## Datasets
## Related Models

- `achsaf/ddpm-pixelart-16x16-v3` — unconditional base model
- `achsaf/ddpm-pixelart-16x16` — v1, backbone + head architecture
## License

Apache 2.0