Spy vs Spy LoRA

SDXL LoRA for generating Spy vs Spy style black-and-white cartoon art. Trained on comic panels and animation frames from MAD Magazine and MadTV.

Versions

Version	Images	Base Model	Source	Captioning
v1	36	SDXL 1.0	MAD Magazine comic panels	Manual descriptions expanded by AI
v2	220	SDXL 1.0	v1 panels + MadTV animation	Voice-guided Claude Vision + hallucination cleanup
v3	861	SDXL 1.0	MadTV animation (DVD rips)	Gemini video scene descriptions

Each version includes all epoch checkpoints (every 2 epochs up to 22, plus final) so you can experiment with different training stages. Best results are typically around epoch 10-16.

Usage (v1-v3)

Trigger Word

All v1-v3 models use a single trigger word: spyvspy

Prompt Format

spyvspy, {scene description}, {appearance}, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Example Prompts

Both spies:

spyvspy, white spy planting a bomb under a table while black spy sneaks up behind with a mallet, both wearing fedora hats and trenchcoats with long pointed beak noses and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Single spy:

spyvspy, black spy peeking around a corner with a mischievous grin holding a lit stick of dynamite, wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Style only (no specific character):

spyvspy, large explosion cloud with debris and a fedora hat flying through the air, outdoor rooftop setting, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Recommended Settings

Setting	Value
Sampler	euler / dpmpp_2m
Scheduler	normal / karras
CFG	7
Steps	25
LoRA weight	0.6-0.9 (start at 0.8)
Resolution	1024x1024 or 832x1216

Negative Prompt

lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, color, colorful, realistic, photographic, 3d render

For single-character prompts, add the other spy to the negative:

White spy only: add black spy, multiple characters, two characters
Black spy only: add white spy, multiple characters, two characters

File Structure

v1/
  spyvspy_sdxl.safetensors              # Final checkpoint
  spyvspy_sdxl-000002.safetensors       # Epoch 2
  ...
  spyvspy_sdxl-000022.safetensors       # Epoch 22
v2/
  spyvspy_sdxl_v2.safetensors           # Final checkpoint
  spyvspy_sdxl_v2-000002.safetensors
  ...
v3/
  spyvspy_sdxl_v3.safetensors           # Final checkpoint
  spyvspy_sdxl_v3-000002.safetensors
  ...

Training Details

Parameter	Value
Network	LoRA, dim=32, alpha=16
Optimizer	AdamW8bit
UNet LR	1e-4
Text Encoder LR	5e-5
Scheduler	cosine_with_restarts (3 cycles)
Precision	bf16
Resolution	1024, bucketed
Epochs	24, checkpoint every 2
Training framework	kohya_ss sd-scripts

Captioning Pipeline

Each version used a progressively more sophisticated captioning approach:

v1 — Manual + AI Expansion

Descriptions manually written for each comic panel, then expanded by AI for consistency and detail.

v2 — Voice-Guided Claude Vision

A custom desktop app (frame_curator.py) displayed frames for review. When keeping a frame, a voice note was recorded describing the scene. The transcribed voice note was sent alongside the frame to Claude Vision with a detailed system prompt instructing it to write structured scene descriptions. A second pass (fix_captions.py) cleaned up hallucinated objects by comparing the AI caption against the voice note.

v3 — Gemini Video Analysis

Full episodes sent to Gemini for video-level scene analysis using a captioning prompt. Frames extracted at identified timestamps using FFmpeg with yadif deinterlacing. Each frame reviewed in a custom PySide6 desktop app. Prose captions converted to comma-separated tags via Claude API.

v4 (Coming Soon)

v4 introduces a 5-pass Gemini + Claude merge captioning system, separate character triggers (white_spy, black_spy), a web-based frame reviewer, and trains on NovaAnimeXL instead of SDXL 1.0. See the v4 pipeline docs and prompt templates for details.

Model tree for camdenbalberg/spy-vs-spy-lora

Base model

stabilityai/stable-diffusion-xl-base-1.0

Adapter

(8345)

this model

camdenbalberg
/

spy-vs-spy-lora