Spy vs Spy LoRA

SDXL LoRA for generating Spy vs Spy style black-and-white cartoon art. Trained on comic panels and animation frames from MAD Magazine and MadTV.

Versions

Version Images Base Model Source Captioning
v1 36 SDXL 1.0 MAD Magazine comic panels Manual descriptions expanded by AI
v2 220 SDXL 1.0 v1 panels + MadTV animation Voice-guided Claude Vision + hallucination cleanup
v3 861 SDXL 1.0 MadTV animation (DVD rips) Gemini video scene descriptions

Each version includes all epoch checkpoints (every 2 epochs up to 22, plus final) so you can experiment with different training stages. Best results are typically around epoch 10-16.

Usage (v1-v3)

Trigger Word

All v1-v3 models use a single trigger word: spyvspy

Prompt Format

spyvspy, {scene description}, {appearance}, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Example Prompts

Both spies:

spyvspy, white spy planting a bomb under a table while black spy sneaks up behind with a mallet, both wearing fedora hats and trenchcoats with long pointed beak noses and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Single spy:

spyvspy, black spy peeking around a corner with a mischievous grin holding a lit stick of dynamite, wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Style only (no specific character):

spyvspy, large explosion cloud with debris and a fedora hat flying through the air, outdoor rooftop setting, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style

Recommended Settings

Setting Value
Sampler euler / dpmpp_2m
Scheduler normal / karras
CFG 7
Steps 25
LoRA weight 0.6-0.9 (start at 0.8)
Resolution 1024x1024 or 832x1216

Negative Prompt

lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, color, colorful, realistic, photographic, 3d render

For single-character prompts, add the other spy to the negative:

  • White spy only: add black spy, multiple characters, two characters
  • Black spy only: add white spy, multiple characters, two characters

File Structure

v1/
  spyvspy_sdxl.safetensors              # Final checkpoint
  spyvspy_sdxl-000002.safetensors       # Epoch 2
  ...
  spyvspy_sdxl-000022.safetensors       # Epoch 22
v2/
  spyvspy_sdxl_v2.safetensors           # Final checkpoint
  spyvspy_sdxl_v2-000002.safetensors
  ...
v3/
  spyvspy_sdxl_v3.safetensors           # Final checkpoint
  spyvspy_sdxl_v3-000002.safetensors
  ...

Training Details

Parameter Value
Network LoRA, dim=32, alpha=16
Optimizer AdamW8bit
UNet LR 1e-4
Text Encoder LR 5e-5
Scheduler cosine_with_restarts (3 cycles)
Precision bf16
Resolution 1024, bucketed
Epochs 24, checkpoint every 2
Training framework kohya_ss sd-scripts

Captioning Pipeline

Each version used a progressively more sophisticated captioning approach:

v1 β€” Manual + AI Expansion

Descriptions manually written for each comic panel, then expanded by AI for consistency and detail.

v2 β€” Voice-Guided Claude Vision

A custom desktop app (frame_curator.py) displayed frames for review. When keeping a frame, a voice note was recorded describing the scene. The transcribed voice note was sent alongside the frame to Claude Vision with a detailed system prompt instructing it to write structured scene descriptions. A second pass (fix_captions.py) cleaned up hallucinated objects by comparing the AI caption against the voice note.

v3 β€” Gemini Video Analysis

Full episodes sent to Gemini for video-level scene analysis using a captioning prompt. Frames extracted at identified timestamps using FFmpeg with yadif deinterlacing. Each frame reviewed in a custom PySide6 desktop app. Prose captions converted to comma-separated tags via Claude API.

v4 (Coming Soon)

v4 introduces a 5-pass Gemini + Claude merge captioning system, separate character triggers (white_spy, black_spy), a web-based frame reviewer, and trains on NovaAnimeXL instead of SDXL 1.0. See the v4 pipeline docs and prompt templates for details.

Links

Downloads last month
-
Inference Providers NEW

Model tree for camdenbalberg/spy-vs-spy-lora

Adapter
(8345)
this model

Dataset used to train camdenbalberg/spy-vs-spy-lora