Spy vs Spy LoRA
SDXL LoRA for generating Spy vs Spy style black-and-white cartoon art. Trained on comic panels and animation frames from MAD Magazine and MadTV.
Versions
| Version | Images | Base Model | Source | Captioning |
|---|---|---|---|---|
| v1 | 36 | SDXL 1.0 | MAD Magazine comic panels | Manual descriptions expanded by AI |
| v2 | 220 | SDXL 1.0 | v1 panels + MadTV animation | Voice-guided Claude Vision + hallucination cleanup |
| v3 | 861 | SDXL 1.0 | MadTV animation (DVD rips) | Gemini video scene descriptions |
Each version includes all epoch checkpoints (every 2 epochs up to 22, plus final) so you can experiment with different training stages. Best results are typically around epoch 10-16.
Usage (v1-v3)
Trigger Word
All v1-v3 models use a single trigger word: spyvspy
Prompt Format
spyvspy, {scene description}, {appearance}, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
Example Prompts
Both spies:
spyvspy, white spy planting a bomb under a table while black spy sneaks up behind with a mallet, both wearing fedora hats and trenchcoats with long pointed beak noses and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
Single spy:
spyvspy, black spy peeking around a corner with a mischievous grin holding a lit stick of dynamite, wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
Style only (no specific character):
spyvspy, large explosion cloud with debris and a fedora hat flying through the air, outdoor rooftop setting, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
Recommended Settings
| Setting | Value |
|---|---|
| Sampler | euler / dpmpp_2m |
| Scheduler | normal / karras |
| CFG | 7 |
| Steps | 25 |
| LoRA weight | 0.6-0.9 (start at 0.8) |
| Resolution | 1024x1024 or 832x1216 |
Negative Prompt
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, color, colorful, realistic, photographic, 3d render
For single-character prompts, add the other spy to the negative:
- White spy only: add
black spy, multiple characters, two characters - Black spy only: add
white spy, multiple characters, two characters
File Structure
v1/
spyvspy_sdxl.safetensors # Final checkpoint
spyvspy_sdxl-000002.safetensors # Epoch 2
...
spyvspy_sdxl-000022.safetensors # Epoch 22
v2/
spyvspy_sdxl_v2.safetensors # Final checkpoint
spyvspy_sdxl_v2-000002.safetensors
...
v3/
spyvspy_sdxl_v3.safetensors # Final checkpoint
spyvspy_sdxl_v3-000002.safetensors
...
Training Details
| Parameter | Value |
|---|---|
| Network | LoRA, dim=32, alpha=16 |
| Optimizer | AdamW8bit |
| UNet LR | 1e-4 |
| Text Encoder LR | 5e-5 |
| Scheduler | cosine_with_restarts (3 cycles) |
| Precision | bf16 |
| Resolution | 1024, bucketed |
| Epochs | 24, checkpoint every 2 |
| Training framework | kohya_ss sd-scripts |
Captioning Pipeline
Each version used a progressively more sophisticated captioning approach:
v1 β Manual + AI Expansion
Descriptions manually written for each comic panel, then expanded by AI for consistency and detail.
v2 β Voice-Guided Claude Vision
A custom desktop app (frame_curator.py) displayed frames for review. When keeping a frame, a voice note was recorded describing the scene. The transcribed voice note was sent alongside the frame to Claude Vision with a detailed system prompt instructing it to write structured scene descriptions. A second pass (fix_captions.py) cleaned up hallucinated objects by comparing the AI caption against the voice note.
v3 β Gemini Video Analysis
Full episodes sent to Gemini for video-level scene analysis using a captioning prompt. Frames extracted at identified timestamps using FFmpeg with yadif deinterlacing. Each frame reviewed in a custom PySide6 desktop app. Prose captions converted to comma-separated tags via Claude API.
v4 (Coming Soon)
v4 introduces a 5-pass Gemini + Claude merge captioning system, separate character triggers (white_spy, black_spy), a web-based frame reviewer, and trains on NovaAnimeXL instead of SDXL 1.0. See the v4 pipeline docs and prompt templates for details.
Links
- Training pipeline & code: github.com/camdenbalberg/spy-vs-spy-lora
- Training dataset: camdenbalberg/spy-vs-spy-dataset
- Base model: stabilityai/stable-diffusion-xl-base-1.0
- Downloads last month
- -
Model tree for camdenbalberg/spy-vs-spy-lora
Base model
stabilityai/stable-diffusion-xl-base-1.0