---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of the Stable Cascade Decoder.
Non-commercial use only, due to the StabilityAI license.

## Code Example

```shell
pip install torch diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

# Load the prior and decoder pipelines in half precision.
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

# Stage 1: the prior turns the prompt into image embeddings.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40,
)

# Stage 2: the decoder turns the image embeddings into the final image.
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```
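
For repeated or reproducible runs, the two stages can be wrapped in one function. The following is a minimal sketch, not part of the original example: `run_cascade` is a hypothetical helper name, and it assumes `prior` and `decoder` are the pipelines loaded above. The optional `generator` is forwarded to both stages so a seeded `torch.Generator` can be used.

```python
def run_cascade(prior, decoder, prompt, negative_prompt="", generator=None,
                height=1024, width=1024):
    """Run the prior, then the decoder, and return a single image.

    `run_cascade` is a hypothetical helper. `prior` and `decoder` are assumed
    to be callables that behave like the two pipelines in the example above.
    """
    # Stage 1: prompt -> image embeddings (same settings as the example above).
    prior_output = prior(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=height,
        width=width,
        guidance_scale=7.0,
        num_images_per_prompt=1,
        num_inference_steps=40,
        generator=generator,
    )
    # Stage 2: image embeddings -> image.
    decoder_output = decoder(
        image_embeddings=prior_output.image_embeddings,
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=1.5,
        output_type="pil",
        num_inference_steps=10,
        generator=generator,
    )
    return decoder_output.images[0]
```

A call would then look like `run_cascade(prior, decoder, prompt, negative_prompt, generator=torch.Generator("cpu").manual_seed(0))`, where the seed value is arbitrary.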

## Dataset

Used the same dataset as Disty0/sote-diffusion-cascade-decoder_pre-alpha0.
Trained on ~98K images.

## Training:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

**Software used**: https://github.com/2kpr/StableCascade

### Config:
```
experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False

batch_size: 1
grad_accum_steps: 1
updates: 98000
backup_every: 2048
save_every: 1024
warmup_updates: 100

lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True

image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4

checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar

effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-stage_b.safetensors
```
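
As a rough sanity check on the config, the sketch below estimates how many training samples the run consumed. It assumes each update processes `batch_size * grad_accum_steps` images, which is the usual convention but is not confirmed for this trainer. The helper name `samples_seen` and the `dataset_size` value (taken from the approximate "98K" figure in the Dataset section) are illustrative only.

```python
from fractions import Fraction

# Values copied from the config above.
config = {
    "batch_size": 1,
    "grad_accum_steps": 1,
    "updates": 98000,
    "multi_aspect_ratio": ["1/1", "1/2", "1/3", "2/3", "3/4", "1/5",
                           "2/5", "3/5", "4/5", "1/6", "5/6", "9/16"],
}

def samples_seen(cfg):
    # Assumption: one update consumes batch_size * grad_accum_steps images.
    return cfg["updates"] * cfg["batch_size"] * cfg["grad_accum_steps"]

total = samples_seen(config)
dataset_size = 98_000  # approximate figure from the Dataset section
approx_epochs = total / dataset_size

# Parse the aspect ratios exactly, to see the range they span.
ratios = sorted(Fraction(r) for r in config["multi_aspect_ratio"])

print(f"samples seen: {total}")
print(f"approximate passes over the dataset: {approx_epochs:.2f}")
print(f"most extreme ratio: {ratios[0]}, least extreme: {ratios[-1]}")
```

Under these assumptions the run amounts to roughly one pass over the dataset.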

## Limitations and Bias

### Bias

- This model is intended for anime illustrations.
  Realistic capabilities have not been tested at all.

### Limitations
- Eyes in far shots are still rendered poorly because of the heavy latent compression.
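
To give a feel for how heavy the compression is, the sketch below estimates the size of the prior's latent grid. It assumes a compression factor of 42.67, which is the default `resolution_multiple` of the diffusers Stable Cascade prior pipeline as far as I know. The helper names are hypothetical, and the decoder stages use different factors, so treat the numbers as an illustration rather than an exact account of this model.

```python
import math

# Assumption: default `resolution_multiple` of the diffusers prior pipeline.
RESOLUTION_MULTIPLE = 42.67

def prior_latent_size(height, width, resolution_multiple=RESOLUTION_MULTIPLE):
    """Approximate (rows, cols) of the prior latent grid for an image size."""
    return (math.ceil(height / resolution_multiple),
            math.ceil(width / resolution_multiple))

def latent_cells_covered(pixels, resolution_multiple=RESOLUTION_MULTIPLE):
    """How many latent cells a feature of the given pixel width spans."""
    return pixels / resolution_multiple

print(prior_latent_size(1024, 1024))
print(f"{latent_cells_covered(20):.2f}")
```

Under this assumption a 1024x1024 image maps to a 24x24 grid, so an eye about 20 pixels wide in a far shot covers less than half of one latent cell.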