🎨 SyntheticGen

Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation

Addressing class imbalance in remote sensing datasets through controlled synthetic generation

Accepted at IEEE IGARSS 2026


🌟 Overview

SyntheticGen is the official implementation for the paper Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation. It tackles the long-tail distribution problem in remote-sensing datasets (specifically LoveDA) by generating synthetic imagery with explicit control over class ratios.
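To make "class ratios" concrete, here is a minimal sketch of how the per-class pixel fractions of a single label mask could be measured. It is illustrative only and not taken from the repository: the `class_ratios` helper is hypothetical, and the label-id mapping is an assumption that should be checked against the LoveDA documentation.

```python
from collections import Counter

# Assumed LoveDA label ids (0 = no-data); verify against the dataset documentation.
CLASS_NAMES = {
    1: "background", 2: "building", 3: "road", 4: "water",
    5: "barren", 6: "forest", 7: "agriculture",
}

def class_ratios(mask, ignore_id=0):
    """Return {class_name: fraction of labeled pixels} for a 2-D mask given as nested lists."""
    counts = Counter(pixel for row in mask for pixel in row if pixel != ignore_id)
    total = sum(counts.values())
    if total == 0:
        return {}  # mask contains only ignored pixels
    return {CLASS_NAMES.get(cid, str(cid)): n / total for cid, n in sorted(counts.items())}

# Toy 4x4 mask: 8 background, 4 building, 4 forest pixels.
toy_mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
print(class_ratios(toy_mask))
# → {'background': 0.5, 'building': 0.25, 'forest': 0.25}
```

A ratio specification such as building:0.4 would correspond to a mask in which about 40% of the labeled pixels belong to the building class.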

✨ Highlights

  • Two-stage pipeline: Ratio-conditioned layout D3PM + ControlNet image synthesis.
  • Controllable Augmentation: Specify exact proportions of each land cover class (e.g., building:0.4).
  • Data-Centric Strategy: Improves segmentation performance by adding synthetic samples that target under-represented classes to the training set.

[Figure: SyntheticGen results]
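As a rough illustration of the data-centric idea in the highlights, the sketch below flags classes whose share of the dataset's pixels falls under a cutoff, so they can be prioritized when choosing ratios to generate. The `tail_classes` helper, the 0.05 threshold, and the example shares are all hypothetical; the numbers are not measured LoveDA statistics, and the paper's own selection procedure may differ.

```python
def tail_classes(pixel_share, threshold=0.05):
    """Return class names whose dataset-level pixel share is below `threshold`, rarest first."""
    rare = [(share, name) for name, share in pixel_share.items() if share < threshold]
    return [name for share, name in sorted(rare)]

# Illustrative shares only; these are NOT measured LoveDA statistics.
example_share = {
    "background": 0.45, "building": 0.04, "road": 0.03, "water": 0.08,
    "barren": 0.02, "forest": 0.18, "agriculture": 0.20,
}
print(tail_classes(example_share))
# → ['barren', 'road', 'building']
```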

🚀 Quick Start

Installation

git clone https://github.com/Buddhi19/SyntheticGen.git
cd SyntheticGen
pip install -r requirements.txt

Generate Your First Synthetic Image

To generate a synthetic image-label pair using a specific configuration:

python src/scripts/sample_pair.py \
  --config configs/sample_pair_ckpt40000_building0.4.yaml

📚 Usage

Training Pipeline

Stage A: Train Layout Generator (D3PM)

python src/scripts/train_layout_d3pm.py \
  --config configs/train_layout_d3pm_masked_sparse_80k.yaml

Stage B: Train Image Generator (ControlNet)

python src/scripts/train_controlnet_ratio.py \
  --config configs/train_controlnet_ratio_loveda_1024.yaml

Inference / Sampling

Override config parameters via CLI:

python src/scripts/sample_pair.py \
  --config configs/sample_pair_ckpt40000_building0.4.yaml \
  --ratios "building:0.4,forest:0.3" \
  --save_dir outputs/custom_generation
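When several ratio settings are needed, the command above can be assembled programmatically. The following sketch uses only the flags shown above (`--config`, `--ratios`, `--save_dir`); the helper names, the validation rules, and the output directory names are assumptions made for illustration and are not part of the repository.

```python
import shlex

def format_ratios(ratios):
    """Validate a {class: fraction} dict and render it as 'name:value,name:value'."""
    for name, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"ratio for {name!r} must be in [0, 1], got {value}")
    if sum(ratios.values()) > 1.0 + 1e-9:
        raise ValueError("ratios sum to more than 1")
    return ",".join(f"{name}:{value:g}" for name, value in ratios.items())

def build_command(config, ratios, save_dir):
    """Assemble the argv list for one sample_pair.py call using only the flags shown above."""
    return [
        "python", "src/scripts/sample_pair.py",
        "--config", config,
        "--ratios", format_ratios(ratios),
        "--save_dir", save_dir,
    ]

config = "configs/sample_pair_ckpt40000_building0.4.yaml"
for building in (0.2, 0.4, 0.6):
    # Hypothetical output directories, one per building ratio.
    cmd = build_command(config, {"building": building, "forest": 0.3},
                        f"outputs/building_{building:g}")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Building the arguments as a list rather than a single shell string avoids quoting problems if a path contains spaces.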

📄 Citation

@misc{wijenayake2026mitigating,
      title={Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation}, 
      author={Buddhi Wijenayake and Nichula Wasalathilake and Roshan Godaliyadda and Vijitha Herath and Parakrama Ekanayake and Vishal M. Patel},
      year={2026},
      eprint={2602.04749},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.04749}, 
}

🙏 Acknowledgments

  • LoveDA dataset creators for high-quality annotated remote sensing data.
  • Hugging Face Diffusers for diffusion model infrastructure.
  • ControlNet authors for controllable generation.