Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation
Paper: arXiv:2602.04749
Addressing class imbalance in remote sensing datasets through controlled synthetic generation
SyntheticGen is the official implementation of the paper Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation. It addresses the long-tail class distribution in remote-sensing datasets (specifically LoveDA) by generating synthetic image-label pairs with explicit control over class ratios (e.g., building:0.4).
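A class ratio such as building:0.4 can be read as the fraction of labelled pixels that belong to that class. The following is a minimal sketch of how such ratios could be measured on an existing label mask, for example to see which classes are under-represented. The function name class_ratios is hypothetical and not part of the repository, and the label ids follow the LoveDA convention as commonly documented (0 = no-data, 1 = background, ..., 7 = agriculture), which should be checked against the dataset release in use.

```python
from collections import Counter

# Assumed LoveDA label ids; 0 marks no-data pixels and is ignored.
LOVEDA_CLASSES = {
    1: "background",
    2: "building",
    3: "road",
    4: "water",
    5: "barren",
    6: "forest",
    7: "agriculture",
}


def class_ratios(mask, ignore_index=0):
    """Return {class_name: fraction of valid pixels} for a 2-D label mask.

    `mask` is a list of rows, each row a list of integer label ids.
    Pixels equal to `ignore_index` are excluded from the denominator.
    """
    counts = Counter(v for row in mask for v in row if v != ignore_index)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        LOVEDA_CLASSES.get(label, f"class_{label}"): n / total
        for label, n in sorted(counts.items())
    }


# Toy 2x5 mask: 3 background, 4 building, 3 forest pixels.
toy_mask = [
    [2, 2, 6, 6, 6],
    [2, 2, 1, 1, 1],
]
toy_ratios = class_ratios(toy_mask)
```

On real data the same computation is usually done with NumPy on the decoded mask array; plain lists are used here only to keep the sketch dependency-free.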
git clone https://github.com/Buddhi19/SyntheticGen.git
cd SyntheticGen
pip install -r requirements.txt
To generate a synthetic image-label pair using a specific configuration:
python src/scripts/sample_pair.py \
--config configs/sample_pair_ckpt40000_building0.4.yaml
Stage A: Train Layout Generator (D3PM)
python src/scripts/train_layout_d3pm.py \
--config configs/train_layout_d3pm_masked_sparse_80k.yaml
Stage B: Train Image Generator (ControlNet)
python src/scripts/train_controlnet_ratio.py \
--config configs/train_controlnet_ratio_loveda_1024.yaml
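For intuition about Stage A: D3PM (discrete denoising diffusion) corrupts a categorical label map step by step and trains a network to reverse that corruption. The sketch below illustrates only the forward corruption, for the absorbing-state ("mask") variant with a linear schedule, where each pixel is independently replaced by a mask token with probability t/T. Whether train_layout_d3pm.py uses this variant or schedule is an assumption, suggested only by the word "masked" in the config name; MASK_ID and corrupt_layout are hypothetical names, not repository APIs.

```python
import random

MASK_ID = -1  # hypothetical id for the absorbing "mask" state


def corrupt_layout(layout, t, num_steps, rng=None):
    """Sample x_t from q(x_t | x_0) for absorbing-state discrete diffusion.

    Each pixel of the clean layout is independently replaced by MASK_ID
    with probability t / num_steps (linear schedule), so t = 0 returns the
    clean layout and t = num_steps returns a fully masked one.
    """
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    rng = rng or random.Random()
    p_mask = t / num_steps
    return [
        [MASK_ID if rng.random() < p_mask else label for label in row]
        for row in layout
    ]


# Toy 2x4 layout using LoveDA-style label ids.
clean = [[2, 2, 6, 6], [1, 1, 6, 6]]
rng = random.Random(0)
untouched = corrupt_layout(clean, t=0, num_steps=100, rng=rng)
half_way = corrupt_layout(clean, t=50, num_steps=100, rng=rng)
fully_masked = corrupt_layout(clean, t=100, num_steps=100, rng=rng)
```

Stage B then conditions the image generator on a layout produced by the reverse process, so the label map and the image stay aligned as a pair.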
Override config parameters via CLI:
python src/scripts/sample_pair.py \
--config configs/sample_pair_ckpt40000_building0.4.yaml \
--ratios "building:0.4,forest:0.3" \
--save_dir outputs/custom_generation
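The value passed to --ratios is a comma-separated list of class:ratio pairs. As a rough illustration of how such a string could be parsed and sanity-checked before it overrides a config value, here is a minimal sketch. The helper parse_ratios is hypothetical, and the validation rules (each ratio in [0, 1], total at most 1, remaining area left to unspecified classes) are assumptions; the repository's actual parsing may differ.

```python
def parse_ratios(spec):
    """Parse 'building:0.4,forest:0.3' into {'building': 0.4, 'forest': 0.3}.

    Raises ValueError on malformed items, duplicate classes, ratios outside
    [0, 1], or a total above 1 (assumed constraints, see the note above).
    """
    ratios = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        name, sep, value = item.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"expected 'class:ratio', got {item!r}")
        ratio = float(value)  # raises ValueError on non-numeric input
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio for {name!r} must be in [0, 1]")
        if name in ratios:
            raise ValueError(f"class {name!r} given more than once")
        ratios[name] = ratio
    if sum(ratios.values()) > 1.0 + 1e-9:
        raise ValueError("ratios must sum to at most 1")
    return ratios


# Values loaded from a config file (illustrative), then overridden from the CLI.
config = {"ratios": {"building": 0.4}, "save_dir": "outputs/default"}
config["ratios"] = parse_ratios("building:0.4,forest:0.3")
```

Failing early on an over-committed request (for example building:0.8,forest:0.3) avoids spending a sampling run on a layout that cannot satisfy the requested proportions.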
@misc{wijenayake2026mitigating,
title={Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation},
author={Buddhi Wijenayake and Nichula Wasalathilake and Roshan Godaliyadda and Vijitha Herath and Parakrama Ekanayake and Vishal M. Patel},
year={2026},
eprint={2602.04749},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.04749},
}