+---
+license: mit
+tags:
+- diffusion
+- gan
+- hybrid
+- fashion
+- multimodal
+- controlnet
+- pose-guided
+- pytorch
+library_name: pytorch_lightning
+pipeline_tag: text-to-image
+language:
+- en
+spaces:
+  - NguyenDinhHieu/EquiFashion
+---
+# 👗 EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation
+**Authors:**
+Tran Minh Khuong, Nguyen Dinh Hieu (ORCID: 0009-0002-6683-8036), Ngo Dinh Hoang Minh, Nguyen Dinh Bach, Phan Duy Hung (ORCID: 0000-0002-6033-6484)
+**Institution:** FPT University, Hanoi, Vietnam
+📧 khuongtmhe180089@fpt.edu.vn, hieundhe180318@fpt.edu.vn, minhndhhe182227@fpt.edu.vn, bachndhe173222@fpt.edu.vn, hungpd2@fe.edu.vn
+---
+## 🧩 Overview
+**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design.
+It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.
+> 🎨 Try the live demo here:
+> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)
+---
+## 🎯 Motivation
+Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**.
+GANs generate diverse styles but lack stability, while diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges the two, achieving controlled diversity, semantic alignment, and realistic garment rendering.
+---
+## 🧱 Architecture Overview
+| Component | Description |
+|------------|-------------|
+| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. |
+| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. |
+| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. |
+| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
+| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
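
The adjective–noun coupling described for Semantic-Bundled Attention can be illustrated with a toy example. This is only a sketch of the pairing idea, not the model's attention mechanism: the vocabularies `ATTRIBUTES` and `GARMENT_PARTS` and the function `bundle_attributes` are hypothetical names introduced here, and the real system presumably relies on a proper parser and learned attention maps.

```python
from typing import List, Tuple

# Toy vocabularies (hypothetical, for illustration only)
ATTRIBUTES = {"red", "floral", "long", "striped", "white"}
GARMENT_PARTS = {"collar", "sleeve", "dress", "waist", "pocket"}


def bundle_attributes(prompt: str) -> List[Tuple[str, str]]:
    """Pair each attribute word with the garment part that directly follows it."""
    tokens = prompt.lower().replace(",", " ").split()
    pairs: List[Tuple[str, str]] = []
    pending: List[str] = []  # attributes waiting for their noun
    for token in tokens:
        if token in ATTRIBUTES:
            pending.append(token)
        elif token in GARMENT_PARTS:
            # Bind every pending attribute to this garment part
            pairs.extend((adj, token) for adj in pending)
            pending = []
        else:
            # Any other word breaks the adjective-noun run
            pending = []
    return pairs


print(bundle_attributes("red collar, striped sleeve, white floral dress"))
# [('red', 'collar'), ('striped', 'sleeve'), ('white', 'dress'), ('floral', 'dress')]
```

Each resulting pair marks the tokens whose attention would be expected to land on the same image region, for example "red" and "collar".
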
+---
+## 🧮 Training Configuration
+| Setting | Value |
+|----------|-------|
+| Framework | PyTorch Lightning 2.2 |
+| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
+| Optimizer | AdamW |
+| Learning Rate | 2e-4 (generator, G), 1e-4 (discriminator, D) |
+| Scheduler | Cosine Decay |
+| Epochs | 400 (200 pretrain + 200 joint) |
+| Precision | FP16 |
+| Batch Size | 32 |
+| Timesteps (T) | 8 |
+| Fusion Decay (γ) | 0.7 |
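
To make the learning-rate settings concrete, here is a minimal sketch of a cosine decay schedule applied to the two base rates in the table. It rests on several assumptions not stated in the card: the schedule spans all 400 epochs without warmup or restarts, it decays to zero, and it is stepped once per epoch. The function name `cosine_decay_lr` is hypothetical.

```python
import math


def cosine_decay_lr(base_lr: float, epoch: int, total_epochs: int, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr (epoch 0) down to min_lr (epoch total_epochs)."""
    # Clamp so that out-of-range epochs do not push the rate past its bounds
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_EPOCHS = 400  # 200 pretrain + 200 joint, treated here as one schedule (assumption)
for epoch in (0, 100, 200, 300, 400):
    lr_g = cosine_decay_lr(2e-4, epoch, TOTAL_EPOCHS)
    lr_d = cosine_decay_lr(1e-4, epoch, TOTAL_EPOCHS)
    print(f"epoch {epoch:3d}: lr_G={lr_g:.2e}  lr_D={lr_d:.2e}")
```

In a PyTorch training loop the same curve is usually obtained from a built-in scheduler rather than computed by hand; the function above only shows the shape of the decay.
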
+---
+## 🧠 Core Equation
+The total loss is a weighted sum of autoencoding, semantic (L_cons, L_bundle, L_comp), adversarial, denoising, robustness, and perceptual terms, each scaled by its own coefficient λ:
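+\[
+L_{\text{total}} = \lambda_{\text{AE}} L_{\text{AE}} + \lambda_{\text{cons}} L_{\text{cons}} + \lambda_{\text{bundle}} L_{\text{bundle}} + \lambda_{\text{comp}} L_{\text{comp}} + \lambda_{G} \left( L_{G} + \lambda_{\text{MS}} L_{\text{MS}} \right) + \lambda_{\text{den}} L_{\text{denoise}} + \lambda_{\text{rob}} L_{\text{rob}} + \lambda_{\text{perc}} L_{\text{perc}}
+\]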
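
The structure of this sum, in particular the nesting of the `MS` term inside the adversarial group, can be sketched in a few lines of Python. The function name `total_loss` and the dictionary keys are hypothetical; the card does not publish the coefficient values, so the numbers below are placeholders chosen only to make the arithmetic easy to check.

```python
from typing import Mapping


def total_loss(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum following the equation: MS is scaled twice, by MS and by G."""
    adversarial = terms["G"] + weights["MS"] * terms["MS"]
    return (
        weights["AE"] * terms["AE"]
        + weights["cons"] * terms["cons"]
        + weights["bundle"] * terms["bundle"]
        + weights["comp"] * terms["comp"]
        + weights["G"] * adversarial
        + weights["den"] * terms["denoise"]
        + weights["rob"] * terms["rob"]
        + weights["perc"] * terms["perc"]
    )


# Placeholder values (not from the paper): every loss term equals 1.0
terms = {k: 1.0 for k in ("AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc")}
weights = {k: 1.0 for k in ("AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc")}
weights["G"], weights["MS"] = 0.5, 2.0

print(total_loss(terms, weights))  # 7 unit terms + 0.5 * (1 + 2 * 1) = 8.5
```

The same function works unchanged with framework tensors in place of floats, since it only uses addition and multiplication.
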
+---
+## 📊 Quantitative Results
+| Metric | Value | Benchmark |
+|---------|--------|------------|
+| FID ↓ | **14.7** | FashionAI subset |
+| IS ↑ | **4.23** | – |
+| CLIP-S ↑ | **0.282** | – |
+| Coverage ↑ | **92.8%** | – |
+| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |
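
For reference when reading the FID row, the metric is conventionally defined as the Fréchet distance between two Gaussians fitted to image features:

\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
\]

Here \(\mu_r, \Sigma_r\) and \(\mu_g, \Sigma_g\) are the feature mean and covariance of real and generated images respectively, typically taken from an Inception network. The card does not state which feature extractor or how many samples were used, so this is the standard definition rather than a description of the exact evaluation protocol.
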
+---
+## 🖼️ Visual Results
+| Input Pose | Generated Outfit |
+|-------------|------------------|
+| ![Input pose](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusion_pose.png) | ![Generated outfit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/fashion_diffusion.png) |
+---
+## 📦 Dataset: **EquiFashion-DB**
+| Property | Description |
+|-----------|--------------|
+| Scale | 350 K images |
+| Resolution | 512×512 |
+| Modalities | Image, Text, Sketch, Pose, Fabric |
+| Coverage | 40+ apparel categories |
+| Key Feature | Noise-aware text, balanced demographics |
+| Purpose | Training + robust benchmarking for generative fashion |
+---
+## 🚀 Usage Example
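+```python
+import torch
+from huggingface_hub import hf_hub_download
+# `cldm` is assumed to be a local package shipped with the project code, not installed from PyPI
+from cldm.model import create_model, load_state_dict
+# Use the GPU when available, otherwise fall back to CPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Download the checkpoint from the Hub (cached locally after the first call)
+ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="hfd_100epochs.ckpt")
+# Build the model from its config, then load the weights
+model = create_model("utils/configs/cldm_v2.yaml").to(device)
+model.load_state_dict(load_state_dict(ckpt, location=device))
+model.eval()
+# Text prompt used for conditioning; the sampling step itself is not shown here
+prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
+```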
+---
+## 💡 Citation
+If you use this model or dataset, please cite:
+```bibtex
+@inproceedings{nguyen2025equifashion,
+  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
+  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
+  booktitle={Proceedings of the ..... Conference},
+  year={2025},
+  organization={FPT University, Hanoi}
+}
+```
+---
+## 🧩 File Descriptions
+| File | Description |
+|------|--------------|
+| `hfd_100epochs.ckpt` | Main diffusion model checkpoint |
+| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights |
+| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder |
+| `app.py` | Gradio demo UI |
+| `utils/configs/cldm_v2.yaml` | Architecture configuration |
+---
+## 🪪 License
+Released under the **MIT License**.
+You may use, modify, and distribute the model and dataset, provided the copyright and license notice is retained.
+---
+## 🧩 Acknowledgment
+Developed by the **FPT University AI Research Group**, Hanoi, Vietnam, as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.