NguyenDinhHieu committed on
Commit
6d766bc
·
verified ·
1 Parent(s): b043067

EquiFashionModel

Files changed (1)
  1. README.md +189 -0
README.md ADDED
@@ -0,0 +1,189 @@
---
license: mit
tags:
- diffusion
- gan
- hybrid
- fashion
- multimodal
- controlnet
- pose-guided
- pytorch
library_name: pytorch_lightning
pipeline_tag: text-to-image
language:
- en
spaces:
- NguyenDinhHieu/EquiFashion
---

# 👗 EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation

**Authors:**
Tran Minh Khuong, Nguyen Dinh Hieu [0009-0002-6683-8036], Ngo Dinh Hoang Minh, Nguyen Dinh Bach, Phan Duy Hung [0000-0002-6033-6484]
**Institution:** FPT University, Hanoi, Vietnam
📧 khuongtmhe180089@fpt.edu.vn, hieundhe180318@fpt.edu.vn, minhndhhe182227@fpt.edu.vn, bachndhe173222@fpt.edu.vn, hungpd2@fe.edu.vn

---

## 🧩 Overview

**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for faithful reconstruction, enabling high-quality, diverse, and robust fashion image generation.

> 🎨 Try the live demo here:
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)

---

## 🎯 Motivation

Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**.
GANs generate diverse styles but lack stability, whereas diffusion models produce realistic images but constrain creativity. **EquiFashion** bridges the two, achieving controlled diversity, semantic alignment, and realistic garment rendering.

---

## 🧱 Architecture Overview

| Component | Description |
|------------|-------------|
| **Latent Diffusion Backbone** | Operates in latent space for efficient denoising with high-resolution reconstruction. |
| **GAN Ideation Module** | Explores stylistic variations through stochastic latent sampling. |
| **Structural Semantic Consensus** | Ensures linguistic–visual correspondence between attributes and garment parts. |
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
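
To make the bundling idea in the table concrete, the sketch below pairs each adjective in a prompt with the noun that directly follows it, so that "red collar" is treated as one unit. It is a toy illustration only: the word lists `ADJECTIVES` and `NOUNS` and the function `bundle_pairs` are hypothetical, and the actual module presumably works on attention maps rather than on raw strings.

```python
# Toy illustration of adjective-noun bundling; not the model's actual module.
# ADJECTIVES, NOUNS, and bundle_pairs are hypothetical names for this sketch.
ADJECTIVES = {"red", "long", "floral", "blue"}
NOUNS = {"collar", "sleeve", "dress", "waist"}


def bundle_pairs(prompt: str) -> list[tuple[str, str]]:
    """Return (adjective, noun) pairs for adjectives directly followed by a noun."""
    tokens = prompt.lower().replace(",", " ").split()
    pairs = []
    # Walk over adjacent token pairs and keep adjective -> noun neighbours.
    for left, right in zip(tokens, tokens[1:]):
        if left in ADJECTIVES and right in NOUNS:
            pairs.append((left, right))
    return pairs


print(bundle_pairs("red collar, long sleeve dress"))
```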

---

## 🧮 Training Configuration

| Setting | Value |
|----------|-------|
| Framework | PyTorch Lightning 2.2 |
| GPU | NVIDIA A100 (40 GB, CUDA 12.2) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 (G), 1e-4 (D) |
| Scheduler | Cosine Decay |
| Epochs | 400 (200 pretrain + 200 joint) |
| Precision | FP16 |
| Batch Size | 32 |
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
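
As a rough illustration of the learning-rate settings above, the sketch below evaluates a cosine decay from the two base rates in the table over the 400 epochs. The exact schedule is not specified here, so this assumes the common half-cosine form that decays to zero with no warmup; `cosine_lr` is a hypothetical helper, not part of the released code.

```python
import math

# Values taken from the table above.
BASE_LR_G = 2e-4
BASE_LR_D = 1e-4
TOTAL_EPOCHS = 400


def cosine_lr(base_lr: float, epoch: int, total_epochs: int = TOTAL_EPOCHS) -> float:
    """Half-cosine decay from base_lr (epoch 0) to 0 (epoch == total_epochs).

    Assumption: no warmup and a final learning rate of zero.
    """
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the G and D learning rates at the start, midpoint, and end of training.
for epoch in (0, 200, 400):
    print(epoch, cosine_lr(BASE_LR_G, epoch), cosine_lr(BASE_LR_D, epoch))
```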

---

## 🧠 Core Equation

The total loss is a weighted sum of autoencoding, semantic, adversarial, denoising, robustness, and perceptual terms:

\[
L_{\mathrm{total}} = \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} + \lambda_{G} \left( L_{G} + \lambda_{MS} L_{MS} \right) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
\]
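
The weighted sum above can be sketched in a few lines. The loss values and weights below are placeholders chosen for illustration, not the values used in training, and `total_loss` is a hypothetical helper. The only structure taken from the equation is that `L_MS` is scaled by `λ_MS` inside the adversarial group before `λ_G` is applied, while every other term is a plain weighted loss.

```python
# Sketch of the total-loss equation; all numbers are placeholders.
def total_loss(losses: dict[str, float], weights: dict[str, float]) -> float:
    """Combine the individual loss terms as in the total-loss equation."""
    # Adversarial group: lambda_G * (L_G + lambda_MS * L_MS)
    adversarial = weights["G"] * (losses["G"] + weights["MS"] * losses["MS"])
    # Remaining terms are plain weighted losses (key "denoise" covers lambda_den * L_denoise).
    simple_terms = ("AE", "cons", "bundle", "comp", "denoise", "rob", "perc")
    return adversarial + sum(weights[name] * losses[name] for name in simple_terms)


terms = ("AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc")
example_losses = {name: 1.0 for name in terms}   # placeholder loss values
example_weights = {name: 0.5 for name in terms}  # placeholder weights
print(total_loss(example_losses, example_weights))
```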

---

## 📊 Quantitative Results

| Metric | Value | Benchmark |
|---------|--------|------------|
| FID ↓ | **14.7** | FashionAI subset |
| IS ↑ | **4.23** | – |
| CLIP-S ↑ | **0.282** | – |
| Coverage ↑ | **92.8%** | – |
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |

---

## 🖼️ Visual Results

| Input Pose | Generated Outfit |
|-------------|------------------|
| ![Input pose](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusion_pose.png) | ![Generated outfit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/fashion_diffusion.png) |

---

## 📦 Dataset: **EquiFashion-DB**

| Property | Description |
|-----------|--------------|
| Scale | 350 K images |
| Resolution | 512×512 |
| Modalities | Image, Text, Sketch, Pose, Fabric |
| Coverage | 40+ apparel categories |
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |

---

## 🚀 Usage Example

```python
import torch
from huggingface_hub import hf_hub_download

# `cldm` is assumed to be importable from the project's source code.
from cldm.model import create_model, load_state_dict

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hugging Face Hub
ckpt = hf_hub_download("NguyenDinhHieu/EquiFashionModel", filename="hfd_100epochs.ckpt")

# Build the model from its config, then load the weights
model = create_model("utils/configs/cldm_v2.yaml").to(device)
model.load_state_dict(load_state_dict(ckpt, location=device))
model.eval()

# Example text prompt for generation
prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
```

---

## 💡 Citation

If you use this model or dataset, please cite:

```bibtex
@inproceedings{nguyen2025equifashion,
  title={EquiFashion: Hybrid GAN–Diffusion Balancing Diversity–Fidelity for Fashion Design Generation},
  author={Tran Minh Khuong and Nguyen Dinh Hieu and Ngo Dinh Hoang Minh and Nguyen Dinh Bach and Phan Duy Hung},
  booktitle={Proceedings of the ..... Conference},
  year={2025},
  organization={FPT University, Hanoi}
}
```

---

## 🧩 File Descriptions

| File | Description |
|------|--------------|
| `hfd_100epochs.ckpt` | Main diffusion model checkpoint |
| `body_pose_model.pth`, `hand_pose_model.pth` | OpenPose keypoint weights |
| `open_clip_pytorch_model.bin` | Pretrained OpenCLIP text encoder |
| `app.py` | Gradio demo UI |
| `utils/configs/cldm_v2.yaml` | Architecture configuration |

---

## 🪪 License
Released under the **MIT License**.
You may use, modify, and distribute the model and dataset with attribution.

---

## 🧩 Acknowledgment
Developed by **FPT University AI Research Group**, Hanoi, Vietnam
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.