# EquiFashionModel

Tran Minh Khuong, Nguyen Dinh Hieu [0009-0002-6683-8036], Ngo Dinh Hoang Minh, N
**Institution:** FPT University, Hanoi, Vietnam
📧 khuongtmhe180089@fpt.edu.vn, hieundhe180318@fpt.edu.vn, minhndhhe182227@fpt.edu.vn, bachndhe173222@fpt.edu.vn, hungpd2@fe.edu.vn

---

## 🧩 Overview

**EquiFashion** is a hybrid *GAN–Diffusion* framework that reconciles the long-standing trade-off between **stylistic diversity** and **photorealistic fidelity** in generative fashion design.
It integrates a GAN-based ideation branch for creative exploration and a diffusion-based refinement branch for photorealistic rendering.

> 🎨 Try the live demo here:
> 👉 [EquiFashion Demo on Hugging Face Spaces](https://huggingface.co/spaces/NguyenDinhHieu/EquiFashion)

---

## 🎯 Motivation

Fashion design requires models that are simultaneously **creative**, **robust**, and **trustworthy**.
GANs generate diverse styles but lack stability, while diffusion models render realistic garments but constrain creativity; **EquiFashion** bridges both worlds, achieving controlled diversity, semantic alignment, and realistic garment rendering.

---

## 🧱 Architecture Overview

| Component | Description |
|-----------|-------------|
| **Semantic-Bundled Attention** | Couples adjective–noun pairs (e.g., “red collar”) for coherent attribute localization. |
| **Pose-Guided Conditioning** | Aligns garments naturally to human body structure using OpenPose keypoints. |
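
The bundling idea can be illustrated with a toy sketch. Everything below (the `ADJECTIVES` set, the function names, and the adjective-followed-by-noun heuristic) is illustrative only; the actual model would apply this coupling to text-encoder tokens inside cross-attention:

```python
import numpy as np

# Toy vocabulary of attribute adjectives; a real system would use a POS tagger.
ADJECTIVES = {"red", "long", "floral", "tied", "elegant"}

def bundle_pairs(tokens):
    """Group each adjective with the noun that follows it (e.g. 'red collar')."""
    bundles, i = [], 0
    while i < len(tokens):
        if tokens[i] in ADJECTIVES and i + 1 < len(tokens):
            bundles.append((i, i + 1))  # (adjective index, noun index)
            i += 2
        else:
            i += 1
    return bundles

def bundle_mask(tokens):
    """Boolean mask coupling bundled token pairs for attention."""
    n = len(tokens)
    mask = np.eye(n, dtype=bool)          # every token attends to itself
    for a, b in bundle_pairs(tokens):
        mask[a, b] = mask[b, a] = True    # couple adjective and noun
    return mask

tokens = "red collar with long sleeves".split()
print(bundle_pairs(tokens))  # → [(0, 1), (3, 4)]
```

Such a mask could then boost (or restrict) cross-attention between the bundled tokens so an attribute and its garment part are localized together.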

---

## 🧮 Training Configuration

| Setting | Value |
|---------|-------|
| Timesteps (T) | 8 |
| Fusion Decay (γ) | 0.7 |
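
How the fusion decay is applied is not spelled out in this card; one plausible reading, sketched here as a purely hypothetical schedule, is that the GAN branch's contribution decays as γ^t across the T = 8 timesteps while the diffusion branch takes over:

```python
GAMMA, T = 0.7, 8  # values from the table above

def fusion_weights(gamma=GAMMA, steps=T):
    """Hypothetical schedule: the GAN branch's weight decays as gamma**t,
    handing control to the diffusion branch over the timesteps."""
    return [(gamma ** t, 1 - gamma ** t) for t in range(steps)]

for t, (w_gan, w_diff) in enumerate(fusion_weights()):
    print(f"t={t}: GAN {w_gan:.3f}, diffusion {w_diff:.3f}")
```

At t = 0 the GAN branch dominates (weight 1.0); by t = 7 its weight has decayed to roughly 0.08.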

---

## 🧠 Core Equation

The total loss combines autoencoding, adversarial, semantic, and perceptual components:

\[
L_{total} = \lambda_{AE} L_{AE} + \lambda_{cons} L_{cons} + \lambda_{bundle} L_{bundle} + \lambda_{comp} L_{comp} + \lambda_G (L_G + \lambda_{MS} L_{MS}) + \lambda_{den} L_{denoise} + \lambda_{rob} L_{rob} + \lambda_{perc} L_{perc}
\]
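
Transcribed directly into code, the weighted sum looks like this (the λ values below are placeholders, since the card does not list the actual weights):

```python
def total_loss(losses, lambdas):
    """Weighted sum of the loss terms in the equation above.
    `losses` and `lambdas` map term names to scalars; the multi-scale
    term L_MS nests inside the generator weight lambda_G, as in the equation."""
    return (lambdas["AE"]     * losses["AE"]
          + lambdas["cons"]   * losses["cons"]
          + lambdas["bundle"] * losses["bundle"]
          + lambdas["comp"]   * losses["comp"]
          + lambdas["G"]      * (losses["G"] + lambdas["MS"] * losses["MS"])
          + lambdas["den"]    * losses["denoise"]
          + lambdas["rob"]    * losses["rob"]
          + lambdas["perc"]   * losses["perc"])

# Illustrative values only; the actual lambda weights are not given in this card.
ones = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "den", "rob", "perc"]}
losses = {k: 1.0 for k in ["AE", "cons", "bundle", "comp", "G", "MS", "denoise", "rob", "perc"]}
print(total_loss(losses, ones))  # → 9.0
```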

---

## 📊 Quantitative Results

| Metric | Value | Benchmark |
|--------|-------|-----------|
| Coverage ↑ | **92.8%** | – |
| Inference Time | **3.8 s / sample (512×512, A100, FP16)** | – |
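
"Coverage" is presumably the k-nearest-neighbour metric of Naeem et al. (2020): the fraction of real samples whose k-NN radius contains at least one generated sample. A sketch under that assumption:

```python
import numpy as np

def coverage(real, fake, k=5):
    """Fraction of real samples whose k-NN radius (measured among the
    real samples themselves) contains at least one generated sample."""
    d_real = np.linalg.norm(real[:, None] - real[None, :], axis=-1)
    radii = np.sort(d_real, axis=1)[:, k]   # column 0 is the self-distance
    d_rf = np.linalg.norm(real[:, None] - fake[None, :], axis=-1)
    return float((d_rf.min(axis=1) <= radii).mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(64, 8))
print(coverage(real, real.copy()))  # perfect generator: coverage = 1.0
```

In practice the metric is computed on feature embeddings of the images rather than raw pixels.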

---

## 🖼️ Visual Results

| Input Pose | Generated Outfit |
|-------------|------------------|
| *(image omitted)* | *(image omitted)* |

---

## 📦 Dataset: **EquiFashion-DB**

| Property | Description |
|----------|-------------|
| Key Feature | Noise-aware text, balanced demographics |
| Purpose | Training + robust benchmarking for generative fashion |

---

## 🚀 Usage Example

```python
# (model construction and checkpoint loading omitted in this card)
model.eval()

prompt = "long-sleeve floral dress with tied waist, elegant, 8k detail"
```
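
Besides the text prompt, the model is conditioned on OpenPose keypoints (see Architecture Overview). This card does not show how they are encoded; a common approach, sketched here with assumed defaults (normalized coordinates, Gaussian blobs, hypothetical function name), rasterizes each keypoint into a heatmap channel:

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, size=64, sigma=1.5):
    """Rasterize (x, y) keypoints in [0, 1] into per-joint Gaussian heatmaps.
    Returns an array of shape (num_keypoints, size, size)."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = np.zeros((len(keypoints), size, size), dtype=np.float32)
    for i, (x, y) in enumerate(keypoints):
        cx, cy = x * (size - 1), y * (size - 1)
        maps[i] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return maps

# Two toy keypoints (e.g. neck and left shoulder in normalized coordinates).
heat = keypoints_to_heatmaps([(0.5, 0.2), (0.35, 0.3)])
print(heat.shape)  # → (2, 64, 64)
```

The exact resolution and channel layout the model expects would be determined by `utils/configs/cldm_v2.yaml`.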

---

## 💡 Citation

If you use this model or dataset, please cite:

```bibtex
}
```

---

## 🧩 File Descriptions

| File | Description |
|------|-------------|
| `app.py` | Gradio demo UI |
| `utils/configs/cldm_v2.yaml` | Architecture configuration |

---

## 📚 References

1. Zhu et al. *Be Your Own Prada* (ICCV 2017)
9. Baldrati et al. *Multimodal Garment Designer* (ICCV 2023)
10. Rombach et al. *Latent Diffusion Models* (CVPR 2022)

---

## 🪪 License

Released under the **MIT License**.
You may use, modify, and distribute the model and dataset with attribution.

---

## 🧩 Acknowledgment

Developed by **FPT University AI Research Group**, Hanoi, Vietnam,
as part of the **EquiAI Research Suite** on fairness, robustness, and trustworthy generative AI.