
EditCLIP: Representation Learning for Image Editing

Paper Β· Project Page Β· GitHub Β· ICCV 2025

πŸ’‘ Abstract

We introduce EditCLIP, a novel representation-learning approach for image editing. Our method learns a unified representation of edits by jointly encoding an input image and its edited counterpart, effectively capturing their transformation. To evaluate its effectiveness, we employ EditCLIP to solve two tasks: exemplar-based image editing and automated edit evaluation. In exemplar-based image editing, we replace text-based instructions in InstructPix2Pix with EditCLIP embeddings computed from a reference exemplar image pair. Experiments demonstrate that our approach outperforms state-of-the-art methods while being more efficient and versatile. For automated evaluation, EditCLIP assesses image edits by measuring the similarity between the EditCLIP embedding of a given image pair and either a textual editing instruction or the EditCLIP embedding of another reference image pair. Experiments show that EditCLIP aligns more closely with human judgments than existing CLIP-based metrics, providing a reliable measure of edit quality and structural preservation.
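
As a rough illustration of the automated-evaluation setup, the sketch below scores an edit by cosine similarity in EditCLIP's embedding space. The pair and text encoders are only named in comments (`encode_edit` and `encode_text` are placeholder names; the actual loading and encoding API lives in the GitHub repo); only the similarity scoring itself follows the description above.

```python
import torch
import torch.nn.functional as F

# Assumed (hypothetical) interface from the EditCLIP codebase:
#   encode_edit(src_img, edited_img) -> Tensor[d]  # joint embedding of an (input, edited) image pair
#   encode_text(instruction)         -> Tensor[d]  # text embedding in the same space
# The real functions are defined in the official repository; the names above are placeholders.

def cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    # Normalize both embeddings and take their dot product.
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    return float((a * b).sum(-1))

def score_edit_vs_text(edit_emb: torch.Tensor, text_emb: torch.Tensor) -> float:
    """Similarity between an applied edit (pair embedding) and a textual editing instruction."""
    return cosine(edit_emb, text_emb)

def score_edit_vs_exemplar(edit_emb: torch.Tensor, exemplar_emb: torch.Tensor) -> float:
    """Similarity between an applied edit and the edit shown by a reference exemplar pair."""
    return cosine(edit_emb, exemplar_emb)
```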

πŸ“Š Benchmark

We evaluate EditCLIP using Top-Bench-X, a benchmark for image editing evaluation.

🌟 Citation

```bibtex
@inproceedings{wang2025editclip,
  title={EditCLIP: Representation Learning for Image Editing},
  author={Wang, Qian and Cveji{\'c}, Aleksandar and Eldesokey, Abdelrahman and Wonka, Peter},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15960--15970},
  year={2025}
}
```
Model size: 0.4B parameters Β· Tensor type: F32 Β· Format: Safetensors
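
The raw checkpoint can be pulled like any other Hub repository; a minimal sketch (the exact file layout inside the repo may differ):

```python
from huggingface_hub import snapshot_download

# Download the full QWW/EditCLIP repository (Safetensors weights included)
# and return the path of the local directory containing the files.
local_dir = snapshot_download(repo_id="QWW/EditCLIP")
print(local_dir)
```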
