---
license: apache-2.0
language:
- en
tags:
- multimodal
- image-restoration
- unified-model
- BAGEL
- VLM
pipeline_tag: image-text-to-text
---

# CLEAR: Unlocking Generative Potential for Degraded Image Understanding

CLEAR is a unified multimodal model that leverages generative capabilities (image restoration) to improve visual understanding of degraded images. It introduces an **interleaved reasoning** paradigm where the model adaptively decides whether to invoke image restoration before answering.
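The adaptive decision described above can be read as a simple control flow. First, decide whether the input needs restoration. If it does, generate a restored image. Then answer from whichever image is current. The sketch below is a hypothetical simplification, not CLEAR's implementation or API. In CLEAR the model itself makes this decision as part of its interleaved reasoning. Here, `needs_restoration`, `restore`, and `answer` are plain callables, and the toy stubs (`is_degraded`, `deblur`, `read_sign`, and a dict carrying a `blur` score) are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    """One entry in the reasoning trace (hypothetical structure)."""
    kind: str    # "decide", "restore", or "answer"
    detail: str


def interleaved_answer(
    image: Any,
    question: str,
    needs_restoration: Callable[[Any, str], bool],
    restore: Callable[[Any], Any],
    answer: Callable[[Any, str], str],
) -> Tuple[str, List[Step]]:
    """Decide whether to restore, optionally restore, then answer."""
    trace: List[Step] = []
    if needs_restoration(image, question):
        trace.append(Step("decide", "restore"))
        image = restore(image)  # generative step: replace the input with a restored view
        trace.append(Step("restore", "restored image generated"))
    else:
        trace.append(Step("decide", "skip restoration"))
    result = answer(image, question)  # understanding step on the (possibly restored) image
    trace.append(Step("answer", result))
    return result, trace


# Toy stubs standing in for the model's own components (hypothetical).
def is_degraded(img: Any, question: str) -> bool:
    return img["blur"] > 0.5


def deblur(img: Any) -> Any:
    return {**img, "blur": 0.0}


def read_sign(img: Any, question: str) -> str:
    return "STOP" if img["blur"] < 0.5 else "unreadable"


if __name__ == "__main__":
    answer_text, steps = interleaved_answer(
        {"blur": 0.8}, "What does the sign say?", is_degraded, deblur, read_sign
    )
    print(answer_text)              # STOP
    print([s.kind for s in steps])  # ['decide', 'restore', 'answer']
```

Passing the three stages as callables keeps the branching logic separate from any particular backend. Running the real model would require the inference code from the linked repository.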

> [[Paper]](https://arxiv.org/abs/2604.04780) | [[Code]](https://github.com/haoxiangzhao12138/CLEAR) | [[Project Page]](https://haoxiangzhao12138.github.io/CLEAR/) | [[MMD-Bench]](https://huggingface.co/datasets/CUDAOUTOFMEMORY/MMD-Bench)

## Citation

```bibtex
@misc{hao2026clearunlockinggenerativepotential,
      title={CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models},
      author={Xiangzhao Hao and Zefeng Zhang and Zhenyu Zhang and Linhao Yu and Yao Chen and Yiqian Zhang and Haiyun Guo and Shuohuan Wang and Yu Sun},
      year={2026},
      eprint={2604.04780},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.04780},
}
```