---
license: apache-2.0
language:
- en
tags:
- multimodal
- image-restoration
- unified-model
- BAGEL
- VLM
pipeline_tag: image-text-to-text
---

# CLEAR: Unlocking Generative Potential for Degraded Image Understanding

CLEAR is a unified multimodal model that leverages generative capabilities (image restoration) to improve visual understanding of degraded images. It introduces an **interleaved reasoning** paradigm where the model adaptively decides whether to invoke image restoration before answering.

> [[Paper]](https://arxiv.org/abs/2604.04780) | [[Code]](https://github.com/haoxiangzhao12138/CLEAR) | [[Project Page]](https://haoxiangzhao12138.github.io/CLEAR/) | [[MMD-Bench]](https://huggingface.co/datasets/CUDAOUTOFMEMORY/MMD-Bench)

## Citation

```bibtex
@misc{hao2026clearunlockinggenerativepotential,
  title={CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models},
  author={Xiangzhao Hao and Zefeng Zhang and Zhenyu Zhang and Linhao Yu and Yao Chen and Yiqian Zhang and Haiyun Guo and Shuohuan Wang and Yu Sun},
  year={2026},
  eprint={2604.04780},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.04780},
}
```