arXiv:2602.02437

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Published on Feb 2 · Submitted by taesiri on Feb 3

Abstract

AI-generated summary: UniReason integrates text-to-image generation and image editing through a dual reasoning paradigm that enhances planning with world knowledge and uses editing for visual refinement, achieving superior performance on reasoning-intensive benchmarks.

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, typically treating text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps. To address this, we propose UniReason, a unified framework that harmonizes the two tasks through a dual reasoning paradigm. We formulate generation as world-knowledge-enhanced planning that injects implicit constraints, and we leverage editing for fine-grained visual refinement, correcting visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement. To support the framework, we systematically construct a large-scale reasoning-centric dataset (~300k samples) covering five major knowledge domains (e.g., cultural commonsense and physics) for planning, alongside an agent-generated corpus for visual self-correction. Extensive experiments demonstrate that UniReason achieves strong performance on reasoning-intensive benchmarks such as WISE, KrisBench, and UniREditBench while maintaining superior general synthesis capabilities.
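
Read as an algorithm, the abstract describes a control loop: plan with world knowledge, generate, then repeatedly reflect and edit within the same unified model. The Python sketch below is only an illustration of that loop under assumed interfaces; `Plan`, `Critique`, and the `plan`/`generate`/`reflect`/`edit` methods are hypothetical placeholder names, not UniReason's actual API.

```python
# A minimal sketch of the plan-then-refine loop described in the abstract.
# Every name here (Plan, Critique, plan/generate/reflect/edit) is a
# hypothetical placeholder, not the authors' actual interface.

from dataclasses import dataclass, field


@dataclass
class Plan:
    """World-knowledge-enhanced plan: the prompt plus explicit constraints."""
    prompt: str
    constraints: list[str] = field(default_factory=list)  # e.g. physics, cultural commonsense


@dataclass
class Critique:
    """Self-reflection result: acceptance flag plus edit instructions if rejected."""
    is_satisfactory: bool
    instructions: str = ""


def generate_with_dual_reasoning(model, prompt: str, max_refinements: int = 3):
    """Generate an image via planning, then refine it via self-reflective editing."""
    # Stage 1: planning. The unified model expands the prompt with implicit
    # world-knowledge constraints before any image is synthesized.
    plan: Plan = model.plan(prompt)

    # Stage 2: initial text-to-image generation conditioned on the plan.
    image = model.generate(plan)

    # Stage 3: self-reflection loop. The same model critiques its own output
    # and uses its editing capability to correct fine-grained visual errors.
    for _ in range(max_refinements):
        critique: Critique = model.reflect(plan, image)
        if critique.is_satisfactory:
            break
        image = model.edit(image, critique.instructions)

    return image
```

The point the sketch tries to capture is that generation and editing are calls on one shared model rather than two separate systems, which is what the abstract means by unifying them within a shared representation.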
