arxiv:2603.00483

RAISE: Requirement-Adaptive Self-Improving Evolution for Training-Free Text-to-Image Alignment

Published on Feb 28 · Submitted by Liyao Jiang on Mar 3

Abstract

Recent text-to-image (T2I) diffusion models achieve remarkable realism, yet faithful prompt-image alignment remains challenging, particularly for complex prompts with multiple objects, relations, and fine-grained attributes. Existing training-free inference-time scaling methods rely on fixed iteration budgets that cannot adapt to prompt difficulty, while reflection-tuned models require carefully curated reflection datasets and extensive joint fine-tuning of diffusion and vision-language models, often overfitting to reflection-path data and lacking transferability across models. We introduce RAISE (Requirement-Adaptive Self-Improving Evolution), a training-free, requirement-driven evolutionary framework for adaptive T2I generation. RAISE formulates image generation as a requirement-driven adaptive scaling process, evolving a population of candidates at inference time through a diverse set of refinement actions, including prompt rewriting, noise resampling, and instructional editing. Each generation is verified against a structured checklist of requirements, enabling the system to identify unsatisfied items and allocate further computation only where needed. This yields adaptive test-time scaling that aligns computational effort with semantic query complexity. On GenEval and DrawBench, RAISE attains state-of-the-art alignment (0.94 overall on GenEval) while generating 30-40% fewer samples and making 80% fewer VLM calls than prior scaling and reflection-tuned baselines, demonstrating efficient, generalizable, and model-agnostic multi-round self-improvement. Code is available at https://github.com/LiyaoJiang1998/RAISE.

AI-generated summary

RAISE is a training-free, requirement-driven evolutionary framework that adaptively improves text-to-image generation by allocating computation according to prompt complexity through iterative refinement actions.
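
Below is a minimal Python sketch of the requirement-adaptive evolutionary loop the abstract describes, written from the description alone. Every name here (`raise_loop`, `Candidate`, the injected callables) is an illustrative assumption rather than the paper's actual API, and the method's full action set (prompt rewriting, noise resampling, instructional editing) is abstracted into a single prompt-refinement step:

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Candidate:
    prompt: str                                     # possibly rewritten prompt
    seed: int                                       # noise seed used for sampling
    unmet: List[str] = field(default_factory=list)  # unsatisfied checklist items

def raise_loop(
    user_prompt: str,
    extract_requirements: Callable[[str], List[str]],  # prompt -> checklist (e.g. a VLM)
    generate: Callable[[str, int], object],            # (prompt, seed) -> image (T2I model)
    verify: Callable[[object, List[str]], List[str]],  # (image, checklist) -> unmet items
    refine_prompt: Callable[[str, List[str]], str],    # rewrite prompt toward unmet items
    population: int = 4,
    max_rounds: int = 5,
) -> Candidate:
    """Evolve a small candidate pool until some image meets every requirement.

    Easy prompts exit after one round; hard prompts receive more rounds,
    which is what makes the compute budget requirement-adaptive.
    """
    checklist = extract_requirements(user_prompt)
    pool = [Candidate(user_prompt, random.randrange(2**31)) for _ in range(population)]

    for _ in range(max_rounds):
        for cand in pool:
            image = generate(cand.prompt, cand.seed)
            cand.unmet = verify(image, checklist)
            if not cand.unmet:          # every checklist item satisfied: stop early
                return cand
        # Selection: keep the candidates with the fewest unmet requirements.
        pool.sort(key=lambda c: len(c.unmet))
        survivors = pool[: max(1, population // 2)]
        # Variation: rewrite prompts toward the unmet items and resample noise.
        children = [
            Candidate(refine_prompt(p.prompt, p.unmet), random.randrange(2**31))
            for p in survivors
        ]
        pool = survivors + children

    return min(pool, key=lambda c: len(c.unmet))
```

The early return is where the adaptive allocation happens: simple prompts typically satisfy the checklist in the first round, so extra generations and verifier calls are spent only on prompts that still have unmet items.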

Community

Paper author · Paper submitter

“RAISE: Requirement-Adaptive Self-Improving Evolution for Training-Free Text-to-Image Alignment” has been accepted to CVPR 2026! 🎉

📌 Question we address: without training on massive, carefully curated data or scaling up models, can we achieve precise prompt-image alignment purely at test time?

🚀 Introducing RAISE, an autonomous multi-agent framework that evolves and refines text-to-image generations at test time through automated discovery of user intent:

  • ✅ Multi-agent refinement: agents collaboratively discover requirements and iteratively verify and improve generations via a checklist
  • ✅ Requirement-adaptive compute: spend more effort only when the prompt is challenging, stop when requirements are met
  • ✅ Plug-and-play: works as an add-on that plugs directly into existing generation pipelines (see the sketch after this list)
  • ✅ Training-free: no extra training or data curation
  • ✅ Model-agnostic: compatible with different diffusion models / VLM backbones
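
To make the plug-and-play point concrete, here is a toy wiring of the `raise_loop` sketch above to stand-in backends. Every function body below is a hypothetical placeholder, not the authors' code; in a real setup these would call an actual diffusion pipeline, a VLM-based verifier, and an LLM prompt rewriter:

```python
def extract_requirements(prompt):
    # Placeholder: treat comma-separated phrases as checklist items.
    return [part.strip() for part in prompt.split(",") if part.strip()]

def generate(prompt, seed):
    # Placeholder: a real backend would return an actual image.
    return {"prompt": prompt, "seed": seed}

def verify(image, checklist):
    # Placeholder: a real verifier would run VQA per checklist item.
    return [item for item in checklist if item not in image["prompt"]]

def refine_prompt(prompt, unmet):
    # Placeholder: a real refiner would use an LLM to rewrite the prompt.
    return prompt + ". Be sure to include: " + "; ".join(unmet)

best = raise_loop(
    "a red cube, a blue sphere, a wooden table",
    extract_requirements, generate, verify, refine_prompt,
)
print(best.prompt, "unmet:", len(best.unmet))
```

Because the loop only touches these four callables, swapping in a different diffusion model or VLM backbone changes the wiring, not the loop, which is the sense in which such an approach can be model-agnostic.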

📊 Results: strong prompt-image alignment gains on standard benchmarks, with far greater efficiency than fixed-budget or sequential-reflection pipelines

🔗 Links: Paper: https://arxiv.org/abs/2603.00483 · Code: https://github.com/LiyaoJiang1998/RAISE

