Training-free Regional Prompting for Diffusion Transformers

Chen, Anthony; Xu, Jianjin; Zheng, Wenzhao; Dai, Gaole; Wang, Yida; Zhang, Renrui; Wang, Haofan; Zhang, Shanghang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2411.02395 (cs)

[Submitted on 4 Nov 2024]


Abstract: Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt-following) ability has also been greatly improved by large language models (e.g., T5, Llama). However, existing models cannot perfectly handle long and complex text prompts, especially when the prompts contain various objects with numerous attributes and interrelated spatial relationships. While many regional prompting methods have been proposed for UNet-based models (SD1.5, SDXL), there are still no implementations based on the recent Diffusion Transformer (DiT) architecture, such as SD3 and FLUX.1. In this report, we propose and implement regional prompting for FLUX.1 based on attention manipulation, which gives DiT models fine-grained compositional text-to-image generation capability in a training-free manner. Code is available at this https URL.

Comments:	Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.02395 [cs.CV]
	(or arXiv:2411.02395v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.02395

Submission history

From: Anthony Chen [view email]
[v1] Mon, 4 Nov 2024 18:59:05 UTC (3,164 KB)
