SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

Shen, Guibao; Wang, Luozhou; Lin, Jiantao; Ge, Wenhang; Zhang, Chaozhe; Tao, Xin; Zhang, Yuan; Wan, Pengfei; Wang, Zhongyuan; Chen, Guangyong; Li, Yijun; Chen, Ying-Cong

arXiv:2405.15321 (cs)

[Submitted on 24 May 2024]

Abstract: Recent advancements in text-to-image generation have been propelled by the development of diffusion models and multi-modality learning. However, because text is typically represented sequentially in these models, it often falls short in providing accurate contextualization and structural control. As a result, the generated images do not consistently align with human expectations, especially in complex scenarios involving multiple objects and relationships. In this paper, we introduce the Scene Graph Adapter (SG-Adapter), which leverages the structured representation of scene graphs to rectify inaccuracies in the original text embeddings. The SG-Adapter's explicit, non-fully connected graph representation improves on the fully connected, transformer-based text representations, most notably by maintaining precise correspondence in scenarios involving multiple relationships. To address the challenges posed by low-quality annotated datasets such as Visual Genome, we have manually curated MultiRels, a highly clean, multi-relational dataset of paired scene graphs and images. Furthermore, we design three metrics derived from GPT-4V to measure the correspondence between images and scene graphs effectively and thoroughly. Both qualitative and quantitative results validate the efficacy of our approach in controlling the correspondence in multiple relationships.
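
To make the contrast between a fully connected text representation and a non-fully connected scene graph concrete, here is a minimal sketch. It is an illustration only, not the SG-Adapter implementation: the function name `triplet_token_mask`, the whitespace tokenization, and the block-diagonal mask are all assumptions chosen for clarity. It treats a scene graph as a list of (subject, relation, object) triplets and marks which token pairs belong to the same triplet.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def triplet_token_mask(triplets: List[Triplet]) -> Tuple[List[str], List[List[int]]]:
    """Flatten triplets into tokens and build a 0/1 same-triplet mask.

    Hypothetical helper: mask[i][j] is 1 only when tokens i and j come from
    the same triplet, so tokens of different relationships are not connected.
    """
    tokens: List[str] = []
    owner: List[int] = []  # index of the triplet each token came from
    for idx, triplet in enumerate(triplets):
        for phrase in triplet:
            for word in phrase.split():  # naive whitespace tokenization (assumption)
                tokens.append(word)
                owner.append(idx)
    n = len(tokens)
    mask = [[1 if owner[i] == owner[j] else 0 for j in range(n)] for i in range(n)]
    return tokens, mask


# Two relationships in one scene.
triplets = [("man", "riding", "horse"), ("woman", "holding", "umbrella")]
tokens, mask = triplet_token_mask(triplets)
for token, row in zip(tokens, mask):
    print(f"{token:>9} {row}")
```

In a fully connected representation every entry of the mask would be 1, so "riding" could be associated with "woman" as easily as with "man". In this sketch the mask is block-diagonal, which keeps each relation tied to its own subject and object.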

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.15321 [cs.CV]
	(or arXiv:2405.15321v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.15321

Submission history

From: Guibao Shen
[v1] Fri, 24 May 2024 08:00:46 UTC (48,712 KB)
