TaleCrafter: Interactive Story Visualization with Multiple Characters

Gong, Yuan; Pang, Youxin; Cun, Xiaodong; Xia, Menghan; He, Yingqing; Chen, Haoxin; Wang, Longyue; Zhang, Yong; Wang, Xintao; Shan, Ying; Yang, Yujiu


arXiv:2305.18247 (cs)

[Submitted on 29 May 2023 (v1), last revised 30 May 2023 (this version, v2)]

Abstract: Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models, trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into detailed prompts required for subsequent stages. Next, T2L generates diverse and reasonable layouts based on the prompts, offering users the ability to adjust and refine the layout to their preference. The core component, C-T2I, enables the creation of images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization process by animating the generated images. Extensive experiments and a user study are conducted to validate the effectiveness and flexibility of interactive editing of the proposed system.
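
The data flow through the four stages named in the abstract (S2P, T2L, C-T2I, I2V) can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: every function, class, and parameter name here is hypothetical, and each stage is a trivial stand-in (sentence splitting, equal-width boxes, a flag) where the paper relies on large language, T2I, and animation models. The optional `edit_layout` callback is an assumed way to represent the interactive layout adjustment the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (x0, y0, x1, y1), normalized to [0, 1]; a hypothetical box format.
Box = Tuple[float, float, float, float]


@dataclass
class Layout:
    boxes: Dict[str, Box]  # character name -> bounding box


@dataclass
class Frame:
    prompt: str
    layout: Layout
    identities: List[str]  # characters whose identity should stay consistent
    animated: bool = False


def story_to_prompts(story: str) -> List[str]:
    """S2P stand-in: one prompt per sentence (assumption: no LLM is called)."""
    return [s.strip() for s in story.split(".") if s.strip()]


def text_to_layout(prompt: str, characters: List[str]) -> Layout:
    """T2L stand-in: give each mentioned character an equal-width column."""
    present = [c for c in characters if c in prompt]
    n = max(len(present), 1)
    return Layout({c: (i / n, 0.25, (i + 1) / n, 0.75)
                   for i, c in enumerate(present)})


def controllable_t2i(prompt: str, layout: Layout) -> Frame:
    """C-T2I stand-in: record the conditioning instead of rendering pixels."""
    return Frame(prompt=prompt, layout=layout, identities=sorted(layout.boxes))


def image_to_video(frame: Frame) -> Frame:
    """I2V stand-in: mark the frame as animated."""
    frame.animated = True
    return frame


def visualize_story(
    story: str,
    characters: List[str],
    edit_layout: Optional[Callable[[Layout], Layout]] = None,
) -> List[Frame]:
    """Run S2P -> T2L -> (optional user edit) -> C-T2I -> I2V per prompt."""
    frames = []
    for prompt in story_to_prompts(story):
        layout = text_to_layout(prompt, characters)
        if edit_layout is not None:
            layout = edit_layout(layout)  # interactive refinement hook
        frames.append(image_to_video(controllable_t2i(prompt, layout)))
    return frames
```

Calling `visualize_story("Ann meets Bob. Bob waves", ["Ann", "Bob"])` in this sketch yields one `Frame` per sentence. Passing an `edit_layout` function shows where a user's layout revision would enter before image generation.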

Comments:	GitHub repository: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.18247 [cs.CV]
	(or arXiv:2305.18247v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.18247

Submission history

From: Yuan Gong
[v1] Mon, 29 May 2023 17:11:39 UTC (9,947 KB)
[v2] Tue, 30 May 2023 08:54:42 UTC (9,948 KB)
