DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Peng, Yi-Hao; Huq, Faria; Jiang, Yue; Wu, Jason; Li, Amanda Xin Yue; Bigham, Jeffrey; Pavel, Amy

Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.00201 (cs)

[Submitted on 30 Sep 2024]

Abstract: Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
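The abstract's central idea is that code which generates a structured visual already knows where each element sits, so its labels come for free. The following is a toy sketch of that idea only. It assumes a hand-written random layout generator in place of the paper's code generation. The canvas size, the label set (`title`, `text`, `image`), and the names `generate_slide`, `render_svg`, and `make_example` are all hypothetical.

```python
import random
from dataclasses import dataclass
from html import escape

# Hypothetical canvas size (a 16:9 slide); not taken from the paper.
CANVAS_W, CANVAS_H = 1280, 720


@dataclass
class Element:
    kind: str  # hypothetical label set: "title", "text", "image"
    x: int
    y: int
    w: int
    h: int
    content: str = ""


def generate_slide(seed: int) -> list[Element]:
    """Compose a random slide layout; each element doubles as a label."""
    rng = random.Random(seed)
    elements = [Element("title", 60, 40, CANVAS_W - 120, 80, f"Slide {seed}")]
    # Optionally reserve the right half of the body area for an image.
    has_image = rng.random() < 0.5
    text_w = (CANVAS_W - 180) // 2 if has_image else CANVAS_W - 120
    for i in range(rng.randint(1, 4)):
        elements.append(Element("text", 60, 160 + i * 120, text_w, 90, f"Bullet {i + 1}"))
    if has_image:
        elements.append(Element("image", 120 + text_w, 160, text_w, 460))
    return elements


def render_svg(elements: list[Element]) -> str:
    """Emit the layout as SVG markup (the 'code' that draws the visual)."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{CANVAS_W}" height="{CANVAS_H}">']
    for e in elements:
        if e.kind == "image":
            # Grey placeholder rectangle standing in for an image.
            parts.append(f'<rect x="{e.x}" y="{e.y}" width="{e.w}" height="{e.h}" fill="#ccc"/>')
        else:
            size = 48 if e.kind == "title" else 28
            # SVG text is positioned by its baseline, so offset by the font size.
            parts.append(f'<text x="{e.x}" y="{e.y + size}" font-size="{size}">{escape(e.content)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


def make_example(seed: int) -> tuple[str, list[dict]]:
    """Return (markup, labels); labels are the allocated boxes, not measured ones."""
    elements = generate_slide(seed)
    labels = [{"label": e.kind, "bbox": [e.x, e.y, e.w, e.h]} for e in elements]
    return render_svg(elements), labels


if __name__ == "__main__":
    svg, labels = make_example(seed=7)
    for item in labels:
        print(item)
```

In this sketch the bounding boxes are the regions the generator allocated, not boxes measured from a rendered image. For text, they are therefore only approximate. A pipeline that renders the generated code, for example in a browser, could instead read exact element boxes back from the renderer.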

Comments:	ECCV 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2410.00201 [cs.CV]
	(or arXiv:2410.00201v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.00201

Submission history

From: Yi-Hao Peng
[v1] Mon, 30 Sep 2024 19:55:54 UTC (5,906 KB)