Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

E Reif, C Qian, J Wexler, M Kahng - … Abstracts of the CHI Conference on …, 2024 - dl.acm.org
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024 - dl.acm.org
Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data practitioners often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data practitioners often run custom analyses for each dataset, which is cumbersome and difficult, or use unsupervised methods. We present AutoHistograms, a visualization tool leveraging LLMs. AutoHistograms automatically identifies relevant entity-based features, visualizes them, and allows the user to interactively query the dataset for new categories of entities. In a user study with data practitioners (n=10), we observe that participants were able to quickly onboard to AutoHistograms, use the tool to identify actionable insights, and conceptualize a broad range of applicable use cases. Together, this tool and user study contribute to the growing field of LLM-assisted sensemaking tools.