Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History
Date
2018-01-03
Authors
Abstract
Topic modeling is often perceived as a relatively new development in information retrieval, and newer methods such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation have generated considerable research. However, attempts to extract topics from unstructured text using Factor Analysis techniques date back to the 1960s. This paper compares the perceived coherence of topics extracted from three different datasets using Factor Analysis and Latent Dirichlet Allocation. To perform this comparison, a new extrinsic evaluation method is proposed. Results suggest that Factor Analysis can produce topics that human coders perceive as more coherent than those from Latent Dirichlet Allocation, warranting a revisit of a topic extraction method developed more than fifty-five years ago and since forgotten.
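As a rough illustration of the two techniques compared in the abstract, the sketch below extracts topics from a toy corpus with both Latent Dirichlet Allocation and Factor Analysis using scikit-learn. The corpus, component counts, vectorization choices, and top-word listing are illustrative assumptions only; they are not the datasets, software, or evaluation protocol used in the paper.

    # Hypothetical sketch: topic extraction with LDA vs. Factor Analysis.
    # Everything here (corpus, 2 components, TF-IDF for FA) is an assumption
    # for illustration, not the authors' method.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation, FactorAnalysis

    docs = [
        "stocks markets trading prices investors",
        "investors markets shares prices funds",
        "genes proteins cells biology dna",
        "dna cells proteins sequencing biology",
    ]

    def top_words(components, terms, n=4):
        # Return the n highest-weighted terms for each component row.
        return [[terms[i] for i in row.argsort()[::-1][:n]] for row in components]

    # LDA is fit on raw term counts.
    counts = CountVectorizer()
    X_counts = counts.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_counts)
    print("LDA topics:", top_words(lda.components_, counts.get_feature_names_out()))

    # Factor Analysis is applied to a dense TF-IDF matrix; absolute loadings
    # are used to rank terms because factor loadings can be negative.
    tfidf = TfidfVectorizer()
    X_tfidf = tfidf.fit_transform(docs).toarray()
    fa = FactorAnalysis(n_components=2, random_state=0).fit(X_tfidf)
    print("FA factors:", top_words(np.abs(fa.components_), tfidf.get_feature_names_out()))

Pairing LDA with raw counts and Factor Analysis with TF-IDF is one common way to give each method an appropriate document-term representation; it is a design choice of this sketch, not a claim about the paper's setup.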
Keywords
Text Mining in Big Data Analytics, Factor Analysis, Latent Dirichlet Allocation, Text Mining, Topic Modeling
Extent
9 pages
Related To
Proceedings of the 51st Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International