Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History

Date

2018-01-03

Abstract

Topic modeling is often perceived as a relatively new development in the information retrieval sciences, and newer methods such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation have generated a great deal of research. However, attempts to extract topics from unstructured text using Factor Analysis techniques can be found as early as the 1960s. This paper compares the perceived coherence of topics extracted from three different datasets using Factor Analysis and Latent Dirichlet Allocation. To perform this comparison, a new extrinsic evaluation method is proposed. Results suggest that Factor Analysis can produce topics perceived by human coders as more coherent than those produced by Latent Dirichlet Allocation, warranting a revisit of a topic extraction method developed more than fifty-five years ago yet since forgotten.
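
The sketch below is a minimal illustration, not the paper's actual pipeline or its proposed evaluation method, of how the two compared approaches can be applied to the same corpus. It uses scikit-learn's LatentDirichletAllocation and FactorAnalysis; the toy documents, vectorizer settings, and topic count are assumptions made for illustration only.

    # Illustrative sketch: extracting topics from one small corpus with both
    # Latent Dirichlet Allocation and Factor Analysis (scikit-learn).
    # The corpus, preprocessing, and number of topics are assumptions, not the
    # datasets or settings used in the paper.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation, FactorAnalysis

    docs = [
        "stock market prices fell amid inflation fears",
        "the team won the championship game last night",
        "central bank raises interest rates to curb inflation",
        "star player injured before the playoff game",
    ]

    def top_terms(components, terms, n=5):
        # Return the n highest-loading terms for each extracted topic/factor.
        return [[terms[i] for i in comp.argsort()[::-1][:n]] for comp in components]

    # LDA is fit on raw term counts.
    count_vec = CountVectorizer(stop_words="english")
    counts = count_vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    print("LDA topics:", top_terms(lda.components_, count_vec.get_feature_names_out()))

    # Factor Analysis is fit on a dense, weighted term matrix (TF-IDF here).
    tfidf_vec = TfidfVectorizer(stop_words="english")
    tfidf = tfidf_vec.fit_transform(docs).toarray()
    fa = FactorAnalysis(n_components=2, random_state=0).fit(tfidf)
    print("FA factors:", top_terms(fa.components_, tfidf_vec.get_feature_names_out()))

In both cases a topic (or factor) is read off as the terms with the highest loadings on each extracted component; the paper's contribution is comparing how coherent human coders judge those term lists to be.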

Keywords

Text Mining in Big Data Analytics, Factor Analysis, Latent Dirichlet Allocation, Text Mining, Topic Modeling

Extent

9 pages

Related To

Proceedings of the 51st Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
