Using Variational Inference and MapReduce to Scale Topic Modeling

Zhai, Ke; Boyd-Graber, Jordan; Asadi, Nima

arXiv:1107.3765 (cs)

[Submitted on 19 Jul 2011]

Abstract: Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference for LDA. In this paper, we propose a technique called MapReduce LDA (Mr. LDA) to accommodate very large corpora in the MapReduce framework. In contrast to other techniques that scale inference for LDA, which use Gibbs sampling, we use variational inference. Our solution efficiently distributes computation and is relatively simple to implement. More importantly, this variational implementation, unlike highly tuned and specialized implementations, is easily extensible. We demonstrate two extensions of the model made possible by this scalable framework: informed priors to guide topic discovery, and modeling topics from a multilingual corpus.

Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:1107.3765 [cs.AI] (or arXiv:1107.3765v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.1107.3765

Submission history

From: Jordan Boyd-Graber
[v1] Tue, 19 Jul 2011 16:32:22 UTC (332 KB)
