Entity disambiguation with hierarchical topic models
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011
Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called the Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) all words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia's category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia's crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16% in terms of disambiguation accuracy.
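To make the core idea concrete, the following is a minimal sketch (not the authors' WPAM implementation) of semi-supervised topic-model disambiguation in the spirit of contribution (2): each catalog entity is treated as a latent topic in plain LDA with collapsed Gibbs sampling, and tokens carrying a Wikipedia-style annotation keep their entity label clamped so that labeled data biases the word-entity associations. All names and hyperparameters (ENTITIES, docs, alpha, beta) are illustrative assumptions.

```python
# Sketch: entity disambiguation as semi-supervised LDA with clamped labels.
import numpy as np

rng = np.random.default_rng(0)

ENTITIES = ["Michael_Jordan_(basketball)", "Michael_I._Jordan_(scientist)"]
K = len(ENTITIES)           # one "topic" per candidate entity
alpha, beta = 0.1, 0.01     # symmetric Dirichlet priors (illustrative values)

# Each doc is a list of (word, label) pairs; label is an entity index for
# annotated tokens and None for unannotated ones.
docs = [
    [("jordan", 0), ("dunk", None), ("bulls", None), ("nba", None)],
    [("jordan", 1), ("bayesian", None), ("inference", None), ("lda", None)],
    [("jordan", None), ("nba", None), ("machine", None), ("learning", None)],
]

vocab = sorted({w for d in docs for w, _ in d})
w2i = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Count matrices for collapsed Gibbs sampling.
ndk = np.zeros((len(docs), K))   # doc-entity counts
nkw = np.zeros((K, V))           # entity-word counts
nk = np.zeros(K)                 # entity totals
z = []                           # current assignment per token

for d, doc in enumerate(docs):
    zs = []
    for w, label in doc:
        k = label if label is not None else rng.integers(K)
        zs.append(k)
        ndk[d, k] += 1; nkw[k, w2i[w]] += 1; nk[k] += 1
    z.append(zs)

for _ in range(200):             # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, (w, label) in enumerate(doc):
            if label is not None:
                continue         # annotated tokens stay clamped to their label
            k = z[d][i]; wi = w2i[w]
            ndk[d, k] -= 1; nkw[k, wi] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, wi] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, wi] += 1; nk[k] += 1

# Disambiguate the unannotated "jordan" mention in doc 2 via its sampled label.
mention_idx = [w for w, _ in docs[2]].index("jordan")
print("doc 2 'jordan' ->", ENTITIES[z[2][mention_idx]])
```

In this toy setup the unannotated "jordan" mention co-occurs with "nba", so sampling pulls it toward the basketball entity. The full WPAM goes further than this sketch by using all Wikipedia words for the word-entity associations and by adding higher-level Pachinko Allocation nodes derived from Wikipedia's (pruned) category hierarchy to model entity co-occurrence.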