Semantic relatedness measurement based on Wikipedia link co‐occurrence analysis
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 5 April 2011
Abstract
Purpose
Recently, several studies have shown the importance and effectiveness of Wikipedia mining. One popular research area within Wikipedia mining is semantic relatedness measurement, and work in this area has shown that Wikipedia can be used for this purpose. However, previous methods face two problems: accuracy and scalability. To solve these problems, this paper proposes an efficient semantic relatedness measurement method that leverages global statistical information from Wikipedia. Furthermore, a new test collection based on Wikipedia concepts is constructed for evaluating semantic relatedness measurement methods.
Design/methodology/approach
The authors' approach leverages global statistical information from the whole of Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co‐occurrences of link pairs across all Wikipedia articles. In Wikipedia, an article represents a concept, and a link to another article represents a semantic relation between the two concepts. Thus, the co‐occurrence of a link pair indicates the relatedness of the corresponding concept pair. Furthermore, the authors propose an improved method that integrates tf-idf to additionally leverage local information within an article. In addition, to construct a new test collection, the authors select a large number of concepts from Wikipedia, and the relatedness of these concepts is judged by human test subjects.
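The link co‐occurrence idea described above can be illustrated with a minimal sketch. The abstract does not give the authors' scoring formula, so the cosine-style normalization used here, the function names `cooccurrence_counts` and `relatedness`, and the toy article data are all assumptions made for illustration only; the tf-idf integration is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(articles):
    """Count, over all articles, how often each link and each link pair occurs.

    `articles` maps an article title to the link targets found in it
    (hypothetical input format). Each link is counted once per article.
    """
    pair_counts = Counter()
    link_counts = Counter()
    for links in articles.values():
        unique = sorted(set(links))  # sort so each pair has one canonical order
        link_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, link_counts


def relatedness(a, b, pair_counts, link_counts):
    """Cosine-style score in [0, 1]; an assumed normalization, not the paper's."""
    if a == b:
        return 1.0
    pair = (a, b) if a < b else (b, a)
    denom = sqrt(link_counts[a] * link_counts[b])
    return pair_counts[pair] / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: four articles and the concepts they link to.
    articles = {
        "Article1": ["Tokyo", "Japan", "Sushi"],
        "Article2": ["Tokyo", "Japan"],
        "Article3": ["Japan", "Sushi"],
        "Article4": ["Tokyo", "Paris"],
    }
    pair_counts, link_counts = cooccurrence_counts(articles)
    print(relatedness("Tokyo", "Japan", pair_counts, link_counts))
    print(relatedness("Paris", "Japan", pair_counts, link_counts))
```

In this sketch, concepts whose links never appear together in any article score 0, and concepts whose links always appear together score 1. Because the counts are gathered in a single pass over all articles, a pairwise score afterwards needs only a lookup, which reflects the kind of low calculation cost the approach aims for.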
Findings
An experiment was conducted to evaluate the calculation cost and accuracy of each method. The experimental results show that the calculation cost of the proposed approach is very low compared with that of one of the previous methods, and that the approach is more accurate than all previous methods for computing semantic relatedness.
Originality/value
This is the first proposal of co‐occurrence analysis of Wikipedia links for semantic relatedness measurement. The authors show that this approach measures semantic relatedness among concepts effectively in terms of both calculation cost and accuracy. The findings may be useful to researchers interested in knowledge extraction, as well as in ontology research.
Citation
Ito, M., Nakayama, K., Hara, T. and Nishio, S. (2011), "Semantic relatedness measurement based on Wikipedia link co‐occurrence analysis", International Journal of Web Information Systems, Vol. 7 No. 1, pp. 44-61. https://doi.org/10.1108/17440081111125653
Publisher
Emerald Group Publishing Limited
Copyright © 2011, Emerald Group Publishing Limited