Enhancing ad-hoc relevance weighting using probability density estimation

X Zhou, JX Huang, B He
Proceedings of the 34th international ACM SIGIR conference on Research and …, 2011 - dl.acm.org
Classical probabilistic information retrieval (IR) models, e.g. BM25, handle document length through a trade-off between the Verbosity hypothesis, which assumes that a document's relevance is independent of its length, and the Scope hypothesis, which assumes the opposite. Despite the effectiveness of the classical probabilistic models, the potential relationship between document length and relevance has not been fully explored as a way to improve retrieval performance. In this paper, we conduct an in-depth study of this relationship based on the Scope hypothesis that document length does affect relevance. We study a list of probability density functions and examine which of them best fits the actual distribution of document length. Based on the studied probability density functions, we propose a length-based BM25 relevance weighting model, called BM25L, which incorporates document length as a substantial weighting factor. Extensive experiments conducted on standard TREC collections show that our proposed BM25L markedly outperforms the original BM25 model, even when the latter is optimized.
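
For readers unfamiliar with how the original BM25 treats document length, the following is a minimal sketch of the standard Okapi BM25 term weight, in which the parameter b controls the strength of length normalization. It illustrates the baseline only; the abstract does not give the BM25L formula, so none is shown. The function name, argument names, and default values (k1 = 1.2, b = 0.75) are common conventions assumed here, not taken from the paper.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Standard Okapi BM25 weight of one query term in one document.

    Illustrative baseline only (assumed parameter names and defaults);
    this is not the BM25L model described in the abstract.
    """
    # Smoothed inverse document frequency (Robertson/Sparck Jones form).
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b = 0 ignores document length entirely,
    # b = 1 normalizes term frequency fully by relative length.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


# Same term frequency, different document lengths.
short_doc = bm25_term_weight(tf=3, doc_len=100, avg_doc_len=200, n_docs=1000, doc_freq=10)
long_doc = bm25_term_weight(tf=3, doc_len=400, avg_doc_len=200, n_docs=1000, doc_freq=10)
```

With b > 0, the longer document receives the lower weight for the same term frequency; with b = 0 the two weights coincide, which corresponds to ignoring document length altogether.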