Computer Science and Information Systems 2011 Volume 8, Issue 3, Pages: 711-737
https://doi.org/10.2298/CSIS100407025J
Indexing temporal information for web pages
Jin Peiquan (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Chen Hong (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Zhao Xujian (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Li Xiaowen (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Yue Lihua (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Temporal information plays an important role in Web search, as Web pages
intrinsically involve a crawled time and most Web pages contain time keywords
in their content. How to integrate temporal information into Web search
engines has been a research focus in recent years, and key issues such as
temporal-textual indexing and temporal information extraction must be
studied first. In this paper, we first present a framework for a
temporal-textual Web search engine. We then concentrate on designing a new
hybrid index structure for the temporal and textual information of Web pages.
In particular, we propose to integrate the B+-tree, the inverted file and a
typical temporal index called the MAP21-tree to handle temporal-textual
queries. We study five mechanisms for implementing a hybrid index structure
for temporal-textual queries, which differ in how they organize the inverted
file, B+-tree and MAP21-tree. After a theoretical analysis of the performance
of these five index structures, we conduct experiments on both simulated and
real data sets to compare their performance. The experimental results show
that, among all the index schemes, the first-inverted-file-then-MAP21-tree
index structure has the best query performance and is thus an acceptable
choice as the temporal-textual index for future time-aware search engines.
Keywords: Web search, temporal-textual query, temporal information, index structure
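
The "inverted file first, temporal index second" organization named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the MAP21-tree is replaced here by a plain sorted list per term searched with binary search, and the class name, method names, and the use of integer timestamps are all assumptions made for illustration.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class TemporalTextualIndex:
    """Hypothetical two-level index: term -> time-ordered postings.

    The outer dict plays the role of the inverted file. Each posting
    list is kept sorted by interval start and stands in for the
    per-term temporal index (a sorted list, NOT a real MAP21-tree).
    """

    def __init__(self):
        # term -> sorted list of (start, end, doc_id)
        self._postings = defaultdict(list)

    def add(self, doc_id, terms, start, end):
        """Index a page whose content refers to the interval [start, end]."""
        if start > end:
            raise ValueError("start must not exceed end")
        for term in set(terms):
            insort(self._postings[term], (start, end, doc_id))

    def query(self, term, q_start, q_end):
        """Return ids of pages containing `term` whose interval
        overlaps the query interval [q_start, q_end]."""
        # Step 1 (textual): look the term up in the inverted file.
        entries = self._postings.get(term, [])
        # Step 2 (temporal): binary search keeps only entries that
        # start no later than q_end ...
        hi = bisect_right(entries, (q_end, float("inf")))
        # ... then a scan drops those that end before q_start.
        return sorted(doc for _, end, doc in entries[:hi] if end >= q_start)


index = TemporalTextualIndex()
index.add("d1", ["olympics", "beijing"], 2008, 2008)
index.add("d2", ["olympics"], 2010, 2012)
index.add("d3", ["election"], 2008, 2009)
```

With this toy data, querying for "olympics" over the interval 2007 to 2009 touches only the posting list of that term, which is the intuition behind filtering by text before filtering by time.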