High-performance computational framework for phrase relatedness

Z Ai, J Mei, A Moh'd, N Zeh, M He, E Milios - Proceedings of the 2017 ACM Symposium on Document Engineering, 2017 - dl.acm.org
TrWP is a text relatedness measure that computes the semantic similarity between words and phrases using aggregated statistics from the Google Web 1T 5-gram corpus. Computing phrase similarity with TrWP is costly in both time and space, which makes the existing implementation of TrWP impractical for real-world use. In this work, we present an in-memory computational framework for TrWP that speeds up corpus search with perfect hashing and reduces memory consumption with variable-length encoding. Evaluating the framework on the Google Web 1T 5-gram corpus, we demonstrate that it is several orders of magnitude faster than a file-based implementation.
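The two techniques named in the abstract, perfect hashing for corpus lookup and variable-length encoding for memory, can be illustrated together with a minimal Python sketch of an in-memory n-gram count table. It assumes a simple hash-and-displace minimal perfect hash (with seeded BLAKE2b as the hash family) and unsigned LEB128 varints for the counts. The abstract does not say which perfect-hashing scheme or variable-length code the framework actually uses. All names here (`build_mphf`, `lookup`, `encode_varint`, `decode_varint`, `NgramCountTable`) are hypothetical.

```python
# Illustrative sketch only: hash-and-displace MPHF + LEB128 varints are assumed, not taken from the paper.
import hashlib
from typing import Dict, List, Optional, Tuple


def _h(key: str, seed: int, m: int) -> int:
    """Seeded 64-bit BLAKE2b hash of `key`, reduced to the range [0, m)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little") % m


def build_mphf(keys: List[str]) -> List[int]:
    """Hash-and-displace: table g such that lookup(g, k) is a bijection keys -> 0..n-1."""
    n = len(keys)
    buckets: List[List[str]] = [[] for _ in range(n)]
    for k in keys:
        buckets[_h(k, 0, n)].append(k)
    g = [0] * n
    taken = [False] * n
    order = sorted(range(n), key=lambda b: len(buckets[b]), reverse=True)
    i = 0
    # Multi-key buckets: try seeds until every key hits a free, distinct slot.
    while i < n and len(buckets[order[i]]) > 1:
        bucket, seed = buckets[order[i]], 1
        while True:
            slots = [_h(k, seed, n) for k in bucket]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            seed += 1
        for s in slots:
            taken[s] = True
        g[order[i]] = seed
        i += 1
    # Single-key buckets: store the chosen free slot directly as -(slot + 1).
    free = [s for s in range(n) if not taken[s]]
    while i < n and len(buckets[order[i]]) == 1:
        g[order[i]] = -free.pop() - 1
        i += 1
    return g


def lookup(g: List[int], key: str) -> int:
    """Two hash evaluations and one table probe; no collision chains to walk."""
    n = len(g)
    d = g[_h(key, 0, n)]
    return -d - 1 if d < 0 else _h(key, d, n)


def encode_varint(value: int, out: bytearray) -> None:
    """Append `value` as unsigned LEB128 (7 bits per byte, high bit = more bytes)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one LEB128 varint at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


class NgramCountTable:
    """Hypothetical n-gram -> count table: MPHF picks the slot, counts are varints."""

    def __init__(self, counts: Dict[str, int]) -> None:
        keys = list(counts)
        self.g = build_mphf(keys)
        # Keys are kept only to reject unknown n-grams; a memory-tight
        # version would store a short fingerprint per slot instead.
        self.keys: List[Optional[str]] = [None] * len(keys)
        for k in keys:
            self.keys[lookup(self.g, k)] = k
        self.offsets = [0] * len(keys)
        self.data = bytearray()
        for slot, k in enumerate(self.keys):
            self.offsets[slot] = len(self.data)
            encode_varint(counts[k], self.data)

    def get(self, ngram: str) -> int:
        """Return the stored count, or 0 for an n-gram not in the table."""
        slot = lookup(self.g, ngram)
        if self.keys[slot] != ngram:
            return 0
        return decode_varint(self.data, self.offsets[slot])[0]
```

Once the table is built, each query costs two hash evaluations, one array probe, and a short varint decode, with no disk access. Every slot is occupied, so no space is spent on empty hash buckets. Storing counts as varints rather than fixed 8-byte integers lets a count below 128 occupy one byte and a count below 16,384 occupy two.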