High-performance computational framework for phrase relatedness
Proceedings of the 2017 ACM Symposium on Document Engineering, 2017•dl.acm.org
TrWP is a text relatedness measure that computes semantic similarity between words and phrases using aggregated statistics from the Google Web 1T 5-gram corpus. Phrase similarity computation in TrWP is costly in both time and space, which makes the existing implementation impractical for real-world use. In this work, we present an in-memory computational framework for TrWP that optimizes corpus search using perfect hashing and minimizes the required memory using variable-length encoding. In an evaluation on the Google Web 1T 5-gram corpus, the framework outperforms a file-based implementation in computational speed by several orders of magnitude.
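
To illustrate why variable-length encoding can cut the memory needed for n-gram frequency counts, here is a minimal sketch of a base-128 (LEB128-style) byte encoding. The abstract does not specify which encoding scheme the framework actually uses, so this scheme and the function names `encode_varint`, `decode_varint`, `encode_counts`, and `decode_counts` are assumptions made for illustration only.

```python
# Illustrative sketch only: a base-128 variable-length integer encoding.
# This is NOT taken from the paper; it is one common scheme of this kind.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F          # lowest 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def encode_counts(counts: list[int]) -> bytes:
    """Pack a sequence of counts back to back."""
    return b"".join(encode_varint(c) for c in counts)

def decode_counts(buf: bytes) -> list[int]:
    """Unpack all counts from a packed buffer."""
    counts = []
    pos = 0
    while pos < len(buf):
        value, pos = decode_varint(buf, pos)
        counts.append(value)
    return counts
```

With this scheme, small counts occupy a single byte and larger counts grow one byte at a time, so a list of mostly small frequencies takes far less space than fixed 8-byte integers. For example, `encode_counts([40, 300, 1_000_000])` uses 6 bytes instead of 24.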
