Authors:
João Pinheiro 1; Lucas Borges 1; Bruno Martins da Silva 1; Luiz Leme 2 and Marco Casanova 1
Affiliations:
1 Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro RJ, Brazil; 2 Universidade Federal Fluminense, Niterói RJ, Brazil
Keyword(s):
High-Dimensional Vector Streams, Approximate Nearest Neighbor Search, Product Quantization, Hierarchical Navigable Small World Graphs, Classified Ad, Trading Platform.
Abstract:
This paper addresses the vector stream similarity search problem, defined as: “Given a (high-dimensional) vector q and a time interval T, find a ranked list of vectors, retrieved from a vector stream, that are similar to q and that were received in the time interval T.” The paper first introduces a family of methods, called staged vector stream similarity search methods, or briefly SVS methods, to help solve this problem. SVS methods are continuous in the sense that they do not depend on having the full set of vectors available beforehand, but adapt to the vector stream. The paper then presents experiments to assess the performance of two SVS methods, one based on product quantization, called staged IVFADC, and another based on Hierarchical Navigable Small World graphs, called staged HNSW. The experiments with staged IVFADC use well-known image datasets, while those with staged HNSW use real data. The paper concludes with a brief description of a proof-of-concept implementation of a classified ad retrieval tool that uses staged HNSW.