Computer Science > Computational Geometry
[Submitted on 22 Dec 2016]
Title:Practical linear-space Approximate Near Neighbors in high dimension
View PDFAbstract:The $c$-approximate Near Neighbor problem in high dimensional spaces has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, query time sublinear in the size of the dataset, and subquadratic space requirement. For practical applications, linear space is typically imperative. Most previous work in the linear space regime focuses on the case that $c$ exceeds $1$ by a constant term. In a recently accepted paper, optimal bounds have been achieved for any $c>1$ \cite{ALRW17}.
Towards practicality, we present a new and simple data structure using linear space and sublinear query time for any $c>1$ including $c\to 1^+$. Given an LSH family of functions for some metric space, we randomly project points to the Hamming cube of dimension $\log n$, where $n$ is the number of input points. The projected space contains strings which serve as keys for buckets containing the input points. The query algorithm simply projects the query point, then examines points which are assigned to the same or nearby vertices on the Hamming cube. We analyze in detail the query time for some standard LSH families.
To illustrate our claim of practicality, we offer an open-source implementation in {\tt C++}, and report on several experiments in dimension up to 1000 and $n$ up to $10^6$. Our algorithm is one to two orders of magnitude faster than brute force search. Experiments confirm the sublinear dependence on $n$ and the linear dependence on the dimension. We have compared against state-of-the-art LSH-based library {\tt FALCONN}: our search is somewhat slower, but memory usage and preprocessing time are significantly smaller.
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.