Data-Dependent LSH for the Earth Mover's Distance
R Jayaram, E Waingarten, T Zhang - Proceedings of the 56th Annual …, 2024 - dl.acm.org
Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024 · dl.acm.org
We give new data-dependent locality-sensitive hashing (LSH) schemes for the Earth Mover’s Distance (EMD) and, as a result, improve the best approximation for nearest neighbor search under EMD by a quadratic factor. Here, the metric $\mathrm{EMD}_s(\mathbb{R}^d, \ell_p)$ consists of sets of $s$ vectors in $\mathbb{R}^d$, and for any two sets $x, y$ of $s$ vectors, the distance $\mathrm{EMD}(x, y)$ is the minimum cost of a perfect matching between $x$ and $y$, where the cost of matching two vectors is their $\ell_p$ distance. Previously, Andoni, Indyk, and Krauthgamer gave a (data-independent) locality-sensitive hashing scheme for $\mathrm{EMD}_s(\mathbb{R}^d, \ell_p)$ when $p \in [1, 2]$ with approximation $O(\log^2 s)$. By being data-dependent, we improve the approximation to $\tilde{O}(\log s)$. Our main technical contribution is to show that for any distribution $\mu$ supported on the metric $\mathrm{EMD}_s(\mathbb{R}^d, \ell_p)$, there exists a data-dependent LSH for dense regions of $\mu$ that achieves approximation $\tilde{O}(\log s)$, and that the data-independent LSH actually achieves an $\tilde{O}(\log s)$-approximation outside of those dense regions. Finally, we show how to “glue” together these two hashing schemes without any additional loss in the approximation. Beyond nearest neighbor search, our data-dependent LSH also gives optimal (distributional) sketches for the Earth Mover’s Distance. By known sketching lower bounds, this implies that our LSH is optimal (up to $\mathrm{poly}(\log\log s)$ factors) among those that collide close points with constant probability.
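To make the metric $\mathrm{EMD}_s(\mathbb{R}^d, \ell_p)$ concrete, here is a minimal sketch that computes the distance exactly as defined in the abstract: the minimum cost of a perfect matching between two sets of $s$ vectors, where each matched pair costs its $\ell_p$ distance. This illustrates only the metric, not the paper's LSH construction. The function names `lp_distance` and `emd` are hypothetical, and enumerating all $s!$ matchings is an assumption made for clarity; it is practical only for very small $s$, and a real implementation would use a polynomial-time assignment algorithm such as the Hungarian method.

```python
from itertools import permutations
from typing import Sequence

Vector = Sequence[float]


def lp_distance(u: Vector, v: Vector, p: float) -> float:
    """l_p distance between two vectors of the same dimension d (assumes p >= 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def emd(x: Sequence[Vector], y: Sequence[Vector], p: float = 1.0) -> float:
    """EMD between two sets of s vectors: the minimum total l_p cost over all
    perfect matchings between x and y.

    Hypothetical helper: brute force over all s! matchings, so only for tiny s.
    """
    s = len(x)
    if len(y) != s:
        raise ValueError("both sets must contain exactly s vectors")
    best = float("inf")
    # Each permutation perm matches x[i] with y[perm[i]].
    for perm in permutations(range(s)):
        cost = sum(lp_distance(x[i], y[perm[i]], p) for i in range(s))
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # Two sets of s = 2 vectors in R^2.
    x = [(0.0, 0.0), (1.0, 0.0)]
    y = [(1.0, 0.0), (0.0, 1.0)]
    print(emd(x, y, p=1))  # 1.0
```

In this example the identity matching costs $1 + 2 = 3$ under $\ell_1$, while swapping the partners costs $1 + 0 = 1$. The EMD is therefore $1$: it depends on the best matching, not on any fixed pairing of the vectors.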