Composition of nested embeddings with an application to outlier removal

S Chawla, K Sheridan - Proceedings of the 2024 Annual ACM-SIAM …, 2024 - SIAM
Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2024, SIAM
Abstract
We study the design of embeddings into Euclidean space with outliers. Given a metric space (X, d) and an integer k, the goal is to embed all but k points in X (called the “outliers”) into ℓ2 with the smallest possible distortion c. Finding the optimal distortion c for a given outlier set size k, or alternatively the smallest k for a given target distortion c, are both NP-hard problems. In fact, it is UGC-hard to approximate k to within a factor smaller than 2 even when the metric sans outliers is isometrically embeddable into ℓ2. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an O(log² k) factor and the distortion to within a constant factor.
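To make the objective concrete, the following minimal sketch evaluates the distortion of a given embedding once a chosen outlier set is discarded. It is an illustration only and not the algorithm of the paper: the helper names (`distortion`, `outlier_distortion`) and the four-point toy metric are assumptions introduced here, and distortion is taken to be the usual product of the maximum expansion and the maximum contraction over pairs of points.

```python
import itertools
import math


def distortion(points, d, embedding):
    """Distortion of `embedding` over all pairs in `points`.

    Assumed definition: (max expansion) * (max contraction) over pairs.
    `d` is the original metric, `embedding` maps each point to coordinates.
    """
    if len(points) < 2:
        return 1.0  # no pairs to distort
    expansion = 0.0
    contraction = 0.0
    for x, y in itertools.combinations(points, 2):
        original = d(x, y)
        embedded = math.dist(embedding[x], embedding[y])
        if embedded == 0.0:
            return math.inf  # two distinct points collapsed
        expansion = max(expansion, embedded / original)
        contraction = max(contraction, original / embedded)
    return expansion * contraction


def outlier_distortion(points, d, embedding, outliers):
    """Distortion measured only over the points that are not outliers."""
    kept = [p for p in points if p not in outliers]
    return distortion(kept, d, embedding)


# Hypothetical toy metric: a, b, c lie on a line, and z is at distance 1
# from all three, so the full metric is not isometrically Euclidean.
table = {
    frozenset("ab"): 1.0, frozenset("bc"): 1.0, frozenset("ac"): 2.0,
    frozenset("az"): 1.0, frozenset("bz"): 1.0, frozenset("cz"): 1.0,
}
points = ["a", "b", "c", "z"]


def d(x, y):
    return table[frozenset((x, y))]


# One possible embedding into the plane (an assumption for the example).
embedding = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "z": (1.0, 1.0)}

full = distortion(points, d, embedding)                    # all four points
pruned = outlier_distortion(points, d, embedding, {"z"})   # z treated as outlier
```

In this toy instance, treating z as the single outlier (k = 1) leaves a subset that embeds isometrically, whereas the same embedding over all four points stretches the pairs involving z by a factor of √2.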
The main technical component in our result is an approach for constructing Lipschitz extensions of embeddings into Banach spaces (such as ℓp spaces). We consider a stronger version of Lipschitz extension that we call a nested composition of embeddings: given a low distortion embedding of a subset S of the metric space X, our goal is to extend this embedding to all of X such that the distortion over S is preserved, whereas the distortion over the remaining pairs of points in X is bounded by a function of the size of X \ S. Prior work on Lipschitz extension considers settings where the size of X is potentially much larger than that of S and the expansion bounds depend on |S|. In our setting, the set S is nearly all of X and the remaining set X \ S, a.k.a. the outliers, is small. We achieve an expansion bound that is polylogarithmic in |X \ S|.
*The full version of the paper can be accessed at https://arxiv.org/abs/2306.11604
Society for Industrial and Applied Mathematics