Abstract
Fix \(p\in [1,\infty )\), \(K\in (0,\infty )\), and a probability measure \(\mu \). We prove that for every \(n\in \mathbb {N}\), \(\varepsilon \in (0,1)\), and \(x_1,\ldots ,x_n\in L_p(\mu )\) with \(\big \Vert \max _{i\in \{1,\ldots ,n\}} |x_i| \big \Vert _{L_p(\mu )} \le K\), there exist \(d\le \frac{32e^2 (2K)^{2p}\log n}{\varepsilon ^2}\) and vectors \(y_1,\ldots , y_n \in \ell _p^d\) such that \(\max _{i,j\in \{1,\ldots ,n\}} \big | \Vert y_i-y_j\Vert _{\ell _p^d}^p - \Vert x_i-x_j\Vert _{L_p(\mu )}^p \big | \le \varepsilon \).
Moreover, the argument implies the existence of a greedy algorithm which outputs \(\{y_i\}_{i=1}^n\) after receiving \(\{x_i\}_{i=1}^n\) as input. The proof relies on a derandomized version of Maurey’s empirical method (1981) combined with a combinatorial idea of Ball (1990) and a suitable change of measure. Motivated by the above embedding, we introduce the notion of \(\varepsilon \)-isometric dimension reduction of the unit ball \({\textbf {B}}_E\) of a normed space \((E,\Vert \cdot \Vert _E)\) and we prove that \({\textbf {B}}_{\ell _p}\) does not admit \(\varepsilon \)-isometric dimension reduction by linear operators for any value of \(p\ne 2\).
1 Introduction
1.1 Metric Dimension Reduction
Using standard terminology from metric embeddings (see [38]), we say that a mapping \(f:(X,d_X)\rightarrow (Y,d_Y)\) between metric spaces is a bi-Lipschitz embedding with distortion at most \(\alpha \in [1,\infty )\) if there exists a scaling factor \(\sigma \in (0,\infty )\) such that \(\sigma d_X(x,y)\le d_Y(f(x),f(y))\le \alpha \sigma d_X(x,y)\) for every \(x,y\in X\).
Throughout this paper, we shall denote by \(\ell _p^d\) the linear space \(\mathbb {R}^d\) equipped with the p-norm \(\Vert (a_1,\ldots ,a_d)\Vert _{\ell _p^d} {\mathop {=}\limits ^{\textrm{def}}}\big (\sum _{i=1}^d |a_i|^p\big )^{1/p}\).
The classical Johnson–Lindenstrauss lemma [21] asserts that if \((\mathcal {H},\Vert \cdot \Vert _{\mathcal {H}})\) is a Hilbert space and \(x_1,\ldots ,x_n\in \mathcal {H}\), then for every \(\varepsilon \in (0,1)\) there exist \(d\le \tfrac{C\log n}{\varepsilon ^2}\) and \(y_1,\ldots ,y_n\in \ell _2^d\) such that \((1-\varepsilon )\Vert x_i-x_j\Vert _{\mathcal {H}} \le \Vert y_i-y_j\Vert _{\ell _2^d} \le (1+\varepsilon )\Vert x_i-x_j\Vert _{\mathcal {H}}\) for every \(i,j\in \{1,\ldots ,n\}\),
where \(C\in (0,\infty )\) is a universal constant. In the above embedding terminology, the Johnson–Lindenstrauss lemma states that for every \(\varepsilon \in (0,1)\), \(n\in \mathbb {N}\), and \(d\ge \tfrac{C\log n}{\varepsilon ^2}\), any n-point subset of Hilbert space admits a bi-Lipschitz embedding into \(\ell _2^d\) with distortion at most \(1+\varepsilon \). In order to prove their result, Johnson and Lindenstrauss introduced in [21] the influential random projection method that has since had many important applications in metric geometry and theoretical computer science and kickstarted the field of metric dimension reduction (see the recent survey [36] of Naor) which lies at the intersection of those two subjects.
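To make the random projection method concrete, we include a minimal numerical sketch; Gaussian matrices are only one standard realization of the method (the argument of [21] used random orthogonal projections), and the constant 16 below is a comfortable, non-optimal choice of ours.

```python
import numpy as np
from scipy.spatial.distance import pdist

def gaussian_dimension_reduction(points, eps, rng):
    """Map the rows of `points` to d = O(log n / eps^2) dimensions with a
    random Gaussian matrix; with high probability every pairwise distance
    is preserved up to a multiplicative factor 1 +/- eps."""
    n, m = points.shape
    d = int(np.ceil(16 * np.log(n) / eps ** 2))   # non-optimal constant
    G = rng.standard_normal((m, d)) / np.sqrt(d)  # E ||xG||_2^2 = ||x||_2^2
    return points @ G

rng = np.random.default_rng(0)
x = rng.standard_normal((300, 5000))              # 300 points in R^5000
y = gaussian_dimension_reduction(x, eps=0.3, rng=rng)
ratios = pdist(y) / pdist(x)
print(y.shape[1], ratios.min(), ratios.max())     # ratios typically within 1 +/- eps
```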
Following [36], we say that an infinite dimensional Banach space \((E,\Vert \cdot \Vert _E)\) admits bi-Lipschitz dimension reduction if there exists \(\alpha = \alpha (E)\in [1,\infty )\) such that for every \(n\in \mathbb {N}\), there exists \(k_n=k_n(E,\alpha )\in \mathbb {N}\) satisfying
and such that any n-point subset \(\mathcal {S}\) of E admits a bi-Lipschitz embedding with distortion at most \(\alpha \) in a finite-dimensional linear subspace F of E with \(\textrm{dim}F\le k_n\). The only non-Hilbertian space that is known to admit bi-Lipschitz dimension reduction is the 2-convexification of the classical Tsirelson space, as proven by Johnson and Naor in [22]. Turning to negative results, Matoušek proved in [32] the impossibility of bi-Lipschitz dimension reduction in \(\ell _\infty \), whereas Brinkman and Charikar [10] (see also [30] for a shorter proof) constructed an n-point subset of \(\ell _1\) which does not admit a bi-Lipschitz embedding into any \(n^{o(1)}\)-dimensional subspace of \(\ell _1\). Their theorem was recently refined by Naor et al. [37] who showed that the same n-point subset of \(\ell _1\) does not embed into any \(n^{o(1)}\)-dimensional subspace of the trace class \(\textsf{S}_1\) (see also the striking recent work [41] of Regev and Vidick, where the impossibility of polynomial almost isometric dimension reduction in \(\textsf{S}_1\) is established). We refer to [36, Thm. 16] for a summary of the best known bounds quantifying the aforementioned qualitative statements. Despite the lapse of almost four decades since the proof of the Johnson–Lindenstrauss lemma, the following natural question remains stubbornly open.
Question 1.1
For which values of \(p\notin \{1,2,\infty \}\) does \(\ell _p\) admit bi-Lipschitz dimension reduction?
1.2 Dimensionality and Structure
An important feature of the formalism of bi-Lipschitz dimension reduction in a Banach space E is that both the distortion \(\alpha (E)\) of the embedding and the dimension \(k_n(E,\alpha )\) of the target subspace F are independent of the given n-point subset \(\mathcal {S}\) of E. Nevertheless, there are instances in which one can construct delicate embeddings whose distortion or the dimension of their targets depends on subtle geometric parameters of \(\mathcal {S}\). For instance, we mention an important theorem of Schechtman [42, Thm. 5] (which built on work of Klartag and Mendelson [26]) who constructed a linear embedding of an arbitrary subset \(\mathcal {S}\) of \(\ell _2\) into any Banach space E whose distortion depends only on the Gaussian width of \(\mathcal {S}\) and the \(\ell \)-norm of the identity operator \(\textsf{id}_E:E\rightarrow E\). In the special case that E is a Hilbert space, a substantially richer family of such embeddings was devised in [31].
Let \(\mu \) be a probability measure. For a subset \(\mathcal {S}\) of \(L_p(\mu )\), we shall denote \(\mathcal {I}(\mathcal {S}) {\mathop {=}\limits ^{\textrm{def}}}\big \Vert \sup _{x\in \mathcal {S}} |x| \big \Vert _{L_p(\mu )}\)
and we will say that \(\mathcal {S}\) is K-incompressible (Footnote 1) if \(\mathcal {I}(\mathcal {S})\le K\). The main contribution of the present paper is the following dimensionality reduction theorem for incompressible subsets of \(L_p(\mu )\) which, in contrast to all the results discussed earlier, is valid for any value of \(p\in [1,\infty )\).
Theorem 1.2
(\(\varepsilon \)-isometric dimension reduction for incompressible subsets of \(L_p(\mu )\)) Fix parameters \(p\in [1,\infty )\), \(n\in \mathbb {N}\), \(K\in (0,\infty )\) and let \(\{x_i\}_{i=1}^n\) be a K-incompressible family of vectors in \(L_p(\mu )\) for some probability measure \(\mu \). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _p^d\) such that \(\max _{i,j\in \{1,\ldots ,n\}} \big | \Vert y_i-y_j\Vert _{\ell _p^d}^p - \Vert x_i-x_j\Vert _{L_p(\mu )}^p \big | \le \varepsilon \).
Besides the appearance of the incompressibility parameter K in the bound for the dimension d of the target space, Theorem 1.2 differs from the Johnson–Lindenstrauss lemma in that the error in (6) is additive rather than multiplicative. Recall that a map \(f:(X,d_X)\rightarrow (Y,d_Y)\) between metric spaces is called an \(\varepsilon \)-isometric embedding if \(\big |d_Y(f(x),f(y))-d_X(x,y)\big |\le \varepsilon \) for every \(x,y\in X\).
Embeddings with additive errors occur naturally in metric geometry and, more specifically, in metric dimension reduction (see e.g. [44, Sect. 9.3]). We mention for instance a result [40, Thm. 1.5] of Plan and Vershynin who showed that any subset \(\mathcal {S}\) of the unit sphere in \(\ell _2^n\) admits a \(\delta \)-isometric embedding into the d-dimensional Hamming cube \((\{-1,1\}^d,\Vert \cdot \Vert _1)\), where d depends polynomially on \(\delta ^{-1}\) and the Gaussian width of \(\mathcal {S}\). In the above embedding terminology and in view of the elementary inequality \(|\alpha -\beta | \le |\alpha ^p-\beta ^p|^{1/p}\) which holds for every \(\alpha ,\beta >0\), Theorem 1.2 asserts that any n-point K-incompressible subset of \(L_p(\mu )\) admits an \(\varepsilon ^{1/p}\)-isometric embedding into \(\ell _p^d\) for the above choice of dimension d. For further occurrences of \(\varepsilon \)-isometric embeddings in the dimensionality reduction and compressed sensing literatures, we refer to [8, 19, 20, 31, 40, 44] and the references therein.
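For completeness, let us verify this deduction; the one-line proof of the elementary inequality is included because its \(p=2\) case reappears in the proof of Corollary 2.3 below. For \(\alpha \ge \beta \ge 0\) and \(p\ge 1\), the function \(t\mapsto (t+\beta )^p-t^p\) is nondecreasing on \([0,\infty )\); comparing its values at \(t=\alpha -\beta \) and \(t=0\) gives \(\alpha ^p-\beta ^p\ge (\alpha -\beta )^p\), i.e. \(|\alpha -\beta |\le |\alpha ^p-\beta ^p|^{1/p}\). Applying this with \(\alpha =\Vert y_i-y_j\Vert _{\ell _p^d}\) and \(\beta =\Vert x_i-x_j\Vert _{L_p(\mu )}\), and using that (6) bounds \(|\alpha ^p-\beta ^p|\) by \(\varepsilon \), we obtain \(\big |\Vert y_i-y_j\Vert _{\ell _p^d}-\Vert x_i-x_j\Vert _{L_p(\mu )}\big |\le \varepsilon ^{1/p}\) for every \(i,j\in \{1,\ldots ,n\}\), i.e. \(x_i\mapsto y_i\) is an \(\varepsilon ^{1/p}\)-isometric embedding.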
1.3 Method of Proof
A large part of the (vast) literature on metric dimension reduction focuses on showing that a typical low-rank linear operator chosen randomly from a specific ensemble acts as an approximate isometry on a given set \(\mathcal {S}\) with high probability. For subsets \(\mathcal {S}\) of Euclidean space, this principle has been confirmed for random projections [12, 14, 21, 36], matrices with Gaussian [15, 16, 42], Rademacher [1, 5], and subgaussian [13, 17, 26, 31] entries, randomizations of matrices with the RIP [27] as well as more computationally efficient models [2, 3, 9, 24, 33] which are based on sparse matrices. Beyond its inherent interest as an \(\ell _p\)-dimension reduction theorem (albeit, for specific configurations of points), Theorem 1.2 also differs from the aforementioned works in its method of proof. The core of the argument, rather than sampling from a random matrix ensemble, relies on Maurey’s empirical method [39] (see Sect. 2.1) which is a dimension-free way to approximate points in bounded convex subsets of Banach spaces by convex combinations of extreme points with prescribed length. An application of the method to the positive cone of \(L_p\)-distance matrices (the use of which in this context is inspired by classical work of Ball [6]) equipped with the supremum norm allows us to deduce (see Proposition 2.1) the conclusion of Theorem 1.2 under the stronger assumption that \(\max _{i\in \{1,\ldots ,n\}}\Vert x_i\Vert _{L_\infty (\mu )}\le K\).
While Maurey’s empirical method is an a priori existential statement that is proven via the probabilistic method, recent works (see [7, 18]) have focused on derandomizing its proof for specific Banach spaces. In the setting of Theorem 1.2, we can use these tools to show (see Corollary 2.7) that there exists a greedy algorithm which receives as input the high-dimensional data \(\{x_i\}_{i=1}^n\) and produces as output the low-dimensional points \(\{y_i\}_{i=1}^n\). Finally, using a suitable change of measure [34] (see Sect. 2.3) we are able to relax the stronger assumption (8) to that of K-incompressibility and derive the conclusion of Theorem 1.2. We emphasize that, in contrast to most of the dimension reduction algorithms (randomized or not) discussed earlier, the one which gives Theorem 1.2 is not oblivious but is rather tailored to the specific configuration of points \(\{x_i\}_{i=1}^n\), as it relies on the use of Maurey’s empirical method.
1.4 \(\varepsilon \)-Isometric Dimension Reduction
Given two moduli \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\), we say (following [36]) that a Banach space \((E,\Vert \cdot \Vert _E)\) admits metric dimension reduction with moduli \((\omega ,\Omega )\) if for any \(n\in \mathbb {N}\) there exists \(k_n=k_n(E)\in \mathbb {N}\) with \(k_n=n^{o(1)}\) as \(n\rightarrow \infty \) such that for any \(x_1,\ldots ,x_n\in E\), there exist a subspace F of E with \(\textrm{dim}F\le k_n\) and \(y_1,\ldots ,y_n \in F\) satisfying
In view of Theorem 1.2, we would be interested in formulating a suitable notion of dimension reduction via \(\varepsilon \)-isometric embeddings which would be fitting to the moduli appearing in (6).
Remark 1.3
Let \(a,b\in (0,\infty )\), suppose that \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\) are two moduli satisfying
and that the Banach space \((E,\Vert \cdot \Vert _E)\) admits metric dimension reduction with moduli \((\omega ,\Omega )\). Fix \(n\in \mathbb {N}\) and \(x_1,\ldots ,x_n\in E\). Applying the assumption (9) to the points \(sx_1,\ldots ,sx_n\) where \(s\gg 1\), we deduce that there exist points \(y_1(s),\ldots ,y_n(s)\) in a \(k_n\)-dimensional subspace F(s) of E such that
For any \(\eta \in (0,1)\), we can then choose s large enough (as a function of \(\eta \) and the \(x_i\)) such that
Therefore, we conclude that E also admits bi-Lipschitz dimension reduction (with distortion b/a).
This simple scaling argument suggests that any reasonable notion of \(\varepsilon \)-isometric dimension reduction can differ from the corresponding bi-Lipschitz theory only in small scales, thus motivating the following definition. We denote by \({\textbf {B}}_E\) the unit ball of a normed space \((E,\Vert \cdot \Vert _E)\).
Definition 1.4
(\(\varepsilon \)-isometric dimension reduction) Fix \(\varepsilon \in (0,1)\), \(r\in (0,\infty )\) and let \((E,\Vert \cdot \Vert _E)\) be an infinite-dimensional Banach space. We say that \({\textbf {B}}_E\) admits \(\varepsilon \)-isometric dimension reduction with power r if for every \(n\in \mathbb {N}\) there exists \(k_n=k_n^r(E,\varepsilon )\in \mathbb {N}\) with \(k_n=n^{o(1)}\) as \(n\rightarrow \infty \) for which the following condition holds. For every n points \(x_1,\ldots ,x_n\in {\textbf {B}}_E\) there exist a linear subspace F of E with \(\textrm{dim}F\le k_n\) and points \(y_1,\ldots ,y_n\in F\) satisfying
The fact that even high-dimensional infinite subsets of Euclidean space \(\ell _2\) may admit \(\varepsilon \)-isometric embeddings into low-dimensional subspaces follows from the additive version of the Johnson–Lindenstrauss lemma, first proven by Liaw, Mehrabian, Plan, and Vershynin [31] (see also [44, Prop. 9.3.2]). In contrast to that, combining the scaling argument of Remark 1.3 with the fact that any d-dimensional subspace of \(\ell _2\) is isometric to \(\ell _2^d\), we deduce that if \(k_n(\varepsilon )\) is the least dimension such that any n points in \(\ell _2\) embed \(\varepsilon \)-isometrically in \(\ell _2^{k_n(\varepsilon )}\), then \(k_n(\varepsilon )= n-1\). This justifies the restriction of Definition 1.4 to the unit ball \({\textbf {B}}_E\) of E.
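Here is one way to see this equality (our sketch of the scaling argument). The upper bound \(k_n(\varepsilon )\le n-1\) is immediate: translating so that the first point is the origin does not change pairwise distances and places the n points in a linear subspace of dimension at most \(n-1\). For the lower bound, suppose that for some \(d\le n-2\) every n-point subset of \(\ell _2\) embedded \(\varepsilon \)-isometrically into \(\ell _2^d\), and apply this to \(se_1,\ldots ,se_n\) with \(s\rightarrow \infty \). Dividing the resulting configurations by s and translating the first point to the origin produces, for every s, points in a fixed bounded subset of \(\ell _2^d\) whose pairwise distances lie in \([\sqrt{2}-\tfrac{\varepsilon }{s},\sqrt{2}+\tfrac{\varepsilon }{s}]\), and a subsequential limit as \(s\rightarrow \infty \) would consist of n equidistant points in \(\ell _2^d\). Such points are affinely independent, which is impossible when \(d\le n-2\), and the contradiction shows that \(k_n(\varepsilon )\ge n-1\).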
It is clear from the definitions that if a Banach space E admits bi-Lipschitz dimension reduction with distortion \(\tfrac{1+\varepsilon }{1-\varepsilon }\), where \(\varepsilon \in (0,1)\), then \({\textbf {B}}_E\) admits \(2\varepsilon \)-isometric dimension reduction with power \(r=1\). The \(\varepsilon \)-isometric analogue of Question 1.1 deserves further investigation.
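Let us spell out this implication (a two-line verification, under the natural reading of Definition 1.4 for \(r=1\)). If \(x_1,\ldots ,x_n\in {\textbf {B}}_E\) and the bi-Lipschitz embedding has distortion \(\tfrac{1+\varepsilon }{1-\varepsilon }\), then after multiplying by a suitable constant we may assume that its images \(y_1,\ldots ,y_n\), which lie in a subspace F of E with \(\textrm{dim}F\le k_n\), satisfy \((1-\varepsilon )\Vert x_i-x_j\Vert _E\le \Vert y_i-y_j\Vert _E\le (1+\varepsilon )\Vert x_i-x_j\Vert _E\). Hence \(\big |\Vert y_i-y_j\Vert _E-\Vert x_i-x_j\Vert _E\big |\le \varepsilon \Vert x_i-x_j\Vert _E\le 2\varepsilon \), since points of the unit ball are at distance at most 2.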
Question 1.5
For which values of \(p\ne 2\) does \({\textbf {B}}_{\ell _p}\) admit \(\varepsilon \)-isometric dimension reduction?
Even though the K-incompressibility assumption of Theorem 1.2 may a priori seem restrictive, it is satisfied for most configurations of points in \({\textbf {B}}_{\ell _p}\). Suppose that \(n,N\in \mathbb {N}\) are such that N is polynomial in n (Footnote 2). Then, standard considerations show that with high probability, a uniformly chosen n-point subset \(\mathcal {S}\) of \(N^{1/p}{\textbf {B}}_{\ell _p^N}\) is \(O(\log n)^{1/p}\)-incompressible. We refer to Remark 2.4 for more information on this and related generic properties of finite subsets of rescaled p-balls.
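As a hedged numerical illustration of this genericity claim (everything below, including the choice \(N=n^2\) and all function names, is ours), one can sample uniform points from \({\textbf {B}}_{\ell _p^N}\) via the standard generalized-Gaussian representation and estimate the incompressibility parameter directly:

```python
import numpy as np
from scipy.stats import gennorm

def sample_unit_lp_ball(n, N, p, rng):
    """n i.i.d. points uniform in the unit ball of l_p^N, via the classical
    representation g / (sum_i |g_i|^p + W)^(1/p) with g_i generalized
    Gaussian (density ~ exp(-|t|^p)) and W standard exponential."""
    g = gennorm.rvs(p, size=(n, N), random_state=rng)
    w = rng.exponential(size=(n, 1))
    return g / (np.sum(np.abs(g) ** p, axis=1, keepdims=True) + w) ** (1 / p)

def incompressibility(X, p):
    """I(S) = || max_i |x_i| ||_{L_p(mu_N)}, mu_N = normalized counting measure."""
    envelope = np.max(np.abs(X), axis=0)       # pointwise maximum of the |x_i|
    return np.mean(envelope ** p) ** (1 / p)

rng = np.random.default_rng(1)
n, p = 200, 1.5
N = n ** 2                                      # N polynomial in n
X = N ** (1 / p) * sample_unit_lp_ball(n, N, p, rng)   # points in B_{L_p^N}
print(incompressibility(X, p), np.log(n) ** (1 / p))   # same order of magnitude
```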
1.5 \(\varepsilon \)-Isometric Dimension Reduction by Linear Maps
A close inspection of the proof of Theorem 1.2 (see Remark 2.6) reveals that in fact the low-dimensional points \(\{y_i\}_{i=1}^n\) can be realized as images of the initial data \(\{x_i\}_{i=1}^n\) under a carefully chosen linear operator. Nevertheless, we will show that for any \(p\ne 2\) and n large enough, there exists an n-point subset of \({{\textbf {B}}}_{\ell _p}\) whose image under any fixed linear \(\varepsilon \)-isometric embedding has rank which is linear in n. In fact, we shall prove the following more general statement which refines a theorem that Lee, Mendel and Naor proved in [29] for bi-Lipschitz embeddings.
Theorem 1.6
(Impossibility of linear dimension reduction in \({\textbf {B}}_{\ell _p}\)) Fix \(p\ne 2\) and two moduli \(\omega ,\Omega :[0,\infty )\rightarrow [0,\infty )\) with \(\omega (1)>0\). For arbitrarily large \(n\in \mathbb {N}\), there exists an n-point subset \(\mathcal {S}_{n,p}\) of \({\textbf {B}}_{\ell _p}\) such that the following holds. If \(T:\textrm{span}(\mathcal {S}_{n,p})\rightarrow \ell _p^d\) is a linear operator satisfying
then \(d\ge \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{|p-2|} \cdot \tfrac{n-1}{2}\).
2 Proof of Theorem 1.2
We say that a normed space \((E,\Vert \cdot \Vert _E)\) has Rademacher type p if there exists a universal constant \(T\in (0,\infty )\) such that for every \(n\in \mathbb {N}\) and every \(x_1,\ldots ,x_n\in E\),
The least constant T such that (15) is satisfied is denoted by \(T_p(E)\). A standard symmetrization argument (see [28, Prop. 9.11]) shows that if \(X_1,\ldots ,X_n\) are independent E-valued random variables with \(\mathbb {E}[X_i]=0\) for every \(i\in \{1,\ldots ,n\}\), then
2.1 Maurey’s Empirical Method and Its Algorithmic Counterparts
A classical theorem of Carathéodory asserts that if \(\mathcal {T}\) is a subset of \(\mathbb {R}^m\), then any point z in the convex hull \(\textrm{conv}(\mathcal {T})\) (that is, a convex combination of finitely many elements of \(\mathcal {T}\)) can be expressed as a convex combination of at most \(m+1\) points of \(\mathcal {T}\). Maurey’s empirical method is a powerful dimension-free approximate version of Carathéodory’s theorem, first popularized in [39], that has numerous applications in geometry and theoretical computer science. Let \((E,\Vert \cdot \Vert _E)\) be a Banach space, consider a bounded subset \(\mathcal {T}\) of E and fix \(z\in \textrm{conv}(\mathcal {T})\). Since z is a convex combination of elements of \(\mathcal {T}\), there exist \(m\in \mathbb {N}\), \(\lambda _1,\ldots ,\lambda _m\in (0,\infty )\), and \(t_1,\ldots ,t_m\in \mathcal {T}\) such that
Let X be an E-valued discrete random variable with \(\mathbb {P}\{X=t_k\}=\lambda _k\) for all \(k\in \{1,\ldots ,m\}\) and consider \(X_1,\ldots ,X_d\) i.i.d. copies of X. Then, conditions (17) ensure that X is well defined and \(\mathbb {E}[X]=z\). Therefore, applying the Rademacher type condition (16) to the centered random variables \(\{X_s-z\}_{s=1}^d\) and normalizing, we get
Since X takes values in \(\mathcal {T}\), if \(\mathcal {T} \subseteq R{{\textbf {B}}}_E\), we then deduce that there exist \(x_1,\ldots ,x_d\in \mathcal {T}\) such that
While the above argument is probabilistic, recent works have focused on derandomizing Maurey’s sampling lemma for smaller classes of Banach spaces, thus constructing deterministic algorithms which output the empirical approximation \(\tfrac{x_1+\ldots +x_d}{d}\) of z. The first result in this direction is due to Barman [7] who treated the case that E is an \(L_r(\mu )\)-space, \(r\in (1,\infty )\). This assumption was recently generalized by Ivanov in [18] who built a greedy algorithm which constructs the desired empirical mean in an arbitrary p-uniformly smooth space.
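For later use, let us record the quantitative bound that this argument produces; the constants below are ours (obtained from the crude estimate \(\Vert X_s-z\Vert _E\le 2R\)), but they are consistent with the bound appearing in Proposition 2.1 below. If \(\mathcal {T}\subseteq R{\textbf {B}}_E\) and \(z\in \textrm{conv}(\mathcal {T})\), then the symmetrized Rademacher type 2 inequality gives \(\mathbb {E}\big \Vert z-\tfrac{X_1+\ldots +X_d}{d}\big \Vert _E\le \tfrac{2T_2(E)}{d}\big (\sum _{s=1}^d\mathbb {E}\Vert X_s-z\Vert _E^2\big )^{1/2}\le \tfrac{4T_2(E)R}{\sqrt{d}}\), so in particular there exists a choice of \(t_1,\ldots ,t_d\in \mathcal {T}\) with \(\big \Vert z-\tfrac{t_1+\ldots +t_d}{d}\big \Vert _E\le \tfrac{4T_2(E)R}{\sqrt{d}}\). In the application of Sect. 2.2 below, the ambient space \(\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) is e-isomorphic to a space with \(T_2\le \sqrt{2\log n}\), the relevant atoms have supremum norm at most \((2L)^p\), and passing through the isomorphism costs a factor of e; requiring \(4\sqrt{2\log n}\,e\,(2L)^p/\sqrt{d}\le \varepsilon \) then yields \(d\ge \tfrac{32e^2(2L)^{2p}\log n}{\varepsilon ^2}\), which is exactly the dimension bound of Proposition 2.1.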
2.2 Dimension Reduction in \(L_p(\mu )\) for Uniformly Bounded Vectors
With Maurey’s empirical method at hand, we are ready to proceed to the first part of the proof of Theorem 1.2, namely the \(\varepsilon \)-isometric dimension reduction property of \(L_p(\mu )\) under the strong assumption that the given point set consists of functions which are bounded in \(L_\infty (\mu )\).
Proposition 2.1
Fix \(p\in [1,\infty )\), \(n\in \mathbb {N}\) and let \(\{x_i\}_{i=1}^n\) be a family of vectors in \(L_p(\mu )\) for some probability measure \(\mu \). Denote by \(L=\max _{i\in \{1,\ldots ,n\}} \Vert x_i\Vert _{L_\infty (\mu )}\in [0,\infty ]\). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2L)^{2p}\log n}{\varepsilon ^2}\) and \(y_1,\ldots ,y_n\in \ell _p^d\) such that \(\max _{i,j\in \{1,\ldots ,n\}} \big | \Vert y_i-y_j\Vert _{\ell _p^d}^p - \Vert x_i-x_j\Vert _{L_p(\mu )}^p \big | \le \varepsilon \).
Proof
We shall identify \(\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) with the vector space of all symmetric \(n\times n\) real matrices with 0 on the diagonal equipped with the supremum norm. Consider the set
It is obvious that \(\mathcal {C}_p\) is a cone in the sense that \(\mathcal {C}_p = \lambda \mathcal {C}_p\) for every \(\lambda >0\) but moreover \(\mathcal {C}_p\) is convex. To see this, consider \(A,B\in \mathcal {C}_p\), probability spaces \((\Omega _1,\rho _1), (\Omega _2,\rho _2)\), and vectors \(\{z_i\}_{i=1}^n, \{w_i\}_{i=1}^n\) in \(L_p(\rho _1)\) and \(L_p(\rho _2)\) respectively such that
Fix \(\lambda \in (0,1)\) and consider the disjoint union \(\Omega _1\sqcup \Omega _2\) of \(\Omega _1\) and \(\Omega _2\) equipped with the probability measure \(\rho (\lambda ) = \lambda \rho _1+(1-\lambda )\rho _2\). Then, by (22) the functions \(\zeta _i:\Omega _1\sqcup \Omega _2\rightarrow \mathbb {R}\) given by \(\zeta _i|_{\Omega _1} = z_i\) and \(\zeta _i|_{\Omega _2}=w_i\), where \(i\in \{1,\ldots ,n\}\), belong to \(L_p(\rho (\lambda ))\) and satisfy the conditions
which ensure that \(\lambda A+(1-\lambda )B\in \mathcal {C}_p\), making \(\mathcal {C}_p\) a convex cone. Consider the embedding \(\mathcal {M}:L_p(\mu )^n\rightarrow \mathcal {C}_p\) mapping a vector \(z=(z_1,\ldots ,z_n)\) to the corresponding distance matrix, i.e. \(\mathcal {M}(z)_{ij} {\mathop {=}\limits ^{\textrm{def}}} \Vert z_i-z_j\Vert _{L_p(\mu )}^p\) for every \(i,j\in \{1,\ldots ,n\}\).
By Ball’s isometric embedding theorem [6], \(x_1,\ldots ,x_n\) have isometric images in \(\ell _p^N\) with \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) +1\). Without loss of generality we will thus assume that the given points \(x_1,\ldots ,x_n\in L_p(\mu )\) are simple functions (that is, each of them takes only finitely many values) with \(\Vert x_i\Vert _{L_\infty (\mu )} \le L\). Let \(\{S_1,\ldots ,S_m\}\) be a partition of the underlying measure space such that each function \(x_i\) is constant on each \(S_k\) and suppose that \(x_i|_{S_k} = a(i,k) \in [-L,L]\) for \(i\in \{1,\ldots ,n\}\) and \(k\in \{1,\ldots ,m\}\). Then, for every \(i,j\in \{1,\ldots ,n\}\), we have
where \(y(k) {\mathop {=}\limits ^{\textrm{def}}}(a(1,k),\ldots ,a(n,k))\in L_p(\mu )^n\) is a vector whose components are constant functions. As \(\mu \) is a probability measure and \(\{S_1,\ldots ,S_m\}\) is a partition, identity (25) implies that
Observe that since \(a(i,k)\in [-L,L]\) for every \(i\in \{1,\ldots ,n\}\) and \(k\in \{1,\ldots ,m\}\), we have
Moreover, \(\ell _\infty ^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) is e-isomorphic to \(\ell _{p_n}^{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) where \(p_n=\log \left( {\begin{array}{c}n\\ 2\end{array}}\right) \). It is well-known (see [28, Chap. 9]) that \(T_2(\ell _p) \le \sqrt{p-1}\) for every \(p\ge 2\) and thus
Applying Maurey’s sampling lemma (Sect. 2.1) while taking into account (27) and (28), we deduce that for every \(d\ge 1\) there exist \(k_1,\ldots ,k_d\in \{1,\ldots ,m\}\) such that
Therefore, if \(\varepsilon \in (0,1)\) is such that \(d\ge \tfrac{32e^2 (2L)^{2p}\log n}{\varepsilon ^2}\), we then have
Finally, consider for each \(i\in \{1,\ldots ,n\}\) a vector \(y_i=(y_i(1),\ldots ,y_i(d))\in \ell _p^d\) given by \(y_i(s) {\mathop {=}\limits ^{\textrm{def}}} d^{-1/p} a(i,k_s)\) for \(s\in \{1,\ldots ,d\}\),
and notice that (30) can be equivalently rewritten as \(\max _{i,j\in \{1,\ldots ,n\}}\big |\Vert y_i-y_j\Vert _{\ell _p^d}^p-\Vert x_i-x_j\Vert _{L_p(\mu )}^p\big |\le \varepsilon \),
concluding the proof of the proposition. \(\square \)
Remark 2.2
It is worth emphasizing that the coordinates of the vectors \(y_1,\ldots ,y_n\) produced in Proposition 2.1 consist (up to rescaling) of values of the functions \(x_1,\ldots ,x_n\). Such low-dimensional embeddings via sampling are a central object of study in approximation theory, see e.g. the recent survey [25] and the references therein.
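To make Remark 2.2 concrete, we include a minimal randomized (Monte Carlo) stand-in for the construction in the proof of Proposition 2.1; the actual selection of the indices \(k_1,\ldots ,k_d\) used in Corollary 2.7 below is greedy and deterministic, and all names, the toy data, and the fixed choice of d in this sketch are ours.

```python
import numpy as np

def sampled_embedding(A, mu, p, d, rng):
    """A[i, k] = value of the simple function x_i on the cell S_k and
    mu[k] = mu(S_k).  Sample d cells according to mu and rescale by
    d^(-1/p), as in the proof of Proposition 2.1 (randomized variant)."""
    ks = rng.choice(len(mu), size=d, p=mu)
    return A[:, ks] / d ** (1 / p)

def worst_discrepancy(A, mu, Y, p):
    """max over pairs of | ||y_i - y_j||_p^p - ||x_i - x_j||_{L_p(mu)}^p |."""
    worst = 0.0
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            exact = float(np.sum(mu * np.abs(A[i] - A[j]) ** p))
            empirical = float(np.sum(np.abs(Y[i] - Y[j]) ** p))
            worst = max(worst, abs(empirical - exact))
    return worst

rng = np.random.default_rng(0)
n, m, p, eps = 50, 2000, 1.5, 0.1
A = rng.uniform(-1.0, 1.0, size=(n, m))   # simple functions with L <= 1
mu = np.full(m, 1.0 / m)                  # uniform cell masses
d = 20_000                                # far below the worst-case bound of Prop. 2.1
Y = sampled_embedding(A, mu, p, d, rng)
print(worst_discrepancy(A, mu, Y, p))     # typically well below eps
```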
The additive version of the Johnson–Lindenstrauss lemma, first observed in [31] as a consequence of a deep matrix deviation inequality (see also [44, Chap. 9]), asserts that for every n-point subset \(\mathcal {X}=\{x_1,\ldots ,x_n\}\) of a Hilbert space \(\mathcal {H}\) and every \(\varepsilon \in (0,1)\), there exist \(d\le \tfrac{C w(\mathcal {X})^2}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) such that
where \(w(\mathcal {X})\) is the mean width of \(\mathcal {X}\). We will now observe that the spherical symmetry of \({\textbf {B}}_{\ell _2}\) allows us to deduce a similar conclusion for points in \({\textbf {B}}_{\mathcal {H}}\) by removing the incompressibility assumption from Proposition 2.1 when \(p=2\). We shall use the standard notation \(L_p^N\) for the space \(L_p(\mu _N)\) where \(\mu _N\) is the normalized counting measure on the finite set \(\{1,\ldots ,N\}\), that is \(\Vert x\Vert _{L_p^N} {\mathop {=}\limits ^{\textrm{def}}}\big (\tfrac{1}{N}\sum _{k=1}^N |x_k|^p\big )^{1/p}\) for \(x=(x_1,\ldots ,x_N)\in \mathbb {R}^N\).
Observe that for \(0<p<q\le \infty \), we have \({\textbf {B}}_{L_q^N} \subseteq {\textbf {B}}_{L_p^N}\).
Corollary 2.3
There exists a universal constant \(C\in (0,\infty )\) such that the following statement holds. Fix \(n\in \mathbb {N}\) and let \(\{x_i\}_{i=1}^n\) be a family of vectors in \({\textbf {B}}_{\mathcal {H}}\) for some Hilbert space \(\mathcal {H}\). Then for every \(\varepsilon \in (0,1)\), there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{C(\log n)^3}{\varepsilon ^4}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) such that
Before proceeding to the derivation of (35) we emphasize that since the given points \(\{x_i\}_{i=1}^n\) belong to \({\textbf {B}}_\mathcal {H}\), Corollary 2.3 is formally weaker than the Johnson–Lindenstrauss lemma. However, we include it here since it differs from [21] in that the low-dimensional point set \(\{y_i\}_{i=1}^n\) is not obtained as an image of \(\{x_i\}_{i=1}^n\) under a typical low-rank matrix from a specific ensemble.
Proof of Corollary 2.3
Since any n-point subset \(\{x_1,\ldots ,x_n\}\) of \(\mathcal {H}\) embeds linearly and isometrically in \(L_2^n\), we assume that \(x_1,\ldots ,x_n\in {\textbf {B}}_{L_2^{n}}\). We will need the following claim.
Claim. Suppose that \(X_1,\ldots ,X_n\) are (not necessarily independent) random vectors, each uniformly distributed on the unit sphere \(\mathbb {S}^{n-1}\) of \(L_2^{n}\). Then, for some universal constant \(S\in (0,\infty )\),
Proof of the Claim
By a standard estimate of Schechtman and Zinn [43, Thm. 3], for a uniformly distributed random vector X on the unit sphere \(\mathbb {S}^{n-1}\) of \(L_2^{n}\), we have
for some absolute constants \(\gamma _1,\gamma _2\in (0,\infty )\). Let \(W{\mathop {=}\limits ^{\textrm{def}}}\max _{i\in \{1,\ldots ,n\}} \Vert X_i\Vert _{L_\infty ^{n}}\) and notice that
By the union bound, we have
Combining (38) and (39), we therefore get
Choosing \(K>\gamma _1\) such that \(K^2\gamma _2>1\), the exponent in the last integrand becomes negative, thus
for a large enough constant \(S\in (0,\infty )\) and the claim follows.
Now let \(U \in \mathcal {O}(n)\) be a uniformly chosen random rotation on \(\mathbb {R}^{n}\). The aforementioned claim shows that since \(\Vert x_i\Vert _{L_2^{n}}\le 1\) for every \(i\in \{1,\ldots ,n\}\), writing \(\hat{x}_i =\tfrac{x_i}{\Vert x_i\Vert _{L_2^{n}}}\), we have the estimate
Therefore, by (42) and Proposition 2.1 there exist a constant \(C\in (0,\infty )\) and a rotation \(U\in \mathcal {O}(n)\) such that for every \(\varepsilon \in (0,1)\) there exist \(d\le \tfrac{C(\log n)^3}{\varepsilon ^4}\) and points \(y_1,\ldots ,y_n\in \ell _2^d\) for which
Since \(\Vert Ua-Ub\Vert _{L_2^{n}} = \Vert a-b\Vert _{L_2^{n}}\) for every \(a,b\in L_2^n\), the conclusion follows from the elementary inequality \(|\alpha -\beta | \le \sqrt{|\alpha ^2-\beta ^2|}\), which holds for all \(\alpha ,\beta \in (0,\infty )\). \(\square \)
Remark 2.4
Fix \(p\in [1,\infty )\). The isometric embedding theorem of Ball [6] asserts that any n-point subset of \(\ell _p\) admits an isometric embedding into \(\ell _p^N\) where \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) +1\). Suppose, more generally, that \(n,N\in \mathbb {N}\) are such that N is polynomial in n. Considerations in the spirit of the proof of Corollary 2.3 (e.g. relying on [43]) then show that if \(x_1,\ldots ,x_n\) are independent uniformly random points in \({\textbf {B}}_{L_p^N}\), then the random set \(\{x_1,\ldots ,x_n\}\) is \(O(\log n)^{1/p}\)-incompressible. In other words, incompressibility is a generic property of random n-point subsets of \({\textbf {B}}_{L_p^N}\). On the other hand, a typical n-point subset of \({\textbf {B}}_{L_p^N}\) is known to be approximately a simplex due to work of Arias-de-Reyna, Ball, and Villa [4] and so, in particular, it can be bi-Lipschitzly embedded in \(O(\log n)\) dimensions.
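For comparison, we also record the two extremes of the incompressibility parameter (a straightforward observation). Pointwise \(\max _i|x_i|\le \big (\sum _{i=1}^n|x_i|^p\big )^{1/p}\), so every n-point subset \(\mathcal {S}\) of \(L_p(\mu )\) satisfies \(\mathcal {I}(\mathcal {S})\le \big (\sum _{i=1}^n\Vert x_i\Vert _{L_p(\mu )}^p\big )^{1/p}\le n^{1/p}\max _i\Vert x_i\Vert _{L_p(\mu )}\), and this bound is attained: the rescaled coordinate vectors \(N^{1/p}e_1,\ldots ,N^{1/p}e_n\) in \({\textbf {B}}_{L_p^N}\) (for any \(N\ge n\)) satisfy \(\mathcal {I}(\mathcal {S})=n^{1/p}\). At the other extreme, any family contained in \({\textbf {B}}_{L_\infty (\mu )}\) is 1-incompressible. The generic \(O(\log n)^{1/p}\) behavior described in Remark 2.4 thus lies much closer to the favorable end of this range.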
2.3 Factorization and Proof of Theorem 1.2
Observe that Proposition 2.1 is rather non-canonical as the conclusion depends on the pairwise distances between the points \(\{x_i\}_{i=1}^n\) in \(L_p(\mu )\) whereas the bound on the dimension depends on \(L=\max _i \Vert x_i\Vert _{L_\infty (\mu )}\). In order to deduce Theorem 1.2 from this (a priori weaker) statement we shall leverage the fact that Proposition 2.1 holds for any probability measure \(\mu \) by optimizing this parameter L over all lattice-isomorphic images of \(\{x_i\}_{i=1}^n\). The optimal such change of measure which allows us to replace L by \(\Vert \max _i |x_i|\Vert _{L_p(\mu )}\) is a special case of a classical factorization theorem of Maurey (see [34] or [23, Thm. 5] for the general statement), whose short proof we include for completeness.
Proposition 2.5
Fix \(n\in \mathbb {N}\), \(p\in (0,\infty )\), and a probability space \((\Omega ,\mu )\). For all points \(x_1,\ldots ,x_n\in L_p(\mu )\), there exists a nonnegative density function \(f:\Omega \rightarrow \mathbb {R}_+\) supported on the support of \(\max _i|x_i|\) such that if \(\nu \) is the probability measure on \(\Omega \) given by \(\tfrac{\mathop {}\!\textrm{d}\nu }{\mathop {}\!\textrm{d}\mu }=f\), then
Proof
Let \(V=\textrm{supp}(\max _i |x_i|)\subseteq \Omega \) and define the change of measure f as
Then, (44) is elementary to check. \(\square \)
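For concreteness, write \(F{\mathop {=}\limits ^{\textrm{def}}}\max _i|x_i|\). One admissible choice of density (presumably the one intended in (45)) is \(f=\tfrac{F^p}{\int _\Omega F^p\mathop {}\!\textrm{d}\mu }\,{\textbf {1}}_V\). With this choice, for every \(g\in L_p(V,\mu )\) we have \(\int _\Omega |f^{-1/p}g|^p\mathop {}\!\textrm{d}\nu =\int _V |g|^p\mathop {}\!\textrm{d}\mu \), so that \(g\mapsto f^{-1/p}g\) is an isometry, while \(\nu \)-almost everywhere \(|f^{-1/p}x_i|\le f^{-1/p}F=\Vert F\Vert _{L_p(\mu )}\). In other words, the images of the \(x_i\) are bounded in \(L_\infty (\nu )\) by \(\Vert \max _i|x_i|\Vert _{L_p(\mu )}\), which is exactly what is needed below in order to invoke Proposition 2.1 with \(L\le K\).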
We are now ready to complete the proof of Theorem 1.2.
Proof of Theorem 1.2
Fix a K-incompressible family of vectors \(x_1,\ldots ,x_n\in L_p(\Omega ,\mu )\) and let \(V=\textrm{supp}( \max _i |x_i|)\subseteq \Omega \). Denote by \(f:\Omega \rightarrow \mathbb {R}_+\) the change of density from Proposition 2.5. If \(\tfrac{\mathop {}\!\textrm{d}\nu }{\mathop {}\!\textrm{d}\mu }=f\), then the linear operator \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) given by \(Tg = f^{-1/p}g\) is (trivially) a linear isometry. Therefore, Proposition 2.1 and (44) show that there exist \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) and points \(y_1,\ldots ,y_n\in \ell _p^d\) such that the condition
is satisfied for every \(i,j\in \{1,\ldots ,n\}\). This concludes the proof of Theorem 1.2.
Remark 2.6
A careful inspection of the proof of Theorem 1.2 reveals that the low-dimensional points \(\{y_i\}_{i=1}^n\) can be obtained as images of the given points \(\{x_i\}_{i=1}^n\) under a linear transformation. Indeed, starting from a K-incompressible family of points \(\{x_i\}_{i=1}^n\) in \(L_p(\Omega ,\mu )\), we use Proposition 2.5 to find a change of measure \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) such that \(\{Tx_i\}_{i=1}^n\) satisfy the stronger assumption of Proposition 2.1. Then, for some \(d\in \mathbb {N}\) with \(d\le \tfrac{32e^2(2K)^{2p}\log n}{\varepsilon ^2}\) we find pairwise disjoint measurable subsets \(S_1,\ldots ,S_d\) of \(\Omega \), each with positive measure, such that if \(S:L_p(\Omega ,\nu )\rightarrow \ell _p^d\) is the linear map
then the points \(\{y_i\}_{i=1}^n = \{(S\circ T)x_i\}_{i=1}^n\subseteq \ell _p^d\) satisfy the desired conclusion (6).
We conclude this section by observing that the argument leading to Theorem 1.2 is constructive.
Corollary 2.7
In the setting of Theorem 1.2, there exists a greedy algorithm which receives as input the high-dimensional points \(\{x_i\}_{i=1}^n\) and produces as output the low-dimensional points \(\{y_i\}_{i=1}^n\).
Proof
As the density (45) is explicitly defined, the linear operator \(T:L_p(V,\mu )\rightarrow L_p(\Omega ,\nu )\) can also be efficiently constructed. On the other hand, in order to construct the operator S defined by (47) one needs to find the corresponding partition \(\{S_1,\ldots ,S_d\}\) and this was achieved in Proposition 2.1 via an application of Maurey’s sampling lemma to the cone \(\mathcal {C}_p \subseteq \ell _\infty ^{N}\) where \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) \). As \(\ell _\infty ^{N}\) is e-isomorphic to the 2-uniformly smooth space \(\ell _{\log N}^{N}\), Ivanov’s result from [18] implies that the construction can be implemented by a greedy algorithm. \(\square \)
Analysis of the algorithm. The only nontrivial algorithmic task in our dimensionality reduction result is the implementation of Maurey’s approximate Carathéodory theorem. In the special case of \(\ell _p\) spaces, various constructive proofs of Maurey’s lemma are known [7, 11, 35], each of which allows for an analysis of the algorithm’s running time. Assume that the initial points \(x_1,\ldots ,x_n\in {\textbf {B}}_{\ell _p^m}\) for some finite m. Implementing, for instance, the mirror descent algorithm of [35, Thm. 3.5] on the convex hull of \(\mathcal {M}(y(1)),\ldots ,\mathcal {M}(y(m))\) appearing in the proof of Theorem 1.2, the corresponding indices \(k_1,\ldots ,k_d\) can be produced in time \(O(m n^2 \log n /\varepsilon ^2)\). Therefore, assuming that the points \(x_1,\ldots ,x_n\) a priori lie in a \(\textrm{poly}(n)\)-dimensional space (as is reasonable by Ball’s embedding theorem), the output points \(y_1,\ldots ,y_n\) can be constructed in time \(\textrm{poly}(n,1/\varepsilon )\).
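To illustrate the greedy selection principle in code, we include a simple Frank–Wolfe-type sketch; it is neither the mirror descent scheme of [35] nor Ivanov’s algorithm [18], it works in the Euclidean norm for simplicity (so its error after d steps only decays roughly like \(d^{-1/2}\) up to logarithmic factors), and all names and toy data are ours.

```python
import numpy as np

def greedy_average(atoms, weights, d):
    """Greedily select d atoms (rows of `atoms`) whose plain average
    approximates z = sum_k weights[k] * atoms[k]; the iterate u is kept
    as an unweighted average, as required by Maurey-type arguments."""
    z = weights @ atoms
    u = np.zeros_like(z)
    chosen = []
    for t in range(d):
        gradient = u - z                        # gradient of 0.5 * ||u - z||_2^2
        k = int(np.argmin(atoms @ gradient))    # linear minimization over the atoms
        chosen.append(k)
        u = (t * u + atoms[k]) / (t + 1)        # u = average of the chosen atoms
    return chosen, u

rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 1.0, size=(500, 1000))
weights = np.full(500, 1.0 / 500)
chosen, u = greedy_average(atoms, weights, d=200)
print(np.max(np.abs(u - weights @ atoms)))       # sup-norm error of the approximation
```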
3 Proof of Theorem 1.6
In this section we prove Theorem 1.6. The constructed subset of \({\textbf {B}}_{\ell _p}\) which does not embed linearly into \(\ell _p^d\) for small d is a slight modification of the one considered in [29].
Proof of Theorem 1.6
Fix \(m\in \mathbb {N}\) and denote by \(\{w_i\}_{i=1}^{2^m}\) the rows of the \(2^m\times 2^m\) Walsh matrix and by \(\{e_i\}_{i=1}^{2^m}\) the coordinate basis vectors of \(\mathbb {R}^{2^m}\). Consider the n-point set
where \(n=2^{m+1}+1\) and suppose that \(T:\ell _p^{2^m} \rightarrow \ell _p^d\) is a linear operator such that
Assume first that \(1\le p<2\). If we write \(w_i = \sum _{j=1}^{2^m} w_i(j) e_j\) then by orthogonality of \(\{w_i\}_{i=1}^{2^m}\),
By assumption (49) on T, we have
and
Combining (50), (51), and (52) we deduce that
which is equivalent to \(d\ge \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{2-p} 2^m = \left( \tfrac{\omega (1)}{\Omega (1)}\right) ^\frac{2p}{|p-2|} \cdot \tfrac{n-1}{2}\). The case \(p>2\) is treated similarly.
Remark 3.1
The point set \(\mathcal {S}_{n,p}\) considered in the proof of Theorem 1.6 for \(p\ne 2\) is \(O(n^{1/p})\)-incompressible and does not admit a linear \(\tfrac{1}{2}\)-isometric embedding in fewer than \(\Omega (n)\) dimensions. This shows that the dimension of the linear embedding exhibited in Theorem 1.2 has to be of order at least \(\Omega (K^p)\) up to lower order terms. This should be compared with the \(O(K^{2p}\log n)\) upper bound of Theorem 1.2.
Notes
1. The terminology is borrowed from the standard use of the term “incompressible vector” in random matrix theory, which refers to points on the unit sphere of \(\mathbb {R}^n\) which are far from the coordinate vectors \(e_1,\ldots ,e_n\).
2. This relation between the parameters n, N is natural as any n-point subset of \(\ell _p\) embeds isometrically in \(\ell _p^N\) with \(N=\left( {\begin{array}{c}n\\ 2\end{array}}\right) +1\) by Ball’s isometric embedding theorem [6].
References
Achlioptas, D.: Database-friendly random projections: Johnson–Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 671–687 (2003)
Ailon, N., Chazelle, B.: The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39(1), 302–322 (2009)
Ailon, N., Liberty, E.: An almost optimal unrestricted fast Johnson–Lindenstrauss transform. ACM Trans. Algorithms 9(3), # 21 (2013)
Arias-de Reyna, J., Ball, K., Villa, R.: Concentration of the distance in finite-dimensional normed spaces. Mathematika 45(2), 245–252 (1998)
Arriaga, R.I., Vempala, S.: An algorithmic theory of learning: robust concepts and random projection. In: 40th Annual Symposium on Foundations of Computer Science (New York 1999), pp. 616–623. IEEE Computer Society, Los Alamitos (1999)
Ball, K.: Isometric embedding in \(l_p\)-spaces. Eur. J. Combin. 11(4), 305–311 (1990)
Barman, S.: Approximating Nash equilibria and dense subgraphs via an approximate version of Carathéodory’s theorem. SIAM J. Comput. 47(3), 960–981 (2018)
Bartal, Y., Gottlieb, L.-A.: Approximate nearest neighbor search for \(\ell _p\)-spaces \((2<p<\infty )\) via embeddings. Theor. Comput. Sci. 757, 27–35 (2019)
Bourgain, J., Dirksen, S., Nelson, J.: Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geom. Funct. Anal. 25(4), 1009–1088 (2015)
Brinkman, B., Charikar, M.: On the impossibility of dimension reduction in \(l_1\). J. ACM 52(5), 766–788 (2005)
Combettes, C.W., Pokutta, S.: Revisiting the approximate Carathéodory problem via the Frank–Wolfe algorithm. Math. Program. 197(1), 191–214 (2023)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)
Dirksen, S.: Dimensionality reduction with subgaussian matrices: a unified theory. Found. Comput. Math. 16(5), 1367–1396 (2016)
Frankl, P., Maehara, H.: The Johnson–Lindenstrauss lemma and the sphericity of some graphs. J. Combin. Theory Ser. B 44(3), 355–362 (1988)
Gordon, Y.: On Milman’s inequality and random subspaces which escape through a mesh in \({R}^n\). In: Geometric Aspects of Functional Analysis (1986/1987). Lecture Notes in Mathematics, vol. 1317, pp. 84–106. Springer, Berlin (1988)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: 30th Annual ACM Symposium on Theory of Computing (Dallas 1998), pp. 604–613. ACM, New York (1999)
Indyk, P., Naor, A.: Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms 3(3), # 31 (2007)
Ivanov, G.: Approximate Carathéodory’s theorem in uniformly smooth Banach spaces. Discrete Comput. Geom. 66(1), 273–280 (2021)
Jacques, L.: A quantized Johnson–Lindenstrauss lemma: the finding of Buffon’s needle. IEEE Trans. Inf. Theory 61(9), 5012–5027 (2015)
Jacques, L.: Small width, low distortions: quantized random embeddings of low-complexity sets. IEEE Trans. Inf. Theory 63(9), 5477–5495 (2017)
Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Conference in Modern Analysis and Probability (New Haven 1982). Contemporary Mathematics, vol. 26, pp. 189–206. American Mathematical Society, Providence (1984)
Johnson, W.B., Naor, A.: The Johnson–Lindenstrauss lemma almost characterizes Hilbert space, but not quite. Discrete Comput. Geom. 43(3), 542–553 (2010)
Johnson, W.B., Schechtman, G.: Finite dimensional subspaces of \(L_p\). In: Handbook of the Geometry of Banach Spaces, vol. I, pp. 837–870. North-Holland, Amsterdam (2001)
Kane, D.M., Nelson, J.: Sparser Johnson–Lindenstrauss transforms. J. ACM 61(1), # 4 (2014)
Kashin, B., Kosov, E., Limonova, I., Temlyakov, V.: Sampling discretization and related problems. J. Complex. 71, # 101653 (2022)
Klartag, B., Mendelson, S.: Empirical processes and random projections. J. Funct. Anal. 225(1), 229–245 (2005)
Krahmer, F., Ward, R.: New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal. 43(3), 1269–1281 (2011)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete, vol. 23. Springer, Berlin (1991)
Lee, J.R., Mendel, M., Naor, A.: Metric structures in \(L_1\): dimension, snowflakes, and average distortion. Eur. J. Combin. 26(8), 1180–1190 (2005)
Lee, J.R., Naor, A.: Embedding the diamond graph in \(L_p\) and dimension reduction in \(L_1\). Geom. Funct. Anal. 14(4), 745–747 (2004)
Liaw, C., Mehrabian, A., Plan, Y., Vershynin, R.: A simple tool for bounding the deviation of random matrices on geometric sets. In: Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol. 2169, pp. 277–299. Springer, Cham (2017)
Matoušek, J.: On the distortion required for embedding finite metric spaces into normed spaces. Israel J. Math. 93, 333–344 (1996)
Matoušek, J.: On variants of the Johnson–Lindenstrauss lemma. Random Struct. Algorithms 33(2), 142–156 (2008)
Maurey, B.: Théorèmes de factorisation pour les opérateurs linéaires à valeurs dans les espaces \(L^{p}\). Astérisque, vol. 11. Société Mathématique de France, Paris (1974)
Mirrokni, V., Leme, R.P., Vladu, A., Wong, S.-C.: Tight bounds for approximate Carathéodory and beyond. In: 34th International Conference on Machine Learning (Sydney 2017). Proceedings of Machine Learning Research, vol. 70, pp. 2440–2448 (2017). http://JMLR.org
Naor, A.: Metric dimension reduction: a snapshot of the Ribe program. In: International Congress of Mathematicians (Rio de Janeiro 2018), vol. I. Plenary Lectures, pp. 759–837. World Scientific, Hackensack (2018)
Naor, A., Pisier, G., Schechtman, G.: Impossibility of dimension reduction in the nuclear norm. Discrete Comput. Geom. 63(2), 319–345 (2020)
Ostrovskii, M.I.: Metric Embeddings: Bilipschitz and Coarse Embeddings into Banach Spaces. De Gruyter Studies in Mathematics, vol. 49. De Gruyter, Berlin (2013)
Pisier, G.: Remarques sur un résultat non publié de B. Maurey. In: Seminar on Functional Analysis, 1980–1981, # 5. École Polytech., Palaiseau (1981)
Plan, Y., Vershynin, R.: Dimension reduction by random hyperplane tessellations. Discrete Comput. Geom. 51(2), 438–461 (2014)
Regev, O., Vidick, T.: Bounds on dimension reduction in the nuclear norm. In: Geometric Aspects of Functional Analysis, vol. II. Lecture Notes in Mathematics, vol. 2266, pp. 279–299. Springer, Cham (2020)
Schechtman, G.: Two observations regarding embedding subsets of Euclidean spaces in normed spaces. Adv. Math. 200(1), 125–135 (2006)
Schechtman, G., Zinn, J.: On the volume of the intersection of two \(L^n_p\) balls. Proc. Am. Math. Soc. 110(1), 217–224 (1990)
Vershynin, R.: High-Dimensional Probability. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press, Cambridge (2018)
Acknowledgements
I am grateful to Keith Ball, Assaf Naor, and Pierre Youssef for insightful discussions and useful feedback.
Additional information
Editor in Charge: Kenneth Clarkson
The author was supported by a Junior Research Fellowship from Trinity College, Cambridge. A conference version of this article will be presented in SoCG 2022.
Eskenazis, A. \(\varepsilon \)-Isometric Dimension Reduction for Incompressible Subsets of \(\ell _p\). Discrete Comput Geom 71, 160–176 (2024). https://doi.org/10.1007/s00454-023-00587-w