0% found this document useful (0 votes)

4 views

Phase Retrieval From Coded Diffraction Patterns: Emmanuel J. Cand' Es Xiaodong Li Mahdi Soltanolkotabi November 7, 2013

Uploaded by

jennyjhlee0621

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views

Phase Retrieval From Coded Diffraction Patterns: Emmanuel J. Cand' Es Xiaodong Li Mahdi Soltanolkotabi November 7, 2013

Uploaded by

jennyjhlee0621

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

Phase Retrieval from Coded Diffraction Patterns

Emmanuel J. Candès∗ Xiaodong Li† Mahdi Soltanolkotabi‡

November 7, 2013
arXiv:1310.3240v2 [cs.IT] 6 Nov 2013

Abstract
This paper considers the question of recovering the phase of an object from intensity-only
measurements, a problem which naturally appears in X-ray crystallography and related disci-
plines. We study a physically realistic setup where one can modulate the signal of interest and
then collect the intensity of its diffraction pattern, each modulation thereby producing a sort
of coded diffraction pattern. We show that PhaseLift, a recent convex programming technique,
recovers the phase information exactly from a number of random modulations, which is poly-
logarithmic in the number of unknowns. Numerical experiments with noiseless and noisy data
complement our theoretical analysis and illustrate our approach.

1 Introduction
1.1 The phase retrieval problem
In many areas of science and engineering, we only have access to magnitude measurements; for
instance, it is far easier for detectors to record the modulus of the scattered radiation than to
measure its phase. Imagine then that we have a discrete object x ∈ Cn , and that we would like to
measure ⟨ak , x⟩ for some sampling vectors ak ∈ Cn but only have access to phaseless measurements
of the form
yk = ∣⟨ak , x⟩∣2 , k = 1, . . . , m. (1.1)
The phase retrieval problem is that of recovering the missing phase of the data ⟨ak , x⟩. Once this
information is available, one can find the vector x by essentially solving a system of linear equations.
The quintessential phase retrieval problem, or phase problem for short, asks to recover a signal
from the modulus of its Fourier transform. This comes from the fact that in coherent X-ray
imaging, it follows from the Fraunhofer diffraction equation that the optical field at the detector
is well approximated by the Fourier transform of the object of interest. Since photographic plates,
CCDs and other light detectors can only measure light intensity, the problem is then to recover
x = {x[t]}n−1 n
t=0 ∈ C from measurements of the type

n−1 2
yk = ∣ ∑ x[t]e−i2πωk t ∣ , ωk ∈ Ω, (1.2)
t=0

∗
Departments of Mathematics and of Statistics, Stanford University, Stanford CA
†
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
‡
Department of Electrical Engineering, Stanford University, Stanford CA

1
where Ω is a sampled set of frequencies in [0, 1] (we stated the problem in one dimension to
simplify matters). We thus recognize an instance of (1.1) in which the vectors ak are sampled
values of complex sinusoids. X-ray diffraction images are of this form, and as is well known,
permitted the discovery of the double helix [55]. In addition to X-ray crystallography [33, 39],
the phase problem has numerous other applications in the imaging sciences such as diffraction and
array imaging [18, 24], optics [54], speckle imaging in astronomy [26], and microscopy [38]. Other
areas where related problems appear include acoustics [12, 8], blind channel estimation in wireless
communications [4, 45], interferometry [28], quantum mechanics [25, 47] and quantum information
[34].

1.2 Convex relaxation

Previous work [20, 24] suggested to bring convex programming techniques to bear on the phase
retrieval problem. Returning to the general formulation (1.1), the phase problem asks to recover
x ∈ Cn subject to data constraints of the form

tr(ak a∗k xx∗ ) = yk , k = 1, . . . , m,

where tr(X) is the trace of the matrix X. The idea is then to lift the problem in higher dimensions:
introducing the Hermitian matrix variable X ∈ S n×n , the phase problem is equivalent to finding X
obeying
X ⪰ 0, rank(X) = 1, tr(ak a∗k X) = yk for k = 1, . . . , m (1.3)
where, here and below, X ⪰ 0 means that X is positive semidefinite. This problem is not tractable
and, by dropping the rank constraint, is relaxed into

minimize tr(X)
subject to X ⪰0 (1.4)
tr(ak a∗k X) = yk , k = 1, . . . , m.

PhaseLift (1.4) is a semidefinite program (SDP). If its solution happens to have rank one and is
equal to xx∗ , then a simple factorization recovers x up to a global phase/sign.
We pause to emphasize that in different contexts, similar convex relaxations for optimizing
quadratic objectives subject to quadratic constraints are known as Schor’s semidefinite relaxations,
see [41, Section 4.3] and [30] on the MAXCUT problem from graph theory for spectacular applica-
tions of these ideas. For related convex relaxations of quadratic problems, we refer the interested
reader to the wonderful tutorial [52].

1.3 This paper

Numerical experiments [20] together with emerging theory suggest that the PhaseLift approach is
surprisingly effective. On the theoretical side, starting with [23], a line of work established that if
the sampling vectors ak are sufficiently randomized, then the convex relaxation is provably exact.
Assuming that the ak ’s are independent random (complex-valued) Gaussian vectors, [23] shows
that on the order of n log n quadratic measurements are sufficient to guarantee perfect recovery
via (1.4) with high probability. A subset of the authors [21] reached the same conclusion from on
the order of n equations only, by solving the SDP feasibility problem; to be sure, [21] establishes
that the set of matrices obeying the constraints in (1.4) reduces to a unique point namely, xx∗ ,

2
diffraction patterns
source
sample phase plate

Figure 1: An illustrative setup for acquiring coded diffraction patterns.

see [27] for a similar result.1 Finally, inspired by PhaseLift and the famous MAXCUT relaxation
of Goemans and Williamson, [53] proposed another semidefinite relaxation called PhaseCut whose
performance from noiseless data—in terms of the number of samples needed to achieve perfect
recovery—turns out to be identical to that of PhaseLift.
While this is all reassuring, the problem is that the Gaussian model, in which each measurement
gives us the magnitude of the dot product ∑n−1 t=0 x[t]ak [t] between the signal and (complex-valued)
Gaussian white noise, is very far from the kind of data one can collect in an X-ray imaging and
many related experiments. The purpose of this paper is to show that the PhaseLift relaxation is
still exact in a physically inspired setup where one can modulate the signal of interest and then let
diffraction occur.

1.4 Coded diffraction patterns

Imagine then that we modulate the signal before diffraction. Letting d[t] be the modulating
waveform, we would observe the diffraction pattern

n−1 2
¯ −i2πωk t ∣ ,
yk = ∣ ∑ x[t]d[t]e ωk ∈ Ω. (1.5)
t=0

We call this a coded diffraction pattern (CDP) since it gives us information about the spectrum
of {x[t]} modulated by the code {d[t]}. There are several ways of achieving modulations of this
type: one can use a phase mask just after the sample, see Figure 1, or use an optical grating to
modulate the illumination beam as mentioned in [37], or even use techniques from ptychography
which scan an illumination patch on an extended specimen [48, 50]. We refer to [20] for a more
thorough discussion of such approaches.
In this paper, we analyze such a data collection scheme in which one uses multiple modulations.
Our model for data acquisition is thus as follows:

n−1 2
0≤k ≤n−1
y`,k = ∣ ∑ x[t]d¯` [t]e−i2πkt/n ∣ , . (1.6)
t=0 1≤`≤L
1
[21] also establishes near-optimal estimation bounds from noisy data.

3
In words, we collect the magnitude of the discrete Fourier transform (DFT) of L modulations of
the signal x. In matrix notation, letting D` be the diagonal matrix with the modulation pattern
d` [t] on the diagonal and fk∗ be the rows of the DFT, we observe

y`,k = ∣fk∗ D`∗ x∣2 .

We prove that if we use random modulation patterns (random waveforms d[t]), then the solution
to (1.4) is exact with high probability provided that we have sufficiently many CDPs. In fact, we
will see that the feasible set in (1.4) equal to

{X ∶ X ⪰ 0 and A(X) = y} (1.7)

reduces to a single point xx∗ . Above A ∶ S n×n → Rm=nL (S n×n is the space of self-adjoint matrices)
is the linear mapping giving us the linear equalities in (1.4),

A(X) = {fk∗ D`∗ XD` fk }`,k = {tr(D` fk fk∗ D`∗ X)}`,k .

1.5 Main result

Our model assumes random modulations and we work with diagonal matrices D` , 1 ≤ ` ≤ L, which
are i.i.d. copies of a matrix D, whose entries are themselves i.i.d. copies of a random variable d.
Throughout, we assume that d is symmetric, obeys ∣d∣ ≤ M as well as the moment conditions

E d = 0, E d2 = 0, E ∣d∣4 = 2 E ∣d∣2 . (1.8)

A random variable obeying these assumptions is said to be admissible. The reason why we can
have E d2 = 0 while d ≠ 0 is that d is complex valued. An example of an admissible random variable
is d = b1 b2 , where b1 and b2 are independent and distributed as
⎧ 1
⎪
⎪1 with prob. 4
⎪
⎪ ⎧
⎪ 1 4
⎪
⎪−1 with prob. 4 ⎪1
⎪ with prob. 5
b1 = ⎨ 1
and b2 = ⎨√ 1
. (1.9)
⎪
⎪
⎪−i with prob. 4 ⎩ 6
⎪
⎪ with prob. 5
⎪
⎪
⎪ 1
⎩i
⎪ with prob. 4

We would like to emphasize that we impose E[d2 ] = 0 mostly to simplify our exposition. In fact,
the conclusion of Theorem 1.1 below remains valid if E[d2 ] ≠ 0, although we do not prove this in
this paper. In particular, we can also work with d distributed as
⎧ 1
⎪
⎪1 with prob. 4
⎪
⎪
d = ⎨0 1 . (1.10)
with prob. 2
⎪
⎪
⎪ 1
⎪
⎩−1 with prob. 4

Theorem 1.1 Suppose that the modulation is admissible and that the number L of coded diffraction
patterns obeys
L ≥ c ⋅ log4 n,
for some fixed numerical constant c. Then with probability at least 1 − 1/n, the feasibility problem
(1.7) reduces to a unique point, namely, xx∗ , and thus recovers x up to a global phase shift. For
γ ≥ 1, setting L ≥ cγ log4 n leads to a probability of success at least 1 − n−γ .

4
Thus, in a stylized physical setting, it is possible to recover an arbitrary signal from a fairly limited
number of coded diffraction patterns by solving an SDP feasibility problem. As mentioned earlier,
the equivalence from [53] implies that our theoretical guarantees automatically carry over to the
PhaseCut formulation.
Mathematically, the phase recovery problem is different than that in which the sampling vectors
are Gaussian as in [20]. The reason is that the measurements in Theorem 1.1 are far more structured
and far ‘less random’. Loosely speaking, our random modulation model uses on the order of m ∶= nL
random bits whereas the Gaussian model with the same number of quadratic equations would use
on the order of mn random bits (this can be formalized by using the notion of entropy from
information theory). A consequence of this difference is that the proof of the theorem requires new
techniques and ideas. Having said this, an open and interesting research direction is to close the
gap—remove the log factors—and show whether or not perfect recovery can be achieved from a
number of coded diffraction patterns independent of dimension.
The first version of this paper was made publicly available at the same time as [32], which begins
to study the performance of PhaseLift from non-Gaussian sampling vectors. There, the authors
study sampling schemes from certain finite vector configurations, dubbed t-designs. These models
are different from ours and do not include our coded diffraction patterns as a special case. Hence,
our results are not comparable. Having said this, there are similarities in the proof techniques,
especially in the role played by the robust injectivity property, compare our Lemma 3.7 from
Section 3.3 with [32, Section 3.3].

1.6 Other approaches to phase retrieval and related works

There are of course other approaches to phase retrieval and we mention some literature for com-
pleteness and to inform the interested reader of recent progress in this area. Balan [8] studies a
problem where the sampling vectors model a short-time Fourier transform. Balan, Casazza and
Edidin [13] formulate the phase retrieval problem as nonconvex optimization problem. In [12], the
same authors [12] describe some applications of the phase problem in signal processing and speech
analysis and presents some necessary and sufficient conditions which guarantee that the solution
to (1.1) is unique. Other articles studying the minimal number of frame coefficient magnitudes
for noiseless recovery include [19, 5, 15, 40, 10]. We recommend the two blog posts [2] and [3] by
Mixon and the references therein for a comprehensive review and discussion of such results. Lower
bounds on the performance of any recovery method from noisy data are studied in [15, 14, 29].
On the algorithmic side, [7] proposes a nonlinear scheme for phase retrieval having exponential
time complexity in the dimension of the signal x while [11] presents a tractable algorithm requiring
a number of measurements at least quadratic in the dimension of the signal; that is to say, m ≥ c⋅n2
for some constant c > 0.
We also wish to mention some recent works aiming at presenting efficient reconstruction al-
gorithms for generic frames (for certain types of sampling vectors such as those in [5, 16, 46] fast
implementations already exist) and give two references. The first [9] introduces an iterative regular-
ized least-square algorithm and establishes convergence of the algorithm. It is however not known
whether this algorithm enjoys accurate reconstruction guarantees. The second [42] is recent and
studies an alternative minimization approach for phase retrieval, which yields very accurate but
not exact reconstructions from a number of measurements that needs to be at least on the order
of n log n3 Gaussian measurements.
There also is a recent body of work studying the phase retrieval under sparsity assumptions

5
about the signal we wish to recover, see [49, 43, 36, 35, 44] as well as the references therein. Finally,
a different line of work [5, 16] studies the phase retrieval by polarization, see also [46] for a related
approach. This technique comes with an algorithm that can achieve recovery using on the order
of log n specially constructed masks/codes in the noiseless case. However, to the extent of our
knowledge, PhaseLift offers more flexibility in terms of the number and types of masks that can be
used since it can be applied regardless of the data acquisition scheme. In addition, when dealing
with noisy data PhaseLift behaves very well, see Section 2 below and the experiments in [16].

2 Numerical Experiments
In this section we carry out some simple numerical experiments to show how the performance of
the algorithm depends on the number of measurements/masks and how the algorithm is affected
by noise. To solve the optimization problems below we use Auslender and Teboulle [6] sub-gradient
optimization method with a solver written in the framework provided by TFOCS [17] (The code
is available online at [1]). The stopping criterion is when the Frobenius norm of the relative error
of the objective between two subsequent iterations falls below 10−10 or the number of iterations
reaches 50, 000, whichever occurs first. Before presenting the results we introduce the signal and
measurement models we use.

2.1 Signal models

We consider two signal models:

• Random low-pass signals. Here, x is given by

M /2
x[t] = ∑ (Xk + iYk )e2πi(k−1)(t−1)/n ,
k=−(M /2−1)

with M = n/8 and Xk and Yk are i.i.d. N (0, 1).

• Random Gaussian signals. In this model, x ∈ Cn is a random complex Gaussian vector with
i.i.d. entries of the form x[t] = X + iY with X and Y distributed as N (0, 1); this can be
expressed as
n/2
x[t] = ∑ (Xk + iYk )e2πi(k−1)(t−1)/n ,
k=−(n/2−1)

where Xk and Yk are are i.i.d. N (0, 1/8) so that the low-pass model is a ‘bandlimited’ version
of this high-pass random model (variances are adjusted so that the expected power is the
same).

2.2 Measurement models

We perform simulations based on four different kinds of measurements:

• Gaussian measurements. We sample m = nL random complex Gaussian vectors ak and use

measurements of the form ∣a∗k x∣2 .

6
• Binary modulations/codes. We sample (L − 1) binary codes distributed as
⎧ 1
⎪1
⎪ with prob. 2
d=⎨ 1
⎩0 with prob.
⎪
⎪ 2

together with a regular diffraction pattern (d[t] = 1 for all t).

• Ternary modulations/codes. We sample (L − 1) ternary codes distributed as (1.10) together

with a regular diffraction pattern.

• Octanary modulations/codes. Here, the codes are distributed as (1.9).

2.3 Phase transitions

We carry out some numerical experiments to show how the performance of the algorithm depends
on the number of measurements/coded patterns. For this purpose we consider 50 trials. In each
trial we generate a random complex vector x ∈ Cn (with n = 128) from both signal models and
gather data according to the four different measurement models above. For each trial we solve the
following optimization problem
1
min ∥b − A(X)∥2`2 + λtr(X) subject to X ⪰ 0 (2.1)
2
with λ = 10−3 (Note that the solution to (2.1) as λ tends to zero will equal to the optimal solution
of (1.4)).
In Figure 2 we report the empirical probability of success for different signal and measurement
models with different number of measurements. We declare a trial successful if the relative error
of the reconstruction (∥X̂ − xx∗ ∥F / ∥xx∗ ∥F ) falls below 10−5 ). These plots suggest that for the
type of models studied in this paper six coded patterns are sufficient for exact recovery via convex
programming.

2.4 Noisy measurements

We now study how the performance of the algorithm behaves in the presence of noise. We con-
sider Poisson noise which is the usual noise model in optics. More, specifically we assume that
the measurements {yk }m k=1 is a sequence of independent samples from the Poisson distributions
∗ 2
Poi(µk ), where µk = ∣ak x∣ correspond to the noiseless measurements. The Poisson log-likelihood
for independent samples has the form ∑k yk log µk − µk (up to an additive constant factor). Follow-
ing a classical fitting approach we balance a maximum likelihood term with the trace norm in the
relaxation (1.4):

min ∑[µk − yk log µk ] + λtr(X) subject to µ = A(X) and X ⪰ 0.

The test signal is again a complex random signal sampled according to the two models described
in Section 2.1. We use eight CDP’s according to the three models described in Section 2.2. Poisson
noise is adjusted so that the SNR levels range from 10 to 50dB. Here, SNR = ∥A(xx∗ )∥`2 / ∥b − A(xx∗ )∥`2
is the signal-to-noise ratio. For the regularization parameter we use λ = 1/SNR. (In these exper-
iments, the value of SNR is known. The result, however, is rather insensitive to the choice of

7
random Gaussian signals random low-pass signals

1 1
Probability of success

0.8 0.8

0.6 0.6

0.4 0.4
binary masks binary masks
0.2 ternary masks 0.2 ternary masks
octanary masks octanary masks
0 Gaussian meas. 0 Gaussian meas.
4 5 6 7 8 9 10 4 5 6 7 8 9 10
L L

Figure 2: Empirical probability of success based on 50 random trials for different

signal/measurement models and a varied number of measurements. A value of L on
the x-axis means that we have a total of m = Ln samples.

the parameter λ and a good choice for the regularization parameter λ can be obtained by cross
validation.) For each SNR level we repeat the experiment ten times with different random noise
and different random CDP’s.
Figure 3 shows the average relative MSE (in dB) versus the SNR (also in dB). More precisely,
2 2
the values of 10 log10 (rel. MSE) are plotted, where rel. MSE = ∥X̂ − xx∗ ∥F / ∥X̂∥F . These figures
indicate that the performance of the algorithm degrades linearly as the SNR decreases (on a dB/dB
scale). Empirically, the slope is close to -1, which means that the MSE scales like the noise. Together
with the low offset, these features indicate that all is as in a well-conditioned-least squares problem.

3 Proofs
We prove our results in this section. Before we begin, we introduce some notation. We recall that
the random variable d is admissible, i.e. bounded i.e. ∣d∣ ≤ M , symmetric, and obeying moment
constraints

E d = 0, E d2 = 0, E ∣d∣4 = 2 E ∣d∣2 . (3.1)

Without loss of generality we also assume that E ∣d∣2 = 1. Throughout D is a diagonal matrix with
i.i.d. entries distributed as d. For a vector y ∈ Cn we use y T and y ∗ to denote the transpose and
complex conjugate of the vector y. We also use ȳ to denote elementwise conjugation of the entries
of y. Since this is less standard, we prefer to be concrete as to avoid ambiguity: for example,
T ∗
1+i 1+i 1+i 1−i
[ ] = [1 + i 1 + 2i] , [ ] = [1 − i 1 − 2i] , [ ]=[ ].
1 + 2i 1 + 2i 1 + 2i 1 − 2i

8
random Gaussian signals random low-pass signals

binary masks binary masks

−10 ternary masks −10 ternary masks
octanary masks octanary masks
Relative MSE (dB)

−20
−20

−30
−30
−40
−40
−50
−50
10 20 30 40 50 10 20 30 40 50
SNR (dB) SNR (dB)

Figure 3: SNR versus relative MSE on a dB-scale for different kinds of sig-
nal/measurement models. The linear relationship between SNR and MSE (on the dB
scale) is apparent. The MSE behaves as in a well-conditioned least-squares problem.

Continuing, ∥X∥ is the spectral or operator norm of a matrix X. Finally, 1 is a vector with all
entries equal to one.
Throughout, we assume that the fixed vector x we seek to recover is unit normed, i.e. ∥x∥`2 = 1.
Throughout T is the linear subspace

T = {X = xy ∗ + yx∗ ∶ y ∈ Cn }.

This subspace may be interpreted as the tangent space at xx∗ to the manifold of Hermitian matrices
of rank 1. Below T ⊥ is the orthogonal complement to T . For a linear subspace V of Hermitian
matrices, we use YV or PV (Y ) to denote the orthogonal projection of Y onto V . With this, the
reader will check that YT ⊥ = (I − xx∗ )Y (I − xx∗ ).

3.1 Preliminaries
It is useful to record two identities that shall be used multiple times, and defer the proofs to the
Appendix.

Lemma 3.1 For any fixed vector x ∈ Cn

1 ∗ 1 n 2
E( A A(xx∗ )) = E ( ∑ ∣fk∗ D ∗ x∣ Dfk fk∗ D ∗ ) = xx∗ + ∥x∥2`2 I.
nL n k=1

Lemma 3.2 For any fixed x ∈ Cn ,

1 n ∗ ∗ 2 T T
E( ∑ (fk D x) Dfk fk D) = 2xx .
n k=1

9
Next, we present two simple intermediate results we shall also use. The proofs are also in the
Appendix.
Lemma 3.3 Fix δ > 0 and suppose the number L of CDP’s obeys L ≥ c log n for some sufficiently
large numerical constant c. Then with probability at least 1 − 1/n2 ,
1 ∗
∥ A (1) − In ∥ ≤ δ.
nL
Lemma 3.4 For all positive semidefinite matrices X, it holds
1
∥A(X)∥`1 ≤ M 2 tr(X).
nL
Finally, the last piece of mathematics is the matrix Hoeffding inequality
Lemma 3.5 [51, Theorem 1.3] Let {S` }L
`=1 be a sequence of independent random n × n self-adjoint
matrices. Assume that each random matrix obeys

E S` = 0 and ∥S` ∥ ⪯ ∆ almost surely. (3.2)

Then for all t ≥ 0,

1 L Lt2
P( ∥ ∑ S` ∥ ≥ t) ≤ 2n exp(− 2 ). (3.3)
L `=1 8∆

3.2 Certificates
We now establish sufficient conditions guaranteeing that x∗ x is the unique feasible point of (1.7).
Variants of the lemma below have appeared before in the literature, see [23, 27, 21].
Lemma 3.6 Suppose the mapping A obeys the following two properties:
1. For all matrices X ∈ T
1 (1 − δ)
√ ∥A(X)∥`2 ≥ √ ∥X∥F . (3.4)
nL 2
2. There exists a self-adjoint matrix of the form Z = A∗ (λ), with λ real valued (this makes sure
that Z is self adjoint), obeying
1−δ
ZT ⊥ ⪯ −IT ⊥ and ∥ZT ∥F ≤ √ . (3.5)
2M 2 nL

Then x∗ x is the unique element in the feasible set (1.7).

Proof Suppose xx∗ + H is feasible. Feasibility implies that H is a self-adjoint matrix in the null
space of A and HT ⊥ ⪰ 0. This is because for all y ⊥ x,

y ∗ (xx∗ + H)y = y ∗ Hy ≥ 0,

which says that HT ⊥ is positive semidefinite. This gives

⟨H, Y ⟩ = 0 = ⟨HT , ZT ⟩ + ⟨HT ⊥ , ZT ⊥ ⟩.

10
On the one hand,
⟨HT , ZT ⟩ = −⟨HT ⊥ , ZT ⊥ ⟩ ≥ ⟨HT ⊥ , IT ⊥ ⟩ = tr(HT ⊥ ). (3.6)
Therefore,
1 1
tr(HT ⊥ ) ≥ ∥A(HT ⊥ )∥`1 ≥ ∥A(HT ⊥ )∥`2 .
M 2 nL M 2 nL
where the first inequality above follows from Lemma 3.4. The injectivity property (3.4) gives
1 (1 − δ)
√ ∥A(HT )∥`2 ≥ √ ∥HT ∥F
nL 2
and since A(HT ) = −A(HT ⊥ ), we established
1−δ
tr(HT ⊥ ) ≥ √ ∥HT ∥F . (3.7)
2nLM 2
On the other hand,
1−δ
∣⟨HT , ZT ⟩∣ ≤ ∥HT ∥F ∥ZT ∥F ≤ √ ∥HT ∥F . (3.8)
2M 2 nL
In summary, (3.6), (3.7) and (3.8) assert that HT = 0. In turn, this gives tr(HT ⊥ ) = 0 by (3.6),
which implies that HT ⊥ = 0 since HT ⊥ ⪰ 0. This completes the proof.
Property (3.4) can be viewed as a form of robust injectivity of the mapping A restricted to
elements in T . It is of course reminiscent of the local restricted isometry property in compressive
sensing. Property (3.5) can be interpreted as the existence of an approximate dual certificate. It
is well known that injectivity together with an exact dual certificate leads to exact reconstruction.
The above lemma essentially asserts that a robust form of injectivity together with an approximate
dual certificate leads to exact recovery as in [23, Section 2.1], see also [31]. In the next two sections
we show that the two properties stated in Lemma 3.6 above each hold with probability at least
1 − 1/(2n).

3.3 Robust injectivity

Lemma 3.7 Fix δ > 0 and suppose L obeys L ≥ c log3 n for some sufficiently large numerical
constant c. Then with probability at least 1 − 1/2n, for all X ∈ T ,
1 (1 − δ)
√ ∥A(X)∥`2 ≥ √ ∥X∥F .
nL 2
Proof First, notice that without loss of generality we can assume that x∗ y is real valued in the
definition of T . That is,

T = {X = xy ∗ + yx∗ ∶ y ∈ Cn and x∗ y ∈ R}.

The reason why this is true is that for any y ∈ Cn , we can find λ ∈ R, such that x∗ y − iλx∗ x =
x∗ (y − iλx) ∈ R while
x(y − iλx)∗ + (y − iλx)x∗ = xy ∗ + yx∗ ,
Now for any X = xy ∗ + yx∗ ∈ T ,

∥X∥F = ∥xy ∗ + yx∗ ∥F ≤ ∥xy ∗ ∥F + ∥yx∗ ∥F ≤ 2 ∥x∥`2 ∥y∥`2 = 2 ∥y∥`2 , (3.9)

11
where we recall that ∥x∥`2 = 1. Hence, it suffices to show that

1 (1 − δ)
√ ∥A(xy ∗ + yx∗ )∥`2 ≥ √ ∥y∥`2 . (3.10)
nL 2
We have
L n 2
∗ ∗ 2
∥A(xy + yx )∥`2 =∑∑ (fk∗ D`∗ (xy ∗ ∗
+ yx )D` fk ) .
`=1 k=1
(The reader might have expected a sum of squared moduli but since xy ∗ + yx∗ is self adjoint,
fk∗ D`∗ (xy ∗ + yx∗ )D` fk is real valued and so we can just as well use squares. For exposition
purposes, set
2 2
Ak (D) = ∣fk∗ D ∗ x∣ fk fk∗ , Bk (D) = (fk∗ D ∗ x) fk fkT .
A simple computation we omit yields
2 ∗
y D 0 Ak (D) Bk (D) D ∗ 0 y
(fk∗ D ∗ xy ∗ Dfk + fk∗ D ∗ yx∗ Dfk ) = [ ] [ ∗] [ ][ ][ ]
ȳ 0 D Bk (D) Ak (D) 0 D ȳ
∗
y y
= [ ] Wk (D) [ ] ,
ȳ ȳ

where
D 0 Ak (D) Bk (D) D ∗ 0
Wk (D) ∶= [ ∗] [ ][ ].
0 D Bk (D) Ak (D) 0 D
Fix a positive threshold Tn . We now claim that (3.10) follows from
∗
1 L n ∗ ∗ x x 2
∑ ∑ Wk (D` )1(∣fk D` x∣ ≤ Tn ) ⪰ α [ ] [ ] + (1 − δ) I2n (3.11)
nL `=1 k=1 −x −x

in which α is any real valued number. To see why this is true, observe that
∗
1 2 y 1 y
∥A(xy ∗ + yx∗ )∥`2 ≥ [ ] ∗ ∗
∑ ∑ Wk (D` )1(∣fk D` x∣ ≤ Tn ) [ ]
nL ȳ nL ` k ȳ
∗
y y
≥ (1 − δ)2 [ ] I2n [ ] = 2(1 − δ)2 ∥y∥2`2 .
ȳ ȳ

The last inequality comes from (3.11) together with

∗
x y
[ ] [ ] = 0,
−x y

which holds since we assumed that x∗ y is real valued.

The remainder of the proof justifies (3.11) by means of the matrix Hoeffding inequality. Let
⟨W ⟩ be the left-hand side in (3.11) (we use notation from physics to denote empirical averages since
a bar denotes complex conjugation and we would like to avoid overloading symbols). By definition
⟨W ⟩ is the empirical average of L i.i.d. copies of
1 n
W (D) = ∑ Wk (D)1{∣f ∗ D∗ x∣≤Tn } .
n k=1 k

12
First, Wk (D) ⪰ 0 since
∗
Ak (D) Bk (D) (f ∗ D ∗ x)fk (fk∗ D ∗ x)fk
[ ] = [ k∗ ∗ ][ ] .
Bk (D) Ak (D) (fk D x)fk (fk∗ D ∗ x)fk

Further,

Ak (D) Bk (D) 2Ak (D) 0 Ak (D) −Bk (D)

[ ]=[ ]−[ ]
Bk (D) Ak (D) 0 2Ak (D) −Bk (D) Ak (D)
2Ak (D) 0
⪯[ ].
0 2Ak (D)

The inequality comes from

∗
Ak (D) −Bk (D) (f ∗ D ∗ x)f (f ∗ D ∗ x)f
[ ] = [ k∗ ∗ k ] [ k∗ ∗ k ] ⪰ 0.
−Bk (D) Ak (D) −(fk D x)fk −(fk D x)fk

Hence,

n ⎡⎢∣fk∗ D ∗ x∣2 fk fk∗

n ⎤
Ak (D) Bk (D) ⎢ 0 ⎥
⎥1 ∗ ∗
∑[ ] 1{∣f ∗ D∗ x∣≤Tn } ⪯ 2 ∑ ⎢ 2 T ⎥ {∣fk D x∣≤Tn }
k=1 Ak (D) Bk (D)
∗ ∗
k
k=1 ⎢
⎣ 0 ∣fk D x∣ fk fk ⎥⎦
n ∗
f f 0
⪯ 2Tn2 ∑ [ k k T]
k=1
0 fk fk

= 2nTn2 I2n .

In summary,
∥W (D)∥ ≤ 2Tn2 ∥D∥2 ≤ 2M 2 Tn2 .
We now roughly estimate the mean of W (D). Obviously,
1 1
W (D) = ∑ Wk (D) − ∑ Wk (D)1{∣f ∗ D∗ x∣≤Tn }
n k n k k

1
∶= W̃ (D) − ∑ Wk (D)1{∣f ∗ D∗ x∣≤Tn } .
n k k

By Lemmas 3.1 and 3.2, the mean of the first term (W̃ (D)) is equal to

xx∗ 2xxT
E W̃ (D) = I2n + [ ]. (3.12)
2x̄x∗ xxT

Furthermore, a simple calculation shows that since

√
∣fk∗ D ∗ x∣ ≤ ∥fk ∥`2 ∥D ∗ x∥`2 ≤ ∥fk ∥`2 ∥D∥ ≤ n∥D∥,

one can verify that

∥Wk (D)∥ ≤ 4n2 ∥D∥4 ≤ 4M 4 n2 .

13
Therefore, Jensen’s inequality gives

1
∥E W (D) − E W̃ (D)∥ = ∥E ∑ Wk (D)1{∣f ∗ D∗ x∣≤Tn } ∥
n k k

1
≤ E ∥ ∑ Wk (D)1{∣f ∗ D∗ x∣≤Tn } ∥
n k k

n
≤ 4M 4 n ∑ P(∣fk∗ D ∗ x∣ > Tn ).
k=1
√
Setting Tn = 2β log n, then a simple application of Hoeffding’s inequality gives
√
P(∣fk∗ D ∗ x∣ > 2β log n) ≤ 2n−β

(we omit the details). Therefore,

8M 4
∥E W (D) − E W̃ (D)∥ ≤ . (3.13)
nβ−2
Next,

∥W (D) − E W (D)∥ ≤ ∥W (D)∥ + ∥ E W (D)∥ ≤ ∥W (D)∥ + ∥ E W̃ (D)∥ + ∥ E W (D) − E W̃ (D)∥

and with Tn as above, collecting our estimates gives

8M 4
∥W (D) − E W (D)∥ ≤ 4M 2 β log n + 4 + ∶= ∆.
nβ−2
We have done the groundwork to apply the matrix Hoeffding inequality (3.3), which reads

Lt2
P (∥⟨W ⟩ − E W (D)∥ ≥ t) ≤ 2n exp(− ).
8∆2
This implies that when β is sufficiently large and L ≥ c log3 n for a sufficiently large constant,

∥ ⟨W ⟩ − E W (D)∥ ≤ /2

with probability at least 1 − 1/(2n). Now from (3.12), and (3.13) gives

3 x 1 x
E W (D) = I2n + [ ] [x∗ , xT ] − [ ] [x∗ , −xT ] + E,
2 x̄ 2 −x̄

where (3.13) gives that ∥E∥ ≤ /2 provided β ≥ 2 + log(16M 4 −1 )/ log n. Hence, we have established
that
3 x 1 x 1 x
⟨W ⟩ ⪰ (1 − )I2n + [ ] [x∗ , xT ] − [ ] [x∗ , −xT ] ⪰ (1 − )I2n − [ ] [x∗ , −xT ]
2 x̄ 2 −x̄ 2 −x̄

x
since [ ] [x∗ , xT ] ⪰ 0. With = 2δ − δ 2 , this is the desired conclusion (3.11).
x̄

14
3.4 Dual certificate construction via the golfing scheme
We now construct the approximate dual certificate Z obeying the conditions of Lemma 3.6. For
this purpose we use the golfing scheme first presented in the work of Gross [31]. Modifications of
this technique have subsequently been used in many other papers e.g. [22, 23, 36]. The special
form used here is most closely related to the construction in [36]. The mathematical validity of our
construction crucially relies on the lemma below, whose proof is the object of the separate Section
3.5.
Lemma 3.8 Assume that L ≥ c log3 n for a sufficiently large constant c. Then for any fixed X ∈ T ,
there exists Y of the form Y = A∗ (λ) with λ real valued such that
√
2
∥Y − X∥ ≤ ∥X∥F
20
holds with probability at least 1 − 1/n2 . This inequality has the immediate consequences
√
1 2
∥YT − X∥F ≤ ∥X∥F , ∥YT ⊥ ∥ ≤ ∥X∥F .
5 20
To build our approximate dual certificate Z, we partition the modulations or CDPs into B + 1
different groups so that, from now on, A0 corresponds to those measurements from the first L0
modulations, A1 to those from the next L1 ones, and so on. Clearly, L0 + L1 + . . . + LB = L. The
random mappings {Ab }B b=0 correspond to independent modulations and are thus independent. Our
2
golfing scheme starts with X (0) = nL 0
PT (A∗0 (1)) (1 is the all-one vector) and for b = 1, . . . , B,
inductively defines
√
• Y (b) ∈ Range(A∗b ) obeying ∥Y (b) − X (b−1) ∥ ≤ 2
20
∥X (b−1) ∥F ,

• and X (b) = X (b−1) − PT (Y (b) ).

In the end, we set
B
2
Z=Y − A∗ (1), Y = ∑ Y (b) .
nL0 0 t=1
(b)
Note that Lemma 3.8 asserts that Y exists with high probability, and that for each b both
√
(b) 1 (b−1) (b) 2
∥X ∥ ≤ ∥X ∥ and ∥YT ⊥ ∥ ≤ ∥X (b−1) ∥ (3.14)
F 5 F 20 F

hold on an event of probability at least 1 − 1/n2 .

We now show that our construction Z satisfies the required assumptions from Lemma 3.6. First,
Z is self-adjoint and of the form A∗ (λ) with λ ∈ RnL . Second,
B B
2
ZT = YT − PT (A∗0 (1)) = ∑ PT (Y (b) ) − X (0) = ∑ (X (b−1) − X (b) ) − X (0) = −X (B) .
nL0 b=1 b=1

Then (3.14) implies that with probability at least 1 − B/n2

1
∥ZT ∥F ≤ ∥X (B) ∥ ≤ ∥X (0) ∥ . (3.15)
F 5B F

15
Also, (3.14) gives
B
√ B √ B √
(b) 2 (t−1) 2 1 (0) 2
∥YT ⊥ ∥ ≤ ∑ ∥YT ⊥ ∥ ≤ ∑ ∥X ∥F ≤ ∑ b ∥X ∥F < ∥X (0) ∥F (3.16)
b=1 20 b=1 20 b=1 5 16

with probability at least 1 − B/n2 . If L0 ≥ c log n for a sufficiently large constant c > 0, Lemma 3.3
states that
2 1
∥ A∗0 (1) − 2I∥ ≤
nL0 4

with probability at least 1 − 1/n2 . Using the fact that for any matrix W , we have ∥WT ∥ ≤ 2∥W ∥
and ∥WT ⊥ ∥ ≤ ∥W ∥ we conclude that

∥X (0) − 2IT ∥ ≤ 1/2, ∥YT ⊥ − ZT ⊥ − 2IT ⊥ ∥ ≤ 1/4. (3.17)

Since X (0) has rank at most 2,

√ √ √
∥X (0) ∥ ≤ 2∥X (0) ∥ ≤ 2∥X (0) − 2IT ∥ + 2 2∥IT ∥
F

Finally, with (3.17) and ∥IT ∥ ≤ 1, we conclude that

∥X (0) ∥ < 4. (3.18)

Plugging this into (3.15) we arrive at

4
∥ZT ∥F ≤ . (3.19)
5B
Also, (3.16), (3.17) and (3.18) give
√
2 1
∥ZT ⊥ + 2IT ⊥ ∥ ≤ ∥YT ⊥ ∥ + ∥YT ⊥ − ZT ⊥ − 2IT ⊥ ∥ ≤ + <1 ⇒ ZT ⊥ ⪯ −IT ⊥ . (3.20)
4 4
Therefore, the assumptions in Lemma 3.6 hold with probability at least 1 − 1/2n by applying the
union bound and using with the proviso that B ≥ c1 log n and Lb ≥ c2 log3 n for sufficiently large
constants c1 and c2 (this is why we require L ≥ c log4 n for a sufficiently large constant).

3.5 Proof of Lemma 3.8

The immediate consequences hold for the following reasons. First, since any matrix in T has rank
at most 2,
√ √ 1
∥YT − X∥F ≤ 2∥YT − X∥ ≤ 2 2∥Y − X∥ ≤ ∥X∥F ,
5
where the second inequality follows from ∥MT ∥ ≤ 2∥M ∥ for any M . Second, since ∥MT ⊥ ∥ ≤ ∥M ∥,
√
2
∥YT ⊥ ∥ = ∥YT ⊥ − XT ⊥ ∥ ≤ ∥Y − X∥ ≤ ∥X∥F .
20
It thus suffices to prove the first property. To this end consider the eigenvalue decomposition of
X = λ1 u1 u∗1 + λ2 u2 u∗2 . The proof follows from Lemma 3.9 below combined with Lemma 3.3.

16
Lemma 3.9 Assume L ≥ c log3 n for a sufficiently large constant c. Given any fixed self-adjoint
matrix vv ∗ , with probability at least 1 − 1/(2n3 ) there exists Y
̃ ∈ Range(A∗ ) obeying

̃ − (vv ∗ + ∥v∥2 I)∥ ≤ ∥v∥2 .

∥Y `2 `2

Proof Without loss of generality, assume ∥v∥`2 = 1 and set Y

̃ = ⟨Y ⟩,

1 L 1 n ∗ ∗ 2 ∗ ∗
⟨Y ⟩ = ∑ Y` , Y` = ∑ ∣fk D` v∣ 1{∣f ∗ D∗ v∣≤Tn } D` fk fk D` ,
L l=1 n k=1 k `

which is of the form A∗ (λ). The Y` ’s are i.i.d. copies of Y ,

1 n ∗ ∗ 2 ∗ ∗ 1 n ∗ ∗ 2 ∗ ∗
Y = ∑ ∣fk D v∣ Dfk fk D − ∑ ∣fk D v∣ 1{∣f ∗ D∗ v∣>Tn } Dfk fk D .
n k=1 n k=1 k

Notice that the random positive semi-definite matrix Y obeys

1 n 2 ∗ ∗ 2 ∗
Y ⪯ ∑ Tn Dfk fk D = Tn DD .
n k=1
By Lemma 3.1,
1 n ∗ ∗ 2 ∗ ∗ ∗
E( ∑ ∣f D v∣ Dfk fk D ) = vv + I.
n k=1 k
Using Jensen’s inequality, we have as in the proof of Lemma 3.7
1 n ∗ ∗ 2 ∗ ∗ 1 n 2 4
∥E ( ∑ ∣fk D v∣ 1{∣f ∗ D∗ v∣>Tn } Dfk fk D )∥ ≤ ∑ E(1{∣f ∗ D∗ v∣>Tn } )n M . (3.21)
n k=1 k n k=1 k

√
Put Tn = 2β log n. Hoeffding’s inequality gives

E(1{∣f ∗ D∗ v∣>Tn } ) ≤ 2n−β .

Plugging this into (3.21) we arrive at

1 n ∗ ∗ 2 ∗ ∗ 2M 4
∥E ( ∑ ∣fk D v∣ 1{∣f ∗ D∗ v∣>Tn } Dfk fk D )∥ ≤ β−2 .
n k=1 k n
For sufficiently large β, this implies
2∥D∥4
∥E(Y ) − (vv ∗ + I)∥ ≤ ≤ . (3.22)
nβ−2 2
By using Hoeffding inequality in a similar fashion as in the proof of Lemma 3.7, we obtain (we omit
the details)

∥⟨Y ⟩ − E(Y )∥ ≤ .
2
Combining the latter with (3.22), we conclude

∥⟨Y ⟩ − (vv ∗ + I)∥ ≤ .

17
4 Discussion
In this paper, we proved that a signal could be recovered by convex programming techniques from
a few diffraction patterns corresponding to generic modulations obeying an admissibility condition.
We expect that our results, methods and proofs extend to more general random modulations
although we have not pursued such extensions in this paper. Further, we proved that on the order
of (log n)4 CDPs suffice for perfect recovery and we expect that further refinements would allow to
reduce this number, perhaps all the way down to a figure independent of the number n of unknowns.
Such refinements appear quite involved to us and since our intention is to provide a reasonably
short and conceptually simple argument, we leave such refinements to future research.

5 Appendix
2πi
Set ω = e n to be the nth root of unity so that
⎡ ω 0(k−1) ⎤
⎢ ⎥
⎢ ω 1(k−1) ⎥
⎢ ⎥
fk∗ = [ω −0(k−1) , ω −1(k−1) , . . . , ω −(n−1)(k−1) ] , fk = ⎢ ⎥.
⎢ ⋮ ⎥
⎢ ⎥
⎢ω (n−1)(k−1) ⎥
⎣ ⎦
n
For two integers a and b we use a ≡ b to denote congruence of a and b modulo n (n divides a − b).

5.1 Proof of Lemma 3.1

Put
1 n ∗ ∗ 2 ∗ ∗
Y ∶= ∑ ∣f D x∣ Dfk fk D .
n k=1 k
By definition,
n n
2
∣fk∗ D ∗ x∣ = ( ∑ d¯a xa ω −(a−1)(k−1) )( ∑ db x̄b ω (b−1)(k−1) )
a=1 b=1
n n
= ∑ ∑ ω (b−a)(k−1) d¯a db xa x̄b
a=1 b=1

Further,
1 n n n (b−a+p−q)(k−1) ¯
Ypq = ∑ ∑ ∑ω da db dp d¯q xa x̄b
n k=1 a=1 b=1
n n
1 n (b−a+p−q)(k−1)
= ∑ ∑ d¯a db dp d¯q xa x̄b ( ∑ω )
a=1 b=1 n k=1
n n
= ∑ ∑ d¯a db dp d¯q xa x̄b 1{a+q ≡n b+p} .
a=1 b=1

Therefore,
n n
E[Ypq ] = ∑ ∑ E[d¯a db dp d¯q ]xa x̄b 1{a+q ≡n b+p} .
a=1 b=1

18
• Diagonal terms (p = q): Here, E[d¯a db ∣dp ∣2 ] = 0 unless a = b. This gives
n
E[Ypp ] = ∑ E[∣da ∣2 ∣dp ∣2 ] ∣xa ∣2
a=1
n
= E[∣dp ∣4 ] ∣xp ∣2 + E[∣dp ∣2 ( ∑ ∣da ∣2 ∣xa ∣2 )]
a≠p
2
= ∣xp ∣ + ∥x∥2`2 .

• Off-diagonal terms (p ≠ q): Here E[d¯a db dp d¯q ] = 0 unless (a = p, b = q) so that

2
E[Ypq ] = (E[∣d∣2 ]) xp x̄q = xp x̄q .

This concludes the proof.

5.2 Proof of Lemma 3.2

Put
1 n ∗ ∗ 2 T
R= ∑ (f D x) Dfk fk D.
n k=1 k

By definition,
n n
2
(fk∗ D ∗ x) = ∑ ∑ ω −(a+b−2)(k−1) d¯a d¯b xa xb
a=1 b=1

and
1 n n n (p+q−a−b)(k−1) ¯ ¯
Rpq = ∑ ∑ ∑ω da db dp dq xa xb
n k=1 a=1 b=1
n n
1 n (p+q−a−b)(k−1)
= ∑ ∑ d¯a d¯b dp dq xa xb ( ∑ω )
a=1 b=1 n k=1
n n
= ∑ ∑ d¯a d¯b dp dq xa xb 1{p+q ≡n a+b} .
a=1 b=1

Therefore,
n n
E[Rpq ] = ∑ ∑ E[d¯a d¯b dp dq ]xa xb 1{p+q ≡n a+b} .
a=1 b=1

• Diagonal terms (p = q): Here, E[d¯a db ∣dp ∣2 ] = 0 unless a = b = p. This gives

E[Rpp ] = E[∣d∣4 ]x2p = 2x2p .

• Off-diagonal terms (p ≠ q): Here, E[d¯a d¯b dp dq ] = 0 unless (a = p, b = q) or (a = q, b = p). This

gives
E[Rpq ] = 2E[∣dp ∣2 ∣dq ∣2 ]xp xq = 2xp xq .

This concludes the proof.

19
5.3 Proof of Lemma 3.3
Note that
1 ∗ 1 L n ∗ ∗ 1 L ∗
Z ∶= A (1) = ∑ ∑ D` fk fk D` = ∑ D` D` .
nL nL `=1 k=1 L `=1

Therefore, Z is a diagonal matrix with i.i.d. diagonal entries distributed as L1 ∑L `=1 X` , where the
X` are i.i.d. random variables with E[X` ] = E[∣d∣2 ] = 1 and ∣X` ∣ = ∣d∣2 ≤ M 2 . The statement in the
lemma then follows from Hoeffding’s inequality

1 L − 2L t2
P{ ∣ ∑ X` − 1∣ ≥ t} ≤ 2e M 2
L `=1

combined with the union bound.

5.4 Proof of Lemma 3.4

The proof is straightforward and parallels calculations in [23]. Fix a unit-normed vector v, then
L n L L
2 2
∥A(vv ∗ )∥`1 = ∑ ∑ ∣fk∗ D`∗ v∣ = n ∑ ∥D`∗ v∥`2 ≤ nM 2 ∑ ∥v∥2`2 = nLM 2 .
`=1 k=1 `=1 `=1

Consider now the eigenvalue decomposition X = ∑nj=1 λj vj vj∗ where λj is nonnegative since X ⪰ 0.
Then
n
∥A(X)∥`1 = ∑ λj ∥A(vj vj∗ )∥` ≤ nLM 2 ∑ λj = nLM 2 tr(X).
1
j=1 j

Acknowledgements
E C. is partially supported by AFOSR under grant FA9550-09-1-0643, by ONR under grant N00014-09-
1-0258 and by a gift from the Broadcom Foundation. M. S. is supported by a a Benchmark Stanford
Graduate Fellowship. X. L. is supported by the Wharton Dean’s Fund for Post-Doctoral Research and by
funding from the National Institutes of Health. We would like to thank V. Voroninski for helpful discussions,
especially for bringing to our attention the difficulty of establishing RIP-1 results in the masked Fourier
model. M. S. thanks David Brady and Adam Backer for fruitful discussions about the implementation of
structured illuminations in X-ray crystallography and microscopy applications.

References
[1] www. stanford. edu/ ~ mahdisol/ code .
[2] D. G. Mixon blog: Saving phase: Injectivity and stability for phase retrieval.
[3] D. G. Mixon blog: AIM workshop: frame theory intersects geometry.
[4] A. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convex programming. arXiv preprint
arXiv:1211.5608, 2012.
[5] B. Alexeev, A. S. Bandeira, M. Fickus, and D. G. Mixon. Phase retrieval with polarization. arXiv
preprint arXiv:1210.7752, 2012.

20
[6] A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimiza-
tion. SIAM Journal on Optimization, 16(3):697–725, 2006.
[7] R. Balan. A nonlinear reconstruction algorithm from absolute value of frame coefficients for low redun-
dancy frames. In International Conference on Sampling Theory and Applications (SAMPTA), 2009.
[8] R. Balan. On signal reconstruction from its spectrogram. In 44th Annual Conference on Information
Sciences and Systems (CISS), pages 1–4, 2010.
[9] R. Balan. Reconstruction of signals from magnitudes of redundant representations. arXiv preprint
arXiv:1207.1134, 2012.
[10] R. Balan. Stability of phase retrievable frames. arXiv preprint arXiv:1308.5465, 2013.
[11] R. Balan, B. G. Bodmann, P. G. Casazza, and D. Edidin. Painless reconstruction from magnitudes of
frame coefficients. Journal of Fourier Analysis and Applications, 15(4):488–501, 2009.
[12] R. Balan, P. Casazza, and D. Edidin. On signal reconstruction without phase. Applied and Computa-
tional Harmonic Analysis, 20(3):345–356, 2006.
[13] R. Balan, P. Casazza, and D. Edidin. Equivalence of reconstruction from the absolute value of the frame
coefficients to a sparse representation problem. IEEE Signal Processing Letters, 14(5):341–343, 2007.
[14] R. Balan and Y. Wang. Invertibility and robustness of phaseless reconstruction. arXiv preprint
arXiv:1308.4718, 2013.
[15] A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson. Saving phase: Injectivity and stability for
phase retrieval. arXiv preprint arXiv:1302.4618, 2013.
[16] A. S. Bandeira, Y. Chen, and D. G. Mixon. Phase retrieval from power spectra of masked signals. arXiv
preprint arXiv:1303.4458, 2013.
[17] S. R. Becker, E. J. Candes, and M. C. Grant. Templates for convex cone problems with applications to
sparse signal recovery. Mathematical Programming Computation, 3(3):165–218, 2011.
[18] O. Bunk, A. Diaz, F. Pfeiffer, C. David, B. Schmitt, D. K. Satapathy, and J. F Veen. Diffractive imaging
for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels. Acta
Crystallographica Section A: Foundations of Crystallography, 63(4):306–314, 2007.
[19] J. Cahill, P. G. Casazza, J. Peterson, and L. Woodland. arXiv preprint arXiv:1305.6226.
[20] E. J. Candes, Y. C Eldar, T. Strohmer, and V. Voroninski. Phase retrieval via matrix completion.
SIAM Journal on Imaging Sciences, 6(1):199–225, 2013.
[21] E. J. Candes and X. Li. Solving quadratic equations via phaselift when there are about as many
equations as unknowns. Foundations of Computational Mathematics, pages 1–10, 2012.
[22] E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM
(JACM), 58(3):11, 2011.
[23] E. J. Candes, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery from
magnitude measurements via convex programming. Communications on Pure and Applied Mathematics,
2012.
[24] A. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse
Problems, 27(1), 2011.
[25] J. V. Corbett. The pauli problem, state reconstruction and quantum-real numbers. Reports on Mathe-
matical Physics, 57(1):53–68, 2006.
[26] J. C. Dainty and J. R. Fienup. Phase retrieval and image reconstruction for astronomy. Image Recovery:
Theory and Application, ed. byH. Stark, Academic Press, San Diego, pages 231–275, 1987.

21
[27] L. Demanet and P. Hand. Stable optimizationless recovery from phaseless linear measurements. arXiv
preprint arXiv:1208.1803, 2012.
[28] L. Demanet and V. Jugnon. Convex recovery from interferometric measurements. arXiv preprint
arXiv:1307.6864, 2013.
[29] Y. C. Eldar and S. Mendelson. Phase retrieval: Stability and recovery guarantees. arXiv preprint
arXiv:1211.0872, 2012.
[30] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and
satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145,
1995.
[31] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on
Information Theory, 57(3):1548–1566, 2011.
[32] D. Gross, F. Krahmer, and R. Kueng. A partial derandomization of phaselift using spherical designs.
arXiv preprint arXiv:1310.2267, 2013.
[33] R. W. Harrison. Phase problem in crystallography. JOSA A, 10(5):1046–1055, 1993.
[34] T. Heinosaari, L. Mazzarella, and M. M. Wolf. Quantum tomography under prior information. Com-
munications in Mathematical Physics, 318(2):355–374, 2013.
[35] K. Jaganathan, S. Oymak, and B. Hassibi. On robust phase retrieval for sparse signals. In 50th Annual
Allerton Conference on Communication, Control, and Computing (Allerton), pages 794–799, 2012.
[36] X. Li and V. Voroninski. Sparse signal recovery from quadratic measurements via convex programming.
SIAM J. Math. Anal., 45(5):3019–3033, 2013.
[37] E. G. Loewen and E. Popov. Diffraction gratings and applications. CRC Press, 1997.
[38] J. Miao, T. Ishikawa, Q. Shen, and T. Earnest. Extending x-ray crystallography to allow the imaging
of noncrystalline materials, cells, and single protein complexes. Annu. Rev. Phys. Chem., 59:387–410,
2008.
[39] R. P. Millane. Phase retrieval in crystallography and optics. JOSA A, 7(3):394–411, 1990.
[40] D. Mondragon and V. Voroninski. Determination of all pure quantum states from a minimal number
of observables. arXiv preprint arXiv:1306.1214, 2013.
[41] A. Nemirovski. Lectures on modern convex optimization. In Society for Industrial and Applied Mathe-
matics (SIAM). Citeseer, 2001.
[42] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. arXiv preprint
arXiv:1306.0160, 2013.
[43] H. Ohlsson, A. Y. Yang, R. Dong, and S. S. Sastry. Compressive phase retrieval from squared output
measurements via semidefinite programming. arXiv preprint arXiv:1111.6323, 2011.
[44] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi. Simultaneously structured models with
application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753, 2012.
[45] J. Ranieri, A. Chebira, Y. M. Lu, and M. Vetterli. Phase retrieval for sparse signals: Uniqueness
conditions. arXiv preprint arXiv:1308.3058, 2013.
[46] O. Raz, N. Dudovich, and B. Nadler. Vectorial phase retrieval of 1-d signals. IEEE Transactions on
Signal Processing, 61(7):1632–1643, 2013.
[47] H. Reichenbach. Philosophic foundations of quantum mechanics. University of California Pr, 1965.
[48] J. M. Rodenburg. Ptychography and related diffractive imaging methods. Advances in Imaging and
Electron Physics, 150:87–184, 2008.

22
[49] Y. Shechtman, Y. C. Eldar, A. Szameit, and M. Segev. Sparsity based sub-wavelength imaging with
partially incoherent light via quadratic compressed sensing. Optics Express, 19(16):14807–14822, 2011.
[50] P. Thibault, M. Dierolf, O. Bunk, A. Menzel, and F. Pfeiffer. Probe retrieval in ptychographic coherent
diffractive imaging. Ultramicroscopy, 109(4):338–343, 2009.
[51] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational
Mathematics, 12(4):389–434, 2012.
[52] Z. Uo, W. Ma, A. C. So, Y. Ye, and S. Zhang. Semidefinite relaxation of quadratic optimization
problems. IEEE Signal Processing Magazine, 27(3):20–34, 2010.
[53] I. Waldspurger, A. d’Aspremont, and S. Mallat. Phase recovery, maxcut and complex semidefinite
programming. arXiv preprint arXiv:1206.0102, 2012.
[54] A. Walther. The question of phase retrieval in optics. Journal of Modern Optics, 10(1):41–49, 1963.
[55] F. H. C. Watson, J. D. and Crick. A structure for deoxyribose nucleic acid. Nature, 171, 1953.