†These authors contributed equally to this work.

Quantum phase estimation by compressed sensing

Changhao Yi
State Key Laboratory of Surface Physics and Department of Physics, Fudan University, Shanghai 200433, China
Institute for Nanoelectronic Devices and Quantum Computing, Fudan University, Shanghai 200433, China

Cunlu Zhou
Center for Quantum Information and Control, Department of Physics and Astronomy, University of New Mexico, NM 87131, USA

Jun Takahashi
Center for Quantum Information and Control, Department of Physics and Astronomy, University of New Mexico, NM 87131, USA

Abstract

As a signal recovery algorithm, compressed sensing is particularly useful when the data has low complexity and samples are rare, which matches perfectly with the task of quantum phase estimation (QPE) on early fault-tolerant quantum computers. In this work, we present a new Heisenberg-limited QPE algorithm for early fault-tolerant quantum computers based on compressed sensing. Our algorithm only requires sparse and discrete sampling of times. More specifically, given many copies of a proper initial state and queries to a specific unitary matrix, our algorithm can recover the phase with a total runtime $\mathcal{O}(\epsilon^{-1}\text{poly}\log(\epsilon^{-1}))$ , where $\epsilon$ is the desired accuracy. Moreover, the maximal runtime satisfies $T_{\max}\epsilon\ll\pi$ , which is comparable to the state-of-the-art algorithms. Our algorithm is also robust against noise from sampling and initial state preparation. More generally, our algorithm solves the basis mismatch problem in special cases by adding an extra parameter to the traditional compressed sensing algorithm.

1 Introduction

Quantum phase estimation (QPE) [1] is one of the most useful subroutines in quantum computing and plays an important role in many promising quantum applications [2, 3, 4]. Given a unitary matrix $U$ and one of its eigenvectors $|\Phi\rangle$ with eigenvalue $e^{i2\pi\theta}$, the task of QPE is to estimate the phase $\theta$ to within a given accuracy. When $U$ is the evolution operator generated by a Hamiltonian $H$, the task of QPE is equivalent to estimating a specific energy level $E_{0}$ with accuracy $\epsilon$ [5, 6]. Hence, this subroutine has numerous applications in condensed matter physics, high energy physics, and quantum chemistry. As a generalization, the problem of estimating multiple phases of $U$ has been referred to as the quantum eigenvalue estimation problem (QEEP) [7, 8, 5, 9, 6, 10].

While fully fault-tolerant quantum computers may still be years away from realization, early fault-tolerant quantum computers with a limited number of logical qubits and limited circuit depth are expected to be realized much sooner and to solve nontrivial tasks that demonstrate practical quantum advantages. Given the crucial role of QPE in many such tasks, it becomes imperative to design QPE algorithms specifically tailored for early fault-tolerant quantum computers. The standard textbook QPE algorithm [11] does not require an exact eigenstate as the initial state and takes only one measurement, but it uses a large number of ancilla qubits and controlled operations, which is experimentally demanding. Although Kitaev’s original iterative QPE algorithm [1] uses only one ancilla qubit and one controlled operation (see Fig. 1), it requires the initial state to be an exact eigenstate, whose preparation can itself be a difficult task. Therefore, neither of them is suitable for early fault-tolerant quantum computers.

Most of the recent work [5, 12, 13] on QPE for early fault-tolerant quantum computers has focused on designing better protocols to improve various aspects of Kitaev’s original QPE algorithm. More specifically, the following properties are desired when designing such algorithms.

• The quantum circuit should be simple, using at most one ancilla qubit and one controlled operation.

• The initial state is not necessarily an exact eigenstate of $U$.

• The total runtime achieves the Heisenberg limit, i.e., the total cost should be $\mathcal{O}(\epsilon^{-1}\operatorname{poly}\log(\epsilon^{-1}\delta^{-1}))$ for estimating the phase $\theta$ to accuracy $\epsilon$ with probability $1-\delta$.

• When the overlap of the initial state and the targeted eigenstate is large, the maximal runtime $T_{\max}$ (hence the maximum circuit depth) can be much smaller than $\pi/\epsilon$.

In this paper, we emphasize another issue that should be considered for early fault-tolerant quantum computers: the experimental complexity. To reduce the experimental complexity, it is desirable to have a small number of time samples with a regular choice of values. For early fault-tolerant experiments, one may still need to prepare quantum circuits for each evolution time $t$ by hand, and the total cost would be high if the number of different evolution times is large. It is comparatively easier to run the same quantum circuit many times than to run many different quantum circuits a few times each. Additionally, in quantum simulation algorithms [14, 15], the target evolution operator $U(t)$ is usually constructed by applying accurate approximations of short-time evolution operators step by step: $U(t)\approx U_{\mathrm{sim}}(\Delta t)^{L}$. Thus, it is more convenient to sample from a discrete set of times $\mathcal{T}=\{n\Delta t,n\in\mathbb{N}\}$ than from a continuous region. In the situation where we can only query the target unitary $U$ as a black box and must estimate the eigenphases of $U$ from the integer powers $\{U^{n}\}_{n\in\mathbb{N}}$, the setup is similar to the requirement of discrete sampling of times because every $U$ can be written as $e^{-\mathrm{i}Ht_{0}}$. Most state-of-the-art algorithms focus on sampling time from a continuous region [5, 6, 16, 17]. In this work, we design a non-adaptive algorithm that only requires discrete and sparse sampling of times, and we show that even under these constraints, our algorithm still performs well.

For a Heisenberg-limited QPE algorithm with maximal runtime $T_{\max}$, we call the sampling sparse if the number of time samples needed is $\mathcal{O}(\mathrm{poly}\log(T_{\max}))$.

The rest of the paper is organized as follows. We start with preliminaries about QEEP, sparse Fourier transformation and compressed sensing in Sec. 2. We then introduce our QPE algorithm based on compressed sensing in Sec. 3 and prove several analytical results, including its Heisenberg-limit scaling. We also numerically test the performance of our algorithm and compare it to previous works in Sec. 4. Finally, we summarize several open problems and potential future research directions in Sec. 5.

Figure 1: The one-ancilla quantum circuit used in Kitaev-type QPE algorithms. The measurement is done in the $Z$ basis. In terms of the measurement outcome, we regard the $|0\rangle$ state as obtaining value $+1$, and the $|1\rangle$ state as obtaining value $-1$. $\mathbf{H}$ is the Hadamard gate. $\mathbf{W}$ has two choices: when $\mathbf{W}=I$, the measurement outcome is $\pm 1$ with probability $(1\pm\text{Re}(\langle\Phi|U(t)|\Phi\rangle))/2$ respectively. When $\mathbf{W}=S^{\dagger}$, the Hermitian conjugate of the phase gate, the measurement outcome is $\pm 1$ with probability $(1\pm\text{Im}(\langle\Phi|U(t)|\Phi\rangle))/2$ instead. After taking the average over many test outcomes, we obtain an estimate of the true signal $\langle\Phi|U(t)|\Phi\rangle$.

2 Main idea

2.1 Setup

The QEEP can be formulated as a sparse signal recovery problem. Given an initial state $|\Phi\rangle$ and a specific Hamiltonian with spectrum decomposition $H=\sum_{\ell=0}^{D-1}E_{\ell}P_{\ell}$ , where $\{E_{\ell}\}_{\ell=0}^{D-1}$ are energy levels and $\{P_{\ell}=|\phi_{\ell}\rangle\langle\phi_{\ell}|\}_{\ell=0}^{D-1}$ are projectors onto the corresponding eigenstates, the time-domain signal in QEEP can be written as

y^{0}(t)=\langle\Phi|e^{-\mathrm{i}Ht}|\Phi\rangle=\sum_{\ell=0}^{D-1}|\langle\Phi|\phi_{\ell}\rangle|^{2}e^{-\mathrm{i}E_{\ell}t}.

(1)

In QEEP we assume that $|\Phi\rangle$ has the following decomposition:

|\Phi\rangle=\sum_{\ell\in\mathcal{L}_{\mathrm{dom}}}\sqrt{p_{\ell}}|\phi_{\ell}\rangle+\sum_{\ell\in\mathcal{L}_{\mathrm{res}}}\sqrt{p_{\ell}}|\phi_{\ell}\rangle,\quad\sum_{\ell\in\mathcal{L}_{\mathrm{dom}}}p_{\ell}\approx 1,\quad|\mathcal{L}_{\mathrm{dom}}|\ll D,

(2)

where $\mathcal{L}_{\mathrm{dom}}$ denotes the dominant component of the signal, and $\mathcal{L}_{\mathrm{res}}$ is the residue component. Under this assumption, we can regard $y^{0}(t)$ as a sparse signal. The formal definition of sparsity will be given in the main text. In particular, when $|\mathcal{L}_{\text{dom}}|=1$, the task becomes QPE. For QPE, without loss of generality (see footnote 1), we will mainly discuss the estimation of the ground energy $E_{0}$, i.e., the smallest eigenvalue of $H$.

Footnote 1: In this work, we do not consider the hardness of the preparation of the initial state. From the point of view of phase estimation, there is nothing special about the ground state energy compared to other eigenvalues as long as one can prepare an initial state that is close enough to the target eigenstate.

The sparsity assumption applies to a wide range of situations. For instance, if we regard $|\Phi\rangle$ as the ground state of a perturbed Hamiltonian $H+V$, then the overlap $|\langle\Phi|\phi_{\ell}\rangle|$ tends to decay exponentially with the energy difference $|E_{0}-E_{\ell}|$, because $|\langle\phi_{\ell}|V|\phi_{0}\rangle|$ decays exponentially with it (see [18] for details). Therefore, in this case, $|\Phi\rangle$ has almost no overlap with high-energy excited states, and $\mathcal{L}_{\mathrm{dom}}$ contains only a small number of energy levels.

The objective of a QEEP algorithm on an early fault-tolerant quantum computer is to estimate the energy levels indexed by $\mathcal{L}_{\text{dom}}$ within a certain accuracy level $\epsilon$ using rough estimations of $y^{0}_{t}$ on a time set $\mathcal{T}$. An algorithm of this type can be separated into the quantum part and the classical post-processing part. Usually, the quantum part is a combination of Hamiltonian simulation [14] and the Hadamard tests (see Fig. 1). Hamiltonian simulation algorithms are used to prepare the evolution operator $U(t)$. Longer evolution time requires more quantum gates, and the best-known circuit complexity for running $e^{-\mathrm{i}Ht}$ without ancilla qubits is almost linear in $\|H\|\cdot|t|$ ($t$ can be negative) [19, 15]. The total runtime $T_{\text{total}}$ reflects the total circuit depth for running the algorithm. If $T_{\text{total}}=\mathcal{O}(\epsilon^{-1}\mathrm{poly}\log(\epsilon^{-1}))$, we say that the algorithm satisfies the Heisenberg limit. The formal definition of $T_{\text{total}}$ is

T_{\text{total}}=\sum_{t\in\mathcal{T}}M_{\mathrm{H}}\times|t|,

(3)

where $M_{\mathrm{H}}$ is the number of Hadamard tests required for each $y^{0}_{t}$. Usually, we set $M_{\mathrm{H}}$ to be of order $\mathcal{O}(\log(|\mathcal{T}|)\eta^{-2})$, so that $\{y^{0}_{t}\}_{t\in\mathcal{T}}$ can all be estimated within error $\eta$.

Another important metric of complexity is the maximal runtime $T_{\max}=\max_{t\in\mathcal{T}}|t|$ , which reflects the maximum circuit depth. Due to the difficulty in constructing large-size quantum circuits, the restriction on the maximal runtime is particularly important for early fault-tolerant quantum computers.
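As a small illustration of Eq. (3) and of the definition of $T_{\max}$, the following Python sketch computes both cost metrics for a given set of evolution times. The function names and the example numbers are illustrative assumptions and do not come from the algorithm itself.

```python
def total_runtime(times, shots_per_time):
    """T_total = sum over t in times of M_H * |t|, as in Eq. (3)."""
    return sum(shots_per_time * abs(t) for t in times)


def max_runtime(times):
    """T_max = max over t in times of |t|, a proxy for the maximum circuit depth."""
    return max(abs(t) for t in times)


# Hypothetical example: three evolution times (negative times are allowed).
times = [1.0, 4.0, -7.0]
shots = 10  # assumed number of Hadamard tests per time, M_H
t_total = total_runtime(times, shots)
t_max = max_runtime(times)
```

The two quantities can differ widely: a sample set with a few long times keeps $T_{\text{total}}$ moderate while $T_{\max}$, and hence the deepest circuit, stays large.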

The notations frequently used in the main text are summarized in Table 1.

Table 1: Notations

Notation	Meaning
$y(t)$	noisy time-domain signal
$y^{0}(t)$	ideal time-domain signal
$z(t)$	noise in time-domain signal
$x(k)$	ideal frequency-domain signal
$\mathcal{T}$	sample of times
$\tau$	unit time step
$N$	maximal evolution time (divided by $\tau$ )
$\epsilon$	accuracy on energy level
$\eta$	noise tolerance for each signal
$S$	sparsity of $x(k)$
$\gamma$	$\sqrt{S}/\log N$

2.2 Previous work

The classical aspect of QPE and QEEP involves estimating frequencies from statistically sampled sparse signals, a process akin to the objectives of sparse Fourier transformation (SFT) algorithms [20]. Based on the data types, SFT algorithms can be classified into discrete setting algorithms and continuous setting algorithms. A discrete SFT algorithm performs discrete Fourier transformation, where both the time and the frequency of the signal are restricted to a discrete set:

y_{t}=\sum_{k=0}^{N-1}x_{k}e^{-\mathrm{i}2\pi kt/N},\quad\{y_{t}\}_{t=0}^{N-1}\xrightarrow{\mathrm{discrete\ SFT}}\{x_{k}\}_{k=0}^{N-1}.

(4)

A continuous SFT algorithm [21, 22, 23] aims to accomplish a more general task:

y(t)=\sum_{f\in\mathcal{F}}p_{f}e^{-\mathrm{i}2\pi ft}\xrightarrow{\mathrm{continuous\ SFT}}x(k)=\sum_{f\in\mathcal{F}}p_{f}\delta(k-f),

(5)

where $y(t)$ is the time-domain signal, and $x(k)$ is the frequency-domain signal. For QPE, we do not assume the frequencies (energies) live in a discrete space. Thus, continuous SFT algorithms are more appropriate. In both setups, sparsity $S$ represents the number of distinct frequencies.

An SFT algorithm can be evaluated along several axes: its runtime complexity, sample complexity, and resolution. Here the runtime complexity refers to how long the algorithm takes on a classical computer, the sample complexity measures the number of time-domain signal samples required by the algorithm, and the resolution quantifies the differences between the true frequencies and their estimates. For example, the fast Fourier transform algorithm [24] has runtime complexity $\mathcal{O}(N\log N)$ with sample complexity $\mathcal{O}(N)$. So far, the best runtime complexity is $\mathcal{O}(S\log^{c}(N)\log(N/S))$ with $c>2$ [25], and the most sample-efficient algorithm requires only $\mathcal{O}(S\log S\log N)$ samples [26]. In practical scenarios the data are most likely noisy, so the algorithm must also be robust. Given the unique characteristics of our quantum setting, we prioritize the sample complexity, resolution, and robustness of an algorithm.

Several continuous SFT algorithms have been used for QEEP. To the best of our knowledge, [7] was the first attempt to solve QEEP with Hadamard tests, where QEEP was treated as a time-series analysis problem. Later, [5] emphasized the importance of the Heisenberg-limited scaling, and by applying the Fourier-filter function techniques, they designed the first Heisenberg-limited QPE algorithm for early fault-tolerant quantum computers. Their algorithm was further improved by follow-up work [9, 6], where the Gaussian derivative filter function was used to reduce the maximal runtime of the algorithm. In [6] $T_{\max}$ was reduced to a “constant” depth, i.e., a quantity that only depends on the spectral gap, at the expense of increasing $T_{\mathrm{total}}$ from $O(\epsilon^{-1}\mathrm{poly}\log(\epsilon^{-1}))$ to $O(\epsilon^{-2}\mathrm{poly}\log(\epsilon^{-1}))$, which made the algorithm not Heisenberg-limited.

Two recent QPE algorithms [12, 13], inspired by Robust Phase Estimation (RPE) [27, 28, 29], can also efficiently reduce the maximal runtime. These recent algorithms also improved the relation between $T_{\max}$ , the initial overlap $p_{0}$ , and the final accuracy $\epsilon$ . When the overlap $p_{0}$ is large, [12] reduces the prefactor $\tau_{c}$ in the maximum runtime scaling $T_{\max}=\tau_{c}/\epsilon$ by using a subroutine called the quantum complex exponential least squares (QCELS). In contrast to [5] in which the prefactor $\tau_{c}$ is at least $\pi$ , the prefactor in [12] can be arbitrarily close to $0$ as $p_{0}\to 1$ . In [16] and [30], the last two QPE algorithms have been extended to the QEEP setup.

Another recent work [17] proposed an efficient and versatile phase estimation algorithm named Quantum Multiple Eigenvalue Gaussian Filtered Search (QMEGS), which has most of the good properties mentioned above. Here we would like to emphasize its similarity to a signal processing algorithm named Orthogonal Matching Pursuit (OMP) [31]. OMP is a greedy algorithm that searches for the dominant frequencies of a signal by maximizing the overlaps step by step. QMEGS can be regarded as an OMP algorithm with a modified time sampling procedure to reduce the maximal and total runtime. The OMP algorithm has a strong connection with compressed sensing and can be potentially combined with our algorithm.

2.3 QPE by compressed sensing

Our main contribution is a simple and robust classical post-processing algorithm for QPE based on compressed sensing [32, 33, 34]. Our algorithm only requires sparse sampling of times from a discrete set.

Compressed sensing is a prominent signal-processing algorithm with wide applications in various domains such as time-frequency analysis, image processing, and quantum state tomography [35, 36, 37, 38]. It aims to solve special types of underdetermined linear inverse problems, i.e., given $y\in\mathbb{R}^{M}$ and $A\in\mathbb{R}^{M\times N}$ with $M\ll N$ , finding the unique sparse solution to $Ax=y,x\in\mathbb{R}^{N}$ . Certainly, the solution is not unique without further restrictions. If we assume $x$ is $S$ -sparse and $A$ satisfies the restricted isometry property (RIP) [33] over sparse signals, then $x$ can be uniquely recovered by solving a linear programming problem:

\min_{x\in\mathbb{R}^{N}}\|x\|_{1},\quad\mathrm{s.t.}\quad Ax=y.

(6)

If we set $x$ as the frequency-domain signal, $y$ as the signal on the time samples, and $A$ as the partial Fourier transformation operator, then this compressed sensing subroutine can be used for discrete SFT. It has been proved that with $\mathcal{O}(S\log N)$ samples, one can successfully recover the frequency-domain signal $x$ with high probability [32]. For noisy situations, the signal can still be recovered by solving the following quadratic programming problem [33]:

\min_{x\in\mathbb{R}^{N}}\|x\|_{1},\quad\mathrm{s.t.}\quad\|Ax-y\|_{2}\leq b.

(7)

The small number of required samples and the robustness against noisy sampling make compressed sensing an appealing post-processing algorithm.

Unfortunately, there is a significant drawback of compressed sensing: it only works for discrete SFT, not for continuous SFT. In other words, frequencies are assumed to be on a grid:

f\in\left\{\frac{n}{N}\right\},\quad n\in[N].

(8)

The on-grid assumption is unnatural for many signals in practice. The gap between the continuous world and the discrete model is termed basis mismatch in signal analysis. Although off-grid compressed sensing algorithms that aim to solve the basis mismatch have been proposed [39], their performance in our numerical tests is not ideal. We show that with a slight modification, the vanilla compressed sensing can be used for special types of continuous SFT tasks. In other words, our algorithm can solve the basis mismatch problem in special situations.
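The basis mismatch can be made concrete with a minimal Python sketch. It expands a single tone in the discrete basis of Eq. (4) and counts the coefficients that are not negligible. The signal length, the two test frequencies, and the threshold `0.01` are assumptions chosen only for illustration.

```python
import cmath


def tone(freq, n_samples):
    """Single-frequency signal y_t = exp(-i 2 pi f t) for t = 0..N-1."""
    return [cmath.exp(-2j * cmath.pi * freq * t) for t in range(n_samples)]


def grid_coefficients(signal):
    """Coefficients x_k with y_t = sum_k x_k exp(-i 2 pi k t / N), as in Eq. (4)."""
    n = len(signal)
    return [sum(y * cmath.exp(2j * cmath.pi * k * t / n)
                for t, y in enumerate(signal)) / n
            for k in range(n)]


def count_significant(coeffs, threshold=0.01):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coeffs if abs(c) > threshold)


N = 16
on_grid = count_significant(grid_coefficients(tone(3 / N, N)))      # f = 3/N
off_grid = count_significant(grid_coefficients(tone(3.5 / N, N)))   # f = 3.5/N
```

In this toy case the on-grid tone has a single significant coefficient, while the tone halfway between two grid points leaks into every coefficient, so it is no longer sparse in the discrete basis.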

An overview of our algorithm is as follows. For signal vectors of size $N$, when the frequencies are all nearly on-grid ($f\approx n/N,n\in\mathbb{Z}$) and the noise for each sample is bounded by a constant, the convex relaxation algorithm can recover the frequencies with only $\mathcal{O}(\log N)$ samples, which satisfies the Heisenberg limit. With no prior knowledge about $f$ (i.e., $f$ could be off-grid), we introduce a grid shift parameter $\nu$ such that after shifting the signal by $e^{-\mathrm{i}2\pi ft}\to e^{-\mathrm{i}2\pi(f-\nu/N)t}$, the dominant frequencies of the new signal become nearly on-grid. This step requires an assumption on the signal, but we will show that a wide range of signals satisfies such an assumption. For each trial of $\nu$, we run the compressed sensing subroutine on the data set $\{y_{t}\}_{t\in\mathcal{T}}$ to obtain a trial solution $s_{\nu}$. The optimal $\nu$ is the one with the smallest $\|s_{\nu}\|_{1}$. By searching for the optimal grid-shift parameter in a finite set $\mathcal{V}$, the accuracy of the dominant frequencies is $\mathcal{O}(\sigma N^{-1})$, where $\sigma$ quantifies the size of the minimal off-grid component. This quantity is related to the noise, the frequency gap, and the residual part of the signal. In terms of the maximum runtime $T_{\max}$, since the samples of the compressed sensing algorithm are integers in $[1,N]$, $T_{\max}$ scales linearly in $N$, and $T_{\text{total}}$ is $\mathcal{O}(N\log N)$. To further reduce the total runtime of the algorithm, we can assign a biased probability distribution to the sampled times, so that short times have a larger chance of being selected.

3 Main results

3.1 Algorithm

In this section, we present an overview of our algorithm for QPE using compressed sensing. We begin with the quantum part, which is stated in full in Algorithm 1. For each time $t\in\mathcal{T}$, $y^{0}(t)$ can be estimated by averaging over the outcomes of Hadamard tests. More precisely, by choosing $\mathbf{W}=I$, the measurement outcome in Fig. 1 is a random variable

h_{x}(t):=\begin{cases}+1,\quad p=\frac{1}{2}[1+\text{Re}(y^{0}(t))],\\ -1,\quad p=\frac{1}{2}[1-\text{Re}(y^{0}(t))].\\ \end{cases}

(9)

Similarly, when $\mathbf{W}=S^{\dagger}$ , the measurement outcome is another random variable

h_{y}(t):=\begin{cases}+1,\quad p=\frac{1}{2}[1+\text{Im}(y^{0}(t))],\\ -1,\quad p=\frac{1}{2}[1-\text{Im}(y^{0}(t))].\\ \end{cases}

(10)

Combining the two gives an unbiased estimator of $y^{0}(t)$:

\mathbb{E}[h_{x}(t)+\mathrm{i}h_{y}(t)]=y^{0}(t).

(11)

After sampling the random variables $h_{x}(t),h_{y}(t)$ for $M_{\mathrm{H}}$ times, we obtain a noisy signal:

y(t)=\overline{h_{x}(t)+\mathrm{i}h_{y}(t)}=y^{0}(t)+z(t).

(12)

Here the noise $z(t)$ originates from the statistical uncertainty of the Hadamard tests. Hoeffding’s inequality ensures that with probability $1-\delta^{\prime}$ , we have

|z(t)|=\mathcal{O}\left(\sqrt{\frac{1}{M_{\mathrm{H}}}\log\frac{1}{\delta^{\prime}}}\right).

(13)

In the rest of the paper, $z(t)$ does not always denote the same quantity, but it always represents the part of the signal that should be considered as noise. We now introduce the noise tolerance parameter $\eta$. To guarantee $|z(t)|<\eta,\forall t\in\mathcal{T}$ with probability at least $1-\delta$, a union bound over the $|\mathcal{T}|$ sampled times requires $\delta^{\prime}=\mathcal{O}(\delta|\mathcal{T}|^{-1})$, so that $M_{\mathrm{H}}=\Omega(\log(|\mathcal{T}|/\delta)/\eta^{2})$. For a rigorous proof, see Appendix A of [12]. The total runtime is thus

T_{\text{total}}=\sum_{t\in\mathcal{T}}M_{\mathrm{H}}\times|t|=\mathcal{O}\left(\log(|\mathcal{T}|\delta^{-1})\cdot\eta^{-2}\cdot\sum_{t\in\mathcal{T}}|t|\right).

(14)

If the signal recovery algorithm has parameters $\eta=\mathcal{O}(1),T_{\max}=\mathcal{O}(\epsilon^{-1})$ , and $|\mathcal{T}|=\mathcal{O}(\text{poly}\log(\epsilon^{-1}))$ , then it achieves the Heisenberg limit. In the next section, we will prove that our algorithm fits this description.
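To make the scaling $M_{\mathrm{H}}=\Omega(\log(|\mathcal{T}|/\delta)/\eta^{2})$ concrete, the sketch below picks explicit constants. They come from one possible derivation, stated here as an assumption: Hoeffding's inequality for $\pm 1$ outcomes gives $P(|\text{mean error}|\geq a)\leq 2e^{-Ma^{2}/2}$; choosing $a=\eta/\sqrt{2}$ for the real and the imaginary part ensures $|z(t)|\leq\eta$; and a union bound over the $2|\mathcal{T}|$ averages gives a failure probability of at most $4|\mathcal{T}|e^{-M\eta^{2}/4}$. The function names are hypothetical, and the rigorous statement is the one referenced in the text.

```python
import math


def shots_per_time(num_times, eta, delta):
    """Smallest M with 4 * num_times * exp(-M * eta^2 / 4) <= delta.

    M counts the shots used for each of the two Hadamard tests (W = I and
    W = S^dagger) at a single time. The constants are one valid choice under
    the Hoeffding plus union-bound argument described above, not the only one.
    """
    return math.ceil(4.0 * math.log(4.0 * num_times / delta) / eta**2)


def failure_bound(num_times, eta, shots):
    """Union-bound estimate of P(some |z(t)| > eta) for a given shot count."""
    return 4.0 * num_times * math.exp(-shots * eta**2 / 4.0)


# Hypothetical parameters: 10 sampled times, tolerance 0.5, failure prob. 1%.
m_h = shots_per_time(num_times=10, eta=0.5, delta=0.01)
```

Because $\eta$ can be a constant, the shot count grows only logarithmically with the number of sampled times, which is what keeps the total runtime within the Heisenberg limit.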

Algorithm 1 Signal estimation by Hadamard test

1: Input: Set of sampled integers $\mathcal{T}$, unit time step $\tau$, Hamiltonian $H$, an initial state $|\Phi\rangle$, error tolerance parameter $\eta$, failure probability $\delta$
2: Output: $\{y(n\tau),n\in\mathcal{T}\}$
3: for $n\in\mathcal{T}$ do
4:   Prepare the initial state $|\Phi\rangle$ and unitary operator $e^{-\mathrm{i}Hn\tau}$;
5:   Perform Hadamard tests for $\mathcal{O}(\log(|\mathcal{T}|/\delta)\eta^{-2})$ times;
6:   Compute the average value of the test outcomes as $y(n\tau)$
7: end for
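The sampling loop of Algorithm 1 can be imitated classically for a toy spectrum, which is convenient for testing the post-processing. The sketch below is such a stand-in, not a quantum implementation: it assumes the spectrum and overlaps are known, draws $\pm 1$ outcomes with the probabilities of Eqs. (9) and (10), and averages them. All names and the toy spectrum are hypothetical.

```python
import cmath
import random


def ideal_signal(t, energies, weights):
    """y0(t) = sum_l p_l * exp(-i * E_l * t), as in Eq. (1)."""
    return sum(p * cmath.exp(-1j * e * t) for e, p in zip(energies, weights))


def hadamard_estimate(t, energies, weights, shots, rng):
    """Average `shots` simulated +/-1 outcomes for W = I and for W = S^dagger."""
    y0 = ideal_signal(t, energies, weights)

    def mean_outcome(expectation):
        # Outcome +1 occurs with probability (1 + expectation) / 2.
        p_plus = 0.5 * (1.0 + expectation)
        total = sum(1 if rng.random() < p_plus else -1 for _ in range(shots))
        return total / shots

    return complex(mean_outcome(y0.real), mean_outcome(y0.imag))


def estimate_signal(sample_set, tau, energies, weights, shots, seed=0):
    """Classical stand-in for Algorithm 1: returns {n: y(n * tau)}."""
    rng = random.Random(seed)
    return {n: hadamard_estimate(n * tau, energies, weights, shots, rng)
            for n in sample_set}


# Hypothetical toy spectrum: a dominant level (weight 0.9) and a residual one.
energies, weights = [0.7, 2.1], [0.9, 0.1]
noisy = estimate_signal([1, 3, 8], tau=1.0, energies=energies,
                        weights=weights, shots=20000)
```

With this many shots the statistical error on each sample is of order $10^{-2}$, in line with the $1/\sqrt{M_{\mathrm{H}}}$ behavior of Eq. (13).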

Now we elaborate on the classical post-processing part. The goal is to recover the dominant frequencies of $y^{0}_{t}$ from the noisy samples $\{y(t)\}_{t\in\mathcal{T}}$. To rewrite the QEEP in Eq. (1) in the form of a compressed sensing problem, we first put the problem on a “grid”. Introduce a unit time step $\tau$, such that

y^{0}_{n}=\sum_{f\in\mathcal{F}}p_{f}e^{-\mathrm{i}2\pi fn},\quad f=\frac{E_{\ell}\tau}{2\pi},\quad\mathcal{F}=\left\{\frac{E_{\ell}\tau}{2\pi}:\ \ell\in\mathcal{L}_{\text{dom}}\right\}.

(15)

The dominant energy levels $\mathcal{L}_{\text{dom}}$ defined in Eq. (2) determine the frequency support $\mathcal{F}$. To keep the order of energy levels unchanged, we require that $E_{\ell}\tau\in[0,2\pi),\ \forall\ell$. This condition can always be satisfied by adding a constant to the Hamiltonian $H$ and choosing $\tau$ properly. The true data to be processed is $\{y_{n}=y^{0}_{n}+z_{n}\}_{n\in\mathcal{T}}$, where $z_{n}$ is the noise. Recall that the choice of $M_{\mathrm{H}}$ guarantees that $|z_{n}|\leq\eta,\forall n\in\mathcal{T}$ with high probability.
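One way to meet the condition $E_{\ell}\tau\in[0,2\pi)$ is sketched below in Python. It assumes that lower and upper bounds on the spectrum are available (the list passed in stands in for such bounds), and the safety factor `margin` is a hypothetical parameter introduced only for this illustration.

```python
import math


def rescale_to_unit_interval(energies, margin=0.05):
    """Return (shift, tau, freqs) such that (E + shift) * tau lies in [0, 2*pi).

    `shift` is the constant added to H and `tau` is the unit time step.
    `margin` keeps the largest frequency strictly below 1.
    """
    e_min, e_max = min(energies), max(energies)
    shift = -e_min                      # moves the lowest level to zero
    span = max(e_max - e_min, 1e-12)    # guard against a degenerate spectrum
    tau = 2.0 * math.pi / (span * (1.0 + margin))
    freqs = [(e + shift) * tau / (2.0 * math.pi) for e in energies]
    return shift, tau, freqs


# Hypothetical spectrum with a negative ground energy.
shift, tau, freqs = rescale_to_unit_interval([-3.0, 1.0, 4.5])
```

After this rescaling the frequencies $f=E_{\ell}\tau/2\pi$ fall in $[0,1)$ and keep the ordering of the energy levels, so the smallest recovered frequency corresponds to the ground energy.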

As discussed in Sec. 2.3, because we cannot always assume $f\approx\frac{n}{N},\forall f\in\mathcal{F}$ , the regular compressed sensing algorithm is not guaranteed to work. Our algorithm significantly relaxes the assumption by introducing a grid-shift parameter. As a simple instance, suppose the frequency support $\mathcal{F}$ satisfies

f=\frac{n+\nu}{N},\quad\nu\in[-1/2,1/2),\ n\in[N],\quad\forall f\in\mathcal{F}.

(16)

Then $y^{0}_{n}$ becomes an on-grid signal in a new basis. That is, it can be written as $y^{0}=F_{\nu}x$ with $x\in\mathbb{R}^{N}$, where $F_{\nu}$ is the shifted Fourier transformation:

(F_{\nu})_{nk}:=e^{-\mathrm{i}2\pi(k+\nu)n/N},\quad n,k\in[N].

(17)

The signal can then be recovered by solving

\min_{\bar{s}\in\mathbb{R}^{N}}\|\bar{s}\|_{1},\quad\mathrm{s.t.}\quad\left\|\mathcal{P}_{\mathcal{T}}\left(F_{\nu}\bar{s}-y\right)\right\|_{2}\leq\sqrt{|\mathcal{T}|}\eta.

(18)

Here $\mathcal{P}_{\mathcal{T}}$ denotes the projector onto the sampled indices,

\left(\mathcal{P}_{\mathcal{T}}\right)_{ij}=\delta_{ij}\cdot 1_{i\in\mathcal{T}}.

(19)

Define $F_{\nu,\mathcal{T}}:=\mathcal{P}_{\mathcal{T}}F_{\nu}$ and $y_{\mathcal{T}}:=\mathcal{P}_{\mathcal{T}}y$ in the following paragraphs. One can argue that a general signal does not satisfy the condition in Eq. (16) even approximately. We will address this issue in the next section, and validate the universality of our algorithm with numerical tests.

However, even if the condition Eq. (16) is satisfied, we still need to find this $\nu$ to run the compressed sensing subroutine. Our algorithm solves this second problem by brute-force search. We introduce a trial set $\mathcal{V}$ which contains evenly spaced real numbers on $[-1/2,1/2)$. For each $\nu\in\mathcal{V}$, we denote the solution of the compressed sensing subroutine by $s_{\nu}$, and we output the $s_{\nu}$ with the smallest 1-norm as the optimal solution. In short, our algorithm approximately solves the following optimization task:

\min_{\bar{s}\in\mathbb{R}^{N},\nu\in[-1/2,1/2)}\|\bar{s}\|_{1},\quad\mathrm{s.t.}\quad\|F_{\nu,\mathcal{T}}\bar{s}-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\eta.

(20)
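The role of the 1-norm as a selection criterion for $\nu$ can be illustrated without a convex solver. The Python sketch below is a simplified stand-in for Eq. (20), under two loudly stated assumptions: all $N$ samples are available and noiseless, so that the constraint leaves a single candidate $F_{\nu}^{-1}y$ for each $\nu$, and the restriction to real $\bar{s}$ is dropped. It then picks the trial $\nu$ whose candidate has the smallest 1-norm. The function names and the two-tone test signal are hypothetical.

```python
import cmath


def shifted_inverse(signal, nu):
    """Apply F_nu^{-1}: c_k = (1/N) sum_n y_n exp(+i 2 pi (k + nu) n / N)."""
    n_len = len(signal)
    return [sum(y * cmath.exp(2j * cmath.pi * (k + nu) * n / n_len)
                for n, y in enumerate(signal)) / n_len
            for k in range(n_len)]


def grid_shift_search(signal, num_trials):
    """Return (nu, coefficients) with the smallest 1-norm over the trial set."""
    best = None
    for j in range(num_trials):
        nu = -0.5 + j / num_trials          # evenly spaced trials on [-1/2, 1/2)
        coeffs = shifted_inverse(signal, nu)
        l1 = sum(abs(c) for c in coeffs)
        if best is None or l1 < best[0]:
            best = (l1, nu, coeffs)
    return best[1], best[2]


# Hypothetical test signal: two tones sharing the grid shift nu = 0.3.
N = 32
tones = {5: 0.7, 12: 0.3}  # grid index -> weight
signal = [sum(p * cmath.exp(-2j * cmath.pi * (k + 0.3) * n / N)
              for k, p in tones.items()) for n in range(N)]
nu_best, coeffs = grid_shift_search(signal, num_trials=10)
support = sorted(range(N), key=lambda k: -abs(coeffs[k]))[:2]
```

In this setting every candidate satisfies $\|c\|_{1}\geq|\sum_{k}c_{k}|=|y_{0}|$, with equality only when all coefficients share a phase, which is why the correct shift yields the smallest 1-norm. The sparse-sampling, noisy case treated in the text requires the actual convex program.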

In our analysis, the difference between the output $\nu^{\prime}$ and the ideal $\nu$ cannot be too large. This can be guaranteed by an extra run of sampling. The intuition is as follows. The output of a compressed sensing subroutine always matches the true signal on $\mathcal{T}$ . If the solution is good, then the recovered signal should be close to the true signal on another random sample set $\mathcal{T}_{2}$ . If the solution is bad, then the difference between the two signals will be very large on $\mathcal{T}_{2}$ . Using this idea, we can bound the difference $|\nu^{\prime}-\nu|$ properly.

The full algorithm is stated in Algorithm 2 with Algorithm 3 as a subroutine.

Algorithm 2 Quantum phase estimation by compressed sensing

1: Input: Signal length $N$, signal sparsity $S$, sampling ratio $r$, unit time step $\tau$, Hamiltonian $H$, initial state $|\Phi\rangle$, expected mean squared error $\eta$, threshold mean squared error $\eta_{T}$, noise tolerance parameter $\sigma$, failure probability $\delta$, size of the trial set $J$
2: Output: $E^{\ast}=2\pi(\min\mathcal{K}+\nu_{\ast})/N$
3: Sample integers from $[N]$ with sampling ratio $r$. Denote the samples by $\mathcal{T}$
4: Apply Algorithm 1 with input $(\mathcal{T},\tau,H,|\Phi\rangle,\eta,\delta)$. Denote the output by $\{y_{n}\}_{n\in\mathcal{T}}$
5: for $j=0,1,\cdots,J-1$ do
6:   Set $\nu_{j}=-1/2+j/J$
7:   Solve $\min_{\bar{s}\in\mathbb{R}^{N}}\|\bar{s}\|_{1},\quad\mathrm{s.t.}\quad\|F_{\nu_{j},\mathcal{T}}\bar{s}-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma$ to obtain $s_{\nu_{j}}$. If there is no feasible solution, set $s_{\nu_{j}}=(1,1,\cdots,1)$
8:   Record $(\nu_{j},s_{\nu_{j}})$ as a solution.
9: end for
10: Sample integers from $[N]$ with sampling ratio $r$. Denote the set of samples by $\mathcal{T}_{2}$
11: Apply Algorithm 1 with input $(\mathcal{T}_{2},\tau,H,|\Phi\rangle,\eta,\delta)$. Denote the output by $\{y_{n}\}_{n\in\mathcal{T}_{2}}$
12: for $j=0,1,\cdots,J-1$ do
13:   Apply Algorithm 3 with input $(\nu_{j},s_{\nu_{j}},\{y_{m}\}_{m\in\mathcal{T}_{2}},\eta_{T})$. Denote the output by $o_{j}$
14:   if $o_{j}=0$ then
15:     $\ell_{j}=N+1$
16:   else
17:     $\ell_{j}=\|s_{\nu_{j}}\|_{1}$
18:   end if
19: end for
20: Let $j^{\ast}=\arg\min_{j}\ell_{j}$. Denote the corresponding solution by $(\nu_{\ast},s_{\nu_{\ast}})$
21: Find the $S$ entries of $s_{\nu_{\ast}}$ with the largest amplitudes. Denote the set of indices as $\mathcal{K}=\{k^{\ast}_{1},k^{\ast}_{2},\cdots,k^{\ast}_{S}\}$

Algorithm 3 Test of another sampling

1: Input: Parameter $\nu$, solution $s_{\nu}$, signal data $\{y_{m}\}_{m\in\mathcal{T}_{2}}$, threshold mean square error $\eta_{T}$
2: Output: 0 if the data fails the test; 1 if the data passes the test.
3: Compute the total empirical error with respect to the new set $\mathcal{E}:=\sum_{m\in\mathcal{T}_{2}}|(F_{\nu}s_{\nu})_{m}-y_{m}|^{2}.$
4: if $\mathcal{E}\geq|\mathcal{T}_{2}|\eta^{2}_{T}$ then
5:   Return 0.
6: else Return 1.
7: end if
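The test on a second sample set can be sketched in a few lines of Python. The sketch assumes that the candidate solution is mapped to the time domain with the forward map $F_{\nu}$ of Eq. (17), as in the constraint of Eq. (18), and that the fresh data are stored as a dictionary from sampled integers to measured values. The function names and the single-tone example are hypothetical.

```python
import cmath


def reconstruct(nu, coeffs, m):
    """Time-domain value (F_nu s)_m = sum_k s_k exp(-i 2 pi (k + nu) m / N)."""
    n_len = len(coeffs)
    return sum(s * cmath.exp(-2j * cmath.pi * (k + nu) * m / n_len)
               for k, s in enumerate(coeffs))


def second_sample_test(nu, coeffs, data, eta_t):
    """Return 1 if the candidate passes on the fresh samples, else 0.

    `data` maps each sampled integer m in T_2 to the measured value y_m.
    """
    error = sum(abs(reconstruct(nu, coeffs, m) - y) ** 2
                for m, y in data.items())
    return 0 if error >= len(data) * eta_t ** 2 else 1


# Hypothetical check: one tone at grid index 5 with shift 0.3 and N = 32.
N = 32
truth = [0.0] * N
truth[5] = 1.0
fresh = {m: reconstruct(0.3, truth, m) for m in (2, 9, 17)}
passes_good = second_sample_test(0.3, truth, fresh, eta_t=0.1)
passes_bad = second_sample_test(-0.2, truth, fresh, eta_t=0.1)
```

A candidate with the wrong shift may reproduce the data on the set it was fitted to, but on independent samples its reconstruction drifts away from the measurements and the accumulated error exceeds the threshold.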

3.2 Analysis

Suppose our target signal (without the error from the Hadamard tests) is

y^{0}_{t}=\sum_{f\in\mathcal{F}}p_{f}e^{-\mathrm{i}2\pi ft},\quad f=\frac{n_{f}+\nu_{f}}{N},

(21)

where $n_{f}$ is the integer nearest to $Nf$ and $\nu_{f}\in[-1/2,1/2)$ is the remaining fractional offset. Given a shifted Fourier transformation matrix $F_{\nu}$, every signal $y^{0}$ can be uniquely decomposed as

y^{0}=y^{0}_{\nu,\mathrm{on}}+y^{0}_{\nu,\mathrm{off}}:=F_{\nu}x_{\nu,\mathrm{on}}+\mathrm{i}F_{\nu}x_{\nu,\mathrm{off}},\quad x_{\nu,\mathrm{on}},x_{\nu,\mathrm{off}}\in\mathbb{R}^{N},

(22)

where $y^{0}_{\nu,\mathrm{on}}$ and $y^{0}_{\nu,\mathrm{off}}$ are called the on-grid and off-grid components of $y^{0}$ with respect to $\nu$. We call such a decomposition a grid decomposition. Let $\nu_{\mathrm{opt}}$ be the parameter that minimizes $\|y^{0}_{\nu,\mathrm{off}}\|_{2}$, and denote the corresponding grid decomposition by $y^{0}_{\mathrm{on}}:=y^{0}_{\nu_{\mathrm{opt}},\mathrm{on}},y^{0}_{\mathrm{off}}:=y^{0}_{\nu_{\mathrm{opt}},\mathrm{off}}$. We call it the optimal grid decomposition of $y^{0}$, and denote $x_{\nu_{\mathrm{opt}},\mathrm{on}}$ by $x_{\mathrm{on}}$ henceforth.
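The grid decomposition of Eq. (22) amounts to taking the real and imaginary parts of $F_{\nu}^{-1}y^{0}$. The Python sketch below computes it and scans a finite trial set for the shift with the smallest off-grid norm. The trial grid, the signal length, and the single-tone example are assumptions made for illustration, and the scan over a finite set only approximates the minimization that defines $\nu_{\mathrm{opt}}$.

```python
import cmath
import math


def grid_decomposition(signal, nu):
    """Return real vectors (x_on, x_off) with y = F_nu x_on + i F_nu x_off."""
    n_len = len(signal)
    coeffs = [sum(y * cmath.exp(2j * cmath.pi * (k + nu) * n / n_len)
                  for n, y in enumerate(signal)) / n_len
              for k in range(n_len)]
    return [c.real for c in coeffs], [c.imag for c in coeffs]


def off_grid_norm(signal, nu):
    """2-norm of y_off = i F_nu x_off, which equals sqrt(N) * ||x_off||_2."""
    _, x_off = grid_decomposition(signal, nu)
    return math.sqrt(len(signal) * sum(v * v for v in x_off))


# Hypothetical single tone at f = (4 + 0.25) / N with N = 16.
N = 16
signal = [cmath.exp(-2j * cmath.pi * (4 + 0.25) * n / N) for n in range(N)]
trial_shifts = [-0.5 + j / 20 for j in range(20)]
nu_opt = min(trial_shifts, key=lambda nu: off_grid_norm(signal, nu))
```

For this tone the off-grid component vanishes at the true shift and is of order one elsewhere, which is the behavior that $\sigma_{\mathrm{off}}$ is meant to quantify for general signals.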

If the algorithm can successfully recover $\{n_{f}\}_{f\in\mathcal{F}}$ , then the frequency support $\mathcal{F}$ is approximated by

\left\{\frac{2\pi}{N}(n_{f}+\nu_{\mathrm{opt}})\right\}_{f\in\mathcal{F}}.

(23)

The final accuracy on frequency $f$ is simply $2\pi|\nu_{f}-\nu_{\mathrm{opt}}|/N$ . The size of $\|y^{0}_{\mathrm{off}}\|_{\infty}$ is directly related to the accuracy of the algorithm. We denote it by $\sigma_{\mathrm{off}}$ . In Appendix B.1 we prove that

Lemma 1.

Let $y^{0}$ be a length-$N$ signal of the form in Eq. (21), with optimal grid decomposition $y^{0}=y^{0}_{\mathrm{on}}+y^{0}_{\mathrm{off}}$ and optimal parameter $\nu_{\mathrm{opt}}$. If $\min_{g\neq f\in\mathcal{F}}|f-g|\geq N^{-1}$ and $p_{f}\geq p_{\min}$ for all $f\in\mathcal{F}$, then

|\nu_{f}-\nu_{\mathrm{opt}}|=\mathcal{O}\left(\frac{\sigma_{\mathrm{off}}}{p_{\min}}\right).

(24)

Without the frequency gap lower bound, we have

|\nu_{f}-\nu_{\mathrm{opt}}|\leq\sqrt{\frac{\sigma_{\mathrm{off}}}{4p_{\min}}}.

(25)

Note that as $p_{0}\to 1$ , we have $\sigma_{\mathrm{off}}\to 0$ , and the signal gets close to an ideal single-frequency function. Hence, the error can be made arbitrarily small. A large class of signals fits the assumption in Lemma 1. Here we list two types of signals of interest:

•

Signal with a small initial overlap with the ground state. In contrast to other QPE algorithms [12, 13] where $p_{0}>\frac{1}{2}$ is required, our algorithm outputs the dominant on-grid approximation of the signal ( $y_{t}\approx\sum_{n}p_{n}e^{-\mathrm{i}2\pi nt/N}$ ), instead of the dominant single-frequency approximation of the signal ( $y_{t}\approx pe^{-\mathrm{i}2\pi ft},f\in[0,1)$ ). Hence, even if $p_{0}<\frac{1}{2}$ , as long as the dominant part is large enough, the ground energy can be well-estimated.
•

Signal with no frequency gap. When $|f_{1}-f_{2}|\ll 2\pi N^{-1}$ , in the picture of grid decomposition, the two frequencies can be replaced by a single frequency $f_{3}$ , and the tiny frequency gap is absorbed into the off-grid component. If $f_{3}$ can be approximated with high accuracy, we can use it as an estimate for $f_{1}$ , since $|f_{1}-f_{3}|\ll 2\pi N^{-1}$ .

Next, we will prove that our algorithm works when $\sigma_{\mathrm{off}}+\eta$ is small. Choose an arbitrary $\nu$ . Consider the grid decomposition of $y^{0}_{\mathrm{on}}$ with respect to $\nu$ :

x^{\mathrm{R}}_{\nu}:=\mathrm{Re}(F^{-1}_{\nu}y^{0}_{\mathrm{on}}),\quad x^{\mathrm{I}}_{\nu}:=\mathrm{Im}(F^{-1}_{\nu}y^{0}_{\mathrm{on}}),

(26)

so that the signal to be analyzed can be decomposed as

y=F_{\nu}(x^{\mathrm{R}}_{\nu}+\mathrm{i}x^{\mathrm{I}}_{\nu})+y^{0}_{\mathrm{off}}+z,

(27)

where $z$ is the uncertainty from the Hadamard tests that satisfies $\|z\|_{\infty}\leq\eta$ . By definition, $x^{\mathrm{R}}_{\nu_{\mathrm{opt}}}=x_{\mathrm{on}},\ x^{\mathrm{I}}_{\nu_{\mathrm{opt}}}=0$ . Suppose $s_{\nu}$ is the solution of

\min_{\bar{s}\in\mathbb{R}^{N}}\|\bar{s}\|_{1},\quad\mathrm{s.t.}\quad\|F_{{\nu},\mathcal{T}}\bar{s}-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma,

(28)

where $\sigma$ is the parameter in the compressed sensing subroutine that we can choose. Now we try to find a sufficient condition for $\bar{s}=x^{\mathrm{R}}_{\nu}$ to be feasible, i.e.,

\|F_{{\nu},\mathcal{T}}x^{\mathrm{R}}_{\nu}-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma,

(29)

so that we can bound $\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{2}$ . Note that

F_{{\nu},\mathcal{T}}x^{\mathrm{R}}_{\nu}-y_{\mathcal{T}}=-\mathrm{i}F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}-y^{0}_{\mathrm{off},\mathcal{T}}-z_{\mathcal{T}},

(30)

where $\|y^{0}_{\mathrm{off},\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma_{\mathrm{off}},\ \|z_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\eta$ , and $\|F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}\|_{2}$ can be bounded by the following lemma.

Lemma 2 (Concentration of $\|F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}\|_{2}$ ).

Suppose $\mathcal{T}$ is an integer set in $[N]$ generated with sampling ratio $r=\mathcal{O}(N^{-1}\log N)$ , and $|\nu|<1/2$ . Then

\|F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}\|_{2}\leq\sqrt{|\mathcal{T}|}\cdot\frac{4\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|

(31)

with probability at least $1-1/\mathrm{poly}(N)$ .

See Appendix B.2 for the proof. Therefore, if

\frac{4\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|+\sigma_{\mathrm{off}}+\eta\leq\sigma,

(32)

then $x^{\mathrm{R}}_{\nu}$ is guaranteed to be feasible, so that $\|s_{\nu}\|_{1}\leq\|x^{\mathrm{R}}_{\nu}\|_{1}$ . Meanwhile,

\|F_{{\nu},\mathcal{T}}(s_{\nu}-x^{\mathrm{R}}_{\nu})\|_{2}\leq\|F_{{\nu},\mathcal{T}}s_{\nu}-y_{\mathcal{T}}\|_{2}+\|F_{{\nu},\mathcal{T}}x^{\mathrm{R}}_{\nu}-y_{\mathcal{T}}\|_{2}\leq 2\sqrt{|\mathcal{T}|}\sigma.

(33)

The last inequality holds because both $x^{\mathrm{R}}_{\nu}$ and $s_{\nu}$ are feasible solutions of Eq. (28). With this upper bound, we can use standard results in compressed sensing (see Theorem 3) to bound $\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{2}$ , and thus $\|s_{\nu}-x_{\mathrm{on}}\|_{2}$ .
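The constrained program in Eq. (28) would normally be handed to a convex solver. As a dependency-free illustration, the sketch below instead minimizes the penalized (LASSO) form $\frac{1}{2}\|F_{\nu,\mathcal{T}}s-y_{\mathcal{T}}\|_{2}^{2}+\lambda\|s\|_{1}$ over real $s$ by iterative soft thresholding (ISTA). This is a stand-in technique, not the solver analyzed above; the matrix convention $(F_{\nu})_{t,n}=e^{-\mathrm{i}2\pi(n+\nu)t/N}$ , the sample set, and the value of $\lambda$ are assumptions chosen for illustration.

```python
import cmath
import math

def shifted_rows(N, nu, times):
    """Rows of F_nu restricted to the sampled times (assumed convention:
    (F_nu)[t][n] = exp(-2j*pi*(n + nu)*t/N), distinct times in 0..N-1)."""
    return [[cmath.exp(-2j * math.pi * (n + nu) * t / N) for n in range(N)]
            for t in times]

def ista_l1(A, y, lam, num_iters=1000):
    """Minimize 0.5*||A s - y||_2^2 + lam*||s||_1 over real vectors s."""
    N = len(A[0])
    # For A built by shifted_rows, the rows are orthogonal with squared norm N,
    # so the gradient is N-Lipschitz and 1/N is a valid step size.
    step = 1.0 / N
    s = [0.0] * N
    for _ in range(num_iters):
        resid = [sum(a * v for a, v in zip(row, s)) - yt for row, yt in zip(A, y)]
        # The gradient with respect to real s is Re(A^H resid).
        grad = [sum((A[i][n].conjugate() * resid[i]).real for i in range(len(A)))
                for n in range(N)]
        new_s = []
        for v, g in zip(s, grad):
            u = v - step * g  # gradient step
            # Soft thresholding with threshold step * lam.
            new_s.append(math.copysign(max(abs(u) - step * lam, 0.0), u))
        s = new_s
    return s

# Example: one on-grid frequency n = 5 observed at 12 of N = 32 times.
N, nu = 32, 0.0
times = [0, 1, 2, 5, 7, 11, 12, 16, 19, 23, 26, 30]
y_T = [cmath.exp(-2j * math.pi * 5 * t / N) for t in times]
s_hat = ista_l1(shifted_rows(N, nu, times), y_T, lam=0.5)
support_peak = max(range(N), key=lambda n: s_hat[n])
```

In this noiseless toy case the recovered vector is dominated by index 5, with its amplitude slightly shrunk by the penalty. The penalized form trades the explicit residual bound $\sqrt{|\mathcal{T}|}\sigma$ for the weight $\lambda$ , so the two parameters are not interchangeable without tuning.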

Introduce $\gamma:=\sqrt{S}/\log N$ for simplicity, where $S$ is the estimated sparsity of the signal. In the following sections, the sum $\sigma_{\mathrm{off}}+\eta$ appears frequently and is usually compared with $\sigma$ . Hence, we introduce $C_{0}:=(\sigma_{\mathrm{off}}+\eta)/\sigma$ . In Appendix B.3, we prove:

Lemma 3 (A good $\nu$ generates a good solution).

Suppose

|\nu-\nu_{\mathrm{opt}}|\leq\gamma\sigma,\quad C_{0}\leq 1-\frac{4\pi\gamma}{\sqrt{3}}.

(34)

Then

\|s_{\nu}-x_{\mathrm{on}}\|_{2}\leq C_{3}\sigma,\quad C_{3}:=C_{1}+C_{2}\pi+\frac{2\pi\gamma}{\sqrt{3}}.

(35)

The constants $C_{1},C_{2}$ are defined in Appendix A.
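To get a feeling for the conditions in Eq. (34), the following sketch evaluates $\gamma$ , $C_{0}$ , the cap $1-4\pi\gamma/\sqrt{3}$ , and $C_{3}$ for given parameters. It assumes that $\log$ denotes the natural logarithm, and the values $C_{1}=C_{2}=1$ are placeholders, since the constants of Theorem 3 are not fixed numerically here.

```python
import math

def lemma3_quantities(S, N, sigma, sigma_off, eta, C1, C2):
    """Evaluate gamma, C0, the cap on C0 from Eq. (34), and C3 from Eq. (35).

    Assumes 'log' is the natural logarithm; C1 and C2 are placeholder
    values for the constants of Theorem 3.
    """
    gamma = math.sqrt(S) / math.log(N)
    C0 = (sigma_off + eta) / sigma
    C0_cap = 1.0 - 4.0 * math.pi * gamma / math.sqrt(3.0)
    C3 = C1 + C2 * math.pi + 2.0 * math.pi * gamma / math.sqrt(3.0)
    return {"gamma": gamma, "C0": C0, "C0_cap": C0_cap,
            "nu_tolerance": gamma * sigma, "C3": C3}

# Illustrative parameters (made up for this example).
q = lemma3_quantities(S=1, N=4096, sigma=0.1, sigma_off=0.005, eta=0.005,
                      C1=1.0, C2=1.0)
conditions_hold = q["C0"] <= q["C0_cap"]
```

Under the natural-logarithm assumption and with $S=1$ , the cap on $C_{0}$ is positive only when $\log N>4\pi/\sqrt{3}\approx 7.26$ , so the second condition in Eq. (34) can only be met for sufficiently long signals.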

Let $\nu_{1}$ be the parameter in $\mathcal{V}$ that is closest to $\nu_{\mathrm{opt}}$ , and $\nu_{\ast}$ be the one with the smallest $\|s_{\nu_{\ast}}\|_{1}$ . If $\nu_{\ast}=\nu_{1}$ , then Lemma 3 already provides a proper upper bound on $\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2}$ . Otherwise, given $\|s_{\nu_{\ast}}\|_{1}\leq\|s_{\nu_{1}}\|_{1}$ , if we have a proper upper bound on $\|F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}-F_{{\nu_{1}},\mathcal{T}}s_{\nu_{1}}\|_{2}$ , we can bound $\|s_{\nu_{\ast}}-s_{\nu_{1}}\|_{2}$ as well, and thus $\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2}$ . Unfortunately, in order to estimate $\|F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}-F_{{\nu_{1}},\mathcal{T}}s_{\nu_{1}}\|_{2}$ , we need an upper bound on $|\nu_{\ast}-\nu_{1}|$ that is linear in $\sigma$ , which is hard to prove. To bypass this difficulty, we confine the value of $|\nu_{\ast}-\nu_{\mathrm{opt}}|$ using Algorithm 3. In the following lemma, we prove that a solution generated by a bad $\nu$ , i.e., one for which $|\nu-\nu_{\mathrm{opt}}|$ is large, cannot pass the test.

Lemma 4 (A bad $\nu$ generates a bad solution).

Suppose $x_{\mathrm{on}}=\sum_{n\in\mathcal{N}}q_{n}\delta_{n}$ . Let

C_{4}:=\|x_{\mathrm{on}}\|_{2}\sqrt{4-\pi^{2}|\mathcal{N}|^{2}N^{-1}},\quad C_{5}:=\sqrt{\frac{3}{2}(C_{3}^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2})}.

(36)

Sample integers from $[N]$ uniformly at random and denote the resulting set by $\mathcal{S}$ . If $\nu$ satisfies

|\nu-\nu_{\mathrm{opt}}|>C_{4}^{-1}\sqrt{C_{0}^{2}+2C_{5}^{2}}\sigma,

(37)

then with probability at least $1-\exp[-\mathcal{O}(|\mathcal{S}|)]$ , we have

\sum_{t\in\mathcal{S}}\left|y_{t}-(F_{\nu}s_{\nu})_{t}\right|^{2}>|\mathcal{S}|C_{5}^{2}\sigma^{2},

(38)

and there exists at least one $\nu^{\prime}\in\mathcal{V}$ that satisfies

\sum_{t\in\mathcal{S}}\left|y_{t}-(F_{\nu^{\prime}}s_{\nu^{\prime}})_{t}\right|^{2}\leq|\mathcal{S}|C_{5}^{2}\sigma^{2}

(39)

with probability at least $1-\exp[-\mathcal{O}(|\mathcal{S}|)]$ .

The proof of Lemma 4 is given in Appendix B.4. To sum up, the performance of our algorithm is guaranteed by the following arguments:

•

If there exists a $\nu_{1}\in\mathcal{V}$ that is close enough to $\nu_{\mathrm{opt}}$ , then we can obtain an accurate recovery of the signal from $\nu_{1}$ , so that all the dominant frequencies of $x_{\mathrm{on}}$ are preserved in $s_{\nu_{1}}$ .
•

Using the test of another sampling, we can narrow down the choice of $\nu$ to a small region.
•

Suppose the solution with the smallest 1-norm is $(\nu_{\ast},s_{\nu_{\ast}})$ . We can prove that $s_{\nu_{\ast}}$ is close enough to $s_{\nu_{1}}$ , so that all the dominant frequencies of $s_{\nu_{1}}$ are preserved in $s_{\nu_{\ast}}$ .
•

Because all the dominant frequencies of $s_{\nu_{1}}$ are preserved in $s_{\nu_{\ast}}$ , the accuracy of the final result is $2\pi|\nu_{f}-\nu_{0}|/N$ , whose value is bounded by Lemma 1.
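The selection logic summarized in the arguments above can be sketched as follows. The sketch assumes that every trial $\nu$ has already produced a solution $s_{\nu}$ and a squared residual on an independent test set $\mathcal{S}$ ; the record layout, the function name, and the toy numbers are hypothetical.

```python
def select_grid_shift(candidates, test_size, C5, sigma):
    """Pick the candidate with the smallest 1-norm among those passing the test.

    candidates: list of dicts with keys
        "nu"            - trial grid shift,
        "s"             - recovered real vector s_nu,
        "test_residual" - sum over the test set of |y_t - (F_nu s_nu)_t|^2.
    A candidate passes if test_residual <= test_size * C5^2 * sigma^2,
    which is the condition in Eq. (39).
    """
    threshold = test_size * C5 ** 2 * sigma ** 2
    passed = [c for c in candidates if c["test_residual"] <= threshold]
    if not passed:
        return None  # no trial shift survives the test
    return min(passed, key=lambda c: sum(abs(v) for v in c["s"]))

# Toy records (made-up numbers, for illustration only).
candidates = [
    {"nu": -0.3, "s": [0.0, 0.2, 0.1], "test_residual": 4.0},   # fails the test
    {"nu": 0.1, "s": [0.0, 0.9, 0.0], "test_residual": 0.02},   # passes
    {"nu": 0.2, "s": [0.1, 0.9, 0.2], "test_residual": 0.03},   # passes, larger 1-norm
]
best = select_grid_shift(candidates, test_size=10, C5=1.0, sigma=0.1)
```

In the toy data the first record has the smallest 1-norm but fails the residual test, so it is discarded before the 1-norm comparison; this is the role the test plays in the argument above.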

Finally, we have the following result regarding the accuracy of the algorithm.

Theorem 1.

Suppose the target length- $N$ signal $y$ can be written as $y=y^{0}_{\mathrm{on}}+y^{0}_{\mathrm{off}}+z$ , where $\|z\|_{\infty}\leq\eta,\ \|y^{0}_{\mathrm{off}}\|_{\infty}=\sigma_{\mathrm{off}}$ and $y^{0}_{\mathrm{on}}=\sum_{n\in\mathcal{N}}q_{n}e^{-\mathrm{i}2\pi nt/N}$ . Let

\mathcal{N}_{\mathrm{dom}}=\{n\in\mathcal{N}:q_{n}\geq p_{\min}\},\quad S=|\mathcal{N}_{\mathrm{dom}}|,\quad\mathcal{N}_{\mathrm{res}}=\{n\in\mathcal{N}:q_{n}<p_{\min}\}.

(40)

Suppose the parameters of Algorithm 2 satisfy

	$\displaystyle r=\mathcal{O}(N^{-1}S\log^{2}S\log N),\quad C_{0}\leq 1-\frac{4\pi\gamma}{\sqrt{3}},$		(41)
	$\displaystyle\eta_{T}=\mathcal{O}(C_{5}\sigma),\quad J\geq\lceil(\gamma\sigma)^{-1}\rceil,$		(42)

then Algorithm 2 can be accomplished within classical runtime $\mathcal{O}(JN\log N)$ , and the optimal solution $s_{\nu_{\ast}}$ satisfies

\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2}=\mathcal{O}\left(\sigma\right)

(43)

with probability at least $1-1/\mathrm{poly}(N)$ .

The formal proof of Theorem 1 can be found in Appendix B. As a direct corollary, suppose

p_{\mathrm{gap}}:=\min_{n\in\mathcal{N}_{\mathrm{dom}}}p_{n}-\max_{n\in\mathcal{N}_{\mathrm{res}}}p_{n}\geq\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2};

(44)

then the accuracy of Algorithm 2 has an upper bound

\epsilon=\mathcal{O}\left(\frac{2\pi}{N}\sqrt{\frac{C_{0}\sigma}{p_{\min}}}\right).

(45)

If the original signal has a frequency gap, $\min_{g\neq f}|f-g|\geq N^{-1},\forall p_{f}\geq p_{\min}$ , then the accuracy can be improved to

\epsilon=\mathcal{O}\left(\frac{2\pi C_{0}\sigma}{Np_{\min}}\right).

(46)

In our numerical tests, the optimal $\nu$ that minimizes $\|s_{\nu}\|_{1}$ is always close to $\nu_{\mathrm{opt}}$ , thus we conjecture that Algorithm 3 is unnecessary.

4 Numerical results

4.1 Previous algorithms

In this section, we provide a few numerical tests and compare our algorithm with previous works. First, we briefly introduce three existing algorithms for QPE: ML-QCELS [12], MM-QCELS [16], and QMEGS [17]. The last two can be used for QEEP as well.

The outline of the first two algorithms can be described as follows. The ML-QCELS algorithm has a hierarchical structure, namely, the algorithm can be divided into several hierarchies. At each hierarchy, Hadamard tests are used to estimate the signal $y(t)$ at $N_{0}$ different times. The algorithm then outputs an estimate of the dominant frequency by minimizing the following cost function:

L(r,E)=\frac{1}{N_{0}}\sum_{n=1}^{N_{0}}\left|re^{-\mathrm{i}En\tau}-y(n\tau)\right|^{2}.

(47)

In the next hierarchy, they search for solutions in a narrower region and obtain a new estimate. Eventually, they generate an accurate estimation of the dominant frequency. ML-QCELS has proved to be efficient for single-phase estimation but not for multiple-phase estimation. The authors later proposed the multiple-phase version named MM-QCELS [16]. In this algorithm, the original ML-QCELS is adapted in two aspects: the time samples $\mathcal{T}$ are drawn from a probability distribution $a_{T}(t)$ , and the cost function is changed to

L\left(\{r_{k},E_{k}\}_{k=1}^{K}\right)=\frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}\left|\sum_{k=1}^{K}r_{k}e^{-\mathrm{i}E_{k}t}-y(t)\right|^{2}.

(48)

Besides, when applying MM-QCELS to single-phase estimation, the hierarchical structure can be removed, so that the algorithm has only one level and hence becomes non-adaptive.
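As an illustration of the single-frequency cost function in Eq. (47), the sketch below evaluates $L(r,E)$ and minimizes it by a plain grid search over $E$ , eliminating the complex amplitude $r$ in closed form. It covers one level only and is not the reference ML-QCELS implementation; the grid, the noiseless test signal, and the function names are assumptions made for this example.

```python
import cmath

def qcels_cost(r, E, samples, tau):
    """Eq. (47): mean squared misfit of r*exp(-i E n tau) to y(n tau), n = 1..N0."""
    N0 = len(samples)
    return sum(abs(r * cmath.exp(-1j * E * n * tau) - samples[n - 1]) ** 2
               for n in range(1, N0 + 1)) / N0

def best_amplitude(E, samples, tau):
    """For fixed E the cost is quadratic in complex r; this average minimizes it."""
    N0 = len(samples)
    return sum(cmath.exp(1j * E * n * tau) * samples[n - 1]
               for n in range(1, N0 + 1)) / N0

def fit_single_frequency(samples, tau, energy_grid):
    """Grid search over E with r eliminated in closed form."""
    return min(energy_grid,
               key=lambda E: qcels_cost(best_amplitude(E, samples, tau), E, samples, tau))

# Example: noiseless single-frequency signal with E0 = 1.0.
tau, N0, E0 = 1.0, 16, 1.0
samples = [cmath.exp(-1j * E0 * n * tau) for n in range(1, N0 + 1)]
energy_grid = [k * 0.05 for k in range(60)]  # 0.0, 0.05, ..., 2.95
E_hat = fit_single_frequency(samples, tau, energy_grid)
```

The multi-frequency cost in Eq. (48) has the same structure, but the amplitudes $r_{k}$ then solve a small least-squares system for each choice of $\{E_{k}\}$ instead of a single average.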

QMEGS [17] samples times from a continuous probability distribution as well. Instead of estimating the frequencies by minimizing $L(\{r_{k},E_{k}\}_{k=1}^{K})$ , the algorithm finds the dominant frequency of the target signal by solving

\max_{f}\left|\sum_{t\in\mathcal{T}}y(t)e^{\mathrm{i}2\pi ft}\right|^{2}.

(49)

Suppose the solution of this step is $f^{\ast}$ . In the next step, they search for a solution in the region $[0,1]\setminus(f^{\ast}-\delta f,f^{\ast}+\delta f)$ , which gives an estimate of the sub-dominant frequency. By repeating this procedure, one can eventually obtain all the dominant frequencies.
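The search-and-exclude procedure can be sketched as follows, assuming that the search objective is the magnitude of the filtered sum $\left|\sum_{t}y(t)e^{\mathrm{i}2\pi ft}\right|$ , which is our reading of the Gaussian-filtered search in [17]. For simplicity the sketch uses uniform integer times and a finite frequency grid in place of random continuous-time sampling; all names and parameter values are illustrative.

```python
import cmath
import math

def filtered_magnitude(f, times, samples):
    """|sum_t y(t) exp(i 2 pi f t)|, large when f matches a dominant frequency."""
    return abs(sum(y * cmath.exp(2j * math.pi * f * t) for t, y in zip(times, samples)))

def sequential_search(times, samples, freq_grid, num_freqs, delta_f):
    """Repeatedly take the strongest grid frequency, then exclude its neighbourhood."""
    found = []
    remaining = list(freq_grid)
    for _ in range(num_freqs):
        f_star = max(remaining, key=lambda f: filtered_magnitude(f, times, samples))
        found.append(f_star)
        # Remove the open interval (f_star - delta_f, f_star + delta_f).
        remaining = [f for f in remaining if abs(f - f_star) >= delta_f]
    return found

# Example: two frequencies 0.2 and 0.6 with weights 0.7 and 0.3.
times = list(range(64))
samples = [0.7 * cmath.exp(-2j * math.pi * 0.2 * t)
           + 0.3 * cmath.exp(-2j * math.pi * 0.6 * t) for t in times]
freq_grid = [k / 100 for k in range(100)]
found = sequential_search(times, samples, freq_grid, num_freqs=2, delta_f=0.05)
```

The exclusion width matters: if $\delta f$ is narrower than the main lobe of the dominant peak, the second pass can return a sidelobe of the first frequency instead of the sub-dominant one.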

Essentially, compressed sensing is similar to non-adaptive MM-QCELS. In both methods, one intends to fit the sampled data by an ansatz of the signal. The difference lies in the rule of sampling and the cost function in the optimization task. In MM-QCELS, times are sampled from a continuous probability distribution, and the cost function is the total empirical error in the time domain. In compressed sensing, times are sampled uniformly at random from a discrete set, and the cost function is the 1-norm of the frequency-domain signal.

4.2 Models and results

Next, we will show the comparisons between these previous algorithms and Algorithm 1 with several physical models. Given a bounded Hamiltonian $H$ , we can first normalize it to

\overline{H}=\frac{\pi}{4\|H\|_{2}}\cdot H

(50)

so that the spectrum of $\overline{H}$ lies in $[-\pi/4,\pi/4]$ . In our algorithm, we further shift the Hamiltonian to $\overline{H}^{\prime}=\overline{H}+\pi/2$ so that the spectrum of $\overline{H}^{\prime}$ lies in $[\pi/4,3\pi/4]$ , since our setup assumes that the frequencies lie in the region $[0,2\pi]$ . To have better control of the parameters in the signal, we design the following family of initial states: choose a parameter $\alpha\in(0,1)$ , then set the initial state as

|\Psi_{\alpha}\rangle\propto\sum_{\ell=0}^{9}\sqrt{\alpha^{\ell}}|\phi_{\ell}\rangle,

(51)

where $|\phi_{\ell}\rangle$ is the $\ell$ -th eigenstate of $\overline{H}$ with eigenvalue $E_{\ell}$ . Hence, the target signal reads

y_{n}=\sum_{\ell=0}^{9}p_{\ell}e^{-\mathrm{i}E_{\ell}n\tau},\quad p_{\ell}=\frac{(1-\alpha)\alpha^{\ell}}{1-\alpha^{10}}.

(52)

For example, when $\alpha=1/2$ , we have $p_{0}\approx 1/2$ . Clearly, $p_{0}$ decreases with $\alpha$ .
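The normalization in Eq. (50), the shift by $\pi/2$ , and the weights in Eq. (52) can be checked with a few lines of code. The sketch below works directly on a list of eigenvalues, using $\|H\|_{2}=\max_{\ell}|E_{\ell}|$ for a Hermitian $H$ ; the ten-level spectrum is made up for illustration and does not correspond to either model studied here.

```python
import cmath
import math

def normalize_and_shift(energies):
    """Map eigenvalues of H to those of (pi / (4*||H||_2)) * H + pi/2.

    Uses ||H||_2 = max |E| for a Hermitian H, so the result lies in [pi/4, 3*pi/4].
    """
    norm = max(abs(E) for E in energies)
    return [math.pi / (4 * norm) * E + math.pi / 2 for E in energies]

def overlaps(alpha, num_levels=10):
    """p_l = (1 - alpha) * alpha**l / (1 - alpha**num_levels), as in Eq. (52)."""
    return [(1 - alpha) * alpha ** l / (1 - alpha ** num_levels)
            for l in range(num_levels)]

def signal(energies, probs, n, tau):
    """y_n = sum_l p_l * exp(-i * E_l * n * tau)."""
    return sum(p * cmath.exp(-1j * E * n * tau) for p, E in zip(probs, energies))

# Made-up spectrum of ten levels, for illustration only.
raw = [-9.0, -7.5, -6.0, -4.0, -2.5, -1.0, 0.5, 2.0, 3.5, 5.0]
E = normalize_and_shift(raw)
p_half = overlaps(0.5)
p_eighth = overlaps(0.125)
y_1 = signal(E, p_half, n=1, tau=1.0)
```

The weights sum to one by construction, and a smaller $\alpha$ concentrates more weight on the lowest level, which is why $\alpha=1/8$ corresponds to the largest initial overlap among the values used below.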

In the first set of numerical tests, $H$ is the normalized transverse field Ising model on 8 sites:

H_{\mathrm{Ising}}=-\sum_{j=1}^{7}Z_{j}Z_{j+1}-Z_{8}Z_{1}-4\sum_{j=1}^{8}X_{j},

(53)

and we set $\alpha=1/2,1/4,1/8$ separately. In the second set of numerical tests, $H$ is the Fermi Hubbard model on 4 sites:

H_{\mathrm{Hubbard}}=-\sum_{j=1}^{3}\sum_{\sigma=\uparrow,\downarrow}c^{\dagger}_{j,\sigma}c_{j+1,\sigma}+10\sum_{j=1}^{4}\left(n_{j,\uparrow}-\frac{1}{2}\right)\left(n_{j,\downarrow}-\frac{1}{2}\right).

(54)

For each Hamiltonian, we set $\alpha=1/8,1/4,1/2$ separately.

To make the comparison reasonably fair, we deliberately choose the parameters of the algorithms so that their runtimes $(T_{\max},T_{\text{total}})$ lie approximately on the same line, as demonstrated in Fig. 2. In the same figure, we also plot the number of different time samples used by each algorithm. Our algorithm requires a much smaller $|\mathcal{T}|$ than ML-QCELS and MM-QCELS.

The mean errors of the outputs are recorded in Fig. 3. As shown by the numerical experiments, when the initial overlap is comparatively large $(\alpha=1/8)$ , the compressed-sensing-based algorithms perform better than the other methods. We thus conclude that Algorithm 2 stands out for its sparse sampling of times while achieving a high level of accuracy.

5 Discussions

In this paper, we presented a simple and robust algorithm for QPE using compressed sensing. For single eigenvalue estimation (i.e., QPE), we rigorously established its Heisenberg-limit scaling in Theorem 1 and numerically demonstrated its performance compared to other state-of-the-art QPE algorithms in Sec. 4. Our algorithm has a smaller average error when the initial overlap is large, provided that the runtime costs $(T_{\mathrm{total}},T_{\mathrm{max}})$ are approximately the same. Similarly to QMEGS, our algorithm is non-adaptive, which means we can perform all the measurements first and then focus on the classical post-processing. In comparison, RPE-inspired algorithms are usually adaptive [12, 13]. Our algorithm requires only a small sample set $\mathcal{T}$ drawn from a discrete set. In Fig. 3, for a signal of length $N\in(100,600)$ , our algorithm only requires $\approx 10^{1}$ different time samples, whereas MM-QCELS requires $\approx 10^{2}$ time samples from a continuous region. Surprisingly, our numerical tests also show that QMEGS only requires $\approx 10^{1}$ different samples to obtain an accurate estimate. Related works on the restricted isometry property may be useful for rigorously proving that a sparse and discrete sampling of times works for QMEGS as well.

Lastly, we list a few open questions:

1.

In discrete sampling protocols, would it be possible to shorten the maximal runtime by biased sampling of times? What is the limitation in the discrete scenario? Can we achieve a similar improvement to the Gaussian filter method in [17]?
2.

In our numerical experiments, the test of another sampling (Algorithm 3) is actually unnecessary. Is it possible to show this analytically as well?
3.

One can try to find the optimal grid shift parameter by optimization instead of trying every grid shift parameter in a trial set, which should give better results.

6 Acknowledgement

We thank Tianyu Wang for the helpful discussions. C.Y. acknowledges support from the National Natural Science Foundation of China (Grant No. 92165109), National Key Research and Development Program of China (Grant No. 2022YFA1404204), and Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01). C.Z. and J.T. acknowledge support from the U.S. National Science Foundation under Grant No. 2116246, the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, and Quantum Systems Accelerator.

References

[1] A Yu Kitaev. Quantum measurements and the Abelian stabilizer problem. quant-ph/9511026, 1995.
[2] Peter W Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2):303–332, 1999.
[3] Daniel S Abrams and Seth Lloyd. Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Physical Review Letters, 83(24):5162, 1999.
[4] Sam McArdle, Suguru Endo, Alán Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan. Quantum computational chemistry. Reviews of Modern Physics, 92(1):015003, 2020.
[5] Lin Lin and Yu Tong. Heisenberg-limited ground-state energy estimation for early fault-tolerant quantum computers. PRX Quantum, 3(1):010318, 2022.
[6] Guoming Wang, Daniel Stilck-França, Ruizhe Zhang, Shuchen Zhu, and Peter D Johnson. Quantum algorithm for ground state energy estimation using circuit depth with exponentially improved dependence on precision. Quantum, 7:1167, 2023.
[7] Rolando D Somma. Quantum eigenvalue estimation via time series analysis. New Journal of Physics, 21(12):123025, 2019.
[8] Thomas E O’Brien, Brian Tarasinski, and Barbara M Terhal. Quantum phase estimation of multiple eigenvalues for small-scale (noisy) experiments. New Journal of Physics, 21(2):023022, 2019.
[9] Ruizhe Zhang, Guoming Wang, and Peter Johnson. Computing ground state properties with early fault-tolerant quantum computers. Quantum, 6:761, 2022.
[10] Alicja Dutkiewicz, Barbara M. Terhal, and Thomas E O’Brien. Heisenberg-limited quantum phase estimation of multiple eigenvalues with few control qubits. Quantum, 6:830, 2022.
[11] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2010.
[12] Zhiyan Ding and Lin Lin. Even shorter quantum circuit for phase estimation on early fault-tolerant quantum computers with applications to ground-state energy estimation. PRX Quantum, 4:020331, May 2023.
[13] Hongkang Ni, Haoya Li, and Lexing Ying. On low-depth algorithms for quantum phase estimation. Quantum, 7:1165, 2023.
[14] Iulia M Georgescu, Sahel Ashhab, and Franco Nori. Quantum simulation. Reviews of Modern Physics, 86(1):153, 2014.
[15] Andrew M Childs, Yuan Su, Minh C Tran, Nathan Wiebe, and Shuchen Zhu. Theory of Trotter error with commutator scaling. Physical Review X, 11(1):011020, 2021.
[16] Zhiyan Ding and Lin Lin. Simultaneous estimation of multiple eigenvalues with short-depth quantum circuit on early fault-tolerant quantum computers. Quantum, 7:1136, 2023.
[17] Zhiyan Ding, Haoya Li, Lin Lin, HongKang Ni, Lexing Ying, and Ruizhe Zhang. Quantum Multiple Eigenvalue Gaussian filtered search: an efficient and versatile quantum phase estimation method. arXiv:2402.01013, 2024.
[18] Itai Arad, Tomotaka Kuwahara, and Zeph Landau. Connecting global and local energy distributions in quantum spin models on a lattice. Journal of Statistical Mechanics: Theory and Experiment, 2016(3):033301, 2016.
[19] Andrew M Childs and Yuan Su. Nearly optimal lattice simulation by product formulas. Physical review letters, 123(5):050503, 2019.
[20] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Nearly optimal sparse Fourier transform. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 563–578, 2012.
[21] Ankur Moitra. Super-resolution, extremal functions and the condition number of Vandermonde matrices. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 821–830, 2015.
[22] Xue Chen, Daniel M Kane, Eric Price, and Zhao Song. Fourier-sparse interpolation without a frequency gap. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 741–750. IEEE, 2016.
[23] Zhao Song, Baocheng Sun, Omri Weinstein, and Ruizhe Zhang. Quartic samples suffice for Fourier interpolation. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1414–1425. IEEE, 2023.
[24] William T Cochran, James W Cooley, David L Favin, Howard D Helms, Reginald A Kaenel, William W Lang, George C Maling, David E Nelson, Charles M Rader, and Peter D Welch. What is the fast Fourier transform? Proceedings of the IEEE, 55(10):1664–1674, 1967.
[25] Anna C Gilbert, Shan Muthukrishnan, and Martin Strauss. Improved time bounds for near-optimal sparse Fourier representations. In Wavelets XI, volume 5914, pages 398–412. SPIE, 2005.
[26] Piotr Indyk, Michael Kapralov, and Eric Price. (nearly) sample-optimal sparse Fourier transform. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 480–499. SIAM, 2014.
[27] BL Higgins, DW Berry, SD Bartlett, MW Mitchell, HM Wiseman, and GJ Pryde. Demonstrating Heisenberg-limited unambiguous phase estimation without adaptive measurements. New Journal of Physics, 11(7):073023, 2009.
[28] Shelby Kimmel, Guang Hao Low, and Theodore J Yoder. Robust calibration of a universal single-qubit gate set via robust phase estimation. Physical Review A, 92(6):062315, 2015.
[29] Federico Belliardo and Vittorio Giovannetti. Achieving Heisenberg scaling with maximally entangled states: An analytic upper bound for the attainable root-mean-square error. Physical Review A, 102(4), oct 2020.
[30] Haoya Li, Hongkang Ni, and Lexing Ying. Adaptive low-depth quantum algorithms for robust multiple-phase estimation. Phys. Rev. A, 108:062408, Dec 2023.
[31] T Tony Cai and Lie Wang. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Transactions on Information theory, 57(7):4680–4688, 2011.
[32] Emmanuel J Candes and Terence Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE transactions on information theory, 52(12):5406–5425, 2006.
[33] Emmanuel J Candes, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
[34] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489–509, 2006.
[35] David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical review letters, 105(15):150401, 2010.
[36] A. Smith, C. A. Riofrío, B. E. Anderson, H. Sosa-Martinez, I. H. Deutsch, and P. S. Jessen. Quantum state tomography by continuous measurement and compressed sensing. Physical Review A, 87(3), Mar 2013.
[37] Easwar Magesan, Alexandre Cooper, and Paola Cappellaro. Compressing measurements in quantum dynamic parameter estimation. Physical Review A—Atomic, Molecular, and Optical Physics, 88(6):062109, 2013.
[38] Amir Kalev, Robert L. Kosut, and Ivan H. Deutsch. Quantum tomography protocols with positivity are compressed sensing protocols. npj Quantum Information, 1(1):15018, Dec 2015.
[39] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht. Compressed sensing off the grid. IEEE transactions on information theory, 59(11):7465–7490, 2013.
[40] https://github.com/CYI1995/QEEP/tree/main/Paper_QPE.
[41] Thomas Blumensath and Mike E Davies. Iterative hard thresholding for compressed sensing. Applied and computational harmonic analysis, 27(3):265–274, 2009.
[42] Mark Rudelson and Roman Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 61(8):1025–1045, 2008.

Appendix A Standard results in compressed sensing

Given a vector $v=[v_{1},v_{2},\cdots,v_{N}]^{\top}$ , its $1$ -norm, $2$ -norm and $\infty$ -norm are defined as

\|v\|_{1}\equiv\sum_{n=1}^{N}|v_{n}|,\quad\|v\|_{2}\equiv\left(\sum_{n=1}^{N}|v_{n}|^{2}\right)^{1/2},\quad\|v\|_{\infty}\equiv\max_{n}|v_{n}|.

(55)

In the following paragraph, we use $k$ to label the indices of entries in frequency domain, and use $n$ to label the indices of entries in time domain. Denote the set of integers from 1 to $N$ as $[N]$ . In regular compressed sensing, we deal with a time-domain discrete signal $y^{0}$ in the form of

y^{0}_{n}=\sum_{f\in\mathcal{F}}p_{f}e^{-\mathrm{i}2\pi fn},\quad n\in[N]

(56)

where $p_{f}>0,\sum_{f}p_{f}=1,f\in[0,1)$ , and $\mathcal{F}$ is the set of frequencies. In the context of compressed sensing, sparsity means $|\mathcal{F}|=\mathcal{O}(\log N)$ [32]. The time-domain signal can thus be written as an $N$ -dimensional vector

y^{0}=[y^{0}_{1},y^{0}_{2},\cdots,y^{0}_{N}]^{\top}.

(57)

Define the Fourier matrix by $F_{kn}:=e^{-\mathrm{i}2\pi kn/N},\ k,n\in[N]$ . Throughout the paper, if the frequency $f$ satisfies $\exists k\in[N],\ f=k/N$ , then we say $f$ is on-grid, otherwise it is off-grid. For a frequency $f\in[0,1)$ , we define its off-grid deviation as

\nu_{f}=f-n_{f}/N,\quad n_{f}=\arg\min_{k}|f-k/N|.

(58)

If all $f\in\mathcal{F}$ are on-grid, the frequency-domain signal $x$ can be written in the form of a real vector:

x=\frac{1}{N}F^{\dagger}y^{0}=\sum_{f\in\mathcal{F}}p_{f}\delta_{Nf}.

(59)
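Eq. (59) can be verified directly for a small on-grid signal. The sketch below is a minimal check, assuming the time index runs over $n=1,\dots,N$ as in Eq. (56) while the frequency index is taken as $k=0,\dots,N-1$ (with $k=N$ identified with $k=0$ ); the weights and the function name are illustrative.

```python
import cmath
import math

def on_grid_spectrum(y):
    """x = (1/N) F^dagger y with F[k][n] = exp(-2j*pi*k*n/N), n = 1..N, k = 0..N-1."""
    N = len(y)
    return [sum(cmath.exp(2j * math.pi * k * n / N) * y[n - 1]
                for n in range(1, N + 1)) / N
            for k in range(N)]

# Two on-grid frequencies f = k/N with weights p_f (made-up values).
N = 16
weights = {3 / N: 0.6, 10 / N: 0.4}
y0 = [sum(p * cmath.exp(-2j * math.pi * f * n) for f, p in weights.items())
      for n in range(1, N + 1)]
x = on_grid_spectrum(y0)
```

Up to rounding, the result is real and supported on indices 3 and 10 with values 0.6 and 0.4. For an off-grid frequency the same transform spreads weight over many indices and acquires an imaginary part, which is the basis mismatch addressed in the main text.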

The purpose of compressed sensing is then to recover $x$ from noisy samples of the signal. The algorithm is accomplished in the following sequence. Choose a sampling ratio $r$ , and assign each integer $n$ in $[N]$ a random variable $1_{n}$ that satisfies

\mathrm{Pr}\{1_{n}=1\}=r,\quad\mathrm{Pr}\{1_{n}=0\}=1-r.

(60)

Draw one sample from each $1_{n},n\in[N]$ , and denote the set of integers with $1_{n}=1$ as the sample set $\mathcal{T}$ . Given $\mathcal{T}$ , we define the projection operator $\mathcal{P}_{\mathcal{T}}$ as

(\mathcal{P}_{\mathcal{T}})_{t_{1},t_{2}}=1_{t_{1}\in\mathcal{T}}\cdot\delta_{t_{1},t_{2}},

(61)

and

F_{\mathcal{T}}=\mathcal{P}_{\mathcal{T}}F,\quad y^{0}_{\mathcal{T}}=\mathcal{P}_{\mathcal{T}}y^{0}.

(62)
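The sampling rule in Eq. (60) and the restricted quantities in Eq. (62) can be sketched as follows. For compactness the sketch keeps only the rows indexed by $\mathcal{T}$ , which is equivalent to dropping the zero rows of $\mathcal{P}_{\mathcal{T}}F$ ; the fixed seed, the sizes, and the function names are assumptions made for reproducibility of the example.

```python
import cmath
import math
import random

def sample_times(N, r, rng):
    """Keep each n in 1..N independently with probability r, as in Eq. (60)."""
    return [n for n in range(1, N + 1) if rng.random() < r]

def restricted_fourier(N, times):
    """Rows of F indexed by the sample set (the nonzero rows of P_T F)."""
    return [[cmath.exp(-2j * math.pi * k * n / N) for k in range(N)] for n in times]

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
N, r = 64, 0.25
T = sample_times(N, r, rng)
F_T = restricted_fourier(N, T)

# Restrict a single on-grid signal with frequency index 7 to the sample set.
y0 = [cmath.exp(-2j * math.pi * 7 * n / N) for n in range(1, N + 1)]
y0_T = [y0[n - 1] for n in T]
```

The size of $\mathcal{T}$ is random with mean $Nr$ , so any downstream routine should take $|\mathcal{T}|$ from the realized sample set rather than from $Nr$ .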

With these notations, the compressed sensing subroutine is to solve the following optimization problem

\min\|s\|_{1},\quad\mathrm{s.t.}\quad F_{\mathcal{T}}s=y^{0}_{\mathcal{T}},\quad s\in\mathbb{R}^{N},

(63)

which can be rewritten as a linear programming problem. When $|\mathcal{T}|\approx Nr=\mathcal{O}(\log N)$ and the frequency support $\mathcal{F}$ is sparse in the sense that $|\mathcal{F}|=\mathcal{O}(\log N)$ , the optimal solution $s$ equals the frequency-domain signal $x$ with high probability. Rigorous statements can be found in [32].

Provided that the signal has extra noise $y_{n}=y^{0}_{n}+z_{n}$ , then the signal can be approximately recovered by the convex relaxation algorithm [33]:

\min_{s\in\mathbb{R}^{N}}\|s\|_{1},\quad\mathrm{s.t.}\quad\|F_{\mathcal{T}}s-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma,

(64)

where $\sigma$ is the expected root-mean-square noise. The difference between the solution to Eq. (64) and $x$ depends on $\sigma$ . The subroutine itself is a convex quadratic programming problem that can be solved in runtime $\mathcal{O}(N\log N)$ using an iterative method [41]. The robustness of the compressed sensing solution can be analyzed through the restricted isometry property (RIP) of random Fourier matrices. We present standard results of compressed sensing in the following.

Theorem 2 ([42]).

With $|\mathcal{T}|=\mathcal{O}(S\log^{2}S\log N)$ , there exists a constant $\delta$ such that the normalized random Fourier sampling matrix $\sqrt{\frac{N}{|\mathcal{T}|}}F_{\mathcal{T}}$ satisfies, for all $x\in\mathbb{C}^{N}$ with sparsity $S$ ,

(1-\delta)\cdot\|x\|_{2}^{2}\leq\sqrt{\frac{N}{|\mathcal{T}|}}\left\|F_{\mathcal{T}}x\right\|_{2}^{2}\leq(1+\delta)\cdot\|x\|_{2}^{2}

(65)

with probability $1-\mathcal{O}(1/\mathrm{poly}(N))$ .

If a matrix $M$ satisfies Eq. (65) for all $x\in\mathcal{X}$ , then we say that $M$ satisfies $\delta$ -RIP over the set $\mathcal{X}$ . Using this theorem, we can prove that the solution to the compressed sensing subroutine Eq. (64) is accurate through the following result. Given a real vector $v$ , we define $v_{\mathrm{res}}$ as the vector generated from $v$ by removing the $S$ largest entries.

Theorem 3.

[33] Suppose matrix $M$ satisfies $\delta$ -RIP for the set of $S$ -sparse vectors. Let $x_{1},x_{2}$ be two real vectors. If

\|x_{1}\|_{1}\leq\|x_{2}\|_{1},\quad\|M(x_{1}-x_{2})\|_{2}\leq\sigma,

(66)

then

\|x_{1}-x_{2}\|_{2}\leq C_{1}\sigma+C_{2}\frac{\|x_{2,\mathrm{res}}\|_{1}}{\sqrt{S}}

(67)

where $C_{1},C_{2}$ are two constants dependent on the choice of $\delta,S$ .

Appendix B Proof of Theorem 1

The following two lemmas are critical for the proof of Theorem 1.

Lemma 5.

Given signal $y^{0}_{t}$ and $\nu_{\mathrm{opt}},x^{\mathrm{R}}_{\nu},x^{\mathrm{I}}_{\nu}$ defined in the previous paragraphs, suppose $|\nu|<1/2$ , we have

	$\displaystyle\|x_{\mathrm{on}}\|_{2}\cdot\sqrt{4-2\pi^{2}|\mathcal{N}|^{2}N^{-1}}\cdot|\nu-\nu_{\mathrm{opt}}|\leq\|x^{\mathrm{I}}_{\nu}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,$		(68)
	$\displaystyle\|x^{\mathrm{R}}_{\nu}-x_{\mathrm{on}}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,$		(69)
	$\displaystyle\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}\leq\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N,\quad\|x^{\mathrm{I}}_{\nu,\mathrm{res}}\|_{1}\leq\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N.$		(70)

Lemma 5 is a direct corollary of Lemma 12 and Lemma 13; its proof is presented in Appendix C.2.

Lemma 6.

[33] Suppose $\nu$ and $s_{\nu}$ satisfy

\|s_{\nu}\|_{1}\leq\|x^{\mathrm{R}}_{\nu}\|_{1},\quad\|F_{\nu,\mathcal{T}}x^{\mathrm{R}}_{\nu}-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma,

(71)

and $F_{\nu,\mathcal{T}}$ satisfies the $\delta$ -RIP for all $S$ -sparse vectors. Then

	$\displaystyle\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{2}\leq C_{1}\sigma+C_{2}\frac{\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}}{\sqrt{S}}+C_{2}\frac{\|x^{\mathrm{I}}_{\nu,\mathrm{res}}\|_{1}}{\sqrt{S}},$		(72)
	$\displaystyle\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{1}\leq\sqrt{S}C_{1}\sigma+C_{2}\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}+C_{2}\|x^{\mathrm{I}}_{\nu,\mathrm{res}}\|_{1},$		(73)

where $C_{1},C_{2}$ are two constants dependent on $\delta,S$ .

The constants used in the proof are listed below.

Table 2: Constants used in the proof.

Constant	Origin
$C_{0}$	$(\sigma_{\mathrm{off}}+\eta)/\sigma$
$C_{1}$	Theorem 3
$C_{2}$	Theorem 3
$C_{3}$	$C_{1}+C_{2}\pi+2\pi\gamma/\sqrt{3}$
$C_{4}$	$\|x_{\mathrm{on}}\|_{2}\sqrt{4-\pi^{2}|\mathcal{N}|^{2}N^{-1}}$
$C_{5}$	$2(C_{1}+C_{2}\pi+2\pi^{2}\gamma/\sqrt{3}+C_{0})$

Here is an overview of the proof. The choice of $\mathcal{V}$ ensures that $\exists\nu_{1}\in\mathcal{V},|\nu_{1}-\nu_{\mathrm{opt}}|\leq\gamma\sigma$ . By virtue of Lemma 3, the solution to

\min\|s\|_{1},\quad\mathrm{s.t.}\quad\|F_{{\nu_{1}},\mathcal{T}}s-y_{\mathcal{% T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma

(74)

is close enough to $x_{\mathrm{on}}$ in the sense that $\exists C_{3}>0$ ,

\|s_{\nu_{1}}-x_{\mathrm{on}}\|_{2}\leq C_{3}\sigma.

(75)

Our output $(\nu_{\ast},s_{\nu_{\ast}})$ is the optimal solution of

\min\|s\|_{1},\quad\mathrm{s.t.}\quad\exists\nu\in\mathcal{V},\ \|F_{\nu,% \mathcal{T}}s-y_{\mathcal{T}}\|_{2}\leq\sqrt{|\mathcal{T}|}\sigma.

(76)

Hence, $\|s_{\nu_{\ast}}\|_{1}\leq\|s_{\nu_{1}}\|_{1}$ . According to Lemma 6, if we further have

\|F_{{\nu_{1}},\mathcal{T}}(s_{\nu_{\ast}}-s_{\nu_{1}})\|_{2}\leq\sqrt{|% \mathcal{T}|}C_{6}\sigma,

(77)

then we can estimate $\|s_{\nu_{\ast}}-s_{\nu_{1}}\|_{2}$ , from which we can bound $\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2}$ and complete the proof. Thanks to Lemma 4, in order to bound $\|F_{{\nu_{1}},\mathcal{T}}(s_{\nu_{\ast}}-s_{\nu_{1}})\|_{2}$ , we use the test of another sampling set and only select solutions that satisfy

\sum_{t_{l}\in\mathcal{T}_{2}}|y_{t_{l}}-(F_{\nu}s_{\nu})_{t_{l}}|^{2}\leq|% \mathcal{T}_{2}|C_{5}^{2}\sigma^{2},

(78)

so that we can ensure that

|\nu-\nu_{\mathrm{opt}}|\leq C_{4}^{-1}C_{5}\sigma.

(79)

Using this relation, eventually we can prove Eq. (77).

B.1 Proof of Lemma 1

Without loss of generality, we assume $\nu_{\mathrm{opt}}=0$ . Then after computation, we obtain

y^{0}_{\mathrm{off},t}=\frac{1}{2\mathrm{i}}\sum_{f\in\mathcal{F}}p_{f}e^{-\mathrm{i}2\pi ft}\left[1-e^{-\mathrm{i}2\pi\nu_{f}}\right].

(80)

The definition of $\sigma_{\mathrm{off}}$ and the constraint on $z$ imply

\displaystyle|y^{0}_{\mathrm{off},t}|\leq\sigma_{\mathrm{off}}\quad\forall t.

(81)

Set $t=0$ . Then

\sum_{f\in\mathcal{F}}p_{f}\sin^{2}(\pi\nu_{f})\leq\sigma_{\mathrm{off}},\quad% \sum_{f\in\mathcal{F}}p_{f}\nu_{f}^{2}\leq\frac{\sigma_{\mathrm{off}}}{4}.

(82)

From here we already obtain the accuracy for the gapless situation: if $p_{f}\geq p_{\min}$ , then

|\nu_{f}|\leq\sqrt{\frac{\sigma_{\mathrm{off}}}{4p_{\min}}}.

(83)

Focus on the imaginary component of $y^{0}_{\mathrm{off}}$ . With the same argument, we obtain

\left|\sum_{f\in\mathcal{F}}p_{f}\sin(\pi\nu_{f})\cos\left(2\pi ft+\pi\nu_{f}% \right)\right|\leq\sigma_{\mathrm{off}}\quad\forall t.

(84)

Combining the real and imaginary components, we obtain

\sum_{f\in\mathcal{F}}p_{f}\sin(\pi\nu_{f})e^{-\mathrm{i}(2\pi ft+\pi\nu_{f})}% =\mathcal{O}(\sigma_{\mathrm{off}}).

(85)

Perform a Fourier transform with respect to $t$ on both sides. Then

\sum_{f\in\mathcal{F}}p_{f}\sin(\pi\nu_{f})e^{-\mathrm{i}\pi\nu_{f}}e^{\mathrm% {i}\pi_{N}(n-Nf)}D_{N}(n-Nf)=\mathcal{O}(\sigma_{\mathrm{off}}).

(86)

The real component gives

\sum_{f\in\mathcal{F}}p_{f}\sin(\pi\nu_{f})\cos[\pi_{N}(n-Nf)-\pi\nu_{f}]D_{N}% (n-Nf)=\mathcal{O}(\sigma_{\mathrm{off}}).

(87)

Let $\mathcal{F}_{n}$ be the set of $f\in\mathcal{F}$ with $n_{f}=n$ . Let $Q_{n}:=\sum_{f\in\mathcal{F}_{n}}p_{f}\sin(\pi\nu_{f})$ . By virtue of Lemma 10, we obtain the following two relations:

	$\displaystyle Q_{n}-\sum_{f\in\mathcal{F}_{n}}p_{f}\sin(\pi\nu_{f})\cos(\pi\nu_{f}+\pi_{N}\nu_{f})D_{N}(\nu_{f})$
	$\displaystyle=\sum_{f\in\mathcal{F}_{n}}p_{f}\sin(\pi\nu_{f})\left[1-\cos\{(\pi+\pi_{N})\nu_{f}\}D_{N}(\nu_{f})\right]$
	$\displaystyle=\sum_{f\in\mathcal{F}_{n}}p_{f}\sin(\pi\nu_{f})\cdot\mathcal{O}(\nu_{f}^{2})=\mathcal{O}(\sigma_{\mathrm{off}}),$		(88)

where the last step uses the rough bound in Eq. (82).

The other one is

	$\displaystyle\sum_{f\in\mathcal{F}_{m}}p_{f}\sin(\pi\nu_{f})\cos[\pi_{N}(m-Nf)% -\pi\nu_{f}]D_{N}(m-Nf)$		(89)
	$\displaystyle=\sum_{f\in\mathcal{F}_{m}}p_{f}\sin^{2}(\pi\nu_{f})\frac{\cos[\pi_{N}(m-Nf)-\pi\nu_{f}]}{N\sin[\pi(m-Nf)/N]}$		(90)

Its norm is bounded by

\sum_{f\in\mathcal{F}_{m}}p_{f}\sin^{2}(\pi\nu_{f})\frac{2}{|m-Nf|}\leq 4\sigma_{\mathrm{off}},

(91)

where we have used the fact that $|m-Nf|\geq 1/2$ . Hence, Eq. (86) gives us

|Q_{n}|=\mathcal{O}(\sigma_{\mathrm{off}}).

(92)

If the frequency gap in $y^{0}_{t}$ is at least $1/N$ , then there is only one element in each $\mathcal{F}_{n}$ . Hence,

|p_{f}\sin(\pi\nu_{f})|=\mathcal{O}(\sigma_{\mathrm{off}}),\quad|\nu_{f}|=\mathcal{O}\left(\frac{\sigma_{\mathrm{off}}}{p_{\min}}\right).

(93)
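The proof above leans on two elementary facts: the per-frequency factor $(1-e^{-\mathrm{i}2\pi\nu_{f}})/(2\mathrm{i})$ collapses to $\sin(\pi\nu_{f})e^{-\mathrm{i}\pi\nu_{f}}$ , the summand of Eq. (85), and $|\sin(\pi\nu)|\geq 2|\nu|$ on $|\nu|\leq 1/2$ , which turns the first bound of Eq. (82) into the second. The following is a minimal numerical sketch of both facts. It assumes the bracket in Eq. (80) is read with the sign convention $1-e^{-\mathrm{i}2\pi\nu_{f}}$ ; the function names are illustrative and not part of the paper.

```python
import cmath
import math


def off_grid_factor(nu):
    """(1 - exp(-2j*pi*nu)) / (2j): assumed per-frequency factor of Eq. (80)."""
    return (1 - cmath.exp(-2j * math.pi * nu)) / 2j


def closed_form(nu):
    """sin(pi*nu) * exp(-1j*pi*nu): the summand that appears in Eq. (85)."""
    return math.sin(math.pi * nu) * cmath.exp(-1j * math.pi * nu)


# Grid of test offsets covering the interval [-1/2, 1/2].
nus = [k / 20 for k in range(-10, 11)]

# Largest deviation between the two expressions over the grid.
max_gap = max(abs(off_grid_factor(nu) - closed_form(nu)) for nu in nus)

# |sin(pi*nu)| >= 2|nu| on the grid (small tolerance for the endpoint nu = 1/2).
jordan_ok = all(abs(math.sin(math.pi * nu)) >= 2 * abs(nu) - 1e-12 for nu in nus)

print(max_gap, jordan_ok)
```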

B.2 Proof of Lemma 2

Given sampling ratio $r$ , we introduce the following random variables:

\hat{X}_{t}=\begin{cases}V_{t}:=|(F_{\nu}x^{\mathrm{I}}_{\nu})_{t}|^{2}&p=r\\ 0&p=1-r\end{cases},\quad\hat{R}=\sum_{t=0}^{N-1}\hat{X}_{t}.

(94)

One can verify that $\hat{R}$ is the random variable for $\|F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}\|_{2}^{2}$ . The expectation value of $\hat{R}$ is thus

\mathbb{E}[\hat{R}]=r\cdot\|F_{\nu}x^{\mathrm{I}}_{\nu}\|_{2}^{2}=Nr\cdot\|x^{% \mathrm{I}}_{\nu}\|_{2}^{2}=\mathbb{E}[|\mathcal{T}|]\cdot\|x^{\mathrm{I}}_{% \nu}\|_{2}^{2}.

(95)

Bernstein’s inequality states that

\text{Pr}\left(\hat{R}\geq 2\mathbb{E}[\hat{R}]\right)\leq\exp\left(-\frac{\frac{1}{2}\mathbb{E}[\hat{R}]^{2}}{\sum_{t=0}^{N-1}\mathbb{E}[\hat{X}_{t}^{2}]+\frac{1}{3}\max_{t}V_{t}\cdot\mathbb{E}[\hat{R}]}\right).

(96)

Note that

\sum_{t=0}^{N-1}\mathbb{E}[\hat{X}_{t}^{2}]=r\sum_{t=0}^{N-1}V_{t}^{2}.

(97)

Hence, we need an upper bound for

\sum_{t=0}^{N-1}|V_{t}|^{2}=\sum_{t=0}^{N-1}\left|(F_{\nu}x^{\mathrm{I}}_{\nu}% )_{t}\right|^{4}=\sum_{t=0}^{N-1}\left|\sum_{k=0}^{N-1}x^{\mathrm{I}}_{\nu,k}e% ^{-\mathrm{i}2\pi(k+\nu)t/N}\right|^{4}.

(98)

Recall that $y_{\mathrm{on}}=F_{\nu_{\mathrm{opt}}}x_{\mathrm{on}}$ , where $x_{\mathrm{on}}=\sum_{n\in\mathcal{N}}q_{n}\delta_{n}$ is a sparse signal. By definition,

	$\displaystyle x^{\mathrm{I}}_{\nu,k}$	$\displaystyle=\sum_{n\in\mathcal{N}}\mathrm{Im}(F^{\dagger}_{\nu}F_{\nu_{% \mathrm{opt}}})_{kn}q_{n}=\frac{1}{N}\sum_{n\in\mathcal{N}}q_{n}\sum_{\tau=0}^% {N-1}\mathrm{Im}(e^{\mathrm{i}2\pi(k+\nu)\tau/N}e^{-\mathrm{i}2\pi(n+\nu_{% \mathrm{opt}})\tau/N})$
		$\displaystyle=\frac{1}{2N\mathrm{i}}\sum_{n\in\mathcal{N}}q_{n}\sum_{\tau=0}^{% N-1}\left[e^{\mathrm{i}2\pi(k+\nu-n-\nu_{\mathrm{opt}})\tau/N}-e^{-\mathrm{i}2% \pi(k+\nu-n-\nu_{\mathrm{opt}})\tau/N}\right].$		(99)

After computation, we obtain

$\displaystyle(F_{\nu}x^{\mathrm{I}}_{\nu})_{t}$	$\displaystyle=\sum_{k=0}^{N-1}e^{-\mathrm{i}2\pi(k+\nu)t/N}x^{\mathrm{I}}_{\nu,k}=e^{-\mathrm{i}2\pi\nu_{\mathrm{opt}}t/N}e^{-\mathrm{i}\pi(\nu-\nu_{\mathrm{opt}})}\sin[\pi(\nu-\nu_{\mathrm{opt}})]\cdot y_{\mathrm{on},t},$	(100)
$\displaystyle\max_{t}V_{t}$	$\displaystyle=\max_{t}|(F_{\nu}x^{\mathrm{I}}_{\nu})_{t}|^{2}=\sin^{2}[\pi(\nu-\nu_{\mathrm{opt}})]\cdot\max_{t}|y_{\mathrm{on},t}|^{2}\leq\pi^{2}|\nu-\nu_{\mathrm{opt}}|^{2},$	(101)
$\displaystyle\sum_{t=0}^{N-1}V_{t}^{2}$	$\displaystyle=\sum_{t=0}^{N-1}|(F_{\nu}x^{\mathrm{I}}_{\nu})_{t}|^{4}\leq\pi^{4}|\nu-\nu_{\mathrm{opt}}|^{4}\cdot\sum_{t=0}^{N-1}|y_{\mathrm{on},t}|^{4}\leq\pi^{4}N|\nu-\nu_{\mathrm{opt}}|^{4}.$	(102)

Bernstein’s inequality then gives

\text{Pr}\left(\sum_{t=0}^{N-1}\hat{X}_{t}\geq 2\mathbb{E}[\hat{R}]\right)\leq\exp\left[-\frac{\frac{1}{2}NrC_{4}^{4}}{\pi^{4}+\frac{1}{3}\pi^{2}C_{4}^{2}}\right].

(103)

Here $C_{4}\approx 2\|x_{\mathrm{on}}\|_{2}$ satisfies $\|x^{\mathrm{I}}_{\nu}\|_{2}\geq C_{4}|\nu-\nu_{\mathrm{opt}}|$ according to Lemma 5. Since $Nr=\mathcal{O}(\log N)$ , we can conclude that

\|F_{\nu,\mathcal{T}}x^{\mathrm{I}}_{\nu}\|_{2}\leq\sqrt{\frac{2\mathbb{E}[|\mathcal{T}|]}{N}}\|F_{\nu}x^{\mathrm{I}}_{\nu}\|_{2}\leq 2\sqrt{\frac{|\mathcal{T}|}{N}}\|F_{\nu}x^{\mathrm{I}}_{\nu}\|_{2}\leq 2\sqrt{|\mathcal{T}|}\cdot\|x^{\mathrm{I}}_{\nu}\|_{2}\leq\frac{4\pi}{\sqrt{3}}\sqrt{|\mathcal{T}|}|\nu-\nu_{\mathrm{opt}}|

(104)

with probability at least $1-1/\mathrm{poly}(N)$ .
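To get a feel for how many samples the tail bound needs, the exponent of Eq. (103) can be evaluated directly. The sketch below reads the bound as $\exp[-\frac{1}{2}NrC_{4}^{4}/(\pi^{4}+\frac{1}{3}\pi^{2}C_{4}^{2})]$ , with a single overall minus sign so that it decays in $Nr$ . The value $C_{4}=2$ (that is, $\|x_{\mathrm{on}}\|_{2}=1$ ) and the sample counts are hypothetical choices made only for illustration.

```python
import math


def bernstein_tail_bound(expected_samples, c4):
    """exp(-(1/2) * Nr * C4^4 / (pi^4 + pi^2 * C4^2 / 3)), with Nr = expected_samples."""
    exponent = 0.5 * expected_samples * c4 ** 4 / (math.pi ** 4 + math.pi ** 2 * c4 ** 2 / 3)
    return math.exp(-exponent)


c4 = 2.0  # hypothetical: C4 is roughly 2 * ||x_on||_2, taking ||x_on||_2 = 1
sample_counts = (50, 200, 800)  # hypothetical values of Nr
bounds = [bernstein_tail_bound(nr, c4) for nr in sample_counts]

for nr, bound in zip(sample_counts, bounds):
    print(f"Nr = {nr:4d}  ->  failure probability bound {bound:.3e}")
```

Under these assumptions the bound shrinks exponentially in $Nr$ , which is why an expected sample count proportional to $\log N$ is enough for a failure probability of order $1/\mathrm{poly}(N)$ .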

B.3 Proof of Lemma 3

Using the triangle inequality, we have

\|s_{\nu}-x_{\mathrm{on}}\|_{2}\leq\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{2}+\|x^{% \mathrm{R}}_{\nu}-x_{\mathrm{on}}\|_{2}.

(105)

The precondition in the lemma ensures that $x^{\mathrm{R}}_{\nu}$ is feasible. Using Theorem 3, Lemma 5 and the condition that $|\nu-\nu_{\mathrm{opt}}|\leq\gamma\sigma$ , we obtain

$\displaystyle\|s_{\nu}-x^{\mathrm{R}}_{\nu}\|_{2}$	$\displaystyle\leq C_{1}\sigma+C_{2}\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}/\sqrt{S}$
	$\displaystyle\leq C_{1}\sigma+C_{2}\pi|\nu-\nu_{\mathrm{opt}}|\log N/\sqrt{S}$
	$\displaystyle\leq(C_{1}+C_{2}\pi)\sigma,$	(106)
$\displaystyle\|x^{\mathrm{R}}_{\nu}-x_{\mathrm{on}}\|_{2}$	$\displaystyle\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|\leq\frac{2\pi}{\sqrt{3}}\gamma\sigma.$	(107)

Hence,

\|s_{\nu}-x_{\mathrm{on}}\|_{2}\leq C_{3}\sigma,\quad C_{3}:=C_{1}+C_{2}\pi+% \frac{2\pi\gamma}{\sqrt{3}}.

(108)

B.4 Proof of Lemma 4

The Chernoff-Hoeffding inequality yields the following concentration bound.

Lemma 7.

Suppose $b\in\mathbb{C}^{N}$ is a complex vector with bounded entries $|b_{i}|<1$ . If we randomly sample an entry for $L$ times (denote the order of sampling by $l=1,2,\cdots,L$ , and write the set of sampled indices as $\{i_{l}\}_{l=1}^{L}$ ), then the sum of $|b_{i_{l}}|^{2}$ satisfies

	$\displaystyle\mathrm{Pr}\left[\sum_{l=1}^{L}|b_{i_{l}}|^{2}\leq\frac{L}{2N}\|b\|_{2}^{2}\right]<\exp\left(-\frac{L}{2}\right),$		(109)
	$\displaystyle\mathrm{Pr}\left[\sum_{l=1}^{L}|b_{i_{l}}|^{2}\geq\frac{3L}{2N}\|b\|_{2}^{2}\right]<\exp\left(-\frac{L}{2}\right).$		(110)

Proof.

Each sampling corresponds to a random variable $\hat{B}_{l}$ satisfying

\forall l,\quad\mathrm{Pr}\left[\hat{B}_{l}=|b_{i}|^{2}\right]=\frac{1}{N},% \quad i=0,1,\cdots,N-1.

(111)

The Chernoff-Hoeffding inequality ensures that

\mathrm{Pr}\left[\sum_{l=1}^{L}\hat{B}_{l}\leq\frac{L}{2}\mathbb{E}[\hat{B}]% \right]<\exp\left(-\frac{L}{2\max_{i}|b_{i}|^{2}}\right)<\exp\left(-\frac{L}{2% }\right),

(112)

where

\mathbb{E}[\hat{B}]=\frac{1}{N}\sum_{i=1}^{N}|b_{i}|^{2}=\frac{1}{N}\|b\|_{2}^% {2}.

(113)

If we choose $L=\mathcal{O}(\log(p_{\mathrm{fail}}^{-1}))$ , then with probability at least $1-p_{\mathrm{fail}}$ , we have

\frac{1}{L}\sum_{l=1}^{L}|b_{i_{l}}|^{2}\geq\frac{1}{2N}\sum_{i=1}^{N}|b_{i}|^{2}.

(114)

∎
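The concentration statement of Lemma 7 is easy to probe empirically. The following is a small Monte Carlo sketch, not part of the proof: it draws indices uniformly with replacement and compares the sampled energy with the window $[\frac{L}{2N}\|b\|_{2}^{2},\frac{3L}{2N}\|b\|_{2}^{2}]$ of Eqs. (109) and (110). The vector `b`, the sizes `N` and `L`, and the random seed are arbitrary illustrative choices.

```python
import random


def sampled_energy(b, num_draws, rng):
    """Sum of |b_i|^2 over num_draws indices drawn uniformly with replacement."""
    n = len(b)
    return sum(abs(b[rng.randrange(n)]) ** 2 for _ in range(num_draws))


rng = random.Random(0)  # fixed seed so the run is reproducible
N, L = 64, 400          # hypothetical vector length and number of draws

# Random complex vector with |b_i| < 1, as required by Lemma 7.
b = [complex(rng.uniform(-0.7, 0.7), rng.uniform(-0.7, 0.7)) for _ in range(N)]

energy = sum(abs(x) ** 2 for x in b)   # ||b||_2^2
total = sampled_energy(b, L, rng)      # sum_l |b_{i_l}|^2
lower = L / (2 * N) * energy           # threshold in Eq. (109)
upper = 3 * L / (2 * N) * energy       # threshold in Eq. (110)

print(lower <= total <= upper)
```

With a few hundred draws the sampled energy sits far inside the window, consistent with the exponentially small failure probability stated in the lemma.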

Lemma 8.

Given $\nu$ , let $s_{\nu}$ be the solution of the compressed sensing subroutine and let $\mathcal{S}$ be a random sample of integers. Let $r=F_{\nu}s_{\nu}$ . Suppose

\sum_{t\in\mathcal{S}}|y_{t}-r_{t}|^{2}\leq|\mathcal{S}|C_{5}^{2}\sigma^{2}.

(115)

Then with high probability,

|\nu-\nu_{\mathrm{opt}}|\leq C_{4}^{-1}\sqrt{C_{0}^{2}+2C_{5}^{2}}\sigma.

(116)

Proof.

Let

\mathcal{E}=\sum_{t\in\mathcal{S}}|y_{t}-r_{t}|^{2}

(117)

By virtue of Lemma 7, with high probability, we have

\frac{2N}{3|\mathcal{S}|}\mathcal{E}\leq\|y-r\|_{2}^{2}\leq\frac{2N}{|\mathcal% {S}|}\mathcal{E}.

(118)

Note that

$\displaystyle\|y-r\|_{2}^{2}$	$\displaystyle=\|F_{\nu}(x^{\mathrm{R}}_{\nu}+\mathrm{i}x^{\mathrm{I}}_{\nu}-s_{\nu})+y^{0}_{\mathrm{off}}+z\|_{2}^{2}$
	$\displaystyle\geq\|F_{\nu}(x^{\mathrm{R}}_{\nu}+\mathrm{i}x^{\mathrm{I}}_{\nu}-s_{\nu})\|_{2}^{2}-NC_{0}^{2}\sigma^{2}$
	$\displaystyle\geq N\|x^{\mathrm{R}}_{\nu}-s_{\nu}\|_{2}^{2}+N\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}-NC_{0}^{2}\sigma^{2}$
	$\displaystyle\geq N\left(\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}-C_{0}^{2}\sigma^{2}\right)\geq N\left(C_{4}^{2}|\nu-\nu_{\mathrm{opt}}|^{2}-C_{0}^{2}\sigma^{2}\right).$	(119)

Combining this lower bound with the upper bound in Eq. (118), we finally obtain

|\nu-\nu_{\mathrm{opt}}|^{2}\leq C_{4}^{-2}\left(C_{0}^{2}\sigma^{2}+\frac{2}{|\mathcal{S}|}\mathcal{E}\right).

(120)

The proof is completed using the condition that $\mathcal{E}\leq|\mathcal{S}|C_{5}^{2}\sigma^{2}$ . ∎

The value of $C_{5}$ can be fixed by requiring that at least one $\nu\in\mathcal{V}$ passes the test.

Lemma 9.

Suppose $|\nu-\nu_{\mathrm{opt}}|<\gamma\sigma$ . Let $s_{\nu}$ be the solution of the compressed sensing subroutine, let $\mathcal{S}$ be a random sample of integers, and let $r=F_{\nu}s_{\nu}$ . Then with high probability,

\sum_{t\in\mathcal{S}}|y_{t}-r_{t}|^{2}\leq\frac{3}{2}|\mathcal{S}|(C_{3}^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2})\sigma^{2}.

(121)

Proof.

When $|\nu-\nu_{\mathrm{opt}}|<\gamma\sigma$ , Lemma 3 ensures that

\|x^{\mathrm{R}}_{\nu}-s_{\nu}\|_{2}\leq C_{3}\sigma.

(122)

Then we have

$\displaystyle\|y-r\|_{2}^{2}$	$\displaystyle\leq N\|x^{\mathrm{R}}_{\nu}-s_{\nu}\|_{2}^{2}+N\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}+NC_{0}^{2}\sigma^{2}$	(123)
	$\displaystyle\leq NC_{3}^{2}\sigma^{2}+NC_{4}^{2}|\nu-\nu_{\mathrm{opt}}|^{2}+NC_{0}^{2}\sigma^{2}$	(124)
	$\displaystyle\leq N(C_{3}^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2})\sigma^{2}.$	(125)

By the Chernoff-Hoeffding inequality (Lemma 7), we obtain

\sum_{t\in\mathcal{S}}|y_{t}-r_{t}|^{2}\leq\frac{3|\mathcal{S}|}{2}\left(C_{3}% ^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2}\right)\sigma^{2}.

(126)

∎

Therefore, we can set

C_{5}:=\sqrt{\frac{3}{2}(C_{3}^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2})},

(127)

and eventually, using Algorithm 3, we can narrow down the region to

|\nu-\nu_{\mathrm{opt}}|\leq C_{4}^{-1}\sqrt{4C_{0}^{2}+3C_{4}^{2}\gamma^{2}+3% C_{3}^{2}}\sigma.

(128)
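The chain of constants can be checked with a short computation. The sketch below evaluates $C_{3}$ from Eq. (108) and $C_{5}$ from Eq. (127), then confirms that substituting $C_{5}$ into the radius $C_{4}^{-1}\sqrt{C_{0}^{2}+2C_{5}^{2}}$ of Eq. (116) reproduces the radius of Eq. (128). The numerical inputs are hypothetical placeholders, since the paper does not fix values for these constants.

```python
import math


def proof_constants(c0, c1, c2, c4, gamma):
    """Return (C3, C5, radius via Eq. (116), radius via Eq. (128)), all in units of sigma."""
    c3 = c1 + c2 * math.pi + 2 * math.pi * gamma / math.sqrt(3)      # Eq. (108)
    c5 = math.sqrt(1.5 * (c3 ** 2 + c4 ** 2 * gamma ** 2 + c0 ** 2))  # Eq. (127)
    radius_116 = math.sqrt(c0 ** 2 + 2 * c5 ** 2) / c4                # Eq. (116)
    radius_128 = math.sqrt(4 * c0 ** 2 + 3 * c4 ** 2 * gamma ** 2 + 3 * c3 ** 2) / c4
    return c3, c5, radius_116, radius_128


# Hypothetical inputs chosen only to exercise the formulas.
c3, c5, radius_116, radius_128 = proof_constants(c0=1.5, c1=4.0, c2=2.0, c4=1.9, gamma=0.5)

print(c3, c5, radius_116, radius_128)
```

The two radii agree because $C_{0}^{2}+2C_{5}^{2}=C_{0}^{2}+3(C_{3}^{2}+C_{4}^{2}\gamma^{2}+C_{0}^{2})=4C_{0}^{2}+3C_{4}^{2}\gamma^{2}+3C_{3}^{2}$ .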

B.5 Proof of Theorem 1

The optimal solution $s_{\nu_{\ast}}$ satisfies two constraints: $\|s_{\nu_{\ast}}\|_{1}\leq\|s_{\nu_{1}}\|_{1}$ , and

$\displaystyle\|F_{{\nu_{1}},\mathcal{T}}(s_{\nu_{\ast}}-s_{\nu_{1}})\|_{2}$	$\displaystyle\leq\|F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}-y_{\mathcal{T}}\|_{2}+\|F_{{\nu_{1}},\mathcal{T}}s_{\nu_{1}}-y_{\mathcal{T}}\|_{2}$
	$\displaystyle\leq\|F_{\nu_{\ast},\mathcal{T}}s_{\nu_{\ast}}-y_{\mathcal{T}}\|_{2}+\|F_{\nu_{\ast},\mathcal{T}}s_{\nu_{\ast}}-F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}\|_{2}+\sqrt{|\mathcal{T}|}\sigma$
	$\displaystyle\leq 2\sqrt{|\mathcal{T}|}\sigma+\|F_{\nu_{\ast},\mathcal{T}}s_{\nu_{\ast}}-F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}\|_{2}.$	(129)

The RHS can be computed by

$\displaystyle\|F_{\nu_{\ast},\mathcal{T}}s_{\nu_{\ast}}-F_{{\nu_{1}},\mathcal{T}}s_{\nu_{\ast}}\|_{2}^{2}$	$\displaystyle=\sum_{t\in\mathcal{T}}\left|\sum_{k}s_{\nu_{\ast},k}\left(e^{-\mathrm{i}2\pi(k+\nu_{\ast})t/N}-e^{-\mathrm{i}2\pi(k+\nu_{1})t/N}\right)\right|^{2}$
	$\displaystyle=\sum_{t\in\mathcal{T}}\left|\sum_{k}s_{\nu_{\ast},k}e^{-\mathrm{i}2\pi kt/N}\right|^{2}\cdot 4\sin^{2}\left[\frac{\pi(\nu_{\ast}-\nu_{1})t}{N}\right]$
	$\displaystyle=\sum_{t\in\mathcal{T}}|(Fs_{\nu_{\ast}})_{t}|^{2}\cdot 4\sin^{2}\left[\frac{\pi(\nu_{\ast}-\nu_{1})t}{N}\right]$
	$\displaystyle\leq 4\pi^{2}|\mathcal{T}|\cdot|\nu_{\ast}-\nu_{1}|^{2}.$	(130)

Combining it with

|\nu_{\ast}-\nu_{\mathrm{opt}}|\leq C_{4}^{-1}\sqrt{4C_{0}^{2}+3C_{4}^{2}% \gamma^{2}+3C_{3}^{2}}\sigma,\quad|\nu_{1}-\nu_{\mathrm{opt}}|\leq\gamma\sigma,

(131)

we obtain

\displaystyle\|F_{{\nu_{1}},\mathcal{T}}(s_{\nu_{\ast}}-s_{\nu_{1}})\|_{2}\leq% \sqrt{|\mathcal{T}|}C_{6}\sigma,\quad C_{6}:=2+2\pi\left(C_{4}^{-1}\sqrt{4C_{0% }^{2}+3C_{4}^{2}\gamma^{2}+3C_{3}^{2}}+\gamma\right).

(132)

Finally, according to Theorem 3, $\|s_{\nu_{\ast}}-s_{\nu_{1}}\|_{2}$ has upper bound

C_{1}C_{6}\sigma+C_{2}\frac{\|s_{1,\mathrm{res}}\|_{1}}{\sqrt{S}}\leq C_{1}C_{% 6}\sigma+C_{2}(C_{1}+C_{2}\pi+\pi^{2})\sigma,

(133)

where we have used

	$\displaystyle\|s_{1,\mathrm{res}}\|_{1}$	$\displaystyle\leq\|x^{\mathrm{R}}_{\nu_{1},\mathrm{dom}}-s_{\nu_{1},\mathrm{dom}}\|_{1}+\|x^{\mathrm{R}}_{\nu_{1},\mathrm{res}}\|_{1}$
		$\displaystyle\leq\|x^{\mathrm{R}}_{\nu_{1}}-s_{\nu_{1}}\|_{1}+\pi^{2}\gamma\sigma\log N\leq\sqrt{S}(C_{1}+C_{2}\pi+\pi^{2})\sigma.$		(134)

Recall that $\|s_{\nu_{1}}-x_{\mathrm{on}}\|_{2}\leq C_{3}\sigma$ by Eq. (75). Therefore,

\|s_{\nu_{\ast}}-x_{\mathrm{on}}\|_{2}\leq\left[C_{1}C_{6}+C_{2}(C_{1}+C_{2}% \pi+\pi^{2})+C_{3}\right]\sigma.

(135)

The prefactor is a constant that is linear in $\gamma$ .
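The key algebraic step in this proof is the factorisation in Eq. (130): changing the offset from $\nu_{1}$ to $\nu_{\ast}$ multiplies each time sample of the unshifted transform by a factor of modulus $2|\sin[\pi(\nu_{\ast}-\nu_{1})t/N]|$ . A minimal numerical sketch of that equality follows. The vector `s`, the offsets, and the sampled times are arbitrary illustrative values, and `shifted_dft` is a hypothetical helper implementing $(F_{\nu}s)_{t}=\sum_{k}s_{k}e^{-\mathrm{i}2\pi(k+\nu)t/N}$ .

```python
import cmath
import math
import random


def shifted_dft(s, nu, t, n):
    """(F_nu s)_t = sum_k s_k * exp(-2j*pi*(k + nu)*t / n)."""
    return sum(s_k * cmath.exp(-2j * math.pi * (k + nu) * t / n) for k, s_k in enumerate(s))


rng = random.Random(1)
N = 16
s = [rng.uniform(-1.0, 1.0) for _ in range(N)]  # arbitrary real test vector
nu_star, nu_1 = 0.13, -0.21                     # hypothetical offsets
times = [1, 4, 6, 11, 15]                       # hypothetical sampled times

# First line of Eq. (130): squared norm of the difference on the sampled times.
lhs = sum(abs(shifted_dft(s, nu_star, t, N) - shifted_dft(s, nu_1, t, N)) ** 2 for t in times)

# Third line of Eq. (130): unshifted transform times 4 sin^2(pi * (nu_star - nu_1) * t / N).
rhs = sum(
    abs(shifted_dft(s, 0.0, t, N)) ** 2 * 4 * math.sin(math.pi * (nu_star - nu_1) * t / N) ** 2
    for t in times
)

print(lhs, rhs)
```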

Appendix C Proof of technical lemmas

C.1 Properties of the Dirichlet kernel

The Dirichlet kernel is defined as

D_{N}(\nu):=\frac{1}{N}\sum_{m=0}^{N-1}\alpha_{0}^{-N+1+2m},\quad\alpha_{0}:=e% ^{\mathrm{i}\pi\nu/N}.

(136)

In a more concise form, it equals

D_{N}(\nu)=\begin{cases}1,\quad&\nu=0\\ \frac{\sin(\pi\nu)}{N\sin(\pi\nu/N)},\quad&\nu\neq 0.\end{cases}

(137)

We start with a few estimates for the Dirichlet kernel.

Lemma 10.

Suppose $|\nu|\leq 1/2$ . Then the Dirichlet kernel satisfies

1. $1-D_{N}(\nu)^{2}\leq\frac{\pi^{2}\nu^{2}}{3}$ ,

2. $|D_{N}(n+\nu)|\leq\frac{\pi|\nu|}{2|n+\nu|_{\mathrm{mod}N}}$ ,

3. $\sum_{n=0}^{N-1}D_{N}(n+\nu)D_{N}(n+\nu+l)=\delta_{l,0}$ .

Proof.

(1) If $\exists c>0$ , $1-D_{N}(\nu)\leq c\nu^{2}$ , then

1-D_{N}^{2}(\nu)\leq 1-(1-c\nu^{2})^{2}\leq 2c\nu^{2}.

(138)

Therefore, we can bound $1-D_{N}(\nu)$ instead. Consider the case where $N$ is odd. We have

$\displaystyle 1-D_{N}(\nu)$	$\displaystyle=\frac{1}{N}\sum_{m=0}^{N-1}(1-\alpha_{0}^{-N+1+2m}),$
	$\displaystyle=\frac{1}{N}\sum_{m^{\prime}=1}^{\frac{N-1}{2}}(2-\alpha_{0}^{2m^% {\prime}}-\alpha_{0}^{-2m^{\prime}})=\frac{1}{N}\sum_{m^{\prime}=1}^{\frac{N-1% }{2}}4\sin^{2}\left(\frac{m^{\prime}\pi\nu}{N}\right)$
	$\displaystyle\leq\frac{4}{N}\sum_{m^{\prime}=1}^{\frac{N-1}{2}}\frac{\pi^{2}(m% ^{\prime})^{2}\nu^{2}}{N^{2}}=\frac{4\pi^{2}}{N^{3}}\frac{1}{6}\frac{N-1}{2}% \frac{N+1}{2}N\nu^{2}\leq\frac{\pi^{2}}{6}\nu^{2}.$	(139)

Hence, $1-D_{N}(\nu)^{2}\leq\frac{\pi^{2}}{3}\nu^{2}$ . The even $N$ situation is similar.

(2)

|D_{N}(n+\nu)|=\left|\frac{\sin[\pi(n+\nu)]}{N\sin[\pi(n+\nu)/N]}\right|\leq\frac{\pi|\nu|}{2|n+\nu|_{\mathrm{mod}N}}.

(140)

(3) For $l=0$ , we have

$\displaystyle D_{N}(n+\nu)$	$\displaystyle=\frac{1}{N}\sum_{m=0}^{N-1}\alpha_{n}^{-N+1+2m},\quad\alpha_{n}:% =e^{\mathrm{i}\pi(n+\nu)/N},$	(141)
$\displaystyle D_{N}(n+\nu)^{2}$	$\displaystyle=\frac{1}{N^{2}}\sum_{m,m^{\prime}=0}^{N-1}\alpha_{n}^{-2N+2+2m+2% m^{\prime}}$
	$\displaystyle=\frac{1}{N}+\frac{1}{N^{2}}\sum_{m+m^{\prime}\neq N-1}\alpha_{n}% ^{-2N+2+2m+2m^{\prime}}.$	(142)

For any $k\neq 0$ ,

\sum_{n=0}^{N-1}\alpha_{n}^{k}=\sum_{n=0}^{N-1}e^{\mathrm{i}\pi(n+\nu)k/N}=e^{% \mathrm{i}k\nu\pi/N}\sum_{n=0}^{N-1}e^{\mathrm{i}\pi nk/N}=0.

(143)

Hence, $\sum_{n=0}^{N-1}D_{N}(n+\nu)^{2}=1$ .

For $l\neq 0$ , we have

	$\displaystyle\sum_{n=0}^{N-1}D_{N}(n+\nu)D_{N}(n+\nu+l)$	$\displaystyle=\frac{1}{N^{2}}\sum_{m_{1},m_{2}=0}^{N-1}N\delta_{m_{1}+m_{2}+1-% N}\exp\left[\mathrm{i}\frac{\pi l}{N}(2m_{2}-N+1)\right]$
		$\displaystyle=\frac{1}{N}\sum_{m=0}^{N-1}\exp\left[\mathrm{i}\frac{\pi l}{N}(2% m-N+1)\right]=0.$		(144)

∎
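The properties in Lemma 10 can be sanity-checked numerically. The sketch below implements the Dirichlet kernel both from the sum in Eq. (136) and from the closed form in Eq. (137), then tests the first and third properties for one odd $N$ . The values of `N` and `nu` are arbitrary illustrative choices, and the function names are not from the paper.

```python
import cmath
import math


def dirichlet_sum(x, n):
    """Eq. (136): (1/N) * sum_m alpha^(-N+1+2m) with alpha = exp(i*pi*x/N)."""
    alpha = cmath.exp(1j * math.pi * x / n)
    return sum(alpha ** (-n + 1 + 2 * m) for m in range(n)).real / n


def dirichlet_closed(x, n):
    """Eq. (137): sin(pi*x) / (N * sin(pi*x/N)), with D_N(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (n * math.sin(math.pi * x / n))


N, nu = 9, 0.3  # hypothetical odd N and off-grid offset

# The two definitions should agree on the shifted grid k + nu.
max_gap = max(abs(dirichlet_sum(k + nu, N) - dirichlet_closed(k + nu, N)) for k in range(N))

# Property 1: 1 - D_N(v)^2 <= pi^2 * v^2 / 3 for |v| <= 1/2.
prop1_ok = all(
    1 - dirichlet_closed(v, N) ** 2 <= math.pi ** 2 * v ** 2 / 3 for v in (0.05, 0.2, 0.35, 0.5)
)

# Property 3: sum_n D_N(n + nu) * D_N(n + nu + l) equals 1 for l = 0 and 0 otherwise.
gram_values = [
    sum(dirichlet_closed(n + nu, N) * dirichlet_closed(n + nu + l, N) for n in range(N))
    for l in range(N)
]

print(max_gap, prop1_ok, gram_values[0])
```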

C.2 Proof of Lemma 5

Define vectors $\bm{c}_{\nu},\bm{s}_{\nu}$ with the following entries:

	$\displaystyle\bm{c}_{\nu,k}:=\cos[\pi(1-N^{-1})(k+\nu)]D_{N}(k+\nu),$		(145)
	$\displaystyle\bm{s}_{\nu,k}:=\sin[\pi(1-N^{-1})(k+\nu)]D_{N}(k+\nu).$		(146)

In the following paragraphs, we use notation $\pi_{N}:=\pi(1-N^{-1})$ .

Lemma 11.

Let $\bm{s}_{\nu},\bm{c}_{\nu}$ be defined as in the previous paragraph and suppose $|\nu|\leq 1/2$ . Then

	$\displaystyle\|\bm{s}_{\nu}\|_{1}\leq|\bm{s}_{\nu,0}|+\pi^{2}|\nu|\log N,$		(147)
	$\displaystyle\|\bm{c}_{\nu}\|_{1}\leq|\bm{c}_{\nu,0}|+\pi^{2}|\nu|\log N,$		(148)
	$\displaystyle 2|\nu|\leq\|\bm{s}_{\nu}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu|,$		(149)
	$\displaystyle\|\bm{c}_{\nu}-\delta_{0}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu|.$		(150)

Proof.

According to Lemma 10, we have

\displaystyle|\bm{s}_{\nu,0}|\leq\pi|\nu|,\quad|\bm{s}_{\nu,k}|\leq\frac{\pi|\nu|}{2|k+\nu|_{\mathrm{mod}N}}.

(151)

Assuming $N$ is odd, we can conclude that

$\displaystyle\sum_{k=0}^{N-1}|\bm{s}_{\nu,k}|^{2}$	$\displaystyle\leq\max_{k\neq 0}\sin^{2}\left[\pi_{N}(k+\nu)\right]\cdot\sum_{k\neq 0}D_{N}(k+\nu)^{2}+|\sin\left(\pi_{N}\nu\right)|^{2}\cdot D_{N}(\nu)^{2}$
	$\displaystyle\leq 1-D_{N}(\nu)^{2}+|\sin\left(\pi_{N}\nu\right)|^{2}\leq\frac{4}{3}\pi^{2}\nu^{2},$	(152)
$\displaystyle\sum_{k\neq 0}|\bm{s}_{\nu,k}|$	$\displaystyle\leq\sum_{k\neq 0}|D_{N}(k+\nu)|\leq\frac{\pi|\nu|}{2}\sum_{k=1}^{\frac{N-1}{2}}\left(\frac{1}{k+\nu}+\frac{1}{k-\nu}\right)$
	$\displaystyle\leq\pi|\nu|\sum_{k=1}^{\frac{N-1}{2}}\left(\frac{1}{2k-1}+\frac{1}{2k}\right)\leq\pi|\nu|\left(1+\ln\frac{N-1}{2}\right).$	(153)

One can verify that

1+\ln\frac{N-1}{2}\leq\pi\log N.

(154)

This completes the proof of Eq. (147). The proof for Eq. (148) is similar.

According to Lemma 10, we have

$\displaystyle\bm{c}_{\nu,k}$	$\displaystyle=\cos\left[\pi_{N}(k+\nu)\right]\cdot D_{N}(k+\nu),$	(155)
$\displaystyle\sum_{k\neq 0}|\bm{c}_{\nu,k}|^{2}$	$\displaystyle\leq\max_{k\neq 0}\cos^{2}\left[\pi_{N}(k+\nu)\right]\cdot\sum_{k\neq 0}|D_{N}(k+\nu)|^{2}$
	$\displaystyle\leq\sum_{k\neq 0}D_{N}(k+\nu)^{2}=1-|D_{N}(\nu)|^{2}\leq\frac{\pi^{2}}{3}|\nu|^{2}.$	(156)

Given the fact that $\|\bm{s}_{\nu}\|_{2}^{2}+\|\bm{c}_{\nu}\|_{2}^{2}=1$ , we obtain

$\displaystyle\\|\bm{s}_{\nu}\\|_{2}^{2}$	$\displaystyle=\sum_{k=0}^{N-1}\sin^{2}[\pi_{N}(k+\nu)]D_{N}^{2}(k+\nu)$
	$\displaystyle=\sum_{k=0}^{N-1}\sin^{2}[\pi(k+\nu)]D_{N}^{2}(k+\nu)$
	$\displaystyle\quad+\sum_{k=0}^{N-1}\left\{\sin^{2}[\pi_{N}(k+\nu)]-\sin^{2}[% \pi(k+\nu)]\right\}D_{N}(k+\nu)^{2}$
	$\displaystyle=\sin^{2}(\pi\nu)+\cos(2\pi\nu)\sum_{k=0}^{N-1}\sin^{2}\left[% \frac{\pi(k+\nu)}{N}\right]D_{N}(k+\nu)^{2}$
	$\displaystyle\quad-\sin(2\pi\nu)\sum_{k=0}^{N-1}\sin\left[\frac{\pi(k+\nu)}{N}% \right]\cos\left[\frac{\pi(k+\nu)}{N}\right]D_{N}(k+\nu)^{2}$	(157)

Here is the computation for the first term:

$\displaystyle\sum_{k=0}^{N-1}\sin^{2}\left[\frac{\pi(k+\nu)}{N}\right]D_{N}(k+% \nu)^{2}$	$\displaystyle=\sum_{k=0}^{N-1}\frac{2-\alpha_{k}^{2}-\alpha_{k}^{-2}}{4}\cdot% \frac{1}{N^{2}}\sum_{m,m^{\prime}=0}^{N-1}\alpha_{k}^{-2N+2+2m+2m^{\prime}}$
	$\displaystyle=\frac{1}{2}-\frac{1}{4N}(N-1+e^{\mathrm{i}2\pi\nu})-\frac{1}{4N}% (N-1+e^{-\mathrm{i}2\pi\nu})$
	$\displaystyle=\frac{1}{N}\sin^{2}(\pi\nu).$	(158)

Similarly,

	$\displaystyle\quad\sum_{k=0}^{N-1}\sin\left[\frac{\pi(k+\nu)}{N}\right]\cos% \left[\frac{\pi(k+\nu)}{N}\right]D_{N}(k+\nu)^{2}$
	$\displaystyle=\sum_{k=0}^{N-1}\frac{\alpha_{k}^{2}-\alpha_{k}^{-2}}{4\mathrm{i% }}\cdot\frac{1}{N^{2}}\sum_{m,m^{\prime}=0}^{N-1}\alpha_{k}^{-2N+2+2m+2m^{% \prime}}$
	$\displaystyle=\frac{1}{4N\mathrm{i}}(N-1+e^{\mathrm{i}2\pi\nu})-\frac{1}{4N% \mathrm{i}}(N-1+e^{-\mathrm{i}2\pi\nu})=\frac{1}{2N}\sin(2\pi\nu).$		(159)

Hence, $\|\bm{s}_{\nu}\|_{2}^{2}=\sin^{2}(\pi\nu)+\mathcal{O}(|\nu|^{2}N^{-1})\geq 4|% \nu|^{2}$ . This completes the proof of Eq. (149).

Finally,

$\displaystyle\|\bm{c}_{\nu}-\delta_{0}\|_{2}^{2}$	$\displaystyle\leq|1-\bm{c}_{\nu,0}|^{2}+\sum_{k\neq 0}|\bm{c}_{\nu,k}|^{2}$
	$\displaystyle\leq|1-\cos\left(\pi_{N}\nu\right)D_{N}(\nu)|^{2}+1-|D_{N}(\nu)|^{2}$
	$\displaystyle\leq 2-2|D_{N}(\nu)|+\sin^{2}\left(\pi_{N}\nu\right)D_{N}(\nu)^{2}\leq\frac{4}{3}\pi^{2}|\nu|^{2}.$	(160)

This completes the proof of Eq. (150).

∎
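The norm bounds of Lemma 11 can be probed with a direct computation of $\bm{s}_{\nu}$ and $\bm{c}_{\nu}$ from Eqs. (145) and (146). The sketch below checks $\|\bm{s}_{\nu}\|_{2}^{2}+\|\bm{c}_{\nu}\|_{2}^{2}=1$ together with Eqs. (149) and (150) for a few offsets. It also compares $\|\bm{s}_{\nu}\|_{2}^{2}$ with $(1-N^{-1})\sin^{2}(\pi\nu)$ , which is what Eqs. (157) to (159) appear to combine into; that closed form is an observation of this sketch rather than a statement from the paper. The test offsets are hypothetical and stay away from $|\nu|=1/2$ , where the lower bound leaves no slack for the factor $1-N^{-1}$ .

```python
import math


def dirichlet(x, n):
    """Closed-form Dirichlet kernel of Eq. (137)."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (n * math.sin(math.pi * x / n))


def s_and_c(nu, n):
    """Vectors s_nu and c_nu of Eqs. (146) and (145), with pi_N = pi * (1 - 1/N)."""
    pi_n = math.pi * (1 - 1 / n)
    s = [math.sin(pi_n * (k + nu)) * dirichlet(k + nu, n) for k in range(n)]
    c = [math.cos(pi_n * (k + nu)) * dirichlet(k + nu, n) for k in range(n)]
    return s, c


N = 31
results = []
for nu in (0.05, 0.1, 0.25):  # hypothetical offsets
    s, c = s_and_c(nu, N)
    s_sq = sum(x * x for x in s)
    c_sq = sum(x * x for x in c)
    # ||c_nu - delta_0||_2^2: only the k = 0 entry is shifted by one.
    c_dev_sq = (c[0] - 1.0) ** 2 + sum(x * x for x in c[1:])
    results.append((nu, s_sq, c_sq, c_dev_sq))

for nu, s_sq, c_sq, c_dev_sq in results:
    print(nu, s_sq, c_sq, c_dev_sq)
```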

In the single-frequency case, we have $x^{\mathrm{R}}_{\nu}=\bm{c}_{\nu-\nu_{\mathrm{opt}}},\ x^{\mathrm{I}}_{\nu}=\bm{s}_{\nu-\nu_{\mathrm{opt}}}$ . Recall that $x_{\mathrm{on}}=\sum_{n\in\mathcal{N}}q_{n}\delta_{n}$ . Then we have the following upper and lower bounds.

Lemma 12.

Given the signal $y^{0}_{t}$ and $\nu_{\mathrm{opt}},x^{\mathrm{R}}_{\nu},x^{\mathrm{I}}_{\nu}$ defined in the previous paragraphs, suppose $|\nu|\leq 1/2$ . Then

	$\displaystyle\frac{\|x^{\mathrm{I}}_{\nu}\|_{2}}{\|x_{\mathrm{on}}\|_{1}}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,\quad\frac{\|x^{\mathrm{R}}_{\nu}-x_{\mathrm{on}}\|_{2}}{\|x_{\mathrm{on}}\|_{1}}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,$		(161)
	$\displaystyle\frac{\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}}{\|x_{\mathrm{on}}\|_{1}}\leq\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N,\quad\frac{\|x^{\mathrm{I}}_{\nu,\mathrm{res}}\|_{1}}{\|x_{\mathrm{on}}\|_{1}}\leq\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N.$		(162)

Proof.

By definition, we have

	$\displaystyle x^{\mathrm{R}}_{\nu}=\sum_{n\in\mathcal{N}}q_{n}\bm{c}^{(n)}_{% \nu},\quad\bm{c}^{(n)}_{\nu,k}:=\cos\left[\pi_{N}(k-n+\nu-\nu_{\mathrm{opt}})% \right]D_{N}(k-n+\nu-\nu_{\mathrm{opt}}),$		(163)
	$\displaystyle x^{\mathrm{I}}_{\nu}=\sum_{n\in\mathcal{N}}q_{n}\bm{s}^{(n)}_{% \nu},\quad\bm{s}^{(n)}_{\nu,k}:=\sin\left[\pi_{N}(k-n+\nu-\nu_{\mathrm{opt}})% \right]D_{N}(k-n+\nu-\nu_{\mathrm{opt}}).$		(164)

Note that $\bm{s}^{(n)}_{\nu},\bm{c}^{(n)}_{\nu}$ are simply $\bm{s}_{\nu},\bm{c}_{\nu}$ with a permutation in entries. In Lemma 11, we have proved that

\|\bm{s}^{(n)}_{\nu}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,\quad\|\bm{c}_{\nu}^{(n)}-\delta_{n}\|_{2}\leq\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|.

(165)

Hence,

	$\displaystyle\|x^{\mathrm{I}}_{\nu}\|_{2}\leq\sum_{n\in\mathcal{N}}q_{n}\|\bm{s}^{(n)}_{\nu}\|_{2}\leq\|x_{\mathrm{on}}\|_{1}\cdot\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|,$		(166)
	$\displaystyle\|x^{\mathrm{R}}_{\nu}-x_{\mathrm{on}}\|_{2}\leq\sum_{n\in\mathcal{N}}q_{n}\|\bm{c}^{(n)}_{\nu}-\delta_{n}\|_{2}\leq\|x_{\mathrm{on}}\|_{1}\cdot\frac{2\pi}{\sqrt{3}}|\nu-\nu_{\mathrm{opt}}|.$		(167)

Similarly, by linearity and the triangle inequality, we obtain

	$\displaystyle\|x^{\mathrm{R}}_{\nu,\mathrm{res}}\|_{1}$	$\displaystyle\leq\sum_{n\in\mathcal{N}}q_{n}\|\bm{c}^{(n)}_{\nu,\mathrm{res}}\|_{1}\leq\|x_{\mathrm{on}}\|_{1}\cdot\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N,$		(168)
	$\displaystyle\|x^{\mathrm{I}}_{\nu,\mathrm{res}}\|_{1}$	$\displaystyle\leq\sum_{n\in\mathcal{N}}q_{n}\|\bm{s}^{(n)}_{\nu,\mathrm{res}}\|_{1}\leq\|x_{\mathrm{on}}\|_{1}\cdot\pi^{2}|\nu-\nu_{\mathrm{opt}}|\log N.$		(169)

∎

The most critical part is the lower bound of $\|x^{\mathrm{I}}_{\nu}\|_{2}$ . We prove it in the following lemma separately.

Lemma 13.

Given a signal $y^{0}_{t}$ and $\nu_{\mathrm{opt}},x^{\mathrm{R}}_{\nu},x^{\mathrm{I}}_{\nu}$ defined in the previous paragraphs, suppose $|\nu|\leq 1/2$ . Then

\displaystyle\|x^{\mathrm{I}}_{\nu}\|_{2}\geq C_{4}|\nu-\nu_{\mathrm{opt}}|,% \quad C_{4}:=\|x_{\mathrm{on}}\|_{2}\sqrt{4-\pi^{2}|\mathcal{N}|^{2}N^{-1}}.

(170)

Proof.

Define $\delta\nu:=\nu-\nu_{\mathrm{opt}}$ for simplicity. Then

\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}=\sum_{n\in\mathcal{N}}q_{n}^{2}\mathcal{M}_{n% ,n}+\sum_{n\neq m\in\mathcal{N}}q_{n}q_{m}\mathcal{M}_{n,m}

(171)

where $\mathcal{M}_{n,m}:=$

\sum_{k=0}^{N-1}\sin\left[\pi_{N}(k-n+\delta\nu)\right]\sin\left[\pi_{N}(k-m+% \delta\nu)\right]D_{N}(k-n+\delta\nu)D_{N}(k-m+\delta\nu).

(172)

Note that

$\displaystyle\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}$	$\displaystyle\geq\sum_{n\in\mathcal{N}}q_{n}^{2}|\mathcal{M}_{n,n}|-\sum_{n\neq m\in\mathcal{N}}q_{n}q_{m}|\mathcal{M}_{n,m}|,$	(173)
$\displaystyle\sum_{n\neq m\in\mathcal{N}}q_{n}q_{m}|\mathcal{M}_{n,m}|$	$\displaystyle\leq\frac{1}{2}\sum_{n\neq m\in\mathcal{N}}(q_{n}^{2}+q_{m}^{2})|\mathcal{M}_{n,m}|$
	$\displaystyle=\sum_{n\in\mathcal{N}}q_{n}^{2}\cdot\sum_{m\in\mathcal{N},m\neq n}|\mathcal{M}_{n,m}|.$	(174)

Hence,

\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}\geq\sum_{n\in\mathcal{N}}q_{n}^{2}\left(|\mathcal{M}_{n,n}|-\sum_{m\in\mathcal{N},m\neq n}|\mathcal{M}_{n,m}|\right).

(175)

By virtue of Lemma 11, for all $n$ , we have

\mathcal{M}_{n,n}=\sum_{k=0}^{N-1}\sin^{2}\left[\pi_{N}(k-n+\delta\nu)\right]D^{2}_{N}(k-n+\delta\nu)=\|\bm{s}_{\delta\nu}\|_{2}^{2}\geq 4\delta\nu^{2}.

(176)

By virtue of Lemma 10, for all $n\neq m$ , we have

\begin{split}\mathcal{M}_{n,m}&=\sum_{k=0}^{N-1}\cos[\pi_{N}(n-m)]D_{N}(k-n+% \delta\nu)D_{N}(k-m+\delta\nu)\\ &\quad-\sum_{k=0}^{N-1}\cos\left[\pi_{N}(2k-n-m+2\delta\nu)\right]D_{N}(k-n+% \delta\nu)D_{N}(k-m+\delta\nu)\\ &=-\sum_{k=0}^{N-1}\cos\left[\pi_{N}(2k-n-m+2\delta\nu)\right]D_{N}(k-n+\delta% \nu)D_{N}(k-m+\delta\nu).\end{split}

(177)

Let $\alpha_{k}:=e^{\mathrm{i}\pi(k-n+\delta\nu)/N},\beta_{k}:=e^{\mathrm{i}\pi(k-m% +\delta\nu)/N}$ . Then

$\displaystyle-\mathcal{M}_{n,m}$	$\displaystyle=\sum_{k=0}^{N-1}\frac{\alpha_{k}^{N-1}\beta_{k}^{N-1}+\alpha_{k}% ^{1-N}\beta_{k}^{1-N}}{2}\cdot\frac{1}{N^{2}}\sum_{j=0}^{N-1}\sum_{j^{\prime}=% 0}^{N-1}\alpha_{k}^{-N+1+2j}\beta_{k}^{-N+1+2j^{\prime}}$
	$\displaystyle=\frac{1}{2N^{2}}\sum_{j=0}^{N-1}\sum_{j^{\prime}=0}^{N-1}\sum_{k% =0}^{N-1}\alpha_{k}^{2j}\beta_{k}^{2j^{\prime}}+\frac{1}{2N^{2}}\sum_{j=0}^{N-% 1}\sum_{j^{\prime}=0}^{N-1}\sum_{k=0}^{N-1}\alpha_{k}^{-2N+2+2j}\beta_{k}^{-2N% +2+2j^{\prime}}$
	$\displaystyle=\frac{1}{2N}\sum_{j=0}^{N-1}\sum_{j^{\prime}=0}^{N-1}e^{\mathrm{% i}2\pi\delta\nu(j+j^{\prime})/N}e^{-\mathrm{i}2\pi(jn+j^{\prime}m)/N}\delta_{j% +j^{\prime}=0,N}$
$\displaystyle+\frac{1}{2N}$	$\displaystyle\sum_{j=0}^{N-1}\sum_{j^{\prime}=0}^{N-1}e^{\mathrm{i}2\pi\delta% \nu(-2N+2+j+j^{\prime})/N}e^{\mathrm{i}2\pi[n(N-1-j)+m(N-1-j^{\prime})]/N}% \delta_{j+j^{\prime}=2N-2,N-2}$
	$\displaystyle=\frac{1}{2N}(1-e^{\mathrm{i}2\pi\delta\nu})+\frac{1}{2N}(1-e^{-% \mathrm{i}2\pi\delta\nu})=\frac{2\sin^{2}(\pi\delta\nu)}{N}\leq\frac{2\pi^{2}}% {N}\delta\nu^{2}.$	(178)

Hence,

\|x^{\mathrm{I}}_{\nu}\|_{2}^{2}\geq\sum_{n\in\mathcal{N}}q_{n}^{2}\cdot\delta% \nu^{2}\cdot\left[4-\frac{\pi^{2}}{N}|\mathcal{N}|(|\mathcal{N}|-1)\right],

(179)

and the lemma is proved from here. ∎
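The two ingredients of this proof, the diagonal lower bound in Eq. (176) and the off-diagonal upper bound in Eq. (178), can be checked by summing Eq. (172) directly. The sketch below does so for one hypothetical choice of $N$ , $n$ , $m$ , and $\delta\nu$ . In this experiment the entries also match the closed forms $(1-N^{-1})\sin^{2}(\pi\delta\nu)$ on the diagonal and $-\sin^{2}(\pi\delta\nu)/N$ off the diagonal; these closed forms are an observation of the sketch, not claims made in the proof, and they are consistent with the bounds used there.

```python
import math


def dirichlet(x, n):
    """Closed-form Dirichlet kernel of Eq. (137)."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (n * math.sin(math.pi * x / n))


def overlap(n_idx, m_idx, dnu, n):
    """M_{n,m} of Eq. (172), evaluated by direct summation over k."""
    pi_n = math.pi * (1 - 1 / n)
    total = 0.0
    for k in range(n):
        a = k - n_idx + dnu
        b = k - m_idx + dnu
        total += math.sin(pi_n * a) * math.sin(pi_n * b) * dirichlet(a, n) * dirichlet(b, n)
    return total


N, dnu = 32, 0.2                      # hypothetical size and offset delta-nu
diag = overlap(3, 3, dnu, N)          # M_{n,n}
off_diag = overlap(3, 10, dnu, N)     # M_{n,m} with n != m
sin_sq = math.sin(math.pi * dnu) ** 2

print(diag, off_diag)
```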
