
Non-asymptotic Approximation Error Bounds of Parameterized Quantum Circuits

Zhan Yu^{1, 2, *} Qiuhao Chen^{1, *} Yuling Jiao^{1, 3} Yinan Li^{1, 3} Xiliang Lu^{1, 3}
Xin Wang⁴ Jerry Zhijian Yang^{1, 3}
¹ School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
² Centre for Quantum Technologies, National University of Singapore, 117543, Singapore
³ Hubei Key Laboratory of Computational Science, Wuhan 430072, China
⁴ Thrust of Artificial Intelligence, Information Hub,
Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
* Z.Y. and Q.C. contributed equally to this work.

Abstract

Parameterized quantum circuits (PQCs) have emerged as a promising approach for quantum neural networks. However, understanding their expressive power in accomplishing machine learning tasks remains a crucial question. This paper investigates the expressivity of PQCs for approximating general multivariate function classes. Unlike previous Universal Approximation Theorems for PQCs, which are either nonconstructive or rely on parameterized classical data processing, we explicitly construct data re-uploading PQCs for approximating multivariate polynomials and smooth functions. We establish the first non-asymptotic approximation error bounds for these functions in terms of the number of qubits, quantum circuit depth, and number of trainable parameters. Notably, we demonstrate that for approximating functions that satisfy specific smoothness criteria, the quantum circuit size and number of trainable parameters of our proposed PQCs can be smaller than those of deep ReLU neural networks. We further validate the approximation capability of PQCs through numerical experiments. Our results provide a theoretical foundation for designing practical PQCs and quantum neural networks for machine learning tasks that can be implemented on near-term quantum devices, paving the way for the advancement of quantum machine learning.

1 Introduction

In quantum computing, a key question is whether quantum computers can accelerate classical machine learning tasks in data analysis and artificial intelligence, which has given rise to the interdisciplinary field of quantum machine learning [1]. As quantum analogs of classical neural networks, parameterized quantum circuits (PQCs) [2] have attracted significant attention as a prominent paradigm for achieving quantum advantage. PQCs offer a concrete and practical way to implement quantum machine learning algorithms on noisy intermediate-scale quantum (NISQ) devices [3], making them well suited to a diverse array of tasks [4, 5, 6, 7, 8, 9, 10, 11].

To establish the practical significance of quantum machine learning, an ongoing pursuit is to demonstrate its superiority over classical learning models, including the widely used deep neural networks [12], in solving real-world learning problems. Typical supervised learning tasks, such as image classification and price prediction, aim to construct a model that learns a mapping from inputs to outputs using training data. Essentially, the goal is to approximate multivariate functions. This viewpoint underlies the celebrated Universal Approximation Theorem [13, 14], which establishes that neural networks can, in principle, approximate continuous functions on compact domains to arbitrary accuracy. Recently, powerful tools from approximation theory have been used to build a fruitful mathematical framework for understanding the “black magic” of deep learning, by establishing non-asymptotic approximation error bounds for deep neural networks in terms of width, depth, number of weights (neurons), and function complexity; see e.g. Refs. [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] and references therein.

Substantial research has showcased the power of quantum machine learning for specific learning tasks [26, 27, 28, 29, 30, 31, 32, 33]. A fundamental question is whether quantum machine learning models are as expressive as, or more expressive than, classical machine learning models. A first step is to prove universal approximation theorems for PQCs [34, 35, 36, 37, 38, 39, 40, 41], which show that there exist PQCs with suitable parameter configurations that approximate target functions to any given accuracy. Such theorems mathematically justify the use of PQCs for supervised learning tasks. To further investigate whether PQCs are more expressive than classical models, it is natural to examine their approximation performance by establishing approximation error bounds for important function classes. Such quantitative error bounds are far less understood in the quantum setting, because the hypothesis functions generated by PQCs are more complicated than those generated by classical neural networks.

The difficulty of analyzing PQC approximation performance can be partially overcome by allowing parameterized classical data processing, i.e., by allowing trainable parameters not only in the quantum gates of PQCs but also in the classical pre- and post-processing of data. This makes it possible to prove approximation error bounds following classical strategies [39, 41, 40]. For instance, Goto et al. [39] proved a PQC approximation error rate for Lipschitz continuous functions in terms of the number of qubits and trainable parameters by incorporating trainable parameters in the measurement post-processing phase; similar results can be obtained using Tensor-Train Networks [41] or linear transformations that preprocess the classical data.

However, parameterized classical data processing makes it hard to tell whether the expressive power of PQCs comes from the classical or the quantum part. In fact, it enables one to directly convert the hypothesis functions generated by such quantum models into hypothesis functions generated by classical ones, and to adapt expressivity results for classical machine learning models to characterize the expressivity of these quantum models. As a consequence, the resulting PQCs have very simple structures and short depth. It remains unknown whether one can prove approximation error bounds for PQCs without parameterized classical data processing. On the other hand, Zhao et al. [42] proved exponential lower bounds on the number of trainable parameters (in terms of the number of variables) needed to approximate bounded Lipschitz continuous functions with PQCs without parameterized classical data processing, showing that approximating Lipschitz functions with PQCs still suffers from the curse of dimensionality (CoD) encountered by classical deep neural networks [43]. However, this does not rule out the possibility of achieving the same approximation rate with PQCs that are smaller than classical deep neural networks.

In this paper, we explicitly construct the first PQCs without parameterized classical data processing for approximating multivariate polynomials and smooth functions; an overview of these PQCs is given in Fig. 1. This eliminates the ambiguity about whether the expressivity originates from the classical or the quantum part. We also establish non-asymptotic PQC approximation error bounds, in the sense that PQC approximation performance is characterized in terms of the number of qubits (width), the circuit depth, the number of trainable parameters/gates (parameter count), and function complexity. These results enable us to compare the approximation power of PQCs with that of classical neural networks. Notably, we show that for multivariate smooth functions, the quantum circuit size and the number of trainable parameters of our proposed PQCs can improve on the prior result for deep ReLU neural networks [21], one of the most commonly used neural network families in classical deep learning theory. Our proposed PQCs not only possess the universal approximation property but also achieve parameter efficiency comparable to that of classical neural networks, potentially leading to more efficient and scalable quantum machine learning algorithms for real-world tasks.

Figure 1: Overview of PQCs for approximating continuous functions. (a) Flowchart illustrating the strategy of using PQCs to approximate continuous functions by implementing Bernstein polynomials. The input data $x$ is encoded into the PQC through $S(x)$; the PQC (blue background) can represent parity-constrained polynomials of degree up to $3$ (as $x$ is encoded three times). The linear combination of unitaries (LCU) technique is used to aggregate these polynomials. The PQC output is obtained by measuring a specific observable. Fine-tuning the trainable parameters in the $R_{Z}$ gates yields the polynomial output depicted in the right panel. (b) Flowchart illustrating the strategy of approximation via local Taylor expansions. A PQC first localizes the input domain into $K=5$ regions. For example, for input $x\in[0.8,1]$, the PQC outputs $x^{\prime}=0.8$ as the fixed point. Then $x-x^{\prime}$ is fed into a new PQC that implements the local Taylor expansion at the fixed point $x^{\prime}$, forming a nested architecture. Control gates with pink backgrounds implement the Taylor coefficients. Fine-tuning the trainable parameters in the $R_{X}$ and $R_{Z}$ gates yields a piecewise polynomial of degree $3$ that approximates the target function.

2 Preliminaries

Quantum states.

The basic unit of information in quantum computing is the qubit, which, unlike a classical bit restricted to either 0 or 1, can exist in a superposition of the basis states $\ket{0}$ and $\ket{1}$. A pure quantum state in the $d$-dimensional Hilbert space $\mathbb{C}^{d}$ is represented in Dirac notation as $\ket{\psi}$, and its conjugate transpose is denoted by $\bra{\psi}$. The inner product of two quantum states $\ket{\phi}$ and $\ket{\psi}$ is written as $\braket{\phi}{\psi}$. Pure states are normalized, i.e., $\braket{\psi}{\psi}=1$ for any pure state $\ket{\psi}$. By convention, the computational basis states of a single qubit are $\ket{0}=[1,0]^{T}$ and $\ket{1}=[0,1]^{T}$, where the superscript $T$ denotes the transpose. For $n$-qubit systems, the computational basis states are $\ket{j}\in\{\ket{0},\ket{1}\}^{\otimes n}$, where $\otimes$ denotes the tensor product.

Quantum gates.

Quantum gates are the building blocks of quantum circuits and act on quantum states. Unlike most classical gates, quantum gates are reversible and are described by unitary matrices. In quantum machine learning, common parameterized gates include the single-qubit Pauli rotation gates $R_{X}(\theta)=e^{-i\theta X/2}$, $R_{Y}(\theta)=e^{-i\theta Y/2}$, and $R_{Z}(\theta)=e^{-i\theta Z/2}$, which rotate a quantum state by angle $\theta$ around the corresponding axis, where the three Pauli operators are defined as:

X=\begin{bmatrix}0&1\\ 1&0\end{bmatrix},\quad Y=\begin{bmatrix}0&-i\\ i&0\end{bmatrix},\quad Z=\begin{bmatrix}1&0\\ 0&-1\end{bmatrix},

where $i$ denotes the imaginary unit. A commonly used two-qubit gate is the CNOT gate, which flips the target qubit if and only if the control qubit is in $\ket{1}$.

Quantum measurement.

A quantum measurement is a procedure that extracts classical information from a quantum system. The simplest measurement is the computational basis measurement: for a single-qubit state $\ket{\psi}=\alpha\ket{0}+\beta\ket{1}$, the outcome is $0$ with probability $|\alpha|^{2}$ or $1$ with probability $|\beta|^{2}$, and the post-measurement state is $\ket{0}$ or $\ket{1}$, respectively. Such measurements thus project the quantum state onto the measured basis, collapsing the state itself. Observables, represented by Hermitian operators, correspond to measurable physical quantities such as energy or position. Each observable has a set of possible outcomes (its eigenvalues) and corresponding states (its eigenvectors). When an observable is measured, the outcome is one of its eigenvalues, and the state of the system collapses to the corresponding eigenvector. When a state $\ket{\psi}$ is measured with observable ${\cal O}$, the expected value of the outcome is $\bra{\psi}{\cal O}\ket{\psi}$, i.e., the average result over repeated measurements on identically prepared systems. A comprehensive introduction to the fundamental notation and concepts of quantum computation can be found in [44].

Data re-uploading PQCs.

The PQCs we construct in this paper are of the data re-uploading type [11], i.e., they consist of interleaved data encoding circuit blocks and trainable circuit blocks. More precisely, let $\bm{x}$ be the input data vector and $\bm{\theta}=(\bm{\theta}_{0},\ldots,\bm{\theta}_{L})$ a set of trainable parameter vectors. Let $S(\bm{x})$ be a quantum circuit that encodes $\bm{x}$, and $V(\bm{\theta}_{j})$ a trainable quantum circuit with trainable parameter vector $\bm{\theta}_{j}$. An $L$-layer data re-uploading PQC can then be expressed as

U_{\bm{\theta}}(\bm{x})=V(\bm{\theta}_{0})\prod_{j=1}^{L}S(\bm{x})V(\bm{\theta}_{j}),

(1)

Applying $U_{\bm{\theta}}(\bm{x})$ to an initial quantum state and measuring the output states provides a way to express functions on $\bm{x}$ :

f_{U_{\bm{\theta}}}(\bm{x})\coloneqq\bra{0}U^{\dagger}_{\bm{\theta}}(\bm{x})\,{\cal O}\,U_{\bm{\theta}}(\bm{x})\ket{0},

(2)

where ${\cal O}$ is some Hermitian observable. The approximation capability of the PQC $U_{\bm{\theta}}(\bm{x})$ is characterized by the class of functions that $f_{U_{\bm{\theta}}}(\bm{x})$ can approximate as the trainable parameter vector $\bm{\theta}$ varies. As an example, consider single-qubit PQCs approximating univariate functions. For input $x\in[-1,1]$, we use the Pauli $X$ basis encoding scheme [10] and define the data encoding operator as the Pauli $X$ rotation $S(x)\coloneqq e^{i\arccos(x)X}$. Interleaving $S(x)$ with parameterized Pauli $Z$ rotations $R_{Z}(\theta)$ gives the single-variable data re-uploading PQC $U_{\bm{\theta}}(x)\coloneqq R_{Z}(\theta_{0})\prod_{j=1}^{L}S(x)R_{Z}(\theta_{j})$, where $\bm{\theta}=(\theta_{0},\ldots,\theta_{L})\in{{\mathbb{R}}}^{L+1}$ is a set of trainable parameters. By results from quantum signal processing [45, 46, 47], there exists $\bm{\theta}\in{{\mathbb{R}}}^{L+1}$ such that $U_{\bm{\theta}}(x)$ implements a polynomial transformation $p(x)\in{{\mathbb{R}}}[x]$ as $p(x)=\bra{+}U_{\bm{\theta}}(x)\ket{+}$ for all $x\in[-1,1]$ if and only if (i) the degree of $p(x)$ is at most $L$, (ii) the parity of $p(x)$ is $L\bmod 2$ (a polynomial has parity $0$ if all coefficients of odd powers of $x$ are $0$, and parity $1$ if all coefficients of even powers of $x$ are $0$), and (iii) $\lvert p(x)\rvert\leq 1$ for all $x\in[-1,1]$. Consequently, any univariate function that can be approximated by such a polynomial $p(x)$ can also be approximated by the PQC $U_{\bm{\theta}}(x)$. Beyond real polynomials, single-qubit PQCs with Pauli $Z$ basis encoding can implement complex trigonometric polynomials [37].
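A minimal NumPy sketch of this single-qubit circuit follows, assuming the conventions stated above ($S(x)=e^{i\arccos(x)X}$, $R_Z(\theta)=e^{-i\theta Z/2}$). The helper names (`encode`, `rz`, `reuploading_pqc`, `plus_amplitude`) are hypothetical. The sketch only evaluates the circuit for given phases; computing phases for a target $p(x)$ requires a separate quantum signal processing phase-finding routine, which is not shown.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)


def encode(x):
    """Pauli X basis encoding S(x) = exp(i arccos(x) X), written in closed form."""
    a = np.arccos(x)
    return np.cos(a) * I2 + 1j * np.sin(a) * X


def rz(theta):
    """R_Z(theta) = exp(-i theta Z / 2)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def reuploading_pqc(x, thetas):
    """U_theta(x) = R_Z(theta_0) prod_{j=1}^{L} S(x) R_Z(theta_j), with L = len(thetas) - 1."""
    U = rz(thetas[0])
    for theta in thetas[1:]:
        U = U @ encode(x) @ rz(theta)
    return U


def plus_amplitude(x, thetas):
    """The amplitude <+| U_theta(x) |+> that quantum signal processing relates to p(x)."""
    return PLUS.conj() @ reuploading_pqc(x, thetas) @ PLUS


# All phases zero: U = S(x)^L, so <+|U|+> = exp(i L arccos x), whose real part is T_L(x).
# For L = 3 and x = 0.5 this is exp(i pi), i.e. approximately -1.
amp = plus_amplitude(0.5, [0.0, 0.0, 0.0, 0.0])
```

For any choice of phases, the real part of $\bra{+}U_{\bm{\theta}}(x)\ket{+}$ computed this way has parity $L \bmod 2$ and modulus at most $1$, in line with the constraints listed above.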

3 Expressivity of PQCs for multivariate continuous functions

3.1 Explicit construction of PQCs for multivariate polynomials

Although PQCs for approximating univariate functions have been constructed and analyzed, they have not yet been extended in general to multivariate functions. Existing proofs of universal approximation for multivariate functions are nonconstructive [34, 38] and require arbitrary circuit width, arbitrary multi-qubit global parameterized unitaries, and arbitrary observables. Goto et al. [39] proposed several constructions for approximating multivariate functions with the assistance of parameterized data pre-processing and post-processing, which yields a quantum-enhanced hybrid scheme rather than a purely quantum one.

We now turn to our explicit construction of PQCs for multivariate polynomials. A multivariate polynomial with $d$ variables and degree $s$ is defined as $p(\bm{x})\coloneqq\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}c_{\bm{\alpha}}\bm{x}^{\bm{\alpha}}$, where $\bm{x}^{\bm{\alpha}}=x_{1}^{\alpha_{1}}x_{2}^{\alpha_{2}}\cdots x_{d}^{\alpha_{d}}$. To implement $p(\bm{x})$, we first build a PQC that expresses a single monomial $c_{\bm{\alpha}}\bm{x}^{\bm{\alpha}}$. The construction is a straightforward extension of the univariate case: we apply the single-qubit PQC with Pauli $X$ basis encoding to each $x_{j}$ to implement $x_{j}^{\alpha_{j}}$ for $1\leq j\leq d$. The coefficient $c_{\bm{\alpha}}\in{{\mathbb{R}}}$ can be absorbed into any one of these single-qubit PQCs. We thus obtain a PQC $U^{\bm{\alpha}}(\bm{x})\coloneqq\bigotimes_{j=1}^{d}U_{\bm{\theta}_{j}}(x_{j})$ such that $\bra{+}^{\otimes d}U^{\bm{\alpha}}(\bm{x})\ket{+}^{\otimes d}=c_{\bm{\alpha}}\bm{x}^{\bm{\alpha}}$. The depth of $U^{\bm{\alpha}}(\bm{x})$ is at most $2s+1$, its width is at most $d$, and its number of parameters is at most $s+d$.
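To make the tensor-product construction concrete, here is a NumPy sketch for the special case $\bm{\alpha}=(1,\ldots,1)$, where each factor needs only one encoding layer. The phases $t_0=\arccos(c)-\pi/2$, $t_1=\arccos(c)+\pi/2$ are our own hand derivation for this degree-1 case (outlined in the docstring), not taken from the paper; higher powers $x_j^{\alpha_j}$ would require QSP phase finding, which is omitted. All helper names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)


def encode(x):
    a = np.arccos(x)
    return np.cos(a) * I2 + 1j * np.sin(a) * X


def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def linear_qubit(x, c=1.0):
    """One-layer PQC R_Z(t0) S(x) R_Z(t1) with <+|U|+> = c * x (requires |c| <= 1).

    Hand derivation: <+|U|+> = x cos((t0 + t1)/2) + i sqrt(1 - x^2) cos((t1 - t0)/2),
    so t1 - t0 = pi removes the imaginary part and (t0 + t1)/2 = arccos(c) sets the scale.
    """
    t0 = np.arccos(c) - np.pi / 2
    t1 = np.arccos(c) + np.pi / 2
    return rz(t0) @ encode(x) @ rz(t1)


def monomial_pqc(xs, c):
    """Tensor product of single-qubit PQCs for alpha = (1, ..., 1); c is absorbed into qubit 0."""
    U = linear_qubit(xs[0], c)
    for x in xs[1:]:
        U = np.kron(U, linear_qubit(x))
    return U


def plus_expectation(U):
    """<+|^{(x) d} U |+>^{(x) d} for a d-qubit matrix U."""
    d = int(np.log2(U.shape[0]))
    plus_d = PLUS
    for _ in range(d - 1):
        plus_d = np.kron(plus_d, PLUS)
    return plus_d.conj() @ U @ plus_d


# Should be close to 0.5 * 0.3 * 0.7 = 0.105.
val = plus_expectation(monomial_pqc([0.3, 0.7], c=0.5))
```

Because $\bra{+}^{\otimes d}$ factorizes over the tensor product, the amplitude is simply the product of the single-qubit amplitudes, which is why the monomial needs no entangling gates.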

Having PQCs that implement monomials, the next step is to aggregate them into the multivariate polynomial. A natural idea is to sum the monomial PQCs as $U_{p}(\bm{x})=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}U^{\bm{\alpha}}(\bm{x})$. However, addition is non-trivial in quantum computing, since a sum of unitary operators is not necessarily unitary. To overcome this issue, we use the linear combination of unitaries (LCU) technique [48] to implement the operator $U_{p}(\bm{x})$ on a quantum computer. Realizing the linear combination of the PQCs $U^{\bm{\alpha}}(\bm{x})$ requires applying a multi-qubit control to each $U^{\bm{\alpha}}(\bm{x})$, which can be decomposed into linear-depth circuits of CNOT gates and single-qubit rotation gates without any ancilla qubits [49]. We then obtain the polynomial $p(\bm{x})=\bra{+}^{\otimes d}U_{p}(\bm{x})\ket{+}^{\otimes d}$ by applying the Hadamard test to the LCU circuit. Summarizing the above, we establish the following theorem on using PQCs to implement multivariate polynomials. A formal description of these PQCs is given in Appendix B.
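The LCU-plus-Hadamard-test step can be checked numerically with explicit matrices. The sketch below, an illustration under our own assumptions rather than the construction of Appendix B, builds a PREP unitary from positive weights $w_k$ (signs would be absorbed into the $U_k$), a SELECT operator $\sum_k \lvert k\rangle\!\langle k\rvert\otimes U_k$, and the standard Hadamard test, whose ancilla $\langle Z\rangle$ equals $\mathrm{Re}\bra{\phi}W\ket{\phi}$. The index register here has dimension $4$; a real circuit would pad it to a power of two. Function names are hypothetical.

```python
import numpy as np


def prep_unitary(weights):
    """Real orthogonal PREP with PREP|0> = sum_k sqrt(w_k / lam)|k>, lam = sum_k w_k (w_k > 0)."""
    amps = np.sqrt(np.asarray(weights, dtype=float) / np.sum(weights))
    M = np.eye(len(amps))
    M[:, 0] = amps
    Q, _ = np.linalg.qr(M)  # completes amps to an orthonormal basis (first column = +/- amps)
    return Q if Q[:, 0] @ amps > 0 else -Q


def lcu_block(weights, unitaries):
    """W = (PREP^T (x) I) SELECT (PREP (x) I); its top-left block is sum_k w_k U_k / lam."""
    m, n = len(unitaries), unitaries[0].shape[0]
    proj = [np.outer(np.eye(m)[k], np.eye(m)[k]) for k in range(m)]
    select = sum(np.kron(proj[k], unitaries[k]) for k in range(m))
    P = np.kron(prep_unitary(weights), np.eye(n))
    return P.T @ select @ P


def hadamard_test(W, phi):
    """Ancilla <Z> after H, controlled-W, H on |0>|phi>; equals Re <phi|W|phi>."""
    branch0 = (phi + W @ phi) / 2  # unnormalized state with ancilla in |0>
    branch1 = (phi - W @ phi) / 2  # unnormalized state with ancilla in |1>
    return np.vdot(branch0, branch0).real - np.vdot(branch1, branch1).real


def rx(t):
    """R_X(t) = exp(-i t X / 2)."""
    return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],
                     [-1j * np.sin(t / 2), np.cos(t / 2)]])


weights = [0.4, 0.3, 0.2, 0.1]
unitaries = [rx(0.3), rx(1.2), rx(2.0), rx(-0.7)]
psi = np.array([1, 0], dtype=complex)   # system state |0>
phi = np.kron(np.eye(4)[0], psi)        # LCU index register starts in |0>
W = lcu_block(weights, unitaries)
estimate = hadamard_test(W, phi)
exact = sum(w * np.vdot(psi, U @ psi) for w, U in zip(weights, unitaries)).real / sum(weights)
```

Note that LCU realizes the normalized operator $\frac{1}{\lambda}\sum_k w_k U_k$, so in practice the polynomial is read out up to the known factor $\lambda$.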

Theorem 1.

For any multivariate polynomial $p(\bm{x})$ with $d$ variables and degree $s$ such that $\lvert p(\bm{x})\rvert\leq 1$ for $\bm{x}\in[0,1]^{d}$ , there exists a PQC $W_{p}(\bm{x})$ such that

f_{W_{p}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{p}(\bm{x})Z^{(0)}W_{p}(\bm{x})\ket{0}=p(\bm{x})

(3)

where $Z^{(0)}$ is the Pauli $Z$ observable on the first qubit. The width of the PQC is $O(d+\log s+s\log d)$ , the depth is $O(s^{2}d^{s}(\log s+s\log d))$ , and the number of parameters is $O(sd^{s}(s+d))$ .

Note that the initial state in the Hadamard test can be taken as $\ket{0}^{\otimes d}$, since $\ket{+}^{\otimes d}$ is easily prepared by applying Hadamard gates to $\ket{0}^{\otimes d}$. Estimating $p(\bm{x})$ up to an additive error $\varepsilon$ requires $O(\frac{1}{\varepsilon^{2}})$ repeated measurements of the first qubit of $W_{p}(\bm{x})$. The amplitude estimation algorithm [50] can further reduce this overhead, at the cost of increasing the circuit depth by a factor of $O(\frac{1}{\varepsilon})$.
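The $O(\frac{1}{\varepsilon^{2}})$ repetition count follows from Hoeffding's inequality for $\pm1$ measurement outcomes. The sketch below is our illustration, not the paper's procedure: it computes the explicit count $n=\lceil 2\ln(2/\delta)/\varepsilon^{2}\rceil$ that guarantees error at most $\varepsilon$ with probability at least $1-\delta$, and simulates $Z$ measurements whose mean is $p$. The names `shots_for_precision` and `estimate_z_expectation` are hypothetical.

```python
import math

import numpy as np


def shots_for_precision(eps, delta):
    """Hoeffding for outcomes in [-1, 1]: n >= 2 ln(2/delta) / eps^2 gives error <= eps w.p. >= 1 - delta."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps**2)


def estimate_z_expectation(p, eps, delta, rng):
    """Simulate Z-measurements with <Z> = p: outcome +1 w.p. (1 + p)/2, otherwise -1."""
    n = shots_for_precision(eps, delta)
    outcomes = np.where(rng.random(n) < (1.0 + p) / 2.0, 1.0, -1.0)
    return outcomes.mean(), n


# For eps = 0.01 and delta = 0.05 this asks for roughly 7.4e4 shots.
n_shots = shots_for_precision(0.01, 0.05)
```

Halving $\varepsilon$ quadruples the shot count, which is the overhead amplitude estimation trades for circuit depth.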

3.2 PQC approximation for continuous functions

Polynomials play a central role in approximation theory. The celebrated Weierstrass approximation theorem (see e.g. [51, Sec. 10.2.2]) states that every continuous univariate function on a closed interval can be uniformly approximated by polynomials. In the multivariate case, such approximations can be constructed explicitly using Bernstein polynomials [52, 53]. We apply these results to prove PQC approximation error bounds for multivariate Lipschitz continuous functions.

For a $d$-variable continuous function $f\colon[0,1]^{d}\to{{\mathbb{R}}}$, the multivariate Bernstein polynomial of $f$ with degree $n\in{{\mathbb{N}}}^{+}$ is defined as

B_{n}(\bm{x})\coloneqq\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}f\Bigl(\frac{\bm{k}}{n}\Bigr)\prod_{j=1}^{d}\binom{n}{k_{j}}x_{j}^{k_{j}}(1-x_{j})^{n-k_{j}},

(4)

where $\bm{k}=(k_{1},\ldots,k_{d})\in\{0,\ldots,n\}^{d}$. It is known that Bernstein polynomials converge uniformly to $f$ on $[0,1]^{d}$ as $n\to\infty$ [52, 53]. With proper rescaling, the PQC constructed in Theorem 1 can implement the Bernstein polynomial, which implies that PQCs are universal approximators for bounded continuous functions.
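The following sketch evaluates Eq. (4) directly (the function name `bernstein` is hypothetical). The sum over $(n+1)^{d}$ grid points mirrors the $n^{d}$ factor in the circuit size of the theorems below.

```python
import itertools
import math

import numpy as np


def bernstein(f, n, x):
    """Multivariate Bernstein polynomial B_n(x) of Eq. (4); costs (n + 1)^d evaluations of f."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for k in itertools.product(range(n + 1), repeat=x.size):
        weight = 1.0
        for kj, xj in zip(k, x):
            weight *= math.comb(n, kj) * xj**kj * (1.0 - xj) ** (n - kj)
        total += f(np.array(k) / n) * weight
    return total


# Affine functions are reproduced exactly: B_5 of x1 + x2 at (0.3, 0.6) is 0.9.
affine = bernstein(lambda z: z[0] + z[1], 5, [0.3, 0.6])

# For the Lipschitz function |x - 1/2|, the error at x = 1/2 decays roughly like n^(-1/2).
err = {n: abs(bernstein(lambda z: abs(z[0] - 0.5), n, [0.5])) for n in (10, 40)}
```

The slow $n^{-1/2}$ decay in the second example is typical of Bernstein approximation of non-smooth functions and foreshadows the large degrees required in Theorem 3.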

Theorem 2 (The Universal Approximation Theorem of PQC).

For any continuous function $f\colon[0,1]^{d}\to[-1,1]$ and any $\varepsilon>0$, there exist an $n\in{{\mathbb{N}}}$ and a PQC $W_{b}(\bm{x})$ with width $O(d\log n)$, depth $O(dn^{d}\log n)$, and $O(dn^{d})$ trainable parameters such that

\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\varepsilon

(5)

for all $\bm{x}\in[0,1]^{d}$, where $f_{W_{b}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{b}(\bm{x})Z^{(0)}W_{b}(\bm{x})\ket{0}$.

Theorem 2 is the quantum counterpart of the universal approximation theorem for classical neural networks. Moreover, the PQCs that universally approximate continuous functions are constructed explicitly, without the arbitrary circuit width, global multi-qubit parameterized unitaries, or arbitrary observables required in Refs. [34, 38], which improves on those results. Furthermore, for continuous functions $f$ satisfying the Lipschitz condition $\lvert f(\bm{x})-f(\bm{y})\rvert\leq\ell\lVert\bm{x}-\bm{y}\rVert_{\infty}$ for all $\bm{x},\bm{y}$, the approximation rate of Bernstein polynomials can be characterized quantitatively in terms of the degree $n$, the number of variables $d$, and the Lipschitz constant $\ell$ [53]. This yields the following non-asymptotic error bound for PQCs approximating Lipschitz continuous functions.

Theorem 3.

Given a Lipschitz continuous function $f\colon[0,1]^{d}\to[-1,1]$ with Lipschitz constant $\ell$, for any $\varepsilon>0$ and $n\in{{\mathbb{N}}}$, there exists a PQC $W_{b}(\bm{x})$ such that $f_{W_{b}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{b}(\bm{x})Z^{(0)}W_{b}(\bm{x})\ket{0}$ satisfies

\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\varepsilon+2\biggl(\Bigl(1+\frac{\ell^{2}}{n\varepsilon^{2}}\Bigr)^{d}-1\biggr)\leq\varepsilon+d2^{d}\frac{\ell^{2}}{n\varepsilon^{2}}

(6)

for all $\bm{x}\in[0,1]^{d}$ . The width of the PQC is $O(d\log n)$ , the depth is $O\bigl{(}dn^{d}\log{n}\bigr{)}$ , and the number of parameters is $O(dn^{d})$ .

We prove these theorems in Appendix C. Although Theorem 3 gives a quantitative approximation error bound, $n$ must be very large to attain good precision, which yields an extremely deep PQC. This inefficiency stems from the intrinsic difficulty of uniformly approximating a continuous function with a single global polynomial. One approach that may overcome this obstacle is to use local polynomials to achieve a piecewise approximation, which we explore in the next section.
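A back-of-the-envelope reading of Eq. (6) makes the required size concrete. Asking the second (looser) term to be at most $\varepsilon$, and ignoring the constants hidden in the $O(\cdot)$ notation, gives:

```latex
% Second bound in Eq. (6): |f - f_{W_b}| <= eps + d 2^d l^2 / (n eps^2).
d\,2^{d}\frac{\ell^{2}}{n\varepsilon^{2}}\le\varepsilon
\iff n\ge\frac{d\,2^{d}\,\ell^{2}}{\varepsilon^{3}}
\quad\Longrightarrow\quad
\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\le 2\varepsilon,
\qquad
\text{depth}=O\bigl(dn^{d}\log n\bigr)
=O\Bigl(d\,\bigl(d\,2^{d}\ell^{2}\varepsilon^{-3}\bigr)^{d}\log\frac{d\,2^{d}\ell^{2}}{\varepsilon^{3}}\Bigr).
% Example: d = 2, l = 1, eps = 0.1 gives n >= 8000 and n^d = 6.4 x 10^7.
```

The depth thus scales like $\varepsilon^{-3d}$, which grows exponentially with the number of variables $d$.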

3.3 PQC approximation for Hölder smooth functions

To achieve a piecewise approximation of multivariate functions, we follow the approach used for classical deep neural network approximation [18, 21, 25], which uses multivariate Taylor expansions to approximate target functions in small local regions.

We focus on Hölder smooth functions. Let $\beta=s+r>0$, where $r\in(0,1]$ and $s\in{{\mathbb{N}}}$. For a finite constant $B_{0}>0$, the $\beta$-Hölder class of functions ${\cal H}^{\beta}([0,1]^{d},B_{0})$ is defined as

{\cal H}^{\beta}([0,1]^{d},B_{0})=\Bigl\{f\colon[0,1]^{d}\to{{\mathbb{R}}},\ \max_{\lVert\bm{\alpha}\rVert_{1}\leq s}\lVert\partial^{\bm{\alpha}}f\rVert_{\infty}\leq B_{0},\ \max_{\lVert\bm{\alpha}\rVert_{1}=s}\sup_{\bm{x}\neq\bm{y}}\frac{\lvert\partial^{\bm{\alpha}}f(\bm{x})-\partial^{\bm{\alpha}}f(\bm{y})\rvert}{\lVert\bm{x}-\bm{y}\rVert_{2}^{r}}\leq B_{0}\Bigr\},

(7)

where $\partial^{\bm{\alpha}}=\partial_{x_{1}}^{\alpha_{1}}\cdots\partial_{x_{d}}^{\alpha_{d}}$ for $\bm{\alpha}=(\alpha_{1},\ldots,\alpha_{d})\in{{\mathbb{N}}}^{d}$. Hölder smooth functions naturally generalize several classes of continuous functions: when $\beta\in(0,1)$, $f$ is Hölder continuous with order $\beta$ and Hölder constant $B_{0}$; when $\beta=1$, $f$ is Lipschitz continuous with Lipschitz constant $B_{0}$; when $1<\beta\in{{\mathbb{N}}}$, $f\in C^{s}([0,1]^{d})$, the class of $s$-smooth functions whose $s$-th partial derivatives exist and are bounded. As shown by Petersen and Voigtlaender [18], for any $\beta$-Hölder smooth function $f\in{\cal H}^{\beta}([0,1]^{d},B_{0})$, its local Taylor expansion at a fixed point $\bm{x}_{0}\in[0,1]^{d}$ satisfies

\Bigl\lvert f(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\bm{x_{0}})}{\bm{\alpha}!}(\bm{x}-\bm{x_{0}})^{\bm{\alpha}}\Bigr\rvert\leq d^{s}\lVert\bm{x}-\bm{x_{0}}\rVert^{\beta}_{2}

(8)

for all $\bm{x}\in[0,1]^{d}$ , where $\bm{\alpha}!=\alpha_{1}!\cdots\alpha_{d}!$ . Next, we show how to construct PQCs to implement the Taylor expansion of $\beta$ -Hölder functions in the following three steps.
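As a quick numerical sanity check of Eq. (8), the sketch below uses a test function of our own choosing, $f(\bm{x})=\sin(x_1+x_2)/4$ with $d=2$, $s=2$, $r=1$ (so $\beta=3$). Its derivatives up to order $2$ are bounded by $1/4$ and its second derivatives are Lipschitz with constant at most $\sqrt{2}/4$, so by our reading of Eq. (7) it lies in ${\cal H}^{3}([0,1]^{2},1)$. Function names are illustrative.

```python
import itertools
import math

import numpy as np


def f(x):
    """Test function sin(x1 + x2) / 4 (our choice, not from the paper)."""
    return np.sin(x[0] + x[1]) / 4.0


def partial(alpha, x0):
    """d^alpha f(x0); for this f it depends only on |alpha| (k-th derivative of sin)."""
    k = sum(alpha)
    return np.sin(x0[0] + x0[1] + k * np.pi / 2) / 4.0


def taylor(x, x0, s):
    """Truncated expansion sum_{|alpha|_1 <= s} d^alpha f(x0) / alpha! * (x - x0)^alpha."""
    h = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    total = 0.0
    for alpha in itertools.product(range(s + 1), repeat=len(h)):
        if sum(alpha) <= s:
            coeff = partial(alpha, x0) / math.prod(math.factorial(a) for a in alpha)
            total += coeff * math.prod(hj**aj for hj, aj in zip(h, alpha))
    return total


# Compare |f(x) - Taylor| with the bound d^s ||x - x0||_2^beta = 4 ||x - x0||^3.
x0 = np.array([0.25, 0.5])
x = np.array([0.4, 0.3])
err = abs(f(x) - taylor(x, x0, s=2))
bound = 2**2 * np.linalg.norm(x - x0) ** 3
```

For this particular $f$ the true remainder is far below the bound, since Eq. (8) must hold uniformly over the whole Hölder class.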

Localization.

To exploit Hölder smoothness, we first localize the entire region $[0,1]^{d}$. The purpose of localization is to choose the local point $\bm{x_{0}}$ in Eq. (8) so that the distance between $\bm{x}$ and $\bm{x_{0}}$ is small. An intuitive configuration is illustrated in Fig. 2, where the stars represent the local points. Given $K\in{{\mathbb{N}}}$ and $\Delta\in(0,\frac{1}{3K})$, for each $\bm{\eta}=(\eta_{1},\ldots,\eta_{d})\in\{0,1,\ldots,K-1\}^{d}$, we define

Q_{\bm{\eta}}\coloneqq\Bigl\{\bm{x}=(x_{1},\ldots,x_{d})\colon x_{i}\in\Bigl[\frac{\eta_{i}}{K},\frac{\eta_{i}+1}{K}-\Delta\cdot 1_{\eta_{i}<K-1}\Bigr]\Bigr\}.

(9)

We construct a PQC that maps every $\bm{x}\in Q_{\bm{\eta}}$ to the fixed point $\bm{x_{\eta}}=\frac{\bm{\eta}}{K}\in Q_{\bm{\eta}}$, i.e., that approximates the piecewise-constant function $D(\bm{x})=\frac{\bm{\eta}}{K}$ for $\bm{x}\in Q_{\bm{\eta}}$. We describe the construction for $d=1$, where $D(x)=\frac{k}{K}$ if $x\in[\frac{k}{K},\frac{k+1}{K}-\Delta\cdot 1_{k<K-1}]$ for $k=0,\ldots,K-1$; the multivariate case follows by applying $D$ to each variable $x_{j}$. The idea is to construct a polynomial that approximates $D(x)$ based on polynomial approximations of the sign function [54], which a single-qubit PQC can then implement. For the multivariate localization, this yields a PQC $W_{D}(\bm{x})$ of depth $O(\frac{1}{\Delta}\log\frac{K}{\varepsilon})$ and width $O(d)$ whose output $f_{W_{D}}(\bm{x})$ maps $\bm{x}$ to the corresponding fixed point $\bm{x_{\eta}}$ with precision $\varepsilon$. An estimate of $\bm{\eta}$ is then given by $\lfloor Kf_{W_{D}}(\bm{x})\rfloor$.
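The localization idea can be sketched classically for $d=1$ by writing $D(x)$ as a sum of $K-1$ smoothed steps placed in the middle of the gaps. In this sketch, which is our own stand-in and not the paper's construction, each step uses $\mathrm{erf}$ in place of the polynomial sign approximation of [54]; a PQC would implement a polynomial approximation of these steps instead. The sharpness $\kappa=\frac{2}{\Delta}\sqrt{\ln\frac{1}{2\varepsilon}}$ follows from $\mathrm{erfc}(z)\le e^{-z^{2}}$ and the fact that points of $Q_{\eta}$ lie at least $\Delta/2$ from every threshold. We recover $\eta$ by rounding, a choice of ours. Names are hypothetical.

```python
import math

import numpy as np


def smooth_localizer(x, K, delta, eps):
    """Smooth surrogate for D(x) = (1/K) * sum_{k=1}^{K-1} step(x - (k/K - delta/2)).

    Each step is (1 + erf(kappa * t)) / 2. Points of Q_eta lie at least delta/2 from every
    threshold and erfc(z) <= exp(-z^2), so this kappa keeps the error below eps (eps < 1/2).
    """
    kappa = (2.0 / delta) * math.sqrt(math.log(1.0 / (2.0 * eps)))
    steps = [(1.0 + math.erf(kappa * (x - (k / K - delta / 2)))) / 2.0 for k in range(1, K)]
    return sum(steps) / K


def sample_Q(eta, K, delta, rng):
    """Uniform sample from Q_eta = [eta/K, (eta+1)/K - delta * 1{eta < K-1}] for d = 1."""
    hi = (eta + 1) / K - (delta if eta < K - 1 else 0.0)
    return rng.uniform(eta / K, hi)


K, delta, eps = 5, 0.05, 1e-3   # delta < 1/(3K) as required
rng = np.random.default_rng(0)
x = sample_Q(2, K, delta, rng)
out = smooth_localizer(x, K, delta, eps)   # close to 2/5
eta_hat = round(K * out)                   # recovers eta = 2
```

The required sharpness scales like $\frac{1}{\Delta}$, which is consistent with the $\frac{1}{\Delta}$ factor in the depth of $W_D$.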

Implementing the Taylor coefficients.

Next, we use a PQC to implement the Taylor coefficients $\xi_{\bm{\eta},\bm{\alpha}}\coloneqq\frac{\partial^{\bm{\alpha}}f(\bm{x_{\eta}})}{\bm{\alpha}!}\in[-1,1]$ for each $\bm{\eta}=(\eta_{1},\ldots,\eta_{d})\in\{0,1,\ldots,K-1\}^{d}$ and each $\bm{\alpha}$, which is essentially a point-fitting problem. We construct a PQC $U_{co}^{\bm{\alpha}}=\sum_{\bm{\eta}}\lvert\bm{\eta}\rangle\!\langle\bm{\eta}\rvert\otimes R_{X}(\theta_{\bm{\eta},\bm{\alpha}})$ such that $\bra{\bm{\eta},0}U_{co}^{\bm{\alpha}}\ket{\bm{\eta},0}=\xi_{\bm{\eta},\bm{\alpha}}$, where $\ket{\bm{\eta}}=\ket{\eta_{1}}\otimes\cdots\otimes\ket{\eta_{d}}$ and $\theta_{\bm{\eta},\bm{\alpha}}=2\arccos(\xi_{\bm{\eta},\bm{\alpha}})$. The depth of $U_{co}^{\bm{\alpha}}$ is $O(K^{d})$, its width is $O(d\log K)$, and its number of parameters is $O(K^{d})$. Note that the state $\ket{\bm{\eta}}$ can be prepared by basis encoding of $\bm{\eta}=\lfloor Kf_{W_{D}}(\bm{x})\rfloor$ obtained in the localization step.
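The coefficient block is a uniformly controlled $R_X$ gate. The NumPy sketch below, for illustration only, builds $U_{co}^{\bm{\alpha}}$ as an explicit block-diagonal matrix for a flat list of coefficients indexed by $\bm{\eta}$, rather than decomposing it into gates; it relies on $\bra{0}R_X(\theta)\ket{0}=\cos(\theta/2)$. Function names are hypothetical.

```python
import numpy as np


def rx(theta):
    """R_X(theta) = exp(-i theta X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def coefficient_pqc(xi):
    """U_co = sum_eta |eta><eta| (x) R_X(2 arccos(xi[eta])), stored as a block-diagonal matrix."""
    m = len(xi)
    U = np.zeros((2 * m, 2 * m), dtype=complex)
    for eta, value in enumerate(xi):
        U[2 * eta:2 * eta + 2, 2 * eta:2 * eta + 2] = rx(2 * np.arccos(value))
    return U


def read_coefficient(U, eta):
    """<eta, 0| U |eta, 0>, with the index register first and the rotated qubit last."""
    m = U.shape[0] // 2
    basis = np.kron(np.eye(m)[eta], np.array([1.0, 0.0]))
    return basis @ U @ basis


xi = [0.9, -0.25, 0.0, 0.6]        # K^d = 4 hypothetical coefficients in [-1, 1]
U_co = coefficient_pqc(xi)
recovered = [read_coefficient(U_co, eta) for eta in range(len(xi))]
```

One rotation angle per cell $\bm{\eta}$ is what gives the $O(K^{d})$ parameter count stated above.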

Implementing multivariate Taylor series.

To implement the multivariate Taylor expansion of a function at a fixed point $\bm{x_{\eta}}$, we first build a PQC representing a single term of the Taylor series by combining the PQC that implements the Taylor coefficients with the PQC that implements monomials, i.e., $U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})\coloneqq U_{co}^{\bm{\alpha}}\otimes U^{\bm{\alpha}}(\bm{x}-\bm{x_{\eta}})$. The depth of $U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})$ is $O(K^{d}+s)$, its width is $O(d\log K)$, and its number of parameters is at most $K^{d}+s+d$. The next step is to aggregate the individual Taylor terms to implement the truncated Taylor expansion of the target function. We use LCU to construct the PQC $U_{t}(\bm{x},\bm{x_{\eta}})\coloneqq\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})$, so that the Taylor expansion of $f$ at $\bm{x_{\eta}}$ is given by $\bra{\bm{\eta},0}\bra{+}^{\otimes d}U_{t}(\bm{x},\bm{x_{\eta}})\ket{\bm{\eta},0}\ket{+}^{\otimes d}$.
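A single Taylor term can be sketched for $d=1$ and $\bm{\alpha}=1$ by tensoring the coefficient block with a one-layer monomial PQC that reads out $x-x_{\eta}$. Assumptions in this sketch: the localization is done exactly with a floor (standing in for the localization PQC), the index register is simulated as a $K$-dimensional space, and the monomial phases $(-\pi/2,\pi/2)$ are our own choice that makes $\bra{+}U\ket{+}=u$. Names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)


def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def linear_qubit(u):
    """R_Z(-pi/2) S(u) R_Z(pi/2) with S(u) = exp(i arccos(u) X); <+|.|+> = u for u in [-1, 1]."""
    a = np.arccos(u)
    S = np.cos(a) * np.eye(2) + 1j * np.sin(a) * X
    return rz(-np.pi / 2) @ S @ rz(np.pi / 2)


def taylor_term(xi, x, K):
    """<eta,0|<+| (U_co (x) U^alpha(x - x_eta)) |eta,0>|+> for d = 1 and alpha = 1."""
    eta = min(int(np.floor(K * x)), K - 1)   # exact localization, for simplicity
    x_eta = eta / K
    U_co = np.zeros((2 * K, 2 * K), dtype=complex)
    for e, value in enumerate(xi):
        U_co[2 * e:2 * e + 2, 2 * e:2 * e + 2] = rx(2 * np.arccos(value))
    state = np.kron(np.kron(np.eye(K)[eta], [1.0, 0.0]), PLUS)
    U = np.kron(U_co, linear_qubit(x - x_eta))
    return state.conj() @ U @ state, x_eta


xi = [0.5, -0.3, 0.8, 0.1]          # hypothetical first-order coefficients, one per cell
val, x_eta = taylor_term(xi, 0.62, K=4)   # cell eta = 2, so val is close to 0.8 * 0.12
```

The amplitude factorizes into (coefficient) times (monomial in $x-x_{\eta}$), which is exactly one summand of the Taylor expansion; LCU then adds these summands.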

We construct the nested PQC $U_{t}(\bm{x},f_{W_{D}}(\bm{x}))$, in which the fixed point corresponding to any input $\bm{x}$ is determined by the localization PQC. Together with the Hadamard test, this PQC can be used to approximate Hölder smooth functions. In particular, we prove the approximation error bound of the constructed PQC based on the Taylor expansion error rate in Eq. (8).

Theorem 4.

Given a function $f\in{\cal H}^{\beta}([0,1]^{d},1)$ with $\beta=r+s$, $r\in(0,1]$ and $s\in{{\mathbb{N}}}^{+}$, for any $K\in{{\mathbb{N}}}$ and $\Delta\in(0,\frac{1}{3K})$, there exists a PQC $W_{t}(\bm{x})$ such that $f_{W_{t}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{t}(\bm{x})Z^{(0)}W_{t}(\bm{x})\ket{0}$ satisfies

\lvert f(\bm{x})-f_{W_{t}}(\bm{x})\rvert\leq d^{s+\beta/2}K^{-\beta}

(10)

for $\bm{x}\in\bigcup_{\bm{\eta}}Q_{\bm{\eta}}$. The width of the PQC is $O(d\log K+\log s+s\log d)$, the depth is $O\bigl(s^{2}d^{s}K^{d}(\log s+s\log d+d\log K)+\frac{1}{\Delta}\log K\bigr)$, and the number of parameters is $O\bigl(sd^{s}(s+d+K^{d})+\frac{d}{\Delta}\log K\bigr)$.
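The right-hand side of Eq. (10) can be traced back to Eq. (8). The derivation below covers only this leading term; it does not account for the localization precision or the LCU normalization, which the full proof has to handle.

```latex
% For x in Q_eta, each coordinate satisfies 0 <= x_i - eta_i / K <= 1/K, hence
\lVert\bm{x}-\bm{x_{\eta}}\rVert_{2}\le\sqrt{d}\,K^{-1},
% and substituting into Eq. (8) with x_0 = x_eta gives
\Bigl\lvert f(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\xi_{\bm{\eta},\bm{\alpha}}(\bm{x}-\bm{x_{\eta}})^{\bm{\alpha}}\Bigr\rvert
\le d^{s}\bigl(\sqrt{d}\,K^{-1}\bigr)^{\beta}=d^{s+\beta/2}K^{-\beta}.
```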

The proof can be found in Appendix D. Note that the PQC in Theorem 4 consists of two nested parts, and for simplicity its depth is counted as the sum of the depths of the two PQCs. We have thus established uniform convergence of PQCs approximating Hölder smooth functions on $[0,1]^{d}$ except on the trifling region $\Lambda(d,K,\Delta)\coloneqq[0,1]^{d}\setminus\bigcup_{\bm{\eta}}Q_{\bm{\eta}}$, whose Lebesgue measure is at most $dK\Delta$. Setting $\Delta=K^{-d}$ does not change the order of the size of the constructed PQC, and a similar approximation error bound on the entire region $[0,1]^{d}$ can then be obtained under the $L^{2}$ distance.
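Writing $\Lambda(d,K,\Delta)$ for the part of $[0,1]^{d}$ not covered by the cells $Q_{\bm{\eta}}$, the measure bound $dK\Delta$ can be checked directly from Eq. (9):

```latex
% Along each coordinate, [0,1] minus the union of cell intervals consists of K-1 gaps of length Delta, so
\bigl\lvert\Lambda(d,K,\Delta)\bigr\rvert
=1-\bigl(1-(K-1)\Delta\bigr)^{d}
\le d(K-1)\Delta\le dK\Delta,
% where the first inequality is Bernoulli's inequality (1-t)^d >= 1 - d t for t in [0,1].
% With Delta = K^{-d}: |Lambda| <= d K^{1-d}.
```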

4 Numerical experiments

This section presents numerical experiments to illustrate the expressivity of our proposed PQCs in approximating multivariate functions. We focus on approximating a bivariate polynomial function

f(x,y)=\frac{(x^{2}+y-1.5\pi)^{2}+(x+y^{2}+\pi)^{2}+(x+y-0.5\pi)^{2}}{5\pi^{2}},

over the domain $(x,y)\in[0,1]^{2}$. The approximation process involves two separate steps: (1) learning the piecewise-constant function $D(x)=\frac{k}{K}$ if $x\in[\frac{k}{K},\frac{k+1}{K})$ with a single-qubit PQC, where $K\in\mathbb{N}^{+}$ determines the number of intervals; and (2) learning the Taylor expansion of $f(x,y)$ with multi-qubit PQCs based on Theorem 4. Both learning processes were run on an Intel(R) Xeon(R) Gold 6248 CPU at 2.50 GHz.

We randomly sampled $200$ data points in $[0,1]$ to create the training and test datasets for $D(x)$. A single-qubit PQC with $L=764$ trainable parameters for $K=2$ (and $L=996$ for $K=10$) was used to learn $D(x)$. Each parameter was initialized randomly within $[0,\pi]$. We used the Adam optimizer [55] with a learning rate of $0.01$ to minimize the mean squared error (MSE) loss. Training was limited to at most $300$ iterations with a batch size of $100$ and was terminated early once the MSE fell below $10^{-4}$. The test MSE was $3.57\times10^{-4}$ for $K=2$ and $1.04\times10^{-4}$ for $K=10$. The numerical results are visualized in Fig. 3.
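The training protocol above (Adam, learning rate $0.01$, mini-batches of $100$, early stopping once the MSE drops below $10^{-4}$) can be sketched in plain NumPy. The text does not state the readout or gradient method of the single-qubit PQC. We therefore assume the circuit of Eq. (A.4) read out as $\bra{0}U^{\dagger}ZU\ket{0}$, mirroring the $Z$ measurement in Theorem 4, with exact gradients from the parameter-shift rule for $R_{Z}$ gates. The toy run uses $9$ parameters instead of $764$ to keep it fast. All function names are hypothetical.

```python
import numpy as np


def rz(theta):
    """R_Z(theta) = diag(exp(-i theta/2), exp(i theta/2)), Eq. (A.2)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def encode(xs):
    """Batch of S(x) = [[x, i sqrt(1-x^2)], [i sqrt(1-x^2), x]] (Eq. (A.3)), shape (n, 2, 2)."""
    s = np.sqrt(1.0 - xs**2)
    m = np.empty((len(xs), 2, 2), dtype=complex)
    m[:, 0, 0] = m[:, 1, 1] = xs
    m[:, 0, 1] = m[:, 1, 0] = 1j * s
    return m


def model(xs, thetas):
    """Assumed readout <0|U^dag Z U|0> of the single-qubit PQC U from Eq. (A.4)."""
    S = encode(xs)
    U = np.broadcast_to(rz(thetas[0]), S.shape)
    for t in thetas[1:]:
        U = U @ S @ rz(t)
    amp = U[:, :, 0]                     # U|0> for every sample
    return np.abs(amp[:, 0]) ** 2 - np.abs(amp[:, 1]) ** 2


def mse_and_grad(xs, ys, thetas):
    """MSE loss and its gradient via the parameter-shift rule for exp(-i theta Z / 2) gates."""
    resid = model(xs, thetas) - ys
    grad = np.empty_like(thetas)
    for j in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[j] = np.pi / 2
        dmodel = 0.5 * (model(xs, thetas + shift) - model(xs, thetas - shift))
        grad[j] = np.mean(2.0 * resid * dmodel)
    return np.mean(resid**2), grad


def train_adam(xs, ys, n_params, lr=0.01, max_iters=300, batch=100, tol=1e-4, seed=0):
    """Adam on mini-batches with early stopping once the batch MSE drops below tol."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, np.pi, n_params)   # random initialization in [0, pi]
    m, v = np.zeros(n_params), np.zeros(n_params)
    beta1, beta2, eps = 0.9, 0.999, 1e-8         # standard Adam hyperparameters
    losses = []
    for t in range(1, max_iters + 1):
        idx = rng.choice(len(xs), size=min(batch, len(xs)), replace=False)
        loss, g = mse_and_grad(xs[idx], ys[idx], thetas)
        losses.append(loss)
        if loss < tol:
            break
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
        thetas = thetas - lr * m_hat / (np.sqrt(v_hat) + eps)
    return thetas, losses


# Toy run: learn D(x) with K = 2 from 200 random samples using a 9-parameter circuit.
K = 2
rng = np.random.default_rng(1)
xs = rng.random(200)
ys = np.minimum(np.floor(K * xs), K - 1) / K
thetas, losses = train_adam(xs, ys, n_params=9, max_iters=200)
```

With this readout the outermost rotations $R_{Z}(\theta_{0})$ and $R_{Z}(\theta_{L})$ receive zero gradient: the former commutes with $Z$, and the latter only adds a phase to $\ket{0}$.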

Similarly, we randomly sampled $200$ data points in $[0,1]^{2}$ to create the training and test datasets for $f(x,y)$. We designed a nested PQC structure that combines $12$ two-qubit PQCs of depth $2$, so that the degree-4 polynomial can be approximated through a combination of lower-degree ones. The Taylor coefficients were stored in a separate matrix of size $K^{2}\times 12$. The number of trainable parameters ranged from $120$ ($K=2$) to $1272$ ($K=10$), each initialized randomly within $[0,\pi]$. The Adam optimizer with a learning rate of $0.01$ was used to minimize the MSE loss. Training was limited to $500$ iterations with a batch size of $100$, with early termination once the MSE fell below $10^{-4}$. The test MSE was $2.22\times10^{-4}$ for $K=2$ and $9.82\times10^{-5}$ for $K=10$. Fig. 4 visualizes the results. As $K$ increases, the approximation performance of the PQC improves, consistent with the theoretical results.
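The reported parameter counts are consistent with a simple accounting: $12$ two-qubit PQCs with a fixed number of parameters each, plus the $K^{2}\times 12$ coefficient matrix. The per-block count of $6$ used below is inferred from the two reported totals ($120-48=1272-1200=72=12\times 6$) and is not stated in the text.

```python
def nested_pqc_param_count(K, n_blocks=12, params_per_block=6):
    """Trainable parameters of the nested PQC: n_blocks two-qubit PQCs plus a
    K^2 x n_blocks matrix of Taylor coefficients. params_per_block = 6 is inferred
    from the reported totals, not taken from the text."""
    return n_blocks * params_per_block + K**2 * n_blocks


for K in (2, 10):
    print(K, nested_pqc_param_count(K))
```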

5 Discussion

To the best of our knowledge, our results provide the first explicit PQC constructions for approximating Lipschitz continuous and Hölder smooth functions with quantitative approximation error bounds. These results make it possible to compare the sizes of PQCs and classical deep neural networks that accomplish the same function approximation task, and to examine whether PQCs offer any quantum advantage in terms of model size and number of trainable parameters. Here, we focus on comparisons with known approximation error bounds of classical machine learning models. In classical deep learning, the deep feed-forward neural network (FNN) with the rectified linear unit (ReLU) activation function is one of the most commonly used models. Quantitative approximation error bounds of ReLU FNNs for continuous functions have recently been established, including nearly optimal bounds for smooth functions [21]. We briefly compare the approximation errors of PQCs and ReLU FNNs in terms of width, depth and number of trainable parameters; detailed comparisons can be found in Appendix E.

We consider multivariate smooth functions in $C^{s}_{u}([0,1]^{d})$ (the unit ball of $C^{s}([0,1]^{d})$) with smoothness index $s\in{{\mathbb{N}}}$ as the target functions in our comparison. Note that, by definition, smooth functions with smoothness index $s$ are exactly $(s+1)$-Hölder smooth functions. For simplicity, we first consider the case $s=2$. To achieve the same approximation error $\varepsilon$ (e.g., a fixed constant), we need to set $K_{Q}=\Theta(d^{2}/\sqrt{\varepsilon})$ for the PQCs constructed in Theorem 4 and $K_{C}=\Theta(2^{d/2}/\sqrt{\varepsilon})$ for the near-optimal ReLU FNNs constructed in Ref. [21]. Substituting these choices of $K$ into the sizes of PQCs and ReLU FNNs, we have

\frac{\text{Width of PQC}\times\text{Depth of PQC}}{\text{Width of FNN}\times\text{Depth of FNN}}=O\Bigl(\frac{d^{3}K_{Q}^{d}}{2^{d+3}K_{C}^{d/2}}\Bigr)=O\Bigl(\frac{\varepsilon^{-d/4}}{2^{d^{2}-d\log d}}\Bigr).

(11)

One can obtain a similar relation for the number of parameters required by PQCs and ReLU FNNs to approximate smooth functions, and extend these results to any $2\leq s<d$. This regime is relevant to many real-world applications: for example, the input dimension $d$ is $784$ for the MNIST dataset and $150\,528$ for the ImageNet dataset [56], while empirically $s\leq 10$. Therefore, to achieve the same approximation error, the circuit size and number of parameters required by PQCs are exponentially smaller than the network size and number of parameters required by the ReLU FNNs proposed in Ref. [21].
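To get a feel for Eq. (11), the sketch below evaluates its middle expression $d^{3}K_{Q}^{d}/(2^{d+3}K_{C}^{d/2})$ in base-2 logarithms to avoid overflow. It uses $K_{Q}=d^{2}/\sqrt{\varepsilon}$ and $K_{C}=2^{d/2}/\sqrt{\varepsilon}$ with all $\Theta(\cdot)$ constants set to one, which is an assumption. Negative values mean the PQC is smaller.

```python
import math


def log2_size_ratio(d, eps):
    """log2 of d^3 K_Q^d / (2^(d+3) K_C^(d/2)), the middle expression of Eq. (11),
    with K_Q = d^2 / sqrt(eps) and K_C = 2^(d/2) / sqrt(eps); Theta-constants set to 1."""
    log2_KQ = 2 * math.log2(d) - 0.5 * math.log2(eps)
    log2_KC = d / 2 - 0.5 * math.log2(eps)
    return 3 * math.log2(d) + d * log2_KQ - (d + 3) - (d / 2) * log2_KC


for d in (2, 784):
    print(d, log2_size_ratio(d, eps=0.01))
```

With these constant-free choices and $\varepsilon=0.01$, the log-ratio is positive for $d=2$ but strongly negative for $d=784$, the MNIST input dimension. This is in line with the claim that the advantage appears for high-dimensional inputs.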

Aiming to understand and expand the range of problems that can be addressed with quantum machine learning, we have demonstrated the approximation capabilities of PQC models in supervised learning. We characterized the approximation error of PQCs in terms of model size, which gives a more detailed understanding of their expressive power than universal approximation properties alone. These results help to unlock the potential of these models and to drive advances in quantum machine learning. Notably, by comparing our results with the near-optimal approximation error bounds of classical ReLU neural networks, we show that PQCs improve on these classical models in approximating high-dimensional functions that satisfy specific smoothness criteria, in terms of both model size and number of parameters.

Unlike many other investigations of the universal approximation properties of PQC models [26, 27, 28, 29, 30, 31, 32, 33], our constructions of PQCs for approximating broad classes of continuous functions do not rely on impractical assumptions: all trainable variables are parameters of single-qubit rotation gates, and no classical parameterized pre-processing or post-processing is used. Ultimately, our research provides insights into the theoretical underpinnings of PQCs in quantum machine learning and paves the way for leveraging their capabilities in machine learning for both classical and quantum applications.

In this work, we also introduced a novel nested PQC structure that substantially improves the approximation capability of PQCs. Future work could explore more powerful PQC constructions based on this idea and investigate the capabilities and limitations of PQCs in more practical tasks, including those with real-world data. Developing efficient training strategies for PQCs, such as accelerated methods with faster convergence rates, is another interesting direction.

Acknowledgments and Disclosure of Funding

Part of this work was done when Z.Y. was visiting Wuhan University. Z.Y. thanks Patrick Rebentrost for helpful discussions. The authors thank the anonymous reviewers for their helpful comments. This work is supported by the National Key Research and Development Program of China (No. 2020YFA0714200), the National Natural Science Foundation of China (No. 62302346, No. 12125103, No. 12071362, No. 12371424, No. 12371441), and the Fundamental Research Funds for the Central Universities.

References

Biamonte et al. [2017] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, September 2017. ISSN 1476-4687. doi: 10.1038/nature23474.
Benedetti et al. [2019] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning models. Quantum Science and Technology, 4(4):043001, November 2019. ISSN 2058-9565. doi: 10.1088/2058-9565/ab4eb5.
Preskill [2018] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018. ISSN 2521-327X. doi: 10.22331/q-2018-08-06-79. URL https://doi.org/10.22331/q-2018-08-06-79.
Kandala et al. [2017] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M. Chow, and Jay M. Gambetta. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549(7671):242–246, 2017. ISSN 1476-4687. doi: 10.1038/nature23879.
Cerezo et al. [2022] M. Cerezo, Kunal Sharma, Andrew Arrasmith, and Patrick J. Coles. Variational quantum state eigensolver. npj Quantum Information, 8(1):113, 2022. ISSN 2056-6387. doi: 10.1038/s41534-022-00611-6.
Cao et al. [2019] Yudong Cao, Jonathan Romero, Jonathan P. Olson, Matthias Degroote, Peter D. Johnson, Mária Kieferová, Ian D. Kivlichan, Tim Menke, Borja Peropadre, Nicolas P. D. Sawaya, Sukin Sim, Libor Veis, and Alán Aspuru-Guzik. Quantum chemistry in the age of quantum computing. Chemical Reviews, 119(19):10856–10915, 2019. ISSN 0009-2665. doi: 10.1021/acs.chemrev.8b00803.
Pan et al. [2023] Xiaoxuan Pan, Zhide Lu, Weiting Wang, Ziyue Hua, Yifang Xu, Weikang Li, Weizhou Cai, Xuegang Li, Haiyan Wang, Yi-Pu Song, Chang-Ling Zou, Dong-Ling Deng, and Luyan Sun. Deep quantum neural networks on a superconducting processor. Nature Communications, 14(1):4006, 2023. ISSN 2041-1723. doi: 10.1038/s41467-023-39785-8.
Ren et al. [2022] Wenhui Ren, Weikang Li, Shibo Xu, Ke Wang, Wenjie Jiang, Feitong Jin, Xuhao Zhu, Jiachen Chen, Zixuan Song, Pengfei Zhang, Hang Dong, Xu Zhang, Jinfeng Deng, Yu Gao, Chuanyu Zhang, Yaozu Wu, Bing Zhang, Qiujiang Guo, Hekang Li, Zhen Wang, Jacob Biamonte, Chao Song, Dong-Ling Deng, and H. Wang. Experimental quantum adversarial learning with programmable superconducting qubits. Nature Computational Science, 2(11):711–717, 2022. ISSN 2662-8457. doi: 10.1038/s43588-022-00351-9.
Huang et al. [2021a] He-Liang Huang, Yuxuan Du, Ming Gong, Youwei Zhao, Yulin Wu, Chaoyue Wang, Shaowei Li, Futian Liang, Jin Lin, Yu Xu, Rui Yang, Tongliang Liu, Min-Hsiu Hsieh, Hui Deng, Hao Rong, Cheng-Zhi Peng, Chao-Yang Lu, Yu-Ao Chen, Dacheng Tao, Xiaobo Zhu, and Jian-Wei Pan. Experimental quantum generative adversarial networks for image generation. Phys. Rev. Appl., 16:024051, Aug 2021a. doi: 10.1103/PhysRevApplied.16.024051. URL https://link.aps.org/doi/10.1103/PhysRevApplied.16.024051.
Mitarai et al. [2018] Kosuke Mitarai, Makoto Negoro, Masahiro Kitagawa, and Keisuke Fujii. Quantum Circuit Learning. Physical Review A, 98(3):032309, September 2018. ISSN 2469-9926, 2469-9934. doi: 10.1103/PhysRevA.98.032309.
Pérez-Salinas et al. [2020] Adrián Pérez-Salinas, Alba Cervera-Lierta, Elies Gil-Fuster, and José I. Latorre. Data re-uploading for a universal quantum classifier. Quantum, 4:226, February 2020. doi: 10.22331/q-2020-02-06-226.
LeCun et al. [2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015. ISSN 1476-4687. doi: 10.1038/nature14539.
Cybenko [1989] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989. URL https://link.springer.com/article/10.1007/BF02551274.
Hornik et al. [1989] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, January 1989. ISSN 08936080. doi: 10.1016/0893-6080(89)90020-8. URL https://linkinghub.elsevier.com/retrieve/pii/0893608089900208.
Barron [1993] A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, May 1993. ISSN 1557-9654. doi: 10.1109/18.256500.
Yarotsky [2017] Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017. ISSN 0893-6080. doi: 10.1016/j.neunet.2017.07.002.
Yarotsky [2018] Dmitry Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. In Proceedings of the 31st Conference On Learning Theory, pages 639–649. PMLR, July 2018. URL https://proceedings.mlr.press/v75/yarotsky18a.html.
Petersen and Voigtlaender [2018] Philipp Petersen and Felix Voigtlaender. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks, 108:296–330, December 2018. ISSN 0893-6080. doi: 10.1016/j.neunet.2018.08.019.
Yarotsky and Zhevnerchuk [2020] Dmitry Yarotsky and Anton Zhevnerchuk. The phase diagram of approximation rates for deep neural networks. In Advances in Neural Information Processing Systems, volume 33, pages 13005–13015. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/979a3f14bae523dc5101c52120c535e9-Abstract.html.
Shen [2020] Zuowei Shen. Deep Network Approximation Characterized by Number of Neurons. Communications in Computational Physics, 28(5):1768–1811, June 2020. ISSN 1815-2406, 1991-7120. doi: 10.4208/cicp.OA-2020-0149.
Lu et al. [2021] Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang. Deep Network Approximation for Smooth Functions. SIAM Journal on Mathematical Analysis, 53(5):5465–5506, January 2021. ISSN 0036-1410. doi: 10.1137/20M134695X. URL https://epubs.siam.org/doi/10.1137/20M134695X.
Shen et al. [2022] Zuowei Shen, Haizhao Yang, and Shijun Zhang. Optimal approximation rate of ReLU networks in terms of width and depth. Journal de Mathématiques Pures et Appliquées, 157:101–135, January 2022. ISSN 0021-7824. doi: 10.1016/j.matpur.2021.07.009. URL https://www.sciencedirect.com/science/article/pii/S0021782421001124.
Weinan et al. [2022] Weinan E, Chao Ma, and Lei Wu. The Barron space and the flow-induced function spaces for neural network models. Constructive Approximation, 55(1):369–406, 2022.
Jiao et al. [2023a] Yuling Jiao, Yanming Lai, Xiliang Lu, Fengru Wang, Jerry Zhijian Yang, and Yuanyuan Yang. Deep neural networks with ReLU-sine-exponential activations break curse of dimensionality in approximation on Hölder class. SIAM Journal on Mathematical Analysis, 55(4):3635–3649, 2023a. doi: 10.1137/21M144431X. URL https://doi.org/10.1137/21M144431X.
Jiao et al. [2023b] Yuling Jiao, Guohao Shen, Yuanyuan Lin, and Jian Huang. Deep nonparametric regression on approximate manifolds: Nonasymptotic error bounds with polynomial prefactors. The Annals of Statistics, 51(2):691–716, April 2023b. ISSN 0090-5364, 2168-8966. doi: 10.1214/23-AOS2266. URL https://projecteuclid.org/journals/annals-of-statistics/volume-51/issue-2/Deep-nonparametric-regression-on-approximate-manifolds--Nonasymptotic-error-bounds/10.1214/23-AOS2266.full.
Havlíček et al. [2019] Vojtěch Havlíček, Antonio D. Córcoles, Kristan Temme, Aram W. Harrow, Abhinav Kandala, Jerry M. Chow, and Jay M. Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, March 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-0980-2. URL https://www.nature.com/articles/s41586-019-0980-2.
Du et al. [2020] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, and Dacheng Tao. Expressive power of parametrized quantum circuits. Physical Review Research, 2(3):033125, July 2020. doi: 10.1103/PhysRevResearch.2.033125.
Liu et al. [2021] Yunchao Liu, Srinivasan Arunachalam, and Kristan Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021. ISSN 1745-2481. doi: 10.1038/s41567-021-01287-z.
Huang et al. [2021b] Hsin-Yuan Huang, Richard Kueng, and John Preskill. Information-theoretic bounds on quantum advantage in machine learning. Phys. Rev. Lett., 126:190505, May 2021b. doi: 10.1103/PhysRevLett.126.190505. URL https://link.aps.org/doi/10.1103/PhysRevLett.126.190505.
Jerbi et al. [2021] Sofiene Jerbi, Casper Gyurik, Simon Marshall, Hans Briegel, and Vedran Dunjko. Parametrized Quantum Policies for Reinforcement Learning. In Advances in Neural Information Processing Systems, volume 34, pages 28362–28375. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/hash/eec96a7f788e88184c0e713456026f3f-Abstract.html.
Huang et al. [2021c] Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, and Jarrod R. McClean. Power of data in quantum machine learning. Nature Communications, 12(1):2631, May 2021c. ISSN 2041-1723. doi: 10.1038/s41467-021-22539-9. URL https://www.nature.com/articles/s41467-021-22539-9.
Jerbi et al. [2023] Sofiene Jerbi, Lukas J. Fiderer, Hendrik Poulsen Nautrup, Jonas M. Kübler, Hans J. Briegel, and Vedran Dunjko. Quantum machine learning beyond kernel methods. Nature Communications, 14(1):517, January 2023. ISSN 2041-1723. doi: 10.1038/s41467-023-36159-y. URL https://www.nature.com/articles/s41467-023-36159-y.
Jäger and Krems [2023] Jonas Jäger and Roman V. Krems. Universal expressiveness of variational quantum classifiers and quantum kernels for support vector machines. Nature Communications, 14(1):576, February 2023. ISSN 2041-1723. doi: 10.1038/s41467-023-36144-5. URL https://www.nature.com/articles/s41467-023-36144-5.
Schuld et al. [2021] Maria Schuld, Ryan Sweke, and Johannes Jakob Meyer. Effect of data encoding on the expressive power of variational quantum-machine-learning models. Physical Review A, 103(3):032430, March 2021. doi: 10.1103/PhysRevA.103.032430.
Gil Vidal and Theis [2020] Francisco Javier Gil Vidal and Dirk Oliver Theis. Input Redundancy for Parameterized Quantum Circuits. Frontiers in Physics, 8, 2020. ISSN 2296-424X. URL https://www.frontiersin.org/articles/10.3389/fphy.2020.00297.
Pérez-Salinas et al. [2021] Adrián Pérez-Salinas, David López-Núñez, Artur García-Sáez, P. Forn-Díaz, and José I. Latorre. One qubit as a universal approximant. Physical Review A, 104(1):012405, July 2021. doi: 10.1103/PhysRevA.104.012405.
Yu et al. [2022] Zhan Yu, Hongshun Yao, Mujin Li, and Xin Wang. Power and limitations of single-qubit native quantum neural networks. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 27810–27823. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/b250de41980b58d34d6aadc3f4aedd4c-Paper-Conference.pdf.
Manzano et al. [2023] Alberto Manzano, David Dechant, Jordi Tura, and Vedran Dunjko. Parametrized Quantum Circuits and their approximation capacities in the context of quantum machine learning, July 2023.
Goto et al. [2021] Takahiro Goto, Quoc Hoan Tran, and Kohei Nakajima. Universal Approximation Property of Quantum Machine Learning Models in Quantum-Enhanced Feature Spaces. Physical Review Letters, 127(9):090506, August 2021. doi: 10.1103/PhysRevLett.127.090506.
Gonon and Jacquier [2023] Lukas Gonon and Antoine Jacquier. Universal Approximation Theorem and error bounds for quantum neural networks and quantum reservoirs, July 2023.
Qi et al. [2023] Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, and Min-Hsiu Hsieh. Theoretical error performance analysis for variational quantum circuit based functional regression. npj Quantum Information, 9(1):4, 2023. ISSN 2056-6387. doi: 10.1038/s41534-022-00672-7.
Zhao et al. [2023] Haimeng Zhao, Laura Lewis, Ishaan Kannan, Yihui Quek, Hsin-Yuan Huang, and Matthias C. Caro. Learning quantum states and unitaries of bounded gate complexity, 2023.
Grohs and Kutyniok [2022] Philipp Grohs and Gitta Kutyniok. Mathematical aspects of deep learning. Cambridge University Press, 2022.
Nielsen and Chuang [2010] Michael A Nielsen and Isaac L Chuang. Quantum computation and quantum information. Cambridge University Press, 2010.
Low et al. [2016] Guang Hao Low, Theodore J. Yoder, and Isaac L. Chuang. Methodology of Resonant Equiangular Composite Quantum Gates. Physical Review X, 6(4):041067, December 2016. doi: 10.1103/PhysRevX.6.041067.
Low and Chuang [2017] Guang Hao Low and Isaac L. Chuang. Optimal Hamiltonian Simulation by Quantum Signal Processing. Physical Review Letters, 118(1):010501, January 2017. doi: 10.1103/PhysRevLett.118.010501.
Gilyén et al. [2019] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: Exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 193–204, June 2019. doi: 10.1145/3313276.3316366.
Childs and Wiebe [2012] Andrew M. Childs and Nathan Wiebe. Hamiltonian simulation using linear combinations of unitary operations. Quantum Inf. Comput., 12(11-12):901–924, 2012. doi: 10.26421/QIC12.11-12-1.
da Silva and Park [2022] Adenilton J. da Silva and Daniel K. Park. Linear-depth quantum circuits for multiqubit controlled gates. Physical Review A, 106(4):042602, October 2022. doi: 10.1103/PhysRevA.106.042602.
Brassard et al. [2002] Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp. Quantum Amplitude Amplification and Estimation. Contemporary Mathematics, 305:53–74, 2002. doi: 10.1090/conm/305/05215.
Davidson and Donsig [2002] Kenneth R. Davidson and Allan P. Donsig. Real analysis with real applications. Prentice Hall, 2002. URL https://cir.nii.ac.jp/crid/1130000794786166144.
Heitzinger [2002] Clemens Heitzinger. Simulation and Inverse Modeling of Semiconductor Manufacturing Processes. Thesis, Technische Universität Wien, 2002.
Foupouagnigni and Mouafo Wouodjié [2020] Mama Foupouagnigni and Merlin Mouafo Wouodjié. On Multivariate Bernstein Polynomials. Mathematics, 8(9):1397, September 2020. ISSN 2227-7390. doi: 10.3390/math8091397.
Low [2017] Guang Hao Low. Quantum Signal Processing by Single-Qubit Dynamics. Thesis, Massachusetts Institute of Technology, 2017. URL https://dspace.mit.edu/handle/1721.1/115025.
Kingma and Ba [2015] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1412.6980.
Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, June 2009. doi: 10.1109/CVPR.2009.5206848. URL https://ieeexplore.ieee.org/abstract/document/5206848.
Wang et al. [2023] Youle Wang, Lei Zhang, Zhan Yu, and Xin Wang. Quantum Phase Processing and its Applications in Estimating Phase and Entropies, July 2023.
Vapnik and Chervonenkis [1982] V. N. Vapnik and A. Ya. Chervonenkis. Necessary and Sufficient Conditions for the Uniform Convergence of Means to their Expectations. Theory of Probability & Its Applications, 26(3):532–553, January 1982. ISSN 0040-585X. doi: 10.1137/1126059.
Tikhomirov [1993] V. M. Tikhomirov. $\epsilon$-Entropy and $\epsilon$-Capacity of Sets In Functional Spaces. In A. N. Shiryayev, editor, Selected Works of A. N. Kolmogorov: Volume III: Information Theory and the Theory of Algorithms, Mathematics and Its Applications, pages 86–170. Springer Netherlands, Dordrecht, 1993. ISBN 978-94-017-2973-4. doi: 10.1007/978-94-017-2973-4_7.
Bartlett and Mendelson [2003] Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3(3):463, April 2003. ISSN 15324435.
Du et al. [2022] Yuxuan Du, Zhuozhuo Tu, Xiao Yuan, and Dacheng Tao. Efficient Measure for the Expressivity of Variational Quantum Algorithms. Physical Review Letters, 128(8):080506, February 2022. doi: 10.1103/PhysRevLett.128.080506.
Bu et al. [2022] Kaifeng Bu, Dax Enshan Koh, Lu Li, Qingxian Luo, and Yaobo Zhang. Statistical complexity of quantum circuits. Physical Review A, 105(6):062431, June 2022. doi: 10.1103/PhysRevA.105.062431.
Caro and Datta [2020] Matthias C. Caro and Ishaun Datta. Pseudo-dimension of quantum circuits. Quantum Machine Intelligence, 2(2):14, November 2020. ISSN 2524-4914. doi: 10.1007/s42484-020-00027-5.
Chen et al. [2022] Chih-Chieh Chen, Masaru Sogabe, Kodai Shiba, Katsuyoshi Sakamoto, and Tomah Sogabe. General Vapnik–Chervonenkis dimension bounds for quantum circuit learning. Journal of Physics: Complexity, 3(4):045007, November 2022. ISSN 2632-072X. doi: 10.1088/2632-072X/ac9f9b.
Abbas et al. [2021] Amira Abbas, David Sutter, Christa Zoufal, Aurelien Lucchi, Alessio Figalli, and Stefan Woerner. The power of quantum neural networks. Nature Computational Science, 1(6):403–409, June 2021. ISSN 2662-8457. doi: 10.1038/s43588-021-00084-1.
DeVore et al. [1989] Ronald A. DeVore, Ralph Howard, and Charles Micchelli. Optimal nonlinear approximation. manuscripta mathematica, 63(4):469–478, December 1989. ISSN 1432-1785. doi: 10.1007/BF01171759.
Hornik [1991] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. URL https://www.sciencedirect.com/science/article/abs/pii/089360809190009T.
E et al. [2022] Weinan E, Chao Ma, and Lei Wu. The Barron Space and the Flow-Induced Function Spaces for Neural Network Models. Constructive Approximation, 55(1):369–406, February 2022. ISSN 1432-0940. doi: 10.1007/s00365-021-09549-y. URL https://doi.org/10.1007/s00365-021-09549-y.
Stone [1948] M. H. Stone. The Generalized Weierstrass Approximation Theorem. Mathematics Magazine, 21(4):167–184, 1948. ISSN 0025-570X. doi: 10.2307/3029750.
He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. doi: 10.1109/CVPR.2016.90.
Ren et al. [2015] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf.
Yang et al. [2019] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html.
Devlin et al. [2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.
Zhang et al. [2023] Shijun Zhang, Jianfeng Lu, and Hongkai Zhao. Deep Network Approximation: Beyond ReLU to Diverse Activation Functions, September 2023. URL http://arxiv.org/abs/2307.06555.
Gühring et al. [2020] Ingo Gühring, Gitta Kutyniok, and Philipp Petersen. Error bounds for approximations with deep ReLU neural networks in $W^{s,p}$ norms. Analysis and Applications, 18(05):803–859, 2020. doi: 10.1142/S0219530519410021. URL https://doi.org/10.1142/S0219530519410021.
Schmidt-Hieber [2020] Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4):1875 – 1897, 2020. doi: 10.1214/19-AOS1875. URL https://doi.org/10.1214/19-AOS1875.

Supplementary Material

Appendix A Preliminaries

In this section, we first present the mathematical foundations needed to derive the main results of this work. To place our work in the context of the existing literature, we then review related studies in Section A.3.

A.1 Notation

We use unified notation throughout this work. The univariate polynomial ring over a field $\mathbb{F}$ is denoted by $\mathbb{F}[x]$, with the variable $x$ representing the input. The ring of Laurent polynomials $\mathbb{F}[x,x^{-1}]$ is the extension of the polynomial ring obtained by adjoining the inverse of $x$. The set of natural numbers is denoted by $\mathbb{N}\coloneqq\{1,2,3,\dots\}$, and the set of non-negative integers by $\mathbb{N}_{0}\coloneqq\{0\}\cup\mathbb{N}$. The $1$-norm of a vector $\bm{\alpha}=(\alpha_{1},\alpha_{2},\dots,\alpha_{d})$ is denoted by $\|\bm{\alpha}\|_{1}\coloneqq|\alpha_{1}|+|\alpha_{2}|+\cdots+|\alpha_{d}|$.

A.2 Data re-uploading PQCs

In this section, we review the concept of data re-uploading PQCs and define the PQCs used in this paper. A data re-uploading PQC is a quantum circuit consisting of interleaved data encoding circuit blocks and trainable circuit blocks [35, 11]. More precisely, let $\bm{x}$ be the input data vector and $\bm{\theta}=(\bm{\theta}_{0},\ldots,\bm{\theta}_{L})$ a set of trainable parameters. Let $S(\bm{x})$ be a quantum circuit that encodes $\bm{x}$, and let $V(\bm{\theta}_{j})$ be a trainable quantum circuit with trainable parameter vector $\bm{\theta}_{j}$. An $L$-layer data re-uploading PQC can then be expressed as

U_{\bm{\theta}}(\bm{x})=V(\bm{\theta}_{0})\prod_{j=1}^{L}S(\bm{x})V(\bm{\theta}_{j}).

(A.1)

Applying $U_{\bm{\theta}}(\bm{x})$ to a quantum state and measuring the output state yields functions of $\bm{x}$. The expressivity of the data re-uploading PQC model can be characterized by the classes of functions it can implement. Data encoding circuits and trainable circuits are commonly built from the Pauli rotation operators

R_{X}(\theta)=\begin{bmatrix}\cos\frac{\theta}{2}&-i\sin\frac{\theta}{2}\\ -i\sin\frac{\theta}{2}&\cos\frac{\theta}{2}\end{bmatrix},\quad R_{Y}(\theta)=\begin{bmatrix}\cos\frac{\theta}{2}&-\sin\frac{\theta}{2}\\ \sin\frac{\theta}{2}&\cos\frac{\theta}{2}\end{bmatrix},\quad R_{Z}(\theta)=\begin{bmatrix}e^{-i\frac{\theta}{2}}&0\\ 0&e^{i\frac{\theta}{2}}\end{bmatrix}.

(A.2)

Different data encoding schemes lead to different types of data re-uploading PQCs.
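Eqs. (A.1) and (A.2) translate directly into code. The sketch below builds the Pauli rotations from the identity $e^{-i\theta P/2}=\cos\frac{\theta}{2}I-i\sin\frac{\theta}{2}P$, which holds because $P^{2}=I$, and composes a generic single-qubit data re-uploading circuit from user-supplied encoding and trainable blocks. The angle encoding $S(x)=R_{X}(x)$, the $R_{Z}$ trainable blocks and the $Z$-measurement readout in the example are hypothetical choices for illustration only; they are not the encoding used in this paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def rotation(axis, theta):
    """R_P(theta) = exp(-i theta P / 2) = cos(theta/2) I - i sin(theta/2) P, using P^2 = I."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def reuploading_unitary(x, thetas, encode, trainable):
    """Eq. (A.1): U = V(theta_0) prod_{j=1}^{L} S(x) V(theta_j), with S = encode, V = trainable."""
    u = trainable(thetas[0])
    for theta in thetas[1:]:
        u = u @ encode(x) @ trainable(theta)
    return u


# Hypothetical example: angle encoding S(x) = R_X(x), trainable blocks V(theta) = R_Z(theta).
def encode(x):
    return rotation("X", x)


def trainable(theta):
    return rotation("Z", theta)


thetas = np.array([0.3, 1.1, -0.4, 2.0])          # L = 3 layers
U = reuploading_unitary(0.7, thetas, encode, trainable)
f_x = np.real(U[:, 0].conj() @ PAULI["Z"] @ U[:, 0])  # <0|U^dag Z U|0>
```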

In some cases, trainable parameters are also included in the data encoding stage, in the classical post-processing of measurement outcomes, or both. Such PQCs are considered to have hybrid structures. For instance, in the models proposed in Refs. [35, 36, 40], each input datum is multiplied by a trainable parameter before being fed into $R_{Z}$ gates during the data encoding stage. Similarly, Refs. [39, 40] assign trainable weights to the measurement outcomes of the constructed PQCs and aggregate these weighted outcomes to produce the final result. Such a structure makes it hard to judge whether the expressive power comes from the classical or the quantum part.

A.2.1 Implementing real polynomials

We first introduce the data re-uploading PQC for implementing real univariate polynomials. We use the so-called Pauli $X$ basis encoding [10], in which the data encoding unitary is a single-qubit rotation defined as

S(x)\coloneqq e^{i\arccos(x)X}=\begin{pmatrix}x&i\sqrt{1-x^{2}}\\ i\sqrt{1-x^{2}}&x\end{pmatrix},

(A.3)

where $x\in[-1,1]$ is the input data. Interleaving the data encoding unitary $S(x)$ with parameterized Pauli $Z$ rotations $R_{Z}(\theta)$ gives the single-variable data re-uploading PQC

U_{\bm{\theta}}(x)\coloneqq R_{Z}(\theta_{0})\prod_{j=1}^{L}S(x)R_{Z}(\theta_{j}),

(A.4)

where $\bm{\theta}=(\theta_{0},\ldots,\theta_{L})\in{{\mathbb{R}}}^{L+1}$ is a set of trainable parameters. The PQC in Eq. (A.4) can be used to implement polynomial transformations of the input $x$, as shown in the following lemma.

Lemma S1 ([47]).

There exists $\bm{\theta}\in{{\mathbb{R}}}^{L+1}$ such that

U_{\bm{\theta}}(x)=\begin{pmatrix}P(x)&iQ(x)\sqrt{1-x^{2}}\\ iQ^{*}(x)\sqrt{1-x^{2}}&P^{*}(x)\end{pmatrix}

(A.5)

if and only if polynomials $P,Q\in{{\mathbb{C}}}[x]$ satisfy

1.

$\deg(P)\leq L$ and $\deg(Q)\leq L-1$ ,
2.

$P$ has parity $L\bmod 2$ and $Q$ has parity $(L-1)\bmod 2$ (a polynomial $P\in{{\mathbb{C}}}[x]$ has parity $0$ if all coefficients of odd powers of $x$ are $0$, and parity $1$ if all coefficients of even powers of $x$ are $0$),
3.

$\forall x\in[-1,1]$ , $\lvert P(x)\rvert^{2}+(1-x^{2})\lvert Q(x)\rvert^{2}=1$ .
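A minimal numpy sketch of Eq. A.4, assuming the $R_{Z}$ convention of Eq. A.2, gives a simple instance of Lemma S1: with all phases set to zero, $U_{\bm{\theta}}(x)=S(x)^{L}=e^{iL\arccos(x)X}$, so $P(x)=\cos(L\arccos x)$ is the Chebyshev polynomial $T_{L}(x)$, which indeed has degree $L$ and parity $L\bmod 2$. The helper names `rz`, `signal`, and `pqc` are illustrative, not from the paper.

```python
import numpy as np

def rz(theta):
    # R_Z(theta) = diag(e^{-i theta/2}, e^{i theta/2}), as in Eq. A.2
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def signal(x):
    # S(x) = exp(i arccos(x) X), written out as in Eq. A.3 (requires -1 <= x <= 1)
    w = np.sqrt(1 - x**2)
    return np.array([[x, 1j * w], [1j * w, x]])

def pqc(x, thetas):
    # U_theta(x) = R_Z(theta_0) prod_{j=1}^{L} S(x) R_Z(theta_j), Eq. A.4, with L = len(thetas) - 1
    U = rz(thetas[0])
    for theta in thetas[1:]:
        U = U @ signal(x) @ rz(theta)
    return U

# Zero phases: U = S(x)^L, so P(x) = <0|U|0> = cos(L arccos x) = T_L(x).
L = 5
xs = np.linspace(-1, 1, 11)
P_vals = np.array([pqc(x, np.zeros(L + 1))[0, 0] for x in xs])
T_L = np.polynomial.chebyshev.chebval(xs, [0] * L + [1])
assert np.allclose(P_vals, T_L)
```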

By Lemma S1, one can implement a polynomial transformation $\operatorname{Poly}(x)=\braket{0}{U_{\bm{\theta}}(x)}{0}=P(x)$. However, the achievable $P(x)$ is limited to polynomials for which a complementary polynomial $Q(x)$ satisfying the conditions of Lemma S1 exists. Since target polynomials in practice are usually real, this limitation can be overcome by using the $\ket{+}$ state instead, i.e., by defining $\operatorname{Poly}(x)=\braket{+}{U_{\bm{\theta}}(x)}{+}=\Re(P(x))+i\Re(Q(x))\sqrt{1-x^{2}}$. In this way, one can achieve any real polynomial $\operatorname{Poly}(x)$ of degree at most $L$ with parity $L\bmod 2$ and $\lvert\operatorname{Poly}(x)\rvert\leq 1$ for all $x\in[-1,1]$.
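The identity $\braket{+}{U_{\bm{\theta}}(x)}{+}=\Re(P(x))+i\Re(Q(x))\sqrt{1-x^{2}}$ can be checked numerically. The sketch below is our own illustration with random phases rather than QSP-optimized ones, and the helper names are hypothetical; it reads $P$ and $Q$ off the entries of $U_{\bm{\theta}}(x)$ as in Eq. A.5 and also confirms condition 3 of Lemma S1.

```python
import numpy as np

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def signal(x):
    w = np.sqrt(1 - x**2)
    return np.array([[x, 1j * w], [1j * w, x]])

def pqc(x, thetas):
    U = rz(thetas[0])
    for theta in thetas[1:]:
        U = U @ signal(x) @ rz(theta)
    return U

def plus_identity_holds(x, thetas):
    """Check <+|U|+> = Re P + i Re Q sqrt(1 - x^2) and condition 3 of Lemma S1 (needs |x| < 1)."""
    U = pqc(x, thetas)
    w = np.sqrt(1 - x**2)
    P = U[0, 0]                 # top-left entry of Eq. A.5
    Q = U[0, 1] / (1j * w)      # top-right entry of Eq. A.5 is i Q(x) sqrt(1 - x^2)
    plus = np.array([1, 1]) / np.sqrt(2)
    lhs = plus @ U @ plus       # |+> has real amplitudes, so no conjugation is needed
    return bool(np.isclose(lhs, P.real + 1j * Q.real * w)
                and np.isclose(abs(P)**2 + w**2 * abs(Q)**2, 1.0))

rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, size=6)   # random phases, L = 5
assert all(plus_identity_holds(x, thetas) for x in np.linspace(-0.95, 0.95, 9))
```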

Corollary S2 ([47]).

There exists $\bm{\theta}\in{{\mathbb{R}}}^{L+1}$ such that

p(x)=\braket{+}{U_{\bm{\theta}}(x)}{+}

(A.6)

if and only if the real polynomial $p(x)\in{{\mathbb{R}}}[x]$ satisfies

1.

$\deg(p(x))\leq L$ ,
2.

$p(x)$ has parity $L\bmod 2$ (in the same sense as in Lemma S1),
3.

$\forall x\in[-1,1]$ , $\lvert p(x)\rvert\leq 1$ .

Remark S1.

The results for PQCs with Pauli $X$ basis encoding presented here were established within the framework of quantum signal processing (QSP) [45, 46, 47], which interleaves signal operators with signal-processing operators to transform an input signal. A QSP circuit can therefore be identified with a PQC in the context of quantum machine learning.

A.2.2 Implementing trigonometric polynomials

Beyond real polynomials, single-qubit PQCs with Pauli $Z$ basis encoding can implement complex trigonometric polynomials [37]. Here the data encoding unitary is a single-qubit rotation in the Pauli $Z$ basis

S(x)\coloneqq R_{Z}(x)=\begin{pmatrix}e^{-ix/2}&0\\ 0&e^{ix/2}\end{pmatrix},

(A.7)

where $x\in{{\mathbb{R}}}$ is the input data. Interleaving the data encoding unitary $S(x)$ with trainable gates $R_{Y}(\theta)R_{Z}(\phi)$ defines the PQC

U_{\bm{\theta},\bm{\phi}}(x)\coloneqq R_{Z}(\omega)R_{Y}(\theta_{0})R_{Z}(\phi_{0})\prod_{j=1}^{L}S(x)R_{Y}(\theta_{j})R_{Z}(\phi_{j}),

(A.8)

where $\bm{\theta}=(\theta_{0},\ldots,\theta_{L})\in{{\mathbb{R}}}^{L+1}$, $\bm{\phi}=(\phi_{0},\ldots,\phi_{L})\in{{\mathbb{R}}}^{L+1}$, and $\omega\in{{\mathbb{R}}}$. The following lemma characterizes the correspondence between PQCs with Pauli $Z$ basis encoding and complex trigonometric polynomials.

Lemma S3 ([37]).

There exist $\bm{\theta},\bm{\phi}\in{{\mathbb{R}}}^{L+1}$ and $\omega\in{{\mathbb{R}}}$ such that

U_{\bm{\theta},\bm{\phi}}(x)=\begin{pmatrix}P(x)&-Q(x)\\ Q^{*}(x)&P^{*}(x)\end{pmatrix}

(A.9)

if and only if Laurent polynomials $P,Q\in{{\mathbb{C}}}[e^{ix/2},e^{-ix/2}]$ satisfy

1.

$\deg(P)\leq L$ and $\deg(Q)\leq L$ ,
2.

$P$ and $Q$ have parity $L\bmod 2$ ,
3.

$\forall x\in{{\mathbb{R}}}$ , $\lvert P(x)\rvert^{2}+\lvert Q(x)\rvert^{2}=1$ .

Note that the Laurent polynomials in ${{\mathbb{C}}}[e^{ix/2},e^{-ix/2}]$ with parity $0$ are exactly the Laurent polynomials in ${{\mathbb{C}}}[e^{ix},e^{-ix}]$, with no further parity constraint. Hence, with an even number $2L$ of encoding layers, trigonometric QSP can implement complex trigonometric polynomials of degree $L$ in $e^{ix}$.
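Lemma S3 can be probed numerically: for any phases, $\braket{0}{U_{\bm{\theta},\bm{\phi}}(x)}{0}$ should contain only the frequencies $e^{imx/2}$ with $\lvert m\rvert\leq L$ and $m\equiv L\pmod 2$. The sketch below is our own check with random phases and illustrative helper names; it samples the $(0,0)$ entry over one period $[0,4\pi)$ and inspects its discrete Fourier coefficients with `numpy.fft.fft`. It uses $S(x)=R_{Z}(x)$ with the convention of Eq. A.2; the opposite sign convention would only relabel $m\to-m$, which leaves the check unchanged.

```python
import numpy as np

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def trig_pqc(x, omega, thetas, phis):
    # Eq. A.8: R_Z(omega) R_Y(theta_0) R_Z(phi_0) prod_j S(x) R_Y(theta_j) R_Z(phi_j), S(x) = R_Z(x)
    U = rz(omega) @ ry(thetas[0]) @ rz(phis[0])
    for th, ph in zip(thetas[1:], phis[1:]):
        U = U @ rz(x) @ ry(th) @ rz(ph)
    return U

def allowed_spectrum_only(L, seed, N=64, tol=1e-10):
    """True if <0|U(x)|0> only contains e^{imx/2} with |m| <= L and m = L mod 2."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(0, 2 * np.pi)
    thetas, phis = rng.uniform(0, 2 * np.pi, (2, L + 1))
    xs = 4 * np.pi * np.arange(N) / N             # one full period of e^{ix/2}
    vals = np.array([trig_pqc(x, omega, thetas, phis)[0, 0] for x in xs])
    coeffs = np.fft.fft(vals) / N                  # coeffs[m % N] is the coefficient of e^{imx/2}
    allowed = {m % N for m in range(-L, L + 1) if (m - L) % 2 == 0}
    return all(abs(coeffs[k]) < tol for k in range(N) if k not in allowed)

assert allowed_spectrum_only(L=3, seed=0)
assert allowed_spectrum_only(L=4, seed=1)  # even L: only integer frequencies e^{ijx}, |j| <= 2
```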

Corollary S4 ([37, 57]).

There exist $\bm{\theta},\bm{\phi}\in{{\mathbb{R}}}^{2L+1}$ and $\omega\in{{\mathbb{R}}}$ such that

t(x)=\bra{0}U_{\bm{\theta},\bm{\phi}}(x)\ket{0}

(A.10)

if and only if the complex-valued trigonometric polynomial $t(x)=\sum_{j=-L}^{L}c_{j}e^{ijx}$ satisfies $\lvert t(x)\rvert\leq 1$ for all $x\in{{\mathbb{R}}}$ .

A.3 Related work in PQC approximation

In this subsection, we review prior work on the approximation capabilities of PQCs, i.e., on how the architecture of a PQC determines the functions it can fit and, consequently, its performance. Compared with this literature, our results provide explicit error bounds for continuous function approximation and make no additional assumptions about the constructed PQCs. More importantly, all trainable variables in our construction are parameters of rotation gates and are kept separate from the data encoding gates, so that no classical computation enters the model and its quantum nature is preserved.

In theoretical machine learning, statistical complexity measures the richness of a hypothesis space. Common measures include the Vapnik-Chervonenkis (VC) dimension [58], the metric entropy [59], the Gaussian complexity [60], and the Rademacher complexity [60]. For PQCs, Du et al. [61] bounded the covering entropy in terms of the number of quantum gates and the measurement observable, and Bu et al. [62] studied how the Rademacher complexity of PQCs depends on resources such as width and depth and on the properties of the input and output registers. Other statistical complexity measures studied for PQCs include the pseudo-dimension (Caro and Datta [63]) and the VC dimension (Chen et al. [64]). PQC expressivity has also been assessed with information-theoretic metrics: Abbas et al. [65] evaluated it through the effective dimension, a data-dependent quantity based on the Fisher information, while Du et al. [27] focused on generative tasks and used entanglement entropy as the expressivity measure. While statistical complexity and information-theoretic metrics give valuable insight into the ‘volume’ of a hypothesis space, they do not specify which functions these models can actually represent.

A complementary line of research [34, 35, 36, 37, 38] examines PQC expressivity more directly by rewriting the PQC output, i.e., the expectation value of a variational observable in the data-encoded quantum state, as a partial Fourier series. This perspective offers a finer-grained toolbox for assessing PQC expressivity, in particular with respect to the universal approximation property (UAP). However, many of these Fourier-based analyses rely on impractical assumptions, such as arbitrary parameterized global unitaries and observables, which make the corresponding circuits hard to implement. Moreover, the existence proofs of universal approximation do not give explicit approximation error bounds for PQCs.

A very general approach to expressiveness in approximation is the method of nonlinear widths of DeVore et al. [66], which studies the approximation of a family of functions under the assumption that the model depends continuously on the approximated function. Pérez-Salinas et al. [36] proved that single-qubit data re-uploading PQCs are universal function approximators, in analogy with the universal approximation theorem for neural networks [13, 67]. In the setting of quantum-enhanced feature spaces, Goto et al. [39] constructed PQCs that approximate any continuous function, guided by the Stone-Weierstrass theorem. Qi et al. [41] studied the approximation error of PQCs enhanced by tensor-train networks for smooth functions, taking into account the number of qubits and the number of quantum measurements. Gonon and Jacquier [40] defined a hypothesis space of non-oscillating functions, inspired by Barron [15], and devised PQCs that approximate such functions without the curse of dimensionality (CoD). Notably, the mitigation of the CoD stems from this choice of hypothesis space, and the same phenomenon is observed for classical neural networks [68]. However, these constructions are hybrid, blurring the boundary between the classical and quantum parts of the circuit: classical processing enters in the data encoding stage and in the weighted summation of the outputs of the basic quantum circuits. Consequently, it is unclear whether their expressive power comes from the classical or the quantum part of the hybrid model.

In the present work, we make no additional assumptions in the construction of the PQCs. All trainable variables of our PQC model are parameters of rotation gates, and they are kept separate from the data encoding gates so that no classical computation enters the model. These properties keep our PQCs practical and firmly within the quantum domain.

Appendix B Implementing multivariate polynomials using PQCs

B.1 Implementing multivariate real polynomials

A multivariate polynomial with $d$ variables and degree $s\in{{\mathbb{N}}}$ is defined as

p(\bm{x})\coloneqq\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}c_{\bm{\alpha}}\bm{x^{\alpha}},

(B.11)

where $\bm{x}=(x_{1},\ldots,x_{d})\in{{\mathbb{R}}}^{d}$, $\bm{\alpha}=(\alpha_{1},\ldots,\alpha_{d})\in{{\mathbb{N}}}^{d}$, $c_{\bm{\alpha}}\in{{\mathbb{R}}}$, and $\bm{x^{\alpha}}=x_{1}^{\alpha_{1}}x_{2}^{\alpha_{2}}\cdots x_{d}^{\alpha_{d}}$. To implement $p(\bm{x})$, we first build a PQC that expresses a monomial $c_{\bm{\alpha}}\bm{x^{\alpha}}=c_{\bm{\alpha}}x_{1}^{\alpha_{1}}x_{2}^{\alpha_{2}}\cdots x_{d}^{\alpha_{d}}$ with $\lVert\bm{\alpha}\rVert_{1}\leq s$ and $\lvert c_{\bm{\alpha}}\bm{x^{\alpha}}\rvert\leq 1$ for $\bm{x}\in[0,1]^{d}$. To do so, we apply the single-qubit PQC with Pauli $X$ basis encoding defined in Eq. A.4 to each variable $x_{j}$, $1\leq j\leq d$.

Lemma S5.

Given a monomial $c_{\bm{\alpha}}\bm{x^{\alpha}}=c_{\bm{\alpha}}x_{1}^{\alpha_{1}}x_{2}^{\alpha_{2}}\cdots x_{d}^{\alpha_{d}}$ such that $\lvert c_{\bm{\alpha}}\bm{x^{\alpha}}\rvert\leq 1$ for all $\bm{x}\in[0,1]^{d}$ and $\lVert\bm{\alpha}\rVert_{1}\leq s$ for $s\in{{\mathbb{N}}}$, there exists a PQC $U^{\bm{\alpha}}(\bm{x})$ such that

\bra{+}^{\otimes d}\!U^{\bm{\alpha}}(\bm{x})\!\ket{+}^{\otimes d}=c_{\bm{\alpha}}\bm{x^{\alpha}}.

(B.12)

The width of the PQC is at most $d$ , the depth is at most $2s+1$ , and the number of parameters is at most $s+d$ .

Proof.

By Corollary S2, there exist $d$ single-qubit PQCs $U_{\bm{\theta}_{1}}^{\alpha_{1}}(x_{1}),U_{\bm{\theta}_{2}}^{\alpha_{2}}(x_{2}),\ldots,U_{\bm{\theta}_{d}}^{\alpha_{d}}(x_{d})$ such that

	$\displaystyle\braket{+}{U_{\bm{\theta}_{1}}^{\alpha_{1}}(x_{1})}{+}$	$\displaystyle=c_{\bm{\alpha}}x_{1}^{\alpha_{1}},$
	$\displaystyle\braket{+}{U_{\bm{\theta}_{2}}^{\alpha_{2}}(x_{2})}{+}$	$\displaystyle=x_{2}^{\alpha_{2}},$
	$\displaystyle\cdots$
	$\displaystyle\braket{+}{U_{\bm{\theta}_{d}}^{\alpha_{d}}(x_{d})}{+}$	$\displaystyle=x_{d}^{\alpha_{d}},$

where the number of layers of each PQC is $L_{j}=\alpha_{j}$ for $1\leq j\leq d$ . We then define a $d$ -qubit PQC as

U^{\bm{\alpha}}(\bm{x})=\bigotimes_{j=1}^{d}U_{\bm{\theta}_{j}}^{\alpha_{j}}(x_{j}),

(B.13)

which gives

\bra{+}^{\otimes d}\!U^{\bm{\alpha}}(\bm{x})\!\ket{+}^{\otimes d}=\prod_{j=1}^{d}\braket{+}{U_{\bm{\theta}_{j}}^{\alpha_{j}}(x_{j})}{+}=c_{\bm{\alpha}}\bm{x^{\alpha}}.

(B.14)

Since the $d$ single-qubit PQCs act on different qubits in parallel, the depth of $U^{\bm{\alpha}}(\bm{x})$ is $\max_{j}(2\alpha_{j}+1)\leq 2s+1$, and the number of parameters in $U^{\bm{\alpha}}(\bm{x})$ is $\sum_{j=1}^{d}(\alpha_{j}+1)=\lVert\bm{\alpha}\rVert_{1}+d\leq s+d$. $\square$
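The two facts used in the proof of Lemma S5, namely that the expectation value factorizes over the tensor product in Eq. B.13 and that the resources add up as stated, can be checked numerically. The sketch below uses random phases and a hypothetical multi-index $\bm{\alpha}=(2,0,3)$, so it illustrates the factorization in Eq. B.14 rather than a specific monomial; finding the phases that realize $c_{\bm{\alpha}}x_{j}^{\alpha_{j}}$ requires a QSP phase-finding routine, which is not shown.

```python
import numpy as np
from functools import reduce

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def signal(x):
    w = np.sqrt(1 - x**2)
    return np.array([[x, 1j * w], [1j * w, x]])

def pqc(x, thetas):
    U = rz(thetas[0])
    for theta in thetas[1:]:
        U = U @ signal(x) @ rz(theta)
    return U

rng = np.random.default_rng(2)
alpha = [2, 0, 3]                      # hypothetical multi-index with ||alpha||_1 = 5, d = 3
xs = [0.3, 0.8, 0.5]
phases = [rng.uniform(0, 2 * np.pi, a + 1) for a in alpha]   # alpha_j + 1 phases on qubit j
blocks = [pqc(x, th) for x, th in zip(xs, phases)]
plus = np.array([1, 1]) / np.sqrt(2)

U_alpha = reduce(np.kron, blocks)                  # Eq. B.13
plus_d = reduce(np.kron, [plus] * len(alpha))
joint = plus_d @ U_alpha @ plus_d
factorized = np.prod([plus @ B @ plus for B in blocks])
assert np.isclose(joint, factorized)               # Eq. B.14

depth = max(2 * a + 1 for a in alpha)              # the d single-qubit circuits run in parallel
n_params = sum(a + 1 for a in alpha)               # = ||alpha||_1 + d
assert depth <= 2 * sum(alpha) + 1 and n_params == sum(alpha) + len(alpha)
```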

The next step is to combine monomials together to implement the multivariate polynomial. Specifically, we would like to implement the following (unnormalized) operator

U_{p}(\bm{x})\coloneqq\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}U^{\bm{\alpha}}(\bm{x})

(B.15)

so that we can implement an (unnormalized) polynomial as

\bra{+}^{\otimes d}\!U_{p}(\bm{x})\!\ket{+}^{\otimes d}=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\bra{+}^{\otimes d}\!U^{\bm{\alpha}}(\bm{x})\!\ket{+}^{\otimes d}=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}c_{\bm{\alpha}}\bm{x^{\alpha}}=p(\bm{x}).

(B.16)

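Let $T$ denote the number of terms in the summation. Since the number of multi-indices $\bm{\alpha}\in{{\mathbb{N}}}^{d}$ with $\lVert\bm{\alpha}\rVert_{1}=j$ is at most $d^{j}$, $T$ can be bounded as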

T=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}1=\sum_{j=0}^{s}\sum_{\lVert\bm{\alpha}\rVert_{1}=j}1\leq\sum_{j=0}^{s}d^{j}\leq(s+1)d^{s}.

(B.17)
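The count $T$ can also be computed exactly: by a stars-and-bars argument, the number of $\bm{\alpha}\in{{\mathbb{N}}}^{d}$ with $\lVert\bm{\alpha}\rVert_{1}\leq s$ is $\binom{s+d}{d}$. The brute-force sketch below (illustrative only; the helper name is hypothetical) confirms this for small $d$ and $s$ and checks it against the bound in Eq. B.17.

```python
from itertools import product
from math import comb

def count_multi_indices(d, s):
    # brute-force count of alpha in N^d with ||alpha||_1 <= s
    return sum(1 for a in product(range(s + 1), repeat=d) if sum(a) <= s)

for d in range(1, 5):
    for s in range(5):
        T = count_multi_indices(d, s)
        assert T == comb(s + d, d)           # stars and bars
        assert T <= (s + 1) * d**s           # bound of Eq. B.17
```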

For convenience, we enumerate the multi-indices as $\bm{\alpha}^{(1)},\ldots,\bm{\alpha}^{(T)}$ and rewrite the normalized target operator as

U_{p}(\bm{x})=\sum_{j=1}^{T}\frac{1}{T}U^{\bm{\alpha}^{(j)}}(\bm{x}).

(B.18)

However, addition is non-trivial in quantum computing, since a sum of unitary operators is not necessarily unitary. To sum the monomials, we use the technique of linear combination of unitaries (LCU) [48] to implement the operator $U_{p}(\bm{x})$ in Eq. B.18 on a quantum computer. We first construct a unitary operator $F$ such that

F\ket{0}=\frac{1}{\sqrt{T}}\sum_{j=1}^{T}\ket{j}.

(B.19)

When $T$ is a power of two, $F$ is simply a tensor product of $\log T$ Hadamard gates. Next, we construct a controlled unitary

U_{c}(\bm{x})=\sum_{j=1}^{T}\lvert j\rangle\!\langle j\rvert\otimes U^{\bm{\alpha}^{(j)}}(\bm{x}).

(B.20)

Each term $\lvert j\rangle\!\langle j\rvert\otimes U^{\bm{\alpha}^{(j)}}(\bm{x})$ can be constructed from $(\log T)$-qubit controlled Pauli rotation gates, since $U^{\bm{\alpha}^{(j)}}(\bm{x})$ consists of single-qubit Pauli rotation gates. Each $(\log T)$-qubit controlled gate can in turn be decomposed into CNOT gates and single-qubit rotation gates with circuit depth $O(\log T)$ and no ancilla qubits; we refer the reader to da Silva and Park [49] for the detailed implementation of these multi-controlled gates. Then the unitary $W_{lcu}=(F^{\dagger}\otimes I)U_{c}(F\otimes I)$ satisfies

W_{lcu}\ket{0}\ket{+}^{\otimes d}=\ket{0}U_{p}(\bm{x})\ket{+}^{\otimes d}+\ket{\perp},

(B.21)

where $(\bra{0}\otimes I)\ket{\perp}=0$ . Notice that

\bra{0}\bra{+}^{\otimes d}W_{lcu}\ket{0}\ket{+}^{\otimes d}=\bra{+}^{\otimes d}U_{p}(\bm{x})\ket{+}^{\otimes d}=p(\bm{x}).

(B.22)

To obtain the polynomial $p(\bm{x})$ , we could estimate $\bra{0}\bra{+}^{\otimes d}W_{lcu}\ket{0}\ket{+}^{\otimes d}$ using the Hadamard test.
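The LCU identity behind Eqs. B.19 to B.22 can be verified directly with small matrices. In the sketch below (our own illustration), random $2$-qubit unitaries stand in for the $T=4$ monomial circuits $U^{\bm{\alpha}^{(j)}}(\bm{x})$, $F$ is a tensor product of Hadamard gates, and the top-left block of $W_{lcu}$, i.e., $(\bra{0}\otimes I)W_{lcu}(\ket{0}\otimes I)$, is compared with $\frac{1}{T}\sum_{j}U^{\bm{\alpha}^{(j)}}(\bm{x})$ as in Eq. B.18. The helper `random_unitary` is a hypothetical name.

```python
import numpy as np
from functools import reduce

def random_unitary(n, rng):
    # QR decomposition of a complex Gaussian matrix; rescaling columns by phases keeps Q unitary
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(3)
m, dim = 2, 4                   # log2(T) = 2 control qubits, 2 system qubits
T = 2**m
Us = [random_unitary(dim, rng) for _ in range(T)]      # stand-ins for U^{alpha^(j)}(x)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
F = reduce(np.kron, [H] * m)                            # F|0> = T^{-1/2} sum_j |j>, Eq. B.19
basis = np.eye(T)
U_c = sum(np.kron(np.outer(basis[j], basis[j]), Us[j]) for j in range(T))   # Eq. B.20
W_lcu = np.kron(F.conj().T, np.eye(dim)) @ U_c @ np.kron(F, np.eye(dim))

block = W_lcu[:dim, :dim]                               # (<0| kron I) W_lcu (|0> kron I)
assert np.allclose(block, sum(Us) / T)                  # (1/T) sum_j U_j, as in Eq. B.18
assert np.allclose(W_lcu.conj().T @ W_lcu, np.eye(T * dim))
```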

Theorem 1.

For any multivariate polynomial $p(\bm{x})$ with $d$ variables and degree $s$ such that $\lvert p(\bm{x})\rvert\leq 1$ for $\bm{x}\in[0,1]^{d}$ , there exists a PQC $W_{p}(\bm{x})$ such that

f_{W_{p}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{p}(\bm{x})Z^{(0)}W_{p}(\bm{x})\ket{0}=p(\bm{x}).

(B.23)

Proof.

We apply the Hadamard test to $W_{lcu}$, which gives the following quantum circuit $W_{p}(\bm{x})$.

\Qcircuit @C=1em @R=0.5em {\lstick{\ket{0}} & \qw & \gate{H} & \ctrl{1} & \gate{H} & \qw \\ \lstick{\ket{0}} & {/}\qw & \qw & \multigate{1}{W_{lcu}} & \qw & \qw \\ \lstick{\ket{0}} & {/}\qw & \gate{H^{\otimes d}} & \ghost{W_{lcu}} & \qw & \qw}

Measuring the first qubit of $W_{p}(\bm{x})$ in the Pauli $Z$ basis, we have

f_{W_{p}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{p}(\bm{x})Z^{(0)}W_{p}(\bm{x})\ket{0}=\bra{0}\bra{+}^{\otimes d}W_{lcu}\ket{0}\ket{+}^{\otimes d}=p(\bm{x}).

(B.24)

The controlled unitary used in LCU,

U_{c}(\bm{x})=\sum_{j=1}^{T}\lvert j\rangle\!\langle j\rvert\otimes U^{\bm{\alpha}^{(j)}}(\bm{x}),

(B.25)

can be implemented by at most $O(Ts)$ $(\log T)$-qubit controlled gates. A $(\log T)$-qubit controlled gate can be implemented by a circuit of CNOT gates and single-qubit gates with depth $O(\log T)$ [49]. Thus $U_{c}(\bm{x})$ can be implemented by a quantum circuit with depth $O(sT\log T)$ and width $O(d+\log T)$. The depth and width of $W_{lcu}=(F^{\dagger}\otimes I)U_{c}(F\otimes I)$ are of the same order as those of $U_{c}(\bm{x})$, since $F$ is a tensor product of Hadamard gates. Therefore, the depth of the entire circuit $W_{p}$ is $O\bigl(sT\log T+d\bigr)$ and its width is $O(d+\log T)$. Since $T\leq(s+1)d^{s}$, we have $\log T=O(\log s+s\log d)$, so the depth is $O\bigl(s^{2}d^{s}(\log s+s\log d)\bigr)$ and the width is $O(d+\log s+s\log d)$. Finally, the number of parameters in the PQC equals the number of parameters in $U_{c}(\bm{x})$, which is $O(T(s+d))=O\bigl(sd^{s}(s+d)\bigr)$. $\square$
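The Hadamard test used for $W_{p}(\bm{x})$ can be checked with a small statevector simulation. In the sketch below (illustrative helper names; a random unitary stands in for $W_{lcu}$ and a random state for $\ket{0}\ket{+}^{\otimes d}$), the $\langle Z\rangle$ expectation on the ancilla after $H$, controlled-$W$, $H$ equals $\Re\braket{\psi}{W}{\psi}$, which coincides with $p(\bm{x})$ when the target polynomial is real.

```python
import numpy as np

def hadamard_test_expectation(W, psi):
    # Statevector simulation: ancilla |0>, H, controlled-W, H, then <Z> on the ancilla.
    n = W.shape[0]
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    CW = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), W]])
    state = np.kron(np.array([1.0, 0.0]), psi)
    state = np.kron(H, np.eye(n)) @ state
    state = CW @ state
    state = np.kron(H, np.eye(n)) @ state
    Z0 = np.kron(np.diag([1.0, -1.0]), np.eye(n))
    return float(np.real(state.conj() @ Z0 @ state))

rng = np.random.default_rng(4)
z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
W, _ = np.linalg.qr(z)                        # random unitary standing in for W_lcu
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
assert np.isclose(hadamard_test_expectation(W, psi), np.real(psi.conj() @ W @ psi))
```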

Note that the first qubit of $W_{p}(\bm{x})$ must be measured $O(\frac{1}{\varepsilon^{2}})$ times to estimate $p(\bm{x})$ up to an additive error $\varepsilon$. The amplitude estimation algorithm [50] can reduce this sampling overhead, at the cost of increasing the circuit depth by a factor of $O(\frac{1}{\varepsilon})$.
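The $O(1/\varepsilon^{2})$ sampling cost can be made concrete with Hoeffding's inequality. Since each measurement of $Z^{(0)}$ returns $\pm 1$, averaging $N\geq 2\ln(2/\eta)/\varepsilon^{2}$ outcomes gives an estimate within $\varepsilon$ of $p(\bm{x})$ with probability at least $1-\eta$. The sketch below computes this shot count and simulates the sampling; the failure probability $\eta$ and the helper names are our own illustrative choices, and amplitude estimation is not shown.

```python
import math
import numpy as np

def shots_hoeffding(eps, eta):
    # Z outcomes lie in {-1, +1} (range 2); Hoeffding: P(|mean - mu| >= eps) <= 2 exp(-N eps^2 / 2)
    return math.ceil(2 * math.log(2 / eta) / eps**2)

def sample_expectation(mu, shots, rng):
    # simulate `shots` measurements of Z with <Z> = mu and return the empirical mean
    outcomes = rng.choice([1, -1], size=shots, p=[(1 + mu) / 2, (1 - mu) / 2])
    return outcomes.mean()

N = shots_hoeffding(eps=0.05, eta=0.01)
print(N)   # 4239
est = sample_expectation(0.3, N, np.random.default_rng(5))
```

Halving $\varepsilon$ quadruples the shot count, which is the $O(1/\varepsilon^{2})$ scaling stated above.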

B.2 Implementing multivariate trigonometric polynomials

We now extend the PQCs with $R_{Z}$ encoding to implement multivariate trigonometric polynomials. A multivariate trigonometric polynomial with $d$ variables and degree $s$ is defined as

t(\bm{x})\coloneqq\sum_{\lVert\bm{n}\rVert_{1}\leq s}c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}

(B.26)

where $c_{\bm{n}}\in{{\mathbb{C}}}$, $\bm{x}=(x_{1},\ldots,x_{d})\in{{\mathbb{R}}}^{d}$, $\bm{n}=(n_{1},\ldots,n_{d})\in{{\mathbb{Z}}}^{d}$, and $e^{i\bm{n}\cdot\bm{x}}=e^{in_{1}x_{1}}e^{in_{2}x_{2}}\cdots e^{in_{d}x_{d}}$. To implement a trigonometric monomial $c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}=c_{\bm{n}}e^{in_{1}x_{1}}e^{in_{2}x_{2}}\cdots e^{in_{d}x_{d}}$ with $\lVert\bm{n}\rVert_{1}\leq s$ and $\lvert c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}\rvert\leq 1$ for all $\bm{x}\in{{\mathbb{R}}}^{d}$, we apply the single-qubit PQC with $R_{Z}$ encoding defined in Eq. A.8 to each variable $x_{j}$, $1\leq j\leq d$.

Lemma S6.

Given a trigonometric monomial $c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}=c_{\bm{n}}e^{in_{1}x_{1}}e^{in_{2}x_{2}}\cdots e^{in_{d}x_{d}}$ such that $\lvert c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}\rvert\leq 1$ for all $\bm{x}\in{{\mathbb{R}}}^{d}$ and $\lVert\bm{n}\rVert_{1}\leq s$, there exists a PQC $U^{\bm{n}}(\bm{x})$ such that

\bra{0}^{\otimes d}\!U^{\bm{n}}(\bm{x})\!\ket{0}^{\otimes d}=c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}.

(B.27)

The width of the PQC is at most $d$ , the depth is at most $6s+3$ , and the number of parameters is at most $4s+3d$ .

Proof.

By Corollary S4, we can construct $d$ single-qubit PQCs $U_{\bm{\theta}_{1},\bm{\phi}_{1}}^{n_{1}}(x_{1}),U_{\bm{\theta}_{2},\bm{\phi}_{2}}^{n_{2}}(x_{2}),\ldots,U_{\bm{\theta}_{d},\bm{\phi}_{d}}^{n_{d}}(x_{d})$ such that

	$\displaystyle\braket{0}{U_{\bm{\theta}_{1},\bm{\phi}_{1}}^{n_{1}}(x_{1})}{0}$	$\displaystyle=c_{\bm{n}}e^{in_{1}x_{1}},$
	$\displaystyle\braket{0}{U_{\bm{\theta}_{2},\bm{\phi}_{2}}^{n_{2}}(x_{2})}{0}$	$\displaystyle=e^{in_{2}x_{2}},$
	$\displaystyle\cdots$
	$\displaystyle\braket{0}{U_{\bm{\theta}_{d},\bm{\phi}_{d}}^{n_{d}}(x_{d})}{0}$	$\displaystyle=e^{in_{d}x_{d}},$

where the $j$-th PQC is the circuit of Corollary S4 with $L=\lvert n_{j}\rvert$, for $1\leq j\leq d$. We then define a $d$-qubit PQC as

U^{\bm{n}}(\bm{x})=\bigotimes_{j=1}^{d}U_{\bm{\theta}_{j},\bm{\phi}_{j}}^{n_{j}}(x_{j}),

(B.28)

which gives

\bra{0}^{\otimes d}\!U^{\bm{n}}(\bm{x})\!\ket{0}^{\otimes d}=\prod_{j=1}^{d}\braket{0}{U_{\bm{\theta}_{j},\bm{\phi}_{j}}^{n_{j}}(x_{j})}{0}=c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}.

(B.29)

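The $d$ single-qubit PQCs act in parallel, and the $j$-th one has depth $6\lvert n_{j}\rvert+3$ and $4\lvert n_{j}\rvert+3$ parameters. Since $\lVert\bm{n}\rVert_{1}=\sum_{j=1}^{d}\lvert n_{j}\rvert\leq s$, we conclude that the depth of $U^{\bm{n}}(\bm{x})$ is at most $6s+3$ and the number of parameters in $U^{\bm{n}}(\bm{x})$ is at most $4s+3d$. $\square$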

We then apply the LCU technique to the PQCs $U^{\bm{n}}(\bm{x})$ to implement the operator

U_{t}(\bm{x})\coloneqq\sum_{\lVert\bm{n}\rVert_{1}\leq s}U^{\bm{n}}(\bm{x}),

(B.30)

so that we can implement the multivariate trigonometric polynomial as

\bra{0}^{\otimes d}\!U_{t}(\bm{x})\!\ket{0}^{\otimes d}=\sum_{\lVert\bm{n}\rVert_{1}\leq s}\bra{0}^{\otimes d}\!U^{\bm{n}}(\bm{x})\!\ket{0}^{\otimes d}=\sum_{\lVert\bm{n}\rVert_{1}\leq s}c_{\bm{n}}e^{i\bm{n}\cdot\bm{x}}=t(\bm{x}).

(B.31)

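Since there are at most $(2d)^{j}$ vectors $\bm{n}\in{{\mathbb{Z}}}^{d}$ with $\lVert\bm{n}\rVert_{1}=j$, the number of terms in the summation satisfies, for $d\geq 2$,

\sum_{\lVert\bm{n}\rVert_{1}\leq s}1=\sum_{j=0}^{s}\sum_{\lVert\bm{n}\rVert_{1}=j}1\leq\sum_{j=0}^{s}(2d)^{j}\leq(s+1)(2d)^{s}\leq(s+1)d^{2s}.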

(B.32)
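To sanity-check Eq. B.32, the sketch below (illustrative helper name) counts the integer vectors $\bm{n}\in{{\mathbb{Z}}}^{d}$ with $\lVert\bm{n}\rVert_{1}\leq s$ by brute force. Each such vector with $\lVert\bm{n}\rVert_{1}=j$ is a sum of $j$ signed unit vectors, so there are at most $(2d)^{j}$ of them, which gives the intermediate bound $(s+1)(2d)^{s}$; the final bound $(s+1)d^{2s}$ additionally uses $2d\leq d^{2}$, i.e., $d\geq 2$ (for $d=1$ and $s=1$ there are already $3$ terms).

```python
from itertools import product

def count_integer_points(d, s):
    # brute-force count of n in Z^d with ||n||_1 <= s
    return sum(1 for n in product(range(-s, s + 1), repeat=d) if sum(map(abs, n)) <= s)

for d in range(2, 5):
    for s in range(4):
        N = count_integer_points(d, s)
        assert N <= (s + 1) * (2 * d)**s <= (s + 1) * d**(2 * s)
```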

Then, we have the following proposition.

Proposition S7.

For any multivariate trigonometric polynomial $t(\bm{x})$ with $d$ variables and degree $s$ such that $\lvert t(\bm{x})\rvert\leq 1$ for $\bm{x}\in{{\mathbb{R}}}^{d}$ , there exists a PQC $W_{tri}(\bm{x})$ such that

f_{W_{tri}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{tri}(\bm{x})Z^{(0)}W_{tri}(\bm{x})\ket{0}=t(\bm{x})

(B.33)

where $Z^{(0)}$ is the Pauli $Z$ observable on the first qubit. The width of the PQC is $O(d+\log s+s\log d)$ , the depth is $O(s^{2}d^{2s}(\log s+s\log d))$ , and the number of parameters is $O(sd^{2s}(s+d))$ .

The proof is similar to that of Theorem 1. This result demonstrates the universal approximation property of PQCs from the perspective of multivariate Fourier series, in line with Schuld et al. [34]. Notably, the PQC in Proposition S7 has an explicit construction without additional assumptions and improves on the implicit PQCs proposed in Schuld et al. [34] in terms of circuit size. For instance, to implement a $d$-variable Fourier series of degree $s$, the PQC with parallel structure in Schuld et al. [34] requires width $O(ds)$ and potentially exponential depth $O(4^{ds})$.

Appendix C Approximating continuous functions via PQCs

We have shown constructively in the previous section that PQCs can implement multivariate polynomials. To study the approximation capabilities of PQCs, a natural strategy is to combine polynomials to approximate continuous functions, drawing on well-established results from classical approximation theory. For univariate functions, this is guided by the Stone-Weierstrass theorem [69]. For the multivariate case, we employ PQCs to implement Bernstein polynomials and then invoke established approximation error bounds for Bernstein polynomials [52, 53].

C.1 Established results of Bernstein polynomials approximation

For a $d$-variable continuous function $f\colon[0,1]^{d}\to{{\mathbb{R}}}$, the multivariate Bernstein polynomial of degree $n\in{{\mathbb{N}}}$ of $f$ is defined as

B_{n}(f;\bm{x})\coloneqq\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}f\bigl(\frac{\bm{k}}{n}\bigr)\prod_{j=1}^{d}\binom{n}{k_{j}}x_{j}^{k_{j}}(1-x_{j})^{n-k_{j}},

(C.34)

where $\bm{k}=(k_{1},\ldots,k_{d})\in\{0,\ldots,n\}^{d}$. The following lemma bounds the approximation error of the Bernstein polynomial.
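A direct, unoptimized evaluation of Eq. C.34 is sketched below; this is our own illustration, and the test function `f` and the evaluation grid are arbitrary choices. For this Lipschitz test function the maximum error on the grid decreases as the degree $n$ grows, consistent with Lemma S8. The direct sum has $(n+1)^{d}$ terms, the same number of terms that the LCU construction in Lemma S9 has to combine.

```python
import itertools
from math import comb
import numpy as np

def bernstein(f, n, x):
    """Direct evaluation of the multivariate Bernstein polynomial B_n(f; x) in Eq. C.34."""
    total = 0.0
    for k in itertools.product(range(n + 1), repeat=len(x)):
        weight = 1.0
        for kj, xj in zip(k, x):
            weight *= comb(n, kj) * xj**kj * (1 - xj)**(n - kj)
        total += f(np.array(k) / n) * weight
    return total

def f(y):
    # illustrative Lipschitz test function on [0, 1]^2
    return abs(y[0] - 0.5) + 0.5 * abs(y[1] - 0.3)

grid = [np.array(p) for p in itertools.product(np.linspace(0, 1, 5), repeat=2)]
errors = {n: max(abs(f(x) - bernstein(f, n, x)) for x in grid) for n in (4, 16, 32)}
assert errors[4] > errors[16] > errors[32]
```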

Lemma S8 (Bernstein polynomials approximation for Lipschitz functions [53]).

Let $f\colon[0,1]^{d}\to{{\mathbb{R}}}$ be Lipschitz continuous with Lipschitz constant $\ell$, i.e., $|f(\bm{x})-f(\bm{y})|\leq\ell\|\bm{x}-\bm{y}\|_{\infty}$, and bounded by $\Gamma$, i.e., $|f(\bm{x})|\leq\Gamma$ for all $\bm{x}\in[0,1]^{d}$. Then the approximation error of the $n$-degree Bernstein polynomial of $f$ satisfies

\lvert f(\bm{x})-B_{n}(f;\bm{x})\rvert\leq\varepsilon+2\Gamma\sum_{j=1}^{d}\binom{d}{j}\left(\frac{\ell^{2}}{4n\varepsilon^{2}}\right)^{j}\leq\varepsilon+2\Gamma\left(\left(1+\frac{\ell^{2}}{4n\varepsilon^{2}}\right)^{d}-1\right),

(C.35)

for all $\bm{x}\in[0,1]^{d}$ and every $\varepsilon>0$.
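The closed-form bound in Eq. C.35 can be used to choose the degree $n$. The sketch below (hypothetical helper names) finds the smallest $n$ for which the second term is at most $\varepsilon$, so that the total bound is at most $2\varepsilon$; using the same $\varepsilon$ for both terms is our own illustrative choice. Solving $2\Gamma((1+a)^{d}-1)\leq\varepsilon$ for $a=\ell^{2}/(4n\varepsilon^{2})$ shows that, for small $\varepsilon$, the required $n$ grows like $d\ell^{2}\Gamma/\varepsilon^{3}$ up to a constant factor.

```python
import math

def bernstein_bound(n, d, ell, Gamma, eps):
    # right-hand side of Eq. C.35 (closed form)
    return eps + 2 * Gamma * ((1 + ell**2 / (4 * n * eps**2))**d - 1)

def min_degree(d, ell, Gamma, eps):
    # smallest n with 2*Gamma*((1 + ell^2/(4 n eps^2))^d - 1) <= eps, i.e. total bound <= 2*eps
    a = (1 + eps / (2 * Gamma))**(1 / d) - 1      # need ell^2 / (4 n eps^2) <= a
    return math.ceil(ell**2 / (4 * eps**2 * a))

n = min_degree(d=2, ell=1.0, Gamma=1.0, eps=0.1)
assert bernstein_bound(n, 2, 1.0, 1.0, 0.1) <= 0.2 + 1e-12
assert bernstein_bound(n - 1, 2, 1.0, 1.0, 0.1) > 0.2
```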

Proof.

By the Lipschitz continuity of $f$, we set $\delta=\varepsilon/\ell$. Then, for any two points $\bm{x}=(x_{1},\dots,x_{d})$ and $\bm{y}=(y_{1},\dots,y_{d})$ with $|x_{i}-y_{i}|<\delta$ for all $i\in\{1,\dots,d\}$, we have $|f(\bm{x})-f(\bm{y})|\leq\varepsilon$. Since $\sum_{k=0}^{n}\binom{n}{k}x^{k}(1-x)^{n-k}=1$ for every $x$, the target function can be written as

\begin{aligned}f(\bm{x})&=f(x_{1},\dots,x_{d})\\ &=f(x_{1},\dots,x_{d})\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}\prod_{i=1}^{d}\binom{n}{k_{i}}x_{i}^{k_{i}}(1-x_{i})^{n-k_{i}}\\ &=\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}f(x_{1},\dots,x_{d})\prod_{i=1}^{d}\binom{n}{k_{i}}x_{i}^{k_{i}}(1-x_{i})^{n-k_{i}}.\end{aligned}

(C.38)

Let us consider the set $E=\prod_{i=1}^{d}\{0,1,\dots,n\}$ , and for $j=1,2,\dots,d$ , we define the sets

\Omega_{j}=\Bigl\{k_{j}\in\{0,1,\dots,n\}\colon\Bigl|\frac{k_{j}}{n}-x_{j}\Bigr|<\delta\Bigr\}\quad\text{and}\quad F=E\setminus(\Omega_{1}\times\cdots\times\Omega_{d}).

(C.42)

Then $F=\bigcup_{k=1}^{d}F_{k}$ with $F_{k}=\bigl\{\prod_{i=1}^{d}\Omega_{ik}^{[\alpha_{ik}]}\subseteq F\colon\alpha_{ik}\in\{0,1\},\ \sum_{i=1}^{d}\alpha_{ik}=k\bigr\}$, where $\Omega_{ik}^{[\alpha_{ik}]}=\Omega_{i}$ if $\alpha_{ik}=0$, $\Omega_{ik}^{[\alpha_{ik}]}=\Omega_{i}^{c}$ if $\alpha_{ik}=1$, and $\Omega_{i}^{c}=\bigl\{k_{i}\in\{0,\dots,n\}\colon\bigl|\frac{k_{i}}{n}-x_{i}\bigr|\geq\delta\bigr\}$. For $A_{k}=\prod_{i=1}^{d}\Omega_{ik}^{[\alpha_{ik}]}\in F_{k}$, $k=1,\dots,d$, we also define $I_{A_{k}}=\{i\in\{1,\dots,d\}\colon\alpha_{ik}=1\}$, so that $\operatorname{card}(I_{A_{k}})=k\geq 1$. We have

	$\displaystyle\left\|f\left(x_{1},\cdots,x_{d}\right)-B_{n}\left(f;x_{1},\cdots,% x_{d}\right)\right\|$	(C.43)
$\displaystyle=$	$\displaystyle\mid\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}f\left(x_{1},\cdots% ,x_{d}\right)\prod_{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}$
	$\displaystyle-\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}f\left(\frac{k_{1}}{n}% ,\cdots,\frac{k_{d}}{n}\right)\prod_{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}\mid$
$\displaystyle=$	$\displaystyle\left\|\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}\left[f\left(x_{1% },\cdots,x_{d}\right)-f\left(\frac{k_{1}}{n},\cdots,\frac{k_{d}}{n}\right)% \right]\prod_{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}\right\|$
$\displaystyle\leq$	$\displaystyle\sum_{k_{1}=0}^{n}\cdots\sum_{k_{d}=0}^{n}\left\|f\left(x_{1},% \cdots,x_{d}\right)-f\left(\frac{k_{1}}{n},\cdots,\frac{k_{d}}{n}\right)\right% \|\prod_{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}$
$\displaystyle\leq$	$\displaystyle\sum_{\Omega_{1}}\cdots\sum_{\Omega_{d}}\left\|f\left(x_{1},\cdots% ,x_{d}\right)-f\left(\frac{k_{1}}{n},\cdots,\frac{k_{d}}{n}\right)\right\|\prod% _{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}$
	$\displaystyle+\sum_{F}\left\|f\left(x_{1},\cdots,x_{d}\right)-f\left(\frac{k_{1% }}{n},\cdots,\frac{k_{d}}{n}\right)\right\|\prod_{i=1}^{d}\left(\begin{array}[]% {l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}.$

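Since $|f(\bm{x})-f(\bm{k}/n)|\leq\varepsilon$ for $\bm{k}\in\Omega_{1}\times\cdots\times\Omega_{d}$ by the choice of $\delta$, and $|f(\bm{x})-f(\bm{k}/n)|\leq 2\Gamma$ for $\bm{k}\in F$ because $f$ is bounded by $\Gamma$, we get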

	$\displaystyle\left\|f\left(x_{1},\cdots,x_{d}\right)-B_{n}\left(f;x_{1},\cdots,% x_{d}\right)\right\|$	(C.44)
$\displaystyle\leq$	$\displaystyle\varepsilon\sum_{\Omega_{1}}\cdots\sum_{\Omega_{d}}\prod_{i=1}^{d% }\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}+2\Gamma\sum% _{F}\prod_{i=1}^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}$
$\displaystyle\leq$	$\displaystyle\varepsilon+2\Gamma\sum_{l=1}^{d}\sum_{A_{l}\in F_{l}}\prod_{i=1}% ^{d}\left(\begin{array}[]{l}n\\ k_{i}\end{array}\right)x_{i}^{k_{i}}\left(1-x_{i}\right)^{n-k_{i}}$
$\displaystyle\leq$	$\displaystyle\varepsilon+2\Gamma\sum_{l=1}^{d}\sum_{A_{l}\in F_{l}}\prod_{i\in I% _{A_{l}}}\frac{1}{4n\delta^{2}}$
$\displaystyle=$	$\displaystyle\varepsilon+2\Gamma\sum_{j=1}^{d}\binom{d}{j}\frac{1}{(4n\delta^{% 2})^{j}}\leq\varepsilon+2\Gamma\left(\left(1+\frac{\ell^{2}}{4n\varepsilon^{2}% }\right)^{d}-1\right).$

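The third inequality follows from Chebyshev's inequality for the binomial distribution, which gives $\sum_{k_{i}\in\Omega_{i}^{c}}\binom{n}{k_{i}}x_{i}^{k_{i}}(1-x_{i})^{n-k_{i}}\leq\frac{x_{i}(1-x_{i})}{n\delta^{2}}\leq\frac{1}{4n\delta^{2}}$, together with $\sum_{k_{i}\in\Omega_{i}}\binom{n}{k_{i}}x_{i}^{k_{i}}(1-x_{i})^{n-k_{i}}\leq 1$; the last step uses $\delta=\varepsilon/\ell$. This completes the proof. A more detailed expansion of Eq. C.44 is given in Theorem 2 of Foupouagnigni and Mouafo Wouodjié [53]. $\square$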
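The third inequality in Eq. C.44 rests on Chebyshev's inequality for the binomial distribution: if $K\sim\mathrm{Bin}(n,x)$, then $\Pr(|K/n-x|\geq\delta)\leq x(1-x)/(n\delta^{2})\leq 1/(4n\delta^{2})$. The following sketch checks this tail bound numerically on an arbitrary, illustrative grid of parameters; the helper name is hypothetical.

```python
from math import comb
import numpy as np

def binomial_tail(n, x, delta):
    # sum of C(n, k) x^k (1 - x)^(n - k) over the k with |k/n - x| >= delta
    return sum(comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1) if abs(k / n - x) >= delta)

for n in (10, 50, 200):
    for x in np.linspace(0, 1, 21):
        for delta in (0.05, 0.1, 0.3):
            assert binomial_tail(n, x, delta) <= 1 / (4 * n * delta**2) + 1e-12
```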

Remark S2.

For a general continuous target function $f$, uniform continuity on $[0,1]^{d}$ guarantees that for every $\varepsilon>0$ there exists $\delta>0$ such that $|f(\bm{x})-f(\bm{y})|\leq\varepsilon$ whenever $|x_{i}-y_{i}|<\delta$ for all $i$. The same argument then gives

\left|f(x_{1},\dots,x_{d})-B_{n}(f;x_{1},\dots,x_{d})\right|\leq\varepsilon+2\Gamma\left(\left(1+\frac{1}{4n\delta^{2}}\right)^{d}-1\right).

Since the second term vanishes as $n\to\infty$ for fixed $\delta$, this bound shows that the Bernstein polynomials converge uniformly to any continuous function on $[0,1]^{d}$.

C.2 Implement Bernstein polynomials via PQCs

Eq. C.34 defines the Bernstein polynomial, and Lemma S8 bounds its approximation error for Lipschitz continuous functions. Following Theorem 1, we can construct a PQC that implements such a Bernstein polynomial.

Lemma S9.

For any $d$-variable Bernstein polynomial of degree $n\in{{\mathbb{N}}}$ defined in Eq. C.34 such that $\lvert B_{n}(f;\bm{x})\rvert\leq 1$ for $\bm{x}\in[0,1]^{d}$, there exists a PQC $W_{b}(\bm{x})$ satisfying

f_{W_{b}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{b}(\bm{x})Z^{(0)}W_{b}(\bm{x})\ket{0}=B_{n}(f;\bm{x}).

(C.45)

The width of the PQC is $O(d\log{n})$ , the depth is $O\bigl{(}dn^{d}\log{n}\bigr{)}$ , and the number of parameters is $O(dn^{d})$ .

Proof.

The proof proceeds in two steps. First, we construct PQCs that exactly represent $f\bigl(\frac{\bm{k}}{n}\bigr)\prod_{j=1}^{d}\binom{n}{k_{j}}x_{j}^{k_{j}}(1-x_{j})^{n-k_{j}}$ for all $\bm{k}\in\{0,1,\dots,n\}^{d}$. Then, we use LCU to combine these PQCs into the Bernstein polynomial in Eq. C.34.

The univariate polynomial $x^{k}(1-x)^{n-k}$ can be represented by a PQC with depth less than $2n+1$, width $2$, and $n+2$ parameters. The multivariate term $f\bigl(\frac{\bm{k}}{n}\bigr)\prod_{j=1}^{d}\binom{n}{k_{j}}x_{j}^{k_{j}}(1-x_{j})^{n-k_{j}}$ is, up to a constant factor, a product of such univariate polynomials, so it can be represented exactly by the tensor-product construction used in Lemma S5. The resulting PQC has depth less than $2n+1$, width $2d$, and $d(n+2)$ parameters.

The summation in Eq. C.34 has $(n+1)^{d}$ terms. Applying the same LCU construction as in the proof of Theorem 1 yields the PQC $W_{b}(\bm{x})$, whose depth scales as

O\Bigl(d(n+1)^{d+1}\log(n+1)+d\Bigr),

the width is $2d+d\log(n+1)$, and the number of parameters is $(n+1)^{d}(n+2)d$. The results presented in Lemma S9 are obtained after simplification. $\square$

C.3 PQC approximating continuous functions

We have shown that PQCs can implement Bernstein polynomials exactly (Lemma S9) and have bounded the approximation error of Bernstein polynomials for continuous functions (Lemma S8 and Remark S2). Combining these results, we now state a universal approximation theorem for PQCs together with an explicit error bound.

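Theorem 2 (The Universal Approximation Theorem of PQC).

For any continuous function $f\colon[0,1]^{d}\to[-1,1]$ and any $\varepsilon>0$, there exists a PQC $W_{b}(\bm{x})$ such that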

\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\varepsilon

(C.46)

for all $\bm{x}\in[0,1]^{d}$, where $f_{W_{b}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{b}(\bm{x})Z^{(0)}W_{b}(\bm{x})\ket{0}$.

Proof.

By Remark S2, the Bernstein polynomials $B_{n}(f;\bm{x})$ converge uniformly to any continuous function $f$ on the cube $[0,1]^{d}$, i.e., $B_{n}(f;\bm{x})\rightarrow f(\bm{x})$ uniformly as $n\rightarrow+\infty$. Hence we can choose $n$ such that $\lvert f(\bm{x})-B_{n}(f;\bm{x})\rvert\leq\varepsilon$ for all $\bm{x}\in[0,1]^{d}$. By Lemma S9, this Bernstein polynomial $B_{n}(f;\bm{x})$ can be implemented by $f_{W_{b}}(\bm{x})$, where the PQC $W_{b}(\bm{x})$ has depth $O\bigl{(}dn^{d}\log{n}\bigr{)}$, width $O(d\log{n})$, and $O(dn^{d})$ parameters. This completes the proof. $\square$
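As an informal illustration of the uniform convergence used in this proof (not of the PQC itself), the following sketch computes a grid-based sup error of univariate Bernstein approximations to $f(x)=\lvert x-\frac{1}{2}\rvert$, a continuous, 1-Lipschitz function that is not differentiable at $\frac{1}{2}$. The test function and the grid are our own choices.

```python
import math


def bernstein_1d(f, n, x):
    """B_n(f; x) = sum_k f(k/n) binom(n, k) x^k (1 - x)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1.0 - x) ** (n - k) for k in range(n + 1))


def sup_error(f, n, grid_size=201):
    """Maximum of |f - B_n(f)| over a uniform grid on [0, 1] (a proxy for the sup norm)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein_1d(f, n, x)) for x in grid)


f = lambda x: abs(x - 0.5)
for n in (25, 100, 400):
    print(n, sup_error(f, n))   # the error shrinks roughly like 1/sqrt(n) for this f
```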

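Theorem 3.

Given a Lipschitz continuous function $f\colon[0,1]^{d}\to[-1,1]$ with Lipschitz constant $\ell$, for any $\varepsilon>0$ and $n\in{{\mathbb{N}}}^{+}$, there exists a PQC $W_{b}(\bm{x})$ such that

\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\varepsilon+2\biggl{(}\Bigl{(}1+\frac{\ell^{2}}{n\varepsilon^{2}}\Bigr{)}^{d}-1\biggr{)}\leq\varepsilon+d2^{d}\frac{\ell^{2}}{n\varepsilon^{2}}

(C.47)

for all $\bm{x}\in[0,1]^{d}$. The width of the PQC is $O(d\log n)$, the depth is $O\bigl{(}dn^{d}\log{n}\bigr{)}$, and the number of parameters is $O(dn^{d})$.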

Proof.

Lemma S8 establishes the uniform convergence rate of Bernstein polynomials to Lipschitz continuous functions on the cube $[0,1]^{d}$: for any Lipschitz continuous function $f(\bm{x})$ with Lipschitz constant $\ell$, the Bernstein polynomial $B_{n}(f;\bm{x})$ satisfies

\lvert f(\bm{x})-B_{n}(f;\bm{x})\rvert\leq\varepsilon+2\Gamma\sum_{j=1}^{d}\binom{d}{j}\left(\frac{\ell^{2}}{4n\varepsilon^{2}}\right)^{j}=\varepsilon+2\Gamma\left(\left(1+\frac{\ell^{2}}{4n\varepsilon^{2}}\right)^{d}-1\right).

By Lemma S9, this Bernstein polynomial $B_{n}(f;\bm{x})$ can be implemented by $f_{W_{b}}(\bm{x})$, where the PQC $W_{b}(\bm{x})$ has depth $O\bigl{(}dn^{d}\log{n}\bigr{)}$, width $O(d\log{n})$, and $O(dn^{d})$ parameters. This completes the proof. $\square$
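The second inequality in Eq. C.47 can be checked as follows. Write $a\coloneqq\frac{\ell^{2}}{n\varepsilon^{2}}$ and assume $f$ takes values in $[-1,1]$; since $f_{W_{b}}(\bm{x})$ is an expectation value of $Z^{(0)}$, it also lies in $[-1,1]$. The case split below is our own sketch of this step, not taken from Lemma S8.

```latex
% a := \ell^2/(n\varepsilon^2); mean value theorem for t \mapsto (1+t)^d on [0, a]
\begin{aligned}
a\leq 1:&\quad 2\bigl((1+a)^{d}-1\bigr)=2da(1+\theta a)^{d-1}\leq 2da\,2^{d-1}=d2^{d}a
  \quad\text{for some }\theta\in(0,1),\\
a>1:&\quad \lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\lvert f(\bm{x})\rvert+\lvert f_{W_{b}}(\bm{x})\rvert\leq 2<d2^{d}a.
\end{aligned}
```

Hence $\lvert f(\bm{x})-f_{W_{b}}(\bm{x})\rvert\leq\varepsilon+d2^{d}\frac{\ell^{2}}{n\varepsilon^{2}}$ for every $n$, while the middle expression in Eq. C.47 is dominated by the right-hand side only when $n\geq\ell^{2}/\varepsilon^{2}$.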

Appendix D Approximating smooth functions via nested PQCs

Instead of approximating a continuous function globally by a Bernstein polynomial, we can also use local polynomials to obtain a piecewise approximation. Following the approach used for classical deep neural networks [18, 21, 25], we approximate a multivariate smooth function $f$ on small local regions by its multivariate Taylor expansion. Let $\beta=s+r>0$ with $r\in(0,1]$ and $s=\lceil\beta\rceil-1\in{{\mathbb{N}}}$. For a finite constant $B_{0}>0$, the $\beta$-Hölder class of functions ${\cal H}^{\beta}([0,1]^{d},B_{0})$ is defined as

{\cal H}^{\beta}([0,1]^{d},B_{0})=\Bigl{\{}f\colon[0,1]^{d}\to{{\mathbb{R}}},\max_{\lVert\bm{\alpha}\rVert_{1}\leq s}\lVert\partial^{\bm{\alpha}}f\rVert_{\infty}\leq B_{0},\max_{\lVert\bm{\alpha}\rVert_{1}=s}\sup_{\bm{x}\neq\bm{y}}\frac{\lvert\partial^{\bm{\alpha}}f(\bm{x})-\partial^{\bm{\alpha}}f(\bm{y})\rvert}{\lVert\bm{x}-\bm{y}\rVert_{2}^{r}}\leq B_{0}\Bigr{\}},

(D.48)

where $\partial^{\bm{\alpha}}=\partial_{1}^{\alpha_{1}}\cdots\partial_{d}^{\alpha_{d}}$ for $\bm{\alpha}=(\alpha_{1},\ldots,\alpha_{d})\in{{\mathbb{N}}}^{d}$, with $\partial_{j}$ denoting the partial derivative with respect to $x_{j}$. By definition, for a function $f\in{\cal H}^{\beta}([0,1]^{d},B_{0})$: when $\beta\in(0,1)$, $f$ is a Hölder continuous function of order $\beta$ with Hölder constant $B_{0}$; when $\beta=1$, $f$ is a Lipschitz function with Lipschitz constant $B_{0}$; when $\beta>1$, $f$ belongs to the class $C^{s}$ of functions whose $s$-th partial derivatives exist and are bounded.

We use the following lemma on the Taylor expansion of $\beta$-Hölder functions to construct and analyze the PQC approximation.

Lemma S10 ([18]).

Given a function $f\in{\cal H}^{\beta}([0,1]^{d},1)$ with $\beta=r+s$ , $r\in(0,1]$ and $s\in{{\mathbb{N}}}^{+}$ , for any $\bm{x},\bm{x_{0}}\in[0,1]^{d}$ , we have

\Big{\lvert}f(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\bm{x_{0}})}{\bm{\alpha}!}(\bm{x}-\bm{x_{0}})^{\bm{\alpha}}\Big{\rvert}\leq d^{s}\lVert\bm{x}-\bm{x_{0}}\rVert^{\beta}_{2},

(D.49)

where $\bm{\alpha}!=\alpha_{1}!\cdots\alpha_{d}!$ .
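Lemma S10 can be sanity-checked numerically. The sketch below enumerates the multi-indices with $\lVert\bm{\alpha}\rVert_{1}\leq s$, evaluates the truncated Taylor polynomial, and compares its error with $d^{s}\lVert\bm{x}-\bm{x_{0}}\rVert_{2}^{\beta}$. The test function $f(\bm{x})=\exp(-(x_{1}+x_{2})/2)$ is our own choice: all its partial derivatives are bounded by 1 on $[0,1]^{2}$ and the $s$-th derivatives are 1-Lipschitz, so it lies in ${\cal H}^{s+1}([0,1]^{2},1)$ with $r=1$.

```python
import itertools
import math


def multi_indices(d, s):
    """All alpha in N^d with |alpha|_1 <= s."""
    return [a for a in itertools.product(range(s + 1), repeat=d) if sum(a) <= s]


def taylor(deriv, x0, x, s):
    """Truncated Taylor polynomial sum_{|alpha|_1 <= s} deriv(alpha, x0) / alpha! * (x - x0)^alpha."""
    h = [xi - x0i for xi, x0i in zip(x, x0)]
    total = 0.0
    for alpha in multi_indices(len(x), s):
        alpha_fact = math.prod(math.factorial(a) for a in alpha)
        monomial = math.prod(hi**a for hi, a in zip(h, alpha))
        total += deriv(alpha, x0) / alpha_fact * monomial
    return total


def f(x):
    # Illustrative test function exp(-(x_1 + ... + x_d) / 2).
    return math.exp(-0.5 * sum(x))


def deriv(alpha, x0):
    # Closed form: d^alpha f = (-1/2)^{|alpha|_1} f.
    return (-0.5) ** sum(alpha) * f(x0)


d, s = 2, 2
x0, x = (0.2, 0.2), (0.8, 0.9)
err = abs(f(x) - taylor(deriv, x0, x, s))
print(err, d**s * math.dist(x, x0) ** (s + 1))   # observed error vs. the bound d^s ||x - x0||^beta
```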

Next, we show how to construct PQCs to implement the Taylor expansion of $\beta$ -Hölder functions.

D.1 Localization via PQC

As Eq. D.49 shows, the truncated Taylor expansion of a multivariate smooth function is accurate only in a small neighborhood of the expansion point. We therefore first localize the region $[0,1]^{d}$. Given $K\in{{\mathbb{N}}}$ and $\Delta\in(0,\frac{1}{3K})$, for each $\bm{\eta}=(\eta_{1},\ldots,\eta_{d})\in\{0,1,\ldots,K-1\}^{d}$, we define

Q_{\bm{\eta}}\coloneqq\Bigl{\{}\bm{x}=(x_{1},\ldots,x_{d}):x_{i}\in\bigl{[}\frac{\eta_{i}}{K},\frac{\eta_{i}+1}{K}-\Delta\cdot 1_{\eta_{i}<K-1}\bigr{]}\Bigr{\}}.

(D.50)

By the definition of $Q_{\bm{\eta}}$, the region $[0,1]^{d}$ is divided into the small hypercubes $Q_{\bm{\eta}}$ and a trifling region $\Lambda(d,K,\Delta)\coloneqq[0,1]^{d}\setminus(\bigcup_{\bm{\eta}}Q_{\bm{\eta}})$, as illustrated in Fig. 2 in the main text. We then construct a PQC that maps any $\bm{x}\in Q_{\bm{\eta}}$ to the fixed point $\bm{x}_{\bm{\eta}}=\frac{\bm{\eta}}{K}\in Q_{\bm{\eta}}$, i.e., that approximates the piecewise-constant function $D(\bm{x})=\frac{\bm{\eta}}{K}$ for $\bm{x}\in Q_{\bm{\eta}}$, where $\frac{\bm{\eta}}{K}=(\eta_{1}/K,\ldots,\eta_{d}/K)$. We first consider the case $d=1$, where the localization function is

D(x)=\frac{k}{K},\qquad\text{if $x\in\Bigl{[}\frac{k}{K},\frac{k+1}{K}-\Delta\cdot 1_{k<K-1}\Bigr{]}$ for $k=0,1,\ldots,K-1$}.

(D.51)
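A minimal Python sketch of the localization map, with illustrative function names: it returns $D(x)$ from Eq. D.51 for points in a localized interval and `None` on the trifling region, and applies the same map coordinatewise for $d>1$. Floating-point inputs lying exactly on a cell boundary are not treated specially here.

```python
import math


def localize_1d(x, K, delta):
    """D(x) from Eq. D.51 for x in [0, 1]; returns None on the trifling region.

    The k-th localized interval is [k/K, (k+1)/K - delta] for k < K - 1 and [(K-1)/K, 1] for k = K - 1.
    """
    k = min(math.floor(x * K), K - 1)                  # candidate cell index
    right = (k + 1) / K - (delta if k < K - 1 else 0.0)
    return k / K if k / K <= x <= right else None


def localize(x, K, delta):
    """Coordinatewise localization: eta / K if x lies in Q_eta, otherwise None."""
    values = [localize_1d(xj, K, delta) for xj in x]
    return None if any(v is None for v in values) else tuple(values)


print(localize((0.3, 0.9), K=4, delta=0.05))    # (0.25, 0.75)
print(localize((0.3, 0.47), K=4, delta=0.05))   # None: 0.47 lies in the gap (0.45, 0.5)
```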

The multivariate case follows by applying $D$ to each coordinate $x_{j}$. The idea is to find a polynomial that approximates the sign function

\operatorname{sgn}(x-c)=\begin{cases}1,&\text{if $x>c$,}\\[1.0pt] 0,&\text{if $x=c$,}\\ -1,&\text{if $x<c$}\end{cases},

(D.52)

as shown in the following lemma.

Lemma S11 (Polynomial approximation to the sign function $\operatorname{sgn}(x-c)$ [54]).

For all $c\in[-1,1]$, $\Delta>0$, and $\varepsilon\in(0,1)$, there exists an odd polynomial $P_{\Delta,\varepsilon}(x)$ of degree $n=O(\frac{1}{\Delta}\log\frac{1}{\varepsilon})$ that satisfies

1.

$\lvert P_{\Delta,\varepsilon}(x-c)\rvert\leq 1$ for all $x\in[-1,1],$
2.

$\lvert\operatorname{sgn}(x-c)-P_{\Delta,\varepsilon}(x-c)\rvert\leq\varepsilon$ for all $x\in[-1,1]\setminus(c-\frac{\Delta}{2},c+\frac{\Delta}{2})$ .

We can also approximate the step function $\operatorname{stp}(x-c)\coloneqq\frac{1}{2}\operatorname{sgn}(x-c)+\frac{1}{2}$ by the polynomial $P_{\Delta,\varepsilon}^{\prime}(x-c)=\frac{1}{2}P_{\Delta,\varepsilon}(x-c)+\frac{1}{2}$ of degree $n=O(\frac{1}{\Delta}\log\frac{1}{\varepsilon})$, which satisfies $\lvert P_{\Delta,\varepsilon}^{\prime}(x-c)\rvert\leq 1$ for all $x\in[-1,1]$ and $\lvert\operatorname{stp}(x-c)-P_{\Delta,\varepsilon}^{\prime}(x-c)\rvert\leq\frac{\varepsilon}{2}$ for all $x\in[-1,1]\setminus(c-\frac{\Delta}{2},c+\frac{\Delta}{2})$. However, $P_{\Delta,\varepsilon}^{\prime}(x-c)$ has no definite parity and thus cannot be implemented directly by a PQC as in Corollary S2. Since only $x\in[0,1]$ is relevant, for $c\in(0,1)$ we define the even polynomial

P_{c,\Delta,\varepsilon}^{\text{even}}(x)=\frac{1}{1+\frac{\varepsilon}{2}}\left(P_{\Delta,\varepsilon}^{\prime}(x-c)+P_{\Delta,\varepsilon}^{\prime}(-x-c)\right)

(D.53)

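such that $\lvert P_{c,\Delta,\varepsilon}^{\text{even}}(x)\rvert\leq 1$ for all $x\in[-1,1]$ and $\lvert\operatorname{stp}(x-c)-P_{c,\Delta,\varepsilon}^{\text{even}}(x)\rvert\leq\varepsilon$ for all $x\in[0,1]\setminus(c-\frac{\Delta}{2},c+\frac{\Delta}{2})$; there, each of the two summands in Eq. D.53 deviates from its target by at most $\frac{\varepsilon}{2}$, and the normalization keeps the total deviation at most $\frac{\varepsilon}{1+\varepsilon/2}\leq\varepsilon$. The piecewise-constant function $D(x)$ can be written as a combination of step functions,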

D(x)=\sum_{k=1}^{K-1}\frac{1}{K}\operatorname{stp}\bigl{(}x-\frac{k}{K}+\frac{\Delta}{2}\bigr{)}.

(D.54)
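The decomposition in Eq. D.54 can be checked with exact step functions, before any polynomial approximation. The sketch below samples points across each localized interval and compares the step sum with $\frac{k}{K}$; the names and the values $K=5$, $\Delta=0.05$ are illustrative.

```python
def stp(y):
    """Step function stp(y) = sgn(y)/2 + 1/2, with stp(0) = 1/2."""
    return 1.0 if y > 0 else (0.5 if y == 0 else 0.0)


def D_from_steps(x, K, delta):
    """Right-hand side of Eq. D.54: (1/K) * sum_{k=1}^{K-1} stp(x - k/K + delta/2)."""
    return sum(stp(x - k / K + delta / 2) for k in range(1, K)) / K


def localized_interval(k, K, delta):
    """Endpoints of [k/K, (k+1)/K - delta * 1_{k<K-1}]."""
    return k / K, (k + 1) / K - (delta if k < K - 1 else 0.0)


K, delta = 5, 0.05                      # delta < 1/(3K), as required
for k in range(K):
    left, right = localized_interval(k, K, delta)
    samples = [left + t * (right - left) for t in (0.0, 0.5, 1.0)]   # endpoints and midpoint
    print(k / K, [D_from_steps(x, K, delta) for x in samples])       # every entry equals k/K
```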

For each $k=1,\ldots,K-1$, we then take the even polynomial $P_{c_{k},\Delta,\varepsilon}^{\text{even}}(x)$ with $c_{k}=\frac{k}{K}-\frac{\Delta}{2}$, which approximates $\operatorname{stp}\bigl{(}x-\frac{k}{K}+\frac{\Delta}{2}\bigr{)}$. Combining these polynomials as in Eq. D.54 yields the following lemma.

Lemma S12.

Given $K\in{{\mathbb{N}}}$ and $\Delta\in(0,\frac{1}{3K})$ , there exists an even polynomial $P_{\Delta,\varepsilon}(x)$ of degree $n=O(\frac{1}{\Delta}\log\frac{K}{\varepsilon})$ that satisfies

1.

$\lvert P_{\Delta,\varepsilon}(x)\rvert\leq 1$ for all $x\in[-1,1],$
2.

$\lvert D(x)-P_{\Delta,\varepsilon}(x)\rvert\leq\varepsilon$ for all $x\in\bigcup_{k=0}^{K-1}\bigl{[}\frac{k}{K},\frac{k+1}{K}-\Delta\cdot 1_{k<K-1}\bigr{]}$.

Note that we could shift the polynomial $P_{\Delta,\varepsilon}(x)$ such that $P_{\Delta,\varepsilon}(x)-D(x)\in(0,\varepsilon)$ without changing the degree. It follows that we can construct a PQC to implement the polynomial $P_{\Delta,\varepsilon}(x)$ by Corollary S2.
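The construction behind Lemma S12 can be illustrated numerically. As a stand-in for the polynomial of Lemma S11, the sketch uses a Chebyshev interpolant of $\operatorname{erf}(ky)$ built with NumPy's `Chebyshev.interpolate`; this is a swapped-in technique chosen for convenience, not the construction of [54], and its degree is far larger than the bound in Lemma S12. It then symmetrizes the polynomial as in Eq. D.53 and combines the even step polynomials as in Eq. D.54. The parameters $k$, `deg`, $K$, $\Delta$, and $\varepsilon$ are illustrative choices.

```python
import math

import numpy as np
from numpy.polynomial import Chebyshev


def odd_sign_poly(k, deg):
    """Stand-in for the odd polynomial of Lemma S11 (NOT the construction of [54]).

    Chebyshev interpolant of erf(k * y) on [-2, 2]; this interval contains every argument
    x - c and -x - c used below. erf is odd and the Chebyshev nodes are symmetric, so the
    interpolant is odd up to rounding.
    """
    erf = np.vectorize(math.erf)
    return Chebyshev.interpolate(lambda y: erf(k * y), deg, domain=[-2.0, 2.0])


def even_step_poly(P, c, eps):
    """Eq. D.53: (P'(x - c) + P'(-x - c)) / (1 + eps/2), where P'(y) = P(y)/2 + 1/2."""
    step = lambda y: 0.5 * P(y) + 0.5
    return lambda x: (step(x - c) + step(-x - c)) / (1.0 + eps / 2.0)


def localization_poly(K, delta, eps, k=40.0, deg=1001):
    """Combine even step polynomials as in Eq. D.54 to approximate D(x).

    The defaults k = 40 and deg = 1001 are hand-picked for K = 3, delta = 0.1, eps = 0.05.
    """
    P = odd_sign_poly(k, deg)
    terms = [even_step_poly(P, j / K - delta / 2, eps) for j in range(1, K)]
    return lambda x: sum(t(x) for t in terms) / K


K, delta, eps = 3, 0.1, 0.05
P_D = localization_poly(K, delta, eps)
for j in range(K):
    right = (j + 1) / K - (delta if j < K - 1 else 0.0)
    pts = np.linspace(j / K, right, 50)
    print(j, np.max(np.abs(P_D(pts) - j / K)))   # worst error on the j-th localized interval
```

Interpolating on $[-2,2]$ rather than $[-1,1]$ matters here: the symmetrized polynomial evaluates the stand-in at $-x-c$, which leaves $[-1,1]$ when $x$ is close to 1.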

Corollary S13.

Given $K\in{{\mathbb{N}}}$ , $\Delta\in(0,\frac{1}{3K})$ and $\varepsilon\in(0,\frac{1}{K})$ , there exists a single-qubit PQC $U_{D}(x)$ of depth $O(\frac{1}{\Delta}\log\frac{K}{\varepsilon})$ that satisfies

\bra{+}U_{D}(x)\ket{+}-\frac{k}{K}\in(0,\varepsilon)\quad\text{if $x\in\Bigl{[}\frac{k}{K},\frac{k+1}{K}-\Delta\cdot 1_{k<K-1}\Bigr{]}$ for $k=0,1,\ldots,K-1$}.

(D.55)

Note that $\varepsilon$ must be smaller than $\frac{1}{K}$, the length of each localized cell, so that the outputs $\frac{k}{K}$ for different cells remain distinguishable. We can further implement this localization procedure for $\bm{x}=(x_{1},\ldots,x_{d})$ on the region $[0,1]^{d}$ by applying the PQC to each $x_{j}$, as stated in the following lemma.

Lemma S14 (Localization via PQC).

Given $K\in{{\mathbb{N}}}$, $\Delta\in(0,\frac{1}{3K})$ and $\varepsilon\in(0,\frac{1}{K})$, there exists a PQC $W_{D}(\bm{x})$ of width $O(d)$ and depth $O(\frac{1}{\Delta}\log\frac{K}{\varepsilon})$ implementing a localization function $f_{W_{D}}(\bm{x})\colon{{\mathbb{R}}}^{d}\to{{\mathbb{R}}}^{d}$ such that

\bm{0}\leq f_{W_{D}}(\bm{x})-\frac{{\bm{\eta}}}{K}\leq\bm{\varepsilon}\quad\text{if $\bm{x}\in Q_{\bm{\eta}}$,}

(D.56)

where $\bm{0}=(0,\ldots,0)$ and $\bm{\varepsilon}=(\varepsilon,\ldots,\varepsilon)$ are $d$ -dimensional vectors.

Proof.

We construct the $d$-qubit PQC $W_{D}(\bm{x})\coloneqq\bigotimes_{j=1}^{d}U_{D}(x_{j})$, where the single-qubit PQC $U_{D}(x)$ is constructed in Corollary S13. Applying the Hadamard test to each $U_{D}(x_{j})$ yields $f_{U_{D}}(x_{j})\coloneqq\bra{+}U_{D}(x_{j})\ket{+}$. Let $f_{W_{D}}(\bm{x})\coloneqq(f_{U_{D}}(x_{1}),\ldots,f_{U_{D}}(x_{d}))$; by Corollary S13, this function implements the localization function as required. $\square$
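The proof applies the Hadamard test to each $U_{D}(x_{j})$. The following NumPy sketch (dense matrices, illustrative names) simulates the standard Hadamard test and shows that the ancilla's $\langle Z\rangle$ equals $\mathrm{Re}\bra{\psi}U\ket{\psi}$. The $R_{Z}$ example is our own choice and is not the $U_{D}$ of Corollary S13.

```python
import numpy as np


def hadamard_test_expectation(U, psi):
    """<Z> on the ancilla after H, controlled-U, H; this equals Re <psi|U|psi>."""
    dim = U.shape[0]
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    I = np.eye(dim)
    zeros = np.zeros((dim, dim))
    # Ancilla is the most significant qubit: CU = |0><0| (x) I + |1><1| (x) U.
    CU = np.block([[I, zeros], [zeros, U]])
    HI = np.kron(H, I)
    state = HI @ CU @ HI @ np.kron(np.array([1.0, 0.0]), psi)
    Z_anc = np.kron(np.diag([1.0, -1.0]), I)
    return float(np.real(np.vdot(state, Z_anc @ state)))


# Example: a Z rotation probed with |+>, for which <+|R_Z(theta)|+> = cos(theta / 2).
theta = 0.7
RZ = np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
print(hadamard_test_expectation(RZ, plus), np.cos(theta / 2))
```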

D.2 Implementing the Taylor coefficients by PQC

Next, we use PQCs to implement the Taylor coefficients, which is essentially a point-fitting problem. For each ${\bm{\eta}}=(\eta_{1},\ldots,\eta_{d})\in\{0,1,\ldots,K-1\}^{d}$ and $\bm{\alpha}$, we denote $\xi_{{\bm{\eta}},\bm{\alpha}}\coloneqq\frac{\partial^{\bm{\alpha}}f(\frac{{\bm{\eta}}}{K})}{\bm{\alpha}!}\in[-1,1]$. We then construct the following PQC,

U_{co}^{\bm{\alpha}}=\sum_{{\bm{\eta}}}\lvert{\bm{\eta}}\rangle\!\langle{\bm{\eta}}\rvert\otimes R_{X}(\theta_{\bm{\eta},\bm{\alpha}}),

(D.57)

where $\ket{{\bm{\eta}}}=\ket{\eta_{1}}\otimes\cdots\otimes\ket{\eta_{d}}$ and $\theta_{\bm{\eta},\bm{\alpha}}=2\arccos(\xi_{\bm{\eta},\bm{\alpha}})$. Since $\bra{0}R_{X}(\theta)\ket{0}=\cos(\theta/2)$, this choice of angles gives the following lemma.

Lemma S15.

Given a $\beta$-Hölder smooth function $f\colon[0,1]^{d}\to[-1,1]$, for any $\bm{\alpha}\in{{\mathbb{N}}}^{d}$ and ${\bm{\eta}}\in\{0,1,\ldots,K-1\}^{d}$, there exists a PQC $U_{co}^{\bm{\alpha}}$ such that

\bra{{\bm{\eta}},0}U_{co}^{\bm{\alpha}}\ket{{\bm{\eta}},0}=\xi_{{\bm{\eta}},% \bm{\alpha}}.

(D.58)

The width of the PQC is $O(d\log K)$ , and the depth is $O(K^{d})$ .

We note that the state $\ket{{\bm{\eta}}}$ can be prepared using basis encoding according to the results of localization in Lemma S14.
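The coefficient PQC in Eq. D.57 is block diagonal, so it can be written out explicitly. In this NumPy sketch, flattening $\bm{\eta}$ into a single integer label and placing the coefficient qubit as the least significant qubit are our own conventions; the names are illustrative.

```python
import numpy as np


def rx(theta):
    """R_X(theta) = exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def coefficient_unitary(xi):
    """U_co = sum_eta |eta><eta| (x) R_X(2 arccos xi_eta), as in Eq. D.57.

    `xi` lists the coefficients xi_{eta, alpha} for a fixed alpha, indexed by the flattened
    label eta; all entries must lie in [-1, 1].
    """
    n = len(xi)
    U = np.zeros((2 * n, 2 * n), dtype=complex)
    for i, v in enumerate(xi):
        U[2 * i:2 * i + 2, 2 * i:2 * i + 2] = rx(2 * np.arccos(v))   # block |eta><eta| (x) R_X
    return U


xi = np.array([0.3, -0.8, 1.0, 0.0])      # e.g. K = 2, d = 2 gives K^d = 4 labels
U = coefficient_unitary(xi)
print(np.real(np.diag(U)[::2]))           # <eta, 0| U_co |eta, 0>, approximately equal to xi
```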

D.3 Implementing multivariate Taylor series by PQC

To implement the multivariate Taylor expansion of a function, we first build a PQC to represent a single term in the Taylor series, which could be done by combining the monomial implementation in Lemma S5 and the Taylor coefficient implementation in Lemma S15. Thus, we have the following corollary.

Corollary S16.

For any $\beta$ -Hölder smooth function $f$ , given an $\bm{\alpha}\in{{\mathbb{N}}}^{d}$ with $\lVert\bm{\alpha}\rVert_{1}\leq s$ for $s\in{{\mathbb{N}}}^{+}$ and an ${\bm{\eta}}\in\{0,1,\ldots,K-1\}^{d}$ , there exists a PQC $U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})$ such that

\bra{\bm{\eta},0}\!\bra{+}^{\otimes d}U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})\ket{\bm{\eta},0}\!\ket{+}^{\otimes d}=\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}\bigl{(}\bm{x}-\frac{\bm{\eta}}{K}\bigr{)}^{\bm{\alpha}}.

(D.59)

The width of the PQC is $O(d\log K)$ , the depth is $O(K^{d}+s)$ , and the number of parameters is at most $K^{d}+s+d$ .

Proof.

Let $U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})\coloneqq U_{co}^{\bm{\alpha}}\otimes U^{\bm{\alpha}}(\bm{x}-\frac{\bm{\eta}}{K})$, where $U_{co}^{\bm{\alpha}}$ is defined in Lemma S15 and $U^{\bm{\alpha}}(\bm{x}-\frac{\bm{\eta}}{K})$ is the PQC of Lemma S5 with the input $\bm{x}$ replaced by $\bm{x}-\frac{\bm{\eta}}{K}$. Since the expectation value of a tensor product of operators in a product state is the product of the individual expectation values, the corollary follows directly from Lemma S5 and Lemma S15. $\square$

Next, we combine the single Taylor terms to implement the truncated Taylor expansion of the target function. Following the same approach as in Theorem 1, we use LCU to implement the (unnormalized) operator

U_{t}(\bm{x})\coloneqq\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x}).

(D.60)

Then we can implement the Taylor expansion of the function $f$ at point $\frac{\bm{\eta}}{K}$ as

\bra{\bm{\eta},0}\!\bra{+}^{\otimes d}U_{t}(\bm{x})\ket{\bm{\eta},0}\!\ket{+}^{\otimes d}=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}\bigl{(}\bm{x}-\frac{\bm{\eta}}{K}\bigr{)}^{\bm{\alpha}}.

(D.61)

Hence we have the following lemma.

Lemma S17.

Given a function $f\in{\cal H}^{\beta}([0,1]^{d},1)$ with $\beta=r+s$, $r\in(0,1]$ and $s\in{{\mathbb{N}}}^{+}$, for any $\bm{\eta}\in\{0,\ldots,K-1\}^{d}$, there exists a PQC $W_{e}(\bm{x},\frac{\bm{\eta}}{K})$ such that $f_{W_{e}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{e}(\bm{x})Z^{(0)}W_{e}(\bm{x})\ket{0}$ implements the truncated Taylor expansion at point $\frac{\bm{\eta}}{K}$,

f_{W_{e}}(\bm{x})=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}\bigl{(}\bm{x}-\frac{\bm{\eta}}{K}\bigr{)}^{\bm{\alpha}}.

(D.62)

The depth of the PQC is $O(s^{2}d^{s}K^{d}(\log s+s\log d+d\log K))$ , the width is $O(d\log K+\log s+s\log d)$ , and the number of parameters is $O(sd^{s}(s+d+K^{d}))$ .

Proof.

The construction of the PQC $W_{e}(\bm{x},\frac{\bm{\eta}}{K})$ is similar to that of $W_{p}(\bm{x})$ in Theorem 1. The only difference is that here we apply LCU to the unitaries $U^{\bm{\alpha}}_{\bm{\eta}}(\bm{x})\coloneqq U_{co}^{\bm{\alpha}}\otimes U^{\bm{\alpha}}(\bm{x}-\frac{\bm{\eta}}{K})$ instead of $U^{\bm{\alpha}}(\bm{x})$. Thus, the controlled unitary is

U_{c}\bigl{(}\bm{x},\frac{\bm{\eta}}{K}\bigr{)}=\sum_{j=1}^{T}\lvert j\rangle\!\langle j\rvert\otimes U^{\bm{\alpha}^{(j)}}_{\bm{\eta}}(\bm{x})

(D.63)

and the unitary $W_{lcu}(\bm{x},\frac{\bm{\eta}}{K})=(F^{\dagger}\otimes I)U_{c}(\bm{x},\frac{\bm{\eta}}{K})(F\otimes I)$ satisfies

\bra{0}\!\bra{\bm{\eta},0}\!\bra{+}^{\otimes d}W_{lcu}\bigl{(}\bm{x},\frac{\bm{\eta}}{K}\bigr{)}\ket{0}\!\ket{\bm{\eta},0}\!\ket{+}^{\otimes d}=\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}(\bm{x}-\frac{\bm{\eta}}{K})^{\bm{\alpha}}.

(D.64)
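The LCU step can be checked on small dense matrices. With $F$ a tensor product of Hadamard gates, the index-register block $\bra{0}W_{lcu}\ket{0}$ is the average $\frac{1}{T}\sum_{j}U_{j}$. Eq. D.64 is written without the factor $\frac{1}{T}$; this sketch assumes that normalization is handled as in the construction of Theorem 1, which is not reproduced in this section. Restricting $T$ to powers of two is a simplification of the sketch.

```python
import numpy as np


def lcu_block(unitaries):
    """Index-register block <0| W_lcu |0> for W_lcu = (F^dag (x) I) U_c (F (x) I).

    U_c = sum_j |j><j| (x) U_j and F is a tensor product of Hadamard gates, so the block
    equals (1/T) * sum_j U_j. In this sketch T must be a power of two.
    """
    T = len(unitaries)
    m = T.bit_length() - 1
    if 2**m != T:
        raise ValueError("this sketch assumes T is a power of two")
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    F = np.array([[1.0]])
    for _ in range(m):
        F = np.kron(F, H)
    dim = unitaries[0].shape[0]
    Uc = np.zeros((T * dim, T * dim), dtype=complex)
    for j, Uj in enumerate(unitaries):
        Uc[j * dim:(j + 1) * dim, j * dim:(j + 1) * dim] = Uj   # |j><j| (x) U_j
    I = np.eye(dim)
    W = np.kron(F.conj().T, I) @ Uc @ np.kron(F, I)
    return W[:dim, :dim]


rng = np.random.default_rng(1)
Us = [np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))[0] for _ in range(4)]
print(np.allclose(lcu_block(Us), sum(Us) / len(Us)))   # True
```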

We then apply the Hadamard test on $W_{lcu}(\bm{x},\frac{\bm{\eta}}{K})$ , giving the quantum circuit $W_{e}(\bm{x},\frac{\bm{\eta}}{K})$ as below

\Qcircuit @C=1em @R=0.5em {
\lstick{\ket{0}} & \qw & \gate{H} & \qw & \ctrl{1} & \qw & \gate{H} & \qw \\
\lstick{\ket{0}} & {/} \qw & \qw & \qw & \multigate{3}{W_{lcu}} & \qw & \qw & \qw \\
\lstick{\ket{0}} & {/} \qw & \gate{U(\bm{\eta})} & \qw & \ghost{W_{lcu}} & \qw & \qw & \qw \\
\lstick{\ket{0}} & \qw & \qw & \qw & \ghost{W_{lcu}} & \qw & \qw & \qw \\
\lstick{\ket{0}} & {/} \qw & \gate{H^{\otimes d}} & \qw & \ghost{W_{lcu}} & \qw & \qw & \qw
}

where the unitary $U(\bm{\eta})$ takes $\bm{\eta}$ as input and maps $\ket{0}$ to $\ket{\bm{\eta}}$. The controlled unitary $U_{c}(\bm{x},\frac{\bm{\eta}}{K})$ can be implemented by $O(T(s+1))$ $(\log T)$-qubit controlled gates and $O(TK^{d})$ $(\log T+d\log K)$-qubit controlled gates. An $n$-qubit controlled gate can be implemented by a quantum circuit of depth $O(n)$ consisting of CNOT gates and single-qubit gates [49]. Thus $U_{c}(\bm{x},\frac{\bm{\eta}}{K})$ can be implemented by a quantum circuit of depth $O((s+1)T\log T+TK^{d}(\log T+d\log K))$ and width $O(d+\log T+d\log K)$. The depth and width of $W_{lcu}(\bm{x},\frac{\bm{\eta}}{K})=(F^{\dagger}\otimes I)U_{c}(\bm{x},\frac{\bm{\eta}}{K})(F\otimes I)$ are of the same order as those of $U_{c}(\bm{x},\frac{\bm{\eta}}{K})$, since $F$ is simply a tensor product of Hadamard gates. Therefore, the depth of the entire circuit $W_{e}$ is $O(sT\log T+TK^{d}(\log T+d\log K))$ and its width is $O(d+\log T+d\log K)$. Since $T\leq(s+1)d^{s}$, we obtain the depth and width stated in Lemma S17. Finally, the number of parameters in the PQC equals the number of parameters in $U_{c}(\bm{x},\frac{\bm{\eta}}{K})$, which is $O(T(s+d+K^{d}))$. $\square$
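The counting in this proof can be reproduced in a few lines. The sketch assumes that $T$ is the number of multi-indices with $\lVert\bm{\alpha}\rVert_{1}\leq s$, namely $\binom{s+d}{d}$, and that logarithms are base 2; it checks $T\leq(s+1)d^{s}$ for small cases and evaluates the leading-order expressions with constants dropped.

```python
import math


def num_taylor_terms(d, s):
    """Number of multi-indices alpha in N^d with |alpha|_1 <= s, i.e. binom(s + d, d)."""
    return math.comb(s + d, d)


def taylor_pqc_resources(d, s, K):
    """Leading-order depth, width and parameter counts from the proof of Lemma S17 (no constants)."""
    T = num_taylor_terms(d, s)
    log_T = math.log2(T)
    depth = s * T * log_T + T * K**d * (log_T + d * math.log2(K))
    width = d + log_T + d * math.log2(K)
    params = T * (s + d + K**d)
    return {"T": T, "depth": depth, "width": width, "params": params}


print(all(num_taylor_terms(d, s) <= (s + 1) * d**s for d in range(1, 7) for s in range(6)))   # True
print(taylor_pqc_resources(d=3, s=2, K=4))
```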

Finally, we combine localization with the Taylor series implementation to obtain a local Taylor expansion of the target function. The resulting PQC has a nested structure consisting of a PQC for localization and a PQC for the Taylor series; the detailed construction is given in the following theorem.

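Theorem 4.

Given a function $f\in{\cal H}^{\beta}([0,1]^{d},1)$ with $\beta=r+s$, $r\in(0,1]$ and $s\in{{\mathbb{N}}}^{+}$, and given $K\in{{\mathbb{N}}}$ and $\Delta\in(0,\frac{1}{3K})$, there exists a nested PQC $W_{t}(\bm{x})$ such that $f_{W_{t}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{t}(\bm{x})Z^{(0)}W_{t}(\bm{x})\ket{0}$ satisfies

\lvert f(\bm{x})-f_{W_{t}}(\bm{x})\rvert\leq d^{s+\beta/2}K^{-\beta}

(D.65)

for all $\bm{x}\in\bigcup_{\bm{\eta}}Q_{\bm{\eta}}$.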

Proof.

By Lemma S10, we have the following error bound for $\bm{x}\in Q_{\bm{\eta}}$ ,

\Big{\lvert}f(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{{\bm{\eta}}}{K})}{\bm{\alpha}!}(\bm{x}-\frac{{\bm{\eta}}}{K})^{\bm{\alpha}}\Big{\rvert}\leq d^{s}\big{\lVert}\bm{x}-\frac{{\bm{\eta}}}{K}\big{\rVert}^{\beta}_{2}\leq d^{s+\beta/2}K^{-\beta}.

(D.66)

Motivated by this, we first construct a localization PQC $W_{D}(\bm{x})$ as in Lemma S14 such that

\bm{0}\leq f_{W_{D}}(\bm{x})-\frac{\bm{\eta}}{K}\leq\Bigl{(}\frac{1}{2K},\ldots,\frac{1}{2K}\Bigr{)}\quad\text{if $\bm{x}\in Q_{\bm{\eta}}$}.

(D.67)

Choosing $\varepsilon=\frac{1}{2K}$ in Lemma S14 gives $\log\frac{K}{\varepsilon}=\log(2K^{2})=O(\log K)$, so the depth of $W_{D}(\bm{x})$ is $O(\frac{1}{\Delta}\log K)$. We then construct a PQC

W_{t}(\bm{x})\coloneqq W_{e}(\bm{x},f_{W_{D}}(\bm{x})),

(D.68)

where $W_{e}$ is the PQC proposed in Lemma S17. The state $\ket{\bm{\eta}}$ in Lemma S17 can be prepared by rounding $f_{W_{D}}(\bm{x})K$ down, i.e., $\bm{\eta}=\lfloor f_{W_{D}}(\bm{x})K\rfloor$, which recovers the correct index because $\bm{0}\leq f_{W_{D}}(\bm{x})K-\bm{\eta}\leq(\frac{1}{2},\ldots,\frac{1}{2})$ by Eq. D.67. Hence, the PQC $W_{t}(\bm{x})$ has a nested structure consisting of a PQC for localization and a PQC for the Taylor series implementation. We now show that $f_{W_{t}}(\bm{x})\coloneqq\bra{0}W^{\dagger}_{t}(\bm{x})Z^{(0)}W_{t}(\bm{x})\ket{0}$ approximates the $\beta$-Hölder smooth function $f$ on $\bigcup_{\bm{\eta}}Q_{\bm{\eta}}$. By the triangle inequality and Eq. D.66, we have

\lvert f(\bm{x})-f_{W_{t}}(\bm{x})\rvert\leq\Big{\lvert}f_{W_{t}}(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}\bigl{(}\bm{x}-\frac{\bm{\eta}}{K}\bigr{)}^{\bm{\alpha}}\Big{\rvert}+d^{s}\big{\lVert}\bm{x}-\frac{\bm{\eta}}{K}\big{\rVert}_{2}^{\beta}

(D.69)

\leq\Big{\lvert}f_{W_{t}}(\bm{x})-\sum_{\lVert\bm{\alpha}\rVert_{1}\leq s}\frac{\partial^{\bm{\alpha}}f(\frac{\bm{\eta}}{K})}{\bm{\alpha}!}\bigl{(}\bm{x}-\frac{\bm{\eta}}{K}\bigr{)}^{\bm{\alpha}}\Big{\rvert}+d^{s+\beta/2}K^{-\beta}

(D.70)

\leq d^{s+\beta/2}K^{-\beta}.

(D.71)

The second inequality follows from $\lVert\bm{x}-\frac{\bm{\eta}}{K}\rVert_{2}\leq\frac{\sqrt{d}}{K}$ for $\bm{x}\in Q_{\bm{\eta}}$. The third holds because $\bm{\eta}=\lfloor f_{W_{D}}(\bm{x})K\rfloor$ recovers the index of the cube containing $\bm{x}$, so by Lemma S17 the first term vanishes. This completes the proof. $\square$
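The rounding step that feeds the localization output into the Taylor PQC can be sketched as follows; the names are illustrative. In exact arithmetic, $\lfloor Kf_{j}\rfloor=\eta_{j}$ whenever $0\leq f_{j}-\frac{\eta_{j}}{K}<\frac{1}{K}$, and Eq. D.67 gives the stronger guarantee $f_{j}-\frac{\eta_{j}}{K}\leq\frac{1}{2K}$. The second helper encodes the bound $\lVert\bm{x}-\frac{\bm{\eta}}{K}\rVert_{2}\leq\frac{\sqrt{d}}{K}$ used above.

```python
import math


def recover_index(f_vals, K):
    """Rounding step eta_j = floor(K * f_j) used to prepare |eta> from f_{W_D}(x)."""
    return tuple(math.floor(K * v) for v in f_vals)


def cell_distance_bound(d, K):
    """sqrt(d)/K bounds ||x - eta/K||_2 for x in Q_eta, since each coordinate differs by at most 1/K."""
    return math.sqrt(d) / K


K, eta = 8, (0, 3, 7)
offsets = (0.01, 0.25, 0.49)                       # K * (f_j - eta_j / K), all below 1/2
f_vals = [e / K + t / K for e, t in zip(eta, offsets)]
print(recover_index(f_vals, K))                    # (0, 3, 7)
```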

Note that the PQC in Theorem 4 nests two PQCs; for simplicity, we count its depth as the sum of their depths. We have thus established the uniform convergence of PQCs approximating Hölder smooth functions on $[0,1]^{d}$ outside the trifling region $\Lambda(d,K,\Delta)$. The Lebesgue measure of this trifling region is at most $dK\Delta$. We can set $\Delta=K^{-d}$ with essentially no influence on the size of the PQC constructed in Theorem 4. For a measure $\nu$ that is absolutely continuous with respect to the Lebesgue measure, we have the following corollary.

Corollary S18.

\lVert f(\bm{x})-f_{W_{t}}(\bm{x})\rVert^{2}_{L^{2}(\nu)}=\int_{[0,1]^{d}}\bigl{(}f(\bm{x})-f_{W_{t}}(\bm{x})\bigr{)}^{2}\nu(\bm{x})\,\mathrm{d}\bm{x}

=\int_{(\bigcup_{\bm{\eta}}Q_{\bm{\eta}})\cup\Lambda(d,K,\Delta)}\bigl{(}f(\bm{x})-f_{W_{t}}(\bm{x})\bigr{)}^{2}\nu(\bm{x})\,\mathrm{d}\bm{x}

(D.72)

\leq(d^{s+\beta/2}K^{-\beta})^{2}+4dK^{1-d}.

(D.73)

The width of the PQC is $O(d\log K+\log s+s\log d)$, the depth is $O\bigl{(}s^{2}d^{s}K^{d}(\log s+s\log d+d\log K)+\frac{1}{\Delta}\log K\bigr{)}$, and the number of parameters is $O(sd^{s}(s+d+K^{d})+\frac{d}{\Delta}\log K)$.
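The measure of the trifling region used above can be computed exactly and checked by Monte Carlo sampling. Per coordinate the localized intervals have total length $1-(K-1)\Delta$, so $\lvert\Lambda(d,K,\Delta)\rvert=1-(1-(K-1)\Delta)^{d}\leq d(K-1)\Delta\leq dK\Delta$ by Bernoulli's inequality. The sketch below uses Python's standard `random` module with a fixed seed; the helper names are illustrative.

```python
import random


def trifling_measure(d, K, delta):
    """Exact Lebesgue measure of Lambda(d, K, delta) = [0,1]^d minus the union of all Q_eta."""
    return 1.0 - (1.0 - (K - 1) * delta) ** d


def in_trifling(x, K, delta):
    """True if some coordinate falls in a gap ((k+1)/K - delta, (k+1)/K) with k < K - 1."""
    for xj in x:
        k = min(int(xj * K), K - 1)
        if k < K - 1 and xj > (k + 1) / K - delta:
            return True
    return False


def monte_carlo_measure(d, K, delta, samples=100_000, seed=0):
    rng = random.Random(seed)
    hits = sum(in_trifling([rng.random() for _ in range(d)], K, delta) for _ in range(samples))
    return hits / samples


d, K = 3, 4
delta = K ** (-d)                                      # the choice Delta = K^{-d}
print(trifling_measure(d, K, delta), d * K * delta)    # exact measure vs. the bound d K Delta
```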

D.4 Comparison of “global” and “local” approaches in this work

We have presented two distinct methods for constructing PQC models with the universal approximation property (UAP) for continuous functions. In Theorem 3 and Theorem 4, we construct PQC models guided by multivariate Bernstein polynomials and by multivariate Taylor expansions, respectively. We refer to these approaches as “global” and “local”, respectively. We now compare the two strategies for approximating Lipschitz continuous functions. In the following analysis, we set $\beta=1$, and thus $s=0$ in Theorem 4, in accordance with the Lipschitz continuity of the target function.

The approximation error of the global approach is bounded by ${(2^{d}d\ell^{2})}/{(n\varepsilon^{2})}+\varepsilon$. Choosing $n=(2^{d}d\ell^{2})/{\varepsilon^{3}}$ ensures an approximation error of $2\varepsilon$, and the corresponding number of trainable parameters scales as $O\bigl{(}2^{d^{2}}d^{d+1}\ell^{2d}/\varepsilon^{3d}\bigr{)}$. In contrast, the approximation error of the local approach scales as $\sqrt{d}K^{-1}+\varepsilon$. Setting $K=\sqrt{d}/\varepsilon$ ensures an approximation error of $2\varepsilon$, with the number of trainable parameters scaling as $O\left(d^{d/2}/\varepsilon^{d}\right)$. These results highlight the advantage of the local approach for approximating continuous functions. More importantly, the approximation error of the local method approaches the optimal convergence rate established by Shen et al. [22]. A formal comparison between PQCs and classical deep neural networks is given in Appendix E.
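The arithmetic behind this comparison can be reproduced in a few lines. The sketch drops all constants hidden in the $O(\cdot)$ notation, so only the scaling is meaningful; the example values $d=2$, $\varepsilon=0.1$, $\ell=1$ are illustrative.

```python
import math


def global_params(d, eps, ell):
    """Bernstein ("global") route: n = 2^d d ell^2 / eps^3, parameters ~ d n^d."""
    n = 2**d * d * ell**2 / eps**3
    return n, d * n**d


def local_params(d, eps):
    """Taylor ("local") route with s = 0, beta = 1: K = sqrt(d) / eps, parameters ~ K^d."""
    K = math.sqrt(d) / eps
    return K, K**d


d, eps, ell = 2, 0.1, 1.0
n, p_global = global_params(d, eps, ell)
K, p_local = local_params(d, eps)
print(p_global, p_local)   # roughly 1.28e8 versus 200
```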

Appendix E Comparison with related works in classical machine learning

Table S1: Approximation errors of PQCs and ReLU FNNs

Approach	Target	Width	Depth	Number of parameters	Approximation error
PQC	$d$ -var. deg.- $s$ monomial	$O(d)$	$O(s)$	$O(d+s)$	$0$
ReLU FNN [21]	$d$ -var. deg.- $s$ monomial	$O(N+s)$	$O(s^{2}M)$	$O((N^{2}+s^{2})s^{2}M)$	$O(sN^{-sM})$
Nested PQC	$C_{u}^{s}([0,1]^{d})$	$O(d\log K+s\log d)$	$O(K^{d}d^{s})$	$O(K^{d}d^{s+1})$	$O(d^{2s}K^{-s})$
$\text{ReLU}\text{ FNN}^{\mathrm{i}}$ [21]	$C_{u}^{s}([0,1]^{d})$	$O(s^{d+1}N)$	$O(s^{2}M)$	$O(s^{2d+4}K^{d/2}N)$	$O(s^{d}8^{s}K^{-s})$

$^{\mathrm{i}}$ Satisfying $NM=\Theta(K^{d/2})$.

In this appendix, we compare PQCs and classical deep neural networks in terms of model size, number of trainable parameters, and approximation error. As a benchmark, we consider deep feed-forward neural networks (FNNs) with rectified linear unit (ReLU) activation functions. FNNs are the foundational class of neural networks: information flows in one direction, from the input layer through one or more hidden layers to the output layer, so the network contains no cycles or loops. The ReLU activation function, defined as $\text{ReLU}(x)\coloneqq\max(x,0)$, is widely used in many domains, including image recognition [70, 71] and natural language processing [72, 73]. Its popularity in feed-forward networks stems from its effectiveness in facilitating convergence during training. Moreover, a recent study [74] has shown that classical neural networks with commonly used activation functions can be approximated by ReLU networks with only a mild increase in network size. Readers are also referred to other works on ReLU networks [16, 75, 76].

In particular, Shen et al. [22] established the optimal approximation error for approximating Lipschitz functions, and Lu et al. [21] provided a nearly optimal approximation error for approximating smooth functions using ReLU FNNs. Table S1 summarizes the comparison of our results with theirs. In most practical settings, the smoothness index $s$ of the target function is modest, since most functions to be approximated are not very smooth. Moreover, in practical applications such as image recognition and natural language processing, the input dimension $d$ is large. Accordingly, in Table S1 we treat terms that depend only on $s$ as constants and assume $d\gg s$.

We further quantify the model size and the number of parameters of PQCs and FNNs for approximating $s$-smooth functions in $C^{s}_{u}([0,1]^{d})$. We find that when the target function satisfies certain smoothness conditions, PQCs approximate it with a notably smaller model size and fewer parameters.

Model size. We compare the model sizes of PQCs and FNNs that achieve the same approximation error $\varepsilon$ (e.g., a constant). As a simple measure of model size, we use the product of width and depth. For approximation error $\varepsilon$, the sizes of the PQC and the FNN scale as $O(K_{Q}^{d}d^{s+1})$ and $O(K_{C}^{d/2}s^{d+3})$, respectively, where $K_{Q}=\Theta(d^{2}/\varepsilon^{1/s})$ and $K_{C}=\Theta(s^{d/s}/\varepsilon^{1/s})$ are obtained by setting the errors $O(d^{2s}K^{-s})$ and $O(s^{d}8^{s}K^{-s})$ in Table S1 equal to $\varepsilon$.

When $2\leq s<d$, the ratio of the model sizes of PQCs and FNNs [21] scales as $O(\varepsilon^{-d/(2s)}/s^{d^{2}-d\log_{s}d})$. Hence, when this smoothness condition holds, PQCs have a significantly smaller model size than FNNs.

Number of trainable parameters. We next compare the numbers of trainable parameters of PQCs and FNNs that achieve comparable approximation errors. In approximation theory, the number of parameters is a standard measure of a model's degrees of freedom and expressiveness. For approximation error $\varepsilon$, the numbers of trainable parameters of the PQC and the FNN scale as $O(K_{Q}^{d}d^{s+1})$ and $O(K_{C}^{(1+\lambda_{0})d/2}s^{2d+4})$, respectively. Here, the hyperparameter $\lambda_{0}\in(0,1)$ controls the FNN width through $N=\Theta(K_{C}^{\lambda_{0}d/2})$.

When $2\leq s<d$, the ratio of the numbers of trainable parameters of PQCs and FNNs [21] scales as $O(\varepsilon^{-(1-\lambda_{0})d/(2s)}/s^{(1+\lambda_{0})d^{2}-d\log_{s}d})$. Consequently, PQCs require significantly fewer trainable parameters than FNNs.

Approximating monomials. We also compare PQCs and FNNs for approximating monomials of degree $s$. For this class of target functions, PQCs have clear advantages in width, depth, model size, and the number of trainable parameters. Moreover, PQCs represent monomials exactly, with zero approximation error, as shown in Table S1. These advantages make PQCs promising candidates for outperforming FNNs on more complex target function spaces.
