Abstract
In certain polytopal domains \(\varOmega \), in space dimension \(d=2,3\), we prove exponential expressivity with stable ReLU Neural Networks (ReLU NNs) in \(H^1(\varOmega )\) for weighted analytic function classes. These classes comprise in particular solution sets of source and eigenvalue problems for elliptic PDEs with analytic data. Functions in these classes are locally analytic on open subdomains \(D\subset \varOmega \), but may exhibit isolated point singularities in the interior of \(\varOmega \) or corner and edge singularities at the boundary \(\partial \varOmega \). The exponential approximation rates are shown to hold in space dimension \(d = 2\) on Lipschitz polygons with straight sides, and in space dimension \(d=3\) on Fichera-type polyhedral domains with plane faces. The constructive proofs indicate that NN depth and size increase poly-logarithmically with respect to the target NN approximation accuracy \(\varepsilon >0\) in \(H^1(\varOmega )\). The results cover solution sets of linear, second-order elliptic PDEs with analytic data and certain nonlinear elliptic eigenvalue problems with analytic nonlinearities and singular, weighted analytic potentials as arise in electron structure models. Here, the functions correspond to electron densities that exhibit isolated point singularities at the nuclei.
1 Introduction
The application of deep neural networks (DNNs) as an approximation architecture in numerical solution methods for partial differential equations (PDEs), possibly on high-dimensional parameter and state spaces, has attracted increasing attention in recent years. An incomplete list of recently proposed algorithmic approaches includes [11, 45, 46, 52, 54] and the references therein. In these works, DNN-based approaches for the numerical approximation of solutions of elliptic and parabolic boundary value problems are proposed. Two key ingredients in these approaches are: (a) the use of DNNs as an approximation architecture for the numerical approximation of solutions (thus using DNNs in place of, e.g., finite element, finite volume or finite difference methods), and (b) the incorporation of a suitable weak form of the PDE of interest into the loss function of the DNN training. For example, weak residuals, least-squares functionals or, for variational formulations from continuum mechanics, total potential energies in variational principles [11] have been proposed.
In the study of NNs as numerical methods for solving PDEs, three types of errors are usually identified. After fixing a NN architecture and activation function, the approximation error indicates how well the PDE solution can be approximated by NNs with that architecture. An additional error is incurred because the NN must be trained on only a finite amount of possibly corrupted data about the PDE solution. This contribution to the overall error, which is most pronounced where the given data are uninformative, is the generalization error; it comes in addition to further errors caused by the training algorithm, which constitute the optimization error. In this paper, we study the approximation error of deep ReLU neural networks.
Good performance of these computational approaches requires the DNNs to achieve a high rate of approximation uniformly over the solution set associated with the PDE under consideration. This is analogous to what has been found in the mathematical convergence rate analysis of, e.g., finite element methods: convergence rate bounds are well known to be related, via stability and quasi-optimality, to the approximability of solution sets of PDEs from the finite element spaces under consideration. Since numerical solutions are (generally oblique) projections of the unknown solution onto finite-dimensional subspaces, the convergence rates are naturally determined by the approximation rates of the subspace families under consideration within the regularity classes of the PDE. For elliptic boundary value and eigenvalue problems, function classes of (weighted) Sobolev or Besov type are well known to describe both solution regularity and approximation rates.
For functions belonging to a smoothness space of finite differentiation order, such as continuously differentiable, Sobolev-regular, or Besov-regular functions on a bounded domain, upper bounds for algebraic approximation rates by NNs were established, for example, in [9, 10, 16, 32, 55, 57, 58]. Here, we mention only results that use the ReLU activation function. Moreover, for PDEs posed on high-dimensional domains in particular, approximation rates for solutions that go beyond classical smoothness-based results were established in [5, 12, 26, 29, 51]. Again, we confine the list to publications with approximation rates for NNs with the ReLU activation function (referred to as ReLU NNs below).
In the present paper, we prove that exponential approximation rates are achieved by deep ReLU NNs for weighted, analytic solution classes of linear and nonlinear elliptic source and eigenvalue problems on polygonal and polyhedral domains. Mathematical results on weighted analytic regularity [2, 6, 8, 17–20, 24, 35, 38, 39] imply that these classes consist of functions that are analytic with possible corner, edge, and corner-edge singularities.
In contrast to the previously mentioned approximation results for ReLU NNs, the function class studied here is special in that it admits very high regularity on most of the domain, except at designated locations, namely the edges and corners of the domain, where the regularity may be very low. An approximation scheme realizing the exponential approximation rates associated with analytic regularity therefore hinges on a successful resolution of these singularities. We will see that, in addition to emulating local polynomial approximation, the presented scheme is strongly adapted to the potentially complex geometries of the underlying domains.
Our analysis provides, for the aforementioned functions, approximation errors in Sobolev norms that decay exponentially in terms of the number of parameters M of the ReLU NNs.
1.1 Contribution
The principal contribution of this work is threefold:
1.
We prove, in Theorem 4.3, a general result on the approximation by ReLU NNs of weighted analytic function classes on \(Q {:}{=}(0,1)^d\), where \(d = 2,3\). The analytic regularity of solutions is quantified via countably normed, analytic classes, based on weighted Sobolev spaces of Kondrat’ev type in Q, which admit corner and, in space dimension \(d=3\), also edge singularities. Such classes were introduced, e.g., in [2, 6, 8, 17–19] and the references therein. We prove exponential expression rates by ReLU NNs in the sense that, for a number M of free parameters of the NNs, the approximation error is bounded, in the \(H^1\)-norm, by \(C\exp (-bM^{1/(2d+1)})\) for constants \(b,C > 0\).
2.
Based on the ReLU NN approximation rate bound of Theorem 4.3, we establish, in Sect. 5, approximation results for solutions of different types of PDEs by NNs with ReLU activation. Concretely, in Sect. 5.1.1, we study the approximation of solutions of nonlinear Schrödinger equations with singular potentials in space dimension \(d=2,3\). We prove that for solutions which are contained in weighted, analytic classes in \(\varOmega = {\mathbb {R}}^d / (2{\mathbb {Z}})^d\), ReLU NNs (whose realizations are continuous, piecewise affine) with at most M free parameters yield an approximation with accuracy of the order \(\exp (-bM^{1/(2d+1)})\) for some \(b>0\). Importantly, this convergence is in the \(H^1(\varOmega )\)-norm. In Sect. 5.1.2, we establish the same exponential approximation rates for the eigenstates of the Hartree–Fock model with singular potential in \({\mathbb {R}}^3\). This result constitutes, to our knowledge, the first mathematical underpinning of the recently reported high efficiency of various NN-based approaches in variational electron structure computations, e.g., [21, 25, 44]. In Sect. 5.2, we demonstrate the same approximation rates also for elliptic boundary value problems with analytic coefficients and analytic right-hand sides in plane, polygonal domains \(\varOmega \). The approximation error of the NNs is, again, bounded in the \(H^1(\varOmega )\)-norm. We also infer an exponential NN expression rate bound for the corresponding traces in \(H^{1/2}(\partial \varOmega )\), and for viscous, incompressible flow. Finally, in Sect. 5.3, we obtain the same asymptotic exponential rates for the approximation of solutions to elliptic boundary value problems, with analytic data, on so-called Fichera-type domains \(\varOmega \subset {{\mathbb {R}}}^3\) (being, roughly speaking, finite unions of tensorized hexahedra). These solutions exhibit corner, edge and corner-edge singularities.
3.
The exponential approximation rates of the ReLU NNs established here are based on emulating corresponding variable grid and degree (“hp”) piecewise polynomial approximations. In particular, our construction comprises novel tensor product hp-approximations on Cartesian products of geometric partitions of intervals. In Theorem A.25, we establish novel tensor product hp-approximation results for weighted analytic functions on Q of the form \(\Vert u - u_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(Q)} \le C \exp (-b\sqrt[2d]{N})\) for \(d=1,2,3\), where N is the number of degrees of freedom in the representation of \(u_{{\mathsf {h}}{\mathsf {p}}}\) and \(C,b>0\) are independent of N (but depend on u). The tensor-product structure of the piecewise polynomial approximations is essential to facilitating the construction of deep ReLU neural networks: our constructive proofs exploit approximate tensorization of ReLU NNs in order to emulate the corresponding piecewise polynomial constructions. The geometric partitions employed in Q and the architectures of the constructed ReLU NNs are of tensor product structure, and generalize to space dimension \(d>3\). Note that hp-approximations based on non-tensor-product, geometric partitions of Q into hexahedra have been studied before, e.g., in [47, 48] in space dimension \(d=3\). There, the rate \(\Vert u - u_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(Q)} \lesssim \exp (-b\sqrt[5]{N})\) was proved. Being based on tensorization, the present construction of exponentially convergent, tensorized hp-approximations in Appendix A does not invoke the rather involved polynomial trace liftings in [47, 48] and is interesting in its own right: the geometric and mathematical simplification comes at the expense of a slightly smaller (still exponential) rate of approximation.
We expect that this construction of \(u_{{\mathsf {h}}{\mathsf {p}}}\) will allow a rather direct derivation of rank bounds for tensor structured function approximation of u in Q, generalizing results in [27, 28] and extending [37] from point to edge and corner-edge singularities.
1.2 Neural Network Approximation of Weighted Analytic Function Classes
Deriving exponential approximation rates for weighted analytic functions on general domains requires the combination of three arguments: First, a novel approximation result of weighted analytic functions on cubes \((0,1)^d\) with corner and/or edge singularities in \(H^1((0,1)^d)\) by tensor product hp-finite elements. Second, a reapproximation scheme for high-dimensional hp-finite elements in \(W^{1,q}\)-norms for \(q \in [1,\infty ]\) by ReLU NNs. Third, a ReLU NN-based approximation scheme on polyhedral domains via a localization method that uses a ReLU NN implementation of a domain-adapted partition of unity.
First, we specifically design tensorized hp-approximations so that they can be emulated by NNs by the reapproximation strategy that we outline below. We then prove exponential convergence of the approximation of weighted analytic functions by the tensorized hp-piecewise polynomials we constructed. Furthermore, in order to estimate the size of the resulting NNs, we need to bound the norms of the coefficients of the hp-projections. Those bounds are usually not a concern when dealing with hp-finite element methods, but they are necessary for our analysis of ReLU NNs. The construction of the hp-projections, the convergence analysis, and the bounds on the coefficients are presented in Theorem 2.1 and developed in Appendix A.
We describe the NN emulation of hp-finite element interpolants and their lifting to general domains in more detail: the emulation of hp-finite element approximations by ReLU NNs is fundamentally based on the approximate multiplication network formalized in [57]. Based on this approximate multiplication operation and an extension thereof to errors measured with respect to \(W^{1,q}\)-norms, for \(q \in [1,\infty ]\), we established in [41] a reapproximation theorem for univariate splines of order \(p\in {\mathbb {N}}\) on arbitrary meshes with \(N\in {\mathbb {N}}\) cells. There, we observed that there exists a NN that reapproximates a variable-order, free-knot spline u in the \(H^1\)-norm up to an error of \(\epsilon >0\) with a number of free parameters that scales logarithmically in \(\epsilon \) and \(|u|_{H^1}\), linearly in N, and quadratically in p. We recall this result in Proposition 3.7 below.
From this, it is apparent by the triangle inequality that, in univariate approximation problems where hp-finite elements yield exponential approximation rates, also ReLU NNs achieve exponential approximation rates (albeit with a possibly smaller exponent, because of the quadratic dependence on p, see [41, Theorem 5.12]).
The extension of this result to higher dimensions for high-order finite elements that are built from univariate finite elements by tensorization is based on the underlying compositionality of NNs. Because of that, it is possible to compose a NN implementing a multiplication of d inputs with d approximations of univariate finite elements. We introduce a formal framework describing these operations in Sect. 3. We can then prove, in Theorem 4.3, approximation rates by ReLU NNs for weighted analytic function classes in cubes.
With Theorem 4.3 established, the next step is to extend the approximation result from cubes to general domains. The ReLU NNs of Theorem 4.3 are continuous functions on \({\mathbb {R}}^d\), and we have little control over the behavior of these functions outside of the cubes. This implies that even on unions of disjoint cubes the approximation results of Theorem 4.3 do not directly transfer by taking sums of the local approximations.
Instead, we first extend Theorem 4.3 to weighted analytic functions defined on Fichera-type domains \((-1,1)^d \setminus (-1,0]^d\) for \(d = 2,3\) by, again, reapproximating a quasi-interpolant on this domain. To extend the results to general polygonal domains for \(d = 2\), we construct an overlapping cover of the domain by affinely transformed cubes or affinely transformed Fichera-type domains, together with a partition of unity subordinate to this cover (Lemma 5.5). We demonstrate that this partition of unity can be exactly represented by ReLU NNs. The localization by this partition of unity reduces the approximation problem locally to one of the previously described approximations on either an affinely transformed cube or an affinely transformed Fichera-type domain. This yields Theorem 5.6, which shows that weighted analytic functions on polygonal domains can be approximated with exponential accuracy with respect to the number of parameters of the underlying neural network.
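As a minimal, self-contained illustration of why continuous, piecewise affine partitions of unity are exactly representable by ReLU NNs (this is not the construction of Lemma 5.5 itself, whose details are given in Sect. 5, and the function names below are ours): a univariate hat function with nodes \(a<m<b\) is an exact linear combination of three ReLU units.

```python
def relu(t: float) -> float:
    return max(t, 0.0)

def hat(x: float, a: float, m: float, b: float) -> float:
    """Piecewise affine hat: 0 outside (a, b), 1 at m, affine in between.
    Exactly three ReLU units -- no approximation error is incurred."""
    return (relu(x - a) / (m - a)
            - (1.0 / (m - a) + 1.0 / (b - m)) * relu(x - m)
            + relu(x - b) / (b - m))
```

Hat functions centered at the nodes of a mesh sum to one between the outermost nodes, yielding a one-dimensional partition of unity; the partition of unity of Lemma 5.5 is built in the same spirit, adapted to the cover of \(\varOmega \).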
1.3 Outline
The manuscript is structured as follows: in Sect. 2, in particular Sect. 2.2, we review the weighted function spaces which will be used to describe the weighted analytic function classes in polytopes \(\varOmega \) that underlie our approximation results. In Sect. 2.3, we present an approximation result by tensor-product hp-finite elements for functions from the weighted analytic class. A proof of this result is provided in Appendix A. In Sect. 3, we review definitions of NNs and a “ReLU calculus” from [12, 43] whose operations will be required in the ensuing NN approximation results.
In Sect. 4, we state and prove the key results of the present paper. In Sect. 5, we illustrate our results by deducing novel NN expression rate bounds for solution classes of several concrete examples of elliptic boundary-value and eigenvalue problems where solutions belong to the weighted analytic function classes introduced in Sect. 2. Some of the more technical proofs of Sect. 5 are deferred to Appendix B. In Sect. 6, we briefly recapitulate the principal mathematical results of this paper and indicate possible consequences and further directions.
2 Setting and Functional Spaces
We start by recalling some general notation that will be used throughout the paper. We also introduce some tools that are required to describe two- and three-dimensional domains as well as the associated weighted Sobolev spaces.
2.1 Notation
For \(\alpha \in {\mathbb {N}}^d_0\), define \({|\alpha |}{:}{=}\alpha _1+\dots +\alpha _d\) and \({|\alpha |_\infty }{:}{=}\max \{\alpha _1, \dots , \alpha _d\}\). When we indicate a relation on \({|\alpha |}\) or \({|\alpha |_\infty }\) in the subscript of a sum, we mean the sum over all multi-indices that fulfill that relation: e.g., for \(k\in {\mathbb {N}}_0\), the sum \(\sum _{{|\alpha |}\le k}\) runs over all \(\alpha \in {\mathbb {N}}^d_0\) such that \({|\alpha |}\le k\).
For a domain \(\varOmega \subset {\mathbb {R}}^d\), \(k\in {\mathbb {N}}_0\) and for \(1\le p\le \infty \), we indicate by \(W^{k,p}(\varOmega )\) the classical \(L^p(\varOmega )\)-based Sobolev space of order k. We write \(H^k(\varOmega ) = W^{k,2}(\varOmega )\). We introduce the norms \(\Vert \cdot \Vert _{W_{\mathrm {mix}}^{1,p}(\varOmega )}\) as
with associated spaces
We denote \(H_{\mathrm {mix}}^1(\varOmega ) = W_{\mathrm {mix}}^{1,2}(\varOmega )\). For \(\varOmega = I_1\times \dots \times I_d\), with bounded intervals \(I_j\subset {\mathbb {R}}\), \(j=1, \dots , d\), \(H_{\mathrm {mix}}^{1}(\varOmega ) = H^1(I_1)\otimes \dots \otimes H^1(I_d)\) with Hilbertian tensor products. Throughout, C will denote a generic positive constant whose value may change at each appearance, even within an equation.
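The displayed definitions of the mixed norms and associated spaces are omitted from this version of the text. Consistent with the tensor-product identity for \(H_{\mathrm {mix}}^{1}\) stated above, they presumably read (for \(1\le p<\infty \); this is a reconstruction, not a quotation):

```latex
\Vert v \Vert_{W^{1,p}_{\mathrm{mix}}(\varOmega)}
  := \Bigg( \sum_{|\alpha|_\infty \le 1}
        \big\Vert D^{\alpha} v \big\Vert_{L^p(\varOmega)}^{p} \Bigg)^{1/p},
\qquad
W^{1,p}_{\mathrm{mix}}(\varOmega)
  := \big\{ v \in L^p(\varOmega) :
        \Vert v \Vert_{W^{1,p}_{\mathrm{mix}}(\varOmega)} < \infty \big\},
```

with the usual modification when \(p=\infty \).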
The \(\ell ^p\)-norm, \(1\le p\le \infty \), on \({\mathbb {R}}^n\) is denoted by \(\left\| x \right\| _{p}\). The number of nonzero entries of a vector or matrix x is denoted by \(\Vert x\Vert _0\).
Three-dimensional domain. Let \(\varOmega \subset {\mathbb {R}}^3\) be a bounded, polyhedral domain. Let \({\mathcal {C}}\) denote a set of isolated points, situated either at the corners of \(\varOmega \) or in the interior of \(\varOmega \) (that we refer to as the singular corners in either case, for simplicity), and let \({\mathcal {E}}\) be a subset of the edges of \(\varOmega \) (the singular edges). Furthermore, denote by \({\mathcal {E}}_c\subset {\mathcal {E}}\) the set of singular edges abutting at a corner \(c\). For each \(c\in {\mathcal {C}}\) and each \(e\in {\mathcal {E}}\), we introduce the following weights:
For \(\varepsilon >0\), we define edge-, corner-, and corner-edge neighborhoods:
We fix a value \({{\hat{\varepsilon }}}>0\) small enough so that \(\varOmega _c^{{{\hat{\varepsilon }}}}\cap \varOmega ^{{{\hat{\varepsilon }}}}_{c'} = \emptyset \) for all \(c\ne c'\in {\mathcal {C}}\) and \(\varOmega ^{{\hat{\varepsilon }}}_{ce} \cap \varOmega ^{{\hat{\varepsilon }}}_{ce'} = \varOmega ^{{\hat{\varepsilon }}}_e \cap \varOmega ^{{\hat{\varepsilon }}}_{e'}=\emptyset \) for all singular edges \(e\ne e'\). In the sequel, we omit the dependence of the neighborhoods on \({{\hat{\varepsilon }}}\). Let also
and
In each subdomain \(\varOmega _{ce}\) and \(\varOmega _e\), for any multi-index \(\alpha \in {\mathbb {N}}_0^3\), we denote by \({\alpha _\parallel }\) the multi-index whose component in the coordinate direction parallel to e is equal to the component of \(\alpha \) in the same direction, and which is zero in every other component. Moreover, we set \({\alpha _\bot }{:}{=}\alpha -{\alpha _\parallel }\).
Two-dimensional domain. Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon. We adopt the convention that \({\mathcal {E}}{:}{=}\emptyset \). For \(c\in {\mathcal {C}}\), we define
As in the three-dimensional case, we fix a sufficiently small \({{\hat{\varepsilon }}}>0\) so that \(\varOmega ^{{{\hat{\varepsilon }}}}_{c}\cap \varOmega ^{{{\hat{\varepsilon }}}}_{c'}=\emptyset \) for \(c\ne c'\in {\mathcal {C}}\) and write \(\varOmega _c= \varOmega _c^{{\hat{\varepsilon }}}\). Furthermore, \(\varOmega _{{\mathcal {C}}}\) is defined as for \(d=3\), and \(\varOmega _0 {:}{=}\varOmega \setminus \overline{\varOmega _{{\mathcal {C}}}}\).
2.2 Weighted Spaces with Nonhomogeneous Norms
We introduce classes of weighted, analytic functions in space dimension \(d = 3\), as they arise in analytic regularity theory for linear, elliptic boundary value problems in polyhedra, in the particular form introduced in [8]. There, the weights are structured in terms of Cartesian coordinates, which is particularly well suited to the presently adopted, tensorized approximation architectures.
The definition of the corresponding classes when \(d=2\) is analogous. For a weight exponent vector \({\underline{\gamma }}= \{\gamma _c, \gamma _e : \, c\in {\mathcal {C}}, e\in {\mathcal {E}}\}\), we introduce the nonhomogeneous, weighted Sobolev norms
where \((x)_+ = \max \{0, x\}\). Moreover, we define the associated function space by
Furthermore,
For \(A, C>0\), we define the space of weighted analytic functions with nonhomogeneous norm as
Finally, we denote
2.3 Approximation of Weighted Analytic Functions on Tensor Product Geometric Meshes
The approximation result of weighted analytic functions via NNs that we present below is based on emulating an approximation strategy of tensor product hp-finite elements. In this section, we present this hp-finite element approximation. Let \(I \subset {\mathbb {R}}\) be an interval. A partition of I into \(N \in {\mathbb {N}}\) intervals is a set \({\mathcal {G}}\) such that \(|{\mathcal {G}}|= N\), all elements of \({\mathcal {G}}\) are disjoint, connected, and open subsets of I, and \(\bigcup _{U\in {\mathcal {G}}} {\overline{U}} = {\overline{I}}\).
We denote, for all \(p\in {\mathbb {N}}_0\), by \({\mathbb {Q}}_p({\mathcal {G}})\) the piecewise polynomials of degree p on \({\mathcal {G}}\).
One specific partition of \(I= (0,1)\) is given by the one-dimensional geometrically graded grid, which for \(\sigma \in (0, 1/2]\) and \(\ell \in {\mathbb {N}}\), is given by
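The displayed formula for \({\mathcal {G}}^L_1\) does not survive in this version of the text; in standard hp-approximation, the geometrically graded grid has nodes \(\{0\}\cup \{\sigma ^{\ell -k} : k = 0, \dots , \ell \}\), i.e., \(\ell +1\) elements refined geometrically toward the singular endpoint \(x=0\). A short sketch under this assumption (the function name is ours):

```python
def geometric_grid(sigma: float, ell: int) -> list[tuple[float, float]]:
    """Geometrically graded partition of (0, 1): nodes
    0 < sigma**ell < ... < sigma < 1, refined toward x = 0."""
    nodes = [0.0] + [sigma ** (ell - k) for k in range(ell + 1)]
    return list(zip(nodes[:-1], nodes[1:]))
```

For \(\sigma = 1/2\) and \(\ell = 3\), this yields the elements \((0,\tfrac18), (\tfrac18,\tfrac14), (\tfrac14,\tfrac12), (\tfrac12,1)\).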
Theorem 2.1
Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\) where \(c\) is one of the corners of Q and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to c when \(d=3\), \({\mathcal {E}}=\emptyset \) when \(d=2\). Further assume given constants \(C_f, A_f>0\), and
Then, there exist \(C_p>0\), \(C_L>0\) such that, for every \(0< \epsilon <1\), there exist \(p, L \in {\mathbb {N}}\) with
so that there exist piecewise polynomials on \(I = (0,1)\)
with \(N_{\mathrm {1d}}= (L+1)p + 1\), and, for all \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f)\) there exists a d-dimensional array of coefficients
such that
1.
For every \(i = 1, \dots , N_{\mathrm {1d}}\), \({{\,\mathrm{supp}\,}}(v_{i})\) intersects either a single interval or two neighboring subintervals of \({\mathcal {G}}^L_1\). Furthermore, there exist constants \(C_v\), \(b_v\) depending only on \(C_f\), \(A_f\), \(\sigma \) such that
$$\begin{aligned} \Vert v_{ i}\Vert _{L^\infty (I)} \le 1, \qquad \Vert v_{i}\Vert _{H^1(I)} \le C_v \epsilon ^{-b_v}, \qquad \forall i=1, \dots , N_{\mathrm {1d}}. \end{aligned}$$(2.3)
2.
It holds that
$$\begin{aligned}&\left\| f - \sum _{{i_1,\dots , i_d}= 1}^{N_{\mathrm {1d}}} c_{{i_1\dots i_d}} \phi _{{i_1\dots i_d}}\right\| _{H^1(Q)} \le \epsilon \qquad \text {with}\nonumber \\&\phi _{{i_1\dots i_d}} = \bigotimes _{j=1}^d v_{ i_j} ,\,\forall {i_1,\dots , i_d}=1, \dots , N_{\mathrm {1d}}. \end{aligned}$$(2.4)
3.
\(\Vert c\Vert _\infty \le C_2 (1+\left| \log (\epsilon ) \right| )^d\) and \(\Vert c\Vert _1 \le C_c (1+\left| \log (\epsilon ) \right| )^{2d}\), for \(C_2, C_c>0\) independent of p, L, \(\epsilon \).
We present the proof in Sect. A.9.3, after developing an appropriate framework of hp-approximation in Appendix A.
3 Basic ReLU Neural Network Calculus
In the sequel, we distinguish between a neural network, as a collection of weights, and the associated realization of the NN. This is the function determined by the weights and an activation function. In this paper, we only consider the so-called ReLU (Rectified Linear Unit) activation \(\varrho (x) {:}{=}\max \{0, x\}\):
Definition 3.1
( [43, Definition 2.1]) Let \(d, L\in {\mathbb {N}}\). A neural network \(\varPhi \) with input dimension d and L layers is a sequence of matrix-vector tuples
where \(N_0 {:}{=}d\) and \(N_1, \dots , N_{L} \in {\mathbb {N}}\), and where \(A_\ell \in {\mathbb {R}}^{N_\ell \times N_{\ell -1}}\) and \(b_\ell \in {\mathbb {R}}^{N_\ell }\) for \(\ell = 1, \dots , L\).
For a NN \(\varPhi \), we define the associated realization of the NN \(\varPhi \) as
where the output \(x_L \in {\mathbb {R}}^{N_L}\) results from
Here, \(\varrho \) is understood to act component-wise on vector-valued inputs, i.e., for \(y = (y^1, \dots , y^m) \in {\mathbb {R}}^m\), \(\varrho (y) {:}{=} (\varrho (y^1), \dots , \varrho (y^m))\). We call \(N(\varPhi ) {:}{=}d + \sum _{j = 1}^L N_j\) the number of neurons of the NN \(\varPhi \), \({{\,\mathrm{L}\,}}(\varPhi ){:}{=}L\) the number of layers or depth, \({{\,\mathrm{M}\,}}_j(\varPhi ){:}{=}\Vert A_j\Vert _{0} + \Vert b_j \Vert _{0}\) the number of nonzero weights in the j-th layer, and \({{\,\mathrm{M}\,}}(\varPhi ) {:}{=}\sum _{j=1}^L {{\,\mathrm{M}\,}}_j(\varPhi )\) the number of nonzero weights of \(\varPhi \), also referred to as its size. We refer to \(N_L\) as the dimension of the output layer of \(\varPhi \).
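For concreteness, the realization and the size of a NN in the sense of Definition 3.1 can be sketched in a few lines (our naming; the NN is passed as a list of matrix-vector tuples):

```python
import numpy as np

def realize(phi, x):
    """R(phi): apply (A_l, b_l) with ReLU after every layer but the last."""
    for A, b in phi[:-1]:
        x = np.maximum(A @ x + b, 0.0)   # x_l = rho(A_l x_{l-1} + b_l)
    A, b = phi[-1]
    return A @ x + b                     # no activation in the output layer

def size(phi):
    """M(phi): total number of nonzero weights and biases."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in phi)
```

For instance, the two-layer NN with \(A_1 = (1, -1)^\top \), \(A_2 = (1, -1)\) and zero biases realizes \(\mathrm {Id}_{{\mathbb {R}}}\) via \(x = \varrho (x) - \varrho (-x)\) and has size 4.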
3.1 Concatenation, Parallelization, Emulation of Identity
An essential component of the ensuing proofs is the construction of NNs out of simpler building blocks. For instance, given two NNs, we would like to construct another NN whose realization equals the sum or the composition of the realizations of the first two. To describe these operations precisely, we introduce a formalism of operations on NNs below. The first of these operations is concatenation.
Proposition 3.2
(NN concatenation, [43, Remark 2.6]) Let \(L_1, L_2 \in {\mathbb {N}}\), and let \(\varPhi ^1, \varPhi ^2\) be two NNs of respective depths \(L_1\) and \(L_2\) such that \(N^1_0 = N^2_{L_2}{=}{:}d\), i.e., the input layer of \(\varPhi ^1\) has the same dimension as the output layer of \(\varPhi ^2\).
Then, there exists a NN \(\varPhi ^1 \odot \varPhi ^2\), called the sparse concatenation of \(\varPhi ^1\) and \(\varPhi ^2\), such that \(\varPhi ^1 \odot \varPhi ^2\) has \(L_1+L_2\) layers, \(\mathrm {R}(\varPhi ^1 \odot \varPhi ^2) = \mathrm {R}(\varPhi ^1) \circ \mathrm {R}(\varPhi ^2)\) and \({{\,\mathrm{M}\,}}\left( \varPhi ^1 \odot \varPhi ^2\right) \le 2{{\,\mathrm{M}\,}}\left( \varPhi ^1\right) + 2{{\,\mathrm{M}\,}}\left( \varPhi ^2\right) \).
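A sketch of the construction behind Proposition 3.2 (our naming): the output of \(\varPhi ^2\) is re-encoded through the ReLU identity \(y = \varrho (y) - \varrho (-y)\) and decoded by the first layer of \(\varPhi ^1\); duplicating these two layers is what produces the factor 2 in the size bound.

```python
import numpy as np

def sparse_concat(phi1, phi2):
    """Sparse concatenation with R = R(phi1) o R(phi2) and depth L1 + L2.

    phi1, phi2 are lists of (A, b) layer tuples. The (possibly negative)
    output of phi2 is encoded as (rho(y), rho(-y)) and decoded by the
    first layer of phi1, so the realizations compose exactly.
    """
    A2, b2 = phi2[-1]   # output layer of phi2 (unactivated in R(phi2))
    A1, b1 = phi1[0]    # input layer of phi1
    encode = (np.vstack([A2, -A2]), np.concatenate([b2, -b2]))
    decode = (np.hstack([A1, -A1]), b1)
    return phi2[:-1] + [encode, decode] + phi1[1:]
```

Since the encoding layer applies \(\varrho \), the decoding layer receives \((\varrho (y), \varrho (-y))\) and reproduces \(A^1_1 y + b^1_1\) exactly.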
The second fundamental operation on NNs is parallelization, achieved with the following construction.
Proposition 3.3
(NN parallelization, [43, Definition 2.7]) Let \(L, d \in {\mathbb {N}}\) and let \(\varPhi ^1, \varPhi ^2\) be two NNs with L layers and with d-dimensional input each. Then, there exists a NN \(\mathrm {P}(\varPhi ^1, \varPhi ^2)\) with d-dimensional input and L layers, which we call the parallelization of \(\varPhi ^1\) and \(\varPhi ^2\), such that
and \({{\,\mathrm{M}\,}}(\mathrm {P}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\).
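A sketch of the construction (our naming): the first layers are stacked, since both networks read the same input, and all later layers are arranged block-diagonally, so that the sizes add exactly.

```python
import numpy as np

def parallelize(phi1, phi2):
    """P(phi1, phi2) for equal-depth NNs with a shared input:
    R(P)(x) = (R(phi1)(x), R(phi2)(x)); nonzero weights add exactly."""
    assert len(phi1) == len(phi2), "parallelization needs equal depths"
    (A1, b1), (A2, b2) = phi1[0], phi2[0]
    layers = [(np.vstack([A1, A2]), np.concatenate([b1, b2]))]
    for (A1, b1), (A2, b2) in zip(phi1[1:], phi2[1:]):
        Z12 = np.zeros((A1.shape[0], A2.shape[1]))
        Z21 = np.zeros((A2.shape[0], A1.shape[1]))
        layers.append((np.block([[A1, Z12], [Z21, A2]]),
                       np.concatenate([b1, b2])))
    return layers
```

Because \(\varrho \) acts componentwise, the stacked hidden states of the two networks never interact, which is exactly the statement of Proposition 3.3.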
Proposition 3.3 requires two NNs to have the same depth. If two NNs have different depth, then we can artificially enlarge one of them by concatenating with a NN that implements the identity. One possible construction of such a NN is presented next.
Proposition 3.4
(NN emulation of \(\mathrm {Id}\), [43, Remark 2.4]) For every \(d,L\in {\mathbb {N}}\) there exists a NN \(\varPhi ^{\mathrm {Id}}_{d,L}\) with \({{\,\mathrm{L}\,}}(\varPhi ^{\mathrm {Id}}_{d,L}) = L\) and \({{\,\mathrm{M}\,}}(\varPhi ^{\mathrm {Id}}_{d,L}) \le 2 d L\), such that \(\mathrm {R} (\varPhi ^{\mathrm {Id}}_{d,L}) = \mathrm {Id}_{{\mathbb {R}}^d}\).
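One such construction (our naming) again uses the ReLU identity \(x = \varrho (x) - \varrho (-x)\): the input is split into positive and negative parts, carried unchanged through the hidden layers, and recombined at the end, using at most 2dL nonzero weights.

```python
import numpy as np

def identity_nn(d, L):
    """Depth-L NN realizing Id on R^d with at most 2*d*L nonzero weights,
    based on the ReLU identity x = rho(x) - rho(-x)."""
    I = np.eye(d)
    if L == 1:
        return [(I, np.zeros(d))]
    layers = [(np.vstack([I, -I]), np.zeros(2 * d))]        # split: (x_+, x_-)
    layers += [(np.eye(2 * d), np.zeros(2 * d))] * (L - 2)  # carry both parts
    layers += [(np.hstack([I, -I]), np.zeros(d))]           # recombine
    return layers
```

The hidden states are componentwise nonnegative, so the intermediate ReLU applications act as the identity on them.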
Finally, we sometimes require a parallelization of NNs that do not share inputs.
Proposition 3.5
(Full parallelization of NNs with distinct inputs, [12, Setting 5.2]) Let \(L \in {\mathbb {N}}\) and let
be two NNs with L layers each and with input dimensions \(N^1_0=d_1\) and \(N^2_0=d_2\), respectively.
Then, there exists a NN, denoted by \(\mathrm {FP}(\varPhi ^1, \varPhi ^2)\), with d-dimensional input where \(d = (d_1+d_2)\) and L layers, which we call the full parallelization of \(\varPhi ^1\) and \(\varPhi ^2\), such that for all \(x = (x_1,x_2) \in {\mathbb {R}}^d\) with \(x_i \in {\mathbb {R}}^{d_i}, i = 1,2\)
and \({{\,\mathrm{M}\,}}(\mathrm {FP}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\).
Proof
Set \(\mathrm {FP}\left( \varPhi ^1,\varPhi ^2\right) {:}{=}\left( \left( A_1^3, b_1^3\right) , \dots , \left( A_L^3, b_L^3\right) \right) \) where, for \(j = 1, \dots , L\), we define
All properties of \(\mathrm {FP}\left( \varPhi ^1,\varPhi ^2\right) \) claimed in the statement of the proposition follow immediately from the construction. \(\square \)
3.2 Emulation of Multiplication and Piecewise Polynomials
In addition to the basic operations above, we use two types of functions that can be approximated especially efficiently by NNs: multiplication functions of arbitrarily many inputs, and univariate piecewise polynomials. We first state a result on the emulation of multiplication in arbitrary dimension.
Proposition 3.6
([16, Lemma C.5], [42, Proposition 2.6]) There exists a constant \(C>0\) such that, for every \(0<\varepsilon < 1\), \(d \in {\mathbb {N}}\) and \(M \ge 1\), there is a NN \(\varPi _{\epsilon , M}^{d}\) with d-dimensional input and one-dimensional output, so that
and \({{\,\mathrm{R}\,}}(\varPi _{\epsilon , M}^{d})(x) = 0\) if \(\prod _{\ell =1}^dx_\ell = 0\), for all \(x = (x_1, \dots , x_d)\in {\mathbb {R}}^d\). Additionally, \(\varPi _{\epsilon , M}^{d}\) satisfies
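Multiplication networks of this type ultimately rest on the approximate squaring construction of [57]: with the ReLU-realizable hat function \(g(x)=\min \{2x, 2-2x\}\) on [0, 1], the function \(x - \sum _{s=1}^{m} 4^{-s}\, g^{\circ s}(x)\) approximates \(x^2\) on [0, 1] with error at most \(2^{-2m-2}\), and products follow via the polarization identity \(xy = \tfrac{1}{2}\big((x+y)^2 - x^2 - y^2\big)\) after rescaling. A numerical sketch of the squaring step (this illustrates the principle, not the network \(\varPi _{\epsilon , M}^{d}\) itself):

```python
def hat_g(x):
    # g(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1]; ReLU-realizable
    return 2.0 * x if x <= 0.5 else 2.0 * (1.0 - x)

def approx_square(x, m):
    """Approximate x**2 on [0, 1] as x - sum_{s=1}^m g^(s)(x) / 4**s;
    the error decays like 4**(-m), i.e. exponentially in the depth m."""
    val, g = x, x
    for s in range(1, m + 1):
        g = hat_g(g)
        val -= g / 4.0 ** s
    return val
```

Since \(g^{\circ s}\) is realizable by a ReLU NN of depth \(O(s)\), accuracy \(\epsilon \) costs depth and size of order \(\log (1/\epsilon )\), which is the source of the (poly)logarithmic factors in Proposition 3.6.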
In addition to the high-dimensional multiplication, we can efficiently approximate univariate continuous, piecewise polynomial functions by realizations of NNs with the ReLU activation function.
Proposition 3.7
([41, Proposition 5.1]) There exists a constant \(C>0\) such that, for all \(N_{\mathrm {int}}\in {\mathbb {N}}\) and \({\varvec{p}} = (p_i)_{i\in \{1,\ldots ,{N_{\mathrm {int}}}\}} \subset {\mathbb {N}}\), for all partitions \({\mathcal {T}}\) of \(I=(0,1)\) into \({N_{\mathrm {int}}}\) open, disjoint, connected subintervals \(I_i\), \(i=1,\ldots ,N_{\mathrm {int}}\), for all \(v\in {S_{{\varvec{p}}} (I,{\mathcal {T}})} {:}{=}\{v\in H^1(I): v|_{I_i} \in {\mathbb {P}}_{p_i}(I_i), i=1,\ldots ,N_{\mathrm {int}}\}\), and for every \(0<\varepsilon < 1\), there exist NNs \(\{\varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\}_{\varepsilon \in (0,1)}\) such that for all \(1\le q'\le \infty \) it holds that
where \(p_{\max } {:}{=}\max \{p_i :i = 1, \dots , N_{\mathrm {int}}\}\). In addition, \(\mathrm {R}\left( \varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon } \right) (x_j)=v(x_j)\) for all \(j\in \{0,\ldots ,{N_{\mathrm {int}}}\}\), where \(\{x_j\}_{j=0}^{N_{\mathrm {int}}}\) are the nodes of \({\mathcal {T}}\).
Remark 3.8
It is not hard to see that the result holds also for \(I = (a,b)\), where \(a,b\in {\mathbb {R}}\), with \(C>0\) depending on \((b-a)\). Indeed, for any \(v \in H^1((a,b))\) the concatenation of v with the invertible, affine map \(T :x \mapsto (x-a)/(b-a)\) is in \(H^1((0,1))\). Applying Proposition 3.7 yields NNs \(\{\varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\}_{\varepsilon \in (0,1)}\) approximating \(v \circ T\) to an appropriate accuracy. Concatenating these networks with the 1-layer NN \((A_1,b_1)\), where \(A_1x + b_1 = T^{-1}x\) yields the result. The explicit dependence of \(C>0\) on \((b-a)\) can be deduced from the error bounds in (0, 1) by affine transformation.
4 Exponential Approximation Rates by Realizations of NNs
We now establish several technical results on the exponentially consistent approximation by realizations of NNs with ReLU activation of univariate and multivariate tensorized polynomials. These results will be used to establish Theorem 4.3, which yields exponential approximation rates of NNs for functions in the weighted, analytic classes introduced in Sect. 2.2. They are of independent interest, as they imply that spectral and pseudospectral methods can, in principle, be emulated by realizations of NNs with ReLU activation.
4.1 NN-based Approximation of Univariate, Piecewise Polynomial Functions
We start with the following corollary to Proposition 3.7. It quantifies stability and consistency of realizations of NNs with ReLU activation for the emulation of the univariate, piecewise polynomial basis functions in Theorem 2.1.
Corollary 4.1
Let \(I=(a,b)\subset {\mathbb {R}}\) be a bounded interval. Fix \(C_p>0\), \(C_v>0\), and \(b_v>0\). Let \(0<\epsilon _{{\mathsf {h}}{\mathsf {p}}} < 1\) and \(p, N_{\mathrm {1d}}, N_{\mathrm {int}}\in {\mathbb {N}}\) be such that \(p \le C_p(1+\left| \log \epsilon _{{\mathsf {h}}{\mathsf {p}}} \right| )\) and let \({\mathcal {G}}_{\mathrm {1d}}\) be a partition of I into \(N_{\mathrm {int}}\) open, disjoint, connected subintervals and, for \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), let \(v_i\in {\mathbb {Q}}_p({\mathcal {G}}_{{\mathrm {1d}}}) \cap H^1(I)\) be such that \({{\,\mathrm{supp}\,}}(v_i)\) intersects either a single interval or two adjacent intervals in \({\mathcal {G}}_{\mathrm {1d}}\) and \( \Vert v_i\Vert _{H^1(I)}\le C_v \epsilon _{{\mathsf {h}}{\mathsf {p}}}^{-b_v}\), for all \(i\in \{1, \dots , N_{\mathrm {1d}}\}\).
Then, for every \(0 < \epsilon _1 \le \epsilon _{{\mathsf {h}}{\mathsf {p}}}\), and for every \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), there exists a NN \(\varPhi ^{v_{i}}_{\epsilon _1}\) such that
for constants \(C_4, C_5>0\) depending only on \(C_p\), \(C_v\), \(b_v\), and \((b-a)\). In addition, \({{\,\mathrm{R}\,}}\left( \varPhi ^{v_i}_{\epsilon _1} \right) (x_j)=v_i(x_j)\) for all \(i\in \{1,\ldots ,N_{\mathrm {1d}}\}\) and \(j\in \{0,\ldots ,{N_{\mathrm {int}}}\}\), where \(\{x_j\}_{j=0}^{N_{\mathrm {int}}}\) are the nodes of \({\mathcal {G}}_{{\mathrm {1d}}}\).
Proof
Fix \(i\in \{1, \dots , N_{\mathrm {1d}}\}\). For \(v_{i}\) as in the assumption of the corollary, we have that either \({{\,\mathrm{supp}\,}}(v_{i}) = {\overline{J}}\) for a unique \(J\in {\mathcal {G}}_{\mathrm {1d}}\) or \({{\,\mathrm{supp}\,}}(v_{i}) =\overline{J \cup J'}\) for two neighboring intervals \(J, J'\in {\mathcal {G}}_{\mathrm {1d}}\). Hence, there exists a partition \({\mathcal {T}}_{i}\) of I into at most four open, disjoint, connected subintervals such that \(v_{i} \in S_{{\varvec{p}}} (I,{\mathcal {T}}_{i})\), where \({\varvec{p}} = (p_i)_{i\in \{1,\ldots ,4\}}\).
Because of this, an application of Proposition 3.7 with \(q' = 2\) and Remark 3.8 yields that for every \(0<\epsilon _1 \le \epsilon _{{\mathsf {h}}{\mathsf {p}}}< 1\) there exists a NN \(\varPhi ^{v_{i}}_{\epsilon _1} {:}{=} \varPhi ^{v_{i},{\mathcal {T}}_{i},{\varvec{p}}}_{\epsilon _1}\) such that (4.1) holds. In addition, by invoking \(p \lesssim 1+|\log (\epsilon _{\mathsf {hp}})|\le 1+\left| \log (\epsilon _1) \right| \), we observe that
Therefore, there exists \(C_4 >0\) such that (4.2) holds. Furthermore,
We use \(p \lesssim 1+\left| \log (\epsilon _1)\right| \) and obtain that there exists \(C_5 >0\) such that (4.3) holds. \(\square \)
4.2 Emulation of Functions with Singularities in Cubic Domains by NNs
Below we state a result describing the efficiency of re-approximating continuous, piecewise tensor product polynomial functions in a cubic domain, as introduced in Theorem 2.1, by realizations of NNs with the ReLU activation function.
Theorem 4.2
Let \(d\in \{2,3\}\), let \(I = (a,b)\subset {\mathbb {R}}\) be a bounded interval, and let \(Q=I^d\). Suppose that there exist constants \(C_p>0\), \(C_{N_{\mathrm {1d}}}>0\), \(C_v>0\), \(C_c>0\), \(b_v>0\), and, for \(0< \epsilon \le 1\), assume there exist \(p, N_{\mathrm {1d}}, N_{\mathrm {int}}\in {\mathbb {N}}\), and \(c\in {\mathbb {R}}^{N_{\mathrm {1d}}\times \dots \times N_{\mathrm {1d}}}\), such that
Further, let \({\mathcal {G}}_{\mathrm {1d}}\) be a partition of I into \(N_{\mathrm {int}}\) open, disjoint, connected subintervals and let, for all \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), \(v_i\in {\mathbb {Q}}_p({\mathcal {G}}_{{\mathrm {1d}}}) \cap H^1(I)\) be such that \({{\,\mathrm{supp}\,}}(v_i)\) intersects either a single interval or two neighboring subintervals of \({\mathcal {G}}_{\mathrm {1d}}\) and
Then, there exists a NN \(\varPhi _{\epsilon , c}\) such that
Furthermore, it holds that \( \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , c}\right) \right\| _{L^\infty (Q)} \le (2^d+1)C_c (1 + \left| \log \epsilon \right| ^{2d}), \)
where \(C>0\) depends on \(C_p\), \(C_{N_{\mathrm {1d}}}\), \(C_v\), \(C_c\), \(b_v\), d, and \((b-a)\) only.
Proof
Assume \(I \ne \emptyset \) as otherwise there is nothing to show. Let \(C_I\ge 1\) be such that \(C_I^{-1}\le (b-a) \le C_I\). Let \(c_{v, \mathrm {max}} {:}{=}\max \{\Vert v_i\Vert _{H^1(I)}:i \in \{1, \dots , N_{\mathrm {1d}}\}\} \le C_v \epsilon ^{-b_v}\), let \(\epsilon _1 {:}{=}\min \{\epsilon / (2 \cdot d \cdot (c_{v, \mathrm {max}}+1)^{d} \cdot \Vert c\Vert _1), 1/2, C_I^{-1/2}C_v^{-1}\epsilon ^{b_v}\}\), and let \(\epsilon _2 {:}{=}\min \{\epsilon /(2 \cdot (\sqrt{d}+1) \cdot C_I^{d/2} \cdot (c_{v, \mathrm {max}}+1) \cdot \Vert c\Vert _1), 1/2 \}\).
Construction of the neural network. Invoking Corollary 4.1 we choose, for \(i=1, \dots , N_{\mathrm {1d}}\), NNs \(\varPhi _{\epsilon _1}^{v_{i}}\) so that
It follows that for all \(i \in \{1, \dots , N_{\mathrm {1d}}\}\)
and that, by Sobolev imbedding,
Then, let \(\varPhi _{\mathrm {basis}}\) be the NN defined as
where the full parallelization is of d copies of \({{\,\mathrm{P}\,}}(\varPhi ^{v_{ 1}}_{\epsilon _1}, \dots , \varPhi ^{v_{ N_{\mathrm {1d}}}}_{\epsilon _1})\). Note that \(\varPhi _{\mathrm {basis}}\) is a NN with d-dimensional input and \(dN_{\mathrm {1d}}\)-dimensional output. Subsequently, we introduce the \(N_{\mathrm {1d}}^d\) matrices \(E^{(i_1, \dots , i_d)} \in \{0,1\}^{d\times dN_{\mathrm {1d}}}\) such that, for all \((i_1, \dots , i_d)\in \{1, \dots ,N_{\mathrm {1d}}\}^d\),
Note that, for all \((i_1, \dots , i_d)\in \{1, \dots , N_{\mathrm {1d}}\}^d\),
Then, we set
where \(\varPi _{\epsilon _2, 2}^d\) is according to Proposition 3.6. Note that, by (4.6), the inputs of \(\varPi _{\epsilon _2, 2}^d\) are bounded in absolute value by 2. Finally, we define
where \({{\,\mathrm{vec}\,}}(c) \in {\mathbb {R}}^{N_{\mathrm {1d}}^d}\) is the reshaping defined by, for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\),
See Fig. 1 for a schematic representation of the NN \(\varPhi _{\epsilon , c}\).
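In effect, up to the controlled error of the product networks, \(\varPhi _{\epsilon , c}\) emulates the tensor-product expansion \(\sum _{i_1,\dots ,i_d} c_{i_1\dots i_d} \prod _{k=1}^d v_{i_k}(x_k)\). The following numpy sketch evaluates this expansion for \(d=2\) with exact products, once directly and once through a flattened coefficient vector, mirroring the role of \({{\,\mathrm{vec}\,}}(c)\) (row-major ordering is an assumption made here for illustration; the precise index convention is the one fixed in (4.9)).

```python
import numpy as np

def tensor_expansion(c, v, x):
    # Direct evaluation of sum_{i,j} c[i, j] * v_i(x1) * v_j(x2).
    # c: (N1d, N1d) coefficient array; v: list of 1D basis callables; x = (x1, x2).
    vals1 = np.array([vi(x[0]) for vi in v])
    vals2 = np.array([vi(x[1]) for vi in v])
    return vals1 @ c @ vals2

def tensor_expansion_vec(c, v, x):
    # Same sum via the flattened coefficients vec(c) and the Kronecker
    # product of the coordinate-wise evaluations (row-major order assumed).
    vals1 = np.array([vi(x[0]) for vi in v])
    vals2 = np.array([vi(x[1]) for vi in v])
    return c.reshape(-1) @ np.kron(vals1, vals2)
```

The network replaces each exact product \(v_{i_1}(x_1)v_{i_2}(x_2)\) by \({{\,\mathrm{R}\,}}(\varPi _{\epsilon _2, 2}^d)\), and the final affine layer with weights \({{\,\mathrm{vec}\,}}(c)\) performs the weighted sum.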
Approximation accuracy. We now verify that \(\varPhi _{\epsilon , c}\) has the asserted approximation accuracy. Define, for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\),
Furthermore, for each \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\), let \(\varPhi _{{i_1\dots i_d}}\) denote the NN
We estimate by the triangle inequality that
We have that
and, by another application of the triangle inequality, we have that
where the last estimate follows from Proposition 3.6 and the chain rule:
and
where we used (4.5) and the fact that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i_k}} )\) depends only on \(x_k\). Integrating over the \(d-1\) other coordinates in Q gives the factor \(C_I^{d-1}\). We now use (4.6) to bound the first term in (4.11): for \(d = 3\), we have that, for all \(({i_1,\dots , i_d}) \in \{1, \dots , N_{\mathrm {1d}}\}^d\),
For \(d = 2\), we end up with a similar estimate with only two terms. By the tensor product structure, it is clear that \(\mathrm {(I)} \le d \epsilon _1 (c_{v, \mathrm {max}}+1)^{d}\). We have from (4.10) and the considerations above that
This yields (4.4).
Bound on the \(L^\infty \)-norm of the neural network. As we have already shown, \(\left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) \right\| _{L^\infty (I)} \le 2\). Therefore, by Proposition 3.6, \( \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{{i_1\dots i_d}}\right) \right\| _{L^\infty (Q)} \le 2^d + \epsilon _2\) for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\). It follows that
Size of the neural network. Bounds on the size and depth of \(\varPhi _{\epsilon , c}\) follow from Proposition 3.6 and Corollary 4.1. Specifically, we start by remarking that there exists a constant \(C_1 > 0\) depending on \(C_v\), \(b_v\), \(C_c\), \(C_I\) and d only, such that \(\left| \log (\epsilon _1)\right| \le C_1 (1+\left| \log \epsilon \right| )\) and \(\left| \log (\epsilon _2)\right| \le C_1 (1+\left| \log \epsilon \right| )\). Then, by Corollary 4.1, there exist constants \(C_{4}\), \(C_{5}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a),\) and d only such that for all \(i=1, \dots , N_{\mathrm {1d}}\),
Hence, by Propositions 3.5 and 3.3, there exist \(C_{6}\), \(C_{7}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a),\) and d only such that
Then, remarking that \(\Vert E^{({i_1,\dots , i_d})}\Vert _0 =d\) for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\) and using Propositions 3.2, 3.6, and 3.3, we have
for constants \(C_{8}, C_{9}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a)\), and d only. Finally, we conclude that there exists a constant \(C_{10}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a)\), and d only, such that
Using also the fact that \(N_{\mathrm {1d}}\le C_{N_{\mathrm {1d}}} (1+\left| \log \epsilon \right| ^2)\) and since \(d\ge 2\),
for a constant \(C_{11}>0\) depending on \(C_p, C_{N_{\mathrm {1d}}}, C_v, b_v, C_c, (b-a)\) and d only. \(\square \)
Next, we state our main approximation result, which describes the approximation of singular functions in \((0,1)^d\) by realizations of NNs.
Theorem 4.3
Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\), where \(c\) is one of the corners of Q, and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to c when \(d=3\); \({\mathcal {E}}=\emptyset \) when \(d=2\). Assume furthermore that \(C_f, A_f>0\), and
Then, for every \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f)\) and every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , f}\) so that
In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , f}\right) \Vert _{L^\infty (Q)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d})\) for \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , f}) = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , f}) = {\mathcal {O}}(\left| \log \epsilon \right| \log (\left| \log \epsilon \right| ))\), for \(\epsilon \rightarrow 0\).
Proof
Denote \(I{:}{=}(0,1)\) and let \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}}; C_f,A_f)\) and \(0<\epsilon <1\). Then, by Theorem 2.1 (applied with \(\epsilon /2\) instead of \(\epsilon \)) there exists \(N_{\mathrm {1d}}\in {\mathbb {N}}\) so that \(N_{\mathrm {1d}}= {\mathcal {O}}((1+\left| \log \epsilon \right| )^{2})\), \(c \in {\mathbb {R}}^{N_{\mathrm {1d}}\times \dots \times N_{\mathrm {1d}}}\) with \(\Vert c\Vert _1 \le C_c (1+\left| \log \epsilon \right| ^{2d})\), and, for all \((i_1, \dots , i_d)\in \{1, \dots , N_{\mathrm {1d}}\}^d\),
such that the hypotheses of Theorem 4.2 are met, and
We have, by Theorem 2.1 and the triangle inequality, that for \(\varPhi _{\epsilon , f} {:}{=} \varPhi _{\epsilon /2, c}\)
Then, the application of Theorem 4.2 (with \(\epsilon /2\) instead of \(\epsilon \)) concludes the proof of (4.12). Finally, the bounds on \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , f}) = {{\,\mathrm{L}\,}}(\varPhi _{\epsilon /2, c})\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , f}) = {{\,\mathrm{M}\,}}(\varPhi _{\epsilon /2, c})\), and on \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , f})\Vert _{L^\infty (Q)} = \Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon /2, c})\Vert _{L^\infty (Q)} \) follow from the corresponding estimates of Theorem 4.2. \(\square \)
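The poly-logarithmic complexity bounds of Theorem 4.3 can be made tangible numerically. In the sketch below all hidden constants in the \({\mathcal {O}}(\cdot )\) bounds are set to 1, an arbitrary normalization for illustration only: squaring the target accuracy, i.e., replacing \(\epsilon \) by \(\epsilon ^2\), doubles \(\left| \log \epsilon \right| \) and hence multiplies the size bound by only \(2^{2d+1}\).

```python
import math

def size_bound(eps, d):
    # M(Phi) = O(|log eps|^(2d+1)); hidden constant normalized to 1.
    return abs(math.log(eps)) ** (2 * d + 1)

def depth_bound(eps):
    # L(Phi) = O(|log eps| * log|log eps|); hidden constant normalized to 1.
    t = abs(math.log(eps))
    return t * math.log(t)

for eps in (1e-2, 1e-4, 1e-8, 1e-16):
    print(f"eps={eps:.0e}  size~{size_bound(eps, 2):.3e}  depth~{depth_bound(eps):.1f}")
```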
Theorem 4.3 admits a straightforward generalization to functions with multivariate output, such that each component is a weighted analytic function with the same regularity. For a NN \(\varPhi \) with N-dimensional output, \(N\in {\mathbb {N}}\), we denote by \({{\,\mathrm{R}\,}}(\varPhi )_n\) the n-th component of the output, \(n\in \{1, \dots , N\}\).
Corollary 4.4
Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\) where \(c\) is one of the corners of Q and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to \(c\) when \(d=3\); \({\mathcal {E}}=\emptyset \) when \(d=2\). Let \(N_f\in {\mathbb {N}}\). Further assume that \(C_f, A_f>0\), and
Then, for all \({\varvec{f}}= (f_1, \dots , f_{N_f}) \in \left[ {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f) \right] ^{N_f}\) and every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , {\varvec{f}}}\) with d-dimensional input and \(N_f\)-dimensional output such that, for all \( n=1, \dots , N_f\),
In addition, \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , {\varvec{f}}})_n\Vert _{L^\infty (Q)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d})\) for every \(n \in \{1, \dots , N_f\}\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{f}}}) = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d+1} + N_f\left| \log \epsilon \right| ^{2d})\), and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , {\varvec{f}}}) = {\mathcal {O}}(\left| \log \epsilon \right| \log (\left| \log \epsilon \right| ))\), for \(\epsilon \rightarrow 0\).
Proof
Let \(\varPhi _\epsilon \) be as in (4.8) and let \(c^{(n)} \in {\mathbb {R}}^{N_{\mathrm {1d}}\times \cdots \times N_{\mathrm {1d}}}\), \(n=1, \dots , N_f\) be the matrices of coefficients such that, in the notation of the proof of Theorems 4.2 and 4.3, for all \(n\in \{1, \dots , N_f\}\),
We define, for \({{\,\mathrm{vec}\,}}\) as defined in (4.9), the NN \(\varPhi _{\epsilon , {\varvec{f}}}\) as
The estimate (4.13) and the \(L^\infty \)-bound then follow from Theorem 4.2. The bound on \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , {\varvec{f}}})\) follows directly from Theorem 4.2 and Proposition 3.2. Finally, the bound on \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{f}}})\) follows by Theorem 4.2 and Proposition 3.2, as well as from the observation that
for a constant \(C>0\) independent of \(N_f\) and \(\epsilon \). \(\square \)
5 Exponential Expression Rates for Weighted Analytic Solution Classes of PDEs
In this section, we develop Theorem 4.3 into several exponentially decreasing upper bounds on the error of approximation, by realizations of NNs with ReLU activation, of solution classes of elliptic PDEs with singular data (such as singular coefficients or domains with nonsmooth boundary). In particular, we consider elliptic PDEs in general polygonal domains in two dimensions, in three-dimensional domains that are unions of cubes, and elliptic eigenvalue problems with isolated point singularities in the potential, which arise in models of electron structure in quantum mechanics.
In each class of examples, the solution sets belong to the class of weighted analytic functions introduced in Sect. 2.2. However, the approximation rates established in Sect. 4 only hold on tensor product domains with singularities on the boundary. Therefore, we will first extend the exponential NN approximation rates to functions which exhibit singularities on a set of isolated points internal to the domain, arising from singular potentials of nonlinear Schrödinger operators. In Sect. 5.2, we demonstrate, using an argument based on a partition of unity, that the approximation problem on general polygonal domains can be reduced to that on tensor product domains and Fichera-type domains, and establish exponential NN expression rates for linear elliptic source and eigenvalue problems. In Sect. 5.3, we show exponential NN expression rates for classes of weighted analytic functions on two- and three-dimensional Fichera-type domains.
5.1 Nonlinear Eigenvalue Problems with Isolated Point Singularities
Point singularities emerge in the solutions of elliptic eigenvalue problems, as arise, for example, for electrostatic interactions between charged particles that are modeled mathematically as point sources in \({\mathbb {R}}^3\). Other problems that exhibit point singularities appear in general relativity, and for electron structure models in quantum mechanics. We concentrate here on the expression rate of “ab initio” NN approximation of the electron density near isolated singularities of the nuclear potential. Via a ReLU-based partition of unity argument, an exponential approximation rate bound for a single, isolated point singularity in Theorem 5.1 is extended in Corollary 5.4 to electron densities corresponding to potentials with multiple point singularities at a priori known locations, modeling (static) molecules.
Numerical approximation with NNs in ab initio electron structure computations has recently been reported to be competitive with other established methodologies (e.g., [25, 44] and the references there). The exponential ReLU expression rate bounds obtained here can, in part, underpin the competitive performance of NNs in (static) electron structure computations.
We recall that all NNs are realized with the ReLU activation function, see (3.1).
5.1.1 Nonlinear Schrödinger Equations
Let \(\varOmega = {\mathbb {R}}^d/(2{\mathbb {Z}})^d\), where \(d \in \{2,3\}\), be a flat torus and let \(V:\varOmega \rightarrow {\mathbb {R}}\) be a potential such that \(V(x)\ge V_0>0\) for all \(x\in \varOmega \) and such that there exist \(\delta >0\) and \(A_V>0\) with
where \(r(x) = {{\,\mathrm{dist}\,}}(x, (0, \dots , 0))\). For \(k \in \{0, 1, 2\}\), we introduce the Schrödinger eigenproblem that consists in finding the smallest eigenvalue \(\uplambda \in {\mathbb {R}}\) and an associated eigenfunction \(u \in H^1(\varOmega )\) such that
The following approximation result holds.
Theorem 5.1
Let \(k \in \{0,1,2\}\) and \((\uplambda , u)\in {\mathbb {R}}\times H^1(\varOmega )\backslash \{ 0 \}\) be a solution of the eigenvalue problem (5.2) with minimal \(\uplambda \), where V satisfies (5.1).
Then, for every \(0< \epsilon \le 1\) there exists a NN \(\varPhi _{\epsilon , u}\) such that
In addition, as \(\epsilon \rightarrow 0\),
Proof
Let \({\mathcal {C}}= \{(0, \dots , 0)\}\) and \({\mathcal {E}}=\emptyset \). The regularity of u is a consequence of [34, Theorem 2] (see also [35, Corollary 3.2] for the linear case \(k=0\)): there exist \(\gamma _c> d/2\) and \(C_u, A_u>0\) such that \(u\in {\mathcal {J}}^\varpi _{\gamma _c}(\varOmega ; {\mathcal {C}}, {\mathcal {E}}; C_u, A_u)\). Here, \(\gamma _c\) and the constants \(C_u\) and \(A_u\) depend only on \(V_0\), \(A_V\) and \(\delta \) in (5.1), and on k in (5.2).
Then, for all \(0 < \epsilon \le 1\), by Theorems 4.2 and A.25, there exists a NN \(\varPhi _{\epsilon , u}\) such that (5.3) holds. Furthermore, there exist constants \(C_1\), \(C_2 > 0\) dependent only on \(V_0\), \(A_V\), \(\delta \), and k, such that
\(\square \)
5.1.2 Hartree–Fock Model
The Hartree–Fock model is an approximation of the full many-body representation of a quantum system under the Born–Oppenheimer approximation, where the many-body wave function is replaced by a sum of Slater determinants. Under this hypothesis, for \(M, N \in {\mathbb {N}}\), the Hartree–Fock energy of a system with N electrons and M nuclei with positive charges \(Z_i\) at isolated locations \(R_i\in {\mathbb {R}}^3\) reads
where \(\delta _{ij}\) is the Kronecker delta, \(V(x) = -\sum _{i=1}^{M} Z_i/\left\| x-R_i \right\| _{2}\), \(\tau (x, y) = \sum _{i=1}^N\varphi _i(x)\varphi _i(y)\), and \(\rho (x) = \tau (x,x)\), see, e.g., [30, 31]. The Euler–Lagrange equations of (5.4) read
with \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j=\delta _{ij}\).
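For concreteness, the quantities entering (5.4) can be written out directly. The sketch below is illustrative only: the charges, positions, and orbitals are placeholder data, not solutions of (5.5).

```python
import numpy as np

def nuclear_potential(x, Z, R):
    # V(x) = -sum_i Z_i / ||x - R_i||_2, singular at each nucleus R_i.
    return -sum(Zi / np.linalg.norm(x - Ri) for Zi, Ri in zip(Z, R))

def density(x, orbitals):
    # rho(x) = tau(x, x) = sum_i phi_i(x)^2.
    return sum(phi(x) ** 2 for phi in orbitals)

# Placeholder data: a single unit charge at the origin and one Gaussian orbital.
Z, R = [1.0], [np.zeros(3)]
orbitals = [lambda x: np.exp(-np.dot(x, x))]
```

The \(r^{-1}\) blow-up of V at each \(R_i\) is the source of the isolated point singularities of the eigenfunctions; the weighted analytic classes of Sect. 2.2 quantify exactly this behavior.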
Remark 5.2
It has been shown in [30] that, if \(\sum _{k=1}^MZ_k>N-1\), there exists a ground state \(\varphi _1,\dots , \varphi _N \) of (5.4), which solves (5.5).
The following statement gives exponential expression rate bounds of the NN-based approximation of electronic wave functions in the vicinity of one singularity (corresponding to the location of a nucleus) of the potential.
Theorem 5.3
Assume that (5.5) has N real eigenvalues \(\uplambda _1, \dots , \uplambda _N\) with associated eigenfunctions \(\varphi _1, \dots , \varphi _N\), such that \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j = \delta _{ij}\). Fix \(k\in \{1, \dots , M\}\), let \(R_k\) be one of the singularities of V, and let \(a>0\) be such that \(\left\| R_j-R_k \right\| _{\infty }>2a\) for all \(j\in \{1, \dots , M\}\setminus \{k\}\). Let \(\varOmega _k\) be the cube \(\varOmega _k = \left\{ x\in {\mathbb {R}}^3:\Vert x - R_k\Vert _{\infty }\le a \right\} \).
Then, for every \(0<\epsilon <1\), there exists a NN \(\varPhi _{\epsilon , \varphi }\) such that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi }) : {\mathbb {R}}^3\rightarrow {\mathbb {R}}^N\) satisfies
In addition, as \(\epsilon \rightarrow 0\), \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })_i\Vert _{L^\infty (\varOmega _k)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{6})\) for every \(i \in \{1, \dots , N\}\),
Proof
Let \({\mathcal {C}}= \{(0, 0,0)\}\) and \({\mathcal {E}}= \emptyset \) and fix \(k\in \{1 ,\dots , M\}\). From the regularity result in [36, Corollary 1], see also [13, 14], there exist \(C_\varphi \), \(A_\varphi \), and \(\gamma _c>3/2\) such that \((\varphi _1, \dots , \varphi _N) \in \left[ {\mathcal {J}}^\varpi _{\gamma _c}(\varOmega _k; {\mathcal {C}}, {\mathcal {E}}; C_\varphi , A_\varphi )\right] ^N\). Then, (5.6), the \(L^\infty \) bound, and the depth and size bounds on the NN \(\varPhi _{\epsilon , \varphi }\) follow from the hp approximation result in Theorem A.25 (centered in \(R_k\) by translation) and from Theorem 4.2, as in Corollary 4.4. \(\square \)
The arguments in the preceding subsections applied to wave functions for a single, isolated nucleus with interaction modeled by the singular potential V as in (5.1) can then be extended to give upper bounds on the approximation rates achieved by realizations of NNs of the wave functions in a bounded, sufficiently large domain containing all singularities of the nuclear potential in (5.4).
Corollary 5.4
Assume that (5.5) has N real eigenvalues \(\uplambda _1, \ldots , \uplambda _N\) with associated eigenfunctions \(\varphi _1, \ldots , \varphi _N\), such that \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j = \delta _{ij}\). Let \(a_i, b_i\in {\mathbb {R}}\), \(i=1,2,3\), be such that \(\{R_j\}_{j=1}^M\subset \varOmega {:}{=}(a_1,b_1)\times (a_2,b_2)\times (a_3,b_3)\). Then, for every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , \varphi }\) such that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi }) : {\mathbb {R}}^3\rightarrow {\mathbb {R}}^N\) and
Furthermore, as \(\epsilon \rightarrow 0\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{7} + N\left| \log (\epsilon )\right| ^{6})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\).
Proof
The proof is based on a partition of unity argument. We only sketch it at this point; it is developed in detail in the proof of Theorem 5.6. Let \({\mathcal {T}}\) be a tetrahedral, regular triangulation of \(\varOmega \), and let \(\{\kappa _k \}_{k=1}^{N_\kappa }\) be the hat-basis functions associated with it. We suppose that the triangulation is sufficiently refined to ensure that, for all \(k\in \{1, \dots , N_\kappa \}\), there exists a cube \({\widetilde{\varOmega }}_{k}\subset \varOmega \) such that \({{\,\mathrm{supp}\,}}(\kappa _k) \subset {\widetilde{\varOmega }}_{k}\) and that there exists at most one \(j\in \{1, \dots , M\}\) such that \(R_j \in {\overline{{\widetilde{\varOmega }}}}_k\).
For all \(k\in \{1, \dots , N_\kappa \}\), by [23, Theorem 5.2], which is based on [56], there exists a NN \(\varPhi ^{\kappa _k}\) such that
For all \(0<\epsilon <1\), let
For all \(k\in \{1, \dots , N_\kappa \}\) and \(i\in \{1,\ldots ,N\}\), it holds that \(\varphi _i|_{{\widetilde{\varOmega }}_k} \in {\mathcal {J}}^{\varpi }_\gamma ({\widetilde{\varOmega }}_k; \{R_1, \dots , R_M\}\cap {\overline{{\widetilde{\varOmega }}}}_k, \emptyset )\). Then, there exists a NN \(\varPhi _{\epsilon _1, \varphi }^{k}\), as defined in Theorem 5.3, such that
Let
where the finiteness is due to Theorem 5.3. Then, we denote
and \(M_{\times }(\epsilon _1) {:}{=}C_\infty (1+\left| \log \epsilon _1 \right| ^6)\). As detailed in the proof of Theorem 5.6 below, after concatenating with identity NNs and possibly after increasing the constants, we assume that \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon _1, \varphi }^k)\) is independent of k and that the bound on \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon _1, \varphi }^k)\) is independent of k, and that the same holds for \(\varPhi ^{\kappa _k}\), \(k=1,\ldots ,N_\kappa \).
Let now, for \(i\in \{1, \dots , N\}\), \(E_i \in \{0,1\}^{2\times (N+1)}\) be the matrices such that, for all \(x = (x_1, \dots , x_{N+1})\), \(E_i x = (x_i, x_{N+1})\). Let also \(A = ({{\,\mathrm{Id}\,}}_{N\times N},\ldots ,{{\,\mathrm{Id}\,}}_{N\times N}) \in {\mathbb {R}}^{N\times N_\kappa N}\) be the block matrix comprising \(N_\kappa \) times the identity matrix \({{\,\mathrm{Id}\,}}_{N\times N}\in {\mathbb {R}}^{N\times N}\). Then, we introduce the NN
where \(L\in {\mathbb {N}}\) is such that \({{\,\mathrm{L}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\kappa _k}) = {{\,\mathrm{L}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })\), from which it follows that \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}) \le C{{\,\mathrm{L}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })\). It holds, for all \(i\in \{1, \dots , N\}\), that
By the triangle inequality, [40, Theorem 2.1], (5.8), and Proposition 3.6, for all \(i\in \{1, \dots , N\}\),
The asymptotic bounds on the size and depth of \(\varPhi _{\epsilon , \varphi }\) can then be derived from (5.9), using Theorem 5.3, as developed in more detail in the proof of Theorem 5.6 below. \(\square \)
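The partition-of-unity argument used in this proof (and again in Theorem 5.6) can be made explicit in one dimension: hat functions are exactly representable by ReLU NNs, they sum to one on the interior of their joint support, and multiplying local approximants by the hats and summing reproduces the target up to the local errors. A minimal sketch in our own notation, with exact multiplications in place of the product networks:

```python
def relu(x):
    return max(x, 0.0)

def hat(a, m, b):
    # Exact ReLU representation of the hat function with nodes a < m < b:
    # value 1 at m, 0 outside (a, b), piecewise linear in between.
    def k(x):
        return (relu(x - a) / (m - a)
                - ((b - a) / ((m - a) * (b - m))) * relu(x - m)
                + relu(x - b) / (b - m))
    return k

def glue(nodes, local_approx):
    # Partition-of-unity gluing: sum_k kappa_k(x) * u_k(x). The interior hats
    # on `nodes` sum to one on [nodes[1], nodes[-2]].
    hats = [hat(nodes[j - 1], nodes[j], nodes[j + 1])
            for j in range(1, len(nodes) - 1)]
    def u(x):
        return sum(h(x) * g(x) for h, g in zip(hats, local_approx))
    return u
```

In the proof above, the exact products \(\kappa _k \cdot \varphi _i\) are replaced by the product networks of Proposition 3.6, which is what introduces the additional, controlled error term.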
5.2 Elliptic PDEs in Polygonal Domains
We establish exponential expressivity for realizations of NNs with ReLU activation of solution classes to elliptic PDEs in polygonal domains \(\varOmega \), the boundaries \(\partial \varOmega \) of which are Lipschitz and consist of a finite number of straight line segments. Notably, \(\varOmega \subset {\mathbb {R}}^2\) need not be a finite union of axiparallel rectangles. In the following lemma, we construct a partition of unity in \(\varOmega \) subordinate to an open covering, each element of which is the affine image of one of three canonical patches. Note that we admit corners with associated angle of aperture \(\pi \); this will be instrumental, in Corollaries 5.10 and 5.11, for the imposition of different boundary conditions on \(\partial \varOmega \). The three canonical patches that we consider are listed in Lemma 5.5, item [P2]. Affine images of \((0,1)^2\) are used away from corners of \(\partial \varOmega \) and when the internal angle of a corner is smaller than \(\pi \). Affine images of \((-1,1)\times (0,1)\) are used near corners with internal angle \(\pi \). PDE solutions may exhibit point singularities near such corners, e.g., if the two neighboring edges carry different types of boundary conditions. Affine images of \( (-1,1)^2\setminus (-1, 0]^2\) are used near corners with internal angle larger than \(\pi \). In the proof of Theorem 5.6, we use on each patch Theorem 4.3 or a result from Sect. 5.3 below.
A triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\) is defined as a partition of \({\mathbb {R}}^2\) into open triangles K such that \(\bigcup _{K\in {\mathcal {T}}} {\overline{K}} = {\mathbb {R}}^2\). A regular triangulation of \({\mathbb {R}}^2\) is a triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\) such that, additionally, for any two neighboring elements \(K_1, K_2\in {\mathcal {T}}\), \({\overline{K}}_1\cap {\overline{K}}_2\) is either a corner of both \(K_1\) and \(K_2\) or the closure of an entire edge of both \(K_1\) and \(K_2\). For a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), we denote by \(S_1({\mathbb {R}}^2, {\mathcal {T}})\) the space of continuous functions \(v\) on \({\mathbb {R}}^2\) such that \(v|_{K} \in {\mathbb {P}}_1\) for every \(K \in {\mathcal {T}}\).
We postpone the proof of Lemma 5.5 to Appendix B.1.
Lemma 5.5
Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary, consisting of straight sides, and with a finite set \({\mathcal {C}}\) of corners. Then, there exists a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), such that for all \(K\in {\mathcal {T}}\) either \(K\subset \varOmega \) or \(K\subset \varOmega ^c\) and such that only finitely many \(K\in {\mathcal {T}}\) satisfy \(K\subset \varOmega \). Moreover, there exist \(N_p\in {\mathbb {N}}\), an open cover \(\{\varOmega _i\}_{i=1}^{N_p}\) of \(\varOmega \), and a partition of unity \(\{\phi _i\}_{i=1}^{N_p}\in \left[ S_1({\mathbb {R}}^2,{\mathcal {T}})\right] ^{N_p}\) on \(\varOmega \) (i.e., \(\sum _{i=1}^{N_p} \phi _i(x) = 1\) for all \(x\in \varOmega \), but this need not hold for \(x\in \varOmega ^c\)) such that
- [P1] \({{\,\mathrm{supp}\,}}(\phi _i)\cap \varOmega \subset \varOmega _i\) for all \(i=1,\ldots ,N_p\),
- [P2] for each \(i\in \{1, \dots , N_p\}\), there exists an affine map \(\psi _i :{\mathbb {R}}^2 \rightarrow {\mathbb {R}}^2\) such that \(\psi _i^{-1}(\varOmega _i) = {\widehat{\varOmega }}_i\) for
  $$\begin{aligned} {\widehat{\varOmega }}_i \in \{(0,1)^2,\; (-1,1)\times (0,1), \; (-1,1)^2\setminus (-1, 0]^2 \}, \end{aligned}$$
- [P3] \({\mathcal {C}} \cap {\overline{\varOmega }}_i \subset \psi _i(\{(0,0)\})\) for all \(i\in \{1, \dots , N_p\}\).
The following statement, then, provides expression rates for the NN approximation of functions in weighted analytic classes in polygonal domains.
We recall that all NNs are realized with the ReLU activation function, see (3.1).
Theorem 5.6
Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary \(\varGamma \) consisting of straight sides and with a finite set \({\mathcal {C}}\) of corners. Let \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}\) be such that \(\min {\underline{\gamma }}>1\). Then, for all \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\) and for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u}\) such that
In addition, as \(\epsilon \rightarrow 0\),
Proof
We introduce, using Lemma 5.5, a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), an open cover \(\{\varOmega _i\}_{i=1}^{N_p}\) of \(\varOmega \), and a partition of unity \(\{\phi _i\}_{i=1}^{N_p} \in \left[ S_1({\mathbb {R}}^2,{\mathcal {T}}) \right] ^{N_p}\) on \(\varOmega \) such that the properties [P1] – [P3] of Lemma 5.5 hold.
We define \({\hat{u}}_i {:}{=}u_{|_{\varOmega _i}}\circ \psi _i : {\widehat{\varOmega }}_i\rightarrow {\mathbb {R}}\). Since \(u\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}}, \emptyset )\) with \(\min {\underline{\gamma }}>1\) and since the maps \(\psi _i\) are affine, we observe that, for every \(i\in \{1, \dots , N_p\}\), there exists \({\underline{{\hat{\gamma }}}}\) with \(\min {\underline{{\hat{\gamma }}}}>1\) such that \({\hat{u}}_i\in {\mathcal {J}}^\varpi _{{\underline{{\hat{\gamma }}}}}({\widehat{\varOmega }}_i, \{(0,0)\}, \emptyset )\), because of [P2] and [P3]. Let
By Theorem 4.3 and by Lemma 5.19 and Theorem 5.14 in the forthcoming Sect. 5.3, there exist \(N_p\) NNs \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\), \(i\in \{1,\dots , N_p\}\), such that
and there exists \(C_\infty >0\) independent of \(\epsilon _1\) such that, for all \(i \in \{1, \dots , N_p\}\) and all \({\hat{\epsilon }} \in (0,1)\)
The NNs given by Theorem 4.3, Lemma 5.19 and Theorem 5.14, which we here denote by \({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}\) for \(i = 1,\ldots ,N_p\), may not have equal depth. Therefore, for all \(i=1,\ldots ,N_p\) and suitable \(L_i\in {\mathbb {N}}\) we define \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1} {:}{=} \varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L_i} \odot {\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}\), so that the depth is the same for all \(i=1,\ldots ,N_p\). To estimate the size of the enlarged NNs, we use the fact that the size of a NN is not smaller than the depth unless the associated realization is constant. In the latter case, we could replace the NN by a NN with one non-zero weight without changing the realization. By this argument, we obtain for all \(i=1,\ldots ,N_p\) that \({{\,\mathrm{M}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \le 2{{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L_i}) + 2{{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}) \le C \max _{j=1,\ldots ,N_p} {{\,\mathrm{L}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_j}_{\epsilon _1}) + C {{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}) \le C \max _{j=1,\ldots ,N_p} {{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_j}_{\epsilon _1})\). Furthermore, as shown in [23], there exist NNs \(\varPhi ^{\phi _i}\), \(i\in \{1, \dots , N_p\}\), such that
Here, we use that \({\mathcal {T}}\) is a partition of \({\mathbb {R}}^2\), so that \(\phi _i\) is defined on all of \({\mathbb {R}}^2\) and [23, Theorem 5.2] applies, which itself is based on [56].
Possibly after concatenating with identity networks in the same way as just described for \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\), we can assume that the NNs \(\varPhi ^{\phi _i}\), \(i=1,\ldots ,N_p\), all have equal depth and that the size of \(\varPhi ^{\phi _i}\) is bounded independently of i.
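The depth-synchronization step used above (concatenation with ReLU identity networks \(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}\)) can be made concrete. The following minimal sketch is our illustration, not the paper's construction: a ReLU NN is stored as a list of affine layers \((W,b)\), and the identity is emulated via \({{\,\mathrm{Id}\,}}(z) = \mathrm{ReLU}(z) - \mathrm{ReLU}(-z)\), so that padding by `extra` layers leaves the realization unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(net, x):
    """Evaluate a ReLU NN given as a list of affine layers (W, b);
    ReLU is applied after every layer except the last."""
    for W, b in net[:-1]:
        x = relu(W @ x + b)
    W, b = net[-1]
    return W @ x + b

def pad_depth(net, extra):
    """Return a NN with `extra` (>= 1) additional layers whose realization
    equals that of `net` (scalar output), emulating the identity via
    Id(z) = ReLU(z) - ReLU(-z)."""
    W, b = net[-1]                                           # final affine layer
    padded = list(net[:-1])
    padded.append((np.vstack([W, -W]), np.hstack([b, -b])))  # z -> (z_+, z_-)
    M = np.array([[1.0, -1.0], [-1.0, 1.0]])                 # (z_+, z_-) -> (z, -z), then ReLU
    for _ in range(extra - 1):
        padded.append((M, np.zeros(2)))
    padded.append((np.array([[1.0, -1.0]]), np.zeros(1)))    # z_+ - z_- = z
    return padded

# Nets of different depth realize the same function after padding.
rng = np.random.default_rng(0)
net = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
       (rng.standard_normal((1, 3)), rng.standard_normal(1))]
x = rng.standard_normal(2)
deep = pad_depth(net, 2)
assert len(deep) == len(net) + 2
assert np.allclose(realize(net, x), realize(deep, x))
```

The size overhead of the padding is proportional to the number of added layers, matching the estimate \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L_i}) \le C L_i\) used in the proof.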
Since by [P2] the mappings \(\psi _i\) are affine and invertible, it follows that \(\psi _i^{-1}\) is affine for every \(i \in \{1, \dots , N_p\}\). Thus, there exist NNs \(\varPhi ^{\psi ^{-1}_i}\), \(i\in \{1, \dots , N_p\}\), of depth 1, such that
Next, we define
and \(M_{\times }(\epsilon _1){:}{=}C_\infty (1+\left| \log \epsilon _1 \right| ^4)\). Finally, we set
where \(L\in {\mathbb {N}}\) is such that \({{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_1}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_1}) = {{\,\mathrm{L}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\phi _1})\), which yields \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}) \le C {{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_1}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_1})\).
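The display defining \(\varPhi _{\epsilon , u}\) is not reproduced in the text above. Schematically, and as a paraphrase rather than a verbatim quotation, the construction parallelizes approximate products of the depth-synchronized partition-of-unity networks with the local approximation networks:

```latex
{{\,\mathrm{R}\,}}(\varPhi_{\epsilon,u})
  \;=\; \sum_{i=1}^{N_p}
  {{\,\mathrm{R}\,}}\left(\varPhi^{\times}_{\epsilon_\times,\,M_\times(\epsilon_1)}\right)
  \left(
    {{\,\mathrm{R}\,}}\big(\varPhi^{{\,\mathrm{Id}\,}}_{1,L}\odot\varPhi^{\phi_i}\big),\;
    {{\,\mathrm{R}\,}}\big(\varPhi^{\hat u_i}_{\epsilon_1}\odot\varPhi^{\psi_i^{-1}}\big)
  \right)
```

Here \(\varPhi ^{\times }_{\epsilon _\times , M_\times (\epsilon _1)}\) denotes the approximate multiplication network of Proposition 3.6 with accuracy \(\epsilon _\times \) on inputs bounded by \(M_\times (\epsilon _1)\); this reading is consistent with the splitting into the terms (I) and (II) in the accuracy estimate (5.14).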
Approximation accuracy. By (5.13), we have for all \(x\in \varOmega \),
Therefore,
We start by considering term (I). For each \(i\in \{1, \dots , N_p\}\), thanks to (5.11) and denoting by \(\Vert J_{\psi ^{-1}_i} \Vert _2^2\) the square of the matrix 2-norm of the Jacobian of \(\psi _i^{-1}\), it holds that
By [40, Theorem 2.1],
We now consider term (II) in (5.14). By Theorem 4.3 and (5.12), it holds that
for all \(i\in \{1, \dots , N_p\}\). Furthermore, by [P1], \(\phi _i(x) = 0\) for all \(x\in \varOmega \setminus \varOmega _i\) and, by Proposition 3.6,
From (5.15), we also have
Hence,
The asserted approximation accuracy follows by combining (5.14), (5.16), and (5.17).
Size of the neural network. To bound the size of the NN, we remark that \(N_p\) and the sizes of \(\varPhi ^{\psi _i^{-1}}\) and of \(\varPhi ^{\phi _i}\) only depend on the domain \(\varOmega \). Furthermore, there exist constants \(C_{\varOmega , i}\), \(i=1,2,3\), that depend only on \(\varOmega \) and u such that
From Theorem 4.3 and Proposition 3.6, in addition, there exist constants \(C^L_{{\hat{u}}}, C^M_{{\hat{u}}}, C_{\times }>0\) such that, for all \(0< \epsilon _1 , \epsilon _\times \le 1\),
Then, by (5.13), we have
The desired depth and size bounds follow from (5.18), (5.19), and (5.20). This concludes the proof. \(\square \)
The exponential expression rate for the class of weighted, analytic functions in \(\varOmega \) by realizations of NNs with ReLU activation in the \(H^1(\varOmega )\)-norm established in Theorem 5.6 implies an exponential expression rate bound on \(\partial \varOmega \), via the trace map and the fact that \(\partial \varOmega \) can be exactly parametrized by the realization of a shallow NN with ReLU activation. This is relevant for NN-based solution of boundary integral equations.
Corollary 5.7
(NN expression of Dirichlet traces) Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary \(\varGamma \) and a finite set \({\mathcal {C}}\) of corners. Let \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}\) such that \(\min {\underline{\gamma }}>1\). For any connected component \(\varGamma \) of \(\partial \varOmega \), let \(\ell _\varGamma >0\) be the length of \(\varGamma \), such that there exists a continuous, piecewise affine parametrization \(\theta :[0,\ell _\varGamma ]\rightarrow {\mathbb {R}}^2:t\mapsto \theta (t)\) of \(\varGamma \) with finitely many affine linear pieces and \(\left\| \tfrac{d}{dt}\theta \right\| _{2} = 1\) for almost all \(t\in [0,\ell _\varGamma ]\).
Then, for all \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\) and for all \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u, \theta }\) approximating the trace \(Tu {:}{=} u_{|_{\varGamma }}\) such that
In addition, as \(\epsilon \rightarrow 0\),
Proof
We note that both components of \(\theta \) are continuous, piecewise affine functions on \([0,\ell _\varGamma ]\); thus each can be represented exactly as the realization of a NN of depth two with the ReLU activation function. Moreover, the number of weights of these NNs is of the order of the number of affine linear pieces of \(\theta \). We denote the parallelization of the NNs emulating exactly the two components of \(\theta \) by \(\varPhi ^{\theta }\).
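The exact depth-two ReLU representation of each component of \(\theta \) invoked here can be written down explicitly. The sketch below (function names and interface are ours) represents a continuous piecewise affine function with breakpoints \(t_0< \dots < t_n\) as \(f(t) = f(t_0) + s_0(t-t_0) + \sum _{j}(s_j - s_{j-1})\,\mathrm{ReLU}(t-t_j)\), i.e., one hidden ReLU layer with one neuron per interior breakpoint:

```python
import numpy as np

def cpwl_as_relu(ts, ys):
    """Exact one-hidden-layer ReLU form of the continuous piecewise affine
    interpolant through (ts[k], ys[k]), ts strictly increasing:
    f(t) = ys[0] + s[0]*(t - ts[0]) + sum_j (s[j] - s[j-1]) * ReLU(t - ts[j]),
    with s[j] the slope on the j-th interval; the sum runs over interior kinks."""
    ts, ys = np.asarray(ts, float), np.asarray(ys, float)
    s = np.diff(ys) / np.diff(ts)      # slopes on the intervals
    kinks = ts[1:-1]                   # interior breakpoints
    jumps = np.diff(s)                 # slope changes at the kinks
    def f(t):
        return ys[0] + s[0] * (t - ts[0]) + np.sum(jumps * np.maximum(t - kinks, 0.0))
    return f

# The representation is exact, not approximate.
ts, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 2.0]
f = cpwl_as_relu(ts, ys)
assert all(abs(f(t) - y) < 1e-14 for t, y in zip(ts, ys))
assert abs(f(2.5) - 1.0) < 1e-14     # midpoint of the last segment (slope 2)
```

The number of weights is proportional to the number of affine pieces, in line with the size bound stated for \(\varPhi ^{\theta }\).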
By continuity of the trace operator \(T: H^1(\varOmega ) \rightarrow H^{1/2}(\partial \varOmega )\) (e.g., [7, 15]), there exists a constant \(C_{\mathrm {\varGamma }}>0\) such that for all \(v\in H^{1}(\varOmega )\) it holds \( \left\| Tv \right\| _{H^{1/2}(\varGamma )} \le C_{\mathrm {\varGamma }}\left\| v \right\| _{H^{1}(\varOmega )}, \) and without loss of generality we may assume \(C_{\mathrm {\varGamma }}\ge 1\).
Next, for any \(\epsilon \in (0,1)\), let \(\varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u}\) be as given by Theorem 5.6. Define \(\varPhi _{\epsilon , u, \theta } {:}{=} \varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u} \odot \varPhi ^{\theta }\). It follows that
The bounds on its depth and size follow directly from Proposition 3.2, Theorem 5.6, and the fact that the depth and size of \(\varPhi ^{\theta }\) are independent of \(\epsilon \). This finishes the proof. \(\square \)
Remark 5.8
The exponent 5 in the bound on the NN size \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u, \theta })\) in Corollary 5.7 is likely not optimal, as it is inherited from the NN approximation rate in \(\varOmega \).
The proof of Theorem 5.6 established exponential expressivity of realizations of NNs with ReLU activation for the analytic class \({\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}}, \emptyset )\) in \(\varOmega \). This implies that realizations of NNs can approximate, with exponential expressivity, solution classes of elliptic PDEs in polygonal domains \(\varOmega \). We illustrate this by formulating concrete results for three problem classes: second-order, linear, elliptic source and eigenvalue problems in \(\varOmega \), and viscous, incompressible flow. To formulate the results, we specify the assumptions on \(\varOmega \).
Definition 5.9
(Linear, second-order, elliptic divergence-form differential operator with analytic coefficients) Let \(d\in \{2, 3\}\) and let \(\varOmega \subset {\mathbb {R}}^d\) be a bounded domain. Let the coefficient functions \(a_{ij},b_i, c:{\overline{\varOmega }}\rightarrow {\mathbb {R}}\) be real analytic in \({\overline{\varOmega }}\), and such that the matrix function \(A = (a_{ij})_{1\le i,j\le d}:\varOmega \rightarrow {\mathbb {R}}^{d \times d}\) is symmetric and uniformly positive definite in \(\varOmega \). With these functions, we define the linear, second-order, elliptic divergence-form differential operator \({{\mathcal {L}}}\) acting on \(w\in C^\infty _0(\varOmega )\) via (summation over repeated indices \(i,j\in \{1,\dots , d\}\))
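The display defining \({{\mathcal {L}}}\) is not reproduced in the text above. In the standard divergence form consistent with the stated coefficients and the summation convention, it reads (a reconstruction on our part, not a verbatim quotation):

```latex
({\mathcal L} w)(x) \;=\;
  -\,\partial_i\big(a_{ij}(x)\,\partial_j w(x)\big)
  \;+\; b_i(x)\,\partial_i w(x)
  \;+\; c(x)\,w(x),
  \qquad x\in\varOmega .
```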
Setting 1
We assume that \(\varOmega \subset {\mathbb {R}}^2\) is an open, bounded polygon with boundary \(\partial \varOmega \) that is Lipschitz and connected. In addition, \(\partial \varOmega \) is the closure of a finite number \(J \ge 3\) of straight, open sides \(\varGamma _j\), i.e., \(\varGamma _{i} \cap \varGamma _j =\emptyset \) for \(i\ne j\) and \(\partial \varOmega = \bigcup _{1\le j \le J} \overline{\varGamma _j}\). We assume the enumeration of the sides \(\varGamma _j\) to be \(J\)-cyclic, i.e., \(\varGamma _{J+1} = \varGamma _1\).
By \(n_j\), we denote the exterior unit normal vector to \(\varOmega \) on side \(\varGamma _j\) and by \({\varvec{c}}_j {:}{=} \overline{\varGamma _{j-1}} \cap \overline{\varGamma _j}\) the corner j of \(\varOmega \).
With \({{\mathcal {L}}}\) as in Definition 5.9, we associate on boundary segment \(\varGamma _j\) a boundary operator \({\mathcal {B}}_j \in \{\gamma ^j_0,\gamma ^j_1\}\), i.e., either the Dirichlet trace \(\gamma _0\) or the distributional (co-)normal derivative operator \(\gamma _1\), acting on \(w\in C^1({\overline{\varOmega }})\) via
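The display defining the two trace operators is likewise missing from the text above. With \(n_j\) the exterior unit normal on \(\varGamma _j\) and \(A\) the diffusion coefficient of Definition 5.9, the standard choices (again a hedged reconstruction, not a verbatim quotation) are:

```latex
\gamma_0^j w \;=\; w\big|_{\varGamma_j},
\qquad
\gamma_1^j w \;=\; \big(A\,\nabla w\big)\cdot n_j\,\big|_{\varGamma_j}.
```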
We collect the boundary operators \({\mathcal {B}}_j\) in \({\mathcal {B}}{:}{=} \{ {\mathcal {B}}_j \}_{j=1}^J\).
The first corollary addresses exponential ReLU expressivity of solutions of the source problem corresponding to \(({{\mathcal {L}}},{\mathcal {B}})\).
Corollary 5.10
Let \(\varOmega \), \({{\mathcal {L}}}\), and \({\mathcal {B}}\) be as in Setting 1 with \(d=2\). For f analytic in \({\overline{\varOmega }}\), let u denote a solution to the boundary value problem
Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u}\) such that
In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).
Proof
The proof is obtained by verifying weighted, analytic regularity of solutions. By [3, Theorem 3.1], there exists \({\underline{\gamma }}\) such that \(\min {\underline{\gamma }}>1\) and \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\). Then, the application of Theorem 5.6 concludes the proof. \(\square \)
Next, we address NN expression rates for eigenfunctions of \(({{\mathcal {L}}},{\mathcal {B}})\).
Corollary 5.11
Let \(\varOmega \), \({{\mathcal {L}}}\), \({\mathcal {B}}\) be as in Setting 1 with \(d=2\), and \(b_i = 0\) in Definition 5.9, and let \( 0 \ne w\in H^1(\varOmega )\) be an eigenfunction of the elliptic eigenvalue problem
Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , w}\) such that
In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , w}) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,w}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).
Proof
The statement follows from the regularity result [4, Theorem 3.1], and Theorem 5.6 as in Corollary 5.10. \(\square \)
The analytic regularity of solutions u in the proof of Theorem 5.6 also holds for certain nonlinear, elliptic PDEs. We illustrate it for the velocity field of viscous, incompressible flow in \(\varOmega \).
Corollary 5.12
Let \(\varOmega \subset {\mathbb {R}}^2\) be as in Setting 1. Let \(\nu >0\) and let \({\varvec{u}}\in H^1_0(\varOmega )^2\) be the velocity field of the Leray solutions of the viscous, incompressible Navier–Stokes equations in \(\varOmega \), with homogeneous Dirichlet (“no slip”) boundary conditions
where the components of \({\varvec{f}}\) are analytic in \({\overline{\varOmega }}\) and such that \(\Vert {\varvec{f}}\Vert _{H^{-1}(\varOmega )}/ \nu ^2\) is small enough so that \({\varvec{u}}\) is unique.
Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , {\varvec{u}}}\) with two-dimensional output such that
In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{u}}})= {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,{\varvec{u}}}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).
Proof
The velocity fields of Leray solutions of the Navier–Stokes equations in \(\varOmega \) satisfy the weighted, analytic regularity \({\varvec{u}} \in \big [{\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}},\emptyset )\big ]^2\), with \(\min {\underline{\gamma }}>1\), see [24, 38]. Then, the application of Theorem 5.6 concludes the proof. \(\square \)
5.3 Elliptic PDEs in Fichera-Type Polyhedral Domains
Fichera-type polyhedral domains \(\varOmega \subset {\mathbb {R}}^3\) are, loosely speaking, closures of finite, disjoint unions of (possibly affinely mapped) axiparallel hexahedra with \(\partial \varOmega \) Lipschitz. In Fichera-type domains, analytic regularity of solutions of linear, elliptic boundary value problems from acoustics and linear elasticity in displacement formulation has been established in [8]. As an example of a boundary value problem covered by [8] and our theory, consider \(\varOmega {:}{=}(-1, 1)^d \setminus (-1, 0]^d\) for \(d=2,3\), displayed for \(d=3\) in Fig. 2.
We recall that all NNs are realized with the ReLU activation function, see (3.1).
We introduce the setting for elliptic problems with analytic coefficients in \(\varOmega \). Note that the boundary of \(\varOmega \) is composed of 6 edges when \(d=2\) and of 9 faces when \(d=3\).
Setting 2
We assume that \({{\mathcal {L}}}\) is an elliptic operator as in Definition 5.9. When \(d=3\), we assume furthermore that the diffusion coefficient \(A\in {\mathbb {R}}^{3\times 3}\) is a symmetric, positive definite matrix and \(b_i = c = 0\). On each edge (if \(d=2\)) or face (if \(d=3\)) \(\varGamma _j\subset \partial \varOmega \), \(j\in \{1, \dots , 3d\}\), we introduce the boundary operator \({\mathcal {B}}_j \in \{\gamma _0,\gamma _1\}\), where \(\gamma _0\) and \(\gamma _1\) are defined as in (5.22). We collect the boundary operators \({\mathcal {B}}_j\) in \({\mathcal {B}}{:}{=} \{ {\mathcal {B}}_j \}_{j=1}^{3d}\).
For a right hand side f, the elliptic boundary value problem we consider in this section is then
The following extension lemma will be useful for the approximation of the solution to (5.29) by NNs. We postpone its proof to Appendix B.2.
Lemma 5.13
Let \(d\in \{2,3\}\) and \(u\in W_{\mathrm {mix}}^{1,1}(\varOmega )\). Then, there exists a function \(v\in W_{\mathrm {mix}}^{1,1}((-1,1)^d)\) such that \(v|_{\varOmega } = u\). The extension is stable with respect to the \(W_{\mathrm {mix}}^{1,1}\)-norm.
We denote the set containing all corners of \(\varOmega \) (including the re-entrant one) as
When \(d=3\), for each \(c\in {\mathcal {C}}\) we denote by \({\mathcal {E}}_c\) the set of edges abutting \(c\), and we denote \({\mathcal {E}}{:}{=}\bigcup _{c\in {\mathcal {C}}}{\mathcal {E}}_c\).
Theorem 5.14
Let \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\) with
Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that
In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \Vert _{L^\infty (\varOmega )} = {\mathcal {O}}(1+\left| \log \epsilon \right| ^{2d})\), as \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), as \(\epsilon \rightarrow 0\).
Proof
By Lemma 5.13, we extend the function u to a function \({{\tilde{u}}}\) such that
Note that, by the stability of the extension, there exists a constant \(C_{\mathrm {ext}}>0\) independent of u such that
Since \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\), it follows that \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(S; {\mathcal {C}}_S, {\mathcal {E}}_S)\) for all
with \({\mathcal {C}}_S = {\overline{S}}\cap {\mathcal {C}}\) and \({\mathcal {E}}_{S} = \{e\in {\mathcal {E}}: e\subset {\overline{S}}\}\). Since \(S \subset \varOmega \) and \({{\tilde{u}}}|_{\varOmega } = u|_{\varOmega }\), we also have
By Theorem A.25, there exist \(C_p>0\), \(C_{{\widetilde{N}}_{\mathrm {1d}}}>0\), \(C_{{\widetilde{N}}_{\mathrm {int}}}>0\), \(C_{{{\tilde{v}}}}>0\), \(C_{{\widetilde{c}}}>0\), and \(b_{{{\tilde{v}}}}>0\) such that, for all \(0<\epsilon \le 1\), there exist \(p\in {\mathbb {N}}\), a partition \({\mathcal {G}}_{\mathrm {1d}}\) of \((-1, 1)\) into \({\widetilde{N}}_{\mathrm {int}}\) open, disjoint, connected subintervals, a \(d\)-dimensional array \({\widetilde{c}}\in {\mathbb {R}}^{{\widetilde{N}}_{\mathrm {1d}}\times \dots \times {\widetilde{N}}_{\mathrm {1d}}}\), and piecewise polynomials \({{\tilde{v}}}_i \in {\mathbb {Q}}_p({\mathcal {G}}_{\mathrm {1d}})\cap H^1((-1,1))\), \(i=1, \dots , {\widetilde{N}}_{\mathrm {1d}}\), such that
and
Furthermore,
From the stability (5.31) and from Lemmas A.21 and A.22, it follows that
i.e., the bound on the coefficients \({\widetilde{c}}\) is independent of the extension \({{\tilde{u}}}\) of u. By Theorem 4.2, there exists a NN \({\varPhi }_{\epsilon , u}\) with the stated approximation properties and asymptotic size bounds. The bound on the \(L^\infty (\varOmega )\)-norm of the realization of \(\varPhi _{\epsilon , u}\) follows as in the proof of Theorem 4.3. \(\square \)
Remark 5.15
Arguing as in Corollary 5.7, a NN with ReLU activation and two-dimensional input can be constructed so that its realization approximates the Dirichlet trace of solutions to (5.29) in \(H^{1/2}(\partial \varOmega )\) at an exponential rate in terms of the NN size \({{\,\mathrm{M}\,}}\).
The following statement now gives expression rate bounds for the approximation of solutions to the Fichera problem (5.29) by realizations of NNs with the ReLU activation function.
Corollary 5.16
Let f be an analytic function on \({\overline{\varOmega }}\), and let u be a solution to (5.29) with operators \({{\mathcal {L}}}\) and \({\mathcal {B}}\) as in Setting 2 and with source term f. Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that
In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), for \(\epsilon \rightarrow 0\).
Proof
By [8, Corollary 7.1, Theorems 7.3 and 7.4] if \(d=3\) and [3, Theorem 3.1] if \(d=2\), there exists \({\underline{\gamma }}\) with \(\gamma _c-d/2>0\) for all \(c\in {\mathcal {C}}\) and \(\gamma _e>1\) for all \(e\in {\mathcal {E}}\) such that \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\). An application of Theorem 5.14 concludes the proof. \(\square \)
Remark 5.17
By [8, Corollary 7.1 and Theorem 7.4], Corollary 5.16 holds verbatim also under the hypothesis that the right-hand side f is weighted analytic, with singularities at the corners/edges of the domain; specifically, (5.33) and the size bounds on the NN \(\varPhi _{\epsilon , u}\) hold under the assumption that there exists \({\underline{\gamma }}\) with \(\gamma _c-d/2>0\) for all \(c\in {\mathcal {C}}\) and \(\gamma _e>1\) for all \(e\in {\mathcal {E}}\) such that
Remark 5.18
The numerical approximation of solutions for (5.29) with a NN in two dimensions has been investigated, e.g., in [33] using the so-called PINNs methodology. There, the loss function was based on minimization of the residual of the NN approximation in the strong form of the PDE. Evidently, a different (smoother) activation than the ReLU activation considered here had to be used. Starting from the approximation of products by NNs with smoother activation functions introduced in [51, Sec. 3.3] and following the same line of reasoning as in the present paper, the results we obtain for ReLU-based realizations of NNs can be extended to large classes of NNs with smoother activations and similar architecture.
Furthermore, in [11, Sect. 3.1], a slightly different elliptic boundary value problem is numerically approximated by realizations of NNs. Its solutions exhibit the same weighted, analytic regularity as considered in this paper. The presently obtained approximation rates by NN realizations extend also to the approximation of solutions for the problem considered in [11].
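The approximate multiplication underlying the ReLU constructions of this paper (Proposition 3.6, which the text traces back to [56]) rests on a sawtooth approximation of \(x^2\) on \([0,1]\). The following minimal sketch is our illustration of that type of construction (function names are ours); it checks the classical error bound \(4^{-(m+1)}\) numerically:

```python
import numpy as np

def hat(x):
    """One sawtooth tooth, 2*min(x, 1-x) on [0,1], expressible with ReLUs as
    2*ReLU(x) - 4*ReLU(x - 0.5) + 2*ReLU(x - 1)."""
    r = np.maximum
    return 2 * r(x, 0) - 4 * r(x - 0.5, 0) + 2 * r(x - 1, 0)

def square_approx(x, m):
    """Sawtooth-type ReLU approximation of x**2 on [0,1]:
    x - sum_{k=1}^m g_k(x)/4**k, g_k the k-fold composition of `hat`.
    The error is bounded by 4**-(m+1), i.e., exponentially small in depth."""
    g, out = x, x
    for k in range(1, m + 1):
        g = hat(g)
        out = out - g / 4**k
    return out

x = np.linspace(0.0, 1.0, 1001)
for m in range(1, 6):
    err = np.max(np.abs(square_approx(x, m) - x**2))
    assert err <= 4.0**(-(m + 1)) + 1e-12
```

An approximate product then follows from the polarization identity \(xy = \tfrac{1}{2}\big((x+y)^2 - x^2 - y^2\big)\) after rescaling the inputs, which is the mechanism behind the \(M_\times \)-dependent product networks used in the proofs above.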
In the proof of Theorem 5.6, we require in particular the approximation of weighted analytic functions on \((-1,1)\times (0,1)\) with a corner singularity at the origin. For convenient reference, we detail the argument in this case.
Lemma 5.19
Let \(d=2\) and \(\varOmega _{DN} {:}{=} (-1,1)\times (0,1)\). Denote \({\mathcal {C}}_{DN} = \{-1,0,1\} \times \{0,1\}\). Let \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset )\) with \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}_{DN}\}\), with \(\gamma _c>1\) for all \(c\in {\mathcal {C}}_{DN}\).
Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that
In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \Vert _{L^\infty (\varOmega _{DN})} = {\mathcal {O}}(1+\left| \log \epsilon \right| ^{4})\) , for \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), for \(\epsilon \rightarrow 0\).
Proof
Let \({{\tilde{u}}}\in W_{\mathrm {mix}}^{1,1}((-1, 1)^2)\) be defined by
such that \({{\tilde{u}}}|_{\varOmega _{DN}} = u\). Here, we used that there exist continuous imbeddings \({\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset ) \hookrightarrow W_{\mathrm {mix}}^{1,1}(\varOmega _{DN}) \hookrightarrow C^0(\overline{\varOmega _{DN}})\) (see Lemma A.22 for the first imbedding), i.e., u can be extended to a continuous function on \(\overline{\varOmega _{DN}}\).
As in the proof of Lemma 5.13, this extension is stable, i.e., there exists a constant \(C_{\mathrm {ext}}>0\) independent of u such that
Because \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset )\), it holds with \({\mathcal {C}}_S = {\overline{S}}\cap {\mathcal {C}}_{DN}\) that \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(S; {\mathcal {C}}_S, \emptyset )\) for all
The remaining steps are the same as those in the proof of Theorem 5.14. \(\square \)
6 Conclusions and Extensions
We review the main findings of the present paper and outline extensions of the present results, and perspectives for further research.
6.1 Principal Mathematical Results
We established exponential expressivity of realizations of NNs with the ReLU activation function in the Sobolev norm \(H^1\) for functions which belong to certain countably normed, weighted analytic function spaces in cubes \(Q=(0,1)^d\) of dimension \(d=2,3\). The admissible function classes comprise functions which are real analytic at points \(x\in Q\), and which admit analytic extensions to the open sides \(F\subset \partial Q\), but may have singularities at corners and (in space dimension \(d=3\)) edges of Q. We have also extended this result to cover exponential expressivity of realizations of NNs with ReLU activation for solution classes of linear, second-order elliptic PDEs in divergence form in plane, polygonal domains and of elliptic, nonlinear eigenvalue problems with singular potentials in three space dimensions. Being essentially an approximation result, the DNN expression rate bound in Theorem 5.6 will apply to any elliptic boundary value problem in polygonal domains where weighted, analytic regularity is available. Apart from the source and eigenvalue problems, such regularity is in space dimension \(d=2\) also available for linearized elastostatics, Stokes flow and general elliptic systems [8, 17, 20].
The established approximation rates of realizations of NNs with ReLU activation are fundamentally based on a novel exponential upper bound on approximation of weighted analytic functions via tensorized hp approximations on multi-patch configurations in finite unions of axiparallel rectangles/hexahedra. The hp approximation result is presented in Theorem A.25 and is of independent interest in the numerical analysis of spectral element methods.
The proofs of exponential expressivity of NN realizations are, in principle, constructive. They are based on explicit bounds on the coefficients of hp projections and on corresponding emulation rate bounds for the (re)approximation of modal hp bases.
6.2 Extensions and Future Work
The tensor structure of the hp approximation considered here limits the geometries of domains that are admissible for our results. Curvilinear, mapped domains with analytic domain maps will allow corresponding approximation rates, with the NN approximations obtained by composing the present constructions with NN emulations of the domain maps, using the fact that compositions of NNs are again NNs.
The only activation function considered in this work is the ReLU. Following the same proof strategy, exponential expression rate bounds can be obtained for functions with smoother, nonlinear activation functions. We refer to Remark 5.18 and to the discussion in [51, Sec. 3.3].
The principal results in Sect. 5.1 prove exponential expressivity of realizations of deep NNs with ReLU activation on solution sets of singular eigenvalue problems with multiple, isolated point singularities and analytic potentials as arise in electron-structure models for static molecules with known loci of the nuclei. Inspection of our proofs reveals that the expression rate bounds are robust with respect to perturbations of the nuclei sites; only interatomic distances enter the constants in the expression rate bounds of Sect. 5.1.2. Given the closedness of NNs under composition, obtaining similar expression rates also for solutions of the vibrational Schrödinger equation appears in principle possible.
The presently proved deep ReLU NN expression rate bounds can, in connection with recently proposed, residual-based DNN training methodologies (see, e.g., [1, 22, 53] and the references there), imply exponential convergence rates of numerical NN approximations of PDE solutions based on machine learning approaches.
Notes
We assume isotropic tensorization, i.e., the same \(\sigma \) and the same number of geometric mesh layers in each coordinate direction; all approximation results remain valid (with possibly better numerical values for the constants in the error bounds) for anisotropic, co-ordinate dependent choices of \(\ell \) and of \(\sigma \).
References
Ainsworth, M., Dong, J.: Galerkin neural networks: A framework for approximating variational equations with error control. SIAM Journal on Scientific Computing 43(4), A2474–A2501 (2021). https://doi.org/10.1137/20M1366587.
Babuška, I., Guo, B.: The \(h\)-\(p\) version of the finite element method for domains with curved boundaries. SIAM J. Numer. Anal. 25(4), 837–861 (1988). https://doi.org/10.1137/0725048.
Babuška, I., Guo, B.: Regularity of the solution of elliptic problems with piecewise analytic data. I. Boundary value problems for linear elliptic equation of second order. SIAM J. Math. Anal. 19(1), 172–203 (1988). https://doi.org/10.1137/0519014.
Babuška, I., Guo, B.Q., Osborn, J.E.: Regularity and numerical solution of eigenvalue problems with piecewise analytic data. SIAM J. Numer. Anal. 26(6), 1534–1560 (1989). https://doi.org/10.1137/0726090.
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020). https://doi.org/10.1137/19M125649X.
Bolley, P., Dauge, M., Camus, J.: Régularité Gevrey pour le problème de Dirichlet dans des domaines à singularités coniques. Comm. Partial Differential Equations 10(4), 391–431 (1985). https://doi.org/10.1080/03605308508820383.
Brenner, S.C., Scott, L.R.: The mathematical theory of finite element methods, Texts in Applied Mathematics, vol. 15, third edn. Springer, New York (2008). https://doi.org/10.1007/978-0-387-75934-0.
Costabel, M., Dauge, M., Nicaise, S.: Analytic regularity for linear elliptic systems in polygons and polyhedra. Math. Models Methods Appl. Sci. 22(8), 1250015, 63 (2012). https://doi.org/10.1142/S0218202512500157.
Daubechies, I., DeVore, R., Dym, N., Faigenbaum-Golovin, S., Kovalsky, S.Z., Lin, K.C., Park, J., Petrova, G., Sober, B.: Neural network approximation of refinable functions (2021). arXiv:2107.13191
Daubechies, I., DeVore, R., Foucart, S., Hanin, B., Petrova, G.: Nonlinear approximation and (deep) ReLU networks. Constructive Approximation 55(1), 127–172 (2022). https://doi.org/10.1007/s00365-021-09548-z.
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018). https://doi.org/10.1007/s40304-018-0127-z.
Elbrächter, D., Grohs, P., Jentzen, A., Schwab, C.: DNN expression rate analysis of high-dimensional PDEs: Application to option pricing. Constructive Approximation 55(1), 3–71 (2022). https://doi.org/10.1007/s00365-021-09541-6.
Flad, H.J., Schneider, R., Schulze, B.W.: Asymptotic regularity of solutions to Hartree-Fock equations with Coulomb potential. Math. Methods Appl. Sci. 31(18), 2172–2201 (2008). https://doi.org/10.1002/mma.1021.
Fournais, S., Hoffmann-Ostenhof, M., Hoffmann-Ostenhof, T., Østergaard Sørensen, T.: Analytic structure of solutions to multiconfiguration equations. J. Phys. A 42(31), 315208, 11 (2009). https://doi.org/10.1088/1751-8113/42/31/315208.
Gagliardo, E.: Caratterizzazioni delle tracce sulla frontiera relative ad alcune classi di funzioni in \(n\) variabili. Rend. Sem. Mat. Univ. Padova 27, 284–305 (1957). http://www.numdam.org/item?id=RSMUP_1957__27__284_0
Gühring, I., Kutyniok, G., Petersen, P.: Error bounds for approximations with deep ReLU neural networks in \(W^{s,p}\) norms. Anal. Appl. (Singap.) 18(05), 803–859 (2020). https://doi.org/10.1142/S0219530519410021.
Guo, B., Babuška, I.: On the regularity of elasticity problems with piecewise analytic data. Adv. in Appl. Math. 14(3), 307–347 (1993). https://doi.org/10.1006/aama.1993.1016.
Guo, B., Babuška, I.: Regularity of the solutions for elliptic problems on nonsmooth domains in \({\mathbb{R}}^3\). I. Countably normed spaces on polyhedral domains. Proc. Roy. Soc. Edinburgh Sect. A 127(1), 77–126 (1997). https://doi.org/10.1017/S0308210500023520.
Guo, B., Babuška, I.: Regularity of the solutions for elliptic problems on nonsmooth domains in \({\mathbb{R}}^3\). II. Regularity in neighbourhoods of edges. Proc. Roy. Soc. Edinburgh Sect. A 127(3), 517–545 (1997). https://doi.org/10.1017/S0308210500029899.
Guo, B., Schwab, C.: Analytic regularity of Stokes flow on polygonal domains in countably weighted Sobolev spaces. J. Comput. Appl. Math. 190(1-2), 487–519 (2006). https://doi.org/10.1016/j.cam.2005.02.018.
Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929, 8 (2019). https://doi.org/10.1016/j.jcp.2019.108929.
Hao, W., Jin, X., Siegel, J.W., Xu, J.: An efficient greedy training algorithm for neural networks and applications in PDEs (2021). arXiv:2107.04466
He, J., Li, L., Xu, J., Zheng, C.: ReLU deep neural networks and linear finite elements. J. Comp. Math. 38 (2020). https://doi.org/10.4208/jcm.1901-m2018-0160.
He, Y., Marcati, C., Schwab, C.: Analytic regularity for the Navier-Stokes equations in polygons with mixed boundary conditions. Tech. Rep. 2021-29, Seminar for Applied Mathematics, ETH Zürich, Switzerland (2021). https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2021/2021-29.pdf
Hermann, J., Schätzle, Z., Noé, F.: Deep-neural-network solution of the electronic Schrödinger equation. Nature Chemistry 12(10), 891–897 (2020). https://doi.org/10.1038/s41557-020-0544-y.
Jentzen, A., Salimova, D., Welti, T.: A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Communications in Mathematical Sciences 19(5), 1167–1205 (2021). https://doi.org/10.4310/CMS.2021.v19.n5.a1.
Kazeev, V., Oseledets, I., Rakhuba, M., Schwab, C.: QTT-Finite-Element approximation for multiscale problems I: model problems in one dimension. Adv. Comput. Math. 43(2), 411–442 (2017). https://doi.org/10.1007/s10444-016-9491-y.
Kazeev, V., Schwab, C.: Quantized tensor-structured finite elements for second-order elliptic PDEs in two dimensions. Numer. Math. 138(1), 133–190 (2018). https://doi.org/10.1007/s00211-017-0899-1.
Laakmann, F., Petersen, P.: Efficient approximation of solutions of parametric linear transport equations by ReLU DNNs. Advances in Computational Mathematics 47(1), 11 (2021). https://doi.org/10.1007/s10444-020-09834-7.
Lieb, E.H., Simon, B.: The Hartree-Fock theory for Coulomb systems. Comm. Math. Phys. 53(3), 185–194 (1977). http://projecteuclid.org/euclid.cmp/1103900699
Lions, P.L.: Solutions of Hartree-Fock equations for Coulomb systems. Comm. Math. Phys. 109(1), 33–97 (1987). http://projecteuclid.org/euclid.cmp/1104116712
Lu, J., Shen, Z., Yang, H., Zhang, S.: Deep network approximation for smooth functions. SIAM Journal on Mathematical Analysis 53(5), 5465–5506 (2021). https://doi.org/10.1137/20M134695X.
Lu, L., Meng, X., Mao, Z., Karniadakis, G.E.: Deepxde: A deep learning library for solving differential equations. SIAM Review 63(1), 208–228 (2021). https://doi.org/10.1137/19M1274067.
Maday, Y., Marcati, C.: Analyticity and \(hp\) discontinuous Galerkin approximation of nonlinear Schrödinger eigenproblems. arXiv e-prints arXiv:1912.07483 (2019).
Maday, Y., Marcati, C.: Regularity and \(hp\) discontinuous Galerkin finite element approximation of linear elliptic eigenvalue problems with singular potentials. Math. Models Methods Appl. Sci. 29(8), 1585–1617 (2019). https://doi.org/10.1142/S0218202519500295.
Maday, Y., Marcati, C.: Weighted analyticity of Hartree-Fock eigenfunctions. Tech. Rep. 2020-59, Seminar for Applied Mathematics, ETH Zürich, Switzerland (2020). https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-59.pdf
Marcati, C., Rakhuba, M., Schwab, C.: Tensor rank bounds for point singularities in \(\mathbb{R}^3\). Adv. Comput. Math. 48(18), 1–57 (2022). https://doi.org/10.1007/s10444-022-09925-7
Marcati, C., Schwab, C.: Analytic regularity for the incompressible Navier-Stokes equations in polygons. SIAM J. Math. Anal. 52(3), 2945–2968 (2020). https://doi.org/10.1137/19M1247334.
Maz’ya, V., Rossmann, J.: Elliptic Equations in Polyhedral Domains, Mathematical Surveys and Monographs, vol. 162. American Mathematical Society, Providence, Rhode Island (2010). https://doi.org/10.1090/surv/162.
Melenk, J.M., Babuška, I.: The partition of unity finite element method: basic theory and applications. Comput. Methods Appl. Mech. Engrg. 139(1-4), 289–314 (1996). https://doi.org/10.1016/S0045-7825(96)01087-0.
Opschoor, J.A.A., Petersen, P.C., Schwab, C.: Deep ReLU networks and high-order finite element methods. Analysis and Applications 18(05), 715–770 (2020). https://doi.org/10.1142/S0219530519410136.
Opschoor, J.A.A., Schwab, C., Zech, J.: Exponential ReLU DNN expression of holomorphic maps in high dimension. Constructive Approximation 55(1), 537–582 (2022). https://doi.org/10.1007/s00365-021-09542-5.
Petersen, P., Voigtlaender, F.: Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw. 108, 296–330 (2018). https://doi.org/10.1016/j.neunet.2018.08.019.
Pfau, D., Spencer, J.S., Matthews, A.G.D.G., Foulkes, W.M.C.: Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Research 2, 033429 (2020). https://doi.org/10.1103/PhysRevResearch.2.033429.
Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018). https://doi.org/10.1016/j.jcp.2017.11.039.
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045.
Schötzau, D., Schwab, C.: Exponential convergence for \(hp\)-version and spectral finite element methods for elliptic problems in polyhedra. Math. Models Methods Appl. Sci. 25(9), 1617–1661 (2015). https://doi.org/10.1142/S0218202515500438.
Schötzau, D., Schwab, C.: Exponential convergence of \(hp\)-FEM for elliptic problems in polyhedra: mixed boundary conditions and anisotropic polynomial degrees. Found. Comput. Math. 18(3), 595–660 (2018). https://doi.org/10.1007/s10208-017-9349-9.
Schötzau, D., Schwab, C., Wihler, T.P.: \(hp\)-dGFEM for second-order elliptic problems in polyhedra I: Stability on geometric meshes. SIAM J. Numer. Anal. 51(3), 1610–1633 (2013). https://doi.org/10.1137/090772034.
Schötzau, D., Schwab, C., Wihler, T.P.: \(hp\)-DGFEM for second order elliptic problems in polyhedra II: Exponential convergence. SIAM J. Numer. Anal. 51(4), 2005–2035 (2013). https://doi.org/10.1137/090774276.
Schwab, C., Zech, J.: Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ. Anal. Appl. (Singap.) 17(1), 19–55 (2019). https://doi.org/10.1142/S0219530518500203.
Sheng, H., Yang, C.: PFNN: A penalty-free neural network method for solving a class of second-order boundary-value problems on complex geometries. J. Comput. Phys. 428, 110085 (2021). https://doi.org/10.1016/j.jcp.2020.110085.
Shin, Y., Zhang, Z., Karniadakis, G.E.: Error estimates of residual minimization using neural networks for linear PDEs. arXiv e-prints arXiv:2010.08019 (2020).
Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018). https://doi.org/10.1016/j.jcp.2018.08.029.
Suzuki, T.: Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019 (2019). https://openreview.net/forum?id=H1ebTsActm
Tarela, J.M., Martínez, M.V.: Region configurations for realizability of lattice piecewise-linear models. Math. Comput. Modelling 30(11-12), 17–27 (1999). https://doi.org/10.1016/S0895-7177(99)00195-8.
Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017). https://doi.org/10.1016/j.neunet.2017.07.002.
Yarotsky, D.: Optimal approximation of continuous functions by very deep ReLU networks. In: Proceedings of the 31st Conference on Learning Theory (COLT 2018), Proceedings of Machine Learning Research, vol. 75, pp. 639–649. PMLR (2018). http://proceedings.mlr.press/v75/yarotsky18a.html
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Endre Süli.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Tensor Product hp Approximation
In this section, we construct the hp tensor product approximation which will then be emulated to obtain the NN expression rate estimates. The main result, Theorem A.25, is an exponential convergence bound for piecewise polynomial approximations with patch-wise tensor product structure in polyhedral domains \(\varOmega \), in dimension \(d=2,3\), which consist of a finite number of cuboids. It is used to prove the NN approximation results in Sect. 5, but it is also of independent interest.
We denote by \(Q=(0,1)^d\), \(d\in \{2,3\}\), the reference cube and introduce the set \({\mathcal {C}}\) containing one of its corners,
and the set of adjacent edges \({\mathcal {E}}\),
The results in this section extend, by rotation or reflection, to the case where \({\mathcal {C}}\) contains any of the corners of Q and \({\mathcal {E}}\) is the set of the adjacent edges when \(d=3\). Most of the section addresses the construction of exponentially consistent hp-quasi-interpolants in the reference cube \((0,1)^d\); in Sect. A.10, the analysis will be extended to domains that are specific finite unions of such patches.
1.1 Product Geometric Mesh and Tensor Product hp Space
We fix a geometric mesh grading factor \(\sigma \in (0,1/2]\). Furthermore, let
In (0, 1), the geometric mesh with \(\ell \) layers is \({\mathcal {G}}^\ell _{1} = \left\{ J^\ell _k: k=0, \dots , \ell \right\} \). Moreover, we denote the nodes of \({\mathcal {G}}^\ell _{1}\) by \(x_0^\ell = 0\) and \(x_k^\ell = \sigma ^{\ell -k+1}\) for \(k=1,\ldots ,\ell +1\). In \((0,1)^d\), the d-dimensional tensor product geometric mesh is
For an element \(K = J^\ell _{k_1}\times \dots \times J^\ell _{k_d}\in {\mathcal {G}}^\ell _d\), \(k_i\in \{0,\ldots ,\ell \}\), we denote by \(d^K_c\) the distance of K from the singular corner and by \(d^K_e\) its distance from the closest singular edge. We observe that
and
The hp tensor product space is defined as
where \({\mathbb {Q}}_p(K) {:}{=}{{\,\mathrm{span}\,}}\left\{ \prod _{i=1}^d (x_i)^{k_i} :k_i\le p, i=1, \dots , d\right\} \). Note that, by construction, \(X_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}= \bigotimes _{i=1}^d X_{{\mathsf {h}}{\mathsf {p}}, 1}^{\ell , p}\).
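For concreteness, the one-dimensional geometric mesh and its nodes can be sketched in a few lines of Python (a minimal illustration; the function name `geometric_mesh_1d` is ours, not from the paper):

```python
# One-dimensional geometric mesh G^l_1 on (0,1) with grading factor sigma:
# nodes x_0 = 0 and x_k = sigma^(l - k + 1) for k = 1, ..., l + 1, so that
# the elements J^l_0, ..., J^l_l shrink geometrically towards x = 0.
def geometric_mesh_1d(sigma, l):
    nodes = [0.0] + [sigma ** (l - k + 1) for k in range(1, l + 2)]
    elements = list(zip(nodes[:-1], nodes[1:]))  # J^l_0, ..., J^l_l
    return nodes, elements
```

Away from the singular point, consecutive element lengths are in the fixed ratio \(1/\sigma\), while \(|J^\ell_0| = \sigma^\ell\).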
For positive integers p and s such that \(1\le s \le p\), we will write
Additionally, we will denote, for all \(\sigma \in (0, 1/2]\),
1.2 Local Projector
We denote the reference interval by \(I=(-1, 1)\) and the reference cube by \({{\widehat{K}}}= (-1, 1)^d\). We also write \(H_{\mathrm {mix}}^1({{\widehat{K}}}) = \bigotimes _{i=1}^d H^1(I)\supset H^d({{\widehat{K}}})\). Let \(p\ge 1\): we introduce the univariate projectors \({{\widehat{\pi }}}_p : H^1(I) \rightarrow {\mathbb {P}}_p(I)\) as
where \(L_n\) is the nth Legendre polynomial, \(L^\infty \)-normalized, and \((\cdot , \cdot )\) is the scalar product of \(L^2((-1,1))\). Note that
For \(( {p_1\dots p_d})\in {\mathbb {N}}^d\), we introduce the projection on the reference element \({{\widehat{K}}}\) as \( {{\widehat{\varPi }}}_{{p_1\dots p_d}} = \bigotimes _{i=1}^d{{\widehat{\pi }}}_{p_i} \). For all \(K\in {\mathcal {G}}^\ell _d\), we introduce an affine transformation from K to the reference element
Remark that since the elements are axiparallel, the affine transformation can be written as a d-fold product of one-dimensional affine transformations \(\phi _k : J_k^\ell \rightarrow I\), i.e., supposing that \(K = J^\ell _{k_1}\times \dots \times J^\ell _{k_d}\), it holds that
Let \(K\in {\mathcal {G}}^\ell _d\) and let \(k_i\), \(i=1, \dots , d\), be the indices such that \(K = J^\ell _{k_1}\times \dots \times J^\ell _{k_d}\). Define, for \(w\in H^1(J_{k_i}^{\ell })\),
For v defined on K such that \(v\circ \varPhi _K^{-1} \in H_{\mathrm {mix}}^1({{\widehat{K}}})\) and for \((p_1, \dots , p_d)\in {\mathbb {N}}^d\), we introduce the local projection operator
We also write \({{\widehat{\varPi }}}_p = {{\widehat{\varPi }}}_{p\dots p}\) and
For later reference, we note the following property of \(\varPi _{{p_1\dots p_d}}^K v\):
Lemma A.1
Let \(K_1, K_2 \subset {\mathbb {R}}^d\), \(d=2,3\) be two axiparallel hypercubes that share one regular face F if \(d=3\) and a regular edge F if \(d=2\) (i.e., if \(d=3\), F is an entire face of both \(K_1\) and \(K_2\), and if \(d=2\) it is an entire edge). Then, for \(v\in H_{\mathrm {mix}}^1({{\,\mathrm{int\,}\,}}({\overline{K}}_1\cup {\overline{K}}_2))\) and \((p_1, \dots , p_d)\in {\mathbb {N}}^d\), the piecewise polynomial
is continuous across F.
Proof
This follows directly from (A.8). \(\square \)
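To make the construction of \({{\widehat{\pi }}}_p\) concrete, the following sketch implements a Legendre-based \(H^1\)-type projector onto \({\mathbb {P}}_p(I)\): the derivative is \(L^2(I)\)-projected onto polynomials of degree below p, the result is integrated, and the value at \(x=-1\) is matched. This is a standard hp-analysis construction; the precise normalization in the paper's definition (A.7) is not reproduced here, so treat this as illustrative only.

```python
import numpy as np

def pi_hat(v, dv, p, n_quad=50):
    """Sketch of an H^1-type projector onto P_p on I = (-1,1).

    L^2-projects the derivative dv onto Legendre polynomials of degree < p,
    integrates, and matches v at x = -1 (a common hp construction; this is
    an illustration, not the paper's exact operator).
    """
    x, w = np.polynomial.legendre.leggauss(n_quad)
    coeffs = []
    for n in range(p):
        Ln = np.polynomial.legendre.Legendre.basis(n)
        # L^2(-1,1) Legendre coefficient of dv: (2n+1)/2 * (dv, L_n)
        coeffs.append((2 * n + 1) / 2.0 * np.sum(w * dv(x) * Ln(x)))

    def projected(t):
        t = np.asarray(t, dtype=float)
        out = np.full_like(t, v(-1.0))  # nodal exactness at x = -1
        for n, cn in enumerate(coeffs):
            P = np.polynomial.legendre.Legendre.basis(n).integ()
            out = out + cn * (P(t) - P(-1.0))
        return out

    return projected
```

Since the degree-0 Legendre coefficient preserves the mean of the derivative, the construction is nodally exact at both endpoints, which is the property used for interelement continuity.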
1.3 Global Projectors
We introduce, for \(\ell , p \in {\mathbb {N}}\), the univariate projector \(\pi _{{\mathsf {h}}{\mathsf {p}}}^{\ell , p}: H^1((0,1)) \rightarrow X_{{\mathsf {h}}{\mathsf {p}}, 1}^{\ell , p}\) as
Note that for all \(\ell \in {\mathbb {N}}\), for \(x\in J_0^\ell \)
The d-variate hp quasi-interpolant is then obtained by tensorization, i.e.,
Remark A.2
By the nodal exactness of the projectors, the operator \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) is continuous across interelement interfaces (see Lemma A.1); hence, its image is contained in \(H^1((0,1)^d)\). The continuity can also be observed from the expansion in terms of continuous, globally defined basis functions given in Proposition A.24.
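The tensorization behind \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) can be mimicked on discrete data: if a univariate operator is represented by a matrix acting on nodal samples, its d-fold tensor product acts on a d-dimensional sample array by applying the matrix along each axis in turn (an illustration on sample arrays, not on the function spaces used in the paper):

```python
import numpy as np

def tensorize(A, V):
    """Apply the d-fold tensor product of the matrix A to the d-dim array V.

    Applying A along each axis in turn realizes (A x A x ... x A) V,
    the discrete analogue of a tensor-product operator.
    """
    for axis in range(V.ndim):
        V = np.moveaxis(np.tensordot(A, V, axes=(1, axis)), 0, axis)
    return V
```

This axis-by-axis application is exactly why tensor-product projectors inherit their properties (stability, nodal exactness) from the univariate factors.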
Remark A.3
The projector \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) is defined on a larger space than \(H_{\mathrm {mix}}^1(Q)\) as specified below (e.g., Remark A.20).
1.4 Preliminary Estimates
The projector on \({{\widehat{K}}}\) given by
has the following property.
Lemma A.4
([50, Propositions 5.2 and 5.3]) Let \(d=3\), \((p_1, p_2, p_3)\in {\mathbb {N}}^3\), and \((s_1, s_2, s_3)\in {\mathbb {N}}^3\) with \(1\le s_i\le p_i\). Then, the projector \({{\widehat{\varPi }}}_{p_1p_2p_3}:H_{\mathrm {mix}}^1({{\widehat{K}}}) \rightarrow {\mathbb {Q}}_{p_1, p_2, p_3}({{\widehat{K}}})\) satisfies
for all \(v\in H^{s_1+1}(I)\otimes H^{s_2+1}(I)\otimes H^{s_3+1}(I)\) and for \(\varPsi _{p_i,s_i}\) defined in (A.5). Here, \(C_{\mathrm {appx}1}\) is independent of \((p_1, p_2, p_3)\), \((s_1, s_2, s_3)\) and v.
Remark A.5
In space dimension \(d=2\), a result analogous to Lemma A.4 holds, see [50].
Lemma A.6
Let \(d=3\), \((p_1,p_2,p_3)\in {\mathbb {N}}^3\), and \((s_1, s_2, s_3)\in {\mathbb {N}}^3\) with \(1\le s_i\le p_i\). Further, let \(\{i, j, k\}\) be a permutation of \(\{1,2,3\}\). Then, the projector \({{\widehat{\varPi }}}_{p_1p_2p_3}:H_{\mathrm {mix}}^1({{\widehat{K}}}) \rightarrow {\mathbb {Q}}_{p_1, p_2, p_3}({{\widehat{K}}})\) satisfies
for all \(v\in H^{s_1+1}(I)\otimes H^{s_2+1}(I)\otimes H^{s_3+1}(I)\). Here, \(C_{\mathrm {appx}2}>0\) is independent of \((p_1, p_2, p_3)\), \((s_1, s_2, s_3)\), and v.
Proof
Let \((p_1,p_2,p_3)\in {\mathbb {N}}^3\), and \((s_1, s_2, s_3)\in {\mathbb {N}}^3\), be as in the statement of the lemma. Also, let \(i\in \{1,2,3\}\) and \(\{j, k\} = \{1,2 ,3\} \setminus \{i\}\). By Lemma A.4, it holds that
with \(C_{\mathrm {appx}1}>0\) independent of \((p_1, p_2, p_3)\), \((s_1, s_2, s_3)\), and v. Now let \({\overline{v}}_i:{{\widehat{K}}}\rightarrow {\mathbb {R}}\) be such that, when \(i=1\),
and let \({\overline{v}}_2\) and \({\overline{v}}_3\) be defined analogously. We set \({{\tilde{v}}}{:}{=}v - {\overline{v}}_i\) and, remarking that \(\partial _{x_i}{\overline{v}}_i = \partial _{x_i}{{\widehat{\varPi }}}_{p_1 p_2 p_3}{\overline{v}}_i = 0\), we apply (A.17) to \({{\tilde{v}}}\), so that
By the Poincaré inequality, it holds for all \(\alpha _1\in \{0,1\}\) that
Using the fact that \(\partial _{x_i}{{\tilde{v}}}= \partial _{x_i} v\) in the remaining terms of (A.18) concludes the proof. \(\square \)
1.4.1 One-Dimensional Estimate
The following result is a consequence of, e.g., [48, Lemma 8.1] and scaling.
Lemma A.7
There exists \(C>0\) such that for all \(\ell \in {\mathbb {N}}\), all integers \(0<k\le \ell \), all integers \(1\le s \le p\), all \(\gamma >0\), and all \(v\in H^{s+1}(J^\ell _k)\)
where \(h= |J^\ell _k| \simeq \sigma ^{\ell -k}\) and \(\tau _\sigma \) is as defined in (A.6).
Proof
From [48, Lemma 8.1], there exists \(C>0\) independent of p, k, s, and v such that
In addition, for all \(k=1, \dots , \ell \), it holds that \(x|_{J^\ell _k}\ge \frac{\sigma }{1-\sigma }h\). Hence, for all \(\gamma < s+1\),
This concludes the proof. \(\square \)
1.4.2 Estimate at a Corner in Dimension \(d=2\)
We consider now a setting with a two-dimensional corner singularity. Let \(\beta \in {\mathbb {R}}\), \({\mathfrak {K}}=J_0^\ell \times J_0^\ell \), \(r(x) = |x-x_0|\) with \(x_0 = (0,0)\) and define the corner-weighted norm \(\Vert v \Vert _{{\mathcal {J}}^{2}_\beta ({\mathfrak {K}})}\) by
Lemma A.8
Let \(d = 2\), \(\beta \in (1,2)\). There exist \(C_1, C_2>0\) such that for all \(v\in {\mathcal {J}}^2_\beta ({\mathfrak {K}})\)
and
Proof
Denote by \(c_i\), \(i=1, \dots , 4\) the corners of \({\mathfrak {K}}\) and by \(\psi _i\), \(i=1, \dots , 4\) the bilinear functions such that \(\psi _i(c_j) = \delta _{ij}\). Then,
Therefore, writing \(h=\sigma ^\ell \), we have
With the imbedding \({\mathcal {J}}^2_\beta ((0,1)^2)\hookrightarrow L^\infty ((0,1)^2)\) which is valid for \(\beta >1\) (which follows, e.g., from Lemma A.22 and \(W_{\mathrm {mix}}^{1,1}((0,1)^2)\hookrightarrow L^\infty ((0,1)^2)\)), a scaling argument gives
so that we obtain
For any \({|\alpha |}= 1\), denoting \(v_0 = v(0,0)\) and using the fact that \((\pi ^0_1\otimes \pi _1^0) v_0 = v_0\), whence \(\partial ^\alpha (\pi ^0_1\otimes \pi _1^0) v_0 = 0\),
With the imbedding \({\mathcal {J}}^2_\beta ((0,1)^2)\hookrightarrow L^\infty ((0,1)^2)\), Poincaré’s inequality, and rescaling we obtain
which finishes the proof of (A.20). To prove (A.21), note that \(v\in W^{2,1}({\mathfrak {K}})\), as shown in the final estimate of this proof. By the Sobolev imbedding of \(W^{2,1}({\mathfrak {K}})\) into \(H^1({\mathfrak {K}})\) and by scaling, we have
By classical interpolation estimates [7, Theorem 4.4.4], we additionally conclude that
Using the Cauchy–Schwarz inequality,
where we also have used, in the last step, the facts that \(r(x)\le \sqrt{2}h\) for all \(x\in {\mathfrak {K}}\) and that \(\beta >1\). \(\square \)
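The bilinear functions \(\psi _i\) used at the start of the proof can be written down explicitly on the unit square (the corner ordering below is our choice; the argument only uses \(\psi _i(c_j)=\delta _{ij}\) and that the \(\psi _i\) sum to one):

```python
# Corners c_1, ..., c_4 of the unit square and the bilinear shape
# functions psi_i with psi_i(c_j) = delta_ij (ordering is illustrative).
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def psi(i, x, y):
    cx, cy = corners[i]
    return (x if cx == 1.0 else 1.0 - x) * (y if cy == 1.0 else 1.0 - y)
```

On a general axiparallel element, these functions are obtained from the reference ones by the affine map \(\varPhi _K\); the Kronecker property is what makes the low-order interpolant reproduce corner values exactly.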
1.5 Interior Estimates
The following lemmas estimate the approximation error on the elements that do not belong to edge or corner layers. For \(d=3\), all \(\ell \in {\mathbb {N}}\), all \(k_1,k_2,k_3\in \{0,\ldots ,\ell \}\), and all \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^{\ell }_{k_3}\), we denote by \(h_\parallel \) the length of K in the direction parallel to the closest singular edge, and by \(h_{\bot ,1}\) and \(h_{\bot ,2}\) the lengths of K in the other two directions. If an element has multiple closest singular edges, we fix one of them and consider it as the “closest edge” for all points in that element. When considering functions from \({\mathcal {J}}^d_{\underline{\gamma }}(Q)\), \(\gamma _e\) will refer to the weight of this closest edge. Similarly, we denote by \(\partial _{\parallel }\) (resp. \(\partial _{\bot ,1}\) and \(\partial _{\bot ,2}\)) the derivatives in the direction parallel (resp. perpendicular) to the closest singular edge.
Lemma A.9
Let \(d=3\), \(\ell \in {\mathbb {N}}\) and \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^{\ell }_{k_3}\) for \(0< k_1, k_2, k_3\le \ell \). Let also \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}},{\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2, 5/2)\), \(\gamma _e \in (1, 2)\). Then, there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}2}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that for all \(1\le s\le p\)
where \(\partial _{\parallel }\) is the derivative in the direction parallel to the closest singular edge.
Proof
We write \(d_a = d_a^K\), \(a \in \{c, e\}\). It holds that
Denoting \({\hat{v}}= v \circ \varPhi _K^{-1}\) and \({{\widehat{\varPi }}}_p {\hat{v}}= ( \varPi ^K_p v ) \circ \varPhi _K^{-1} = {{\widehat{\varPi }}}_p (v \circ \varPhi _K^{-1})\), using the result of Lemma A.6 and rescaling, we have
Denote \(K_c= K\cap Q_c\), \(K_e=K\cap Q_e\), \(K_{ce} = K\cap Q_{ce}\), and \(K_0 = K\cap Q_0\). Furthermore, we indicate
and proceed similarly for the other terms of the sums (II) and (III) and for the other subscripts e, \(ce\), 0. Remark also that \(r_{i|_K}\ge d_i\), \(i\in \{c, e\}\), and that, for \(a, b\in {\mathbb {R}}\), it holds that \(r_c^a r_e^b = r_c^{a+b} \rho _{ce}^{b}\).
We will also write \({{\widetilde{\gamma }}}= \gamma _c-\gamma _e\). We start by considering the term \((I)_{ce}\). Let \(\alpha _1= \alpha _2 = 1\); then,
where \(\tau _\sigma \) is as in (A.6). Furthermore, if \(\alpha _1 + \alpha _2\le 1\) and \(s+1+\alpha _1+\alpha _2-\gamma _c\ge 0\),
where we have also used \(d_e\le d_c\). Therefore,
If \(s+1+\alpha _1+\alpha _2-\gamma _c<0\), then \(s=1\) and \(\alpha _1=\alpha _2=0\), thus
Then, if \(s+1+\alpha _1+\alpha _2-\gamma _c\ge 0\)
where the last inequality follows also from \(d_e\le d_c\). If \(s+1+\alpha _1+\alpha _2-\gamma _c<0\), then the same bound holds with \(d_c^{2\gamma _c-2}\) replaced by \(d_c^2\). Similarly,
where we used that \(d_e\le 1\). The bound on \((I)_0\) follows directly from the definition:
Using (2.1), there exist \(C>0\), dependent only on \(C_v\) and \(\sigma \), and \(A>0\), dependent only on \(A_v\) and \(\sigma \), such that
We then apply the same argument to the terms (II) and (III). Indeed,
and the estimate for \((III)_{ce}\) follows by exchanging \(h_{\bot ,1}\) and \(\partial _{\bot ,1}\) with \(h_{\bot ,2}\) and \(\partial _{\bot ,2}\) in the inequality above. The estimates for \((II)_{c,e,0}\) and \((III)_{c, e, 0}\) can be obtained as for \((I)_{c, e, 0}\):
Therefore, we have
We obtain from (A.26), (A.27), and (A.28) that there exist \(C>0\) (dependent only on \(\sigma \), \(C_{\mathrm {appx}2}\), \(C_v\)) and \(A>0\) (dependent only on \(\sigma \), \(A_v\)) such that
Considering that
completes the proof. \(\square \)
Lemma A.10
Let \(d=3\), \(\ell \in {\mathbb {N}}\) and \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^{\ell }_{k_3}\) for \(0< k_1, k_2, k_3\le \ell \). Let also \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}}, {\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2, 5/2)\), \(\gamma _e \in (1, 2)\). Then, there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}2}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that for all \(p\in {\mathbb {N}}\) and all \(1\le s \le p\)
where \(\partial _{\bot ,1}\), \(\partial _{\bot ,2}\) are the derivatives in the directions perpendicular to the closest singular edge.
Proof
The proof follows closely that of Lemma A.9, and we use the same notation. From Lemma A.6 and rescaling, we have
As before, we will write \({{\widetilde{\gamma }}}= \gamma _c-\gamma _e\). We start by considering the term \((I)_{ce}\). When \(\alpha _1 = 1\),
where \(d_c^{2{{\widetilde{\gamma }}}}d_e^{2\gamma _e-2} \le d_c^{2\gamma _c-2}\). Furthermore, if \(\alpha _1 =0\),
Therefore,
The estimates for \((I)_{c, e, 0}\) follow from the same technique:
Hence, from (2.1), there exist \(C>0\), dependent only on \(C_v\) and \(\sigma \), and \(A>0\), dependent only on \(A_v\) and \(\sigma \), such that
We then apply the same argument to the terms (II) and (III). Indeed, if \(s+1+\alpha _1+\alpha _2-\gamma _c\ge 0\)
where in the last step we have used that \(\gamma _e>1\) and \(d_e\le d_c\). If \(s+1+\alpha _1+\alpha _2-\gamma _c<0\), then
Thus, using \(d_e\le d_c\),
The estimates for \((II)_{c,e,0}\) and \((III)_{ce, c, e, 0}\) can be obtained as above:
if \(s+1+\alpha _1+\alpha _2-\gamma _c\ge 0\), then
if \(s+1+\alpha _1+\alpha _2-\gamma _c<0\), then
so that
Therefore, we have
We obtain from (A.30), (A.31), and (A.32) that there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}2}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that
Considering that
and considering that the estimate for the other term on the left-hand side of (A.29) is obtained by exchanging \(\{h, \partial \}_{\bot ,1}\) with \(\{h, \partial \}_{\bot ,2}\) completes the proof. \(\square \)
Lemma A.11
Let \(d=3\), \(\ell \in {\mathbb {N}}\) and \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^{\ell }_{k_3}\) for \(0< k_1, k_2, k_3\le \ell \). Let also \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}}, {\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2, 5/2)\), \(\gamma _e \in (1, 2)\). Then, there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}1}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that for all \(p\in {\mathbb {N}}\) and all \(1\le s \le p\)
Proof
The proof follows closely that of Lemmas A.9 and A.10; we use the same notation. From Lemma A.4 and rescaling, we have
Most terms on the right-hand side above have already been considered in the proofs of Lemmas A.9 and A.10, and the terms with \(\alpha _1 = \alpha _2 = 0\) can be estimated similarly; the observation that
concludes the proof. \(\square \)
We summarize Lemmas A.9 to A.11 in the following result.
Lemma A.12
Let \(d=3\), \(\ell \in {\mathbb {N}}\) and \(K=J^\ell _{k_1}\times J^\ell _{k_2}\times J^\ell _{k_3}\) such that \(0<k_1,k_2,k_3\le \ell \). Let also \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}}, {\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2, 5/2)\), \(\gamma _e \in (1, 2)\). Then, there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}1}\), \(C_{\mathrm {appx}2}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that for all \(p\in {\mathbb {N}}\) and all \(1\le s \le p\)
We then consider elements on the faces (but not abutting edges) of Q.
Lemma A.13
Let \(d=3\), \(\ell \in {\mathbb {N}}\) and \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^{\ell }_{k_3}\) such that \(k_j =0\) for one \(j\in \{1,2,3\}\) and \(0<k_i\le \ell \) for \(i\ne j\). For all \(p\in {\mathbb {N}}\) and all \(1\le s \le p\), let \(p_j= 1\) and \(p_i = p\in {\mathbb {N}}\) for \(i\ne j\). Let also \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}}, {\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2, 5/2)\), \(\gamma _e \in (1, 2)\). Then, there exist \(C>0\), dependent only on \(\sigma \), \(C_{\mathrm {appx}1}\), \(C_{\mathrm {appx}2}\), \(C_v\), and \(A>0\), dependent only on \(\sigma \), \(A_v\), such that
Proof
We write \(d_a = d_a^K\), \(a \in \{c, e\}\). Suppose, for ease of notation, that \(j=3\), i.e., \(k_3=0\). The projector is then given by \(\varPi ^K_{p p 1} = \pi ^{k_1}_p\otimes \pi ^{k_2}_p\otimes \pi ^0_1\). Also, we denote \(h_{\bot ,2}= \sigma ^\ell \) and \(\partial _{\bot ,2}= \partial _{x_3}\). By (A.16),
The bounds on the terms (I) and (II) can be derived as in Lemma A.9 and give
We consider then term (III): with the usual notation, writing \({{\widetilde{\gamma }}}= \gamma _c-\gamma _e\),
Note that \(d_c\ge d_e\) and
where we have also used that \(d_c\le 1\). Hence,
The bounds on the terms \((III)_{c, e, 0}\) follow by the same argument:
Then,
The bounds on the first two terms in the right-hand side above can be obtained as in Lemma A.10:
while the last term can be bounded as in (A.39),
so that
The same holds true for the last term of the gradient of the approximation error, given by
From Lemma A.10, we obtain
whereas for the third term, it holds that if \(\alpha _1+\alpha _2+2-\gamma _c\ge 0\)
and if \(\alpha _1+\alpha _2+2-\gamma _c< 0\), then
and for all \(\alpha _1+\alpha _2+2-\gamma _c\in {\mathbb {R}}\), \((III)_{e}\) and \((III)_{0}\) satisfy the bounds that \((III)_{ce}\) and \((III)_{c}\) satisfy in case \(\alpha _1+\alpha _2+2-\gamma _c< 0\), so that
Finally, the bound on the \(L^2(K)\)-norm of the approximation error can be obtained by a combination of the estimates above. \(\square \)
The exponential convergence of the approximation in internal elements (i.e., elements not abutting a singular edge or corner) follows from Lemmas A.9 to A.13.
Lemma A.14
Let \(d=3\) and \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q; {\mathcal {C}}, {\mathcal {E}})\) with \(\gamma _c>3/2\), \(\gamma _e>1\). There exists a constant \(C_0>0\) such that if \(p\ge C_0 \ell \), there exist constants \(C, b>0\) such that for every \(\ell \in {\mathbb {N}}\)
Proof
We suppose, without loss of generality, that \(\gamma _c\in (3/2, 5/2)\), and \(\gamma _e\in (1,2)\). The general case follows from the inclusion \({\mathcal {J}}^\varpi _{{\underline{\gamma }}_1}(Q;{\mathcal {C}}, {\mathcal {E}}) \subset {\mathcal {J}}^\varpi _{{\underline{\gamma }}_2}(Q;{\mathcal {C}}, {\mathcal {E}})\), valid for \({\underline{\gamma }}_1 \ge {\underline{\gamma }}_2\). Fix any \(C_0>0\) and choose \(p\ge C_0 \ell \). For all \(A>0\), there exist \(C_1, b_1> 0\) such that (see, e.g., [50, Lemma 5.9])
From (A.35) and (A.36), it follows that
where \(d_f^K\) indicates the distance of an element K to one of the faces of Q. We have directly \((I)\le C\ell ^2 e^{-b_1 p}\). Furthermore, because \((\min (\gamma _c, \gamma _e)-2)<0\),
Adjusting the constants in the exponent to absorb the terms in \(\ell \) and \(\ell ^2\), we obtain the desired estimate. \(\square \)
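The last step of the proof, absorbing the algebraic factors \(\ell \) and \(\ell ^2\) into the exponential, can be checked numerically: for any \(0<b<a\log (1/\sigma )\), the quantity \(\ell ^2\sigma ^{a\ell }e^{b\ell }\) stays bounded and tends to zero (the values of \(\sigma \), a, b below are arbitrary choices for illustration):

```python
import math

# Sanity check that l^2 * sigma^(a*l) <= C * exp(-b*l) for some C, i.e.
# that the ratio l^2 * sigma^(a*l) * exp(b*l) is bounded and decays.
sigma, a = 0.5, 1.0
b = 0.5 * a * math.log(1.0 / sigma)  # any b strictly below a*log(1/sigma)
ratios = [l ** 2 * sigma ** (a * l) * math.exp(b * l) for l in range(1, 200)]
```

The ratio equals \(\ell ^2 e^{(b-a\log (1/\sigma ))\ell }\), which peaks at a moderate \(\ell \) and then decays exponentially; this is the elementary fact behind "adjusting the constants in the exponent".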
A similar statement holds when \(d=2\), and the proof follows along the same lines.
Lemma A.15
Let \(d=2\) and \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\) with \(\gamma _c>1\). There exists a constant \(C_0>0\) such that if \(p\ge C_0 \ell \), there exist constants \(C, b>0\) such that
1.6 Estimates on Elements Along an Edge in Three Dimensions
In the following lemma, we consider the elements K along one edge, but separated from the singular corner.
Lemma A.16
Let \(d=3\), \(e\in {\mathcal {E}}\) and let \(K\in {\mathcal {G}}^\ell _3\) be such that \(d_c^K >0\) for all \(c\in {\mathcal {C}}\) and \(d_e^K=0\). Let \(C_v, A_v>0\). Then, if \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}}; C_v, A_v)\) with \(\gamma _c\in (3/2,5/2)\), \(\gamma _e\in (1,2)\), there exist \(C, A>0\) such that for all \(p\in {\mathbb {N}}\) and all \(1\le s\le p\), with \((p_1,p_2,p_3)\in {\mathbb {N}}^3\) such that \(p_\parallel = p\), \(p_{\perp ,1} = 1 = p_{\perp ,2}\),
where \(k\in \{1, \dots , \ell \}\) is such that \(d_c^K = \sigma ^{\ell -k+1}\).
Proof
We suppose that \(K=J^\ell _k\times J^\ell _0\times J^\ell _0\) for some \(k\in \{1, \dots , \ell \}\); the elements along the other edges follow by symmetry. This implies that the singular edge is parallel to the first coordinate direction. Furthermore, we denote
For \(\alpha = (\alpha _1, \alpha _2, \alpha _3) \in {\mathbb {N}}_0^3\), we write \({\alpha _\parallel }= (\alpha _1, 0, 0)\) and \({\alpha _\bot }= (0, \alpha _2, \alpha _3)\). Also,
We have
We start by considering the first terms on the right-hand side of the above equation. We also compute the norms over \(K_{ce} = K\cap Q_{ce}\); the estimate on the norms over \(K_c= K\cap Q_c\) and \(K_e = K\cap Q_e\) follow by similar or simpler arguments. By (A.21) from Lemma A.8, we have that if \(\gamma _c< 2\)
whereas for \(\gamma _c\ge 2\)
On \(K_e\), the same bound holds as on \(K_{ce}\) for \(\gamma _c\ge 2\), and on \(K_c\) the same bounds hold as on \(K_{ce}\) for \(\gamma _c<2\). By the same argument, for \({|\alpha _\parallel |}=1\),
and
We now turn to the second part of the right-hand side of (A.41). We use (A.20) from Lemma A.8 so that
By Lemma A.7 we have, recalling that \({|\alpha _\parallel |}=s+1\) and \(1\le s \le p\), for all \({|\alpha _\bot |}\le 1\),
and, for all \({|\alpha _\bot |}=2\), using that \(\pi _\parallel \) and multiplication by \(r_e\) commute, because \(r_e\) does not depend on \(x_1\),
Then, remarking that \(|x_1| \lesssim r_c\lesssim |x_1|\), combining (A.44) with the two inequalities above we obtain
Adjusting the exponents of the weights and replacing \(h_\parallel \) and \(h_\bot \) by their definitions, we find that there exists \(A>0\) depending only on \(\sigma \) and \(A_v\) such that
and similarly
and the estimate on \(K_c\) is the same as that on \(K_{ce}\). Similarly to (A.44), using first (A.23) from the proof of Lemma A.8, and then Lemma A.7
As before, there exists \(A>0\) depending only on \(\sigma \) and \(A_v\) such that
and
and the estimate on \(K_c\) is the same as that on \(K_{ce}\). The assertion now follows from (A.42), (A.43), (A.45), and (A.46), upon possibly adjusting the value of the constant A. \(\square \)
Lemma A.17
Let \(d=3\) and \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\) with \(\gamma _c>3/2\), \(\gamma _e>1\). There exists a constant \(C_0>0\) such that if \(p\ge C_0 \ell \), there exist constants \(C, b>0\) such that
Proof
As in the proof of Lemma A.14, we may assume that \(\gamma _c\in (3/2,5/2)\) and \(\gamma _e\in (1,2)\). The proof of the statement follows by summing over the right-hand side of (A.40), i.e.,
We have \((II) \lesssim \ell \sigma ^{2(\min (\gamma _c, \gamma _e)-1)\ell }\). To bound (I), we observe that for all \(A>0\) there exist \(C_1, b_1>0\) such that
(see, e.g., [50, Lemma 5.9]). Combining with \(p\ge C_0\ell \) concludes the proof. \(\square \)
1.7 Estimates at the Corner
The lemma below follows from classic low-order finite element approximation results and from the embedding \({\mathcal {J}}^2_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\subset H^{1+\theta }(Q)\), valid for some \(\theta >0\) if \( \gamma _c-d/2>0\) for all \(c\in {\mathcal {C}}\) and, when \(d=3\), \(\gamma _e >1\) for all \(e\in {\mathcal {E}}\) (see, e.g., [48, Remark 2.3]).
Lemma A.18
Let \(d \in \{2,3\}\). Then, if \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\) with
there exists a constant \(C_0>0\) independent of \(\ell \) such that if \(p\ge C_0\ell \), there exist constants \(C,b>0\) such that
1.8 Exponential Convergence
The exponential convergence of the approximation on the full domain Q then follows from Lemmas A.14, A.15, A.17, and A.18.
Proposition A.19
Let \(d \in \{2,3\}\), \(v\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\) with
Then, there exist constants \(c_p>0\) and \(C, b>0\) such that, for all \(\ell \in {\mathbb {N}}\),
With respect to the dimension of the discrete space \(N_{\mathrm {dof}}= \dim (X_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , c_p\ell })\), the above bound reads
1.9 Explicit Representation of the Approximant in Terms of Continuous Basis Functions
Let \(p\in {\mathbb {N}}\). Let \({\hat{\zeta }}_1(x) = (1+x)/2\) and \({\hat{\zeta }}_2(x) = (1-x)/2\). Let also \({\hat{\zeta }}_n(x) = \frac{1}{2}\int _{-1}^xL_{n-2}(\xi )d\xi \), for \(n=3, \dots , p+1\), where \(L_{n-2}\) denotes the \(L^\infty ((-1,1))\)-normalized Legendre polynomial of degree \(n-2\) introduced in Sect. A.2. Then, fix \(\ell \in {\mathbb {N}}\) and write \(\zeta ^k_n = {\hat{\zeta }}_n \circ \phi _k\), for \(n=1,\dots , p+1\) and \(k=0, \dots , \ell \), with the affine map \(\phi _k:J_{k}^\ell \rightarrow (-1,1)\) introduced in Sect. A.2. We construct these functions explicitly: denoting \(J^\ell _k = (x_k^{\ell }, x_{k+1}^{\ell })\) and \(h_k = |x_{k+1}^{\ell }-x_k^{\ell }|\), we have, for \(x\in J_k^\ell \),
and
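The composed nodal shape functions admit closed forms; the following is a reconstruction from the definitions of \({\hat{\zeta }}_1\) and \({\hat{\zeta }}_2\) above (not a quotation of (A.47)–(A.48)), with \(\phi _k(x) = 2(x-x_k^\ell )/h_k - 1\):

```latex
\zeta^k_1(x) = \hat\zeta_1(\phi_k(x)) = \frac{x - x^{\ell}_{k}}{h_k},
\qquad
\zeta^k_2(x) = \hat\zeta_2(\phi_k(x)) = \frac{x^{\ell}_{k+1} - x}{h_k},
\qquad x \in J^{\ell}_{k}.
```

For \(n\ge 3\), the function \(\zeta ^k_n\) vanishes at both endpoints of \(J^\ell _k\): at the left endpoint because \({\hat{\zeta }}_n(-1)=0\), and at the right endpoint because \(\int _{-1}^{1}L_{n-2}(\xi )\,d\xi = 0\) for \(n-2\ge 1\).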
Let \(d=3\). Then, for any element \(K\in {\mathcal {G}}^\ell _3\), with \(K = J^\ell _{k_1}\times J^\ell _{k_2}\times J^\ell _{k_3}\), there exist coefficients \(c^K_{{i_1\dots i_d}}\) such that
by construction. We remark that, whenever \(i_j > 2\) for all \(j=1,2,3\), the basis functions vanish on the boundary of the element:
Furthermore, write
and consider \(t_{{i_1\dots i_d}} = \# \{i_j\le 2,\, j=1,2,3\}\). We have
-
if \(t_{{i_1\dots i_d}} = 1\), then \(\psi _{{i_1\dots i_d}}^K\) is nonzero on only one face of the boundary of K,
-
if \(t_{{i_1\dots i_d}} = 2\), then \(\psi _{{i_1\dots i_d}}^K\) is nonzero on only one edge and its neighboring faces of the boundary of K,
-
if \(t_{{i_1\dots i_d}} = 3\), then \(\psi _{{i_1\dots i_d}}^K\) is nonzero on only one corner and its neighboring edges and faces of the boundary of K.
Similar arguments hold when \(d=2\).
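The classification by \(t_{{i_1\dots i_d}}\) can be checked combinatorially: with indices in \(\{1,\dots ,p+1\}^3\), there are \(\binom{3}{t}\,2^t\,(p-1)^{3-t}\) basis functions with a given value of t. A short Python check (the helper name `classify_modes` is ours):

```python
from itertools import product

def classify_modes(p):
    """Count the tensor-product basis functions on one element (d = 3)
    by t = #{j : i_j <= 2}: t = 0 internal, 1 face, 2 edge, 3 node modes."""
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for idx in product(range(1, p + 2), repeat=3):  # i_j in {1, ..., p+1}
        counts[sum(1 for i in idx if i <= 2)] += 1
    return counts

counts = classify_modes(4)  # p = 4: (p-1)**3 = 27 internal modes, etc.
assert counts == {0: 27, 1: 54, 2: 36, 3: 8}
assert sum(counts.values()) == (4 + 1) ** 3  # all (p+1)^3 basis functions
```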
1.9.1 Explicit Bounds on the Coefficients
We derive here a bound on the coefficients of the local projectors with respect to the norms of the projected function. We will use that
Remark A.20
As mentioned in Remark A.3, the hp-projector \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) can be defined for more general functions than \(u\in H_{\mathrm {mix}}^1(Q)\). As follows from Equations (A.53), (A.57), (A.61) and (A.64) below, the projector is also defined for \(u\in W_{\mathrm {mix}}^{1,1}(Q)\).
Lemma A.21
There exist constants \(C_1, C_2\) such that, for all \(u\in W_{\mathrm {mix}}^{1,1}(Q)\), all \(\ell \in {\mathbb {N}}\), all \(p\in {\mathbb {N}}\)
and for all \(({i_1,\dots , i_d})\in \{1, \dots , p+1\}^d\)
Proof
Let \(d=3\) and \(K = J_{k_1}^\ell \times J_{k_2}^\ell \times J_{k_3}^\ell \in {\mathcal {G}}^\ell _3\).
Internal modes. We start by considering the case of the coefficients of internal modes, i.e., \(c^K_{i_1, i_2, i_3}\) as defined in (A.49) for \(i_n\ge 3\), \(n=1,2, 3\). Let then \(i_1, i_2, i_3\in \{3, \dots , p+1\}\) and write \(L_n^k = L_n \circ \phi _k\): it follows that
If \(u\in W_{\mathrm {mix}}^{1,1}(K)\), since \(\Vert L_n\Vert _{L^\infty (-1,1)} = 1\) for all n, we have
hence,
Face modes. We continue with face modes and fix, for ease of notation, \(i_1 =1\). We also denote \(F = J^\ell _{k_2}\times J^\ell _{k_3}\). The estimates will then also hold for \(i_1=2\) and for any permutation of the indices by symmetry. We introduce the trace inequality constant \(C^{T,1}\), independent of K, such that, for all \(v\in W^{1,1}(Q)\) and \({\hat{x}}\in (0,1)\),
This follows from the trace estimate in [49, Lemma 4.2] and from the fact that
For \(i_2, i_3\in \{3, \dots , p+1\}\),
Since the Legendre polynomials are \(L^\infty \)-normalized and using the trace inequality (A.56),
Summing over all internal faces, furthermore,
Edge modes. We now consider edge modes. Fix for ease of notation \(i_1 = i_2 = 1\); as before, the estimates will hold for \((i_1, i_2)\in \{1,2\}^2\) and for any permutation of the indices. By the same arguments as for (A.56), there exists a trace constant \(C^{T,2}\) such that, denoting \(e = J^\ell _{k_3}\), for all \(v\in W^{1,1}((0,1)^2)\) and for all \({\hat{x}}\in (0,1)\),
By definition,
Summing over edges, in addition,
Node modes. Finally, we consider the coefficients of nodal modes, i.e., \(c^K_{i_1, i_2, i_3}\) for \(i_1, i_2, i_3\in \{1,2\}\), which by construction equal function values of u, e.g.,
The Sobolev imbedding \(W_{\mathrm {mix}}^{1,1}(Q)\hookrightarrow L^{\infty }(Q)\) and scaling imply the existence of a uniform constant \(C_{\mathrm {imb}}\) such that, for any \(v\in W_{\mathrm {mix}}^{1,1}(Q)\)
Then, by construction,
Summing over nodes, it follows directly that
We obtain (A.51) from (A.54), (A.58), (A.62), and (A.65). Furthermore, (A.52) follows from (A.55), (A.59), (A.63), and (A.66). The estimates for the case \(d=2\) follow from the same argument. \(\square \)
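The imbedding \(W_{\mathrm {mix}}^{1,1}\hookrightarrow L^\infty \) used for the node modes can be made explicit in one dimension by a standard argument (a sketch, not the paper's proof): writing \(v(x) = v(y) + \int _y^x v'(t)\,dt\) and averaging over \(y\in (0,1)\) gives

```latex
|v(x)| \;\le\; \int_0^1 |v(y)|\,dy \;+\; \int_0^1 |v'(y)|\,dy,
\qquad \text{a.e. } x \in (0,1).
```

Applying this bound coordinate-wise on \(Q=(0,1)^d\) yields \(\Vert v\Vert _{L^\infty (Q)} \le \sum _{{|\alpha |_\infty }\le 1}\Vert \partial ^\alpha v\Vert _{L^1(Q)}\), i.e., the \(L^\infty \) norm is controlled by the \(W_{\mathrm {mix}}^{1,1}\) norm with an explicit constant.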
The following lemma shows the continuous imbedding of \({\mathcal {J}}^{d}_{{\underline{\gamma }}}(Q;{\mathcal {C}}, {\mathcal {E}})\) into \(W_{\mathrm {mix}}^{1,1}(Q)\), given sufficiently large weights \({\underline{\gamma }}\).
Lemma A.22
Let \(d\in \{2,3\}\). Let \({\underline{\gamma }}\) be such that \( \gamma _c>d/2\), for all \(c\in {\mathcal {C}}\) and (if \(d=3\)) \(\gamma _e>1\) for all \(e\in {\mathcal {E}}\). There exists a constant \(C>0\) such that, for all \(u \in {\mathcal {J}}^d_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\),
Proof
We recall the decomposition of Q as
where \(Q_{{\mathcal {E}}}= Q_{{\mathcal {C}}{\mathcal {E}}} = \emptyset \) if \(d=2\). First,
We now consider the subdomain \(Q_{c}\), for any \(c\in {\mathcal {C}}\). We have, with a constant C that depends only on \(\gamma _c\) and on \(|Q_c|\),
where the last inequality follows from the fact that \(\gamma _c> d/2\); hence, the norm \(\Vert r_c^{-({|\alpha |}-\gamma _c)_+}\Vert _{L^2(Q_c)}\) is bounded for all \({|\alpha |}\le d\). Consider then \(d=3\) and any \(e\in {\mathcal {E}}\). Suppose also, without loss of generality, that \(\gamma _c-\gamma _e >1/2\) and \(\gamma _e<2\) (otherwise, it suffices to replace \(\gamma _e\) by a smaller \({{\widetilde{\gamma }}}_e\) such that \(1<{{\widetilde{\gamma }}}_e< \gamma _c-1/2\) and \({{\widetilde{\gamma }}}_e<2\), and to remark that \({\mathcal {J}}^d_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\subset {\mathcal {J}}_{\underline{{{\widetilde{\gamma }}}}}^d(Q;{\mathcal {C}}, {\mathcal {E}})\) if \({{\widetilde{\gamma }}}_e < \gamma _e\)). Since \(\gamma _e > 1\), the norm \(\Vert r_e^{-{|\alpha _\bot |}+\gamma _e}\Vert _{L^2(Q_e)}\) is bounded by a constant depending only on \(\gamma _e\) and \(|Q_e|\) as long as \({|\alpha _\bot |}\le 2\). Hence, denoting by \(\partial _{\parallel }\) the derivative in the direction parallel to e,
Since \(x_\parallel \le r_c(x)\le {{\hat{\varepsilon }}}\) for all \(x\in Q_{ce}\), with \({{\hat{\varepsilon }}}\) as defined in Sect. 2.1, and because \(Q_{ce}\subset \big \{x_\parallel \in (0,{{\hat{\varepsilon }}}),\) \((x_{\bot ,1}, x_{\bot ,2})\in (0, {{\hat{\varepsilon }}}^2)^2\big \}\), it follows that
for a constant C that depends only on \({{\hat{\varepsilon }}}\), \(\gamma _c\), and \(\gamma _e\). Hence,
with C independent of u. Combining inequalities (A.67) to (A.70) concludes the proof. \(\square \)
The following statement is a direct consequence of Lemmas A.21 and A.22 above and the fact that \(\left\| \psi _{{i_1\dots i_d}}^K \right\| _{L^\infty (K)} \le 1\) for all \(K\in {\mathcal {G}}^\ell _3\) and all \({i_1,\dots , i_d}\in \{1,\ldots ,p+1\}\).
Corollary A.23
Let \({\underline{\gamma }}\) be such that \( \gamma _c-d/2>0\), for all \(c\in {\mathcal {C}}\) and, if \(d=3\), \( \gamma _e>1\) for all \(e\in {\mathcal {E}}\). There exists a constant \(C>0\) such that for all \(\ell ,p\in {\mathbb {N}}\) and for all \(u \in {\mathcal {J}}^d_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\),
1.9.2 Basis of Continuous Functions with Compact Support
It is possible to construct a basis for \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) in Q such that all basis functions are continuous and have compact support. For all \(\ell \in {\mathbb {N}}\) and all \(p\in {\mathbb {N}}\), extend the functions \(\zeta ^k_n\) defined in (A.47) and (A.48), for \(k=0, \dots , \ell \) and \(n=1, \dots , p+1\), by zero outside of their domains of definition. We introduce the compactly supported univariate functions \(v_j : (0,1)\rightarrow {\mathbb {R}}\), for \(j=1, \dots , (\ell +1)p+1\), so that \(v_1 = \zeta ^0_{2}\), \(v_{\ell +2} = \zeta ^\ell _{1}\),
and
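A plausible form of the remaining functions, reconstructed so as to match the count \(N_{\mathrm {1d}}=(\ell +1)p+1\) (this is our reading, not a quotation of (A.71)): the interior hat functions

```latex
v_{k+1} \;=\; \zeta^{k-1}_{1} + \zeta^{k}_{2}, \qquad k = 1, \dots, \ell,
```

each supported on the two elements \(J^\ell _{k-1}\cup J^\ell _k\) sharing the node \(x_k^\ell \), while the remaining \((\ell +1)(p-1)\) functions are the internal modes \(\zeta ^k_n\), \(n=3,\dots ,p+1\), \(k=0,\dots ,\ell \), each supported in a single element. The count is consistent: \(2 + \ell + (\ell +1)(p-1) = (\ell +1)p+1\).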
Proposition A.24
Let \(\ell \in {\mathbb {N}}\) and \(p\in {\mathbb {N}}\). Furthermore, let \(u\in {\mathcal {J}}^d_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}})\) with \({\underline{\gamma }}\) such that \(\gamma _c-d/2>0\) and, if \(d=3\), \(\gamma _e>1\). Let \(N_{\mathrm {1d}}= (\ell +1)p+1\). There exists an array of coefficients
such that
Furthermore, there exist constants \(C_1, C_2>0\) independent of \(\ell \), p, and u, such that
and
Proof
The statement follows directly from the construction of the projector, see (A.49), and from the bounds in Lemmas A.21 and A.22. In particular, (A.72) holds because the element-wise coefficients related to \(\zeta _1^{k}\) and to \(\zeta _2^{k+1}\) are equal: it follows from Equations (A.57), (A.61) and (A.64) that \(c^{K}_{1i_2\ldots i_d} = c^{K'}_{2i_2\ldots i_d}\) for all \(i_2,\ldots ,i_d\in \{1,\ldots ,p+1\}\), all \(K = J_{k_1}^\ell \times J_{k_2}^\ell \times J_{k_3}^\ell \in {\mathcal {G}}^\ell _3\) satisfying \(k_1<\ell \) and \(K' = J_{k_1+1}^\ell \times J_{k_2}^\ell \times J_{k_3}^\ell \in {\mathcal {G}}^\ell _3\). The same holds for permutations of \({i_1,\dots , i_d}\).
Because \((v_k)_{k=1}^{(\ell +1)p+1}\) are continuous, this again shows continuity of \(\varPi _{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}u\) (Remark A.2).
The last estimate is obtained with (A.52):
\(\square \)
1.9.3 Proof of Theorem 2.1
Proof of Theorem 2.1
Fix \(A_f\), \(C_f\), and \({\underline{\gamma }}\) as in the hypotheses. Then, by Proposition A.19, there exist \(c_p\), \(C_{{\mathsf {h}}{\mathsf {p}}}\), \(b_{{\mathsf {h}}{\mathsf {p}}}>0\) such that for every \(\ell \in {\mathbb {N}}\) and for all \(v\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}}(Q;{\mathcal {C}}, {\mathcal {E}}; C_f, A_f)\), there exists \(v_{{\mathsf {h}}{\mathsf {p}}}^\ell \in X_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , c_p\ell }\) such that (see Sect. A.1 for the definition of the space \(X_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , c_p\ell }\))
For \(\epsilon > 0\), we choose
so that
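A choice of L consistent with the bound used in the next step is, plausibly,

```latex
L \;=\; \left\lceil \frac{1}{b_{\mathsf{hp}}} \log\frac{C_{\mathsf{hp}}}{\epsilon} \right\rceil,
\qquad\text{so that}\qquad
C_{\mathsf{hp}}\, e^{-b_{\mathsf{hp}} L} \;\le\; \epsilon
\quad\text{and}\quad
L \;\le\; 1 + \frac{1}{b_{\mathsf{hp}}}\left|\log\frac{\epsilon}{C_{\mathsf{hp}}}\right|
\quad\text{for } \epsilon \le C_{\mathsf{hp}},
```

since \(\lceil x\rceil \le 1+x\) for \(x\ge 0\).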
Furthermore, \(v_{{\mathsf {h}}{\mathsf {p}}}^L = \sum _{{i_1,\dots , i_d}=1}^{N_{\mathrm {1d}}} c_{{i_1\dots i_d}} \phi _{{i_1\dots i_d}}\) and, for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\), there exist \(v_{i_j}\), \(j=1, \dots , d\), such that \(\phi _{{i_1\dots i_d}} = \bigotimes _{j=1}^dv_{i_j}\), see Sect. A.9.2 and Proposition A.24. By construction of \(v_{i}\) in (A.71), and by using (A.47) and (A.48), we observe that \(\Vert v_{i}\Vert _{L^\infty (I)}\le 1\) for all \(i=1, \dots , N_{\mathrm {1d}}\). In addition, (A.50) demonstrates that
Then, since (A.73) implies \(L\le 1+\frac{1}{b_{{\mathsf {h}}{\mathsf {p}}}}\left| \log (\epsilon /C_{{\mathsf {h}}{\mathsf {p}}}) \right| \),
This concludes the proof of Items 1 and 2. Finally, Item 3 follows from Proposition A.24 and the fact that \(p\le C_p\left( 1+\left| \log (\epsilon ) \right| \right) \) for a constant \(C_p>0\) independent of \(\epsilon \).
\(\square \)
1.10 Combination of Multiple Patches
The approximation results in the domain \(Q=(0,1)^d\) can be generalized to include the combination of multiple patches. We give here an example, relevant for the PDEs considered in Sect. 5. For the sake of conciseness, we show a single construction that takes into account all singularities of the problems in Sect. 5. We will then use this construction to prove expression rate bounds for realizations of NNs.
Let \(a>0\) and \(\varOmega = (-a,a)^d\). Denote the set of corners
and the set of edges \({\mathcal {E}}_\varOmega = \emptyset \), if \(d=2\), and, if \(d=3\),
We introduce the affine transformations \(\psi _{1, +}:(0,1)\rightarrow (0, a/2)\), \(\psi _{2,+}:(0,1)\rightarrow (a/2,a)\), \(\psi _{1, -} :(0,1)\rightarrow (-a/2, 0)\), \(\psi _{2, -}:(0,1)\rightarrow (-a,-a/2)\) such that
For all \(\ell \in {\mathbb {N}}\), define then
Consequently, for \(d=2,3\), denote by \(\widetilde{{\mathcal {G}}}^\ell _d\) the resulting mesh on \(\varOmega \); see Fig. 3.
The hp space in \(\varOmega = (-a,a)^d\) is then given by
Finally, recall the definition of \(\pi _{{\mathsf {h}}{\mathsf {p}}}^{\ell , p}\) from (A.12) and construct
such that, for all \(v\in W^{1,1}((-a,a))\),
Then, the global hp projection operator \({\widetilde{\varPi }}_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}: W_{\mathrm {mix}}^{1,1} (\varOmega )\rightarrow {\widetilde{X}}_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}\) is defined as
Theorem A.25
For \(a>0\), let \(\varOmega = (-a,a)^d\), \(d=2,3\). Denote by \(\varOmega ^k\), \(k=1, \dots , 4^d\), the patches composing \(\varOmega \), i.e., the sets \(\prod _{j=1}^d(a^k_j, a^k_j+a/2)\) with \(a^k_j\in \{-a,-a/2,0,a/2\}\). Denote also \({\mathcal {C}}^k ={\mathcal {C}}_\varOmega \cap {\overline{\varOmega }}^k\) and \({\mathcal {E}}^k = \{ e\in {\mathcal {E}}_\varOmega : e\subset {\overline{\varOmega }}^k\}\), which contain one singular corner and, if \(d=3\), three singular edges abutting that corner, as in (A.1) and (A.2).
Let \({\mathcal {I}}\subset \{1, \dots , 4^d\}\) and let \(v\in W_{\mathrm {mix}}^{1,1}(\varOmega )\) be such that, for all \(k\in {\mathcal {I}}\), it holds that \(v|_{\varOmega ^k}\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}^k}(\varOmega ^k; {\mathcal {C}}^k, {\mathcal {E}}^k)\) with
Then, there exist constants \(c_p>0\) and \(C, b>0\) such that, for all \(\ell \in {\mathbb {N}}\), with \(p = c_p \ell \),
Here, \(N_{\mathrm {dof}}= {\mathcal {O}}(\ell ^{2d})\) denotes the overall number of degrees of freedom in the piecewise polynomial approximation. Furthermore, writing \({\widetilde{N}}_{\mathrm {1d}}= 4(\ell +1)p+1\), there exists an array of coefficients
such that
where for all \(j=1, \dots , d\) and \(i_j=1, \dots , {\widetilde{N}}_{\mathrm {1d}}\), \({{\tilde{v}}}_{ i_j}\in {\widetilde{X}}_{{\mathsf {h}}{\mathsf {p}}, 1}^{\ell , p}\) with support in at most two neighboring elements of \(\widetilde{{\mathcal {G}}}^\ell _1\). Finally, there exist constants \(C_1, C_2>0\) independent of \(\ell \) such that
and
Proof
The statement is a direct consequence of Propositions A.19 and A.24. We start the proof by showing that for any function \(v\in W_{\mathrm {mix}}^{1,1}(\varOmega )\), the approximation \({\widetilde{\varPi }}_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}v\) is continuous; the rest of the theorem will then follow from the results in each sub-patch. Let now \(w\in W^{1,1}((-a,a))\). Then, it holds that \( \left( {\widetilde{\pi }}_{{\mathsf {h}}{\mathsf {p}}}^{\ell , p}w \right) |_{I} \in C(I)\), for all \(I\in \{(0, a/2), (a/2,a), (-a/2, 0), (-a, -a/2)\}\), by definition (A.76). Furthermore, it follows from the nodal exactness of the local projectors that, for \({\tilde{x}} \in \{-a/2, 0, a/2\}\),
implying that \({\widetilde{\pi }}_{{\mathsf {h}}{\mathsf {p}}}^{\ell , p}w\) is continuous. Since \({\widetilde{\varPi }}_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}= \bigotimes _{j=1}^d{\widetilde{\pi }}_{{\mathsf {h}}{\mathsf {p}}}^{\ell , p}\), this implies that \({\widetilde{\varPi }}_{{\mathsf {h}}{\mathsf {p}}, d}^{\ell , p}v\) is continuous for all \(v\in W_{\mathrm {mix}}^{1,1}(\varOmega )\). Fix \(k\in {\mathcal {I}}\), so that \(v|_{\varOmega ^k}\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}^k}(\varOmega ^k; {\mathcal {C}}^k,{\mathcal {E}}^k)\). There exist then, by Proposition A.19, constants \(C, b, c_p>0\) such that for all \(\ell \in {\mathbb {N}}\)
Equation (A.77) follows. The bounds (A.78) and (A.79) follow from the construction of the basis functions (A.47)–(A.48) and from the application of Lemma A.21 in each patch, respectively. \(\square \)
Proofs of Section 5
1.1 Proof of Lemma 5.5
Proof of Lemma 5.5
Notation. For any two nonempty sets \(X,Y\subset \varOmega \), we denote by \({{\,\mathrm{dist}\,}}_{\varOmega }(X,Y)\) the infimum of Euclidean lengths of paths in \(\varOmega \) connecting an element of X with one of Y. We introduce several domain-dependent quantities to be used in the construction of the triangulation \({\mathcal {T}}\) with the properties stated in the lemma.
Let \({\mathcal {E}}\) denote the set of edges of the polygon \(\varOmega \). For each corner \(c\in {\mathcal {C}}\) at which the interior angle of \(\varOmega \) is smaller than \(\pi \) (below called convex corner), we fix a parallelogram \(G_c \subset \varOmega \) and a bijective, affine transformation \(F_c : (0,1)^2\rightarrow G_c\) such that
-
\(F_c((0,0)) = c\),
-
two edges of \(G_c\) coincide partially with the edges of \(\varOmega \) abutting at the corner c, such that \(\overline{G_c}\cap {\mathcal {C}}= \{c\}\).
If at \(c\in {\mathcal {C}}\) the interior angle of \(\varOmega \) is greater than or equal to \(\pi \) (both cases are referred to, by a slight abuse of terminology, as nonconvex corners), we fix a bijective, affine transformation \(F_c\) with the same properties, such that \(F_c : (-1,1)\times (0,1)\rightarrow G_c\) if the interior angle equals \(\pi \), and \(F_c : (-1,1)^2\setminus (-1, 0]^{2}\rightarrow G_c\) otherwise, with \(G_c\) having the corresponding shape.
Let now
Then, for each \(c\in {\mathcal {C}}\), let \(e_1\) and \(e_2\) be the edges abutting c and define
Furthermore, for each \(e\in {\mathcal {E}}\), denote \(d_e {:}{=} \infty \) if \(\varOmega \) is a triangle, otherwise
Finally, for all x in the polygon \(\varOmega \), let the number of closest edges to x be
Then, if \(\varOmega \) is a triangle, let \(d_0\) be half the radius of its inscribed circle; otherwise, let \(d_0 {:}{=} \tfrac{1}{3} d_{{\mathcal {E}}} < \tfrac{1}{2} d_{{\mathcal {E}}}\). It holds that
For any shape-regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\) such that, for all \(K\in {\mathcal {T}}\), \(K\cap \partial \varOmega = \emptyset \), denote \({{\mathcal {T}}_\varOmega }= \{K\in {\mathcal {T}}: K\subset \varOmega \}\) and \(h({{\mathcal {T}}_\varOmega }) = \max _{K\in {{\mathcal {T}}_\varOmega }} h(K)\), where h(K) denotes the diameter of K. Denote by \({{\mathcal {N}}_\varOmega }\) the set of nodes of \({\mathcal {T}}\) that lie in \({\overline{\varOmega }}\). For any \(n\in {{\mathcal {N}}_\varOmega }\), define
Partition of unity. Let \({\mathcal {T}}\) be a triangulation of \({\mathbb {R}}^2\) such that
and such that for all \(K\in {\mathcal {T}}\) it holds \(K\cap \partial \varOmega = \emptyset \).
The hat-function basis \(\{\phi _n\}_{n\in {{\mathcal {N}}_\varOmega }}\) is a basis for \(S_1(\varOmega ,{{\mathcal {T}}_\varOmega })\) such that \({{\,\mathrm{supp}\,}}(\phi _n) \subset \overline{{{\,\mathrm{patch}\,}}(n)}\) for all \(n\in {{\mathcal {N}}_\varOmega }\), and it is a partition of unity on \(\varOmega \).
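In one dimension, the hat basis and its partition-of-unity property can be sketched as follows (the helper `hat` is ours; the paper's \(\phi _n\) are the two-dimensional analogue on the triangulation \({\mathcal {T}}_\varOmega \)):

```python
def hat(nodes, n, x):
    """Piecewise-linear hat function of node nodes[n] on a sorted 1D mesh;
    it equals 1 at its node and is supported on the adjacent elements."""
    xc = nodes[n]
    xl = nodes[n - 1] if n > 0 else xc
    xr = nodes[n + 1] if n + 1 < len(nodes) else xc
    if xl < x <= xc:
        return (x - xl) / (xc - xl)  # rising on the left element
    if xc <= x < xr:
        return (xr - x) / (xr - xc)  # falling on the right element
    return 1.0 if x == xc else 0.0

nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
# partition of unity: the hats sum to one at every point of [0, 1]
for x in [0.0, 0.1, 0.25, 0.33, 0.9, 1.0]:
    total = sum(hat(nodes, n, x) for n in range(len(nodes)))
    assert abs(total - 1.0) < 1e-12
```

Each hat is supported on the (at most two) elements sharing its node, mirroring \({{\,\mathrm{supp}\,}}(\phi _n) \subset \overline{{{\,\mathrm{patch}\,}}(n)}\).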
Strategy of the remainder of the proof. We will show that, for each \(n\in {{\mathcal {N}}_\varOmega }\), there exists a subdomain \(\varOmega _n\), which is either an affinely mapped square or an affinely mapped L-shaped domain, such that \({{\,\mathrm{patch}\,}}(n)\cap \varOmega \subset \varOmega _n\). We point to Fig. 4 for an illustration of the patches \(\varOmega _n\) that will be introduced in the proof, for various sets of nodes.
Verification of [P1], [P2], and [P3]. For each \(c\in {\mathcal {C}}\), let \(\widehat{{\mathcal {N}}}_c = \{n\in {{\mathcal {N}}_\varOmega }: {{\,\mathrm{patch}\,}}(n)\cap \varOmega \subset G_c\}\). Then,
Therefore, all the nodes \(n\in {\mathcal {N}}_c\) are such that \({{\,\mathrm{patch}\,}}(n)\cap \varOmega \subset G_c {=}{:} \varOmega _n\). Denote then
Note that, due to (B.1), we have \(\sqrt{2}h({{\mathcal {T}}_\varOmega }) \le \frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1} \le d_{{\mathcal {C}}, 1} - h({{\mathcal {T}}_\varOmega })\).
We consider the nodes in \({{\mathcal {N}}_\varOmega }\setminus {\mathcal {N}}_{{\mathcal {C}}}\). First, consider the nodes in
For all \(n\in {\mathcal {N}}_0\), there exists a square \(Q_n\) such that
see Fig. 4c. Hence, for all \(n\in {\mathcal {N}}_0\), we take \(\varOmega _n {:}{=} Q_n\).
Define
For all \(n\in {\mathcal {N}}_{{\mathcal {E}}}\), from (B.1) it follows that \({{\,\mathrm{dist}\,}}_{\varOmega }(n, \partial \varOmega ) <\sqrt{2}h({{\mathcal {T}}_\varOmega }) \le d_0\), hence \(n_e(n) \le 2\). Furthermore, suppose there exists \(n\in {\mathcal {N}}_{{\mathcal {E}}}\) such that \(n_e(n) =2\). Let the two closest edges to n be denoted by \(e_1\) and \(e_2\), so that \({{\,\mathrm{dist}\,}}_{\varOmega }(n, e_1) = {{\,\mathrm{dist}\,}}_{\varOmega }(n, e_2) = {{\,\mathrm{dist}\,}}_{\varOmega }(n, \partial \varOmega ) <\sqrt{2}h({{\mathcal {T}}_\varOmega })\). If \(\overline{e_1}\cap \overline{e_2} = \emptyset \), there must hold \({{\,\mathrm{dist}\,}}_{\varOmega }(n, e_1) + {{\,\mathrm{dist}\,}}_{\varOmega }(n, e_2)\ge d_{\mathcal {E}}\), contradicting \({{\,\mathrm{dist}\,}}_{\varOmega }(n, \partial \varOmega ) < \sqrt{2}h({{\mathcal {T}}_\varOmega })\le d_{{\mathcal {E}}}/2\). If instead there exists \(c\in {\mathcal {C}}\) such that \(\overline{e_1}\cap \overline{e_2} = \{c\}\), then n is on the bisector of the angle between \(e_1\) and \(e_2\). Using that \(2\sqrt{2} h({{\mathcal {T}}_\varOmega })\le d_{{\mathcal {C}}, 2}\), we now show that all such nodes belong either to \({\mathcal {N}}_{{\mathcal {C}}}\) or to \({\mathcal {N}}_0\), contradicting \(n\in {\mathcal {N}}_{{\mathcal {E}}}\). Let \(x_0\in \varOmega \) be the intersection of \(\partial B_{\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}}(c)\) and the bisector. To show that \(n\in {\mathcal {N}}_{{\mathcal {C}}}\cup {\mathcal {N}}_0\), it suffices to show that \({{\,\mathrm{dist}\,}}(x_0,e_i) \ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\) for \(i=1,2\).
Because \(\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1} \le d_{{\mathcal {C}}, 1} - h({{\mathcal {T}}_\varOmega })\), it a fortiori holds for all points y in \(\varOmega \) on the bisector intersected with \(\left( B_{d_{{\mathcal {C}}, 1}-h({{\mathcal {T}}_\varOmega })}(c)\right) ^c\), that \({{\,\mathrm{dist}\,}}(y,e_i)\ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\), which shows that if \({{\,\mathrm{dist}\,}}_{\varOmega }(n,c)\ge d_{{\mathcal {C}},1}-h({{\mathcal {T}}_\varOmega })\), then \(n\in {\mathcal {N}}_0\). If c is a nonconvex corner, then \({{\,\mathrm{dist}\,}}(x_0,e_i) \ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\) for \(i=1,2\) follows immediately from \({{\,\mathrm{dist}\,}}(x_0,e_i) = {{\,\mathrm{dist}\,}}(x_0,c) = \frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}\) and (B.1). To show that \({{\,\mathrm{dist}\,}}(x_0,e_i) \ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\), \(i=1,2\) in case c is a convex corner, we make the following definitions (see Fig. 5):
-
For \(i=1,2\), let \(x_i\) be the intersection of \(e_i\) and \(\partial B_{\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}}(c)\),
-
let \(x_3\) be the intersection of \(\overline{x_1x_2}\) with the bisector,
-
and for \(i=1,2\), let \(x_{i+3}\) be the orthogonal projection of \(x_0\) onto \(e_i\), which is an element of \(e_i\) because c is a convex corner.
Then, \(d_{c,2} = |\overline{x_1x_2}| = |\overline{x_1x_3}| + |\overline{x_3x_2}| = 2 |\overline{x_ix_3}|\). Because the triangle \(cx_0x_{i+3}\) is congruent to \(cx_1x_3\), it follows that \({{\,\mathrm{dist}\,}}(x_0,e_i) = |\overline{x_0x_{i+3}}| = |\overline{x_ix_3}| = \tfrac{1}{2} d_{c,2} \ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\). We can conclude with (B.1) that \(n_e(n) = 1\) for all \(n\in {\mathcal {N}}_{{\mathcal {E}}}\) and denote the edge closest to n by \(e_n\). Let then \(S_n\) be the square with two edges parallel to \(e_n\) such that
see Fig. 4d, i.e., \(S_n\) has center n and sides of length \(2h({{\mathcal {T}}_\varOmega })\). For each \(n\in {\mathcal {N}}_{{\mathcal {E}}}\), the connected component of \(S_n\cap \varOmega \) containing n is a rectangle:
-
(i)
Note that for all edges e such that \({\overline{e}}\cap \overline{e_n} = \emptyset \), it holds that \(S_n\cap e \subset B_{\sqrt{2}h({{\mathcal {T}}_\varOmega })}(n) \cap e = \emptyset \). The latter holds because \(2\sqrt{2}h({{\mathcal {T}}_\varOmega }) \le d_{{\mathcal {E}}} \le {{\,\mathrm{dist}\,}}_{\varOmega }(e,e_n) \le {{\,\mathrm{dist}\,}}_{\varOmega }(e,n) +{{\,\mathrm{dist}\,}}_{\varOmega }(n,e_n)\) and \({{\,\mathrm{dist}\,}}_{\varOmega }(n,e_n)<\sqrt{2}h({{\mathcal {T}}_\varOmega })\) imply \({{\,\mathrm{dist}\,}}_{\varOmega }(n,e)\ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\).
-
(ii)
We next show that for both corners c of \(e_n\) there is no \(x\in \varOmega \setminus B_{\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}}(c)\) for which \({{\,\mathrm{dist}\,}}(x,e_n)<\sqrt{2} h({{\mathcal {T}}_\varOmega })\) and such that for another edge e it holds \(\overline{e_n}\cap {\overline{e}}=\{c\}\) and \({{\,\mathrm{dist}\,}}(x,e) < \sqrt{2} h({{\mathcal {T}}_\varOmega })\). We give a proof by contradiction. Assume that there exist \(x\in \varOmega \setminus B_{\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}}(c)\) and an edge e with \(\overline{e_n}\cap {\overline{e}}=\{c\}\), \({{\,\mathrm{dist}\,}}(x,e_n)<\sqrt{2} h({{\mathcal {T}}_\varOmega })\) and \({{\,\mathrm{dist}\,}}(x,e) < \sqrt{2} h({{\mathcal {T}}_\varOmega })\). Now \({{\,\mathrm{dist}\,}}(x,c)\ge \sqrt{2} h({{\mathcal {T}}_\varOmega })\) together with the previous two inequalities implies that both the angle between e and xc and the angle between \(e_n\) and xc are smaller than \(\pi /2\), and thus that c is convex. Let \(x_6\) be the point on xc which satisfies \({{\,\mathrm{dist}\,}}(x_6,c) = \frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}\) and let \(x_0\) be the intersection of \(\partial B_{\frac{\sqrt{2}}{\sqrt{2}+1}d_{{\mathcal {C}}, 1}}(c)\) and the bisector of c, for which we have previously shown that \({{\,\mathrm{dist}\,}}(x_0,e) = {{\,\mathrm{dist}\,}}(x_0,e_n) \ge \sqrt{2} h({{\mathcal {T}}_\varOmega })\) (we then denoted \(e_n\) and e by \(e_1\) and \(e_2\)). We detail the remainder of the argument for the case that the angle between \(x_6c\) and e is at least as large as the angle between \(x_0c\) and e, i.e., the bisector \(x_0c\) lies between \(x_6\) and e (in the other case the same argument applies but with the roles of e and \(e_n\) interchanged). 
This assumption, combined with \({{\,\mathrm{dist}\,}}(x_6,c) = {{\,\mathrm{dist}\,}}(x_0,c)\) and the sine rule, gives that \({{\,\mathrm{dist}\,}}(x_6,e) \ge {{\,\mathrm{dist}\,}}(x_0,e)\) and hence \(\sqrt{2} h({{\mathcal {T}}_\varOmega }) > {{\,\mathrm{dist}\,}}(x,e) \ge {{\,\mathrm{dist}\,}}(x_6,e) \ge {{\,\mathrm{dist}\,}}(x_0,e) = {{\,\mathrm{dist}\,}}(x_0,e_n)\), which is a contradiction. Using the proved claim for \(x=n\) shows that, for the edges e neighboring \(e_n\), \({{\,\mathrm{dist}\,}}(n,e)\ge \sqrt{2}h({{\mathcal {T}}_\varOmega })\), and thus \(S_n\cap \partial \varOmega \subset e_n\) or \(S_n\cap \partial \varOmega = \emptyset \).
Thus, the connected component of \(S_n\cap \varOmega \) containing n is a rectangle, which we define to be \(\varOmega _n\).
Setting \(N_p {:}{=} \#{{\mathcal {N}}_\varOmega }\) and \(\{\varOmega _i\}_{i=1,\ldots ,N_p} = \{\varOmega _n\}_{n\in {{\mathcal {N}}_\varOmega }}\) concludes the proof. \(\square \)
1.2 Proof of Lemma 5.13
Proof of Lemma 5.13
Let \(d=3\) and denote \(R = (-1, 0)^3\). Denote by O the origin, and let \(E = \{e_1, e_2, e_3\}\) denote the set of edges of R abutting the origin. Let also \(F=\{f_1, f_2, f_3\}\) denote the set of faces of R abutting the origin, i.e., the faces of R such that \(f_i \subset {\overline{R}}\cap {\overline{\varOmega }}\), \(i=1,2,3\). Let, finally, for each \(f\in F\), \(E_f = \{e\in E: e\subset {\overline{f}} \}\) denote the subset of E containing the two edges neighboring f.
For each \(e\in E\), define \(u_e\) to be the lifting of \(u|_e\) into R, i.e., the function such that \(u_e|_e = u|_e\) and \(u_e\) is constant in the two coordinate directions perpendicular to e. Similarly, let, for each \(f\in F\), \(u_f\) be such that \(u_f|_f = u|_f\) and \(u_f\) is constant in the direction perpendicular to f.
We define \(w:R\rightarrow {\mathbb {R}}\) as
where \(u_0 = u(O)\). Since \(u|_{e}\in W^{1,1}(e)\), \(u|_{f}\in W_{\mathrm {mix}}^{1,1}(f)\) for all \(e\in E\) and \(f\in F\), it holds that \(u_e\in W_{\mathrm {mix}}^{1,1}(R)\) and \(u_f\in W_{\mathrm {mix}}^{1,1}(R)\) for all \(e\in E\) and \(f\in F\) (cf. Equations (A.56) and (A.60)), hence \(w\in W_{\mathrm {mix}}^{1,1}(R)\). Furthermore, note that
and that
From the first equality in (B.2), then, it follows that, for all \(f\in F\),
Let the function v be defined as
Then, v is continuous in \((-1,1)^3\) and \(v\in W_{\mathrm {mix}}^{1,1}((-1,1)^3)\). Now, for all \(\alpha \in {\mathbb {N}}^3_0\) such that \({|\alpha |_\infty }\le 1\),
where \(\alpha ^e_\parallel \) denotes the index in the coordinate direction parallel to e, and
where \(\alpha ^f_{\parallel , j}\), \(j=1,2\) denote the indices in the coordinate directions parallel to f. Then, by a trace inequality (see [49, Lemma 4.2]), there exists a constant \(C>0\) independent of u such that
for all \(e\in E\), \(f\in F\). Then, by (B.2) and (B.3),
for an updated constant C independent of u. This concludes the proof when \(d=3\). The case \(d=2\) can be treated by the same argument. \(\square \)
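A plausible closed form for the lifting w in the proof above, reconstructed by inclusion–exclusion so that the trace identities of (B.2) hold (this formula is our reading, not quoted from the paper):

```latex
w \;=\; \sum_{f\in F} u_f \;-\; \sum_{e\in E} u_e \;+\; u_0 .
```

On each face \(f\in F\), the traces of the other two face liftings reduce to liftings of u from the two edges of f abutting O and cancel against the corresponding terms of the edge sum, while the trace of the lifting from the edge perpendicular to f reduces to the constant \(u_0\) and cancels against \(+u_0\); this leaves \(w|_f = u|_f\).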
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Marcati, C., Opschoor, J.A.A., Petersen, P.C. et al. Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities. Found Comput Math 23, 1043–1127 (2023). https://doi.org/10.1007/s10208-022-09565-9
Keywords
- Neural networks
- Finite element methods
- Exponential convergence
- Analytic regularity
- Singularities
- Electron structure