G-Adaptive mesh refinement - leveraging graph neural networks and differentiable finite element solvers

James Rowbottom
Department of Applied Mathematics
and Theoretical Physics
University of Cambridge
Cambridge, United Kingdom
Georg Maierhofer
Mathematical Institute
University of Oxford
Oxford, United Kingdom
Teo Deveney
Department of Computer Science
University of Bath
Bath, United Kingdom
Katharina Schratz
Laboratoire Jacques-Louis Lions
Sorbonne Université
Paris, France
Pietro Liò
Department of Computer Science
and Technology
University of Cambridge
Cambridge, United Kingdom
Carola-Bibiane Schönlieb
Department of Applied Mathematics
and Theoretical Physics
University of Cambridge
Cambridge, United Kingdom
Chris Budd
Department of Mathematical Sciences
University of Bath
Bath, United Kingdom
Corresponding author and joint first author; [email protected]Joint first author;[email protected]
Abstract

We present a novel, and effective, approach to the long-standing problem of mesh adaptivity in finite element methods (FEM). FE solvers are powerful tools for solving partial differential equations (PDEs), but their cost and accuracy are critically dependent on the choice of mesh points. To keep computational costs low, mesh relocation (r-adaptivity) seeks to optimise the position of a fixed number of mesh points to obtain the best FE solution accuracy. Classical approaches to this problem require the solution of a separate nonlinear “meshing” PDE to find the mesh point locations. This incurs significant cost at remeshing and relies on certain a-priori assumptions and guiding heuristics for optimal mesh point location. Recent machine learning approaches to r-adaptivity have mainly focused on the construction of fast surrogates for such classical methods. Our new approach combines a graph neural network (GNN) powered architecture, with training based on direct minimisation of the FE solution error with respect to the mesh point locations. The GNN employs graph neural diffusion (GRAND), closely aligning the mesh solution space to that of classical meshing methodologies, thus replacing heuristics with a learnable strategy, and providing a strong inductive bias. This allows for rapid and robust training and results in an extremely efficient and effective GNN approach to online r-adaptivity. This method outperforms classical and prior ML approaches to r-adaptive meshing on the test problems we consider, in particular achieving lower FE solution error, whilst retaining the significant speed-up over classical methods observed in prior ML work.

1 Introduction

Finite element methods (FEM) are state-of-the-art when it comes to the large scale solution of partial differential equations (PDEs) [2, 10]. Central advantages are robustness, good error estimates, thoroughly developed code bases (such as PetscFE [10], Firedrake [19]), which are highly parallelisable and efficient. However, even with such optimised software the simulation of large scale problems (weather forecasting, structural simulations in engineering systems) is computationally costly. An important ingredient that determines the cost is the number of degrees of freedom (DOFs) included in a FEM, proportional to the number Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT of mesh points on which it operates. It is therefore desirable to keep Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT moderate. In order to optimise the FEM error a typical approach is mesh-adaptivity, i.e. to refine the mesh in such a way that important features of the solutions are captured at the right scale and therefore reducing the approximation error. In the present work we apply a stable and highly efficient GNN based architecture to implement a learnable r𝑟ritalic_r-adaptive method, which keeps Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT fixed, but adapts the mesh point location such that the overall error of the FEM is reduced. This problem is computationally extremely challenging and as a result most of classical r𝑟ritalic_r-adaptive methods have focused on finding and minimizing mathematical substitutes for the FEM error (in particular upper bounds of simpler form). An example of this would be to choose mesh points to minimize the solution interpolation error which is an upper bound (up to some parameters and constants) of the FEM error arising from Céa’s lemma [22, p. 56]. As we describe below, even recent ML approaches have essentially relied on similar mathematical simplifications. In contrast, in the present work we take an entirely different approach. We present G-adaptivity, an approach to mesh adaptivity that uses a GNN to generate meshes that directly minimise the error of the corresponding FEM solution. We couple fast backpropagation through the GNN, with mesh point gradients attained through the discrete adjoint equation associated with the FEM solver, to tractably minimise the FEM approximation error directly (as opposed to the upper bound considered in [22]). The result is a model capable of outperforming the current state-of-the-art equidistribution-based r-adaptivity methods, whilst providing the acceleration of DL based approaches (cf. Figure 1).

Refer to caption
Figure 1: Outline of G-adaptivity architecture: (a) GNN feature space construction. (b) GNN adaptive meshing blocks (c) direct optimisation w.r.t the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT FEM error using differentiable FEM

Contributions

Indeed, our central novelty is to provide a new r𝑟ritalic_r-adaptive GNN which fundamentally differs from the paradigm of emulation of traditional meshing methods in prior work. More specifically, our contributions are the following:

  • A novel GNN architecture for mesh movement applications, incorporating diffusion-based dynamics on graphs similar to those seen in classical approaches. This results in a neural network which avoids mesh-tangling and provides a strong inductive bias for fast training and high-quality adapted meshes.

  • We introduce a differentiable FEM solver (difFEM) based on the discrete adjoint method, and outline how to combine the gradients obtained from this with those obtained through backpropagation through the GNN to allow direct training of the GNN with FEM approximation error as the objective.

  • An analysis of the regularity properties of meshes generated by our GNN architecture. Specifically we show that mesh points are moved around within the convex hull of their neighbours, thus preventing overlapping triangulations (mesh tangling).

  • A thorough experimental report comparing our approach to classical and recent approaches in terms of accuracy and computational time. Our experiments cover both 1 and 2 dimensional examples, and include both stationary and time-dependent equations.

We highlight that our approach does not require a dataset of adapted meshes a-priori, as it is based on minimising the FEM error directly. This results in our method outperforming even state-of-the-art classical methods in terms of FEM accuracy as demonstrated in the numerical examples in Section 5. Interestingly this direct minimisation results in qualitative differences of the mesh appearance, suggesting that such a direct approach is indeed necessary to achieve optimal performance.

2 Related work

ML and PDEs

As indicated above, the effective approximation of solutions to partial differential equations is one of the central problems in computational mathematics. Over the recent decade increasingly work has been devoted to using ML for the numerical approximation of PDEs. This includes notably physics informed neural networks [36, 35], related work on Fourier neural operators [28, 29] graph neural operators [27], DeepONets [31], Message Passing Neural PDE Solvers [5] and the deep Ritz method [12]. The majority of such approaches try to directly approximate the PDE solution operator with a machine learning surrogate. Such methods offer certain advantages (for example in high dimensional settings [20]), but are typically outperformed by traditional numerical methods in accuracy in most settings [17]. Our approach differs fundamentally from these, by instead providing a central ingredient, the mesh, without replacing the classical PDE solver. This has the crucial advantage that we retain convergence guarantees and robustness of FEMs which are often lacking in direct ML approaches for the solution of PDEs.

Classical adaptive mesh refinement

Adaptive mesh methods are a widely used tool for improving the performance of a classical FEM by varying the local density of the mesh points. This is necessary if the underlying solution has small length scales or singularities, often induced by the geometry of the domain. Adaptivity allows for a high accuracy without the cost of using a uniformly fine mesh. The most widely used form of adaptivity is h-adaptivity [2], in which mesh cells are subdivided when some a-posteriori estimate of the solution approximation is large. Such methods have complex data structures and, possibly, poor mesh regularity. An alternative are the relocation based r-adaptive methods considered in this paper, in which a fixed number of mesh points are move to achieve a high density of points where some monitor m𝑚mitalic_m of the solution error is large. Done correctly this can lead to significant error reduction at little extra cost [22]. r-adaptive methods retain a constant data structure, and the movement of the mesh points can reproduce Lagrangian structures in the solution.

Graph Neural Networks (GNNs)

GNNS are the dominant approach to applying machine learning to irregularly structured data [1, 4]. There has been a proliferation of architectures inspired by spectral graph theory [11], convolutional (GCN) [25], message passing (MPNN) [15] and attentional (GAT) [41] approaches. More recently a range of differential equation inspired architectures [9, 8, 16] look to take tools from the analysis and methods of solving different equations to try and solve known problems with GNNs including stability, oversmoothing and bottleneck phenomenon. These architectures often contain a strong inductive bias from the design inspiration for example equivariance [37], anisotropic diffusion and stability [9]. These inductive biases along with the powerful message passing paradigm are helping solve some of the most pressing problems in science; protein folding [24], weather prediction [26], dynamics learning[34] and solving PDEs by inspiring new numerical methods [5, 30, 3].

ML and adaptive meshing

The significant computational cost of classical methods for adaptive meshing (in particular r-adaptive methods) supports the need for fast ML-based alternatives. While this topic is still in the early stages of research there have been several notable contributions in recent years. This includes work on h-adaptive mesh refinement [14, 34], and a larger amount of contributions to r-adaptivity constructing ML based solvers for the classical mesh movement PDEs [43, 21] and supervised (data-based) learning for mesh adaptivity using Graph Neural Networks (GNNs) [38]. For the r-adaptive case in particular prior work has focused on the construction of fast surrogates to classical approaches and as outlined below this marks one of the central differences to our novel approach.

3 Preliminaries and background

3.1 Problem specification

In the following we consider the solution of an abstract PDE on domain ΩΩ\Omegaroman_Ω of dimension d𝑑ditalic_d in the form

(ut,u,u,2u,)=f,x~Ω~,u=g,x~Ω~.formulae-sequencesubscript𝑢𝑡𝑢𝑢superscript2𝑢𝑓formulae-sequence~𝑥~Ωformulae-sequence𝑢𝑔~𝑥~Ω\mathcal{F}(u_{t},u,\nabla u,\nabla^{2}u,\dots)=f,\quad\tilde{x}\in\widetilde{% \Omega},\quad u=g,\quad\tilde{x}\in\partial\widetilde{\Omega}.caligraphic_F ( italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_u , ∇ italic_u , ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_u , … ) = italic_f , over~ start_ARG italic_x end_ARG ∈ over~ start_ARG roman_Ω end_ARG , italic_u = italic_g , over~ start_ARG italic_x end_ARG ∈ ∂ over~ start_ARG roman_Ω end_ARG . (1)

Note we will treat both time-dependent and time-independent problems in the example section, and it is thus understood that Ω~=[0,T]×Ω~Ω0𝑇Ω\widetilde{\Omega}=[0,T]\times{\Omega}over~ start_ARG roman_Ω end_ARG = [ 0 , italic_T ] × roman_Ω and x~=(t,x)~𝑥𝑡𝑥\tilde{x}=(t,{x})over~ start_ARG italic_x end_ARG = ( italic_t , italic_x ) in the time-dependent case (and that Ω~=Ω,x~=xformulae-sequence~ΩΩ~𝑥𝑥\widetilde{\Omega}={\Omega},\tilde{x}=xover~ start_ARG roman_Ω end_ARG = roman_Ω , over~ start_ARG italic_x end_ARG = italic_x in the time-independent case). We solve (1) with a finite element method in the weak formulation, which means we commence with a mesh 𝒳𝒳\mathcal{X}caligraphic_X and a set of piecewise linear basis functions S𝒳subscript𝑆𝒳S_{\mathcal{X}}italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT on that mesh. For us trial and test functions will be the same and a finite element discretisations will typically mean that we seek U𝒳S𝒳subscript𝑈𝒳subscript𝑆𝒳U_{\mathcal{X}}\in S_{\mathcal{X}}italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ∈ italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT using Galerkin optimality conditions of the form

a(tU𝒳,U𝒳,;v,v,)=(f,v)L2(Ω),vS𝒳formulae-sequence𝑎subscript𝑡subscript𝑈𝒳subscript𝑈𝒳𝑣𝑣subscript𝑓𝑣superscript𝐿2Ωfor-all𝑣subscript𝑆𝒳\displaystyle a(\partial_{t}U_{\mathcal{X}},\nabla U_{\mathcal{X}},\dots;v,% \nabla v,\dots)=(f,v)_{L^{2}({\Omega})},\quad\forall v\in S_{\mathcal{X}}italic_a ( ∂ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT , ∇ italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT , … ; italic_v , ∇ italic_v , … ) = ( italic_f , italic_v ) start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT , ∀ italic_v ∈ italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT (2)

where a𝑎aitalic_a is a bilinear form that is usually found by integration by parts of ((Ut,),v)L2(Ω)subscriptsubscript𝑈𝑡𝑣superscript𝐿2Ω(\mathcal{F}(U_{t},\dots),v)_{L^{2}({\Omega})}( caligraphic_F ( italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , … ) , italic_v ) start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT. In the case of Poisson’s equation with zero Dirichlet boundary conditions the above simplifies to

(U𝒳,v)L2(Ω)=(f,v)L2(Ω).subscriptsubscript𝑈𝒳𝑣superscript𝐿2Ωsubscript𝑓𝑣superscript𝐿2Ω\displaystyle(\nabla U_{\mathcal{X}},\nabla v)_{L^{2}({\Omega})}=(f,v)_{L^{2}(% {\Omega})}.( ∇ italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT , ∇ italic_v ) start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT = ( italic_f , italic_v ) start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT .

This method of solution results in an expansion U𝒳=i=1Nxα𝒳ϕ𝒳,isubscript𝑈𝒳superscriptsubscript𝑖1subscript𝑁𝑥subscript𝛼𝒳subscriptitalic-ϕ𝒳𝑖U_{\mathcal{X}}=\sum_{i=1}^{N_{x}}\alpha_{\mathcal{X}}\phi_{\mathcal{X},i}italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT caligraphic_X , italic_i end_POSTSUBSCRIPT where S𝒳={ϕ𝒳,i}i=1Nxsubscript𝑆𝒳superscriptsubscriptsubscriptitalic-ϕ𝒳𝑖𝑖1subscript𝑁𝑥S_{\mathcal{X}}=\{\phi_{\mathcal{X},i}\}_{i=1}^{N_{x}}italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT = { italic_ϕ start_POSTSUBSCRIPT caligraphic_X , italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT. The goal in r𝑟ritalic_r-adaptive meshing is then to minimise the solution error

(𝒳,U𝒳):=U𝒳uL2(Ω)assign𝒳subscript𝑈𝒳subscriptnormsubscript𝑈𝒳𝑢superscript𝐿2Ω\displaystyle\mathcal{E}(\mathcal{X},U_{\mathcal{X}}):=\|U_{\mathcal{X}}-u\|_{% L^{2}({\Omega})}caligraphic_E ( caligraphic_X , italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) := ∥ italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT - italic_u ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT (3)

with respect to the mesh locations 𝒳𝒳\mathcal{X}caligraphic_X without changing the number of meshpoints Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT. Note we will on occasion suppress the subscript 𝒳𝒳\mathcal{X}caligraphic_X from U𝑈Uitalic_U if the dependence is clear from context.

3.2 Adaptive Meshing

Relocation based r-adaptivity moves a mesh with a certain topology to another with the same topology, where the mesh points are moved, but their connectivity (and hence the associated data structures) is unaltered. Such methods typically map a fixed mesh in a computational domain (i.e. a representation of the mesh graph) to a moving mesh in the physical domain where the PDE is posed. We denote the mesh points in the physical domain by 𝐱(i)superscript𝐱𝑖{\mathbf{x}}^{(i)}bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT and they define a triangulation 𝒯𝒯\mathcal{T}caligraphic_T so that

𝒳={𝒙(i)d}i=1Nx,𝒯={Δ(j)|Δ(j)𝒳,|Δ(j)=d+1}j=1NT.\mathcal{X}=\left\{\bm{x}^{(i)}\in\mathbb{R}^{d}\right\}_{i=1}^{N_{x}},\quad% \mathcal{T}=\left\{\Delta^{(j)}\left|\Delta^{(j)}\subset\mathcal{X},\right.% \mid|\Delta^{(j)}\mid=d+1\right\}_{j=1}^{N_{T}}.caligraphic_X = { bold_italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT , caligraphic_T = { roman_Δ start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT | roman_Δ start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT ⊂ caligraphic_X , ∣ | roman_Δ start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT ∣ = italic_d + 1 } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUPERSCRIPT .

We define the following domains and coordinates: ΩCdsubscriptΩ𝐶superscript𝑑\Omega_{C}\subseteq\mathbb{R}^{d}roman_Ω start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT - “computational" domain eg. the plane, sphere, mapped to ΩPdsubscriptΩ𝑃superscript𝑑\Omega_{P}\subseteq\mathbb{R}^{d}roman_Ω start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT - “physical" domain, with 𝝃𝝃\bm{\xi}bold_italic_ξ - coordinates in the computational domain, 𝒙𝒙\bm{x}bold_italic_x - coordinates in the physical domain. To construct an adaptive mesh we consider a differentiable time-dependent deformation map 𝐅:ΩCΩP:𝐅subscriptΩ𝐶subscriptΩ𝑃\mathbf{F}:\Omega_{C}\rightarrow\Omega_{P}bold_F : roman_Ω start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT → roman_Ω start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT, so that 𝑿=𝐅(ξ,t)𝑿𝐅𝜉𝑡\bm{X}=\mathbf{F}(\mathbf{\xi},t)bold_italic_X = bold_F ( italic_ξ , italic_t ) and 𝐅(ΩC)=ΩP𝐅subscriptΩ𝐶subscriptΩ𝑃\mathbf{F}{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}(\partial\Omega_{C})=}% \partial\Omega_{P}bold_F ( ∂ roman_Ω start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) = ∂ roman_Ω start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT. If 𝝃(i)superscript𝝃𝑖{\bm{\xi}}^{(i)}bold_italic_ξ start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT are the fixed mesh points in the computational domain then 𝒙(i)=𝐅(𝝃(i),t)superscript𝒙𝑖𝐅superscript𝝃𝑖𝑡\bm{x}^{(i)}=\mathbf{F}(\bm{\xi}^{(i)},t)bold_italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = bold_F ( bold_italic_ξ start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_t ) Assuming the mesh in the computational domain is regular, then determining the (properties of the) mesh in the physical domain, reduces to finding, and analysing, the function 𝐅𝐅{\mathbf{F}}bold_F.

Location based methods find this map directly, usually by solving a partial differential equation, or variational principle, linked to it. Monge-Ampére based methods assume that the deformation map is a Legendre transform (associated with an Optimal Transport regularisation) with a ‘mesh potential’ ϕ(𝝃,t)italic-ϕ𝝃𝑡\phi(\bm{\xi},t)italic_ϕ ( bold_italic_ξ , italic_t ) for which 𝐅=𝝃ϕ.𝐅subscript𝝃italic-ϕ{\mathbf{F}}=\nabla_{\bm{\xi}}\phi.bold_F = ∇ start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT italic_ϕ . The linearisation J𝐽Jitalic_J of the deformation map, given by J=𝐅/𝝃H(ϕ)𝐽𝐅𝝃𝐻italic-ϕJ=\partial{\mathbf{F}}/\partial{\bm{\xi}}\equiv H(\phi)italic_J = ∂ bold_F / ∂ bold_italic_ξ ≡ italic_H ( italic_ϕ ) where H𝐻Hitalic_H is the Hessian of ϕitalic-ϕ\phiitalic_ϕ. Relocation methods usually equidistribute a monitor function m(x)𝑚𝑥m(x)italic_m ( italic_x ) so that ϕitalic-ϕ\phiitalic_ϕ satisfies the Monge-Ampére equation

m(x)|H(ϕ)|=m(ϕ)|H(ϕ)|=θ,for constantθ.formulae-sequence𝑚𝑥𝐻italic-ϕ𝑚italic-ϕ𝐻italic-ϕ𝜃for constant𝜃m(x)|H(\phi)|=m(\nabla\phi)|H(\phi)|=\theta,\quad\mbox{for constant}\quad\theta.italic_m ( italic_x ) | italic_H ( italic_ϕ ) | = italic_m ( ∇ italic_ϕ ) | italic_H ( italic_ϕ ) | = italic_θ , for constant italic_θ . (4)

The monitor function is a measure of the local solution error, for example

m1(x)=(1+uxx2+uyy2)1/5[22],m2(x)=1+α(uuexact )2maxx,y(uuexact )2+βH(u)Fmaxx,yH(u)F[38].\displaystyle m_{1}(x)=\left(1+u_{xx}^{2}+u_{yy}^{2}\right)^{1/5}\text{\cite[c% ite]{[\@@bibref{Number}{huang_adaptive_2011}{}{}]}},\quad m_{2}(x)=1+\frac{% \alpha\left(u-u_{\text{exact }}\right)^{2}}{\max_{x,y}\left(u-u_{\text{exact }% }\right)^{2}}+\frac{\beta\|H(u)\|_{F}}{\max_{x,y}\|H(u)\|_{F}}\text{\cite[cite% ]{[\@@bibref{Number}{song_m2n_2022}{}{}]}}.italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) = ( 1 + italic_u start_POSTSUBSCRIPT italic_x italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_u start_POSTSUBSCRIPT italic_y italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / 5 end_POSTSUPERSCRIPT , italic_m start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ) = 1 + divide start_ARG italic_α ( italic_u - italic_u start_POSTSUBSCRIPT exact end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_x , italic_y end_POSTSUBSCRIPT ( italic_u - italic_u start_POSTSUBSCRIPT exact end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + divide start_ARG italic_β ∥ italic_H ( italic_u ) ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_x , italic_y end_POSTSUBSCRIPT ∥ italic_H ( italic_u ) ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT end_ARG . (5)

The PDE (4) has a unique, convex, solution [6] which avoids mesh tangling. However it is challenging, and can be slow, to solve. Solution procedures include relaxation methods [7], quasi-Newton methods [32], surrogate methods [38], and PINNs [43].

Velocity based methods find an ODE describing the mesh point evolution so that

𝒙˙(i)=𝐯(𝒙(i),t),superscript˙𝒙𝑖𝐯superscript𝒙𝑖𝑡\dot{\bm{x}}^{(i)}={\mathbf{v}}({\bm{x}}^{(i)},t),over˙ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = bold_v ( bold_italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_t ) , (6)

with 𝐯=𝟎𝐯0{\mathbf{v}}={\bm{0}}bold_v = bold_0 when 𝒙(i)=𝝃(i).superscript𝒙𝑖superscript𝝃𝑖\bm{x}^{(i)}=\bm{\xi}^{(i)}.bold_italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = bold_italic_ξ start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT . The choice of velocity function 𝐯𝐯{\mathbf{v}}bold_v is critical to the success of such methods, and is often motivated by natural Lagrangian structures of the underlying PDE, [22] Chapter 7. The G-adaptive method in this paper is a velocity based method in which 𝐯𝐯{\mathbf{v}}bold_v is derived by calculating the rate of change of the FE solution error. As we outline in section 4 and the supplementary material, G-adaptive methods avoid the mesh tangling often associated with velocity methods.

3.3 Graph Neural Networks (GNNs)

We consider a graph 𝒢=(𝒱,)𝒢𝒱\mathcal{G}=(\mathcal{V},\mathcal{E})caligraphic_G = ( caligraphic_V , caligraphic_E ), with Nx:=|𝒱|assignsubscript𝑁𝑥𝒱N_{x}:=|\mathcal{V}|italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT := | caligraphic_V | nodes and 𝒱×𝒱𝒱𝒱\mathcal{E}\subset\mathcal{V}\times\mathcal{V}caligraphic_E ⊂ caligraphic_V × caligraphic_V edges. The adjacency matrix 𝐀𝐀\mathbf{A}bold_A is defined as aij=1subscript𝑎𝑖𝑗1a_{ij}=1italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 1 if (i,j)𝑖𝑗(i,j)\in\mathcal{E}( italic_i , italic_j ) ∈ caligraphic_E and zero otherwise. Node features {𝐱i0d:iV}conditional-setsubscript𝐱𝑖subscriptsuperscript𝑑0𝑖V\left\{\mathbf{x}_{i}\in\mathbb{R}^{d}_{0}:i\in\mathrm{V}\right\}{ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT : italic_i ∈ roman_V } can be represented in matrix notation as 𝐗Nx×d𝐗superscriptsubscript𝑁𝑥𝑑\mathbf{X}\in\mathbb{R}^{N_{x}\times d}bold_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d end_POSTSUPERSCRIPT.

A graph neural network (GNN) θ:Nx×d0×Nx×dN:subscript𝜃superscriptsubscript𝑁𝑥subscript𝑑0superscriptsubscript𝑁𝑥subscript𝑑𝑁\mathcal{M}_{\theta}:\mathbb{R}^{N_{x}\times d_{0}}\times\mathcal{E}% \rightarrow\mathbb{R}^{N_{x}\times d_{N}}caligraphic_M start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT × caligraphic_E → blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is constructed with layers: θk:Nx×dkNx×dk+1:subscriptsubscript𝜃𝑘superscriptsubscript𝑁𝑥subscript𝑑𝑘superscriptsubscript𝑁𝑥subscript𝑑𝑘1\mathcal{L}_{\theta_{k}}:\mathbb{R}^{N_{x}\times d_{k}}\rightarrow\mathbb{R}^{% N_{x}\times d_{k+1}}caligraphic_L start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT : blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT acting node wise as 𝐱ik+1=θk(𝐱ik)=ϕθk(𝐱ik,j𝒩iφθk(𝐱jk))superscriptsubscript𝐱𝑖𝑘1subscriptsubscript𝜃𝑘superscriptsubscript𝐱𝑖𝑘subscriptitalic-ϕsubscript𝜃𝑘superscriptsubscript𝐱𝑖𝑘subscript𝑗subscript𝒩𝑖subscript𝜑subscript𝜃𝑘superscriptsubscript𝐱𝑗𝑘\mathbf{x}_{i}^{k+1}=\mathcal{L}_{\theta_{k}}(\mathbf{x}_{i}^{k})=\phi_{\theta% _{k}}\left(\mathbf{x}_{i}^{k},\sum\limits_{j\in\mathcal{N}_{i}}\varphi_{\theta% _{k}}(\mathbf{x}_{j}^{k})\right)bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = caligraphic_L start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) = italic_ϕ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_φ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) )

Two important examples are GCN [25] in matrix form 𝐗k+1=σ(𝐀𝐗k𝐖)superscript𝐗𝑘1𝜎superscript𝐀𝐗𝑘𝐖\mathbf{X}^{k+1}=\sigma(\mathbf{A}\mathbf{X}^{k}\mathbf{W})bold_X start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = italic_σ ( bold_AX start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT bold_W ) and GAT [41] 𝐗k+1=σ(𝐀(𝐗k)𝐗k𝐖)superscript𝐗𝑘1𝜎𝐀superscript𝐗𝑘superscript𝐗𝑘𝐖\mathbf{X}^{k+1}=\sigma(\mathbf{A}(\mathbf{X}^{k})\mathbf{X}^{k}\mathbf{W})bold_X start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = italic_σ ( bold_A ( bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT bold_W ) where aij=a(xi,xj)subscript𝑎𝑖𝑗𝑎subscript𝑥𝑖subscript𝑥𝑗a_{ij}=a(x_{i},x_{j})italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_a ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) are attentional coefficients. We can include encoder E𝐸Eitalic_E and decoder D𝐷Ditalic_D such that

𝐲i=θ((𝐗0))i=(D(θN(θ0(E(𝐗0)))))i.subscript𝐲𝑖subscript𝜃subscriptsubscript𝐗0𝑖subscript𝐷subscriptsubscript𝜃𝑁subscriptsubscript𝜃0𝐸subscript𝐗0𝑖\mathbf{y}_{i}=\mathcal{M}_{\theta}(\left(\mathbf{X}_{0}\right))_{i}=\left(D% \left(\mathcal{L}_{\theta_{N}}\left(\dots\mathcal{L}_{\theta_{0}}\left(E\left(% \mathbf{X}_{0}\right)\right)\right)\right)\right)_{i}.bold_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( ( bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_D ( caligraphic_L start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( … caligraphic_L start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_E ( bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) ) ) ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT . (7)

4 G-Adaptivity a GNN based adaptive mesh refinement

Our novel approach to meshing G-adaptivity using GNNs consists of three main parts: Figure 1(a) the careful selection of the GNN feature space, (b) a stable, easy-to-train and low cost GNN, (c) and a differentiable FEM solver (difFEM) for direct optimisation with respect to the FEM error (3).

4.1 Graph Neural Diffusion

GRAND [9] is a state-of-the-art GNN architecture that formulates a learnable non-homogeneous (or anisotropic) diffusion equation on a graph. Similar to neural ODEs [chen] it considers layers through the network as time and, by the properties of the row stochastic adjacency matrix, has favourable stability properties in inference and training. Given the differential equation approach to classical meshing strategies the architecture specifically aligns itself with these and brings the power of GNNs to the meshing problem. The GRAND architecture is achieved by making residual and treating propagation with an attentional graph Laplacian the GNN equation 7 becomes

t𝐗(t)=(𝐀θ(𝐗(t))𝐈)𝐗(t).𝑡𝐗𝑡subscript𝐀𝜃𝐗𝑡𝐈𝐗𝑡\displaystyle\frac{\partial}{\partial t}\mathbf{X}(t)=(\mathbf{A}_{\theta}(% \mathbf{X}(t))-\mathbf{I})\mathbf{X}(t).divide start_ARG ∂ end_ARG start_ARG ∂ italic_t end_ARG bold_X ( italic_t ) = ( bold_A start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( bold_X ( italic_t ) ) - bold_I ) bold_X ( italic_t ) . (8)

a(𝐗i,𝐗j)𝑎subscript𝐗𝑖subscript𝐗𝑗a\left(\mathbf{X}_{i},\mathbf{X}_{j}\right)italic_a ( bold_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) is calculated using the scaled dot product attention [40]

aij=a(𝐗i,𝐗j)=softmax((𝐗i𝐖K𝐖Q𝐗j)/dk).subscript𝑎𝑖𝑗𝑎subscript𝐗𝑖subscript𝐗𝑗softmaxsuperscriptsubscript𝐗𝑖topsuperscriptsubscript𝐖𝐾topsubscript𝐖𝑄subscript𝐗𝑗subscript𝑑𝑘\displaystyle a_{ij}=a\left(\mathbf{X}_{i},\mathbf{X}_{j}\right)=\operatorname% {softmax}\left(\left(\mathbf{X}_{i}^{\top}\mathbf{W}_{K}^{\top}\mathbf{W}_{Q}% \mathbf{X}_{j}\right)/d_{k}\right).italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_a ( bold_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = roman_softmax ( ( bold_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_W start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_W start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) / italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) . (9)

where ai,j>0subscript𝑎𝑖𝑗0a_{i,j}>0italic_a start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT > 0 if (i,j)𝑖𝑗(i,j)\in\mathcal{E}( italic_i , italic_j ) ∈ caligraphic_E and j𝒩(i)aij=1subscript𝑗𝒩𝑖subscript𝑎𝑖𝑗1\sum_{j\in\mathcal{N}(i)}a_{ij}=1∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N ( italic_i ) end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 1. To construct the feature matrix required for G-Adaptivity we concatenate coordinates of a regular FE grid 𝝃Nx×d𝝃superscriptsubscript𝑁𝑥𝑑\bm{\xi}\in\mathbb{R}^{N_{x}\times d}bold_italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d end_POSTSUPERSCRIPT, a preliminary coarse FE solve u~Nx~𝑢superscriptsubscript𝑁𝑥\tilde{u}\in\mathbb{R}^{N_{x}}over~ start_ARG italic_u end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT and curvature estimate f~Nx~𝑓superscriptsubscript𝑁𝑥\tilde{f}\in\mathbb{R}^{N_{x}}over~ start_ARG italic_f end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT on the regular grid giving the features vector 𝐗=(𝝃u~f~)Nx×d+2𝐗𝝃~𝑢~𝑓superscriptsubscript𝑁𝑥𝑑2\mathbf{X}=\left(\bm{\xi}\mathbin{\|}\tilde{u}\mathbin{\|}\tilde{f}\right)\in% \mathbb{R}^{N_{x}\times d+2}bold_X = ( bold_italic_ξ ∥ over~ start_ARG italic_u end_ARG ∥ over~ start_ARG italic_f end_ARG ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d + 2 end_POSTSUPERSCRIPT. Then we define our G-adaptive operator

𝜽:(𝐗,𝐀)𝒳=(𝒙i)i=1Nx.:subscript𝜽maps-to𝐗𝐀𝒳superscriptsubscriptsubscript𝒙𝑖𝑖1subscript𝑁𝑥\displaystyle\mathcal{M}_{\bm{\theta}}:(\mathbf{X},\mathbf{A})\mapsto\mathcal{% X}=(\bm{x}_{i})_{i=1}^{N_{x}}.caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT : ( bold_X , bold_A ) ↦ caligraphic_X = ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT .

θsubscript𝜃\mathcal{M}_{\theta}caligraphic_M start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT uses an identity encoder E𝐸Eitalic_E to lift the latent space to Nx×d+2+|λ|superscriptsubscript𝑁𝑥𝑑2𝜆\mathbb{R}^{N_{x}\times d+2+|\lambda|}blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT × italic_d + 2 + | italic_λ | end_POSTSUPERSCRIPT and a decoder D𝐷Ditalic_D to return physical coordinates. GRAND layers are used as convolutional blocks that induce several desirable properties for the meshing problem. By construction the adjacency is row stochastic this guarantees stability properties on the solver step size. Feature channels don’t mix instead they interact via attention expressed with elements of the adjacency, this is important as the first two features are physical coordinates. Finally the learnable weights Q𝑄Qitalic_Q and K𝐾Kitalic_K matrices allow for an anistropic diffusion which is akin to a learnable monitor function from classical methods.

Regularity of GRAND generated meshes

The GRAND based architecture has a key advantage in the generation of regular meshes. A requirement of FEM meshes is that they are not ‘tangled’, i.e. that they form a well-posed triangulation of the domain ΩΩ\Omegaroman_Ω (i.e. no triangles overlap). This follows if each mesh point is in the convex hull of its neighbours on the graph. We show in the supplementary material that our architecture automatically moves points to the convex hull of their neighbours, enforcing this property.

Refer to caption
Figure 2: The action of the graph diffusion pulls nodes into the convex hull of their graph neighbours.

4.2 difFEM: a differentiable FEM solver and FEM loss

As mentioned above, prior work for r𝑟ritalic_r-adaptivity both in classical adaptive meshing (cf. [7]) and in machine learning approaches to the problem (cf. [14, 34, 43, 21, 38]) has built on the assumption that direct minimisation of the FEM error, whilst highly desirable, is both expensive and highly non-convex and therefore computationally infeasible [22, p. 56]. It turns out that recent advances in computational power and software have brought exactly this direct minimisation into the realm of the possible. Indeed, we introduce an adjoint framework which allows us to introduce difFEM a finite element solver which is fully differentiable with respect to mesh point locations. Based on the chain rule we can then exploit this solver to directly compute gradients of the FEM error (3) with respect to the GNN parameters 𝜽𝜽\bm{\theta}bold_italic_θ. In particular we have (writing 𝜽=𝜽(𝐗,𝒜)subscript𝜽subscript𝜽𝐗𝒜\mathcal{M}_{\bm{\theta}}=\mathcal{M}_{\bm{\theta}}(\mathbf{X},\mathcal{A})caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT = caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_X , caligraphic_A ) for simplicity)

ddθj(𝜽,U𝜽)=i=1Nxd(𝜽(𝐗,𝒜))idθjd(𝒳,U𝒳)d𝐱i|𝒳=𝜽(𝐗,𝒜)ddsubscript𝜃𝑗subscript𝜽subscript𝑈subscript𝜽evaluated-atsuperscriptsubscript𝑖1subscript𝑁𝑥dsubscriptsubscript𝜽𝐗𝒜𝑖dsubscript𝜃𝑗d𝒳subscript𝑈𝒳dsubscript𝐱𝑖𝒳subscript𝜽𝐗𝒜\displaystyle\frac{\mathrm{d}}{\mathrm{d}{\theta_{j}}}\mathcal{E}(\mathcal{M}_% {\bm{\theta}},U_{\mathcal{M}_{\bm{\theta}}})=\sum_{i=1}^{N_{x}}\frac{\mathrm{d% }(\mathcal{M}_{\bm{\theta}}(\mathbf{X},\mathcal{A}))_{i}}{\mathrm{d}\theta_{j}% }\cdot\frac{\mathrm{d}\mathcal{E}(\mathcal{X},U_{\mathcal{X}})}{\mathrm{d}% \mathbf{x}_{i}}\Big{|}_{\mathcal{X}=\mathcal{M}_{\bm{\theta}}(\mathbf{X},% \mathcal{A})}divide start_ARG roman_d end_ARG start_ARG roman_d italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG caligraphic_E ( caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT , italic_U start_POSTSUBSCRIPT caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT divide start_ARG roman_d ( caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_X , caligraphic_A ) ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG roman_d italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ⋅ divide start_ARG roman_d caligraphic_E ( caligraphic_X , italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) end_ARG start_ARG roman_d bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG | start_POSTSUBSCRIPT caligraphic_X = caligraphic_M start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_X , caligraphic_A ) end_POSTSUBSCRIPT

In order to compute the derivative values of the FEM error \mathcal{E}caligraphic_E with respect to the mesh coordinates 𝒳={𝐱i}i=1Nx𝒳superscriptsubscriptsubscript𝐱𝑖𝑖1subscript𝑁𝑥\mathcal{X}=\{\mathbf{x}_{i}\}_{i=1}^{N_{x}}caligraphic_X = { bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT we begin by recalling that the expansion coefficients 𝜶𝜶\bm{\alpha}bold_italic_α of U𝒳subscript𝑈𝒳U_{\mathcal{X}}italic_U start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT (cf. section 3) are given as the solution of a linear system (the constraint 𝐠𝐠\mathbf{g}bold_g arising from the bilinear form (2)):

𝐠(𝒳,𝜶𝒳)=0,where𝐠(𝒳,𝜶)=𝖬𝒳𝜶𝒳𝒃𝒳formulae-sequence𝐠𝒳subscript𝜶𝒳0where𝐠𝒳𝜶subscript𝖬𝒳subscript𝜶𝒳subscript𝒃𝒳\displaystyle\mathbf{g}(\mathcal{X},\bm{\alpha}_{\mathcal{X}})=0,\quad\text{% where}\ \mathbf{g}(\mathcal{X},\bm{\alpha})=\mathsf{M}_{\mathcal{X}}\bm{\alpha% }_{\mathcal{X}}-\bm{b}_{\mathcal{X}}bold_g ( caligraphic_X , bold_italic_α start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) = 0 , where bold_g ( caligraphic_X , bold_italic_α ) = sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT bold_italic_α start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT - bold_italic_b start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT (10)

for a given matrix 𝖬𝒳subscript𝖬𝒳\mathsf{M}_{\mathcal{X}}sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT and load vector 𝒃𝒳subscript𝒃𝒳\bm{b}_{\mathcal{X}}bold_italic_b start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT which both depend on the mesh. Then we have

dd𝐱i=𝝀𝐠𝐱i+𝐱i,ddsubscript𝐱𝑖superscript𝝀top𝐠subscript𝐱𝑖subscript𝐱𝑖\displaystyle\frac{\mathrm{d}\mathcal{E}}{\mathrm{d}\mathbf{x}_{i}}=\bm{% \lambda}^{\top}\frac{\partial\mathbf{g}}{\partial\mathbf{x}_{i}}+\frac{% \partial\mathcal{E}}{\partial\mathbf{x}_{i}}{\color[rgb]{0,0,0}\definecolor[% named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}% \pgfsys@color@gray@fill{0},}divide start_ARG roman_d caligraphic_E end_ARG start_ARG roman_d bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = bold_italic_λ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT divide start_ARG ∂ bold_g end_ARG start_ARG ∂ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG + divide start_ARG ∂ caligraphic_E end_ARG start_ARG ∂ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG ,

where 𝝀𝝀\bm{\lambda}bold_italic_λ is the solution to the discrete adjoint problem (cf. [39]) 𝖬𝒳𝝀=/𝜶superscriptsubscript𝖬𝒳top𝝀𝜶\mathsf{M}_{\mathcal{X}}^{\top}\bm{\lambda}=-\partial\mathcal{E}/{\partial\bm{% \alpha}}sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_λ = - ∂ caligraphic_E / ∂ bold_italic_α. The advantage of this formulation is that we never have to compute d𝜶/d𝐱i=d(𝖬𝒳1𝐛𝒳)/d𝐱i𝑑𝜶𝑑subscript𝐱𝑖𝑑superscriptsubscript𝖬𝒳1subscript𝐛𝒳𝑑subscript𝐱𝑖d\bm{\alpha}/d\mathbf{x}_{i}=d\left(\mathsf{M}_{\mathcal{X}}^{-1}\mathbf{b}_{% \mathcal{X}}\right)/d\mathbf{x}_{i}italic_d bold_italic_α / italic_d bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_d ( sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_b start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) / italic_d bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT directly, thus we avoid having to differentiate a matrix inversion. This significantly simplifies the computational tree in the difFEM and makes computations on moderate mesh resolutions sizes feasible.

5 Experimental Results

We evaluate G-adaptivity on two classical meshing problems: an elliptic PDE (Poisson’s equation) and a nonlinear time-evolution PDE (Burger’s equation). In the following we present the performance improvements obtained in terms of the FEM L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-error reduction (cf. (3)) and compute time, using our novel approach for adaptive meshing on each of these problems. We share code to reproduce our results on the G-Adaptivity github repository.

5.1 Experimental details

Method

The pipeline for our experiments consist of three parts (cf. Figure 1): we build datasets containing information about the PDE, classical meshing solution and FEM approximation. Then we train the model, we experiment by training using either the Mesh-Loss which is the L1superscript𝐿1L^{1}italic_L start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT distance to MMPDE5 (1D) or Monge-Ampére (2D) pre-generated meshes or the ‘PDE-loss’ which is the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT FEM approximation error and then backpropagate through the differentiable solver difFEM. Full code to build the datasets and reproduce our results will be made available.

Evaluation is done by calculating an FE solution on the mesh generated by the model and projecting this using interpolation to a fine 100dsuperscript100𝑑100^{d}100 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT regular reference mesh. On the fine mesh we then calculate the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error against the reference solution. For Poisson the reference solution is the analytical solution projected onto the for Burgers this is an FEM solve performed on the fine reference mesh. We then calculate the relative error reduction against an FEM solve done on a regular coarse mesh with coordinates ξ𝜉\mathbf{\xi}italic_ξ but evaluated on the fine mesh. We also report time which shows the time of a forward pass when the model is in training phase.

This is a very low parameter model with only 128 parameters coming from the 8×8888\times 88 × 8 query and key matrices shared over the layers. Training is done for 50 epochs with a Adam optimiser and learning rate of 0.001, although we found our model converged in less than 10-20 epochs. The model is also very easy to tune and we found using 4 layers with a step size of 0.1 was sufficient for all the datasets.

Baselines

We compare our model against classical adaptive mesh alogorithms MMDPE5 as described by [22] in 1-dimension and Monge-Ampére [6] in 2-dimensions. These two state-of-the-art classical approaches serve as a central baseline for the FEM error reduction and indicate also an upper bound on prior ML performance which, as mentioned above, focused mainly on the design of effective surrogate models, which provide a speed up but not FEM error reduction improvement over classical methods for r𝑟ritalic_r-adaptivity. The monitor function for these methods was chosen to be m1subscript𝑚1m_{1}italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT as defined in (5). In our experiments we also aim to compare against the central ML baseline [38] which also describes a GNN based architecture for r𝑟ritalic_r-adaptivity in meshing. The differences to our novel approach is the GNN architecture, where [38] used Graph Attention Networks (GAT) and their method of training which was data-based, i.e. with what we call a Mesh-Loss. Unfortunately, the code for this prior work was not made available. However, we were able to implement a version of the approach described in [38], see appendix for details. Note that qualitatively the performance of GAT_lap + Mesh-Loss is comparable to the results reported in [38] - an error reduction comparable to Monge-Ampère, with a mesh time improvement of around three orders of magnitude.

Datasets

We evaluate our model and baselines for a range of challenging mesh problems. There are four main datasets we call ‘structured gaussian’ (struct1g_27, struct2g_75) and random Gaussian (randg_25, randg_275). For all datasets we work on the domain Ω=[0,1]dΩsuperscript01𝑑\Omega=[0,1]^{d}roman_Ω = [ 0 , 1 ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT and generate example pde data and meshes for varying resolutions of [11d,15d]superscript11𝑑superscript15𝑑[11^{d},15^{d}][ 11 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , 15 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ]. Scales and centers of Gaussian type solutions are sampled with a stratified approach, for structured, or randomly otherwise see Appendix B for more details. For Poisson’s equation we evaluate against the analytical solution and for Burgers’ equation we perform an FEM solve on the fine grid used in evaluation.

5.2 Poisson’s equation

Our first test problem is the elliptic Poisson’s equation

2u=f,xΩ,u=g,xΩ.formulae-sequencesuperscript2𝑢𝑓formulae-sequence𝑥Ωformulae-sequence𝑢𝑔𝑥Ω\displaystyle-\nabla^{2}u=f,\ \ x\in\Omega,\quad u=g,\ \ x\in\partial\Omega.- ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_u = italic_f , italic_x ∈ roman_Ω , italic_u = italic_g , italic_x ∈ ∂ roman_Ω .

Numerical results

The performance of our novel approach G-adaptivity in comparison to our baselines is shown in table 1. A sample extract of the dataset randg_275𝑟𝑎𝑛𝑑𝑔_275randg\_275italic_r italic_a italic_n italic_d italic_g _ 275 can be seen in Figure 3. Note the difference in the shape of the meshes generated with our optimised GNN architecture which lead to significant further error reduction over the Monge-Ampère meshes.

Dataset Model Loss L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-FEM error reduction Time
struct1g_27 MMPDE5 - -18.25% 791 ms
G-adaptive Mesh-Loss -15.46% 0.7 ms
G-adaptive difFEM -26.02% 1.2 ms
struct2g_75 Monge-Ampère - -15.00% 845 ms
GAT_lap Mesh-Loss 11.55% 1.5 ms
GAT_lap difFEM -3.11% 1.7 ms
G-adaptive Mesh-Loss 2.72% 0.8 ms
G-adaptive difFEM -16.75% 0.8 ms
randg_25 Monge-Ampère - -15.70% 1868.1 ms
GAT_lap Mesh-Loss -6.71% 3.9 ms
GAT_lap difFEM -2.07% 1.9 ms
G-adaptive Mesh-Loss -11.01% 1.7 ms
G-adaptive difFEM -15.62% 1.5 ms
randg_275 Monge-Ampère - -13.84% 2332.5 ms
GAT_lap Mesh-Loss -0.32% 3.9 ms
GAT_lap difFEM -0.03% 2.1 ms
G-adaptive Mesh-Loss -8.97% 1.6 ms
G-adaptive difFEM -14.77% 1.6 ms
Table 1: Experimental results for Poisson’s equation
Refer to caption
Figure 3: Examples of data points in the dataset and their corresponding meshes. Top row: G-adaptive mesh. Middle row: Monge-Ampère meshes. Bottom row: PDE solution.

5.3 Burgers’ equation

The favourable performance of our approach extends to time-dynamic problems. To demonstrate this we choose the classical example of the viscous Burgers’ equation which is a well-known example of a PDE that evolves solutions with sharp interfaces (which are resolved with adaptive meshes). In the following 0<ν10𝜈much-less-than10<\nu\ll 10 < italic_ν ≪ 1 is the viscosity parameter:

ut=(u)u+ν2u,xΩ,t>0,u=g,xΩ,t>0,u=u0,xΩ,t=0.formulae-sequence𝑢𝑡𝑢𝑢𝜈superscript2𝑢formulae-sequence𝑥Ωformulae-sequence𝑡0formulae-sequence𝑢𝑔formulae-sequence𝑥Ωformulae-sequence𝑡0formulae-sequence𝑢subscript𝑢0formulae-sequence𝑥Ω𝑡0\frac{\partial u}{\partial t}=-(u\cdot\nabla)u+\nu\nabla^{2}u,\;x\in\Omega,t>0% ,\quad u=g,\;x\in\partial\Omega,t>0,\quad u=u_{0},\;x\in\Omega,t=0.divide start_ARG ∂ italic_u end_ARG start_ARG ∂ italic_t end_ARG = - ( italic_u ⋅ ∇ ) italic_u + italic_ν ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_u , italic_x ∈ roman_Ω , italic_t > 0 , italic_u = italic_g , italic_x ∈ ∂ roman_Ω , italic_t > 0 , italic_u = italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x ∈ roman_Ω , italic_t = 0 . (11)

Time-dynamic loss

Let us briefly outline how we use the approach introduced in section 4.2 for this time-dependent case. We begin by choosing a time-stepping method for the Burgers’ equation - a simple a good choice for the efficient numerical solution of such a stiff nonlinear system is a semi-implicit method [18]:

Um+1,v=v,Umτ((Um)Um,vL2(Ω)+νUm+1,vL2(Ω)),vS𝒳,formulae-sequencesuperscript𝑈𝑚1𝑣𝑣superscript𝑈𝑚𝜏subscriptsuperscript𝑈𝑚superscript𝑈𝑚𝑣superscript𝐿2Ω𝜈subscriptsuperscript𝑈𝑚1𝑣superscript𝐿2Ωfor-all𝑣subscript𝑆𝒳\displaystyle\langle U^{m+1},v\rangle=\langle v,U^{m}\rangle-\tau\left(\langle% (U^{m}\cdot\nabla)U^{m},v\rangle_{L^{2}(\Omega)}+\nu\langle\nabla U^{m+1},% \nabla v\rangle_{L^{2}(\Omega)}\right),\quad\forall v\in S_{\mathcal{X}},⟨ italic_U start_POSTSUPERSCRIPT italic_m + 1 end_POSTSUPERSCRIPT , italic_v ⟩ = ⟨ italic_v , italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ⟩ - italic_τ ( ⟨ ( italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ⋅ ∇ ) italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT , italic_v ⟩ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT + italic_ν ⟨ ∇ italic_U start_POSTSUPERSCRIPT italic_m + 1 end_POSTSUPERSCRIPT , ∇ italic_v ⟩ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT ) , ∀ italic_v ∈ italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ,

where Umsuperscript𝑈𝑚U^{m}italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT corresponds to the approximation of the solution u𝑢uitalic_u at time t=mτ𝑡𝑚𝜏t=m\tauitalic_t = italic_m italic_τ, and τ>0𝜏0\tau>0italic_τ > 0 is the time step of the approximation. Here we omitted the subscript Um=U𝒳msuperscript𝑈𝑚subscriptsuperscript𝑈𝑚𝒳U^{m}=U^{m}_{\mathcal{X}}italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT = italic_U start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT. Bringing the dissipative (implicit) part to the left of the equation the computation of U𝒳m+1subscriptsuperscript𝑈𝑚1𝒳U^{m+1}_{\mathcal{X}}italic_U start_POSTSUPERSCRIPT italic_m + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT takes exactly the form (10), thus allowing us to use difFEM on the dynamic time-stepping FEM error (cf. implicit-scheme-informed learning [23]):

Burgers(𝒳,U𝒳1)=U𝒳1u1L2(Ω),subscriptBurgers𝒳subscriptsuperscript𝑈1𝒳subscriptnormsubscriptsuperscript𝑈1𝒳superscript𝑢1superscript𝐿2Ω\displaystyle\mathcal{E}_{\text{Burgers}}(\mathcal{X},U^{1}_{\mathcal{X}})=% \left\|U^{1}_{\mathcal{X}}-u^{1}\right\|_{L^{2}(\Omega)},caligraphic_E start_POSTSUBSCRIPT Burgers end_POSTSUBSCRIPT ( caligraphic_X , italic_U start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) = ∥ italic_U start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT - italic_u start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT , (12)

where u1superscript𝑢1u^{1}italic_u start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT is a reference value computed by resolving u0subscript𝑢0u_{0}italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT on a fine mesh and propagating with the semi-implicit time-stepper on this finer mesh. Of course there is no longer any forcing as part of the PDE data, and thus this is no longer included as a feature channel in the architecture.

Numerical results

We train the method on 100 samples of random Gaussian initial conditions with centers chosen uniformly at random in [0,1]01[0,1][ 0 , 1 ], and scales s𝑠sitalic_s chosen uniformly at random in [0.05,0.2]0.050.2[0.05,0.2][ 0.05 , 0.2 ]. These samples are on the one dimensional domain [0,1]01[0,1][ 0 , 1 ] with mesh resolution Nx=11subscript𝑁𝑥11N_{x}=11italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT = 11. After training the model on a single time step (loss function BurgerssubscriptBurgers\mathcal{E}_{\text{Burgers}}caligraphic_E start_POSTSUBSCRIPT Burgers end_POSTSUBSCRIPT) with these initial conditions, we evaluate on the trajectories of 40 further initial conditions where we take 20 time steps of size τ=1/20𝜏120\tau=1/20italic_τ = 1 / 20, half of the trajectories on resolution Nx=11subscript𝑁𝑥11N_{x}=11italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT = 11 mesh points, the second half on Nx=21subscript𝑁𝑥21N_{x}=21italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT = 21. The mesh is adapted after every time step and the process iterated. This means we test our model on the generation of 20×20=400202040020\times 20=40020 × 20 = 400 meshes. Our evaluation metric is the error in the final solution value U𝒳20subscriptsuperscript𝑈20𝒳U^{20}_{\mathcal{X}}italic_U start_POSTSUPERSCRIPT 20 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT against the value of the approximation found on a fine reference mesh. The results can be seen in table 2 as a baseline we compare against the classical meshing method MMPDE5 as above. Again our novel approach is able to significantly reduce mesh generation cost as well as enhance L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-FEM error reduction. It turns out that without the forcing function as part of the feature channel the GAT architecture is no longer able to learn optimal location of mesh points, the results are included for completeness but clearly bear no merit in the mesh adaptivity problem.

Model Loss L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-FEM error reduction Mesh adaption time
MMPDE5 - -16.93 % 41007 ms
G-adaptive Mesh-Loss -29.06% 94 ms
G-adaptive difFEM -49.93% 91 ms
GAT difFEM +77.20% 94 ms
Table 2: Experimental results for Burgers’ equation

6 Conclusions

Limitations

While our novel approach shows some significant performance improvement over both classical and prior ML approaches to r𝑟ritalic_r-adaptivity, we recognise that further evaluation and development is necessary, which we will address in future work. In particular:

  • Generalisation to different PDEs Whilst the Poisson and Burgers’ PDEs are, for good reason, amongst the most widely used test problems in the context of adaptive meshing, we believe further evaluation should focus on generalising this approach to other types of equation. In principle our theoretical framework is highly flexible, but requires some adaption to the problem at hand, in particular in the choice of a good discrete loss (cf. (3) & (12)) and the choice of PDE features that can be included in our GNN architecture.

  • Mesh size and performance We acknowledge that testing of the methodology on larger mesh sizes Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT is a necessary step to understand generalisation of this current approach to real-life FEM simulations. Our current mesh sizes were limited by the availability of computational power and while the new approach performed well throughout, the best FEM-error reduction over classical methods was observed for smaller Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT. We note that if a full scale approach is not feasible on a large mesh, our methodology may still provide a useful approach by subdividing the domain into smaller parts and training on those.

Summary

We have presented a novel and effective approach to the classical problem of r𝑟ritalic_r-adaptive meshing for FEM. In particular, we demonstrate, for the first time, that GNNs together with a differentiable FEM solver, can be effectively used to directly optimise the location of mesh points to minimise the FEM error. Hence we can take an entirely different route from prior work (both classical and ML approaches) which determine good choices of mesh points by analysis-inspired heuristics. We demonstrate the advantages of our method on challenging test problems, and find that, on those examples, we are able to outperform both classical and ML methods in terms of both error reduction and computational cost. We note that the direct FEM error optimisation approach extends naturally to more complex domains (e.g. not simply connected or higher dimensional) where classical methods may struggle. We will study this in future work. Finally, our time-dynamic loss also permits specific training of the GNN for remeshing after several time steps instead of a single one, which is an avenue for further cost reduction that we will exploit in future work.

Acknowledgments and Disclosure of Funding

TD, CBS & CB gratefully acknowledge support from the EPSRC programme grant in ‘The Mathematics of Deep Learning’, under the project EP/V026259/1. GM gratefully acknowledges funding from the Mathematical Institute, University of Oxford. KS gratefully acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 850941).

References

  • [1] Geometric Deep Learning: Going beyond Euclidean data | IEEE Journals & Magazine | IEEE Xplore. https://ieeexplore.ieee.org/abstract/document/7974879.
  • Ainsworth and Oden [1997] M. Ainsworth and J. T. Oden. A posteriori error estimation in finite element analysis. Computer Methods in Applied Mechanics and Engineering, 142(1):1–88, Mar. 1997. ISSN 0045-7825. doi: 10.1016/S0045-7825(96)01107-3.
  • Alet et al. [2019] F. Alet, A. K. Jeewajee, M. B. Villalonga, A. Rodriguez, T. Lozano-Perez, and L. Kaelbling. Graph Element Networks: Adaptive, structured computation and memory. In Proceedings of the 36th International Conference on Machine Learning, pages 212–222. PMLR, May 2019.
  • Battaglia et al. [2018] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, H. F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. R. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
  • Brandstetter et al. [2022] J. Brandstetter, D. Worrall, and M. Welling. Message Passing Neural PDE Solvers. arXiv:2202.03376 [cs, math], Feb. 2022.
  • Budd et al. [2013] C. Budd, M. Cullen, and E. Walsh. Monge–Ampére based moving mesh methods for numerical weather prediction, with applications to the Eady problem. Journal of Computational Physics, 236:247–270, Mar. 2013. ISSN 00219991. doi: 10.1016/j.jcp.2012.11.014.
  • Budd et al. [2009] C. J. Budd, W. Huang, and R. D. Russell. Adaptivity with moving grids. Acta Numerica, 18:111–241, May 2009. ISSN 1474-0508, 0962-4929. doi: 10.1017/S0962492906400015.
  • Chamberlain et al. [2021a] B. Chamberlain, J. Rowbottom, D. Eynard, F. Di Giovanni, X. Dong, and M. Bronstein. Beltrami Flow and Neural Diffusion on Graphs. In Advances in Neural Information Processing Systems, volume 34, pages 1594–1609. Curran Associates, Inc., 2021a.
  • Chamberlain et al. [2021b] B. Chamberlain, J. Rowbottom, M. I. Gorinova, M. Bronstein, S. Webb, and E. Rossi. GRAND: Graph Neural Diffusion. In Proceedings of the 38th International Conference on Machine Learning, pages 1407–1418. PMLR, July 2021b.
  • Cotter [2023] C. J. Cotter. Compatible finite element methods for geophysical fluid dynamics. Acta Numerica, 32:291–393, May 2023. ISSN 0962-4929, 1474-0508. doi: 10.1017/S0962492923000028.
  • Defferrard et al. [2016] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
  • E and Yu [2018] W. E and B. Yu. The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics, 6(1):1–12, Mar. 2018. ISSN 2194-671X. doi: 10.1007/s40304-018-0127-z.
  • Fey and Lenssen [2019] M. Fey and J. E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric. arXiv:1903.02428, 2019.
  • Foucart et al. [2023] C. Foucart, A. Charous, and P. F. J. Lermusiaux. Deep reinforcement learning for adaptive mesh refinement. Journal of Computational Physics, 491:112381, Oct. 2023. ISSN 0021-9991. doi: 10.1016/j.jcp.2023.112381.
  • Gilmer et al. [2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272. PMLR, July 2017.
  • Giovanni et al. [2023] F. D. Giovanni, J. Rowbottom, B. P. Chamberlain, T. Markovich, and M. M. Bronstein. Understanding convolution on graphs via energies. Transactions on Machine Learning Research, June 2023. ISSN 2835-8856.
  • Grossmann et al. [2023] T. G. Grossmann, U. J. Komorowska, J. Latz, and C.-B. Schönlieb. Can Physics-Informed Neural Networks beat the Finite Element Method? arXiv:2302.04107, 2023.
  • Hairer and Wanner [1996] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II, volume 14 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1996. ISBN 978-3-642-05220-0 978-3-642-05221-7. doi: 10.1007/978-3-642-05221-7.
  • Ham et al. [2023] D. A. Ham, P. H. J. Kelly, L. Mitchell, C. J. Cotter, R. C. Kirby, K. Sagiyama, N. Bouziani, S. Vorderwuelbecke, T. J. Gregory, J. Betteridge, D. R. Shapero, R. W. Nixon-Hill, C. J. Ward, P. E. Farrell, P. D. Brubeck, I. Marsden, T. H. Gibson, M. Homolya, T. Sun, A. T. T. McRae, F. Luporini, A. Gregory, M. Lange, S. W. Funke, F. Rathgeber, G.-T. Bercea, and G. R. Markall. Firedrake User Manual. Imperial College London and University of Oxford and Baylor University and University of Washington, first edition edition, 5 2023.
  • Han et al. [2018] J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • Hu et al. [2024] P. Hu, Y. Wang, and Z.-M. Ma. Better Neural PDE Ssolvers Through Data-Free Mesh Movers. The Twelfth International Conference on Learning Representations, 2024.
  • Huang and Russell [2011] W. Huang and R. Russell. Adaptive Moving Mesh Methods, volume 174. Jan. 2011. ISBN 978-1-4419-7915-5. doi: 10.1007/978-1-4419-7916-2.
  • Jin et al. [2024] T. Jin, G. Maierhofer, K. Schratz, and Y. Xiang. A fast neural hybrid Newton solver adapted to implicit methods for nonlinear dynamics. arXiv:2407, 2024.
  • Jumper et al. [2021] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, Aug. 2021. ISSN 1476-4687. doi: 10.1038/s41586-021-03819-2.
  • Kipf and Welling [2022] T. N. Kipf and M. Welling. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations, July 2022.
  • Lam et al. [2023] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, A. Merose, S. Hoyer, G. Holland, O. Vinyals, J. Stott, A. Pritzel, S. Mohamed, and P. Battaglia. Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, Dec. 2023. doi: 10.1126/science.adi2336.
  • Li et al. [2020a] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, and A. Anandkumar. Multipole Graph Neural Operator for Parametric Partial Differential Equations. In Advances in Neural Information Processing Systems, volume 33, pages 6755–6766. Curran Associates, Inc., 2020a.
  • Li et al. [2020b] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations, Sept. 2020b.
  • Li et al. [2023] Z. Li, N. B. Kovachki, C. Choy, B. Li, J. Kossaifi, S. P. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli, and A. Anandkumar. Geometry-Informed Neural Operator for Large-Scale 3D PDEs. In Thirty-Seventh Conference on Neural Information Processing Systems, Nov. 2023.
  • Lienen and Günnemann [2022] M. Lienen and S. Günnemann. Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks. In International Conference on Learning Representations, Mar. 2022.
  • Lu et al. [2021] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, Mar. 2021. ISSN 2522-5839. doi: 10.1038/s42256-021-00302-5.
  • McRae et al. [2018] A. T. T. McRae, C. J. Cotter, and C. J. Budd. Optimal-Transport–Based Mesh Adaptivity on the Plane and Sphere Using Finite Elements. SIAM Journal on Scientific Computing, 40(2):A1121–A1148, 2018.
  • Paszke et al. [2017] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. NIPS-W, 2017.
  • Pfaff et al. [2023] T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. Battaglia. Learning Mesh-Based Simulation with Graph Networks. In International Conference on Learning Representations, Feb. 2023.
  • Raissi [2018] M. Raissi. Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations, Apr. 2018.
  • Raissi et al. [2019] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, Feb. 2019. ISSN 0021-9991. doi: 10.1016/j.jcp.2018.10.045.
  • Satorras et al. [2021] V. G. Satorras, E. Hoogeboom, and M. Welling. E(n) Equivariant Graph Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, pages 9323–9332. PMLR, July 2021.
  • Song et al. [2022] W. Song, M. Zhang, J. G. Wallwork, J. Gao, Z. Tian, F. Sun, M. D. Piggott, J. Chen, Z. Shi, X. Chen, and J. Wang. M2N: Mesh Movement Networks for PDE Solvers. In Advances in Neural Information Processing Systems, May 2022.
  • Strang [2012] G. Strang. Computational Science and Engineering. Wellesley-Cambridge Press, Wellesley, Mass, second printing edition, 2012. ISBN 978-0-9614088-1-7.
  • Vaswani et al. [2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. ukasz Kaiser, and I. Polosukhin. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
  • Veličković et al. [2018] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph Attention Networks. In International Conference on Learning Representations, Feb. 2018.
  • Wallwork [2022] J. G. Wallwork. pyroteus/movement: Mesh movement in Firedrake, May 2022. Zenodo.
  • Yang et al. [2023] Y. Yang, Q. Yang, Y. Deng, and Q. He. MMPDE-Net and Moving Sampling Physics-informed Neural Networks Based On Moving Mesh Method, Nov. 2023.

Appendix A Further background on classical methods and motivation for optimal monitor function

Classical methods for mesh adaption (whether h-adaptive or r-adaptive) require some measure of the error of the solution. In h-adaptive methods this is usually done after the calculation through an a-posteriori error estimate. For many problems (particularly those which are strongly nonlinear) this can involved solving a delicate dual problem [2]. When implementing such a method, cells with a large a-posteriori error are subdivided. This leads to a high density of mesh points where the error is large, and effectively equidistributes the a-posteriori error over the computational cells. An alternative, often used in r-adaptive methods, is to have an a-priori estimate of the error, and then to move the mesh points to equidistribute this. A commonly used a-priori error estimator is the (local) interpolation error of the solution, which is (usually) proportional to a higher derivative of the solution such as the curvature (or in fluid mechanics the vorticity). A monitor function m(𝐱)𝑚𝐱m({\mathbf{x}})italic_m ( bold_x ) is then set to equal the curvature, and the mesh points determined so that the integral of m𝑚mitalic_m over a mesh cell (the interpolation error) is constant over the mesh cells. Whilst the interpolation error is not the true FEM error, it follows from Cea’s lemma that it is an upper bound, and hence can be used to control the FEM error. For a one dimensional problem the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error e𝑒eitalic_e of interpolating a function u(x)𝑢𝑥u(x)italic_u ( italic_x ) over a cell of width hhitalic_h using a piece-wise linear function is given a-priori by

e=h5u′′(ξ)2𝑒superscript5superscript𝑢′′superscript𝜉2e=h^{5}u^{\prime\prime}(\xi)^{2}italic_e = italic_h start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT italic_u start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

where ξ𝜉\xiitalic_ξ is some point in the cell. The error over the mesh cell is therefore equidistributed if the cell length hhitalic_h is such that

h(u′′)2/5=θsuperscriptsuperscript𝑢′′25𝜃h(u^{\prime\prime})^{2/5}=\thetaitalic_h ( italic_u start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 / 5 end_POSTSUPERSCRIPT = italic_θ

where θ𝜃\thetaitalic_θ is a constant. The ’optimal’ monitor function in this case is then given by

m(x)=(u′′)2/5.𝑚𝑥superscriptsuperscript𝑢′′25m(x)=(u^{\prime\prime})^{2/5}.italic_m ( italic_x ) = ( italic_u start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 / 5 end_POSTSUPERSCRIPT .

In practice this is regularised to ensure a more even distribution of mesh points and we take

m(x)=(ϵ2+(u′′)2)1/5.𝑚𝑥superscriptsuperscriptitalic-ϵ2superscriptsuperscript𝑢′′215m(x)=\left(\epsilon^{2}+(u^{\prime\prime})^{2}\right)^{1/5}.italic_m ( italic_x ) = ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( italic_u start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / 5 end_POSTSUPERSCRIPT .

It is shown in [22] that choosing this monitor function and solving the equidistribution equation

m(x)h=θ𝑚𝑥𝜃m(x)h=\thetaitalic_m ( italic_x ) italic_h = italic_θ

leads to an optimal interpolation error for a fixed number Nxsubscript𝑁𝑥N_{x}italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT of mesh points. In one-dimension the equidistribution equation has a unique solution and can be solved using a (parabolic) moving mesh PDE for x𝑥xitalic_x, such as MMPDE5 (as used in this paper) [22]. In higher dimensions this equation does not have a unique solution and must be augmented with additional constraints. A powerful such constraint is to use optimal transport and to minimise the Wasserstein distance between the adapted mesh and a uniform mesh. This enforces excellent mesh regularity. This method is implemented in the Monge-Ampére algorithm considered in this paper. This method reduces to MMPDE5 in one-dimension and can be shown to give optimally smooth and meshes giving accurate FEM approximations [32]. However it is slow and hard to use. Hence the need for the faster ML based methods considered in this paper and by others.

Appendix B Details for numerical experiments

Software

All GNN architectures were implemented using PyTorch [33] and PyTorch geometric [13]. We use the repository by Wallwork [42] to generate the Monge–Ampére meshes used in our mesh-loss experiments. FEM solutions we generated using a combination of the Firedrake package [19], and difFEM code developed in PyTorch for this work.

Training and testing

Optimisation of our G-adaptivity method was carried out using the Adam optimiser within PyTorch with a constant learning rate of 1e31𝑒31e-31 italic_e - 3. We trained all experiments exclusively on an M2 MacBook cpu. Convergence was typically observed within just a few epochs, further highlighting the strength of the inductive bias in our model.

Datasets

The details of how we generated the datasets are as follows.

For ‘structured Gaussian’ in 1D (struct1g_27) as illustrated in Figure 4, we sample 1 Gaussian iteratively placed at 9 equi-spaced grid points in [0.1,0.9]0.10.9[0.1,0.9][ 0.1 , 0.9 ]. This process is repeated for Gaussians of scale in [0.1,0.2,0.3]0.10.20.3[0.1,0.2,0.3][ 0.1 , 0.2 , 0.3 ] resulting in 27 data points.

For ‘structured Gaussian’ in 2D (struct2g_75) as illustrated in Figures 5 and 6, we sample 2 Gaussians, one with fixed position and one iteratively placed in a 5×5555\times 55 × 5 grid equi-spaced in [0.1,0.9]2superscript0.10.92[0.1,0.9]^{2}[ 0.1 , 0.9 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. This process is repeated for Gaussians of scale in [0.1,0.2,0.3]0.10.20.3[0.1,0.2,0.3][ 0.1 , 0.2 , 0.3 ] resulting in 25 data points.

For ‘random Gaussian’ (randg_25 and randg_275) in 2D, as illustrated in Figure 3 we randomly select n[1,2,3,4,5,6]𝑛123456n\in[1,2,3,4,5,6]italic_n ∈ [ 1 , 2 , 3 , 4 , 5 , 6 ] Gaussians, a mesh resolution in [11d,15d]superscript11𝑑superscript15𝑑[11^{d},15^{d}][ 11 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , 15 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ], n𝑛nitalic_n random locations in [0,1]2superscript012[0,1]^{2}[ 0 , 1 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and n𝑛nitalic_n random scales in [0.1,0.4]0.10.4[0.1,\sqrt{0.4}][ 0.1 , square-root start_ARG 0.4 end_ARG ] for each Gaussian. We repeat the process for 25 or 275 examples respectively.

For testing we apply our models to the test datasets and report the corresponding error reductions in Table 1 of the main paper. For the ‘1d struct 21’ and ‘2d struct 75’ experiments the test dataset is identical to the training data, and for ‘2d randg 25’, ‘2d randg 275’ and Burger’s equation the testing data were sampled independently of the training data. For Poisson’s equation we evaluate against the analytical solution and for Burgers’ equation we perform an FEM solve on the fine grid used in evaluation.

Architectures

In addition to the G-adaptive architecture outlined in the main body of the paper, we benchmark against two graph attention (GAT) based architectures. For both Poisson’s and Burger’s equations we applied the vanilla GAT architecture, which has the form

𝐗k+1=𝐀GAT(𝐗k)𝐗ksuperscript𝐗𝑘1subscript𝐀𝐺𝐴𝑇superscript𝐗𝑘superscript𝐗𝑘\mathbf{X}^{k+1}=\mathbf{A}_{GAT}(\mathbf{X}^{k})\mathbf{X}^{k}bold_X start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = bold_A start_POSTSUBSCRIPT italic_G italic_A italic_T end_POSTSUBSCRIPT ( bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT

where 𝐀GATsubscript𝐀𝐺𝐴𝑇\mathbf{A}_{GAT}bold_A start_POSTSUBSCRIPT italic_G italic_A italic_T end_POSTSUBSCRIPT is the attention matrix calculated using [41] style attention. Note that we omit these results for Poisson’s equation due to poor performance of vanilla GAT, and instead we modified the GAT architecture to more closely match our G-adaptive architecture. This allows for a more direct comparison of the effect of the attention mechanism on the quality of the generated meshes. This results in the GAT_lap, which has the following form

𝐗k+1=(𝐀GAT(𝐗k)𝐈)𝐗k.superscript𝐗𝑘1subscript𝐀𝐺𝐴𝑇superscript𝐗𝑘𝐈superscript𝐗𝑘\mathbf{X}^{k+1}=(\mathbf{A}_{GAT}(\mathbf{X}^{k})-\mathbf{I})\mathbf{X}^{k}.bold_X start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = ( bold_A start_POSTSUBSCRIPT italic_G italic_A italic_T end_POSTSUBSCRIPT ( bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) - bold_I ) bold_X start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT .

Note it was necessary to keep the residual and Laplacian form of GRAND as any further alterations made the model fail. In all cases of GRAND, GAT, and GAT_lap, we apply a network with 4 layers and optimise parameters consisting of the learnable weights (𝐖Q,𝐖Ksubscript𝐖𝑄subscript𝐖𝐾\mathbf{W}_{Q},\mathbf{W}_{K}bold_W start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT , bold_W start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT in GRAND, and 𝐖𝐖\mathbf{W}bold_W in GAT/GAT_lap).

Refer to caption
Figure 4: Example of struct1g_27 Gaussian data points in the dataset and the corresponding MMPDE5 mesh.
Refer to caption
Figure 5: Examples of struct2g_75 Gaussian data points with scale 0.1.
Refer to caption
Figure 6: Examples of struct2g_75 Monge–Ampère meshes with Gaussian scale 0.1 and MA regularisation 0.01.

Appendix C Convexity properties and non-tangling of our architecture

As mentioned in section 4.2 of the main body of our manuscript, the GRAND architecture inherently enforces non-tangling of mesh points by moving points into the convex hulls of neighbours at every instance. In particular, the graph diffusion equation is

d𝒳dt=(𝒜𝖨)𝒳𝑑𝒳𝑑𝑡𝒜𝖨𝒳\displaystyle\frac{d\mathcal{X}}{dt}=(\mathcal{A}-\mathsf{I})\mathcal{X}divide start_ARG italic_d caligraphic_X end_ARG start_ARG italic_d italic_t end_ARG = ( caligraphic_A - sansserif_I ) caligraphic_X (13)

where 𝒜𝒜\mathcal{A}caligraphic_A is the attention matrix (which implicitly of course depends on the node features of the graph). We recall that the attention matrix is row-stochastic, meaning every row of 𝒜𝒜\mathcal{A}caligraphic_A has non-negative entries that sum to 1. Moreover, our model has no self attention, meaning the diagonal entries of 𝒜𝒜\mathcal{A}caligraphic_A are zero. This immediately implies that (𝒜𝒳)isubscript𝒜𝒳𝑖(\mathcal{A}\mathcal{X})_{i}( caligraphic_A caligraphic_X ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT can be written as

(𝒜𝒳)i=j:(i,j)aij𝐱j,where aij0,jaij=1.formulae-sequencesubscript𝒜𝒳𝑖subscript:𝑗𝑖𝑗superscriptsubscript𝑎𝑖𝑗subscript𝐱𝑗formulae-sequencewhere subscript𝑎𝑖𝑗0subscript𝑗subscript𝑎𝑖𝑗1\displaystyle(\mathcal{A}\mathcal{X})_{i}=\sum_{j:(i,j)\in\mathcal{E}^{\prime}% }a_{ij}\mathbf{x}_{j},\quad\text{where }a_{ij}\geq 0,\sum_{j}a_{ij}=1.( caligraphic_A caligraphic_X ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j : ( italic_i , italic_j ) ∈ caligraphic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , where italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ≥ 0 , ∑ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 1 .

This is precisely the statement that (𝒜𝒳)isubscript𝒜𝒳𝑖(\mathcal{A}\mathcal{X})_{i}( caligraphic_A caligraphic_X ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is a point in the convex hull of the graph neighbours of 𝐱isubscript𝐱𝑖\mathbf{x}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.

Thus reading (13) for each node i𝑖iitalic_i individually we see that the instantaneous change in 𝐱isubscript𝐱𝑖\mathbf{x}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is forcing it to move in the direction of (𝒜𝒳)isubscript𝒜𝒳𝑖(\mathcal{A}\mathcal{X})_{i}( caligraphic_A caligraphic_X ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i.e. to a point in the convex hull of the graph neighbours, i.e. the neighbouring mesh points. That this ultimately results in a non-tangled mesh is best seen on the following illustrative 1 dimensional result:

Proposition 1.

Suppose we have an ordered mesh on the 1d domain [0,1]01[0,1][ 0 , 1 ], i.e. mesh points 0=x1<x2<<xn=10subscript𝑥1subscript𝑥2subscript𝑥𝑛10=x_{1}<x_{2}<\cdots<x_{n}=10 = italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < ⋯ < italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 1. Under the continuous graph diffusion flow (13) the mesh will never tangle, that is no two mesh points will cross.

Proof by contradiction.

Suppose the mesh crosses at a future time, i.e. there is xi,xi+1subscript𝑥𝑖subscript𝑥𝑖1x_{i},x_{i+1}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT, t>0𝑡0t>0italic_t > 0, such that xi(t)>xi+1(t)subscript𝑥𝑖𝑡subscript𝑥𝑖1𝑡x_{i}(t)>x_{i+1}(t)italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) > italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ( italic_t ) and, by continuity, a time Tcsubscript𝑇𝑐T_{c}italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT (the crossing time) when xi(Tc)=xi+1(Tc),(x˙i+1x˙i)<0formulae-sequencesubscript𝑥𝑖subscript𝑇𝑐subscript𝑥𝑖1subscript𝑇𝑐subscript˙𝑥𝑖1subscript˙𝑥𝑖0x_{i}(T_{c})=x_{i+1}(T_{c}),(\dot{x}_{i+1}-\dot{x}_{i})<0italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) = italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) , ( over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) < 0 and 0=x0<<xi1(Tc)<xi(Tc)=xi+1(Tc)xi+2(Tc)<<xNx=10subscript𝑥0subscript𝑥𝑖1subscript𝑇𝑐subscript𝑥𝑖subscript𝑇𝑐subscript𝑥𝑖1subscript𝑇𝑐subscript𝑥𝑖2subscript𝑇𝑐subscript𝑥subscript𝑁𝑥10=x_{0}<\cdots<x_{i-1}(T_{c})<x_{i}(T_{c})=x_{i+1}(T_{c})\leq x_{i+2}(T_{c})<% \cdots<x_{N_{x}}=10 = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT < ⋯ < italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) < italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) = italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ≤ italic_x start_POSTSUBSCRIPT italic_i + 2 end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) < ⋯ < italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUBSCRIPT = 1. Here we assumed that i0,i+1Nxformulae-sequence𝑖0𝑖1subscript𝑁𝑥i\neq 0,i+1\neq N_{x}italic_i ≠ 0 , italic_i + 1 ≠ italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT, as the boundary cases can be treated analogously. Now we note that by the graph diffusion equation we also have, denoting d=xi+1xi𝑑subscript𝑥𝑖1subscript𝑥𝑖d=x_{i+1}-x_{i}italic_d = italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT,

d˙˙𝑑\displaystyle\dot{d}over˙ start_ARG italic_d end_ARG =x˙i+1x˙i=(axi+(1a)xi+2xi+1)(bxi1+(1b)xi+1xi)absentsubscript˙𝑥𝑖1subscript˙𝑥𝑖𝑎subscript𝑥𝑖1𝑎subscript𝑥𝑖2subscript𝑥𝑖1𝑏subscript𝑥𝑖11𝑏subscript𝑥𝑖1subscript𝑥𝑖\displaystyle=\dot{x}_{i+1}-\dot{x}_{i}=\left(ax_{i}+(1-a)x_{i+2}-x_{i+1}% \right)-\left(bx_{i-1}+(1-b)x_{i+1}-x_{i}\right)= over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_a italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ( 1 - italic_a ) italic_x start_POSTSUBSCRIPT italic_i + 2 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ) - ( italic_b italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT + ( 1 - italic_b ) italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )

for some numbers a,b[0,1]𝑎𝑏01a,b\in[0,1]italic_a , italic_b ∈ [ 0 , 1 ]. Thus we have

d˙˙𝑑\displaystyle\dot{d}over˙ start_ARG italic_d end_ARG =(2b)d(2b)xi+axibxi1+(1a)xi+2+xiabsent2𝑏𝑑2𝑏subscript𝑥𝑖𝑎subscript𝑥𝑖𝑏subscript𝑥𝑖11𝑎subscript𝑥𝑖2subscript𝑥𝑖\displaystyle=-(2-b)d-(2-b)x_{i}+ax_{i}-bx_{i-1}+(1-a)x_{i+2}+x_{i}= - ( 2 - italic_b ) italic_d - ( 2 - italic_b ) italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_a italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT + ( 1 - italic_a ) italic_x start_POSTSUBSCRIPT italic_i + 2 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT
=(2b)d(1b+a)xibxi1+(1a)xi+2absent2𝑏𝑑1𝑏𝑎subscript𝑥𝑖𝑏subscript𝑥𝑖11𝑎subscript𝑥𝑖2\displaystyle=-(2-b)d-(1-b+a)x_{i}-bx_{i-1}+(1-a)x_{i+2}= - ( 2 - italic_b ) italic_d - ( 1 - italic_b + italic_a ) italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT + ( 1 - italic_a ) italic_x start_POSTSUBSCRIPT italic_i + 2 end_POSTSUBSCRIPT
=(2b)d=0+b(xixi1)>0+(1a)(xi+2xi)0>0.absentsubscript2𝑏𝑑absent0subscript𝑏subscript𝑥𝑖subscript𝑥𝑖1absent0subscript1𝑎subscript𝑥𝑖2subscript𝑥𝑖absent00\displaystyle=\underbrace{-(2-b)d}_{=0}+\underbrace{b\left(x_{i}-x_{i-1}\right% )}_{>0}+\underbrace{(1-a)\left(x_{i+2}-x_{i}\right)}_{\geq 0}>0.= under⏟ start_ARG - ( 2 - italic_b ) italic_d end_ARG start_POSTSUBSCRIPT = 0 end_POSTSUBSCRIPT + under⏟ start_ARG italic_b ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ) end_ARG start_POSTSUBSCRIPT > 0 end_POSTSUBSCRIPT + under⏟ start_ARG ( 1 - italic_a ) ( italic_x start_POSTSUBSCRIPT italic_i + 2 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_ARG start_POSTSUBSCRIPT ≥ 0 end_POSTSUBSCRIPT > 0 .

This leads to a contradiction thus completing the proof. ∎

Appendix D Notes on implementation of difFEM

The software package difFEM is implemented using PyTorch, and provides a finite element code which is fully differentiable (since written in PyTorch) - in particular can compute derivatives of any observational with respect to the mesh points using the discrete adjoint approach described in the main body of this submission. In the current form we chose piecewise linear finite elements for the FEM discretisations, but the concept extends naturally to higher degree piece-wise polynomial basis functions and this will be incorporated in future versions of difFEM. To demonstrate the conceptual idea of differentiation of finite element codes with respect to mesh points let us briefly outline the following one dimensional example: Suppose we are give a mesh 𝒳={0=x1<<xNx=1}𝒳0subscript𝑥1subscript𝑥subscript𝑁𝑥1\mathcal{X}=\{0=x_{1}<\dots<x_{N_{x}}=1\}caligraphic_X = { 0 = italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < ⋯ < italic_x start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUBSCRIPT = 1 }, then the associated FEM approximation space is S𝒳={ϕx}i=1Nxsubscript𝑆𝒳superscriptsubscriptsubscriptitalic-ϕ𝑥𝑖1subscript𝑁𝑥S_{\mathcal{X}}=\{\phi_{x}\}_{i=1}^{N_{x}}italic_S start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT = { italic_ϕ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT where piecewise linear basis functions are defined as

ϕi(x)={xxi1xixi1if xi1xxi,xi+1xxi+1xiif xixxi+1,0otherwise.subscriptitalic-ϕ𝑖𝑥cases𝑥subscript𝑥𝑖1subscript𝑥𝑖subscript𝑥𝑖1if subscript𝑥𝑖1𝑥subscript𝑥𝑖subscript𝑥𝑖1𝑥subscript𝑥𝑖1subscript𝑥𝑖if subscript𝑥𝑖𝑥subscript𝑥𝑖10otherwise.\displaystyle\phi_{i}(x)=\begin{cases}\frac{x-x_{i-1}}{x_{i}-x_{i-1}}&\text{if% }x_{i-1}\leq x\leq x_{i},\\ \frac{x_{i+1}-x}{x_{i+1}-x_{i}}&\text{if }x_{i}\leq x\leq x_{i+1},\\ 0&\text{otherwise.}\end{cases}italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x ) = { start_ROW start_CELL divide start_ARG italic_x - italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT end_ARG end_CELL start_CELL if italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ≤ italic_x ≤ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL divide start_ARG italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x end_ARG start_ARG italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG end_CELL start_CELL if italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≤ italic_x ≤ italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise. end_CELL end_ROW

These provide the basis for any approximation in the FEM solver, for example the load vector for a given forcing f:[0,1]:𝑓01f:[0,1]\rightarrow\mathbb{R}italic_f : [ 0 , 1 ] → blackboard_R is computed as follows

𝐛𝒳=(f,ϕxL2([0,1]))i=1Nxsubscript𝐛𝒳superscriptsubscriptsubscript𝑓subscriptitalic-ϕ𝑥superscript𝐿201𝑖1subscript𝑁𝑥\displaystyle\mathbf{b}_{\mathcal{X}}=(\langle f,\phi_{x}\rangle_{L^{2}([0,1])% })_{i=1}^{N_{x}}bold_b start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT = ( ⟨ italic_f , italic_ϕ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( [ 0 , 1 ] ) end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT

and the stiffness matrix 𝖬𝒳subscript𝖬𝒳\mathsf{M}_{\mathcal{X}}sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT appearing in the discretisation of Poisson’s equation has entries

(𝖬𝒳)ij=xϕj,xϕi.subscriptsubscript𝖬𝒳𝑖𝑗subscript𝑥subscriptitalic-ϕ𝑗subscript𝑥subscriptitalic-ϕ𝑖\displaystyle\left(\mathsf{M}_{\mathcal{X}}\right)_{ij}=\langle\partial_{x}% \phi_{j},\partial_{x}\phi_{i}\rangle.( sansserif_M start_POSTSUBSCRIPT caligraphic_X end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = ⟨ ∂ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , ∂ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ .

Thus if we want to compute the derivative of any of these quantities with respect to xj,j=1,,Nxformulae-sequencesubscript𝑥𝑗𝑗1subscript𝑁𝑥x_{j},j=1,\dots,N_{x}italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j = 1 , … , italic_N start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT it suffices to be able to compute dϕi/dxj𝑑subscriptitalic-ϕ𝑖𝑑subscript𝑥𝑗d\phi_{i}/dx_{j}italic_d italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT / italic_d italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. This can be done analytically and is incorporated in the set up of the FEM code when difFEM is assembled. For example we have

dϕidxi={xxi1(xixi1)2if xi1xxi,xi+1x(xi+1xi)2if xixxi+1,0otherwise,𝑑subscriptitalic-ϕ𝑖𝑑subscript𝑥𝑖cases𝑥subscript𝑥𝑖1superscriptsubscript𝑥𝑖subscript𝑥𝑖12if subscript𝑥𝑖1𝑥subscript𝑥𝑖subscript𝑥𝑖1𝑥superscriptsubscript𝑥𝑖1subscript𝑥𝑖2if subscript𝑥𝑖𝑥subscript𝑥𝑖10otherwise,\frac{d\phi_{i}}{dx_{i}}=\begin{cases}-\frac{x-x_{i-1}}{(x_{i}-x_{i-1})^{2}}&% \text{if }x_{i-1}\leq x\leq x_{i},\\ \frac{x_{i+1}-x}{(x_{i+1}-x_{i})^{2}}&\text{if }x_{i}\leq x\leq x_{i+1},\\ 0&\text{otherwise,}\end{cases}divide start_ARG italic_d italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = { start_ROW start_CELL - divide start_ARG italic_x - italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT end_ARG start_ARG ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_CELL start_CELL if italic_x start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ≤ italic_x ≤ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL divide start_ARG italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x end_ARG start_ARG ( italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_CELL start_CELL if italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≤ italic_x ≤ italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise, end_CELL end_ROW

and derivatives dϕi/dxj𝑑subscriptitalic-ϕ𝑖𝑑subscript𝑥𝑗d\phi_{i}/dx_{j}italic_d italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT / italic_d italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for ij𝑖𝑗i\neq jitalic_i ≠ italic_j can be computed analogously.