G-Adaptive mesh refinement - leveraging graph neural networks and differentiable finite element solvers

James Rowbottom
Department of Applied Mathematics
and Theoretical Physics
University of Cambridge
Cambridge, United Kingdom
Georg Maierhofer
Mathematical Institute
University of Oxford
Oxford, United Kingdom
Teo Deveney
Department of Computer Science
University of Bath
Bath, United Kingdom
Katharina Schratz
Laboratoire Jacques-Louis Lions
Sorbonne Université
Paris, France
Pietro Liò
Department of Computer Science
and Technology
University of Cambridge
Cambridge, United Kingdom
Carola-Bibiane Schönlieb
Department of Applied Mathematics
and Theoretical Physics
University of Cambridge
Cambridge, United Kingdom
Chris Budd
Department of Mathematical Sciences
University of Bath
Bath, United Kingdom
Corresponding author and joint first author; [email protected]Joint first author;[email protected]

Abstract

We present a novel, and effective, approach to the long-standing problem of mesh adaptivity in finite element methods (FEM). FE solvers are powerful tools for solving partial differential equations (PDEs), but their cost and accuracy are critically dependent on the choice of mesh points. To keep computational costs low, mesh relocation (r-adaptivity) seeks to optimise the position of a fixed number of mesh points to obtain the best FE solution accuracy. Classical approaches to this problem require the solution of a separate nonlinear “meshing” PDE to find the mesh point locations. This incurs significant cost at remeshing and relies on certain a-priori assumptions and guiding heuristics for optimal mesh point location. Recent machine learning approaches to r-adaptivity have mainly focused on the construction of fast surrogates for such classical methods. Our new approach combines a graph neural network (GNN) powered architecture, with training based on direct minimisation of the FE solution error with respect to the mesh point locations. The GNN employs graph neural diffusion (GRAND), closely aligning the mesh solution space to that of classical meshing methodologies, thus replacing heuristics with a learnable strategy, and providing a strong inductive bias. This allows for rapid and robust training and results in an extremely efficient and effective GNN approach to online r-adaptivity. This method outperforms classical and prior ML approaches to r-adaptive meshing on the test problems we consider, in particular achieving lower FE solution error, whilst retaining the significant speed-up over classical methods observed in prior ML work.

1 Introduction

Finite element methods (FEM) are state-of-the-art when it comes to the large scale solution of partial differential equations (PDEs) [2, 10]. Central advantages are robustness, good error estimates, thoroughly developed code bases (such as PetscFE [10], Firedrake [19]), which are highly parallelisable and efficient. However, even with such optimised software the simulation of large scale problems (weather forecasting, structural simulations in engineering systems) is computationally costly. An important ingredient that determines the cost is the number of degrees of freedom (DOFs) included in a FEM, proportional to the number $N_{x}$ of mesh points on which it operates. It is therefore desirable to keep $N_{x}$ moderate. In order to optimise the FEM error a typical approach is mesh-adaptivity, i.e. to refine the mesh in such a way that important features of the solutions are captured at the right scale and therefore reducing the approximation error. In the present work we apply a stable and highly efficient GNN based architecture to implement a learnable $r$ -adaptive method, which keeps $N_{x}$ fixed, but adapts the mesh point location such that the overall error of the FEM is reduced. This problem is computationally extremely challenging and as a result most of classical $r$ -adaptive methods have focused on finding and minimizing mathematical substitutes for the FEM error (in particular upper bounds of simpler form). An example of this would be to choose mesh points to minimize the solution interpolation error which is an upper bound (up to some parameters and constants) of the FEM error arising from Céa’s lemma [22, p. 56]. As we describe below, even recent ML approaches have essentially relied on similar mathematical simplifications. In contrast, in the present work we take an entirely different approach. We present G-adaptivity, an approach to mesh adaptivity that uses a GNN to generate meshes that directly minimise the error of the corresponding FEM solution. We couple fast backpropagation through the GNN, with mesh point gradients attained through the discrete adjoint equation associated with the FEM solver, to tractably minimise the FEM approximation error directly (as opposed to the upper bound considered in [22]). The result is a model capable of outperforming the current state-of-the-art equidistribution-based r-adaptivity methods, whilst providing the acceleration of DL based approaches (cf. Figure 1).

Refer to caption — Figure 1: Outline of G-adaptivity architecture: (a) GNN feature space construction. (b) GNN adaptive meshing blocks (c) direct optimisation w.r.t the $L^{2}$ FEM error using differentiable FEM

Contributions

Indeed, our central novelty is to provide a new $r$ -adaptive GNN which fundamentally differs from the paradigm of emulation of traditional meshing methods in prior work. More specifically, our contributions are the following:

•

A novel GNN architecture for mesh movement applications, incorporating diffusion-based dynamics on graphs similar to those seen in classical approaches. This results in a neural network which avoids mesh-tangling and provides a strong inductive bias for fast training and high-quality adapted meshes.
•

We introduce a differentiable FEM solver (difFEM) based on the discrete adjoint method, and outline how to combine the gradients obtained from this with those obtained through backpropagation through the GNN to allow direct training of the GNN with FEM approximation error as the objective.
•

An analysis of the regularity properties of meshes generated by our GNN architecture. Specifically we show that mesh points are moved around within the convex hull of their neighbours, thus preventing overlapping triangulations (mesh tangling).
•

A thorough experimental report comparing our approach to classical and recent approaches in terms of accuracy and computational time. Our experiments cover both 1 and 2 dimensional examples, and include both stationary and time-dependent equations.

We highlight that our approach does not require a dataset of adapted meshes a-priori, as it is based on minimising the FEM error directly. This results in our method outperforming even state-of-the-art classical methods in terms of FEM accuracy as demonstrated in the numerical examples in Section 5. Interestingly this direct minimisation results in qualitative differences of the mesh appearance, suggesting that such a direct approach is indeed necessary to achieve optimal performance.

2 Related work

ML and PDEs

As indicated above, the effective approximation of solutions to partial differential equations is one of the central problems in computational mathematics. Over the recent decade increasingly work has been devoted to using ML for the numerical approximation of PDEs. This includes notably physics informed neural networks [36, 35], related work on Fourier neural operators [28, 29] graph neural operators [27], DeepONets [31], Message Passing Neural PDE Solvers [5] and the deep Ritz method [12]. The majority of such approaches try to directly approximate the PDE solution operator with a machine learning surrogate. Such methods offer certain advantages (for example in high dimensional settings [20]), but are typically outperformed by traditional numerical methods in accuracy in most settings [17]. Our approach differs fundamentally from these, by instead providing a central ingredient, the mesh, without replacing the classical PDE solver. This has the crucial advantage that we retain convergence guarantees and robustness of FEMs which are often lacking in direct ML approaches for the solution of PDEs.

Classical adaptive mesh refinement

Adaptive mesh methods are a widely used tool for improving the performance of a classical FEM by varying the local density of the mesh points. This is necessary if the underlying solution has small length scales or singularities, often induced by the geometry of the domain. Adaptivity allows for a high accuracy without the cost of using a uniformly fine mesh. The most widely used form of adaptivity is h-adaptivity [2], in which mesh cells are subdivided when some a-posteriori estimate of the solution approximation is large. Such methods have complex data structures and, possibly, poor mesh regularity. An alternative are the relocation based r-adaptive methods considered in this paper, in which a fixed number of mesh points are move to achieve a high density of points where some monitor $m$ of the solution error is large. Done correctly this can lead to significant error reduction at little extra cost [22]. r-adaptive methods retain a constant data structure, and the movement of the mesh points can reproduce Lagrangian structures in the solution.

Graph Neural Networks (GNNs)

GNNS are the dominant approach to applying machine learning to irregularly structured data [1, 4]. There has been a proliferation of architectures inspired by spectral graph theory [11], convolutional (GCN) [25], message passing (MPNN) [15] and attentional (GAT) [41] approaches. More recently a range of differential equation inspired architectures [9, 8, 16] look to take tools from the analysis and methods of solving different equations to try and solve known problems with GNNs including stability, oversmoothing and bottleneck phenomenon. These architectures often contain a strong inductive bias from the design inspiration for example equivariance [37], anisotropic diffusion and stability [9]. These inductive biases along with the powerful message passing paradigm are helping solve some of the most pressing problems in science; protein folding [24], weather prediction [26], dynamics learning[34] and solving PDEs by inspiring new numerical methods [5, 30, 3].

ML and adaptive meshing

The significant computational cost of classical methods for adaptive meshing (in particular r-adaptive methods) supports the need for fast ML-based alternatives. While this topic is still in the early stages of research there have been several notable contributions in recent years. This includes work on h-adaptive mesh refinement [14, 34], and a larger amount of contributions to r-adaptivity constructing ML based solvers for the classical mesh movement PDEs [43, 21] and supervised (data-based) learning for mesh adaptivity using Graph Neural Networks (GNNs) [38]. For the r-adaptive case in particular prior work has focused on the construction of fast surrogates to classical approaches and as outlined below this marks one of the central differences to our novel approach.

3 Preliminaries and background

3.1 Problem specification

In the following we consider the solution of an abstract PDE on domain $\Omega$ of dimension $d$ in the form

\mathcal{F}(u_{t},u,\nabla u,\nabla^{2}u,\dots)=f,\quad\tilde{x}\in\widetilde{% \Omega},\quad u=g,\quad\tilde{x}\in\partial\widetilde{\Omega}.

(1)

Note we will treat both time-dependent and time-independent problems in the example section, and it is thus understood that $\widetilde{\Omega}=[0,T]\times{\Omega}$ and $\tilde{x}=(t,{x})$ in the time-dependent case (and that $\widetilde{\Omega}={\Omega},\tilde{x}=x$ in the time-independent case). We solve (1) with a finite element method in the weak formulation, which means we commence with a mesh $\mathcal{X}$ and a set of piecewise linear basis functions $S_{\mathcal{X}}$ on that mesh. For us trial and test functions will be the same and a finite element discretisations will typically mean that we seek $U_{\mathcal{X}}\in S_{\mathcal{X}}$ using Galerkin optimality conditions of the form

\displaystyle a(\partial_{t}U_{\mathcal{X}},\nabla U_{\mathcal{X}},\dots;v,% \nabla v,\dots)=(f,v)_{L^{2}({\Omega})},\quad\forall v\in S_{\mathcal{X}}

(2)

where $a$ is a bilinear form that is usually found by integration by parts of $(\mathcal{F}(U_{t},\dots),v)_{L^{2}({\Omega})}$ . In the case of Poisson’s equation with zero Dirichlet boundary conditions the above simplifies to

\displaystyle(\nabla U_{\mathcal{X}},\nabla v)_{L^{2}({\Omega})}=(f,v)_{L^{2}(% {\Omega})}.

This method of solution results in an expansion $U_{\mathcal{X}}=\sum_{i=1}^{N_{x}}\alpha_{\mathcal{X}}\phi_{\mathcal{X},i}$ where $S_{\mathcal{X}}=\{\phi_{\mathcal{X},i}\}_{i=1}^{N_{x}}$ . The goal in $r$ -adaptive meshing is then to minimise the solution error

\displaystyle\mathcal{E}(\mathcal{X},U_{\mathcal{X}}):=\|U_{\mathcal{X}}-u\|_{% L^{2}({\Omega})}

(3)

with respect to the mesh locations $\mathcal{X}$ without changing the number of meshpoints $N_{x}$ . Note we will on occasion suppress the subscript $\mathcal{X}$ from $U$ if the dependence is clear from context.

3.2 Adaptive Meshing

Relocation based r-adaptivity moves a mesh with a certain topology to another with the same topology, where the mesh points are moved, but their connectivity (and hence the associated data structures) is unaltered. Such methods typically map a fixed mesh in a computational domain (i.e. a representation of the mesh graph) to a moving mesh in the physical domain where the PDE is posed. We denote the mesh points in the physical domain by ${\mathbf{x}}^{(i)}$ and they define a triangulation $\mathcal{T}$ so that

\mathcal{X}=\left\{\bm{x}^{(i)}\in\mathbb{R}^{d}\right\}_{i=1}^{N_{x}},\quad% \mathcal{T}=\left\{\Delta^{(j)}\left|\Delta^{(j)}\subset\mathcal{X},\right.% \mid|\Delta^{(j)}\mid=d+1\right\}_{j=1}^{N_{T}}.

We define the following domains and coordinates: $\Omega_{C}\subseteq\mathbb{R}^{d}$ - “computational" domain eg. the plane, sphere, mapped to $\Omega_{P}\subseteq\mathbb{R}^{d}$ - “physical" domain, with $\bm{\xi}$ - coordinates in the computational domain, $\bm{x}$ - coordinates in the physical domain. To construct an adaptive mesh we consider a differentiable time-dependent deformation map $\mathbf{F}:\Omega_{C}\rightarrow\Omega_{P}$ , so that $\bm{X}=\mathbf{F}(\mathbf{\xi},t)$ and $\mathbf{F}{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}(\partial\Omega_{C})=}% \partial\Omega_{P}$ . If ${\bm{\xi}}^{(i)}$ are the fixed mesh points in the computational domain then $\bm{x}^{(i)}=\mathbf{F}(\bm{\xi}^{(i)},t)$ Assuming the mesh in the computational domain is regular, then determining the (properties of the) mesh in the physical domain, reduces to finding, and analysing, the function ${\mathbf{F}}$ .

Location based methods find this map directly, usually by solving a partial differential equation, or variational principle, linked to it. Monge-Ampére based methods assume that the deformation map is a Legendre transform (associated with an Optimal Transport regularisation) with a ‘mesh potential’ $\phi(\bm{\xi},t)$ for which ${\mathbf{F}}=\nabla_{\bm{\xi}}\phi.$ The linearisation $J$ of the deformation map, given by $J=\partial{\mathbf{F}}/\partial{\bm{\xi}}\equiv H(\phi)$ where $H$ is the Hessian of $\phi$ . Relocation methods usually equidistribute a monitor function $m(x)$ so that $\phi$ satisfies the Monge-Ampére equation

m(x)|H(\phi)|=m(\nabla\phi)|H(\phi)|=\theta,\quad\mbox{for constant}\quad\theta.

(4)

The monitor function is a measure of the local solution error, for example

\displaystyle m_{1}(x)=\left(1+u_{xx}^{2}+u_{yy}^{2}\right)^{1/5}\text{\cite[c% ite]{[\@@bibref{Number}{huang_adaptive_2011}{}{}]}},\quad m_{2}(x)=1+\frac{% \alpha\left(u-u_{\text{exact }}\right)^{2}}{\max_{x,y}\left(u-u_{\text{exact }% }\right)^{2}}+\frac{\beta\|H(u)\|_{F}}{\max_{x,y}\|H(u)\|_{F}}\text{\cite[cite% ]{[\@@bibref{Number}{song_m2n_2022}{}{}]}}.

(5)

The PDE (4) has a unique, convex, solution [6] which avoids mesh tangling. However it is challenging, and can be slow, to solve. Solution procedures include relaxation methods [7], quasi-Newton methods [32], surrogate methods [38], and PINNs [43].

Velocity based methods find an ODE describing the mesh point evolution so that

\dot{\bm{x}}^{(i)}={\mathbf{v}}({\bm{x}}^{(i)},t),

(6)

with ${\mathbf{v}}={\bm{0}}$ when $\bm{x}^{(i)}=\bm{\xi}^{(i)}.$ The choice of velocity function ${\mathbf{v}}$ is critical to the success of such methods, and is often motivated by natural Lagrangian structures of the underlying PDE, [22] Chapter 7. The G-adaptive method in this paper is a velocity based method in which ${\mathbf{v}}$ is derived by calculating the rate of change of the FE solution error. As we outline in section 4 and the supplementary material, G-adaptive methods avoid the mesh tangling often associated with velocity methods.

3.3 Graph Neural Networks (GNNs)

We consider a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ , with $N_{x}:=|\mathcal{V}|$ nodes and $\mathcal{E}\subset\mathcal{V}\times\mathcal{V}$ edges. The adjacency matrix $\mathbf{A}$ is defined as $a_{ij}=1$ if $(i,j)\in\mathcal{E}$ and zero otherwise. Node features $\left\{\mathbf{x}_{i}\in\mathbb{R}^{d}_{0}:i\in\mathrm{V}\right\}$ can be represented in matrix notation as $\mathbf{X}\in\mathbb{R}^{N_{x}\times d}$ .

A graph neural network (GNN) $\mathcal{M}_{\theta}:\mathbb{R}^{N_{x}\times d_{0}}\times\mathcal{E}% \rightarrow\mathbb{R}^{N_{x}\times d_{N}}$ is constructed with layers: $\mathcal{L}_{\theta_{k}}:\mathbb{R}^{N_{x}\times d_{k}}\rightarrow\mathbb{R}^{% N_{x}\times d_{k+1}}$ acting node wise as $\mathbf{x}_{i}^{k+1}=\mathcal{L}_{\theta_{k}}(\mathbf{x}_{i}^{k})=\phi_{\theta% _{k}}\left(\mathbf{x}_{i}^{k},\sum\limits_{j\in\mathcal{N}_{i}}\varphi_{\theta% _{k}}(\mathbf{x}_{j}^{k})\right)$

Two important examples are GCN [25] in matrix form $\mathbf{X}^{k+1}=\sigma(\mathbf{A}\mathbf{X}^{k}\mathbf{W})$ and GAT [41] $\mathbf{X}^{k+1}=\sigma(\mathbf{A}(\mathbf{X}^{k})\mathbf{X}^{k}\mathbf{W})$ where $a_{ij}=a(x_{i},x_{j})$ are attentional coefficients. We can include encoder $E$ and decoder $D$ such that

\mathbf{y}_{i}=\mathcal{M}_{\theta}(\left(\mathbf{X}_{0}\right))_{i}=\left(D% \left(\mathcal{L}_{\theta_{N}}\left(\dots\mathcal{L}_{\theta_{0}}\left(E\left(% \mathbf{X}_{0}\right)\right)\right)\right)\right)_{i}.

(7)

4 G-Adaptivity a GNN based adaptive mesh refinement

Our novel approach to meshing G-adaptivity using GNNs consists of three main parts: Figure 1(a) the careful selection of the GNN feature space, (b) a stable, easy-to-train and low cost GNN, (c) and a differentiable FEM solver (difFEM) for direct optimisation with respect to the FEM error (3).

4.1 Graph Neural Diffusion

GRAND [9] is a state-of-the-art GNN architecture that formulates a learnable non-homogeneous (or anisotropic) diffusion equation on a graph. Similar to neural ODEs [chen] it considers layers through the network as time and, by the properties of the row stochastic adjacency matrix, has favourable stability properties in inference and training. Given the differential equation approach to classical meshing strategies the architecture specifically aligns itself with these and brings the power of GNNs to the meshing problem. The GRAND architecture is achieved by making residual and treating propagation with an attentional graph Laplacian the GNN equation 7 becomes

\displaystyle\frac{\partial}{\partial t}\mathbf{X}(t)=(\mathbf{A}_{\theta}(% \mathbf{X}(t))-\mathbf{I})\mathbf{X}(t).

(8)

$a\left(\mathbf{X}_{i},\mathbf{X}_{j}\right)$ is calculated using the scaled dot product attention [40]

\displaystyle a_{ij}=a\left(\mathbf{X}_{i},\mathbf{X}_{j}\right)=\operatorname% {softmax}\left(\left(\mathbf{X}_{i}^{\top}\mathbf{W}_{K}^{\top}\mathbf{W}_{Q}% \mathbf{X}_{j}\right)/d_{k}\right).

(9)

where $a_{i,j}>0$ if $(i,j)\in\mathcal{E}$ and $\sum_{j\in\mathcal{N}(i)}a_{ij}=1$ . To construct the feature matrix required for G-Adaptivity we concatenate coordinates of a regular FE grid $\bm{\xi}\in\mathbb{R}^{N_{x}\times d}$ , a preliminary coarse FE solve $\tilde{u}\in\mathbb{R}^{N_{x}}$ and curvature estimate $\tilde{f}\in\mathbb{R}^{N_{x}}$ on the regular grid giving the features vector $\mathbf{X}=\left(\bm{\xi}\mathbin{\|}\tilde{u}\mathbin{\|}\tilde{f}\right)\in% \mathbb{R}^{N_{x}\times d+2}$ . Then we define our G-adaptive operator

\displaystyle\mathcal{M}_{\bm{\theta}}:(\mathbf{X},\mathbf{A})\mapsto\mathcal{% X}=(\bm{x}_{i})_{i=1}^{N_{x}}.

$\mathcal{M}_{\theta}$ uses an identity encoder $E$ to lift the latent space to $\mathbb{R}^{N_{x}\times d+2+|\lambda|}$ and a decoder $D$ to return physical coordinates. GRAND layers are used as convolutional blocks that induce several desirable properties for the meshing problem. By construction the adjacency is row stochastic this guarantees stability properties on the solver step size. Feature channels don’t mix instead they interact via attention expressed with elements of the adjacency, this is important as the first two features are physical coordinates. Finally the learnable weights $Q$ and $K$ matrices allow for an anistropic diffusion which is akin to a learnable monitor function from classical methods.

Regularity of GRAND generated meshes

The GRAND based architecture has a key advantage in the generation of regular meshes. A requirement of FEM meshes is that they are not ‘tangled’, i.e. that they form a well-posed triangulation of the domain $\Omega$ (i.e. no triangles overlap). This follows if each mesh point is in the convex hull of its neighbours on the graph. We show in the supplementary material that our architecture automatically moves points to the convex hull of their neighbours, enforcing this property.

4.2 difFEM: a differentiable FEM solver and FEM loss

As mentioned above, prior work for $r$ -adaptivity both in classical adaptive meshing (cf. [7]) and in machine learning approaches to the problem (cf. [14, 34, 43, 21, 38]) has built on the assumption that direct minimisation of the FEM error, whilst highly desirable, is both expensive and highly non-convex and therefore computationally infeasible [22, p. 56]. It turns out that recent advances in computational power and software have brought exactly this direct minimisation into the realm of the possible. Indeed, we introduce an adjoint framework which allows us to introduce difFEM a finite element solver which is fully differentiable with respect to mesh point locations. Based on the chain rule we can then exploit this solver to directly compute gradients of the FEM error (3) with respect to the GNN parameters $\bm{\theta}$ . In particular we have (writing $\mathcal{M}_{\bm{\theta}}=\mathcal{M}_{\bm{\theta}}(\mathbf{X},\mathcal{A})$ for simplicity)

\displaystyle\frac{\mathrm{d}}{\mathrm{d}{\theta_{j}}}\mathcal{E}(\mathcal{M}_% {\bm{\theta}},U_{\mathcal{M}_{\bm{\theta}}})=\sum_{i=1}^{N_{x}}\frac{\mathrm{d% }(\mathcal{M}_{\bm{\theta}}(\mathbf{X},\mathcal{A}))_{i}}{\mathrm{d}\theta_{j}% }\cdot\frac{\mathrm{d}\mathcal{E}(\mathcal{X},U_{\mathcal{X}})}{\mathrm{d}% \mathbf{x}_{i}}\Big{|}_{\mathcal{X}=\mathcal{M}_{\bm{\theta}}(\mathbf{X},% \mathcal{A})}

In order to compute the derivative values of the FEM error $\mathcal{E}$ with respect to the mesh coordinates $\mathcal{X}=\{\mathbf{x}_{i}\}_{i=1}^{N_{x}}$ we begin by recalling that the expansion coefficients $\bm{\alpha}$ of $U_{\mathcal{X}}$ (cf. section 3) are given as the solution of a linear system (the constraint $\mathbf{g}$ arising from the bilinear form (2)):

\displaystyle\mathbf{g}(\mathcal{X},\bm{\alpha}_{\mathcal{X}})=0,\quad\text{% where}\ \mathbf{g}(\mathcal{X},\bm{\alpha})=\mathsf{M}_{\mathcal{X}}\bm{\alpha% }_{\mathcal{X}}-\bm{b}_{\mathcal{X}}

(10)

for a given matrix $\mathsf{M}_{\mathcal{X}}$ and load vector $\bm{b}_{\mathcal{X}}$ which both depend on the mesh. Then we have

\displaystyle\frac{\mathrm{d}\mathcal{E}}{\mathrm{d}\mathbf{x}_{i}}=\bm{% \lambda}^{\top}\frac{\partial\mathbf{g}}{\partial\mathbf{x}_{i}}+\frac{% \partial\mathcal{E}}{\partial\mathbf{x}_{i}}{\color[rgb]{0,0,0}\definecolor[% named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}% \pgfsys@color@gray@fill{0},}

where $\bm{\lambda}$ is the solution to the discrete adjoint problem (cf. [39]) $\mathsf{M}_{\mathcal{X}}^{\top}\bm{\lambda}=-\partial\mathcal{E}/{\partial\bm{% \alpha}}$ . The advantage of this formulation is that we never have to compute $d\bm{\alpha}/d\mathbf{x}_{i}=d\left(\mathsf{M}_{\mathcal{X}}^{-1}\mathbf{b}_{% \mathcal{X}}\right)/d\mathbf{x}_{i}$ directly, thus we avoid having to differentiate a matrix inversion. This significantly simplifies the computational tree in the difFEM and makes computations on moderate mesh resolutions sizes feasible.

5 Experimental Results

We evaluate G-adaptivity on two classical meshing problems: an elliptic PDE (Poisson’s equation) and a nonlinear time-evolution PDE (Burger’s equation). In the following we present the performance improvements obtained in terms of the FEM $L^{2}$ -error reduction (cf. (3)) and compute time, using our novel approach for adaptive meshing on each of these problems. We share code to reproduce our results on the G-Adaptivity github repository.

5.1 Experimental details

Method

The pipeline for our experiments consist of three parts (cf. Figure 1): we build datasets containing information about the PDE, classical meshing solution and FEM approximation. Then we train the model, we experiment by training using either the Mesh-Loss which is the $L^{1}$ distance to MMPDE5 (1D) or Monge-Ampére (2D) pre-generated meshes or the ‘PDE-loss’ which is the $L^{2}$ FEM approximation error and then backpropagate through the differentiable solver difFEM. Full code to build the datasets and reproduce our results will be made available.

Evaluation is done by calculating an FE solution on the mesh generated by the model and projecting this using interpolation to a fine $100^{d}$ regular reference mesh. On the fine mesh we then calculate the $L^{2}$ error against the reference solution. For Poisson the reference solution is the analytical solution projected onto the for Burgers this is an FEM solve performed on the fine reference mesh. We then calculate the relative error reduction against an FEM solve done on a regular coarse mesh with coordinates $\mathbf{\xi}$ but evaluated on the fine mesh. We also report time which shows the time of a forward pass when the model is in training phase.

This is a very low parameter model with only 128 parameters coming from the $8\times 8$ query and key matrices shared over the layers. Training is done for 50 epochs with a Adam optimiser and learning rate of 0.001, although we found our model converged in less than 10-20 epochs. The model is also very easy to tune and we found using 4 layers with a step size of 0.1 was sufficient for all the datasets.

Baselines

We compare our model against classical adaptive mesh alogorithms MMDPE5 as described by [22] in 1-dimension and Monge-Ampére [6] in 2-dimensions. These two state-of-the-art classical approaches serve as a central baseline for the FEM error reduction and indicate also an upper bound on prior ML performance which, as mentioned above, focused mainly on the design of effective surrogate models, which provide a speed up but not FEM error reduction improvement over classical methods for $r$ -adaptivity. The monitor function for these methods was chosen to be $m_{1}$ as defined in (5). In our experiments we also aim to compare against the central ML baseline [38] which also describes a GNN based architecture for $r$ -adaptivity in meshing. The differences to our novel approach is the GNN architecture, where [38] used Graph Attention Networks (GAT) and their method of training which was data-based, i.e. with what we call a Mesh-Loss. Unfortunately, the code for this prior work was not made available. However, we were able to implement a version of the approach described in [38], see appendix for details. Note that qualitatively the performance of GAT_lap + Mesh-Loss is comparable to the results reported in [38] - an error reduction comparable to Monge-Ampère, with a mesh time improvement of around three orders of magnitude.

Datasets

We evaluate our model and baselines for a range of challenging mesh problems. There are four main datasets we call ‘structured gaussian’ (struct1g_27, struct2g_75) and random Gaussian (randg_25, randg_275). For all datasets we work on the domain $\Omega=[0,1]^{d}$ and generate example pde data and meshes for varying resolutions of $[11^{d},15^{d}]$ . Scales and centers of Gaussian type solutions are sampled with a stratified approach, for structured, or randomly otherwise see Appendix B for more details. For Poisson’s equation we evaluate against the analytical solution and for Burgers’ equation we perform an FEM solve on the fine grid used in evaluation.

5.2 Poisson’s equation

Our first test problem is the elliptic Poisson’s equation

\displaystyle-\nabla^{2}u=f,\ \ x\in\Omega,\quad u=g,\ \ x\in\partial\Omega.

Numerical results

The performance of our novel approach G-adaptivity in comparison to our baselines is shown in table 1. A sample extract of the dataset $randg\_275$ can be seen in Figure 3. Note the difference in the shape of the meshes generated with our optimised GNN architecture which lead to significant further error reduction over the Monge-Ampère meshes.

Dataset	Model	Loss	$L^{2}$ -FEM error reduction	Time
struct1g_27	MMPDE5	-	-18.25%	791 ms
	G-adaptive	Mesh-Loss	-15.46%	0.7 ms
	G-adaptive	difFEM	-26.02%	1.2 ms
struct2g_75	Monge-Ampère	-	-15.00%	845 ms
	GAT_lap	Mesh-Loss	11.55%	1.5 ms
	GAT_lap	difFEM	-3.11%	1.7 ms
	G-adaptive	Mesh-Loss	2.72%	0.8 ms
	G-adaptive	difFEM	-16.75%	0.8 ms
randg_25	Monge-Ampère	-	-15.70%	1868.1 ms
	GAT_lap	Mesh-Loss	-6.71%	3.9 ms
	GAT_lap	difFEM	-2.07%	1.9 ms
	G-adaptive	Mesh-Loss	-11.01%	1.7 ms
	G-adaptive	difFEM	-15.62%	1.5 ms
randg_275	Monge-Ampère	-	-13.84%	2332.5 ms
	GAT_lap	Mesh-Loss	-0.32%	3.9 ms
	GAT_lap	difFEM	-0.03%	2.1 ms
	G-adaptive	Mesh-Loss	-8.97%	1.6 ms
	G-adaptive	difFEM	-14.77%	1.6 ms

Table 1: Experimental results for Poisson’s equation

5.3 Burgers’ equation

The favourable performance of our approach extends to time-dynamic problems. To demonstrate this we choose the classical example of the viscous Burgers’ equation which is a well-known example of a PDE that evolves solutions with sharp interfaces (which are resolved with adaptive meshes). In the following $0<\nu\ll 1$ is the viscosity parameter:

\frac{\partial u}{\partial t}=-(u\cdot\nabla)u+\nu\nabla^{2}u,\;x\in\Omega,t>0% ,\quad u=g,\;x\in\partial\Omega,t>0,\quad u=u_{0},\;x\in\Omega,t=0.

(11)

Time-dynamic loss

Let us briefly outline how we use the approach introduced in section 4.2 for this time-dependent case. We begin by choosing a time-stepping method for the Burgers’ equation - a simple a good choice for the efficient numerical solution of such a stiff nonlinear system is a semi-implicit method [18]:

\displaystyle\langle U^{m+1},v\rangle=\langle v,U^{m}\rangle-\tau\left(\langle% (U^{m}\cdot\nabla)U^{m},v\rangle_{L^{2}(\Omega)}+\nu\langle\nabla U^{m+1},% \nabla v\rangle_{L^{2}(\Omega)}\right),\quad\forall v\in S_{\mathcal{X}},

where $U^{m}$ corresponds to the approximation of the solution $u$ at time $t=m\tau$ , and $\tau>0$ is the time step of the approximation. Here we omitted the subscript $U^{m}=U^{m}_{\mathcal{X}}$ . Bringing the dissipative (implicit) part to the left of the equation the computation of $U^{m+1}_{\mathcal{X}}$ takes exactly the form (10), thus allowing us to use difFEM on the dynamic time-stepping FEM error (cf. implicit-scheme-informed learning [23]):

\displaystyle\mathcal{E}_{\text{Burgers}}(\mathcal{X},U^{1}_{\mathcal{X}})=% \left\|U^{1}_{\mathcal{X}}-u^{1}\right\|_{L^{2}(\Omega)},

(12)

where $u^{1}$ is a reference value computed by resolving $u_{0}$ on a fine mesh and propagating with the semi-implicit time-stepper on this finer mesh. Of course there is no longer any forcing as part of the PDE data, and thus this is no longer included as a feature channel in the architecture.

Numerical results

We train the method on 100 samples of random Gaussian initial conditions with centers chosen uniformly at random in $[0,1]$ , and scales $s$ chosen uniformly at random in $[0.05,0.2]$ . These samples are on the one dimensional domain $[0,1]$ with mesh resolution $N_{x}=11$ . After training the model on a single time step (loss function $\mathcal{E}_{\text{Burgers}}$ ) with these initial conditions, we evaluate on the trajectories of 40 further initial conditions where we take 20 time steps of size $\tau=1/20$ , half of the trajectories on resolution $N_{x}=11$ mesh points, the second half on $N_{x}=21$ . The mesh is adapted after every time step and the process iterated. This means we test our model on the generation of $20\times 20=400$ meshes. Our evaluation metric is the error in the final solution value $U^{20}_{\mathcal{X}}$ against the value of the approximation found on a fine reference mesh. The results can be seen in table 2 as a baseline we compare against the classical meshing method MMPDE5 as above. Again our novel approach is able to significantly reduce mesh generation cost as well as enhance $L^{2}$ -FEM error reduction. It turns out that without the forcing function as part of the feature channel the GAT architecture is no longer able to learn optimal location of mesh points, the results are included for completeness but clearly bear no merit in the mesh adaptivity problem.

Model	Loss	$L^{2}$ -FEM error reduction	Mesh adaption time
MMPDE5	-	-16.93 %	41007 ms
G-adaptive	Mesh-Loss	-29.06%	94 ms
G-adaptive	difFEM	-49.93%	91 ms
GAT	difFEM	+77.20%	94 ms

Table 2: Experimental results for Burgers’ equation

6 Conclusions

Limitations

While our novel approach shows some significant performance improvement over both classical and prior ML approaches to $r$ -adaptivity, we recognise that further evaluation and development is necessary, which we will address in future work. In particular:

•

Generalisation to different PDEs Whilst the Poisson and Burgers’ PDEs are, for good reason, amongst the most widely used test problems in the context of adaptive meshing, we believe further evaluation should focus on generalising this approach to other types of equation. In principle our theoretical framework is highly flexible, but requires some adaption to the problem at hand, in particular in the choice of a good discrete loss (cf. (3) & (12)) and the choice of PDE features that can be included in our GNN architecture.
•

Mesh size and performance We acknowledge that testing of the methodology on larger mesh sizes $N_{x}$ is a necessary step to understand generalisation of this current approach to real-life FEM simulations. Our current mesh sizes were limited by the availability of computational power and while the new approach performed well throughout, the best FEM-error reduction over classical methods was observed for smaller $N_{x}$ . We note that if a full scale approach is not feasible on a large mesh, our methodology may still provide a useful approach by subdividing the domain into smaller parts and training on those.

Summary

We have presented a novel and effective approach to the classical problem of $r$ -adaptive meshing for FEM. In particular, we demonstrate, for the first time, that GNNs together with a differentiable FEM solver, can be effectively used to directly optimise the location of mesh points to minimise the FEM error. Hence we can take an entirely different route from prior work (both classical and ML approaches) which determine good choices of mesh points by analysis-inspired heuristics. We demonstrate the advantages of our method on challenging test problems, and find that, on those examples, we are able to outperform both classical and ML methods in terms of both error reduction and computational cost. We note that the direct FEM error optimisation approach extends naturally to more complex domains (e.g. not simply connected or higher dimensional) where classical methods may struggle. We will study this in future work. Finally, our time-dynamic loss also permits specific training of the GNN for remeshing after several time steps instead of a single one, which is an avenue for further cost reduction that we will exploit in future work.

Acknowledgments and Disclosure of Funding

TD, CBS & CB gratefully acknowledge support from the EPSRC programme grant in ‘The Mathematics of Deep Learning’, under the project EP/V026259/1. GM gratefully acknowledges funding from the Mathematical Institute, University of Oxford. KS gratefully acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 850941).

References

[1] Geometric Deep Learning: Going beyond Euclidean data | IEEE Journals & Magazine | IEEE Xplore. https://ieeexplore.ieee.org/abstract/document/7974879.
Ainsworth and Oden [1997] M. Ainsworth and J. T. Oden. A posteriori error estimation in finite element analysis. Computer Methods in Applied Mechanics and Engineering, 142(1):1–88, Mar. 1997. ISSN 0045-7825. doi: 10.1016/S0045-7825(96)01107-3.
Alet et al. [2019] F. Alet, A. K. Jeewajee, M. B. Villalonga, A. Rodriguez, T. Lozano-Perez, and L. Kaelbling. Graph Element Networks: Adaptive, structured computation and memory. In Proceedings of the 36th International Conference on Machine Learning, pages 212–222. PMLR, May 2019.
Battaglia et al. [2018] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, H. F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. R. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
Brandstetter et al. [2022] J. Brandstetter, D. Worrall, and M. Welling. Message Passing Neural PDE Solvers. arXiv:2202.03376 [cs, math], Feb. 2022.
Budd et al. [2013] C. Budd, M. Cullen, and E. Walsh. Monge–Ampére based moving mesh methods for numerical weather prediction, with applications to the Eady problem. Journal of Computational Physics, 236:247–270, Mar. 2013. ISSN 00219991. doi: 10.1016/j.jcp.2012.11.014.
Budd et al. [2009] C. J. Budd, W. Huang, and R. D. Russell. Adaptivity with moving grids. Acta Numerica, 18:111–241, May 2009. ISSN 1474-0508, 0962-4929. doi: 10.1017/S0962492906400015.
Chamberlain et al. [2021a] B. Chamberlain, J. Rowbottom, D. Eynard, F. Di Giovanni, X. Dong, and M. Bronstein. Beltrami Flow and Neural Diffusion on Graphs. In Advances in Neural Information Processing Systems, volume 34, pages 1594–1609. Curran Associates, Inc., 2021a.
Chamberlain et al. [2021b] B. Chamberlain, J. Rowbottom, M. I. Gorinova, M. Bronstein, S. Webb, and E. Rossi. GRAND: Graph Neural Diffusion. In Proceedings of the 38th International Conference on Machine Learning, pages 1407–1418. PMLR, July 2021b.
Cotter [2023] C. J. Cotter. Compatible finite element methods for geophysical fluid dynamics. Acta Numerica, 32:291–393, May 2023. ISSN 0962-4929, 1474-0508. doi: 10.1017/S0962492923000028.
Defferrard et al. [2016] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016.
E and Yu [2018] W. E and B. Yu. The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems. Communications in Mathematics and Statistics, 6(1):1–12, Mar. 2018. ISSN 2194-671X. doi: 10.1007/s40304-018-0127-z.
Fey and Lenssen [2019] M. Fey and J. E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric. arXiv:1903.02428, 2019.
Foucart et al. [2023] C. Foucart, A. Charous, and P. F. J. Lermusiaux. Deep reinforcement learning for adaptive mesh refinement. Journal of Computational Physics, 491:112381, Oct. 2023. ISSN 0021-9991. doi: 10.1016/j.jcp.2023.112381.
Gilmer et al. [2017] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272. PMLR, July 2017.
Giovanni et al. [2023] F. D. Giovanni, J. Rowbottom, B. P. Chamberlain, T. Markovich, and M. M. Bronstein. Understanding convolution on graphs via energies. Transactions on Machine Learning Research, June 2023. ISSN 2835-8856.
Grossmann et al. [2023] T. G. Grossmann, U. J. Komorowska, J. Latz, and C.-B. Schönlieb. Can Physics-Informed Neural Networks beat the Finite Element Method? arXiv:2302.04107, 2023.
Hairer and Wanner [1996] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II, volume 14 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1996. ISBN 978-3-642-05220-0 978-3-642-05221-7. doi: 10.1007/978-3-642-05221-7.
Ham et al. [2023] D. A. Ham, P. H. J. Kelly, L. Mitchell, C. J. Cotter, R. C. Kirby, K. Sagiyama, N. Bouziani, S. Vorderwuelbecke, T. J. Gregory, J. Betteridge, D. R. Shapero, R. W. Nixon-Hill, C. J. Ward, P. E. Farrell, P. D. Brubeck, I. Marsden, T. H. Gibson, M. Homolya, T. Sun, A. T. T. McRae, F. Luporini, A. Gregory, M. Lange, S. W. Funke, F. Rathgeber, G.-T. Bercea, and G. R. Markall. Firedrake User Manual. Imperial College London and University of Oxford and Baylor University and University of Washington, first edition edition, 5 2023.
Han et al. [2018] J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
Hu et al. [2024] P. Hu, Y. Wang, and Z.-M. Ma. Better Neural PDE Ssolvers Through Data-Free Mesh Movers. The Twelfth International Conference on Learning Representations, 2024.
Huang and Russell [2011] W. Huang and R. Russell. Adaptive Moving Mesh Methods, volume 174. Jan. 2011. ISBN 978-1-4419-7915-5. doi: 10.1007/978-1-4419-7916-2.
Jin et al. [2024] T. Jin, G. Maierhofer, K. Schratz, and Y. Xiang. A fast neural hybrid Newton solver adapted to implicit methods for nonlinear dynamics. arXiv:2407, 2024.
Jumper et al. [2021] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, Aug. 2021. ISSN 1476-4687. doi: 10.1038/s41586-021-03819-2.
Kipf and Welling [2022] T. N. Kipf and M. Welling. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations, July 2022.
Lam et al. [2023] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, A. Merose, S. Hoyer, G. Holland, O. Vinyals, J. Stott, A. Pritzel, S. Mohamed, and P. Battaglia. Learning skillful medium-range global weather forecasting. Science, 382(6677):1416–1421, Dec. 2023. doi: 10.1126/science.adi2336.
Li et al. [2020a] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, and A. Anandkumar. Multipole Graph Neural Operator for Parametric Partial Differential Equations. In Advances in Neural Information Processing Systems, volume 33, pages 6755–6766. Curran Associates, Inc., 2020a.
Li et al. [2020b] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations, Sept. 2020b.
Li et al. [2023] Z. Li, N. B. Kovachki, C. Choy, B. Li, J. Kossaifi, S. P. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli, and A. Anandkumar. Geometry-Informed Neural Operator for Large-Scale 3D PDEs. In Thirty-Seventh Conference on Neural Information Processing Systems, Nov. 2023.
Lienen and Günnemann [2022] M. Lienen and S. Günnemann. Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks. In International Conference on Learning Representations, Mar. 2022.
Lu et al. [2021] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, Mar. 2021. ISSN 2522-5839. doi: 10.1038/s42256-021-00302-5.
McRae et al. [2018] A. T. T. McRae, C. J. Cotter, and C. J. Budd. Optimal-Transport–Based Mesh Adaptivity on the Plane and Sphere Using Finite Elements. SIAM Journal on Scientific Computing, 40(2):A1121–A1148, 2018.
Paszke et al. [2017] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. NIPS-W, 2017.
Pfaff et al. [2023] T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. Battaglia. Learning Mesh-Based Simulation with Graph Networks. In International Conference on Learning Representations, Feb. 2023.
Raissi [2018] M. Raissi. Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations, Apr. 2018.
Raissi et al. [2019] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, Feb. 2019. ISSN 0021-9991. doi: 10.1016/j.jcp.2018.10.045.
Satorras et al. [2021] V. G. Satorras, E. Hoogeboom, and M. Welling. E(n) Equivariant Graph Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, pages 9323–9332. PMLR, July 2021.
Song et al. [2022] W. Song, M. Zhang, J. G. Wallwork, J. Gao, Z. Tian, F. Sun, M. D. Piggott, J. Chen, Z. Shi, X. Chen, and J. Wang. M2N: Mesh Movement Networks for PDE Solvers. In Advances in Neural Information Processing Systems, May 2022.
Strang [2012] G. Strang. Computational Science and Engineering. Wellesley-Cambridge Press, Wellesley, Mass, second printing edition, 2012. ISBN 978-0-9614088-1-7.
Vaswani et al. [2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. ukasz Kaiser, and I. Polosukhin. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
Veličković et al. [2018] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph Attention Networks. In International Conference on Learning Representations, Feb. 2018.
Wallwork [2022] J. G. Wallwork. pyroteus/movement: Mesh movement in Firedrake, May 2022. Zenodo.
Yang et al. [2023] Y. Yang, Q. Yang, Y. Deng, and Q. He. MMPDE-Net and Moving Sampling Physics-informed Neural Networks Based On Moving Mesh Method, Nov. 2023.

Appendix A Further background on classical methods and motivation for optimal monitor function

Classical methods for mesh adaption (whether h-adaptive or r-adaptive) require some measure of the error of the solution. In h-adaptive methods this is usually done after the calculation through an a-posteriori error estimate. For many problems (particularly those which are strongly nonlinear) this can involved solving a delicate dual problem [2]. When implementing such a method, cells with a large a-posteriori error are subdivided. This leads to a high density of mesh points where the error is large, and effectively equidistributes the a-posteriori error over the computational cells. An alternative, often used in r-adaptive methods, is to have an a-priori estimate of the error, and then to move the mesh points to equidistribute this. A commonly used a-priori error estimator is the (local) interpolation error of the solution, which is (usually) proportional to a higher derivative of the solution such as the curvature (or in fluid mechanics the vorticity). A monitor function $m({\mathbf{x}})$ is then set to equal the curvature, and the mesh points determined so that the integral of $m$ over a mesh cell (the interpolation error) is constant over the mesh cells. Whilst the interpolation error is not the true FEM error, it follows from Cea’s lemma that it is an upper bound, and hence can be used to control the FEM error. For a one dimensional problem the $L^{2}$ error $e$ of interpolating a function $u(x)$ over a cell of width $h$ using a piece-wise linear function is given a-priori by

e=h^{5}u^{\prime\prime}(\xi)^{2}

where $\xi$ is some point in the cell. The error over the mesh cell is therefore equidistributed if the cell length $h$ is such that

h(u^{\prime\prime})^{2/5}=\theta

where $\theta$ is a constant. The ’optimal’ monitor function in this case is then given by

m(x)=(u^{\prime\prime})^{2/5}.

In practice this is regularised to ensure a more even distribution of mesh points and we take

m(x)=\left(\epsilon^{2}+(u^{\prime\prime})^{2}\right)^{1/5}.

It is shown in [22] that choosing this monitor function and solving the equidistribution equation

m(x)h=\theta

leads to an optimal interpolation error for a fixed number $N_{x}$ of mesh points. In one-dimension the equidistribution equation has a unique solution and can be solved using a (parabolic) moving mesh PDE for $x$ , such as MMPDE5 (as used in this paper) [22]. In higher dimensions this equation does not have a unique solution and must be augmented with additional constraints. A powerful such constraint is to use optimal transport and to minimise the Wasserstein distance between the adapted mesh and a uniform mesh. This enforces excellent mesh regularity. This method is implemented in the Monge-Ampére algorithm considered in this paper. This method reduces to MMPDE5 in one-dimension and can be shown to give optimally smooth and meshes giving accurate FEM approximations [32]. However it is slow and hard to use. Hence the need for the faster ML based methods considered in this paper and by others.

Appendix B Details for numerical experiments

Software

All GNN architectures were implemented using PyTorch [33] and PyTorch geometric [13]. We use the repository by Wallwork [42] to generate the Monge–Ampére meshes used in our mesh-loss experiments. FEM solutions we generated using a combination of the Firedrake package [19], and difFEM code developed in PyTorch for this work.

Training and testing

Optimisation of our G-adaptivity method was carried out using the Adam optimiser within PyTorch with a constant learning rate of $1e-3$ . We trained all experiments exclusively on an M2 MacBook cpu. Convergence was typically observed within just a few epochs, further highlighting the strength of the inductive bias in our model.

Datasets

The details of how we generated the datasets are as follows.

For ‘structured Gaussian’ in 1D (struct1g_27) as illustrated in Figure 4, we sample 1 Gaussian iteratively placed at 9 equi-spaced grid points in $[0.1,0.9]$ . This process is repeated for Gaussians of scale in $[0.1,0.2,0.3]$ resulting in 27 data points.

For ‘structured Gaussian’ in 2D (struct2g_75) as illustrated in Figures 5 and 6, we sample 2 Gaussians, one with fixed position and one iteratively placed in a $5\times 5$ grid equi-spaced in $[0.1,0.9]^{2}$ . This process is repeated for Gaussians of scale in $[0.1,0.2,0.3]$ resulting in 25 data points.

For ‘random Gaussian’ (randg_25 and randg_275) in 2D, as illustrated in Figure 3 we randomly select $n\in[1,2,3,4,5,6]$ Gaussians, a mesh resolution in $[11^{d},15^{d}]$ , $n$ random locations in $[0,1]^{2}$ and $n$ random scales in $[0.1,\sqrt{0.4}]$ for each Gaussian. We repeat the process for 25 or 275 examples respectively.

For testing we apply our models to the test datasets and report the corresponding error reductions in Table 1 of the main paper. For the ‘1d struct 21’ and ‘2d struct 75’ experiments the test dataset is identical to the training data, and for ‘2d randg 25’, ‘2d randg 275’ and Burger’s equation the testing data were sampled independently of the training data. For Poisson’s equation we evaluate against the analytical solution and for Burgers’ equation we perform an FEM solve on the fine grid used in evaluation.

Architectures

In addition to the G-adaptive architecture outlined in the main body of the paper, we benchmark against two graph attention (GAT) based architectures. For both Poisson’s and Burger’s equations we applied the vanilla GAT architecture, which has the form

\mathbf{X}^{k+1}=\mathbf{A}_{GAT}(\mathbf{X}^{k})\mathbf{X}^{k}

where $\mathbf{A}_{GAT}$ is the attention matrix calculated using [41] style attention. Note that we omit these results for Poisson’s equation due to poor performance of vanilla GAT, and instead we modified the GAT architecture to more closely match our G-adaptive architecture. This allows for a more direct comparison of the effect of the attention mechanism on the quality of the generated meshes. This results in the GAT_lap, which has the following form

\mathbf{X}^{k+1}=(\mathbf{A}_{GAT}(\mathbf{X}^{k})-\mathbf{I})\mathbf{X}^{k}.

Note it was necessary to keep the residual and Laplacian form of GRAND as any further alterations made the model fail. In all cases of GRAND, GAT, and GAT_lap, we apply a network with 4 layers and optimise parameters consisting of the learnable weights ( $\mathbf{W}_{Q},\mathbf{W}_{K}$ in GRAND, and $\mathbf{W}$ in GAT/GAT_lap).

Appendix C Convexity properties and non-tangling of our architecture

As mentioned in section 4.2 of the main body of our manuscript, the GRAND architecture inherently enforces non-tangling of mesh points by moving points into the convex hulls of neighbours at every instance. In particular, the graph diffusion equation is

\displaystyle\frac{d\mathcal{X}}{dt}=(\mathcal{A}-\mathsf{I})\mathcal{X}

(13)

where $\mathcal{A}$ is the attention matrix (which implicitly of course depends on the node features of the graph). We recall that the attention matrix is row-stochastic, meaning every row of $\mathcal{A}$ has non-negative entries that sum to 1. Moreover, our model has no self attention, meaning the diagonal entries of $\mathcal{A}$ are zero. This immediately implies that $(\mathcal{A}\mathcal{X})_{i}$ can be written as

\displaystyle(\mathcal{A}\mathcal{X})_{i}=\sum_{j:(i,j)\in\mathcal{E}^{\prime}% }a_{ij}\mathbf{x}_{j},\quad\text{where }a_{ij}\geq 0,\sum_{j}a_{ij}=1.

This is precisely the statement that $(\mathcal{A}\mathcal{X})_{i}$ is a point in the convex hull of the graph neighbours of $\mathbf{x}_{i}$ .

Thus reading (13) for each node $i$ individually we see that the instantaneous change in $\mathbf{x}_{i}$ is forcing it to move in the direction of $(\mathcal{A}\mathcal{X})_{i}$ , i.e. to a point in the convex hull of the graph neighbours, i.e. the neighbouring mesh points. That this ultimately results in a non-tangled mesh is best seen on the following illustrative 1 dimensional result:

Proposition 1.

Suppose we have an ordered mesh on the 1d domain $[0,1]$ , i.e. mesh points $0=x_{1}<x_{2}<\cdots<x_{n}=1$ . Under the continuous graph diffusion flow (13) the mesh will never tangle, that is no two mesh points will cross.

Proof by contradiction.

Suppose the mesh crosses at a future time, i.e. there is $x_{i},x_{i+1}$ , $t>0$ , such that $x_{i}(t)>x_{i+1}(t)$ and, by continuity, a time $T_{c}$ (the crossing time) when $x_{i}(T_{c})=x_{i+1}(T_{c}),(\dot{x}_{i+1}-\dot{x}_{i})<0$ and $0=x_{0}<\cdots<x_{i-1}(T_{c})<x_{i}(T_{c})=x_{i+1}(T_{c})\leq x_{i+2}(T_{c})<% \cdots<x_{N_{x}}=1$ . Here we assumed that $i\neq 0,i+1\neq N_{x}$ , as the boundary cases can be treated analogously. Now we note that by the graph diffusion equation we also have, denoting $d=x_{i+1}-x_{i}$ ,

\displaystyle\dot{d}

\displaystyle=\dot{x}_{i+1}-\dot{x}_{i}=\left(ax_{i}+(1-a)x_{i+2}-x_{i+1}% \right)-\left(bx_{i-1}+(1-b)x_{i+1}-x_{i}\right)

for some numbers $a,b\in[0,1]$ . Thus we have

	$\displaystyle\dot{d}$	$\displaystyle=-(2-b)d-(2-b)x_{i}+ax_{i}-bx_{i-1}+(1-a)x_{i+2}+x_{i}$
		$\displaystyle=-(2-b)d-(1-b+a)x_{i}-bx_{i-1}+(1-a)x_{i+2}$
		$\displaystyle=\underbrace{-(2-b)d}_{=0}+\underbrace{b\left(x_{i}-x_{i-1}\right% )}_{>0}+\underbrace{(1-a)\left(x_{i+2}-x_{i}\right)}_{\geq 0}>0.$

This leads to a contradiction thus completing the proof. ∎

Appendix D Notes on implementation of difFEM

The software package difFEM is implemented using PyTorch, and provides a finite element code which is fully differentiable (since written in PyTorch) - in particular can compute derivatives of any observational with respect to the mesh points using the discrete adjoint approach described in the main body of this submission. In the current form we chose piecewise linear finite elements for the FEM discretisations, but the concept extends naturally to higher degree piece-wise polynomial basis functions and this will be incorporated in future versions of difFEM. To demonstrate the conceptual idea of differentiation of finite element codes with respect to mesh points let us briefly outline the following one dimensional example: Suppose we are give a mesh $\mathcal{X}=\{0=x_{1}<\dots<x_{N_{x}}=1\}$ , then the associated FEM approximation space is $S_{\mathcal{X}}=\{\phi_{x}\}_{i=1}^{N_{x}}$ where piecewise linear basis functions are defined as

\displaystyle\phi_{i}(x)=\begin{cases}\frac{x-x_{i-1}}{x_{i}-x_{i-1}}&\text{if% }x_{i-1}\leq x\leq x_{i},\\ \frac{x_{i+1}-x}{x_{i+1}-x_{i}}&\text{if }x_{i}\leq x\leq x_{i+1},\\ 0&\text{otherwise.}\end{cases}

These provide the basis for any approximation in the FEM solver, for example the load vector for a given forcing $f:[0,1]\rightarrow\mathbb{R}$ is computed as follows

\displaystyle\mathbf{b}_{\mathcal{X}}=(\langle f,\phi_{x}\rangle_{L^{2}([0,1])% })_{i=1}^{N_{x}}

and the stiffness matrix $\mathsf{M}_{\mathcal{X}}$ appearing in the discretisation of Poisson’s equation has entries

\displaystyle\left(\mathsf{M}_{\mathcal{X}}\right)_{ij}=\langle\partial_{x}% \phi_{j},\partial_{x}\phi_{i}\rangle.

Thus if we want to compute the derivative of any of these quantities with respect to $x_{j},j=1,\dots,N_{x}$ it suffices to be able to compute $d\phi_{i}/dx_{j}$ . This can be done analytically and is incorporated in the set up of the FEM code when difFEM is assembled. For example we have

\frac{d\phi_{i}}{dx_{i}}=\begin{cases}-\frac{x-x_{i-1}}{(x_{i}-x_{i-1})^{2}}&% \text{if }x_{i-1}\leq x\leq x_{i},\\ \frac{x_{i+1}-x}{(x_{i+1}-x_{i})^{2}}&\text{if }x_{i}\leq x\leq x_{i+1},\\ 0&\text{otherwise,}\end{cases}

and derivatives $d\phi_{i}/dx_{j}$ for $i\neq j$ can be computed analogously.