Artificial Intelligence (AI) and Deep Learning For CFD: October 2022
Ideen Sadrehaghighi
CFD Open Series
[Cover diagram: Artificial Intelligence, encompassing Machine Learning and Artificial Neural Networks (ANNs)]
ANNAPOLIS, MD
Contents
3 Case Studies for Artificial Neural Networks (ANN) & Physics-Informed Neural
Network (PINN) ................................................................................................................................... 52
3.1 Case Study 1 - 2D High-Lift Aerodynamic Optimization using Neural Networks ................... 52
3.1.1 Discussion and Background ............................................................................................... 52
3.1.2 Agile AI-Enhanced Design Process ..................................................................................... 54
3.1.3 Summary ............................................................................................................................ 55
3.1.4 Conclusion ......................................................................................................................... 56
3.2 Case Study 2 - Artificial Neural Networks (ANNs) Trained Through Deep Reinforcement
Learning Discover Control Strategies For Active Flow Control ................................................................... 57
3.2.1 Introduction and Literature Survey ................................................................................... 57
List of Tables
Table 1.2.1 Data Considered ......................................................................................................................................... 14
Table 2.2.1 Results of Different Methods ................................................................................................................ 45
Table 3.3.1 Accuracy Analysis Based on Changing Coefficients ..................................................................... 76
Table 3.4.1 Case study of PINNs for incompressible flows: details of the 2D2C observation ........... 85
Table 3.5.1 Major software libraries specifically designed for physics-informed ............................. 113
Table 3.6.1 Criteria for the ML Framework Classification - (Courtesy of Chang & Dinh) ................ 133
Table 3.6.2 Parameter Sets for the Thermal Conductivity Model - (Courtesy of Chang & Dinh) .. 138
Table 3.6.3 Summary of IET Training and Validating Data Sets - (Courtesy of Chang & Dinh) ..... 139
Table 3.6.4 Summary of SET Training Datasets - (Courtesy of Chang & Dinh) .................................... 140
Table 3.7.1 Bounds of key parameters of flight missions considered ...................................................... 149
Table 3.7.2 Airfoil shape design optimization problem statement ........................................................... 149
List of Figures
Figure 1.1.1 Scope of Artificial Intelligence - Courtesy of Hackerearth Blog............................................ 11
Figure 1.2.1 Research in artificial intelligence (AI) Source: [1] ..................................................................... 12
Figure 1.2.2 Machine Learning Programming ....................................................................................................... 13
equivalently be thought of as an algorithm in which each vertex receives and aggregates messages
from its neighbors. Also depicted on the left are the molecular graphs for C18H9N3OSSe and
C22H15NSeSi from the Harvard Clean Energy Project (HCEP) data set (ref. 205) with their
corresponding adjacency matrices. b | A neural network with the Lax–Oleinik formula represented
in the architecture. f is the solution of the Hamilton–Jacobi partial differential equations, x and t are
the spatial and temporal variables, L is a convex and Lipschitz activation function, aᵢ ∈ ℝ and uᵢ ∈ ℝⁿ
are the neural network parameters, and m is the number of neurons. Panel a is adapted with
permission from ref. 204, AIP Publishing. Panel b image courtesy of J. Darbon and T. Meng, Brown
University............................................................................................................................................................................... 101
Figure 3.5.2 Inferring the 3D flow over an espresso cup using the Tomo-BOS imaging
system and physics-informed neural networks (PINNs). a | Six cameras are aligned around an
espresso cup, recording the distortion of the dot patterns in the panels placed in the background,
where the distortion is caused by the density variation of the airflow above the espresso cup. The
image data are acquired and processed with LaVision’s Tomographic BOS software (DaVis 10.1.1).
.................................................................................................................................................................................................... 109
Figure 3.5.3 Physics-informed filtering of in-vivo 4D-flow magnetic resonance imaging data of
blood flow in a porcine descending aorta. Physics-informed neural network (PINN) models can be
used to de-noise and reconstruct clinical magnetic resonance imaging (MRI) data of blood velocity,
while constraining this reconstruction to respect the underlying physical laws of momentum and
mass conservation, as described by the incompressible Navier–Stokes equations. Moreover, a
trained PINN model has the potential to aid the automatic segmentation of the arterial wall
geometry and to infer important biomarkers such as blood pressure and wall shear stresses. a |
Snapshot of in-vivo 4D-flow MRI measurements. .............................................................................................. 110
Figure 3.5.4 Uncovering edge plasma dynamics. One of the most intensely studied aspects of
magnetic confinement fusion is edge plasma behavior, which is critical to reactor performance and
operation. The drift-reduced Braginskii two-fluid theory has for decades been widely used to
model edge plasmas, with varying success. Using a 3D magnetized two-fluid model, physics-
informed neural networks (PINNs) can be used to accurately reconstruct (ref. 141) the unknown
turbulent electric field (middle panel) and underlying electric potential (right panel), directly from
partial observations of the plasma’s electron density and temperature from a single test discharge
(left panel). The top row shows the reference target solution. ....................................................................... 111
Figure 3.5.5 Transitions between metastable states. Results obtained from studying transitions
between metastable states of a distribution in a 144-dimensional Allen–Cahn type system. The top
part of the figure shows the two metastable states. The lower part of the figure shows, from left to
right, a learned sample path with the characteristic nucleation pathway for a transition between the
two metastable states. Here, q is the committor function. Figure courtesy of G. M. Rotskoff, Stanford
University, and E. Vanden-Eijnden, Courant Institute. ...................................................................................... 112
Figure 3.6.1 Workflow of Employing ML methods for Developing Thermal fluid closures –
(Courtesy of Chang & Dinh)............................................................................................................................................ 131
Figure 3.6.2 Hierarchy of Thermal Fluid Data - (Courtesy of Chang & Dinh)........................................ 132
Figure 3.6.3 Overview of Type I ML Framework with a Scale Separation Assumption - (Courtesy of
Chang & Dinh) ...................................................................................................................................................................... 134
Figure 3.6.4 Domain of Various ML Frameworks where L, M, and H Denote Low, Medium, and
High - (Courtesy of Chang & Dinh) .............................................................................................................................. 137
Figure 3.6.5 Schematic of integral effects tests (IETs) for measuring Temperature fields -
(Courtesy of Chang & Dinh)............................................................................................................................................ 139
Figure 3.6.6 Schematic of Separate Effects Tests (SETs) for Measuring Thermal Conductivity as
the Function of Sample’s Mean Temperature - (Courtesy of Chang & Dinh)............................................. 139
Figure 3.6.7 Architecture of CNN-Based Thermal Conductivity (adopted after LeCun) .................. 141
Figure 3.7.1 No mode collapse is reported in WGAN ...................................................................................... 145
Figure 3.7.2 Abnormal airfoils generated by perturbing FFD control points ....................................... 147
Figure 3.7.3 Investigation of CNN hyperparameters in the training of airfoil filtering models .... 148
Figure 3.7.4 FFD control points and the CFD mesh for airfoil design....................................................... 150
Figure 3.7.5 Optimized airfoils for different flight missions subject to different area constraints
....................................................................................................................................................................................................
Figure 3.7.6 Scores evaluated by the geometric filtering model ................................................................ 152
Figure 3.7.7 Nine sectional airfoils are monitored in the CRM optimization ........................................ 153
Figure 3.7.8 Eight sectional airfoils are monitored in the BWB optimization ...................................... 153
Figure 3.7.9 Geometric filtering constraint with S_validity > 1.0 does not filter out the optimal shapes
in the CRM design ............................................................................................................................................................... 154
Figure 3.7.10 Geometric filtering constraint with S_validity > 1.0 does not filter out the optimal
shapes in the BWB design ............................................................................................................................................... 154
Figure 3.7.11 Gradient-based optimization starting from a circle fails due to too much geometric
abnormality........................................................................................................................................................................... 156
Figure 3.7.12 The deep-learning-based filtering constraint ensures the success of gradient-based
optimization from a circle ............................................................................................................................................... 156
Figure 3.7.13 Dominant global mode shapes for the BWB optimization ................................................ 158
Figure 3.7.14 Dominant global mode shapes for the CRM wing and tail optimization ..................... 158
Figure 3.7.15 Flowchart of the airfoil GAN model proposed by Li et al. [4] .......................................... 159
machine learning has given us self-driving cars, practical speech recognition, effective web search,
and a vastly improved understanding of the human genome. Machine learning is so pervasive today
that you probably use it dozens of times a day without knowing it. The process of machine learning
is similar to that of data mining. Both systems search through data to look for patterns. However,
instead of extracting data for human comprehension, as is the case in data mining applications,
machine learning uses that data to detect patterns and adjust program actions accordingly.
Machine learning algorithms are often categorized as supervised, unsupervised, or reinforcement
learning. Supervised algorithms apply what has been learned in the past to new data.
Unsupervised algorithms draw inferences from unlabeled datasets. Facebook's
News Feed uses machine learning to personalize each member's feed. If a member frequently stops
scrolling in order to read or "like" a particular friend's posts, the News Feed will start to show more
of that friend's activity earlier in the feed. Behind the scenes, the software is simply using statistical
analysis and predictive analytics to identify patterns in the user's data and use those patterns to
populate the News Feed; as new behavior is recorded in the data set, the News Feed adjusts
accordingly. Google and Amazon are other heavy users of machine learning. In essence, machine
learning (ML) comprises algorithms that process and extract information from data. They
facilitate automation of tasks
and augment human domain knowledge. They are linked to learning processes and are
categorized as supervised, semi-supervised, or unsupervised (Brunton, Noack, &
Koumoutsakos, 2020). Reinforcement learning is a third, large branch of machine learning research,
in which an agent learns to make control decisions by interacting with an environment for some
high-level objective3. Examples include learning how to play games4, such as chess (Brunton, 2021)5.
1.2.1 Creating Your First Machine Learning Model (Apples & Oranges)
Source: Newark.com
In ML, instead of defining the rules and expressing them in a programming language, answers
(typically called labels) are provided with the data (see Figure 1.2.2). The machine infers the rules
that determine the relationship between the labels and the data. The data and labels are used to
create ML algorithms, typically called models. Using this model, when the machine gets new data, it
predicts or correctly labels them. If we train the model to discern between apples and oranges, the
model can predict whether a new sample is an apple or an orange.

Figure 1.2.2 Machine Learning Programming

The problem sounds easy, but it is impossible to solve without ML. You'd need to write tons of rules
to tell the difference between apples and oranges, and with a new problem, you would need to restart
the process. There are many aspects of the fruit that we can collect data on, including color, weight,
texture, and shape. For our purposes, we'll pick only two simple ones as data: weight and texture. In
this article, we will explain how to create a simple ML algorithm that discerns between an apple and
an orange. To do so, we create an algorithm that can figure out the rules so we don't have to write
them by hand. And for that, we're going to train what's called a classifier. You can think of a classifier
as a function: it takes some data as input and assigns a label to it as output. The technique of
automatically writing the classifier is called supervised learning.
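A minimal sketch of such a classifier, assuming scikit-learn is available; the feature values (weight in grams, texture encoded as 0 = smooth, 1 = bumpy) and the tiny training set are illustrative, not taken from the article:

# Supervised learning sketch: classify apples vs. oranges from two
# features, weight (grams) and texture (0 = smooth, 1 = bumpy).
from sklearn.tree import DecisionTreeClassifier

features = [[140, 1], [130, 1], [150, 0], [170, 0]]   # [weight, texture]
labels = ["apple", "apple", "orange", "orange"]       # answers provided with the data

clf = DecisionTreeClassifier()                        # the classifier to be trained
clf.fit(features, labels)                             # infer the rules from data + labels
print(clf.predict([[160, 0]]))                        # -> ['orange'] for a heavy, smooth fruit

Here the classifier happens to be a decision tree, revisited in Section 1.6.3; any supervised classifier could stand in its place.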
3 Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)
4 Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature
518, 529 (2015)
5 arXiv:2110.02083 [physics.flu-dyn]
r_L, r_R = (−b ± √(b² − 4ac)) / 2a
Eq. 1.2.1
We would like to learn the mapping
(a, b, c) → (r_L, r_R)
Eq. 1.2.2
without relying on the knowledge of the underlying processes (Gyrya, Shashkov, Skurikhin, &
Tokareva, 2019)6. For example, the relationship Eq. 1.2.2 may represent a physical process for
which some observations are available but the analytical relation Eq. 1.2.1 has not yet been
established. The prototypical problem of finding roots of a quadratic equation was selected as a proxy
for the following reasons that are relevant to many complex practical problems:
• It is a fairly simple problem that is familiar to everyone who would be reading this paper. Yet,
it is a good representative of a wide class of approximation problems in scientific computing.
• Finding the solution involves different arithmetic operations, some of which could be difficult
to model by machine learning techniques. For example, division and taking a square root
represent a challenge for neural networks to capture exactly using activation functions.
• There are situations when a particular form of analytic expression/algorithm may exhibit
loss of accuracy. For example, the analytic expression Eq. 1.2.1 for the larger root is
numerically inaccurate when b² is much larger than 4ac.
• The roots of a quadratic equation exhibit non-trivial behavior under certain conditions.
There are several branches in the solution: if a = 0, the quadratic equation becomes a linear
equation, which has one root; this is a qualitative change from one regime to a different one.
Depending on the discriminant, the number of roots as well as the nature of the roots changes
(real vs. complex).
• Probably the most significant challenge from the standpoint of ML is that there is a small
range of input parameters for which the output values become arbitrarily large
(corresponding to small values of a).
We will now explain what we mean by learning the relation Eq. 1.2.2. Assume we are provided a
number of observations (training set):
(a^j, b^j, c^j) → (r̄_L^j, r̄_R^j) ≈ (r_L^j, r_R^j),   j = N + 1, …, N + K
Eq. 1.2.4
The goal is to minimize mismatches between the estimates (r̄_L^j, r̄_R^j) and the testing data (r_L^j, r_R^j):
6Gyrya, V., Shashkov, M., Skurikhin, A., & Tokareva, S. (2019). Machine learning approaches for the solution of
the Riemann problem in fluid dynamics: a case study. Journal of Computational Physics.
Cost = Σⱼ (r_L^j − r̄_L^j)² + Σⱼ (r_R^j − r̄_R^j)²
Eq. 1.2.5
Since the testing data is not available during the training process, the minimization is performed on
the training set, with the idea that the training and testing sets are drawn from the same pool. The
above is the typical ML setup. In this work, the goal was to compare the performance of several
existing ML approaches for the case of a quadratic equation.
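A minimal sketch of this setup, assuming scikit-learn's MLPRegressor as the learner; the sampling ranges, network size, and train/test split are illustrative choices, not those of Gyrya et al.:

# Learn (a, b, c) -> (r_L, r_R) for quadratics with two real roots.
# Training pairs come from the closed form (Eq. 1.2.1); the fit minimizes
# the squared mismatch of the roots, in the spirit of the cost Eq. 1.2.5.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, 20000)            # bounded away from a = 0
b = rng.uniform(-5.0, 5.0, 20000)
c = rng.uniform(-5.0, 5.0, 20000)
disc = b**2 - 4*a*c
a, b, c, disc = a[disc > 0], b[disc > 0], c[disc > 0], disc[disc > 0]  # two real roots

X = np.column_stack([a, b, c])
y = np.column_stack([(-b - np.sqrt(disc)) / (2*a),    # r_L
                     (-b + np.sqrt(disc)) / (2*a)])   # r_R

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
model.fit(X[:-1000], y[:-1000])             # minimize the cost on the training pool
print(model.score(X[-1000:], y[-1000:]))    # R^2 on held-out "testing" observations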
(Taradiy et al.)10.
10 Kirill Taradiy, Kai Zhou, Jan Steinheimer, Roman V. Poberezhnyuk, Volodymyr Vovchenko, and Horst
Stoecker, “Machine learning based approach to fluid dynamics”, arXiv:2106.02841v1 [physics.comp-ph], 2021.
you can fit a polynomial or curvilinear regression; these are known as polynomial or curvilinear
regression models14.
1.6.2 Logistic Regression
Don’t get confused by its name! It is a classification, not a regression, algorithm. It is used to estimate
discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent
variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a
logit function; hence, it is also known as logit regression. Since it predicts a probability, its output
values lie between 0 and 1 (as expected). Again, let us try and understand this through a simple
example. Let’s say your friend gives you a puzzle to solve. There are only two outcome scenarios:
either you solve it or you don’t. Now imagine that you are being given a wide range of puzzles/
quizzes in an attempt to understand which subjects you are good at. The outcome of this study would
be something like this: if you are given a trigonometry-based tenth-grade problem, you are 70%
likely to solve it. On the other hand, if it is a fifth-grade history question, the probability of getting an
answer is only 30%. This is what logistic regression provides you. Coming to the math, the log odds
of the outcome are modeled as a linear combination of the predictor variables: odds = p/(1 − p) =
probability of event occurrence / probability of event non-occurrence, and logit(p) = ln(odds) =
ln(p/(1 − p)) = β0 + β1x1 + ⋯ + βkxk. Above, p is the probability of presence of the characteristic of
interest. The method chooses parameters that maximize the likelihood of observing the sample
values, rather than parameters that minimize the sum of squared errors (as in ordinary regression).
Now, you may ask, why take a log? For the sake of simplicity, let’s just say that this is one of the best
mathematical ways to replicate a step function. We could go into more detail, but that would defeat
the purpose of this article.
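A small sketch of these ideas, assuming scikit-learn's LogisticRegression; the two features (a difficulty score and a grade level) and the outcomes are made up for illustration:

# Logistic regression sketch: probability of solving a puzzle from two
# illustrative features. The fit maximizes the likelihood, not the SSE.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 10], [0.8, 10], [0.3, 5], [0.9, 5], [0.5, 8], [0.7, 6]])
y = np.array([1, 0, 1, 0, 1, 0])                # 1 = solved, 0 = not solved

clf = LogisticRegression().fit(X, y)

p = clf.predict_proba([[0.4, 9]])[0, 1]         # P(solve) for a new puzzle
log_odds = clf.decision_function([[0.4, 9]])[0] # logit(p) = ln(p / (1 - p))
print(p, 1.0 / (1.0 + np.exp(-log_odds)))       # the two agree by construction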
1.6.3 Decision Tree
This is a favorite algorithm, and one used quite frequently. It is a type of supervised learning
algorithm that is mostly used for classification problems15. Surprisingly, it works for both categorical
and continuous dependent variables. In this algorithm, we split the population into two or more
homogeneous sets. This is done based on the most significant attributes/independent variables, so
as to make the groups as distinct as possible. In Figure 1.6.2, you can see that the population is
classified into four different groups based on multiple attributes, to identify ‘if they will play or not’.
To split the population into different heterogeneous groups, it uses various techniques (Figure 1.6.2).

Figure 1.6.2 Decision Tree
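One of those techniques is Gini impurity; a small sketch with made-up counts shows how a candidate split would be scored:

# Gini impurity sketch: a tree prefers the split whose groups are most
# homogeneous, i.e. whose weighted impurity is lowest. Counts are illustrative.
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a group with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Split 14 people into two groups; class counts are [play, don't play].
left, right = [7, 1], [2, 4]
n = sum(left) + sum(right)
weighted = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
print(gini(left), gini(right), weighted)        # lower weighted impurity wins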
14 Sunil, Ray, “Essentials of Machine learning Algorithms (with Python and R codes)”, August 2015.
15 Sunil, Ray, “Essentials of Machine learning Algorithms (with Python and R codes)”, August 2015.
1.7.4.1 Abstract
This paper provides a short overview of how to use machine learning to build data-driven models in
fluid mechanics [Brunton, 2022]. The process of machine learning is broken down into five stages:
(1) formulating a problem to model,
(2) collecting and curating training data to inform the model,
(3) choosing an architecture with which to represent the model,
(4) designing a loss function to assess the performance of the model, and
(5) selecting and implementing an optimization algorithm to train the model.
At each stage, we discuss how prior physical knowledge may be embedded into the process, with
specific examples from the field of fluid mechanics.
Keywords: Machine learning, fluid mechanics, physics-informed machine learning, neural networks,
deep learning
1.7.4.2 Introduction
The field of fluid mechanics is rich with data and rife with problems, which is to say that it is a perfect
playground for machine learning. Machine learning is the art of building models from data using
optimization and regression algorithms. Many of the challenges in fluid mechanics may be posed as
optimization problems, such as designing a wing to maximize lift while minimizing drag at cruise
velocities, estimating a flow field from limited measurements, or controlling turbulence for mixing
enhancement in a chemical plant or drag reduction behind a vehicle, among myriad others. These
optimization tasks fit well with machine learning algorithms, which are designed to handle nonlinear
and high-dimensional problems. In fact, machine learning and fluid mechanics both tend to rely on
the same assumption that there are patterns that can be exploited, even in high-dimensional systems
[1]. Often, the machine learning algorithm will model some aspect of the fluid, such as the lift profile
given a particular airfoil geometry, providing a surrogate that may be optimized over. Machine
learning may also be used to directly solve the fluid optimization task, such as designing a machine
learning model to manipulate the behavior of the fluid for some engineering objective with active
control [2-4].
In either case, it is important to realize that machine learning is not an automatic or turn-key
procedure for extracting models from data. Instead, it requires expert human guidance at every stage
of the process, from deciding on the problem, to collecting and curating data that might inform the
model, to selecting the machine learning architecture best capable of representing or modeling the
data, to designing custom loss functions to quantify performance and guide the optimization, to
implementing specific optimization algorithms to train the machine learning model to minimize the
loss function over the data. A better name for machine learning might be “expert humans teaching
machines how to learn a model to fit some data,” although this is not as catchy. Particularly skilled
(or lucky!) experts may design a learner or a learning framework that is capable of learning a variety
of tasks, generalizing beyond the training data, and mimicking other aspects of intelligence. However,
such artificial intelligence is rare, even more so than human intelligence. The majority of machine
learning models are just that, models, which should fit directly into the decades-old practice of model-
based design, optimization, and control [5].
With its unprecedented success on many challenging problems in computer vision and natural
language processing, machine learning is rapidly entering the physical sciences, and fluid mechanics
is no exception. The simultaneous promise, and over-promise, of machine learning is causing many
researchers to have a healthy mixture of optimism and skepticism. In both cases, there is a strong
desire to understand the uses and limitations of machine learning, as well as best practices for how
to incorporate it into existing research and development workflows. It is also important to realize
that while it is now relatively simple to train a machine learning model for a well-defined task, it is
still quite difficult to create a new model that outperforms traditional numerical algorithms and
physics-based models. Incorporating partially known physics into the machine learning pipeline will
tend to improve model generalization and interpretability and explainability, which are key
elements of modern machine learning [6,7].
The researcher is constantly asking new questions and revising the data, the architecture, the loss
functions, and the optimization algorithm to improve performance. Here, we discuss these canonical
stages of machine learning, investigate how to incorporate physics, and review examples in the field
of fluid mechanics. This discussion is largely meant to be a high-level overview, and many more
details can be found in recent reviews [5, 8–10].
Figure 1.7.1 Schematic of the five stages of machine learning on an example of reduced-order
modeling. In this case, the goal is to learn a low-dimensional coordinate system z = f1(x; θ1) from
data in a high-dimensional representation x, along with a dynamical system model ż = f2(z; θ2)
for how the state z evolves in time. Finally, this latent state derivative ż must be able to approximate
the high-dimensional derivative ẋ through the decoder ẋ ≈ f3(ż; θ3). The loss function
L(θ; X) defines how well the model performs, averaged over the data X. Finally, the parameters
θ = {θ1, θ2, θ3} are found through optimization.
assumed to be discrete, then the task is clustering. After the clusters are identified and characterized,
these groupings may be used as proxy labels to then classify new data. If the structure in the data is
assumed to be continuously varying, then the task is typically thought of as an embedding or
dimensionality reduction task. Principal component analysis (PCA) or proper orthogonal
decomposition (POD) may be thought of as unsupervised learning tasks that seek a continuous
embedding of reduced dimension [11]. Reinforcement learning is a third, large branch of machine
learning research, in which an agent learns to make control decisions by interacting with an
environment for some high-level objective [12]. Examples include learning how to play games
[13,14], such as chess and Go.
Embedding physics: Deciding on what phenomena to model with machine learning is often
inherently related to the underlying physics. Although classical machine learning has been largely
applied to “static” tasks, such as image classification and the placement of advertisements,
increasingly it is possible to apply these techniques to model physical systems that evolve in time
according to some rules or physics. For example, we may formulate a learning problem to find and
represent a conserved quantity, such as a Hamiltonian, purely from data [15]. Alternatively, the
machine learning task may be to model time-series data as a differential equation, with the learning
algorithm representing the dynamical system [16–20]. Similarly, the task may involve learning a
coordinate transformation where these dynamics become simplified in some physical way; i.e.,
coordinate transformations to linearize or diagonalize/decouple dynamics [21–28].
Examples in fluid mechanics: There are many physical modeling tasks in fluid mechanics that are
benefiting from machine learning [5,9]. A large field of study focuses on formulating turbulence
closure modeling as a machine learning problem [8,29], such as learning models for the Reynolds
stresses [30, 31] or sub-grid scale turbulence [32,33]. Designing useful input features is also an
important way that prior physical knowledge is incorporated into turbulence closure modeling [34–
36]. Similarly, machine learning has recently been focused on the problem of improving
computational fluid dynamics (CFD) solvers [37–40]. Other important problems in fluid mechanics
that benefit from machine learning include super-resolution [41,42], robust modal decompositions
[1,43,44], network and cluster modeling [45-47], control [4, 48] and reinforcement learning [49, 50],
and design of experiments in cyber physical systems [51]. Aerodynamics is a large related field with
significant data-driven advances [52]. The very nature of these problems embeds the learning
process into a larger physics-based framework, so that the models are more physically relevant by
construction.
y = f(x ; θ)
Eq. 1.7.2
and this function is generally represented within a specified family of functions parameterized by
values in θ. For example, a linear regression model would model outputs as a linear function of the
inputs, with θ parameterizing this linear map, or matrix. Neural networks have emerged as a
particularly powerful and flexible class of models to represent functional relationships between data,
and they have been shown to be able to approximate arbitrarily complex functions with sufficient
data and depth [57,58]. There is a tremendous variety of potential neural network architectures [11],
limited only by the imagination of the human designer. The most common architecture is a simple
feedforward network, in which data enters through an input layer and maps sequentially through a
number of computational layers until an output layer. Each layer consists of nodes, where data from
nodes in the previous layer are combined in a weighted sum and processed through an activation
function, which is typically nonlinear. In this way, neural networks are fundamentally compositional
in nature. The parameters θ determine the network weights for how data is passed from one layer to
the next, i.e. the weighted connectivity matrices for how nodes are connected in adjacent layers. The
overarching network topology (i.e., how many layers, how large, what type of activation functions,
etc.) is specified by the architect or determined in a meta-optimization, thus determining the family
of functions that may be approximated by that class of network. Then, the network weights for the
specific architecture are optimized over the data to minimize a given loss function; these stages are
described next.
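A minimal sketch of such a feedforward network in NumPy; the layer sizes and the tanh activation are arbitrary illustrative choices:

# Feedforward sketch: each layer is a weighted sum followed by a nonlinear
# activation, so the network is compositional: y = W2 tanh(W1 x + b1) + b2.
# The parameters theta = {W1, b1, W2, b2} are what training would optimize.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 16, 2                 # illustrative layer sizes

W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

def forward(x):
    h = np.tanh(W1 @ x + b1)                     # hidden layer: weighted sum + activation
    return W2 @ h + b2                           # linear output layer

print(forward(np.array([0.1, -0.4, 0.7])))       # untrained output, shape (2,)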
It is important to note that not all machine learning architectures are neural networks, although they
are one of the most powerful and expressive modern architectures, powered by increasingly big data
and high performance computing. Before the success of deep convolutional networks on the
ImageNet dataset, neural networks were not even mentioned in the list of top ten machine learning
algorithms [59]. Random forests [60] and support vector machines [61] are two other leading
architectures for supervised learning. Bayesian methods are also widely used, especially for
dynamical systems [62]. Genetic programming has also been widely used to learn human-
interpretable, yet flexible representations of data for modeling [16,63-65] and control [4]. In
addition, standard linear regression and generalized linear regression are still widely used for
modeling time-series data, especially in fluids. The dynamic mode decomposition (DMD) [1,17,66]
employs linear regression with a low-rank constraint in the optimization to find dominant
spatiotemporal coherent structures that evolve linearly in time. The sparse identification of nonlinear
dynamics (SINDy) [18] algorithm employs generalized linear regression, with either a sparsity
promoting loss function [67] or a sparse optimization algorithm [18, 68], to identify a differential
equation model with as few model terms as are necessary to fit the data.
Embedding physics: Choosing a machine learning architecture with which to model the training
data is one of the most intriguing opportunities to embed physical knowledge into the learning
process. Among the simplest choices are convolutional networks for translationally invariant
systems, and recurrent networks, such as long short-term memory (LSTM) networks [20] or
reservoir computing [19,69], for systems that evolve in time. LSTMs have recently been used to
predict aeroelastic responses across a range of Mach numbers [70]. More generally, equivariant
networks seek to encode various symmetries by construction, which should improve accuracy and
reduce data requirements for physical systems [71–74]. Autoencoder networks enforce the physical
notion that there should be low-dimensional structure, even for high-dimensional data, by imposing
an information bottleneck, given by a constriction of the number of nodes in one or more layers of
the network. Such networks uncover nonlinear manifolds where the data is compactly represented,
generalizing the linear dimensionality reduction obtained by PCA and POD.
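A minimal autoencoder sketch in PyTorch; the 128-dimensional state, the 3-node bottleneck, and the layer widths are arbitrary illustrative choices:

# Autoencoder sketch: the 3-node latent layer is the information bottleneck
# that enforces low-dimensional structure in otherwise high-dimensional data.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 32), nn.Tanh(), nn.Linear(32, 3))
decoder = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 128))

x = torch.randn(64, 128)                         # batch of illustrative states
z = encoder(x)                                   # compact nonlinear representation
x_hat = decoder(z)                               # reconstruction from the bottleneck

loss = nn.functional.mse_loss(x_hat, x)          # reconstruction error to minimize
loss.backward()                                  # gradients for all network weights
print(z.shape)                                   # torch.Size([64, 3])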
It is also possible to embed physics more directly into the architecture, for example by incorporating
Hamiltonian [75, 76] or Lagrangian [77, 78] structure. There are numerous successful examples of
physics-informed neural networks (PINNs) [79–83], which solve supervised learning problems
while being constrained to satisfy a governing physical law. Graph neural networks have also shown
the ability to learn generalizable physics in a range of challenging domains [64, 84,85]. Deep operator
networks [86] are able to learn continuous operators, such as governing partial differential
equations, from relatively limited training data.
Examples in fluid mechanics: There are numerous examples of custom neural network
architectures being used to enforce physical solutions for applications in fluid mechanics. The work
of Ling et al. [30] designed a custom neural network layer that enforced Galilean invariance in the
Reynolds stress tensors that they were modeling. Related Reynolds stress models have been
developed using the SINDy sparse modeling approach [87-89]. Hybrid models that combine linear
system identification and nonlinear neural networks have been used to model complex aeroelastic
systems [90]. The hidden fluid mechanics (HFM) approach is a physics-informed neural network
strategy that encodes the Navier-Stokes equations while being flexible to the boundary conditions
and geometry of the problem, enabling impressive physically quantifiable flow field estimations from
limited data [91]. Sparse sensing has also been used to recover pressure distributions around airfoils
[92]. The Fourier neural operator is a novel operator network that performs super-resolution
upscaling and simulation modeling tasks [93]. Equivariant convolutional networks have been
designed and applied to enforce symmetries in high-dimensional complex systems from fluid
dynamics [73]. Physical invariances have also been incorporated into neural networks for sub grid-
scale scalar flux modeling [94]. Lee and Carlberg [95] recently showed how to incorporate deep
convolutional autoencoder networks into the broader reduced-order modeling framework [96–98],
taking advantage of the superior dimensionality reduction capabilities of deep autoencoders.
data, is a common term in the loss function. In addition, other terms may be added to regularize the
optimization (e.g., the L1 or L2 norm of the parameters θ to promote parsimony and prevent
overfitting). Thus, the loss function typically balances multiple competing objectives, such as model
performance and model complexity. The loss function may also incorporate terms used to promote
a specific behavior across different sub-networks in a neural network architecture. Importantly, the
loss function will provide valuable information used to approximate gradients required to optimize
the parameters.
Embedding physics: Most of the physics-informed architectures described above involve custom
loss functions to promote the efficient training of accurate models. It is also possible to incorporate
physical priors, such as sparsity, by adding L1 or L0 regularizing loss terms on the parameters in θ. In
fact, parsimony has been a central theme in physical modeling for centuries, where it is believed that
balancing model complexity with descriptive capability is essential in developing models that
generalize. The sparse identification of nonlinear dynamics algorithm [18] learns dynamical systems
models with as few terms from a library of candidate terms as are needed to describe the training
data. There are several formulations involving different loss terms and optimization algorithms that
promote additional physical notions, such as stability [99] and energy conservation [100]. Stability
promoting loss functions based on notions of Lyapunov stability have also been incorporated into
autoencoders, with impressive results on fluid systems [101].
Examples in fluid mechanics: Sparse nonlinear modeling has been used extensively in fluid
mechanics, adding sparsity-promoting loss terms to learn parsimonious models that prevent
overfitting and generalize to new scenarios. SINDy has been used to generate reduced-order models
for how dominant coherent structures evolve in a flow for a range of configurations [100,102-105].
These models have also been extended to develop compact closure models [87–89]. Recently, the
physical notion of boundedness of solutions, which is a fundamental concept in reduced-order
models of fluids [106], has been incorporated into the SINDy modeling framework as a novel loss
function. Other physical loss functions may be added, such as adding the divergence of a flow field as
a loss term to promote solutions that are incompressible [107].
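A sketch of such a term, assuming a PyTorch network that maps coordinates (x, y) to a 2D velocity (u, v); the network shape, collocation points, and the idea of adding it to a data-fit loss are illustrative:

# Physical loss sketch: penalize the divergence du/dx + dv/dy of a predicted
# velocity field so the optimizer favors incompressible solutions.
import torch
import torch.nn as nn

u_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))  # (x,y) -> (u,v)

xy = torch.rand(256, 2, requires_grad=True)      # illustrative collocation points
uv = u_net(xy)

du = torch.autograd.grad(uv[:, 0].sum(), xy, create_graph=True)[0]  # rows: (du/dx, du/dy)
dv = torch.autograd.grad(uv[:, 1].sum(), xy, create_graph=True)[0]  # rows: (dv/dx, dv/dy)
divergence = du[:, 0] + dv[:, 1]

loss = (divergence**2).mean()                    # add to the data-fit loss in practice
loss.backward()                                  # gradients flow back to u_net's weights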
The sparse relaxed regularized regression (SR3) optimization framework [68] has been developed specifically
to handle challenging non-convex loss terms that arise in physically motivated problems.
Examples in fluid mechanics: Loiseau [100] showed that it is possible to enforce energy
conservation for incompressible fluid flows directly by imposing skew-symmetry constraints on the
quadratic terms of a sparse generalized linear regression (i.e. SINDy) model. These constraints
manifest as equality constraints on the sparse coefficients θ of the SINDy model. Because the standard
SINDy optimization procedure is based on a sequentially thresholded least-squares procedure, it is
possible to enforce these equality constraints at every stage of the regression, using the Karush–
Kuhn–Tucker (KKT) conditions. The SR3 optimization package [68] was developed to generalize and
extend these constrained optimization problems to more challenging constraints, and to more
generic optimization problems. This is only one of many examples of custom optimization algorithms
being developed to train machine learning models with novel loss functions or architectures.
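A minimal sketch of the sequentially thresholded least-squares idea behind the standard SINDy optimization; the candidate library, threshold, and data below are illustrative:

# STLSQ sketch: repeatedly solve least squares and zero out coefficients
# smaller than a sparsity threshold, refitting the surviving terms.
import numpy as np

def stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]   # initial dense fit
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold                 # terms to prune
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):                 # refit the surviving terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
    return Xi

Theta = np.random.randn(200, 6)                        # candidate-term library, e.g. [1, x, y, x^2, xy, y^2]
Xi_true = np.array([[0, 1.0], [-0.5, 0], [0, 0], [0, 0], [2.0, 0], [0, 0]])
dXdt = Theta @ Xi_true                                 # noiseless time derivatives
print(np.round(stlsq(Theta, dXdt), 3))                 # recovers the sparse Xi_true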
1.7.4.5 References
[1] Kunihiko Taira, Steven L Brunton, Scott Dawson, Clarence W Rowley, Tim Colonius, Beverley J
McKeon, Oliver T Schmidt, Stanislav Gordeyev, Vassilios Theofilis, and Lawrence S Ukeiley. Modal
analysis of fluid flows: An overview. AIAA Journal, 55(12):4013–4041, 2017.
[2] Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Réglade, and Nicolas Cerardi. Artificial neural
networks trained through deep reinforcement learning discover control strategies for active flow
control. Journal of fluid mechanics, 865:281–302, 2019.
[3] Feng Ren, Hai-bao Hu, and Hui Tang. Active flow control using machine learning: A brief review.
Journal of Hydrodynamics, 32(2):247–253, 2020.
[4] Yu Zhou, Dewei Fan, Bingfu Zhang, Ruiying Li, and Bernd R Noack. Artificial intelligence control
of a turbulent jet. Journal of Fluid Mechanics, 897, 2020.
[5] Steven L. Brunton, Bernd R. Noack, and Petros Koumoutsakos. Machine learning for fluid
mechanics. Annual Review of Fluid Mechanics, 52:477–508, 2020.
[6] Mengnan Du, Ninghao Liu, and Xia Hu. Techniques for interpretable machine learning.
Communications of the ACM, 63(1):68–77, 2019.
[7] Christoph Molnar. Interpretable machine learning. Lulu.com, 2020.
[8] Karthik Duraisamy, Gianluca Iaccarino, and Heng Xiao. Turbulence modeling in the age of data.
Annual Reviews of Fluid Mechanics, 51:357–377, 2019.
[9] MP Brenner, JD Eldredge, and JB Freund. Perspective on machine learning for advancing fluid
mechanics. Physical Review Fluids, 4(10):100501, 2019.
[10] Michael P Brenner and Petros Koumoutsakos. Machine learning and physical review fluids: An
editorial perspective. Physical Review Fluids, 6(7):070001, 2021.
[11] S. L. Brunton and J. N. Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical
Systems, and Control. Cambridge University Press, 2019.
[12] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT
press Cambridge, 1998.
[13] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G
Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level
control through deep reinforcement learning. Nature, 518(7540):529, 2015.
[14] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez,
Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without
human knowledge. Nature, 550(7676):354–359, 2017.
[15] Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Discovering conservation laws from data
for control. In 2018 IEEE Conference on Decision and Control (CDC), pages 6415–6421. IEEE, 2018.
[16] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data.
Science, 324(5923):81–85, 2009.
[17] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid
Mechanics, 656:5–28, August 2010.
[18] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse
identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences,
113(15):3932–3937, 2016.
[19] Jaideep Pathak, Zhixin Lu, Brian R Hunt, Michelle Girvan, and Edward Ott. Using machine learning
to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: An
Interdisciplinary Journal of Nonlinear Science, 27(12):121102, 2017.
[20] Pantelis R Vlachas, Wonmin Byeon, Zhong Y Wan, Themistoklis P Sapsis, and Petros
Koumoutsakos. Data-driven forecasting of high-dimensional chaotic systems with long short-term
memory networks. Proc. R. Soc. A, 474(2213):20170844, 2018.
[21] Bethany Lusch, J Nathan Kutz, and Steven L Brunton. Deep learning for universal linear
embeddings of nonlinear dynamics. Nature Communications, 9(1):4950, 2018.
[22] Christoph Wehmeyer and Frank Noé. Time-lagged autoencoders: Deep learning of slow
collective variables for molecular kinetics. The Journal of Chemical Physics, 148(241703):1–9, 2018.
[23] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. VAMPnets: Deep learning of molecular
kinetics. Nature Communications, 9(5), 2018.
[24] Naoya Takeishi, Yoshinobu Kawahara, and Takehisa Yairi. Learning Koopman invariant
subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems,
pages 1130–1140, 2017.
[25] Qianxiao Li, Felix Dietrich, Erik M Bollt, and Ioannis G Kevrekidis. Extended dynamic mode
decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the
Koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(10):103111, 2017.
[26] Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations
for Koopman operators of nonlinear dynamical systems. arXiv preprint arXiv:1708.06850, 2017.
[27] Samuel E Otto and Clarence W Rowley. Linearly-recurrent autoencoder networks for learning
dynamics. SIAM Journal on Applied Dynamical Systems, 18(1):558–593, 2019.
[28] K. Champion, B. Lusch, J. Nathan Kutz, and Steven L. Brunton. Data-driven discovery of
coordinates and governing equations. Proceedings of the National Academy of Sciences,
116(45):22445–22451, 2019.
[29] Shady E Ahmed, Suraj Pawar, Omer San, Adil Rasheed, Traian Iliescu, and Bernd R Noack. On
closures for reduced order models: a spectrum of first-principle to machine-learned avenues. arXiv
preprint arXiv:2106.14954, 2021.
[30] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling
using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 2016.
[31] J Nathan Kutz. Deep learning in fluid dynamics. Journal of Fluid Mechanics, 814:1–4, 2017.
[32] Romit Maulik, Omer San, Adil Rasheed, and Prakash Vedula. Sub grid modelling for two
dimensional turbulence using neural networks. Journal of Fluid Mechanics, 858:122–144, 2019.
[33] Guido Novati, Hugues Lascombes de Laroussilhe, and Petros Koumoutsakos. Automating
turbulence modelling by multi-agent reinforcement learning. Nature Machine Intelligence, 2021.
[34] Jian-Xun Wang, Jin-Long Wu, and Heng Xiao. Physics-informed machine learning approach for
reconstructing Reynolds stress modeling discrepancies based on DNS data. Physical Review Fluids,
2(3):034603, 2017.
[35] Linyang Zhu, Weiwei Zhang, Jiaqing Kou, and Yilang Liu. Machine learning methods for
turbulence modeling in subsonic flows around airfoils. Physics of Fluids, 31(1):015105, 2019.
[36] Linyang Zhu, Weiwei Zhang, Xuxiang Sun, Yilang Liu, and Xianxu Yuan. Turbulence closure for
high reynolds number airfoil flows by deep neural networks. Aerospace Science and Technology,
110:106452, 2021.
[37] Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, and Michael P Brenner. Learning data-driven
discretization for partial differential equations. Proceedings of the National Academy of Sciences,
116(31):15344–15349, 2019.
[38] Stephan Thaler, Ludger Paehler, and Nikolaus A Adams. Sparse identification of truncation
errors. Journal of Computational Physics, 397:108851, 2019.
[39] Ben Stevens and Tim Colonius. Enhancement of shock-capturing methods via machine learning.
Theoretical and Computational Fluid Dynamics, 34:483–496, 2020.
[40] Dmitrii Kochkov, Jamie A Smith, Ayya Alieva, Qing Wang, Michael P Brenner, and Stephan Hoyer.
Machine learning accelerated computational fluid dynamics. arXiv preprint arXiv:2102.01010, 2021.
[41] N Benjamin Erichson, Lionel Mathelin, Zhewei Yao, Steven L Brunton, Michael W Mahoney, and
J Nathan Kutz. Shallow neural networks for fluid flow reconstruction with limited sensors.
Proceedings of the Royal Society A, 476(2238):20200097, 2020.
[42] Kai Fukami, Koji Fukagata, and Kunihiko Taira. Super-resolution reconstruction of turbulent
flows with machine learning. Journal of Fluid Mechanics, 870:106–120, 2019.
[43] Kunihiko Taira, Maziar S Hemati, Steven L Brunton, Yiyang Sun, Karthik Duraisamy, Shervin
Bagheri, Scott Dawson, and Chi-An Yeh. Modal analysis of fluid flows: Applications and outlook. AIAA
Journal, 58(3):998–1022, 2020.
[44] Isabel Scherl, Benjamin Strom, Jessica K Shang, Owen Williams, Brian L Polagye, and Steven L
Brunton. Robust principal component analysis for particle image velocimetry. Physical Review
Fluids, 5(054401), 2020.
[45] Aditya G Nair and Kunihiko Taira. Network-theoretic approach to sparsified discrete vortex
dynamics. Journal of Fluid Mechanics, 768:549–571, 2015.
[46] E. Kaiser, B. R. Noack, L. Cordier, A. Spohn, M. Segond, M. Abel, G. Daviller, J. Östh, S. Krajnović,
and R. K. Niven. Cluster-based reduced-order modelling of a mixing layer. J. Fluid Mech., 2014.
[47] Daniel Fernex, Bernd R Noack, and Richard Semaan. Cluster-based network modeling from
snapshots to complex dynamical systems. Science Advances, 7(25):eabf5006, 2021.
[48] Guy Y Cornejo Maceda, Yiqing Li, François Lusseyran, Marek Morzyński, and Bernd R Noack.
Stabilization of the fluidic pinball with gradient-enriched machine learning control. Journal of Fluid
Mechanics, 917, 2021.
[49] Dixia Fan, Liu Yang, Zhicheng Wang, Michael S Triantafyllou, and George Em Karniadakis.
Reinforcement learning for bluff body active flow control in experiments and simulations.
Proceedings of the National Academy of Sciences, 117(42):26091–26098, 2020.
[50] Siddhartha Verma, Guido Novati, and Petros Koumoutsakos. Efficient collective swimming by
harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of
Sciences, 115(23):5849–5854, 2018.
[51] Dixia Fan, Gurvan Jodin, TR Consi, L Bonfiglio, Y Ma, LR Keyes, George E Karniadakis, and Michael
S Triantafyllou. A robotic intelligent towing tank for learning complex fluid structure dynamics.
Science Robotics, 4(36), 2019.
[52] Jiaqing Kou and Weiwei Zhang. Data-driven modeling for unsteady aerodynamics and
aeroelasticity. Progress in Aerospace Sciences, 125:100725, 2021.
[53] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems, 2012.
[54] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale
hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition,
pages 248–255. IEEE, 2009.
[55] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[56] Xuhui Meng and George Em Karniadakis. A composite neural network that learns from multi-
fidelity data: Application to function approximation and inverse PDE problems. Journal of
Computational Physics, 401:109020, 2020.
[57] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are
universal approximators. Neural networks, 2(5):359–366, 1989.
[58] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural networks,
4(2):251–257, 1991.
[59] Xindong Wu, Vipin Kumar, J Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey
J McLachlan, Angus Ng, Bing Liu, S Yu Philip, et al. Top 10 algorithms in data mining. Knowledge and
Information Systems, 14(1):1–37, 2008.
[60] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[61] Bernhard Schölkopf and Alexander J Smola. Learning with kernels: support vector machines,
regularization, optimization, and beyond. MIT press, 2002.
[62] Antoine Blanchard and Themistoklis Sapsis. Bayesian optimization with output-weighted
optimal sampling. Journal of Computational Physics, 425:109901, 2021.
[63] Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems.
Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.
[64] Miles D Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho. Learning symbolic physics with graph
networks. arXiv preprint arXiv:1909.05862, 2019.
[65] Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel,
and Shirley Ho. Discovering symbolic models from deep learning with inductive biases. arXiv
preprint arXiv:2006.11287, 2020.
[66] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic Mode Decomposition: Data-
Driven Modeling of Complex Systems. SIAM, 2016.
[67] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal
Statistical Society. Series B (Methodological), pages 267–288, 1996.
[68] Peng Zheng, Travis Askham, Steven L Brunton, J Nathan Kutz, and Aleksandr Y Aravkin. Sparse
relaxed regularized regression: SR3. IEEE Access, 7(1):1404–1423, 2019.
[69] Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott. Model-free prediction
of large spatiotemporally chaotic systems from data: a reservoir computing approach. Physical
review letters, 120(2):024102, 2018.
[70] Kai Li, Jiaqing Kou, and Weiwei Zhang. Deep neural network for unsteady aerodynamic and
aeroelastic modeling across multiple Mach numbers. Nonlinear Dynamics, 96(3):2157–2177, 2019.
[71] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick
Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point
clouds. arXiv preprint arXiv:1802.08219, 2018.
[72] Benjamin Kurt Miller, Mario Geiger, Tess E Smidt, and Frank Noé. Relevance of rotationally
equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461,
2020.
[73] Rui Wang, Robin Walters, and Rose Yu. Incorporating symmetry into deep dynamics models for
improved generalization. arXiv preprint arXiv:2002.03061, 2020.
[74] Simon Batzner, Tess E Smidt, Lixin Sun, Jonathan P Mailoa, Mordechai Kornbluth, Nicola
Molinari, and Boris Kozinsky. Se (3)-equivariant graph neural networks for data-efficient and
accurate interatomic potentials. arXiv preprint arXiv:2101.03164, 2021.
[75] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances
in Neural Information Processing Systems, 32:15379–15389, 2019.
[76] Marc Finzi, Ke Alexander Wang, and Andrew Gordon Wilson. Simplifying Hamiltonian and
Lagrangian neural networks via explicit constraints. Advances in Neural Information Processing
Systems, 33, 2020.
[77] Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho.
Lagrangian neural networks. arXiv preprint arXiv:2003.04630, 2020.
[78] Yaofeng Desmond Zhong and Naomi Leonard. Unsupervised learning of Lagrangian dynamics
from images for prediction and control. Advances in Neural Information Processing Systems, 2020.
[79] M Raissi, P Perdikaris, and GE Karniadakis. Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems involving nonlinear partial differential
equations. Journal of Computational Physics, 378:686–707, 2019.
[80] Guofei Pang, Lu Lu, and George Em Karniadakis. fPINNs: Fractional physics-informed neural
networks. SIAM Journal on Scientific Computing, 41(4):A2603–A2626, 2019.
[81] Liu Yang, Dongkun Zhang, and George Em Karniadakis. Physics-informed generative adversarial
networks for stochastic differential equations. SIAM Journal on Scientific Computing, 2020.
[82] Zhiping Mao, Ameya D Jagtap, and George Em Karniadakis. Physics-informed neural networks
for high-speed flows. Computer Methods in Applied Mechanics and Engineering, 360:112789, 2020.
[83] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang.
Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
[84] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi,
Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational
inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
35
[85] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter
Battaglia. Learning to simulate complex physics with graph networks. In International Conference on
Machine Learning, pages 8459–8468. PMLR, 2020.
[86] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning
nonlinear operators via deeponet based on the universal approximation theorem of operators.
Nature Machine Intelligence, 3(3):218–229, 2021.
[87] S Beetham and J Capecelatro. Formulating turbulence closures using sparse regression with
embedded form invariance. Physical Review Fluids, 5(8):084611, 2020.
[88] Sarah Beetham, Rodney O Fox, and Jesse Capecelatro. Sparse identification of multiphase
turbulence closures for coupled fluid–particle flows. Journal of Fluid Mechanics, 914, 2021.
[89] Martin Schmelzer, Richard P Dwight, and Paola Cinnella. Discovery of algebraic Reynolds stress
models using sparse symbolic regression. Flow, Turbulence and Combustion, 104(2):579-603, 2020.
[90] Jiaqing Kou andWeiwei Zhang. A hybrid reduced-order framework for complex aeroelastic
simulations. Aerospace science and technology, 84:880–894, 2019.
[91] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning
velocity and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
[92] Xuan Zhao, Lin Du, Xuhao Peng, Zichen Deng, and Weiwei Zhang. Research on refined
reconstruction method of airfoil pressure based on compressed sensing. Theoretical and Applied
Mechanics Letters, page 100223, 2021.
[93] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya,
Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential
equations. arXiv preprint arXiv:2010.08895, 2020.
[94] Hugo Frezat, Guillaume Balarac, Julien Le Sommer, Ronan Fablet, and Redouane Lguensat.
Physical invariance in neural networks for sub grid-scale scalar flux modeling. Physical Review
Fluids, 6(2):024607, 2021.
[95] Kookjin Lee and Kevin T Carlberg. Model reduction of dynamical systems on nonlinear manifolds
using deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020.
[96] B. R. Noack, K. Afanasiev, M. Morzynski, G. Tadmor, and F. Thiele. A hierarchy of lowdimensional
models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics, 2003.
[97] Peter Benner, Serkan Gugercin, and Karen Willcox. A survey of projection-based model
reduction methods for parametric dynamical systems. SIAM review, 57(4):483–531, 2015.
[98] Clarence W Rowley and Scott TM Dawson. Model reduction for flow analysis and control. Annual
Review of Fluid Mechanics, 49:387–417, 2017.
[99] Alan A Kaptanoglu, Jared L Callaham, Christopher J Hansen, Aleksandr Aravkin, and Steven L
Brunton. Promoting global stability in data-driven models of quadratic nonlinear dynamics. arXiv
preprint arXiv:2105.01843, 2021.
[100] J. C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid
Mechanics, 838:42–67, 2018.
[101] N Benjamin Erichson, Michael Muehlebach, and Michael W Mahoney. Physics-informed
autoencoders for lyapunov-stable fluid flow prediction. arXiv preprint arXiv:1905.10866, 2019.
[102] J. C. Loiseau, B. R. Noack, and S. L. Brunton. Sparse reduced-order modeling: sensor-based
dynamics to full-state estimation. Journal of Fluid Mechanics, 844:459–490, 2018.
[103] Jean-Christophe Loiseau. Data-driven modeling of the chaotic thermal convection in an annular
thermosyphon. Theoretical and Computational Fluid Dynamics, 34(4):339–365, 2020.
[104] Nan Deng, Bernd R Noack, Marek Morzy´ nski, and Luc R Pastur. Low-order model for
successive bifurcations of the fluidic pinball. Journal of fluid mechanics, 884, 2020.
[105] Nan Deng, Bernd R Noack, Marek Morzy´ nski, and Luc R Pastur. Galerkin force model for
transient and post-transient dynamics of the fluidic pinball. Journal of Fluid Mechanics, 918, 2021.
[106] M. Schlegel and B. R. Noack. On long-term boundedness of Galerkin models. Journal of Fluid
Mechanics, 765:325–352, 2015.
36
[107] Rui Wang, Karthik Kashinath, Mustafa Mustafa, Adrian Albert, and Rose Yu. Towards physics-
informed deep learning for turbulent flow prediction. In Proceedings of the 26th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, pages 1457–1466, 2020.
[108] Michael Grant, Stephen Boyd, and Yinyu Ye. Cvx: Matlab software for disciplined convex
programming, 2008.
[109] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press,
2009.
[110] SBj Pope. Amore general effective-viscosity hypothesis. Journal of Fluid Mechanics, 72(2):331
37
Figure 1.8.1 Sample results from the work by Kochkov et al. [56], where the instantaneous vorticity
field is shown for (top) the simulation with original resolution, (middle) low-resolution data based on
the ML model and (bottom) low-resolution data based on a simulation with the same coarse resolution.
Four different time steps are shown, and some key vortical structures are highlighted with yellow
squares. Reprinted from Ref. [56], with permission of the publisher (United States National Academy of
Sciences)
The fine computational meshes required to resolve the smallest scales lead to exceedingly high computational costs, which increase with the Reynolds number [1].
A number of machine learning approaches have been developed recently to improve the efficiency of
DNS. Bar-Sinai et al. [2] proposed a technique based on deep learning to estimate spatial derivatives
in low-resolution grids, outperforming standard finite-difference methods. A similar approach was
developed by Stevens and Colonius [3] to improve the results of fifth-order finite-difference schemes
in the context of shock-capturing simulations. Other strategies to improve the performance of PDE
solvers in coarser meshes have been developed by Li et al. [4-6]. Recently, Kochkov et al. [7]
considered the two-dimensional Kolmogorov flow [8], which maintains fluctuations via a forcing
term. They leveraged deep learning to develop a correction between fine and coarse resolution
simulations, obtaining excellent agreement with reference simulations in meshes from 8 to 10 times
coarser in each dimension, as shown in Figure 1.8.1. These results promise to significantly reduce
the computational cost of relevant fluid simulations, including weather [9], climate [10], engineering
[11], and astrophysics [12].
Jeon and Kim [13] proposed to use a deep neural network to simulate the well-known finite-volume
discretization scheme [14] employed in fluid simulations. They tested their method with reactive
flows, obtaining excellent agreement with reference high-resolution data at one tenth the
computational cost. However, they also documented errors with respect to the reference solution
which increased with time. Another deep-learning approach, based on a fully-convolutional/long-
short-term-memory (LSTM) network, was proposed by Stevens and Colonius [15] to improve the
accuracy of finite-difference/finite-volume methods. Recent developments in CFD for turbomachinery, with an emphasis on turbulence, make use of machine learning techniques to augment prediction accuracy, speed up prediction times, analyze and manage uncertainty, and reconcile simulations with available data [16].
1.8.1 References
[1] H. Choi and P. Moin. Grid-point requirements for large eddy simulation: Chapman’s estimates
revisited. Physics of Fluids, 24:011702, 2012.
[2] Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner. Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences, 116(31), 2019.
[3] B. Stevens and T. Colonius. Enhancement of shock-capturing methods via machine learning.
Theoretical and Computational Fluid Dynamics, 34:483–496, 2020.
[4] K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using
deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020.
[5] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar.
Fourier neural operator for parametric partial differential equations. arXiv preprint
arXiv:2010.08895, 2020.
[6] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Multipole graph neural operator for parametric partial differential equations. arXiv preprint arXiv:2006.09535, 2020.
[7] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer. Machine learning-
accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences,
118:e2101784118, 2021.
[8] G. J. Chandler and R. R. Kerswell. Invariant recurrent solutions embedded in a turbulent two-
dimensional Kolmogorov flow. Journal of Fluid Mechanics, 722:554–595, 2013.
[9] P. Bauer, A. Thorpe, and G. Brunet. The quiet revolution of numerical weather prediction. Nature,
525:47–55, 2015.
[10] F. Schenk, M. Väliranta, F. Muschitiello, L. Tarasov, M. Heikkilä, S. Björck, J. Brandefelt, A. V. Johansson, J. O. Näslund, and B. Wohlfarth. Warm summers during the Younger Dryas cold reversal. Nature Communications, 9:1634, 2018.
[11] R. Vinuesa, P. S. Negi, M. Atzori, A. Hanifi, D. S. Henningson, and P. Schlatter. Turbulent boundary layers around wing sections up to Re_c = 10^6. International Journal of Heat and Fluid Flow, 2018.
[12] C. Aloy Torás, P. Mimica, and M. Martínez-Sober. Towards detecting structures in computational astrophysics plasma simulations: using machine learning for shock front classification. In Artificial Intelligence Research and Development. Z. Falomir et al. (Eds.), 2018.
[13] J. Jeon and S. J. Kim. FVM Network to reduce computational cost of CFD simulation. Preprint
arXiv:2105.03332, 2021.
[14] R. Eymard, T. Gallouët, and R. Herbin. Finite volume methods. Handbook of Numerical Analysis, 7:713–1018, 2000.
[15] B. Stevens and T. Colonius. Finitenet: A fully convolutional LSTM network architecture for time-
dependent partial differential equations. arXiv preprint arXiv:2002.03014, 2020.
[16] Hammond, J.; Pepper, N.; Montomoli, F.; Michelassi, V. Machine Learning Methods in CFD for
Turbomachinery: A Review. Int. J. Turbomachinery Propulsion Power 2022, 7, 16.
A Radial Basis Network (RBN) consists of an input layer, a layer with RBF neurons, and an output layer. The RBF neurons store the actual classes for each of the training data instances. The RBN differs from the usual multilayer perceptron because of the radial basis function used as the activation function. When new data is fed into the network, the RBF neurons compute the Euclidean distance between the feature values and the class prototypes stored in the neurons. This is similar to finding which cluster a particular instance belongs to: the class at minimum distance is assigned as the predicted class. RBNs are used mostly in function approximation applications such as power restoration systems (Figure 2.2.3).

Figure 2.2.3 Radial Basis Function
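To make the nearest-prototype logic concrete, a minimal Python sketch is given below; the Gaussian width sigma and the toy data are illustrative assumptions, not part of any cited work:

import numpy as np

def rbf_predict(x, prototypes, labels, sigma=1.0):
    # Each stored training instance acts as an RBF neuron; the class of the
    # prototype with the strongest Gaussian activation (i.e. the smallest
    # Euclidean distance) is returned as the prediction.
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    activations = np.exp(-d2 / (2.0 * sigma ** 2))
    return labels[np.argmax(activations)]

# Toy usage: two clusters with known classes
prototypes = np.array([[0.0, 0.0], [0.1, -0.2], [3.0, 3.1], [2.9, 3.0]])
labels = np.array([0, 0, 1, 1])
print(rbf_predict(np.array([2.8, 3.2]), prototypes, labels))  # prints 1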
2.2.5 Convolutional Neural Networks
When it comes to image classification, the most used neural networks are Convolutional Neural Networks (CNNs). CNNs contain multiple convolution layers which are responsible for the extraction of important features from the image (Figure 2.2.4). The earlier layers are responsible for low-level details and the later layers are responsible for more high-level features. The convolution operation uses a custom matrix, also called a filter, to convolve over the input image and produce feature maps. These filters are initialized randomly and then updated via backpropagation. One example of such a filter is the Canny edge detector, which is used to find the edges in an image.
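To make the filtering operation concrete, the short sketch below (an illustration, not code from a cited work) convolves a toy image with a hand-written 3 x 3 edge filter using SciPy; in a CNN the filter entries would instead be learned by backpropagation:

import numpy as np
from scipy.signal import convolve2d

# A toy 8 x 8 "image": a bright square on a dark background
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Hand-written Laplacian-style edge filter (a CNN would learn these weights)
edge_filter = np.array([[0.0, -1.0, 0.0],
                        [-1.0, 4.0, -1.0],
                        [0.0, -1.0, 0.0]])

feature_map = convolve2d(image, edge_filter, mode="valid")
print(feature_map)  # nonzero responses mark the edges of the square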
2.2.8 Case Study - Prediction & Comparison of the Maximal Wall Shear Stress (MWSS) for Carotid Artery Bifurcation
Steady-state simulations for 1886 geometries were undertaken and MWSS values were calculated for each of them. This dataset was used for training and testing the following data mining algorithms: k-nearest neighbors, linear regression, multilayer perceptron neural network, random forest, and support vector machine. The results are compared using the relative Root Mean Square Error (RMSE):
Figure 2.2.7 Maximal Wall Shear Stress (MWSS) Value for Carotid Artery Bifurcation
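A minimal sketch of such a model comparison with scikit-learn follows; the synthetic feature matrix, targets, and the range-based normalization of the relative RMSE are illustrative assumptions, standing in for the features extracted from the 1886 bifurcation geometries:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(1886, 6))                             # stand-in geometric features
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=1886)   # stand-in MWSS values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "k-nearest neighbors": KNeighborsRegressor(),
    "linear regression": LinearRegression(),
    "multilayer perceptron": MLPRegressor(max_iter=2000),
    "random forest": RandomForestRegressor(),
    "support vector machine": SVR(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rrmse = np.sqrt(np.mean((pred - y_te) ** 2)) / (y_te.max() - y_te.min())
    print(f"{name}: relative RMSE = {rrmse:.3f}")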
2.4 Field Inversion and Machine Learning in Support of a Data-Driven Environment
A machine learning technique such as an Artificial Neural Network (ANN) can be adequately described by field inversion in a data-driven context. The calibration cases (offline data) are the few configurations for which data (DNS or experimental) are available, such as the one shown in Figure 2.4.1. The prediction cases (machine learning with no data) have a similar configuration but differ in (1) twist, (2) sweep angle, and (3) airfoil shape23. The challenge in predictive modeling, however, is to extract an optimal model form that is sufficiently accurate. Constructing such a model and demonstrating its predictive capabilities for a class of problems is the objective.

Figure 2.4.1 Calibration Cases for Offline Data
Figure 2.5.1 Network Diagram for a feed-forward NN with three inputs and one output
23Heng Xiao, “Physics-Informed Machine Learning for Predictive Turbulence Modeling: Status, Perspectives,
and Case Studies”, Machine Learning Technologies and Their Applications to Scientific and Engineering
Domains Workshop, August 17, 2016.
The hidden-layer values are formed by weighting the inputs and passing them through nonlinear activation functions (Singh, Medida, & Duraisamy, 2016)24. The process is repeated once for each hidden layer (marked blue in Figure 2.5.1) in the network, until the output layer is reached. Figure 2.5.1 presents a sample ANN: a network diagram for a feed-forward NN with three inputs, two hidden layers, and one output. For this sample network, the values of the hidden nodes z1,1 through z1,H1 are constructed as

z1,i = a1 ( Σ_{j=1..3} w1_{i,j} ηj )
Eq. 2.5.1

where a1 and w1_{i,j} are the activation function and weights associated with the first hidden layer, respectively. Similarly, the second layer of hidden nodes is constructed as

z2,i = a2 ( Σ_{j=1..H1} w2_{i,j} z1,j )
Eq. 2.5.2

where H1 is the number of nodes in the first hidden layer.
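A minimal NumPy sketch of this forward pass is given below; the layer sizes and the tanh activation are illustrative assumptions rather than choices taken from the cited work:

import numpy as np

def forward(eta, weights, activations):
    # Feed-forward pass: each layer applies z = a(W @ z), as in Eqs. 2.5.1-2.5.2
    z = eta
    for W, a in zip(weights, activations):
        z = a(W @ z)
    return z

rng = np.random.default_rng(0)
eta = rng.normal(size=3)              # three inputs
W1 = rng.normal(size=(5, 3))          # first hidden layer (H1 = 5)
W2 = rng.normal(size=(4, 5))          # second hidden layer (H2 = 4)
W3 = rng.normal(size=(1, 4))          # single output node
output = forward(eta, [W1, W2, W3], [np.tanh, np.tanh, lambda x: x])
print(output)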
24 Singh, A. P., Medida, S., & Duraisamy, K. (2016). Machine Learning-augmented Predictive Modeling of
Turbulent Separated. arXiv:1608.03990v3 [cs.CE].
25 Zhang, Z. J. and Duraisamy, K., “Machine Learning Methods for Data-Driven Turbulence Modeling,” 22nd AIAA
Computational Fluid Dynamics Conference, AIAA Aviation, (AIAA 2015-2460), Dallas, TX, Jun 2015.
26 S. Müller, M. Milano and P. Koumoutsakos, “Application of machine learning algorithms to flow modeling and optimization”, Annual Research Briefs, Center for Turbulence Research, 1999.
v = V + Σ_n an(t) φn(x)
Eq. 2.5.4

where V is the time-averaged flow and φn is the set of the first n eigenvectors of the covariance matrix C = E[(vi − V)(vj − V)]; when this representation of v is substituted into the Navier-Stokes equations, the original PDE model is transformed into an ODE model composed of n equations. The POD can be expressed as a multi-layer feed-forward neural network. Such a network is defined by the number of layers, the specification of the output function for the neurons in each layer, and the weight matrices for each layer. [Baldi and Hornik]28 have shown that training a linear neural network structure to perform an identity mapping on a set of vectors is equivalent to obtaining the POD of this set of vectors. A neural network performing the linear POD can be specified as a two-layer linear network:
x = W1 v
v̂ = W2 x
Eq. 2.5.5
where v̂ is the reconstructed field, v is the original flow field, having N components, x is the reduced order representation of the field, having n components, and W1 and W2 are the network weight matrices, of sizes n x N and N x n respectively. Non-linearity can be introduced by a simple extension to this basic network:
x = W2 tanh(W1 v)
v̂ = W4 tanh(W3 x)
Eq. 2.5.6
This corresponds to a neural network model with 4 layers: the first one, with an m x N weight matrix W1, nonlinear; the second one, with an n x m weight matrix W2, linear; the third one, also nonlinear, with an m x n weight matrix W3; and the last one, linear, with an N x m weight matrix W4. However, the resulting system of ODEs is more involved compared to the one resulting from the application of the linear POD.
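A NumPy sketch of the four-layer network of Eq. 2.5.6 follows; the sizes N, m, n and the random weights are placeholders (in practice the weights are trained so that the network reproduces the identity mapping):

import numpy as np

N, m, n = 256, 64, 7                    # full, hidden, and reduced dimensions
rng = np.random.default_rng(0)
W1 = 0.01 * rng.normal(size=(m, N))     # nonlinear encoder layer
W2 = rng.normal(size=(n, m))            # linear bottleneck
W3 = rng.normal(size=(m, n))            # nonlinear decoder layer
W4 = rng.normal(size=(N, m))            # linear reconstruction layer

def encode(v):
    return W2 @ np.tanh(W1 @ v)         # x = W2 tanh(W1 v)

def decode(x):
    return W4 @ np.tanh(W3 @ x)         # v_hat = W4 tanh(W3 x)

v = rng.normal(size=N)                  # a snapshot of the flow field
v_hat = decode(encode(v))               # reconstruction through the bottleneck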
2.5.1.1 POD and Nonlinear ANN
A simple comparison of POD and nonlinear ANN is provided by the reconstruction of the velocity field in the stochastically forced Burgers equation, a classical 1D model for turbulent flow [Chambers]29. The linear POD was used to obtain a set of 256 linear eigenfunctions using 10,000 snapshots extracted from a simulation. Using the first 7 eigenfunctions it is possible to reconstruct the original flow field while retaining 90 percent of the energy. A nonlinear neural network was trained on the same data set to perform the identity mapping: this network is composed of 256 inputs and 4 layers having respectively 64 nonlinear neurons, 7 linear neurons, 64 nonlinear neurons, and 256 linear neurons. For validation purposes, a data set of 1000 snapshots, not used in the training phase, was used. Figure 2.5.2 shows the reconstruction performance of both approaches: the nonlinear ANN clearly outperforms the linear POD on the velocity field of the Burgers equation.

Figure 2.5.2 Comparison of linear POD (top) and Neural Networks (bottom)

28 Baldi, P. & Hornik, K., “Neural networks and principal component analysis: Learning from examples without local minima”. Neural Networks 2, 53–58, 1989.
29 Chambers, D. H., Adrian, R. J., Moin, P. & Stewart, S., “Karhunen-Loève expansion of Burgers' model of turbulence”.
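For reference, a compact sketch of how such a linear POD basis can be computed from snapshots via the singular value decomposition, with the retained-energy fraction used to pick the number of modes (the snapshot data here is synthetic):

import numpy as np

rng = np.random.default_rng(0)
snapshots = rng.normal(size=(256, 2000))        # columns are velocity snapshots
V_mean = snapshots.mean(axis=1, keepdims=True)  # time-averaged flow V
fluct = snapshots - V_mean

# POD modes are the left singular vectors of the fluctuation matrix
Phi, s, _ = np.linalg.svd(fluct, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.90)) + 1      # modes retaining 90% of the energy

# Rank-k reconstruction of one snapshot: v = V + sum_n a_n phi_n
a = Phi[:, :k].T @ fluct[:, 0]                  # modal coefficients a_n
v_rec = V_mean[:, 0] + Phi[:, :k] @ a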
Other researchers, such as Romit Maulik et al.32, tried using an open-source module (TensorFlow) within OpenFOAM. They outline the development of a data science module within OpenFOAM which allows for the in-situ deployment of trained deep learning architectures for general-purpose
predictive tasks. This is constructed with the TensorFlow C API and is integrated into OpenFOAM as an application that may be linked at run time. In this experiment, the different geometries are all backward facing steps with varying step heights (h). Once trained, the steady-state eddy-viscosity emulator may be used at the start of the simulation (by observing the initial conditions), following which solely the pressure and velocity equations need to be solved to convergence. Results from one such experiment (backward facing steps), where the geometry is ‘unseen’, are outlined in Figure 2.6.2.

Figure 2.6.2 Contour plots for a backward facing step. Note that the training of the ML surrogate did not include data for the shown step height.
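As a hedged illustration of the workflow (not the authors' actual code), a trained Keras surrogate can be exported in the SavedModel format, which is what the TensorFlow C API loads from inside a compiled OpenFOAM application; the layer sizes and names below are placeholders:

import tensorflow as tf

# Placeholder eddy-viscosity surrogate: local flow features in, nu_t out
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# ... model.fit(features, nu_t_targets) on training data would go here ...

# Export in a form readable through the TensorFlow C API at solver run time
tf.saved_model.save(model, "eddy_viscosity_surrogate")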
33Roxana M. Greenman, “Two-Dimensional High-Lift Aerodynamic Optimization Using Neural Networks”, Ames
Research Center, Moffett Field, California, NASA / TM- 1998-112233.
with a carpet map. The neural network will be able to capture the design space with the small amount of data that is generated. Next, the optimizer will be able to locate extrema in the design space by using the captured design space to calculate the path that must be followed to reach a maximum. The agile artificial intelligence (AI) design space capture and surfing process is shown in Figure 3.1.1.
Figure 3.1.1 Agile AI-Enhanced Design Space Capture and Smart Surfing
Recently, neural networks have been applied to a wide range of problems in the aerospace industry. For example, neural networks have been used in aerodynamic performance optimization of rotor blade design34. The study demonstrated that for several rotor blade designs, neural networks were advantageous in reducing the time required for the optimization. [Faller and Schreck]35 successfully used neural networks to predict real-time three-dimensional unsteady separated flow fields and aerodynamic coefficients of a pitching wing. It has also been demonstrated that neural networks are capable of predicting measured data with sufficient accuracy to enable identification of instrumentation system degradation.

Figure 3.1.2 Illustration of AI-Enhanced Design Process

[Steck and Rokhsaz]36 demonstrated
34 LaMarsh, W. J.; Walsh, J. L.; and Rogers, J. L.:, “Aerodynamic Performance Optimization of a Rotor Blade Using
a Neural Network as the Analysis”. AIAA Paper 92-4837, Sept. 1992.
35 Faller, W. E.; and Schreck, S. J., “Real-Time Prediction of Unsteady Aerodynamics: Application for Aircraft
Control and Maneuverability Enhancement”. IEEE Transactions on Neural Networks, vol. 6, no. 6, Nov. 1995.
36 Steck, J. E.; and Rokhsaz, K.: “Some Applications of Artificial Neural Networks in Modeling of Nonlinear
that neural networks can be successfully trained to predict aerodynamic forces with sufficient accuracy for design and modeling. [Rai and Madavan]37 demonstrated the feasibility of applying neural networks to the aerodynamic design of turbomachinery airfoils.
3.1.2 Agile AI-Enhanced Design Process
Here, we describe a process which allows CFD to impact high-lift design. This process has three
phases:
1. generation of the training database using CFD;
2. training of the neural networks;
3. integration of the trained neural networks with an optimizer to capture and surf (search) the
high-lift design space (refer to Figure 3.1.2).
In this reading, an incompressible 2D Navier-Stokes solver is used to compute the flow field about the three-element airfoil shown in Figure 3.1.3. The selected airfoil is a cross-section of the Flap-Edge model that was tested in the 7- by 10-Foot Wind Tunnel No. 1 at the NASA Ames Research Center. Extensive wind-tunnel investigations have been carried out for the Flap-Edge geometry shown in Figure 3.1.3. The model is a three-element un-swept wing consisting of a 12%c LB-546 slat, a NACA 632-215 Mod B main element, and a 30%c Fowler flap, where c is the chord and is equal to c = 30.0 inches for the un-deflected (clean, all high-lift components stowed) airfoil section. In the CFD database for this flap optimization problem, there are two different slat deflection settings, six and twenty-six degrees, and for each, 27 different flap riggings (refer to Figure 3.1.3-b) are computed for ten different angles of attack [for details see Greenman]38.

Figure 3.1.3 Flap-Edge Geometry and Definition of Flap and Slat High-Lift Rigging
The neural networks are trained by using the flap riggings and angles of attack as the inputs and the aerodynamic forces as the outputs. The neural networks are considered successfully trained when, given a set of inputs that are not in the training set, they predict the aerodynamic coefficients within
37 Rai, M. M.; and Madavan, N. K.: “Application of Artificial Neural Networks to the Design of Turbomachinery
Airfoils”. AIAA Paper 98-1003, Jan. 1998.
38 Roxana M. Greenman, “Two-Dimensional High-Lift Aerodynamic Optimization Using Neural Networks”, Ames Research Center, Moffett Field, California, NASA/TM-1998-112233.
the experimental error. Finally, the trained neural networks are integrated with the optimizer to allow the design space to be easily searched for points of interest. It will be shown that this agile, artificial-intelligence-enhanced design process minimizes the cost and time required to accurately optimize the high-lift flap rigging.
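A minimal sketch of this surrogate-plus-optimizer idea is shown below, with an analytic placeholder standing in for the trained neural network and SciPy's bounded optimizer searching the captured design space; all function names, bounds, and numbers are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

def surrogate_cl(x):
    # Placeholder for a trained neural-network surrogate of the lift
    # coefficient as a function of (flap deflection, gap, overlap)
    deflection, gap, overlap = x
    return (4.0 - 0.01 * (deflection - 30.0) ** 2
            - (gap - 0.02) ** 2 - (overlap - 0.005) ** 2)

# Maximize CL by minimizing its negative over bounded design variables
result = minimize(lambda x: -surrogate_cl(x),
                  x0=np.array([25.0, 0.01, 0.0]),
                  bounds=[(0.0, 45.0), (0.0, 0.05), (-0.02, 0.02)])
print("optimal rigging:", result.x, " CL:", -result.fun)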
3.1.3 Summary
Multiple input, single output networks were trained using the NASA Ames variation of the Levenberg-
Marquardt algorithm. The neural networks were first trained with wind tunnel experimental data of
the three-element airfoil to test the validity of the neural networks. The networks did accurately predict the lift coefficients of the individual main and flap elements. However, there was noticeable error in predicting the slat lift coefficient. The prediction error is most likely caused by the sparse training data, since only five different cases were used to train the neural networks. Because the element errors are summed, they amplify in the prediction of the total lift coefficient.
Computational data were next used to test whether such data can be used to train the neural networks. The neural networks were used to create a computational database which may be used to impact design. Solutions were obtained by solving the two-dimensional Reynolds-averaged incompressible Navier-Stokes equations. The flow field was assumed to be fully turbulent and the Spalart-Allmaras turbulence model was used. The computational data set had to be pre-processed to reduce the prediction error at or beyond maximum lift. In high-lift aerodynamics, both experimentally and computationally, it is difficult to predict the maximum lift and the angle of attack at which it occurs. In order to predict maximum lift and the angle of attack where it occurs, a maximum lift criterion was needed. The pressure difference rule, which states that there exists a
certain pressure difference between the peak suction pressure and the pressure at the trailing edge
of the element at the maximum lift condition, was applied to all three elements. For this configuration,
it was found that only the pressure difference on the slat element was needed to predict maximum
lift. By applying the pressure difference rule, the prediction errors of the neural networks were
reduced.
The amount of data required to train the neural networks was then reduced, to allow computational fluid dynamics to impact the design phase. Different training subsets were created by removing entire configurations from the six-degree-deflected slat training set. The mean and standard deviations of the root-mean-square prediction errors were calculated to compare the different methods of training. Even though the entire computational data set was sparse, it could be reduced to only 70% of the full data. It was found that the trained neural networks predicted the aerodynamic coefficients within an acceptable accuracy, defined to be the experimental error. The aerodynamic data had to be represented in a nonlinear fashion so that the neural networks could learn and predict accurately. By carefully choosing the training subset, the computational data set was further reduced to contain only 52% of the configurations.
These trained neural networks also predicted the aerodynamic coefficients within the acceptable
error. Thus, the computational data required to accurately represent the flow field of a multi-element
airfoil was reduced to allow computational fluid dynamics to be a usable tool for design. This same procedure was followed for the twenty-six-degree-deflected slat computational data. This data set had more highly deflected flaps, which were actually outside the normal flight envelope. The same trends were found, except that the prediction error was much higher in this training set than in the previous one. This was caused by the fact that the flow field was severely separated with the more highly deflected flaps. Thus, the training data representing the flow field were noisy, which led to prediction errors. The
computational design space needs to be easily searched for areas of interest such as maximums or
optimal points. An optimization study to search the design space was conducted by using neural
networks that were trained with computational data.
Artificial neural networks have been successfully integrated with a gradient-based optimizer to minimize the amount of data required to completely define the design space of a three-element airfoil. The accuracy of the neural networks' prediction was tested for both the initial and modified
configurations by generating the grid and computing the INS2D-UP solution. The high-lift flap
aerodynamics were optimized for a three-element airfoil by maximizing the lift coefficient. The
design variables were flap deflection, gap, and overlap.
3.1.4 Conclusion
Overall, the neural networks were trained successfully to predict the high-lift aerodynamics of a
multi-element airfoil. The neural networks were also able to predict the aerodynamics successfully
when only 52-70% of the entire computational data set was used to train. The neural networks were
integrated with an optimizer thus allowing a quick way to search the design space for points of
interest. Optimization with neural networks reduced the turnaround time, CPU time, and cost of
multiple optimization runs. Therefore, neural networks are an excellent tool to allow computational
fluid dynamics to impact the design space. For a complete study of ANNs in design and optimization environments, the reader is encouraged to consult [Rai et al.]39 and [Timnak et al.]40, in addition to the current author.
39 Man Mohan Rai, Nateri K. Madavan, and Frank W. Huber, “Improving the Unsteady Aerodynamic
Performance of Transonic Turbines Using Neural Networks”, NASA/TM-1999-208791.
40 N. Timnak, A. Jahangirian and S. A. Seyyedsalehi, “An optimum neural network for evolutionary
3.2 Case Study 2 - Artificial Neural Networks (ANNs) Trained Through Deep
Reinforcement Learning Discover Control Strategies For Active Flow Control
Citation : Rabault, J., Kuchta, M., Jensen, A., Réglade, U., & Cerardi, N. (2019). Artificial neural networks
trained through deep reinforcement learning discover control strategies for active flow control. Journal
of Fluid Mechanics, 865, 281-302. doi:10.1017/jfm.2019.62
We present the first application of an Artificial Neural Network (ANN) trained through a Deep Reinforcement Learning agent to perform active flow control41. It is shown that, in a 2D simulation
of the Kármán vortex street at moderate Reynolds number (Re = 100), our Artificial Neural Network
is able to learn an active control strategy from experimenting with the mass flow rates of two jets on
the sides of a cylinder. By interacting with the unsteady wake, the Artificial Neural Network
successfully stabilizes the vortex alley and reduces drag by about 8%. This is performed while using small mass flow rates for the actuation, of the order of 0.5% of the mass flow rate intersecting the cylinder cross-section, once a new pseudo-periodic shedding regime is found. This opens the way to
a new class of methods for performing active flow control.
3.2.1 Introduction and Literature Survey
Drag reduction and flow control are techniques of critical interest for the industry [Brunton &
Noack]42. For example, 20% of all energy losses on modern heavy duty vehicles are due to
aerodynamic drag, and drag is naturally the main source of energy losses for an airplane. Drag is also
a phenomenon that penalizes animals, and Nature shows examples of drag mitigation techniques. It
is for example thought that structures of the skin of fast-swimming sharks interact with the turbulent
boundary layer around the animal, and reduce drag by as much as 9% [Dean & Bhushan]43. This is
therefore a proof-of-existence that flow control can be achieved with benefits, and is worth aiming
for. In the past, much research has been carried out on so-called passive drag reduction methods, for example using micro vortex generators for passive control of transition to turbulence. While it should be underlined that this technique is very different from the one used by sharks (preventing transition to turbulence by energizing the laminar boundary layer, as opposed to reducing the drag of a fully turbulent boundary layer), benefits in terms of reduced drag can also be achieved. Another way to
obtain drag reduction is by applying an active control to the flow. A number of techniques can be
used in active drag control and have been proven effective in several experiments, a typical example
being to use small jets.
Interestingly, it has been shown that effective separation control can be achieved with even quite
weak actuation, as long as it is used in an efficient way [Schoppa & Hussain]44. This underlines the
need to develop techniques that can effectively control a complex actuation input into a flow, in order
to reduce drag. Unfortunately, designing active flow control strategies is a complex endeavor [Duriez
et al]45. Given a set of point measurements of the flow pressure or velocity around an object, there is
no easy way to find a strategy to use this information in order to perform active control and reduce
drag. The high dimensionality and computational cost of the solution domain (set by the complexity
41 Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Reglade and Nicolas Cerardi, “Artificial Neural Networks
Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control”, Journal of
Fluid Mechanics, April 2019.
42 Brunton, Steven L & Noack, Bernd R 2015 Closed-loop turbulence control: progress and challenges. Applied Mechanics Reviews 67 (5), 050801.
and non-linearity inherent to Fluid Mechanics) mean that analytical solutions, and real-time
predictive simulations (that would decide which control to use by simulating several control
scenarios in real time) seem out of reach. Despite the considerable efforts put into the theory of flow
control, and the use of a variety of analytical and semi-analytical techniques [Barbagallo et al.]46;
[Sipp & Schmid]47, bottom-up approaches based on an analysis of the flow equations face
considerable difficulties when attempting to design flow control techniques.
A consequence of these challenges is the simplicity of the control strategies used in most published
works about active flow control, which traditionally focus on either harmonic or constant control
input [Schoppa & Hussain]48. Therefore, there is a need to develop efficient control methods that perform complex active control and take full advantage of actuation possibilities. Indeed, it seems that, as of today, the actuation possibilities are large, but only simplistic (and probably suboptimal) control strategies are implemented. To the knowledge of the authors, only a few published examples of successful complex active control strategies are available relative to the importance and extent of the field [Pastoor et al.]49; [Gautier et al.]50; [Li et al.]51; [Erdmann et al.]52; [Guéniat et al.]53.
In the present work, we aim at introducing for the first time Deep Neural Networks and
Reinforcement Learning to the field of active flow control. Deep Neural Networks are
revolutionizing large fields of research, such as image analysis [Krizhevsky et al.]54, speech
recognition, and optimal control. Those methods have surpassed previous algorithms in all these
examples, including methods such as genetic programming, in terms of complexity of the tasks
learned and learning speed. It has been speculated that Deep Neural Networks will bring advances
also to fluid mechanics [Kutz]55, but until this day those have been limited to a few applications, such
as the definition of reduced order models [Wang et al.]56, the effective control of swimmers, or
performing Particle Image Velocimetry (PIV). As Deep Neural Networks, together with the
Reinforcement Learning framework, have allowed recent breakthroughs in the optimal control of
complex dynamic systems [Lillicrap et al.]57; [Schulman et al.]58, it is natural to attempt to use them
46 Barbagallo, Alexandre, Dergham, Gregory, Sipp, Denis, Schmid, Peter J & Robinet, Jean-Christophe 2012 Closed-loop control of unsteadiness over a rounded backward-facing step. Journal of Fluid Mechanics 703.
47 Sipp, Denis & Schmid, Peter J 2016 Linear closed-loop control of fluid instabilities and noise-induced perturbations: A review of approaches and tools. Applied Mechanics Reviews 68 (2), 020801.
48 Schoppa, Wade & Hussain, Fazle 1998 A large-scale control strategy for drag reduction in turbulent boundary layers. Physics of Fluids 10 (5), 1049–1051.
49 Pastoor, Mark, Henning, Lars, Noack, Bernd R, King, Rudibert & Tadmor, Gilead 2008 Feedback shear layer control for bluff body drag reduction. Journal of Fluid Mechanics 608, 161–196.
50 Gautier, Nicolas, Aider, J-L, Duriez, Thomas, Noack, B R, Segond, Marc & Abel, Markus 2015 Closed-loop separation control using machine learning. Journal of Fluid Mechanics 770, 442–457.
51 Li, Ruiying, Noack, Bernd R, Cordier, Laurent, Borée, Jacques & Harambat, Fabien 2017 Drag reduction of a car model by linear genetic programming control. Experiments in Fluids 58 (8), 103.
52 Erdmann, Ralf, Pätzold, Andreas, Engert, Marcus, Peltzer, Inken & Nitsche, Wolfgang 2011 On active control of laminar–turbulent transition on two-dimensional wings. Philosophical Transactions of the Royal Society A 369, 1382–1395.
53 Guéniat, Florimond, Mathelin, Lionel & Hussaini, M Yousuff 2016 A statistical learning strategy for closed-loop control of fluid flows. Theoretical and Computational Fluid Dynamics 30 (6), 497–510.
54 Krizhevsky, Alex, Sutskever, Ilya & Hinton, Geoffrey E 2012 ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25.
55 Kutz, J Nathan 2017 Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, 1–4.
56 Wang, Z., Xiao, D., Fang, F., Govindan, R., Pain, C. C. & Guo, Y. 2018 Model identification of reduced order fluid dynamics systems using deep learning. International Journal for Numerical Methods in Fluids 86 (4), 255–268.
57 Lillicrap, Timothy P, Hunt, Jonathan J, Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David & Wierstra, Daan 2015 Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
58 Schulman, John, Levine, Sergey, Moritz, Philipp, Jordan, Michael I. & Abbeel, Pieter 2015 Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning, 1889–1897.
59 Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan & Riedmiller, Martin 2013 Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
60 Gu, Shixiang, Lillicrap, Timothy, Sutskever, Ilya & Levine, Sergey 2016 Continuous deep Q-learning with model-based acceleration. International Conference on Machine Learning.
3.2.2 Simulation Environment
The PPO agent performs active flow control in a 2D simulation environment. In the following, all
quantities are considered non-dimensionalized. The geometry of the simulation, adapted from the
2D test case of well-known benchmarks [Schäfer et al]62, consists of a cylinder of non-dimensional
diameter D = 1 immersed in a box of total non-dimensional length L = 22 (along the X-axis) and height
H = 4.1 (along the Y-axis). The origin of the coordinate system is in the center of the cylinder. Similarly to the benchmark, the cylinder is slightly off the centerline of the domain (a shift of 0.05 in the Y direction is used), in order to help trigger the vortex shedding. The inflow profile (on the left wall of the domain) is parabolic, following the formula (cf. 2D-2 test case in [Schäfer et al.]63):

U(y) = 6 [ (H/2)² − y² ] / H²
Eq. 3.2.1

where (U(y), V(y) = 0) is the non-dimensionalized velocity vector. Using this velocity profile, the mean velocity magnitude is Ū = 2U(0)/3 = 1. A no-slip boundary condition is imposed on the top and bottom walls and on the solid walls of the cylinder. An outflow boundary condition is imposed on the right wall of the domain. The configuration of the simulation is visible in Figure 3.2.1.

Figure 3.2.1 Unsteady Non-Dimensional Pressure Wake Behind the Cylinder after Flow Initialization without Active Control. The Location of the Velocity Probes is Indicated by the Black Dots While the Location of the Control Jets is Indicated by the Red Dot
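A quick numerical check of this inflow profile (purely illustrative):

import numpy as np

H = 4.1  # non-dimensional channel height

def U(y):
    # Parabolic inflow profile of Eq. 3.2.1
    return 6.0 * ((H / 2.0) ** 2 - y ** 2) / H ** 2

y = np.linspace(-H / 2.0, H / 2.0, 1001)
print(U(0.0))                  # centerline velocity: 1.5
print(np.trapz(U(y), y) / H)   # mean velocity magnitude: 1.0 (= 2U(0)/3)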
The Reynolds number based on the mean velocity magnitude and cylinder diameter (Re = UD/ν, with
ν the kinematic viscosity) is set to Re = 100. Computations are performed on an unstructured mesh
generated with Gmsh. The mesh is refined around the cylinder and is composed of 9262 triangular
elements. A non-dimensional, constant numerical time step dt = 5×10⁻³ is used. The total instantaneous drag on the cylinder C is computed as:

FD = ∫C (σ · n) · ex dS
Eq. 3.2.2

where σ is the Cauchy stress tensor, n is the unit vector normal to the outer cylinder surface, and ex = (1, 0). In the following, the drag is normalized into the drag coefficient:
62 Schäfer, M., Turek, S., Durst, F., Krause, E. & Rannacher, R. 1996 Benchmark Computations of Laminar Flow
Around a Cylinder, pp. 547–566. Wiesbaden: Vieweg+Teubner Verlag.
63 See Previous.
CD = FD / (½ ρ Ū² D)
Eq. 3.2.3

where ρ = 1 is the non-dimensional volumetric mass density of the fluid. Similarly, the lift force FL,

FL = ∫C (σ · n) · ey dS
Eq. 3.2.4

and the lift coefficient CL is defined as

CL = FL / (½ ρ Ū² D)
Eq. 3.2.5
where ey = (0, 1). In the interest of short solution time, the governing Navier-Stokes equations are
solved in a segregated manner. More precisely, the Incremental Pressure Correction Scheme (IPCS) with an explicit treatment of the nonlinear term is used. More details are available in ([Rabault]64, Appendix B). Spatial discretization then relies on the finite element method
implemented within the FEniCS framework. We remark that both the mesh density and the Reynolds
number could easily be increased in a later study, but are kept low here as it allows for fast training
on a laptop which is the primary aim of our proof-of-concept demonstration.
In addition, two jets (1 and 2) normal to the cylinder wall are implemented on the sides of the
cylinder, at angles θ1 = 90 and θ2 = 270 degrees relative to the flow direction. The jets are controlled through their non-dimensional mass flow rates Qi, i = 1, 2, and are set through a parabolic-like velocity profile going to zero at the edges of the jet (see [Rabault]65, Appendix B, for details). The
jet widths are set to 10 degrees. Choosing jets normal to the cylinder wall, located at the top and bottom extremities of the cylinder, means that all drag reduction observed will be the result of indirect flow control, rather than direct injection of momentum. In addition, the control is set up in such a way that the total mass flow rate injected by the jets is zero, i.e. Q1 + Q2 = 0. This synthetic-jet condition is chosen as it is more realistic than a case where mass is added to or subtracted from the flow, and it makes the numerical scheme more stable, especially with respect to the boundary conditions of the problem. In addition, it ensures that the drag reduction observed is the result of actual flow control, rather than some sort of propulsion phenomenon. In the following, the injected mass flow rates are normalized following:

Q*i = Qi / Qref ,   Qref = ∫ from −D/2 to D/2 of ρ U(y) dy
Eq. 3.2.6

where Qref is the reference mass flow rate intercepting the cylinder. During learning, we impose that |Q*i| < 0.06. This helps the learning process by preventing non-physically large actuation, and prevents problems in the numerics of the simulation by enforcing the CFL condition close to the actuation jets.
64 Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Reglade and Nicolas Cerardi, “Artificial Neural Networks
Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control”, Journal of
Fluid Mechanics, April 2019.
65 See Previous.
Finally, information is extracted from the simulation and provided to the PPO agent. A total of 151
velocity probes, which report the local value of the horizontal and vertical components of the velocity
field, are located in several locations in the neighborhood of the cylinder and in its wake (see Figure
3.2.1). This means that the network gets detailed information about the flow configuration, which is
our objective as this article focuses on finding the best possible control strategy of the vortex
shedding pattern. A different question would be to assess the ability of the network to perform
control with a partial observation of the system. To illustrate that this is possible with adequate training, we provide some results with an input layer reduced to 11 and 5 probes (see [Rabault]66, Appendix E), but a further parameter space study and sensitivity analysis is out of the scope of the present paper and is left to future work.
An unsteady wake develops behind the cylinder, which is in good agreement with what is expected
at this Reynolds number. A simple benchmark of the simulation was performed by observing the
pressure fluctuations, drag coefficient and Strouhal number St = f D/Ū , where f is the vortex
shedding frequency. The mean value of CD in the case without actuation (around 3.205) is within 1%
of what is reported in the benchmark of [Schäfer et al.]67, which validates our simulations, and similar
agreement is found for St (typical value of around 0.30) . In addition, we also performed tests on
refined meshes, going up to around 30000 triangular elements, and found that the mean drag varied
by less than 1 % following mesh refinement. A pressure field snapshot of the fully developed
unsteady wake is presented in Figure 3.2.1.
3.2.3 Network and Reinforcement Learning Framework
As stated in the introduction, Deep Reinforcement Learning (DRL) sees the fluid mechanics simulation as yet another environment to interact with through 3 simple channels: the observation ot (here, an array of point measurements of velocity obtained from the simulation), the action at (here, the active control of the jets, imposed on the simulation by the learning agent), and the reward rt (here, the time-averaged drag coefficient provided by the environment, penalized by the mean lift coefficient magnitude; see further in this section). Based on this limited information, DRL trains an ANN to find closed-loop control strategies deciding at from ot at each time step, so as to maximize rt. Our DRL agent uses the Proximal Policy Optimization (PPO, [Schulman et al.]68) method for performing learning.
PPO is a reinforcement learning algorithm that belongs to the family of policy gradient methods. This method was chosen for several reasons. In particular, it is less complex mathematically and faster than the competing Trust Region Policy Optimization (TRPO) method, and requires little to no meta-parameter tuning. It is also better adapted to continuous control problems than Deep Q-Learning (DQN) and its variations. From the point of view of the fluid mechanist, the PPO agent acts as a black box (though details about its internals are available in the referred literature). A brief introduction to the PPO method is provided in ([Rabault]69, Appendix C).
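For orientation, a sketch of how such a coupling looks with common present-day tooling follows; Gymnasium and Stable-Baselines3 are this sketch's assumptions (the authors used their own FEniCS-based environment), and the placeholder physics must be replaced by calls into the CFD solver:

import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO

class FlowControlEnv(gym.Env):
    # Skeleton wrapping a CFD solver as a reinforcement learning environment
    def __init__(self):
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(302,))  # 151 probes x (u, v)
        self.action_space = gym.spaces.Box(-0.06, 0.06, shape=(1,))             # Q1 (with Q2 = -Q1)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Real version: restart from the saved developed-wake state
        return np.zeros(302, dtype=np.float32), {}

    def step(self, action):
        # Real version: advance the solver 50 time steps with jet flow rate `action`
        obs = np.zeros(302, dtype=np.float32)
        reward = 0.0  # -<C_D>_T - 0.2|<C_L>_T| in the real environment
        return obs, reward, False, False, {}

model = PPO("MlpPolicy", FlowControlEnv(), verbose=0)
# model.learn(total_timesteps=16_000)  # roughly 80 actions x 200 epochs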
The PPO method is episode-based, which means that it learns from performing active control for a
limited amount of time before analyzing the results obtained and resuming learning with a new
episode. In our case, the simulation is first performed with no active control until a well-developed
unsteady wake is obtained, and the corresponding state is saved and used as a start for each
subsequent learning episode. The instantaneous reward function, rt, is computed following:
66 See Previous.
67 Schäfer, M., Turek, S., Durst, F., Krause, E. & Rannacher, R. 1996 Benchmark Computations of Laminar Flow
Around a Cylinder, pp. 547–566. Wiesbaden: Vieweg+Teubner Verlag.
68 Schulman, John, Wolski, Filip, Dhariwal, Prafulla, Radford, Alec & Klimov, Oleg 2017 Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
69 Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Reglade and Nicolas Cerardi, “Artificial Neural Networks Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control”, Journal of Fluid Mechanics, April 2019.
rt = − 〈CD〉T − 0.2 |〈CL〉T|
Eq. 3.2.7

where 〈·〉T indicates the sliding average back in time over a duration corresponding to one vortex shedding cycle. The ANN tries to maximize this function rt, i.e. to make it as little negative as possible, therefore minimizing drag and mean lift (to take into account long-term dynamics, a discounted reward is actually used during gradient descent; see Appendix C in [Rabault]70 for more details). This specific reward function has several advantages compared with using the plain instantaneous drag coefficient. Firstly, using values averaged over one vortex shedding cycle leads to less variability in the value of the reward function, which was found to improve learning speed and stability. Secondly, the use of a penalization term based on the lift coefficient is necessary to prevent the network from ’cheating’. Indeed, in the absence of this penalization, the ANN manages to find a way to modify the configuration of the flow such that a larger drag reduction is obtained (up to around 18% drag reduction, depending on the simulation configuration used), but at the cost of a large induced lift, which would be detrimental in most practical applications.
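A sketch of this reward computation over a sliding window (the window length and the toy time series are assumptions for illustration):

import numpy as np

def reward(cd_history, cl_history, window):
    # r_t = -<C_D>_T - 0.2 |<C_L>_T|, averaged over roughly one shedding cycle
    cd_mean = np.mean(cd_history[-window:])
    cl_mean = np.mean(cl_history[-window:])
    return -cd_mean - 0.2 * abs(cl_mean)

# Toy usage: one shedding cycle is ~660 solver steps at dt = 5e-3 and St = 0.3
cd = 3.205 + 0.017 * np.sin(np.linspace(0.0, 20.0 * np.pi, 5000))
cl = 0.8 * np.sin(np.linspace(0.0, 10.0 * np.pi, 5000))
print(reward(cd, cl, window=660))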
The ANN used is relatively simple, being composed of two dense layers of 512 fully connected neurons, plus the layers required to acquire data from the probes and to generate data for the 2 jets. This network configuration was found empirically through trial and error, as is usually done with ANNs. Results obtained with smaller networks are worse, as their modeling capacity is insufficient relative to the complexity of the flow configuration obtained. Larger networks are also less successful, as they are harder to train. In total, our network has slightly over 300,000 weights. For more details, readers are referred to the code implementation (see [Rabault]71, Appendix A).
At first, no learning could be obtained from the PPO agent interacting with the simulation environment. The reason for this was the difficulty for the PPO agent to learn the necessity of setting time-correlated, continuous control signals, as the PPO first tries purely random control and must observe some improvement in the reward function for learning to take place. Therefore, we implemented two tricks to help the PPO agent learn control strategies:
• The control value provided by the network is kept constant for a duration of 50 numerical time steps, corresponding to around 7.5% of the vortex shedding period. This means, in practice, that the PPO agent is allowed to interact with the simulation and update its control only every 50 time steps.
• The control is made continuous in time to avoid jumps in the pressure and velocity due to the use of an incompressible solver. For this, the control at each time step in the simulation is obtained for each jet as cs+1 = cs + α(a − cs), where cs is the control of the jet considered at the previous numerical time step, cs+1 is the new control, a is the action set by the PPO agent for the current 50 time steps, and α = 0.1 is a numerical parameter (see the sketch after this list).
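A short sketch of this first-order smoothing of the jet control (the numbers are illustrative):

alpha = 0.1   # smoothing parameter

def smoothed_control(c, a, alpha=0.1):
    # Relax the applied control toward the agent's action a:
    # c_{s+1} = c_s + alpha * (a - c_s)
    return c + alpha * (a - c)

c = 0.0       # current jet control value
a = 0.02      # action held fixed for the next 50 solver steps
for _ in range(50):
    c = smoothed_control(c, a)
print(c)      # approaches a without a discontinuous jump in the flow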
Using these technical tricks, and choosing an episode duration Tmax = 20.0 (which spans around 6.5 vortex shedding periods and corresponds to 4000 numerical time steps, i.e. 80 actions by the network), the PPO agent is able to learn a control strategy after typically about 200 epochs, corresponding to 1300 vortex shedding periods or 16,000 sampled actions, which requires around 24 hours of training on a modern desktop using one single core. This training time could easily be reduced by at least a factor of 10 by using more cores to parallelize the data sampling from the epochs, which is a fully parallel process. Fine-tuning the policy can take a bit longer, and up to around 350 epochs can be necessary to obtain a fully stabilized control strategy. A training run has also been performed going up to over 1000 episodes to confirm that no more changes were obtained if the
70See Previous.
71Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Reglade and Nicolas Cerardi, “Artificial Neural Networks
Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control”, Journal of
Fluid Mechanics, April 2019.
network is allowed to train for a significantly longer time. Most of the computation time is spent in the flow simulation. This setup with simple, quick simulations makes experimentation and reproduction of our results easy, while being enough for a proof-of-concept in the context of a first application of Reinforcement Learning to active flow control, and providing an interesting control strategy for further analysis.
3.2.4 Results for Drag Reduction Through Active Flow Control
Robust learning is obtained by applying the methodology presented in the previous section. This is illustrated by Figure 3.2.2 72, which presents the averaged learning curve and the confidence interval corresponding to 10 different trainings performed using different seeds for the random number generator. In this figure, the drag presented is obtained by averaging the drag coefficient over the second half of each training epoch. This averaging is performed to smooth the effect of both vortex shedding and drag fluctuations due to exploration. While it may include part of the initial transition from the undisturbed vortex shedding to the controlled case, it is a good relative indicator of policy convergence. Estimating at each epoch the asymptotic quality of the fully established control regime would be too expensive, which is the reason why we resort to this averaged value. Using different random seeds results in different trainings, as random data are used in the exploration noise and for the random sampling of the replay memory used during stochastic gradient descent. All other parameters are kept constant. The data presented indicate that learning takes place consistently in around 200 epochs, with fine convergence and tuning requiring up to around 400 epochs.

Figure 3.2.2 Illustration of the Robustness of the Learning Process
72 Illustration of the robustness of the learning process. The drag values reported are obtained at each training epoch (including exploration noise), for 10 different trainings using the same meta parameters, but different values of the random seed. Robust learning takes place within 200 epochs, with the fine converged strategy requiring a few more epochs to stabilize. The drag reduction is slightly less than what is reported in the rest of the text, as these results include the random exploration noise and are computed over the second half of the training epochs, where some of the transient in the drag value is still present during training.
Due to the presence of exploration noise, and because the averaging is performed on a time window including some of the transition from free shedding to active control, the drag reduction reported in this figure is slightly less than in the case of deterministic control in the pseudo-periodic actively controlled regime (i.e. when a modified stable vortex shedding is obtained with the most likely action of the optimal policy being picked at each time step, so that no exploration noise is present), which is as expected. The final drag reduction value obtained in the deterministic mode (not shown, so as not to overload the figure) is also consistent across the runs.
Therefore, it is clear that the ANN is able to consistently reduce drag by applying active flow control
following training through the DRL/PPO algorithm, and that the learning is both stable and robust.
All results presented further in both this section and the next are obtained using deterministic prediction; therefore, exploration noise is not present in the following figures and results. The time series of the drag coefficient obtained using the active flow control strategy discovered through training in the first run, compared with the baseline simulation (no active control, i.e. Q1 = Q2 = 0), is presented in Figure 3.2.3, together with the corresponding control signal (inset). Similar results and control laws are obtained for all training runs, so the results presented in Figure 3.2.3 are representative of the learning obtained with all 10 realizations. In the case without actuation (baseline), the drag coefficient CD varies periodically at twice the vortex shedding frequency, as should be expected. The mean value of the drag coefficient is ⟨CD⟩ ≈ 3.205, and the amplitude of its fluctuations is around 0.034. By contrast, the mean value of the drag coefficient in the case with active flow control is ⟨CD⟩ ≈ 2.95, which represents a drag reduction of around 8%.
To put this drag reduction into perspective, we estimate the drag obtained in the hypothetical case where no vortex shedding is present. For this, we perform a simulation on the upper half domain with a symmetric boundary condition on the lower boundary (which cuts the cylinder through its equator). More details about this simulation are presented in Appendix D of [Rabault]73. The steady-state drag obtained on a full cylinder in the case without vortex shedding is then CDs = 2.93 (see Appendix D of [Rabault]74), which means that the active control is able to suppress around 93% of the drag increase observed in the baseline without control, compared with the hypothetical reference case where the flow would be kept completely stable.

Figure 3.2.3 Time-resolved value of the drag coefficient CD in the case without (baseline curve) and with (controlled curve) active flow control, and corresponding normalized mass flow rate of control jet 1 (Q*1, inset)
73 See Previous.
74 See Previous.
In addition to this reduction in drag, the fluctuations of the drag coefficient are reduced by the active control to around 0.0016, i.e. by a factor of around 20 compared with the baseline. Similarly, fluctuations in lift are reduced, though by a more modest factor of around 5.7. Finally, a Fourier
analysis of the drag coefficients obtained shows that the actuation slightly modifies the characteristic
frequency of the system. The actively controlled system has a shedding frequency around 3.5% lower
than the baseline. Several interesting points are visible from the active control signal imposed by the
ANN presented in Figure 3.2.3. Firstly, the active flow control is composed of two phases. In the first
one, the ANN changes the configuration of the flow by performing a relatively large transient
actuation (non-dimensional time ranging from 0 to around 11). This changes the flow configuration,
and sets the system in a state in which less drag is generated. Following this transient actuation, a
second regime is reached in which a smaller actuation amplitude is used. The actuation in this new
regime is pseudo-periodic. Therefore, it appears that the ANN has found a way to both set the flow in
a modified configuration in which less drag is present, and keep it in this modified configuration at a
relatively small cost. In a separate simulation, the small actuation present in the pseudo-periodic
regime once the initial actuation has taken place was suppressed. This led to a rapid collapse of the
modified flow regime, and the original base flow configuration was recovered. As a consequence, it
appears that the modified flow configuration is unstable, though only small corrections are needed
to keep the system in its neighborhood.
Secondly, it is striking to observe that the ANN resorts to quite small actuations. The peak value of the norm of the non-dimensional control mass flow rate Q*1, which is reached during the transient active control regime, is only around 0.02, i.e. a factor of 3 smaller than the maximum value allowed
during training. Once the pseudo-periodic regime is established, the peak value of the actuation is
reduced to around 0.006. This is an illustration of the sensitivity of the Navier-Stokes
equations to small perturbations, and a proof that this property of the equations can be exploited to
actively control the flow configuration, if forcing is applied in an appropriate manner.
3.2.5 Analysis of the Control Strategy Results
The ANN trained through DRL learns a control strategy by using a trial-and-error method.
Understanding which strategy an ANN decides to use from the analysis of its weights is known to be
challenging, even on simple image analysis tasks. Indeed, the strategy of the network is encoded in
the complex combination of the weights of all its neurons. A number of properties of each individual
network, such as the variations in architecture, make systematic analysis challenging [Rauber et
al.]75. Through the combination of the neuron weights, the network builds its own internal
representation of how the flow in a given state will be affected by actuation, and how this will affect
the reward value. This is a sort of private, ’encrypted’ model obtained through experience and
interaction with the flow.
Therefore, it appears challenging to directly analyze the control strategy from the trained network,
which should be considered rather as a black box in this regard. Instead, we can look at macroscopic
flow features and how the active control modifies them. This pinpoints the effect of the actuation on
the flow and separation happening in the wake. Representative snapshots of the flow configuration
in the baseline case (no actuation), and in the controlled case when the pseudo-periodic regime is
reached (i.e., after the initial large transient actuation), are presented in Figure 3.2.4. As visible in
Figure 3.2.4, the active control leads to a modification of the 2D flow configuration. In particular,
the Kármán vortex street is altered in the case with active control, and the velocity fluctuations induced by the vortices are globally weaker and less active close to the upper and lower walls.
75Rauber, Paulo E, Fadel, Samuel G, Falcao, Alexandre X & Telea, Alexandru C 2017 Visualizing the hidden
activity of artificial neural networks. IEEE transactions on visualization and computer graphics 23 (1), 101–110.
Figure 3.2.4 Comparison of representative snapshots of the velocity magnitude in the case without
actuation (top), and with active flow control (bottom). The bottom figure corresponds to the established
pseudo-periodic modified regime, which is attained after the initial transient control.
More strikingly, the extent of the recirculation area is dramatically increased. Defining the
recirculation area as the region in the downstream neighborhood of the cylinder where the
horizontal component of the velocity is negative, we observe a 130% increase in the recirculation
area, averaged over the pseudo-period. The recirculation area in the active control case represents 103% of what is obtained in the hypothetical stable configuration of Appendix D of [Rabault]76 (so the recirculation area is slightly larger in the controlled case than in the hypothetical stable case, though the difference is so small that it may be due to a side effect such as slightly larger separation close to the jets, rather than a true change in the extent of the developed wake), while the recirculation area in the baseline configuration with vortex shedding is only 44% of this same stable-configuration value. This is, similarly to what was observed for CD, an illustration of the efficiency of
the control strategy at reducing the effect of vortex shedding. To go into more details, we look at the
mean and the Standard Deviation (STD) of the flow velocity magnitude and pressure, averaged over
a large number of vortex shedding periods (in the case with active flow control, we consider the
pseudo-periodic regime).
3.2.6 Conclusion
We show for the first time that the Deep Reinforcement Learning paradigm (DRL, and more
specifically the Proximal Policy Optimization algorithm, PPO) can discover an active flow control
strategy for synthetic jets on a cylinder, and control the configuration of the 2D Kármán vortex street.
From the point of view of the ANN and DRL, this is just yet another environment to interact with. The discovery of the control strategy takes place through the optimization of a reward function, here defined from the fluctuations of the drag and lift components experienced by the cylinder. A drag reduction of up to around 8% is observed. In order to reduce drag, the ANN decides to increase the area of the separated region, which in turn induces a lower pressure drop behind the cylinder, and therefore lower drag. This brings the flow into a configuration that presents some similarities with what would be obtained from boat-tailing.
76Jean Rabault, Miroslav Kuchta, Atle Jensen, Ulysse Reglade and Nicolas Cerardi, “Artificial Neural Networks
Trained Through Deep Reinforcement Learning Discover Control Strategies For Active Flow Control”, Journal of
Fluid Mechanics, April 2019.
68
The value of the drag coefficient and the extent of the recirculation bubble when control is turned on are very close to what is obtained by simulating the flow around a half cylinder with a symmetric boundary condition at the lower wall, which makes it possible to estimate the drag expected around a cylinder at a comparable Reynolds number if no vortex shedding were present. This implies that the active control is able to effectively cancel the detrimental effect of vortex shedding on drag. The learning obtained is remarkable, as little meta-parameter tuning was necessary, and training takes place in about one day on a laptop. In addition, we resorted to strong regularization of the output of the DRL agent, through undersampling of the simulation and the imposition of a continuous control, to help the learning process. It could be expected that relaxing those constraints, i.e. giving more freedom to the network, could lead to even more efficient strategies.
These results are potentially of considerable importance for Fluid Mechanics, as they provide a proof
that DRL can be used to solve the high dimensionality, analytically intractable problem of active flow
control. The ANN and DRL approach has a number of strengths which make it an appealing
methodology. In particular, ANNs allow for an efficient global approximation of strongly nonlinear functions, and they can be trained through direct experimentation of the DRL agent with the flow, which makes the approach, in theory, easily applicable to both simulations and experiments without changes in the DRL methodology. In addition, once trained, the ANN requires only a few calculations to compute the control at each time step. In the present case, where two hidden layers of width 512 are used, most of the computational cost comes from a matrix multiplication with matrices of size [512, 512]. This is much less computationally expensive than solving the underlying flow problem. Finally, we are able to show that learning takes place in a timely manner, requiring a reasonable number of vortex shedding periods to obtain a converged strategy.
This work opens a number of research directions, including applying the DRL methodology to more
complex simulations, for example more realistic 3D LES/DNS on large computer clusters, or even
applying such an approach directly to a real world experiment. In addition, a number of interesting
questions arise from the use of ANNs and DRL. For example, can some form of transfer learning be
used between simulations and the real world if the simulations are realistic enough (i.e., can one train
an ANN in a simulation, and then use it in the real world)? The use of DRL for active flow control may
provide a technique to finally take advantage of advanced, complex flow actuation possibilities, such
as those allowed by complex jet actuator arrays.
77Behzad Zakeri, Morteza Khashehchi, Sanaz Samsam, Atoosa Tayebi, Atefeh Rezaei, “Solving Partial
Differential Equations By A Supervised Learning Technique, Applied For The Reaction–Diffusion Equation”,
Springer Nature Switzerland AG 2019.
platform technique that enables learning algorithms to train with minimal initial labeled data. The latter topic aims at a reliable method for solving PDEs. The use of various mathematical techniques for estimating the solution of PDEs is one of the essential aspects of this work.
Replacing traditional numerical methods with alternative meshless approaches, such as machine learning, has become increasingly popular in recent years. Particularly for problems with a complex mathematical formulation, classical models are being replaced by machine learning schemes [11]. [Oquab et al.] used a weakly supervised convolutional neural network to identify objects in image processing, in order to reduce the number of labelled input images. This was a general concept and has been used in various applications, such as automated identification, medical image analysis and the solution of differential equations [3, 8, 18]. [Sharma et al.] trained an encoder–decoder U-Net architecture, a fully convolutional neural network, to solve a steady-state two-dimensional heat equation on a regular square domain.
For this reason, weakly supervised learning techniques have been used to define a proper convolutional kernel and loss function so that the network can be trained using only the boundary conditions of the PDE, rather than a large number of labelled data sets [17]. [Han et al.] introduced a new approach that uses deep learning to solve high-dimensional PDEs. They reformulate the PDEs as backward stochastic differential equations (BSDEs) and then use deep learning to approximate the gradient of the solution. Although their method is effective in high-dimensional settings, its limitations justify the search for a comprehensive approach to solving linear and low-dimensional PDEs [6, 19].
On the other hand, conventional numerical methods such as FDM and FVM have been widely developed to handle various types of scientific problems for which an exact solution is not available [12]. For example, for the above-mentioned application of the reaction–diffusion equation to sulfate attack, [Zuo et al.] used the finite-difference method to find the concentration distribution of sulfate ions in concrete. In addition, extensive research has been conducted on the use of AI in various engineering fields, for example, dealing with turbulent flows and with control theory [5, 10, 20].
3.3.3 Physics
3.3.3.1 Reaction–Diffusion Equation
To solve the 1-D RDE, a straight line is considered as the domain, with Dirichlet boundary conditions at both ends. By assigning arbitrary constants to the diffusion coefficients, we can control the role of the material in the transport phenomena. Likewise, the reaction coefficient specifies the effect of the interaction between the diffusing substance and the medium. In this simulation, a high concentration is applied at the boundaries, and the aim is to model the propagation of that substance through the space. The general form of the RDE in one-dimensional space is shown in Eq. 3.3.1, including the initial/boundary conditions, and we want to determine C(x, t), the concentration field at an arbitrary time.

\[ \frac{\partial C}{\partial t} = D \frac{\partial^2 C}{\partial x^2} - RC , \qquad \text{I.C./B.C.:}\quad C(0<x<L,\,0)=0 \;\text{ and }\; C(0,t)=C(L,t)=C_0 \]
Eq. 3.3.1

where D, R > 0 are the diffusion coefficient and the reaction rate between the specified material and the domain, respectively. Analytical procedures are not available in most cases. However, by assuming a straight line as the spatial domain and Dirichlet boundary conditions, the analytical solution of the RDE can be derived as follows. Since convergence analysis for neural networks is an intractable task [14], this solution plays a critical role in the validation of the deep learning results.
\[ \frac{\partial C_1}{\partial t} = D \frac{\partial^2 C_1}{\partial x^2} \]
Eq. 3.3.2
where C1 represents the solution of the RDE without any reactive term. Let us attempt to find a
nontrivial solution of (Eq. 3.3.2) satisfying the boundary conditions (Eq. 3.3.1) using separation of
variables:
\[ C_1(x,t) = X(x)\,T(t) \]
Eq. 3.3.3
Substituting C1(x, t) back into Eq. 3.3.2 one obtains:
\[ \frac{1}{D}\frac{T'(t)}{T(t)} = \frac{X''(x)}{X(x)} = -\lambda^2 \]
Eq. 3.3.4
where λ is an arbitrary positive constant. The solutions of the corresponding ODEs of Eq. 3.3.4 are:
\[ T(t) = e^{-\lambda^2 D t} \]
Eq. 3.3.5

\[ C_1(x,t) = \left[ A\sin(\lambda x) + B\cos(\lambda x) \right] e^{-\lambda^2 D t} \]
Eq. 3.3.7
where A and B are constants of integration. Since Eq. 3.3.2 is a linear equation, the most general
solution is obtained by summing solutions of type Eq. 3.3.7, so that we have:
\[ C_1(x,t) = \sum_{\alpha=1}^{\infty} \left[ A_\alpha \sin(\lambda_\alpha x) + B_\alpha \cos(\lambda_\alpha x) \right] e^{-\lambda_\alpha^2 D t} \]
Eq. 3.3.8
where Aα , Bα , and λα are determined by the initial and boundary conditions for any particular
problem. The boundary conditions Eq. 3.3.1 demand that:
\[ A_\alpha = 0 , \qquad \lambda_\alpha = \frac{\alpha\pi}{L} \]
Eq. 3.3.9
Using Fourier series, the final solution of Eq. 3.3.2 is obtained in the following form:
\[ C_1(x,t) = \frac{4C_0}{\pi} \sum_{n=0}^{\infty} \frac{1}{2n+1} \exp\!\left( \frac{-D(2n+1)^2 \pi^2 t}{L^2} \right) \cos\!\left( \frac{(2n+1)\pi x}{L} \right) \]
Eq. 3.3.10
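As a quick numerical illustration, the series in Eq. 3.3.10 can be evaluated by truncating it after a finite number of terms. The Python sketch below is our addition; the values of C0, D, L, the grid and the evaluation time are illustrative assumptions, not taken from the paper.

import numpy as np

# Truncated evaluation of the pure-diffusion series solution, Eq. 3.3.10.
C0, D, L = 1.0, 1e-8, 0.1      # boundary concentration, diffusivity, length (assumed)
x = np.linspace(0.0, L, 101)   # positions across the domain
t = 3600.0                     # evaluation time in seconds (assumed)

def c1(x, t, n_terms=200):
    n = np.arange(n_terms)[:, None]           # series index, broadcast over x
    k = (2 * n + 1) * np.pi / L               # spatial wavenumbers (2n+1)*pi/L
    terms = (4 * C0 / np.pi) / (2 * n + 1) \
            * np.exp(-D * k**2 * t) * np.cos(k * x[None, :])
    return terms.sum(axis=0)

profile = c1(x, t)             # concentration profile C1(x, t) at time t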
Based on the Danckwerts transform [9], the solution of Eq. 3.3.1 can be calculated by the following integral transform:

\[ C(x,t) = k \int_0^t C_1 e^{-k t'} \, dt' + C_1 e^{-kt} \]
Eq. 3.3.11
Finally, after the integration, the final solution of the 1-D RDE (Eq. 3.3.1) can be written as follows:

\[ C(x,t) = \frac{-4C_0}{\pi} \sum_{n=0}^{\infty} \left\langle a_n \cos(\omega_n x) \left\{ k \Psi_n \left[ \exp\!\left( \frac{t}{\Psi_n} \right) - 1 \right] + \exp\!\left( \frac{t}{\Psi_n} \right) \right\} k \right\rangle + C_0 \]
Eq. 3.3.12
where aₙ, ωₙ and Ψₙ are the corresponding coefficients of the series.
The input to the network is a matrix whose columns and rows represent the positions and the time steps, respectively. All components of the input matrix are zero except for the first and last columns, whose values correspond to the boundary condition (in this case C0). Moreover, each row of this matrix represents the concentration distribution at a specified time frame.
Skip connections pass information directly from the input to the output layers; by this procedure, the network is not forced to memorize the structure of the input in its bottleneck layers. The number of layers in our design is arbitrary, and it is possible to add layers to the network as needed.
3.3.3.6 Kernel
To build an intelligent network that can solve the equation at any time and position, it is necessary to express the governing rule of the equation in a simple form for the neural network. This is similar to the way FDM solves the discretized equation. In fact, by discretizing a continuous equation and transferring it into algebraic form, we can observe the governing rule for every point in space and time. By rearranging Eq. 3.3.14, we can find the state of an arbitrary point in the space-time domain based on its neighbors, as shown in:
\[ C_m^{n+1} = C_m^n + B\left( C_{m-1}^n - 2C_m^n + C_{m+1}^n \right) - R\, C_m^n \,\Delta t \]
Eq. 3.3.15
where B is defined as:

\[ B = \frac{D \,\Delta t}{\Delta x^2} \]
Eq. 3.3.16
To transfer this relation among the variables into the neural network, Eq. 3.3.15 has been encoded into the following 3 × 3 convolutional kernel:

\[ \begin{bmatrix} -a & -b & -c \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
Eq. 3.3.17

where

\[ a \rightarrow B , \qquad c \rightarrow B , \qquad b \rightarrow (1 - 2B - R\,\Delta t) \]
Eq. 3.3.18
This kernel is convolved across the network's output matrix, and the result, after normalization, is used to calculate the loss function:

\[ \sum_{ij} \left( \mathrm{Conv2D}(\mathrm{Kernel}, \mathrm{Output})_{ij} \right)^2 \]
Eq. 3.3.19
By minimizing Eq. 3.3.19, the deep neural network drives its solution toward the true values given by Eq. 3.3.15, and changing the boundary and initial conditions trains the network to solve any type of problem governed by reaction–diffusion physics.
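To make the kernel construction concrete, the sketch below (our addition; the array shapes and coefficient values are illustrative assumptions) builds the 3 × 3 kernel of Eqs. 3.3.17–3.3.18 and evaluates the physics loss of Eq. 3.3.19 on a candidate solution matrix whose rows are time steps and columns are positions. Note that deep-learning Conv2D layers actually compute a cross-correlation, which is what correlate2d does here.

import numpy as np
from scipy.signal import correlate2d

def physics_loss(C, B, R_dt):
    # Residual loss of Eq. 3.3.19; C[n, m] is the concentration at time
    # step n and position m, so each row is one snapshot of the field.
    a, c = B, B
    b = 1.0 - 2.0 * B - R_dt
    # 3x3 kernel of Eq. 3.3.17: the centre "1" picks C at time n+1, while
    # the row above subtracts the FDM prediction built from time level n.
    kernel = np.array([[-a, -b, -c],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0]])
    residual = correlate2d(C, kernel, mode="valid")
    return np.sum(residual**2)

# Sanity check: a field generated by the explicit update of Eq. 3.3.15
# should give a (numerically) zero loss.
B, R_dt, C0 = 0.2, 1e-3, 1.0                  # illustrative coefficients
C = np.zeros((50, 101))
C[:, 0] = C[:, -1] = C0                       # Dirichlet boundaries
for n in range(49):
    C[n + 1, 1:-1] = (C[n, 1:-1]
                      + B * (C[n, :-2] - 2 * C[n, 1:-1] + C[n, 2:])
                      - R_dt * C[n, 1:-1])
    C[n + 1, 0] = C[n + 1, -1] = C0
print(physics_loss(C, B, R_dt))               # ~0 for the FDM solution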
3.3.4 Results
In this section, the solutions of the RDE obtained by the analytical and deep learning methods are analyzed and compared with previously validated solutions. The solution of the proposed equation is presented by taking advantage of the numerical method (FDM) for determining the concentration of sulfate ions in concrete. To allow a fair comparison, the boundary and initial conditions of the solution demonstrated in this section were chosen to be exactly the same as in [Zuo et al.]. As noted in the deep learning section, the output of the U-Net network for a specific input is a matrix whose columns and rows represent position and time, and the value of each element gives the concentration at the ith time and jth position. On the other hand, the proposed analytical solution in Sect.

Figure 3.3.2 The contour of the concentration obtained by the analytical solution (D = O(10⁻⁸), R = O(10⁻⁴))
However, it seems that deep learning cannot accurately forecast the concentration values at early time steps such as 1 s. Many reasons can be cited for this shortcoming in predicting the correct values at early time steps, but the most critical and compelling one is the high gradient in this region of space-time.
In Figure 3.3.5, 3D results of the reaction–diffusion solutions obtained by deep learning and by the analytical solution are shown. To give a better perspective on the physics of the reaction–diffusion process, the ratio of the reaction rate to the diffusion coefficient was chosen carefully. This ratio determines whether the physics of the system is pure diffusion, pure reaction, or simultaneous reaction–diffusion. In fact, by choosing the correct range of coefficients, the role of either term in the equation (reaction or diffusion) can be made to outweigh the other.
For this reason, a dimensionless number is used that helps us choose the right ratio of reaction and diffusion coefficients so as to obtain all three states of the solution in our computations. The Damköhler number is an important dimensionless parameter in chemical engineering, which clarifies the role of diffusion, reaction, or simultaneous reaction–diffusion in transport phenomena, and is defined as:

\[ \mathrm{Da} = \frac{\text{rate of reaction}}{\text{rate of diffusion}} \]
Eq. 3.3.20
In our model (Eq. 3.3.1), the Damköhler number is defined as:

\[ \mathrm{Da} = \frac{R L^2}{D} \]
Eq. 3.3.21
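For example, taking the orders of magnitude quoted with Figure 3.3.2, D = O(10⁻⁸) and R = O(10⁻⁴), and assuming an illustrative domain length of L = 0.1, one obtains Da = RL²/D = (10⁻⁴ × 10⁻²)/10⁻⁸ = 10² ≫ 1, which would place that computation in the reaction-dominated regime.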
This number distinguishes the different regimes of reaction–diffusion: Da ≅ 1, Da ≫ 1 and Da ≪ 1 correspond to simultaneous reaction–diffusion, pure reaction, and pure diffusion, respectively. The Mean Square Error (MSE) index is used to quantify the errors of the deep learning model in predicting the correct values of the equation. It is observed that the final value of the concentration in space-time depends on the reaction and diffusion coefficients. However, this dependency does not affect the accuracy of the final results to the point of making them unreliable. To obtain a quantitative assessment of the deep learning solution, we kept one of the coefficients constant and computed the MSE while changing the other; the result of this analysis is reported in Table 3.3.1 (accuracy analysis based on changing coefficients).
3.3.5 Conclusion
In this paper, the capability of weakly supervised learning in solving the transient one-dimensional reaction–diffusion equation has been studied. In addition, an analytical solution for the RDE based on the separation-of-variables technique and the Danckwerts transform has been proposed. It was shown that the results obtained by the deep learning method have great consistency with the analytical and numerical results. Moreover, it was observed that the values of the reaction and diffusion coefficients can cause misestimation by the deep learning model. Although these errors were not at a level that would corrupt our results in this case, they could be a source of destructive faults in other problems such as BSDEs. Finally, it is worth emphasizing that weakly supervised learning could successfully tackle the lack of sufficient labelled data for learning the physics of the governing equations. Furthermore, this method can be considered for complex problems with limited labelled data and complex governing equations.
3.3.6 References
1. Berg J, Nyström K (2018) A unified deep artificial neural network approach to partial differential
equations in complex geometries. Neurocomputing 317:28–41.
2. Berg J, Nyström K (2019) Data-driven discovery of PDEs in complex datasets. J Comput Phys.
3. Ciompi F, de Hoop B, van Riel SJ, Chung K, Scholten ET, Oudkerk M, de Jong PA, Prokop M, van
Ginneken B (2015) Automatic classification of pulmonary peri-fissure nodules in computed
tomography using an ensemble of 2d views and a convolutional neural network out-of-the-box. Med
Image Anal 26(1):195–202.
4. Crank J et al (1979) The mathematics of diffusion. Oxford University Press, Oxford.
5. Guo X, Yan W, Cui R (2019) Integral reinforcement learning-based adaptive NN control for continuous-time nonlinear MIMO systems with unknown control directions. IEEE Trans Syst Man Cybern Syst. https://doi.org/10.1109/TSMC.2019.2897221.
6. Han J, Jentzen A, Weinan E (2018) Solving high-dimensional partial differential equations using deep
learning. Proc Natl Academy Sci 115(34):8505–8510.
7. Kuttler C (2011) Reaction–diffusion equations with applications. Internet seminar.
8. Liu P, Gan J, Chakrabarty RK (2018) Variational autoencoding the Lagrangian trajectories of particles in a combustion system. arXiv preprint arXiv:1811.11896.
9. McNabb A (1993) A generalized Danckwerts transformation. Eur J Appl Math 4(2):189–204.
10. Mohan AT, Gaitonde DV (2018) A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks. arXiv preprint arXiv:1804.09269.
11. Monsefi AK, Zakeri B, Samsam S, Khashehchi M. (2019) Performing software test oracle based
on deep neural network with fuzzy inference system. Grandinetti L, Mirtaheri SL, Shahbazian R (eds)
High-performance computing and big data analysis. Springer, Cham, pp 406–417
12. Morton KW, Mayers DF (2005) Numerical solution of partial differential equations: an
introduction. Cambridge University Press, Cambridge
13. Oquab M, Bottou L, Laptev I, Sivic J (2015) Is object localization for free?-weakly-supervised
learning with convolutional neural networks. Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 685–694.
14. Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning
framework for solving forward and inverse problems involving nonlinear partial differential equations.
J Comput Phys 378:686–707.
15. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. In: International conference on medical image computing and computer-assisted
intervention. Springer, Berlin, pp 234–241.
16. Schell KG, Fett T, Bucharsky EC (2019) Diffusion equation under swelling stresses. SN Appl Sci 1(10):1300. https://doi.org/10.1007/s42452-019-1343-1.
17. Sharma R, Farimani AB, Gomes J, Eastman P, Pande V (2018) Weakly-supervised deep learning of heat transport via physics informed loss. arXiv preprint arXiv:1807.11374.
18. Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, Liang J (2016) Convolutional
neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imaging
35(5):1299–1312.
19. Weinan E, Han J, Jentzen A (2017) Deep learning-based numerical methods for high-dimensional
parabolic partial differential equations and backward stochastic differential equations. Common Math
Stat 5(4):349–380
20. Wu JL, Xiao H, Paterson E (2018) Physics-informed machine learning approach for augmenting
turbulence models: a comprehensive framework. Phys Rev Fluids 3(7):074602
21. Zakeri B, Monsefi AK, Samsam S, Monsefi BK (2019) Weakly supervised learning technique for
solving partial differential equations; case study of 1-d reaction–diffusion equation. Grandinetti L,
Mirtaheri SL, Shahbazian R (eds) High-performance computing and big data analysis. Springer, Cham.
22. Zuo XB, Sun W, Yu C (2012) Numerical investigation on expansive volume strain in concrete
subjected to sulfate attack. Const Build Mater 36:404–410.
3.4 Case Study 4 – Physics-Informed Neural Networks (PINNs) for Fluid Mechanics:
A Review
Authors: Shengze Cai1 · Zhiping Mao2 · Zhicheng Wang3 · Minglang Yin4,5 · George Em Karniadakis1,4
Affiliations: 1 Division of Applied Mathematics, Brown University, Providence, USA
2 School of Mathematical Sciences, Xiamen University, Xiamen, China
3 Laboratory of Ocean Energy Utilization of Ministry of Education, Dalian University of Technology, China
4 School of Engineering, Brown University, Providence, USA
5 Center for Biomedical Engineering, Brown University, Providence, USA
problems. This limitation is associated with the minimization of the loss function, which is a high-dimensional non-convex function, a grand challenge for all neural networks, even in commercial machine learning. However, PINNs perform much more accurately and more efficiently than any CFD solver if any scattered partial spatio-temporal data are available for the flow problem under consideration. Moreover, the forward and inverse PINN formulations are identical, so there is no need for the expensive data assimilation schemes that have stalled progress in the past, especially for optimization and design applications of flow problems.
In this paper we first review the basic principles of PINNs and recent extensions using domain
decomposition for multi-physics and multiscale flow problems. We then present new results for a
three-dimensional (3D) wake formed in incompressible flow behind a circular cylinder. We also
show results for a two-dimensional (2D) supersonic flow past a blunt body, and finally we infer
material parameters in simulating thrombus deformation in a biomedical flow.
3.4.3 PINN
In this section we first review the basic PINN concept and subsequently discuss more recent
advancements in incompressible, compressible and biomedical flows.
where the number of points (denoted by N) for different loss terms can be different. Generally, we
use the ADAM optimizer [16], an adaptive algorithm for gradient-based first-order optimization, to
optimize the model parameters θ.
Remark 1 We note that the definition of the loss function shown in Eq. 3.4.3 is problem-dependent,
hence some terms may disappear for different types of the problem. For example, when we solve a
forward problem in fluid mechanics with the known parameters (λ) and the initial/boundary
conditions of the PDEs, the data loss Ldata is not necessarily required. However, in the cases where the
model parameters or the initial/boundary conditions are unknown (namely, inverse problems), the
data measurements should be taken into account in order to make the optimization problem
solvable. We also note that the PINN framework can be employed to solve an “over-determined”
system, e.g., well-posed in a classical sense with initial and boundary conditions known and
additionally some measurements inside the domain or at boundaries (e.g., pressure measurements).
One of the key procedures in constructing the PDE loss in Eq. 3.4.3 is the computation of partial derivatives, which is addressed by using automatic differentiation (AD). Relying on the combination of the derivatives of a sequence of operations via the chain rule, AD calculates the derivatives of the outputs with respect to the network inputs directly in the computational graph. The partial derivatives can therefore be computed from an explicit expression, hence avoiding the discretization and truncation errors introduced by conventional numerical methods. However, as the PDE and its derivatives are parameterized by neural networks in PINNs, we note that there may exist generalization error and optimization error, depending on the training data and the optimizer, respectively [17]. At the present time, AD has been implemented in various deep learning frameworks [18,19], which makes it convenient for the development of PINNs.

Figure 3.4.1 Schematic of a physics-informed neural network (PINN). A fully-connected neural network, with time and space coordinates (t, x) as inputs, is used to approximate the multi-physics solutions û = [u, v, p, φ]. The derivatives of û with respect to the inputs are calculated using automatic differentiation (AD) and then used to formulate the residuals of the governing equations in the loss function, which is generally composed of multiple terms weighted by different coefficients. The parameters of the neural network θ and the unknown PDE parameters λ can be learned simultaneously by minimizing the loss function
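As a minimal illustration of how AD supplies these derivatives, the PyTorch sketch below (our addition; the tiny network and random inputs are placeholders) computes ∂u/∂t, ∂u/∂x and ∂²u/∂x² of a scalar network output with respect to the input coordinates.

import torch

# Toy fully-connected network standing in for the PINN surrogate u(t, x).
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

tx = torch.rand(100, 2, requires_grad=True)    # columns: (t, x)
u = net(tx)

# First derivatives via reverse-mode AD on the computational graph.
grads = torch.autograd.grad(u, tx, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]
u_t, u_x = grads[:, 0], grads[:, 1]

# Second derivative: differentiate u_x once more with respect to the inputs.
u_xx = torch.autograd.grad(u_x, tx, grad_outputs=torch.ones_like(u_x),
                           create_graph=True)[0][:, 1]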
A schematic of PINNs is shown in Figure 3.4.1, where the key elements (e.g., neural network, AD,
loss function) are indicated in different colors. Here, we consider a multi-physics problem, where the
solutions include the velocity (u, v), pressure p and a scalar field φ, which are coupled in a PDE system f. The schematic in Figure 3.4.1 represents most of the typical problems in fluid mechanics. For
instance, the PDEs considered here can be the Boussinesq approximation of the Navier–Stokes
equations, where φ is the temperature. Following the paradigm in Figure 3.4.1 we will describe the
governing equations, the loss function and the neural network configurations of PINNs case-by-case
in the rest of this paper.
two-dimensional and two-component (2D2C) velocity observations. The proposed algorithm is able
to infer the full velocity and pressure fields very accurately with limited data, which is promising for
diagnosis of complex flows when only 2D measurements (e.g., planar particle image velocimetry) are
available.
Figure 3.4.2 Case study of PINNs for incompressible flows: illustration of simulating the 3D wake flow
over a circular cylinder. a Iso-surface of the vorticity (x-component) in the whole domain color-coded by
the streamwise velocity. The cube with blue edges represents the computational domain in this case. b
Velocity and pressure fields in the domain. The simulation was performed by the CFD solver Nektar,
which is based on the spectral/hp element method [2]
The simulation results of the 3D flow are shown in Figure 3.4.2, where Figure 3.4.2a shows the
iso-surface of streamwise vorticity (ωx = −0.3) color-coded with the streamwise velocity u. In this
section, we are interested in the 3D flow reconstruction problem from limited data, and we only focus
on a subdomain in the wake flow, namely Ωs : [1.5, 7.5] × [−3, 3] ×[4, 9], which is represented by a
cube with blue edges in Figure 3.4.2a. The contours of the three velocity components and pressure
field are shown in Figure 3.4.2b. An Eulerian mesh with 61×61×26 grid points is used for plotting.
To demonstrate the unsteadiness of the motion, we consider 50 snapshots with ∆t = 0.2, which cover
about two periods of the vortex shedding cycle. Here, we aim to apply PINNs for reconstructing the
3D flow field from the velocity observations of a few 2D planes. As illustrated in Figure 3.4.3, three
different “experimental” setups are considered in this paper:
– Case 1: two x-planes (x = 1.5, 7.5), one y-plane (y = 0) and two z-planes (z = 4.0, 9.0) are observed.
– Case 2: two x-planes (x = 1.5, 7.5), one y-plane (y = 0) and one z-plane (z = 6.4) are observed.
– Case 3: one x-plane (x = 1.5), one y-plane (y = 0) and one z-plane (z = 6.4) are observed.
Figure 3.4.3 Case study of PINNs for incompressible flows: problem setup for 3D flow reconstruction
from 2D2C observations. a Case 1: two x-planes (x = 1.5, 7.5), one y-plane (y = 0) and two z-planes (z = 4.0,
9.0) are observed. b Case 2: two x-planes (x = 1.5, 7.5), one y-plane (y = 0) and one z-plane (z = 6.4) are
observed. c Case 3: one x-plane (x = 1.5), one y-plane (y = 0) and one z-plane (z = 6.4) are observed. Note
that for the cross-planes, only the projected vectors are measured. The goal is to infer the 3D flow in the
investigated domain using PINNs from these 2D2C observations
\[ L = L_{DATA} + L_{PDE} \]
Eq. 3.4.4

where

\[ L_{DATA} = \frac{1}{N_u} \sum_{i=1}^{N_u} \left\| u(\mathbf{x}_{DATA}^i, t_{DATA}^i) - u_{DATA}^i \right\|^2 + \frac{1}{N_v} \sum_{i=1}^{N_v} \left\| v(\mathbf{x}_{DATA}^i, t_{DATA}^i) - v_{DATA}^i \right\|^2 + \frac{1}{N_w} \sum_{i=1}^{N_w} \left\| w(\mathbf{x}_{DATA}^i, t_{DATA}^i) - w_{DATA}^i \right\|^2 \]
Eq. 3.4.5

and

\[ L_{PDE} = \frac{1}{N_f} \sum_{i=1}^{N_f} \sum_{j=1}^{4} \left\| f_j(\mathbf{x}_f^i, t_f^i) \right\|^2 , \quad \text{where} \quad f_{1,2,3} = \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} + \nabla p - \frac{1}{Re} \nabla^2 \mathbf{u} , \qquad f_4 = \nabla \cdot \mathbf{u} \]
Eq. 3.4.6
The data loss Ldata is composed of three components, and the number of training data (namely Nu, Nv
and Nw) depends on the number of observed planes, the data resolution of each plane as well as the
number of snapshots. On the other hand, the residual points for LPDE can be randomly selected, and
here we sample Nf = 3×10⁶ points over the investigated space and time domain Ωs. Note that in this
study, the boundary and initial conditions are not required unlike the classical setting. Moreover, no
information about the pressure is given. The weighting coefficients for the loss terms are all equal to
1. A fully-connected neural network with 8 hidden layers and 200 neurons per layer is employed.
The activation function of each neuron is σ = sin(·). We apply the ADAM optimizer with mini-batch
for network training, with a batch size of N = 10⁴ for both data and residual points. The network is trained for 150 epochs with learning rates of 1×10⁻³, 5×10⁻⁴ and 1×10⁻⁴, each used for 50 epochs.
After training, the velocity and pressure fields are evaluated on the Eulerian grid for comparison and
visualization.
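For concreteness, a minimal PyTorch sketch of this setup follows (our addition; the tensor shapes, random stand-in data and the omission of the mini-batch training loop are simplifying assumptions, not the authors' code). It builds the 8×200 fully-connected network with sine activations and assembles the composite loss of Eqs. 3.4.4–3.4.6 from velocity observations and Navier–Stokes residuals.

import torch

class Sine(torch.nn.Module):
    # sigma = sin(.), the activation used in the text.
    def forward(self, x):
        return torch.sin(x)

def make_net(n_in=4, n_out=4, width=200, depth=8):
    # Fully-connected net: inputs (t, x, y, z) -> outputs (u, v, w, p).
    layers = [torch.nn.Linear(n_in, width), Sine()]
    for _ in range(depth - 1):
        layers += [torch.nn.Linear(width, width), Sine()]
    layers.append(torch.nn.Linear(width, n_out))
    return torch.nn.Sequential(*layers)

def grad(out, inp):
    # d(out)/d(inp) through the computational graph (reverse-mode AD).
    return torch.autograd.grad(out, inp, grad_outputs=torch.ones_like(out),
                               create_graph=True)[0]

def pinn_loss(net, txyz_data, uvw_data, txyz_f, Re=100.0):
    # Data loss, Eq. 3.4.5: only the velocity components are observed.
    loss_data = ((net(txyz_data)[:, :3] - uvw_data) ** 2).mean()

    # PDE loss, Eq. 3.4.6: momentum (f1, f2, f3) and continuity (f4).
    txyz = txyz_f.clone().requires_grad_(True)
    out = net(txyz)
    u, v, w, p = (out[:, i:i + 1] for i in range(4))
    du, dv, dw = (grad(q, txyz) for q in (u, v, w))  # columns: d/dt,d/dx,d/dy,d/dz
    dp = grad(p, txyz)

    residuals = [du[:, 1:2] + dv[:, 2:3] + dw[:, 3:4]]          # f4 = div(u)
    for i, dq in enumerate((du, dv, dw)):
        conv = u * dq[:, 1:2] + v * dq[:, 2:3] + w * dq[:, 3:4]  # u . grad(q)
        lap = sum(grad(dq[:, j:j + 1], txyz)[:, j:j + 1] for j in (1, 2, 3))
        residuals.append(dq[:, 0:1] + conv + dp[:, i + 1:i + 2] - lap / Re)
    loss_pde = sum((r ** 2).mean() for r in residuals)

    return loss_data + loss_pde       # equal unit weights, as in the text

# Illustrative usage with random stand-in tensors:
net = make_net()
loss = pinn_loss(net, torch.rand(32, 4), torch.rand(32, 3), torch.rand(32, 4))
loss.backward()

In practice, the ADAM optimizer and mini-batches of 10⁴ points would drive this loss down over the 150 epochs described above; the sketch deliberately omits that loop.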
Figure 3.4.4 Case study of PINNs for incompressible flows: relative L2-norm errors of velocities and pressure for different flow reconstruction setups. These three cases correspond to those shown in Figure 3.4.3. The errors are computed over the entire investigated domain
It can be seen that the PINNs inference result (inferred from a few 2D2C observations) is very
consistent with the CFD simulation. In addition, the velocities (u, v) at a single point (x = 3, y = 0, z =
6.4) against time are plotted in Figure 3.4.5c, where we can find that PINNs can capture the
unsteadiness of vortex shedding flow very accurately.
3.4.5 Case Study for Compressible Flows
PINNs have also been used in simulating high-speed flows [13]. In this section, we consider the
following 2D steady compressible Euler equations:
\[ p = (\gamma - 1)\left( \rho E - \frac{1}{2}\rho \|\mathbf{u}\|^2 \right) \]
Eq. 3.4.9
where γ is the adiabatic index and u = (u, v). We shall employ PINNs to solve the inverse problem of the compressible Euler equations (Eq. 3.4.8). In particular, we shall infer the density, pressure and velocity fields by using PINNs based on information about the density gradients, limited pressure data (pressure on the surface of the body), the inflow conditions and global physical constraints.
Figure 3.4.5 Case study of PINNs for incompressible flows: inference result of PINNs for Case 2. a Iso-
surfaces of vorticity magnitude (top) and pressure (bottom) at t = 8.0 from CFD data. b Iso-surfaces of
vorticity magnitude (top) and pressure (bottom) at t = 8.0 inferred by PINNs. c Point measurement (x = 3,
y = 0, z = 6.4) of velocity (u, v) against time. In this case, the 3D flow is inferred by PINNs from four cross-
planes
\[ M_\infty = 4 , \quad p_\infty = 101253.6~\mathrm{Pa} , \quad \rho_\infty = 1.225~\mathrm{kg/m^3} , \quad u_\infty = 1360.6963~\mathrm{m/s} , \quad v_\infty = 0 , \quad T_\infty = 288~\mathrm{K} \]
Eq. 3.4.11
The data points for the pressure are located on the surface of the body. Using the above inflow conditions and a CFD code, we obtain the steady-state flow. We show the density computed by CFD in the left plot of Figure 3.4.6. We employ a 6 × 60 neural network (6 hidden layers) and train it using the layer-wise adaptive tanh activation function [36] and the Adam optimizer, with a learning rate of 6 × 10⁻⁴, for 3×10⁵ epochs. Here, we also use the technique of dynamic weights [11,37]. The history of the training loss is shown in the right plot of Figure 3.4.6. The results of the PINN solutions for the pressure and velocity (u) are shown in Figure 3.4.7. Observe that the PINN solutions are in good agreement with the CFD data. This indicates that we can reconstruct the flow fields of high-speed flows using other available knowledge.

Figure 3.4.6 Case study of PINNs for compressible flows. Left: the density obtained by CFD simulation with the inlet flow condition (Eq. 3.4.11). Middle: distributions of the residual points (blue ∗), the data points for the density gradient (red +) and the data points for the inflow conditions (magenta ◦). Right: training loss vs. number of epochs
\[ \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} + \nabla p = \nabla \cdot \left( \boldsymbol{\sigma}_{vis} + \boldsymbol{\sigma}_{coh} \right) - \mu \frac{(1 - \phi)\,\mathbf{u}}{2k\phi} , \qquad \nabla \cdot \mathbf{u} = 0 \]
\[ \frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi = \tau \nabla^2 \omega , \qquad \omega = \Delta \phi + \gamma\, g(\phi) \]
Eq. 3.4.12
where g(φ) is the derivative of the double-well potential (φ² − 1)²/4h². The variables u(x, t), p(x, t), σ(x, t) and φ(x, t) represent the velocity, pressure, stress and phase field, respectively. The viscous and cohesive stresses are defined as:

\[ \boldsymbol{\sigma}_{vis} = \mu \nabla^2 \mathbf{u} , \qquad \boldsymbol{\sigma}_{coh} = \lambda \nabla \cdot \left( \nabla \phi \otimes \nabla \phi \right) \]
Eq. 3.4.13
In particular, we set the density ρ = 1, viscosity μ = 0.1, λ = 4.2428×10⁻⁵, τ = 10⁻⁶, and the interface length h = 0.05. We impose a Dirichlet boundary condition for the velocity at the inlet Γi, u = g for (x, t) ∈ Γi × (0, T). A Neumann-type boundary condition, ∂/∂n = 0, is set for φ and ω at all boundaries x ∈ Γw ∪ Γi ∪ Γ0. A more detailed description of the governing equations can be found in Refs. [14,56].

Figure 3.4.7 Case study of PINNs for compressible flows. Comparison between the PINN solutions and the CFD solutions. Top: pressure p. Bottom: velocity component u

3.4.6.1 PINNs
We construct two fully-connected neural networks, NetU and NetW, where the former predicts u, v, p and φ, and the latter predicts the intermediate variable ω. It is preferable to write the governing equation as a system of PDEs, since computing the fourth-order partial derivative of φ is thereby decomposed into computing second-order partial derivatives of φ and ω. Both networks have 20 neurons per layer and 9 hidden layers. The total loss L is a linear combination of the losses from the data, the initial and boundary conditions, and the PDE residuals:
\[ L_{PDE}(\theta, \boldsymbol{\lambda}; X_{PDE}) = \frac{1}{|X_{PDE}|} \sum_{\mathbf{x} \in X_{PDE}} \left\| f(\mathbf{x}; \partial_{\mathbf{x}} \hat{u}, \partial_t \hat{u}, \ldots ; \boldsymbol{\lambda}) \right\|_2^2 \]
\[ L_{BC}(\theta, \boldsymbol{\lambda}; X_{BC}) = \frac{1}{|X_{BC}|} \sum_{\mathbf{x} \in X_{BC}} \left\| B(\hat{u}, \mathbf{x}) \right\|_2^2 \]
\[ L_{IC}(\theta, \boldsymbol{\lambda}; X_{IC}) = \frac{1}{|X_{IC}|} \sum_{\mathbf{x} \in X_{IC}} \left\| \hat{u} - u_{t_0} \right\|_2^2 \]
\[ L_{DATA}(\theta, \boldsymbol{\lambda}; X_{DATA}) = \frac{1}{|X_{DATA}|} \sum_{\mathbf{x} \in X_{DATA}} \left\| \hat{u} - u_{DATA} \right\|_2^2 \]
Eq. 3.4.15
We denote by ωi the weights of each term and randomly sample the training sets XPDE, XBC and XIC from the inner spatial-temporal domain, the boundaries and the initial snapshot, respectively. In addition, we have point measurements Xdata as the known data for minimizing the data loss; |·| denotes the size of a set. Finally, the optimal θ and λ = [κ] are obtained by minimizing the total loss L(θ, λ) iteratively until the loss satisfies the stopping criteria.
Figure 3.4.8 Case study for 2D flow past a thrombus with phase-dependent permeability. a The inlet flow u(t, y) (denoted by φ = 1) enters the channel from the left side. A two-layer thrombus is present at the bottom wall, with a fibrin-clotted core (φ = −1) and a permeable shell (φ = 0); their permeabilities are set as 0.001 and 1, respectively. b Four types of sampling points: initial points, inner points, boundary points and point measurements are sampled accordingly in the spatial-temporal domain. (Figure adapted from [14])
points from the inner spatial-temporal domain, and 1,000 points at the boundaries, to estimate the total loss at each training epoch (shown in Figure 3.4.8b). In the setup shown in Figure 3.4.8a, the core permeability is κ = 0.001 (φ = −1), with a permeable outer shell of κ = 1 (φ = 0). The phase-field variable φ is related to κ through κ(φ) = e^(aφ+b), where a and b are model parameters to be optimized in the PINN model, with true values of 6.90 and 0.0. The form of this relation is not unique, as long as the permeability matches the true values at φ = −1 and φ = 0.
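A hedged sketch of how such unknown constitutive parameters can be exposed to the optimizer follows (our addition; the placeholder networks and initial values are assumptions consistent with the text, not the authors' code). The parameters λ = [a, b] are registered as trainable tensors so that minimizing the total loss updates them alongside the network weights.

import torch

# Placeholder networks standing in for NetU (u, v, p, phi) and NetW (omega).
net_u = torch.nn.Sequential(torch.nn.Linear(3, 20), torch.nn.Tanh(),
                            torch.nn.Linear(20, 4))
net_w = torch.nn.Sequential(torch.nn.Linear(3, 20), torch.nn.Tanh(),
                            torch.nn.Linear(20, 1))

# Unknown constitutive parameters of kappa(phi) = exp(a*phi + b),
# registered as trainable leaves so the optimizer updates them too.
a = torch.nn.Parameter(torch.tensor(1.0))
b = torch.nn.Parameter(torch.tensor(0.0))

def kappa(phi):
    return torch.exp(a * phi + b)    # phase-dependent permeability

opt = torch.optim.Adam(list(net_u.parameters()) + list(net_w.parameters())
                       + [a, b], lr=1e-3)

# Inside the training loop, kappa(phi) enters the friction term of the
# momentum residual, mu * (1 - phi) * u / (2 * kappa(phi) * phi), so the
# loss gradients flow back into a and b.
print(float(kappa(torch.tensor(-1.0))))   # ~0.001 once a, b reach 6.9, 0.0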
development of closure models for unresolved flow dynamics at very high Reynolds numbers using the automatic data assimilation provided by PINNs. Computing flow problems at scale requires efficient multi-GPU implementations, in the spirit of data parallelism [29] or of hybrid data-parallel and model-parallel paradigms as in Ref. [28]. The parallel speed-up obtained for flow simulations so far is very good, suggesting that PINNs can be used in the near future for industrial-complexity problems at scale that CFD methods cannot tackle.
Acknowledgements The research of the second author (ZM) was supported by the National Natural
Science Foundation of China (Grant 12171404). The last author (GEK) would like to acknowledge
support by the Alexander von Humboldt fellowship.
3.4.8 References
1. Brooks, A.N., Hughes, T.J.: Streamline upwind/Petrov-Galerkin formulations for convection
dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Comput.
Methods Appl. Mech. Eng. 32, 199–259 (1982)
2. Karniadakis, G.E., Sherwin, S.: Spectral-hp Element Methods for Computational Fluid Dynamics,
2nd edn. Oxford University Press, Oxford (2005)
3. Katz, A.J.: Meshless Methods for Computational Fluid Dynamics. Stanford University Stanford,
Stanford (2009)
4. Liu, M., Liu, G.: Smoothed particle hydrodynamics (SPH): an overview and recent developments.
Arch. Comput. Methods Eng. 17, 25–76 (2010)
5. Beck, J.V., Blackwell, B., Clair Jr, C.R.S.: Inverse heat conduction: Ill-posed problems. James Beck
(1985)
6. Jasak, H., Jemcov, A., Tukovic, Z., et al.: OpenFOAM: A C++ library for complex physics simulations.
In: International Workshop on Coupled Methods in Numerical Dynamics, vol. 1000, pp. 1–20. IUC
Dubrovnik Croatia (2007)
7. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed learning machine (2021). US Patent
10,963,540
8. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Machine learning of linear differential equations using
Gaussian processes. J. Comput. Phys. 348, 683–693 (2017)
9. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Numerical Gaussian processes for time-dependent and
nonlinear partial differential equations. SIAM J. Sci. Comput. 40, A172–A198 (2018)
10. Raissi, M., Wang, Z., Triantafyllou, M.S., et al.: Deep learning of vortex-induced vibrations. J. Fluid
Mech. 861, 119–137 (2019)
11. Jin, X., Cai, S., Li, H., et al.: NSFnets (Navier-Stokes flow nets): physics-informed neural networks
for the incompressible Navier-Stokes equations. J. Comput. Phys. 426, 109951 (2021)
12. Raissi, M., Yazdani, A., Karniadakis, G.E.: Hidden fluid mechanics: learning velocity and pressure
fields from flow visualizations. Science 367, 1026–1030 (2020)
13. Mao, Z., Jagtap, A.D., Karniadakis, G.E.: Physics-informed neural networks for high-speed flows.
Comput. Methods Appl. Mech. Eng. 360, 112789 (2020)
14. Yin, M., Zheng, X., Humphrey, J.D., et al.: Non-invasive inference of thrombus material properties
with physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 375, 113603 (2021)
15. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning
framework for solving forward and inverse problems involving nonlinear partial differential
equations. J. Comput. Phys. 378, 686–707 (2019)
16. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
(2014)
17. Shin, Y., Darbon, J., Karniadakis, G.E.: On the convergence of physics informed neural networks for
linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys. (2020)
18. Abadi, M., Barham, P., Chen, J., et al.: Tensorflow: a system for large-scale machine learning. In:
12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16), (2016)
19. Paszke, A., Gross, S., Massa, F., et al.: Pytorch: an imperative style, high-performance deep learning
library. arXiv preprint arXiv:1912.01703 (2019)
20. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 (2017)
21. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 (2017)
22. Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: fractional physics informed neural networks. SIAM J. Sci.
Comput. 41, A2603–A2626 (2019)
23. Fang, Z., Zhan, J.: A physics-informed neural network framework for PDEs on 3D surfaces: time independent problems. IEEE Access 8, 26328–26335 (2019)
24. Zhang, D., Guo, L., Karniadakis, G.E.: Learning in modal space: solving time-dependent stochastic
PDEs using physics-informed neural networks. SIAM J. Sci. Comput. 42, A639–A665 (2020)
25. Kharazmi, E., Zhang, Z., Karniadakis, G.E.: hp-VPINNs: variational physics-informed neural
networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 374, 113547 (2021)
26. Jagtap, A.D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural networks on
discrete domains for conservation laws: applications to forward and inverse problems. Comput.
Methods Appl. Mech. Eng. 365, 113028 (2020)
27. Jagtap, A.D., Karniadakis, G.E.: Extended physics-informed neural networks (XPINNs): a
generalized space-time domain decomposition based deep learning framework for nonlinear partial
differential equations. Commun. Comput. Phys. 28, 2002–2041 (2020)
28. Shukla, K., Jagtap, A.D., Karniadakis, G.E.: Parallel physics informed neural networks via domain
decomposition. arXiv preprint arXiv:2104.10013 (2021)
29. Hennigh, O., Narasimhan, S., Nabian, M.A., et al.: NVIDIA SimNet™: an AI-accelerated multi-physics simulation framework. arXiv preprint arXiv:2012.07938 (2020)
30. Yang, Y., Perdikaris, P.: Adversarial uncertainty quantification in physics-informed neural
networks. J. Comput. Phys. 394, 136–152 (2019)
31. Zhang, D., Lu, L., Guo, L., et al.: Quantifying total uncertainty in physics-informed neural networks
for solving forward and inverse stochastic problems. J. Comput. Phys. 397, 2019 (2019)
32. Zhu, Y., Zabaras, N., Koutsourelakis, P.S., et al.: Physics constrained deep learning for high-
dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput.
Phys. 394, 56–81 (2019)
33. Sun, L., Wang, J.X.: Physics-constrained Bayesian neural network for fluid flow reconstruction with sparse and noisy data. Theor. Appl. Mech. Lett. 10, 161–169 (2020)
34. Yang, L., Meng, X., Karniadakis, G.E.: B-PINNs: Bayesian physics-informed neural networks for
forward and inverse PDE problems with noisy data. J. Comput. Phys. 425, 109913 (2021)
35. Meng, X., Karniadakis, G.E.: A composite neural network that learns from multi-fidelity data:
application to function approximation and inverse PDE problems. J. Comput. Phys. 401, (2020)
36. Jagtap, A.D., Kawaguchi, K., Karniadakis, G.E.: Adaptive activation functions accelerate
convergence in deep and physics informed neural networks. J. Comput. Phys. 404, 109136 (2020)
37. Wang, S., Teng, Y., Perdikaris, P.: Understanding and mitigating gradient pathologies in physics-
informed neural networks. arXiv preprint arXiv:2001.04536 (2020)
38. Lu, L., Pestourie, R., Yao, W., et al.: Physics-informed neural networks with hard constraints for
inverse design. arXiv preprint arXiv:2102.04626 (2021)
39. Gao, H., Sun, L., Wang, J.X.: PhyGeoNet: physics-informed geometry-adaptive convolutional neural
networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. (2021)
40. Mishra, S., Molinaro, R.: Estimates on the generalization error of physics informed neural
networks (PINNs) for approximating PDEs. arXiv preprint arXiv:2006.16144 (2020)
41. Mishra, S., Molinaro, R.: Estimates on the generalization error of Physics Informed Neural
Networks (PINNs) for approximating a class of inverse problems for PDEs. arXiv preprint
arXiv:2007.01138 (2020)
42. Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: A neural tangent kernel
perspective. arXiv preprint arXiv:2007.14527 (2020)
43. Kissas, G., Yang, Y., Hwuang, E., et al.: Machine learning in cardiovascular flows modeling: predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 358, 112623 (2020)
44. Yang, X., Zafar, S., Wang, J.X., et al.: Predictive large-eddy simulation wall modeling via physics-
informed neural networks.
45. Lou, Q., Meng, X., Karniadakis, G.E.: Physics-informed neural networks for solving forward and
inverse flow problems via the Boltzmann-BGK formulation. arXiv preprint arXiv:2010.09147 (2020)
46. Cai, S., Wang, Z., Wang, S., et al.: Physics-informed neural networks for heat transfer problems. J.
Heat Transf. 143, 060801 (2021)
47. Cai, S., Wang, Z., Fuest, F., et al.: Flow over an espresso cup: inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks. J. Fluid Mech. 915 (2021)
48. Wang, S., Perdikaris, P.: Deep learning of free boundary and Stefan problems. J. Comput. Phys. 428,
109914 (2021)
49. Lucor, D., Agrawal, A., Sergent, A.: Physics-aware deep neural networks for surrogate modeling of
turbulent natural convection. arXiv preprint arXiv:2103.03565 (2021)
50. Mahmoudabadbozchelou, M., Caggioni, M., Shahsavari, S., et al.: Data-driven physics-informed constitutive metamodeling of complex fluids: a multi-fidelity neural network (MFNN) framework. J. Rheol. 65, 179–198 (2021)
51. Arzani, A., Wang, J.X., D’Souza, R.M.: Uncovering near-wall blood flow from sparse data with
physics-informed neural networks. arXiv preprint arXiv:2104.08249 (2021)
52. Cai, S., Li, H., Zheng, F., et al.: Artificial intelligence velocimetry and microaneurysm-on-a-chip for
three-dimensional analysis of blood flow in physiology and disease. Proc. Natl. Acad. Sci. (2021)
53. Wang, R., Kashinath, K., Mustafa, M., et al.: Towards physics informed deep learning for turbulent
flow prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, pp. 1457–1466 (2020)
54. Goswami, S., Anitescu,C., Chakraborty, S., et al.: Transfer learning enhanced physics informed
neural network for phase-field modeling of fracture. Theor. Appl. Fract. Mech. 106, 102447 (2020)
55. Zhang, E., Yin, M., Karniadakis, G.E.: Physics-informed neural networks for nonhomogeneous
material identification in elasticity imaging. arXiv preprint arXiv:2009.04525 (2020)
56. Zheng, X., Yazdani, A., Li, H., et al.: A three-dimensional phase field model for multiscale modeling
of thrombus biomechanics in blood vessels. PLoS Comput. Biol. 16, e1007709 (2020)
57. Xu, Z., Chen, N., Kamocka, M.M., et al.: A multiscale model of thrombus development. J. R. Soc.
Interface 5, 705–722 (2008)
58. Yazdani, A., Li, H., Humphrey, J.D., et al.: A general shear dependent model for thrombus
formation. PLoS Comput. Biol. 13, e1005291 (2017)
59. Fan, D., Yang, L., Wang, Z., et al.: Reinforcement learning for bluff body active flow control in experiments and simulations. Proc. Natl. Acad. Sci. 117, 26091–26098 (2020)
algorithms and elaborate computer codes. Most importantly, solving real-life physical problems with missing, gappy or noisy boundary conditions through traditional approaches is currently impossible. This is where and why observational data play a crucial role. With the prospect of more than a trillion sensors in the next decade, including airborne, seaborne and satellite remote sensing, a wealth of multi-fidelity observations is ready to be explored through data-driven methods.
However, despite the volume, velocity and variety of available (collected or generated) data streams, in many real cases it is still not possible to seamlessly incorporate such multi-fidelity data into existing physical models.
Mathematical (and practical) data-assimilation efforts have been blossoming; yet the wealth and the spatiotemporal heterogeneity of available data, along with the lack of universally acceptable models, underscore the need for a transformative approach. This is where machine learning (ML) has come into play. It can explore massive design spaces, identify multi-dimensional correlations and manage ill-posed problems. It can, for instance, help to detect climate extremes or statistically predict dynamic variables such as precipitation or vegetation productivity2,3. Deep learning approaches, in particular, naturally provide tools for automatically extracting features from the massive amounts of multi-fidelity observational data that are currently available and characterized by unprecedented spatial and temporal coverage4. They can also help to link these features with existing approximate models and exploit them in building new predictive tools. Even for biophysical and biomedical modelling, this synergistic integration between ML tools and multiscale, multiphysics models has recently been advocated5.
A common current theme across scientific domains is that the ability to collect and create observational data far outpaces the ability to assimilate it sensibly, let alone understand it4 (Box 1).
Box 1
The figure below schematically illustrates three possible categories of physical problems and associated available data. In the small data regime, it is assumed that one knows all the physics, and data are provided for the initial and boundary conditions as well as the coefficients of a partial differential equation. The ubiquitous regime in applications is the middle one, where one knows some data and some physics, possibly missing some parameter values or even an entire term in the partial differential equation, for example, reactions in an advection–diffusion–reaction system. Finally, there is the regime with big data, where one may not know any of the physics, and where a data-driven approach may be most effective, for example, using operator regression methods to discover new physics. Physics-informed machine learning can seamlessly integrate data and the governing physical laws, including models with partially missing physics, in a unified way. This can be expressed compactly using automatic differentiation and neural networks7 that are designed to produce predictions that respect the underlying physical principles.
Despite their towering empirical promise and some preliminary success6, most ML approaches are currently unable to extract interpretable information and knowledge from this data deluge. Moreover, purely data-driven models may fit observations very well, but their predictions may be physically inconsistent or implausible, owing to extrapolation or observational biases that may lead to poor generalization performance. Therefore, there is a pressing need for integrating fundamental physical laws and domain knowledge by 'teaching' ML models about governing physical rules, which can, in turn, provide 'informative priors', that is, strong theoretical constraints and inductive biases on top of the observational ones.
To this end, physics-informed learning is needed, hereby defined as the process by which prior knowledge stemming from our observational, empirical, physical or mathematical understanding of the world can be leveraged to improve the performance of a learning algorithm. A recent example reflecting this new learning philosophy is the family of 'physics-informed neural networks' (PINNs)7. This is a class of deep learning algorithms that can seamlessly integrate data and abstract mathematical operators, including PDEs with or without missing physics (Boxes 2,3). The leading motivation for developing these algorithms is that such prior knowledge or constraints can yield more interpretable ML methods that remain robust in the presence of imperfect data (such as missing or noisy values, outliers and so on) and can provide accurate and physically consistent predictions, even for extrapolation/generalization tasks.
Despite numerous public databases, the volume of useful experimental data for complex physical systems is limited. The specific data-driven approach to the predictive modelling of such systems depends crucially on the amount of data available and on the complexity of the system itself, as illustrated in Box 1. The classical paradigm is shown on the left side of the figure in Box 1, where it is assumed that the only data available are the boundary conditions and initial conditions, whereas the specific governing PDEs and associated parameters are precisely known. On the other extreme (on the right side of the figure), a lot of data may be available, for instance, in the form of time series, but the governing physical law (the underlying PDE) may not be known at the continuum level7-9. For the majority of real applications, the most interesting category is sketched in the center of the figure, where it is assumed that the physics is partially known (that is, the conservation law, but not the constitutive relationship) but several scattered measurements (of a primary or auxiliary state) are available that can be used to infer parameters and even missing functional terms in the PDE while simultaneously recovering the solution.
It is clear that this middle category is the most general case; in fact, it is representative of the other two categories if the measurements are too few or too many. This 'mixed' case may lead to much more complex scenarios, where the solution of the PDEs is a stochastic process due to stochastic excitation or an uncertain material property. Hence, stochastic PDEs can be used to represent these stochastic solutions and uncertainties. Finally, there are many problems involving long-range spatiotemporal interactions, such as turbulence, visco-elasto-plastic materials or other anomalous transport processes, where nonlocal or fractional calculus and fractional PDEs may be the appropriate mathematical language to adequately describe such phenomena, as they exhibit a rich expressivity not unlike that of deep neural networks (DNNs).
Over the past two decades, efforts to account for uncertainty quantification in computer simulations have led to highly parameterized formulations that may include hundreds of uncertain parameters for complex problems, often rendering such computations infeasible in practice. Typically, computer codes at the national labs, and even open-source programs such as OpenFOAM10 or LAMMPS11, have more than 100,000 lines of code, making it almost impossible to maintain and update them from one generation to the next. We believe that it is possible to overcome these fundamental and practical problems using physics-informed learning, seamlessly integrating data and mathematical models, and implementing them using PINNs or other nonlinear regression-based physics-informed networks (PINs) (Box 2).
In this Review, we first describe how to embed physics in ML and how different physics can provide guidance to developing new neural network (NN) architectures. We then present some of the new capabilities of physics-informed learning machines and highlight relevant applications. This is a very fast-moving field, so at the end we provide an outlook, including some thoughts on current limitations. A taxonomy of several existing physics-based methods integrated with ML can also be found in ref.12.
Box 2
Making a learning algorithm physics-informed amounts to introducing appropriate observational, inductive or learning biases that can steer the learning process towards identifying physically consistent solutions (see the figure).
• Observational biases can be introduced directly through data that embody the underlying
physics or carefully crafted data augmentation procedures. Training a machine learning (ML)
system on such data allows it to learn functions, vector fields and operators that reflect the
physical structure of the data.
• Inductive biases correspond to prior assumptions that can be incorporated by tailored interventions to an ML model architecture, such that the predictions sought are guaranteed to implicitly satisfy a set of given physical laws, typically expressed in the form of certain mathematical constraints. One could argue that this is the most principled way of making a learning algorithm physics-informed, as it allows the underlying physical constraints to be strictly satisfied. However, such approaches can be limited to accounting for relatively simple symmetry groups (such as translations, permutations, reflections, rotations and so on) that are known a priori, and may often lead to complex implementations that are difficult to scale.
• Learning biases can be introduced by appropriate choice of loss functions, constraints and
inference algorithms that can modulate the training phase of an ML model to explicitly favor
convergence towards solutions that adhere to the underlying physics.
By using and tuning such soft penalty constraints, the underlying physical laws can only be approximately satisfied; however, this provides a very flexible platform for introducing a broad class of physics-based biases that can be expressed in the form of integral, differential or even fractional equations.
These different modes of biasing a learning algorithm towards physically consistent solutions are not mutually exclusive and can be effectively combined to yield a very broad class of hybrid approaches for building physics-informed learning machines.
3.5.1 How to Embed Physics in ML
No predictive models can be constructed without assumptions and, as a consequence, no generalization performance can be expected from ML models without appropriate biases. Specific to physics-informed learning, there are currently three pathways that can be followed separately or in tandem to accelerate training and enhance generalization of ML models by embedding physics in them (Box 2).
Generalized convolutions are not the only building blocks for designing architectures with strong implicit biases. For example, anti-symmetry under the exchange of input variables can be obtained in NNs by using the determinant of a matrix-valued function32.
Reference33 proposed to combine a physics-based model of bond-order potential with an NN and to divide structural parameters into local and global parts to predict the interatomic potential energy surface in large-scale atomistic modelling. In another work34, an invariant tensor basis was used to embed Galilean invariance into the network architecture, which significantly improved the NN prediction accuracy in turbulence modelling. For the problem of identifying Hamiltonian systems, networks are designed to preserve the symplectic structure of the underlying Hamiltonian system35. For example, ref.36 modified an autoencoder to represent a Koopman operator for identifying coordinate transformations that recast nonlinear dynamics into approximately linear ones.
Specifically for solving differential equations using NNs, architectures can be modified to satisfy exactly the required initial conditions37, Dirichlet boundary conditions37,38, Neumann boundary conditions39,40, Robin boundary conditions41, periodic boundary conditions42,43 and interface conditions41; a minimal sketch of this idea is given below. In addition, if some features of the PDE solutions are known a priori, it is also possible to encode them in network architectures, for example, multiscale features44,45, even/odd symmetries and energy conservation46, high frequencies47 and so on.
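As an illustration of such an inductive bias, the construction below hard-encodes Dirichlet boundary conditions in the spirit of the classical approach of ref.37. This is a minimal sketch, assuming TensorFlow and a 1D problem on (0, 1) with u(0) = a and u(1) = b; the names N and u are illustrative, not taken from a specific library.

import tensorflow as tf

# Interior network N(x; theta)
N = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def u(x, a=0.0, b=1.0):
    # u(0) = a and u(1) = b hold exactly for any network N, because the
    # blending factor x * (1 - x) vanishes on the boundary; the network
    # only learns the interior correction.
    return a * (1.0 - x) + b * x + x * (1.0 - x) * N(x)

With this construction, the boundary conditions are satisfied by design rather than penalized in the loss, which is the defining feature of an inductive (architectural) bias.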
For a specific example, we refer to the recent work in ref.48, which proposed new connections between NN architectures and viscosity solutions to certain Hamilton–Jacobi PDEs (HJ-PDEs). The two-layer architecture depicted in Figure 3.5.1b defines

$$f(x, t) := \min_{i \in \{1, \dots, m\}} \left[ t\, L\!\left( \frac{x - u_i}{t} \right) + a_i \right]$$

Eq. 3.5.1

which is reminiscent of the celebrated Lax–Oleinik formula. Here, x and t are the spatial and temporal variables, L is a convex and Lipschitz activation function, $a_i \in \mathbb{R}$ and $u_i \in \mathbb{R}^n$ are the NN parameters, and m is the number of neurons. It is shown in ref.48 that f is the viscosity solution to the following HJ-PDE
$$\frac{\partial f}{\partial t}(x, t) + H\left( \nabla_x f(x, t) \right) = 0, \quad x \in \mathbb{R}^n, \; t \in (0, \infty)$$

$$f(x, 0) = J(x), \quad x \in \mathbb{R}^n$$

Eq. 3.5.2
where both the Hamiltonian H and the initial data J are explicitly obtained from the parameters and the activation functions of the networks. The Hamiltonian H must be convex, but the initial data J need not be. Note that the results of ref.48 do not rely on universal approximation theorems established for NNs. Rather, the NNs in ref.48 show that the physics contained in certain classes of HJ-PDEs can be naturally encoded by specific NN architectures without any numerical approximation in high dimensions.
Other examples of learning biases include the physics-informed autoencoder of ref.55, which uses a soft constraint to preserve Lyapunov stability, and InvNet56, which is capable of encoding invariances by soft constraints in the loss function.
Box 3
Physics-informed neural networks (PINNs)7 seamlessly integrate the information from both the measurements and partial differential equations (PDEs) by embedding the PDEs into the loss function of a neural network using automatic differentiation. The PDEs could be integer-order PDEs7, integro-differential equations154, fractional PDEs103 or stochastic PDEs42,102. Here, we present the PINN algorithm for solving forward problems using the example of the viscous Burgers' equation

$$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$$

Eq. 3.5.3
with a suitable initial condition and Dirichlet boundary conditions. In the figure, the left (physics-uninformed) network represents the surrogate of the PDE solution u(x, t), while the right (physics-informed) network describes the residual of Eq. 3.5.3. The loss function includes a supervised loss of data measurements of u from the initial and boundary conditions and an unsupervised loss of the PDE residual:

$$\mathcal{L} = w_{\text{data}}\, \mathcal{L}_{\text{data}} + w_{\text{PDE}}\, \mathcal{L}_{\text{PDE}}$$

Eq. 3.5.4

where

$$\mathcal{L}_{\text{data}} = \frac{1}{N_{\text{data}}} \sum_{i=1}^{N_{\text{data}}} \left[ u(x_i, t_i) - u_i \right]^2$$

and

$$\mathcal{L}_{\text{PDE}} = \frac{1}{N_{\text{PDE}}} \sum_{j=1}^{N_{\text{PDE}}} \left[ \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} \right]^2 \Bigg|_{(x_j, t_j)}$$
Here {xi, ti} and {xj, tj} are two sets of points sampled at the initial/boundary locations and in the entire domain, respectively, and ui are the values of u at (xi, ti); wdata and wPDE are the weights used to balance the interplay between the two loss terms. These weights can be user-defined or tuned automatically, and play an important role in improving the trainability of PINNs76,173. The network is trained by minimizing the loss via gradient-based optimizers, such as Adam196 and L-BFGS206, until the loss is smaller than a threshold ε. The PINN algorithm is shown below, and more details about PINNs and a recommended Python library, DeepXDE, can be found in ref.154.
Algorithm 1: The PINN algorithm.
Step 1: Construct a neural network u(x, t; θ), where θ is the set of trainable weights w and biases b, and σ denotes a nonlinear activation function.
Step 2: Specify the measurement data {xi, ti, ui} for u and the residual points {xj, tj} for the PDE.
Step 3: Specify the loss ℒ in Eq. 3.5.4 by summing the weighted losses of the data and the PDE residual.
Step 4: Train the NN to find the best parameters θ* by minimizing the loss ℒ.
Box 3 - Continued
stemming from physics-based likelihood assumptions. Alternatively, a fully Bayesian treatment using Markov chain Monte Carlo methods or variational inference approximations can be used to quantify the uncertainty arising from noisy and gappy data, as discussed below.
3.5.2 Hybrid Approaches
The aforementioned principles of physics-informed ML have their own advantages and limitations; hence, it would be ideal to use these different principles together, and indeed different hybrid approaches have been proposed. For example, non-dimensionalization can recover characteristic properties of a system, and thus it is beneficial to introduce physics bias via appropriate non-dimensional parameters, such as the Reynolds, Froude or Mach numbers. Several methods have been proposed to learn operators that describe physical phenomena13,15,58,59. For example, DeepONets13 have been demonstrated to be a powerful tool for learning nonlinear operators in a supervised, data-driven manner.
What is even more exciting is that by combining DeepONets with the physics encoded by PINNs, it is possible to accomplish real-time accurate predictions with extrapolation in multi-physics applications such as electro-convection60 and hypersonics61. Moreover, when a low-fidelity model is available, a multi-fidelity strategy62 can be developed to facilitate the learning of a complex system. For example, ref.63 combines observational and learning biases through the use of large eddy simulation data and constrained NN training methods to construct closures for lower-fidelity Reynolds-averaged Navier–Stokes models of turbulent fluid flow.
Additional representative use cases include the multi-fidelity NN used in ref.64 to extract material properties from instrumented indentation data, the PINs in ref.65 used to discover constitutive laws of non-Newtonian fluids from rheological data, and the coarse-graining strategies proposed in ref.66. Even if it is not possible to encode the low-fidelity model into the learning directly, the low-fidelity model can still be used through data augmentation, that is, by generating a large amount of low-fidelity data via inexpensive low-fidelity models, which could be simplified mathematical models or existing computer codes, as in ref.64.
Other representative examples include FermiNets32 and graph neural operator methods58. It is also possible to enforce the physics in an NN by embedding the network into a traditional numerical method (such as the finite element method). This approach has been applied to solve problems in many different fields, including nonlinear dynamical systems67, computational mechanics to model constitutive relations68,69, subsurface mechanics70–72, stochastic inversion73 and more74,75.
3.5.3 Connections to Kernel Methods
Many of the presented NN-based techniques have a close asymptotic connection to kernel methods, which can be exploited to produce new insight and understanding. For example, as demonstrated in refs76,77, the training dynamics of PINNs can be understood as a kernel regression method as the width of the network goes to infinity. More generally, NN methods can be rigorously interpreted as kernel methods in which the underlying warping kernel is also learned from data78,79. Warping kernels are a special kind of kernel that were initially introduced to model non-stationary spatial structures in geostatistics80 and have also been used to interpret residual NN models27,80. Furthermore, PINNs can be viewed as solving PDEs in a reproducing kernel Hilbert space spanned by a feature map (parametrized by the initial layers of the network), where the latter is also learned from data. Further connections can be made by studying the intimate connection between statistical inference techniques and numerical approximation.
Existing works have explored these connections in the context of solving PDEs and inverse problems81, optimal recovery82 and Bayesian numerical analysis83–88. Connections between kernel methods and NNs can be established even for large and complicated architectures, such as attention-based transformers89, whereas operator-valued kernel methods90 could offer a viable path for analyzing and interpreting deep learning tools for learning nonlinear operators. In summary, analyzing NN models through the lens of kernel methods could have considerable benefits, as kernel methods are often interpretable and have strong theoretical foundations, which can subsequently help us to understand when and why deep learning methods may fail or succeed.
3.5.4 Connections to Classical Numerical Methods
Classical numerical algorithms, such as Runge–Kutta methods and finite-element methods, have been the main workhorses for studying and simulating physical systems in silico. Interestingly, many modern deep learning models can be viewed and analyzed through their specific correspondence to these classical algorithms. In particular, several architectures that have had tremendous success in practice are analogous to established strategies in numerical analysis. Convolutional NNs, for example, are analogous to finite difference stencils in translationally equivariant PDE discretizations91,92 and share the same structure as the multigrid method93; residual NNs (ResNets, networks with skip connections)94 are analogous to the basic forward Euler discretization of autonomous ordinary differential equations95–98, as the sketch below illustrates; inspection of simple Runge–Kutta schemes (such as RK4) immediately brings forth the analogy with recurrent NN architectures (and even with Krylov-type matrix-free linear algebra methods such as the generalized minimal residual method)95,99. Moreover, the representation of DNNs with the ReLU activation function is equivalent to the continuous piecewise linear functions of the linear finite element method100.
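The ResNet/forward Euler analogy can be stated in a few lines of code. This is a minimal sketch, assuming TensorFlow; the class name ResidualBlock and the step-size parameter h are illustrative, not part of a standard API.

import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    # A residual block x_{k+1} = x_k + h * f(x_k; theta_k) is one explicit
    # Euler step of the ODE dx/dt = f(x; theta); h = 1 in a standard ResNet.
    def __init__(self, width, h=1.0):
        super().__init__()
        self.f = tf.keras.Sequential([
            tf.keras.layers.Dense(width, activation="tanh"),
            tf.keras.layers.Dense(width),
        ])
        self.h = h  # interpreted as the Euler step size

    def call(self, x):
        return x + self.h * self.f(x)  # forward Euler update

Stacking K such blocks corresponds to integrating the ODE for K steps, which is precisely the viewpoint taken by the neural-ODE line of work cited above.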
Such analogies can provide insights and guidance for cross-fertilization, and pave the way for new 'mathematics-informed' meta-learning architectures. For example, ref.7 proposed a discrete-time NN method for solving PDEs that is inspired by an implicit Runge–Kutta integrator: using up to 500 latent stages, this NN method allows very large time-steps and leads to solutions of high accuracy.
deep networks from a statistical physics viewpoint, establishing an intuitive connection between NNs and spin-glass models. In parallel, information propagation in wide DNNs has been studied based on dynamical systems theory117,118, providing an analysis of how network initialization determines the propagation of an input signal through the network, hence identifying a set of hyperparameters and activation functions known as the 'edge of chaos' that ensure information propagation in deep networks.
3.5.9 Tackling High Dimensionality
Deep learning has been very successful in solving high-dimensional problems, such as image classification at fine resolution, language modelling, and high-dimensional PDEs. One reason for this success is that DNNs can break the curse of dimensionality under the condition that the target function is a hierarchical composition of local functions119,120. For example, the authors of ref.121 reformulated general high-dimensional parabolic PDEs using backward stochastic differential equations, approximated the gradient of the solution with DNNs, and then designed the loss based on the discretized stochastic integral and the given terminal condition. In practice, this approach was used to solve high-dimensional Black–Scholes, Hamilton–Jacobi–Bellman and Allen–Cahn equations.
GANs122 have also proven to be fairly successful in generating samples from high-dimensional distributions in tasks such as image or text generation123–125. As for their application to physical problems, in ref.102 the authors used GANs to quantify parametric uncertainty in high-dimensional stochastic differential equations, and in ref.126 GANs were used to learn parameters in high-dimensional stochastic dynamics. These examples show the capability of GANs in modelling high-dimensional probability distributions in physical problems. Finally, in refs127,128 it was demonstrated that, even for operator regression and applications to PDEs, deep operator networks (DeepONets) can tackle the curse of dimensionality associated with the input space.
3.5.10 Uncertainty Quantification
Reliably forecasting the evolution of multiscale and multi-physics systems requires uncertainty quantification. This important issue has received a lot of attention in the past 20 years, augmenting traditional computational methods with stochastic formulations to tackle uncertainty due to the boundary conditions or material properties129–131. For physics-informed learning models, there are at least three sources of uncertainty: uncertainty due to the physics, uncertainty due to the data, and uncertainty due to the learning models.
The first source of uncertainty refers to stochastic physical systems, which are usually described by stochastic PDEs (SPDEs) or stochastic ordinary differential equations (SODEs). The parametric uncertainty arising from the randomness of parameters lies in this category.
In ref.132 the authors demonstrated the use of NNs as a projection function of the input that can recover a low-dimensional nonlinear manifold, and presented results for a problem of uncertainty propagation in an SPDE with an uncertain diffusion coefficient. In the same spirit, in ref.133 the authors used a physics-informed loss function, that is, the expectation of the energy functional of the PDE over the stochastic variables, to train an NN parameterizing the solution of an elliptic SPDE. In ref.51, a conditional convolutional generative model is used to predict the density of a solution, with a physics-informed probabilistic loss function so that no labels are required in the training data. Notably, as models designed to learn distributions, GANs offer a powerful approach to solving stochastic PDEs in high dimensions.
The physics-informed GANs in refs102,134 represent the first such attempts. Leveraging data collected from simultaneous reads at a limited number of sensors for multiple stochastic processes, physics-informed GANs are able to solve a wide range of problems, from forward to inverse, using the same framework. Also, the results so far show the capability of GANs, if properly formulated, to tackle the curse of dimensionality for problems with high stochastic dimensionality.
The second source of uncertainty, in general, refers to aleatoric uncertainty arising from noise in the data and epistemic uncertainty arising from gaps in the data. Such uncertainty can be well handled in the Bayesian framework.
If the physics-informed learning model is based on Gaussian process regression, then it is straightforward to quantify uncertainty and exploit it for active learning and resolution-refinement studies in PDEs23,135, or even to design better experiments136.
Another approach was proposed in ref.107 using Bayesian PINNs (B-PINNs). The authors of ref.107 showed that B-PINNs can provide reasonable uncertainty bounds, which are of the same order as the error and increase as the size of the noise in the data increases; however, how to set the prior for B-PINNs in a systematic way is still an open question.
The third source of uncertainty refers to the limitations of the learning models, for example, the approximation, training and generalization errors of NNs, which are usually hard to quantify rigorously. In ref.137, a convolutional encoder–decoder NN is used to map the source term and the domain geometry of a PDE to the solution as well as the uncertainty, trained by a probabilistic supervised learning procedure with training data coming from finite-element methods.
Notably, a first attempt to quantify the combined uncertainty was given in ref.138, using the dropout method of ref.139 for the learning uncertainty and arbitrary polynomial chaos for the physical randomness. An extension to time-dependent systems and long-time integration was reported in ref.42: it tackled the parametric uncertainty using dynamic and bi-orthogonal modal decompositions of the stochastic PDE, which are effective methods for the long-term integration of stochastic systems.
3.5.11 Applications Highlights
In this section, we discuss some of the capabilities of physics-informed learning through diverse applications. Our emphasis is on inverse and ill-posed problems, which are either difficult or impossible to solve with conventional approaches. We also present several ongoing efforts on developing open-source software for scientific ML.
3.5.12 Some Examples
3.5.12.1 Flow Over an Espresso Cup
In the first example, we discuss how to extract quantitative information on the 3D velocity and pressure fields above an espresso coffee cup140. The input data are based on a video of the temperature gradient (Figure 3.5.2). This is an example of the 'hidden fluid mechanics' introduced in ref.106. It is an ill-posed inverse problem, as no boundary conditions or any other information are provided. Specifically, 3D visualizations obtained using tomographic background-oriented Schlieren (Tomo-BOS) imaging, which measures density or temperature, are used as input to a PINN, which seamlessly integrates the visualization data and the flow and passive scalar governing equations to infer the latent quantities. Here, the physical assumption is the Boussinesq approximation, which is valid if the density variation is relatively small. The PINN uses the space and time coordinates as inputs and infers the velocity and pressure fields; it is trained by minimizing a loss function including a data mismatch of temperature and the residuals of the conservation laws (mass, momentum and energy). Independent experimental results from particle image velocimetry have verified that the Tomo-BOS/PINN approach is able to provide continuous, high-resolution and accurate 3D flow fields.
Figure 3.5.2 Inferring the 3D flow over an espresso cup using the Tomo-BOS imaging system and physics-informed neural networks (PINNs). a | Six cameras are aligned around an espresso cup, recording the distortion of the dot patterns in the panels placed in the background, where the distortion is caused by the density variation of the airflow above the espresso cup. The image data are acquired and processed with LaVision's Tomographic BOS software (DaVis 10.1.1). b | 3D temperature field derived from the refractive index field and reconstructed based on the 2D images from all six cameras. c | Physics-informed neural network (PINN) inference of the 3D velocity field (left) and pressure field (right) from the temperature data. The Tomo-BOS experiment was performed by F. Fuest, Y. J. Jeon and C. Gray from LaVision. The PINN inference and visualization were performed by S. Cai and C. Li at Brown University. Image courtesy of S. Cai and C. Li, Brown University.
3.5.12.2 Physics-Informed 4D-Flow MRI
Conventional workflows for processing 4D-flow magnetic resonance imaging (MRI) data are tedious and empirical, particularly for reconstructing vascular topologies and associated flow conditions.
Recent developments in physics-informed deep learning can greatly enhance the resolution and information content of current MRI technologies, with a focus on 4D-flow MRI. Specifically, it is possible to construct DNNs that are constrained by the Navier–Stokes equations in order to effectively de-noise MRI data and yield physically consistent reconstructions of the underlying velocity and pressure fields that ensure conservation of mass and momentum at an arbitrarily high spatial and temporal resolution. Moreover, the filtered velocity fields can be used to identify regions of no-slip flow, from which one can reconstruct the location and motion of the arterial wall and infer important quantities of interest such as wall shear stresses, kinetic energy and dissipation (Figure 3.5.3). Taken together, these methods can considerably advance the capabilities of MRI technologies in research and clinical scenarios. However, there are potential pitfalls related to the robustness of PINNs, especially in the presence of a high signal-to-noise ratio in the MRI measurements and complex patterns in the underlying flow (for example, due to boundary layers, high-vorticity regions, transient turbulent bursts through a stenosis, tortuous branched vessels and so on). That said, under physiological conditions, blood flow is laminar, a regime under which current PINN models usually remain effective.
Figure 3.5.3 Physics-informed filtering of in-vivo 4D-flow magnetic resonance imaging data of blood flow in a porcine descending aorta. Physics-informed neural network (PINN) models can be used to de-noise and reconstruct clinical magnetic resonance imaging (MRI) data of blood velocity, while constraining this reconstruction to respect the underlying physical laws of momentum and mass conservation, as described by the incompressible Navier–Stokes equations. Moreover, a trained PINN model has the potential to aid the automatic segmentation of the arterial wall geometry and to infer important biomarkers such as blood pressure and wall shear stresses. a | Snapshot of in-vivo 4D-flow MRI measurements. b–d | A PINN reconstruction of the velocity field (panel b), pressure (panel c), and arterial wall surface geometry and wall shear stresses (panel d). The 4D-flow MRI data were acquired by E. Hwuang and W. Witschey at the University of Pennsylvania. The PINN inference and visualization were performed by S. Wang, G. Kissas and P. Perdikaris at the University of Pennsylvania.
3.5.12.3 Uncovering Edge Plasma Dynamics Via Deep Learning From Partial Observations
Predicting turbulent transport at the edge of magnetic confinement fusion devices has been a longstanding goal spanning several decades, and it currently presents significant uncertainties for the particle and energy confinement of fusion power plants. In ref.141 it was demonstrated that PINNs can accurately learn turbulent field dynamics consistent with two-fluid theory from just partial observations of a synthetic plasma, a capability relevant for plasma diagnosis and model validation in challenging thermonuclear environments. Figure 3.5.4 displays the turbulent radial electric field learned by PINNs from partial observations of a 3D synthetic plasma's electron density and temperature141.
Figure 3.5.4 Uncovering edge plasma dynamics. One of the most intensely studied aspects of magnetic confinement fusion is edge plasma behavior, which is critical to reactor performance and operation. The drift-reduced Braginskii two-fluid theory has for decades been widely used to model edge plasmas, with varying success. Using a 3D magnetized two-fluid model, physics-informed neural networks (PINNs) can be used to accurately reconstruct141 the unknown turbulent electric field (middle panel) and underlying electric potential (right panel), directly from partial observations of the plasma's electron density and temperature from a single test discharge (left panel). The top row shows the reference target solution, while the bottom row depicts the PINN model's prediction. These 2D synthetic measurements of electron density and temperature over the duration of a single plasma discharge constitute the only physical dynamics observed by the PINNs from the 3D collisional plasma exhibiting blob-like filaments. ϕ, electric potential; Er, electric field; ne, electron density; Te, electron temperature. Figure courtesy of A. Matthews, MIT.
The performance of NNs, in general, depends on many factors, including the architectures and optimization algorithms, which require further systematic investigation.
Figure 3.5.5 Transitions between metastable states. Results obtained from studying transitions between metastable states of a distribution in a 144-dimensional Allen–Cahn-type system. The top part of the figure shows the two metastable states. The lower part of the figure shows, from left to right, a learned sample path with the characteristic nucleation pathway for a transition between the two metastable states. Here, q is the committor function. Figure courtesy of G. M. Rotskoff, Stanford University, and E. Vanden-Eijnden, Courant Institute.
In ref.147, the limit of molecular dynamics simulations was pushed, with ab initio accuracy, to simulating more than 1-ns-long trajectories of over 100 million atoms per day, using a highly optimized code for DeePMD on the Summit supercomputer. Before this work, molecular dynamics simulations with ab initio accuracy had been performed on systems with up to 1 million atoms147,148.
Table 3.5.1 Major software libraries specifically designed for physics-informed machine learning
Several of these libraries are wrappers, meaning that they wrap low-level functions of other libraries (such as TensorFlow) into relatively high-level functions for easier implementation of physics-informed learning; users still need to implement all the steps to solve the problem.
Software packages such as GPyTorch161 and Neural Tangents162 also enable the study of NNs and
PINNs through the lens of kernel methods. This viewpoint has produced new understanding of the
training dynamics of PINNs, subsequently motivating the design of new effective architectures and
training algorithms76,77.
DeepXDE not only solves integer-order ODEs and PDEs but can also solve integro-differential equations and fractional PDEs. DeepXDE supports complex domain geometries via the technique of constructive solid geometry, and enables the user code to stay compact, closely resembling the mathematical formulation; a minimal usage sketch is given below. DeepXDE is also well structured and highly configurable, since all its components are loosely coupled. We note that in addition to being used as a research tool for solving problems in computational science and engineering, DeepXDE can also be used as an educational tool in diverse courses. Whereas DeepXDE is suitable for education and research, SimNet155, developed by Nvidia, is specifically optimized for Nvidia GPUs and large-scale engineering problems.
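As an illustration, the following is a minimal DeepXDE sketch for the viscous Burgers' equation of Box 3, written against the API of recent DeepXDE releases (module names such as dde.icbc differ between versions); the geometry, network size, point counts and training settings are illustrative choices.

import deepxde as dde
import numpy as np

def pde(x, u):
    # Residual of Eq. 3.5.3; column 0 of x is space, column 1 is time
    u_x = dde.grad.jacobian(u, x, i=0, j=0)
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t + u * u_x - (0.01 / np.pi) * u_xx

geom = dde.geometry.Interval(-1, 1)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

bc = dde.icbc.DirichletBC(geomtime, lambda x: 0,
                          lambda _, on_boundary: on_boundary)
ic = dde.icbc.IC(geomtime, lambda x: -np.sin(np.pi * x[:, 0:1]),
                 lambda _, on_initial: on_initial)

data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                        num_domain=2540, num_boundary=80, num_initial=160)
net = dde.nn.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=15000)  # `epochs=` in older DeepXDE releases

Note how the geometry, the PDE residual and the initial/boundary conditions map one-to-one onto the mathematical formulation of Box 3.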
In PINNs (Box 3), one needs to compute the derivatives of the network outputs with respect to the network inputs. One can compute these derivatives using the automatic differentiation provided by ML packages such as TensorFlow150. For example, ∂U/∂t can be computed using TensorFlow as tf.gradients(U, t), and second-order derivatives can be computed by applying tf.gradients twice, for example:
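A minimal sketch, assuming TensorFlow 1.x-style graph mode, with tensors U, x and t already defined in the computational graph:

# First-order derivative of the network output U with respect to t
U_t = tf.gradients(U, t)[0]
# Second-order derivative: apply tf.gradients twice
U_xx = tf.gradients(tf.gradients(U, x)[0], x)[0]

(tf.gradients returns a list of gradients, hence the [0] indexing.)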
DeepXDE provides a more convenient way to compute higher-order derivatives, for example using dde.grad.hessian to compute the Hessian matrix. Moreover, there are two extra advantages to using dde.grad.hessian: first, it uses lazy evaluation, meaning that it computes an element of the Hessian matrix only when that element is needed, rather than computing the whole Hessian matrix. Second, it caches all the gradients that have already been computed to avoid duplicate computation, even if the user calls the function multiple times in different parts of the code. These two features can speed up the computation in problems where one needs to compute the gradients many times, for example in a system of coupled PDEs.
Most of these libraries (such as DeepXDE and
SimNet) use physics as soft penalty constraints (Box 3), whereas ADCME embeds DNNs in standard scientific numerical schemes (such as Runge–Kutta methods for ODEs, and the finite-difference, finite-element and finite-volume methods for PDEs) to solve inverse problems. ADCME was recently extended to support implicit schemes and nonlinear constraints163,164. To enable truly large-scale scientific computations on large meshes, support for MPI-based domain decomposition methods is also available and was demonstrated to scale very well on complex problems165.
However, the DeepONet framework can be used to infer an operator (instead of a function). In DeepONet, the choice of the underlying architecture can also vary depending on the nature of the available data, such as scattered sensor measurements (multi-layer perceptrons), images (convolutional NNs) or time series (recurrent NNs); a minimal DeepONet sketch is given below. In all the aforementioned cases, the required sample complexity is typically not known a priori and is generally determined by: the strength of the inductive biases used in the architecture; the compatibility between the observed data and the underlying physical law used as regularization; and the complexity of the underlying function or operator to be approximated.
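The following is a minimal DeepONet sketch following the branch/trunk construction of ref.13, where the output is the dot product of the branch and trunk embeddings, G(u)(y) ≈ Σk bk(u) tk(y). The sensor count m, latent dimension p and layer widths are illustrative assumptions.

import tensorflow as tf

m, p = 100, 40  # number of sensors, latent dimension (assumptions)

# Branch net: encodes the input function u sampled at m sensor locations
branch = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(p),
])
# Trunk net: encodes the query location y
trunk = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(p, activation="tanh"),
])

def deeponet(u_sensors, y):
    # u_sensors: (batch, m) function samples; y: (batch, d) query points
    b = branch(u_sensors)
    t = trunk(y)
    return tf.reduce_sum(b * t, axis=-1, keepdims=True)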
3.5.14 Current Limitations
3.5.14.1 Multiscale and Multi-Physics Problems
Despite the recent success of physics-informed learning across a range of applications, multiscale and multi-physics problems require further developments. For example, fully connected NNs have difficulty learning high-frequency functions, a phenomenon referred to in the literature as the 'F-principle'169 or 'spectral bias'170. Additional work171,172 rigorously proved the existence of frequency bias in DNNs and derived convergence rates of training as a function of the target frequency. Moreover, high-frequency features in the target solution generally result in steep gradients, and thus PINN models often struggle to penalize the PDE residuals accurately45. As a consequence, for multiscale problems, the networks struggle to learn high-frequency components and often may fail to train76,173. To address the challenge of learning high-frequency components, one needs to develop new techniques to aid the network learning, such as domain decomposition105, Fourier features174 (a minimal sketch follows below) and multiscale DNNs45,175.
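A minimal sketch of a random Fourier feature embedding in the spirit of ref.174, assuming TensorFlow; the class name, the feature count m and the frequency scale sigma are illustrative assumptions.

import numpy as np
import tensorflow as tf

class FourierFeatures(tf.keras.layers.Layer):
    # Maps x -> [cos(2*pi*B x), sin(2*pi*B x)] with a fixed random matrix B;
    # sigma controls the range of frequencies the network can easily represent.
    def __init__(self, in_dim, m=64, sigma=10.0):
        super().__init__()
        B = np.random.normal(0.0, sigma, size=(in_dim, m)).astype("float32")
        self.B = tf.constant(B)

    def call(self, x):
        proj = 2.0 * np.pi * tf.matmul(x, self.B)
        return tf.concat([tf.cos(proj), tf.sin(proj)], axis=-1)

# Usage: prepend the embedding to a fully connected network
model = tf.keras.Sequential([
    FourierFeatures(in_dim=2),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(1),
])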
Learning multiple physics simultaneously can also be computationally expensive. To address this issue, one may first learn each physics separately and then couple them together. In the DeepM&M method for the problems of electro-convection60 and hypersonics61, several DeepONets were first trained for each field separately; the coupled solutions were subsequently learned through either a parallel or a serial DeepM&M architecture using supervised learning based on additional data for the specific multi-physics problem. It is also possible to learn the physics at a coarse scale by using fine-scale simulation data only in small domains176.
Currently, in NN-based ML methods, the physics-informed loss functions are mainly defined in a pointwise way. Although NNs with such loss functions can be successful in some high-dimensional problems, they may also fail in some special low-dimensional cases, such as the diffusion equation with non-smooth conductivity/permeability177.
3.5.15 New Algorithms and Computational Frameworks
Physics-informed ML models often involve training large-scale NNs with complicated loss functions, which generally consist of multiple terms and thus pose highly non-convex optimization problems178. The terms in the loss function may compete with each other during training. Consequently, the training process may not be robust and sufficiently stable, and thus convergence to the global minimum cannot be guaranteed179.
To resolve this issue, one needs to develop more robust NN architectures and training algorithms for diverse applications. For example, refs76,77,173 have identified two fundamental weaknesses of PINNs, relating spectral bias170 to a discrepancy in the convergence rates of the different components in a PINN loss function, the latter manifested as training instabilities leading to vanishing back-propagated gradients. As discussed in these refs76,77,173, these pathologies can be mitigated by designing appropriate model architectures and new training algorithms for PINNs. Also, ref.104 used the weak form of the PDE and hp-refinement via domain decomposition to enhance the approximation capability of networks.
Other examples include adaptively modifying the activation functions180 or sampling the data points and the residual evaluation points during training181, which accelerate convergence and improve the accuracy.
Is the bottleneck the approximation, the generalization or the optimization? To answer this question, one should analyze the total error in deep learning, which can be decomposed into three types of error: approximation error (can a network approximate the solution of a PDE to any accuracy?), optimization error (can one attain zero or very small training loss?) and generalization error (does a smaller training error imply a more accurate predicted solution?).
It is important to analyze the well-posedness of the problem and the stability and convergence in terms of these errors. In particular, if the operator to be solved is (possibly partially) learned from the data themselves, establishing how well posed any problem involving this operator is becomes an exciting mathematical challenge. The challenge is exacerbated when the initial/boundary/internal conditions are themselves provided as (possibly uncertain) data. This well-posedness issue must be analyzed mathematically, aided by ML computational exploration.
The first mathematical analysis of PINNs for solving forward problems appeared in ref.188, where Hölder regularization was introduced to control the generalization error. Specifically, ref.188 analyzed second-order linear elliptic and parabolic-type PDEs and proved the consistency of the results. References189,190 used quadrature points in the formulation of the loss and provided an abstract error estimate for both forward and inverse problems.
However, no convergence results were reported, as the use of quadrature points does not quantify the generalization error. In subsequent work, ref.191 studied linear PDEs and proposed an abstract error-estimate framework for analyzing both PINNs7 and variational PINNs104,192. Based on compactness assumptions and norm-equivalence relations, sufficient conditions for convergence to the underlying PDE solution were obtained, and the generalization error was handled by the Rademacher complexity. For the continuous loss formulation, refs49,193–195 derived error estimates based on the continuous loss formulations of PINNs. Although known error bounds involving continuous norms (from the PDE literature) may serve as error bounds for (continuous) PINNs, data samples have to be taken into account to quantify the generalization error.
In general, NNs are trained by gradient-based optimization methods, and a new theory should be developed to better understand their training dynamics (gradient descent, stochastic gradient descent, Adam196 and so on). In ref.197, over-parameterized two-layer networks were analyzed, and the convergence of gradient descent was proved for second-order linear PDEs, although the boundary conditions were not included in the analysis. In ref.76, the neural tangent kernel theory198 was extended to PINNs, and it was shown that the training dynamics of PINNs can sometimes be regarded as kernel regression as the width of the network goes to infinity.
It is also helpful to understand the training process of networks by visualizing the loss landscapes of different formulations (strong form, weak form and so on). Furthermore, new methods are being developed rapidly, and thus it is also important to understand the equivalence between models and the equivalence between different loss functions with different norms. Analyzing physics-informed ML models based on rigorous theory calls for a fruitful synergy between deep learning, optimization, numerical analysis and PDE theory that not only has the potential to lead to more robust and effective training algorithms, but also to build a solid foundation for this new generation of computational methods.
3.5.18 Outlook
Physics-informed learning integrates data and mathematical models seamlessly, even in noisy and high-dimensional contexts, and can solve general inverse problems very effectively. Here, we have summarized some of the key concepts in Boxes 1–3 and provided references to frameworks and open-source software for the interested reader to have a head start in exploring physics-informed learning. We also discussed current capabilities and limitations, and highlighted diverse applications from fluid dynamics to biophysics, plasma physics, transitions between metastable states and other applications in materials. Next, we present possible new directions for applications of physics-informed learning machines, as well as research directions that will contribute to their faster training, more accurate predictions and better interpretability for diverse physics applications and beyond.
Although there are tools such as TensorBoard to visualize the model graph and track variables and metrics, physical problems impose extended requirements, such as incorporating multiple physics and complicated geometric domains into the learning algorithm and visualizing the solution field (even high-dimensional ones), as in traditional computing platforms such as FEniCS199, OpenFOAM10 and others. A user-friendly, graph-based ML development environment that addresses these issues could help more practitioners to develop physics-informed ML algorithms for a wide range of diverse physical problems.
3.5.19 Future Directions
3.5.19.1 Digital Twins
'Digital twins', a concept first put forth by General Electric to describe the digital copy of an engine manufactured in their factories, are now becoming a reality in a number of industries. By assimilating real measurements to calibrate computational models, a digital twin aims to replicate the behavior of a living or non-living physical entity in silico. Before these emerging technologies can be translated into practice, a series of fundamental questions need to be addressed. First, observational data can be scarce and noisy, are often characterized by vastly heterogeneous data modalities (images, time series, lab tests, historical data, clinical records and so on), and may not be directly available for certain quantities of interest.
Second, physics-based computational models rely heavily on tedious pre-processing and calibration procedures (such as mesh generation or the calibration of initial and boundary conditions) that typically have a considerable cost, hampering their use in real-time decision-making settings. Moreover, the physical models of many complex natural systems are, at best, 'partially' known as conservation laws, and do not provide a closed system of equations unless appropriate constitutive laws are postulated. Thanks to its natural capability of blending physical models and data, as well as its use of automatic differentiation, which removes the need for mesh generation, physics-informed learning is well placed to become an enabling catalyst in the emerging era of digital twins.
7. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
8. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
9. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse
identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
10. Jasak, H. et al. OpenFOAM: A C++ library for complex physics simulations. Int. Workshop Coupled
Methods Numer. Dyn. 1000, 1–20 (2007).
11. Plimpton, S. Fast parallel algorithms for short- range molecular dynamics. J. Comput. Phys. 117,
1–19 (1995).
12. Jia, X. et al. Physics- guided machine learning for scientific discovery: an application in simulating
lake temperature profiles. Preprint at arXiv https://arxiv.org/abs/2001.11086 (2020).
13. Lu, L., Jin, P., Pang, G., Zhang, Z. & Karniadakis, G. E. Learning nonlinear operators via DeepONet
based on the universal approximation theorem of operators. Nat. Mach. Intell. 3, 218–229 (2021).
14. Kashefi, A., Rempe, D. & Guibas, L. J. A point- cloud deep learning framework for prediction of
fluid flow fields on irregular geometries. Phys. Fluids 33, 027104 (2021).
15. Li, Z. et al. Fourier neural operator for parametric partial differential equations. in Int. Conf. Learn.
Represent. (2021).
16. Yang, Y. & Perdikaris, P. Conditional deep surrogate models for stochastic, high- dimensional, and
multi- fidelity systems. Comput. Mech. 64, 417–434 (2019).
17. LeCun, Y. & Bengio, Y. et al. Convolutional networks for images, speech, and time series. Handb.
Brain Theory Neural Netw. 3361, 1995 (1995).
18. Mallat, S. Understanding deep convolutional networks. Phil. Trans. R. Soc. A 374, (2016).
19. Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A. & Vandergheynst, P. Geometric deep learning:
going beyond Euclidean data. IEEE Signal Process. Mag. 34, 18–42 (2017).
20. Cohen, T., Weiler, M., Kicanaoglu, B. & Welling, M. Gauge equivariant convolutional networks and
the icosahedral CNN. Proc. Machine Learn. Res. 97, 1321–1330 (2019).
21. Owhadi, H. Multigrid with rough coefficients and multiresolution operator decomposition from
hierarchical information games. SIAM Rev. 59, 99–149 (2017).
22. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Inferring solutions of differential equations using
noisy multi- fidelity data. J. Comput. Phys. 335, 736–746 (2017).
23. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Numerical Gaussian processes for time- dependent
and nonlinear partial differential equations. SIAM J. Sci. Comput. 40, A172–A198 (2018).
24. Owhadi, H. Bayesian numerical homogenization. Multiscale Model. Simul. 13, 812–828 (2015).
25. Hamzi, B. & Owhadi, H. Learning dynamical systems from data: a simple cross- validation
perspective, part I: parametric kernel flows. Physica D 421, 132817 (2021).
26. Reisert, M. & Burkhardt, H. Learning equivariant functions with matrix valued kernels. J. Mach.
Learn. Res. 8, 385–408 (2007).
27. Owhadi, H. & Yoo, G. R. Kernel flows: from learning kernels from data into the abyss. J. Comput.
Phys. 389, 22–47 (2019).
28. Winkens, J., Linmans, J., Veeling, B. S., Cohen, T. S. & Welling, M. Improved semantic segmentation
for histopathology using rotation equivariant convolutional networks. in Conf. Med. Imaging Deep
Learn. (2018).
29. Bruna, J. & Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach.
Intell. 35, 1872–1886 (2013).
30. Kondor, R., Son, H. T., Pan, H., Anderson, B. & Trivedi, S. Covariant compositional networks for
learning graphs. Preprint at arXiv https://arxiv.org/abs/1801.02144 (2018).
31. Tai, K. S., Bailis, P. & Valiant, G. Equivariant transformer networks. Proc. Int. Conf. Mach. Learn. 97,
6086–6095 (2019).
32. Pfau, D., Spencer, J. S., Matthews, A. G. & Foulkes, W. M. C. Ab initio solution of the many electron
Schrödinger equation with deep neural networks. Phys. Rev. Res. 2, 033429 (2020).
33. Pun, G. P., Batra, R., Ramprasad, R. & Mishin, Y. Physically informed artificial neural networks for
atomistic modeling of materials. Nat. Commun. 10, 1–10 (2019).
34. Ling, J., Kurzawski, A. & Templeton, J. Reynolds averaged turbulence modelling using deep neural
networks with embedded invariance. J. Fluid Mech. 807, 155–166 (2016).
35. Jin, P., Zhang, Z., Zhu, A., Tang, Y. & Karniadakis, G. E. SympNets: intrinsic structure- preserving
symplectic networks for identifying Hamiltonian systems. Neural Netw. 132, 166–179 (2020).
36. Lusch, B., Kutz, J. N. & Brunton, S. L. Deep learning for universal linear embeddings of nonlinear
dynamics. Nat. Commun. 9, 4950 (2018).
37. Lagaris, I. E., Likas, A. & Fotiadis, D. I. Artificial neural networks for solving ordinary and partial
differential equations. IEEE Trans. Neural Netw. 9, 987–1000 (1998).
38. Sheng, H. & Yang, C. PFNN: A penalty- free neural network method for solving a class of second-
order boundary- value problems on complex geometries. J. Comput. Phys. 428, 110085 (2021).
39. McFall, K. S. & Mahan, J. R. Artificial neural network method for solution of boundary value
problems with exact satisfaction of arbitrary boundary conditions. IEEE Transac. Neural Netw.
(2009).
40. Beidokhti, R. S. & Malek, A. Solving initial- boundary value problems for systems of partial
differential equations using neural networks and optimization techniques. J. Franklin Inst. (2009).
41. Lagari, P. L., Tsoukalas, L. H., Safarkhani, S. & Lagaris, I. E. Systematic construction of neural forms
for solving partial differential equations inside rectangular domains, subject to initial, boundary and
interface conditions. Int. J. Artif. Intell. Tools 29, 2050009 (2020).
42. Zhang, D., Guo, L. & Karniadakis, G. E. Learning in modal space: solving time- dependent stochastic
PDEs using physics- informed neural networks. SIAM J. Sci. Comput. (2020).
43. Dong, S. & Ni, N. A method for representing periodic functions and enforcing exactly periodic
boundary conditions with deep neural networks. J. Comput. Phys. 435, 110242 (2021).
44. Wang, B., Zhang, W. & Cai, W. Multi- scale deep neural network (MscaleDNN) methods for
oscillatory stokes flows in complex domains. Commun. Comput. Phys. 28, 2139–2157 (2020).
45. Liu, Z., Cai, W. & Xu, Z. Q. J. Multi- scale deep neural network (MscaleDNN) for solving Poisson–
Boltzmann equation in complex domains. Commun. Comput. Phys. 28, 1970–2001 (2020).
46. Mattheakis, M., Protopapas, P., Sondak, D., Di Giovanni, M. & Kaxiras, E. Physical symmetries
embedded in neural networks. Preprint at arXiv https://arxiv.org/abs/1904.08991 (2019).
47. Cai, W., Li, X. & Liu, L. A phase shift deep neural network for high frequency approximation and
wave problems. SIAM J. Sci. Comput. 42, A3285–A3312 (2020).
48. Darbon, J. & Meng, T. On some neural network architectures that can represent viscosity solutions
of certain high dimensional Hamilton- Jacobi partial differential equations. J. Comput. Phys. (2021).
49. Sirignano, J. & Spiliopoulos, K. DGM: a deep learning algorithm for solving partial differential
equations. J. Comput. Phys. 375, 1339–1364 (2018).
50. Kissas, G. et al. Machine learning in cardiovascular flows modeling: predicting arterial blood
pressure from non- invasive 4D flow MRI data using physics informed neural networks. Comput.
Methods Appl. Mech. Eng. 358, 112623 (2020).
51. Zhu, Y., Zabaras, N., Koutsourelakis, P. S. & Perdikaris, P. Physics- constrained deep learning for
high- dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput.
Phys. 394, 56–81 (2019).
52. Geneva, N. & Zabaras, N. Modeling the dynamics of PDE systems with physics- constrained deep
auto- regressive networks. J. Comput. Phys. 403, 109056 (2020).
53. Wu, J. L. et al. Enforcing statistical constraints in generative adversarial networks for modeling
chaotic dynamical systems. J. Comput. Phys. 406, 109209 (2020).
54. Pfrommer, S., Halm, M. & Posa, M. Contactnets: learning of discontinuous contact dynamics with
smooth, implicit representations. Preprint at arXiv https://arxiv.org/abs/2009.11193 (2020).
55. Erichson, N.B., Muehlebach, M. & Mahoney, M. W. Physics- informed autoencoders for Lyapunov-
stable fluid flow prediction. Preprint at arXiv https://arxiv.org/abs/1905.10866 (2019).
56. Shah, V. et al. Encoding invariances in deep generative models. Preprint at arXiv
https://arxiv.org/abs/1906.01626 (2019).
57. Geneva, N. & Zabaras, N. Transformers for modeling physical systems. Preprint at arXiv
https://arxiv.org/abs/2010.03957 (2020).
58. Li, Z. et al. Multipole graph neural operator for parametric partial differential equations. in Adv.
Neural Inf. Process. Syst. (2020).
59. Nelsen, N. H. & Stuart, A. M. The random feature model for input–output maps between Banach
spaces. Preprint at arXiv https://arxiv.org/abs/2005.10224 (2020).
60. Cai, S., Wang, Z., Lu, L., Zaki, T. A. & Karniadakis, G. E. DeepM&Mnet: inferring the
electroconvection multiphysics fields based on operator approximation by neural networks. J.
Comput. Phys. 436, 110296 (2020).
61. Mao, Z., Lu, L., Marxen, O., Zaki, T. A. &Karniadakis, G. E. DeepM&Mnet for hypersonics: predicting
the coupled flow and finite- rate chemistry behind a normal shock using neural- network
approximation of operators. Preprint at arXiv https://arxiv.org/abs/2011.03349 (2020).
62. Meng, X. & Karniadakis, G. E. A composite neural network that learns from multi- fidelity data:
application to function approximation and inverse PDE problems. J. Comput. Phys. 401, (2020).
63. Sirignano, J., MacArt, J. F. & Freund, J. B. DPM: a deep learning PDE augmentation method with
application to large- eddy simulation. J. Comput. Phys. 423, 109811 (2020).
64. Lu, L. et al. Extraction of mechanical properties of materials through deep learning from
instrumented indentation. Proc. Natl Acad. Sci. USA 117, 7052–7062 (2020).
65. Reyes, B., Howard, A. A., Perdikaris, P. & Tartakovsky, A. M. Learning unknown physics of
non- Newtonian fluids. Preprint at arXiv https://arxiv.org/abs/2009.01658 (2020).
66. Wang, W. & Gómez- Bombarelli, R. Coarse- graining auto- encoders for molecular dynamics. NPJ
Comput. Mater. 5, 1–9 (2019).
67. Rico- Martinez, R., Anderson, J. & Kevrekidis, I. Continuous- time nonlinear signal processing: a
neural network based approach for gray box identification (IEEE, 1994).
68. Xu, K., Huang, D. Z. & Darve, E. Learning constitutive relations using symmetric positive definite
neural networks. Preprint at arXiv https://arxiv.org/abs/2004.00265 (2020).
69. Huang, D. Z., Xu, K., Farhat, C. & Darve, E. Predictive modeling with learned constitutive laws from
indirect observations. Preprint at arXiv https://arxiv.org/abs/1905.12530 (2019).
70. Xu, K., Tartakovsky, A. M., Burghardt, J. & Darve, E. Inverse modeling of viscoelasticity materials
using physics constrained learning. Preprint at arXiv https://arxiv.org/abs/2005.04384 (2020).
71. Li, D., Xu, K., Harris, J. M. & Darve, E. Coupled time- lapse full- waveform inversion for subsurface
flow problems using intrusive automatic differentiation. Water Resour. Res. 56, (2020).
72. Tartakovsky, A., Marrero, C. O., Perdikaris, P., Tartakovsky, G. & Barajas- Solano, D. Physics-
informed deep neural networks for learning parameters and constitutive relationships in subsurface
flow problems. Water Resour. Res. 56, e2019WR026731 (2020).
73. Xu, K. & Darve, E. Adversarial numerical analysis for inverse problems. Preprint at arXiv
https://arxiv.org/abs/1910.06936 (2019).
74. Yang, Y., Bhouri, M. A. & Perdikaris, P. Bayesian differential programming for robust systems
identification under uncertainty. Proc. R. Soc. A 476, 20200290 (2020).
75. Rackauckas, C. et al. Universal differential equations for scientific machine learning. Preprint at
arXiv https://arxiv.org/abs/2001.04385 (2020).
76. Wang, S., Yu, X. & Perdikaris, P. When and why PINNs fail to train: a neural tangent kernel
perspective. Preprint at arXiv https://arxiv.org/abs/2007.14527 (2020).
77. Wang, S., Wang, H. & Perdikaris, P. On the eigenvector bias of Fourier feature networks: from
regression to solving multi- scale PDEs with physics- informed neural networks. Preprint at arXiv
https://arxiv.org/abs/ 2012.10047 (2020).
123
78. Pang, G., Yang, L. & Karniadakis, G. E. Neural- net-induced Gaussian process regression for
function approximation and PDE solution. J. Comput. Phys. 384, 270–288 (2019).
79. Wilson, A. G., Hu, Z., Salakhutdinov, R. & Xing, E. P. Deep kernel learning. Proc. Int. Conf. Artif. Intell.
Stat. 51, 370–378 (2016).
80. Owhadi, H. Do ideas have shape? Plato’s theory of forms as the continuous limit of artificial neural
networks. Preprint at arXiv https://arxiv.org/abs/2008.03920 (2020).
81. Owhadi, H. & Scovel, C. Operator- Adapted Wavelets, Fast Solvers, and Numerical Homogenization:
From a Game Theoretic Approach to Numerical Approximation and Algorithm Design (Cambridge Univ.
Press, 2019).
82. Micchelli, C. A. & Rivlin, T. J. in Optimal Estimation in Approximation Theory (eds. Micchelli, C. A.
& Rivlin, T. J.) 1–54 (Springer, 1977).
83. Sard, A. Linear Approximation (Mathematical Surveys 9, American Mathematical Society, 1963).
84. Larkin, F. Gaussian measure in Hilbert space and applications in numerical analysis. Rocky Mt. J.
Math. 2, 379–421 (1972).
85. Sul’din, A. V. Wiener measure and its applications to approximation methods. I. Izv. Vyssh. Uchebn.
Zaved. Mat. 3, 145–158 (1959).
86. Diaconis, P. Bayesian numerical analysis. Stat. Decision Theory Relat. Top. IV 1, 163–175 (1988).
87. Kimeldorf, G. S. & Wahba, G. A correspondence between Bayesian estimation on stochastic
processes and smoothing by splines. Ann. Math. Stat. 41, 495–502 (1970).
88. Owhadi, H., Scovel, C. & Schäfer, F. Statistical numerical approximation. Not. Am. Math. Soc. 66,
1608–1617 (2019).
89. Tsai, Y. H. H., Bai, S., Yamada, M., Morency, L. P. & Salakhutdinov, R. Transformer dissection: a
unified understanding of transformer’s attention via the lens of kernel. Preprint at arXiv
https://arxiv.org/abs/ 1908.11775 (2019).
90. Kadri, H. et al. Operator- valued kernels for learning from functional response data. J. Mach. Learn.
Res. 17, 1–54 (2016).
91. González- García, R., Rico- Martínez, R. & Kevrekidis, I. G. Identification of distributed parameter
systems: a neural net based approach. Comput. Chem. Eng. 22, S965–S968 (1998).
92. Long, Z., Lu, Y., Ma, X. & Dong, B. PDE- Net: learning PDEs from data. Proc. Int. Conf. Mach. Learn.
80, 3208–3216 (2018).
93. He, J. & Xu, J. MgNet: a unified framework of multigrid and convolutional neural network. Sci.
China Math. 62, 1331–1354 (2019).
94. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition (IEEE, 2016).
95. Rico- Martinez, R., Krischer, K., Kevrekidis, I., Kube, M. & Hudson, J. Discrete- vs. continuous- time
nonlinear signal processing of Cu electro dissolution data. Chem. Eng. Commun. 118, 25–48 (1992).
96. Weinan, E. A proposal on machine learning via dynamical systems. Commun. Math. Stat. (2017).
97. Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations.
Adv. Neural Inf. Process. Syst. 31, 6571–6583 (2018).
98. Jia, J. & Benson, A. R. Neural jump stochastic differential equations. Adv. Neural Inf. Process. Syst.
32, 9847–9858 (2019).
99. Rico- Martinez, R., Kevrekidis, I. & Krischer, K. in Neural Networks for Chemical Engineers (ed.
Bulsari, A. B.) 409–442 (Elsevier, 1995).
100. He, J., Li, L., Xu, J. & Zheng, C. ReLU deep neural networks and linear finite elements. J. Comput.
Math. 38, 502–527 (2020).
101. Jagtap, A. D., Kharazmi, E. & Karniadakis, G. E. Conservative physics- informed neural networks
on discrete domains for conservation laws: applications to forward and inverse problems. Comput.
Methods Appl. Mech. Eng. 365, 113028 (2020).
102. Yang, L., Zhang, D. & Karniadakis, G. E. Physics- informed generative adversarial networks for
stochastic differential equations. SIAM J. Sci. Comput. 42, A292–A317 (2020).
124
103. Pang, G., Lu, L. & Karniadakis, G. E. fPINNs: fractional physics- informed neural networks. SIAM
J. Sci. Comput. 41, A2603–A2626 (2019).
104. Kharazmi, E., Zhang, Z. & Karniadakis, G. E. hp- VPINNs: variational physics- informed neural
networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 374, 113547 (2021).
105. Jagtap, A. D. & Karniadakis, G. E. Extended physics informed neural networks (XPINNs): a
generalized space- time domain decomposition based deep learning framework for nonlinear partial
differential equations. Commun. Comput. Phys. 28, 2002–2041 (2020).
106. Raissi, M., Yazdani, A. & Karniadakis, G. E. Hidden fluid mechanics: learning velocity and
pressure fields from flow visualizations. Science 367, 1026–1030 (2020).
107. Yang, L., Meng, X. & Karniadakis, G. E. B- PINNs: Bayesian physics- informed neural networks
for forward and inverse PDE problems with noisy data. J. Comput. Phys. 415, 109913 (2021).
108. Wang, S. & Perdikaris, P. Deep learning of free boundary and Stefan problems. J. Comput. Phys.
428, 109914 (2020).
109. Spigler, S. et al. A jamming transition from under- to over- parametrization affects
generalization in deep learning. J. Phys. A 52, 474001 (2019).
110. Geiger, M. et al. Scaling description of generalization with number of parameters in deep
learning. J. Stat. Mech. Theory Exp. 2020, 023401 (2020).
111. Belkin, M., Hsu, D., Ma, S. & Mandal, S. Reconciling modern machine- learning practice and the
classical bias–variance trade- off. Proc. Natl Acad. Sci. USA 116, 15849–15854 (2019).
112. Geiger, M. et al. Jamming transition as a paradigm to understand the loss landscape of deep
neural networks. Phys. Rev. E 100, 012115 (2019).
113. Mei, S., Montanari, A. & Nguyen, P. M. A mean field view of the landscape of two- layer neural
networks. Proc. Natl Acad. Sci. USA 115, E7665–E7671 (2018).
114. Mehta, P. & Schwab, D. J. An exact mapping between the variational renormalization group and
deep learning. Preprint at arXiv https://arxiv.org/abs/1410.3831 (2014).
115. Stoudenmire, E. & Schwab, D. J. Supervised learning with tensor networks. Adv. Neural Inf.
Process. Syst. 29, 4799–4807 (2016).
116. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B. & LeCun, Y. The loss surfaces of multilayer
networks. Proc. Artif. Intell. Stat. 38, 192–204 (2015).
117. Poole, B., Lahiri, S., Raghu, M., Sohl- Dickstein, J. & Ganguli, S. Exponential expressivity in deep
neural networks through transient chaos. Adv. Neural Inf. Process. Syst. 29, 3360–3368 (2016).
118. Yang, G. & Schoenholz, S. Mean field residual networks: on the edge of chaos. Adv. Neural Inf.
Process. Syst. 30, 7103–7114 (2017).
119. Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B. & Liao, Q. Why and when can deep but not
shallow networks avoid the curse of dimensionality: a review. Int. J. Autom. Comput. 14, 503–519
(2017).
120. Grohs, P., Hornung, F., Jentzen, A. & Von Wurstemberger, P. A proof that artificial neural
networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes
partial differential equations. Preprint at arXiv https://arxiv.org/abs/1809.02362 (2018).
121. Han, J., Jentzen, A. & Weinan, E. Solving high dimensional partial differential equations using
deep learning. Proc. Natl Acad. Sci. USA 115, 8505–8510 (2018).
122. Goodfellow, I. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, (2014).
123. Brock, A., Donahue, J. & Simonyan, K. Large scale GAN training for high fidelity natural image
synthesis. in Int. Conf. Learn. Represent. (2019).
124. Yu, L., Zhang, W., Wang, J. & Yu, Y. SeqGAN: sequence generative adversarial nets with policy
gradient (AAAI Press, 2017).
125. Zhu, J.Y., Park, T., Isola, P. & Efros, A. A. Unpaired image- to-image translation using cycle-
consistent adversarial networks (IEEE, 2017).
126. Yang, L., Daskalakis, C. & Karniadakis, G. E. Generative ensemble- regression: learning particle
125
dynamics from observations of ensembles with physics- informed deep generative models. Preprint
at arXiv https://arxiv.org/abs/2008.01915 (2020).
127. Lanthaler, S., Mishra, S. & Karniadakis, G. E. Error estimates for DeepONets: a deep learning
framework in infinite dimensions. Preprint at arXiv https://arxiv.org/abs/2102.09618 (2021).
128. Deng, B., Shin, Y., Lu, L., Zhang, Z. & Karniadakis, G. E. Convergence rate of DeepONets for
learning operators arising from advection–diffusion equations. Preprint at arXiv
https://arxiv.org/abs/2102.10621 (2021).
129. Xiu, D. & Karniadakis, G. E. The Wiener–Askey polynomial chaos for stochastic differential
equations. SIAM J. Sci. Comput. 24, 619–644 (2002).
130. Marzouk, Y. M., Najm, H. N. & Rahn, L. A. Stochastic spectral methods for efficient Bayesian
solution of inverse problems. J. Comput. Phys. 224, 560–586 (2007).
131. Stuart, A. M. Inverse problems: a Bayesian perspective. Acta Numerica 19, 451 (2010).
132. R. K. & Bilionis, I. Deep UQ: learning deep neural network surrogate models for high dimensional
uncertainty quantification. J. Comput. Phys. 375, 565–588 (2018).
133. Karumuri, S., Tripathy, R., Bilionis, I. & Panchal, J. Simulator- free solution of high- dimensional
stochastic elliptic partial differential equations using deep neural networks. J. Comput. Phys. (2020).
134. Yang, Y. & Perdikaris, P. Adversarial uncertainty quantification in physics- informed neural
networks. J. Comput. Phys. 394, 136–152 (2019).
135. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Machine learning of linear differential equations
using Gaussian processes. J. Comput. Phys. 348, 683–693 (2017).
136. Fan, D. et al. A robotic intelligent towing tank for learning complex fluid- structure dynamics.
Sci. Robotics 4, eaay5063 (2019).
137. Winovich, N., Ramani, K. & Lin, G. ConvPDE-UQ: convolutional neural networks with quantified
uncertainty for heterogeneous elliptic partial differential equations on varied domains. J. Comput.
Phys. 394, 263–279 (2019).
138. Zhang, D., Lu, L., Guo, L. & Karniadakis, G. E. Quantifying total uncertainty in physics- informed
neural networks for solving forward and inverse stochastic problems. J. Comput. Phys. 397, (2019).
139. Gal, Y. & Ghahramani, Z. Dropout as a Bayesian approximation: representing model uncertainty
in deep learning. Proc. Int. Conf. Mach. Learn. 48, 1050–1059 (2016).
140. Cai, S. et al. Flow over an espresso cup: inferring 3-D velocity and pressure fields from
tomographic background oriented Schlieren via physics- informed neural networks. J. Fluid Mech.
915 (2021).
141. Mathews, A., Francisquez, M., Hughes, J. & Hatch, D. Uncovering edge plasma dynamics via deep
learning from partial observations. Preprint at arXiv https://arxiv.org/abs/2009.05005 (2020).
142. Rotskoff, G. M. & Vanden- Eijnden, E. Learning with rare data: using active importance sampling
to optimize objectives dominated by rare events. Preprint at arXiv
https://arxiv.org/abs/2008.06334 (2020).
143. Patel, R. G. et al. Thermodynamically consistent physics informed neural networks for
hyperbolic systems. Preprint at https://arxiv.org/abs/2012.05343 (2020).
144. Shukla, K., Di Leoni, P. C., Blackshire, J., Sparkman, D. & Karniadakis, G. E. Physics- informed
neural network for ultrasound nondestructive quantification of surface breaking cracks. J.
Nondestruct. Eval. 39, 1–20 (2020).
145. Behler, J. & Parrinello, M. Generalized neural network representation of high dimensional
potential energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
146. Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular dynamics: a scalable
model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
147. Jia, W. et al. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms
with machine learning. Preprint at arXiv https://arxiv.org/abs/2005.00223 (2020).
148. Nakata, A. et al. Large scale and linear scaling DFT with the CONQUEST code. J. Chem. Phys. 152,
164112 (2020).
126
149. Zhu, W., Xu, K., Darve, E. & Beroza, G. C. A general approach to seismic inversion with automatic
differentiation. Preprint at arXiv https://arxiv.org/abs/2003.06027 (2020).
150. Abadi, M. et al. Tensorflow: a system for large- scale machine learning. Proc. OSDI 16, 265–283
(2016).
151. Paszke, A. et al. PyTorch: an imperative style, high performance deep learning library. Adv.
Neural Inf. Process. Syst. 32, 8026–8037 (2019).
152. Chollet, F. et al. Keras — Deep learning library. Keras https://keras.io (2015).
153. Frostig, R., Johnson, M. J. & Leary, C. Compiling machine learning programs via high level tracing.
in Syst. Mach. Learn. (2018).
154. Lu, L., Meng, X., Mao, Z. & Karniadakis, G. E. DeepXDE: a deep learning library for solving
differential equations. SIAM Rev. 63, 208–228 (2021).
155. Hennigh, O. et al. NVIDIA SimNet: an AI- accelerated multi physics simulation framework.
Preprint at arXiv https://arxiv.org/abs/2012.07938 (2020).
156. Koryagin, A., Khudorozkov, R. & Tsimfer, S. PyDEns: a Python framework for solving differential
equations with neural networks. Preprint at arXiv https://arxiv.org/abs/1909.11544 (2019).
157. Chen, F. et al. NeuroDiffEq: A python package for solving differential equations with neural
networks. J. Open Source Softw. 5, 1931 (2020).
158. Rackauckas, C. & Nie, Q. DifferentialEquations.jl a performant and feature- rich ecosystem for
solving differential equations in Julia. J. Open Res. Softw. 5, 15 (2017).
159. Haghighat, E. & Juanes, R. SciANN: a Keras/TensorFlow wrapper for scientific computations and
physics- informed deep learning using artificial neural networks. Comput. Meth. Appl. Mech. Eng. 373,
113552 (2020).
160. Xu, K. & Darve, E. ADCME: Learning spatially- varying physical fields using deep neural
networks. Preprint at arXiv https://arxiv.org/abs/2011.11955 (2020).
161. Gardner, J. R., Pleiss, G., Bindel, D., Weinberger, K. Q. & Wilson, A. G. Gpytorch: black box matrix–
matrix Gaussian process inference with GPU acceleration. Adv. Neural Inf. Process. Syst. 31, 7587–
7597 (2018).
162. Novak, R. et al. Neural Tangents: fast and easy infinite neural networks in Python. in Conf. Neural
Inform. Process. Syst. (2020).
163. Xu, K. & Darve, E. Physics constrained learning for data- driven inverse modeling from sparse
observations. Preprint at arXiv https://arxiv.org/abs/2002.10521 (2020).
164. Xu, K. & Darve, E. The neural network approach to inverse problems in differential equations.
Preprint at arXiv https://arxiv.org/abs/1901.07758 (2019).
165. Xu, K., Zhu, W. & Darve, E. Distributed machine learning for computational engineering using
MPI. Preprint at arXiv https://arxiv.org/abs/2011.01349 (2020).
166. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learn. Res. 20,
1–21 (2019).
167. He, X., Zhao, K. & Chu, X. AutoML: a survey of the state- of-the- art. Knowl. Based Syst. 212, 106622
(2021).
168. Hospedales, T., Antoniou, A., Micaelli, P. & Storkey, A. Meta- learning in neural networks: a
survey. Preprint at arXiv https://arxiv.org/abs/2004.05439 (2020).
169. Xu, Z.-Q. J., Zhang, Y., Luo, T., Xiao, Y. & Ma, Z. Frequency principle: Fourier analysis sheds light
on deep neural networks. Commun. Comput. Phys. 28, 1746–1767 (2020).
170. Rahaman, N. et al. On the spectral bias of neural networks. Proc. Int. Conf. Mach. Learn. 97, 5301–
5310 (2019).
171. Ronen, B., Jacobs, D., Kasten, Y. & Kritchman, S. The convergence rate of neural networks for
learned functions of different frequencies. Adv. Neural Inf. Process. Syst. 32, 4761–4771 (2019).
172. Cao, Y., Fang, Z., Wu, Y., Zhou, D. X. & Gu, Q. Towards understanding the spectral bias of deep
learning. Preprint at arXiv https://arxiv.org/abs/1912.01198 (2019).
127
173. Wang, S., Teng, Y. & Perdikaris, P. Understanding and mitigating gradient pathologies in physics-
informed neural networks. Preprint at arXiv https://arxiv.org/abs/2001.04536 (2020).
174. Tancik, M. et al. Fourier features let networks learn high frequency functions in low dimensional
domains. Adv. Neural Inf. Process. Syst. 33 (2020).
175. Cai, W. & Xu, Z. Q. J. Multi- scale deep neural networks for solving high dimensional PDEs.
Preprint at arXiv https://arxiv.org/abs/1910.11710 (2019).
176. Arbabi, H., Bunder, J. E., Samaey, G., Roberts, A. J. & Kevrekidis, I. G. Linking machine learning
with multiscale numeric: data- driven discovery of homogenized equations. JOM 72, (2020).
177. Owhadi, H. & Zhang, L. Metric- based upscaling. Commun. Pure Appl. Math. 60, 675–723 (2007).
178. Blum, A. L. & Rivest, R. L. Training a 3-node neural network is NP- complete. Neural Netw. 5,
117–127 (1992).
179. Lee, J. D., Simchowitz, M., Jordan, M. I. & Recht, B. Gradient descent only converges to minimizers.
Annu. Conf. Learn. Theory 49, 1246–1257 (2016).
180. Jagtap, A. D., Kawaguchi, K. & Em Karniadakis, G. Locally adaptive activation functions with slope
recovery for deep and physics- informed neural networks. Proc. R. Soc. A 476, 20200334 (2020).
181. Wight, C. L. & Zhao, J. Solving Allen–Cahn and Cahn–Hilliard equations using the adaptive physics
informed neural networks. Preprint at arXiv https://arXiv.org/abs/2007.04542 (2020).
182. Goswami, S., Anitescu, C., Chakraborty, S. & Rabczuk, T. Transfer learning enhanced physics
informed neural network for phase- field modeling of fracture. Theor. Appl. Fract. Mech. (2020).
183. Betancourt, M. A geometric theory of higher order automatic differentiation. Preprint at arXiv
https://arxiv.org/abs/1812.11592 (2018).
184. Bettencourt, J., Johnson, M. J. & Duvenaud, D. Taylor- mode automatic differentiation for higher-
order derivatives in JAX. in Conf. Neural Inform. Process. Syst. (2019).
185. Newman, D, Hettich, S., Blake, C. & Merz, C. UCI repository of machine learning databases. ICS
http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).
186. Bianco, S., Cadene, R., Celona, L. & Napoletano, P. Benchmark analysis of representative deep
neural network architectures. IEEE Access 6, 64270–64277 (2018).
187. Vlachas, P. R. et al. Backpropagation algorithms and reservoir computing in recurrent neural
networks for the forecasting of complex spatiotemporal dynamics. Neural Networks (2020).
188. Shin, Y., Darbon, J. & Karniadakis, G. E. On the convergence of physics informed neural networks
for linear second- order elliptic and parabolic type PDEs. Commun. Comput. Phys. 28, 2042–2074
(2020).
189. Mishra, S. & Molinaro, R. Estimates on the generalization error of physics informed neural
networks (PINNs) for approximating PDEs. Preprint at arXiv https://arxiv.org/abs/2006.16144
(2020).
190. Mishra, S. & Molinaro, R. Estimates on the generalization error of physics informed neural
networks (PINNs) for approximating PDEs II: a class of inverse problems. Preprint at arXiv
https://arxiv.org/abs/2007.01138 (2020).
191. Shin, Y., Zhang, Z. & Karniadakis, G.E. Error estimates of residual minimization using neural
networks for linear PDEs. Preprint at arXiv https://arxiv.org/abs /2010.08019 (2020).
192. Kharazmi, E., Zhang, Z. & Karniadakis, G. Variational physics- informed neural networks for
solving partial differential equations. Preprint at arXiv https://arxiv.org/abs/1912.00873 (2019).
193. Jo, H., Son, H., Hwang, H. Y. & Kim, E. Deep neural network approach to forward inverse
problems. Netw. Heterog. Media 15, 247–259 (2020).
194. Guo, M. & Haghighat, E. An energy- based error bound of physics- informed neural network
solutions in elasticity. Preprint at arXiv https://arxiv.org/abs/2010.09088 (2020).
195. Lee, J. Y., Jang, J. W. & Hwang, H. J. The model reduction of the Vlasov–Poisson–Fokker–Planck
system to the Poisson–Nernst–Planck system via the deep neural network approach. Preprint at arXiv
https://arxiv.org/abs/2009.13280 (2020).
128
196. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. in Int. Conf. Learn. Represent.
(2015).
197. Luo, T. & Yang, H. Two- layer neural networks for partial differential equations: optimization
and generalization theory. Preprint at arXiv https://arxiv.org/abs/2006.15733 (2020).
198. Jacot, A., Gabriel, F. & Hongler, C. Neural tangent kernel: convergence and generalization in
neural networks. Adv. Neural Inf. Process. Syst. 31, 8571–8580 (2018).
199. Alnæs, M. et al. The FEniCS project version 1.5. Arch. Numer. Softw. 3, 9–23 (2015).
200. Kemeth, F. P. et al. An emergent space for distributed data with hidden internal order through
manifold learning. IEEE Access 6, 77402–77413 (2018).
201. Kemeth, F. P. et al. Learning emergent PDEs in a learned emergent space. Preprint at arXiv
https://arxiv.org/abs/2012.12738 (2020).
202. Defense Advanced Research Projects Agency. DARPA shredder challenge rules. DARPA
https://web.archive.org/web/20130221190250/http://archive.darpa.mil/shredderchallenge/Rul
es.aspx (2011).
203. Rovelli, C. Forget time. Found. Phys. 41, 1475 (2011).
204. Hy, T. S., Trivedi, S., Pan, H., Anderson, B. M. & Kondor, R. Predicting molecular properties with
covariant compositional networks. J. Chem. Phys. 148, 241745 (2018).
205. Hachmann, J. et al. The Harvard clean energy project: large- scale computational screening and
design of organic photovoltaics on the world community grid. J. Phys. Chem. Lett. 2, 2241–2251
(2011).
206. Byrd, R. H., Lu, P., Nocedal, J. & Zhu, C. A limited memory algorithm for bound constrained
optimization. SIAM J. Sci. Comput. 16, 1190–1208 (1995).
Acknowledgements
We thank H. Owhadi (Caltech) for his insightful comments on the connections between NNs and kernel methods. G.E.K. acknowledges support from the DOE PhILMs project (no. DE-SC0019453) and OSD/AFOSR MURI grant FA9550-20-1-0358. I.G.K. acknowledges support from DARPA (PAI and ATLAS programs) as well as an AFOSR MURI grant through UCSB. P.P. acknowledges support from the DARPA PAI program (grant HR00111890034), the US Department of Energy (grant DE-SC0019116), the Air Force Office of Scientific Research (grant FA9550-20-1-0060), and DOE-ARPA (grant 1256545).
Author Contributions
Authors are listed in alphabetical order. G.E.K. supervised the project. All authors contributed equally to
writing the paper.
Competing Interests
The authors declare no competing interests.
Peer review Information
Nature Reviews Physics thanks the anonymous reviewers for their contribution to the peer review of this
work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations. Related links
ADCME: https://kailaix.github.io/ADCME.jl/latest
DeepXDE: https://deepxde.readthedocs.io/
GPyTorch: https://gpytorch.ai/
NeuroDiffEq: https://github.com/NeuroDiffGym/neurodiffeq
NeuralPDE: https://neuralpde.sciml.ai/dev/
Neural Tangents: https://github.com/google/neural-tangents
PyDEns: https://github.com/analysiscenter/pydens
PyTorch: https://pytorch.org
SciANN: https://www.sciann.com/
SimNet: https://developer.nvidia.com/simnet
TensorFlow: www.tensorflow.org
3.6 Case Study 6 - Classification of Machine Learning (ML) Frameworks for Data-
Driven Thermal Fluid Models
Authors : Chih-Wei Chang and Nam T. Dinh
Affiliations : Department of Nuclear Engineering North Carolina State University, Raleigh, NC.
Title of Paper : Classification of Machine Learning Frameworks for Data-Driven Thermal Fluid Models
Citation : (Chang & Dinh, 2019)
Bibliography : Chang, C.-W., & Dinh, N. T. (2019). Classification of Machine Learning Frameworks for Data-Driven Thermal Fluid Models. arXiv:1801.06621 [physics.flu-dyn]
We focus on data-driven Thermal Fluid Simulation (TFS), specifically on its development using Machine Learning (ML). Five ML frameworks are introduced by (Chang & Dinh, 2019):
1 Physics-Separated ML (PSML or Type-I),
2 Physics-Evaluated ML (PEML or Type-II),
3 Physics-Integrated ML (PIML or Type-III),
4 Physics-Recovered ML (PRML or Type-IV),
5 Physics-Discovered ML (PDML or Type-V).
The frameworks vary in their performance for different applications, depending on the level of knowledge of the governing physics and on the source, type, amount, and quality of the data available for training. Notably, as outlined for the first time in this investigation, Type-III models impose stringent requirements on modeling and substantial computing resources for training, but have high potential for extracting value from "big data" in thermal fluid research. First, the investigation demonstrates and explores the ML frameworks on an example, the heat diffusion equation with a nonlinear conductivity model formulated by convolutional neural networks (CNNs) and feedforward neural networks (FNNs), to illustrate the applications of Type-I, Type-II, Type-III, and Type-V ML. The results indicate a preference for Type-II ML under deficient data support; Type-III ML can effectively utilize field data, potentially generating more robust predictions than Type-I and Type-II ML; and CNN-based closures exhibit more predictability than FNN-based closures, although they require more training data to obtain accurate predictions. Second, we illustrate how to employ the Type-I and Type-II ML frameworks for data-driven turbulence modeling using reference works. Third, we demonstrate Type-I ML by building a deep FNN-based slip closure for two-phase flow modeling. The results show that deep FNN-based closures exhibit a bounded error in the prediction domain.
3.6.1 Machine Learning (ML) for Thermal Fluid Simulation
Machine learning (ML) can be used to develop closure models by learning from the available, relevant, and adequately evaluated data78 with nonparametric models. While the concept of ML is not new, the past decade has witnessed a significant growth of capability and interest in machine learning thanks to advances in algorithms, computing power, affordable memory, and the abundance of data. There is a wide range of applications of machine learning in different areas of engineering practice. In the narrow context of the present study, machine learning is defined as the capability to create effective surrogates for a massive amount of data from measurements and simulations. Figure 3.6.1 depicts a workflow for employing ML to develop thermal fluid closures. The objective is to construct a function that represents the unknown model correlating inputs and targets. Since supervised learning is of interest here, inputs and targets are essential; they can be obtained from ARAED.
78 Thermal fluid simulations involve conservation equations with various degrees of averaging from first principles, based on distinct hypotheses. The underlying physics of the conservation equations should be consistent with the experiment or simulation from which the Available, Relevant, and Adequately Evaluated Data (ARAED) are obtained.
[Figure schematic: machine learning maps inputs X = {x_k}, k = 1,2,…,n, to targets Y = {y_k}, k = 1,2,…,n, such that ML(X) ≈ Y, yielding ML-based thermal fluid closures.]
Figure 3.6.1 Workflow of Employing ML Methods for Developing Thermal Fluid Closures – (Courtesy of Chang & Dinh)
Here X denotes the flow feature space (inputs), and Y denotes the response space (targets) associated with the flow features. The subscript k denotes the kth measurement at a certain location. After collecting all relevant datasets, ML models are generalized by a set of nonlinear functions with hyperparameters to represent a thermal fluid closure. Depending on the ML method, various algorithms are employed to seek an optimal solution that allows the ML-based model to fit the observed data. Based on distinct learning purposes, Domingos79 classified ML methods into five tribes: symbolists, evolutionaries, analogizers, connectionists, and Bayesians. [Ling & Templeton]80 evaluated the predictability of various ML algorithms for predicting Reynolds-averaged Navier-Stokes uncertainty in high-Reynolds-number regions.
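To make the workflow of Figure 3.6.1 concrete, the following minimal sketch fits a small feedforward network to synthetic (X, Y) pairs so that ML(X) ≈ Y; the data, network size, and training settings are illustrative placeholders, not the configuration used by Chang & Dinh.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for ARAED: flow features X and closure responses Y.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 3)).astype("float32")
Y = (np.sin(X[:, :1]) + X[:, 1:2] * X[:, 2:3]).astype("float32")  # placeholder closure

# Small FNN surrogate trained so that ML(X) approximates Y.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(3,)),
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=200, batch_size=64, verbose=0)
```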
3.6.2 Thermal Fluid Data
Figure 3.6.2 provides an overall characterization of thermal fluid data by data type, data source, and data quality. Global data are system conditions and integrated variables such as system pressure, mass flow rate, pressure drop, and total heat input. Local data are time series at specific locations. Field data are measurements of field variables resolved in space and time. Traditionally, experiments are the primary source of data, including so-called integral effect tests (IETs) and separate effect tests (SETs). As the names suggest, SETs and IETs are designed to investigate isolated phenomena and complex (tightly coupled) phenomena, respectively. Increasingly, appropriately validated numerical simulations are becoming a credible source of data. This includes high-fidelity numerical simulations (e.g., DNS and other CFD methods), as well as system-level simulations using computer models in parameter domains that are extensively calibrated and validated. It is noted that datasets vary in quality with respect to quantity and uncertainty. The amount of data affects the performance of inverse modeling, since sufficient data can reduce the model parameter uncertainty in the domain of interest. Within the narrow context of ML for thermal fluid simulation, data quality can be characterized by the amount of relevant and adequately evaluated data (i.e., data quantity) and the associated uncertainty (including measurement uncertainty and other biases, e.g., scaling, processing).
[Figure schematic: thermal fluid data characterized by data type (global, local, field), data source (experiment - IET and SET - or simulation), and data quality (quantity and uncertainty).]
Figure 3.6.2 Hierarchy of Thermal Fluid Data - (Courtesy of Chang & Dinh)
Table 3.6.1 Criteria for the ML Framework Classification - (Courtesy of Chang & Dinh)
Figure 3.6.3 Overview of Type-I ML Framework with a Scale Separation Assumption - (Courtesy of Chang & Dinh)
for which the models are local. Type-I ML requires a thorough understanding of the system so that SETs can be designed to support model development. We can apply ML-based closures to assimilate data and achieve data-driven thermal fluid simulation. Figure 3.6.3 depicts the architecture of the Type-I ML framework, which constitutes forward data-driven modeling. The procedure includes the following elements:
3.6.4.5.1 Element 1
Assume a scale separation is achievable such that closure models can be built from SETs. From either
high-fidelity simulations or experiments, collect training data, (xk, yk).
3.6.4.5.2 Element 2
Preprocess the data from Element 1 to ensure that data from multiple sources have consistent dimensions and have undergone the same manipulations, such as the choice of averaging method. Additionally, consider normalizing the data so that the importance of each data source is approximately equalized. For large datasets, employing principal component analysis can help reduce the dimensionality of the data.
3.6.4.5.3 Element 3
Compute flow features or system characteristics, X, as training inputs for Element 5.
3.6.4.5.4 Element 4
Calculate the corresponding outputs (Y) of the desired closures from the data as training targets, which supervise the ML algorithms as they learn from the data.
3.6.4.5.5 Element 5
Utilize ML algorithms to build a correlation between inputs and targets. After the training, output the ML-based closure model, ML(X), to Element 6.
3.6.4.5.6 Element 6
Constrain the ML-based closure, g(ML(X)), to satisfy the model assumptions and to ensure the smoothness of the model outputs, since the closure must be solved together with the PDEs. This element is not essential if no such assumptions apply.
3.6.4.5.7 Element 7
Implement the ML-based closure into the conservation equations, and solve the PDEs for predictions with the embedded ML-based closure, which is queried iteratively; a minimal sketch follows.
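As a sketch of Element 7, assuming a one-dimensional steady heat-conduction problem on a uniform grid: the solver queries a callable closure k(T) on every sweep, and that callable could equally be a trained network's prediction function. The boundary values, grid size, and iteration count below are illustrative.

```python
import numpy as np

def solve_heat_1d(k_closure, T_left=300.0, T_right=1200.0, n=101, n_iter=20000):
    """Solve d/dx[k(T) dT/dx] = 0 by fixed-point iteration, querying the
    (possibly ML-based) closure k_closure(T) on every sweep (Element 7)."""
    T = np.linspace(T_left, T_right, n)
    for _ in range(n_iter):
        k = k_closure(T)                      # closure queried iteratively
        k_face = 0.5 * (k[:-1] + k[1:])       # conductivity at cell faces
        # Jacobi update from the discrete flux balance at interior nodes
        T[1:-1] = (k_face[:-1] * T[:-2] + k_face[1:] * T[2:]) / (k_face[:-1] + k_face[1:])
    return T

# Any callable works here: a placeholder law or a trained NN's predict function.
T = solve_heat_1d(lambda T: 1.0 + 1e-3 * T)
```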
Type-I ML satisfies the criteria in Table 3.6.1 except the third criterion. The quality of the SET data largely controls the performance of closure models obtained by Type-I ML. While the experimental uncertainty in each SET may be controlled and reduced, the process uncertainty (dominated by design assumptions) is irreducible. We note that PDEs and closure relations are decoupled in Type-I ML, which can cause model biases between the conservation equations and the closure relations. It is also noted that inferring model parameters from data belongs to the class of inverse problems, which are ill-posed. For ML models, a small change in inputs can result in large uncertainty in outputs. When ML-based closures are implemented in PDEs, this uncertainty can lead to discontinuities that cause the numerical simulation to fail. For more practices related to Type-I ML, readers are referred to [Ma et al.]81-82, [Parish & Duraisamy]83, [Zhang & Duraisamy]84, among numerous others.
3.6.4.6 Type-II: Physics-Evaluated Machine Learning (PEML)
Type-II ML, or physics-evaluated machine learning (PEML), focuses on reducing the uncertainty of the conservation equations. It requires prior knowledge for selecting closure models to predict thermal fluid behaviors. Type-II ML utilizes high-fidelity data to inform low-fidelity simulations. Compared with high-fidelity models, reduced-order models (ROMs) can efficiently solve engineering design problems within an affordable time frame; however, ROMs may produce significant uncertainty in their predictions. Type-II ML can reduce the uncertainty of low-fidelity simulations using reference data. Since the physics of thermal fluids is nonlinear, ML algorithms are employed to capture the underlying correlations behind high-dimensional data. The framework requires training inputs such as flow features that represent the mean flow properties; the training targets are the responses that correspond to these input flow features. Type-II ML satisfies the first two criteria in Table 3.6.1. We note that PDEs and closure relations are loosely coupled in Type-II ML, because the PDEs are only used to calculate the input flow features. The framework provides a one-step solution to improve low-fidelity simulations. Model uncertainty does not accumulate in Type-II ML because the numerical solvers do not interact with the ML models. However, an open question for Type-II ML is how large the initial error can be before it becomes too late to bring a prior solution toward the reference solution. For more detailed examples of Type-II ML, readers are referred to [Ling & Templeton]85, [Ling et al.]86, and [Wu et al.]87, among others.
81 Ma M., Lu J., Tryggvason G., Using statistical learning to close two-fluid multiphase flow equations for a simple bubbly system, Physics of Fluids, 27 (2015).
82 Tryggvason G., Ma M., Lu J., DNS-Assisted Modeling of Bubbly Flows in Vertical Channels, Nuclear Science and Engineering.
86 Ling J., Jones R., Templeton J., Machine learning strategies for systems with invariance properties, Journal of Computational Physics, 318 (2016) 22-35.
87 Wu J.-L., Wang J.-X., Xiao H., Ling J., Physics-informed machine learning for predictive turbulence modeling: A priori assessment of prediction confidence.
88 Brunton S.L., Proctor J.L., Kutz J.N., Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 113 (2016) 3932-3937.
90 Mills K., Spanner M., Tamblyn I., Deep learning and the Schrödinger equation, (2017).
91 Hanna B.N., Dinh N.T., Youngblood R.W., Bolotnov I.A., Coarse-Grid Computational Fluid Dynamics (CG-CFD)
In the present context of ML, knowledge refers to a body of theoretical and empirical evidence that is available and trustworthy for understanding and describing the physical mechanisms that underlie the thermal fluid processes under consideration. This knowledge can guide the selection of model forms, including conservation equations and the corresponding closure relations, the design of experiments, and the performance of high-fidelity simulations. The data requirements refer to the characteristics of the body of data (e.g., types, amount, quality) needed to enable thermal fluid simulation with the required accuracy. In other words, the required data must be sufficient to complement the "knowledge" for building closure models and recovering/discovering the physics. The form of the PDEs is known for Type-I, Type-II, and Type-III ML, and the focus is on building closure relations. In traditional modeling approaches, closure models are local, relating a group of (local) source terms (i.e., sub-grid-scale interactions) to a group of (local) flow features. Even when, in the engineering literature, source terms are expressed in terms of global parameters (such as flow rate or system pressure), those parameters are used as surrogates for local-valued parameters (through assumptions that equate global and local conditions).
Type-I ML builds closure relations independently of the PDEs, but it requires a thorough (or assumed) understanding of the physics, which is essential for setting up the SETs that acquire the data. Globally measured data, or locally measured data obtained with point instruments, amount to very small datasets; in such cases, complicated ML-based closures are not necessarily the best choice. Therefore, among the frameworks, Type-I ML exhibits a minimal data requirement with a maximal knowledge requirement.
Type-II ML assumes prior knowledge of the physics that guides the selection of closure relations for thermal fluid simulation. However, the use of prior models yields uncertainty in thermal fluid analyses. This uncertainty (or error) can be inferred by comparing the model prediction to reference solutions from high-fidelity simulations and high-resolution experiments, as well as data obtained in IETs that include multi-physics phenomena. Correspondingly, Type-II ML requires larger data quantities but less knowledge than Type-I ML.
Type-III ML trains closure relations that are embedded in conservation equations without invoking a scale separation assumption. IET data can be directly assimilated into the simulation by applying Type-III ML. While the term ML is broad, in the present work ML refers to the use of non-parametric models or, even more narrowly, the use of DNNs. This means no prior knowledge of the model forms of the closure relations is assumed. Thus, Type-III ML requires less knowledge than Type-II ML (which uses "best-estimate" closure models built on past data). Consequently, Type-III ML requires a larger body of data to represent the models than Type-II ML does.
Type-IV ML intends to avoid any bias in selecting the conservation equations; instead, it recovers the exact PDE form from data. It assumes less prior knowledge but requires more extensive training data than the previous three frameworks. Type-V ML is an extreme case that makes no assumption about prior knowledge or reference solutions for the thermal fluid systems under consideration. The aim is to apply ML methods to learn from data, and to establish a data-driven predictive capability. For thermal fluid simulation, this means discovering the effective model form of the conservation equations and closure relations. Accordingly, among the frameworks, Type-V ML is the most stringent with respect to data requirements (types, quantity, and quality). Figure 3.6.4 depicts the domain of the ML frameworks with regard to prior knowledge and data requirements.
Figure 3.6.4 Domain of Various ML Frameworks, where L, M, and H Denote Low, Medium, and High - (Courtesy of Chang & Dinh)
3.6.4.11 Case Study 1.1 - Heat Conduction Investigation by Type I ML Framework
The heat conduction case study is formulated to demonstrate how to employ Type-I, Type-II, and Type-III ML to build ML-based thermal conductivity and to compare the results from each framework. [Chanda et al.] used an ANN with a genetic algorithm92 to solve inverse heat conduction problems. In this work, Deep Learning (DL)93 is selected as the ML methodology. In principle, any neural network (NN) with more than two layers (one hidden layer plus an output layer) is considered to be DL94. [Hornik]95 proved that multilayer NNs are universal approximators that can represent any measurable function. This capability makes DL attractive for closure development in thermal fluid simulation. Notably, we implement NN-based thermal conductivity with both FNNs and convolutional neural networks (CNNs) to evaluate the performance of closure relations built with distinct NNs.
3.6.4.11.1 Problem Formulation
We formulate the synthetic task using a two-dimensional (2D) heat conduction model, given by Eq. 3.6.1, where k(T) is a nonlinear thermal conductivity. To generate training data, Eq. 3.6.1 also gives a temperature-dependent (Gaussian) model for k(T), where c, σ, and μ are constant parameters. Table 3.6.2 gives the two parameter sets (baseline and prior) used to generate data. When demonstrating the ML frameworks, k(T) is replaced by the NN-based thermal conductivity.

$$\frac{\partial}{\partial x}\left[k(T)\,\frac{\partial T}{\partial x}\right] + \frac{\partial}{\partial y}\left[k(T)\,\frac{\partial T}{\partial y}\right] = 0\,, \qquad k(T) = \frac{c}{\sigma\sqrt{2\pi}}\;e^{-\frac{(T-\mu)^{2}}{2\sigma^{2}}}$$

Eq. 3.6.1
Data Set | c (W/m) | σ (K) | μ (K)
Baseline set for producing synthetic data | 7.2×10⁴ | 300 | 1200
Prior set for producing inputs required by Type-II ML | 7.2×10⁴ | 600 | 2100
Table 3.6.2 Parameter Sets for the Thermal Conductivity Model - (Courtesy of Chang & Dinh)
Two numerical experiments are designed to emulate IETs and SETs for manufacturing synthetic data by solving Eq. 3.6.1 with the parameter sets in Table 3.6.2. The IETs provide field data, for instance 2D temperature fields from an infrared camera. The SETs offer global data, such as 1D measurements by thermocouples. The synthetic data are used for training and validating the NN-based thermal conductivity.
92 Hanna B.N., Dinh N.T., Youngblood R.W., Bolotnov I.A., Coarse-Grid Computational Fluid Dynamics (CG-CFD) Error Prediction using Machine Learning, under review, (2017).
93 LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 521 (2015) 436-444.
94 Heaton J., Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks, Heaton Research.
Table 3.6.3 Summary of IET Training and Validating Data Sets - (Courtesy of Chang & Dinh)
We prepare three training datasets with distinct data quantities and three validating datasets obtained by changing Twest. Table 3.6.3 gives the metadata of each training and validating dataset. All observations are uniformly sampled within a given temperature range.
3.6.4.11.3 Manufacturing SET Data
SETs are global measurements by thermocouples. Figure 3.6.6 depicts the layout of the SETs for obtaining mean temperature and heat conductivity data. A heater on top of the sample maintains a constant temperature (TH). Thermal insulation is installed on the outside surfaces, and a coolant at the bottom removes the heat with a constant heat transfer coefficient. Eq. 3.6.2 determines the temperature profile within the sample, using the parameter sets in Table 3.6.2. Eq. 3.6.2 also defines the observed heat conductivity (kobs) through the heat flux balance at the cooled surface, and the mean temperature is obtained by arithmetically averaging TH and TC.
Figure 3.6.6 Schematic of Separate Effect Tests (SETs) for Measuring Thermal Conductivity as a Function of the Sample's Mean Temperature - (Courtesy of Chang & Dinh)

$$\frac{\partial}{\partial x}\left[k(T)\,\frac{\partial T}{\partial x}\right] = 0\,, \qquad k_{\mathrm{obs}}\,\frac{T_H - T_C}{H} = h\,(T_C - T_{\mathrm{coolant}})$$

Eq. 3.6.2
We generate two training datasets with two coolant temperatures to explore the effect of different data qualities; Table 3.6.4 shows the metadata of the SET datasets. A large temperature gradient across the test sample increases the nonlinearity of the temperature profiles. For each training set, we uniformly sample 41 values of TH according to Eq. 3.6.3 to keep the mean temperatures in the SETs within the same range as in the IETs.
Table 3.6.4 Summary of SET Training Datasets - (Courtesy of Chang & Dinh)
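The SET data reduction of Eq. 3.6.2 amounts to a heat-flux balance; a small sketch follows, with all numerical values as placeholders rather than the study's actual conditions.

```python
def k_observed(T_H, T_C, T_coolant, h, H):
    """Rearranged Eq. 3.6.2: k_obs * (T_H - T_C) / H = h * (T_C - T_coolant)."""
    return h * (T_C - T_coolant) * H / (T_H - T_C)

k_obs = k_observed(T_H=1500.0, T_C=900.0, T_coolant=500.0, h=1000.0, H=0.1)
T_mean = 0.5 * (1500.0 + 900.0)   # mean temperature: arithmetic average of T_H and T_C
```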
96 Chih-Wei Chang and Nam T. Dinh, "Classification of Machine Learning Frameworks for Data-Driven Thermal Fluid Models", North Carolina State University, Raleigh NC 27695-7909.
97 LeCun Y., Bottou L., Bengio Y., Haffner P., Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998) 2278-2324.
Eight feature maps are generated in the first convolutional layer, each detecting patterns in the temperature fields. The second convolutional layer takes inputs from the previous layer and outputs 12 feature maps. The third convolutional layer receives inputs from the previous layer and delivers 24 feature maps to the fully connected layers. Finally, we obtain the thermal conductivity fields from the CNN's outputs.
Learning is an optimization process, and we need to define a cost function, based on the distinct types of data, that informs the ML algorithms how to tune the NN hyperparameters. Eq. 3.6.4 defines the cost function, where N, yi,data, and yi,model are the total number of training data, the ith training datum, and the ith model solution, respectively. To prevent overfitting, we add a regularization term in Eq. 3.6.4, where i and NL denote the layer index and the total number of layers, λi is the regularization strength, and Wi is the matrix of weights in the ith layer. We implement the NN-based thermal conductivity using TensorFlow99, the DL framework developed by Google. The weights and biases of the NNs are tuned from the data using the Adam100 algorithm.
$$E = \frac{1}{2N}\sum_{i=1}^{N}\left(y_{i,\mathrm{model}} - y_{i,\mathrm{data}}\right)^{2} + \sum_{i=1}^{N_L}\lambda_i\,\lVert W_i \rVert^{2}$$

Eq. 3.6.4
99 Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., Corrado G.S., Davis A., Dean J., Devin M., et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, (2016).
100 Kingma D.P., Ba J., Adam: A Method for Stochastic Optimization, (2014).
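A minimal sketch of the cost function of Eq. 3.6.4 in TensorFlow, assuming a single regularization strength lam in place of the per-layer λi:

```python
import tensorflow as tf

def cost(y_model, y_data, weight_matrices, lam=1e-4):
    """Eq. 3.6.4: data misfit over N samples plus L2 weight regularization."""
    N = tf.cast(tf.size(y_data), tf.float32)
    misfit = tf.reduce_sum(tf.square(y_model - y_data)) / (2.0 * N)
    reg = tf.add_n([lam * tf.reduce_sum(tf.square(W)) for W in weight_matrices])
    return misfit + reg
```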
In Type-III ML, the PDEs are involved in the training of the machine-learning models, which alleviates the requirement of a scale separation assumption and potentially reduces the need for physics decomposition. Correspondingly, Type-III models impose more stringent requirements on modeling and need substantially more computing resources for training. Based on insights from the case study performed, Type-III ML has the highest potential for extracting value from "big data" in thermal fluid research while ensuring data-model consistency. Several technical challenges need to be addressed before Type-III models deliver on their promise in practical thermal fluid simulation, namely:
➢ complex interactions of ML-based closures with a system of PDEs (including discontinuities in hyperbolic systems);
➢ the effect of the non-local character of ML-based models on PDE solution methods; and
➢ the implementation and effect of multiple closure models, particularly in multiphase and thermal flows.
not known a priori. With too few modes, the performance of aerodynamic shape optimization may be significantly reduced; with too many modes, the design space loses its compactness. Another approach to establishing a compact design space is to use validity functions that filter out abnormal aerodynamic shapes. This approach does not reduce the number of design variables but instead shrinks the design space using computationally cheap models. Kedward et al. [16] proposed a constraint on curve derivatives to ensure smooth shapes in aerodynamic shape design, which improved both the optimization convergence rate and the optimization result in Aerodynamic Design Optimization Discussion Group (ADODG) Case 1. Bons et al. [17] used a curvature constraint in the design optimization of the Aerion AS2 supersonic business jet to improve the shape smoothness for ease of manufacture. Li et al. [8] defined a series of marginal functions between the dominant airfoil modes and higher-order modes derived from the UIUC airfoils, and these functions successfully excluded abnormal airfoils from the design space. However, suitable bounds for curvature-based constraints are usually not known in advance, and there was no guarantee that the margin-based validity functions [8] did not filter out realistic airfoils. The desired validity model should be an accurate discriminator of geometric abnormalities. Benefiting from the strong learning capability of deep-learning models, Li et al. [4] developed such a validity model using convolutional neural networks (CNNs). The model provided validity scores on geometric abnormalities by checking airfoils and wing sectional shapes. Nevertheless, the CNN-based validity function may not cover the desired design space of geometric innovation beyond the UIUC airfoils, because the CNN model is trained via the airfoil GAN model, which is itself pre-trained on UIUC airfoils. Even worse, if mode collapse occurs in the GAN model, the geometric filtering model may exclude conventional airfoil shapes. Thus, the deep-learning-based filtering model [4] faces the criticism: does the geometric filtering model prevent finding optimal designs with innovative aerodynamic shapes? This work aims to address the concern about deep-learning-based geometric filtering and to showcase the appealing benefits of applying it in aerodynamic shape optimization. First, we analyze the mode collapse issue in the airfoil GAN model and propose using the Wasserstein GAN to generate realistic airfoils. To avoid over-fitting, a large number of training airfoils are generated to train the CNN-based discriminative model for geometric filtering. Then, to investigate whether optimal designs are excluded by the filtering model, we perform a series of airfoil design optimizations based on different flight missions of commercial aircraft. More persuasively, such investigations are also performed in the aerodynamic shape optimization of a wing-body-tail configuration, the Common Research Model (CRM), and a blended-wing-body (BWB) configuration. After analyzing the results, we perform aerodynamic shape optimization of a unit circle to highlight the necessity of geometric filtering in a large high-dimensional design space. Moreover, global wing modes of the CRM and BWB configurations are derived to showcase the utility of deep-learning-based geometric filtering in three-dimensional aircraft modeling.
3.7.2 Deep-Learning-Based Geometric Filtering
3.7.2.1 Wasserstein GAN for Airfoils
To train the geometric filtering model, a large number of airfoils, both realistic and abnormal, are required. However, only about 1600 airfoils can be found in publicly accessible databases worldwide, which may be far fewer than required, and some of them cannot be used because of missing data [18]. A GAN can produce similar synthetic versions of a real dataset by adversarially training a generative model against a discriminative model. By minimizing the distance between the distributions of the training data and the synthetic data, GAN models have been successfully used to generate realistic and even hyper-realistic synthetic images and voices [19]. In the aerospace field, various GAN models have been used for airfoil parameterization [15] and shape optimization [4]. To generate a large number of realistic airfoils for training the geometric filtering model, we use a GAN with a CNN-based generative model, which has been shown to be robust in producing smooth airfoil shapes [4]. The details of this model are explained in the appendix.
One major concern with GANs is the mode collapse issue. When a GAN model encounters mode collapse, the synthetic airfoils are distributed in a small domain and cannot represent the design space spanned by the training airfoils.
Using static metrics such as the inception score and the maximum mean discrepancy, Li et al. [4] showed that the mode collapse issue was alleviated by normalizing the input data. Nevertheless, such an issue may still exist. To show intuitively whether there is mode collapse, we compare the spatial distributions of the training airfoils and the synthetic airfoils. Since each airfoil is represented by hundreds of points, direct comparison in this high-dimensional space is noisy and difficult. We therefore adopt the airfoil mode method [8] to represent the airfoils and instead compare the coefficient distributions of the dominant airfoil modes. For an arbitrary airfoil y, the mode coefficients c can be computed using the orthogonal mode basis Φ, i.e., c = Φᵀy.
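A sketch of the mode-coefficient computation, assuming the orthogonal basis Φ is obtained by a POD-style SVD of mean-subtracted airfoil coordinates (the actual modal basis of [8] is derived from the UIUC airfoils; the array shapes here are placeholders):

```python
import numpy as np

# Placeholder for the airfoil set: one airfoil per row, y-coordinates
# resampled to a shared x-distribution.
airfoils = np.random.rand(1407, 201)
mean_shape = airfoils.mean(axis=0)

# Orthogonal mode basis Phi from an SVD of the shape deviations.
_, _, Vt = np.linalg.svd(airfoils - mean_shape, full_matrices=False)
Phi = Vt[:6].T                          # six dominant modes, shape (n_pts, 6)

# Mode coefficients of an arbitrary airfoil y: c = Phi^T y
c = Phi.T @ (airfoils[0] - mean_shape)
```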
We generate 5000 synthetic airfoils using the GAN model of Li et al. [4] and, as shown in Figure 3.7.1, compare their mode coefficients with those of the UIUC airfoils. In the diagonal sub-figures, the coefficients of the six dominant modes are compared using one-dimensional probability distribution curves. Significant distribution differences are observed in the first, second, and fifth modes. In the lower-triangle sub-figures, for a more intuitive comparison, UIUC airfoils (black dots) and GAN synthetic airfoils (red dots) are plotted by their mode coefficients in the corresponding two-dimensional planes. Although the airfoil GAN model of Li et al. [4] does not show severe mode collapse, there are still some domains that cannot be captured by the GAN model, which may be regarded as a slight mode collapse issue.
To overcome this issue, we use the Wasserstein GAN (WGAN) proposed by Arjovsky et al. [20] to train the synthetic airfoil generator. In WGAN, the Wasserstein distance is used as the metric between the distribution of the real dataset (Pr) and that of the generated samples (Pg). Based on the Kantorovich-Rubinstein duality, the Wasserstein distance can be defined as

$$W(P_r, P_g) = \frac{1}{K}\,\sup_{\lVert f \rVert_{L}\,\le\, K}\left(\mathbb{E}_{x\sim P_r}[f(x)] - \mathbb{E}_{x\sim P_g}[f(x)]\right)$$

Eq. 3.7.1
where sup is the least upper bound and E represents the expected value. f is a K-Lipschitz continuous
function and is implemented by the discriminator in WGAN. Thus, the discriminator is not directly
used as a discriminative model to tell whether a sample comes from the real data or the generated
data. Instead, it is trained to learn a K-Lipschitz continuous function that estimates the Wasserstein
distance between real and generated samples. We use the same CNN-based generator and
discriminator as in the GAN model [4] to train the WGAN model, and the same 1407 UIUC airfoils are
used as the training data of the WGAN model. To enforce the Lipschitz constraint on the
discriminator, the gradient penalty approach proposed by Gulrajani et al. [21] is used. As shown in
Figure 3.7.1, WGAN airfoils are distributed in all domains of UIUC airfoils. This implies that the
airfoil WGAN model accurately captures the underlying distribution of UIUC airfoils and does not
suffer from mode collapse. Moreover, the WGAN model exhibits an exploration capability: some WGAN airfoils are distributed in adjacent domains containing no UIUC airfoils. This feature may help the geometric filtering model to accommodate innovative shapes. We use WGAN to generate realistic
training airfoils for the geometric filtering model.
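A minimal PyTorch sketch of the critic loss with the gradient penalty of Gulrajani et al. [21] is given below; netD, the batch layout, and the penalty weight are illustrative assumptions rather than the exact training setup of the reference implementation.

```python
# Sketch of the WGAN-GP critic objective [20, 21]: the critic estimates the
# Wasserstein distance, and the penalty keeps it approximately 1-Lipschitz.
import torch

lambda_gp = 10.0  # gradient-penalty weight suggested by Gulrajani et al. [21]

def gradient_penalty(netD, real, fake):
    """Penalize deviation of the critic's gradient norm from 1 on random
    interpolates between real and generated airfoils (shape: batch x 251)."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_interp = netD(interp)
    grads = torch.autograd.grad(outputs=d_interp, inputs=interp,
                                grad_outputs=torch.ones_like(d_interp),
                                create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def critic_loss(netD, real, fake):
    # Maximize D(real) - D(fake), i.e., minimize the negated difference
    # plus the gradient penalty.
    return netD(fake).mean() - netD(real).mean() + lambda_gp * gradient_penalty(netD, real, fake)
```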
We seek to construct a filtering model that automatically detects geometric abnormality of airfoil-like shapes. One important application of the geometric filtering model is in high-fidelity aerodynamic shape optimization, where it constrains the search domain of the optimizer. Thus, it is important to have a fast and accurate evaluation of geometric abnormalities. Commonly used methods include nearest neighbors, decision trees, random forests, and neural networks. The nearest-neighbors method may be too costly due to the large number of training samples. To remain usable in gradient-based optimization, the model should be differentiable; thus, tree-based methods such as decision trees and random forests are not suitable. In this work, we investigate only neural-network-based models, because their gradients can be accurately evaluated by automatic differentiation. Two kinds of neural networks, the multi-layer perceptron (MLP) and the CNN, are investigated. The neural networks are trained by minimizing a loss function.
In the training of neural networks, the loss function decreases as a sign of improving accuracy. We investigate two typical loss functions, the binary cross-entropy (BCE) and the mean squared error (MSE), defined as
f_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right], \qquad f_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
Eq. 3.7.2
where y_i and ŷ_i are the original score (label) and the predicted score of the ith airfoil. Each realistic (WGAN) airfoil is labeled 1.0. Abnormal airfoils are labeled 0.0 and −1.0 for the models trained with f_BCE and f_MSE, respectively.
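For concreteness, a plain NumPy sketch of the two loss functions in Eq. 3.7.2:

```python
# The two loss functions of Eq. 3.7.2, written out in NumPy.
import numpy as np

def f_bce(y, y_hat, eps=1e-7):
    """Binary cross-entropy: labels are 1.0 (realistic) or 0.0 (abnormal)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def f_mse(y, y_hat):
    """Mean squared error: labels are 1.0 (realistic) or -1.0 (abnormal)."""
    return np.mean((y - y_hat) ** 2)
```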
We investigate MLP models with different numbers of layers and neurons. However, we find that the prediction accuracy is rather low regardless of the hyperparameters used. The results imply that MLPs are unsuitable for detecting geometric abnormality of airfoils; a possible reason is that it is difficult for MLP networks to learn the underlying features of airfoils. CNNs, in contrast, are effective in extracting airfoil geometric features [4]. We use a CNN model with four convolutional layers to extract the underlying features step by step. The stride is two in each convolutional layer to realize the down-sampling process. The filter size (nsize) and the number of filters (nfilter) in each layer are important hyperparameters of the CNN.
We investigate the influence of the two hyperparameters in the training of the geometric filtering
model. The training processes with different hyperparameters and loss functions are shown in
Figure 3.7.3. Sub-figures in the first and second rows are CNN models trained using fMSE and fBCE,
respectively. Sub-figures in different columns are models with different nfilter. In each sub-figure, the
models with different nsize are distinguished by line colors, and the solid and dashed lines represent
the loss functions on the training and testing datasets, respectively. The accuracy of the CNN models increases with nsize and nfilter, and there is no sign of over-fitting. However, the training of CNN models with either loss function tends to become unstable when a large nsize or nfilter is used. We also find that CNN models using the BCE loss cannot provide validity scores with a smooth transition from realistic airfoils to abnormal airfoils, which makes BCE-based CNN models unsuitable as a constraint in gradient-based aerodynamic shape optimization. Based on these results, we use the CNN model with nsize = 5 and nfilter = 64 trained with the MSE loss function in this work.
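A hedged PyTorch sketch of such a filtering network follows. It mirrors the stated structure (four stride-2 convolutions with nsize = 5 and nfilter = 64, mapping 251 y-coordinates to a scalar validity score), but the activations and output head are assumptions.

```python
# Sketch of a CNN geometric filtering model: four stride-2 Conv1d layers
# followed by a linear head that outputs one validity score per airfoil.
import torch.nn as nn

class AirfoilFilter(nn.Module):
    def __init__(self, nsize=5, nfilter=64):
        super().__init__()
        layers, ch = [], 1
        for _ in range(4):                       # four conv layers, stride 2 each
            layers += [nn.Conv1d(ch, nfilter, nsize, stride=2, padding=nsize // 2),
                       nn.ReLU()]
            ch = nfilter
        self.features = nn.Sequential(*layers)   # 251 -> 126 -> 63 -> 32 -> 16 points
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))

    def forward(self, y):                        # y: (batch, 1, 251) y-coordinates
        return self.head(self.features(y))       # scalar validity score per airfoil
```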
Figure 3.7.3 Investigation of CNN hyperparameters in the training of airfoil filtering models
ADflow [32, 33], which is an open-source finite-volume structured multi-block computational fluid
dynamics (CFD) code, is used to solve the Reynolds-averaged Navier-Stokes equations with a
Spalart–Allmaras turbulence model [34]. The adjoint solver [35] embedded in ADflow is used to
compute the gradient. The analytic inverse-distance method implemented in IDWarp (https://github.com/mdolab/idwarp) is used to
automatically deform the volume mesh. FFD [22] is used as the parameterization method for
aerodynamic shapes. In this section, the filtering model is used to check the optimized shapes, so it is
not coupled with the optimization framework.
Figure 3.7.4 FFD control points and the CFD mesh for airfoil design
The optimized airfoils are shown in Figure 3.7.5. Due to the large number of airfoils, it is difficult to show all details in one figure; we therefore provide all the optimized airfoils, together with the initial and optimized aerodynamic coefficients, in Mendeley Data [37]. Different optimal airfoil shapes are
desired in different flight missions. When operated in missions with high Mach numbers, the upper
surface of the airfoil tends to be flat to reduce the strength of shock waves, which makes the
optimized airfoils similar to supercritical airfoils. This phenomenon becomes more obvious with the
increase of the flight altitude, where a larger Cl is required due to the decrease of the air density and
sound speed. An increase in aircraft mass leads to a direct increase in the lift constraint. When the mass equals 220,000 kg and H = 12,000 m, the optimized airfoils become sunken on the lower
surface near the leading edge to provide more lift without violating the pitching moment constraint.
This characteristic differs from typical supercritical airfoils, which tend to generate more lift by increasing trailing edge camber. The choice here may be due to the strict pitching moment constraint.

Figure 3.7.5 Optimized airfoils for different flight missions subject to different area constraints
Figure 3.7.7 Nine sectional airfoils are monitored in the CRM optimization
Figure 3.7.8 Eight sectional airfoils are monitored in the BWB optimization
The optimization results of the CRM aircraft are shown in Figure 3.7.9. Both optimized designs are almost shock-free, with the drag coefficient significantly reduced from the baseline value (CD = 0.0353).

Figure 3.7.9 Geometric filtering constraint with Svalidity > 1.0 does not filter out the optimal shapes in the CRM design

Figure 3.7.10 Geometric filtering constraint with Svalidity > 1.0 does not filter out the optimal shapes in the BWB design

The monitored sectional airfoils are shown in Figure 3.7.9 as well, arranged
along the horizontal axis based on the validity scores evaluated by the deep-learning filtering model.
Using the loose thickness constraints leads to more geometric freedom in design optimization, and
the optimized shapes tend to have smaller validity scores. Nevertheless, all wing sectional airfoils in
both optimized CRM configurations achieve scores larger than 1.0. Thus, the deep-learning
geometric filtering constraint will not filter out the optimal shapes in the CRM design optimization.
As shown in Figure 3.7.10, two kinds of thickness constraints in the BWB design lead to a significant
difference in the optimized shapes. Nevertheless, both optimized aircraft are shock-free and lead to
significant drag reductions from the baseline value (CD = 0.0128). Validity scores of wing sectional
airfoils are all larger than 1.0 as well. As in the CRM design optimization, using the deep-learning geometric filtering constraint with Svalidity > 1.0 does not filter out these optimal designs.
3.7.4 IV. Applications of Deep-learning-based Geometric Filtering
Multiple tests in Section III show that the deep learning model developed in Section II does not filter
out optimal solutions in aerodynamic shape optimization. The results encourage the use of the
filtering model to address geometric abnormality issues in aerodynamic shape optimization. In this
section, two promising applications based on the model are showcased.
Figure 3.7.11 Gradient-based optimization starting from a circle fails due to severe geometric abnormality

Even when the inverse-distance mesh deformation method [39] is used, it is difficult to ensure the mesh quality in this circumstance.
The deformed “dumbbell” shapes can be easily identified as abnormal by the deep-learning-based
geometric filtering model. We use the geometric validity function as an inequality constraint (Svalidity
> 0.7) in the optimization and perform gradient-based optimization from the unit circle. An area constraint is imposed to ensure that the optimized shape has the same area as the RAE2822 airfoil. A lift constraint with Cl = 0.824 and a pitching moment constraint with Cm ≥ −0.092 are used. As shown in Figure 3.7.12, the optimization gradually converges to a supercritical airfoil using merely 200 CFD evaluations.

Figure 3.7.12 The deep-learning-based filtering constraint ensures the success of gradient-based optimization from a circle

Inequality constraints are not necessarily satisfied in the line-search process of SLSQP. Thus, directly using the geometric validity function as an inequality constraint cannot ensure that "dumbbell" shapes are always avoided during the optimization. To address this issue, we add a
judgment before calling the CFD evaluation module. When the validity score of the updated shape is smaller than 0.0, Cd is set to a large value, i.e., Cd = 1.0 − Svalidity, without performing any CFD simulation. This circumstance only occurs in the early part of the optimization, when the drag is large. For simplicity, the lift and pitching moment constraints are imposed only after the drag has been significantly reduced; before that, there is no need to estimate Cl or Cm. This approach is not generic, and a modified gradient-based optimization algorithm is needed to take full advantage of geometric filtering. Nevertheless, the result shows that deep-learning-based geometric filtering enables gradient-based optimization to handle such a challenging case.
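A minimal sketch of this guard, with hypothetical function and solver names, might read:

```python
# Sketch of the validity guard described above: skip the CFD call when the
# deep-learning validity score is below zero and return a large,
# score-dependent drag instead. All names here are illustrative assumptions.
def evaluate_drag(shape, validity_model, cfd_solver):
    s_validity = validity_model(shape)
    if s_validity < 0.0:
        # Abnormal shape: penalize drag without running CFD (Cd = 1.0 - S_validity)
        return 1.0 - s_validity
    return cfd_solver.solve(shape).cd   # hypothetical CFD interface
```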
To derive global wing modes, the deformable coordinates of the m sampled wings are assembled into a snapshot matrix of their differences from the baseline wing,

A = \begin{bmatrix} a_1^1 - a_1^{\mathrm{baseline}} & a_1^2 - a_1^{\mathrm{baseline}} & \cdots & a_1^m - a_1^{\mathrm{baseline}} \\ \vdots & \vdots & \ddots & \vdots \\ a_n^1 - a_n^{\mathrm{baseline}} & a_n^2 - a_n^{\mathrm{baseline}} & \cdots & a_n^m - a_n^{\mathrm{baseline}} \end{bmatrix}
Eq. 3.7.4
where a_i^{baseline} is the ith deformable coordinate (z for the CRM and y for the BWB) of the baseline wing, and a_i^j is the corresponding coordinate of the jth sampled wing.
Performing SVD on A, we obtain

A = U \Sigma V^T
Eq. 3.7.5
where the columns of U correspond to global mode shapes of the wing. The global mode shapes have been used in wing design optimization problems [40, 41], and the compact design space defined by global wing modes makes the adjoint solver unnecessary in high-dimensional wing design. For the CRM and BWB aircraft considered in this work, we generate 200 samples to derive the global wing modes. The dominant global wing modes for both configurations are shown in Figure 3.7.14 and Figure 3.7.13. The first modes bend the wing sections in a non-parallel manner. In contrast to local section-based modal shapes, global modal shapes can produce inter-sectional deformations, which may improve the efficiency of aerodynamic shape parameterization. Their application in aerodynamic modeling of three-dimensional wing-based configurations is a promising direction.
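As an illustration, a NumPy sketch of the mode extraction (file names, sample count, and mode truncation are assumptions):

```python
# Sketch of deriving global wing modes (Eqs. 3.7.4-3.7.5): assemble the
# snapshot matrix of coordinate differences and take its left singular vectors.
import numpy as np

samples = np.load("filtered_wing_samples.npy")   # hypothetical: (m=200, n) deformable coords
baseline = np.load("baseline_wing.npy")          # hypothetical: (n,) baseline coords

A = (samples - baseline).T                       # n x m matrix, A_ij = a_i^j - a_i^baseline
U, S, Vt = np.linalg.svd(A, full_matrices=False)
global_modes = U[:, :10]                         # dominant global wing mode shapes

# A new wing is then parameterized compactly as: a = baseline + global_modes @ alpha
```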
Figure 3.7.14 Dominant global mode shapes for the CRM wing and tail optimization
3.7.5 V. Conclusions
This work addresses the concern of applying the deep-learning-based geometric filtering model in aerodynamic shape design optimization. The WGAN model is proposed to generate synthetic airfoils. Different types of neural networks are investigated in the training of the geometric validity model. To show whether innovative optimal designs would be filtered out by the validity model, various airfoil and wing shape design optimizations are performed. Airfoil design optimizations are based on 72 flight missions of commercial aircraft and three area constraints. Wing design optimizations of two aircraft configurations, the conventional CRM wing-body-tail and an innovative BWB, are considered.

Figure 3.7.13 Dominant global mode shapes for the BWB optimization

The proposed WGAN-based generative model addresses the mode collapse issue that occurs in the airfoil GAN model. WGAN synthetic airfoils are not only a good
representation of the training airfoils but are also distributed in domains where few training airfoils exist. Thus, the WGAN model has an extrapolation capability, and the synthetic airfoils convey more geometric information than the airfoils on which the model was trained. For the geometric validity model, we find that a CNN-based discriminator trained with the MSE loss function is preferable; this choice results in a smooth and accurate discriminative model for geometric abnormalities. The validity scores of the 216 optimal airfoils are all greater than 0.7 and lie within the range of the UIUC airfoil scores. The wing sectional airfoils in the optimized CRM and BWB earn even higher validity scores, whether loose or strict thickness constraints are used. Using a suitable validity constraint, say Svalidity > 0.7, does not prevent the optimizer from finding the optimal designs. These results imply that innovative shapes with preferable aerodynamic performance lie inside the recognition scope of the deep-learning-based geometric filtering model, which eases the concern about geometric filtering.
Using the deep-learning geometric filtering model can improve the performance of aerodynamic shape optimization and aerodynamic modeling. Two promising applications of the model are showcased. First, used as a geometric validity constraint, the model enables gradient-based optimization to tackle challenging optimization problems with many geometric abnormalities. Second, the filtering model is used to derive global wing modes, which provide a compact parameterization of three-dimensional aircraft configurations. Deep-learning-based geometric filtering provides a fast and reliable approach to judging aerodynamic shape quality, and further research is recommended to investigate the pros and cons of the relevant applications.
Appendix
For each training (UIUC) airfoil, a uniform x-y format with N = 251 points is used, and the x
coordinates are set to
x_i = \frac{1}{2}\left( \cos\frac{2\pi(i-1)}{250} + 1 \right), \quad i = 1, 2, \ldots, 251
Eq. 3.7.6
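In NumPy, this cosine spacing is a one-liner; the points wrap from the trailing edge over one surface to the leading edge and back:

```python
# Eq. 3.7.6: cosine-spaced x-coordinates for N = 251 airfoil points.
import numpy as np

i = np.arange(1, 252)                                     # i = 1, 2, ..., 251
x = 0.5 * (np.cos(2.0 * np.pi * (i - 1) / 250.0) + 1.0)   # clusters points near x = 0 and x = 1
```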
Then, the training airfoils are archived by recording the corresponding 251 y coordinates in a vector
format. The airfoil GAN model proposed by [4], as shown in Figure 3.7.15, is explained as follows.
Figure 3.7.15 Flowchart of the airfoil GAN model proposed by Li et al. [4]

GAN is composed of a discriminative model (D) and a generative model (G). In the airfoil GAN model, D uses a fully connected layer to perceive the input information, followed by a down-sampling process in which four convolution layers extract the underlying features. These features
are connected to a critic neuron, which distinguishes whether an input is from the training data or
synthetic data of the generative model. G is trained simultaneously to generate synthetic data from noise inputs. The noise input is reshaped and then up-sampled by four transposed convolution layers.
The output of the last transposed convolution layer in G corresponds to the 251 y coordinates of an
airfoil.
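A rough PyTorch sketch of such a generator follows; the latent size, channel widths, kernel sizes, and the final trimming layer are assumptions, not the exact architecture of Li et al. [4].

```python
# Sketch of the generator G described above: a noise vector is reshaped and
# passed through four transposed convolution layers to yield 251 y-coordinates.
import torch.nn as nn

class AirfoilGenerator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 16)          # project and reshape the noise
        self.deconv = nn.Sequential(
            nn.ConvTranspose1d(64, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose1d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 64
            nn.ConvTranspose1d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 128
            nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1),              # 128 -> 256
        )
        self.out = nn.Linear(256, 251)                    # trim to 251 y-coordinates

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 16)
        return self.out(self.deconv(h).squeeze(1))
```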
Acknowledgments
We acknowledge the Tier 2 grant from the Ministry of Education, Singapore (R-265-000-661-112). The
computational resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg) are
acknowledged.
3.7.6 VI. Bibliography
3.7.6.1 References
[1] Jameson, A., “Aerodynamic Design via Control Theory,” Journal of Scientific Computing, Vol. 3, No.
3, 1988, pp. 233–260. doi:10.1007/BF01061285.
[2] Reuther, J. J., Jameson, A., Alonso, J. J., Rimlinger, M. J., and Saunders, D., “Constrained Multipoint
Aerodynamic Shape Optimization Using an Adjoint Formulation and Parallel Computers, Part 2,”
Journal of Aircraft, Vol. 36, No. 1, 1999, pp. 61–74. doi:10.2514/2.2414.
[3] Lyu, Z., Kenway, G. K.W., and Martins, J. R. R. A., “Aerodynamic Shape Optimization Investigations
of the Common Research Model Wing Benchmark,” AIAA Journal, Vol. 53, No. 4, 2015, pp. 968–985.
doi:10.2514/1.J053318.
[4] Li, J., Zhang, M., Martins, J. R. R. A., and Shu, C., “Efficient Aerodynamic Shape Optimization with
Deep-Learning Based Geometric Filtering,” AIAA Journal, Vol. 58, No. 10, 2020, pp. 4243–4259.
doi:10.2514/1.j059254, URL https://doi.org/10.2514/1.j059254.
[5] Robinson, G. M., and Keane, A. J., “Concise Orthogonal Representation of Supercritical Airfoils,”
Journal of Aircraft, Vol. 38, No. 3, 2001, pp. 580–583. doi:10.2514/2.2803.
[6] Poole, D. J., Allen, C. B., and Rendall, T. C. S., “Metric-Based Mathematical Derivation of Efficient
Airfoil Design Variables,” AIAA Journal, Vol. 53, No. 5, 2015, pp. 1349–1361. doi:10.2514/1.j053427,
URL https://doi.org/10.2514/1.j053427.
[7] Masters, D. A., Taylor, N. J., Rendall, T. C. S., Allen, C. B., and Poole, D. J., “Geometric Comparison of
Aerofoil Shape Parameterization Methods,” AIAA Journal, Vol. 55, No. 5, 2017, pp. 1575–1589.
doi:10.2514/1.j054943.
[8] Li, J., Bouhlel, M. A., and Martins, J. R. R. A., “Data-based Approach for Fast Airfoil Analysis and
Optimization,” AIAA Journal, Vol. 57, No. 2, 2019, pp. 581–596. doi:10.2514/1.J057129.
[9] Kedward, L., Allen, C. B., and Rendall, T., “Towards Generic Modal Design Variables for
Aerodynamic Shape Optimisation,” AIAA Scitech 2020 Forum, American Institute of Aeronautics and
Astronautics, 2020. doi:10.2514/6.2020-0543, URL https://doi.org/10.2514/6.2020-0543.
[10] Allen, C. B., Poole, D. J., and Rendall, T. C. S., “Wing aerodynamic optimization using efficient
mathematically-extracted modal design variables,” Optimization and Engineering, Vol. 19, No. 2,
2018, doi:10.1007/s11081-018-9376-7, URL https://doi.org/10.1007/s11081-018-9376-7.
[11] Viswanath, A., Forrester, A. I. J., and Keane, A. J., “Dimension Reduction for Aerodynamic Design
Optimization,” AIAA Journal. doi:10.2514/1.j050717, URL https://doi.org/10.2514/1.j050717.
[12] Constantine, P. G., Dow, E., and Wang, Q., “Active Subspace Methods in Theory and Practice:
Applications to Kriging Surfaces,” SIAM Journal on Scientific Computing, Vol. 36, No. 4, 2014, pp.
A1500–A1524. doi:10.1137/130916138, URL https://doi.org/10.1137/130916138.
[13] Li, J., Cai, J., and Qu, K., “Surrogate-based aerodynamic shape optimization with the active
subspace method,” Structural and Multidisciplinary Optimization, Vol. 59, No. 2, 2019, pp. 403–419.
doi:10.1007/s00158-018-2073-5.
[14] Chen, W., Chiu, K., and Fuge, M. D., “Airfoil Design Parameterization and Optimization Using
Bézier Generative Adversarial Networks,” AIAA Journal, Vol. 58, No. 11, 2020, pp. 4723–4735.
doi:10.2514/1.j059317, URL https://doi.org/10.2514/1.j059317.
[15] Du, X., He, P., and Martins, J. R. R. A., “A B-Spline-based Generative Adversarial Network Model
for Fast Interactive Airfoil Aerodynamic Optimization,” AIAA SciTech Forum, AIAA, Orlando, FL, 2020.
doi:10.2514/6.2020-2128.
[16] Kedward, L. J., Allen, C. B., and Rendall, T. C. S., “Gradient-Limiting Shape Control for Efficient
Aerodynamic Optimization,” AIAA Journal, Vol. 58, No. 9, 2020, pp. 3748–3764.
doi:10.2514/1.j058977, URL https://doi.org/10.2514/1.j058977.
[17] Bons, N., Martins, J., Mader, C. A., McMullen, M. S., and Suen, M., “High-fidelity Aerostructural
Optimization Studies of the Aerion AS2 Supersonic Business Jet,” AIAA AVIATION 2020 FORUM,
American Institute of Aeronautics and Astronautics, 2020. doi:10.2514/6.2020-3182, URL
https://doi.org/10.2514/6.2020-3182.
[18] Li, J., He, S., and Martins, J. R. R. A., “Data-driven Constraint Approach to Ensure Low-speed
Performance in Transonic Aerodynamic Shape Optimization,” Aerospace Science and Technology, Vol.
92, 2019, pp. 536–550. doi:10.1016/j.ast.2019.06.008.
[19] Wu, H., Zheng, S., Zhang, J., and Huang, K., “GP-GAN: Towards realistic high-resolution image
blending,” arXiv preprint arXiv:1703.07195, 2017.
[20] Arjovsky, M., Chintala, S., and Bottou, L., “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, 2017.
[21] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C., “Improved Training of Wasserstein GANs,” Advances in Neural Information Processing Systems, Vol. 30, 2017, pp. 5767–5777.
[22] Kenway, G. K., Kennedy, G. J., and Martins, J. R. R. A., “A CAD-Free Approach to High-Fidelity
Aerostructural Optimization,” Proceedings of the 13th AIAA/ISSMO Multidisciplinary Analysis
Optimization Conference, Fort Worth, TX, 2010. doi:10.2514/6.2010-9231.
[23] Lyu, Z., and Martins, J. R. R. A., “Aerodynamic Design Optimization Studies of a Blended-Wing-
Body Aircraft,” Journal of Aircraft, Vol. 51, No. 5, 2014, pp. 1604–1617. doi:10.2514/1.C032491.
[24] Kenway, G. K. W., and Martins, J. R. R. A., “Buffet Onset Constraint Formulation for Aerodynamic
Shape Optimization,” AIAA Journal, Vol. 55, No. 6, 2017, pp. 1930–1947. doi:10.2514/1.J055172.
[25] Secco, N. R., Jasa, J. P., Kenway, G. K.W., and Martins, J. R. R. A., “Component-based Geometry
Manipulation for Aerodynamic Shape Optimization with Overset Meshes,” AIAA Journal, Vol. 56, No.
9, 2018, pp. 3667–3679. doi:10.2514/1.J056550.
[26] Shi, Y., Mader, C. A., He, S., Halila, G. L. O., and Martins, J. R. R. A., “Natural Laminar-Flow Airfoil
Optimization Design Using a Discrete Adjoint Approach,” AIAA Journal, Vol. 58, No. 11, 2020, pp.
4702–4722. doi:10.2514/1.j058944, URL https://doi.org/10.2514/1.j058944.
[27] He, X., Li, J., Mader, C. A., Yildirim, A., and Martins, J. R. R. A., “Robust aerodynamic shape
optimization—from a circle to an airfoil,” Aerospace Science and Technology, Vol. 87, 2019, pp. 48–
61. doi:10.1016/j.ast.2019.01.051.
[28] Bons, N. P., He, X., Mader, C. A., and Martins, J. R. R. A., “Multimodality in Aerodynamic Wing
Design Optimization,” AIAA Journal, Vol. 57, No. 3, 2019, pp. 1004–1018. doi:10.2514/1.J057294.
[29] Martins, J. R. R. A., “Perspectives on Aerodynamic Design Optimization,” AIAA SciTech Forum,
AIAA, Orlando, FL, 2020. doi:10.2514/6.2020-0043.
[30] Perez, R. E., Jansen, P. W., and Martins, J. R. R. A., “pyOpt: A Python-Based Object-Oriented
Framework for Nonlinear Constrained Optimization,” Structural and Multidisciplinary Optimization,
Vol. 45, No. 1, 2012, pp. 101–118. doi:10.1007/s00158-011-0666-3.
[31] Wu, N., Kenway, G., Mader, C., Jasa, J., and Martins, J., “pyOptSparse: A Python framework for
large-scale constrained nonlinear optimization of sparse systems,” Journal of Open Source Software,
Vol. 5, No. 54, 2020, p. 2564. doi:10.21105/joss.02564, URL https://doi.org/10.21105/joss.02564.
[32] Yildirim, A., Kenway, G. K. W., Mader, C. A., and Martins, J. R. R. A., “A Jacobian-free approximate
Newton–Krylov startup strategy for RANS simulations,” Journal of Computational Physics, Vol. 397,
2019, p. 108741. doi:10.1016/j.jcp.2019.06.018.
[33] Mader, C. A., Kenway, G. K. W., Yildirim, A., and Martins, J. R. R. A., “ADflow: An Open-Source
Computational Fluid Dynamics Solver for Aerodynamic and Multidisciplinary Optimization,” Journal
of Aerospace Information Systems, Vol. 17, No. 9, 2020, pp. 508–527. doi:10.2514/1.i010796, URL
https://doi.org/10.2514/1.i010796.
[34] Spalart, P., and Allmaras, S., “A One-Equation Turbulence Model for Aerodynamic Flows,” 30th
Aerospace Sciences Meeting and Exhibit, 1992. doi:10.2514/6.1992-439.
[35] Kenway, G. K. W., Mader, C. A., He, P., and Martins, J. R. R. A., “Effective Adjoint Approaches for
Computational Fluid Dynamics,” Progress in Aerospace Sciences, Vol. 110, 2019, p. 100542.
doi:10.1016/j.paerosci.2019.05.002.
[36] Chernukhin, O., and Zingg, D. W., “Multimodality and Global Optimization in Aerodynamic
Design,” AIAA Journal, Vol. 51, No. 6, 2013, pp. 1342–1354. doi:10.2514/1.j051835.
[37] Li, J., “Optimized airfoils based on different flight missions,” Mendeley Data, 2020.
doi:10.17632/23yrzbzr3m.1.
[38] Chen, S., Lyu, Z., Kenway, G. K. W., and Martins, J. R. R. A., “Aerodynamic Shape Optimization of
the Common Research Model Wing-Body-Tail Configuration,” Journal of Aircraft, Vol. 53, No. 1, 2016,
pp. 276–293. doi:10.2514/1.C033328.
[39] Luke, E., Collins, E., and Blades, E., “A Fast Mesh Deformation Method Using Explicit
Interpolation,” Journal of Computational Physics, Vol. 231, No. 2, 2012, pp. 586–601.
doi:10.1016/j.jcp.2011.09.021.
[40] Li, J., and Zhang, M., “Adjoint-Free Aerodynamic Shape Optimization of the Common Research
Model Wing,” AIAA Journal, 2021, pp. 1–11. doi:10.2514/1.j059921, URL
https://doi.org/10.2514/1.j059921.
[41] Li, J., and Zhang, M., “Data-based Approach for Wing Shape Design Optimization,” Aerospace
Science and Technology, 2021. (Submitted).