Data-Driven Aerospace Engineering With ML

AIAA JOURNAL
Vol. 59, No. 8, August 2021
Data-Driven Aerospace Engineering: Reframing the Industry with Machine Learning
Steven L. Brunton,∗ J. Nathan Kutz,† Krithika Manohar,‡ Aleksandr Y. Aravkin,§ and
Kristi Morgansen¶
University of Washington, Seattle, Washington 98195
and
Jennifer Klemisch,** Nicholas Goebel,** James Buttrick,†† Jeffrey Poskin,‡‡
Adriana W. Blom-Schieber,§§ Thomas Hogan,‡‡ and Darren McDonald**
The Boeing Company, Seattle, Washington 98108
https://doi.org/10.2514/1.J060131
Data science, and machine learning in particular, is rapidly transforming the scientific and industrial landscapes.
The aerospace industry is poised to capitalize on big data and machine learning, which excels at solving the types of
multi-objective, constrained optimization problems that arise in aircraft design and manufacturing. Indeed,
emerging methods in machine learning may be thought of as data-driven optimization techniques that are ideal for
high-dimensional, nonconvex, and constrained, multi-objective optimization problems, and that improve with
increasing volumes of data. This review will explore the opportunities and challenges of integrating data-driven
science and engineering into the aerospace industry. Importantly, this paper will focus on the critical need for
interpretable, generalizable, explainable, and certifiable machine learning techniques for safety-critical applications.
This review will include a retrospective, an assessment of the current state-of-the-art, and a roadmap looking forward.
Recent algorithmic and technological trends will be explored in the context of critical challenges in aerospace design,
manufacturing, verification, validation, and services. In addition, this review will explore this landscape through
several case studies in the aerospace industry. This document is the result of close collaboration between the University of
Washington and Boeing to summarize past efforts and outline future opportunities.

Received 31 August 2020; revision received 3 December 2020; accepted for publication 10 December 2020; published online Open Access 16 July 2021.
Copyright © 2021 by the authors. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission. All requests for copying and
permission to reprint should be submitted to CCC at www.copyright.com; employ the eISSN 1533-385X to initiate your request. See also AIAA Rights and
Permissions www.aiaa.org/randp.
*James B. Morrison Professor, Mechanical Engineering. Member AIAA.
†Robert Bolles and Yasuko Endo Professor, Applied Mathematics.
‡Assistant Professor, Mechanical Engineering.
§Associate Professor, Applied Mathematics.
¶Professor and Chair, Aeronautics and Astronautics. Associate Fellow AIAA.
**Boeing Test & Evaluation.
††Boeing Commercial Aircraft Engineering.
‡‡Boeing Research & Technology.
§§Boeing Commercial Aircraft Engineering. Associate Fellow AIAA.

Steven L. Brunton is an Associate Professor of Mechanical Engineering and a Data Science Fellow with the eScience
Institute at the University of Washington. He received his B.S. in Mathematics from Caltech in 2006 and the Ph.D. in
Mechanical and Aerospace Engineering from Princeton in 2012. He has co-authored three textbooks, and received the
Army and Air Force Young Investigator Program (YIP) awards and the Presidential Early Career Award for
Scientists and Engineers (PECASE). He is a Member of the AIAA.
J. Nathan Kutz received the B.S. degrees in physics and mathematics from the University of Washington, Seattle,
WA, in 1990, and the Ph.D. degree in applied mathematics from Northwestern University, Evanston, IL, in 1994. He is
currently a Professor of applied mathematics, adjunct professor of physics, mechanical engineering and electrical
engineering, and a senior data science fellow with the eScience Institute at the University of Washington.
Krithika Manohar is an Assistant Professor of Mechanical Engineering at the University of Washington. She
received her B.S. in Mathematics & Computer Science from the University of Massachusetts Lowell (2013) and the
Ph.D. in Applied Mathematics from the University of Washington (2018). She is a recipient of an NSF Mathematical
Sciences Postdoctoral Research Fellowship and spent two years as a postdoctoral scholar and von Karman Instructor
at Caltech.
Aleksandr Aravkin is an Associate Professor of Applied Mathematics at the University of Washington, and the
Director of Math Sciences at the Institute for Health Metrics and Evaluation (IHME). Dr. Aravkin received B.S.
degrees in Mathematics and Computer Science in 2004, an M.S. in Statistics and a Ph.D. in Mathematics in 2010, all from the
University of Washington. He was a joint postdoctoral fellow in Earth and Ocean Sciences and Computer Science at
the University of British Columbia from 2010–2012, and a research staff member at the IBM T.J. Watson Research
Center from 2012–2015, as well as an Adjunct Professor in Computer Science and IEOR at Columbia University.
Kristi A. Morgansen is Professor and Chair of the William E. Boeing Department of Aeronautics & Astronautics at
the University of Washington. Her research interests focus on nonlinear systems where sensing and actuation are
integrated, stability in switched systems with delay, and incorporation of operational constraints such as
communication delays in control of multi-vehicle systems. She is co-founder and co-director of the University of Washington
Space Policy and Research Center and is an Associate Fellow of the AIAA.
Jennifer Klemisch is a senior data science consultant with expertise in the aerospace industry, emotional
intelligence and linguistics. Previously she held the position of Advanced Analytics Strategic Leader of Boeing Test and
Evaluation. She received her undergraduate degree in Mechanical Engineering from University of North Dakota, MS in Finance
from Webster University and MS in Data Science from Northwestern University. She holds a patent for an oxygen
analysis measurement system and a patent pending for an emotionally intelligent robotic pilot.
Nicholas Goebel is an aerodynamic flight test engineer at The Boeing Company where he works on applying data
science and machine learning to flight test data. He received a Bachelor's in Aerospace Engineering and Mechanics
from the University of Minnesota and an M.S. in Applied Mathematics from the University of Washington.
James N. Buttrick is a Technical Fellow at the Boeing Company. His focus for the past 36 years has been developing
automated manufacturing machines for aircraft assembly and composite part fabrication. He received his M.S. in
Mechanical Engineering from the University of Washington in 1984, and his B.S. in Marine Engineering from the US
Merchant Marine Academy in 1979.
Jeffrey Poskin is a mathematician in Boeing Research & Technology's Applied Mathematics group. He has
expertise in constrained data fitting, geometric modeling, and numerical optimization. He received his B.S. in
Mathematics from the University of Kansas in 2011 and a Ph.D. in Mathematics from the University of Wisconsin
in 2017.
Agnes (Adriana W.) Blom-Schieber is a Technical Fellow at Boeing Commercial Airplanes, and an Affiliate
Professor at University of Washington's Mechanical Engineering department. She specializes in fiber-reinforced
composite design, analysis, manufacturing and optimization. Her work includes development of design tools,
model-based engineering, and multi-disciplinary optimization. She is a member of the ThermoPlastic composites Research
Center (TPRC) technical advisory board. She is an Associate Fellow of AIAA and past chair for the Pacific Northwest
AIAA section.
Thomas Hogan is an Associate Technical Fellow at the Boeing Company, where he has worked on mathematical
modeling for nearly 20 years. He received a Bachelor's degree in Mathematics & Computer Science from the
University of Michigan Dearborn and his Ph.D. in Mathematics from the University of Wisconsin Madison.
Darren McDonald is a Technical Fellow at the Boeing Company, and he is a Fellow with the Society of Flight Test
Engineers, as well as a Senior Member of AIAA. He has worked for over 20 years in Flight Test Engineering, Stability
& Control. He received a Bachelor's degree in Aerospace Engineering from Embry-Riddle Aeronautical University,
Prescott, AZ.

I. Introduction
Data science is broadly redefining the state-of-the-art in engineering practice and what is possible across the
scientific, technological, and industrial landscapes. The big data era mirrors the scientific computing revolution of
the 1960s, which gave rise to transformative engineering paradigms and allowed for the accurate simulation of
complex, engineered systems. Indeed, scientific computing enabled the prototyping of aircraft design through
physics-based emulators that resulted in substantial cost savings to aerospace manufacturers. The Boeing 777 was the
first aircraft to have been designed completely from simulation without a mock-up. In a similar fashion, machine
learning (ML) and artificial intelligence (AI) algorithms are ushering in one of the great technological developments
of our generation [1–7]. The success of ML/AI has been undeniable in traditionally challenging fields, such as machine
vision and natural language processing, fraud detection, and online recommender systems. Increasingly, new ML/AI
opportunities are emerging in engineering disciplines where processes are governed by physics [8,9], such as
materials science, robotics, and control. An overview of opportunities in data-intensive aerospace engineering is
shown in Fig. 1.
Fig. 1 Data science and machine learning have the potential to revolutionize the aerospace industry.
Advances in data-driven science and engineering have been driven by the unprecedented confluence of 1) vast and
increasing data; 2) advances in high-performance computation; 3) improvements to sensing technologies, data storage,
and transfer; 4) scalable algorithms from statistics and applied mathematics; and 5) considerable investment by
industry, leading to an abundance of open-source software and benchmark problems. Nowhere is the opportunity for
data-driven advancement more exemplified than in the field of aerospace engineering, which is data rich and is already
built on a constrained multi-objective optimization framework that is ideally suited for modern techniques in ML/AI.
Each stage of modern aerospace manufacturing is data-intensive, including manufacturing, testing, and service. A
Boeing 787 comprises 2.3 million parts that are sourced from around the globe and assembled in an extremely complex
and intricate manufacturing process, resulting in vast multimodal data from supply chain logs, video feeds in the
factory, inspection data, and hand-written engineering notes. After assembly, a single flight test will collect data
from 200,000 multimodal sensors, including asynchronous signals from digital and analog sensors, including strain,
pressure, temperature, acceleration, and video. In service, the aircraft generates a wealth of real-time data, which
is collected, transferred, and processed with 70 miles of wire and 18 million lines of code for the avionics and
flight control systems alone. Thus, big data is presently a reality in modern aerospace engineering, and the field is
ripe for advanced data analytics with ML.
The use of data for science and engineering is not new [10], and most key breakthroughs in the past decade have been
fundamentally catalyzed by advances in data quality and quantity. However, with an unprecedented ability to collect
and store data [11], we have entered a new era of data-intensive analysis, where hypotheses are now driven by data.
This mode of data-driven discovery is often referred to as the fourth paradigm [12], which does not supplant, but
instead complements the established modes of theoretical, experimental, and numerical inquiry. In fact, we again
emphasize the strong parallel with the rise of computational science in the past half century, which did not replace,
but instead augmented theory and experiments. Just as today computational proficiency is expected in the workforce,
so will data science proficiency be expected in the future. Thus, the data-intensive transformation in the aerospace
industry can learn from the digital transformation over the past decades. Perhaps the largest change will be in how
teams of researchers and engineers are formed with domain expertise and essential data science proficiency, along
with changes in research and development cycles for industry.
The aerospace industry presents a number of unique opportunities and challenges for the integration of data-intensive
analysis techniques and ML. The transformative impact of data science will be felt across the aerospace industry,
including 1) in the factory (design for manufacturability, re-use and standardization, process control, safety,
productivity, reproducibility, inspection, automation, drilling, shimming); 2) in testing and evaluation (streamlining
testing, certification, anomaly detection, data-driven modeling); 3) in the aircraft (inspection, design and
performance, materials and composites, maintenance, future product development); 4) in human–machine interactions
(advanced design interfaces, interactive visualizations, natural language interactions); and 5) in the business
(supply chain, sales, human resources, and marketing). Because of the exacting tolerances required in aerospace
manufacturing, many of these high-level objectives are tightly coupled in a constrained multi-objective optimization.
Traditionally, this optimization has been too large for any one group to oversee, and instead, individual components
are optimized locally within acceptable ranges. However, unforeseen interactions often cause considerable redesign and
program delays. In the worst-case scenario, accidents may occur. With improvements in end-to-end database management
and interaction (data standardization, data governance, a growing data-aware culture, and system integration methods),
it is becoming possible to create a digital thread of the entire design, manufacturing, and testing process,
potentially delivering dramatic improvements to this design optimization process. Further, improvements in
data-enabled models of the factory and the aircraft, the so-called digital twin, will allow for the accurate and
efficient simulation of various scenarios. In addition to these operational improvements, advances in data-intensive
analysis are also driving fundamental advances in aerospace critical fields such as fluid mechanics [13,14] and
material science [15]. Importantly, data science works in concert with existing methods and workflows, allowing for
transformative gains in predictive analytics and design insights gained directly from data. Figure 2 provides a
schematic of this process.
Fig. 2 Schematic overview of data-driven aerospace engineering.
Despite these tremendous potential gains, there are several challenges facing the integration of data science in the
aerospace industry. Foremost, due to the safety-critical aspect of aerospace engineering, data-driven models must be
certifiable and verifiable, must generalize beyond the training data, and must be both interpretable and explainable
by humans. Further, collecting vast amounts of data may lead to a data mortgage, where simply collecting and
maintaining this data comes at a prohibitively high cost, and the data is cumbersome for downstream analysis. In the
aerospace industry, customers are extremely diverse, with training and certification often done by a range of
companies and regulatory authorities, respectively, motivating a level of robustness that is typically not required in
other industries. There are also fundamental differences between commercial and defense sectors, limiting the sharing
of designs, transfer of technology, and joint testing; in fact, at Boeing, testing and evaluation of commercial and
defense vehicles have only recently been grouped under one organization, and these efforts still remain siloed.
The aerospace industry has always been a leader in optimization, and the earliest advances of the Wright brothers may
be viewed as an optimization of the flight control system. In the century that followed, much of the aerospace
industry has been centered around a constrained, multi-objective optimization with an exceedingly large number of
degrees of freedom and nonlinear interactions. ML algorithms are a growing set of data-intensive optimization and
regression techniques that are ideal for these types of high-dimensional, nonlinear, nonconvex, and constrained
optimizations. Aided by advances in hardware and algorithms, modern ML is poised to enable this optimization, allowing
a much broader and integrated perspective. It is important to note that not all ML is deep learning or artificial
intelligence. ML is simply optimization that is directed on data rather than first principles models, so that it fits
naturally into existing efforts, but leverages a growing and diverse set of data.
In this paper, we will explore the evolving landscape of data-driven aerospace engineering. We will discuss emerging
technologies and how these are changing what is possible in aerospace design, manufacturing, testing, and services.
These advances will be viewed through case studies, providing a review of past work and a roadmap for the future.
Further, we wish to temper the excitement about the immense opportunities of data science with a realistic view of
what is easy and what is hard. Finally, because the field is evolving rapidly, we want to establish a common
terminology, taxonomy, and hierarchy for technology surrounding data science and its ML algorithms.

II. Machine Learning and Optimization
In this section, we will discuss a number of mathematical architectures that are central to the data-intensive
modeling of aerospace systems. First, we will focus on understanding modern ML algorithms that will be critical to
process data from the aerospace industry. We then provide a brief overview of applied optimization, which is the
mathematical underpinning of ML. Next, we discuss specific ML extensions and considerations for systems governed by
physics, and also discuss the importance of scalable and robust numerical algorithms. The treatment of these
mathematical topics is far from comprehensive [9] and is instead intended to provide a brief, high-level overview.
Potential uses for these algorithms will be explored in later sections.
A. Machine Learning
ML is a growing set of optimization and regression techniques to build models from data. There are a number of
important dichotomies with which we may organize the variety of ML algorithms. Here, we will group these into
supervised and unsupervised learning methods, based on the extent to which the training data is labeled. An
approximate organization of these learning techniques, organized by task, is shown in Fig. 3. For example, in
supervised approaches, a function ϕ mapping input data x to outputs y, also known as labels or targets, must be
learned. Typically, the mapping may either predict the label

$$\hat{y} = \phi(x; \theta) \tag{1}$$

or model the joint probability distribution of the inputs and outputs

$$\phi(x, y; \theta) \tag{2}$$

In both cases, the models are parameterized by the parameters θ, which must be learned. There are numerous techniques
to learn the structure and parameters of these mappings, including linear and nonlinear regression, genetic
programming (i.e., symbolic regression), and neural networks (NNs). In addition to these dichotomies of ML,
reinforcement learning, or semisupervised learning, provides yet another paradigm for learning where delayed reward
structures can be achieved for tasks such as autonomy (self-driving cars, unmanned aerial vehicles, etc.).
There are four major stages in ML: 1) determining a high-level task or objective, 2) collecting and curating the
training data, 3) identifying the model architecture and parameterization, and 4) choosing an optimization strategy to
determine the parameters of the model from the data. Human intelligence is critical in each of these stages. Although
considerable attention is typically given to the learning architecture, it is often the data collection and
optimization stages that require the most time and resources. Indeed, exploring a broad and diverse set of ML
architectures is a hallmark feature of today’s ML-centric companies. It is also important to note that known physics
(e.g., invariances, symmetries, conservation laws, constraints) may be incorporated in each of these stages. For
example, rotational invariance is often incorporated by augmenting the training data with rotated copies, and
translational invariance is often captured using convolutional NN architectures. In kernel-based techniques, such as
Gaussian process regression and support vector machines, symmetries can be imposed by means of rotation-invariant,
translation-invariant, and symmetric covariance kernels. Additional physics and prior knowledge may be incorporated as
additional loss functions or constraints in the optimization problem.

1. Supervised Learning
Supervised learning assumes that the training data x has labels y. If the labels are discrete, such as a categorical
description of an image (e.g., dog vs cat), then the supervised learning task is a classification. If the labels are
continuous, such as the lift profile for a particular airfoil shape, then the task is a regression. In the simplest
form, the goal of supervised learning is to train a model to minimize the loss function

$$L = \| y - \phi(x; \theta) \| \tag{3}$$

where $\|\cdot\|$ is the root mean-squared error (RMSE).
Labeling the data with expert knowledge often makes it possible for supervised learning algorithms to automate complex
tasks. Many of the dominant, industrially used algorithms are supervised in nature, including the ubiquitous methods
of classification trees and support vector machines [3,9]. More recently, NNs have surpassed the performance of these
methods, provided a sufficiently large data set is available to train the network [16]. In supervised learning, the
availability of an extensive, high-quality labeled data set is crucial, as in the 2009 ImageNet data set [17], which
enabled the training of a deep convolutional NN that outperformed all previous architectures [18].

2. Unsupervised Learning
Unsupervised learning, also known as data mining or pattern extraction, determines the underlying structure of a data
set without labels. Again, if the data is to be grouped into distinct categories, then the task is clustering, whereas
if the data has a continuous distribution, the task is an embedding. Unsupervised learning is an extremely challenging
task as the algorithm is unguided by expert labels. Such algorithms are commonly used in an exploratory fashion to
learn about data and the kinds of correlations that exist between features. The three most commonly used methods for
unsupervised clustering include k-means, mixture models, and hierarchical clustering [3,9]. In each case, the number
of distinct patterns in the data is usually specified by the user and refined as a tuned hyperparameter. Learning
features in an unsupervised manner can often lead to future developments that are supervised.
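As a rough illustration of the clustering task described above, the following is a minimal sketch of k-means (Lloyd's algorithm) in NumPy. It is not taken from the paper: the function name `kmeans`, the synthetic two-blob data, and the choices `k=2`, `n_iter=50`, and the random seeds are all assumptions made for the example. The number of clusters `k` is supplied by the user, as the text notes.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; k is the user-specified number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic, hypothetical data: two well-separated blobs in 2D (no labels used).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
```

In practice `k` would be treated as a hyperparameter and tuned, for instance by comparing the within-cluster distances obtained for several values.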
Fig. 3 Schematic overview of various machine learning techniques.
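Returning to the supervised regression setting of Eq. (3), the sketch below fits a linear model $\phi(x; \theta) = \theta_0 + \theta_1 x$ by least squares and evaluates the RMSE loss. It is an illustrative assumption rather than anything from the paper: the helper names `fit_linear` and `rmse`, the synthetic data, and the noise level are hypothetical. Minimizing the RMSE over θ is equivalent to minimizing the sum of squared residuals, which is what `numpy.linalg.lstsq` solves.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares estimate of theta for phi(x; theta) = theta[0] + theta[1] * x."""
    A = np.column_stack([np.ones_like(x), x])  # design matrix with a bias column
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def rmse(y, y_hat):
    """Root mean-squared error between labels y and predictions y_hat."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Synthetic, hypothetical labeled data: y = 2 + 3x plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.05, size=x.shape)

theta = fit_linear(x, y)
y_hat = theta[0] + theta[1] * x
loss = rmse(y, y_hat)
```

A nonlinear model such as an NN would replace `fit_linear` with an iterative optimizer, but the loss being minimized has the same form.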

One of the standard approaches in embedding is to find a low-dimensional subspace or submanifold, parameterized by a
latent variable z, that describes a high-dimensional state x. In this case, the goal is to find two functions, an
encoder $z = \varphi(x)$ and a decoder $\hat{x} = \psi(z)$, so that $\hat{x} = \psi(\varphi(x)) \approx x$. The
functions φ and ψ are implicitly parameterized by weights θ that must be tuned to minimize the following loss function:

$$L = \| x - \psi(\varphi(x)) \| \tag{4}$$

When the encoder and decoder are linear functions, given by matrices, then the optimal embedding recovers the
classical singular value decomposition (SVD) or principal component analysis (PCA) [9,14]. However, these functions
may be nonlinear, defined by NNs, resulting in powerful autoencoders.

3. Reinforcement Learning
The power of reinforcement learning (RL) [19] lies in its ability to learn from interactions with the environment with
goal-oriented objectives. Thus its application domain includes autonomy and control. This is unlike the two other
dominant ML paradigms of supervised and unsupervised learning [9,16]. With a trial-and-error search, an RL agent
senses the state of its environment and learns to take appropriate actions to achieve optimal immediate or delayed
rewards. Specifically, the RL agent arrives at different states s by performing actions a, with the selected actions
leading to positive or negative rewards r for learning. Importantly, the RL agent is capable of learning delayed
rewards, which is critical for many systems where a circuitous path to the optimal solution must be learned. Rewards
may be thought of as sporadic and delayed labels, leading to RL often being classified as semisupervised learning. One
canonical example is learning a set of moves, or a long-term strategy, to win a game of chess.
RL is often formulated as an optimization to determine the policy $\pi(s, a)$, which is a probability of taking action
a given state s, to maximize the total reward across an episode. Given a policy π, it is possible to define a value
function that quantifies the desirability of being in a given state:

$$V_\pi(s) = \mathbb{E}\left( \sum_t \gamma^t r_t \,\middle|\, s_0 = s \right) \tag{5}$$

where $\mathbb{E}$ is the expected reward over the time steps t, subject to a discount rate γ. Typically, it is
assumed that the state evolves according to a Markov decision process, so that the probability of the system occurring
in the current state is determined only by the previous state. Thus a large number of trials must be evaluated in
order to determine an optimal policy. This is accomplished in chess and Go by self-play [20], which is exactly what
many engineered systems are allowed to do to learn an optimal policy. Often, in modern deep RL, a deep NN is used to
learn a quality function $Q(s, a)$ that jointly describes the desirability of a given state/action pair.

4. Deep Learning
Deep learning, or learning based on NNs with a deep multilayer structure, is often synonymous with ML and is the core
architecture for many modern supervised and RL paradigms. NNs are particularly powerful due to their expressive
representations of data and their diverse architectures [16]. The unparalleled success of these algorithms in ML is
due to the availability of sufficiently vast and rich training data and modern computational hardware, which have
enabled the training of exceedingly large NNs with millions or billions of free parameters.
Figure 4 shows a number of the leading architectures used in practice today in a variety of applications, including a
simple feedforward architecture, a deep autoencoder that capitalizes on low-dimensional structure in data, and a deep
convolutional NN (DCNN) that is widely used to extract features for machine vision. The term “deep” refers to the
number of NN layers (typically 7–10) used to map from inputs to outputs. NNs are universal function approximators
[21], and they assume a compositional structure

$$y = \phi_1(\phi_2(\cdots(\phi_n(x; \theta_n); \cdots); \theta_2); \theta_1) \tag{6}$$

The flexibility of this compositional structure enables the construction of classification or regression maps between
input and output data. With sufficient data, one can optimize for the NN weight
Fig. 4 Mathematical architectures of commonly used neural networks, including a) a feed-forward neural network, b) a deep autoencoder network, and
c) a deep convolutional neural network.
parameters $\theta_n$, usually via some form of stochastic gradient descent. Deep learning is commonly used in
commercial settings, with DCNNs being the state-of-the-art for characterizing images and spatial correlations, and
recurrent NNs (RNNs) enabling powerful speech and text recognition software. NNs are typically supervised learners and
require a significant amount of data. They are also known to overfit to data and fail to generalize to new parameter
regimes. Regardless, they are a powerful technology that can power many of the supervised learning tasks required in
modern data science applications.

5. Physics Informed Learning
Physics informed learning [14,22–32] is of growing importance for scientific and engineering problems. Physics informed
simply refers to our ability to constrain the learning process by physical and/or engineering principles. For
instance, conservation of mass, momentum, or energy can be imposed in the learning process [22]. In the parlance of
ML, the imposed constraints are referred to as regularizers. Thus, physics informed learning focuses on adding
regularization to the learning process to impose or enforce physical priors. For the example of an NN model, this
becomes

$$\operatorname*{argmin}_{\theta_1, \theta_2, \ldots, \theta_n} \left\| y - \phi_1(\phi_2(\cdots(\phi_n(x; \theta_n); \cdots); \theta_2); \theta_1) \right\| + \lambda g(\theta_1, \theta_2, \ldots, \theta_n) \tag{7}$$

where the regularization $g(\cdot)$ imposes the desired physical constraint. The parameter λ is a hyperparameter
allowing the user to impose an increasingly strong regularization to enforce this constraint. The importance of this
constraint, or potentially multiple constraints, cannot be overstated in engineering and physics systems.
Specifically, this is where known physics or physical constraints can be explicitly incorporated into the data-driven
modeling process. Physics informed learning, often enacted with deep learning architectures, represents the
state-of-the-art in ML methods for the engineering and physical sciences.
Rather than imposing physical constraints explicitly, an alternative is to learn embeddings based on physical models.
This physics-guided paradigm involves learning embeddings from data produced by known first principles models of
physics. The computation of these embeddings, in the context of aerospace and fluid dynamics, is often known as modal
analysis, and has become increasingly data-driven (from either simulation or observation) in recent years [14,33,34].
The physical coupling between fluids and aerospace structures is particularly important, and the modes of these
coupled interactions are impossible to discern by analyzing the Navier–Stokes equations and structural models alone;
instead, they are determined by the boundary interactions between the coupled models. Thus, this physics guided
architecture greatly enhances the understanding and design of robust engineering systems that can withstand complex
interactions, turbulence, and instabilities.

information about structure of unknowns and statistical measures that describe beliefs about error and uncertainty.
The scope of problems that can be formulated in this way is broad, ranging from sensing, estimation, and control
[69–76] to ML [9,16,77] and decisions under uncertainty [78]. In all cases a modeling process gives rise to an
optimization problem, where minimizing and maximizing over parameters leads to the desired inference or learning
machine. Theoretical developments, such as convex and variational analysis, capture properties of problem formulations
such as smoothness, convexity, and well-posedness of the problems themselves. In addition, deriving provable behavior
of such algorithms is often important to guarantee performance.
A major distinction in optimization is between convex and nonconvex problems. Here, convexity refers to the property
of the objective function to be minimized or maximized and the set of values being optimized over. Convex optimization
problems [68] are extremely well studied, as there are fast and scalable generic solution techniques with performance
guarantees. Convex objective functions will have a single global minimum or maximum (i.e., the function has a single
hill or valley), and gradient-based methods may be used to converge to this extremum. In contrast, nonconvex models,
which have many local minima and maxima (i.e., the objective function has many local hills and valleys), form a much
wider, and hence richer, class of problems. But, as such, there are no general scalable solution techniques or strong
convergence guarantees.

1. Stochastic Algorithms
Many problems in ML involve training sets with millions of data points, making standard gradient computations
prohibitively expensive. Stochastic optimization algorithms use random sampling to scale gradient descent algorithms
to such data sets by using small subsets of the data at any given time. While these methods have a long history [79],
recent developments have focused on extending these ideas to ML [80,81]. Recent theoretical breakthroughs have shown
that the simplest algorithms that directly use sampled gradients, and even approximate gradients (subgradients) in the
nonsmooth case, are provably convergent for a far wider problem class than was previously known [82]. These results
justify the prolific use of these methods for general large-scale problems, such as training NNs. In addition,
stochastic algorithms for structured nonconvex and nonsmooth problems make heavy use of the proximity operator
[83,84], and can converge more rapidly than classic stochastic methods by exploiting variance reduction techniques to
obtain improved search directions.

2. Role of Structure in Algorithm Design
A key theme in the field of optimization is to find the right problem classes that balance general applicability with
specificity for the design of efficient algorithms. Rather than thinking of problems in generality such as convex
versus nonconvex, more specific assumptions, reminiscent of physics-informed learning constraints, allow faster
algorithms and better guarantees. Within convex optimization,
Improved reduced-order models (ROMs) of fluid dynamics may the piecewise linear-quadratic class [85] captures a wide range of
further aid efforts in flow estimation and control [35–41], reducing models and admits specialized solution techniques. Outside of con-
jet noise [42], and in improved turbulence models [13,43–50]. vex optimization, the class of convex-composite [86] problems is a
Modern modal analysis techniques, such as proper orthogonal key generalization that has seen recent algorithmic development and
decomposition (POD)/PCA [9], dynamic mode decomposition analysis [87]. Coupled problems with multiple parameters have been
[51–53], Koopman mode decomposition [54,55], and resolvent solved for numerous applications with variable projection techniques
mode analysis [56], naturally fall under the umbrella of unsupervised [88]. Nonconvex composite models and algorithms have also been
learning. Furthermore, the dimensionality reduction achieved by developed, with many applications involving challenging data-
these methods enables low-latency, efficient downstream tasks such generating mechanisms and sparse regularization [89].
as system identification [57], airfoil shape optimization and uncer-
tainty quantification [58], and reduced-order modeling [34,59–67]. 3. Atomic Operations for Nonconvex Nonsmooth Functions
There have been significant recent advances in nonconvex, non-
B. Optimization smooth optimization where the objective functions are not differ-
All of ML relies on optimization [9,14]. In fact, ML may be viewed entiable, and where parameters are constrained, for example, by
as a growing set of applied optimization algorithms to build physical bounds. NNs and deep learning models have been a par-
models from data. Mathematical optimization [68] comprises three ticularly influential driver of methods for large-scale nonconvex
interconnected areas: theoretical underpinnings, algorithm design models. Complementing this direction, nonsmooth models fre-
and implementation, and modeling. Modeling allows us to bring in quently arise as a means of imposing structure on the solution, for
deterministic or physical descriptions of the real world, along with example, sparsity. In the remainder of this section, we discuss some of
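To make the regularized objective of Eq. (7) and the minibatch idea behind stochastic algorithms concrete, the following minimal Python sketch trains a small linear model with stochastic gradient descent. The linear model, the synthetic noise-free data, the squared data misfit, the constraint g(θ) = (Σθ − 1)², and all function and variable names are illustrative assumptions; they are not taken from the methods surveyed here.

```python
import random


def sgd_physics_regularized(xs, ys, lam=1.0, step=0.1, batch=10, iters=2000, seed=0):
    """Minibatch SGD on mean (theta.x - y)^2 + lam * g(theta).

    g(theta) = (sum(theta) - 1)^2 is a hypothetical stand-in for a physical
    constraint (for example, coefficients that must sum to one).
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    theta = [0.0] * dim
    for _ in range(iters):
        # Sample a small random subset of the data (the stochastic step).
        idx = [rng.randrange(len(xs)) for _ in range(batch)]
        grad = [0.0] * dim
        for i in idx:
            resid = sum(t * x for t, x in zip(theta, xs[i])) - ys[i]
            for j in range(dim):
                grad[j] += 2.0 * resid * xs[i][j] / batch
        # Gradient of the regularizer lam * (sum(theta) - 1)^2.
        violation = sum(theta) - 1.0
        for j in range(dim):
            grad[j] += 2.0 * lam * violation
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


# Synthetic, noise-free data whose true coefficients satisfy the constraint.
data_rng = random.Random(1)
theta_true = [0.2, 0.3, 0.5]
xs = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
ys = [sum(t * x for t, x in zip(theta_true, row)) for row in xs]

theta_hat = sgd_physics_regularized(xs, ys)
```

Each iteration touches only `batch` samples, which is what allows this family of methods to scale to large data sets; the hyperparameter `lam` plays the role of λ in Eq. (7).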
these advances in more detail with references to survey literature in optimization.

When optimizing a smooth function, algorithms fundamentally rely on gradients to implement second-order methods, such as Newton’s method or the Gauss–Newton and quasi-Newton variants (see [90] for an overview). These gradient computations involve matrix-vector products and equation solves, which we will call atomic operations. Most common algorithms can be decomposed into such operations.

Another key operation is that of the proximity operator, which has a long history and a tremendous range of recent applications [91,92]. Given a function f, we define

$$\operatorname{prox}_{\alpha f}(z) = \operatorname*{argmin}_{x} \; \frac{1}{2\alpha}\|x - z\|^2 + f(x) \tag{8}$$

In words, we minimize the sum of the function and a scaled quadratic around a base point z and return the minimizing value. When the function f is arbitrary, evaluating the operator may be difficult or impossible. However, under moderate assumptions that are satisfied for a wide range of applications, the proximity operator has a closed-form solution or a provably fast computational routine. Many algorithms for solving nonsmooth nonconvex problems use the proximity operator as a subroutine, and are provably convergent [93].

C. Scalable and Robust Algorithms
Despite the growing abundance of measurement data across the engineering sciences, systems are often fundamentally low dimensional, exhibiting a few dominant features that may be extracted using dimensionality reduction [9,33]. The existence of low-rank patterns facilitates efficient models and sparse sampling, as there are only a few important degrees of freedom that must be characterized, regardless of the ambient measurement dimension. Here we discuss some of the tremendous advances in the past decades developing robust and scalable numerical algorithms for big data applications.

1. Randomized Linear Algebra
Massive datasets pose a computational challenge for traditional algorithms, placing significant constraints on memory, processing power, and computational time. Recently, the powerful concept of randomization has been introduced as a strategy to ease the computational load while still achieving performance that is comparable with traditional matrix factorization techniques. The critical idea of probabilistic algorithms is to employ some degree of randomness in order to derive a smaller matrix from a high-dimensional data matrix. The smaller matrix is then used to compute the desired low-rank approximation. Such algorithms are shown to be computationally efficient for approximating matrices with low-rank structure. Of particular interest are randomized routines for the computation of the SVD, (robust) principal component analysis (PCA), and CUR decompositions [94].

Several probabilistic strategies have been proposed to find a good smaller matrix, and we refer the reader to the surveys [95–98] for an in-depth discussion and theoretical results. In addition to computing the SVD [99,100] and PCA [101,102], it has been demonstrated that this probabilistic framework can also be used to compute the pivoted QR decomposition [103], the pivoted LU decomposition [104], the CANDECOMP/PARAFAC (CP) tensor decomposition [105], and the dynamic mode decomposition [106]. It also helps frame computationally tractable ROMs [107,108]. It should be noted that the tech giants, such as Google and Facebook, routinely use randomized algorithms to analyze their large data sets.

2. Robust Dimensionality Reduction
Robust statistical methods are essential for evaluating real-world data, as advocated by John W. Tukey in the early days of data science [10,109]. Many techniques in dimensionality reduction are based on least-squares regression, which is susceptible to outliers and corrupted data. PCA suffers from the same weakness, making it fragile with respect to outliers. To address this sensitivity, Candès et al. [110] introduced a robust PCA (RPCA) that decomposes a data matrix X into a low-rank matrix L containing dominant coherent structures, and a sparse matrix S containing outliers and corrupt data:

$$X = L + S \tag{9}$$

The principal components of L are robust to the outliers and corrupt data in S. The low-rank matrix L is decomposed via the SVD into $L = \Phi D V^T$, where coherent features are given by the matrix Φ. We generally use the first r dominant columns of Φ, corresponding to the r features that explain the most variance in the data. The SVD provides the best rank-r least-squares approximation for a given rank r: $\hat{L} = \Phi_r D_r V_r^T$. The target rank r must be carefully chosen so that the selected features only include meaningful patterns and discard noise corresponding to small singular values. The subsequent left singular vectors $\Phi_r$ are the desired low-rank features that span the columns of L. The truncation parameter r may be determined using the optimal singular value truncation threshold of Gavish and Donoho [111].

RPCA has tremendous applicability for modern problems of interest, including video surveillance [112] (where the background objects appear in L and foreground objects appear in S), natural language processing [113], matrix completion, and face recognition [114]. Matrix completion may be thought of in terms of the Netflix prize, where a matrix of preferences is constructed, with rows corresponding to users and columns corresponding to movies. This matrix is sparse, as most users only rate a handful of movies. The goal is to accurately fill in missing matrix entries, revealing likely user ratings for movies they have not seen. We will demonstrate the use of RPCA for an aircraft shimming application in the case study in Sec. VIII.

III. Digital Twin and Enabling Technologies
Several key technologies are necessary to support the design, manufacturing, testing, and service of tomorrow’s aerospace products, which will ultimately be enabled by a robust digital twin. These enabling technologies include sensors and the internet of things, a comprehensive digital thread, rapid data access and data storage, virtual reality to perform testing, ROMs and discrepancy models, uncertainty quantification, autonomy, and control. This section will review a number of these emerging technologies.

A. Digital Twins
Digital twin technology promises to revolutionize the entire manufacturing and engineering design landscape [115–119]. The goal of the digital twin is to bridge the physical and virtual worlds, providing a proxy environment to simulate, test, and evaluate model designs at a fraction of the cost of real-world implementation. The digital twin relies on an accurate, physics-based emulator that characterizes the statics or dynamics of a given system. Typically, this model will integrate a hierarchy of multiphysics and multifidelity models, which will be continually updated with data streams from the real world. To be more mathematically precise, many physics-based models will consist of a system of nonlinear partial differential equations describing the time-space evolution of a system. Such an evolution equation can be represented as follows:

$$u_t = N(u, u_x, u_{xx}, \cdots, x, t; \beta) \tag{10}$$

where u is the system state, the subscripts denote partial differentiation, and $N(\cdot)$ prescribes the generically nonlinear evolution. The vector β represents the parameters on which the system depends. Equation (10) also requires a set of initial and boundary conditions on a given domain. Typically, high-fidelity simulations of this system of equations are computationally involved, and it may be prohibitively expensive to simulate a truly multiscale system, such as the turbulent fluid flow over a full-scale wing at all scales. Instead, it is often necessary to leverage ROMs that capture dominant physical phenomena at a fraction of the cost. Of course, some systems are time independent and some are spatially independent. In either case, what
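As an illustration of a proximity operator with a closed-form solution, consider Eq. (8) with the sparsity-promoting choice f(x) = ‖x‖₁, for which the minimizer is the well-known soft-thresholding rule. The sketch below is a minimal pure-Python example; the choice of f, the test vector, and the helper names are assumptions made for illustration only.

```python
def prox_l1(z, alpha):
    """Proximity operator of f(x) = ||x||_1 with scaling alpha (soft-thresholding).

    Each entry is shrunk toward zero by alpha; entries with magnitude below
    alpha are set exactly to zero, which is how sparsity arises.
    """
    return [max(abs(zi) - alpha, 0.0) * (1.0 if zi > 0 else -1.0) for zi in z]


def prox_objective(x, z, alpha):
    """Objective of Eq. (8) for f = l1 norm: ||x - z||^2 / (2 alpha) + ||x||_1."""
    quad = sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / (2.0 * alpha)
    return quad + sum(abs(xi) for xi in x)


z = [3.0, -0.5, 1.2]
x_star = prox_l1(z, alpha=1.0)

# Sanity check: small perturbations of x_star should not lower the objective.
base = prox_objective(x_star, z, 1.0)
perturbed = [
    prox_objective([xi + (d if k == j else 0.0) for k, xi in enumerate(x_star)], z, 1.0)
    for j in range(len(z))
    for d in (-0.05, 0.05)
]
```

Because the operator is applied entrywise, its cost is linear in the dimension, which is one reason it is convenient as a subroutine inside larger iterative schemes.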
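The randomized strategy described under Randomized Linear Algebra (derive a smaller matrix by random sampling, then factor it) can be sketched in a few lines. The following is a minimal NumPy example of one common variant, a Gaussian random range finder followed by an SVD of the projected matrix. The oversampling value, the seeds, the synthetic rank-3 test matrix, and the function name are assumptions for illustration, not a reference implementation of any cited algorithm.

```python
import numpy as np


def randomized_svd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` SVD of A using a random range finder."""
    rng = np.random.default_rng(seed)
    # Random test matrix: sketches the column space of A.
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ Omega
    # Orthonormal basis for the sketched range.
    Q, _ = np.linalg.qr(Y)
    # Smaller matrix on which the deterministic SVD is cheap.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]


# Synthetic matrix with exact rank 3 (hypothetical test data).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 3)) @ data_rng.standard_normal((3, 100))

U, s, Vt = randomized_svd(A, rank=3)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The expensive factorization is applied only to the small matrix `B`, whose row count is `rank + oversample` rather than the ambient dimension.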
is critical is that a proxy, physics-based model exists that is capable of informing how a system behaves. From robots to manufacturing lines, an accurate virtual representation holds tremendous promise for technological advancement.

The success of digital twin technology centers on accurate virtual representations of the physical world. For precision manufacturing, for instance, current digital twin technologies do not provide the necessary level of fidelity for a number of processes. What distinguishes digital twins from traditional modeling efforts is the integration of multiphysics systems and components. Thus the digital twin often represents an entire engineering process versus individual components in the process, for which we may have good models. However, the digital twin must ensure end-to-end performance across the multiphysics system, placing stringent requirements on the fidelity of models and how they communicate. Discrepancy models (discussed below) provide an adaptive framework, whereby the digital twin can continuously learn updated, high-precision physics models over the course of time from its own sensor network. The integration of data and multiphysics models across a system is a grand challenge, requiring intelligent, robust, and adaptive architectures for learning and control. Many of the data-driven strategies discussed here are ideally poised to help in building accurate and viable digital twin models.

B. Sensor Technology and the Internet-of-Things
The aerospace industry generates tremendous volumes of data from a vast array of distributed sensors. With emerging internet-of-things sensing and communication capabilities, this volume of data will only increase. To avoid a data mortgage, where the majority of resources are spent collecting and curating data, it is critical that key features are automatically extracted and analyzed in real-time through edge computing. Thus, the paradigm of big data will shift to one of smart data. It is also important for algorithms to be robust to outliers. Outliers may correspond to sensor failures or saturations, although they may also signal important events that should be analyzed more carefully. Identifying where to place new sensors will also play a key role in improving efficiency and process control.

Many complex systems, such as a turbulent fluid or a large aerospace structure, have many degrees of freedom and are mathematically represented as a high-dimensional vector of data resulting from simulations or physical measurements. However, high-dimensional data often exhibit low-dimensional patterns, which is the foundation of dimensionality reduction and ML. The high-dimensional state $x \in \mathbb{R}^n$ may then be efficiently represented in a low-dimensional basis $\Phi_r \in \mathbb{R}^{n \times r}$, for example, via RPCA above, so that $x \approx \Phi_r a$, where a is a vector that approximates x in terms of the first r principal components $\Phi_r$. Often the structure of $\Phi_r$ is well-characterized from historical data (e.g., a library of human faces, or a set of point cloud scans of a particular aircraft part across several aircraft). In this case, the number of measurements required to estimate the full vector x may be dramatically reduced from the ambient dimension n. These sparse sensors can be selected to best identify the coefficients a in the basis $\Phi_r$, thus enabling robust estimation of the high-dimensional state x. There are a number of sparse sensing paradigms, including gappy sampling [120,121], empirical interpolation methods (EIM) [122–124], including the discrete empirical interpolation method, or DEIM, and compressed sensing [125–130].

Sparse sampling in a tailored basis [76,131,132] has been widely applied to problems in the imaging sciences as well as to develop ROMs of complex physics, such as unsteady fluid dynamics. Thus, even if we cannot measure the full state x, it is often possible to estimate the state from $r \ll n$ point measurements in the observation vector $y \in \mathbb{R}^r$ given by $y = Cx = C\Phi_r a$, where $C \in \mathbb{R}^{r \times n}$ is the point measurement operator. Then, the reconstruction of the remaining state reduces to a least-squares estimation problem for the coefficients $\hat{a} = (C\Phi_r)^{\dagger} y$, where † is the Moore–Penrose pseudoinverse. This procedure was first used to reconstruct facial images from a subsampled pixel mask [120]. Importantly, it permits a drastic reduction in computation by solving for $r \ll n$ unknowns. The full state estimate $\hat{x}$ is subsequently recovered using the feature basis: $\hat{x} = \Phi_r \hat{a}$. The accuracy of reconstruction depends on the structure of the basis $\Phi_r$ and the choice of observations C.

The number of observations can be greatly reduced by optimizing the sensors to maximize the accuracy of reconstruction. However, the combinatorial search over all $\binom{n}{r}$ possible sensor locations is computationally intractable even for moderately large n and r. Fortunately, there are convex relaxations that can be solved using standard optimization techniques and semidefinite programs in $O(n^3)$ runtime. We advocate a greedy matrix volume maximization scheme using the pivoted matrix QR factorization [76,124]. This algorithm treats the selected measurements as optimal rows of the linear operator $C\Phi_r$, which designs sampled features $C\phi_i$ to be as orthogonal to each other as possible. Because the row-selected columns $\phi_i$ are no longer orthonormal, greedy row selection methods attempt to maintain near-orthonormality of the features $C\phi_i$ for a numerically well-conditioned inverse.

C. Reduced-Order Modeling
The engineering sciences increasingly rely on simulations as proxies for modeling expensive experimental systems. The complexity and dimension of these numerical simulations are growing rapidly due to increasing computational power and resolution in numerical discretization schemes. Many complex PDEs such as Eq. (10), which are critical for digital twin technologies, yield discretized systems of differential equations with millions or billions of degrees of freedom. These large systems, such as turbulent fluid flows, are extremely demanding, and may be prohibitively expensive, even for the most advanced supercomputers. Yet many dynamics of interest are known to be low-dimensional in nature, in contrast to the high-dimensional nature of scientific computing. ROMs help reduce the computational complexity required to solve large-scale engineering systems by approximating the dynamics with a low-dimensional surrogate [61–64,133–135].

To aid in computation, the selection of a set of optimal basis modes is critical, as it can greatly reduce the number of differential equations generated. Many solution techniques involve the solution of a linear system of size n, which generically involves $O(n^3)$ operations. Thus, reducing n is of paramount importance. It is possible to approximate the state u of the PDE using a Galerkin expansion:

$$u(t) \approx \Phi_r a(t) \tag{11}$$

where $a(t) \in \mathbb{R}^r$ is the time-dependent coefficient vector and $\Phi_r$ is an optimal basis of orthogonal columns, typically generated by SVD; in this case $r \ll n$. We then substitute this modal expansion into Eq. (10) and leverage the orthogonality of $\Phi_r$ to yield the reduced evolution equations

$$\frac{da(t)}{dt} = \Phi_r^T L \Phi_r a(t) + \Phi_r^T N(\Phi_r a(t); \beta) \tag{12}$$

By solving this small system, the solution of the original high-dimensional system can be approximated. Of critical importance is evaluating the nonlinear terms in an efficient way, for example, using gappy POD or other sparse sampling techniques. Otherwise, evaluating the nonlinear terms still requires calculations of the high-dimensional function with the original dimension n. In certain cases, such as the quadratic nonlinearity of Navier–Stokes, the nonlinear terms can be computed once in an off-line manner. However, parameterized systems generally require repeated evaluation of the nonlinear terms as the basis may change with β. Regardless, ROMs allow one to approximate an n-dimensional system with an r-dimensional system, where $r \ll n$, making many computations possible that would otherwise be intractable. As such, ROMs are transforming high-performance computing by allowing for computational studies that have been intractable in the past. Recently, model discovery techniques are also providing low-order models of complex systems [22,24,57,136–138].

Despite the success of dimensionality reduction and reduced-order modeling across a range of problems, there are still many challenges
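The sensor selection and reconstruction steps described above (greedy pivoting on the basis, then a least-squares solve for the coefficients) can be sketched as follows. This minimal NumPy example implements the pivoting rule by hand, choosing at each step the candidate location with the largest residual norm, which is the same pivot criterion used by column-pivoted QR. The synthetic rank-r training data, the dimensions, and all names are assumptions for illustration.

```python
import numpy as np


def qr_pivot_sensors(Phi_r):
    """Greedy column pivoting on Phi_r^T; returns r row indices of Phi_r."""
    R = Phi_r.T.copy()  # shape (r, n): each column is a candidate sensor location
    sensors = []
    for _ in range(R.shape[0]):
        # Pick the location whose remaining (unexplained) column is largest.
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        sensors.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        # Remove the chosen direction from every remaining column.
        R -= np.outer(q, q @ R)
    return sensors


rng = np.random.default_rng(0)
n, r = 100, 5

# Hypothetical training data with exact rank r; Phi_r holds its left singular vectors.
X_train = rng.standard_normal((n, r)) @ rng.standard_normal((r, 40))
Phi_r = np.linalg.svd(X_train, full_matrices=False)[0][:, :r]

sensors = qr_pivot_sensors(Phi_r)

# Unknown state in the span of Phi_r, observed only at the r sensor locations.
x_true = Phi_r @ rng.standard_normal(r)
y = x_true[sensors]                              # y = C x
a_hat = np.linalg.pinv(Phi_r[sensors, :]) @ y    # a_hat = (C Phi_r)^dagger y
x_hat = Phi_r @ a_hat                            # x_hat = Phi_r a_hat
```

Here the measurement operator C is never formed explicitly; selecting rows by index is equivalent to multiplying by a matrix of unit row vectors. For states that lie only approximately in the span of the basis, the reconstruction error would depend on the conditioning of the selected rows.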
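The Galerkin projection of Eqs. (11) and (12) can be illustrated on a small example. The sketch below builds a POD basis from snapshots of a full model and integrates the reduced system with the same explicit Euler scheme. The full model (a discretized one-dimensional diffusion operator playing the role of the linear term L), the zero nonlinear term, the initial condition, and the step sizes are all assumptions chosen so that the example is easy to verify; they are not drawn from the applications discussed in the text.

```python
import numpy as np


def euler(rhs, u0, dt, steps):
    """Explicit Euler integration; returns snapshots as columns."""
    u = u0.copy()
    out = [u.copy()]
    for _ in range(steps):
        u = u + dt * rhs(u)
        out.append(u.copy())
    return np.array(out).T


n, r, dt, steps = 50, 3, 1e-4, 200
dx = 1.0 / (n + 1)

# Hypothetical full model: du/dt = L u + N(u), with L a second-difference operator.
L = (np.diag(-2.0 * np.ones(n))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2


def N(u):
    """Nonlinear term of Eq. (12); set to zero in this demonstration."""
    return np.zeros_like(u)


grid = np.arange(1, n + 1) * dx
u0 = sum(np.sin(k * np.pi * grid) for k in (1, 2, 3))

# Full-order snapshots and POD basis (leading left singular vectors).
U_full = euler(lambda u: L @ u + N(u), u0, dt, steps)
Phi_r = np.linalg.svd(U_full, full_matrices=False)[0][:, :r]

# Reduced operator and reduced dynamics, following Eq. (12).
L_r = Phi_r.T @ L @ Phi_r
a0 = Phi_r.T @ u0
A_rom = euler(lambda a: L_r @ a + Phi_r.T @ N(Phi_r @ a), a0, dt, steps)

U_rom = Phi_r @ A_rom  # lift back to the full space, Eq. (11)
rel_err = np.linalg.norm(U_full - U_rom) / np.linalg.norm(U_full)
```

In this contrived case the dynamics live exactly in a three-dimensional subspace, so three modes suffice. Note that a nonzero `N` is still evaluated on the lifted n-dimensional state, which is the cost that sparse sampling techniques such as gappy POD aim to avoid.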
that remain. There are many systems, such as fluid turbulence, that defy a low-dimensional representation in a linear subspace as identified by SVD/PCA/POD. Indeed, turbulent flows often require hundreds or thousands of modes to describe the data, so that traditional projection-based model reduction approaches become infeasible. However, there is an opportunity to learn nonlinear submanifolds that accurately describe the data and are parameterized by fewer variables. Deep autoencoders provide one approach to learning such a nonlinear embedding where models may be identified [139,140]. Further, the related field of turbulence closure modeling is an area of active research that is advancing rapidly with techniques from ML [13,14,45,47].

D. Discrepancy Modeling
First principles modeling of physical systems has led to significant technological advances across all branches of science. For nonlinear systems, however, small modeling errors can lead to significant deviations from the true (measured) behavior. Even in mechanical systems, where the equations are assumed to be well-known, there are often model discrepancies corresponding to nonlinear friction, wind resistance, etc. Discovering models for these discrepancies remains an open challenge for many complex systems. There are many reasons why model discrepancies occur [141,142]. First, there may be measurement noise and exogenous disturbances. In this case, the Kalman filter may be thought of as a discrepancy model where the mismatch between a simplified model and observations is assumed to be a Gaussian process [69]. Second, the parameters of the system may be inaccurately modeled. Even worse, the structure of the model may not be correct, either because important terms are missing or erroneous terms are present. This is known as model inadequacy or model structure mismatch. Other challenges include incomplete measurements and latent variables, delays, and sensitive dependence on initial data in chaotic systems.

Discrepancy modeling centers on parameter and structural uncertainties. Consider the governing equations for a given engineering system, given by Eq. (10). In general, there is also an output measurement,

$$y = h(u; \beta) \tag{13}$$

from which the state may be estimated. The discrepancy modeling problem evaluates the difference between the model output of a quantity of interest $\phi_m(t)$ and the observed value $\phi_o(t)$:

$$\delta\phi(t) = \phi_o(t) - \phi_m(t) \tag{14}$$

where δϕ is the discrepancy. The goal of discrepancy modeling is then to characterize the discrepancy $\delta\phi(t)$, for example, using standard Gaussian process regression [143], dynamic mode decomposition [53] for approximating $\delta\phi(t)$ with a best fit linear model, and/or model discovery for generating a nonlinear dynamic system [144].

E. Uncertainty Quantification
Model certification, credibility bounds, and other guarantees of performance are necessary for data-driven ROMs in the aerospace industry. Indeed, trustworthy ML is necessary for reduction to practice in almost any critical application area. The mathematical framework of uncertainty quantification (UQ) provides computational tools for evaluating probabilistic estimates of credibility and predictive capacity, and holds the key for bringing ML and AI into safety-critical domains. Without quantifying uncertainty in the model discovery procedure, one cannot provide estimates on the robustness and sensitivity of the models to observation error and model mismatch. In practice, this limits the applicability of all of ML for applications where quantifying the credibility of predictions is critical, such as human transportation systems with stringent safety regulations.

The mathematical architecture for UQ relies on a Bayesian perspective where predictions and quantification are given as probability distributions subject to a set of priors. Data-driven discovery must then be equipped with physically meaningful priors. It is also critical in developing a Bayesian scheme to separate dynamics from noise. Mathematically, model parameters β will be conditioned on the data Y so that

$$p(\beta \mid Y) \propto p(Y \mid Z(\beta))\, p(\beta) \tag{15}$$

where $Z(\beta)$ denotes the predictions provided by the candidate models, $p(Y \mid Z)$ is the observation likelihood given by the observation model, and $p(\beta)$ is the prior on the model coefficients. A result of such UQ metrics is to have performance bounds and guarantees so that reduction to practice can be assessed and achieved.

F. Autonomy and Control
Modern aerospace systems, including manufacturing and operations, will rely on advanced autonomy and precision control. To date, there has not been an emergent and well-established paradigm on how to most effectively use large-scale data for autonomous control systems. Two modern grand challenge problems, robotics and self-driving cars, shown in Fig. 5, exemplify two very different paradigms. In the field of robotics, strong adherence to physical laws is imposed. Thus a robot is strongly constrained by our classical physics-based models for its movement and balance. However, critical use is made of high-fidelity sensors and Kalman filtering techniques, which jointly leverage models and data, in order to robustly control its sophisticated motions. In contrast, self-driving cars are largely physics-free. They are simply trained from exceptionally large data sets from sensors (vision, LIDAR, etc.), which attempt to integrate all possible scenarios the car may encounter in real-life driving scenarios. Both paradigms for leveraging data, physics-informed and physics-free, have experienced tremendous success in the past decade. They have also encountered fundamental limitations that must be overcome for the technologies to become commercially viable. Self-driving cars have been empowered by deep learning algorithms, which are known to have significant shortcomings in extrapolation tasks. Thus, some of the more spectacular
Fig. 5 Robotics and self-driving cars are both data-intensive examples of autonomy. Robotics is largely physics-informed, whereas self-driving cars
are largely physics-free. Aviation requires both approaches. Panel (a) by Harrypotterrrrr, reproduced from https://commons.wikimedia.org/wiki/
File:9_pic.png. Panel (b) by Grendelkhan, reproduced from https://commons.wikimedia.org/wiki/File:Waymo_self-driving_car_side_view.gk.jpg.
Panel (c) by Dave Sizer, reproduced from https://commons.wikimedia.org/wiki/File:Boeing_787-8_maiden_flight_overhead_view.jpg.
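The discrepancy of Eq. (14) and its approximation by a best fit linear model can be illustrated with a scalar example. The sketch below computes δϕ from a modeled and an observed time series and fits a one-step linear map δϕ(k+1) ≈ a δϕ(k) by least squares, which is a one-dimensional analogue of the dynamic mode decomposition idea mentioned in Sec. III.D and not a full implementation of it. The time series, the decay rate, and all names are hypothetical.

```python
def discrepancy(observed, modeled):
    """Pointwise discrepancy: delta_phi = phi_o - phi_m, as in Eq. (14)."""
    return [o - m for o, m in zip(observed, modeled)]


def fit_linear_map(d):
    """Least-squares scalar a such that d[k + 1] is approximately a * d[k]."""
    num = sum(d[k] * d[k + 1] for k in range(len(d) - 1))
    den = sum(d[k] ** 2 for k in range(len(d) - 1))
    return num / den


# Hypothetical data: the model misses a decaying transient 0.5 * 0.9**k.
steps = range(20)
phi_m = [1.0 + 0.1 * k for k in steps]
phi_o = [m + 0.5 * 0.9 ** k for m, k in zip(phi_m, steps)]

delta = discrepancy(phi_o, phi_m)
a_hat = fit_linear_map(delta)

# One-step-ahead correction: model output plus the predicted discrepancy.
corrected = [phi_m[k + 1] + a_hat * delta[k] for k in range(len(delta) - 1)]
```

With vector-valued quantities of interest, the scalar `a_hat` would become a matrix fitted to snapshot pairs, and noisy observations would call for a regression method that accounts for that noise.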
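The Bayesian update of Eq. (15) can be made concrete for a single scalar parameter by evaluating the unnormalized posterior on a grid. The minimal sketch below assumes a hypothetical candidate model Z(β) = βt, a Gaussian observation likelihood with known standard deviation, and a broad zero-mean Gaussian prior; the data values, grid, and names are illustrative assumptions.

```python
import math


def grid_posterior(betas, t, Y, sigma, prior_std):
    """Discrete posterior over `betas`: likelihood times prior, normalized to sum to 1."""
    log_post = []
    for b in betas:
        # Gaussian observation likelihood p(Y | Z(beta)) with Z(beta) = beta * t.
        log_lik = -sum((y - b * ti) ** 2 for ti, y in zip(t, Y)) / (2.0 * sigma ** 2)
        # Zero-mean Gaussian prior p(beta).
        log_prior = -b ** 2 / (2.0 * prior_std ** 2)
        log_post.append(log_lik + log_prior)
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical noisy observations of a process with slope near 2.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.8, 6.05, 8.15, 9.9]
betas = [i * 0.01 for i in range(401)]  # grid on [0, 4]

post = grid_posterior(betas, t, Y, sigma=0.2, prior_std=10.0)
beta_map = betas[post.index(max(post))]
beta_mean = sum(b * p for b, p in zip(betas, post))
beta_std = math.sqrt(sum((b - beta_mean) ** 2 * p for b, p in zip(betas, post)))
```

The spread `beta_std` is the kind of quantity from which credibility bounds on a prediction could be derived. Grid evaluation is practical only for a few parameters; higher-dimensional problems typically rely on sampling or variational approximations.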
Fig. 6 Standard classical control feedback loop.
failings of self-driving cars have come from situations that were not part of their training sets. In contrast, our idealized physics-based models for complex and networked systems, such as robots, often fail to properly account for discrepancies between models and data, leading to control issues. The aerospace industry ultimately will need to leverage both paradigms in order to fully exploit their large-scale data sets.

A traditional feedback system involves guidance (what state do we want a system to have), navigation (what is the current state of the system), and control (how will we affect the system to achieve the desired state), shown in Fig. 6. Key elements of autonomy and control that must be considered when leveraging data science include the following:

Performance criteria: Performance criteria for certification are generally specified in either the frequency domain or the time domain. In the time domain, the primary consideration is the transient response to a disturbance, relative to settling time, rise time, and overshoot. In the frequency domain, steady-state stability relative to harmonic forcing is considered (bandwidth, stability to loop closure, gain margin, phase margin).

Safety criticality: Certification generally requires demonstration of a system meeting design criteria to a certain probability of failure [e.g., “five nines” ($1 \times 10^{-5}$), “seven nines” ($1 \times 10^{-7}$), or “nine nines” ($1 \times 10^{-9}$)].

Human-in/on-the-loop: Effective control design must account for the interaction of the physical system with human pilots and passengers. For example, in pilot-induced oscillation, the reaction time of a human pilot relative to system dynamics can result in the pilot’s actions causing system oscillation to the point of instability. Modeling human interaction with a physical system is exceptionally complex.

Variable autonomy: When humans and autonomy interact, the relative amount of human authority versus autonomous authority must be selected. Typically systems are neither fully human operated nor fully autonomous. The greatest challenges appear when the level of autonomy changes during active operation.

IV. Aerospace Design
Aircraft design [145] starts with a set of requirements and ends with a production-ready vehicle design. Originally, the entire process took only a handful of engineers, from initial design, through multiple refinement and testing iterations, to final specifications. Modern aircraft design is much more complex and would be impossible without many advancements that have occurred in engineering, computers, applied mathematics, numerical methods, optimization, high-performance computing, geometric modeling, and more.

Several innovations enabled production of two seminal aircraft in the early 1930s. The Boeing 247 and the DC-2 made commercial air travel viable. Their all-metal design, a major factor, depended on advances in structural design, new manufacturing ideas, and development of new alloys. The design process involved significantly more wind-tunnel and flight testing, as well as new results in potential theory that enabled broader, more accurate aerodynamics calculations, all performed by slide rule. In fact, the aerospace industry has been a major driver in all of these fields due to its level of design complexity [146,147].

Advances in aerospace engineering, including jet propulsion, swept wings, radio navigation, supersonic travel, computerized control systems, and composite construction, enabled by subsequent innovations in commercial aircraft design, have each required more accurate calculations, greater computational power, and increasing levels of interdisciplinary interactions. Further advancements will require more innovations. As data management tools are developed and adopted to handle the increasing amounts of data required for each iteration, the cumulative knowledge obtained remains diffuse and disparate. And, although results in design exploration, surrogate modeling, and mixed-integer programming have enabled and advanced multidisciplinary design, handling significantly more design parameters, as seems imperative, will require optimization based on necessarily sparse exploration of increasingly high-dimensional design space. The aerospace industry is poised to both benefit from and drive advancements in data analytics, dimensionality reduction, data compression, and many more technologies broadly under the aegis of ML.

Aircraft design has also expanded to cover the entire life cycle, including data management, manufacturability, technical oversight, evaluation criteria, operations and operability, maintenance, and even disposal. It involves many competing objectives and constraints, including safety, environmental impact, and ergonomics. There is great potential from innovations in systems engineering, such as process control, robotics, flight scheduling, and flight path management. Within the design process, planning and scheduling could benefit greatly from existing algorithms in decision making and control, with and without uncertainty, which have proved challenging to incorporate.

A. Multidisciplinary Design Optimization
Aircraft design is, in large part, a constrained, multi-objective optimization problem [148–156]. Constraints include airplane-level requirements, such as range or fuel capacity, as well as production and business constraints, such as manufacturing costs or product cash flow. Design parameters may be discrete (e.g., engine count) or continuous (e.g., wing sweep); the objective is some measure of expected profit or quantities presumed correlated with profit. Although the level of success depends on the efficiency and accuracy of the models, analysis tools, and objective functions, it also depends on the dimensionality of the design space and the ability to explore it thoroughly enough to identify a superior design with confidence. Success also requires a robust design approach due to unavoidable uncertainties in predictions, algorithms, dynamics, etc.

Multidisciplinary design optimization (MDO) involves the use of numerical optimization in design when the constraints and
objective(s) depend on two or more analysis disciplines, for example, aerodynamics, structures, propulsion, controls, cost (including research/development, design, manufacturing, operation, end-of-life), performance (e.g., range, fuel burn), and environmental impact (e.g., noise, emissions). MDO can involve a diversity of parametric models that may include physical geometry, schematic layouts, or marketing mix. Regardless of model type, effective MDO requires complete automation of every step from parameter values to the objective function value for those parameters (or to the identification of the values as “infeasible”). This entails the models themselves, any processing required for their analysis, and postanalysis synergism.

ML is increasingly able to aid in this type of highly-constrained multi-objective optimization problem. It is possible to leverage the growing wealth of data to learn models for how the various parameters interact, providing fast and accurate surrogates for inverse design. For more explicit real-time optimizations, that is, control, ML models are currently being used to great effect in RL and model predictive control, which can natively balance multiple objectives and constraints.

The increased use of composite materials, in particular the complexity of fabrication methods, brings manufacturability to the forefront of disciplines considered in MDO studies. Reliable models are needed that can describe the output structure based on a variety of inputs, because the final structure of a composite material depends not only on the initial geometric design but also on the material and fabrication method used. Numerical modeling of fabrication methods, for example, hand lay-up, automated fiber placement, and vacuum forming, requires deep knowledge of the physical processes that govern the final structure [157]. Physics-informed ML may contribute to elevating first-principles-based models to the level of accuracy and efficiency required for an MDO study.

B. Model-Based Engineering

Model-based engineering (MBE) is an approach to product development and lifecycle management that focuses on using digital models and simulation to design, produce, maintain, and support products. Before the extensive use of digital artifacts across engineering domains, information and product specifications were transferred between engineering groups and product consumers through physical documents. Digital models improve reliability and robustness of the entire product lifecycle by ensuring that designers, producers, and consumers have access to evolving representations of physical assets. Techniques in data science are critical both in producing digital models that accurately capture asset behavior in the physical world and in standardizing digital model formats to ensure accessibility across the product lifecycle.

1. Design Diamond

The systems engineering design “V-model” depicts the major steps in the design and production of physical systems. The evolution of data generation in engineering processes as well as the technological development of robust digital models has led to a new view on the classical design V-model. In Fig. 7, the standard systems engineering design V-model is mirrored to create a design diamond that incorporates digital counterparts of a product at all stages of its lifecycle.

2. Digital Thread

The digital thread is a digital communications framework connecting authoritative sources of information from producers to consumers in standard formats throughout the lifecycle of a process, product, or system. It is the connection layer that ties together the digital models and physical processes guaranteeing that necessary groups are working with a consistent set of evolving information to support the product. One of the major challenges in implementing a digital thread is defining standard data formats that may interface with a wide variety of tools and processes. However, the presence of a digital thread in MBE is essential for high-fidelity digital twins.

3. Digital Twin

A digital twin is a virtual representation of the properties and behaviors of a specific instance of a physical system or process that enables prediction and optimization of performance and maintains synchronization with that physical system or process through its operational life.

The connection of a digital twin to its physical system is typically enabled through sensors gathering data in real-time during operations. A complicated system, such as a commercial aircraft, may consist of hundreds of connected subsystems each producing a wide variety of signals detailing performance characteristics. This leads to thousands of components producing diverse signals across varied communication channels, all in the course of real-time operations [158]. Identification of an appropriate subset of these signals when modeling a particular system process is a critical challenge in constructing a reliable and useful digital twin.

V. Aerospace Manufacturing

Manufacturing is a highly complex and dynamic process, involving the coordination and merging of several elaborate and precisely timed stages. In a modern manufacturing environment, tremendous volumes of data are being generated, stored, and analyzed to improve process quality, reliability, and efficiency. The data generated is inherently multimodal, including hand-written reports from humans
Fig. 7 Model-based engineering design diamond.
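
The constrained, multi-objective framing of MDO in Sec. IV.A can be illustrated with a minimal sketch. It assumes a toy design space with one discrete parameter (engine count) and one continuous parameter sampled on a grid (wing sweep), a hypothetical range requirement that marks some designs as infeasible, and two objectives to minimize. Every formula, threshold, and function name below is invented for illustration and does not come from any real aircraft model.

```python
from itertools import product


def evaluate(engine_count, sweep_deg):
    """Toy analysis chain: return (cost, fuel_burn), or None if infeasible.

    All formulas are hypothetical placeholders for real discipline models.
    """
    range_km = 9000 + 60 * sweep_deg - 400 * (engine_count - 2)
    if range_km < 9500:  # hypothetical airplane-level range requirement
        return None      # design is identified as "infeasible"
    cost = 100 + 25 * engine_count + 0.8 * sweep_deg
    fuel_burn = 50 + 6 * engine_count - 0.5 * sweep_deg + 0.01 * sweep_deg ** 2
    return (cost, fuel_burn)


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Return the feasible designs that no other feasible design dominates."""
    feasible = {}
    for d in designs:
        objectives = evaluate(*d)
        if objectives is not None:
            feasible[d] = objectives
    return {
        d: f
        for d, f in feasible.items()
        if not any(dominates(g, f) for g in feasible.values())
    }


# Discrete engine counts crossed with a coarse grid of sweep angles (degrees).
designs = list(product([2, 4], range(10, 41, 5)))
front = pareto_front(designs)
for design, (cost, fuel) in sorted(front.items()):
    print(design, cost, fuel)
```

Exhaustive enumeration is only workable for a tiny design space like this one. The text's point about learned surrogates is that an inexpensive approximation of `evaluate` would let a far larger space be explored.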

working alongside automation, high-fidelity data from metrology equipment, video feeds, supply chain logs, part catalogs, and results from in-process inspection, to name a few. There are several key opportunities to leverage ML and other data-intensive optimization techniques to improve manufacturing processes. Several areas of high priority include part standardization; automation and robotics; streamlined assembly, including reduced measurements, processing, and inspection, toward just-in-time manufacturing; supply chain management; material design and fabrication; and nondestructive inspection. In this section, we will explore several of these areas.

A. Advanced Product Quality Planning

Advanced product quality planning (APQP) is a methodology for producing physical systems that are guaranteed to meet target requirements. It was developed in the late 20th century and has been used extensively throughout the automotive industry. Techniques in data science will greatly impact almost every technique and framework contained in APQP.

Data science and ML technologies have significant potential to improve performance in statistical process control. In aerospace manufacturing, this practice guarantees that parts are produced within a diverse set of specifications. Historically, process control has been implemented for manufacturing processes by extensive testing in labs before production and then tuning these processes throughout production efforts. Without consistent data storage and formats, this may lead to the inability to transfer past process control efforts onto new platforms and processes. Data science and ML may assist in transferring past efforts by defining standard data formats and producing robust digital simulation models.

The increased use of automation in aerospace manufacturing has also enabled new opportunities for real-time process monitoring. Manufacturing systems equipped with sensors gather real-time process data, which can then be used to train ML-based control models. Trained on past production data, these models may determine when a process will move out of specifications before possible human measurement and detection. Furthermore, the features extracted using ML methodologies can help determine ideal measurement and detection locations and can lead to significant reductions in labor and costs.

B. Standardization

The standardization of aircraft designs, manufacturing processes, parts, and machines is an open industrial challenge. New designs are often airplane-specific iterations that start the design process from scratch and ultimately offer no significant differences over prior versions. However, current manufacturing paradigms do not exploit or streamline designs to reuse and retool existing versions, resulting in inefficiencies and stresses on supply chains, maintenance, and the critical path of assembly. In aircraft design, some simple examples are brackets and fasteners. Brackets support wiring, hydraulics, systems, and fasteners for joining many types of structures and components. If data from prior designs can be extracted or harvested, then detailed comparisons can be performed and similarities can be identified, resulting in a reduction of unique parts that are airplane specific.

Data mining and ML are poised to revolutionize industry paradigms for standardization. Currently, modern digital tools for designing, inventory, and quality control produce a glut of data that often sits unused. This data can be mined for dominant features to inform future decisions and streamline the design process. To discover relevant, redundant patterns, feature selection criteria can be tailored to extract the desired topological, spatiotemporal, physical, and material properties. Recent strides have been made in interpretable ML models that directly incorporate sparsity-promoting, physical, temporal, and topological constraints into optimization objectives. The key features can be used to identify standard designs and reduce the number of unique designs or parts so that they can be shared across future designs and applications.

C. Automation

The aircraft industry has driven the development of automated machines to fabricate detailed parts, including machining metals, composites fabrication, assembly of structure, painting, and inspection. These systems improve quality, reduce production cycle times and cost, and reduce repetitive injury to mechanics; however, they are expensive to develop, implement, and maintain. Due to the unique requirements of aircraft designs, materials, and tight tolerances, the decision to automate is complex and driven by specific manufacturing processes and applications. The development of successful automated systems starts with a focus on automating the process and understanding requirements. Thus, controlling the process is key to success. The tight tolerances of automation require monitoring and process control to ensure consistency, which in turn employs many sensors. Sensor feedback can provide insight into where processes are stable and where variation occurs. It indicates the level of control needed in specific areas, process repeatability, what and where to measure, and a host of other outcomes.

When control is present, algorithms for ML control can be used to disambiguate the relevant process dynamics from control inputs. The tools of ML can also be used off-the-shelf on sensor data, not only to identify repetitive patterns and faults, but also to pinpoint potential failures. Such failures can be predicted by studying the extent to which data deviate from dominant features, or by detecting outliers using robust feature extraction techniques on raw sensor data. The goal of robust feature extraction is to efficiently learn features and outliers simultaneously. One example is robust PCA, which has previously demonstrated success in predictive aircraft shimming applications [159] and is discussed in detail in Sec. VIII.

D. Assembly

Aircraft are manufactured in many pieces, including fuselage barrel sections, wings, stabilizers, and fasteners, that are joined as sections or subassemblies and ultimately get integrated into the final product. Wings are built from subassemblies of fabricated wing skins, spars, flaps, and ailerons; likewise, fuselages consist of multiple barrel sections that are assembled separately and then joined. Wings are joined to fuselages in a manufacturing position called “wing-to-body join.” The alignment, positioning, and joining of these assemblies require high-precision tooling and part positioning systems. In some cases, automated positioning tooling controlled by metrology systems is used to align and fit structure. In addition, even though large sections are made to very tight tolerances, the combined tolerances result in very small gaps that must be shimmed to meet engineering requirements. Parts are measured with metrology equipment, three-dimensional models are generated to determine gaps between the mating parts, and automated machines are used to fabricate the shims that fill the gaps to ensure proper fit.

Throughout the process, enormous quantities of data are generated. Many sensors are used to provide feedback to control the automation, measure parts, and validate quality. The use of these data is expanding from building and assembling specific parts to being applied to predict future builds via ML and data analytics. Structural features mined from historical build data can be used to predict gaps in future builds, using only a targeted subset of measurements to infer the high-fidelity structure. This sparse sensing methodology was successfully applied for the predictive shimming of new wing-to-body join builds (Sec. VIII). Not only does sparse sensing bypass the heavy planning and processing required of high-fidelity metrology, but it also identifies a set of key structural features that may be analyzed for diagnostics and defects.

E. Materials

Adoption of new materials is a major challenge in the aerospace industry, due to the amount of structural testing done to certify a material. The standard approach is the building block shown in Fig. 8. This approach starts with a large number of tests at small coupon levels, a much smaller number of tests at subcomponent scale, a handful of tests of subassemblies, and one or two full-scale test articles. Moreover, additional validation is needed when the manufacturing process or process parameters change, because they can introduce different features in the part. There are two areas where
Fig. 8 Schematic materials building block diagram.
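
Section V.C describes robust feature extraction as learning dominant features and outliers at the same time, with robust PCA as one example. A minimal sketch of that idea follows. It assumes the principal component pursuit formulation, which splits a data matrix X into a low-rank part L plus a sparse part S, solved with an augmented-Lagrangian iteration. The parameter choices for `lam` and `mu` are common defaults from the robust PCA literature, not values stated in this paper, and the data are synthetic stand-ins for sensor measurements.

```python
import numpy as np


def shrink(X, tau):
    """Soft-threshold every entry of X by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(X, max_iter=1000, tol=1e-7):
    """Split X into low-rank L and sparse S with X approximately L + S."""
    n1, n2 = X.shape
    mu = n1 * n2 / (4.0 * np.abs(X).sum())   # assumed default penalty weight
    lam = 1.0 / np.sqrt(max(n1, n2))         # assumed default sparsity weight
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                     # Lagrange multipliers
    for _ in range(max_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = shrink(X - L + Y / mu, lam / mu)
        residual = X - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(X):
            break
    return L, S


# Synthetic data: rank-2 "dominant features" plus about 5% large outliers.
rng = np.random.default_rng(0)
L_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
mask = rng.random((60, 40)) < 0.05
S_true = np.zeros((60, 40))
S_true[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))
X = L_true + S_true

L_hat, S_hat = robust_pca(X)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print("relative error of recovered low-rank part:", rel_err)
```

In this sketch the recovered L plays the role of the dominant features, and the nonzero entries of S flag the measurements that deviate from them.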
ML can have significant impact: physical testing and materials characterization.

The use of data analytics and ML, in addition to physics-based models, can result in a significant reduction in testing. First, we might be able to correlate behavior at larger scales to a small number of fundamental material properties by analyzing data on existing materials. That could already help filter out the most promising material candidates in the material screening and selection phase, without wasting time and money on extra characterization testing. In subsequent phases the sheer number of tests for new materials could be reduced by learning information from existing material systems. Secondly, in the absence of physical tests on new material parameters, physics-guided ML may be applied to process and structural simulations to correlate parameters to performance, thus enabling highly efficient exploration of material parameters. ML is also being used to design new materials [15], including superalloys [160–163].

More recently there has been an increased focus on material characterization for fiber-reinforced composites process modeling to reduce the amount of physical trial and error. Processing often happens when the material is uncured and at elevated temperature, requiring a different set of tests than is needed for the structural characterization. For example, numerical models used to predict the quality of thermoplastic composite laminates [164,165] require information on shear and bending behavior of the laminate, friction between layers (which changes when their relative orientation changes), and friction between the composite and any contacting tooling material (which in turn could have some surface coating). Most of these properties change with temperature, and some also depend on the forming speed and pressure. Here, sparse regression techniques in ML can help discover fundamental laws and relationships, resulting in significant reductions in the number of tests to be conducted, as well as the time and costs needed to introduce new materials.

F. Composite Fabrication

Composite part fabrication can consist of many steps, some of which are more data rich than others. Examples of the most time-consuming, defect-prone, or unpredictable processes are material laydown, forming, uncured part handling, in-process inspection, and autoclave curing. We will discuss some of these below.

Material laydown can be done manually, or by automated fiber placement (AFP) or automated tape laying (ATL). In AFP and ATL a robotic head places composite strips of material onto a tooling surface, building up each layer by making multiple passes to cover the surface. AFP is typically used for contoured parts, whereas ATL is most suitable for near-flat layup. Accuracy of the material placement is important for the quality and final strength of the part and requires in-process inspection. Traditionally the in-process inspection has been a labor- and time-intensive part of the process. Vision and thermal inspection systems [166,167] are starting to replace manual inspection to measure material placement and detect other defects. The data produced by these systems are prime candidates for data analytics and ML [168], for diagnostics and defect detection.

A significant number of composite parts are not laid up in their final contour, but instead laid up flat and then mechanically formed. Examples are stringers and spars, which are formed from a flat laminate to an L-profile, hat-profile (see Fig. 9), or C-profile, in a machine and subsequently placed in a tool so they can be combined with a second profile or a skin to form an assembly once they are cured. These profiles often have to follow the contour of a fuselage or wing skin, which besides the global aerodynamic shape can have relatively aggressive geometric transitions when the skin thickness changes. The profile forming is often done in a machine, if the global contour is benign, while aggressively curved stringers are manually laid up ply-by-ply. Furthermore, placing the stringers in the tool is often done manually. The biggest risk with forming composite laminates is the generation of wrinkles when material is compressed locally to conform to its new shape. Small changes in pressure, material tack, or composition of the layup cause variability in the presence, location, and severity of the wrinkles. In addition, manual processes by definition have high variability, making it difficult to predict wrinkles for formed composites. Therefore every part is inspected and structural integrity is verified for every occurrence,
Fig. 9 Example of hat-shaped stringers on a 787 Fuselage [169].
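
The Materials discussion (Sec. V.E) suggests that sparse regression can help discover relationships among material properties from fewer tests. A minimal sketch of one such technique, sequentially thresholded least squares, is given below. Everything in it is an assumption for illustration: the response is a synthetic "friction coefficient," the inputs are a normalized temperature `T` and forming speed `v`, the candidate library is a set of low-order polynomial terms, and the threshold is hand-picked. It is not the paper's method or data.

```python
import numpy as np


def stlsq(Theta, y, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares.

    Repeatedly fit by least squares, zero out coefficients smaller than
    `threshold`, and refit on the remaining library columns.
    """
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        xi[keep] = np.linalg.lstsq(Theta[:, keep], y, rcond=None)[0]
    return xi


# Synthetic "test campaign": 200 samples of two normalized process inputs.
rng = np.random.default_rng(1)
T = rng.uniform(-1.0, 1.0, 200)   # normalized temperature (hypothetical)
v = rng.uniform(-1.0, 1.0, 200)   # normalized forming speed (hypothetical)

# Candidate library of terms the response might depend on.
names = ["1", "T", "v", "T^2", "T*v", "v^2"]
Theta = np.column_stack([np.ones_like(T), T, v, T**2, T * v, v**2])

# Hidden ground truth uses only three of the six terms, plus small noise.
true_xi = np.array([0.3, -0.2, 0.0, 0.0, 0.1, 0.0])
y = Theta @ true_xi + 1e-3 * rng.standard_normal(200)

xi = stlsq(Theta, y)
model = {n: round(float(c), 3) for n, c in zip(names, xi) if c != 0.0}
print(model)
```

The sparsity threshold controls how many terms survive. In practice it would be chosen by cross-validation or an information criterion rather than fixed by hand as it is here.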
and if necessary, repairs are made to recover any loss in performance. Data analytics and ML have the potential to improve composite forming processes in three critical areas: 1) support the development of physics-based models by identifying features that are dominant in the formation of wrinkles, 2) automate the detection and characterization of wrinkles in inspection data, and 3) improve part quality by using vision-based systems to track humans and determine patterns in their actions that result in better quality parts, which could lead to better work instructions or the development of an automated system that mimics the best manual process.

VI. Aerospace Verification and Validation

The testing phase of aircraft development, which includes extensive validation, verification, and certification processes, is poised to leverage data-driven methodologies and the associated wealth of predictive analytics. Testing a new product involves verifying function and performance guarantees and validating that the system meets certification and government regulations. A critical goal in modern data-driven flight testing is to improve aircraft safety and robust operation, while reducing time and expense of testing programs. This goal will be enabled by a greater understanding of the fully integrated product behavior, its robustness, and nuances, resulting in more effective models and digital twins. This knowledge will inform and improve upstream design and manufacturing processes and downstream services, ultimately enabling faster, more flexible aircraft design and customization by streamlining verification and validation cycles. Testing occurs at the component level, subsystem level, and the integrated system level, as depicted in the design diamond in Fig. 7. ML algorithms can be used to streamline the data collection and processing in each of these stages and to inform and refine earlier design and manufacturing stages. Other opportunities that are ripe for data-science-enabled advances include the extraction and visualization of patterns that may be inaccessible to human analysis; sensor optimization, robust processing, and anomaly detection; identifying and characterizing discrepancies between models and physical devices; and using active learning to streamline the number of experiments and data required to validate models and manage uncertainty. In what follows, we frame the evolution of testing mainly through the lens of commercial aviation products, whose key technology trends are empowered by data digitization, digital twins, and physics-based modeling to account for dynamic behaviors.

We first consider the type of data encountered in flight and lab tests. The amount of data generated by a flight test aircraft is diverse and vast, with upward of 200,000 multimodal sensor measurements during a single test. The measurements are a mix of airplane-generated digital production measurements and analog flight test instrumented sensors. The multimodal sensors include strain gauges, pressure transducers, thermocouples, accelerometers, and video, among many others. Measurements are collected and stored asynchronously in time, with sampling rates varying from less than 1 Hz to upward of 65 kHz. Some measurement data is only collected when triggered or when a change is detected. All measurement data is stored for each flight test, generating gigabytes of data. Much of this data must be synthesized across sensor modalities and in time, both within a single flight test and across multiple tests. Similarly, high-rate asynchronous lab test data is collected in unique demanding environmental conditions, such as extreme vibration, icing conditions, sandblasting, extreme temperature, and high structural demands. These environments impose unique requirements on the sensors and can result in the sensors falling out of calibration.

The complexity of a typical commercial aircraft must also be considered. There are approximately 2.3 million parts on the Boeing 787 with 70 miles of wire and 18 million lines of source code in the avionics and flight control systems alone. Additionally, there are 10 major systems interacting together in a complex, dynamic, and rapidly changing environment. As for constraints, critical system components must have a failure rate of less than 10^-9 or 10^-12, depending on the system, and backup components that seamlessly step in when required. Flight testing is the final validation step for every design and performance requirement. Certification is achieved when each component, subsystem, and integrated system process can be shown to have a proven level of performance, repeatability, and accuracy. Because aircraft are designed to have near-zero tolerance for failures, these failures are intentionally triggered during testing to evaluate the resulting effects on function and performance. The rarity of failures in the high volume of flight test data presents an additional challenge to ML algorithms, which require a sufficient number of examples to build models for failure.

A. Digitization

There is a significant opportunity to improve and automate verification and validation through data digitization, which will enable a wealth of downstream data analytics for pattern extraction and anomaly detection; see Fig. 10. To understand the benefits of a digitized workflow, it is helpful to understand the current testing landscape. Existing operations are rampant with mundane manual touch points, requiring significant process, time, and resources to validate that there are no quality lapses. Current workflows rely on experienced engineers to synthesize upward of 20 data sources to search for information to substantiate a report or troubleshoot a problem. Systematic efforts to leverage data across flights and programs and to perform exploratory analyses are virtually nonexistent.

A digitized flight test workflow that leverages ML would enable the automation of tasks that are currently laborious, reactive, last minute, discrete, and manual. The result would be a flow of information that is effortlessly available to all test stakeholders and participants from the preliminary concept to the final test report. Collection and aggregation of process metadata will enable visualization of core components, such as the test plan, test approvals, aircraft configuration, instrumentation configuration, and conformity status, with each component being updated as information becomes available. This centralized, real-time repository will provide engineers with the time and resources to process critical information, collaborate with peers, and engage with the test data as it is generated.

There are several significant challenges that face efforts to digitize verification and validation workflows. Digitizing the workflow cannot compromise the integrity of the information or its ease of access. Any changes must improve upon the users’ ability to confidently assess the quality, reliability, and traceability of the data. However, the benefits of making operations more efficient by leveraging ML and uncertainty quantification make digitization worth the effort. It is expected that digitization will result in sweeping benefits across the production cycle and in-service life cycles.

B. Model-Based Validation

Historically, testing requirements were developed in component- or discipline-specific silos. These requirements were designed into test points as flight task cards to test specific configurations or performance metrics. Critical or corner points of the performance envelope were identified as test conditions. Additional test points were determined by selecting evenly spaced points throughout the operating range of the component or aircraft. In conjunction with the test domain, the test procedures were aimed at isolating a small number of variables related to performance. The goal of test planning was to complete test conditions and test plans as efficiently as possible over the range of test parameters. It was typically the job of a test planner to consider readiness and dependencies of the test plans in order to identify the most efficient path through flight test.

Due to data capture and storage capacity limitations, data for flight tests were captured only for discrete single test conditions. Bandwidth and storage limitations prevented data from being used for real-time computing and recording. The phrase “data on” was used as the marker to begin a test condition where data would begin recording. As such, data for test conditions was gathered and stored in discrete segmented test condition increments. Only a small percentage of all data was ever stored and analyzed on any given flight. Further, if you weren’t “data on” during a critical event in flight, either planned or unplanned, that data was lost without being recorded or analyzed. Engineers used paper strip charts that look similar to an EKG printout for real-time plotting. If the strip chart was not set to
Fig. 10 Schematic overview of digital verification and validation processes, integrated with digital twin technology and optimized via machine learning.
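
Section VI notes that flight test measurements are stored asynchronously, at rates from below 1 Hz to tens of kHz, with some channels recorded only when triggered, and that these data must be synthesized in time. One simple way to place such channels on a common time base is a sample-and-hold (last known value) lookup, sketched below. The channel names, timestamps, and values are invented for illustration, and real pipelines may prefer interpolation or filtering depending on the sensor.

```python
from bisect import bisect_right


def sample_and_hold(times, values, t):
    """Return the most recent value recorded at or before time t.

    `times` must be sorted. Returns None if nothing was recorded yet.
    """
    i = bisect_right(times, t) - 1
    return values[i] if i >= 0 else None


def align(channels, t_start, t_end, dt):
    """Resample every channel onto a uniform grid from t_start to t_end."""
    n = int(round((t_end - t_start) / dt)) + 1
    table = []
    for k in range(n):
        t = t_start + k * dt
        row = {"t": t}
        for name, (times, values) in channels.items():
            row[name] = sample_and_hold(times, values, t)
        table.append(row)
    return table


# Hypothetical channels: (timestamps in seconds, recorded values).
channels = {
    "strain": ([0.0, 0.25, 0.5, 0.75, 1.0], [10, 11, 12, 13, 14]),  # 4 Hz
    "temp": ([0.1, 1.1], [300.0, 301.0]),                           # about 1 Hz
    "gear_event": ([0.6], ["DOWN"]),                                # triggered
}

table = align(channels, t_start=0.0, t_end=1.0, dt=0.5)
for row in table:
    print(row)
```

Rows containing `None` make explicit where a channel had not yet reported. This matters when the aligned table is later passed to a learning algorithm that cannot handle missing values.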
plot the critical parameter during the test, there was no way to retrieve need to inform a discrepancy model for the system. The goal of flight
that data in flight. Cross plots were often created by hand with paper test will evolve from completing as many test conditions as possible
and pencil. Computers had limited computing capability and often to increasing certainty in the multitude of physical, functional, and
only crucial calculations were computed in real-time. More intensive computing was performed by a mainframe computer after the test.
Technological innovations in computing and sensors have led to significant improvements in flight test data collection, storage, and visualization. Present-day testing requirements are still largely developed in discipline-specific silos. Domain expertise has helped identify areas where tests can be modified to be more compatible or concurrent with other tests. Additionally, computational tools have been developed to aid human management and optimization of the flow of the test program. However, data science methods have yet to be used at significant scale to improve test optimization. Much of the improvement in present-day testing has been the result of improved computing hardware. Storage has improved so all data parameters can be recorded and stored for the duration of an entire flight. Data analysis has moved from mainframe computers to laptops. There have been some improvements in algorithms and calculation methods, especially in the areas of data fitting and filtering. Data visualization has seen the greatest improvements: paper has been replaced with electronic strip charts and plots, and the use of colors and data triggers has allowed more data to be monitored and analyzed, with less cognitive overload. Although computational and visualization improvements have been significant, performance and evaluation metrics and algorithms have remained largely unchanged. Most data is still analyzed on a condition-by-condition basis, leaving large amounts of flight data unanalyzed.
Future testing will be heavily informed by the models that make up the digital twin. Safety-critical test conditions, where margins are thin, will be identified from models and simulations. Additional test conditions will be determined by a principled and data-informed selection, where physical tests will increase confidence in the model. Active learning [170,171] may be used to streamline this testing, reducing the number of physical queries required to validate a model. Effective test planning will involve a multidimensional and multi-objective optimization to efficiently achieve convergence of the coupled, multiscale, and multiphysics models. Test plans will need to be flexible enough to support model exploration and refinement when the model diverges from the physical world; that is, testing will
logical models that make up the digital twin. Such future testing will require a more seamless synchronization of the digital twin with the physical entity it represents, providing behavioral or functional predictions, which are then informed by measurement data. Robust and computationally tractable reduced-order modeling approaches, along with data assimilation and uncertainty quantification, will be required to validate and explore models on the physical asset in real-time. Data from these models will then be used to update the digital twins.
Flight test will likely continue to require discrete test conditions. However, an increased emphasis should be placed on analyzing data beyond what is required for the test condition. This holistic approach differs significantly from the past approach of only capturing data during the test condition to now making 100% of the data available for analysis. Modern techniques in ML will facilitate this analysis of high-dimensional, multimodal data, potentially extracting patterns and correlations that would have been inaccessible with previous approaches. For this model and data revolution to be successful, additional data from disparate sources will be required beyond what is needed to validate the digital twin. For example, metadata will link these data, including information about what parts were installed (configuration), what airport(s) were involved, and what were the qualitative characteristics of the aircraft. Combining this metadata with the digital twin will be crucial to flight test of the future.
C. Leveraging Emerging Technologies and Algorithms
There are several other opportunities for enhanced flight test capabilities that are being enabled by rapidly advancing technologies and algorithms. As aerospace products become more advanced, it becomes increasingly difficult for human experts to manage the interactions between systems when troubleshooting a problem or designing a test with multiple failure conditions. However, ML and data visualization algorithms are enabling the analysis of vast, high-dimensional data sets, resulting in the identification of patterns and correlations that are intractable to human analysis. Identifying potential interactions, and ultimately, failure modes, is a combinatorially complex task; however, AI systems of the future may seamlessly
integrate decades of past test data to make informed decisions about likely complications. This advanced diagnosis and problem solving capability will only be possible with systemwide digital twin and digital thread efforts. In addition, considerable test time may be saved by automatically identifying sensor failure and detecting anomalous data [110,114,172], such as calibration drifts. Improved sensor fusion [173] and filtering algorithms, based on physics-informed ML models of the system and its components, will enable real-time diagnoses with confidence intervals. Sparse sensor optimization [76], or identifying which sensors to listen to for a given task, will also reduce the computational burden, enabling real-time decisions and updates to the flight plan, instead of postmortem analysis.
VII. Aerospace Services
In-service operations are the support provided to customers or by the customer operations team to support an operational fleet. From the initial delivery to long-term maintenance, there exist significant opportunities for leveraging data to improve and optimize maintenance and airside support. A major source of customer expense arises from aircraft damage, injury, delays to airside support, and unscheduled maintenance. The positive business impacts of autonomous airside operations and not incurring unscheduled maintenance can be met with emerging sensor and algorithmic innovations.
A. Autonomous Airside Support
Figure 11 shows the many data-driven aspects of ground service. Immediately upon arrival at an airport, preparations begin for receiving the airplane and preparing it for its departure. Critical tasks include 1) accurately positioning the aircraft in and out of the gate; 2) removing passengers, cargo, and waste; 3) reloading the aircraft with passengers, luggage, potable water, etc.; 4) servicing any aircraft fluids (oil, washer fluid, fuel) or maintenance items; 5) investigating any reported inbound issues; and 6) performing a visual pre-inspection check in the interior and exterior of the aircraft. Many of these tasks can be automated with modern technology, through robotic interactions, autonomous vehicles, or visual inspection using current computer vision algorithms. Importantly, modern IoT technology can integrate the diverse data streams in order to make an informed and accurate decision about the airside support tasks that require human intervention. Such data integration can provide significant savings, allowing human resources to be used in targeted and efficient ways. Importantly, even in this fully autonomous system, humans and autonomy will need to interface in a safe, efficient, and effective manner. Consideration of robotic failure conditions and human activity awareness will be required.
Fig. 11 Illustration of many potentially data-intensive aspects of ground service.
B. Eliminating Unscheduled Maintenance
Unscheduled maintenance causes costly delays, wreaks havoc on a carefully optimized schedule, and inconveniences passengers, damaging the reputation of the airline carrier. Customers must balance the cost and time of preventative maintenance with the cost of unscheduled maintenance. Eliminating unscheduled maintenance would transform the industry, and this may be enabled by an expert aircraft system that identifies and communicates part anomalies and self-diagnoses wear, low fluids, software glitches, etc. This technical advance would require customer support centers to be able to download and process massive near-real-time data, run trend analysis, and make automated decisions in real-time. Current prescriptive models indicate that a particular part will fail within the next three flights with 80% accuracy; although this information seems valuable, it is only marginally useful to improve operations. Increased value would be gained by understanding why the part is failing, or explaining the exact maintenance actions that need to be taken to rectify the situation. Knowing the right maintenance actions, for example, greasing the right bearing, updating software, or replacing the whole part, would have a significant impact on the efficiency and efficacy of maintaining the fleet. Being able to detect that a failure is about to happen, or has happened, is good. Being able to identify why the part is failing, and enabling a recommender system to assess the best course of action, is even better. Often, how the operator actually uses the system will violate the assumptions that were made when determining the maintenance schedules, which leads to early part failures. Leveraging part data would enable root cause identification and inform the design centers to enable future designs to be better aligned with in-service operations.
C. Closing the Loop with Design, Manufacturing, and Testing
When the digital thread connects data from aircraft design, manufacturing, and testing, it will be possible to leverage service data to improve each of these stages, and vice versa. The lifetime of an aircraft, from concept designs to retirement, spans decades, and the aging processes of new materials and manufacturing processes must be incorporated into the digital twin. These data-informed models will enable adjustments and refinements to the design and
manufacturing procedures to improve the performance of future aircraft. Similarly, a more holistic digital twin, including models for aging and degradation, will be useful for maintaining and servicing fleets more effectively. These models will facilitate more accurate sensor filtering and data assimilation, as well as downstream tasks of anomaly diagnosis and detection.
VIII. Case Study: Predictive Assembly and Shimming
Aircraft are built to exceedingly high tolerances, with components sourced from around the globe. Even when parts are manufactured to specification, there may be significant gaps between structural components upon assembly due to manufacturing tolerances adding up across large structures. One of the most time-consuming and expensive efforts in part-to-part assembly is the shimming required to bring an aircraft into the engineering nominal shape. A modern aircraft may require on the order of thousands of custom shims to fill gaps between structural components in the airframe. These shims, whether liquid or solid, are necessary to eliminate gaps, maintain structural performance, and minimize pull-down forces required to bring the aircraft into engineering nominal configuration for peak aerodynamic efficiency.
Historically, parts have been dry-fit, gaps measured manually, and custom shims manufactured and inserted, often involving disassembly and reassembly. Recent advancements in three-dimensional scanning have enabled their use for surface measurement before assembly, known as predictive shimming [174–186]. Gap filling is a time-consuming process, involving either expensive by-hand inspection or computations on vast measurement data from increasingly sophisticated metrology equipment. In either case, this amounts to significant delays in production, with much of the time being spent in the critical path of the aircraft assembly.
In this case study, we present a recent strategy for predictive shimming [159], based on ML and sparse sensing to learn gap distributions from historical data and then design optimized sparse sensing strategies to streamline the collection and processing of data; see Fig. 12. This new approach is based on the assumption that patterns exist in shim distributions across aircraft, and that these patterns may be mined from historical production data and used to reduce the burden of data collection and processing in future aircraft. Specifically, robust PCA (RPCA) [110] from Sec. II is used to extract low-dimensional patterns in the gap measurements while rejecting outliers. RPCA is based on the computationally efficient SVD [9,187] and yields the most correlated spatial structures in the aircraft measurements, identifying areas of high variance across different aircraft. Next, optimized sparse sensors [76,132,188] are obtained that are most informative about the dimensions of a new aircraft in these low-dimensional principal components. The success of the proposed approach, known within Boeing as PIXel Identification Despite Uncertainty in Sensor Technology (PIXI-DUST), is demonstrated on historical production data from 54 representative Boeing commercial aircraft. This algorithm successfully predicts 99% of the shim gaps within the desired measurement tolerance using around 3% of the laser scan points that are typically required; all results are rigorously cross-validated.
Fig. 12 Cartoon illustrating recent progress in predictive shimming in the last decade.
This approach to predictive shimming combines robust dimensionality reduction and sparse sensor optimization algorithms to dramatically reduce the number of measurements required to shim a modern aircraft. In particular, RPCA from Sec. II is used to extract coherent patterns from historical aircraft production data. Thus, RPCA is used to develop low-dimensional representations for high-dimensional aircraft metrology data (e.g., laser scans or point cloud measurements). Shim scan data is collected across multiple aircraft, either leveraging historical data, or collecting data in a streaming fashion as each new aircraft is assembled. The shim measurements for a region of interest are flattened into column vectors $x_k \in \mathbb{R}^n$, where n corresponds to the number of measurements and k refers to the aircraft line number. These vectors are then stacked as columns of a matrix $X = [x_1 \; x_2 \; \cdots \; x_m]$, where the total number of aircraft m is assumed to be significantly smaller than the number of measurements per aircraft, that is, $m \ll n$.
Because of tight manufacturing tolerances and a high degree of reproducibility from part to part, it is assumed that the matrix X
possesses low-rank structure. As described above, there may be sparse outliers that will corrupt these coherent features, motivating the use of RPCA to extract the dominant features. Next, sparse sensors are identified that maximally inform the patterns in future aircraft using sparse optimization techniques from Sec. II.B. The goal is to identify a small number of key locations that, if measured, will enable the shim geometry to be predicted at all other locations; this is possible because of the low-dimensional structure extracted through RPCA. Only measuring at these few locations will dramatically reduce measurement and computational times, improving efficiency.
We demonstrate the PIXI-DUST architecture on production data from a challenging part-to-part assembly on a Boeing aircraft. This data set consists of 10,076 laser gap measurements of the part assembly for 54 different production instances of the same aircraft type. Measurement locations are aligned between the instances, making the data amenable to SVD. We build a low-order model of the shim distribution using RPCA and then design optimized measurement locations based on these data-driven features. We train the model on 53 aircraft and then validate on the remaining aircraft; this process is repeated for all 54 possible training/test partitions to cross-validate the results. Thus, a data matrix $X \in \mathbb{R}^{10076 \times 53}$ of training data is constructed, in which each column contains all of the shim gaps for one aircraft, and each row contains the measured gap values at one specific location for all aircraft in the training set.
Figure 13 displays the seven separately manufactured shim segments, as well as the sensor ensembles for each shim; the error distributions are given in Fig. 14. Prediction results are shown in Table 1. Prediction accuracy is vastly improved, and 96–99% of the shim gap locations are predicted to within the desired machining tolerance. Furthermore, we note that the rates of optimal measurements r vary between 2 and 6% of all points within the shim, which indicates that some shims are higher-dimensional and require more features (hence, sensors) to be fully characterized. This is also reflected in the sensor ensembles in Fig. 13.
This case study demonstrates the ability of data-driven sensor optimization to dramatically reduce the number of measurements required to accurately predict shim gaps in aircraft assembly. Sparse sensor optimization was combined with robust feature extraction and applied to historical Boeing production data from a representative aircraft. With around 3% of the original laser scan measurements, this learning algorithm is able to predict the vast majority of gap values to within the desired measurement tolerance, resulting in accurate shim prediction. These optimized measurements exhibit excellent cross-validated performance and may inform targeted, localized laser scans in the future. Reducing the burden of data acquisition and downstream computations has tremendous potential to improve the efficiency of predictive shimming applications, which are ubiquitous and often take place in the critical path of aircraft assembly. Thus, streamlining this process may result in billions of dollars of savings.
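The low-rank-plus-sparse split that RPCA performs on the data matrix X can be sketched as follows. This is a minimal illustration under stated assumptions, not the PIXI-DUST implementation: it uses the standard principal component pursuit formulation solved with an augmented Lagrangian iteration, and the function names (`shrink`, `svd_threshold`, `rpca`), the parameter heuristics, and the synthetic data are all hypothetical stand-ins for the production gap measurements.

```python
import numpy as np

def shrink(M, tau):
    """Soft-threshold the entries of M toward zero by tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(X, max_iter=1000, tol=1e-7):
    """Split X into a low-rank part L and a sparse part S, X ~ L + S."""
    n, m = X.shape
    mu = n * m / (4.0 * np.abs(X).sum())  # penalty parameter (common heuristic)
    lam = 1.0 / np.sqrt(max(n, m))        # weight on the sparse term
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                  # Lagrange multipliers
    for _ in range(max_iter):
        L = svd_threshold(X - S + Y / mu, 1.0 / mu)
        S = shrink(X - L + Y / mu, lam / mu)
        R = X - L - S                     # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(X):
            break
    return L, S

# Synthetic stand-in for gap data (assumed): rank-2 structure plus sparse outliers.
rng = np.random.default_rng(0)
n, m, r = 120, 40, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
S_true = np.zeros((n, m))
mask = rng.random((n, m)) < 0.03
S_true[mask] = 5.0 * rng.choice([-1.0, 1.0], size=mask.sum())
X = L_true + S_true

L_hat, S_hat = rpca(X)
err_rpca = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
err_raw = np.linalg.norm(X - L_true) / np.linalg.norm(L_true)
print(f"relative error: raw data {err_raw:.3f}, RPCA low-rank part {err_rpca:.2e}")
```

The SVD of the recovered low-rank part would then supply the dominant spatial features used for sensor selection.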
Fig. 13 Gap measurement locations segmented into seven shims (left) and ensembles of selected sparse sensors (right). Reproduced with permission
from [159].
Fig. 14 Absolute error distribution for optimal (blue) vs random (orange) sensors on validation tests. The red line represents the desired measurement
tolerance. Reproduced with permission from [159].
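The comparison of optimized and random sensors can be illustrated with a small sketch. It assumes one common way of choosing sparse sensors, column-pivoted QR on the leading SVD modes, and wraps it in the leave-one-out validation described in the text (train on 53 instances, test on the remaining one, repeated 54 times). The helper names (`select_sensors`, `reconstruct`), the number of modes, and the synthetic data are assumptions for illustration and are not taken from [159].

```python
import numpy as np
from scipy.linalg import qr

def select_sensors(U_r):
    """Pick r row indices of U_r (n x r) via column-pivoted QR of U_r^T."""
    _, _, pivots = qr(U_r.T, mode="economic", pivoting=True)
    return pivots[: U_r.shape[1]]

def reconstruct(U_r, sensors, y):
    """Estimate the full vector from measurements y taken at `sensors`."""
    coeffs = np.linalg.solve(U_r[sensors, :], y)
    return U_r @ coeffs

# Synthetic low-rank data (assumed): n points, m aircraft, r modes, small noise.
rng = np.random.default_rng(1)
n, m, r = 500, 54, 5
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
X += 0.01 * rng.standard_normal((n, m))

err_qr, err_rand = [], []
for k in range(m):                        # leave-one-out over aircraft
    train = np.delete(X, k, axis=1)       # 53 training columns
    test = X[:, k]                        # held-out aircraft
    U, _, _ = np.linalg.svd(train, full_matrices=False)
    U_r = U[:, :r]
    s_qr = select_sensors(U_r)
    s_rand = rng.choice(n, size=r, replace=False)
    for sensors, errs in ((s_qr, err_qr), (s_rand, err_rand)):
        x_hat = reconstruct(U_r, sensors, test[sensors])
        errs.append(np.linalg.norm(x_hat - test) / np.linalg.norm(test))

print(f"median relative error: QR {np.median(err_qr):.4f}, "
      f"random {np.median(err_rand):.4f}")
```

The random selection is included only as a baseline, in the spirit of the optimal-versus-random comparison in Fig. 14.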
Table 1 Segmented prediction results show vastly improved prediction accuracies, with 97–99% of gaps predicted to within the desired 0.005-inch measurement tolerance
Shim No.                    1      2      3      4      5      6      7
Percent accurate        97.90  98.05  99.82  99.94  99.99  99.03  99.97
Optimal sensors (avg)      26     26     25     26     25     26     25
Total points             1003   1116    453    692    709    768    664
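As a quick arithmetic check on Table 1, the fraction of points measured per shim and a point-weighted overall accuracy can be computed directly from the tabulated values. The variable names below are illustrative, and the point-weighted average is an assumed way of aggregating the seven shims rather than a figure reported in the paper.

```python
# Values copied from Table 1.
accuracy = [97.90, 98.05, 99.82, 99.94, 99.99, 99.03, 99.97]  # percent accurate
sensors = [26, 26, 25, 26, 25, 26, 25]                        # optimal sensors (avg)
points = [1003, 1116, 453, 692, 709, 768, 664]                # total points

# Percentage of points that are measured in each shim.
fractions = [100.0 * s / p for s, p in zip(sensors, points)]

# Aggregates over all seven shims (weighting each shim by its point count).
overall_fraction = 100.0 * sum(sensors) / sum(points)
weighted_accuracy = sum(a * p for a, p in zip(accuracy, points)) / sum(points)

for shim, f in enumerate(fractions, start=1):
    print(f"shim {shim}: {f:.2f}% of points measured")
print(f"overall: {overall_fraction:.2f}% of points, "
      f"{weighted_accuracy:.2f}% accurate")
```

The per-shim fractions fall within the 2 to 6% range quoted in the text, and the aggregate is close to the "around 3%" figure.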
IX. Case Study: V-22 Osprey
The V-22, shown in Fig. 15, is a multirole tiltrotor combat aircraft, designed to take off and land like a helicopter, but with the range and speed of a fixed-wing aircraft. These impressive capabilities came with considerable engineering challenges, including the mechanics of the tiltrotor, the aerodynamics, and the control systems. The development and test of the V-22 were mired in delays, cost overruns, and safety mishaps. The purpose of this case study is to identify how the use of data science and a digital twin could have benefited the design and testing of a technologically advanced new aircraft, such as the V-22. The intent of this retrospective is not to critique the V-22, which is by all accounts an engineering marvel, or the design and testing program; many of the modern technologies discussed here did not exist or were in their infancy at the time. Instead, we present a summary, based on lessons learned from the V-22 program, where modern digital twin technology would have been beneficial to safety and facilitated an on-schedule design.
Fig. 15 V-22 Osprey. Image by Peter Gronemann, reproduced from https://en.wikipedia.org/wiki/File:V22-Osprey.jpg.
A 2001 review panel on the V-22 recommended the program “Extend high-rate-of-descent testing, formation flying (and other deferred flight tests as appropriate) to sufficiently define and understand the high-risk portion of the flight envelope under all appropriate flight conditions” [189]. The panel went further and recommended that the results of the high-risk, high rate-of-descent testing be used to update operating limitations as well as the flight simulator used for crew training. The call for additional high rate-of-descent testing came as a result of a fatal accident in April 2000 involving both high-rate-of-descent and formation flying. The cause of the accident was attributed to an aerodynamic condition known as a vortex ring state (VRS), which is an unsteady aerodynamic condition that occurs for rotorcraft operating at low forward speed and high rate-of-descent. In this condition, the rotorcraft does not fly away from the rotor wake, resulting in highly unsteady airflow. For a helicopter, this often results in a high rate-of-descent, and for a tiltrotor the result is typically roll-off. Both situations are perilous, especially in close proximity to the ground. VRS remains a significant area of research.
The goal of a digital twin is to bridge the gap between the physical world and the virtual world. Two aspects of a digital twin that differentiate it from traditional modeling are 1) continuously learning and updating internal models with real-world data, and 2) integration of data from one or more multiphysics models across a system. For a digital twin to continuously learn, it needs to be tested, updated, and re-tested. In the case of the V-22, a digital twin may begin with a model of VRS such as the one presented in [189], which is merged with the model and flight test data for the specific aircraft. Discrepancy modeling can be used as a data-driven method to explain and update model divergence. On at least two occasions V-22 pilots experienced significant uncommanded roll events during formation flying [190]. As a result of this, and the aforementioned accident, the V-22 kicked off an extensive formation flying and high-rate-of-descent flight test campaign. Combining a VRS model with formation-flying models would be an ideal opportunity to integrate multiphysics models across a system. The goal of integration as data becomes available is to work toward model convergence, or the bridging of the physical and virtual such that the digital model accurately represents the physical environment. The integration of unsteady, nonlinear, multiphysics models is no easy task. Implementing a real-time reduced-order model may help discover and explore areas of significant model discrepancies in flight test. The use of physics-informed learning and custom regularization in the ML optimizations may aid in maintaining physical constraints as the two models are merged to form a digital twin. With that enhanced model in hand, there is benefit in learning from a high number
of simulations. Flight testing could be reduced to the most critical test conditions: safety, mission critical, and model confirmation.
Certainly, this simple case study is not without conjecture. However, the tools presented in this paper will form the basis for data-driven hypotheses that serve as the roadmap for future aerospace engineering possibilities.
X. Case Study: Urban Air Mobility
Urban air mobility is an emerging transportation framework with the goal of providing on-demand, personal point-to-point transportation within and between the obstacle-rich environments of urban domains by leveraging the airspace above roads and between and above buildings and other physical infrastructure. Urban air mobility has great promise for alleviating transportation congestion, decreasing travel times, decreasing levels of pollution through the use of next-generation electric aircraft, and producing technology to benefit a wide range of industries; see Fig. 16. The purpose of this case study is to demonstrate how the component subsystems of urban air mobility have benefited and can benefit from data science.
Fig. 16 The urban airspace of the future will involve multiple platforms operating on various size and timescales.
To achieve the promise of point-to-point, on-demand personal transportation, a number of key technological challenges must be addressed. These challenges include, but are not limited to, vehicle design, testing, and certification; vehicle fleet health monitoring; vertical garages (vertiport) design and construction for passenger on- and off-boarding; minor maintenance, such as battery charging, fueling, and cleaning, and major maintenance and repair; logistical scheduling systems, resembling land-based ride-shares, taxis, trains, and buses; air traffic routing, requiring next generation air traffic control; integration of autonomy for guidance, navigation, and control; and autonomous deconfliction, such as automatic dependent surveillance-broadcast (ADS-B). Nearly all of these technologies will rely on improved sensor networks, robust data communication and processing, and advanced ML algorithms. As discussed next, these system elements are at varying levels of maturity and technology readiness.
A large number of companies, both established and startup, have been exploring the development of vehicles suitable for dense and safe personal transportation in urban settings using electric propulsion. Traditional fixed wing aircraft require long runways that are spatially infeasible in cities, while helicopter designs are unable to scale to large numbers because of maneuverability and safety issues stemming from the lack of unpowered gliding capability. Innovative designs have been proposed over the past decade to achieve the needs of safe, dense operation with high maneuverability, safety, and green fuel. The designs of such vehicles require the perspective of multidisciplinary optimization with regard to aeroservoelasticity considerations. Specifically, there is a tight physical coupling between the unsteady aerodynamics over the flexible airframe, which is further influenced by the propulsion and control surfaces. Thus, these components cannot be decoupled and designed independently, making the design space quite rich and challenging.
Historically, successful innovation of viable airframes has required years of extensive experience. The advent of modern computational capabilities will enable faster exploration of the design space through computational fluid dynamics (CFD) and finite element analysis (FEA) tools to simulate vehicle performance in a range of flight and loading conditions. These tools, and the design space over which they are leveraged, are inherently data rich. While CFD and FEA are effectively already data science tools, the potential exists for more extensive analytics to intelligently reduce the vast design space to a more manageable level. An example of how such design tools can impact outcomes can be seen in the multirotor design tool [191] that produces unexpected configurations for vehicles with multiple sets of rotating blades like a helicopter. While those designs are not likely options for passenger vehicles, the current innovation environment coupled with both the traditional design experience and modern data science tools has led to a rich array of proposed vehicles for urban operation. Options being considered include multirotors, short runway vertical takeoff and landing, and various hybrid rotor/propulsion designs. At this time, no single vehicle design is a clear frontrunner, and as with traditional fixed wing aircraft, the most likely outcome
will be a series of vehicle designs from companies that will share the market and the urban air space.
As with long-range air transportation, stable operation of fielded fleets of vehicles requires vehicle health monitoring, maintenance systems, and technical support. Onboard sensing and data logging capabilities for mechanical systems have been developing rapidly, with many individual subsystems (e.g., power, propulsion, cabin environment) being equipped with data systems that log and either save or regularly transmit system health information to a data server (e.g., Rolls-Royce and BMW). The sheer volume of data provided by these systems is immense, and appropriate data mining and monitoring tools are needed to flag operation and maintenance issues and to remediate issues that arise during flight. Maintenance and service for these vehicles will likely be a bit different from those for long-range flight vehicles. For aircraft powered by batteries, down time between passengers may require a quick battery charge or replacement. For vehicles with combustion engines, fuel tanks will need to be filled. Operational inspections will need to take place to ensure passenger safety. These tasks must all be performed in such a way as to facilitate efficient vehicle takeoffs, landings, and ground maneuvering. Solutions for efficiently managing these tasks are actively in development for other applications such as autonomous ocean transportation, autonomous ground transportation, and smart city infrastructure. In addition, vertical garages will be key to the successful adoption of urban air transportation, enabling safe takeoff and landing and facilitating efficiencies that might not be possible otherwise.
Logistical systems appropriate for handling on-demand transportation requests are quite mature. The advent of ride-share companies, such as Uber and Lyft, has driven the development of effective mobile applications that handle all aspects of the process, including user registry, payment, localization, service connection, tasking, scheduling, routing, and feedback surveys. These tools are at a level of maturity that should translate readily to the urban air mobility domain. Data science tools have been developed specifically to address these needs and are extensively leveraged for many aspects of the entire logistics system. One exception in this technology is air traffic routing. Current tools for ground transportation rely on a well-structured operational environment that has extensive physical infrastructure (e.g., roads, buildings, bridges, traffic lights, signage) and rules-of-the-road that inform and are informed by regulatory authorities.
A key question regarding urban air transportation is that of whether aircraft will be piloted, unpiloted, or a mix of both. Given the designs under development, the most likely scenario is the hybrid case where a mix of many different types of vehicle piloting operate simultaneously. Even in the case of human-piloted vehicles, a great deal of vehicle autonomy is in operation at any time to manage the myriad subsystems that are required for a modern flight system. The dense airspace of the future will likely exacerbate the existing challenges of autonomy. Further, the current human-centric air traffic control framework will simply be unable to address the temporal and spatial density needed for urban air mobility to be viable. However, autonomous routing, deconfliction, and obstacle avoidance are not yet at a level of maturity to be fielded. The issues are partly technological and partly policy. On the technology side, sensors and algorithms are in late-stage development (e.g., ADS-B), but they have not yet been adopted into standard practice. On the policy side, work has been underway for Next Generation Air Traffic Control (NextGen ATC) for the past two decades, but adoption has not yet reached the final stages that were planned for 2025. To realize the NextGen ATC system, data science techniques are needed to provide real-time routing of all vehicles in the same way that ground-based transportation systems do.
XI. Conclusions
This paper has provided a high-level review and roadmap for the uses of data science and ML in the aerospace industry. Aerospace engineering is a data-rich endeavor, and it is poised to capitalize on the big-data and ML revolution that is redefining the modern scientific, technological, and industrial landscapes. A number of
manufacturing, testing, and services that all may be enabled by advanced data science algorithms. For example, future data-driven algorithms may enable the design of new supermaterials, streamlined flight testing for safer and more flexible designs, and enhanced manufacturing processes, including standardization, predictive assembly, and nondestructive inspection. The array of potential applications is both riveting and at times may be overwhelming. Some of these advanced technologies, such as predictive shimming and assembly, are already experiencing considerable success in commercial production. Other applications will be enabled by the digital twin of the future, with advanced optimization and learning algorithms seamlessly fusing multiphysics and multifidelity models with real-world data streams. These models will continuously improve, with potentially radical implications for the entire aerospace production pipeline. In addition to these opportunities to leverage the growing wealth of data, there are also challenges to incorporating ML into the aerospace industry. A brief summary of some high-priority opportunities and challenges is as follows:
Opportunities:
1) Faster design and testing cycles with digital twin and digital thread technology and accurate/efficient surrogate models
2) Safe, efficient, and reproducible manufacturing: predictive assembly, process control, part standardization, reproducibility, nondestructive inspection, automation, etc.
3) Streamlined and more reliable testing, evaluation, and certification, including anomaly detection
4) Improved models for complex multiscale physics, such as fluid dynamics and advanced materials and composites
5) Enhanced business analytics: future product development, supply chain, sales, human resources, marketing, predictive maintenance, etc.
Challenges:
1) Certifiable and verifiable ML models for safety-critical applications
2) Existence of a critical need for interpretable and generalizable ML models that incorporate partially known physics
3) Heterogeneity, multimodality, and multifidelity of aerospace data, which is vast in some dimensions, and sparse in others
4) Data mortgage, where collecting and maintaining data comes at a prohibitively high cost
5) Education and workforce development in classical aerospace engineering and emerging data science technologies
6) Fostering a culture of data sharing, open science, and reproducibility
There are clear parallels between the recent rise of data science and the transformative impact of scientific computing beginning in the 1960s. Computational science began as a specialized focus and has since evolved into a core engineering competency across many disciplines. Likewise, data science and ML proficiency will be expected as a core competency in the future workforce, highlighting the need for robust education initiatives focused on data-driven science and engineering [9].¶¶ Data-driven aerospace engineering will also require changes to how teams of researchers and engineers are formed and how decisions are made. A recent Harvard Business Review paper [192] pointed out the need for data scientists to be integrated into the decision-making process, likening this to having Spock on the bridge in Star Trek. The last challenge to discuss is how to develop a data-first culture, which is imperative to fully harness the potential of data-enabled engineering. A data-first culture requires enterprise-wide education and adoption of principles that promote cooperation and data sharing, reproducible practices, common terminology, and an understanding of the value of data and the promise and limitations of ML algorithms. Moreover, this transformation will require ongoing curiosity and deep thinking from engineers and leaders about how processes might be improved with emerging technologies.
Current and future aircraft programs will be increasingly enabled by a wealth of data. There is low-hanging fruit in current aircraft
¶¶
See http://databookuw.com/ for video lectures, syllabi, and code that are
high-priority opportunities have been outlined in aerospace design, tailored for a science and engineering audience.
programs that may provide immediate benefits with existing data. However, the full potential of
data-enabled aerospace engineering will take generations of aircraft programs to realize. New
programs under development, such as the new midsize airplane, will provide an opportunity to
develop and test entirely new design, manufacturing, and testing capabilities. The digital twin
will improve design and testing cycles through a more holistic data-driven model of critical
processes and their interactions. These integrated models will rely on the integration of myriad
data sources into a digital thread. Lessons learned may then be leveraged into more seamless
integration of data into the aerospace industry for future programs, such as the future small
aircraft. Much as information technology companies like Google are valued based on their data,
aerospace giants will learn to extract value and competitive advantage by leveraging their wealth
of data.
Never before has there been an architecture problem of this size, requiring the integrated efforts
of information technology experts, aerospace domain engineers, and data science teams. The
aerospace industry currently generates tremendous volumes of data throughout a product life cycle,
but the data storage systems are not always designed to have their data extracted, much less at
near-real-time rates. Data must be seamless to access while maintaining security and control;
software must enable analytics while performing its primary objective. Data must be integrated
while being served by decades-old systems. Data exploration must be near-effortless and allow for
analysis of complex systems. Analytic results that provide suggestions or predictions must be fed
back into these systems without forcing all data to be migrated or existing systems to be replaced
wholesale. There is also the risk of a data mortgage and paralysis, where more effort is spent
collecting and curating data than analyzing it. This risk motivates a shift from big data to smart
data, where edge computing and sparse/robust algorithms are used to extract key data features in
real time. Moving from a process largely grounded in storing data in local silos to openly sharing
data will be difficult and will motivate entirely new incentive structures for engineers. Teams
must come to identify the value they provide in part based upon the data they generate and curate.
It is also important to reiterate the need to develop ML algorithms that are specifically tailored
for the aerospace industry. ML must be demystified: it is not a magic wand, but rather a growing
corpus of applied optimization algorithms to build models from data. These models are only as good
as the data used to train them, and great care must be taken to understand how and when these
models are valid. Most ML algorithms are fundamentally interpolative, and extrapolation algorithms
are both rare and challenging. Because of the need for reliable and certifiable algorithms, it is
critical that physics is baked into ML algorithms. Many of the concepts related to data analytics
detailed within this paper constitute novel applications that may not support a direct showing of
compliance to existing regulations. Regulatory compliance is a critical challenge for emerging
approaches based on data analytics. Historically, civil aviation requirements have been created
and revised in response to negative events in the industry, primarily accidents and incidents.
Further, these requirements are often made under the expectation that they are deterministic in
nature. As nondeterministic systems are introduced into commercial aviation systems, the basic
regulatory philosophy will not support the current methods of compliance. Anywhere that data
analytics are leveraged in the certification process, the same philosophical issues will need to
be addressed. In place of a deterministic answer, statistical approaches, potentially models with
dissimilar architectures, or other approaches not yet identified will become a necessity.
Fortunately, statistics are used in various areas of aircraft certification related to aircraft
safety assurance following single and multiple failure events.
Powered human flight is one of the greatest achievements in the history of humankind, realizing the
culmination of thousands of years of science fiction, and having a profound impact on the past
century. The next century of aerospace engineering will challenge us to envision and realize a new
science fiction future based on the breathtaking array of new data-enhanced technologies.

Acknowledgments
We gratefully acknowledge funding support through Boeing grant 2018-ETT-PA-379. We would like to
acknowledge several individuals and organizations within the University of Washington and The
Boeing Company: The Boeing Advanced Research Center and its directors Per Reinhall, Santosh
Devasia, and Sam Pedigo; Tia Benson Tolle, Steve Chisholm, Chris Schuller, and Todd Zarfos at
Boeing; Ashis Banerjee, Bing Brunton, Ramulu Mamidala, and Navid Zobeiry at University of
Washington (UW); and the eScience Institute and its directors Ed Lazowska and Magda Balazinska. We
would also like to give special thanks to Mike Bragg for his support as Dean of the College of
Engineering at UW and Greg Hyslop, Lynne Hopper, and Nancy Pendleton at Boeing for expanding the
scope of this collaboration. We are also grateful for advice and interviews with Ryan Smith, Laura
Garnett, Phil Crothers, Nia Jetter, Howard McKenzie, and Darren Macer. Finally, we are indebted to
the creative efforts of Michael Erickson and the envisioneering team, who translated our
engineering discussions into many of the graphics in this paper.

References
[1] Harding, J. A., Shahbaz, M., Srinivas, M., and Kusiak, A., “Data Mining in Manufacturing: A Review,” Journal of Manufacturing Science and Engineering, Vol. 128, No. 4, 2006, pp. 969–976.
https://doi.org/10.1115/1.2194554
[2] Lynch, C., “Big Data: How Do Your Data Grow?” Nature, Vol. 455, No. 7209, 2008, pp. 28–29.
https://doi.org/10.1038/455028a
[3] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z.-H., Steinbach, M., Hand, D. J., and Steinberg, D., “Top 10 Algorithms in Data Mining,” Knowledge and Information Systems, Vol. 14, No. 1, 2008, pp. 1–37.
https://doi.org/10.1007/s10115-007-0114-2
[4] Marx, V., “Biology: The Big Challenges of Big Data,” Nature, Vol. 498, No. 7453, 2013, pp. 255–260.
https://doi.org/10.1038/498255a
[5] Khoury, M. J., and Ioannidis, J. P. A., “Medicine. Big Data Meets Public Health,” Science, Vol. 346, No. 6213, 2014, pp. 1054–1055.
https://doi.org/10.1126/science.aaa2709
[6] Einav, L., and Levin, J., “Economics in the Age of Big Data,” Science, Vol. 346, No. 6210, 2014, Paper 1243089.
https://doi.org/10.1126/science.1243089
[7] Jordan, M. I., and Mitchell, T. M., “Machine Learning: Trends, Perspectives, and Prospects,” Science, Vol. 349, No. 6245, 2015, pp. 255–260.
https://doi.org/10.1126/science.aaa8415
[8] Kutz, J. N., Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data, Oxford Univ. Press, New York, 2013.
[9] Brunton, S. L., and Kutz, J. N., Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Cambridge Univ. Press, Cambridge, England, U.K., 2019.
https://doi.org/10.1017/9781108380690
[10] Donoho, D., “50 Years of Data Science,” Journal of Computational and Graphical Statistics, Vol. 26, No. 4, 2017, pp. 745–766.
https://doi.org/10.1080/10618600.2017.1384734
[11] Hey, A. J. G., and Trefethen, A. E., The Data Deluge: An e-Science Perspective, Wiley, Hoboken, NJ, 2003, https://stewardshipgap.net/node/145.
[12] Hey, A. J. G., Tansley, S., and Tolle, K. M., The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Redmond, WA, 2009.
[13] Duraisamy, K., Iaccarino, G., and Xiao, H., “Turbulence Modeling in the Age of Data,” Annual Review of Fluid Mechanics, Vol. 51, Jan. 2019, pp. 357–377.
https://doi.org/10.1146/annurev-fluid-010518-040547
[14] Brunton, S. L., Noack, B. R., and Koumoutsakos, P., “Machine Learning for Fluid Mechanics,” Annual Review of Fluid Mechanics, Vol. 52, Jan. 2020, pp. 477–508.
https://doi.org/10.1146/annurev-fluid-010719-060214
[15] Brunton, S. L., and Kutz, J. N., “Methods for Data-Driven Multiscale Model Discovery for Materials,” Journal of Physics: Materials, Vol. 2, No. 4, 2019, Paper 044002.
https://doi.org/10.1088/2515-7639/ab291e
[16] Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, Cambridge, MA, 2016.
https://doi.org/10.1007/s10710-017-9314-z
[17] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L., “ImageNet: A Large-Scale Hierarchical Image Database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, Inst. of Electrical and Electronics Engineers, New York, 2009, pp. 248–255.
https://doi.org/10.1109/CVPR.2009.5206848
[18] Krizhevsky, A., Sutskever, I., and Hinton, G. E., “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the Association for Computing Machinery, Vol. 60, No. 6, May 2017, pp. 84–90.
https://doi.org/10.1145/3065386
[19] Sutton, R. S., and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 2018.
[20] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D., “A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play,” Science, Vol. 362, No. 6419, 2018, pp. 1140–1144.
https://doi.org/10.1126/science.aar6404
[21] Hornik, K., Stinchcombe, M., and White, H., “Multilayer Feedforward Networks Are Universal Approximators,” Neural Networks, Vol. 2, No. 5, 1989, pp. 359–366.
https://doi.org/10.1016/0893-6080(89)90020-8
[22] Loiseau, J.-C., and Brunton, S. L., “Constrained Sparse Galerkin Regression,” Journal of Fluid Mechanics, Vol. 838, March 2018, pp. 42–67.
https://doi.org/10.1017/jfm.2017.823
[23] Raissi, M., and Karniadakis, G. E., “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations,” Journal of Computational Physics, Vol. 357, March 2018, pp. 125–141.
https://doi.org/10.1016/j.jcp.2017.11.039
[24] Loiseau, J.-C., Noack, B. R., and Brunton, S. L., “Sparse Reduced-Order Modeling: Sensor-Based Dynamics to Full-State Estimation,” Journal of Fluid Mechanics, Vol. 844, June 2018, pp. 459–490.
https://doi.org/10.1017/jfm.2018.147
[25] Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., and Gulcehre, C., “Relational Inductive Biases, Deep Learning, and Graph Networks,” arXiv preprint arXiv:1806.01261, 2018.
[26] Raissi, M., Perdikaris, P., and Karniadakis, G. E., “Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations,” Journal of Computational Physics, Vol. 378, Feb. 2019, pp. 686–707.
https://doi.org/10.1016/j.jcp.2018.10.045
[27] Noé, F., Olsson, S., Köhler, J., and Wu, H., “Boltzmann Generators: Sampling Equilibrium States of Many-Body Systems with Deep Learning,” Science, Vol. 365, No. 6457, 2019, Paper eaaw1147.
https://doi.org/10.1126/science.aaw1147
[28] Köhler, J., Klein, L., and Noé, F., “Equivariant Flows: Sampling Configurations for Multi-Body Systems with Symmetric Energies,” arXiv preprint arXiv:1910.00753, 2019.
[29] Cranmer, M. D., Xu, R., Battaglia, P., and Ho, S., “Learning Symbolic Physics with Graph Networks,” arXiv preprint arXiv:1909.05862, 2019.
[30] Raissi, M., Yazdani, A., and Karniadakis, G. E., “Hidden Fluid Mechanics: Learning Velocity and Pressure Fields from Flow Visualizations,” Science, Vol. 367, No. 6481, 2020, pp. 1026–1030.
https://doi.org/10.1126/science.aaw4741
[31] Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., Xu, R., Cranmer, K., Spergel, D., and Ho, S., “Discovering Symbolic Models from Deep Learning with Inductive Biases,” arXiv preprint arXiv:2006.11287, 2020.
[32] Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., and Ho, S., “Lagrangian Neural Networks,” arXiv preprint arXiv:2003.04630, 2020.
[33] Taira, K., Brunton, S. L., Dawson, S. T. M., Rowley, C. W., Colonius, T., McKeon, B. J., Schmidt, O. T., Gordeyev, S., Theofilis, V., and Ukeiley, L. S., “Modal Analysis of Fluid Flows: An Overview,” AIAA Journal, Vol. 55, No. 12, 2017, pp. 4013–4041.
https://doi.org/10.2514/1.J056060
[34] Taira, K., Hemati, M. S., Brunton, S. L., Sun, Y., Duraisamy, K., Bagheri, S., Dawson, S. T. M., and Yeh, C. A., “Modal Analysis of Fluid Flows: Applications and Outlook,” AIAA Journal, Vol. 58, No. 3, 2020, pp. 998–1022.
https://doi.org/10.2514/1.J058462
[35] Colonius, T., “An Overview of Simulation, Modeling, and Active Control of Flow/Acoustic Resonance in Open Cavities,” 39th Aerospace Sciences Meeting and Exhibit, AIAA Paper 2001-0076, 2001.
https://doi.org/10.2514/6.2001-76
[36] Kerstens, W., Pfeiffer, J., Williams, D., King, R., and Colonius, T., “Closed-Loop Control of Lift for Longitudinal Gust Suppression at Low Reynolds Numbers,” AIAA Journal, Vol. 49, No. 8, 2011, pp. 1721–1728.
https://doi.org/10.2514/1.J050954
[37] Brunton, S. L., and Noack, B. R., “Closed-Loop Turbulence Control: Progress and Challenges,” Applied Mechanics Reviews, Vol. 67, No. 5, 2015, Paper 050801.
https://doi.org/10.1115/1.4031175
[38] Shmilovich, A., Yadlin, Y., and Whalen, E. A., “Active Flow Control Computations: From a Single Actuator to a Complete Airplane,” AIAA Journal, Vol. 56, No. 12, 2018, pp. 4730–4740.
https://doi.org/10.2514/1.J056307
[39] Andino, M. Y., Lin, J. C., Roman, S., Graff, E. C., Gharib, M., Whalen, E. A., and Wygnanski, I. J., “Active Flow Control on Vertical Tail Models,” AIAA Journal, Vol. 57, No. 8, 2019, pp. 3322–3338.
https://doi.org/10.2514/1.J057876
[40] Lin, J. C., Whalen, E. A., Andino, M. Y., Graff, E. C., Lacy, D. S., Washburn, A. E., Gharib, M., and Wygnanski, I. J., “Full-Scale Testing of Active Flow Control Applied to a Vertical Tail,” Journal of Aircraft, Vol. 56, No. 4, 2019, pp. 1376–1386.
https://doi.org/10.2514/1.C034907
[41] Hou, W., Darakananda, D., and Eldredge, J. D., “Machine Learning Based Detection of Flow Disturbances Using Surface Pressure Measurements,” AIAA Journal, Vol. 57, No. 12, 2019, pp. 5079–5093.
https://doi.org/10.2514/1.J058486
[42] Jordan, P., and Colonius, T., “Wave Packets and Turbulent Jet Noise,” Annual Review of Fluid Mechanics, Vol. 45, Jan. 2013, pp. 173–195.
https://doi.org/10.1146/annurev-fluid-011212-140756
[43] Spalart, P., and Allmaras, S., “A One-Equation Turbulence Model for Aerodynamic Flows,” 30th Aerospace Sciences Meeting and Exhibit, AIAA Paper 1992-0439, 1992.
https://doi.org/10.2514/6.1992-439
[44] Spalart, P. R., and Rumsey, C. L., “Effective Inflow Conditions for Turbulence Models in Aerodynamic Calculations,” AIAA Journal, Vol. 45, No. 10, 2007, pp. 2544–2553.
https://doi.org/10.2514/1.29373
[45] Ling, J., Kurzawski, A., and Templeton, J., “Reynolds Averaged Turbulence Modelling Using Deep Neural Networks with Embedded Invariance,” Journal of Fluid Mechanics, Vol. 807, Nov. 2016, pp. 155–166.
https://doi.org/10.1017/jfm.2016.615
[46] Singh, A. P., Medida, S., and Duraisamy, K., “Machine-Learning-Augmented Predictive Modeling of Turbulent Separated Flows over Airfoils,” AIAA Journal, Vol. 55, No. 7, 2017, pp. 2215–2227.
https://doi.org/10.2514/1.J055595
[47] Maulik, R., San, O., Rasheed, A., and Vedula, P., “Subgrid Modelling for Two-Dimensional Turbulence Using Neural Networks,” Journal of Fluid Mechanics, Vol. 858, Jan. 2019, pp. 122–144.
https://doi.org/10.1017/jfm.2018.770
[48] Spalart, P. R., and Garbaruk, A. V., “Correction to the Spalart–Allmaras Turbulence Model, Providing More Accurate Skin Friction,” AIAA Journal, Vol. 58, No. 5, 2020, pp. 1903–1905.
https://doi.org/10.2514/1.J059489
[49] Thuerey, N., Weißenow, K., Prantl, L., and Hu, X., “Deep Learning Methods for Reynolds-Averaged Navier–Stokes Simulations of Airfoil Flows,” AIAA Journal, Vol. 58, No. 1, 2020, pp. 25–36.
https://doi.org/10.2514/1.J058291
[50] Zare, A., Georgiou, T. T., and Jovanović, M. R., “Stochastic Dynamical Modeling of Turbulent Flows,” Annual Review of Control, Robotics, and Autonomous Systems, Vol. 3, May 2020, pp. 195–219.
https://doi.org/10.1146/annurev-control-053018-023843
[51] Schmid, P. J., “Dynamic Mode Decomposition of Numerical and Experimental Data,” Journal of Fluid Mechanics, Vol. 656, Aug. 2010, pp. 5–28.
https://doi.org/10.1017/S0022112010001217
[52] Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L., and Kutz, J. N., “On Dynamic Mode Decomposition: Theory and Applications,” Journal of Computational Dynamics, Vol. 1, No. 2, 2014, pp. 391–421.
https://doi.org/10.3934/jcd.2014.1.391
[53] Kutz, J. N., Brunton, S. L., Brunton, B. W., and Proctor, J. L., Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems, SIAM, Philadelphia, PA, 2016.
https://doi.org/10.1137/1.9781611974508
[54] Mezić, I., “Analysis of Fluid Flows via Spectral Properties of the Koopman Operator,” Annual Review of Fluid Mechanics, Vol. 45, Jan. 2013, pp. 357–378.
https://doi.org/10.1146/annurev-fluid-011212-140652
[55] Brunton, S. L., Brunton, B. W., Proctor, J. L., and Kutz, J. N., “Koopman Invariant Subspaces and Finite Linear Representations of Nonlinear Dynamical Systems for Control,” PLoS ONE, Vol. 11, No. 2, 2016, Paper e0150171.
https://doi.org/10.1371/journal.pone.0150171
[56] McKeon, B. J., and Sharma, A. S., “A Critical Layer Model for Turbulent Pipe Flow,” Journal of Fluid Mechanics, Vol. 658, Sept. 2010, pp. 336–382.
https://doi.org/10.1017/S002211201000176X
[57] Brunton, S. L., Proctor, J. L., and Kutz, J. N., “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems,” Proceedings of the National Academy of Sciences, Vol. 113, No. 15, 2016, pp. 3932–3937.
https://doi.org/10.1073/pnas.1517384113
[58] Renganathan, S. A., “Koopman-Based Approach to Nonintrusive Reduced Order Modeling: Application to Aerodynamic Shape Optimization and Uncertainty Propagation,” AIAA Journal, Vol. 58, No. 5, 2020, pp. 2221–2235.
https://doi.org/10.2514/1.J058744
[59] Noack, B. R., Afanasiev, K., Morzynski, M., Tadmor, G., and Thiele, F., “A Hierarchy of Low-Dimensional Models for the Transient and Post-Transient Cylinder Wake,” Journal of Fluid Mechanics, Vol. 497, Dec. 2003, pp. 335–363.
https://doi.org/10.1017/S0022112003006694
[60] Carlberg, K., Farhat, C., Cortial, J., and Amsallem, D., “The GNAT Method for Nonlinear Model Reduction: Effective Implementation and Application to Computational Fluid Dynamics and Turbulent Flows,” Journal of Computational Physics, Vol. 242, June 2013, pp. 623–647.
https://doi.org/10.1016/j.jcp.2013.02.028
[61] Bui-Thanh, T., Willcox, K., Ghattas, O., and van Bloemen Waanders, B., “Goal-Oriented, Model-Constrained Optimization for Reduction of Large-Scale Systems,” Journal of Computational Physics, Vol. 224, No. 2, 2007, pp. 880–896.
https://doi.org/10.1016/j.jcp.2006.10.026
[62] Bui-Thanh, T., Willcox, K., and Ghattas, O., “Model Reduction for Large-Scale Systems with High-Dimensional Parametric Input Space,” SIAM Journal on Scientific Computing, Vol. 30, No. 6, 2008, pp. 3270–3288.
https://doi.org/10.1137/070694855
[63] Amsallem, D., Zahr, M., Choi, Y., and Farhat, C., “Design Optimization Using Hyper-Reduced-Order Models,” Structural and Multidisciplinary Optimization, Vol. 51, No. 4, 2015, pp. 919–940.
https://doi.org/10.1007/s00158-014-1183-y
[64] Benner, P., Gugercin, S., and Willcox, K., “A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems,” SIAM Review, Vol. 57, No. 4, 2015, pp. 483–531.
https://doi.org/10.1137/130932715
[65] Carlberg, K., Barone, M., and Antil, H., “Galerkin v. Least-Squares Petrov–Galerkin Projection in Nonlinear Model Reduction,” Journal of Computational Physics, Vol. 330, Feb. 2017, pp. 693–734.
https://doi.org/10.1016/j.jcp.2016.10.033
[66] Carlberg, K. T., Jameson, A., Kochenderfer, M. J., Morton, J., Peng, L., and Witherden, F. D., “Recovering Missing CFD Data for High-Order Discretizations Using Deep Neural Networks and Dynamics Learning,” arXiv preprint arXiv:1812.01177, 2018.
[67] Singh, A. P., Medida, S., and Duraisamy, K., “Machine-Learning-Augmented Predictive Modeling of Turbulent Separated Flows over Airfoils,” AIAA Journal, Vol. 55, No. 7, 2017, pp. 2215–2227.
https://doi.org/10.2514/1.J055595
[68] Boyd, S. P., and Vandenberghe, L., Convex Optimization, Cambridge Univ. Press, Cambridge, England, U.K., 2004.
[69] Kalman, R. E., “A New Approach to Linear Filtering and Prediction Problems,” Journal of Fluids Engineering, Vol. 82, No. 1, 1960, pp. 35–45.
https://doi.org/10.1115/1.3662552
[70] Bertsekas, D. P., Dynamic Programming and Optimal Control, Vol. 1, Athena Scientific, Belmont, MA, 1995.
[71] Rao, C. V., Wright, S. J., and Rawlings, J. B., “Application of Interior-Point Methods to Model Predictive Control,” Journal of Optimization Theory and Applications, Vol. 99, No. 3, 1998, pp. 723–757.
https://doi.org/10.1023/A:1021711402723
[72] Dullerud, G. E., and Paganini, F., “A Course in Robust Control Theory: A Convex Approach,” Texts in Applied Mathematics, Springer, Berlin, 2000.
[73] Skogestad, S., and Postlethwaite, I., Multivariable Feedback Control: Analysis and Design, 2nd ed., Wiley, Hoboken, NJ, 2005.
[74] Joshi, S., and Boyd, S., “Sensor Selection via Convex Optimization,” IEEE Transactions on Signal Processing, Vol. 57, No. 2, 2009, pp. 451–462.
https://doi.org/10.1109/TSP.2008.2007095
[75] Aravkin, A., Burke, J. V., Ljung, L., Lozano, A., and Pillonetto, G., “Generalized Kalman Smoothing: Modeling and Algorithms,” Automatica, Vol. 86, Dec. 2017, pp. 63–86.
https://doi.org/10.1016/j.automatica.2017.08.011
[76] Manohar, K., Brunton, B. W., Kutz, J. N., and Brunton, S. L., “Data-Driven Sparse Sensor Placement for Reconstruction: Demonstrating the Benefits of Exploiting Known Patterns,” IEEE Control Systems Magazine, Vol. 38, No. 3, 2018, pp. 63–86.
https://doi.org/10.1109/MCS.2018.2810460
[77] Bottou, L., Curtis, F. E., and Nocedal, J., “Optimization Methods for Large-Scale Machine Learning,” SIAM Review, Vol. 60, No. 2, 2018, pp. 223–311.
https://doi.org/10.1137/16M1080173
[78] Shapiro, A., Dentcheva, D., and Ruszczyński, A., Lectures on Stochastic Programming: Modeling and Theory, SIAM, Philadelphia, PA, 2009.
https://doi.org/10.1137/1.9781611973433
[79] Tsitsiklis, J., Bertsekas, D., and Athans, M., “Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms,” IEEE Transactions on Automatic Control, Vol. 31, No. 9, 1986, pp. 803–812.
https://doi.org/10.1109/TAC.1986.1104412
[80] Bottou, L., “Large-Scale Machine Learning with Stochastic Gradient Descent,” Proceedings of COMPSTAT’2010, edited by Y. Lechevallier and G. Saporta, Springer-Verlag, Berlin, 2010, pp. 177–186.
https://doi.org/10.1007/978-3-7908-2604-3_16
[81] Kingma, D. P., and Ba, J., “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.
[82] Davis, D., Drusvyatskiy, D., Kakade, S., and Lee, J. D., “Stochastic Subgradient Method Converges on Tame Functions,” Foundations of Computational Mathematics, Vol. 31, No. 9, 2018, pp. 1–36.
https://doi.org/10.1007/s10208-018-09409-5
[83] Reddi, S. J., Sra, S., Poczos, B., and Smola, A. J., “Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization,” Advances in Neural Information Processing Systems, Vol. 29, edited by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Curran Associates, Inc., 2016, pp. 1145–1153.
https://doi.org/10.5555/3157096.3157225
[84] Aravkin, A., and Davis, D., “Trimmed Statistical Estimation via Variance Reduction,” Mathematics of Operations Research, Vol. 45, No. 1, 2020, pp. 292–322.
https://doi.org/10.1287/moor.2019.0992
[85] Rockafellar, R. T., and Wets, R. J.-B., Variational Analysis, Vol. 317, Springer Science & Business Media, Berlin, 2009.
[86] Burke, J. V., and Ferris, M. C., “A Gauss–Newton Method for Convex Composite Optimization,” Mathematical Programming, Vol. 71, No. 2, 1995, pp. 179–194.
https://doi.org/10.1007/BF01585997
[87] Davis, D., and Drusvyatskiy, D., “Stochastic Model-Based Minimization of Weakly Convex Functions,” SIAM Journal on Optimization, Vol. 29, No. 1, 2019, pp. 207–239.
https://doi.org/10.1137/18M1178244
[88] Golub, G., and Pereyra, V., “Separable Nonlinear Least Squares: The Variable Projection Method and Its Applications,” Inverse Problems, Vol. 19, No. 2, 2003, Paper R1.
https://doi.org/10.1088/0266-5611/19/2/201
[89] Zheng, P., Askham, T., Brunton, S. L., Kutz, J. N., and Aravkin, A. Y., “A Unified Framework for Sparse Relaxed Regularized Regression: SR3,” IEEE Access, Vol. 7, No. 1, 2019, pp. 1404–1423.
https://doi.org/10.1109/ACCESS.2018.2886528
[90] Nocedal, J., and Wright, S., Numerical Optimization, Springer-Verlag, New York, 2006.
[91] Combettes, P. L., and Pesquet, J.-C., “Proximal Splitting Methods in Signal Processing,” Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer, New York, 2011, pp. 185–212.
https://doi.org/10.1007/978-1-4419-9569-8_10
[92] Parikh, N., and Boyd, S., “Proximal Algorithms,” Foundations and Trends® in Optimization, Vol. 1, No. 3, 2014, pp. 127–239.
https://doi.org/10.1561/2400000003
[93] Attouch, H., Bolte, J., and Svaiter, B. F., “Convergence of Descent Methods for Semi-Algebraic and Tame Problems: Proximal Algorithms, Forward–Backward Splitting, and Regularized Gauss–Seidel Methods,” Mathematical Programming, Vol. 137, Nos. 1–2, 2013,
pp. 91–129.
https://doi.org/10.1007/s10107-011-0484-9
[94] Mahoney, M. W., and Drineas, P., “CUR Matrix Decompositions for Improved Data Analysis,” Proceedings of the National Academy of Sciences, Vol. 106, No. 3, 2009, pp. 697–702.
https://doi.org/10.1073/pnas.0803205106
[95] Mahoney, M. W., “Randomized Algorithms for Matrices and Data,” Foundations and Trends in Machine Learning, Vol. 3, No. 3, 2011, pp. 123–224.
https://doi.org/10.1561/2200000035
[96] Halko, N., Martinsson, P. G., and Tropp, J. A., “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Review, Vol. 53, No. 2, 2011, pp. 217–288.
https://doi.org/10.1137/090771806
[97] Liberty, E., “Simple and Deterministic Matrix Sketching,” Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’13, Assoc. for Computing Machinery, New York, 2013, pp. 581–588.
https://doi.org/10.1145/2487575.2487623
[98] Erichson, N. B., Voronin, S., Brunton, S. L., and Kutz, J. N., “Randomized Matrix Decompositions Using R,” Journal of Statistical Software, Vol. 89, No. 11, 2019, pp. 1–48.
https://doi.org/10.18637/jss.v089.i11
[99] Sarlos, T., “Improved Approximation Algorithms for Large Matrices via Random Projections,” 47th Annual IEEE Symposium on Foundations of Computer Science, 2006, pp. 143–152.
https://doi.org/10.1109/FOCS.2006.37
[100] Martinsson, P. G., Rokhlin, V., and Tygert, M., “A Randomized Algorithm for the Decomposition of Matrices,” Applied and Computational Harmonic Analysis, Vol. 30, No. 1, 2011, pp. 47–68.
https://doi.org/10.1016/j.acha.2010.02.003
[101] Rokhlin, V., Szlam, A., and Tygert, M., “A Randomized Algorithm for Principal Component Analysis,” SIAM Journal on Matrix Analysis and Applications, Vol. 31, No. 3, 2010, pp. 1100–1124.
https://doi.org/10.1137/080736417
[102] Halko, N., Martinsson, P. G., Shkolnisky, Y., and Tygert, M., “An Algorithm for the Principal Component Analysis of Large Data Sets,” SIAM Journal on Scientific Computing, Vol. 33, No. 5, 2011, pp. 2580–2594.
https://doi.org/10.1137/100804139
[103] Duersch, J. A., and Gu, M., “Randomized QR with Column Pivoting,” SIAM Journal on Scientific Computing, Vol. 39, No. 4, 2017, pp. C263–C291.
https://doi.org/10.1137/15M1044680
[104] Shabat, G., Shmueli, Y., Aizenbud, Y., and Averbuch, A., “Randomized LU Decomposition,” Applied and Computational Harmonic Analysis, Vol. 44, No. 2, 2018, pp. 246–272.
https://doi.org/10.1016/j.acha.2016.04.006
[105] Erichson, N. B., Manohar, K., Brunton, S. L., and Kutz, J. N., “Randomized CP Tensor Decomposition,” Machine Learning: Science and Technology, Vol. 1, No. 2, 2020, Paper 025012.
https://doi.org/10.1088/2632-2153/ab8240
May 2014, pp. 22–34.
https://doi.org/10.1016/j.cviu.2013.11.009
[113] Kondor, D., Csabai, I., Dobos, L., Szüle, J., Barankai, N., Hanyecz, T., Sebők, T., Kallus, Z., and Vattay, G., “Using Robust PCA to Estimate Regional Characteristics of Language Use from Geo-Tagged Twitter Messages,” 2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom), Inst. of Electrical and Electronics Engineers, New York, 2013, pp. 393–398.
https://doi.org/10.1109/CogInfoCom.2013.6719277
[114] Wright, J., Ganesh, A., Rao, S., Peng, Y., and Ma, Y., “Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization,” Advances in Neural Information Processing Systems, Vol. 22, edited by Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, Curran Associates, Inc., 2009, pp. 2080–2088, https://proceedings.neurips.cc/paper/2009/file/c45147dee729311ef5b5c3003946c48f-Paper.pdf.
[115] Boschert, S., and Rosen, R., “Digital Twin—The Simulation Aspect,” Mechatronic Futures, Springer, Cham, Switzerland, 2016, pp. 59–74.
https://doi.org/10.1007/978-3-319-32156-1_5
[116] Grieves, M., and Vickers, J., “Digital Twin: Mitigating Unpredictable, Undesirable Emergent Behavior in Complex Systems,” Transdisciplinary Perspectives on Complex Systems, Springer, Cham, Switzerland, 2017, pp. 85–113.
https://doi.org/10.1007/978-3-319-38756-7_4
[117] Tao, F., Cheng, J., Qi, Q., Zhang, M., Zhang, H., and Sui, F., “Digital Twin-Driven Product Design, Manufacturing and Service with Big Data,” International Journal of Advanced Manufacturing Technology, Vol. 94, Nos. 9–12, 2018, pp. 3563–3576.
https://doi.org/10.1007/s00170-017-0233-1
[118] Rasheed, A., San, O., and Kvamsdal, T., “Digital Twin: Values, Challenges and Enablers,” arXiv preprint arXiv:1910.01719, 2019.
[119] Chinesta, F., Cueto, E., Abisset-Chavanne, E., Duval, J. L., and Khaldi, F. E., “Virtual, Digital and Hybrid Twins: A New Paradigm in Data-Based Engineering and Engineered Data,” Archives of Computational Methods in Engineering, Vol. 27, No. 1, 2020, pp. 105–134.
https://doi.org/10.1007/s11831-018-9301-4
[120] Everson, R., and Sirovich, L., “Karhunen–Loève Procedure for Gappy Data,” Journal of the Optical Society of America A, Vol. 12, No. 8, 1995, pp. 1657–1664.
https://doi.org/10.1364/JOSAA.12.001657
[121] Willcox, K., “Unsteady Flow Sensing and Estimation via the Gappy Proper Orthogonal Decomposition,” Computers & Fluids, Vol. 35, No. 2, 2006, pp. 208–226.
https://doi.org/10.1016/j.compfluid.2004.11.006
[122] Barrault, M., Maday, Y., Nguyen, N. C., and Patera, A. T., “An ‘Empirical Interpolation’ Method: Application to Efficient Reduced-Basis Discretization of Partial Differential Equations,” Comptes Rendus Mathematique, Vol. 339, No. 9, 2004, pp. 667–672.
https://doi.org/10.1016/j.crma.2004.08.006
[123] Chaturantabut, S., and Sorensen, D. C., “Nonlinear Model Reduction via Discrete Empirical Interpolation,” SIAM Journal on Scientific Computing, Vol. 32, No. 5, 2010, pp. 2737–2764.
[106] Erichson, N. B., Mathelin, L., Kutz, J. N., and Brunton, S. L., “Ran- https://doi.org/10.1137/090766498
domized Dynamic Mode Decomposition,” SIAM Journal on Applied [124] Drmac, Z., and Gugercin, S., “A New Selection Operator for the
Dynamical Systems, Vol. 18, No. 4, 2019, pp. 1867–1891. Discrete Empirical Interpolation Method—Improved a Priori Error
https://doi.org/10.1137/18M1215013 Bound and Extensions,” SIAM Journal on Scientific Computing,
[107] Alla, A., and Kutz, J. N., “Randomized Model Order Reduction,” Vol. 38, No. 2, 2016, pp. A631–A648.
Advances in Computational Mathematics, Vol. 45, No. 3, 2019, https://doi.org/10.1137/15M1019271
pp. 1251–1271. [125] Candès, E. J., “Compressive Sampling,” Proceedings of the
https://doi.org/10.1007/s10444-018-09655-9 International Congress of Mathematicians, Vol. 3, Aug. 2006,
[108] Bai, Z., Kaiser, E., Proctor, J. L., Kutz, J. N., and Brunton, S. L., pp. 1433–1452.
“Dynamic Mode Decomposition for Compressive System Identifica- https://doi.org/10.4171/022-3/69
tion,” AIAA Journal, Vol. 58, No. 2, 2020, pp. 561–574. [126] Donoho, D. L., “Compressed Sensing,” IEEE Transactions on Infor-
https://doi.org/10.2514/1.J057870 mation Theory, Vol. 52, No. 4, 2006, pp. 1289–1306.
[109] Huber, P. J., “John W, Tukey’s Contributions to Robust Statistics,” https://doi.org/10.1109/TIT.2006.871582
Annals of Statistics, Vol. 30, No. 6, 2002, pp. 1640–1648. [127] Candès, E. J., Romberg, J., and Tao, T., “Stable Signal Recovery from
https://doi.org/10.1214/aos/1043351251 Incomplete and Inaccurate Measurements,” Communications in Pure
[110] Candès, E. J., Li, X., Ma, Y., and Wright, J., “Robust Principal and Applied Mathematics, Vol. 59, No. 8, 2006, pp. 1207–1223.
Component Analysis?” Journal of the ACM, Vol. 58, No. 3, 2011, https://doi.org/10.1002/cpa.20124
Paper 11. [128] Candès, E. J., and Tao, T., “Near Optimal Signal Recovery from
https://doi.org/10.1145/1970392.1970395 Random Projections: Universal Encoding Strategies?” IEEE Trans-
[111] p D. L., “The Optimal Hard Threshold for
Gavish, M., and Donoho, actions on Information Theory, Vol. 52, No. 12, 2006, pp. 5406–5425.
Singular Values is 4∕ 3,” IEEE Transactions on Information Theory, https://doi.org/10.1109/TIT.2006.885507
Vol. 60, No. 8, 2014, pp. 5040–5053. [129] Candès, E. J., Romberg, J., and Tao, T., “Robust Uncertainty Princi-
https://doi.org/10.1109/TIT.2014.2323359 ples: Exact Signal Reconstruction from Highly Incomplete Frequency
[112] Bouwmans, T., and Zahzah, E. H., “Robust PCA via Principal Com- Information,” IEEE Transactions on Information Theory, Vol. 52,
ponent Pursuit: A Review for a Comparative Evaluation in Video No. 2, 2006, pp. 489–509.
Surveillance,” Computer Vision and Image Understanding, Vol. 122, https://doi.org/10.1109/TIT.2005.862083
2846 BRUNTON ET AL.
[130] Baraniuk, R. G., “Compressive Sensing,” IEEE Signal Processing Journal of Aircraft, Vol. 36, No. 1, 1999, pp. 298–307.
Magazine, Vol. 24, No. 4, 2007, pp. 118–121. https://doi.org/10.2514/2.2437
https://doi.org/10.1109/MSP.2007.4286571 [151] Bowcutt, K., Kuruvila, G., Grandine, T. A., Hogan, T. A., and Cramer,
[131] Dhingra, N. K., Jovanovic, M. R., and Luo, Z.-Q., “An ADMM E. J., “Advancements in Multidisciplinary Design Optimization Applied
Algorithm for Optimal Sensor and Actuator Selection,” 53rd IEEE to Hypersonic Vehicles to Achieve Closure,” 15th International Space
Conference on Decision and Control, 2014, pp. 4039–4044. Planes and Hypersonic Systems and Technologies Conference, AIAA
https://doi.org/10.1109/CDC.2014.7040017 Paper 2008-2591, 2008.
[132] Manohar, K., Brunton, S. L., and Kutz, J. N., “Environmental Identi- https://doi.org/10.2514/6.2008-2591
fication in Flight Using Sparse Approximation of Wing Strain,” Jour- [152] Rallabhandi, S. K., and Mavris, D. N., “Simultaneous Airframe and
nal of Fluids and Structures, Vol. 70, April 2017, pp. 162–180. Propulsion Cycle Optimization for Supersonic Aircraft Design,” Jour-
https://doi.org/10.1016/j.jfluidstructs.2017.01.008 nal of Aircraft, Vol. 45, No. 1, 2008, pp. 38–55.
[133] Maute, K., Nikbay, M., and Farhat, C., “Coupled Analytical Sensitivity https://doi.org/10.2514/1.33183
Analysis and Optimization of Three-Dimensional Nonlinear Aeroelas- [153] Manan, A., and Cooper, J., “Design of Composite Wings Including
tic Systems,” AIAA Journal, Vol. 39, No. 11, 2001, pp. 2051–2061. Uncertainties: A Probabilistic Approach,” Journal of Aircraft, Vol. 46,
https://doi.org/10.2514/2.1227 No. 2, 2009, pp. 601–607.
[134] Dupuis, R., Jouhaud, J.-C., and Sagaut, P., “Surrogate Modeling of https://doi.org/10.2514/1.39138
Aerodynamic Simulations for Multiple Operating Conditions [154] Henderson, R. P., Martins, J. R. A. A., and Perez, R. E., “Aircraft
Using Machine Learning,” AIAA Journal, Vol. 56, No. 9, 2018, Conceptual Design for Optimal Environmental Performance,” Aero-
pp. 3622–3635. nautical Journal, Vol. 116, No. 1175, 2012, pp. 1–22.
https://doi.org/10.2514/1.J056405 https://doi.org/10.1017/S000192400000659X
[135] Swischuk, R., Kramer, B., Huang, C., and Willcox, K., “Learning [155] Martins, J. R. A. A., and Lambe, A., “Multidisciplinary Design Opti-
Physics-Based Reduced-Order Models for a Single-Injector Combus- mization: A Survey of Architectures,” AIAA Journal, Vol. 51, No. 9,
tion Process,” AIAA Journal, Vol. 58, No. 6, 2020, pp. 2658–2672. 2013, pp. 2049–2075.
https://doi.org/10.2514/1.J058943 https://doi.org/10.2514/1.J051895
[136] Bongard, J., and Lipson, H., “Automated Reverse Engineering of [156] Bons, N., and Martins, J. R. A. A., “Aerostructural Wing Design
Nonlinear Dynamical Systems,” Proceedings of the National Academy Exploration with Multidisciplinary Design Optimization,” AIAA Sci-
of Sciences, Vol. 104, No. 24, 2007, pp. 9943–9948. tech 2020 Forum, AIAA Paper 2020-0544, 2020.
https://doi.org/10.1073/pnas.0609476104 https://doi.org/10.2514/6.2020-0544
[137] Schmidt, M., and Lipson, H., “Distilling Free-Form Natural Laws from [157] Baran, I., Cinar, K., Ersoy, N., Akkerman, R., and Hattel, J. H., “A
Experimental Data,” Science, Vol. 324, No. 5923, 2009, pp. 81–85. Review on the Mechanical Modeling of Composite Manufacturing
https://doi.org/10.1126/science.1165893 Processes,” Archives of Computational Methods in Engineering,
[138] Brunton, S. L., Brunton, B. W., Proctor, J. L., Kaiser, E., and Kutz, Vol. 24, No. 2, 2017, pp. 365–395.
J. N., “Chaos as an Intermittently Forced Linear System,” Nature https://doi.org/10.1007/s11831-016-9167-2
Communications, Vol. 8, No. 19, 2017, pp. 1–9. [158] Singh, S., Shebab, E., Higgins, N., Fowler, K., Tomiyama, T., and
https://doi.org/10.1038/s41467-017-00030-8 Fowler, C., “Challenges of Digital Twin in High Value Manufacturing,”
[139] Lee, K., and Carlberg, K. T., “Model Reduction of Dynamical Systems on SAE TP 2018-01-1928, 2018.
Nonlinear Manifolds Using Deep Convolutional Autoencoders,” Journal https://doi.org/10.4271/2018-01-1928
of Computational Physics, Vol. 404, March 2020, Paper 108973. [159] Manohar, K., Hogan, T., Buttrick, J., Banerjee, A. G., Kutz, J. N., and
https://doi.org/10.1016/j.jcp.2019.108973 Brunton, S. L., “Predicting Shim Gaps in Aircraft Assembly with
[140] Champion, K., Lusch, B., Kutz, J. N., and Brunton, S. L., “Data- Machine Learning and Sparse Sensing,” Journal of Manufacturing
Driven Discovery of Coordinates and Governing Equations,” Proceed- Systems, Vol. 48, July 2018, pp. 87–95.
ings of the National Academy of Sciences, Vol. 116, No. 45, 2019, https://doi.org/10.1016/j.jmsy.2018.01.011
pp. 22,445–22,451. [160] Conduit, B. D., Jones, N. G., Stone, H. J., and Conduit, G. J., “Design
https://doi.org/10.1073/pnas.1906995116 of a Nickel-Base Superalloy Using a Neural Network,” Materials &
[141] Kennedy, M. C., and O’Hagan, A., “Bayesian Calibration of Computer Design, Vol. 131, Oct. 2017, pp. 358–365.
Models,” Journal of the Royal Statistical Society: Series B (Statistical https://doi.org/10.1016/j.matdes.2017.06.007
Methodology), Vol. 63, No. 3, Aug. 2001, pp. 425–464. [161] Verpoort, P. C., MacDonald, P., and Conduit, G. J., “Materials Data
https://doi.org/10.1111/1467-9868.00294 Validation and Imputation with an Artificial Neural Network,”
[142] Arendt, P., Apley, D. W., and Chen, W., “Quantification of Model Computational Materials Science, Vol. 147, May 2018, pp. 176–
Uncertainty: Calibration, Model Discrepancy, and Identifiability,” 185.
Journal of Mechanical Design, Vol. 134, No. 10, 2012, Paper 100908. https://doi.org/10.1016/j.commatsci.2018.02.002
https://doi.org/10.1115/1.4007390 [162] Conduit, B. D., Jones, N. G., Stone, H. J., and Conduit, G. J., “Prob-
[143] Quiñonero-Candela, J., and Rasmussen, C. E., “A Unifying View of abilistic Design of a Molybdenum-Base Alloy Using a Neural Net-
Sparse Approximate Gaussian Process Regression,” Journal of work,” Scripta Materialia, Vol. 146, March 2018, pp. 82–86.
Machine Learning Research, Vol. 6, Dec. 2005, pp. 1939–1959. https://doi.org/10.1016/j.scriptamat.2017.11.008
https://doi.org/10.5555/1046920.1194909 [163] Green, A. G., Conduit, G., and Krüger, F., “Quantum Order-by-
[144] Kaheman, K., Kaiser, E., Strom, B., Kutz, J. N., and Brunton, S. L., Disorder in Strongly Correlated Metals,” Annual Review of Condensed
“Learning Discrepancy Models from Experimental Data,” arXiv pre- Matter Physics, Vol. 9, March 2018, pp. 59–77.
print arXiv:1909.08574, 2019. https://doi.org/10.1146/annurev-conmatphys-033117-053925
[145] Raymer, D., Aircraft Design: A Conceptual Approach, AIAA, Reston, [164] Sachs, U., “Friction and Bending in Thermoplastic Composites
VA, 2012. Forming Processes,” Ph.D. Thesis, Univ. of Twente, Enschede,
https://doi.org/10.2514/4.104909 The Netherlands, 2014.
[146] Oswald, W. B., “The Transverse Force Distribution on Ellipsoidal and [165] Hannappel, S. P., “Forming of UD Fibre Reinforced Thermoplastics,”
Nearly Ellipsoidal Bodies Moving in an Arbitrary Potential Flow,” Ph.D. Thesis, Univ. of Twente, Enschede, The Netherlands, 2013.
Ph.D. Thesis, California Inst. of Technology, Pasadena, CA, 1932. [166] Juarez, P., Gregory, E., and Cramer, K., “In Situ Thermal Inspection of
https://doi.org/10.7907/QK4H-C181 Automated Fiber Placement Manufacturing,” SAMPE Journal,
[147] Oswald, W. B., “General Formulas and Charts for the Calculation of Vol. 2102, No. 1, 2020, Paper 120005.
Airplane Performance,” NACA TR-408, 1932. https://doi.org/10.1063/1.5099847
[148] Cramer, E. J., Dennis, J. E., Jr., Frank, P. D., Lewis, R. M., and Shubin, [167] Cemenska, J., Rudberg, T., and Henscheid, M., “Automated In-Process
G. R., “Problem Formulation for Multidisciplinary Optimization,” Inspection System for AFP Machines,” SAE International Journal of
SIAM Journal on Optimization, Vol. 4, No. 4, 1994, pp. 754–776. Aerospace, Vol. 8, Sept. 2015, pp. 303–309.
https://doi.org/10.1137/0804044 https://doi.org/10.4271/2015-01-2608
[149] Booker, A. J., Dennis, J. E., Frank, P. D., Serafini, D. B., Torczon, V., [168] Sacco, C., Radwan, A. B., Beatty, T., and Harik, R., “Machine Learning
and Trosset, M. W., “A Rigorous Framework for Optimization of Based AFP Inspection: A Tool for Characterization and Integration,”
Expensive Functions by Surrogates,” Structural Optimization, Vol. 17, SAMPE Journal, 2020.
No. 1, 1999, pp. 1–13. https://doi.org/10.33599/nasampe/s.19.1594
https://doi.org/10.1007/BF01197708 [169] Blom, A. W., “Structural Performance of Fiber-Placed, Variable-
[150] Mavris, D. N., Bandte, O., and DeLaurentis, D. A., “Robust Design Stiffness Composite Conical and Cylindrical Shells,” Ph.D. Thesis,
Simulation: A Probabilistic Approach to Multidisciplinary Design,” Delft Univ. of Technology, Delft, The Netherlands, 2010.
BRUNTON ET AL. 2847
[170] Cohn, D. A., Ghahramani, Z., and Jordan, M. I., “Active Learning with [181] Muelaner, J. E., Martin, O. C., and Maropoulos, P. G., “Achieving Low
Statistical Models,” Journal of Artificial Intelligence Research, Vol. 4, Cost and High Quality Aero Structure Assembly Through Integrated
March 1996, pp. 129–145. Digital Metrology Systems,” Procedia CIRP, Vol. 7, Jan. 2013,
https://doi.org/10.1613/jair.295 pp. 688–693.
[171] Settles, B., “Active Learning Literature Survey,” Dept. of Computer https://doi.org/10.1016/j.procir.2013.06.054
Sciences, Univ. of Wisconsin-Madison TR 1648, 2009. [182] Boyl-Davis, T. M., Jones, D. D., and Zimmerman, T. E., “Digitally
[172] de Silva, B. M., Callaham, J., Jonker, J., Goebel, N., Klemisch, J., Designed Shims for Joining Parts of an Assembly,” U.S. Patent
McDonald, D., Hicks, N., Kutz, J. N., Brunton, S. L., and Aravkin, A. US9429935B2, 2014.
Y., “Physics-Informed Machine Learning for Sensor Fault Detection [183] Vasquez, C. M., Boyl-Davis, T. M., Valenzuela, D. I., and Jones, D. D.,
with Flight Test Data,” arXiv preprint arXiv:2006.13380, 2020. “Systems and Methods for Robotic Measurement of Parts,” U.S. Patent
[173] Williams, M. O., Rowley, C. W., Mezić, I., and Kevrekidis, I. G., “Data US9958854B2, 2014.
Fusion via Intrinsic Dynamic Variables: An Application of Data- [184] Valenzuela, D. I., Boyl-Davis, T. M., and Jones, D. D., “Systems,
Driven Koopman Spectral Analysis,” EPL, Vol. 109, No. 4, 2015, Methods, and Apparatus for Automated Predictive Shimming for
Paper 40007. Large Structures,” U.S. Patent US9599983B2, 2015.
https://doi.org/10.1209/0295-5075/109/40007 [185] Boyl-Davis, T. M., Jones, D. D., and Zimmerman, T. E., “Methods of
[174] Marsh, B. J., “Laser Tracker Assisted Aircraft Machining and Fabricating Shims for Joining Parts,” U.S. Patent US9429935B2, Aug.
Assembly,” SAE TP 2008-01-2313, 2008. 30, 2016.
https://doi.org/10.4271/2008-01-2313 [186] Antolin-Urbaneja, J. C., Livinalli, J., Puerto, M., Liceaga, M.,
[175] Jamshidi, J., Kayani, A., Iravani, P., Maropoulos, P. G., and Summers, Rubio, A., San-Roman, A., and Goenaga, I., “End-Effector for
M. D., “Manufacturing and Assembly Automation by Integrated Met- Automatic Shimming of Composites,” SAE TP 2016-01-2111,
rology Systems for Aircraft Wing Fabrication,” Proceedings of the 2016.
Institution of Mechanical Engineers, Part B: Journal of Engineering https://doi.org/10.4271/2016-01-2111
Manufacture, Vol. 224, No. 1, 2010, pp. 25–36. [187] Golub, G., and Kahan, W., “Calculating the Singular Values and
https://doi.org/10.1243/09544054JEM1280 Pseudo-Inverse of a Matrix,” Journal of the Society for Industrial &
[176] Muelaner, J. E., and Maropoulos, P. G., “Design for Measurement Applied Mathematics, Series B: Numerical Analysis, Vol. 2, No. 2,
Assisted Determinate Assembly (MADA) of Large Composite Struc- 1965, pp. 205–224.
tures,” Journal of the Coordinate Metrology Systems Conference, https://doi.org/10.1137/0702016
2010. [188] Brunton, B. W., Brunton, S. L., Proctor, J. L., and Kutz, J. N., “Sparse
[177] Marsh, B. J., Vanderwiel, T., VanScotter, K., and Thompson, M., Sensor Placement Optimization for Classification,” SIAM Journal on
“Method for Fitting Part Assemblies,” U.S. Patent US7756321B2, Applied Mathematics, Vol. 76, No. 5, 2016, pp. 2099–2122.
2010. https://doi.org/10.1137/15M1036713
[178] Muelaner, J. E., Kayani, A., Martin, O., and Maropoulos, P., “Meas- [189] Johnson, W., “Model for Vortex Ring State Influence on Rotorcraft
urement Assisted Assembly and the Roadmap to Part-to-Part Flight Dynamics,” NASA TP-2005-213477, 2005.
Assembly,” Proceedings of DET2011 7th International Conference [190] Gray, G. J., “Report of the Panel to Review the V-22 Program,” Tech.
on Digital Enterprise Technology, Univ. of Bath, Sept. 2011, Rept., Dept. of Defense Panel Review, 2001.
pp. 11–19. [191] Du, T., Schulz, A., Zhu, B., Bickel, B., and Matusik, W., “Computa-
[179] Chouvion, B., Popov, A., Ratchev, S., Mason, C., and Summers, M., tional Multicopter Design,” ACM Transactions, Vol. 35, No. 6, 2016.
“Interface Management in Wing-Box Assembly,” SAE TP 2011-01- https://doi.org/10.1145/2980179.2982427
2640, 2011. [192] Patil, T. H. D. J., and Davenport, T., “Data Scientist: The Sexiest Job of
https://doi.org/10.4271/2011-01-2640 the 21st Century,” Harvard Business Review, Vol. 90, No. 10, 2012,
[180] Muelaner, J. E., and Maropoulos, P., “Integrated Dimensional Variation pp. 70–76.
Management in the Digital Factory,” Proceedings of DET2011 7th
International Conference on Digital Enterprise Technology, Univ. of P. Givi
Bath, Sept. 2011, pp. 39–46. Associate Editor

