A Deep Learning Framework For Solution and Discovery in Solid Mechanics
Ehsan Haghighat (a), Maziar Raissi (b), Adrian Moure (c), Hector Gomez (c), Ruben Juanes (a)
(a) Massachusetts Institute of Technology, Cambridge, MA
(b) University of Colorado Boulder, Boulder, CO
(c) Purdue University, West Lafayette, IN
Abstract
We present the application of a class of deep learning methods, known as Physics-Informed Neural
Networks (PINN), to learning and discovery in solid mechanics. We explain how to incorporate the
momentum balance and constitutive relations into PINN, explore in detail the application to linear
elasticity, and illustrate its extension to nonlinear problems through an example that showcases
von Mises elastoplasticity. While common PINN algorithms are based on training one deep
neural network (DNN), we propose a multi-network model that results in more accurate repre-
sentation of the field variables. To validate the model, we test the framework on synthetic data
generated from analytical and numerical reference solutions. We study convergence of the PINN
model, and show that Isogeometric Analysis (IGA) results in superior accuracy and convergence
characteristics compared with classic low-order Finite Element Method (FEM). We also show the
applicability of the framework for transfer learning, and find vastly accelerated convergence during
network re-training. Finally, we find that honoring the physics leads to improved robustness: when
trained only on a few parameters, we find that the PINN model can accurately predict the solution
for a wide range of parameters new to the network—thus pointing to an important application of
this framework to sensitivity analysis and surrogate modeling.
Keywords: Artificial neural network, Physics-informed deep learning, Inversion, Transfer
learning, Linear elasticity, Elastoplasticity
1. Introduction
Over the past few years, there has been a revolution in the successful application of Artificial
Neural Networks (ANN), also commonly referred to as Deep Neural Networks (DNN) and Deep
Learning (DL), in various fields including image classification, handwriting recognition, speech
recognition and translation, and computer vision. These ANN approaches have led to a sea change
in the performance of search engines, autonomous driving, e-commerce, and photography (see [1,
2, 3] for a review). In engineering and science, ANNs have been applied to an increasing number
of areas, including geosciences [4, 5, 6, 7, 8], material science [9, 10, 11, 12], fluid mechanics
[13, 14], genetics [15], and infrastructure health monitoring [16, 17], to name a few examples.
In the solid and geomechanics community, deep learning has been used primarily for material
modeling, in an attempt to replace classical constitutive models with ANNs [18, 19, 20]. In these
applications, training of the network, i.e., evaluation of the network parameters, is carried out by
satisfy the physics constraints, it is in effect a surrogate model that can be used for extrapolation on
unexplored data. To test this property, we train a network on four datasets with different parameters
and then test it on a wide range of new parameter sets, and find that the results remain relatively
accurate. This property points to the applicability of PINN models for sensitivity analysis, where
classical approaches typically require an exceedingly large number of forward simulations.
$$
\sigma_{ij,j} + f_i = 0, \qquad
\sigma_{ij} = \lambda\,\delta_{ij}\,\varepsilon_{kk} + 2\mu\,\varepsilon_{ij}, \qquad
\varepsilon_{ij} = \tfrac{1}{2}\left(u_{i,j} + u_{j,i}\right). \tag{1}
$$
Here, σij denotes the Cauchy stress tensor. For the two-dimensional problems considered here,
i, j = 1, 2 (or i, j = x, y). We use the summation convention, and a subscript comma denotes
partial differentiation. The function fi denotes a body force, ui represents the displacements, εij is
the infinitesimal strain tensor, and δij is the Kronecker delta. The Lamé parameters λ and µ are the
quantities to be inferred using PINN.
$$
z_l = \sigma_l\left(W_l\, z_{l-1} + b_l\right), \qquad l = 1, \dots, L, \tag{2}
$$
where z0 ≡ x and zL ≡ y are inputs and outputs of the model, Wl , bl are parameters of each
layer l, known as weights and biases, respectively. The functions σ l are called activation functions
and make the network nonlinear with respect to the inputs. For instance, an ANN functional of
some field variable, such as displacement u(x), with three hidden layers and with σ l = tanh as the
activation function for all layers except the last can be written as
$$
\begin{aligned}
z_1(x) &= \tanh(W_0\, x + b_0),\\
z_2(x) &= \tanh(W_1\, z_1 + b_1),\\
z_3(x) &= \tanh(W_2\, z_2 + b_2),\\
u(x)   &= W_3\, z_3 + b_3.
\end{aligned} \tag{3}
$$
This model can be considered as an approximate solution for the field variable u of a partial differ-
ential equation.
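As an illustration (not the authors' implementation), the functional in Eq. (3) can be written in a few lines of Keras; the layer width of 20 neurons and the single spatial input are placeholder choices:

```python
import tensorflow as tf

# A minimal sketch of Eq. (3): three tanh hidden layers followed by a linear
# output layer, approximating a scalar field u(x). Widths are illustrative.
u_net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),             # input feature: x
    tf.keras.layers.Dense(20, activation="tanh"),  # z1 = tanh(W0 x + b0)
    tf.keras.layers.Dense(20, activation="tanh"),  # z2 = tanh(W1 z1 + b1)
    tf.keras.layers.Dense(20, activation="tanh"),  # z3 = tanh(W2 z2 + b2)
    tf.keras.layers.Dense(1),                      # u = W3 z3 + b3 (linear)
])
```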
In the PINN architecture, the network inputs (also known as features) are space and time vari-
ables, i.e., (x, y, z, t) in Cartesian coordinates, which makes it meaningful to perform the differenti-
ation of the network’s output with respect to any of the input variables. Classical implementations
based on finite difference approximations are not accurate when applied to deep networks (see
[35] for a review). Thanks to modern graph-based implementation of the feed-forward network
(e.g., Theano [29], Tensorflow [30], MXNet [36]), this can be carried out using Automatic Differ-
entiation at machine precision, therefore allowing for many hidden layers to represent nonlinear
response. Hence, evaluation of a partial differential operator P acting on u is achieved naturally
with graph-based differentiation and can then be incorporated in the cost function along with initial
and boundary conditions as:
where ∂Ω is the domain boundary, $u_0 - u_0^*$ is the initial condition at $t = t_0$, and $0^*$ indicates the
expected (true) value for the differential relation $\mathcal{P}u$ at any given training point. The norm $|\cdot|$ of a
generic quantity $g$ defined in $\Omega$ denotes $\frac{1}{N}\sum_{i=1}^{N} g(x_i)^2$, where the $x_i$'s are the spatial points where
the data is known. The dataset is then fed to the neural network and an optimization is performed
to evaluate all the parameters of the model, including the parameters of the PDE.
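To make the graph-based differentiation concrete, the following TensorFlow sketch (our own illustration, with hypothetical network and array names) differentiates stress networks with respect to the spatial inputs and assembles a mean-squared residual of the x-momentum balance in Eq. (1):

```python
import tensorflow as tf

def momentum_residual_x(sxx_net, sxy_net, xy, fx):
    # sxx_net, sxy_net: networks mapping (x, y) -> a stress component.
    # xy: collocation points, shape (N, 2); fx: body force values, shape (N, 1).
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(xy)
        sxx = sxx_net(xy)
        sxy = sxy_net(xy)
    dsxx = tape.gradient(sxx, xy)   # columns: d(sxx)/dx, d(sxx)/dy
    dsxy = tape.gradient(sxy, xy)   # columns: d(sxy)/dx, d(sxy)/dy
    # Residual of sigma_xx,x + sigma_xy,y + fx = 0, averaged over the points.
    residual = dsxx[:, 0:1] + dsxy[:, 1:2] + fx
    return tf.reduce_mean(tf.square(residual))
```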
that we could realistically have in practice. For all examples, unless otherwise noted, we use a
batch-size of 64, a limit of 10,000 epochs with shuffling, and a patience of 500 to perform the
training.
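In a Keras-style workflow these settings correspond to the sketch below; the toy model and random arrays are placeholders only, while the batch size, epoch limit, shuffling, and early-stopping patience mirror the values quoted above.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for the PINN model and the training data, to keep the
# snippet self-contained; they are not the networks used in the paper.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
X = np.random.rand(1000, 2)   # placeholder (x, y) samples
Y = np.random.rand(1000, 1)   # placeholder target values

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="loss", patience=500, restore_best_weights=True)

model.fit(X, Y,
          batch_size=64,    # batch size of 64
          epochs=10000,     # limit of 10,000 epochs
          shuffle=True,     # shuffling between epochs
          callbacks=[early_stop])
```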
Figure 2: Exact solution in Eqs. (6)–(7) for parameter values of λ = 1, µ = 0.5, and Q = 4.
necessarily follow any cross-dependence between variables in the governing equations (1). Our
results show that using separate networks for each variable is a far more effective strategy.
Therefore, we propose to have variables ux , uy , σxx , σyy , σxy defined as independent ANNs as our
architecture of choice (see Fig. 4), i.e.
Figure 3: Potential PINN network choices, with ux and uy as outputs of a single network (left), or outputs of two
independent networks with different parameters (right).
The quantities with asterisks represent given data. We will train the networks so that their output
values are as close as possible to the data, which may be real field data or, in this paper, synthetic
data from the exact solution to the problem or the result of a high-fidelity simulation. The values
without asterisk represent either direct outputs of the network (e.g., ux or σxx ; see Eq. (8)) or
quantities obtained through automatic graph-based differentiation [35] of the network outputs (e.g.,
εxx = ux,x). In Eq. (9), fx∗ and fy∗ represent data on the body forces obtained as $f_i^* = -\sigma^*_{ij,j}$.
The different terms in the cost function represent measures of the error in the displacement
and stress fields, the momentum balance, and the constitutive law. This cost function can be used
for deep-learning-based solution of PDEs as well as for identification of the model parameters.
For the solution of PDEs, λ and µ are treated as fixed numbers in the network. For parameter
identification, λ and µ are treated as network parameters that change during the training phase
(see Fig. 4). In TensorFlow [30] this can be accomplished by defining λ and µ as Constant (PDE
solution) or Variable (parameter identification) objects, respectively. We set up the problem
using the SciANN [40] framework, a high-level Keras [37] wrapper for physics-informed deep
learning and scientific computations. Experimenting with all of the previously mentioned network
choices can be easily done in SciANN with minimal coding.1
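The distinction can be sketched in plain TensorFlow (an illustration of the idea rather than the SciANN code used here; lmbd and mu are our own placeholder names):

```python
import tensorflow as tf

# Forward problem (PDE solution): the Lame parameters are fixed constants.
lmbd = tf.constant(1.0)
mu = tf.constant(0.5)

# Inverse problem (parameter identification): the Lame parameters are
# trainable variables, updated by the optimizer with the network weights.
lmbd = tf.Variable(1.0, trainable=True, name="lambda")
mu = tf.Variable(1.0, trainable=True, name="mu")

# Either way they enter the constitutive terms of the loss, e.g. (plane strain)
# sigma_xx_pred = lmbd * (eps_xx + eps_yy) + 2.0 * mu * eps_xx
```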
Figure 4: Network architecture of choice used in this study. We define five networks, one for each variable of interest,
i.e., ux , uy , σxx , σyy , σxy . Each network has (x, y) as input features.
1 The code for some of the examples solved here is available at: https://github.com/sciann/examples.
Table 1: Statistics of the networks of choice to perform PINN learning.

Network   Layers   Neurons   Parameters (independent networks)   Parameters (single network)
i         5        20        12336                               1893
ii        5        50        72816                               10713
iii       10       20        27036                               3993
iv        10       50        162066                              23463
(b) Force-complete data: In this scenario, we have data at a set of points for the displacements,
their first derivatives and their second derivatives. The availability of the displacement sec-
ond derivatives allows us to determine data for the body forces fx∗ and fy∗ using the momen-
tum balance equation without resorting to any differentiation algorithm.
In Fig. 5 we compare the evolution of the cost function for stress-complete data (Fig. 5a) and
force-complete data (Fig. 5b). Both figures show a comparison of the four network architectures
that we study; see Table 1. We find that training on the force-complete data performs slightly better
(lower loss) at a given epoch.
The convergence of the model identification is shown in Fig. 6. The training converges to
the true parameter values, i.e., λ = 1 and µ = 1/2, for all cases. We find that the optimization converges
quickly for the parameters, while it takes far more epochs to fit the networks to the field variables.
Additionally, we observe that deeper networks produce less accurate parameters. We attribute the
loss of accuracy as we increase the ANN complexity to over-fitting [1, 3]. Convergence of the
individual terms in the loss function (9) is shown in Fig. 7 for Net-ii (see Table 1). We find that all
terms in the loss, i.e., data-driven and physics-informed, show oscillations during the optimization.
Therefore, no individual term is solely responsible for the oscillations in the total loss (Fig. 5).
Figure 5: The result of training networks i, ii, iii, and iv on the analytical data set ux , uy , σxx , σyy , and σxy ; (a) body
forces are evaluated from central-difference differentiation of stress components, (b) body forces are also given ana-
lytically.
The impact of the ANN functional form can be examined by comparing the data in Figs. 5b and
8a, which show the evolution of the cost function using the activation functions tanh and ReLU, re-
spectively. The function ReLU has discontinuous derivatives, which explains its poor performance
Figure 6: The result of identification for λ = 1, µ = 1/2 for networks i, ii, iii, and iv on the analytical data set ux , uy ,
σxx , σyy , and σxy ; (a) body forces are evaluated from central-difference differentiation of stress components, (b) body
forces are also given analytically.
for physics-informed deep learning, whose effectiveness relies heavily on accurate evaluation of
derivatives.
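One concrete symptom, checked here with a small TensorFlow sketch of our own (not from the paper), is that second derivatives of a ReLU network with respect to its input vanish almost everywhere, whereas a tanh network yields smooth, generally nonzero values:

```python
import tensorflow as tf

x = tf.linspace(-1.0, 1.0, 5)[:, None]

def second_derivative(activation):
    net = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,)),
        tf.keras.layers.Dense(20, activation=activation),
        tf.keras.layers.Dense(1),
    ])
    with tf.GradientTape() as outer:
        outer.watch(x)
        with tf.GradientTape() as inner:
            inner.watch(x)
            u = net(x)
        du = inner.gradient(u, x)   # first derivative du/dx
    return outer.gradient(du, x)    # second derivative d2u/dx2

print(second_derivative("relu"))   # zero almost everywhere
print(second_derivative("tanh"))   # smooth, generally nonzero
```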
A comparison of Figs. 5b and 8b shows that using independent networks for displacements
and stresses is more effective than using a single network. We find that the single network leads
to less accurate elastic parameters because the cross-dependencies of the network outputs through
the kinematic and constitutive relations may not be adequately represented by the tanh activation
Figure 7: Individual terms of total loss (9) for network ii on the analytical data set ux , uy , σxx , σyy , and σxy ;
(a) body forces are evaluated from central-difference differentiation of stress components, (b) body forces are also
given analytically.
function.
Figure 8: (a) ReLU activation function on the analytical data set ux , uy , σxx , σyy , σxy , fx , and fy . (b) Connected
network.
Fig. 9 analyzes the effect of availability of data on the training. We computed the exact solution
on four different uniform grids of size 10 × 10, 40 × 40, 160 × 160, and 640 × 640; and carried
out the parameter identification process. We performed the comparison using force-complete data
and a network with 10 layers and 20 neurons per layer (network iii). The training process found
good approximations to the parameters for all cases, including that with only 10 × 10 points. The
results show that fewer data points require many more epoch cycles, but the overall computational
cost is far lower.
Figure 9: Training on datasets of different sizes. The parameters are all accurately identified; however, the solution
is resolved with different levels of accuracy.
commercial FEM software COMSOL [41]. We evaluate the FEM displacements, strains, stresses
and stress derivatives at the center of each element. Then, we map the data to a 100 × 100 training
grid using SciPy’s griddata module with cubic interpolation. This step is performed as a data-
augmentation procedure, which is a common practice in machine learning [1].
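The mapping onto the training grid can be done with SciPy's griddata; the sketch below uses placeholder arrays in place of the actual element-center coordinates and sampled fields.

```python
import numpy as np
from scipy.interpolate import griddata

# Placeholder FEM output: a field sampled at element centers (stand-in data).
centers = np.random.rand(5000, 2)                 # (x, y) element-center coordinates
ux_fem = np.sin(centers[:, 0]) * centers[:, 1]    # stand-in displacement values

# Regular 100 x 100 training grid over the unit square.
gx, gy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))

# Cubic interpolation of the scattered FEM data onto the training grid.
ux_grid = griddata(centers, ux_fem, (gx, gy), method="cubic")
```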
To analyze the importance of data satisfying the governing equations of the system, we focus
our attention on network ii and we study cases with stress- and force-complete data. The results of
training are presented in Fig. 10. As can be seen here, the bilinear element performs poorly on the
learning and identification. The performance of training on the other elements is good, comparable
to that using the analytical solution. Further analysis shows that this is indeed expected as FEM
differentiation of bilinear elements provides a poor approximation of the body forces. The error
in the body forces is shown in Fig. 11, which indicates a high error for bilinear elements. We
conclude that the standard bilinear elements are not suitable for this problem to generate numer-
ical data for deep learning. Fig. 10a2 confirms that pre-processing the data can remove the error
that was present in the numerical solution with bilinear elements, and enable the optimization to
successfully complete the identification.
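The pre-processing referred to here, evaluating the body forces by central-difference differentiation of the gridded stress components, can be sketched with NumPy (placeholder arrays; we assume rows index y and columns index x):

```python
import numpy as np

# Gridded stress components on a uniform 100 x 100 grid with spacing h
# (placeholders; in practice these come from the interpolated FEM data).
h = 1.0 / 99
sxx = np.random.rand(100, 100)
syy = np.random.rand(100, 100)
sxy = np.random.rand(100, 100)

# np.gradient uses central differences in the interior of the grid.
dsxx_dx = np.gradient(sxx, h, axis=1)   # d(sigma_xx)/dx
dsxy_dy = np.gradient(sxy, h, axis=0)   # d(sigma_xy)/dy
dsxy_dx = np.gradient(sxy, h, axis=1)
dsyy_dy = np.gradient(syy, h, axis=0)

# Body-force data from the momentum balance, f_i* = -sigma_ij,j (see Eq. (9)).
fx = -(dsxx_dx + dsxy_dy)
fy = -(dsxy_dx + dsyy_dy)
```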
Figure 10: (a1) Training on the FEM dataset using the ux, uy, σxx, σyy, σxy, fx and fy components. (a2) Training with
body forces fx and fy evaluated from central-difference differentiation of the stress components.
Figure 11: The error in the bilinear, biquadratic, bicubic, and biquartic FEM data, evaluated as the difference between
the FEM evaluation of the momentum relation, i.e., σij,j, and the true body forces fi∗, in the x (top) and y (bottom) directions.
train a representative PINN in highly instrumented regions and use it at other locations with
limited observational datasets. To this end, we use the pre-trained model on Net-iii (Fig. 5), which
was trained on a dataset with λ = 1.0 and µ = 0.5, and then we explore how the loss evolves and
the training converges when data is generated with different values of µ ∈ {2.0, 1.5, 1.0, 0.1}.
In Fig. 13 we show the convergence of the model with different datasets. Note that the loss
is normalized by the initial value L0 from the pre-trained network on µ = 0.5 (Fig. 5). As can
be seen here, re-training on new datasets costs only a few hundred epochs and starts from a smaller
initial value of the loss. This points to an advantage of deep learning and PINN: retraining on
similar data is much less costly than classical methods that rely on forward simulations.
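In a Keras-style workflow, this transfer learning amounts to re-using the trained weights as the initial guess for the new dataset. A minimal sketch, assuming a previously saved model and using placeholder names and arrays:

```python
import numpy as np
import tensorflow as tf

# Hypothetical pre-trained PINN model (e.g., Net-iii fit on mu = 0.5),
# saved earlier with model.save("pinn_net_iii.keras").
model = tf.keras.models.load_model("pinn_net_iii.keras")

# Placeholder arrays standing in for the new dataset (different mu).
X_new = np.random.rand(1000, 2)
Y_new = np.random.rand(1000, 1)

# The trained weights serve as the initial guess, so re-training typically
# converges within a few hundred epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=500)
model.fit(X_new, Y_new, batch_size=64, epochs=10000,
          shuffle=True, callbacks=[early_stop])
```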
Figure 12: (a1) Training on the IGA dataset using the ux, uy, σxx, σyy, σxy, fx and fy components. (a2) Learning with
body forces fx and fy evaluated from central-difference differentiation of the stress components.
Figure 13: Identification of a new dataset generated with different values of µ, using a neural network pre-trained on
µ = 0.5. The re-training takes far fewer epochs to converge, with a much smaller initial value of the loss L.
terms of stresses with a maximum error for near-incompressible conditions, µ ≈ 0.
Figure 14: Application to sensitivity analysis: the model is trained on multiple datasets generated with different values
of µ ∈ {1/4, 2/3, 3/2, 4} (highlighted in dot-dashed lines). The model is then tested on a continuous range of values
for µ ∈ (0, 9). The error is defined as | ◦ − ◦∗ |/| ◦∗ | at the point where ◦∗ is maximum.
The plastic part of the deformation tensor is evaluated through a plasticity model. The von Mises
model implies that plastic deformation occurs in the direction normal to a yield surface F,
defined as F(σij) := q − σY , as
$$
\varepsilon^p_{ij} = \gamma\,\frac{\partial \mathcal{F}}{\partial \sigma_{ij}}, \tag{12}
$$
where σY is the yield stress, q is the equivalent stress defined as $q = \sqrt{\tfrac{3}{2}\, s_{ij} s_{ij}}$, with sij the
components of the deviatoric stress tensor, $s_{ij} = \sigma_{ij} - \tfrac{1}{3}\sigma_{kk}\,\delta_{ij}$. The strain remains strictly elastic
as long as the state of stress σij remains inside the yield surface, F < 0. Plastic deformation occurs
when the state of stress is on the yield surface, F = 0. The condition F > 0 is associated with an
inadmissible state of stress. Parameter γ is the plastic multiplier, subject to the condition γ ≥ 0,
and evaluated through a predictor–corrector algorithm by imposing the condition F ≤ 0 [43]. In
the case of von Mises plasticity, the volumetric plastic deformation is zero, $\varepsilon^p_{kk} = 0$. It can be
shown that the plastic multiplier γ is equal to the equivalent plastic strain $\bar{\varepsilon}^p = \sqrt{\tfrac{2}{3}\, e^p_{ij} e^p_{ij}}$, where
eij are the components of the deviatoric strain tensor, $e_{ij} = \varepsilon_{ij} - \tfrac{1}{3}\varepsilon_{kk}\,\delta_{ij}$.
Therefore, the elastoplastic relations for a plane-strain problem can be summarized as:
$$
\begin{aligned}
&\sigma_{ij,j} + f_i = 0,\\
&\sigma_{ij} = \lambda\,\varepsilon_{kk}\,\delta_{ij} + 2\mu\,\varepsilon_{ij} = \left(\lambda + \tfrac{2}{3}\mu\right)\varepsilon_{kk}\,\delta_{ij} + s_{ij},\\
&s_{ij} = 2\mu\left(e_{ij} - e^p_{ij}\right),\\
&e^p_{ij} = \bar{\varepsilon}^p\,\frac{\partial \mathcal{F}}{\partial \sigma_{ij}} = \tfrac{3}{2}\,\bar{\varepsilon}^p\,\frac{s_{ij}}{q},
\end{aligned} \tag{13}
$$
subject to the loading/unloading (complementarity) conditions
$$
\bar{\varepsilon}^p \ge 0, \qquad \mathcal{F} \le 0, \qquad \bar{\varepsilon}^p\,\mathcal{F} = 0, \tag{14}
$$
also known as Karush–Kuhn–Tucker (KKT) conditions. For the von Mises model, the plastic
multiplier ε̄p can be expressed as
$$
\bar{\varepsilon}^p = \bar{\varepsilon} - \frac{\sigma_Y}{3\mu} \ge 0, \tag{15}
$$
where ε̄ is the total equivalent strain, i.e., $\bar{\varepsilon} = \sqrt{\tfrac{2}{3}\, e_{ij} e_{ij}}$. Therefore, the parameters of this model
are the Lamé elastic parameters λ and µ, and the yield stress σY .
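As a sanity check of these relations, the following NumPy sketch (our own illustration, with the strain components assumed given) evaluates the equivalent strain and the plastic multiplier of Eq. (15) at a single point:

```python
import numpy as np

# Shear modulus and yield stress as in the reference solution of Fig. 15.
mu, sigma_Y = 29.17e9, 243.0e6

def equivalent_plastic_strain(exx, eyy, ezz, exy):
    # Deviatoric strains, e_ij = eps_ij - eps_kk/3 delta_ij (exy is already deviatoric).
    ekk = exx + eyy + ezz
    e = np.array([exx - ekk / 3.0, eyy - ekk / 3.0, ezz - ekk / 3.0, exy])
    # Total equivalent strain, eps_bar = sqrt(2/3 e_ij e_ij); the off-diagonal
    # component appears twice in the double contraction.
    eps_bar = np.sqrt(2.0 / 3.0 * (e[0]**2 + e[1]**2 + e[2]**2 + 2.0 * e[3]**2))
    # Eq. (15): the response is purely elastic if the value would be negative.
    return max(eps_bar - sigma_Y / (3.0 * mu), 0.0)

print(equivalent_plastic_strain(3.0e-3, -1.0e-3, 0.0, 5.0e-4))
```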
The associated cost function is then defined as
$$
\begin{aligned}
\mathcal{L} ={}& |u_x - u_x^*| + |u_y - u_y^*| \\
&+ |\sigma_{xx} - \sigma_{xx}^*| + |\sigma_{yy} - \sigma_{yy}^*| + |\sigma_{zz} - \sigma_{zz}^*| + |\sigma_{xy} - \sigma_{xy}^*| \\
&+ |\sigma_{xx,x} + \sigma_{xy,y} - f_x^*| + |\sigma_{xy,x} + \sigma_{yy,y} - f_y^*| \\
&+ |(\lambda + \tfrac{2}{3}\mu)\,\varepsilon_{kk} + 2\mu\,(e_{xx} - e^p_{xx}) - \sigma_{xx}| \\
&+ |(\lambda + \tfrac{2}{3}\mu)\,\varepsilon_{kk} + 2\mu\,(e_{yy} - e^p_{yy}) - \sigma_{yy}| \\
&+ |(\lambda + \tfrac{2}{3}\mu)\,\varepsilon_{kk} + 2\mu\,(e_{zz} - e^p_{zz}) - \sigma_{zz}| \\
&+ |2\mu\,(e_{xy} - e^p_{xy}) - \sigma_{xy}| + |(\bar{\varepsilon} - \sigma_Y/3\mu) - \bar{\varepsilon}^p| \\
&+ |(1 - \operatorname{sign}(\bar{\varepsilon}^p))\,|\bar{\varepsilon}^p|| + |(1 + \operatorname{sign}(\mathcal{F}))\,|\mathcal{F}|| + |\bar{\varepsilon}^p\,\mathcal{F}|.
\end{aligned} \tag{17}
$$
The KKT positivity and negativity conditions are imposed through a penalty constraint in the loss
function. For instance, ε̄p ≥ 0 is incorporated in the loss as (1 − sign(ε̄p ))|ε̄p |. Therefore, for
values of ε̄p < 0, the resulting ‘cost’ is 2|ε̄p |, which should vanish.
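A minimal TensorFlow sketch of these penalty terms (our own illustration, with hypothetical tensor names) is:

```python
import tensorflow as tf

def kkt_penalty(eps_bar_p, F):
    # Penalty terms for eps_bar_p >= 0, F <= 0 and eps_bar_p * F = 0.
    # Each term is zero when its condition holds and positive when violated.
    pos = (1.0 - tf.sign(eps_bar_p)) * tf.abs(eps_bar_p)  # 2|eps_bar_p| if negative
    neg = (1.0 + tf.sign(F)) * tf.abs(F)                  # 2|F| if positive
    comp = tf.abs(eps_bar_p * F)                          # complementarity term
    return tf.reduce_mean(pos + neg + comp)
```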
Figure 15: Reference solution of extension loading of a perforated plate from a high-fidelity FEM simulation. The
true parameters are λ = 19.44 GPa, µ = 29.17 GPa and σY = 243.0 MPa.
Figure 16: Error in predicted values from the PINN framework for displacements, strains, plastic strains and stresses.
5. Conclusions
We study the application of a class of deep learning methods, known as Physics-Informed Neural Net-
works (PINN), for solution and discovery in solid mechanics. In this work, we formulate and apply
the framework to a linear elastostatics problem, which we analyze in detail, but then illustrate the
application of the method to nonlinear elastoplasticity. We study the sensitivity of the proposed
framework to noise in data coming from different numerical techniques. We find that the optimizer
performs much better on data from high-order classical finite elements, or with methods with en-
hanced continuity such as Isogeometric Analysis. We analyze the impact of the size and depth
of the network, and the size of the dataset from uniform sampling of the numerical solution—an
aspect that is important in practice given the cost of a dense monitoring network. We find that
the proposed PINN approach is able to converge to the solution and identify the parameters quite
efficiently with as few as 100 data points.
We also explore transfer learning, that is, the use of a pre-trained neural network to perform train-
ing on new datasets with different parameters. We find that training converges much faster when
this is done. Lastly, we study the applicability of the model as a surrogate model for sensitivity
analysis. To this end, we introduce shear modulus µ as an input variable to the network. When
training only on four values of µ, we find that the network predicts the solution quite accurately on
a wide range of values for µ, a feature that is indicative of the robustness of the approach.
Despite the success exhibited by the PINN approach, we have found that it faces challenges
when dealing with problems with discontinuous solutions. The network architecture is less accu-
rate on problems with localized high gradients as a result of discontinuities in the material proper-
ties or boundary conditions. We find that, in those cases, the results are artificially diffuse where
they should be sharp. We speculate that the underlying reason for this behavior is the particular
architecture of the network, where the input variables are only the spatial dimensions (x and y),
rendering the network unable to produce the required variability needed for gradient-based opti-
mization that would capture solutions with high gradients. Addressing this extension is an exciting
avenue for future work in machine-learning applications to solid mechanics.
Acknowledgements
This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale Reservoir
Science’.
References
[1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, Berlin, Heidelberg, 2006. URL: https://www.springer.com/gp/book/9780387310732.
[2] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. URL: https://doi.org/10.1038/nature14539. doi:10.1038/nature14539.
[3] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016. URL: https://www.deeplearningbook.org.
[7] Q. Kong, D. T. Trugman, Z. E. Ross, M. J. Bianco, B. J. Meade, P. Gerstoft, Machine learning in seismology: turning data into insights, Seismological Research Letters 90 (2018) 3–14. doi:10.1785/0220180259.
[10] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, A. Walsh, Machine learning for molecular and materials science, Nature 559 (2018) 547–555. URL: http://dx.doi.org/10.1038/s41586-018-0337-2. doi:10.1038/s41586-018-0337-2.
[11] Z. Shi, E. Tsymbalov, M. Dao, S. Suresh, A. Shapeev, J. Li, Deep elastic strain engineering of bandgap through machine learning, Proceedings of the National Academy of Sciences 116 (2019) 4117–4122. URL: http://www.pnas.org/lookup/doi/10.1073/pnas.1818555116. doi:10.1073/pnas.1818555116.
[12] S. L. Brunton, J. N. Kutz, Methods for data-driven multiscale model discovery for materials, Journal of Physics: Materials 2 (2019) 044002. URL: https://doi.org/10.1088/2515-7639/ab291e. doi:10.1088/2515-7639/ab291e.
[16] M. H. Rafiei, H. Adeli, A novel machine learning-based algorithm to detect damage in high-
rise building structures, Structural Design of Tall and Special Buildings 26 (2017) 1–11.
doi:10.1002/tal.1400.
[17] D. Sen, A. Aghazadeh, A. Mousavi, S. Nagarajaiah, R. Baraniuk, A. Dabak, Data-driven semi-supervised and supervised learning algorithms for health monitoring of pipes, Mechanical Systems and Signal Processing 131 (2019) 524–537. URL: https://doi.org/10.1016/j.ymssp.2019.06.003. doi:10.1016/j.ymssp.2019.06.003.
[18] J. Ghaboussi, D. Sidarta, New nested adaptive neural networks (NANN) for constitutive
modeling, Computers and Geotechnics 22 (1998) 29–52.
19 (1994) 1–25. URL: https://www.sciencedirect.com/science/article/pii/0895717794900957. doi:10.1016/0895-7177(94)90095-7.
[27] I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks 9 (1998) 987–1000. URL: https://ieeexplore.ieee.org/document/712178. doi:10.1109/72.712178.
[35] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in
machine learning: a survey, The Journal of Machine Learning Research 18 (2017) 5595–
5637. URL: https://dl.acm.org/doi/abs/10.5555/3122009.3242010.
[36] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang,
MXNet: A flexible and efficient machine learning library for heterogeneous distributed sys-
tems (2015). arXiv:1512.01274.
[39] J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and
stochastic optimization, Journal of Machine Learning Research 12 (2011) 2121–2159. URL:
http://jmlr.org/papers/v12/duchi11a.html.
[40] E. Haghighat, R. Juanes, SciANN: A Keras wrapper for scientific computations and physics-
informed deep learning using artificial neural networks, https://sciann.com, 2019.
URL: https://github.com/sciann/sciann.git.
[41] COMSOL, COMSOL Multiphysics User’s Guide, COMSOL, Stockholm, Sweden, 2020.