Estimating The Heat Equation Using Neural Networks
by Nate Jordre
April 2021
Mike Heroux
Thesis Advisor, Scientist in Residence
Jeremy Iverson
Faculty Reader, Assistant Professor of Computer Science
Robert Hesse
Faculty Reader, Associate Professor of Mathematics
Noreen Herzfeld
Chair, Department of Computer Science
Partial Differential Equations have important uses in many fields, including physics and engineering. Because of their importance, a great deal of research has gone into solving these problems efficiently and effectively. However, some PDEs are still challenging to solve using classical methods, often due to the dimensionality of the problem. In recent years, researchers have proposed that neural networks may be able to solve these problems effectively. This research assesses how well a neural network can estimate a simple example PDE, the heat equation, as well as the practicality of doing so.
1 Introduction
Partial Differential Equations (PDEs) have many important applications in scientific fields including physics, engineering, and finance. They are often used to model physical phenomena such as the diffusion of heat or sound, the flow of fluids, and electrodynamics [9]. Because PDEs are so important, a number of methods already exist to approximate their solutions. However, many of these methods become less practical as the dimensionality of the problem increases. As a result, a large amount of research is still being done on additional methods for solving PDEs. One research area that has received a lot of attention lately is the use of neural networks for solving PDEs. Neural networks are an important sector of machine learning and are loosely modeled after the human brain. They are promising for approximating the solutions of PDEs because of their strong function approximation capabilities [8].
There are many important PDEs, but this paper focuses on the heat equation, which describes the temperature distribution and diffusion of heat within an object. Most of my research dealt with the 1-dimensional heat equation because data was most readily available in this dimension. The 1-dimensional heat equation also serves as a starting point for assessing the ability of neural networks to approximate the heat equation. Because the 1-dimensional version is the simplest, it should also be the easiest to model with a neural network, providing a baseline estimate of how well neural networks can approximate the solutions to PDEs.
This paper reports how accurately different types of neural networks can estimate the heat equation. The networks used include standard artificial neural networks with several activation functions, a radial basis function network (RBF network), and a simple example of a “physics-informed neural network”.
explained and showed how to solve the 1-dimensional heat equation numerically using Python [10, 11]. By making some small modifications to his script, I was able to generate the training data I needed to train my model. Several characteristics of the particular problem this script solves are worth noting. The script simulates the flow of heat in a wall where heat flows only in the x direction. Some things are held constant across all simulations: the wall is always 10 cm thick, and it is always split into 20 uniformly sized nodes, each of which has its own temperature. Additionally, each node begins the simulation at an initial temperature of zero degrees, and the thermal diffusivity of the wall is held constant. Heat is applied at both boundaries of the wall, and the temperature at each boundary is held constant throughout a single simulation. Each simulation was run for a total of 30 seconds, with the temperature of each node updated every 0.1 seconds. So, for a given set of conditions, the simulation finds a temperature at each of the 20 nodes at every 0.1-second time step, up to 30 seconds. The neural network seeks to do the same: predict the temperature of a given node at a given time, given the two boundary temperatures. To gather training data, a number of simulations were run, each with a unique pairing of boundary temperatures.
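To make the setup above concrete, here is a minimal NumPy sketch of one such simulation. It is not the modified script itself. It assumes an explicit finite-difference scheme in which each boundary temperature acts one node spacing beyond the end node. It also assumes a thermal diffusivity of alpha = 1e-4 m^2/s; the text only says the diffusivity is held constant. The function name `simulate_wall` and its parameters are hypothetical.

```python
import numpy as np


def simulate_wall(t_left, t_right, alpha=1e-4, length=0.1, n=20,
                  t_initial=0.0, dt=0.1, t_final=30.0):
    """Explicit finite-difference solution of the 1-D heat equation in a wall.

    Returns an array of shape (steps, n): the temperature of every node after
    each time step dt, up to t_final. alpha (thermal diffusivity, m^2/s) is an
    assumed value, not taken from the original script.
    """
    dx = length / n                      # 20 nodes across a 0.1 m (10 cm) wall
    r = alpha * dt / dx**2               # the explicit scheme is stable only if r <= 0.5
    if r > 0.5:
        raise ValueError(f"unstable time step: alpha*dt/dx^2 = {r:.3f} > 0.5")
    steps = int(round(t_final / dt))     # 30 s / 0.1 s = 300 time steps
    T = np.full(n, t_initial, dtype=float)   # every node starts at the initial temperature
    history = np.empty((steps, n))
    for k in range(steps):
        # Pad with the fixed boundary temperatures, then apply the discrete Laplacian.
        padded = np.concatenate(([t_left], T, [t_right]))
        T = T + r * (padded[2:] - 2.0 * T + padded[:-2])
        history[k] = T
    return history


temps = simulate_wall(t_left=40.0, t_right=20.0)
print(temps.shape)  # (300, 20)
```

Running this for many (t_left, t_right) pairs would produce the kind of training set described above. The stability check matters because, with 20 nodes in 10 cm and a 0.1 s step, a larger diffusivity would make the explicit update blow up.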
Figure 1: A basic three layer neural network, with 8 nodes in the input and
hidden layers, and one node in the output layer.
Figure 2: The ReLu activation function.
Figure 3: The Sigmoid activation function.
The tight range offers the advantage that activations will not “blow
up”, reducing the influence of the more extreme inputs.
Tanh is the final activation function I experimented with during this phase of the project. It is very similar to the sigmoid function (Figure 4):
$$A(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (4)$$
Like sigmoid, its activation range is tightly bounded, but tanh allows values in (-1, 1). Sigmoid and tanh both lack ReLu's characteristic of representing high positive values at their full intensity.
Figure 4: The Tanh activation function.
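The three activation functions compared in this phase can be written in a few lines of NumPy. The sketch below also checks two facts numerically: Equation (4) agrees with NumPy's built-in `np.tanh`, and tanh is a rescaled sigmoid, tanh(x) = 2·sigmoid(2x) − 1. The function names are illustrative and are not Keras's own implementations.

```python
import numpy as np


def relu(x):
    # Passes positive values through at full intensity and zeroes out negatives.
    return np.maximum(0.0, x)


def sigmoid(x):
    # Squashes any input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def tanh_eq4(x):
    # Equation (4): tanh(x) = 2 / (1 + e^(-2x)) - 1, with range (-1, 1).
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0


x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(tanh_eq4(x), np.tanh(x)))                    # True
print(np.allclose(tanh_eq4(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```

The identity explains why sigmoid and tanh behave so similarly. Tanh is the same S-shaped curve, stretched to (-1, 1) and made twice as steep.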
and the one I used, is Mean Squared Error (MSE):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \qquad (5)$$
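Equation (5) translates directly into NumPy. The results later in this paper also report the Mean Absolute Error (MAE), so a minimal sketch of both is shown side by side. The helper names `mae` and `mse` are illustrative.

```python
import numpy as np


def mae(y_true, y_pred):
    # Mean Absolute Error: the average size of the error, in the target's units (degrees here).
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))


def mse(y_true, y_pred):
    # Equation (5): the mean of the squared errors; squaring weights large errors more heavily.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


expected = [0.0, 1.0, 2.0]
predicted = [0.5, 1.0, 1.0]
print(mae(expected, predicted))  # 0.5
print(mse(expected, predicted))  # 0.4166666666666667
```

Because squaring shrinks errors below one degree and grows errors above one degree, a model whose errors are mostly sub-degree can have an MSE smaller than its MAE, as in the results reported later.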
the right side of the rod, InitialTemp, which was the initial temperature of the non-boundary points, time, which indicates the time within the simulation that a given point represents, TestPointPosition, which can be used to measure the distance of the given test point from the leftmost point of the rod, and ResultingTemp, which is the simulated temperature of the point at that position and time. ResultingTemp is the value the network predicts.
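A simulation's (time step × node) temperature history can be flattened into rows with these columns. The sketch below makes several assumptions. The two boundary-temperature column names are not shown in this section, so `LeftTemp` and `RightTemp` are hypothetical placeholders. TestPointPosition is stored as the node index, which is an assumption about how position was encoded. The helper name is also hypothetical.

```python
import numpy as np

# Hypothetical column order; "LeftTemp" and "RightTemp" are placeholder names.
COLUMNS = ["LeftTemp", "RightTemp", "InitialTemp", "time", "TestPointPosition", "ResultingTemp"]


def simulation_to_rows(temps, t_left, t_right, t_initial=0.0, dt=0.1):
    """Flatten a (steps, nodes) temperature history into one row per node per time step.

    Rows are ordered time step by time step, with all nodes of one step together.
    TestPointPosition is stored as the node index (0 = leftmost node), an assumption.
    """
    steps, nodes = temps.shape
    rows = np.empty((steps * nodes, len(COLUMNS)))
    rows[:, 0] = t_left
    rows[:, 1] = t_right
    rows[:, 2] = t_initial
    rows[:, 3] = np.repeat(np.arange(1, steps + 1) * dt, nodes)  # time of each row
    rows[:, 4] = np.tile(np.arange(nodes), steps)                 # node index of each row
    rows[:, 5] = temps.reshape(-1)                                # the value the network predicts
    return rows


demo = np.arange(6.0).reshape(3, 2)   # 3 time steps, 2 nodes
rows = simulation_to_rows(demo, t_left=40.0, t_right=20.0)
print(rows.shape)  # (6, 6)
```

Keeping rows in this order lets consecutive blocks of rows be treated as one time step. The visualization code in the Source Code section relies on this ordering when it plots 20 consecutive predictions per frame.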
To create the neural network, I used the Python library TensorFlow as well as Keras, which acts as an interface to TensorFlow. The model I used in this portion of the experiment was quite simple: it consisted of three explicit layers, two 64-node layers and a one-node output layer. Keras also adds one implicit layer that handles the shape of the input based on the number of features in the dataset. I had the best results using sigmoid as my activation function, but I also tested the network with the ReLu and tanh functions. The model used MSE as its loss function. Training was relatively fast: on average, an epoch took about 80 seconds to complete. One epoch means that the entire training set has been passed through the network once. Training generally lasts multiple epochs, as the model will be underfit if there are too few; I typically had my network complete 20 epochs. Testing follows the training phase: the model makes predictions for all 29,900 data points in the test set, and then the testing metrics are calculated. These metrics are the primary way to evaluate how well the neural network can estimate the 1-dimensional heat equation.
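To make the layer sizes concrete, here is a NumPy sketch of the forward pass such a model computes. It is not the Keras code, and training (minimizing the MSE over 20 epochs) is omitted. Two things are assumptions. First, five input features: the two boundary temperatures, InitialTemp, time, and TestPointPosition. Second, a linear output layer, since a sigmoid output could not produce temperatures above 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_features=5, hidden=64, seed=0):
    # Random weights for Dense(64) -> Dense(64) -> Dense(1); n_features=5 is an assumption.
    rng = np.random.default_rng(seed)
    sizes = [n_features, hidden, hidden, 1]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    # Sigmoid on the two 64-node layers; the output layer is linear (an assumption).
    h = X
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W, b = params[-1]
    return h @ W + b


params = init_params()
print(sum(W.size + b.size for W, b in params))  # 4609
print(forward(params, np.zeros((3, 5))).shape)  # (3, 1)
```

Under these assumptions the whole model has only 4,609 trainable weights and biases (5·64+64, 64·64+64, 64+1), which is consistent with the short training time per epoch.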
The main model, with the three layers described above and the sigmoid activation function, performed best. Over five complete runs, it had an average MAE of 0.24494. For context, the actual temperatures varied between 0 and 45 degrees, so for any given point in time and any given point within the wall, the model's prediction was on average only 0.24494 degrees from the expected temperature. The average MSE over the five runs was only 0.12984. The models using other activation functions also performed well, but their results were worse by a noticeable margin. Using ReLu, the model had an MAE of 0.35638 and an MSE of 0.31504. The model using tanh had an average MAE of 0.4359 and an average MSE of 0.3450. The results from each activation function's five runs are shown in Figure 5. These results were somewhat surprising to me. During my initial building and testing of the network, I had used ReLu by default, as it is often considered to perform consistently well, more so than the other two activation functions. It was not until much later, when I began to experiment with the construction of the model, that I tried the sigmoid activation function and realized it performed much better on this particular problem.
I also experimented with adding layers to the model, as well as adding more nodes to existing layers. These changes led to slight decreases in accuracy and also increased the training and testing times of the model. Among these variations, the sigmoid model with more neurons per layer performed best, but it was still not as accurate as the original 64-neuron model. With 128 neurons per layer, the network averaged an MAE of 0.27106 and an MSE of 0.16332, so it still outperformed the ReLu and tanh models by a clear margin (Figure 6).
Figure 5: The MAE and MSE of each activation function across 5 runs.
Figure 6: Updated MAE chart to include 128 node Sigmoid model
Graphs and visualizations were another method for evaluating these results. I used matplotlib to create a graph that represented the changes in heat over time. Both the expected and predicted values were plotted, making them easy to compare visually. The visualizations also gave context to the error results and illustrated where and when the biggest errors occurred. The first few tenths of a second have by far the most error. Here is what the graph looks like at the very beginning; red shows the neural network's predictions, and blue shows the expected results.
Here, there is some very clear error, especially at either end of the wall. By 0.3 seconds, the results begin to look a little better:
At one second, the lines begin to match up closely:
From this point forward, the predicted and actual results are typically a near match; here they are again at 15 seconds, where the lines are almost indistinguishable from one another:
The simulation lasts 30 seconds; this is what the graph looks like at the very end:
Based on these visualizations, it was clear that the first second was the weakest part of the network's predictions. I investigated this further over five more runs with the sigmoid-based network. During the first second, the average MAE was 0.612688, while during the remaining 29 seconds of the simulation it was only 0.175114. This is a major discrepancy, and something that would need to be addressed if future research took a similar approach.
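A split like this can be computed from unshuffled predictions by tagging each row with its simulated time. The sketch below assumes the same ordering as the plotting code: 20 consecutive rows per 0.1 s time step, with the first step at t = 0.1 s. The function name `mae_by_window` is hypothetical.

```python
import numpy as np


def mae_by_window(y_true, y_pred, nodes=20, dt=0.1, split_time=1.0):
    """MAE before/at split_time and after it, for rows ordered time step by time step.

    Assumes `nodes` consecutive rows per time step, the first step at t = dt.
    """
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    step = np.arange(len(y_true)) // nodes      # time-step index of each row
    t = (step + 1) * dt                         # simulated time of each row
    early = t <= split_time + 1e-9              # small tolerance for floating-point times
    err = np.abs(y_true - y_pred)
    return err[early].mean(), err[~early].mean()


# Synthetic check: error 1.0 during the first second, 0.1 afterwards (300 steps x 20 nodes).
y_true = np.zeros(300 * 20)
y_pred = np.where(np.arange(300 * 20) < 10 * 20, 1.0, 0.1)
early_mae, rest_mae = mae_by_window(y_true, y_pred)
print(early_mae, rest_mae)
```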
works typically train faster than other neural networks. Additionally, RBF networks operate differently from MLP networks: they are based on clustering with ellipses and circles, whereas MLP networks are based on linear separation [16]. Within the hidden layer, each node uses a radial basis function (RBF), denoted φ(r), as a nonlinear activation function [17]. Because of the clustering nature of RBF networks, each node also has a center. Picking the center of each node is typically the first part of the learning phase for an RBF network. Two common ways to select RBF centers are to choose them randomly from the training set or, more commonly, to use k-means clustering; I tried both approaches. Like traditional neural networks, RBF networks also have weights associated with the connections between layers. The second part of the learning phase focuses on optimizing these weights by minimizing the MSE.
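There are many different functions that can be used as the RBF, but the most common, and the one I used, is the Gaussian function:
$$\varphi(r) = e^{-r^2 / (2\sigma^2)} \qquad (6)$$
where r is the distance between an input and the node's center, and σ controls the width of the function.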
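The two-phase learning described above can be sketched compactly in NumPy. It is not the custom Keras RBF layer used in this research, and the function names and σ value are illustrative. Centers are chosen either as a random sample of training points or by plain k-means (Lloyd's algorithm). The hidden layer applies the Gaussian of Equation (6). The output weights are then found by linear least squares, which minimizes the MSE directly instead of using gradient descent.

```python
import numpy as np


def gaussian_rbf(r, sigma):
    # Equation (6): phi(r) = exp(-r^2 / (2 sigma^2)).
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))


def random_centers(X, k, seed=0):
    # Phase 1, option A: pick k distinct training points as centers.
    rng = np.random.default_rng(seed)
    return X[rng.choice(len(X), size=k, replace=False)].copy()


def kmeans_centers(X, k, iters=50, seed=0):
    # Phase 1, option B: refine random starting centers with Lloyd's k-means.
    centers = random_centers(X, k, seed)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # nearest center for each sample
        for j in range(k):
            members = X[labels == j]
            if len(members):                      # leave a center in place if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def rbf_features(X, centers, sigma):
    # Hidden layer: Gaussian of the distance to each center, plus a bias column.
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.hstack([gaussian_rbf(r, sigma), np.ones((len(X), 1))])


def fit_rbf(X, y, centers, sigma):
    # Phase 2: output weights that minimize the MSE (linear least squares).
    w, *_ = np.linalg.lstsq(rbf_features(X, centers, sigma), y, rcond=None)
    return w


def predict_rbf(X, centers, sigma, w):
    return rbf_features(X, centers, sigma) @ w


# Toy usage on a smooth 1-D function.
X = np.linspace(0.0, 2.0 * np.pi, 200)[:, None]
y = np.sin(X[:, 0])
centers = kmeans_centers(X, k=10)
w = fit_rbf(X, y, centers, sigma=0.7)
print(np.mean((predict_rbf(X, centers, 0.7, w) - y) ** 2))
```

Once the centers are fixed, the hidden layer's outputs do not depend on any trainable weights, so the second phase reduces to a linear regression. This is one reason RBF networks can train quickly.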
Figure 7: Updated MAE and MSE charts to include both versions of the RBF network.
random initial centers was 0.3171. The average MSE was 0.18364.
For the RBF network which used k-means to determine centers, the
average MAE was 0.29818 and the average MSE was 0.17128. Both
models compare favorably to ReLu and tanh, but they are not quite as accurate as the sigmoid-based model (Figure 7).
5 Physics-Informed Neural Networks
The final class of neural networks that I used was a little more abstract. Physics-Informed Neural Networks (PINNs) are described as “neural networks that are trained to solve supervised learning tasks while respecting any given laws of physics” [12]. These models are considered “data-efficient” because they are typically used for problems where data acquisition is especially difficult. To fill in the gaps caused by the lack of data, PINNs encode physical laws as prior information, which is then used to make better predictions [12]. This concept is a very recent development. The earliest paper on physics-informed neural networks or deep learning that I found was published in late 2017 by Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Their research, and the research that followed it, has focused primarily on approximating nonlinear partial differential equations in high dimensions. Because of the “curse of dimensionality”, very few practical high-dimensional algorithms have been developed [5]. This has given deep learning algorithms the opportunity to offer new and competitive methods for approximating these PDEs. However, the authors make clear that the new methodologies should not be considered replacements for classical methods of solving PDEs [12]. Jiequn Han, Arnulf Jentzen, and Weinan E point out some example PDEs where neural networks have the potential to provide the most benefit. They include:
the Schrödinger equation, where the dimensionality is about three times the number of electrons or quantum particles in the system; the nonlinear Black-Scholes equation, which is used for pricing financial derivatives and whose dimensionality is based on the number of financial assets under consideration; and the Hamilton-Jacobi-Bellman equation, which arises in game theory and resource allocation problems and whose dimensionality increases linearly with the number of actors or resources [5]. All this is to say that while these problems are very interesting, and the research done so far appears very promising, it is not all that similar to my own: there are no dimensionality issues within my problem scope, and practical and effective solutions to the heat equation already exist. However, the basic premise of a “physics-informed” neural network was interesting, and I expected that a simple addition accounting for prior information and physical laws would improve my results.
Figure 8: With the ReLu model, many of the initial predictions are for temperatures below zero.
Figure 9: The sigmoid model sometimes made slightly negative initial predictions, but to a much smaller degree.
Figure 10: The ReLu model’s predictions again, this time with a range enforced.
run. While this improvement appears quite small, it is nearly a five percent improvement in accuracy. These changes had less of an impact on the sigmoid-based network because it predicted negative values far less often, and never as far below zero. Still, there was a slight improvement. Using predictions on the same test set, the original network had an average MAE of 0.230186 over five runs, and the updated version had an MAE of 0.224988. This is an improvement of just over two percent.
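As a check on the two MAE values above, the relative reduction for the sigmoid-based network works out to:

```latex
\frac{0.230186 - 0.224988}{0.230186} = \frac{0.005198}{0.230186} \approx 0.0226 \approx 2.3\%
```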
These are certainly not drastic improvements, but they can only improve the predictions, so there is no real downside to implementing them. This is probably the simplest possible “physics-informed” idea, so it shows only a small fraction of the improvement that physics-informed neural networks can offer. It is just an example of how a basic concept can provide a small but guaranteed improvement.
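The exact range enforced on the predictions is not shown in this section. Below is a minimal sketch assuming it is the interval spanned by the initial and boundary temperatures. For the heat equation with fixed boundary temperatures, the maximum principle guarantees that the true solution never leaves this interval. Clipping into the interval therefore can never increase a prediction's absolute error, which is why the improvement is "guaranteed". The function name is hypothetical.

```python
import numpy as np


def enforce_physical_range(predictions, t_left, t_right, t_initial=0.0):
    """Clip predictions to the interval spanned by the initial and boundary temperatures.

    By the maximum principle the true temperature lies in this interval, so clipping
    never moves a prediction farther from the true value.
    """
    low = min(t_left, t_right, t_initial)
    high = max(t_left, t_right, t_initial)
    return np.clip(predictions, low, high)


raw = np.array([-1.2, 3.0, 41.5, 20.0])
clipped = enforce_physical_range(raw, t_left=40.0, t_right=20.0)
print(clipped)  # -1.2 becomes 0.0 and 41.5 becomes 40.0; in-range values are unchanged
```

This also shows why the ReLu model benefited more than the sigmoid model: only out-of-range predictions, such as the negative early predictions in Figure 8, are changed.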
Figure 11: The initial state of the 2D simulation.
Figure 12: The state of the 2D simulation after beginning the diffusion.
functions should not be significantly different. The model used with the 2-dimensional data was very similar to the network used with the original 1-dimensional set. It was the same three-layer model, with a 64-node input layer, a 64-node hidden layer, and a single-node output layer. The main difference is that the data fed into the second model was shaped differently and contained a few additional features.
The neural network performed well on the new data set. The test set predictions had an MAE of 0.26268 and an MSE of 0.26008. This is worse than the top-performing model on the 1-dimensional data set (an average MAE of 0.24494 and MSE of 0.12984), but the difference in MAE is small. If this trend continues, a neural network should remain capable of estimating the heat equation in higher dimensions with a small margin of error.
networks, the predictions have a relatively small margin of error, especially after the first few time steps. As this was my first experience with neural networks, I did not stray far from the Keras library when creating the models; these are certainly not hand-crafted, high-performance neural networks. It would be interesting to see how far the error could be reduced by someone with more experience using neural networks.
Additionally, this research did not venture into deep learning, as I immediately saw reduced accuracy after increasing the number of layers in the network. I would be curious whether deep learning could be applied to this problem in a more sophisticated way and yield substantial performance gains. Again, applying these more advanced techniques to the heat equation and similarly simple PDEs may not have many practical uses, but these are still interesting problems to solve.
The real potential for practical use lies in problems that are currently challenging because of dimensionality and nonlinearity. Recent studies have already shown impressive results for estimating Burgers' equation and the nonlinear Schrödinger equation, both of which have proved difficult for classical methods to solve [12]. However, as my own results show, neural network predictions still carry error and uncertainty. Reducing or even eliminating this problem would make neural networks a far more practical choice for solving PDEs, and I expect this will continue to be studied in the near future.
8 Contribution
The research detailed in this paper, while not presenting groundbreaking results, is still a worthwhile contribution. For other undergraduate students who may be interested in working on the same or a similar problem, this paper provides a good starting point. In the associated repository, I have consolidated a number of useful materials. These include the scripts I used to generate training data and simple neural networks that approximate the solution to the heat equation with generally good results. Future students can build on this work and focus on producing better models without spending as much time setting up the problem. Additionally, for more experienced researchers, the results presented in this paper can serve as a baseline for how well simple neural networks can predict the solution to the heat equation; the models they build should be expected to perform better than mine.
9 Conclusion
Neural networks can be a powerful tool when used for the right problems. Moreover, these results show that neural networks still perform well on problems where they are not strictly necessary. It is unlikely that anyone will use neural networks to estimate the 1- or 2-dimensional heat equation for practical purposes. Even so, after the first second the predictions are almost visually indistinguishable from the expected results, which reaffirms the function approximation capabilities of neural networks.
Within my own results, the standard neural network using the sigmoid activation function clearly performed best. The RBF network also performed competitively, but was not quite as consistent or as accurate as the best network. Adding physics-informed constraints to the model was also shown to reduce the error, though only slightly.
Given the amount of work done in the past five years on topics like
this, I expect much progress to be made in the near future. While
my research and results are certainly not cutting edge, it was an
interesting way to begin looking at and working with a challenging
11 Source Code
Below is the Keras/TensorFlow implementation of the basic neural network used to make predictions, along with the visualization code. Any additional code, including the scripts used to generate the training and testing data and the custom RBF layer, can be found at Nathan-Work or as cited.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

# Hypothetical path: the original listing does not show which CSV holds the training data.
data_file = "heatData.csv"


def get_compiled_model():
    # Two 64-node sigmoid layers and a single-node output layer, as described in the text.
    # Keras infers the input shape from the data on the first call to fit().
    # The optimizer is not stated in the text; "adam" is an assumption.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="sigmoid"),
        tf.keras.layers.Dense(64, activation="sigmoid"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


df = pd.read_csv(data_file, index_col=None)
target = df.pop("ResultingTemp")
full_dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))

# Shuffle the data set to ensure randomness of the training and test sets.
# reshuffle_each_iteration=False keeps the split fixed across epochs; otherwise
# take()/skip() would draw different rows each epoch and test rows would leak into training.
full_dataset = full_dataset.shuffle(len(df), reshuffle_each_iteration=False).batch(1)

# Take 20% of the data set to use as testing data; skip it and train on the other 80%.
n_test = int(tf.data.experimental.cardinality(full_dataset).numpy() * 0.2)
test_dataset = full_dataset.take(n_test)
train_dataset = full_dataset.skip(n_test)

model = get_compiled_model()
# Change 'cpu:0' to 'gpu:0' to switch between CPU and GPU.
with tf.device("cpu:0"):
    # batch_size is not used when fitting on a tf.data.Dataset; .batch() above sets it.
    model.fit(train_dataset, epochs=20)

# Testing metrics; the shuffle order is fixed, so predictions line up with the targets.
y_true = np.concatenate([y.numpy() for _, y in test_dataset])
y_pred = model.predict(test_dataset).ravel()
print("MAE:", metrics.mean_absolute_error(y_true, y_pred))
print("MSE:", metrics.mean_squared_error(y_true, y_pred))

# To create visuals without mixing data sets, load a separate, unshuffled
# data set so the predictions stay in simulation order.
df = pd.read_csv("smallHeatData.csv", index_col=None)
target = df.pop("ResultingTemp")
unshuffled = tf.data.Dataset.from_tensor_slices((df.values, target.values)).batch(1)
predictions = model.predict(unshuffled).ravel()
actual = target.values

# Centers of the 20 nodes in the 0.1 m (10 cm) wall.
x = np.linspace((0.1 / 20) / 2, 0.1 - (0.1 / 20) / 2, 20)

plt.ion()
time = 0.0
# Each block of 20 consecutive rows is one time step (one temperature per node).
for i in range(0, len(predictions), 20):
    plt.figure(1)
    plt.clf()
    plt.plot(x, predictions[i:i + 20], "r")  # network predictions
    plt.plot(x, actual[i:i + 20], "b")       # simulated (expected) temperatures
    plt.axis([0, 0.1, -5, 50])
    plt.xlabel("Distance (m)")
    plt.ylabel("Temperature (C) at time: " + str(round(time + 0.1, 1)) + " s")
    plt.show()
    plt.pause(0.05)
    time += 0.1