Journal of Petroleum Science and Engineering 192 (2020) 107273

Deep-learning-based surrogate model for reservoir simulation with time-varying well controls

Zhaoyang Larry Jin ∗, Yimin Liu, Louis J. Durlofsky
Department of Energy Resources Engineering, Stanford University, Stanford, CA, 94305, United States of America

ARTICLE INFO ABSTRACT

Keywords: A new deep-learning-based reduced-order modeling (ROM) framework is proposed for application in subsurface
Reservoir simulation flow simulation. The reduced-order model is based on an existing embed-to-control (E2C) framework and
Reduced-order model includes an auto-encoder, which projects the system to a low-dimensional subspace, and a linear transition
Deep learning
model, which approximates the evolution of the system states in low dimension. In addition to the loss function
Physics-informed neural network
for data mismatch considered in the original E2C framework, we introduce a physics-based loss function that
Auto-encoder
Embed-to-control
penalizes predictions that are inconsistent with the governing flow equations. The loss function is also modified
E2C to emphasize accuracy in key well quantities of interest (e.g., fluid production rates). The E2C ROM is shown
to be analogous to an existing ROM, POD-TPWL, which has been extensively developed for subsurface flow
simulation. The new ROM is applied to oil-water flow in a heterogeneous reservoir, with flow driven by nine
wells operating under time-varying control specifications. A total of 300 high-fidelity training simulations are
performed in the offline stage, and the network training requires 10-12 minutes on a Tesla V100 GPU node.
Online (runtime) computations achieve speedups of over a factor of 1000 relative to full-order simulations.
Extensive test case results, with well controls varied over large ranges, are presented. Accurate ROM predictions
are achieved for global saturation and pressure fields at particular times, and for injection and production well
responses as a function of time. Error is shown to increase when 100 or 200 (rather than 300) training runs
are used to construct the E2C ROM. Overall the E2C ROM is shown to provide reliable predictions with levels
of perturbations in the well controls that are much larger than those used with existing POD-TPWL treatments.
The current model is however limited to 2D systems, and the required number of training simulations is much
larger than that for POD-based ROMs.

1. Introduction

Reservoir simulation is widely applied to model and manage subsurface flow operations. However, due to the nonlinear nature of the governing equations and the multiscale character of the geological description, computational costs can be high, especially when highly resolved models are used. Computational demands can become prohibitive when simulation tools are applied for optimization, uncertainty quantification, and data assimilation, in which case thousands of simulation runs may be required.

Reduced-order models (ROMs) have been developed and applied to accelerate flow predictions in a variety of settings. Our goal in this work is to develop a new deep-learning-based reduced-order modeling procedure. Following the embed-to-control framework, the approach introduced here is comprised of a linear transition model and an auto-encoder (AE, also referred to as encoder–decoder). An encoder–decoder architecture is used to achieve dimension reduction by constructing the mapping to and from the low-dimensional representation. The AE component is a stack of multiple convolutional neural network (CNN) layers and dense feed-forward layers. The linear transition model represents the step-wise evolution of the system states with multiple linear feed-forward layers. The E2C procedure is constructed to predict key well quantities, such as time-varying production and injection rates and/or bottom-hole pressures (BHPs), as well as global pressure and saturation fields, in oil-water reservoir simulation problems.

ROM methodologies have received a large amount of attention in recent years. These procedures typically involve an offline (train-time) component, where training runs are performed and relevant solution information is processed and saved, and an online (test-time) component, where new (test) runs are performed. A popular category of methods is proper-orthogonal-decomposition-based (POD-based) ROMs, in which POD is applied to enable the low-dimensional representation of solution unknowns in the online computations.
∗ Corresponding author.
E-mail addresses: [email protected] (Z.L. Jin), [email protected] (Y. Liu), [email protected] (L.J. Durlofsky).

https://doi.org/10.1016/j.petrol.2020.107273
Received 11 November 2019; Received in revised form 15 March 2020; Accepted 5 April 2020
Available online 10 April 2020
0920-4105/© 2020 Elsevier B.V. All rights reserved.

These approaches also require the projection of the system of equations to low dimension (this projection is also referred to as constraint reduction). Galerkin projection and least-squares Petrov–Galerkin projection are the two approaches typically used for this step.

A treatment of solution nonlinearity is also required, and there have been a number of treatments for this within the context of POD-based ROMs. One effective approach is Gauss–Newton with approximated tensors, or GNAT, which also uses POD for state reduction and least-squares Petrov–Galerkin projection. GNAT was developed by Carlberg et al. (2011), and has since been used for structural and solid mechanics (Zahr et al., 2017), electromechanics (Amsallem et al., 2012), and computational fluid dynamics (Carlberg et al., 2013). GNAT represents a generalization of the discrete empirical interpolation method (DEIM) (Chaturantabut and Sorensen, 2010), and the two methods (GNAT and POD-DEIM) have been applied in a number of studies involving subsurface flow simulation (Yoon et al., 2016; Yang et al., 2016; Efendiev et al., 2016; Tan et al., 2019; Jiang and Durlofsky, 2019; Florez and Gildin, 2019). A radial basis function (RBF) multidimensional interpolation method has also been used to treat nonlinearity in the low-dimensional space represented by POD, and the resulting procedure is referred to as the POD-RBF method (Xiao et al., 2015; Kostorz et al., 2019). Trajectory piecewise linearization, originally introduced by Rewienski and White (2003), entails linearization around 'nearby' training solutions. POD-TPWL has been widely applied for subsurface flow simulations involving oil-water, oil-gas compositional, CO2 storage, and coupled flow-geomechanics systems (Cardoso and Durlofsky, 2010; He et al., 2011; He and Durlofsky, 2014, 2015; Jin and Durlofsky, 2018; Jin et al., 2020). Trehan and Durlofsky (2016) extended POD-TPWL to include a quadratic term, which gives a trajectory piecewise quadratic (POD-TPWQ) procedure.

The recent success of deep learning in image processing has inspired the rapid development of algorithms for subsurface modeling that make use of deep neural networks. These methods have been applied for geological parameterization, uncertainty quantification, and surrogate/reduced-order modeling. For geological parameterization and uncertainty quantification, Lee et al. (2018) applied a distance-based clustering framework, in which models that are close in terms of distance are grouped. Distance is determined based on a parameterization of the reservoir models using a stacked AE. Efficient uncertainty quantification was then achieved by simulating only one model in each group. The results from all groups were taken to represent the uncertainty range of the entire ensemble.

Canchumuni et al. (2019) generated new geological realizations from randomized low-dimensional latent variables using a variational auto-encoder (VAE). A VAE (Kingma and Welling, 2013) entails a convolutional encoder–decoder neural network architecture similar to the AE, where the encoder component projects a high-dimensional distribution into a low-dimensional random vector, with each element following an independent Gaussian distribution. The decoder acts as the inverse of the encoder and projects the sampled Gaussian-distributed random variables back to the high dimension. Laloy et al. (2018) achieved a similar goal using a generative adversarial network (GAN), where the projection to high dimension is determined by training two adversarial neural networks (known as the generator and the discriminator). Liu et al. (2019) and Liu and Durlofsky (2020) extended principal component analysis (PCA) based representations to a CNN-PCA procedure. This approach applied the 'fast neural style transfer' algorithm (Johnson et al., 2016) to represent complex geological models characterized by multipoint spatial statistics, and was shown to enable more efficient data assimilation. Zhu and Zabaras (2018) formulated surrogate modeling as an image-to-image regression, and constructed a Bayesian deep convolutional neural network for geological uncertainty quantification. Subsequently, Mo et al. (2019) extended this model to handle multiphase flow problems, and further improved performance by introducing additional physical constraints.

Recent developments involving the use of deep-learning techniques in ROMs indicate great potential for such approaches. Lee and Carlberg (2020) introduced an improved GNAT procedure by replacing POD with AE. The resulting method was applied to a one-dimensional dynamic Burgers' equation and a two-dimensional quasi-static chemically reacting flow problem, with the boundary conditions in the test runs different from those in the training runs. Kani and Elsheikh (2019) developed a deep residual recurrent neural network (DR-RNN) procedure, which employed an RNN to approximate the low-dimensional residual functions for the governing equations in a POD-DEIM procedure. The resulting ROM was then applied for uncertainty quantification in a two-dimensional small-scale oil-water system with the distribution of porosity in the test runs perturbed from that of the training runs. Zhang et al. (2019) used a fully-connected network to replace the Newton iterations in a POD-DEIM procedure. The method was used to predict well responses in a two-dimensional oil-water problem, in which combinations of well controls and permeability fields for test runs were different from those of the training simulations. Though improvements in accuracy were achieved by all of the above approaches relative to the 'standard' implementations, all of these developments were within existing ROM settings; i.e., none adopted an end-to-end deep-learning framework.

Other researchers have developed ROM methodologies that represent more of a departure from existing approaches. Wang et al. (2018), for example, used the long-short-term-memory (LSTM) RNN (Gers et al., 1999) to approximate flow dynamics in a low-dimensional subspace constructed by POD. Subsequently, Gonzalez and Balajewicz (2018) replaced the POD step with an AE for the low-dimensional representation. Both of these approaches, however, were applied on relatively simple problems, where the only differences between online and offline simulation runs were the initial conditions of the systems (boundary conditions were identical). In the subsurface flow equations, wells appear as localized source/sink terms, which essentially act as 'internal' boundary conditions. The ability to vary well settings (by well settings here we mean time-varying rates or BHPs for each well in the model) between offline and online computations is an essential feature for ROMs used in oil production optimization and related areas. Thus the above implementations may not be directly applicable for these problems.

Temirchev et al. (2020) constructed a similar ROM, representing the reservoir states in low dimension with a VAE. They tested this in combination with either linear regression, LSTM, or gated recurrent units (GRU) for dynamic simulation, with the best results achieved with GRU. A GRU (Chung et al., 2014) is an RNN that is similar to an LSTM RNN, but GRUs have simpler structures. The relative error with GRU was, however, reported to be relatively large in some validation scenarios, which might pose problems for applications such as well control optimization. This study nonetheless provides a useful assessment of several potential approaches within a VAE setting. Temirchev et al. (2019) also devised a 'neural-differential-equation-based' ROM, and applied it for a 3D synthetic benchmark test model. Tang et al. (2019) introduced a deep-learning-based surrogate model with convolutional and recurrent neural networks to predict flow responses for new geomodels. Well controls were not changed between training and testing runs. This model, referred to as 'recurrent R-U-Net,' was applied successfully within a history matching workflow.

Many of the existing approaches are purely data driven and do not take the underlying governing equations into (direct) consideration. A number of methods have, however, been applied to incorporate physical constraints into deep neural networks. Raissi et al. (2019) introduced a physics-informed deep learning framework (later referred to as a physics-informed neural network, or PINN) that used densely connected feed-forward neural networks. In PINN, the residual functions associated with the governing partial differential equations (PDEs) are introduced into the loss function of the neural network.


Zhu et al. (2019) extended this PDE-constraint concept to a deep flow-based generative model (GLOW; Kingma and Dhariwal, 2018), and constructed a surrogate model for uncertainty quantification using residuals of the governing equations rather than simulation outputs. Watter et al. (2015) proposed an embed-to-control (E2C) framework, in the context of robotic planning systems, to predict the evolution of system states using direct sensory data (images) and time-varying controls as inputs. The E2C framework combines a VAE, which is used as both an inference model to project the system states to a low-dimensional subspace, and a generative model to reconstruct the prediction results at full order, with a linear transition model. The latter approximates the evolution of low-dimensional states based on the time-varying control inputs.

In this paper, we develop a deep-learning framework for reduced-order modeling of subsurface flow systems based on the E2C model (Watter et al., 2015) and the aforementioned physics-informed treatments (Raissi et al., 2019; Zhu et al., 2019). Two key modifications of the existing E2C model are introduced. Specifically, we simplify the VAE to an AE to achieve better accuracy for deterministic test cases, and we incorporate a comprehensive loss function that both introduces PDE-based physical constraints and improves accuracy for well production and injection quantities. The latter treatment is important for improving the accuracy of well rates, which are essential in oil production optimization procedures. Because we are considering a supervised learning problem with labeled data (input and output pairs), the way we introduce the physical constraints is different from the approaches of Raissi et al. (2019) and Zhu et al. (2019), where the PDE residuals were used in the loss function during the training process. Interestingly, our E2C procedure is analogous to existing POD-TPWL methodologies, and we discuss the relationships between the two approaches in some detail.

This paper proceeds as follows. In Section 2, we present the governing equations for subsurface oil-water flow and then briefly describe the POD-TPWL ROM. In Section 3, the E2C formulation is presented, and the correspondences between E2C and POD-TPWL are highlighted. We present results for a two-dimensional oil-water problem in Section 4. Test cases involve the specification of different time-varying well settings, as would be encountered in an optimization problem. We also present a detailed error assessment for several key quantities. We conclude with a summary and suggestions for future work in Section 5. Supplementary Material for this paper, available online, includes the detailed architectures for the encoder and decoder used in the E2C model, performance comparisons between an auto-encoder, variational auto-encoder and uncertainty auto-encoder (UAE) (Grover and Ermon, 2018), E2C ROM results for two additional test cases, and a Nomenclature defining the main variables used in this work.

2. Governing equations and POD-TPWL ROM

In this section, we present the equations for oil-water flow. We then provide an overview of the POD-TPWL ROM for this problem, which will allow us to draw analogies with the E2C ROM.

2.1. Governing equations

The governing equations for immiscible oil-water flow derive from mass conservation for each component combined with Darcy's law for each phase. The resulting equations, with capillary pressure effects neglected, are

∂/∂t (φ S_j ρ_j) − ∇ ⋅ (λ_j ρ_j 𝐤 ∇p) + Σ_w ρ_j q_j^w = 0,  (1)

where subscript j (j = o, w for oil and water) denotes fluid phase. The geological characterization is represented in Eq. (1) through porosity φ and the permeability tensor 𝐤, while the interactions between rock and fluids are specified by the phase mobilities λ_j, where λ_j = k_{rj}/μ_j, with k_{rj} the relative permeability of phase j and μ_j the viscosity of phase j. Other variables are pressure p and phase saturation S_j (these are the primary solution variables), time t, and phase density ρ_j. The q_j^w term denotes the phase source/sink term for well w. This oil-water model is completed by enforcing the saturation constraint S_o + S_w = 1. Because the system considered in this work is horizontal (in the x–y plane), gravity effects are neglected.

The oil and water flow equations are discretized using a standard finite-volume formulation, and their solutions are computed for each grid block. In this work, we use Stanford's Automatic Differentiation-based General Purpose Research Simulator, AD-GPRS (Zhou, 2012), for all flow simulations. Let n_b denote the number of grid blocks in the model. The flow system is fully defined through the use of two primary variables, p and S_w, in each grid block, so the total number of variables in the system is 2n_b. We define 𝐱_t = [𝐩_t^T, 𝐒_t^T]^T ∈ R^{2n_b} to be the state vector for the flow variables at a specific time step t, where 𝐩_t ∈ R^{n_b} and 𝐒_t ∈ R^{n_b} denote the pressure and saturation in every grid block at time step t.

The set of nonlinear algebraic equations representing the discretized fully implicit system can be expressed as

𝐠(𝐱_{t+1}, 𝐱_t, 𝐮_{t+1}) = 𝟎,  (2)

where 𝐠 ∈ R^{2n_b} is the residual vector (set of nonlinear algebraic equations) we seek to drive to zero, the subscript t indicates the current time level and t+1 the next time level, and 𝐮_{t+1} ∈ R^{n_w} designates the well control variables, which can be any combination of time-varying bottom-hole pressures (BHPs) or well rates. Here n_w denotes the number of wells in the system. In this work we operate production wells under BHP specifications and injection wells under rate specifications. Our treatments are general in this regard, and other control settings could also be applied. Although we apply AD-GPRS for the reference flow simulations, we use a standalone Peaceman well model (Peaceman, 1978) in conjunction with the AD-GPRS pressure and saturation results to compute well rates or BHPs from the well-block states. We proceed in this manner to ensure full consistency in the well treatment between the full-order simulator and the E2C ROM.

Newton's method is typically used to solve the full-order discretized nonlinear system defined by Eq. (2). This requires constructing the sparse Jacobian matrix of dimension 2n_b × 2n_b, and then solving a linear system of dimension 2n_b, at each iteration for every time step. Solution of the linear system is often the most time-consuming part of the simulation. As will be explained later, both POD-TPWL and the deep-learning-based E2C ROM avoid the test-time construction and solution of this high-dimensional system.
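To make the structure of this fully implicit time step concrete, the sketch below shows the Newton loop in schematic form. It is an illustration only: assemble_residual and assemble_jacobian are hypothetical stand-ins for the assembly routines a simulator such as AD-GPRS provides internally.

```python
# Schematic Newton loop for the fully implicit system of Eq. (2).
# assemble_residual and assemble_jacobian are hypothetical stand-ins
# for simulator-provided assembly routines (e.g., within AD-GPRS).
import numpy as np

def advance_one_step(x_t, u_next, assemble_residual, assemble_jacobian,
                     tol=1e-6, max_iter=20):
    """Advance the state (pressures and saturations, length 2*nb) from
    time level t to t+1 under the well controls u_next."""
    x = x_t.copy()                              # initial guess: previous solution
    for _ in range(max_iter):
        g = assemble_residual(x, x_t, u_next)   # g(x_{t+1}, x_t, u_{t+1})
        if np.linalg.norm(g) < tol:
            break
        J = assemble_jacobian(x, x_t, u_next)   # sparse 2nb x 2nb in practice
        x = x + np.linalg.solve(J, -g)          # the linear solve dominates cost
    return x
```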

2.2. POD-TPWL formulation

Many deep-learning-based models involve treatments that are not directly analogous to those used in existing ROMs, which were developed based on the underlying PDEs and numerical discretizations. Rather, these new approaches often involve machine-learning methods that derive from image classification, language recognition, or other non-PDE-based applications. Our E2C ROM is somewhat different in this regard, because its three main components are analogous to treatments used in an existing ROM (POD-TPWL) that has been extensively applied for subsurface flow. We believe it is worthwhile to discuss the correspondences between the POD-TPWL and E2C ROMs, since the analogies between the two approaches may enable insight or suggest approaches for some of the detailed treatments.

To enable this discussion, we first provide a high-level overview of POD-TPWL for reservoir simulation. For full details on recent POD-TPWL implementations, please see He and Durlofsky (2014, 2015), Jin and Durlofsky (2018) and Jin et al. (2020). Note that, although we discuss the conceptual similarities between these approaches, we will not present any POD-TPWL results in this paper. This is because the number of training runs we use for E2C (100–300) is much larger than is compatible with existing POD-TPWL frameworks (which use, e.g., 3–5 training runs). More specifically, the so-called point-selection strategies used in the POD-TPWL linearization step would have to be reformulated in order to accommodate 100 or more training runs. This would entail the development of new treatments along with extensive testing, both of which are beyond the scope of this paper.


During the offline (pre-processing) POD-TPWL stage, the training simulation runs are performed using a full-order simulator (AD-GPRS in this work). The goal here is to predict test-time results with varying well control sequences. Therefore, during training runs, we apply different well control sequences 𝐔 = [𝐮_1, …, 𝐮_{N_ctrl}] ∈ R^{n_w × N_ctrl}, where 𝐮_k ∈ R^{n_w}, k = 1, …, N_ctrl, contains the settings (rates or BHPs) for all wells at control step k, and N_ctrl denotes the total number of control steps in a training run. There are many fewer control steps than time steps in a typical simulation (in our examples we have 20 control steps and around 100 time steps). State variables in all grid blocks (referred to as snapshots) and derivative matrices are saved at each time step in the training runs. At test-time, simulations with control sequences that are different from those of the training runs are performed. Information saved from the training runs is used to (very efficiently) approximate test solutions.

POD-TPWL entails (1) projection from a high-dimensional space to a low-dimensional subspace, (2) linear approximation of the dynamics in the low-dimensional subspace, and (3) projection back to the high-dimensional space. A projection matrix Φ ∈ R^{2n_b × l_ξ} is constructed based on the singular value decomposition (SVD) of the solution snapshot matrices (these snapshot matrices contain full-order solutions at all time steps in all training runs). Given Φ, the high-dimensional states 𝐱 ∈ R^{2n_b} can be represented in terms of the low-dimensional variable 𝝃 ∈ R^{l_ξ} using

𝐱 ≈ Φ𝝃,  (3)

where l_ξ is the dimension of the reduced space, with l_ξ ≪ n_b. Note that in practice, the SVD and subsequent projections are performed separately for the pressure and saturation variables. Because Φ is orthonormal, we also have 𝝃 = Φ^T 𝐱.
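A minimal sketch of this basis construction is shown below (NumPy; synthetic snapshot data are used, so this is an illustration rather than the authors' implementation).

```python
# POD basis from a snapshot matrix via SVD (illustrative sketch).
# In practice separate bases are built for pressure and saturation.
import numpy as np

def pod_basis(X, l_xi):
    """X: (2*nb, N_s) snapshot matrix; returns the basis Phi whose columns
    are the leading left singular vectors, minimizing e_proj of Eq. (9)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :l_xi]

X = np.random.rand(7200, 500)   # synthetic snapshots (2*nb = 7200)
Phi = pod_basis(X, l_xi=100)
x = X[:, 0]
xi = Phi.T @ x                  # reduced variable (Phi is orthonormal)
x_approx = Phi @ xi             # low-dimensional representation of Eq. (3)
```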
Before discussing the POD-TPWL approximation in low-dimensional space, we first show the linearization in high dimension. Following He and Durlofsky (2015), the TPWL formulation (with the POD representation for states, 𝐱 = Φ𝝃, applied to the right-hand side) can be expressed as

𝐉_{i+1} 𝐱̂_{t+1} = 𝐉_{i+1} Φ𝝃_{i+1} − [𝐀_{i+1} Φ(𝝃_t − 𝝃_i) + 𝐁_{i+1}(𝐮_{t+1} − 𝐮_{i+1})],  (4)

where

𝐉_{i+1} = ∂𝐠_{i+1}/∂𝐱_{i+1} ∈ R^{2n_b × 2n_b},  𝐀_{i+1} = ∂𝐠_{i+1}/∂𝐱_i ∈ R^{2n_b × 2n_b},  𝐁_{i+1} = ∂𝐠_{i+1}/∂𝐮_{i+1} ∈ R^{2n_b × n_w}.  (5)

Here the subscripts t and t+1 denote time steps in the test run, while the subscripts i and i+1 designate time steps in the training simulations. Note that Eq. (4) differs slightly from the expressions in He and Durlofsky (2015) since the time step designations are now subscripted, for consistency with the embed-to-control equations shown later. The variable 𝝃_t is the projection of the true (high-order) solution of Eq. (2) at time step t. The variable 𝐱̂_{t+1} ∈ R^{2n_b} is distinct from 𝐱_{t+1}, in that it represents the full-order variable at time step t+1 approximated through linearization instead of via solution of the full-order system (Eq. (2)). From here on, we will use variables without 'hats' to denote the true high-order solution (e.g., 𝐱) or the true solution projected with matrix Φ (e.g., 𝝃 = Φ^T 𝐱), and we will use variables with 'hats' (𝐱̂ and 𝝃̂) to designate solutions approximated (either reconstructed or predicted, as will be explained in detail later) by the ROM. The variables 𝐮_{t+1}, 𝐮_{i+1} ∈ R^{n_w} are the well settings at time steps t+1 and i+1 — these are prescribed by the user or specified by an optimization algorithm.

Applying the POD representation on the left-hand side and constraint reduction (projection) on both sides of Eq. (4), the solution approximation in low-dimensional space, after some rearrangement, is given by

𝝃̂_{t+1} = 𝝃_{i+1} − (𝐉^r_{i+1})^{−1}[𝐀^r_{i+1}(𝝃_t − 𝝃_i) + 𝐁^r_{i+1}(𝐮_{t+1} − 𝐮_{i+1})],  (6)

with the reduced derivative matrices defined as

𝐉^r_{i+1} = (Ψ_{i+1})^T 𝐉_{i+1} Φ,  𝐀^r_{i+1} = (Ψ_{i+1})^T 𝐀_{i+1} Φ,  𝐁^r_{i+1} = (Ψ_{i+1})^T 𝐁_{i+1}.  (7)

Here 𝐉^r_{i+1} ∈ R^{l_ξ × l_ξ}, 𝐀^r_{i+1} ∈ R^{l_ξ × l_ξ} and 𝐁^r_{i+1} ∈ R^{l_ξ × n_w}. The matrix Ψ_{i+1} denotes the constraint reduction matrix at time step i+1. The variable 𝝃̂_{t+1} ∈ R^{l_ξ} represents the reduced variable approximated through linearization at time step t+1.

During the online stage (test-time), we do not know 𝝃_t (the projected true solution of Eq. (2) at time step t). Rather, we have 𝝃̂_t, the reduced variable approximated through linearization at time step t (computed from Eq. (6) at the previous time step). Therefore, at test-time, Eq. (6) becomes

𝝃̂_{t+1} = 𝝃_{i+1} − (𝐉^r_{i+1})^{−1}[𝐀^r_{i+1}(𝝃̂_t − 𝝃_i) + 𝐁^r_{i+1}(𝐮_{t+1} − 𝐮_{i+1})].  (8)

Note that 𝝃̂_t now appears on the right-hand side instead of 𝝃_t. At test-time, the training 'point,' around which linearization is performed (this point defines i and i+1), is determined using a 'point-selection' procedure. This point selection depends on 𝝃̂_t (see He et al., 2011; Jin and Durlofsky, 2018 for details), so the reduced derivative matrices 𝐉^r_{i+1}, 𝐀^r_{i+1} and 𝐁^r_{i+1} can all be considered to be functions of 𝝃̂_t. In the last step of POD-TPWL, the approximated solutions are projected back to the full-order space through application of 𝐱̂ = Φ𝝃̂.
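The online update of Eq. (8) reduces to small dense linear-algebra operations, as in the following sketch (which assumes the reduced matrices and training-run quantities for the selected point have already been loaded).

```python
# One online POD-TPWL step (Eq. (8)); all arrays are in the reduced space.
import numpy as np

def tpwl_online_step(xi_hat_t, u_next, J_r, A_r, B_r, xi_i, xi_ip1, u_ip1):
    """xi_hat_t: current reduced state estimate; (J_r, A_r, B_r, xi_i,
    xi_ip1, u_ip1) come from the training point chosen by point selection."""
    rhs = A_r @ (xi_hat_t - xi_i) + B_r @ (u_next - u_ip1)
    return xi_ip1 - np.linalg.solve(J_r, rhs)   # xi_hat at t+1 (length l_xi)
```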
Each of the above-mentioned steps in POD-TPWL can be viewed in terms of an optimization, as we now consider. The projection matrix Φ is constructed using the POD procedure. This has the well-known property that the resulting basis matrix minimizes a projection error e_proj, defined as

e_proj = ‖𝐱 − ΦΦ^T 𝐱‖_2^2,  (9)

where 𝐱 ∈ R^{2n_b} is the full-order state variable.

In addition, as discussed by He and Durlofsky (2015), the constraint reduction error can be defined as

e_cr = ‖𝐱̂ − Φ𝝃̂‖_Θ^2,  (10)

where 𝐱̂ corresponds to the solution 𝐱̂_{t+1} in Eq. (4) (before constraint reduction is applied); this variable was denoted as 𝐱_2 in He and Durlofsky (2015). The variable 𝝃̂ corresponds to the solution 𝝃̂_{t+1} in Eq. (6) (after constraint reduction is applied) and was expressed as 𝝃_3 in He and Durlofsky (2015). The notation ‖⋅‖_Θ is a norm defined as ‖𝐞‖_Θ = √(𝐞^T Θ 𝐞), with 𝐞 ∈ R^{2n_b} and Θ ∈ R^{2n_b × 2n_b}, where Θ is a symmetric positive definite matrix. The optimal constraint reduction matrix Ψ can be determined by minimizing the constraint reduction error, i.e.,

Ψ = arg min_Ψ e_cr.  (11)

If the matrix Θ is defined as 𝐉^T 𝐉 then, following Eqs. 21 through 27 in He and Durlofsky (2015), we arrive at the least-squares Petrov–Galerkin projection, i.e.,

Ψ = 𝐉Φ.  (12)

This treatment, which as we see is optimal in a particular norm, is now routinely used in POD-TPWL.

The remaining aspect of POD-TPWL to be considered is point selection. Different point-selection strategies have been used for different applications, and these typically include a heuristic component. These procedures entail the minimization of a 'distance' metric, which quantifies the distance (in an application-specific sense) between the current test point and a large set of training-run points. Thus, this step also entails an optimization. As we will see, these POD-TPWL component optimizations correspond to the loss function minimization that will be applied in the embed-to-control framework. A key difference, however, is that in the E2C framework all of the steps are optimized together, rather than separately as in POD-TPWL.


3. Embed-to-control formulation

In this section, we develop an embed-to-control ROM that includes physical constraints. Analogies to POD-TPWL are established for the various E2C components. The E2C model presented here generally follows that developed by Watter et al. (2015), though several important modifications are introduced, as will be discussed below.

3.1. E2C overview

The embed-to-control framework entails three processing steps: an encoder or inference model that projects the system variables from a high-dimensional space to a low-dimensional subspace (referred to here as the latent space), a linear transition model that approximates system dynamics in low dimension, and a decoder or generative model that projects solutions back to high-dimensional (full-order) space. The E2C framework originally proposed by Watter et al. (2015) used a VAE architecture for both the encoder and decoder procedures, which allowed them to account for uncertainty in predictions. In the formulation here, the VAE architecture is reduced to an auto-encoder (AE) architecture, since we are considering deterministic systems. We performed limited numerical experimentation (some of the results from these experiments are presented in Supplementary Material) and found that AE was more accurate than VAE for our application. This may be because the complexity of the network has not reached the point where over-fitting is a significant issue. If we use deeper networks, as might be required if we wish to use the E2C ROM for production optimization under uncertainty, another comparison between AE and VAE should be performed to determine the preferable architecture.

We note that the AE architecture is commonly used for semantic segmentation (Ronneberger et al., 2015), where each pixel of the image is associated with a class label, and for depth prediction (Eigen et al., 2014), where the 3D geometry of a scene is inferred from a 2D image. In the context of subsurface flow simulation, AE architectures have been used to construct surrogate simulation models as an image-to-image regression. In this case the input images are reservoir properties (e.g., permeability field) and the outputs are state variables (Zhu and Zabaras, 2018; Mo et al., 2019).

Fig. 1 displays the overall workflow for our embed-to-control model. The pressure field 𝐩_i ∈ R^{n_b} is the only state variable shown in this illustration (the subscript i, distinct from t, denotes the time steps in a training run), though our actual problem also includes the saturation field 𝐒_i ∈ R^{n_b}. Additional state variables would appear in more general settings (e.g., displacements if a coupled flow-geomechanics model is considered).

Box 1 in Fig. 1 displays pressure snapshots 𝐩_i ∈ R^{n_b}, i = 1, …, N_s in the full-order space, where N_s is the total number of snapshots. The notation Q^enc_φ in Funnel 2 denotes the encoder, which projects the full space into a latent space, with φ representing all of the 'learnable' parameters in the encoder. By learnable parameters we mean, in general, the set of parameters within the deep-learning-based ROM that are determined in the offline training step. This training is accomplished by minimizing an appropriate loss function. As discussed later, there are learnable parameters associated with the encoder, decoder, and linear transition components of the ROM. The variable 𝐳^p_i ∈ R^{l_z} in Box 3 is the latent variable for pressure, with l_z the dimension of the latent space. In Box 3, the test simulation results are approximated in the latent space with a linear transition model. The variable 𝐳^p_0 ∈ R^{l_z} denotes the initial latent state for a test run, and 𝐮_t ∈ R^{n_w}, t = 1, …, N_ctrl designates the control sequence for a test run, with n_w the number of wells (as noted previously), where the subscript t indicates time step in the test run and N_ctrl is the number of control steps in the test run. The linear transition model Q̂^trans_ψ (ψ denotes the learnable parameters) takes 𝐳^p_0 ∈ R^{l_z} and 𝐮_t ∈ R^{n_w} as input, and outputs 𝐳^p_t ∈ R^{l_z}, t = 1, …, N_te sequentially, where N_te is the total number of time steps in a test run. The decoder P^dec_θ (indicated by Funnel 4, with θ representing all of the learnable parameters in the decoder) then projects the variable 𝐳^p_t back to the full-order state 𝐩_t ∈ R^{n_b}, as shown in Box 5.

We reiterate that the embed-to-control ROM incorporates the control variable 𝐮_t ∈ R^{n_w} naturally in the framework. This will be evident in Section 3.3, where we describe the linear transition model. This is an important distinction relative to the AE-LSTM-based ROM developed by Gonzalez and Balajewicz (2018), where system controls were not included in the model.

In the following subsections, the three main components of the embed-to-control framework, the encoder, the linear transition model, and the decoder, will be discussed in detail. A loss function with physical constraints, along with E2C implementation details, will also be presented.
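Schematically, the three components compose as in the toy sketch below. The linear stand-in maps are placeholders so the example runs; the actual encoder, transition model, and decoder are the deep networks described in Sections 3.2–3.4.

```python
# Toy sketch of the E2C data flow with stand-in linear components.
import numpy as np

rng = np.random.default_rng(0)
n_full, lz, nw = 7200, 50, 9          # 2*nb, latent dimension, number of wells

W_enc = rng.normal(size=(lz, n_full)) / n_full   # placeholder encoder weights
W_dec = rng.normal(size=(n_full, lz))            # placeholder decoder weights

encoder = lambda x: W_enc @ x          # Q_phi^enc: R^{2nb} -> R^{lz}
decoder = lambda z: W_dec @ z          # P_theta^dec: R^{lz} -> R^{2nb}

def transition(z, u, dt):              # placeholder for Eq. (16)
    A = np.eye(lz)                     # A_t, B_t are produced by the
    B = np.zeros((lz, nw))             # transformation network in the real model
    return A @ z + B @ u

x_t = rng.normal(size=n_full)          # pressure/saturation state at time t
u_next, dt = rng.normal(size=nw), 30.0
z_t = encoder(x_t)                     # project to the latent space
x_rec = decoder(z_t)                   # reconstruction at time t
x_pred = decoder(transition(z_t, u_next, dt))   # prediction at time t+1
```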
3.2. Encoder component

The encoder provides a low-dimensional representation of the full-order state variables. In contrast to the original embed-to-control implementation by Watter et al. (2015), here we adopt an AE instead of a VAE architecture. With this treatment only the mean values of the latent variables are estimated, not the variances. Also, we do not require a sampling process in the latent space. At train-time, the encoder can be simply expressed as

𝐳_t = Q^enc_φ(𝐱_t),  (13)

where Q^enc_φ represents the encoder (this notation appears in Fig. 1). The variable 𝐱_t ∈ R^{2n_b} is the full-order state variable at time step t, and 𝐳_t ∈ R^{l_z} is the corresponding latent variable, with l_z the dimension of the latent space.

In the examples presented later, we consider a 2D 60 × 60 oil-water model (which means the full-order system is of dimension 7200), and we set l_z = 50. This value of l_z was determined based on numerical experiments, where the goal was to assure that the dimension of the subspace was high enough to accurately represent the physics, while still low enough for efficient computation. More specifically, the initial value of l_z was set consistent with values used in POD-TPWL (Jin and Durlofsky, 2018; Jin et al., 2020), where l_z is typically on the order of 100. Then, l_z was reduced gradually, with the accuracy of the E2C ROM monitored. Non-negligible reduction in E2C accuracy was observed for l_z < 50; thus we use l_z = 50 in this work. Note that the appropriate value for l_z is expected to be somewhat case-dependent. Cross-validation was also conducted to verify that the trained network did not lead to over-fitting.

Note that Eq. (13) is analogous to Eq. (3) in the POD-TPWL procedure, except the linear projection in POD is replaced by a nonlinear projection Q^enc_φ in the encoder. Following the convention described earlier, we use variables without a 'hat' to denote (projected) true solutions of Eq. (2), which are available from training runs. Variables with a hat designate approximate solutions provided by the test-time ROM.

The detailed layout of the encoder in the E2C model is presented in Fig. 2. During training, sequences of pressure and saturation snapshots are fed through the encoder network, and sequences of latent state variables 𝐳_t ∈ R^{l_z} are generated. The encoder network used here is comprised of a stack of four encoding blocks, a stack of three residual convolutional (resConv) blocks, and a dense layer. The encoder in Fig. 2 is more complicated (i.e., it contains resConv blocks and has more convolutional layers) compared to those used in Watter et al. (2015). A more complicated structure may be needed here because, compared to the prototype planning tasks addressed in Watter et al. (2015) (e.g., cart–pole balancing and three-link robotic arm planning), proper representation of PDE-based pressure and saturation fields requires feature maps from a deeper network.


Fig. 1. Embed-to-control (E2C) overview.

Fig. 2. Encoder layout.

Similar to the CNN-PCA proposed by Liu et al. (2019), which uses the filter operations in CNNs to capture the spatial correlations that characterize geological features, the embed-to-control framework uses stacks of convolutional filters to represent the spatial distribution of the pressure and saturation fields determined by the underlying governing equations. Earlier implementations with AE/VAE-based ROMs (Lee and Carlberg, 2020; Gonzalez and Balajewicz, 2018) have demonstrated the potential of convolutional filters to capture such fields in fluid dynamics problems. Thus, our encoder network is mostly comprised of these convolutional filters (in the form of two-dimensional convolutional layers, i.e., conv2D layers; LeCun et al., 1998). More detail on the encoder network is provided in Table 1 in Supplementary Material.

The input to an encoding block is first fed through a convolution operation, which can also be viewed as a linear filter. Following the expression in Liu et al. (2019), the mathematical formulation of linear filtering is

F_{i,j}(𝐱) = Σ_{p=−n}^{n} Σ_{q=−n}^{n} 𝐰_{p,q} 𝐱_{i+p, j+q} + b,  (14)

where 𝐱 is the input state map, subscripts i and j denote x and y coordinate direction indices, 𝐰 represents the weights of a linear filter (template) of size (2n+1) × (2n+1), F_{i,j}(𝐱) designates the filter response map (i.e., feature map) for 𝐱 at spatial location (i, j), and b is a scalar parameter referred to as bias. Note that there are typically many filters associated with a conv2D layer, and the filter response map, which collects all of these operations, is thus a third-order tensor. The output filter response maps are then passed through a batch normalization (batchNorm) layer (Ioffe and Szegedy, 2015), which applies normalization operations (shifts the mean to zero and rescales by the standard deviation) for each subset of training data. The batchNorm operation is a crucial step in the efficient training of deep neural networks, since it renders the learning process less sensitive to parameter initialization, which means a larger initial learning rate can be used. The nonlinear activation function ReLU (rectified linear unit, max(0, x)) (Glorot et al., 2011) is applied on the normalized filter response maps to give the final response (output) of the encoding block. This nonlinear response is referred to as the 'activation' of the encoding block. The conv2D-batchNorm-ReLU architecture (with variation in ordering) is a standard processing step in CNNs.

The learnable parameters in an encoding block include the collection of weights 𝐰 and bias terms b in Eq. (14) for all of the filters in the conv2D layers, and the shifting and scaling parameters in batchNorm. These parameters are determined during training by minimizing the loss function (defined later). The ReLU layer does not involve learnable parameters. The collection of learnable parameters in all the encoding blocks, resConv blocks, and dense layers in the encoder (about 2.42 × 10^6 parameters in total) is represented by φ in Eq. (13). An illustration of the encoding block structure is provided in Fig. 1(a) in Supplementary Material.
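The sketch below illustrates one such encoding block in PyTorch. The framework, channel counts, and stride are assumptions for illustration; the exact layer specifications are given in Table 1 in Supplementary Material.

```python
# One encoding block: conv2D -> batchNorm -> ReLU (illustrative sketch).
import torch
import torch.nn as nn

class EncodingBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                              padding=1)        # the linear filters of Eq. (14)
        self.bn = nn.BatchNorm2d(out_ch)        # shift/rescale filter responses
        self.relu = nn.ReLU()                   # nonlinear activation

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

# Pressure and saturation fields treated as a 2-channel 60 x 60 image.
x = torch.randn(1, 2, 60, 60)
block = EncodingBlock(in_ch=2, out_ch=16)
print(block(x).shape)   # torch.Size([1, 16, 30, 30])
```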


Fig. 3. Linear transition model layout.

To properly incorporate feature maps capable of representing the spatial pressure and saturation distributions, as determined by the underlying governing equations, a deep neural network with many stacks of convolutional layers is required. Deep neural networks are, however, difficult to train, mostly due to the vanishing-gradient issue (Glorot and Bengio, 2010). By this we mean that gradients of the loss function with respect to the model parameters (weights of the filters) become vanishingly small, which negatively impacts training. He et al. (2016) addressed this issue by creating an additional identity mapping, referred to as resNet, that bypasses the nonlinear layer. Following the idea of resNet, we add a stack of resConv blocks to the encoder network to deepen the network while mitigating the vanishing-gradient issue. The nonlinear layer in the resConv block still generally follows the conv2D-batchNorm-ReLU architecture. See Fig. 1(d) in Supplementary Material for a depiction of the resConv block.

Similar to that of the encoding block, the output of the resConv blocks is a stack of low-dimension feature maps. This stack of feature maps is 'flattened' to a vector (which is still a relatively high-dimensional vector due to the large number of feature maps), and then input to a dense layer. A dense (fully-connected) layer is simply a linear projection that maps a high-dimensional vector to a low-dimensional vector.

The overall architecture of the encoder network used here differs from that constructed by Zhu and Zabaras (2018) in three key aspects. First, resNet is used in our encoder while they used denseNet (Huang et al., 2017) to mitigate the vanishing-gradient issue. Another key distinction is that the encoder (and the decoder) in Zhu and Zabaras (2018) do not include the dense layer at the end, which means their encoder outputs a stack of feature maps. A large number of feature maps (i.e., a tall but relatively thin third-order tensor) would be too high-dimensional for the sequential linear operations subsequently performed by our linear transition model. Finally, Zhu and Zabaras (2018) adopted a U-Net (Ronneberger et al., 2015) architecture, while our E2C model uses a different architecture.

The encoder (and decoder) in the embed-to-control ROM is analogous to the POD representation used in POD-TPWL. As noted earlier, the basis matrix Φ constructed via SVD of the snapshot matrices has the feature that it minimizes e_proj in Eq. (9). In the context of the encoder, a reconstruction loss L_R, which is similar to e_proj for POD, is computed. Conceptually, the 'best' Q^enc_φ is found by minimizing L_R. However, as mentioned earlier, the optimization applied for the embed-to-control model involves all three processing steps considered together, so L_R is not minimized separately.

3.3. Linear transition model

The linear transition model evolves the latent variable from one time step to the next, given the controls. Fig. 3 shows how the linear transition model is constructed and evaluated during the offline stage (train-time). The inputs to the linear transition model include the latent variable for the current state 𝐳_t ∈ R^{l_z}, the current step control 𝐮_{t+1} ∈ R^{n_w}, and the time step size Δt. The model outputs the predicted latent state for the next time step 𝐳̂_{t+1} ∈ R^{l_z}. We reiterate that 𝐳̂_{t+1} represents the output of the linear transition model. The structure of the linear transition model, which generally follows that in Watter et al. (2015), is comprised of a stack of three transformation (trans) blocks and two dense layers. The trans block follows a dense-batchNorm-ReLU architecture (dense represents a dense layer), which is considered a standard processing step for fully-connected networks. The trans block architecture is shown in Fig. 1(c) in Supplementary Material. The variables 𝐳_t and Δt are first fed into the trans blocks. The final activation vector of the trans blocks, h^trans_{ψ′}, is then used to construct the linearization matrices 𝐀_t ∈ R^{l_z × l_z} and 𝐁_t ∈ R^{l_z × n_w} through two separate dense layers. Matrices 𝐀_t and 𝐁_t are then combined with the latent variable for the current state 𝐳_t and the current step control 𝐮_{t+1} to predict the latent variable at the next time step 𝐳̂_{t+1}.

The optimization applied to determine the parameters for the linear transition model is again analogous to a key step in POD-TPWL. In POD-TPWL, the goal is essentially to minimize the difference between the predicted reduced state 𝝃̂_{t+1} and the projected true state 𝝃_{t+1}. This is achieved, in part, by determining the optimal constraint reduction matrix Ψ, as described in Eqs. (10) and (11). Given this optimal Ψ matrix, the matrices appearing in the POD-TPWL evolution equation (Eq. (6)), namely 𝐉^r_{i+1}, 𝐀^r_{i+1} and 𝐁^r_{i+1}, are all fully defined. As discussed earlier, point selection represents another (heuristic) optimization that appears in POD-TPWL. Similarly, in the embed-to-control formulation, a transition loss L_T is computed by comparing 𝐳̂_{t+1} with 𝐳_{t+1}, where 𝐳̂_{t+1} is the output from the linear transition model, and 𝐳_{t+1} is the state projected by the encoder at time step t+1. The transition loss contributes to the total loss function, which is minimized during the offline stage.

The linear transition model at train-time can also be represented as

𝐳̂_{t+1} = Q̂^trans_ψ(𝐳_t, 𝐮_{t+1}, Δt),  (15)

where Δt is the time step size, the function Q̂^trans_ψ is the linear transition model as previously defined (ψ denotes all of the associated learnable parameters, of which there are about 2.35 × 10^5 in total), and 𝐳̂_{t+1} ∈ R^{l_z} denotes the latent variable at t+1 predicted by the linear transition model. To be more specific, Eq. (15) can be expressed as

𝐳̂_{t+1} = 𝐀_t(𝐳_t, Δt) 𝐳_t + 𝐁_t(𝐳_t, Δt) 𝐮_{t+1},  (16)

where 𝐀_t ∈ R^{l_z × l_z} and 𝐁_t ∈ R^{l_z × n_w} are matrices. Consistent with the expressions in Watter et al. (2015), these matrices are given by

vec[𝐀_t] = 𝐖_A h^trans_{ψ′}(𝐳_t, Δt) + 𝐛_A,  (17)

vec[𝐁_t] = 𝐖_B h^trans_{ψ′}(𝐳_t, Δt) + 𝐛_B,  (18)

where vec denotes vectorization, so vec[𝐀_t] ∈ R^{(l_z^2) × 1} and vec[𝐁_t] ∈ R^{(l_z n_w) × 1}. The variable h^trans_{ψ′} ∈ R^{n_trans} represents the final activation output after three transformation blocks (which altogether are referred to as the transformation network).
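A PyTorch sketch of this construction follows (the framework and exact layer details are assumptions; l_z = 50, n_w = 9, and n_trans = 200, the values used in this work, are adopted for concreteness).

```python
# Sketch of the linear transition model of Eqs. (15)-(18).
import torch
import torch.nn as nn

class LinearTransition(nn.Module):
    def __init__(self, lz=50, nw=9, n_trans=200):
        super().__init__()
        self.lz, self.nw = lz, nw
        blocks, d = [], lz + 1                  # input: z_t concatenated with dt
        for _ in range(3):                      # three trans blocks:
            blocks += [nn.Linear(d, n_trans),   #   dense ->
                       nn.BatchNorm1d(n_trans), #   batchNorm ->
                       nn.ReLU()]               #   ReLU
            d = n_trans
        self.trans = nn.Sequential(*blocks)     # produces h_trans
        self.fc_A = nn.Linear(n_trans, lz * lz) # W_A, b_A of Eq. (17)
        self.fc_B = nn.Linear(n_trans, lz * nw) # W_B, b_B of Eq. (18)

    def forward(self, z, u, dt):
        h = self.trans(torch.cat([z, dt], dim=1))
        A = self.fc_A(h).view(-1, self.lz, self.lz)
        B = self.fc_B(h).view(-1, self.lz, self.nw)
        # Eq. (16): z_hat_{t+1} = A_t z_t + B_t u_{t+1}
        return (A @ z.unsqueeze(2) + B @ u.unsqueeze(2)).squeeze(2)

model = LinearTransition()
z, u, dt = torch.randn(4, 50), torch.randn(4, 9), torch.rand(4, 1)
print(model(z, u, dt).shape)   # torch.Size([4, 50])
```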


Fig. 4. Decoder layout.

The ψ′ in Eqs. (17) and (18) is a subset of ψ in Eq. (15), since the latter also includes parameters outside the transformation network. Here 𝐖_A ∈ R^{(l_z^2) × n_trans}, 𝐖_B ∈ R^{(l_z n_w) × n_trans}, 𝐛_A ∈ R^{(l_z^2) × 1}, and 𝐛_B ∈ R^{(l_z n_w) × 1}, where n_trans denotes the dimension of the transformation network. We set n_trans = 200 in the model tested here.

During the online stage (test-time) the linear transition model is slightly different, since the latent variable fed into the model (𝐳̂_t ∈ R^{l_z}) is predicted from the last time step. Therefore, at test-time, Eq. (16) becomes

𝐳̂_{t+1} = 𝐀_t(𝐳̂_t, Δt) 𝐳̂_t + 𝐁_t(𝐳̂_t, Δt) 𝐮_{t+1}.  (19)

Note the only difference is that 𝐳_t on the right-hand side of Eq. (16) is replaced by 𝐳̂_t in Eq. (19).
variable determined from the encoding of 𝐱𝑡 . If the input is instead
Note the only difference is that 𝐳𝑡 on the right-hand side of Eq. (16) is 𝐳̂ 𝑡+1 ∈ R𝑙𝑧 , which is the latent variable predicted at time step 𝑡 + 1 by
replaced by 𝐳̂ 𝑡 in Eq. (19). the linear transition model, Eq. (22) becomes
The test-time formulation of the linear transition model is directly
analogous to the linear representation step in POD-TPWL. In POD- 𝐱̂ 𝑡+1 = 𝑃𝜃dec (̂𝐳𝑡+1 ), (23)
TPWL, since the training step 𝑖 (and thus 𝑖 + 1) is determined based
on the point-selection calculation involving 𝝃̂ 𝑡 , the matrices appearing where 𝐱̂ 𝑡+1 is the predicted state variable at time step 𝑡 + 1. Note that
in the online expression (Eq. (8)) can be considered to be functions of Eq. (22) only appears in the train-time procedure (to compute recon-
𝝃̂ 𝑡 . After some reorganization, Eq. (8) can then be written as structed states), while Eq. (23) has the same form at both train-time
𝑡+1 and test-time.
𝝃̂ = 𝐀TPWL
𝑡 (𝝃̂ 𝑡 )𝝃̂ 𝑡 + 𝐁TPWL
𝑡 (𝝃̂ 𝑡 )𝐮𝑡+1 + 𝐜TPWL
𝑡 , (20)
The detailed structure of the decoder is shown in Fig. 4. Latent vari-
where
ables predicted by the linear transition model (at time step 𝑡+1) are fed
𝐀TPWL
𝑡 = −(𝐉𝑖+1 −1 𝑖+1
𝑟 ) 𝐀𝑟 , 𝐁TPWL
𝑡 = −(𝐉𝑖+1 −1 𝑖+1
𝑟 ) 𝐔𝑟 , to the decoder network as input, and the predicted high-dimensional
(21)
𝐜TPWL
𝑡 = −𝐀TPWL
𝑡 𝝃𝑖 TPWL 𝑖+1
− 𝐁𝑡 𝐮 +𝝃 . 𝑖+1
states are output. The architecture of the decoder is analogous to that
Thus we see that Eq. (19) for the online stage of the embed-to-control of the encoder except it is in reversed order (which is not surprising
formulation is of the same form as Eq. (20) for the online stage of since the decoder is conducting the inverse operation). The decoder
POD-TPWL. The key difference is that matrices 𝐀𝑡 and 𝐁𝑡 in E2C are here is comprised of a dense layer, a stack of three resConv blocks,
determined by a deep-learning model instead of being constructed from a stack of four decoding blocks, and a conv2D layer. The dense layer
derivative matrices from training runs. The vector 𝐜𝑡 does not appear in converts a low-dimensional latent vector to a stack of feature maps
the E2C formulation, since this representation does not entail expansion
(after reshaping). The feature maps are expanded while going through
around nearby solutions.
Note that the transition loss used here involves pairs of time steps stacks of resConv blocks and decoding blocks. The spatial distributions
(𝑡 and 𝑡 + 1) rather than the full sequence. Approaches of this type can of the pressure and saturation fields are sequentially ‘extracted’ from
potentially lead to error accumulation over the full simulation period. the feature maps as we proceed downstream in the decoder. The
As will be demonstrated later (in the results), error accumulation is conv2D layer at the end converts the expanded feature maps to pressure
not significant for the cases considered here, though this is something and saturation fields as the final outputs. More detail on the decoder
that should be monitored. Alternate treatments for the transition loss, is provided in Table 2 in Supplementary Material. The layout of the
in which the loss function is defined over the entire simulation period, decoding block is shown in Fig. 1(b) of Supplementary Material.
are presented in Kani and Elsheikh (2019). Using approaches of this
type, error propagation over time can be effectively controlled. The To determine the learnable parameters 𝜃 in the decoder, of which
formulation in Kani and Elsheikh (2019) is, however, for cases with there are about 2.60 × 106 in total, a prediction loss PD is minimized
fixed well settings. It is not clear if this approach can be applied directly (along with the other losses) in the offline process. More details on this
for cases with time-varying well controls, as are considered here. optimization will be presented later.


Fig. 5. Pressure field predictions with and without L_p (all colorbars in units of psi).

3.5. Loss function with physical constraints

We have described each of the components of the embed-to-control framework. We now explain how the model parameters are determined during the offline stage. The parameters for the embed-to-control framework are φ, ψ, and θ for the encoder, linear transition model, and decoder, respectively. The objective function to be minimized is the total loss function that quantifies the overall performance of the model in predicting the output state variables.

We have briefly introduced the reconstruction loss (L_R), the linear transition loss (L_T), and the prediction loss (L_PD), which comprise the major components of the total loss function. To be more specific, the reconstruction loss for a training data point i can be expressed as

(L_R)_i = {‖𝐱_t − 𝐱̂_t‖_2^2}_i,  (24)

where i = 1, …, N_t, with N_t denoting the total number of data points generated in the training runs. Note that N_t = N_s − n_train, where N_s is the total number of snapshots in the training runs and n_train is the number of training simulations performed. Here N_t and N_s differ because, for a training simulation containing N_tr snapshots, only N_tr − 1 data points can be collected (since pairs of states, at sequential time steps, are required). The variable 𝐱_t is the state variable at time step t from a training simulation, and 𝐱̂_t = P^dec_θ(𝐳_t) = P^dec_θ(Q^enc_φ(𝐱_t)) denotes the states reconstructed by the encoder and decoder.

The linear transition loss for training point i is similarly defined as

(L_T)_i = {‖𝐳_{t+1} − 𝐳̂_{t+1}‖_2^2}_i,  (25)

where 𝐳_{t+1} = Q^enc_φ(𝐱_{t+1}) is the latent variable encoded from the full-order state variable at t+1, and 𝐳̂_{t+1} = Q̂^trans_ψ(𝐳_t, 𝐮_{t+1}, Δt) denotes the latent variable predicted by the linear transition model.

Finally, the prediction loss for training point i is defined as

(L_PD)_i = {‖𝐱_{t+1} − 𝐱̂_{t+1}‖_2^2}_i,  (26)

where 𝐱_{t+1} designates the state variable at time step t+1 from the training simulations, and 𝐱̂_{t+1} = P^dec_θ(𝐳̂_{t+1}) represents the full-order state variable predicted by the ROM. The data mismatch loss is the sum of these losses averaged over all training data points,

L_d = (1/N_t) Σ_{i=1}^{N_t} [(L_R)_i + (L_PD)_i + λ(L_T)_i],  (27)

where λ is a weight term.
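The sketch below evaluates this data-mismatch loss for one mini-batch of training pairs (PyTorch assumed; the λ value shown is a placeholder, not the setting used by the authors).

```python
# Data-mismatch loss L_d of Eqs. (24)-(27) for a mini-batch (sketch).
def data_mismatch_loss(x_t, x_next, u_next, dt,
                       encoder, transition, decoder, lam=0.25):
    """x_t, x_next: (batch, 2*nb) true states; lam is a placeholder weight."""
    z_t = encoder(x_t)
    z_next = encoder(x_next)                    # projected true state at t+1
    z_hat_next = transition(z_t, u_next, dt)    # predicted latent state
    L_R = ((x_t - decoder(z_t)) ** 2).sum(dim=1)              # Eq. (24)
    L_T = ((z_next - z_hat_next) ** 2).sum(dim=1)             # Eq. (25)
    L_PD = ((x_next - decoder(z_hat_next)) ** 2).sum(dim=1)   # Eq. (26)
    return (L_R + L_PD + lam * L_T).mean()      # Eq. (27): average over points
```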
The ROM as described up to this point is a purely data driven model, i.e., the goal of the model is to minimize the pixel-wise difference between the E2C output and the high-fidelity solution (HFS), with the HFS taken as the 'true' reference solution. Physical behavior is, to some extent, inferred by E2C from the input pressure and saturation snapshots, but it is not explicitly enforced. If the ROM is trained using the loss function L_d given in Eq. (27), unphysical effects can, however, be observed. This is illustrated in Fig. 5, where we show predictions for the pressure field at a particular time (the problem setup will be described in detail in Section 4). The high-fidelity solution is shown in Fig. 5(a), and the E2C pressure field based solely on L_d appears in Fig. 5(b). Although the two results are visually similar, the difference map in Fig. 5(c) indicates that the E2C result is not sufficiently smooth, and relatively large errors appear at some spatial locations. This could have a significant impact on well rate predictions, which are an essential ROM output.

To address this issue, we combine the loss for data mismatch with a loss function based on flow physics. Specifically, we seek to minimize the inconsistency in flux between each pair of adjacent grid blocks. Extra weight is also placed on key well quantities. We consider both reconstruction (at time step t) and prediction (at time step t+1). Thus we define the physics-based loss for each data point, (L_p)_i, as

(L_p)_i = {‖𝐤 ⋅ [(∇𝐩_t − ∇𝐩̂_t)_recon + (∇𝐩_{t+1} − ∇𝐩̂_{t+1})_pred]‖_2^2}_i + γ{‖(𝐪^w_t − 𝐪̂^w_t)_recon + (𝐪^w_{t+1} − 𝐪̂^w_{t+1})_pred‖_2^2}_i.  (28)


Here 𝐩_t, 𝐩_{t+1} ∈ R^{n_b} are the pressure fields at time steps t and t+1 from the training data, which are components of the state variables 𝐱_t and 𝐱_{t+1}, and 𝐩̂_t, 𝐩̂_{t+1} ∈ R^{n_b} represent the ROM pressure reconstruction (at time step t, defined after Eq. (24)) and prediction (at time step t+1, defined after Eq. (26)). The variables 𝐪^w_t, 𝐪^w_{t+1} ∈ R^{n_w} are well quantities from the training data, and 𝐪̂^w_t, 𝐪̂^w_{t+1} ∈ R^{n_w} are well quantities reconstructed (at time step t) and predicted (at time step t+1) by the ROM. Recall that n_w is the total number of wells. The variable γ is a parameter that defines the weight for the well-data loss in the loss function L_p. The pressure gradients in Eq. (28) are computed via numerical finite difference. The additional computation associated with these terms is negligible.

The terms on the right-hand side of Eq. (28) correspond to the flux and source terms in Eq. (1). In the examples in this paper, we specify rates for injection wells and BHPs for production wells. With this specification, the loss on injection rates is zero. The key quantity to track for production wells is the well-block pressure for each well. This is because production rate is proportional to the difference between wellbore pressure (BHP in this case, which is specified) and well-block pressure. The proportionality coefficient is the product of phase mobility λ_j and the Peaceman well index (Peaceman, 1978), which depends on permeability, block dimensions and wellbore radius. Because overall well rate in this case is largely impacted by well-block pressure, we set the second term on the right-hand side of Eq. (28) to γ′‖𝐩^w_j − 𝐩̂^w_j‖_2^2, where 𝐩^w_j ∈ R^{n_p} and 𝐩̂^w_j ∈ R^{n_p} (j = t, t+1) denote the true and ROM well-block pressures, and n_p is the number of production wells. Here γ′ is a modified weight that accounts for the well index.
The physics-based loss function is computed by averaging (𝑝 )𝑖 over number of solutions extracted from each run must be established. This
all data points, i.e., issue should be considered in future work. We note finally that the
𝑁𝑡 training runs are completely independent of one another, so they can be
1 ∑
𝑝 = ( ) . (29) performed in parallel if a large cluster or cloud computing is available.
𝑁𝑡 𝑖=1 𝑝 𝑖
The gradient of the total loss function with respect to the model
Combining the loss for data mismatch with this physics-based loss, the parameters (𝜙, 𝜓, 𝜃) is calculated via back-propagation through the
total loss function becomes embed-to-control framework. The adaptive moment estimation (ADAM)
algorithm is used for this optimization, as it has been proven to be
 = 𝑑 + 𝛼𝑝 , (30)
effective for optimizing deep neural networks (Kingma and Ba, 2014).
where 𝛼 is a weight term. Through limited numerical experimentation, The rate at which the model parameters are updated at each iteration
we found 𝛼 = 0.033 and 𝛾 ′ = 20 to be appropriate values for is controlled by the learning rate 𝑙𝑟 . Here we set 𝑙𝑟 = 10−4 .
these parameters. The E2C ROM prediction for the pressure field at a Normalization is an important data preprocessing step, and its
particular time, using the total loss function , is shown in Fig. 5(d), appropriate application can improve both the learning process and
and the difference map appears in Fig. 5(e). We see that the ROM output quality. For saturation we have 𝑆 ∈ [0, 1], so normalization is
prediction is noticeably improved when 𝑝 is included in the loss not required. Pressure and well data, including control variables, are
function. Specifically, the maximum pressure error is reduced from normalized. Normalized rate 𝑞 0 , and pressure (both grid-block pressure
97 psi to 16 psi, and the resulting field is smoother (and thus more and BHP) 𝑝0 , are given by
physical). This demonstrates the benefit of incorporating physics-based 𝑞 − 𝑞min 𝑝 − 𝑝min
losses into the E2C ROM. We note that the (global) flux-loss terms in 𝑞0 = , 𝑝0 = . (31)
𝑞max − 𝑞min 𝑝max − 𝑝min
Eq. (28) contribute more to this error reduction than the well-block-loss
terms. Here 𝑞 denotes simulator rate output in units of m3 /day, 𝑞max and 𝑞min
are the upper and lower injection-rate bounds, 𝑝 is either grid-block
3.6. E2C implementation and training details pressure or production-well BHP (units of psi or bar), 𝑝min is the lower
bound on BHP, and 𝑝max is 1.1 times the highest field pressure observed
To train the E2C model, we use a data set  = {(𝐱𝑡 , 𝐱𝑡+1 , 𝐮𝑡+1 )𝑖 }, 𝑖 = (the factor of 1.1 ensures essentially all data fall within the range).
1, … , 𝑁𝑡 , containing full-order states and corresponding well controls,
where 𝑁𝑡 is the total number of training run data points. In the The workflows for the offline and online components of the E2C
examples in this paper, we simulate a total of 300 training runs. As ROM are summarized in Algorithm 1. In terms of timing, each full-
discussed earlier, this is many more than are used with POD-TPWL order training simulation requires about 60 seconds to run on dual
(where we typically simulate three or five training runs), but we expect Intel Xeon ES-2670 CPUs (24 cores). Our E2C ROM is implemented
a much higher degree of robustness with E2C. By this we mean that E2C using Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2015)
can provide accurate results over a large range of control specifications, backend. The offline training process (excluding training simulation
rather than over a limited range as in POD-TPWL. runtime) takes around 10–12 min on a Tesla V100 GPU node (exact
Part of the reason we use a large number of full-order training sim- timings depend on the memory allocated, which can vary from 8–
ulations with the E2C ROM is that the pressure and saturation solutions 12 GB). The model is applied on 100 test runs, which will be discussed
(snapshots) are extracted at only 20 time steps in each simulation run. in detail in the following section. Nearly all of the test results presented
These time steps correspond to control steps (changes in well controls). are based on the use of 300 training runs, though we also present
Thus we set 𝑁ctrl = 𝑁tr = 𝑁te = 20. This treatment accelerates training summary error statistics using 100 and 200 training runs. Offline
and focuses ROM predictions on quantities of interest at time steps training for these cases requires about the same amount of time as for
when the controls are changing. With 300 training runs, this provides 300 training runs, except for the direct savings in the full-order training
a total number of data points of 𝑁𝑡 = 5700. It is possible, however, that simulations.

10
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 7. Test Case 1: well controls.

4. Results using embed-to-control ROM standard deviation of log-permeability is 0.88. Permeability is taken to
be isotropic, and porosity is set to a constant value of 0.2.
In this section, we describe the model setup for the oil-water sim- The relative permeability functions are given by
ulations and we present simulation results for the deep-learning-based ( ) ( )𝑏
1 − 𝑆𝑤 − 𝑆𝑜𝑟 𝑎 𝑆𝑤 − 𝑆𝑤𝑟
ROM. One of the test cases is considered in detail in this section; results 𝑘𝑟𝑜 (𝑆𝑤 ) = 𝑘0𝑟𝑜 , 𝑘𝑟𝑤 (𝑆𝑤 ) = 𝑘0𝑟𝑤 , (32)
1 − 𝑆𝑤𝑟 − 𝑆𝑜𝑟 1 − 𝑆𝑤𝑟 − 𝑆𝑜𝑟
for two additional test cases are provided in Supplementary Material.
In this section we also present summary error results for all 100 test where 𝑘0𝑟𝑜 = 1.0, 𝑘0𝑟𝑤 = 0.7, 𝑆𝑜𝑟 = 0.3, 𝑆𝑤𝑟 = 0.1, 𝑎 = 3.6, and 𝑏 = 1.5.
cases. Fluid densities are set to 𝜌𝑜 = 800 kg∕m3 and 𝜌𝑤 = 1000 kg∕m3 , and
viscosities are specified as 𝜇𝑜 = 0.91 cp and 𝜇𝑤 = 0.31 cp. Capillary
pressure effects are neglected.
4.1. Model setup The initial pressure at the top of the reservoir is 4712 psi (325 bar),
and the initial water saturation is 0.1. The total number of primary
The geological model, in terms of the log-permeability field, is variables in the system is 3600 × 2 = 7200. The model is run for a
shown in Fig. 6. The locations of the four injection wells and five total of 2000 days. This simulation time frame is the same for both the
production wells are also displayed. The reservoir model contains training and test runs. The injection wells are controlled by specifying
60 × 60 (total of 3600) grid blocks, with each block of dimensions time-varying water rates, and the production wells are controlled by
50 m×50 m×10 m. The correlation structure of the log-permeability field specifying time-varying BHPs. The controls for both production wells
is characterized by an exponential variogram model, with maximum and injection wells are altered every 100 days, which means there are
and minimum correlation lengths of ∼1000 m and ∼500 m, and an 20 control periods. Therefore, we have a total of 9 × 20 = 180 control
azimuth of 45◦ . The arithmetic mean permeability is 158 mD, and the parameters over the entire simulation time frame. The range for the

11
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Algorithm 1: E2C ROM procedures runs are required. In fact, as a deep-learning-based ROM, E2C requires,
Procedure: Offline procedure and is fully compatible with, large numbers of training runs. Thus,
1 Perform training simulations with given control settings; although there are conceptual similarities, the two approaches are
2 Collect snapshots 𝐱𝑡 and controls 𝐮𝑡 and normalize with Eq. (31); distinct in terms of their pre-processing demands and applicability
3 Construct training dataset ; range, which greatly limits our ability to perform direct comparisons
4 for each training epoch do with current implementations.
5 Feed the training dataset through E2C model (i.e., 𝑄𝜙 , 𝑄̂ 𝜓 ,
and 𝑃𝜃 defined in Eqs. (13), (16) and (22)); 4.2. Results for test Case 1
6 Compute loss function  defined in Eq. (30);
7 Get derivatives of  with respect to (𝜙, 𝜓, 𝜃); In this section we present detailed results for a particular test case.
8 Update (𝜙, 𝜓, 𝜃); These include well quantities (injection BHPs and production rates) and
9 end global quantities (pressure and saturation fields). The injection rate and
Procedure: Online procedure BHP profiles for Test Case 1 are displayed in Fig. 7. Here we show the
10 Construct E2C model with parameters (𝜙, 𝜓, 𝜃) determined in water rates for the four injection wells (Fig. 7(a)–(d)), and the BHPs for
the offline procedure; the five production wells (Fig. 7(e)–(i)).
11 Initialize predicted value 𝐱̂ 0 = 𝐱0 ; We now assess the performance of the deep-learning-based ROM
12 for 𝑡 = 1, … , 𝑇 (final simulation time) do for this test case. The time-evolution of the global saturation field is
13 Predict 𝐱̂ 𝑡+1 from 𝐱̂ 𝑡 with E2C model defined in Eqs. (13), first considered. The first column in Fig. 8 displays results for the
(19) and (23); saturation field at 200 days. In Fig. 8(a) the full-order saturation field
14 end (also referred to as the high-fidelity solution, HFS) is shown, and the
15 Compute well quantities and perform any subsequent analysis; corresponding E2C ROM result for saturation is presented in Fig. 8(d).
The color scale indicates water saturation value (thus red denotes
water). The close visual agreement between Fig. 8(a) and (d) suggests
that the deep-learning-based ROM is able to provide accurate results
injection rates is between 1500 and 6500 bbl/day (between 238 and for this quantity. The level of agreement between the two solutions is
1033 m3 /day). This is a very large range for well operation compared quantified in Fig. 8(g), where the difference between the HFS and ROM
with what is often considered with ROMs (Jin and Durlofsky, 2018; solutions is displayed. Note that the colorbar scale here is very different
Jin et al., 2020). The range for production BHPs is 3770 to 3988 psi than that in Fig. 8(a) and (d). The error between the ROM and HFS
(between 260 and 275 bar). results is clearly very small.
The controls for the training and test runs are specified as follows. In order to better quantify the predictive ability of the E2C ROM,
For each injection well, we randomly sample, from a uniform distribu- we introduce the concept of the ‘closest training run.’ We use this
tion between 2000 and 6000 bbl/day, a baseline injection rate 𝑞𝑤 base . term to denote the specific training run, out of the 300 training runs
Then, at each control period, we sample uniformly a perturbation 𝑞𝑤 ′ performed, that most closely resembles (in a particular sense) the test
over the range [−500, 500] bbl/day. The rate for the control period is case. The ‘distance’ between the test run and each of the training runs
then prescribed to be 𝑞𝑤 base + 𝑞 ′ . Producer BHPs at each control step is quantified in terms of the Euclidean distance between their vectors
𝑤
are sampled uniformly over the range [3770, 3988] psi. For production of normalized control parameters, and the ‘closest training run’ (𝑘∗ ) is
wells there is not a baseline BHP, and the settings from control step the training run with the minimum distance. Specifically,
to control step are uncorrelated. This approach for specifying injection 𝑘∗ = arg min ‖𝐔0te − 𝐔0tr,𝑘 ‖22 , (33)
rates results in a wide range of solution behaviors (e.g., saturation 𝑘
distributions), since well-by-well injection varies considerably from run where 𝑘 = 1, … , 300, denotes the index for the training runs, 𝐔te ∈
to run. This treatment also avoids the averaging effect that can occur R𝑛𝑤 ×𝑁ctrl represents the control inputs for the test run, 𝐔tr,𝑘 ∈ R𝑛𝑤 ×𝑁ctrl
if injection rates are not referenced to a baseline value 𝑞𝑤 base . Well
indicates the control inputs for training run 𝑘, and the superscript 0
specifications for a test case, generated using this procedure, are shown designates normalized pressures and rates in the controls, as per the
in Fig. 7. normalizations in Eq. (31).
We perform 300 training simulations to construct the E2C ROM, Eq. (33) provides a very approximate indicator of the ‘closest train-
except where otherwise indicated. As discussed in previous papers ing run.’ This definition has the advantage of simplicity, though more
(e.g., Jin and Durlofsky, 2018), the types of well schedules shown in involved (and computationally demanding) assessments would be ex-
Fig. 7 are intended to represent the well control profiles evaluated pected to provide closer training solutions. These would, however,
during optimization procedures, where the goal is to maximize oil require the application of an approach along the lines of the point-
production or profitability, or to minimize environmental impact or selection procedure used in POD-TPWL (Jin and Durlofsky, 2018).
some measure of risk. We note finally that the dimension of the E2C This would entail computing a measure of distance over many time
latent space, 𝑙𝑧 , is set to 50. steps for each training run. Since we have 300 training runs here
As discussed earlier, comparisons of E2C results with those from (as opposed to three or five with POD-TPWL), this would become
POD-TPWL are not performed in this work. This is because the range very time consuming. Thus we apply the simple approach defined in
over which the well settings are varied, and the number of training runs Eq. (33), with the recognition that more sophisticated procedures could
performed, are very different between the two procedures, so direct be devised.
comparisons are not feasible. In particular, existing POD-TPWL imple- We now return to the global saturation results. Fig. 8(j) shows the
mentations typically involve 3–5 training runs, and relatively small difference between the ‘closest training run’ (determined as we just
perturbations in the test-run controls relative to those used in training described and simulated at high fidelity), and the test-case saturation
runs (during optimization, retraining can be used with POD-TPWL fields. The colorbar scale is the same as in Fig. 8(g). The advantage
if necessary). Algorithmic treatments, specifically the point-selection of applying the deep-learning-based ROM is evident by comparing
strategy used to determine the solution around which linearization Fig. 8(g) and (j). More specifically, the error in Fig. 8(g) is about an
is performed, would require detailed reformulation if large numbers order of magnitude less than the differences evident in Fig. 8(j).
of training runs were used with POD-TPWL. With the E2C ROM, by The second and third columns in Fig. 8 display analogous results
contrast, the controls span a much broader range, but 100–300 training for the saturation fields at 1000 and 1800 days. The evolution of the

12
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 8. Test Case 1: saturation field at 200, 1000 and 1800 days. (For interpretation of the references to color in this figure, the reader is referred to the web version of this
article.).

saturation field with time is apparent, and the deep-learning-based Note that in the ROM solutions, we do observe some local (un-
ROM solutions (Fig. 8(e) and (f)) are again seen to be in close visual physical) extrema within the saturation plumes. These are perhaps
agreement with the HFS (Fig. 8(b) and (c)). The error maps in Fig. 8(h) most apparent in Fig. 8(f), in the two lower plumes. At the outer
(bottom) edges of both plumes, we see small ‘islands’ of yellow pixels
and (i) further quantify the accuracy of the deep-learning-based ROM.
within a light green background. These local fluctuations correspond to
These errors are quite small compared with the difference maps be- nonmonotonic saturation profiles, which are not observed in the HFS
tween the ‘closest training run’ and the HFS, shown in Fig. 8(k) and (see Fig. 8(c)). Smaller fluctuations in similar regions can also be seen
(l), which further illustrates the effectiveness of the ROM. in Fig. 8(e). These unphysical extrema are a minor issue here since the

13
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 9. Test Case 1: pressure field at 1000 days (all colorbars in units of psi).

Fig. 10. Test Case 1: production rates for Well P1. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

14
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 11. Test Case 1: production rates for Well P2. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

difference maps show small overall discrepancies between the ROM and the 100 test runs. This corresponds to a speedup factor of 375, which
high-fidelity solutions. In some cases, however, such fluctuations could is less than that achieved with the GPU, but still very substantial.
be a cause for concern. A potential remedy for this would be to add
a term to the physics-based loss function such that local extrema in 4.3. Results and error measurement for all test cases
saturation that are inconsistent with the governing flow equations are
penalized.
In this section we assess the accuracy of the ROM results for the full
The global pressure field at particular times is also of interest. In
ensemble of 100 test cases. We first consider field cumulative oil and
Fig. 9(a) and (b) we display the HFS and ROM pressure solutions
water production, which are given by
at 1000 days. The close visual agreement suggests that the deep-
𝑛𝑝
𝑇 ∑
learning-based ROM is able to provide accurate (and smooth) pressure
predictions. Fig. 9(c) shows the error map for the ROM solution, where 𝑄𝑗 = 𝑞𝑗𝑤 (𝑡)d𝑡. (34)
∫0
𝑤=1
we see that errors are indeed very small. These errors are much less
than those for the ‘closest training run,’ which are shown in Fig. 9(d). Here 𝑗 = 𝑜, 𝑤 denotes the phase, 𝑛𝑝 is the total number of production
In many subsurface flow applications the well responses are of wells, 𝑇 designates the total simulation time, and 𝑞𝑗𝑤 (𝑡) represents the
primary interest. E2C ROM predictions for these quantities will now be fluid rate for phase 𝑗 at time step 𝑡.
assessed. Since in this problem we specify injection rates and produc- In Fig. 13 we present crossplots of 𝑄𝑜 and 𝑄𝑤 for the HFS and ROM
tion well BHPs, the quantities of interest are injection well BHPs and solutions for the 100 test cases. The three ×’s on each plot indicate the
oil and water production rates. Figs. 10 and 11 display the phase flow results for Test Cases 1, 2 and 3. It is evident that these cases are quite
rates for Wells P1 and P2, which are the wells contributing most to total different in terms of 𝑄𝑜 and 𝑄𝑤 , and in this sense span the range of the
field production. Fig. 12 shows the BHP responses for all four injection 100 test cases. We see that the points in both plots fall near the 45◦
wells. In all figures the black curves represent the full-order (reference) line, which demonstrates that our ROM solutions are in close agreement
HFS, the red curves are the deep-learning-based ROM results, and the with the HFS. The results for 𝑄𝑤 in Fig. 13(b) indicate that the ROM
blue curves are the results for the ‘closest training run.’ A high degree under-predicts cumulative water production. The under-prediction is
of accuracy between the ROM and HFS results is consistently observed. relatively small, however, as the range covered in this plot is narrow.
The level of agreement in these essential quantities is enhanced through Note also that a slight over-prediction for cumulative oil production is
the additional weighting placed on well-block quantities in loss func- evident in Fig. 13(a).
tion 𝑝 (see Eq. (28)). Recall that we determine all well quantities using
The errors in Fig. 13 appear to be due, at least in part, to a trend
a standalone well model to ensure consistency between the HFS and
toward the slight under-prediction of water saturation at production-
ROM computations.
well grid blocks at late time. This often occurs when water production
We provide results for two more examples (Test Cases 2 and 3)
rates are the highest. These errors could potentially be reduced by
in Supplementary Material. These results corroborate our observations
modifying the loss function used in training to include mismatch in
here; namely, that the deep-learning-based ROM is able to accurately
production-block saturation values. This would, however, likely act to
predict both global saturation and pressure distributions and well
compromise ROM prediction accuracy for other quantities of interest.
quantities of interest.
Since the errors in Fig. 13 are small, we did not pursue this line of
Finally, we discuss the timings for the high-fidelity and ROM runs.
The high-fidelity test cases take 60 seconds each to simulate using AD- investigation, but this could be a topic for future work.
GPRS on a node with dual Intel Xeon ES-2670 CPUs (24 cores). The We now introduce a number of error measures, which will be used
full batch of 100 test cases can be evaluated using the E2C ROM in to assess the general performance of the E2C ROM. These error metrics
about 1.25 seconds on a Tesla V100 GPU node with 8 GB of memory follow those used in Jin et al. (2020). The relative error for oil or water
allocated. A direct comparison thus indicates a speedup factor of 4800, production rate, for a single production well 𝑝, is defined as:
though this value involves comparing a GPU to a CPU. If we instead use 𝑇 | 𝑗,𝑝 𝑗,𝑝 |
∫0 |𝑞ROM (𝑡) − 𝑞HFS (𝑡)|d𝑡
the same CPU for the online E2C ROM runs as was used for the full- 𝑒𝑝𝑗 = | | , (35)
𝑇 | 𝑗,𝑝 |
order simulations (Intel Xeon ES-2670), it takes around 16 seconds for ∫0 |𝑞HFS (𝑡)|d𝑡
| |

15
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 12. Test Case 1: injection BHPs. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

where 𝑗 = 𝑜, 𝑤 is the fluid phase, 𝑞 𝑗,𝑝 (𝑡) is the oil or water production Error in global quantities is also of interest. We define global
rate at time 𝑡 for production well 𝑝, the subscripts HFS and ROM denote pressure and saturation error as:
the high-fidelity and ROM results, and 𝑇 is the total simulation time. ∑ 𝑛𝑏 𝑇 | |
∫ | 𝑣𝑘
𝑘=1 0 | ROM
− 𝑣𝑘HFS |d𝑡
We define the error for overall production rate, 𝐸𝑟 , in terms of 𝑒𝑜 and
𝐸𝑣 = | , (39)
𝑒𝑤 for all production wells, as: ∑ 𝑛𝑏 𝑇 | 𝑘 |
∫ |𝑣
𝑘=1 0 | HFS |
| d𝑡
𝑛
1 ∑ 𝑝
𝑝
𝐸𝑟 = (𝑒 + 𝑒𝑝𝑤 ), (36) where 𝑣𝑘 denotes the global variable of interest in grid block 𝑘 (pressure
𝑛𝑝 𝑝=1 𝑜 𝑝𝑘 or saturation 𝑆 𝑘 ), and 𝑛𝑏 is the total number of grid blocks in the
where 𝑛𝑝 is the total number of production wells. Similarly, the relative model.
error in injection BHP for a single injection well 𝑖 is defined as: These four error quantities are displayed as the red points in Fig. 14.
We also evaluate these errors for the ‘closest training run’ for all test
𝑇 | |
∫0 |𝑝𝑤,𝑖 (𝑡) − 𝑝𝑤,𝑖 (𝑡)|d𝑡 cases. In the plots, the points are ordered by increasing error for the
𝑒𝑖BHP = | ROM HFS |
, (37)
𝑇 | | ‘closest training run’ (blue points). Results for Test Cases 1, 2 and 3
∫0 |𝑝𝑤,𝑖 (𝑡)|d𝑡
| HFS | are indicated in each plot. We see that the ROM errors are consistently
where 𝑝𝑤,𝑖 (𝑡) denotes the injection BHP at time 𝑡 for injection well 𝑖. very small, while the errors for the ‘closest training run’ are large in
The overall injection well BHP error 𝐸BHP is then given by: many cases. Interestingly, the ROM errors do not appear to depend on
𝑛𝑖 the error associated with the ‘closest training run.’ This is a desirable
1 ∑ 𝑖
𝐸BHP = 𝑒 , (38) feature as it suggests a high degree of robustness in the E2C ROM.
𝑛𝑖 𝑖=1 BHP
It might also be of interest to evaluate the E2C ROM in terms
where 𝑛𝑖 is the total number of injection wells. of mass balance error. This could theoretically be accomplished by

16
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 13. Cumulative oil and water production for all 100 test cases.

Fig. 14. Errors for quantities for interest. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

computing the net mass flux into each grid block and subtracting the the degree of deviation from zero would indicate mass balance error. In
mass accumulation (for well blocks, the source term would also enter our treatment, however, we predict pressure and saturation fields only
this computation). This quantity should sum to zero for each block, and at 20 particular time steps; i.e., every 100 days. If we try to compute

17
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Fig. 15. ROM error with different numbers of training runs. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

mass balance error using sequential solutions, this 100-day ‘effective’ Table 1
time step is too large to provide meaningful results. In order to assess Errors and percentiles for test cases.

the mass balance error in our method, we would need to train (and Case 𝐸𝑟 𝐸BHP 𝐸𝑆 𝐸𝑝
then predict) for a significantly larger number of time steps. This would Test Case 1 Error 0.11 0.0054 0.042 0.0033
enable us to have a sufficiently small time step in the mass balance Percentile 37 15 70 98
computation. Test Case 2 Error 0.19 0.0064 0.050 0.0031
We now briefly consider the use of smaller numbers of training Percentile 96 29 99 95
runs in the construction of the E2C ROM. For these cases we present Test Case 3 Error 0.16 0.012 0.042 0.0017
only summary error results. Fig. 15 displays the four relative errors Percentile 77 84 73 24
considered above, in terms of box plots, for 100, 200 and 300 training
runs. In each box, the central orange line indicates the median error,
and the bottom and top edges of the box show the 25th and 75th
5. Concluding remarks
percentile errors. The ‘whiskers’ extending out from the boxes indicate
the minimum and maximum errors. There is significant improvement in
ROM accuracy as we proceed from 200 to 300 training runs. In future In this work, we introduced a deep-learning-based reduced-order
work it will be useful to establish approaches to determine the required modeling procedure for subsurface flow simulation. The procedure was
number of training runs. adapted from the existing embed-to-control (E2C) procedure, though
Because it is difficult to display the errors for Test Cases 1, 2 and 3 we introduced some key modifications relative to the formulation
in the box plots in Fig. 15, we present them in Table 1. These results are
in Watter et al. (2015). Essentially, the ROM consists of an auto-
with 300 training runs. Note that the average values for 𝐸𝑟 , 𝐸BHP , 𝐸𝑆
encoder (AE) and a linear transition model. In our E2C formulation,
and 𝐸𝑝 across all 100 test cases are about 0.14, 0.02, 0.04 and 0.002,
respectively. The error values for the three test cases shown in the table an additional physics-based loss function was combined with the data-
can be seen to represent a reasonable spread among the full set of test mismatch loss function to enhance consistency with the governing flow
cases. It is of interest to observe that the four errors do not appear to equations. Although it is based on deep-learning concepts and methods,
be closely correlated within a particular test case. For example, in Test the various E2C ROM steps were shown to be analogous to those used
Case 1, 𝐸𝑝 is in the 98th percentile, while 𝐸BHP is in the 15th percentile. in the well-developed physics/numerics-based POD-TPWL ROM.

18
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

In most of our evaluations, we performed 300 training runs in the Acknowledgments


offline step. Excluding the run time for the training simulations, the
offline model construction required 10-12 minutes for ROM training We are grateful to the Stanford University Smart Fields Consortium
using a Tesla V100 GPU. Online (runtime) speedups of over a factor (SFC) for partial funding of this work. We thank the Stanford Center
of 1000, relative to AD-GPRS full-order simulations, were observed for for Computational Earth & Environmental Science (CEES) for providing
the case considered. Given the offline costs and online speedup, the the computational resources used in this study. We also thank Yuke Zhu
use of this ROM is appropriate when many (related) simulation runs and Aditya Grover for useful discussions, and Oleg Volkov for help with
are required. This is the case in production optimization computations,
the AD-GPRS software.
data assimilation and uncertainty assessments (though in this work only
a single geological model was considered).
The deep-learning-based ROM was tested on 2D oil-water reser- Appendix A. Supplementary data
voir simulation problems involving a heterogeneous permeability field.
Large variations (relative to training runs) in injection and production Supplementary material related to this article can be found online
well control settings were prescribed in the test cases. A total of 100 test at https://doi.org/10.1016/j.petrol.2020.107273.
cases were considered. ROM accuracy was assessed for key quantities The code is available at https://github.com/lonelysun1990/e2c_
of interest, including well injection BHPs, phase production rates, and jpse.
global pressure and saturation fields. The E2C ROM was shown to be Please see the data as ‘‘data.zip’’ in the google drive https://drive.g
consistently accurate over the full set of test runs. ROM error was seen oogle.com/drive/folders/1P-R6uNkzw4lbVjgOIoe42okom08MtAN7?us
to be much lower than that for the ‘closest training run’ (appropriately p=sharing.
defined). Error was found to increase, however, if 100 or 200 training
runs were used instead of 300.
References
In future work, the E2C ROM should be extended to more compli-
cated 3D problems and tested on realistic cases involving, e.g., 106 or
Abadi, M., et al., 2015. TensorFlow: large-scale machine learning on heterogeneous
more cells. Extension to 3D can be approached by replacing conv2D systems. https://www.tensorflow.org/.
layers with conv3D layers. The online computation time for the 3D Amsallem, D., Zahr, M.J., Farhat, C., 2012. Nonlinear model order reduction based on
ROM will still be negligible compared to the full-order simulation local reduced-order bases. Internat. J. Numer. Methods Engrg. 92 (10), 891–916.
model, but training times will increase with problem size. As noted Canchumuni, S.W., Emerick, A.A., Pacheco, M.A.C., 2019. History matching geological
in Tang et al. (2019), some of the training computations scale linearly facies models based on ensemble smoother and deep generative models. J. Pet. Sci.
Eng. 177, 941–958.
with the number of grid blocks in the model, so training times for large
Cardoso, M., Durlofsky, L.J., 2010. Linearized reduced-order models for subsurface flow
3D models could require many hours. Thus it will be useful to explore simulation. J. Comput. Phys. 229 (3), 681–700.
a range of network architectures to identify designs for which training Carlberg, K., Bou-Mosleh, C., Farhat, C., 2011. Efficient non-linear model reduction via
can be accomplished in reasonable time. A systematic hyper-parameter a least-squares Petrov–Galerkin projection and compressive tensor approximations.
tuning mechanism should also be established for more complicated (2D Internat. J. Numer. Methods Engrg. 86 (2), 155–181.
or 3D) models. The relationship between the best hyper-parameters, Carlberg, K., Farhat, C., Cortial, J., Amsallem, D., 2013. The GNAT method for nonlinear
number of training runs, and the complexity of the model should be model reduction: effective implementation and application to computational fluid
dynamics and turbulent flows. J. Comput. Phys. 242, 623–647.
investigated.
Chaturantabut, S., Sorensen, D.C., 2010. Nonlinear model reduction via discrete
Future investigations should also consider the use of the E2C ROM empirical interpolation. SIAM J. Sci. Comput. 32 (5), 2737–2764.
for production optimization computations. The use of the method with Chollet, F., et al., 2015. Keras. GitHub, https://github.com/fchollet/keras.
a range of optimization algorithms should be considered. Importantly, Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated
the E2C ROM is expected to be applicable for use with global as well recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
as local optimization algorithms. This is in contrast to existing POD- Efendiev, Y., Gildin, E., Yang, Y., 2016. Online adaptive local-global model reduction
based ROMs, which can only be expected to be accurate in more limited for flows in heterogeneous porous media. Computation 4 (2), 22.
Eigen, D., Puhrsch, C., Fergus, R., 2014. Depth map prediction from a single image
neighborhoods and are thus most suitable for local-search methods. It
using a multi-scale deep network. In: Advances in Neural Information Processing
is also of interest to explore the potential of predicting flow responses Systems, Montreal, Canada, pp. 2366–2374.
with changing well locations. If this is successful, the ROM could be Florez, H., Gildin, E., 2019. Model-order reduction of coupled flow and geomechanics
applied for well location optimization, or combined well location and in ultra-low permeability ULP reservoirs (SPE paper 193911). In: SPE Reservoir
control optimization problems. Simulation Conference, Galveston, Texas, USA.
Finally, it will be of interest to extend the ROM to systems with Gers, F.A., Schmidhuber, J., Cummins, F., 1999. Learning to forget: continual predic-
tion with LSTM. In: 9th International Conference on Artificial Neural Networks,
geological uncertainty. Although in this work we found an AE to out-
Edinburgh, UK.
perform both a VAE and a UAE, as is demonstrated in Supplementary Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward
Material, it is possible that VAE or UAE could be preferable for prob- neural networks. In: Thirteenth International Conference on Artificial Intelligence
lems involving geological uncertainty. This is because VAE and UAE and Statistics, Sardinia, Italy.
can effectively reduce over-fitting for systems with uncertainty. Thus Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks. In:
these treatments may be effective for these more general problems. Fourteenth International Conference on Artificial Intelligence and Statistics, Ft.
Lauderdale, Florida, USA.
Declaration of competing interest Gonzalez, F.J., Balajewicz, M., 2018. Deep convolutional recurrent autoencoders for
learning low-dimensional feature dynamics of fluid systems. arXiv preprint arXiv:
1808.01346.
The authors declare that they have no known competing finan- Grover, A., Ermon, S., 2018. Uncertainty autoencoders: learning compressed repre-
cial interests or personal relationships that could have appeared to sentations via variational information maximization. arXiv preprint arXiv:1812.
influence the work reported in this paper. 10539.
He, J., Durlofsky, L.J., 2014. Reduced-order modeling for compositional simulation by
CRediT authorship contribution statement use of trajectory piecewise linearization. SPE J. 19 (05), 858–872.
He, J., Durlofsky, L.J., 2015. Constraint reduction procedures for reduced-order
Zhaoyang Larry Jin: Conceptualization, Methodology, Data cura- subsurface flow models based on POD-TPWL. Internat. J. Numer. Methods Engrg.
103 (1), 1–30.
tion, Software, Investigation, Formal analysis, Visualization, Validation,
He, J., Sætrom, J., Durlofsky, L.J., 2011. Enhanced linearized reduced-order models for
Writing- original draft, Writing- review & editing. Yimin Liu: Method-
subsurface flow simulation. J. Comput. Phys. 230 (23), 8313–8341.
ology, Software, Writing- review & editing. Louis J. Durlofsky: Super- He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition.
vision, Project administration, Funding acquisition, Writing- review & In: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas,
editing. Nevada, USA.

19
Z.L. Jin et al. Journal of Petroleum Science and Engineering 192 (2020) 107273

Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected Raissi, M., Perdikaris, P., Karniadakis, G., 2019. Physics-informed neural networks:
convolutional networks. In: Computer Vision and Pattern Recognition, Honolulu, a deep learning framework for solving forward and inverse problems involving
USA. nonlinear partial differential equations. J. Comput. Phys. 378, 686–707.
Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training Rewienski, M., White, J., 2003. A trajectory piecewise-linear approach to model order
by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. reduction and fast simulation of nonlinear circuits and micromachined devices.
Jiang, R., Durlofsky, L.J., 2019. Implementation and detailed assessment of a GNAT IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 22 (2), 155–170.
reduced-order model for subsurface flow simulation. J. Comput. Phys. 379, Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for
192–213. biomedical image segmentation. In: International Conference on Medical Image
Jin, Z.L., Durlofsky, L.J., 2018. Reduced-order modeling of CO2 storage operations. Int. Computing and Computer-Assisted Intervention, Munich, Germany.
J. Greenh. Gas Control 68, 49–67. Tan, X., Gildin, E., Florez, H., Trehan, S., Yang, Y., Hoda, N., 2019. Trajectory-based
Jin, Z.L., Garipov, T., Volkov, O., Durlofsky, L.J., 2020. Reduced-order modeling of DEIM (TDEIM) model reduction applied to reservoir simulation. Comput. Geosci.
coupled flow and quasistatic geomechanics. SPE J. 25 (01), 326–346. 23 (1), 35–53.
Johnson, J., Alahi, A., Li, F.-F., 2016. Perceptual losses for real-time style transfer and Tang, M., Liu, Y., Durlofsky, L.J., 2019. A deep-learning-based surrogate model for
super-resolution. In: European Conference on Computer Vision, Amsterdam, The data assimilation in dynamic subsurface flow problems. arXiv preprint arXiv:
Netherlands. 1908.05823.
Kani, J.N., Elsheikh, A.H., 2019. Reduced-order modeling of subsurface multi-phase Temirchev, P., Gubanova, A., Kostoev, R., Gryzlov, A., Voloskov, D., Koroteev, D.,
flow models using deep residual recurrent neural networks. Transp. Porous Media Simonov, M., Akhmetov, A., Margarit, A., Ershov, A., 2019. Reduced order reservoir
126 (3), 713–741. simulation with neural-network based hybrid model (SPE paper 196864). In: SPE
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint Russian Petroleum Technology Conference, Moscow, Russia.
arXiv:1412.6980. Temirchev, P., Simonov, M., Kostoev, R., Burnaev, E., Oseledets, I., Akhmetov, A.,
Kingma, D.P., Dhariwal, P., 2018. Glow: generative flow with invertible 1x1 con- Margarit, A., Sitnikov, A., Koroteev, D., 2020. Deep neural networks predicting oil
volutions. In: Advances in Neural Information Processing Systems, Montreal, movement in a development unit. J. Pet. Sci. Eng. 184, 106513.
Canada. Trehan, S., Durlofsky, L.J., 2016. Trajectory piecewise quadratic reduced-order model
Kingma, D.P., Welling, M., 2013. Auto-encoding variational Bayes. arXiv preprint for subsurface flow, with application to PDE-constrained optimization. J. Comput.
arXiv:1312.6114. Phys. 326, 446–473.
Kostorz, W., Muggeridge, A., Jackson, M., Moncorge, A., 2019. Non-intrusive reduced Wang, Z., Xiao, D., Fang, F., Govindan, R., Pain, C.C., Guo, Y., 2018. Model identifi-
order modelling for reconstruction of saturation distributions (SPE paper 193831). cation of reduced order fluid dynamics systems using deep learning. Internat. J.
In: SPE Reservoir Simulation Conference, Galveston, Texas, USA. Numer. Methods Fluids 86 (4), 255–268.
Laloy, E., Hérault, R., Jacques, D., Linde, N., 2018. Training-image based geostatistical Watter, M., Springenberg, J., Boedecker, J., Riedmiller, M., 2015. Embed to control: A
inversion using a spatial generative adversarial neural network. Water Resour. Res. locally linear latent dynamics model for control from raw images. In: Advances in
54 (1), 381–406. Neural Information Processing Systems, Montreal, Canada.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to Xiao, D., Fang, F., Pain, C., Hu, G., 2015. Non-intrusive reduced-order modelling of the
document recognition. Proc. IEEE 86 (11), 2278–2324. Navier–Stokes equations based on RBF interpolation. Internat. J. Numer. Methods
Lee, K., Carlberg, K.T., 2020. Model reduction of dynamical systems on nonlinear Fluids 79 (11), 580–595.
manifolds using deep convolutional autoencoders. J. Comput. Phys. 404, 108973. Yang, Y., Ghasemi, M., Gildin, E., Efendiev, Y., Calo, V., 2016. Fast multiscale reservoir
Lee, K., Lim, J., Ahn, S., Kim, J., 2018. Feature extraction using a deep learning simulations with POD-DEIM model reduction. SPE J. 21 (06), 2141–2154.
algorithm for uncertainty quantification of channelized reservoirs. J. Pet. Sci. Eng. Yoon, S., Alghareeb, Z.M., Williams, J.R., 2016. Hyper-reduced-order models for
171, 1007–1022. subsurface flow simulation. SPE J. 21 (06), 2128–2140.
Liu, Y., Durlofsky, L.J., 2020. Multilevel strategies and geological parameterizations for Zahr, M.J., Avery, P., Farhat, C., 2017. A multilevel projection-based model order
history matching complex reservoir models. SPE J. 25 (01), 81–104. reduction framework for nonlinear dynamic multiscale problems in structural and
Liu, Y., Sun, W., Durlofsky, L.J., 2019. A deep-learning-based geological pa- solid mechanics. Internat. J. Numer. Methods Engrg. 112 (8), 855–881.
rameterization for history matching complex models. Math. Geosci. 51, Zhang, J., Cheung, S.W., Efendiev, Y., Gildin, E., Chung, E.T., 2019. Deep model
725–766. reduction-model learning for reservoir simulation (SPE paper 193912). In: SPE
Mo, S., Zhu, Y., Zabaras, N., Shi, X., Wu, J., 2019. Deep convolutional encoder- Reservoir Simulation Conference, Galveston, Texas, USA.
decoder networks for uncertainty quantification of dynamic multiphase flow in Zhou, Y., 2012. Parallel General-Purpose Reservoir Simulation with Coupled Reservoir
heterogeneous media. Water Resour. Res. 55 (1), 703–728. Models and Multisegment Wells (Ph.D. thesis). Stanford University.
Peaceman, D.W., 1978. Interpretation of well-block pressures in numerical reservoir Zhu, Y., Zabaras, N., 2018. Bayesian deep convolutional encoder–decoder networks for
simulation. SPE J. 18 (03), 183–194. surrogate modeling and uncertainty quantification. J. Comput. Phys. 366, 415–447.
Zhu, Y., Zabaras, N., Koutsourelakis, P.-S., Perdikaris, P., 2019. Physics-constrained
deep learning for high-dimensional surrogate modeling and uncertainty quantifica-
tion without labeled data. J. Comput. Phys. 394, 56–81.

20

You might also like