MSC Thesis Final Version Stephan de Hoop
Stephan de Hoop
Delft University of Technology
Copyright © 2017 Section for Petroleum Engineering, Department of Petroleum Engineering and Geosciences.
All rights reserved.
Determination of Relevant Spatial Scale in
Reservoir Simulation
by
Stephan de Hoop
Master of Science
in Petroleum Engineering and Geosciences
Acknowledgments
First of all, I would like to thank my supervisor Dr. Denis Voskov who has inspired me throughout my whole
Master program and without whom this project would not have been possible. His excitement and ambitious
mindset have always been encouraging. I am very grateful for all the help he has given me the past two years. I
am very much looking forward to our future research endeavors.
Secondly, I would like to express my sincere gratitude to my supervisor Dr. Femke Vossepoel, who has always provided me with valuable insights and kept me on schedule when I needed it most. Her positive attitude towards doing research is invaluable, as is her willingness to always find time for a proper discussion.
Both supervisors motivated me and gave me the opportunity to present at the SIAM Conference, an experience
so valuable to me. Their contribution and support are highly appreciated.
Many thanks to Dr. Andre Jung for his immediate interest in my thesis project and willingness to assist at all
times. He has been almost an additional supervisor and always showed me a warm welcome at Baker Hughes. Our
discussions have greatly helped me grow as a scientist.
I would also like to thank Prof. Dr. Giovanni Bertotti and Dr. Joep Storms for being part of my committee. Besides this, both of them were a big inspiration for me to pursue a Master at the Delft University of Technology, for which I am very grateful.
At this point, I would like to thank my parents for their unconditional love and support, both emotionally as
well as financially. I would also like to thank my brother Richard, who has shaped me into the person I am today.
Furthermore, I am grateful for having Robin and Bart as my closest friends, who put up with my constant nagging about being busy. I also want to thank Jasper for all the moments we shared in our educational journey.
Finally, I want to thank Natalia Papatrecha, who took care of me when I needed it most. She has stood next to me, motivating me whenever I felt I couldn't do it. I will also never forget the long nights of hard work that led to both our success. We made it!
Contents

Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Problem Statement
  1.2 Objectives and method
  1.3 Document Structure

2 Geological Modeling
  2.1 Modeling approaches
  2.2 Geological models
    2.2.1 Training Images
    2.2.2 Ensembles generated using MPS
    2.2.3 Ensembles generated using FLUMY
    2.2.4 Petrophysical properties

3 Theoretical background
  3.1 Governing equations for Flow and Transport in Porous Media
  3.2 Flow based upscaling
  3.3 Discrete Cosine Transform
  3.4 Distance-based modeling
    3.4.1 Formal definitions
    3.4.2 Multi-Dimensional Scaling
    3.4.3 Clustering and model selection
    3.4.4 Kernel trick
Appendices
B Subset statistics for full fine-scale ensembles for various properties and ensembles
References
List of Figures

4.1 Comparison of DCT on actual signal and log of signal, with and without outlier.
4.2 Comparison of two-dimensional DCT on globally upscaled transmissibility.
4.3 Examples of actual DCT on Strebelle3.
4.4 DCT of Hf ensemble Strebelle3 and the dominant basis-vector of the ensemble.
4.5 Geometric interpretation of the dominant basis-vector of the Strebelle3 ensemble.
4.6 Other interpretation of DCT.
4.7 Hierarchical Strebelle3 ensemble, transmissibility field.
4.8 Evolution of characteristic scales across various ensemble scales, Strebelle3.
4.9 Truncated and coarser signal using DCT.
4.10 Example of one-dimensional field responses, Strebelle3 ensemble.
4.11 Effect of misfit between fine- and coarse scale caused by well placement with respect to the paleoflow orientation.
4.12 MDS representation of one-dimensional oil field rate, Strebelle3.
4.13 Illustration of water cut as property for distance in MDS, and the effect of including more or fewer elements.
4.14 Stacking MDS projections in time to obtain the Uncertainty Trajectory.
4.15 Option one for computing the MDS of several ensemble scales.
4.16 Option two for computing the MDS of several ensemble scales.
4.17 Uncertainty Trajectory for two realizations computed for several properties at various scales (well and field).
4.18 Mean of integrated distance between finest Uncertainty Trajectory and remaining ensemble scales, for various properties.
4.19 Uncertainty Trajectory after Orthogonal Procrustes solution w.r.t. the finest scale.
4.20 Normalized mean of integrated distance after applying Orthogonal Procrustes at each time-slice.
4.21 Difference between two MDS approaches.
4.22 Difference between binary and bimodal permeability distributions.
B.1 Coarse distance approximating cumulative oil production for LineDrive2 ensemble, well 1.
B.2 Coarse distance approximating water cut for Flumy3 ensemble, well 1.
List of Abbreviations

2D Two-dimensional.
CFL Courant–Friedrichs–Lewy.
HF High Fidelity.
NG Net to Gross.
OP Orthogonal Procrustes.
TI Training Image.
1 Introduction

Figure 1.1: The permeability measurements across a variety of length scales, based on (Nordahl & Ringrose, 2008).

Ultimately, the objective of any reservoir characterization and simulation procedure is to assess the economic risk associated with the particular investment. To minimize this risk, quantification of the spatial uncertainty is inevitable. Please note that there is no such thing as “right” or “correct” uncertainty. This
is explained in great detail by Caers (2011), who uses the analog of weather predictions: if the weather
forecast predicts a 60% chance of rain, and it doesn’t rain, there is no way in which the quality or
“correctness” of this uncertainty can be verified. In reservoir modeling, the same principle applies. Knowing the “right” uncertainty would require knowing the true state of the Earth (or subsurface), which would in turn nullify the need for uncertainty quantification in the first place (a paradox). Uncertainty assessment is therefore subjective but, if done in a systematic way, can lead to improved decision making (Caers, 2011).
Large ensemble sizes are typically required to represent the aforementioned spatial uncertainty (depending on the expected complexity of the reservoir), and the resulting quantification of dynamic uncertainty (also referred to as response uncertainty) is done through forward simulation of the static models in time. Depending on the size of the discretized problem, this can become a computationally expensive task and, when explicitly including micro-scale fluctuations, it becomes virtually impossible1 considering the current computational resources (Nordahl & Ringrose, 2008). Characterization of micro-scale pore-structures using tomography and micro-imaging is an active and ongoing research area, see e.g. (Jungreuthmayer et al., 2015), (Sok et al., 2010) or (C. Chen, Hu, Westacott, & Loveless, 2013); however, inverse estimation of these micro-scale fluctuations from macroscopic well measurements (such as pressure and flow rate) is currently an unexplored area.
Setting aside over-fitting of a model2, the finest scale at which reservoir simulation can be performed is generally preferred because of its numerical accuracy. Given a certain modeling purpose, knowledge of the existence of a relevant (macroscopic) spatial scale at which the reservoir model should be formulated can greatly assist in the uncertainty quantification process. If such a relevant spatial scale exists, the response uncertainty at this particular scale should converge to the finest-scale response uncertainty. Knowing this relevant spatial scale, if coarser than the finest scale, makes it possible to decrease the computational costs while still making (financial) decisions on the same response uncertainty.
Investigating the existence of a static and dynamic relevant spatial scale requires considering the non-linear mapping (also referred to as the transfer function) from the parameter space to the solution space (production data, etc.). Even a simple dead oil model may exhibit non-linear behavior in its solution, mainly due to the dependency of relative permeability on saturation. The importance of this concept is illustrated in figure 1.2, where two seemingly different realizations (in terms of the Euclidean distance between their model parameters) exhibit almost equivalent behavior in their solution/response.
1 Considering the vast amount of realizations required to represent the spatial uncertainty associated with the characterization of micro-
scale fluctuations.
2 Over-fitting is the problem of learning/explaining the training data rather than recognizing patterns, which has negative implications for the predictive value. A prime example is given by Bishop (2006, Chapter 1, p. 7) in the context of fitting a polynomial curve to some measurements. Increasing the degrees of freedom (model parameters) in this example improves the fit with the observations, however at the cost of spurious oscillations that decrease the predictive value of the model away from the data points.
Figure 1.2: [Panels: facies distributions for realizations Nr = 19 and Nr = 78 (with injector and producer locations marked), and cumulative water production Np,w [10^5 m^3] versus time t [days] at Wells 1–4, shown for the full ensemble and the two highlighted realizations.] Depicting statistically equivalent flow behavior in terms of water production in each well, for two realizations which are seemingly different w.r.t. the Euclidean distance of the model parameters.
• Understand the extent to which coarse information can be effectively used to simulate flow behaviour.
In order to achieve the thesis objectives, the following workflow is proposed in figure 1.3.
The first step of the workflow is the generation of M “geologically realistic” reservoir models, denoted
the High fidelity (Hf) M-dimensional ensemble of models. The hierarchical ensembles are created by
coarsening (and refining) the Hf ensemble. Analyzing the coarsening effect on the prior (static) information
is done using the Discrete Cosine Transform (DCT) on the transmissibility fields across all ensemble scales.
Dynamic data is obtained by forward simulating all ensemble scales in time using AD-GPRS (Stanford,
2012) and (D. Voskov, Zhou, & Volkov, 2012), representing the fluid flow of a two-phase dead oil model
(slightly compressible) in a 2D reservoir. Subsequently, distance-based modeling of the response uncertainty in metric space is performed to quantify the coarsening effect and, where possible, to identify a relevant dynamic spatial scale.
Figure 1.3: Simple schematic depicting the workflow adopted in this work.
2 Geological Modeling
and variability of key reservoir parameters, as well as provide more accurate history matching results (Keogh, Martinius, & Osland, 2007) and (Demyanov, Rojas, Arnold, & Christie, 2013). There are several methods for establishing a geologically realistic reservoir model, e.g. a process-based or a stochastic approach. Besides geological realism, the ability to condition on data is another key element in the attempt to understand or predict subsurface flow processes and their associated uncertainty.
Process-based modeling offers highly detailed, geologically realistic reservoir models; however, due to the forward simulation in time, the ability to condition on hard data is limited (Michael et al., 2010) and (Hoffimann, Scheidt, Barfod, & Caers, 2017). Another drawback of process-based modeling is the computational effort of acquiring a single realization/geological model. Creating a large ensemble of models by varying the input and boundary conditions (e.g. sea level rise, sediment supply) can become a tedious task. There are cases where it is shown that data conditioning for process-based modeling is possible in principle (Karssenberg, Tornqvist, & Bridge, 2001). However, in that particular case, it is achieved by a large number of Monte Carlo simulations, and Karssenberg et al. (2001) mention that this is infeasible for real-world applications without severely increasing the computing power.
Sequential Indicator Simulation (SIS) and other variogram-based simulation techniques allow for appropriate data conditioning but lack geological realism. Accurate simulation of curvilinear geological features such as extensively continuous channel sands is impossible using merely two-point statistical correlation functions (variograms) (Strebelle & Levy, 2008), (Remy, Boucher, & Wu, 2009), (Kim, Lee, Lee, Rhee, & Shin, 2017) and (Lee, Lim, Choe, & Lee, 2017). Object-based modeling techniques are more appropriate for simulating continuous channels, but lack flexible conditioning capabilities (Hoffimann et al., 2017), especially when “dense” data is available (Strebelle & Levy, 2008). Object-based methods have also been shown to have difficulties accurately representing complex channel interactions (Seifert & Jensen, 2000).
Multiple Point Statistics (MPS) has recently been accepted as an appropriate alternative to the aforementioned modeling approaches. The main reasons for this are the realistic depositional facies distributions, the ease with which MPS honors both hard and soft data, and the low computational costs (Hashemi, Javaherian, Ataee-pour, Tahmasebi, & Khoshdel, 2014), (Kim et al., 2017), (Strebelle, 2002), (Pyrcz, Boisvert, & Deutsch, 2008), (Caers & Zhang, 2004) and (Mariethoz & Caers, 2014). MPS relies on a Training Image (TI) which conceptually represents the geological patterns and spatial variability. The variogram, as a measure of geological heterogeneity, is replaced in MPS by the TI (Caers & Zhang, 2004). One of the current challenges is the use of process-based models as TIs. This is challenging because of the complexity, non-stationarity and non-repetitiveness of these TIs, see e.g. (Michael et al., 2010) and (Hoffimann et al., 2017).
This work utilizes the MPS implementation in the software JewelSuite™, which is based on IMPALA™ (Improved Parallel Multiple-point Algorithm Using a List Approach), see (Straubhaar, Renard, Mariethoz, Froidevaux, & Besson, 2011) and (Straubhaar & Malinverni, 2014). For a more general overview of MPS, see (Mariethoz & Caers, 2014).
Figure 2.1: The left image depicts the training image used in (Strebelle, 2002), where yellow represents the highly permeable reservoir facies and purple the low-permeability non-reservoir facies. The right image displays the training image taken from (Mariethoz & Caers, 2014), depicting depositional characteristics of a part of the Ganges Delta.
1 Simplest in terms of production strategy, namely a five-spot with one injector in the center, surrounded by four producers at the edges of the reservoir.
Figure 2.2: Three different ensembles generated using the Strebelle training image. M is the dimension of the ensemble; throughout this work the number of ensemble members is kept at 100 realizations. The five-spot pattern with one injector and four producers is used for constraining the stochastic simulation (sand at every well location). The size of the reservoir is rather small, namely 1000[m] × 1000[m], with ∆x = ∆y = 10[m] leading to Nx × Ny = 100 × 100 on the Hf-scale.
Figure 2.3: Depicting the average sand quantity of the Strebelle3 ensemble, where Nreal stands for the number of realizations included in the averaging. A value of one means sand occurs at that particular cell in every realization, while zero means no sand occurs at that location in any of the included realizations. It serves as a quick visual check for the validity (in terms of spatial uncertainty) of the stochastic simulations: if strong patterns are recognized, the stochastic simulation might be clone-stamping (copying) the training image.
Figure 2.4: Three different ensembles generated using the Strebelle training image, where M = 100. The line-drive pattern with four injectors and four producers is used for constraining the stochastic simulation (sand at every well location). The size of the reservoir is rather small, namely 2400[m] × 2400[m], with ∆x = ∆y = 20[m] leading to Nx × Ny = 120 × 120 on the Hf-scale.
Figure 2.5: One ensemble generated using the Ganges Delta training image, where M = 100. The repeated five-
spot pattern with four injectors and five producers is used for constraining the stochastic simulation (sand at every
well location). The size of the reservoir is rather small, namely 4800[m] × 4800[m], with a ∆x = ∆y = 20[m] leading
to Nx × Ny = 240 × 240 on the Hf-scale.
Figure 2.6: Depicting the sampling of smaller reservoir models from the larger FLUMY simulation (for ensemble Flumy3). Note that realizations are sampled at least a fixed distance apart, to keep the spatial uncertainty reasonable. The large red dots indicate the origins of the local reservoir models. Note that only the constrained models are shown; many more local reservoir models are sampled, yet rejected because no sand occurs at all five well locations. Also note that the statistics, shown in figure 2.7, are slightly biased towards larger Net to Gross (NG).
Figure 2.7: Comparison of the statistics obtained from the constrained local realizations w.r.t. all sampled location realizations. Realizations that pass the posed constraint of having sand at all wells show an increase in NG.
Figure 2.8: Three different ensembles generated using the software FLUMY, where M = 100. The five-spot pattern with one injector and four producers is used as the search template for possible reservoir models in the large simulation domain, see figure 2.6. The size of the reservoir is rather small, namely 1000[m] × 1000[m], with ∆x = ∆y = 10[m] leading to Nx × Ny = 100 × 100 on the Hf-scale.
Figure 2.9: Depicting the average sand quantity of the Flumy3 ensemble, where Nreal stands for the number of realizations included in the averaging. The predominant paleoflow orientation of the Flumy3 ensemble is visible as the large averaged sand facies in the NW–SE direction.
$$\frac{\partial(\rho_i \phi S_i)}{\partial t} + \nabla \cdot (\rho_i \mathbf{v}_i) - \rho_i q_i = 0, \qquad i \in \{o, w\} \qquad (3.1)$$

where ρ_i is the density, S_i the saturation and q_i the source term of the i-th phase respectively, φ is the porosity of the porous medium, ∇ is the nabla operator and v_i is the Darcy velocity of the i-th phase, given as
$$\mathbf{v}_i = -\frac{k_{r,i}}{\mu_i}\,\mathbf{K}\left(\nabla P_i - \rho_i \mathbf{g}\right), \qquad i \in \{o, w\} \qquad (3.2)$$

where k_{r,i} is the relative permeability, μ_i the viscosity and P_i the pressure of the i-th phase respectively, K is the permeability tensor and g is the directional gravitational acceleration, defined as g∇z. The constraint equation typically used to close the above system of governing equations is

$$S_w + S_o = 1. \qquad (3.3)$$
Equation (3.1) accurately describes the process of water injection into a dead oil reservoir. However, for the general case, it is easier to write the conservation equation for each component separately and then reduce it for the particular physics. This approach was adopted in the design of AD-GPRS, see
(D. V. Voskov & Tchelepi, 2012; D. Voskov, 2012) for details. The conservation of mass in general form is written as:

$$F_c = \frac{\partial}{\partial t}\left(\phi \sum_p x_{cp}\rho_p S_p\right) + \nabla \cdot \sum_p x_{cp}\rho_p \mathbf{v}_p - \sum_p x_{cp}\rho_p q_p = 0, \qquad c = 1, \ldots, n_c \qquad (3.4)$$
Local
The main idea of local upscaling algorithms is that the properties of the coarse cell are solely determined
by solving a local flow problem, where the domain of the local flow problem exactly comprises the target
coarse cell (Y. Chen, Durlofsky, Gerritsen, & Wen, 2003). Since the steady-state single-phase pressure
equation, subject to particular boundary conditions, is solved to obtain the coarse properties, the solution
is heavily dependent on boundary conditions. Besides that, the global pressure field is strongly influenced
by the global permeability field (He & Durlofsky, 2006). This means that coarse-scale properties obtained
from local flow problems don’t always accurately capture the global flow patterns from the underlying
fine-scale properties. A positive aspect of local upscaling, though, is that the local flow problems can be solved independently from each other, leading to easy parallelization (Y. Chen et al., 2003) and (He & Durlofsky, 2006). For a more extensive overview of local upscaling techniques see (Kitanidis, 1990), (Durlofsky, 1991) and (Durlofsky, 2005).
Extended local
Extended local upscaling is a natural expansion of the local formulation, i.e. it extends the domain used for the local flow problem to include surrounding fine-scale information. The size of the surrounding region is typically denoted by r, due to its similarity with a radius1. The extended local formulation therefore differs in the boundary conditions used for the local flow problem and hence in its approximation of the coarse-scale property, generally an improvement over the purely local upscaling methods. For a more extensive overview of the extended local formulation see e.g. (Holden & Lia, 1992), (Gomez-Hernandez, Journel, et al., 1994), (Wu, Efendiev, & Hou, 2002), (Wen, Durlofsky, & Edwards, 2003a) or (Wen, Durlofsky, & Edwards, 2003b).
Global

Global upscaling is the algorithm used in this work to obtain the coarse-scale ensemble members. Global upscaling involves solving the fine-scale steady-state single-phase pressure equation on the global domain, defined as

$$\nabla \cdot \mathbf{v} = \nabla \cdot \left(\frac{\mathbf{K}}{\mu}\,\nabla(P - \rho g z)\right) = q_{well}, \qquad (3.5)$$

or sometimes also expressed in dimensionless form as

$$\nabla \cdot \mathbf{v} = \nabla \cdot \left(\mathbf{K}\,\nabla P\right) = q_{well}. \qquad (3.6)$$
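As an illustration of the kind of flow problem being solved, the following minimal one-dimensional sketch (not the thesis implementation; permeabilities and boundary pressures are made up) solves the incompressible single-phase pressure equation in the form of (3.6) with fixed pressures at both ends:

```python
import numpy as np

# 1D analogue of the steady-state single-phase pressure equation (3.6):
# d/dx (k dP/dx) = 0 with fixed pressures at both ends and no source terms.
k = np.array([100.0, 100.0, 1.0, 100.0, 100.0])  # fine-scale permeability per cell
n = len(k)
P_left, P_right = 2.0, 1.0

# Interface transmissibilities via harmonic averaging of neighbouring cells
# (unit viscosity, unit cell size)
T = 2.0 * k[:-1] * k[1:] / (k[:-1] + k[1:])

A = np.zeros((n, n))
b = np.zeros(n)
for i in range(n):
    if i > 0:                      # flux across left interface of cell i
        A[i, i] += T[i - 1]
        A[i, i - 1] -= T[i - 1]
    if i < n - 1:                  # flux across right interface of cell i
        A[i, i] += T[i]
        A[i, i + 1] -= T[i]
# Dirichlet boundaries via half-cell transmissibilities
A[0, 0] += 2 * k[0]
b[0] += 2 * k[0] * P_left
A[-1, -1] += 2 * k[-1]
b[-1] += 2 * k[-1] * P_right

P = np.linalg.solve(A, b)          # fine-scale pressure solution
```

At steady state with no sources, the flux T[i] * (P[i] - P[i+1]) is the same across every interface, which is exactly the property the flow-based upscaling exploits.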
The solution to equation (3.5) for the pressure, P, is then used to obtain the coarse property (permeability or transmissibility). As shown by Durlofsky et al. (2006) and Chen et al. (2003), directly upscaled transmissibility results in a better representation of the fine-scale pressure field. This is accentuated for highly discontinuous fine-scale permeability fields, which can be found in channelized reservoirs. The reason for this is that, in the step of computing the coarse transmissibility by harmonic averaging of the upscaled permeability, more weight is put on lower permeability values, thereby underestimating the total flow and increasing the approximation error (Y. Chen et al., 2003). Avoiding the harmonic averaging step by directly upscaling the transmissibility is therefore advised, in order to better capture strongly discontinuous permeability fields.
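The weighting argument can be checked with a tiny numerical sketch (illustrative values only, not from the thesis): for permeabilities in series, the harmonic mean is pulled towards the lowest value, whereas the arithmetic mean is pulled towards the highest.

```python
import numpy as np

# Two fine-scale permeabilities in series across a coarse block,
# e.g. a channel sand (1000 mD) next to shale (1 mD).
k = np.array([1000.0, 1.0])

arithmetic = k.mean()                  # dominated by the sand
harmonic = len(k) / np.sum(1.0 / k)    # dominated by the shale (~2 mD)

print(arithmetic, harmonic)
```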
As mentioned before, the pressure solution to equation (3.5) is used to obtain the interface transmissibilities of the coarse cells. This is done by considering the following principle: for a Two-Point Flux Approximation, the interface transmissibility multiplied with the pressure difference across the interface separating two coarse cells should give the flux across that particular interface. This can be written for the interface (i + 1/2, j), separating the two coarse cells (i, j) and (i + 1, j), as

$$(q_x^c)_{i+1/2,\,j} = (T_x^c)_{i+1/2,\,j}\left(P_{i,\,j}^c - P_{i+1,\,j}^c\right), \qquad (3.7)$$

where $(q_x^c)_{i+1/2,\,j}$ is the coarse flux across the interface (i + 1/2, j), simply defined as the integrated fine-scale fluxes across the coarse interface, $(T_x^c)_{i+1/2,\,j}$ is the coarse transmissibility, and $P_{i,\,j}^c$ and $P_{i+1,\,j}^c$ are the coarse pressures obtained by arithmetically averaging the fine-scale pressures contained in each block respectively. Rewriting this equation for the coarse transmissibility results in

$$(T_x^c)_{i+1/2,\,j} = \frac{(q_x^c)_{i+1/2,\,j}}{P_{i,\,j}^c - P_{i+1,\,j}^c}. \qquad (3.8)$$
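A minimal sketch of equation (3.8), with made-up fine-scale fluxes and pressures, assuming a 2 × 2 patch of fine cells per coarse block:

```python
import numpy as np

# Fine-scale fluxes through one coarse interface (illustrative values)
fine_fluxes = np.array([0.8, 1.1, 0.9, 1.2])
q_c = fine_fluxes.sum()                             # integrated coarse flux

# Fine-scale pressures in the two adjacent coarse blocks (i, j) and (i+1, j)
P_fine_left = np.array([[2.0, 1.9], [2.1, 1.8]])
P_fine_right = np.array([[1.2, 1.1], [1.3, 1.0]])
P_c_left = P_fine_left.mean()                       # arithmetic averaging
P_c_right = P_fine_right.mean()

T_c = q_c / (P_c_left - P_c_right)                  # eq. (3.8)
```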
This way of obtaining coarse properties from global flow was first shown by White and Horne (1987), followed by Holden and Nielsen (2000), who formulated it in terms of an optimization problem. Global upscaling in highly heterogeneous reservoirs has one downfall, however: the resulting transmissibility values might be very large (or even negative!), see (Holden & Nielsen, 2000) and (Y. Chen et al., 2003) for more analysis on the matter. An iterative procedure is generally used to obtain a positive-definite transmissibility matrix, which is very much desired from a numerical analysis point of view. The iterative
Figure 3.1: Schematic showing a fine grid of N_x^f × N_y^f = 8 × 6; using an upscaling factor of two in both the x- and y-direction results in a coarse grid of size N_x^c × N_y^c = 4 × 3. The coarse pressure, indicated by the blue circle, is obtained by arithmetically averaging the fine-scale pressures contained in the particular coarse cell. Cells in the x-direction are counted using the index i, cells in the y-direction using the index j. The coarse interface transmissibility, separating the coarse blocks (i, j) and (i + 1, j) and denoted (i + 1/2, j), is estimated by equation (3.8), where the red arrow indicates the integrated coarse flux.
procedure replaces negative values with geometrically averaged transmissibility values, after which the coarse pressure is recomputed using the new coarse transmissibilities. Next, the new coarse transmissibility is estimated from equation (3.8), where $(q_x^c)_{i+1/2,\,j}$ is unchanged. Generally, convergence to zero negative transmissibility values is reached within five iterations. When convergence is not reached within ten iterations, the remaining negative transmissibility values are finally set to a geometrically averaged transmissibility. The rate of convergence is heavily dependent on the degree of heterogeneity, in particular the contrast in permeability. When convergence is not reached, horrendously large transmissibility values might occur after several iterations (5-10). This doesn't negatively affect the pressure solution in the upscaling procedure, however, it might cause numerical complications when these transmissibility values are used to perform a simulation with different conditions.
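The iterative repair described above can be sketched as follows. This is a hypothetical interface, not the thesis code: `recompute_pressure` stands in for the coarse pressure solve and is assumed to return the coarse pressure differences per interface.

```python
import numpy as np

def repair_transmissibility(T_c, T_geom, q_c, recompute_pressure, max_iter=10):
    """Sketch of the iterative repair of negative coarse transmissibilities:
    replace negatives with geometrically averaged values, recompute the coarse
    pressure field, and re-estimate T from eq. (3.8) with q_c held fixed."""
    for _ in range(max_iter):
        negative = T_c < 0
        if not negative.any():
            return T_c                       # converged: zero negative values
        T_c = np.where(negative, T_geom, T_c)
        dP = recompute_pressure(T_c)         # coarse pressure differences per interface
        T_c = q_c / dP                       # eq. (3.8), coarse fluxes unchanged
    # fall back: any remaining negatives are set to the geometric average
    return np.where(T_c < 0, T_geom, T_c)
```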
Besides flow-based upscaled transmissibility, a similar formula for a flow-based upscaled well index for well α can be derived, given by the following equation:

$$\mathrm{WI}^{c}_{\mathrm{well}\,\alpha} = \frac{q_{\mathrm{well}\,\alpha}}{P_{i,\,j}^{c} - P_{\mathrm{well}\,\alpha}}, \qquad (3.10)$$
where the coarse superscript is left out for the coarse-scale well flux and pressure, since for strictly vertical wells in a 2D model they are generally equal to the fine-scale well flux and pressure2. Note that in three dimensions, a distinct coarse-scale well flux and pressure exist when upscaling in the z-direction, causing the fine and coarse scales to deviate even for strictly vertical wells. The resulting coarse-scale well flux is simply the summation of the fine-scale well fluxes (over each fine-scale grid block containing a well inside the new coarse block) and the coarse-scale pressure is the average well pressure (averaged over each fine-grid
2 Unless the well is horizontal or the upscaling ratio is quite extreme, up to the case where multiple wells end up in the same coarse grid
block.
containing a well with prescribed target BHP inside the new coarse grid-block). For upscaling, we used a modified MATLAB script developed by Dr. Brad Mallison at Stanford University.
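A one-line numerical sketch of equation (3.10), with made-up values:

```python
# Coarse well index for a vertical well: the summed well flux divided by the
# difference between the coarse block pressure and the well (BHP) pressure.
# All numbers below are illustrative, not from the thesis.
q_well = 12.0        # summed fine-scale well fluxes in the coarse block
P_block = 150.0      # arithmetically averaged coarse block pressure [bar]
P_well = 120.0       # prescribed bottom-hole pressure [bar]

WI_coarse = q_well / (P_block - P_well)
```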
where g_k is the k-th frequency component of the transformation and α_k is a scaling factor, defined as

$$\alpha_k = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, \ldots, N-1 \end{cases} \qquad (3.12)$$
This scaling factor ensures that the DCT-2 matrix associated with this linear transformation is orthogonal (orthonormal column vectors), meaning

$$\Phi^T \Phi = \Phi \Phi^T = I, \qquad (3.13)$$

where Φ is the N × N square DCT-2 matrix implicitly defined by equation (3.11), I is the identity matrix and the superscript T indicates the matrix transpose. Note that it is not necessary to write the Hermitian (conjugate) transpose, since only real basis-vectors and input signals are concerned in this work. The key observation from equation (3.13) is: Φ^T = Φ^{-1}. The above-mentioned linear transformation can be defined in matrix-vector form as

$$\mathbf{g} = \Phi \mathbf{y}, \qquad (3.14)$$
which can be interpreted as a simple change of basis, namely changing the original basis of the vector (the standard basis on R^N) to a basis of mutually orthogonal cosine functions oscillating at different frequencies.
3 Optimal refers here to its variance distribution property and rate-distortion function.
4 For wide sense stationary processes.
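The orthogonality in equation (3.13) can be verified numerically by constructing the DCT-2 matrix directly from its definition with the scaling of equation (3.12) (a sketch, using N = 8):

```python
import numpy as np

# Build the N x N orthonormal DCT-2 matrix:
# entry (k, n) = alpha_k * cos(pi/N * (n + 1/2) * k)
N = 8
n = np.arange(N)
k = np.arange(N)[:, None]
alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
Phi = alpha * np.cos(np.pi / N * (n + 0.5) * k)

# Orthogonality (eq. 3.13): Phi^T Phi = Phi Phi^T = I, hence Phi^T = Phi^{-1}
assert np.allclose(Phi.T @ Phi, np.eye(N))
assert np.allclose(Phi @ Phi.T, np.eye(N))

# Forward transform (eq. 3.14) and inverse by transposition
y = np.random.default_rng(0).standard_normal(N)
g = Phi @ y
assert np.allclose(Phi.T @ g, y)
```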
Note that each row of Φ corresponds to a basis element of the DCT basis since left multiplying equation
(3.14) with Φ T gives
ΦT g = I y (3.15)
where it is evident that the columns of Φ T are the basis-vectors of the DCT domain, such that Φ =
ϕ 0 ϕ 1 . . . ϕ N−1 ]T where ϕ i is the i-th DCT basis-vector, and the columns of I are the standard
[ϕ
basis-vectors of RN respectively. Extension of the DCT basis for signals in multiple dimensions is simply
by tensor product of the one-dimensional basis vectors. For a two-dimensional data matrix Y , this results
in performing a one-dimensional DCT on the rows of Y followed by another one-dimensional DCT on the
columns, or vice versa. Explicitly, this two dimensional DCT can be expressed as
\[
g_{k_1,k_2} = \alpha_{k_1,k_2} \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} y_{n_1,n_2} \cos\!\left[\frac{\pi}{N_1}\left(n_1+\tfrac{1}{2}\right)k_1\right] \cos\!\left[\frac{\pi}{N_2}\left(n_2+\tfrac{1}{2}\right)k_2\right] \tag{3.16}
\]
where the subscripts ni and ki indicate the i-th spatial and DCT component respectively, and αk1 ,k2 the
scaling factor which ensures orthogonality. The two-dimensional linear transformation can be written in
matrix-vector form as
\[
G = \left[\Phi_2 (\Phi_1 Y)^T\right]^T = \Phi_1 Y \Phi_2^T \tag{3.17}
\]
where Φ i is the i-th one-dimensional square DCT-2 matrix of size Ni × Ni , i ∈ {1, 2} and Y is the N1 × N2
data matrix.
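Equation (3.17) can be checked numerically; a small Python sketch (illustrative only) confirming that the matrix form and the sequential one-dimensional row/column transforms agree:

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

rng = np.random.default_rng(0)
N1, N2 = 6, 9
Y = rng.standard_normal((N1, N2))
Phi1, Phi2 = dct2_matrix(N1), dct2_matrix(N2)

G = Phi1 @ Y @ Phi2.T                  # matrix form of equation (3.17)
G_seq = (Phi2 @ (Phi1 @ Y).T).T        # 1D DCT of the columns, then of the rows
assert np.allclose(G, G_seq)
assert np.allclose(Phi1.T @ G @ Phi2, Y)   # the 2D transform is invertible
```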
The strong energy compaction property⁵ of the DCT is utilized in this work to identify dominant basis-vectors. The dominant basis-vector is referred to as the characteristic scale/frequency of the input signal, since a DCT-2 basis-vector is fully described by its amplitude and frequency, therefore representing a characteristic scale of the input signal.
This is illustrated in Figure 3.2 with a trivial one-dimensional example, constituting a cosine wave, A cos(2πfx + φ) = 2 cos(4πx), being mapped to the DCT domain. It is evident that the transformed signal can be approximated by, or in this case entirely expressed as, one coefficient multiplying its associated basis-vector.
Note that the dimensionless frequency number k is not equal to the spatial frequency f . The reason for this
is that the transformation is scale independent in its dimensionless form, n, k ∈ {0, 1, . . . , N − 1}, assuming
a uniformly spaced signal with increment 1. Consequently this means that the frequency number k itself
doesn’t reveal information on actual spatial frequency f in [cycles/m]. Since in this work, the DCT is
computed for an ensemble of models across various model scales (different number of nodes), it is important
to understand the difference. For example, two similar signals with different number of points/observations
(coarse and fine-scale representation) shouldn’t have different dominant/characteristic kmax , something
which is not immediately evident from equation (3.11). Consider the following equation
\[
y(x) = A\cos(2\pi f x + \phi) \tag{3.18}
\]
which is the most general form of a cosine wave, where A is the amplitude of the wave, f the spatial frequency in [cycles/m], x the spatial position in [m] and φ is the phase in [radians]. Now, expressing the k-th basis-vector in the following way
\[
\varphi_k(n) = \cos\!\left[\frac{\pi}{N\Delta x}\left(n\Delta x + \frac{\Delta x}{2}\right)k\right] \tag{3.19}
\]
where ∆x is the constant increment between two points/observations of the discrete signal y. Note that
this is the exact same expression for the k-th basis-vector as in equation (3.11). Substituting L = N∆x and
x = n∆x, and solving the following equation for f :
\[
2\pi f x = \frac{\pi k x}{L}
\]
5 The magnitude of a DCT coefficient is strongly correlated with its contribution to the original signal. It is assumed that the largest (absolute value) DCT coefficient describes the dominant behavior of the process/signal.
obviously yields f = k/(2L). This means that two cosine waves, similarly defined as in Figure 3.2, with e.g. Ncoarse = 100 and Nfine = 1000, would yield the same dominant frequency number, namely k = 8 [-], and the same spatial frequency f = 2 [cycles/m], as one would expect⁶.
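This invariance is easy to confirm numerically; a Python sketch (illustrative only) sampling y(x) = 2cos(4πx) on [0, 2] m at the half-sample positions xₙ = (n + ½)Δx, consistent with the DCT-2 basis:

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

def dominant_k(N, L=2.0):
    dx = L / N
    x = (np.arange(N) + 0.5) * dx         # half-sample positions of the DCT-2 grid
    y = 2.0 * np.cos(4.0 * np.pi * x)     # the signal of Figure 3.2
    g = dct2_matrix(N) @ y
    return int(np.argmax(np.abs(g)))

# Coarse and fine representations of the same signal over the same distance
# share the dominant frequency number k = 8 ...
assert dominant_k(100) == dominant_k(1000) == 8
# ... and hence the same spatial frequency f = k/(2L) = 8/4 = 2 [cycles/m].
```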
Figure 3.2: Visualization of the one-dimensional Discrete Cosine Transform (DCT) on a simple cosine wave. The
left figure shows the cosine wave in the spatial domain whereas the right figure displays the transformed wave in
the DCT (frequency) domain. This trivial example illustrates the strong energy compaction property of the DCT,
particularly that the signal is almost entirely described by k = 8 while the remaining coefficients are almost zero.
Note that frequency coordinate k is not equal to f , the spatial frequency in [cycles/m], see explanation above.
Important to note is that the DCT is a unitary transformation, see equations (3.11) and (3.13), such that the energy⁷ of the input signal is equal to that of the output signal (Jain, 1989), since
\[
\|g\|_2^2 = \sum_{k=0}^{N-1} |g_k|^2 = g^T g = (\Phi y)^T \Phi y = y^T \Phi^T \Phi y = y^T y = \|y\|_2^2 \tag{3.20}
\]
where ‖·‖₂² is the squared l2-norm. This property is important considering a truncated DCT, where truncated means setting the smaller DCT coefficients to zero. Namely,
\[
\|g - \tilde{g}\|_2^2 = \|y - \tilde{y}\|_2^2 \tag{3.21}
\]
where g̃ and ỹ are the truncated DCT and input signals, respectively. This can be proven similarly by
\[
\|g - \tilde{g}\|_2^2 = (g - \tilde{g})^T (g - \tilde{g}) = (\Phi y - \Phi\tilde{y})^T (\Phi y - \Phi\tilde{y}) = \left[\Phi(y - \tilde{y})\right]^T \Phi (y - \tilde{y}) = (y - \tilde{y})^T \Phi^T \Phi (y - \tilde{y}) = (y - \tilde{y})^T (y - \tilde{y}) = \|y - \tilde{y}\|_2^2 \tag{3.22}
\]
6 The finer signal in the DCT domain simply contains many more higher-frequency basis-vectors. However, the spatial frequencies of the first Ncoarse basis-vectors are identical since the two signals are defined over the same distance.
7 Here defined as the square of the l2-norm, i.e. the inner product of a vector with itself or its squared length.
Since generally the energy in the DCT domain is highly compacted (few large coefficients and a lot of
very small ones), a truncated DCT approximation with only a few non-zero coefficients may result in a
small error (in the squared l2-norm) in the spatial domain as well. An example of this is shown in Figure
3.3.
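The equalities (3.20) and (3.21) can be verified with a few lines of Python (illustrative only), truncating a random signal to its M largest-magnitude coefficients:

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

rng = np.random.default_rng(1)
N = 100
y = rng.standard_normal(N)
Phi = dct2_matrix(N)
g = Phi @ y

M = 10                                  # number of retained coefficients, M << N
keep = np.argsort(np.abs(g))[-M:]       # the index set S of equation (3.23)
g_trunc = np.zeros(N)
g_trunc[keep] = g[keep]
y_trunc = Phi.T @ g_trunc               # truncated signal back in the spatial domain

assert np.isclose(np.sum(g**2), np.sum(y**2))                          # (3.20)
assert np.isclose(np.sum((g - g_trunc)**2), np.sum((y - y_trunc)**2))  # (3.21)
```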
Besides showing the operations required for obtaining a truncated signal ỹ, this is also an appropriate place to mention another important fact about the linear transformation and the DCT in particular. If the signal ŷ is random, then the DCT-coefficients, ĝ, are also random⁸, and therefore the spatial characteristics of a random signal (or ensemble of random signals, as are generated in this work) should be measured
with respect to its expected value. Formally, the truncated input signal by inversion of truncated DCT-coefficients is obtained by introducing the diagonal restriction matrix R, where r_{i,j} is the (i, j)-th entry of the restriction matrix R defined as
\[
r_{i,j} = \begin{cases} 1, & i = j \wedge i \in S \\ 0, & \text{otherwise} \end{cases} \tag{3.23}
\]
where S is a set of integers containing the non-zero entries after truncation. Usually S is defined by a
threshold value for the required energy of the signal after truncation. For example, considering the one-
dimensional example above: if the truncation energy is 100% then all basis-vectors should be included
in the “truncation”, which means that S = {0, 1, . . . , N − 1} and therefore R = I . This means that the
truncated DCT, g̃, can be expressed as
\[
\tilde{g} = R_M g = R_M \Phi y \tag{3.24}
\]
where the subscript M indicates the number of non-zero diagonal entries, which consequently means that the truncated signal, ỹ, can be expressed as
\[
\tilde{y} = \Phi^T \tilde{g} = \Phi^T R_M \Phi y = U_M y \tag{3.25}
\]
with
\[
U_M = \Phi^T R_M \Phi = \sum_{m=0}^{N-1} \delta_m \varphi_m \varphi_m^T \tag{3.26}
\]
where the subscript M stands for the rank of the truncation matrix U_M and δ_m is the Dirac-delta function with a similar definition as equation (3.23), namely
\[
\delta_m = \begin{cases} 1, & m \in S \\ 0, & \text{otherwise} \end{cases} \tag{3.27}
\]
Note that throughout the chapter all subscripts range from 0 to N − 1; this is the usual convention when using the DCT since that way the subscripts of the basis-vectors refer to their frequency numbers.
Another interesting thing to realize is that the Gaussian noise added to the cosine wave in Figure 3.3 doesn't “disappear”; since the DCT is a linear transformation, equation (3.25) for this cosine wave with added noise can be expressed as
\[
\tilde{y} = U_1 \hat{y} = U_1 (y + \varepsilon) = U_1 y + U_1 \varepsilon \tag{3.28}
\]
8 Since a linear transformation of a Gaussian random vector is also a Gaussian random vector.
9 M is the number of basis-vectors included in the truncation, generally M ≪ N.
[Figure 3.3 panels: ŷ = y + ε (top left), ĝ = Φŷ (top right), g̃ = Rĝ = RΦŷ (bottom left) and ỹ = Φᵀg̃ = ΦᵀRΦŷ (bottom right).]
Figure 3.3: Illustration of the truncated DCT on the same signal shown in Figure 3.2 but with added Gaussian noise ε ∼ N(0, 1). Random vectors are indicated with a hat (ˆ). Top left displays the signal with added Gaussian noise.
Top right displays the DCT of the noisy signal, where the random coefficients are clearly visible. This highlights an important observation: if the input signal is random, the DCT coefficients will be random as well, and the characteristic of an ensemble of random signals is the expected value. Bottom left depicts the truncated DCT
coefficients, simply obtained by restricting the full DCT coefficients to one non-zero coefficient, namely the largest
DCT coefficient. The restriction matrix R , is a diagonal matrix containing zeros everywhere, except the kmax -th
diagonal entry. See equation (3.23) for formal definition. Bottom right shows the truncated signal, obtained by
mapping the truncated DCT back to the spatial domain.
Figure 3.4: Left displays the truncation matrix U 1 associated with the example in Figure 3.3. Clearly visible is
the linear dependence of the columns resulting from the rank 1 truncation matrix, entirely defined by the dominant
basis function. Right shows the noise which was added to the simple cosine wave and its transformed signal. The
transformed noise is clearly in the column space of U₁ but also heavily damped. The reason for this is the rank-1 DCT approximation using only one basis-vector (which has unit length) to construct U₁, meaning that the l2-norms of the columns of U₁ are much smaller than 1.
where y = [2cos(4πx₀) 2cos(4πx₁) … 2cos(4πx_{N−1})]ᵀ and ε ∼ N(0, 1), as defined in Figures 3.2 and 3.3. Since, for this simple example, the reconstruction of the original signal contains merely one basis-vector, the column space of U₁, denoted by C(U₁), is spanned entirely by this one (dominant) basis-vector. The result of a matrix-vector multiplication always “falls” in the column space of the matrix which multiplies the vector, meaning that the random noise is simply transformed to “fall” in C(U₁). Adding two vectors that are both in this one-dimensional¹⁰ column space, C(U₁), will result in a vector that is still in C(U₁)¹¹, providing the smooth signal displayed in the bottom right of Figure 3.3.
10 Note that the dimensionality of a vector space is determined by the cardinality of the basis of that vector space, i.e. the number of basis-vectors in the basis.
11 Note that this is the very definition of a subspace, besides the multiplicative property as well as containing the zero-vector.
12 Model refers here to a single realization from an ensemble of models, i.e. a sample from a set of realizations. Where realization is defined as the outcome of a stochastic or process based simulation, which in this work serves as the input to a reservoir simulator.
endeavor (Caers et al., 2010). Due to the large intrinsic variability between all the models, a simple Euclidean distance between all of them will not result in a drastic reduction of the dimensionality; however, choosing the distance between all models defined along a certain trajectory¹³ might greatly reduce the dimension (Caers et al., 2010; Fenwick & Batycky, 2011). Note that the purpose of the modeling,
governing this choice of distance, could cover the prediction of water breakthrough in a secondary oil
recovery process or the concentration of a contaminant at a given point in space-time, advecting with the
fluid through the reservoir.
1. d(x, y) ≥ 0
2. d(x, y) = 0 ⇔ x = y
3. d(x, y) = d(y, x)
In this work, the matrix D denotes the square and symmetric distance matrix, usually of size M × M where
M denotes the size of the ensemble W , where the (i, j)-th entry of D is defined as
\[
\delta_{i,j} = d(w_i, w_j), \qquad i, j = 1, 2, \dots, M, \quad \forall\, w_i, w_j \in W \tag{3.29}
\]
where d(·, ·) satisfies the above given definition of a metric on a set, i.e. any allowable dissimilarity function
that is a metric, e.g Euclidean or (Modified) Hausdorff distance (Dubuisson & Jain, 1994).
\[
A r_i = \lambda_i r_i, \qquad i = 1, 2, \dots, M, \quad \forall\, r_i \neq 0 \tag{3.31}
\]
where r i denotes the i-th eigenvector with its corresponding eigenvalue, denoted by λi , such that R =
[rr 1 r 2 . . . r M ] and Λ = diag(λ1 , λ2 , . . . , λM ).
One important property of the eigenvalue decomposition is that if the matrix A is square symmetric, which is true for many of the applications below, the eigenvectors and eigenvalues are always real.
13 The trajectory could be a streamline from well to well, where the distance could be based on the Time of Flight (TOF) along this streamline.
Moreover, the eigenvectors are orthogonal (if λᵢ ≠ λⱼ ∀ i, j ∈ {1, 2, . . . , M}) or can be chosen orthogonal (if λᵢ = λⱼ) (Searle & Khuri, 2017). This reduces equation (3.30) to
\[
D = Q \Lambda Q^T \tag{3.32}
\]
where D is a square symmetric matrix and Q is an orthogonal matrix with orthonormal eigenvectors as
its columns such that Q Q T = Q T Q = I .
3. It is a projection matrix, namely a projection onto the (M − 1)-dimensional subspace which is, as defined by the four fundamental subspaces of Linear Algebra (Strang, 1993), orthogonal to N(C_Mᵀ) = N(C_M), since C_Mᵀ = C_M.
These properties are useful when trying to understand the MDS algorithm and the result of the MDS projection, defined below. Also note that due to rank(C_M) = M − 1, the linear mapping C_M is not invertible.
2. Double center the matrix D^(2), using equation (3.33) for the centering matrix C_M, such that
\[
B = -\frac{1}{2} C_M D^{(2)} C_M \tag{3.34}
\]
3. Compute the eigenvalue decomposition of B,
\[
B = Q \Lambda Q^T \tag{3.35}
\]
4. Construct the m-dimensional coordinate matrix from the m largest eigenvalues and their corresponding eigenvectors,
\[
X_m = Q_m \Lambda_m^{1/2} \tag{3.36}
\]
where the square root of a diagonal matrix is simply the square root of the diagonal entries.
14 For a more detailed introduction, a geometric interpretation of the MDS or a more in-depth review of the MDS see this reference.
15 Here .2 or (2) denotes the square of the elements of the matrix D .
Figure 3.5: Left displays three simple curves, in this case generated using a logarithmic function, a linear function and a translated quadratic function. The Euclidean distance is computed between all the curves using equation (3.37), resulting in the distance matrix D which is then used in the classical MDS algorithm to obtain the figure on the right. The right part of this figure clearly shows that the first axis of the MDS projection explains “all” of the projection, since from this projection the original distance matrix can be reconstructed in its entirety, with a residual error of ‖D_recon − D‖₂ ≈ 0.
A nice property of the classical scaling algorithm is, as mentioned by Borg and Groenen (2005), that the dimensions are nested. This means that a truncated MDS, e.g. a projection onto a two-dimensional
Cartesian plane, can be achieved by taking the first two dimensions of the coordinate matrix X m . Finally,
if the distance function used to construct D happens to be a Euclidean distance, the coordinates of the
MDS projection X m using the classical scaling algorithm, are found up to a rotation (Borg & Groenen,
2005).
To illustrate MDS in practice, a simple example is shown below where the Euclidean distance is
chosen for d(·, ·), which is computed for three artificial curves, e.g. time as independent variable and oil
production as the dependent variable. The resulting distance matrix D is then used according to the
algorithm defined above, resulting in the MDS coordinate projection onto a two-dimensional Cartesian
plane. For completeness, the formal definition of the Euclidean distance between two vectors, w i and w j
in RN is
\[
d^{\text{Eucl}}_{i,j} = d(w_i, w_j) = \sqrt{\sum_{n=1}^{N} (w_{i,n} - w_{j,n})^2} = \sqrt{(w_i - w_j)^T (w_i - w_j)} \tag{3.37}
\]
Note that in this simple example with only three curves, a one-dimensional projection is sufficient. The
quality of the projection16 is heavily dependent on the size of the eigenvalues of B . However, generally
in Earth Science applications, no more than six dimensions are required for a sufficiently good projection
(Caers et al., 2010).
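The classical scaling algorithm and the three-curves example can be sketched in Python (illustrative only; the curve definitions below are assumptions, not the exact curves of Figure 3.5):

```python
import numpy as np

def classical_mds(D, m=2):
    # Classical (Torgerson) scaling, following equations (3.33)-(3.36):
    # double-center the squared distances, then eigendecompose.
    M = D.shape[0]
    C = np.eye(M) - np.ones((M, M)) / M          # centering matrix C_M
    B = -0.5 * C @ (D**2) @ C
    lam, Q = np.linalg.eigh(B)                   # ascending eigenvalues
    order = np.argsort(lam)[::-1]                # largest first
    lam, Q = lam[order][:m], Q[:, order][:, :m]
    return Q * np.sqrt(np.maximum(lam, 0.0))     # X_m = Q_m Lambda_m^{1/2}

# Three artificial response curves: logarithmic, linear, translated quadratic.
t = np.linspace(0.1, 2.0, 50)
W = np.vstack([np.log(t), t, (t - 1.0)**2])
D = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)   # Euclidean (3.37)

X = classical_mds(D, m=2)
D_recon = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
assert np.allclose(D_recon, D)   # Euclidean input distances reproduced exactly
```

Since the input distances here are Euclidean, the projection reproduces D exactly, and by the nested-dimensions property the first column of a two-dimensional projection equals the one-dimensional projection.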
1. Assigning every data point to the closest mean (cluster center), where closest refers to the least
squared Euclidean distance.
2. Updating the nk clusters by calculating new means (centers), based on the mean of the cluster
members.
The main disadvantage of k-means clustering is that it is sensitive to outliers and doesn't necessarily converge when non-Euclidean distances are used (Jin & Han, 2010). Also, the final k-means clustering depends on the initial clustering. This is typically solved by running the k-means algorithm multiple times and choosing the result with the least within-cluster variation, defined as
\[
W_{\text{var}} = \sum_{k=1}^{n_k} \sum_{C(i)=k} \|X_i - \bar{X}_k\|_2^2 \tag{3.38}
\]
where X_i is the i-th data point, C(i) = k is a function that assigns data point X_i to cluster k and X̄_k is the mean or center of the k-th cluster. The above-mentioned convergence criterion is typically set to the point where the within-cluster variation doesn't change over the next iteration.
K-medoids, on the other hand, allows for non-Euclidean distances to be used and is less sensitive to
outliers. The difference in implementation, as also mentioned above, is that the center of the cluster
is also a data point. Initialization and the first step of the algorithm don't differ much from those of k-means; the second step, however, does. One algorithm that finds the (local) optimum configuration
of medoids is Partitioning Around Medoids (PAM). By iteratively swapping the current medoid with an
associated non-medoid and computing the quality of the clustering, the algorithm attempts to improve the
quality of the clustering. “All possible combinations of representative and non-representative points are
analyzed, and the quality of the resulting clustering is calculated for each pair. An original representative
point is replaced with the new point which causes the greatest reduction in distortion function. At each
iteration, the set of best points for each cluster form the new respective medoids.” - (Jin & Han, 2010).
K-medoids has the advantage that the medoids can be chosen as representatives of the cluster immediately, since they are part of the data set. This is a slight advantage when considering the model selection process, where model selection refers to selecting representative models which approximate the behavior
of the exhaustive17 set. For a more detailed review of the clustering methods mentioned here, the reader is
referred to the following references (Kaufman & Rousseeuw, 1987), (Hartigan & Wong, 1979), (Kaufman
& Rousseeuw, 2009) and (Jin & Han, 2010).
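A compact sketch of a PAM-style swap loop (an assumed simplification of the algorithm described by Jin and Han (2010); illustrative only, operating directly on a precomputed distance matrix so non-Euclidean distances are allowed):

```python
import numpy as np

def pam(D, k, n_iter=50, seed=0):
    # Greedy swap heuristic: replace a medoid by a non-medoid whenever the swap
    # reduces the distortion (total distance of points to their closest medoid).
    rng = np.random.default_rng(seed)
    M = D.shape[0]
    medoids = rng.choice(M, size=k, replace=False)

    def cost(meds):
        return D[:, meds].min(axis=1).sum()

    for _ in range(n_iter):
        improved = False
        for i in range(k):
            for j in range(M):
                if j in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = j
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
        if not improved:
            break
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Two well-separated 1D groups; the resulting medoids are members of the data set.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
medoids, labels = pam(D, k=2)
assert sorted(x[medoids]) == [0.1, 10.1]   # cluster centers are data points
```

Because the medoids are data points, they can serve directly as the representative models in a model-selection workflow.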
where the superscript ⋆ is used to indicate that the mean of the signal, denoted ȳ, is subtracted. Note that every row of Φ contains a basis-vector, and all the basis-vectors, except the first constant zero-frequency basis-vector, have a zero mean. This last part is important since taking the inner product of a zero-mean signal with a scaled vector of ones will result in zero, similar to the fact that the sum of a zero-mean signal is also zero. Therefore, the right-hand-side (RHS) of equation (4.1) can be written as
\[
\Phi y - \Phi \bar{y}\mathbf{1} = g - \Phi \bar{y}\mathbf{1} = [g_0\; g_1\; \dots\; g_{N-1}]^T - [g_0\; 0\; \dots\; 0]^T = [0\; g_1\; \dots\; g_{N-1}]^T = g^\star \tag{4.2}
\]
which means that the full transformed signal, g = Φy, is equal to g⋆ except for the first zero-frequency coefficient, also referred to as the DCT0 coefficient (which handles the constant translation/offset of the original signal). The first coefficient of the two resulting vectors of the RHS of equation (4.1) is actually
1 Note that relevant and dominant are used interchangeably throughout this work.
2 Except for removing the zero-coefficient arising from the constant basis-vector for zero-frequency.
g₀, which can be shown by the definition of the mean and the inner product of two vectors. The first DCT basis-vector can be written as
\[
\varphi_0 = \sqrt{\frac{1}{N}}\, \mathbf{1} \tag{4.3}
\]
such that the first DCT coefficient, g₀, is given by the inner product (since Φ = [ϕ₀ ϕ₁ … ϕ_{N−1}]ᵀ)
\[
\langle \varphi_0, y \rangle = \sqrt{\frac{1}{N}}\, y_0 + \sqrt{\frac{1}{N}}\, y_1 + \dots + \sqrt{\frac{1}{N}}\, y_{N-1} = \sqrt{\frac{1}{N}} \sum_{n=0}^{N-1} y_n \tag{4.4}
\]
The first resulting coefficient of the matrix-vector multiplication Φȳ1 can be written similarly as
\[
\langle \varphi_0, \bar{y}\mathbf{1} \rangle = \sqrt{\frac{1}{N}}\, \bar{y} + \sqrt{\frac{1}{N}}\, \bar{y} + \dots + \sqrt{\frac{1}{N}}\, \bar{y} = \sqrt{\frac{1}{N}}\, N \bar{y} \tag{4.5}
\]
Recalling the definition of the mean,
\[
\mu(y) = \frac{1}{N} \sum_{n=0}^{N-1} y_n = \bar{y} \tag{4.6}
\]
such that
\[
g_0 = \langle \varphi_0, y \rangle = \langle \varphi_0, \bar{y}\mathbf{1} \rangle = \sqrt{N}\, \bar{y} \tag{4.8}
\]
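These relations are easy to verify numerically; a Python sketch (illustrative only), confirming that removing the mean only zeroes the DCT0 coefficient:

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

rng = np.random.default_rng(3)
N = 64
y = rng.standard_normal(N) + 5.0       # signal with a non-zero mean
Phi = dct2_matrix(N)

g = Phi @ y
g_star = Phi @ (y - y.mean())          # DCT of the mean-removed signal, eq. (4.1)

assert np.isclose(g_star[0], 0.0)               # DCT0 of the centered signal is zero
assert np.allclose(g_star[1:], g[1:])           # all other coefficients unchanged
assert np.isclose(g[0], np.sqrt(N) * y.mean())  # g_0 = <phi_0, y> = sqrt(N) * mean, (4.8)
```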
In the present study, the static property used for the DCT constitutes the permeability or transmissibility, as these affect the flow response more than porosity. Since permeability is logarithmically related to porosity, it is chosen to calculate the characteristic of the signal y̆, defined as
\[
\breve{y} = \ln(y) - \mu\big(\ln(y)\big)\,\mathbf{1} = \ln(y) - \left(\frac{1}{N}\sum_{n=0}^{N-1} \ln(y_n)\right) \mathbf{1} \tag{4.9}
\]
such that ğ is equal to g, where g = Φ ln(y), except for the first zero-frequency coefficient (using the distributive property of the linear DCT). Note that Φ ln(y) ≠ Φy. Generally, for signals with a high signal-to-noise ratio, the two are equal up to some scaling factor β. Clear signals do not occur often in real-world applications and, on top of that, the coarser representations of the transmissibility obtained with the global upscaling technique are prone to contain a few outlying (relatively large) values (Holden & Nielsen, 2000; Y. Chen et al., 2003). The effect on a simple one-dimensional example is shown below, where it is clearly favored to estimate the dominant characteristic of the original signal through y̆ rather than y⋆, see definitions above.
Similar behavior is observed in two dimensions; however, a larger outlying value is required to create similarly erratic behavior in the transformed signal (such as seen in Figure 4.2). Such larger values are not uncommon after the iterative procedure of the global upscaling, which is required to converge to a positive-definite transmissibility matrix. See the example in Figure 4.2. In this example, an actual upscaled transmissibility field is used for the signal Y, namely the transmissibility in the y-direction, such that Y = T_y in this case. Clearly, the logarithmic transformation preserves the characteristics of the signal, while the DCT directly on the centered³ signal results in the horrendous pattern in the DCT domain, indicating a sharp, irregular discontinuity.
3 Centered around the mean.
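The robustness of y̆ to a single outlier can be reproduced with a small Python sketch (illustrative only; the two-facies signal and outlier value below are assumptions in the spirit of Figure 4.1):

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

N = 60
# Two-facies signal: channel sand (1000 mD) on a period-10 pattern aligned with
# the k = 12 basis-vector, floodplain shale (5 mD) elsewhere (assumed values).
y = np.where(np.isin(np.arange(N) % 10, [8, 9, 0, 1]), 1000.0, 5.0)
y_out = y.copy()
y_out[17] = 1.0e5                      # a single outlying value

Phi = dct2_matrix(N)
def dominant_k(s):
    g = Phi @ (s - s.mean())           # centered signal, cf. equations (4.1)/(4.9)
    return int(np.argmax(np.abs(g[1:]))) + 1   # skip the DCT0 coefficient

k_clean = dominant_k(y)
assert k_clean == dominant_k(np.log(y))      # two-valued signal: log is affine in it
assert k_clean == dominant_k(np.log(y_out))  # the logarithm tames the outlier
```

The raw centered signal with the outlier, by contrast, need not preserve the dominant frequency, since the spike spreads large coefficients across the whole spectrum.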
Figure 4.1: Top left displays the clear signal, in this case representing some sort of permeability, where the large
values could indicate channel sands and the low values floodplain shales or clays. Top right shows the DCT of
the clear signal for both y ? and y̆y, which are defined in equations (4.1) and (4.9) respectively. For the clear signal,
the DCT of both signals is equal up to a scaling factor β . Bottom left illustrates a hypothetical outlier, as could
be obtained in the coarser representation of transmissibility from the global upscaling algorithm. Bottom right
shows the DCT of the signal with outlier for both y ? and y̆y. Clearly they are very different after introducing only
a single outlying value. y ? doesn’t compact the energy of the signal at all while y̆y still has the same structure as in
the case of a clear signal, i.e. it still finds the same characteristic or dominant basis-function as the original signal.
This robustness is appreciated and therefore determining the characteristics of y̆y is favored in this work. Note that
a sine-like shape of the coefficients in the DCT domain usually indicates a very sharp, irregular, discontinuity, such
as the outlying value.
[Figure 4.2 panels: T_y [cP·m³/day/Bar] and ln(T_y) (top); Φ₁(T_y − µ(T_y))Φ₂ᵀ and Φ₁(ln(T_y) − µ(ln(T_y)))Φ₂ᵀ over frequency coordinates k_x, k_y (bottom).]
Figure 4.2: Top left displays the globally upscaled transmissibility in the y-direction of one ensemble member of the Strebelle ensemble, clearly displaying a horrendous transmissibility value as described in the previous chapter and in detail by Holden and Nielsen (2000) and Chen et al. (2003). Note that the solution for T_y still minimizes the misfit between the fine-scale flux and coarse pressures while containing such outlying transmissibility values, and therefore is a valid solution. Top right shows the natural logarithm of the globally upscaled T_y, still displaying the outlying value, however much smaller in contrast. Bottom left shows similar erratic behavior of the signal in the DCT domain as in the one-dimensional case (namely indicating a sharp, irregular discontinuity). Bottom right illustrates a much more compact and realistic transformation of the signal.
4 See chapter on Geological Modeling for more detail on the actual ensemble.
Figure 4.3: Top row displays three ensemble members of the Strebelle3 ensemble, where the red dots indicate
production wells and the blue dot represents the injector well. The yellow color indicates reservoir facies with a
homogeneous permeability of 1000 [mD] whereas the purple color indicates non-reservoir facies of 5 [mD], see
also chapter on Geological Modeling for more information. Middle row depicts the permeability in DCT domain,
where the dominant basis-vector is indicated with the red circle. Bottom row shows a “zoomed in” version of the
DCT coefficients squared, indicated with the superscript (2); in particular, it shows the coefficients corresponding to
the first kx × ky = 25 × 25 = 625 basis-vectors. Note that the colors of the middle and bottom rows are indicative of
the magnitude of the DCT coefficients, namely purple equals the smallest coefficient and yellow equals the largest
coefficient.
Figure 4.4: Left schematic shows the DCT of each ensemble member with the dominant basis-vector of each
respective ensemble member indicated in red. Storing this dominant basis-vector for each ensemble member
allows for the construction of a frequency map, displayed in the right image, where frequency means the number of times a basis-vector was marked as the dominant basis-vector of the transformation. E.g. the dominant basis-vector of the ensemble Strebelle3 is considered to have a dimensionless frequency of kx = 16 and ky = 4 since it occurs the maximum number of times; in particular, it is the dominant basis-vector for seven ensemble members.
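The construction of such a frequency map can be sketched as follows (illustrative Python; the synthetic channel-like ensemble below is an assumption, not the Strebelle ensemble itself):

```python
import numpy as np

def dct2_matrix(N):
    # DCT-2 matrix of equations (3.11)-(3.12).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    alpha = np.where(k == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return alpha * np.cos(np.pi / N * (n + 0.5) * k)

def dominant_basis(Y):
    # 2D DCT via equation (3.17); (kx, ky) of the largest non-DCT0 coefficient.
    N1, N2 = Y.shape
    G = dct2_matrix(N1) @ (Y - Y.mean()) @ dct2_matrix(N2).T
    G[0, 0] = 0.0                          # ignore the zero-frequency coefficient
    ky, kx = np.unravel_index(np.argmax(np.abs(G)), G.shape)
    return int(kx), int(ky)

rng = np.random.default_rng(4)
freq_map = np.zeros((30, 30), dtype=int)
x = (np.arange(30) + 0.5) / 30.0
for _ in range(20):                        # 20 noisy members sharing one scale
    channels = np.tile(2.0 * np.cos(4.0 * np.pi * x), (30, 1))  # k_x = 4, k_y = 0
    Y = channels + 0.2 * rng.standard_normal((30, 30))
    kx, ky = dominant_basis(Y)
    freq_map[ky, kx] += 1

assert freq_map[0, 4] == 20   # all members share the same dominant basis-vector
```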
Figure 4.5: Left image depicts the dominant basis-vector of the Strebelle3 ensemble. Right image displays the
same basis-vector overlain with an ensemble member, showing a geometric interpretation of the dominant basis-vector. Clearly the dominant basis-vector describes the sinuosity (average wavelength of the channels) and the average distance between the channels. These length measures are important parameters since they govern the minimum number of static grid-cells required to represent this signal and preserve the continuity and characteristics of the channels.
Figure 4.6: Left schematic shows continuous geological features (channels) projected onto two Cartesian grids, blue and gray respectively. The blue grid dimensions are exactly half of the wavelength of the channels, while the gray grid is one-fourth of the wavelength. Right schematic displays the mapping of these continuous features to
the Cartesian grids. The blue grid is incapable of capturing face-to-face connectivity, an essential requirement for
accurate flow and transport simulation if a Two-Point Flux Approximation is used for the spatial discretization.
Appropriate connectivity, for this simple schematic, is achieved by the mapping to the gray (finer) grid.
Besides the geometric interpretation highlighted above, the frequency of the dominant basis-vector has
other implications. Accurately representing a discrete cosine wave, such as the basis-vectors of the DCT
transformation, requires certain grid dimensions. The minimal grid-size (∆x and ∆y) required to represent
the peaks and troughs accurately is half of the wavelength of the cosine wave. The resulting discrete
signal will approximate a Haar-wavelet, which however doesn’t capture face-to-face connectivity between
grid-cells. An attempt to display this is shown in Figure 4.6, where the continuous features on the left are resolved on two grid-sizes, blue and gray respectively. The ∆x and ∆y of the blue grid correspond to half of the wavelength of the channels, whereas the dimensions of the underlying gray grid are exactly half of
the blue grid. Even though the wavelength of the channel, as well as the average distance between the
channels, are preserved using half of the wavelength as grid dimension, the face-to-face connectivity isn’t
maintained which might cause numerical issues when a Two-Point Flux Approximation (TPFA) is used
in the spatial discretization5 . The face-to-face connectivity is nicely represented using the dimensions
of the gray grid, leading to an estimation of the relevant static spatial scale, namely one-fourth of the
dominant wavelength of the ensemble. Note that this is a clear disadvantage of a fixed Cartesian grid and
can possibly be solved by using an unstructured grid or adaptive refinement.
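Combining f = k/(2L) with the quarter-wavelength argument gives a simple rule for the relevant static grid size; a sketch (the domain size and dominant frequency below are assumed for illustration, in the spirit of the Strebelle3 numbers):

```python
def relevant_grid_size(k_dominant, L):
    """Quarter of the dominant wavelength: f = k/(2L), so lambda = 2L/k and dx = lambda/4."""
    wavelength = 2.0 * L / k_dominant
    return wavelength / 4.0

# Assumed example: a 1000 m wide domain with dominant k_x = 16 gives a dominant
# wavelength of 125 m and a target grid size of 31.25 m, i.e. at least
# 1000 / 31.25 = 32 cells in the x-direction.
dx = relevant_grid_size(16, 1000.0)
assert dx == 31.25
assert 1000.0 / dx == 32.0
```

Half the wavelength preserves the peaks and troughs (the Haar-wavelet-like approximation above), while the quarter-wavelength grid also preserves the face-to-face connectivity required by a TPFA discretization.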
The coarsening effect on the static spatial scale is examined in a similar way as the dominant basis-vector analysis on the fine scale; however, now the parameter transmissibility is used since this is directly obtained from the global upscaling algorithm. In particular, the transmissibility in the y-direction is examined since this is parallel to the orientation of the paleoflow, causing a more continuous upscaled result w.r.t. the transmissibility in the x-direction. Figure 4.8 displays the frequency maps of dominant basis-vectors across various ensemble scales. The dominant frequencies are preserved throughout the upscaling; however, towards the Nx × Ny = 20 × 20 ensemble scale the dominant basis-vector shifts towards the lower-frequency side of the spectrum.
5 Two-Point Flux Approximation (TPFA) in a Finite Volume Method (FVM) means that the cell average of the Finite Volume is updated based on two neighboring cells for each dimension. E.g. a two-dimensional FVM-scheme requires information from four neighboring cells when using TPFA.
Figure 4.7: Depicting the hierarchical Strebelle3 ensemble, with several levels of coarsening.
[Figure 4.8 panels: T_y at model scales 200 × 200, 100 × 100 and 50 × 50 (top row) and 25 × 25, 20 × 20 and 10 × 10 (bottom row), plotted over frequency coordinates k_x and k_y.]
Figure 4.8: Depicting the evolution of the dominant basis-vectors across various ensemble scales, indicated in the title of each subplot by Nx × Ny. The dominant characteristics are, on a static level, clearly preserved during the upscaling algorithm, and only start to “fade away” around the 20 × 20 model scale, while completely disappearing in the final upscaling step.
Figure 4.9: Top left depicts the permeability map of an ensemble member of Strebelle3 . Top middle displays
the absolute values of the original DCT of the signal. Top right shows the cumulative energy contained in the
DCT-coefficients. The arbitrary threshold, chosen at two times the largest dominant frequency of the DCT, is able
to capture over 80% of the total energy of the original signal. Bottom left shows the non-zero elements of the
truncated DCT. Bottom middle shows the truncated signal, using only the 40 × 40 non-zero DCT coefficients.
Bottom right shows, instead of mapping the truncated DCT back to a 100 × 100 grid, a coarser mapping where
the remaining non-zero coefficients are mapped to a grid of equal dimensions Nx × Ny = 40 × 40.
6 Typically a deterministic PDE is solved for M “deterministic” reservoir models, where deterministic refers to them being equally
probable, not to the actual method of acquiring the reservoir model.
7 Regionally oblique refers to obliqueness arising from the main orientation of the channel belt/paleoflow. Locally oblique refers to
obliqueness due to sinuosity of the individual channels inside the channel belt.
8 Use a set of designed experiments to obtain the optimal response (surface), given a set of explanatory variables.
9 Here referred to as the spread or variability in solution space, i.e. the set of possible one-dimensional responses to the forward simulation.
Production data describe very little of the reservoir behavior away from the well and are therefore referred to as one-dimensional data.
(Figure 4.10 panels: 1D field oil production rate [m3/day] versus time [days] for realizations (95) and (2) of Strebelle at the 100 × 100 scale, with the corresponding reservoir models plotted as Length versus Width [m].)
Figure 4.10: Top row displays one-dimensional field responses for two realizations from the Strebelle3 ensemble.
The difference in rate of deviation from the fine-scale is clearly visible. Bottom row shows the two reservoir
models corresponding to the one-dimensional flow responses. The left reservoir model shows slightly more features
(channel branches and general sinuosity) oblique to the principal orientation of the grid, as well as a clear short-circuit
between a production well (red) and the injection well, namely the bottom well. This is less pronounced in the right model,
where the flow seems more easily represented on a coarser scale (almost a direct fit with only 10 × 10 nodes).
for comparing flow responses. The reason for this is the smooth response of cumulative production to
events such as water breakthrough (a small change in its derivative), compared to the effect of such an event
on changes in rate. Also, individual wells are expected to contain larger variability in their response
than field data, which are merely a summation of all well production data. Therefore both data “scales”
are considered in the analysis.
Further illustrating the problem of Uncertainty Quantification and estimation of a relevant spatial scale
in reservoir simulation is the example in figure 4.11. The figure displays the one-dimensional flow response
obtained from a member of a different ensemble, namely the Strebelle1 ensemble (see chapter 2.2.2). In this
example, as displayed in chapter 2.2.2, the paleoflow orientation is oblique to the principal grid orientation.
With the five-spot setup used in simulating the aforementioned ensemble, two production wells (wells 1
and 4) are parallel to the main paleoflow orientation, i.e. generally in direct communication with the
injector, while the other two wells (wells 2 and 3) are perpendicular to the main paleoflow orientation. The
key observation from this figure is that the one-dimensional responses of the wells perpendicular to the
paleoflow orientation, in this five-spot setup, show relatively homogeneous production patterns, while the
reservoir clearly exhibits a highly heterogeneous facies distribution (and two-dimensional oil displacement).
The result is that the misfit between the fine- and coarse-scale one-dimensional responses for these wells
is nearly negligible. The opposite is true for wells parallel to the paleoflow, since they are in direct
communication with the injector well and show a highly heterogeneous production pattern (e.g. “early”
water breakthrough).
Multidimensional scaling, or MDS, is used in this work to simplify the representation of the production
data in time without any considerable loss of information11. The spatial uncertainty in model parameters
and the resulting response uncertainty are represented by the spread in the points after the 2D MDS projection.
As shown in chapter 3.4.2, three curves in time can easily be characterized by three points on a two-dimensional
plane (actually, even a one-dimensional projection would suffice, see figure 3.5). The spread in the
projection of the points tells us much about the time-behavior and similarity of the curves. If their
distance after 2D projection using MDS is small, the curves are said to be similar in the original domain.
This holds when almost all of the eigenvalue energy of B is contained in the first two eigenvalues, since a
two-dimensional projection is concerned. Figure 4.12 shows how the dynamic response (oil production
rate, field scale) of the Strebelle3 ensemble is represented as a 2D projection using classical MDS.
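The classical MDS construction used here (double centering of the squared-distance matrix into B, followed by an eigendecomposition of B) can be sketched in NumPy. The decline-curve ensemble below is a synthetic stand-in for the actual production responses; the returned energy fractions correspond to the percentages annotated on the MDS axes.

```python
import numpy as np

def classical_mds(curves, ndim=2):
    """Classical MDS of M production curves (rows) sampled at Nt times.

    Returns the ndim-dimensional coordinates and the fraction of
    eigenvalue energy captured by each retained axis.
    """
    # Matrix of squared Euclidean distances between the curves, D^(2).
    diff = curves[:, None, :] - curves[None, :, :]
    D2 = np.sum(diff**2, axis=-1)

    # Double centering: B = -1/2 J D^(2) J with J = I - (1/M) 11^T.
    m = D2.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ D2 @ J

    # Spectral decomposition of B; keep the largest eigenvalues.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    X = vecs[:, :ndim] * np.sqrt(np.maximum(vals[:ndim], 0.0))
    energy = np.maximum(vals, 0.0)
    explained = energy[:ndim] / energy.sum()
    return X, explained

# Hypothetical ensemble of M = 50 exponential decline curves.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2000.0, 134)
amp = rng.uniform(150.0, 250.0, size=(50, 1))
tau = rng.uniform(400.0, 900.0, size=(50, 1))
curves = amp * np.exp(-t / tau)

X2, explained = classical_mds(curves)   # 50 points on a 2D plane
```

For one-dimensional production curves of this kind, the first axis typically dominates the energy, consistent with footnote 11.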
The orientation and position of ensemble members, represented by points after the MDS projection on
a lower dimensional space, is determined by the dissimilarity matrix D up to a rotation (Borg & Groenen,
2005). This observation has led to an experiment where consecutively more elements12 were included in
the curves used in the computation of the dissimilarity matrix D, such that the dissimilarity and resulting
MDS are taken over growing intervals of time. This is illustrated in figure 4.13. Stacking of these projections,
representing one-dimensional responses of the forward simulation up until that particular time, leads to
M “continuous” curves describing the behavior of the uncertainty in time, referred to as the “Uncertainty
Trajectory”. This is illustrated in figure 4.14.
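The growing-interval construction can be sketched as follows; this is a minimal Python illustration with a synthetic ensemble. Note that, as the text observes, each slice is only determined up to a rotation, so a practical implementation would additionally align consecutive slices (cf. Appendix A).

```python
import numpy as np

def mds_2d(D2):
    """Classical MDS of a matrix of squared dissimilarities D^(2)."""
    m = D2.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    vals, vecs = np.linalg.eigh(-0.5 * J @ D2 @ J)
    order = np.argsort(vals)[::-1][:2]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Hypothetical ensemble of M response curves over Nt time steps.
rng = np.random.default_rng(2)
M, Nt = 30, 100
curves = np.cumsum(rng.normal(size=(M, Nt)), axis=1)

# One 2D projection per growing interval t in [0, t_i]; stacking the
# slices X_1, ..., X_Nt yields the Uncertainty Trajectory.
trajectory = np.empty((Nt, M, 2))
for i in range(1, Nt + 1):
    part = curves[:, :i]                       # responses up to time t_i
    diff = part[:, None, :] - part[None, :, :]
    trajectory[i - 1] = mds_2d(np.sum(diff**2, axis=-1))
```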
Since this work attempts to clarify the evolution of the uncertainty when coarsening, two options for
comparing the Uncertainty Trajectory between ensemble scales are considered. Figure 4.15 depicts one of
the two possibilities of comparing multiple ensemble scales and their associated uncertainty, while figure
4.16 depicts the second option. The first option is to compute the dissimilarities solely between ensemble
members of the same scale, resulting in the uncertainty of the ensemble at that particular scale. This
can then be done for all ensemble scales, and the resulting Nc13 transformations can then be analyzed in
a single Uncertainty Space. The other approach is to compute the dissimilarity between each ensemble
member at every ensemble scale, resulting in one large dissimilarity matrix (for each time interval) as
11 The energy of the first two eigenvalues is typically above 97% of the total energy of all eigenvalues when one-dimensional production
data curves are used for the dissimilarity matrix D.
12 Where elements refers to vector elements or indices, such that more elements means a larger vector.
13 Where Nc is the number of ensemble scales in the hierarchy, from the finest ensemble scale denoted Nc = 1 to the coarsest ensemble
scale denoted Nc = C.
(Figure 4.11 panels: oil production rate [m3/day] versus time [days] for wells 1–4 of realization (14) of Strebelle1, each showing model scales 200×200, 100×100, 50×50, 25×25, 20×20, 10×10, and 5×5.)
Figure 4.11: The top left and bottom right images, i.e. wells 1 and 4, display the oil production rate of wells that
are parallel to the paleoflow orientation and therefore in direct communication with the injector well: large initial
production which decreases rapidly after breakthrough. Also clearly visible is the rapidly increasing misfit between
the fine and coarser scales. The top right and bottom left images show the two wells that are orthogonal to the main
paleoflow orientation, displaying a homogeneous (continuous) oil production rate. The well behavior is fully explained
by the coarsest representation due to the homogeneous nature of the production. Note that the reservoir is clearly
heterogeneous; however, the one-dimensional responses of wells 2 and 3 are not able to capture this behavior away
from the well location.
(Figure 4.12 panels: oil field rate versus time t [days] with the P10 quantile marked, and the 2D MDS projection; MDS-axis 1 explains 92.96% and MDS-axis 2 5.76% of the energy.)
Figure 4.12: Left image shows the one-dimensional oil field rate for the Strebelle3 ensemble. Right figure depicts
the two-dimensional MDS projection of this ensemble response, computed over t ∈ [0,tNt ] = [0,tend ].
opposed to Nc dissimilarity matrices (for each time interval) in the first case.
Both methods of uncertainty quantification offer certain advantages. The first method can serve as
a way to estimate how similar the behavior of the uncertainty is across several ensemble scales. The
magnitude of the separation or distance between finest ensemble Uncertainty Trajectory and coarser
ensemble scales is a measure of how similar realizations are behaving (in time and when coarsening). Note
that this doesn’t mean that there exists no error between the fine- and coarse scale solutions to the flow
problem, it merely means that the coarse ensemble uncertainty has similar characteristics as the respective
fine-scale ensemble uncertainty. These characteristics can then be exploited by finding similar
flow responses and appointing representative realizations for said groups of flow responses, using
coarser-scale flow simulations as a measure of distances between ensemble members. On top of that,
clustering algorithms such as K-medoids are invariant to translations and orthogonal transformations of
the data points (Kaufman & Rousseeuw, 1987). Even under the aforementioned linear transformations,
the model selection algorithm based on coarse flow response distances can be quite effective in selecting
representative realizations if the Uncertainty Trajectory of the particular coarse scale ensemble follows
the Uncertainty Trajectory of the fine-scale ensemble closely.
The second method allows for a more direct measure of how similar the uncertainty is across ensemble
scales while considering the intrinsic uncertainty within each ensemble scale simultaneously. Note that
uncertainty refers here to the spread or variability in flow responses of each ensemble. If after MDS
projection of the joint distance matrix, containing the distance between each ensemble member of every
ensemble scale, ensemble members of the same scale are projected closer to each other than to members
of different ensemble scales, there most likely exists a large upscaling bias. This can be understood
considering a water injection situation as is modeled in this work, where (hypothetically) the coarse
ensemble consistently predicts much later water breakthrough. This creates a bias in the coarse solutions
and causes coarse solutions to be more similar to themselves as opposed to the fine-scale solutions. An
attempt to depict this can be found in figure 4.21.
Using the first approach, shown in figures 4.14 and 4.15, the Uncertainty Trajectory is computed for each
ensemble scale using the several properties available (as mentioned above). The deviation in Uncertainty
Trajectory becomes evident as the coarsening ratio increases. This is observed for each property, at the
(Figure 4.13 panels: field water cut [−] versus time t [days] at N = 100 × 100 with ensemble, P10, and P50 curves, and the 2D MDS projections X45 (t ∈ [0, 668]), X89 (t ∈ [0, 1331]), and X133 (t ∈ [0, 2000]); MDS-axis 1 carries between 93.61% and 98.5% of the energy.)
Figure 4.13: Top row displays the response of the Strebelle3 ensemble in terms of field water cut after different
times of production. Bottom row depicts the two-dimensional MDS projections for different times of
production. The projection clearly depends on the interval chosen for calculating the distance matrix D. Since one
point in the 2D projection represents one realization, the points can easily be color coded relative to the property used
for computing the initial distance matrix D. In this case the color code represents the breakthrough time of water, with
red being fast and blue being slow. Note that with increasing variation between the curves, due to increasing the
time interval over which the distance is computed, the energy of the second eigenvalue in the projection becomes
larger, denoted in % on the x- and y-axes of the plot.
Figure 4.14: Left schematic depicts the MDS projections Xi, such that t ∈ [0, ti] and i = 1, 2, . . . , Nt. Right image
displays the full spectrum of projections, referred to as the Uncertainty Trajectory. The procedure to obtain the
continuous Uncertainty Trajectory is not trivial, but a representative display can be obtained following the workflow
proposed in Appendix A.
well and field scale. Figure 4.17 illustrates the deviation from the fine-scale Uncertainty Trajectory for
two realizations. Note that two realizations are chosen since visually comparing the 100 trajectories for
seven ensemble scales isn’t very informative. A more quantitative comparison, however, is shown in figure
4.18 where the integrated distance for each trajectory with respect to the finest scale is displayed.
Another way of examining the coarsening effect in terms of Uncertainty Trajectory is considering the
first approach, but computing an Orthogonal Procrustes (OP) problem at each time-slice w.r.t. the finest
Uncertainty Trajectory. An OP can be defined as an orthogonal matrix R which is the closest map between
two matrices (Schönemann, 1966; Zhang, 2000). The problem of finding the orthogonal matrix R that
minimizes the difference between two matrices, say A and B, can be expressed through the Singular Value
Decomposition (SVD), a generalization of the Eigenvalue Decomposition explained in chapter 3.4.2.
Y = BA^T = UΣV^T (4.10)
where U is the orthogonal matrix of left-singular vectors (as its columns), Σ is the diagonal matrix containing
the singular values, and V^T is the orthogonal matrix of right-singular vectors (as its rows). Then, R can
be found by computing
R = UV^T (4.11)
Matrix B in our case is the coordinate matrix obtained from the classical MDS using the fine-scale
ensemble, while matrix A is the coordinate matrix obtained from the classical MDS using the coarse-scale
ensembles. Since an orthogonal transformation doesn't affect the clustering (Kaufman & Rousseeuw,
1987), aligning the “data clouds” at each time-slice with the fine-scale projection provides insight into the
accuracy of model selection on coarse distances. If the Uncertainty Trajectory deviates a lot, which
happens especially at early times, the coarse-scale distance cannot be used for model selection. Figure
4.19 depicts the outcome of this procedure for the cumulative oil production (as in the top right of figure
4.17). The deviation is heavily dependent on the time interval used in computing the MDS projection.
Near the end of the simulation time, all model scales tend to converge to the fine-scale orientation (except
the 5 × 5 ensemble scale). Note that no scaling nor translation was required to obtain this fit, merely an MDS
projection for each interval which afterwards is transformed (rotated) using an orthogonal transformation
(i.e. a linear transformation that preserves the dot product).
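Equations (4.10)–(4.11) translate directly into code. The sketch below assumes the coordinate matrices are stored with coordinates as rows and ensemble members as columns (an assumption; the thesis does not fix this convention) and verifies the recovered rotation on a synthetic pair.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal matrix R minimizing ||R A - B||, via the SVD of
    Y = B A^T = U S V^T and R = U V^T (equations 4.10-4.11)."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

# Hypothetical fine-scale MDS coordinates B (2 x M) and a "coarse"
# cloud A obtained by rotating B with a known rotation Rtrue.
rng = np.random.default_rng(3)
B = rng.normal(size=(2, 40))
theta = 0.7
Rtrue = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
A = Rtrue.T @ B                  # B expressed in the rotated frame

R = procrustes_rotation(A, B)    # recovers Rtrue (B has full rank)
aligned = R @ A                  # coarse cloud aligned to the fine cloud
```

In the thesis workflow this alignment is repeated at every time-slice, with B fixed to the fine-scale projection of that slice.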
The integrated distance for this modified Uncertainty Trajectory, plotted on a semi-log scale, is shown
in figure 4.20. This plot suggests a dynamic relevant spatial scale similar to the scale predicted on the
prior using DCT. This does not happen consistently throughout the other ensembles, which makes it
impossible to confirm a clear correlation, via DCT, between the static and dynamic characteristic
scale.
The mean integrated distance to the fine-scale Uncertainty Trajectory displays very similar behavior
using the second approach. The main difference between the two approaches can be visualized in the
following way. Consider figure 4.21 where the left plot depicts the last time-slice (namely t ∈ [0,tend ])
of the separate MDS projections of each ensemble scale (method 1) and the right plot shows the last
time-slice of the second method where the MDS projection depends on the response of all ensemble scales
and members simultaneously. The color in figure 4.21 indicates a particular model scale whereas the
symbol indicates a particular realization. The most important difference between the two methods is the
absence of the coarsest ensemble members on the left side of the plot associated with method 2. This
can be explained by considering the fact that the MDS projection of method 2 (see figure 4.16) contains
the flow response of all ensemble members of the same scale as well as across other scales in the distance
computation. This means that consequently, the coarsest scale ensemble response is more similar to itself
than to the finest scale response, due to an upscaling bias (consistent overestimation of the cumulative
oil produced). More ensembles and their Uncertainty Trajectories are shown in Appendix B.
Sensitivity to sub-heterogeneity14 was also investigated. The Uncertainty Trajectory is relatively insensitive
to these high-frequency fluctuations of the parameters, since flow in these models is mainly governed
by the facies distribution, especially on a statistical (uncertainty) level. The reason for this is most likely
the large contrast in permeability between the reservoir and non-reservoir facies, rendering the
high-frequency fluctuations less impactful. This doesn't necessarily imply that flow with and without these
sub-heterogeneities is similar at each model scale. Figure 4.22 displays the flow response for the same
facies distribution with and without sub-heterogeneity. The sub-heterogeneity was generated using a
smoothly varying Gaussian random field, resulting in a bi-modal distribution of the permeability with the
same mean as the homogeneous (in terms of permeability distribution inside facies) model, as also shown
in figure 4.22. The absolute difference (denoted “misfit”) between the two saturation fields shows that
the difference is rather small. This is most likely caused by the fact that most of the flow (and transport)
occurs through the highly permeable reservoir facies, due to the large contrast in permeability between
the reservoir and non-reservoir facies.
14 Where sub-heterogeneity refers to a heterogeneous representation of the properties inside a particular facies, as discussed in chapter
2.2.4.
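A minimal sketch of generating such a bimodal permeability field follows; the facies map, contrast values, and smoothing lengths are hypothetical, and the field is simply rescaled so the two models share the same global mean (the thesis' exact generator may differ).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(5)

# Hypothetical binary facies map: True = reservoir (channel) facies.
facies = gaussian_filter(rng.normal(size=(100, 100)), sigma=5.0) > 0.0
k_sand, k_shale = 1000.0, 1.0        # large permeability contrast [mD]

# Binary model: homogeneous permeability inside each facies.
k_binary = np.where(facies, k_sand, k_shale)

# Bimodal model: modulate by a smoothly varying Gaussian random field,
# then rescale so both models share the same mean permeability.
field = gaussian_filter(rng.normal(size=(100, 100)), sigma=3.0)
k_bimodal = k_binary * np.exp(0.3 * field / field.std())
k_bimodal *= k_binary.mean() / k_bimodal.mean()
```

The multiplicative (log-normal) modulation keeps the permeability strictly positive while preserving the two modes induced by the facies contrast.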
Figure 4.15: Schematic indicating one way of comparing the Uncertainty Trajectory for several ensemble scales, namely by computing the Uncertainty
Trajectory separately for each ensemble scale.
Figure 4.16: Schematic indicating the second way of comparing the Uncertainty Trajectory for several ensemble
scales, namely by computing the dissimilarity between each ensemble member at every ensemble scale, resulting in
one joint dissimilarity matrix for each time interval.
(Figure 4.17 panels: Uncertainty Trajectories for Field Water cut, Field Oil prod., Well Oil rate, and Oil prod. Well 1, plotted as MDS-axes 1 and 2 versus time t [days] for model scales 200×200 down to 5×5.)
Figure 4.17: Top left image depicts the Uncertainty Trajectory for two realizations using the field water cut as a
distance for the MDS projections. Clearly, the deviation from the finest trajectory (200×200 model scale) increases
when coarsening. The same behavior can be seen, to a different extent, for the other properties.
(Figure 4.18: “Optimum scale” plot of the integrated distance for field water cut and field oil production versus model scale Nx × Ny, from 100 × 100 down to 5 × 5.)
Figure 4.18: Depicting the mean of the integrated distance between the finest ensemble scale and coarser scales
respectively. The distance in Uncertainty Trajectory increases when coarsening, and seems to show consistent
behavior across several properties.
(Figure 4.19: three-dimensional view of the trajectories, MDS-axes 1 and 2 versus time [days], for model scales 200×200 down to 5×5.)
Figure 4.19: Displaying the Uncertainty Trajectory of two realizations from the Strebelle3 ensemble for Cumulative
Oil Production after performing an Orthogonal Procrustes problem w.r.t. the finest-scale Uncertainty Trajectory
at each time-slice.
Figure 4.20: Depicting the normalized mean of the integrated distance w.r.t. the finest scale Uncertainty Trajectory
(as shown in figure 4.19). The integrated distance is similar to the ones displayed in the top right of figure 4.17,
however now an Orthogonal Procrustes problem is solved w.r.t. the fine-scale ensemble at each time-slice.
Figure 4.21: Left image depicts the final time-slice of the MDS projection, namely such that t ∈ [0, tend], using method 1,
i.e. independently projecting the ensemble-scale distances. Different realizations are depicted with different symbols,
while the color indicates the ensemble scale to which that particular realization (symbol) belongs. Clear
grouping of realizations can be observed, with ensemble scale 20 × 20 typically being the last scale which can
clearly be grouped together. Right image shows the final slice of the MDS projection using method 2, i.e. considering
all the distances across all scales in one single projection (per time-slice). The key observation in this plot is
the absence of the coarsest ensemble on the left side of the plot, indicating that the flow response of the coarsest
ensemble is more similar to itself than to the same realization at different scales.
Figure 4.22: Top left depicts the two permeability fields of a binary (referring to a homogeneous distribution within
each facies) and a bimodal distribution, see chapter 2.2.4. Bottom left shows the actual distributions and the fact that they
share the same mean (so that they can be compared in a general sense). Top right displays the associated
saturation fields of the two models, where blue depicts the water and yellow the oil phase. Bottom right depicts the
absolute difference between the two saturation fields at two particular times.
1 Assuming that the financial decision associated with a Field Development Plan is mainly based on the quantile response of the ensemble,
p. 262.
4 Using the “Statistics and Machine Learning Toolbox” in MATLAB © (MathWorks, 2017). See also chapter 3.4.
5 In clustering with k-medoids, the center (referred to as a medoid) of each cluster is part of the data, and is therefore a data point itself.
Predicting fine-scale response using coarse scale distances
computational effort, since every medoid has to be simulated on the fine scale. In the limit, where the number
of clusters equals the number of ensemble members, the subset becomes equal to the ensemble and so
do the statistics. An optimum has to be found where enough clusters are chosen to ensure convergence
to the full ensemble statistics, constrained by the computational cost of adding too many clusters. The
convergence is not only a function of the number of clusters but also of the “quality” of the
model selection, which in turn is heavily influenced by the following three things:
1. Amount of correlation (positive or negative), contained in the property used for computing the
distances D (2) , between the fine- and respective coarse-scale.
2. Amount of energy explained by the first two eigenvalues of the MDS projection.
The first point can be understood intuitively through the definition of clustering. Clustering is based
on grouping objects (or data) such that objects in the resulting group are more “similar” to other objects
in that group than to objects outside of the particular group. Since a representative of a cluster is chosen
for fine-scale simulation, the resulting response of that simulation must have a large correlation with the
property used for computing the dissimilarities in the MDS projection. For example, if cumulative oil
production is used as a property to compute the matrix of squared distances D (2) , which is consequently used
in computing the MDS projection, the resulting configuration of points can be effectively clustered such
that members of the same cluster describe similar behavior in the cumulative oil production. Combining
all the representatives (medoids) will result in a good approximation of the overall behavior in cumulative
oil production of the full ensemble. It doesn’t mean, however, that the combination of these medoids will
approximate other properties equally well. It will most likely not work if there is no (strong) correlation
between the property used for dissimilarity and the property it tries to approximate. This is shown in
figure 5.1 where the property used for D (2) is field cumulative oil production, which clearly separates this
property in an ordered manner, but doesn’t manage to separate the water rate of well 1. The medoids for
the three clusters do show an approximation of the averaged behavior, however, the method will become
less robust in this prediction when clustering of similar realizations is less effective. This problem can
be solved easily by computing another MDS projection using the coarse ensemble water rate of well 1
for D (2) on which another clustering is done and finally simulate the intersection of the two subsets (set
1 containing representatives for field oil production, set 2 containing representatives for water rate of
well 1) on the fine-scale as approximation to the full fine-scale ensemble response. Another option is to
combine several properties for the computation of D (2) , such as oil and water production and perform a
joint-estimation, similar to the joint-estimation using a proxy response in (Scheidt & Caers, 2009).
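The selection step described above can be sketched with a basic alternating k-medoids on a precomputed distance matrix; this is a simplified stand-in for the MATLAB k-medoids routine referenced in chapter 3.4, and the toy distances below are hypothetical.

```python
import numpy as np

def k_medoids(D, k, seed=0, n_iter=100):
    """Basic alternating k-medoids on a precomputed distance matrix D.
    Each medoid is itself an ensemble member (a data point)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:                        # most central member
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

# Hypothetical 2D MDS coordinates of 60 coarse-scale responses
# falling into three well-separated groups.
rng = np.random.default_rng(4)
points = np.vstack([rng.normal(c, 0.3, size=(20, 2))
                    for c in (0.0, 3.0, 6.0)])
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

medoids, labels = k_medoids(D, k=3)
# Only the members indexed by `medoids` would be re-simulated
# on the fine scale as representatives of their clusters.
```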
The second point is also necessary since performing a clustering on the 2D-projection of the ensemble
response will only result in clusters whose members are similar to each other when the 2D-projection
accurately describes the dissimilarities between the responses. This is generally achieved when
at least a certain percentage of the total energy is contained in the first two eigenvalues of the projection,
which is very likely to happen considering these one-dimensional responses. When dissimilarities between
two-dimensional responses are concerned, for example when performing clustering on saturation maps or
velocity fields, extra dimensions might be required to accurately represent the dissimilarities of the full
ensemble in the lower-dimensional space. Note that for clustering purposes, adding another dimension to
better approximate the original matrix D (2) is not a problem; it only cannot be represented visually when
more than three dimensions are required.
The third point is motivated by the erratic behavior of some of the coarse trajectories in the first half
of the simulation time, e.g. seen in figure 4.19.
(Figure 5.1 panels: clustering of the 10 × 10 MDS projection, field oil production of the 100 × 100 ensemble, and the rate of well 1 at 100 × 100 versus time t [days], each showing Clusters 1–3 and their medoids.)
Figure 5.1: Left figure depicts clustering of the MDS projection, where D (2) is computed using the field oil
production of the coarse ensemble (number of nodes is 10 × 10). Note that the MDS-axis 1 is roughly one order of
magnitude larger than MDS-axis 2, causing the nearly vertical separation in the clustering. The number of clusters
is chosen to be three for this example. The middle graph shows that the clustering is very effective for identifying
production classes for the field oil production of the fine-scale ensemble. The right figure displays a less effective
prediction of similarly behaving realizations where the water rate of well 1 is concerned. Note that even though the
clustering for this property using field oil production for D (2) is less effective in predicting similar responses for
the water rate of well 1, the medoids still manage to approximate the average behavior of the water rate of well 1.
However, it can be understood that if the clustering is less effective in grouping similar realizations, the method
becomes less robust in predicting the average behavior. Also note that if coarse ensemble water rate of well 1 was
used in computing D (2) , the clustering on that MDS projection would be again very effective in predicting similar
responses for the fine-scale ensemble behavior of the water rate of well 1.
(Figure 5.2 panels: original MDS projection, MDS with Gaussian RBF kernel at σ = 0.1, and at σ = 5; axes are MDS-axis 1 and 2.)
Figure 5.2: Left image depicts the same clustering as in figure 5.1 but now plotted with equally scaled axes. This
illustrates that the vertical clustering in figure 5.1 is merely an artifact of the axes not being equally scaled.
Middle figure illustrates an MDS projection using the kernel trick (see chapter 3.4.4) with a Gaussian Radial Basis
Function (RBF) as kernel function, for σ = 0.1. The color code corresponds to the same clustering on the original
MDS projection. Right image shows another MDS projection with the kernel trick using the same Gaussian RBF but
with σ = 5. The color code represents the same clustering as the original clustering. Several authors, such as
Scheidt & Caers (2009), advise using a bandwidth of 20% for the σ parameter in the kernel function (which is
close to the σ = 5 used in the right image).
f ield
Figure 5.3: Top left depicts the correlation between the field oil production (denoted N_p,o^field) and several other
properties for each ensemble member using a box-plot method, for the 200 × 200 ensemble scale. Top right shows
the same correlations for the 10 × 10 ensemble scale. Bottom left illustrates the correlation between the well oil
production and several other properties for each ensemble member using a box-plot method, for the 200 × 200
ensemble scale. The three aforementioned graphs show similar behavior for the well and field scales as well as for
the fine and coarse ensemble scales. The negative correlation for the oil rate (denoted Q_o^·) arises because a large
cumulative oil production coincides with a lower oil rate: the oil rate tends to decline over time due to production
of water and depletion of the field. Bottom right depicts the correlation for several properties between the 200 × 200
ensemble scale and coarser ensemble scales. N_p,·^max,· is the maximum production of the specific phase (oil or water)
at a particular scale (field or well), while t_BT^· is the time of breakthrough at a particular scale (field or well).
Properties that show a large correlation across several ensemble scales will, when used for computation of the
matrix of squared dissimilarities D^(2), result in an effective clustering.
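The quantity plotted in the bottom-right panel can be sketched as follows; `cross_scale_correlation` is a hypothetical helper name, taking one value of a property per ensemble member at each of two scales:

```python
import numpy as np

def cross_scale_correlation(prop_fine, prop_coarse):
    """Pearson correlation rho of an ensemble property (one value per
    member, e.g. maximum oil production or breakthrough time) between
    the finest ensemble scale and a coarser one. Properties that keep a
    high rho across scales are good candidates for computing D(2)."""
    return np.corrcoef(prop_fine, prop_coarse)[0, 1]
```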
Figure 5.4: Top left image shows an MDS projection using the cumulative oil production of the coarse (20 × 20)
ensemble as the property for computation of the dissimilarities, denoted by D^(2)(N_p^{20×20}). Top right image shows a
K-medoids clustering using 10 clusters, denoted N_k = 10. Bottom left shows the response of these 10 medoids after
forward simulation on the finest scale; underlying these 10 responses is the “unknown” full fine-scale ensemble
response. Bottom right image depicts the approximation of the quantiles using the subset of 10 medoids.
1. Choice of a particular scale and response metric (e.g. cumulative production, rate, etc.) for the coarse
ensemble, denoted D^(2)(·) (for the distance in figure 5.4: D^(2)(N_p,o^{N_x×N_y})).
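The core of this step, computing the squared-dissimilarity matrix D^(2) from the coarse response curves and selecting medoids on it, can be sketched as follows. The helper names are illustrative, and the k-medoids routine is a naive alternating variant assumed here for self-containedness, not the MATLAB implementation used in the thesis:

```python
import numpy as np

def squared_dissimilarity(responses):
    """D2[i, j]: squared Euclidean distance between the simulated
    response curves of ensemble members i and j (summed over time).
    `responses` has shape (n_members, n_timesteps)."""
    diff = responses[:, None, :] - responses[None, :, :]
    return (diff ** 2).sum(axis=-1)

def k_medoids(D, k, n_iter=100, seed=0):
    """Naive alternating k-medoids on a precomputed distance matrix:
    assign each member to its nearest medoid, then move each medoid to
    the member minimizing the within-cluster distance sum. Each medoid
    is always assigned to itself, so clusters never become empty
    (assuming distinct ensemble members)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = np.array([
            np.flatnonzero(labels == c)[
                np.argmin(D[np.ix_(labels == c, labels == c)].sum(axis=0))]
            for c in range(k)])
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```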
s(i) = (b(i) − a(i)) / max{a(i), b(i)}          (5.1)

where a(i) is the average dissimilarity of i to all other objects in its cluster A, and b(i) is the minimum
average dissimilarity between i and the objects of any other cluster (excluding A). Therefore s(i) lies in
the interval [−1, 1]: a value close to −1 indicates a poor clustering, since the within-cluster dissimilarity
a(i) is larger than b(i), while a value close to 1 indicates a good clustering, since a(i) is much smaller
than b(i).
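Equation (5.1) can be sketched directly from a precomputed dissimilarity matrix; `silhouette` is an illustrative helper name, not the routine used in the thesis:

```python
import numpy as np

def silhouette(D, labels):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) from equation (5.1):
    a(i) is the mean dissimilarity of i to the other members of its
    own cluster, b(i) the smallest mean dissimilarity to any other
    cluster."""
    labels = np.asarray(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():       # Rousseeuw (1987): s(i) = 0 for singletons
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s
```

The average of `silhouette(D, labels)` over all members is the average Silhouette width discussed next.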
Considering only the actual clustering, the average Silhouette index (width) is an objective and appro-
priate measure for determining the optimal number of clusters (Rousseeuw, 1987). However, it is not
immediately evident whether this number of clusters, or classes of flow responses, is sufficient to derive
subset statistics that converge to the full ensemble statistics. The number of clusters determines the
number of realizations included in the subset statistics. If the Silhouette index determines that the
optimal number of clusters is two, only two realizations are chosen as representatives of the full ensemble,
which means that the subset quantiles are based on only two realizations. Even for a perfect distance,
this will result in a very poor estimation of the three quantiles. Therefore, another way of determining
the number of clusters (the size of the subset) should be investigated; the best choice is one that accounts
for the actual objective of converging to the full-ensemble statistics with merely a subset of realizations.
In real-world applications, the full fine-scale ensemble response is unknown, which makes it even harder
to determine the number of clusters required to converge to the (unknown) full fine-scale ensemble
statistics. However, the full coarse ensemble response is known, since it is used in the MDS projection
(through the computation of D^(2)(·)). This means that the convergence rate between the coarse-scale
subset and the full coarse ensemble can be computed as a function of the number of clusters, as illustrated
in figure 5.5. If the coarse distance is of high quality, as previously defined, the convergence between the
fine-scale subset and ensemble can be similar to the convergence between the coarse-scale subset and
ensemble.
The coarse-scale subset converges to the full ensemble. In interpreting this result, we should consider
the reduction in variability for coarser grids: it causes the subset for coarser simulations to converge
faster to the coarse full-ensemble statistics than for finer-scale subsets and ensembles, which contain a
larger variability in response, see figure 5.6.
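The convergence check described above can be sketched as follows: for each candidate number of clusters, compare the quantiles of the coarse subset (the medoids) with the quantiles of the full coarse ensemble, which is always available. The helper name and the `select_medoids` argument (any routine returning medoid indices for a given k) are assumptions for illustration, not the thesis code:

```python
import numpy as np

def subset_convergence(coarse_responses, D, select_medoids, k_values,
                       q=(0.1, 0.5, 0.9)):
    """Error between subset (medoid) quantiles and full coarse-ensemble
    quantiles as a function of the number of clusters k. Unlike on the
    fine scale, the full coarse ensemble response is known, so this
    curve is computable in real-world applications."""
    full = np.quantile(coarse_responses, q, axis=0)    # P10/P50/P90 curves
    errors = []
    for k in k_values:
        medoids = select_medoids(D, k)
        sub = np.quantile(coarse_responses[medoids], q, axis=0)
        errors.append(np.abs(sub - full).mean())       # mean error over time
    return np.array(errors)
```

The number of clusters at which this curve flattens is a candidate subset size, with the caveat about reduced coarse-scale variance discussed around figure 5.6.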
5.4.2 Effect of simulation time on dissimilarities between ensemble members and resulting clustering
The clustering is most effective when either the whole simulation time or its last 1/3 is used in the
computation of the coarse distances that drive the model selection (through clustering). This is expected
for the water cut, since several realizations have not shown breakthrough in the first interval. Clustering
on the first interval therefore has difficulty selecting representative realizations that accurately describe
the full simulation time. This is best observed in figure 5.7: the clustering based on coarse distances
taken over the first 1/3 of the simulation is not able to find representative realizations that also accurately
describe the later simulation times. Similar behavior is observed in figure 5.8. For both figures 5.7 and
5.8 the coarsest ensemble scale (the most degenerate distance) is used, and 15 representatives are selected
for computation of the subset quantiles.
[Figure 5.5 panels: error in the quantiles (×10^4) versus the number of clusters for D^(2)(N_p,o) computed at ensemble scales 25 × 25, 20 × 20, 10 × 10 and 5 × 5; each panel shows a fine-scale and a coarse-scale convergence curve.]
Figure 5.5: Convergence between the subset statistics and the full ensemble, as a function of the number of clusters,
for both the scale at which the distance is computed (denoted in red with “coarse”) and the scale it is compared
with (denoted in blue with “fine”). This is similar to the robustness analysis shown in Scheidt et al. (2009);
however, here the comparison is not made for added noise, which reduces the correlation with the finest scale, but
for the scale at which the coarse distance is computed. A similar convergence rate can be observed for the smaller
upscaling ratios. This indicates that when the coarse distance is of higher quality (see above for the definition),
the convergence rate can predict the required number of clusters. However, caution is advised, since too coarse a
distance will predict too few required clusters due to the reduced variance in the coarse ensemble response, see
figure 5.6.
[Figure 5.6 panel: var(N_p,o^{N_x×N_y})(t) versus time (200 to 2000 days) for several ensemble scales, including 10 × 10 and 5 × 5.]
Figure 5.6: Depicts the reduction in variance when coarsening. This poses a risk to using the convergence rate
between the subset of models (selected using the coarse distances) and the full coarse-scale ensemble as a measure
of the required number of clusters/realizations. Generally, due to the lower variance, a smaller subset (fewer
realizations) is required to converge to the full ensemble statistics on the coarser scale.
[Figure 5.7 panels: coarse responses per time-interval (top), MDS projections (middle, MDS axis 1 versus MDS axis 2), and quantile curves versus time in days (bottom).]
Figure 5.7: Top row depicts the coarse response used for computation of the distance, for several time-intervals. Middle row displays the effect of the
time-interval on the computed MDS projection; the color of the points corresponds to the final value of the response over that interval. Bottom row shows
the associated quantiles of the full ensembles (fine- and coarse-scale) as well as the quantiles computed from the K-medoids clustering on the MDS projection
in the middle row. The earliest time-interval clearly shows the largest deviation from the full fine-scale ensemble quantiles. Note that the coarsest scale is
used for the distance computation, and the subset constitutes 15 realizations.
Predicting fine-scale response using coarse scale distances
[Figure 5.8 panels: responses at t1, t2, t2 − t1, t3 and t3 − t2 (top), MDS projections (middle, MDS axis 1 versus MDS axis 2), and quantile curves versus time in days (bottom).]
Figure 5.8: Top row depicts the coarse response used for computation of the distance, for several time-intervals. Middle row displays the effect of the
time-interval on the computed MDS projection; the color of the points corresponds to the final value of the response over that interval. Bottom row shows
the associated quantiles of the full ensembles (fine- and coarse-scale) as well as the quantiles computed from the K-medoids clustering on the MDS projection
in the middle row. Note that all MDS projections have a similar linear separation along MDS axis 1, while the resulting subset statistics display very
different behavior. This might indicate that the clustering is not very effective, since there are no actual clusters to be recognized, which could cause
medoids to be appointed as representatives that are, in fact, not representative. Note that the coarsest scale is used for the distance computation, and the
subset constitutes 15 realizations.
6 Discussion and Conclusion
The main goal of this thesis project is to determine whether there exists a relevant spatial scale in
reservoir simulation. The problem of finding this relevant spatial scale is subdivided into two parts,
namely finding a relevant static and a relevant dynamic spatial scale. Geological modeling resulted in
several ensembles of models constituting the test cases for the presented research question. The Discrete
Cosine Transform (DCT) is used to determine a relevant static spatial scale, based on the characteristics
of the input signal (transmissibility fields). The DCT is able to identify the dominant basis-vector,
in particular the basis-vector which explains the most significant pattern contained in the original
signal. Note that a basis-vector of the two-dimensional DCT is nothing more than a cosine wave
oscillating at two distinct frequencies, one in the x-direction and one in the y-direction. Due to the
stochasticity in the generation of the static reservoir models, this dominant basis-vector may vary from
one model to the other. Therefore, the relevant static spatial scale of the ensemble is estimated as the
most frequent dominant basis-vector across all ensemble members.
A coarser representation of the original input signal is obtained through flow-based upscaling. Outlying
transmissibility values, caused by the global upscaling, are found in the coarser representation, similar
to those reported by Holden and Nielsen (2000) and Chen et al. (2003). It is shown that after transforming
the original signal, by taking the natural logarithm and subtracting the mean, the DCT is insensitive to
these outlying values. This means that the DCT is also useful for analyzing the evolution of the static
characteristics of the ensemble when coarsening.
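The static analysis summarized in the two paragraphs above can be sketched with SciPy's multidimensional DCT; the helper names are illustrative assumptions, and the log/demean transform mirrors the treatment of outlying transmissibility values described in the text:

```python
import numpy as np
from collections import Counter
from scipy.fft import dctn

def dominant_basis_vector(T):
    """Dominant DCT basis-vector of a transmissibility field T: take the
    natural logarithm and subtract the mean (suppressing outlying values
    and the DC component), then locate the largest-magnitude coefficient,
    i.e. the dominant (kx, ky) frequency pair."""
    signal = np.log(T)
    signal -= signal.mean()
    coeff = dctn(signal, norm='ortho')
    ky, kx = np.unravel_index(np.argmax(np.abs(coeff)), coeff.shape)
    return kx, ky

def relevant_static_scale(fields):
    """Most frequent dominant basis-vector across all ensemble members."""
    votes = Counter(dominant_basis_vector(T) for T in fields)
    return votes.most_common(1)[0][0]
```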
It is observed that the relevant static spatial scale remains constant across several ensemble scales,
but degenerates when the ensemble scale approaches the determined dominant scale. This is expected,
since the dominant characteristics of the signal cannot be accurately represented at this particular
scale. A geometric interpretation of the dominant basis-vector is given. This interpretation leads to soft
constraints on the allowable (Cartesian) grid-dimensions required to accurately represent the original
signal on the coarser domain. Note that the 2D cosine wave constituting the dominant basis-vector
oscillates in the z-direction, whereas the channels in the geological model oscillate in the xy-plane. This
is illustrated in figure 4.6, where the face-to-face connectivity is not preserved. When using a Two-Point
Flux Approximation (TPFA) in the upscaling and subsequent flow simulations, this requires additional
constraints on the coarse grid-dimensions, namely such that the fine-grid connectivity of the channels is
preserved on the coarse grid. The dynamic behavior of the reservoir determines most (if not all) of the
decision making. On top of that, the transfer function from the static (parameter space) to the dynamic
(solution space) domain is highly nonlinear. Therefore, knowledge of a relevant static spatial scale alone
is not enough.
Response uncertainty is obtained through forward simulating the hierarchical ensembles in time using
AD-GPRS. Uncertainty Quantification is done using a reduced representation of the ensemble response,
obtained with Multidimensional Scaling (MDS). The significance of simulation time on the response
uncertainty is identified via the construction of an Uncertainty Trajectory. The trajectories of all ensemble
scales are compared, and deviation from the finest uncertainty trajectory is used to analyze the effect of
coarsening on the response uncertainty. The magnitude of the deviation is dependent on the type of
response used in the computation of the MDS projections. When the water cut is chosen as a response,
the time component becomes quite evident. This is most likely caused by the bias contained in the
coarser ensemble response, where bias refers to the consistently later water breakthrough in the coarser
responses. Near the end of the production time, the characteristics of the uncertainty appear similar
across all ensemble scales.
This constitutes the main reason for the attempt to exploit the coarser information. Clustering al-
gorithms such as k-medoids are invariant to translations and orthogonal transformations (Kaufman &
Rousseeuw, 2009). This means that if the coarse- and fine-scale Uncertainty Trajectories are similar at
a particular time, clustering done on this time-interval will be similar for both the coarse and fine scale.
The representatives resulting from the coarse clustering should therefore, in theory, approximate repre-
sentative fine-scale responses. If an appropriate number of clusters is found, the subset statistics can
approximate the full ensemble statistics. Typically, this appropriate number of clusters is determined
through the Silhouette index (Rousseeuw, 1987). For our purpose, this does not always work: for some
responses there might only exist two real clusters (based on the structure of the projection), which means
that there will only be two representative fine-scale realizations in the subset. The statistics of such a
subset will never approximate all three quantiles of the full fine-scale ensemble accurately. A more robust
method is to compute the convergence between the subset statistics and the full coarse-scale ensemble.
The number of clusters required to obtain the desired convergence can then be used to select the repre-
sentatives, which are subsequently simulated on the fine scale, constituting the subset statistics. The
reduction of variability in the coarser responses might pose a threat to this methodology and should
therefore be investigated further.
On top of the reduction in variability of the coarser response, the fine-scale ensemble response in real-
world applications is unknown. A possible way to handle both problems is to use the following general
work-flow:
1. Construct the fine-scale (geological) model.
2. Perform DCT to extract dominant scale.
3. Construct coarser representation using flow-based upscaling, where the grid-dimensions are governed
by the dominant scale estimated by the DCT.
4. Simulate fluid flow on the full coarse-scale ensemble.
5. Determine number of clusters via convergence test.
6. Simulate fluid flow on the subset of fine-scale representatives.
7. Compute the Uncertainty Trajectory for the subset of representatives, on the fine- and coarse scale.
8. Examine distance between the fine- and coarse-scale trajectories. If the behavior of the coarse-scale
trajectory is unsatisfactory, re-sample additional representatives and simulate these on the fine-scale.
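The eight steps above can be sketched as an orchestrating skeleton. Every callable name in `ops` is a hypothetical placeholder mirroring a step in the text (model construction, DCT, upscaling, simulation, clustering, trajectory comparison); none of them come from the thesis code:

```python
def relevant_scale_workflow(fine_ensemble, ops, tol):
    """Sketch of the eight-step work-flow above; `ops` maps step names to
    hypothetical callables. Step 1 (building the fine-scale geological
    ensemble) is assumed done and passed in as `fine_ensemble`."""
    scale = ops["dct_dominant_scale"](fine_ensemble)            # step 2
    coarse = [ops["upscale"](m, scale) for m in fine_ensemble]  # step 3
    coarse_resp = [ops["simulate"](m) for m in coarse]          # step 4
    k = ops["n_clusters_by_convergence"](coarse_resp)           # step 5
    subset = list(ops["select_medoids"](coarse_resp, k))
    while True:
        fine_resp = [ops["simulate"](fine_ensemble[i]) for i in subset]  # step 6
        dist = ops["trajectory_distance"](                      # steps 7-8
            ops["uncertainty_trajectory"](fine_resp),
            ops["uncertainty_trajectory"]([coarse_resp[i] for i in subset]))
        if dist <= tol or len(subset) == len(fine_ensemble):
            return subset, dist
        # step 8: unsatisfactory trajectory -> re-sample representatives
        # (assumes "resample" returns at least one new index each call)
        subset += ops["resample"](subset, len(fine_ensemble))
```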
This work-flow is still in an experimental phase and requires verification of robustness. Besides that, it also
requires a better definition of certain terms, such as “unsatisfactory deviation in uncertainty trajectory”.
The limitations of the model selection should be thoroughly understood.
For future work, an even larger contrast in permeability between the reservoir and non-reservoir facies
can be considered, as well as a larger number of depositional facies. Also note that the reservoirs in
this work are considered geologically young, i.e. reservoir performance is governed by sedimentology
(which governs the petrophysics) only (Galloway & Hobday, 2012). Fractures occur naturally in
reservoirs (Berkowitz, 2002) and are currently heavily investigated (both from a geological and a
reservoir-simulation point of view). The relevant spatial scale of a reservoir model could be heavily
influenced by the presence of fractures, which makes this an interesting future research endeavor.
Finally, note that flow-based gridding techniques in conjunction with flow-based upscaling methods,
such as those summarized in Durlofsky (2005), provide a better coarse representation of the geological
features of the fine grid. This highlights one limitation of the performed study and could also be included
in future research.
A DCT on LineDrive and GangesDelta
[Figure A.1 panels: Ty DCT coefficient maps (frequency kx [-] versus frequency ky [-]) at model scales 120 × 120, 60 × 60, 40 × 40 and 30 × 30 (top row) and at coarser model scales (bottom row).]
Figure A.1: Depicting the degeneration of the characteristics of the LineDrive3 ensemble at the model scale 20×20.
[Figure A.2 panels: Ty DCT coefficient maps (frequency kx [-] versus frequency ky [-]) at model scales 240 × 240, 120 × 120, 80 × 80 and 60 × 60 (top row) and at coarser model scales (bottom row).]
Figure A.2: Depicting the preservation of the characteristics of the GangesDelta ensemble across all model scales
while coarsening. Note that even at a coarse scale of 12 × 12, which is upscaled by a factor of 20 × 20, the dominant
basis-vector according to the DCT is still similar to that of the finest scale. This likely indicates that the thick
channel features can easily be described by a low-frequency cosine wave, while the high-frequency features (smaller
channels) are not easily characterized by the DCT.
[Figure B.1 panels: MDS projections (MDS axis 1, ×10^7, versus MDS axis 2) and quantile curves (including P10) versus time up to 7320 days.]
Figure B.1: Depicting coarse distance model selection, for LineDrive2 ensemble well 1.
Subset statistics for full fine-scale ensembles for various properties and ensembles
[Figure B.2 panels: MDS projections (MDS axis 1 versus MDS axis 2) and water cut WC(t) [-] versus time; curves show the full 200 × 200 ensemble, the responses of medoids 1 to N_k, the subset approximation, the full 5 × 5 ensemble and the P10 quantile.]
Figure B.2: Depicting the coarse-distance approximation of the water cut for the Flumy3 ensemble, well 1. The
limitations of the clustering used in this thesis work are visible.
References

Ahmed, N., Natarajan, T., & Rao, K. R. (1974). Discrete cosine transform. IEEE Transactions on
Computers, 100 (1), 90–93.
Arpat, G. B. (2005). Sequential simulation with patterns. Stanford University.
Aziz, K., & Settari, A. (1979). Petroleum reservoir simulation. Chapman & Hall.
Bear, J. (2013). Dynamics of fluids in porous media. Courier Corporation.
Berkowitz, B. (2002). Characterizing flow and transport in fractured geological media: A review. Advances
in water resources, 25 (8), 861–884.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer
Science & Business Media.
Caers, J. (2011). Modeling uncertainty in the earth sciences. John Wiley & Sons.
Caers, J., Park, K., & Scheidt, C. (2010). Modeling uncertainty of complex earth systems in metric space.
In Handbook of geomathematics (pp. 865–889). Springer.
Caers, J., & Zhang, T. (2004). Multiple-point geostatistics: a quantitative vehicle for integrating geologic
analogs into multiple reservoir models.
Cao, H. (2002). Development of techniques for general purpose simulators (Unpublished doctoral disser-
tation). Stanford University.
Chen, C., Hu, D., Westacott, D., & Loveless, D. (2013). Nanometer-scale characterization of micro-
scopic pores in shale kerogen by image analysis and pore-scale modeling. Geochemistry, Geophysics,
Geosystems, 14 (10), 4066–4075.
Chen, Y., Durlofsky, L., Gerritsen, M., & Wen, X. (2003). A coupled local-global upscaling approach for
simulating flow in highly heterogeneous formations. Advances in Water Resources, 26 (10), 1041-
1060.
Chen, Z., Huan, G., & Ma, Y. (2006). Computational methods for multiphase flows in porous media.
SIAM.
Demyanov, V., Rojas, T., Arnold, D., & Christie, M. (2013). Uncertainty quantification in history
matching of fluvial reservoirs with connectivity analysis and realistic geology. In 75th EAGE Conference
& Exhibition incorporating SPE EUROPEC 2013.
Donselaar, M. E., & Overeem, I. (2008). Connectivity of fluvial point-bar deposits: An example from the
miocene huesca fluvial fan, ebro basin, spain. AAPG bulletin, 92 (9), 1109–1129.
Dubuisson, M.-P., & Jain, A. K. (1994). A modified Hausdorff distance for object matching. In
Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 1: Computer
Vision & Image Processing (pp. 566–568).
Durlofsky, L. J. (1991). Numerical calculation of equivalent grid block permeability tensors for heteroge-
neous porous media. Water resources research, 27 (5), 699–708.
Durlofsky, L. J. (2005). Upscaling and gridding of fine scale geological models for flow simulation. In 8th
international forum on reservoir simulation iles borromees, stresa, italy (Vol. 2024).
Ethridge, F. G., & Schumm, S. A. (1977). Reconstructing paleochannel morphologic and flow character-
istics: methodology, limitations, and assessment.
Fenwick, D., & Batycky, R. (2011). Using metric space methods to analyse reservoir uncertainty. In
Proceedings of the 2011 gussow conference.
Galloway, W. E. (1981). Depositional architecture of cenozoic gulf coastal plain fluvial systems.
Galloway, W. E., & Hobday, D. K. (2012). Terrigenous clastic depositional systems: Applications to fossil
fuel and groundwater resources. Springer Science & Business Media.
Gomez-Hernandez, J. J., Journel, A. G., et al. (1994). Stochastic characterization of gridblock perme-
abilities. SPE Formation Evaluation, 9 (02), 93–99.
Grappe, B., Cojan, I., Ors, F., & Rivoirard, J. (2016). Dynamic modelling of meandering fluvial systems
at the reservoir scale, flumy software. In Second conference on forward modelling of sedimentary
systems.
Haldorsen, H. H. (1986). Simulator parameter assignment and the problem of scale in reservoir engineering.
Reservoir characterization, 6 .
Hartigan, J. A., & Wong, M. A. (1979). Algorithm as 136: A k-means clustering algorithm. Journal of
the Royal Statistical Society. Series C (Applied Statistics), 28 (1), 100–108.
Hashemi, S., Javaherian, A., Ataee-pour, M., Tahmasebi, P., & Khoshdel, H. (2014). Channel character-
ization using multiple-point geostatistics, neural network, and modern analogy: A case study from
a carbonate reservoir, southwest iran. Journal of Applied Geophysics, 111 , 47–58.
He, C., & Durlofsky, L. (2006). Structured flow-based gridding and upscaling for modeling subsurface
flow. Advances in Water Resources, 29 (12), 1876-1892.
Helmig, R., et al. (1997). Multiphase flow and transport processes in the subsurface: a contribution to the
modeling of hydrosystems. Springer-Verlag.
Henriquez, A., Tyler, K. J., Hurst, A., et al. (1990). Characterization of fluvial sedimentology for reservoir
simulation modeling. SPE Formation Evaluation, 5 (03), 211–216.
Hoffimann, J., Scheidt, C., Barfod, A., & Caers, J. (2017). Stochastic simulation by image quilting of
process-based geological models. Computers & Geosciences.
Holden, L., & Lia, O. (1992). A tensor estimator for the homogenization of absolute permeability.
Transport in Porous Media, 8 (1), 37–46.
Holden, L., & Nielsen, B. (2000). Global upscaling of permeability in heterogeneous reservoirs; the output
least squares (ols) method. Transport in Porous Media, 40 (2), 115-143.
Jafarpour, B., Goyal, V. K., McLaughlin, D. B., & Freeman, W. T. (2010). Compressed history match-
ing: exploiting transform-domain sparsity for regularization of nonlinear dynamic data integration
problems. Mathematical Geosciences, 42 (1), 1–27.
Jain, A. K. (1989). Fundamentals of digital image processing. Prentice-Hall, Inc.
Jansen, J. D. (2013). A systems description of flow through porous media. Springer.
Jin, X., & Han, J. (2010). K-medoids clustering. In C. Sammut & G. I. Webb (Eds.), Encyclopedia of
machine learning (pp. 564–565). Boston, MA: Springer US.
Jungreuthmayer, C., Steppert, P., Sekot, G., Zankel, A., Reingruber, H., Zanghellini, J., & Jungbauer,
A. (2015). The 3d pore structure and fluid dynamics simulation of macroporous monoliths: High
permeability due to alternating channel width. Journal of Chromatography A, 1425 , 141–149.
Karssenberg, D., Tornqvist, T. E., & Bridge, J. S. (2001). Conditioning a process-based model of
sedimentary architecture to well data. Journal of Sedimentary Research, 71 (6), 868–879.
Kaufman, L., & Rousseeuw, P. (1987). Clustering by means of medoids. North-Holland.
Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis
(Vol. 344). John Wiley & Sons.
Keogh, K. J., Martinius, A. W., & Osland, R. (2007). The development of fluvial stochastic modelling
in the norwegian oil industry: A historical review, subsurface implementation and future directions.
Sedimentary Geology, 202 (1), 249–268.
Kim, K. H., Lee, K., Lee, H. S., Rhee, C. W., & Shin, H. D. (2017). Lithofacies modeling by multipoint
statistics and economic evaluation by npv volume for the early cretaceous wabiskaw member in
athabasca oilsands area, canada. Geoscience Frontiers.
Kitanidis, P. K. (1990). Effective hydraulic conductivity for gradually varying flow. Water Resources
Research, 26 (6), 1197–1208.
Lay, D. C. (2003). Linear algebra and its applications. Addison Wesley, Boston.
Lee, K., Lim, J., Choe, J., & Lee, H. S. (2017). Regeneration of channelized reservoirs using history-
matched facies-probability map without inverse scheme. Journal of Petroleum Science and Engi-
neering, 149 , 340–350.
LeVeque, R. J. (2002). Finite volume methods for hyperbolic problems (Vol. 31). Cambridge university
press.
Lopez, S., Cojan, I., Rivoirard, J., & Galli, A. (2009). Process-based stochastic modelling: meandering
channelized reservoirs. Analogue Numer Model Sediment Syst: From Understand Predict (Special
Publ. 40 of the IAS), 40 .
Marden, J. I. (1996). Analyzing and modeling rank data. CRC Press.
Mariethoz, G., & Caers, J. (2014). Multiple-point geostatistics: stochastic modeling with training images.
John Wiley & Sons.
MathWorks, Inc. (2017). Statistics and Machine Learning Toolbox. The MathWorks, Inc., Natick, Mas-
sachusetts, United States.
Mattax, C. C., Dalton, R. L., et al. (1990). Reservoir simulation (includes associated papers 21606 and
21620). Journal of Petroleum Technology, 42 (06), 692–695.
Miall, A. D. (2013). The geology of fluvial deposits: sedimentary facies, basin analysis, and petroleum
geology. Springer.
Michael, H., Li, H., Boucher, A., Sun, T., Caers, J., & Gorelick, S. (2010). Combining geologic-process
models and geostatistics for conditional simulation of 3-d subsurface heterogeneity. Water Resources
Research, 46 (5).
Nordahl, K., & Ringrose, P. S. (2008). Identifying the representative elementary volume for permeability
in heterolithic deposits using numerical rock models. Mathematical geosciences, 40 (7), 753–771.
Omre, H., Lødøen, O. P., et al. (2004). Improved production forecasts and history matching using
approximate fluid-flow simulators. SPE Journal , 9 (03), 339–351.
Ortiz, J., & Deutsch, C. V. (2002). Calculation of uncertainty in the variogram. Mathematical Geology,
34 (2), 169–183.
O’Sullivan, A., Christie, M., et al. (2005). Solution error models: a new approach for coarse grid history
matching. In SPE Reservoir Simulation Symposium.
Peaceman, D. W. (1977). Fundamentals of numerical reservoir simulation. Elsevier Scientific Publishing
Company.
Pyrcz, M. J., Boisvert, J. B., & Deutsch, C. V. (2008). A library of training images for fluvial and
deepwater reservoirs and associated code. Computers & Geosciences, 34 (5), 542–560.
Rao, K. R., & Yip, P. (2014). Discrete cosine transform: algorithms, advantages, applications. Academic
press.
Remy, N., Boucher, A., & Wu, J. (2009). Applied geostatistics with sgems: a user’s guide. Cambridge
University Press.
Rongier, G., Collon, P., & Renard, P. (2017). Stochastic simulation of channelized sedimentary bodies
using a constrained l-system. Computers & Geosciences, 105 , 158–168.
Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster
analysis. Journal of computational and applied mathematics, 20 , 53–65.
Scheidt, C., & Caers, J. (2009). Representing spatial uncertainty using distances and kernels. Mathematical
Geosciences, 41 (4), 397–419.
Scheidt, C., Caers, J., Chen, Y., & Durlofsky, L. J. (2011). A multi-resolution workflow to generate
high-resolution models constrained to dynamic data. Computational Geosciences, 15 (3), 545–563.
Scheidt, C., Caers, J., et al. (2009). Uncertainty quantification in reservoir performance using distances
and kernel methods–application to a west africa deepwater turbidite reservoir. SPE Journal , 14 (04),
680–692.