SPE-195800-MS Deep Learning and Bayesian Inversion For Planning and Interpretation of Downhole Fluid Sampling
Dante Orta Alemán, Stanford University; Morten Kristensen and Nikita Chugunov, Schlumberger
This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in Calgary, Alberta, Canada, 30 September – 2 October 2019.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Downhole fluid sampling is ubiquitous during exploration and appraisal because formation fluid properties
have a strong impact on field development decisions. Efficient planning of sampling operations and
interpretation of obtained data require a model-based approach. We present a framework for forward and
inverse modeling of filtrate contamination cleanup during fluid sampling. The framework consists of a deep
learning (DL) proxy forward model coupled with a Markov Chain Monte Carlo (MCMC) approach for the
inverse model.
The DL forward model is trained using precomputed numerical simulations of immiscible filtrate cleanup
over a wide range of in situ conditions. The forward model consists of a multilayer neural network with both
recurrent and linear layers, where inputs are defined by a combination of reservoir and fluid properties. A
model training and selection process is presented, including network depth and layer size impact assessment.
The inverse framework consists of an MCMC algorithm that stochastically explores the solution space using
the likelihood of the observed data computed as the mismatch between the observations and the model
predictions.
The developed DL forward model achieved up to 50% increased accuracy compared with prior proxy
models based on Gaussian process regression. Additionally, the new approach reduced the memory footprint
by a factor of ten. The same model architecture and training process proved applicable to multiple sampling
probe geometries without compromising performance. These attributes, combined with the speed of the
model, enabled its use in real-time inversion applications. Furthermore, the DL forward model is amenable
to incremental improvement as new training data become available.
Flowline measurements acquired during cleanup and sampling hold valuable information about formation
and fluid properties that may be uncovered through an inversion process. Using measurements of water
cut and pressure, the MCMC inverse model required 93% fewer calls to the forward model than
conventional gradient-based optimization while achieving comparable history-match quality. Moreover, by
obtaining estimates of the full posterior parameter distributions, the presented model enables more robust
uncertainty quantification.
Introduction
Downhole fluid sampling using wireline formation tester (WFT) technology is widely used in exploration
and appraisal wells to collect representative formation fluid samples. Laboratory analysis of the samples
provides fluid properties, such as density, viscosity, and composition, which are critical in a variety of
field development and reservoir management workflows. The continued desire in the industry to expand
downhole sampling to more challenging environments, while still capturing minimally contaminated
samples, has been met by new types of WFT sampling hardware, such as focused probes (O'Keefe et
al. 2008) and radial probes (Al-Otaibi et al., 2012), along with improvements in both availability and
quality of downhole fluid analysis (DFA) sensor measurements. DFA measurements are used to monitor
mud filtrate cleanup progression, but increasingly also to determine composition gradients, assess reservoir
connectivity, and evaluate important rock-fluid properties through advanced interpretation workflows. With
a broad hardware portfolio, and a vast amount of data captured from each sampling station, there is a need
to develop efficient and quantitative methods for fluid sampling job planning and real-time interpretation.
These methods are critical for driving operational efficiencies as well as extracting more value from the
sensor measurements.
In this paper, we adopt a model-based approach to planning and interpretation. At its core, this approach
uses a flow model of the mud filtrate contamination cleanup process. The model accounts for invasion of
mud filtrate into the near-wellbore region followed by cleanup and sample capture by a WFT probe. We
focus on invasion and cleanup under immiscible flow conditions, such as oil sampling in a well drilled with
water-based mud (WBM). The model used here was presented by Kristensen et al. (2018). With complex
tool inlet geometries and nonlinear multi-phase flow behavior, the model must be solved using numerical
simulation. However, even when optimized and solved in parallel, typical model simulation times of minutes
to hours would prohibit interactive and real-time applications. Therefore, we replace numerical simulation
with high-fidelity proxy models, which have been trained on pre-computed, full-scale numerical results.
As we will demonstrate in this paper, the proxy models introduce only negligible approximation errors
compared to the full numerical model, and they can thus be used, without loss of fidelity, in all planning
and interpretation workflows.
The goal of this paper is to develop accurate proxies for the forward cleanup model along with inversion
techniques for interpretation of formation-fluid properties from DFA measurements. We build on previous
work by Kristensen et al. (2018). They used a proxy model based on Gaussian Process Regression (GPR),
or kriging, coupled with a nonlinear optimization approach for the inverse problem. GPR predictions are
interpolations in the training set using kernel functions. An advantage of this approach is that training data
are honored exactly, but at the cost of the proxy model size scaling with the size of the training set. Here,
we instead explore the use of deep learning (DL) methods. Enabled by the construction of proxy models,
we then address the inverse problem through a Markov Chain Monte Carlo (MCMC) approach, in which
the posterior distribution of formation-fluid properties can be efficiently characterized.
Model description
Details of the model used in this study along with its assumptions can be found in Kristensen et al. (2018).
The two-phase flow problem is solved on high-resolution near-wellbore grids. Examples of such grids are
shown in Figure 1 for a 3D radial probe (3DRP) with four elongated inlets and a focused probe with two
concentric inlets (i.e., an outer guard inlet and an inner sample inlet). Due to the guarding of filtrate flow,
the focused probe achieves clean formation fluid much faster than the 3DRP, a well-understood behavior
of focused vs. non-focused sampling probes. The purpose of this paper is to develop proxy models for the
contamination response in the flowlines, as shown in Figure 1f.
Figure 1—Examples of mud filtrate cleanup simulations for the 3D radial probe (3DRP) and the focused probe
(FP). The 3D distributions of WBM filtrate are shown at three points in time corresponding to: before cleanup,
at formation fluid breakthrough, and at 5% produced contamination. Tool inlets are indicated in black. Only a
part of the full simulated domain is shown. The predicted flowline contamination is shown in the bottom figure.
(2)
where ϕ is porosity, M = kh/μo is mobility, krowi is the oil relative permeability endpoint, and Q is the pumpout
rate. The input parameters of the model consist of a 10-dimensional vector describing the relevant formation
and fluid properties. The input parameters are defined in Eq. 3 and Table 1. This model parametrization
was previously used in Kristensen et al. (2018) to model the sampling process using Gaussian Process
Regression proxy models.
Vertical permeability kv
Horizontal permeability kh
Wellbore diameter Dw
Formation thickness H
(3)
Model architecture
The proxy model was built using a Recurrent Neural Network (RNN). This specific type of Neural Network
is designed to handle sequential data and has been applied to model pressure and flow rate data from
permanent downhole gauges (Tian and Horne, 2017) as well as pressure response during hydraulic fracturing
processes (Madasu and Rangarajan, 2018). The idea behind RNNs is that in sequential data, the output of
a system depends on both the input and the state of the system. The state of the system is represented
by a variable h that encodes the input history the model has seen. That variable is then passed down the
network to consecutive layers that compute the estimated output. This structure is best understood by using
the so-called vanilla RNN described by Eqs. 4 and 5.
$$h_t = \tanh\left( W_{xh}\, x_t + W_{hh}\, h_{t-1} \right) \qquad (4)$$

$$y_t = W_{hy}\, h_t \qquad (5)$$
In the vanilla RNN case, the output yt depends linearly on the state ht which is a nonlinear recurrent
function of the input xt and the previous state ht-1. Matrices Whh, Wxh and Why are the learnable weight matrices
of the RNN. In practice, the vanilla RNN is seldom used due to issues with numerical stability of the gradient
computation during backpropagation. Because of the recurrence relation on which ht is based, computing its
gradient involves repeated multiplication by the operator Whh, which leads to exploding or vanishing
gradients. To solve this issue, a variety of modified RNN types have been proposed. For this study, the
Gated Recurrent Unit (GRU) (Cho et al., 2014) and Long Short-Term Memory (LSTM) (Hochreiter and
Schmidhuber, 1997) types were tested.
The LSTM addresses the vanishing gradient problem by reparametrizing the vanilla RNN. Intuitively,
instead of computing the activation St directly from St-1, it first computes the difference ΔSt and then adds it
to St-1 to obtain St (Jozefowicz et al., 2015). This sequence of operations is shown in Eqs. 6 to 11.
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right) \qquad (6)$$

$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right) \qquad (7)$$

$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right) \qquad (8)$$

$$\tilde{c}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right) \qquad (9)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (10)$$

$$h_t = o_t \odot \tanh(c_t) \qquad (11)$$
where W* and b* are the learned weight matrices and bias vectors, σ is the sigmoid function, and ⊙ is the element-wise
vector product. The computation of the cell state ct is the equivalent of the St update described above.
The GRU follows a similar but simpler structure compared to the LSTM. The detailed operations are
shown in Eqs. 12 to 15.
$$z_t = \sigma\left( W_z \cdot [h_{t-1}, x_t] + b_z \right) \qquad (12)$$

$$r_t = \sigma\left( W_r \cdot [h_{t-1}, x_t] + b_r \right) \qquad (13)$$

$$\tilde{h}_t = \tanh\left( W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h \right) \qquad (14)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \qquad (15)$$
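As an illustration, a minimal NumPy sketch of one GRU update with the structure of Eqs. 12 to 15 is shown below. The weight and gate names are ours, the concatenated weights of the equations are split into input (W*) and state (U*) parts, and gate conventions vary slightly between formulations; this is a sketch rather than the exact parametrization used in the study.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update (cf. Eqs. 12-15). p holds weight matrices W*, U*
    and bias vectors b*; W* act on the input, U* on the previous state."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])             # update gate, Eq. 12
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])             # reset gate, Eq. 13
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state, Eq. 14
    return (1.0 - z) * h_prev + z * h_cand                              # new state, Eq. 15
```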
The PyTorch 0.4 framework (Paszke et al., 2017) was used to build the proxy models. Multiple network
architectures were tested by varying the number of recurrent layers. The fundamental architecture of the
model consists of a stack of recurrent layers followed by a single fully connected layer. The unfolded
computational graph of the network is detailed in Figure 2. The output of the network is sequentially generated
from l = 1 to l = 200 representing contamination levels η = 1 to η = 0.01. At each step, the full input vector
x is fed to the network.
Figure 2—Unfolded computational graph of the general model architecture. The input x is
sequentially fed to the RNN at each step to produce a hidden state that is passed along
to the next step. Each hidden state h is then used to produce the estimate of the output.
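A minimal PyTorch sketch of this architecture is given below. The class and argument names are ours; the defaults correspond to the single-layer, 128-unit GRU configuration discussed later in the paper.

```python
import torch
import torch.nn as nn

class CleanupProxy(nn.Module):
    """A stack of recurrent layers followed by a single fully connected
    layer. The same input vector x is fed at every step; the hidden state
    at step l is mapped to the output at contamination level l."""

    def __init__(self, n_inputs=10, hidden_size=128, num_layers=1, seq_len=200):
        super().__init__()
        self.seq_len = seq_len
        self.rnn = nn.GRU(n_inputs, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, n_inputs); repeat it along the sequence dimension
        x_seq = x.unsqueeze(1).repeat(1, self.seq_len, 1)
        h, _ = self.rnn(x_seq)           # h: (batch, seq_len, hidden_size)
        return self.fc(h).squeeze(-1)    # (batch, seq_len), one value per level
```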
Model accuracy was quantified by the mean relative error across simulation cases at each contamination level:

$$E_j = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \hat{y}_{ji} - y_{ji} \right|}{y_{ji}} \qquad (16)$$

where ŷji is the estimated value and yji is the value from numerical simulation for case i at contamination
level j. For model selection, the mean of E over the contamination range ηu = 95% to ηl = 1% was used.
Specifically:

$$e = \frac{1}{|J|} \sum_{j \in J} E_j, \qquad J = \{\, j : \eta_l \le \eta_j \le \eta_u \,\} \qquad (17)$$
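A small sketch of these error measures, under the relative-error form of Eq. 16 assumed above, follows (function and argument names are ours):

```python
import numpy as np

def model_selection_error(y_hat, y, eta, eta_u=0.95, eta_l=0.01):
    """Mean of the per-level relative error E (Eq. 16) over the
    contamination range [eta_l, eta_u] (Eq. 17).
    y_hat, y: (n_cases, n_levels) arrays; eta: (n_levels,) levels."""
    E = np.mean(np.abs(y_hat - y) / np.abs(y), axis=0)  # Eq. 16, per level
    in_range = (eta >= eta_l) & (eta <= eta_u)          # levels 95% .. 1%
    return E[in_range].mean()                           # Eq. 17
```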
The dataset (consisting of approximately 6000 full simulation cases for each WFT probe) was split
into an 80% training, 10% validation, and 10% test set. Different model architectures were trained and their
performances were compared using the validation set. Training was done over 16 epochs with a batch
size of 64 using the Adam optimizer (Kingma and Ba, 2015). The Huber loss (Huber, 1964) was selected as
the loss function for the neural network. Using this loss function both minimized the above-defined error
measures and improved the convergence of the training. A model was considered to have converged when
the validation error e was below 5%.
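A sketch of this training setup is shown below. The learning rate, split seed, and data handling are assumptions; PyTorch's SmoothL1Loss is its built-in Huber loss.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_proxy(model, X, Y, epochs=16, batch_size=64, lr=1e-3):
    """Train with Adam and the Huber loss on an 80/10/10 split.
    X: (N, 10) input parameter vectors; Y: (N, 200) contamination curves."""
    data = TensorDataset(X, Y)
    n_train, n_val = int(0.8 * len(data)), int(0.1 * len(data))
    n_test = len(data) - n_train - n_val
    train_set, val_set, test_set = random_split(data, [n_train, n_val, n_test])
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.SmoothL1Loss()  # Huber loss
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()
    return train_set, val_set, test_set
```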
For the tested architectures, three design parameters were considered: network depth, layer size, and
RNN type. Network depth refers to the number of RNN layers that a specific model has. Layer size is the
number of units or “neurons” that each RNN layer has. If more than one recurrent layer was present, the
same layer size was used for all the layers. The RNN types were LSTM and GRU as defined above. Table
2 shows the ranges for each of these parameters.
Parameter Range
Network depth 1 to 3
RNN type LSTM, GRU
Deeper networks with larger layer sizes are standard practice for difficult modeling tasks such as image
recognition. However, for the problem of filtrate cleanup the main interest is in preserving accuracy while
reducing the model size and improving the speed at test time. Figure 3 shows the effect of increasing the
network depth and layer sizes on the validation error e for the cleanup volume V. Two trends can be noticed
from the plot. First, the error decreases almost monotonically as the layer size increases. However, for layer
sizes larger than 128 units, the error e stays approximately constant, suggesting that further increases in
model complexity make only a marginal contribution to model performance.
Figure 3—Validation error for cleanup volume by network depth and layer size.
The second trend is that deeper networks have lower validation error at smaller layer sizes. This result is
not unexpected as increasing layer size or network depth both increase the model complexity. Nonetheless,
it can be noted that unless the layer size is on the low end of the range, increasing the depth leads to only
marginal improvements to the error. In other words, the proposed architecture reaches its minimum bias
with a layer size of at least 128, even at a depth of one.
With respect to the RNN type, LSTM has a marginally better performance overall, potentially due to the
increased complexity of the model. LSTM also shows a slightly lower variance for the error. This can be
observed in Figure 3, where the shaded region represents the interdecile range for the validation error.
Figure 4—Test error at each contamination level for the focused probe tool.
Figure 5—Test error comparison between the Gaussian Process Regression and Deep
Learning proxy models for the 3DRP. The deep learning model is a single-layer, 128-unit GRU RNN.
Another downside of GPR is the requirement of keeping the full training dataset in memory, which makes
scaling difficult. Figure 6 shows a comparison of the model size for different deep learning architectures.
The model size is defined as the required space in memory to store the trained weights for the neural network
using 32-bit floating-point numbers. For most of the tested architectures, the model sizes stayed below that
of GPR. For the deep learning models, the model size increases logarithmically with the layer size. For
architectures with layer sizes below 256 units, the model size is below the GPR threshold regardless of the
considered number of layers.
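For reference, the model size as defined here can be computed directly from the stored weights; a one-function sketch:

```python
def model_size_mb(model):
    """Memory needed to store the trained weights as 32-bit floats, in MB."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / 1024**2  # 4 bytes per float32 parameter
```

Applied to the proxy sketch above, a single-layer, 128-unit GRU with a 10-dimensional input has roughly 54,000 parameters, or about 0.2 MB.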
Inverse Model
The process of filtrate invasion and cleanup can be regarded as a full drainage-imbibition in situ flow
experiment. Therefore, in principle, it is possible to infer the parameters of the forward near-wellbore
reservoir model using measurements of the drawdown pressure and watercut during cleanup. In Kristensen
et al. (2018), this inversion problem was approached as a joint optimization problem using a weighted
mismatch function for pressure and watercut as the objective function. In this work, the inverse problem is
explored using Markov Chain Monte Carlo (MCMC) sampling, a Bayesian approach where the goal is to
obtain a probabilistic distribution of the forward model parameters consistent with available observations. In
other words, we want to estimate the posterior distribution of the model parameters given observed pressure
and watercut data. This process differs from that of optimization methodologies in that the goal is not to find
a unique solution or “true” vector of parameters. Instead, MCMC gives the statistical posterior distribution
of the solution in the parameter space. If a single solution is needed, the solution with maximum likelihood
can be obtained. Moreover, this approach directly allows for uncertainty quantification of the solution, as
any chosen moment can be estimated from the resulting posterior distribution.
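As an illustration, once posterior samples are available, such summaries follow directly from the chain (a sketch; names are ours):

```python
import numpy as np

def summarize_posterior(chain, q=(0.05, 0.95)):
    """Posterior mean and a credible interval per parameter.
    chain: (n_samples, n_params) array of MCMC draws."""
    mean = chain.mean(axis=0)
    lo, hi = np.quantile(chain, q, axis=0)
    return mean, lo, hi
```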
The likelihood of the observed data y given a parameter vector x is modeled as a Gaussian function of the normalized mismatch:

$$p(y \mid x) \propto \exp\left( -\frac{\delta^{T} \delta}{2\sigma^{2}} \right) \qquad (19)$$

$$\delta = \frac{\hat{y}(x) - y}{y} \qquad (20)$$

where δ is the normalized difference between the results of the proxy model and the observed data for both
pressure and watercut, and σ2 is the variance of δ. Using this likelihood, as well as a prior π(x), the inverse
problem can be solved by obtaining the posterior or target distribution p(x|y) according to Bayes' theorem:
$$p(x \mid y) = \frac{p(y \mid x)\, \pi(x)}{p(y)} \propto p(y \mid x)\, \pi(x) \qquad (21)$$
where the prior π(x) was set to a uniform distribution bounded by the ranges of x that the proxy model was
exposed to during training, or the ranges imposed by the available petrophysical data. With these definitions,
MCMC was used to compute the posterior distribution, p(x|y).
The accept-or-reject step of the Metropolis-Hastings algorithm (Hastings, 1970) either accepts or rejects the
proposed solution x*. The rate at which proposals are accepted is called the acceptance rate of the chain.
This rate depends on how far the proposal lies from the previous solution x(m-1) and is controlled by the
jumping distribution parameter Σ2. For this case, Σ2 was chosen to be 0.012 (i.e., Σ = 0.01) to approximate
the optimal acceptance rate of 0.234 suggested by Roberts et al. (1997).
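A minimal sketch of the random-walk Metropolis-Hastings sampler described above follows. All names are ours; log_likelihood is assumed to wrap the proxy-model mismatch of Eqs. 19 and 20, bounds encodes the uniform prior, and step plays the role of the jumping parameter Σ.

```python
import numpy as np

def metropolis_hastings(log_likelihood, bounds, n_steps=100_000, step=0.01, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal and a
    uniform prior on a box. Returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi)              # random start inside the prior
    ll = log_likelihood(x)
    chain, accepted = [], 0
    for _ in range(n_steps):
        x_star = x + step * rng.standard_normal(x.size)  # symmetric proposal
        if np.all((x_star >= lo) & (x_star <= hi)):      # prior support check
            ll_star = log_likelihood(x_star)
            # accept with probability min(1, p(y|x*) / p(y|x))
            if np.log(rng.uniform()) < ll_star - ll:
                x, ll = x_star, ll_star
                accepted += 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_steps
```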
The benchmark inversion used gradient-based optimization with an interior-point algorithm (Byrd, Hribar,
and Nocedal, 1999) and random starting values for the parameter vector x. The function fobj defined
in Eq. 23 was used as the objective function.
$$f_{obj}(x) = w_{wc}\, \delta_{wc}^{T} \delta_{wc} + w_{p}\, \delta_{p}^{T} \delta_{p} \qquad (23)$$

where δwc and δp are the normalized watercut and pressure mismatches and wwc, wp are the corresponding weights.
The results of both inversion methodologies are shown in Figure 7. The MCMC results come from using
the average of the posterior distribution p(x|y), while the optimization results come from the solution with
the lowest value of the objective function. It can be noted that the optimization-based and MCMC matches
are very close to each other. The results for watercut are acceptable, even though the match for
pressure is not as close during the pressure drawdowns.
The distributions for the solution x for both the MCMC methodology and the optimization are shown in
Figure 8. It can be noticed that with the exception of the horizontal permeability kh, the distributions for
most of the parameters do not match closely, and in general the distributions from the MCMC methodology
tend to cover a larger spectrum of the parameter space.
The plots for the parameters kv/kh, Sor, z and DoI, corresponding, respectively, to permeability anisotropy,
residual oil saturation, tool distance from top of the formation, and depth of invasion, show that both
distributions display a bimodal pattern, although the distributions coming from MCMC have wider peaks
with slower decay in general. If it is assumed that both distributions are indeed very similar, the apparent
mismatch can be explained by the fact that the optimization methodology has significantly fewer data points,
which in turn produces a lower-resolution histogram. In contrast, the distributions for the endpoint relative
permeability, krowi, are very different: the optimization solutions are concentrated at higher values while
the MCMC solutions are spread more evenly throughout the range. This speaks to the non-unique nature
of the inverse problem and, more specifically, to the varying sensitivities of the terms in the objective
function to the input parameters, as emphasized by Kristensen et al. (2018).
To compare the quality of the inversion, Figure 9 shows the value of the mismatch δ for both procedures.
As expected, the MCMC solutions have a wider spread, since the methodology is designed to also explore
areas of low likelihood in the parameter space. Nonetheless, MCMC finds solutions with mismatch as low
as the gradient-based optimization. In terms of number
of proxy model calls, using MCMC with the described configuration required 93% fewer calls to the forward
model. This represents a significant improvement in the efficiency of the inversion, since not only point
estimates of the inverted parameters are obtained but also the full posterior distribution of the solution. We
note, however, that improvements can also be made to the optimization-based approach that would bring
down the number of proxy model evaluations; the above comparison used a numerical approximation to
the gradient in the optimization approach.
Figure 9—Value for the objective function for both the MCMC and optimization solutions.
Using the Metropolis-Hastings algorithm is also a relatively inefficient way of performing the sampling,
as it relies on a random walk to explore the solution space. A potentially better alternative would be a
more advanced MCMC algorithm such as Hamiltonian Monte Carlo (Duane et al., 1987), which uses the
Hamiltonian equations of motion to explore the solution space more efficiently. With such methods, the
number of proxy model calls could be further reduced, opening the door to potential applications such
as real-time inversion.
Conclusions
A deep learning model-based approach for the mud filtrate contamination cleanup process was presented.
The deep learning model achieved significant improvements in accuracy (up to 50%) and memory
requirements (a tenfold reduction) compared with a previously developed Gaussian Process Regression
model without sacrificing computational speed. The deep learning architecture and training framework
developed were shown to be robust and applicable to multiple probe geometries.
With a reliable proxy model in place, the inverse problem of inferring model parameters from
measurements was tackled. By using MCMC, the full posterior distribution of the model parameters was
inferred given watercut and pressure observations. The inversion methodology was tested using field data
and benchmarked with a gradient-based optimization inversion methodology. The MCMC results displayed
a more efficient exploration of the solution space with a substantial decrease in proxy model calls and
equivalent quality in the inversion results.
Bibliography
Al-Otaibi, S., Bradford, C., Zeybek, M., Corre, P.-Y., Slapal, M., & Kristensen, M. (2012). Oil-Water Delineation with
a New Formation Tester Module. SPE Annual Technical Conference and Exhibition, SPE Paper 159641. San Antonio,
Texas: SPE.
Byrd, R., Hribar, M., & Nocedal, J. (1999). An Interior Point Algorithm for Large-Scale Nonlinear Programming. SIAM
Journal on Optimization, 9(4), 877–900.
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning
Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proceedings of EMNLP 2014,
Doha, Qatar.
Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1),
97–109. doi:10.2307/2334940.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
Huber, P. J. (1964). Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1), 73–101.
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An Empirical Exploration of Recurrent Network Architectures.
Proceedings of the 32nd International Conference on Machine Learning. Lille, France: PMLR.
Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning
Representations (ICLR). San Diego, CA.
Kristensen, M., Chugunov, N., Gisolf, A., Biagi, M., & Dubost, F. (2017). Real-Time Formation Evaluation and
Contamination Prediction Through Inversion of Downhole Fluid Sampling Measurements. SPE Annual Technical
Conference and Exhibition. Society of Petroleum Engineers.
Kristensen, M., Chugunov, N., Cig, K., & Jackson, R. (2018). Proxy-Enabled Stochastic Interpretation of Downhole Fluid
Sampling Under Immiscible Flow Conditions. Petrophysics, 59(5), 633–648.
Kristensen, M., Chugunov, N., Gisolf, A., Biagi, M., & Dubost, F. (2019). Real-Time Formation Evaluation and
Contamination Prediction Through Inversion of Downhole Fluid Sampling Measurements. SPE Res. Eval. & Eng.,
22(2).
Madasu, S., & Rangarajan, K. P. (2018). Deep Recurrent Neural Network DRNN Model for Real-Time Multistage Pumping
Data. OTC Arctic Technology Conference. Houston, Texas, USA: Offshore Technology Conference.
O'Keefe, M., Eriksen, K., Williams, S., Stensland, D., & Vasques, R. (2008). Focused Sampling of Reservoir Fluids
Achieves Undetectable Levels of Contamination. SPE Res. Eval. & Eng., 205–218.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., … Lerer, A. (2017). Automatic differentiation in
PyTorch. 31st Conference on Neural Information Processing Systems (NIPS). Long Beach, CA.
Rashaid, M., Al-Ibrahim, M., Van Steene, M., Ayyad, H., Friha, A., Liang, L., … Cherian, J. (2017). Application of a New
Methodology for In-Situ Evaluation of Relative Permeability and Capillary Pressure in the Ahmadi Field of Greater
Burgan, Kuwait. SPE Middle East Oil & Gas Show and Conference, Paper SPE 183868. Manama, Kingdom of
Bahrain: SPE.
Roberts, G., Gelman, A., & Gilks, W. (1997). Weak Convergence and Optimal Scaling of Random Walk Metropolis
Algorithms. The Annals of Applied Probability, 7(1), 110–120.
Tian, C., & Horne, R. (2017). Recurrent Neural Networks for Permanent Downhole Gauge Data Analysis. SPE Annual
Technical Conference and Exhibition. San Antonio, Texas, USA: Society of Petroleum Engineers.