
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. XX, NO. X, XXX XXX

Elastic Properties Estimation from Prestack Seismic Data Using GGCNNs and Application on Tight Sandstone Reservoir Characterization
Hui Li, Jing Lin, Baohai Wu*, Jinghuai Gao, Member, IEEE, and Naihao Liu, Member, IEEE

Abstract—Traditional optimization algorithms are usually applied to estimate the elastic parameters of the subsurface from field seismic data. However, these optimization algorithms highly depend on prior knowledge (e.g., the initial model setup and sparsity), leading to serious inversion uncertainties. Nowadays, with the rapid development of neural networks, convolutional neural networks (CNNs) have been widely applied to estimating elastic parameters from field data. However, the deficiency of labeled seismic data impedes the application of CNNs in seismic inversion. Moreover, both the size and diversity of labeled data sets are critical factors influencing the accuracy and resolution of the predicted parameters when using CNNs-based inversion techniques. In this work, taking an unconventional tight sandstone formation as an example, we develop a geological and geophysical driven CNNs model, named GGCNNs. The proposed GGCNNs allows us to take advantage of both the prior geological information and the basic geophysical model through the generated synthetic labeled prestack seismic data sets, which represent essential characteristics of the subsurface. Moreover, under the consideration of data diversity, the GGCNNs model enables us to make a trade-off between the inversion accuracy and the labeled data size. Applications on both synthetic and field data clearly demonstrate the effectiveness of the proposed GGCNNs model for predicting elastic parameters from prestack seismic data: its predicted results are of high accuracy in the vertical profile and continuous and smooth in the horizon slice.

Index Terms—Geological and geophysical model driven convolutional neural networks (GGCNNs), diversity of labeled data, seismic inversion, unconventional tight sandstone.

Manuscript received January 10, 2021, revised April 14, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 41704127, in part by the Fundamental Research Funds for the Central Universities under Grant xjj2018235, and in part by the National Key R&D Program of the Ministry of Science and Technology of China under Grant 2018YFC0603501. (Corresponding author: Baohai Wu)
H. Li, J. Lin, J. Gao, and N. Liu are with the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China (email: [email protected], [email protected], [email protected], and naihao [email protected]).
Baohai Wu is with the CGG (Beijing), Beijing 100016, China (email: [email protected]).

I. INTRODUCTION

THE seismic inversion of reservoir rocks is of considerable importance in understanding the elastic nature of upper-crustal rocks [1]. Nowadays, the unconventional tight sandstone reservoir is becoming substantial with the dramatic demand for fossil energy. Tight sandstone exhibits low porosity and low permeability physically, and low impedance contrast seismically. Therefore, quantitatively estimating elastic parameters from field seismic data is significant in sweet spot characterization [2].

In the last decades, seismic inversion algorithms, mainly divided into deterministic and stochastic subgroups, have been utilized to predict the elastic properties of the subsurface and its fluid-bearing properties from seismic data [3], [4]. Deterministic inversions are essentially greedy algorithms, which usually adopt gradient descent methods to gradually approximate an optimal solution in a least-squares framework. As a result, the inverted results are generally non-unique and depend on the choice of the initial model because of incomplete geophysical observations [5]. Therefore, a regularization term is generally added to the objective function [6]. For the stochastic inversions, the Bayesian framework becomes a natural choice to solve this issue [7]–[9]. Unlike the deterministic inversion, the stochastic inversion provides a chance to evaluate the reliability of the possible models, and furthermore to quantify the uncertainties of the predicted elastic parameters.

Currently, with the rapid development of machine learning (ML) techniques, the inversion for elastic parameters continues to be under active research. Generally, ML-based seismic inversions are data-driven and aim to "learn" an approximate mapping between the input and output data sets. Namely, they do not employ any applicable physical model. For the inversion for elastic parameters, on the one hand, because the mapping relation between seismic data sets and elastic parameters is usually non-unique, a mapping relation learned by ML from a small number of well logs as labeled data cannot be completely valid over the entire study area, especially for severe lateral variations of the sweet spot in unconventional tight sandstone reservoirs. On the other hand, there exist prior geological knowledge and basic geophysical laws relating elastic parameters and seismic data sets. Therefore, physically speaking, it is possible to enhance the capability of ML-based elastic inversions if we use such a geological and geophysical model-driven relation to generate labeled data sets for the network-based inversions.

Nowadays, ML methods are becoming popular for seismic inversions. Among them, the support vector regression (SVR) [10], feedforward neural networks (FNN) [11], probabilistic neural network (PNN) [12], [13], and recurrent neural network (RNN) [14], [15] have been used to estimate reservoir parameters based on seismic data. As one of the representative deep learning (DL) algorithms, the convolutional neural networks (CNNs) are capable of learning the multi-level features of the input data to establish a flexible mapping between the input and output data. Consequently, CNNs have been successfully applied in fault detection [16], [17], rock facies classification [18], [19], first arrival picking [20], and seismic denoising [21], [22].

Recently, Di et al. [23] proposed a semi-supervised learning framework to estimate acoustic impedance by integrating 3D seismic data and sparsely distributed wells. To a certain degree, this strategy not only reduces the over-fitting risk but also improves the horizontal continuity of the predicted attribute. Cai et al. [24] developed a Wcycle-GAN (U-net [25] as the generator and Alex-net [26] as the discriminator) for seismic impedance inversion. Mustafa [27] proposed a learning scheme that trains two CNNs simultaneously and imposes soft constraints to obtain better generalization performance. Wu et al. [28] proposed a residual CNN combined with transfer learning for improving impedance inversion.

In the geophysical field, it is difficult to directly obtain enough labeled data beyond the limited well logging when conducting the network training step. Consequently, the inadequacy of labeled data usually results in over-fitting and poor generalization. Through the forward modeling operation, Das et al. [29], [30] generated synthetic labeled data and updated the network model parameters (weights, biases, etc.) based on these labeled data. Furthermore, using a similar workflow of generating synthetic prestack data as labeled data for training CNNs, Das et al. [31] practically compared the performances of an end-to-end CNN and a cascaded CNN. Although data-driven network architectures based on a small number of well logs have been validated to be a feasible process for estimating elastic properties from a vertical profile of seismic traces, the aforementioned work still does not quantify the performance of CNNs-based seismic inversions over the entire study area, especially in solving the horizontal continuity problem (i.e., the spatial correlation). To overcome this remaining issue, in this work, we developed geological- and geophysical-based CNNs (GGCNNs) that utilize multiple-scale data sets, such as geological parameters (macro-scale), multiple well loggings (meso-scale), and rock physical measurements (micro-scale), combined with the forward modeling operator, to obtain high-quality labeled prestack seismic angle gather data sets. Here, we need to point out that, through integration of well logging, seismic angle gathers, and stratigraphic interpretation, the prestack seismic inversion cannot only avoid cross or angle smearing during the stacking process, but also obtain reliable estimates of elastic parameters (Vp, Vs, and ρ), from which the properties of pore fluids and rock lithology of the subsurface can furthermore be predicted [32]. Meanwhile, it equally enables us to calculate other combinations of elastic parameters such as P-impedance and Vp/Vs. More importantly, compared to poststack inversion, it provides us an additional angle to recognize pore fluids and rock lithology, which can be directly compared to experimental measurements and allows a physically motivated link to reservoir properties. Consequently, GGCNNs gives us a chance to obtain diverse labeled training data, which are not only geologically and physically consistent with the study area, but also capture the essential mapping between seismic data and elastic parameters. Benefiting from the above-mentioned strategies, GGCNNs shows superior performance in the diversity and consistency of labeled training data compared with previous CNNs-based seismic inversions using only a small number of well logs to generate labeled data.

Additionally, the performance of all kinds of CNNs is sensitive to both the size of the training samples and the time cost. A large size of labeled training data will cause expensive cost. In this study, we introduce the extraction of training samples during the training process of GGCNNs to make a trade-off between the sample size and training time. Therefore, GGCNNs with the extracted training samples can achieve faster training speed as well as high accuracy.

Finally, based on the extracted training sample strategy, the proposed GGCNNs is applied to estimate elastic properties by using prestack seismic angle gather data sets. The proposed model contains two sub-networks, i.e., the supervised sub-GGCNN and the unsupervised sub-GGCNN. Specifically, based on multiple-scale data sets, the supervised sub-GGCNN is first applied to obtain a mapping relation between the geological and geophysical knowledge driven input data and elastic properties. After that, combining transfer learning, the trained supervised sub-GGCNN is used as the initial network model in the unsupervised learning process. Then, a part of the field data is applied to train the unsupervised sub-GGCNN, which finally modifies GGCNNs to better represent the mapping relation between field data and elastic properties. The trained supervised sub-GGCNN is beneficial as an initial network because the available geological and physical information (e.g., multiple well logs in the target area, geological model, rock physics model, and forward modeling) can provide constraints and diversity to the network model, alleviate the over-fitting issue, and reduce the training cost; thus it will finally improve the accuracy of elastic inversion from prestack field data.

More specifically, the contributions of the proposed GGCNNs workflow are mainly threefold, as follows.

1) Generation of the training sample dataset using a multi-scale modeling strategy: We propose to build a geologically and physically consistent model to obtain labeled training data. This multi-scale training data generation workflow solves the data deficiency and diversity problems and has not been attempted in CNNs-based seismic inversion yet.

2) A targeted network for unconventional tight reservoirs: We consider the strong lateral heterogeneity of tight sandstone reservoirs during the geological modeling process, because the quick lateral variation of the sweet spot could have a significant influence on lithology and elastic characterization. To reflect the characteristics of such spatial heterogeneity in the training samples, the Sequential Indicator Simulation (SIS) and Sequential Gaussian Simulation (SGS) are taken into account for both vertical and horizontal ranges in our generation of the labeled training sample dataset.

3) Supervised sub-GGCNN and unsupervised sub-GGCNN architecture: Based on the aforementioned labeled training samples, we propose a supervised sub-GGCNN and unsupervised sub-GGCNN architecture to conduct elastic parameter estimation from prestack seismic angle gathers. Specifically, during network training, based on different sizes of labeled training samples, we evaluate the horizontal smoothness and continuity of the inverted results, which has not been attempted in CNNs-based inversions from prestack seismic data of unconventional tight sandstone formations yet. In addition, we experimentally make a trade-off between the inversion accuracy and time cost for GGCNNs by extracting training data. Note that transfer learning is applied in this paper to improve the accuracy of the predicted results of the field seismic data.
The remainder of this work is structured as follows. In Section II, we first introduce the basic theory of CNNs, the architecture of the supervised-unsupervised GGCNNs, and the application of transfer learning in this work. Then a detailed workflow of the generation of the labeled data set is given. Finally, the selection of training samples is introduced in detail. In Section III, both synthetic and field data examples are given to verify the effectiveness of the proposed GGCNNs model. Lastly, we draw conclusions in Section IV.

Fig. 1. An example of the convolution operation.

II. METHODOLOGY

A. Convolutional Neural Networks (CNNs)

As one of the representative algorithms of deep learning, CNNs have achieved great success in exploration geophysics [1], [33], [34]. The basic CNN is composed of three structures: convolution, activation, and pooling. In addition, operations such as batch normalization and dropout are usually adopted. Before introducing the proposed model based on CNNs, we first briefly explain these main components, which are used in the proposed GGCNNs model.

Fig. 2. An example of the Max-pooling operation.

1) Convolution (Conv) layer

The convolution operation is one of the indispensable operations in CNNs. The Conv layer aims to extract features from the input data. The input data of each Conv layer are convolved with a set of learnable filters to obtain multiple feature maps. Assume the input data X of a certain Conv layer have a spatial size of W × H × C, where W × H is the spatial size of X and C represents the channels. Suppose there are k filters at the Conv layer; then the output size of the Conv layer is Ŵ × Ĥ × k. The convolution operation is defined as:

\hat{x}_j = f\Big( \sum_{i=1}^{C} x_i * w_j + b_j \Big), \quad j = 1, 2, \ldots, k, \qquad (1)

where w_j and b_j are the weight and bias of the jth filter, x_i is the ith feature map of X, and \hat{x}_j is the jth output of the Conv layer. And, * and f(·) denote the convolution operator and activation function, respectively. In this study, the nonlinear rectified linear unit (ReLU) function [35] is adopted and defined as:

f(x) = \max(x, 0). \qquad (2)

The ReLU function can not only accelerate the training process but also alleviate the problems of gradient explosion and gradient vanishing. The updated size (Ŵ × Ĥ × Ĉ) of the extracted feature map after a Conv layer is defined as:

\hat{W} = (W - F_w + 2P)/S_w + 1, \qquad (3)
\hat{H} = (H - F_h + 2P)/S_h + 1, \qquad (4)
\hat{C} = k, \qquad (5)

where F_w and F_h are the width and height of the filter, P is the number of zero-padding, and S_w and S_h are the strides of the filter in the width and height directions. Fig. 1 shows a simple example of the convolution operation in most CNNs, where the size of the convolution filter is 3 × 3, the stride (S_w and S_h) is 1, and there is no zero-padding.

2) Max-pooling layer

The pooling operation is adopted to reduce the feature map size. As one of the most commonly used pooling operations, Max-pooling extracts the maximum of the feature points in the pooling window. Consider a 2D feature map with the size of W_1 × H_1, where the size of the pooling filters and the stride are F_w × F_h and S_w × S_h. After a Max-pooling operation, the updated width (W_2) and height (H_2) of the feature map are defined as:

W_2 = (W_1 - F_w)/S_w + 1, \qquad (6)
H_2 = (H_1 - F_h)/S_h + 1. \qquad (7)

An example of the Max-pooling operation is displayed in Fig. 2. The size of the feature map is reduced by half after the Max-pooling, where the size of the pooling filter is 2 × 2 and the strides (S_w and S_h) are both 2.
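For concreteness, the following minimal numpy sketch (our own illustration, not code from the paper; the function names are hypothetical) implements the ReLU of equation (2), a single-channel instance of the convolution in equation (1), and the size bookkeeping of equations (3)-(7):

```python
import numpy as np

def relu(x):
    # Equation (2): f(x) = max(x, 0)
    return np.maximum(x, 0.0)

def conv2d_valid(x, w):
    # Single-channel, stride-1, no-padding instance of equation (1); like most
    # CNN frameworks, the sliding product is implemented as cross-correlation.
    oh, ow = x.shape[0] - w.shape[0] + 1, x.shape[1] - w.shape[1] + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + w.shape[0], j:j + w.shape[1]] * w)
    return relu(out)

def conv_output_size(w, h, fw, fh, p, sw, sh):
    # Equations (3)-(4): feature-map size after a Conv layer
    return (w - fw + 2 * p) // sw + 1, (h - fh + 2 * p) // sh + 1

def maxpool_output_size(w1, h1, fw, fh, sw, sh):
    # Equations (6)-(7): feature-map size after Max-pooling
    return (w1 - fw) // sw + 1, (h1 - fh) // sh + 1

# The Fig. 1 setting: a 3 x 3 filter, stride 1, no zero-padding
print(conv_output_size(5, 5, 3, 3, 0, 1, 1))            # -> (3, 3)
# The Fig. 2 setting: 2 x 2 pooling with stride 2 halves the map
print(maxpool_output_size(4, 4, 2, 2, 2, 2))            # -> (2, 2)
print(conv2d_valid(np.ones((5, 5)), np.ones((3, 3))))   # 3 x 3 map of 9.0
```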

3) Batch-normalization (BN) layer

The batch-normalization (BN) layer is introduced to accelerate the convergence speed of CNNs and to improve the generalization ability of the trained model. The BN operation can be summarized as the following steps:

a) Calculate the mean of the training batch data X = {x_1, ..., x_m}:

\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad (8)

where m is the batch size.

b) Calculate the variance:

\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2. \qquad (9)

c) Normalize the batch data to a zero-mean, unit-variance distribution by using \mu_B and \sigma_B:

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad (10)

where \epsilon is a tiny positive constant that prevents the divisor from being 0, and \hat{x}_i denotes the normalized data.

d) Scale and shift:

y_i = \gamma \hat{x}_i + \beta, \qquad (11)

where \gamma and \beta are the scale factor and shift factor.

Note that, by introducing the above operations, the training data of each batch have the same distribution, which makes it easier and faster to train a stable model.
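The four steps above map directly onto a few lines of numpy; the sketch below is our own illustration of equations (8)-(11), not the paper's implementation:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Equations (8)-(11): per-feature normalization of a training batch
    mu = x.mean(axis=0)                    # (8) batch mean
    var = x.var(axis=0)                    # (9) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # (10) normalize (eps avoids /0)
    return gamma * x_hat + beta            # (11) learnable scale and shift

batch = 3.0 * np.random.randn(16, 8) + 5.0   # a batch of m = 16 samples
y = batch_norm(batch, gamma=1.0, beta=0.0)
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6), y.std(axis=0).round(3))
```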
4) Fully-connected (FC) layer

Fully-connected layers are usually adopted as the last part of the hidden layers in most CNNs-based models. The FC layer combines the features extracted by the convolution and pooling layers through a nonlinear function and preserves the useful information to finally output the predicted results of the model.
B. The workflow of supervised-unsupervised GGCNNs

The entire workflow of this work, shown in Fig. 3, mainly includes three steps: training sample generation, supervised sub-GGCNN learning, and unsupervised sub-GGCNN learning. Firstly, based on multiple well logs, geostatistical modeling, rock physics modeling, and seismic forward modeling, a large number of high-quality synthetic labeled seismic angle gathers are generated. Secondly, the supervised sub-GGCNN training is conducted to obtain the mapping between synthetic prestack seismic angle gathers and output elastic parameters, which nearly captures the primary geological characteristics of the field data at the study area because of the aforementioned geologically driven modeling. After that, through the transfer learning process, the trained supervised sub-GGCNN is used to initialize the unsupervised sub-GGCNN, which is updated with part of the prestack field seismic angle gathers. Finally, elastic parameters are predicted from field prestack seismic data by using the trained unsupervised sub-GGCNN. The details of the hyperparameters of the network are given in Table I.

Fig. 4 shows the structure and workflow of the supervised sub-GGCNN. Note that the network input d_syn(t) is a 2D tensor with the dimension of [N_t × N_θ], where N_t and N_θ respectively represent the time length of the seismic trace and the number of angles in the prestack angle gathers. Then, the supervised sub-GGCNN model outputs a 2D tensor with the dimension of [N_t × 3], recorded as m_predicted, corresponding to the three inversion parameters: P-wave velocity Vp(t), S-wave velocity Vs(t), and density ρ(t). Here, n stands for the batch size, and C_i is the number of output channels of the convolutional layers, which is the same as the number of convolution kernels.

To evaluate the supervised sub-GGCNN, we select the mean square error (MSE) between the predicted elastic parameters m_predicted and the synthetic elastic parameters (labeled data m_correct) as the loss function for the network training. The loss function is defined as

\mathrm{Loss} = \sum_{m = V_p, V_s, \rho} \sum_{i=1}^{N_t} \frac{(m_{\mathrm{predicted}} - m_{\mathrm{correct}})^2}{3 N_t}. \qquad (12)

Based on the labeled data sets, the supervised sub-GGCNN captures the primary mapping relation between synthetic seismic data sets and elastic parameters. However, the inversion for elastic parameters is a complex and nonlinear fitting problem. Moreover, the mapping relation from synthetic data is not entirely applicable to field data. In addition, there are no labeled data corresponding to the field data except for certain well logs. Therefore, to estimate elastic parameters from prestack field gathers, we combine the GGCNN model with seismic forward modeling to build an unsupervised sub-GGCNN. During field seismic inversion, the relation between the elastic parameters m and the seismic data d is established through the forward operator f(·), which is defined as:

d = f(m). \qquad (13)

The goal of the inversion of elastic parameters is to achieve optimal elastic parameters m* that minimize the discrepancy between the observed data d_obs and the forward-modeled data d_cal, which can be described as:

m^{*} = \arg\min_{m} J(m), \qquad (14)

where J(m) is the misfit function that measures the match between d_cal and d_obs. In this study, we apply the L2-norm-based misfit function as the loss function of the unsupervised learning, which is defined as:

\mathrm{Loss} = \sum_{\theta} \sum_{i=1}^{N_t} \frac{(d_{\mathrm{cal}} - d_{\mathrm{obs}})^2}{N_t N_\theta}. \qquad (15)

The details of the unsupervised sub-GGCNN are displayed in Fig. 5. Firstly, the supervised sub-GGCNN trained on synthetic data is used to initialize the subsequent unsupervised sub-GGCNN. Secondly, the transfer learning is evoked by partial field data to fine-tune the unsupervised sub-GGCNN.
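A minimal numpy sketch of the two objectives (our own illustration of equations (12) and (15), assuming the arrays are laid out as time-by-parameter and time-by-angle):

```python
import numpy as np

def supervised_loss(m_pred, m_true):
    # Equation (12): MSE over the three elastic logs, each of length Nt
    # m_pred, m_true: arrays of shape (Nt, 3) holding (Vp, Vs, rho)
    nt = m_true.shape[0]
    return np.sum((m_pred - m_true) ** 2) / (3.0 * nt)

def unsupervised_loss(d_cal, d_obs):
    # Equation (15): L2 misfit between forward-modeled and observed gathers
    # d_cal, d_obs: arrays of shape (Nt, Ntheta)
    nt, ntheta = d_obs.shape
    return np.sum((d_cal - d_obs) ** 2) / (nt * ntheta)

m = np.random.rand(600, 3)
print(supervised_loss(m, m))                                     # 0.0, perfect fit
print(unsupervised_loss(np.zeros((600, 5)), np.ones((600, 5))))  # 1.0
```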
C. Transfer learning

Transfer learning refers to reusing a pre-trained model as an initial input model in a subsequent machine learning process. Once the knowledge transfer step is successfully fulfilled, the similar but redundant network training cost of using a large number of labeled data is omitted, thereby greatly improving the learning efficiency [36].
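A minimal PyTorch sketch of this knowledge-transfer pattern is given below; the tiny backbone is a hypothetical stand-in rather than the Table I architecture, and only the weight-copy and fine-tuning steps are the point:

```python
import torch
import torch.nn as nn

def make_backbone():
    # Hypothetical stand-in network: 5 input angles -> 3 elastic parameters
    return nn.Sequential(nn.Conv1d(5, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv1d(32, 3, 1))

supervised_net = make_backbone()
# ... train supervised_net on synthetic labeled gathers with the MSE of (12) ...

unsupervised_net = make_backbone()
unsupervised_net.load_state_dict(supervised_net.state_dict())  # transfer step

optimizer = torch.optim.Adam(unsupervised_net.parameters(), lr=1e-4)
# ... fine-tune on field gathers with the data misfit of (15), pushing the
# network output through the forward operator f(m) of equation (13) ...
```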

Fig. 3. The main workflows in the study: (a) the workflow of generating the training samples; (b) the workflow of the supervised sub-GGCNN; (c) the
workflow of the unsupervised sub-GGCNN.

As shown in Fig. 6, through the multi-scale modeling strategy (e.g., geostatistical modeling, rock physics modeling, and seismic forward modeling), we built a set of synthetic labeled seismic data, which, to the maximum extent, can represent the geological characteristics of the field seismic angle gathers. Then, these synthetic labeled seismic data are applied to train the supervised sub-GGCNN. Naturally, the supervised sub-GGCNN model trained on the synthetic labeled data carries the primary characteristics of the research area. It is then used as an initial model to train the unsupervised sub-GGCNN using part of the field data profile with 804 lines, which makes the unsupervised sub-GGCNN more representative. Such a transferring step between the supervised sub-GGCNN and the unsupervised sub-GGCNN not only transfers the fundamental geological characteristics of the research area, but also dramatically decreases the training cost of the successive unsupervised sub-GGCNN while keeping high accuracy. Essentially, training the unsupervised sub-GGCNN is a "fine-tuning" of the supervised sub-GGCNN using part of the field data profile. Here, it needs to be pointed out that, in this work, we directly use the multi-scale modeling strategy to obtain labeled training data representing the dominant features of the field data, instead of using generally known models (e.g., the Marmousi model, the overthrust model, etc.), to improve the accuracy of predicting elastic parameters from field seismic data.

D. The generation of the labeled data set

Compared with the applications of CNNs in natural language processing (NLP) [37]–[39] and image classification [40]–[42], the deficiency of labeled samples of seismic data severely impedes the accuracy of geophysical parameter inversion. Additionally, the representativeness of the labeled (seismic data) samples is another vital factor influencing the elastic inversion accuracy. These two characteristics lead to a typical small-sample problem faced by ML methods in geophysical application scenarios. Hence, how to make a large number of high-quality training samples becomes significant when adopting network-based inversion techniques in the geophysical field.

To solve the sample augmentation issue, in this study, with the consideration of the basic physical model among reservoir parameters, elastic parameters, and seismic data, we combine geological modeling (GM), rock physics modeling (RPM), and seismic forwarding (SF) to generate labeled seismic data sets. In summary, the key steps of generating synthetic labeled data are as follows.

1) Geostatistical modeling & Simulation

Geological modeling is divided into deterministic modeling and stochastic modeling. Stochastic modeling is not only the mainstream modeling method, but is also especially suitable for geological scenes with few wells [43]. Stochastic modeling consists of two parts: a discrete attribute model (lithological model) and a continuous attribute model (reservoir properties model).

Fig. 4. The workflow of the supervised sub-GGCNN model using synthetic prestack data to conduct elastic estimation.

Fig. 5. The workflow of the unsupervised sub-GGCNN model by using field data sets to conduct elastic estimation.

TABLE I
ARCHITECTURE AND PARAMETERS OF SUPERVISED SUB-GGCNN AND UNSUPERVISED SUB-GGCNN

Network name: Supervised sub-GGCNN (Unsupervised sub-GGCNN)
Architecture: Conv1 + Conv2 + Conv1 + Conv2 + Conv2 + FC
Number of Conv filters and FC units: 64 + 32 + 64 + 32 + 1 + 600
Kernel size of Conv layers and pooling layers: Conv: (32,3) + (32,1) + (32,1) + (32,1) + (32,1); Pooling: (2,2) + (2,2)
Other parameters: Conv: Padding = 'same', Strides: (1,2) + (1,1) + (1,1) + (1,1) + (1,1); Pooling: Padding = 'same', Strides: (1,1) + (1,1)

1 FC = Fully connected layer with ReLU activation
2 Conv1 = convolution + batch normalization + ReLU activation + Max-Pooling
3 Conv2 = convolution + batch normalization + ReLU activation

Fig. 6. The workflow of supervised sub-GGCNN transferring to unsupervised sub-GGCNN.

Each model is described by a numerical probability function and a spatial variation function. The whole process of modeling is to use geostatistical models to generate spatial distributions of discrete and continuous attributes within a given geological grid. In this paper, the strata grid is 1000 (north-south) × 1000 (west-east) × 200 (vertical) and the steps are 25 m (inline) × 25 m (crossline) × 0.25 ms (time) in the study area. For the lithological model and the reservoir properties model, the numerical probability functions are usually the lithological proportion and the probability density function (PDF) of the reservoir properties, respectively [44], which can be estimated from well logging.

Note that the variation function describes the degree of variation of regional variables with distance. We denote the spatial position as u, and the random variable (regionalized variable) at u is presented as Z(u). Then, the variation function of Z can be expressed as:

\gamma(h) = \frac{1}{2} E\big[(Z(u+h) - Z(u))^2\big] - \frac{1}{2} \big(E[Z(u+h) - Z(u)]\big)^2, \qquad (16)

where h is the lag distance, namely the spatial separation, and E denotes the mean value of the statistical samples. If E[Z(u+h) − Z(u)] = 0, the variation function can be rewritten as:

\gamma(h) = \frac{1}{2} E\big[(Z(u+h) - Z(u))^2\big]. \qquad (17)

Note that the variation function can take different mathematical forms based on the degree of spatial variation, such as exponential, Gaussian, and mixed types. The extent of the spatial correlation of the variables is described by the spatial range parameter. Within the range, the data are correlated. However, outside the range of variation, the data are not correlated with each other, and the observed values outside the range do not affect the estimation results [43]. According to equation (17), the semivariance γ should be 0 when h is 0. Usually, γ is not equal to 0, and this phenomenon is called the nugget effect. The intercept where the curve intersects the γ(h) axis is called the nugget value, which describes the random characteristics of the regionalized variable and represents the spatial heterogeneity caused by random factors. The larger the nugget, the stronger the spatial heterogeneity of the regional variables (lithology and reservoir parameters).

For CNNs-based seismic inversions, the vertical variation in the geostatistical modeling has been applied to generate labeled data [31].
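As an illustration of how the experimental variogram of equation (17) can be estimated on a regular grid (our own sketch with a toy regionalized variable, not the paper's code):

```python
import numpy as np

def semivariogram(z, max_lag):
    # Equation (17): gamma(h) = 0.5 * E[(Z(u+h) - Z(u))^2], estimated along
    # one axis of a regular grid for lags h = 1 .. max_lag (in grid steps).
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

# Toy regionalized variable: smooth correlated signal plus white noise; the
# noise floor near h = 0 plays the role of the nugget, and the lag where the
# curve flattens plays the role of the spatial range.
rng = np.random.default_rng(0)
z = np.convolve(rng.standard_normal(4000), np.ones(50) / 50.0, "same")
z += 0.05 * rng.standard_normal(4000)
print(semivariogram(z, 8).round(4))
```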

Actually, from the geological point of view, the horizontal variation is more important for characterizing the reservoir sweet spot in elastic inversion. Therefore, in the study area (nearly 200 km², more than 50 wells), the horizontal variation can be further estimated by using the well logging. After establishing the geostatistical model, the sequential indicator simulation and the sequential Gaussian simulation are used to generate the lithology and reservoir properties volumes, respectively [45].

2) Rock physics modeling

Rock physics modeling is a critical step to establish a mapping relationship between reservoir parameters and elastic parameters [46]. In this study area, the sedimentary background of a braided river, the lithology combination of sandstone and mudstone, and the diagenesis stage make the Xu-White model [47] suitable for establishing the relationship between the elastic parameters of the target layer and the reservoir parameters. The Xu-White model includes three key steps: matrix model, dry rock model, and fluid substitution. The rock matrix model, such as the Voigt-Reuss-Hill averaging method, can be used to equalize the mixture of different minerals into a homogeneous, single elastic equivalent according to the mineral volume ratio. On the basis of the effective model of the rock matrix, it is necessary to consider the influence of the pore structure on the elastic characteristics of the rock, which is mainly reflected in the modeling of the elastic modulus of dry rock. In the Xu-White model, the differential equivalent medium model or the self-consistent model is used [48], [49] to add dry pores as inclusions into the equivalent model of the rock matrix to obtain a dry rock model. Taking the differential equivalent medium as an example, we have

(1-\phi)\,\frac{d}{d\phi}\big[K_d(\phi)\big] = (K_\phi - K_d)\,P^{(*\phi)}(\phi), \qquad (18)
(1-\phi)\,\frac{d}{d\phi}\big[\mu_d(\phi)\big] = (\mu_\phi - \mu_d)\,Q^{(*\phi)}(\phi), \qquad (19)
\rho_d = \phi\rho_\phi + (1-\phi)\rho_m, \qquad (20)
P^{(*\phi)} = \frac{1}{3}\sum_{l=S,C} \upsilon_l\, T_{iijj}(\alpha_l), \qquad (21)
Q^{(*\phi)} = \frac{1}{5}\sum_{l=S,C} \upsilon_l \Big(T_{ijij}(\alpha_l) - \frac{T_{iijj}(\alpha_l)}{3}\Big), \qquad (22)

where K_d, μ_d, and ρ_d are the bulk modulus, shear modulus, and density of the dry rock; K_φ, μ_φ, and ρ_φ are the bulk modulus, shear modulus, and density of the inclusions; and K_m, μ_m, and ρ_m are the bulk modulus, shear modulus, and density of the matrix. The initial values of K_d and μ_d are K_m and μ_m. The values of K_φ, μ_φ, and ρ_φ are equal to 0 in the Xu-White model. P^{(*φ)} and Q^{(*φ)} are geometric factors related to the pore structure, υ_S and υ_C are the volume fractions of pores in quartz and clay, and T_{iijj}(α_l) and T_{ijij}(α_l) are functions of the pore aspect ratio. The substitution of the pore fluid can be realized by using the Gassmann equation [50]:

K_{\mathrm{sat}} = K_d + \frac{\big(1 - K_d/K_m\big)^2}{\phi/K_f + (1-\phi)/K_m - K_d/K_m^2}, \qquad (23)
\mu_{\mathrm{sat}} = \mu_d, \qquad (24)
\rho_{\mathrm{sat}} = \phi\rho_f + (1-\phi)\rho_m, \qquad (25)

where K_sat, μ_sat, and ρ_sat are the bulk modulus, shear modulus, and density of the saturated rock, and K_f and ρ_f are the bulk modulus and density of the pore fluid. Finally, by using the Xu-White model [47], the reservoir property model from the geostatistical simulation can be transferred to the elastic parameter model (Vp, Vs, and ρ).
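A minimal sketch of the Gassmann substitution of equations (23)-(25) is given below; the input moduli, densities, and porosity are illustrative values of our own, not measurements from the study area:

```python
import numpy as np

def gassmann(k_dry, mu_dry, k_min, k_fl, rho_min, rho_fl, phi):
    # Equations (23)-(25); moduli in GPa, densities in g/cc
    num = (1.0 - k_dry / k_min) ** 2
    den = phi / k_fl + (1.0 - phi) / k_min - k_dry / k_min ** 2
    k_sat = k_dry + num / den                       # (23)
    mu_sat = mu_dry                                 # (24) shear is unchanged
    rho_sat = phi * rho_fl + (1 - phi) * rho_min    # (25)
    return k_sat, mu_sat, rho_sat

k_sat, mu_sat, rho_sat = gassmann(k_dry=15.0, mu_dry=12.0, k_min=37.0,
                                  k_fl=2.25, rho_min=2.65, rho_fl=1.0,
                                  phi=0.08)
vp = np.sqrt((k_sat + 4 * mu_sat / 3) * 1e9 / (rho_sat * 1000))
print(round(vp))  # P-velocity of the saturated rock in m/s
```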
Fig. 7. (a) The angle-dependent wavelets at angles 7° (black), 17° (yellow), 26° (red), 35° (brown), and 42° (blue) extracted from a section of 3D seismic data, (b) the seismic angle gather at trace 100.

3) Seismic forward modeling

The Aki-Richards equation [44] is used to calculate the angle-dependent reflection coefficients (R_pp) from the elastic parameter model, which is defined as:

R_{pp}(\theta) = \Big(\frac{1}{2} + \frac{1}{2}\tan^2\theta\Big)\frac{\Delta V_p}{\overline{V}_p} - 4\sin^2\theta\,\frac{\overline{V}_s^2}{\overline{V}_p^2}\,\frac{\Delta V_s}{\overline{V}_s} + \Big(\frac{1}{2} - 2\sin^2\theta\,\frac{\overline{V}_s^2}{\overline{V}_p^2}\Big)\frac{\Delta\rho}{\overline{\rho}}, \qquad (26)

where \overline{V}_p, \overline{V}_s, and \overline{\rho} represent the averages of Vp, Vs, and ρ across the reflective interface (the layers above and below); ΔVp, ΔVs, and Δρ correspond to the differences of Vp, Vs, and ρ between the upper and lower interfaces; and θ is the incident angle.

To ensure that the synthetic labeled data sets are more comparable with the field data, the wavelets W(θ) varying with the incident angle are directly extracted from the field data crossing the well boreholes, as displayed in Fig. 7(a). Finally, part-stack seismic data at incident angles of 7°, 17°, 26°, 35°, and 42° are generated by

d_{\mathrm{cal}} = R_{pp}(\theta) * W(\theta), \qquad (27)

where d_cal represents the angle gather and * represents the convolution operator. Fig. 7(b) shows one of the generated angle gathers.
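The forward-modeling chain of equations (26)-(27) can be sketched in a few lines of numpy; the example below is our own illustration with a toy two-layer model, a Ricker-like wavelet (the paper instead extracts angle-dependent wavelets from the field data), and a Gardner-type density, all assumptions for the demo:

```python
import numpy as np

def aki_richards(vp, vs, rho, theta_deg):
    # Equation (26): angle-dependent PP reflectivity from interface averages
    # vp, vs, rho: 1D arrays sampled in time; returns Rpp at each interface.
    th = np.deg2rad(theta_deg)
    avg = lambda x: 0.5 * (x[1:] + x[:-1])
    dif = lambda x: x[1:] - x[:-1]
    vp_a, vs_a, rho_a = avg(vp), avg(vs), avg(rho)
    k = (vs_a / vp_a) ** 2
    return (0.5 * (1 + np.tan(th) ** 2) * dif(vp) / vp_a
            - 4 * np.sin(th) ** 2 * k * dif(vs) / vs_a
            + (0.5 - 2 * np.sin(th) ** 2 * k) * dif(rho) / rho_a)

def synth_gather(vp, vs, rho, angles, wavelet):
    # Equation (27): convolve each angle's reflectivity with the wavelet
    return np.stack([np.convolve(aki_richards(vp, vs, rho, a),
                                 wavelet, mode="same") for a in angles],
                    axis=1)

t = np.arange(-0.032, 0.032, 0.001)                      # 30 Hz Ricker-like
wav = (1 - 2 * (np.pi * 30 * t) ** 2) * np.exp(-(np.pi * 30 * t) ** 2)
vp = np.r_[np.full(100, 3500.0), np.full(100, 4000.0)]   # two-layer model
vs, rho = vp / 1.8, 0.31 * vp ** 0.25 * 1000             # assumed relations
gather = synth_gather(vp, vs, rho, [7, 17, 26, 35, 42], wav)
print(gather.shape)  # (199, 5): Nt-1 samples x 5 angles
```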

E. The selection of training samples

Based on the above-mentioned workflow, a large labeled seismic database for the study area can be obtained. As shown in Fig. 8(a), the synthetic elastic parameters Vp, Vs, and ρ and the corresponding synthetic forward-modeled 3D seismic data, i.e., the labeled data sets, are all 3D volumes with the same size (i.e., inline (1 ∼ M), crossline (1 ∼ N), time (1700 ms ∼ 1750 ms)). Here, M = 616, N = 397, and the time interval is 0.25 ms. Besides, there are also two field profiles, denoted by the green and red lines in Fig. 8(a). The time duration of the field profiles (with a 1 ms time interval) is from 1550 ms to 1750 ms.

Theoretically, we could use all labeled data to train the sub-GGCNN to obtain a mapping between input and output data. However, a large data size results in expensive cost, and we generally need to make a trade-off between the inversion accuracy and time cost by sampling the labeled data sets. In addition to the size of the labeled data, the diversity of the labeled data also has a significant effect on the inversion accuracy.

For the supervised sub-GGCNN procedure, to build a mapping relation, the synthetic labeled seismic data sets have been separated into a training part (crossline from 1 to 198) and a testing part (crossline from 199 to 397). Meanwhile, to qualitatively investigate the effects of the size of the training samples on the generalization of GGCNNs, we sample the training data sets in both inline and crossline directions with an interval number K (1, K+1, 2K+1, ..., M; 1, K+1, 2K+1, ..., N). To ensure the temporal resolution of the training data, we do not sample in the time direction. Moreover, we select a comparable data size but a different spatial distribution (K=5 versus the adjacent cluster lines, crossline from 108 to 114, shown in the blue area of Fig. 8(b)) in the study area to investigate the effect of the diversity of the training data on the inversion accuracy. Finally, we use five indicators (i.e., vertical profile correlation, horizon slice correlation, nugget, spatial range, and training time) to evaluate the effect of the size and diversity of the labeled data on the predicted parameters. The results are shown and analyzed in Section III.
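A sketch of this decimation (our own illustration): keeping every K-th inline and crossline of the training cube (616 inlines × 198 crosslines) and leaving the time axis untouched reproduces the training-set sizes quoted in Section III:

```python
import numpy as np

def sample_lines(volume, k):
    # Keep traces at inlines/crosslines 1, K+1, 2K+1, ...; the time axis
    # (last dimension) is never subsampled.
    return volume[::k, ::k, :]

# Training part of the labeled cube: 616 inlines x 198 crosslines x Nt samples
cube = np.zeros((616, 198, 200), dtype=np.float32)
for k in (2, 5, 10, 50):
    sub = sample_lines(cube, k)
    print(k, sub.shape[0] * sub.shape[1])   # -> 30492, 4960, 1240, 52 lines
```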
III. SYNTHETIC AND FIELD APPLICATIONS

A. Synthetic data application in supervised sub-GGCNN

To test the effects of both the size and diversity of labeled data on the inversion accuracy and efficiency of the supervised sub-GGCNN, we take K = 2, 5, 10, and 50 to obtain four sampled training sets with line numbers of 30492, 4960, 1240, and 52. In addition, as a contrast, in the supervised learning process, we used the same training data sets to train the CNN proposed by Biswas et al. [30]. The training curves of the supervised sub-GGCNN (full lines) and Biswas's CNN (dotted lines) based on the different training data sets with sampling intervals of 2, 5, 10, and 50 are shown in Fig. 9. Firstly, according to the local magnification of the training loss, it can be found that, for the same training data set, the initial value of the training loss of our proposed supervised sub-GGCNN is smaller, the convergence speed is faster, and the loss value after the final convergence is also smaller. Secondly, the training loss curves also demonstrate that the training loss decreases with increasing training samples for both the supervised sub-GGCNN and Biswas's CNN, whereas the training loss of the supervised sub-GGCNN is systematically smaller than that of Biswas's CNN for a given epoch. To some extent, the proposed supervised sub-GGCNN may be more suitable for the prediction of elastic parameters of the study area compared with Biswas's CNN. Meanwhile, to compare the influence of sample diversity on the network prediction accuracy, a total of 4312 pairs of training samples were also selected to train the supervised sub-GGCNN and Biswas's CNN, located in the blue area in Fig. 8 (crossline from 108 to 114, inline from 1 to 616). The green solid line and dotted line in Fig. 9 are the training loss curves of the supervised sub-GGCNN and Biswas's CNN based on this training set. The results show that a comparable training data size (K=5) but a more representative distribution seems to give better training performance.

Fig. 10 displays the actual and predicted Vp profiles at crossline = 310 obtained with the trained supervised sub-GGCNN for varying sizes of the training data sets. Firstly, compared with the actual Vp vertical profile in Fig. 10(a), the overall predicted results clearly show that the inversion accuracy of the Vp vertical profile is virtually the highest when the amount of training samples is the largest (K = 2). In the vertical direction, the supervised sub-GGCNN can even identify thinner velocity layers with a higher resolution when the size of the training data is the largest, marked by black arrows. More importantly, the rock lithology changes dramatically in the lateral direction, leading to high inversion uncertainties. However, the supervised sub-GGCNN can still steadily capture the velocity trend, denoted by the black circles. In addition, we calculate the vertical profile correlation (VPC) and horizon slice correlation (HSC) in TABLE II, and compute the nugget, spatial range, and training time in TABLE III. Specifically, for the Vp profile, the VPC reaches a maximum value (0.852) when K = 2 in TABLE II, whereas the training time is expensive (65918 s in 100 epochs) in TABLE III. With a decreasing size of training data (such as K = 50), the VPC dramatically decreases to 0.555. However, the training time also drops to 198 s (200 epochs). In addition, as shown in Fig. 10(c) and Fig. 10(f), we select a comparable data size but from different locations to train the supervised sub-GGCNN and to predict the Vp profile (crossline = 310). The results show that the diverse labeled data-based (K = 5) supervised sub-GGCNN model can predict the Vp profile more precisely (VPC = 0.717) than that (VPC = 0.589) of the cluster labeled data-based (adjacent) training network.

Moreover, for the elastic inversion, the distribution of elastic parameters in space is more significant for identifying the reservoir sweet spot. Therefore, as shown in Fig. 11, we select a horizon slice at approximately 1.75 s and calculate the predicted Vp in its horizontal distribution. Fig. 12 displays the predicted Vp horizon slice results obtained with the supervised sub-GGCNN model for varying training data sizes. Compared with the actual Vp horizon slice, the predicted Vp plane exhibits a higher inversion accuracy with a large size of training data, denoted by red and black circles. Specifically, as displayed in TABLE II, the horizon slice correlation (HSC) is 0.708 when K = 2. However, the HSC dramatically decreases with an increasing sample interval.

In addition, the data diversity is another factor influencing the inversion accuracy of elastic parameters. With a comparable training data size, Fig. 12(c) and Fig. 12(f) show that the predicted Vp with diverse training data (HSC = 0.643) is more precise than that with cluster training data (HSC = 0.581).

Fig. 8. The seismic data sets (a) in 3D configuration and (b) in 2D configuration. The blue line denotes the adjacent synthetic data sets. Green and red lines denote the field data used in the transfer learning and the field test, respectively.

TABLE II
THE VERTICAL PROFILE CORRELATION (VPC) AND HORIZON SLICE CORRELATION (HSC) OF THE PREDICTED ELASTIC PARAMETERS BY SUPERVISED SUB-GGCNN AND BISWAS'S CNN BASED ON DIFFERENT TRAINING DATA SETS

Training data size (line numbers) | Network              | VPC Vp | VPC Vs | VPC ρ | HSC Vp | HSC Vs | HSC ρ
K = 2 (30492)                     | Supervised sub-GGCNN | 0.852  | 0.874  | 0.726 | 0.708  | 0.761  | 0.523
K = 2 (30492)                     | Biswas's CNN         | 0.743  | 0.659  | 0.597 | 0.564  | 0.493  | 0.371
K = 5 (4960)                      | Supervised sub-GGCNN | 0.717  | 0.739  | 0.630 | 0.643  | 0.680  | 0.440
K = 5 (4960)                      | Biswas's CNN         | 0.692  | 0.655  | 0.567 | 0.538  | 0.487  | 0.340
K = 10 (1240)                     | Supervised sub-GGCNN | 0.661  | 0.620  | 0.615 | 0.599  | 0.577  | 0.413
K = 10 (1240)                     | Biswas's CNN         | 0.622  | 0.579  | 0.554 | 0.436  | 0.453  | 0.321
K = 50 (52)                       | Supervised sub-GGCNN | 0.555  | 0.544  | 0.495 | 0.459  | 0.455  | 0.338
K = 50 (52)                       | Biswas's CNN         | 0.526  | 0.479  | 0.393 | 0.337  | 0.275  | 0.246
Adjacent lines (4312)             | Supervised sub-GGCNN | 0.589  | 0.603  | 0.541 | 0.581  | 0.600  | 0.393
Adjacent lines (4312)             | Biswas's CNN         | 0.573  | 0.547  | 0.413 | 0.428  | 0.359  | 0.290

TABLE III
THE NUGGET, SPATIAL RANGE, AND TRAINING TIME OF THE PREDICTED ELASTIC PARAMETERS BASED ON DIFFERENT TRAINING DATA SETS

Training data size (line numbers) | Nugget | Spatial range | Training time
K = 2 (30492)                     | 3400   | 1300          | 65918 s (100 epochs)
K = 5 (4960)                      | 5800   | 1100          | 10148 s (200 epochs)
K = 10 (1240)                     | 5800   | 1100          | 4657 s (200 epochs)
K = 50 (52)                       | 6500   | 700           | 198 s (200 epochs)
Adjacent lines (4312)             | 5800   | 1000          | 7765 s (200 epochs)
Actual data                       | 2000   | 1500          | –

Similarly, Figs. 13-14 show the corresponding predicted Vs vertical profile and horizon slice results, respectively. TABLE II shows that both the VPC and HSC are the highest when K = 2. Meanwhile, with a comparable data size, the inversion accuracy of the predicted results from diverse training data is higher (VPC = 0.739 and HSC = 0.680) than that from the cluster data (VPC = 0.603 and HSC = 0.600). Meanwhile, Figs. 15 and 16 display the predicted ρ vertical profile and horizon slices, respectively. Overall, the tendency of the predicted results is similar to that of both Vp and Vs. However, both the VPC and HSC are systematically lower than those of Vp and Vs in TABLE II.

Figs. 17-19 show the predicted results using the trained Biswas's CNN. Compared with Fig. 10, Fig. 13, and Fig. 15, which show the corresponding results using the proposed sub-GGCNN, it can be found that the proposed supervised sub-GGCNN has two distinct advantages: (1) the predicted results using the supervised sub-GGCNN can better describe the spatial variation of the elastic parameters, with a higher vertical resolution and a better description of the lateral variation; (2) quantitatively, as shown in TABLE II, the supervised sub-GGCNN has stronger prediction ability than Biswas's CNN in both the vertical profile and horizon slice aspects. The results predicted by the supervised sub-GGCNN are all in better consistency with the targeted elastic parameters.

Lastly, Fig. 20 shows the correlation coefficients between the predicted and actual parameters against the number of training samples and the training time. Clearly, for all parameters (Vp, Vs, and ρ), the correlation coefficients steadily increase with the increasing number of training samples. Simultaneously, the training time also dramatically increases. Therefore, this suggests that a trade-off between the correlation coefficients and the training time is necessary, especially for massive field data in real applications.

Fig. 9. The comparison of training loss with varying epochs between the proposed supervised sub-GGCNN model and Biswas's model with different training datasets.

B. Field data application in unsupervised sub-GGCNN

In this section, we applied the proposed GGCNNs model to prestack field seismic data from the Ordos basin. It is located in the western part of China, occupying an area of approximately 250 × 10³ km². Due to the uplifting and erosion in the Caledonian movement period, the reservoir layer exhibits strong horizontal inhomogeneity and widespread discontinuity. Geologically, the Ordos basin is an unconventional tight gas-bearing sandstone reservoir. The porosity and permeability of the sandstone are usually less than 10% and 0.1 mD, respectively. The field data include training and test parts, which correspond to the green area (804 lines) and the red area (736 lines) in Fig. 8. The time sampling interval is 1 ms. The range of the offsets is from 200 m to 3700 m and the fold number is 36. Based on the data coverage, we have stacked the common depth point (CDP) gathers into five part-stacked seismic data sets. Here, we only display the field test data, as shown in Figs. 21(a)-21(e), which correspond to middle incident angles of 7°, 17°, 26°, 35°, and 42°, respectively. It should be noted that we enlarged the prestack profile to pay attention to the reservoir development area. Meanwhile, there exist two logging wells (i.e., Well A and Well B) crossing the test seismic profile, and the well boreholes are denoted in Fig. 8(b).

For the GGCNNs-based seismic inversion, we first use the trained supervised sub-GGCNN to initialize the unsupervised sub-GGCNN, and then update the unsupervised sub-GGCNN with the field data corresponding to the green line denoted in Fig. 8. Here, to test the effects of both the size and diversity of the training data on the elastic inversion from field data, three labeled data types (K = 2, K = 5, and adjacent lines) are selected to train the supervised sub-GGCNN and to conduct the successive unsupervised sub-GGCNN based inversion. Furthermore, in the unsupervised process, as a comparison, we used the following three methods to estimate the elastic parameters using the same field seismic angle gather data: (1) similarly combined with seismic forward modeling, the trained Biswas's CNN is used as the initial model, and then the same field data are used to fine-tune the network to obtain a trained unsupervised Biswas's CNN; (2) the Marmousi model data are first selected for training the supervised sub-GGCNN, and then a trained Marmousi-based unsupervised sub-GGCNN is acquired through the same operation as the aforementioned unsupervised sub-GGCNN; (3) without transfer learning, the unsupervised sub-GGCNN is initialized randomly and then trained with the same field seismic data.

Fig. 22 shows the training curves of the aforementioned unsupervised networks. Among them, the blue, red, and dark green full lines correspond to the training curves of the unsupervised sub-GGCNN (K=2, K=5, and adjacent training lines), whereas the dotted lines are the training curves of the unsupervised Biswas's CNN (K=2, K=5, and adjacent training lines). The black line shows the training process of the Marmousi-based unsupervised sub-GGCNN. As for the magenta full line, it is the training curve of the unsupervised sub-GGCNN without transfer learning. Through the enlarged figure, it can be clearly found that, based on the same training data sets, the initial loss of our proposed unsupervised sub-GGCNN is smaller than that of the unsupervised Biswas's CNN, and the convergence speed is also faster.

Fig. 10. (a) The actual Vp vertical profile, the inverted Vp vertical profile predicted by supervised sub-GGCNN based on training data sizes with (b) K = 2,
(c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 11. Horizon slice position in the study area. Here, the color bar denotes time.

To a certain extent, the training loss curves objectively reflect that the generalization ability of the previously trained supervised sub-GGCNN is stronger than that of Biswas's CNN, and that the captured mapping relationship is closer to the actual situation. Moreover, the unsupervised sub-GGCNN (K=2) basically converges after 3 training epochs (the unsupervised Biswas's CNN (K=2) converges after about 10 epochs), while the unsupervised sub-GGCNN (K=5) needs to be trained for at least 6 epochs before it gradually starts to converge (the unsupervised Biswas's CNN (K=5) converges after about 25 epochs). This demonstrates that the size of the training set in the supervised learning step has an impact on the learning effect of the subsequent unsupervised networks. If the computing power of the device is strong, a data set with a larger sample size should be applied as much as possible to train the network.

In addition, observing the black line, the initial loss of the unsupervised Marmousi-based sub-GGCNN is very large, and the convergence speed is slow (convergence after about 50 epochs). As for the unsupervised sub-GGCNN without transfer learning, the result is even worse. The above results show that it is necessary to initialize the unsupervised networks with the trained supervised networks, and that the selection of training data for the supervised networks is a very important key factor.

Figs. 23-24 display the predicted Vp and Vs vertical profiles from field prestack seismic data, respectively. Firstly, the influence of the network structure on the predicted results is compared. As can be seen from the comparison between Figs. 23(a)-24(a) and Figs. 23(d)-24(d), and between Figs. 23(b)-24(b) and Figs. 23(e)-24(e), the predicted Vp and Vs of the proposed unsupervised sub-GGCNN can better describe the position and thickness of the elastic interfaces at Well A and Well B than those of the unsupervised Biswas's CNN, and moreover, the overall lateral resolution is higher. Secondly, as circled in Figs. 23-24, with a larger size of labeled training data, thinner reservoir layers can be steadily inverted and the results show a commendable continuity in the horizontal direction. Apparently, the inverted Vp and Vs profiles with a large size of labeled training data (K = 2) in Fig. 23(a) and Fig. 24(a) show higher inversion accuracy and match fairly well with the well logs, compared to those with a small number of training data (K=5) in Fig. 23(b) and Fig. 24(b), denoted by white arrows.

In addition, for the proposed unsupervised sub-GGCNN, we made a comparison between Figs. 23(b)-23(c) to validate the effect of sample diversity in the case of a comparable sample size.

Fig. 12. (a) The actual Vp horizon slice, the inverted Vp horizon slices predicted by supervised sub-GGCNN based on training data sizes with (b) K = 2, (c)
K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 13. (a) The actual Vs vertical profile, the inverted Vs vertical profiles predicted by supervised sub-GGCNN based on training data sizes with (b) K =
2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 14. (a) The actual Vs horizon slice, and the inverted Vs horizon slice predicted by supervised sub-GGCNN based on training data sizes with (b) K =
2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 15. (a) The actual ρ vertical profile, the inverted ρ vertical profile predicted by supervised sub-GGCNN based on training data sizes with (b) K = 2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 16. (a) The actual ρ horizon slice, the inverted ρ horizon slice predicted by supervised sub-GGCNN based on training data sizes with (b) K = 2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 17. (a) The actual Vp vertical profile, the inverted Vp vertical profile predicted by Biswas’s CNN based on training data sizes with (b) K = 2, (c) K =
5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 18. (a) The actual Vs vertical profile, the inverted Vs vertical profile predicted by Biswas's CNN based on training data sizes with (b) K = 2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

Fig. 19. (a) The actual ρ vertical profile, the inverted ρ vertical profile predicted by Biswas's CNN based on training data sizes with (b) K = 2, (c) K = 5, (d) K = 10, (e) K = 50, and (f) adjacent data sets.

smooth and horizontal continuity of elastic inversion, which


are actually more significant to identify the reservoir sweet
spot. In addition, GGCNNs model has also been successfully
applied to a prestack field data set to further demonstrate its
effectiveness and its advantages in predicting elastic parame-
ters from unlabeled data sets. As a consequence, the proposed
GGCNNs model is promising in the elastic inversion by only
using unlabeled prestack data sets.
For a given labeled training data set, wisely selecting a
parameter K is still an open question. In addition, we do not
really quantify the training time for deep learning in this paper.
Therefore, in the future, the related works will be necessary.

Fig. 20. The correlation coefficients between the predicted and actual R EFERENCES
parameters against the number of training samples.
IV. CONCLUSIONS AND DISCUSSIONS

We developed a GGCNNs model to improve the inversion accuracy and resolution of elastic parameters estimated from field prestack seismic data of an unconventional tight sandstone reservoir. Specifically, we built the GGCNNs model under geoscience modeling constraints (geological, well logging, and rock physics). By generating geologically and geophysically consistent labeled data sets, GGCNNs was able to capture the essential features of the field data in the study area. Meanwhile, both the size and the diversity of the labeled data sets were tested to analyze their effects on the inversion accuracy and efficiency of the elastic parameters.

Synthetic data tests demonstrated that the accuracy and efficiency of the predicted elastic parameters mainly depend on the size and diversity of the training data. Therefore, a trade-off should be made between the data size and the time cost when adopting GGCNNs for elastic inversion. The tests also proved the potential advantages of GGCNNs in the smoothness and horizontal continuity of the elastic inversion, which are actually more significant for identifying reservoir sweet spots. In addition, the GGCNNs model has also been successfully applied to a prestack field data set, which further demonstrates its effectiveness and its advantages in predicting elastic parameters from unlabeled data sets. As a consequence, the proposed GGCNNs model is promising for elastic inversion using only unlabeled prestack data sets.

For a given labeled training data set, wisely selecting the parameter K is still an open question. In addition, we have not quantified the training time for deep learning in this paper. These topics are left for future work.
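One simple way to approach the open K-selection question above is a validation sweep: train with each candidate K and keep the smallest K whose blind-well score is close to the best, which directly encodes the accuracy-versus-cost trade-off. This is only a hypothetical sketch; the paper prescribes no such rule, and all function names are placeholders.

```python
def select_k(candidates, train_fn, validate_fn, tol=0.95):
    """Hypothetical K-selection sweep. `train_fn(k)` trains a model on the
    K-line training set; `validate_fn(model)` returns an accuracy score,
    e.g., the correlation coefficient at blind wells."""
    scores = {k: validate_fn(train_fn(k)) for k in candidates}
    best = max(scores.values())
    # Smallest K whose score is within tol of the best: trades inversion
    # accuracy against labeled-data size and training time.
    return min(k for k, s in scores.items() if s >= tol * best)

# Example with the training-data sizes tested in this paper:
# k_star = select_k([2, 5, 10, 50], train_fn, validate_fn)
```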
REFERENCES

[1] Y. Wang, Q. Ge, W. Lu, and X. Yan, "Well-logging constrained seismic inversion based on closed-loop convolutional neural network," IEEE Transactions on Geoscience and Remote Sensing, vol. PP, no. 99, pp. 1–11, 2020.
[2] H. Li, D. Wang, J. Gao, M. Zhang, Y. Wang, L. Zhao, and Z. Yang, "Role of saturation on elastic dispersion and attenuation of tight rocks: An experimental study," Journal of Geophysical Research: Solid Earth, vol. 125, 2020.
[3] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics, 2005.
[4] M. K. Sen, Seismic Inversion. Society of Petroleum Engineers, 2006.
[5] G. E. Backus and J. F. Gilbert, "Numerical applications of a formalism for geophysical inverse problems," Geophysical Journal International, vol. 13, pp. 247–276, 1967.
[6] F. Li, R. Xie, W. Z. Song, and H. Chen, "Optimal seismic reflectivity inversion: Data-driven p-loss-q-regularization sparse regression," IEEE Geoscience and Remote Sensing Letters, pp. 806–810, 2019.
[7] A. Tarantola, "Inverse problem theory: Methods for data fitting and model parameter estimation," Physics of the Earth and Planetary Interiors, vol. 57, no. 3, pp. 350–351, 1987.
[8] A. J. W. Duijndam, "Bayesian estimation in seismic inversion. Part II: Uncertainty analysis," Geophysical Prospecting, vol. 36, no. 8, pp. 899–918, 1988.
[9] L. Zhao, J. Geng, J. Cheng, D. H. Han, and T. Guo, "Probabilistic lithofacies prediction from prestack seismic data in a heterogeneous carbonate reservoir," Geophysics, vol. 79, no. 5, pp. M25–M34, 2014.
[10] G. J. Cheng, L. Cai, and H. X. Pan, "Comparison of extreme learning machine with support vector regression for reservoir permeability prediction," in Proceedings of the 2009 International Conference on Computational Intelligence and Security (CIS 2009), Beijing, China, 2009.
[11] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, vol. 9, pp. 249–256, 2010.
[12] D. Grana and E. Della Rossa, "Probabilistic petrophysical-properties estimation integrating statistical rock physics with seismic inversion," Geophysics, vol. 75, no. 3, pp. O21–O37, 2010.
[13] T. K. Wang, B. Y. Zhao, and X. F. Dai, "Probabilistic neural network inversion of porosity using pre-stack multi-attributes," Computing Techniques for Geophysical and Geochemical Exploration, vol. 35, no. 2, pp. 162–318, 2013.
[14] M. Alfarraj and G. AlRegib, "Petrophysical property estimation from seismic data using recurrent neural networks," arXiv preprint, 2019.
[15] M. Alfarraj, N. Keni, and G. AlRegib, "Property prediction from seismic attributes using a boosted ensemble machine learning scheme: SBGf," in SEG Machine Learning Workshop, 2018.
[16] X. Wu, Y. Shi, S. Fomel, L. Liang, Q. Zhang, and A. Z. Yusifov, "FaultNet3D: Predicting fault probabilities, strikes, and dips with a single convolutional neural network," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 11, pp. 9138–9155, 2019.
[17] N. Liu, T. He, B. Wu, J. Gao, and Z. Xu, "Common-azimuth seismic data fault analysis using residual UNet," Interpretation, vol. 8, no. 3, pp. SM25–SM37, 2020.
[18] Y. Imamverdiyev and L. Sukhostat, "Lithological facies classification using deep convolutional neural network," Journal of Petroleum Science and Engineering, vol. 174, 2018.
Fig. 21. Part-stacked seismic test data with different incident angles: (a) 7°, (b) 17°, (c) 26°, (d) 35°, and (e) 42°.
Fig. 22. Comparison of the training loss with varying epochs among the proposed unsupervised sub-GGCNN, the Marmousi-based unsupervised sub-GGCNN, and Biswas's CNN with different training data sets.

[19] F. Li, H. Zhou, Z. Wang, and X. Wu, "ADDCNN: An attention-based deep dilated convolutional neural network for seismic facies analysis with interpretable spatial-spectral maps," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[20] H. Wu, B. Zhang, F. Li, and N. Liu, "Semi-automatic first-arrival picking of microseismic events by using the pixel-wise convolutional image segmentation method," Geophysics, vol. 84, no. 3, pp. V143–V155, 2019.
[21] W. Zhu, S. M. Mousavi, and G. C. Beroza, "Seismic signal denoising and decomposition using deep neural networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 11, pp. 9476–9488, 2019.
[22] H. Wu, B. Zhang, T. Lin, F. Li, and N. Liu, "White noise attenuation of seismic trace by integrating variational mode decomposition with convolutional neural network," Geophysics, vol. 84, no. 5, pp. V307–V317, 2019.
[23] H. Di, X. Chen, H. Maniar, and A. Abubakar, "Semi-supervised seismic and well log integration for reservoir property estimation," in SEG Technical Program Expanded Abstracts 2020, pp. 2166–2170, 2020.
[24] A. Cai, H. Di, Z. Li, H. Maniar, et al., "Wasserstein cycle-consistent generative adversarial network for improved seismic impedance inversion: Example on 3D SEAM model," in SEG Technical Program Expanded Abstracts 2020, pp. 1274–1278, 2020.
[25] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS, pp. 234–241, 2015.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[27] A. Mustafa and G. AlRegib, "Joint learning for seismic inversion: An acoustic impedance estimation case study," arXiv preprint arXiv:2006.15474, 2020.
[28] B. Wu, D. Meng, L. Wang, and N. Liu, "Seismic impedance inversion using fully convolutional residual network and transfer learning," IEEE Geoscience and Remote Sensing Letters, vol. PP, pp. 1–5, 2020.
Fig. 23. Field data application. Vp results inverted by the unsupervised sub-GGCNN based on the (a) K = 2, (b) K = 5, and (c) adjacent-lines training sets; Vp results inverted by the unsupervised Biswas's CNN based on the (d) K = 2 and (e) K = 5 training sets; (f) Vp results inverted by the Marmousi-based unsupervised sub-GGCNN; (g) Vp results inverted by the unsupervised sub-GGCNN without transfer learning.

Fig. 24. Field data application. Vs results inverted by the unsupervised sub-GGCNN based on the (a) K = 2, (b) K = 5, and (c) adjacent-lines training sets; Vs results inverted by the unsupervised Biswas's CNN based on the (d) K = 2 and (e) K = 5 training sets; (f) Vs results inverted by the Marmousi-based unsupervised sub-GGCNN; (g) Vs results inverted by the unsupervised sub-GGCNN without transfer learning.
[29] V. Das, A. Pollack, U. Wollner, and T. Mukerji, "Convolutional neural network for seismic impedance inversion," Geophysics, vol. 84, no. 6, pp. R869–R880, 2019.
[30] R. Biswas, M. K. Sen, V. Das, and T. Mukerji, "Prestack and poststack inversion using a physics-guided convolutional neural network," Interpretation, vol. 7, no. 3, pp. SE161–SE174, 2019.
[31] V. Das and T. Mukerji, "Petrophysical properties prediction from prestack seismic data using convolutional neural networks," Geophysics, vol. 84, no. 5, pp. N41–N55, 2019.
[32] D. P. Hampson, "Simultaneous inversion of pre-stack seismic data," SEG Technical Program Expanded Abstracts, vol. 24, no. 1, p. 1633, 2005.
[33] Y. Ao, W. Lu, B. Jiang, and P. Monkam, "Seismic structural curvature volume extraction with convolutional neural networks," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–15, 2020.
[34] Y. Li, J. Song, W. Lu, P. Monkam, and Y. Ao, "Multitask learning for super-resolution of seismic velocity model," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–12, 2020.
[35] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[36] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[37] M. Egmont-Petersen, D. de Ridder, and H. Handels, "Image processing with neural networks—a review," Pattern Recognition, vol. 35, no. 10, pp. 2279–2301, 2002.
[38] N. Srivastava and R. Salakhutdinov, "Discriminative transfer learning with tree-based priors," in Advances in Neural Information Processing Systems, pp. 2094–2102, 2013.
[39] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[40] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, et al., "Recent advances in convolutional neural networks," Pattern Recognition, vol. 77, pp. 354–377, 2018.
[41] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.
[42] T. Sainath, B. Kingsbury, A.-r. Mohamed, G. Dahl, G. Saon, H. Soltau, T. Beran, A. Aravkin, and B. Ramabhadran, "Improvements to deep convolutional neural networks for LVCSR," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2013.
[43] C. V. Deutsch, Geostatistical Reservoir Modeling (Applied Geostatistics Series). Oxford University Press, 2002.
[44] K. Aki and P. G. Richards, Quantitative Seismology. University Science Books, 2002.
[45] C. V. Deutsch and A. G. Journel, GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, 1992.
[46] P. Avseth, T. Mukerji, and G. Mavko, Quantitative Seismic Interpretation. Cambridge University Press, 2005.
[47] S. Xu and R. E. White, "A new velocity model for clay-sand mixtures," Geophysical Prospecting, vol. 43, no. 1, pp. 91–118, 1995.
[48] G. T. Kuster and M. N. Toksöz, "Velocity and attenuation of seismic waves in two-phase media: Part I. Theoretical formulations," Geophysics, vol. 39, pp. 587–606, 1974.
[49] J. G. Berryman, "Effective stress for transport properties of inhomogeneous porous rock," Journal of Geophysical Research: Solid Earth, vol. 97, no. B12, 1992.
[50] F. Gassmann, "Über die Elastizität poröser Medien," Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich, vol. 96, pp. 1–23, 1951.
Hui Li received the B.Sc. and M.S. degrees in geophysics from the China University of Geosciences (Wuhan), Wuhan, China, in 2008 and 2011, respectively, and the Ph.D. degree in geophysics from the University of Houston, Houston, TX, USA, in 2015. He is currently an Associate Professor with Xi'an Jiaotong University, Xi'an, China. His current research interests include seismic rock physics, the extraction of fluid and rock properties from seismic data, and machine learning applications in geophysics.

Jing Lin received the B.Sc. degree from Huaibei Normal University, Huaibei, China, in 2018. She is currently pursuing the M.Sc. degree at Xi'an Jiaotong University, Xi'an, China. Her current research interests include reservoir parameter inversion combined with machine learning.

Baohai Wu received the B.Sc. degree in geophysics from the China University of Geosciences (Wuhan), Wuhan, China, in 2008, and the M.S. degree from the Research Institute of Petroleum Exploration and Development, China, in 2012. He is currently a Technical Advisor with CGG GeoSoftware. His current research interests include rock physics, seismic inversion, and machine learning in geophysics.

Jinghuai Gao received the M.S. degree in applied geophysics from Chang'an University, Xi'an, China, in 1991, and the Ph.D. degree in electromagnetic field and microwave technology from Xi'an Jiaotong University, Xi'an, in 1997. From 1997 to 2000, he was a Post-Doctoral Researcher with the Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China. In 1999, he was a Visiting Scientist with the Modeling and Imaging Laboratory, University of California at Santa Cruz, Santa Cruz, CA, USA. He is currently an Associate Director with the National Engineering Laboratory for Offshore Oil Exploration, Xi'an Jiaotong University. He is also the Project Leader of the Fundamental Theory and Method for Geophysical Exploration and Development of Unconventional Oil and Gas, Xi'an Jiaotong University, a major program of the National Natural Science Foundation of China under Grant 41390450. He is also a Professor with the School of Electronic and Information Engineering and the School of Mathematics and Statistics, Xi'an Jiaotong University. His research interests include seismic wave propagation and imaging theory, seismic reservoir and fluid identification, and seismic inverse problem theory and methods. Dr. Gao was a recipient of the Chen Zongqi Geophysical Best Paper Award in 2013. He is an Editorial Board Member of the Chinese Journal of Geophysics, Applied Geophysics, and the Chinese Science Bulletin.

Naihao Liu received the B.S. degree in communication engineering from Jilin University, Changchun, China, in 2012, and the Ph.D. degree in information and communication engineering from Xi'an Jiaotong University, Xi'an, China, in 2018. From 2017 to 2018, he visited the Department of Geological Sciences, The University of Alabama, Tuscaloosa, AL, USA. He is currently with the School of Information and Communications Engineering, Xi'an Jiaotong University. His research interests include seismic time-frequency analysis, attribute analysis and parameter inversion, machine learning, and reservoir characterization.