Improving Mini-Basin and Subsalt Imaging With Reflection Full Waveform Inversion

Katarina Jonke*, Zhan Fu, Brad Wray and Hao Shen, CGG
SEG International Exposition and 87th Annual Meeting
© 2017 SEG Page 1492

Summary

Reflection-based full waveform inversion (RFWI) is increasingly used to recover long wavelengths of the background velocity model and provide updates that extend beyond the reach of diving waves. In our case study, we use an RFWI method that first updates the density using the high-wavenumber components of the decomposed full waveform inversion (FWI) gradient and then updates the velocity using the low-wavenumber components. We show on a deep water example from the Mexican side of the Perdido fold belt that RFWI improves the velocity inside sediment mini-basins and thus the interpretability of the underlying salt. We also apply this method for the intra-salt and subsalt velocity updates and show how it can improve imaging of the deep targets.

Introduction

Subsalt imaging is strongly affected by the velocity model of the overburden. Both a proper suprasalt velocity and a precise salt interpretation are required to define an accurate salt geometry. Additionally, velocity errors related to sediment inclusions and sutures within the salt, so-called ‘dirty salt’, as well as subsalt velocity, can have a large influence on the imaging of targets under the salt. Different methods are already available to help solve for all these components of the velocity model, but they each have their own limitations.

Ray-based tomography is a default tool for updating both suprasalt and subsalt velocities. However, when the shallow sediment geology is complex, tomography can fail to provide a velocity model that is accurate enough for correct top-down salt geometry modeling. Ray-based tomography also has limited success in dirty salt and subsalt applications in the Gulf of Mexico (GOM), even when reverse time migration (RTM) angle gathers (Xu et al., 2011; Li et al., 2011) or surface offset gathers (SOGs) (Yang et al., 2015) are used. Due to a narrow range of incident angles at subsalt events, ultra-long offset data is essential for improving subsalt velocity tomography updates with either type of RTM gathers.

FWI that primarily uses diving waves can refine the velocity model beyond the capabilities of tomography and has become a standard tool used to improve the imaging of the shallow overburden (Sirgue et al., 2010; Ratcliffe et al., 2014). However, when applied on data acquired with traditional narrow or wide azimuth (WAZ) geometry, FWI has difficulty properly updating deeper strata due to the lack of diving wave penetration. To reduce the need for long offset acquisition and reliance on the refraction energy in data, reflection-based FWI methods that utilize the low-wavenumber component of the FWI gradient (Mora, 1989) are increasingly used to update deeper parts of the model (Ramos-Martinez et al., 2016; Sun et al., 2016). The low-wavenumber component is generated along the reflection wavepath when transmission energy is reflected back to the surface from deeper reflectors. Methods like explicit model separation (Xu et al., 2012) or wavefield decomposition (Liu et al., 2011; Tang et al., 2013; Irabor and Warner, 2016) have been proposed to separate low-wavenumber from the dominant high-wavenumber components.

In our case study, we use the RFWI approach described by Chazalnoel et al. (2017). Following the separation of the FWI gradient based on the propagation direction of the wavefields, high-wavenumber and low-wavenumber terms are obtained alternately to update density and velocity, respectively. The high-wavenumber density update introduces the deep reflectors needed for the next iteration of low-wavenumber velocity updates. Our results show that this method is able to improve the velocity of the sediment mini-basins where both ray-based tomography and FWI had limited success. We also demonstrate how application of RFWI results in a better salt geometry interpretation. Finally, we show that RFWI can refine dirty salt velocity and update subsalt velocity to improve subsalt imaging.

Study area

Our study area is situated in the western GOM, on the Mexican side of the prolific Perdido fold belt. The water depth ranges from 1500 m to 3500 m. Large deposits of salt and shale are influenced by a regional compressional system. Advancement of the salt nappes is hindered by the Perdido folds, resulting in salt autosutures and a rugose top of salt (TOS). Shallow overburden is severely deformed due to the shortening, while large folds and thrusts form the subsalt Lower Tertiary targets.

Data used in the study were acquired using a WAZ acquisition geometry. Maximum inline and crossline offsets were 8100 m and 4200 m, with a nominal fold of 189 and record length of 14 s. Data pre-processing steps included denoising, source and receiver deghosting, designature, and 3D surface-related multiple elimination.

Mini-basin velocity update with RFWI

In our application of RFWI, we started with a velocity model that went through several iterations of tomography and FWI updates. These updates were able to improve the velocity of the shallow folds but failed to properly update the shale velocity at the bottom of the mini-basin due to the close proximity of salt, the absence of reflectivity inside the shale, and the shale depth. The sediment flood RTM shows that the underlying rugose TOS was poorly imaged, without sufficient focusing needed for a precise horizon interpretation (Figure 1a).

RFWI was then applied from 4 Hz to 7 Hz using a wavelet extracted from the data. High wavenumbers of the density were updated down to TOS to drive the low-wavenumber velocity update above it. Figure 1b shows the sediment flood RTM image after RFWI update. We can see improvements in the imaging of TOS: a strong peak event has fewer swings and is now more coherent. The initial velocity model is shown in Figure 2a and perturbation in Figure 2b. Sediment flood RTM SOGs showed a more distinct TOS event following RFWI application (Figures 2c and 2d). Although data QC indicated that the imaging was improved, the perturbation showed the somewhat vertical nature of our RFWI update. Due to poor reflectivity of the shale, this deeper part of the perturbation was dominated by the contribution of low-wavenumber energy backscattered from the stronger TOS event. Separating the update down to the TOS, and then subsequently down to the Mesozoic, was one way to mitigate the vertical resolution limitation in our RFWI method (A. Gomes and N. Chazalnoel, personal communication, 2017).

The suprasalt RFWI updated velocity model was used to re-run sediment and salt floods, which were then used to re-interpret the TOS and base of salt (BOS), respectively. Improved images of both the BOS and subsalt show the direct impact of the suprasalt velocity accuracy on salt interpretation (Figures 1c and 1d).

Figure 1: Sediment flood RTM shows TOS before (a) and after (b) suprasalt RFWI. Salt body RTM image before update (c). Suprasalt RFWI and re-interpretation improved focusing of BOS and continuity of subsalt (d).

Subsalt velocity update with RFWI

Following the shallow sediment update and salt geometry refinement, we applied a second iteration of RFWI. We allowed updates in sediment inclusions within the salt and in the subsalt, but maintained the precise salt boundaries necessary for high-definition imaging. The parameters used were similar to those in the suprasalt RFWI application. The key difference was allowing the high-wavenumber density update to extend to 14 km. We show an example from a location where large sediment inclusions and sutures influence the imaging under a salt nappe (Figure 3a). Improvement in the intra-salt velocity provided more correct BOS positioning and reduced pullup in the subsalt (Figure 3b). RTM SOGs after RFWI showed flatter gathers both within and below the salt (Figures 3c and 3d).
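
The alternating scheme used in this study (a high-wavenumber density update limited in depth, followed by a low-wavenumber velocity update) can be illustrated with a toy 1D sketch. This is only an illustration under stated assumptions: the box-filter split below is a stand-in for the propagation-direction-based gradient decomposition of Chazalnoel et al. (2017), which is not reproduced here, and the function names `moving_average`, `split_wavenumbers` and `mask_below` are hypothetical.

```python
def moving_average(values, half_width):
    """Smooth a 1D depth profile with a centred box filter.

    Near the edges the window is truncated, so the output has the
    same length as the input.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def split_wavenumbers(gradient, half_width):
    """Split a gradient profile into low- and high-wavenumber parts.

    ASSUMPTION: a box filter replaces the wavefield-based separation
    described in the text. By construction low + high == gradient.
    """
    low = moving_average(gradient, half_width)
    high = [g - l for g, l in zip(gradient, low)]
    return low, high


def mask_below(profile, max_index):
    """Zero an update below a depth index (for example, limiting the
    density update to samples at or above top of salt)."""
    return [v if i <= max_index else 0.0 for i, v in enumerate(profile)]


if __name__ == "__main__":
    # Toy gradient: a smooth trend plus a sharp reflector-like spike.
    gradient = [0.1 * i for i in range(10)]
    gradient[6] += 1.0
    low, high = split_wavenumbers(gradient, half_width=2)
    density_update = mask_below(high, max_index=6)   # high wavenumbers
    velocity_update = mask_below(low, max_index=6)   # low wavenumbers
    print(density_update)
    print(velocity_update)
```

In this sketch the depth mask plays the role of the top-down strategy: the same split can be rerun with a deeper `max_index` once the shallower part of the model has been updated.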

Our final example shows the application of RFWI in an area of highly deformed Perdido folds. Lines across the dip (Figure 4a) and strike (Figure 4b) directions of a subsalt fold show that the RFWI update improved the structural continuity of the Lower Tertiary and Mesozoic sections (Figures 4c and 4d). The initial, relatively smooth, subsalt velocity is shown in Figure 5a. Low vertical resolution of the perturbation (Figure 5b) again showed that backscattered energy was dominated by events generated from the strongest reflectors. Even with this current limitation in our RFWI, RTM SOGs (Figure 5c) were overall better defined and flatter (Figure 5d). Gather improvements, along with better stack continuity, indicated that the update was moving in the right direction.

Figure 2: Initial velocity model (a) with TOS and SOG locations. Mini-basin RFWI perturbation (b). Sediment flood RTM SOGs zoomed on TOS before (c) and after RFWI (d).

Figure 3: Salt body RTM without (a) and with (b) intra-salt and subsalt RFWI velocity update. RTM SOGs at a large sediment inclusion before (c) and after (d) RFWI.

Conclusions and discussions

RFWI was able to improve the hard-to-determine suprasalt mobile shale velocity within a mini-basin. The result of the image uplift was a less ambiguous salt interpretation. More coherent and flatter gathers, as well as improved structural continuity, show that RFWI provided improvements to both intra-salt and subsalt velocities. Good results were likely due to the presence of favorable strong events below our target zones that generated the reflected energy necessary for the low-wavenumber update. However, RFWI still depended on a good starting velocity model and relatively correct depth positioning of the necessary deep reflectors. To mitigate the lack of vertical resolution of the current method, a top-down approach of the update was performed. This increased the benefit from the RFWI application by improving the imaging of both shallow and deep strata in a geologically complex salt province.

Acknowledgments

The authors would like to thank CGG Multi-Client & New Ventures and the Mexican CNH for permission to show these results. We also thank Rachel Gong for her work on salt interpretation, and Nicolas Chazalnoel and Adriano Gomes for their helpful suggestions.

Figure 4: Dip and strike RTM images across a Perdido area fold before (a) and (b), and after subsalt RFWI (c) and (d).
Figure 5: Initial velocity (a) and perturbation (b) from RFWI update. RTM SOGs along the same dip line before (c) and after subsalt RFWI (d).
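
Gather flatness is used throughout this study as a qualitative QC of the velocity update. One simple way to put a number on it, offered here only as a hedged sketch and not as the QC actually used by the authors, is the RMS deviation of an event's picked depth across offsets. The function name `flatness_rms` and the pick values are hypothetical.

```python
import math


def flatness_rms(depth_picks):
    """RMS deviation (same units as the picks) of an event's picked
    depth from its mean across offsets. Zero means a perfectly flat
    event; larger values mean more residual moveout."""
    mean_depth = sum(depth_picks) / len(depth_picks)
    variance = sum((z - mean_depth) ** 2 for z in depth_picks) / len(depth_picks)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical depth picks (m) of one event at increasing offsets.
    before_update = [3000.0, 3010.0, 3025.0, 3045.0]
    after_update = [3002.0, 3003.0, 3001.0, 3004.0]
    print(flatness_rms(before_update), flatness_rms(after_update))
```

A smaller value after the update is consistent with flatter gathers, although it says nothing about whether the event is imaged at the correct depth.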

EDITED REFERENCES
Note: This reference list is a copyedited version of the reference list submitted by the author. Reference lists for the 2017
SEG Technical Program Expanded Abstracts have been copyedited so that references provided with the online
metadata for each paper will achieve a high degree of linking to cited sources that appear on the Web.
REFERENCES
Chazalnoel, N., A. Gomes, W. Zhao, and B. Wray, 2017, Revealing shallow and deep complex geological
features with FWI: Lessons learned: 79th Annual International Conference and Exhibition,
EAGE, Extended Abstracts, https://doi.org/10.3997/2214-4609.201701158.
Irabor, K., and M. Warner, 2016, Reflection FWI: 86th Annual International Meeting, SEG, Expanded
Abstracts, 1136–1140, https://doi.org/10.1190/segam2016-13944219.1.
Li, Z., S. Ji, B. Bai, Q. Wu, and W. Han, 2011, Dirty salt tomography using RTM 3D angle gathers: 81st
Annual International Meeting, SEG, Expanded Abstracts, 4020–4024,
https://doi.org/10.1190/1.3628046.
Liu, F., G. Zhang, S. Morton, and J. Leveille, 2011, An effective imaging condition for reverse-time
migration using wavefield decomposition: Geophysics, 76, no. 1, S29–S39,
https://doi.org/10.1190/1.3533914.
Mora, P., 1989, Inversion = migration + tomography: Geophysics, 54, 1575–1586,
https://doi.org/10.1190/1.1442625.
Ramos-Martinez, J., N. Chemingui, S. Crawley, Z. Zou, A. Valenciano, and E. Klochikhina, 2016, A
robust FWI gradient for high-resolution velocity model building: 86th Annual International
Meeting, SEG, Expanded Abstracts, 1258–1262, https://doi.org/10.1190/segam2016-13872681.1.
Ratcliffe, A., G. Conroy, V. Vinje, and A. Bertrand, 2014, Full waveform inversion—A north sea OBC
case study—Reloaded: 76th Annual International Conference and Exhibition, EAGE, Extended
Abstracts, Th E106 13, https://doi.org/10.3997/2214-4609.20141419.
Sirgue, L., O. I. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J.H. Kommedal, 2010, Full waveform
inversion: The next leap forward in imaging at Valhall: First Break, 28, 65–70,
https://doi.org/10.3997/1365-2397.2010012.
Sun, D., K. Jiao, X. Cheng, and D. Vigh, 2016, Reflection based waveform inversion: 86th Annual
International Meeting, SEG, Expanded Abstracts, 1151–1156,
https://doi.org/10.1190/segam2016-13966097.1.
Tang, Y., S. Lee, A. Baumstein, and D. Hinkley, 2013, Tomographically enhanced full wavefield
inversion: 83rd Annual International Meeting, SEG, Expanded Abstracts, 1037–1041.
Xu, S., Y. Zhang, and B. Tang, 2011, 3D angle gathers from reverse time migration: Geophysics, 76, no.
2, S77–S92, https://doi.org/10.1190/1.3536527.
Xu, S., D. Wang, F. Chen, G. Lambaré, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Annual International Meeting, SEG, Expanded Abstracts, 1–7,
http://doi.org/10.1190/segam2012-1473.1.
Yang, Z., S. Huang, and R. Yan, 2015, Improved subsalt tomography using RTM surface offset gathers:
85th Annual International Meeting, SEG, Expanded Abstracts, 5254–5258.

Recovering Nile Delta Messinian salt structure using Reconstructed Full Waveform Inversion
Juergen Fruehn, Stuart Greenwood; Ken McDermott, Sahil Mannick, Chao Wang, ION.
Summary

The Messinian evaporitic sequence in the Mediterranean Sea forms a complex screen which distorts and scatters seismic energy travelling through it. Consequently, attempts to build reliable velocity models and associated depth images are greatly compromised. In this work, we present part of a very large scale regional basement study, using long-offset long 2D seismic lines, where we have employed high resolution ray-tomography and conventional full waveform inversion, in combination with a new technique: ‘reconstructed full waveform inversion’ (RFWI), which better constrains the resultant models by using the velocity model as an image constraint.

Introduction

As with much of the Mediterranean Sea, attempts to image seismic data from the Nile Delta are beset with image distortion due to the presence of an evaporite layer in the shallow overburden (hereafter referred to generically as the ‘salt’). The thickness of the Messinian evaporites varies greatly, and in places they are absent. Although model building is often attempted by picking the top-salt features in great detail, the complex inter-bedded structure and variable thickness of the evaporite bodies still often hamper both velocity estimation and subsequent depth migration of pre-salt structures (e.g. Jones & Davison, 2014).

The data used in this study are from a long offset deep basin-study dataset of the type described by Horn (2015). In this case the overall program includes several thousand linear kilometers of 2D data, and the line shown here is some 360 km long. The maximum offset recorded in the marine cable was 12 km, with a record length of 18 s. Imaging was performed to a depth of 40 km, so as to try to capture Moho reflectors and to determine the boundary between continental and oceanic crust.

Data processing included various forms of multiple suppression as well as broadband deghosting, the latter being of particular importance in recovering low-frequency components of the data in advance of waveform inversion.

Velocity model building commenced with several iterations of high resolution non-parametric inversion (Fruehn et al., 2014), in conjunction with interpretation of the top and base evaporite (‘salt’) geobodies. It is well understood that the resolution available with such ray-based methods is limited to lateral scale lengths of several hundred metres (e.g. Jones 2010, 2015) except for well-behaved clean long offset data. Consequently, in recent years we have seen the introduction of Full Waveform Inversion (FWI) in order to increase lateral resolution in the imaging workflow (e.g. Pratt et al., 1996; Plessix et al., 2010; Brittan et al., 2013).

However, due to the acoustic approximation usually made with FWI, the data which can be reliably inverted against are usually limited to the refracted (transmitted) wavefield, as this is less sensitive to density change. Hence, classical FWI only tends to give a meaningful velocity update in the shallower parts of the section.

More recently, variants of FWI have been introduced that use the image domain, or other criteria not simply restricted to amplitude and phase behavior, as a constraint (e.g. Warner and Guasch, 2016; Wang et al., 2016; Vigh et al., 2016). These techniques offer the possibility of achieving deeper reliable model updates than conventional FWI.

Figure 1 shows a synthetic salt body example comparing the results of FWI and RFWI, starting from a smoothed version of the true model (Wang et al., 2017). As expected, FWI is well able to update the shallow ‘post salt’ section, but fails to meaningfully delineate the salt body or the pre-salt section. Conversely, RFWI successfully recovers the salt geobody geometry and provides a reasonable update to the sub-salt model.

Figure 1: Clockwise from top left: true model; smoothed starting model; result of conventional FWI (the shallow model is successfully recovered, but the salt body is not); result of RFWI: the shape of the salt body is reasonably recovered, and the sub-salt model is beginning to emerge (example courtesy of Chao Wang et al., 2017).
Figure 2: Image with model overlay after high resolution ray tomography, and preliminary picking of the salt bodies.

Figure 3: Image with model overlay after tomo plus conventional FWI: the near surface and post-salt sequence is nicely updated, but salt and below remain unchanged.

Figure 4: Image with model overlay after tomo plus conventional FWI, plus RFWI and residual tomo: salt and below are now updated.

Figure 5: Image with model overlay after further iterations of RFWI and a final residual tomo: salt and below are now better defined.

Results on Nile Delta field data

Figure 2 shows a small segment of the 2D regional line, with an interval velocity model overlay resulting from several iterations of structurally constrained ray-based non-parametric tomography, in conjunction with preliminary picking of both the top and base salt. After fifty (internal) iterations of FWI, we obtain the image in Figure 3: here we see, as expected for FWI, that the near-seabed structure has been updated, but that the salt and pre-salt section remain essentially unchanged.

Following this, the RFWI was run for 5 iterations, which modified the salt boundary and velocity within the salt geobody. Subsequent tomography for the deep section refined the pre-salt model (Figure 4). Finally, a further 18 iterations of RFWI improved the salt geobody and the intra-salt velocity distribution, and more ray-tomography successfully updated the very deep section (Figure 5).
Figure 6 shows a well log superimposed on the interval velocity models for the starting model (which was the final product of the previous 2016 regional isotropic model building) and then on the final anisotropic new model. In the new results, the fine detail of the salt geobodies is evident, as is the near-seabed detail.

Figure 6: Top: interval velocity from legacy 2016 conventional isotropic tomographic model update. Bottom: new anisotropic tomography + FWI + RFWI model. Both shown with the associated well log.

Conclusions

Although FWI has offered great promise as a means of updating near-surface structure, pushing past the limits of ray-based tomography, the acoustic approximation, when applied in the data domain, is still essentially limited to the transmitted wavefield. Extending FWI to include image-domain constraints can greatly expand the range of usefulness of the wave-based inversion techniques, also helping to mitigate the effects of cycle skipping on convergence.
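
Cycle skipping, mentioned in this study as an obstacle to convergence, can be seen in a small numerical experiment. The following is a minimal sketch under stated assumptions: the 5 Hz Gaussian-windowed cosine is a hypothetical stand-in for a band-limited trace, and `wavelet` and `l2_misfit` are illustrative names rather than anything from the original work. It evaluates the least-squares misfit between a trace and a time-shifted copy of itself.

```python
import math


def wavelet(t, freq=5.0, sigma=0.3):
    """Band-limited test signal: a cosine under a Gaussian window.
    With freq = 5 Hz the dominant period is 0.2 s."""
    return math.cos(2.0 * math.pi * freq * t) * math.exp(-(t / sigma) ** 2)


def l2_misfit(shift, dt=0.002, t_max=2.0):
    """Least-squares misfit between the signal and a copy delayed by
    `shift` seconds, integrated over [-t_max, t_max]."""
    n_samples = int(round(2.0 * t_max / dt)) + 1
    total = 0.0
    for i in range(n_samples):
        t = -t_max + i * dt
        residual = wavelet(t - shift) - wavelet(t)
        total += residual * residual * dt
    return 0.5 * total


if __name__ == "__main__":
    # Scan shifts from 0 to 1.5 periods of the 5 Hz signal.
    for k in range(0, 31, 5):
        shift = 0.01 * k
        print(f"shift = {shift:.2f} s, misfit = {l2_misfit(shift):.4f}")
```

In this toy case the misfit is zero at zero shift, rises towards a shift of half a period (0.1 s), and then falls again towards a shift of one full period (0.2 s). A gradient-based inversion that starts with a timing error larger than about half a period therefore tends to slide towards the wrong, cycle-skipped alignment.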
Acknowledgements
We thank BP staff for their helpful insights into the expected geological structure of the region, and ION for permission to present this work. We also thank Ian Jones for his help in preparing the manuscript, and guidance in the model building.

EDITED REFERENCES
REFERENCES
Brittan, J., J. Bai, H. Delome, C. Wang, and D. Yingst, 2013, Full waveform inversion—The state of the
art: First Break, 31, 75–81.
Fruehn, J., S. Greenwood, V. Valler, and D. Sekulic, 2014, Resolving small-scale near-seabed velocity
anomalies using non-parametric autopicking and hybrid tomography: CSEG Recorder, 39.
Jones, I. F., 2010, An introduction to velocity model building: EAGE.
Jones, I. F., 2015, Estimating subsurface parameter fields for seismic migration: Velocity model building:
Encyclopedia of Exploration Geophysics, U1-1–U1-24.
Jones, I. F., and I. Davison, 2014, Seismic imaging in and around salt bodies: Interpretation, 2, SL1–
SL20, http://doi.org/10.1190/INT-2014-0033.1.
Plessix, R.-E., G. Baeten, J. W. de Maag, M. Klaassen, R. Zhang, and Z. Tao, 2010, Application of
acoustic full waveform inversion to a low-frequency large-offset land data set: 80th Annual
International Meeting, SEG, Expanded Abstracts, 930–934, http://doi.org/10.1190/1.3513930.
Pratt, R. G., Z.-M. Song, P. Williamson, and M. Warner, 1996, Two-dimensional velocity models from
wide-angle seismic data by wavefield inversion: Geophysical Journal International, 124, 323–
340, http://doi.org/10.1111/j.1365-246X.1996.tb07023.x.
Vigh, D., K. Jiao, X. Cheng, D. Sun, and W. Lewis, 2016, Earth-model building from shallow to deep
with full-waveform inversion: The Leading Edge, 35, 1025–1030.
Wang, C., D. Yingst, P. Farmer, and J. Leveille, 2016, Full-waveform inversion with the reconstructed
wavefield method: 86th Annual International Meeting, SEG, Expanded Abstracts, 1237–1241.
Wang, C., D. Yingst, P. Farmer, G. Martin, and J. Leveille, 2017, Reconstructed full waveform inversion
with the extended source: 87th Annual International Meeting, SEG, Expanded Abstracts.
Warner, M., and L. Guasch, 2016, Adaptive waveform inversion: Theory: Geophysics, 81, no. 6, R429–
R445, https://doi.org/10.1190/geo2015-0387.1.

An application of full waveform inversion with automated adjustment of salt geometry
Lu Xin Zhang*, Denes Vigh, Xin Cheng, Kun Jiao and Dong Sun, Schlumberger
Summary

Full waveform inversion (FWI) has recently emerged as a promising method for refining seismic velocity fields, which in turn benefit migration techniques and enhance subsurface imaging. In salt tectonic settings, FWI needs a salt body model in order to update properly in a timely manner, because of the strong and complex wave phenomena generated at the interface between sediment and salt. The traditional way of defining salt geometry involves several iterations of migration and salt interpretation, such as sediment velocity flood and salt velocity flood. The top of salt, in particular, needs to be re-picked after each sediment velocity update. In practice, one survey needs several iterations of FWI, which means several rounds of salt geometry picking, which is quite time-consuming. In this paper we present an application of FWI with automated salt geometry adjustment inside FWI to a wide-azimuth seismic data set. The results show that not only is the sediment velocity away from salt refined, but a high-resolution velocity model is also obtained in the vicinity of salt, especially for the carbonate on top of salt.

Introduction

Full Waveform Inversion (FWI) has recently emerged as a promising method for updating detailed seismic velocity fields, which then benefit imaging techniques and enhance the subsurface images. As a data-fitting method, the algorithm iteratively updates the subsurface earth models to reduce the misfit between the recorded seismic data and the simulated seismic data. Diving waves, reflections, and diffractions all carry different information about the subsurface. In the past decade, successful FWI applications have all used early arrivals, diving waves, and refractions to update the shallow parts of the model. Furthermore, cycle-skip-mitigation techniques, such as traveltime-based methods, enable FWI to start with a very simple model. It can therefore be applied in the early stage of model building, where no data preconditioning has been applied.

Due to the large property contrast between salt and sediment, the wave phenomena are quite complex and it is impossible to separate and remove all the salt-related energy and events from the acquired data. As a result, defining the salt geometry has been identified as one of the major factors for the success of FWI, especially for the subsalt. Correctly defining the salt geometry usually involves iterations of migration and salt picking: sediment velocity flood migration followed by top of salt picking, and then salt velocity flood migration followed by base of salt interpretation. In practice, only a single-digit number of iterations of traveltime tomography is employed in one survey; even so, the salt picking work is quite time-consuming. An FWI application, however, usually needs close to or more than 10 iterations, and the salt geometry has to be adjusted within the loop, which becomes a huge load for any production work.

Recently, several methods have been introduced to address the issue of inverting salt geometry inside FWI. Lewis (2016) introduced a level-set method to invert the salt body interface directly; Esser (2015) looked at Total Variation constraints to regularize the problem of constructing salt bodies. But these methods are still far from mature. In this context, we applied a method which does not directly invert the salt geometry inside FWI, but automatically adjusts the input salt body during FWI iterations and then merges it with the inverted sediment velocity field to serve as the input salt body velocity model for each iteration of FWI. In the application to a real data example, this method combined with our cycle-skip-mitigated FWI algorithm worked well.

Theory

Full waveform inversion seeks a quantitative model of the subsurface that minimizes the difference between the acquired data and the simulated data, and an extensive literature has accumulated over the past decades. The conventional FWI algorithm is based on iteratively updating the model by minimizing the least-squares misfit function. One of the major challenges for this least-squares style FWI is the local minimum caused by cycle-skipping between the acquired and the simulated data. A traditional way to mitigate this issue is to derive the FWI starting model through several iterations of tomography (Kapoor, 2012). However, this offsets one of the advantages of FWI, namely that FWI does not require the complicated signal processing which is a must for traveltime tomography.

By modifying this conventional least-squares FWI to minimize the traveltime shift between acquired and simulated data, instead of direct subtraction, the cycle-skip issue is avoided. In the time domain, the instantaneous phase difference can be computed and used as an objective function in FWI (Bozdağ, 2011). The traveltime shift in the time domain can be easily computed as a local attribute and then translated into the corresponding unwrapped instantaneous phase error. As a local attribute, such an instantaneous phase error describes the local phase misalignment. Thus, FWI can directly minimize the traveltime shift objective function. The formulation involves the instantaneous phase error φ, which is a function of time, the Hilbert transform operator, the instantaneous amplitude operator E, and DF[m]*, the adjoint operator of the first derivative map of F[m], which consists of a backward propagation and an application of the imaging condition. FWI uses the instantaneous phase gradient formulation to back-project the local traveltime shift into model error.

Regarding salt, one conventional approach is to over-smooth the sediment/salt model to mitigate the cycle-skipping issue; FWI might then be able to change the salt with more iterations and at higher frequencies. The other approach requires manually re-picking the salt geometry on depth migration volumes after each FWI update, which is also quite time-consuming.

Instead of dealing with the combined sediment and salt model, we separate the velocity into two pieces: a sediment velocity field and a salt mask. The initial salt mask can be easily picked on a depth-migrated image (such as a reverse time migration image) produced with the FWI input data and the initial sediment velocity model. Then, inside the FWI loop, the two pieces are combined to generate a sediment velocity model with a salt body. After each iteration of FWI, as the sediment velocity changes, the salt body geometry is repositioned using a map-migration type of technique. In this way, the repositioned salt body matches the seismic image from the depth migration with an FWI-updated sediment model much better than the previous unadjusted salt body.

Examples and Results

A 2x4 wide-azimuth (WAZ) data set with about 9 km maximum offset, acquired in the Gulf of Mexico, is taken as the example for the salt auto-adjust FWI test. The gun array, the shot depth and the cable depth allowed frequencies as low as about 2.5 to 3 Hz to be recorded. We started the FWI at 4 Hz from a 1D type of starting velocity field; one example section is displayed in Fig 1 (a).

First, we executed the adjustive FWI (cycle-skip mitigated) at 4 Hz with and without this auto-adjust salt method in a small area. For the test without the auto-adjust salt method, the input salt body model was smoothed to mitigate the errors from mis-positioned salt geometry during FWI updates. After several iterations, the two final updated models plus the initial model were run through Kirchhoff depth migration (KDM) to produce common offset gathers to validate the kinematic convergence. Fig 2 shows the KDM gather comparison among (a) the FWI initial model, (b) adjustive FWI without salt auto-adjust and (c) adjustive FWI with salt auto-adjust. The depth gathers from adjustive FWI without salt auto-adjust are slightly over-corrected, while the gathers from adjustive FWI with salt auto-adjust are quite flat.

On the mega-size survey mentioned before, two frequency bands (4 Hz and 6 Hz) of adjustive FWI were employed with this auto-adjust salt geometry feature. Fig 1 (b) shows an example of the update. Compared to the initial model in Fig 1 (a), with this auto-adjust salt feature FWI was able to pick up several high-velocity zones (carbonate) just on top of the salt; it also picked up the shallow slow shale velocity bodies on the left side. Fig 3 shows one location with carbonate just on top of the salt: (a) is the KDM gather from the initial model, in which the top salt and carbonate reflection events both curve up, indicating the initial velocity is too slow; (b) is the KDM gather from the FWI-updated model, in which the gather flatness is improved from shallow to deep, and the carbonate and top of salt events in particular are flat. On the KDM image (c), it is clear that the carbonate body is well defined.

The comparison in Fig 4 shows that with this auto-adjust salt method, reflection FWI (RFWI) can improve the base of salt imaging quality. Fig 4 (a) is an 18 Hz RTM image from the pre-RFWI model, and (b) is an 18 Hz RTM image from the post-RFWI model. As the yellow arrow indicates, the coherency and continuity of the base of salt reflector are improved after 5 iterations of reflection-based FWI.

Conclusions

The conventional way to handle salt geometry in FWI is time-consuming. Without a proper way to handle the salt geometry change during FWI updates, it is very challenging to get detailed updates close to salt. The auto-adjust salt method is employed in FWI, and the salt geometry is automatically adjusted during FWI iterations in a timely manner. Applying this auto-adjust salt method to adjustive FWI (cycle-skip mitigated), FWI is able to provide a high-resolution update in the vicinity of salt. In the real data examples, FWI was able to pick up a shallow slow shale body and a high-velocity carbonate sitting on top of the salt, and also to improve the coherence and continuity of the base salt reflection.

Acknowledgements

The authors would like to thank WesternGeco Geosolutions and Multi-Client for the permission to publish this paper.
© 2017 SEG Page 1503

Fig 1. (a) Initial 1D velocity model vertical section overlaid with the corresponding image; (b) FWI updated
velocity after an update over 2 frequency bands, in vertical section. Velocity from low to high uses a
white-blue-red color table. The horizontal extent is approximately 150 km. (Courtesy Schlumberger)
Fig 2. On the top panel, from left to right are KDM gathers from (a) initial model, (b) adjustive FWI
without auto-adjusted salt geometry, (c) KDM gather from adjustive FWI with auto-adjusted salt
geometry, and the yellow arrow indicates the top of salt location. (Courtesy Schlumberger)
Fig 3. From left to right are KDM gathers from (a) initial model, (b) adjustive FWI with auto-adjusted salt
geometry, (c) KDM Image from adjustive FWI with auto-adjusted salt. Blue arrows on gathers indicate
the top of salt location, the yellow line on the KDM image indicates the gather location, and the red
arrow on the KDM image points to the high velocity geobody. (Courtesy Schlumberger)

Fig 4. (a) RTM image with pre-RFWI model; (b) RTM image with the salt auto-adjusted RFWI updated
model. (Courtesy Schlumberger)

EDITED REFERENCES
REFERENCES
Bozdag, E., J. Trampert, and J. Tromp, 2011, Misfit functions for full-waveform inversion based on
instantaneous phase and envelope measurements: Geophysical Journal International, 185, 845–
870, http://dx.doi.org/10.1111/j.1365-246X.2011.04970.x.
Esser, E., L. Guasch, T. van Leeuwen, A. Y. Aravkin, and F. Herrmann, 2015, Automatic salt
delineation - wavefield reconstruction inversion with convex constraints: 85th Annual
http://dx.doi.org/10.1190/segam2015-5877995.1.
Jiao, K., D. Sun, X. Cheng and D. Vigh, 2015, Adjustive full waveform inversion: 85th Annual
Lewis, W., D. Vigh, 2016, 3D salt geometry inversion in full-waveform inversion using a level-set
method: 86th Annual International Meeting, SEG, Expanded Abstracts, 1221–1226,

Salt model building at Atlantis with Full Waveform Inversion
Xukai Shen*, Imtiaz Ahmed, Andrew Brenders, Joe Dellinger, John Etgen and Scott Michell
BP America Inc., Houston, TX
Summary

BP recently acquired wide-offset ocean-bottom-node data with conventional
airguns over the Atlantis Field in the deep-water Gulf of Mexico. Careful
consideration during the acquisition was rewarded by recording usable signal
down to a lower frequency than previously achieved. Full-waveform inversion
was then applied to the resulting dataset. The resulting velocity model was
then used, unmodified, to reverse time migrate the seismic data. The result
was the production of some of the best sub-salt images ever seen at Atlantis.
Furthermore, the FWI velocity model clearly revealed several major
interpretation errors in the legacy salt model, and thus the FWI result also
offered an excellent basis for updating the conventional salt-model-building
workflow. These results demonstrated that with appropriate seismic data to
support it, FWI might offer a paradigm shift in model building and imaging in
areas of complex salt.

Introduction

Imaging sub-salt targets has been a major challenge due to the presence of
complex salt bodies in the overburden. Over the past decade substantial
progress has been made in improving the illumination of subsalt targets using
modern acquisition techniques such as WAZ (wide-azimuth towed streamers).
However, our capability in model building is falling behind. More
specifically, we still lack automatic methods for generating velocity models
in areas of salt, and in areas of complex salt we sometimes cannot generate
adequate velocity models at all, despite man-years of effort.

The Atlantis Field in the Gulf of Mexico epitomizes this problem. At Atlantis,
the northern portion of the sub-salt target sits beneath a complex
allochthonous salt body. Two salt masses are interacting to create a salt body
that changes from thick (~1500 m) and tabular in the northeast to thin and
complex in the center of Atlantis, where several thin salt bodies (“salt
fingers”) pinch out over the crest of the target (see Figure 1). Faulting in
the Miocene target adds further complexity and risk to development planning.
The development of the Atlantis Field is currently mostly limited to the
extra-salt areas where even conventional Narrow-Azimuth Towed-Streamer (NATS)
seismic data are sufficient to produce a good image. To unlock the value in
this field, BP pioneered wide-azimuth deep-water acquisition using Ocean
Bottom Nodes (OBN) (Beaudoin and Ross, 2007). Following initial success in
2005, several follow-up OBN surveys have since been acquired at Atlantis. As
they were designed to do, these surveys delivered a greatly increased
subsurface (including sub-salt) illumination over traditional NATS seismic,
and the additional illumination helped produce a much-improved extra-salt
image, as well as excellent 4D signature of the extra-salt targets. However,
the wide-azimuth data also made it clear that a considerable portion of the
sub-salt “no-image” zone cannot be resolved by simply acquiring more data. In
other words, the data quality has apparently reached a point where the
limiting factor is the velocity model used to migrate the seismic data, not
the illumination.

The legacy velocity model was built using the long-standing industry standard
practice: tomography, then sediment and salt floods interleaved with
migration and picking of top and bottom of salt, and finally labor-intensive
salt scenario testing with guidance from FWI (full-waveform inversion). Such
workflows are not only time consuming and expensive, but they have continued
to perform poorly in complex salt areas like the one at Atlantis. Modeling
studies showed that manual interpretation of complex salt can easily go
astray, resulting in large errors in the interpreted velocity model, and that
these could explain many of the sub-salt imaging problems we see at Atlantis
(Dellinger et al., 2017).

Data-driven, automatic model-building methods provide hope of avoiding the
pitfalls of manual interpretation. In the last decade one such method,
Full-Waveform Inversion (Tarantola, 1984), has revolutionized velocity-model
building in areas with shallow gas (Sirgue et al., 2010) and many areas of the
world where the velocity anomalies are localized and/or lack significant
contrast with the surrounding sediments (Tanis and Behura, 2017). In such
environments, FWI has become a standard and successful tool for building
velocity models for imaging. However, the presence of large, complex salt
bodies, together with the scale of the velocity contrast between the salt and
the surrounding sediments, results in a much more difficult mathematical
inversion problem, with multiple local minima. We certainly could not use FWI
for velocity model building and imaging in a salt basin as we do elsewhere. At
best, the FWI velocity-model updates have only provided hints to the
interpreters as to what further salt scenarios they might try. Much effort has
been expended on

enhancing FWI algorithms to better meet the challenge of automatic
model-building (Routh et al., 2016), with varying degrees of success.
Meanwhile, both theoretical studies (Sirgue, 2006) and case studies using
onshore field data (Plessix et al., 2010) have indicated that recording low
frequencies at wide offsets might provide a more direct way to achieve
success, by making the underlying inversion problem more tractable.

With that in mind, we designed the most recent 2014-2015 Atlantis 4D OBN
Survey (van Gestel et al., 2015) to record as much long offset and low
frequency data as possible. By design, the survey had full-azimuth coverage
with a significant portion of the data having offsets exceeding 20 km,
considerably more than required for a 4D repeat of previous surveys. In
addition, improved acquisition quality control, airguns that were more
reliable and repeatable than before, better OBN hardware and software, and
paying more careful attention to the data at every step of the processing
chain allowed us to record, retain, and make use of lower frequencies at wider
offsets than we had previously managed for Atlantis OBN surveys. Comparison of
the amplitude spectra of data acquired in 2014-15 with data acquired in 2005
shows a more consistent source signature with a higher signal-to-noise ratio
below 5 Hz, and S/N crossover at 1.5 Hz, versus ~3 Hz for the 2005 data. Above
5 Hz the S/N ratios were similar. The extra time and cost would not have been
justified if the goal were merely to record better data for imaging using our
pre-existing velocity models. The goal was to record better data to allow us
to build a better velocity model for imaging.

Using these improved data, and the best legacy model available as the starting
model, we rebuilt the model using FWI. We did not perform any further manual
updates. The new model not only included sedimentary updates similar to the
ones we have seen in successful FWI applications elsewhere, but it also had
major salt updates where large volumes of salt in the legacy model were
replaced with lower velocity sediments. We found that we could directly use
this FWI-produced velocity model for imaging, just as we now typically do in
sedimentary basins. The resulting image is significantly better than the same
image from the legacy model, particularly in the sub-salt area. We have yet to
completely close the “no-data” imaging area, but as far as sub-salt imaging is
concerned, the FWI velocity model produced the best image at Atlantis we have
ever seen.

Imaging Challenges and Results

Figure 1 shows the latest-generation legacy velocity model across the center
of the Atlantis Field. The “no-image” zone lies underneath the prominent salt
fingers at the top center of the section. Figure 2 shows the corresponding
section of a P-wave prestack depth-migrated image, calculated using the latest
2014-2015 OBN seismic data and the legacy velocity model shown in Figure 1.
The no-image zone in the bottom panel of Figure 2 indicates that there must be
a problem with the earth model, which becomes even more obvious when compared
with the clear and detailed extra-salt image immediately to the right of the
“no-image” zone. The stark contrast in image quality suggests that the
sedimentary model in the extra-salt overburden is of an acceptable quality,
whereas the complex salt model within the overburden is clearly not.

Figure 1: The legacy Atlantis velocity model.

Figure 2: Reverse time migration using the legacy model (Figure 1) applied to
the 2014-2015 OBN data. Top: image of the same cross-section as the one in
Figure 1. Bottom: zoomed-in image of the box in the top panel.

Figure 3 shows the same cross-section as Figure 1, but made using the velocity
model produced by FWI. The improved low frequencies and wide offsets present
in the latest generation of Atlantis OBN data have allowed the FWI to start
from the legacy velocity model in Figure 1 and make meaningful model updates
all the way to the maximum depth of the computational grid, something it was
not able to do using previous generations of Atlantis OBN data. There are
several features to note in the FWI result. First, the shallow sediments have
added geologically reasonable detail, similar to what we expect to see from
FWI results in sedimentary basins. Second, the top of salt around x=6 km has
had non-trivial adjustments made to its shape and has significantly reduced
the degree of salt overhang. Third, the salt fingers in the center of the
section have dramatically changed. The deepest salt finger has been removed,
the bottom of the second deepest finger is deeper than before and the shape
and location of the sedimentary wedge between the salt fingers is
significantly different. Fourth, the sub-salt, and particularly sub-finger,
sediment velocities have had major updates, including the addition of some
low-velocity anomalies. Finally, the top of the mother salt is significantly
shallower than it was before.

Figure 3: The Atlantis velocity model from Figure 1, after update by FWI.

The image created using the FWI model is shown in Figure 4. The only
difference between this image and the legacy-model image in Figure 2 is the
velocity model that was used for the imaging. The seismic data being migrated
are the same. There are several major uplifts in Figure 4. First, the top of
salt image to the left of the salt finger area is much more clearly imaged,
even though the salt boundaries are unphysically smooth in the FWI model. This
is a result of the improved sediment model above the top of salt and better
positioning of the top of salt. Second, the extra-salt image is improved all
the way to the maximum depth of the model, particularly the steeply dipping
events. This indicates an overall improvement in the velocity model made by
the FWI update. Last and most important, the sub-salt image is dramatically
improved. In some places where there was no image before there is now an
interpretable image. The structure is much better defined, including the
target reservoir, the top of the mother salt, sub-salt faults and sub-salt
strata. We can also see a bright event where the (now removed) salt finger had
been in the legacy model. This event is likely what misled the interpreters
into erroneously including the deeper salt finger in the velocity model.

Figure 4: As Figure 2, but for the FWI velocity model.

Figures 5 and 6 show depth sections at the level of the target before and
after FWI. The imprint of the complex salt in the overburden is clearly
visible in the center of the legacy model image in Figure 5. The anticline
structure is visible where it projects to the South and East out from
underneath the salt. However, in the center of the image underneath the
complex salt overburden the image is either completely washed out or contains
geologically unreasonable discontinuities in the reflectors. These are largely
removed in the FWI model image in Figure 6, with the most significant
improvements shown in the bottom two figures. The anticline structure
extra-salt improved slightly, but the sub-salt anticline image improved
significantly. The washed-out zone is much smaller than before and the
anticline structure can almost entirely be mapped. The sub-salt faults are
much better defined as well, which has significant implications for well
planning and drilling.

CONCLUSIONS

We demonstrated a successful FWI application in the salt environment. With
low-frequency, long-offset data, FWI can correct major errors in the salt
definition and bring step-change improvements to the sub-salt image. The FWI
result is migration-friendly and points the way towards a paradigm shift in
the industry-standard practice of salt model building.

ACKNOWLEDGMENTS

We thank BP and the partner of the Atlantis Field (BHP Billiton) for
permission to publish this paper. More results from this study may be found in
Shen et al. (2017).

Figure 5: A depth slice from reverse time migration using the legacy model at
the level of the target. Top: overall image; Middle: zoomed-in image of the
left box in the top panel; Bottom: zoomed-in image of the right box in the top
panel.

Figure 6: As Figure 5, but for the FWI velocity model.

EDITED REFERENCES
REFERENCES
Beaudoin, G., and A. A. Ross, 2007, Field design and operation of a novel deepwater, wide-azimuth node
seismic survey: The Leading Edge, 26, 494–503, https://doi.org/10.1190/1.2723213.
Dellinger, J., A. J. Brenders, J. R. Sandschaper, C. Regone, J. Etgen, I. Ahmed, and K. J. Lee, 2017, The
Garden Banks model experience: The Leading Edge, 36, 64–71,
https://doi.org/10.1190/tle36020151.1.
Plessix, R-E., G. Baeten, J. W. de Maag, M. Klaassen, Z. Rujie, and T. Zhifei, 2010, Application of
acoustic full waveform inversion to a low-frequency large-offset land dataset: 80th Annual
International Meeting, SEG, Expanded Abstracts, 930–934, https://doi.org/10.1190/1.3513930.
Routh, P., J. Behura, and M. Tanis, 2016, Introduction to this special section: Full-waveform inversion
Part I: The Leading Edge, 35, 1024–1024, https://doi.org/10.1190/tle35121024.1.
Shen, X., I. Ahmed, A. Brenders, J. Dellinger, J. Etgen and S. Michell, 2017, Full-waveform inversion:
the next leap forward in subsalt imaging: The Leading Edge, in preparation.
Sirgue, L., 2006, The importance of low frequency and large offset in waveform inversion: 68th Annual
International Conference and Exhibition, EAGE, Extended Abstracts, A037,
http://doi.org/10.3997/2214-4609.201402146.
Sirgue, L., O. I. Barkved, J. Dellinger, J. Etgen, U. Albertin, and J. H. Kommedal, 2010, Full waveform
inversion: The next leap forward in imaging at Valhall: First Break, 28, 65–70.
Tanis, M., and J. Behura, 2017, Introduction to this special section: Full-waveform inversion Part II: The
Leading Edge, 36, 58–58, https://doi.org/10.1190/tle36010058.1.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, https://doi.org/10.1190/1.1441754.
Van Gestel, J.-P., E. L’Heureux, J. R. Sandschaper, P.-O. Ariston, N. D. Bassett, and S. Dadi, 2015,
Atlantis “Beyond 4D” Ocean Bottom Nodes acquisition design: 85th Annual International

Deep learning prior models from seismic images for full-waveform inversion
Winston Lewis∗ , Schlumberger and Denes Vigh, WesternGeco
SUMMARY

Full-waveform inversion (FWI) is now a mature technology that is routinely
used in exploration around the world to obtain high resolution earth models.
In geological areas such as the Gulf of Mexico, however, reconstructing
complex salt geobodies poses a huge challenge to FWI due to the absence of low
frequencies in the data needed to resolve such features. A skilled seismic
interpreter has to interpret these geobodies and manually insert them into the
earth model and repeat this process several times in the earth model building
workflow. Deep learning algorithms have gained a lot of interest in recent
years by obtaining state-of-the-art results in various problems arising in the
fields of computer vision, automatic speech recognition and natural language
processing. We investigate the use of these algorithms to generate useful
prior models for full-waveform inversion by learning features relevant to
earth model building from a seismic image. We test this methodology in
full-waveform inversion by generating a probability map of salt bodies in the
migrated image along with a prior model and incorporating it in the FWI
objective function. This approach is shown to be promising in enabling an
automated salt body reconstruction using FWI.

INTRODUCTION

Full-waveform inversion is a promising technology that the seismic industry
has to produce high resolution earth models that fully explain the recorded
seismic data. Although the theory for FWI was developed in the 1980s
(Tarantola, 1984), it is only in the last decade that major advances in high
performance computing power provided by graphics processing units (GPUs)
enabled a widespread adoption of FWI on field datasets (Virieux and Operto,
2009; Vigh et al., 2011; Kapoor et al., 2014). However, geological areas such
as the Gulf of Mexico (GOM) pose a huge challenge to FWI due to the presence
of high contrast complex salt geobodies. Given the long wavelength nature of
these anomalies, running FWI with an initial model without a reasonably
accurate representation of these geobodies already present in the model would
lead to cycle skipping and cause the inversion to converge to a wrong
solution. Several approaches to overcome this challenge have been investigated
(Esser et al., 2015, 2016). Here we investigate a data driven approach that
mimics what interpreters naturally do in the current model building workflow
as it might have better success with field datasets.

Deep learning (LeCun et al., 2015) is a branch of machine learning that has
generated widespread interest in recent years by breaking previous records for
recognition and classification tasks in image and speech processing. The
algorithms that come under deep learning are essentially variants of
artificial neural networks but with significantly deeper layers of neurons
than have been used in the past. Although invented in the 1980s and 1990s,
neural networks didn’t find widespread adoption due to being very expensive to
train even for simple models. The advent of GPUs and the massive amount of
data available during the last decade transformed this field just as it did
with FWI. The availability of highly efficient and parallel solvers on GPUs
has enabled training of deep networks in a short period of time. The deep
learning approach to solving classification and segmentation problems in
medical imaging has shown significant success (Greenspan et al., 2016; Setio
et al., 2016) in discriminating between normal and pathological biological
tissues.

Seismic interpreters have long relied on a number of different attributes
computed from a seismic image to guide them in the interpretation process
(Halpert et al., 2008; Wang et al., 2015). Here we investigate an approach to
apply deep learning methods to seismic images and attributes derived from them
to generate a probability of a region of the seismic image belonging to a salt
geobody. We do this by using the prior that salt geobodies found in Gulf of
Mexico regions usually don’t have well defined structure and hence
discriminating between structure and lack of structure in seismic images is a
reasonable indicator of the presence of salt. This a priori information is
then used as an extra regularization term in the FWI objective function which
guides the inversion towards the prior model.

THEORY

The classical formulation of the full-waveform inversion problem is posed as
the minimization of the objective function,

$$\min_{m} f(m) = \frac{1}{2}\,\| F(m) - d \|_2^2 , \qquad (1)$$

where F : M → D is a wave equation modeling operator that, given an earth
property model m ∈ M (velocity, density, etc.), simulates synthetic seismic
data, d ∈ D is the seismic data measured in the field, and $\|\cdot\|_2^2$
represents the squared L2 norm. An adjoint-state method (Plessix, 2006) is
used to compute the gradient g(m) = ∇f(m) and iterative optimization
algorithms are used to minimize the objective function.

Prior model regularization

Seismic inverse problems are generally ill-posed and their solution is
non-unique and unstable. Regularization in the form of incorporating
additional information about the nature of the solution is essential to obtain
a reasonable solution to the inverse problem (Tikhonov and Arsenin, 1977;
Zhdanov, 1993). In model space, this is achieved by introducing a priori
information in the inversion procedure about the earth’s subsurface structure
in the form of a functional J(m). We have the regularized objective function
as,

$$\min_{m} \hat{f}(m) = \frac{1}{2}\,\| F(m) - d \|_2^2 + \lambda J(m), \qquad (2)$$

Figure 1: Architecture of a typical convolutional neural network (CNN).
Diagram labels: Image (input), Convolution, Pooling, Fully connected, Softmax,
Probability (output).
where λ is the user-defined regularization weight. The standard Tikhonov
regularizing functional is given by,

$$J(m) = \frac{1}{2}\,\| \Gamma (m - m_{\text{prior}}) \|_2^2 , \qquad (3)$$

where $m_{\text{prior}}$ is some a priori model, usually set to zero when
nothing is assumed to be known, and Γ is the Tikhonov matrix that can be
interpreted as the confidence we place on the prior model. This approach has
been successful on field datasets in constraining the model updates around the
well by creating prior and confidence models using the well log information
(Vigh et al., 2015) and also ensuring model updates are geological and
consistent with the structure in the seismic image (Lewis et al., 2014). In a
similar fashion we would like to build a prior model that represents our best
guess of the location of salt geobodies in the current model and use it to
guide the current model towards a solution which also incorporates the
predicted salt geobodies.

Convolutional Neural Networks

Convolutional neural networks (CNNs) are a kind of deep learning model that
is especially well suited to processing images. They are a specialization of
the neural network (LeCun, 1989) designed to process input data that is
specified on a grid, such as an image. They differ from a regular neural
network in that they replace the matrix multiplication operation of a regular
neuron by a convolution. By making the explicit assumption that the inputs are
images, they are able to greatly reduce the number of unknown weights that
need to be trained in each layer. This allows one to build deeper models thus
allowing them to learn highly complex non-linear relationships from the
inputs.

The typical architecture of a CNN is shown in Figure 1. CNNs are made up of
interconnected layers, commonly of only three layer types: convolutional layer
(CONV), pooling layer (POOL) and fully-connected layer (FC). The CONV layer's
parameters consist of a set of filters whose filter weights are determined in
the learning stage. Every filter is small spatially (along width and height),
but extends through the full depth of the input volume. Periodically a pooling
layer (POOL) is inserted in-between successive CONV layers. Its function is to
progressively reduce the spatial size of the representation to reduce the
amount of parameters and computation in the network, and hence to also control
overfitting. Neurons in a fully connected layer (FC) have full connections to
all the activations in the previous layer, as done in regular neural networks.
A softmax layer is used as the final layer of the network. It converts the
output of the final fully connected layer, which outputs arbitrary values,
into a probability distribution for the output classes.

Many CNN architectures have been proposed and shown to obtain breakthrough
classification accuracy on image classification challenges. The most well
known of these are the LeNet (LeCun et al., 1998), AlexNet (Krizhevsky et al.,
2012), GoogleNet (Szegedy et al., 2014) and VGGNet (Simonyan and Zisserman,
2014) CNN architectures. Here, instead of inventing our own CNN architecture,
we investigate the performance of the AlexNet architecture. Only minor changes
are made to adapt it to our needs since the original network is geared towards
predicting a large number of classes of natural images.

input : Starting model m_0, regularization weight λ, deep learned model χ
output: Final earth model m*
k ← 0;
m_prev ← 0;
while not converged do
    if m_k sufficiently different from m_prev then
        I_k ← Migrate(m_k);
        Γ ← GenerateConfidence(m_k, I_k, χ);
        m_prior ← GeneratePrior(m_k, Γ);
        m_prev ← m_k;
    end
    g_k ← ∇f(m_k) + λ∇J(m_k, Γ, m_prior);    /* gradient */
    p_k ← ComputeSearchDirection(g_k, p_{k−1}, ..., p_1);
    α_k ← LineSearch(m_k, p_k);    /* find steplength */
    m_{k+1} ← m_k + α_k p_k;
    k ← k + 1;
end
m* ← m_k
Algorithm 1: Prior regularized FWI workflow
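
The loop in Algorithm 1 can be illustrated with a toy sketch. This is not the implementation used in the paper; it only mirrors the control flow under loud assumptions. The modeling operator is a small linear map `A` rather than a wave-equation solver, Γ is taken to be diagonal and is passed in directly as `confidence` instead of being predicted from a migrated image, the search direction is plain steepest descent, and the line search is simple backtracking. For a diagonal Γ, equation (3) gives the regularization gradient λ Γ² (m − m_prior), which is what `gradient` computes. The names `prior_regularized_fwi`, `generate_prior`, `threshold` and `refresh_tol` are hypothetical.

```python
# Toy sketch of Algorithm 1. All operators are stand-ins chosen for
# illustration; only the 4500 m/s salt velocity comes from the text.

SALT_VELOCITY = 4500.0  # m/s


def forward(A, m):
    """Hypothetical linear modeling operator F(m) = A m."""
    return [sum(a * x for a, x in zip(row, m)) for row in A]


def objective(A, d, m, lam, gamma, m_prior):
    """0.5*||A m - d||^2 + lam*0.5*||Gamma (m - m_prior)||^2, Gamma diagonal."""
    r = [s - o for s, o in zip(forward(A, m), d)]
    data = 0.5 * sum(x * x for x in r)
    reg = 0.5 * sum((g * (x - p)) ** 2 for g, x, p in zip(gamma, m, m_prior))
    return data + lam * reg


def gradient(A, d, m, lam, gamma, m_prior):
    """A^T (A m - d) + lam * Gamma^2 (m - m_prior)."""
    r = [s - o for s, o in zip(forward(A, m), d)]
    g_data = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(m))]
    g_reg = [g * g * (x - p) for g, x, p in zip(gamma, m, m_prior)]
    return [gd + lam * gr for gd, gr in zip(g_data, g_reg)]


def generate_prior(m, confidence, threshold=0.5):
    """Salt velocity where confidence is high, current model elsewhere."""
    return [SALT_VELOCITY if c > threshold else x for x, c in zip(m, confidence)]


def line_search(f, m, p, alpha=1.0, shrink=0.5, max_tries=40):
    """Backtracking: shrink the step until the objective decreases."""
    f0 = f(m)
    for _ in range(max_tries):
        trial = [x + alpha * s for x, s in zip(m, p)]
        if f(trial) < f0:
            return alpha
        alpha *= shrink
    return 0.0  # no decrease found: treat as converged


def prior_regularized_fwi(A, d, m0, lam, confidence, n_iter=200, refresh_tol=50.0):
    m = list(m0)
    m_prev = None
    gamma = m_prior = None
    for _ in range(n_iter):
        # Refresh confidence and prior only when the model has changed enough.
        if m_prev is None or max(abs(x - y) for x, y in zip(m, m_prev)) > refresh_tol:
            gamma = list(confidence)  # stand-in for Migrate + GenerateConfidence
            m_prior = generate_prior(m, gamma)
            m_prev = list(m)
        g = gradient(A, d, m, lam, gamma, m_prior)
        p = [-x for x in g]  # steepest-descent search direction
        alpha = line_search(lambda x: objective(A, d, x, lam, gamma, m_prior), m, p)
        if alpha == 0.0:
            break
        m = [x + alpha * s for x, s in zip(m, p)]
    return m
```

As a usage example, take three model cells where the data constrain only the sum of the first two: `A = [[1, 1, 0], [0, 0, 1]]` and `d = [6500, 2500]`. Starting from `[3250, 3250, 2500]` with `lam=0`, the data misfit is already zero and the model does not move. With `lam=1` and `confidence=[0, 1, 0]`, the prior pulls the second cell to salt velocity and the data term adjusts the first cell accordingly, which is the role the regularization plays in equation (2).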

Figure 2: Deep learning workflow. Diagram labels: Training dataset (known
models); Image patches with structure; Image patches without structure; Train
deep learning model; Test dataset (unknown models); Extract image patches; Run
prediction.
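
The patch-extraction stage of the workflow in Figure 2 can be sketched as follows. This is a minimal illustration, not the pre-processing used in the paper: the normalization (zero mean, unit RMS per patch), the patch size and stride, and the labeling rule based on the salt fraction inside a patch are all assumptions, and the function names are hypothetical. Images and masks are plain nested lists so the sketch needs no external libraries.

```python
def normalize(patch):
    """Scale a patch to zero mean and unit RMS (one possible normalization)."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    rms = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    scale = rms if rms > 0.0 else 1.0  # leave constant patches at zero
    return [[(v - mean) / scale for v in row] for row in patch]


def extract_patches(image, size, stride):
    """Return (row, col, patch) for every size x size window.

    A stride smaller than size gives overlapping patches.
    """
    n_rows, n_cols = len(image), len(image[0])
    patches = []
    for r in range(0, n_rows - size + 1, stride):
        for c in range(0, n_cols - size + 1, stride):
            window = [row[c:c + size] for row in image[r:r + size]]
            patches.append((r, c, normalize(window)))
    return patches


def label_patch(salt_mask, r, c, size, min_salt_fraction=0.5):
    """Hypothetical labeling rule from a ground-truth salt mask (1 = salt).

    Patches that are mostly salt are labeled as lacking structure.
    """
    cells = [salt_mask[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    salt_fraction = sum(cells) / len(cells)
    return "without structure" if salt_fraction >= min_salt_fraction else "with structure"
```

For training, each patch returned by `extract_patches` would be paired with `label_patch` evaluated at the same `(r, c)`. For prediction on a test image only `extract_patches` is needed, with the same normalization as in training.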
Deep learning prior models

Our deep learning workflow for generating prior models is shown in Figure 2.
We first prepare a training and validation dataset using earth models that are
representative of the geology in the Gulf of Mexico. We generate migrated
seismic images from the known velocity models and the ground truth annotation
indicating the locations of regions of each image containing the salt
geobodies. Together these are then used to extract image patches from the
seismic images and label them as those containing structural features and
those without, as can be seen in the top left of Figure 2. Before these image
patches can be used for training, careful pre-processing is needed and steps
such as normalization and rescaling are applied.

These image patches are then input into a deep learning model along with the
associated labels. The training is computationally demanding and, depending on
the number of input images, the kind of model used and the solver parameters,
can take anywhere from a few hours to days even on the latest GPU hardware.

When running FWI on an unknown dataset we first use the starting velocity
model to obtain a migrated seismic image. This image forms our test dataset.
We apply the same pre-processing steps used in creating the image patches for
our training dataset and extract overlapping image patches at desired output
locations in the image. We then use our previously trained model to output a
probability of each image patch containing structural features. We can then
use this probability map to create a confidence model of the salt geobodies.

Now, using the above computed probability and the knowledge that the salt
velocity is about 4500 m/s, we can create a prior model that contains salt
velocity in regions of high salt probability and velocity from the current
model everywhere else where this probability is low.

We can now use this generated prior model in our regularized FWI objective
function (2) with an appropriate regularization weight to guide the inversion
towards incorporating features from the prior model into the inversion result.
This prior model generation is repeated every time we have a velocity model
that is sufficiently different from the velocity model we last used to obtain
the migrated seismic image. This newly migrated image is used to create a new
prior model which can constrain the subsequent iterations of the inversion.
Algorithm 1 specifies the complete description of this deep learned prior
model regularized FWI workflow.

RESULTS

We test our method on the subsalt multiples attenuation and reduction
technology (SMAART JV) Pluto 1.5 model. Figure 3 (a) shows the starting model
used for FWI. The starting model is created by removing the salt geobodies
from the true model and applying a heavy smoothing to the sediment only
velocity model. To generate the training and validation dataset for the deep
learning model, we use the Sigsbee2B (Paffenholz et al., 2005) model and some
inline sections of the SEAM 3D salt model (Fehler and Larner, 2008). In total
about 90,000 image patches were extracted from these datasets along with the
ground truth labels. It is important to mention that no information about the
Pluto dataset itself was used in either the training or validation datasets.
The training is performed on an 8x Tesla K80 GPU node and took about 39 hours
for 200,000
© 2017 SEG Page 1514

Figure 3: (a) Starting velocity used for inversion and prior gen- Figure 4: (a) FWI inverted model without any regularization.
eration (b) Predicted salt geobody confidence overlaid on the (b) Regularized FWI inverted model with the deep learned
seismic image. prior.
iterations. CONCLUSIONS
Before starting the FWI iterations, a migration is performed We have demonstrated a data driven workflow to build prior
using the Pluto starting velocity to obtain a seismic image which models for FWI by learning features from the seismic image
is then used as the test dataset for our trained deep model. This using state of the art deep learning models. We apply this to the
model generates a confidence model of the salt geobodies as problem of salt body reconstruction by learning the probability
shown in Figure 3 (b) overlaid on the seismic image. Using of salt geobodies being present at any location in a seismic im-
this confidence map along with the prior information that salt age and using this information to regularize the FWI objective
velocity is 4500 m/s we use the current velocity model to re- function. Doing this we show that long wavelength features of
place the high probability regions with salt velocity to generate the model such as salt can be introduced in the model which
our prior model. due to lack of low frequencies in the data is really challenging
Now we perform some inversions with the starting Pluto veloc- for FWI to achieve using the seismic data alone.
ity where we limit the usable data frequency in the 3Hz-8Hz Future work will aim at validating our approach on field data,
bandwidth. We first use the regular FWI workflow (ie data and comparing its performance to the interpretations obtained
misfit only objective function in Equation (1)) without a prior by expert seismic interpreters, with the ultimate goal of gradu-
model. The inverted velocity model after 25 iterations can be ally bringing automated salt body reconstruction using FWI
seen in Figure 4 (a). The inversion gets stuck into a local min- into the earth model building workflow. Although here we
imum and only the top layer of the salt geobody has been re- have focused our efforts toward the problem of salt body re-
covered. This is because the majority of the data sensitivity is construction, what we have presented is a general regulariza-
to the top salt-sediment interface alone, which gets recovered tion strategy that can be used to incorporate any prior informa-
reasonably in the inversion. tion that can be learned from a seismic image into FWI.
We now run the prior regularized FWI workflow. The inver-
sion result after 25 iterations can be seen in Figure 4 (b). Even
though we are still far away from the true model, we have re- ACKNOWLEDGMENTS
covered the bulk of the salt geobody information and seem to
be headed in the right direction. This shows that our use of the The authors thank Schlumberger management for permission
prior model is nudging the inversion to incorporate these geo- to publish this work. Richard Coates is thanked for his sup-
bodies in the inversion result by adding the long wavelength port and numerous helpful discussions and suggestions. Ben
features missing in the data gradient alone. Veitch is thanked for his very helpful review. Jerry Kapoor is
thanked for providing the motivation for tackling this problem
by reminding frequently that the salt was still winning.
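The prior-model step described in the Results (replace the high-probability regions of the current velocity model with the 4500 m/s salt velocity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_prior_model` and the probability cut-off `threshold` are hypothetical, since the text only speaks of "high probability regions".

```python
import numpy as np

SALT_VELOCITY = 4500.0  # m/s, the salt velocity quoted in the text


def build_prior_model(velocity, salt_probability, threshold=0.5,
                      salt_velocity=SALT_VELOCITY):
    """Return a copy of `velocity` with confident-salt cells set to salt velocity.

    `threshold` is a hypothetical cut-off on the predicted probability; the
    paper only states that high probability regions are replaced.
    """
    velocity = np.asarray(velocity, dtype=float)
    salt_probability = np.asarray(salt_probability, dtype=float)
    if velocity.shape != salt_probability.shape:
        raise ValueError("velocity and probability maps must have the same shape")
    # Keep the current velocity wherever the network is not confident about salt.
    return np.where(salt_probability >= threshold, salt_velocity, velocity)


# Toy 2 x 3 model: only cells with probability >= 0.5 are flooded with salt.
current_model = np.array([[1500.0, 2000.0, 2500.0],
                          [1800.0, 2200.0, 2600.0]])
confidence = np.array([[0.1, 0.9, 0.4],
                       [0.7, 0.2, 0.5]])
prior_model = build_prior_model(current_model, confidence)
```

A hard threshold is only one possible choice; a soft blend weighted by the probability would be an equally plausible reading of the text.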

REFERENCES
Esser, E., L. Guasch, F. J. Herrmann, and M. Warner, 2016, Constrained waveform inversion for
automatic salt flooding: The Leading Edge, 35, 235–239, https://doi.org/10.1190/tle35030235.1.
Esser, E., F. Herrmann, L. Guasch, and M. Warner, 2015, Constrained waveform inversion in salt-
affected datasets: 85th Annual International Meeting, SEG, Expanded Abstracts, 1086–1090,
Fehler, M., and K. Larner, 2008, SEG advanced modeling (SEAM): Phase I first year update: The
Leading Edge, 27, 1006–1007, https://doi.org/10.1190/1.2967551.
Greenspan, H., B. van Ginneken, and R. M. Summers, 2016, Guest editorial deep learning in medical
imaging: Overview and future promise of an exciting new technique: IEEE Transactions on
Medical Imaging, 35, 1153–1159, https://doi.org/10.1109/TMI.2016.2553401.
Halpert, A. D., R. G. Clapp, J. Lomask, and B. Biondi, 2008, Image segmentation for velocity model
construction and updating: 78th Annual International Meeting, SEG, Expanded Abstracts, 3088–
3092, https://doi.org/10.1190/1.3063986.
Kapoor, J., D. Vigh, and N. Moldoveanu, 2014, Image improvements with full-azimuth acquisition and
full-waveform inversion: 84th Annual International Meeting, SEG, Expanded Abstracts, 153–
157, https://doi.org/10.1190/segam2014-0319.1.
Krizhevsky, A., I. Sutskever, and G. E. Hinton, 2012, Imagenet classification with deep convolutional
neural networks, in Advances in Neural Information Processing Systems 25: Curran Associates,
1097–1105.
LeCun, Y., 1989, Generalization and network design strategies, Technical Report CRG-TR-89-4,
University of Toronto, Toronto.
LeCun, Y., Y. Bengio, and G. Hinton, 2015, Deep learning: Nature, 521, 436–444,
https://doi.org/10.1038/nature14539.
Lecun, Y., L. Bottou, Y. Bengio, and P. Haffner, 1998, Gradient-based learning applied to document
recognition: Proceedings of the IEEE, 86, 2278–2324, https://doi.org/10.1109/5.726791.
Lewis, W., D. Amazonas, D. Vigh, and R. Coates, 2014, Geologically constrained full-waveform
inversion using an anisotropic diffusion based regularization scheme: Application to a 3D
offshore Brazil dataset: 84th Annual International Meeting, SEG, Expanded Abstracts, 1083–
1088, https://doi.org/10.1190/segam2014-1174.1.
Paffenholz, J., B. McLain, J. Zaske, and P. J. Keliher, 2005, Subsalt multiple attenuation and imaging:
Observations from the sigsbee2b synthetic dataset: 75th Annual International Meeting, SEG,
Expanded Abstracts, 2122–2125, https://doi.org/10.1190/1.1817123.
Plessix, R. E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
geophysical applications: Geophysical Journal International, 167, 495–503,
https://doi.org/10.1111/j.1365-246X.2006.02978.x.
Setio, A. A. A., A. Traverso, T. de Bel, M. S. N. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q.
Dou, M. Evelina Fantacci, B. Geurts, R. van der Gugten, P. A. Heng, B. Jansen, M. M. J. de
Kaste, V. Kotov, J. Yu-Hung Lin, J. T. M. C. Manders, A. Sonora-Mengana, J. C. García-Naranjo,
M. Prokop, M. Saletta, C. M. Schaefer-Prokop, E. T. Scholten, L. Scholten, M. M.
Snoeren, E. Lopez Torres, J. Vandemeulebroucke, N. Walasek, G. C. A. Zuidhof, B. van
Ginneken, and C. Jacobs, 2016, Validation, comparison, and combination of algorithms for
automatic detection of pulmonary nodules in computed tomography images: The LUNA16
challenge: ArXiv e-prints.
Simonyan, K., and A. Zisserman, 2014, Very deep convolutional networks for large-scale image
recognition: arXiv: 1409.1556.
Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A.
Rabinovich, 2014, Going deeper with convolutions: arXiv: 1409.4842.
Tarantola, A., 1984, Linearized inversion of seismic reflection data: Geophysical Prospecting, 32, 998–
1015, https://doi.org/10.1111/j.1365-2478.1984.tb00751.x.
Tikhonov, A., and V. Arsenin, 1977, Solutions of ill-posed problems: John Wiley & Sons.
Vigh, D., J. Kapoor, and H. Li, 2011, Full-waveform inversion application in different geological
settings: 81st Annual International Meeting, SEG, Expanded Abstracts, 2374–2378,
https://doi.org/10.1190/1.3627685.
Vigh, D., W. Lewis, C. Parekh, K. Jiao, and J. Kapoor, 2015, Introducing well constraints in full
waveform inversion and its applications in time-lapse seismic measurements: 85th Annual
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC1–WCC26, https://doi.org/10.1190/1.3238367.
Wang, Z., T. Hegazy, Z. Long, and G. AlRegib, 2015, Noise-robust detection and tracking of salt domes
in post-migrated volumes using texture, tensors, and subspace learning: Geophysics, 80, no. 6,
WD101–WD116, https://doi.org/10.1190/geo2015-0116.1.
Zhdanov, M., 1993, Tutorial: Regularization in inversion theory, Technical Report CWP-136, Center for
Wave Phenomena, Colorado School of Mines, Golden.

Parametric Level-Set Full-Waveform Inversion in the Presence of Salt-Bodies
Ajinkya Kadu, Wim A. Mulder†, Tristan van Leeuwen
Mathematical Institute, Utrecht University
† Shell Global Solutions International B.V. & Delft University of Technology
SUMMARY

Full-waveform inversion attempts to estimate a high-resolution model of the Earth by inverting all the seismic data. This procedure fails if the Earth model contains high-contrast bodies such as salt and if sufficiently low frequencies are absent from the data. Salt bodies are important for hydrocarbon exploration because oil or gas reservoirs are often located on their sides or underneath. We represent the shape of the salt body with a level set, constructed from radial basis functions to keep its dimensionality low. We have shown earlier that the salt body can be completely recovered if the sediment structure is already known. In this paper, we propose a strategy to simultaneously reconstruct the sediment and the salt. The sediment is implicitly represented by a bilinear interpolation kernel with a small number of variables. An alternating minimization technique solves the resulting optimization problem. The results on a synthetic model using a Gauss-Newton approximation of the Hessian show the feasibility of the approach.

INTRODUCTION

Full-waveform inversion (FWI) has become a popular technique to produce high-resolution maps of the subsurface velocity and density from the available seismic data. In its classic form, the method obtains a model by iteratively fitting modeled to observed data in a least-squares sense. Because the underlying minimization problem is highly nonlinear and seismic data typically lack sufficiently low frequencies, a good initial guess of the model is required to avoid convergence to the nearest local minimum, which generally does not represent the ground truth (Virieux and Operto, 2009). The problem becomes worse in the presence of salt bodies, which are of particular interest to the oil industry because hydrocarbon reservoirs are often located nearby or underneath. Often, conventional FWI reconstructs the top of the salt but fails to obtain its interior, its bottom, and everything underneath.

A level set can help to regularize the problem. Santosa (1996) introduced the method to geometric inverse problems. It implicitly defines the shape of the salt as the level set or zero contour of a smooth function. The conventional method deals with the level-set evolution through the Hamilton-Jacobi equation, also known as the level-set equation. This approach suffers from steepening and flattening of the level set during iterations. The classic solution, re-initialization of the level set, solves the issue but is computationally intensive. Dahlke et al. (2015), Lewis et al. (2012) and Guo and de Hoop (2013) successfully applied the method in combination with full-waveform inversion.

Previously, we presented a robust implementation that dynamically adapts the width of the level-set boundary to obtain faster convergence (Kadu et al., 2016). This approach implicitly avoids flattening and steepening of the level-set function near the boundary by adjusting the width according to an approximate level-set gradient. We assumed, however, a known background sediment model. In the present paper, we propose a joint-inversion approach and try to reconstruct the background and salt geometry simultaneously.

THEORY

The FWI problem in its classic least-squares formulation (Tarantola, 1984) reads

$$\min_{m} \left\{ \tfrac{1}{2} \| F(m) - d \|_2^2 \right\},$$

where $F$ models the scalar Helmholtz equation, $m$ defines the subsurface model, for instance, P-wave velocity or density or both, and $d$ represents the data.

It is natural to separate the model into salt and sediment with constant and known salt velocity, $m_1$, and sediment model $m_0$:

$$m(x) = \{1 - a(x)\}\, m_0(x) + a(x)\, m_1, \tag{1}$$

where $a$ is an indicator function that separates the salt from the sediment. Solving for $a(x)$ is an NP-hard problem because of its discrete nature (Del Lungo and Nivat, 1999). A level-set function, $\phi : \mathbb{R}^2 \to \mathbb{R}$, will simplify the problem:

$$a(x) = \begin{cases} 1, & \phi(x) \ge 0, \\ 0, & \phi(x) < 0. \end{cases}$$

Mathematically, we represent an indicator function with the Heaviside function, i.e., $a(x) = h(\phi(x))$. Now, the problem is to find a function $\phi(x)$ that represents the true salt geometry.

Parametric Level-Set Method

Kadu et al. (2016) chose a representation of the level-set function by a finite linear combination of compactly supported radial basis functions (RBFs):

$$\phi(x) = \sum_{j=1}^{n_s} \alpha_j\, \psi(\| x - \chi_j \|_2).$$

Its discrete version is represented by an RBF kernel matrix, $A$, with entries $a_{ij} = \psi(\| x_i - \chi_j \|_2)$, where $\{x_i\}_{i=1}^{n}$ represents the model grid points. The RBF nodes $\{\chi_j\}_{j=1}^{n_s}$ are placed on an equidistant model grid. The RBF amplitudes $\{\alpha_j\}_{j=1}^{n_s}$ control the level set. Typically, we use a sufficiently smooth Wendland RBF of the form

$$\psi(r) = (1 - r)_+^8 \, (32 r^3 + 25 r^2 + 8 r + 1).$$
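The parametric level-set representation above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the `support_radius` used to scale distances before applying the Wendland function is an assumption, since the text does not say how distances are normalized.

```python
import numpy as np


def wendland_rbf(r):
    """Wendland RBF psi(r) = (1 - r)_+^8 (32 r^3 + 25 r^2 + 8 r + 1)."""
    r = np.asarray(r, dtype=float)
    poly = 32.0 * r**3 + 25.0 * r**2 + 8.0 * r + 1.0
    # (1 - r)_+ is the positive part, which gives the compact support r <= 1.
    return np.clip(1.0 - r, 0.0, None) ** 8 * poly


def rbf_kernel_matrix(grid_points, nodes, support_radius):
    """Kernel matrix with entry (i, j) = psi(||x_i - chi_j||_2 / support_radius).

    `support_radius` is a hypothetical scaling, not taken from the paper.
    """
    diff = grid_points[:, None, :] - nodes[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return wendland_rbf(dist / support_radius)


def salt_indicator(kernel, amplitudes):
    """Indicator a(x) = 1 where the level-set function phi = A alpha is >= 0."""
    phi = kernel @ amplitudes
    return (phi >= 0.0).astype(float)


# Three grid points on a line, two RBF nodes with amplitudes +1 and -1.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
rbf_nodes = np.array([[0.0, 0.0], [5.0, 0.0]])
A = rbf_kernel_matrix(points, rbf_nodes, support_radius=2.0)
indicator = salt_indicator(A, np.array([1.0, -1.0]))
```

Because each basis function vanishes beyond its support, every column of the kernel matrix touches only nearby grid points, which is what keeps the parametrization low-dimensional and local.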

Figure 1: Heaviside and corresponding Dirac-Delta functions (dashed) with a width of 0.5. [Plot of $h(x)$ and $\delta(x)$ against $x$; legend: compact, true, global.]

The Parametric Level-Set Full-Waveform Inversion (PLS-FWI) for fixed $m_0$ and $m_1$ becomes

$$\min_{\alpha \in \mathbb{R}^{n_s}} \left\{ f(\alpha) = \tfrac{1}{2} \| F(\alpha) - d \|_2^2 \right\}.$$

The gradient and Gauss-Newton Hessian for this problem are

$$g_\alpha = A^T D_\alpha^T J^* (F(\alpha) - d),$$
$$H_\alpha = A^T D_\alpha^T J^* J D_\alpha A,$$

where $J$ is the Jacobian of the forward modeling operator $F$. The diagonal matrix $D_\alpha$ denotes the element-wise multiplication of the difference between the salt and sediment velocity with the Dirac-Delta function:

$$D_\alpha = \operatorname{diag}\big( (m_1 - m_0) \odot h'_\epsilon(A\alpha) \big),$$

where $h_\epsilon(\cdot)$ is an approximation of the Heaviside function with width $\epsilon$. It is important to note that the level-set sensitivities depend on two main factors: (1) the difference between the salt and sediment velocity and (2) the approximation of the Heaviside function.

We use a compact approximation of the Heaviside function, plotted in Figure 1. This formulation provides a constant region of sensitivity around the level-set boundary. This allows the level-set parameters to take large steps, as the FWI gradient (i.e., $J^*(F(\alpha) - d)$) is extrapolated by a constant factor. Due to its compactness, only neighboring RBFs are updated, producing fewer artifacts in the reconstruction.

The width of the Heaviside depends on the level-set boundary and the spatial gradient of the level-set function. Because this gradient is expensive to compute, we approximate it using the lower and upper bounds of the level-set function. Hence, the Heaviside width is given by:

$$\epsilon = \kappa \, \Delta x \left( \frac{\max(\phi) - \min(\phi)}{\Delta x} \right) = \kappa \, [\max(A\alpha) - \min(A\alpha)]. \tag{2}$$

We refer the reader to (Kadu et al., 2017) for more details.

Bilinear Interpolation

We can impose smoothness constraints on the sediment velocity by means of bilinear interpolation from a model on a coarser mesh:

$$m_0(x) = m_{0\min} + \sum_{j=1}^{n_b} \beta_j \, l_j(x - H_j).$$

Here, $m_{0\min}$ denotes the water velocity, $l_j(\cdot)$ represents a piecewise linear basis function at node $H_j$ and $\beta_j$ denotes its corresponding weight. The latter term can be captured using a bilinear interpolation kernel $B$ when the model is discretized on a grid.

For fixed $m_1$ and $\phi(x)$ in equation (1), the optimization problem becomes

$$\min_{\beta \in \mathbb{R}^{n_b}} \left\{ f(\beta) = \tfrac{1}{2} \| F(\beta) - d \|_2^2 \right\}.$$

The gradient and Gauss-Newton Hessian are given by

$$g_\beta = B^T J^* (F(\beta) - d),$$
$$H_\beta = B^T J^* J B.$$

Figure 2: An example of bilinear interpolation. The fewer nodes, from (a) to (d), the smoother the representation of the model. [Panels (a) to (d); axes: x [km], z [km].]

Joint reconstruction

The model is now represented in terms of $\alpha$ and $\beta$ as:

$$m = \{1 - h(A\alpha)\} \odot (m_{0\min} + B\beta) + h(A\alpha)\, m_1. \tag{3}$$

With prior information about the salt velocity $m_1$ and water velocity $m_{0\min}$, the minimization problem becomes

$$\min_{\alpha, \beta} \left\{ f(\alpha, \beta) = \tfrac{1}{2} \| F(\alpha, \beta) - d \|_2^2 \right\}, \quad \text{such that } \beta_{\min} \le \beta \le \beta_{\max}.$$

We use an alternating minimization strategy, splitting the optimization procedure in two parts, namely minimization over the level-set parameters $\alpha$ and minimization over the background parameters $\beta$. We alternately update the level set and the background in a multi-scale fashion. Algorithm 1 presents the basic algorithm, whereas Figure 3 outlines the multi-scale work-flow.
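To make equations (2) and (3) concrete, the following sketch assembles a velocity model from level-set amplitudes and background weights. It is illustrative only: the piecewise-linear ramp used for the compact Heaviside is an assumption (chosen because the text says the sensitivity is constant around the boundary; the exact formula is not given here), and all function names are hypothetical.

```python
import numpy as np


def compact_heaviside(phi, eps):
    """Assumed compact Heaviside: 0 below -eps, 1 above +eps, linear in between."""
    return np.clip(0.5 * (1.0 + np.asarray(phi, dtype=float) / eps), 0.0, 1.0)


def heaviside_width(phi, kappa):
    """Equation (2): eps = kappa * [max(phi) - min(phi)].

    The tiny floor is an added guard (not in the paper) against a constant
    level-set function, which would otherwise give a zero width.
    """
    return max(kappa * (np.max(phi) - np.min(phi)), np.finfo(float).tiny)


def compose_model(A, alpha, B, beta, m0_min, m1, kappa):
    """Equation (3): m = (1 - h(A alpha)) * (m0_min + B beta) + h(A alpha) * m1."""
    phi = A @ alpha
    h = compact_heaviside(phi, heaviside_width(phi, kappa))
    sediment = m0_min + B @ beta
    return (1.0 - h) * sediment + h * m1


# Toy example with identity kernels: one sediment cell, one boundary cell, one salt cell.
A = np.eye(3)
B = np.eye(3)
alpha = np.array([-1.0, 0.0, 1.0])
beta = np.array([500.0, 1000.0, 1500.0])
model = compose_model(A, alpha, B, beta, m0_min=1500.0, m1=4500.0, kappa=0.05)
```

In this toy case the cell sitting exactly on the zero contour receives the average of the sediment and salt velocities, which shows how the width controls the sharpness of the salt boundary.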

Algorithm 1 Basic Joint Reconstruction Algorithm
Require: $d$ - data for frequency batch $I_f$, $F$ - operator
    $m_1$, $m_{0\min}$, $\beta_{\min}$, $\beta_{\max}$ - model prior information
    $A$, $B$ - kernel matrices, $\alpha^{0}$, $\beta^{0}$ - initial values
Ensure: $\{\alpha_{\text{final}}, \beta_{\text{final}}\}$, $m$ - final model
1: for $i = 1$ to InnerIter$_{\max}$ do
2:   for $j = 1$ to $J$ do
3:     $\beta^{(j+1)} = \beta^{(j)} - \left( H_\beta(\alpha^{(i-1)}, \beta^{(j)}) \right)^{-1} g_\beta(\alpha^{(i-1)}, \beta^{(j)})$
4:   end for
5:   $\beta^{(i)} = \beta^{(J)}$
6:   for $k = 1$ to $K$ do
7:     $\alpha^{(k+1)} = \alpha^{(k)} - \left( H_\alpha(\alpha^{(k)}, \beta^{(i)}) \right)^{-1} g_\alpha(\alpha^{(k)}, \beta^{(i)})$
8:   end for
9:   $\alpha^{(i)} = \alpha^{(K)}$
10: end for
11: compute $m$ from equation (3).

In step 3, we use an interior-point method that incorporates bounds constraints on the parameters and in step 7, a simple Newton method. In both these steps, the descent direction is calculated by a truncated conjugate-gradient method.

Figure 3: Multiscale Joint Reconstruction Algorithm

Bounds on sediment parameter

To avoid the delineation of salt in the sediment, it is important to put bounds on the sediment velocity. Figure 4 shows an example. We allow for smooth updates of the level set near the top region of the salt by constraining the sediment velocity to an upper bound close to the true velocity. As noticed in the FWI results, the region just below the top part demands very low velocities. This can also happen with the proposed approach and will slow down the reconstruction procedure. To accelerate the convergence, we impose an appropriate lower bound as a function of depth.

Figure 4: Bounds on the sediment velocity

EXAMPLES

To demonstrate the capabilities of the proposed method, we present numerical experiments on a synthetic model with acoustic data. Figure 5 shows a model 10 km long and 3 km deep, discretized with a 50-m grid spacing. The sediment is a staircase in depth below a 300-m water layer with velocities between 1500 and 4000 m/s. The salt body, embedded in the sediment, has its top at 200 m below the water bottom with a constant velocity of 4500 m/s. Sources were placed 10 m deep and 200 m apart. The source is a Ricker wavelet with a 15-Hz peak frequency and zero time lag. The data were acquired with receivers placed 50 m apart starting at a smallest offset of 100 m up to a largest of 4 km. To avoid a full inverse crime, a different finite-difference code generated the data for a model discretized with a much finer grid spacing of 50/8 m. The amplitude of the source wavelet is estimated for each frequency at every step (Pratt, 1999).

Figure 5: True velocity (in m/s) of the synthetic model. [Axes: x [km], z [km]; color scale: velocity in m/s.]

For the classic full-waveform inversion, we apply a spectral projected gradient method with bounds constraints (Schmidt et al., 2009) on the velocity. For the initial model, we take a linear velocity profile with depth. The inversion is performed in a multi-scale fashion over the frequency range 2.5–4.5 Hz with 200 iterations for each frequency batch and a total of 3 passes (Esser et al., 2015) over the frequency range. Figure 6 displays the results. The top of the salt near the water bottom has been reconstructed reasonably well, but not the salt body below. The sediment structure at larger depths is also lost.

Figure 6: Classic Full-Waveform Inversion results. [Axes: x [km], z [km]; color scale: velocity in m/s.]

The parametric level-set parameters $\alpha$ are initialized as shown in Figure 7. All negative RBFs have a value of $-1$ and all positive ones a value of $+1$. We place 25(z) × 20(x) nodes over the model grid for the background. The background parameters $\beta$ are initialized with a smooth linear trend in depth. We incorporate

prior information about the water bottom, at z = 350 m, in the initial model.

Figure 7: Initial model with velocities in m/s. The level set is initialized around the top of the salt. Positive RBFs are denoted by red pluses and negative by dots. [Axes: x [km], z [km]; color scale: velocity in m/s.]

The optimization over $\beta$ is performed with the interior-point method (fmincon in MATLAB®). We restrict ourselves to $J = 10$ iterations in Algorithm 1. We apply the minFunc (Schmidt, 2012) code in MATLAB® to optimize over $\alpha$, limited to $K = 20$ iterations. In both these steps, the step direction is calculated by at most 10 iterations of the conjugate gradient method. A total of 4 passes are made over the frequency range, along with 2 inner iterations for each frequency batch. In total, we perform about 960 iterations. The Heaviside width parameter ($\kappa$) is initialized with 0.05 and reduced by 20% after each frequency pass to produce sharp boundaries for the salt.

Figure 8: Model obtained by the joint reconstruction approach. Corresponding positive (pluses) and negative (dots) RBFs. [Axes: x [km], z [km]; color scale: velocity in m/s.]

Figure 8 shows the model obtained with the proposed method. The salt is reconstructed accurately at its top and sides. The sediment structure is reconstructed well down to a depth of 1.5 km as shown in Figure 10. Figure 9 illustrates the need for multiple passes over the frequency range. Figure 11 shows that the method manages to fit the data for the lower frequencies but not for the higher.

Figure 9: Reconstructed model after the 1st (a), 2nd (b) and 3rd (c) frequency pass. [Axes: x [km], z [km].]

Figure 10: Velocity profile at x = 2 km (a), x = 5 km (b), and x = 8 km (c). The blue line denotes the reconstructed velocity, the red dotted line denotes the true velocity, while the yellow dash-dotted line represents the initial velocity. [Axes: velocity [m/s], depth [km].]

Figure 11: Normalized data misfit (only real part) for final reconstructed model at (a) 2.5 Hz, (b) 3.5 Hz, and (c) 4.5 Hz. We normalized the differences by the maximum value of the true data per frequency. [Axes: sources, receivers; color scale from -0.1 to 0.1.]

CONCLUSIONS

We have proposed a joint reconstruction approach for salt delineation in seismic full-waveform inversion. Our approach is based on the idea of separating the model into salt and sediment. The salt geometry is defined with a level set represented by radial basis functions. The sediment, in turn, is represented by piecewise linear functions on a small number of nodes. This low-dimensional formulation of the model imposes smoothness on the sediment and on the salt geometries. The proposed compact approximation of the Heaviside function leads to faster convergence and produces no artifacts. We apply an alternating minimization strategy to optimize over the two different parameters. Results on a synthetic acoustic example demonstrate the method's capability. The proposed method accurately predicts the salt geometry where the conventional full-waveform inversion fails.

ACKNOWLEDGMENT

This work is part of the Industrial Partnership Programme (IPP) 'Computational sciences for energy research' of the Foundation for Fundamental Research on Matter (FOM), which is part of the Netherlands Organisation for Scientific Research (NWO). This research programme is co-financed by Shell Global Solutions International B.V. The third author is financially supported by the Netherlands Organisation for Scientific Research (NWO) as part of research programme 613.009.032. We acknowledge the Seismic Laboratory for Modeling and Imaging (SLIM) at UBC for providing computational software.

REFERENCES
Dahlke, T., B. Biondi, and R. Clapp, 2015, Domain decomposition of level set updates for salt
segmentation: 85th Annual International Meeting, SEG, Expanded Abstracts, 1366–1371,
http://doi.org/10.1190/segam2015-5917368.1.
Del Lungo, A., and M. Nivat, 1999, Reconstruction of connected sets from two projections, in G. T.
Herman and A. Kuba, eds., Discrete tomography: Springer, 163–188.
Esser, E., L. Guasch, T. van Leeuwen, A. Y. Aravkin, and F. J. Herrmann, 2015, Automatic salt
delineation - wavefield reconstruction inversion with convex constraints: 85th Annual
Guo, Z., and M. V. de Hoop, 2013, Shape optimization and level set method in full waveform inversion
with 3D body reconstruction: 83rd Annual International Meeting, SEG, Expanded Abstracts,
1079–1083, https://doi.org/10.1190/segam2013-1057.1.
Kadu, A., T. van Leeuwen, and W. A. Mulder, 2016, A parametric level-set approach for seismic full-
waveform inversion: 83rd Annual International Meeting, SEG, Expanded Abstracts, 1146–1150,
Kadu, A., T. van Leeuwen, and W. A. Mulder, 2017, Salt reconstruction in full waveform inversion with
a parametric level-set method: IEEE Transactions on Computational Imaging, 3, 1–11,
https://doi.org/10.1109/TCI.2016.2640761.
Lewis, W., B. Starr, and D. Vigh, 2012, A level set approach to salt geometry inversion in full-waveform
inversion: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–5,
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, https://doi.org/10.1190/1.1444597.
Santosa, F., 1996, A level-set approach for inverse problems involving obstacles: ESAIM: Control,
Optimisation and Calculus of Variations, 1, 17–33, https://doi.org/10.1051/cocv:1996101.
Schmidt, M., 2012, minfunc: unconstrained differentiable multivariate optimization in matlab,
http://www.di.ens.fr/mschmidt/Software/minFunc.html.
Schmidt, M. W., E. van den Berg, M. P. Friedlander, and K. P. Murphy, 2009, Optimizing costly
functions with simple constraints: A limited-memory projected quasi-newton algorithm:
Proceeding of the 12th International Conference on Artificial Intelligence and Statistics
(AISTATS), 456–463.
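Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, https://doi.org/10.1190/1.1441754.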
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:

Multi-scale seismic envelope inversion for salt structures using a new direct envelope Fréchet
derivative
Ru-Shan Wu*1, Guoxin Chen1,2 (1: University of California, Santa Cruz, USA; 2: Zhejiang University,
China)
SUMMARY

For strong-nonlinear FWI involved with salt structures or other strong-contrast structures, the standard envelope inversion does not work well. We introduce multi-scale window-averaged envelope (WAE) data and derive a new envelope Fréchet derivative (sensitivity operator) based on energy pulse scattering physics without using the chain rule of differentiation. This new envelope Fréchet derivative (EFD) does not depend on the weak nonlinearity assumption and can be applied to the case of strong-nonlinear inversion. In this paper we show the features of the new Fréchet derivative and its related adjoint source and gradient fields. Then the inversion results for the SEG 2D salt model are shown to support the theory and method.

INTRODUCTION

It is well-known that the traditional FWI is based on the weak scattering theory and can only be applied to weak-nonlinear full waveform inversion. This leads to the intrinsic difficulty of starting model dependence (for a review see Virieux and Operto, 2009). In order to obtain the large scale structure which can serve as a “good” starting model, different approaches were introduced to mitigate the difficulty. Shin & Cha (2008, 2009) have performed inversion in the Laplace and Laplace-Fourier domains to estimate the smooth, large-scale velocity structure. Seismic envelope inversion has recently been introduced and developed to recover the long-wavelength background structure without relying on the ultra-low frequency source (Wu et al., 2012, 2013, 2014; Chi et al., 2014; Luo et al., 2015). However, when envelope inversion is applied to large structures of strong-scattering media, such as those involved with salt, basalt or karst, the success is very limited. In this paper, we propose a multi-scale envelope inversion (MS-EI) and derive a new envelope Fréchet derivative (EFD) for the application to strong-nonlinear inversion. Numerical tests on a simple salt layer model and the SEG 2D salt model demonstrate the validity of the approach.

MULTI-SCALE ENVELOPE DECOMPOSITION

The envelope obtained from the instantaneous amplitude by AST (analytical signal transform) of waveform data (seismograms) is a time-varying energy curve. It is still a highly fluctuating curve. Through this nonlinear signal operator, the envelope operator, ULF (ultra-low-frequency) information coded nonlinearly in seismograms is extracted. After the demodulation process, the envelope data can be decomposed into different scales. Wu et al. (2016) proposed a multi-scale decomposition method by a window-average operator with different window-widths. Note that this scale decomposition of the envelope is very different from the linear decomposition of the waveform data. The low-frequency information filtered out from the envelope does not exist in the original waveform data, but is inherently coded in the envelope in the modulation model of the seismic signal. The multi-scale envelope functional, which is very nonlinear, is defined as (Wu et al., 2016)

$$d_W(u(t), t) = e_W(u(t), t) = \frac{1}{\tau_W} \int dt' \, W(t - t') \left[ u^2(t') + u_H^2(t') \right] \tag{1}$$

where $u_H(t)$ is the Hilbert transform of $u(t)$ (synthetic seismogram), $W(t)$ is a window function and $\tau_W$ is the effective window-width (a normalization constant). We plot the envelograms with different scales (different window widths from 400 ms to 1000 ms) for the waveform data of the SEG salt model in Figure 1. Since the envelope curve represents an energy-pulse reflection series, we see that envelope curves of different scales resemble the responses of reflectors to energy pulses of different widths.

Figure 1: Window-averaged envelope (WAE) profiles for the SEG salt model (shot at 10.2 km): (a) Original waveform traces; (b) Envelope (IE) profile; (c) WAE with width 400 ms; (d) WAE with width 1000 ms

DIRECT ENVELOPE FRÉCHET DERIVATIVE

Problem of using traditional waveform Fréchet derivative.

Multi-scale seismic envelope inversion with new Frechét derivative
Traditionally, the envelope Fréchet derivative (envelope sensitivity kernel) is derived from a chain rule of functional derivatives, and the implementation relies on the use of the waveform Fréchet derivative (WFD). In operator form, it is

$$\frac{\partial d}{\partial v} = \frac{\partial d}{\partial u}\,\frac{\partial u}{\partial v} = \frac{\partial d}{\partial u}\,F_u \qquad (2)$$

where ∂d/∂v = F_d is the data Fréchet derivative (data sensitivity operator), and F_u = ∂u/∂v is the waveform Fréchet derivative (waveform sensitivity operator). If the data functional is very nonlinear, the partial functional derivative ∂d/∂u is also nonlinear. However, when we introduce a nonlinear data functional to the waveform data, we hope the new data set will have a better linearity to the velocity perturbations. On the other hand, the chain rule of differentiation in fact linearizes both nonlinear partial derivatives ∂d/∂u and ∂u/∂v, which may destroy the advantage of the new nonlinear data functional. This can be explained clearly by the schematic graph in Figure 2, which shows the strong nonlinearity of the partial functional derivatives ∂d/∂u (green line) and ∂u/∂v (black line) in contrast with the weak nonlinearity of the direct functional derivative ∂d/∂v (red line). The traditional derivation of the gradient operator for a data functional using the chain rule assumes weak nonlinearity for both partial derivatives ∂d/∂u and ∂u/∂v, so that the gradient calculation for the misfit functional can make use of the waveform Fréchet derivative ∂u/∂v for easy programming. However, as we can see from Figure 2, the price to pay for the easy handling is to lose the advantage of introducing the new nonlinear data functional, resulting in a worse convergence property of the inversion process.

Fig.2: Schematic diagram to show the strong nonlinearity of partial functional derivatives ∂d/∂u (green line) and ∂u/∂v (black line) and the weak nonlinearity of direct functional derivative ∂d/∂v (red line).

New envelope Fréchet derivative (EFD) for strong-nonlinear data functional

Envelope formation can be formulated based on the theory of energy scattering (Wu et al., 2016). Due to the additivity that results from neglecting the interference in energy scattering, linear superposition is valid under the single scattering approximation for velocity (or impedance) perturbation, leading to a better linearity in the case of strong scattering such as the boundary scattering (reflection) of strong-contrast media. This is why we prefer to derive the envelope Fréchet derivative (EFD) directly with an energy formulation rather than deriving it through the chain rule and relating it to the waveform Fréchet derivative.

In waveform inversion, the sensitivity operator (Fréchet derivative) can be expressed in an operator form (Tarantola, 2005; Pratt et al., 1998)

$$\delta u = F_u\,\delta v = G_0 Q_0\,\delta v \qquad (3)$$

where G_0 is the background Green's operator, and Q_0 is the linearized virtual source operator, defined as (under the weak scattering approximation)

$$Q_0(x, x', t) = \frac{2}{v_0^3(x')}\,\frac{\partial^2 u_0(x', t)}{\partial t^2}\,\delta(x - x') \qquad (4)$$

where v_0(x) is the background velocity and u_0(x, t) = g_0(x, t; x_s) is the local incident wavefield excited by a shot at x_s. We see that in the case of the scalar wave equation, the virtual source operator is a diagonal operator and therefore is called the virtual source term (Pratt et al., 1998). For envelope inversion, the virtual source operator is different for boundary reflection (boundary scattering) and volume scattering. For the case of boundary scattering (backscattering or reflection) of strong-contrast media, the corresponding virtual source operator can be defined as

$$Q_0^{(e)}(x, x', t) = \frac{1}{v_0(x')}\,g_0^{(e)}(x', t; x_s)\,\delta(x - x') \qquad (5)$$

where the superscript "(e)" denotes "envelope". So g_0^{(e)} = |u_0| is the envelope or amplitude Green's function and, therefore, is the incident energy packet from the source. In operator form, the envelope sensitivity operator is written as

$$\delta e = F_E\,\delta v = G_E Q_E\,\delta v \qquad (6)$$

where F_E is the envelope Fréchet derivative (EFD), G_E = G_0^{(e)} is the envelope (or amplitude) Green's operator, and Q_E = Q_0^{(e)} is the envelope virtual source operator. In this way, we derive the envelope sensitivity operator (Fréchet derivative) directly based on energy scattering theory, and therefore no weak scattering assumption or weak nonlinearity of the waveform sensitivity operator is imposed. This direct envelope Fréchet derivative can improve the convergence of envelope inversion and is critical for multi-scale envelope inversion.

In order to see the merit of the new envelope Fréchet derivative compared with the traditional Fréchet derivative for envelope inversion, we now compare the adjoint sources and gradient fields of
envelope inversion related to the two different Fréchet derivatives. For envelope inversion, defining the envelope as our data, we can apply the adjoint operator of F_E to derive the gradient operator for envelope inversion. Applying F_E^T, which is the transpose (approximate adjoint) of F_E, to equation (6) and solving the resulting normal equations gives

$$\delta v = (F_E^T F_E)^{-1} F_E^T\,\delta e \qquad (7)$$

This is a generalized linear inversion, and F_E^T F_E is recognized as the approximate Hessian operator. For the gradient method, we need only the adjoint envelope Fréchet derivative:

$$F_E^T = (G_E Q_E)^T = Q_E^T G_E^T \qquad (8)$$

We see from (7) and (8) that the adjoint source when using the envelope Fréchet derivative is just the envelope residual δe. In comparison, if we use the waveform Fréchet derivative as in the traditional envelope inversion, the adjoint source will be (∂d/∂u)^T δe. Due to the strong filtering of the partial derivative (∂d/∂u)^T, the ultra-low-frequency (ULF) components in the envelope data are lost. In the case of large-scale envelope data, the ULF information may be totally lost. Figure 3 shows the comparison of adjoint sources between the conventional FWI (a), envelope inversion (EI) using the waveform Fréchet derivative (b), and the MS-EI using the new envelope Fréchet EFD (c). In Figure 4 we give the corresponding spectra. We know that the adjoint source for traditional FWI is the waveform data residual, so its spectrum is restricted to the effective band of the source spectrum (red spectrum). Conventional EI using the waveform Fréchet derivative can expand the spectrum towards the low-frequency band, but the expansion is severely filtered out due to the weak nonlinearity assumption. In contrast, for the MS-EI using the new EFD the ultra-low-frequency components are preserved in the adjoint sources.

Fig.3: Comparison of adjoint sources between the conventional FWI (a), EI (envelope inversion) using the waveform Fréchet derivative (WFD) (b), and the MS-EI using the new envelope Fréchet derivative (EFD) (c).

Fig.4: Comparison of adjoint source spectra: FWI (red), conventional EI (green), and MS-EI using the new envelope Fréchet derivative (EFD) (blue).

Next, we show the difference in gradient fields between using the new EFD and using the traditional WFD for envelope data. Figure 5 shows the gradient fields using different Fréchet derivatives: (a) conventional EI using WFD; (b)-(h) MS-EI using EFD with different window widths: (b) original, (c) to (h) are 20ms, 50ms, 100ms, 200ms, 300ms, and 500ms, respectively. We see that for this strong-nonlinear case of salt model inversion, using the traditional WFD derived by the chain rule results in a gradient similar to the case of FWI, and the gradient reaches only shallow depth, while the new EFD can reach greater depth, since the adjoint source in the latter case is just the envelope residual. For the multi-scale envelope data, the new Fréchet EFD depicts the better linear correspondence between the multi-scale data and the multiple-scale velocity structures (Figure 5 b-h). Figure 6 gives the multiple-scale gradient field of MS-EI by superposing the individual gradient fields, showing how the gradient field resembles the gross feature of the salt structure. Compared to the gradient field of EI using the traditional Fréchet (WFD) (Figure 5a), the superiority of the new Fréchet is obvious.

NUMERICAL TESTS OF MS-EI (MULTI-SCALE ENVELOPE INVERSION) ON THE SEG 2D SALT MODEL

Now we show the inversion results of our MS-EI using the new EFD applied on the SEG 2D salt model. There are 128 shots distributed along the model surface at intervals of 120m. For each shot, we use 645 receivers with intervals of 24m. A low-cut Ricker wavelet is used as source in the test (cut from 4Hz below, 4-5Hz is the taper zone). The dominant frequency of the source is 9 Hz. We set the linear gradient model as the initial model. For comparison, we plot the results of conventional FWI in Fig. 7(a) and conventional EI+FWI in Fig. 7(b). Due to the strong contrast and large size of the salt body, the traditional FWI can only see the top
salt boundary. For the same reason, conventional envelope inversion cannot penetrate deep into the salt body. The conventional seismic EI still has a cycle-skipping problem, although less severe than FWI. Secondly, conventional EI derives the gradient using the chain rule of differentiation and updates the model using the waveform Fréchet derivative, which is based on the weak-scattering assumption. In this way, many severe limitations of the first-order waveform Fréchet derivative are expected to bring difficulties to the conventional EI or EI+FWI. In sharp contrast, we plot the final results of MS-EI with different window widths in Fig. 7(c). In the iteration process we have applied the regular FWI at the end of each loop to recover the details of the salt top. In addition, we combined the multi-offset method into the last loop to better delineate the salt bottom. For implementation details, see the accompanying abstract (Chen et al., SEG 2017). We see that the large-scale structure has been recovered due to the use of the new Fréchet EFD and the multi-scale decomposition of envelope data.

Fig.5: Gradient fields using different Fréchet derivatives: (a) conventional EI using waveform Fréchet derivative (WFD); (b)-(h) MS-EI using envelope Fréchet derivative (EFD) with different window widths: (b) original envelopes, (c) width = 20ms, (d) 50ms, (e) 100ms, (f) 200ms, (g) 300ms, (h) 500ms.

Fig.6: Gradient field of MS envelope Fréchet (EFD) with multi-scale envelope data (window widths 0-500ms) (see Figure 5). Data are produced with a low-cut (from 4Hz below) source.

Figure 7: Inversion results for the SEG 2D salt model. A 1-D linear model is used as the starting model: (a) conventional FWI; (b) conventional EI+FWI; (c) MS-EI using the new Fréchet (EFD).

CONCLUSIONS

Multi-scale envelope inversion (MS-EI) has two key ingredients: one is the multi-scale (MS) decomposition of envelope curves, the other is the introduction of a new envelope Fréchet derivative (EFD) operator and a gradient method based on the operator. The direct envelope Fréchet derivative (EFD) has a much better linearity with respect to the velocity variations. Therefore, the new EFD plays a critical role in multi-scale envelope inversion (MS-EI). We demonstrated both analytically and numerically that the new theory/method of MS-EI can be applied to the case of strong-nonlinear inversion for large-scale strong-contrast media.

Acknowledgments

This work is supported by the WTOPI (Wavelet Transform On Propagation and Imaging for seismic exploration) Research Consortium and other funding resources at the Modeling and Imaging Laboratory, University of California, Santa Cruz. We are grateful to our Consortium sponsors for their financial support and for allowing us to publish our research results.
REFERENCES
Bozdag, E., J. Trampert, and J. Tromp, 2011, Misfit functions for full wave-form inversion based on
870, https://doi.org/10.1111/j.1365-246X.2011.04970.x.
Bharadwaj, P., W. Mulder, and G. Drijkoningen, 2016, Full waveform inversion with an auxiliary bump
functional: Geophys. J. Int., 206, 1076–1092, https://doi.org/10.1093/gji/ggw129.
Bunks, C., F.M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, https://doi.org/10.1190/1.1443880.
Chi, B., L. Dong, and Y. Liu, 2014, Full waveform inversion method using envelope objective function
without low frequency data: J. of Applied Geophysics, 109, 36-46,
https://doi.org/10.1016/j.jappgeo.2014.07.010.
Luo, J. and R.S. Wu, 2015, Seismic envelope inversion: Reduction of local minima and noise resistance:
Geophys. Prospecting, 63, 597-614, https://doi.org/10.1111/1365-2478.12208.
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migrations, in Bednar, J. B.,
Redner, R., Robinson, E., and Weglein, A., eds., Conference on Inverse Scattering: Theory and
Application, Soc. Industr. Appl. Math.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space
seismic waveform inversion: Geophysical Journal International, 133, 341–362,
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Shin, C. and Y.H. Ha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922-931, https://doi.org/10.1111/j.1365-246X.2008.03768.x.
Sirgue, L. and R.G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
temporal frequencies: Geophysics, 69, 231–248, https://doi.org/10.1190/1.1649391.
Tarantola, A., 1987, Inverse problem theory: Elsevier.
Virieux J. and S. Operto, 2009, An overview of full waveform inversion in exploration geophysics:
Geophysics 74, no. 6, WCC1–WCC26, https://doi.org/10.1190/1.3238367.
Wu, R.S., J. Luo and B. Wu, 2013, Ultra-low-frequency information in seismic data and envelope
inversion: 83rd Annual International Meeting, SEG, Expanded Abstracts, 3078-3082,
Wu, R.S., J. Luo and B. Wu, 2014, Seismic envelope inversion and modulation sigma model: Geophysics
79, No 3, WA13-24, https://doi.org/10.1190/geo2013-0294.1.

Full waveform inversion with 3D shape optimization on unstructured meshes
Jia Shi*, Ruichao Ye, and Maarten V. de Hoop, Rice University
SUMMARY

We apply 3D shape optimization via the Full Waveform Inversion (FWI) technique for recovery of the subsalt geometries. We use the polyhedral representation of regional subsurface structures and partition them into unstructured tetrahedral meshes. Naturally we employ the continuous Galerkin (CG) finite element method for modeling time-harmonic waves. We then derive the shape derivatives and the Hausdorff warping and utilize meshing techniques for robust shape evolution. Several computational experiments are presented to illustrate the forward solution and the gradient flows. These gradient flows help us to correct the interior boundaries of the complicated geological features.

INTRODUCTION

Subsalt recovery is an important but difficult problem. The wave speeds of salt are typically much higher than those of the surrounding materials. To model the physical property more precisely, we use a polyhedral representation particularly for these high-contrast geological features, such as the salt boundaries, fluid-solid boundaries, fault planes, sediment layers, etc. The representations are initiated via segmentation using an image of material parameters as input, generating interfaces that divide subdomains. Here we mention two related early works: Shin (1988) made use of the blocky representation of geological models for the elastic wave inversion and Hale (2002) implemented triangular unstructured meshes for seismic imaging and reservoir simulation. More recently, Qiu et al. (2016) utilized a steerable variation regularization in FWI for recovering subsalt bodies. Lewis and Vigh (2016) applied a level set method for salt geometry recovery in FWI.

To study the generality of geological structures, we consider the general models with piecewise constant model parameters and without the usual smoothness along the shapes. To begin with our shape recovery approach, we mention four previous works. For the theoretical part, Beretta et al. (2015) established a Lipschitz stability estimate for stable determination of polyhedral interfaces from boundary data. For mesh algorithms, Guo and de Hoop (2013) applied a level set method for constructing the 3D salt bodies. Ye and de Hoop (2015) studied 3D shape optimization based on a fully unstructured tetrahedral mesh with adaptive deformation, quality control, and local refinements. For modeling the seismic waves on unstructured meshes, the finite element method is a natural choice. Shi et al. (2016) introduced the CG finite element method for modeling time-harmonic elastic waves and applied FWI to update both P- and S-wave speeds on a fixed unstructured tetrahedral mesh.

In this work, we study the Hausdorff distance, which measures the differences between two meshes. We then establish the shape derivatives from FWI via the adjoint state method. With a fully unstructured mesh, we do not need a level set method for constructing the shape derivatives. Our shape derivatives are computed from boundary integrals over the triangular boundaries of the target shapes. We also study several techniques in mesh deformation. In our computational experiments, we illustrate the time-harmonic waves using SEAM phase I as an example. We build the SEG/EAGE 3D salt model as a target model and a trial tetrahedral mesh with an initial guess of the salt body. We illustrate the gradient flows on the shape via vector plots with arrows. We expect the shape derivatives will correct the initial shape through our iterative reconstruction.

THEORY

We consider a partitioning of the computational domain with a fully unstructured tetrahedral mesh. In practice, a fully unstructured tetrahedral mesh admits local refinement with piecewise constant wave speeds and can approximate the geological models accurately. Fig. 1 shows the mesh we generate for SEAM phase I. Fig. 1(a) shows a triangular ocean bottom mesh and Fig. 1(b) illustrates a triangular salt body mesh.

Since we update the internal shapes of the geological bodies, we use the Hausdorff distance to describe the differences between the current mesh Ω and the ideal true mesh Ω†. For the true model, ideally we suppose the target subdomain Ω† ⊂ ℝ³ is known, and the domain containing Ω† is properly partitioned into a tetrahedral mesh, with the boundary ∂Ω† aligned with the triangulated interfaces. We define the distance function from any spatial point x to Ω as

$$d_\Omega(x) = \min |x - V_j|, \quad \forall V_j \in \Omega,$$

where V_j is a vertex in Ω. Then a Hausdorff distance d_H can be expressed as

$$d_H(\Omega, \Omega^\dagger) = \max\left( \max_{V_i \in \Omega} d_{\Omega^\dagger}(V_i),\ \max_{V_j^\dagger \in \Omega^\dagger} d_\Omega(V_j^\dagger) \right).$$

Via studying the Hausdorff distance d_H between partitions using tetrahedra, Beretta et al. (2015) established a theory of
a Lipschitz stability estimate from the Dirichlet-to-Neumann map. Although the real data is essentially the single-layer potential operator, the stability estimates for the Dirichlet-to-Neumann map directly carry over to stability estimates for the real data operator. Lipschitz stability estimates provide us a framework for iterative reconstruction. Indeed, the convergence radius of this reconstruction is determined by the stability constant. The information about the Hausdorff distance d_H between two different meshes can naturally be transformed to the vertices of the tetrahedra forming the new meshes. Via minimizing the Hausdorff distance d_H, the recovery of the polyhedral interfaces becomes a shape optimization. Ye and de Hoop (2015) obtained a sequence of tetrahedral meshes for reconstructing the true mesh.

In reality, the true shape needs to be revealed and the Hausdorff distance d_H cannot be measured directly. But the Hausdorff distance connects to the misfit between real and synthetic data. Hence we now employ the FWI technique to detect the shape information for subsalt recovery.

Figure 1: SEAM example: (a) mesh for the ocean bottom, (b) mesh for the salt body.

Shape derivatives

Before we derive the shape derivatives, we describe the model information mathematically. Let {T_j}, j = 1, ..., N, be a regular partition of Ω into tetrahedra, so that Ω = ∪_{j=1}^N T_j and, for j ≠ k, either T_j ∩ T_k = ∅ or it consists of a common vertex, a common edge or a common facet. We consider the piecewise constant Helmholtz potential

$$q(x) = \sum_{j=1}^{N} q_j\,\chi_{T_j}(x),$$

where q(x) = 1/V_p(x)². At t_0, the mesh is denoted as

$$\Omega^{t_0} = \bigcup_{j=1}^{N} T_j^{t_0}.$$

Given a misfit functional ℱ, we obtain the shape derivative

$$\frac{d}{dt}\mathcal{F}\Big|_{t=t_0} = \sum_{j=1}^{N} \omega^2 q_j \int_{\partial T_j^{t_0}} u(x;t_0)\,v(x;t_0)\,\big(F_{j,t_0}(x)\cdot n_j\big)\,ds_x,$$

where n_j is the exterior normal to ∂T_j^{t_0}, s_x is the surface measure, ω is the frequency, u(x;t_0) solves the forward problem given a mesh Ω^{t_0}, v(x;t_0) solves the adjoint problem, and F_{j,t_0}(x): ℝ³ → ℝ³ is the affine map with the property that for all four vertices in a tetrahedron

$$F_{j,t_0}\big(V_{j,i}^{(0)} + t_0\,v_{j,i}\big) = v_{j,i}, \quad \text{for } i = 1, 2, 3, 4,$$

where V_{j,i}^{(0)} denotes the initial vertex and v_{j,i} here is the direction of a particular vertex. More details can be found in Beretta et al. (2015).

In practice, we perform integration over the internal surfaces and sum over all the available sources. Initially the shape derivative provides us information to move the internal facets along their normal directions, and the lengths of their movements are determined by the values of the surface integrals. We then derive the direction and relative magnitude of movement for each vertex. It will depend on the shape derivative information among all the triangles which contain this vertex.

Deformation techniques

We move vertex V_j from V_j^{t_0} to V_j^t. Then, the linearized shape deformation scheme can be described by

$$V_j^{t} = V_j^{t_0} - \alpha\,\frac{d}{dt}\mathcal{F}(V_j^{t})\,n(V_j^{t}),$$

with α some proper step size, n(V_j^t) the update direction of this vertex V_j^t and (d/dt)ℱ(V_j^t) the gradient magnitude at this vertex V_j^t.

Based on this information, we deform the interfaces. Since we expect some significant changes over the shape of the surface, it is quite possible that the mesh qualities of the surrounding volume elements will be bad. Hence we design an edge-based joint refinement and coarsening algorithm for repairing
the volume mesh, removing undesired elements. Several numerical illustrations are shown in Ye and de Hoop (2015). The mesh refinement is conducted by dividing the edges and forming new vertices from their centers. Ye and de Hoop (2015) designed a multi-level approach for mesh evolution. Within each level, we apply a local refining approach with a penalty term controlling the mesh quality. They also designed an edge-based joint refinement and coarsening algorithm for repairing the volume mesh, removing undesired elements. We now have a reliable mesh deformation algorithm.

COMPUTATIONAL EXPERIMENTS

Our model is constructed with appropriate domain partitioning from segmentation of a 3D image into an unstructured tetrahedral mesh. We generate our initial 3D mesh using TetGen (Si, 2015). Since we use fully unstructured meshes, we cannot obtain the geometrical information directly. Instead, we utilize an algebraic description of the mesh, including elements, neighbors of each element and vertices. We design our FWI scheme for computing the shape derivatives. We illustrate one forward solution for SEAM phase I using the CG finite element method. Both forward and FWI algorithms have been parallelized for computational efficiency. We illustrate the gradient flows on two different shapes. These gradient flows provide the updating direction.

Figure 2: (a) SEAM Vp model with axes, (b) a time-harmonic solution at 14.0 Hz. [Panel axes: X, Y, Z in km; color scale in (a): Vp from 1.490 to 4.800 km/s.]
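
The algebraic mesh description mentioned above (elements, neighbors of each element, and vertices) can be sketched on a toy mesh as follows. This is only an illustrative sketch under stated assumptions: the function names, the region labels, and the three-element mesh are hypothetical and are not taken from the authors' code or from TetGen's output format. Faces shared by elements of different regions are the triangles over which boundary integrals such as the shape derivative would be evaluated.

```python
from collections import defaultdict
from itertools import combinations


def face_table(tets):
    """Map each triangular face (sorted vertex ids) to the elements that own it."""
    owners = defaultdict(list)
    for elem, tet in enumerate(tets):
        for face in combinations(sorted(tet), 3):
            owners[face].append(elem)
    return owners


def element_neighbors(tets):
    """Face-adjacent neighbors of every tetrahedron."""
    neighbors = [[] for _ in tets]
    for elems in face_table(tets).values():
        if len(elems) == 2:          # interior face shared by two elements
            a, b = elems
            neighbors[a].append(b)
            neighbors[b].append(a)
    return [sorted(n) for n in neighbors]


def interface_faces(tets, labels):
    """Faces separating elements with different region labels (e.g. salt/sediment)."""
    return sorted(
        face
        for face, elems in face_table(tets).items()
        if len(elems) == 2 and labels[elems[0]] != labels[elems[1]]
    )


# Hypothetical toy mesh: three tetrahedra on six vertices.
vertices = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0),
]
tets = [(0, 1, 2, 3), (1, 2, 3, 4), (1, 2, 4, 5)]
labels = ["sediment", "salt", "salt"]

neighbors = element_neighbors(tets)
salt_boundary = interface_faces(tets, labels)
```

Keying faces by their sorted vertex ids keeps the lookup independent of element orientation, which is one simple way to recover geometric adjacency from a purely index-based mesh.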
Forward modeling

To model complex geometries accurately, we capture these internal discontinuities using triangular meshes via a level set method. The mesh examples are illustrated in Fig. 1. We then construct the geological models using unstructured tetrahedral meshes. Fig. 2(a) shows the Vp model of size 7km×8km×3km with axes. This model contains roughly 3,000,000 elements, which is enough for representing the major geological features. The CG finite element method is a natural choice for solving this problem. We design an efficient scheme for the general boundary value problems and implement Perfectly Matched Layers (PML). We also parallelize this method via domain partitioning to obtain distributed matrices. We construct various polynomial orders for better accuracy. We use polynomial order p = 3 in this case, which generates a sparse matrix of size n = 1.35 × 10^7. Fig. 2(b) shows a pressure field at 14.0Hz, given a source near the ocean surface and PMLs covering the whole computational domain.

Gradient flows on the shapes

Here we study the shape derivatives from our computational experiments. Since we partition the computational domain, the shape derivatives are computed in a parallel way as well. The master computer node then collects all the information for constructing the final gradient flow. We use the SEG/EAGE 3D salt model for our numerical tests. Fig. 3(a) shows the starting model with an initial guess of the salt body. Fig. 3(b) illustrates the true model with the true salt body. The shape of the salt body is complicated, but the triangular mesh can very well outline the structure. The model has a size of 8km×5km×4km and contains roughly 250,000 elements. We use polynomial order p = 2 for constructing the CG matrices. We applied 50 sources and 50 receivers at the upper surface for computing the FWI shape gradients.

For the shape optimization test, we start at 2.0Hz. We do not update the wave speed here. For better illustration, we illustrate the minus gradient, i.e. −(d/dt)ℱ(V_j^t) n(V_j^t). Fig. 4(a) shows the pure gradient flow; the color indicates the magnitude of the flow vector. Fig. 4(b) contains a lighter shape, which is the true shape shown in Fig. 3(b). The gradient flow gives a reasonable direction for mesh evolution. We note that most vertices are moving in the right direction, i.e. toward the lighter true mesh area. After we update the shape as shown in Ye and de Hoop (2015), the new shape is closer to the true one. We move to 4.0Hz and expect to obtain more resolution. Fig. 5(a) shows the new pure gradient flow on the updated shape. Similarly, the color indicates the magnitude of the gradient flows. Some scaling terms are needed for balancing the amplitudes of the shape derivatives, since the amplitudes of the near-surface gradient are typically stronger than the amplitudes of the deep part of the gradient. Fig. 5(b) contains a lighter true shape and a purple updated shape with its edge information. Although the gradient flow does reasonably well for the upper surface, one may still have difficulties updating the deep part of the shapes, since the data are collected from the ocean surface.
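
The linearized vertex update from the deformation scheme and the vertex-based Hausdorff distance from the THEORY section can be sketched together on a synthetic test where, unlike in practice, the true shape is known. This is a minimal sketch under stated assumptions: the function names, the octahedral test shapes, the unit gradient magnitudes, and the step size are all made up for illustration, and the gradient here is prescribed rather than computed from FWI.

```python
import math


def distance_to_vertices(x, vertices):
    """d_Omega(x): distance from point x to the nearest vertex of a mesh."""
    return min(math.dist(x, v) for v in vertices)


def hausdorff_distance(verts_a, verts_b):
    """Vertex-based Hausdorff distance between two meshes."""
    return max(
        max(distance_to_vertices(v, verts_b) for v in verts_a),
        max(distance_to_vertices(v, verts_a) for v in verts_b),
    )


def deform(vertices, directions, grad_magnitudes, alpha):
    """One linearized step V <- V - alpha * (dF/dt)(V) * n(V) for every vertex."""
    return [
        tuple(c - alpha * g * nc for c, nc in zip(v, n))
        for v, n, g in zip(vertices, directions, grad_magnitudes)
    ]


# Hypothetical test shapes: octahedron vertices at radius 1.0 (true) and 1.5 (trial).
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
true_shape = [tuple(1.0 * a for a in axis) for axis in axes]
trial_shape = [tuple(1.5 * a for a in axis) for axis in axes]

directions = axes                      # outward unit directions n(V)
grad_magnitudes = [1.0] * len(axes)    # made-up positive dF/dt at each vertex

before = hausdorff_distance(trial_shape, true_shape)
updated = deform(trial_shape, directions, grad_magnitudes, alpha=0.3)
after = hausdorff_distance(updated, true_shape)
```

With these made-up inputs the step shrinks the trial shape toward the true one, so the Hausdorff distance drops from about 0.5 to about 0.2. A depth-dependent scaling of `grad_magnitudes` could be applied before the update to balance near-surface and deep contributions.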
Figure 3: Models: (a) starting model, (b) true model. [Panel axes: X, Y, Z in km; color scale: Vp from 1.500 to 4.500 km/s.]

Figure 4: The shape derivative on the initial shape: (a) pure illustration of the gradient flow, (b) comparison between the lighter (true) shape and the purple (initial) shape. [Color scale: node movement, 0.000 to 1.168.]

Figure 5: The shape derivative on the updated shape: (a) pure illustration of the gradient flow, (b) comparison between the lighter (true) shape and the purple (updated) shape. [Color scale: node movement, 0.000 to 1.168.]

CONCLUSION

We present our derivation and numerical construction for the shape derivatives using our FWI framework. Most current shape optimization algorithms are based on regular meshes; we note that unstructured meshes are the natural choice for modeling complicated geometries. We make use of unstructured tetrahedral meshes for representing the geological models and the CG finite element method for modeling time-harmonic waves. We illustrate our gradient flows on the shapes with edge information. The gradient flow offers us the direction to move the vertices. We can then either deform the mesh or remesh the whole computational domain. Via this framework, we rebuild the geometries iteratively for subsalt recovery.

ACKNOWLEDGEMENTS

This research is supported by the members of the Geo-Mathematical Imaging Group at Rice University. J.S. would also like to thank PGS for the use of their computational resources.
REFERENCES
Beretta, E., M. V. de Hoop, E. Francini, and S. Vessella, 2015, Stable determination of polyhedral
interfaces from boundary data for the Helmholtz equation: Communications in Partial Differential
Equations, 40, 1365–1392.
Guo, Z., and de Hoop, M. V., 2013, Shape optimization and level set method in full waveform inversion
with 3D body reconstruction: 83rd Annual International Meeting, SEG, Expanded Abstracts,
1079–1083, http://dx.doi.org/10.1190/segam2013-1057.1.
Hale, D., 2002, Atomic meshes: From seismic imaging to reservoir simulation: Proceedings of the 8th
European Conference on the Mathematics of Oil Recovery.
Lewis, W., and Vigh, D., 2016, 3D salt geometry inversion in full-waveform inversion using a level-set
method: 86th Annual International Meeting, SEG, Expanded Abstracts, 1221–1226,
Qiu, L., Chemingui, N., Zou, Z., and Valenciano, A., 2016, Full-waveform inversion with steerable
variation regularization: 86th Annual International Meeting, SEG, Expanded Abstracts, 1174–
1178, http://dx.doi.org/10.1190/segam2016-13872436.1.
Shi, J., de Hoop, M., Faucher, F., and Calandra, H., 2016, Elastic full-waveform inversion with surface
and body waves: 86th Annual International Meeting, SEG, Expanded Abstracts, 1120–1124,
Shin, C., 1988, Nonlinear elastic wave inversion by blocky representations: Ph.D. thesis, University of
Oklahoma.
Si, H., 2015, TetGen, a Delaunay-based quality tetrahedral mesh generator: ACM Transactions on
Mathematical Software, 41, 11, http://dx.doi.org/10.1145/2629697.
Ye, R., and de Hoop, M. V., 2015, 3D shape optimization based on unstructured triangle/tetrahedral mesh
deformation: 85th Annual International Meeting, SEG, Expanded Abstracts, 1351–1355,

Efficient 3D elastic FWI using a spectral-element method on Cartesian-based mesh
P.T. Trinh1,2 , R. Brossier2 , L. Métivier2,3 , L. Tavard2,4 , J. Virieux2 , P. Wellington2
1 Total EP, 2 Univ. Grenoble Alpes, ISTerre, 3 Univ. Grenoble Alpes, CNRS, LJK 4 Univ. Grenoble Alpes, GRICAD
SUMMARY

Full Waveform Inversion offers the possibility to extract high-resolution quantitative multi-parameter models of the subsurface from seismic data. Heretofore, most FWI applications at the crustal scale have been performed under the acoustic approximation, generally for marine environments. When considering challenging land problems, efficient strategies are required for moving toward elastic inversion. We present such an approach for 3D elastic time-domain inversion, based on spectral element methods designed on Cartesian-based meshes. The proposed workflow integrates an easy and accurate Cartesian-based mesh building with high-order shape functions to capture rapid topography variations and an efficient workflow for the incident and adjoint fields computation. A nonstationary and anisotropic structure-oriented smoothing filter is implemented directly on the spectral element mesh, for preconditioning FWI by incorporating prior geological information such as coherent lengths, dip and azimuth angles. Numerical illustrations on Marmousi and SEAM II benchmarks illustrate the importance of each ingredient we have developed for making efficient and flexible elastic FWI for land applications.

INTRODUCTION

High-resolution quantitative multi-parameter models of the subsurface are essential for crustal exploration. By considering the entire information contained in seismic data, full waveform inversion (FWI) (Virieux and Operto, 2009) offers the possibility to extract such models. While most FWI applications at the crustal scale have been performed in the acoustic approximation for a decade, mainly for marine-acquired data, it now becomes mandatory to tackle challenging land targets. The complex geology and possibly complex topography, found in many land environments, require elastic effects to be considered, and therefore an accurate and affordable 3D elastic FWI engine to be developed. Due to the complexity of the acquired seismic data, multiple frequency components are required to better constrain the inverse problem. Therefore, time-domain approaches are preferred for their ability to apply time-windowing and to process the data (Brossier et al., 2009).

Moving toward elastic modeling could be done by a natural extension of the finite-difference (FD) methods widely used for acoustic modeling. However, free-surface and near-surface representation, as well as free-surface effects, can be challenging to model with FD. Conversely, finite element (FE) methods can handle such boundary conditions very accurately. For large scale problems, spectral element methods (SEM) appear to be accurate, efficient and flexible for 3D elastic modeling and FWI (Komatitsch and Tromp, 1999; Fichtner et al., 2008; Tape et al., 2010; Peter et al., 2011). One drawback of conventional SEM is the requirement of using and building hexahedral meshes, which can be a challenging and time-consuming task.

In this work, we present an efficient SEM-based 3D elastic FWI approach (SEM3D code) designed for crustal-scale exploration. This implementation relies on (1) Cartesian-based deformed mesh with high-order shape functions to capture complex topographies; (2) two Message Passing Interface (MPI)-based parallelism levels for tackling large scale and multiple shots experiments, associated with an efficient computation of incident and adjoint fields through optimized computing kernels (Deville et al., 2002); (3) structurally-based nonstationary and anisotropic smoothing filter implemented as a partial differential equation (PDE) solved with SEM on the modeling mesh (Trinh et al., 2017).

WAVE PROPAGATION AND FWI IN SEM3D

A classical hexahedra-based SEM frame is considered for elastic modeling (Komatitsch and Tromp, 1999): the physical domain $\Omega$ is decomposed into a set of non-overlapping hexahedral elements. Each element can be mapped to the unitary reference space of Gauss-Lobatto-Legendre (GLL) points, where the cube $[-1, 1] \otimes [-1, 1] \otimes [-1, 1]$ is discretized into a set of $(N+1)^3$ GLL points $(\xi_{k_1}, \eta_{k_2}, \zeta_{k_3})$; $k_1, k_2, k_3 = 0, \ldots, N$, where $N$ refers to the interpolation order. These collocation points define $(N+1)^3$ basis functions, which are triple products of Lagrange polynomials of degree $N$. Considering this choice of basis functions and the GLL quadrature for numerical integration, the weak form of the second-order PDE governing the elastic wave propagation can be written as:

$$ \mathbf{M}\,\partial_{tt}\mathbf{u} = -\mathbf{K}\mathbf{u} + \mathbf{F}, \tag{1} $$

where the displacement field is denoted by $\mathbf{u}$, the global mass and stiffness matrices by $\mathbf{M}$ and $\mathbf{K}$, respectively, and the source term by $\mathbf{F}$. The global mass matrix $\mathbf{M}$ is diagonal by construction. The free-surface condition is naturally taken into account by the weak formulation. A second order explicit Newmark scheme is implemented for the time integration (Komatitsch, 1997).

The inversion problem relies on a classical least-squares norm given by

$$ C(\mathbf{m}) = \frac{1}{2}\,\|\mathbf{d}_{obs} - \mathbf{d}_{cal}\|^2, \tag{2} $$

which computes the $L_2$ distance between the recorded seismic data $\mathbf{d}_{obs}$ and the modeled seismic data $\mathbf{d}_{cal}$. In the time domain, the gradient of $C(\mathbf{m})$ with respect to the elastic tensor coefficients $C_{ij}$ can be computed through the adjoint-state approach (Plessix, 2006; Vigh et al., 2014)

$$ g(\mathbf{x}) = \frac{\partial C(\mathbf{m})}{\partial C_{ij}} = \left\langle \bar{\varepsilon},\ \frac{\partial \mathbf{C}}{\partial C_{ij}}\,\varepsilon \right\rangle_{\Omega,t}, \tag{3} $$

where $\bar{\varepsilon}$ and $\varepsilon$ are respectively adjoint and incident strain fields. The matrix $\mathbf{C} = (C_{ij})_{6\times 6}$ contains the elastic tensor coefficients, with 21 independent components in the case of full anisotropy. The gradient for any parameter $\alpha$ (seismic velocity, anisotropic parameter, impedance, ...) can then be computed
© 2017 SEG Page 1533

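by chain rule using the density $\rho$ and $C_{ij}$ elementary gradients:

$$ \frac{\partial C}{\partial \alpha} = \sum_{i=1}^{6}\sum_{j=i}^{6} \frac{\partial C}{\partial C_{ij}}\frac{\partial C_{ij}}{\partial \alpha} + \frac{\partial C}{\partial \rho}\frac{\partial \rho}{\partial \alpha}. \tag{4} $$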
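As an illustration of the chain rule in Equation (4), the sketch below maps elementary gradients to a velocity-density parameterization for the special case of an isotropic medium, where $C_{11}=C_{22}=C_{33}=\rho V_p^2$, $C_{44}=C_{55}=C_{66}=\rho V_s^2$ and $C_{12}=C_{13}=C_{23}=\rho(V_p^2-2V_s^2)$. The function name, argument layout and the choice of the $(V_p, V_s, \rho)$ parameterization are assumptions made for this example; they are not taken from the SEM3D code.

```python
import numpy as np

def isotropic_velocity_gradients(grad_c, grad_rho, vp, vs, rho):
    """Map elementary gradients (dC/dC_ij, dC/drho) to (Vp, Vs, rho) gradients.

    grad_c has shape (6, 6, ...) in Voigt notation; only the upper triangle
    (j >= i) is read, matching the double sum in Equation (4).
    Hypothetical helper for an isotropic medium only.
    """
    g = grad_c
    g_diag_p = g[0, 0] + g[1, 1] + g[2, 2]   # C11, C22, C33 = rho * vp**2
    g_diag_s = g[3, 3] + g[4, 4] + g[5, 5]   # C44, C55, C66 = rho * vs**2
    g_off = g[0, 1] + g[0, 2] + g[1, 2]      # C12, C13, C23 = rho * (vp**2 - 2 * vs**2)

    # Velocities do not change rho, so the density term of Equation (4) vanishes.
    grad_vp = 2.0 * rho * vp * (g_diag_p + g_off)
    grad_vs = 2.0 * rho * vs * (g_diag_s - 2.0 * g_off)
    # Density at fixed velocities: every C_ij scales with rho, plus the direct term.
    grad_rho_v = (vp**2 * g_diag_p + vs**2 * g_diag_s
                  + (vp**2 - 2.0 * vs**2) * g_off + grad_rho)
    return grad_vp, grad_vs, grad_rho_v
```

The same pattern would apply to any other parameterization: only the partial derivatives $\partial C_{ij}/\partial\alpha$ and $\partial\rho/\partial\alpha$ change, while the elementary gradients are computed once.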
CARTESIAN-BASED DEFORMED MESH

To combine the accurate representation of topography allowed by FE meshes with the ease of implementation of an FD grid, our SEM3D package considers a Cartesian-based mesh with vertically deformed elements. The numbers of elements in the x, y and z directions are constant. For the interpolation at order $N = 4$ or $5$, SEM can accurately model elastic wave propagation with around 5 GLL nodes per shortest wavelength (Komatitsch, 1997). This condition is referred to as the volume condition.

When considering the presence of significant topography variation, hexahedral elements can be vertically deformed. For each element, a set of $(n+1)$ control points in each direction is considered, leading to $(n+1)^3$ control points and associated shape functions in 3D. These shape functions are triple products of Lagrange polynomials of degree $n$. The number of control points and shape functions $(n+1)$ is not related to the interpolation order $N$ of the test functions needed for solving the PDE.

Representing the surface with P1 shape functions (linear functions with $n = 1$) leads to the use of the eight corners of the element as control points. Such a simple representation cannot honor sharp spatial variation of the free surface, as shown in one example in Figure 1A. The rough P1 approximation of the topography affects the accuracy of the simulation due to the interaction between elastic waves and the complex surface. Decreasing the element size is one way of following the rapid variation of the topography, namely the surface condition. This criterion might be stricter than the volume condition, and would significantly increase the computational cost.

This surface condition limitation can be overcome by Pn shape functions at the arbitrary order $n$, where the control points are $(n+1)^3$ GLL points inside the element:

$$ \mathbf{x}(\xi,\eta,\zeta) = \sum_{k_1=1}^{n+1}\sum_{k_2=1}^{n+1}\sum_{k_3=1}^{n+1} \ell_{\hat{k}}(\xi,\eta,\zeta)\,\mathbf{x}_{\hat{k}}, \tag{5} $$

where $\hat{k}$ stands for the triple indexes $k_1, k_2, k_3$. The associated shape function is a triple product of Lagrange polynomials of degree $n$: $\ell_{\hat{k}}(\xi,\eta,\zeta)$. Figure 1B highlights that with the same size of the element (at 100 m), the P4 shape function provides a better representation of the complex topography (i.e. $(4+1)^2$ GLL points are used in each element to capture the topography map, instead of $(1+1)^2$ points for the P1 case).

Figure 1: Topography description of a 2D cross-section (extracted from SEAM II model) using (A) the eight corners of each element and P1 shape function and (B) the $(4+1)^3$ GLL control points associated with P4 shape functions. The element size is 100 m in both cases. [Panels plot vertical elevation (m) against X (m); legend entries are "True topography" with "Discretization at 100m" in (A) and "P4 GLL interpretation" in (B).]

It should be noticed that only the volumetric Jacobian matrix associated with the mapping from the reference space to the Cartesian space is required for the wave propagation (together with the surface Jacobian for the radiative absorbing boundary condition). The mesh creation with Pn shape functions only affects the mesh construction and the computation of the Jacobian, which are computed only once in the FWI workflow. The computational cost of the wavefield modeling is unaltered, while the simulation accuracy related to the complex wave phenomena at the free surface is significantly improved.

OPTIMIZED ARCHITECTURE

Modeling kernel

The SEM implementation used in our workflow is based on limited interpolation orders for test functions with $N = 4$ or $5$. It has been shown that these orders provide a good compromise between the numerical accuracy and the constraint on the CFL stability condition (Komatitsch, 1997). The key part of the modeling kernel is the computation of the stiffness-displacement matrix-vector product $\mathbf{K}\mathbf{u}$. Our implementation benefits from the factorization of the stiffness matrix as

$$ \mathbf{K} = \mathbf{D}^{w}\,\mathbf{C}\,\mathbf{D}, \tag{6} $$

where the operator $\mathbf{D}$ estimates the spatial derivatives of a vector in the Cartesian space. The operator $\mathbf{D}^{w}$ is equivalent to a weighted spatial derivatives operator. The application of these operators on a vector can be decomposed into two steps: the estimation of the spatial derivatives in the reference space and the projection back to the real space. The former step in the reference space can be estimated by using highly efficient algorithms developed by Deville et al. (2002), which benefit from the tensorial properties of hexahedral elements, the optimization of cache usage, and the combination of efficient loop vectorization and manual unrolling. Similar strategies can be applied to accelerate the computation of the volumetric Jacobian associated with the Pn shape functions, when necessary.

Parallel implementation

Our implementation relies on a two-level MPI-based parallelization: one level is designed on Cartesian-based domain decomposition, allowing an efficient load-balancing thanks to the Cartesian-based mesh. This avoids the use of a third-party mesh-partitioner, even when the number of possible subdomains is constrained by the mesh split in each direction. The second MPI-level is over seismic shots (or "super-shots") managed in parallel.

Inversion kernel

The inversion kernel relies on the reverse-communication interface provided by the SEISCOPE optimization toolbox (Métivier and Brossier, 2016), which includes various non-linear optimization methods. The FWI gradient, required as the input of the optimization process, is computed by the zero-lag cross-correlation of the incident and adjoint wavefields in the time-domain. The incident field is recomputed by the backward propagation in time from the stored wavefield in the boundaries,

Figure 2: Example of the nonstationary filtering operator applied to the FWI gradient, from the pseudo-3D Marmousi model. (A) True velocity model. (B) Initial velocity model. (C) Dip field, estimated from the true velocity model. (D) Coherent lengths in the bedding plane ($L_u$ and $L_w$), which vary from 12 m at fault locations to 45 m elsewhere. (E) Original scaled gradient without any smoothing. (F) Smoothed gradient with the anisotropic nonstationary Laplace filter (approximated by the application of two Bessel filters): dip field as presented in Figure 2C, $L_u = L_w$ as presented in Figure 2D, and $L_v = 12$ m (≈ 0.15 of the shortest wavelength). Some interesting features are highlighted by black and red arrows, and faults are indicated by black dashed lines.
synchronously with the forward propagation of the adjoint field. As the gradient for the elastic tensor coefficient $C_{ij}$ (Equation 3) involves the strain field, the gradient is directly accumulated during the simultaneous back and forth computation of incident and adjoint fields, resulting in a cheap operation (Dussaud et al., 2008).

BESSEL SMOOTHING FILTER

In many applications, the gradient vector $g(\mathbf{x})$ can exhibit artificial high wavenumber components, incompatible with the intrinsic resolution of FWI. Designing a nonstationary, anisotropic smoothing operator which can incorporate some prior knowledge of the geological structure, such as the local 3D rotation, becomes mandatory for practical applications. Such a filter has to be efficiently applied to the vector of interest.

To fulfill those requirements, we introduce the Bessel filter $B_{3D}(\mathbf{x})$, which can be directly and efficiently implemented with any FD or FE method (Trinh et al., 2017). Instead of convolving the original vector $g(\mathbf{x})$ with the forward filter $B_{3D}(\mathbf{x})$ to get the smoothed vector $s(\mathbf{x})$, we solve the following equation relying on the inverse operator

$$ B_{3D}^{-1}(\mathbf{x}) * s(\mathbf{x}) = g(\mathbf{x}). \tag{7} $$

If the coherent lengths $L_x$, $L_y$ and $L_z$ in the x, y and z directions are uniform over space, Equation (7) can be translated into the related Bessel PDE

$$ s(\mathbf{x}) - \left( L_z^2 \frac{\partial^2}{\partial z^2} + L_x^2 \frac{\partial^2}{\partial x^2} + L_y^2 \frac{\partial^2}{\partial y^2} \right) s(\mathbf{x}) = g(\mathbf{x}), \tag{8} $$

in which the original gradient $g(\mathbf{x})$ appears in the right hand side. Following the weak formulation of SEM, Equation (8) naturally yields a symmetric, positive-definite and well-conditioned linear system

$$ (\mathbf{M}_b + \mathbf{K}_b)\,\mathbf{s} = \mathbf{M}_b\,\mathbf{g}. \tag{9} $$

Similar to the wave propagation problem, the mass matrix $\mathbf{M}_b$ associated with the application of the Bessel filter is diagonal and the stiffness matrix $\mathbf{K}_b$ is symmetric by definition.

The anisotropic nonstationary filter is defined by a 3D rotation, given by dip $\theta$ and azimuth $\varphi$ angles, and variable coherent lengths: $L_v$ is associated with the direction perpendicular to the local bedding plane, while $L_u$ and $L_w$ are related to the planar structure of geological structures. Under the assumption of slow variation of the filter parameters, their spatial derivatives can be neglected. The PDE governing the smoothing process can be approximated as

$$ s(\mathbf{x}) - \nabla^{t}_{z,x,y}\,\mathbf{P}(\mathbf{x})\,\mathbf{P}^{t}(\mathbf{x})\,\nabla_{z,x,y}\, s(\mathbf{x}) = g(\mathbf{x}), \tag{10} $$

where $\nabla_{z,x,y}$ is the spatial derivatives $(\partial/\partial z, \partial/\partial x, \partial/\partial y)^{t}$, and the upper symbol "t" stands for the transposed operator. The information related to the geological variation of the medium is preserved in the projection matrix

$$ \mathbf{P}(\mathbf{x}) = \begin{pmatrix} L_v \cos\varphi & L_u \sin\varphi & 0 \\ -L_v \cos\theta \sin\varphi & L_u \cos\theta \cos\varphi & L_w \sin\theta \\ L_v \sin\theta \sin\varphi & -L_u \sin\theta \cos\varphi & L_w \cos\theta \end{pmatrix}, \tag{11} $$

between the real space and the locally rotated dimensionless coordinates system.

Similar to the system (9) for constant filter parameters, the weak form of Equation (10) yields a symmetric, positive-definite and well-conditioned linear system. We solve this linear system
(9) through a parallel conjugate gradient (CG) iterative solver, using the same high-performance-computing structure as the one for the wave equation. The most expensive operator is the product of the sparse stiffness matrix $\mathbf{K}_b$ with a given vector. Again, the factorization of this matrix-vector product is used to achieve an efficient implementation (Deville et al., 2002).

A double application of Bessel operators provides an accurate approximation of the Laplace filter. The overall scheme is highly efficient, as the algorithmic complexity is of order $O(L)$, for a given coherent length $L = L_x = L_y = L_z$, compared to the complexity $O(L^3)$ for the 3D explicit convolution approach (Trinh et al., 2017).
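To make the structure of system (9) concrete, the following is a minimal 1D sketch, assuming a uniform grid with linear elements and a lumped (diagonal) mass matrix rather than the SEM discretization used in the paper. It solves $(\mathbf{M}_b + \mathbf{K}_b)\mathbf{s} = \mathbf{M}_b\mathbf{g}$ with a hand-written conjugate gradient loop and a matrix-free operator; the function name `bessel_smooth_1d` and its parameters are hypothetical.

```python
import numpy as np

def bessel_smooth_1d(g, length, h=1.0, tol=1e-10, max_iter=2000):
    """Solve (M + K) s = M g with conjugate gradients on a uniform 1D grid.

    M is a lumped (diagonal) mass matrix and K is length**2 times the linear
    element stiffness matrix, i.e. a discrete form of
    s - length**2 * d2s/dx2 = g with natural (zero-flux) boundaries.
    This is a 1D illustration only, not the SEM3D implementation.
    """
    g = np.asarray(g, dtype=float)
    mass = np.full(g.size, h)
    mass[[0, -1]] = 0.5 * h                      # half cells at both ends

    def apply_system(v):
        # Matrix-free product (M + K) v: only neighbour differences are needed.
        out = mass * v
        flux = (length ** 2 / h) * np.diff(v)    # flux between neighbouring nodes
        out[:-1] -= flux
        out[1:] += flux
        return out

    b = mass * g
    s = g.copy()                                 # initial guess: unsmoothed input
    r = b - apply_system(s)
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * np.linalg.norm(b):
            break
        ap = apply_system(p)
        alpha = rs_old / (p @ ap)
        s += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return s
```

In this sketch the operator is never assembled, so the cost of one CG iteration is independent of the coherent length; only the conditioning of the system, and hence the iteration count, grows with `length / h`.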
Figure 2 illustrates the application of an anisotropic nonstationary Laplace filter (approximated by double application of Bessel filters) on a gradient computed in the Marmousi benchmark. The 2D Marmousi has been extended to an elastic 3D volume for this test. A surface acquisition is used with a line of 24 sources, with a distance of 160 m between adjacent sources. The receivers are located on the whole surface, with 12.5 m between receivers. A Ricker wavelet centered at 8 Hz is used as the source signal. The 2D cross-section of the gradient without any smoothing filter underneath the source line is shown in Figure 2E. The gradient contains significant acquisition footprint at the near-surface and high wavenumber artifacts in the deeper part. An anisotropic nonstationary Laplace filter with parameters as described in Figures 2C and D is applied to produce the smoothed gradient as shown in Figure 2F. The near-surface acquisition footprint is effectively removed. The continuity of the features at greater depths is enhanced, because the horizontal oscillation artifacts are attenuated, as indicated by the black arrows. Due to the design of the coherent lengths, the energy is not smeared out across the faults, indicated by the red arrows in Figures 2E and F. In this example, the smoothing process costs about 2.2% of the running time of the forward problem.

FULL WAVEFORM INVERSION EXAMPLE

Figure 3 shows a 3D elastic FWI example on a subset of the SEAM Phase II foothills benchmark (Oristaglio, 2012). SEM is used at order $N = 4$ in this test. A surface acquisition is used with a line of 24 sources, one every 300 m. Receivers cover the free surface every 12.5 m. A Ricker wavelet centered at 3 Hz is used as the source signal. At this frequency, the volume condition implies that the element size should be 100 m. As illustrated in Figure 1A, shape functions of order 1 are not sufficient for such rapid variation of the surface. The P4 shape functions are thus used, which provide an accurate representation of the topography (Figure 1B).

Figure 3: (A) True Vs model. (B) Initial Vs model. (C) Inverted Vs model. [Panels plot Z (km) against X (km); the color scale runs from 1500 to 2500.]

The 2D cross-section of the shear wave velocity (Vs) model underneath the source line is shown in Figure 3A. The initial model in Figure 3B is a smoothed version of the true model. Similarly smoothed models are used for P-wave velocity and density as the input of the inversion process. Both surface waves and body waves are used for the inversion without any distinction. We invert for compressional and shear velocities, but only show the result for Vs in Figure 3C. We do not invert for density in this example. The inversion process consists of 60 iterations, with the l-BFGS optimization method. Only a stationary Laplace smoothing filter with $L_z = 25$ m, $L_x = L_y = 75$ m is applied to the gradient to remove artifacts beyond the FWI intrinsic resolution. The inversion successfully recovers details in the Vs model, even with the presence of a complex topography. No acquisition footprint or high wavenumber artifact appears in the final model, thanks to the effectiveness of the smoothing filter.

CONCLUSIONS AND PERSPECTIVES

We present an integrated workflow capable of efficiently performing 3D elastic time-domain FWI for multiple parameters on a hexahedral mesh, based on a spectral-element method in 3D. The scheme relies on the use of a deformed Cartesian-based mesh and high-order shape functions, simultaneous forward and adjoint fields for gradient computation coupled with a highly optimized kernel, MPI-based parallelism and a novel nonstationary and anisotropic structure-oriented smoothing filter solved efficiently and directly by SEM. Perspectives include attenuation modeling, the coupling with an optimal wavefield decimation and reconstruction proposed by Yang et al. (2016a,b), and applications to large-scale complex land targets.

ACKNOWLEDGMENTS

The authors would like to thank Total E&P for financial support of PTT's PhD project, and for allowing us to access and present the SEAM II model. This study was partially funded by the SEISCOPE consortium (http://seiscope2.osug.fr). This study was granted access to the HPC resources of the CIMENT infrastructure (https://ciment.ujf-grenoble.fr) and CINES/IDRIS (allocation 046091 made by GENCI). The authors enjoyed discussions with M. Appe, B. Duquet, J.L. Boelle and P. Williamson.

REFERENCES
Brossier, R., S. Operto, and J. Virieux, 2009, Seismic imaging of complex onshore structures by 2D
elastic frequency-domain full-waveform inversion: Geophysics, 74, no. 6, WCC105–WCC118,
http://doi.org/10.1190/1.3215771.
Deville, M., P. Fischer, and E. Mund, 2002, High-order methods for incompressible fluid flow:
Cambridge University Press.
Dussaud, E., W. W. Symes, P. Williamson, L. Lemaistre, P. Signer, B. Denel, and A. Cherrett, 2008,
Computational strategies for reverse-time migration: 78th Annual International Meeting, SEG,
Expanded Abstracts, 2267–2271, http://doi.org/10.1190/1.3059336.
Fichtner, A., B. L. N. Kennett, H. Igel, and H. P. Bunge, 2008, Theoretical background for continental-
and global-scale full-waveform inversion in the time-frequency domain: Geophysical Journal
International, 175, 665–685, http://doi.org/10.1111/j.1365-246x.2008.03923.x.
Komatitsch, D., 1997, Méthodes spectrales et éléments spectraux pour l’équation de l’élastodynamique
2D et 3D en milieu hétérogène: Ph.D. thesis, Institut de Géophysique du Globe de Paris.
Komatitsch, D., and J. Tromp, 1999, Introduction to the spectral element method for three-dimensional
seismic wave propagation: Geophysical Journal International, 139, 806–822,
http://doi.org/10.1046/j.1365-246x.1999.00967.x.
Métivier, L., and R. Brossier, 2016, The SEISCOPE optimization toolbox: A large-scale nonlinear
optimization library based on reverse communication: Geophysics, 81, no. 2, F11–F25,
http://doi.org/10.1190/geo2015-0031.1.
Oristaglio, M., 2012, SEAM phase II — Land seismic challenges: The Leading Edge, 31, 264–266,
http://doi.org/10.1190/1.3694893.
Peter, D., D. Komatitsch, Y. Luo, R. Martin, N. Le Goff, E. Casarotti, P. Le Loher, F. Magnoni, Q. Liu,
C. Blitz, T. Nissen-Meyer, P. Basini, and J. Tromp, 2011, Forward and adjoint simulations of
seismic wave propagation on fully unstructured hexahedral meshes: Geophysical Journal
http://doi.org/10.1111/j.1365-246x.2006.02978.x.
Tape, C., Q. Liu, A. Maggi, and J. Tromp, 2010, Seismic tomography of the southern California crust
based on spectral-element and adjoint methods: Geophysical Journal International, 180, 433–462,
http://doi.org/10.1111/j.1365-246x.2009.04429.x.
Trinh, P. T., R. Brossier, L. Métivier, J. Virieux, and P. Wellington, 2017, Bessel smoothing filter for
spectral element mesh: Geophysical Journal International, in press.
Vigh, D., K. Jiao, D. Watts, and D. Sun, 2014, Elastic full-waveform inversion application using
multicomponent measurements of seismic data collection: Geophysics, 79, no. 2, R63–R77,
http://doi.org/10.1190/geo2013-0055.1.
Virieux, J., and S. Operto, 2009, An overview of full waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC1–WCC26, http://doi.org/10.1190/1.3238367.
Yang, P., R. Brossier, L. Métivier, and J. Virieux, 2016a, Wavefield reconstruction in attenuating media:
A checkpointing-assisted reverse-forward simulation method: Geophysics, 81, no. 6, R349–R362,
http://doi.org/10.1190/geo2016-0082.1.
Yang, P., R. Brossier, and J. Virieux, 2016b, Wavefield reconstruction from significantly decimated
boundaries: Geophysics, 81, no. 5, T197–T209, http://doi.org/10.1190/geo2015-0711.1

Inter-parameter tradeoff quantification and reduction in isotropic-elastic full-waveform inversion
Wenyong Pan, Kristopher A. Innanen, Department of Geoscience, CREWES Project, University of Calgary, Yanhua O. Yuan,
Department of Geoscience, Princeton University
SUMMARY

Quantifying inter-parameter tradeoffs (i.e., cross-talk artifacts) and reducing the introduced uncertainties are becoming increasingly important for multiparameter full-waveform inversion (FWI). Parameter resolution studies based on scattering patterns may neglect important information for understanding inter-parameter tradeoffs. Our objective is to evaluate and reduce the inter-parameter tradeoff errors experienced by isotropic-elastic FWI using the multiparameter Hessian. We introduce the inter-parameter contamination kernel as a tool to expose the origin and characters of these tradeoffs. By applying the multiparameter Hessian to various types of diagnostic vectors, we are able to quantify inter-parameter tradeoffs either locally or within the whole volume, i.e., considering finite-frequency effects in complex and heterogeneous models, etc. We observe that S-wave velocity perturbations produce strong contaminations into density and P-wave velocity updates, but suffer little contamination from other parameters. Based on our findings, we subtract approximated contamination kernels, constructed by applying off-diagonal Hessian blocks to estimated model perturbation vectors, from the standard sensitivity kernels. Numerical examples show that this approach increases the reliability of certain cross-talk prone model estimations, for instance density in isotropic-elastic FWI.

INTRODUCTION

Simultaneously reconstructing multiple physical parameters in seismic FWI suffers from inter-parameter tradeoffs, which reflect the inability to distinguish the influence of one elastic property from another (Tarantola, 1986; Operto et al., 2013; Innanen, 2014). Perturbations in one physical parameter are therefore in danger of being mapped into the estimation of another. Evaluating inter-parameter tradeoffs and reducing the introduced uncertainties are essential for multiparameter FWI.

Researchers have devoted intensive efforts to parameter resolution studies, usually basing them on analytic solutions of Fréchet derivative wavefields for different parameter classes (Tarantola, 1986; Gholami et al., 2013; Alkhalifa and Plessix, 2014). Scattering patterns provide invaluable information for understanding the inter-parameter coupling effects, but at the same time they neglect some important types of information (Podgornova et al., 2015). Conclusions drawn from the overlapping of scattering patterns hold asymptotically (Operto et al., 2013), meaning inter-parameter tradeoffs originating outside of a ray-approximate framework are neglected.

One objective of this research is to quantify inter-parameter tradeoffs in isotropic-elastic FWI by analyzing the multiparameter Hessian, which provides complete characterization of these issues (Fichtner and van Leeuwen, 2015). We find that the products of the off-diagonal blocks of the multiparameter Hessian with the model perturbation vectors, referred to as inter-parameter contamination kernels in this paper, account for the inter-parameter tradeoffs encountered in a given FWI problem. For large-scale inverse problems, characteristics of the Hessian can be inferred via matrix probing techniques, which are useful when explicit representation of a matrix is too expensive (Trampert et al., 2013). The product of the Hessian with a point-localized vector generates one Hessian column, which is referred to as the point spread function (Hu et al., 2001; Valenciano et al., 2006; Tang and Lee, 2015). Fichtner and Trampert (2011) evaluated local inter-parameter tradeoffs in FWI using multiparameter point spread functions (MPSFs). To evaluate the strengths of the inter-parameter tradeoffs within the whole volume, we adopt a stochastic probing strategy, in which the multiparameter Hessian operator is applied to random vectors. Expectation values of the correlation between these random vectors with their corresponding Hessian-vector products approximate the diagonals of multiparameter Hessian off-diagonal blocks, which measure parameter coupling strengths within the whole volume. To quantify inter-parameter tradeoffs with off-diagonal elements of the off-diagonal blocks, the multiparameter Hessian is further applied to vectors with regularly distributed spikes. The resulting MPSF volumes approximate the row summations of the multiparameter Hessian.

Due to strong contaminations from velocity parameters, density structures are poorly constrained in isotropic-elastic FWI (Jeong et al., 2011; Yuan et al., 2015; Modrak et al., 2016). Newton-based optimization methods are promising strategies for multiparameter FWI, balancing efficiency with suppression of unwanted inter-parameter cross-talk artifacts (Métivier et al., 2015; Pan et al., 2016; Yang et al., 2016; Pan et al., 2017). However, iteratively solving the Newton equation with a conjugate gradient algorithm is also expensive, and with a limited number of inner iterations, parameter cross-talk artifacts may not be adequately suppressed. In this paper we develop a strategy to reduce the parameter cross-talk artifacts using the approximated contamination kernels, which are constructed by applying the multiparameter Hessian to estimated model perturbation vectors. Multiparameter Hessian-vector products can be calculated with second-order adjoint-state or finite-difference approaches. Numerical examples illustrate that the inter-parameter contaminations in density updates are reduced effectively with the new inversion strategy.

METHODS

Isotropic-elastic FWI review

In seismic FWI a misfit function which measures the differences between seismic observations and synthetic data is defined:

$$ \Phi(\mathbf{m}) = \sum_{s=1}^{S}\sum_{r=1}^{R}\int_{0}^{T} \|\Delta\mathbf{d}(\mathbf{x}_s, \mathbf{x}_r, t; \mathbf{m})\|^2\, dt, \tag{1} $$

where $\Delta\mathbf{d}(\mathbf{x}_s, \mathbf{x}_r, t; \mathbf{m})$ are the data residuals. Within the Newton optimization framework, the search direction $\Delta\mathbf{m}$ is obtained by solving the Newton linear system: $\mathbf{H}_{\tilde{k}}\,\Delta\mathbf{m}_{\tilde{k}} = -\nabla_{\mathbf{m}}\Phi_{\tilde{k}}$. Here, $\nabla_{\mathbf{m}}\Phi$ and $\mathbf{H}$ are the gradient and Hessian, which represent the first and second derivatives of the misfit function. In the adjoint-state method, explicit expressions for the $\kappa$, $\mu$ and $\rho$ sensitivity kernels are written as (Tromp et al., 2005; Yuan and Simons, 2014):

$$ K_{\kappa}(\mathbf{x}) = -\langle \kappa(\mathbf{x})\, \nabla\cdot\tilde{\mathbf{u}}(\mathbf{x}_r, \mathbf{x}, T-t)\, \nabla\cdot\mathbf{u}(\mathbf{x}, \mathbf{x}_s, t) \rangle, \tag{2} $$

$$ K_{\mu}(\mathbf{x}) = -2\langle \mu(\mathbf{x})\, \tilde{\mathbf{D}}(\mathbf{x}_r, \mathbf{x}, T-t) : \mathbf{D}(\mathbf{x}, \mathbf{x}_s, t) \rangle, \tag{3} $$

$$ K_{\rho}(\mathbf{x}) = -\langle \rho(\mathbf{x})\, \tilde{\mathbf{u}}(\mathbf{x}_r, \mathbf{x}, T-t) \cdot \partial_t^2 \mathbf{u}(\mathbf{x}, \mathbf{x}_s, t) \rangle, \tag{4} $$

where $\tilde{\mathbf{u}}$ represents the adjoint wavefield, $\mathbf{D}$ is the traceless strain deviator and $\langle\cdot\rangle$ represents summation over source, receiver and time. Sensitivity kernels for $\alpha$, $\beta$ and $\rho'$ are given by:

$$ K_{\alpha} = 2\left(1 + \frac{4\mu}{3\kappa}\right) K_{\kappa}, \quad K_{\beta} = 2\left(K_{\mu} - \frac{4\mu}{3\kappa} K_{\kappa}\right), \quad K_{\rho'} = K_{\rho} + K_{\kappa} + K_{\mu}. \tag{5} $$

Inter-parameter contamination kernels

The sensitivity kernel $K_{\alpha}$ in the Newton system of isotropic-elastic FWI can be written as (Fichtner and van Leeuwen, 2015):

$$ K_{\alpha} = K_{\alpha\leftrightarrow\alpha} + K_{\beta\to\alpha} + K_{\rho'\to\alpha} = -\mathbf{H}_{\alpha\alpha}\Delta\mathbf{m}_{\alpha} - \mathbf{H}_{\alpha\beta}\Delta\mathbf{m}_{\beta} - \mathbf{H}_{\alpha\rho'}\Delta\mathbf{m}_{\rho'}, \tag{6} $$
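where $K_{\alpha\leftrightarrow\alpha}$ is the correct update kernel for $\alpha$, and the terms $K_{\beta\to\alpha}$ and $K_{\rho'\to\alpha}$ are defined as inter-parameter contamination kernels. The model perturbation vectors $\Delta\mathbf{m}_{\beta}$ and $\Delta\mathbf{m}_{\rho'}$, blurred by off-diagonal blocks $\mathbf{H}_{\alpha\beta}$ and $\mathbf{H}_{\alpha\rho'}$, are mapped into the update for parameter $\alpha$. Similarly, sensitivity kernels $K_{\beta}$ and $K_{\rho'}$ can be written as:

$$ K_{\beta} = K_{\alpha\to\beta} + K_{\beta\leftrightarrow\beta} + K_{\rho'\to\beta} = -\mathbf{H}_{\beta\alpha}\Delta\mathbf{m}_{\alpha} - \mathbf{H}_{\beta\beta}\Delta\mathbf{m}_{\beta} - \mathbf{H}_{\beta\rho'}\Delta\mathbf{m}_{\rho'}, \tag{7} $$

$$ K_{\rho'} = K_{\alpha\to\rho'} + K_{\beta\to\rho'} + K_{\rho'\leftrightarrow\rho'} = -\mathbf{H}_{\rho'\alpha}\Delta\mathbf{m}_{\alpha} - \mathbf{H}_{\rho'\beta}\Delta\mathbf{m}_{\beta} - \mathbf{H}_{\rho'\rho'}\Delta\mathbf{m}_{\rho'}, \tag{8} $$

where $K_{\beta\leftrightarrow\beta}$ and $K_{\rho'\leftrightarrow\rho'}$ are the correct update kernels for $\beta$ and $\rho'$; $K_{\alpha\to\beta}$ and $K_{\rho'\to\beta}$, involving off-diagonal blocks $\mathbf{H}_{\beta\alpha}$ and $\mathbf{H}_{\beta\rho'}$, measure contaminations from $\alpha$ and $\rho'$ into $\beta$; and $K_{\alpha\to\rho'}$ and $K_{\beta\to\rho'}$, involving off-diagonal blocks $\mathbf{H}_{\rho'\alpha}$ and $\mathbf{H}_{\rho'\beta}$, further measure contaminations from $\alpha$ and $\beta$ into $\rho'$.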
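The block structure of equations (6) to (8) can be illustrated on a small dense example. The sketch below assumes the Hessian blocks are available as explicit matrices, which is only realistic for toy problems; in practice only Hessian-vector products would be computed. All names (`split_kernels`, `total_kernel`, `contamination`, and the parameter labels) are hypothetical, and the random Hessian is a stand-in, not a physical one.

```python
import numpy as np

def split_kernels(hessian_blocks, dm):
    """Split each kernel into its correct-update and contamination parts.

    hessian_blocks[t][s] is the (n, n) block H_ts and dm[s] the (n,) model
    perturbation of parameter s. Entry [t][s] of the result is -H_ts @ dm_s:
    the correct update kernel when s == t, a contamination kernel otherwise.
    """
    names = list(dm)
    return {t: {s: -hessian_blocks[t][s] @ dm[s] for s in names} for t in names}

def total_kernel(parts, target):
    """Sum all contributions mapped into one parameter (equations 6 to 8)."""
    return sum(parts[target].values())

def contamination(parts, target):
    """Sum only the cross-talk terms mapped into `target`."""
    return sum(k for s, k in parts[target].items() if s != target)

# Toy usage: a random symmetric positive semi-definite stand-in Hessian.
rng = np.random.default_rng(1)
n, names = 4, ("alpha", "beta", "rho")
a = rng.standard_normal((3 * n, 3 * n))
hessian = a @ a.T
blocks = {t: {s: hessian[i * n:(i + 1) * n, j * n:(j + 1) * n]
              for j, s in enumerate(names)}
          for i, t in enumerate(names)}
dm = {s: rng.standard_normal(n) for s in names}
parts = split_kernels(blocks, dm)
rho_without_crosstalk = total_kernel(parts, "rho") - contamination(parts, "rho")
```

Subtracting the contamination terms from the total kernel leaves only the diagonal-block contribution, which is the quantity the correct update kernel denotes in the equations above.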
Figure 1: MPSFs arranged in a block structure. APSF indicates the maximum magnitude.

Quantifying inter-parameter tradeoffs

Local inter-parameter tradeoffs

Selecting model perturbations ∆mα = 0, ∆mρ′ = 0 and ∆mβ = Aβ δ(x − z), the gradient update Kβ in equation (7) becomes:

$$K_\beta(\mathbf{x}) = -A_\beta \int_{\Omega(\mathbf{x}')} H_{\beta\beta}(\mathbf{x},\mathbf{x}')\,\delta(\mathbf{x}'-\mathbf{z})\,d\mathbf{x}' = -A_\beta H_{\beta\beta}(\mathbf{x},\mathbf{z}), \tag{9}$$

where Kβ(x) represents a conservative estimate of the spike model perturbation blurred by the diagonal block Hββ. The sensitivity kernels Kα and Kρ′ due to this point-localized perturbation can likewise be expressed as:

$$K_\alpha(\mathbf{x}) = -A_\beta H_{\beta\alpha}(\mathbf{x},\mathbf{z}), \qquad K_{\rho'}(\mathbf{x}) = -A_\beta H_{\beta\rho'}(\mathbf{x},\mathbf{z}). \tag{10}$$

The preserved multiparameter Hessian column Hβ(x, z) is referred to as the multiparameter point spread function (MPSF), following the convention in exploration geophysics. Applying the Hessian to spike model perturbations ∆mα = Aα δ(x − z) or ∆mρ′ = Aρ′ δ(x − z) allows us to calculate the MPSFs Hβα(x, z), Hρ′α(x, z), Hαρ′(x, z), and Hβρ′(x, z), which describe the local contaminations from α to β and ρ′ and the contaminations from ρ′ to α and β.

Stochastic probing and MPSF volumes

If the Hessian operator is diagonally dominant, expectation values of the correlations between a zero-mean random vector v and its Hessian-vector product ℋ = Hv approximate the Hessian diagonals (Sacchi et al., 2007). In a multiparameter inverse problem, the random vector v can be partitioned into Np subvectors, and the multiparameter Hessian can likewise be divided into Np × Np subblock matrices. Applying the multiparameter Hessian to the random vector gives Np Hessian-vector products. Diagonals of the Hessian subblock matrices can then be estimated by:

$$\mathbf{H}^{\mathrm{diag}}_{pq} = E\left[\mathbf{v}_p \odot \mathcal{H}_p\right] = \sum_{p=1}^{N_p} E\left[\mathbf{v}_p \odot \mathbf{H}_{pq}\mathbf{v}_q\right], \tag{11}$$

where ⊙ means element-wise multiplication, E denotes the expectation operation, p and q are indices for subvectors representing different physical parameters, and ℋp is the sub-Hessian-vector product. Assuming independent zero-mean random vectors vp and vq for two different physical parameters, E[vp(x) vq(x′)] = 0, equation (11) becomes:

$$\mathbf{H}^{\mathrm{diag}}_{pq} = \sum_{p=1}^{N_p} \mathbf{H}_{pq}\,E\left[\mathbf{v}_p \odot \mathbf{v}_q\right] = \mathbf{H}_{pq}\,E\left[\mathbf{v}_p \odot \mathbf{v}_p\right]. \tag{12}$$

For example, in isotropic-elastic FWI, diagonal elements of the off-diagonal block Hρ′β can be estimated by:

$$\mathbf{H}^{\mathrm{diag}}_{\rho'\beta} \approx \left[\sum_{nr=1}^{N_R} \mathbf{v}_{\beta,nr} \odot \mathbf{H}_{\rho'\beta}\,\mathbf{v}_{\beta,nr}\right] \oslash \left[\sum_{nr=1}^{N_R} \mathbf{v}_{\beta,nr} \odot \mathbf{v}_{\beta,nr}\right], \tag{13}$$

where ⊘ means element-wise division and nr = [1, ..., NR] indicates the random vector index. If the subblocks of the multiparameter Hessian are highly diagonally dominant, only a small number of random probes is needed. To measure the inter-parameter tradeoffs with off-diagonal elements in off-diagonal blocks, the multiparameter Hessian is applied to vectors with discrete spikes regularly distributed in the whole volume. The resulting MPSF volumes approximate row summations of the multiparameter Hessian.

Reducing cross-talks with approximated contamination kernels

In equation (8), the sensitivity kernel Kρ′ is a linear summation of the correct update kernel Kρ′↔ρ′ with the contamination kernels. We have observed, during parameter tradeoff analysis in the numerical modelling section, that density suffers contaminations primarily from S-wave velocity, and S-wave velocity is less contaminated by other parameters. This suggests that we can first update the S-wave velocity iteratively for a finite number of k′ iterations, which provides an estimated model $m_\beta^{k'}$. The inversion is then started from initial models by simultaneously updating the three parameters. At the k̃th iteration, approximated contamination kernels are constructed:

$$\tilde{K}^{\tilde{k}}_{\beta\to\alpha}(\mathbf{x}) = -\int_{\Omega(\mathbf{x}')} H^{\tilde{k}}_{\alpha\beta}(\mathbf{x},\mathbf{x}')\,\Delta\tilde{m}^{\tilde{k}}_\beta(\mathbf{x}')\,d\mathbf{x}', \tag{14}$$

$$\tilde{K}^{\tilde{k}}_{\beta\to\rho'}(\mathbf{x}) = -\int_{\Omega(\mathbf{x}')} H^{\tilde{k}}_{\rho'\beta}(\mathbf{x},\mathbf{x}')\,\Delta\tilde{m}^{\tilde{k}}_\beta(\mathbf{x}')\,d\mathbf{x}', \tag{15}$$

where $\Delta\tilde{m}^{\tilde{k}}_\beta = m^{k'}_\beta - m^{\tilde{k}}_\beta$ is the approximated model perturbation vector. The new update kernels for α, β and ρ′ are given by:

$$\tilde{K}^{\tilde{k}}_\alpha = K^{\tilde{k}}_\alpha - \tilde{K}^{\tilde{k}}_{\beta\to\alpha}, \qquad \tilde{K}^{\tilde{k}}_\beta = K^{\tilde{k}}_\beta, \qquad \tilde{K}^{\tilde{k}}_{\rho'} = K^{\tilde{k}}_{\rho'} - \tilde{K}^{\tilde{k}}_{\beta\to\rho'}. \tag{16}$$

Subtracting the approximated contamination kernels from the sensitivity kernels, $K^{\tilde{k}}_\alpha$ and $K^{\tilde{k}}_{\rho'}$, suppresses the contaminations partially. Better approximations of the model perturbation vector ∆m̃β would remove contaminations more completely, but would be more computationally expensive.

NUMERICAL EXAMPLES

Multiparameter point spread functions

We use a 2D, 1 km × 1 km homogeneous and isotropic-elastic model to test the idea of quantifying parameter tradeoffs with multiparameter point spread functions (MPSFs). P-wave velocity, S-wave velocity and density are 2000 m/s, 1400 m/s and 1.2 kg/m³. A P-SV source with a Ricker wavelet (fdom = 8 Hz) is used for modelling. A total of 60 sources
and 200 receivers are arranged regularly along all boundaries of the model. We first apply a positive spike perturbation in P-wave velocity at position z (center of the model): ∆mα(z) = 100 m/s. MPSFs Hαα(x, z), Hβα(x, z), and Hρ′α(x, z) are calculated, where Hβα(x, z) and Hρ′α(x, z) describe the mappings from α to β and ρ′. Then, spike perturbations ∆mβ(z) = 100 m/s and ∆mρ′(z) = 100 kg/m³ are applied respectively. MPSFs Hαβ(x, z), Hββ(x, z), Hρ′β(x, z), Hαρ′(x, z), Hβρ′(x, z), and Hρ′ρ′(x, z) are obtained.

These MPSFs are arranged in a block structure which is consistent with their positions in the multiparameter Hessian, as shown in Figure 1. Magnitudes of the MPSFs differ significantly. Contaminations from α to β and ρ′ appear to be relatively weak. An S-wave velocity β perturbation produces strong negative mapping to α and positive mapping to ρ′. A density ρ′ perturbation also produces unwanted artifacts in α and β. S-wave velocity β experiences the least contamination of the three parameters. These contaminations may result in density being highly under- or overestimated. Figures 2a, 2b, and 2c show the true α, β and ρ′ models with 3 isolated Gaussian anomalies. The initial models are homogeneous. An l-BFGS optimization method is employed for updating α, β and ρ′ simultaneously. Figures 3a, 3b and 3c show the sensitivity kernels. Relative strengths and phase-character of the inter-parameter contaminations are observed to match the predictions made using MPSFs, as shown in Figure 1. Contamination kernels Kβ→α and Kβ→ρ′ are relatively strong. Negative S-wave velocity perturbations produce positive and negative contaminations in α and ρ′, respectively. We iteratively update S-wave velocity for k′ = 3 iterations, which provides an estimated model perturbation vector ∆m̃β, shown in Figure 3d. Figures 3e and 3f show the approximated contamination kernels K̃β→α and K̃β→ρ′. Figures 3g, 3h and 3i show the new update kernels K̃α, K̃β and K̃ρ′. Compared to the sensitivity kernels shown in Figures 3a, 3b and 3c, the mappings from S-wave velocity to P-wave velocity and density are now much weaker. Figures 4a, 4b and 4c show the inverted α, β and ρ′ models with the traditional simultaneous inversion strategy. The S-wave velocity exhibits minimal cross-talk contaminations. The P-wave velocity and density, however, have strong cross-talk artifacts leaking from the S-wave velocity. Figures 4d, 4e and 4f plot the inverted models created using the new inversion strategy with approximated contamination kernels. As indicated by the arrows, the contaminations due to S-wave velocity in the inverted P-wave and density models have been significantly suppressed.

Figure 2: Panels (a-c) show the true P-wave velocity, S-wave velocity and density of the Gaussian anomaly models: $m^{true}_\alpha$, $m^{true}_\beta$ and $m^{true}_{\rho'}$.

Figure 3: Panels (a-c) sensitivity kernels Kα, Kβ and Kρ′; Panels (d-f) show estimated model vector ∆m̃β and approximated contamination kernels K̃β→α and K̃β→ρ′; Panels (g-i) new update kernels K̃α, K̃β and K̃ρ′.

Figure 4: Panels (a-c) estimated models using a standard simultaneous inversion strategy; Panels (d-f) estimated models with the cross-talk suppression strategy.

Marmousi model example

In Figure 5, the true P-wave and S-wave velocity and density models for a more complex Marmousi model are illustrated. The second row and third row in Figure 5 are the corresponding initial models and model perturbations. We deploy sources and receivers regularly along the top surface of the model.

Figure 5: Panels (a-c) show true P-wave velocity, S-wave velocity and density models; Panels (d-f) show the corresponding initial models; Panels (g-i) show the true model perturbations.

The stochastic probing approach is applied to estimate the diagonals of the subblock matrices in the multiparameter Hessian with two independent random vectors. The first row in Figure 6 shows the diagonals of the diagonal blocks $H^{diag}_{\alpha\alpha}$, $H^{diag}_{\beta\beta}$ and $H^{diag}_{\rho'\rho'}$, which can be used as preconditioners in the inversion process (Pan et al., 2014, 2015). The second row in Figure 6 illustrates the diagonals of the off-diagonal blocks $H^{diag}_{\alpha\beta}$, $H^{diag}_{\alpha\rho'}$ and $H^{diag}_{\beta\rho'}$, which indicate the coupling strengths of the isotropic-elastic parameters within the whole volume. Similarity of $H^{diag}_{\beta\rho'}$ and $H^{diag}_{\beta\beta}$ indicates that contaminations from β to ρ′ are strong.

A vector v′ consisting of densely distributed spikes with a constant magnitude of 100 is designed. The corresponding multiparameter Hessian-vector products are shown in Figure 7. The strengths of inter-parameter contaminations generally match our predictions with multiparameter Hessian diagonals. Comparing strengths of the off-diagonal Hessian-vector products (e.g., ℋβα) with those of the diagonal Hessian-vector products (e.g., ℋββ), we conclude that contaminations from α to β and ρ′ are relatively weak and can be ignored. Contaminations from ρ′ to α and β are also relatively weak. However, contamination from β to α could potentially decrease the α update by as much as 20%, and contamination from β to ρ′ may increase the ρ′ update by as much as 2.8 times.
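The stochastic probing estimate of equation (13) can be illustrated with a minimal NumPy sketch. It assumes that the action of a single Hessian subblock on a vector is available as a function; the toy matrix `H`, the function name `probe_block_diagonal`, and the choice of ±1 (Rademacher) probes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def probe_block_diagonal(apply_block, n, n_probes, rng):
    """Estimate the diagonal of a Hessian subblock in the manner of equation (13).

    apply_block: function returning the block-vector product H_block @ v
    (a hypothetical stand-in for a Hessian-vector product).
    """
    numerator = np.zeros(n)
    denominator = np.zeros(n)
    for _ in range(n_probes):
        # Zero-mean random probe; +/-1 entries are an assumption of this sketch.
        v = rng.choice([-1.0, 1.0], size=n)
        numerator += v * apply_block(v)   # v (element-wise) H v
        denominator += v * v              # v (element-wise) v
    return numerator / denominator        # element-wise division

rng = np.random.default_rng(0)
n = 200

# Toy diagonally dominant block: strong diagonal plus weak off-diagonal terms.
true_diag = np.linspace(1.0, 3.0, n)
off_diagonal = 0.01 * rng.standard_normal((n, n))
np.fill_diagonal(off_diagonal, 0.0)
H = np.diag(true_diag) + off_diagonal

estimate = probe_block_diagonal(lambda v: H @ v, n, n_probes=50, rng=rng)
rel_err = np.linalg.norm(estimate - true_diag) / np.linalg.norm(true_diag)
print(f"relative error of probed diagonal: {rel_err:.4f}")
```

In this toy setting the estimate improves as the off-diagonal terms weaken or the number of probes grows, which is consistent with the remark that few probes suffice for highly diagonally dominant subblocks.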
Figure 6: Panel (a) shows the stochastic estimations of diagonals of multiparameter Hessian diagonal blocks $H^{diag}_{\alpha\alpha}$, $H^{diag}_{\beta\beta}$ and $H^{diag}_{\rho'\rho'}$; Panel (b) shows the stochastic estimations of diagonals of off-diagonal blocks $H^{diag}_{\alpha\beta}$, $H^{diag}_{\alpha\rho'}$ and $H^{diag}_{\beta\rho'}$.

Figure 7: The first row shows the multiparameter Hessian-vector products ℋαα, ℋαβ and ℋαρ′; the second row shows ℋβα, ℋββ and ℋβρ′; the third row shows ℋρ′α, ℋρ′β and ℋρ′ρ′.

Figure 8: Panel (a) illustrates the sensitivity kernels Kα, Kβ, and Kρ′; Panel (b) shows the correct update kernel Kα↔α and contamination kernels Kβ→α and Kρ′→α; Panel (c) shows the kernels Kβ↔β, Kα→β and Kρ′→β; Panel (d) shows the kernels Kρ′↔ρ′, Kβ→ρ′ and Kα→ρ′.

Figure 11: Panels (a-c) show the well log data comparison at 3.0 km.

To verify our predictions, we calculate the true contamination kernels by applying the multiparameter Hessian to the true model perturbation vectors ∆mα, ∆mβ and ∆mρ′. Figure 8 illustrates the sensitivity kernels, the correct update kernels, and the inter-parameter contamination kernels respectively. The Hessian diagonals shown in Figure 6 and the Hessian-vector products shown in Figure 7 generally reproduce (i.e., predict) the energy distribution and relative strengths of the inter-parameter tradeoffs. The correct update kernel Kα↔α will be decreased by the S-wave contamination kernel. The correct update Kβ↔β exhibits limited contaminations from α and ρ′. The contamination kernel Kβ→ρ′ is ≈ 1.7 times stronger than the correct update kernel Kρ′↔ρ′ for density. The estimated density structures are dominated by contamination from S-wave velocity.

To mitigate the contaminations of S-wave velocity to other parameters, we first update the S-wave velocity with 15 iterations, which gives an estimated model perturbation ∆m̃β, as shown in Figure 9a. The approximated contamination kernels K̃β→α and K̃β→ρ′ constructed with the estimated model perturbation vector are very close to the true contamination kernels Kβ→α and Kβ→ρ′. The second row in Figure 9 shows the new update kernels K̃α, K̃β and K̃ρ′ respectively. The new update kernel K̃ρ′ is very close to the correct update kernel Kρ′↔ρ′ shown in Figure 8. Figures 10a, 10b and 10c show the inverted P-wave, S-wave and density models using the traditional simultaneous inversion strategy. The inverted density model is highly under- or overestimated due to the leakages from S-wave velocity, as indicated by the arrows in Figure 10c. Figures 10d, 10e and 10f show the inverted P-wave, S-wave and density models with approximated contamination kernels. Figure 11 is a well log comparison of the inverted models. The contaminations in the inverted density model are significantly suppressed.
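The kernel correction of equations (14) to (16) can be sketched with small dense matrices standing in for the Hessian blocks. Everything below is a toy assumption for illustration: the random symmetric blocks, the variable names, the neglect of the α–ρ′ coupling, and the noise level of the estimated S-wave perturbation are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

def random_block(scale):
    # Hypothetical symmetric toy block standing in for a Hessian subblock.
    a = rng.standard_normal((n, n))
    return scale * (a @ a.T) / n

H_alpha_alpha = random_block(1.0)
H_alpha_beta = random_block(0.5)   # maps a beta perturbation into the alpha kernel
H_rho_rho = random_block(1.0)
H_rho_beta = random_block(0.8)     # maps a beta perturbation into the rho' kernel

dm_alpha, dm_beta, dm_rho = (rng.standard_normal(n) for _ in range(3))

# Sensitivity kernels K = -H dm: correct update plus the beta contamination
# (alpha <-> rho' coupling is ignored in this sketch).
K_alpha = -(H_alpha_alpha @ dm_alpha + H_alpha_beta @ dm_beta)
K_rho = -(H_rho_rho @ dm_rho + H_rho_beta @ dm_beta)

def corrected(K, H_cross, dm_beta_est):
    K_contamination = -(H_cross @ dm_beta_est)  # equations (14) and (15)
    return K - K_contamination                  # equation (16)

# An imperfect estimate of the S-wave perturbation, as after a few iterations.
dm_beta_est = dm_beta + 0.1 * rng.standard_normal(n)
K_alpha_new = corrected(K_alpha, H_alpha_beta, dm_beta_est)
K_rho_new = corrected(K_rho, H_rho_beta, dm_beta_est)

target_alpha = -(H_alpha_alpha @ dm_alpha)  # correct update kernel for alpha
target_rho = -(H_rho_rho @ dm_rho)          # correct update kernel for rho'
err_before = np.linalg.norm(K_rho - target_rho)
err_after = np.linalg.norm(K_rho_new - target_rho)
print(f"density kernel error before: {err_before:.3f}, after: {err_after:.3f}")
```

With an exact ∆m̃β the correction removes the S-wave contamination completely in this toy problem; with an approximate one the removal is only partial, as the text notes.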
Figure 9: Panel (a) shows the estimated model perturbation ∆m̃β and approximated contamination kernels K̃β→α and K̃β→ρ′; Panel (b) shows the new update kernels K̃α, K̃β and K̃ρ′.

Figure 10: Panels (a-c) show the inverted P-wave, S-wave and density models with simultaneous inversion strategy; Panels (d-f) show the inverted models with approximated contamination kernels.

CONCLUSIONS

We present strategies for quantifying inter-parameter tradeoffs in isotropic-elastic FWI with multiparameter Hessian-vector products. We propose a strategy to reduce cross-talk artifacts with approximated contamination kernels, by applying multiparameter Hessian off-diagonal blocks to estimated model perturbation vectors; this proves effective in removing contaminations in estimated density structures.

ACKNOWLEDGMENTS

This research was supported by the CREWES Consortium and the Natural Sciences and Engineering Research Council of Canada (NSERC, CRDPJ 461179-13). Wenyong Pan is also supported by an SEG/Chevron scholarship and an Eyes High International Doctoral Scholarship. Thanks also to the Tiger cluster (Princeton University) and the Lattice cluster (Compute Canada) for providing a parallel computing environment. Many thanks to Frederik Simons, Andreas Fichtner, Samuel Gray, Daniel Trad, Yu Geng, Junxiao Li, Hassan Khaniani, Youyi Ruan, Wenjie Lei, and Ryan Modrak for valuable discussions.
EDITED REFERENCES
REFERENCES
Alkhalifah, T., and R. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic media:
An analytic parameter resolution study: Geophysics, 79, no. 3, R91–R101,
http://doi.org/10.1190/geo2013-0366.1.
Fichtner, A., and J. Trampert, 2011, Resolution analysis in full waveform inversion: Geophysical Journal
International, 187, 1604–1642, http://doi.org/10.1111/gji.2011.187.issue-3.
Fichtner, A., and T. van Leeuwen, 2015, Resolution analysis by random probing: Journal of Geophysical
Research: Solid Earth, 120, 5549–5573, http://doi.org/10.1002/2015JB012106.
Gholami, Y., R. Brossier, S. Operto, A. Ribodetti, and J. Virieux, 2013, Which parametrization for
acoustic VTI full waveform inversion?-part 1: sensitivity and trade-off analysis: Geophysics, 78,
no. 2, R81–R105, http://doi.org/10.1190/geo2012-0204.1.
Hu, J., G. T. Schuster, and P. A. Valasek, 2001, Poststack migration deconvolution: Geophysics, 66, 939–
952, http://doi.org/10.1190/1.1444984.
Innanen, K. A., 2014, Seismic AVO and the inverse Hessian in precritical reflection full waveform
inversion: Geophysical Journal International, 199, 717–734, http://doi.org/10.1093/gji/ggu291.
Jeong, W., H.-Y. Lee, and D. J. Min, 2011, Full waveform inversion strategy for density in the frequency
domain: Geophysical Journal International, 188, no. 3, 1221–1242,
http://doi.org/10.1111/gji.2012.188.issue-3.
Metivier, L., R. Brossier, S. Operto, and J. Virieux, 2015, Acoustic multi-parameter FWI for the
reconstruction of p-wave velocity, density and attenuation: preconditioned truncated Newton
approach: 85th Annual International Meeting, SEG, Expanded Abstracts, 1198–1203,
Modrak, R., J. Tromp, and Y. O. Yuan, 2016, On the choice of material parameters for elastic full
waveform inversion: 86th Annual International Meeting, SEG, Expanded Abstracts, 1115–1119,
Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Metivier, and J. Virieux, 2013, A guided
tour of multiparameter full waveform inversion with multicomponent data: from theory to
practice: The Leading Edge, 32, 1040–1054, http://doi.org/10.1190/tle32091040.1.
Pan, W., K. A. Innanen, and W. Liao, 2017, Accelerating Hessian-free Gauss-Newton full-waveform
inversion via l-BFGS preconditioned conjugate-gradient algorithm: Geophysics, 82, no. 2, R49–
R64, http://doi.org/10.1190/geo2015-0595.1.
Pan, W., K. A. Innanen, G. F. Margrave, and D. Cao, 2015, Efficient pseudo-Gauss-Newton full-
waveform inversion in the t-p domain: Geophysics, 80, no. 5, R225–R14,
http://doi.org/10.1190/geo2014-0224.1.
Pan, W., K. A. Innanen, G. F. Margrave, M. C. Fehler, X. Fang, and J. Li, 2016, Estimation of elastic
constants for HTI media using Gauss-Newton and full-Newton multiparameter full-waveform
inversion: Geophysics, 81, no. 5, R275–R291, http://doi.org/10.1190/geo2015-0594.1.
Pan, W., G. F. Margrave, and K. A. Innanen, 2014, Iterative modeling migration and inversion (IMMI):
Combining full waveform inversion with standard inversion methodology: 84th Annual
International Meeting, SEG, Expanded Abstracts, 938–943, http://doi.org/10.1190/segam2014-
0402.1.
Podgornova, O., S. Leaney, and L. Liang, 2015, Analysis of resolution limits of VTI anisotropy with full
Sacchi, M. D., J. Wang, and H. Kuehl, 2007, Estimation of the diagonal of the migration blurring kernel
through a stochastic approximation: 77th Annual International Meeting, SEG, Expanded
Abstracts, 2437–2441, http://doi.org/10.1190/1.2792973.
Tang, Y., and S. Lee, 2015, Multi-parameter full wavefield inversion using non-stationary point-spread
functions: 85th Annual International Meeting, SEG, Expanded Abstracts, 1111–1115,
Tarantola, A., 1986, A strategy for nonlinear elastic inversion of seismic reflection data: Geophysics, 51,
1893–1903, http://doi.org/10.1190/1.1442046.
Trampert, J., A. Fichtner, and J. Ritsema, 2013, Resolution tests revisited: the power of random numbers:
Geophysical Journal International, 192, 676–680, http://doi.org/10.1093/gji/ggs057.
Tromp, J., C. Tape, and Q. Liu, 2005, Seismic tomography, adjoint methods, time reversal, and banana-
doughnut kernels: Geophysical Journal International, 160, 195–216,
Valenciano, A. A., B. Biondi, and A. Guitton, 2006, Target-oriented wave-equation inversion:
Geophysics, 71, no. 4, A35–A38, http://doi.org/10.1190/1.2213359.
Yang, J., Y. Liu, and L. Dong, 2016, Simultaneous estimation of velocity and density in acoustic
multiparameter full-waveform inversion using an improved scattering-integral approach:
Geophysics, 81, no. 6, R399–R415, http://doi.org/10.1190/geo2015-0707.1.
Yuan, Y. O., and F. J. Simons, 2014, Multiscale adjoint waveform difference tomography using wavelets:
Geophysics, 79, no. 3, WA79–WA95, http://doi.org/10.1190/geo2013-0383.1.
Yuan, Y. O., F. J. Simons, and E. Bozdağ, 2015, Multiscale adjoint waveform tomography for surface
and body waves: Geophysics, 80, no. 5, R281–R302, http://doi.org/10.1190/geo2014-0461.1.

Elastic reflection waveform inversion with variable density
Yuanyuan Li*, China University of Petroleum(East China), Tariq Alkhalifah, Qiang Guo, King Abdullah
University of Science and Technology, Zhenchun Li, China University of Petroleum(East China)
Summary

Elastic full waveform inversion (FWI) provides a better description of the subsurface than that given by the acoustic assumption. However, it suffers from a more serious cycle skipping problem compared with the latter. Reflection waveform inversion (RWI) provides a method to build a good background model, which can serve as an initial model for elastic FWI. Therefore, we introduce the concept of RWI for elastic media, and propose elastic RWI with variable density. We apply Born modeling to generate the synthetic reflection data by using optimized perturbations of P- and S-wave velocities and density. The inversion for the perturbations in P- and S-wave velocities and density is similar to elastic least-squares reverse time migration (LSRTM). An incorrect initial model will lead to some misfits at the far offsets of reflections; these misfits can thus be utilized to update the background velocity. We optimize the perturbation and background models in a nested approach. Numerical tests on the Marmousi model demonstrate that our method is able to build reasonably good background models for elastic FWI in the absence of low frequencies, and it can deal with the variable density, which is needed in real cases.

Introduction

Elastic waveform inversion aims to estimate elastic model parameters for seismic imaging and interpretation, such as the P- and S-wave velocities and density, by minimizing the misfit between the predicted and observed multicomponent data (Tarantola, 1986; Virieux and Operto, 2009). Although elastic multiparameter inversion can utilize multicomponent data and extract more medium properties compared with acoustic inversion, the nonlinearity of inversion will increase because of the additional degrees of freedom. The parameterization and inversion scheme, which are widely studied, are crucial for the stability of multiparameter inversion (Köhn et al., 2012; Jeong et al., 2012; Operto et al., 2013). Additionally, an initial model, close enough to the true model, is a prerequisite for the convergence to the global minimum in elastic FWI (Symes, 2008). Xu et al. (2012) and Wang et al. (2013), based on the previous work (Mora, 1989; Plessix et al., 1995), proposed a reflection waveform inversion method to construct the background model using reflections generated from migration/demigration. The low wavenumber update of the background model requires a true amplitude migration that can be used to predict the reflections in the migration/demigration process. With a least-squares optimization to the predicted image, Wu and Alkhalifah (2015) proposed a new optimization problem where the background model and perturbations are inverted simultaneously. Subsequently, Guo and Alkhalifah (2016) brought the concept of RWI to elastic media and developed elastic reflection waveform inversion that aims to invert for the long wavelength components of P- and S-wave velocities. However, the variable density as a potential reflector, with little influence on the propagation of waves, cannot be neglected as a perturbation parameter. The absence of perturbation density can cause amplitude errors in predicted reflections and thus expose our RWI to some artifacts. The inverted density in elastic FWI plays the role of absorbing high-wavenumber components mainly on the layer interfaces (Xu and McMechan, 2014). We invert for the perturbations and background models simultaneously, and the inverted model parameters include the perturbations of P- and S-wave velocities and density and the background P- and S-wave velocities.

Based on the Born approximation, we use the objective function proposed by Alkhalifah and Wu (2014). The gradient for the perturbations of P- and S-wave velocities and density and the background part of P- and S-wave velocities can be derived using the adjoint state method (Plessix, 2006). We then establish an elastic reflection waveform inversion scheme with variable density. Finally, numerical tests on the Marmousi model demonstrate that our proposed method has the potential to construct good initial models for elastic FWI in the absence of low frequencies.

Theory

The 2-D elastic wave displacement-stress equation is given as:

$$
\begin{cases}
\rho \dfrac{\partial^2 u_x}{\partial t^2} = \dfrac{\partial \tau_{xx}}{\partial x} + \dfrac{\partial \tau_{xz}}{\partial z} \\[2mm]
\rho \dfrac{\partial^2 u_z}{\partial t^2} = \dfrac{\partial \tau_{xz}}{\partial x} + \dfrac{\partial \tau_{zz}}{\partial z} \\[2mm]
\tau_{xx} = (\lambda + 2\mu)\dfrac{\partial u_x}{\partial x} + \lambda \dfrac{\partial u_z}{\partial z} \\[2mm]
\tau_{zz} = \lambda \dfrac{\partial u_x}{\partial x} + (\lambda + 2\mu)\dfrac{\partial u_z}{\partial z} \\[2mm]
\tau_{xz} = \mu\left(\dfrac{\partial u_z}{\partial x} + \dfrac{\partial u_x}{\partial z}\right)
\end{cases} \tag{1}
$$

where $u_x$ and $u_z$ are displacement components, $\tau_{xx}$, $\tau_{xz}$ and $\tau_{zz}$ are stress components, $\lambda$ and $\mu$ are Lamé parameters, and $\rho$ is the bulk density.

We split the elastic medium parameters into background and perturbation components, which can be written as:
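$$\lambda_{all} = \lambda + \delta\lambda, \qquad \mu_{all} = \mu + \delta\mu, \qquad \rho_{all} = \rho + \delta\rho. \tag{2}$$

Correspondingly, the total wavefield can be defined as the summation of the background and perturbed wavefields:

$$\mathbf{u}_{all} = \mathbf{u} + \delta\mathbf{u}, \qquad \boldsymbol{\tau}_{all} = \boldsymbol{\tau} + \delta\boldsymbol{\tau}. \tag{3}$$

We subtract the wave equation for the background wavefield (eq. 1) from that for the total wavefield, and apply the Born approximation. After further simplification, the perturbed wavefields satisfy the following wave equations:

$$
\begin{cases}
\rho \dfrac{\partial^2 \delta u_x}{\partial t^2} = \dfrac{\partial \delta\tau_{xx}}{\partial x} + \dfrac{\partial \delta\tau_{xz}}{\partial z} - \delta\rho \dfrac{\partial^2 u_x}{\partial t^2} \\[2mm]
\rho \dfrac{\partial^2 \delta u_z}{\partial t^2} = \dfrac{\partial \delta\tau_{xz}}{\partial x} + \dfrac{\partial \delta\tau_{zz}}{\partial z} - \delta\rho \dfrac{\partial^2 u_z}{\partial t^2} \\[2mm]
\delta\tau_{xx} = (\lambda + 2\mu)\dfrac{\partial \delta u_x}{\partial x} + \lambda \dfrac{\partial \delta u_z}{\partial z} + (\delta\lambda + 2\delta\mu)\dfrac{\partial u_x}{\partial x} + \delta\lambda \dfrac{\partial u_z}{\partial z} \\[2mm]
\delta\tau_{zz} = \lambda \dfrac{\partial \delta u_x}{\partial x} + (\lambda + 2\mu)\dfrac{\partial \delta u_z}{\partial z} + (\delta\lambda + 2\delta\mu)\dfrac{\partial u_z}{\partial z} + \delta\lambda \dfrac{\partial u_x}{\partial x} \\[2mm]
\delta\tau_{xz} = \mu\left(\dfrac{\partial \delta u_z}{\partial x} + \dfrac{\partial \delta u_x}{\partial z}\right) + \delta\mu\left(\dfrac{\partial u_z}{\partial x} + \dfrac{\partial u_x}{\partial z}\right)
\end{cases} \tag{4}
$$

The perturbed wavefield refers to the single scattered wavefield, which is produced when the background wavefield encounters the perturbation of medium parameters. That is to say, eq. (4) expresses the demigration operator.

We employ the objective function (Wu and Alkhalifah, 2015):

$$\min_{\mathbf{m},\,\delta\mathbf{m}} O(\mathbf{m}, \delta\mathbf{m}) = \frac{1}{2}\sum_{s,r}\sum_{t}\left\| \mathbf{u} + \delta\mathbf{u} - \mathbf{u}_{obs} \right\|^2, \tag{5}$$

where $\mathbf{u}_{obs}$ is the observed multicomponent data, and $\mathbf{m} = (\lambda, \mu, \rho)$ and $\delta\mathbf{m} = (\delta\lambda, \delta\mu, \delta\rho)$ are background and perturbed model parameters, respectively. The background and perturbed model parameters are inverted simultaneously to minimize the misfit between the modeled and observed data. Specifically, we update the model perturbations to fit the near-offset data well. Thus, when the background models are closer to the true models, the residuals at the far offsets will then be used to further correct the background models.

Given the objective function, we deduce the gradient of the functional with respect to model perturbations using the adjoint state method (Plessix, 2006):

$$
\begin{aligned}
\nabla O_{\delta\lambda} &= -\sum_s \int_0^T \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_z}{\partial z}\right)\left(\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial u_z^\dagger}{\partial z}\right) dt, \\
\nabla O_{\delta\mu} &= -\sum_s \int_0^T \left[ 2\left(\frac{\partial u_x}{\partial x}\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial u_z}{\partial z}\frac{\partial u_z^\dagger}{\partial z}\right) + \left(\frac{\partial u_z}{\partial x} + \frac{\partial u_x}{\partial z}\right)\left(\frac{\partial u_z^\dagger}{\partial x} + \frac{\partial u_x^\dagger}{\partial z}\right) \right] dt, \\
\nabla O_{\delta\rho} &= \sum_s \int_0^T \left(\frac{\partial u_x}{\partial t}\frac{\partial u_x^\dagger}{\partial t} + \frac{\partial u_z}{\partial t}\frac{\partial u_z^\dagger}{\partial t}\right) dt,
\end{aligned} \tag{6}
$$

and the gradient for the background model:

$$
\begin{aligned}
\nabla O_{\lambda} = -\sum_s \Bigg[ &\int_0^T \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_z}{\partial z}\right)\left(\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial u_z^\dagger}{\partial z}\right) dt
+ \int_0^T \left(\frac{\partial \delta u_x}{\partial x} + \frac{\partial \delta u_z}{\partial z}\right)\left(\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial u_z^\dagger}{\partial z}\right) dt \\
&+ \int_0^T \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_z}{\partial z}\right)\left(\frac{\partial \delta u_x^\dagger}{\partial x} + \frac{\partial \delta u_z^\dagger}{\partial z}\right) dt \Bigg], \\
\nabla O_{\mu} = -\sum_s \Bigg[ &\int_0^T \left\{ 2\left(\frac{\partial u_x}{\partial x}\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial u_z}{\partial z}\frac{\partial u_z^\dagger}{\partial z}\right) + \left(\frac{\partial u_z}{\partial x} + \frac{\partial u_x}{\partial z}\right)\left(\frac{\partial u_z^\dagger}{\partial x} + \frac{\partial u_x^\dagger}{\partial z}\right) \right\} dt \\
&+ \int_0^T \left\{ 2\left(\frac{\partial \delta u_x}{\partial x}\frac{\partial u_x^\dagger}{\partial x} + \frac{\partial \delta u_z}{\partial z}\frac{\partial u_z^\dagger}{\partial z}\right) + \left(\frac{\partial \delta u_z}{\partial x} + \frac{\partial \delta u_x}{\partial z}\right)\left(\frac{\partial u_z^\dagger}{\partial x} + \frac{\partial u_x^\dagger}{\partial z}\right) \right\} dt \\
&+ \int_0^T \left\{ 2\left(\frac{\partial u_x}{\partial x}\frac{\partial \delta u_x^\dagger}{\partial x} + \frac{\partial u_z}{\partial z}\frac{\partial \delta u_z^\dagger}{\partial z}\right) + \left(\frac{\partial u_z}{\partial x} + \frac{\partial u_x}{\partial z}\right)\left(\frac{\partial \delta u_z^\dagger}{\partial x} + \frac{\partial \delta u_x^\dagger}{\partial z}\right) \right\} dt \Bigg], \\
\nabla O_{\rho} = \sum_s \Bigg[ &\int_0^T \left(\frac{\partial u_x}{\partial t}\frac{\partial u_x^\dagger}{\partial t} + \frac{\partial u_z}{\partial t}\frac{\partial u_z^\dagger}{\partial t}\right) dt
+ \int_0^T \left(\frac{\partial \delta u_x}{\partial t}\frac{\partial u_x^\dagger}{\partial t} + \frac{\partial \delta u_z}{\partial t}\frac{\partial u_z^\dagger}{\partial t}\right) dt \\
&+ \int_0^T \left(\frac{\partial u_x}{\partial t}\frac{\partial \delta u_x^\dagger}{\partial t} + \frac{\partial u_z}{\partial t}\frac{\partial \delta u_z^\dagger}{\partial t}\right) dt \Bigg],
\end{aligned} \tag{7}
$$

where $\mathbf{u}^\dagger$ is the wavefield propagating backward in time generated from the residual data at the receiver positions, and $\delta\mathbf{u}^\dagger$ is the perturbed wavefield propagating backward in time, generated from the wavefield $\mathbf{u}^\dagger$ at the model perturbations.

The trade-off between different parameters is inevitable in the multiparameter inversion. We should select a proper parameterization to suppress the crosstalk, especially for the important parameters that we care about. Here, we use the velocity parameterization; the investigation of other parameterizations is left to future study. We can obtain the gradients with respect to P- and S-wave velocities and density by using the chain rule, given by:

$$
\begin{aligned}
\nabla O_{v_p} &= 2\rho v_p \nabla O_{\lambda}, \\
\nabla O_{v_s} &= -4\rho v_s \nabla O_{\lambda} + 2\rho v_s \nabla O_{\mu}, \\
\nabla O_{\rho} &= \left(v_p^2 - 2v_s^2\right)\nabla O_{\lambda} + v_s^2 \nabla O_{\mu} + \nabla O_{\rho},
\end{aligned} \tag{8}
$$

where the $\nabla O_{\rho}$ on the right-hand side is the density gradient in the $(\lambda, \mu, \rho)$ parameterization. Considering that the stable update of the background model needs an accurate perturbation image, the perturbations of P- and S-wave velocities and density are simultaneously optimized in the elastic least-squares migration to produce the reflections. The background parts of P- and S-wave velocities are updated iteratively in the elastic RWI.

Examples

We test our method on the elastic Marmousi model. We add a constant-velocity layer (we keep it fixed through the inversion) above the original model to reduce the source near-field effect. The true P-wave velocity model is shown in Figure 1. The true S-wave velocity model is built with a Poisson's ratio of 0.25. The true density model is constructed based on the Gardner formula (Gardner et al., 1974). The explosive source, a Ricker wavelet with peak frequency of 5 Hz after filtering out frequencies below 2.5 Hz, is used in synthesizing the wavefields. We fire 47 evenly distributed shots on the surface. The initial P-wave velocity is shown in Figure 2. Similarly, we build the initial S-wave velocity and density. Given the initial models, we first update the model perturbations (δvp, δvs, δρ)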
simultaneously using the near-offset residuals. In fact, the optimization of model perturbations is equivalent to elastic least-squares reverse time migration, although with an inaccurate velocity. The optimized model perturbations (i.e., δvp, δvs and δρ) are shown in Figure 3. The inverted background P- and S-wave velocities (i.e., vp and vs) are shown in Figure 4a and 4b, and the optimal δvp, δvs and δρ for the inverted background models are shown in Figure 5a, 5b and 5c, respectively. Most of the reflections and scattered waves are focused and located at the accurate positions. The quality of the image is considerably improved compared to the first iteration. Adding the model perturbations to the background models, we obtain more exact models, which can be used as initial models for the subsequent conventional elastic FWI; the results are shown in Figure 6. For comparison, we apply conventional elastic FWI starting from the initial models shown in Figure 2, and obtain the inverted results shown in Figure 7. In addition, we also implement elastic RWI in the case that the density perturbation is not considered, that is to say, δρ = 0. The inverted background P- and S-wave velocities shown in Figure 8 contain more high-wavenumber artifacts, compared to the inverted background models using our approach (Figure 4). The comparisons of P- and S-wave velocity profiles at location 3 km are shown in Figure 9. As we can see from these figures, our ERWI approach can deal with the variable-density case, and provide a good starting model for conventional EFWI to avoid cycle skipping even in the absence of the low frequencies below 2.5 Hz.

Figure 1: The true P-wave velocity.

Figure 2: The initial P-wave velocity.

Figure 3: The inverted model perturbations δvp (a), δvs (b) and δρ (c) for the initial models.

Figure 4: The inverted background models vp (a) and vs (b).

Figure 5: The inverted model perturbations δvp (a), δvs (b) and δρ (c) for the background models shown in Figure 4.
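The construction of the true S-wave velocity and density models described in the Examples section (a Poisson's ratio of 0.25 and the Gardner formula) can be sketched as follows. The Gardner coefficients 0.31 and 0.25 (velocity in m/s, density in g/cm³) are the commonly quoted values and are an assumption here, since the text does not state which constants were used; the function names and the sample velocities are likewise illustrative.

```python
import math

def vs_from_poisson(vp, nu=0.25):
    """S-wave velocity from P-wave velocity for a given Poisson's ratio.

    Uses (vp / vs)^2 = 2 (1 - nu) / (1 - 2 nu); for nu = 0.25 this gives
    vs = vp / sqrt(3).
    """
    return vp * math.sqrt((1.0 - 2.0 * nu) / (2.0 * (1.0 - nu)))

def gardner_density(vp, a=0.31, b=0.25):
    """Gardner-type density in g/cm^3 from vp in m/s.

    The coefficients a and b are assumed, commonly quoted values.
    """
    return a * vp ** b

# Hypothetical P-wave velocities (m/s), for illustration only.
vp_samples = [1500.0, 2000.0, 3000.0, 4500.0]
for vp in vp_samples:
    vs = vs_from_poisson(vp)
    rho = gardner_density(vp)
    print(f"vp = {vp:7.1f} m/s  vs = {vs:7.1f} m/s  rho = {rho:5.3f} g/cm^3")
```

Tying both the S-wave velocity and the density to the P-wave velocity in this way gives three models that share the same structure.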
Figure 6: The elastic FWI results starting from the background models (Figure 4) with the inverted perturbations (Figure 5) added to them: (a) inverted vp and (b) inverted vs.

Figure 8: The inverted background models vp (a) and vs (b) in the case that the density perturbation is not inverted.

Figure 9: The comparison of vp (a) and vs (b) profiles at the distance of 3 km. (Red: true velocity, cyan: initial velocity, green: the inverted background velocity, blue: the elastic FWI result starting from the inverted models, dashed black: conventional EFWI starting from the initial models shown in Figure 2).
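The chain rule in equation (8), which converts gradients from the (λ, μ, ρ) parameterization to the (vp, vs, ρ) parameterization, can be checked numerically with a small sketch. The toy objective below is a hypothetical smooth function chosen only so that its gradients are known in closed form; it is not the misfit of equation (5), and all function names are illustrative.

```python
def lame_from_velocity(vp, vs, rho):
    # lambda = rho (vp^2 - 2 vs^2), mu = rho vs^2
    lam = rho * (vp ** 2 - 2.0 * vs ** 2)
    mu = rho * vs ** 2
    return lam, mu

def toy_objective(lam, mu, rho):
    # Hypothetical smooth function of (lambda, mu, rho), used only for checking.
    return 1.0e-9 * lam + 0.5e-18 * mu ** 2 + 1.0e-3 * rho

def toy_gradients(lam, mu, rho):
    # Analytic partial derivatives of toy_objective.
    return 1.0e-9, 1.0e-18 * mu, 1.0e-3

def velocity_gradients(g_lam, g_mu, g_rho, vp, vs, rho):
    # Chain rule of equation (8).
    g_vp = 2.0 * rho * vp * g_lam
    g_vs = -4.0 * rho * vs * g_lam + 2.0 * rho * vs * g_mu
    g_rho_v = (vp ** 2 - 2.0 * vs ** 2) * g_lam + vs ** 2 * g_mu + g_rho
    return g_vp, g_vs, g_rho_v

def objective_in_velocity(vp, vs, rho):
    lam, mu = lame_from_velocity(vp, vs, rho)
    return toy_objective(lam, mu, rho)

vp, vs, rho = 3000.0, 1700.0, 2200.0
lam, mu = lame_from_velocity(vp, vs, rho)
g_vp, g_vs, g_rho = velocity_gradients(*toy_gradients(lam, mu, rho), vp, vs, rho)

# Central finite differences in the velocity parameterization.
h = 1.0e-2
fd_vp = (objective_in_velocity(vp + h, vs, rho) - objective_in_velocity(vp - h, vs, rho)) / (2.0 * h)
fd_vs = (objective_in_velocity(vp, vs + h, rho) - objective_in_velocity(vp, vs - h, rho)) / (2.0 * h)
fd_rho = (objective_in_velocity(vp, vs, rho + h) - objective_in_velocity(vp, vs, rho - h)) / (2.0 * h)

print("chain rule        :", g_vp, g_vs, g_rho)
print("finite differences:", fd_vp, fd_vs, fd_rho)
```

The same conversion applies pointwise to gradient images, so in practice it would be evaluated on arrays rather than on scalars.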
Figure 7: The elastic FWI results starting from the initial models shown in Figure 2. Inverted vp (a) and inverted vs (b).

Conclusions

We proposed an elastic RWI method with variable density. The background and perturbation components of elastic medium parameters are updated in a nested way. Considering that the density influences the amplitudes of reflections, the perturbation components of density and velocities are optimized simultaneously to better fit the near-offset reflections. The inclusion of density helps mitigate high wavenumber artifacts from our RWI results. Numerical tests demonstrate that our approach can better recover the background P- and S-wave velocities, which can accurately describe the kinematic information. Using the background models, the elastic LSRTM can be implemented effectively. In addition, high-resolution models are obtained when the model perturbations are added to the background models, and enable elastic FWI to converge to a more accurate result with fewer iterations.

Acknowledgments

We thank KAUST for its support and SWAG for the collaborative environment. The authors would like to thank the China Scholarship Council for supporting the study. This research is also sponsored by Fundamental Research Funds for Central Universities (16CX06039A).

EDITED REFERENCES
REFERENCES
Alkhalifah, T., and Z. Wu, 2014, FWI and MVA the natural way: 76th Annual International Conference
and Exhibition, EAGE, Extended Abstracts, WeE10612, http://doi.org/10.3997/2214-
4609.20141091.
Gardner, G. H. F., L. W. Gardner, and A. R. Gregory, 1974, Formation velocity and density — The
diagnostic basics for stratigraphic traps: Geophysics, 39, 770–780,
http://doi.org/10.1190/1.1440465.
Guo, Q., and T. Alkhalifah, 2016, A nonlinear approach of elastic reflection waveform inversion: 86th
Annual International Meeting, SEG, Expanded Abstracts, http://doi.org/10.1190/segam2016-
13865457.1.
Jeong, W., H.-Y. Lee, and D.-J. Min, 2012, Full waveform inversion strategy for density in the frequency
domain: Geophysical Journal International, 188, 1221–1242, https://doi.org/10.1111/j.1365-
246X.2011.05314.x.
Köhn, D., D. De Nil, A. Kurzmann, A. Przebindowska, and T. Bohlen, 2012, On the influence of model
parametrization in elastic full waveform tomography: Geophysical Journal International, 191,
325–345, https://doi.org/10.1111/j.1365-246X.2012.05633.x.
http://doi.org/10.1190/1.1442625.
Operto, S., Y. Gholami, V. Prieux, A. Ribodetti, R. Brossier, L. Métivier, and J. Virieux, 2013, A guided
tour of multi-parameter full waveform inversion with multi-component data: From theory to
Plessix, R., 2006, A review of the adjoint-state method for computing the gradient of a functional with
http://doi.org/10.1111/j.1365-246X.2006.02978.x.
Plessix, R., Y. D. Roeck, and G. Chavent, 1995, Automatic and simultaneous migration velocity analysis
and waveform inversion of real data using a MBTT/WKB J formulation: 65th Annual
International Meeting, SEG, Expanded Abstracts, http://doi.org/10.1190/1.1887624.
Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56,
765–790, http://doi.org/10.1111/j.1365-2478.2008.00698.x.
1893–1903, http://doi.org/10.1190/1.1442046.
Wang, S., F. Chen, H. Zhang, and Y. Shen, 2013, Reflection-based full waveform inversion (RFWI) in
the frequency domain: 83rd Annual International Meeting, SEG, Expanded Abstracts, 877–881,
Wu, Z., and T. Alkhalifah, 2015b, Simultaneous inversion of the background velocity and the
perturbation in full-waveform inversion: Geophysics, 80, no. 6, R317–R329,
http://doi.org/10.1190/geo2014-0365.1.
Xu, K., and G. A. McMechan, 2014, 2D frequency-domain elastic full-waveform inversion using time-
domain modeling and a multistep-length gradient approach: Geophysics, 79, no. 6, R41–R53,
http://doi.org/10.1190/geo2013-0134.1.

Xu, S., D. Wang, F. Chen, G. Lambare, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Annual International Meeting, SEG, Expanded Abstracts,
http://doi.org/10.1190/segam20121473.1.

Multiparameter Elastic Full Waveform Inversion With Facies Constraints
Zhen-dong Zhang∗ , Tariq Alkhalifah∗ and Ehsan Zabihi Naeini †
∗ King Abdullah University of Science and Technology.
† Ikon Science.
SUMMARY

Full waveform inversion (FWI) aims to fully benefit from all the data characteristics to estimate the parameters describing the assumed physics of the subsurface. However, current efforts to utilize full waveform inversion as a tool beyond acoustic imaging applications, for example in reservoir analysis, face inherent challenges related to the limited resolution and the potential trade-off between the elastic model parameters. Adding rock physics constraints does help to mitigate these issues, but current approaches to add such constraints are based on including them as a priori knowledge mostly valid around the well or as a boundary condition for the whole area. Since certain rock formations inside the Earth admit consistent elastic properties and relative values of elastic and anisotropic parameters (facies), utilizing such localized facies information in FWI can improve the resolution of inverted parameters. We propose a novel confidence map based approach to utilize the facies-based constraints in both isotropic and anisotropic elastic FWI. We invert for such a confidence map using Bayesian theory, in which the confidence map is updated at each iteration of the inversion using both the inverted models and a priori information. The numerical examples show that the proposed method can reduce the trade-offs and also can improve the resolution of the inverted elastic and anisotropic properties.

INTRODUCTION

Uncertainties in inversion results are usually related to the forward modeling engine and the assumptions behind it (Tarantola, 2005). For example, we assume that the uncertainties in the observed data have normal distributions and are independent of such uncertainties. However, the uncertainties caused by the simulation engine can be reduced by considering more accurate physics to describe the earth response. To better simulate wave propagation, depending on the efficiency and accuracy requirements, elastic and anisotropic elastic wave equations are used as the forward modeling engine in full waveform inversion. With the current improvements in computational power, solving these complex equations is becoming more and more practical. However, more complex physics requires more parameters to describe the real earth and inevitably introduces more null space.

Estimating elastic and anisotropic parameters is an ongoing topic of interest for the seismic exploration community. In current practice, due to an inherent trade-off between parameters, for example, P-wave velocity and density, the density model is not usually updated at all to reduce the nonlinearity (null space) of the inversion (Köhn et al., 2012). However, to get a better understanding of the subsurface, a more accurate multiparameter inversion is necessary. Other common ways to reduce the null space in multiparameter inversion are the choice of a better parametrization (Operto et al., 2013; Alkhalifah and Plessix, 2014; Oh and Alkhalifah, 2016) and the incorporation of a priori information to constrain the inversion. Utilizing a priori information in the form of preconditioning or regularization has been shown to efficiently reduce the null space (Asnaashari et al., 2013).

As opposed to deterministic logic, Bayesian theory provides a natural platform to incorporate available prior information of the model for many geophysical inversion problems (Buland and Omre, 2003). In classic AVO inversion, a more advanced type of constraint, namely per-facies rock physics constraints, has been proved to be very effective in optimizing seismic inversion (Zabihi Naeini and Exley, 2017). Zabihi Naeini et al. (2016) discussed the main components of FWI as a potential reservoir characterization tool and one of their suggestions was to use facies-based rock physics constraints in FWI. Later, Zhang et al. (2017) applied a simplified version of Bayesian inversion to classify the facies and then utilized the inverted facies as constraints in isotropic elastic FWI. In this study, we extend the previous work to both isotropic and anisotropic elastic FWI using the full Bayesian inversion framework. We assume that the inverted models adhere to a Gaussian distribution (Tarantola, 2005) and, iteratively, based on the prior information, develop and update a facies confidence map, which in turn is used as a regularization term in the inversion. We test the proposed method on both isotropic and anisotropic elastic models.

THEORY

Elastic FWI With Regularization

Our proposed misfit function contains a standard data misfit term, a smoothed Total Variation (TV) regularization term and a facies-based regularization term, as follows

$$J(m) = J_d(m) + \alpha J_{TV}(m) + \beta J_{prior}(m), \quad (1)$$

where α and β control the contribution from the penalty terms, and m denotes a vector of model parameters.

The standard data misfit is given by

$$J_d = \| W_d (d_{pre} - d_{obs}) \|^2, \quad (2)$$

where d, with the corresponding subscripts, denotes the vectors of multicomponent data, and $W_d$ is a weighting operator applied to the data, $W_d = \sigma_d I$. Here, $\sigma_d$ is the standard deviation of the predicted data.
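To make the structure of equations 1 and 2 concrete, the following minimal Python sketch evaluates the weighted data misfit and then combines it with the two penalty terms. It is an illustration under stated assumptions, not the authors' implementation: the function names `data_misfit` and `total_misfit` are hypothetical, the data are plain lists of numbers, and the TV and prior penalties are passed in as already-computed scalars.

```python
def data_misfit(d_pre, d_obs, sigma_d):
    """J_d = || W_d (d_pre - d_obs) ||^2 with W_d = sigma_d * I (equation 2)."""
    if len(d_pre) != len(d_obs):
        raise ValueError("predicted and observed data must have equal length")
    # W_d is a scaled identity, so each residual sample is multiplied by sigma_d.
    return sum((sigma_d * (p - o)) ** 2 for p, o in zip(d_pre, d_obs))


def total_misfit(j_d, j_tv, j_prior, alpha, beta):
    """J = J_d + alpha * J_TV + beta * J_prior (equation 1)."""
    return j_d + alpha * j_tv + beta * j_prior


# Toy numbers, chosen only for illustration (not taken from the paper).
j_d = data_misfit([1.0, 2.0], [0.0, 0.0], sigma_d=2.0)
j_total = total_misfit(j_d, j_tv=1.0, j_prior=2.0, alpha=0.5, beta=0.25)
```

Keeping the three terms separate in this way makes it easy to monitor how much each penalty contributes as α and β are tuned.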

We use a smoothed TV as a penalty in the objective function, as follows

$$J_{TV} = \int \sqrt{\varepsilon^2 + \| \nabla m \|^2}\, dx, \quad (3)$$

where ε mitigates the singularities in the gradient.

The last term in equation 1 utilizes a priori information given by the facies constraint, as a penalty, as follows

$$J_{prior} = \| W_m (m_{inv} - m_c) \|^2. \quad (4)$$

Similarly, $W_m$ is a diagonal matrix, $m_{inv}$ denotes the inverted model in each iteration, and $m_c$ is the calculated confidence map, which depends on both the inversion results and the prior information, as we will see below. Usually we only have general information about the facies without knowing their spatial distributions. Thus, we use Bayesian inversion to estimate their spatial distributions and then use them as constraints in the inversion.

The gradient of the objective function is written as

$$g = \left( \frac{\partial d_{pre}}{\partial m} \right)^T W_d^T W_d (d_{pre} - d_{obs}) + \alpha\, \mathrm{div}\left( \frac{\nabla m}{\sqrt{\varepsilon^2 + \| \nabla m \|^2}} \right) + \beta W_m^T W_m (m_{inv} - m_c), \quad (5)$$

where the weighting factors α and β are decided by several trials.

Bayesian Inversion for Confidence Maps

Seismic facies are classified based on similarities in any observable attribute of rocks such as elastic properties, connectivity and overall appearance over a geological area. Facies can, therefore, provide rock physics relationships, which can be utilized as constraints in the inversion. This is a key feature (i.e. rock physics constraints per facies) as opposed to assuming only one relationship over the entire area (Kemper and Gunning, 2014; Zabihi Naeini and Exley, 2017).

With a probabilistic type of inversion in mind, quantitative FWI can be analyzed from a statistical perspective. Specifically, we use Bayesian inversion to calculate a confidence map from quantitative inversion and a priori knowledge. The first step of Bayesian inversion is to find the maximum likelihood, which is

$$P(f|m) = \frac{P(f) \cdot P(m|f)}{\int P(m|f) P(f)}, \quad (6)$$

where · denotes element-wise multiplication, m is the vector of the inverted model from conventional least-squares fitting, and f denotes the vector of given facies.

To be consistent with the least-squares criterion, the Gaussian model is used to describe the uncertainties in the model space (Tarantola, 2005):

$$P(m|f) = \exp(-\gamma (m - f) \odot (m - f)), \quad (7)$$

where γ controls the resolution of the estimated confidence map and ⊙ denotes element-wise multiplication. The solution of the Bayesian inverse problem is then represented by its posterior expectation.

NUMERICAL EXAMPLE

We will start by testing the approach on an isotropic model, inverting for 3 parameters, followed by an anisotropic example with 6 parameters. All the examples are 3D models obtained by spreading a 2D slice along the third axis.

Isotropic Example

We consider a layered model with six facies with nine normally distributed observation errors in each of them as shown in Figure 1. A staggered finite-difference method is used to solve the elastic equation with an absorbing boundary layer condition. The model size is 1.75 km by 2.5 km with 50 explosive point sources distributed evenly on the surface shown in Figure 2. The recorded seismic data are multi-component particle velocities. The initial model shown in Figure 3 is a smoothed version of the true model obtained by applying a smoothing operator of length 200 m.

Figure 1: Six facies with normal observation errors. Each color denotes the corresponding facies group. There are nine observations with normally distributed errors for each facies. They are used as the probability P(f) in equation 6. Panels: S-wave velocity (km/s) versus P-wave velocity (km/s), and density (g/cm3) versus P-wave velocity (km/s); legend: face1 to face6.

Figure 2: True models. P-wave velocity (left), S-wave velocity (middle) and density (right).

Figure 3: Initial models. P-wave velocity (left), S-wave velocity (middle) and density (right).

Five frequency bands are used in the inversion, which are 2-7

Hz, 2-10 Hz, 2-13 Hz, 2-16 Hz and 2-19 Hz, sequentially. We also add random noise to the 2 Hz low-cut filtered data to form the observed data. We conduct a standard elastic full waveform inversion, for reference, as shown in Figure 4 to compare with the results of our proposed method shown in Figure 5. To assist with the visual comparison, a vertical profile in the middle of the model is also plotted in Figures 6 and 7. The standard method overestimates the S-wave velocities and there is a relatively strong crosstalk between P-wave velocity and density, as expected. However, the proposed facies-based method can recover most of the layers correctly, and at reasonably high resolution. Figure 8 shows the observed data, the predicted data and the data residual for the proposed method at 2-13 Hz, respectively.

Figure 4: Standard elastic FWI results. P-wave (left), S-wave (middle) velocities and density (right).

Figure 5: Proposed elastic FWI results. P-wave (left), S-wave (middle) velocities and density (right). An arrow marks an edge effect.

Figure 6: One vertical profile of the conventional method. P-wave (left), S-wave (middle) velocities and density (right). Cyan: true model; Green: initial model; Pink: standard elastic FWI.

Figure 7: One vertical profile of our proposed method. P-wave velocity (left), S-wave velocity (middle) and density (right). Cyan: true model; Green: initial model; Red: proposed method.

Figure 8: Observed data with random noise at 2-19 Hz (left); predicted data of the proposed inversion result (middle); data residual (right).

Anisotropic Example

We extend the isotropic model used in the first example to a transversely isotropic model with a vertical symmetry axis (VTI). The assumed reservoir (at depth 1.2 km) is surrounded by VTI layers (from 1 km to 1.45 km in depth). Facies are not listed in the example but they are constructed to have a similar behavior (certain relationships with Vp) to the facies used in the first example (e.g. Figure 1). Six parameters (Vp, Vs, ρ, Vn, Vh and Vsh) are inverted simultaneously. The configuration of the acquisition system is similar to that of the first example. Four frequency bands, 2-5 Hz, 2-7 Hz, 2-10 Hz and 2-16 Hz, are utilized in the multistage approach. Again we add random noise to the observed data. Figure 9 shows the actual model used in the example; note that the potential reservoir is surrounded by VTI layers. The initial model shown in Figure 10 is a smoothed version of the true model, which has the same smoothing radius used in the isotropic case. For comparison, the inverted models from conventional elastic FWI and the proposed method are shown in Figures 11 and 12, respectively. Vertical profiles from the conventional (Figure 13) and the proposed (Figure 14) method inverted models are also plotted. Note that the anisotropic parameters are recovered insufficiently compared to the elastic parameters through the conventional anisotropic elastic FWI approach. However, the proposed statistical approach can improve the inverted anisotropic parameters. The observed data, the predicted data and the corresponding data residual (2-16 Hz) are shown in Figure 15. The inverted model can predict most of the coherent events in the observed data.

Figure 9: Actual models used in the VTI example. Panels: Vp, Vs, ρ, Vn, Vh and Vsh.
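Both examples rely on the facies confidence map of equations 6 and 7 as the target $m_c$ of the prior term. The sketch below shows one way such a map could be formed for a single model parameter. It is a simplified illustration under several assumptions that are not stated in the paper: the facies are represented by one scalar value each, the integral in equation 6 is replaced by a sum over this discrete facies list, and the confidence value in each cell is taken to be the posterior expectation of the facies values. All function names are hypothetical.

```python
import math


def facies_posterior(m, facies, prior, gamma):
    """Posterior P(f_k | m) for one cell, following equations 6 and 7."""
    # Gaussian likelihood exp(-gamma * (m - f_k)^2) weighted by the prior P(f_k).
    weights = [p * math.exp(-gamma * (m - f) ** 2) for f, p in zip(facies, prior)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all likelihoods underflowed; reduce gamma")
    return [w / total for w in weights]


def confidence_value(m, facies, prior, gamma):
    """Posterior expectation of the facies values for one cell (assumed form of m_c)."""
    post = facies_posterior(m, facies, prior, gamma)
    return sum(p * f for p, f in zip(post, facies))


def confidence_map(model, facies, prior, gamma):
    """Apply the per-cell estimate to every cell of a flattened model."""
    return [confidence_value(m, facies, prior, gamma) for m in model]


# Two hypothetical facies (values in km/s) with equal prior probability.
m_c = confidence_map([2.0, 2.5, 3.0], [2.0, 3.0], [0.5, 0.5], gamma=50.0)
```

In this sketch a large γ pulls each cell toward the nearest facies value, whereas γ = 0 returns the prior mean everywhere, which is consistent with γ controlling the resolution of the confidence map.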

Figure 10: Initial models used in the VTI example. Panels: Vp, Vs, ρ, Vn, Vh and Vsh.

Figure 11: Inverted models from conventional elastic FWI. Panels: Vp, Vs, ρ, Vn, Vh and Vsh.

Figure 12: Inverted models from the proposed method. Panels: Vp, Vs, ρ, Vn, Vh and Vsh.

Figure 13: Vertical profiles from the conventional method. Cyan: true models; Green: initial models; Pink: inverted models.

Figure 14: Vertical profiles from the proposed method. Cyan: true models; Green: initial models; Red: inverted models.

Figure 15: Observed data with random noise at 2-16 Hz (left); predicted data of the proposed inversion result (middle); data residual (right).

CONCLUSION

We proposed a novel approach to utilize facies-dependent a priori information to constrain the multiparameter elastic FWI. A statistically driven confidence map is calculated and iteratively updated based on the inversion results and the priors. It is consistent with the framework of the local optimization method, which has the assumption of a Gaussian distribution for both model and data uncertainties. The numerical examples show that the proposed method can suppress the cross-talk between different parameters and also can improve the resolution of the estimated anisotropic properties. However, edge effects and other artifacts can degrade the results as the proposed updates depend on the first-pass inversion results. The proposed method can be easily incorporated into the general inversion framework.

ACKNOWLEDGMENTS

We thank Juwon Oh, Bingbing Sun, Vladimir Kazei and Yike Liu (IGG, CAS) for their helpful discussions. For computer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia.

EDITED REFERENCES
REFERENCES
Alkhalifah, T., and R.-E. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic
media: An analytical parameter resolution study: Geophysics, 79, no. 3, R91–R101,
http://doi.org/10.1190/geo2013-0366.1.
Asnaashari, A., R. Brossier, S. Garambois, F. Audebert, P. Thore, and J. Virieux, 2013, Regularized
seismic full waveform inversion with prior model information: Geophysics, 78, no. 2, R25–R36,
http://doi.org/10.1190/geo2012-0104.1.
Buland, A., and H. Omre, 2003, Bayesian linearized AVO inversion: Geophysics, 68, 185–198,
http://doi.org/10.1190/1.1543206.
Kemper, M., and J. Gunning, 2014, Joint impedance and facies inversion–seismic inversion redefined:
First Break, 32, 89–95.
Köhn, D., D. De Nil, A. Kurzmann, A. Przebindowska, and T. Bohlen, 2012, On the influence of model
parametrization in elastic full waveform tomography: Geophysical Journal International, 191,
325–345, http://doi.org/10.1111/gji.2012.191.issue-1.
Oh, J.-W., and T. Alkhalifah, 2016, Elastic orthorhombic anisotropic parameter inversion: An analysis of
parameterization: Geophysics, 81, no. 6, C279–C293, http://doi.org/10.1190/geo2015-0656.1.
tour of multiparameter full-waveform inversion with multicomponent data: From theory to
Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: SIAM.
Zabihi Naeini, E., T. Alkhalifah, I. Tsvankin, N. Kamath, and J. Cheng, 2016, Main components of full-
waveform inversion for reservoir characterization: First Break, 34, 37–48,
http://doi.org/10.3997/1365-2397.2016015.
Zabihi Naeini, E., and R. Exley, 2017, Quantitative interpretation using facies-based seismic inversion:
Interpretation, 5, SL1–SL8, http://doi.org/10.1190/INT-2016-0178.1.
Zhang, Z.-D., E. Zabihi Naeini, and T. Alkhalifah, 2017, Facies constrained elastic full waveform
inversion: 79th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
https://doi.org/10.3997/2214-4609.201700719.

3D elastic full waveform inversion for OBC data using the P-wave excitation amplitude
Ju-Won Oh*, Mahesh Kalita and Tariq Alkhalifah, King Abdullah University of Science and Technology
Summary

We suggest a fast and efficient 3D elastic full waveform inversion (FWI) algorithm based on the excitation amplitude (maximum energy arrival) of the P-wave in the source wavefield. It evaluates the gradient direction significantly faster than its conventional counterpart. In addition, it removes the long-wavelength artifacts from the gradient, which often originate from the SS correlation process. Owing to these advantages, the excitation approach offers faster convergence not only for the S-wave velocity, but also for the entire process of multi-parameter inversion, compared to the conventional FWI. The feasibility of the proposed method is demonstrated on the synthetic Marmousi model and on real OBC data from the North Sea.

Introduction

Multi-parameter elastic FWI is at the forefront of our efforts to recover sub-surface information from measured seismic data (Operto et al., 2013; Alkhalifah et al., 2016). Although promising, it suffers from the endemic problems of a large computational cost and a large null space in its quest to recover the unknowns, despite accommodating better physics into the inversion framework (Tarantola, 1986; Brossier et al., 2009; Prieux et al., 2013; Vigh et al., 2014). As usual, trade-offs exist between the computational advantages and the accuracy of the physics. To ameliorate the former, numerous solutions exist, including boundary saving schemes (Raknes and Weibull, 2016), time-frequency domain approaches (Sirgue et al., 2008) and wavefield compression techniques (Boehm et al., 2016). Alternatively, Kalita and Alkhalifah (2016) reduce the computational overhead by approximating the source wavefield with the excitation amplitude (the most energetic arrival) and its arrival time (excitation time) in acoustic FWI. In this abstract, we employ the excitation approach (ExA) in the elastic FWI realm to mitigate both the computational and convergence issues of large-scale 3D problems. We show the versatility of the elastic excitation method on the Marmousi II model and on 3D real data. In addition to the computational advantages, ExA removes long-wavelength artifacts from SS correlations and thus results in a cleaner gradient than its conventional counterpart.

P-wave ExA in elastic FWI

The global correlation norm (Choi and Alkhalifah, 2012) that measures the mismatch between observed ($d_o$) and modeled ($d_m$) data is defined as

$$E = -\sum_{r} \left\{ \hat{d}_m(p) \cdot \hat{d}_o \right\}, \quad (1)$$

where $\hat{d}_m(p) = d_m(p) / \| d_m(p) \|$ and $\hat{d}_o = d_o / \| d_o \|$.

The gradient of this objective function with respect to a model parameter p is given by

$$\frac{\partial E}{\partial p}(x) = \int \left[ \frac{\partial c_{ijkl}}{\partial p}(x)\, \frac{\partial u_l}{\partial x_k}(x,t)\, \frac{\partial u^a_i}{\partial x_j}(x, t_{max} - t) \right] dt, \quad (2)$$

where $c_{ijkl}$ represents the elastic constants as a function of space coordinates x; $u$ and $u^a$ denote the source and adjoint wavefields, respectively. The source wavefield, $u$, is a solution of the elastic wave equation,

$$\rho(x) \frac{\partial^2 u_i}{\partial t^2}(x,t) - \frac{\partial}{\partial x_j} \left( c_{ijkl}(x) \frac{\partial u_l}{\partial x_k}(x,t) \right) = f_i(x_s, t), \quad (3)$$

where ρ and f indicate the density and the seismic source at a position $x_s$. The adjoint wavefield, $u^a$, can be computed by backpropagating the following adjoint source,

$$f^a(x_r, t) = \frac{1}{\| d_m(p) \|} \left\{ \hat{d}_m(p) \left( \hat{d}_m(p) \cdot \hat{d}_o \right) - \hat{d}_o \right\}. \quad (4)$$

Eq. (2) requires both the source and adjoint wavefields to be present for the dot product operation, which is computationally cumbersome, especially in large 3D problems. However, ExA mitigates those requirements using the following steps (Kalita and Alkhalifah, 2016):
(I) computation of the excitation time and amplitude,
(II) modification of the adjoint source by a temporal cross-correlation process with the source function,
(III) evaluation of the gradient direction only at the excitation time.
Therefore, ExA does not require storing entire wavefields. Nor does it require additional wavefield extrapolation, which is often used in boundary saving schemes (Raknes and Weibull, 2016). As a result, ExA completes the gradient evaluation by accessing a limited memory block and investing a shorter time than its conventional counterpart. Unlike acoustic media, displacement fields include both P- and S-waves together in the elastic case. Therefore, it is hard to segregate the excitation time of the P-wave in elastic media. As suggested by Nguyen and McMechan (2015) in the context of reverse time migration, we compute the excitation time of the divergence of the wavefields ($S_\nabla$), which includes only P-wave motions. Next, we store the original displacement wavefield at this excitation time. Mathematically,

$$t_{ex} = t_{ex}(x) = \arg\max_{t \in [1, n_t]} \left\{ S_\nabla(x,t) \right\} \quad (5)$$

represents the excitation time of P waves. Using ExA in eq. (2), the gradient can be reduced as follows:

$$\frac{\partial E}{\partial p}(x) = \frac{\partial c_{ijkl}}{\partial p}(x)\, \frac{\partial u_l}{\partial x_k}(x, t_{ex}(x))\, \frac{\partial \tilde{u}^a_i}{\partial x_j}(x, t_{max} - t_{ex}(x)), \quad (6)$$
where $\tilde{u}^a$ is the modified adjoint source in step (II).

Automatic mode separation using P-wave ExA

In addition to enhanced computational speed, ExA offers the possibility of an automatic mode separation in elastic FWI. As eq. (2) shows, the conventional gradient direction incurs all possible correlations between P- and S-waves. However, using the excitation approximation with P-waves, the gradient evaluation process automatically filters out the SP and SS contributions, which mainly include artifacts in marine surveys.

Figure 1 shows the P- and S-wave velocities of the true Marmousi II model. The original Marmousi model has a soft sea bottom, which induces slow S-wave velocity. However, to avoid numerical dispersion, S-wave velocities are modified in this example. We use a homogeneous density model (1 g/cm3) and update only seismic velocities. For initial models, we use smoothed P- and S-wave velocities. As Kalita and Alkhalifah (2016) demonstrated, the gradient directions for P-wave velocity from the boundary saving method and ExA show quite similar features. This is because, in elastic FWI, the P-wave velocity perturbation still generates only PP waves (Tarantola, 1986). However, the gradient direction for the S-wave velocity using conventional methods includes some long-wavelength artifacts (yellow arrows in Figure 2a). These long-wavelength artifacts are products of SS correlations (Figure 2c). As discussed earlier, the gradient direction using ExA contains neither SP nor SS correlations and thus resembles the mode-separated gradient (Wang and Cheng, 2017) as shown in Figure 2b. Moreover, those long-wavelength artifacts degrade the convergence of the S-wave velocity, and with it the overall convergence rate of multi-parameter FWI (Figures 3a and 3b). Thus, ExA improves computational speed and reduces the number of iterations required to complete the FWI.

Figure 1: The true (a) P- and (b) S-wave velocities. Axes: distance (km) and depth (km); color scale: velocity (km/s).

In anisotropic media, mode separation requires additional work in solving the Christoffel equation (Cheng and Kang, 2014), which increases the cost in large 3D elastic FWI. However, in marine surveys (P-wave source), excitation times for P-waves can be computed using the divergence operator in anisotropic media, in which the divergence operator results in much stronger P-wave than S-wave energy unless anisotropy is extremely strong.

Figure 2: The gradient directions for the S-wave velocity at 1st iteration from (a) conventional method, (b) mode-separated PP + PS, (c) mode-separated SP + SS waves and (d) P-wave ExA. Axes: distance (km) and depth (km); color scale: normalized gradient.

Figure 3: The inverted P- (a and c) and S-wave (b and d) velocities using the conventional method (a and b) and ExA (c and d). Axes: distance (km) and depth (km); color scale: velocity (km/s).
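The excitation-time selection of eq. (5) and the reduced gradient of eq. (6) can be illustrated with a small Python sketch. It is a toy illustration, not the authors' code: the wavefields are nested lists indexed as `[time][grid point]`, the tensor contraction of eq. (6) is collapsed to a single scalar strain component, the maximum is taken over the absolute value of the divergence (an assumption about how "most energetic" is measured), and the adjoint field is assumed to be stored in its own propagation time so that $t_{max} - t$ maps to index `nt - 1 - t`. All names are hypothetical.

```python
def excitation_times(div_u):
    """Time index of the largest |div u| at each grid point (cf. eq. 5).

    div_u[t][x] holds the divergence of the source wavefield, which is
    dominated by P-wave motion.
    """
    nt, nx = len(div_u), len(div_u[0])
    return [max(range(nt), key=lambda t: abs(div_u[t][x])) for x in range(nx)]


def exa_gradient(src_strain, adj_strain, t_ex, dc_dp=1.0):
    """Gradient evaluated only at the excitation time (scalar stand-in for eq. 6).

    src_strain[t][x] and adj_strain[t][x] are one strain component of the
    source and (modified) adjoint wavefields; dc_dp stands in for the
    derivative of the elastic constant with respect to the parameter.
    """
    nt = len(src_strain)
    return [dc_dp * src_strain[t][x] * adj_strain[nt - 1 - t][x]
            for x, t in enumerate(t_ex)]


# Toy fields with nt = 3 time samples and nx = 2 grid points.
div_u = [[0.0, 4.0], [3.0, 0.0], [-5.0, 2.0]]
src = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adj = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]

t_ex = excitation_times(div_u)
grad = exa_gradient(src, adj, t_ex)
```

Only one source snapshot value per grid point is needed in `exa_gradient`, which is the memory saving that motivates the excitation approach; the time integral of eq. (2) is replaced by a single product.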
Real data application: 2C 3D OBC data

We further apply the P-wave ExA to 3D real OBC data from the North Sea (Szydlik et al., 2007). We use a smoothed initial model (Figure 4) derived from the tomography result. The direct waves are muted by the data-owner so we opt to use the global correlation norm over the least squares. FWI is conducted for data with frequencies ranging from 2.75 to 10 Hz. We randomly choose 200 sources out of the total of 3400 at each iteration to avoid artifacts (Díaz and Guitton, 2011).

Figure 4: The initial P-wave velocity model. Panels: xy and xz/yz slices; axes: inline (km), crossline (km) and depth (km).

Figure 5: The schematic diagram illustrating the FWI strategy. Boxes: 2C Volve data; PZ data (vertical); PS data (inline); Tomography; Acoustic FWI; Smoothed Vp; Inverted Vp; smoothed anisotropic parameters; Initial Vs (Poisson's ratio = 0.25); Elastic FWI (only Vs is inverted); Inverted Vs.

Figure 6: (a) Inverted P-wave velocity after conventional acoustic FWI of the vertical component, (b) initial S-wave velocity for elastic FWI and inverted S-wave velocities from inline component using (c) conventional method and (d) P-wave ExA. Panels: xy and xz/yz slices; axes: inline (km), crossline (km) and depth (km).

Figure 7: The gradient directions for the S-wave velocity from (a) P-wave ExA, (b) the conventional method and (c) mode-separated SP+SS waves. Axes: inline (km), crossline (km) and depth (km); color scale: normalized gradient.
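The initial S-wave velocity in Figure 5 is built from the P-wave velocity with a fixed Poisson's ratio of 0.25. For an isotropic elastic medium the standard relation is $V_s = V_p \sqrt{(1 - 2\nu) / (2(1 - \nu))}$, which gives $V_p / V_s = \sqrt{3}$ for ν = 0.25. The sketch below applies this conversion; the function name `vs_from_vp` and the sample velocities are hypothetical and only illustrate the formula.

```python
import math


def vs_from_vp(vp, poisson_ratio=0.25):
    """S-wave velocity from P-wave velocity for a fixed Poisson's ratio.

    Uses the isotropic relation Vs = Vp * sqrt((1 - 2*nu) / (2 * (1 - nu))).
    """
    if not -1.0 < poisson_ratio < 0.5:
        raise ValueError("Poisson's ratio must lie in (-1, 0.5)")
    scale = math.sqrt((1.0 - 2.0 * poisson_ratio) / (2.0 * (1.0 - poisson_ratio)))
    return vp * scale


# Hypothetical P-wave velocities in km/s, converted sample by sample.
vp_profile = [2.4, 2.7, 3.0]
vs_profile = [vs_from_vp(vp) for vp in vp_profile]
```

Because the ratio is constant, the initial Vs model inherits the structure of the inverted Vp model, and the elastic stage then only has to correct the departures from this fixed ratio.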
Figure 5 shows the FWI strategy for 2-component Volve
data set. As Sears et al. (2010) suggested, we separately
invert for vp from vertical component and then vs from
inline component, which helps to reduce the trade-off (Oh Conclusions

and Alkhalifah, 2016). Since PS waves in the vertical
component are also muted by the data-owner, conventional To improve the computational speed and convergence rate
acoustic FWI is good enough for the vertical component of multi-parameter FWI in large 3D elastic anisotropic
(Figures 6a, 8a and 9a). For elastic FWI in the 2nd stage, we media, we proposed an elastic FWI with P-wave excitation
build up an initial S-wave velocity from inverted P-wave amplitude of the source wavefields. The excitation method
velocity with a fixed Poisson’s ratio (Figure 6b). The invests shorter computational time in gradient evaluation
density is assumed to satisfy Gardner’s formula (Gardner et than its conventional counterpart. In addition, it is mainly
al., 1974) and is, thus, not updated. As Figure 9a (blue immune from artifacts resulting from unwanted SS
arrows) shows, it seems that the Volve area has significant correlation, which is inevitable in conventional methods.
anisotropy. For this reason, we use smoothed background Although multipathing is theoretically a limitation of this
anisotropic parameters extracted from tomography without method, it does not impede the quality of the inversion
any update. For the elastic FWI applied in the 2nd stage, we process, as justified through the synthetic and real data
compare the performance of using the P-wave ExA in the examples.
FWI with using the conventional approach. Figure 7
displays gradient directions of the S-wave velocity. As we Acknowledgments
expected, conventional gradients (Figure 7b) have long-
wavelength artifacts induced by SS correlation (Figure 7c) Research reported in this publication was supported by
whereas ExA (Figure 7a) produces clean gradient competitive research funding from King Abdullah
directions. Although long wavelength features are not University of Science and Technology (KAUST) in Thuwal,
properly recovered due to lack of long wavelength Saudi Arabia. The authors would like to thank Statoil ASA
components in PP and PS radiation patterns (Tarantola, and the Volve license partners ExxonMobil Exploration
1986), ExA captures low S-wave velocity layers (Figure and Production Norway AS and Bayerngas Norge AS for
6d) providing a better convergence rate and computational the release of the Volve data. We would like to thank
speed (Figure 8b) compared to the conventional FWI. The Marianne Houbiers from Statoil for providing some helpful
vertical profiles (Figure 8) also support that the proposed suggestions and corrections. For computer time, this
approach produces both P- and S-wave velocities with less research used the resources of the Supercomputing
trade-offs, particularly at 2.4 km in depth. The seismogram Laboratory in KAUST. We thank the members of Seismic
(Figure 9) demonstrates that PP waves are well inverted Wave Analysis Group (SWAG) in KAUST for helpful
after an acoustic FWI of the vertical component and PS discussions.
waves are partially inverted after elastic FWI for the
horizontal component (red arrows).
well log conventional

(a) initial ExA (b)
5.0
Velocity (km/s)
4.5
Vp/Vs ratio
2.5 Vp 3.0
Vs
0.5 1.0
1.5 2.0 2.5 3.0 1.9 2.4 2.9 3.2
Depth (km) Depth (km)
Figure 8: (a) Vertical profiles extracted from the well, which starts from the red dot on the top and ends at the blue dot at the bottom in Figure 4
and (b) computational time of boundary saving method and ExA.
[Figure 9 plot: panel (a) observed, initial and acoustic FWI data (vertical component); panel (b) observed, initial and elastic FWI data (inline component); axes Offset (km), Time (s), Amplitude.]
Figure 9: Comparisons of (a) vertical and (b) inline components of observed and modeled data from inverted model after acoustic FWI and after
elastic FWI using P-wave excitation amplitude. The source (black dot) and receiver (black dashed line) locations are marked in Figure 4. We also
display modeled data from initial models for comparison. Notice that the acoustic modeling is used for (a) and elastic modeling is used for (b).
EDITED REFERENCES
REFERENCES
Alkhalifah, T., N. Masmoudi, and J. W. Oh, 2016, A recipe for practical full waveform inversion in
orthorhombic anisotropy: The Leading Edge, 35, 1076–1083,
https://doi.org/10.1190/tle35121076.1.
Boehm, C., M. Hanzich, J. Puente, and A. Fichtner, 2016, Wavefield compression for adjoint methods in
full-waveform inversion: Geophysics, 81, no. 6, R385–R397,
http://doi.org/10.1190/geo2015-0653.1.
https://doi.org/10.1190/1.3215771.
Cheng, J., and W. Kang, 2014, Simulating propagation of separated wave modes in general anisotropic
media, Part 1: qP-wave propagators: Geophysics, 79, no. 1, C1–C18,
https://doi.org/10.1190/geo2012-0504.1.
Choi, Y., and T. Alkhalifah, 2012, Application of multi-source waveform inversion to marine streamer
data using the global correlation norm: Geophysical Prospecting, 60, 748–758,
http://doi.org/10.1111/j.1365-2478.2012.01079.x.
Díaz, E., and A. Guitton, 2011, Fast full waveform inversion with random shot decimation: 81st Annual
diagnostic basis for stratigraphic traps: Geophysics, 39, 770–780,
http://doi.org/10.1190/1.1440465.
Kalita, M., and T. Alkhalifah, 2016, Full waveform inversion using the excitation representation of the
source wavefield: 86th Annual International Meeting, SEG, Expanded Abstracts, 1084–1088,
Nguyen, B. D., and G. A. McMechan, 2015, Five ways to avoid storing source wavefield snapshots in 2D
elastic prestack reverse time migration: Geophysics, 80, no. 1, S1–S18,
https://doi.org/10.1190/geo2014-0014.1.
Oh, J. W., and T. Alkhalifah, 2016, Elastic orthorhombic anisotropic parameter inversion: An analysis of
practice: The Leading Edge, 32, 1040–1054, https://doi.org/10.1190/tle32091040.1.
Prieux, V., R. Brossier, S. Operto, and J. Virieux, 2013, Multiparameter full waveform inversion of
multicomponent ocean-bottom-cable data from the Valhall field. Part 2: Imaging compressive
wave and shear-wave velocities: Geophysical Journal International, 194, 1665–1681,
https://doi.org/10.1093/gji/ggt178.
Raknes, E. B., and W. Weibull, 2016, Efficient 3D elastic full-waveform inversion using wavefield
reconstruction methods: Geophysics, 81, no. 2, R45–R55, http://doi.org/10.1190/geo2015-0185.1.
Sears, T. J., P. J. Barton, and S. C. Singh, 2010, Elastic full waveform inversion of multicomponent
ocean-bottom cable seismic data: Application to Alba Field, U. K. North Sea: Geophysics, 75, no.
6, R105–R119, http://doi.org/10.1190/1.3484097.
© 2017 SEG Page 1560

Sirgue, L., J. Etgen, and U. Albertin, 2008, 3D frequency domain waveform inversion using time domain
finite difference methods: 70th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, F022, http://doi.org/10.3997/2214-4609.20147683.
Szydlik, T., P. Smith, S. Way, L. Aamodt and C. Friedrich, 2007, 3D PP/PS prestack depth migration on
the Volve field: First Break, 25, 43–47.
1893–1903, https://doi.org/10.1190/1.1442046.
Vigh, D., K. Jiao, D. Watts, and D. Sun, 2014, Elastic full-waveform inversion application using
multicomponent measurements of seismic data collection: Geophysics, 79, no. 2, R63–R77,
http://doi.org/10.1190/geo2013-0055.1.
Wang, T. F., and J. B. Cheng, 2017, Elastic full waveform inversion based on mode decomposition: The
approach and mechanism: Geophysical Journal International, 209, 606–622,
http://doi.org/10.1093/gji/ggx038.

Multi-parameter elastic full waveform inversion in the presence of azimuthally rotated
orthorhombic anisotropy: Application to 9C land data
Ju-Won Oh* and Tariq Alkhalifah, King Abdullah University of Science and Technology
Summary

To examine the feasibility of elastic full waveform inversion (FWI) for azimuthally rotated orthorhombic (rORT) media, we analyze the sensitivity of the 9-component (9C) land data set acquired on the surface to each of the ORT parameters. The trade-off analysis supports that the parameter set that includes deviation parameters offers the best choice for a 9C data set. Compared to the data from an explosive source, using the 9C land data, ORT parameters show different trade-off patterns for the different source and receiver components. For this reason, finding an optimal component considering trade-offs is another important issue to better recover subsurface rotated orthorhombic anisotropy.

Introduction

The elastic orthorhombic assumption (Tsvankin, 1997), which takes into account the horizontal layering and vertical fractures, is one of the most practical Earth models. Under the orthorhombic assumption, finding the azimuthal variations of anisotropic parameters along fast and slow axes and the rotation angle of the symmetry planes is important to increase the convergence rate of multi-parameter FWI. To reduce the trade-off in multi-parameter FWI (Operto et al., 2013), finding an optimal parameterization is crucial. Alkhalifah and Plessix (2014) compared parameterizations for acoustic vertical transversely isotropic (VTI) FWI. Then, Masmoudi and Alkhalifah (2016) extended this investigation to acoustic ORT FWI and showed a new parameterization by employing deviation parameters to define differences between fast and slow axes. For elastic FWI, although finding an optimal parameterization has also been actively studied from isotropic (Tarantola, 1986; Köhn et al., 2012; Prieux et al., 2013; Operto et al., 2013) to TI approximations (He and Plessix, 2016; Kamath and Tsvankin, 2016; Pan et al., 2016), the studies on elastic ORT media are relatively rare. Oh and Alkhalifah (2016) suggested a new elastic ORT parameterization, which provides the opportunity to apply a multi-stage FWI strategy by decoupling the scattering potentials of the model parameters corresponding to high-symmetry and low-symmetry anisotropy as well as between acoustic and elastic media. For this ORT parameterization, resolution analysis is also conducted (Kazei and Alkhalifah, 2017). Oh and Alkhalifah (2017) further extended this parameter set to rORT media and examined the feasibility of inverting for the rotation angle of a symmetry plane assuming marine streamer data. However, with marine data, all source and receiver components are coupled and the seismic source does not generate S-waves; thus, the coupling between parameters is hard to separate. On the other hand, 9C 3D seismic data (Simmons and Backus, 2001) provide the opportunity to recover more ORT parameters with less trade-off by capturing more wave modes than conventional data (Rusmanugroho and McMechan, 2012).

In this study, we examine the feasibility of applying elastic ORT FWI to 9C land data considering a rotation in the orthorhombic anisotropy (Tsvankin and Grechka, 2011). First, we analyze the trade-off among parameters considering different wave modes and different source and receiver components, and use this analysis to build up an optimal FWI strategy. Finally, the proposed multi-stage FWI is applied to a synthetic 9C land data set.

Inverse problem in elastic rORT media

The elastic wave equation in rORT media can be expressed with particle displacements (ui), stress (σij), strain (εij) and seismic source (fi) as follows:

$$\rho \frac{\partial^2 u_i}{\partial t^2} = \partial_j \sigma_{ij} + f_i \tag{1}$$

and

$$\sigma_{ij} = C^{rORT}_{ijkl}\,\varepsilon_{kl}. \tag{2}$$

In Voigt notation (Cijkl -> CIJ), the stiffness matrix can be expressed as the rotation of the orthorhombic stiffness matrix ($C^{ORT}_{IJ}$; Tsvankin, 1997) around the vertical direction (z) using the Bond transformation matrix ($M_z(\phi)$; Ivanov and Stovas, 2016) as follows:

$$\tilde{C}^{rORT}_{IJ} = M_z(\phi)\, C^{ORT}_{IJ}\, M_z^{T}(\phi), \tag{3}$$

where

$$\tilde{C}^{rORT}_{IJ} = \begin{pmatrix}
\tilde{C}_{11} & \tilde{C}_{12} & \tilde{C}_{13} & 0 & 0 & \tilde{C}_{16} \\
\tilde{C}_{12} & \tilde{C}_{22} & \tilde{C}_{23} & 0 & 0 & \tilde{C}_{26} \\
\tilde{C}_{13} & \tilde{C}_{23} & \tilde{C}_{33} & 0 & 0 & \tilde{C}_{36} \\
0 & 0 & 0 & \tilde{C}_{44} & \tilde{C}_{45} & 0 \\
0 & 0 & 0 & \tilde{C}_{45} & \tilde{C}_{55} & 0 \\
\tilde{C}_{16} & \tilde{C}_{26} & \tilde{C}_{36} & 0 & 0 & \tilde{C}_{66}
\end{pmatrix} \tag{4}$$

and the azimuth angle (ϕ) is defined as the angle measured from the x-axis in a counter-clockwise direction. The goal of FWI is finding a solution that minimizes the objective function measuring the misfit between modeled (u) and observed (d) data. In the least squares sense, the objective function can be expressed by
$$E(p) = \int \left[ d(t) - u(p,t) \right]^2 dt. \tag{5}$$

The gradient direction for an arbitrary parameter, p, is given by

$$\nabla_p E = \int u^{p}_{PDW}(p,t) \left[ d(t) - u(p,t) \right] dt. \tag{6}$$

Here, uPDW and σPDW are the partial derivative wavefields in terms of displacement and stress, respectively, that are the solution of the following wave equation:

$$\rho \frac{\partial^2 u^{p}_{PDW,i}}{\partial t^2} = \partial_j \sigma^{p}_{PDW,ij} + \partial_j \left( \frac{\partial \tilde{C}_{ijkl}}{\partial p}\, \varepsilon_{kl} \right). \tag{7}$$

We ignore density in this abstract. The last term is the virtual source (Pratt et al., 1998) acting like a point moment tensor source induced by the model parameter perturbation (Oh and Alkhalifah, 2016). The gradient direction for any parameter (p) can be obtained by the linear combination of 13 monoclinic CIJ parameters as follows:

$$\nabla_p E = \sum_{IJ} \frac{\partial \tilde{C}_{IJ}}{\partial p} \left( \nabla_{\tilde{C}_{IJ}} E \right). \tag{8}$$

Optimal parameterization

To define rORT media, Oh and Alkhalifah (2017) suggested a parameterization given by the following parameters:

$$\left( v_{p1},\ v_{s1},\ \varepsilon_1,\ \eta_1,\ \gamma_1,\ \delta_3,\ \varepsilon_D,\ \eta_D,\ \gamma_D,\ \phi \right). \tag{9}$$

The main advantage of this parameterization is the hierarchical parameter update demonstrated in the radiation pattern analysis. Figure 1 shows the geometry of source (red dot), receiver (green dot) and reflection point (yellow dot) for an rORT layer interbedded by isotropic layers.

Table 1: Wave modes associated with different seismic data acquisition options (after Hardage et al., 2011).

The radiation patterns in Figure 2 are calculated in terms of the opening angle following Snell's law for a horizontal reflector (Gholami et al., 2013; Oh and Alkhalifah, 2016). Because of the limited space, we only display P-P and SH-SH modes. This parameterization offers an optimal configuration for orthorhombic anisotropy. As Oh and Alkhalifah (2017) showed, we can build up the subsurface model from isotropic (vp1 and vs1) to VTI (ε1, η1, γ1) regardless of the azimuthal rotation of the orthorhombic anisotropy. Then, the deviation parameters (εD, ηD, γD, δ3) and the azimuth angle (ϕ) can be inverted. In this study, focusing on land seismic data, we compare FWI performance of this parameterization for the data from an explosive source and 9C land data. Because we can capture various wave modes in a 9C land survey (Table 1), we have the opportunity to recover more ORT parameters. To do that, we need to choose an optimal FWI strategy for 9C land data to mitigate the trade-off among parameters.

Figure 1: The geometry of a 3-layered model: L1 and L3 are isotropic backgrounds and L2 is the target rORT layer. The blue line indicates a source-receiver line. The red, green and yellow dots denote the source, receiver and perturbation points used for radiation pattern analysis. For the 9C seismic survey, each source (S) and receiver (R) has 3 components along inline (I), crossline (X) and vertical (Z) directions. θo is the opening angle.

[Figure 2 plot: radiation patterns for vp1, ε1, η1, vs1, γ1, δ3, εD, ηD and γD in the P-P and SH-SH modes; legend Azimuth (ϕsr): 0°, 30°, 45°, 60°, 90°, All.]
Figure 2: The reflection based radiation pattern of P-P and SH-SH modes from a horizontal reflector (Oh and Alkhalifah, 2016). We assume an isotropic background model and isotropically radiated incidence wave. ϕsr is the azimuth angle of the source-receiver line (blue line in Figure 1), which is not related to the rotation angle of fractures (ϕ), and the angles in each plot indicate the opening angle (θo) in Figure 1. Yellow area denotes the resolution limit as a result of the offset-to-depth ratio (4) for the doughnut model.

Trade-off analysis

To analyze trade-off between the parameters for different source types, we calculate the gradient direction for a simple model called the doughnut model, in which each

ORT parameter perturbation occurs at a different piece of the doughnut (Figure 3). Uniformly spaced 400 sources (3-10 Hz) are located at 200 m in depth and 3C receivers are located at all nodal points at the same depth. For each source, only receivers within 4 km offset are activated; thus the offset-to-depth ratio at a target doughnut is about 4. For this ratio, the maximum opening angle is about 126°, which indicates that the yellow area in the radiation pattern (Figure 2) is not available in this example. The benefit of this model is that we can clearly distinguish the trade-off among ORT parameters in a given acquisition geometry (source type and offset-to-depth ratio). For this case, we compare the trade-off patterns for different source types. Here, for simplicity, we assume that the orthorhombic medium is not rotated. However, the influences of azimuth angle on the radiation pattern and gradient directions are well described in Oh and Alkhalifah (2017).
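The quoted maximum opening angle can be checked with simple geometry. The sketch below assumes straight rays above a horizontal reflector, so that the reflection point lies under the source-receiver midpoint; this assumption and the helper name `max_opening_angle_deg` are introduced here for illustration and are not taken from the abstract.

```python
import math


def max_opening_angle_deg(offset_to_depth_ratio: float) -> float:
    """Opening angle (degrees) at a horizontal reflector for the largest offset.

    Assumes straight rays in a constant-velocity overburden, so each ray
    makes an angle atan((offset / 2) / depth) with the vertical and the
    opening angle is twice that value.
    """
    half_angle = math.atan(offset_to_depth_ratio / 2.0)
    return math.degrees(2.0 * half_angle)


# Offset-to-depth ratio of about 4, as in the doughnut-model experiment.
angle = max_opening_angle_deg(4.0)
print(f"maximum opening angle: {angle:.1f} degrees")
```

Under these assumptions a ratio of 4 gives an angle of roughly 126.9°, in line with the value of about 126° quoted above.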
[Figure 3 plot: panels (a) and (b) showing the background and the anomaly with pieces numbered 1 to 9; slice lines AA′, BB′, CC′ and DD′; annotated values vp = 2 km/s, vs = 1.15 km/s, ρ = 1 g/cm3; axes Inline (km), Crossline (km), Depth (km).]
Figure 3: A doughnut model for the trade-off analysis. The anomaly has 9 different parameter perturbations and each parameter perturbation occurs at a different piece.

[Figure 4 plot: normalized gradient (-1 to 1) for vp1, vs1, ε1 and εD on the xy slice and on vertical slices AA′, BB′, CC′ and DD′; axes Inline (km), Crossline (km), Depth (km).]
Figure 4: The gradient directions for 9 orthorhombic parameters from explosive sources and 3C recordings for the doughnut model. The horizontal slice is chosen at the top of the anomaly and 4 vertical slices are chosen along different azimuth angles in Figure 3. For visualization, only the area within black dashed lines is displayed.

[Figure 5 plot: panels (a), (b) and (c), each showing the normalized gradient (-1 to 1) for vp1, vs1, ε1 and εD on the xy slice and on vertical slices AA′, BB′, CC′ and DD′; axes Inline (km), Crossline (km), Depth (km).]
Figure 5: Same as Figure 4 but from (a) vertical, (b) inline and (c) crossline sources with 3C recordings.

Figure 4 shows the gradient direction from the explosive source at each slice. Due to the limited space, only vp1, vs1, ε1 and εD, which Oh and Alkhalifah (2017) refer to as “ideal” parameters, are displayed. For each parameter, the arrow indicates the true location of that parameter, while the parameters having trade-off are written in blue. The black, white and red colors for each arrow indicate that the influence of that parameter is strong, medium and weak, respectively, compared to trade-off imprints. In Figure 4, because the explosive source only generates P-waves, the parameter trade-offs are mainly constrained by only P-P
and P-SV modes (Table 1). For this reason, the ideal parameters suffer from trade-off for the given offset-to-depth ratio (4). Figure 5 shows the gradients of the ideal parameters obtained from the 9C land data. As shown in Table 1, seismic data from the vertical source also capture mainly P-P and P-SV modes. However, because most of the energy is actually propagating in the vertical direction, the parameters that require a wide opening angle, like εD, are weakly resolved. As Oh and Alkhalifah (2017) showed, the resolution of εD is crucial to the success of inverting for the azimuth angle. For this reason, using only the vertical source, we do not expect to recover crucial subsurface ORT features. Figures 5b and 5c show the gradients from the horizontal sources. As the radiation pattern (Figure 2) shows, the parameters vs1, η1, ηD and γD have strong trade-off between them. The trade-off is hard to decouple. However, for the inline source (Figure 5b), [vs1, η1] and [ηD, γD] are decoupled because the latter parameters are sensitive to wide azimuth data only. For the crossline source (Figure 5c), although the trade-off is more severe, εD has a strong gradient (sensitivity). However, εD suffers from a strong trade-off with γ1. Because the parameter γ1 is only sensitive to the SH-SH mode (Oh and Alkhalifah, 2016), such a trade-off between εD and γ1 is the product of SH-SH correlation. This indicates that, if we properly decouple εD and γ1 by isolating the SH-SH mode in the data, the parameter εD can be recovered using the crossline source (or inline source when the rotation is large). From this initial εD, ϕ can be estimated, being recovered with εD simultaneously, as Oh and Alkhalifah (2017) showed.

[Figure 6 plot: xy, xz and yz slices of vp1 (1.5 to 4.5), vs1 (0.8 to 2.8), ε1 (0 to 0.12), εD (-0.08 to 0.08) and ϕ (-30° to 30°); axes Inline (km), Crossline (km), Depth (km).]
Figure 6: The true model. Note that other parameters are set to 0.

Numerical example: Synthetic 9C land data

To test the multi-parameter FWI of the 9C land data on a more complex 3D model, we conduct FWI considering a 3D model with a high velocity mud-filled channel (Figure 6). The true azimuth angle is designed to be within ±45° so that the fast axis is still closer to the crossline direction. Because recovering ORT parameters requires wide-offset data, a huge 3D elastic ORT simulation is inevitable. In this study, the FWI is conducted by an MPI-based time-domain elastic FWI code using a 4th-order finite difference on a staggered grid (Graves, 1996), in which the modeling process is parallelized over shots and subdomains. The gradient direction is computed in the frequency domain (3-10 Hz) after taking discrete Fourier transforms of the forward and backward wavefields on the fly (Sirgue et al., 2008). To reduce the complexity, we consider only the ideal parameters in Figure 6 with the other parameters set to 0. For an initial model, we use a smoothed VTI version of the model. We start FWI by inverting for vp1, vs1 and ε1 using the data from the vertical source. In Figure 5, the inline source is crucial to separate vp1 and vs1. However, because we do not consider the η and γ parameters in this example, the vertical source is good enough to build vp1 and vs1 with reduced trade-off (Figure 5a). Then, we run FWI for εD and ϕ by matching the data from the crossline source, which is the most sensitive component to εD under our assumption (|ϕ| < 45°). Thanks to the multi-stage FWI, subsurface rORT features are well estimated (Figure 7). However, depending on the true azimuth angle, a proper horizontal source component should be chosen. We will discuss our observations in more detail when we present them.

[Figure 7 plot: xy, xz and yz slices of vp1 (1.5 to 4.5), vs1 (0.8 to 2.8), ε1 (0 to 0.12), εD (-0.08 to 0.08) and ϕ (-30° to 30°); axes Inline (km), Crossline (km), Depth (km).]
Figure 7: Inverted model with multi-stage approach.

Conclusions

We analyze the sensitivity of elastic orthorhombic parameters to 9C land data acquired on the surface. A simple doughnut-shaped model is designed to investigate the trade-off among parameters for the various source components. The trade-off analysis shows that 9C land data has the potential to resolve subsurface ORT anisotropy with less trade-off between the parameters. An efficient implementation is needed through a hierarchical strategy for only the parameters that the 9C data is sensitive to.

Acknowledgments

Research reported in this publication was supported by competitive research funding from King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. For computer time, this research used the resources of the Supercomputing Laboratory in KAUST. We thank the members of Seismic Wave Analysis Group (SWAG), particularly Vladimir Kazei and Nabil Masmoudi, in KAUST for helpful discussions.
EDITED REFERENCES
REFERENCES
Alkhalifah, T., and R.-E. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic
http://doi.org/10.1190/geo2013-0366.1.
Gholami, Y., R. Brossier, R. S. Operto, A. Robodetti, and J. Virieux, 2013, Which parameterization is
suitable for acoustic vertical transverse isotropic full waveform inversion? Part 1: Sensitivity and
trade-off analysis: Geophysics, 78, no. 2, R81–R105, http://doi.org/10.1190/geo2012-0204.1.
Graves, R. W., 1996, Simulating seismic wave propagation in 3D elastic media using staggered-grid finite
differences: Bulletin of the Seismological Society of America, 86, 1091–1106.
Hardage, B. A., M. V. DeAngelo, P. E. Murray, and D. Sava, 2011, Multicomponent seismic technology:
SEG, https://doi.org/10.1190/1.9781560802891.
He, W., and R.-E. Plessix, 2016, Analysis of different parameterisations of waveform inversion of
compressional body waves in an elastic transverse isotropic Earth with a vertical axis of
symmetry: Geophysical Prospecting, http://doi.org/10.1111/1365-2478.12452.
Ivanov, Y., and A. Stovas, 2016, Upscaling in orthorhombic media: Behavior of elastic parameters in
heterogeneous fractured earth: Geophysics, 81, no. 3, C113–C126,
http://doi.org/10.1190/geo2015-0392.1.
Kamath, N., and I. Tsvankin, 2016, Elastic full-waveform inversion for VTI media: Methodology and
sensitivity analysis: Geophysics, 81, no. 2, C53–C68, http://doi.org/10.1190/geo2014-0586.1.
Kazei, V. V., and T. Alkhalifah, 2017, On the resolution of inversion for orthorhombic anisotropy: 79th
Annual International Conference and Exhibition Incorporating SPE EUROPEC 2017, EAGE,
Extended Abstracts.
Köhn, D., D. D. Nil, A. Kurzmann, A. Przebindowska, and T. Bohlen, 2012, On the influence of model
parameterization in elastic full waveform tomography: Geophysical Journal International, 191,
325–345, http://doi.org/10.1111/j.1365-246X.2012.05633.x.
Masmoudi, N., and T. Alkhalifah, 2016, A new parameterization for waveform inversion in acoustic
orthorhombic media: Geophysics, 81, no. 4, R157–R171, http://doi.org/10.1190/geo2015-0635.1.
Oh, J. W., and T. Alkhalifah, 2016, Elastic orthorhombic anisotropic parameter inversion: An analysis of
Oh, J. W., and T. Alkhalifah, 2017, Optimal full waveform inversion strategy in azimuthally rotated
elastic orthorhombic media: 79th Annual International Conference and Exhibition Incorporating
SPE EUROPEC, EAGE, Extended Abstracts, https://doi.org/10.3997/2214-4609.201701232.
Pan, Y., K. A. Innanen, G. F. Margrave, M. C. Fehler, X. Fang, and J. Li, 2016, Estimation of elastic
constants for HTI media using Gauss-Newton and full-Newton multiparameter full-waveform
inversion: Geophysics, 81, no. 5, R275–R291, http://doi.org/10.1190/geo2015-0594.1.
Prieux, V., R. Brossier, S. Operto, and J. Virieux, 2013, Multiparameter full waveform inversion of
multicomponent ocean-bottom-cable data from the Valhall field. Part 2: Imaging compressive
wave and shear-wave velocities: Geophysical Journal International, 194, 1665–1681,
http://doi.org/10.1093/gji/ggt178.

Rusmanugroho, H., and G. A. McMechan, 2012, 3D, 9C seismic modeling and inversion of Weyburn
Field data: Geophysics, 77, no. 4, R161–R173, http://doi.org/10.1190/geo2011-0406.1.
Simmons, J., and M. Backus, 2001, Shear waves from 3-D-9-C seismic reflection data: The Leading
Edge, 20, 604–612, http://doi.org/10.1190/1.1439002.
Sirgue, L., T. J. Etgen, and U. Albertin, 2008, 3D frequency domain waveform inversion using time
domain finite difference methods: 70th Annual International Conference and Exhibition
incorporating SPE EUROPEC 2008, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.20147683.
1893–1903, http://doi.org/10.1190/1.1442046.
Tsvankin, I., 1997, Anisotropic parameters and P-wave velocity for orthorhombic media: Geophysics, 62,
1292–1309, http://doi.org/10.1190/1.1444231.
Tsvankin, I., and V. Grechka, 2011, Seismology and azimuthally anisotropic media and seismic fracture
characterization: SEG, https://doi.org/10.1190/1.9781560802839.

An algorithm for Vector data Full Waveform Inversion
Mostafa Akrami*, Polina Zheglova and Alison Malcolm, Department of Earth Sciences, Memorial University of Newfoundland, St. John’s, NL, Canada
SUMMARY

In exploration seismology, constructing an accurate velocity model is imperative. One of the algorithms which can lead to an accurate velocity model is Full Waveform Inversion (FWI). Recent advances in marine acquisition make it possible to record multicomponent seismic data that contain both pressure and velocity components. There is also interest in creating more complicated source signatures, such as directional dipole sources. In order to take advantage of these new acquisition technologies, new algorithms need to be developed that make use of the additional information contained in these extended data sets. Vector data sets have been successfully applied in data processing and reverse time migration, but so far no FWI algorithms have been developed. In this paper we present such an algorithm, also expanding it to include different source types. We demonstrate the performance of our method on two 2D synthetic examples with data sets generated from the Marmousi and BP models. In both cases we obtain good recoveries with relatively small data sets.

INTRODUCTION

The workflow of FWI consists of four main steps (Tarantola, 1984): (i) Obtaining the modeled data by solving the wave equations for all the sources; (ii) calculation of the misfit function; (iii) back propagating the data residual and (iv) correlating the wavefields from (i) and (iii) obtaining a model update. Additionally, regularization and conditioning may be applied. Most work in FWI focuses on algorithmic improvements for one or more of these aspects. In this paper we focus instead on changing the input data to improve our ability to form high resolution images.

To do this, we build on recent advances in marine data acquisition. Instead of recording only conventional (scalar) acoustic pressure data, seismic streamers are now able to record both scalar and vector data (pressure and particle velocity components) at the same time (Robertsson et al., 2008). On the other hand, novel dual source technology combines monopole and dipole marine sources (Meier et al., 2015). These new seismic acquisition techniques in turn lead to advances in data processing, e.g. new techniques for wavefield separation and ghost removal (Sun et al., 2015).

On the imaging side, to better exploit these data, Fleury and Vasconcelos (2013) proposed a multicomponent reverse time migration (RTM) method, which is based on the adjoint-state formulation, in which the image is defined as the sensitivity kernel of an objective function minimizing the misfit between modeled and recorded data. Their method uses vector acoustic wave equations for pressure and the corresponding displacement fields as the forward model. In addition, monopole and dipole sources are employed to generate the wave fields. This approach is more computationally expensive than conventional RTM, but it allows additional information about the directionality of the recorded waves to be extracted from the data. Moreover, using different types of sources facilitates directionally targeted imaging.

In this paper, we extend this approach to FWI. With this extension, we then explore how both incorporating vector data and different source configurations influences the final image. We test our method on both the Marmousi and BP synthetic models.

METHODOLOGY

Problem statement and solution

FWI tries to minimize the misfit function between the modeled and recorded data, specifically:

$$\min_{m}\ \frac{1}{2}\,\| Pu - d \|_2^2 \quad \text{subject to} \quad L(m)\,u = q, \tag{1}$$

where P is a projection operator that restricts the synthetic wave field to the receiver locations, L(m) is the forward modelling operator and m is the model parameter to be recovered. In the conventional acoustic FWI, u and d are acoustic pressure modeled and field data respectively, q is a pressure source, and m is typically the squared slowness of the compressional wave.

The gradient of this objective function with respect to the model parameter m is computed by the adjoint state method, in which the modeled incident waves from the sources are cross-correlated with the residual wavefields obtained by back propagating the data residuals Pu − d into the model (Pratt, 1999). We then search for the best model update along the direction of this gradient. The perturbation obtained after the first iteration of the local optimization is similar to a migrated image obtained with RTM. We then add the recovered velocity perturbation to the initial velocity model and this updated velocity model is used as the starting model for the next iteration.

Following the notation of Fleury and Vasconcelos (2013), we aim to reconstruct model parameter m which can be defined by

$$m = \kappa\rho = \frac{1}{c^2}, \tag{2}$$

where κ and ρ are the compressibility and mass density respectively, and c is velocity, so that m is squared slowness. For our VFWI scheme, we define a vector source s that contains both pressure, q and point-force, f source components:

$$s = \begin{pmatrix} q \\ f \end{pmatrix}, \tag{3}$$

and the total wavefields

$$u_{q,f} = \begin{pmatrix} p_{q,f} \\ v_{q,f} \end{pmatrix}. \tag{4}$$

Here pq,f and vq,f are pressure and particle displacement and the subscripts q and f denote the fields generated from pressure and point-force sources respectively. In the constant density acoustic case, we can write the vector-acoustic wave equation as a set of linear differential equations in terms of pq,f and vq,f, in which m is a parameter:

$$\begin{cases} m\, p_{q,f} + \nabla \cdot v_{q,f} = q, \\[4pt] \dfrac{\partial^2}{\partial t^2}\, v_{q,f} + \nabla p_{q,f} = f. \end{cases} \tag{5}$$

Our vector-acoustic wave equations have Perfectly Matched Layer (PML) absorbing boundary conditions (Collino and Tsogka, 2001) on all sides of the computational domain to mimic an infinite medium. In matrix form, the set of equations (5) becomes

$$L_{VA}(m)\, u_{q,f} = \begin{pmatrix} m & \nabla\cdot \\ \nabla & \dfrac{\partial^2}{\partial t^2} I \end{pmatrix} \begin{pmatrix} p_{q,f} \\ v_{q,f} \end{pmatrix} = \begin{pmatrix} q \\ f \end{pmatrix}, \tag{6}$$
where the vector acoustic operator LVA is

$$L_{VA} = \begin{pmatrix} m & \nabla\cdot \\ \nabla & \dfrac{\partial^2}{\partial t^2} I \end{pmatrix}. \tag{7}$$

LVA†, the adjoint of operator LVA, can be derived by some straightforward algebra:

$$L_{VA}^{\dagger} = \begin{pmatrix} m & -\nabla\cdot \\ -\nabla & \dfrac{\partial^2}{\partial t^2} I \end{pmatrix}. \tag{8}$$

We now define our joint objective function as

$$J(m) = \frac{1}{2} \sum_{s,r} \int_T \Big( \| W_r [ u_q(x_s, x_r, t; m) - d_q(x_s, x_r, t) ] \|^2 + \| W_r [ u_f(x_s, x_r, t; m) - d_f(x_s, x_r, t) ] \|^2 \Big)\, dt, \tag{9}$$

where dq,f are recorded pressure and displacement data, subscripts s, r refer to sources and receivers and Wr is the receiver linear weighting operator, which is typically applied to ensure that the contributions from the different data types are comparable and have the same physical dimensions. Note that this objective function is very similar to the equation (18) of Fleury and Vasconcelos (2013). In this abstract however, in order to simplify the derivations and demonstrate the concepts, we set Wr to the identity matrix. Applying the weights to the data will be addressed in further work. We also note that the objective function is given in (9) in its most general form, that is it incorporates the data from both pressure and point-force sources. In numerical experiments presented below we use only one source type per experiment, so that our objective function contains only one of the two terms under the integral sign. This reduces the importance of the Wr weights.

for the dipole point force source. Thus our sources have the following signatures:

$$\beta\, \frac{\partial^2 w(t)}{\partial t^2}\, \delta(x - x_s) \tag{14}$$

for the monopole pressure source, and

$$n_z\, \gamma\, w(t)\, \frac{\partial\, \delta(z - z_s, x - x_s)}{\partial z} + n_x\, \gamma\, w(t)\, \frac{\partial\, \delta(z - z_s, x - x_s)}{\partial x} \tag{15}$$

for the dipole point force source, where x = (z, x), xs = (zs, xs), β and γ are source weights and (nz, nx)T is a unit vector. By varying this unit vector we can arbitrarily set the orientation of the dipole point force source, e.g. (nz, nx)T = (0, 1)T for the horizontal dipole source and (1, 0)T for the vertical dipole source. Introducing β and γ ensures that different sources have correct total output energy:

$$\begin{cases} \beta = 2.0\,\kappa\, \dfrac{\Delta t}{\Delta x^2}, \\[6pt] \gamma = 2.0\,\rho^{-1}\, \dfrac{\Delta t}{\Delta x^2}. \end{cases} \tag{16}$$

In order to generate vector data and solve the forward problem in the time domain, we use a two-dimensional acoustic solver from PySIT (Hewett and Demanet, 2013).

Optimization and algorithm

Minimization of the objective function (9) requires computation of the gradient, an inverse Hessian approximation and an optimization procedure. VFWI is a large scale problem, for which computation and storage of the inverse Hessian, H−1 is prohibitively expensive. Therefore, we use the l-BFGS optimization method (Nocedal and Wright, 2006) in which H−1 is never formed explicitly. Only a few gradients from the previous iterations need to be stored in order to compute an approximation to H−1, which has almost negligible cost compared to methods such as conjugate gradient.
From equation (9) we can now derive the adjoint sources (see e.g.
Fichtner, 2011, for the necessary procedure): Algorithm 1 summarizes the sequence of steps in our time domain
VFWI method, where we note that steps in the inner loop are per-
s†q,f (x, xs ,t) = ∑[uq,f (xr , xs , T − t; m) − dq,f (xr , xs , T − t)]δ (x − xr ), formed in parallel. We use PySIT implementation of the l-BFGS
r method.
(10)
Algorithm 1 Time domain VFWI algorithm
and the adjoint equations: Input: Measured vector acoustic data dq,f
Output: argminm J(m)
LVA† u†q,f (x, xs ,t) = −s†q,f (x, xs ,t). (11) For k=1:Niter
For s=1:Nsrc
Finally, from (6) and (9) we find that the gradient ∂∂m J(m) is computed Starting model ←− m0
by cross-correlation of the forward and adjoint pressure fields: Compute forward wavefields uq,f via Equation 5;
Compute data residuals and the objective function;
Z Compute back-propagated residual wavefields;
∂
J(m) = ∑ p†q,f (x, xs ,t)pq,f (x, xs ,t) dt. (12) Compute the gradient using Equation 12;
∂m s T Add to the summation over all sources;
End
Source implementation Calculate the model update using l-BFGS and update the model
End
As mentioned above, we can extend our analysis to examine two main
types of sources: monopole pressure sources and dipole point-force
sources. The dipole sources can be oriented in any way; here we ex- Regularization
amine horizontal, vertical and an angular source half-way between the
horizontal and vertical directions. The conventional seismic source is In our inversions we see significant oscillations that are clearly not
defined by the multiplication of the Ricker wavelet w(t) in time and part of the true model. To mitigate this, we regularize the inversion,
Dirac delta function in space: modifying our objective function to be
Z
1
2∑
s(t, z, x) = w(t) δ (x − xs ). (13) J(m) = kWr [uq (xs , xr ,t; m) − dq (xs , xr ,t)]k2 +
s,r T
(17)
µ
+ kWr [uf (xs ; xr ,t; m) − df (xs , xr ,t)]k2 dt + k∇mk2 ,
In the conventional second order acoustic wave equation, which is used 2
to solve the forward modelling problem in our method, we therefore where the last term is a standard quadratic regularization and µ is a
∂2
need to input κ ∂t 2 q for the monopole pressure source and ρ
−1 ∇ · f, regularization weight. The result is a well-behaved objective function.
To illustrate the benefit of this function we apply it to the BP velocity
model in the Results section.
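The quadratic penalty in equation (17) is straightforward to evaluate on a grid. The sketch below is illustrative only and is not taken from PySIT: it assumes a 1D model sampled with spacing `h`, forward differences for the gradient of m, no cell-volume weighting, and a hypothetical helper name `tikhonov_term`. It returns the penalty (µ/2)‖∇m‖² together with its derivative with respect to each model sample, which would be added to the data-misfit gradient of equation (12).

```python
def tikhonov_term(m, mu, h=1.0):
    """Return (value, gradient) of (mu/2) * ||grad m||^2 for a 1D model.

    Assumptions (not from the paper): 1D model, forward differences,
    grid spacing h, and no cell-volume weighting in the norm.
    """
    # Forward differences between neighbouring samples.
    d = [(m[i + 1] - m[i]) / h for i in range(len(m) - 1)]
    value = 0.5 * mu * sum(di * di for di in d)

    # Each difference d[i] depends on m[i] (weight -1/h) and m[i+1] (weight +1/h).
    grad = [0.0] * len(m)
    for i, di in enumerate(d):
        grad[i] -= mu * di / h
        grad[i + 1] += mu * di / h
    return value, grad


if __name__ == "__main__":
    value, grad = tikhonov_term([1.0, 2.0, 4.0, 3.0], mu=10.0)
    print(value, grad)
```

A constant model has zero penalty and zero gradient, while a sample that sticks out from its neighbours receives the largest gradient entry. Stepping against this gradient pulls neighbouring samples toward each other, which is the smoothing behaviour that also softens sharp boundaries.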
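The statement that l-BFGS needs only a few stored vectors from previous iterations can be made concrete with the standard two-loop recursion described by Nocedal and Wright (2006). The following is a minimal sketch under stated assumptions, not the PySIT implementation: the function names `dot` and `lbfgs_direction` are hypothetical, `s_hist` holds model differences m_{k+1} − m_k, `y_hist` holds gradient differences g_{k+1} − g_k (oldest first), and the initial inverse-Hessian scaling is the usual scalar choice.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} grad from stored (s, y) pairs.

    s_hist[i] = m_{i+1} - m_i and y_hist[i] = g_{i+1} - g_i, oldest first.
    The inverse Hessian itself is never formed.
    """
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        q = [qi - a * yi for qi, yi in zip(q, y)]
        alphas.append((rho, a))

    # Scalar initial inverse-Hessian guess from the most recent pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: oldest pair to newest pair.
    for (s, y), (rho, a) in zip(zip(s_hist, y_hist), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


if __name__ == "__main__":
    # Quadratic with Hessian diag(2, 5): the Newton direction is (-1, -2).
    s_hist = [[1.0, 0.0], [0.0, 1.0]]
    y_hist = [[2.0, 0.0], [0.0, 5.0]]
    print(lbfgs_direction([2.0, 10.0], s_hist, y_hist))
```

With an empty history the routine falls back to steepest descent. Storage grows only with the number of retained pairs times the model size, which is why the inverse Hessian never has to be held in memory.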
RESULTS

In this section, we demonstrate our VFWI algorithm on two examples. We first show the results of applying VFWI to the Marmousi model (Brougois et al., 1990). We then show the results for the BP model (Billette and Brandsberg-Dahl, 2005). In both examples sources and receivers are equally spaced and spread over the entire top boundary. The peak frequency of the source is 10 Hz.

Marmousi Model

In this example we use the VFWI algorithm to reconstruct the Marmousi velocity model. The model is discretized using 151 nodes in the z direction and 461 nodes in the x direction. Node spacing is 20 m. We use the PySIT 2D acoustic solver to generate the synthetic recorded data. Because the same solver is also used inside the inversion, this may make our results appear better than they would be for a real data set.

The true and initial velocities are plotted in Figure 1 (top and middle images respectively). For this example we used 10 monopole pressure sources and 30 iterations of the inversion. Receivers were placed all the way across the top boundary. The reconstruction is shown in Figure 1 (bottom) and the objective function decay is plotted in Figure 2. We see that the model generated by monopole pressure sources gives a good recovery of the true velocity model. We also see that using vector acoustic data enables us to obtain a high resolution recovery of the model and fast objective function decay with only a small number of conventional seismic sources.

Figure 1: True, initial and reconstruction of the Marmousi velocity model. The reconstruction is obtained from monopole pressure sources using vector-acoustic data.

Figure 2: Objective function decay for the recovery of the Marmousi model using vector acoustic data and monopole pressure sources.

BP Model

In our second example we reconstruct the BP velocity model, which contains salt flanks and irregular shapes. The BP velocity model is discretized using 115 nodes in the z direction and 205 nodes in the x direction. Since the BP model is large and computationally expensive, we subsampled the true model so that only 12 percent of samples remained. The true and initial velocity models are shown in Figure 3. As in the previous example, we generate the synthetic data with the 2D acoustic forward modeler in PySIT.

In this example, we perform three recoveries: (1) with monopole sources; (2) with horizontal dipole sources and (3) with angle dipole sources with 45° orientation. We use 50 sources and 100 receivers for the monopole source recovery, 50 sources and 150 receivers for the horizontal dipole source recovery, and 100 sources and 150 receivers for the angle dipole source recovery. Both sources and receivers are placed across the top boundary of the computational domain with equal spacing. In all cases we perform 30 iterations of l-BFGS to invert the model.

The recovery with the monopole pressure sources is shown in Figure 4. The recovery using the horizontal dipole source is shown in Figure 5. The recovered models in both cases have relatively high resolution and contain few artifacts, with the recovery from the horizontal dipole sources being slightly better. Because of the radiation pattern of the horizontal dipole sources, we have more energy interacting with the perturbations, especially on the sides of the model.

We then recovered the BP model using angle dipole sources. As can be seen from the top plot in Figure 6, the recovered velocity model associated with the angle dipole sources has oscillations across the model. In this case the overall reconstruction is poor; however, we also observe sharper resolution of some of the smaller features, especially at the top of the salt body, compared to the previous two recoveries. In order to remove the artifacts, we use the regularized objective function in equation (17), as explained in the methodology section, with regularization weight µ = 10. After applying regularization we obtain a more reliable result (Figure 6, bottom), but the boundaries of the salt body are less sharp due to the smoothing property of the L2 norm in the regularization term.

CONCLUSIONS

In this paper we presented an algorithm for FWI using vector acoustic data. We define vector data as data in which both pressure and particle velocity are recorded. In addition, both monopole and dipole sources are used in our method.

We demonstrated the performance of our VFWI algorithm on two synthetic 2D examples. Our results show that the method is promising. It provided a very good recovery of the Marmousi model with a very small number of monopole sources, and also reliable recoveries of the subsampled BP model using three different kinds of sources, where in one case we used a regularized objective function. Possible future research directions include applying different kinds of regularization, for example TV, in order to preserve sharper boundaries of the homogeneous salt bodies in the regularized recovery; combining different source types in one experiment with the expectation of gaining higher resolution and better cancellation of artifacts; extending the method to the variable density acoustic case with proper receiver weighting; and comparing our method with the existing FWI methodology.
Figure 3: True and initial BP velocity model. A subsampled BP velocity model is used (with 12% of samples remaining).

Figure 4: Reconstruction of the subsampled BP velocity model using monopole pressure sources.

Figure 5: Reconstruction of the subsampled BP velocity model using horizontal dipole sources.

ACKNOWLEDGMENTS
This study was supported by Chevron and with grants from the Natural Sciences and Engineering Research Council of Canada Industrial Research Chair Program and the Research and Development Corporation of Newfoundland and Labrador and the Hibernia Management and Development Corporation.

Figure 6: Reconstruction of the subsampled BP velocity model using angle dipole sources without regularization (top) and with regularization, µ = 10 (bottom).
REFERENCES
Billette, F., and S. Brandsberg-Dahl, 2005, The 2004 BP velocity benchmark: 67th Annual International
Conference and Exhibition, EAGE, Extended Abstracts.
Brougois, A., M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, 1990, Marmousi model and
data: Presented at the EAGE Workshop — Practical Aspects of Seismic Data Inversion.
Collino, F., and C. Tsogka, 2001, Application of the perfectly matched absorbing layer model to the linear
elastodynamic problem in anisotropic heterogeneous media: Geophysics, 66, 294–307,
http://dx.doi.org/10.1190/1.1444908.
Fichtner, A., 2011, Full seismic waveform modelling and inversion: Springer.
Fleury, C., and I. Vasconcelos, 2013, Adjoint-state reverse time migration of 4C data: Finite-frequency
map migration for marine seismic imaging: Geophysics, 78, WA159–
WA172, http://dx.doi.org/10.1190/geo2012-0306.1.
Hewett, R., and L. Demanet, 2013, The PySIT team, 2013, PySIT: Python seismic imaging toolbox v0. 5:
Release 0.6.
Meier, M. A., R. E. Duren, K. T. Lewallen, J. Otero, S. Heiney, and T. Murray, 2015, A marine dipole
source for low frequency seismic acquisition: 85th Annual International Meeting, SEG, Expanded
Abstracts, 176–180, http://dx.doi.org/10.1190/segam2015-5920771.1.
Nocedal, J., and S. J. Wright, 2006, Numerical optimization: Springer.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a
physical scale model: Geophysics, 64, 888–910, http://dx.doi.org/10.1190/1.1444597.
Robertsson, J. O. A., I. Moore, M. Vassallo, O. Kemal, D.-J. van Manen, and A. Ozbek, 2008, On the use
of multicomponent streamer recordings for reconstruction of pressure wavefields in the crossline
direction: Geophysics, 73, no. 5, A45–A49, http://dx.doi.org/10.1190/1.2953338.
Sun, D., K. Jiao, and D. Vigh, 2015, Compensating for source and receiver ghost effects in full waveform
inversion and reverse time migration for marine streamer data: Journal of Geophysical Research,
201, 1507–1521, http://dx.doi.org/10.1093/gji/ggv089.
1259–1266, http://dx.doi.org/10.1190/1.1441754.

Feasibility testing of simultaneous source elastic full waveform inversion
Gian Matharu∗ and Mauricio D. Sacchi, University of Alberta
SUMMARY

The suitability of simultaneous sources in elastic full waveform inversion is tested through a series of synthetic inversions. Point spread functions, representing the action of the Hessian on a model perturbation, are employed to assess the resolution and parameter trade-offs of simultaneous source full waveform inversion relative to its conventional counterpart. The expected value of point spread functions in simultaneous source full waveform inversion approaches those obtained from sequential source waveform inversion as the number of random realizations increases. The observation suggests that the resolving power and parameter trade-offs of both inversion schemes are comparable if cross-talk artefacts are attenuated. A series of synthetic inversions demonstrates that simultaneous source inversion is able to attain comparable inversion results to full waveform inversion while requiring orders of magnitude fewer computational resources.

INTRODUCTION

Full waveform inversion (FWI) has experienced increasing success in applications to both 2D and 3D real data examples; however, applications thus far have largely been limited to the acoustic approximation. The adoption of elastic FWI has been precluded, in part, by the increased computational cost of elastic modelling. Krebs et al. (2009) applied the concept of randomized phase encoding (Romero et al., 2000) to generate encoded sources and data, thereby reducing the data volume by a factor equal to the number of sources. Since the cost of FWI grows proportionally with the number of sources, the use of simultaneous sources provides an attractive approach to reduce the computational resources required for inversion. This study seeks to evaluate the performance and suitability of simultaneous sources in elastic full waveform inversion. Henceforth, sequential source full waveform inversion is referred to as FWI and simultaneous/encoded source full waveform inversion is referred to as SSFWI.

THEORY

The least-squares waveform misfit for conventional, sequential source FWI is defined as

$$
J_{seq}(\mathbf{m}) = \frac{1}{2} \sum_{i}^{N_r} \sum_{j}^{N_s} \int_T \left| \mathbf{u}_j(\mathbf{x}_r^i, t; \mathbf{m}) - \mathbf{d}_j(\mathbf{x}_r^i, t) \right|^2 dt, \tag{1}
$$

where $\mathbf{u}(\mathbf{x}_r^i, t; \mathbf{m})$ represents synthetic multi-component data recorded at the i-th receiver that is generated with model $\mathbf{m}$ and source $\mathbf{s}_j(\mathbf{x}, t)$. A similar interpretation applies for the observed data $\mathbf{d}(\mathbf{x}_r^i, t)$. $N_s$ and $N_r$ denote the number of sources and receivers, respectively. The linear dependence of the wavefield with respect to the source allowed Krebs et al. (2009) to reformulate equation 1 to accommodate simultaneous sources (assuming a fixed-receiver geometry). The revised misfit function may be expressed as

$$
J_{ss}(\mathbf{m}) = \frac{1}{2} \sum_{i}^{N_r} \int_T \left| \hat{\mathbf{u}}(\mathbf{x}_r^i, t; \mathbf{m}) - \hat{\mathbf{d}}(\mathbf{x}_r^i, t) \right|^2 dt, \tag{2}
$$

where $\hat{\mathbf{d}}$ are the simultaneous data and $\hat{\mathbf{u}}$ are the synthetic data generated by a simultaneous source $\hat{\mathbf{s}}$. $\hat{\mathbf{s}}$ and $\hat{\mathbf{d}}$ are defined as

$$
\hat{\mathbf{s}}(\mathbf{x}, t) = \sum_{j}^{N_s} q_j(t) * \mathbf{s}_j(\mathbf{x}, t), \tag{3}
$$

$$
\hat{\mathbf{d}}(\mathbf{x}_r^i, t) = \sum_{j}^{N_s} q_j(t) * \mathbf{d}_j(\mathbf{x}_r^i, t). \tag{4}
$$

$q_j(t)$ are source specific encoding functions and $*$ denotes convolution in the time domain. The use of simultaneous sources eliminates the summation over sources in equation 1, significantly reducing the computational resources required for inversion. An adverse side-effect of simultaneous sources is the introduction of cross-talk artefacts in the gradient that arise due to interactions between forward and adjoint wavefields that are not related to the same source.

The encoding functions in equations 3-4 are selected to minimize the imprint of cross-talk artefacts in SSFWI. Krebs et al. (2009) and Lee et al. (2012) explored a range of random encoding functions of variable length. Lee et al. (2012) demonstrated that normalized random encoding functions of length one act as all-pass filters and do not increase the number of time samples in the encoded data. With this choice of encoding, the convolutions in equations 3-4 amount to multiplying the data/sources by a random sequence of 1s and -1s. By altering the encoding functions at each iteration, Krebs et al. (2009) were able to attenuate cross-talk artefacts over iterations to achieve results from SSFWI that were comparable to FWI.

PARAMETER TRADE-OFF

Independent model parameters can seldom be recovered uniquely through multi-parameter inversion. The erroneous mapping of one parameter to another is known as parameter trade-off and is distinct from the issue of cross-talk. This section compares point spread functions (PSFs) from FWI to the expected value of PSFs from SSFWI to assess the influence of cross-talk artefacts on resolution and parameter trade-offs.

In the vicinity of the true model, the resolvability of a model perturbation can be assessed by computing

$$
\mathbf{H}^{-g} \mathbf{H}\, \delta\mathbf{m}_{true} = \delta\mathbf{m}, \tag{5}
$$

where $\mathbf{H}^{-g}$ is the generalized inverse of the Hessian, $\delta\mathbf{m}_{true}$ is a true model perturbation and $\delta\mathbf{m}$ is an estimated model perturbation. $\mathbf{H}^{-g}\mathbf{H}$ mimics the role of the resolution operator found
in linear inverse problems. If $\mathbf{H}^{-g}$ is replaced by an identity in equation 5, $\mathbf{H}\, \delta\mathbf{m}_{true}$ provides a conservative estimate of the point spread function.

In SSFWI, $\mathbf{H}$ is dependent on the choice of source encoding at a given iteration. The corresponding PSFs will contain considerable cross-talk artefacts. The expected value of PSFs in SSFWI can be estimated as

$$
E[\mathbf{H}\, \delta\mathbf{m}] \approx \frac{1}{N} \sum_{i}^{N} \mathbf{H}_i\, \delta\mathbf{m}, \tag{6}
$$

where $\mathbf{H}_i$ is the Hessian for a particular random source encoding and $N$ is the number of realizations.

For a simple test, PSFs are computed in a homogeneous background model discretized on a 200 x 100 grid. 16 sources and 50 receivers are evenly distributed along the surface. The model parameters are P- and S-wave velocities ($v_p$, $v_s$). Model perturbations are introduced as an array of distributed spikes of unit amplitude. Perturbations are only applied to one parameter at a time. This allows for computation of the action of the off-diagonal blocks of the Hessian on a model perturbation. The off-diagonal blocks provide information pertaining to the strength of inter-parameter mappings. PSFs from conventional FWI are compared to the expected value of PSFs in SSFWI for a range of random realizations. Similarity is assessed using a quality factor

$$
Q(\mathbf{p}_{est}, \mathbf{p}_{true}) = 10 \cdot \log_{10} \frac{\left\| \mathbf{p}_{est} - \mathbf{p}_{true} \right\|^2}{\left\| \mathbf{p}_{true} \right\|^2}, \tag{7}
$$

where $\mathbf{p}_{est}$ is any estimated vector and $\mathbf{p}_{true}$ is a reference vector.

[Figure 1 plot: PSF Q (vertical axis, −25 to 5) against number of realizations (horizontal axis, 0 to 140), with one curve each for $H_{v_p v_p}$, $H_{v_s v_s}$ and $H_{v_p v_s}$.]

Figure 1: PSF Q as a function of the number of random realizations. The expected PSFs from SSFWI more closely resemble those from FWI as the number of random realizations is increased. Each line represents the action of a particular block element of the Hessian acting on a model perturbation.

Figures 1 and 2 indicate that $E[\mathbf{H}\, \delta\mathbf{m}]$ from SSFWI approaches $\mathbf{H}\, \delta\mathbf{m}$ from regular FWI with an increasing number of random realizations. This effect is simulated during inversions through repeated iteration and changes to the random encoding at each iteration. The PSF analysis suggests that SSFWI does not aggravate resolvability or parameter trade-offs beyond that of conventional FWI, provided that cross-talk artefacts are attenuated.

INVERSION SETUP

A series of synthetic inversions is performed to evaluate the suitability of SSFWI for isotropic elastic waveform inversion. The conditions stated in this section are applied to all synthetic inversions unless stated otherwise. 'Observed' and synthetic data are generated using 2D time-domain, P-SV finite difference modelling (fourth order in space, second order in time) (Virieux, 1986; Levander, 1988). Absorbing boundaries are applied to the edges of numerical grids via convolutional perfectly matched layers (Komatitsch and Martin, 2007). An exception is made in the marine experiment where the free surface and multiples are retained. Source inversion is not performed and the true source wavelet is available in all trials.

All sources in a given survey are combined into a single encoded source for SSFWI. Challenging inversion scenarios may necessitate the use of multiple encoded sources where each encoded source forms a disjoint subset of all available sources. Equation 2 can be readily adjusted to incorporate $N_{ss}$ encoded sources.

P- and S-wave velocities ($\mathbf{m} = [v_p, v_s]^T$) are selected as the model parameters for inversion; density is not inverted for. Both parameters are estimated simultaneously. In instances where density is updated, the update is performed via Gardner's relation ($\rho = 310\, v_p^{0.25}$). Non-dimensionalization is applied to the model parameters to obtain a new parametrization of the form $\hat{\mathbf{m}} = \mathbf{m} / m_0$. The scaling values $m_0$ are taken as the mean velocities of the starting models in each case. The gradient for the new parametrization is given by $\hat{\mathbf{g}} = m_0\, \mathbf{g}$, where $\mathbf{g}$ is the gradient for parametrization $\mathbf{m}$. Inversions are performed using the non-dimensionalized form of the model parameters.

Explicit regularization is not applied; instead, gradients are convolved with a Gaussian kernel. The width of the kernel is selected to match the dominant wavelength of the frequency band being inverted. A diagonal, square-root-of-depth preconditioner is applied to search directions to compensate for inadequate illumination owing to geometrical spreading.

Inversions are terminated after a case-dependent number of iterations. Search directions for SSFWI are computed using the non-linear conjugate gradient algorithm (NLCG). Search directions in FWI are computed using both NLCG and the quasi-Newton L-BFGS approach (Nocedal, 1980). A bracketing line search is used in inversions using NLCG, whereas inversions using L-BFGS adopt a backtracking line search. Modrak and Tromp (2016) found the backtracking line search was more efficient than a bracketing line search when used in conjunction with L-BFGS.

TEST CASES

SEG/EAGE Overthrust

[Figure 2 panels: columns Conventional, n = 1, n = 2, n = 4 and n = 32; axes x (0 to 150) and z (0 to 80); colour scales ±5.9e-04 (top row), ±5.7e-04 (middle row) and ±5.6e-05 (bottom row).]

Figure 2: Comparison of PSFs for FWI and SSFWI. Each row corresponds to a separate block of the multi-parameter Hessian acting on a model perturbation. The blocks are $H_{v_p v_p}$ (top), $H_{v_s v_s}$ (middle) and $H_{v_p v_s}$ (bottom). The leftmost column displays the PSFs obtained from sequential source FWI. The remainder of the columns display the expected PSFs for an increasing number of random realizations. As the number of realizations increases, cross-talk artefacts in SSFWI are attenuated and the PSFs approach those obtained in conventional FWI.
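The trend in Figure 2 can be illustrated with a toy calculation that involves no wave propagation. With random ±1 encoding, every pairing of a forward field from source j with an adjoint field from source k is weighted by the product q_j q_k. That product is always 1 when j = k and averages toward zero over realizations when j ≠ k. The sketch below is an illustration under those assumptions only; the function names `mean_encoding_products` and `max_crosstalk` are hypothetical and do not come from the paper or from any inversion package.

```python
import random


def mean_encoding_products(n_src, n_real, seed=0):
    """Average q_j * q_k over n_real random +/-1 encodings of n_src sources."""
    rng = random.Random(seed)
    acc = [[0.0] * n_src for _ in range(n_src)]
    for _ in range(n_real):
        # One realization: an independent random sign per source.
        q = [rng.choice((-1.0, 1.0)) for _ in range(n_src)]
        for j in range(n_src):
            for k in range(n_src):
                acc[j][k] += q[j] * q[k] / n_real
    return acc


def max_crosstalk(acc):
    """Largest absolute off-diagonal weight, i.e. the residual cross-talk."""
    n = len(acc)
    return max(abs(acc[j][k]) for j in range(n) for k in range(n) if j != k)


if __name__ == "__main__":
    for n_real in (1, 2, 4, 32, 1024):
        weights = mean_encoding_products(16, n_real, seed=1)
        print(n_real, round(max_crosstalk(weights), 3))
```

A single realization leaves every cross term at full strength, while averaging over many realizations drives the off-diagonal weights toward zero and keeps the diagonal at one. In an inversion, changing the encoding at every iteration plays the role of this averaging.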
Synthetic tests are performed on land and marine variants of a 2D section of the 3D acoustic SEG/EAGE overthrust model (Aminzadeh et al., 1997). The land model has dimensions of 20 km x 4.5 km. For the marine model, a 500 m water layer is added to the top. A $v_s$ model is created using the relation $v_s = v_p / \sqrt{3}$. A heterogeneous density model is acquired using Gardner's relation. Starting models are generated by smoothing the true models with a Gaussian kernel of standard deviation 700 m. 96 sources are distributed at 200 m intervals and 25 m depth. 264 multi-component receivers are placed at 75 m intervals at depths of 25 m and 500 m for land and marine acquisitions, respectively. The source wavelet is a Ricker wavelet with a dominant frequency of 5 Hz. Inversions are run for 50 iterations in the land case and 200 iterations in the marine case. The full bandwidth data is inverted and multi-scale methods are not applied in this case.

Marmousi II

The Marmousi II model is a fully elastic synthetic model with multiple hydrocarbon layers and complex faulting (Martin et al., 2006). Shallow shale layers in the original model exhibit low shear wave velocities (300-400 m/s) that introduce shear-wave dispersion artefacts in the data unless a fine grid spacing is used. Reduced grid spacing increases the computational cost due to an increased model size and considerations of numerical stability. To reduce the computational burden, shear-wave velocities in the shale layers are replaced by $v_s = v_p / \sqrt{3}$. The water layer is also removed to simulate land acquisition. Initial models are derived from the true models by convolving with a Gaussian kernel of standard deviation 800 m. The acquisition survey consists of 112 point sources positioned at 80 m intervals and 10 m depth. 296 receivers are deployed at 30 m intervals and 10 m depth. The source wavelet is a Ricker wavelet with a dominant frequency of 10 Hz. Inversions are run until a maximum number of iterations is reached or until $\| \mathbf{g}_{k+1} - \mathbf{g}_k \| < \varepsilon$, where $\varepsilon$ is some threshold.

The starting model is sufficiently far from the true model that full-bandwidth FWI converges to a local minimum in the objective function. The multi-scale approach of Bunks et al. (1995) is implemented to circumvent issues of cycle-skipping. The frequency bands used for inversion are informed by the selection criteria of Sirgue and Pratt (2004). The inversion is performed using low-pass cutoff frequencies of 3 Hz, 5 Hz and 8 Hz.

RESULTS

Figure 3 displays a suite of convergence plots for all 3 test cases. Convergence is presented in three forms: normalized misfit as a function of simulations per source, model Q as a function of simulations per source and normalized misfit as a function of total simulations. Convergence behaviour is presented as a function of the number of cumulative simulations, as opposed to iteration, to provide a more accurate representation of computational cost. 'Simulation' in this context refers to all forward and adjoint computations (including the line search). FWI is commonly deployed using an embarrassingly parallel scheme over sources. As such, the number of cumulative simulations per source is used in figure 3 to provide an approximate indication of the real-time convergence rate of a particular algorithm. The total number of simulations provides insight into the computational resources required for any given inversion.

Both overthrust test cases show a comparable convergence rate in both misfit and model Q for SSFWI and NLCG FWI. L-BFGS FWI routinely demonstrates faster convergence in the per source setting. The advantage of the quasi-Newton approach is apparent in the marine overthrust example, where both NLCG FWI and SSFWI struggle to substantially improve model Q over repeated iterations. The convergence of misfit with the total number of simulations indicates that SSFWI requires approximately 2 and 1.5 orders of magnitude fewer simulations than NLCG FWI and L-BFGS FWI, respectively.
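As a rough check on the first of these figures, suppose (as a simplifying assumption) that the saving scales with the number of sources folded into one encoded source. The overthrust survey uses 96 sources, so the expected saving in orders of magnitude is

$$
\log_{10} N_s = \log_{10} 96 \approx 1.98 .
$$

This is close to the factor of about 2 observed against NLCG FWI. The smaller figure against L-BFGS FWI is consistent with its faster convergence per source noted above.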

[Figure 3 panels, one row per test case: normalized misfit against simulations per source, model Q against simulations per source, and normalized misfit against total simulations (logarithmic axis). Curves: FWI - NLCG, FWI - L-BFGS and SSFWI - NLCG in the top and middle rows; FWI - L-BFGS and SSFWI - NLCG in the bottom row.]

Figure 3: Convergence plots for synthetic inversions of the overthrust land (top), overthrust marine (middle) and Marmousi II (bottom) models. SSFWI displays comparable convergence rates (per source) with NLCG FWI. FWI using L-BFGS exhibits the highest convergence rates per source. SSFWI is generally 1 to 2 orders of magnitude less resource intensive than conventional FWI.
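The model Q curves in Figure 3 rely on the quality factor of equation 7. A minimal sketch of that measure follows, assuming the norms in equation 7 are squared Euclidean norms (the exponent is an interpretation of the typeset formula) and using a hypothetical function name `quality_factor`. More negative values mean the estimate is closer to the reference.

```python
import math


def quality_factor(p_est, p_true):
    """Q = 10 * log10(||p_est - p_true||^2 / ||p_true||^2), in decibels.

    Assumption: squared Euclidean norms. With unsquared norms the value
    would be exactly half of what this function returns.
    """
    num = sum((a - b) ** 2 for a, b in zip(p_est, p_true))
    den = sum(b * b for b in p_true)
    if num == 0.0:
        # A perfect match has no finite decibel value.
        return float("-inf")
    return 10.0 * math.log10(num / den)


if __name__ == "__main__":
    # A 10 percent relative error in every sample gives -20 dB.
    print(quality_factor([3.3, 4.4], [3.0, 4.0]))
```

Flattened model arrays (for example all v_p samples followed by all v_s samples) can be passed directly, since the function only needs two sequences of equal length.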
These values are consistent with the expected cost reduction being approximately equal to $N_s$. Both conventional FWI and SSFWI are able to recover the Marmousi II model using the multi-scale approach. The inverted results are presented in figure 4. The convergence of SSFWI is notably slower per source, requiring ∼1500 iterations compared to ∼300 using L-BFGS FWI.

[Figure 4 panels: $v_p$ (left) and $v_s$ (right) for the True, Initial, FWI and SSFWI models; axes Distance [km] (0 to 8) and Depth [km] (0 to 3); colour bars labelled m/s.]

Figure 4: True, initial and final inverted $v_p$, $v_s$ models after multi-scale inversion for the Marmousi II test case.

CONCLUSIONS

The feasibility of SSFWI applied to elastic isotropic inversion has been demonstrated through a series of synthetic tests. The expected PSFs in SSFWI approach those of FWI as the number of random realizations is increased. When cross-talk artefacts are attenuated, SSFWI has comparable resolution and parameter trade-offs to FWI. Cross-talk artefacts are successfully mitigated by altering the encoding functions at each iteration. Models inverted using SSFWI are of comparable quality to those inverted using NLCG FWI while requiring orders of magnitude fewer computational resources. In complex cases, the quasi-Newton form of sequential source FWI may still hold advantages over SSFWI, as suggested by the higher quality of inverted models in the marine overthrust case. The results presented demonstrate the feasibility of SSFWI in favourable inversion scenarios. Further research should pursue more challenging cases that include noise and poor starting models. Convergence rates in elastic SSFWI may further benefit from second order optimization methods (Castellanos et al., 2015). A-priori knowledge coupled with explicit regularization schemes will likely be necessary to ensure successful inversion in real data examples.

ACKNOWLEDGMENTS

This research has been funded by the NSERC Alexander Graham Bell Canada graduate scholarship and the AITF scholarship. We would also like to thank the sponsors of the SAIG group at the University of Alberta. Inversions were automated using a modified version of the SeisFlows inversion framework.
REFERENCES
Aminzadeh, F., J. Brac, and T. Kunz, 1997, SEG/EAGE 3-D salt and overthrust models: SEG/EAGE 3-D
Modeling Series, No. 1, SEG.
Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, http://dx.doi.org/10.1190/1.1443880.
Castellanos, C., L. Métivier, S. Operto, R. Brossier, and J. Virieux, 2015, Fast full waveform inversion
with source encoding and second-order optimization methods: Geophysical Journal International,
200, 718–742, http://dx.doi.org/10.1093/gji/ggu427.
Komatitsch, D., and R. Martin, 2007, An unsplit convolutional perfectly matched layer improved at
grazing incidence for the seismic wave equation: Geophysics, 72, no. 5, SM155–SM167,
http://dx.doi.org/10.1190/1.2757586.
Krebs, J. R., J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M.-D. Lacasse, 2009,
Fast full wavefield seismic inversion using encoded sources: Geophysics, 74, no. 6, WCC177–
WCC188, http://dx.doi.org/10.1190/1.3230502.
Lee, S., D. Hinkley, J. R. Krebs, and J. E. Anderson, 2012, Crosstalk noise analysis of simultaneous-
source full wavefield inversion: 82nd Annual International Meeting, SEG, Expanded Abstracts,
Levander, A. R., 1988, Fourth-order finite-difference P-SV seismograms: Geophysics, 53, 1425–1436,
http://dx.doi.org/10.1190/1.1442422.
Martin, G. S., R. Wiley, and K. J. Marfurt, 2006, Marmousi2: An elastic upgrade for marmousi: The
Leading Edge, 25, 156–166, http://dx.doi.org/10.1190/1.2172306.
Modrak, R., and J. Tromp, 2016, Seismic waveform inversion best practices: regional, global and
exploration test cases: Geophysical Journal International, 206, 1864–1889,
http://dx.doi.org/10.1093/gji/ggw202.
Nocedal, J., 1980, Updating Quasi-Newton Matrices with Limited Storage: Mathematics of Computation,
35, 773–782, http://dx.doi.org/10.1090/S0025-5718-1980-0572855-7.
Romero, L. A., D. C. Ghiglia, C. C. Ober, and S. A. Morton, 2000, Phase encoding of shot records in
prestack migration: Geophysics, 65, 426–436, http://dx.doi.org/10.1190/1.1444737.
Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting
temporal frequencies: Geophysics, 69, 231–248, http://dx.doi.org/10.1190/1.1649391.
Virieux, J., 1986, P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference
method: Geophysics, 51, 889–901, http://dx.doi.org/10.1190/1.1442147.

From classical reflectivity-to-impedance inversion to full waveform impedance inversion using
phase-modified, deconvolved reverse-time-migration image and rock-physics information
Chen Tang* and George A. McMechan, The University of Texas at Dallas
Summary

The full waveform inversion (FWI) is based on a strongly approximate solution of a mathematical problem. To connect it with physical concepts, we previously used the classical reflectivity-to-velocity inversion to derive an FWI formula, which suggests that the relative velocity update is a phase-modified and deconvolved reverse-time-migration (RTM) image using the residual data. In this paper, we extend this method from P-velocity inversion to P-impedance inversion. A difficult point in the extension is that the P-impedance is a product of P-velocity and density; updates of the two parameters have a trade-off during the inversion process. In our system, solving this trade-off is similar to an amplitude-versus-angle (AVA) inversion. However, accurate AVA information is hard to obtain in practice; the AVA inversion also requires the angle-domain common-image gathers (ADCIGs) to be (nearly) flat, which poses a much stricter constraint on the accuracy of the background P-velocity than the half-wavelength condition for single-parameter (P-velocity) FWI. To proceed, we combine our formula with rock physics, in which the relation between the P-velocity and density can be approximated. Thus we can invert only the P-impedance and use it to approximate the P-velocity and density in each iteration. So we convert the multi-parameter inversion (P-velocity and density) into single-parameter inversion (P-impedance). Extension of this method to elastic media is also given. In the real world, the rock-physics relationship is complicated, probabilistic and condition-dependent; we suggest addressing this via machine learning, by building and progressively modifying a database of statistical multi-parameter (P and S velocities, density, attenuation, etc.) relations that can be navigated with a search engine (e.g., like Google) to dynamically determine the solution with the highest probability and its nonuniqueness. This implies a major, multi-year effort. A stable amplitude-preserved RTM formula is also given as an approximation to the deconvolution imaging condition.

Introduction

Classical FWI (e.g., Tarantola, 1984) defines a mathematical problem to minimize an L2 norm of the residual data. The mathematical complexity of the problem forces several terms to be ignored during the derivation. To relate the FWI process to physical concepts, Tang and McMechan (2017a) use a classical reflectivity-to-velocity inversion to derive an FWI formula. In this paper, we extend this method to P-impedance inversion in acoustic media. The inversion involves multiple parameters (P-velocity and density). Separation of the trade-off between them requires high-accuracy AVA information and flat ADCIGs with consistent wavenumber along the angle axis. Both of them are hard to obtain for a practical FWI; particularly the latter requirement means the half-wavelength condition of single-parameter FWI does not work for multi-parameter inversion. To proceed, we propose to use rock physics information to obtain the density from the P-velocity. Then the multi-parameter inversion is converted into single-parameter inversion. We also give the extension of this method to elastic media.

Full waveform impedance inversion

In acoustic media, the relation between the true reflectivity (that is independent from reflection angle) and impedance is defined as below (used by, e.g., Peterson et al. [1955] for a classical reflectivity-to-impedance inversion),

$$ R = \frac{I_2 - I_1}{I_1 + I_2} = \frac{\Delta I}{2 I_a} = \frac{\rho_2 v_2 - \rho_1 v_1}{\rho_1 v_1 + \rho_2 v_2}, \tag{1} $$

where $R$ is the PP reflectivity; $I_1$ and $I_2$ are P-impedances of the upper and lower layers; $\Delta I$ is the impedance contrast and $I_a$ is the average impedance; $v$ is the P-velocity and $\rho$ is the density. Using the same definition of Fourier transforms (FTs) as Tang and McMechan (2017a), Eq. 1 can be approximately reformed as

$$ R = \frac{\Delta x_i}{2}\,\frac{1}{I_a}\,\frac{\Delta I}{\Delta x_i} = A\,\frac{1}{I_a}\,\frac{dI}{dx_i} = -A\, i k_i\, \frac{1}{I_a}\, I, \tag{2} $$

where $\Delta x_i$ and $k_i$ are a small unit interval and the wavenumber in the reflector-normal direction, respectively; $A$ is a scale value. We can rewrite Eq. 2 as

$$ \frac{I}{I_a} = -\frac{1}{A}\,\frac{R}{i k_i}, \tag{3} $$

where $R$ can be approximately obtained from amplitude-preserved (AP) RTM (Tang and McMechan, 2017a)

$$ R(\mathbf{x}) = \mu \int R'(\mathbf{x},\theta)\cos^2\theta\, d\theta = \mu\,\Re\left[\iint \frac{U_r(\mathbf{x},\omega,\theta)}{U_s(\mathbf{x},\omega,\theta)}\cos^2\theta\, d\omega\, d\theta\right], \tag{4} $$

where $\mu$ is a scale factor, $\Re$ means taking the real part, $\theta$ is the reflection angle, $R'$ is the angle-dependent reflectivity, and $U_s$ and $U_r$ are the FTs of the source wavefield $u_s$ and the receiver wavefield $u_r$, which are obtained by using the observed data directly as the boundary condition,

$$ \begin{cases} \dfrac{\partial^2 u_r(t,\mathbf{x})}{\partial t^2} - \rho v^2 \dfrac{\partial}{\partial \mathbf{x}}\left(\dfrac{1}{\rho}\dfrac{\partial u_r(t,\mathbf{x})}{\partial \mathbf{x}}\right) = 0, \\ u_r(t,x,y,z=0) = d_{obs}(t,x,y,\mathbf{x}_s), \end{cases} \tag{5} $$

where $d_{obs}$ is the observed data, $\mathbf{x}_s$ is the source location, and $t$ is the time. Inserting Eq. 4 into Eq. 3 gives

$$ \frac{I}{I_a} = -\mu\,\Re \iint \mathcal{F}^{-1}\left\{\frac{1}{i k_i}\,\mathcal{F}\left[\frac{U_r(\mathbf{x},\omega,\theta)}{U_s(\mathbf{x},\omega,\theta)}\cos^2\theta\right]\right\} d\omega\, d\theta, \tag{6} $$

where $\mathcal{F}$ is a forward FT from the $\mathbf{x}$ to the $\mathbf{k}$ domain; $A^{-1}$ is included in $\mu$. Based on a half-wavelength assumption (Tang and McMechan, 2017a), we can obtain

$$ \frac{\delta I}{I_a} = \mu\,\Re \iint \frac{v_0}{i\omega}\,\frac{H_r(\mathbf{x},\omega,\theta)}{U_s(\mathbf{x},\omega,\theta)}\cos^2\theta\, d\omega\, d\theta, \tag{7} $$

where $\delta I$ is the impedance update and $H_r$ is an FT of $h_r$, which is obtained by using the residual data (calculated data minus observed data) directly as the boundary condition of Eq. 5. Eq. 7 can be written as

$$ \frac{\delta I}{I_a} = \mu v_0\,\Re\left[\iint \frac{G_r(\mathbf{x},\omega,\theta)}{U_s(\mathbf{x},\omega,\theta)}\cos^2\theta\, d\omega\, d\theta\right], \tag{8} $$

where $G_r$ is the FT of $g_r$, which is obtained using

$$ \begin{cases} \dfrac{\partial^2 g_r(t,\mathbf{x})}{\partial t^2} - \rho v^2 \dfrac{\partial}{\partial \mathbf{x}}\left(\dfrac{1}{\rho}\dfrac{\partial g_r(t,\mathbf{x})}{\partial \mathbf{x}}\right) = 0, \\ g_r(t,x,y,z=0) = \displaystyle\int_t^T \delta d(t',x,y,\mathbf{x}_s)\, dt'. \end{cases} \tag{9} $$

From impedance to velocity and density

Eq. 8 is a full waveform impedance inversion, but Eq. 9 requires P-velocity and density for wavefield extrapolation. Based on the Aki and Richards (1980) equation, we can separate the P-impedance update in Eq. 8 into

$$ \frac{\delta v}{v_a} + \frac{\delta\rho}{\rho_a}\cos^2\theta = \mu v_0\,\Re\left[\int \frac{G_r(\mathbf{x},\omega,\theta)}{U_s(\mathbf{x},\omega,\theta)}\cos^2\theta\, d\omega\right], \tag{10} $$

which has similarity with the formulas of Zhang et al. (2014) and Qin and Lambaré (2016). Solving for $\delta v / v_a$ and $\delta\rho / \rho_a$ requires a least-squares solution to satisfy the AVA. However, implementation of Eq. 10 is very difficult at the current stage, because

(I). Eq. 10 places a high demand on the accuracy of the AVA information. However, the deconvolution imaging condition is often not stable in practice; this imaging condition assumes a single reflection at each grid point, but the image in practice involves multipath and is also often stacked from several sources. In complicated structure, each image point often does not have a balanced illumination at each angle.
(II). A stable least-squares estimation from Eq. 10 requires the ADCIGs to be flat. This means we may need the velocity above the reflection image to be correct, which results in a correct image location. Thus, the half-wavelength condition does not work for the multi-parameter inversion in Eq. 10, because, if the ADCIGs are not flat, the least-squares estimation is unlikely to be stable.
(III). The Aki-Richards equation is a linear approximation of the Zoeppritz equations and only works for small angles (say, up to 35°).

So it is difficult to implement Eq. 10 at the current stage, but the real world does involve more parameters than Vp. To proceed, we propose to use the rock physics information to convert the P-impedance (in Eq. 8) into P-velocity and density. For example, Gardner et al. (1974) propose that the P-velocity and density have the following relation,

$$ \rho = 310\, V_p^{0.25}, \tag{11} $$

where the unit of Vp is m/s and the unit of ρ is kg/m3. The Vp here is the $v$ in Eq. 1 to 10. Based on Eq. 11, we have

$$ V_p = \left(\frac{I_p}{310}\right)^{0.8}. \tag{12} $$

In the real world, the relation between Vp and ρ is certainly much more complicated than Gardner’s relation. The rock-physics relationship between Vp and ρ is strongly non-linear and probabilistic. It also depends on different geological conditions such as lithology, pressure, porosity, and so on. With the fast development of computer science, one way to address this is via machine learning, by building and progressively modifying a large database of statistical multi-parameter (Vp, Vs, ρ, attenuation, etc.) relations that can be navigated with a fast search engine (e.g., like Google) to dynamically determine the solution with the highest probability and its nonuniqueness, which also has the potential for defining uncertainty (or risk) in using the FWI result for subsequent interpretation. The significance of this approach is using rock-physics relationships to reduce the multiple parameters (e.g., Vp, Vs, ρ) to a single one (e.g., Ip) for inversion, in which the half-wavelength criterion can be applied. Establishing this database and search engine is beyond the scope of the present study, as it implies a major, multi-year effort. In the example section below, we assume that the constraint defined by Eq. 11 is accurate, and show the results of combining this with an approximate implementation of Eq. 8. A different approach of using a physical constraint is to include it in the objective function of P- and S-velocity tomography, as shown by Duan and Sava (2016).

A practical implementation for AP-RTM

The FWI implementation in Eq. 8 requires a deconvolution imaging condition and calculation of reflection angles. In practice, the RTM image is stacked over the partial images from all sources. Thus, the AP-RTM in Eq. 4 gives a relative amplitude. We then assume that, for each reflection angle, the number of reflections at each image point is the same (sufficient illumination). In this case, the $\cos^2\theta$ can be absorbed into the global scale $\mu$, and thus Eq. 4 is simplified to

$$ R(\mathbf{x}) = \mu \int \frac{U_r(\mathbf{x},\omega)}{U_s(\mathbf{x},\omega)}\, d\omega = \mu\,\Re\left[\int \frac{U_s(\mathbf{x},\omega)\, U_r(\mathbf{x},\omega)}{U_s^2(\mathbf{x},\omega)}\, d\omega\right], \tag{13} $$

The $U_s$ in the denominator still causes an instability: especially when the velocity is incorrect, an image point with a large $U_r$ may not have a large $U_s$, and then the instability occurs. To stabilize it, we propose a practical scheme below. The first step is to rewrite Eq. 13 as (Tang and McMechan, 2017a)

$$ R(\mathbf{x}) = \mu\,\Re\left[\int \frac{U_s(\mathbf{x},\omega)\, U_r(\mathbf{x},\omega)}{S^2(\mathbf{x}_s,\omega)}\, \left|G(\mathbf{x}_s,\mathbf{x},\omega)\right|^{-2} d\omega\right], \tag{14} $$

where $S$ is the source wavelet and $G(\mathbf{x}_s,\mathbf{x},\omega)$ is the Green’s function from $\mathbf{x}_s$ to $\mathbf{x}$. Eq. 14 involves two parts; one is the convolution imaging condition with source wavelet deconvolution and the other is the spreading loss compensation (Tang and McMechan, 2017a). Assuming the source wavelet is known, the instability issue mainly arises from the second part (spreading loss compensation). To stabilize Eq. 14, we use an approximate formula,

$$ R(\mathbf{x}) = \mu\,\Re\left\{\frac{\displaystyle\int \left[U_s(\mathbf{x},\omega)\, U_r(\mathbf{x},\omega) / S^2(\omega)\right] d\omega}{\displaystyle\int \left[U_s^2(\mathbf{x},\omega) / S^2(\omega)\right] d\omega}\right\}, \tag{15} $$

where the denominator is an approximation of the spreading loss $\left|G(\mathbf{x}_s,\mathbf{x},\omega)\right|^2$. Eq. 15 is equivalent to

$$ R(\mathbf{x}) = \mu\,\frac{\displaystyle\int \hat{u}_s(\mathbf{x},t)\, \hat{u}_r(\mathbf{x},t)\, dt}{\displaystyle\int \hat{u}_s^2(\mathbf{x},t)\, dt}, \tag{16} $$

which is similar to the source-normalized cross-correlation imaging condition (e.g., Kaelin and Guitton, 2006) but it uses the deconvolved source and receiver wavefields ($\hat{u}_s$ and $\hat{u}_r$; the hat means ‘deconvolved’). In complicated media, it is possible that an image point is not illuminated by a particular source. Therefore, to increase the stability of Eq. 16, we use

$$ R(\mathbf{x}) = \mu\,\frac{\displaystyle\int \hat{u}_s(\mathbf{x},t)\, \hat{u}_r(\mathbf{x},t)\, dt}{\displaystyle\int \hat{u}_s^2(\mathbf{x},t)\, dt + \varepsilon^2 A}, \tag{17} $$

for each source, in which $\varepsilon^2$ is a small positive value and $A$ can be, e.g., an average or a maximum value of $\int \hat{u}_s^2(\mathbf{x},t)\, dt$ in the global space. The cross-correlation in the numerator involves backscattering artifacts, which can be removed by using a wave decomposition plus angle filter (Tang and McMechan, 2016),

$$ R(\mathbf{x}) = \mu\,\frac{\displaystyle\int \sum_{i=0,\,j=0}^{M-1,\,N-1} F(\theta)\, \delta(i-j)\, \hat{u}_{s,i}(\mathbf{x},t)\, \hat{u}_{r,j}(\mathbf{x},t)\, dt}{\displaystyle\int \hat{u}_s^2(\mathbf{x},t)\, dt + \varepsilon^2 A}, \tag{18} $$

where $i$ and $j$ are the reference numbers of the decomposed source and receiver wavefields, respectively; $F(\theta)$ is a (reflection or open) angle filter.

However, in FWI, removing the backscattering and tomographic signal may not be necessary; see the discussion of Tang and McMechan (2017a). Thus, based on the process from Eq. 4 to Eq. 17, Eq. 8 becomes

$$ \frac{\delta I}{I_a} = \mu v_0\,\frac{\displaystyle\int \hat{u}_s(\mathbf{x},t)\, \hat{g}_r(\mathbf{x},t)\, dt}{\displaystyle\int \hat{u}_s^2(\mathbf{x},t)\, dt + \varepsilon^2 A}. \tag{19} $$

The $I_a$ is difficult to obtain; it can be approximated by using $I_0$ or the global average of $I_0$. The latter is used in our implementation, in which $I_a$ can be included in the scale factor $\mu$. If we ignore the spreading loss compensation, Eq. 19 becomes

$$ \frac{\delta I}{I_a} = \mu v_0 \int \hat{u}_s(\mathbf{x},t)\, \hat{g}_r(\mathbf{x},t)\, dt, \tag{20} $$

which enhances the update magnitude for short wave paths from the source. This generally gives priority to the shallow parts and the near-offset data. The proposed scheme can also be combined with some algorithms to relax the half-wavelength condition, such as adaptive matching (e.g., Zhu and Fomel, 2016) and layer stripping (e.g., Masoni et al., 2016). The latter can be applied to either the residual data (in time) or the image (in depth), which often requires a tapering function at cut-off boundaries to avoid artifacts.

Examples

Here we provide a 2D example using a complicated portion of the SEG Overthrust model. The true Ip model is obtained using the density in Eq. 11; see Fig. 1a. To remove the effect of the direct wave, we add a constant layer (2500 m/s) at the top and so the target region is below 500 m depth. The initial Ip model is in Fig. 1b. There are 200 sources from 40 to 4000 m; each source corresponds to 401 receivers with a migration aperture from -2000 to 2000 m. Some algorithms (nonlinear conjugate gradient, step-length searching and multi-scale scheme) are applied to improve Eq. 19 and 20; refer to the paper of Tang and McMechan (2017a) for details. Thus, when we mention using Eq. 19 or 20 below (including the figure captions), we actually mean using the corresponding equation as the core of the FWI implementation. We use 20 iterations for each of the five cut-off frequencies (4, 6, 8, 12 and 16 Hz, which are the central frequencies in a tapering window with a width of 4 Hz), and finally run another 80 iterations using the unfiltered data. So the total iteration number is 180. To maintain the stability of the finite-difference extrapolation, we set a maximum value for the Ip, which corresponds to the P-velocity at 7000 m/s. Moreover, the Ip must be

positive. Fig. 1c and 1d contain the results using Eq. 19 and 20; both of them have good quality and they are very similar. Fig. 1c is slightly better than Fig. 1d, because Fig. 2 shows that Fig. 1c has a slightly smaller average relative Ip residual than Fig. 1d. Fig. 3 shows an Ip profile in Fig. 1c, where the inverted Ip almost overlaps with the true Ip. In summary, Fig. 1 to 3 show that the proposed FWI flow inverts an accurate Ip model.

Fig. 1. (a) and (b) are the true and initial P-impedance models. (c) and (d) are inverted results using Eq. 19 and 20, respectively. The unit of P-impedance shown in Fig. 1 and 3 is kg⋅cm^−2⋅s^−1.

Fig. 2. Average P-impedance relative residual (shown in percentage) versus iteration number. This relative residual is obtained by averaging $|I_{p,result} - I_{p,true}| / I_{p,true}$ in the global space. The red and blue lines correspond to Fig. 1c (using Eq. 19) and Fig. 1d (using Eq. 20), respectively.

Fig. 3. P-impedance profile at horizontal location 2000 m. The blue, green and red lines denote the true, initial and inverted P-impedance (Fig. 1c).

Extension of the proposed flow into elastic media

The proposed flow can be extended into elastic media. As there are both P- and S-impedances, Eq. 8 becomes

$$ \begin{cases} \dfrac{\delta I_p}{I_{p,a}} = \mu_P\, v_{p,0}\,\Re\left[\displaystyle\iint \frac{G_{r,P}(\mathbf{x},\omega)}{U_{s,P}(\mathbf{x},\omega)}\cos^2\theta_{PP}\, d\omega\, d\theta_{PP}\right], \\ \dfrac{\delta I_s}{I_{s,a}} = \mu_S\, v_{s,0}\,\Re\left[\displaystyle\iint \frac{G_{r,S}(\mathbf{x},\omega)}{U_{s,S}(\mathbf{x},\omega)}\cos^2\theta_{SS}\, d\omega\, d\theta_{SS}\right], \end{cases} \tag{21} $$

which use PP and SS images. Here the upper-case ‘P’ and ‘S’ in the subscripts denote P and S waves, respectively. Obtaining Vp, Vs and ρ from Ip and Is requires rock-physics knowledge. In practice, because the P-wave signal is often more reliable than the S-wave signal, one choice is to invert only the P-impedance and use the rock-physics information to obtain Vp, Vs and ρ for the elastic wave extrapolation, from which we can obtain the decomposed P wave for the phase-modified and deconvolved PP image (using residual data). It is easy to use the P stress tensors to obtain the PP image. If the vector P and S particle-velocities are used, please refer to Tang and McMechan (2017c) for obtaining the scalar PP and SS images. Using the rock-physics relationship to reduce the number of parameters in inversion can also be used in the classical FWI framework for inverting the velocity.

Conclusion

We derive a new formula for full waveform impedance inversion based on the classical reflectivity-to-impedance inversion, which suggests that the impedance update is a phase-modified and deconvolved RTM image obtained using the residual data, which is then multiplied by the background velocity and the $I_a$. To obtain the P-velocity and density for wave extrapolation, we propose to use the rock-physics relationship to obtain them from P-impedance. Because of the complexity of this relationship in practice, we suggest establishing a large database and using a fast, dynamic search engine to support this scheme, in which the number of parameters that need to be inverted is decreased. Furthermore, we suggest a scheme to stabilize the implementation of the deconvolution imaging condition. The numerical example shows that our method works well for the P-impedance inversion in acoustic media. Extension of the method into elastic media is also given.
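As a numerical illustration of the Gardner-based conversion in Eq. 11 and 12, the following minimal Python sketch splits a P-impedance model into P-velocity and density. The function names, the use of NumPy, and the tiny positive lower bound are assumptions made for illustration; only the coefficient 310, the exponents, the positivity requirement and the 7000 m/s cap come from the text.

```python
import numpy as np

GARDNER_A = 310.0  # coefficient in Eq. 11 (Vp in m/s, rho in kg/m^3)
GARDNER_B = 0.25   # exponent in Eq. 11


def gardner_density(vp):
    """Eq. 11: rho = 310 * Vp**0.25."""
    return GARDNER_A * np.asarray(vp, dtype=float) ** GARDNER_B


def impedance_to_vp(ip):
    """Eq. 12: Ip = rho * Vp = 310 * Vp**1.25, so Vp = (Ip / 310)**0.8."""
    return (np.asarray(ip, dtype=float) / GARDNER_A) ** (1.0 / (1.0 + GARDNER_B))


def split_impedance(ip, vp_max=7000.0):
    """Return (vp, rho) from Ip after bounding Ip.

    The upper bound is the impedance that corresponds to vp_max under
    Eq. 11; the lower bound (an assumed tiny positive number) keeps Ip
    positive.
    """
    ip_max = GARDNER_A * vp_max ** (1.0 + GARDNER_B)
    ip = np.clip(np.asarray(ip, dtype=float), np.finfo(float).tiny, ip_max)
    vp = impedance_to_vp(ip)
    return vp, gardner_density(vp)
```

In an inversion loop, a routine of this kind would be called after each impedance update so that the wavefield extrapolation always receives a consistent velocity and density pair.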
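The stabilized imaging condition of Eq. 17 can be sketched for a single source as follows. This is an illustrative sketch only: it assumes the deconvolved source and receiver wavefields are already available as arrays with time on the first axis, and the function name, array layout and default value of eps are hypothetical choices, not part of the original implementation.

```python
import numpy as np


def stabilized_image(us, ur, dt, eps=1e-2, mu=1.0, use_max=False):
    """Source-normalized cross-correlation image with damping (Eq. 17).

    us, ur : deconvolved source and receiver wavefields, shape (nt, ...),
             with time on axis 0 (assumed layout).
    dt     : time sampling interval.
    eps    : small positive number; eps**2 * A damps the denominator.
    """
    us = np.asarray(us, dtype=float)
    ur = np.asarray(ur, dtype=float)
    numerator = np.sum(us * ur, axis=0) * dt      # zero-lag cross-correlation
    illumination = np.sum(us * us, axis=0) * dt   # source illumination
    # A is a global average (or maximum) of the source illumination.
    a_scale = illumination.max() if use_max else illumination.mean()
    return mu * numerator / (illumination + eps ** 2 * a_scale)
```

Where a point is not illuminated by the source, the damping term keeps the denominator positive, so the image is zero there instead of undefined. Replacing the receiver wavefield with the back-propagated, time-integrated residual gives the corresponding per-source term of Eq. 19, up to the factor of the background velocity.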

REFERENCES
Aki, K., and P. Richards, 1980, Quantitative seismology: Theory and methods: W.H. Freeman and Co.
Duan, Y., and P. Sava, 2016, Elastic wavefield tomography with physical model constraints: Geophysics,
81, no. 6, R447–R456, http://dx.doi.org/10.1190/geo2015-0508.1.
Gardner, G. H. F., L. W. Gardner, and A. R. Gregory, 1974, Formation velocity and density: The
diagnostic basics for stratigraphic traps: Geophysics, 39, 770–780,
http://dx.doi.org/10.1190/1.1440465.
Kaelin, B., and A. Guitton, 2006, Imaging condition for reverse time migration: 76th Annual International
Meeting, SEG, Expanded Abstracts, 2594–2598, http://dx.doi.org/10.1190/1.2370059.
Masoni, I., J.-L. Boelle, R. Brossier, and J. Virieux, 2016, Layer stripping FWI for surface waves: 86th
Peterson, R. A., W. R. Fillipone, and F. B. Coker, 1955, The synthesis of seismograms from well log
data: Geophysics, 20, 516–538, http://dx.doi.org/10.1190/1.1438155.
Qin, B., and G. Lambaré, 2016, Joint inversion of velocity and density in preserved-amplitude full-
Tang, C., and G. A. McMechan, 2016, Combining multidirectional-source vector with revised antileakage
Fourier transform to calculate angle gathers from reverse time migration in two steps: 86th
Tang, C., and G. A. McMechan, 2017a, From classical reflectivity-to-velocity inversion to full-waveform
inversion using phase-modified and deconvolved reverse time migration images: Geophysics, 82,
no. 1, S31–S49, http://dx.doi.org/10.1190/geo2016-0033.1.
Tang, C., and G. A. McMechan, 2017b, Multidirectional-vector-based elastic reverse time migration and
angle-domain common-image gathers with approximate wavefield decomposition of P and S
waves: 87th Annual International Meeting, SEG, Expanded Abstracts.
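Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://dx.doi.org/10.1190/1.1441754.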
Zhang, Y., A. Ratcliffe, G. Roberts, and L. Duan, 2014, Amplitude-preserving reverse time migration:
From reflectivity to velocity and impedance inversion: Geophysics, 79, no. 6, S271–S283,
http://dx.doi.org/10.1190/geo2013-0460.1.
Zhu, H., and S. Fomel, 2016, Building good starting models for full-waveform inversion using adaptive
matching filtering misfit: Geophysics, 81, no. 5, U61–U72, http://dx.doi.org/10.1190/geo2015-
0596.1.

Iterative modeling, migration and inversion (IMMI): evaluating the well-calibration technique
to scale the gradient in the FWI process
Sergio Romahn* and Kristopher A. Innanen, Department of Geoscience, University of Calgary, CREWES
Summary

Iterative modeling, migration and inversion (IMMI) aims to incorporate standard processing techniques into the process of full waveform inversion (FWI). Within IMMI, any depth migration method may be used to obtain the gradient, in contrast to standard FWI, which uses a two-way reverse time migration (RTM). Another aspect of the IMMI approach is the use of well-calibration to scale the gradient, rather than applying a line search to find the scalar or an approximation of the inverse Hessian matrix. We examine with synthetic examples the performance of IMMI in circumstances of progressively increasing geological complexity. We find consistently low errors near the well-calibration location, even in the most complex settings. This suggests that the gradient obtained by applying a migration method other than RTM, though less wave-theoretically complete, points in the correct direction in order to minimize an FWI-like objective function, and that well-calibration provides a working approach for scaling. These refinements of FWI may be important enablers for application of waveform inversion in reservoir characterization, where we may have many control-wells, and we may wish to extend our approach to the determination of several elastic and/or rock properties. We find that well-calibration scales the updates properly up to what we refer to as moderate lateral velocity changes.

Introduction

Lailly (1983) and Tarantola (1984) provided the mathematical foundations for full waveform seismic inversion. They showed that FWI and migration are strongly linked, in what Margrave et al. (2010) called the fundamental theorem of FWI, which is summarized in Equation 1.

(1)

In Equation 1, the velocity update is the gradient of the objective function for iteration k with respect to the velocity model, scaled by a scalar constant λ. The gradient combines a model of the source wavefield for source s propagated to all (x, z) with the kth data residual for source s back propagated to all (x, z); ω is angular frequency and * means complex conjugation. The residual δΨ is the difference between the observed data and the modeled data. The objective function measures the difference between the recorded data and the modeled data at the kth iteration (Equation 2).

(2)

The gradient in Equation 1 can be written in the time domain as:

(3)

where T denotes record length. Equation 3 says that the gradient of the objective function is formed by correlating the time-reversed residuals propagated into the medium with the source field propagated into the medium. This is the core of FWI. The gradient is the element that contains the direction of the velocity update in the minimization scheme. The other element is the inverse Hessian or an approximation of it. If the inverse Hessian is replaced by a scalar λ, the mathematical effort is reduced to the gradient or steepest-descent method. λ scales the gradient so that it is converted into a velocity perturbation. λ is commonly estimated by a line-search method, which requires an extra forward modeling per shot (Virieux and Operto, 2009), making the process more expensive.

FWI is an iterative cycle that involves four main steps, shown in Figure 1 (Margrave et al., 2010).

FIG. 1. The cycle of FWI (Margrave et al., 2010)

The first step consists of generating synthetic seismic data (predicted data) from an initial model and calculating the data residual. The second step involves the pre-stack depth migration using

the current velocity model of the data residual and stacking the result. This step provides the gradient or update direction. The third step is scaling or calibrating the gradient by applying λ, which produces the velocity perturbation. The last step is updating the velocity model that will be used in the next iteration.

Iterative Modeling, Migration and Inversion (IMMI), introduced by Margrave et al. (2012), was proposed as an alternative to “classical” FWI which involved tools already available and widely applied by the industry. The key IMMI innovations are the use of any depth migration method in place of RTM, and the incorporation of well information to scale the gradient. The authors further suggested that using a deconvolution imaging condition, instead of the correlation type generally employed, may produce updates similar to those obtained by preconditioning with the main diagonal elements of the inverse Hessian, which is a gain correction, as illustrated by Shin et al. (2001). Pan et al. (2014) applied the IMMI method, compared the crosscorrelation and deconvolution imaging conditions, and showed that using a deconvolution-based gradient can compensate for geometrical spreading.

Following the IMMI approach, we used a phase-shift plus interpolation (PSPI) migration method (one-way wave migration) with a deconvolution imaging condition to obtain the gradient. PSPI, introduced by Gazdag and Sguazzero (1984), allows selecting a range of frequencies of interest, which is very convenient for exploring frequency-based (i.e., multiscale) strategies in FWI, wherein the inversion is started using low frequencies and then higher frequencies are progressively included, to avoid local minima (Pratt, 1999). We will follow this strategy. The scale λ in Equation 1 takes the form of a match filter that equates the size of the gradient to the size of the velocity residual at a well location. The velocity residual is the difference between the well velocity and the current velocity model.

Method

The observed shots for this experiment are idealized versions of the ones that would be recorded in the field. We generated these shots with an acoustic finite-difference algorithm to propagate the wavefield. A minimum phase wavelet with a dominant frequency of 20 Hz was used as the seismic source. The sources are placed every 50 m from 2100 to 9250 m, giving 144 shots in total. Receiver stations are located along the whole model every 10 m, and all of them were kept alive for each shot.

First iteration

The initial velocity model was generated by applying a Gaussian smoother 290 meters wide to the true velocity model. The initial velocity model provides no more than 2 Hz of geological information, while the true velocity model mainly contains information between 1 and 30 Hz, with the main events around 12 Hz. The seismic data have a dominant frequency of roughly 15 Hz and provide information between 7 and 25 Hz. There is a gap between 2 and 5 Hz, where neither the initial model nor the seismic data contribute. Modeled shots were generated by using the initial model. The difference between the observed and the modeled shots is the data residual. We obtain one data residual per shot; these are migrated in depth with the PSPI method, which permits us to limit the process to a specific frequency range. We used frequencies between 1 and 5 Hz for the first iteration. A mute is commonly applied before stacking the residuals to avoid migration artifacts. The result of stacking the migrated data residuals is the gradient.

The next step is to scale or calibrate the gradient. We use well C to perform this process (Figure 3). The well calibration technique was described by Margrave et al. (2010). Figure 2 shows the calibration process. Firstly, the difference, δvel, between the well and model velocities is calculated. The second step is to estimate the amplitude scalar a and the phase rotation ϕ that optimally match the gradient trace g to δvel. The scalar a is found such that the difference between δvel and ag is minimized by least squares. Finally, a convolutional match filter is obtained incorporating a and ϕ. This match filter is applied to every gradient trace in order to obtain the velocity update.

FIG. 2. Well calibration for the first iteration.
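The calibration steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the constant phase rotation is applied through an FFT-based analytic signal, ϕ is found by a grid search, and the scalar a and rotation ϕ are applied directly to each trace instead of being packaged as a convolutional match filter. All function names and the grid size are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real trace, built with the FFT."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def rotate_phase(x, phi):
    """Constant phase rotation of a real trace by phi radians."""
    return np.real(analytic_signal(np.asarray(x, dtype=float)) * np.exp(1j * phi))


def calibrate(g_well, dvel, n_phi=181):
    """Find scalar a and phase phi that best match the gradient trace to dvel.

    For each trial phi the least-squares scalar is a = <g_rot, dvel> / <g_rot, g_rot>.
    Because a may be negative, scanning phi over [-pi/2, pi/2] is sufficient.
    """
    best_err, best_a, best_phi = np.inf, 0.0, 0.0
    for phi in np.linspace(-np.pi / 2, np.pi / 2, n_phi):
        g_rot = rotate_phase(g_well, phi)
        energy = np.dot(g_rot, g_rot)
        if energy == 0.0:
            continue
        a = np.dot(g_rot, dvel) / energy
        err = np.sum((dvel - a * g_rot) ** 2)
        if err < best_err:
            best_err, best_a, best_phi = err, a, phi
    return best_a, best_phi


def apply_calibration(gradient, a, phi):
    """Scale and rotate every trace (column) of a gradient of shape (nz, nx)."""
    traces = [a * rotate_phase(gradient[:, ix], phi) for ix in range(gradient.shape[1])]
    return np.column_stack(traces)
```

Here `g_well` would be the gradient trace at the well location and `dvel` the difference between the well velocity and the current model velocity at the same depths; the returned pair is then applied to the whole gradient to produce the velocity update.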

More iterations

The inputs which must be supplied for the subsequent iterations are the frequency range to be used and the updated velocity model. The frequency range was increased by 1 Hz in each iteration. We stopped the inversion at the 10th iteration because in our experiments the error in the model does not decrease any more after that point.

Examples

We evaluated the performance of the well calibration technique in three different geological settings shown in Figure 3. Model 1 is the simplest model we consider, consisting of horizontal layers. The inversion is able to recover the most important features of the subsurface, including the low velocity body at 2500 m depth, which is not present in the calibration well. The error is consistently low across the model. When moderate lateral velocity changes are found, such as in Model 2, the error across the model still decreases in each iteration, giving the best result around the calibration well. For this case, the resulting inverted model captures the main features and amplitudes of the true model, and again it is possible to identify the low velocity body enclosed in the anticline at 2500 m depth. In the presence of strong lateral velocity changes, such as in Model 3, the inversion produces good results in the vicinity of the calibration well, but the error increases quite strongly as we move away from the well, especially in the zones of the high velocity bodies.

FIG. 3. Comparison among initial, inverted and true velocity models. The calibration and blind wells are C and B, respectively. The evolution of the inverted trace, from the initial model to iteration 10, shows an excellent performance in the calibration well for the three models, and the normalized error is consistently low at this location. The error tends to increase with stronger lateral velocity variations and as we move away from the calibration well.

Figure 4 illustrates the results when more than one well is incorporated to scale the gradient in constructing Model 3. Wells A, B and C were used to obtain an average calibration filter that was applied to scale the gradient. As we include more wells, the error across the model decreases. If more than one well is available, more options arise: for example, a spatially varying filter can be estimated.

The match filter used for the experiments above was designed over the whole depth interval from zero to 3000 m. A more realistic experiment is shown in Figure 5,
© 2017 SEG Page 1585

wherein different depth intervals were selected to obtain the calibration filter. We note that the
inverted trace is better in the depth zone where the filter is estimated. Following this
observation, we tested a depth-varying calibration filter. The result is illustrated in the
rightmost panel of Figure 5. The depth-varying filter provides superior results in comparison to
those derived using a stationary filter. The example in the middle panel of Figure 5, where we use
a depth interval from 1000 to 2250 m, exhibits poor inversion performance in the shallower part.
This result is a reminder of the importance of sufficient well information when applying this
technique.

FIG. 4. Calibration with more than one well.

FIG. 5. Depth-varying calibration filter.

Conclusions

The gradient, calculated with a one-way wave migration method (PSPI) under a deconvolution imaging
condition, points in the correct direction to minimize the objective function, as per the IMMI
scheme. We showed that the use of well information to calibrate the gradient produces a velocity
perturbation to update the model which reduces the model error effectively in several benchmark
examples. This was confirmed by the consistently low error at the well location, even for the most
complex of the geological models. Well calibration performs satisfactorily in the presence of
moderate lateral velocity changes, such as in Models 1 and 2. The error decreases in each iteration
as we go to higher frequencies, and the main geological features of the subsurface are captured.
When we have strong lateral velocity variations, such as in the Marmousi model, the inversion works
properly in the shallow part, and is able to recover the main features in the deeper part. However,
the velocity tends to be underestimated as we go to deeper zones. A depth-varying calibration
filter helps to overcome this issue. We found that well calibration can be applied in complex
settings, provided that the well is representative of the geology of the area of interest. The
results suggest that a calibration filter that varies horizontally (given more control wells) and
with depth is a worthy option to obtain better velocity updates in the FWI process.

Acknowledgements

We thank the sponsors of CREWES for their support. We also acknowledge support from NSERC through
the grant CRDPJ 461179-13. Author 1 thanks PEMEX and the government of Mexico for funding his
research.

REFERENCES
Gazdag, J., and P. Sguazzero, 1984, Migration of seismic data by phase shift plus interpolation:
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migration: SIAM, 206–220.
Margrave, G. F., R. J. Ferguson, and C. M. Hogan, 2010, Full-waveform inversion with wave equation
migration and well control: CREWES Research Report, 22.
Margrave, G. F., K. A. Innanen, and M. Yedlin, 2012, A perspective on full-waveform inversion:
CREWES Research Report, 24.
Pan, W., G. F. Margrave, and K. A. Innanen, 2014, Iterative modeling migration and inversion (IMMI):
Combining full waveform inversion with standard inversion methodology: 84th Annual
International Meeting, SEG, Expanded Abstracts, 938–943, https://doi.org/10.1190/segam2014-
0402.1.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification
in a physical scale model: Geophysics, 64, 888–901, https://doi.org/10.1190/1.1444597.
Shin, C., K. Yoon, K. J. Marfurt, K. Park, D. Yang, H. Y. Lim, S. Chung, and S. Shin, 2001, Efficient
calculation of a partial-derivative wavefield using reciprocity for seismic imaging and inversion:
1259–1266, https://doi.org/10.1190/1.1441754.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:

FWI without tears: a forward modeling-free gradient
Marcelo Guarido, Laurence R. Lines* and Robert Ferguson
University of Calgary
Summary

Full waveform inversion (FWI) is a machine learning algorithm whose goal is to find the Earth model
parameters that minimize the difference between acquired and synthetic shots. In this work, we
introduce a new interpretation of the gradient as the residual impedance inversion of the acquired
data. Its estimation is forward modeling free and wavelet free, which reduces its cost drastically:
the inverted model could be obtained on a personal laptop without the need for parallel processing.
The new method was applied, with great success, to the acoustic Marmousi simulation. The inverted
model, when using the same starting point, is comparable to the results obtained using the migrated
residuals. This approximation also opened the possibility of changing the order of the migration
and stack steps during the gradient estimation, so that a post-stack depth migration can be used,
and the results are promising. In the end, we propose a new FWI approximation that is cheap and
stable, and that could be applied to a real seismic survey in a processing center that has enough
computer power to run a PSDM or even just a post-stack depth migration.

Introduction

Seismic inversion techniques are those that use intrinsic information contained in the data to
determine rock properties by matching a model that "explains" the data. Some examples are the
variation of amplitude with offset, or AVO (Shuey, 1985; Fatti et al., 1994), the traveltime
differences between traces, named traveltime tomography (Langan et al., 1984; Bishop and Spongberg,
1984; Cutler et al., 1984), or even the matching of synthetic data to the observed data, as is done
in full waveform inversion (Tarantola, 1984; Virieux and Operto, 2009; Margrave et al., 2010; Pratt
et al., 1998), among others. These inversions can compute rock parameters such as P- and S-wave
velocities, density, viscosity and others. In this paper, we focus only on the inversion of the
P-wave velocity (acoustic).
Full waveform inversion (FWI) is a machine learning based method, with the objective of estimating
the Earth model parameters that minimize the difference between observed (acquired) data and
synthetic shots (Margrave et al., 2011). This is accomplished by iteratively updating the starting
model with a scaled gradient and then creating new synthetic shots with the new model.
The method was proposed in the early 80's (Pratt et al., 1998) but it was considered too expensive
in computational terms. Lailly (1983) and Tarantola (1984) simplified the methodology by using the
steepest-descent method (or gradient method) in the time domain to minimize the objective function
without explicitly calculating the partial derivatives. They estimated the gradient by
backpropagating the residuals using a reverse-time migration (RTM). Pratt et al. (1998) developed a
matrix formulation for full waveform inversion in the frequency domain, and presented efficient
strategies to compute the gradient and the inverse of the Hessian matrix, for both the Gauss-Newton
and the Newton approximations. FWI is shown to be more efficient if applied in a multi-scale
method, where lower frequencies are inverted first and the frequencies are increased in later
iterations (Pratt et al., 1998; Virieux and Operto, 2009; Margrave et al., 2010). An overview of
the FWI theory and studies is compiled by Virieux and Operto (2009). Lindseth (1979) showed that an
impedance inversion from seismic data is not effective due to the lack of low frequencies during
the acquisition, but that this could be compensated by a match with a sonic-log profile. Margrave
et al. (2010) used a gradient method and matched it with sonic-log profiles to compensate for the
absence of the low frequencies, and to calibrate the model update by computing the step length and
a phase rotation (avoiding cycle skipping). They also proposed the use of a PSPI
(phase-shift-plus-interpolation) migration (Ferguson and Margrave, 2005) instead of the RTM, so
that the iterations are done in the time domain but only selected frequency bands are migrated,
using a deconvolution imaging condition (Margrave et al., 2011; Wenyong et al., 2013) as a better
reflectivity estimation, the same strategy used by Guarido et al. (2015a, 2015b). Guarido et al.
(2016) show the need for an impedance inversion step in the gradient and use a band-limited
impedance inversion (BLIMP) method with the algorithm implemented by Ferguson and Margrave (1996).
Warner and Guasch (2014) use the deviation of the Wiener filters of the real and estimated data as
the objective function, with great results.
We propose a new approximation for FWI, where the gradient is interpreted as a residual impedance
of the current model and the impedance inversion of the acquired data. On each iteration, the data
are PSPI migrated (Ferguson and Margrave, 2005), with a deconvolution imaging condition, using the
current model, and a BLIMP inversion is applied to the stacked data. A conjugate gradient is also
used to improve the quality of the gradient and to reduce the number of iterations (Zhou et al.,
1995; Vigh and Starr, 2008). The step length is computed by a least-squares minimization (Pica et
al., 1990). To compute the residuals in the standard methodology, a finite difference forward
modelling algorithm is used to create the synthetic shots. The results of the new approximation are
comparable with

the classic method (steepest descent). We went further and inverted the order of the migration and
stack processing steps, and computed the gradient using a zero-offset PSPI migration (post-stack).
The preliminary test results are promising, with ample room for improvement.

Theory

The objective of the FWI methodology is to minimize an objective function. Here we minimize the
residuals Δd(m), that is, the difference between the observed data d0 and the synthetic data d(m)
for the current model m (here the P-wave velocity):

(1)

Minimizing the objective function C(m) with respect to the model m, we arrive at the
steepest-descent formula (Pratt et al., 1998):

(2)

where α is the step length, g is the gradient and n is the iteration number. This equation shows
that a model update can be obtained by adding a scaled gradient to the current model. This routine
is repeated until a stopping criterion is reached. The gradient should, according to the theory, be
computed by a reverse time migration of the residuals (Tarantola, 1984; Pratt et al., 1998; Virieux
and Operto, 2009), but we decided to use the phase-shift-plus-interpolation (PSPI) migration, under
the assumption that FWI is a set of processing tools and any pre-stack depth migration could be
used to back-propagate the residuals. Later, the BLIMP algorithm uses the initial model as a pilot
to apply an impedance inversion to the gradient. The first iterations use only the low frequencies
of the data, while the higher frequencies are included in later iterations.
By interpreting the gradient computation steps (migration, stack, and impedance inversion) as
seismic processing tools, equation 2 can be rewritten in terms of the operators M for migration, S
for stacking and I for impedance inversion, leading to:

(3)

where dn is the synthetic shot. Guarido et al. (2016) assume that all three operators are linear
(true for migration and stack and approximate for impedance inversion), so that the gradient can be
interpreted as a residual difference of the processed acquired data and the processed synthetic
data (both migrated using the current model). The second one, in a perfect case, is the current
model itself (the migrated, stacked and impedance-inverted synthetic data). This explanation is
better visualized in equation 4:

(4)

Interpreting the gradient as the residual difference of the processed acquired data and the current
model saves us from computing synthetic data at each shot position. Source estimation is also not
required. Two forward modeling runs are still required in the step length process. But, as it is
only an amplitude matching, it is unnecessary to compute the correct wavelet.
We can make the method even cheaper if we invert the order of the migration and stacking operators
in equation 4. This would result in using a stacked section as input and a post-stack migration at
each iteration:

(5)

Two forward modeling runs are still required to estimate the step length. However, the costs drop
significantly.

Examples

Simulations are done on the Marmousi velocity model (figure 1a). Simulated acquired data are
generated by a 2D acoustic finite difference code using a Ricker wavelet with a dominant frequency
of 5 Hz (even though the dominant frequency of the data is 12 Hz) at 104 different positions. The
starting model (figure 1b) is a smoothed version of the real Marmousi model. For the classic FWI
method (figure 1c), forward modeling is done using the current model (the initial model on the
first iteration and the updated model subsequently) and the same wavelet as the acquired data (we
are just applying the FWI to a synthetic simulation; no real data were tested).
The first iteration uses only low-frequency content, and the initial band is from 4 Hz to 6 Hz. The
same frequency band is repeated until convergence is reached (the objective function varies by less
than 0.001% for three consecutive iterations). Then the frequency band is changed by fixing the
minimum frequency at 4 Hz and increasing the maximum one by 2 Hz. This routine is repeated until
the maximum frequency of 30 Hz is reached.
Acquired data are backpropagated using a PSPI migration algorithm, then muted and stacked (Guarido
et al., 2015b), resulting in a reflection coefficient model, in depth, that has the same size as
the velocity model. It represents the usual gradient estimation and it is often assumed to be
equivalent to velocity when multiplied by the step length. So, in many cases, the step length can
be interpreted as an impedance inversion operator. We decided to convert the reflection coefficient
model to velocity by applying an impedance inversion. As the data lack low frequencies (1 to 3 Hz),
we use the BLIMP algorithm, assuming the initial model is a good pilot, to fill in the missing
frequency content. This means that the initial model must contain the low frequencies (linear
trend) of the study area. Later, the step length is estimated.
Figure 1 shows the Marmousi model (a), initial model (b), inverted model based on equation 3 (c),
where residuals are computed as the difference between acquired and synthetic

data with an impedance inversion applied to the gradient, and, finally, the resulting model based
on equation 4 (d), the forward modeling free gradient with a PSDM. For both inversions, the step
length is estimated as proposed by Pica et al. (1990). The resulting models are comparable and show
great resolution.

Figure 1: a) true Marmousi model, b) initial model for all runs, c) inverted model with classic FWI and d) inverted model with
the forward modeling free gradient method.

The model of figure 1c (using synthetic data to estimate the gradient) has more geological
structures than the model of figure 1d (forward modeling free gradient), which is better noticed in
the deeper areas. In the shallow and mid-depth areas, the models are comparable. This means that
the regular FWI still works better, mostly at higher frequencies, on a synthetic simulation.
However, it still requires a very good source estimation so that the residuals are stable. The
forward modeling free gradient does not require the source, and we believe it would be more stable
when applied to real data. Another advantage of the forward modeling free method is the computing
requirement and processing time, which is reduced by about 70%.
Figure 2a is the stacked section used as input data for the post-stack FWI method, based on
equation 5, and the resulting model is shown in figure 2b. There is a loss of resolution compared
to the previous results. However, most of the major layers were correctly inverted and placed.
Shallow and mid-depth areas are comparable to the previous models. It is also possible to note some
border effects. They are due to the step length being estimated using the central shot as the
control point for the whole model. This effect could be reduced if more control points, closer to
the borders, are included.
The differences between the methods are, mostly, the costs associated with each one. For the
classic method, running the full inversion routine required 24 clusters for parallel processing in
MATLAB, and the total elapsed time was over 48 hours. The forward modeling free gradient method
(pre-stack) reduces the costs considerably, and the routine

Figure 2: a) stacked section as input data and b) inverted model on the post-stack approximation.

ran on a personal gaming laptop (16 GB of RAM), with no parallel processing, and 8 hours of run
time in Octave. The post-stack method ran on a tablet with a dual-core processor (4 GB of RAM),
where the total elapsed time was only around 1 hour.

Figure 3: Respectively, shot and model errors of a) and b) classic FWI, c) and d) forward modeling free gradient method and
e) and f) post-stack approximation.

Figure 3 compares the objective function and the model deviations of the 3 methods. All of them are
stable, and we observe the convergence of the objective function. The model deviations reach a
minimum at some point and then start to diverge slowly. We believe this is due to the low
signal-to-noise ratio at higher frequencies. We also observe a "break" in the curves. This happens
when the inversion starts to include the dominant frequency of the data (12 Hz) during the
migration.
It is safe to say that the resolution of the inverted model decreases as the method gets simpler
and cheaper. The choice of the method is just a matter of cost and benefit. Better responses will
require the highest investments, with no guarantee of stability, as for some surveys the source can
be very complicated to estimate. However, we show that a reasonable result, with just a small loss
of resolution, can be achieved with a drastic reduction of costs and a more robust inversion.

Conclusions

We have presented a new FWI method based on interpreting the gradient as a residual difference of
the impedance inversion of the acquired data and the current inverted model, removing the need to
compute one forward modeling per shot location on every iteration. Compared with the classic FWI,
the results are comparable, but with some loss in resolution as costs become cheaper. However, the
cost-benefit trade-off looks to be worthwhile.
A post-stack method with preliminary results was also presented, reducing the costs of an FWI run
even more, but also losing some resolution and adding border effects. However, we are confident
that this is a safe strategy to follow with the goal of applying FWI to large surveys with reduced
computer requirements and a gain in stability, as it does not require a source estimation. In the
end, the choice of which method to use will depend on the investment power of the user.

Acknowledgements

The authors thank the sponsors of CREWES for continued support. This work was funded by CREWES
industrial sponsors and NSERC (Natural Science and Engineering Research Council of Canada) through
the grant CRDPJ 461179-13. We also thank Soane Mota dos Santos for the suggestions, tips and
productive discussions.
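The multi-scale schedule described in the Examples section (a band starting at 4 to 6 Hz, repeated until the objective function changes by less than 0.001% for three consecutive iterations, then widened by 2 Hz up to 30 Hz) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `update_model` callback and the reading of "three consecutive iterations" as three successive relative changes are all assumptions.

```python
def frequency_bands(f_min=4.0, f_max_start=6.0, f_max_end=30.0, step=2.0):
    """Return (f_min, f_max) bands; the upper limit grows by `step` Hz per band."""
    bands = []
    f_max = f_max_start
    while f_max <= f_max_end:
        bands.append((f_min, f_max))
        f_max += step
    return bands


def has_converged(objective_history, rel_tol=1e-5, n_consecutive=3):
    """True when the objective changed by less than rel_tol (0.001% = 1e-5)
    over each of the last n_consecutive iterations."""
    if len(objective_history) < n_consecutive + 1:
        return False
    recent = objective_history[-(n_consecutive + 1):]
    for previous, current in zip(recent[:-1], recent[1:]):
        if previous == 0 or abs(current - previous) / abs(previous) >= rel_tol:
            return False
    return True


def run_multiscale(update_model, model, max_iter_per_band=50):
    """Apply a user-supplied update over all bands, lowest band first.

    `update_model(model, band)` is a hypothetical callback that performs one
    model update for the given band and returns (new_model, objective_value).
    """
    for band in frequency_bands():
        history = []
        for _ in range(max_iter_per_band):
            model, objective = update_model(model, band)
            history.append(objective)
            if has_converged(history):
                break
    return model
```

With the default arguments the schedule contains 13 bands, from (4, 6) Hz to (4, 30) Hz. The cap `max_iter_per_band` is a safeguard added for the sketch and is not taken from the text.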
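For reference, equations 1 to 5 of the Theory section can be written in a form that is consistent with the definitions given there (residual Δd, step length α, gradient g, and the operators M, S and I). The squared L2 norm in equation 1 and the plus sign in front of the step length are assumptions made for this sketch; other conventions absorb the sign into the gradient.

```latex
\begin{align}
C(\mathbf{m}) &= \|\Delta \mathbf{d}(\mathbf{m})\|_2^2
               = \|\mathbf{d}_0 - \mathbf{d}(\mathbf{m})\|_2^2 \tag{1}\\
\mathbf{m}_{n+1} &= \mathbf{m}_n + \alpha_n \mathbf{g}_n \tag{2}\\
\mathbf{m}_{n+1} &= \mathbf{m}_n + \alpha_n\, I\{S[M(\mathbf{d}_0 - \mathbf{d}_n)]\} \tag{3}\\
\mathbf{m}_{n+1} &= \mathbf{m}_n + \alpha_n \left\{ I[S(M(\mathbf{d}_0))] - \mathbf{m}_n \right\} \tag{4}\\
\mathbf{m}_{n+1} &= \mathbf{m}_n + \alpha_n \left\{ I[M(S(\mathbf{d}_0))] - \mathbf{m}_n \right\} \tag{5}
\end{align}
```

Equation 4 follows from equation 3 by treating I, S and M as linear and replacing the processed synthetic data I{S[M(d_n)]} with the current model m_n, as stated in the text. Equation 5 only swaps the order of M and S.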

REFERENCES
Bishop, T. N., and M. E. Spongberg, 1984, Seismic tomography: a case study: 54th Annual International
Meeting, SEG, Expanded Abstracts, 712–713, https://doi.org/10.1190/1.1894310.
Cutler, R. T., T. N. Bishop, H. W. Wyld, R. T. Shuey, R. A. Kroeger, R. C. Jones, and M. L. Rathbun,
1984, Seismic tomography: formulation and methodology: 54th Annual International Meeting,
SEG, Expanded Abstracts, 711–712, https://doi.org/10.1190/1.1894311.
Fatti, J. L., G. C. Smith, P. J. Vail, P. J. Strauss, and P. R. Levitt, 1994, Detection of gas in sandstone
reservoirs using AVO analysis: a 3-D seismic case history using the Geostack technique:
Ferguson, R., and G. Margrave, 2005, Planned seismic imaging using explicit one-way operators:
Geophysics, 70, S101–S109, https://doi.org/10.1190/1.2073885.
Guarido, M., L. Lines, and R. Ferguson, 2015a, Convergence of a FWI scheme based on PSPI migration:
GeoConvention Technical Program Expanded Abstract.
Guarido, M., L. Lines, and R. Ferguson, 2015b, Full waveform inversion: a synthetic test using PSPI
migration: 85th Annual International Meeting, SEG, Expanded Abstracts, 1456–1460,
Guarido, M., L. Lines, and R. Ferguson, 2016, FWI without tears: a forward modeling free gradient:
CREWES Research Report, 28, 26.1–26.20.
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migrations: Conference on
Inverse Scattering, Theory and Application: Society of Industrial and Applied Mathematics,
Expanded Abstracts, 206–220.
Langan, R. T., I. Lerche, R. T. Cutler, T. N. Bishop, and N. J. Spera, 1984, Seismic tomography: the
accurate and efficient tracing of rays through heterogeneous media: 54th Annual International
Meeting, SEG, Expanded Abstracts, 314, 713–715.
Lindseth, R. O., 1979, Synthetic sonic logs-a process for stratigraphic interpretation: Geophysics, 44, 3–
26, http://dx.doi.org/10.1190/1.1440922.
Ma, Y., D. Hale, Z. J. Meng, and B. Gong, 2010, Full waveform inversion with image-guided gradient:
https://doi.org/10.1190/1.3513016.
Margrave, G., R. Ferguson, and C. Hogan, 2010, Full waveform inversion with wave equation migration
and well control: CREWES Research Report, 22, 63.1–63.20.
Margrave, G., M. Yedlin, and K. Innanen, 2011, Full waveform inversion and the inverse Hessian:
CREWES Research Report, 23, 77.1–77.13.
Pica, A., J. P. Diet, and A. Tarantola, 1990, Nonlinear inversion of seismic reflection data in a laterally
invariant medium: Geophysics, 55, R59–R80, http://dx.doi.org/10.1190/1.1442836.
Pratt, R. G., C. Shin, and G. J. Hick, 1998, Gauss–Newton and full Newton methods in frequency–space
http://dx.doi.org/10.1046/j.1365-246X.1998.00498.x.
Shuey, R. T., 1985, A simplification of the Zoeppritz equations: Geophysics, 50, 609–614,
http://dx.doi.org/10.1190/1.1441936.
1259–1266, http://dx.doi.org/10.1190/1.1441754.

Treitel, S., L. Lines, and G. Ruckgaber, 1995, Seismic impedance estimation: Geophysical Inversion and
Applications: Memorial University of Newfoundland, 1, 6–11.
Vigh, D., and E. W. Starr, 2008, 3D prestack plane-wave, full-waveform inversion: Geophysics, 73, No.
5, VE135–VE144, http://dx.doi.org/10.1190/1.2952623.
Geophysics, 74, No. 6, WCC1–WCC26, http://dx.doi.org/10.1190/1.3238367.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion: theory: 84th Annual International
Wenyong, P., G. Margrave, and K. Innanen, 2013, On the role of the deconvolution imaging condition in
full waveform inversion: CREWES Research Report, 25, 72.1–72.19.
Zhou, C., W. Cai, Y. Luo, G. T. Schuster, and S. Hassanzadeh, 1995, Acoustic wave-equation traveltime
and waveform inversion of crosshole seismic data: Geophysics, 60, 765–773,
http://dx.doi.org/10.1190/1.1443815.

A denoising formulation of Full-Waveform Inversion
Rongrong Wang and Felix J. Herrmann, Seismic Laboratory for Imaging and Modelling, University of British Columbia
ABSTRACT

We propose a wave-equation-based subsurface inversion method that in many cases is more robust than
conventional Full-Waveform Inversion. The new formulation is written in a denoising form that
allows the synthetic data to match the observed data up to a small error. Compared to Full-Waveform
Inversion, our method treats the noise arising from the data measuring/recording process and that
from the synthetic modelling process separately. Compared to Wavefield Reconstruction Inversion,
the new formulation mitigates the difficulty of choosing the penalty parameter $\lambda$. To solve
the proposed optimization problem, we develop an efficient frequency-domain algorithm that
alternately updates the model and the data. Numerical experiments confirm the strong stability of
the proposed method through comparisons of the results of our algorithm with those from both plain
FWI and a weighted formulation of FWI.

INTRODUCTION

All wave-equation-based seismic inversion techniques suffer from both additive noise and modelling
errors. Albeit smaller, the latter can cause more damage to the inverted model. Depending on the
specific experiment settings, modelling errors arising from PDE discretization, the use of
inaccurate modelling kernels, trace truncations, timing errors, source estimation and location
errors all contribute to the noise in quite different ways. These modelling errors are not taken
into account in conventional Full-Waveform Inversion (FWI) formulations, where the PDE constraints
are strictly imposed (Tarantola and Valette, 1982; Virieux and Operto, 2009),

$$\min_{m} \sum_{i=1}^{n_s} \|P_{\Omega_i} u_i - d_i\|_2^2 \quad \text{subject to } A(m) u_i = q_i, \ i = 1, \dots, n_s.$$

Following the usual notation, we use $q_i$ to denote the $i$th source, $d_i$ to denote the
corresponding data, $P_{\Omega_i}$ to denote the restriction operator to receiver locations, and
$A$ for the discretized Helmholtz matrix for the model $m$ at a specific frequency. Although the
strict PDE constraints increase the numerical efficiency by allowing one to eliminate $u_i$ and
derive the reduced form,

$$\min_{m} \sum_{i=1}^{n_s} \|P_{\Omega_i} A(m)^{-1} q_i - d_i\|_2^2,$$

it is not very effective in handling modelling errors. Note that modelling errors may be generated
from all regions of the model. They become part of the data after propagating to the receiver
locations. As noise terms, they are often more coherent than pure additive noise, therefore the
two-norm data misfit may not be an appropriate way to model them. As is indeed observed, FWI is
very robust to white noise but less so to other types of noise such as those coming from the
source.

A better way of treating the modelling error is to allow a small misfit in the PDE, as in the
formulation of Wavefield Reconstruction Inversion (WRI) (van Leeuwen and Herrmann, 2013a, 2015),

$$\min_{m,\, u_i,\, i=1,\dots,n_s} \sum_{i} \|P_{\Omega_i} u_i - d_i\|_2^2 + \lambda \|A(m) u_i - q_i\|_2^2.$$

Despite the similarity to FWI, both the data and model misfits are now softly penalized in their
$\ell_2$ norms with a certain weight containing prior information about their relative strengths
(van Leeuwen and Herrmann, 2013b). However, setting the parameter is a problem, as the modelling
error is often unknown. In contrast, the energy of pure data-side noise is easier to evaluate by
data processing techniques, as such noise is usually close to being Gaussian. Note that we consider
interfering signals from unknown sources as modelling error here; the data-side noise only consists
of the noise introduced by the measuring procedure of the receivers.

Recognizing this relative difficulty of estimating model-side errors and the relative ease of
estimating data-side errors, we hereby propose a denoising formulation, named FWI-DN (denoising),
as an alternative to FWI with enhanced stability,

$$\min_{m,\, u} \sum_{i} \|D_z (A(m) u_i - q_i)\|_2^2 \quad \text{subject to } \|P_{\Omega_i} u_i - d_i\|_2 \le \epsilon_i, \ i = 1, \dots, n_s. \tag{FWI-DN}$$

Here $\epsilon_i$ is the estimated data-side noise in the $\ell_2$ norm for the $i$th shot gather,
and $D_z$ is a linear operator that performs depth weighting on the modelling error with weights
that are nondecreasing with depth. In this way, we avoid the difficulty of estimating the modelling
error, and at the same time acquire the flexibility of incorporating different noise levels for
different sources.

We end this section by addressing the related weighted FWI. One may argue that the usual FWI
formulation can also be modified to handle the non-uniform noise case, where different sources or
even traces have different noise levels, by introducing weights to the FWI misfit, e.g.,
Farquharson and Oldenburg (1998). Specifically, we can reformulate the objective as

$$\min_{m} \sum_{i} w_i \|P_{\Omega_i} A(m)^{-1} q_i - d_i\|_2^2 \tag{1}$$

where a natural choice of the weight could be $w_i = \epsilon_{i_0}^2 / \epsilon_i^2$ for some
fixed $i_0$ and for all $i = 1, \dots, n_s$. This way, the shot gather corresponding to a larger
noise level $\epsilon_i$ is relatively lightly penalized, reflecting the correct belief that this
source is less reliable. However, putting the misfits for all sources in a mixed objective form
causes this formulation to fail to provide guarantees

for the final solution to lie inside the $\epsilon_i$-ball of the real data. Furthermore, due to
the nonlinearity of the objective function, the weighted FWI can get trapped at local minima before
entering the region where the misfits become proportional to the weights. To further support these
arguments, a comparison of the performances of the weighted FWI and the proposed method will be
presented in the numerical section.

METHODOLOGY

Solving the denoising problem

To solve FWI-DN, we propose to use the alternating minimization method. For numerical efficiency
considerations, instead of updating the wavefields $u_i$ ($i = 1, \dots, n_s$) and the model $m$
alternately, we suggest alternating the updates between the data $P_{\Omega_i} u_i$
($i = 1, \dots, n_s$) and the model $m$. To be more specific, let us first write out the
subproblems. At the $k$-th iteration, for fixed $m_k$, we obtain $P_{\Omega_i} u_i^{k+1}$ by
solving a quadratically constrained problem

$$P_{\Omega_i} u_i^{k+1} = \arg\min_{u_i} \|D_z (A(m_k) u_i - q_i)\|_2^2 \quad \text{subject to } \|P_{\Omega_i} u_i - d_i\|_2 \le \epsilon_i. \tag{2}$$

Then fix $P_{\Omega_i} u_i^{k+1}$, and solve for $m_{k+1}$ through (3),

$$(m_{k+1}, \tilde{u}^{k+1}) = \arg\min_{m,\, u_1, \dots, u_{n_s}} \sum_{i} \|D_z (A(m) u_i - q_i)\|_2^2 \quad \text{subject to } P_{\Omega_i} u_i = P_{\Omega_i} u_i^{k+1}, \tag{3}$$

where $\tilde{u}^{k+1} = [\tilde{u}_1^{k+1}, \dots, \tilde{u}_{n_s}^{k+1}]$ represents a
collection of the $n_s$ wavefields that minimizes (3).

Intuitively, each iteration of subproblem (2) can be interpreted as a data denoising procedure with
the output $P_{\Omega_i} u_i^{k+1}$ being the denoised data. The denoised data are then fitted
exactly in solving subproblem (3). The denoising step will get increasingly more accurate as the
model iterates $m_k$ become better.

Notice that subproblems (3) decouple for each source $i$, and that their strict equality
constraints can be used to eliminate the wavefield variable $u_i$ at the receiver locations. As a
result, (3) is equivalent to $n_s$ unconstrained problems

$$(m_{k+1}, \tilde{v}_i^{k+1}) = \arg\min_{m,\, v_i} \left\| D_z \left\{ A(m) \left( P_{\Omega_i^c}^T v_i + P_{\Omega_i}^T P_{\Omega_i} u_i^{k+1} \right) - q_i \right\} \right\|_2^2 \quad \text{for } i = 1, \dots, n_s, \tag{4}$$

where $\Omega_i^c$ is the complementary set of $\Omega_i$ and $P_{\Omega_i}^T$ is the transpose of
the operator $P_{\Omega_i}$. More specifically, (4) is obtained by setting a new variable $v_i$
through $v_i = P_{\Omega_i^c} u_i$ and substituting the $u_i$ in (3) by

$$u_i = P_{\Omega_i}^T P_{\Omega_i} u_i + P_{\Omega_i^c}^T P_{\Omega_i^c} u_i = P_{\Omega_i}^T P_{\Omega_i} u_i^{k+1} + P_{\Omega_i^c}^T v_i.$$

Note that since the variables $\tilde{v}_i^{k+1}$ will not be used in future iterations, the whole
purpose of solving subproblem (4) is to update the model $m$.

The algorithm now reduces to solving (2) and (4) alternately. As we will see in the next section,
solving subproblem (2) for the wavefields $u_i$ to a relatively high accuracy typically involves
several inversions of the augmented system $[A; P_{\Omega}]$ for some $\lambda$. Subproblem (4) can
be solved using variable projection as in Wavefield Reconstruction Inversion, whose cost is
proportional to the number of model updates multiplied by the cost of the augmented system
$[A; P_{\Omega}]$ inversion. We found that the most efficient way to solve the whole (FWI-DN)
problem is to solve (2) with high accuracy during each iteration, followed by a few updates of
subproblem (4) for $v_i$ and $m$, so that the two subproblems have similar computational cost.

Some readers may wonder why, instead of (4), we do not use a more natural way of updating $m$, that
is, to fix $u_i^{k+1}$ and to minimize over $m$ via

$$m_{k+1} = \arg\min_{m} \sum_{i} \|D_z (A(m) u_i^{k+1} - q_i)\|_2^2. \tag{5}$$

The reason we prefer (4) to (5) is that when solving (5) we cannot make much progress in updating
$m$ since the complete wavefields remain fixed, whereas this is better in (4) because only the
wavefields at the boundary are fixed.

Solving the denoising problem

Subproblem (2) is a least-squares problem with a norm inequality constraint. It is known that for
this type of problem, $u_i$ can be evaluated by transforming it to a related Lasso problem, i.e.,
the penalty formulation of WRI with a specific parameter $\lambda$, and be solved by carrying out
one inversion of the augmented system $[A; P_{\Omega}]$.

Specifically, to solve subproblem (2), we use the Lagrangian dual approach, which solves the dual
problem

$$\max_{\lambda \ge 0} G(\lambda) \tag{6}$$

where $G$ is the Lagrange dual of (2) and $\lambda$ is the dual variable, i.e.,

$$G(\lambda) = \min_{u_i} \|D_z (A(m) u_i - q_i)\|_2^2 + \lambda \|P_{\Omega_i} u_i - d_i\|_2^2 - \lambda \epsilon_i.$$

Since $G$ is differentiable with respect to $\lambda$, one can solve $G'(\lambda) = 0$ for the
maximizer $\hat{\lambda}$ and use strong duality (More, 1993) to obtain $u_i$. It is easy to
calculate that

$$G'(\lambda) = \|P_{\Omega_i} \bar{u}_i(\lambda) - d_i\|_2^2 - \epsilon_i$$

with $\bar{u}_i$ being the solution to the related Lasso problem

$$\bar{u}_i = \arg\min_{u_i} \|D_z (A(m_k) u_i - q_i)\|_2^2 + \lambda \|P_{\Omega_i} u_i - d_i\|_2^2, \tag{7}$$

which has the closed-form solution

$$\bar{u}_i = \begin{bmatrix} D_z A(m_k) \\ \sqrt{\lambda}\, P_{\Omega_i} \end{bmatrix}^{\dagger} \begin{bmatrix} D_z q_i \\ \sqrt{\lambda}\, d_i \end{bmatrix}$$

where $\dagger$ is the pseudo inverse. Also, we can obtain the second derivative

$$G''(\lambda) = -(P_{\Omega_i} \bar{u}_i - d_i)^T P_{\Omega_i} C^{-1} P_{\Omega_i}^T (P_{\Omega_i} \bar{u}_i - d_i)$$
© 2017 SEG Page 1595

where

$$C = A(m)^T D_z^T D_z A(m) + \lambda P_{\Omega_i}^T P_{\Omega_i}.$$

The equation $G'(\lambda) = 0$ can then be solved by Newton's method. Start with $\lambda = 0$ and update

$$\lambda^{k+1} = \lambda^k - G'(\lambda^k)/G''(\lambda^k).$$

According to the strong duality principle (More, 1993), once the optimal $\lambda^*$ is found, we can obtain $u_i^{k+1}$ through (7).

Updating the model

To solve (4), we use the variable projection approach as in WRI. For a fixed $m$, $v_i^{k+1}$ has a unique closed form optimal solution that minimizes the objective function in (4)

$$\bar{v}_i = \left(D_z A_{\Omega_i^c}(m^k)\right)^{\dagger} D_z \left(q_i - A_{\Omega_i}(m^k) P_{\Omega_i} u_i^{k+1}\right),$$

where $A_{\Omega_i^c}(m^k)$ denotes the submatrix of $A(m^k)$ formed by columns indexed by $\Omega_i^c$. Using this to project out $v_i$ and rewriting (4) as

$$m^{k+1} = \arg\min_{m} \sum_i \|D_z\{A(m)(P_{\Omega_i^c}^T \bar{v}_i + P_{\Omega_i}^T P_{\Omega_i} u_i^{k+1}) - q_i\}\|_2^2,$$

it is easy to calculate its gradient $g$ and Gauss-Newton Hessian $H$ with respect to $m$. As mentioned before, for numerical efficiency we only perform a few updates of $m$. We hence update $m$ by the Newton step

$$m^{k+1} = m^k - H^{-1}(m^k)\, g(m^k). \tag{8}$$

Now we summarize the main steps of the proposed algorithm.

Algorithm 1 Algorithm for solving WRI-DN
 1: procedure Input: $d_i$, $\Omega_i$, $A$, $m_0$, $T_1$, $T_2$
 2:   for frequency = $f_{low}, \ldots, f_{high}$ do
 3:     for $k = 1, \ldots, T_1$ do
 4:       for each source $i \in \{1, \ldots, n_s\}$ do
 5:         update $u_i^k$ by solving (2)
 6:       end for
 7:       for $j = 1, \ldots, T_2$ do
 8:         update $m$ using (5)
 9:       end for
10:     end for
11:   end for
12: end procedure

NUMERICAL EXPERIMENT

Experiment 1

We test the robustness of the proposed method in the presence of modelling errors. We use a special type of modelling error coming from inaccurate estimates of the source signature. The test model is the 2D BG-compass model. Sources and receivers are placed at 12 m in depth with source spacing 240 m and receiver spacing 48 m. All sources have the same Ricker wavelet signature centred at 10 Hz. The wrongly estimated source signature is a shrinkage of the true one by 20%. We compare the results of the proposed method with those of conventional FWI. For our method, since only modelling errors exist in this case, we set $\epsilon_i = 0$ and the output of subproblem (2) is therefore simply $P_{\Omega_i} d_i$. As a result, for each frequency, we only need to solve the subproblem (4) without any alternating. We choose a simple linear depth weighting $D_z = z$, which we observed greatly enhanced the stability of our algorithm compared to using a constant weight. The inversions are performed in the frequency domain one frequency slice at a time from 3 Hz to 15 Hz. For each frequency, the minimization problems for both methods are solved until convergence.

The inverted results from FWI and our method are shown in Figure 1c and Figure 1d, respectively. Compared to the true model in Figure 1a, both inversions are kinematically correct. However, our result is more accurate in its high-frequency reconstruction and is therefore smoother. The final model reconstruction error of our method is about 60% of that from the FWI. We also observe that in the shallow part of the inverted model, especially around the receiver locations, our result is very noisy. This is a consequence of the increased depth weighting, which pushes the modelling error up to the shallow part and guarantees the deep part to be stable. Since the deep part is usually the region of interest, this is a benefit of incorporating the weight.

[Figure 1 panels: velocity (km/s) plotted against distance (km) and depth (km)]
Figure 1: A comparison of stability of FWI with FWI-DN under source estimation error: (a) The true model; (b) The initial model; (c) Inversion result with FWI; (d) Inversion result with the proposed method FWI-DN and linear depth weighting.

Experiment 2

We test the robustness of our method with respect to strong white Gaussian noise. The SNR of the low-frequency data (2–10 Hz) used in this example is 0 dB, and that for high-frequency (10–15 Hz) is 25 dB. We use the shallow water Marmousi model with a 50 m water layer. Sources and receivers are placed at 12 m

depth with source and receiver spacing of 240 m and 48 m, respectively. All the inversions are performed in the frequency domain one frequency slice at a time from 3 Hz to 15 Hz.

We assume non-uniform noise. The data associated with sources located on the left half of the top of the model are polluted by white Gaussian noise at level $\epsilon$, and those on the right half are polluted by the same type of noise at a different level $3\epsilon$. In the FWI-DN, we set the corresponding noise thresholds to be $0.8\epsilon$ and $2.4\epsilon$, which are inaccurate and conservative estimates of the true noise levels. We comment that it is always preferable to choose a conservative estimate as we do not need the output of the first subproblem to be completely noise free. This is because the second subproblem (4) itself is also a stable algorithm that can handle the rest of the noise. Besides, if the noise threshold $\epsilon_i$ is set too high, then the algorithm will remove a significant portion of the signal components as well, leading to an unsatisfactory update of the model.

[Figure 2 panels: velocity (km/s) plotted against distance (km) and depth (km)]
Figure 2: A comparison of robustness of FWI with FWI-DN under Gaussian noise with SNR=0 dB: (a) The true model; (b) The initial model; (c) Inversion result with weighted FWI; (d) Inversion result with the proposed method FWI-DN and linear depth weighting.

For comparison, we run both our algorithm and the weighted FWI. For the latter, we set weights to exactly reflect the prior information that sources on the left are 3 times more accurate than those on the right. Specifically, let $N_1$ be the set of source indices on the left and $N_2$ be those on the right. The weighted FWI minimizes

$$\min_{m} \sum_{i \in N_1} 9\,\|P_{\Omega_i} A^{-1}(m) q_i - d_i\|_2^2 + \sum_{i \in N_2} \|P_{\Omega_i} A^{-1}(m) q_i - d_i\|_2^2.$$
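The factor 9 in the weighted objective is what inverse-variance weighting gives when one group of sources has three times the noise level of the other. The following is a minimal sketch of that weighting, assuming the residual vectors have already been computed; the function name `weighted_misfit` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def weighted_misfit(residuals, noise_levels):
    """Sum of squared residual norms with inverse-variance weights.

    residuals    : list of 1D arrays, one data residual per source
    noise_levels : noise level of each source, in any common unit
    The weights are scaled so that the noisiest source has weight 1.
    """
    sigma = np.asarray(noise_levels, dtype=float)
    weights = (sigma.max() / sigma) ** 2
    norms = np.array([np.vdot(r, r).real for r in residuals])
    return float(weights @ norms), weights

# Two sources on the left (noise level 1) and two on the right (noise level 3)
residuals = [np.ones(2)] * 4
total, weights = weighted_misfit(residuals, [1.0, 1.0, 3.0, 3.0])
print(weights, total)
```

With noise levels in the ratio 1:3 the weights come out as 9 and 1, matching the objective above.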
Figure 2c shows the inverted model using weighted FWI started from the initial guess displayed in Figure 2b. We observed that the noise is too large for the weighted FWI to even keep the kinematic correctness of the inversion. In contrast, our method not only produces a kinematically correct model but is also able to filter out the noise and keep the smoothness in the reconstruction. Admittedly, the deep part of the model is not well reconstructed, due to the fact that the weak signal components coming from the deep part are filtered out along with the noise. This seems to be an inevitable consequence in such a high noise scenario. Moreover, as in the previous example, there are large inversion errors in the shallow water layer region; this is due to the depth weighting, which allows a larger modelling error for the shallower part than the deeper part. Fortunately, this error does not propagate down to affect the reconstruction of those regions that we are most interested in.

Finally, it is worth mentioning that one can modify the proposed method into a pure noise attenuation method by deactivating its model updating step (3). We use the initial guess $m_0$ as a tuning parameter of the denoiser, and solve

$$\bar{u} = \arg\min_{u_i,\, i = 1, \ldots, n_s} \sum_i \|D_z(A(m_0)u_i - q_i)\|_2^2$$
$$\text{subject to } \|P_{\Omega_i} u_i - d_i\|_2 \le \epsilon_i, \quad i = 1, \ldots, n_s.$$

The output of the algorithm $P_{\Omega_i} \bar{u}_i$ is then a smooth approximation to $d_i$. The smoothness is a result of minimizing the PDE misfit, as wavefields that obey the PDE have a certain level of regularity. If the noise level $\epsilon_i$ is too small, then the output is close to the input. On the other hand, if $\epsilon_i$ is too large, say larger than $\|d_i - d_i^0\|_2$ with $d_i^0$ being the data of the initial model, then $\bar{u}$ is simply the wavefield of $m_0$, so we lose all the information in the data about $\delta m = m_{true} - m_0$. When $\epsilon$ is neither too large nor too small, the method smooths the data towards the direction of the initial guess. Hence the better the initial guess is, the better the denoiser performs. In addition, when the computational speed is a concern, one can sacrifice some accuracy and use a very small $m_0$, say only the top water layer.

DISCUSSION AND CONCLUSIONS

We proposed a denoising formulation for FWI named FWI-DN, which separately treats the data-side and model-side noises as opposed to combining them into one objective as in the conventional FWI. The WRI formulation is in between the proposed method and the FWI, and is equivalent to our problem if an oracle is given for the parameter $\lambda$ that balances the data and the PDE misfits. In a sense, our formulation is an extension of the WRI formulation. We proposed an efficient algorithm for FWI-DN which performs alternating updates on the model and the data. The part of the algorithm that solves the data updating subproblem can be used as a stand-alone denoising method when a good initial guess of the model is available.

ACKNOWLEDGEMENT

This research was carried out as part of the SINBAD project with the support of the member organizations of the SINBAD Consortium.
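The stand-alone denoising step mentioned in the conclusions, i.e. the Newton iteration on the dual variable of subproblem (2), can be sketched for a small dense problem as follows. This is an illustration under stated assumptions rather than the authors' implementation: `denoise_wavefield` and its arguments are hypothetical names, `B` stands for $D_z A(m)$ and `b` for $D_z q_i$, `eps` is the threshold on the squared data residual as it appears in $G'(\lambda)$, and the factor 2 in the second derivative comes from differentiating the squared norm directly.

```python
import numpy as np

def denoise_wavefield(B, b, P, d, eps, max_iter=100, rtol=1e-10):
    """Newton iteration on the dual variable lam (hypothetical helper).

    B : weighted modelling matrix D_z A(m), square and invertible here
    b : weighted source D_z q
    P : receiver sampling matrix, d : observed data
    eps : threshold on the squared data residual
    """
    def solve(lam):
        # Closed-form solution of (7) through a stacked least-squares system
        K = np.vstack([B, np.sqrt(lam) * P])
        rhs = np.concatenate([b, np.sqrt(lam) * d])
        return np.linalg.lstsq(K, rhs, rcond=None)[0]

    lam = 0.0
    u = solve(lam)
    for _ in range(max_iter):
        r = P @ u - d
        g1 = r @ r - eps                      # first derivative of the dual
        if g1 <= rtol * eps:                  # constraint (numerically) met
            break
        C = B.T @ B + lam * (P.T @ P)
        g2 = -2.0 * (r @ (P @ np.linalg.solve(C, P.T @ r)))
        lam -= g1 / g2                        # Newton update
        u = solve(lam)
    return u, lam

# Tiny synthetic test problem (all values are made up for illustration)
rng = np.random.default_rng(0)
n = 8
A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Dz = np.diag(np.arange(1.0, n + 1))           # linear depth weighting
q = rng.standard_normal(n)
P = np.eye(n)[[0, 3, 6]]                      # three receivers
noise = 0.1 * rng.standard_normal(3)
d = P @ np.linalg.solve(A, q) + noise
eps = 0.25 * (noise @ noise)
u_bar, lam = denoise_wavefield(Dz @ A, Dz @ q, P, d, eps)
```

Starting from $\lambda = 0$ the iteration approaches the root from one side, because the squared residual is a decreasing convex function of $\lambda$ in this setting, so no line search is used in the sketch.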

EDITED REFERENCES
REFERENCES
Farquharson, C. G., and D. W. Oldenburg, 1998, Non-linear inversion using general measures of data
misfit and model structure: Geophysical Journal International, 134, 213–227,
http://dx.doi.org/10.1046/j.1365-246x.1998.00555.x.
More, J. J., 1993, Generalizations of the trust region problem: Optimization Methods and Software, 2,
189–209, http://dx.doi.org/10.1080/10556789308805542.
Tarantola, A., and B. Valette, 1982, Generalized nonlinear inverse problems solved using the least
squares criterion: Reviews of Geophysics, 20, 219–232,
http://dx.doi.org/10.1029/RG020i002p00219.
van Leeuwen, T., and F. J. Herrmann, 2013a, Mitigating local minima in full-waveform inversion by
expanding the search space: Geophysical Journal International, 195, 661–667.
van Leeuwen, T., and F. J. Herrmann, 2013b, A penalty method for PDE-constrained optimization: U. S.
Patent 20160070023 A1.
van Leeuwen, T., and F. J. Herrmann, 2015, A penalty method for PDE-constrained optimization in
inverse problems: Inverse Problems, 32, 015007, https://doi.org/10.1088/0266-5611/32/1/015007.
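Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Geophysics, 74, no. 6, WCC1–WCC26, http://dx.doi.org/10.1190/1.3238367.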

Global 3D acoustic FWI using sparse model parameterization
Debanjan Datta* and Mrinal K. Sen, University of Texas at Austin; Scott Morton and Faqi Liu, Hess Corporation
SUMMARY

Estimating a starting velocity model for Full Waveform Inversion can be challenging. It requires several passes of migration velocity analysis to obtain a model accurate enough to prevent cycle skipping. We present an alternative approach to estimate a 3D starting model using a global optimization method called Very Fast Simulated Annealing. To constrain the optimization problem with a large number of unknowns, we parameterize the 3D model with surfaces and velocities surrounding them and solve for the optimal parameters. The final estimated model from VFSA serves as a starting model for FWI. We demonstrate the effectiveness by comparing FWI results along a few 2D lines from the 3D model. The proposed method is largely automated and reduces the numerous man hours required to build a starting model. We apply our proposed method to one toy model and one complex synthetic model. In both cases, we were able to obtain acceptable results. Use of VFSA with a sparse parameterization method makes our 3D global inversion a practical tool.

INTRODUCTION

Building a starting model for Full Waveform Inversion (Tarantola, 1984; Virieux and Operto, 2009) requires enormous man hours where an initial estimate of the velocity model is developed after multiple iterations of Migration Velocity Analysis (MVA) before arriving at a model that does not suffer from cycle skipping in the seismic bandwidth. This problem is further exacerbated in 3D where each migration pass is several orders of magnitude more expensive than that of 2D.

Several approaches have been proposed over the years to alleviate the problem of cycle skipping. The multiscale approach (Bunks et al., 1995; Sirgue and Pratt, 2004) steps over multiple frequencies to go from low to high frequencies. An alternative domain of implementation in the Laplace Fourier domain was presented by Kim et al. (2013), which focuses on different scales using damping values for the Laplace domain. Almomin et al. (2012) presented a composite objective function containing updates to both migration as well as tomographic components and thereby being less dependent on the starting model. Methods based on signal processing include adaptive filters proposed by Warner and Guasch (2014), Dynamic Image warping proposed by Ma and Hale (2013) and the auxiliary Bump Functional by Bharadwaj et al. (2016). Another proposed method is a hybrid optimization approach (Datta and Sen, 2016; Datta et al., 2016) where the starting model for FWI is estimated by using a combination of sparse parameterization in a global optimization method. Global methods are not strongly dependent on the choice of the starting model and because of sparse parameterization they converge in finite iterations. Once a starting model is estimated, it is used in the conventional FWI.

In this paper, we propose an extension of the velocity interface method to three dimensions. Following Datta and Sen (2016), we employ a sparse parameterization technique where we represent a velocity model using a set of interfaces and velocities across the interfaces. The idea behind our approach is to find an optimal set of interfaces and velocities in which the corresponding model has the minimum misfit using a global optimization technique. Once the model is estimated, we test it for correctness using the conventional FWI over a few 2D slices of the model.

THEORY

Consider a 3D velocity model defined by $v(x, y, z)$. We define a set of interfaces in 3D as $z_i(x, y)$, where $z_i$ is the depth of the interface at $(x, y)$. Each interface is in turn parameterized by a set of isodepth contours defined as point sets. The collection of point sets is then interpolated to obtain the interface defined in $(x, y)$. Two velocities $v_{up}$ and $v_{dn}$ are defined at the top and bottom of each interface. When the set of interfaces and the velocities across them are decided, the complete velocity model is built by linearly interpolating the velocities across them. This sparse parameterization allows us to represent a 3D model using a small, finite number of parameters. To compute the seismic response of the models, we use the acoustic wave equation in 3D given by

$$\frac{1}{c(x, y, z)^2} \frac{\partial^2 P}{\partial t^2} = \nabla^2 P + s(x, y, z, t), \tag{1}$$

where $P$ is the pressure wavefield, $\nabla^2$ is the Laplacian given by $\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial z^2} + \frac{\partial^2}{\partial y^2}$, $c(x, y, z)$ is the velocity field and $s(x, y, z, t)$ is the source term. We sample the pressure wavefield at the receiver locations to obtain the desired seismograms. The seismograms are then compared with the recorded seismograms using a cross-correlation objective function.

We find an optimal set of velocities and interfaces by minimizing the misfit between the data from the true 3D model and data from random models. We use Very Fast Simulated Annealing (VFSA) (Ingber and Rosen, 1992) for this purpose. A starting model is updated iteratively using a control parameter called temperature and, unlike greedy algorithms, the method is able to accept worse solutions. A detailed overview of VFSA can be found in Sen and Stoffa (2013).

Using the VFSA derived model along several 2D slices, we carry out FWI. We use the same equation as in Equation 1. To compute the gradient we back propagate the data residuals using the adjoint state method (Plessix, 2006) given by

$$\frac{1}{c(x, z)^2} \frac{\partial^2 R}{\partial t^2} = \nabla^2 R + \Delta d(x, t), \tag{2}$$

where $R$ is the adjoint wavefield, and $\Delta d(x, t)$ is the data residual.
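The temperature-controlled search described above, which can accept worse solutions, can be illustrated with a generic VFSA loop. This is a minimal sketch assuming the standard perturbation and cooling schedule of Ingber's very fast simulated annealing; the function `vfsa`, its parameters, the clipping to the search bounds and the quadratic stand-in misfit are all assumptions made for illustration, not the authors' code.

```python
import numpy as np

def vfsa(misfit, lower, upper, m0, n_iter=200, t0=1.0, decay=1.0, seed=0):
    """Generic VFSA loop over a bounded parameter vector (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    m = np.clip(np.asarray(m0, dtype=float), lower, upper)
    e = misfit(m)
    best_m, best_e = m.copy(), e
    n = m.size
    for k in range(n_iter):
        # Cooling schedule T_k = T0 * exp(-decay * k^(1/n))
        temp = t0 * np.exp(-decay * k ** (1.0 / n))
        # Temperature-dependent perturbation in [-1, 1]
        u = rng.random(n)
        y = np.sign(u - 0.5) * temp * ((1.0 + 1.0 / temp) ** np.abs(2.0 * u - 1.0) - 1.0)
        # Simplification: clip to the search space instead of redrawing
        trial = np.clip(m + y * (upper - lower), lower, upper)
        e_trial = misfit(trial)
        # Metropolis rule: always accept improvements, sometimes accept worse
        if e_trial <= e or rng.random() < np.exp(-(e_trial - e) / temp):
            m, e = trial, e_trial
            if e < best_e:
                best_m, best_e = m.copy(), e
    return best_m, best_e

# Stand-in misfit: distance to a hypothetical target (velocity, depth) pair
target = np.array([2000.0, 100.0])
misfit = lambda m: float(np.sum(((m - target) / target) ** 2))
lower, upper = [1750.0, 75.0], [2250.0, 125.0]
best_m, best_e = vfsa(misfit, lower, upper, m0=[1750.0, 75.0])
```

In the paper's setting each misfit evaluation would involve a 3D forward simulation for the model built from the interface parameters, which is why a sparse parameterization matters for cost.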

3D Hybrid FWI
The gradient is computed by crosscorrelating the forward wavefield with the adjoint one as

$$\frac{\partial E}{\partial m} = \sum_{\text{shots}} \frac{1}{c^3(x, z)} \frac{\partial^2 P}{\partial t^2} R. \tag{3}$$

The gradient is now used to update the model using an L-BFGS optimizer (Zhu et al., 1997) to obtain the final model update.

RESULTS

Validation test on toy model

First we demonstrate a proof of concept using a flat toy model with 4 layers and 3 interfaces shown in Figure 1. Because it is a model with flat interfaces, all shot gathers will give the same response. So we ran the inversion with just one shot in the center of the model. We ran the VFSA algorithm for 100 iterations to obtain an optimal set of interfaces and velocities. The true values of the depth of interfaces and the velocities, their search space and inverted values are shown in Table 1. We observe that even with a wide search space we are able to obtain a model that is very close to the true model.

Toy model inversion

         vup            vdn            Depth
Int 1    1500 [1500]    2004 [2000]    50 [50]
         (1500,1500)    (1750,2250)    (50,50)
Int 2    2009 [2000]    2508 [2500]    98 [100]
         (1700,2300)    (2200,2800)    (75,125)
Int 3    2520 [2500]    3021 [3000]    157 [150]
         (2200,2800)    (2800,3500)    (120,180)

Table 1: Table showing VFSA results for a 4-layered toy model with 3 interfaces. The true values are shown in square brackets and the search space is given in parentheses.

SEG EAGE Overthrust Model

The second model we used was a modified EAGE-SEG 3D Overthrust model (Aminzadeh et al., 1997) shown in Figure 2(a). We selected half the grid points from the strike section, making the model size (801, 401, 207) in the x, y and z directions respectively, each with a grid spacing of 15 m. We generated synthetic data using a Ricker wavelet with a central frequency of 4 Hz. The maximum offsets were 9 km and 4.5 km in the dip and strike directions respectively. To obtain an initial guess for the interfaces, we heavily smoothed the model and extracted a few surfaces with the same velocities. For a real dataset, this can also be done after semblance based velocity analysis. We selected 5 interfaces from the smoothed model at velocity values between 2200 m/s and 6000 m/s. From each interface a small set of points was identified. The search space of each isodepth was set to 500 m above and below the initial isodepth. We ran VFSA for 100 iterations; the final model is shown in Figure 2(b). The error vs iteration plot is shown in Figure 3. The curve shows that the error oscillates in the initial iterations while in the later iterations it searches for a better solution in the vicinity of the current solution. To demonstrate the quality of inversion, we show the shot gathers from the true model vs the VFSA model in Figure 4. The gathers show that they do not suffer from cycle skipping and therefore the derived model can potentially be a good starting model for standard FWI.

We took 2 slices from our 3D VFSA model from the (x, z) plane at y = 3 km and 5.7 km and performed a 2D FWI on both slices. The inversion was done at frequencies of 3, 4, 5 and 6 Hz for 30 iterations per frequency. The slices from the VFSA model are shown in Figures 5(b) and 6(b) while the final models are shown in Figures 5(c) and 6(c). The final model shows close resemblance to the true model shown in Figures 5(a) and 6(a), thereby demonstrating the effectiveness of the VFSA approach.

CONCLUSIONS

We present a novel FWI method using VFSA to estimate starting models in 3D. The method is largely automated and has the potential to automate velocity model building, skipping a few iterations of MVA. The algorithm is reliably able to estimate the correct set of interfaces and velocities whose corresponding model gives the least misfit with the real data. The estimated models do not suffer from cycle skipping and are able to recover the true model after conventional FWI. The cost of this approach is comparable to standard FWI as it avoids extraneous forward modeling operations to compute gradient and step lengths.

ACKNOWLEDGMENTS

The authors are thankful to Hess Corporation for providing computational facilities and partial financial support.

Figure 1: The 4 layer TOY model
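To make the interface parameterization of the flat toy model concrete, the sketch below rebuilds a 1D velocity profile from the three quantities listed per interface in Table 1. It is an illustration only: `velocity_profile` is a hypothetical helper, the velocity above the first interface is assumed constant at its `v_up`, and the values are used in whatever units Table 1 uses.

```python
import numpy as np

def velocity_profile(z, depths, v_up, v_dn):
    """1D velocity from flat interfaces (hypothetical helper).

    depths : interface depths, increasing
    v_up   : velocity just above each interface
    v_dn   : velocity just below each interface
    Between two interfaces the velocity is interpolated linearly from
    v_dn of the upper interface to v_up of the lower one.
    """
    z = np.asarray(z, dtype=float)
    depths = np.asarray(depths, dtype=float)
    v = np.empty_like(z)
    v[z < depths[0]] = v_up[0]          # assumption: constant above interface 1
    v[z >= depths[-1]] = v_dn[-1]       # constant below the last interface
    for i in range(len(depths) - 1):
        mask = (z >= depths[i]) & (z < depths[i + 1])
        frac = (z[mask] - depths[i]) / (depths[i + 1] - depths[i])
        v[mask] = v_dn[i] + frac * (v_up[i + 1] - v_dn[i])
    return v

z = np.array([25.0, 75.0, 125.0, 200.0])
# True values from Table 1 (square brackets)
v_true = velocity_profile(z, [50, 100, 150], [1500, 2000, 2500], [2000, 2500, 3000])
# Inverted values from Table 1
v_inv = velocity_profile(z, [50, 98, 157], [1500, 2009, 2520], [2004, 2508, 3021])
print(v_true, v_inv)
```

With the true values every layer is constant, since `v_dn` of one interface equals `v_up` of the next; with the inverted values each layer carries a small linear gradient.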
Figure 2: (a) True 2D SEG-EAGE Overthrust Model (b) Inverted model after VFSA

Figure 3: Error vs iterations for the SEG EAGE Overthrust Model

Figure 4: (a) Modelled seismogram from the true SEG-EAGE overthrust model and (b) modelled seismogram from the VFSA result
Figure 5: (a) The True 2D slice from the SEG-EAGE Overthrust Model (b) Starting slice from VFSA and (c) Inverted 2D model from (b)

Figure 6: (a) The True 2D slice from the SEG-EAGE Overthrust Model (b) Starting slice from VFSA and (c) Inverted 2D model from (b)

EDITED REFERENCES
REFERENCES
Almomin, A., and B. Biondi, 2012, Tomographic full waveform inversion: Practical and computationally feasible
approach: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–5.
Aminzadeh, F., B. Jean, and T. Kunz, 1997, 3D salt and overthrust models: Society of Exploration Geophysicists.
Bharadwaj, P., W. Mulder, and G. Drijkoningen, 2016, Full waveform inversion with an auxiliary bump functional:
Geophysical Journal International, 206, 1076–1092, http://doi.org/10.1093/gji/ggw129.
Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion: Geophysics, 60,
1457–1473, http://doi.org/10.1190/1.1443880.
Datta, D., M. Sen, F. Liu, and S. Morton, 2016, Salt model building by shape-based parameterization and global
FWI: 86th Annual International Meeting, SEG, Expanded Abstracts, 1069–1073.
Datta, D., and M. K. Sen, 2016, Estimating a starting model for full-waveform inversion using a global optimization
method: Geophysics, 81, R211–R223, http://doi.org/10.1190/geo2015-0339.1.
Ingber, L., and B. Rosen, 1992, Genetic algorithms and very fast simulated reannealing: A comparison:
Mathematical and Computer Modelling, 16, 87–100, http://doi.org/10.1016/0895-7177(92)90108-W.
Kim, Y., C. Shin, H. Calandra, and D. J. Min, 2013, An algorithm for 3D acoustic time-Laplace-Fourier-domain
hybrid full waveform inversion: Geophysics, 78, R151–R166, http://doi.org/10.1190/geo2012-0155.1.
Ma, Y., and D. Hale, 2013, Wave-equation reflection traveltime inversion with dynamic warping and full waveform
inversion: Geophysics, 78, R223–R233, http://doi.org/10.1190/geo2013-0004.1.
Plessix, R.E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
geophysical applications: Geophysical Journal International, 167, 495–503, http://doi.org/10.1111/j.1365-
246X.2006.02978.x.
Sen, M. K., and P. L. Stoffa, 2013, Global optimization methods in geophysical inversion: Cambridge University
Press, https://doi.org/10.1017/cbo9780511997570.
Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for selecting temporal
frequencies: Geophysics, 69, 231–248, http://doi.org/10.1190/1.1649391.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259–1266,
http://doi.org/10.1190/1.1441754.
Virieux, J., and S. Operto, 2009, An overview of full waveform inversion in exploration geophysics: Geophysics,
74, WCC1–WCC26, http://doi.org/10.1190/1.3238367.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion: Theory: 84th Annual International Meeting, SEG,
Expanded Abstracts, 1089–1093, http://doi.org/10.1190/geo2015-0387.1.
Zhu, C., R. H. Byrd, P. Lu, and J. Nocedal, 1997, Algorithm 778: Lbfgsb: Fortran subroutines for large-scale bound
constrained optimization: ACM Transactions on Mathematical Software (TOMS), 23, 550–560,
http://doi.org/10.1145/279232.279236.

Automated time-window selection based on machine learning for full-waveform inversion
Yangkang Chen1 , Judith C. Hill1 , Wenjie Lei2 , Matthieu Lefebvre2 ,
Ebru Bozdağ3 , Dimitri Komatitsch4 , and Jeroen Tromp2
1 Oak Ridge National Laboratory, 2 Princeton University, 3 Colorado School of Mines, 4 LMA, CNRS UPR 7051
SUMMARY

Due to increased computational capabilities afforded by modern and future computer architectures, the seismology community is demanding a more comprehensive understanding of full waveform information from recorded seismic data. Full waveform inversion seeks to match observed seismic data with synthesized seismograms by iteratively updating subsurface model parameters. Synthetic data are generated by solving the seismic wave equation using an effective and efficient numerical algorithm. In order to ensure inversion accuracy and stability, both synthesized and observed seismograms must be carefully pre-processed. More specifically, when synthetic and observed data have a large waveform mismatch during the initial iterations, waveforms should be carefully selected for calculating the misfit gradient in order to avoid instability. We introduce a fully automated algorithm based on machine learning (ML) to intelligently select time windows for calculating the misfit between observed and synthetic seismograms. The training dataset can be prepared using time windows obtained based on the FLEXWIN method, in which selection parameters are finely tuned. Results show that automatically selected time windows are of sufficiently high quality compared with the benchmark FLEXWIN method.

INTRODUCTION

Emerging 3D-3D tomographic methods, i.e., seismic tomography based on a 3D reference model and 3D numerical simulations of seismic wavefields, take advantage of full wavefield simulations and finite-frequency kernels, thereby reducing data restrictions required when using approximate forward modeling and simplified descriptions of sensitivity (Hung et al., 2000; Dahlen et al., 2000; Maggi et al., 2009). In exploration seismology, tomographic methods using full waveform forward modeling and adjoint state based inversion techniques are commonly referred to as “full-waveform inversion” (FWI). FWI is an iterative process to update the model by minimizing the least-squares misfit between recorded and synthesized data predicted from the current model in the data domain (Tarantola, 1984; Pratt et al., 1998; Pratt, 1999; Symes, 2008; Virieux and Operto, 2009; Morgan et al., 2013; Warner et al., 2013; Xue et al., 2016). Because of rapid developments in waveform-based inversion methods and strategies and larger and larger data volumes, more efforts have been directed into investigations of automated picking of seismic phases for misfit calculation. vanDecar and Crosson (1990) proposed a partially automated multi-channel cross-correlation method to determine teleseismic relative phase arrival times. This approach was extended to efficient methods for obtaining highly accurate traveltime (Sigloch and Nolet, 2006; Houser et al., 2008; Lee and Chen, 2013) and even attenuation measurements (Laurence and Shearer, 2006).

Maggi et al. (2009) introduced an automated time-window selection algorithm called FLEXWIN to arbitrarily select time windows in entire seismic traces where observed seismograms and synthetic waveforms are sufficiently close. Maggi et al. (2009) designed a sophisticated five-stage workflow to construct a robust filter that passes high-quality data components. In this abstract, we present an intelligent algorithm to select time windows from observed and synthetic seismograms based on machine learning. The window selection problem can be formulated as a classification problem, i.e., for each candidate window the decision is to either select or reject. A neural network can be trained using available time windows that are selected by the FLEXWIN method (Maggi et al., 2009), and is subsequently applied to a large independent dataset.

THEORY

Adjoint state method

The target of full waveform inversion is to minimize an objective function that measures the misfit between observed and synthetic seismic data (Tromp et al., 2005; Chen et al., 2016), e.g., the least-squares misfit

$$\chi(m) = \frac{1}{2} \sum_{r=1}^{N} \int_0^T \| d(x_r, t) - s(x_r, t, m) \|_2^2 \, dt, \tag{1}$$

where $d(x_r, t)$ denotes three-component seismic data recorded at station $x_r$, $N$ the number of stations, and $m$ a given earth model. It is worth noting that any misfit function can be used in adjoint inversions. In fact, the selected windows, by FLEXWIN or any other algorithm, are also related to the chosen misfit function.

In isotropic elastic media, the gradient of the misfit function (1) can be formulated as

$$\delta\chi = \int_V (K_\kappa\, \delta \ln \kappa + K_\mu\, \delta \ln \mu + K_\rho\, \delta \ln \rho) \, d^3x, \tag{2}$$

where $\kappa$, $\mu$, and $\rho$ denote the bulk modulus, shear modulus, and density, respectively. The sensitivity kernels $K_\kappa$, $K_\mu$, and $K_\rho$ are the Fréchet derivatives with respect to the bulk modulus, shear modulus, and density, respectively. Specifically

$$K_\kappa(x) = -\int_0^T \kappa(x)\, [\nabla \cdot s(x, t)][\nabla \cdot s^\dagger(x, T - t)] \, dt, \tag{3}$$

$$K_\mu(x) = -\int_0^T 2\mu(x)\, D(x, t) : D^\dagger(x, T - t) \, dt, \tag{4}$$

$$K_\rho(x) = \int_0^T \rho(x)\, \partial_t s(x, t) \cdot \partial_t s^\dagger(x, T - t) \, dt, \tag{5}$$


where $s^\dagger$ denotes the adjoint wavefield, and $D = \frac{1}{2}[\nabla s + (\nabla s)^T] - \frac{1}{3}(\nabla \cdot s) I$ and $D^\dagger$ denote the traceless strain deviator and its adjoint (Tromp et al., 2005).

Time-window selection as a classification problem

The aim of pattern recognition is the classification of objects into a finite number of categories. In a pattern recognition system an object and a set of categories are given as input and the system decides to which category the object belongs. In general, it works in two stages. In the first stage, feature extraction (also known as the pre-processing or parameterization stage), a set of measures are extracted from the input object. In the second stage, classification, the object is associated with one of the categories based on these features.

all vectors in equations (6)–(9) are row vectors. The softmax function is defined as

$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}. \tag{10}$$

The neural network learning process is equivalent to the following minimization problem:

$$\min_{W_1, W_2, b_1, b_2} L(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k, \tag{11}$$

where $L(y, \hat{y})$ is called the loss function, or more specifically the categorical cross-entropy loss (also known as the negative log likelihood), and $K$ is the number of classes.
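Equations (10) and (11) can be checked numerically with a few lines of NumPy. This is a minimal sketch for a single example with $K = 2$ classes (select or reject); the function names are illustrative, and subtracting the maximum before exponentiating is a standard numerical safeguard that does not change the result of (10).

```python
import numpy as np

def softmax(z):
    """Softmax of equation (10) for a 1D score vector."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())      # shift by the maximum for numerical stability
    return e / e.sum()

def cross_entropy(y, y_hat):
    """Categorical cross-entropy of equation (11) for one example."""
    return float(-np.sum(np.asarray(y) * np.log(y_hat)))

y = np.array([1.0, 0.0])                   # one-hot label: window is selected
undecided = softmax([0.0, 0.0])            # equal scores for both classes
confident = softmax([4.0, 0.0])            # strong preference for "select"
print(cross_entropy(y, undecided), cross_entropy(y, confident))
```

The loss for the undecided prediction equals $\log 2$, and it shrinks as the predicted probability of the correct class approaches 1.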
The main steps of machine learning based window selection are learning and predicting. In the learning process, the three stages are listed as follows:

1. Gathering as many windows as possible, not only containing usable windows, but also unusable windows. If only usable windows are considered, the machine will not infer the criteria of defining an unusable window.
2. Collecting the values of the five measurements (features) of all input windows: cross-correlation value between synthetic and observed seismograms, cross-correlation time lag between synthetic and observed seismograms, amplitude ratio between synthetic and observed seismograms, window length, and minimum short-term-average/long-term-average (STA/LTA) of envelopes of synthetic seismogram, as the input variables (which will be introduced in detail later).
3. Using a typical pattern recognition neural network model for setting up the model (neural network). The introduction of the neural network model will be discussed below.

Equation (11) sums over our training examples and adds to the loss if we predicted the incorrect class. The further away the two probability distributions $y$ (the correct labels) and $\hat{y}$ (our predictions) are, the greater our loss will be. By finding parameters that minimize the loss we maximize the correctness likelihood of our trained network.

We use the gradient descent method to find the minimum. The gradient descent method needs the gradient of the loss function with respect to the parameters: $\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$, $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$. To calculate these gradients, we use the backpropagation algorithm, which conveniently calculates the gradients starting from the output.

Initial windows selection

There are typically two ways to create initial windows. One is by collecting all the windows along each trace with a constant window length, and then using all these initial windows in the learned NN. The advantage of this strategy is that we will not miss any possible window along the entire trace. The disadvantage of this strategy is that the performance highly depends on the search window length, which still requires human input
When the neural network is trained, the next step is to use it for and experience, and it is also sensitive to the signal-to-noise
predicting the selection mode (usable or unusable) of each input ratio of an input trace or input window. Thus, this strategy
window from the measurements (features) of each individual makes the resulting window selection algorithm not fully au-
window. tomated. Another strategy is to use initial windows picked
from the STA/LTA as in the traditional FLEXWIN algorithm.
Classification via neural network training A STA/LTA measure from the envelope of synthetic data is
In this section, we introduce the mathematical basics of the calculated in order to detect triggers in the traces. Since the
NN related classification framework. Given input data x, the STA/LTA already represents an automated way of detecting
neural network makes predictions using forward propagation. arrivals, we can simply use it to create initial windows for sub-
Take the 3-layer neural network as an example. The algorithm sequent mode prediction. The STA/LTA can also help reject
is as follows: a large part of noisy traces that are not good for misfit calcu-
z1 = x W1 + b1 , (6) lation. A problem arising in STA/LTA based initial window
selection is that it will possibly create some overlap between
a1 = tanh(z1 ) , (7) selected windows. However, this will not be an issue at the
z2 = a1 W2 + b2 , (8) final stage, after window mode prediction, since we can merge
a2 = ŷ = softmax(z2 ) , (9) windows with temporal overlap. In our algorithm, we use the
second strategy.
where zi is the input of layer i and ai the output of layer i
after applying the hyperbolic tangent activation function. W1 , Feature selection
W2 , b1 , and b2 are the unknown parameters we need to solve Feature extraction is basically a transformation stage from data
during the NN training process. W1 and W2 are called weight space into a feature space to extract robust information from
matrices and b1 and b2 are called bias vectors. Note that the waveform in a compressed form. This step is critical for
© 2017 SEG Page 1605

the success of the classification task. Each datum (here, three-component seismograms associated with one event) is represented by a feature vector which is used to train and test machine-learning models. It is important to select features that are informative and predictive of the individual datum properties. Furthermore, the size of the feature set and types of features (e.g., nominal, numeric, etc.) define the size of the learning problem. In other words, the larger the feature space, the more possible combinations of features need to be examined and learned by the machine-learning algorithm (Mousavi et al., 2016). We have selected five features for training the neural network, namely

• Normalized cross-correlation (CC) value between observed and synthetic seismograms,

$$CC = \max[\Gamma(t)] , \qquad \Gamma(t) = \frac{\int \tilde{s}(t')\,\tilde{d}(t'-t)\,dt'}{\left[\int \tilde{s}^2(t')\,dt' \int \tilde{d}^2(t'-t)\,dt'\right]^{1/2}} . \qquad (12)$$

• Cross-correlation time lag,

$$\Delta\tau = \arg\max_t\,[\Gamma(t)] . \qquad (13)$$

• Amplitude ratio between observed and synthetic data,

$$\Delta \ln A = \ln(A_{obs}/A_{syn}) = 0.5 \ln \frac{\int \tilde{d}^2(t)\,dt}{\int \tilde{s}^2(t)\,dt} . \qquad (14)$$

• Window length,

$$w = t_{end} - t_{start} . \qquad (15)$$

• Minimum STA/LTA value,

$$m_{stalta} = \min(\mathrm{STA/LTA}) . \qquad (16)$$

Performance evaluation
In binary classification problems, like the subject of automated window selection in this abstract, the goal is to categorize the outcome of an event into one of two categories, either accepted window (1) or rejected window (0). This process can result in one of four possible outcomes that are defined as follows:

1. True Positive (TP): Evaluated and actual results are 1 (Valid Detection).
2. False Positive (FP): Evaluated result is 1, but actual result is 0 (False Alarm).
3. False Negative (FN): Evaluated result is 0, but actual result is 1 (Missed Detection).
4. True Negative (TN): Evaluated and actual results are 0 (Valid Non-detection).

Let TP, FP, FN, and TN denote the number of instances that fall into the four mentioned categories of classification results. Then the classification accuracy (or success rate) can be represented as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} . \qquad (17)$$

In this abstract, the actual classification value of each window is given by the FLEXWIN algorithm based on finely tuned parameters. We select a benchmark dataset for training the neural network, and then apply the trained network to an independent waveform dataset and evaluate the classification accuracy in order to validate the algorithm. It is worth mentioning that here “accuracy” simply means the level at which the NN classified results are similar to results obtained based on the FLEXWIN method. We can treat this “accuracy” as a quantitative reference when evaluating performance, and we can confirm the validity of window selection performance by visual observation and human experience. However, the biggest challenge for FLEXWIN is to find a common set of parameters for data from different types of earthquakes. Thus we are generally conservative, eliminating bad selections but also eliminating good waveforms when using large datasets. We are currently using FLEXWIN selections to validate the proposed approach, but the ultimate aim is to go beyond FLEXWIN selections and maximize the selection of usable parts of waveforms in seismograms.

EXAMPLES

We first use a synthetic example generated from the Marmousi velocity model (Bourgeois et al., 1991) to demonstrate the performance of the proposed automated window selection method. For this example, we simulated a shot record from the true velocity model shown in Figure 1a as the observed data and then simulated a shot record from the smoothed velocity model shown in Figure 1b as the synthetic data. The shot is located at the surface at position 4,596 m. 767 receivers are evenly distributed along the survey at a depth of 60 m. The windows used for training are extracted from FLEXWIN selection results from 6 seismograms between positions 1,920 m and 1,980 m. We show a comparison between the semi-automated FLEXWIN method and the fully-automated ML method for two stations. The comparison for position 6,840 m is shown in Figure 2, where the blue rectangles denote the selected time windows for misfit calculation. The results from the two methods are exactly the same. Using the ML based method, 2 among 21 initial windows are selected for this data and the accuracy defined in (17) is 100%. Figure 3 shows a comparison for location 7,836 m. 12 among 30 initial windows are selected for this result using the proposed method and the accuracy is 90%. Although the window selection results are slightly different, the merged waveform selection results are exactly the same.

Figure 1: (a) Marmousi velocity model. (b) Smoothed velocity model. Figure annotations mark the stations used for training and the stations used for validation.
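
The forward pass in equations (6)–(10), the loss in equation (11), and the accuracy in equation (17) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this abstract: the layer sizes, the zero-valued (untrained) weights, the feature values, the class ordering, and all function names are hypothetical. The confusion counts in the last lines are the ones implied by the numbers reported for the station at 7,836 m (9 FLEXWIN selections, 12 ML selections, 30 initial windows, 90% accuracy).

```python
import math

def softmax(z):
    """Equation (10); subtracting max(z) avoids overflow without changing the result."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(x, W, b):
    """Row-vector convention of equations (6) and (8): returns x W + b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def forward(x, W1, b1, W2, b2):
    """Equations (6)-(9): tanh hidden layer followed by a softmax output layer."""
    a1 = [math.tanh(v) for v in affine(x, W1, b1)]
    return softmax(affine(a1, W2, b2))

def cross_entropy(y, y_hat):
    """Equation (11) for one training example with one-hot label y."""
    return -sum(yk * math.log(pk) for yk, pk in zip(y, y_hat) if yk > 0)

def accuracy(tp, tn, fp, fn):
    """Equation (17)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical sizes: 5 features, 3 hidden units, 2 classes (rejected, accepted).
n_in, n_hidden, n_out = 5, 3, 2
W1 = [[0.0] * n_hidden for _ in range(n_in)]
b1 = [0.0] * n_hidden
W2 = [[0.0] * n_out for _ in range(n_hidden)]
b2 = [0.0] * n_out

x = [0.9, 0.1, 0.05, 2.0, 1.5]        # made-up feature vector for one window
y_hat = forward(x, W1, b1, W2, b2)    # untrained network: equal class probabilities
loss = cross_entropy([0.0, 1.0], y_hat)

# 9 FLEXWIN picks and 12 ML picks out of 30 windows at 90% agreement
# imply TP = 9, FP = 3, FN = 0, TN = 18.
acc = accuracy(tp=9, tn=18, fp=3, fn=0)
```

In a trained network the weights would come from gradient descent on equation (11); the zero weights here only serve to make the forward pass easy to check by hand.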

Figure 2: Time window selection results for a station located at 6,840 m using (a) FLEXWIN method and (b) the ML method. Note that the results from the two methods are exactly the same. Panels show observed and synthetic seismograms (amplitude versus time in s); both panels are annotated with 21 initial windows and 2 selected windows, and panel (b) with success rate = 1.00000000.

Figure 3: Time window selection results for station located at 7,836 m using (a) FLEXWIN method and (b) the ML method. Note that although window selection results are slightly different, the merged waveform selection results are exactly the same. Panel (a) is annotated with 30 initial windows and 9 selected windows; panel (b) with 30 initial windows, 12 selected windows, and success rate = 0.90000000.

Figure 4: Source and station locations. The map is labeled Event: C201412080854A.

A real data example is from the Mw 6.6 Panama earthquake on December 8, 2014. The epicenter was 20 kilometers (12 miles) south of the Punta de Burica peninsula, on Panama’s Pacific Ocean side, near the Costa Rican border. The source and station locations are shown in Figure 4, where the beachball indicates the source mechanism and location and the green inverted triangles denote the earthquake stations used for demonstration in this abstract. Figures 5a and 5b show the time window selection results for the stations AU.NFK (with a 99.98% accuracy) and AK.GAMB (with a 99.99% accuracy). This example shows that the proposed algorithm can be effective for very complicated seismograms.

Figure 5: Time window selection results for stations (a) AU.NFK and (b) AK.GAMB for the 2014 Mw 6.6 Panama earthquake. Panel (a) is annotated with 83272 initial windows, 6 selected windows, and success rate = 0.99979585; panel (b) with 41144 initial windows, 1 selected window, and success rate = 0.99992709.

CONCLUSIONS

Selecting windows that contain synthetic and observed seismograms which are sufficiently close to each other plays an indispensable role in practical implementation of full waveform inversion since it guarantees convergence of the inversion. While the traditional FLEXWIN algorithm can be “automated” to some extent, it still involves a huge amount of labor that requires human input and prior experience, and thus is not deemed to be fully automated. We have presented a fully automated way of selecting optimal misfit calculation windows to avoid numerical instability during large-scale seismic inversion. A neural network can be trained from a small dataset and then applied to a large number of data automatically. Synthetic experiments for the Marmousi model and a real earthquake data example demonstrate the performance of the proposed machine learning based algorithm. The next step of this project is to verify the reliability and robustness of the proposed method in improving inversion results of full waveform inversion.

ACKNOWLEDGEMENTS

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under contract DE-AC05-00OR22725. The spectral-element software package SPECFEM3D_GLOBE used for simulating the seismograms and the benchmark window selection software package FLEXWIN used in this article are freely available via the Computational Infrastructure for Geodynamics (CIG; geodynamics.org).

EDITED REFERENCES
REFERENCES
Bourgeois, A., M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, 1990, The Marmousi
experience in The Marmousi Experience: Proceedings of the 1990 EAEG workshop: Marmousi,
model and data, 5–16, https://doi.org/10.3997/2214-4609.201411190.
Chen, Y., H. Chen, K. Xiang, and X. Chen, 2016, Geological structure guided well log interpolation for
high-fidelity full waveform inversion: Geophysical Journal International, 207, 1313–1331,
https://doi.org/10.1093/gji/ggw343.
Dahlen, F. A., G. Nolet, and S. H. Hung, 2000, Fréchet kernels for finite frequency traveltime - I. theory:
Geophysical Journal International, 141, 157–174, https://doi.org/10.1046/j.1365-
246x.2000.00070.x.
Houser, C., G. Masters, and G. Laske, 2008, Shear and compressional velocity models of the mantle from
cluster analysis of long-period waveforms: Geophysical Journal International, 174, 195–212,
Hung, S. H., F. A. Dahlen, and G. Nolet, 2000, Fréchet kernels for finite frequency traveltime — II.
examples: Geophysical Journal International, 141, 175–203, https://doi.org/10.1046/j.1365-
246x.2000.00072.x.
Laurence, J. F., and P. M. Shearer, 2006, Imaging mantle transition zone thickness with SdS-SS finite-
frequency sensitivity kernels: Geophysical Journal International, 174, 143–158,
https://doi.org/10.1111/j.1365-246x.2007.03673.x.
Lee, E.-J., and P. Chen, 2013, Automating seismic waveform analysis for full 3-D waveform inversions:
Geophysical Journal International, 194, 572–589, https://doi.org/10.1093/gji/ggt124.
Maggi, A., C. Tape, M. Chen, D. Chao, and J. Tromp, 2009, An automated time-window selection
algorithm for seismic tomography: Geophysical Journal International, 178, 257–281,
https://doi.org/10.1111/j.1365-246X.2009.04099.x.
Morgan, J., M. Warner, R. Bell, J. Ashley, D. Barnes, R. Little, K. Roele, and C. Jones, 2013, Next-
generation seismic experiments: Wide-angle, multi-azimuth, three-dimensional, full-waveform
inversion: Geophysical Journal International, 195, 1657–1678,
https://doi.org/10.1093/gji/ggv513.
Mousavi, S. M., S. P. Horton, C. A. Langston, and B. Samei, 2016, Seismic features and automatic
discrimination of deep and shallow induced-microearthquakes using neural network and logistic
regression: Geophysical Journal International, 207, 29–46, https://doi.org/10.1093/gji/ggw258.
Pratt, G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a
physical scale model: Geophysics, 64, 888–901, https://doi.org/10.1190/1.1444597.
Pratt, G., C. Shin, and G. Hick, 1998, Gauss-Newton and full Newton methods in frequency-space
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Sigloch, K., and G. Nolet, 2006, Measuring finite-frequency body-wave amplitudes and traveltimes:
Geophysical Journal International, 167, 271–287, https://doi.org/10.1111/j.1365-
246X.2006.03116.x.
Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56,
765–790, https://doi.org/10.1111/j.1365-2478.2008.00698.x.
1259–1266, https://doi.org/10.1190/1.1441754.

Tromp, J., C. Tape, and Q. Liu, 2005, Seismic tomography, adjoint methods, time reversal and banana-
doughnut kernels: Geophysical Journal International, 160, 195–216,
https://doi.org/10.1190/1.1441754.
van Decar, J. C., and R. S. Crosson, 1990, Determination of teleseismic relative phase arrival times using
multi-channel cross-correlation and least squares: Bulletin of the Seismological Society of America,
80, 150–169.
Geophysics, 74, WCC1–WCC26, https://doi.org/10.1190/1.3238367.
Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V. Vinje, I. Stekl, L. Guasch, C.
Win, G. Conroy, and A. Bertrand, 2013, Anisotropic 3D full-waveform inversion: Geophysics,
79, R59–R80, https://doi.org/10.1190/geo2012-0338.1.
Xue, Z., N. Alger, and S. Fomel, 2016, Full-waveform inversion using smoothing kernels: 86th Annual
International Meeting, SEG, Expanded Abstracts 1358–1363, https://doi.org/10.1190/segam2016-
13948739.1.

Waveform inversion using the shifted Laplace transform
Jungmin Kwon, Hyeonjun Song, and Changsoo Shin, Department of Energy Systems Engineering, Seoul
National University; U Geun Jang, Korea Polar Research Institute; Hyunggu Jun, Korea Institute of Ocean
Science and Technology; Hyunseok Whang*, Computational Science and Technology, Seoul National
University
Summary:

Laplace domain FWI has the advantage that it is insensitive to the bandwidth of the source wavelet. However, it suffers from an inaccuracy problem caused by unwanted cross-correlation terms between the residuals of first arrival traveltimes and apparent amplitudes. In this paper, using a new form of objective function (based on the shifted Laplace domain wavefield), we obtained improved results by suppressing these unwanted cross-correlation terms, which are a crucial source of inaccuracy in the inversion result. To verify our theory, we applied it to the BP 2004 benchmark model and obtained an improved result compared to conventional Laplace domain FWI.

Introduction:

Full-waveform inversion (FWI) is a velocity model building process that minimizes differences between the modeled and observed data using an appropriate objective function (Tarantola 1984; Tarantola 1986). Widely used FWI algorithms have difficulty building velocity structures because of the nonlinear nature of inverse problems.

To overcome these problems, many studies have focused on designing inversion algorithms. One of the algorithms, ray-based refraction traveltime tomography, has been studied by many geophysicists (Hampson & Russell 1984; Schneider & Kuo 1985; White 1989; Zhu & McMechan 1989; Docherty 1992; Qin et al. 1993; Cai & Qin 1994; Stefani 1995; Shtivelman 1996; Zhang & Toksöz 1998). Other algorithms have also been studied. In particular, Shin and Cha (2008) suggested Laplace domain FWI as an effective algorithm to generate initial models. Despite the absence of low-frequency components in the seismic data, Laplace domain FWI can generate high-quality initial models including long-wavelength components, because the Laplace domain FWI algorithm is less sensitive to the frequency spectrum of the source wavelet (Ha & Shin 2012).

The results of conventional Laplace domain FWI are often negatively influenced by the gradient distortion effect, which is caused by undesirable cross-correlation terms. This gradient distortion can create an inaccurate, low-resolution estimated velocity model in some cases, such as low-velocity zones below salt. This phenomenon prevents the frequency domain FWI algorithm from reaching the global minimum when the initial model is far from it.

Thus, in this paper, we suggest a new waveform inversion algorithm that improves the resolution of Laplace domain FWI results. In the new algorithm, the shifted Laplace transform, a modified Laplace transform that uses a time-shifted decay function, is applied to observed and modeled seismic data. By shifting the initial time of the decay function to the first arrival traveltime, the information for the first arrival traveltime of both the observed and modeled data can be separated within the logarithmic objective function. We also verify the improved result of shifted Laplace domain FWI using the BP 2004 benchmark model.

Review of Laplace domain full waveform inversion:

Laplace domain FWI is an inversion algorithm that uses Laplace-transformed wavefields. In this algorithm, each time domain seismic trace $u(t)$ is transformed as follows:

$$\tilde{u}(s) = \int_0^{T_{\max}} u(t)\, e^{-st}\, dt , \qquad (1)$$

where $T_{\max}$ is the maximum recording time of the trace and $s$ is a Laplace constant. Laplace domain FWI results are less sensitive to the spectral variation of the source wavelet with respect to frequency, as Ha et al. (2012) explained, so we can express the modeled wavefield $u(t)$ using weighting coefficients $A_i$ and traveltimes $\tau_i$ as follows:

$$u(t) = \sum_{i=1}^{N} A_i\, \delta(t - \tau_i) , \qquad (2)$$

where $N$ represents the number of seismic events in the modeled trace. Likewise, the observed wavefield $d(t)$ in the time domain can be expressed with its own weighting coefficients $A'_j$, traveltimes $\tau'_j$ and number of events $N'$ as follows:

$$d(t) = \sum_{j=1}^{N'} A'_j\, \delta(t - \tau'_j) . \qquad (3)$$

By applying the Laplace transformation to eq.2 and eq.3, we can obtain the Laplace domain wavefields as follows:

$$\tilde{u}(s) = \sum_{i=1}^{N} A_i\, e^{-s\tau_i} , \qquad (4)$$

$$\tilde{d}(s) = \sum_{j=1}^{N'} A'_j\, e^{-s\tau'_j} . \qquad (5)$$
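
The step from the time domain expressions to eq.4 and eq.5 can be sketched as follows. The symbols are illustrative assumptions rather than notation confirmed by the source: a modeled trace made of $N$ spike-like events with weighting coefficients $A_i$ and traveltimes $\tau_i$ inside the recording window of length $T_{\max}$, and a Laplace constant $s$.

```latex
\tilde{u}(s)
  = \int_0^{T_{\max}} \Big[ \sum_{i=1}^{N} A_i\,\delta(t-\tau_i) \Big] e^{-st}\,dt
  = \sum_{i=1}^{N} A_i \int_0^{T_{\max}} \delta(t-\tau_i)\,e^{-st}\,dt
  = \sum_{i=1}^{N} A_i\,e^{-s\tau_i},
  \qquad 0 < \tau_i < T_{\max}.
```

Under these assumptions each event is weighted by $e^{-s\tau_i}$, so events with longer traveltimes contribute exponentially less to the Laplace domain wavefield. The observed trace follows in the same way.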

Waveform inversion in the shifted Laplace domain
From eq.4 and eq.5, we can see that the amplitude of the Laplace domain wavefield is damped at long offsets, so long-offset data are neglected in the least-squares objective function. Thus, Min & Shin (2006) suggested a logarithmic least-squares objective function for the Laplace domain wavefield, which can be written as follows:
̃( ) Figure 1: Initial model with seawater velocity for BP

( ) ∑ ̃( ) ∑ ( ̃ ( )) (6)
velocity FWI processing.
also, eq.6 can be written
̃ ( )
( ) ∑ ( ( ) (̃ )) (7)
( )
where,
̃ ( ) ∑ ( )
(8)
̃ ( ) ∑ ( )
(9) (a)
̃ ( ) and ̃ ( ) include information on the
amplitude of the first arrivals and later seismic events. We
call this the “apparent amplitude”. The information that is
provided by later events (a second term of eq.7) in the
apparent amplitude is influenced by parameters of deeper
areas, so deeper areas can be reconstructed by using small
Laplace constants (Bae et al. 2012; Ha et al. 2012). Large
damping constants strongly emphasize early arrivals and
update shallow portions of the velocity model, while small
damping constants can be used to obtain information from
later arrivals.
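
The trade-off between large and small damping constants can be illustrated numerically. The sketch below assumes, purely for illustration, a trace made of a few spike-like events with positive amplitudes, so that its Laplace domain value is a sum of terms of the form amplitude times exp(-s * traveltime). The function names and the three-event trace are hypothetical.

```python
import math

def laplace_value(amplitudes, traveltimes, s):
    """Laplace-domain value of a spike-train trace: sum of A_i * exp(-s * tau_i)."""
    return sum(a * math.exp(-s * tau) for a, tau in zip(amplitudes, traveltimes))

def first_arrival_share(amplitudes, traveltimes, s):
    """Fraction of the Laplace-domain value contributed by the earliest event.

    Assumes positive amplitudes, so the fraction lies between 0 and 1.
    """
    i0 = min(range(len(traveltimes)), key=lambda i: traveltimes[i])
    first = amplitudes[i0] * math.exp(-s * traveltimes[i0])
    return first / laplace_value(amplitudes, traveltimes, s)

# Made-up trace: three events of equal amplitude at 1.0 s, 1.5 s and 2.5 s.
amps = [1.0, 1.0, 1.0]
times = [1.0, 1.5, 2.5]

# Share of the first arrival for a small, a moderate and a large Laplace constant.
shares = {s: first_arrival_share(amps, times, s) for s in (1.0, 5.0, 10.0)}
for s, share in shares.items():
    print(f"s = {s:4.1f}  first-arrival share = {share:.4f}")
```

With these made-up values the first arrival accounts for a little over half of the Laplace domain value at s = 1 and for more than 99% at s = 10, which is the sense in which large constants emphasize early arrivals while small constants retain information from later ones.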
In case of waveform inversion with a small Laplace

constant, each term of the partial derivative wavefield
should be cross-correlated only with its adjoint source as
follows:
( )
(c)
̃ ( )
∑ [ [ ( ) (̃ )] [ (
( )
̃ ( )
) (̃ )]] (10)
( )
where,
∑ * ( )+ (11)
(̃ ( )) ̃ ( )
∑ [ (̃ )] (12) (d)
( )
̃ ( ) Figure 2: Updates of the four correlation terms (a) , (b)
∑ [ (̃ )] (13) , (c) and (d) , which were generated from the
( )
and, initial model (Figure 1).
(̃ ( ))
∑ [ ( )] (14)
In the ideal case, the terms in eq.13 and eq.14 would be negligible,
because each term of the partial derivative wavefield
should be cross-correlated only with its adjoint source.

These undesirable terms in eq.10 (eq.13 and eq.14) are the major causes of the distortion of the gradient in Laplace domain FWI. Fig 1 and Fig 2 illustrate conventional Laplace domain FWI processing: given the initial model (Fig 1), Fig 2 shows each correlation term of eq.10. The terms in Fig 2-(a) and Fig 2-(b) have proper updating directions, while the terms in Fig 2-(c) and Fig 2-(d) have improper updating directions.

From these undesirable terms, we can expect that the FWI result would be improved if the first arrival traveltime information in the data were suppressed.
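Shifted Laplace transform to extract the apparent amplitude:

In this section, we introduce the “shifted Laplace transform”. The new transform is a modified Laplace transform that uses a shifted decay function as follows:

$$\hat{u}(s) = \int_0^{T_{\max}} u(t)\, e^{-s(t-\tau_1)}\, dt , \qquad (15)$$

where $u(t)$ is the time domain trace, $s$ is the Laplace constant, $T_{\max}$ is the maximum recording time, and $\tau_1$ is the first arrival traveltime.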
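
As a hedged illustration of why the shift isolates the apparent amplitude, assume (this notation is an assumption, not taken from the source) that the shifted decay function is $e^{-s(t-\tau_1)}$ and that the trace is a sum of $N$ events with amplitudes $A_i$ and traveltimes $\tau_1 < \tau_2 < \dots < \tau_N < T_{\max}$, with a positive Laplace domain value $\tilde{u}(s)$. The shifted transform then differs from the conventional Laplace transform only by the factor $e^{s\tau_1}$:

```latex
\hat{u}(s)
  = \int_0^{T_{\max}} u(t)\,e^{-s(t-\tau_1)}\,dt
  = e^{s\tau_1}\,\tilde{u}(s)
  = A_1 + \sum_{i=2}^{N} A_i\,e^{-s(\tau_i-\tau_1)},
\qquad
\ln \tilde{u}(s) = -s\,\tau_1 + \ln \hat{u}(s).
```

Under these assumptions the logarithmic wavefield splits into a first arrival traveltime term and an apparent amplitude term, which is the separation that the shifted transform is designed to exploit.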
As we expanded the misfit function and its partial derivative in
the previous section, we can write a misfit function and a partial
derivative that have no undesirable terms as follows:
̃ ( )
̂( ) ∑ (( ( )) ( (̃ )) )
( )
(16)
where,
̂( ) ̃ ( ) (17)
̂( ) ̃ (
) (18)

Figure 3: Partial derivative of each data set from 1D homogeneous media when the Laplace constant was 10 (a) and 1 (b), including partial derivatives of the logarithmic wavefield in the Laplace domain (black line), the first arrival traveltime multiplied by the Laplace constant (blue line), and the logarithmic wavefield in the shifted Laplace domain, which represents the apparent amplitude of the wavefield (red line).
Thus, the gradient of the objective function can be expressed as follows:
̂( )
(̃ ( )) ̃ (
)
∑ [ ( ) (̃ )]
( )
(19)

Fig 3-(a) shows that the shifted Laplace domain may not be essential when an extremely large Laplace constant is used, because the first arrival traveltime term dominates the apparent amplitude term. Meanwhile, Fig 3-(b) shows that the cross-correlation terms were not negligible. Therefore, using a shifted Laplace domain wavefield is necessary when a small or modest Laplace constant is used for Laplace domain FWI.

Numerical examples:

To verify the effectiveness of the shifted Laplace domain FWI method, the inverted result was compared to the result of the conventional Laplace domain FWI. The shifted Laplace domain FWI used 5 Laplace constants, the 5 consecutive odd integers starting from 1, and the first arrival traveltime was extracted by using a complex frequency of 0.01 Hz and a Laplace constant of 9, using the algorithm for first arrival traveltime and amplitude suggested by Shin et al. (2003). This Laplace constant was large enough to extract the first arrival traveltime, and the frequency was small enough to avoid the cycle-skipping effect. The conventional Laplace domain FWI process used the same 5 Laplace constants.

As shown in Figure 4-(b), the conventional process overestimated velocities in the deep areas of the model, and the low-velocity zone below the salt dome was not detected. Meanwhile, Fig 4-(c), which shows the inverted result when using shifted Laplace domain FWI, expresses the low-velocity zone below the salt dome better than the previous method by suppressing overestimation in the deep regions of the domain.

We performed frequency domain full waveform inversion to evaluate the adequacy of the inverted models as initial models for the conventional FWI. In this process, seventy frequencies from 2 to 10 Hz were used. Fig. 5 shows the velocity models that were obtained from conventional frequency domain FWI when using the initial models from conventional Laplace domain FWI and shifted Laplace domain FWI methods.

To appraise the error between the inverted model and the true model, we calculated a relative model misfit using the following equation:

‖ ‖ (20)

where, n is the number of nodes, is the inverted velocity model, and is the true velocity model. The model misfit of the velocity models in Fig 5-(a), and (b) are and , respectively.

Figure 4: True BP P-wave velocity model (a), conventional Laplace domain FWI results after 500 iterations (b), and shifted Laplace domain FWI results after 500 iterations (c).

Figure 5: Velocity models from conventional frequency domain full waveform inversion when using the initial models from the conventional Laplace domain FWI (a) and shifted Laplace domain FWI (b).

Conclusions:

In this paper, we introduced shifted Laplace domain FWI as a solution to the inaccuracy problem caused by undesirable cross-correlation terms in the Laplace domain gradient direction. The wavefields and partial derivatives in the shifted Laplace domain could be successfully obtained by using the first arrival traveltime calculation method of Shin et al. (2003) and Pyun et al. (2005). As the examples above show, shifted Laplace domain FWI is advantageous for generating a high-resolution initial velocity model. Despite the advantages of shifted Laplace domain FWI shown above, more research is needed because of the lack of information about the optimal Laplace constant that would be useful to improve the resolution of the deeper sections of velocity models.

Acknowledgement:

This work was supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning granted financial resource from the Ministry of Trade, Industry & Energy, Republic of Korea (no. 20132510100060). This work was supported by the Korea Meteorological Administration through the Korea Meteorological Industry Promotion Agency (Grant KMIPA 2015-3091).

EDITED REFERENCES
REFERENCES
Bae, H. S., S. Pyun, C. Shin, K. J. Marfurt, and W. Chung, 2012, Laplace-domain waveform inversion
versus refraction-traveltime tomography: Geophysical Journal International, 190, 595–606,
http://doi.org/10.1111/j.1365-246X.2012.05504.x.
Bharadwaj, P., W. A. Mulder, and G. Drijkoningen, 2013, Multi-objective full waveform inversion in the
absence of low frequencies: 83rd Annual International Meeting, SEG, Expanded Abstracts,
Bozda, E., J. Trampert, and J. Tromp, 2011, Misfit functions for full waveform inversion based on
870, http://doi.org/10.1111/j.1365-246X.2011.04970.x.
Brenders, A. J., S. Charles, and R. G. Pratt, 2008, Velocity estimation by waveform tomography in the
Canadian foothill?A synthetic benchmark study: 70th Annual International Conference and
Exhibition, EAGE, Extended Abstracts, F020, http://doi.org/10.3997/2214-4609.20147678.
Brenders, A. J., and R. G. Pratt, 2007, Full waveform tomography for lithospheric imaging: Results from
a blind test in a realistic crustal model: Geophysical Journal International, 168, 133–151,
http://doi.org/10.1111/j.1365-246X.2006.03156.x.
https://doi.org/10.1190/1.3215771.
Cai, W., and F. Qin, 1994, Three-dimensional refraction imaging: 64th Annual International Meeting,
SEG, Expanded Abstracts, 629–632, https://doi.org/10.1190/1.1932177.
Chi, B., L. Dong, and Y. Liu, 2014, Full waveform inversion method using envelope objective function
without low frequency data: Journal of Applied Geophysics, 109, 36–46,
https://doi.org/10.1016/j.jappgeo.2014.07.010.
Choi, Y., and T. Alkhalifah, 2013, Frequency-domain waveform inversion using the phase derivative:
Datta, D., and M. K. Sen, 2016, Estimating a starting model for full-waveform inversion using a global
optimization method: Geophysics, 81, no. 4, R211–R223, https://doi.org/10.1190/geo2015-
0339.1.
Docherty, P., 1992, Solving for the thickness and velocity of the weathering layer using 2-D refraction
tomography: Geophysics, 57, 1307–1318, https://doi.org/10.1190/1.1443198.
Fichtner, A., J. Trampert, P. Cupillard, E. Saygin, T. Taymaz, Y. Capdeville, and A. Villasenor, 2013,
Multiscale full waveform inversion: Geophysical Journal International, 194, 534–556,
Ha, W., W. Chung, E. Park, and C. Shin, 2012, 2-D acoustic Laplace-domain waveform inversion of
marine field data: Geophysical Journal International, 190, 421–428,
https://doi.org/10.1111/j.1365-246X.2012.05487.x.
Ha, W., and C. Shin, 2012, Laplace-domain full-waveform inversion of seismic data lacking low-
frequency information: Geophysics, 77, no. 5, R199–R206, https://doi.org/10.1190/geo2011-
0411.1.
Ha, W., and C. Shin, 2013, Why do Laplace-domain waveform inversions yield long-wavelength results?:
Geophysics, 78, no. 4, R167–R173, https://doi.org/10.1190/geo2012-0365.1.

Hampson, D., and B. Russell, 1984, First break interpretation using generalized linear inversion: 54th
https://doi.org/10.1190/1.1894084.
Kamei, R., R. G. Pratt, and T. Tsuji, 2013, On acoustic waveform tomography of wide-angle OBS
data?strategies for pre-conditioning and inversion: Geophysical Journal International, 194, 1250–
1280, https://doi.org/10.1093/gji/ggt165.
Kwon, J. M, H. Jun, H. Song, U. Jang, and C. Shin, 2017, Waveform inversion in the shifted Laplace
domain: Geophysical Journal International, 210, 340–353, https://doi.org/10.1093/gji/ggx170.
Min, D., and C. Shin, 2006, Refraction tomography using a waveform-inversion back-propagation
technique: Geophysics, 71, no. 3, R21–R30, https://doi.org/10.1190/1.2194522.
Plessix, R.-E., S. Michelet, H. Rynja, H. Kuehl, C. Perkins, J. W. de Maag, and P. Hatchell, 2010, Some
3D applications of full waveform inversion: 72nd Annual International Conference and
Exhibition, EAGE, Extended Abstracts, https://doi.org/10.3997/2214-4609.20149933.
Pyun, S., C. Shin, D. Min, and T. Ha, 2005, Refraction traveltime tomography using damped
monochromatic wavefield: Geophysics, 70, no. 2, U1–U7, https://doi.org/10.1190/1.1884829.
Qin, F., W. Cai, and G. T. Schuster, 1993, Inversion and imaging of refraction data: 63rd Annual
Schneider, W. A., and S. Y. Kuo, 1985, Refraction modeling for static corrections: 55th Annual
International Meeting, SEG, Expanded Abstracts, 295–299.
Sheng, J., A. Leeds, M. Buddensiek, and G. T. Schuster, 2006, Early arrival waveform tomography on
near-surface refraction data: Geophysics, 71, no. 4, U47–U57, https://doi.org/10.1190/1.2210969.
Shin, C., and Y. H. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931, https://doi.org/10.1111/j.1365-246X.2008.03768.x.
Shin, C., S. Ko, W. Kim, D. Min, D. Yang, K. J. Marfurt, S. Shin, K. Yoon, and C. H. Yoon, 2003,
Traveltime calculations from frequency domain downward-continuation algorithms: Geophysics,
68, 1380–1388, https://doi.org/10.1190/1.1598131.
Shtivelman, V., 1996, Kinematic inversion of first arrivals of refracted waves: A combined approach:
Sirgue, L., 2003, Inversion de la forme d’onde dans le domaine frequentiel de donnees sismiques grand
offset: Ph.D. thesis, Universite Paris and Queen’s University.
Sirgue, L., O. I. Barkved, J. P. Van Gestel, O. J. Askim, and J. H. Kommedal, 2009, 3D waveform
inversion on Valhall Wideazimuth OBC: 70th Annual International Conference and Exhibition,
EAGE, Extended Abstracts, U038, https://doi.org/10.3997/2214-4609.201400395.
Stefani, J. P., 1995, Turning-ray tomography: Geophysics, 60, 1917–1929,
https://doi.org/10.1190/1.1443923.
1259–1266, https://doi.org/10.1190/1.1441754.
1893–1903, https://doi.org/10.1190/1.1442046.
White, D. J., 1989, Two-dimensional seismic refraction tomography: Geophysical Journal International,
97, 223–245, https://doi.org/10.1111/j.1365-246X.1989.tb00498.x.
Woodward, M. J., 1992, Wave-equation tomography: Geophysics, 57, 15–26,
https://doi.org/10.1190/1.1443179.
Wu, R., and M. N. Toksoz, 1987, Diffraction tomography and multisource holography applied to seismic
imaging: Geophysics, 52, 11–25, https://doi.org/10.1190/1.1442237.

Xu, S., D. Wang, F. Chen, G. Lambare, and Y. Zhang, 2012, Inversion on reflected seismic wave: 82nd
Zelt, C. A., and P. J. Barton, 1998, Three-dimensional seismic refraction tomography: A comparison of
two methods applied to data from the Faeroe Basin: Journal of Geophysical Research, 103, 7187–
7210, https://doi.org/10.1029/97JB03536.
Zhang, J., and M. N. Toksoz, 1998, Nonlinear refraction traveltime tomography: Geophysics, 63, 1726–
1737, https://doi.org/10.1190/1.1444468.
Zhu, X., and G. A. McMechan, 1989, Estimation of a two-dimensional seismic compressional-wave
velocity distribution by iterative tomographic imaging: International Journal of Imaging Systems
and Technology, 1, 13–17, https://doi.org/10.1002/ima.1850010103.

Efficient Laplace constant selection strategy for the Laplace domain waveform inversion
Jungmin Kwon*, Ian Miller-Evans, Changsoo Shin, Seoul National University
Summary

Previous studies on Laplace domain waveform inversion (WI) lack guidelines for determining efficient Laplace constants. This paper presents such guidelines for a given source-receiver geometry. To explain the method, we describe the Green’s function of the Laplace domain assuming homogeneous media and analyze the wavepath of the Laplace domain using this Green’s function. We demonstrate that the continuity of the imaginary wavenumber coverage of the wavepath in the Laplace domain should be maintained to improve the resolution of the Laplace domain inversion result. Using the proposed Laplace constant selection strategy, the Laplace constants can be chosen to maintain the continuity of the vertical imaginary wavenumber of the local wavepath and minimize the redundancy of the vertical imaginary wavenumber. A 1D model test shows that the proposed Laplace constant selection strategy performs better than the conventional Laplace constant selection strategy using a fixed interval.

Introduction

Laplace domain waveform inversion (WI), suggested by Shin and Cha (2008), is a method for estimating a velocity model using a Laplace domain wavefield. It has been used to provide an initial model for frequency domain or time domain waveform inversion. The advantage of the Laplace domain FWI is that there are fewer local minima in the objective function compared to frequency domain FWI (Shin and Ha, 2008). Its disadvantage is that a high-resolution inverted model cannot be obtained using a gradient-based method alone.

Recently, Kwon et al. (2017) succeeded in improving the resolution of inverse results in the Laplace domain by applying the truncated Gauss-Newton method to the Laplace domain WI. Kwon et al. (2017) also asserted that the resolution of the inverted result is acceptable when the range and density of the Laplace constants are sufficiently large. However, these studies of the Laplace domain WI have failed to suggest an efficient Laplace constant selection strategy for a given source-receiver geometry.

Sirgue and Pratt (2004) provided instructions on how to choose a set of frequencies for frequency domain FWI using a 1D assumption. This method helps select a set of frequencies that maintains the continuity of the wavenumber of the wavepath (Woodward, 1992) and minimizes the redundancy of the wavenumber in a given source-receiver geometry. They verified that the larger the range of offsets, the fewer frequencies are required in the frequency domain FWI.

In this paper, we provide instructions on how to select the Laplace constants in the Laplace domain WI. First, we present the Green’s function in the Laplace domain assuming a homogeneous medium and explain the wavepath of the Laplace domain by using the Green’s function. We clarify that the wavepath is a function of the imaginary wavenumber, which represents the Laplace constant in the space domain. By analyzing the wavepath of the Laplace domain, we propose a method to select the set of Laplace constants that both maintains the continuity and minimizes the redundancy of the vertical imaginary wavenumber of the wavepath. Since it is difficult to determine whether attenuation is caused by the local Laplace constant or by geometric spreading, we present a method for selecting the Laplace constant that accounts for the geometric spreading effect. We present a 1D model which shows that the set of Laplace constants obtained by the newly proposed Laplace constant selection method makes the Laplace domain WI more efficient than the conventional fixed interval Laplace constant selection method.

Review of the Laplace domain waveform inversion (WI)

Laplace domain WI, proposed by Shin and Cha (2008), is an inversion algorithm that uses Laplace transformed data. This method is the same as waveform inversion using the zero frequency component of a damped wavefield. In the Laplace domain WI algorithm, the amount of damping is controlled by the Laplace constant $\sigma$.

To perform Laplace domain WI, the objective function should be defined. To compensate for data amplitude loss with increasing offset, a logarithmic objective function is generally adopted for the Laplace domain WI algorithm. The logarithmic objective function of a single Laplace constant is expressed as

$$E(\sigma) = \frac{1}{2} \sum_{i=1}^{N} \left[ \ln\left( \frac{d_{\mathbf{s},\mathbf{g}}(\sigma)}{u_{\mathbf{s},\mathbf{g}}(\sigma)} \right) \right]^{2}, \tag{1}$$

where $u_{\mathbf{s},\mathbf{g}}$ and $d_{\mathbf{s},\mathbf{g}}$ are the modeled and observed wavefields of the $i$th source-receiver pair, respectively, $N$ is the number of source-receiver pairs, and $\sigma$ is the Laplace constant. $\mathbf{s}$ and $\mathbf{g}$ are the position vectors of the source and receiver, respectively. Expanding the logarithmic residual, $\ln\left( d_{\mathbf{s},\mathbf{g}}(\sigma) / u_{\mathbf{s},\mathbf{g}}(\sigma) \right)$, to the first-order term of a Taylor series yields

$$\ln\left( \frac{d_{\mathbf{s},\mathbf{g}}(\sigma)}{u_{\mathbf{s},\mathbf{g}}(\sigma)} \right) \approx \sum_{j=1}^{M} \left\{ \frac{\partial \ln\left( u_{\mathbf{s},\mathbf{g}}(\sigma) \right)}{\partial p_{\mathbf{m}_j}} \, \Delta p_{\mathbf{m}_j} \right\}, \tag{2}$$

where $\mathbf{m}_j$ is the position vector of the $j$th model parameter, $M$ is the number of model parameters, and $\Delta p_{\mathbf{m}_j}$ is the difference between the true model and the estimated model of the $j$th parameter ($j = 1, \cdots, M$).

From equation (2), we can recognize that the residual wavefield at a receiver, $\ln\left( d_{\mathbf{s},\mathbf{g}}(\sigma) / u_{\mathbf{s},\mathbf{g}}(\sigma) \right)$, is generated by the superposition of scattered wavefields resulting from $\Delta p_{\mathbf{m}_j}$ (Woodward, 1992). The model difference at each point in the model, $\Delta p_{\mathbf{m}_j}$, acts as a scatterer and can be regarded as a weight of the following basis:

$$L(\mathbf{s}, \mathbf{g}, \mathbf{m}, \sigma) = \frac{\partial \ln\left( u_{\mathbf{s},\mathbf{g}}(\sigma) \right)}{\partial p_{\mathbf{m}_j}}, \tag{3}$$

where $L(\mathbf{s}, \mathbf{g}, \mathbf{m}, \sigma)$ is called the wavepath and represents a basis function constituting the residual wavefield, $\ln\left( d_{\mathbf{s},\mathbf{g}}(\sigma) / u_{\mathbf{s},\mathbf{g}}(\sigma) \right)$, as shown in equation (2). In this case, the wavepath is obtained from the

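logarithmic objective function and is called the Rytov wavepath. It can be expressed using three Green’s functions as follows:

$$L(\mathbf{s}, \mathbf{g}, \mathbf{m}, \sigma) = \sigma^{2} \, \frac{g_0(\mathbf{m}|\mathbf{s}, \sigma) \, g_0(\mathbf{m}|\mathbf{g}, \sigma)}{g_0(\mathbf{g}|\mathbf{s}, \sigma)}, \tag{4}$$

where $g_0(\mathbf{x}_1|\mathbf{x}_2, \sigma)$ is a Green’s function of a single Laplace constant $\sigma$ from $\mathbf{x}_1$ to $\mathbf{x}_2$ (Wu and Toksöz, 1987; Woodward, 1992). This Rytov wavepath assumes sloth (inverse of squared velocity) parameterization.

As shown above, the Rytov wavepath in the Laplace domain can be expressed using the Laplace domain Green’s functions. To investigate the Rytov wavepath in the Laplace domain more specifically and analyze the role of each Laplace constant, we should determine the Green’s function in the Laplace domain.

Laplace domain Green’s functions for a homogeneous acoustic unbounded medium

When the source and receiver are sufficiently far from the model parameter, the Laplace domain Green’s function for a homogeneous acoustic unbounded medium can be expressed asymptotically as follows:

$$g^{(nD)}(\mathbf{x}_1|\mathbf{x}_2, \sigma) \sim \frac{\exp(-\alpha R)}{A_n^{(nD)}(R, \alpha)}, \tag{5}$$

where

$$A_n^{(nD)}(R, \alpha) = \begin{cases} 2\alpha & \text{if } n = 1 \\ \sqrt{8\pi\alpha R} & \text{if } n = 2 \\ 4\pi R & \text{if } n = 3 \end{cases}.$$

Here, $n$ is the dimension of the space domain, $R$ is the distance from $\mathbf{x}_1$ to $\mathbf{x}_2$, and $\alpha \, (= \sigma/c_0)$ is the imaginary wavenumber. Using these Green’s functions, we can express the Rytov wavepath in the Laplace domain.

Rytov wavepath in the Laplace domain

To help understand the Rytov wavepath, we provide a schematic diagram describing the relationship between the incident wavefield and the scattering wavefield (Figure 1). $\mathbf{s}$, $\mathbf{g}$, $\mathbf{m}$ and $\mathbf{o}$ are the position vectors of the source, receiver, model parameter and central point, respectively, within a specific window that is far from the source and receiver. $\mathbf{r}_s$ represents the vector from $\mathbf{s}$ to $\mathbf{o}$, $\mathbf{r}_g$ represents the vector from $\mathbf{g}$ to $\mathbf{o}$, and $\mathbf{x}$ represents the vector from $\mathbf{o}$ to $\mathbf{m}$. Since the incident angle and scattering angle are always the same due to Snell’s law, both angles can be represented by $\theta$. Therefore, the Laplace domain Rytov wavepath can be expressed using the Fraunhofer approximation as follows:

$$L^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{m}, \sigma) \approx B^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma) \exp\left( -\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma) \cdot \mathbf{x} \right), \tag{6}$$

where

$$\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma) = \begin{cases} \dfrac{2\sigma}{c_0} \cos\theta \, \mathbf{n} & \text{if } n = 1 \\[2ex] \dfrac{2\sigma}{c_0} \cos\theta \, \mathbf{n} + \dfrac{|\mathbf{r}_g| \hat{\mathbf{s}} + |\mathbf{r}_s| \hat{\mathbf{g}}}{2 |\mathbf{r}_s| |\mathbf{r}_g|} & \text{if } n = 2 \\[2ex] \dfrac{2\sigma}{c_0} \cos\theta \, \mathbf{n} + \dfrac{|\mathbf{r}_g| \hat{\mathbf{s}} + |\mathbf{r}_s| \hat{\mathbf{g}}}{|\mathbf{r}_s| |\mathbf{r}_g|} & \text{if } n = 3 \end{cases}$$

and

$$B^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma) = \begin{cases} \dfrac{\sigma^{2} \exp\left( -\alpha (|\mathbf{r}_s| + |\mathbf{r}_g|) \right)}{g_0^{(1D)}(\mathbf{g}|\mathbf{s}, \sigma)} \, \dfrac{1}{4\alpha^{2}} & \text{if } n = 1 \\[2ex] \dfrac{\sigma^{2} \exp\left( -\alpha (|\mathbf{r}_s| + |\mathbf{r}_g|) \right)}{g_0^{(2D)}(\mathbf{g}|\mathbf{s}, \sigma)} \, \dfrac{1}{8\pi\alpha \sqrt{|\mathbf{r}_s| |\mathbf{r}_g|}} & \text{if } n = 2 \\[2ex] \dfrac{\sigma^{2} \exp\left( -\alpha (|\mathbf{r}_s| + |\mathbf{r}_g|) \right)}{g_0^{(3D)}(\mathbf{g}|\mathbf{s}, \sigma)} \, \dfrac{1}{16\pi^{2} |\mathbf{r}_s| |\mathbf{r}_g|} & \text{if } n = 3 \end{cases}.$$

Figure 1. A schematic diagram describing the relationship between the incident wavefield and scattering wavefield.

For simplicity, we introduce the notation $\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ and call it the apparent imaginary wavenumber vector. $\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ defines the steepness of the Rytov wavepath decaying exponentially in the direction of the $\mathbf{n}$ vector near the scattering point $\mathbf{o}$. $B^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ represents the amplitude of the Rytov wavepath. Note that $\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ and $B^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ are independent of $\mathbf{x}$ as long as $|\mathbf{x}| \ll |\mathbf{r}_s|$ and $|\mathbf{x}| \ll |\mathbf{r}_g|$. Thus, they can be regarded as constants near the scattering point $\mathbf{o}$. Hence, the Rytov wavepath in the Laplace domain is approximately an exponentially decaying real basis function whose imaginary wavenumber vector is $\boldsymbol{\alpha}_{app}^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma)$ in the space domain, as equation (6) shows.

Optimal conditions on the imaginary wavenumber, $\alpha \, (= \sigma/c_0)$, should be determined to achieve a high-resolution model within the window shown in Figure 1. Laplace domain WI often utilizes the Gauss-Newton method or the gradient descent method. The model parameter vector estimated from the Gauss-Newton method can be expressed as follows:

$$\Delta p_{\mathbf{m}}^{est} = -\left( {L^{(nD)}}^{T} L^{(nD)} + \epsilon I \right)^{-1} {L^{(nD)}}^{T} \ln\left( \frac{d_{\mathbf{s},\mathbf{g}}(\sigma)}{u_{\mathbf{s},\mathbf{g}}(\sigma)} \right). \tag{7}$$

The model parameter vector estimated from the gradient descent method can be expressed as follows:

$$\Delta p_{\mathbf{m}}^{est} = -l \, {L^{(nD)}}^{T} \ln\left( \frac{d_{\mathbf{s},\mathbf{g}}(\sigma)}{u_{\mathbf{s},\mathbf{g}}(\sigma)} \right), \tag{8}$$

where $l$ is the step length. In both cases, the estimated model parameter vector $\Delta p_{\mathbf{m}}^{est}$ is a linear combination of the exponential basis functions, and the data residuals $\ln\left( d_{\mathbf{s},\mathbf{g}}(\sigma) / u_{\mathbf{s},\mathbf{g}}(\sigma) \right)$ are the weights of the exponential basis functions, as shown in equations (7) and (8). For the exponential basis functions to span the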

$M$-dimensional model space, the exponential basis functions should be linearly independent. As the number of elements in the set of linearly independent exponential basis functions becomes larger, the exponential basis functions can span a higher dimensional model space. Note that a set of exponential basis functions whose imaginary (or real) wavenumbers are unique is a linearly independent set. Thus, the number of unique imaginary wavenumbers defines the maximum dimension of the estimated model space. This implies that the larger the number of unique imaginary wavenumbers, the higher the resolution within the window near point $\mathbf{o}$. By selecting proper Laplace constants, we can maximize the number of different imaginary wavenumbers, and in turn maximize resolution.

An efficient strategy for Laplace constant selection

Our selection strategy is largely similar to the frequency selection strategy suggested by Sirgue and Pratt (2004), owing to the similarity between the Green’s functions in the Laplace and frequency domains. However, unlike the wavepath in the frequency domain, the Rytov wavepath in the Laplace domain is affected by both the geometric spreading effect and the imaginary wavenumber. Therefore, if the window around the point $\mathbf{o}$ is small, the Laplace domain WI cannot distinguish between attenuation caused by the Laplace constant and attenuation caused by geometric spreading. Thus, geometric spreading should be considered when selecting Laplace constants.

As in the frequency selection strategy, the purpose of the Laplace constant strategy is to make the vertical apparent imaginary wavenumber continuous and minimally redundant for all given source-receiver pairs and Laplace constants.

For a 1D model, the incident angle and scattering angle are symmetric. Therefore, the Rytov wavepath of the Laplace domain shown in equation (6) can be simplified for the 1D model as follows:

$$L^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{m}, \sigma) \approx B^{(nD)}(\mathbf{s}, \mathbf{g}, \mathbf{o}, \sigma) \exp\left( -\frac{2\sigma_{app}^{(nD)}}{c_0} \cos\theta \, z \right), \tag{9}$$

where

$$\sigma_{app}^{(nD)} = \begin{cases} \sigma & \text{if } n = 1 \\[1ex] \sigma + \dfrac{c_0}{2R} & \text{if } n = 2 \\[1ex] \sigma + \dfrac{c_0}{R} & \text{if } n = 3 \end{cases}. \tag{10}$$

To properly address the effect of geometric spreading on the wavenumber analysis, we introduce the apparent Laplace constant $\sigma_{app}^{(nD)}$. We assume a series of apparent Laplace constants $\sigma_{app,i}$ ($i = 1, 2, \cdots, N_\sigma$) used for the Laplace domain WI. We also assume that $\sigma_{app,i+1}$ is larger than $\sigma_{app,i}$ for all $i$.

Given the source-receiver pairs, the range of vertical imaginary wavenumbers of a single Laplace constant is defined. The vertical imaginary wavenumber is at a maximum when the scattering angle is smallest and at a minimum when the scattering angle is largest. Therefore, we can express the maximum and minimum vertical imaginary wavenumbers of a single Laplace constant as follows:

$$\alpha_{i,max} = \frac{2\sigma_{app,i}}{c_0}, \qquad \alpha_{i,min} = \frac{2\sigma_{app,i}}{c_0} \cos\theta_{max}. \tag{11}$$

To make the vertical imaginary wavenumbers continuous, the maximum vertical imaginary wavenumber of the $i$th Laplace constant should be equal to or larger than the minimum vertical imaginary wavenumber of the $(i+1)$th Laplace constant. To give the vertical imaginary wavenumbers minimum redundancy, the maximum vertical imaginary wavenumber of the $i$th Laplace constant should be equal to or smaller than the minimum vertical imaginary wavenumber of the $(i+1)$th Laplace constant. To satisfy these two conditions (continuity and minimum redundancy) together, the maximum vertical imaginary wavenumber of the $i$th Laplace constant should be equal to the minimum vertical imaginary wavenumber of the $(i+1)$th Laplace constant:

$$\alpha_{i+1,min} = \alpha_{i,max}. \tag{12}$$

Using this condition, we obtain a recurrence formula for the apparent Laplace constants that satisfies continuity and minimum redundancy:

$$\sigma_{app,i+1} = \frac{\sigma_{app,i}}{\cos\theta_{max}}. \tag{13}$$

Substituting equation (10) into equation (13), with $R = R_{min}$ for the $i$th constant (smallest scattering angle) and $R = R_{max}$ for the $(i+1)$th constant (largest scattering angle), we arrive at the relations between Laplace constants in the 1D, 2D and 3D cases:

$$\sigma_{i+1}^{(nD)} = \begin{cases} \dfrac{\sigma_i}{\cos\theta_{max}} & \text{if } n = 1 \\[2ex] \dfrac{\sigma_i + \dfrac{c_0}{2R_{min}}}{\cos\theta_{max}} - \dfrac{c_0}{2R_{max}} & \text{if } n = 2 \\[2ex] \dfrac{\sigma_i + \dfrac{c_0}{R_{min}}}{\cos\theta_{max}} - \dfrac{c_0}{R_{max}} & \text{if } n = 3 \end{cases}, \tag{14}$$

where $\theta_{max}$ is the maximum angle at the depth of the target layer, and $R_{min}$ and $R_{max}$ are the shortest and longest distances, respectively, from the source (or receiver) to the scattering point at the target layer. As shown in equation (14), we can select the Laplace constants more sparsely in the 2D and 3D cases because of the geometrical spreading effect.

Figure 2. Laplace constant selection strategy for (a) 1D, (b) 2D and (c) 3D. A single discrete Laplace constant produces a range of vertical imaginary wavenumbers in the wavepath. The following Laplace constants are chosen in such a way that they obtain a continuous range in vertical imaginary wavenumbers.
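The recurrence of equations (10) to (14) can be illustrated with a short Python sketch. The function names, the arguments, and the decision to clip the last constant to a user-supplied maximum are assumptions made for this example and are not part of the original description; $\cos\theta_{max}$ is taken as $R_{min}/R_{max}$, which assumes a flat target layer with the scattering point below the source-receiver midpoint.

```python
import math

# Geometric-spreading factor of equation (10): 0 (1D), 1/2 (2D), 1 (3D).
SPREADING = {1: 0.0, 2: 0.5, 3: 1.0}


def geometry(depth, max_offset):
    """Return (r_min, r_max, cos_theta_max) for a flat target layer (assumed)."""
    r_min = depth
    r_max = math.hypot(max_offset / 2.0, depth)
    return r_min, r_max, r_min / r_max


def select_laplace_constants(sigma_min, sigma_max, c0, depth, max_offset, ndim=2):
    """Iterate equation (14) from sigma_min; the last value is clipped to sigma_max."""
    r_min, r_max, cos_theta_max = geometry(depth, max_offset)
    k = SPREADING[ndim]
    constants = [sigma_min]
    while constants[-1] < sigma_max:
        sigma_app = constants[-1] + k * c0 / r_min          # equation (10) at R_min
        nxt = sigma_app / cos_theta_max - k * c0 / r_max    # equation (14)
        constants.append(min(nxt, sigma_max))
    return constants


def vertical_wavenumber_range(sigma, c0, depth, max_offset, ndim=2):
    """Return (alpha_min, alpha_max) of equation (11) for one Laplace constant."""
    r_min, r_max, cos_theta_max = geometry(depth, max_offset)
    k = SPREADING[ndim]
    alpha_max = 2.0 * (sigma + k * c0 / r_min) / c0
    alpha_min = 2.0 * (sigma + k * c0 / r_max) * cos_theta_max / c0
    return alpha_min, alpha_max


# Example geometry: 1.7 km/s background, 3 km target depth, 10 km maximum offset.
sigmas = select_laplace_constants(1.0, 10.0, c0=1.7, depth=3.0, max_offset=10.0, ndim=2)
print([round(s, 3) for s in sigmas])
```

With these inputs the 2D branch yields four constants between 1 and 10 s⁻¹. For every unclipped pair, the upper wavenumber bound of one constant coincides with the lower bound of the next, which is the condition of equation (12).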

Figure 2 shows the Laplace constant discretization strategy for the 1D, 2D and 3D cases. The suggested strategy allows us to choose Laplace constants that make the vertical imaginary wavenumber coverage of the model update continuous and minimize vertical imaginary wavenumber redundancy.

Numerical examples

To verify the validity of the proposed Laplace constant selection strategy, we implement a comparison test with a 1D velocity model whose size is 10 km × 3 km, as shown in Figure 3. In this comparison test, the result inverted with the set of Laplace constants selected by the proposed strategy is compared to several results inverted with sets of Laplace constants selected with fixed intervals. The true velocity model is a three-layered model whose velocities are 1.7, 3.5 and 1.7 km/s from the top, as shown in Figure 3. We use a homogeneous starting velocity model whose velocity is 1.7 km/s; the maximum offset is 10 km and the grid interval is 0.025 km. In this test, we set the maximum depth, 3 km, as the depth of the target layer. We also fix the minimum and maximum Laplace constants as 1.0 s⁻¹ and 10.0 s⁻¹, respectively.

Figure 3. Three-layered velocity model. The relative model misfit of each inverted model parameter on the vertical dotted line is compared for this test.

Given the maximum offset (10 km) and the depth of the target layer (3 km), $R_{min}$ and $R_{max}$ are 3 km and 5.83 km, respectively, and the cosine of $\theta_{max}$ is $R_{min}/R_{max} \approx 0.514$. Therefore, the set of Laplace constants (s⁻¹) selected by the suggested strategy, using the 2D relation of equation (14), is {1.000, 2.349, 4.970, 10.00}. The result inverted with this set of Laplace constants is compared with the results inverted with 8 sets of Laplace constants with fixed intervals, as shown in Table 1. For the Laplace domain WI, we used the truncated Newton method (Kwon et al., 2017) with 2500 iterations, which is expected to be sufficient for convergence.

Strategy                   Number of Laplace constants   Laplace constants (s⁻¹)
Proposed strategy          4                             1.000, 2.349, 4.970, 10.00
Fixed interval strategy    2                             Range: 1.000 – 10.00, Interval: 9.000
(conventional)             3                             Range: 1.000 – 10.00, Interval: 4.500
                           4                             Range: 1.000 – 10.00, Interval: 3.000
                           5                             Range: 1.000 – 10.00, Interval: 2.250
                           6                             Range: 1.000 – 10.00, Interval: 1.800
                           7                             Range: 1.000 – 10.00, Interval: 1.500
                           8                             Range: 1.000 – 10.00, Interval: 1.286
                           9                             Range: 1.000 – 10.00, Interval: 1.125

Table 1. Description of the set of Laplace constants used in each strategy.

After the Laplace domain waveform inversion was performed using the sets of Laplace constants in Table 1, the relative model misfit of each result was calculated using the following equation:

$$\text{model misfit} = \frac{1}{M_l} \left\| \frac{m_{inv} - m_{true}}{m_{true}} \right\|_{1}, \tag{15}$$

where $m_{inv}$ is the inverted model parameter, $m_{true}$ is the true model parameter and $M_l$ is the number of model parameters on the vertical dotted line positioned at the center of the model. The relative model misfit of each result is shown in Table 2.

Strategy                   Number of Laplace constants   Relative model misfit
Proposed strategy          4                             3.059 × 10⁻²
Fixed interval strategy    2                             3.935 × 10⁻²
(conventional)             3                             3.461 × 10⁻²
                           4                             3.194 × 10⁻²
                           5                             3.171 × 10⁻²
                           6                             3.128 × 10⁻²
                           7                             3.110 × 10⁻²
                           8                             3.060 × 10⁻²
                           9                             3.059 × 10⁻²

Table 2. The relative model misfit of inverted model parameters obtained from each strategy.

The relative model misfit of the result obtained from the proposed strategy is no larger than the relative model misfits of all the results obtained from the fixed interval strategy; only the fixed interval set with 9 constants reaches the same value (3.059 × 10⁻²), using more than twice as many Laplace constants. This implies that the proposed strategy for Laplace constant selection allows us to select the set of Laplace constants such that the exponential basis functions sufficiently represent the model.

Conclusions

So far, studies of the Laplace domain WI have failed to provide a basis for selecting the Laplace constants. Just as Sirgue and Pratt (2004) proposed a method for selecting frequencies using the concept of continuity and redundancy of the wavenumber, in this paper we propose a strategy for Laplace constant selection that maintains the continuity of the imaginary wavenumber with minimum redundancy. Using the one-dimensional three-layer model, we confirm that the proposed Laplace constant selection strategy performs better than the conventional fixed interval strategy. This study is expected not only to help the selection of complex frequencies in the Laplace-Fourier domain, but also to be applicable to resolution analysis of the velocity model obtained in the Laplace domain.

Acknowledgement

This work was supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) granted financial resource from the Ministry of Trade, Industry & Energy, Republic of Korea (Nos. 20132510100060 and 20152520100740).

EDITED REFERENCES
REFERENCES
Kwon, J., H. Jin, H. Calandra, and C. Shin, 2016, Interrelation between Laplace constants and the
gradient distortion effect in Laplace-domain waveform inversion: Geophysics, 82, no. 2, R31–
R47, http://dx.doi.org/10.1190/geo2015-0670.1.
Métivier, L., R. Brossier, J. Virieux, and S. Operto, 2013, Full waveform inversion and the truncated
Newton method: SIAM Journal on Scientific Computing, 35, B401–B437,
http://dx.doi.org/10.1137/120877854.
Shin, C., and Y. H. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931, http://dx.doi.org/10.1111/j.1365-246X.2008.03768.x.
Shin, C., and W. Ha, 2008, A comparison between the behavior of objective functions for waveform
inversion in the frequency and Laplace domains: Geophysics, 73, no. 5, VE119–VE133,
http://dx.doi.org/10.1190/1.2953978.
Woodward, M. J., 1992, Wave-equation tomography: Geophysics, 57, 15–26,
http://dx.doi.org/10.1190/1.1443179.
Wu, R. S., and M. N. Toksöz, 1987, Diffraction tomography and multisource holography applied to
seismic imaging: Geophysics, 52, 11–25, http://dx.doi.org/10.1190/1.1442237.

A stochastic L-BFGS approach for full waveform inversion
Gabriel Fabien-Ouellet*, Erwan Gloaguen, Bernard Giroux, INRS
Summary

Speeding up convergence and reducing the computational burden of Full Waveform Inversion (FWI) are increasingly important as we move toward large-scale 3D multi-parameter inversion. To this end, second-order optimization algorithms like L-BFGS or the truncated Newton method allow a much faster convergence rate at minimal computational costs. In the same fashion, stochastic source subsampling approaches have been shown to reduce the computational cost of FWI. In this study, we propose to combine these two strategies and present how the L-BFGS algorithm can be used along with the stochastic source subsampling strategy, or what we call the stochastic L-BFGS algorithm.

Introduction

The advances in high performance computing over the last decades have allowed the application of FWI to larger and larger 3D problems, and to more complex physics, going from acoustic to the more complex anisotropic (visco)elastic wave propagation (Fabien-Ouellet et al., 2017, Komatitsch et al., 2002). Still, large-scale multi-parameter FWI remains computationally challenging, preventing its widespread adoption. Hence, reducing the computing times remains an important issue to broaden the applicability of FWI.

Many strategies have been proposed to decrease the computational burden of FWI. One such strategy is the use of second-order descent algorithms, like the Newton method, which has been shown to dramatically improve the convergence rate of FWI and its resolution (Pratt et al., 1998). More precisely, inexact Newton methods like the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) or the truncated Newton method retain the better convergence rates of the full Newton method, without its prohibitively high computing cost. In effect, inexact Newton methods speed up convergence at minimal costs, thus reducing the required number of iterations and the overall computing time of FWI. In addition, inexact Newton methods are particularly important for multi-parameter FWI, as second order information helps to decouple different parameter classes (Virieux et al., 2017).

Another successful strategy to mitigate the computing requirements of FWI is to use data subsampling, be it in the form of random sources encoding (Krebs et al., 2009) or of stochastic source subsampling, two methods that show similar performance (van Leeuwen and Herrmann, 2013). These methods take their roots in stochastic optimization theory. In stochastic optimization, the descent direction is obtained by calculating the gradient on a random subset of the data, which reduces the cost of the computation. This method is advantageous for optimization problems of large size and large datasets (Bottou, 2010), like FWI.

Most stochastic optimization algorithms used in FWI are based on first order gradient descent methods (van Leeuwen et al., 2011). As shown by Castellanos et al. (2015), the difficulty of introducing second order approximations stems from the error introduced by the random subsampling in the Hessian approximation. To be able to apply stochastic second order descent algorithms, a strategy to reduce this error must be adopted. In this study, we show how this can be achieved for the L-BFGS method. We first present the theory of L-BFGS with random sources subsampling and then show a performance comparison between the proposed algorithm and the standard stochastic descent method with the Marmousi model.

Problem definition

Full waveform inversion is formulated as a minimization problem: find the Earth parameters $\mathbf{m}$ that minimize a measure of the discrepancy between the modelled and the recorded data, $\mathbf{d}$. This measure is given by the cost function, often taken as the $\ell_2$ norm of the residuals:

$$X_{\Omega}(\mathbf{m}) = \frac{1}{2} \, \frac{\sum_{i \in \Omega} \left( \mathbf{S}_i \mathbf{v}_i(\mathbf{m}) - \mathbf{d}_i \right)^{T} \left( \mathbf{S}_i \mathbf{v}_i(\mathbf{m}) - \mathbf{d}_i \right)}{\sum_{i \in \Omega} \mathbf{d}_i^{T} \mathbf{d}_i}, \tag{1}$$

where $\Omega$ represents a source ensemble and $\mathbf{v}_i$ is the modelled particle velocities due to source $i$, sampled at the recorders' locations by the sampling operator $\mathbf{S}_i$. The cost function is normalized by the sum of the squared amplitudes of the data to scale appropriately for different source subsets. In what follows, $X_{\Omega}$ will be used to designate the cost function on a source subset $\Omega$ and $X$ will be used for the cost function on the complete set of sources.

The particle velocities $\mathbf{v}_i$ obey the wave equation, which must be solved numerically for an arbitrarily heterogeneous Earth (Virieux et al., 2011). Solving the wave equation, or what we call forward modeling, represents the main computational cost of FWI. Time-domain finite differences (FDTD) remains the method of choice for large 3D elastic FWI. FDTD requires one complete forward modeling per source, which is why reducing the number of modelled shot points during inversion is advantageous. As solving the

wave equation remains challenging even with the computational resources of today, the cost function is usually optimized with local line search algorithms of the following form:

$$\mathbf{m}_{k+1} = \mathbf{m}_k - \alpha \mathbf{H}_k \nabla X, \tag{2}$$

where $\nabla X$ is the misfit gradient, calculated at the cost of approximately two forward modellings per source owing to the adjoint method (Plessix, 2006), $\alpha$ is the step size and $\mathbf{H}_k$ is an approximation of the inverse Hessian matrix, i.e. $\mathbf{H}_k \approx (\nabla^{2} X)^{-1}$. For the simplest line search algorithm, the steepest descent or gradient descent, the approximation of the inverse Hessian is discarded, i.e. $\mathbf{H}_k \rightarrow \mathbf{I}$. Although this approach has the merit of being the most parsimonious in terms of forward modelling, it suffers from slow convergence.

Brossier et al. (2009) show that a much better convergence can be attained at no additional forward modelling costs with the L-BFGS algorithm (Nocedal and Wright, 2006). Furthermore, this method does not require the complete storage of $\mathbf{H}_k$, which is prohibitive for large models. Instead, the product $\mathbf{H}_k \nabla X$ can be computed by storing $n$ vector pairs of parameter changes, $\mathbf{s}_k = \mathbf{m}_{k+1} - \mathbf{m}_k$, and gradient changes, $\mathbf{y}_k = \nabla X_{k+1} - \nabla X_k$. The inverse Hessian preconditioning is then obtained with the two-loops recursion (Algorithm 1), which involves only vector products. This has a negligible cost compared to the forward/adjoint modelling. To ensure that the approximate inverse Hessian remains positive definite and that the step direction remains a descent direction, the step length $\alpha$ is chosen to respect the Wolfe conditions, namely sufficient decrease of the cost function and of its curvature. A simple line search implementing those two conditions is presented in Algorithm 2. The sufficient decrease and curvature conditions appear at lines 5 and 7 respectively.

Stochastic formulation

In the traditional form, the source ensemble $\Omega$ is constant throughout the inversion and taken as the complete ensemble of source positions. On the other hand, the stochastic approach uses a random subset of the sources that changes at every iteration, with a number of sources usually much smaller than the complete ensemble. In what follows, this is achieved by doing a random draw on shot gathers, at each iteration, with a constant probability distribution over shots. Because the acquired seismic data is highly redundant by design, a small source subsample can be used to estimate the value of the cost function and its gradient. This subsampling introduces some noise, but on average, the expectation of the cost function should converge to the true value along iterations, which justifies the stochastic gradient descent (SGD) algorithm (equation (2) with $\mathbf{H}_k \rightarrow \mathbf{I}$ and $\nabla X \rightarrow \nabla X_{\Omega_k}$).

The noise introduced by the stochastic subsampling is more problematic in the case of the L-BFGS algorithm. In particular, the gradient change vector $\mathbf{y}_k = \nabla X_{\Omega_{k+1}} - \nabla X_{\Omega_k}$ will be dominated by the sampling noise if $\Omega_{k+1} \neq \Omega_k$. Hence a naïve implementation of L-BFGS using the previously defined $\mathbf{y}_k$ will be unstable, and in most cases, will diverge. As shown by Schraudolph et al. (2007) for online learning, we can circumvent this problem by using the same source subset in the evaluation of the vector pairs $\mathbf{s}_k$ and $\mathbf{y}_k$. To efficiently implement this solution, we propose the stochastic L-BFGS (SL-BFGS) algorithm (Algorithm 3). A single iteration of this algorithm contains two parameter updates. After the first update (line 5), the gradient of the updated model is computed with the same subset of sources (line 6), which can be used to update the $\mathbf{s}_k$ and $\mathbf{y}_k$ vectors (line 9). The Wolfe line search can proceed simultaneously, at virtually no cost because the gradient of the updated model is already computed. Note
Algorithm 1: Two-loops recursion
Algorithm 2: Wolfe line search
1. Inputs: 𝛻Χ, 𝑯𝒑𝒓𝒆
2. 𝒒 ← 𝛻Χ 1. Inputs: 𝒎, 𝒑, 𝛻𝛸! , 𝛸!
3. for i=k-1…k-n do 2. 𝛼 ← 1, 𝜏 ← 0.6, 𝑐! ← 10!! , 𝑐! ← 0.9
!!
4. 𝜌! ← 𝒚!! 𝒔! 3. while stop_criteria do
5. 𝛾! ← 𝜌! 𝒔!! 𝒒 4. Compute 𝛸, 𝛻𝛸, 𝑯𝒑𝒓𝒆 with 𝒎 ← 𝒎 + 𝛼𝒑
6. 𝒒 ← 𝒒 − 𝛾! 𝒚! 5. if 𝛸 > 𝛸! + 𝑐! 𝒑𝑻 𝛻𝛸!
7. end for 6. 𝛼 ← 𝜏𝛼
8. 𝒒 ← 𝑯𝒑𝒓𝒆 𝒒 7. else if 𝒑𝑻 𝛻𝛸 < 𝑐! 𝒑𝑻 𝛻𝛸!
9. for i=k-n…k-1 do 8. 𝛼 ← 𝛼/𝜏
10. 𝛽 ← 𝜌! 𝒚!! 𝒒 9. else
11. 𝒒 ← 𝒒 + 𝒔! 𝛾! − 𝛽 10. break
12. end for 11. end while
13. Outputs: 𝒒=𝑯𝛻Χ 12. Outputs: 𝛼, 𝛻𝛸, 𝑯𝒑𝒓𝒆
© 2017 SEG Page 1623

Stochastic L-BFGS
that, with high probability, 𝛼=1 for L-BFGS and the line
search does not require any new computations. Finally, the
gradient of the updated model is used to update the model a
second time (line 10), without updating 𝒔! and 𝒚! , nor
performing a line search. In our experience, this strategy
allows two model updates using only two gradient
computations, i.e. the step size 𝛼 = 1 respects Wolfe
conditions most of the time. Note also that this algorithm
can use a preconditioning matrix 𝑯𝒑𝒓𝒆 , for example the
diagonal approximation of Shin et al. (2001).
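The iteration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a noise-free linear least-squares problem stands in for FWI, each "shot" is a hypothetical pair `(A_i, d_i)`, the preconditioner is taken as the identity, and the names `two_loop`, `wolfe_search` and `sl_bfgs` are invented for this illustration. The sufficient-decrease test uses the usual form that includes the step length, which is an assumption about Algorithm 2.

```python
import numpy as np
from collections import deque

def two_loop(grad, pairs):
    """Two-loops recursion with H_pre = I: returns H @ grad from (s, y) pairs, oldest first."""
    q = grad.copy()
    done = []
    for s, y in reversed(pairs):               # newest pair to oldest pair
        rho = 1.0 / (y @ s)
        gamma = rho * (s @ q)
        q -= gamma * y
        done.append((rho, gamma, s, y))
    for rho, gamma, s, y in reversed(done):    # oldest pair to newest pair
        beta = rho * (y @ q)
        q += s * (gamma - beta)
    return q

def wolfe_search(cost_grad, m, p, x0, g0, tau=0.6, c1=1e-4, c2=0.9, max_trials=20):
    """Backtracking/expanding search; returns the step and the gradient at m + alpha * p."""
    alpha, slope0 = 1.0, p @ g0
    for _ in range(max_trials):
        x, g = cost_grad(m + alpha * p)
        if x > x0 + c1 * alpha * slope0:       # sufficient decrease violated: shrink
            alpha *= tau
        elif p @ g < c2 * slope0:              # curvature condition violated: grow
            alpha /= tau
        else:
            break
    else:
        x, g = cost_grad(m + alpha * p)        # trials exhausted: evaluate the last step
    return alpha, g

def sl_bfgs(shots, m0, batch=8, memory=8, n_iter=60, seed=0):
    rng = np.random.default_rng(seed)
    m, pairs = m0.copy(), deque(maxlen=memory)  # deque drops the oldest pair automatically
    for _ in range(n_iter):
        idx = rng.choice(len(shots), size=batch, replace=False)   # draw the shot subset
        A = np.vstack([shots[i][0] for i in idx])
        d = np.concatenate([shots[i][1] for i in idx])

        def cost_grad(model):
            r = A @ model - d
            return 0.5 * (r @ r) / len(d), A.T @ r / len(d)

        x1, g1 = cost_grad(m)
        if np.linalg.norm(g1) < 1e-8:
            break
        p1 = -two_loop(g1, pairs)
        alpha, g2 = wolfe_search(cost_grad, m, p1, x1, g1)   # g2 uses the same subset
        s, y = alpha * p1, g2 - g1
        if y @ s > 0:                                        # keep only valid curvature pairs
            pairs.append((s, y))
        p2 = -two_loop(g2, pairs)                            # second update, unit step
        m = m + alpha * p1 + p2
    return m

# Toy usage: 40 consistent "shots" for a 4-parameter model (all values hypothetical).
rng = np.random.default_rng(1)
m_true = rng.normal(size=4)
shots = []
for _ in range(40):
    A_i = rng.normal(size=(20, 4))
    shots.append((A_i, A_i @ m_true))
m_est = sl_bfgs(shots, np.zeros(4))
```

Because both gradients entering each pair are computed on the same subset, the pair carries curvature information rather than sampling noise, which is the point made in the text.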
Numerical experiment
To evaluate the performance of the SL-BFGS algorithm, we performed an acoustic FWI experiment with the classical Marmousi model (Versteeg, 1994). For seismic modelling and gradient calculations, we used the FDTD code of Fabien-Ouellet et al. (2017). The Marmousi model is discretized on a grid with cells of 20 × 20 m². We use the full aperture data, with shots and receivers every 20 m, for a total of 460 shot points. The source is a Ricker wavelet with a central frequency of 7.5 Hz. The source signature and the density are considered fixed in this experiment, and no noise is added to the data. This keeps to a minimum the number of factors that can impact FWI, as we want to focus on the convergence of the different stochastic algorithms.
We compared the performance of SL-BFGS with the stochastic gradient descent (SGD) algorithm. For SL-BFGS, we use a memory length $n$ of 8. The stochastic gradient descent algorithm is identical to Algorithm 3, with the two-loops recursion (lines 5 and 10) replaced by a simple preconditioning of the gradient, $\mathbf{p}_k \leftarrow \mathbf{H}_{pre} \nabla X^{\Omega_k}$. Hence, each iteration step of the two algorithms should have more or less the same cost, with two gradient calculations per iteration. We used the hierarchical inversion strategy of Bunks et al. (1995) and inverted sequentially for increasing discrete frequencies (2, 3, 5, 8, 12 and 16 Hz). In our time domain code, those frequencies are computed with the Fast Fourier Transform. For each frequency, 40 iterations are performed. We started with a linearly increasing P-wave velocity model, from 1500 m/s to 4200 m/s (Figure 1b).

Algorithm 3: Stochastic L-BFGS
1. Inputs: $\mathbf{m}_0$, $\mathbf{d}$
2. while stop_criteria do
3.   Draw $\Omega_k$ from $\mathbf{d}$
4.   Compute $X^{\Omega_k}$, $\nabla X^{\Omega_k}_1$, $\mathbf{H}^{\Omega_k}_{pre,1}$
5.   $\mathbf{p}^1_k \leftarrow$ two-loops recursion($\nabla X^{\Omega_k}_1$, $\mathbf{H}^{\Omega_k}_{pre,1}$)
6.   $\alpha$, $\nabla X^{\Omega_k}_2$, $\mathbf{H}^{\Omega_k}_{pre,2} \leftarrow$ Wolfe search($\mathbf{p}^1_k$, $\nabla X^{\Omega_k}_1$, $X^{\Omega_k}$)
7.   if $k > n$
8.     Discard $\mathbf{s}_{k-n}$, $\mathbf{y}_{k-n}$
9.   $\mathbf{s}_k \leftarrow \alpha \mathbf{p}^1_k$, $\mathbf{y}_k \leftarrow \nabla X^{\Omega_k}_2 - \nabla X^{\Omega_k}_1$
10.   $\mathbf{p}^2_k \leftarrow$ two-loops recursion($\nabla X^{\Omega_k}_2$, $\mathbf{H}^{\Omega_k}_{pre,2}$)
11.   $\mathbf{m}_{k+1} \leftarrow \mathbf{m}_k + \alpha \mathbf{p}^1_k + \mathbf{p}^2_k$
12.   $k \leftarrow k + 1$
13. end while

Figure 1: True Marmousi model (a), initial model (b), SGD inverted model (c) and stochastic L-BFGS inverted model (d).

The inverted models obtained with SGD and SL-BFGS are presented in Figure 1c and 1d respectively. Comparing the results with the true model (Figure 1a), we see that the model above 2 km is very well reconstructed in both cases. Below 2 km, the inversion is more challenging due to poorer illumination, but the velocity magnitude is better reconstructed with the SL-BFGS algorithm. Overall, the resolution of the model obtained with SL-BFGS is higher than that of the SGD inversion. This is due to the better convergence of SL-BFGS, which can take advantage of the curvature information.

To better compare the convergence of both algorithms, the cost function value is plotted against the number of iterations in Figure 2. At the lowest frequency of 2 Hz, both algorithms behave similarly and lead to more or less the same decrease in the cost function, with a faster decrease for SGD. However, from 5 Hz and higher, SGD performance degrades rapidly and the cost function stays above 10%. The SL-BFGS convergence stays much more constant across frequency bands and reaches a plateau below 10% in all cases. This is the main reason why the model obtained by SL-BFGS shows a much better solution: higher frequencies have converged, contrary to SGD.

Figure 2: Cost function value (in %) as a function of iteration number for SGD and SL-BFGS. Increasing frequency bands (2, 3, 5, 8, 12 and 16 Hz) are shown on the top axis.
Conclusions
We proposed a modification of the classical L-BFGS algorithm that supports the random subsampling of sources. The random subsampling allowed a drastic reduction of the computing time over the complete dataset with the Marmousi model: each iteration with the complete dataset would have required 460 shots, whereas we used a batch size of 20 shots with SL-BFGS. This represents roughly 5% of the cost of traditional L-BFGS for the same number of iterations. The second-order information included in SL-BFGS allowed an improved convergence over SGD, at virtually no further computing costs. In summary, the stochastic L-BFGS algorithm allows a much faster convergence than SGD, at a fraction of the cost of the non-stochastic version.
Acknowledgements
This work was supported by the Vanier Canada Graduate Scholarships.
REFERENCES
Bottou, L., 2010, Large-scale machine learning with stochastic gradient descent: Proceedings of
COMPSTAT, 177–186, http://doi.org/10.1007/978-3-7908-2604-3_16.
Brossier, R., S. Operto and J. Virieux, 2009, Seismic imaging of complex onshore structures by 2D elastic
frequency-domain full-waveform inversion: Geophysics, 74, no. 6, WCC105-WCC118,
http://doi.org/10.1190/1.3215771.
Bunks, C., F. M. Saleck, S. Zaleski and G. Chavent, 1995, Multiscale seismic waveform inversion:
Geophysics, 60, 1457–1473, http://doi.org/10.1190/1.1443880.
Castellanos, C., L. Metivier, S. Operto, R. Brossier and J. Virieux, 2015, Fast full waveform inversion
with source encoding and second-order optimization methods: Geophysical Journal International,
200, no. 2, 718–742, http://doi.org/10.1093/gji/ggu427.
Fabien-Ouellet, G., E. Gloaguen and B. Giroux, 2017, Time-domain seismic modeling in viscoelastic
media for full waveform inversion on heterogeneous computing platforms with OpenCL:
Computers & Geosciences, 100, 142–155, http://doi.org/10.1016/j.cageo.2016.12.004.
Komatitsch, D., J. Ritsema and J. Tromp, 2002, The spectral-element method, Beowulf computing, and
global seismology: Science, 298, 1737–42, http://doi.org/10.1126/science.1076024.
Krebs, J. R., J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein and M.-D. Lacasse, 2009,
Fast full-wavefield seismic inversion using encoded sources: Geophysics, 74, no. 6, WCC177–
WCC188, http://doi.org/10.1190/1.3230502.
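Nocedal, J. and S. Wright, 2006, Numerical optimization: Springer Science & Business Media.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 167, 495–503, http://doi.org/10.1111/j.1365-246X.2006.02978.x.
Pratt, R. G., C. Shin and G. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341–362, http://doi.org/10.1046/j.1365-246X.1998.00498.x.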
Schraudolph, N. N., J. Yu and S. Günter, 2007, A stochastic quasi-Newton method for online convex optimization: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, 436–443.
Shin, C., S. Jang and D.-J. Min, 2001, Improved amplitude preservation for prestack depth migration by
inverse scattering theory: Geophysical Prospecting, 49, 592–606,
http://doi.org/10.1046/j.1365-2478.2001.00279.x.
van Leeuwen, T., A. Y. Aravkin and F. J. Herrmann, 2011, Seismic Waveform Inversion by Stochastic
Optimization: International Journal of Geophysics, 2011, 1–18,
http://doi.org/10.1155/2011/689041.
van Leeuwen, T. and F. J. Herrmann, 2013, Fast waveform inversion without source-encoding:
Geophysical Prospecting, 61, 10–19, http://doi.org/10.1111/j.1365-2478.2012.01096.x.
Versteeg, R., 1994, The Marmousi experience: Velocity model determination on a synthetic complex data
set: The Leading Edge, 13, 927–936, http://doi.org/10.1190/1.1437051.
Virieux, J., A. Asnaashari, R. Brossier, L. Métivier, A. Ribodetti and W. Zhou, 2014, An introduction to
full waveform inversion: Encyclopedia of Exploration Geophysics, R1-1–R1-40,
https://doi.org/10.1190/1.9781560803027.entry6.
Virieux, J., H. Calandra and R. E. Plessix, 2011, A review of the spectral, pseudo-spectral, finite-
difference and finite-element modelling techniques for geophysical imaging: Geophysical
Prospecting, 59, no. 5, 794-813, http://doi.org/10.1111/j.1365-2478.2011.00967.x.

Fast building initial velocity modeling using encoding multiscale multi-shot full-waveform
inversion
Yundong Guo *, Jianping Huang, Chao Cui and ZhenChun Li
China University of Petroleum(East China)
Summary
For traditional gradient-based inversion methods, the Hessian matrix is an important factor for stability and computational efficiency. However, the exact Hessian matrix is difficult to obtain during the inversion, so the pseudo-Hessian matrix is widely used. The calculation of the step length is the key step of the pseudo-Hessian method and directly influences the stability and efficiency of the inversion. The conventional step-length calculation is costly because it requires at least two forward modeling runs (linear test, parabolic fitting and so on). In order to improve the accuracy and efficiency of the step-length calculation, this paper combines multi-shot modeling with the step-length calculation, reducing the required number of forward modeling runs. By combining parallel computing and an adaptive algorithm, multiple step lengths can be calculated simultaneously, and the optimal step length is then selected. Compared with the traditional method, the new method has higher computational efficiency and accuracy.

Introduction
Full waveform inversion (FWI) has become an important velocity modeling method in geophysics and has been a research hotspot in recent years. Theoretically, it can make full use of the information in seismic data, including amplitude, phase, travel time and waveform, to describe the parameters of the subsurface model accurately. Since the method was put forward, it has been developed extensively in different data and model domains. FWI can achieve good performance for global and regional tectonics and for seismic exploration.

In the 1980s, Tarantola proposed full-waveform inversion in the time domain based on a generalized least-squares theory (Tarantola, 1984). However, when the initial background velocity model is not accurate, traditional full-waveform inversion often fails to achieve a good inversion result. Because high-frequency data used in the inversion cause the misfit function to be highly nonlinear, FWI can suffer from the local minima problem (Gauthier et al., 1986). The multiscale approach has been undertaken to overcome this local minima problem in the time domain (Bunks et al., 1995) and the frequency domain (Sirgue and Pratt, 2004). Boonyasiriwat et al. (2009) improved the computational efficiency of the multiscale method by using a more efficient non-leaky low-pass filter and an optimized strategy for choosing frequency bands in the time domain. However, compared with single-scale waveform inversion, multiscale FWI has a large computing cost. For FWI in the time domain, the computational cost is directly related to the number of sources, because the misfit function is the sum of the L2-norm distances between observed and synthetic data over the individual sources. One major strategy to increase the efficiency of FWI is to decrease the number of sources, which may damage the quality of the inversion result through lack of illumination. Alternatively, composing a super source with different encoding strategies is a compromise between efficiency and inversion accuracy. The super shot was applied to FWI problems by Krebs et al. (2009). By replacing single-source forward modeling with encoded multi-source forward modeling, FWI based on a super source reduces the dependence of the computational burden on the number of sources and improves the illumination. However, one difficulty of applying FWI based on super shots is the cross-talk noise between different shots, even though it can be partly mitigated by more complex encoding strategies.

In this paper, to further improve the stability and efficiency of conventional FWI, we combine encoded multi-shot FWI (EMSFWI) with multiscale FWI (MTFWI) in the time domain, based on the strategy proposed by Boonyasiriwat et al. (2009). During the multiscale inversion stage, the multi-source strategy is adopted to decrease the computational burden. The output model of the multiscale stage is contaminated by singular values because of cross-talk noise, which can be eliminated by smoothing. In addition, FWI using smoothing kernels can effectively overcome the cycle-skipping problem (Xue et al., 2016). The inverted model can be used as a more accurate initial model for conventional FWI. A synthetic data test verifies the efficiency of the proposed method.

Theory and/or Method
FWI estimates the velocity model by minimizing the seismic wavefield misfit function, where the waveform data residual based on multiple shots can be stated generally as:

$$E(v) = \left\| \sum_{n=1}^{N_s} p(v, s_n) - \sum_{n=1}^{N_s} d_n \right\|^2 \qquad (1)$$
where the sum over time samples and receivers is implied by the norm, $E(v)$ is the objective function, $p(v, s_n)$ is the simulated wavefield for source $s_n$ and model $v$, $d_n$ is the measured seismic data and $N_s$ is the number of source gathers in the seismic survey.

The data residual for polarity-encoded multi-shot full waveform inversion is defined as:

$$E(v) = \left\| \sum_{n=1}^{N_s} e_n \otimes p(v, s_n) - \sum_{n=1}^{N_s} e_n \otimes d_n \right\|^2 \qquad (2)$$

where $e_n$ is the encoding sequence and $\otimes$ denotes convolution in the time domain. Note that, for orthogonal polarity encoding, the $e_n$ form an orthogonal matrix $e$ that satisfies $e e^T = I$.

The data residual for the main frequency $\omega_{tar}$ can be expressed as:

$$E(v, \omega_{tar}) = \left\| \sum_{n=1}^{N_s} e_n \otimes p(v, s_n(\omega_{tar})) - \sum_{n=1}^{N_s} e_n \otimes d_n(\omega_{tar}) \right\|^2 \qquad (3)$$

where $s_n(\omega_{tar})$ is the source Ricker wavelet with main frequency $\omega_{tar}$, and the measured seismic data $d_n(\omega_{tar})$ are obtained by applying the Wiener filter to the raw data, which filters one signal to closely match another target signal (Boonyasiriwat et al., 2009). A low-pass Wiener filter can be computed by:

$$f_{Wiener}(\omega) = \frac{W_{tar}(\omega)\, W_{ori}^{\dagger}(\omega)}{\left| W_{ori}(\omega) \right|^2 + \varepsilon^2} \qquad (4)$$

where $f_{Wiener}$ is the Wiener filter, $W_{ori}$ is the original wavelet, $W_{tar}$ is the low-frequency target wavelet, $\omega$ is the angular frequency, $\varepsilon$ is a small parameter that prevents numerical overflow, and $\dagger$ denotes the complex conjugate.

Equation (3) has the advantage that only one seismic simulation is needed to compute $p$, as opposed to the $N_s$ simulations needed for the conventional FWI objective function. This is because the first term of equation (3) is computed by running the simulator once with all sources acting simultaneously, each source injecting its encoded signature into the model.

By adding a smoothing kernel $S(s)$ to equation (3), we formulate a new velocity update as follows:

$$v_f = S(s) \otimes v_{uo} \qquad (5)$$

where $v_{uo}$ and $v_f$ are the velocity models before and after smoothing by the filter function $S(s)$.

The computational efficiency gain $\eta$ of evaluating equations (1)-(3) is then given by

$$\eta = \frac{t_d}{t_m} N_s \qquad (6)$$

where $N_s$ is the number of shots that form one super-gather, $t_m$ is the time of one iteration of the multi-shot FWI, and $t_d$ is the time of one iteration for one shot of the conventional FWI. Because $t_m \approx t_d$, the time spent on encoded FWI is far less than that of conventional FWI.

Examples
SAG model test
The true velocity model is shown in Figure 1.

Model parameters: the model has a three-layer structure, with 201 × 201 cells at 8-m cell size and velocities from 3500 m/s to 4500 m/s.

Table 1. Parameters for the uniform fixed-spread geometry
Parameter              Measurement
Number of receivers    201
Receiver interval      8 m
Receiver depth         8 m
Number of sources      49
Source interval        32 m
Source depth           8 m
Source wavelet         20 Hz
Trace length           1.2 s

Observation system: a uniform fixed-spread geometry was simulated with the parameters in Table 1. The measured seismic data were generated by using a 20-Hz Ricker wavelet, and the measured data and source signatures were encoded before the objective function and gradient evaluation. The super-gather was formed by encoding and summing all 49 shot gathers. Then the difference gather between the encoded super-shot gather modeled with the initial velocity model and the encoded measured seismic data was obtained, shown in Figure 2a. The difference shot gather, filtered by the Wiener filter to the frequency band of a 5-Hz Ricker wavelet, is shown in Figure 2b. Then we use the filtered synthetic super shot gather as the input.
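The claim that one simulation of the encoded super source replaces $N_s$ single-shot simulations rests on the linearity of the modelled data with respect to the source. The following Python sketch illustrates this under stated assumptions: a random matrix `G` is a hypothetical stand-in for the wave-equation simulator at a fixed model $v$, the polarity codes are random signs (so the time-domain convolution with $e_n$ reduces to a multiplication by +1 or -1), and `efficiency_gain` simply evaluates equation (6). None of these names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_src, n_data = 49, 30, 200

# Hypothetical linear modelling operator for a fixed velocity model:
# it maps a source term to the recorded data.
G = rng.normal(size=(n_data, n_src))
sources = rng.normal(size=(n_shots, n_src))      # one source term per shot
codes = rng.choice([-1.0, 1.0], size=n_shots)    # polarity codes e_n

def simulate(source):
    """Stand-in for one forward modelling run."""
    return G @ source

# Conventional route: one simulation per shot, then encode and sum the gathers.
encoded_sum = sum(e * simulate(s) for e, s in zip(codes, sources))

# Encoded route: encode and sum the sources first, then run a single simulation.
super_source = (codes[:, None] * sources).sum(axis=0)
super_gather = simulate(super_source)

def efficiency_gain(t_d, t_m, n_s):
    """Equation (6): gain of one encoded iteration over n_s single-shot iterations."""
    return t_d / t_m * n_s
```

The two routes give the same super-gather, but the encoded route calls `simulate` once instead of 49 times; when one encoded run costs about as much as one single-shot run, the gain of equation (6) approaches the number of shots.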
Figure 1. True velocity model.

Figure 2. Difference between the super-shot gather modeled with the initial velocity model and the encoded measured seismic data: (a) no filter; (b) gather low-pass filtered by the Wiener filter to the frequency band of a 5-Hz Ricker wavelet.

Figure 3. Velocity models: (a) initial velocity model; (b) encoded MTFWI after 172 iterations; (c) model (b) after the smoothing filter.

The initial velocity model is close to a linear gradient model with velocity increasing from the surface downward, as shown in Figure 3a. In the inversion process, we use the Wiener filter to low-pass the data to Ricker wavelets with frequency bands from 5 Hz to 15 Hz. The inversion results of the method are shown in Figure 3: (1) after 172 iterations, the multiscale EMSFWI recovers the basic structure but shows obvious shot-position imprints, low-frequency noise near the sag and significant cross-talk noise within the layers (Figure 3b), even though this noise can be partly mitigated by more complex encoding strategies; (2) the velocity after smooth filtering suppresses the cross-talk noise in the mid-depth layers and describes the large-scale structure of the subsurface more effectively (Figure 3c).

Inversion results with the different initial velocity models are shown in Figure 4. Traditional FWI with an inaccurate background velocity fails to invert the velocity field inside the sag (Figure 4a). Traditional FWI starting from the multiscale EMSFWI velocity recovers some small structures (Figure 4b), but the result is contaminated by the cross-talk noise in the initial velocity. Traditional FWI starting from the smoothed multiscale EMSFWI velocity recovers the structure accurately (Figure 4c), comparable to the single-source multiscale FWI result after 200 iterations (Figure 4d).
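The low-pass Wiener filter of equation (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function names, the sampling interval, the trace length and the relative regularisation `eps_rel` are all chosen here for the example. The filter is built from a 20-Hz original wavelet and a 5-Hz target wavelet, matching the frequencies quoted in the text.

```python
import numpy as np

def ricker(t, f_peak, t0):
    """Ricker wavelet with peak frequency f_peak (Hz), centred at t0 (s)."""
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def wiener_lowpass(w_ori, w_tar, eps_rel=1e-6):
    """Equation (4): W_tar * conj(W_ori) / (|W_ori|^2 + eps^2), on the rfft grid."""
    W_ori = np.fft.rfft(w_ori)
    W_tar = np.fft.rfft(w_tar)
    eps = eps_rel * np.abs(W_ori).max()      # small damping, scaled to the spectrum
    return W_tar * np.conj(W_ori) / (np.abs(W_ori) ** 2 + eps ** 2)

def apply_filter(trace, f_wiener):
    """Apply the frequency-domain filter to a trace of the same length."""
    return np.fft.irfft(np.fft.rfft(trace) * f_wiener, n=len(trace))

dt, nt = 0.002, 1024                         # assumed sampling, not from the paper
t = np.arange(nt) * dt
w_ori = ricker(t, 20.0, 1.0)                 # original 20-Hz wavelet
w_tar = ricker(t, 5.0, 1.0)                  # low-frequency 5-Hz target wavelet
f_wiener = wiener_lowpass(w_ori, w_tar)
filtered = apply_filter(w_ori, f_wiener)     # should closely match w_tar

freqs = np.fft.rfftfreq(nt, dt)
peak_freq = freqs[np.argmax(np.abs(np.fft.rfft(filtered)))]
```

Applying the same `f_wiener` to every recorded trace shapes the data as if they had been acquired with the target wavelet, which is how the measured data enter equation (3) at each frequency scale.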
Figure 4. Inversion results with different initial velocity models: (a) traditional FWI starting from the velocity in Figure 3a, after 100 iterations; (b) traditional FWI starting from the velocity in Figure 3b, after 100 iterations; (c) traditional FWI starting from the velocity in Figure 3c, after 100 iterations; (d) traditional multiscale FWI starting from the velocity in Figure 3a, after 200 iterations.

The normalized data error convergence of the different inversions is shown in Figure 5. The data errors of multiscale EMSFWI and traditional multiscale FWI settle near a comparatively low error value, since the shot gathers differ at each iteration and at each scale. In contrast, the data error of FWI with the smoothed initial model (Figure 3c) reaches a lower error value and converges faster.
Conclusions
Numerical tests on the SAG model show that multiscale EMSFWI can effectively invert the large-scale structure of the subsurface, and that the smoothing filter suppresses cross-talk noise and thereby improves the inversion of small-scale structures. Multiscale EMSFWI combined with a smoothing filter is a fast and effective method to build the background velocity for FWI, especially for massive seismic data sets. It saves considerable computing cost compared with traditional multiscale FWI. As a further point of discussion, if the smoothing filter is used instead of the Wiener filter in the initial model building and the smoothing function is chosen properly, a good result can also be obtained even when very low frequency components are absent.
Acknowledgments
We are grateful to the Seismic Wave Propagation and Imaging Lab (SWPI) for its financial support. We also thank the reviewers for reviewing this manuscript.
EDITED REFERENCES
REFERENCES
Boonyasiriwat, C., P. Valasek, P. Routh, W. Cao, G. T. Schuster, and B. Macy, 2009, An efficient
multiscale method for time-domain waveform tomography: Geophysics, 74, no. 6, WCC59–
Gauthier, O., J. Virieux, and A. Tarantola, 1986, Two-dimensional nonlinear inversion of seismic
waveforms: Numerical results: Geophysics, 51, 1387–1403, http://dx.doi.org/10.1190/1.1442188.
Krebs, J. R., J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M. D. Lacasse, (2009)
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Xue, Z., N. Alger, and S. Fomel, 2016, Full-waveform inversion using smoothing kernels: 86th Annual

Adaptive Full Waveform Inversion based on Non-stationary Phase Correction in TTI media
Yihua Xuan1, Yong Hu2, Zhenbo Zhang1, Liguo Han2;
1. CNOOC China Ltd. Shenzhen Branch; 2. Jilin University
Summary
Full waveform inversion (FWI) has become one of the important means for high-precision velocity modeling, but there are still many problems in real seismic data processing. In this abstract, we use non-stationary phase rotation to correct the phase of the seismic waveform for FWI. In order to mitigate the cycle-skipping problem, we introduce an adaptive factor to adjust the rotation angle. Because subsurface media are complex and anisotropic, we use a pseudo-spectral method for acoustic forward modeling in TTI media in order to match the kinematics accurately. The numerical experiments demonstrate that adaptive non-stationary phase correction FWI in TTI media can guide the FWI objective function out of local minima and effectively mitigate the cycle-skipping problem.

Introduction
Full waveform inversion (FWI) was first introduced by Lailly (1983) and Tarantola (1984). Pratt (1999) then extended time-domain FWI to the frequency domain, where it can provide an accurate inversion result with only a limited number of selected frequencies. However, FWI is a strongly nonlinear problem and seismic waveforms are complex: if the background velocity is not good enough, the synthetic and recorded waveforms may be mismatched, which is the cycle-skipping problem (Virieux and Operto, 2009). Another problem of FWI is that subsurface media are complex and anisotropic, so acoustic FWI that assumes isotropic media may be inaccurate. It is therefore necessary to incorporate anisotropy in order to match the kinematics accurately.

FWI finds subsurface parameters by minimizing the data residual between the synthetic data and the recorded data, in both kinematics and amplitude. In order to mitigate the waveform mismatch problem, a series of inversion strategies have been proposed. Adaptive waveform inversion, introduced by Warner (2014), is a form of FWI that is immune to the effects of cycle skipping. Bai (2016) introduced a least-squares filter to narrow the phase difference between the synthetic data and the recorded data; the inversion result is very good and effectively mitigates the cycle-skipping problem. Shao (2014) used numerical tests to show that isotropic acoustic FWI cannot obtain accurate inversion results from anisotropic seismic data.

When seismic waves propagate in the subsurface, the media properties may cause local phase rotation of the waveform, especially in tilted transversely isotropic (TTI) media. Forward modeling in an inaccurate velocity model may therefore produce a severe phase difference between the synthetic data and the recorded data. In this abstract, we propose to use the non-stationary phase rotation method to correct this phase difference. We introduce an adaptive factor to adjust the correction applied to the recorded data. The main idea is to first bring the recorded data close to the synthetic data using non-stationary phase correction, and then use the corrected recorded data to guide the FWI objective function out of local minima.

We first give the theoretical formulas of the non-stationary phase correction. We then describe acoustic forward modeling in TTI media using the pseudo-spectral method. After that, we discuss how to conduct the adaptive FWI by adjusting the adaptive factor. Finally, we give numerical inversion results to show that non-stationary phase correction mitigates the cycle-skipping problem and can effectively keep FWI from falling into local minima.

Non-stationary phase correction
Given a real-valued seismic trace $u(t)$, the related complex seismic trace $\tilde{u}(t)$ is obtained by the Hilbert transform:

$$\tilde{u}(t) = u(t) + iH[u(t)] \qquad (1)$$

where the operator $H[\cdot]$ indicates the Hilbert transform. The complex seismic trace can also be expressed as (Luo, 2016):

$$\tilde{u}(t) = \left|\tilde{u}(t)\right| e^{i\phi_u(t)} \qquad (2)$$

where $\left|\tilde{u}(t)\right| = \sqrt{u(t)^2 + H[u(t)]^2}$ is the envelope of the seismic data. In this abstract, we use $u(t)$ to denote the synthetic seismic data and $d(t)$ to denote the recorded data. Similar to equation (2), we have:

$$\tilde{d}(t) = \left|\tilde{d}(t)\right| e^{i\phi_d(t)} \qquad (3)$$

So the difference of instantaneous phase between the recorded data and the synthetic data can be obtained by:

$$\Delta\phi(t) = \phi_u(t) - \phi_d(t) \qquad (4)$$

We can use equation (5) to express the difference between the two phases, in order to avoid the phase wrapping problem:
$$\frac{\tilde{u}(t)}{\tilde{d}(t)} = \frac{\left|\tilde{u}(t)\right|}{\left|\tilde{d}(t)\right|} e^{i\Delta\phi(t)} \qquad (5)$$

According to equation (5), we have:

$$e^{i\Delta\phi(t)} = \frac{\tilde{u}(t)\,\left|\tilde{d}(t)\right|}{\tilde{d}(t)\,\left|\tilde{u}(t)\right|} \qquad (6)$$

So the phase correction of the seismic data can be expressed as:

$$d_{rot}(t) = \mathrm{Re}\left[\tilde{d}(t)\, e^{i\Delta\phi(t)}\right] \qquad (7)$$

where $d_{rot}(t)$ is the non-stationary phase-rotated version of $d(t)$.

In order to test the accuracy and feasibility of non-stationary phase correction, we use Ricker wavelets to synthesize a waveform sequence (Fig.1a), in which the blue line ($u$) is a synthetic waveform and the red line ($d$) is a phase-rotated version of $u$. By using equation (7), we obtain the time-variant phase-rotated waveform, in which the red line has been corrected (Fig.1b). The non-stationary phase correction method has been used for improving the resolution of seismic profiles, but in this abstract we use it in seismic FWI, with the purpose of mitigating the phase difference between the synthetic data and the recorded data.

Fig.1 Non-stationary phase correction; (a) Waveform; (b) Rotated waveform.

Fig.2 shows the objective function values of conventional FWI and non-stationary phase correction FWI; the FWI norm was calculated from the one-trace waveform (Fig.1a). From Fig.2 we can see that the FWI objective function has many local minima, even if the trace only contains two Ricker wavelets (Fig.1a). When the time difference between the two waveforms (synthetic data and recorded data) is larger than half a period, the FWI result may suffer from cycle skipping. At the same time, when seismic waves propagate in real subsurface media, the recorded waveform may have a phase rotation. If we use synthetic data to match phase-rotated recorded data, the FWI norm cannot converge to the true minimum (Fig.2, blue arrow). After applying the non-stationary phase correction, the FWI objective function can converge to the true global minimum (Fig.2, red arrow).

Fig.3 Non-stationary phase correction result; the data are synthetic from the Marmousi model.

Fig.4 Partial enlarged view from Fig.3 (red rectangle).

Fig.3 shows the result of the non-stationary phase correction, where $u$ (black line) denotes the synthetic waveform, $d$ (green line) denotes the recorded waveform and $d_{rot}$ (red line) denotes the recorded waveform corrected by non-stationary phase rotation. From Fig.4, the corrected $d$ becomes close to $u$ (especially at the positions of the red arrows), so we can use the corrected recorded data to guide the FWI objective function out of local minima.
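Equations (1) to (7) can be tried out with a short Python sketch. It is a minimal illustration, not the authors' code: the analytic signal is built with NumPy's FFT, the function names and the small stabiliser `eps` are assumptions made here, and the test trace is a pair of Ricker wavelets rotated by a constant 60 degrees (the method itself allows a time-variant rotation).

```python
import numpy as np

def analytic_signal(x):
    """Complex trace x + i*H[x] (equation 1), built with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def phase_correct(u, d, eps=1e-12):
    """Rotate recorded trace d to the instantaneous phase of synthetic trace u (equations 6-7)."""
    u_c, d_c = analytic_signal(u), analytic_signal(d)
    # exp(i*dphi) = u~ |d~| / (d~ |u~|) = u~ conj(d~) / (|u~| |d~|); eps avoids 0/0.
    rotation = u_c * np.conj(d_c) / (np.abs(u_c) * np.abs(d_c) + eps)
    return np.real(d_c * rotation)

def ricker(t, f_peak, t0):
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

dt = 0.002
t = np.arange(1000) * dt
u = ricker(t, 15.0, 0.6) + 0.8 * ricker(t, 15.0, 1.3)               # synthetic trace
d = np.real(analytic_signal(u) * np.exp(1j * np.deg2rad(60.0)))     # phase-rotated "recorded" trace
d_rot = phase_correct(u, d)                                         # corrected recorded trace
```

Writing the rotation as $\tilde{u}\,\tilde{d}^{*}/(|\tilde{u}||\tilde{d}|)$ avoids computing and unwrapping the two instantaneous phases separately, which is the reason equation (5) is used in the text. Note that the corrected trace keeps the envelope of $d$ and takes only the phase of $u$.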
Fig.2 FWI norm value of the original waveform and the phase-corrected waveform, computed from the trace in Fig.1a.

Acoustic wave equations for TTI media
In this abstract, we use the pure P-wave equation to conduct 2D
TTI media FWI. Compared with the conventional coupled TTI equations, the pure P-wave equation is unconditionally stable (Zhan, 2011). We first give a brief introduction to the completely decoupled wave equation. The exact phase velocity was given by Tsvankin (1996):

$$\frac{V^2(\theta)}{V_{p0}^2} = 1 + \varepsilon\sin^2\theta - \frac{f}{2} + \frac{f}{2}\left(1 + \frac{2\varepsilon\sin^2\theta}{f}\right)\sqrt{1 - \frac{2(\varepsilon - \delta)\sin^2 2\theta}{f\left(1 + \frac{2\varepsilon\sin^2\theta}{f}\right)^2}} \qquad (8)$$

where $\theta$ is the phase angle measured from the symmetry axis, $V_{p0}$ is the P-wave velocity in the direction of the symmetry axis, $\varepsilon$ and $\delta$ are Thomsen's anisotropy parameters (Thomsen, 1986), and $f = 1 - V_{s0}^2 / V_{p0}^2$, with the shear-wave velocity along the symmetry axis denoted by $V_{s0}$.

The square root can be expanded in a Taylor series (i.e., $\sqrt{1 - X} \approx 1 - X/2$), so the pure P-wave phase velocity can be approximated by the following expression:

$$\frac{V^2(\theta)}{V_{p0}^2} = 1 + 2\varepsilon\sin^2\theta - \frac{(\varepsilon - \delta)\sin^2 2\theta}{2\left(1 + \frac{2\varepsilon\sin^2\theta}{f}\right)} \qquad (9)$$

where $\sin\theta = V(\theta)k_x/\omega$, $\cos\theta = V(\theta)k_z/\omega$, $V^2(\theta) = \omega^2/(k_x^2 + k_z^2)$ and $V_{s0} = 0$. Substituting these into equation (9), and neglecting the $2\varepsilon\sin^2\theta/f$ term in the denominator, we obtain:

$$\omega^2 = V_{p0}^2\left[(1 + 2\varepsilon)k_x^2 + k_z^2 - \frac{2(\varepsilon - \delta)k_x^2 k_z^2}{k_x^2 + k_z^2}\right] \qquad (10)$$

The TTI equation can be obtained by rotating the $z$ axis counterclockwise by the tilt angle $\theta_0$. The rotated coordinate system can be written as:

$$\begin{pmatrix}\hat{k}_x\\ \hat{k}_z\end{pmatrix} = \begin{pmatrix}\cos\theta_0 & \sin\theta_0\\ -\sin\theta_0 & \cos\theta_0\end{pmatrix}\begin{pmatrix}k_x\\ k_z\end{pmatrix} \qquad (11)$$

and we have:

$$\hat{k}_x^2 = k_x^2\cos^2\theta_0 + k_x k_z\sin 2\theta_0 + k_z^2\sin^2\theta_0, \qquad \hat{k}_z^2 = k_x^2\sin^2\theta_0 - k_x k_z\sin 2\theta_0 + k_z^2\cos^2\theta_0 \qquad (12)$$

Replacing $k_x^2$, $k_z^2$ with $\hat{k}_x^2$, $\hat{k}_z^2$, we get the pure P-wave equation for TTI media. Applying the inverse Fourier transform to both sides of the wave equation and using the relation $i\omega \leftrightarrow \partial/\partial t$, we finally get the pure P-wave equation in the time-wavenumber domain for 2D TTI media (Zhan, 2011):

$$\frac{1}{V_p^2}\frac{\partial^2 u(k_x, k_z, t)}{\partial t^2} = -\Big\{ k_x^2 + k_z^2 + (2\delta\sin^2\theta_0\cos^2\theta_0 + 2\varepsilon\cos^4\theta_0)\frac{k_x^4}{k_x^2 + k_z^2} + (2\delta\sin^2\theta_0\cos^2\theta_0 + 2\varepsilon\sin^4\theta_0)\frac{k_z^4}{k_x^2 + k_z^2} + (-\delta\sin^2 2\theta_0 + 3\varepsilon\sin^2 2\theta_0 + 2\delta\cos^2 2\theta_0)\frac{k_x^2 k_z^2}{k_x^2 + k_z^2} + (-\delta\sin 4\theta_0 + 4\varepsilon\sin 2\theta_0\cos^2\theta_0)\frac{k_x^3 k_z}{k_x^2 + k_z^2} + (\delta\sin 4\theta_0 + 4\varepsilon\sin 2\theta_0\sin^2\theta_0)\frac{k_z^3 k_x}{k_x^2 + k_z^2} \Big\}\, u(k_x, k_z, t) \qquad (13)$$

Fig.5 P-wave impulse response (snapshot at 0.4 s). (a) Isotropic media; (b) TTI media ($\varepsilon = 0.25$, $\delta = 0.1$, $\theta_0 = 45^\circ$).

Fig.6 Recorded seismic data. (a) The difference of recorded seismic data between isotropic media and TTI media ($d_{iso} - d_{TTI}$). (b) One trace data.
The wave equation forward modeling in isotropic media
Equation (13) shows the time wave number domain wave was shown in Fig.5a, and the forward modeling in TTI
media was shown in Fig.5b. By comparing the two
equation for TTI media. Where V p denotes P-wave velocity,
snapshot, we can see that the wave front in TTI media is
 is the tilt angle of symmetry axis, k x is the horizontal not a circle. Fig.6 shows the difference of seismic data
wave number and k z is the vertical wave number). between isotropic media and TTI media, which was

forward modeled from the Marmousi model. From Fig.6b, we can see that the two records differ considerably in travel time and waveform phase.

Adaptive full waveform inversion

In adaptive non-stationary phase correction FWI (APFWI), we do not want the phase of the recorded waveform to be exactly the same as that of the synthetic waveform, because if there is no difference between the synthetic data and the recorded data, the FWI process cannot update the velocity model. So we introduce an adaptive factor to conduct the APFWI. The modified non-stationary phase rotation equation is:

$$d_{rot}(t) = \mathrm{Re}\left[\tilde{d}\,e^{i\alpha\varphi}\right] \qquad (14)$$

where the adaptive factor $\alpha \in [0,1]$. The objective function of APFWI aims to minimize the difference between the synthetic data ($u$) and the corrected recorded data ($d_{rot}$). Since APFWI only needs to match $u$ with $d_{rot}$, the objective function can be expressed by the following equation:

$$E(V_p) = \frac{1}{2}\sum_{s}\sum_{r}\int_{0}^{T}\left(u - d_{rot}\right)^{2}dt \qquad (15)$$

Substituting equation (14) into equation (15), we have:

$$E(V_p) = \frac{1}{2}\sum_{s}\sum_{r}\int_{0}^{T}\left[u - \mathrm{Re}\left(\left(\frac{\tilde{u}\tilde{d}^{*}}{\left|\tilde{u}\tilde{d}^{*}\right|}\right)^{\alpha}\tilde{d}\right)\right]^{2}dt \qquad (16)$$

The adjoint-state method is employed to calculate the gradient, and according to equation (16) we can obtain the gradient of multi-source acoustic APFWI in TTI media. The gradient operator is the same as that of conventional FWI, as shown in equation (17):

$$\frac{\partial E(v)}{\partial v} = \frac{2}{v^{3}}\sum_{s}\sum_{r}\frac{\partial^{2}P_f}{\partial t^{2}}\,P_b \qquad (17)$$

where $P_f$ denotes the incident wavefield and $P_b$ denotes the back-propagated adjoint-source wavefield.

Adaptive non-stationary phase correction FWI test

We apply adaptive non-stationary phase correction FWI (APFWI) to the modified Marmousi model shown in Fig.7a. The grid size of this modified Marmousi model is 69 × 192, with a grid interval of 12.5 m, and the velocity ranges from 1.5 km/s to 4 km/s. The initial velocity model is a linear model (Fig.7b). There are 192 receivers and 20 shots equally spaced on the surface. The seismic source function is a Ricker wavelet with a dominant frequency of 18 Hz. The recording time is 1.8 s with a time interval of 1 ms.

Fig.7 Velocity models. (a) True velocity model; (b) Initial velocity model.

Fig.8 FWI result in TTI media ($\varepsilon = 0.25$, $\delta = 0.1$, $\phi = 45^{\circ}$); (a) Conventional FWI result; (b) APFWI result.

From Fig.8a, we can see that conventional FWI suffers from a severe cycle-skipping problem and the inversion result is very poor. To address the cycle-skipping problem, we use APFWI with the same parameters. The inversion result, shown in Fig.8b, is much better than the conventional FWI result. The numerical results show that APFWI is a good method to mitigate the cycle-skipping problem for FWI.

Conclusion

When the recorded seismic data has a phase deflection caused by complex subsurface media, the conventional FWI objective function has the limitation that the desired solution is not located at the global minimum. To keep FWI from falling into local minima and to guide the objective function toward the global minimum, we introduce an adaptive factor to conduct adaptive non-stationary phase correction FWI. The numerical experiments demonstrate that adaptive FWI in TTI media is a good way to help the objective function escape local minima, and that it can effectively mitigate the cycle-skipping problem.

The next step: in FWI of field data, the estimated seismic source function may also have a phase deflection with respect to the real one, so the phase correction method may obtain a good inversion result even if the phase of the source function is not accurate.
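As a rough illustration of the phase rotation in equation (14) and the misfit in equation (15), the following NumPy sketch rotates a recorded trace toward a synthetic trace by a fraction alpha of their instantaneous phase difference. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the FFT-based analytic signal, the use of the phase of the product of the synthetic analytic signal with the conjugate of the recorded one as the phase correction, and the two Gaussian-windowed test traces are all illustrative choices.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real trace, built with the FFT (Hilbert transform)."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def rotate_recorded(u, d, alpha):
    """d_rot = Re[d_analytic * exp(i * alpha * phase)], cf. equation (14).

    Assumption: the phase correction is the instantaneous phase difference
    between the synthetic trace u and the recorded trace d.
    """
    u_a, d_a = analytic_signal(u), analytic_signal(d)
    phase = np.angle(u_a * np.conj(d_a))
    return np.real(d_a * np.exp(1j * alpha * phase))

def apfwi_misfit(u, d, alpha, dt):
    """Single-trace version of the objective in equation (15)."""
    r = u - rotate_recorded(u, d, alpha)
    return 0.5 * dt * np.sum(r ** 2)

# Hypothetical test traces: same envelope, recorded trace shifted by 60 degrees.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
env = np.exp(-((t - 0.5) / 0.1) ** 2)
u = env * np.cos(2 * np.pi * 10 * t)
d = env * np.cos(2 * np.pi * 10 * t + np.pi / 3)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, apfwi_misfit(u, d, alpha, dt))
```

With alpha = 0 the recorded trace is left unchanged, and with alpha = 1 it takes on the phase of the synthetic trace, so intermediate values leave part of the phase difference available to drive the velocity update.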

EDITED REFERENCES
REFERENCES
Du, X., J. C. Bancroft, and L. Lines, 2007, Anisotropic reverse-time migration for tilted TI media:
Geophysical Prospecting, 55, 853–869, http://dx.doi.org/10.1111/j.1365-2478.2007.00652.x.
Fomel, S., 2007, Local seismic attributes: Geophysics, 72, A29–A33,
http://dx.doi.org/10.1190/1.2437573.
Zhan, G., R. C. Pestana, and P. L. Stoffa, 2011, An acoustic wave equation for pure P wave in 2D TTI media: SEG Technical Program Expanded Abstracts, 168–173, http://dx.doi.org/10.1190/1.3627529.
Bai, L., L. Han, F. Zhang, P. Zhang, and Y. Hu, 2016, Multiscale adaptive full waveform inversion based
on the wavelet transform: 78th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, https://doi.org/10.3997/2214-4609.201600645.
Guasch, L., and M. Warner, 2014, Adaptive waveform inversion - FWI without cycle skipping -
applications: 76th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
http://dx.doi.org/10.3997/2214-4609.20141093.
Lailly, P., 1983, The seismic inverse problem as a sequence of before-stack migrations, in J. Bednar, ed.,
Conference on Inverse Scattering: Theory and Applications: Society for Industrial and Applied
Mathematics, Philadelphia, 206–220.
Luo, J., R. S. Wu, and F. Gao, 2016, Time-domain full-waveform inversion using instantaneous phase
with damping: 86th Annual International Meeting, SEG, Expanded Abstracts, 1472–1476.
van der Baan, M., and S. Fomel, 2009, Nonstationary phase estimation using regularized local kurtosis
maximization: Geophysics, 74, A75–A80, http://dx.doi.org/10.1190/1.3213533.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain; part 1, theory and verification in
a physical scale model: Geophysics, 64, 888–901, http://dx.doi.org/10.1190/1.1444597.
Shao, B., J. Huang, and Q. Li, 2014, An anisotropic gradient acoustic full waveform inversion in TTI
media: Beijing 2014 International Geophysical Conference & Exposition, 21–24 April 2014,
720–722, http://dx.doi.org/10.1190/IGCBeijing2014-184.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Tsvankin, I., 1996, P-wave signatures and notation for transversely isotropic media: an overview:
Geophysics, 61, 467–483, http://dx.doi.org/10.1190/1.1443974.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion: theory: 84th Annual International
Meeting, SEG, Expanded Abstracts, 5183, https://doi.org/10.1190/segam2014-0371.1.

Frequency-domain full waveform inversion with an efficient iterative solver approach
Fei Xie, Jianping Huang and Zhenchun Li, School of Geosciences, China University of Petroleum (East China)
Summary

The huge computational complexity is one of the major challenges in the application of full waveform inversion (FWI) technology. In this study, an efficient iterative solver approach is proposed and applied to frequency-domain acoustic full-waveform inversion. By constructing the objective function for wavefield iteration, and deriving the corresponding gradient and step-length formula, the new method transforms the forward propagation and the back-propagation of the residual data into an unconstrained optimization problem. In theory, the computational efficiency of the new method is significantly higher than that of conventional FWI. In the numerical experiment, the method obtains high-precision forward wavefields and back-propagated wavefields of residual data within several iterations, and the convergence rate is clearly higher than that of the unpreconditioned GMRES method. With the aid of high-efficiency source encoding technology, the calculation time of the new method is only 1/8 of that of conventional FWI, which is consistent with the theoretical analysis (number of wavefield iterations $n_{iter} = 8$, number of model unknowns $N = 4\times10^{4}$). When $n_{iter} = 5$, the inversion result of the new method is basically the same as that of conventional FWI.

Introduction

Full waveform inversion (FWI) seeks to iteratively estimate subsurface properties by minimizing the difference between observed and modeled data (Lailly, 1983; Tarantola, 1984; Pratt et al., 1998). Although FWI overcomes several limitations of some of the common imaging techniques and can provide high-resolution estimates of the medium parameters, one of its main problems is the computational cost of the inversion for multiple sources and receivers.

Simultaneous source encoding is currently the main method for handling a large number of sources. The concept of phase-encoded shot gathers was first introduced in prestack migration by Romero et al. (2000). Krebs et al. (2009) applied random phase encoding to FWI in the time domain, and generated new encoded supershots at every iteration to suppress the crosstalk that arises from the correlation between shots in the supershot. This approach has also been implemented in frequency-domain FWI, where it seems more sensitive to random noise in the data, depending on the source assembling method (Ben-Hadj-Ali et al., 2011).

In addition to source encoding, other methods have been used to reduce the computation of FWI. A reduction in computational cost based on a batch sampling method was proposed by van Leeuwen and Herrmann (2013). This method uses a gradually increasing batch of sources. As the batch size grows, the method behaves like conventional optimization, allowing for fast convergence. A finite-difference contrast source inversion (FD-CSI) method is described by Abubakar et al. (2009). For the FD-CSI method, when a direct solver is employed to solve the forward problem, the corresponding matrix factorization is conducted only once regardless of the number of inversion iterations. Since the first step of the FD-CSI method is to calculate the contrast source of each shot, the efficiency of the calculation will be challenged when the number of shots is too large. Secondly, this inversion method is special and differs from the conventional FWI method. The poor portability restricts its development.

In the present study, we implement frequency-domain FWI with an efficient iterative solver approach. Like the FD-CSI method, the new method also needs only one matrix factorization per frequency. We first decompose the wavefield into a background field and a scattered field, and construct the objective function for wavefield iteration. Then, the gradient formula and the calculation method of the step length are provided. Finally, we apply our method to a Sag velocity model and a resampled Marmousi model. The final results demonstrate the efficiency and effectiveness of our approach.

Theory

In the frequency domain, the acoustic-wave equation with constant density in 2D is governed by the following equation (Marfurt, 1984)

$$\left[\nabla^{2} + k^{2}(\mathbf{r},\omega)\right]u(\mathbf{r},\mathbf{r}_s,\omega) = f(\omega)\,\delta(\mathbf{r}-\mathbf{r}_s), \qquad (1)$$

where $\mathbf{r} = (x,z)$ denotes the subsurface location in Cartesian coordinates, $\mathbf{r}_s$ is the source position, $\nabla^{2}$ is the Laplace operator, $\delta(\mathbf{r}-\mathbf{r}_s)$ indicates the Dirac delta function, $f(\omega)$ is the source term and $\omega$ is the angular frequency. Here, $u(\mathbf{r},\mathbf{r}_s,\omega)$ represents the scalar pressure wavefield in the frequency domain, and $k(\mathbf{r},\omega) = \omega/v(\mathbf{r})$ denotes the wavenumber, in which $v(\mathbf{r})$ is the acoustic velocity.

We split the total field into its background field and scattered field, $u = u_{bac} + u_{sct}$. The background field satisfies the equation (Abubakar et al., 2009)

$$\left[\nabla^{2} + k_b^{2}(\mathbf{r},\omega)\right]u_{bac}(\mathbf{r},\mathbf{r}_s,\omega) = f(\omega)\,\delta(\mathbf{r}-\mathbf{r}_s), \qquad (2)$$

where $k_b$ is the wavenumber of the background medium. By subtracting equation 2 from equation 1 and using the definition of the scattered field, the scattered field can be written as

$$\left[\nabla^{2} + k_b^{2}(\mathbf{r},\omega)\right]u_{sct}(\mathbf{r},\mathbf{r}_s,\omega) = \chi(\mathbf{r},\omega)\,u(\mathbf{r},\mathbf{r}_s,\omega), \qquad (3)$$

where $\chi(\mathbf{r},\omega) = k_b^{2}(\mathbf{r},\omega) - k^{2}(\mathbf{r},\omega)$. By using a linear operator $L_b$, the solution of equation 3 can be formally written as

$$u_{sct}(\mathbf{r},\mathbf{r}_s,\omega) = L_b\left[\chi(\mathbf{r},\omega)\,u(\mathbf{r},\mathbf{r}_s,\omega)\right]. \qquad (4)$$

For convenience, in the following we no longer write the location $\mathbf{r}$, the angular frequency $\omega$, etc. explicitly. We can define the objective function for the wavefield iteration as follows

$$C(u) = \left\| u - u_{bac} - L_b[\chi u] \right\|^{2}, \qquad (5)$$

where the linear operator $L_b$ is determined by the decomposition of the impedance matrix of the background medium and remains unchanged at a given frequency. In this study, the conjugate gradient method with the Polak-Ribière search direction is employed to minimize objective function 5. The gradient can be shown to be

$$\nabla_u C = 2\left\{ r(u) - \chi^{*} L_b^{\dagger}[r(u)] \right\}, \qquad (6)$$

$$r(u) = u - u_{bac} - L_b[\chi u], \qquad (7)$$

where operator $L_b^{\dagger}$ represents the adjoint of operator $L_b$. After the search direction $w$ is obtained, the step length $\alpha$ can be found by minimizing $C(u + \alpha w)$, i.e., by setting $\partial C(u + \alpha w)/\partial\alpha = 0$.

The derivation above solves for the forward wavefields. Another important part of FWI is solving for the back-propagated wavefields of the residual data. Equation 5 is also suitable for solving the backward wavefields. It is worth noting that the conjugate of the residual data acts as the source and changes constantly during the model update, so the background wavefield must be recalculated in each inversion iteration.

The operators $L_b$ and $L_b^{\dagger}$ are computed only once, at the beginning of the inversion process. A direct solver is a good choice if one can afford the large memory requirement of the LU decomposition. The step length for updating the model is computed by parabolic fitting, so two additional forward simulations are needed to estimate the step length. The computational complexities of our new method and of conventional FWI with the CG method are estimated respectively by

$$T_{new} = O(N^{1.5}) + N_{iter}\,n_{iter}\,O(N_S N\log N), \qquad (8)$$

$$T_{CG} = N_{iter}\left[O(N^{1.5}) + O(N_S N\log N)\right], \qquad (9)$$

in which $N$, $N_S$, $N_{iter}$ and $n_{iter}$ respectively represent the number of unknown parameters, sources, model iterations and wavefield iterations. The terms $O(N^{1.5})$ and $O(N\log N)$ denote the cost of the LU decomposition and of the back-substitutions respectively. Pratt and Worthington have pointed out that the back-substitutions take less than 5% of the CPU time of the matrix decomposition (Pratt and Worthington, 1990). The new method as an iterative solver approach will be very efficient when $N_S$ is reduced by efficient source encoding and parallel techniques.

Numerical Examples

In this section, we first examine the effectiveness and efficiency of the new method in generating the wavefields. We consider the Sag velocity model, Figure 1a, discretized on a 150 × 100 grid with a grid spacing of 10 m in the vertical and horizontal directions. Figure 1b is a starting model obtained by smoothing the true velocity model. The only source is located at $\mathbf{r} = (\,\cdot\,, 20)$ m, and 150 receivers are regularly placed at the surface. For the source function, we used a Ricker wavelet with a 10 Hz central frequency.

Figure 1: (a) The sag model and (b) the initial smoothing model.

Figure 2a is the forward wavefield (real part) of the 8 Hz data generated by LU decomposition of the true model, and Figure 2b shows the forward wavefield generated by LU decomposition of the starting model. Figure 2c is the wavefield obtained by the new method after 10 wavefield iterations using Figure 2b as the background wavefield. Figure 2d is the wavefield (using Figure 2b as the initial value) solved by the GMRES method without preconditioning, with the same iteration time as Figure 2c (about 0.7 s). The third row in Figure 2 is the difference between the second row and Figure 2a. It can be seen that the residual wavefield after the GMRES iterations (Figure 2g) is smaller than that of Figure 2e, and the residual after the iterations of the new method (Figure 2f) approaches zero, so the result is consistent with the reference wavefield (Figure 2a).

In this study, the iterative solver approach must be combined with the source encoding technique and the hardware parallel acceleration method to compress $N_S$, in

order to give full play to the computing efficiency of the new method. When $N_S$ is very small, commonly used iterative solvers (e.g., GMRES) also become an option to consider. Figure 3a shows the normalized wavefield error of the two iterative methods versus iteration time. The convergence rate of the new method is significantly higher than that of the GMRES method (the time of the new method does not include the matrix decomposition, because only one decomposition is done per frequency). Figure 3b is the curve of the normalized wavefield error versus the number of iterations. The residuals have substantially converged after about six iterations.

Then, the inversion effect of the new method is tested on data associated with the Marmousi model, where the original model is modified to dimensions of 3.75 km × 1.5 km with a 15 m grid interval. We distribute 125 sources at the surface with a source interval of 30 m, and 250 receivers are regularly placed at the surface. The truncated singular value (TSV) encoding method (Godwin and Sava, 2010) is used to encode the 125 shots into 15 supershots, and a parallel inversion strategy is adopted. We select 12 frequencies ranging from 4 to 36 Hz, and the inversion result of the previous frequency is chosen as the initial guess for the next frequency. The maximum number of inversion iterations is $N_{iter} = 20$ for each frequency.

The inverted model of conventional FWI is shown in Figure 4a. When $n_{iter}$ is 10, 8, 5 and 2, the inversion results of the new method correspond to Figure 4(b-e). Due to the influence of cross-talk, the low-velocity layer in the inversion results is visibly affected. Overall, however, the inversion results in Figure 4(a-d) are good and very close to each other. When $n_{iter}$ is equal to 2, the inverted model falls into a local minimum because of the small number of wavefield iterations. Figure 5 plots the computation time as a function of the iteration number. The computation time of conventional FWI is significantly higher than that of the new method, and the smaller $n_{iter}$ is, the less computation time is needed. In this test, there are 20-grid-point PML boundaries around the model. The conventional FWI calculation time is about 8 times that of the new method when $n_{iter}$ equals 8 ($N = 4\times10^{4}$).

Conclusions

In this study, an efficient iterative solver approach is proposed and applied to frequency-domain acoustic full-waveform inversion. The new method converts the calculation of frequency-domain wavefields into an unconstrained optimization problem, and it can generate high-precision forward and backward wavefields within several iterations. The convergence rate of the normalized error of the new method is significantly higher than that of the unpreconditioned GMRES method. The main reason is that the new method is an unconstrained optimization method based on the CG method, which has a clear search direction and step length, whereas the GMRES method requires preconditioning, and a suitable preconditioner is usually difficult to obtain. With the efficient source encoding technique, the calculation time of the new method is clearly lower than that of conventional FWI, and the inversion result is similar.

Acknowledgments

This research is supported in part by the National Natural Science Foundation of China (grants 11501302 and 91646116), and the Scientific and Technological Support Project (Society) of Jiangsu Province (grant BE2016776).

Figure 3: Normalized error of the wavefield versus (a) computation time (red line: GMRES method, black line: new method) and (b) iterations (new method).

Figure 5: The computation time versus iterations for the conventional FWI method and the new method with the number of wavefield iterations equal to 10, 8, 5, and 2, respectively.

Figure 2: The real part of the forward wavefields of the 8 Hz data. The forward wavefield obtained by LU decomposition of (a) the true model and (b) the initial model. The forward wavefield obtained by (c) the new method and (d) the GMRES method with the same computing time. The third row, (e) to (g), is generated by subtracting (a) from the second row. For example, (e) is the difference between (b) and (a).
Figure 4: Reconstructed Marmousi velocity model obtained using (a) conventional FWI and the new method with (b) $n_{iter} = 10$, (c) $n_{iter} = 8$, (d) $n_{iter} = 5$ and (e) $n_{iter} = 2$.
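The wavefield iteration of equations 5 to 7 can be sketched on a small 1D toy problem. The block below is a minimal illustration under stated assumptions, not the authors' code: the 1D finite-difference Helmholtz operator with Dirichlet ends, the artificial complex damping factor (used only to keep the toy operators well conditioned), the dense inverse standing in for reusable LU factors, and all function names are hypothetical. It minimizes C(u) with Polak-Ribière conjugate gradients and an exact step length, which is available in closed form because C is quadratic in u.

```python
import numpy as np

def wavefield_iteration(Lb, chi, u_bac, n_iter=40):
    """Minimise C(u) = ||u - u_bac - Lb[chi*u]||^2 with Polak-Ribiere CG."""
    M = lambda x: x - Lb @ (chi * x)                       # x -> x - L_b[chi x]
    MH = lambda y: y - np.conj(chi) * (Lb.conj().T @ y)    # adjoint of M
    u = u_bac.copy()
    r = M(u) - u_bac                 # residual r(u), equation 7
    g = MH(r)                        # gradient of equation 6 without the factor 2
    w = -g
    history = [np.vdot(r, r).real]
    for _ in range(n_iter):
        Mw = M(w)
        denom = np.vdot(Mw, Mw).real
        gg = np.vdot(g, g).real
        if denom == 0.0 or gg == 0.0:
            break
        alpha = -np.vdot(Mw, r).real / denom   # exact minimiser of C(u + alpha w)
        u = u + alpha * w
        r = r + alpha * Mw
        g_new = MH(r)
        beta = max(0.0, np.vdot(g_new, g_new - g).real / gg)
        w = -g_new + beta * w
        g = g_new
        history.append(np.vdot(r, r).real)
    return u, history

def demo(n=200, h=10.0, freq=5.0, v_b=2000.0):
    omega = 2.0 * np.pi * freq
    v = np.full(n, v_b)
    v[80:120] = 2100.0                        # small velocity perturbation
    damp = 1.0 + 0.2j                         # assumed attenuation for the toy problem
    kb2 = damp * (omega / v_b) ** 2 * np.ones(n)
    k2 = damp * (omega / v) ** 2
    off = np.ones(n - 1)
    D2 = (np.diag(off, -1) - 2.0 * np.eye(n) + np.diag(off, 1)) / h ** 2
    s = np.zeros(n, dtype=complex)
    s[20] = 1.0
    Lb = np.linalg.inv(D2 + np.diag(kb2))     # stands in for the stored LU factors
    u_bac = Lb @ s                            # background wavefield
    u_ref = np.linalg.solve(D2 + np.diag(k2), s)   # direct solve in the true model
    u, history = wavefield_iteration(Lb, kb2 - k2, u_bac)
    return u, u_ref, history

u, u_ref, history = demo()
print(np.linalg.norm(u - u_ref) / np.linalg.norm(u_ref), history[0], history[-1])
```

Each iteration only applies the background operator and its adjoint, which corresponds to the back-substitution cost in equation 8, while the factorization of the background operator is done once.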

EDITED REFERENCES
REFERENCES
Abubakar, A., W. Hu, T. Habashy, and P. M. van den Berg, 2009, Application of the finite-difference
contrast-source inversion algorithm to seismic full-waveform data: Geophysics, 74, no. 6,
WCC47–WCC58, https://doi.org/10.1190/1.3250203.
Ben-Hadj-Ali, H., S. Operto, and J. Virieux, 2011, An efficient frequency-domain full waveform inversion
method using simultaneous encoded sources: Geophysics, 76, no. 4, R109–R124,
https://doi.org/10.1190/1.3581357.
Godwin, J., and P. Sava, 2010, Simultaneous source imaging by amplitude encoding, Technical Report
CWP-645, Center for Wave Phenomena, Colorado School of Mines, Golden.
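Krebs, J. R., J. E. Anderson, D. Hinkley, R. Neelamani, S. Lee, A. Baumstein, and M.-D. Lacasse, 2009,
Fast full-wavefield seismic inversion using encoded sources: Geophysics, 74, no. 6, WCC177–WCC188, https://doi.org/10.1190/1.3230502.
Lailly, P., 1983, The seismic inverse problem as a sequence of before-stack migrations: Conference on
Inverse Scattering, Theory and Application, Society for Industrial and Applied Mathematics, 206–220.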
Marfurt, K., 1984, Accuracy of finite-difference and finite-elements modeling of the scalar and elastic
wave equation: Geophysics, 49, 533–549, https://doi.org/10.1190/1.1441689.
Pratt, R. G., and M. H. Worthington, 1990, Inverse theory applied to multisource cross-hole tomography:
Geophysical Prospecting, 38, 287–310, https://doi.org/10.1111/j.1365-2478.1990.tb01847.x.
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space
seismic waveform inversion: Geophysical Journal International, 133, 341–362, https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Romero, L. A., D. C. Ghiglia, C. C. Ober, and S. A. Morton, 2000, Phase encoding of shot records in
prestack migration: Geophysics, 65, 426–436, https://doi.org/10.1190/1.1444737.
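Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, https://doi.org/10.1190/1.1441754.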
van Leeuwen, T., and F. J. Herrmann, 2013, Fast waveform inversion without source-encoding:
Geophysical Prospecting, 61, 10–19, https://doi.org/10.1111/j.1365-2478.2012.01096.x.

Fast Full Waveform Inversion using a Schur complement based frequency-domain finite-
difference modeling
Debanjan Datta*, Piyoosh Jaysaval, Mrinal K. Sen, and Adrien Arnulf (Institute for Geophysics, The University
of Texas at Austin)
Summary

Full Waveform Inversion (FWI) iteratively updates an initial model using a scaled gradient of the objective function until a satisfactory match is found between the observed and synthetic data. Each iteration of FWI requires two forward evaluations to compute the gradient and several other evaluations to compute a step length. The numerous forward evaluations make FWI computationally expensive for routine use. However, there are parts of the model (e.g., the water column in a marine environment) that do not change their velocities. Still, in FWI with standard finite-difference modeling, we have to recompute the effect of the constrained part multiple times because the velocity of the unconstrained part changes. We propose an FWI with an efficient frequency-domain finite-difference (FDFD) modeling approach that partitions the model into a constrained region and an unconstrained region. The contribution of the constrained region is precomputed only once in the form of a Schur complement, and subsequent modeling is done by just solving a comparatively small Schur complement system. This approach saves significant run time compared to solving for the whole model again and again, as done with standard FDFD modeling. We demonstrate the efficiency of our approach using a deep-water synthetic model and the Sigsbee2A model.

Introduction

Full Waveform Inversion (FWI) (Tarantola, 1984; Virieux and Operto, 2009) requires several passes of forward modeling to compute the model updates iteratively and obtain an optimal model for which the misfit between its synthetic data and the observed data is below a tolerance level. As the forward modeling kernel is computationally expensive, the FWI workflow as a whole becomes prohibitive to use on a routine basis.

In general, the starting model for FWI incorporates a considerable amount of a priori information and is developed after several passes of migration velocity analysis. This allows certain regions of the velocity model to remain constant during the course of the inversion. A good example is the water column in a marine environment, whose velocity remains unchanged during the workflow. In such a situation, FWI with standard finite-difference or finite-element modeling methods computes the response of the entire computational domain even if certain regions remain constrained, thereby costing crucial wall clock time and needlessly making FWI more expensive.

In this paper, for the above constrained-velocity situations, we present an FWI with an efficient frequency-domain finite-difference (FDFD) modeling approach that reduces the computational cost by using a Schur complement approach, following the work done by Jaysaval et al. (2014). In this approach, we precompute the contribution of the constrained region only once and for all in the form of a Schur complement. We then solve the Schur complement system to get responses in the unconstrained region. For each subsequent simulation for a model with new unconstrained-region velocities, one only needs to modify the Schur complement using some simple algebraic operations and then solve the modified Schur complement system. Therefore, for subsequent simulations, the large linear system solution is reduced to the solution of only a comparatively small Schur complement linear system. This approach leads to significant savings in modeling time, and hence in inversion time, compared to FWI with standard FDFD modeling. We use the sparse direct solver MUMPS (Amestoy et al., 2001, 2006) to precompute the Schur complement as well as to solve the Schur complement system. We apply our proposed approach to a deep-water synthetic model and the Sigsbee2A model to demonstrate the efficiency.

Theory

We use the constant-density acoustic wave equation formulation given in the frequency domain as

$$\nabla^{2}P(\mathbf{x}|\mathbf{x}_s,\omega) + \frac{\omega^{2}}{v^{2}(\mathbf{x})}P(\mathbf{x}|\mathbf{x}_s,\omega) = s(\mathbf{x}_s,\omega), \qquad (1)$$

where $\mathbf{x}$ is the position vector, and $v$ and $\omega$ are the velocity and angular frequency, respectively. $P$ is the pressure field and $s$ is the source located at $\mathbf{x}_s$.

To simulate the wavefield $\mathbf{p}$, we discretize equation 1 using standard finite differences with a five-point stencil. This leads to a linear system

$$\mathbf{A}\mathbf{p} = \mathbf{s}, \qquad (2)$$

where $\mathbf{A}$ is the complex-valued impedance matrix. The diagonal elements of $\mathbf{A}$ depend on $\omega$, $v$, and the grid spacing,

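while the off-diagonal elements depend only on the grid spacing. In the standard FDFD method, we solve equation 2 to obtain the pressure wavefield.

The seismic data can then be extracted by sampling the wavefield at given receiver locations as

$$\mathbf{d} = g(\mathbf{p}), \qquad (3)$$

where $g$ is an interpolating operator that extracts the wavefield at the receiver locations.

Schur complement based FDFD modeling

The computational domain is divided into zones $a$ and $b$. Velocity in $a$ is allowed to vary, while in $b$ it remains constant. As a result of this decomposition, the linear system can be written in the block form as

$$\begin{pmatrix}\mathbf{A}_{bb} & \mathbf{A}_{ba}\\ \mathbf{A}_{ab} & \mathbf{A}_{aa}\end{pmatrix}\begin{pmatrix}\mathbf{p}_b\\ \mathbf{p}_a\end{pmatrix} = \begin{pmatrix}\mathbf{s}_b\\ \mathbf{s}_a\end{pmatrix}, \qquad (4)$$

where matrices $\mathbf{A}_{bb}$ and $\mathbf{A}_{aa}$ include coefficients belonging exclusively to $b$ and $a$, respectively.

This blocked system is partially factorized only for $\mathbf{A}_{bb}$, eliminating the background unknowns as

$$\begin{pmatrix}\mathbf{L}_{bb} & \mathbf{0}\\ \mathbf{L}_{ab} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{U}_{bb} & \mathbf{U}_{ba}\\ \mathbf{0} & \mathbf{S}\end{pmatrix}\begin{pmatrix}\mathbf{p}_b\\ \mathbf{p}_a\end{pmatrix} = \begin{pmatrix}\mathbf{s}_b\\ \mathbf{s}_a\end{pmatrix}, \qquad (5)$$

where $\mathbf{L}_{bb}\mathbf{U}_{bb} = \mathbf{A}_{bb}$, $\mathbf{L}_{ab}\mathbf{U}_{bb} = \mathbf{A}_{ab}$, and $\mathbf{L}_{bb}\mathbf{U}_{ba} = \mathbf{A}_{ba}$, and

$$\mathbf{S} = \mathbf{A}_{aa} - \mathbf{L}_{ab}\mathbf{U}_{ba} = \mathbf{A}_{aa} - \mathbf{A}_{ab}\mathbf{A}_{bb}^{-1}\mathbf{A}_{ba} \qquad (6)$$

is the Schur complement. Here, $\mathbf{A}_{ab}\mathbf{A}_{bb}^{-1}\mathbf{A}_{ba}$ indicates the effect of the factorized region $b$ on the unfactorized region $a$.

An intermediate vector $\mathbf{y}$ is then computed by forward substitution using

$$\begin{pmatrix}\mathbf{L}_{bb} & \mathbf{0}\\ \mathbf{L}_{ab} & \mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{y}_b\\ \mathbf{y}_a\end{pmatrix} = \begin{pmatrix}\mathbf{s}_b\\ \mathbf{s}_a\end{pmatrix}. \qquad (7)$$

From equations 5 and 7 we obtain

$$\mathbf{S}\mathbf{p}_a = \mathbf{y}_a, \qquad (8)$$

$$\mathbf{U}_{bb}\mathbf{p}_b = \mathbf{y}_b - \mathbf{U}_{ba}\mathbf{p}_a. \qquad (9)$$

To compute the response $\mathbf{p}_a$ from the anomalous region, we need to solve only the Schur complement system (equation 8).

During the process of model update in FWI, only the velocity in the anomalous zone changes. Therefore, $\mathbf{A}_{bb}$, $\mathbf{L}_{bb}$ and $\mathbf{U}_{bb}$ remain unchanged. Because the off-diagonal elements in $\mathbf{A}$ do not depend on the velocities, matrices $\mathbf{A}_{ba}$ and $\mathbf{A}_{ab}$, and hence $\mathbf{L}_{ab}$ and $\mathbf{U}_{ba}$, are also invariant. Equation 7 implies that vector $\mathbf{y}$ also remains unchanged. Therefore, to compute the anomalous field $\mathbf{p}'_a$ for the changing models, one only requires the modified Schur complement $\mathbf{S}'$, which is given by

$$\mathbf{S}' = \mathbf{S} - \mathbf{A}_{aa} + \mathbf{A}'_{aa}, \qquad (10)$$

where matrix $\mathbf{A}'_{aa}$ is built on the anomalous region. As a result, each subsequent modeling requires solving a relatively small modified Schur complement system $\mathbf{S}'\mathbf{p}'_a = \mathbf{y}_a$ instead of the main system $\mathbf{A}\mathbf{p} = \mathbf{s}$.

Full Waveform Inversion

The misfit between the synthetic and observed data can be defined by the $L_2$ norm of the data residual at the receiver locations as

$$E = \left(\mathbf{d}_{obs} - \mathbf{d}_{syn}\right)^{\dagger}\left(\mathbf{d}_{obs} - \mathbf{d}_{syn}\right), \qquad (11)$$

where $\mathbf{d}_{obs}$ is the observed data, $\mathbf{d}_{syn}$ is the synthetic data, and $\dagger$ is the complex conjugate transpose operator.

The gradient of the objective function is computed using the adjoint-state method (Plessix, 2006), where the data misfit is back propagated to compute the adjoint wavefield. The equation used for computing the adjoint wavefield is given by

$$\nabla^{2}R(\mathbf{x}|\mathbf{x}_r,\omega) + \frac{\omega^{2}}{v^{2}(\mathbf{x})}R(\mathbf{x}|\mathbf{x}_r,\omega) = \Delta d(\mathbf{x}_r|\mathbf{x}_s,\omega)^{*}, \qquad (12)$$

where $R$ is the adjoint wavefield, $\Delta d = \mathbf{d}_{obs} - \mathbf{d}_{syn}$ is the data residual at receivers located at $\mathbf{x}_r$ due to a source at $\mathbf{x}_s$, and $*$ represents the complex conjugate. The gradient of the objective function is estimated by cross correlation of the adjoint and forward wavefields as

$$\frac{\partial E}{\partial \mathbf{m}} = -\frac{2\omega^{2}}{v^{3}(\mathbf{x})}\sum_{\mathbf{x}_s,\omega}\Re e\left[P(\mathbf{x}|\mathbf{x}_s,\omega)\,R(\mathbf{x}|\mathbf{x}_r,\mathbf{x}_s,\omega)\right], \qquad (13)$$

where $\Re e$ denotes the real part.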
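The Schur complement update in equations 6, 8 and 10 can be checked numerically on a small dense example. The sketch below is a hedged illustration rather than the MUMPS-based workflow of the paper: it assumes a 1D finite-difference Helmholtz matrix with an artificial complex damping factor, dense NumPy solves in place of sparse LU factors, and hypothetical function names. The intermediate vector is formed as y_a = s_a - A_ab A_bb^{-1} s_b, which follows from equation 7.

```python
import numpy as np

def helmholtz_1d(v, omega, h, damping=0.1):
    """Toy 1D impedance matrix: velocity enters only on the diagonal."""
    n = v.size
    k2 = (1.0 + 1j * damping) * (omega / v) ** 2   # damping is an assumption
    off = np.ones(n - 1) / h ** 2
    return np.diag(k2 - 2.0 / h ** 2) + np.diag(off, 1) + np.diag(off, -1)

def precompute_schur(A, s, nb):
    """One-off work: S (equation 6) and y_a (from equation 7).

    The first nb unknowns form the fixed background zone b,
    the remaining unknowns form the anomalous zone a.
    """
    A_bb, A_ba = A[:nb, :nb], A[:nb, nb:]
    A_ab, A_aa = A[nb:, :nb], A[nb:, nb:]
    X = np.linalg.solve(A_bb, np.column_stack([A_ba, s[:nb]]))
    S = A_aa - A_ab @ X[:, :-1]
    y_a = s[nb:] - A_ab @ X[:, -1]
    return S, y_a, A_aa

def solve_anomalous(S, y_a, A_aa_old, A_aa_new):
    """Per-model work: S' = S - A_aa + A'_aa (equation 10), then S' p'_a = y_a."""
    return np.linalg.solve(S - A_aa_old + A_aa_new, y_a)

def demo(n=120, nb=80, h=10.0, freq=5.0):
    omega = 2.0 * np.pi * freq
    v0 = np.full(n, 1500.0)
    v0[nb:] = 2000.0                       # starting model below the interface
    s = np.zeros(n, dtype=complex)
    s[2] = 1.0                             # source inside the background zone
    S, y_a, A_aa0 = precompute_schur(helmholtz_1d(v0, omega, h), s, nb)

    v1 = v0.copy()
    v1[nb + 10:nb + 20] = 2500.0           # velocity update only in zone a
    A1 = helmholtz_1d(v1, omega, h)
    p_a = solve_anomalous(S, y_a, A_aa0, A1[nb:, nb:])
    p_full = np.linalg.solve(A1, s)        # reference: full system solve
    return p_a, p_full[nb:]

p_a, p_ref = demo()
print(np.max(np.abs(p_a - p_ref)))
```

In this toy setup the per-model solve involves only the unknowns of zone a (40 here instead of 120), which is the source of the savings described above when the fixed zone is large.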

After estimating the gradient, we update the model iteratively using a scaled value of the gradient. We use an L-BFGS optimizer (Liu and Nocedal, 1989) to compute the pseudo Hessian, and the step size is estimated using a line search algorithm.

Examples

We demonstrate the computational efficiency of our Schur complement based FWI using a simple deep-water synthetic model and the Sigsbee2A model. The deep-water model shown in Figure 1 consists of a small anomalous block beneath the sea floor. The dimensions of the model are 800 and 2010 points, respectively, in the 𝑧- and 𝑥-directions with a grid spacing of 10 m. The sea floor is fixed at a depth of 4 km, halfway down the vertical extent of the model, and the anomalous block is placed 1 km beneath the sea floor. The starting model shown in Figure 2 is the same as the true model without the anomalous block. Because the water velocity does not change, we do not update the water layer, and hence all velocity updates are performed only beneath the sea floor. Because the velocity of the water column is fixed, the Schur interface that partitions the model into a background and an anomalous block is placed at a depth of 4 km, just above the sea floor, and is marked in Figure 1 with a white dotted line. All computations have been performed on a single Intel Xeon E5-1650 running at 3.5 GHz with 128 GB of RAM.

Figure 1. The synthetic deep-water model. The model consists of a 4 km water column with an anomalous body placed at a depth of 1 km below the sea floor. The Schur interface is shown as a dotted white line.

Figure 2. The starting model for FWI for the synthetic data example.

We placed 101 shots at a spacing of 200 m and 400 receivers per shot with a spacing of 50 m horizontally across the model at 20 m below the sea surface. The receivers were the same for each shot. We ran our inversion for three frequency groups from 3 Hz to 7 Hz with 30 iterations per frequency group, using the standard FDFD solver as well as the Schur complement based FDFD solver. For the Schur FDFD, we define an interface (referred to as the Schur interface) at 4 km that separates the background (i.e., water) and anomalous (i.e., sub-seabed sediments) regions. We used an L-BFGS routine to minimize the data misfit in our FWI formulation. The final inverted model obtained from FWI after 220 forward evaluations is shown in Figure 3. The anomalous block is recovered well by FWI. Figure 4a compares the wall-clock time of the modeling kernel using the standard FDFD and the Schur FDFD approaches, while Figure 4b compares the wall-clock time of the FWI workflow. The Schur complement based FDFD takes about 40% of the time taken by the conventional approach.

Figure 3. The inverted model obtained by FWI using 3 frequency groups from 3 − 7 Hz.

The second model we used is the Sigsbee2A model shown in Figure 5. The model contains 801 and 2133 points in the 𝑧- and 𝑥-directions, respectively, with a grid spacing of 10 m. 267 shots were placed at a spacing of 80 m, with 1065 receivers placed 20 m apart. The Schur interface is marked with a white dotted line at a depth of 1.6 km. The starting model shown in Figure 6 was chosen as the Sigsbee migration velocity with the correct salt shape and velocity and a smooth background. As for the previous synthetic model, we used the L-BFGS optimizer and ran the inversion over three frequency groups from 3 to 8 Hz with 30 iterations per frequency group. The final model after inverting 3 frequency groups and 251 forward evaluations is shown in Figure 7. Figure 8a compares the wall-clock time of the modeling kernel using the standard FDFD and the Schur approaches, while Figure 8b compares the wall-clock time of the FWI workflow. Owing to the shallower water depth, the Schur complement based FDFD takes about 80% of the time taken by the conventional approach.

Figure 4. Comparison of wall-clock times for the synthetic deep-water model: (a) for a single forward modeling operation and (b) for the FWI workflow.

Figure 5. The true Sigsbee2A model. The Schur interface, shown as the dotted white line, is placed at 1.6 km.

Figure 6. The starting model for the Sigsbee2A model.

Figure 7. The inverted Sigsbee2A model using 3 frequency groups from 3 − 7 Hz. The sediments are better defined with more resolution.

Figure 8. Comparison of wall-clock times for the Sigsbee2A model: (a) for a single forward modeling operation and (b) for the FWI workflow.

Conclusions

We propose a novel approach to reduce runtime for FWI by exploiting the singularity of part of the model that remains unchanged during the inversion by computing its Schur complement. This results in significant gains in runtime over solving for the whole model. The new approach fits in exactly with the current formulation of FWI, with changes only to the modeling kernel. Additionally, the results obtained by the Schur complement based solver are exactly the same as those obtained by the conventional method. Although in our examples the gains in runtime are directly related to the length of the water column, the Schur complement approach can also be used in a layer stripping formulation where inverted layers can be kept constant as we invert for deeper parts of the velocity model.
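The reuse of the unchanged block summarized above can be illustrated with a small dense sketch. It uses plain block elimination, which is algebraically equivalent to the LU form of equations 6 to 9 but is not the authors' sparse solver. The class name `SchurSolver`, the dense NumPy matrices, and the explicit inverse (a stand-in for stored LU factors of the fixed block) are assumptions made only to keep the example self-contained.

```python
import numpy as np


class SchurSolver:
    """Block solve of [[A_bb, A_ba], [A_ab, A_aa]] [p_b; p_a] = [s_b; s_a].

    Region b is the fixed (e.g. water) block; region a is the updated block.
    """

    def __init__(self, A_bb, A_ba, A_ab):
        # Work on the fixed region is done once and reused for every model
        # update. The explicit inverse stands in for stored LU factors.
        self.A_bb_inv = np.linalg.inv(A_bb)
        self.A_ab = A_ab
        self.X = self.A_bb_inv @ A_ba        # A_bb^{-1} A_ba
        self.correction = A_ab @ self.X      # A_ab A_bb^{-1} A_ba

    def solve(self, A_aa, s_b, s_a):
        # Schur complement of the fixed block (equation 6).
        S = A_aa - self.correction
        w = self.A_bb_inv @ s_b
        # Solve the reduced system in the updated region.
        p_a = np.linalg.solve(S, s_a - self.A_ab @ w)
        # Back-substitute into the fixed region.
        p_b = w - self.X @ p_a
        return p_b, p_a
```

Only `A_aa` changes between iterations in this sketch, so each new solve costs one factorization of the smaller matrix `S` instead of the full system.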

REFERENCES
Amestoy, P. R., A. Guermouche, J. Y. L’Excellent, and S. Pralet, 2006, Hybrid scheduling for the parallel
solution of linear systems: Parallel Computing, 32, 136–156,
https://doi.org/10.1016/j.parco.2005.07.004.
Amestoy, P. R., I. S. Duff, J. Y. L’Excellent, and J. Koster, 2001, A fully asynchronous multifrontal
solver using distributed dynamic scheduling: SIAM Journal on Matrix Analysis and Applications,
23, 15–41, https://doi.org/10.1137/S0895479899358194.
Jaysaval, P., D. Shantsev, and S. de la Kethulle de Ryhove, 2014, Fast multimodel finite-difference
controlled-source electromagnetic simulations based on a Schur complement approach:
Geophysics, 79, no. 6, E315–E327, https://doi.org/10.1190/geo2014-0043.1.
Liu, D. C., and J. Nocedal, 1989, On the limited memory BFGS method for large scale optimization:
Mathematical Programming, 45, 503–528, https://doi.org/10.1007/BF01589116.
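Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
geophysical applications: Geophysical Journal International, 167, 495–503,
https://doi.org/10.1111/j.1365-246X.2006.02978.x.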
Tarantola, A., 1984, Linearized inversion of seismic reflection data: Geophysical Prospecting, 32, 998–
1015, http://doi.org/10.1111/j.1365-2478.1984.tb00751.x.

Estimating velocity and Q by fractional Laplacian constant-Q wave equation-based full
waveform inversion
Hanming Chen* and Hui Zhou, State Key Laboratory of Petroleum Resources and Prospecting, CNPC Key Lab
of Geophysical Exploration, China University of Petroleum (Beijing)
Summary Recently, a decoupled fractional Laplacian viscoacoustic

wave equation was developed by Zhu and Harris (2014).
We propose to estimate velocity and seismic quality factor The fractional wave equation was derived from the CQ
(Q) by using a time-domain viscoacoustic full waveform dispersion relation, and has a high accuracy to fit the CQ
inversion (FWI) method. A newly developed fractional model. Numerical difficulty in approximating the spatial
Laplacian constant-Q (CQ) wave equation is used as the variable-order fractional Laplacian has later been resolved
forward modeling kernel in FWI. The adjoint operator and by Chen et al. (2014, 2016), and Sun et al. (2014, 2015)
gradients for updating velocity and Q are derived in this with the help of the low-rank decomposition algorithm
abstract. We adopt a convolution-based objective function (Fomel et al., 2013). The decoupled fractional Laplacian
to remove the source wavelet effect on inversion result, viscoacoustic wave equation has been verified very helpful
thus we do not need to know the exact source wavelet or for developing stable Q-compensated reverse time
estimate the source wavelet before implementing FWI in migration (RTM) methods (e.g., Sun et al., 2016). More
our method. To improve computational efficiency of our recently, Xue et al. (2016) used the same viscoacoustic
viscoacoustic FWI, we apply the plane-wave inversion wave equation to implement FWI to invert velocity.
scheme to reduce the amount of data to invert. A synthetic
data example using the Marmousi model verifies that our In this abstract, we develop a new viscoacoustic FWI
method can rebuild both velocity and Q models with a high scheme to invert velocity and Q simultaneously based on a
resolution. new fractional Laplacian viscoacoustic wave equation
(Chen et al., 2016). By utilizing the decoupling effects of
Introduction the wave equation, we propose a non-attenuating adjoint
operator to backward propagate residual data to build
Full waveform inversion (FWI) has become a popular gradient images. The gradients used to update velocity and
method for estimating subsurface geophysical parameters Q are derived. To eliminate the impacts caused by using an
due to its higher resolution than traditional traveltime inaccurate source wavelet in FWI, we adopt the source-
tomography. By using different wave equations, FWI can independent objective function (Choi and Alkhalifah, 2011)
be used to predict different classes of geophysical to drive our viscoacoustic FWI. The plane-wave inversion
parameters, separately or simultaneously (Virieux and scheme (Vigh and Starr, 2008) is also incorporated to
Operto, 2009). Among the existing multi-parameter FWI speed-up our viscoacoustic FWI.
literatures, simultaneous inversion of velocity and Q by
viscoacoustic FWI is reported to be challenging due to a Fractional Laplacian viscoacoustic wave equation
strong trade-off effect between these two parameters (e.g.,
Operto et al., 2013). Viscoacoustic FWI can be The newly developed fractional Laplacian viscoacoustic
implemented in either the frequency domain or the time wave equation is expressed as (Chen et al., 2016),
domain. In the frequency domain, Q is augmented into
$$F(u, s, c_o, Q) = A(c_o, Q)\,u - s = 0, \tag{1}$$
velocity as the imaginary part to form a complex velocity
according the constant-Q (CQ) model (Kjartansson, 1979). and
1
However, developing a CQ wave equation is not trivial in
A  co , Q    tt   2
the time domain for the intractable calculation of the co2
temporal convolution between stress and strain. To our 33 (2)
 1 2  2   2  32  3  t   2 
0.505
knowledge, most of the existing time domain viscoacoustic ,
FWI schemes adopt the standard linear solid (SLS) model
Ld La
based wave equation as the forward modeling kernel (e.g.,
Bai et al., 2014). However, the SLS wave equation has a where tt , t represent the second- and first-order temporal
low accuracy to match the widely used CQ model. To
derivatives respectively,  2 is the Laplacian operator, u
ensure a high accuracy, at least three SLS wave equations
have to be solved together (e.g., Zhu et al., 2013). The denotes wavefield, s is source wavelet, co represents the
increased number of wave equation leads to a higher reference velocity at the reference angular frequency o ,
computational cost. d is dominant frequency of the source, and

1 separately, the symbol  represents temporal convolution.

 32  32  co 16
1  1     1, 2     , di , k and ui,k are reference traces in the i-th shots of
 Q   Q  d 
predicted and observed data respectively. The reference
2
(3)
traces should have a relatively high signal-to-noise ratio
   Q  1  1
3   ,    d  cos 2   cos   , (SNR), thus they are usually selected from the near-offset
Qco 
 o  2Q  Q traces or computed by averaging several near-offset traces
For the detailed derivation of equations 1-3 and the (e.g., Zhang et al., 2016).
accuracy analysis, one can refer to Chen et al. (2016). The
fractional Laplacians in equation 2 are computed by fast Fourier transform (FFT), for example,
$$(-\nabla^2)^{33/32}\, u = \mathrm{IFFT}\left[ k^{33/16}\, \mathrm{FFT}[u] \right]. \tag{4}$$
FWI uses a zero-lag cross-correlation of the forward and backward propagated wavefields to calculate gradients.
With the objective function 5, a wrong wavelet is allowed to be applied to simulate the predicted data, and to remove the
wrong wavelet impacts, a modified residual data should be
where k denotes wavenumber vector in 2D or 3D. In this computed and backward propagated using the adjoint
abstract, we only consider inversion in 2D case. Figure 1 opeartor,
F  u, s, co , Q 
displays simulated snapshots by different wave equations in †
a homogeneous medium with co  2 km/s and Q  50 . The i  ri,' j +   x - xk  ri,'' j , (6)

same Ricker wavelet with the dominant frequency of
u
f d  20 Hz is used as the source. The velocity is defined at where i denotes the i-th shot adjoint wavefield,   x  is
the reference frequency of fo  100 Hz . Figure 1 indicates the Dirac function, xk represents spatial position of the
that the viscous effect of media not only attenuates reference trace, † denotes conjugate transpose, and
amplitude, but also delays phase. Comparison of the top- ri', j = di , k  ri , j ,
right, bottom-left, and bottom-right snapshots implies that  ''
the operator Ld in equation 2 mainly controls phase ri , j = di , j  ri , j , (7)

ri , j = ui , j  di , k  di , j  ui , k ,
distortion, and La mainly controls amplitude loss during
wave propagation.
in which  denotes temporal cross-correlation.
Non-attenuating adjoint operator and gradients
Substitution of equations 1 and 2 into 6 leads to the adjoint

wave equation,
33
    2   1 2    2   2  32 
1
2 tt
co
Ld
(8)
 3  t   
2 0.505
d , res
La
where d res is the residual data on the right-hand side of

Figure 1: Snapshots propagated by different wave equations. equation 6. One can notice that the only difference from the
forward propagating wave equation 2 is that the minus in
Source-independent objective function
front of 3 in 2 changes into plus in 8. However, since the
When implementing FWI, we adopt the convolution-based residual data is imposed in a reverse time order, the
objective function to remove the source wavelet effect on discretized formulation of 8 is actually same as that of
inversion result (Choi and Alkhalifah, 2011), equation 1, which means the adjoint opeartor attenuates
ns nr wavefield again. Here, we propose to use a non-attenuating
E   ui , j  di , k  di , j  ui,k ,
2
(5) adjoint operator to backward propagate the residual data,
i j 33
where u, d represent the predicted and observed data

1
co2
 tt    2
  1 2
   2   
2 32
  d res . (9)
respectively, i, j are the indexes for shot and receiver, ns Ld
and nr denote the total numbers of shots and receivers

Equation 9 is obtained by directly removing $L_a$ in 8, and we refer to it as the non-attenuating adjoint operator. We expect equation 9 to construct a phase-correct gradient, because the wavefields simulated by equations 8 and 9 have almost the same phase, as displayed by Figure 1. The gradient constructed by equation 9 would have a higher resolution than that by equation 8, since equation 9 does not attenuate the wavefield again. Additionally, the non-attenuating operator saves computational cost because fewer terms are involved in equation 9.

The gradients of velocity and Q can be derived by using the following formulae,

$$\nabla_{c_o} E = \left[ \frac{\partial F(u, s, c_o, Q)}{\partial c_o} \right]^{\dagger} \varphi, \qquad \nabla_{Q} E = \left[ \frac{\partial F(u, s, c_o, Q)}{\partial Q} \right]^{\dagger} \varphi. \tag{10}$$

To reduce the computational cost of the viscoacoustic FWI, we incorporate the plane-wave data scheme (Vigh and Starr, 2008) into our inversion method. When the scheme is applied, the total number of shots to invert decreases from $n_s$ to $n_p$, where $n_p$ denotes the total number of p-indexed super shots and is usually much smaller than $n_s$.

Examples

We use the Marmousi model (Figure 2a) to test our viscoacoustic FWI scheme. The model grid size is 156 (z) × 340 (x) with grid intervals of 10 × 15 m. A total of 31 common-shot gathers are synthesized by using equation 1 and regarded as the observed data. A fixed-spread geometry is applied. The first-order derivative of the Gaussian function, with an approximate maximum frequency of 60 Hz, acts as the true source wavelet to generate the observed data. When implementing FWI, we adopt Ricker wavelets (the second-order derivative of the Gaussian function) with dominant frequencies of $f_d$ = 1, 3, 5, 10, 15, 20 Hz as the sources to form a multi-scale inversion flow. For each frequency range, 31 L-BFGS iterations are conducted. A line-search scheme is imposed to find a desirable step length per iteration. Figure 2b displays the initial models for inversion. Note that the shallow part (z ≤ 60 m) represents water, and its velocity and Q values are fixed at 1.5 km/s and 1000 during the inversion.

The final inversion results are displayed in Figure 3. Figure 3a shows the inversion results obtained using 31 common-shot gathers, while Figure 3b shows the results obtained using 5 p-indexed super shot gathers. Generally, the geological structures of both velocity and Q models are recovered very well in both Figure 3a and 3b. Through the iterations, the velocity values of the different layers are corrected to be very close to the true values. The Q values in the shallow layers are rebuilt with a small misfit; in the deeper layers, however, Q is not estimated well. This is further demonstrated by Figure 4, in which the vertical profiles at x = 810 m in Figure 3 are displayed. The reason for the large Q misfit in the deep layers is that velocity dominates the contribution to the objective function, and Q has a much weaker contribution. Even so, the spatial variation tendency of the Q model is still predicted correctly. One can also compare the inversion results in Figure 3a and 3b and observe some slight crosstalk noise in the inverted Q image, as indicated by the ellipse in Figure 3b. The crosstalk noise can be further suppressed by using regularization (e.g., Xue and Zhu, 2015; Xue et al., 2016). Since only 5 p-indexed super shots are used to implement the viscoacoustic FWI, an approximate speed-up factor of 6 is achieved in Figure 3b.

Conclusions

We demonstrate the feasibility of a new time domain viscoacoustic FWI scheme in estimating velocity and Q simultaneously. Our viscoacoustic FWI scheme is based on a newly developed fractional Laplacian wave equation. The equation has a high accuracy to match the constant-Q model. As a basic work in FWI, we derive the gradients of velocity and Q, and further propose a non-attenuating adjoint operator to backward propagate the residual data. Compared with the traditional attenuating adjoint operator, the new adjoint operator saves computational cost and can be expected to build a phase-correct gradient with higher resolution. The source-independent objective function and the plane-wave inversion scheme are incorporated into our viscoacoustic FWI method to remove the source effect and increase computational efficiency. A numerical example verifies the effectiveness of our inversion method to invert velocity and Q.

Acknowledgments

This work is jointly supported by 973 Program of China (2013CB228603), National Science and Technology Program (2016ZX05010-001), National Natural Science Foundation of China (U1562110, 41630314), the Research of Novel Method and Technology of Geophysical Prospecting (CNPC 2016-3302). We appreciate Dr. Zhiguang Xue for reviewing this abstract, and Texas Advanced Computing Center (TACC) for providing computing resources.

(a) (b)
Figure 2: True models (a), and initial models (b), top panel for velocity and bottom for Q.
(a) (b)
Figure 3: Inverted models (a) using 31 common-shot gathers, and (b) using 5 p-indexed super shot gathers, top for velocity and bottom for Q.
(a) (b)
Figure 4: Vertical profiles at x=810 m for (a) velocity, and (b) Q. The legend “Inverted 1” represents the inversion result using 31 common-shot
gathers, and “Inverted 2” represents the inversion result using 5 p-indexed super shot gathers.
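Equation 4 of this abstract evaluates the fractional Laplacian with FFTs. A minimal sketch of that operation on a regular 2D grid is given below. The function name `fractional_laplacian_2d`, the periodic boundaries implied by the FFT, and the restriction to real-valued wavefields are assumptions of this example and are not taken from the abstract.

```python
import numpy as np


def fractional_laplacian_2d(u, dz, dx, alpha):
    """Apply (-Laplacian)^alpha to a real 2D field u of shape (nz, nx).

    In the wavenumber domain the operator is multiplication by |k|^(2*alpha),
    so alpha = 33/32 corresponds to the factor k^(33/16).
    """
    nz, nx = u.shape
    # Angular wavenumbers along each axis.
    kz = 2.0 * np.pi * np.fft.fftfreq(nz, d=dz)
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    k_squared = kz[:, None] ** 2 + kx[None, :] ** 2
    spectrum = np.fft.fft2(u)
    # The input is assumed real, so the imaginary part is round-off only.
    return np.real(np.fft.ifft2(k_squared ** alpha * spectrum))
```

For a single sinusoid with wavenumber $k_0$ the operator returns $k_0^{2\alpha}$ times the input, which gives a quick sanity check of an implementation.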

REFERENCES
Bai, J., D. Yingst, R. Bloor, and J. Leveille, 2014, Viscoacoustic waveform inversion of velocity
structures in the time domain: Geophysics, 79, no. 3, R103–R119,
http://doi.org/10.1190/geo2013-0030.1.
Chen, H., H. Zhou, and S. Qu, 2014, Low-rank approximation for time domain viscoacoustic wave
equation with spatially varying order fractional Laplacians: 84th Annual International Meeting,
SEG, Expanded Abstracts, 3400–3445, http://doi.org/10.1190/segam2014-0055.1.
Chen, H., H. Zhou, Q. Li, and Y. Wang, 2016, Two efficient modeling schemes for fractional Laplacian
viscoacoustic wave equation: Geophysics, 81, no. 5, T233–T249, http://doi.org/10.1190/geo2015-
0660.1.
Choi, Y., and T. Alkhalifah, 2011, Source-independent time-domain waveform inversion using convolved
wavefields: Application to the encoded multisource waveform inversion: Geophysics, 76, no. 5,
R125–R134, http://doi.org/10.1190/geo2010-0210.1.
Fomel, S., L. Ying, and X. Song, 2013, Seismic wave extrapolation using low-rank symbol
approximation: Geophysical Prospecting, 61, 526–536, http://doi.org/10.1111/j.1365-
2478.2012.01064.x.
Kjartansson, E., 1979, Constant Q-wave propagation and attenuation: Journal of Geophysical Research,
84, 4737–4748, http://doi.org/10.1029/jb084ib09p04737.
Sun, J., S. Fomel, and T. Zhu, 2014, Viscoacoustic modeling and imaging using low-rank approximation:
Sun, J., T. Zhu, and S. Fomel, 2015, Viscoacoustic modeling and imaging using low-rank approximation:
Geophysics, 80, no. 5, A103–A108, http://doi.org/10.1190/geo2015-0083.1.
Sun, J., S. Fomel, T. Zhu, and J. Hu, 2016, Q-compensated least-squares reverse-time migration using
low-rank one-step wave extrapolation: Geophysics, 81, no. 4, S271–S279,
http://doi.org/10.1190/geo2015-0520.1.
Vigh, D., and E. W. Starr, 2008, 3D prestack plane-wave full-waveform inversion: Geophysics, 73, no. 5,
VE135-VE144, http://doi.org/10.1190/1.2952623.
Xue, Z., and H. Zhu, 2015, Full waveform inversion with sparsity constraint in seislet domain: 85th
Xue, Z., T. Zhu, S. Fomel, and J. Sun, 2016, Q-compensated full waveform inversion using constant-Q
wave equation: 86th Annual International Meeting, SEG, Expanded Abstracts, 1063–1068,
Xue, Z., Y. Chen, S. Fomel, and J. Sun, 2016, Seismic imaging of incomplete data and simultaneous-
source data using least-squares reverse time migration with shaping regularization: Geophysics,
81, no. 1, S11–S20, http://doi.org/10.1190/geo2014-0524.1.

Zhang, Q., H. Zhou, Q. Li, H. Chen, and J. Wang, 2016, Robust source-independent elastic full-waveform
inversion in the time domain: Geophysics, 81, no. 3, R13–R28, http://doi.org/10.1190/geo2015-
0073.1.
Zhu, T., J. M. Carcione, and J. M. Harris, 2013, Approximating constant-Q seismic propagation in the
time domain: Geophysical Prospecting, 61, 931–940, http://doi.org/10.1111/1365-2478.12044.
Zhu, T., and J. M. Harris, 2014, Modeling acoustic wave propagation in heterogeneous attenuating media
using decoupled fractional Laplacians: Geophysics, 79, no. 3, T105–T116,
http://doi.org/10.1190/geo2013-0245.1.

Preconditioned elastic full-waveform inversion with approximated Hessian
Ettore Biondi*, Guillaume Barnier, and Biondo Biondi, Stanford University
Summary

We present a simple method of preconditioning the gradient of elastic multi-component full-waveform inversion (FWI) using an approximated Gauss-Newton Hessian. By sampling this matrix we are able to estimate the Hessian elements. We use this approximated matrix to compute a preconditioner to apply during the inversion. We show on a synthetic 2D sediment model that a main-diagonal approximation already improves the convergence rate of the FWI optimization and properly scales the gradients for different parameter classes. Therefore, it also decreases the differential sensitivities to the data of the simultaneously inverted parameters.

Introduction

Multi-parameter FWI has become one of the most studied topics in seismic data inversion (Operto et al., 2013). Since it was first envisioned by Tarantola (1984), FWI has been proposed to simultaneously invert for bulk modulus and density in the subsurface. Nowadays, we explore the possibility of inverting for elastic anisotropic parameters (Albertin et al., 2016).

Despite advances in computational technologies, solving more complex wave equations is still the limiting factor when running several iterations of any FWI algorithm. It is, therefore, fundamental to find new methods to precondition any FWI problem and improve the convergence rate of the optimization algorithm used during the data inversion.

Many authors have explored different schemes to improve the robustness of FWI when multiple parameter classes are estimated (Virieux and Operto, 2009). Many of these have in common the use of an exact or approximated Hessian matrix (Tang and Lee, 2010; Korta et al., 2013), which contains the objective function's curvature information. Not only does this matrix improve the FWI rate of convergence, but it also reduces the unbalanced sensitivity to the data of the different parameter classes, also known as parameter crosstalk (Operto et al., 2013).

In this work, we follow an approach similar to the one proposed by Tang and Lee (2015) to estimate an approximated Gauss-Newton Hessian matrix. By taking advantage of the sparse structure of this matrix, we evaluate its elements by sampling it with impulses placed in the model space and interpolating for the unknown values.

We explain how to write the Gauss-Newton Hessian in the case of elastic multi-component FWI as a series of forward and adjoint operators. On a 2D complex synthetic elastic subsurface, we demonstrate that a simple main-diagonal approximation can already improve the convergence rate and diminish the parameter crosstalk in the inverted model when wave velocities and density are simultaneously estimated.

Theory

The 2D velocity-stress formulation of the elastic wave equation is given by the following set of relations (Virieux, 1986):

$$\rho(x)\frac{\partial v_x(x,t)}{\partial t} - \frac{\partial \sigma_{xx}(x,t)}{\partial x} - \frac{\partial \sigma_{xz}(x,t)}{\partial z} = S_{v_x}(x,t), \tag{1}$$

$$\rho(x)\frac{\partial v_z(x,t)}{\partial t} - \frac{\partial \sigma_{xz}(x,t)}{\partial x} - \frac{\partial \sigma_{zz}(x,t)}{\partial z} = S_{v_z}(x,t), \tag{2}$$

$$\frac{\partial \sigma_{xx}(x,t)}{\partial t} - [\lambda + 2\mu](x)\frac{\partial v_x(x,t)}{\partial x} - \lambda(x)\frac{\partial v_z(x,t)}{\partial z} = S_{\sigma_{xx}}(x,t), \tag{3}$$

$$\frac{\partial \sigma_{zz}(x,t)}{\partial t} - \lambda(x)\frac{\partial v_x(x,t)}{\partial x} - [\lambda + 2\mu](x)\frac{\partial v_z(x,t)}{\partial z} = S_{\sigma_{zz}}(x,t), \tag{4}$$

$$\frac{\partial \sigma_{xz}(x,t)}{\partial t} - \mu(x)\left[\frac{\partial v_x(x,t)}{\partial z} + \frac{\partial v_z(x,t)}{\partial x}\right] = S_{\sigma_{xz}}(x,t), \tag{5}$$

where $\lambda$ and $\mu$ are the Lamé parameters, $\rho$ is density, $v_x$ and $v_z$ are the particle velocities, and $\sigma_{xx}$, $\sigma_{zz}$, and $\sigma_{xz}$ are the propagated stresses. The variables on the right-hand side of these equations represent the forcing terms. As shown by Alves and Biondi (2016), we can rewrite these equations as a non-linear operator:

$$d = f(m), \tag{6}$$

where $d = [v_x \; v_z \; \sigma_{xx} \; \sigma_{zz} \; \sigma_{xz}]^{T}$ and $m = [\lambda \; \mu \; \rho]^{T}$ define our data and model vectors, respectively. In real seismic acquisition only the hydrostatic pressure and

particle velocities are recorded. Therefore, we apply a linear transformation to the data vector to simulate a real experiment:

$$d_{obs} = \mathbf{R}d, \tag{7}$$

where $\mathbf{R}$ is defined as follows:

$$\mathbf{R} = \begin{bmatrix} 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \tag{8}$$

and whose effect is to average the normal stresses and extract the particle velocities from the data vector. The choice of model parametrization influences the inversion results because of the presence of parameter crosstalk, as discussed by Operto et al. (2013). We choose to parametrize our model space with the vector $m' = [V_p \; V_s \; \rho]^{T}$, which contains the wave propagation velocities as opposed to elastic parameters. This change of variables introduces the following non-linear transformation:

$$m = g(m') = \begin{bmatrix} (V_p^2 - 2V_s^2)\rho \\ V_s^2\rho \\ \rho \end{bmatrix}, \tag{9}$$

where $V_p$ and $V_s$ are the compressional- and shear-wave propagation velocities, respectively. Given the previous equations we define our FWI objective function as:

$$\phi(m') = \frac{1}{2}\left\| \mathbf{R}f(g(m')) - d_{obs} \right\|_2^2 = \frac{1}{2}\left\| r \right\|_2^2, \tag{10}$$

where $d_{obs}$ is the observed multi-component data, $r$ is the residual vector, and we parametrized the modeling operator in terms of the $m'$ vector. To minimize this function we compute the gradient of equation 10, which can be written as follows:

$$\nabla\phi = \mathbf{G}^{T}\mathbf{F}^{T}\mathbf{R}^{T}r, \tag{11}$$

where $\mathbf{G}^{T}$ and $\mathbf{F}^{T}$ are the linearized adjoint operators of equations 9 and 6, respectively. The linearized adjoint operator $\mathbf{F}^{T}$ can be found using the adjoint state method (Fichtner, 2010), and the Jacobian matrix of the non-linear transformation of equation 9 is given by:

$$\mathbf{G} = \begin{bmatrix} 2V_p\rho & -4V_s\rho & V_p^2 - 2V_s^2 \\ 0 & 2V_s\rho & V_s^2 \\ 0 & 0 & 1 \end{bmatrix}. \tag{12}$$

Using equation 11 we can apply any gradient-based optimization algorithm, such as non-linear conjugate gradient (CG) (Fletcher and Reeves, 1964).

To improve the convergence rate of any optimization method we apply an approximated inverse Hessian matrix to precondition the gradient. In fact, this operation would approximate a Newton optimization step. The structure of the Gauss-Newton Hessian enables us to estimate its elements by applying this matrix to impulses in the model space and interpolating for the unknown values (Tang and Lee, 2015). For the objective function in equation 10, this matrix $\mathbf{H}_{GN}$ takes the following form:

$$\mathbf{H}_{GN} = \mathbf{G}^{T}\mathbf{F}^{T}\mathbf{R}^{T}\mathbf{R}\mathbf{F}\mathbf{G}, \tag{13}$$

where $\mathbf{F}$ is the Born operator that maps perturbations in the model space into data perturbations.

Results and discussion

We generate a 2D complex subsurface model using the model builder software by Clapp (2014). Figure 1 shows this model in terms of compressional-wave velocity, shear-wave velocity, and density. In this model, we generate multi-component data by placing 50 explosive sources at the surface, spaced by 100 m. As source signature, we employ a Ricker wavelet with a dominant frequency of 20 Hz. The hydrophones and geophones are positioned at the sea bottom, but with an interval of 10 m.

The initial model parameters used to start the FWI problem are constructed by smoothing the ones displayed in Figure 1 (Figure 2). To avoid local minima and increase the attraction basin of the global minimum, we follow a multi-scale approach (Bunks, 1995). We start the inversion with a bandwidth of maximum frequency of 5 Hz, and we progressively increase the bandwidth by intervals of 5 Hz up to 20 Hz. The last inverted model from one band is used as the initial subsurface model for the next one. For

each frequency band we run 40 iterations of non-linear the shallow portion of the subsurface, meaning both
CG algorithm. FWI optimizations have likely converged to the same
minimum. However, in the deeper layers, they differ
in terms of parameter resolution. In fact, when the
optimization is preconditioned, the inverted
parameters present more structural features. In terms
of objective function values, both inversion results
have similar behavior. From Figure 5, we observe a
modest improvement in convergence rate when the
problem is preconditioned with a Gauss-Newton
main-diagonal approximation, especially as we
increase the frequency content used during the
inversion. In addition, the norm of the Euclidean distance of the last inverted model from the true solution is smaller when the inverse problem is preconditioned. In this test, the effect of the preconditioner is mostly to properly scale the gradients of the simultaneously inverted parameter classes and to compensate for the illumination factor.

Figure 3: Application of the Gauss-Newton Hessian for the 15 Hz frequency band to spikes positioned in the subsurface. The top labels indicate the parameter image obtained by applying the Hessian matrix to impulses in the parameter class indicated by the left label. The diagonal panels enable us to estimate the main diagonal of the Gauss-Newton Hessian matrix. In all of these panels the water layer has been removed. The interaction of different Hessian columns makes the off-diagonal panels not perfectly symmetric.

We compare an optimization result obtained without employing any preconditioning, and a different one where an approximated Gauss-Newton Hessian inverse is estimated for each frequency band and is used to construct a preconditioner. To estimate this matrix, we apply the Hessian of equation 13 to twelve impulses placed in the subsurface and spaced by 500 and 1000 meters along the z-axis and x-axis, respectively (Figure 3). From these applications, we extract the main diagonal elements and linearly interpolate the unknown Hessian values. We then create an approximated Gauss-Newton Hessian inverse that is used to precondition any computed FWI gradient. This preconditioner is estimated only once for each frequency band before starting the inversion. This approach assumes that the weighting does not change as FWI updates the model. This assumption is reasonable since the model perturbations introduced by the FWI scheme will not drastically change the Gauss-Newton Hessian matrix. Figure 4 shows the elastic FWI results. For all parameter classes, the inverted models are similar in

Conclusions

We discuss how to derive the Gauss-Newton Hessian approximation of elastic multi-component FWI as a series of linear operators when wave velocities and density parameterize the inverse problem. We describe how to estimate these matrix elements from applications to model vectors containing sparse impulses. Using linear interpolation we compute the missing matrix elements.

On a complex 2D synthetic model we show the use of this estimated matrix to precondition the FWI problem when multiple parameters are inverted simultaneously. We demonstrate that a simple main-diagonal approximation already provides a moderate convergence improvement and properly scales the FWI gradients. In fact, this scaling removes inversion artifacts in the final inverted model.
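The impulse-probing idea can be illustrated with a minimal sketch on a toy 1D problem. It assumes that the Hessian is available through a hypothetical function `apply_hessian`, and that the impulses are far enough apart that their responses do not overlap. The function names, the stabilization constant `eps` and the tridiagonal stand-in operator are illustrative assumptions and are not part of the workflow described in this paper.

```python
import numpy as np


def estimate_diagonal(apply_hessian, n, probe_idx):
    """Estimate diag(H) from one application of H to a vector of sparse impulses.

    Assumes the impulses are far enough apart that the response read at one
    impulse location is not contaminated by its neighbours.
    """
    probe_idx = np.asarray(probe_idx)
    impulses = np.zeros(n)
    impulses[probe_idx] = 1.0
    response = apply_hessian(impulses)
    # The response at each impulse location approximates the diagonal there.
    samples = response[probe_idx]
    # Linear interpolation fills in the diagonal between impulse locations.
    return np.interp(np.arange(n), probe_idx, samples)


def precondition(gradient, diag, eps=1e-3):
    """Scale a gradient by a stabilized inverse of the estimated diagonal."""
    return gradient / (diag + eps * np.max(np.abs(diag)))


# Toy check: a tridiagonal stand-in for the Gauss-Newton Hessian.
n = 51
true_diag = 1.0 + 0.1 * np.arange(n)
H = np.diag(true_diag) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
diag_est = estimate_diagonal(lambda m: H @ m, n, probe_idx=np.arange(0, n, 10))
scaled = precondition(np.ones(n), diag_est)
```

In this toy case the diagonal varies linearly and the operator bandwidth is smaller than the impulse spacing, so the interpolated estimate matches the true diagonal. In a realistic setting both conditions hold only approximately.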
© 2017 SEG Page 1656

Figure 1: True subsurface model. Left: Compressional-wave velocity. Central: shear-wave velocity. Right: Subsurface density.
Figure 2: Starting FWI model created by smoothing the true model shown in Figure 1. Left panel: compressional-wave velocity. Central panel: shear-wave velocity. Right
panel: subsurface density.
Figure 4: Inverted FWI model comparison. Top panels: inverted model parameters without preconditioning. Bottom panels: inverted model parameters using approximated
Gauss-Newton Hessian inverse.
Figure 5: From left to right, relative objective function comparison between CG (red curve) and preconditioned CG (blue curve) for maximum frequency content of 5, 10, 15,
20 Hz used during the inversion.

EDITED REFERENCES
REFERENCES
Albertin, U., P. Shen, A. Sekar, T. Johnsen, C. Wu, K. Nihei, and K. Bube, 2016, 3D orthorhombic elastic
full-waveform inversion in the reflection domain from hydrophone data: 86th Annual
Alves, G., and B. Biondi, 2016, Imaging condition for elastic reverse time migration: 86th Annual
Clapp, R., 2014, Synthetic model building using a simplified basin modeling approach: SEP-Report, 155,
143–150.
Fichtner, A., 2010, Full seismic waveform modelling and inversion: Springer.
Fletcher, R., and C. M. Reeves, 1964, Function minimization by conjugate gradients: The Computer
Journal, 7, 149–154, https://doi.org/10.1093/comjnl/7.2.149.
Korta, N., A. Fichtner, and V. Sallarecs, 2013, Block-diagonal approximate hessian for preconditioning in
full waveform inversion: 75th Annual International Conference and Exhibition, EAGE, Extended
Abstracts, https://doi.org/10.3997/2214-4609.20130604.
practice: The Leading Edge, 32, 1040–1054, https://doi.org/10.1190/tle32091040.1.
Tang, Y., and S. Lee, 2010, Preconditioning full waveform inversion with phase-encoded hessian: 80th
https://doi.org/10.1190/1.3513023.
Tang, Y., and S. Lee, 2015, Multi-parameter full wavefield inversion using non-stationary point-spread
functions: 85th Annual International Meeting, SEG, Expanded Abstracts, 1111–1115,
1259–1266, https://doi.org/10.1190/1.1441754.
Virieux, J., 1986, P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference
method: Geophysics, 51, 889–901, https://doi.org/10.1190/1.1442147.
© 2017 SEG Page 1658

Waveform inversion in acoustic orthorhombic media with a practical set of parameters
Nabil Masmoudi∗ and Tariq Alkhalifah, King Abdullah University of Science and Technology
SUMMARY

Full-waveform inversion (FWI) in anisotropic media is challenging overall, mainly because of the large computational cost, especially in 3D, and the potential trade-offs between the model parameters needed to describe such media. We propose an efficient 3D FWI implementation for orthorhombic anisotropy under the acoustic assumption. Our modeling is based on solving the pseudo-differential orthorhombic wave equation split into a differential operator and a scalar one. The modeling is computationally efficient and free of shear wave artifacts. Using the adjoint state method, we derive the gradients with respect to a practical set of parameters describing the acoustic orthorhombic model, made of one velocity and five dimensionless parameters. This parameterization allows us to use a multi-stage model inversion strategy based on the continuity of the scattering potential of the parameters as we go from higher symmetry anisotropy to lower ones. We apply the proposed approach on a modified SEG-EAGE overthrust synthetic model. The quality of the inverted model suggests that we may recover only 4 parameters, with different resolution scales depending on the scattering potential of these parameters.

INTRODUCTION

The objective of full-waveform inversion (FWI) is to recover a high-resolution model that is capable of matching the observed seismic data, trace by trace, through repetitive modeling and a local optimization technique (Lailly, 1983; Tarantola, 1984). Over the last three decades, most FWI methods were designed to recover only P-wave velocity because of the high computational cost. The recent progress in high-performance computing and the improvement in data acquisition resulted in the extension of FWI to 2D and 3D acoustic and elastic vertical transverse isotropic (VTI) media (e.g., Operto et al., 2014; Wu and Alkhalifah, 2016). This extension has also resulted in challenges related to the trade-off between the model parameters in our inversion process.

The orthorhombic model is usually regarded as the most practical realistic approximation of the subsurface, as it combines both the anisotropy admitted by the natural, mostly horizontal layering of the Earth (due to gravity), as well as the vertically aligned fractures usually found in fractured reservoirs (Schoenberg and Helbig, 1997; Tsvankin, 1997; Bakulin et al., 2000). For seismic data exhibiting azimuthal anisotropy, we expect FWI to perform better with an orthorhombic model representation. Recently, some approaches of waveform inversion for elastic and acoustic orthorhombic models have been proposed (Albertin et al., 2016; Oh and Alkhalifah, 2016; Wang and Tsvankin, 2016).

Here, we focus on acoustic orthorhombic models, described by six parameters (Tsvankin, 1997; Alkhalifah, 2003). These parameters include the P-wave vertical velocity $V_{P0}$, the parameters $\varepsilon_1$ and $\delta_1$ defined in the (x, z) vertical plane, $\varepsilon_2$ and $\delta_2$ defined in the (y, z) vertical plane, and $\delta_3$ defined in the (x, y) horizontal plane. The trade-off between the model parameters and their resolution limits, due mainly to the limited data coverage and the different influence that parameters exert on seismic data, have motivated several studies investigating the optimal parameterization for our inversion schemes (Operto et al., 2013; Gholami et al., 2013; Alkhalifah and Plessix, 2014; Alkhalifah, 2016). Most of these studies rely on the analysis of the radiation (scattering) patterns based on the Born approximation (Wu and Aki, 1985; Panning et al., 2009). In this study, we use the parameter set of Masmoudi and Alkhalifah (2016a), built around a central parameter: the horizontal velocity $v_{h1}$ in the (x, z) plane, two VTI parameters $\varepsilon_1$ and $\eta_1$ in the (x, z) plane, two deviation parameters $\varepsilon_d = (\varepsilon_2 - \varepsilon_1)/(1 + 2\varepsilon_1)$ and $\eta_d = (\eta_2 - \eta_1)/(1 + 2\eta_1)$, and finally the parameter $\delta_3$. Here, $\eta_1$ and $\eta_2$ are the anellipticity parameters defined respectively in the (x, z) and (y, z) planes. The main features of this parameterization are discussed in Alkhalifah et al. (2016).

The efficiency of wave simulators is a vital ingredient in FWI. Several methods have been proposed to model P-wave propagation in orthorhombic media, among them solving coupled systems of equations (Fowler and King, 2011; Cheng and Kang, 2014) and the mixed domain wavefield extrapolators (Fowler and Lapilli, 2012; Fomel et al., 2013). One common issue in the coupled system approaches is the existence of shear wave artifacts in the simulated wavefield. Alkhalifah (2000) proposed a pseudo-acoustic wave equation which provides accurate simulation of the kinematics of P-wave propagation. Xu and Zhou (2014) proposed to solve this pseudo-acoustic wave equation by decomposing the pseudo-differential operator into a scalar and a differential operator. It has been shown that such an approach can yield accurate simulation of P-wave kinematics and a wavefield free of shear wave artifacts. This new approach has been successfully applied to VTI migration (Mu et al., 2015) and inversion (Le et al., 2015).

Here, we first apply the decomposition method of Xu and Zhou (2014) to the orthorhombic wave equation of Alkhalifah (2003). Then, we derive the gradients with respect to the model parameters using the adjoint state method. Finally, we test our approach in inverting for the parameters of a modified orthorhombic SEG-EAGE overthrust model, and analyze the resolution of the inverted parameters.

FORWARD MODELING

The acoustic wave equation (Alkhalifah, 2000) for anisotropic media satisfies a linear pseudo-differential equation, which we can write in the following form:

$$\left[\omega^2 - v_{h1}^2\, \phi(\mathbf{x}, \mathbf{k})\right] u(\mathbf{k}, \omega) = 0, \tag{1}$$
where $u(\mathbf{k}, \omega)$ is the pressure wavefield, $\omega$ is the angular frequency, $k$ is the magnitude of the wave vector, $v_{h1}$ is the horizontal velocity in the (x, z) plane and $\phi$ is the pseudo-differential phase operator. In orthorhombic media, the phase operator can be obtained from the dispersion relation using the Christoffel equation (Alkhalifah, 2003). This yields a cubic polynomial, with one of its roots (specifically, the biggest root) corresponding to P-waves. The general form of the polynomial is given below (Song and Alkhalifah, 2013) with the parameterization of Masmoudi and Alkhalifah (2016a):

$$-\phi^3 + a\,\phi^2 + b\,\phi + c = 0, \tag{2}$$

where

$$
\begin{aligned}
a &= k_x^2 + (1 + 2\varepsilon_d)\,k_y^2 + \frac{1}{1 + 2\varepsilon_1}\,k_z^2, \\
b &= \big((1 + 2\delta_3) - (1 + 2\varepsilon_d)\big)\,k_x^2 k_y^2 - \frac{2\eta_1}{(1 + 2\eta_1)(1 + 2\varepsilon_1)}\,k_x^2 k_z^2 \\
  &\quad - \frac{(1 + 2\varepsilon_d)\big((1 + 2\eta_d)(1 + 2\eta_1) - 1\big)}{(1 + 2\eta_d)(1 + 2\eta_1)(1 + 2\varepsilon_1)}\,k_y^2 k_z^2, \\
c &= -\frac{k_x^2 k_y^2 k_z^2}{(1 + 2\varepsilon_1)(1 + 2\eta_1)} \left( (1 + 2\eta_1)(1 + 2\delta_3) - 2\sqrt{\frac{(1 + 2\varepsilon_d)(1 + 2\delta_3)}{1 + 2\eta_d}} + (1 + 2\varepsilon_d)\left(-2\eta_1 + \frac{1}{1 + 2\eta_d}\right) \right).
\end{aligned} \tag{3}
$$

To solve equation 1 in the time-space domain, we propose, similarly to Xu and Zhou (2014), to decompose the pseudo-differential operator into a Laplacian operator $\nabla_E^2$ and a scalar operator $S(\mathbf{x})$ as follows:

$$\partial_{tt} u(\mathbf{x}, t) - v_{h1}^2\, S(\mathbf{x})\, \nabla_E^2 u(\mathbf{x}, t) = 0, \tag{4}$$

where

$$
\begin{aligned}
\nabla_E^2 u(\mathbf{x}, t) &= \partial_{xx} u + (1 + 2\varepsilon_d)\,\partial_{yy} u + \frac{1}{1 + 2\varepsilon_1}\,\partial_{zz} u, \\
S(\mathbf{x}) &= \frac{\phi(\mathbf{x}, \mathbf{n})}{n_x^2 + (1 + 2\varepsilon_d)\,n_y^2 + \frac{1}{1 + 2\varepsilon_1}\,n_z^2},
\end{aligned} \tag{5}
$$

where $\partial_{tt}$ is the second-order time derivative, $\partial_{xx}$, $\partial_{yy}$ and $\partial_{zz}$ are the second-order space derivatives, and $n_x$, $n_y$ and $n_z$ are the components of the unit vector of phase direction $\mathbf{n} = \mathbf{k}/|\mathbf{k}|$. To compute the scalar operator in the space domain, we approximate the phase direction $\mathbf{n}$ from the components of $\nabla u/|\nabla u|$ (Xu and Zhou, 2014). In our decomposition, the ellipticity of the wavefield is taken into account in the Laplacian operator, while the anellipticity is handled in the scalar operator. This elliptic decomposition yields better handling of amplitudes and a more stable solution (Xu et al., 2015).

We compare the accuracy of the wave equation 4 with the low-rank approximation (Song and Alkhalifah, 2013), as well as with the traveltimes from the eikonal solution (Waheed et al., 2014). Figures 1(a) and 1(b) show wavefield snapshots in an orthorhombic medium with constant parameters ($V_{P0} = 2000$ m/s, $\varepsilon_1 = 0.2$, $\varepsilon_2 = 0.4$, $\delta_1 = 0.1$, $\delta_2 = 0.15$, $\delta_3 = 0.05$), which correspond to ($v_{h1} \approx 2366$ m/s, $\varepsilon_1 = 0.2$, $\varepsilon_d \approx 0.143$, $\eta_1 \approx 0.0833$, $\eta_d \approx 0.093$, $\delta_3 = 0.05$) in the parameterization considered here. We note that the new wavefield solution matches the kinematics of the low-rank solution, as confirmed by comparison with the traveltimes from the eikonal equation. Obviously, the new wavefield solution is free of shear wave artifacts. Figure 1(c) shows the corresponding S operator. One can notice that S corrects for the anellipticity of the wavefield, since its maximum influence resides at the 45° propagation direction and it is equal to 1 at the principal axes of symmetry.

Figure 1: Wavefield snapshots computed with the low-rank method (a), the new modeling (equations 4 and 5) (b) and the corresponding S operator (c). The red dashed contours represent the eikonal solution.

COMPUTING THE GRADIENTS

The objective of FWI is to minimize a certain misfit function. We consider here the classic L2-norm data difference defined as:

$$\chi = \frac{1}{2} \sum_r \int_0^T \left\| u(\mathbf{x}_r, t) - d(\mathbf{x}_r, t) \right\|^2 dt, \tag{6}$$

where $\mathbf{x}_r$ are the receivers' locations, $d(\mathbf{x}_r, t)$ is the observed data and $T$ is the total time. We derive the adjoint wavefield and the gradients of the misfit function 6 using the adjoint state method (Liu and Tromp, 2006; Plessix, 2006). In this case, the adjoint wavefield $\lambda$ satisfies the following wave equation:

$$\frac{1}{v_{h1}^2}\,\partial_{tt} \lambda(\mathbf{x}, t) - S(\mathbf{x})\,\nabla_E^2 \lambda(\mathbf{x}, t) - \nabla_E^2 u(\mathbf{x}, t) \left(\frac{\partial S}{\partial \nabla u}\right)^{T} \nabla \lambda(\mathbf{x}, t) = \sum_r \left[ u(\mathbf{x}, T - t) - d(\mathbf{x}_r, T - t) \right] \delta(\mathbf{x} - \mathbf{x}_r), \tag{7}$$

with the boundary conditions:

$$\lambda(\mathbf{x}, T) = 0, \quad \partial_t \lambda(\mathbf{x}, T) = 0, \quad \lambda|_{\partial\Omega} = 0, \tag{8}$$

where $\delta(\mathbf{x} - \mathbf{x}_r)$ is the Dirac delta function. The gradients with respect to the model parameters $v_{h1}$, $\varepsilon_1$, $\varepsilon_d$, $\eta_1$, $\eta_d$ and $\delta_3$ are:

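$$
\begin{aligned}
\frac{\partial \chi}{\partial v_{h1}} &= \frac{2}{v_{h1}^3} \int_0^T \lambda(\mathbf{x}, T - t)\, \frac{\partial^2 u}{\partial t^2}\, dt, \\
\frac{\partial \chi}{\partial \varepsilon_1} &= -2 \int_0^T \lambda(\mathbf{x}, T - t) \left( S\, \frac{\partial^2 u}{\partial z^2} + \frac{\partial S}{\partial \varepsilon_1}\, \nabla_E^2 u \right) dt, \\
\frac{\partial \chi}{\partial \varepsilon_d} &= 2 \int_0^T \lambda(\mathbf{x}, T - t) \left( S\, \frac{\partial^2 u}{\partial y^2} + \frac{\partial S}{\partial \varepsilon_d}\, \nabla_E^2 u \right) dt, \\
\frac{\partial \chi}{\partial \eta_1} &= \int_0^T \lambda(\mathbf{x}, T - t)\, \frac{\partial S}{\partial \eta_1}\, \nabla_E^2 u\, dt, \\
\frac{\partial \chi}{\partial \eta_d} &= \int_0^T \lambda(\mathbf{x}, T - t)\, \frac{\partial S}{\partial \eta_d}\, \nabla_E^2 u\, dt, \\
\frac{\partial \chi}{\partial \delta_3} &= \int_0^T \lambda(\mathbf{x}, T - t)\, \frac{\partial S}{\partial \delta_3}\, \nabla_E^2 u\, dt.
\end{aligned} \tag{9}
$$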
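As an illustration of how the first gradient in equation 9 can be evaluated once both wavefields are stored, the following minimal sketch discretizes the time integral for $v_{h1}$ with a central second difference and a simple Riemann sum. The array layout, the function names and the monochromatic toy wavefields are assumptions made for the example and do not describe the implementation used in this study.

```python
import numpy as np


def second_time_derivative(u, dt):
    """Central second difference along the time axis (axis 0); ends stay zero."""
    d2u = np.zeros_like(u)
    d2u[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dt**2
    return d2u


def gradient_vh1(u, lam, vh1, dt):
    """Discretize (2 / vh1^3) * integral of lam(x, T - t) * d2u/dt2 over t.

    u[it, ix]   : forward wavefield at time it * dt
    lam[it, ix] : adjoint wavefield at time it * dt
    vh1         : horizontal velocity (scalar or array of shape [ix])
    """
    lam_reversed = lam[::-1]  # lam(x, T - t) sampled on the forward time axis
    integrand = lam_reversed * second_time_derivative(u, dt)
    return (2.0 / vh1**3) * integrand.sum(axis=0) * dt


# Toy check with a monochromatic field at a single grid point.
dt, nt, freq, vh1 = 1e-3, 2001, 5.0, 2.0
t = np.arange(nt) * dt
omega = 2.0 * np.pi * freq
u = np.sin(omega * t)[:, None]
lam = np.sin(omega * (t[-1] - t))[:, None]  # so that lam(T - t) = sin(omega * t)
grad = gradient_vh1(u, lam, vh1, dt)
expected = (2.0 / vh1**3) * (-omega**2 * t[-1] / 2.0)
```

For this toy input the integral has a closed form, so `grad` can be compared with `expected` up to the finite-difference error. The remaining gradients in equation 9 follow the same pattern with the corresponding spatial derivatives and derivatives of S.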
The partial derivatives of S involved in equations 7 and 9 are
not easy to obtain from the cubic polynomial 2. Therefore, we
approximate its solution using a Taylor expansion. Specifically,
by assuming the anisotropy parameters are relatively small, we
expand the solution of φ in terms of ε1 , εd , η1 , ηd and δ3 as
follows:
$$\phi \approx \phi_0 + \phi_{\varepsilon_d}\,\varepsilon_d + \phi_{\varepsilon_1}\,\varepsilon_1 + \phi_{\eta_1}\,\eta_1 + \phi_{\eta_d}\,\eta_d + \phi_{\delta_3}\,\delta_3, \tag{10}$$

where $\phi_{\varepsilon_1}$, $\phi_{\varepsilon_d}$, $\phi_{\eta_1}$, $\phi_{\eta_d}$ and $\phi_{\delta_3}$ are the coefficients in this expansion. By substituting equation 10 into polynomial 2, we solve for the coefficients of the trial solutions corresponding to the P-wave phase velocity, and obtain an approximation of $\phi$:

$$\phi \approx 1 + 2k_y^2 (k_y^2 + k_z^2)\,\varepsilon_d - 2k_z^2\,\varepsilon_1 - 2k_z^2 (k_x^2 + k_y^2)\,\eta_1 + 2k_y^2 k_z^2\,\eta_d + 2k_x^2 k_y^2\,\delta_3. \tag{11}$$

We use a preconditioned non-linear conjugate gradient method to update the model. The preconditioner is the diagonal of the approximate Hessian $H_a$ (Shin et al., 2001), referred to as the pseudo-Hessian:

$$p_i = \frac{1}{\mathrm{diag}(H_{a_i})}\, g_i, \tag{12}$$

where $g_i$ and $p_i$ are respectively the conjugate and the preconditioned conjugate gradients corresponding to the i-th parameter. We estimate the steplengths using a second-order approximation of the objective function (Hu et al., 2011). In the multi-parameter case, this requires solving a linear system of equations of the form:

$$\left[ (F p_i)^T (F p_j) \right] \alpha_j = -g_i^T p_i, \tag{13}$$

where $F$ is the Frechet derivative and $\alpha$ represents the vector of steplengths. The effect of the Frechet derivative on the gradients is computed using a first-order finite difference approximation (Pica et al., 1990), and requires one extra modeling for each parameter.

NUMERICAL EXAMPLES

We test our inversion algorithm on a modified version of an SEG-EAGE orthorhombic overthrust model (see Figure 2). The inversion domain is 8 km by 6 km laterally and 4 km in depth. The domain is discretized using a 25 m grid spacing in all three dimensions. We employ 60 sources distributed on the surface of the inversion domain in three lines at y = {1.5 km, 3 km, 4.5 km}. We record the data on the surface from receivers placed on all nodal points of the inversion domain. We invert data filtered between 2 Hz and 8 Hz using the starting model parameters shown in Figure 3.

Figure 2: Orthorhombic model parameters obtained from the SEG-EAGE overthrust model.

We use a multi-stage model inversion strategy: we update first the isotropic model given by $v_{h1}$ (20 iterations), then the VTI model by including $\eta_1$ and $\varepsilon_1$ (5 iterations), and finally the orthorhombic model by including $\varepsilon_d$, $\eta_d$ and $\delta_3$ (5 iterations). This inversion strategy takes advantage of the practical parameterization that allows for a continuity in the scattering potential of the model parameters as we move from higher symmetry anisotropy to lower ones (Masmoudi and Alkhalifah, 2016b). Figure 4 shows the inverted model. Vertical profiles located at (x = 4 km, y = 3 km) comparing initial, true and inverted $v_{h1}$, $\varepsilon_1$, $\varepsilon_d$ and $\delta_3$ are shown in Figure 5. Our inversion strategy leads to a very well recovered horizontal velocity. The parameter $\varepsilon_1$, which affects the data mainly at narrow opening angles, is generally well recovered here as well. In general, $\varepsilon_1$ can help in fitting the data amplitudes at near offsets. Therefore, when real data are inverted, $\varepsilon_1$ should help in fitting the amplitude mismatch caused by the acoustic approximation. Moreover, $\varepsilon_d$ and $\delta_3$, which affect large-offset data in the crossline and in the 45° source-to-receiver azimuth directions, are only recovered in the shallow part (up to 1 km depth). This is due to the limited offset range used here, over which these parameters exert the most influence. Finally, $\eta_1$ and $\eta_d$ are not recovered, as predicted by some previous studies pinpointing the weak influence of $\eta$ on surface seismic data as a scatterer in this parameterization (Alkhalifah, 2016; Alkhalifah and Guitton, 2016). However, $\eta$ is still important in defining the wave propagation and should be extracted from migration velocity analysis (MVA) methods.
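For reference, the mapping from the Tsvankin-style parameters to the practical parameter set discussed above can be sketched in a few lines. The sketch assumes the standard anellipticity definition $\eta = (\varepsilon - \delta)/(1 + 2\delta)$ and the horizontal velocity $v_{h1} = V_{P0}\sqrt{1 + 2\varepsilon_1}$, neither of which is written out explicitly in this abstract. The function name and the dictionary keys are hypothetical. The input values are those quoted for the constant-parameter test of Figure 1.

```python
import math


def practical_parameters(vp0, eps1, eps2, delta1, delta2, delta3):
    """Map (VP0, eps1, eps2, delta1, delta2, delta3) to (vh1, eps1, epsd, eta1, etad, delta3)."""
    # Standard anellipticity definition, assumed here: eta = (eps - delta) / (1 + 2 delta).
    eta1 = (eps1 - delta1) / (1.0 + 2.0 * delta1)
    eta2 = (eps2 - delta2) / (1.0 + 2.0 * delta2)
    return {
        # Horizontal velocity in the (x, z) plane, assumed as VP0 * sqrt(1 + 2 eps1).
        "vh1": vp0 * math.sqrt(1.0 + 2.0 * eps1),
        "eps1": eps1,
        # Deviation parameters as defined in the Introduction.
        "epsd": (eps2 - eps1) / (1.0 + 2.0 * eps1),
        "eta1": eta1,
        "etad": (eta2 - eta1) / (1.0 + 2.0 * eta1),
        "delta3": delta3,
    }


# Constant-parameter medium quoted for the Figure 1 test.
params = practical_parameters(2000.0, 0.2, 0.4, 0.1, 0.15, 0.05)
```

Under these assumptions the computed values agree, to the quoted precision, with the converted parameters listed for the Figure 1 test.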
Figure 3: Initial model parameters for the inversion.

Figure 4: Inverted model parameters.

Figure 5: Vertical profiles located at (x = 4 km, y = 3 km) comparing initial, true and inverted $v_{h1}$, $\varepsilon_1$, $\varepsilon_d$ and $\delta_3$.

CONCLUSION

We proposed a waveform inversion strategy to update the model parameters of an acoustic orthorhombic medium. Decomposing the pseudo-differential wave equation into two solvable terms yielded an efficient wavefield simulation. The choice of a practical parameterization allowed us to invert the main parameters the data are sensitive to, specifically 4 parameters: $v_{h1}$, $\varepsilon_1$, $\varepsilon_d$ and $\delta_3$. The resolution of the inverted parameters is consistent with their scattering potential given by their radiation patterns. The model update was based on a preconditioned non-linear conjugate gradient method and steplengths computed from a second-order approximation of the objective function. Additional gradient preconditioning, such as the truncated Newton method, might help improve the quality of the inverted model.

ACKNOWLEDGMENTS

We would like to thank KAUST for financial support and SWAG members for many useful discussions. For computer time, this research used the resources of the Supercomputing Laboratory in KAUST.

REFERENCES
Albertin, U., P. Shen, A. Sekar, T. Johnsen, C. Wu, K. Nihei, and K. Bube, 2016, 3D orthorhombic elastic
full-waveform inversion in the reflection domain from hydrophone data: 86th Annual
International Meeting, SEG, Expanded Abstracts, 1094–1098, http://doi.org/10.1190/segam2016-
13866375.1.
Alkhalifah, T., 2000, An acoustic wave equation for anisotropic media: Geophysics, 65, 1239–1250,
http://doi.org/10.1190/1.1444815.
Alkhalifah, T., 2003, An acoustic wave equation for orthorhombic anisotropy: Geophysics, 68, 1169–
1172, http://doi.org/10.1190/1.1598109.
Alkhalifah, T., 2016, Research note: Insights into the data dependency on anisotropy: an inversion
prospective: Geophysical Prospecting, 64, 505–513, http://doi.org/10.1111/gpr.2016.64.issue-2.
Alkhalifah, T., and A. Guitton, 2016, An optimal parameterization for full waveform inversion in
anisotropic media: 78th Annual International Conference and Exhibition, EAGE, Extended
Abstracts, http://doi.org/10.3997/2214-4609.201601192.
Alkhalifah, T., N. Masmoudi, and J.-W. Oh, 2016, A recipe for practical full-waveform inversion in
orthorhombic anisotropy: The Leading Edge, 35, 1076–1083,
http://doi.org/10.1190/tle35121076.1.
Alkhalifah, T., and R. E. Plessix, 2014, A recipe for practical full-waveform inversion in anisotropic
http://doi.org/10.1190/geo2013-0366.1.
Bakulin, A., V. Grechka, and I. Tsvankin, 2000, Estimation of fracture parameters from reflection seismic
data-Part II: Fractured models with orthorhombic symmetry: Geophysics, 65, 1803,
http://doi.org/10.1190/1.1444864.
Cheng, J., and W. Kang, 2014, Simulating propagation of separated wave modes in general anisotropic
media, Part I: qP-wave propagators: Geophysics, 79, no. 1, C1–C18,
http://doi.org/10.1190/geo2012-0504.1.
Fomel, S., L. Ying, and X. Song, 2013, Seismic wave extrapolation using lowrank symbol approximation:
Geophysical Prospecting, 61, 526–536, http://doi.org/10.1111/gpr.2013.61.issue-3.
Fowler, P. J., and R. King, 2011, Modeling and reverse time migration of orthorhombic pseudoacoustic P
waves: 81st Annual International Meeting, SEG, Expanded Abstracts, 190–195,
http://doi.org/10.1190/1.3627580.
Fowler, P. J., and C. Lapilli, 2012, Generalized pseudospectral methods for orthorhombic modeling and
reverse-time migration: 82nd Annual International Meeting, SEG, Expanded Abstracts, 1–5,
Gholami, Y., R. Brossier, S. Operto, A. Ribodetti, and J. Virieux, 2013, Which parameterization is
suitable for acoustic vertical transverse isotropic media full waveform inversion? Part 1:
Sensitivity and trade-off analysis: Geophysics, 78, no. 2, R81–R105,
http://doi.org/10.1190/geo2012-0204.1.
Hu, W., A. Abubakar, T. Habashy, and J. Liu, 2011, Preconditioned non-linear conjugate gradient method
for frequency domain full-waveform seismic inversion: Geophysical Prospecting, 59, 477–491,
http://doi.org/10.1111/gpr.2011.59.issue-3.

Inverse Scattering, Theory and Application, Society of Industrial and Applied Mathematics,
Le, H., B. Biondi, R. G. Clapp, and S. A. Levin, 2015, Using a nonlinear acoustic wave equation for
anisotropic inversion: 85th Annual International Meeting, SEG, Expanded Abstracts, 467–471,
Liu, Q., and J. Tromp, 2006, Finite-frequency kernels based on adjoint methods: Bulletin of the
Seismological Society of America, 96, 2383–2397, http://doi.org/10.1785/0120060041.
Masmoudi, N., and T. Alkhalifah, 2016a, A new parameterization for waveform inversion in acoustic
orthorhombic media: Geophysics, 81, no. 4, R157–R171, http://doi.org/10.1190/geo2015-0635.1.
Masmoudi, N., and T. Alkhalifah, 2016b, Scattering potential of acoustic orthorhombic parametrization -
An inversion prospective: 78th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, http://doi.org/10.3997/2214-4609.201601010.
Mu, J., B. Tang, S. Xu, H. Zhou, and A. DeNosaquo, 2015, Quasi-p wave reverse time migration on seam
dataset: 85th Annual International Meeting, SEG, Expanded Abstracts, 3991–3995,
Oh, J.-W., and T. Alkhalifah, 2016, 3D elastic-orthorhombic anisotropic full-waveform inversion:
Application to field OBC data: 86th Annual International Meeting, SEG, Expanded Abstracts,
1206–1210, http://doi.org/10.1190/segam2016-13862511.1.
Operto, S., R. Brossier, L. Combe, L. Métivier, A. Ribodetti, and J. Virieux, 2014, Computationally
efficient three-dimensional acoustic finite-difference frequency-domain seismic modeling in
vertical transversely isotropic media with sparse direct solver: Geophysics, 79, no. 5, T257–T275,
http://doi.org/10.1190/geo2013-0478.1.
Panning, M. P., Y. Capdeville, and B. A. Romanowicz, 2009, Seismic waveform modelling in a 3-D
Earth using the Born approximation: Potential shortcomings and a remedy: Geophysical Journal
International, 177, 161–178, http://doi.org/10.1111/gji.2009.177.issue-1.
Pica, A., J. P. Diet, and A. Tarantola, 1990, Nonlinear inversion of seismic reflection data in a laterally
invariant medium: Geophysics, 55, 284–292, http://doi.org/10.1190/1.1442836.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
Schoenberg, M., and K. Helbig, 1997, Orthorhombic media: Modeling elastic wave behavior in a
vertically fractured earth: Geophysics, 62, 1954–1974, http://doi.org/10.1190/1.1444297.
Shin, C., S. Jang, and D. J. Min, 2001, Improved amplitude preservation for prestack depth migration by
inverse scattering theory: Geophysical Prospecting, 49, 592–606, http://doi.org/10.1046/j.1365-
2478.2001.00279.x.
Song, X., and T. Alkhalifah, 2013, Modeling of pseudoacoustic p-waves in orthorhombic media with a
low-rank approximation: Geophysics, 78, no. 4, C33–C40, http://doi.org/10.1190/geo2012-
0144.1.
1259–1266, http://doi.org/10.1190/1.1441754.
Tsvankin, I., 1997, Anisotropic parameters and P-wave velocity for orthorhombic media: Geophysics, 62,
1292–1309, http://doi.org/10.1190/1.1444231.

Waheed, U., C. E. Yarman, and G. Flagg, 2014, An iterative fast sweeping based eikonal solver for tilted
orthorhombic media: 84th Annual International Meeting, SEG, Expanded Abstracts, 480–485,
Wang, H., and I. Tsvankin, 2016, Feasibility of waveform inversion in acoustic orthorhombic media: 86th
Wu, R., and K. Aki, 1985, Scattering characteristics of elastic waves by an elastic heterogeneity:
Geophysics, 50, 582–595, http://doi.org/10.1190/1.1441934.
Wu, Z., and T. Alkhalifah, 2016, Waveform inversion for acoustic VTI media in frequency domain: 86th
Xu, S., B. Tang, J. Mu, and H. Zhou, 2015, Elliptic decomposition of quasi-P wave equation: 77th Annual
International Conference and Exhibition, EAGE, Extended Abstracts,
http://doi.org/10.3997/2214-4609.201413134.
Xu, S., and H. Zhou, 2014, Accurate simulations of pure quasip-waves in complex anisotropic media:
Geophysics, 79, no. 6, T341–T348, http://doi.org/10.1190/geo2014-0242.1.

Comparison of model resolution matrix in Laplace-Fourier-domain and frequency-domain
Hyojoon Jin, Jungmin Kwon*, and Changsoo Shin, Department of Energy Systems Engineering, Seoul National
University
Summary

Laplace-Fourier full waveform inversion is a powerful technique for constructing a background or medium P-wave velocity model compared to conventional frequency-domain FWI. Of course, if the frequency components of the observed data contain a full band, the conventional frequency-domain FWI also has no problem in constructing the P-wave velocity model. However, there are limitations in constructing a good P-wave velocity model because the data obtained in practice are band-limited and lack low-frequency information. In this paper, we analyze how the Laplace-Fourier FWI using complex frequencies is a better technique than the conventional frequency-domain FWI in constructing the P-wave velocity model. In addition, we propose that a P-wave velocity model with better resolution can be generated by using the Gauss-Newton method instead of the gradient-based methods which have been used in many Laplace and Laplace-Fourier domain FWI studies.

Introduction

It is important to estimate the correct subsurface P-wave velocity model to image oil and gas reservoirs. In order to obtain the P-wave velocity model, many oil and gas companies are still constructing velocity models using conventional methods such as semblance or travel-time tomography. However, these conventional methods have the disadvantage of providing only a low resolution velocity model. On the other hand, full waveform inversion (FWI) is a method of constructing a velocity model using all waveforms based on wave equations (Tarantola, 1984, 1986; Pratt et al., 1998; Virieux and Operto, 2009; Warner et al., 2013), and FWI has the advantage of being able to construct a high resolution velocity model. FWI is a method to solve the inverse problem in a way that minimizes the difference between the acquired seismic data and the simulated seismic data. Because of the nonlinearity of the inverse problem, FWI is very sensitive to the initial model. It is difficult to construct a desired velocity model by using a velocity model that is far from the true model as the initial velocity model for FWI. Also, if the acquired data lack low frequency components, the waveform inversion results are difficult to converge to the true model.

To overcome these problems, many researchers developed robust algorithms to create a good starting model for FWI (Brenders et al., 2008; Shin and Cha, 2008; Plessix et al., 2010; Sirgue et al., 2009; Choi and Alkhalifah, 2013). Among the above algorithms, waveform inversion in the Laplace-domain proposed by Shin and Cha (2008) is known to be able to make an initial model very effectively. Furthermore, they suggested the concept of the Laplace-Fourier FWI using complex frequency, which is a robust algorithm for constructing a background or medium velocity model without low frequency components (Shin and Cha, 2009). However, previous studies have not analyzed whether the Laplace-Fourier FWI can build a more accurate P-wave velocity model than the frequency-domain FWI, even when no low frequency components are used.

So, in this study, we have confirmed through model resolution matrix analysis that the Laplace-Fourier FWI can construct better results than the frequency-domain FWI even in the absence of low frequencies. In addition, numerical tests for the synthetic P-wave velocity model (BP model) were conducted to confirm the validity of the model resolution matrix analysis in waveform inversion.

Review of Laplace-Fourier FWI

The basic process of Laplace-Fourier FWI is the same as that of the conventional frequency-domain waveform inversion. The difference is that the conventional frequency-domain waveform inversion uses the Fourier transform to transform the observed data and generate the modeled data in the frequency-domain, whereas the Laplace-Fourier FWI uses the Laplace-Fourier transform with a complex frequency instead of the Fourier transform. The time-domain seismic data can be transformed into the Laplace-Fourier-domain by the following equation:

$$\tilde{u}(\sigma) = \int_0^{T_{max}} u(t)\, e^{-\sigma t}\, dt \tag{1}$$

where $T_{max}$ is the maximum recording time of seismic traces, $u(t)$ is the wavefield in time-domain, $\tilde{u}(\sigma)$ is the wavefield in Laplace-Fourier-domain, $\sigma = s + i\omega$ is the complex-valued angular frequency, $s$ is the positive Laplace damping constant, $\omega$ is the angular frequency, and $i$ denotes $\sqrt{-1}$.

The simplified discretized acoustic wave equation in the Laplace-Fourier-domain can be written as (Marfurt 1984):

$$\mathbf{S}\tilde{\mathbf{u}}(\sigma) = \tilde{\mathbf{f}}(\sigma) \tag{2}$$

describes a generalized inverse. The generalized inverse in

where 𝐒𝐒 is the complex impedance matrix and 𝒇𝒇� is the a mixed-determined problems can be expressed as follows:
Laplace-Fourier transformed source vector. In this study,
we consider the constant density and isotropic media. 𝐆𝐆 −g = (𝐆𝐆 † 𝐆𝐆 + ϵ𝐈𝐈)−1 (7)
Laplace-Fourier transformed signals eventually reduce the
late signal and emphasize the early arrival signal relatively. where † denotes the complex conjugate transpose, ϵ is the
Therefore, to fully consider the transformed wavefield damping factor, and 𝐈𝐈 is the identity matrix. The equation
signals, we should use the logarithmic least-squares (6) can be expressed as shown in Figure 1. The relationship
objective function proposed by Min and Shin (2006) between the estimated model update and the model
instead of the general least-squares objective function. The difference is expressed as a resolution matrix. Most inverse
logarithmic objective function in the Laplace-Fourier- problems that arise in practice are neither completely over-
domain can be written as: determined nor under-determined. The mixed-determined
problem is a concept that combines the above two problems.
𝑛𝑛𝑠𝑠 𝑛𝑛𝑟𝑟 𝑇𝑇 ∗
1 �(𝜎𝜎)
𝒖𝒖 �(𝜎𝜎)
𝒖𝒖
𝐸𝐸(𝐦𝐦) = � ln � � ln � � (3)
2 �
𝒅𝒅(𝜎𝜎) � (𝜎𝜎)
𝒅𝒅
𝑖𝑖=1
where $n_s$ and $n_r$ are the number of sources and receivers,
respectively, $\tilde{\mathbf{d}}$ is the observed wavefield vector in the
Laplace-Fourier-domain, $\mathbf{m}$ is the model parameter, $T$
indicates transpose, and $*$ denotes a complex conjugate. To
find the model parameter $\mathbf{m}$ that minimizes the objective
function $E(\mathbf{m})$, the local optimization method is used, and
the model update ∆𝐦𝐦 is obtained as follows: Figure 1. The relationship between the estimated model update and
the model difference is expressed as a resolution matrix (This
figure is a modification of Menke’s book(2012))
∆𝐦𝐦 = − 𝐇𝐇−𝟏𝟏 ∇𝐸𝐸(𝐦𝐦) (4)
If the resolution matrix is an identity matrix, then each
where 𝐇𝐇 is the Hessian matrix, and ∇𝐸𝐸(𝐦𝐦) is the gradient
model parameter is uniquely determined. However, if the
of the objective function. There are various method for
resolution matrix is not an identity matrix, then the
calculating the model update, but in this study, the Hessian
estimates of the model parameters are really weighted
matrix was calculated using the Gauss-Newton method.
averaged of the true model parameters. In other words, the
The gradient of the objective function is calculated using
fact that the resolution matrix is close to the identity matrix
the adjoint state method (Plessix, 2006).
means that the inverse problem of the system can be solved
well. Unfortunately, the waveform inversion problems we
encounter are mixed-determined problems, and in this case
Model resolution matrix
the generalized inverse cannot be the identity matrix.
We assume that the inverse problem we are dealing with
The Dirichlet spread function can be used as a measure of
in waveform inversion is linear through the Born
how similar the resolution matrix is to the identity matrix.
approximation. The linear inverse problem can be
This is based on the size, or spread, of the off-diagonal
expressed in an explicit linearized equation as follows:
elements. The spread function used in this study is as
follows:
𝐆𝐆∆𝐦𝐦 = ∆𝐝𝐝 (5)
𝑀𝑀 𝑀𝑀
where 𝐆𝐆 denotes a data kernel expressed in a Jacobian
spread(𝐑𝐑) = ‖𝐑𝐑 − 𝐈𝐈‖22 = � �[R 𝑖𝑖𝑖𝑖 − 𝛿𝛿𝑖𝑖𝑖𝑖 ]2 (8)
matrix and ∆𝐝𝐝 describes the data residual. Using equation
𝑖𝑖=1 𝑗𝑗=1
(5), the model resolution matrix can be defined by
where 𝑀𝑀 denotes the number of model parameters and 𝛿𝛿
following equation:
describes the dirac delta function. The closer the spread
value is to zero, the closer the resolution matrix is to the
∆𝐦𝐦𝒆𝒆𝒆𝒆𝒆𝒆 = 𝐆𝐆 −g ∆𝐝𝐝𝒐𝒐𝒐𝒐𝒐𝒐 = 𝐆𝐆 −g [𝐆𝐆∆𝐦𝐦𝒕𝒕𝒕𝒕𝒕𝒕𝒕𝒕 ]
identity matrix.
(6)
To compare the performance of the Laplace-Fourier-
= [𝐆𝐆 −g 𝐆𝐆]∆𝐦𝐦𝒕𝒕𝒕𝒕𝒕𝒕𝒕𝒕 = 𝐑𝐑∆𝐦𝐦𝒕𝒕𝒕𝒕𝒕𝒕𝒕𝒕
domain FWI and frequency-domain FWI, we calculated
each model resolution matrix and spread values. The data
where ∆𝐦𝐦𝒆𝒆𝒆𝒆𝒆𝒆 is the estimated model update, ∆𝐦𝐦𝒕𝒕𝒕𝒕𝒕𝒕𝒕𝒕 is the residual in the frequency-domain can be defined as the
model difference, 𝐑𝐑 is the resolution matrix, and 𝐆𝐆 −g

Fourier transform of the model difference, and the gradient can be defined as the inverse Fourier transform of the data residual (Sirgue and Pratt, 2004). In other words, analyzing the transform in each domain (Laplace-Fourier and frequency) allows us to analyze the FWI performance in the corresponding domain.

First, the model resolution matrix for the full-band Fourier transform is almost identical to the identity matrix, as shown in Figure 2, and the spread value is close to zero at 1.682e-26. Second, the resolution matrix for the band-limited Fourier transform (assuming no data below 3 Hz) is shown in Figure 3. Compared with Figure 2, it has non-zero off-diagonal elements, and the spread value rises to 25.0. The frequency-domain FWI can therefore be expected to perform poorly for band-limited data. Third, we calculated the resolution matrix for the Laplace-Fourier transform, again assuming that no data exist below 3 Hz. With only one Laplace damping constant, the resolution matrix (Figure 4) is not significantly different from that obtained by the Fourier transform (Figure 3), and the spread value of 24.99 is also nearly the same. Finally, we calculated the resolution matrix using 10 Laplace damping constants under the same conditions as the third case. This resolution matrix, shown in Figure 5, is much closer to the identity matrix than the one obtained with a single damping constant, and the spread value decreases to 12.39. As the number of Laplace damping constants increases, the resolution matrix becomes closer to the identity matrix because the rank of the Hessian matrix increases. Consequently, the Laplace-Fourier-domain FWI performs better than the frequency-domain FWI in the band-limited situation.

Figure 2. Model resolution matrix of Fourier transform (full-band) and Spread($\mathbf{R}$)=1.682e-26

Figure 3. Model resolution matrix of Fourier transform (band-limited) and Spread($\mathbf{R}$)=25.0

Figure 4. Model resolution matrix of Laplace-Fourier transform (single Laplace damping constant) and Spread($\mathbf{R}$)=24.99

Figure 5. Model resolution matrix of Laplace-Fourier transform (10 Laplace damping constants) and Spread($\mathbf{R}$)=12.39
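The resolution-matrix comparison can be reproduced in miniature. The following is a minimal sketch, not the authors' code: it assumes the data kernel is replaced by a small discrete transform matrix (rows of the form exp(-(σ + 2πif)t)), takes the damped generalized inverse as (G†G + εI)⁻¹G†, and uses hypothetical sizes (`n_model`, `dt`, `k_min`) chosen only for illustration. It will not reproduce the spread values quoted in the figures.

```python
import numpy as np

def transform_kernel(n_model, dt, freq_idx, sigmas=(0.0,)):
    """Rows exp(-(sigma + 2*pi*i*f) t) / sqrt(n_model); sigma = 0 gives a DFT row.

    This toy kernel stands in for the data kernel G. It is an assumption,
    not the wave-equation Jacobian used in the paper.
    """
    t = np.arange(n_model) * dt
    freqs = np.asarray(freq_idx) / (n_model * dt)
    rows = [np.exp(-(s + 2j * np.pi * f) * t) for s in sigmas for f in freqs]
    return np.array(rows) / np.sqrt(n_model)

def resolution_matrix(G, eps=1e-8):
    """R = (G^H G + eps I)^{-1} G^H G, i.e. the damped generalized inverse times G."""
    GhG = G.conj().T @ G
    return np.linalg.solve(GhG + eps * np.eye(GhG.shape[0]), GhG)

def dirichlet_spread(R):
    """Sum of squared magnitudes of the entries of R - I."""
    return float(np.sum(np.abs(R - np.eye(R.shape[0])) ** 2))

n_model, dt, k_min = 64, 0.01, 5   # hypothetical toy sizes
all_idx = np.arange(n_model)
# Band-limited case: drop the k_min lowest frequencies and their
# negative-frequency mirrors.
kept_idx = all_idx[np.minimum(all_idx, n_model - all_idx) >= k_min]

cases = {
    "Fourier, full band": transform_kernel(n_model, dt, all_idx),
    "Fourier, band-limited": transform_kernel(n_model, dt, kept_idx),
    "Laplace-Fourier, 1 damping": transform_kernel(
        n_model, dt, kept_idx, sigmas=(1.0,)),
    "Laplace-Fourier, 10 dampings": transform_kernel(
        n_model, dt, kept_idx, sigmas=np.arange(1.0, 11.0)),
}
spreads = {name: dirichlet_spread(resolution_matrix(G))
           for name, G in cases.items()}
for name, value in spreads.items():
    print(f"{name:30s} spread = {value:.4g}")
```

In this toy setting the full-band Fourier kernel is unitary, so its spread is essentially zero, and the band-limited Fourier spread is approximately the number of discarded frequencies because the resolution matrix is then a projector. Stacking additional damping constants on top of an existing one can only add to G†G, so the spread with ten constants cannot exceed the spread with one, which mirrors the trend reported above.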

Numerical examples
We compared the Laplace-Fourier-domain FWI and the frequency-domain FWI on synthetic data, following the model resolution matrix analysis. The P-wave velocity model used in the experiment is the BP model, illustrated in Figure 6(a). There are 1348 shots at 50 m intervals, 300 receivers at the same interval, and a maximum offset of 15 km. The initial model for waveform inversion is the Laplace-domain FWI result, depicted in Figure 6(b). We used 7 frequencies (from 3 Hz to 6 Hz in steps of 0.5 Hz) for the frequency-domain FWI, with 10 iterations per frequency. The Laplace-Fourier-domain FWI used the same frequencies, but with 10 Laplace damping constants per frequency (from 1 s⁻¹ to 10 s⁻¹ in steps of 1 s⁻¹) and one iteration per complex frequency. The two FWIs therefore have the same computational cost. Figure 6(c) shows the result of the Laplace-Fourier-domain FWI and Figure 6(d) shows the result of the frequency-domain FWI. As can be seen from the inversion results, the Laplace-Fourier-domain FWI performs better than the frequency-domain FWI. The Laplace-Fourier-domain result has not yet converged to the true model, but this can vary depending on how the Laplace-Fourier constants (complex frequencies) are selected. Efficient and accurate determination of the Laplace-Fourier constants is left for future work.
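The cost parity between the two schedules follows from simple counting. The sketch below enumerates the frequencies and complex frequencies described above; the function names and the convention s = σ + 2πif for a complex frequency are assumptions made for illustration, as is the loop order (dampings nested inside frequencies).

```python
import numpy as np

def frequency_list(f_min=3.0, f_max=6.0, df=0.5):
    """Frequencies in Hz from f_min to f_max inclusive, in steps of df."""
    n = int(round((f_max - f_min) / df)) + 1
    return f_min + df * np.arange(n)

def complex_frequencies(freqs, dampings):
    """All (damping, frequency) pairs as s = sigma + i*2*pi*f (assumed convention)."""
    return [complex(sigma, 2.0 * np.pi * f) for f in freqs for sigma in dampings]

freqs = frequency_list()            # 3.0, 3.5, ..., 6.0 Hz
dampings = np.arange(1.0, 11.0)     # ten damping constants, 1 to 10

# Frequency-domain FWI: 10 iterations per frequency.
iterations_frequency_domain = len(freqs) * 10
# Laplace-Fourier-domain FWI: 1 iteration per complex frequency.
iterations_laplace_fourier = len(complex_frequencies(freqs, dampings)) * 1

print(len(freqs), iterations_frequency_domain, iterations_laplace_fourier)
# prints: 7 70 70
```

Both schedules amount to 70 iterations, which is the sense in which the two inversions are compared at equal cost.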
Conclusion
Through this work, we demonstrate, by model resolution matrix analysis and numerical waveform inversion tests, that the Laplace-Fourier-domain FWI outperforms the frequency-domain FWI. The experiments mimic real-world data by omitting the low-frequency components, and they show the importance of complex frequencies for future frequency-domain FWI. Our future work is to study the efficient and accurate determination of the Laplace-Fourier constants.
Acknowledgement
This work was supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning, with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20132510100060).

Figure 6. True BP P-wave velocity model (a), initial P-wave velocity model (b), Laplace-Fourier-domain FWI result (c), and frequency-domain FWI result (d)

REFERENCES
Brenders, A. J., S. Charles, and R. G. Pratt, 2008, Velocity estimation by waveform tomography in the
Canadian Foothills: A synthetic benchmark study: 70th Annual International Conference and
Choi, Y., and T. Alkhalifah, 2013, Frequency-domain waveform inversion using the phase derivative:
Marfurt, K. J., 1984, Accuracy of finite-difference and finite-element modeling of the scalar and elastic
wave equations: Geophysics, 49, 533–549, https://doi.org/10.1190/1.1441689.
Menke, W., 2012, Geophysical data analysis: Discrete inverse theory: Academic press, 45.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with
https://doi.org/10.1111/j.1365-246X.2006.02978.x.
Plessix, R.-E., S. Michelet, H. Rynja, H. Kuehl, C. Perkins, J. W. de Maag, and P. Hatchell, 2010, Some
3D applications of full waveform inversion: 72nd Annual International Conference and
Pratt, R. G., C. Shin, and G. J. Hicks, 1998, Gauss–Newton and full Newton methods in frequency-
space seismic waveform inversion: Geophysical Journal International, 133, 341–362,
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
Shin, C., and D.-J. Min, 2006, Waveform inversion using a logarithmic wavefield: Geophysics, 71, no. 3,
R31–R42, https://doi.org/10.1190/1.2194523.
Shin, C., and Y. H. Cha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
Shin, C., and Y. H. Cha, 2009, Waveform inversion in the Laplace-Fourier domain: Geophysical
Journal International, 177, 1067–1079, https://doi.org/10.1111/j.1365-246X.2009.04102.x.
Sirgue, L., O. I. Barkved, J. P. Van Gestel, O. J. Askim, and J. H. Kommedal, 2009, 3D waveform
inversion on Valhall wide-azimuth OBC: 71st Annual International Conference and Exhibition,
EAGE, Extended Abstracts, https://doi.org/10.3997/2214-4609.201400395.
Sirgue, L., and R. G. Pratt, 2004, Efficient waveform inversion and imaging: A strategy for
selecting temporal frequencies: Geophysics, 69, 231–248, https://doi.org/10.1190/1.1649391.
1259–1266, https://doi.org/10.1190/1.1441754.
1893–1903, https://doi.org/10.1190/1.1442046.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics:
Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V. Vinje, I. Štekl, L. Guasch, C.
Win, G. Conroy, and A. Bertrand, 2013, Anisotropic 3D full-waveform inversion: Geophysics,
78, no. 2, R59–R80, https://doi.org/10.1190/geo2012-0338.1.

Characterizing and mitigating FWI modelling errors due to uncertainty in attenuation physics
Scott Keating and Kristopher A. Innanen, Department of Geoscience, University of Calgary
SUMMARY

A key assumption in seismic FWI is the adequacy of the wave propagation physics model used in simulation and sensitivity calculations. The wide variety of available seismic attenuation and dispersion models makes the risk of modelling errors in QFWI high. We examine the consequences of unknown attenuation physics for QFWI, and propose an alternate updating strategy to alleviate some of them. By relaxing the requirement that the frequency dependence of the assumed attenuation model be self-consistent across the full spectrum, we find significant improvement in the fidelity of models inferred from data obeying one attenuation model using methods that assume another.

INTRODUCTION

Seismic full waveform inversion (FWI) is a technique which attempts to recover subsurface properties by iteratively minimizing a measure of the discrepancy between observed data and modelled data (e.g., Lailly, 1983; Tarantola, 1984; Virieux and Operto, 2009). Multiparameter FWI (Operto et al., 2013; Plessix et al., 2013; Pan et al., 2016), by involving multiple physical properties, offers the potential not only to recover this larger list of properties but also to better match observed data. Broad application of FWI technology in, for instance, reservoir characterization and monitoring, will require methods which are tuned to the multi-parameter problem. Significant challenges remain in bringing multiparameter FWI to the same levels of practicality and sophistication currently occupied by mono-parameter FWI. A particularly pressing issue is the stable inclusion of anelasticity/an-acousticity (e.g., Hicks and Pratt, 2001; Hak and Mulder, 2011; Malinowski et al., 2011; Kamei and Pratt, 2013; Métivier et al., 2015). In this paper we consider one of the more difficult aspects of an-acoustic FWI (hereafter QFWI), the problem of management of modelling error. In a companion paper the interrelation between $V_P$ and $Q_P$ cross-talk, frequency-band selection in multiscale FWI, and efficiency in truncated Newton optimization are also discussed (Keating and Innanen, 2017).

A crucial assumption in FWI is that the wave physics giving rise to the observed data are adequately accounted for in the simulation component of the procedure. If the wave propagation equations miss, or incorrectly model, important features of the data, FWI will seek to match those data features through often dramatically un-physical spatial arrangements of the available model parameters. QFWI is especially prone to modelling errors, because (1) even small changes in the Q model-type can lead to large differences in, for instance, wave velocities at low frequencies, and (2) many model-types exist, and which is suitable in any given instance may not be clear.

Innanen (2016) points out that, ideally, uncertainty in attenuation physics would be managed by being maximally non-committal, framing FWI to solve for a complex, frequency-dependent velocity at each point in space; but that is not possible because seismic data cannot in general constrain this many parameters. On the other hand, while more decisively parameterized models are much more completely constrained by data, choosing one a priori risks serious modelling errors.

In this paper, we formulate frequency-domain QFWI such that a "middle-ground" between the two above extremes is occupied. In other words, we seek a parameterization that is maximally non-committal regarding model-type within the bounds of what can be constrained by seismic data. The idea of relaxing the constraint that the assumed physics be exactly obeyed is investigated by allowing a band-wise frequency-dependence in the recovered model. This increased flexibility offers important benefits when the assumed physical model involves different frequency dependence of wave propagation from that of the true (or at least a different, more appropriate) physical model.

THEORY AND APPROACH

Constant density an-acoustic physics models can be characterized by two parameters: a Q term specifying attenuation, and a term specifying P-wave phase velocity, both of which can be functions of frequency. Many different physics models exist, differing in the frequency dependence of $Q$ and $V_P$ (Ursin and Toverud, 2002). QFWI, like standard FWI, involves an objective function based on least-squares data misfit,

$$\Phi(\mathbf{m}) = \frac{1}{2}\left\|\mathbf{d}_{obs} - \mathbf{d}_{mod}\right\|_2^2, \tag{1}$$

where $\mathbf{d}_{obs}$ and $\mathbf{d}_{mod}$ are, respectively, measured and modelled wavefields evaluated on a measurement surface, and $\mathbf{m}$ is the set of an-acoustic model parameters giving rise to $\mathbf{d}_{mod}$. This objective function is minimized subject to the condition that a prior-defined wave equation is satisfied by these wavefields. In the framework of a frequency domain finite difference approximation, for instance, data are measurements of a field $\mathbf{u}$ which satisfies

$$\mathbf{S}_1(\omega, \mathbf{m})\,\mathbf{u}(\omega) = \mathbf{f}(\omega), \tag{2}$$

where $\mathbf{f}$ is a source term and $\mathbf{S}_1(\omega, \mathbf{m})$ is a matrix that applies a finite difference stencil based on the an-acoustic physics relevant to the problem.

In any FWI problem, but of special concern to QFWI, there exists the possibility that wave propagation in the unknown medium is better represented by

$$\mathbf{S}_2(\omega, \mathbf{m})\,\mathbf{u}(\omega) = \mathbf{f}(\omega), \tag{3}$$

where $\mathbf{S}_2(\omega, \mathbf{m})$ invokes an attenuation model differing from that in $\mathbf{S}_1(\omega, \mathbf{m})$. Low-frequency velocity dispersion, for instance, can vary significantly from one Q model to the next, so the two operators cannot be assumed to be similar. There is therefore considerable concern about the kinds of parameter values that a model belonging to $\mathbf{S}_2$ will require in order to minimize an objective function based on $\mathbf{S}_1$.

We must assume that one of these, say $\mathbf{S}_1$, holds in order to begin the process of inverting the data. This means adopting equation (2) as a constraint. Our approach to managing QFWI modelling errors is to relax this constraint to instead read

$$\mathbf{S}_1(\omega, \mathbf{m}_N)\,\mathbf{u}(\omega) = \mathbf{f}(\omega), \quad \text{for } \omega_N < \omega < \omega_{N+1}, \tag{4}$$

where $\mathbf{m}_N$ is a subsurface model for the angular frequency range $(\omega_N, \omega_{N+1})$. This allows greater freedom in matching the attenuation behaviour of the measured data, because it requires that the assumed physics be satisfied exactly only on a certain frequency band. As the bandwidths $\omega_{N+1} - \omega_N$ decrease, modelling errors within any given band become less significant. Piecewise application of $\mathbf{S}_1$ can, in other words, closely mimic a model belonging to $\mathbf{S}_2$.

The lower limit of this process involves bands containing single frequency components. Because the simultaneous determination of velocity and Q requires several frequencies to be compared (Innanen and Weglein, 2007; Keating and Innanen, 2017), this limit should not in practice be approached. In the QFWI approach we consider here, it is in fact necessary to treat the width of the frequency bands as a trade-off parameter, balancing the suppression of modelling error with the suppression of parameter cross-talk.

KF and SLS models of an-acoustic wave propagation

To study attenuation-based modelling errors in isolation, we formulate a constant density an-acoustic FWI, in which wave propagation is governed by

$$\left[\omega^2 s(\mathbf{r}, \omega) + \nabla^2\right] u(\mathbf{r}, \omega) = f(\mathbf{r}, \omega), \tag{5}$$

where $u$ is the pressure field, $f$ is a source term, and the model parameter $s$ depends on the dispersive velocity and attenuation:

$$s(\mathbf{r}, \omega) \approx \left[c(\mathbf{r}, \omega) - \frac{i c_0(\mathbf{r})}{2Q(\mathbf{r}, \omega)}\right]^{-2}, \tag{6}$$

where $c$ is the phase velocity, $c_0$ is the phase velocity at a reference frequency, and $Q$ is a quality factor.

Many models of attenuation and dispersion exist and are in regular use for processing, imaging and inverting seismic data. They tend to agree in their general reproduction of the amplitude and phase features of dissipating waves, but in their detailed predictions of, e.g., phase velocities at low frequencies they may differ widely. We select as benchmark models the Kolsky-Futterman (KF) nearly constant Q model, and the standard linear solid (SLS) model.

Kolsky-Futterman (KF) model

In certain attenuation models, the quality factor $Q$, defined as

$$\frac{1}{Q(\omega)} = \frac{\Delta E}{2\pi E}, \tag{7}$$

where $E$ and $\Delta E$ are the peak strain energy stored and the strain energy lost during a given cycle (Aki and Richards, 2002), is forced to be constant over a given frequency range. A constant Q in a non-dispersive medium violates causality (Aki and Richards, 2002), so in many models a frequency-dependent Q, which is nearly constant over the range of seismic frequencies, and a dispersion term are adopted. There are many ways to create a function which is nearly constant over the range of seismic frequencies, so there are many different nearly constant Q model types (Ursin and Toverud, 2002; Liu et al., 1976). We select the nearly constant Q model due to Kolsky and Futterman (Kolsky, 1956; Futterman, 1962), hereafter the KF model, in which

$$c(\omega) = c_0\left[1 + \frac{1}{\pi Q}\log\frac{\omega}{\omega_0}\right], \tag{8}$$

where $c(\omega)$ is the wave velocity, $\omega_0$ is a reference frequency and $c_0 = c(\omega_0)$.

Standard linear solid (SLS) model

The standard linear solid (SLS) model is based on viscoelastic considerations, with a constitutive relation that is linear in the stress, the strain, and their derivatives (Casula and Carcione, 1992; Liu et al., 1976). Continua are treated as consisting of a spring and dash-pot in series, in parallel with a second spring. The Q value given by this model is not constant, but is instead given by

$$Q(\omega) = \frac{1 + \omega^2 \tau_\varepsilon \tau_\sigma}{\omega(\tau_\varepsilon - \tau_\sigma)}, \tag{9}$$

where $\tau_\varepsilon$ and $\tau_\sigma$ are relaxation times related to the constants of the effective springs and dash-pot of the model (Casula and Carcione, 1992; Liu et al., 1976). The corresponding attenuation $Q^{-1}$ is sharply peaked at $\omega = \tau^{-1}$, where $\tau = \sqrt{\tau_\varepsilon \tau_\sigma}$. The P-wave phase velocity for this model is given by

$$c(\omega) = c_0\,\frac{\mathrm{Re}\sqrt{\dfrac{1 + i\omega_0\tau_\sigma}{1 + i\omega_0\tau_\varepsilon}}}{\mathrm{Re}\sqrt{\dfrac{1 + i\omega\tau_\sigma}{1 + i\omega\tau_\varepsilon}}}, \tag{10}$$

Many physical processes which could have significant impact on seismic wave attenuation are well modelled by the standard linear solid (Liu et al., 1976). Furthermore it has been pointed out (Liu et al., 1976) that the SLS and KF models are not necessarily at odds with one another. A general standard linear solid can be introduced by considering several standard linear solid systems arranged in parallel. This introduces several relaxation mechanisms, and several attenuation peaks. If the amplitudes and peak frequencies of these individual SLS components are chosen correctly, a general SLS with approximately constant Q over a given bandwidth can be constructed. In this case the dispersive behaviour of the velocity reduces to equation (8) over the nearly constant Q frequency band. In a situation like this a KF-based QFWI procedure would suffer from little modelling error.

Our purpose in this paper is to develop a methodology which limits modelling errors when the QFWI model (e.g., KF) and the actual model operating in the Earth are dissimilar. So, the SLS model considered in the following examples is based on a single spring/dashpot system and does not reduce to KF behaviour. A comparison of KF and SLS Q and P-wave velocity is shown in Figure 1, where the models have the same Q and P-wave velocity at 15 Hz.

Figure 1: Comparison of SLS and KF models for velocity 2500 m/s at reference frequency 15 Hz, and Q=20. Left: Velocity comparison. Right: Attenuation comparison. Note the semilog scale.

Flexible FWI with unknown attenuation physics

The discrepancies between the KF and SLS models illustrated in Figure 1 will have strong negative consequences for a QFWI procedure, if the KF model is assumed and the SLS model (or something like it) actually holds. But the consequences can be significantly reduced if in the QFWI procedure the KF model is not forced to be self-consistent over the full frequency range. The additional flexibility afforded QFWI by imposing the relaxed constraint in equation (4) is illustrated in Figure 2. An example SLS profile for Q and P-wave velocity is illustrated in this figure as a black dashed line, along with the KF model which most closely matches it in blue. Although both models are evaluated using the same parameters, the highly dissimilar frequency-dependence of these parameters in the different physics models means that the matching is very poor. The red line shows the best match which can be obtained using a relaxed KF model, with different parameters on each 1 Hz band. Clearly, this step offers considerable improvement in the ability to match the observed behaviour, despite having assumed physics different from the SLS. Adopting an FWI strategy which allows for this better matching should improve the quality of the results in the case where the true attenuation model is unknown.

While the flexible strategy outlined above in principle has the capacity to match unknown an-acoustic physics, the question of whether a QFWI procedure based on this idea works in practice is settled neither by simply stating it nor by Figure 2. Two significant challenges may present themselves in inversion using this strategy. First, while the overall dispersive character of an ideal recovered model will closely match the true model, these behaviours may differ significantly within the small bands on which the inversion occurs. This means that insofar as the inversion considers the dispersive character of the observed

data, it may lead the estimated model away from the best approximation. The second problem is that it is not straightforward to predict what spatial arrangement of (e.g.) KF model parameters will be settled on by a QFWI procedure, if those parameters vary widely and non-self-consistently over the full spectrum, and whether or not these structures will tend to be realistic. It is difficult to address the impact of these concerns without the aid of synthetic examples, which are considered in the next section.

Figure 2: Comparison of SLS with best fitting KF, and band-defined KF. Due to the highly dissimilar behaviour of the model types, the KF result is a poor approximation of the SLS behaviour. The band-defined KF is capable of matching the SLS behaviour much more closely, though still differs in dispersive behaviour on each band.

NUMERICAL EXAMPLES

In this section, two distinct QFWI approaches are implemented numerically and used to explore the an-acoustic modelling-error suppression strategy. In the first, which will be referred to as 'traditional' QFWI, equation (2) is strictly adhered to, and the updating strategy used is designed to minimize an-acoustic cross-talk, following Keating and Innanen (2017). The second, which will be called 'flexible' QFWI, obeys instead the constraint in equation (4) and uses an updating strategy consistent with this constraint. The QFWI procedure is built around a KF an-acoustic model in both cases. The phase velocity at the reference frequency, $c_0$, and $Q$, are the two unknown parameters to be recovered in all cases. The reference frequency is fixed at 30 Hz.

In order to avoid introducing complicating optimization issues, rather than use the more efficient truncated Newton algorithms, the examples calculated in this paper were chosen to be sufficiently small that exact Gauss-Newton numerical optimization could be employed. Frequency domain finite difference modelling is used for two-dimensional models on a 50 by 50 grid, with grid spacing of 10 m in both the horizontal and vertical dimensions. 24 sources at 20 m intervals are arrayed laterally, at 30 m depth, from 10 m to 470 m. 48 receivers at 10 m intervals are placed at 20 m depth from 10 m to 480 m. Frequencies from 1 Hz to 25 Hz are assumed to be available, and the source function is considered to have a uniform amplitude spectrum over this range. First order Engquist boundary conditions are implemented at every boundary.

Traditional QFWI is applied here with a multiscale approach, considering 24 frequency bands, beginning with 1 Hz-2 Hz, and increasing the upper limit by 1 Hz at each band, up to the final band of 1 Hz-25 Hz. At each stage of the inversion, 6 frequencies spaced evenly from the minimum to the maximum frequency of the band are simultaneously inverted. The frequency band is changed every 2 iterations.

The flexible FWI approach used here uses 12 instances of $f_N$, in 2 Hz steps from 1 Hz to 23 Hz. To recover each $\mathbf{m}_N$, six evenly spaced frequencies from $f_N$ to $f_{N+1}$ were inverted simultaneously. 4 iterations were performed for each $\mathbf{m}_N$, resulting in a total number of iterations equal to the number used in the traditional FWI approach. The $\mathbf{m}_N$ were solved for sequentially, beginning with the smallest $f_N$ and then increasing. The initial model used for $\mathbf{m}_1$ was identical to that used in the traditional FWI approach. The initial model for every other $\mathbf{m}_N$ was set equal to the final $\mathbf{m}_{N-1}$.

For the first example, the model in Figure 3, and KF an-acoustic physics, are used to generate the synthetic observed data. The initial model is a uniform velocity of 2500 m/s and uniform $Q^{-1}$ of 0, matching the background of the true model. The QFWI procedure assumes (in this case, correctly) a KF an-acoustic model. The result of traditional QFWI with an exact Gauss-Newton optimization is illustrated in Figure 4, where the recovered velocity at reference frequency and Q are shown. This result acts as a kind of benchmark, reflecting the ideal case of a simple model, dense acquisition and exact Gauss-Newton numerical optimization.

Figure 3: True model velocity at reference frequency 30 Hz (left) and reciprocal Q (right) for KF model type.

Figure 4: Final result of conventional QFWI using the correct KF model type.

The result of applying the flexible QFWI for two example bands is illustrated in Figures 5 and 6. Results comparable to the benchmark are obtained here; however, comparison of the results generated using different bands makes clear that variance in the recovered model parameters is introduced from band to band. The left panel of Figure 6 suggests that cross-talk issues can appear for certain frequency bands, and that therefore the issues discussed in this paper and those discussed by Keating and Innanen (2017) are not independent. This suggests that self-consistency of the an-acoustic model across the full frequency range is optimal, if the correct an-acoustic model type is well established in advance.

Figure 5: Final result of flexible QFWI approach with a 13-15 Hz maximum band, using the correct KF model type.

In a second example the central problem of the current study, in which

the assumed attenuation model is incorrect, is explored. The model shown at example frequencies in Figures 7 and 8 was used to generate the data. In this model, an SLS model was used, with a peak $Q^{-1}$ at 15 Hz. The acquisition geometry and FWI strategy are unchanged from the previous example. The peak reciprocal Q is appreciably higher than for the KF model, in order to introduce non-negligible attenuation away from the peak.

Figure 6: Final result of flexible QFWI approach with a 23-25 Hz maximum band, using the correct KF model type.

Figure 7: True model velocity (left) and reciprocal Q (right) at 15 Hz for SLS attenuation physics.

Figure 8: True model velocity (left) and reciprocal Q (right) at 25 Hz for SLS attenuation physics.

First we examine the result of applying to the SLS-type data a traditional QFWI procedure in which the KF an-acoustic model is assumed to hold. The results are illustrated in Figure 9. Evidently QFWI fails to recover a meaningful velocity or Q model, despite benefiting from the simple model geometry, dense acquisition and powerful numerical optimization which allowed for strong recovery in the previous example. This highlights the hazards associated with uncertainty in the QFWI attenuation model, adding further incentive for a flexible approach.

Figure 9: Final result of conventional QFWI; KF inversion carried out on SLS data.

Results produced by applying the flexible QFWI approach, based on the KF model, again on SLS data, are shown for two example bands in Figures 10 and 11, where the recovered velocity at the example frequency and Q are shown. These example results correspond to the true model at the frequencies shown in Figures 7 and 8, respectively. The less restrictive constraints used in this approach allow for a significantly improved recovery of the true model behaviour. The recovered models effectively identify the position and shape of the anomalies. Generally our survey of the modelling error problem as summarized here supports the use of a flexible QFWI strategy of the type we present here.

Figure 10: Final result of flexible band approach for 13-15 Hz maximum band; KF inversion carried out on SLS data.

Figure 11: Final result of flexible band approach for 23-25 Hz maximum band; KF inversion carried out on SLS data.

The computational cost of the two QFWI approaches is identical, the greater number of models recovered in the flexible approach being offset by the smaller number of iterations used to invert for each. The reason for this similarity is that the flexible approach can be interpreted as an alternative multiscale strategy in conventional FWI, with the caveat that the final result is an approximation of the model behaviour only within the highest frequency band considered, and that the intermediate steps themselves provide an estimate of the model behaviour at their respective frequency ranges.

CONCLUSIONS

The inclusion of attenuation in seismic FWI offers the potential for improved recovery of subsurface parameters of interest, but presents unique risks associated with modelling error. The flexible QFWI approach suggested here relaxes the FWI constraint that the modelled wavefield strictly obeys an assumed physics model across all experimental variables. This allowed for significant improvements over traditional FWI strategies as applied to dissipative problems.

ACKNOWLEDGMENTS

We thank the sponsors of CREWES for continued support. This work was funded by CREWES industrial sponsors and NSERC (Natural Sciences and Engineering Research Council of Canada) through the grant CRDPJ 461179-13.
© 2017 SEG Page 1674

EDITED REFERENCES
REFERENCES
Aki, K., and P. G. Richards, 2002, Quantitative seismology, 2nd ed.: University Science Books.
Casula, G., and J. Carcione, 1992, Generalized mechanical model analogies of viscoelastic behaviour:
Bolletina di Geofisica Teorica ed Applicata, 34, 235–256.
Futterman, W., 1962, Dispersive body waves: Journal of Geophysical Research, 67, 5279–5291,
http://doi.org/10.1029/jz067i013p05279.
Hak, B., and W. Mulder, 2011, Seismic attenuation imaging with causality: Geophysical Journal
Hicks, G., and R. Pratt, 2001, Reflection waveform inversion using local descent methods: Estimating
attenuation and velocity over a gas-sand deposit: Geophysics, 66, 598–612,
http://doi.org/10.1190/1.1444951.
Innanen, K. A., 2016, Selecting a dispersion model-type for anelastic FWI — near-surface
characterization using uncorrelated vibroseis data: 78th Annual International Conference and
Exhibition, EAGE, Extended Abstracts, http://doi.org/10.3997/2214-4609.201601577.
Innanen, K. A., and A. B. Weglein, 2007, On the construction of an absorptive–dispersive medium model
via direct linear inversion of reflected seismic primaries: Inverse Problems, 23, 2289–2310,
http://doi.org/10.1088/0266-5611/23/6/001.
Kamei, R., and R. Pratt, 2013, Inversion strategies for visco-acoustic waveform inversion: Geophysical
Journal International, 194, 859–884, http://doi.org/10.1093/gji/ggt109.
Keating, S., and K. A. Innanen, 2017, Characterizing and mitigating uncertainty in the physics of
attenuation in an-acoustic full waveform inversion: 87th Annual International Meeting, SEG,
Expanded Abstracts, submitted.
Kolsky, H., 1956, The propagation of stress pulses in viscoelastic solids: Philosophical Magazine, 1, 693–
710, http://doi.org/10.1080/14786435608238144.
Inverse Scattering, Theory and Application, Society for Industrial and Applied Mathematics,
Liu, H., D. Anderson, and H. Kanamori, 1976, Velocity dispersion due to anelasticity; implications for
seismology and mantle composition: Geophysical Journal International, 47, 41–58,
http://doi.org/10.1111/j.1365-246X.1976.tb01261.x.
Malinowski, M., S. Operto, and A. Ribodetti, 2011, High-resolution seismic attenuation imaging from
wide-aperture onshore data by visco-acoustic frequency-domain full-waveform inversion:
Geophysical Journal International, 186, 1179–1204, http://doi.org/10.1111/j.1365-
246X.2011.05098.x.
Metivier, L., R. Brossier, S. Operto, and J. Virieux, 2015, Acoustic multi-parameter FWI for the
reconstruction of P-wave velocity, density and attenuation: preconditioned truncated Newton
approach: 84th Annual International Meeting, SEG, Expanded Abstracts, 1198–1203,
tour of multiparameter full-waveform inversion with multicomponent data: from theory to

Pan, W., K. A. Innanen, G. F. Margrave, M. Fehler, X. Fang, and J. Li, 2016, Estimation of elastic
constants for HTI media using Gauss-Newton and full Newton multi-parameter full waveform
inversion: Geophysics, 81, no. 5, E323–E339, http://doi.org/10.1190/geo2015-0594.1.
Plessix, R.-E., P. Milcik, H. Rynia, A. Stopin, K. Matson, and S. Abri, 2013, Multiparameter full-
waveform inversion: marine and land examples: The Leading Edge, 32, 1030–1038,
http://doi.org/10.1190/tle32091030.1.
1259–1266, http://doi.org/10.1190/1.1441754.
Ursin, B., and T. Toverud, 2002, Comparison of seismic dispersion and attenuation models: Studia
Geophysica et Geodaetica, 46, 293–320, http://doi.org/10.1023/A:1019810305074.
Weglein, A. B., and K. H. Matson, 1998, Inverse scattering internal multiple attenuation: An analytic
example and subevent interpretation: Annual International Meeting, SPIE Conference on
Mathematical Methods in Geophysical Imaging, 108–117, http://doi.org/10.1117/12.323282.

A new method of low wavenumber updates for full waveform inversion
Shijie Lian*, Sanyi Yuan, Guanchao Wang, Ying Liu, Weibin Song, Shangxu Wang, China University of
Petroleum-Beijing, State Key Laboratory of Petroleum Resources and Prospecting, CNPC Key Laboratory of
Geophysical Exploration
Summary

Full waveform inversion (FWI) tries to obtain a subsurface model that best explains the amplitude and phase information of the measured data. However, if the initial model is far from the true one, FWI will inevitably become trapped in a local minimum. To improve the reliability of FWI, we develop a method that improves the low-wavenumber updates during acoustic FWI. In this method, we first decompose both the source and residual wavefields into their one-way components using a fast algorithm and extract the tomographic term of the conventional gradient to serve as a new gradient. We then use the result of FWI with the new gradient as the initial model for conventional FWI. The fast wavefield decomposition algorithm based on the Hilbert transform is the key aspect of our method: it avoids expensive decomposition computation, saves memory and improves the efficiency of the gradient computation compared to the conventional decomposition carried out in the frequency-wavenumber (f-k) domain. Numerical tests on a simple two-layer model and on the Marmousi model demonstrate that our method can effectively obtain reliable low-wavenumber updates and make FWI converge to the true model even in situations where conventional FWI fails.

Introduction

Full waveform inversion (FWI) attempts to find a model that best explains the observed seismic data by iteratively minimizing the difference between observed and simulated data (Tarantola, 1984). However, the objective function is highly nonlinear and has many local minima. If the low-wavenumber components of the initial velocity model are insufficient or far from the true ones, FWI will become trapped in a local minimum. Hence, a method that relaxes the dependence on the initial model is needed.

Many methods have been developed to improve the reliability of FWI. Shin and Ha (2008) proposed Laplace-domain and Laplace-Fourier-domain FWI to build a low-wavenumber velocity model for FWI. Tang et al. (2013) proposed to enhance the tomographic components at early iterations and gradually reduce their weights toward convergence. Meanwhile, seismic envelope inversion (Wu et al., 2014) and full traveltime inversion (Luo et al., 2016) were proposed to generate reliable initial velocity models. Better results can be obtained with these methods than with conventional FWI.

In addition, Liu et al. (2011) proposed a computationally efficient method based on the Hilbert transform. Fei et al. (2015) further extended the method by applying the Hilbert transform to the source wavefield with respect to depth and to the receiver wavefield with respect to time and depth. These Hilbert-transform-based methods avoid intensive computation and improve the efficiency of the imaging condition (Liu et al., 2011). Both methods aimed at removing the low-frequency noise in the narrow region in reverse time migration (RTM) by extracting the migration term of the gradient.

Considering the similarity between the FWI gradient and the RTM image, we apply the fast algorithm proposed by Fei et al. (2015) to FWI. The new gradient, constructed by extracting the tomographic term of the gradient, provides reliable low-wavenumber updates after several iterations.

Method

In FWI, the image is conventionally constructed by taking the cross-correlation of the source and backscattered wavefields, which is defined as:

$$I(x) = \int_0^{T_{\max}} S(x,t)\, R(x,t)\, dt, \tag{1}$$

where $S(x,t)$ is the source wavefield, $R(x,t)$ is the residual wavefield at spatial location $x$ and time $t$, and $T_{\max}$ represents the maximum record time. The source and residual wavefields can be decomposed as:

$$S(x,t) = S_d(x,t) + S_u(x,t), \tag{2}$$

$$R(x,t) = R_d(x,t) + R_u(x,t), \tag{3}$$

where "u" (up) and "d" (down) denote the directions of wave propagation.

Equation 1 can be further expressed as:

$$I(x) = \int_0^{T_{\max}} S_d(x,t)\, R_u(x,t)\, dt + \int_0^{T_{\max}} S_u(x,t)\, R_d(x,t)\, dt + \int_0^{T_{\max}} S_d(x,t)\, R_d(x,t)\, dt + \int_0^{T_{\max}} S_u(x,t)\, R_u(x,t)\, dt. \tag{4}$$
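The split in equation 4 can be made concrete with monochromatic plane waves. The following is a minimal sketch, not the authors' implementation: the 10 Hz frequency, the 2000 m/s velocity, the amplitudes and phases, and the helper names `down`, `up` and `correlate` are all illustrative assumptions. It evaluates the cross-correlations of equation 4 numerically and compares how the same-direction and opposite-direction terms vary with depth.

```python
import math

FREQ_HZ = 10.0          # assumed frequency of the toy plane waves
VELOCITY = 2000.0       # assumed constant velocity in m/s
OMEGA = 2.0 * math.pi * FREQ_HZ
K = OMEGA / VELOCITY    # vertical wavenumber in rad/m


def down(amp, phase):
    """Plane wave travelling toward increasing depth z (taken here as 'down')."""
    return lambda z, t: amp * math.cos(OMEGA * t - K * z + phase)


def up(amp, phase):
    """Plane wave travelling toward decreasing depth z (taken here as 'up')."""
    return lambda z, t: amp * math.cos(OMEGA * t + K * z + phase)


def correlate(f, g, z, t_max=1.0, n_t=200):
    """Zero-lag cross-correlation over [0, t_max): a discrete form of equation 1."""
    dt = t_max / n_t
    return sum(f(z, j * dt) * g(z, j * dt) for j in range(n_t)) * dt


# Hypothetical one-way components of the source (S) and residual (R) wavefields.
s_d, s_u = down(1.0, 0.0), up(0.3, 0.0)
r_d, r_u = down(0.5, 0.4), up(0.8, -0.2)

depths = [10.0 * i for i in range(21)]  # 0 m to 200 m
same_direction = [correlate(s_d, r_d, z) + correlate(s_u, r_u, z) for z in depths]
opposite_direction = [correlate(s_d, r_u, z) + correlate(s_u, r_d, z) for z in depths]

spread_same = max(same_direction) - min(same_direction)
spread_opposite = max(opposite_direction) - min(opposite_direction)
print(f"same-direction terms vary by {spread_same:.2e} over depth")
print(f"opposite-direction terms vary by {spread_opposite:.2e} over depth")
```

In this toy setting the same-direction terms (the last two terms of equation 4) are constant in depth, that is, they contribute only a zero-wavenumber component, while the opposite-direction terms oscillate with wavenumber 2k. This is the usual reason the former are associated with smooth, tomographic updates and the latter with migration-like, high-wavenumber updates.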

Uz S ( x, t )Uz Ut  R ( x, t )  dt
T max T max
0
Sd ( x, t ) Rd ( x, t )dt  Re 
0
T max  S ( x, t ) R( x, t )  S ( x, t ) Hz Ht  R( x, t )  
 

  dt ,
0

 Hz  S ( x , t )  Ht  R ( x , t )   Hz  S ( x , t )  Hz  R ( X , T ) 

(12)
Uz S ( x, t )Uz Ut  R ( x, t ) dt
T max T max
0
Su ( x, t ) Ru ( x, t )dt  Re 
0
Figure 1. Wave paths of forward and backward scattering
wavefields in the model space.
T max  S ( x, t ) R( x, t )  S ( x, t ) Hz Ht  R( x, t )  
 

  dt ,
In usual, the wavefields decomposition can be adopted in  Hz  S ( x, t ) Ht  R( x, t )  Hz  S ( x, t )  Hz  R( X , T ) 

0

frequency-wavenumber (f-k) domain (Hu and McMechan, (13)
1987), but such decomposition requires intensive
computation. Inspired by the approach proposed by Liu et
Applying equations 12 and 13 to equation 11, we obtain a
al. (2011), Fei et al. (2015) developed a fast wavefields
new gradient which can be used to update the low
decomposition algorithm based on Hilbert transform.
wavenumber information. Though the formulas of
Hence, we apply the fast algorithm to wavefields
equations 12 and 13 are complex, we can finally obtain a
decomposition and derivate the extended complex
simple gradient formula by summing them up as:
wavefields with respect to time t and spatial location z as:
T max
S : Ut  S (t )  S (t )  iHt  S (t ) (5) I ( x)   Sd ( x, t ) Rd ( x, t )  Su ( x, t ) Ru ( x, t ) dt
0
R : Ut  R(t )  R(t )  iHt  R(t ) (6) 

T max
2 * S ( x, t ) R( x, t )  2 * Hz  S ( x, t )  Hz  R( x, t )  dt .
0
(14)
Sd : Uz  S ( z)  S ( z)  iHz  S ( z) (7)
Here, Hz is the Hilbert transform with respect to depth.
Su : Uz  S ( z)  S ( z)  iHz  S ( z) (8) The workflow of our method can be mainly expressed as
follows:
Rd : Uz  R( z)  R( z)  iHz  R( z) (9) (1) Updating a reliable initial model after several iterations
of FWI using equation 14.
Ru : Uz  R( z)  R( z)  iHz  R( z) (10)
(2) Applying the result in step 1 to the conventional FWI
to serve as the initial model.
Here, U represents the extended wavefields in complex-
value domains, H represents the Hilbert transform, and the Examples
subscripts t and z represent the Hilbert transform with
respect to the time and depth variables. The tomographic To demonstrate the validity of our method to acoustic FWI,
term of FWI representing the cross-correlation wavefields we test the method on a simple Gaussian high-velocity
traveling in the same direction can be constructed by anomaly model and on the more complicated Marmousi
keeping only the last two terms of equation 4 as: model. In both examples, we only use reflection data.
Wavefields simulation and velocity inversion are both
T max
I ( x)   Sd ( x, t ) Rd ( x, t )  Su ( x, t ) Ru ( x, t ) dt achieved in the time domain.
0
Uz Ut  S ( x, t )Uz Ut  R( x, t )  dtSdRd 

T max
 Re  The first numerical example is based on a two-layer model
0
with a circular anomaly placed at the middle position. The
Uz Ut  S ( x, t ) Uz Ut  R ( x, t )  dt SuRu .
T max
Re  (11) initial model is the same as the true model except the
0
circular anomaly. They are shown in Figure 2. The space
Here, only the receiver wavefield is extended with respect sampling interval in both horizontal and vertical directions
to time because the extended source and receiver is 0.01 km. We use 40 evenly distributed shots on the
wavefields are both zero for negative  (Fei et al., 2015). surface, and a Ricker wavelet with the peak frequency of
Hence, equation 11 can be adjusted as: 10 Hz is used as the source. The source and receiver
intervals are 0.07 km and 0.01 km, respectively. The

maximum record time is 4 s with a sampling interval of 1 ms.

The results of conventional FWI and of the first step of our method are shown in Figure 3. Both were run for 30 iterations, and the two images differ dramatically. The result of conventional FWI (Figure 3a) recovers only parts of the circular anomaly, and the velocities inside the anomaly do not match the true velocity model well, especially in the deep region. This can be attributed to the lack of low-wavenumber information in the initial model and to insufficient low-wavenumber updates during the inversion. In contrast, the first step of our method (Figure 3b) successfully recovers the anomaly from top to bottom.

A further numerical comparison between conventional FWI and our method is given in Figure 4. The result of our method is more accurate, and the trend of its velocity profile is stable. For conventional FWI, the accuracy of the velocity profile degrades seriously in the deep region, and its differences from the true profile are much larger than ours. Given the narrow contrast between the background velocity of 2.0 km/s and the anomaly velocity of 2.3 km/s, it is understandable that the numerical gap between conventional FWI and our method is not large. These results also support the ideas mentioned above. A higher accuracy can be achieved after carrying out the second step of our method.

Figure 2. Simple model test. (a) The true velocity model. A high-velocity anomaly is embedded in a layered background velocity. (b) The initial velocity model. The upper- and lower-layer interval velocities are 2.0 km/s and 2.5 km/s, respectively. The velocity of the circular anomaly is 2.3 km/s.

The second example is based on the more complicated Marmousi model (Figure 5a). The recording geometry and corresponding settings are the same as those in the first example except for the source interval and the initial model. In this example, we adopt a 0.1 km source interval and a linear initial model (Figure 5b).

Figure 3. Comparison of (a) the result of the conventional FWI and (b) the result of our method after 30 iterations for the simple model.

Figure 4. Comparison between the result of the conventional FWI and the result of our method at x = 2.5 km (axes: velocity in m/s and distance in km). The red line denotes the velocity profile of the true model, the green line that of the conventional FWI, and the blue line that of our method.

The result of the first step of our method after 30 iterations is shown in Figure 6. The contour of the Marmousi model is clearly visible in the narrow region, which demonstrates sufficient low-wavenumber updates. To test the reliability of this low-wavenumber-enhanced result, we use it as the initial model for conventional FWI. The resulting image is shown in Figure 7b. In the shallow and middle regions, the inverted result matches the true model well. The quality of the inverted velocity model degrades in the deep region because of the poor illumination and weak records of the reflections from the dipping reflectors.

Figure 5. Complex model test. (a) The Marmousi velocity model. (b) The linear initial velocity model.

To demonstrate the superiority of our method, we also run a conventional FWI test using the linear initial model in Figure 5b; the result is shown in Figure 7a. The computational cost of this test is the same as that of our method. The result is unsatisfactory because of the convergence to a local minimum at the upper left corner and the misplaced reflectors, which are caused by insufficient low-wavenumber information in the initial model and unreliable low-wavenumber updates during the inversion.

Figure 6. The result of our method after 30 iterations of FWI using the new gradient proposed in this paper.

Figure 7. Comparison between (a) the result of the conventional FWI after 150 iterations and (b) the result of our method with 30 iterations for the updates of low-wavenumber components using equation 14 and 100 iterations of the conventional FWI. The computational costs of these two methods are the same.

Conclusions

The gradient of FWI contains tomographic and migration terms. With the wavefield decomposition using a fast algorithm based on the Hilbert transform and the extracted tomographic term serving as a new gradient, we develop a method of low-wavenumber updates which can relax the dependence of FWI on the initial model. The fast wavefield decomposition algorithm based on the Hilbert transform is the key aspect of our method for avoiding expensive decomposition computation, saving memory and improving the efficiency of the gradient computation compared to the conventional decomposition method adopted in the frequency-wavenumber (f-k) domain. Numerical tests demonstrate that our method allows FWI to converge to the true model even in situations where the conventional FWI fails, even when the conventional FWI is given the same computational cost as our method.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (41674127), the Science Foundation of China University of Petroleum, Beijing (2462015BJB04), and the National Basic Research Program (2013CB228600).

EDITED REFERENCES
REFERENCES
Fei, T., Y. Luo, J. Yang, H. Liu, and F. Qin, 2015, Removing false images in reverse time migration: The
concept of de-primary: Geophysics, 80, no. 6, S237–S244.9, http://doi.org/10.1190/geo2015-
0289.1.
Hu, L., and G. A. McMechan, 1987, Wave-field transformations of vertical seismic profiles: Geophysics,
52, 307–321, http://doi.org/10.1190/1.1442305.
Liu, F., G. Zhang, S. Morton, and J. P. Leveille, 2011, An effective imaging condition for reverse-time
migration using wavefield decomposition: Geophysics, 76, no. 1, S29–S39,
http://doi.org/10.1190/1.3533914.
Luo, Y., Y. Ma, Y. Wu, H. W. Liu, and L. Cao, 2016, Full-traveltime inversion: Geophysics, 81, R261–
R274, http://doi.org/10.1190/geo2015-0353.1.
http://doi.org/10.1190/1.1442625.
Shin, C., and Y. H. Ha, 2008, Waveform inversion in the Laplace domain: Geophysical Journal
International, 173, 922–931. http://doi.org/10.1111/j.1365-246X.2008.03768.x.
Tang, Y., S. Lee, A. Baumstein, and D. Hinkley, 2013, Tomographically enhanced full wavefield
inversion: 83rd Annual International Meeting, SEG, Expanded Abstracts, 1037–1041,
1259–1266, http://doi.org/10.1190/1.1441754.
Wu, R. S., J. R. Luo, and B. Y. Wu, 2014, Seismic envelope inversion and modulation signal model:

Extrapolated full waveform inversion: An image-space approach
Yunyue Elita Li, National University of Singapore, and Laurent Demanet, Massachusetts Institute of Technology
SUMMARY

The primary factor that prevents full waveform inversion from universal success is the band-limited nature of seismic data, resulting in a gap between the low wavenumber background velocity model and the high wavenumber seismic images. In this paper, we propose to bridge the wavenumber gap in the extended image space, where full kinematic information in the data is preserved in spite of the inaccuracy of the background migration velocity model, and where the wavenumber range of the extended image is extrapolated using total-variation constrained deconvolution. This explicit wavenumber extrapolation is nested within least-squares reverse time migration iterations to ensure that the resulting extended images match the recorded band-limited data. We then synthesize reflection data using extended Born modeling with the extrapolated images. Numerical experiments show that although the total variation projection has limited the high frequencies that can be recreated by extended Born modeling, the low frequencies are reliably extrapolated at all offsets, given a reasonable starting velocity model. When the initial model is too crude, the proposed frequency extrapolation breaks down near the complex structures.

INTRODUCTION

The extended imaging scheme introduces extra spatial or time lags in the image space to accommodate the kinematic error that has not been modeled correctly by the migration velocity model (Rickett and Sava, 2002; Sava and Fomel, 2006). When the initial velocity is inaccurate, the images in the extended model space will not focus at the zero subsurface-offset or at zero time-lag, as opposed to the assumption by conventional RTM. These "leaked" events in the extended model space help reconstruct the accurate kinematics of the recorded reflection data via extended Born modeling, regardless of the accuracy of the migration velocity model. A conceptual description of the extended Born modeling operator takes the form

$$\hat{d} = F_e(w)\, I_e, \tag{1}$$

where $F_e$ is the extended Born modeling operator, the adjoint operator of the extended imaging operator; $I_e$ is the extended image, and $\hat{d}$ is the modeled data. Since the kinematics in $\hat{d}$ match the kinematics in the recorded reflection data $d$, the extended images can be used to fit the recorded reflection data without the concerns of cycle-skipping in conventional full waveform inversion (Gauthier et al., 1986; Gerhard Pratt et al., 1998; Pratt and Shipp, 1999a,b; Etgen et al., 2009). The resulting iterative inversion scheme is referred to as least-squares extended reverse time migration (LSERTM). LSERTM has shown robustness against the inaccuracies in the migration velocity model in many applications, such as to separate contaminated seismic records with simultaneous sources (Leader et al., 2014), to enhance the signal-to-noise ratio, and to interpolate the prestack seismic data (Weibull and Arntsen, 2014; Hou and Symes, 2016).

Here, we focus on the wavenumber (and corresponding frequency) analysis of the extended images (and corresponding data). After extended Born modeling, the frequency band of the synthesized data $\hat{d}$ is determined by two factors: the frequency band of the source wavelet $w$ defined in the modeling operator $F_e$, and the wavenumber band of the extended image $I_e$. The key innovation of this paper can be summarized mathematically as follows:

$$\hat{d}_l = F_e(w_l)\, \Delta m_e, \tag{2}$$

where $\Delta m_e$ is the extended model, a modified version of $I_e$ with the same kinematic information but a wider wavenumber range; $w_l$ is a synthesized wavelet, whose frequency support is wider than that of the original source wavelet $w$; and $\hat{d}_l$ is the newly synthesized data, whose frequency support is also wider than that of the original recorded data $d$. In other words, frequency extrapolation of the data is achieved by wavenumber extrapolation of the extended images and subsequent extended Born modeling.

The wavenumber extrapolation is an essential step in the frequency extrapolation, because the raw extended images only contain wavenumber components that are supported by the frequency band of the recorded data. In this paper, we propose to perform wavenumber extrapolation by adding a total variation (TV) constraint (Rudin et al., 1992; Fadili and Peyré, 2011) to the LSERTM inversion. However, the extended Born modeling and imaging operators are too expensive to evaluate for the hundreds of iterations that are required for the TV constraint. Therefore, we design a variable-wavelet 1D convolutional operator to approximate the expensive normal operator $F_e^* F_e$ and solve the TV-constrained deconvolution problem as a preconditioning step of the LSERTM iterations. This deconvolution step also compensates for the footprint of the wavelet on the extended images. Consequently, we achieve fast convergence for both wavenumber extrapolation and data fitting by extended Born modeling.

The spirit of this paper is aligned with our previous papers (Li and Demanet, 2015, 2016): low frequency data can be synthesized based on the bandlimited recorded data. The full waveform inversion schemes based on the extrapolated low frequencies are referred to as extrapolated full waveform inversion (EFWI) in both cases. However, the particular approaches are drastically different. The previously proposed phase tracking method is performed in the data space, via highly controlled model parameterization and strong model reduction. The method we propose in this paper achieves a similar goal in the image space, via image extension and deconvolutive frequency extrapolation. The implicit duality between these two methods is yet to be explored.
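The idea of a TV-constrained deconvolution solved by projected gradient descent can be illustrated on a 1D toy problem. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: a fixed symmetric kernel stands in for the variable-wavelet convolution operator, the model is assumed to start from zero, and the TV ball is handled by working in first-difference ("jump") coordinates, where it becomes an L1 ball with a known sort-based projection. All function names and the test model are hypothetical.

```python
import math


def convolve_same(signal, kernel):
    """Zero-padded 'same' convolution; self-adjoint when the kernel is symmetric."""
    half, n = len(kernel) // 2, len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + half - j
            if 0 <= src < n:
                acc += w * signal[src]
        out.append(acc)
    return out


def cumsum(u):
    """Integrate jumps into a model: out[i] = u[0] + ... + u[i]."""
    total, out = 0.0, []
    for value in u:
        total += value
        out.append(total)
    return out


def reverse_cumsum(r):
    """Adjoint of cumsum: out[j] = r[j] + ... + r[-1]."""
    return cumsum(r[::-1])[::-1]


def project_l1_ball(v, radius):
    """Euclidean projection onto {u : sum |u_i| <= radius} (sort-based method)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    theta, running = 0.0, 0.0
    for j, mag in enumerate(sorted((abs(x) for x in v), reverse=True), start=1):
        running += mag
        candidate = (running - radius) / j
        if mag > candidate:
            theta = candidate
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def tv_constrained_deconvolution(image, kernel, tau, n_iter=1500):
    """Projected gradient descent for min 0.5 * ||A m - image||^2 with TV(m) <= tau."""
    n = len(image)
    # Conservative step length: 1 / (upper bound on the squared norm of A * cumsum).
    alpha = 1.0 / (sum(abs(w) for w in kernel) ** 2 * n * (n + 1) / 2.0)
    jumps = [0.0] * n  # first differences of the model (model assumed to start at 0)
    misfits = []
    for _ in range(n_iter):
        model = cumsum(jumps)
        residual = [obs - syn for obs, syn in zip(image, convolve_same(model, kernel))]
        misfits.append(0.5 * sum(r * r for r in residual))
        step = reverse_cumsum(convolve_same(residual, kernel))  # C^T A^T (image - A m)
        jumps = project_l1_ball([u + alpha * g for u, g in zip(jumps, step)], tau)
    return cumsum(jumps), misfits


# Hypothetical blocky model blurred by a band-limited symmetric kernel.
kernel = [-0.2, 0.0, 0.6, 1.0, 0.6, 0.0, -0.2]
true_model = [0.0] * 10 + [1.0] * 15 + [0.4] * 15
image = convolve_same(true_model, kernel)

recovered, misfits = tv_constrained_deconvolution(image, kernel, tau=1.6)
total_variation = abs(recovered[0]) + sum(abs(b - a) for a, b in zip(recovered, recovered[1:]))
print(f"misfit: {misfits[0]:.3f} -> {misfits[-1]:.3e}, TV of result: {total_variation:.3f}")
```

Working in jump coordinates keeps the projection step simple, at the price of a small step length; the Euclidean projection in these coordinates is not the same as a Euclidean projection onto the TV ball in model coordinates, so the iterates may differ from those of other projected gradient schemes even though the constraint set is the same.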

METHOD

To illustrate the workflow of the proposed method, we demonstrate each step using a simple three-layer synthetic example (Figure 1(a)). The background velocity is 2000 m/s and the velocity of the middle layer is 4000 m/s. The thickness of the middle layer is 200 m. Figure 1(b) shows the bandlimited shot record between 6 and 50 Hz at $x_s = 0$ m. We use 41 shots at 100 m spacing to image the synthetic model.

Figure 1: (a) The synthetic model with a high-velocity layer. (b) Synthetic seismic data recorded between 6 and 50 Hz.

Least-squares extended reverse time migration (LSERTM)

The primary reflection data can be estimated by the Born approximation using the causal Green's function $G(x, y, t)$ for a given background velocity $m_0$. Hence, the resulting scattered data modeled by the Born modeling operator can be expressed as:

$$(F\Delta m)(x_s, x_r, t) = \int dx\, d\tau\, w(\tau)\, G(x_s, x, \tau) \times 2 m_0(x)\, \Delta m(x)\, \frac{\partial^2}{\partial t^2} G(x, x_r, t-\tau), \tag{3}$$

with $w(t)$ a source wavelet that is bandlimited to the frequency content of the data and $\Delta m$ the perturbation to the background velocity. The adjoint operator $F^*$ maps scattered data $\delta d$ to the model space:

$$(F^* \delta d)(x) = 2 m_0(x) \int dx_s\, dx_r\, dt\, d\tau\, w(\tau)\, G(x_s, x, \tau) \times G(x, x_r, t-\tau)\, \frac{\partial^2}{\partial t^2} \delta d(x_s, x_r, t). \tag{4}$$

The model space object in Equation 4 is often referred to as a reverse-time migration image.

Based on the survey-sinking imaging condition introduced by Claerbout (1985), we can introduce an extra subsurface offset to the migration image. The resulting imaging operator is referred to as the extended imaging operator $F_e^*$, which maps the scattered data to the extended images:

$$I(x, h) = (F_e^* \delta d)(x, h) = 2 m_0(x) \int dx_s\, dx_r\, dt\, d\tau\, w(\tau)\, G(x_s, x+h, \tau) \times G(x-h, x_r, t-\tau)\, \frac{\partial^2}{\partial t^2} \delta d(x_s, x_r, t). \tag{5}$$

The adjoint operator $F_e$ is referred to as the extended Born modeling operator, which maps the extended images to the scattered data:

$$\delta \hat{d}(x_s, x_r, t) = (F_e \Delta m_e)(x_s, x_r, t) = \int dx\, dh\, d\tau\, w(\tau)\, G(x_s, x+h, \tau) \times 2 m_0(x)\, \Delta m_e(x, h)\, \frac{\partial^2}{\partial t^2} G(x-h, x_r, t-\tau). \tag{6}$$

Under favorable conditions, this pair of forward and adjoint operators are pseudodifferential. In that case, sequentially applying them one after the other, regardless of the order, does not move the singularities. Therefore, the phases modeled by sequential application of this pair of operators are aligned with the original object. The amplitude differences can be compensated by gradient-based inversion due to linearity. ten Kroode (2012) and Hou and Symes (2015) provided detailed explanations for the statement above and proposed designs of pseudoinverse operators for the extended Born modeling in terms of Kirchhoff migration and reverse-time migration, respectively.

In this paper, we formulate the least-squares extended reverse time migration problem explicitly:

$$J_{LSERTM}(\Delta m_e(x, h)) = \frac{1}{2} \sum_{r,s,t} \left( \delta u_s(x_s, x_r, t; \Delta m_e) - \delta d_{r,s,t} \right)^2, \tag{7}$$

where $\delta d$ is the primary reflection data, and $\delta u_s$ is the data modeled by Equation 6. When working with reflection data whose specular angle is limited to a maximum of 45°, the LSERTM inversion can only resolve the wavenumber components corresponding to the frequency content of the data.

Figure 2(a) shows the extended image obtained by migrating the reflection seismic data using a constant velocity model of 2200 m/s. Because the migration velocity is too fast, the reflectors are imaged deeper than their true depth. The upward curvature and strong energy at $h \neq 0$ in the subsurface offset domain indicate that the migration velocity is inaccurate. Similarly, Figure 2(b) shows the extended image when a constant migration velocity of 1800 m/s is used. The reflectors are imaged shallower with strong curvatures in the subsurface offset domain. In both cases, the leaked energy in the subsurface offset domain preserves the kinematic information of the seismic data. The wavenumber contents in both extended images are consistent with the frequency contents in the recorded data, leading to low resolution images of the true model.

Figure 2: Extended images obtained by LSERTM, using a constant fast velocity model in (a) and using a constant slow velocity model in (b).

Preconditioning with total variation constrained deconvolution

This inverse problem in Equation 7 has a large nullspace due to the incompleteness of data in space and frequency. To better constrain the inverse problem and to honor the geological information, we introduce a total variation (TV) regularization to augment the inversion:

$$\min \|F_e \Delta m_e(x, h) - \delta d_{r,s,t}\|_2^2, \quad \text{s.t. } \|\Delta m_e\|_{TV} \le \tau, \tag{8}$$

where $\|\cdot\|_{TV}$ denotes the TV norm (morally, the $L_1$ norm of the gradient), and $\tau$ is a user-defined parameter. The top equation in the fitting system is the same as Equation 7. Solving a TV constrained problem requires too many iterations for the computational cost of LSERTM to be affordable. Hence, we rearrange the optimization goals:

$$\min \|F_e^* F_e \Delta m_e(x, h) - F_e^* \delta d\|_2^2, \quad \text{s.t. } \|\Delta m_e\|_{TV} \le \tau. \tag{9}$$

The object $F_e^* \delta d$ is already defined in Equation 5 as the extended image $I(x, h)$. We now approximate the normal operator using a cheap 1-D convolution operator $A \approx F_e^* F_e$ and obtain an approximated system of equations to solve for the TV constrained optimization problem:

$$\min \|A \Delta m_e(x, h) - I(x, h)\|_2^2, \quad \text{s.t. } \|\Delta m_e\|_{TV} \le \tau. \tag{10}$$

Figure 3: Extended images obtained by LSERTM preconditioned by TV-constrained deconvolution, using a constant fast velocity model in (a) and using a constant slow velocity model in (b).

Notice that the deconvolution problem in Equation 10 is performed in the depth domain, where the seismic wavelet is stretched due to velocity variation and variable illumination. Conventional stationary wavelet deconvolution as performed in the time domain is insufficient to capture the source signature for all depth ranges. Therefore, we design a velocity-dependent variable wavelet deconvolution operator $A$ to better approximate the extended normal operator $F_e^* F_e$.

To solve the TV constrained inverse problem, we use a projected gradient descent method, which projects the updated model after each iteration to a total variation ball of finite radius:

$$\Delta m_e^{i+1} = \mathrm{Proj}_{\{\|\cdot\|_{TV} \le \tau\}} \left( \Delta m_e^{i} + \alpha A^{*} (I - A \Delta m_e^{i}) \right), \tag{11}$$

where $\mathrm{Proj}_{\{\|\cdot\|_{TV} \le \tau\}}$ is the projector on the TV ball and $\alpha$ is a predetermined step length.

Figure 3 shows the extended images obtained by LSERTM with TV-constrained deconvolution preconditioning. Compared with the extended images in Figure 2, preconditioned LSERTM not only removes the migration artifacts from the image, but also extends the bandwidth in wavenumber way beyond the wavenumber supported by the frequencies in the data. The preconditioning will also speed up the convergence of LSERTM.

Figure 4 compares the wavenumber spectra of the true model (in blue), the extended images (in green), and the extended images after TV-constrained deconvolution (in red). The original image does not contain any low wavenumber components due to the missing low frequencies in the data. Although the image after TV-LSERTM does not reproduce the spectrum of the true model exactly, it has restored the relative balance between the low wavenumber and the high wavenumber components.

Figure 4: Comparison of average wavenumber spectra of the true model (Blue), of the inverted image (Green) by LSERTM, and of the image by LSERTM preconditioned by TV-constrained deconvolution (Red), for the cases of constant fast velocity (a) and constant slow velocity (b). (Axes: wavenumber in 1/m versus normalized amplitude; legend: True, LSRTM, TV-LSRTM.)

Figure 5 compares the low frequency (0 - 6 Hz) seismic records reconstructed by extended Born modeling with the data record modeled using the true velocity model. Both phase and amplitude of the low frequency data have been successfully reconstructed, despite the inaccuracies in the migration velocity model. Slight phase differences at large offsets are due to the differences in the boundary reflections.

At this point, the low frequency data are reconstructed using the extended linearization scheme preconditioned by wavenumber-extending deconvolution. After filling in the frequency gap, the conventional Born linearization at the low frequencies is accurate enough to be used for updating the background (low-wavenumber) velocity model $m_0$. Therefore, the synthesized data can provide coherent information for full waveform inversion when sweeping over data from low to high frequencies. The resulting model from FWI contains a full wavenumber spectrum, eliminating the need of interpreting velocity and reflectivity models separately.

Extrapolated full waveform inversion

To demonstrate the reliability of the extrapolated low frequency data, we use the reconstructed low frequencies to initialize the

frequency sweep of full waveform inversion on the Marmousi model. The initial model we use for the numerical test represents a typical case when the low-wavenumber velocity model is obtained by ray-based tomography or wave-equation migration velocity analysis: the kinematics of seismic data are accurately explained up to 2 Hz, which is still lower than the lowest frequency recorded (6 Hz) in the data. The highest available frequency in the data for extrapolation is 50 Hz.

Figure 5: Comparison of low frequency (0-6 Hz) seismic record. (a) Data record reconstructed from the constant fast velocity and extended image in Figure 3(a). (b) Data record reconstructed from the constant slow velocity and extended image in Figure 3(b). (c) Data record modeled using the true velocity model in Figure 1(a).

Figure 6: (c) A low-passed initial model (maximum wavenumber corresponds to an average frequency of 2 Hz).

Figure 6 compares the low wavenumber models inverted from modeled low frequency (2 - 6 Hz) data (a) and from the extrapolated low frequency (2 - 6 Hz) data (b) with the smooth initial model in (c). The inverted model from extrapolated data appears to be higher resolution than the inverted model from modeled data; however, this apparent higher resolution is due to the unbalanced frequency components in the data. Nonetheless, the extrapolated data provide reliable information in the frequency gap between 2 Hz and 6 Hz, which maps to reasonable estimations of the model. Both of these models can be used to initialize FWI at higher frequencies (6 Hz and above).

DISCUSSIONS AND CONCLUSIONS

We have proposed two different methods to synthesize the low frequency data from the bandlimited field recordings. In the previously proposed phase tracking method (Li and Demanet, 2015, 2016), the low frequency data are estimated in the data space as a pure data processing step; hence its accuracy is independent of the accuracy of the initial velocity model. Due to the weak constraint on extrapolated amplitude, the extrapolated phase information is often better approximated. Accuracy of the extrapolated low frequencies by phase tracking is in general higher for near offsets than for far offsets, due to higher signal amplitudes near the source. Phase tracking becomes ambiguous at large offsets where weak crossing events are contaminated by noise. Consequently, we have limited the application of the current version of the phase tracking algorithm to near offsets (< 500 m).

On the other hand, the accuracy of the image-space method we proposed here is relatively uniform for all offsets and more robust to the noise in the data, not only because of the multifold stacking at each image point, but also because of the denoising effect of the TV constraints. The main drawback of the image-space method is that its accuracy relies on the accuracy of the initial velocity model. Numerical experience shows that the low frequencies extrapolated after TV-constrained deconvolution become unreliable when the initial velocity is too crude, especially in regions where the geological environment is highly complex. Compared with the phase-tracking method, which extracts explicit handles of extrapolated phase and amplitude, the wavenumber extrapolation by TV deconvolution is more automatic and implicit. Therefore, further studies are needed to improve the stability of the wavenumber extrapolation when the background velocity is less accurate.

In general, the image-space method is more computationally intensive than the data-space method. Nonetheless, the data reconstructed by the image-space method are more consistent between the extrapolated low frequency and the recorded high frequency. Consequently, the reconstructed data can be used throughout the frequency sweep of FWI after extrapolation.

ACKNOWLEDGEMENTS

This project was funded in part by Total SA. LD is also funded by AFOSR grants FA9550-12-1-0328 and FA9550-15-1-0078, ONR grant N00014-16-1-2122, NSF grant DMS-1255203. Yunyue Elita Li acknowledges the MOE Tier-1 Grant R-302-000-165-133 for financial support.

EDITED REFERENCES
REFERENCES
Claerbout, J. F., 1985, Fundamentals of geophysical data processing: Pennwell Books.
Etgen, J., S. H. Gray, and Y. Zhang, 2009, An overview of depth imaging in exploration geophysics:
Geophysics, 74, no. 6, WCA5–WCA17, http://doi.org/10.1190/1.3223188.
Fadili, J. M., and G. Peyre, 2011, Total variation projection with first order schemes: IEEE Transactions
on Image Processing, 20, 657–669, http://doi.org/10.1109/TIP.2010.2072512.
Gauthier, O., J. Virieux, and A. Tarantola, 1986, Two-dimensional nonlinear inversion of seismic
waveforms: Numerical results: Geophysics, 51, 1387–1403, http://doi.org/10.1190/1.1442188.
Hou, J., and W. W. Symes, 2015, An approximate inverse to the extended Born modeling operator:
Hou, J., and W. W. Symes, 2016, Accelerating extended least-squares migration with weighted conjugate
gradient iteration: Geophysics, 81, no. 4, S165–S179, http://doi.org/10.1190/geo2015-0499.1.
Leader, C., and B. Biondi, 2014, Demigration and image space separation of simultaneously acquired
data: 84th Annual International Meeting, SEG, Expanded Abstracts,
Li, Y. E., and L. Demanet, 2015, Phase and amplitude tracking for seismic event separation: Geophysics,
80, no. 6, WD59–WD72, http://doi.org/10.1190/geo2015-0075.1.
Li, Y. E., and L. Demanet, 2016, Full waveform inversion with extrapolated low frequency data:
Pratt, G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space
http://doi.org/10.1046/j.1365-246X.1998.00498.x.
Pratt, R. G., and R. M. Shipp, 1999a, Seismic waveform inversion in the frequency domain. Part 2: Fault
delineation in sediments using crosshole data: Geophysics, 64, 902–914,
http://doi.org/10.1190/1.1444598.
Rickett, J. E., and P. C. Sava, 2002, Offset and angle-domain common image-point gathers for shot-
profile migration: Geophysics, 67, 883–889, http://doi.org/10.1190/1.1484531.
Rudin, L. I., S. Osher, and E. Fatemi, 1992, Nonlinear total variation based noise removal algorithms:
Physica D: Nonlinear Phenomena, 60, 259–268, http://doi.org/10.1016/0167-2789(92)90242-F.
Sava, P., and S. Fomel, 2006, Time-shift imaging condition in seismic migration: Geophysics, 71, no. 6,
S209–S217, http://doi.org/10.1190/1.2338824.
ten Kroode, F., 2012, A wave-equation-based Kirchhoff operator: Inverse Problems, 28, 115013–115040,
http://doi.org/10.1088/0266-5611/28/11/115013.
Weibull, W. W., and B. Arntsen, 2014, Reverse-time demigration using the extended-imaging condition:

Applications of the Direct Waveform Inversion (DWI) on 2D Models
Zhonghan Liu, Yingcai Zheng*, Department of Earth and Atmospheric Science, University of Houston
Summary

The Direct Waveform Inversion (DWI) is a new full-waveform inversion idea that is different from current full waveform inversion (FWI) procedures. DWI combines the seismic imaging and velocity inversion into one process. The DWI uses the full seismic wavefield including multiples and recursively inverts for the velocity model in a shallow-to-deep fashion by explicitly using the time-space causality. Therefore, we do not need a global initial velocity model and the absolute convergence is reached when the deep reflections are beyond the finite recording time. DWI offers new possibilities to overcome the challenges in many model perturbation based inversion methods, whose effectiveness largely depends on the quality of the initial model and nonlinear optimization schemes. We present numerical applications of DWI on two models: a layered model and a 2D model. DWI can successfully invert for both models using full wavefield data where reflections and multiples are present in the data.

Introduction

The primary goal of FWI is to find a subsurface model such that the modeled seismic data can match the observed data. The seismic FWI concept generally is related to a group of inversion methods aiming to minimize the difference between the recorded and modeled seismic data by solving a global nonlinear optimization problem. Among these, the iterative perturbation-based approaches (Lailly, 1983; Tarantola, 1984; Pratt et al., 1998; Pratt, 1999; Pratt and Shipp, 1999; Virieux and Operto, 2009; Tao and Sen, 2012) are most widely used to solve the optimization problem. Recent work by Wu and Zheng (2014) revealed a one-to-one correspondence between the n-th order Frechet derivative and the n-th order Born scattering, and this indicates that the single-scattering perturbation assumption of FWI is not adequate to model transmission waves (critical in velocity inversion) through a strong velocity contrast (e.g., salt bodies) and with a large spatial scale. The conventional FWI formalism was based on single scattering perturbation (see also Tarantola, 2005) and its successful application also depends on the availability of low-frequency data, which can make the single-scattering approximation more likely to be valid. However, due to the local-minimum issue and the slow convergence of the iteration, FWI algorithms highly depend on the accuracy of the initial model, the quality of the input data, the optimization strategy, and the propagator's efficiency. A T-matrix method (Wu et al., 2014; Jakobsen and Wu, 2016; Wang et al., 2017) was proposed in the waveform inversion and showed better convergence behavior and a reduced number of iterations by adding higher order non-linear terms in the inversion. Another envelope inversion scheme proposed by Luo and Wu (2014) and Wu et al. (2014b) also alleviates such issues by introducing more low frequency information into the inversion.

Liu and Zheng (2015) proposed the DWI scheme by explicitly exploiting the causality of the space-time wavefield. Basically, DWI converts a global nonlinear problem into many local linear inversion problems. Hence, the challenges in the traditional FWI methods could be circumvented.

In this abstract, we first briefly review the DWI idea, and adopt this concept for solving 2D waveform inversion problems. Then a new DWI scheme is presented by modifying the extrapolation step with boundary integration theory in the old recursive process. Some numerical simulations further show this new scheme can handle varying boundary geometry, multiples, and large velocity contrasts.

Theory and Method

We proposed the DWI (Liu and Zheng, 2015), which could directly invert the waveform data without a global initial model, and no iteration was needed for model updating. We illustrated our idea in a 1D acoustic model: we use the recorded pressure $P(t)$ and vertical particle velocity $V_z(t)$, then we can separate the upgoing $U(t)$ and downgoing $D(t)$ wavefields according to the relations

$$D - U = \rho c V_z, \qquad (1)$$

$$D + U = P, \qquad (2)$$

where $\rho$ and $c$ are density and velocity of the medium, respectively.

Deconvolution is used for the up- and down-going wavefields to find the causal response of the Earth where all the surface multiples are suppressed. In the meantime, the first arrival we get in the causal response represents the reflectivity and location of the next but deeper layer. After that, we extrapolate the wavefield to the next layer and form the pressure and particle velocity data again. Hence, a closed recursive loop is achieved. We repeat this procedure from shallow to deep depths until the reflections from the bottom of the inverted model are beyond the recording time window.
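The separation in equations (1) and (2) amounts to solving two linear equations per time sample. The following is a minimal Python sketch of that step, assuming a constant and known density and velocity at the receiver level. The function name `separate_up_down` and all numerical values are illustrative assumptions and are not taken from the original work.

```python
import numpy as np


def separate_up_down(p, vz, rho, c):
    """Solve D - U = rho*c*Vz and D + U = P for U(t) and D(t)."""
    p = np.asarray(p, dtype=float)
    vz = np.asarray(vz, dtype=float)
    impedance = rho * c
    down = 0.5 * (p + impedance * vz)  # D = (P + rho*c*Vz) / 2
    up = 0.5 * (p - impedance * vz)    # U = (P - rho*c*Vz) / 2
    return up, down


# Illustrative check with made-up wavefields (assumed values, not from the paper).
rho, c = 1000.0, 1500.0
t = np.arange(200) * 0.001
down_true = np.exp(-((t - 0.05) / 0.01) ** 2)
up_true = 0.3 * np.exp(-((t - 0.12) / 0.01) ** 2)

# Build the "recorded" pressure and particle velocity from equations (1) and (2).
p = down_true + up_true
vz = (down_true - up_true) / (rho * c)

up_est, down_est = separate_up_down(p, vz, rho, c)
```

In this sketch the recovered `up_est` and `down_est` should match the wavefields used to build `p` and `vz`, which is only a consistency check of the algebra, not a reproduction of the deconvolution or extrapolation steps.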

In exploration seismology, the Earth is often modeled as a horizontally stratified structure, which means a laterally homogeneous model. Our aim is to fit the earliest part of the wavefield and find the local reflector by explicitly using the causality of the space-time domain wavefield. To implement this idea, we need to re-datum the sources and receivers downward to the locations of the local reflector. But in the 2D model with point-source incidence, the incident angle varies and is not well defined. As a result, the relation between the up- and downgoing plane-wave pressure fields and the total pressure and particle velocity fields can no longer be computed by equations (1) and (2). For laterally varying media, we need a wavefield extrapolation scheme. Here, we follow a different route by investigating the boundary integral equation (e.g., Zheng et al., 2016).

The boundary integral equation, which acts as the mathematical language of the Huygens-Fresnel principle, describes the relation between the pressure within one layer and the pressure and particle velocity fields on the boundaries:

$$u(\vec{x}) = \int_S G(\vec{x}|\vec{x}')\,F(\vec{x}')\,d\vec{x}' + \int_D \left[ G(\vec{x}|\vec{x}')\,\frac{\partial u(\vec{x}')}{\partial n'} - u(\vec{x}')\,\frac{\partial G(\vec{x}|\vec{x}')}{\partial n'} \right] dS(\vec{x}'), \qquad (3)$$

where $S$ is the volumetric space of a layer, $\vec{x}$ is an arbitrary point in $S$, $F(\vec{x}')$ is the distribution of sources in $S$, $D$ is the boundary of the space, and $\partial u(\vec{x}')/\partial n'$ and $u(\vec{x}')$ are the particle velocity (apart from the impedance constant) and pressure on the boundary.

On the right hand side of equation (3), the first term represents the contribution from sources in the space $S$, and the second term represents all the contributions from outside this space. Thus, by selecting a source-free space bounded by two infinite boundaries, an upper boundary $L_1$ and a lower boundary $L_2$, we can divide the boundary integration into two portions, one over $L_1$ and the other over $L_2$. The integration of the pressure and particle velocity fields on $L_1$ and $L_2$ can be interpreted as all the contributions from outside $L_1 \cup L_2$, and we can conclude that, at the point $\vec{x}$, the pressure field contributed by $L_1$ is the down-going pressure field (D). Since we only have the recorded data on the top of this space, which is $L_1$, in order to determine the upgoing wavefield (U) at $\vec{x}$, we need to propagate the wavefield back from $L_1$ to $\vec{x}$. This can be achieved by replacing the causal terms in equation (3) by the anticausal terms. After that, we can return to our familiar local inversion process and obtain information for the local reflectors (depth and impedance contrast). Hence, using equations (1), (2), and (3), we are able to implement our recursive loop again and carry on the DWI in the 2D model.

Numerical examples

To verify DWI for 2D models, we provide two numerical examples below. These examples study the influence of large velocity contrasts and irregular boundaries.

Numerical example 1

The first model is a layered model (Figure 1a) with no lateral variation. We used a 2-D acoustic finite difference method to model the seismic data to be used in DWI. The model grid size is 5 m and the time step is 1 ms. The density is constant. The source is located at (1.5km, 0.1km). A linear array of receivers, recording the pressure ($P$) and particle velocity ($V_z$), is placed in the first layer below the source, at a depth of 0.19 km, with a receiver interval of 5 m. Both top and bottom are half spaces and there is no free surface. The source wavelet is a Ricker wavelet of 20 Hz central frequency and the source is at 0.1 km depth in the first layer.

In solving the integral equation (3), we discretize it into short linear elements on the boundary. The element length for the inversion at each layer is the same as the grid size (5 m) and we used the Gaussian quadrature approach for the numerical integration on each element.

Figure 1a shows the true model and Figure 1c shows the corresponding seismic data. We applied DWI to the data (Figure 1c) and obtained the inverted model (Figure 1b). We then used the finite-difference method to generate a forward synthetic seismic gather (Figure 1d) and compared it to the data (Figure 1c). The starting model is the first layer around the source and receivers. DWI recursively inverts for all layers from top to bottom. The DWI can faithfully invert for the model.
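The element-wise Gaussian quadrature mentioned in numerical example 1 can be sketched as follows. This is a minimal illustration, assuming straight elements of 5 m length along a flat boundary and a generic real-valued integrand. The function name `integrate_by_elements`, the quadrature order, and the test integrand are assumptions made for the example; the actual kernel in equation (3) involves the Green's function and the boundary fields, which are not reproduced here.

```python
import numpy as np


def integrate_by_elements(func, x_start, x_end, element_length=5.0, order=4):
    """Integrate func over [x_start, x_end] with Gauss-Legendre quadrature
    applied separately on consecutive straight elements."""
    # Nodes and weights on the reference interval [-1, 1].
    nodes, weights = np.polynomial.legendre.leggauss(order)
    n_elements = max(1, int(round((x_end - x_start) / element_length)))
    edges = np.linspace(x_start, x_end, n_elements + 1)
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        half = 0.5 * (b - a)  # Jacobian of the map from [-1, 1] to [a, b]
        mid = 0.5 * (a + b)
        total += half * np.sum(weights * func(mid + half * nodes))
    return total


# Illustrative integrand (assumed): a smooth oscillating kernel along a 1 km boundary.
result = integrate_by_elements(lambda x: np.cos(0.05 * x), 0.0, 1000.0)
exact = np.sin(0.05 * 1000.0) / 0.05
```

Because the integrand varies slowly over a 5 m element in this example, a low quadrature order is already sufficient; a more oscillatory kernel would call for shorter elements or a higher order.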

Numerical example 2

In our second model (Figure 2), the source and receiver geometry for the modeling and inversion are the same as in the first one. In this example, the true model (Figure 2a) is laterally varying and has large velocity contrasts. In modeling the data, we did not use the free surface boundary condition. However, the velocity contrasts are large (100% and 30% at the first and second layer interfaces, respectively), and inter-bed multiples are present in the data. This model is more complex than the layered model in example 1. Again, we used the recorded data (Figure 2c) to do the DWI inversion and obtained the inverted model (Figure 2b). We then used the finite-difference method to model the wavefield in the inverted model and obtained the seismic gather (Figure 2d). It can be seen that the inverted model (Figure 2b) is similar to the true model (Figure 2a). But the inverted velocity in the third layer is slightly different from the velocity in the true model, resulting in weaker multiples (Figure 2d) than in the true dataset. This may be due to the finite recording aperture and needs to be investigated further. However, the arrival times of the multiples are the same as in the true dataset.

We have not added noise to the data. However, our previous work (Liu and Zheng, 2015) showed that random noise had little effect on the DWI inversion results.

Figure 1. (a) True model; (b) model inverted using DWI; (c) data; (d) seismic gather modeled by the finite-difference method on the inverted model. The source is located at (1.5km, 0.1km).

Figure 2. (a) True model; (b) model inverted using DWI; (c) data; (d) seismic gather modeled by the finite-difference method on the inverted model. The source is located at (1.5km, 0.1km).

In these two examples, we observe that the 2D DWI scheme can work extremely well for large velocity contrasts and uneven layers. However, we may face new challenges for more complex structures or deeper parts of the model due to the finite boundary effects of the boundary integration. Future work is needed to test these effects.

Conclusions

We extended the direct waveform inversion (DWI) scheme to 2D models from previous layered model cases. The numerical successes we have presented here show that DWI is a promising alternative framework to the conventional FWI formalism. In our numerical examples, we have shown that DWI inverts the model in a shallow-to-deep fashion and does not require an initial global model to start with. DWI is unconditionally convergent and it uses recursion rather than iteration. In the future, we will extend the DWI to models with complex structures.

Acknowledgements

We greatly thank Dr. Hao Hu and Professor Yike Liu for the insightful discussions.

EDITED REFERENCES
REFERENCES
Jakobsen, M., and R. Wu, 2016, Domain decomposition method for efficient waveform inversion in
strongly scattering media: 86th Annual International Meeting, SEG, Expanded Abstracts, 1395–
1399, http://dx.doi.org/10.1190/segam2016-13951062.1.
Lailly, P., 1983, The seismic inverse problem as a sequence of before-stack migrations: Proceedings of
the Conference on Inverse Scattering: Theory and Applications.
Liu, Z., and Y. Zheng, 2015, Direct waveform inversion: 85th Annual International Meeting, SEG,
Expanded Abstracts, 1268–1273, http://dx.doi.org/10.1190/segam2015-5923910.1.
Pratt, G., C. Shin, and G. J. Hicks, 1998, Gauss-Newton and full-Newton methods in frequency-space
Pratt, G., and R. M. Shipp, 1999, Seismic waveform inversion in the frequency domain, Part 2: Fault
delineation in sediments using crosshole data: Geophysics, 64, 902–914,
http://dx.doi.org/10.1190/1.1444598.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification
in a physical scale model [Abstract]: Geophysics, 64, 888–901,
http://dx.doi.org/10.1190/1.1444597.
Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49,
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: SIAM.
Wang, B., M. Jakobsen, R.-S. Wu, W. Lu, and X. Chen, 2017, Accurate and efficient velocity estimation
using transmission matrix formalism based on the domain decomposition method: Inverse
Problems, 33, 035002, http://dx.doi.org/10.1088/1361-6420/aa5998.
Wu, R., C. Hu, and B. Wang, 2014a, Nonlinear sensitivity operator and inverse thin-slab propagator for
tomographic waveform inversion: 84th Annual International Meeting, SEG, Expanded Abstracts,
Wu, R.-S., J. Luo, and B. Wu, 2014b, Seismic envelope inversion and modulation signal model
[Abstract]: Geophysics, 79, no. 3, WA13–WA24, http://dx.doi.org/10.1190/geo2013-0294.1.
Wu, R.-S., and Y. Zheng, 2014, Nonlinear partial derivative and its De Wolf approximation for nonlinear
seismic inversion: Geophysical Journal International, 196, 1827–1843,
http://dx.doi.org/10.1093/gji/ggt496.
Zheng, Y., A. H. Malallah, M. C. Fehler, and H. Hu, 2016, 2D full-waveform modeling of seismic waves
in layered karstic media: Geophysics, 81, no. 2, T25–T34, http://dx.doi.org/10.1190/geo2015-
0307.1

Single Frequency Waveform based multi-scale Wave-equation Traveltime Inversion
Yong Hu, Liguo Han, Zhongyuan Jin, Yajie Wei, Fengjiao Zhang, Hongyu Sun. Jilin University
Summary

Wave-equation Traveltime Inversion (WTI) is a good method for building background velocity models, and it can provide a good initial velocity model for full waveform inversion (FWI). Sometimes, however, the WTI result is not good enough for conventional FWI, and to calculate the traveltime difference between synthetic data and recorded data we have to extract the first-arrival waveform and use the cross-correlation method. In this paper, we propose to use Single Frequency (SF) waveforms to conduct Wave-equation Traveltime Inversion, combined with a frequency multi-scale strategy, which can build a high-precision initial model for conventional FWI. We call it Single Frequency waveform Wave-equation Traveltime Inversion (SFWTI). The only difference between the single frequency waveforms of synthetic data and recorded data is traveltime, and the method does not need to extract the first-arrival waveform. We use the traveltime difference of low frequency waveforms to restore the macro structure of velocity models, and then gradually increase the frequency of the seismic data in order to obtain the details of the underground structure. Numerical examples show that SFWTI can provide a high-precision initial velocity model for conventional FWI even when starting at 15 Hz. SFWTI followed by conventional FWI can recover detailed underground information and effectively mitigate the cycle skipping problem for FWI.

Introduction

Seismic inversion is a good method to build high-precision velocity models, and it can be split into two categories: Traveltime Inversion (TI) and Full Waveform Inversion (FWI). FWI is a strongly nonlinear problem and easily falls into a local minimum, especially when low frequency information is missing (Virieux and Operto, 2009). When there is no low frequency information in the seismic data, we need to invert for a good enough velocity model for conventional FWI. The traditional TI method is based on the high-frequency assumption and is not sensitive to the initial velocity model, but the inversion result has low resolution; usually, the inverted velocity model is not good enough for conventional FWI.

Luo (1991) proposed the Wave-equation Traveltime Inversion (WTI) method to build the initial velocity model for FWI using cross-well data. In seismic data processing, however, the WTI result has low resolution, and if we do not have long-offset information, we cannot obtain a high-precision initial velocity model for conventional FWI. Beat tone FWI can mitigate the cycle skipping problem by using the theory of interference to reconstruct the low frequency information, but the high frequency component can also be reconstructed by the theory of interference and may therefore degrade the inversion result. Hybrid domain FWI (Sirgue, 2008) performs the numerical wave modeling in the time domain and constructs the multi-scale gradient vector in the frequency domain, which avoids the limitation of 3D frequency domain FWI. However, hybrid domain FWI must use the Discrete Fourier Transform to extract the wavefield frequency components, which can cost a large amount of time.

Considering these problems, we develop the idea of Luo (1991) and use Single Frequency waveforms in the time domain, combined with a frequency multi-scale strategy, to conduct the traveltime inversion. This SFWTI method does not need to extract the first-arrival waveform, and it can provide a high-precision initial velocity model for FWI, so that the cycle skipping problem can be mitigated. The adjoint-state method (Tarantola, 1984) is employed to calculate the gradient of the objective function. In this paper, we first describe the method of SFWTI and its advantages, and then derive the objective function and the corresponding gradient operator. Finally, we give some numerical examples to show that SFWTI can be a good supplement to FWI.

Single frequency waveform in time domain

A recorded seismic waveform contains many frequencies, so its variation in the time domain is very complex, and such a highly variable waveform may cause a severe cycle skipping problem in FWI. In this paper, we propose to use Single frequency waveforms to conduct the traveltime inversion. A Single frequency waveform is a time domain waveform that has only one frequency, like a sine or cosine function.

According to the idea of Luo (1991), if we use WTI to obtain the macro models, we have to extract the first-arrival waveform and then use the cross-correlation method to calculate the time difference. If the seismic data do not contain enough long-offset information and the reflected wave information has been muted, we cannot obtain a high-precision initial velocity model. To avoid this problem, we use full wave information to build a high-precision initial velocity model for FWI.

A seismic wave is made up of multiple frequencies and the least-squares misfit function has many local minima, but if we matched all the single waveform, the multiple

frequency seismic waveform can be matched. For these reasons, it is possible to separate the seismic waveform into single frequency waveforms, and then use the separated single frequency waveforms for the SFWTI. From Fig.1a, we can see that the two seismic waveforms (recorded data and synthetic data) have a large traveltime difference (150 ms); if we only use a local optimization algorithm, we cannot find the best matched time. The spectra of the two Ricker waveforms are shown in Fig.1b. We select one frequency point (15 Hz) and then use the inverse Fourier transform to obtain the single frequency waveforms. From Fig.2, we can see that the traveltime difference of the two Ricker waveforms has been reduced and they are very close to each other, so the FWI cycle skipping problem can be mitigated.

Fig.4 Seismic data of single frequency waveform by single shot. (a) Recorded data, (b) Synthetic data, (c) The difference between recorded data and synthetic data.

From Fig.3a, we can see that the start point is far from the global minimum and not in the same neighborhood, so it is impossible for a local optimization algorithm to find the global minimum value. From Fig.3b, in contrast, we can see that the start point and the global minimum point are in the same neighborhood, so it is easy to converge to the global minimum value. Following the previous steps, we can transform the seismic data to single frequency waveforms trace by trace; one shot of single frequency waveform seismic data is shown in Fig.4.
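The single-frequency extraction described above (select one frequency point, then apply the inverse Fourier transform) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Ricker peak frequency of 15 Hz, the trace length, the sampling interval, and the function names `ricker` and `single_frequency` are all chosen for the example and are not specified in the text; only the 150 ms delay and the 15 Hz selected frequency come from the description.

```python
import numpy as np


def ricker(t, f_peak, t0):
    """Ricker wavelet with peak frequency f_peak centered at time t0."""
    arg = (np.pi * f_peak * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)


def single_frequency(trace, dt, freq):
    """Keep only the Fourier component closest to freq. Returns the
    time-domain sinusoid, its complex coefficient, and the frequency
    actually selected on the discrete frequency grid."""
    n = trace.size
    spectrum = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(n, d=dt)
    k = int(np.argmin(np.abs(freqs - freq)))
    kept = np.zeros_like(spectrum)
    kept[k] = spectrum[k]
    return np.fft.irfft(kept, n=n), spectrum[k], freqs[k]


dt = 0.001                          # assumed sampling interval (s)
t = np.arange(2000) * dt            # assumed 2 s trace
recorded = ricker(t, 15.0, 0.50)
synthetic = ricker(t, 15.0, 0.65)   # delayed by 150 ms

rec_sf, rec_coef, f_sel = single_frequency(recorded, dt, 15.0)
syn_sf, syn_coef, _ = single_frequency(synthetic, dt, 15.0)

# Time shift between the two sinusoids, wrapped to within one period.
wrapped_shift = -np.angle(syn_coef / rec_coef) / (2.0 * np.pi * f_sel)
```

With these assumed values the 150 ms delay corresponds to 2.25 periods at 15 Hz, so the two single-frequency sinusoids differ by a wrapped shift of about 17 ms (a quarter period), which is well within half a period.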
Fig.1 The difference of seismic data. (a) waveform, (b) spectrum.

Fig.2 Single frequency waveform.

Fig.3 Misfit value. (a) Misfit value of Fig.1a waveform; (b) Misfit value of Fig.2 waveform.

The gradient of SFWTI objective function

Traveltime Inversion finds a subsurface parameter ($m$) by minimizing the time difference ($\Delta\tau$) between the synthetic data and the recorded data. In this paper, we use Single Frequency Waveforms to conduct the Wave-equation Traveltime Inversion, which aims at reducing the traveltime difference ($\Delta\tau(f, x_r, x_s)$) between the single frequency waveforms of the synthetic data and the recorded data. Such a misfit can be measured by an objective function:

$$E(m) = \frac{1}{2} \sum_s \sum_r \Delta\tau(f, x_r, x_s)^2 \qquad (1)$$

where $m$ denotes the subsurface parameter, $s$ denotes the number of seismic sources, $r$ denotes the number of receivers, which are equally spread on the surface of the velocity model, $f$ denotes the selected single frequency, $x_r$ denotes the position of the receivers, and $x_s$ denotes the position of the seismic source.

In order to obtain the adjoint source for the SFWTI, we must define a connective function that connects the traveltime difference with the seismic data. The traveltime difference can be calculated by the cross-correlation function:

$$C(f, x_r, \tau, x_s) = \int \frac{P(f, x_r, t+\tau, x_s)_{obs}\, P(f, x_r, t, x_s)_{cal}}{A(f, x_r, x_s)_{obs}\, A(f, x_r, x_s)_{cal}}\, dt \qquad (2)$$

where $A(f, x_r, x_s)_{obs}$ and $A(f, x_r, x_s)_{cal}$ are the maximum amplitudes of $P(f, x_r, t+\tau, x_s)_{obs}$ and $P(f, x_r, t+\tau, x_s)_{cal}$, and $\tau$ is a time shift between synthetic and recorded data. The divisors $A(f, x_r, x_s)_{obs}$ and $A(f, x_r, x_s)_{cal}$ normalize the seismic data in order to eliminate amplitude problems (Luo, 1991).

We can use equation (2) to find the time difference ($\Delta\tau$) by seeking the maximum value of $C$:

$$C(f, x_r, \Delta\tau, x_s) = \max\{\, C(f, x_r, \tau, x_s) \mid \tau \in [-T, T] \,\} \qquad (3)$$

where $T$ is the estimated maximum traveltime difference

between the recorded data and synthetic data. In order to Fig.5 Single frequency waveform forward modeling in time
obtain the traveltime difference, we can take the derivative domain (snapshot at 0.42s). (a) 5Hz, (b) 15Hz, (c) 22Hz.
of C ( f , xr ,  , xs ) with respect to  , which should be zero
From Fig.5, we can see that the low frequency waveform is
at  . very fat, and not sensitive to the detail information of
P ( f , xr , t   , xs ) obs P ( f , xr , t , xs ) cal
C ( x , , x )
r s    
A( f , xr , xs ) obs A( f , xr , xs ) obs
dt  0 (4) velocity models. When we increase the frequency of
seismic waveform, the detail information of velocity
models can be demonstrated. The multi-scale SFWTI has
 P ( f , xr , t   , xs ) obs  many advantages: (1) it can obtain high-precision initial
Where P ( f , xr , t   , xs ) obs  . So we can use
t velocity models only by using selected frequency data; (2)
equation(4) to calculate the gradient for SFWTI. While the it can avoid the cycle skipping problem, even if start with
Jacobian matrix can be calculated by using the rule of the high frequency data (15Hz); (3) it has the advantages of
implicit function derivative (Luo, 1991): multi-scale in frequency domain, but it can avoid the
  C    limitation of 3D frequency inversion.
 v 

 P ( f , x , t   , x ) 
 1  P ( f , xr , t , xs ) cal 
 dt (5)
v   C    H
r s obs 
 v  Numerical examples
  
 
Where H in the equation(5) is a normalization value, which In order to demonstrate our method which can build the
is: H   P ( f , xr , t   , xs ) obs P ( f , xr , t , xs ) cal dt . According to high-precision initial velocity models, we apply it to the
the gradient operator of FWI, the gradient of SFWTI can be modified Marmousi model. The true model (Fig.6a) with
expressed by: size of 69  192 , and the grid interval is 12.5 m. The initial
E (v ) 2 P ( f , xr , t , xs ) cal 1 T   ( f , xr , xs ) P ( f , xr , t   , xs ) obs  velocity model for inversion is built by linear model. From
v
 3 
v t 2
L   H dt (6) Fig.6b, we can see that linear initial model is far from true
 
Single Frequency waveform of recorded data and synthetic model, even do not satisfy the variation tendency of the
data only has the traveltime difference, so it has: true model. The range of velocity value is from 1.5km/s to
P( f , xr , t   , xs ) obs P( f , xr , t , xs ) cal 4km/s.
 (7)
A( f , xr , xs ) obs A( f , xr , xs ) cal
In the end, the gradient of SFWTI can be approximate as
follows:

E (v)
v v
2 P( f , xr , t , xs ) cal 1
 3 
t 2
L     ( f , x , x ) P( Hf , x , t   , x )
T r s r s cal

dt (8)
 
Where in equation (8), L1 denotes adjoint operator. So the
gradient of multi-source can be expressed by follows:
E (v) 2  2 Pm ( f , t , x, z ) a b
  3   Pb ( f , t , x, z ) (9)
v v s r t 2 Fig6 Velocity model. (a) True model ; (b)Linear initial model.
Where Pm denotes incident wave-field, Pb denotes back-
propagation adjoint source wave-field, and x, z denote the The source function is Ricker wavelet with a dominant
velocity model space. frequency of 22 Hz. The Ricker wavelet waveform and
spectrum are shown in Fig.7. Recording time is 1.8 s with
the time interval of 1 ms. To demonstrate our methods can
Multi-scale strategy for SFWTI
mitigate cycle skipping problem, even if the start frequency
Frequency domain multi-scale FWI was proposed by Pratt is 15Hz. We selected 20 frequencies range from 15 to 25Hz
which is noted by red star in the Fig.7b.
(1998), while in this paper, in order to invert the velocity
model from large scale to small scale, we first use low
frequency component (15Hz) to invert the macro structure,
and then gradually increase the frequency to obtain a high-
precision initial velocity models.
a b
Fig.7 Ricker wavelet. (a) waveform, (b) spectrum.
a b c
© 2017 SEG Page 1694
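The acquisition parameters above (22 Hz Ricker wavelet, 1.8 s record, 1 ms sampling, 20 frequencies between 15 and 25 Hz) can be used to sketch how a single frequency waveform might be isolated from a trace. This is only an illustrative assumption: the projection onto one Fourier component, the even spacing of the 20 frequencies, the wavelet delay `t0` and all function names are hypothetical and are not stated in the paper.

```python
import numpy as np

def ricker(f0, t, t0):
    """Ricker wavelet with dominant frequency f0 (Hz), centred at t0 (s)."""
    arg = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def single_frequency_waveform(trace, dt, freq):
    """Project a trace onto one frequency and return the monochromatic time signal."""
    n = trace.size
    t = np.arange(n) * dt
    # Fourier coefficient at an arbitrary frequency (it need not sit on the FFT grid).
    coeff = np.sum(trace * np.exp(-2j * np.pi * freq * t))
    # Real sinusoid carrying the amplitude and phase of that coefficient.
    return (2.0 / n) * np.real(coeff * np.exp(2j * np.pi * freq * t))

dt = 0.001                                # 1 ms sampling, as in the text
t = np.arange(1800) * dt                  # 1.8 s record
freqs = np.linspace(15.0, 25.0, 20)       # assumed even spacing of the 20 frequencies
wavelet = ricker(22.0, t, t0=0.1)         # t0 is an arbitrary delay chosen for the sketch
mono = [single_frequency_waveform(wavelet, dt, f) for f in freqs]
```

In a multi-scale run the list `mono` would be processed from its first (15 Hz) to its last (25 Hz) entry, with each inversion result used as the starting model for the next frequency.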

Fig.8 FWI result without low frequency information. (a) Conventional FWI; (b) Traditional multi-scale FWI.

When the seismic data lack low frequency information, the conventional FWI result, shown in Fig.8a, suffers from a severe cycle skipping problem. The time domain multi-scale FWI (Bunks, 1995) result is shown in Fig.8b; we can see that the traditional multi-scale strategy cannot solve the cycle skipping problem when the seismic data lack low frequency information. The SFWTI result is shown in Fig.9a; it shows that the SFWTI method can build a high-precision initial velocity model even without low frequency information. Using the SFWTI result as the initial model, we then apply conventional FWI to invert the velocity model, and the result is shown in Fig.9b. Overall, the inversion results clearly demonstrate that the SFWTI + conventional FWI result converges well to the true model and does not depend on the initial model.

Fig.9 Inversion result. (a) SFWTI, (b) SFWTI result + conventional FWI.

Compare SFWTI with WTI

Both the SFWTI method and the WTI method can retrieve the macro information of the velocity model. In order to compare the inversion results of SFWTI and WTI, we follow the process built by Luo (1991). The key steps of the two methods are as follows:
(1a) Extract the first arrival waveform (Fig.10) (WTI).
(1b) Obtain the single frequency waveform (SFWTI).
(2) Calculate the pseudo-traveltime residual (adjoint source).
(3) Calculate the gradient of WTI or SFWTI.
(4) Calculate the step-length and update the velocity model.

Fig.10 Single shot record. (a) Original recorded data, (b) First arrival waveform of recorded data, (c) First arrival waveform of synthetic data.
Fig.11 Inversion result. (a) WTI, (b) WTI+conventional FWI.

Luo (1991) proposed to use WTI to build a good initial velocity model, and the high-frequency assumptions about the data are not needed in WTI. But it has less velocity resolution than the FWI method. Comparing Fig.8b and Fig.11b shows that the WTI method can provide a good initial velocity model for FWI, but that this initial model is still not good enough, and the FWI still has a cycle skipping problem. Comparing Fig.9a and Fig.11a shows that SFWTI can provide a high-precision initial velocity model for FWI.

Conclusion

In this paper, we use Single Frequency waveform multi-scale Wave-equation Traveltime Inversion (SFWTI) to build a high-precision initial velocity model. The single frequency waveform contains full wave information, and there is no need to extract the first arrival waveform. Meanwhile, the starting point of the inversion and the global minimum lie in the same neighbourhood; furthermore, the multi-scale SFWTI can conduct traveltime inversion from low frequency to high frequency. Therefore SFWTI can obtain a high-precision initial velocity model and is unlikely to be affected by the cycle skipping problem. The numerical experiments demonstrate the applicability of this approach, especially when the initial model is far from the true model and the data lack low frequency information, and show that SFWTI can be a good supplement to conventional FWI.

Acknowledgements

This work is part of a project supported by the National Science and Technology Major Special Project of China (Grant No.2014AA06A605).
REFERENCES
Bunks, C., 1995, Multiscale seismic waveform inversion: Geophysics, 60, 1457–1473,
http://dx.doi.org/10.1190/1.1443880.
Chi, B. X., D. Liangguo, and L. Yuzhu, 2014, FWI method using envelope objective function without low
frequency data: Journal of Applied Geophysics, 109, 36–46,
http://dx.doi.org/10.1016/j.jappgeo.2014.07.010.
Hu, W., 2014, FWI without low frequency data — Beat tone inversion: 84th Annual International
Meeting, SEG, Expanded Abstracts, 1116–1120, http://dx.doi.org/10.1190/segam2014-0978.1.
Hu, Y., L. Han, and P. Zhang, 2016, Multistep full-waveform inversion based on waveform-mode
decomposition: 86th Annual International Meeting, SEG, Expanded Abstracts, 1501–1505.
Luo, Y., and G. T. Schuster, 1991, Wave-equation traveltime inversion: Geophysics, 56, 645–653,
http://dx.doi.org/10.1190/1.1443081.
Sirgue, L., J. T. Etgen, and U. Albertin, 2008, 3D frequency domain waveform inversion using time
domain finite difference methods: 70th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, F022, http://dx.doi.org/10.3997/2214-4609.20147683.
1259–1266, http://dx.doi.org/10.1190/1.1441754.
Geophysics, 74, no. 6, WCC1–WCC26, http://dx.doi.org/10.1190/1.3238367.
Zhou, C., W. Cai, and Y. Luo, 1994, Acoustic wave-equation traveltime and waveform inversion of
crosshole seismic data, Proceedings of SPIE — The International Society for Optical
Engineering, 60, 765–773.

The nonlinear data functional and Multi-scale seismic envelope inversion: Algorithm and
methodology for application to salt structure inversion
Guoxin Chen1,2, Ru-Shan Wu2, Shengchang Chen1
(1: School of Earth Sciences, Zhejiang University, China; 2: University of California, Santa Cruz, USA)
SUMMARY

Multi-scale envelope inversion (MSEI) was proposed to invert strong-contrast models by using the window-averaged envelope. Going from seismic data to the window-averaged envelope is a nonlinear data transformation, which aims to extract low-frequency information from the data without depending on the frequency range of the seismic data. We have derived a new direct envelope Fréchet derivative for working with the multi-scale envelope data (see Wu and Chen, 2017 SEG abstract). However, there are still many problems for strongly nonlinear inversion, especially when applying it to salt structures. Several strategies are proposed for this case. First, a joint objective function is proposed to make the inversion method more adaptive. Then a multi-offset method based on offset weighting is introduced. The combination of multi-scale envelope inversion and the multi-offset method can improve the inversion quality of the salt bottom and the sub-salt. In the numerical tests, we applied the multi-scale envelope inversion method to the SEG/EAGE 2-D Salt model. A low-cut source (frequency components below 4 Hz were truncated) was used in the inversion. We also tested noisy data to demonstrate the anti-noise characteristics of the new algorithm.

INTRODUCTION

How to get a good inversion result when low frequency components are missing is an interesting issue for FWI. In order to extract the low frequency information coded in the envelope of seismic data, nonlinear transformation methods have been introduced into the inversion. Envelope inversion (EI) (Wu et al. 2013) can partially extract the ultra-low frequency information contained in seismic trace envelopes and apply it to recover the long-wavelength background structure without using low-frequency sources. An auxiliary bump functional was introduced into full waveform inversion to make up for the shortcomings of envelope inversion (Bharadwaj et al. 2016). In order to invert strong-contrast models, multi-scale envelope inversion based on renormalization group theory was proposed (Wu et al. 2016). Nonlinear transformation can reconstruct the ultra-low frequency information in the envelopes of seismic data, even the DC component, no matter what the frequency band of the original data is. The low frequencies reconstructed by nonlinear transformation play a positive role in avoiding cycle skipping. However, nonlinear transformation is a double-edged sword. Unlike with a linear filter, the low-frequency components produced by nonlinear transformation contain artifacts. The artifacts can distort the inversion, influence its convergence rate and even cause the inversion to fail.

In order to overcome the drawbacks of traditional envelope inversion (EI), the multi-scale envelope inversion method was proposed (Wu et al. 2016; Wu & Chen, 2017). Wu and Chen (in this 2017 SEG abstract) have addressed the theoretical part of this MS envelope inversion for strongly nonlinear inversion, derived a new envelope Fréchet derivative for working with the multi-scale envelope data and discussed the features of the new Fréchet derivative and its related gradient field. In this paper, the implementation details of the algorithm and other strategies for application to salt structures are discussed, including the multi-objective function and the multi-offset method. Numerical tests on the SEG/EAGE Salt model demonstrate the validity and special features of the method for the case of low-cut sources (frequencies below 4Hz cut) and noisy data.

Multi scale envelope inversion using new Envelope Fréchet derivative

The misfit functional of multi-scale envelope inversion can be written as

$$\sigma_W = \frac{1}{2}\sum_{SRT} r_W(t)\,r_W(t)$$
$$r_W(t) = \frac{1}{\tau_W}\int dt'\,W(t-t')\left[\sqrt{y^2(t') + y_H^2(t')} - \sqrt{u^2(t') + u_H^2(t')}\right] \quad (1)$$

where $y$ is the synthetic data, $u$ is the observed data, and $y_H$ and $u_H$ are the corresponding Hilbert-transformed data. $W(t)$ is a time window function and $\tau_W$ is its effective width. Using the new gradient formula proposed in Wu & Chen (2017):

$$\frac{\partial \sigma_W}{\partial v} = \sum_{SRT}\frac{2}{v^3}\,\frac{\partial^2 e_W(t)}{\partial t^2}\,G_W^{(e)*} R^{*}\, r_W$$
$$e_W(t) = \frac{1}{\tau_W}\int dt'\,W(t-t')\,e(t') \quad (2)$$
$$e(t) = \sqrt{y^2(t) + y_H^2(t)}$$

where $R$ is a restriction operator onto the receiver positions. $R^{*}$ is the reverse process of $R$; it extends the residual data, which are limited to the receivers, to the entire model space. $G$ is an amplitude Green's operator which
represents the forward process in the equation. $G^{*}$ is the conjugate transpose of $G$; it represents the wavefield backpropagation, including time reversal. The superscript $(e)$ means envelope and $W$ is the window width. $S$ and $R$ are the shot and receiver coordinates, and $T$ is the seismic recording time.

The new derivative makes it possible to use the low-frequency information of the window-averaged envelope directly and efficiently, and the window-averaged envelope is regarded as an energy pulse, which reduces the nonlinearity of the full waveform inversion (see Wu and Chen, 2017).

The workflow of multi-scale envelope inversion (MSEI) is given in Table 1. In the MSEI, the width of the window gradually decreases. The inversion result from the previous window width is used as the initial model for the current window width. When W is reduced to be sufficiently small, we use the inversion result as the initial model of the traditional FWI. We call the sequence from MSEI to FWI one complete loop. In the workflow, N is the number of complete loops.

Table 1. The flow chart of the Multi-scale envelope inversion

Set an initial model
Loop over N
    Loop over window width W
        Loop over iteration n
            Calculate the gradient and update the velocity model
        End of loop over the iteration
    End of loop over window width
End of loop N
Get an accurate inversion result

Multi objective function inversion method: combine traditional FWI with multi scale envelope inversion

Low-frequency components reconstructed by nonlinear transformation contain artifacts. Unfortunately, we do not have the ability to distinguish them from the useful components. We also cannot remove all of these low-frequency components because they do play a positive role in the inversion of long-wavelength information. In order to counteract the negative effects of these artifacts in the inversion, we combine the multi-scale envelope inversion with the traditional FWI method and propose a multi-objective inversion method. The objective function is written as:

$$E = (1-\lambda)\,\frac{1}{2}\sum_{SRT}(y-u)^2 + \lambda\,\frac{1}{2}\sum_{SRT} r_W(t)\,r_W(t) \quad (3)$$

The gradient can be written as

$$\frac{\partial E}{\partial v} = (1-\lambda)\sum_{SRT}\frac{\partial y}{\partial v}\,(y-u) + \lambda\sum_{SRT} r_W(t)\,\frac{1}{\tau_W}\int dt'\,W(t-t')\,\frac{\partial e(t')}{\partial v} \quad (4)$$

Using the gradient-based method with the back-propagator, we can get the gradient operator

$$\frac{\partial E}{\partial v} = (1-\lambda)\sum_{SRT}\frac{2}{v^3}\,\frac{\partial^2 y}{\partial t^2}\,G_W^{(e)} R^{*}(y-u) + \lambda\sum_{SRT}\frac{2}{v^3}\,\frac{\partial^2 e_W(t)}{\partial t^2}\,G_W^{(e)*} R^{*}\, r_W \quad (5)$$

where $\lambda$ is the weighting factor. The selection of the weighting factor is a question worthy of study. It is a balance between reconstructing the large-scale model structure and repairing the model's high-frequency component. Weighting factors which are too large or too small will not produce the best result. Many aspects can influence the choice of the weighting factor: the complexity of the model; the closeness of the initial model to the true model; the absence of low frequencies in the seismic data; and so on. In this paper, in order to facilitate the calculation, we set $\lambda = 0.5$; of course, a better choice is worth trying.

MS-EI combines with multi offset inversion

It has been demonstrated that we can get information about the low wavenumbers from long-offset seismic data through wide-angle illumination, and that this is not constrained by the range of the source bandwidth. However, using only long-offset seismic data cannot guarantee accurate reconstruction of the long-wavelength components of the model. Since the envelope data contain sufficient low frequency information, it is possible to reduce the likelihood of cycle skipping when using the long-offset data alone. Unlike the conventional multi-offset method, we weight the seismic data by the offset, rather than directly truncating the near-offset data, to reduce the non-linearity caused by the long-offset data. We introduce the weighting factor into the inversion

$$\tilde{u} = \begin{cases} H_1 u & \text{if } |L| \ge L_0 \\ H_2 u & \text{if } |L| < L_0 \end{cases} \qquad (H_1 \ge 1,\ 0 \le H_2 \le 1) \quad (6)$$

where $u$ is the observed seismic data, $L$ is the distance between the receiver and the source, $L_0$ is the offset threshold, and $H_1$ and $H_2$ are weighting factors. We combine the MS-EI with the multi-offset method to improve the inversion quality of the salt bottom and subsalt. The workflow of the combined inversion method is similar to that of the MSEI, except that the observed seismic data are replaced by the data weighted by $H_1$ and $H_2$.
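The offset weighting of equation (6) can be sketched in a few lines. The sketch assumes that far-offset traces (offset at or beyond the threshold) receive the factor $H_1$ and near-offset traces receive $H_2$; the function name, array layout and numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def weight_by_offset(data, src_x, rec_x, l0, h1=1.0, h2=0.2):
    """Scale each trace of a shot gather by an offset-dependent factor.

    data  : array of shape (n_receivers, n_samples)
    src_x : source position, rec_x : receiver positions (same units as l0)
    Traces with |offset| >= l0 are multiplied by h1, the others by h2
    (an assumed reading of equation (6))."""
    offset = np.abs(np.asarray(rec_x, dtype=float) - src_x)
    weights = np.where(offset >= l0, h1, h2)
    return data * weights[:, None]

# Toy gather: four receivers, three time samples, threshold at 1500 m.
gather = np.ones((4, 3))
weighted = weight_by_offset(gather, src_x=0.0,
                            rec_x=[0.0, 1000.0, 2000.0, 3000.0], l0=1500.0)
```

Down-weighting instead of muting keeps the near-offset traces in the data, which is the distinction the text draws with the conventional multi-offset method.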
Numerical experiments on 2-D SEG/EAGE Salt model

In order to test the inversion effect of the new method on models with strong contrast, we apply the method to the SEG/EAGE Salt model (Fig.1(a)). We use the linear gradient model (Fig.1(b)) as the initial model. We use a Ricker wavelet as the source in the test, with the frequency components below 4Hz cut. The dominant frequency of the source is 9 Hz. As a comparison, we first give the results of the traditional FWI (Fig.1(c)) and conventional EI+FWI (Fig.1(d)). From the inversion results, it can be seen that these two methods do not reconstruct the structure of the salt dome. In the MS-EI, three window widths are used in succession: 300ms, 150ms and 50ms. The inversion result obtained with the previous window width is used as the initial model for the next window width. Figures 1(e), (f) and (g) are the corresponding inversion results. In marked contrast to traditional envelope inversion, the new method can successfully reconstruct the boundary and the velocity of the salt. Using the MS-EI inversion result (Fig.1(g)) as the initial model, traditional FWI gives the result shown in Figure 1(h). The error in the high frequency band in Fig.1(g) is corrected. In order to improve the inversion quality, Figure 1(h) is taken as the initial model for another loop. Figures 1(i) and 1(j) are the inversion results of the second and third loops. In the final inversion result, the upper boundary of the salt body is sharp and the velocity of the salt is well defined. The velocity of the low velocity zone below the salt has also been well reconstructed. The inversion result is close to the true model except for some minor fluctuations in the subsalt structure. The inversion results of MS-EI combined with the multi-offset method are shown in Figure 2. We find that the velocity of the salt is well reconstructed, and even the bottom of the salt body is sketched out.
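The window-averaged envelope that drives the tests above can be sketched as follows, using the three window widths quoted in the text (300, 150 and 50 ms). The boxcar window, the FFT-based Hilbert transform, the toy wavelets and the function names are all assumptions made for illustration; the paper does not specify the window function $W(t)$ or the implementation.

```python
import numpy as np

def envelope(trace):
    """Instantaneous envelope sqrt(y^2 + y_H^2) from the FFT-based analytic signal."""
    n = trace.size
    spec = np.fft.fft(trace)
    gain = np.zeros(n)
    gain[0] = 1.0                        # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0               # keep the Nyquist term
        gain[1:n // 2] = 2.0             # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * gain))

def window_average(signal, dt, width):
    """Average with a boxcar window of the given width in seconds (assumed W)."""
    n_w = max(1, int(round(width / dt)))
    return np.convolve(signal, np.ones(n_w) / n_w, mode="same")

def windowed_envelope_residual(syn, obs, dt, width):
    """Difference of window-averaged envelopes of synthetic and observed traces."""
    return (window_average(envelope(syn), dt, width)
            - window_average(envelope(obs), dt, width))

# Toy 9 Hz wavelets, the synthetic one delayed by 100 ms.
dt = 0.001
t = np.arange(2000) * dt
obs = np.cos(2 * np.pi * 9.0 * t) * np.exp(-((t - 1.0) / 0.1) ** 2)
syn = np.cos(2 * np.pi * 9.0 * t) * np.exp(-((t - 1.1) / 0.1) ** 2)
residuals = {w: windowed_envelope_residual(syn, obs, dt, w) for w in (0.3, 0.15, 0.05)}
```

Stepping through the dictionary from the widest to the narrowest window mirrors the order in which the window widths are applied in the tests.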
Fig.1 Inversion results of salt model test using low-cut source. The frequency components below 4Hz are truncated: (a) True model; (b) Initial model; (c) Inversion result of traditional FWI; (d) EI+FWI; (e) Inversion result of the MS-EI for window width 300ms; (f) width 150ms; (g) width 50ms; (h) Inversion result of traditional FWI, the initial model is Figure 1(g); (i) The inversion result of the second loop, the initial model is Figure 1(h); (j) The inversion result of the third loop, the initial model is Figure 1(i).

Figure 2: MSEI combined with multi-offset method with three loops of iterations: (a) first loop; (b) second loop; (c) third loop. Each loop consists of iterations with different window lengths.

Multi-scale envelope inversion with noise data

In order to test the sensitivity of the multi-scale envelope inversion to Gaussian noise, we perform the inversion using seismic data with Gaussian noise. In Figure 3 we give two shot profiles from the SEG/EAGE Salt model. Figure 3(a) is the original shot profile without noise and Figure 3(b) is the shot profile with Gaussian noise; the SNR is 1 (SNR = log10(signal power/noise power)). Figure 4 is the inversion result of multi-scale envelope inversion using weighted offset data. From the inversion result we can see that there is no significant difference between the clean-data and noisy-data inversion results except for some details. Thanks to the envelope's anti-noise capability and the smoothing characteristic of the window function, the multi-scale envelope inversion method is robust to noise.

Figure 3 (a) One shot profile without noise; (b) One shot profile with noise, the SNR is 1.
Figure 4 The inversion result with noise

Conclusion

Non-linear transformation of data provides us with the possibility to extract the low frequency information of seismic data regardless of the frequency band of the original seismic data. However, non-linear inversion using a nonlinear data functional introduces artifacts, and the artifacts have a negative impact on the inversion. In order to mitigate the effects of the artifacts, we made improvements in several areas: firstly, we use a linear filter to separate the low-frequency and high-frequency components of the envelope data in order to implement a multi-scale strategy for envelope inversion; we changed the calculation method of the gradient to reduce the nonlinearity of full waveform inversion by using a new Fréchet derivative; we combined the traditional FWI method with multi-scale envelope inversion to make the inversion method more adaptive. A multi-offset inversion strategy was used to improve the inversion quality of the salt bottom and the sub-salt. The results of the numerical tests demonstrate the effectiveness of this method.

Acknowledgement

This work was financially supported by the WTOPI (Wavelet Transform On Propagation and Imaging for seismic exploration), University of California, Santa Cruz, and the National Natural Science Foundation of China (Grant 41074133 and 41374001).
REFERENCES
Bharadwaj, P., W. Mulder, and G. Drijkoningen, 2016, Full waveform inversion with an auxiliary bump
functional: Geophysical Journal International, 206, 1076–1092,
http://dx.doi.org/10.1093/gji/ggw129.
Wu, R. S., and G. X. Chen, 2017, Multi-scale seismic envelope inversion for salt structures using a new
direct envelope Fréchet derivative: Submitted for 87th Annual International Meeting, SEG,
Expanded Abstracts.
Wu, R. S., J. R. Luo, and G. X. Chen, 2016, Seismic envelope inversion and renormalization group theory:
Nonlinear scale separation and slow dynamics: 86th Annual International Meeting, SEG,
Expanded Abstracts, 1346–1351, http://dx.doi.org/10.1190/segam2016-13962956.1.
Wu, R. S., J. Luo, and B. Wu, 2014, Seismic envelope inversion and modulation signal model:
Geophysics, 79, no. 3, WA13–WA24, http://dx.doi.org/10.1190/geo2013-0294.1.

Time Domain Wavefield Reconstruction Inversion
LIN Yu-Zhao, LI Zhen-Chun, ZHANG Kai, LI Yuan-yuan, YU Zhen-nan
Department of Geoscience, China University of Petroleum (East China), Qingdao 266580, China
Summary

Wavefield reconstruction inversion (WRI) is an improved full waveform inversion theory proposed in recent years. This method expands the search space by modifying the objective function, and the model gradient is based on the reconstructed wavefield, which greatly improves the computing efficiency and mitigates the influence of local minima. As frequency domain wavefield reconstruction inversion has a high demand for computational memory and requires a time-frequency transformation, with additional computational cost, before it can be applied to real seismic data, we extend the wavefield reconstruction inversion theory to the time domain. The augmented equation of WRI in the time domain is deduced, as is the model gradient. The advantages and limitations of wavefield reconstruction in the time domain are discussed in combination with numerical tests. Finally, we analyze the applicability of wavefield reconstruction inversion in seismic parameter inversion.

Introduction

The conventional method of full waveform inversion (Tarantola, 1984; Pratt, 1990) is used to obtain the distribution of underground medium parameters by minimizing the difference between the synthetic and observed data. The objective function of FWI is strongly nonlinear and is easily trapped in local minima. The absence or inaccuracy of low frequency information can lead to a mismatch between the travel times of the simulated record and the real seismic data, which causes the inversion to fall into a local minimum. In order to solve this problem, Shin and Cha (2009) extended FWI to the Laplace domain to invert the low-frequency data. Moghaddam and Mulder (2012) proposed an improved objective function to reduce the requirement for low-frequency information.

In order to mitigate the influence of local minima on FWI, Tristan van Leeuwen and Felix J. Herrmann (2013) proposed a new inversion method based on expanding the search space: Wavefield Reconstruction Inversion (WRI). The method extends the solution space to the data space and the model space, so as to increase the accuracy of the solution and improve the computational efficiency. They add the wave equation to the objective function in the frequency domain and calculate the model gradient from the reconstructed wavefield. The method does not need to store or calculate the forward wavefield and adjoint wavefield, so the calculation efficiency is greatly improved. By expanding the solution space, WRI is more stable and only weakly influenced by local minima. T. van Leeuwen and F. J. Herrmann et al. (2014) optimized the WRI theory and gave the physical meaning of the reconstructed wavefield and the value of the balance factor in the objective function; Z. Fang and F. Herrmann et al. (2015) proposed a seismic wavelet evaluation method for wavefield reconstruction inversion.

Current wavefield reconstruction inversion is usually carried out in the frequency domain, whereas inversion in the time domain requires less computational memory. Besides, time domain inversion theory is more instructive for practical applications. In this paper, we extend WRI to the time domain and obtain the augmented wave equation and the model gradient. An anomaly model is used to illustrate the stability of WRI, and a Marmousi test example demonstrates the feasibility of WRI. Finally, we analyze the advantages and shortcomings of this method in seismic parameter inversion.

Theory

The full waveform inversion (FWI) method can obtain the parameters of underground media from the amplitude, travel time and phase information. The traditional frequency domain FWI objective function can be expressed as:

$$\min_{m}\ \phi(m) = \sum_{i=1}^{M}\frac{1}{2}\left\| P_i A_i(m)^{-1} q_i - d_i \right\|_2^2, \quad (1)$$

where $m$ denotes the model parameters, $d$ is the observed data, $i$ is the frequency index and $M$ is the total number of frequencies. $P$ is the shot operator that makes sure the shot location corresponds to the observed data. $A$ is the forward modeling operator, $A(m) = \omega^2\,\mathrm{diag}(m) + L$, where $\omega$ is the frequency and $L$ is the Laplace operator.

The objective function of FWI is a functional under the L2 norm, and its gradient can be obtained by the adjoint state method:

$$\nabla_m \phi = \sum_{i=1}^{M}\omega_i^2\,\mathrm{diag}(u_i)^{*}\,v_i. \quad (2)$$

Solving the above equation requires computing and storing the forward wavefields for all shot points at each frequency, $A_i(m)u_i = q_i$, and the adjoint wavefields $v$, $A_i(m)^{*}v_i = P_i^{*}(P_i u_i - d_i)$. It is obvious that this inversion method has high computational requirements and low computational efficiency. In the absence of an
accurate initial model, the inversion results are Examples

susceptible to local minima. To demonstrate the accuracy of time domain WRI, the
To solve the above problem, Tristan van Leeuwen (2013) proposed algorithm is tested using an anomaly model and a
proposed an inversion method based on wavefield complex model extracted from Marmousi model.
reconstruction by modifying the objective function: the
state equation was added to the objective function and the Example 1: Anomaly model
wave field and model were used as parameters to process Figure 1 shows the true velocity model that has a reflection
inversion. The general mathematical equation of the layer and three anomalies. A homogeneous background
objective function of WRI can be written as: model with one reflection layer is used as the initial model
M
1 2 (Figure 2). The grid interval for both the horizontal and
min  (m, u )   ui  di 2  Ai (m)ui  qi 2 , (3)
2 2
vertical directions is 10 m. A Ricker wavelet with a main
i 1 2 2
m ,u
frequency of 10 Hz is used.
where Λ is balance factor. The wavefield reconstruction
We extracted snapshots at 1200 ms in forward modeling
problem in the time domain can be approximated as a
and inversion relatively. Figure 3 shows the true snapshot
L2 norm minimization problem, and the wavefield update and figure 4 is the reconstructed snapshot. Compared these
gradient can be obtained by deriving the wavefield from the two snapshots, we can see there are only a few difference
objective function: exists in the anomalies which can be neglected. As we
J A(m)ui discussed before, model gradient included both high and
gu   (ui  di )   2 ( A(m)ui  qi ) , (4-a) low frequency information (Figure 5-a), while the energy of
ui ui high frequency is very strong compared to low frequency
The second term of wavefield gradient is 0, so the data which will cause instability of inversion. To solve this
wavefield update process can be expressed as: problem, we replaced the filter term as reconstructed
Ui  ui    gu  ui    (ui  di ) . (4-b) wavefield, the optimal gradient shown in Figure 5-b. We
can see the gradient contains more low-wavenumber
Equation (4) is the time-domain augmented wave equation. information which makes the inversion more convergent.
When we applied this equation to model tests, we can get As illustrated in Figure 6, WRI can retrieve the position and
the most accurate wavefield by set step length  equal to 1 energy of the anomalies bodies well, which proves the
due to the same wave equation we used in both forward and accuracy of this method.
inversion. When dealing with the real seismic data, the
solution of the augmented equation is limited by the state
equation, thus we can only use part of observed data which
satisfy the state equation, weakening the nonlinearity of
inversion. This wavefield reconstruction method does not
need much computational memory as in frequency domain.
And compare to compared to conventional FWI, WRI does
not require the storage of the forward modeling wavefield
of the initial model.
Figure 1: True model
After solving the augmented equation, the model
parameters and the wavefield behaves in a more linear way,
The gradient of the model can be obtained by directly
deriving the wavefield from the gradient expression:
M
2  2U i
g m    2 [ A(m)U i  qi ] . (5)
i 1 v3 t 2 Figure 2: Initial model
The time derivative of the wavefield acts as a high-pass filter, whose effect can be mitigated by replacing it with the ordinary wavefield term; the bracketed part is linear in the model perturbation. Furthermore, WRI updates the model based on the reconstructed wavefield and then determines the optimal solution jointly in the data space and the model space, which broadens the search space of the solution and greatly reduces the influence of local minima.
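To make the structure of the gradient expression in equation (5) concrete, the following sketch evaluates it for a single shot on a 1D grid with NumPy. Several choices are assumptions rather than details from the text: the operator is taken as A(m)U = ∂²U/∂x² − (1/v²)∂²U/∂t², the penalty weight is called `lam`, the product is summed over time samples, and second-order central differences stand in for the high-order scheme. The test wavefield is a plane wave that satisfies the wave equation for the true velocity, so the bracketed residual, and hence the gradient, vanishes there and becomes nonzero for a wrong velocity.

```python
import numpy as np

def wri_gradient(U, q, v, dt, dx, lam=1.0):
    """Sum over time of lam^2 * (2 / v^3) * d2U/dt2 * (A(m) U - q), interior points only.

    U, q : arrays of shape (nt, nx); v : array of shape (nx,).
    Assumed operator: A(m) U = d2U/dx2 - (1 / v^2) d2U/dt2.
    """
    # Second-order central differences on interior samples
    U_tt = (U[2:, 1:-1] - 2.0 * U[1:-1, 1:-1] + U[:-2, 1:-1]) / dt**2
    U_xx = (U[1:-1, 2:] - 2.0 * U[1:-1, 1:-1] + U[1:-1, :-2]) / dx**2
    v_in = v[1:-1]
    residual = U_xx - U_tt / v_in**2 - q[1:-1, 1:-1]  # bracketed term A(m) U - q
    return lam**2 * np.sum(2.0 / v_in**3 * U_tt * residual, axis=0)

# Hypothetical test setup: plane wave travelling at v_true, no source term
v_true, dx = 2000.0, 10.0
dt = dx / v_true              # chosen so the discrete residual cancels for v_true
nt, nx = 201, 101
t = np.arange(nt) * dt
x = np.arange(nx) * dx
k = 2.0 * np.pi / 200.0
U = np.sin(k * (x[None, :] - v_true * t[:, None]))
q = np.zeros_like(U)

g_true = wri_gradient(U, q, np.full(nx, v_true), dt, dx)   # close to zero
g_wrong = wri_gradient(U, q, np.full(nx, 1500.0), dt, dx)  # negative: velocity too low
```

Under these assumptions a negative gradient for an underestimated velocity means that a descent step raises the velocity, which is the expected direction.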

Figure 3: True snapshot at 1200 ms
Figure 4: Reconstructed snapshot at 1200 ms
Figure 5-a: Gradient calculated
Figure 5-b: Optimal gradient
Figure 6: Final result

Example 2: Marmousi Model

In the following tests, we adopt the same setup as in the anomalies model. Acoustic seismic modeling is performed with a high-order finite-difference method in the time domain. We extracted a 5.0 km × 3.0 km section (Figure 7) from the original Marmousi model. The smoothed model is taken as the initial model (Figure 8).

When reconstructing the wavefield, some strong direct or transmitted waves act as secondary sources, which produces extra energy clusters at the boundary of the model. In addition, because the intermediate model is very complex, some waves cannot propagate to the boundary or their energy is extremely weak at the receivers, so the wavefield cannot be reconstructed accurately and it is difficult to describe the middle structure of the model. After 10 iterations, however, WRI obtains a better result (Figure 9): each structure is relatively clear, the energy contrast is more distinct, and the velocity is largely accurate.

Figure 7: Marmousi model
Figure 8: Initial model
Figure 9: Final result

Conclusions

In this paper, we extend wavefield reconstruction inversion to the time domain. The time-domain augmented wave equation and the model gradient are derived. The solution of the augmented wave equation in the time domain is also constrained by both the observed data and the state equation, so that we can make full use of the part of the real seismic data that we can predict accurately. After reconstructing the wavefield, the wavefield and the model parameters behave in a more linear way, so the velocity field can be well retrieved. Moreover, reconstructing the wavefield in the time domain requires less memory and can be readily applied in production.

Acknowledgments
We would like to thank the other members of the China Seismic Wave Propagation and Imaging Laboratory (SWPI) for their advice and assistance.

EDITED REFERENCES
REFERENCES
Bertsekas, D. P., 2014, Constrained optimization and Lagrange multiplier methods: Academic press.
Bozda, E., J. Trampert, and J. Tromp, 2011, Misfit functions for full waveform inversion based on
870, https://doi.org/10.1111/j.1365-246X.2011.04970.x.
De Hoop, A. T., 1960, A modification of Cagniard’s method for solving seismic pulse problems: Applied
Scientific Research, Section B, 8, 349–356, https://doi.org/10.1007/BF02920068.
Fichtner, A., B. L. N. Kennett, H. Igel, and H.-P. Bunge, 2008, Theoretical background for continental-
and global-scale full-waveform inversion in the time–frequency domain: Geophysical Journal
Fang, Z., and F. Herrmann, 2015, Source Estimation for Wavefield Reconstruction Inversion: 77th
Annual International Conference and Exhibition, EAGE, Extended Abstracts,
https://doi.org/10.3997/2214-4609.201412588.
Fang Z., C. Lee, C. Silva, F. Herrmann, and R. Kuske, 2015, Uncertainty quantification for Wavefield
Reconstruction Inversion: 77th Annual International Conference and Exhibition, EAGE,
Extended Abstracts, https://doi.org/10.3997/2214-4609.201413198.
Mulder, W. A., and B. Hak, 2009, Simultaneous imaging of velocity and attenuation perturbations from
seismic data is nearly impossible: 1st EAGE Conference and Exhibition incorporating SPE
EUROPEC 2009.
Moghaddam, P. P., and W. A.Mulder, 2012, The diagonalator: Inverse data space full waveform
inversion: 82nd Annual International Meeting, SEG, Expanded Abstracts,
Pratt, R. G., C. Shin, and G. J. Hick, 1998, Gauss–Newton and full Newton methods in frequency-space
https://doi.org/10.1046/j.1365-246X.1998.00498.x.
https://doi.org/10.1111/j.1365-246X.2006.02978.x.
Peters, B., F. J. Herrmann, and T. van Leeuwen, 2014, Wave-equation Based Inversion with the Penalty
Method-Adjoint-state Versus Wavefield-reconstruction Inversion: 76th Annual International
Conference and Exhibition, EAGE, Extended Abstracts, https://doi.org/10.3997/2214-
4609.20140704.
Shen, P., and W. W. Symes, 2008, Automatic velocity analysis via shot profile migration: Geophysics,
73, no. 5, VE49–VE59, https://doi.org/10.1190/1.2972021.
Shin, C., and Y. H. Cha, 2009, Waveform inversion in the Laplace—Fourier domain: Geophysical
Journal International, 177, 1067–1079, https://doi.org/10.1111/j.1365-246X.2009.04102.x.
Tarantola, A., and Valette B. 1982, Generalized nonlinear inverse problems solved using the least squares
criterion: Reviews of Geophysics, 20, 219–232, https://doi.org/10.1029/RG020i002p00219.
van Leeuwen, T., and F. J. Herrmann, 2013, Mitigating local minima in full-waveform inversion by
expanding the search space: Geophysical Journal International, 195, 661–667,
© 2017 SEG Page 1706

van Leeuwen, T., F.J. Herrmann and B. Peters, 2014, A New Take on FWI-Wavefield Reconstruction
Inversion: 76th Annual International Conference and Exhibition, EAGE, Extended Abstracts,
https://doi.org/10.3997/2214-4609.20140703.
van Leeuwen T., and W. A. Mulder 2010, A correlation-based misfit criterion for wave-equation
traveltime tomography: Geophysical Journal International, 182, 1383–1394,
https://doi.org/10.1111/j.1365-246X.2010.04681.x.
© 2017 SEG Page 1707

