Real-Time Compression of Logging Data
W.R. Gardner and W.C. Sanstrom, SPE 25015
This paper was prepared for presentation at the European Petroleum Conference held in Cannes, France, 16-18 November 1992.

This paper was selected for presentation by an SPE Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Papers presented at SPE meetings are subject to publication review by Editorial Committees of the Society of Petroleum Engineers. Permission to copy is restricted to an abstract of not more than 300 words. Illustrations may not be copied. The abstract should contain conspicuous acknowledgment of where and by whom the paper is presented. Write Librarian, SPE, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A. Telex, 163245 SPEUT.
logging job. With compression, only one tape would be required for all the full-wave acoustic data. Imaging and vertical seismic tools generate even more data.

Present wireline telemetry systems achieve data rates in the range of 100 kbps to 500 kbps. Future wireline telemetry systems will be capable of data rates in the range of 700 kbps to 1 Mbps [1]. In contrast, future wireline imaging tools can require effective data rates exceeding 5 megabits per second, and future wireline vertical seismic tools may require data rates exceeding 20 Mbps. Clearly, there is a need for wireline data compression in both communications and storage.

Present logging-while-drilling (LWD) telemetry systems have achieved data rates of 1 to 3 bps. Future LWD telemetry systems must overcome difficult technical challenges to achieve data rates in the range of 10 to 15 bps. New LWD formation evaluation services will require a minimum of 10 bps and could easily require 20 bps or more. The data compression challenges in LWD are in maximizing the real-time telemetry information and in maximizing the downhole data storage in the tool for subsequent access when the tool is retrieved (since most of the logging data cannot be transmitted in real time).

THEORY AND DEFINITIONS

Information Theory

Data compression has long been considered a topic in the field of information theory (the study of the representation, storage, transmission, and transformation of data). Information theory provides profound insights into the situation pictured in Figure 1, where a source is communicating over a channel to a data processor. Information theory introduced the general idea of coding. The objective of source coding (data compression) is to minimize the bit rate required for representation of the source with a specified fidelity at the output of the source coder. We limit our attention here to the simple special case of a discrete-valued random process {X_t} with independent, identically distributed samples. Because the process is discrete-valued, it is possible to encode the signal as a bit stream with perfect fidelity. In fact, the minimum average number of bits required to represent each sample without distortion is equal to the entropy of X_t, defined to be [2]:

    H(X) = E[-\log_2 p_X(X)] = -\sum_{x \in \Omega_X} p_X(x) \log_2 p_X(x)

where p_X(x) is the probability distribution function of X_t, E is the expected or mean value function, and Ω_X is the group of data symbols (alphabet) that X_t represents.

Since the entropy determines the number of bits required to represent a sample at the output of the source coder, it is said to determine the amount of information in the sample, where information is measured in bits. It is clear from the definition of entropy that

    H(X) \le \log_2(K)

where K is the size of the alphabet of X_t, with equality if and only if the outcomes of X are equally likely. The conclusion is that log_2(K) bits always suffice to specify the outcomes. The less obvious conclusion is that the maximum number of bits, log_2(K), is required only when the outcomes are equally likely, which corresponds to the maximum information conveyed by observing X.
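To make the definition concrete, the short Python sketch below (ours, not part of the original paper; the sample values are made up) estimates H(X) from empirical symbol frequencies and compares it with the log_2(K) bound for a nearly constant stream and for a uniformly distributed one.

```python
# Entropy of a discrete-valued sample stream, estimated from symbol counts.
# Illustrative sketch only; the sample values below are made up.
import math
from collections import Counter

def entropy_bits_per_sample(samples):
    """H(X) = -sum p(x) log2 p(x), estimated from empirical frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A nearly constant source carries little information ...
quiet = [7, 7, 7, 7, 8, 7, 7, 7]
# ... while a uniformly distributed source carries the maximum.
busy = [0, 1, 2, 3, 4, 5, 6, 7]

for name, s in [("quiet", quiet), ("busy", busy)]:
    k = len(set(s))
    h = entropy_bits_per_sample(s)
    print(f"{name}: H = {h:.2f} bits/sample, upper bound log2(K) = {math.log2(k):.2f}")
```

The bound H(X) <= log_2(K) is met with equality only for the equally likely ("busy") stream, which is the case carrying the most information per sample.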
The constant source is an example of a source that has no information content; that is, since we always know what the next character will be, nothing is learned when it is received. The random source is an example of a source with high information content; that is, since all characters are equally likely to be next, we learn a full character's worth of information (as much as we could) when a character is received. Thus, the less random the source, the fewer bits per character or sample will be needed to convey all the information, and the larger the resulting compression ratio (if the data is compression coded). In summary:

1) Random data cannot be compressed.
2) Data that has been compressed by an optimal compressor (one that always achieves the entropy of the source) cannot be compressed further.
3) One cannot guarantee that a data compressor will achieve any given performance on all data.

The data compression techniques applied to logging data within the scope of this paper can be separated into the following categories:

1) Lossless variable-length coding (VLC) of logging data files,
2) Lossless predictive coding (LPC) of signal logs,
3) Lossy coding of image logs.

Lossless vs Lossy Compression

Lossless data compression is the process of transforming a body of data into a smaller one from which it is possible to recover exactly the original data at some point in time. By contrast, lossy data compression is the process of transforming a body of data into a smaller body from which an approximation of the original can be constructed. For various types of data, what defines a close approximation is an area of research in itself. An important application of lossy compression is the compression of digitally sampled analog data such as speech, music, black-and-white or color images, and video. For example, if one sends a digital representation of a photograph over a communication line, it may only be important that the photograph received looks, to the human eye, identical to the original; as long as this is true, it is acceptable if the actual bits received differ from the bits sent.
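The distinction can be made concrete with a small sketch (ours, not the paper's; zlib is used only as a stand-in general-purpose byte compressor): a lossless round trip recovers the data exactly, while a lossy step such as coarse requantization of the samples does not.

```python
# Lossless vs lossy compression in miniature (illustrative only).
import zlib

data = bytes(range(256)) * 20          # a small repetitive stand-in "log data" buffer

# Lossless: the decompressed bytes are identical to the original.
packed = zlib.compress(data, level=9)
assert zlib.decompress(packed) == data
print(f"lossless: {len(data)} -> {len(packed)} bytes, exact recovery")

# Lossy: requantize 8-bit samples to 16 levels; recovery is only approximate.
step = 16
quantized = bytes((b // step) * step for b in data)
max_err = max(abs(a - b) for a, b in zip(data, quantized))
print(f"lossy: worst-case sample error after requantization = {max_err}")
```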
Lossless non-predictive data compression techniques are expected to provide average compression ratios of at least 3:1 for LWD and wireline data files. Lossy signal compression techniques are expected to compress wireline and LWD signal logs by factors ranging from 3 to 10 without visibly deteriorating the log. Lossy image compression techniques are expected to be able to compress image logs by factors ranging from 20:1 to 50:1 without visibly deteriorating the logs.
Lossless Variable Length Coding

Of the numerous lossless variable-length-coding techniques that have been developed, this paper will examine the Huffman [3], Modified Lempel-Ziv [4, 5], Lempel-Ziv-Welch [6], and Shannon-Fano [3] techniques.
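As a concrete illustration of variable-length coding (a generic sketch, not the authors' implementation; the symbol stream is made up), the following builds a Huffman code from symbol frequencies and compares the average code length with the entropy bound discussed earlier.

```python
# Minimal Huffman code construction from symbol frequencies (illustrative sketch).
import heapq
import math
from collections import Counter

def huffman_code(freqs):
    """Return {symbol: bitstring} for a frequency table with >= 2 symbols."""
    # Heap items: (weight, tie_breaker, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)          # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

samples = "AAAABBBCCD"                           # made-up symbol stream
freqs = Counter(samples)
code = huffman_code(freqs)
n = len(samples)
avg_len = sum(freqs[s] * len(code[s]) for s in freqs) / n
entropy = -sum((freqs[s] / n) * math.log2(freqs[s] / n) for s in freqs)
print(code)
print(f"average code length {avg_len:.2f} bits/symbol, entropy {entropy:.2f} bits/symbol")
```

The more skewed the symbol statistics, the shorter the average code length relative to a fixed-length representation, which is the basis of the compression ratios reported later for log data files.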
… from y(T) in the encoder prior to quantization in the A/D converter. This same value ŷ(T) is added in the receiver at the output of the D/A converter to yield a reconstructed and quantized sample ỹ(T). The encoder and decoder are then represented by the two relations.
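The two relations referred to above are not reproduced in this excerpt. In standard DPCM form (our assumption, not necessarily the authors' exact formulation) they are e(T) = y(T) - ŷ(T) at the encoder and ỹ(T) = ŷ(T) + ê(T) at the decoder; the sketch below illustrates them with a uniform quantizer and the previous reconstructed sample assumed as the predictor.

```python
# DPCM encoder/decoder relations in miniature (illustrative sketch).
# Encoder:  e(T) = y(T) - yhat(T),  ehat(T) = Q[e(T)]
# Decoder:  ytilde(T) = yhat(T) + ehat(T)
# Both sides form yhat(T) from the previously reconstructed sample so that
# they stay in step even though the quantizer discards information.

def quantize(e, step=4):
    """Uniform quantizer assumed for the prediction error."""
    return step * round(e / step)

def dpcm_encode(signal, step=4):
    codes, yhat = [], 0              # yhat: predicted value (previous reconstruction)
    for y in signal:
        e = y - yhat                 # prediction error
        ehat = quantize(e, step)     # quantized prediction error (what gets sent)
        codes.append(ehat)
        yhat = yhat + ehat           # reconstructed sample, used as the next prediction
    return codes

def dpcm_decode(codes):
    out, yhat = [], 0
    for ehat in codes:
        yhat = yhat + ehat           # ytilde(T) = yhat(T) + ehat(T)
        out.append(yhat)
    return out

signal = [0, 3, 9, 14, 15, 13, 8, 4]  # made-up smooth samples
codes = dpcm_encode(signal)
print("transmitted errors:", codes)
print("reconstruction    :", dpcm_decode(codes))
```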
… frequency than with low spatial frequency. Also, the eye is much more receptive to fine detail in luminance (or brightness) signals than in chrominance (or color) signals. Generally, methods that achieve high image compression ratios (10:1 to 50:1) are lossy in that the reconstructed data is not identical to the original.

Lossless image compression methods do exist, but their compression ratios are much smaller, perhaps no better than 3:1. Such techniques are used only in the most sensitive applications. For example, artifacts introduced by a lossy algorithm into an x-ray radiograph may suggest an incorrect interpretation and alter the diagnosis of a medical condition. Conversely, for commercial, industrial, and consumer applications, lossy algorithms are preferred because they save on storage and communication bandwidth. In many subjective tests, reconstructed images that were encoded with a 20:1 compression ratio are hard to distinguish from the original.

… known means of reducing the number of bits needed to represent a data set without losing any information. The dc coefficients are differentially encoded: the dc coefficient of the previous 8x8 block of the same component is used to predict the dc coefficient of the current 8x8 block, and the difference between these two dc terms is encoded. The Huffman code table for the dc term is based on the difference table.

The zigzag-ordered ac coefficients are first run-length coded. This process reduces each 8x8 block of DCT coefficients to a number of events. Each event represents a nonzero coefficient and the number of preceding zero coefficients. Since the high-frequency coefficients are more likely to be zero, Huffman-coding these events makes it possible to achieve efficient compression.

For JPEG decoding, the encoding algorithm is simply run in reverse, as shown in Figure 4b.
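The run-length step can be sketched as follows (our illustration of the idea described above, not the ISO/JPEG reference implementation; the block contents are made up): the 8x8 block of quantized DCT coefficients is scanned in zigzag order and reduced to (zero-run, nonzero-value) events, which is what makes Huffman coding of the mostly zero high-frequency terms effective.

```python
# Zigzag scan and run-length "events" for an 8x8 block of quantized DCT
# coefficients (illustrative sketch of the idea described above).

def zigzag_order(n=8):
    """Visit (row, col) indices of an n x n block in zigzag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def ac_events(block):
    """Reduce the 63 ac coefficients to (preceding_zero_run, value) events."""
    coeffs = [block[r][c] for r, c in zigzag_order()][1:]   # skip the dc term
    events, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            events.append((run, v))
            run = 0
    if run:                          # trailing zeros collapse to an end-of-block marker
        events.append(("EOB", 0))
    return events

# Mostly-zero block, typical after coarse quantization (made-up numbers).
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 52, -3, 2, 1
print(ac_events(block))
```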
The compression ratio for the Full-Wave Sonic data was also very good, at 7-to-1 for the Lempel-Ziv-Welch technique. This relatively high compression ratio was achieved because the Full-Wave Sonic produces a smooth signal with relatively small changes between most data points. The practical consequences of achieving a 7:1 compression ratio for this tool are great. This tool generates a large amount of uncompressed data and requires an uncompressed telemetry data rate of 85 kbps, which is 70% of the present available telemetry bandwidth. With 7-to-1 data compression, this tool would require only 12 kbps of telemetry, freeing up the additional telemetry capacity for other tools or even allowing the tool to run with older-generation telemetry systems. In addition, every seven data tapes that this tool generates on a job could be reduced to one tape with compression.
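The behaviour described above can be reproduced with a toy dictionary coder (a sketch of the Lempel-Ziv-Welch idea, not the tool's actual telemetry or tape software; the synthetic signal below merely stands in for a smooth trace): repeated small sample-to-sample changes produce long dictionary matches and few output codes.

```python
# Toy Lempel-Ziv-Welch encoder (illustrative sketch).
def lzw_encode(data: bytes):
    """Return the list of dictionary codes for `data`."""
    table = {bytes([i]): i for i in range(256)}   # initial single-byte entries
    next_code, w, out = 256, b"", []
    for b in data:
        wb = w + bytes([b])
        if wb in table:
            w = wb                                # extend the current match
        else:
            out.append(table[w])                  # emit code for the longest match
            table[wb] = next_code                 # add the new phrase
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

# A "smooth" synthetic signal: small repeating differences between samples.
smooth = bytes(100 + (i % 8) for i in range(4096))
codes = lzw_encode(smooth)
# With 12-bit codes the compressed size is roughly 1.5 bytes per code.
print(f"{len(smooth)} input bytes -> {len(codes)} codes "
      f"(~{len(codes) * 1.5 / len(smooth):.2f} of original size at 12 bits/code)")
```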
Predictive Coding of Log Signals

A dipmeter log signal was compressed with a combination of a predictive coder and a Huffman variable-length coder. The prediction algorithm used was simple "delta modulation," in which the signal's preceding value was used as the predicted present value. When a theoretically perfect Huffman coder was used, a compression ratio of 1.85:1 was achieved. When a gaussian "bell" curve was used to approximate the exact statistics of the data, a compression ratio of 1.65:1 was achieved. This is a considerable improvement over the 1.25:1 compression ratio achieved using only variable-length coding without prediction.
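The gain from prediction can be illustrated with a short sketch (ours, run on a synthetic slowly varying signal rather than the dipmeter log, so the printed ratios are not the paper's results): the empirical entropy of the delta-modulation residuals is much lower than that of the raw samples, so an ideal entropy coder applied after prediction needs far fewer bits per sample.

```python
# Estimating the gain from delta-modulation prediction before entropy coding
# (illustrative sketch on a synthetic slowly varying 8-bit signal).
import math
from collections import Counter

def empirical_entropy(symbols):
    counts, n = Counter(symbols), len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Synthetic "log" that changes slowly from sample to sample.
signal = [round(128 + 60 * math.sin(i / 40)) for i in range(2000)]
residuals = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]

h_raw = empirical_entropy(signal)      # bits/sample for entropy coding alone
h_res = empirical_entropy(residuals)   # bits/sample after delta-modulation prediction
bits_per_sample = 8                    # fixed-length representation assumed

# Ratios an ideal ("theoretically perfect") entropy coder could achieve.
print(f"VLC only         : ~{bits_per_sample / h_raw:.2f}:1")
print(f"prediction + VLC : ~{bits_per_sample / h_res:.2f}:1")
```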
Compression of Log Images

Two test well images produced by a Circumferential Acoustic Scanning Tool (CAST) were compressed with the JPEG algorithms using a DEC 3100 workstation. The raw, uncompressed images are shown in Figures 5(a) and 6(a), along with the respective JPEG-compressed images in Figures 5(b) and 6(b). The compression ratios achieved were 89-to-1 for the image in Figure 5 and 97-to-1 for the image in Figure 6. This means that the compressed images require 89 and 97 times less data than their respective uncompressed images.

Most of the important geologic features are still preserved in the 80-to-1 compressed images, although there is some loss of resolution. The thin bedding and rollover slump can be clearly observed in the image in Figure 5, although the microfractures observed within the thin beds in the uncompressed image are smeared in the compressed image. The thin bedding and fracturing are clearly observable in the image in Figure 6.

The block edge effects produced by the JPEG algorithms, which are noticeable at close range, generally do not greatly interfere with the observer's ability to distinguish important geologic features. These edge effects could have been reduced by additional image processing on the compressed images.

The loss of resolution seen at close range in the images in Figures 5 and 6 can be reduced to the point where the compressed images are barely distinguishable from the uncompressed original images by reducing the JPEG compression ratio to approximately 20:1 (achieved using 4x4 pixels).

OBSERVATIONS AND CONCLUSIONS

1. Lossless variable-length-coding data compression techniques can be used to significantly compress a wide range of log data. The effective LWD and wireline well-logging telemetry data rates can be increased by an average factor of approximately 3-to-1 using these methods alone.

2. Lossless predictive techniques can be used in conjunction with variable-length coding techniques to significantly improve the compression results for some log signals.

3. Lossy image data compression techniques can be used to significantly compress the data from image logs. The effective well-logging telemetry data rate can be increased by factors ranging from 20:1, where the compressed images are barely distinguishable from the original, to 80:1, where most of the geologic features are still preserved.

4. Data compression techniques can be used to dramatically reduce the amount of disk and tape storage required to perform the processing for post-logging image analysis.

NOMENCLATURE

X_t = discrete-valued random process
H(X) = entropy of X
p_X(x) = probability distribution function
Ω_X = alphabet (sample space) of X
y(T) = input sample to predictive coder
ŷ(T) = predicted value of y(T)
ỹ(T) = quantized value of y(T)
e(T) = prediction error
ê(T) = quantized prediction error
JPEG = Joint Photographic Experts Group
DCT = Discrete Cosine Transform
ISO = International Standards Organization
VLC = variable-length coding
LPC = lossless predictive coding
PCM = pulse code modulation
DPCM = differential PCM
ADPCM = adaptive differential PCM

ACKNOWLEDGEMENTS

We thank Halliburton Logging Services and its associates for their support and permission to publish this paper. We specifically thank Doug Seiler for providing the uncompressed CAST images, Charles Conley for drawing the figures, and Margaret Waid for many useful comments. We especially thank Harry Smith for his knowledgeable and patient help.
REFERENCES

4. Ziv, J. and Lempel, A.: "Compression of Individual Sequences via Variable-Rate Coding," IEEE Transactions on Information Theory, Vol. 24, No. 5 (1978), pp. 530-536.

5. Ziv, J. and Lempel, A.: "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, Vol. 23, No. 3 (May 1977), pp. 337-343.

6. Welch, T.A.: "A Technique for High-Performance Data Compression," IEEE Computer, Vol. 17, No. 6 (June 1984), pp. 8-19.

Table (fragment): compression ratios achieved for individual log types (column headings not recovered).

4. Dual Induction              2.1   2.3   3.5
5. Dual-spaced Neutron         1.3   1.3   1.6
6. Four Arm Caliper            2.9   3.3   8.6
7. Full-wave Sonic             1.8   1.9   7.0
8. Natural Gamma               4.0   4.1   3.7
9. High-frequency Dielectric   1.7   1.8   5.4
10. Dipmeter                   1.1   1.1   1.2
Average                        1.8   2.0   3.6
Figure 1. Source, source coder, telemetry channel, source decoder, and data processor (block diagram).
Figure: ADPCM principle (encoder and decoder), with input sample y(T) and predicted value ŷ(T).

Figure 4a. JPEG encoder: two-dimensional Discrete Cosine Transform, quantization (quantization table), zigzag scan with run-length coding of the ac coefficients, difference encoding of the dc coefficients, and variable-length coding.

Figure 4b. JPEG decoder: the encoder blocks run in reverse.
Figure 5a. Raw CAST image. Figure 5b. Compressed CAST image (compression ratio = 89-to-1). Annotated features: borehole breakout, sub-vertical fracture, shale bed, sand bed; horizontal axis: amplitude.
Figure (CAST image detail). Annotated features: thin beds, micro-fractures, rollover slump; vertical axis: depth in meters (XX10 to XX14).