
IMAGE COMPRESSION

USING
DISCRETE WAVELET TRANSFORMS

G.RAJA SEKHAR
INTRODUCTION
The advent of multimedia computing has led to an increased demand for
digital images. The storage and manipulation of these images in their raw
form is very expensive; for example, a standard 35mm photograph digitized
at 12µm per pixel requires about 18 Mbytes of storage, and one second of
NTSC-quality color video requires almost 23 Mbytes of storage. To make
widespread use of digital imagery practical, some form of data compression
must be used.
Digital images can be compressed by eliminating redundant information.
There are three types of redundancy that can be exploited by image
compression systems:

Spatial Redundancy: In almost all natural images the values of
neighboring pixels are strongly correlated.
Spectral Redundancy: In images composed of more than one spectral
band, the spectral values for the same pixel location are often correlated.
Temporal Redundancy: Adjacent frames in a video sequence often show
very little change.
The removal of spatial and spectral redundancies is often accomplished by
transform coding, which uses some reversible linear transform to decorrelate
the image data. Temporal redundancy is exploited by techniques that only
encode the differences between adjacent frames in the image sequence, such
as motion prediction and compensation.

DIGITAL IMAGE PROCESSING:


Digital image processing is defined as the processing or manipulation,
by a digital computer, of a two-dimensional array of real or complex
numbers that represents a digital image.
Any image in the form of a photograph, slide, or transparency is analog;
it is digitized and stored as a matrix of binary digits in a digital
computer.
Digital image processing has a wide range of applications, such as image
storage and transmission for business applications, medical image
processing, remote sensing via satellites and other spacecraft, robotics, and
automated inspection of industrial parts. Images acquired from satellites are
useful in tracking of earth resources; geographical mapping; prediction of
agricultural crops, urban growth, and weather; flood and fire control; and
many other environmental applications. Space image applications include
recognition and analysis of objects contained in images obtained from deep
space probe missions. Image storage and transmission applications occur in
television broadcast, teleconferencing, transmission of facsimile images,
communication over computer networks, closed circuit television based
security monitoring systems, and in military applications. In medical
applications one is concerned with processing of chest X-rays,
cineangiograms, and projection images of transaxial tomography, radiology,
and ultrasonic scanning.

Image representation and modeling:


An image is any two-dimensional function that bears information, such
as the luminance of objects in a scene, the absorption characteristics of
body tissue, the radar cross section of a target, the temperature profile of
a region, or the gravitational field in an area. Image representation is
basically concerned with what an image pixel (picture element) actually
represents.

The fundamental requirement in image processing is that images be
sampled and quantized. The sampling rate has to be large enough to
preserve the useful information in an image; usually it is determined by the
bandwidth of the image. Images can be represented via two-dimensional
orthogonal functions called basis images. For sampled images, basis images
can be determined from unitary matrices called image transforms. Any
image can be represented as a weighted sum of basis images.

Image enhancement:
Image enhancement refers to accentuation, or sharpening of image
features such as edges, boundaries, or contrast to make a graphic display
more useful for analysis. The enhancement process does not increase the
inherent information content in the data. But it does increase the dynamic
range of the chosen features so that they can be detected easily. Image
enhancement includes gray level and contrast manipulation, noise reduction,
edge crisping and sharpening, filtering, interpolation and magnification,
pseudo coloring, etc.
Image enhancement techniques can be broadly classified into the
following groups:

Point operations
1. Contrast stretching.
2. Clipping and thresholding.
3. Digital negative.
4. Intensity level slicing.
5. Range compression.
6. Image subtraction and change detection.

Histogram modeling
Histogram equalization, modification, and specification.

Spatial operators

1. Spatial averaging and spatial low-pass filtering.
2. Directional smoothing.
3. Median filtering.

Image restoration:

Image restoration is mainly concerned with the removal or
minimization of known degradations in an image. This includes deblurring
of images degraded by the limitations of the sensor or its environment,
noise filtering, and correction of geometric distortion or nonlinearities due
to sensors. Some image restoration methods, such as least squares,
constrained least squares, and spline interpolation, belong to the class of
Wiener filtering algorithms. Other methods, such as maximum likelihood,
maximum entropy, and maximum a posteriori, are nonlinear techniques that
require iterative solutions.

Image analysis:

Image analysis is mainly concerned with making quantitative
measurements from an image to produce a description of it. This can be
something like reading the label on a grocery item, sorting different parts
on an assembly line, or measuring the size and orientation of blood cells in
a medical image. More advanced image analysis systems measure
quantitative information and use it to make a sophisticated decision, such
as controlling the arm of a robot to move an object after identifying it or
navigating an aircraft with the aid of images acquired along its trajectory.

Image analysis techniques require extraction of certain features that aid
in the identification of the object. Segmentation techniques are used to
isolate the desired object from the scene so that measurements can be made
on it.

Image reconstruction from projections

Image reconstruction from projections is a special class of image
restoration problems where a two-dimensional object is reconstructed from
several one-dimensional projections. Reconstruction algorithms derive an
image of a thin axial slice of the object, giving an inside view otherwise
unobtainable without performing extensive surgery. Such techniques are
important in medical imaging (CT scanners), astronomy, radar imaging,
geological exploration, and nondestructive testing of assemblies.

Image data compression:

The amount of data associated with visual information is so large that its
storage requires enormous capacity. Although the storage capacities of
several media are substantial, their access speeds are usually inversely
proportional to their capacity. Typical television images generate data rates
exceeding 10 million bytes per second, and other images generate even
higher rates. Such data require large storage and/or transmission bandwidth,
which can be expensive. Image data compression techniques are concerned
with reducing the number of bits required to store or transmit images
without any appreciable loss of information. Image transmission
applications are in broadcast television; remote sensing via satellite,
aircraft, radar, or sonar; teleconferencing; computer communications; and
facsimile transmission. Image storage is required most commonly for
educational and business documents, medical images, etc. Because of their
wide applications, data compression is of great importance in digital image
processing.
IMAGE TRANSFORMS

The term image transforms usually refers to a class of unitary matrices
used for representing images. Just as a one-dimensional signal can be
represented by an orthogonal series of basis functions, an image can be
expanded in terms of a discrete set of basis arrays called basis images.

Properties of unitary transform:


1. Energy conservation and rotation: In any unitary transform the total
signal energy is preserved.
2. Energy compaction of transform coefficients: Most of the unitary
transforms have a tendency to pack a large fraction of the average
energy of the image into a relatively few components of the transform
coefficients.
3. De-correlation: When the input elements are highly correlated, the
transform coefficients tend to be uncorrelated.
4. Entropy: The entropy of a random vector is preserved under a unitary
transformation. Since entropy is a measure of average information, this
means information is preserved under a unitary transformation.
ONE-DIMENSIONAL DISCRETE FOURIER TRANSFORM (DFT)

The Discrete Fourier Transform (DFT) of a sequence {u(n), n=0,…,N-1}
is defined as

              N-1
v(k) = (1/N)   ∑  u(n) WN^kn ,   k = 0, 1, …, N-1
              n=0

where WN = exp(-j2π/N).

The N x N unitary DFT matrix F is given by

F = { (1/√N) WN^kn } ,   0 ≤ k, n ≤ N-1

The DFT is one of the most important transforms in digital signal and
image processing. It has several properties that make it ideal for image
processing applications.
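As a concreteness check, here is a minimal sketch (assuming NumPy is
available) that builds the unitary DFT matrix from the definition above and
verifies that F^-1 = F*:

```python
import numpy as np

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi / N)                 # W_N = exp(-j*2*pi/N)
F = (W ** np.outer(n, n)) / np.sqrt(N)      # F[k, n] = W_N^{kn} / sqrt(N)

u = np.random.rand(N)                       # an example sequence
v = F @ u                                   # forward unitary DFT
u_back = F.conj() @ v                       # inverse: F^{-1} = F* (F symmetric)

print(np.allclose(u, u_back))                   # True: perfect reconstruction
print(np.allclose(F @ F.conj().T, np.eye(N)))   # True: F is unitary
```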

Properties of DFT

1. The DFT and unitary DFT matrices are symmetric. By definition the
matrix F is symmetric; therefore F^-1 = F*.
2. The DFT is the sampled spectrum of the finite sequence u(n) extended
by zeros outside the interval [0, N-1].
3. The DFT of a sequence and its inverse transform are periodic with
period N.
4. The DFT of dimension N can be implemented by a fast algorithm in
O(N log2 N) operations.
5. The DFT of a real sequence {x(n), n=0,1,…,N-1} is conjugate
symmetric about N/2.
6. The DFT of a circular convolution of two sequences is equal to the
product of their DFTs.

TWO-DIMENSIONAL DFT

The two-dimensional DFT of an N x N image matrix {u(m,n)} is a
separable transform defined as

v(k,l) = (1/N) ∑∑ u(m,n) WN^km WN^ln ,    0 ≤ k, l ≤ N-1

u(m,n) = (1/N) ∑∑ v(k,l) WN^-km WN^-ln ,  0 ≤ m, n ≤ N-1

where the double sums run over m, n = 0, …, N-1 and k, l = 0, …, N-1,
respectively.
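Because the transform is separable, the 2-D DFT can be written as two
matrix products, V = F U F. A small sketch (again assuming NumPy):

```python
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # unitary DFT matrix

U = np.random.rand(N, N)          # an example image block
V = F @ U @ F                     # separable 2-D transform (F is symmetric)
U_back = F.conj() @ V @ F.conj()  # inverse transform

print(np.allclose(U, U_back))                 # True
print(np.allclose(V, np.fft.fft2(U) / N))     # agrees with the FFT up to 1/N
```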

Properties of two-dimensional DFT


1. The two-dimensional DFT matrices are symmetric. By definition the
matrix F is symmetric; therefore F^-1 = F*.
2. The two-dimensional DFT is the sampled spectrum of the finite array
u(m,n) extended by zeros outside the interval [0, N-1] x [0, N-1].
3. The two-dimensional DFT of an array and its inverse transform are
periodic with period N.
4. The two-dimensional DFT of an N x N array can be implemented by a
fast algorithm in O(N^2 log2 N) operations.
5. The two-dimensional DFT of a real array is conjugate symmetric
about (N/2, N/2).
6. The DFT of a two-dimensional circular convolution of two arrays is
equal to the product of their DFTs.

THE COSINE TRANSFORM:

The cosine transform matrix C = {c(k,n)}, also called the Discrete
Cosine Transform (DCT), is defined as

c(k,n) = 1/√N ,                            k = 0,       0 ≤ n ≤ N-1

c(k,n) = √(2/N) cos[ π(2n+1)k / 2N ] ,     1 ≤ k ≤ N-1, 0 ≤ n ≤ N-1
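A short sketch (assuming NumPy) that builds the DCT matrix directly from
this definition and checks the orthogonality property listed below:

```python
import numpy as np

def dct_matrix(N):
    """Build the N x N cosine transform matrix C = {c(k, n)} as defined above."""
    C = np.empty((N, N))
    n = np.arange(N)
    C[0, :] = 1.0 / np.sqrt(N)                                   # k = 0 row
    for k in range(1, N):
        C[k, :] = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    return C

C = dct_matrix(8)
print(np.allclose(C @ C.T, np.eye(8)))    # True: C is orthogonal, C^-1 = C^T
```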

Properties of the cosine transform:

1. The cosine transform is real and orthogonal; that is,

C = C* ,   C^-1 = C^T

2. The cosine transform is not the real part of the unitary DFT. However,
the cosine transform of a sequence is related to the DFT of its symmetric
extension.
3. The cosine transform is a fast transform. The cosine transform of a
vector of N elements can be calculated in O(N log2N) operations via
an N-point FFT.
4. The cosine transform has excellent energy compaction for highly
correlated data.

THE SINE TRANSFORM

The N x N sine transform matrix Ψ = {ψ(k,n)}, also called the Discrete
Sine Transform (DST), is defined by the transform pair

                   N-1
v(k) = √(2/(N+1))   ∑  u(n) sin[ π(k+1)(n+1)/(N+1) ] ,   0 ≤ k ≤ N-1
                   n=0

                   N-1
u(n) = √(2/(N+1))   ∑  v(k) sin[ π(k+1)(n+1)/(N+1) ] ,   0 ≤ n ≤ N-1
                   k=0

Properties of the sine transform

1. The sine transform is real, symmetric, and orthogonal; that is,

Ψ = Ψ* ,   Ψ^-1 = Ψ^T

2. The sine transform is not the imaginary part of the unitary DFT. The
sine transform of a sequence is related to the DFT of its antisymmetric
extension.
3. The sine transform is a fast transform: the sine transform of a vector
of N elements can be calculated in O(N log2 N) operations via a
2(N+1)-point FFT.

THE HAAR TRANSFORM

1. The Haar transform is real and orthogonal. Therefore,

Hr = Hr* ,   Hr^-1 = Hr^T

2. The Haar transform is a very fast transform. On an N x 1 vector it can
be implemented in O(N) operations.
3. The basis vectors of the Haar matrix are sequence ordered.
4. The Haar transform has poor energy compaction for images.
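One level of the Haar transform reduces to pairwise scaled sums and
differences of the input, which is why the full transform needs only O(N)
operations. A minimal sketch (assuming NumPy and an even-length input):

```python
import numpy as np

def haar_step(x):
    """One Haar level: scaled pairwise sums (low-pass) and differences (high-pass)."""
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # averages
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # details
    return s, d

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
s, d = haar_step(x)
# Orthogonality preserves energy; repeating the step on s halves the work
# at each level, so the full transform stays O(N).
print(np.isclose(np.sum(x**2), np.sum(s**2) + np.sum(d**2)))   # True
```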

LITERATURE SURVEY ON IMAGE COMPRESSION

The use of digital images has increased at a rapid pace over the past
decade. Photographs, printed text, and other hard-copy media are now
routinely converted into digital form, and the direct acquisition of digital
images is becoming more common as sensors and associated electronics
improve. Many recent imaging modalities in medicine, such as MRI and
CT, generate images directly in digital form. Representing images in digital
form allows visual information to be easily manipulated in useful and
novel ways. Compression is required in order to
1. Reduce the memory required for storage,
2. Improve the data access rate from storage devices, and
3. Reduce the bandwidth and/or the time required for transfer across
communication channels.

Achieving Compression: Redundancy and Irrelevancy

If each pixel value represented a unique and perceptually important
piece of information, it would be difficult indeed to compress an image.
Fortunately, the data comprising a digital image or sequence of images
are often redundant and/or irrelevant. Redundancy relates to the statistical
properties of images, while irrelevancy relates to the observer viewing an
image. Redundancy can be classified into three types:
1. Spatial (due to correlation between neighboring pixels in an image),
2. Spectral (due to the correlation between color planes or spectral
bands),
3. Temporal (due to correlation between neighboring frames in a
sequence of images).
Similarly, irrelevancy can be classified as spatial, spectral, and/or
temporal in nature, but the key issues in this case are the limitations and
variations of the human visual system (HVS) when presented with
certain stimuli under various viewing conditions. Ideally, an image
compression technique removes redundant and irrelevant information and
then efficiently encodes what remains. Practically, it is often necessary to
throw away both non-redundant and relevant information to achieve the
necessary degree of compression.
The preceding comments point to a fundamental dichotomy in the
classification of image compression techniques: lossless versus lossy. In
lossless compression (also known as bit-preserving or reversible
compression), the reconstructed image after compression is numerically
identical to the original image on a pixel-by-pixel basis. In lossy
compression (also known as irreversible compression), the reconstructed
image contains degradations relative to the original image. However, under
certain conditions, these degradations may not be visually apparent
(sometimes called visually lossless compression). Obviously, lossless
compression is ideally desired since no information is compromised.
Unfortunately, only modest compression ratios (an average of 2:1 for a
single-band image) are possible with lossless compression. Much higher
compression ratios can be obtained with lossy techniques in exchange for
potentially visible degradations.

Compression System Components:

The three basic components of a general compression scheme are


1. Image decomposition or transformation,
2. Quantization, and
3. Symbol encoding.
The image decomposition or transformation is usually applied to
decorrelate the image information or, more generally, to provide a
representation that is more amenable to efficient coding. This stage is used
in both lossless and lossy techniques.
Examples include the prediction error signal formation in Differential
Pulse Code Modulation (DPCM), the discrete cosine transform (DCT),
and subband/wavelet decompositions. The next stage, quantization, is a
many-to-one mapping found only in lossy techniques, and it is the point
at which the errors are introduced. The type and degree of quantization
has a large impact on the bit rate and the reconstructed picture quality of
a lossy scheme. In essence, quantization can be viewed as a control knob
that trades off image quality for bit rate. Examples of quantization
strategies include uniform or nonuniform scalar quantization or vector
quantization. It is also desirable to quantize in such a way that resulting
output can be efficiently encoded by the last stage. The final stage,
symbol encoding, is a means for mapping the symbols (values) resulting
from the decomposition and/or quantization stages into strings of 0’s and
1’s, which can be transmitted or stored. This mapping may be as simple as
using fixed-length binary code words to represent the symbols, or it
might use a variable-length code, such as a Huffman code or an
arithmetic code, as a means of achieving rates close to fundamental
information-theoretic limits. These three components often mutually
interact and their joint optimization is a complicated task. As a result,
they are often optimized individually based on assumed inputs.

An important aspect of a compression technique is that one or more of
the components can be implemented in either an adaptive or a non-adaptive
mode. A component is adaptive if its structure (or parameters) changes
within an image to take advantage of locally varying image characteristics.
Since most images vary significantly from one region to another, adaptivity
offers the potential for improved performance, in exchange for an increase
in complexity. In systems with causal adaptivity, the adaptivity is inferred
only from the previously reconstructed pixel values (e.g., reconstructed in a
raster scan fashion), and as such, no overhead information needs to be
transmitted to the decoder. On the other hand, in systems with noncausal
adaptivity, the encoder parameters are based on previous pixel values
(reconstructed or actual) in addition to future input values. Although this
may result in a higher bit rate due to the required overhead information, it
also usually results in superior performance and lower decoder complexity.

Symbol Encoding
In these techniques each pixel is processed independently, ignoring
inter-pixel dependencies.

PCM
In PCM the incoming video signal is sampled, quantized, and coded by
a suitable code word (before feeding it to a digital modulator for
transmission). The quantizer output is generally coded by a fixed length
binary code word having B bits. Commonly, 8 bits are sufficient for
monochrome broadcast or video conferencing quality images; whereas
medical images or color video signals may require 10 to 12 bits per pixel.

The number of quantizing bits needed for visual display of images
can be reduced to 4 to 8 bits per pixel by using companding, contrast
quantization, or dithering techniques. Halftone techniques reduce the
quantizer output to 1 bit per pixel, but usually the input sampling rate
must be increased by a factor of 2 to 16. The compression achieved by
these techniques is generally less than 2:1.

Entropy Coding
If the quantized pixels are not uniformly distributed, then their entropy
will be less than B, and there exists a code that uses less than B bits per
pixel. In entropy coding the goal is to encode a block of M pixels
containing MB bits, whose possible values occur with probabilities pi,
i = 0, 1, …, L-1, L = 2^MB, by -log2 pi bits each, so that the average
bit rate is

H = ∑ pi (-log2 pi)

This gives a variable-length code for each block, where highly probable
blocks (or symbols) are represented by small-length codes, and vice
versa.
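A small sketch (assuming NumPy) of the entropy computation above; the
probabilities are made-up illustrative values:

```python
import numpy as np

def entropy_bits(p):
    """Average bit rate H = sum of p_i * (-log2 p_i) over the symbol probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # symbols with zero probability contribute nothing
    return float(np.sum(-p * np.log2(p)))

# A skewed 4-symbol source needs well under the 2 bits of a fixed-length code:
print(entropy_bits([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits per symbol
```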

The Huffman Coding Algorithm

1. Arrange the symbol probabilities pi in decreasing order and consider
them as leaf nodes of a tree.
2. While there is more than one node:
Merge the two nodes with smallest probability to form a new node
whose probability is the sum of the two merged nodes.
Arbitrarily assign 1 and 0 to each pair of branches merging into a
node.
3. Read the code for each symbol sequentially from the root node to the
leaf node where the symbol is located.
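The algorithm above translates almost directly into code. A minimal sketch
in Python (the symbol set and probabilities are illustrative; `heapq`
supplies the priority queue):

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code per the steps above; returns symbol -> bit string."""
    # Step 1: every symbol starts as a leaf node in a priority queue.
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    # Step 2: repeatedly merge the two nodes with the smallest probabilities.
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        # Assign 0 to one merging branch and 1 to the other (choice is arbitrary).
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Step 3 corresponds to reading each code string from root to leaf:
print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```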

Run-Length Coding

Consider a binary source whose output is coded as the number of
0’s between two successive 1’s; that is, the lengths of the runs of 0’s are
coded. This is called run-length coding (RLC). It is useful whenever
large runs of 0’s are expected. Such a situation occurs in printed
documents, graphics, weather maps, and so on, where the probability of a
0 is close to unity.
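A minimal sketch of this scheme (plain Python; the run of 0’s after the last
1 is emitted as a final count):

```python
def run_length_encode(bits):
    """Code a binary sequence as the lengths of the runs of 0's between 1's."""
    runs, count = [], 0
    for b in bits:
        if b == 0:
            count += 1
        else:                 # a 1 terminates the current run of 0's
            runs.append(count)
            count = 0
    runs.append(count)        # trailing run of 0's after the last 1
    return runs

print(run_length_encode([0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]))   # [3, 5, 2]
```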

Bit-Plane Encoding

A 256 gray level image can be considered as a set of eight 1-bit planes,
each of which can be run-length encoded. For 8-bit monochrome images,
compression ratios of 1.5 to 2 can be achieved. This method becomes
very sensitive to channel errors unless the significant bit planes are
carefully protected.
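A two-line sketch (assuming NumPy) of slicing an 8-bit image into its bit
planes, each of which could then be run-length encoded:

```python
import numpy as np

img = np.array([[200, 13], [255, 0]], dtype=np.uint8)   # a toy 8-bit "image"
planes = [(img >> b) & 1 for b in range(8)]             # plane 7 = most significant
print(planes[7])    # [[1 0]
                    #  [1 0]]
```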
PREDICTIVE TECHNIQUES

Basic Principle

The philosophy underlying predictive techniques is to remove mutual
redundancy between successive pixels and encode only the new
information. This can be done by differential pulse code modulation
(DPCM).

Feedback versus Feed forward prediction

An important aspect of DPCM is that the prediction is based on the
output, i.e., the quantized samples, rather than on the input samples. This
results in the predictor being in a feedback loop around the quantizer, so
that the quantizer error at a given step is fed back to the quantizer input at
the next step. This has a stabilizing effect that prevents dc drift and
accumulation of error in the reconstructed signal.

On the other hand, if the prediction rule is based on the past inputs, the
signal reconstruction error would depend on all the past and present
quantization errors in the feed-forward prediction-error sequence.
Generally, the mean square value of this reconstruction error will be
greater than that in DPCM.

Distortionless Predictive Coding

In digital processing the input sequence is generally digitized at the
source itself by a sufficient number of bits (typically 8 for images). The
sequence may then be considered as an integer sequence. By requiring the
predictor outputs to be integer values, the prediction error sequence will
also take integer values and can be entropy coded without distortion. This
gives a distortionless predictive codec whose minimum achievable rate
would be equal to the entropy of the prediction-error sequence.
Delta Modulation

Delta modulation (DM) is the simplest of the predictive coders. It
uses a one-step delay function as a predictor and a 1-bit quantizer, giving
a 1-bit representation of the signal. The predictor integrates the quantizer
output, which is a sequence of binary pulses, and the receiver is a simple
integrator. The primary limitations of delta modulation are (1) slope
overload, (2) granularity noise, and (3) instability to channel errors. Slope
overload occurs whenever there is a large jump or discontinuity in the
signal, to which the quantizer can respond only in several delta steps.
Granularity noise is the step-like nature of the output when the signal is
almost constant. Both of these errors can be compensated to a certain
extent by low-pass filtering the input and output signals. Slope overload
can also be reduced by increasing the sampling rate, which will reduce
the inter-pixel differences. However, the higher sampling rate will tend to
lower the achievable compression. An alternative for reducing
granularity while retaining simplicity is to go to a tristate delta modulator.
The advantage is that a large number (65 to 85%) of pixels are found in
the level, or 0, state, whereas the remaining pixels are in the +/-1 states.
The reconstruction filter, which is a simple integrator, is unstable.
Therefore, in the presence of channel errors, the receiver output can
accumulate large errors. It can be stabilized by attenuating the predictor
output by a positive constant called the leak.
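A minimal sketch of a leaky delta modulator (assuming NumPy; the step
size and leak value are illustrative choices, not prescribed by the text):

```python
import numpy as np

def delta_modulate(x, step=1.0, leak=0.95):
    """1-bit DM: one-step-delay predictor (a leaky integrator) plus a 1-bit quantizer."""
    bits, pred = [], 0.0
    for sample in x:
        b = 1 if sample >= pred else -1      # 1-bit quantization of the error
        bits.append(b)
        pred = leak * pred + b * step        # predictor integrates the bit stream
    return bits

def delta_demodulate(bits, step=1.0, leak=0.95):
    """The receiver is the same (leaky) integrator."""
    out, pred = [], 0.0
    for b in bits:
        pred = leak * pred + b * step
        out.append(pred)
    return out

x = 5.0 * np.sin(np.linspace(0, 2 * np.pi, 64))
y = delta_demodulate(delta_modulate(x))
# y tracks x to within about one step; a sharp jump in x would show slope
# overload, and the leak < 1 keeps channel-error effects from accumulating.
```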

Arithmetic coding

In Huffman coding there is a one-to-one correspondence between the
code words and the source sequence blocks. In comparison, arithmetic
coding is a nonblock code (also known as a tree code), where a codeword
is assigned to an entire input sequence sm of length m symbols. In
arithmetic coding, slightly different source sequences can result in
dramatically different code sequences.

Consider encoding a sequence of m binary symbols, sm, that has a
probability of occurrence p(sm). Since the sum of p(sm) over all of the 2^m
possible source sequences of length m must be 1, it is possible to assign a
subinterval within the half-open interval [0,1) to each source sequence sm,
such that the length of the subinterval is equal to p(sm) and the subintervals
are nonoverlapping. For any sequence it can be shown that the subinterval
generated by this method has a width equal to the probability of that
sequence. Furthermore, the subintervals produced by the possible
sequences of length m are nonoverlapping, and their union completely
covers the interval [0,1). Once the subinterval corresponding to a certain
source sequence sm has been identified, a codeword for sm can be
constructed from the binary expansion of the subinterval’s beginning point,
c(sm). Since the beginning point of each subinterval is separated from the
beginning point of its nearest right-hand neighbour by the subinterval
width p(sm), it can be shown that it is only necessary to retain
l(sm) = ⌈-log2 p(sm)⌉ bits after the decimal point to uniquely specify the
subinterval (⌈x⌉ is the smallest integer that is larger than x). This method
of encoding results in a single codeword for the entire sequence sm that
has a length within 1 bit of the sequence’s ideal codeword length,
-log2 p(sm).
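A minimal sketch of the interval-narrowing idea for a binary source (plain
Python; p0 is an assumed probability of a 0, and the function returns the
subinterval and the codeword length ⌈-log2 p(sm)⌉ rather than the bit
string itself):

```python
import math

def arithmetic_interval(bits, p0):
    """Locate the subinterval of [0, 1) assigned to a binary sequence."""
    low, width = 0.0, 1.0
    for b in bits:
        if b == 0:                     # a 0 takes the left p0 fraction
            width *= p0
        else:                          # a 1 takes the right (1 - p0) fraction
            low += width * p0
            width *= 1.0 - p0
    code_len = math.ceil(-math.log2(width))   # width equals p(s_m)
    return low, width, code_len

low, width, L = arithmetic_interval([0, 0, 1, 0], p0=0.8)
print(low, width, L)    # width = 0.8 * 0.8 * 0.2 * 0.8 = 0.1024 -> about 4 bits
```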

QUANTIZATION

A quantizer is essentially a staircase function that maps the
possible input values into a smaller number of output levels. In this way,
the number of symbols that need to be encoded is reduced at the
expense of introducing error in the reconstructed image. The type and
degree of quantization has a large impact on the final bit rate and the
reconstructed picture quality of a lossy scheme. The individual
quantization of each signal value is called scalar quantization (SQ), and
the joint quantization of a block of signal values is called block or vector
quantization (VQ). For the same encoding strategy, VQ can always be
made to outperform SQ, but in many cases the gain is so minimal that it
is not worth the additional complexity.

The quantizer design problem is usually formulated as the
minimization of some distortion measure for a given average output bit
rate. The distortion measure may be context-free (i.e., independent of the
neighboring signal values) or context-dependent (e.g., a perceptual
measure based on the properties of the human visual system). An
important aspect of quantizer design is to determine, for a desired level of
distortion, the theoretical lower bound and to quantify the potential gains
of adopting a more complicated quantization strategy (e.g., VQ instead of
SQ). Rate-distortion theory, a branch of information theory, deals with
obtaining such performance bounds. Due to the sophisticated mathematics
involved, key results have been established only for context-free
distortion measures such as mean squared error (MSE) and for a few
well-known signal probability distributions such as the Gaussian
distribution. Despite these limitations, the results still provide useful
insights into quantizer performance.

Several approaches to designing quantizers based on visual criteria
have been suggested, but a justifiable debate continues as to the best
criteria to use. As a result we restrict ourselves to the design of context-free
quantizers based on the minimization of the MSE. Although it is well
known that MSE does not always correlate well with perceived image
quality, it does provide a measure of relative quality for the same
algorithm at different bit rates, and its mathematical tractability has led to
its widespread use.

Scalar Quantization

Scalar quantization (SQ) refers to the independent quantization of
each signal value. The main advantage of SQ is its implementation
simplicity, but in many situations it is also close to optimal in a rate-
distortion sense.
1. Lloyd-Max Quantizer.

In many transmission or storage applications, the communication
channel is fixed-rate, and the complexity of a rate equalizer buffer to
accommodate variable-length codes cannot be justified. In such cases it is
desirable to use a fixed number, N, of quantizer output levels. Note that
since the quantizer output bit rate is log2 N, the value of N is usually
chosen to be a power of 2 for efficiency, although this is not strictly
necessary.

The mathematical development of the Lloyd-Max quantizer is
straightforward. Let x be a continuous random variable with probability
density p(x). The quantizer maps x into a discrete variable x* that belongs
to a finite set {ri, i = 0, …, N-1} of real numbers referred to as
reconstruction levels. The range of values of x that map to a particular x*
is defined by a set of points {di, i = 0, …, N} referred to as decision
levels: if x lies in the interval [di, di+1), it is mapped (quantized) to ri,
which lies in the same interval. The optimal levels satisfy

di = (ri-1 + ri) / 2

ri = ∫ x p(x) dx / ∫ p(x) dx     (integrals taken over [di, di+1))
The Lloyd-Max quantizer has non uniform decision regions. Since its
objective is to minimize the average distortion, it tends to allocate more
levels to those regions where the signal pdf is large.
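These two conditions suggest the usual iterative solution: alternately place
decision levels midway between reconstruction levels, then move each
reconstruction level to the centroid of its region. A sketch on empirical
samples (assuming NumPy; the initialization and iteration count are
pragmatic choices):

```python
import numpy as np

def lloyd_max(samples, N, iters=100):
    """Iterate the two Lloyd-Max conditions on empirical data."""
    r = np.quantile(samples, (np.arange(N) + 0.5) / N)   # initial levels
    for _ in range(iters):
        d = (r[:-1] + r[1:]) / 2.0            # d_i = (r_{i-1} + r_i) / 2
        idx = np.searchsorted(d, samples)     # assign samples to decision regions
        for i in range(N):                    # r_i = centroid of region i
            members = samples[idx == i]
            if members.size:
                r[i] = members.mean()
    return r

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)        # Gaussian source
print(np.round(lloyd_max(x, 4), 2))
# approx [-1.51 -0.45  0.45  1.51], the classic 4-level Gaussian result
```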
2. Entropy-Constrained Quantizers

In applications where entropy coding of the quantizer output levels is
allowed, the final bit rate is determined by the entropy of the quantizer
output levels rather than strictly by the number of levels, N. This leads to
output levels rather than strictly by the number of levels, N. This leads to
another criterion for quantizer design, known as entropy-constrained
quantization, where the quantization error is minimized subject to the
constraint that the entropy of the quantizer output levels has a prescribed
value. This means finding the quantizer that achieves the smallest MSE
among all scalar quantizers with the same output entropy (but not
necessarily the same output levels, N). The optimal entropy-constrained
quantizer can be found numerically by an iterative technique.

Vector Quantization

In VQ, an n-dimensional input vector X = [x1, x2, …, xn], whose
components represent discrete or continuous signal values, is mapped
(quantized) into one of N possible reconstruction vectors Yi, i = 1, 2, …, N.
The distortion in quantizing X with Yi is denoted by d(X, Yi) and is
defined according to the application. The most common distortion measure
is MSE, which corresponds to the square of the Euclidean distance between
the two vectors; that is,

dMSE(X, Y) = (1/n) ∑ (xi - yi)^2

The set {Yi} is sometimes referred to as the reconstruction codebook,
and its members are called code vectors or templates. The codebook design
problem is to find the optimal codebook (in the sense of minimizing the
average distortion) for a given input signal statistics, distortion measure,
and codebook size, N. Once the codebook has been determined, the
quantization process is straightforward and is based on the minimum
distortion rule: a given input vector is compared to all the entries in the
codebook and is quantized to the code vector that results in the smallest
distortion; that is, choose Yk such that d(X, Yk) ≤ d(X, Yj) for all
j = 1, …, N. With N possible code vectors, the output of the vector
quantizer can be specified with log2 N bits, and the resulting bit rate per
vector component is R = (log2 N)/n bits.
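The minimum-distortion rule is a nearest-neighbour search over the
codebook. A sketch (assuming NumPy; the codebook here is illustrative,
not a trained one):

```python
import numpy as np

def vq_encode(X, codebook):
    """Quantize each row of X to the index of its minimum-MSE code vector."""
    # Squared Euclidean distance from every input vector to every code vector.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.array([[0.1, 0.2], [0.9, 0.8]])
print(vq_encode(X, codebook))     # [0 3]
# Rate per vector component: R = log2(N)/n = log2(4)/2 = 1 bit.
```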

PACKING

Depending on the number of quantization levels, the quantized
coefficient matrix contains coefficients which occupy fewer than 8 bits.
Since the image is 8 bits per pixel and the coefficients after quantization
occupy less than this, we can get a compressed image if the bits which are
not significant are removed. This can be done by packing the coefficients.
This process involves bit operations such as AND, OR, right shift, and left
shift. If we are using an 8-level quantizer, then the coefficient values
occupy only three bits (i.e., 5 bits are unused). So if we remove these five
bits we can achieve a compression ratio of 8:3. For this we declare a
character string and place each coefficient one after the other, each
occupying only 3 bits (i.e., the first character of the string holds the first
coefficient in its most significant 3 bits, the second coefficient in the next
3 bits, and the most significant 2 bits of the third coefficient in its least
significant 2 bits; the last bit of the third coefficient is stored in the most
significant bit of the next character). Similarly, all the coefficients in the
array are packed into a character string.
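A sketch of the 3-bit packing described above (plain Python; it accumulates
bits in an integer and emits full bytes, flushing any partial byte at the end):

```python
def pack3(coeffs):
    """Pack 3-bit quantizer indices (values 0-7) into a byte string."""
    acc, nbits, out = 0, 0, bytearray()
    for c in coeffs:
        acc = (acc << 3) | (int(c) & 0x7)    # append the 3 significant bits
        nbits += 3
        while nbits >= 8:                    # emit full bytes as they fill up
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    if nbits:                                # flush the final partial byte
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

packed = pack3([5, 1, 7, 2, 0, 6, 3, 4])     # 8 coefficients -> 3 bytes
print(len(packed), packed.hex())             # 3 a7a19c: an 8:3 reduction
```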

The assignment of quantizer levels depends purely on the compression
required and the tolerable distortion. For high compression ratios we can
use only 2 bits per pixel, but distortion increases.

STANDARDS FOR CONTINUOUS-TONE, STILL IMAGES

A committee known as JPEG (for Joint Photographic Experts Group)
was formed under the joint auspices of ISO and CCITT at the end of
1986 for the purpose of developing an international standard for the
compression and decompression of continuous-tone, still-frame,
monochrome and color images. The goal of this committee was to define
a general-purpose standard for such diverse applications as photo-
videotex, desktop publishing, graphic arts, color facsimile,
photojournalism, medical systems, and many others. To meet the needs
of these different applications, the proposed JPEG standard consists of
three main components: (1) a baseline system that provides a simple and
efficient algorithm that is adequate for most image coding applications,
(2) a set of extended system features, such as progressive buildup, that
allows the baseline system to satisfy a broader range of applications, and
(3) an independent lossless method for applications requiring that type of
compression. The JPEG-proposed system became a draft international
standard in 1992. The JPEG baseline system has already gained broad
acceptance as a lossy compression technique, and several manufacturers
have introduced JPEG chips and basic DCT engines.

SELECTING A COMPRESSION TECHNIQUE

The selection of an appropriate compression technique for a specific
application can be a daunting task, due in part to the wide range of basic
approaches. Compounding the problem is the existence of many subtle
variations on the same theme (such as incorporating adaptivity in
different ways), resulting in seemingly similar algorithms that can
perform quite differently depending on the application. Similarly,
evaluating the claims of a novel technique can be difficult, since the true
advantages and disadvantages may not be immediately apparent.

In general, the usefulness of a particular algorithm is heavily
dependent on the application requirements. An obvious example is
whether lossless or lossy compression is needed. As another example, a
storage application might require a constant image quality and a simple
decoder structure (at the possible expense of a more complex encoder),
while a transmission application might strive for a fixed compression
ratio (at the expense of a varying image quality) and encoder/decoder
symmetry. The following is a list of factors, by no means exhaustive, that
can be consulted as a general guide in the process of evaluating and
selecting an appropriate technique.
1. Sensitivity to input image types. Input image characteristics such as
bit depth, resolution, noise, spatial frequency content, pixel-to-pixel
correlation, and other image statistics may all affect the performance
and thus the choice of an algorithm. Also, some compression schemes
may require parameter tuning to obtain good performance with a
given class of images, and performance can degrade significantly if
other types of input images are allowed.
2. Operational bit rate. In some applications the priority is to achieve a
very high degree of compression even at the cost of low image
quality. In contrast, other applications may require a high degree of
image quality that can only be achieved at modest compression ratios.
In general there is a certain range of output bit rates for which an
algorithm is most efficient. Furthermore, some algorithms cannot
inherently be operated below a certain bit rate. It is also desirable to
have the ability to optimally and easily trade off the bit rate for the
reconstructed image quality by adjusting a small set of compression
parameters (which usually control the degree of quantization).
However, some schemes can operate at only a few specific bit rates
(or even just one bit rate) and require a redesign of the system to
achieve other bit rates.
3. Constant bit rate versus constant quality. Algorithms that operate
with a constant bit rate are more suitable for transmission applications
where a fixed rate channel with no buffering is used or for storage
applications where the storage space is pre-specified. Unfortunately,
due to the wide variation in the information content of different
images, such schemes do not result in constant reconstruction quality.
The specified bit rate may be unnecessarily high for some images while
resulting in unsatisfactory quality for others. Schemes that maintain a
constant image quality or distortion measure (e.g., SNR) at the expense
of a variable bit rate are often the result of including entropy coding in
the scheme. The range over which the bit
rate may vary is an important hardware consideration since adequate
buffering must generally be provided.
4. Implementation issues. This refers to the nature and complexity of
the algorithm relative to the particular hardware or software
environment in which it is implemented. Three aspects of an
algorithm need to be considered: (1) computational complexity (i.e.,
the number of additions, multiplications, shifts, comparisons, or other
operations required per pixel), (2) memory requirements, and (3)
amenability to parallel processing structures. Typical implementation
environments with current technology include PC-based, DSP (digital
signal processor) chip-based, and ASIC (application specific
integrated circuit)-based systems. The characteristics of each type of
environment determine the suitability of an algorithm and the speed at
which it can operate in that environment.
5. Encoder/Decoder asymmetry. Some approaches to compression
result inherently in a complex encoder but a simple decoder; others
require an encoder and a decoder of comparable complexity. Although
encoders and decoders of equal complexity may be acceptable in
many transmission applications, a simple decoder is more desirable in
applications where it is used repeatedly, such as image storage and
retrieval systems. The inclusion of adaptivity can also alter this
balance significantly as well as increase the overall complexity of the
system.
6. Channel error tolerance. Unfortunately, one of the prices paid for
data compression is the increased susceptibility of encoded data to
channel errors, and the degree of susceptibility varies widely among
the different schemes. In the case of block processing algorithms, the
effect of a bit error is often confined to only a small block of the
image, while in other schemes it may appear as a streak across the
image. If variable-length coding is used, which is often the case in
more sophisticated schemes, the effect of channel errors can be
catastrophic and can result in the loss of the entire image. Of course,
error control coding can be added to any system, but the price to be
paid is an increase in the overall complexity and bit rate.
7. Artifacts. Different algorithms create different artifacts, depending on
their mode of operation. Even a given algorithm may exhibit artifacts,
depending on the bit rate at which it is operated. Some artifacts such
as blocking or edge jaggedness may be more visually objectionable
than random noise or overall edge smoothing. Also, the visibility of
artifacts is highly dependent on the particular image and the
conditions under which it is viewed.
8. Effect of multiple coding. Some applications may require that an
image undergo the compression-decompression cycle many times. For
example, an image may be compressed and transmitted to a
destination where it is decompressed and viewed. An operator may
alter a small portion of the image and then compress the entire image
and send it to another destination, where this process is repeated. In
such applications it is essential that the repeated compression and
decompression of the unaltered portions does not result in any
additional degradation beyond the first stage of compression. Some
compression schemes do not create further loss of image quality when
used successively on the same image, while others can cause additional
degradations. Fortunately, in most cases, subsequent losses are
significantly less than the losses caused by the first stage.
9. Progressive transmission capability. Progressive image transmission
allows for an approximate image to be sent at a low bit rate for quick
recognition, and then remaining details are transmitted incrementally
if desired by the user. Any scheme can artificially be made
progressive by using it to encode a low resolution version of the
original image as a first approximation, and then encoding the
difference between this stage and either the original image or another
intermediate stage, to create as many levels of progression as desired.
Some schemes are inherently amenable to progressive transmission,
due to the embedded structure of the encoder and can operate in that
mode without requiring any additional complexity. Other techniques
can also operate in a progressive mode but at the expense of additional
complexity.
10. System compatibility. If the system requires compatibility with
other manufacturers’ products, the choice of a compression scheme
may be indicated by the existence of standards that have been
proposed and/or adopted (e.g., CCITT facsimile standards or the
JPEG standard for continuous-tone images).
A Note on Performance Measures

Throughout this project report, numbers are given for two
measures of compression performance: compression ratio and peak
signal-to-noise ratio (PSNR). The results of both these performance
measures can be used to mislead the unwary reader, so it is important to
explain exactly how these figures are computed. We define compression
ratio as

compression ratio = (number of bits in the original image) /
                    (number of bits in the compressed image)

In this project we confine our measurements to 8 bits per pixel (bpp)
grayscale images, so the peak signal-to-noise ratio in decibels (dB) is
computed as

PSNR = 20 log10 (255 / RMSE)

where RMSE is the root mean squared error, defined as

RMSE = √[ (1/NM) ∑i ∑j ( f(i,j) - f^(i,j) )^2 ]

and N and M are the width and height, respectively, of the images in
pixels, f is the original image, and f^ is the reconstructed image. Note
that the original and reconstructed images must be the same size.
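Both measures are a few lines of code each (assuming NumPy; the 4x4
test image is made up for illustration):

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR in dB for 8-bit grayscale images of identical size."""
    f = original.astype(np.float64)
    g = reconstructed.astype(np.float64)
    rmse = np.sqrt(np.mean((f - g) ** 2))    # RMSE as defined above
    return 20.0 * np.log10(255.0 / rmse)

f = np.full((4, 4), 100, dtype=np.uint8)
g = f.copy()
g[0, 0] = 104                                # one pixel off by 4 -> RMSE = 1
print(round(psnr(f, g), 2))                  # 48.13 dB
```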
WAVELET TRANSFORM TECHNIQUE
The purpose of this section is to provide an intuitive understanding
of what wavelets are and why they are useful for signal compression.
One of the most commonly used approaches for analyzing a signal f(x) is
to represent it as a weighted sum of simple building blocks, called basis
functions:
f(x) = ∑i ci ψi(x)

where the ψi(x) are the basis functions and the ci are coefficients, or
weights. Since the basis functions ψi are fixed, it is the coefficients that
contain the information about the signal. The simplest such representation
uses translates of the impulse function as its only basis functions; it
reveals information about the signal’s time-domain behavior but nothing
about its frequency content. At the opposite extreme, a representation
built from sinusoids reveals information only about the signal’s
frequency-domain behavior.

For the purpose of signal compression, neither of the above
representations is ideal. What we would like to have is a representation
which contains information about both the time and frequency behavior
of the signal; more specifically, we want to know the frequency content
of the signal at a particular instant in time. However, resolution in time
(Δx) and resolution in frequency (Δω) cannot both be made arbitrarily
small at the same time, because their product is lower bounded by the
Heisenberg inequality,

Δω · Δx ≥ 0.5

This inequality means that we must trade off time resolution for
frequency resolution, or vice versa. Thus, it is possible to get very good
resolution in time if you are willing to settle for low resolution in
frequency, and you can get very good resolution in frequency if you are
willing to settle for low resolution in time.

The situation is really not all that bad from a compression standpoint.
By their very nature, low frequency events are spread out (or non-local)
in time and high frequency events are concentrated (or localized) in time.
Thus, one way that we can live within the confines of the Heisenberg
inequality and yet still get useful time-frequency information about a
signal is if we design our basis functions to act like cascaded octave band
pass filters, which repeatedly split the signal’s bandwidth in half.

To gain insight into designing a set of basis functions that will satisfy
both our desire for information and the Heisenberg inequality, let us
compare the impulse function and the sinusoids. The impulse function
cannot provide information about the frequency behavior of the signal
because its support (the interval over which it is non-zero) is
infinitesimally small. At the opposite extreme are the sinusoids, which
cannot provide information about the time behavior of the signal because
their support is infinite. Wavelets are a compromise between these two
extremes: a set of basis functions {ψi}, each with finite support of a
different width. The different support widths allow us to trade off time
and frequency resolution in different ways; for example, a wide basis
function can examine a large region of the signal and resolve low
frequency details accurately, while a short basis function can examine a
small region of the signal to resolve time details accurately.

To simplify things, let us constrain all the basis functions in
{ψi} to be scaled and translated versions of the same prototype function
ψ, known as the mother wavelet. The scaling is accomplished by
multiplying x by some scale factor; choosing the scale factor to be a
power of two yields the family ψ(2^v x - k), k, v ∈ Z.

Note that this really means that we are translating ψ in steps of size
2^-v k. Putting this all together gives us a wavelet decomposition of the
signal,

f(x) = ∑v ∑k cvk ψvk(x)

where

ψvk(x) = 2^(v/2) ψ(2^v x - k)

(The multiplication by 2^(v/2) is needed to make the basis orthonormal.)
The coefficients cvk are computed by the wavelet transform, which is just
the inner product of the signal f(x) with the basis functions ψvk(x).
Still Image Compression

A wide variety of wavelet-based image compression schemes
have been reported in the literature, ranging from simple entropy coding
to more complex techniques such as vector quantization, adaptive
transforms, tree encoding, and edge-based coding. All of these schemes
can be described in terms of the general framework depicted in fig. 1.
Compression is accomplished by applying a wavelet transform to
decorrelate the image data, quantizing the resulting transform
coefficients, and coding the quantized values. Image reconstruction is
accomplished by inverting the compression operations.

Implementing the Wavelet Transform

The forward and inverse wavelet transforms can each be efficiently
implemented in O(n) time by a pair of appropriately designed
Quadrature Mirror Filters (QMFs). Therefore, wavelet-based image
compression can be viewed as a form of subband coding. Each QMF pair
consists of a low-pass filter (H) and a high-pass filter (G), which split a
signal’s bandwidth in half. The impulse responses of H and G are mirror
images, and are related by

Gn = (-1)^(1-n) H(1-n)

The impulse responses of the forward and inverse transform QMFs,
denoted by (H, G) and (H', G') respectively, are related by

G'n = G(-n)
H'n = H(-n)

To illustrate how the wavelet transform is implemented,
Daubechies’s W6 wavelet is chosen, as it is well known and has some
nice properties. One such property is that it has three vanishing moments,
which means the transform coefficients will be zero for any signal that
can be described by a polynomial of degree 2 or less. The mother wavelet
basis for W6 is shown in fig. 2. The filter coefficients for W6 are

H0 = 0.33267055290
H1 = 0.806891509311
H2 =0.459877502118
H3=-0.135011020010
H4 =-0.085441273882
H5=0.035226291882

from which the coefficients for G, H', and G' can be derived using the
relations above. The impulse responses of H and G are shown in fig. 3.

A one-dimensional signal S can be filtered by convolving the filter
coefficients ck with the signal values:

       m-1
S'i =   ∑  ck S(i-k)
       k=0

where m is the number of coefficients, or taps, in the filter. The one-
dimensional forward wavelet transform of a signal S is performed by
convolving S with both H and G and down-sampling by 2. The
relationship of the H and G filter coefficients with the beginning of a
signal S is

h5 h4 h3 h2 h1 h0
               S0 S1 S2 S3 S4 S5 S6 …
g5 g4 g3 g2 g1 g0
Note that the G filter extends before the signal in time; if S is finite, the
H filter will extend beyond the end of the signal. A similar situation is
encountered with the inverse wavelet transform filters H' and G'. In an
implementation, one must make some choice about what values to pad
the extension with. A choice which works well in practice is to wrap the
signal about its endpoints, i.e.,

… Sn-1 Sn S0 S1 S2 S3 … Sn-2 Sn-1 Sn S0 S1 …

thereby creating a periodic extension of S.
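Putting the pieces together, here is a sketch (assuming NumPy) of one
level of the forward transform with the W6 filters and the periodic
(wrap-around) extension just described; the high-pass taps are derived
from Gn = (-1)^(1-n) H(1-n) and stored as an ordinary array:

```python
import numpy as np

# W6 low-pass coefficients from the list above.
H = np.array([0.33267055290, 0.806891509311, 0.459877502118,
              -0.135011020010, -0.085441273882, 0.035226291882])
# High-pass taps from G_n = (-1)^(1-n) * H_{1-n}, n = -4..1, stored 0..5.
G = np.array([(-1.0) ** (1 - n) * H[1 - n] for n in range(-4, 2)])

def analyze(s, h, g):
    """One forward level: convolve with H and G (wrapping the signal about
    its endpoints, i.e. a periodic extension) and down-sample by 2."""
    N = len(s)
    low = np.array([sum(h[k] * s[(i - k) % N] for k in range(len(h)))
                    for i in range(0, N, 2)])
    high = np.array([sum(g[k] * s[(i - k) % N] for k in range(len(g)))
                     for i in range(0, N, 2)])
    return low, high

s = np.arange(16, dtype=float)        # a locally polynomial test signal
low, high = analyze(s, H, G)
# Thanks to W6's vanishing moments, the detail coefficients are ~0 wherever
# the signal is a low-degree polynomial, except near the wrap-around.
print(np.round(high, 6))
```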

Fig. 4 illustrates a single 2-D forward wavelet transform of an
image, which is accomplished by two separate 1-D transforms. The
image f(x,y) is first filtered along the x dimension, resulting in a low-pass
image fL(x,y) and a high-pass image fH(x,y). Since the bandwidth of fL
and fH along the x dimension is now half that of f, we can safely down-
sample each of the filtered images in the x dimension by 2 without loss of
information. The down-sampling is accomplished by dropping every
other filtered value. Both fL and fH are then filtered along the y
dimension, resulting in four sub-images: fLL, fLH, fHL, and fHH. Once
again we can down-sample the sub-images by 2, this time along the y
dimension. As illustrated in fig. 4, the 2-D filtering decomposes an image
into an average signal (fLL) and three detail signals which are
directionally sensitive: fLH emphasizes the horizontal image features, fHL
the vertical features, and fHH the diagonal features. The directional
sensitivity of the detail signals is an artifact of the frequency ranges they
contain.
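A compact sketch of this row/column decomposition (assuming NumPy, an
even-sized image, and, for brevity, the 2-tap Haar pair instead of W6):

```python
import numpy as np

def analyze_2d(F):
    """One 2-D level: filter and down-sample the rows (x), then the columns (y)."""
    def split(x, axis):
        a = np.take(x, range(0, x.shape[axis], 2), axis=axis)
        b = np.take(x, range(1, x.shape[axis], 2), axis=axis)
        return (a + b) / np.sqrt(2.0), (a - b) / np.sqrt(2.0)   # low, high
    fL, fH = split(F, axis=1)          # filter along x, down-sample by 2
    fLL, fLH = split(fL, axis=0)       # then along y
    fHL, fHH = split(fH, axis=0)
    return fLL, fLH, fHL, fHH          # average + three detail sub-images

F = np.random.rand(8, 8)
print([b.shape for b in analyze_2d(F)])    # four (4, 4) sub-images
```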

It is customary in wavelet compression to recursively transform
the average signal. The number of transformations performed depends on
several factors, including the amount of compression desired, the size of
the original image, and the length of the QMF filters. In general, the
higher the desired compression ratio, the more times the transform is
performed.
[Figure: forward transform tree, recursively decomposing fLL]

[Figure: inverse transform tree]
After the forward wavelet transform is completed, we are left
