Image Compression Using The Discrete Cosine Transform: Andrew B. Watson, NASA Ames Research Center
The discrete cosine transform (DCT) is a technique for converting a signal into elementary
frequency components. It is widely used in image compression. Here we develop some simple
functions to compute the DCT and to compress images. These functions illustrate the power of
Mathematica in the prototyping of image processing algorithms.
The rapid growth of digital imaging applications, including desktop publishing, multimedia, teleconferencing, and high-definition television (HDTV), has increased the need for effective and standardized image compression techniques. Among the emerging standards are JPEG, for compression of still images [Wallace 1991]; MPEG, for compression of motion video [Puri 1992]; and CCITT H.261 (also known as Px64), for compression of video telephony and teleconferencing.

All three of these standards employ a basic technique known as the discrete cosine transform (DCT). Developed by Ahmed, Natarajan, and Rao [1974], the DCT is a close relative of the discrete Fourier transform (DFT). Its application to image compression was pioneered by Chen and Pratt [1984]. In this article, I will develop some simple functions to compute the DCT and show how it is used for image compression. We have used these functions in our laboratory to explore methods of optimizing image compression for the human viewer, using information about the human visual system [Watson 1993]. The goal of this paper is to illustrate the use of Mathematica in image processing and to provide the reader with the basic tools for further exploration of this subject.

The One-Dimensional Discrete Cosine Transform

The discrete cosine transform of a list of n real numbers s(x), x = 0, ..., n - 1, is the list of length n given by:

$$S(u) = \sqrt{\frac{2}{n}}\, C(u) \sum_{x=0}^{n-1} s(x) \cos\frac{(2x+1)u\pi}{2n}, \qquad u = 0, \ldots, n-1$$

where

$$C(u) = \begin{cases} 2^{-1/2} & u = 0 \\ 1 & \text{otherwise} \end{cases}$$

Each element of the transformed list S(u) is the inner (dot) product of the input list s(x) and a basis vector. The constant factors are chosen so that the basis vectors are orthogonal and normalized. The eight basis vectors for n = 8 are shown in Figure 1. The DCT can be written as the product of a vector (the input list) and the n × n orthogonal matrix whose rows are the basis vectors. This matrix, for n = 8, can be computed as follows:

In[1]:= DCTMatrix =
          Table[ If[ k==0,
              Sqrt[1/8],
              Sqrt[2/8] Cos[Pi (2j + 1) k/16] ],
            {k, 0, 7}, {j, 0, 7}] // N;

We can check that the matrix is orthogonal:

In[2]:= DCTMatrix . Transpose[DCTMatrix] // Chop // MatrixForm
Out[2]//MatrixForm=
1. 0 0 0 0 0 0 0
0 1. 0 0 0 0 0 0
0 0 1. 0 0 0 0 0
0 0 0 1. 0 0 0 0
0 0 0 0 1. 0 0 0
0 0 0 0 0 1. 0 0
0 0 0 0 0 0 1. 0
0 0 0 0 0 0 0 1.
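For readers who want to experiment outside Mathematica, the construction of DCTMatrix in In[1] and the orthogonality check in In[2] can be sketched in plain Python using only the standard library. The helper names (dct_matrix, matmul, transpose, is_identity) are our own, chosen for this sketch, and the tolerance of 10^-10 plays the role of Chop.

```python
import math

def dct_matrix(n=8):
    """Return the n x n orthonormal DCT matrix; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        # sqrt(2/n) * C(k), with C(0) = 2**-0.5, gives sqrt(1/n) for the first row.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def is_identity(m, tol=1e-10):
    """Like Chop: entries within tol of 0 (or 1 on the diagonal) count as exact."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))

D = dct_matrix(8)
print(is_identity(matmul(D, transpose(D))))  # True
```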
The list s(x) can be recovered from its transform S(u) by applying the inverse cosine transform (IDCT):

$$s(x) = \sqrt{\frac{2}{n}} \sum_{u=0}^{n-1} C(u)\, S(u) \cos\frac{(2x+1)u\pi}{2n}, \qquad x = 0, \ldots, n-1$$

where C(u) is as defined above. This equation expresses s as a linear combination of the basis vectors. The coefficients are the elements of the transform S, which may be regarded as reflecting the amount of each frequency present in the input s.

We generate a list of random numbers to serve as a test input:

In[4]:= input1 = Table[Random[Real, {-1, 1}], {8}]
Out[4]= {0.142689, 0.539381, -0.964253, -0.70434, -0.98625,
          0.789134, -0.368739, -0.175656}

The DCT is computed by matrix multiplication:

In[5]:= output1 = DCTMatrix . input1;

In[9]:= DCT[input1]
Out[9]= {-0.610952, 0.0740846, 0.83188, 0.825302, -0.607786,
          -0.410739, 0.157452, -1.0884}

In[10]:= % - output1 // Chop
Out[10]= {0, 0, 0, 0, 0, 0, 0, 0}

The inverse DCT can be computed by multiplication with the inverse of the DCT matrix. We illustrate this with our previous example:

In[11]:= Inverse[DCTMatrix] . output1
Out[11]= {0.142689, 0.539381, -0.964253, -0.70434, -0.98625,
          0.789134, -0.368739, -0.175656}

Since DCTMatrix is orthogonal, its inverse is simply its transpose, so Transpose[DCTMatrix] . output1 gives the same result.
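A minimal Python sketch of the two defining sums, using only the standard library. The names C, dct and idct are hypothetical, and the test input comes from its own random seed rather than the values shown in Out[4]. Applying the inverse formula to the transform recovers the input to within rounding error, which is the same check as In[11].

```python
import math
import random

def C(u):
    # Normalization factor from the definition: 2**-0.5 for u == 0, else 1.
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct(s):
    """1D DCT computed term by term from the defining sum."""
    n = len(s)
    return [math.sqrt(2 / n) * C(u) *
            sum(s[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
            for u in range(n)]

def idct(S):
    """Inverse DCT: rebuild s as a linear combination of the basis vectors."""
    n = len(S)
    return [math.sqrt(2 / n) *
            sum(C(u) * S[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for x in range(n)]

random.seed(1)  # fixed seed so the example is reproducible
input1 = [random.uniform(-1, 1) for _ in range(8)]
output1 = dct(input1)
roundtrip = idct(output1)
print(max(abs(a - b) for a, b in zip(input1, roundtrip)) < 1e-10)  # True
```

Evaluating the sums directly costs O(n²) operations per list, the same as the matrix product; idct is equivalent to multiplying by the transpose of the DCT matrix.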
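The DCT extends naturally to two dimensions. The 2D DCT of an n × m array s(x, y) is:

$$S(u,v) = \frac{2}{\sqrt{nm}}\, C(u)\, C(v) \sum_{y=0}^{m-1} \sum_{x=0}^{n-1} s(x,y) \cos\frac{(2x+1)u\pi}{2n} \cos\frac{(2y+1)v\pi}{2m}, \qquad u = 0, \ldots, n-1; \quad v = 0, \ldots, m-1$$

Because the cosine kernel is a product of two one-dimensional kernels, the 2D DCT is separable: it can be computed by applying the 1D DCT to each row of the array, transposing, applying the 1D DCT to each row again, and transposing back: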
In[22]:= DCT[array_?MatrixQ] :=
Transpose[DCT /@ Transpose[DCT /@ array] ]
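The equivalence between the double sum and the row, transpose, row, transpose pattern of In[22] can be checked numerically. This Python sketch uses only the standard library; all function names are hypothetical. It uses a non-square 4 × 6 array so that n and m play distinct roles.

```python
import math
import random

def C(u):
    # 2**-0.5 for the zero-frequency term, 1 otherwise.
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct(s):
    """1D DCT of a list of any length n, from the defining sum."""
    n = len(s)
    return [math.sqrt(2 / n) * C(u) *
            sum(s[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n))
            for u in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2_direct(s):
    """2D DCT straight from the double sum; s[x][y] with 0 <= x < n, 0 <= y < m."""
    n, m = len(s), len(s[0])
    return [[2 / math.sqrt(n * m) * C(u) * C(v) *
             sum(s[x][y]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * m))
                 for x in range(n) for y in range(m))
             for v in range(m)]
            for u in range(n)]

def dct2_separable(s):
    """Transform rows, transpose, transform rows again, transpose back."""
    return transpose([dct(row) for row in transpose([dct(row) for row in s])])

random.seed(2)
s = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(4)]  # n = 4, m = 6
a, b = dct2_direct(s), dct2_separable(s)
print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) < 1e-10)  # True
```

The direct double sum needs O(n²m²) operations, while the separable form needs only O(nm(n + m)), which is why the row-and-column approach is preferred.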
In[24]:= IDCT[array_?MatrixQ] :=
           Transpose[IDCT /@ Transpose[IDCT /@ array] ]

As an example, we invert the transform of the letter A:

2D Blocked DCT

To this point, we have defined functions to compute the DCT of a list of length n = 8 and the 2D DCT of an 8 × 8 array. We have restricted our attention to this case partly for simplicity of exposition, and partly because, when it is used for image compression, the DCT is typically restricted to this size. Rather than transforming the image as a whole, the DCT is applied separately to 8 × 8 blocks of the image. We call this a blocked DCT.
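A minimal Python sketch of a blocked DCT, assuming the image dimensions are multiples of 8 (the helpers dct8, idct8, blocked_1d and blocked_2d are hypothetical names). Each row is cut into pieces of length 8 that are transformed independently and rejoined; the result is transposed and the process repeated. For an orthonormal 8 × 8 DCT, the DC coefficient of each block equals the block's pixel sum divided by 8, which makes a convenient sanity check.

```python
import math
import random

N = 8  # block size

def C(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct8(s):
    """1D DCT of a list of exactly 8 numbers (direct formula)."""
    return [math.sqrt(2 / N) * C(u) *
            sum(s[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N))
            for u in range(N)]

def idct8(S):
    """Inverse of dct8."""
    return [math.sqrt(2 / N) *
            sum(C(u) * S[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for u in range(N))
            for x in range(N)]

def blocked_1d(f, row):
    """Partition a row into pieces of length 8, transform each, and rejoin."""
    out = []
    for i in range(0, len(row), N):
        out.extend(f(row[i:i + N]))
    return out

def transpose(a):
    return [list(col) for col in zip(*a)]

def blocked_2d(f, image):
    """Transform every row, transpose, transform every row again, transpose back."""
    step1 = [blocked_1d(f, row) for row in image]
    step2 = [blocked_1d(f, row) for row in transpose(step1)]
    return transpose(step2)

random.seed(0)
image = [[random.randint(0, 255) for _ in range(16)] for _ in range(16)]
coeffs = blocked_2d(dct8, image)
restored = blocked_2d(idct8, coeffs)
err = max(abs(a - b) for r1, r2 in zip(image, restored) for a, b in zip(r1, r2))
print(err < 1e-9)  # True
```

Unlike Mathematica's Partition, which drops an incomplete final sublist, this sketch raises an IndexError if a row length is not a multiple of 8.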
To compute a blocked DCT, we do not actually have to divide the image into blocks. Since the 2D DCT is separable, we can partition each row into lists of length 8, apply the DCT to them, rejoin the resulting lists, and then transpose the whole image and repeat the process:

In[28]:= DCT[list_?(Length[#]>8&)] :=
           Join @@ (DCT /@ Partition[list, 8])

In[29]:= ?DCT
Global`DCT

DCT[(array_)?MatrixQ] :=
  Transpose[DCT /@ Transpose[DCT /@ array]]

DCT[(list_)?(Length[#1] > 8 & )] :=
  Apply[Join, DCT /@ Partition[list, 8]]

DCT[list_] :=
  Re[DCTTwiddleFactors*InverseFourier[N[list[[{1, 3, 5,
      7, 8, 6, 4, 2}]]]]]

When evaluating DCT on a 16 × 16 image, Mathematica begins by checking the first rule. It recognizes that the input is a matrix, and thus invokes the rule and applies DCT to each row. When DCT is applied to a row of length 16, the second rule comes into play. The row is partitioned into two lists of length 8, and DCT is applied to each. These applications invoke the last rule, which simply computes the 1D DCT of the lists of length 8. The two sub-rows are then rejoined by the second rule. After each row has been transformed in this way, the entire matrix is transposed by the first rule. The process of partitioning, transforming, and rejoining each row is then repeated, and the resulting matrix is transposed again.

For a test image, we provide a small 64 × 64 picture of a space shuttle launch. We use the utility function ReadImageRaw, defined in the package GraphicsImage.m, to read a matrix of graylevels from a file:

In[30]:= shuttle = ReadImageRaw["shuttle", {64, 64}];

In[31]:= ShowImage[shuttle]

Applying the DCT to this image gives an image consisting of 64 blocks, each a DCT of 8 × 8 pixels:

In[32]:= ShowImage[DCT[shuttle], {-300, 300}]

The lattice of bright dots is formed by the DC coefficients from each of the DCT blocks. To reduce the dominance of these terms, we display the image with a clipped graylevel range. Note also the greater activity in the lower left compared to the upper right, which corresponds mainly to uniform sky.

The inverse DCT of a list of length greater than 8 is defined in the same way as the forward transform:

In[33]:= IDCT[list_?(Length[#]>8&)] :=
           Join @@ (IDCT /@ Partition[list, 8])

Here is a simple test:

In[34]:= shuttle - IDCT[DCT[shuttle]] // Chop // Abs // Max
Out[34]= 0

Quantization

DCT-based image compression relies on two techniques to reduce the data required to represent the image. The first is quantization of the image's DCT coefficients; the second is entropy coding of the quantized coefficients. Quantization is the process of reducing the number of possible values of a quantity, thereby reducing the number of bits needed to represent it. Entropy coding is a technique for representing the quantized data as compactly as possible. We will develop functions to quantize images and to calculate the level of compression provided by different degrees of quantization. We will not implement the entropy coding required to create a compressed image file.
A simple example of quantization is the rounding of reals into integers. To represent a real number between 0 and 7 to some specified precision takes many bits. Rounding the number to the nearest integer gives a quantity that can be represented by just three bits.
In[37]:= w = 1/4;
In[38]:= Round[x/w]
Out[38]= 23
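A minimal Python sketch of uniform quantization with step size w, assuming (as In[37] and In[38] suggest) that the quantized value is the nearest integer to x/w and that reconstruction multiplies back by w. The value of x used in In[38] is not shown in the text, so the example uses its own hypothetical value; quantize and dequantize are likewise hypothetical names. Python's round, like Mathematica's Round, sends exact halves to the nearest even integer.

```python
import math

def quantize(x, w):
    """Index of the nearest multiple of the step size w (ties go to even, as in Round)."""
    return round(x / w)

def dequantize(q, w):
    """Approximate reconstruction of x from its quantization index."""
    return q * w

w = 1 / 4
x = 5.8  # hypothetical value; the x used in In[38] is not shown in the text
q = quantize(x, w)
print(q)                                   # 23
print(dequantize(q, w))                    # 5.75
print(abs(x - dequantize(q, w)) <= w / 2)  # True: error is at most half a step

# Rounding reals in [0, 7] to integers (w = 1) leaves 8 possible values,
# which need ceil(log2(8)) = 3 bits.
print(math.ceil(math.log2(8)))             # 3
```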
In[56]:= Entropy[Characters["mississippi"]]
Out[56]= 1.82307
In[57]:= Entropy[Characters["california"]]
Out[57]= 2.92193
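The definition of Entropy used here is not included in this excerpt; note that current versions of Mathematica have a built-in Entropy that uses natural logarithms by default. A minimal Python sketch, assuming Entropy means the first-order entropy in bits per symbol, -Σ p log2 p over the relative frequencies p of the distinct symbols, reproduces both outputs above (entropy is a hypothetical name):

```python
import math
from collections import Counter

def entropy(symbols):
    """First-order entropy in bits per symbol: -sum(p * log2(p))."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(round(entropy("mississippi"), 5))  # 1.82307
print(round(entropy("california"), 5))   # 2.92193
```

"mississippi" uses only four distinct letters, three of them repeated, so it needs fewer bits per symbol than "california", where eight distinct letters appear in ten characters.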
Andrew B. Watson
NASA Ames Research Center