Huffman Coding - Base of JPEG Image Compression

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 13

Huffman Coding –

Base of JPEG Image


Compression
Huffman coding can be used to
compress all sorts of data. It is an
entropy-based algorithm that relies
on an analysis of the frequency of
symbols in an array.

Huffman coding can be


demonstrated most vividly by
compressing a raster image.
Suppose we have a 5×5 raster
image with 8-bit color, i.e. 256
different colors. The uncompressed
image will take 5 x 5 x 8 = 200 bits
of storage.
First, we count up how many times
each color occurs in the image.
Then we sort the colors in order of
decreasing frequency. We
end up with a row that looks like

this:

Now we put the colors together by


building a tree such that the colors
farthest from the root are the least
frequent. The colors are joined in

pairs, with a node forming the


connection. A node can connect
either to another node or to a color.
In our example, the tree might look
like this:

Our result is known as a Huffman


tree. It
can be used for encoding and
decoding. Each color is encoded as
follows. We create codes by moving
from the root of the tree to each

color. If we turn right at a node, we


write a 1, and if we turn left – 0. This
process yields a Huffman code table
in which each symbol is assigned a
bit code such that the most
frequently occurring symbol has the
shortest code, while the least
common symbol is given the
longest code.
The Huffman tree and code table
we created are not the only ones
possible. An alternative Huffman
tree that looks like
this could be created for our image:

The corresponding code table


would then be:

Using the variant is preferable in our


example. This is because it provides
better compression for our specific
image.
Because each color has a unique
bit code that is not a prefix of any
other, the colors can be replaced by
their bit codes in the image file. The
most frequently occurring color,
white, will be represented with just a
single bit rather than 8 bits. Black
will take two bits. Red and blue will
take three. After these replacements
are made, the 200-bit image will be
compressed to 14 x 1 + 6 x 2 + 3 x
3 + 2 x 3 = 41 bits, which is about 5
bytes compared to 25 bytes in the
original image.

Of course, to decode the image the


compressed file must include the
code table, which takes up some
space. Each bit code derived from
the Huffman tree unambiguously
identifies a color, so the
compression loses no information.
This compression technique is used
broadly to encode music, images,
and certain communication
protocols. Typically, a variation of
the algorithm is used for improved
efficiency. The method described is
generally part of general
compression algorithms such as
Flate-ZIP for images or FLAC for
music.

Lossless JPEG compression uses


the Huffman algorithm in its pure
form. Lossless JPEG is common in
medicine as part of the DICOM
standard, which is supported by the
major medical equipment
manufacturers (for use in ultrasound
machines, nuclear resonance
imaging machines, MRI machines,
and electron microscopes).
Variations of the Lossless JPEG
algorithm are also used in the RAW
format, which is popular among
photo enthusiasts because it saves
data from a camera’s image sensor
without losing
information.

You might also like