Information Theory & Source Coding


UCCN2043 Lecture Notes

2.0 Information Theory and Source Coding

Claude Shannon laid the foundation of information theory in 1948. His paper "A
Mathematical Theory of Communication", published in the Bell System Technical Journal, is
the basis for the telecommunications developments that have taken place during the last
five decades.
A good understanding of the concepts proposed by Shannon is a must for every budding
telecommunication professional. In this chapter, Shannon's contributions to the field of
modern communications will be studied.

2.1 Requirements of a Communication System

In any communication system, there is an information source that produces information
in some form, and an information sink that absorbs it. The communication medium
connects the source and the sink. The purpose of a communication system is to transmit the
information from the source to the sink without errors.
However, the communication medium always introduces some errors because of noise. The
fundamental requirement of a communication system is to transmit the information without
errors in spite of the noise.

2.1.1 The Communication System

The block diagram of a generic communication system is shown in Figure 2.1. The
information source produces symbols (such as English letters, speech, video, etc.) that are
sent through the transmission medium by the transmitter. The communication medium
introduces noise, and so errors are introduced in the transmitted data. At the receiving end,
the receiver decodes the data and gives it to the information sink.

Figure 2.1: Generic communication system

As an example, consider an information source that produces two symbols A and B. The
transmitter codes the data into a bit stream. For example, A can be coded as 1 and B as 0. The
stream of 1's and 0's is transmitted through the medium. Because of noise, 1 may become 0 or
0 may become 1 at random places, as illustrated below:
Symbols produced:    A B B A A     (an illustrative sequence)

Bit stream produced: 1 0 0 1 1

Bit stream received: 1 0 1 1 1     (the third bit has been flipped by noise)

At the receiver, one bit is received in error. How can we ensure that the received data is made
error free? Shannon provides the answer. The communication system given in Figure 2.1 can
be expanded, as shown in Figure 2.2.

Figure 2.2: Generic communication system as proposed by Shannon.

As proposed by Shannon, the communication system consists of a source encoder, channel
encoder and modulator at the transmitting end, and a demodulator, channel decoder and source
decoder at the receiving end.
In the block diagram shown in Figure 2.2, the information source produces the symbols that
are coded using two types of coding (source encoding and channel encoding) and then
modulated and sent over the medium.
At the receiving end, the modulated signal is demodulated, and the inverse operations of
channel encoding and source encoding (channel decoding and source decoding) are
performed. Then the information is presented to the information sink. Each block is explained
below.
Information source: The information source produces the symbols. If the information source
is, for example, a microphone, the signal is in analog form. If the source is a computer, the
signal is in digital form (a set of symbols).

Source encoder: The source encoder converts the signal produced by the information source
into a data stream. If the input signal is analog, it can be converted into digital form using an
analog-to-digital converter. If the input to the source encoder is a stream of symbols, it can be
converted into a stream of 1s and 0s using some type of coding mechanism. For instance, if
the source produces the symbols A and B, A can be coded as 1 and B as 0. Shannon's source
coding theorem tells us how to do this coding efficiently.
Source encoding is done to reduce the redundancy in the signal. Source coding techniques
can be divided into lossless encoding techniques and lossy encoding techniques. In lossless
coding, no information is lost. When we compress our computer files using a compression
utility (for instance, WinZip), there is no loss of information; such techniques are called
lossless coding techniques. In lossy coding, some information is lost while doing the
source coding. As long as the loss is not significant, we can tolerate it. When an image is
converted into JPEG format, the coding is lossy because some information is lost.
Most of the techniques used for voice, image, and video coding are lossy coding techniques.
Note: The compression utilities we use to compress data files use lossless encoding
techniques. JPEG image compression is a lossy technique because some information
is lost.

Channel encoder: If the information is to be decoded correctly even when errors are
introduced in the medium, we need to add some extra bits to the source-encoded data so
that this additional information can be used to detect and correct the errors. This process of
adding bits is done by the channel encoder. Shannon's channel coding theorem tells us how to
achieve this.
Modulation: Modulation is a process of transforming the signal so that the signal can be
transmitted through the medium. We will discuss the details of modulation in a later chapter.
Demodulator: The demodulator performs the inverse operation of the modulator.
Channel decoder: The channel decoder analyzes the received bit stream and detects and
corrects the errors, if any, using the additional data introduced by the channel encoder.
Source decoder: The source decoder converts the bit stream into the actual information. If
analog-to-digital conversion is done at the source encoder, digital-to-analog conversion is
done at the source decoder. If the symbols are coded into 1s and 0s at the source encoder, the
bit stream is converted back to the symbols by the source decoder.
Information sink: The information sink absorbs the information.
The block diagram given in Figure 2.2 is the most important diagram for all communication
engineers. We will devote separate chapters to each of the blocks in this diagram.

2.2 Entropy of an Information Source

What is information? How do we measure information? These are fundamental issues for
which Shannon provided the answers. We can say that we received some information if there
is "decrease in uncertainty."
Consider an information source that produces two symbols A and B. The source has sent A,
B, B, A, and now we are waiting for the next symbol. Which symbol will it produce? If it
produces A, the uncertainty that was there in the waiting period is gone, and we say that
"information" is produced. Note that we are using the term "information" from a
communication theory point of view; it has nothing to do with the "usefulness" of the
information.
Shannon proposed a formula to measure information. The information measure is called the
entropy of the source. If a source produces N symbols, and if all the symbols are equally
likely to occur, the entropy of the source is given by
H = log2 N   bits/symbol

For example, assume that a source produces the English letters (in this chapter, we will refer
to the English letters A to Z and space, totaling 27, as symbols), and all these symbols will be
produced with equal probability. In such a case, the entropy is
H = log2 27 = 4.75   bits/symbol

The information source may not produce all the symbols with equal probability. For instance,
in English the letter "E" has the highest frequency (and hence highest probability of
occurrence), and the other letters occur with different probabilities. In general, if a source
produces the ith symbol with probability P(i), the entropy of the source is given by

H = - Σ P(i) log2 P(i)   bits/symbol

where the summation runs over all the symbols (the minus sign makes H positive, since log2 P(i) is negative).

If a large text of English is analyzed and the probabilities of all symbols (or letters) are
obtained and substituted in the formula, then the entropy is
H = 4.07   bits/symbol

Note: Consider the following sentence: "I do not knw wheter this is undrstandble." In spite
of the fact that a number of letters are missing in this sentence, you can make out what
the sentence is. In other words, there is a lot of redundancy in the English text.

This is called the first-order approximation for calculation of the entropy of the information
source.

In English, there is a dependence of one letter on the previous letter. For instance, the letter
Q is almost always followed by the letter U. If we consider the probabilities of two symbols
together (aa, ab, ac, ad, ... ba, bb, and so on), then it is called the second-order approximation.
So, in second-order approximation, we have to consider the conditional probabilities of
digrams (or two symbols together). The second-order entropy of a source producing English
letters can be worked out to be
H = 3.36   bits/symbol

The third-order entropy of a source producing English letters can be worked out to be
H = 2.77   bits/symbol
(which means that, when the statistics of three-letter combinations are taken into account, each letter can on average be represented by 2.77 bits).

As you consider the higher orders, the entropy goes down.


As another example, consider a source that produces four symbols with probabilities of 1/2,
1/4, 1/8, and 1/8, and all symbols are independent of each other. The entropy of the source is
7/4 bits/symbol.
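
These entropy figures are easy to verify numerically. Below is a minimal Python sketch (the function name entropy is our own, not from the notes) that checks both the equiprobable case and the four-symbol example above.

```python
import math

def entropy(probs):
    """H = -sum(p * log2 p) in bits/symbol; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 27 equally likely symbols (A-Z plus space): H = log2 27
print(round(entropy([1 / 27] * 27), 2))        # 4.75 bits/symbol

# Four independent symbols with probabilities 1/2, 1/4, 1/8, 1/8
print(entropy([0.5, 0.25, 0.125, 0.125]))      # 1.75 bits/symbol (= 7/4)
```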

2.3 Channel Capacity

Shannon introduced the concept of channel capacity, the limit at which data can be
transmitted through a medium. The errors in the transmission medium depend on the energy
of the signal, the energy of the noise, and the bandwidth of the channel.
Conceptually, if the bandwidth is high, we can pump more data into the channel. If the signal
energy is high, the effect of noise is reduced. According to Shannon, the channel capacity, the
bandwidth of the channel, the signal power and the noise power are related by the formula
C = W log2 (1 + S/N)

where
C is the channel capacity in bits per second (bps)
W is the bandwidth of the channel in Hz
S/N is the signal-to-noise power ratio (SNR). SNR is generally measured in dB using the formula

S/N (dB) = 10 log10 [ Signal Power (W) / Noise Power (W) ]
The value of the channel capacity obtained using this formula is the theoretical maximum. As
an example, consider a voice-grade line for which W = 3100 Hz and SNR = 30 dB (i.e., the
signal-to-noise power ratio is 1000:1):

C = 3100 log2 (1 + 1000) ≈ 30,894 bps
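
Such capacity figures can be checked by evaluating the formula directly. The sketch below (the helper name channel_capacity is ours) converts the SNR from dB to a power ratio first; the figure of 30,894 bps quoted above corresponds to approximating 1 + S/N by 1000, while the exact value is closer to 30,898 bps.

```python
import math

def channel_capacity(bandwidth_hz, snr_db):
    """Shannon capacity C = W * log2(1 + S/N), with the SNR given in dB."""
    snr_ratio = 10 ** (snr_db / 10)          # 30 dB -> a power ratio of 1000
    return bandwidth_hz * math.log2(1 + snr_ratio)

print(round(channel_capacity(3100, 30)))     # about 30,898 bps for the voice-grade line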


So, we cannot transmit data at a rate faster than this value on a voice-grade line. An important
point to note is that in the above formula Shannon assumes only thermal noise. Can we
increase C by increasing W? Not indefinitely, because increasing W also increases the noise
power admitted by the channel, and so the SNR is reduced. Can we increase C by raising the
signal power to improve the SNR? Again, not indefinitely, because higher signal power
introduces additional noise, called inter-modulation noise. The entropy of the information
source and the channel capacity are two important concepts, based on which Shannon
proposed his theorems.

2.4 Shannon's Theorems

In a digital communication system, the aim of the designer is to convert any information into
a digital signal, pass it through the transmission medium and, at the receiving end, reproduce
the digital signal exactly. To achieve this objective, two important requirements are:
1. To code any type of information into digital format. Note that the world is analog:
voice signals are analog, images are analog. We need to devise mechanisms to
convert analog signals into digital format. If the source produces symbols (such as A,
B), we also need to convert these symbols into a bit stream. This coding has to be
done efficiently so that the smallest number of bits is required.
2. To ensure that the data sent over the channel is not corrupted. We cannot eliminate the
noise introduced on the channels, and hence we need to introduce special coding
techniques to overcome the effect of noise.
These two aspects have been addressed by Claude Shannon in his classical paper "A
Mathematical Theory of Communication" published in 1948 in Bell System Technical
Journal, which gave the foundation to information theory. Shannon addressed these two
aspects through his source coding theorem and channel coding theorem.
Shannon's source coding theorem addresses how the symbols produced by a source have to
be encoded efficiently. Shannon's channel coding theorem addresses how to encode the data
to overcome the effect of noise.

2.4.1 Source Coding Theorem

The source coding theorem states that "the number of bits required to uniquely describe an
information source can be approximated to the information content as closely as desired."
Again consider the source that produces the English letters. The information content, or
entropy, is 4.07 bits/symbol. According to Shannon's source coding theorem, the symbols can
be coded in such a way that, on average, only 4.07 bits are required per symbol. But what
should the coding technique be? Shannon does not tell us!
Shannon's theory only sets a limit on the minimum number of bits required. This is a very
important limit; communication engineers have striven to approach it throughout the past 50
years.

Consider a source that produces two symbols A and B with equal probability.

Symbol    Probability    Code Word
A         0.5            1
B         0.5            0

The two symbols can be coded as above: A is represented by 1 and B by 0. We require 1 bit/symbol.
Now consider a source that produces these same two symbols. But instead of coding A and B
directly, we can code the pairs AA, AB, BA, BB. The probabilities of these pairs and the
associated code words are shown here:

Symbol    Probability    Code Word
AA        0.45           0
AB        0.45           10
BA        0.05           110
BB        0.05           111

Here the strategy in assigning the code words is that the symbols with high probability are
given short code words and symbols with low probability are given long code words.
Note: Assigning short code words to high-probability symbols and long code words to low-probability symbols results in efficient coding.
In this case, the average number of bits required per symbol can be calculated using the
formula
L = Σ P(i) L(i)   bits/symbol

where
P(i) = probability of the ith code word
L(i) = length of the ith code word
For this example,
L = (1 * 0.45 + 2 * 0.45 + 3 * 0.05 + 3 * 0.05) = 1.65 bits/symbol.
The entropy of the source can be calculated to be 1.469 bits/symbol.
So, if the source produces the symbols in the following sequence:
AABABAABBB
then source coding gives the bit stream
0 110 110 10 111
This encoding scheme, on average, requires 1.65 bits/symbol. If we code the symbols
directly, without taking the probabilities into consideration, the coding scheme would be

AA    00
AB    01
BA    10
BB    11

Hence, we require 2 bits/symbol. The encoding mechanism that takes the probabilities into
consideration is the better coding technique. The theoretical limit on the number of bits/symbol
is the entropy, which is 1.469 bits/symbol. The entropy of the source also determines the
channel capacity required to transmit it.
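
The figures quoted above are easy to reproduce. The short Python sketch below (dictionary names are ours) computes the average code length and the entropy of this digram source, and re-creates the bit stream for the sequence AABABAABBB.

```python
import math

# Digram code from the table above (AA -> 0, AB -> 10, BA -> 110, BB -> 111)
code = {"AA": "0", "AB": "10", "BA": "110", "BB": "111"}
prob = {"AA": 0.45, "AB": 0.45, "BA": 0.05, "BB": 0.05}

# Average code length L = sum P(i) L(i) and entropy H = -sum P(i) log2 P(i)
L = sum(prob[s] * len(code[s]) for s in code)
H = -sum(p * math.log2(p) for p in prob.values())
print(round(L, 2), round(H, 3))                          # 1.65 and 1.469

# Encoding AABABAABBB digram by digram gives the bit stream shown earlier
print(" ".join(code[d] for d in ["AA", "BA", "BA", "AB", "BB"]))   # 0 110 110 10 111
```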
As we keep considering higher-order entropies, we can reduce the bits/symbol further and
perhaps approach the limit set by Shannon.
Based on this theory, it is estimated that English text cannot be compressed to less than about
1.5 bits/symbol, even with sophisticated coders and decoders.
This theorem provides the basis for coding information (text, voice, video) into the minimum
possible bits for transmission over a channel. More details of source coding will be covered
in section 2.5.
The source coding theorem states "the number of bits required to uniquely describe an
information source can be approximated to the information content as closely as desired."

2.4.2 Channel Coding Theorem

Shannon's channel coding theorem states that "the error rate of data transmitted over a
bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the
information rate is lower than the channel capacity."
This theorem is the basis for the error-correcting codes with which we can achieve practically
error-free transmission. Again, Shannon only showed that, using good coding mechanisms, we
can achieve error-free transmission; he did not specify what the coding mechanism should be!
According to Shannon, channel coding may introduce additional delay in transmission but,
using appropriate coding techniques, we can overcome the effect of channel noise.

Consider the example of a source producing the symbols A and B. A is coded as 1 and B as 0.
Symbols Produced:  A B B A B

Bit Stream:        1 0 0 1 0

Now, instead of transmitting this bit stream directly, we can transmit the bit stream
111 000 000 111 000
that is, we repeat each bit three times. Now, let us assume that the received bit stream is
101 000 010 111 000

Two errors are introduced in the channel. Still, we can decode the data correctly at the
receiver: since the receiver knows that each bit is transmitted three times, it can conclude, by
taking a majority vote within each group of three, that the second bit should be 1 and the
eighth bit should be 0. This is error correction. This coding is called a rate 1/3 error-correcting
code. Such codes that can correct errors are called Forward Error Correcting (FEC) codes.
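
The majority-vote decoding just described takes only a few lines of Python (the function names are ours); it recovers the original bit stream despite the two channel errors.

```python
def encode_rate_third(bits):
    """Rate 1/3 repetition code: send every bit three times."""
    return "".join(b * 3 for b in bits)

def decode_rate_third(received):
    """Majority vote over each group of three received bits."""
    groups = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if g.count("1") >= 2 else "0" for g in groups)

print(encode_rate_third("10010"))              # 111000000111000
print(decode_rate_third("101000010111000"))    # 10010 -- both errors corrected
```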
Ever since Shannon published his historic paper, there has been a tremendous amount of
research into error-correcting codes. We will discuss error detection and correction in
Lesson 5, "Error Detection and Correction".
Over the past 50 years, communication engineers have worked hard to approach the theoretical
limits set by Shannon, and they have made considerable progress. Take the case of the line
modems used for transmission of data over telephone lines. The evolution from V.26 modems
(2400 bps data rate, 1200 Hz bandwidth) through V.27 modems (4800 bps, 1600 Hz) and V.32
modems (9600 bps, 2400 Hz) to V.34 modems (28,800 bps, 3400 Hz) indicates the progress in
source coding and channel coding techniques built on Shannon's theory.
Shannon's channel coding theorem states that "the error rate of data transmitted over a
bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the
information rate is lower than the channel capacity."

Note: Source coding is used mainly to reduce the redundancy in the signal, whereas channel
coding is used to introduce redundancy to overcome the effect of noise.

2.5 Source Coding (Ian Glover 9.1)

There may be a number of reasons for wishing to change the form of a digital signal as
supplied by an information source prior to transmission. In the case of English language text,
for example, we start with a data source consisting of about 40 distinct symbols (the letters of
the alphabet, integers and punctuation). In principle, we could transmit such text using a
signal alphabet consisting of 40 distinct voltage waveforms. This would constitute an M-ary
system where M = 40 unique signals. It may be, however, that for one or more of the
following reasons this approach is inconvenient, difficult or impossible:

-  The transmission channel may be physically unsuited to carrying such a large number
   of distinct signals.
-  The relative frequencies (chances of occurrence) with which different source symbols
   occur will vary widely. This will have the effect of making the transmission
   inefficient in terms of the time it takes and/or the bandwidth it requires.
-  The data may need to be stored and/or processed in some way before transmission.
   This is most easily achieved using binary electronic devices as the storage and
   processing elements.

For all these reasons, sources of digital information are almost always converted as soon as
possible into binary form, i.e. each symbol is encoded as a binary word.
After appropriate processing, the binary words may then be transmitted directly, as either:
-  baseband signals, or
-  bandpass signals (after going through a modulation process),
or re-coded into another multi-symbol alphabet (it is unlikely that the transmitted symbols
map directly onto the original source symbols).

2.5.1 Variable Length Source Coding (Ian Glover 9.5)

We are generally interested in finding a more efficient code which represents the same
information using fewer digits on average. This results in codewords of different lengths being
used for different symbols. The problem with such variable length codes is recognizing the
start and end of each codeword.

2.5.2 Decoding Variable Length Codewords (Ian Glover 9.5.2)

The following properties need to be considered when attempting to decode variable length
codewords:


(A) Unique Decoding

This is essential if the received message is to have only a single possible meaning. Consider
an M = 4 symbol alphabet with symbols represented by binary digits as follows:
A = 0
B = 01
C = 11
D = 00
If we receive the sequence 0011, it is not known whether the transmission was D, C (00 11)
or A, A, C (0 0 11). This code is not, therefore, uniquely decodable.
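
One way to see the ambiguity is to enumerate every possible parse of the received bits, as in the sketch below (the parses helper is our own illustration, not part of the notes).

```python
def parses(bits, codebook, prefix=()):
    """Recursively list every way 'bits' can be split into codewords from 'codebook'."""
    if not bits:
        return [prefix]
    results = []
    for symbol, word in codebook.items():
        if bits.startswith(word):
            results.extend(parses(bits[len(word):], codebook, prefix + (symbol,)))
    return results

ambiguous = {"A": "0", "B": "01", "C": "11", "D": "00"}
print(parses("0011", ambiguous))    # [('A', 'A', 'C'), ('D', 'C')] -- two valid readings
```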

(B) Instantaneous Decoding

Consider now an M = 4 symbol alphabet with the following binary representation:
A = 0
B = 10
C = 110
D = 111
This code can be instantaneously decoded using the decision tree shown in Figure 2.3 below,
since no complete codeword is a prefix of a longer codeword.

Figure 2.3: Algorithm for decision tree decoding and example of practical code tree.


This is in contrast to the previous example, where A (= 0) is a prefix of both B (= 01) and
D (= 00). The instantaneous code above is also a "comma code", as the symbol zero indicates
the end of a codeword, except for the all-ones word whose length is known. Note that, to
ensure we achieve the desired decoding properties, we are restricted in the number of
codewords available with small numbers of bits.
Using the representation:
A=0
B = 01
C = 011
D = 111
the code is identical to the example just given but the bits are time reversed. It is thus still
uniquely decodable but no longer instantaneous, since early codewords are now prefixes of
later ones.
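
Returning to the instantaneous code A = 0, B = 10, C = 110, D = 111: a decoder can work bit by bit, emitting a symbol the moment a complete codeword has been seen. A minimal Python sketch (function and variable names are ours):

```python
def decode_prefix(bits, codebook):
    """Decode a bit string symbol by symbol; valid for any prefix-free (instantaneous) code."""
    inverse = {word: symbol for symbol, word in codebook.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:              # a complete codeword has just been received
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

prefix_code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode_prefix("0101100111", prefix_code))   # ABCAD
```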

2.5.3 Variable Length Coding (Ian Glover 9.6)

Assume an M = 8 symbol source A, ..., H having the following probabilities of symbol
occurrence:

m       A      B      C      D      E      F      G      H
P(m)    0.1    0.18   0.4    0.05   0.06   0.1    0.07   0.04

The source entropy is given by:

H = - Σ P(m) log2 P(m) = 2.55   bits/symbol

If the symbols are each allocated 3 bits, comprising all the binary patterns between 000 and
111, the code length equals the maximum entropy of an eight-symbol source, log2 8 = 3
bits/symbol, and the source efficiency is therefore given by:

η_source = (2.55 / 3) × 100% = 85%
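
Both numbers can be checked with a few lines of Python (the dictionary p is simply our own naming of the table above):

```python
import math

p = {"A": 0.1, "B": 0.18, "C": 0.4, "D": 0.05, "E": 0.06, "F": 0.1, "G": 0.07, "H": 0.04}

H = -sum(prob * math.log2(prob) for prob in p.values())     # source entropy
fixed_length = math.log2(len(p))                            # 3 bits for a fixed-length code
print(round(H, 2), round(100 * H / fixed_length, 1))        # about 2.55 bits/symbol and 85 %
```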

Shannon-Fano coding, in which we allocate the regularly used or highly probable messages
fewer bits, as these are transmitted more often, is more efficient. The less probable messages
can then be given the longer, less efficient bit patterns. This yields an improvement in
efficiency compared with that obtained before source coding was applied.
The improvement is not as great, however, as that obtainable with another variable length
coding scheme, namely Huffman coding.

Huffman Coding
The Huffman coding algorithm comprises two steps: reduction and splitting. These steps can
be summarised by the following instructions:
(A) Reduction (referring to Figure 2.4)

1) List the symbols in descending order of probability.
2) Reduce the two least probable symbols to one symbol with probability equal to
   their combined probability.
3) Reorder in descending order of probability at each stage.
4) Repeat the reduction step until only two symbols remain.

(B) Splitting (referring to Figure 2.5)

1) Assign 0 and 1 to the two final symbols and work backwards.
2) Expand or lengthen the code to cope with each successive split and, at each stage,
   distinguish between the two split symbols by adding another 0 and 1 respectively
   to the codeword.
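
This reduction-and-splitting procedure is equivalent to the standard Huffman construction sketched below in Python (our own illustration, not code from the notes). The particular 0/1 labels, and hence the exact codewords, depend on arbitrary tie-breaking choices, but the codeword lengths and the average length of 2.61 bits/symbol agree with the result worked out next.

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code and return {symbol: codeword}; ties are broken arbitrarily."""
    # Each heap entry is (probability, tie_breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)            # the two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

p = {"A": 0.1, "B": 0.18, "C": 0.4, "D": 0.05, "E": 0.06, "F": 0.1, "G": 0.07, "H": 0.04}
code = huffman_code(p)
print(code)
print(round(sum(p[s] * len(code[s]) for s in p), 2))   # average length: 2.61 bits/symbol
```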

The result of Huffman encoding (Figures 2.4 and 2.5) of the symbols A, ..., H in the
previous example is to allocate the symbols (listed here in descending order of probability)
the following codewords:

m       P(m)    Codeword
C       0.40    1
B       0.18    001
A       0.10    011
F       0.10    0000
G       0.07    0100
E       0.06    0101
D       0.05    00010
H       0.04    00011

The average code length is now given as:

L = 1(0.4) + 3(0.18 + 0.10) + 4(0.10 + 0.07 + 0.06) + 5(0.05 + 0.04) = 2.61 bits/symbol

and the code efficiency is:

η_code = (H / L) × 100% = (2.55 / 2.61) × 100% = 97.7%

Note that the Huffman codes are formulated to minimise the average codeword length. They
do not necessarily possess error detection properties but are uniquely, and instantaneously,
decodable, as defined in section 2.5.2.


Figure 2.4: Huffman coding of an eight-symbol alphabet (reduction step).

Figure 2.5: Huffman coding of an eight-symbol alphabet (allocation of the codewords in the splitting step).
