Information Theory & Source Coding
2.0 Introduction
Claude Shannon laid the foundation of information theory in 1948. His paper "A
Mathematical Theory of Communication", published in the Bell System Technical Journal, is
the basis for all the developments in telecommunications that have taken place during the last
five decades.
A good understanding of the concepts proposed by Shannon is a must for every budding
telecommunication professional. In this chapter, Shannon's contributions to the field of
modern communications will be studied.
2.1 Communication Systems
In any communication system, there will be an information source that produces information
in some form and an information sink that absorbs the information. The communication
medium connects the source and the sink. The purpose of a communication system is to
transmit the information from the source to the sink without errors.
However, the communication medium always introduces some errors because of noise. The
fundamental requirement of a communication system is to transmit the information without
errors in spite of the noise.
2.1.1 Block Diagram of a Communication System
The block diagram of a generic communication system is shown in Figure 2.1. The
information source produces symbols (such as English letters, speech, video, etc.) that are
sent through the transmission medium by the transmitter. The communication medium
introduces noise, and so errors are introduced in the transmitted data. At the receiving end,
the receiver decodes the data and gives it to the information sink.
Suppose that one bit is received in error at the receiver. How do we ensure that the received
data can be made error free? Shannon provides the answer. The communication system given
in Figure 2.1 can be expanded, as shown in Figure 2.2.
Channel encoder: If the information is to be decoded correctly even when errors are
introduced in the medium, we need to add some additional bits to the source-encoded data so
that this additional information can be used to detect and correct the errors. This process of
adding bits is done by the channel encoder. Shannon's channel coding theorem tells us how to
achieve this.
Modulation: Modulation is a process of transforming the signal so that the signal can be
transmitted through the medium. We will discuss the details of modulation in a later chapter.
Demodulator: The demodulator performs the inverse operation of the modulator.
Channel decoder: The channel decoder analyzes the received bit stream and detects and
corrects the errors, if any, using the additional data introduced by the channel encoder.
Source decoder: The source decoder converts the bit stream into the actual information. If
analog-to-digital conversion is done at the source encoder, digital-to-analog conversion is
done at the source decoder. If the symbols are coded into 1s and 0s at the source encoder, the
bit stream is converted back to the symbols by the source decoder.
Information sink: The information sink absorbs the information.
The block diagram given in Figure 2.2 is the most important diagram for all communication
engineers. We will devote separate chapters to each of the blocks in this diagram.
2.2 Measure of Information
What is information? How do we measure information? These are fundamental issues for
which Shannon provided the answers. We can say that we have received some information if
there is a "decrease in uncertainty."
Consider an information source that produces two symbols A and B. The source has sent A,
B, B, A, and now we are waiting for the next symbol. Which symbol will it produce? If it
produces A, the uncertainty that was there in the waiting period is gone, and we say that
"information" is produced. Note that we are using the term "information" from a
communication theory point of view; it has nothing to do with the "usefulness" of the
information.
Shannon proposed a formula to measure information. The information measure is called the
entropy of the source. If a source produces N symbols, and if all the symbols are equally
likely to occur, the entropy of the source is given by
H = log2 N
bits/symbol
For example, assume that a source produces the English letters (in this chapter, we will refer
to the English letters A to Z and space, totaling 27, as symbols), and all these symbols will be
produced with equal probability. In such a case, the entropy is
H = log2 27 = 4.75
bits/symbol
The information source may not produce all the symbols with equal probability. For instance,
in English the letter "E" has the highest frequency (and hence highest probability of
occurrence), and the other letters occur with different probabilities. In general, if a source
produces the ith symbol with a probability of P(i), the entropy of the source is given by
H = -Σ P(i) log2 P(i)
bits/symbol
If a large text of English is analyzed and the probabilities of all symbols (or letters) are
obtained and substituted in the formula, then the entropy is
H = 4.07
bits/symbol
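As a quick illustration, the entropy formula can be evaluated numerically. The following
sketch (in Python; the four-symbol probability set at the end is an arbitrary illustrative
example, not taken from the text) computes the entropy of the 27-symbol equiprobable source
and of a source with unequal probabilities.

import math

def entropy(probabilities):
    # H = -sum(P(i) * log2 P(i)) in bits/symbol
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 27 equally likely symbols (A to Z and space): H = log2 27
print(entropy([1 / 27] * 27))                # ~4.75 bits/symbol

# Unequal probabilities give a lower entropy
print(entropy([0.5, 0.25, 0.125, 0.125]))    # 1.75 bits/symbol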
Note: Consider the following sentence: "I do not knw wheter this is undrstandble." In spite
of the fact that a number of letters are missing in this sentence, you can make out what
the sentence is. In other words, there is a lot of redundancy in the English text.
This is called the first-order approximation for calculation of the entropy of the information
source.
The third-order entropy of a source producing English letters can be worked out to be
H = 2.77
bits/symbol
(that is, when the statistics of three-letter combinations are taken into account, each letter
requires on average about 2.77 bits).
2.3 Channel Capacity
Shannon introduced the concept of channel capacity, the limit on the rate at which data can be
transmitted through a medium. The errors in the transmission medium depend on the energy
of the signal, the energy of the noise, and the bandwidth of the channel.
Conceptually, if the bandwidth is high, we can pump more data through the channel. If the
signal energy is high, the effect of noise is reduced. According to Shannon, the channel
capacity is related to the bandwidth of the channel and the signal and noise energies by the
formula
C = W log2 (1 + S/N)
where
C is channel capacity in bits per second (bps)
W is bandwidth of the channel in Hz
S/N is the signal-to-noise power ratio (SNR). SNR generally is measured in dB using
the formula
S/N (dB) = 10 log10 [Signal Power (W) / Noise Power (W)]
The value of the channel capacity obtained using this formula is the theoretical maximum. As
an example, consider a voice-grade line for which W = 3100 Hz and SNR = 30 dB (i.e., the
signal-to-noise power ratio is 1000:1). The channel capacity is then
C = 3100 × log2(1 + 1000) ≈ 30,898 bps, or roughly 30 kbps.
2.4 Shannon's Theorems
In a digital communication system, the aim of the designer is to convert any information into
a digital signal, pass it through the transmission medium and, at the receiving end, reproduce
the digital signal exactly. To achieve this objective, two important requirements are:
1. To code any type of information into digital format. Note that the world is analog:
voice signals are analog, images are analog. We need to devise mechanisms to
convert analog signals into digital format. If the source produces symbols (such as A,
B), we also need to convert these symbols into a bit stream. This coding has to be
done efficiently so that the smallest number of bits is required for coding.
2. To ensure that the data sent over the channel is not corrupted. We cannot eliminate the
noise introduced on the channels, and hence we need to introduce special coding
techniques to overcome the effect of noise.
These two aspects have been addressed by Claude Shannon in his classical paper "A
Mathematical Theory of Communication" published in 1948 in Bell System Technical
Journal, which gave the foundation to information theory. Shannon addressed these two
aspects through his source coding theorem and channel coding theorem.
Shannon's source coding theorem addresses how the symbols produced by a source have to
be encoded efficiently. Shannon's channel coding theorem addresses how to encode the data
to overcome the effect of noise.
2.4.1 Source Coding Theorem
The source coding theorem states that "the number of bits required to uniquely describe an
information source can be approximated to the information content as closely as desired."
Again consider the source that produces the English letters. The information content or
entropy is 4.07 bits/symbol. According to Shannon's source coding theorem, the symbols can
be coded in such a way that, on average, 4.07 bits are required per symbol. But what should
be the coding technique? Shannon does not tell us!
Shannon's theory puts only a limit on the minimum number of bits required. This is a very
important limit; communication engineers have struggled to achieve it for the last 50 years.
Consider, as an example, a source that produces two symbols A and B, each with probability
0.5:

Symbol    Probability    Code Word
A         0.5            0
B         0.5            1

If pairs of successive symbols are coded together, the pair probabilities and code words are:

Symbol    Probability    Code Word
AA        0.45           0
AB        0.45           10
BA        0.05           110
BB        0.05           111
Here the strategy in assigning the code words is that the symbols with high probability are
given short code words and symbols with low probability are given long code words.
Note: Assigning short code words to high-probability symbols and long code words to
low-probability symbols results in efficient coding.
In this case, the average number of bits required per symbol can be calculated using the
formula
L = Σ P(i) L(i)
bits/symbol
where
P(i) = Probability of the code word
L(i) = Length of the code word
For this example,
L = (1 * 0.45 + 2 * 0.45 + 3 * 0.05 + 3 * 0.05) = 1.65 bits/symbol.
The entropy of the source can be calculated to be 1.469 bits/symbol.
So, if the source produces the symbols in the following sequence:
AABABAABBB
then source coding gives the bit stream
0 110 110 10 111
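The figures in this example can be reproduced with a short sketch (Python; the code table and
probabilities are those given above):

import math

code = {"AA": "0", "AB": "10", "BA": "110", "BB": "111"}
prob = {"AA": 0.45, "AB": 0.45, "BA": 0.05, "BB": 0.05}

# Average code word length L = sum(P(i) * L(i))
L = sum(prob[s] * len(code[s]) for s in code)
print(round(L, 2))                          # 1.65 bits/symbol

# Entropy of the pair source H = -sum(P(i) * log2 P(i))
H = -sum(p * math.log2(p) for p in prob.values())
print(round(H, 3))                          # ~1.469 bits/symbol

# Encode the sequence AABABAABBB pair by pair
text = "AABABAABBB"
pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
print(" ".join(code[p] for p in pairs))     # 0 110 110 10 111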
2.4.2 Channel Coding Theorem
Shannon's channel coding theorem states that "the error rate of data transmitted over a
bandwidth-limited noisy channel can be reduced to an arbitrarily small amount if the
information rate is lower than the channel capacity."
This theorem is the basis for error correcting codes using which we can achieve error-free
transmission. Again, Shannon only specified that using good coding mechanisms, we can
achieve error-free transmission, but he did not specify what the coding mechanism should be!
According to Shannon, channel coding may introduce additional delay in transmission but,
using appropriate coding techniques, we can overcome the effect of channel noise.
Consider the bit stream
1 0 0 1 0
Now, instead of transmitting this bit stream directly, we can transmit the bit stream
111 000 000 111 000
that is, we repeat each bit three times. Now, let us assume that the received bit stream is
101 000 010 111 000
Two errors are introduced in the channel. But we can still decode the data correctly at the
receiver: since the receiver knows that each bit is transmitted thrice, it can conclude that the
second bit should be 1 and the eighth bit should be 0. This is error correction.
This coding is called a rate 1/3 error correcting code. Such codes, which can correct errors,
are called Forward Error Correcting (FEC) codes.
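A minimal sketch of this rate 1/3 repetition scheme and its majority-vote decoding (Python;
the bit streams are those of the example above):

def encode_rate_third(bits):
    # Repeat each bit three times
    return "".join(b * 3 for b in bits)

def decode_rate_third(bits):
    # Majority vote within each group of three received bits
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join("1" if g.count("1") >= 2 else "0" for g in groups)

print(encode_rate_third("10010"))         # 111000000111000
received = "101000010111000"              # two bits corrupted by the channel
print(decode_rate_third(received))        # 10010 -- both errors corrected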
Ever since Shannon published his historical paper, there has been a tremendous amount of
research into error correcting codes. We will discuss error detection and correction in
Lesson 5, "Error Detection and Correction".
For all these 50 years, communication engineers have struggled to achieve the theoretical
limits set by Shannon, and they have made considerable progress. Take the case of the line
modems that we use for transmission of data over telephone lines. The evolution of line
modems from V.26 (2400 bps data rate, 1200 Hz bandwidth) through V.27 (4800 bps,
1600 Hz), V.32 (9600 bps, 2400 Hz), and V.34 (28,800 bps, 3400 Hz) indicates the progress
in source coding and channel coding techniques, using Shannon's theory as the foundation.
Note: Source coding is used mainly to reduce the redundancy in the signal, whereas channel
coding is used to introduce redundancy to overcome the effect of noise.
2.5
There may be a number of reasons for wishing to change the form of a digital signal as
supplied by an information source prior to transmission. In the case of English language text,
for example, we start with a data source consisting of about 40 distinct symbols (the letters of
the alphabet, integers and punctuation). In principle, we could transmit such text using a
signal alphabet consisting of 40 distinct voltage waveforms. This would constitute an M-ary
system where M = 40 unique signals. It may be, however, that for one or more of the
following reasons this approach is inconvenient, difficult or impossible:
The transmission channel may be physically unsuited to carrying such a large number
of distinct signals.
The relative frequencies (chances of occurrence) with which different source symbols
occur will vary widely. This will have the effect of making the transmission
inefficient in terms of the time it takes and/or bandwidth it requires.
The data may need to be stored and/or processed in some way before transmission.
This is most easily achieved using binary electronic devices as the storage and
processing elements.
For all these reasons, sources of digital information are almost always converted as soon as
possible into binary form, i.e. each symbol is encoded as a binary word.
After appropriate processing the binary words may then be transmitted directly, as either:
Baseband signals or
Bandpass signals (after going through modulation process)
or re-coded into another multi-symbol alphabet (it is unlikely that the transmitted symbols
map directly onto the original source symbols).
2.5.1
We are generally interested in finding a more efficient code which represents the same
information using fewer digits on average. This results in different lengths of codeword being
used for different symbols. The problem with such variable length codes is in recognizing the
start and end of the symbols.
2.5.2
The following properties need to be considered when attempting to decode variable length
codewords:
(B) Instantaneous Decoding
Consider now an M = 4 symbol alphabet, with the following binary representation:
A=0
B = 10
C = 110
D = 111
This code can be instantaneously decoded using the decision tree shown in Figure 2.3 below
since no complete codeword is a prefix of a larger codeword.
Figure 2.3: Algorithm for decision tree decoding and example of practical code tree.
This is in contrast to the previous example where A is a prefix of both B and D. The latter
example is also a comma code as the symbol zero indicates the end of a codeword except
for the all-ones word whose length is known. Note that we are restricted in the number of
available codewords with small numbers of bits to ensure we achieve the desired decoding
properties.
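A decision-tree decoder for an instantaneous code of this kind can be sketched as follows
(Python; the codeword table is the code A = 0, B = 10, C = 110, D = 111 given above, and the
input bit stream is an arbitrary illustration):

# Instantaneous (prefix) code from the example above
codewords = {"0": "A", "10": "B", "110": "C", "111": "D"}

def decode_prefix(bits):
    # Walk the bit stream; emit a symbol as soon as a complete codeword is seen.
    # This works because no codeword is a prefix of a longer codeword.
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in codewords:
            symbols.append(codewords[current])
            current = ""
    return "".join(symbols)

print(decode_prefix("0101100111"))        # ABCAD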
Using the representation:
A=0
B = 01
C = 011
D = 111
the code is identical to the example just given but the bits are time reversed. It is thus still
uniquely decodable but no longer instantaneous, since early codewords are now prefixes of
later ones.
2.5.3
Consider an eight-symbol source with symbols A, . . . , H and the following probabilities:

m      P(m)
A      0.1
B      0.18
C      0.4
D      0.05
E      0.06
F      0.1
G      0.07
H      0.04

The entropy of this source is
H = -Σ P(m) log2 P(m) = 2.55 bits/symbol
If the symbols are each allocated 3 bits, comprising all the binary patterns between 000 and
111, the maximum entropy of an eight-symbol source is log2 8 = 3 bits/symbol and the source
efficiency is therefore given by:
η_source = (2.55 / 3) × 100% = 85%
Shannon-Fano coding, in which we allocate the regularly used or highly probable messages
fewer bits, as these are transmitted more often, is more efficient. The less probable messages
can then be given the longer, less efficient bit patterns. This yields an improvement in
efficiency compared with that before source coding was applied.
The improvement is not as great, however, as that obtainable with another variable length
coding scheme, namely Huffman coding.
(B) Huffman Coding
The result of Huffman encoding (Figures 2.4 and 2.5) of the symbols A, . . . , H in the
previous example is to allocate codewords to the symbols as follows:

m      P(m)     Codeword
C      0.40     1
B      0.18     001
A      0.10     011
F      0.10     0000
G      0.07     0100
E      0.06     0101
D      0.05     00010
H      0.04     00011

The average codeword length is L = Σ P(m) L(m) = 2.61 bits/symbol, and the coding
efficiency is therefore:
η = (H / L) × 100% = (2.55 / 2.61) × 100% = 97.7%
Note that the Huffman codes are formulated to minimise the average codeword length. They
do not necessarily possess error detection properties but are uniquely, and instantaneously,
decodable, as defined in section 2.5.2.
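The Huffman codeword lengths and average length quoted above can be reproduced with the
standard heap-based construction (a Python sketch; the symbol-to-probability assignment
follows the table above, and where two symbols have equal probability their individual
lengths may be interchanged without affecting the average):

import heapq

def huffman_lengths(prob):
    # Returns the Huffman codeword length (tree depth) for each symbol.
    # Heap entries: (probability, tie-breaker, list of (symbol, depth)).
    heap = [(p, i, [(s, 0)]) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        merged = [(s, d + 1) for s, d in leaves1 + leaves2]
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return dict(heap[0][2])

P = {"C": 0.40, "B": 0.18, "A": 0.10, "F": 0.10,
     "G": 0.07, "E": 0.06, "D": 0.05, "H": 0.04}
lengths = huffman_lengths(P)
print(sorted(lengths.values()))                      # [1, 3, 3, 4, 4, 4, 5, 5]
print(round(sum(P[s] * lengths[s] for s in P), 2))   # 2.61 bits/symbol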
Figure 2.5: Huffman coding allocation of the codewords to the eight symbols.