Lec 2 Source Coding
Unit 2 - Part 2
Before we discuss the connection between binary representation (coding) and the information measure
entropy, we find it necessary to first present a formal definition of uniquely decodable codes and to
introduce a special class of uniquely decodable codes, called instantaneous codes, that allow for reduced
time delay (latency) in encoding and decoding. We then proceed to define the efficiency of an
instantaneous code and present methods for the design of efficient codes. We have already used the
general idea of a code and the related notions of a code alphabet and a source alphabet.
1 Classification of Codes
Definition: We define a code as a mapping of all possible sequences of symbols of an alphabet S =
{s1, s2, ..., sq} into sequences of symbols of some other alphabet X = {x1, x2, ..., xr}. We call S the
source alphabet and X the code alphabet.
The definition of a code given above is too general to be of much use in code synthesis. We
therefore restrict our attention to codes having certain additional properties. The first property
we require is that the code be a block code.
Definition: A block code is a code which maps each of the symbols of the source alphabet S into
a fixed sequence of symbols of the code alphabet X. These fixed sequences of the code alphabet are
called code words. We denote the code word corresponding to the source symbol si by Xi.
Example 1:
An example of a binary block code is given in Table 1.
Table 1: A Binary Block Code
Source Symbols Code
s1 0
s2 01
s3 11
s4 01
Table 2: A Nonsingular Block Code
Source Symbols Code
s1 0
s2 01
s3 11
s4 10
Even though all the code words are distinct in this example of a nonsingular block code, it is still
possible for a given sequence of code symbols to have an ambiguous origin. For example, the sequence
010 might represent either s2 s1 or s1 s4. That is, even though the code of Table 2 is nonsingular
when single source symbols are considered, it is singular when source extensions are considered. This
example shows that we must define an even more restrictive condition than nonsingularity if we are to
obtain useful codes. To derive such a condition, we first define the block code formed by the nth
extension of a block code.
Definition: The nth extension of a block code which maps the symbols si into the code words Xi
is the block code which maps the sequences of source symbols s_i1 s_i2 ... s_in into the sequences of
code words X_i1 X_i2 ... X_in.
Example 3:
The second extension of the block code given in Table 2 is shown in Table 3.
Table 3: The Second Extension of a Block Code
Source Symbols   Code     Source Symbols   Code
s1 s1            00       s3 s1            110
s1 s2            001      s3 s2            1101
s1 s3            011      s3 s3            1111
s1 s4            010      s3 s4            1110
s2 s1            010      s4 s1            100
s2 s2            0101     s4 s2            1001
s2 s3            0111     s4 s3            1011
s2 s4            0110     s4 s4            1010
Note that the fourth and fifth code words of the second extension code given in Table 3 are not
distinct. We now define the condition that assures that any two sequences of source symbols of the
same length lead to distinct sequences of code symbols. The block codes that satisfy this condition are
called uniquely decodable.
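The collision noted above can be reproduced mechanically. The following Python sketch builds the nth extension of the code of Table 2 and lists the code sequences that arise from more than one source sequence. The helper names nth_extension and collisions are our own and are introduced only for illustration.

```python
from itertools import product

def nth_extension(code, n):
    """Map every length-n sequence of source symbols to the
    concatenation of the corresponding code words."""
    return {
        seq: "".join(code[s] for s in seq)
        for seq in product(code, repeat=n)
    }

def collisions(extension):
    """Return the code sequences produced by more than one source sequence."""
    groups = {}
    for seq, word in extension.items():
        groups.setdefault(word, []).append(seq)
    return {word: seqs for word, seqs in groups.items() if len(seqs) > 1}

# The nonsingular block code of Table 2.
table2 = {"s1": "0", "s2": "01", "s3": "11", "s4": "10"}

second = nth_extension(table2, 2)
clashes = collisions(second)
print(clashes)  # {'010': [('s1', 's4'), ('s2', 's1')]}
```

An empty result for one value of n says only that this particular extension is nonsingular. Unique decodability requires the same for every finite n, which a check of this kind cannot establish by itself.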
Definition: A block code is said to be uniquely decodable if, and only if, the nth extension of the
code is nonsingular for every finite n.
It is easy to show that for uniquely decodable codes any two sequences of source symbols, even if
they are not of the same length, will lead to distinct sequences of code symbols. This property follows
from our definition.
Code A given above illustrates what is undoubtedly the simplest method of constructing a uniquely
decodable code. All the words of A are of the same length, and, in addition, A is obviously nonsingular.
These two properties are sufficient to ensure unique decodability.
Code B in Table 4 is uniquely decodable because it is nonsingular, and, in addition, it is what has
been called a comma code. That is, in B the 1 acts as a comma to separate one word from the next.
When scanning a sequence of code symbols, we may use the comma to determine when one word ends
and another begins.
The ability to tell when a code word, immersed in a finite sequence of code symbols, comes to an
end can be seen to be central to the construction of both types of uniquely decodable codes discussed
above. Indeed, this property is at the very heart of the concept of unique decodability. Consider yet
another uniquely decodable code, given in Table 5.
Table 5: Another Uniquely Decodable Code
Source Symbols Code C
s1 0
s2 01
s3 011
s4 0111
Code C is different from codes A and B in an important respect. If we are given a binary sequence
composed of words from code C, we cannot decode the sequence word by word as it is received. If we
receive 01, for example, we cannot say that this corresponds to the source symbol s2 until we receive
the next code symbol. If the next code symbol is 0, then we know that 01 corresponds to s2; if, however,
the next code symbol is 1, then we have to inspect one more code symbol to tell whether we
are receiving s3 (011) or s4 (0111). This time lag is essential to the decoding process if we use code C,
whereas with codes A and B we can decode each word as it arrives.
Definition: A uniquely decodable code is said to be instantaneous if it is possible to decode each
word in a sequence without reference to succeeding code symbols.
Codes A and B given above are instantaneous. Code C is an example of a uniquely decodable code
which is not instantaneous. It will be useful to have a general test which tells us whether a
code is instantaneous; we now develop such a test.
Definition: Let Xi = x_i1 x_i2 ... x_im be a word of some code. The sequence of code symbols
x_i1 x_i2 ... x_ij, where j ≤ m, is called a prefix of the code word Xi.
The test we seek may now be stated:
A necessary and sufficient condition for a code to be instantaneous is that no complete
word of the code be a prefix of some other code word.
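The test translates directly into a pairwise comparison of code words. The following Python sketch is one way to write it; the function name is_instantaneous is our own, and the fixed-length code in the last call is a hypothetical example rather than one of the codes tabulated in these notes.

```python
def is_instantaneous(words):
    """Return True if no code word is a prefix of a different code word.

    Two identical words (a singular code) also fail the test, since
    each is then a prefix of the other.
    """
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(w):
                return False
    return True

# Code of Table 2: the word 0 is a prefix of the word 01.
print(is_instantaneous(["0", "01", "11", "10"]))   # False
# Three words of code C: 01 is a prefix of 011 and of 0111.
print(is_instantaneous(["01", "011", "0111"]))     # False
# Hypothetical fixed-length code: distinct words of equal length.
print(is_instantaneous(["00", "01", "10", "11"]))  # True
```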
At this point it is helpful to summarize the various classes of codes we have dealt with. In Figure 1
we indicate the path defined through our maze of subclasses of codes which finally led us to the subclass
of instantaneous codes.
The nature of the constraints imposed upon a code when we require it to be instantaneous may be
appreciated more fully by some primitive attempts at code synthesis. Let us try to synthesize a binary
code for a source with four symbols. We might start by assigning 0 to symbol s1 .
s1 → 0
If this is the case, then note that all other source symbols must correspond to code words beginning
with 1. If this were not true, we would violate the prefix-free rule. We cannot let s2 correspond to the
single-symbol code word 1; this would leave us with no symbols with which to start the remaining two
code words. We might have
s2 → 10
This, in turn, would require us to start the remaining code words with 11, the only two-bit prefix still
unused, and we might set
s3 → 110
and
s4 → 111
In the above code, note how starting the code by letting s1 correspond to 0 cut down the number
of code words of length 2. We can see that if we were to select a 2-bit code word to represent s1 , we
would have more freedom of selection in the ensuing words. We may set
s1 → 00
s2 → 10
s3 → 11
s4 → 01
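Either code synthesized above can be decoded one word at a time, because no code word is a prefix of another. The Python sketch below illustrates this. The helper decode_instantaneous is hypothetical and written only for this note, and the message is an arbitrary example.

```python
def decode_instantaneous(bits, code):
    """Decode a string of code symbols word by word.

    A source symbol is emitted as soon as the buffered code symbols
    form a complete code word. This greedy rule is correct only for
    instantaneous (prefix-free) codes.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"trailing symbols {buffer!r} do not form a code word")
    return decoded

first_code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
second_code = {"s1": "00", "s2": "10", "s3": "11", "s4": "01"}

message = ["s3", "s1", "s4", "s2"]  # arbitrary example message
for code in (first_code, second_code):
    encoded = "".join(code[s] for s in message)
    assert decode_instantaneous(encoded, code) == message
    print(encoded, len(encoded))
# 110011110 9
# 11000110 8
```

For this particular message the second code is shorter, but a message consisting mostly of s1 would favor the first code, so the comparison depends on how often each symbol occurs.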
The question of which of the two codes constructed above is better, in the sense that fewer bits
are required for the binary representation of the message sequence, cannot be resolved on the basis of
the information given. We address this issue in the following section by expressing the quantitative
constraints on the word lengths of an instantaneous code. Necessary and sufficient conditions for the
existence of an instantaneous code with word lengths l1, l2, ..., lq are provided by the Kraft inequality.
A necessary and sufficient condition for the existence of an instantaneous code with word
lengths l1, l2, ..., lq is that
\[
\sum_{i=1}^{q} r^{-l_i} \le 1
\]
where r is the number of different symbols in the code alphabet.
Note, however, that the Kraft inequality does not tell us how to construct an instantaneous code,
nor whether a given code is instantaneous. The inequality is merely a condition on the
word lengths of the code and not on the words themselves. If a set of word lengths satisfies the Kraft
inequality, then we can say that there exists an instantaneous code with the specified set of word
lengths. For the binary case, the Kraft inequality tells us that the li must satisfy the inequality
\[
\sum_{i=1}^{q} 2^{-l_i} \le 1.
\]
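As a quick numerical check of the Kraft inequality, the following Python sketch evaluates the sum exactly with rational arithmetic. The function names kraft_sum and satisfies_kraft are our own and are used only for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Return the sum over i of r**(-l_i) as an exact fraction."""
    return sum(Fraction(1, r ** l) for l in lengths)

def satisfies_kraft(lengths, r=2):
    """True if an instantaneous r-ary code with these word lengths exists."""
    return kraft_sum(lengths, r) <= 1

# Word lengths of the two codes synthesized above.
print(kraft_sum([1, 2, 3, 3]))        # 1
print(kraft_sum([2, 2, 2, 2]))        # 1
# Word lengths of code C. The inequality holds although C itself is not
# instantaneous, because the test looks only at the lengths.
print(kraft_sum([1, 2, 3, 4]))        # 15/16
# No binary instantaneous code can have these word lengths.
print(satisfies_kraft([1, 1, 2, 2]))  # False
```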
2 Coding Information Sources
For a given source alphabet and a given code alphabet, we can construct many instantaneous or uniquely
decodable codes. This abundance of acceptable codes forces us to find a criterion by which we may
choose among the codes. If there are no other considerations, from the standpoint of mere economy of
expression and the resulting economy of communication equipment, we prefer a code with many short
words to one with long words. Hence, the natural criterion for this selection, although by no means the
only possibility, is length. We therefore define the average length of a code.
Definition: Let a block code transform the source symbols s1, s2, ..., sq into the code words
X1, X2, ..., Xq. Let the probabilities of the source symbols be P1, P2, ..., Pq, and let the lengths of the
code words be l1, l2, ..., lq. Then we define L, the average length of the code, by the equation
\[
L = \sum_{i=1}^{q} P_i l_i .
\]
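To see how L depends on the source probabilities, the following Python sketch evaluates the definition for the word lengths of the two four-word codes synthesized in the previous section. The two probability assignments are illustrative choices of ours, and average_length is a hypothetical helper name.

```python
from fractions import Fraction

def average_length(probs, lengths):
    """Return L, the sum over i of P_i * l_i."""
    if len(probs) != len(lengths):
        raise ValueError("need one word length per source symbol")
    return sum(p * l for p, l in zip(probs, lengths))

lengths_a = [1, 2, 3, 3]  # s1 -> 0,  s2 -> 10, s3 -> 110, s4 -> 111
lengths_b = [2, 2, 2, 2]  # s1 -> 00, s2 -> 10, s3 -> 11,  s4 -> 01

# Illustrative probability assignments (assumptions for this example).
uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

print(average_length(uniform, lengths_a), average_length(uniform, lengths_b))  # 9/4 2
print(average_length(skewed, lengths_a), average_length(skewed, lengths_b))    # 7/4 2
```

With equiprobable symbols the fixed-length code has the smaller average length, while with the skewed probabilities the variable-length code does.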
We shall be interested in finding uniquely decodable codes with average length as small as possible.
McMillan's inequality assures us that any set of word lengths achievable in a uniquely decodable code
is also achievable in an instantaneous code. Because of this, we may restrict our search to the class of
instantaneous codes.
Our definition of L is valid for either zero-memory or Markov information sources. In order to
simplify the discussion, however, we temporarily limit our consideration to zero-memory sources.
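To relate L to the entropy of the source, let Q1, Q2, ..., Qq be any numbers with Qi ≥ 0 for all i and
Q1 + Q2 + ... + Qq = 1. Then
\[
\sum_{i=1}^{q} P_i \log \frac{1}{P_i} \le \sum_{i=1}^{q} P_i \log \frac{1}{Q_i}
\]
with equality if and only if Pi = Qi for all i. By the definition of entropy, the left-hand side of this
inequality is the entropy of the source (in binary units when the logarithm is taken to base 2). Now we
choose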
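\[
Q_i = \frac{r^{-l_i}}{\sum_{j=1}^{q} r^{-l_j}}
\]
and obtain
\[
\begin{aligned}
\sum_{i=1}^{q} P_i \log_r \frac{1}{P_i}
  &\le \sum_{i=1}^{q} P_i \log_r r^{l_i} + \sum_{i=1}^{q} P_i \log_r \Big( \sum_{j=1}^{q} r^{-l_j} \Big) \\
  &= \sum_{i=1}^{q} P_i l_i + \log_r \Big( \sum_{j=1}^{q} r^{-l_j} \Big) \\
  &\le L .
\end{aligned}
\]
Here, for the last inequality we have made use of Kraft's inequality for instantaneous codes, i.e.,
\[
\sum_{i=1}^{q} r^{-l_i} \le 1 .
\]
Hence
\[
H_r(S) \le L,
\]
where
\[
H_r(S) = -\sum_{i=1}^{q} P_i \log_r P_i
\]
is the entropy of the source in r-ary units and L is the average code word length in r-ary units.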
This inequality is the first instance which demonstrates the connection between our definition of
information and a quantity (in this case L) which does not depend upon that definition. It
presents the justification for our information measure.
For equality in L ≥ Hr(S), the necessary and sufficient conditions are Pi = Qi for all i and
\[
\sum_{i=1}^{q} r^{-l_i} = 1 .
\]
After simplification, the necessary and sufficient condition becomes
\[
\log_r \frac{1}{P_i} = l_i \quad \text{for all } i.
\]
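The simplification can be spelled out in one line. With Qi as chosen in the derivation above and the Kraft sum equal to 1, the denominator of Qi drops out:
\[
P_i = Q_i = \frac{r^{-l_i}}{\sum_{j=1}^{q} r^{-l_j}} = r^{-l_i}
\quad\Longrightarrow\quad
l_i = \log_r \frac{1}{P_i} .
\]
Conversely, if li = log_r(1/Pi) for all i, then the Kraft sum equals the sum of the Pi, which is 1, and Qi = Pi. Since each li must be an integer, equality can hold only when every Pi is an integer power of 1/r. For example, with r = 2 a symbol of probability 1/8 requires a code word of length log_2 8 = 3.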
Each symbol of the source has a probability of 1/4, so a compact code must have four words of
length 2. Such a code was given in Lecture 2. It is
Sunny → 00
Cloudy → 01
Rainy → 10
Foggy → 11
The average word length of this code is 2 bits/symbol and so the efficiency is
η = 1.
There exists no instantaneous code for this source with a smaller average word length. Hence,
the given code is a compact code for the binary representation of the state of weather in Manali.
Source 2
The smallest possible average length we can achieve in a binary instantaneous code for this source
will, therefore, be 1.75 bits/symbol. All the symbol probabilities of this source are of the form (1/r)^αi,
with αi an integer. The compact code is obtained by setting the code word lengths equal to 1, 2, 3, and 3,
respectively. Such a code was given in Lecture 2. It is
Sunny → 1
Cloudy → 01
Rainy → 001
Foggy → 000
As a check, we calculate the average word length of this code, which turns out to be 1.75 bits/symbol,
and so the efficiency is
η = 1.
There exists no instantaneous code for this source with a smaller average word length. Hence,
the given code is a compact code for the binary representation of the state of weather in Jodhpur.
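Both checks can be repeated numerically. The Python sketch below computes the entropy, the average word length and their ratio for the two weather sources. It assumes that the efficiency is η = Hr(S)/L, the usual definition, which is not restated in this part of the notes. The function names are our own.

```python
from math import log2

def entropy(probs, r=2):
    """Entropy of a zero-memory source in r-ary units."""
    return sum(p * log2(1 / p) for p in probs if p > 0) / log2(r)

def efficiency(probs, lengths, r=2):
    """Assumed definition: eta = H_r(S) / L."""
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    return entropy(probs, r) / avg_len

# Source 1 (Manali): four equiprobable states, code words of length 2.
manali = [1/4, 1/4, 1/4, 1/4]
print(entropy(manali), efficiency(manali, [2, 2, 2, 2]))    # 2.0 1.0

# Source 2 (Jodhpur): probabilities 1/2, 1/4, 1/8, 1/8, lengths 1, 2, 3, 3.
jodhpur = [1/2, 1/4, 1/8, 1/8]
print(entropy(jodhpur), efficiency(jodhpur, [1, 2, 3, 3]))  # 1.75 1.0
```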
Using the definitions of the average length of a code and of compact codes, we may formulate the
fundamental problem of coding information sources as that of finding compact codes. Note that both
definitions refer only to the word lengths of codes, and not to the words themselves.
Table 8: The state of weather in Jodhpur
Messages   Probabilities
Sunny      1/2
Cloudy     1/4
Rainy      1/8
Foggy      1/8
the procedure outlined earlier for instantaneous codes. It is easy to show that such compact codes for
the special zero-memory sources described above would have an efficiency of 1.
Example 1: We again consider the example from Lecture 2 where weather data from Jodhpur was
to be encoded.
All the symbol probabilities of this source are of the form (1/r)^αi, with αi an integer and r = 2. The
binary compact code is obtained by setting the code word lengths equal to 1, 2, 3, and 3, respectively.
Such a code was given in Lecture 2. It is
Sunny → 1
Cloudy → 01
Rainy → 001
Foggy → 000
The average word length of this code turns out to be 1.75 bits/symbol, the same as the entropy of
the source, 1.75 bits/symbol. We now turn our attention to the design of compact codes for zero-memory
sources with arbitrary symbol probabilities.
σi    P(σi)   Compact Code
σ1    9/16    0
σ2    3/16    10
σ3    3/16    110
σ4    1/16    111
and
η4 = 0.991.
It can be seen that as we encode higher and higher extensions of the original source S, the efficiency
approaches 1. We will use this insight to define fundamental limits of source coding for general sources.
Therefore, we can conclude that (1) defines an acceptable set of li for an instantaneous code.
If we multiply (1) by Pi and sum over all i, we find
\[
H_r(S) \le L \le H_r(S) + 1 \tag{2}
\]
We note that these bounds on the average word length L, in (2), are derived using the coding method
defined in (1).
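The bounds in (2) can be checked numerically. The Python sketch below assumes that (1), which is not reproduced in this part of the notes, is the usual rule that li is the smallest integer satisfying li ≥ log_r(1/Pi). This is the form consistent with obtaining (2) by multiplying by Pi and summing. The function names and the four-symbol distribution are hypothetical.

```python
from fractions import Fraction
from math import log2

def shannon_lengths(probs, r=2):
    """Smallest integers l_i with r**(-l_i) <= P_i (assumed form of (1)).

    Exact rational comparisons avoid rounding problems in the logarithm.
    """
    lengths = []
    for p in probs:
        if p <= 0:
            raise ValueError("probabilities must be positive")
        l = 0
        while Fraction(1, r ** l) > p:
            l += 1
        lengths.append(l)
    return lengths

def entropy(probs, r=2):
    """Entropy in r-ary units."""
    return sum(float(p) * log2(1 / float(p)) for p in probs) / log2(r)

# Hypothetical source, chosen only to illustrate the bounds.
probs = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
lengths = shannon_lengths(probs)
L = float(sum(p * l for p, l in zip(probs, lengths)))
H = entropy(probs)

print(lengths)                                         # [2, 2, 3, 4]
print(sum(Fraction(1, 2 ** l) for l in lengths) <= 1)  # True (Kraft)
print(H <= L <= H + 1)                                 # True (bounds in (2))
```

For this distribution the shorter lengths 1, 2, 3, 3 also satisfy the Kraft inequality and give L = 1.9 instead of 2.4, so the lengths chosen in this way are acceptable but need not be the best possible.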
Since (2) is valid for any zero-memory source, we may apply it to the nth extension of our original
source S:
\[
H_r(S^n) \le L_n \le H_r(S^n) + 1 \tag{3}
\]
Ln in (3) represents the average length of the code words corresponding to symbols from the nth
extension of the source S. That is, if λi is the length of the code word corresponding to the symbol σi,
and P(σi) is the probability of σi, then
\[
L_n = \sum_{i=1}^{q^n} P(\sigma_i) \lambda_i .
\]
Ln/n, therefore, is the average number of code symbols used per single symbol from S. This, along with
the fact that Hr(S^n) = n Hr(S) for a zero-memory source, allows us to rewrite (3) as
\[
H_r(S) \le \frac{L_n}{n} \le H_r(S) + \frac{1}{n} \tag{4}
\]
and so it is possible to make Ln/n as close to Hr(S) as we wish by coding the nth extension of S rather
than S:
\[
\lim_{n \to \infty} \frac{L_n}{n} = H_r(S).
\]
This equation is known as Shannon's first theorem or the noiseless coding theorem. The theorem tells
us that we can make the average number of r-ary code symbols per source symbol as small as, but no
smaller than, the entropy of the source measured in r-ary units. The price we pay for decreasing Ln/n
is the increased coding complexity caused by the large number (q^n) of source symbols with which we
must deal.
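The squeeze in (4) can be observed numerically. The Python sketch below takes S to be a two-symbol zero-memory source with probabilities 3/4 and 1/4, which is what the table of σi above suggests but is an assumption here. It codes the nth extension with word lengths equal to the smallest integers li ≥ log_2(1/Pi), the assumed form of (1), and not with a compact code, so the resulting values differ from the efficiencies η quoted earlier. All function names are our own.

```python
from fractions import Fraction
from itertools import product
from math import log2, prod

def shannon_lengths(probs, r=2):
    """Smallest integers l_i with r**(-l_i) <= P_i (assumed form of (1))."""
    lengths = []
    for p in probs:
        if p <= 0:
            raise ValueError("probabilities must be positive")
        l = 0
        while Fraction(1, r ** l) > p:
            l += 1
        lengths.append(l)
    return lengths

def extension_probs(probs, n):
    """Probabilities of all q**n sequences of length n (zero-memory source)."""
    return [prod(seq) for seq in product(probs, repeat=n)]

def per_symbol_length(probs, n):
    """L_n / n when the nth extension is coded with the lengths above."""
    ext = extension_probs(probs, n)
    lengths = shannon_lengths(ext)
    return sum(p * l for p, l in zip(ext, lengths)) / n

source = [Fraction(3, 4), Fraction(1, 4)]  # assumed original source S
H = sum(float(p) * log2(1 / float(p)) for p in source)

for n in (1, 2, 4, 8):
    ratio = float(per_symbol_length(source, n))
    assert H <= ratio <= H + 1 / n  # the bounds in (4)
    print(n, round(ratio, 4), round(H + 1 / n, 4))
```

The number of code words grows as q^n (already 256 for n = 8 with q = 2), which is the coding complexity mentioned above.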
Our proof of Shannon's first theorem has been constructive. That is to say, if we use the method in (1)
to choose the word lengths of a block code for encoding the symbols from S^n, and make n sufficiently
large, the quantity Ln/n can be made as close to Hr(S) as we wish. But what if, because of implementation
complexity and coding delay, we do not want to make n large? How do we then construct a compact
code for a fixed source? For a fixed n, all the theorem tells us is that there exist instantaneous codes
with average length no greater than the right side of (4). The theorem does not tell us what value of
L we shall obtain. Even more important, it does not guarantee that choosing the word lengths according
to (1) will give us the smallest value of L that can be obtained for that fixed n.