1 Introduction To Linear Block Codes


ECE 7670

Lecture 2 Linear block codes


Objective: To deepen understanding of the concepts of linear block codes previously introduced.
1 Introduction to linear block codes
Recall the example of a code from last time: we had a generator matrix, a parity check matrix, and a means of decoding. We will generalize these ideas.
A block code is a code in which $k$ bits (or, more generally, symbols) are input and $n$ bits (or, more generally, symbols) are output. We designate the code as an $(n, k)$ code. We will start with bits, elements from the field $GF(2)$; later we will consider elements from a field $GF(q)$ (after we know what this means).
If we input $k$ bits, then there are $2^k$ distinct messages (or, more generally, $q^k$). The block of $n$ symbols associated with each input block is called a codeword. We could, in general, simply have a lookup table with $k$ inputs and $n$ outputs. However, as $k$ gets large, this quickly becomes infeasible, since the table needs $2^k$ entries. (Try $k = 255$, for example.) We therefore restrict our attention to linear codes.
Definition 1 A block code $C$ of length $n$ with $2^k$ codewords is called a linear $(n, k)$ code if and only if its $2^k$ codewords form a $k$-dimensional subspace of the vector space of all $n$-tuples over the field $GF(2)$.
More generally, with a bigger field, a block code $C$ of length $n$ with $q^k$ codewords is called a linear $(n, k)$ code if and only if its $q^k$ codewords form a $k$-dimensional subspace of the vector space of all $n$-tuples over the field $GF(q)$. □
We remind ourselves of what a vector space is: we have an addition defined that is commutative and closed; we have scalar multiplication that is closed, distributive, and associative. We will formalize these properties a little further, but this suffices for the present purposes. We will see (later) that we have a group structure on the addition operation.
So what does this mean for codewords? The sum of any two codewords is a codeword. Because the code is a vector space, it has a basis, and all codewords can be obtained as linear combinations of the basis vectors. We can designate $g_0, g_1, \ldots, g_{k-1}$ as the basis vectors. In a nutshell, this means that we can represent the coding operation as matrix multiplication, as we have already seen. We can formulate a generator matrix as
$$G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{k-1} \end{bmatrix}.$$
$G$ is a $k \times n$ matrix. If $m = (m_0, m_1, \ldots, m_{k-1})$ is an input sequence, then the output is the codeword
$$mG = m_0 g_0 + m_1 g_1 + \cdots + m_{k-1} g_{k-1}.$$
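We observe that the all-zero sequence must be a codeword. Moreover, since the difference of two codewords is itself a codeword, the distance between any two codewords equals the weight of some nonzero codeword. Therefore, the minimum distance of the code $C$ is the smallest weight among its nonzero codewords.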
Comment on circuits to implement encoding.
We have a vector space of dimension $k$ embedded in a vector space of dimension $n$, the set of all $n$-tuples. Associated with every linear block code generator $G$ is a matrix $H$, called the parity check matrix, whose rows span the nullspace of $G$. Then if $c$ is a codeword,
$$cH^T = 0.$$
That is, a codeword is orthogonal to each row of $H$. From this we observe that
$$GH^T = 0.$$
There is also associated with each code a dual code that has $H$ as its generator matrix. The dual code is denoted as $C^\perp$. If $G$ is the generator for an $(n, k)$ code, then $H$ is the generator for an $(n, n - k)$ code.
Example 1 A (7, 4) Hamming code can be generated by
$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}.$$
The 16 codewords are
0 0 0 0 0 0 0
0 0 0 1 1 1 1
0 0 1 0 1 1 0
0 0 1 1 0 0 1
0 1 0 0 1 0 1
0 1 0 1 0 1 0
0 1 1 0 0 1 1
0 1 1 1 1 0 0
1 0 0 0 0 1 1
1 0 0 1 1 0 0
1 0 1 0 1 0 1
1 0 1 1 0 1 0
1 1 0 0 1 1 0
1 1 0 1 0 0 1
1 1 1 0 0 0 0
1 1 1 1 1 1 1
The parity check matrix is
$$H = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}.$$
When $H$ is regarded as the generator of a (7, 3) code, the dual code, its codewords are
0 0 0 0 0 0 0
1 1 0 1 0 0 1
1 0 1 1 0 1 0
0 1 1 0 0 1 1
0 1 1 1 1 0 0
1 0 1 0 1 0 1
1 1 0 0 1 1 0
0 0 0 1 1 1 1
It may be verified that every codeword in $C$ is orthogonal to every codeword in $C^\perp$. □
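The verification suggested in Example 1 can be sketched in a few lines of Python. The helper names `span` and `dot` are illustrative choices, not part of the notes; the matrices are copied from the example.

```python
from itertools import product

# Generator and parity check matrices of the (7, 4) Hamming code from Example 1.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def span(rows):
    """Return all GF(2) linear combinations m * rows of the given rows."""
    n = len(rows[0])
    words = []
    for m in product([0, 1], repeat=len(rows)):
        word = tuple(
            sum(m[i] * rows[i][j] for i in range(len(rows))) % 2 for j in range(n)
        )
        words.append(word)
    return words

def dot(u, v):
    """Inner product over GF(2)."""
    return sum(a * b for a, b in zip(u, v)) % 2

code = span(G)   # the 16 codewords of C
dual = span(H)   # the 8 codewords of the dual code
orthogonal = all(dot(c, d) == 0 for c in code for d in dual)
print(len(code), len(dual), orthogonal)
```

Enumerating all $2^k$ messages is only practical for small $k$; it is used here because the example is tiny.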
When we want to do the encoding, it is often convenient to have the original data explicitly evident in the codeword. Coding of this sort is called systematic encoding. For the codes that we are to talk about, it will always be possible to determine a generator matrix in such a way that the encoding is systematic: simply perform row reductions and column reordering on $G$ until an identity matrix is revealed. We can thus write $G$ as
$$G = [P \mid I_k]$$
where $I_k$ is the $k \times k$ identity matrix and $P$ is $k \times (n - k)$. Then
$$c = mG = m[P \mid I_k] = [c_0 \; c_1 \; \ldots \; c_{n-k-1} \mid m_0 \; m_1 \; \ldots \; m_{k-1}].$$
When $G$ is systematic, it is easy to determine the parity check matrix $H$. It is simply
$$H = [I_{n-k} \mid -P^T].$$
With this choice, $GH^T = P - P = 0$. Note: in $GF(2)$ (binary operations) the negative of a number is simply the number. We could write (for binary codes)
$$H = [I_{n-k} \mid P^T].$$
The parity check matrix (whether systematic or not) can be used to get some useful information about the code.
Theorem 1 Let a linear block code $C$ have a parity check matrix $H$. The minimum distance of $C$ is equal to the smallest positive number of columns of $H$ which are linearly dependent.
This concept should be distinguished from that of rank, which is the largest number of columns of $H$ which are linearly independent.
Proof Let the columns of $H$ be designated as $d_0, d_1, \ldots, d_{n-1}$. Then since $cH^T = 0$ for any codeword $c$, we have
$$0 = c_0 d_0 + c_1 d_1 + \cdots + c_{n-1} d_{n-1}.$$
Let $c$ be the nonzero codeword of smallest weight, $w = w(c) = d_{\min}$. Then the columns of $H$ corresponding to the nonzero elements of $c$ are linearly dependent. Conversely, any linearly dependent set of $w'$ columns gives a nonzero vector of weight at most $w'$ that satisfies $cH^T = 0$, that is, a nonzero codeword, so no fewer than $d_{\min}$ columns can be linearly dependent. □
Based on this, we can determine a bound on the distance of a code:
$$d_{\min} \le n - k + 1. \qquad \text{(The Singleton bound)}$$
This follows since $H$ has $n - k$ linearly independent rows. (The row rank = the column rank.) So any combination of $n - k + 1$ columns of $H$ must be linearly dependent.
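As a small illustration of Theorem 1 and the Singleton bound, the following sketch searches a binary parity check matrix for the smallest set of columns that sums to zero. Over $GF(2)$ a minimal dependent set of columns is exactly a minimal set whose sum is zero. The function name `min_dependent_columns` is a hypothetical helper, and the brute-force search is only sensible for short codes.

```python
from itertools import combinations

def min_dependent_columns(H):
    """Smallest number of columns of a binary H that sum to zero over GF(2)."""
    n = len(H[0])
    cols = [tuple(row[j] for row in H) for j in range(n)]
    for size in range(1, n + 1):
        for subset in combinations(cols, size):
            # zip(*subset) walks the chosen columns one row at a time
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return size
    return None  # no dependent set: the code contains only the zero word

# Parity check matrix of the (7, 4) Hamming code from Example 1.
H_hamming = [
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
d_min = min_dependent_columns(H_hamming)
n, k = 7, 4
print(d_min, d_min <= n - k + 1)
```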
For a received vector $r$, the syndrome is
$$s = rH^T.$$
Obviously, for a codeword the syndrome is equal to zero, so we can determine whether a received vector is a codeword. Furthermore, the syndrome is independent of the transmitted codeword. If $r = c + e$,
$$s = (c + e)H^T = eH^T.$$
Furthermore, if two distinct error vectors $e$ and $e'$ have the same syndrome, then the error vectors must differ by a nonzero codeword. That is, if
$$eH^T = e'H^T$$
then
$$(e - e')H^T = 0,$$
which means that $e - e'$ must be a codeword.
2 Maximum likelihood detection
Before talking about decoding, we should introduce a probabilistic criterion for decoding, and show that it is equivalent to finding the closest codeword. Given a received vector $r$, the decision rule that minimizes the probability of error is to find that codeword $c_i$ which maximizes $P(c = c_i \mid r)$. This is called the maximum a posteriori decision rule. (Proof that this minimizes probability of error is shown in the communications class.) We note by Bayes' rule that
$$P(c \mid r) = \frac{P(c) P(r \mid c)}{P(r)},$$
where, for example, $P(r)$ is the probability of observing the vector $r$. Now, since $P(r)$ is independent of $c$, maximizing $P(c \mid r)$ is equivalent to maximizing
$$P(c) P(r \mid c).$$
If we now assume that each codeword is chosen with equal probability, then maximizing $P(c) P(r \mid c)$ is equivalent to maximizing
$$P(r \mid c).$$
A codeword which is selected on the basis of maximizing $P(r \mid c)$ is said to be selected according to the maximum likelihood criterion. We shall assume throughout the text a maximum likelihood criterion.
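Let us see what this means for us. Assuming a memoryless channel,
$$P(r \mid c) = \prod_{i=1}^{n} P(r_i \mid c_i).$$
Assuming a binary symmetric channel (BSC) with crossover probability $p$, we have
$$P(r_i \mid c_i) = \begin{cases} 1 - p & \text{if } c_i = r_i \\ p & \text{if } c_i \ne r_i. \end{cases}$$
Then
$$P(r \mid c) = \prod_{i=1}^{n} P(r_i \mid c_i) = (1 - p)^{\#(r_i = c_i)} p^{\#(r_i \ne c_i)} = (1 - p)^{n - \#(r_i \ne c_i)} p^{\#(r_i \ne c_i)} = (1 - p)^n \left( \frac{p}{1 - p} \right)^{d(c, r)},$$
where $d(c, r) = \#(r_i \ne c_i)$ is the Hamming distance between $c$ and $r$.
Then if we want to maximize $P(r \mid c)$, we should choose that $c$ which is closest to $r$, since $0 \le p/(1 - p) \le 1$ (for $p \le 1/2$). Thus, under our assumptions, the ML criterion is the minimum distance criterion. In every case, we should choose the error vector of lowest weight.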
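The equivalence between maximum likelihood and minimum distance decoding on the BSC can be checked by brute force for a small code. The sketch below uses the (7, 4) Hamming generator from Example 1 and an assumed crossover probability $p = 0.1$; the function names are illustrative.

```python
from itertools import product

# Generator matrix of the (7, 4) Hamming code from Example 1.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(m, G):
    """Compute the codeword mG over GF(2)."""
    return tuple(
        sum(m[i] * G[i][j] for i in range(len(G))) % 2 for j in range(len(G[0]))
    )

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def likelihood(r, c, p):
    """P(r | c) on a memoryless BSC with crossover probability p."""
    d = hamming_distance(r, c)
    return (1 - p) ** (len(r) - d) * p ** d

codewords = [encode(m, G) for m in product([0, 1], repeat=4)]
p = 0.1  # assumed crossover probability, any value below 1/2 works

# For every possible received vector, compare the two decision rules.
agree = all(
    max(codewords, key=lambda c: likelihood(r, c, p))
    == min(codewords, key=lambda c: hamming_distance(r, c))
    for r in product([0, 1], repeat=7)
)
print(agree)
```

For this particular code every received vector has a unique nearest codeword, so there are no ties to break in either rule.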
3 The standard array and syndrome table decoding
Suppose we send $c$ and we receive
$$r = c + e.$$
Assuming that error sequences with lower weight are more probable than error sequences with higher weight (the maximum likelihood criterion), we want to determine our decoded word $\hat{c}$ such that the error sequence $\hat{e}$ satisfying
$$r = \hat{c} + \hat{e}$$
has minimum weight.
One way to do this is to create a standard array. We form it the following way:
1. Write down a list of all possible codewords in a row, with the all-zero codeword first.
2. From the remaining $n$-tuples which have not already been used in the standard array, select one of smallest weight. Write this down as the coset leader under the all-zero codeword. On this row, add the coset leader to each codeword at the top of the column.
3. Repeat step 2 until all the $n$-tuples have been listed.
An example standard array for a (7, 3) code is shown here, where
$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{bmatrix}.$$
[Standard array for the (7, 3) code]
We make the following observations:
1. There are $2^k$ codewords (columns) and $2^n$ possible vectors, so there are $2^{n-k}$ rows in the standard array.
2. The sum of any two vectors in the same row of the standard array is a code vector.
3. No two vectors in the same row of a standard array are identical. Otherwise we would have
$$e_i + c_i = e_i + c_j, \quad \text{with } i \ne j,$$
which means $c_i = c_j$, which is impossible.
4. Every vector appears exactly once in the standard array. We know every vector must appear at least once, by the construction. If a vector appears in both the $l$th row and the $m$th row we must have
$$e_l + c_i = e_m + c_j$$
for some $i$ and $j$. Let us take $l < m$. We have
$$e_m = e_l + c_i - c_j = e_l + c_k$$
for some $k$. This means that $e_m$ is on the $l$th row of the array, which is a contradiction, since each coset leader is chosen from the vectors not already in the array.
Each row of the standard array is called a coset; we will encounter the term coset
in a more formal setting soon.
To decode with the standard array, we locate the received vector $r$ in the standard array. Then the error sequence is the coset leader; the best guess of the transmitted word is the codeword at the top of the column. For example, if
$$r = 0011011$$
then
$$\hat{c} = 0011101.$$
Since we designed the standard array with the smallest error patterns as coset leaders, this is the ML decision.
As observed before, there are $2^{n-k}$ coset leaders. These are called the correctable error patterns. Fact: an $(n, k)$ code is capable of correcting $2^{n-k}$ error patterns (counting the all-zero pattern).
The standard array can be used to decode linear codes, but suffers from a major problem: the memory requirements quickly become excessive, since all $2^n$ vectors must be stored. We want to look for easier approaches.
A first step (which doesn't go far enough) is to use the syndrome in the decoding. Based on the properties of the syndrome above, all elements in a row of the standard array have the same syndrome. We therefore only need to store syndromes and their associated error patterns.
For the code whose standard array was given before, we have
$$H = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
Steps to decoding
1. Compute the syndrome, $s = rH^T$.
2. Look up the error pattern $e$ using $s$.
3. Then the decoded codeword is $\hat{c} = r + e$.
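The three decoding steps can be sketched for the (7, 3) code using the parity check matrix given above. The table is built by enumerating error patterns in order of increasing weight and keeping the first pattern seen for each syndrome. That tie-breaking rule is an assumption of this sketch, so for syndromes with several minimum-weight patterns the chosen coset leader may differ from the one in the notes' standard array.

```python
from itertools import combinations

# Parity check matrix of the (7, 3) code.
H = [
    [0, 1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]
n = 7

def syndrome(r):
    """Step 1: s = r H^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(r, row)) % 2 for row in H)

def build_syndrome_table():
    """Map each syndrome to a minimum-weight error pattern (coset leader)."""
    table = {}
    for weight in range(n + 1):
        for positions in combinations(range(n), weight):
            e = tuple(1 if i in positions else 0 for i in range(n))
            table.setdefault(syndrome(e), e)  # keep the first (lightest) pattern
        if len(table) == 2 ** len(H):
            break  # all 2^(n-k) syndromes have a leader
    return table

def decode(r, table):
    """Steps 2 and 3: look up the error pattern and add it to r."""
    e = table[syndrome(r)]
    return tuple((a + b) % 2 for a, b in zip(r, e))

table = build_syndrome_table()
c = (1, 0, 0, 0, 1, 1, 1)   # first row of G for this code
r = (1, 0, 0, 1, 1, 1, 1)   # c with a single error in the fourth position
print(decode(r, table))
```

Only $2^{n-k} = 16$ table entries are stored, compared with the $2^n = 128$ entries of the full standard array.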
Example 2 We provide another example of the standard array, because it raises some interesting issues. For the (5, 2) code with
$$G = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \end{bmatrix}$$
the standard array is
[Standard array for the (5, 2) code]
This code is capable of correcting all errors of one bit. In addition, there are two other errors of two bits that can be corrected. Note that the minimum distance of this code is 3. The parity check matrix is
$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{bmatrix}.$$
The standard array using syndromes is
[Syndrome table for the (5, 2) code] □
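The standard array construction can be sketched for the (5, 2) code of Example 2. Candidate coset leaders are visited in order of increasing weight; which of several equal-weight candidates is picked first is an arbitrary choice of this sketch, so the two weight-2 leaders it finds need not match a particular printed array.

```python
from itertools import product

# Generator matrix of the (5, 2) code from Example 2.
G = [
    [1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1],
]
n, k = 5, 2

def add(u, v):
    """Componentwise sum over GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

# Step 1: list all codewords, all-zero codeword first.
codewords = [
    tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    for m in product([0, 1], repeat=k)
]

# Steps 2 and 3: repeatedly take an unused vector of smallest weight as the
# coset leader and fill in its row. The first row reproduces the codewords.
array = []
used = set()
for v in sorted(product([0, 1], repeat=n), key=sum):
    if v in used:
        continue
    row = [add(v, c) for c in codewords]
    array.append(row)
    used.update(row)

for row in array:
    print(" ".join("".join(map(str, w)) for w in row))
```

The leaders in the first column come out as the zero vector, the five single-bit errors, and two double-bit errors, in line with the remarks in the example.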
4 Hamming codes
Hamming codes are the earliest and simplest example of linear block codes. The parameters are as follows, for $m \ge 2$:
Code length: $n = 2^m - 1$
Number of information symbols: $k = 2^m - m - 1$
Number of parity symbols: $n - k = m$
Error correcting capability: $t = 1$
Examples are: (3, 1), (7, 4), (15, 11) and (31, 26) codes.
The parity check matrix of a Hamming code has as its columns all nonzero binary $m$-tuples. The columns may be ordered to correspond to a systematic code. Syndrome decoding of Hamming codes is straightforward: for a single error, the syndrome equals the column of $H$ corresponding to the error position.
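A minimal sketch of single-error correction for a Hamming code follows. It assumes one particular column ordering, in which column $j$ of $H$ is the binary expansion of $j$; this ordering is not systematic, but it makes the syndrome directly readable as the error position. Both function names are hypothetical.

```python
def hamming_parity_check(m):
    """H whose j-th column (j = 1 .. 2^m - 1) is j written in binary, MSB on top."""
    n = 2 ** m - 1
    return [[(j >> (m - 1 - i)) & 1 for j in range(1, n + 1)] for i in range(m)]

def correct_single_error(r, H):
    """Flip the bit whose position is given by the syndrome, if it is nonzero."""
    s = [sum(a * b for a, b in zip(r, row)) % 2 for row in H]
    position = int("".join(map(str, s)), 2)  # syndrome read as a binary number
    corrected = list(r)
    if position:
        corrected[position - 1] ^= 1
    return corrected

H3 = hamming_parity_check(3)
codeword = [1, 1, 1, 0, 0, 0, 0]   # columns 1, 2, 3 of H3 sum to zero
received = [1, 1, 1, 0, 0, 1, 0]   # same word with an error in position 6
print(correct_single_error(received, H3))
```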
5 Weight distributions and performance of linear codes
We pause in our development of decoding algorithms to address the question of how well linear codes can perform. We recall that we can correct up to $t$ errors if $d_{\min} \ge 2t + 1$. We can also detect up to $d_{\min} - 1$ errors. It may be possible to detect some error patterns of weight $d_{\min}$ or more, but the code cannot detect all of them, because there is at least one codeword with weight $d_{\min}$. We say that the random error detecting capability of a code is $d_{\min} - 1$.
However, it may be possible to detect a large number of error patterns with weight $\ge d_{\min}$. Consider the following: there are $2^n$ possible vectors, of which $2^k$ are codewords. There are thus $2^n - 2^k$ error patterns which are distinct from codewords. It is possible that an error sequence is exactly equal to a codeword, in which case the error is undetectable. There are $2^k - 1$ undetectable error patterns (the all-zero error does not matter). The $2^n - 2^k$ patterns which are not codewords are called detectable error patterns. In many codes, $2^k - 1$ is much smaller than $2^n$, so that only a small fraction of the error patterns are undetectable.
In carefully characterizing the performance of codes, the weight distribution of the code is important. Consider the codewords of a linear $(n, k)$ code $C$. One of the codewords has weight 0. There may be codewords with weight 1, weight 2, and so forth. Let $A_i$ be the number of code vectors of weight $i$. The numbers $A_0, A_1, \ldots, A_n$ are called the weight distribution of $C$.
Example 3 For the (7, 4) Hamming code presented above,
$$A_0 = 1, \quad A_1 = 0, \quad A_2 = 0, \quad A_3 = 7, \quad A_4 = 7, \quad A_7 = 1.$$
□
If we want to use a code for error detection, we can determine the probability of an undetected error using the weight distribution. The probability of an undetected error is the probability that an error pattern is equal to a (nonzero) code vector. Then
$$P_u(E) = \sum_{i=1}^{n} A_i p^i (1 - p)^{n-i},$$
where $p$ is the crossover probability. For the Hamming example,
$$P_u(E) = 7p^3 (1 - p)^4 + 7p^4 (1 - p)^3 + p^7.$$
If $p = .01$ then $P_u(E) \approx 7 \times 10^{-6}$.
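The weight distribution and the resulting probability of undetected error for the (7, 4) Hamming code can be computed directly. The sketch below enumerates all codewords, which is only feasible for small $k$; the function name `undetected_error_probability` is illustrative.

```python
from itertools import product

# Generator matrix of the (7, 4) Hamming code from Example 1.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
n, k = 7, 4

# A[i] counts the codewords of weight i.
A = [0] * (n + 1)
for m in product([0, 1], repeat=k):
    c = [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
    A[sum(c)] += 1

def undetected_error_probability(A, p):
    """P_u(E) = sum over i >= 1 of A_i p^i (1 - p)^(n - i) on a BSC."""
    n = len(A) - 1
    return sum(A[i] * p ** i * (1 - p) ** (n - i) for i in range(1, n + 1))

pu = undetected_error_probability(A, 0.01)
print(A, pu)
```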
We saw in Example 2 that it may be possible to correct more than the minimum number of errors in some cases. The number
$$t = \lfloor (d_{\min} - 1)/2 \rfloor$$
is called the random-error-correcting capability of the code. Based upon this number, the probability of erroneous decoding is upper bounded by
$$P(E) \le \sum_{i=t+1}^{n} \binom{n}{i} p^i (1 - p)^{n-i}.$$
Example 4 For the (7, 4) Hamming code,
$$P(E) \le 21p^2 (1 - p)^5 + 35p^3 (1 - p)^4 + 35p^4 (1 - p)^3 + 21p^5 (1 - p)^2 + 7p^6 (1 - p) + p^7.$$
When $p = 0.01$ the bound evaluates to approximately 0.0020. □
In most cases we can correct many error patterns of more than $t$ errors. There are a total of $2^{n-k}$ correctable error patterns (the number of rows in the standard array). When we know the weight distribution of the coset leaders, we can get a more precise statement of the probability of error. We make a decoding error if and only if the error pattern is not a coset leader. Let $\alpha_i$ be the number of coset leaders of weight $i$. The numbers $\alpha_0, \alpha_1, \ldots, \alpha_n$ are called the weight distribution of the coset leaders. Using these numbers,
$$P(E) = 1 - \sum_{i=0}^{n} \alpha_i p^i (1 - p)^{n-i}.$$
Example 5 For the (7, 3) code presented above,
$$\alpha_0 = 1, \quad \alpha_1 = 7, \quad \alpha_2 = 7, \quad \alpha_3 = 1.$$
Then
$$P(E) = 1 - (1 - p)^7 - 7p (1 - p)^6 - 7p^2 (1 - p)^5 - p^3 (1 - p)^4.$$
If $p = 0.01$, then $P(E) \approx 1.4 \times 10^{-3}$. □
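The numbers in Examples 4 and 5 can be reproduced with a short sketch. The two function names are illustrative; `alpha` is the coset leader weight distribution quoted in Example 5.

```python
from math import comb

def block_error_bound(n, t, p):
    """Upper bound: probability of more than t errors in n uses of a BSC."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(t + 1, n + 1))

def block_error_exact(alpha, n, p):
    """1 minus the probability that the error pattern is a coset leader."""
    return 1 - sum(a * p ** i * (1 - p) ** (n - i) for i, a in enumerate(alpha))

p = 0.01
bound = block_error_bound(7, 1, p)              # Example 4, (7, 4) Hamming code
exact = block_error_exact([1, 7, 7, 1], 7, p)   # Example 5, (7, 3) code
print(bound, exact)
```

For the Hamming code the coset leaders are exactly the patterns of weight 0 and 1, so `block_error_exact([1, 7], 7, p)` coincides with the bound.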
There is an interesting relationship between the weight distribution of a code and that of its dual code. Let $A_0, A_1, \ldots, A_n$ be the weight distribution of a code $C$, and let $B_0, B_1, \ldots, B_n$ be the weight distribution of the dual code $C^\perp$. Also let
$$A(z) = A_0 + A_1 z + \cdots + A_n z^n.$$
This polynomial is called the weight enumerator for $C$. (Think of the Z transform of a finite series.) Let
$$B(z) = B_0 + B_1 z + \cdots + B_n z^n.$$
The following identity is known as the MacWilliams identity:
$$A(z) = 2^{-(n-k)} (1 + z)^n B\left( \frac{1 - z}{1 + z} \right).$$
The MacWilliams identity can be used to compute the probability of an undetected error for a linear code from the weight distribution of its dual. It can be shown that
$$P_u(E) = 2^{-(n-k)} B(1 - 2p) - (1 - p)^n.$$
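As a check on the MacWilliams identity, the sketch below expands $2^{-(n-k)} \sum_i B_i (1 - z)^i (1 + z)^{n-i}$ with plain polynomial arithmetic and applies it to the (7, 4) Hamming code, whose dual has one codeword of weight 0 and seven of weight 4 (read off from the dual codeword list in Example 1). The helper names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_pow(a, e):
    out = [1]
    for _ in range(e):
        out = poly_mul(out, a)
    return out

def macwilliams(B, n, k):
    """Weight distribution A of a binary (n, k) code from that of its dual, B."""
    total = [0] * (n + 1)
    for i, b in enumerate(B):
        # B_i (1 - z)^i (1 + z)^(n - i), a polynomial of degree n
        term = poly_mul(poly_pow([1, -1], i), poly_pow([1, 1], n - i))
        for j, coeff in enumerate(term):
            total[j] += b * coeff
    return [t // 2 ** (n - k) for t in total]

def undetected_from_dual(B, n, k, p):
    """P_u(E) = 2^-(n-k) B(1 - 2p) - (1 - p)^n."""
    return sum(b * (1 - 2 * p) ** i for i, b in enumerate(B)) / 2 ** (n - k) - (1 - p) ** n

B = [1, 0, 0, 0, 7, 0, 0, 0]   # dual of the (7, 4) Hamming code
A = macwilliams(B, 7, 4)
pu = undetected_from_dual(B, 7, 4, 0.01)
print(A, pu)
```

The recovered distribution matches Example 3, and the undetected error probability agrees with the value obtained directly from the $A_i$.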
6 Modifications to linear codes
We introduce some minor modifications to linear codes.
Definition 2 A code is punctured by deleting one of its parity symbols. An $(n, k)$ code becomes an $(n - 1, k)$ code. □
Definition 3 A code is shortened by deleting a message symbol. An $(n, k)$ code becomes an $(n - 1, k - 1)$ code. □
Definition 4 A code is expurgated by deleting some of its codewords (while still maintaining the linear code properties). □
Definition 5 A code is extended by adding an additional redundant coordinate, producing an $(n + 1, k)$ code. □
7 Bounds on linear block codes
Thinking geometrically, around each code point is a cloud of points corresponding to non-codewords. These form a sphere around each of the code vectors. The Hamming sphere is the sphere of radius $t$ that contains all vectors at a distance $\le t$ from a codeword. The number of vectors in the Hamming sphere is denoted $V_q(n, t)$ (where $q = 2$ for binary codes). We can see that
$$V_q(n, t) = \sum_{j=0}^{t} \binom{n}{j} (q - 1)^j.$$
From a decoding point of view, if a received vector $r$ falls inside the Hamming sphere of a codeword, then that codeword is selected.
The redundancy of a code is essentially the number of parity symbols in a codeword. More precisely we have
$$r = n - \log_q M$$
where $M$ is the number of codewords. For the codes we have seen to this point, $r = n - k$.
Theorem 2 (The Hamming Bound) A $t$-error correcting $q$-ary code must have redundancy $r$ satisfying
$$r \ge \log_q V_q(n, t).$$
Proof Each of the $M$ spheres around the codewords of $C$ has radius $t$. The spheres do not overlap. The total number of points enclosed by the spheres must be $\le q^n$. We must have
$$M V_q(n, t) \le q^n$$
so
$$q^n / M \ge V_q(n, t),$$
from which the result follows by taking $\log_q$ of both sides. □
A code that satisfies the Hamming bound with equality is said to be a perfect code. In terms of the standard array, it means that the random error correcting capability is the best it gets: every coset leader has weight at most $t$, so there are no leftover vectors. Hamming codes are perfect codes. (Actually, being perfect codes does not mean the codes are the best possible codes.) The set of perfect codes is actually quite limited, since (it can be shown that) the number of codewords of a perfect $q$-ary code must be $M = q^k$. Possible perfect codes:
1. The set of all $n$-tuples, with minimum distance = 1 and $t = 0$.
2. Odd-length binary repetition codes.
3. Hamming codes (linear) or other nonlinear codes with equivalent parameters.
4. The Golay $G_{23}$ code.
5. The $G_{11}$ and $G_{23}$ codes, related to quadratic residue codes.
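The sphere-packing condition for a perfect code, $M V_q(n, t) = q^n$, is easy to test numerically. The sketch below assumes the standard parameters (23, 12) with $t = 3$ for the binary Golay code and (11, 6) with $t = 2$ for the ternary Golay code; the function names are illustrative.

```python
from math import comb

def sphere_volume(q, n, t):
    """V_q(n, t): number of n-tuples within Hamming distance t of a given word."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(t + 1))

def is_perfect(q, n, k, t):
    """True when q^k spheres of radius t exactly fill the space of q^n words."""
    return q ** k * sphere_volume(q, n, t) == q ** n

print(is_perfect(2, 7, 4, 1))    # (7, 4) Hamming code
print(is_perfect(2, 23, 12, 3))  # binary Golay code (assumed parameters)
print(is_perfect(3, 11, 6, 2))   # ternary Golay code (assumed parameters)
print(is_perfect(2, 5, 2, 1))    # the (5, 2) code of Example 2
```

The (5, 2) code fails the test because its spheres of radius 1 cover only 24 of the 32 vectors, which is why its standard array has two coset leaders of weight 2.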
An upper bound on the redundancy is the Gilbert bound:
Theorem 3 There exists a $t$-error correcting $q$-ary code of length $n$ and redundancy $r$ that satisfies
$$r \le \log_q V_q(n, 2t).$$
The proof is interesting, because it demonstrates a powerful technique in coding, the random code. It also points out that the code need not be linear.
Proof There are $q^n$ possible $n$-tuples. Begin by selecting one of these at random, then delete from further consideration all vectors that are at Hamming distance less than or equal to $2t$ from the selected codeword. Repeat, selecting another codeword at random from those still available. The selection of each codeword results in the deletion of at most $V_q(n, 2t)$ vectors from the set of $q^n$ possible. Continuing, at least $M$ code vectors may be selected, where
$$M = \lceil q^n / V_q(n, 2t) \rceil \ge \frac{q^n}{V_q(n, 2t)}.$$
Hence $r = n - \log_q M \le \log_q V_q(n, 2t)$. □
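The selection procedure in the proof can be sketched as follows. For reproducibility this version scans the $n$-tuples in lexicographic order instead of choosing at random, which is a substitution made here and not the procedure of the notes; the counting argument is unchanged. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def greedy_code(n, t, q=2):
    """Keep every n-tuple farther than 2t from all codewords chosen so far."""
    code = []
    for v in product(range(q), repeat=n):
        if all(hamming_distance(v, c) > 2 * t for c in code):
            code.append(v)
    return code

def gilbert_guarantee(n, t, q=2):
    """ceil(q^n / V_q(n, 2t)), the number of codewords the proof guarantees."""
    volume = sum(comb(n, j) * (q - 1) ** j for j in range(2 * t + 1))
    return -(-q ** n // volume)  # integer ceiling

code = greedy_code(7, 1)
min_distance = min(hamming_distance(u, v) for u, v in combinations(code, 2))
print(len(code), gilbert_guarantee(7, 1), min_distance)
```

The guarantee is usually loose: the greedy scan typically finds more codewords than the bound promises.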
Of considerable theoretical interest is how families of codes perform as their block length becomes long. Consider the ratio
$$\delta = d_{\min}/n.$$
We would hope that as $n$ gets longer, $d_{\min}$ might grow correspondingly. In exploring this behavior, let $A_q(n, d_{\min})$ be the maximum possible number of codewords for a $q$-ary code of length $n$ and minimum distance $d_{\min}$. Then the number of information symbols is $\log_q A_q(n, d_{\min})$, and the rate is
$$\frac{\log_q A_q(n, d_{\min})}{n}.$$
We say that the asymptotic rate of the code is the limiting value of this:
$$a(\delta) = \limsup_{n \to \infty} \frac{\log_q A_q(n, d_{\min})}{n},$$
where $d_{\min} = \lfloor \delta n \rfloor$. We will now examine a bound on $a(\delta)$.
Definition 6 Let the $q$-ary entropy function be defined as
$$H_q(x) = x \log_q (q - 1) - x \log_q x - (1 - x) \log_q (1 - x)$$
for $0 < x < (q - 1)/q$. □
This function has the following property, which we will not prove here (see the book).
Lemma 4 For $0 \le \delta \le (q - 1)/q$,
$$\lim_{n \to \infty} \frac{\log_q V_q(n, \lfloor \delta n \rfloor)}{n} = H_q(\delta).$$
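We now employ this in a bound:
Theorem 5 (Gilbert-Varshamov bound) If $0 \le \delta \le (q - 1)/q$, then $a(\delta) \ge 1 - H_q(\delta)$.
Proof The error correction capability is
$$t = \lfloor (\delta n - 1)/2 \rfloor,$$
so that
$$V_q(n, 2t) \le V_q(n, \lfloor \delta n \rfloor).$$
Then
$$\begin{aligned} a(\delta) &= \limsup_{n \to \infty} \frac{\log_q A_q(n, \lfloor \delta n \rfloor)}{n} \\ &\ge \lim_{n \to \infty} \frac{1}{n} \log_q \left( \frac{q^n}{V_q(n, \lfloor \delta n \rfloor)} \right) && \text{(Gilbert bound)} \\ &= \lim_{n \to \infty} \left( 1 - \frac{\log_q V_q(n, \lfloor \delta n \rfloor)}{n} \right) \\ &= 1 - H_q(\delta) && \text{(previous lemma)} \end{aligned}$$
□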
We also can state an upper bound:
Theorem 6 (McEliece-Rodemich-Rumsey-Welch bound) For binary codes,
$$a(\delta) \le H_2\left( \frac{1}{2} - \sqrt{\delta (1 - \delta)} \right).$$
Plots of these bounds for binary codes are shown.
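The two asymptotic bounds for binary codes can be tabulated with a short sketch. The function names are illustrative, and the entropy function is extended by continuity to return 0 at $x = 0$, an assumption made so that the endpoint $\delta = 1/2$ can be evaluated.

```python
from math import log, sqrt

def entropy(x, q=2):
    """q-ary entropy function H_q(x), with H_q(0) taken as 0 by continuity."""
    if x <= 0:
        return 0.0
    value = x * log(q - 1, q) - x * log(x, q)
    if x < 1:
        value -= (1 - x) * log(1 - x, q)
    return value

def gilbert_varshamov(delta, q=2):
    """Lower bound on the asymptotic rate: 1 - H_q(delta)."""
    return 1 - entropy(delta, q)

def mrrw(delta):
    """Upper bound on the asymptotic rate for binary codes."""
    return entropy(0.5 - sqrt(delta * (1 - delta)), 2)

for delta in [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"{delta:4.2f}  GV >= {gilbert_varshamov(delta):.4f}  MRRW <= {mrrw(delta):.4f}")
```

The achievable asymptotic rate $a(\delta)$ lies between the two printed columns for each $\delta$.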
