
Math 550

Coding and Cryptography


Workbook
J. Swarts
0121709
Contents

1 Introduction and Basic Ideas 3


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Basic Assumptions About the Channel . . . . . . . . . . . . . . . . . . . . 5

2 Detecting and Correcting Errors 7

3 Linear Codes 13
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 The Generator and Parity Check Matrices . . . . . . . . . . . . . . . . . . 16
3.3 Cosets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.1 Maximum Likelihood Decoding (MLD) of Linear Codes . . . . . . . 21

4 Bounds for Codes 25

5 Hamming Codes 31
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2 Extended Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

6 Golay Codes 37
6.1 The Extended Golay Code : C24 . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2 The Golay Code : C23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

7 Reed-Muller Codes 43
7.1 The Reed-Muller Codes RM (1, m) . . . . . . . . . . . . . . . . . . . . . . 46

8 Decimal Codes 49
8.1 The ISBN Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.2 A Single Error Correcting Decimal Code . . . . . . . . . . . . . . . . . . . 51
8.3 A Double Error Correcting Decimal Code . . . . . . . . . . . . . . . . . . . 53


8.4 The Universal Product Code (UPC) . . . . . . . . . . . . . . . . . . . . . . 56


8.5 US Money Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.6 US Postal Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

9 Hadamard Codes 61
9.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
9.2 Definition of the Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
9.3 How good are the Hadamard Codes ? . . . . . . . . . . . . . . . . . . . . . 67

10 Introduction to Cryptography 71
10.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.2 Affine Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
10.2.1 Cryptanalysis of the Affine Cipher . . . . . . . . . . . . . . . . . . . 74
10.3 Some Other Ciphers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
10.3.1 The Vigenère Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . 76
10.3.2 Cryptanalysis of the Vigenère Cipher : The Kasiski Examination . 77
10.3.3 The Vernam Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

11 Public Key Cryptography 81


11.1 The Rivest-Shamir-Adleman (RSA) Cryptosystem . . . . . . . . . . . . . . 82
11.1.1 Security of the RSA System . . . . . . . . . . . . . . . . . . . . . . 85
11.2 Probabilistic Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . 86

12 The Rabin Cryptosystem 89


12.1 Security of the Rabin Cryptosystem . . . . . . . . . . . . . . . . . . . . . . 94

13 Factorisation Algorithms 97
13.1 Pollard’s p − 1 Factoring Method (ca. 1974) . . . . . . . . . . . . . . . . . 97

14 The ElGamal Cryptosystem 99


14.1 Discrete Logarithms (a.k.a indices) . . . . . . . . . . . . . . . . . . . . . . 99
14.2 The system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.3 Attacking the ElGamal System . . . . . . . . . . . . . . . . . . . . . . . . 101

15 Elliptic Curve Cryptography 105


15.1 The ElGamal System in Arbitrary Groups . . . . . . . . . . . . . . . . . . 105
15.2 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
15.3 The Menezes-Vanstone Cryptosystem . . . . . . . . . . . . . . . . . . . . . 110

16 The Merkle-Hellman “knapsack” System 113

17 The McEliece Cryptosystem 117

A Assignments 121
Part I Coding
Chapter 1

Introduction and Basic Ideas

1.1 Introduction
A model of a communication system is shown in Figure 1.1.

Figure 1.1: Model of a Communications System

The function of each block is as follows


✗ Source : This produces the output that we would like to transmit over the channel.
The output can be either continuous, for example voice, or discrete as in voltage
measurements taken at set times.

✗ Source Encoder : The source encoder transforms the source data into binary
digits (bits). Further, the aim of the source encoder is to minimise the number of
bits required to represent the source data. We have one of two options here in that
we could require that the source data be perfectly reconstructible (as in computer
data) or we may allow a certain level of errors in the reconstruction (as in images
or speech).

✗ Encryption : Is intended to make the data unintelligible to all but the intended
receiver. That is, it is intended to preserve the secrecy of the messages in the
presence of unwelcome monitoring of the channel. Cryptography is the science of
maintaining secrecy of data from both passive intrusion (eavesdropping) and active
intrusion (introduction or alteration of messages). This introduces the following
distinctions


✎ Authentication : is the corroboration that the origin of a message is as
claimed.

✎ Data Integrity : is the property that the data has not been changed in an
unauthorised way.

✗ Channel Encoder : Tries to maximise the rate at which information can be reliably
transmitted on the channel in the presence of disruptions (noise) that can introduce
errors. Error correction coding, located in the channel encoder, adds redundancy in
a controlled manner to messages to allow transmission errors to be detected and/or
corrected.

✗ Modulator : Transforms the data into a format suitable for transmission over the
channel.

✗ Channel : The medium that we use to convey the data. This could be in the form
of a wireless link (as in cell phones), a fiber optic link or even a storage medium
such as magnetic disks or CD’s.

✗ Demodulator : Converts the received data back into its original format (usually
bits).

✗ Channel Decoder : Uses the redundancy introduced by the channel encoder to


try and detect/correct any possible errors that the channel introduced.

✗ Decryption : Converts the data back into an intelligible form. At this point one
would also perform authentication and data integrity checks.

✗ Source Decoder : Adds back the original (naturally occurring) redundancy that
was removed by the source encoder.

✗ Receiver : The intended destination of the data produced by the source.

1.2 Terminology
Definition 1.2.1 (Alphabet). An alphabet is a finite, nonempty set of symbols. Usually
the binary alphabet K = {0, 1} is used and the elements of K are called binary digits or
bits.

Definition 1.2.2 (Word). A word is a finite sequence of elements from an alphabet.

Definition 1.2.3 (Code). A code is a set C of words (or codewords).



Example 1.2.4. C1 = {00, 01, 10, 11} and C2 = {000, 001, 01, 1} are both codes over the
alphabet K.

Definition 1.2.5 (Block Code). A block code has all codewords of the same length;
this number is called the length of the code.

In our example C1 is a (binary) block code of length 2, while C2 is not a block code.

Definition 1.2.6 (Prefix code). A prefix code is a code such that there do not exist
distinct words wi and wj such that wi is a prefix (initial segment) of wj .

C2 is an example of a prefix code.


Prefix codes can be decoded easily since no codeword is a prefix of any other. We
simply scan the word from left to right until we have a codeword, we then continue
scanning from this point on until we reach the next codeword. This is continued until we
reach the end of the received word.
Example 1.2.7. Suppose the words 000, 001, 01 and 1 correspond to N, S, E and W
respectively. If 001011100000101011 is received, the message is S E W W N S E E W.
Since C2 is a prefix code, there is only one way to decode this message.
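The left-to-right scanning procedure can be sketched in Python (the function name prefix_decode and the dictionary layout are illustrative choices, not from the text):

```python
# Left-to-right decoding of the prefix code C2 from Example 1.2.7.
# The codeword-to-symbol table matches the text (000, 001, 01, 1
# for N, S, E, W).
CODEBOOK = {"000": "N", "001": "S", "01": "E", "1": "W"}

def prefix_decode(received):
    """Emit a symbol as soon as the scanned bits form a codeword."""
    message, current = [], ""
    for bit in received:
        current += bit
        if current in CODEBOOK:
            # No codeword is a prefix of another, so this match is final.
            message.append(CODEBOOK[current])
            current = ""
    if current:
        raise ValueError("word does not end on a codeword boundary")
    return "".join(message)

print(prefix_decode("001011100000101011"))  # SEWWNSEEW
```

Because of the prefix property, emitting a symbol at the first match is safe: no longer codeword can extend a shorter one.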
Prefix codes can be used in a situation where not all data are equally likely — shorter
words are used to encode data that occur more frequently and longer words are used for
infrequent data. A good example of this is Morse code. In English, the letter E occurs
quite frequently and so in Morse code it is represented by a single dot ·. On the other
hand the letter Z rarely occurs and it is assigned the sequence − − · · of dots and dashes.

1.3 Basic Assumptions About the Channel


We make the following three assumptions about the channel

1. If n symbols are transmitted, then n symbols are received — though maybe not the
same ones. That is, nothing is added or lost.

2. There is no difficulty identifying the beginning of the first word transmitted.

3. Noise is scattered randomly rather than being in clumps (called bursts). That is,
the probability of error is fixed and the channel stays constant over time. Most
channels do not satisfy this property with one notable exception — the deep space
channel (used to transmit images and other data between spacecraft and earth). In

almost all other channels the probability of error varies over time. For example on
a CD, where certain areas are more damaged than others, and in cell phones where
the radio channel varies continually as the users move about their terrain.

A binary channel is called symmetric if 0 and 1 are transmitted with equal accuracy.
The reliability of such a channel is the probability p, 0 ≤ p ≤ 1, that the digit sent is the
digit received. The error probability q = 1 − p is the probability that the digit sent is
received in error.

Figure 1.2: The Binary Symmetric Channel

Example 1.3.1. If we are transmitting codewords of length 4 over a binary symmetric


channel (BSC) with reliability p and error probability q = 1 − p, then
Probability of no errors : (4 choose 0) p^4 q^0 ,
Probability of 1 error : (4 choose 1) p^3 q^1 ,
Probability of 2 errors : (4 choose 2) p^2 q^2 ,
Probability of 3 errors : (4 choose 3) p^1 q^3 ,
Probability of 4 errors : (4 choose 4) p^0 q^4 .
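These binomial probabilities are easy to tabulate; the sketch below uses an illustrative reliability p = 0.9, which is not a value from the text:

```python
# Probability of exactly k errors among n digits sent over a BSC with
# reliability p and error probability q = 1 - p (as in Example 1.3.1).
from math import comb

def prob_k_errors(n, k, p):
    q = 1 - p
    return comb(n, k) * p ** (n - k) * q ** k

p = 0.9  # illustrative reliability, chosen for the demonstration
for k in range(5):
    print(f"P({k} errors) = {prob_k_errors(4, k, p):.6f}")
```

The five probabilities sum to 1, as they must for a length-4 word.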
Chapter 2

Detecting and Correcting Errors

Errors can be detected when the received word is not a codeword. If a codeword is
received, then it could be that no errors, or several errors (changing one codeword into
another) have occurred.
Consider C1 = {00, 01, 10, 11}. Every received word is a codeword, so no errors can
be detected (and none can be corrected). On the other hand consider
C2 = {000000, 010101, 101010, 111111},
obtained from C1 by repeating each word in C1 three times — this is known as a repetition
code. Here we can detect 2 or fewer errors as we need 3 or more errors to change one
codeword into another.
Example 2.1.2. Let C2 = {000000, 010101, 101010, 111111} be the code defined above.
Assume that 010101 is sent, but that 110111 is received. Since 110111 is not a codeword
we detect that errors have occurred : comparing 110111 with the codewords of C2 , at
least 1 error (if 111111 was sent) or 2 errors (if 010101 was sent) must have occurred,
and more are also possible. Keep in mind that we as the receiver only know the received
word.
We can correct 1 error by using a majority rule : split the received word into three
pairs of bits and decode it to the codeword that repeats the pair occurring in the
majority of the three positions. So if 101010 is sent and 111010 is received, we decode
this to 101010. The idea is that if 1 error occurs, then there exists a unique codeword
that differs in 1 place from the received word; we decode to this word. If 2 errors occur,
the possibility exists that we may decode to the wrong codeword : if 010101 is sent and
110111 is received we will incorrectly decode this to 111111. Therefore this code cannot,
in general, correct 2 errors (specific cases may be possible, but not all possible patterns
of two errors are correctable).
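The majority rule can be sketched as follows (the function name majority_decode is mine; the worked words are the ones from the text):

```python
# Majority-rule decoding for C2 = {000000, 010101, 101010, 111111} :
# split the received word into three pairs and decide each bit of the
# underlying pair by majority vote across the pairs.
def majority_decode(w):
    pairs = [w[0:2], w[2:4], w[4:6]]
    data = ""
    for i in range(2):
        ones = sum(pair[i] == "1" for pair in pairs)
        data += "1" if ones >= 2 else "0"
    return data * 3  # repeat the decoded pair to recover a codeword

print(majority_decode("111010"))  # 101010 (one error corrected)
print(majority_decode("110111"))  # 111111 (two errors decode wrongly)
```

The second call shows the failure discussed above: two errors in 010101 lead the majority rule to the wrong codeword 111111.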
Consider C3 = {000, 011, 101, 110}, formed from C1 by adding a third digit to each
codeword such that the total number of ones in the resulting word is even. This is called


a parity check code and the digit that was added is called the parity check digit. This
code can detect 1 error as no two codewords differ in exactly one place, but it cannot
detect 2 errors since two errors may change one codeword into another codeword.
Example 2.1.3. Let C3 = {000, 011, 101, 110} be the parity check code from above. If 000
is sent and 001 is received we immediately detect an error as the received word has an
odd number of ones. Even if we knew that only one error had occurred (in general we
won't know how many errors have occurred), this would not help us decode the word
correctly, as 011, 101 and 000 could all have been the sent word.
If 011 is sent and 110 is received (i.e. two errors occurred) we will not detect an error
since the received word has the required number of ones.
If u and v are binary words of the same length, we define u + v to be the binary word
obtained by component-wise addition modulo 2. That is 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1
and 1 + 1 = 0.
Example 2.1.4. 01101 + 11001 = 10100

Definition 2.1.5 (Hamming weight). Let v be a binary word of length n. The Ham-
ming weight of v, denoted by wt(v), is the number of times the digit 1 occurs in v.

Definition 2.1.6 (Hamming distance). Let u and v be binary words of length n.


The Hamming distance between u and v, denoted by d(u, v), is the number of places
where u and v differ.

Note that d(u, v) = wt(u + v) — in the places where u and v differ a 1 will appear in
u + v and in the places where u and v are the same a 0 will appear in u + v — from the
way that + was defined.
Example 2.1.7. d(01011, 00111) = 2 = wt(01011 + 00111) = wt(01100) and
d(10110, 10110) = 0 = wt(10110 + 10110) = wt(00000).
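These operations are one line each in Python; a small sketch checking d(u, v) = wt(u + v) on the words of Example 2.1.7 (function names are mine):

```python
# Componentwise sum mod 2, Hamming weight and Hamming distance.
def word_add(u, v):
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

def wt(v):
    return v.count("1")

def d(u, v):
    return sum(a != b for a, b in zip(u, v))

# d(u, v) = wt(u + v), as noted in the text.
print(d("01011", "00111"), wt(word_add("01011", "00111")))  # 2 2
```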
Let C be a binary code of length n. If v ∈ C is sent and w is received, then the error
pattern is u = v + w. That is the error pattern indicates those positions where an error
occurred by a 1. Also since u = v + w, w = v + u by the addition defined earlier (addition
and subtraction are the same under this rule). Therefore, the received word equals the
transmitted word plus the error pattern.

Definition 2.1.8 (Detecting an error pattern). A code C detects the error pattern
u if u + v ∉ C ∀ v ∈ C.

Example 2.1.9. Let C = {001, 101, 110}. Then C detects the error pattern u = 010
because none of the codewords added to the error pattern is again a codeword : 001 + 010 =
011 ∉ C, 101 + 010 = 111 ∉ C and 110 + 010 = 100 ∉ C. On the other hand C
does not detect u = 100 because the codeword 001 added to u is again a codeword :
001 + 100 = 101 ∈ C.
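Definition 2.1.8 can be checked by brute force for small codes; a sketch reproducing Example 2.1.9 (the helper names are mine):

```python
def word_add(u, v):
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

def detects(C, u):
    # C detects u exactly when v + u is a non-codeword for every v in C.
    return all(word_add(v, u) not in C for v in C)

C = {"001", "101", "110"}
print(detects(C, "010"), detects(C, "100"))  # True False
```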

Definition 2.1.10 (Minimum distance of a code). For a code C with |C| ≥ 2, the
minimum distance of C, dmin (C), is the smallest distance between two distinct codewords.
That is

dmin (C) = min{ d(u, v) | u, v ∈ C, u ≠ v }.

Example 2.1.11. Let C = {0000, 1010, 0111}. Then

d(0000, 1010) = wt(0000 + 1010) = wt(1010) = 2,


d(0000, 0111) = wt(0000 + 0111) = wt(0111) = 3,
d(1010, 0111) = wt(1010 + 0111) = wt(1101) = 3.

Therefore the minimum distance of C is 2.
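Minimum distance by exhaustive pairwise comparison, reproducing Example 2.1.11. This is a sketch, quadratic in |C|, which is fine for the small codes in these notes:

```python
from itertools import combinations

def d(u, v):
    return sum(a != b for a, b in zip(u, v))

def dmin(C):
    # Smallest distance between two distinct codewords (Definition 2.1.10).
    return min(d(u, v) for u, v in combinations(C, 2))

print(dmin(["0000", "1010", "0111"]))  # 2
```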

Theorem 2.1.12. A code C can detect all nonzero error patterns of weight at most d − 1
⇐⇒ dmin (C) ≥ d.

Proof.
⇐:
Suppose C has dmin (C) ≥ d. Let u be a nonzero error pattern with wt(u) ≤ d − 1 and
v ∈ C. Suppose v is sent and w = v + u is received. Then d(v, w) = wt(v + w) =
wt(v + v + u) = wt(u) ≤ d − 1. Since C has minimum distance at least d, w ∉ C, as
any codeword other than v is at distance at least d from v.
Therefore C detects the error pattern u.
⇒:
Suppose C can detect all nonzero error patterns of weight at most d − 1. Let v, w ∈ C and
v ≠ w. Then u = v + w is not a detectable error pattern as v + u = v + v + w = w ∈ C.
Thus wt(u) = 0 or wt(u) ≥ d. Since v ≠ w, wt(u) ≠ 0. Therefore wt(u) ≥ d, so that
d(v, w) = wt(v + w) = wt(u) ≥ d. This shows that dmin (C) ≥ d since v and w were two
arbitrary, distinct codewords.

Definition 2.1.13 (t-error detecting code). A code C is a t-error detecting code if

1. C detects all nonzero error patterns of weight at most t.

2. C fails to detect some error pattern of weight t + 1.

So, by Theorem 2.1.12 a code C is t-error detecting if and only if C has minimum
distance t + 1.

Definition 2.1.14 (Correcting an error pattern). A code C is said to correct the


error pattern u if ∀ v ∈ C, v + u (the received word) is closer (in the sense of Hamming
distance) to v than to any other w ∈ C.

Example 2.1.15. Let C = {0000, 1010, 0111}. Then C corrects the error pattern u = 0100 :

• If 0000 is sent, then (with u as error pattern) 0000 + u = 0100 is received. This
received word is closer to 0000 than to any other codeword.

• If 1010 is sent, then 1010 + u = 1110 is received. This received word is closer to
1010 than to any other codeword.

• If 0111 is sent, then 0111 + u = 0011 is received. This received word is closer to
0111 than to any other codeword.

On the other hand C does not correct the error pattern 1000 : If 0000 is sent, then
0000 + 1000 = 1000 is received and d(1000, 0000) = 1 = d(1000, 1010). Thus there is more
than one codeword that is closest to the received word.
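Definition 2.1.14 can likewise be tested by brute force; the sketch below reproduces Example 2.1.15 (helper names are mine):

```python
def word_add(u, v):
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

def d(u, v):
    return sum(a != b for a, b in zip(u, v))

def corrects(C, u):
    # C corrects u when, for every sent v, the received word v + u is
    # strictly closer to v than to any other codeword.
    for v in C:
        w = word_add(v, u)
        if any(d(w, x) <= d(w, v) for x in C if x != v):
            return False
    return True

C = ["0000", "1010", "0111"]
print(corrects(C, "0100"), corrects(C, "1000"))  # True False
```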

Theorem 2.1.16. A code C will correct all error patterns of weight at most t ⇐⇒
dmin (C) ≥ 2t + 1.

Proof.
⇒:
Suppose that C corrects all error patterns of weight at most t, but that dmin (C) ≤ 2t.
Let v, w ∈ C such that d(v, w) = dmin (C) = d ≤ 2t. Let u be any error pattern obtained
from v + w by replacing 1's by 0's until only ⌈d/2⌉ 1's remain. If v is sent and the error
pattern u occurs, then
d(v, v + u) = wt(v + v + u) = wt(u) = ⌈d/2⌉ ,
d(w, v + u) = wt(w + v + u) = dmin (C) − ⌈d/2⌉ = ⌊d/2⌋ ≤ ⌈d/2⌉ .

The second to last step, wt(w + v + u) = dmin (C) − ⌈d/2⌉, follows from the fact that
wt(w + v) = dmin (C) and that u has its 1's in exactly the same locations where w + v
has some of its 1's. So in computing w + v + u, u will cancel exactly ⌈d/2⌉ of (w + v)'s
1's. Therefore the received word, v + u, is at least as close to w as to v. This implies that
C does not correct u, but u is an error pattern of weight ⌈d/2⌉ and ⌈d/2⌉ ≤ t, so C was
supposed to be able to correct u, a contradiction.
⇐:
Suppose C has dmin (C) ≥ 2t + 1. Let u be a nonzero error pattern of weight at most t. If
v ∈ C is sent and v + u is received, then d(v, v + u) = wt(v + v + u) = wt(u) ≤ t. Now for
any other w ∈ C, w ≠ v, d(w, v + u) = wt(w + v + u) ≥ 2t + 1 − wt(u) ≥ 2t + 1 − t = t + 1.
As above, wt(w + v + u) is bounded by realizing that wt(w + v) = d(w, v) ≥ 2t + 1 since
w and v are distinct codewords and dmin (C) ≥ 2t + 1. Further, u will be able to cancel
at most wt(u) ≤ t of (w + v)'s 1's. Therefore the received word v + u is closer to v than
to any other codeword w, w ≠ v. Thus v + u can be correctly decoded, showing that C
corrects all error patterns of weight at most t.

Definition 2.1.17 (t-error correcting code). A code C is a t-error correcting code if

1. C corrects all error patterns of weight at most t.

2. C fails to correct at least one error pattern of weight t + 1.


Chapter 3

Linear Codes

3.1 Introduction
Definition 3.1.1 (Linear code). A code C is called a linear code if u +v ∈ C whenever
u, v ∈ C.

Example 3.1.2. Let C1 = {000, 001, 101}. C1 is not a linear code since 001 + 101 = 100 ∉
C1 . Let C2 = {0000, 1001, 0110, 1111}, then C2 is linear : If v ∈ C2 , then v + v = 0000 ∈ C2
and also 0000 + u = u ∀ u ∈ C2 . Therefore we only have to consider the addition of two
nonzero, distinct words :
1001 + 0110 = 1111 ∈ C2 ,
1001 + 1111 = 0110 ∈ C2 ,
0110 + 1111 = 1001 ∈ C2 .
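Closure under addition is straightforward to verify mechanically; a sketch over the two codes of Example 3.1.2:

```python
def word_add(u, v):
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

def is_linear(C):
    # Definition 3.1.1: C is linear when u + v is in C for all u, v in C.
    return all(word_add(u, v) in C for u in C for v in C)

print(is_linear({"000", "001", "101"}))             # False
print(is_linear({"0000", "1001", "0110", "1111"}))  # True
```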

Theorem 3.1.3. The minimum distance of a linear code is the smallest weight of a
nonzero codeword.
Proof.
See assignment 1 in the appendix.
Recall that we let K = {0, 1} be the binary alphabet. If we define K n to be the set
of words (vectors) of length n over K, then K n together with the addition defined earlier
and scalar multiplication by elements of K is a vector space.
Let S ⊆ K n , say S = {v1 , v2 , . . . , vk }, then the subspace spanned by S (or generated
by S) is

⟨S⟩ = {w ∈ K n | w = α1 v1 + α2 v2 + · · · + αk vk , αi ∈ K},


if S ≠ ∅. If S = ∅, then ⟨S⟩ = {0}. Note that a linear code can also be thought of as a
subspace generated by some set S.
Example 3.1.4. Let S = {0100, 0011, 1100}. Then the subspace (code) generated by S,
C = ⟨S⟩, is

C = {w | w = α1 (0100) + α2 (0011) + α3 (1100), αi ∈ K}.

α1 α2 α3 w
0 0 0 0000
0 0 1 1100
0 1 0 0011
0 1 1 1111
1 0 0 0100
1 0 1 1000
1 1 0 0111
1 1 1 1011

Recall that two vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) in K n are
orthogonal if their dot product u · v = u1 v1 + u2 v2 + · · · + un vn = 0.
Example 3.1.5. If u = 11001 and v = 01101, then
u · v = 1 × 0 + 1 × 1 + 0 × 1 + 0 × 0 + 1 × 1 = 0 + 1 + 0 + 0 + 1 = 0,
so u and v are orthogonal, keeping in mind that addition is done mod 2.
If S ⊆ K n , a vector v ∈ K n is orthogonal to S if v · x = 0 ∀x ∈ S. The set of vectors
orthogonal to S is called the orthogonal complement of S and is denoted by S ⊥ .
Theorem 3.1.6. If V is a vector space and S ⊆ V (note a subset, not necessarily a
subspace), then S ⊥ is a subspace of V .
If S ⊆ K n and C = ⟨S⟩ (the linear code generated by S), we write C ⊥ = S ⊥ and call
C ⊥ the dual code of C.
Example 3.1.7. Let S = {0100, 0101}. Then C = ⟨S⟩ = {0000, 0101, 0100, 0001}. To find
C ⊥ we must find all words v = v1 v2 v3 v4 such that v · 0100 = 0 and v · 0101 = 0. It is
enough to ensure that v is orthogonal to the two basis vectors as v will also be orthogonal
to any linear combination of these vectors. This translates to
0v1 + 1v2 + 0v3 + 0v4 = 0,
0v1 + 1v2 + 0v3 + 1v4 = 0.

From the first equation we find that v2 = 0 and then from the second equation we see
that v4 = 0. Thus as long as v2 = v4 = 0, v will be orthogonal to C; v1 and v3 may
assume arbitrary values. This implies that

C ⊥ = {0000, 0010, 1000, 1010} = S ⊥ .
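For small n the dual code can also be found by brute force rather than by solving the linear equations by hand; a sketch reproducing Example 3.1.7 (function names are mine):

```python
from itertools import product

def dot(u, v):
    # Dot product over K = {0, 1}, i.e. mod 2.
    return sum(int(a) * int(b) for a, b in zip(u, v)) % 2

def dual(S, n):
    # All words of K^n orthogonal to every word in S.
    words = ("".join(bits) for bits in product("01", repeat=n))
    return {w for w in words if all(dot(w, x) == 0 for x in S)}

print(sorted(dual({"0100", "0101"}, 4)))
# ['0000', '0010', '1000', '1010']
```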

Definition 3.1.8 (Dimension). The dimension of a linear code C = ⟨S⟩ is the dimension
of the subspace ⟨S⟩. We denote this by dim(C).

The next result follows from linear algebra.

Theorem 3.1.9. If C is a linear code of length n, then dim(C) + dim(C ⊥ ) = n.

Proposition 3.1.10. A linear code C = ⟨S⟩ of dimension k contains exactly 2^k
codewords.

Proof.
Suppose C = ⟨S⟩ has dimension k and let {v1 , v2 , . . . , vk } be a basis for ⟨S⟩. Then each
codeword w ∈ C can be uniquely written as α1 v1 + α2 v2 + · · · + αk vk , αi ∈ K = {0, 1}.
Since there are 2 choices for each αi , there are 2^k choices for (α1 , α2 , . . . , αk ). Each such
choice gives a different codeword.

Definition 3.1.11 (Information rate). The information rate of a code C of length n is
defined as

i(C) = log2 |C| / n .

Corollary 3.1.12. The information rate of a linear code C of dimension k and length n
is k/n.

Proof.

i(C) = log2 |C| / n = log2 2^k / n = k/n.

3.2 The Generator and Parity Check Matrices


Let C = ⟨S⟩ be a linear code and {v1 , v2 , . . . , vk } be a set of basis vectors for C (as
in the proof of Proposition 3.1.10). Further, let G be the k × n matrix whose rows are
v1 , v2 , . . . , vk . Then the codeword w = α1 v1 + α2 v2 + · · · + αk vk = [α1 , α2 , . . . , αk ]G. Every
codeword arises in this way for some choice of [α1 , α2 , . . . , αk ]. If we therefore let the
binary words of length k correspond to our data, we can encode a word of length k simply
by computing [α1 , α2 , . . . , αk ]G.
Example 3.2.1. Let S = {11101, 10110, 01011, 11010} and C = ⟨S⟩. To be able to use C
for encoding we first need a basis for ⟨S⟩. We do this by writing the elements of S into a
matrix A (as its rows) and reducing this matrix to row echelon form :

A = [ 1 1 1 0 1 ]   [ 1 1 1 0 1 ]   [ 1 1 1 0 1 ]
    [ 1 0 1 1 0 ] → [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ]
    [ 0 1 0 1 1 ]   [ 0 1 0 1 1 ]   [ 0 0 1 1 1 ]
    [ 1 1 0 1 0 ]   [ 0 0 1 1 1 ]   [ 0 0 0 0 0 ]

Therefore a basis for ⟨S⟩ is {11101, 01011, 00111}. We now let

G = [ 1 1 1 0 1 ]
    [ 0 1 0 1 1 ]
    [ 0 0 1 1 1 ] .

Then G is what is known as a generator matrix for C (see Definition 3.2.2).


Suppose we have established the following correspondence.

A 000
E 001
H 010
K 011
L 100
M 101
P 110
X 111
Then the message HELP is encoded as a sequence of four codewords by computing

[010]G, [001]G, [100]G, [110]G = 01011, 00111, 11101, 10110.
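Encoding is just a vector-matrix product over K; the sketch below redoes the HELP computation (the function name encode is mine):

```python
def encode(data_bits, G):
    # Multiply the row vector data_bits by G, working mod 2.
    word = [0] * len(G[0])
    for bit, row in zip(data_bits, G):
        if bit == "1":
            word = [(a + b) % 2 for a, b in zip(word, row)]
    return "".join(map(str, word))

G = [[1, 1, 1, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1]]

table = {"H": "010", "E": "001", "L": "100", "P": "110"}
print([encode(table[c], G) for c in "HELP"])
# ['01011', '00111', '11101', '10110']
```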



Definition 3.2.2 (Generator matrix). A generator matrix for a linear code C = ⟨S⟩
is a matrix G whose rows are a basis for C.

The generator matrix has the following properties :

• The rows of G are linearly independent.

• The number of rows of G is equal to dim(C).

• Any linear code C has a generator matrix in row echelon form or reduced row echelon
form.

• Any matrix G whose rows are linearly independent is the generator matrix for some
linear code.

Definition 3.2.3 (Parity check matrix). A parity check matrix for a linear code C is
a matrix whose columns are a basis for the dual code C ⊥ .

Therefore H is a parity check matrix for C if and only if H^T is a generator matrix for
C ⊥ . Also, if C has length n and dim(C) = k, then C ⊥ has length n and dim(C ⊥ ) = n − k.
Thus H has n rows and n − k columns.

Theorem 3.2.4. Let C be a linear code of length n, dimension k and let H be a parity
check matrix for C. Then C = {w ∈ K n | wH = 0}.

Proof.
Let H = [x1 | x2 | x3 | · · · | xn−k ], where the columns of H, x1 , x2 , x3 , . . . , xn−k , are a
basis for C ⊥ . Then wH is the vector [w · x1 , w · x2 , w · x3 , . . . , w · xn−k ]. Now w ∈ K n has
wH = 0 if and only if each w · xi = 0. That is wH = 0 if and only if w is orthogonal to
each xi if and only if w ∈ (C ⊥ )⊥ = C.
Note that if G is a generator matrix for C and H a parity check matrix for C, GH = 0.
This follows from the fact that the rows of G are a basis for C, while the columns of H
are a basis for C ⊥ .
We now consider the following special case : Let C = ⟨S⟩ be a linear code of length n
and dimension k. If A is the matrix whose rows are the words in S and if A can be put in
reduced row echelon form (which is not always possible, hence the special case), that is

A → [ Ik X ]
    [ 0  0 ] ,

then G = [Ik | X] is a generator matrix for C. Note the dimensions of the submatrix X :
it has k rows and n − k columns.

Let

H = [ X    ]
    [ In−k ] ,

then

GH = [Ik | X] [ X    ]
              [ In−k ]
   = Ik X + X In−k
   = X + X = 0.
This implies that every column of H lies in C ⊥ , furthermore there are n − k linearly
independent columns (because of the identity matrix) and so the columns of H form a
basis for C ⊥ . Thus H is a parity check matrix for C.
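This construction is mechanical; a sketch that builds H from a standard form G = [Ik | X] and checks GH = 0 (mod 2), using the standard form generator matrix found in Example 3.2.5 (the function names are mine):

```python
def parity_check_from_standard(G):
    # G is k x n in the form [I_k | X]; return H, i.e. X stacked on I_{n-k}.
    k, n = len(G), len(G[0])
    X = [row[k:] for row in G]
    I = [[1 if i == j else 0 for j in range(n - k)] for i in range(n - k)]
    return X + I

def mat_mul_mod2(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

G = [[1, 0, 0, 0, 1],
     [0, 1, 0, 1, 1],
     [0, 0, 1, 1, 1]]
H = parity_check_from_standard(G)
print(H)                   # [[0, 1], [1, 1], [1, 1], [1, 0], [0, 1]]
print(mat_mul_mod2(G, H))  # [[0, 0], [0, 0], [0, 0]]
```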
So in this special case we can transform between the matrices HC ⊥ , GC ⊥ , GC and HC ,
which are the parity check and generator matrices for the codes C and C ⊥ , as shown in
the following diagram.

HC ⊥ ←−∗−→ GC ⊥
  ↕ T        ↕ T
GC   ←−∗−→  HC

Here T denotes transpose and ∗ denotes the operation described above.
Example 3.2.5. (This example is continued from the previous one.)
In the previous example we had S = {11101, 10110, 01011, 11010} and C = ⟨S⟩. We found
a basis for C by writing the elements of S into a matrix A (as its rows) and reducing A
to row echelon form. We now take this one step further and reduce A to reduced row
echelon form :

A → [ 1 1 1 0 1 ]   [ 1 1 0 1 0 ]   [ 1 0 0 0 1 ]
    [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ] → [ 0 1 0 1 1 ]  =  [ I3 X ]
    [ 0 0 1 1 1 ]   [ 0 0 1 1 1 ]   [ 0 0 1 1 1 ]     [ 0  0 ] .
    [ 0 0 0 0 0 ]   [ 0 0 0 0 0 ]   [ 0 0 0 0 0 ]

(The first matrix is the row echelon form of A found in the previous example.)

Therefore

G = [ 1 0 0 0 1 ]
    [ 0 1 0 1 1 ]  =  [ I3 | X ] ,
    [ 0 0 1 1 1 ]

is a (different) generator matrix for C and


 
H = [ 0 1 ]
    [ 1 1 ]
    [ 1 1 ]
    [ 1 0 ]
    [ 0 1 ] ,

is a parity check matrix for C. Note also that GH = 0.

Definition 3.2.6 (Equivalent codes). Let C1 and C2 be block codes of length n. If


there is a permutation of the n digits which, when applied to codewords in C1 gives C2 ,
then C1 and C2 are called equivalent.

Example 3.2.7. Let C1 be the linear code with generator matrix


G = [ 1 0 0 ]
    [ 0 0 1 ] ,

so that C1 = {000, 001, 100, 101} (this can easily be seen, as C1 is just all possible sums
of the rows of G).
Let C2 be obtained from C1 by exchanging the last two digits of every codeword of
C1. Then C2 = {000, 010, 100, 110}, so that C1 and C2 are equivalent.
Equivalent codes are exactly the same, except for the order of the digits.

Proposition 3.2.8. Any linear code C is equivalent to a (linear) code C 0 having a gen-
erator matrix in the standard form G0 = [Ik | X].

Example 3.2.9. (Continued from previous example.)


C1 's codewords all have a 0 in their second position. Therefore C1 cannot have a generator
matrix in the standard form [I2 | X], as such a generator matrix would generate codewords
that have a nonzero second digit : the second row of this matrix has a 1 in the second
digit (because of the I2 ) and this row itself is a codeword.

On the other hand


G2 = [ 1 0 0 ]
     [ 0 1 0 ] ,

is a generator matrix for C2 in standard form and by the previous example we know that
C1 and C2 are equivalent.
Let G = [Ik | X] be a standard form generator matrix for a code C, and let y ∈ K k be
encoded as w = yG = y[Ik | X] = [yIk | yX] = [y | yX]. Then from the last expression
it is clear that the first k digits of w carry the information being encoded (i.e. y) and the
last n − k digits are there to detect/correct errors. In this case the first k digits are known
as the information digits and the last n − k digits are known as the parity check digits.

3.3 Cosets
Recall from Algebra that (K n, +) may be regarded as a group and that a linear code
C ⊆ K n can be thought of as a subgroup of (K n , +).

Definition 3.3.1 (Coset). The (right) coset of C determined by a word u ∈ K n is the


set

C + u = {v + u | v ∈ C}.

Furthermore, from Algebra it also follows that distinct cosets partition K n . The
following result is well known.

Theorem 3.3.2. Let C ⊆ K n be a linear code and u, v ∈ K n . Then

1. u ∈ C + u.

2. u ∈ C + v ⇒ C + u = C + v.

3. u + v ∈ C ⇒ C + u = C + v.

4. u + v ∉ C ⇒ C + u ≠ C + v.

5. Either C + u = C + v or (C + u) ∩ (C + v) = ∅.

6. |C + u| = |C|.

7. dim(C) = k ⇒ |C + u| = 2^k ∀ u ∈ K n . That is, there exist 2^(n−k) different cosets.



8. C = C + 0 is a coset.

Example 3.3.3. Let C = {0000, 1011, 0101, 1110}. Then the cosets (in this case left and
right cosets are the same since addition is commutative) of C are :

C C + 1000 C + 0100 C + 0010


0000 1000 0100 0010
1011 0011 1111 1001
0101 1101 0001 0111
1110 0110 1010 1100

The cosets are determined as follows. By Theorem 3.3.2 number 8, C itself is always a
coset. We therefore write down its elements. Next, Theorem 3.3.2 number 5 guarantees
that the cosets partition K n . We now look for a word in K n that does not appear in any
of the cosets that we have already found and add it to the elements of C. This generates
a new coset, as none of its elements appears in any other coset (by number 5 of the
previous Theorem). We keep on repeating this process until we have accounted for all
the elements of K n .
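The procedure just described translates directly into code; a sketch that regenerates the coset table above (helper names are mine, and the coset representatives it picks may differ from the table, though the cosets themselves agree):

```python
from itertools import product

def word_add(u, v):
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

def cosets(C, n):
    # Repeatedly pick a word of K^n not yet covered and form C + u.
    seen, result = set(), []
    for bits in product("01", repeat=n):
        u = "".join(bits)
        if u not in seen:
            coset = {word_add(v, u) for v in C}
            result.append(coset)
            seen |= coset
    return result

C = {"0000", "1011", "0101", "1110"}
for cs in cosets(C, 4):
    print(sorted(cs))
```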

3.3.1 Maximum Likelihood Decoding (MLD) of Linear Codes


Suppose the word w is received. If there exists a unique codeword v such that d(w, v) is
minimum, then decode w as v. If more than one such codeword exists we have one of
two options.

• Complete Maximum Likelihood Decoding (CMLD) — we arbitrarily choose one
of the candidate codewords.

• Incomplete Maximum Likelihood Decoding (IMLD) — we request that the
codeword be sent again.

If C is a linear code and v ∈ C is sent and w ∈ K n is received, the error pattern


is u = v + w (remember that addition and subtraction are the same operation). Then
w + u = w + w + v = v ∈ C. By Theorem 3.3.2 number 3, w and u are in the same coset.
Now the received word makes it possible to determine the coset and from the coset we
should therefore be able to extract the error pattern.
Assuming that errors of small weight are more likely (since the channel is hopefully
not too bad), we choose as the error pattern, u, a word of least weight in the coset that

contains the received word (C + w in the notation above). We then decode w as w + u


(the error pattern cancels itself).
Example 3.3.4. (Continued from previous example.)
Suppose w = 1101 is received. Then w ∈ C + 1000 and the word of least weight in
C + 1000 is u = 1000. Therefore we decode w as w + u = 1101 + 1000 = 0101. Note that
the sum of two words in the same coset is always a codeword.
As a second example suppose w = 1111 is received. Then w ∈ C + 0100 and in this
case there are two words of least weight present in this coset. They are 0100 and 0001. If
we are using IMLD we will ask for a retransmission, otherwise using CMLD we arbitrarily
choose between 0100 and 0001 as the possible error pattern.
Two steps arise in the decoding of linear codes.

• Find the coset containing the received word.

• Find the word of least weight in this coset.

The need to perform these operations as quickly as possible prompts the following use of
the parity check matrix.

Definition 3.3.5 (Syndrome). Let C ⊆ K n be a linear code and suppose dim(C) = k.


Let H be a parity check matrix for C. For w ∈ K n , the syndrome of w is the word wH
(∈ K n−k ).

Example 3.3.6. (Continued from previous example.)


A parity check matrix for C is

         [ 1 1 ]
    H =  [ 0 1 ]
         [ 1 0 ]
         [ 0 1 ]

The syndrome for w = 1101 is wH = 11. In fact all x ∈ C + w have syndrome 11 (see
next Theorem).

Theorem 3.3.7. Let C ⊆ K n be a linear code with parity check matrix H. Then for
u, w ∈ K n

1. wH = 0 ⇐⇒ w ∈ C.

2. wH = uH ⇐⇒ w and u are in the same coset of C.



Proof.

1. See Theorem 3.2.4.

2.

wH = uH ⇐⇒ (w + u)H = 0,
⇐⇒ w + u ∈ C (by 1.),
⇐⇒ C + w = C + u.

We note the following.

• We can identify each coset by its syndrome. If C has dimension k, there are 2^(n−k)
cosets and 2^(n−k) syndromes. The fact that there are 2^(n−k) syndromes follows from
the form of the parity check matrix : its bottom part is an identity matrix of size
n − k, so every word of length n − k can be formed by summing an appropriate
selection of rows of H (which is what the product wH, that arises when calculating
the syndrome, amounts to).

• If C is a linear code, H its parity check matrix, v ∈ C is sent and the error pattern
u occurs. Then w = v + u will be received. The syndrome of w will be wH =
(v + u)H = vH + uH = 0 + uH = uH (since v is a codeword, vH = 0). Therefore
the syndrome of w will be the sum of those rows of H that correspond to the
positions in v where the errors occurred.

In summary, if C ⊆ K n is a linear code with parity check matrix H and w ∈ K n is


received we decode as follows.

1. Compute syndrome wH.

2. Identify the coset containing w from the syndrome.

3. Find the word of least weight, say u, in this coset.

4. Decode w as w + u.
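These four steps can be sketched for the running example (C = {0000, 1011, 0101, 1110} and its parity check matrix H); the brute-force coset search is my shortcut and is only practical for small codes:

```python
import itertools

def add(u, v):
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

def syndrome(w, H):
    # wH : sum (mod 2) of the rows of H selected by the 1's of w.
    s = [0] * len(H[0])
    for bit, row in zip(w, H):
        if bit == "1":
            s = [(a + b) % 2 for a, b in zip(s, row)]
    return "".join(map(str, s))

H = [(1, 1), (0, 1), (1, 0), (0, 1)]      # parity check matrix of the example
words = ["".join(b) for b in itertools.product("01", repeat=4)]

def decode(w):
    s = syndrome(w, H)
    # The coset of w is the set of words with the same syndrome (Theorem 3.3.7).
    coset = [u for u in words if syndrome(u, H) == s]
    leader = min(coset, key=lambda u: u.count("1"))   # CMLD : ties broken arbitrarily
    return add(w, leader)
```

For w = 1101 this recovers the codeword 0101, as in Example 3.3.4.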

Definition 3.3.8 (Standard decoding array, Coset leader). A standard decoding ar-
ray (SDA) is a table that matches each syndrome with a word of least weight — the coset
leader — in the corresponding coset.

Example 3.3.9. (Continued from previous example.)


The standard decoding array from the last example is.

Syndrome Coset Leader


00 0000
01 0100 ❖
10 0010
11 1000

Here ❖ serves to indicate that the coset contains more than one word of least weight.
Therefore we either ask for a retransmission (IMLD) or choose one of the two words
arbitrarily (CMLD).
Chapter 4

Bounds for Codes

In this chapter we consider a variety of bounds on the parameters of a code. Note that
we do not restrict our attention only to linear codes. We will explicitly indicate when a
code is linear. We begin with the following Proposition.

Proposition 4.0.10. If 0 ≤ t ≤ n and v ∈ K^n, then the number of words w ∈ K^n such
that d(v, w) ≤ t is

    (n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t).

Proof.
Note that (n choose i) counts the number of words that differ from v in exactly i places. The final
result follows by summing over all possible values of i.

Theorem 4.0.11 (Hamming bound [?]). If C ⊆ K^n is a code with minimum distance
dmin = 2t + 1 or dmin = 2t + 2 then

    |C| [(n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t)] ≤ 2^n,

or

    |C| ≤ 2^n / [(n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t)].

Proof.
For v ∈ C define the set

Bt (v) = {w ∈ K n | d(v, w) ≤ t},


to be the ball of radius t centred at v. If u, v ∈ C and u ≠ v, then Bt(u) ∩ Bt(v) = ∅ :
Assume some w ∈ Bt(u) ∩ Bt(v). That is, d(u, w) ≤ t and d(v, w) ≤ t. This implies
that d(u, v) ≤ d(u, w) + d(w, v) ≤ t + t = 2t. This contradicts dmin ≥ 2t + 1.
The above implies that every w ∈ K^n lies in at most one ball Bt(v) for some v ∈ C
(it might not lie in any ball). Therefore

    Σ_{v∈C} |Bt(v)| ≤ |K^n| = 2^n.

By Proposition 4.0.10

    |Bt(v)| = (n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t),

so that

    |C| [(n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t)] ≤ 2^n.

Example 4.0.12. A code C of length 6 and minimum distance 3 (dmin = 2t + 1 = 3 ⇒
t = 1) has

    |C| ≤ 2^6 / [(6 choose 0) + (6 choose 1)] = 64/7 = 9 1/7.

Since |C| ∈ N ∪ {0}, we have |C| ≤ 9. If C is a linear code then |C| is a power of two
(Proposition 3.1.10). In this case |C| ≤ 8 and dim(C) ≤ 3.
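The bound is easy to compute mechanically (a sketch; the function name is mine, and the floor is taken since |C| is an integer):

```python
from math import comb

def hamming_bound(n, d):
    """Largest |C| allowed by the Hamming bound for length n, minimum distance d."""
    t = (d - 1) // 2                       # d = 2t+1 and d = 2t+2 both give this t
    ball = sum(comb(n, i) for i in range(t + 1))
    return 2**n // ball                    # floor, since |C| is an integer
```

For the example above, hamming_bound(6, 3) returns 9.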

Theorem 4.0.13 (Gilbert-Varshamov bound [?, ?]). If

    (n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2) < 2^(n−k),

then there exists a linear code of length n, dimension k and minimum distance at least d.

Proof.
Suppose the inequality holds. We construct a parity check matrix H for the proposed
code. Such a matrix H would have n rows and n − k linearly independent columns. To

ensure a minimum distance of d we need to construct H in such a way that any d − 1


rows are linearly independent (and some collection of d rows linearly dependent) — see
Exercises.
In the last n − k rows of H, put the identity matrix In−k . This guarantees that
the columns of H are linearly independent. The remainder of the proof is by induction.
Suppose the last l rows of H have been completed, where n − k ≤ l ≤ n − 1. We claim
that another row can be added :
Among the 2^(n−k) possibilities for the rows of H we cannot select the following.

• The zero row.

• Any row that is a sum of 1, 2, . . . , d − 2 rows already chosen.

The reason is that any such choice would create a linearly dependent set of size at most
d − 1 while we are trying to construct linearly independent sets of size d − 1. The number
of rows that are not allowed is

    1 + (l choose 1) + (l choose 2) + · · · + (l choose d−2)    (the zero row, any single row, any sum of 2 rows, . . . , any sum of d − 2 rows)
    = (l choose 0) + (l choose 1) + (l choose 2) + · · · + (l choose d−2)
    ≤ (n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2)
    < 2^(n−k).

Here we used the fact that if l ≤ n − 1, then (l choose i) ≤ (n−1 choose i). Therefore another row is
available and so by induction H can be constructed.
Since the columns of H are linearly independent this implies that the dual code has
dimension n − k which in turn implies that the code itself has dimension k. By selecting
the last row to be the sum of some d − 1 rows of H, we guarantee that the minimum
distance equals d.

Corollary 4.0.14. If n ≠ 1 and d ≠ 1 (since we are assuming d ≤ n — a code of length
n cannot have minimum distance greater than n), then there exists a linear code C of
length n and minimum distance at least d with

    |C| ≥ 2^(n−1) / [(n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2)].

Proof.
Let k be the largest integer less than or equal to n such that

    (n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2) < 2^(n−k).

Such an integer k exists since the inequality holds in at least one case, namely k = 0. For
this k we have by Theorem 4.0.13 a linear code C with |C| = 2^k and

    2^k [(n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2)] < 2^k 2^(n−k) = 2^n.

Since we chose k to be the largest integer such that the inequality holds, the inequality
will be reversed for k + 1. Therefore

    2^k < 2^n / [(n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2)] ≤ 2^(k+1),

or written differently

    |C| = 2^k ≥ 2^(n−1) / [(n−1 choose 0) + (n−1 choose 1) + (n−1 choose 2) + · · · + (n−1 choose d−2)].
d−2

Definition 4.0.15 ([n, k, d]-code). An [n, k, d]-code is a linear code with length n, di-
mension k and minimum distance d.

Example 4.0.16.
(a) Is there a [9, 2, 5]-code ?
According to the Gilbert-Varshamov bound such a code will exist if

    (8 choose 0) + (8 choose 1) + (8 choose 2) + (8 choose 3) = 1 + 8 + 28 + 56 = 93,

is less than 2^(9−2) = 2^7 = 128. Since the inequality holds we know such a code exists.

(b) Does there exist a [15, 7, 5]-code ?
Using the Gilbert-Varshamov bound again we find that

    (14 choose 0) + (14 choose 1) + (14 choose 2) + (14 choose 3) = 1 + 14 + 91 + 364 = 470,

is not less than 2^(15−7) = 2^8 = 256 and so we cannot reach any conclusion based on
this bound alone.

(c) Find bounds on the size and dimension of a linear code C with n = 9 and dmin(C) = 5.
Using the Hamming bound we get (dmin(C) = 2t + 1 = 5 ⇒ t = 2)

    |C| ≤ 2^9 / [(9 choose 0) + (9 choose 1) + (9 choose 2)] = 512/(1 + 9 + 36) = 512/46 ≈ 11.1.

Since C is linear, |C| = 2^k so that in fact |C| ≤ 8 and k ≤ 3.

The Corollary to the Gilbert-Varshamov bound says that

    |C| ≥ 2^8 / [(8 choose 0) + (8 choose 1) + (8 choose 2) + (8 choose 3)] = 256/93 ≈ 2.8.

Since C is linear, |C| = 2^k so that |C| ≥ 4 and k ≥ 2.
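The Gilbert-Varshamov condition in examples (a) and (b) can be checked mechanically (the function name is mine):

```python
from math import comb

def gv_guarantees(n, k, d):
    """True when the Gilbert-Varshamov condition holds, so a linear code of
    length n, dimension k and minimum distance at least d must exist."""
    return sum(comb(n - 1, i) for i in range(d - 1)) < 2**(n - k)
```

Note that a False result is inconclusive : the bound is only a sufficient condition.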

Definition 4.0.17 (Perfect code). A code C of length n is called perfect if equality
holds in the Hamming bound. That is, if

    |C| = 2^n / [(n choose 0) + (n choose 1) + (n choose 2) + · · · + (n choose t)],

where dmin(C) = 2t + 1 or dmin(C) = 2t + 2.


In other words the balls of radius t centred at each codeword partition the space K n
— every word in K n is in exactly one such ball. It actually turns out that a perfect
code cannot have an even minimum distance : Let C be a code with minimum distance
dmin (C) = 2t + 2. Let v ∈ C and change v in t + 1 places to obtain a new word z.
Therefore d(v, z) = t + 1. Let u ∈ C, with u 6= v. Then d(u, v) ≤ d(u, z) + d(z, v) or
d(u, z) ≥ d(u, v) − d(z, v) ≥ 2t + 2 − (t + 1) = t + 1. This implies that z is a distance of
at least t + 1 from every codeword in C. Therefore z is in no ball of radius t centred at
the codewords in C. Thus C is not perfect.
Example 4.0.18.

(a) K^n is perfect : dmin(K^n) = 1 = 2(0) + 1 ⇒ t = 0 and

    2^n = |K^n| = 2^n / (n choose 0).

(b) The code C = {000 · · · 0, 111 · · · 1} with length n = 2t + 1 and dmin(C) = 2t + 1 is
perfect :

    2 = |C| = 2^n / [(n choose 0) + (n choose 1) + · · · + (n choose t)] = 2^n / ((1/2) 2^n) = 2.

Here we used the fact that if n = 2t + 1, then (n choose 0) + (n choose 1) + · · · + (n choose t) only includes
half of the terms in (n choose 0) + (n choose 1) + · · · + (n choose t) + (n choose t+1) + · · · + (n choose n) = 2^n, and by the
symmetry (n choose i) = (n choose n−i) this half sums to 2^(n−1).

The two examples above — K n and the repetition code — are called the trivial perfect
codes.

Theorem 4.0.19 (Tietäväinen and van Lint [?, ?]). If C is a nontrivial perfect code
of length n and minimum distance dmin(C) = 2t + 1 then either

1. n = 2^r − 1 for some r ≥ 2 and dmin(C) = 3, or

2. n = 23 and dmin(C) = 7.

Note the following.

• If C is a perfect code of length n and d = 2t + 1, then every w ∈ K n is within a


distance t of some codeword. Thus C can correct all error patterns of weight at
most t and no others.

• By Theorem 4.0.19, a nontrivial perfect code is either

1. of length 2^r − 1 (r ≥ 2) and 1-error correcting, or

2. of length 23 and 3-error correcting.
Chapter 5

Hamming Codes

5.1 Introduction
In this chapter we focus our attention on one of the first classes of codes that was discov-
ered, namely the Hamming codes. We do this by describing their parity check matrices.

Definition 5.1.1 (Hamming codes [?]). Let r ≥ 2 and let H be the (2^r − 1) × r matrix
whose rows are the nonzero binary words of length r. The linear code that has H as its
parity check matrix is called the Hamming code of length 2^r − 1.

Note that H has linearly independent columns since r of its rows contain those binary
words with only a single 1. In the example below (for the case r = 3) these rows are the
last three rows.
Example 5.1.2. If we let r = 3 in the definition above, we find that the Hamming code
of length 7 has the following parity check matrix.

        [ 1 1 1 ]
        [ 1 1 0 ]
        [ 1 0 1 ]
    H = [ 0 1 1 ]
        [ 1 0 0 ]
        [ 0 1 0 ]
        [ 0 0 1 ]

In the example above we see that the matrix is in standard form. It will always be
possible to place it in this form by ensuring that the appropriate rows are at the bottom


of the matrix. If

    H = [ X  ]
        [ Ir ],

then the generator matrix will be of the form

    G = [ I_(2^r−1−r) | X ].

Therefore the Hamming code has dimension 2^r − r − 1, so that it has 2^(2^r−r−1) codewords.
Since H has no row equal to zero and no two identical rows, every set of 2 rows is lin-
early independent, so that the minimum distance is at least three. On the other hand there
is a linearly dependent set of three rows, for example {1000 · · · 00, 0100 · · · 00, 1100 · · · 00}.
This implies that the minimum distance of the Hamming code is exactly three, i.e. it is a
1-error correcting code.
Let us consider the Hamming bound in this case : here n = 2^r − 1 and dmin = 3 =
2t + 1 ⇒ t = 1. Therefore

    2^(2^r−r−1) = |C| ≤ 2^(2^r−1) / [(2^r−1 choose 0) + (2^r−1 choose 1)]
                     = 2^(2^r−1) / (1 + (2^r − 1))
                     = 2^(2^r−1) / 2^r
                     = 2^(2^r−r−1).

Therefore Hamming codes are perfect.
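A quick machine check of this computation for several values of r (the function name is mine):

```python
from math import comb

def is_perfect(M, n, t):
    # Equality in the Hamming bound for M codewords of length n correcting t errors.
    return M * sum(comb(n, i) for i in range(t + 1)) == 2**n

# Every Hamming code (n = 2^r - 1, 2^(2^r - r - 1) codewords, t = 1) is perfect.
for r in range(2, 8):
    n = 2**r - 1
    assert is_perfect(2**(n - r), n, 1)
```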
It's easy to make up a standard decoding array for a Hamming code : the coset leaders
are the words of weight at most 1 (see the exercises). All we therefore need to do is
compute the syndrome of each one.
Example 5.1.3. (Continued from previous example.)

Syndrome Coset Leader


000 0000000
111 1000000
110 0100000
101 0010000
011 0001000
100 0000100
010 0000010
001 0000001

Recall that equivalent codes have exactly the same parameters. Consider then the
Hamming code whose parity check matrix has as rows the nonzero binary words of length
r in numerical order. For the example above this would mean that

         [ 0 0 1 ]
         [ 0 1 0 ]
         [ 0 1 1 ]
    H′ = [ 1 0 0 ]
         [ 1 0 1 ]
         [ 1 1 0 ]
         [ 1 1 1 ]

would be the new parity check matrix.


Let C be the Hamming code of length 2^r − 1 that has its parity check matrix's rows
in numerical order. Let v ∈ C be sent and let u be an error pattern in K^(2^r−1). Then
w = u + v will be received and the syndrome will be wH = (u + v)H = uH + vH = uH
since v is a codeword and has zero syndrome. Thus if u is an error pattern of weight 1
(and C can only correct these) the syndrome will be equal to some row of H. By the
ordering of the rows we know in fact that if u has a 1 in the i'th position then uH will
be the i'th row of H, which equals the number i expressed in binary. We can therefore
decode w by merely complementing the i'th bit.
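This decoding rule is easy to sketch : with the rows in numerical order, the syndrome wH is just the XOR of the (1-based) positions of the 1's of w, read as a binary number (the function name is mine):

```python
def hamming_decode(w, r):
    """Decode under the length 2^r - 1 Hamming code whose parity check rows are
    the numbers 1, ..., 2^r - 1 written in binary, in numerical order."""
    s = 0
    for i, bit in enumerate(w, start=1):    # positions are 1-based
        if bit == "1":
            s ^= i                          # wH = XOR of the selected row numbers
    if s == 0:
        return w                            # zero syndrome : w is a codeword
    # Otherwise s is the position of the (single) error : complement that bit.
    return w[:s - 1] + ("0" if w[s - 1] == "1" else "1") + w[s:]
```

For example, 0010110 is a codeword of this equivalent code (3 XOR 5 XOR 6 = 0), and flipping any single bit of it is corrected.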

5.2 Extended Codes


Recall that if a code C has a generator matrix G = [Ik | X], then the first k bits of a
codeword are the information bits and the last n − k bits are the parity check bits.
Example 5.2.1. (Hamming code.)
Recall that the Hamming code of length 7 has the following generator matrix

        [ 1 0 0 0 1 1 1 ]
    G = [ 0 1 0 0 1 1 0 ]
        [ 0 0 1 0 1 0 1 ]
        [ 0 0 0 1 0 1 1 ]

Take x1 x2 x3 x4 ∈ K^4 and multiply it against G to get the codeword x1 x2 x3 x4 p1 p2 p3. We
then see that

    p1 = x1 + x2 + x3,
    p2 = x1 + x2 + x4,
    p3 = x1 + x3 + x4.

Definition 5.2.2 (Extended code). Let C be a linear code of length n and C* a code
obtained from C by adding one extra digit to each codeword so that each word in C* has
even weight. Then C* is called the extended code of C.

If C is a linear code with parity check matrix H, consider the following matrix

    H* = [ H  j ]
         [ 0  1 ],

obtained from H by adding a column of 1's to H and then adding a row that is zero
everywhere except in its last entry. Thus the final column of H* is made up of 1's.
Then H ∗ has n − k + 1 linearly independent columns and for every v ∈ C ∗, vH ∗ = 0.
Furthermore, dim(C ∗ ) = k : We can use the same set of basis vectors for C ∗ that was used
for C by adding a digit to each one of the basis vectors such that each new basis vector
has even weight. Each new codeword is still a sum of these new basis vectors. Therefore
C ∗ is the null space of H ∗ , implying that C ∗ is a linear code. As discussed above the
generator matrix, G∗ , for C ∗ can be obtained from the generator matrix of C by adding
a bit to every row such that each row has even weight. We then again have

    G* H* = [G | i] [ H  j ]  =  [ GH | c ]  =  0,
                    [ 0  1 ]

where i represents the column that was added to G so that each row of G* now has an
even number of 1's, and c is the column whose entries are the sums of the entries of the
corresponding rows of [G | i]. Each entry of c is zero since each row of [G | i] has an
even number of 1's, and of course we also have GH = 0.
Example 5.2.3. Extending the Hamming code of length 7 we get the following parity

check and generator matrices

         [ 1 1 1 1 ]
         [ 1 1 0 1 ]
         [ 1 0 1 1 ]
    H* = [ 0 1 1 1 ]
         [ 1 0 0 1 ]
         [ 0 1 0 1 ]
         [ 0 0 1 1 ]
         [ 0 0 0 1 ]

         [ 1 0 0 0 1 1 1 0 ]
    G* = [ 0 1 0 0 1 1 0 1 ]
         [ 0 0 1 0 1 0 1 1 ]
         [ 0 0 0 1 0 1 1 1 ]

If v ∈ C and v* is the corresponding codeword in C* then

    wt(v*) = wt(v)      if wt(v) is even,
             wt(v) + 1  if wt(v) is odd.

Therefore

    dmin(C*) = dmin(C)      if dmin(C) is even,
               dmin(C) + 1  if dmin(C) is odd.

Thus if dmin (C) is odd, then C ∗ will detect one more error than C. This implies that
extending C is only useful when dmin (C) is odd.
The extended Hamming code of length 8, Ce , has a minimum distance of 4 (it was
3 before it was extended). Looking back at the example above in which the generator
and parity check matrices were given for this code, we see that any two rows of G∗ are
orthogonal (even if the rows aren’t distinct). Therefore

G∗ (G∗ )T = 0.

This implies that the rows of G∗ are all in the dual code, Ce⊥ . The dual code has dimension
4 and G∗ has 4 linearly independent rows. Therefore the rows of G∗ are (also) a basis for
Ce⊥ . This shows that the extended Hamming code of length 8 is self-dual : Ce = Ce⊥ .
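Both observations — each extended generator row has even weight, and any two rows of G* are orthogonal, so G*(G*)^T = 0 — can be checked mechanically (rows written as strings; the helper names are mine):

```python
G = ["1000111", "0100110", "0010101", "0001011"]   # generator of the Hamming code

def extend(row):
    # Append a parity bit so that every extended row has even weight.
    return row + str(row.count("1") % 2)

G_ext = [extend(row) for row in G]                 # rows of G*

def dot(u, v):
    # Inner product over K (mod 2).
    return sum(int(a) & int(b) for a, b in zip(u, v)) % 2

# G*(G*)^T = 0 : every pair of rows (distinct or not) is orthogonal.
all_orthogonal = all(dot(u, v) == 0 for u in G_ext for v in G_ext)
```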
Chapter 6

Golay Codes

6.1 The Extended Golay Code : C24.


The extended Golay code, C24 , is a [24, 12, 8]-code. We describe this code by giving its
generator matrix. The construction of the generator matrix proceeds in three steps.

Step 1 : Let B1 be the 11 × 11 matrix whose first row is 11011100010 and each subsequent
row is obtained by cyclically shifting its predecessor one position left. That is

         [ 1 1 0 1 1 1 0 0 0 1 0 ]
         [ 1 0 1 1 1 0 0 0 1 0 1 ]
         [ 0 1 1 1 0 0 0 1 0 1 1 ]
         [ 1 1 1 0 0 0 1 0 1 1 0 ]
         [ 1 1 0 0 0 1 0 1 1 0 1 ]
    B1 = [ 1 0 0 0 1 0 1 1 0 1 1 ]
         [ 0 0 0 1 0 1 1 0 1 1 1 ]
         [ 0 0 1 0 1 1 0 1 1 1 0 ]
         [ 0 1 0 1 1 0 1 1 1 0 0 ]
         [ 1 0 1 1 0 1 1 1 0 0 0 ]
         [ 0 1 1 0 1 1 1 0 0 0 1 ]

Step 2 : Let B be the 12 × 12 matrix

    B = [ B1  j^T ]
        [ j   0  ],


where j is the all 1's vector. Therefore

        [ 1 1 0 1 1 1 0 0 0 1 0 1 ]
        [ 1 0 1 1 1 0 0 0 1 0 1 1 ]
        [ 0 1 1 1 0 0 0 1 0 1 1 1 ]
        [ 1 1 1 0 0 0 1 0 1 1 0 1 ]
        [ 1 1 0 0 0 1 0 1 1 0 1 1 ]
    B = [ 1 0 0 0 1 0 1 1 0 1 1 1 ]
        [ 0 0 0 1 0 1 1 0 1 1 1 1 ]
        [ 0 0 1 0 1 1 0 1 1 1 0 1 ]
        [ 0 1 0 1 1 0 1 1 1 0 0 1 ]
        [ 1 0 1 1 0 1 1 1 0 0 0 1 ]
        [ 0 1 1 0 1 1 1 0 0 0 1 1 ]
        [ 1 1 1 1 1 1 1 1 1 1 1 0 ]

Note the following.

1. B = B^T (B1 is symmetric).

2. BB^T = BB = I.

Step 3 : The extended Golay code, C24, is the linear code with generator matrix

    G = [ I12 | B ]

      [ 1 0 0 0 0 0 0 0 0 0 0 0   1 1 0 1 1 1 0 0 0 1 0 1 ]
      [ 0 1 0 0 0 0 0 0 0 0 0 0   1 0 1 1 1 0 0 0 1 0 1 1 ]
      [ 0 0 1 0 0 0 0 0 0 0 0 0   0 1 1 1 0 0 0 1 0 1 1 1 ]
      [ 0 0 0 1 0 0 0 0 0 0 0 0   1 1 1 0 0 0 1 0 1 1 0 1 ]
      [ 0 0 0 0 1 0 0 0 0 0 0 0   1 1 0 0 0 1 0 1 1 0 1 1 ]
    = [ 0 0 0 0 0 1 0 0 0 0 0 0   1 0 0 0 1 0 1 1 0 1 1 1 ]
      [ 0 0 0 0 0 0 1 0 0 0 0 0   0 0 0 1 0 1 1 0 1 1 1 1 ]
      [ 0 0 0 0 0 0 0 1 0 0 0 0   0 0 1 0 1 1 0 1 1 1 0 1 ]
      [ 0 0 0 0 0 0 0 0 1 0 0 0   0 1 0 1 1 0 1 1 1 0 0 1 ]
      [ 0 0 0 0 0 0 0 0 0 1 0 0   1 0 1 1 0 1 1 1 0 0 0 1 ]
      [ 0 0 0 0 0 0 0 0 0 0 1 0   0 1 1 0 1 1 1 0 0 0 1 1 ]
      [ 0 0 0 0 0 0 0 0 0 0 0 1   1 1 1 1 1 1 1 1 1 1 1 0 ]

We now have

1. C24 has length 24 and dimension 12.



2. The matrix

    [ B   ]
    [ I12 ],

is a parity check matrix for C24.

3. Furthermore

    H = [ I12 ]
        [ B   ],

is also a parity check matrix for C24 (with respect to the same generator matrix G) :

    GH = [ I12 | B ] [ I12 ] = I12 + BB = I + I = 0,
                     [ B   ]

and H has 12 linearly independent columns (dim(C24^⊥) = 12). Therefore H is a
parity check matrix.

4. Another generator matrix for C24 is [B | I12].

5. C24 is self-dual :

    G G^T = G H = 0,

therefore every basis vector for C24 belongs to C24^⊥. Further, dim(C24) = dim(C24^⊥) =
12 so that the basis vectors for C24 are also basis vectors for C24^⊥. This implies that
C24 = C24^⊥.

6. C24 has minimum distance equal to 8. We prove this in three steps.

(a) The weight of any codeword is a multiple of 4 :


Note that every row of G has weight 8 or 12. Let v ∈ C24 be a sum of two
different rows, ri and rj , of G. Since any two different rows of G are orthogonal
(GGT = 0), this implies that when calculating their inner-product all of the
1’s in one word get multiplied against zeros in the other or else the two words
have to have an even number of 1’s in the same positions (so that their product
can sum to zero). Therefore ri and rj have an even number of 1’s in common,
say 2x. Then

wt(v) = wt(ri + rj ) = wt(ri) + wt(rj ) − 2(2x),



which is a multiple of four.


Suppose that any word that is a sum of t or fewer rows of G has weight that
is a multiple of 4. Let v ∈ C24 be a sum of t + 1 rows of G. Then v = r + w,
where r is a row of G and w is a sum of t rows of G. By hypothesis wt(w) is a
multiple of 4. As before r and w have an even number of 1’s in common, say
2y, so that

wt(v) = wt(r + w) = wt(r) + wt(w) − 2(2y),

which is a multiple of 4. By induction any v ∈ C24 has a weight that is a


multiple of 4.
(b)

    dmin(C24) = min{ wt(v) | v ∈ C24, v ≠ 0 } = 4 or 8.

The last step follows from the fact that G has rows of weight 8 and each of
these rows is also a codeword of C24.
(c) C24 has no codewords of weight 4 :
Suppose v ∈ C24 has weight 4. We noted earlier that both [I12 | B] and [B | I12 ]
are generator matrices for C24 . Therefore v = w1 [I12 | B] = w2 [B | I12 ] for
some w1 , w2 6= 0. Now neither of the two halves of v can be identically zero.
This is because of the identity matrices in the generator matrices and also since
w1, w2 ≠ 0. Further, if either half of v contained only one 1, this would imply
that v equalled a row of either [I12 | B] or [B | I12], but each row has weight at
least eight. Therefore each half of v must contain exactly two 1's. This implies
that wt(w1) = wt(w2) = 2, but the sum of two rows of B has weight at least
4. Therefore wt(v) = wt(wi) + wt(wi B) ≥ 2 + 4 > 4 — a contradiction.

Thus no v ∈ C24 has weight 4. Therefore dmin (C24) = 8.
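The construction and the properties noted above can be verified by machine (a sketch; the asserts re-check the claims of the text computationally rather than prove them):

```python
first = "11011100010"
rows = [first]
for _ in range(10):
    rows.append(rows[-1][1:] + rows[-1][0])        # cyclic shift one place left
B1 = [[int(c) for c in r] for r in rows]

# Border B1 with a column and a row of 1's, with 0 in the corner.
B = [row + [1] for row in B1] + [[1] * 11 + [0]]

def matmul2(A, M):
    # Matrix product over K (entries mod 2).
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*M)]
            for row in A]

I12 = [[int(i == j) for j in range(12)] for i in range(12)]
assert B == [list(c) for c in zip(*B)]             # B is symmetric
assert matmul2(B, B) == I12                        # BB = I

G = [I12[i] + B[i] for i in range(12)]             # G = [I12 | B]
```

Every row of G has weight 8 or 12, consistent with step (a) of the minimum distance proof.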

6.2 The Golay Code : C23 .


This section discusses the original Golay code that appeared in [?].

Definition 6.2.1 (Puncturing a code). Puncturing a code means removing a digit


(the same one) from every codeword.

Recall that the extended Golay code, C24, has generator matrix [I12 | B], where

    B = [ B1  j^T ]
        [ j   0  ].

The Golay code C23 is obtained by removing the last digit from every codeword in C24.
Therefore it has the generator matrix G = [I12 | B′], where

    B′ = [ B1 ]
         [ j  ].

Then G is a 12 × 23 matrix with linearly independent rows. Therefore C23 has length
n = 23 and dimension k = 12.
Notice that the extended Golay code, C24 , is the extension (as defined previously) of
the Golay code C23 (or C23 resulted from puncturing C24 ). Since the extended code, C24 ,
has minimum distance 8, C23 has minimum distance 7 or 8. Since G has rows of weight
7, dmin (C23 ) = 7. Thus C23 is a [23, 12, 7]-code. This also means that C23 is perfect (as
the computation below shows) and so corrects all error patterns of weight at most 3 and
no others (every word in K 23 is within distance 3 of some codeword in C23 ).

    2^12 = |C23| = 2^23 / [(23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3)]
                 = 2^23 / (1 + 23 + 253 + 1771)
                 = 2^23 / 2^11 = 2^12.
Chapter 7

Reed-Muller Codes

The rth order Reed-Muller [?, ?] code (of length 2^m), denoted by RM(r, m), 0 ≤ r ≤ m,
is defined recursively as

1. RM(0, m) = {00 · · · 0, 11 · · · 1}, each word of length 2^m. Further, RM(m, m) = K^(2^m).

2. RM(r, m) = {(x, x + y) | x ∈ RM(r, m − 1) and y ∈ RM(r − 1, m − 1)}, where
0 < r < m.

Example 7.0.2.
RM (0, 0) = {0, 1}.
RM (0, 1) = {00, 11}.
RM (1, 1) = {00, 01, 10, 11}.
RM (0, 2) = {0000, 1111}.
RM (1, 2) = {(x, x + y) | x ∈ RM (1, 1) and y ∈ RM (0, 1)},
= {(x, x + y) | x ∈ {00, 01, 10, 11} and y ∈ {00, 11}}
= {0000, 0011, 0101, 0110, 1010, 1001, 1111, 1100}.
RM (2, 2) = K 4.
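The recursive definition translates directly into a sketch (the set-of-strings representation is mine):

```python
from itertools import product

def add(u, v):
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

def rm(r, m):
    """The Reed-Muller code RM(r, m) as a set of binary strings of length 2^m."""
    if r == 0:
        return {"0" * 2**m, "1" * 2**m}
    if r == m:
        return {"".join(b) for b in product("01", repeat=2**m)}   # K^(2^m)
    return {x + add(x, y)
            for x in rm(r, m - 1) for y in rm(r - 1, m - 1)}
```

Running rm(1, 2) reproduces the eight codewords listed in the example, and rm(1, 3) has 2^(m+1) = 16 codewords of minimum nonzero weight 2^(m−r) = 4.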

Let G(r, m) denote the generator matrix for RM(r, m). These can be defined recursively.

1.
    G(0, m) = [1 1 · · · 1]   (2^m ones).


2. For 0 < r < m,

    G(r, m) = [ G(r, m − 1)   G(r, m − 1)     ]
              [ 0             G(r − 1, m − 1) ].

3.
    G(m, m) = [ G(m − 1, m)  ]
              [ 0 0 · · · 0 1 ].

Example 7.0.3.

1.
    G(0, 1) = [1 1].

2.
    G(1, 1) = [ 1 1 ]
              [ 0 1 ].

3.
    G(1, 2) = [ G(1, 1)  G(1, 1) ]  =  [ 1 1 1 1 ]
              [ 0        G(0, 1) ]     [ 0 1 0 1 ]
                                       [ 0 0 1 1 ].

4.
    G(2, 2) = [ G(1, 2) ]  =  [ 1 1 1 1 ]
              [ 0 0 0 1 ]     [ 0 1 0 1 ]
                              [ 0 0 1 1 ]
                              [ 0 0 0 1 ].

5.
    G(1, 3) = [ G(1, 2)  G(1, 2) ]  =  [ 1 1 1 1 1 1 1 1 ]
              [ 0        G(0, 2) ]     [ 0 1 0 1 0 1 0 1 ]
                                       [ 0 0 1 1 0 0 1 1 ]
                                       [ 0 0 0 0 1 1 1 1 ].

Theorem 7.0.4. The r'th order Reed-Muller code RM(r, m) has the following properties.

1. Length n = 2^m.

2. Minimum distance dmin = 2^(m−r).

3. Dimension

    k = (m choose 0) + (m choose 1) + · · · + (m choose r).

4. For r > 0, RM(r − 1, m) ⊆ RM(r, m).

5. For r < m, RM(r, m)^⊥ = RM(m − 1 − r, m).

Proof.
We only prove 1, 2 and 4.
All proofs are by induction. From the previous example we see that all statements are
true when m = 0, 1, 2.

1. When r = 0 or r = m this follows from the definition. If 0 < r < m then RM (r, m) is
constructed from two Reed-Muller codes both of length 2m−1. Therefore RM (r, m)
will have length 2m−1 + 2m−1 = 2m .

4. The proof uses induction on r + m. We see from a previous example that RM(0, 1) ⊆
RM(1, 1), therefore the statement is true if r + m ≤ 2. Assume that if r + m < t and
r > 0, then RM(r − 1, m) ⊆ RM(r, m) and consider the situation where r + m = t
and r > 0. There are three cases.

(i) r = 1 (so that what we want is RM (0, m) ⊆ RM (1, m)).


It follows from the recursive definition of G(r, m) that its first row is all 1’s.
Since G(0, m) = [1 1 · · · 1], this means that G(0, m) is a sub-matrix (the first
row) of G(r, m). Therefore RM (0, m) ⊆ RM (1, m).
(ii) r = m.
This is clear since RM(m, m) = K^(2^m), giving RM(m − 1, m) ⊆ RM(m, m).
(iii) 1 < r < m.
We have

    G(r − 1, m) = [ G(r − 1, m − 1)   G(r − 1, m − 1) ]
                  [ 0                 G(r − 2, m − 1) ],

and

    G(r, m) = [ G(r, m − 1)   G(r, m − 1)     ]
              [ 0             G(r − 1, m − 1) ].

By hypothesis G(r − 1, m − 1) is a sub-matrix of G(r, m − 1) and G(r − 2, m − 1)
is a sub-matrix of G(r − 1, m − 1). Therefore G(r − 1, m) is a sub-matrix of
G(r, m) and the result follows by induction.

2. Again the proof uses induction on r + m. The result is clear if r + m = 0 or r + m = 1.
Suppose the minimum distance of RM(s, t) is 2^(t−s) whenever s + t < l and consider
the case r + m = l and the code RM(r, m). If r = 0 or r = m, then the result is
true, so assume 0 < r < m. We know

    RM(r, m) = {(x, x + y) | x ∈ RM(r, m − 1) and y ∈ RM(r − 1, m − 1)},

and RM(r − 1, m − 1) ⊆ RM(r, m − 1) (by number 4). Therefore x + y ∈
RM(r, m − 1). If x ≠ y then by the induction hypothesis wt(x) ≥ 2^(m−1−r) and
wt(x + y) ≥ 2^(m−1−r). Thus wt(x, x + y) = wt(x) + wt(x + y) ≥ 2^(m−1−r) + 2^(m−1−r) = 2^(m−r).
If x = y, then x = y ∈ RM(r − 1, m − 1) and (x, x + y) = (x, 0). Therefore wt(x, x + y) =
wt(x) ≥ 2^((m−1)−(r−1)) = 2^(m−r). This shows that dmin ≥ 2^(m−r). Equality can be seen to
hold by choosing x and y so that equality holds in RM(r, m − 1) and RM(r − 1, m − 1)
(i.e. choosing them to be the minimum weight words in their respective codes).

7.1 The Reed-Muller Codes RM(1, m)

These codes have the following properties.

• Length n = 2^m.

• Minimum distance dmin = 2^(m−1).

• Dimension (m choose 0) + (m choose 1) = m + 1, so that the code contains 2^(m+1) codewords.

Therefore it is a [2^m, m + 1, 2^(m−1)]-code.
As always, when receiving a word w the goal is to find a codeword closest to w, if
such a codeword can be identified. The general principles that apply to the RM(1, m)
codes in this situation are illustrated by the following example.
Example 7.1.1. RM (1, 3) is an [8, 4, 4]-code. Assume that w is received. If we can find
a codeword v such that d(v, w) < 2 we decode w to v. Also, if u is a codeword with

d(u, w) > 6, then d(w, u + 11111111) < 2 and so w may be decoded to u + 11111111.
Therefore no more than half the codewords need to be examined to find a codeword closest
to w, if one exists. Note that 11111111 is always a codeword so that u + 11111111 is a
codeword if u is a codeword.
Chapter 8

Decimal Codes

8.1 The ISBN Code


The International Standard Book Number is the number that is displayed on the back
of books that uniquely identifies the book. An example of such a number is : 0 – 201 –
34292 – 8. It is always made up of 10 digits, say x1 x2 x3 · · · x10, with the last digit being
a parity check digit. The first 9 digits are divided into three groups as follows.

Group Identifier
This identifies the group of countries where the book was published. It has more digits
if the group produces fewer books. As an example x1 = 0 identifies the English speaking
countries, x1 = 3 the German speaking countries and x1 x2 = 87 Denmark.

Publisher Prefix
This is anywhere from 2 to 7 digits and identifies the publisher.

Title Number
It is 1 to 6 digits in length and is assigned by the publisher.

Parity Check
The tenth digit, x10, is chosen so that

    1x1 + 2x2 + 3x3 + · · · + 9x9 + 10x10 ≡ 0 (mod 11),

or equivalently (since −10 ≡ 1 (mod 11))

    1x1 + 2x2 + 3x3 + · · · + 9x9 ≡ −10x10 ≡ 1x10 (mod 11).


If x10 = 10, then X is used in the ISBN.


Example 8.1.1. Given 0 – 201 – 34292 – 8, check that it is a valid ISBN. To do this we
compute

1 · 0 + 2 · 2 + 3 · 0 + 4 · 1 + 5 · 3 + 6 · 4 + 7 · 2 + 8 · 9 + 9 · 2 + 10 · 8
= 0 + 4 + 0 + 4 + 15 + 24 + 14 + 72 + 18 + 80
≡ 4 + 4 + 4 + 2 + 3 + 6 + 7 + 3 (mod 11)
≡0 (mod 11).

Therefore it is a valid ISBN.
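This check is easy to automate (a sketch; stripping hyphens and spaces is my convenience, not part of the standard):

```python
def isbn_valid(isbn):
    """True when the sum of i * x_i over the 10 digits is 0 mod 11 ('X' = 10)."""
    digits = [10 if c == "X" else int(c) for c in isbn if c not in "- "]
    return len(digits) == 10 and \
        sum(i * x for i, x in enumerate(digits, start=1)) % 11 == 0
```

For example, isbn_valid("0-201-34292-8") returns True, while changing any single digit makes it False.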


The ISBN code has the following properties.

• The ISBN code detects all single errors.


Suppose the ISBN is x1 x2 · · · x10, but that xi is recorded wrongly as xi + e, e ≠ 0.
The sum then becomes

    1 · x1 + 2 · x2 + · · · + 10x10 + i · e ≡ 0 + i · e (mod 11).

Suppose i · e ≡ 0 (mod 11). This implies that 11 | i · e, so that 11 | i or 11 | e, but
neither i nor e is congruent to 0 mod 11. Therefore we will have a nonzero sum if
an error occurred.

• It is possible to correct an error if you know which digit is in error. We show this
by way of an example.

Example 8.1.2. Find x if 01383x0943 is an ISBN. Since it is an ISBN we know that

1 · 0 + 2 · 1 + 3 · 3 + 4 · 8 + 5 · 3 + 6 · x + 7 · 0 + 8 · 9 + 9 · 4 + 10 · 3,
≡ 6x + 9 ≡ 0 (mod 11),
∴ 6x ≡ −9 ≡ 2 (mod 11),
x ≡ 4 (mod 11).

• The ISBN code detects any single error arising from transposing two digits (adjacent
or not).

Let x1 x2 · · · x10 be the correct ISBN number. Assume that xi and xj get interchanged
in the recording of the number (i < j). The check sum becomes

1 · x1 + 2 · x2 + · · · ixj + · · · jxi + · · · + x10 ,


= 1x1 + 2x2 + · · · + ixi + · · · jxj + · · · + 10x10 + i(xj − xi ) + j(xi − xj ),
≡ 0 + j(xi − xj ) − i(xi − xj ) (mod 11),
≡ (xi − xj )(j − i) (mod 11).

We are implicitly assuming that xi ≠ xj (otherwise no error occurs) and furthermore
j − i ≠ 0. Therefore (xi − xj )(j − i) is a product of two nonzero numbers in Z11 and
so is not equal to zero (since Z11 is a field).
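These checks are easy to experiment with. Below is a Python sketch (the function names are mine, not part of the text) that validates an ISBN and recovers a digit in a known position, as in Examples 8.1.1 and 8.1.2.

```python
def isbn_checksum(digits):
    """Weighted sum 1*x1 + 2*x2 + ... + 10*x10 reduced mod 11."""
    return sum(i * x for i, x in enumerate(digits, start=1)) % 11

def is_valid_isbn(digits):
    return isbn_checksum(digits) == 0

def recover_digit(digits, pos):
    """Recover the digit at 1-based position pos, assuming every other digit is right."""
    partial = sum(i * x for i, x in enumerate(digits, start=1) if i != pos) % 11
    # Solve pos * x ≡ -partial (mod 11); pow(pos, -1, 11) is the inverse in Z_11.
    return -partial * pow(pos, -1, 11) % 11
```

A recovered value of 10 would correspond to the symbol X.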

8.2 A Single Error Correcting Decimal Code


Definition 8.2.1 (Distance). If u and v are words in any code (binary or otherwise)
then the distance between u and v, d(u, v), is the number of places where u and v differ.
The code that we are considering in this section has the following properties.
• It is made up of 10 digits : x1 x2 · · · x9 x10.

• x1 through x8 are information digits.

• x9 and x10 are parity check digits chosen such that


S1 = Σ_{i=1}^{10} i xi ≡ 0 (mod 11),

S2 = Σ_{i=1}^{10} xi ≡ 0 (mod 11).

The calculation of the parity digits is illustrated in the following example.


Example 8.2.2. Let x1 x2 · · · x8 be 02013429. We need to choose x9 and x10 such that

S1 = 1 · 0 + 2 · 2 + 3 · 0 + 4 · 1 + 5 · 3 + 6 · 4 + 7 · 2 + 8 · 9 + 9x9 + 10x10 ≡ 0 (mod 11),


S2 = 0 + 2 + 0 + 1 + 3 + 4 + 2 + 9 + x9 + x10 ≡ 0 (mod 11).

This simplifies to

9x9 + 10x10 ≡ 10 (mod 11),


x9 + x10 ≡ 1 (mod 11).

Adding the two equations together we get 10x9 ≡ 0 (mod 11), which implies that x9 = 0.
From this we get x10 = 1.
So in general we will find that

9x9 + 10x10 ≡ a (mod 11),


x9 + x10 ≡ b (mod 11).

Adding the equations together leads to

10x9 ≡ (a + b) (mod 11),


∴ x9 ≡ 10(a + b) (mod 11),
∴ x10 ≡ b − x9 ≡ b − 10(a + b) ≡ a + 2b (mod 11).
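The formulas x9 ≡ 10(a + b) and x10 ≡ a + 2b can be checked against Example 8.2.2 with a short Python sketch (the function name is mine):

```python
def append_check_digits(info):
    """Append x9 and x10 to eight information digits so S1 ≡ S2 ≡ 0 (mod 11)."""
    a = -sum(i * x for i, x in enumerate(info, start=1)) % 11  # 9*x9 + 10*x10 ≡ a
    b = -sum(info) % 11                                        #   x9 +    x10 ≡ b
    x9 = 10 * (a + b) % 11
    x10 = (a + 2 * b) % 11
    return list(info) + [x9, x10]
```

For the information digits 02013429 this appends x9 = 0 and x10 = 1, as found above.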

Suppose now that x1 x2 · · · x10 is sent and a single error e ≠ 0 occurs in the i’th digit.
Then the received word becomes x1 x2 · · · xi−1 (xi + e)xi+1 · · · x10 . Therefore

S1 = 1x1 + 2x2 + · · · + ixi + · · · + 10x10 + ie ≡ 0 + ie (mod 11),

S2 = x1 + x2 + · · · + xi + · · · + x10 + e ≡ 0 + e (mod 11).

So, the second equation gives the magnitude of the error, e. Once we know this we can
also calculate the position of the error : i ≡ e^{-1} S1 ≡ S2^{-1} S1 (mod 11).
Therefore the decoding may be summed up as follows.
1. If the received word is r, then compute the syndrome

   rH = [ S1  S2 ] (mod 11),

   where H is the 10 × 2 matrix

   H = [  1  1 ]
       [  2  1 ]
       [  3  1 ]
       [  :  : ]
       [ 10  1 ] .

2. If S1 = S2 = 0 then assume r is a codeword.

3. If S1 ≠ 0 and S2 ≠ 0 then assume an error of magnitude e ≡ S2 (mod 11)
has occurred in location i ≡ S2^{-1} S1 (mod 11) and decode r as x1 x2 · · · xi−1 (xi −
e)xi+1 · · · x10 .

4. If S1 = 0 and S2 ≠ 0, or S1 ≠ 0 and S2 = 0, then at least two errors have occurred,
so request retransmission.

Note that point 4 above always occurs when two different digits are transposed (S1 ≠ 0
and S2 = 0). Therefore this code can detect all errors involving the transposition of two
digits.
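The four decoding steps can be sketched in Python as follows (an illustration only; the names are mine, and corrected digits are kept in Z11, so a corrected value of 10 would itself signal further errors):

```python
def decode(r):
    """Return (decoded word, status) following steps 1 to 4 of the text."""
    s1 = sum(i * x for i, x in enumerate(r, start=1)) % 11
    s2 = sum(r) % 11
    if s1 == 0 and s2 == 0:
        return list(r), "codeword"
    if s1 != 0 and s2 != 0:
        e = s2                          # magnitude of the error
        i = s1 * pow(s2, -1, 11) % 11   # 1-based location of the error
        w = list(r)
        w[i - 1] = (w[i - 1] - e) % 11
        return w, "corrected"
    return None, "retransmit"           # exactly one of S1, S2 is nonzero
```

A transposition of two different digits leaves S2 = 0 while S1 ≠ 0, so it lands in the "retransmit" case, as noted above.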

8.3 A Double Error Correcting Decimal Code


This code has the following properties.

• Length n = 10.

• Information digits are the first 6 digits, x1 , x2 , . . . , x6 .

• The last four digits, x7 x8 x9 x10 , are the parity check digits. They are chosen so that

S1 = Σ_{i=1}^{10} i xi ≡ 0 (mod 11),
S2 = Σ_{i=1}^{10} xi ≡ 0 (mod 11),
S3 = Σ_{i=1}^{10} i^2 xi ≡ 0 (mod 11),
S4 = Σ_{i=1}^{10} i^3 xi ≡ 0 (mod 11).

The digits x1 through x6 will be known since they are information digits. Therefore the
equations above will assume the form

7x7 + 8x8 + 9x9 + 10x10 ≡ a (mod 11),
x7 + x8 + x9 + x10 ≡ b (mod 11),
7^2 x7 + 8^2 x8 + 9^2 x9 + 10^2 x10 ≡ c (mod 11),
7^3 x7 + 8^3 x8 + 9^3 x9 + 10^3 x10 ≡ d (mod 11).

Here a, b, c and d are elements of Z11 with

a ≡ − Σ_{i=1}^{6} i xi (mod 11),
b ≡ − Σ_{i=1}^{6} xi (mod 11),
c ≡ − Σ_{i=1}^{6} i^2 xi (mod 11),
d ≡ − Σ_{i=1}^{6} i^3 xi (mod 11).

Suppose that x1 x2 · · · x10 is sent and that a single error of size e ≠ 0 occurs in location
i. This situation is exactly the same as in the previous section and we will be
able to correct this single error if we know that only this error occurred. It turns out that
it is indeed possible to detect the presence of only one error. To do this we examine the
general case of two errors first.
Suppose therefore that two errors occur in locations i and j with sizes e1 ≠ 0 and
e2 ≠ 0. The presence of these errors affects the parity check equations above as follows.

S1 ≡ ie1 + je2 (mod 11),
S2 ≡ e1 + e2 (mod 11),
S3 ≡ i^2 e1 + j^2 e2 (mod 11),
S4 ≡ i^3 e1 + j^3 e2 (mod 11).

It can be shown that i and j are the roots (in Z11 ) of

ax2 + bx + c,

where

a = S1^2 − S2 S3 ,
b = S2 S4 − S1 S3 ,
c = S3^2 − S1 S4 .

This quadratic can be solved using a formula analogous to the one used for real numbers.
The derivation of this formula is done below. Note that all calculations are in Z11 and

that the square-root used here may not exist. The root represents the number in Z11 that
upon squaring yields the number under the root sign.

ax^2 + bx + c = 0, a ≠ 0,
a^{-1} ax^2 + a^{-1} bx + a^{-1} c = 0,
x^2 + a^{-1} bx + (2^{-1} a^{-1} b)^2 − (2^{-1} a^{-1} b)^2 = −a^{-1} c,
(x + 2^{-1} a^{-1} b)^2 = (2^{-1} a^{-1} b)^2 − a^{-1} c,
x + 2^{-1} a^{-1} b = ± √( 2^{-2} a^{-2} b^2 − a^{-1} c ),
x = −2^{-1} a^{-1} b ± 2^{-1} a^{-1} √( b^2 − (a^{-1} c)(2^{-2} a^{-2})^{-1} ),
  = [ −b ± √( b^2 − 4ac ) ] (2a)^{-1} .

So, using this formula we can find i and j and then solve

S1 ≡ ie1 + je2 ,
S2 ≡ e1 + e2 ,

for e1 and e2.


The decoding can be summed up as follows. Let

H = [  1   1   1^2   1^3  ]
    [  2   1   2^2   2^3  ]
    [  3   1   3^2   3^3  ]
    [  4   1   4^2   4^3  ]
    [  :   :    :     :   ]
    [ 10   1  10^2  10^3  ] .

If r is the received word then

1. Compute the syndrome

   rH = [ S1  S2  S3  S4 ] = S.

2. If S = 0 then a codeword was received.

3. If S ≠ 0 and a = b = c = 0 then a single error of size e = S2 has occurred in
digit i = S1 S2^{-1}. This may be seen to hold by realizing that a single error implies
that S1 = ie1 , S2 = e1 , S3 = i^2 e1 and S4 = i^3 e1 .

4. If S ≠ 0 and a ≠ 0, c ≠ 0 and b^2 − 4ac is a square in Z11 , then there are two errors
e1 and e2 in locations i and j as before.

5. Otherwise, at least three errors are detected, so retransmit.

Example 8.3.1. Suppose the received word is 3254571396. Then

S1 = 1 · 3 + 2 · 2 + 3 · 5 + 4 · 4 + · · · + 10 · 6 ≡ 2 (mod 11),
S2 = 3 + 2 + 5 + · · · + 6 ≡ 1 (mod 11),
S3 = 1^2 · 3 + 2^2 · 2 + · · · + 10^2 · 6 ≡ 10 (mod 11),
S4 = 1^3 · 3 + 2^3 · 2 + · · · + 10^3 · 6 ≡ 3 (mod 11).

So that

a = S1^2 − S2 S3 = 2^2 − 1 · 10 ≡ 5 (mod 11),
b = S2 S4 − S1 S3 = 1 · 3 − 2 · 10 ≡ 5 (mod 11),
c = S3^2 − S1 S4 = 10^2 − 2 · 3 ≡ 6 (mod 11).

Then b^2 − 4ac = 5^2 − 4 · 5 · 6 = 25 − 120 ≡ 4 (mod 11), which is a square. Thus

i, j = (−5 ± 2)(2 · 5)^{-1} ,
     = (−5 ± 2) · 10,   since 10^{-1} = 10,
     = 3, 7.

Now

S1 = 3e1 + 7e2 ≡ 2 (mod 11),


S2 = e1 + e2 ≡ 1 (mod 11).

Adding 4 times S2 to S1 gives 7e1 ≡ 6, implying that e1 = 4 and e2 = 8. Thus the decoded
word is 3214574396.
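The full procedure, including the quadratic formula in Z11, can be sketched as follows (Python; the helper names are mine and the brute-force square root is only for illustration). It reproduces Example 8.3.1.

```python
def sqrt_mod11(n):
    """A square root of n in Z_11 found by brute force, or None for a nonresidue."""
    for r in range(11):
        if r * r % 11 == n % 11:
            return r
    return None

def decode2(r):
    """Decode up to two errors, or return None to request retransmission."""
    s1, s2, s3, s4 = (sum(pow(i, k, 11) * x for i, x in enumerate(r, start=1)) % 11
                      for k in (1, 0, 2, 3))
    if s1 == s2 == s3 == s4 == 0:
        return list(r)                             # a codeword was received
    a = (s1 * s1 - s2 * s3) % 11
    b = (s2 * s4 - s1 * s3) % 11
    c = (s3 * s3 - s1 * s4) % 11
    w = list(r)
    if a == b == c == 0 and s1 != 0 and s2 != 0:   # single error of size S2
        i = s1 * pow(s2, -1, 11) % 11
        w[i - 1] = (w[i - 1] - s2) % 11
        return w
    disc = sqrt_mod11((b * b - 4 * a * c) % 11)
    if a != 0 and c != 0 and disc:                 # two errors: solve the quadratic
        inv2a = pow(2 * a, -1, 11)
        i, j = (-b + disc) * inv2a % 11, (-b - disc) * inv2a % 11
        # Solve i*e1 + j*e2 ≡ S1 and e1 + e2 ≡ S2 in Z_11.
        e2 = (s1 - i * s2) * pow(j - i, -1, 11) % 11
        e1 = (s2 - e2) % 11
        w[i - 1] = (w[i - 1] - e1) % 11
        w[j - 1] = (w[j - 1] - e2) % 11
        return w
    return None                                    # three or more errors detected
```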

8.4 The Universal Product Code (UPC)


This code has the following properties

• All codewords have length 12 : x1 x2 x3 · · · x12.

• The first digit specifies the type of product.



• x2 x3 · · · x6 specifies the manufacturer.

• x7 · · · x11 is a product number.

• The check digit, x12 , is determined by

3(x1 + x3 + x5 + x7 + x9 + x11 ) + x2 + x4 + x6 + x8 + x10 + x12 ≡ 0 (mod 10).

Example 8.4.1. A 50 cents off Kellogg’s Rice Krispies coupon has UPC 53800051150x12 .
Therefore

3(5 + 8 + 0 + 5 + 1 + 0) + 3 + 0 + 0 + 1 + 5 + x12 ≡ 7 + 9 + x12 ≡ 0 (mod 10).

Thus x12 = 4.
Suppose that a single error of size e ≠ 0 occurs in the i’th digit. The check sum
either changes by e or by 3e depending on i being even or odd. Since gcd(10, 3) = 1,
10|3e ⇐⇒ 10|e. Further 1 ≤ e ≤ 9, so that 10 does not divide e. Thus the error is
detectable since we have a nonzero checksum. Note that the error cannot be corrected
unless you know which digit is in error.
If adjacent digits xi and xj are transposed then the change to the check sum is −3xi +
xi − xj + 3xj = 2(xj − xi). This type of error is detected unless (xj − xi) = ±5.
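A sketch of the check-digit computation in Python (the function name is mine), reproducing Example 8.4.1:

```python
def upc_check_digit(d):
    """x12 for the first 11 UPC digits d = [x1, ..., x11]."""
    odd = sum(d[0::2])    # x1, x3, ..., x11  (weight 3)
    even = sum(d[1::2])   # x2, x4, ..., x10  (weight 1)
    return -(3 * odd + even) % 10
```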

8.5 US Money Order


The code employed for US money orders has the following properties.

• The codewords have length 11 : x1 x2 · · · x11.

• The check digit, x11 , is defined by

x1x2 x3 · · · x10 ≡ x11 (mod 9).

Here x1 x2 x3 · · · x10 represents the number with the digits x1 , x2 , . . . , x10 and is not
the product of the numbers x1 , x2 , . . . , x10. Also 0 ≤ x11 ≤ 8.

• The check equation is equivalent to

x1 + x2 + x3 + · · · + x10 ≡ x11 (mod 9).

The proof of this is in assignment 3 in the appendix.



Should an error occur in x11 then it will be detected since x1 + x2 + x3 + · · · + x10 will
not be equivalent to x11 + e (mod 9).
If an error occurs in the first 10 digits then it will be detected unless a 0 is changed
into a 9 or a 9 is changed into a 0. To estimate the probability of this we proceed as follows.
The probability that the i’th digit is a 0 is 10^9 /10^10 = 0.1. The probability of the error
being a 9 is 1/9. Therefore the probability of changing a 0 into a 9 is 1/90. Similarly we
find that the probability of changing a 9 into a 0 is also 1/90. So, the probability that an
error in the i’th (1 ≤ i ≤ 10) digit goes undetected is 2/90.
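Both forms of the check equation can be verified together in a short Python sketch (the function name is mine):

```python
def money_order_check(d):
    """x11 for the ten digits d, via both forms of the check equation."""
    as_number = int("".join(map(str, d)))   # the digits read as one number
    assert as_number % 9 == sum(d) % 9      # the equivalence claimed in the text
    return as_number % 9
```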

8.6 US Postal Code


The code used by the US postal service is the following.

• Each codeword is of length 10 : x1 x2 · · · x10 .

• The code is printed in a bar-coded format on the envelope. The bar-code is delimited
by two long bars on either side. Each decimal digit is represented by a group of 5
long and short bars.

• Each such group of 5 bars has exactly 2 long and 3 short bars. Thus there are
C(5, 2) = 10 such groups — one for each decimal digit.

• We can think of a long bar as representing a 1 and a short bar representing a 0. Then
the correspondence between the bars and the decimal digits is given by : if abcde
is the binary sequence of bars then 7a + 4b + 2c + d is the decimal digit associated
with it. In tabular form this is

Binary Decimal
00011 1
00101 2
00110 3
01001 4
01010 5
01100 6
10001 7
10010 8
10100 9
11000 0

For example, a group consisting of the bars long, short, short, long, short represents
10010, which in turn represents 8.

• A parity check digit x10 is added to the bar-code so that x1 + x2 + · · · + x10 ≡ 0
(mod 10).

Suppose that the scanner makes a single error when reading the code (i.e. it reads a 1
as 0 or a 0 as 1). Then some block of length 5 doesn’t have 2 ones — so the block itself
can be identified. If one error occurs in the i’th block, then the decimal digit associated
with it will be undefined. This digit can be recovered from the parity check equation since
we now know which digit is in error.
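The bar-to-digit rule and the recovery of an unreadable block can be sketched in Python as follows (the names are mine):

```python
BAR_TO_DIGIT = {"00011": 1, "00101": 2, "00110": 3, "01001": 4, "01010": 5,
                "01100": 6, "10001": 7, "10010": 8, "10100": 9, "11000": 0}

def bars_to_digit(block):
    """Decode a five-bar block (1 = long, 0 = short) by the 7a + 4b + 2c + d rule."""
    a, b, c, d, _ = (int(ch) for ch in block)
    value = 7 * a + 4 * b + 2 * c + d
    return 0 if value == 11 else value       # the block 11000 stands for 0

def recover_erasure(digits, pos):
    """If the block at 0-based index pos was unreadable, the parity check fixes it."""
    return -sum(x for i, x in enumerate(digits) if i != pos) % 10
```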
Chapter 9

Hadamard Codes

Suppose we want a 2-error correcting code, C, of length 11. Then according to the
Hamming bound we have

|C| ≤ 2^11 / [ C(11, 0) + C(11, 1) + C(11, 2) ] = 2048/67 ≈ 30.57.

Therefore C can have at most 30 codewords. Since 2^4 < 30 < 2^5 , the largest binary, linear
2-error correcting code of length 11 will have at most 2^4 = 16 codewords.
On the other hand there exists a nonlinear code of length 11 that is 2-error correcting
and that has 24 codewords — a definite improvement. This is a Hadamard code and its
construction is presented at the end of this chapter. To be able to construct these codes
we need a fair amount of information on Hadamard matrices. After we have constructed
these matrices we use their rows as the codewords for the Hadamard code.

Definition 9.0.1 ((n, M, d)-code). An (n, M, d)-code is a block code (linear or nonlinear) that has length n, M codewords and minimum distance d.

9.1 Background
Definition 9.1.1 (Hadamard Matrix). A Hadamard matrix, H, of order n is an n× n
matrix of +1’s and −1’s such that

HH T = nI.

Therefore any two distinct rows of a Hadamard matrix are orthogonal (using the real
inner product). Furthermore, multiplying any row or column of a Hadamard matrix by −1


changes it into another Hadamard matrix. This allows us to put a given Hadamard matrix
into the so-called normalised form where the first row and first column only contains 1’s.
Example 9.1.2. Normalised Hadamard matrices of orders 1, 2, 4 and 8 are shown below.
Note that +1’s are only shown as +, while −1 are shown as −.

n = 1 : H1 = [ + ] .

n = 2 : H2 = [ + + ]
             [ + − ]

n = 4 : H4 = [ + + + + ]
             [ + − + − ]
             [ + + − − ]
             [ + − − + ]

n = 8 : H8 = [ + + + + + + + + ]
             [ + − + − + − + − ]
             [ + + − − + + − − ]
             [ + − − + + − − + ]
             [ + + + + − − − − ]
             [ + − + − − + − + ]
             [ + + − − − − + + ]
             [ + − − + − + + − ]

Theorem 9.1.3. If a Hadamard matrix of order n exists, then n is 1,2 or a multiple of


4.

Proof.
By the previous example we know that Hadamard matrices of order 1 and 2 exist.
Let H be a Hadamard matrix of order n > 2. Put H in normalised form — this does
not affect n, so the first row of H is made up entirely of +1’s. The inner product of the
first row with any other row has to be zero, but this inner product is just the sum of the
entries of that row. Therefore all rows of H (except the first) have an equal number of
+1’s and −1’s.
By permuting the columns of H we can change the second row into one that has as its
first n/2 entries +1’s and its second n/2 entries all −1’s. Therefore the first two rows of
H are as follows (the first n/2 entries, then the second n/2 entries).

+ + ··· + +   + + ··· + +
+ + ··· + +   − − ··· − −

Let u be any row of H except the first two. Again u has n/2 +1’s and n/2 −1’s. Say u
has x +1’s in its first n/2 positions (so that it has (n/2 − x) −1’s in its first n/2 positions)

and y +1’s in its second n/2 positions (and therefore (n/2 − y) −1’s in its second n/2
positions). Since the innerproduct between u and the second row is zero we have

x − (n/2 − x) − y + (n/2 − y) = 2x − 2y = 0,
∴ x − y = 0.

Further the innerproduct between u and the first row is also zero and this leads to

x − (n/2 − x) + y − (n/2 − y) = 2x + 2y − n = 0,
∴ x + y = n/2.

Adding these two equations we find 2x = n/2 or that n = 4x.


We now present two constructions of Hadamard matrices that prove to be useful from
a coding theory standpoint.
Construction 1
If Hn is a Hadamard matrix of order n, then

H2n = [ Hn    Hn ]
      [ Hn   −Hn ]

is a Hadamard matrix of order 2n.


To see that this construction works note that

H2n H2n^T = [ Hn    Hn ] [ Hn^T    Hn^T ]
            [ Hn   −Hn ] [ Hn^T   −Hn^T ]

          = [ Hn Hn^T + Hn Hn^T    Hn Hn^T − Hn Hn^T ]
            [ Hn Hn^T − Hn Hn^T    Hn Hn^T + Hn Hn^T ]

          = [ 2nI    0  ] = 2nI.
            [  0   2nI  ]

Starting with H1 = [1], this gives H2 , H4 , H8 , . . . . These are Hadamard matrices whose
orders are powers of two and are known as Sylvester matrices.
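Construction 1 is easy to carry out by machine. The Python sketch below (the function names are mine) builds the Sylvester matrices by repeated doubling and checks the defining property HH^T = nI.

```python
def sylvester(k):
    """Hadamard matrix of order 2**k from Construction 1, as lists of +1/-1."""
    h = [[1]]
    for _ in range(k):
        # H_{2n} = [[H_n, H_n], [H_n, -H_n]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def is_hadamard(h):
    """Check H H^T = nI using the real inner product."""
    n = len(h)
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return all(dot(h[i], h[j]) == (n if i == j else 0)
               for i in range(n) for j in range(n))
```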
For the second construction we need some facts about quadratic residues.

Definition 9.1.4 (Quadratic residue). Let p be an odd prime. The nonzero squares
modulo p : 1^2 , 2^2 , 3^2 , . . . reduced (mod p), are called quadratic residues (mod p) (or just
residues (mod p)).

To find the residues (mod p) it is enough to consider

1^2 , 2^2 , . . . , ((p − 1)/2)^2 (mod p),

since (p − a)^2 = p^2 − 2pa + a^2 ≡ a^2 (mod p). These are all distinct, for if i^2 ≡ j^2 (mod p)
with 1 ≤ i, j ≤ (p − 1)/2 then (i − j)(i + j) = i^2 − j^2 ≡ 0 (mod p). That is p|(i − j)
or p|(i + j). In the first case i ≡ j (mod p) and in the latter case i ≡ −j (mod p), but
(p − 1)/2 + 1 ≤ −j ≤ p − 1 and 1 ≤ i ≤ (p − 1)/2. So i^2 ≡ j^2 (mod p) is only possible if
i ≡ j (mod p). Therefore there are (p − 1)/2 quadratic residues (mod p). The remaining
(p − 1)/2 numbers are called nonresidues.
Example 9.1.5. For p = 11, the quadratic residues are

1^2 = 1,
2^2 = 4,
3^2 = 9,
4^2 = 16 ≡ 5 (mod 11) and
5^2 = 25 ≡ 3 (mod 11).
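A quick computational check of the residues mod 11 and of the function χ defined below (Python sketch; the names are mine):

```python
def quadratic_residues(p):
    """The (p - 1)/2 residues: squares of 1, ..., (p - 1)/2 reduced mod p."""
    return sorted(i * i % p for i in range(1, (p - 1) // 2 + 1))

def chi(i, p):
    """The Legendre-symbol function used in the text."""
    if i % p == 0:
        return 0
    return 1 if i % p in quadratic_residues(p) else -1
```

The same sketch lets one verify Theorem 9.1.6 numerically for small p.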

We need the following properties of quadratic residues.

1. The product of two quadratic residues or of two nonresidues is a residue, while the
product of a residue and a nonresidue is a nonresidue.

2. If p is of the form 4k + 1, −1 is a quadratic residue (mod p). If p is of the form


4k + 3, −1 is a nonresidue (mod p).

3. We define the following function on Zp , also known as the Legendre symbol (but we
use a different notation to simplify later expressions).

χ(i) = 0 if i ≡ 0 (mod p),


χ(i) = 1 if i is a residue,
χ(i) = −1 if i is a nonresidue.

Note that χ(x)χ(y) = χ(xy), where x, y ∈ Zp .

We also need the following theorem.

Theorem 9.1.6. For c ∈ Zp and c ≠ 0,

Σ_{b=0}^{p−1} χ(b)χ(b + c) = −1.

Construction 2 (The Paley Construction)


This construction produces a Hadamard matrix of order n = p + 1, where p is a prime of
the form 4k + 3 (i.e. n is a multiple of four).

(i) First construct the Jacobsthal matrix Q = (qij ). This is a p × p matrix whose rows
and columns are labelled 0, 1, 2, . . . , p − 1 and qij = χ(j − i).
Note that qij = χ(j − i) = χ(−1)χ(i − j) = −qji since p is of the form 4k + 3 and
by property 2, −1 is a nonresidue. That is, Q is skew-symmetric : Q^T = −Q.

(ii) We need the following Lemma.


Lemma 9.1.7. QQ^T = pI − J and QJ = JQ = 0, where J is the matrix all of
whose entries are 1.

(iii) Let

H = [ 1     j     ]
    [ j^T   Q − I ] ,

where j is a vector of all 1’s. Therefore H is a (p + 1) × (p + 1) matrix. We now have
HH^T = [ 1     j     ] [ 1     j       ]
       [ j^T   Q − I ] [ j^T   Q^T − I ]

     = [ p + 1   0                      ]
       [ 0       J + (Q − I)(Q^T − I)   ] .
From the Lemma it follows that

J + (Q − I)(Q^T − I) = J + QQ^T − QI − IQ^T + I^2 ,
                     = J + pI − J − Q − Q^T + I,
                     = pI − Q − Q^T + I,
                     = (p + 1)I − Q − Q^T ,
                     = (p + 1)I + Q^T − Q^T = (p + 1)I.

Therefore

HH^T = [ p + 1    0         ] = (p + 1)I.
       [ 0        (p + 1)I  ]
Thus H is a normalised Hadamard matrix of order p + 1 which is said to be of Paley
type.
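The Paley construction can be sketched in Python as follows (the names are mine); for p = 11 it yields a 12 × 12 Hadamard matrix like the one used in the next section.

```python
def paley_hadamard(p):
    """Hadamard matrix of order p + 1 for a prime p of the form 4k + 3."""
    residues = {i * i % p for i in range(1, p)}
    chi = lambda x: 0 if x % p == 0 else (1 if x % p in residues else -1)
    q = [[chi(j - i) for j in range(p)] for i in range(p)]   # Jacobsthal matrix
    h = [[1] * (p + 1)]                                      # first row: [1  j]
    for i in range(p):                                       # rows: [j^T  Q - I]
        h.append([1] + [q[i][j] - (1 if i == j else 0) for j in range(p)])
    return h
```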

9.2 Definition of the Codes


We now (finally) come to the definition of the codes themselves.
Let Hn be a normalised Hadamard matrix of order n. Replace +1’s by 0’s and −1’s by
1’s, in Hn. Then Hn is changed into the binary Hadamard matrix An . Since the rows of
Hn are orthogonal, any two rows of Hn (and therefore of An ) agree in n/2 places and also
differ in n/2 places. Therefore any two rows of An are a Hamming distance n/2 apart.
An gives rise to three Hadamard codes.

1. An (n − 1, n, n/2)-code, An, consisting of the rows of An with the first column


deleted.

2. An (n − 1, 2n, (n/2) − 1)-code, Bn, consisting of An together with the complements


of all its codewords.

3. An (n, 2n, n/2)-code Cn consisting of the rows of An and their complements.
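The three constructions can be sketched in Python (the names are mine). Applied to the Sylvester matrix H8, which is already normalised, they give codes with parameters (7, 8, 4), (7, 16, 3) and (8, 16, 4).

```python
def hadamard_codes(h):
    """Form A_n, B_n, C_n from a normalised ±1 Hadamard matrix h (+1 -> 0, -1 -> 1)."""
    rows = [[0 if x == 1 else 1 for x in row] for row in h]
    a_n = [row[1:] for row in rows]                 # delete the first column
    b_n = a_n + [[1 - x for x in w] for w in a_n]   # adjoin all complements
    c_n = rows + [[1 - x for x in w] for w in rows]
    return a_n, b_n, c_n

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(sum(x != y for x, y in zip(u, v))
               for i, u in enumerate(code) for v in code[i + 1:])
```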

Example 9.2.1. Using a 12 × 12 Hadamard matrix (constructed using Paley’s method)


and deleting the first column we find the following codes.
0 0 0 0 0 0 0 0 0 0 0
1 1 0 1 1 1 0 0 0 1 0
0 1 1 0 1 1 1 0 0 0 1
1 0 1 1 0 1 1 1 0 0 0
0 1 0 1 1 0 1 1 1 0 0
0 0 1 0 1 1 0 1 1 1 0
0 0 0 1 0 1 1 0 1 1 1
1 0 0 0 1 0 1 1 0 1 1
1 1 0 0 0 1 0 1 1 0 1
1 1 1 0 0 0 1 0 1 1 0
0 1 1 1 0 0 0 1 0 1 1
1 0 1 1 1 0 0 0 1 0 1
0 0 1 0 0 0 1 1 1 0 1
1 0 0 1 0 0 0 1 1 1 0
0 1 0 0 1 0 0 0 1 1 1
1 0 1 0 0 1 0 0 0 1 1
1 1 0 1 0 0 1 0 0 0 1
1 1 1 0 1 0 0 1 0 0 0
0 1 1 1 0 1 0 0 1 0 0
0 0 1 1 1 0 1 0 0 1 0
0 0 0 1 1 1 0 1 0 0 1
1 0 0 0 1 1 1 0 1 0 0
0 1 0 0 0 1 1 1 0 1 0
1 1 1 1 1 1 1 1 1 1 1

The first twelve rows form the (11, 12, 6) Hadamard code A12 (this is the code referred
to at the start of the chapter). All 24 rows form the (11, 24, 5) Hadamard code B12 .

9.3 How good are the Hadamard Codes ?


In trying to determine how good the Hadamard codes are we turn to the following bound
known as Plotkin’s bound.

Theorem 9.3.1 (Plotkin bound). For any (n, M, d) code C with dmin = d > n/2 we have

M ≤ 2 ⌊d/(2d − n)⌋ .
Proof.
We will assume throughout that M ≥ 2, which is what is needed for a code to be useful.
Consider the sum

S = Σ_{u∈C} Σ_{v∈C} d(u, v).

Since d(u, v) ≥ d if u ≠ v, S ≥ M (M − 1)d.


Let A be the M × n matrix whose rows are the codewords of C. Suppose the i’th
column contains xi 0’s and M − xi 1’s. This column contributes 2xi(M − xi ) to S.
Therefore

S = Σ_{i=1}^{n} 2xi (M − xi ).

Now xi (M − xi ) is maximised if xi = M/2, so S is maximised if all xi = M/2.

Suppose first that M is even, so that S ≤ nM^2/2. In this case

M (M − 1)d ≤ (n/2)M^2 ,
∴ M^2 d − M d − (n/2)M^2 ≤ 0,
∴ ((2d − n)/2)M^2 − dM ≤ 0,
∴ ((2d − n)/2)M ≤ d,
∴ M ≤ 2d/(2d − n).

The last step being possible since d > n/2. We have that ⌊2x⌋ ≤ 2⌊x⌋ + 1, so

M ≤ 2 ⌊d/(2d − n)⌋ + 1,

but since M is even,

M ≤ 2 ⌊d/(2d − n)⌋ .
Suppose now that M is odd. Then S is maximised if all xi = (M − 1)/2. In this case

S ≤ n · 2 · ((M − 1)/2) · ((M + 1)/2) = (n/2)(M^2 − 1).

Therefore

M (M − 1)d ≤ (n/2)(M^2 − 1),
∴ M (M − 1)d ≤ (n/2)(M − 1)(M + 1),
∴ M d ≤ (n/2)(M + 1),
∴ M d − (n/2)M ≤ n/2,
∴ ((2d − n)/2)M ≤ n/2,
∴ M ≤ n/(2d − n) = 2d/(2d − n) − 1.

Again using ⌊2x⌋ ≤ 2⌊x⌋ + 1, we find

M ≤ 2 ⌊d/(2d − n)⌋ + 1 − 1 = 2 ⌊d/(2d − n)⌋ .

If we denote by A(n, d) the maximum number of codewords in an (n, d) code, then

A(n, d) ≤ 2 ⌊d/(2d − n)⌋ ,

if d > n/2.

By considering the code An from above (for which this bound holds) we see that
A(n − 1, n/2) ≥ n provided a Hadamard matrix of order n exists. Also, from the theorem,

A(n − 1, n/2) ≤ 2 ⌊(n/2)/(n − (n − 1))⌋ = n.
Therefore in this case the Hadamard codes satisfy the bound (if the appropriate Hadamard
matrix exists).
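A two-line check of the bound in Python (the function name is mine):

```python
def plotkin_bound(n, d):
    """Upper bound 2*floor(d/(2d - n)) on A(n, d), valid when d > n/2."""
    assert 2 * d > n
    return 2 * (d // (2 * d - n))
```

For example, plotkin_bound(11, 6) gives 12, which is met by the Hadamard code A12.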
Part II Cryptography
Chapter 10

Introduction to Cryptography

To begin this chapter we define a few basic terms.


Cryptography for our purposes will be considered communication in the presence
of adversaries. Cryptanalysis is the study of the mathematical methods for defeating
cryptosystems. These two fields fall under the heading of Cryptology.

10.1 Basic Definitions


Definition 10.1.1 (Alphabet, Message space). An alphabet, A, is a finite set. A^n is
the set of all strings of length n over A. A^* is the set of all strings of finite length over A.
The message space, M , is the set of strings over A.

Definition 10.1.2 (Encryption scheme). An Encryption scheme (or cipher) consists


of

• Message spaces M and C — the plain-text and cipher-text spaces.

• A key set, K (also called the key space). Often keys are in a (key) pair k = (e, d),
where e is used for encryption and d is used for decryption.

• Encryption function Ek : M → C.

• Decryption function Dk : C → M , such that Dk (Ek (m)) = m for all m ∈ M .

We distinguish between the following two classes of adversaries :

1. Passive adversary : This adversary relies on eavesdropping only.

2. Active adversary : This type of adversary may insert or block transmissions.


Our goal is to try to provide secrecy and the ability to detect altered or forged messages.
Although the delivery of messages cannot always be guaranteed (meaning that the message
may not arrive at all or may not be the original, intended message) we can send messages
regularly to discover communications disruptions.
“Breaking” a cryptosystem will be taken to mean that the plain-text can be discovered
from the cipher-text. This leads to various levels of security being defined.
A cryptosystem is

• Unconditionally secure if the adversary can’t gain any knowledge about the plain-
text (except maybe the length) regardless of the amount of cipher-text available and
the amount of computing resources available.

• Computationally secure if it is “computationally infeasible” to discover the plain-


text using the best known methods and some specified amount of computing power.

• Provably secure if breaking the system is at least as hard as solving a “difficult”


computational problem.

Kerckhoff (1883) gave the following guidelines for choosing a cipher :

1. The system should be unbreakable in practice.

2. The cipher-text should be easy to transmit.

3. The apparatus should be portable and be of such a nature that one person can
operate it.

4. It should be “easy” to operate.

We also have Kerckhoff’s principle :

The security of the cipher should rest with the keys alone. That is, security is
maintained even if the adversary knows the encryption scheme.

The possible attacks on a cryptosystem are :

• Cipher-text only attack. Here the adversary attempts to recover plain-text by


observing some amount of cipher-text.

• Known plain-text attack. The adversary has some quantity of plain-text and the
corresponding cipher-text.

• Chosen plain-text attack. Cipher-text corresponding to plain-text chosen by the


adversary is available.

• Chosen cipher-text attack. The plain-text corresponding to cipher-text chosen


by the adversary is available.

One of the earliest ciphers is the simple substitution cipher. In this system the
keys are just permutations of A. The following example illustrates this.
Example 10.1.3. (The shift cipher).
Number A, B, C, . . . , Z by 0, 1, 2, . . . , 25. The encryption key e is a fixed shift (mod 26).
That is

α 7→ α + e (mod 26).

The decryption key d = 26 − e; the additive inverse (mod 26). If e = 3 then d = 23,
E3 (HOCKEY) = KRFNHB and D23 (KRFNHB) = HOCKEY.
This system is completely insecure against chosen plain-text attacks — choose as text
A, B, C, . . . , Z.

Definition 10.1.4 (Symmetric key encryption). A symmetric key encryption scheme


is one in which the effort required to find Dd from e is about as much as finding Ee from
d.

Two examples of symmetric key encryption schemes are the simple substitution cipher
where the keys are permutations of the alphabet and the shift cipher where the key
specifies the amount by which a letter is shifted mod 26.

10.2 Affine Cipher


To describe this cipher we label the letters of the alphabet, A,B, . . . ,Z, with the numbers
0,1, . . . ,25. The encryption function is then given by

Ek (x) = ax + b (mod 26).

Note that for decryption to be possible, Ek (x) needs to be one-to-one. This implies that
we need gcd(a, 26) = 1; any b may be used.
Therefore the encryption key e is a pair, (a, b), with gcd(a, 26) = 1. The number of
encryption keys thus is φ(26) · 26 = 12 · 26 = 312, where φ is Euler’s phi function.
As far as decryption is concerned, note that if y ≡ ax + b (mod 26), then y − b ≡ ax
(mod 26), so that x ≡ a^{-1} (y − b) (mod 26). Here a^{-1} is the multiplicative inverse of a
(mod 26). We can determine a^{-1} as follows : since

gcd(a, 26) = 1 there exist integers r and s such that ar + 26s = 1. These integers can be
found using the Euclidean algorithm. Therefore

ar ≡ 1 + 26(−s) (mod 26),
ar ≡ 1 (mod 26),
∴ a^{-1} ≡ r (mod 26).

Therefore the decryption key is also a pair (a^{-1} , b) and the decryption function is
Dk (y) = a^{-1} (y − b).

Example 10.2.1. As an example of the affine cipher let our key pair be k = (e, d) =
((7, 3), (15, 3)). Therefore

Ek (x) = 7x + 3 (mod 26),


Dk (y) = 15(y − 3) (mod 26).

Also, Dk (Ek (x)) = Dk (7x + 3) = 15(7x + 3 − 3) = 15(7x) ≡ x (mod 26).


The encryption of the word GARY would then proceed as follows

G : 7·6+3 = 45 ≡ 19 → T,
A : 7·0+3 = 3 ≡ 3 → D,
R : 7 · 17 + 3 = 122 ≡ 18 → S,
Y : 7 · 24 + 3 = 171 ≡ 15 → P.
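The encryption and decryption functions can be sketched in Python as follows (the names are mine; only upper-case letters are handled):

```python
def affine_encrypt(text, a, b):
    """E_k(x) = ax + b (mod 26) on upper-case letters."""
    return "".join(chr((a * (ord(ch) - 65) + b) % 26 + 65) for ch in text)

def affine_decrypt(text, a, b):
    """D_k(y) = a^(-1)(y - b) (mod 26); the inverse exists since gcd(a, 26) = 1."""
    a_inv = pow(a, -1, 26)
    return "".join(chr(a_inv * (ord(ch) - 65 - b) % 26 + 65) for ch in text)
```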

10.2.1 Cryptanalysis of the Affine Cipher

The cryptanalysis of the affine cipher is based on examining the frequency with which
cipher-text occurs. As such we require the frequency with which the letters occur in
everyday usage. Below is a table that shows the letters, their associated decimal number
and their frequency of occurrence in everyday English (plain-text).

Decimal Letter Frequency


0 A 0.082
1 B 0.015
2 C 0.028
3 D 0.043
4 E 0.127
5 F 0.022
6 G 0.020
7 H 0.061
8 I 0.070
9 J 0.002
10 K 0.008
11 L 0.040
12 M 0.024
13 N 0.067
14 O 0.075
15 P 0.019
16 Q 0.001
17 R 0.060
18 S 0.063
19 T 0.091
20 U 0.028
21 V 0.010
22 W 0.023
23 X 0.001
24 Y 0.020
25 Z 0.001

Consider the following block of text that was obtained from an affine cipher.

FMXVEDKAPHFENDRB
NDKRXRSREFMORUDS
DKDVSHVUFEDKAPRK
DLYEVLRHHRH

Counting the number of occurrences of the letters above we find that the most frequent
characters are : R (8 times), D (8 times), E,H,K (5 times) and F,V (4 times). Based on
this we guess that one of the most frequent characters in the cipher-text, R, represents

the most frequent character in plain-text, E. Next, we guess that the most frequent letter
after R in the cipher-text, D, represents the second most frequent letter in plain-text, T.
Therefore our guess is that E → R and T → D. This is the same as Ek (4) = 17 and
Ek (19) = 3 or a · 4 + b = 17 and a · 19 + b = 3. This implies that 15a ≡ −14 ≡ 12
(mod 26), so that 7 · 15 · a ≡ a ≡ 7 · 12 ≡ 6 (mod 26) and b = 19. This cannot be a valid
key as gcd(a, 26) ≠ 1.
After this we might guess that E → R and T → E, but this leads to the same sort of
problem as does E → R and T → H.
Our next best guess would be that E → R and T → K. Therefore Ek (4) = 17 and
Ek (19) = 10. This implies that 4a + b ≡ 17 (mod 26) and 19a + b ≡ 10 (mod 26). Thus
15a ≡ −7 ≡ 19 (mod 26), so that a ≡ 3 (mod 26) and b ≡ 5 (mod 26). Therefore our
encryption key would be (3, 5), which is a valid key since 3 is relatively prime to 26.
The decryption key corresponding to this is (3^{-1} , 5) = (9, 5) and then Dk (y) = 9(y − 5).
Applying this to the cipher-text we find the following.

ALGORITHMSAREQUI
TEGENERALDEFINIT
IONSOFARITHMETIC
PROCESSES

This corresponds to the text algorithms are quite general definitions of arithmetic processes.
Seeing that we have a piece of plain-text that “makes sense” we can assume that
we have the key.

10.3 Some Other Ciphers


Definition 10.3.1 (Block cipher). A block cipher transforms the message by encrypt-
ing blocks of a fixed number of characters.

Definition 10.3.2 (Polyalphabetic cipher). In a polyalphabetic cipher each source


symbol can map to one of several cipher-text symbols.

An example of a polyalphabetic cipher is the following.

10.3.1 The Vigenère Cipher


This cipher has the following properties.

• The symbols from A are identified with 0, 1, . . . , |A| − 1.



• A key(word) k = k0 k1 · · · kn−1 is a string in A^n .

• Encryption is performed on n character blocks of source by adding the keyword


(mod |A|) to the source characters.

Example 10.3.3. (Vigenère cipher)


A = {a, b, . . . , z} and k = golf. The encryption of a piece of plain-text (double eagle)
would then proceed as follows.

d o u b l e e a g l e
g o l f g o l f g o l
j c f g r s p f m z p

Therefore the plain-text double eagle is encrypted as jcfgrspfmzp.


To decrypt one subtracts the key from the cipher-text.

u b p r a g e k u z w t c h s w u i r m
g o l f g o l f g o l f g o l f g o l f
o n e m u s t f o l l o w t h r o u g h

In this case the cipher-text ubpragekuzwtchswuirm corresponds to the plain-text one must
follow through.
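A sketch of the cipher in Python (the function name and the sign convention for decryption are mine):

```python
def vigenere(text, key, sign=1):
    """Encrypt with sign=+1, decrypt with sign=-1 (lower-case letters only)."""
    a = ord("a")
    return "".join(
        chr((ord(ch) - a + sign * (ord(key[i % len(key)]) - a)) % 26 + a)
        for i, ch in enumerate(text))
```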

10.3.2 Cryptanalysis of the Vigenère Cipher : The Kasiski Ex-


amination
We now discuss a method that can be used to attack the Vigenère cipher. It was proposed
by Friedrich Kasiski in 1863. The discussion is based on the following piece of cipher-text
that is the result of a Vigenère cipher applied to a piece of English text. The plain-text
was first processed to remove all punctuation and spaces. The spaces in the cipher-text
were added to improve readability.

Offset Cipher-text
0 UPVZB BVUPN KKFOL OGAKU FBTKF LFXUJ VIPZV KFZXO FIDLO ONLUP
50 KKFUZ OMQFQ MQXKU AFIUP VVVVK KFDFL DMFIU PVVFI ZVTMU XDBZY
100 FVVYF ZTHBA ZQHEY LTXVU JVXFM IDRSQ EJNCI PVZZQ HQEYJ BZQHB
150 YHTWL OUWND OLVUJ VREZA JHTWW VPTZW VLVDM TROPV XWIMN KJBVE
200 FITKV XRQEL FZOBY HSMND TVFOJ DZQHB YLOOZ QTQXK UISLS LNLUP
250 RESWB HOEZQ HERVC MRWJV XWIMR LSISR WMIHF TZQHN CXUBV UJVXF
300 JZTOJ VXGJA REMMU GPEEG PEEWP BYHXI KHS

The idea of the Kasiski examination is to try and recover the key-length used in
the Vigenère cipher. This is accomplished by analysing the distances between identical
pieces of cipher-text : occasionally identical portions of plain-text will align with the same
fragment of the key producing exactly the same cipher-text. The possibility also exists for
“accidental” matches. That is identical fragments of cipher-text which did not result from
the same plain-text, but these tend to be rare especially with longer match lengths. In
the case of a “correct” match the key-length has to divide the difference in the positions
where these pieces of identical cipher-text occurred. Since accidental matches may also
occur it may not be enough to only examine the greatest common divisor of all distances.
In our example the fragment of cipher-text ZQH occurs in a number of places
in the table above; the corresponding offsets are : 110, 138, 146, 226,
258, 286. It is therefore likely that the keyword length, l, divides the difference of any
of these. The differences 138 − 110 = 28 = 2^2 · 7 and 226 − 110 = 116 = 2^2 · 29 suggest
that l divides gcd(2^2 · 7, 2^2 · 29) = 2^2 = 4. The situation with l = 1 corresponds to a simple shift.
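The divisibility check can be done mechanically; a small sketch (variable names ours) working from the offsets listed above:

```python
from math import gcd
from functools import reduce

# Offsets at which the fragment ZQH occurs in the cipher-text.
offsets = [110, 138, 146, 226, 258, 286]

# Differences relative to the first occurrence; the key-length should
# divide each "correct" one.  Accidental matches can drag the overall
# gcd down, which is why gcds of subsets are examined as well.
diffs = [b - offsets[0] for b in offsets[1:]]

g_all = reduce(gcd, diffs)       # gcd of all the differences
g_two = gcd(138 - 110, 226 - 110)  # the two differences used in the text
```

Both gcds come out to 2^2 = 4 here, consistent with a key-length that divides 4.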
At this point we may now examine the frequency distributions for each candidate
key-length. The idea is that, for the correct key-length, the letters at a fixed position
(mod l) were all combined with the same letter of the keyword, and so their frequency
distribution will mirror that of the plain-text. The table below shows the frequencies
that were found at each position of the key; that is, for l = 2 we
examine the letters in the odd and even positions separately.

l Offsets Frequencies of the letters in the sub-message (sorted)


2 0, 2, 4, . . . 19 16 12 12 11 10 9 8 8 7 6 6 5 5 5 5 5 4 4 3 2 2 2 1 0 0
1, 3, 5, . . . 14 14 13 10 10 10 9 9 9 8 8 7 6 6 5 5 5 3 3 3 2 2 2 2 1 0
4 0, 4, 8, . . . 12 10 10 7 6 6 6 5 5 4 2 2 2 1 1 1 1 1 1 1 0 0 0 0 0 0
1, 5, 9, . . . 10 9 8 7 5 5 5 5 5 5 4 3 3 2 2 2 1 1 1 0 0 0 0 0 0 0
2, 6, 10, . . . 12 11 8 8 7 5 5 5 4 4 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0
3, 7, 11, . . . 9 9 8 8 7 5 5 5 4 3 3 3 3 3 2 2 2 1 1 0 0 0 0 0 0 0

For a fixed source distribution each line of the correct key-length should reflect the
source frequencies. In this example, the frequencies suggest that l = 4 is more likely than
l = 2.
Since we know that the letter ‘e’ is the most frequent letter in everyday English, we
suspect that one of the higher-frequency cipher-text letters, in each line of the l = 4 case,
corresponds to the plain-text letter ‘e’. Now each line in the l = 4 case represents a letter
of the keyword. So if the highest frequency cipher-text letter in each row corresponded
to ‘e’, then all we would have to do is to subtract ‘e’ from the cipher-text letter to get the
corresponding keyword letter and so retrieve the keyword. Since we are not absolutely
sure that the highest-frequency cipher-text letter in each line corresponds to ‘e’,
our best strategy is to consider the 3 or 4 most frequent letters in each row (‘e’
should certainly be among them). The table below shows the four
most frequent letters in each row as well as the keyword letter that maps each of these
cipher-text letters back to ‘e’.

Cipher-text         Keyword (c − ‘e’ (mod 26))
F J U P      →      B F Q L
B V M I      →      X R I E
V Z R X      →      R V N T
K Q H L      →      G M D H

At this point one would do an exhaustive search using keywords that are built from
the likely letters shown above. That is one chooses a letter from the first row as the
possible first letter of the keyword, a letter from the second row for the second possible
letter of the keyword and so on. This requires 4^4 = 256 decryptions. If the keyword was
chosen from a dictionary this simplifies things a great deal. In our example this would be
words such as LEND and BIRD. Using BIRD on the cipher-text yields the following.

The water of the Gulf stretched out before her, gleaming with the million lights
of the sun. The voice of the sea is seductive, never ceasing, whispering,
clamoring, murmuring, inviting the soul to wander in abysses of solitude. All
along the white beach, up and down, there was no living thing in sight. A
bird with a broken wing was beating the air above, reeling, fluttering, circling
disabled down, down to the water.

The ZQH in the cipher-text used in the Kasiski examination corresponds to ‘ing’ in the
plain-text.

10.3.3 The Vernam Cipher


The Vernam cipher is an example of a stream cipher.

Definition 10.3.4 (Stream cipher (state cipher)). In a stream cipher the mapping
of a block may depend on its position in the message.

This cipher operates as follows.

• The alphabet A = {0, 1}.



• Keys are the same length as messages over A.

• A message m is encrypted as m ⊕ k, where ⊕ is componentwise (mod 2) addition.

• To decrypt the cipher c we add the key k to it : c ⊕ k = (m ⊕ k) ⊕ k = m.

If the key digits are the result of independent Bernoulli trials with probability 1/2
(and used only once), then the cipher is known as a one-time-pad and is unconditionally
secure against a cipher-text only attack. The reason for this is that every message of the
same length as the cipher-text maps to the cipher-text for some choice of key, and all keys
are equally likely.
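The encryption and decryption maps are literally the same XOR; a minimal sketch (names ours), with the key drawn by independent fair coin flips as the one-time pad requires:

```python
import secrets

def otp_apply(bits, key_bits):
    """XOR a bit sequence with the key; decryption is the same operation."""
    return [m ^ k for m, k in zip(bits, key_bits)]

# A message over A = {0, 1} and a uniformly random key of the same length.
message = [1, 0, 1, 1, 0, 0, 1, 0]
key = [secrets.randbelow(2) for _ in message]

cipher = otp_apply(message, key)
recovered = otp_apply(cipher, key)  # (m + k) + k = m (mod 2)
```

Since the key is uniform and used once, the cipher-text is itself uniform regardless of the message, which is the unconditional-security argument above.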
Chapter 11

Public Key Cryptography

This chapter is devoted to one of the most widely used forms of cryptography, namely
public key cryptography. The RSA cryptosystem, which is an example of this, is used by
millions of people around the world.
Definition 11.0.5 (Private key system). In a private key cryptosystem Dk is either
the same as Ek , or easy to get from it. If Ek is known the system is insecure. Therefore
Dk and Ek must be kept private.
Definition 11.0.6 (Public key system). In a public key system if Ek is known it is
(believed to be) computationally infeasible to use it to determine Dk . Therefore Ek can
be made public (for instance in a directory) and anyone can look up Ek when they want
to send a message.
The believed computational infeasibility of determining Dk makes the exchange of
keys unnecessary.
The idea of a public key system arose in 1976 in the work of Diffie and Hellman [?].
The first public key system was discovered in 1977 by Rivest, Shamir and Adleman [?]
(the RSA system).
Public key systems can never be unconditionally secure. If the adversary has the
cipher-text, they can use Ek to encrypt every possible piece of plain-text until a match
is found. This process might not be feasible in practice, but is nonetheless possible in
principle.
Definition 11.0.7 (One-way function). A function f : M → C is one-way if
• f (m) is “easy” to compute for all m ∈ M .

• for all (or most) c ∈ C it is “computationally infeasible” to find m ∈ M such that


f (m) = c.


It is not known whether one-way functions exist. A number of functions have been
identified that seem to be one-way.
Example 11.0.8. An example of a one-way function is found in the UNIX operating
system. Here the password to a user’s account is stored in a file that is readable to all
users of the system. The passwords are encrypted using a (what is believed to be) one-way
function.

Definition 11.0.9 (Trapdoor function). A one-way function is a trapdoor function if


it has the property that given some extra information, it becomes feasible to find m ∈ M
such that f (m) = c for a given c ∈ C.

So, in public key systems the extra information in a trapdoor function makes it possible
to find Dk .

11.1 The Rivest-Shamir-Adleman (RSA) Cryptosystem
This system consists of the following.

• An integer n = pq, where p and q are large, distinct primes.

• The message space and cipher space are the same : M = C = Zn .

• The key pair k = (a, b) where ab ≡ 1 (mod φ(n)) with φ(n) = φ(pq) = (p − 1)(q − 1)
the Euler phi function.

• The encryption function Ek (x) = x^b (mod n).

• The decryption function Dk (y) = y^a (mod n).

• n and b are public, while p, q and a are private.

Let’s compute Dk (Ek (x)) to see that we can recover an encrypted message. To do this
we need the following facts.

✗ ab ≡ 1 (mod φ(n)) ⇐⇒ ab = 1 + tφ(n).

✗ Euler’s Theorem : If gcd(x, n) = 1, then x^φ(n) ≡ 1 (mod n).



✗ If n = pq, then x1 ≡ x2 (mod n) ⇐⇒ x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q) :


Proof.
⇒:
x1 ≡ x2 (mod pq) implies that x1 = x2 +t·pq. Therefore x1 = x2 +(tq)p = x2 +(tp)q,
so that x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q).
⇐:
x1 ≡ x2 (mod p) and x1 ≡ x2 (mod q) implies that x1 − x2 = l1 p and x1 − x2 = l2 q.
Therefore q|l1 p so that q|l1 (p and q are different primes). Thus x1 − x2 = l3 pq, or
in other words x1 ≡ x2 (mod pq).

We have Dk (Ek (x)) = Dk (x^b ) = (x^b )^a (mod n). Here x^(ab) = x^(1+tφ(n)) = x · x^(tφ(n))
(mod n). Now there are two cases to consider.

1. gcd(x, n) = 1.
By Euler’s Theorem x^φ(n) ≡ 1 (mod n), so that x^(tφ(n)) = (x^φ(n) )^t ≡ 1^t ≡ 1 (mod n).
Therefore Dk (Ek (x)) = x · x^(tφ(n)) ≡ x (mod n).

2. gcd(x, n) > 1.
If x ≡ 0 (mod n) the claim is trivial, so assume not. Then, since n = pq, exactly one
of p and q divides x; say p|x and q ∤ x. Modulo p we have x ≡ 0, so x^(ab) ≡ 0 ≡ x
(mod p). Modulo q, Fermat’s Theorem gives x^(q−1) ≡ 1, and φ(q) = q − 1 divides
tφ(n) = t(p − 1)(q − 1), so x^(tφ(n)) ≡ 1 (mod q) and x^(ab) = x · x^(tφ(n)) ≡ x (mod q).
By the fact above, x^(ab) ≡ x (mod n). This gives Dk (Ek (x)) ≡ x
(mod n).

In summary then the RSA system operates as follows.

1. Generate two large distinct primes p and q.

2. Compute n = pq and φ(n) = (p − 1)(q − 1).

3. Choose a random b with 0 < b < φ(n) and gcd(b, φ(n)) = 1. The last requirement
ensures that b^(-1) (mod φ(n)) exists, which is needed in the next step.

4. Compute a = b^(-1) (mod φ(n)).

5. Publish b and n.

The choice of the b in step 3 is made using the Euclidean algorithm to check
whether gcd(b, φ(n)) = 1. The probability that a randomly chosen b is relatively prime to
φ(n) is φ(φ(n))/φ(n). Therefore we need to try about φ(n)/φ(φ(n)) different b’s before
one that is relatively prime to φ(n) is found.

The computation of a in step 4 can be done as a byproduct of the Euclidean algorithm


(using it in reverse).
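Running the algorithm in reverse is the extended Euclidean algorithm; a sketch (function names ours) returning the Bézout coefficients, from which b^(-1) (mod φ(n)) is read off:

```python
def egcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = egcd(b, a % b)
    # gcd(b, a % b) = s*b + t*(a - (a//b)*b), regrouped in terms of a and b.
    return g, t, s - (a // b) * t

def mod_inverse(b, m):
    """Return b^(-1) (mod m); requires gcd(b, m) = 1."""
    g, s, _ = egcd(b, m)
    if g != 1:
        raise ValueError("b is not invertible modulo m")
    return s % m
```

For the worked example below, mod_inverse(3533, 11200) returns 6597.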
Example 11.1.1. Suppose Bob chose his two primes to be p = 101 and q = 113. Then
n = 11413 and φ(n) = (p − 1)(q − 1) = (100)(112) = 11200 = 2^6 · 5^2 · 7. Since the prime
divisors of φ(n) are 2, 5 and 7, any b divisible by none of them is relatively prime to φ(n).
Let’s say Bob chooses b = 3533.
We now need to find a such that ab ≡ 1 (mod 11200). The Euclidean algorithm yields

11200 = 3533 × 3 + 601,


3533 = 601 × 5 + 528,
601 = 528 × 1 + 73,
528 = 73 × 7 + 17,
73 = 17 × 4 + 5,
17 = 5 × 3 + 2,
5 = 2 × 2 + 1.

The last nonzero remainder is the gcd (=1). We now use the results of the algorithm
in reverse to express 1 (the gcd) in terms of 11200 and 3533.

1 = 5 − 2 × 2,
= 5 − 2(17 − 5 × 3) = 7 × 5 − 2 × 17,
= 7(73 − 17 × 4) − 2 × 17 = 7 × 73 − 30 × 17,
= 7 × 73 − 30(528 − 73 × 7) = 217 × 73 − 30 × 528,
= 217(601 − 528) − 30(528) = 217 × 601 − 247 × 528,
= 217 × 601 − 247(3533 − 601 × 5) = 1452 × 601 − 247 × 3533,
= 1452(11200 − 3533 × 3) − 247 × 3533 = 1452 × 11200 − 4603 × 3533,
= 1452 × 11200 + (−4603) × 3533.

Therefore a = b^(-1) ≡ −4603 ≡ 6597 (mod 11200).


Bob will publish n = 11413 and b = 3533.
To send plain-text message 9726 to Bob, Alice computes

9726^3533 (mod 11413) = 5761.

Bob decrypts this as

5761^6597 (mod 11413) = 9726,



which equals the original message.
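The whole example can be replayed with fast modular exponentiation; a sketch (variable names ours) using Python's built-in three-argument pow, whose exponent −1 form (Python 3.8+) computes the modular inverse:

```python
# Replaying Example 11.1.1: Bob's parameters.
p, q = 101, 113
n = p * q                 # 11413
phi = (p - 1) * (q - 1)   # 11200
b = 3533                  # public exponent
a = pow(b, -1, phi)       # private exponent, 6597

x = 9726                  # Alice's plain-text
y = pow(x, b, n)          # cipher-text, 5761
recovered = pow(y, a, n)  # decryption, back to 9726
```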


For the RSA system to be effective we need fast methods to do the exponentiation.
The method we describe is known as “square and multiply.”
Suppose we want to compute x^b (mod n). Let the binary representation of b be equal
to bl bl−1 · · · b0 . That is b = b0 + 2b1 + 2^2 b2 + · · · + 2^l bl (here l = ⌈log2 b⌉ − 1). Therefore

x^b = x^(b0 + 2b1 + 2^2 b2 + ··· + 2^l bl) ,
    = x^b0 x^(2b1) x^(2^2 b2) · · · x^(2^l bl) ,
    = x^b0 (x^2 )^b1 (x^4 )^b2 · · · (x^(2^l) )^bl .

Thus we compute x, x^2 , x^4 , . . . , x^(2^l) and multiply together the terms corresponding to the
bits bi that equal 1.
Example 11.1.2. Compute 57^26 (mod 91).
26 = 11010 in binary.
We need

57^1 ≡ 57 (mod 91),
57^2 = 3249 ≡ 64 (mod 91),
57^4 = (57^2 )^2 ≡ 64^2 = 4096 ≡ 1 (mod 91),
57^8 = (57^4 )^2 ≡ 1^2 ≡ 1 (mod 91),
57^16 = (57^8 )^2 ≡ 1^2 ≡ 1 (mod 91).

Therefore 57^26 = 57^16 · 57^8 · 57^2 ≡ 1 · 1 · 64 ≡ 64 (mod 91).
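The method translates into a short loop over the bits of b, least significant first; a sketch (names ours):

```python
def square_and_multiply(x, b, n):
    """Compute x^b (mod n) by repeated squaring."""
    result = 1
    square = x % n
    while b > 0:
        if b & 1:                           # current bit of b is 1
            result = (result * square) % n  # multiply this power in
        square = (square * square) % n      # next power x^(2^i)
        b >>= 1
    return result
```

square_and_multiply(57, 26, 91) returns 64, matching the example, and the routine agrees with Python's built-in pow(x, b, n).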

11.1.1 Security of the RSA System


One of the most straightforward ways to attack the RSA system would be to try and
compute φ(n), from which the private exponent a = b^(-1) (mod φ(n)) could be found. In
this section we show that the level of difficulty involved in doing this is no lower than
that of trying to factor n.
Assume then that a cryptanalyst has both n and φ(n). Then by solving the set of
equations

n = pq,
φ(n) = (p − 1)(q − 1),

for the unknowns p and q we can factor n. If we substitute q = n/p into the second
equation, we obtain a quadratic in the unknown p

p^2 − (n − φ(n) + 1)p + n = 0.

The two roots of this equation will be p and q.


Therefore if a cryptanalyst knows φ(n) this can be used to factor n. Thus computing
φ(n) is no easier than factoring n.
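Solving the quadratic gives an immediate factoring routine; a sketch (names ours):

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover p and q from n = p*q and phi = (p-1)*(q-1)."""
    s = n - phi + 1              # p + q
    disc = s * s - 4 * n         # (p + q)^2 - 4pq = (p - q)^2
    r = isqrt(disc)
    assert r * r == disc, "phi is inconsistent with n"
    return (s + r) // 2, (s - r) // 2
```

With n = 11413 and φ(n) = 11200 from the RSA example this returns the factors 113 and 101.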

11.2 Probabilistic Primality Testing


In setting up an RSA system we need two large (on the order of 100 decimal digits)
primes. This section discusses a method for identifying candidate primes.

Definition 11.2.1 (Decision problem). A decision problem is a problem with a “yes”


or “no” answer.

Definition 11.2.2 (Probabilistic algorithm). A probabilistic algorithm is any algo-


rithm that uses random numbers.

Definition 11.2.3 (Yes-based Monte Carlo algorithm). A yes-based Monte Carlo


algorithm, for a decision problem, is a probabilistic algorithm in which a “yes” answer is
always correct but a “no” answer may be incorrect.

A yes-based Monte Carlo algorithm has an error probability of ε if the algorithm gives
an incorrect answer “no” (when the answer is really “yes”) with probability at most ε.
Here the probability is computed over all possible random choices made by the algorithm
when it is run with a given input.
The basic idea behind the identification of possible primes is as follows. Generate
random odd numbers and use an algorithm that answers “yes” or “no” to the question
“is n composite ?” A “yes” answer is always correct, but a “no” answer may be incorrect
with some probability ε < 1. Run the algorithm k times. If it ever answers “yes” then n
is composite. Otherwise n is prime with probability at least 1 − ε^k .

Theorem 11.2.4 (The Prime Number Theorem (1896)).

lim_{n→∞} π(n)/(n/ ln(n)) = 1,

where π(n) is the number of primes less than or equal to n.

From the Prime Number Theorem we see that the probability that a randomly chosen
integer k is prime is approximately

(k/ ln(k))/k = 1/ ln(k).

If only odd integers are considered, the probability is approximately 2/ ln(k). So on
average one could expect to test about ln(k)/2 = ln(√k) odd integers before a prime is
found.
If we need to find a prime with about 100 digits, then k ≈ 10^100 above, so that we
need to test about ln(√k) ≈ 115 integers on average before a prime is found.
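The estimate is easy to check numerically (a quick sketch; variable names ours):

```python
import math

# Expected number of random odd integers to test before hitting a prime,
# when looking for a prime around k = 10^100: ln(sqrt(k)) = ln(k)/2.
k = 10 ** 100
expected_tests = math.log(k) / 2   # about 115
```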
The algorithm for identifying possible primes that we will be discussing is known as
the Miller-Rabin algorithm. We introduce this algorithm by way of the following theorem.

Theorem 11.2.5 (Fermat’s Little Theorem). If n is prime and gcd(a, n) = 1, then
a^(n−1) ≡ 1 (mod n).

Suppose that n is an odd prime; then n − 1 is even, so that n − 1 can be written as
n − 1 = 2^k m, where k ≥ 1 and m is odd. Let a be an integer relatively prime to n.
By Fermat’s Little Theorem then a^(2^k m) ≡ 1 (mod n). That is (a^(2^(k−1) m) )^2 ≡ 1 (mod n).
Therefore a^(2^(k−1) m) ≡ ±1 (mod n). If it is congruent to +1, we can repeat the argument to
find that a^(2^(k−2) m) ≡ ±1 (mod n).
If at each step a^(2^(k−i) m) ≡ 1 (mod n), then eventually a^m ≡ ±1 (mod n).
The algorithm works in reverse by looking at the values a^m , a^(2m) , a^(2^2 m) , . . . , a^(2^(k−1) m)
and seeing whether they are congruent to −1 (mod n).
The Miller-Rabin algorithm is given as input an odd integer n and it answers the
question “is n composite ?” A “yes” answer is correct and a “no” answer may be wrong.
The algorithm consists of six steps.

1. Write n − 1 = 2^k m, where m is odd.

2. Choose a random a with 1 ≤ a ≤ n − 1.

3. Compute b = a^m (mod n).

4. If b ≡ 1 (mod n) then answer “prime” and stop.

5. For i = 0, 1, 2, . . . , k − 1
If b ≡ −1 (mod n), then answer “prime” and stop,
else replace b by b^2 , i by i + 1 and go to step 5.

6. Answer “composite” and stop.
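The six steps translate almost line for line into code. A sketch (names ours): is_witness answers the question "does a prove n composite ?", and repeating it with random choices of a gives the Monte Carlo test:

```python
import random

def is_witness(a, n):
    """Return True if a proves the odd number n composite (steps 1-6)."""
    k, m = 0, n - 1
    while m % 2 == 0:          # step 1: n - 1 = 2^k * m with m odd
        k += 1
        m //= 2
    b = pow(a, m, n)           # step 3
    if b == 1:                 # step 4: answer "prime"
        return False
    for _ in range(k):         # step 5
        if b == n - 1:         # b = -1 (mod n): answer "prime"
            return False
        b = (b * b) % n
    return True                # step 6: answer "composite"

def miller_rabin(n, rounds=20):
    """True if n is probably prime; error probability at most 0.25^rounds."""
    return not any(is_witness(random.randrange(1, n), n)
                   for _ in range(rounds))
```

By Theorem 11.2.6 a prime n is never declared composite, so miller_rabin(n) is always True for prime n; for composite n a witness is found in each round with probability at least 3/4.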

Theorem 11.2.6. The Miller-Rabin algorithm is a yes-based Monte-Carlo algorithm for


testing if n is composite.

Proof.
Assume the algorithm answers “composite” for some prime integer n. We will obtain a
contradiction.
Since the answer is “composite”, it must be the case that a^m ≢ 1 (mod n). In step 5
of the algorithm, the sequence of values tested is

b = a^m , b^2 = a^(2m) , b^4 = a^(2^2 m) , b^8 = a^(2^3 m) , . . . , b^(2^(k−1)) = a^(2^(k−1) m) .

Since the algorithm answers “composite” we know that

a^(2^i m) ≢ −1 (mod n), i = 0, 1, 2, . . . , k − 1.

The fact that n is prime implies that a^(n−1) = a^(2^k m) ≡ 1 (mod n), so that (a^(2^(k−1) m) )^2 ≡ 1
(mod n). Thus a^(2^(k−1) m) ≡ ±1 (mod n). We know that a^(2^(k−1) m) ≢ −1 (mod n), so
a^(2^(k−1) m) ≡ 1 (mod n). Repeating the argument k − 1 more times we find that a^m ≡ 1 (mod n), a
contradiction.

Fact The error probability in the Miller-Rabin Algorithm is less than or equal to 0.25.
Chapter 12

The Rabin Cryptosystem

The second example of a public key cryptosystem that we will consider is the Rabin
Cryptosystem [?]. This system has the following features.

• It uses two distinct primes p and q each of which is congruent to 3 (mod 4). Let
n = pq.

• The message space and cipher space are the same : M = C = Zn .

• Further, it has an encryption key B, chosen so that 0 ≤ B ≤ n − 1.

• The encryption function,

Ek (x) = x(x + B) (mod n).

• The decryption function,

Dk (y) = √(B^2 /4 + y) − B/2 (mod n),

where B^2 /4 means 4^(-1) B^2 and √t denotes any number s such that s^2 ≡ t (mod n).

• n and B are made public, p and q are private.

For this system to be of use we need to know that there are infinitely many primes of
the form 4k + 3 (i.e. primes that are congruent to 3 (mod 4)). This can be seen in one of
two ways. The first is Dirichlet’s Theorem and the second is a direct proof, both of which
are shown below.

Theorem 12.0.7 (Dirichlet’s Theorem). The sequence ak + b, k = 0, 1, 2, . . . , con-


tains infinitely many primes if and only if gcd(a, b) = 1.


Since gcd(4, 3) = 1, we know that there are infinitely many primes of the form 4k + 3.
We now show directly that there are infinitely many primes congruent to 3 (mod 4) :
Assume that there are only finitely many primes congruent to 3 (mod 4), say p0 = 3,
p1 = 7, p2 = 11, . . . , pn . Consider N = 4p1 p2 · · · pn + 3 (note that p0 is not included in
this product). We know that if p and q are both congruent to 1 (mod 4), then so is their
product. Therefore a number congruent to 3 (mod 4) must have a prime divisor that
is congruent to 3 (mod 4) (if all were congruent to 1 (mod 4), the number itself would
be congruent to 1 (mod 4)). We have N ≡ 3 (mod 4), so that N has a prime divisor
congruent to 3 (mod 4). On the other hand if 3|N, then 3|4p1 p2 · · · pn and since 3 is
prime this implies that 3|4 or 3|pi for some i ≥ 1, a contradiction. Thus 3 6 | N . Also, if
i ≥ 1 then pi |N and pi|4p1 p2 · · · pn implies that pi|3, a contradiction. That is there exists
a prime congruent to 3 (mod 4) not among p0, p1 , . . . , pn, a contradiction. Therefore the
number of primes congruent to 3 (mod 4) is infinite.
It turns out that the encryption function is not one-to-one. To see this we need the
following two results.

Theorem 12.0.8 (Chinese Remainder Theorem). The system of congruences

x ≡ a1 (mod n1 ),
x ≡ a2 (mod n2 ),
..
.
x ≡ at (mod nt ),

where gcd(ni , nj ) = 1 if i ≠ j, has a unique solution (mod n1 n2 · · · nt ). This solution is
x = a1 M1 y1 + a2 M2 y2 + · · · + at Mt yt , where Mi = (n1 n2 · · · nt )/ni and yi = Mi^(-1) (mod ni ),
1 ≤ i ≤ t.
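The constructive formula in the theorem is short to implement; a sketch (names ours), using Python 3.8's pow(M, -1, n) for the inverses y_i:

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise coprime moduli n_i."""
    N = prod(moduli)
    x = 0
    for a, ni in zip(residues, moduli):
        M = N // ni                # M_i = (n_1 ... n_t) / n_i
        y = pow(M, -1, ni)         # y_i = M_i^(-1) (mod n_i)
        x += a * M * y
    return x % N
```

For instance crt([4, 1], [7, 11]) gives 67, which is the combination worked out by hand in Example 12.0.14 below.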

Proposition 12.0.9. If p and q are distinct odd primes, then the congruence x^2 ≡ 1
(mod pq) has exactly four solutions (mod pq).

Proof.
x^2 ≡ 1 (mod pq) ⇐⇒ x^2 − 1 ≡ 0 (mod pq) ⇐⇒ x^2 − 1 = kpq, k ∈ Z ⇐⇒ p|(x^2 − 1)
and q|(x^2 − 1) ⇐⇒ x^2 ≡ 1 (mod p) and x^2 ≡ 1 (mod q).
Now x^2 − 1 = (x − 1)(x + 1), so p|(x^2 − 1) ⇐⇒ p|(x − 1)(x + 1) ⇒ p|(x − 1) or
p|(x + 1). That is x ≡ 1 (mod p) or x ≡ −1 (mod p). Similarly x ≡ ±1 (mod q).

Consider the four systems of congruences.

x≡1 (mod p) and x≡1 (mod q),


x≡1 (mod p) and x ≡ −1 (mod q),
x ≡ −1 (mod p) and x≡1 (mod q),
x ≡ −1 (mod p) and x ≡ −1 (mod q).

By the Chinese Remainder Theorem each system has a unique solution (mod pq) and each
solution leads to a solution of x^2 ≡ 1 (mod pq). Since p and q are odd primes, 1 ≢ −1,
so the four solutions are all distinct.

Example 12.0.10. If n = 15 = 3 × 5, then 1 ≡ 1, −1 ≡ 14, 4 ≡ 4 and −4 ≡ 11 are the


square roots of 1 (mod 15).
We are now in a position to show that the encryption function is not one-to-one. Let
x ∈ Zn and ω be one of the four square roots of 1 (mod n), where n = pq. Then

Ek (ω(x + B/2) − B/2)
  = [ω(x + B/2) − B/2][ω(x + B/2) − B/2 + B],
  = [ω(x + B/2) − B/2][ω(x + B/2) + B/2],
  = ω^2 (x + B/2)^2 − B^2 /4 ; ω^2 = 1,
  = x^2 + xB = x(x + B) = Ek (x).

This implies that there are four different plain-texts that encrypt to the same cipher-text
as x.
Next we treat the decryption process. The receiver is given a cipher-text y and wants
to determine x such that x^2 + Bx ≡ y (mod n). To simplify notation let x1 = x + B/2,
that is x = x1 − B/2. The congruence then becomes

y ≡ (x1 − B/2)^2 + B(x1 − B/2),
  = x1^2 + B^2 /4 − B^2 /2,
  = x1^2 − B^2 /4 (mod n).

Therefore

x1^2 ≡ y + B^2 /4 (mod n).

By letting C = y + B^2 /4 we get

x1^2 ≡ C (mod n).

This is equivalent to the system

x1^2 ≡ C (mod p),
x1^2 ≡ C (mod q).

Each congruence in the system has zero or two solutions. These can be combined, as
before, to get up to four solutions (mod pq).
To determine the solutions to the congruences above we need the following concept
and theorem.

Definition 12.0.11 (Quadratic residue). Let m be a prime number. A number a ≢ 0
(mod m) is called a quadratic residue (mod m) if there exists an x such that x^2 ≡ a
(mod m). Otherwise a is a quadratic non-residue (mod m).

Example 12.0.12. If m = 7, the quadratic residues are

1^2 ≡ 1,
2^2 ≡ 4,
3^2 ≡ 2,
(−2)^2 ≡ 4,
(−3)^2 ≡ 2,
(−1)^2 ≡ 1.

As an aside we note that if m is prime then there exist (m − 1)/2 quadratic residues
and (m − 1)/2 quadratic non-residues.
One way to determine if a is a quadratic residue (mod m) is to use Euler’s criterion.
Theorem 12.0.13 (Euler’s Criterion). Let m be prime. Then a number a ≢ 0 (mod m)
is a quadratic residue (mod m) ⇐⇒ a^((m−1)/2) ≡ 1 (mod m).
Note that if a is a quadratic non-residue then a^((m−1)/2) ≡ −1 (mod m).
Euler’s Criterion only answers yes or no as to whether there exists an x such that
x^2 ≡ a (mod m). It does not say how to find this x. If our prime, m, is of a specific form
then the determination of this x becomes easy.

If m is prime and m ≡ 1 (mod 4), then there is no simple closed-form expression for the
square roots, i.e. the x, (mod m); this case requires more involved algorithms.
On the other hand when m is a prime and m ≡ 3 (mod 4), the square roots are easy
to find. They are just ±a^((m+1)/4) . To check this we compute

(±a^((m+1)/4) )^2 ≡ a^((m+1)/2) ,
               ≡ a · a^((m−1)/2) ,
               ≡ a (mod m),

as long as a is a quadratic residue (mod m).

Therefore the square roots of a are ±a^((m+1)/4) (mod m). Note that we are using the
fact that m ≡ 3 (mod 4) when we calculate the exponent (m + 1)/4.
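For m ≡ 3 (mod 4) the square root is thus a single modular exponentiation; a sketch (names ours):

```python
def sqrt_mod(a, p):
    """Square roots of a (mod p) for a prime p with p = 3 (mod 4).

    Returns the pair (r, p - r); a must be a quadratic residue (mod p).
    """
    assert p % 4 == 3
    r = pow(a, (p + 1) // 4, p)
    return r, p - r
```

For instance sqrt_mod(23, 7) returns (4, 3), the pair of roots ±4 of 23 (mod 7) used in Example 12.0.14.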
Returning to our original problem we have

x1^2 ≡ C (mod p),
x1^2 ≡ C (mod q).

Using the above procedure we find the two square roots of C (mod p) and the two square
roots (mod q). These may then be combined using the Chinese Remainder Theorem to
find the square roots of C (mod n).
Recall that we have

x = x1 − B/2,
C = y + B^2 /4.

We know that

x1 = √C = √(y + B^2 /4),

so that

x = √(y + B^2 /4) − B/2 = Dk (y).

The four square roots of C (mod n) lead to the four possibilities for x.
Example 12.0.14. n = 7 × 11 = 77 and B = 9.
The encryption function is Ek (x) = x^2 + Bx = x^2 + 9x (mod 77).

The decryption function is

Dk (y) = √(y + 4^(-1) B^2 ) − 2^(-1) B,
       = √(y + 4^(-1) · 81) − 2^(-1) · 9,
       = √(y + 4^(-1) · 4) − 39 · 9,
       = √(y + 1) − 43 (mod 77).

Suppose the cipher-text 22 is received. Then Dk (22) = √23 − 43 (mod 77). We
therefore need to find √23 (mod 77). Both 7 and 11 are congruent to 3 (mod 4). Therefore

23^((7+1)/4) ≡ 23^2 ≡ 2^2 ≡ 4 (mod 7),

23^((11+1)/4) ≡ 23^3 ≡ 1^3 ≡ 1 (mod 11).

Thus the square roots of 23 (mod 7) are ±4 and the square roots of 23 (mod 11) are ±1.
We use this to get the four square roots of 23 (mod 77).
From 4 ≡ √23 (mod 7) and 1 ≡ √23 (mod 11) we get x1 ≡ √23 (mod 77), where

x1 = 4 × 11 × y1 + 1 × 7 × y2 (mod 77),

and

y1 ≡ 11^(-1) ≡ 2 (mod 7),
y2 ≡ 7^(-1) ≡ 8 (mod 11).

Therefore x1 = 4 × 11 × 2 + 1 × 7 × 8 ≡ 144 ≡ −10 (mod 77).


Similarly, the other square roots of 23 (mod 77) can be found to be 10, ±32.
The four possible plain-texts are found by subtracting B/2 ≡ 43 (mod 77). They are

10 − 43 ≡ 44 (mod 77),
67 − 43 ≡ 24 (mod 77),
32 − 43 ≡ 66 (mod 77),
45 − 43 ≡ 2 (mod 77).
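Example 12.0.14 can be replayed end to end; a sketch (names ours) that takes the two square roots modulo each prime, combines them with the Chinese Remainder Theorem, and subtracts B/2:

```python
def crt2(a1, p, a2, q):
    """Combine x = a1 (mod p) and x = a2 (mod q) into x (mod p*q)."""
    y1 = pow(q, -1, p)
    y2 = pow(p, -1, q)
    return (a1 * q * y1 + a2 * p * y2) % (p * q)

def rabin_decrypt(y, p, q, B):
    """All four decryptions of y under Ek(x) = x(x + B) (mod p*q)."""
    n = p * q
    c = (y + pow(4, -1, n) * B * B) % n     # C = y + B^2/4
    half_B = (pow(2, -1, n) * B) % n        # B/2
    rp = pow(c, (p + 1) // 4, p)            # square root of C mod p
    rq = pow(c, (q + 1) // 4, q)            # square root of C mod q
    roots = {crt2(sp, p, sq, q)             # four CRT combinations
             for sp in (rp, p - rp) for sq in (rq, q - rq)}
    return sorted((r - half_B) % n for r in roots)

plaintexts = rabin_decrypt(22, 7, 11, 9)
```

Here plaintexts comes out as [2, 24, 44, 66], and each of these encrypts back to the cipher-text 22.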

12.1 Security of the Rabin Cryptosystem


The aim of this section is to show that being in possession of an algorithm that decrypts
(maybe only part of the) cipher-text from a Rabin cryptosystem is equivalent to factoring

n. Since it is suspected that factorisation is “difficult,” one can expect that the design
of such a decryption algorithm will be at least as hard as designing an algorithm that
factors n.
So suppose that the attacker has a decryption algorithm A. We may then proceed as
follows.

1. Choose a random r, 1 ≤ r ≤ n − 1.

2. Compute y = r^2 − B^2 /4 (mod n).

3. Use A to obtain a decryption x.

4. Let x1 = x + B/2.

5. If x1 ≡ ±r (mod n) the algorithm fails, else gcd(x1 + r, n) = p or q — a success.

We will see below that this algorithm factors n with probability at least 1/2. First we
explain how it works.
In step 2 we have

Ek (r − B/2) = (r − B/2)(r − B/2 + B),
             = (r − B/2)(r + B/2),
             ≡ r^2 − B^2 /4 (mod n),
             = y.

So y is the encryption of r − B/2, and the x obtained in step 3 is some decryption of y
(not necessarily r − B/2 itself).
In step 4, x1^2 ≡ (x + B/2)^2 ≡ r^2 (mod n). That is x1 ≡ ±r (mod n) or x1 ≡ ±ωr
(mod n), where ω ≠ ±1 is a square root of 1 (mod n). This also says that x1^2 − r^2 ≡ 0
(mod n), that is n|(x1 − r)(x1 + r).
From this we see why the algorithm fails if x1 ≡ ±r (mod n) : if this happens, then one
of the two factors that n divides is ≡ 0 (mod n) and this does not give us any useful information.
On the other hand if x1 ≡ ±ωr (mod n), then n does not divide either of the factors
(x1 − r) or (x1 + r). Thus computing gcd(x1 + r, n) (or gcd(x1 − r, n)) must give p or q.
We now turn to estimating the probability of success of the algorithm. To do this we
consider all possible n − 1 choices for the random r in step 1.
Define an equivalence relation on Zn \ {0} by r1 ∼ r2 ⇐⇒ r1^2 ≡ r2^2 (mod n). The
equivalence classes all have 4 elements : the equivalence class of r is {±r, ±ωr} and these

are all distinct by Proposition 12.0.9. Note that any two values in the same equivalence
class give the same value for y in step 2. In step 3, given y the decryption algorithm
returns x. By the calculation in step 4, x1 is a member of the equivalence class of r. If it
is ±r, the algorithm fails. If it is ±ωr the algorithm factors n as explained above.
Since r is chosen at random it is equally likely to be any of the four members of its
equivalence class. Two of the four members lead to success, therefore the probability of
success is at least 1/2.
Chapter 13

Factorisation Algorithms

The purpose of this chapter is to discuss some of the algorithms available for attempting
to factor a given integer.

13.1 Pollard’s p − 1 Factoring Method (ca. 1974)


Suppose the odd number n is to be factored and let p be a prime divisor of n. This
method uses the following result.

Proposition 13.1.1. If every prime power q with q|(p − 1) satisfies q ≤ B, then (p − 1)|B!.

Proof.
Suppose p − 1 has the prime factorisation

p − 1 = p1^α1 p2^α2 · · · pk^αk ,
     = q1 q2 · · · qk , where qi = pi^αi .

Since pi ≠ pj if i ≠ j, we have gcd(qi , qj ) = 1 if i ≠ j.
By hypothesis, qi ≤ B for i = 1, 2, . . . , k. Therefore q1 , q2 , . . . , qk all appear as distinct
factors in B! = 1 · 2 · 3 · · · B. Thus p − 1 = q1 q2 · · · qk |B!.
At this point we assume that we have found the B in the proposition above.
Let a ≡ 2^(B!) (mod n); then a ≡ 2^(B!) (mod p). By Fermat’s Theorem we know 2^(p−1) ≡ 1
(mod p). Since (p − 1)|B!, a ≡ 2^(B!) ≡ (2^(p−1) )^t ≡ 1^t ≡ 1 (mod p). Therefore p|(a − 1);
furthermore p|n. These two statements together imply p| gcd(a − 1, n).
We now present the algorithm. It has as input an integer n (the integer to be factored)
and an integer B (the “bound” on the size of the prime divisors of p − 1).

1. Compute a ≡ 2^(B!) (mod n).


2. Set d = gcd(a − 1, n).

3. If 1 < d < n then d|n (success) else no factor of n is found.
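The three steps are a few lines of code; a sketch (names ours), building 2^(B!) (mod n) by successive exponentiations exactly as in the example below:

```python
from math import gcd

def pollard_p_minus_1(n, B):
    """Pollard's p-1 method: return a nontrivial factor of n, or None."""
    a = 2
    for j in range(2, B + 1):
        a = pow(a, j, n)       # after the loop, a = 2^(B!) (mod n)
    d = gcd(a - 1, n)
    if 1 < d < n:
        return d
    return None
```

pollard_p_minus_1(36259, 5) returns None, while pollard_p_minus_1(36259, 10) finds the factor 101, matching Example 13.1.2.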

Example 13.1.2. Let’s apply the algorithm to n = 36259 and B = 5. We need to calculate
2^(5!) (mod 36259). This can be done in five steps.

2^(1!) ≡ 2 (mod 36259),
2^(2!) = (2^(1!) )^2 ≡ 4 (mod 36259),
2^(3!) = (2^(2!) )^3 = 4^3 ≡ 64 (mod 36259),
2^(4!) = (2^(3!) )^4 = 64^4 ≡ 25558 (mod 36259),
2^(5!) = (2^(4!) )^5 = 25558^5 ≡ 22719 (mod 36259).

Therefore a ≡ 22719 (mod 36259). Now we need gcd(a − 1, n) = gcd(22718, 36259) = 1.


Since d = 1 in the algorithm we have failed to find a factor of n.
Next we try a bigger B, namely B = 10. So we need 2^(10!) (mod 36259). We calculate
in the same manner as above and find that

2^(6!) ≡ 34839 (mod 36259),
2^(7!) ≡ 7207 (mod 36259),
2^(8!) ≡ 21103 (mod 36259),
2^(9!) ≡ 25536 (mod 36259),
2^(10!) ≡ 25251 (mod 36259).

That is a ≡ 25251 (mod 36259). So gcd(a − 1, n) = gcd(25250, 36259) = 101. From this
we find that n = 36259 = 101 × 359.
Note that in the example we have p = 101 so that p − 1 = 100 = 2^2 · 5^2 . By the
proposition we need B to be at least 25 for the algorithm to be guaranteed to work. In
our case B = 10 worked, so the algorithm apparently does not need B to be as big as
required by the proposition. The proposition just says that if B is this big the algorithm
will work.
Chapter 14

The ElGamal Cryptosystem

In this chapter we introduce yet another public key cryptosystem. It is known as the
ElGamal Cryptosystem [?] and relies on discrete logarithms which we introduce first.

14.1 Discrete Logarithms (a.k.a. indices)


Let p be a prime. Then Zp is a field and Z∗p = Zp \{0} is a cyclic group under multiplication
(mod p).

Definition 14.1.1 (Primitive element (primitive root)). A generator of Z∗p is called


a primitive element of Z∗p (or also a primitive root of p).

So if a is a primitive element of Z∗p , then a, a^2 , a^3 , . . . , a^(p−1) (mod p) are just the el-
ements 1, 2, 3, . . . , p − 1 in some order. That is for each x ∈ Z∗p there is a number
e ∈ {1, 2, 3, . . . , p − 1} such that a^e ≡ x (mod p).

Definition 14.1.2 (Discrete Logarithm). We call e the discrete logarithm (or index)
of x with respect to a if a^e ≡ x (mod p) and denote it by loga (x).

The problem of finding loga (x) in Zp is generally regarded as being difficult. Modular
exponentiation is easy, but its inverse — discrete logarithm — is not. That is modular
exponentiation is believed to be a one-way function.

14.2 The system


The ElGamal cryptosystem has the following features.

✗ It is a public key system.


✗ It is based on the difficulty of finding discrete logarithms.

✗ It is non-deterministic : it involves a random integer k chosen by the sender.

The details of the system are as follows.

• Choose a prime p such that the determination of discrete logarithms in Zp is difficult.

• Choose a generator α of Z∗p , there are φ(p − 1) choices.

• The message space M = Z∗p .

• The cipher space C = Z∗p × Z∗p .

• The keys k = (p, α, e, β), where β = α^e (mod p), that is logα (β) = e.

• p, α and β are made public while e is kept private.

• The encryption function

Ek (x) = (y1 , y2 ),

where

y1 = α^k (mod p),
y2 = xβ^k (mod p),

and k is a random integer in {0, 1, 2, . . . , p − 1}.

• The decryption function Dk (y1 , y2 ) = y2 (y1^e )^(-1) (mod p).

Firstly let’s check that decryption works. So with y1 and y2 as given above we find

Dk (y1 , y2 ) = y2 (y1^e )^(-1) (mod p),
            = y2 ([α^k ]^e )^(-1) (mod p),
            = y2 ([α^e ]^k )^(-1) (mod p),
            = y2 (β^k )^(-1) (mod p),
            = (xβ^k )(β^k )^(-1) ≡ x (mod p).

Example 14.2.1. Let p = 23 and α = 5 in an ElGamal cryptosystem. If e = 9, then

β ≡ α^e ≡ 5^9 ≡ 11 (mod 23).

To send the plain-text x = 7 :

• Choose a random k ∈ {0, 1, 2, . . . , 22}. Suppose k = 13.

• Then compute

y1 ≡ 5^13 ≡ 21 (mod 23),
y2 ≡ 7 · 11^13 ≡ 4 (mod 23),

and send (21, 4).


On the receiving end suppose now that (21, 4) is received. So firstly we compute
y1^e ≡ 21^9 ≡ 17 (mod 23). Now we need (y1^e)^{−1} = 17^{−1} in Z∗23 . To do this we use the
Euclidean algorithm.

23 = 1 × 17 + 6,
17 = 2 × 6 + 5,
6 = 1 × 5 + 1.

Working backwards through the results of the algorithm we find.

1 = 5 − 6,
= 6 − (17 − 2 × 6) = 3 × 6 − 17,
= 3(23 − 17) − 17 = 3 × 23 − 4 × 17.

From the last step we find −4 × 17 = 1 + (−3) × 23 ≡ 1 (mod 23). That is 17^{−1} ≡ −4 ≡ 19
(mod 23).
To complete decryption we compute y2 (y1^e)^{−1} = 4 × 19 ≡ 76 ≡ 7 (mod 23).
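The exchange in Example 14.2.1 can be replayed in a few lines. This is a sketch of textbook ElGamal only (no padding, no parameter checks; the function names are ours), and it computes the inverse with Fermat's little theorem rather than the Euclidean algorithm used above.

```python
def elgamal_encrypt(p, alpha, beta, x, k):
    # E_k(x) = (alpha^k mod p, x * beta^k mod p)
    return pow(alpha, k, p), x * pow(beta, k, p) % p

def elgamal_decrypt(p, e, y1, y2):
    # D_k(y1, y2) = y2 * (y1^e)^(-1) mod p; the inverse is computed as
    # (y1^e)^(p-2) mod p by Fermat's little theorem (p prime).
    return y2 * pow(pow(y1, e, p), p - 2, p) % p

p, alpha, e = 23, 5, 9
beta = pow(alpha, e, p)                   # 11
cipher = elgamal_encrypt(p, alpha, beta, 7, 13)
print(cipher)                             # (21, 4)
print(elgamal_decrypt(p, e, *cipher))     # 7
```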

14.3 Attacking the ElGamal System by Computing


Discrete Logarithms
This section will discuss one algorithm that can be used to compute discrete logarithms
and in so doing break the ElGamal system. The algorithm is due to Shanks.
Let p be a prime and α a generator for Z∗p . In the ElGamal system we are given β and
we want to find e = logα β. Let m = ⌈√(p − 1)⌉. By the division algorithm we can write

e = mj + i, 0 ≤ i ≤ m − 1.

Since m = ⌈√(p − 1)⌉ and 0 ≤ e ≤ p − 2, we have 0 ≤ j ≤ m − 1. Now β ≡ α^e (mod p) ⇐⇒
β ≡ α^{mj+i} (mod p) ⇐⇒ βα^{−i} ≡ α^{mj} (mod p). This is the basis of Shanks’ algorithm.

1. Compute α^{mj} (mod p), 0 ≤ j ≤ m − 1, and store the pairs (j, α^{mj}) in a list sorted
by increasing value of the second coordinate. The reason for storing the numbers in
this way is to simplify searching through the list later.

2. Compute βα^{−i} (mod p), 0 ≤ i ≤ m − 1, and store the pairs (i, βα^{−i}) in a list sorted
by increasing second coordinate.

3. Find a pair (j, y) in the list from step 1 and a pair in the list from step 2 having the
same second coordinate.

4. e = logα β = mj + i (mod p − 1).

Note that in the last step we are computing (mod p − 1). The reason for this is that the
generator α has order p − 1, so exponents of α only matter modulo p − 1.
Example 14.3.1. Let p = 23 and α = 5. We would like to find log5 (11).
m = ⌈√(p − 1)⌉ = ⌈√22⌉ = 5, so that m − 1 = 4.
We compute the following.

5^{0·5} ≡ 1 (mod 23),
5^{1·5} ≡ 20 (mod 23),
5^{2·5} ≡ 9 (mod 23),
5^{3·5} ≡ 19 (mod 23),
5^{4·5} ≡ 12 (mod 23).

Therefore list 1 is (0, 1), (2, 9), (4, 12), (3, 19), (1, 20).
Next we need to compute 11 · 5^{−i}, 0 ≤ i ≤ 4. To do this note that 5^{22} ≡ 1 (mod 23).
Therefore 5^{−i} ≡ 5^{22−i} (mod 23).

11 · 5^{−0} ≡ 11 · 1 ≡ 11 (mod 23),
11 · 5^{−1} ≡ 11 · 5^{21} ≡ 11 · 14 ≡ 16 (mod 23),
11 · 5^{−2} ≡ 11 · 5^{20} ≡ 11 · 12 ≡ 17 (mod 23),
11 · 5^{−3} ≡ 11 · 5^{19} ≡ 11 · 7 ≡ 8 (mod 23),
11 · 5^{−4} ≡ 11 · 5^{18} ≡ 11 · 6 ≡ 20 (mod 23).

This gives list 2 as (3, 8), (0, 11), (1, 16), (2, 17), (4, 20).

Scanning through the lists we find (1, 20) in the first list and (4, 20) in the second list,
giving

e = log5 (11) = 1 × m + 4 = 1 × 5 + 4 ≡ 9 (mod 22).
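Shanks' algorithm is short to implement. The sketch below (the function name is ours) uses a dictionary in place of a sorted list, which serves the same matching purpose as step 3.

```python
from math import isqrt

def shanks_dlog(p, alpha, beta):
    """Find e with alpha^e ≡ beta (mod p), alpha a generator of Z_p^*."""
    m = isqrt(p - 1)
    if m * m < p - 1:                # m = ceil(sqrt(p - 1))
        m += 1
    # List 1: store alpha^(mj) -> j for 0 <= j <= m-1.
    giant = {}
    step, val = pow(alpha, m, p), 1
    for j in range(m):
        giant.setdefault(val, j)
        val = val * step % p
    # List 2: compute beta * alpha^(-i) and look for a match.
    inv_alpha = pow(alpha, p - 2, p)  # alpha^(-1) mod p
    val = beta % p
    for i in range(m):
        if val in giant:
            return (giant[val] * m + i) % (p - 1)
        val = val * inv_alpha % p
    return None

print(shanks_dlog(23, 5, 11))   # 9
```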


Chapter 15

Elliptic Curve Cryptography

This chapter discusses a generalisation of the ElGamal cryptosystem, namely elliptic curve
cryptography. This was proposed by Koblitz and Miller [?, ?].

15.1 The ElGamal System in Arbitrary Groups


Let G be a group and α ∈ G. Let H = ⟨α⟩, the cyclic subgroup of G generated by α.
For β ∈ H, we can define logα β = k ⇐⇒ α^k = β and 0 ≤ k ≤ |H| − 1. If the
problem of computing logα β from G, α and β is “hard” we can set up a cryptosystem as
follows.

• Let (G, ◦) be a finite group and α ∈ G.

• Let H = ⟨α⟩ such that the problem of computing logα β (β ∈ H) is hard.

• The message space M = G.

• The cipher space C = G × G.

• The keys k = (G, α, e, β), where β = α^e , that is logα β = e.

• α and β are made public, while e is kept private.

• For x ∈ G the encryption function is Ek (x) = (y1 , y2 ), where

y1 = α^k ,
y2 = x ◦ β^k ,

and k is a secret random number with 0 ≤ k ≤ |G| − 1.


• The decryption function is Dk (y1 , y2 ) = y2 ◦ (y1^e)^{−1} .

Note that x^k = x ◦ x ◦ · · · ◦ x (k times).


The proof that encryption is one-to-one is exactly the same as in the case of the original
ElGamal system.
An important point to note is that care is needed when choosing the group G and
α ∈ G. The following illustrates this point.
Let G = (Zn , +) and α a generator of Zn , that is gcd(α, n) = 1. Since exponentiation
is repeated application of the group operation we have “x^k” = x + x + · · · + x = kx in
this group. Therefore finding logα β means finding k such that kα ≡ β (mod n). That is
k = α^{−1}β; here α^{−1} exists since gcd(α, n) = 1 and can be found by using the Euclidean
algorithm.
Example 15.1.1. Let G = Z30 and α = 3. Then H = {0, 3, 6, 9, . . . , 27}. Suppose e = 4
is chosen; then β = “α^e” = 4 × 3 = 12. To send the plain-text 17 we proceed as follows.

• Choose a random k ∈ {0, 1, 2, . . . , 29}, say k = 5.

• Compute Ek (17) = (y1 , y2 ), where

y1 = “α^k” = 5 × 3 ≡ 15 (mod 30),
y2 = x ◦ “β^k” = 17 + 5 × 12 ≡ 17 (mod 30).

We then send (15, 17).


Assume now that (15, 17) is received. The first step in the decoding is to compute
“y1^e”. We find that “y1^e” = 4 × 15 = 60 ≡ 0 (mod 30). Secondly we invert this to get
(“y1^e”)^{−1} = −0 ≡ 0 (mod 30). Lastly we compute y2 ◦ (“y1^e”)^{−1} = 17 + 0 ≡ 17 (mod 30).
Suppose that Eve intercepts the cipher-text (9, 16) and tries to decrypt it. She knows
that α = 3 and β = 12 (these are made public). She needs to find an e such that e×α = β
in Z30. That is

3e ≡ 12 (mod 30) ⇐⇒ e ≡ 4 (mod 30/ gcd(3, 30)) ⇐⇒ e ≡ 4, 14, 24 (mod 30).

If Eve knew that |H| = 10, then she’d be done. On the other hand, by Lagrange’s Theorem
|H| divides |G|, and therefore |H| ≤ 15, which rules out 24.
This shows that we might also want to keep the order of H secret.

15.2 Elliptic Curves


We begin with the definition of elliptic curves.

Definition 15.2.1 (Elliptic Curve). Let p > 3 be prime. The elliptic curve

y^2 = x^3 + ax + b,

over Zp is the set of solutions to the congruence

y^2 ≡ x^3 + ax + b (mod p),

where a, b ∈ Zp are constants such that

4a^3 + 27b^2 ≢ 0 (mod p),

together with a special point O called the point at infinity.

An elliptic curve E can be made into an Abelian group by using the following opera-
tion, where arithmetic is in Zp .
Let P = (x1 , y1 ) and Q = (x2 , y2 ) be points on E. Then

P + Q = O if x1 = x2 and y1 = −y2 , and P + Q = (x3 , y3 ) otherwise,

where

x3 = λ^2 − x1 − x2 ,
y3 = λ(x1 − x3 ) − y1 ,

and

λ = (y2 − y1 )/(x2 − x1 ) if P ≠ Q,
λ = (3x1^2 + a)/(2y1 ) if P = Q.

We also take O to be the identity, that is P + O = O + P = P .


Note that the inverse of a point (x, y) is the point (x, −y).
We will skip the proof that (E, +) is an Abelian group with identity O.
Example 15.2.2. Let E be the elliptic curve y^2 ≡ x^3 + x + 6 over Z11 .
The first thing we need to do is to find the points on E. We do this by taking each
x ∈ Z11 , computing x^3 + x + 6 and using Euler’s criterion to see whether the result is a
square. If some z is a square, then since 11 ≡ 3 (mod 4), its square roots are
±z^{(11+1)/4} ≡ ±z^3 (mod 11). This produces the following table.

x    x^3 + x + 6   square ?            y
0    6             6^5 ≡ −1 ∴ ✗
1    8             8^5 ≡ −1 ∴ ✗
2    5             5^5 ≡ 1 ∴ ✓         ±5^3 ≡ ±4
3    3             ✓                   ±3^3 ≡ ±5
4    8             ✗
5    4             ✓                   ±2
6    8             ✗
7    4             ✓                   ±2
8    9             ✓                   ±3
9    7             ✗
10   4             ✓                   ±2

Therefore E has 13 points (including the point at infinity) and they are :

O; (2, 4); (2, 7); (3, 5); (3, 6); (5, 2); (5, 9);
(7, 2); (7, 9); (8, 3); (8, 8); (10, 2); (10, 9).
Since 13 is prime any nonidentity element will generate the group. Note also that
(E, +) is cyclic and isomorphic to Z13 .
Let α = (2, 7). Then the powers of α are multiples of α in this group and we have the
following.

2α = (2, 7) + (2, 7),
λ = (3(2^2) + 1)(2 · 7)^{−1} ≡ 2(3^{−1}) ≡ 2 · 4 ≡ 8 (mod 11),
∴ x3 = 8^2 − 2 − 2 = 60 ≡ 5 (mod 11),
∴ y3 = 8(2 − 5) − 7 ≡ 8 · 8 − 7 ≡ 64 − 7 ≡ 2 (mod 11),
∴ 2α = (5, 2).

Also,

3α = 2α + α = (5, 2) + (2, 7),
λ = (7 − 2)(2 − 5)^{−1} ≡ 5(−3)^{−1} ≡ 5(−4) ≡ 5(7) ≡ 2 (mod 11),
∴ x3 = 2^2 − 5 − 2 ≡ −3 ≡ 8 (mod 11),
∴ y3 = 2(5 − 8) − 2 ≡ 2(−3) − 2 ≡ 3 (mod 11),
∴ 3α = (8, 3).

The following table gives the “powers” of α.

k kα
1 (2, 7)
2 (5, 2)
3 (8, 3)
4 (10, 2)
5 (3, 6)
6 (7, 9)
7 (7, 2)
8 (3, 5)
9 (10, 9)
10 (8, 8)
11 (5, 9)
12 (2, 4)
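The addition law and the table of multiples above can be verified mechanically. A minimal sketch, with O represented by None and inverses computed via Fermat's little theorem (the helper names are ours):

```python
def ec_add(P, Q, a, p):
    """Add points on y^2 = x^3 + ax + b over Z_p; None stands for O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                     # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """kP by repeated addition (fine for tiny examples)."""
    R = None
    for _ in range(k):
        R = ec_add(R, P, a, p)
    return R

a, p, alpha = 1, 11, (2, 7)
print(ec_mul(2, alpha, a, p))   # (5, 2)
print(ec_mul(3, alpha, a, p))   # (8, 3)
print(ec_mul(13, alpha, a, p))  # None, i.e. 13α = O
```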

In general one would like to be able to know how many points there are on a given el-
liptic curve over Zp . This is needed so that one may be able to construct a correspondence
between plain-text and the points on the curve. The following theorem gives bounds on
the number of points.

Theorem 15.2.3 (Hasse’s Theorem). Let p > 3 be prime and E an elliptic curve over
Zp . Then the number, N(E), of points on E satisfies
p + 1 − 2√p ≤ N(E) ≤ p + 1 + 2√p.

We also have the following interesting result.

Theorem 15.2.4. Let p > 3 be prime and E an elliptic curve over Zp . Then there exist
integers n1 and n2 such that

(E, +) ≅ Zn1 × Zn2 .

Furthermore n2 |n1 and n2 |(p − 1).

The last theorem implies that there exists a cyclic subgroup of (E, +) isomorphic to
Zn1 . We may be able to use this in an ElGamal system if we can find it.
Example 15.2.5. (Continued)

We use the elliptic curve in the previous example to set up an ElGamal system with
α = (2, 7) and an exponent e = 7. So β = α^e = 7α = (7, 2) (from the table).
Ek (x) = (kα, kβ + x) = (k(2, 7), k(7, 2) + x), where x ∈ E and k is chosen at random
from {0, 1, 2, . . . , 12}.
Dk (y1 , y2 ) = y2 − ey1 = y2 − 7y1 .
To encrypt (10, 9) (which is a point on the curve) :
1. Choose k, say k = 3.

2. Compute

y1 = 3α = (8, 3),
y2 = 3(7, 2) + (10, 9),
= 3(7α) + (10, 9),
= 21α + (10, 9),
= 8α + (10, 9),
= (3, 5) + (10, 9),
= (10, 2).

Therefore we send the cipher-text ((8, 3), (10, 2)).


Suppose now that we are the receiver of the cipher-text ((8, 3), (10, 2)) = (y1 , y2 ) and
that we would like to decrypt it. So we would have to compute

Dk (y1 , y2 ) = y2 ◦ (y1^e)^{−1} ,
= (10, 2) − 7(8, 3),
= (10, 2) − 7(3α),
= (10, 2) − 21α = (10, 2) − 8α,
= (10, 2) + (−8(2, 7)) = (10, 2) + (−(3, 5)),
= (10, 2) + (3, −5) = (10, 2) + (3, 6),
= (10, 9).

15.3 The Menezes-Vanstone Cryptosystem


In this section we describe another cryptosystem, the Menezes-Vanstone system [?], that
also uses elliptic curves. This system is made up as follows.

• An elliptic curve E over Zp with p > 3 is used such that (E, +) has a cyclic subgroup
H = ⟨α⟩ for which computing discrete logarithms is “hard.”

• The message space M = Z∗p × Z∗p .

• The cipher space C = E × Z∗p × Z∗p .

• The keys k = (E, α, e, β) where β = eα, that is logα β = e, and α ∈ E.

• α and β are made public, while e is kept secret.

• The encryption function Ek (x1 , x2 ) = (y0 , y1 , y2 ), where

y0 = kα, where k is a random integer,
y1 = c1 x1 (mod p),
y2 = c2 x2 (mod p),

and

(c1 , c2 ) = kβ ∈ E.

• The decryption function Dk (y0 , y1 , y2 ) = (x′, x″), where

x′ = y1 c1^{−1} (mod p),
x″ = y2 c2^{−1} (mod p),

and

(c1 , c2 ) = e y0 .

We now show that the decryption function is in fact the inverse of the encryption
function. So suppose (as above) that we have encrypted the message (x1 , x2 ) as (y0 , y1 , y2 ).
Then computing Dk (y0 , y1 , y2 ) yields the following.

e y0 = e k α = k β = (c1 , c2 ),
x′ = y1 c1^{−1} ≡ c1 x1 c1^{−1} ≡ x1 (mod p),
x″ = y2 c2^{−1} ≡ c2 x2 c2^{−1} ≡ x2 (mod p).

Therefore Dk (y0 , y1 , y2 ) = (x1 , x2 ), as desired.


Example 15.3.1. Let E be the elliptic curve y^2 ≡ x^3 + x + 6 over Z11 (the same one as
in the previous example). We also choose α = (2, 7) and e = 7. Therefore β = 7α = (7, 2).
To encrypt the message (9, 1) (which in this case is an element of Z∗11 × Z∗11 and not
of the curve as in the previous example) we proceed as follows.

1. Choose a random integer k, say k = 6.

2. Compute

y0 = kα = 6(2, 7) = (7, 9),
kβ = 6(7, 2) = 6 · 7 · α = 42α = 3α = (8, 3),
∴ c1 = 8 and c2 = 3,
y1 = c1 x1 = 8 · 9 ≡ 6 (mod 11),
y2 = c2 x2 = 3 · 1 ≡ 3 (mod 11).

So we send ((7, 9), 6, 3) as the encrypted message.


Suppose now that ((7, 9), 6, 3) is received and that we would like to decrypt it. We
therefore compute the following.

e y0 = 7(7, 9) = 7 · 6α = 42α = 3α = 3(2, 7) = (8, 3),
∴ c1 = 8 and c2 = 3,
x′ = y1 c1^{−1} = 6 · 8^{−1} ≡ 6 · 7 ≡ 42 ≡ 9 (mod 11),
x″ = y2 c2^{−1} = 3 · 3^{−1} ≡ 1 (mod 11).

Therefore we decrypt this as (9, 1).
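Example 15.3.1 can be checked end to end with the same curve arithmetic. This is an illustrative sketch (function names are ours; inverses via Fermat's little theorem), not a secure implementation:

```python
def ec_add(P, Q, a, p):
    # Point addition on y^2 = x^3 + ax + b over Z_p; None is O.
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    R = None
    for _ in range(k):
        R = ec_add(R, P, a, p)
    return R

a, p = 1, 11
alpha, e = (2, 7), 7
beta = ec_mul(e, alpha, a, p)                 # (7, 2)

def mv_encrypt(x1, x2, k):
    y0 = ec_mul(k, alpha, a, p)
    c1, c2 = ec_mul(k, beta, a, p)            # kβ = (c1, c2)
    return y0, c1 * x1 % p, c2 * x2 % p

def mv_decrypt(y0, y1, y2):
    c1, c2 = ec_mul(e, y0, a, p)              # e·y0 = (c1, c2)
    return y1 * pow(c1, p - 2, p) % p, y2 * pow(c2, p - 2, p) % p

c = mv_encrypt(9, 1, 6)
print(c)               # ((7, 9), 6, 3)
print(mv_decrypt(*c))  # (9, 1)
```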


Chapter 16

The Merkle-Hellman “knapsack”


System

This chapter describes a system that was developed around 1978, but that was subse-
quently broken a few years later. In spite of this it remains an interesting system, and
its ideas can be used in conjunction with other systems.
The Merkle-Hellman [?] or “knapsack” cryptosystem revolves around the subset sum
problem.

Definition 16.0.2 (Subset sum problem). Given positive integers s1 , s2 , . . . , sn and
T — the sizes and target — try to find a binary vector x = (x1 , x2 , . . . , xn ) such that
x1 s1 + x2 s2 + · · · + xn sn = T .

This problem is known to be NP-complete in general, but there are easy special cases.

Definition 16.0.3 (Superincreasing list). A list (s1 , s2 , . . . , sn ) is called superincreas-
ing if

sj > s1 + s2 + · · · + s_{j−1} ,

for 2 ≤ j ≤ n.

If the list of sizes in the subset sum problem is superincreasing, then the problem is
easy to solve, as demonstrated by the algorithm shown in Figure 16.1, which finds
the binary vector in this case.
The reason why the algorithm works is simply that sn is greater than the sum of
all the other sizes, so if sn ≤ T it has to be chosen to try and reach the target — all the
other sizes put together are not enough to reach the target. Once sn is chosen (or discarded

for i from n down to 1
{
    if T ≥ si
        then xi = 1 and replace T by T − si
        else xi = 0
}
if x1 s1 + · · · + xn sn = T then a solution has been found
otherwise no solution exists

Figure 16.1: Algorithm for solving the subset sum problem for a superincreasing sequence

if sn > T ) and the size of T reduced to T − sn (or left as T if sn is not chosen), we have a
new subset sum problem with a smaller (or equal) target and n − 1 sizes that have to
be considered; we just repeat this procedure. To see the uniqueness, realize that each si that
was chosen had to be chosen — there was no possibility of not using it.
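The algorithm of Figure 16.1 translates directly into code; a sketch with a hypothetical function name:

```python
def solve_superincreasing(sizes, T):
    """Greedy subset-sum solver; correct when `sizes` is superincreasing."""
    x = [0] * len(sizes)
    for i in range(len(sizes) - 1, -1, -1):   # i from n down to 1
        if T >= sizes[i]:
            x[i] = 1
            T -= sizes[i]
    return x if T == 0 else None              # None: no solution exists

print(solve_superincreasing([2, 5, 10, 25], 37))   # [1, 0, 1, 1]
print(solve_superincreasing([2, 5, 10, 25], 4))    # None
```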
The Merkle-Hellman “knapsack” system is made up of the following components.

• S = (s1 , s2 , . . . , sn ) a superincreasing list of integers.


• p > s1 + s2 + · · · + sn is a prime number and a ∈ Z∗p .

• t = (t1 , t2 , . . . , tn) is defined by

ti = asi (mod p).

That is each ti is taken to be the least residue of asi (mod p).

• The message space M = {0, 1}n .

• The cipher-text space C = {0, 1, 2, . . . , n(p − 1)}.

• The keys k = (S, p, a, t), where t is made public and S, p and a are kept private.

• The encryption function

Ek (x1 , x2 , . . . , xn ) = x1 t1 + x2 t2 + · · · + xn tn .

• The decryption function Dk (y) = (x1 , x2 , . . . , xn ), where x1 , x2 , . . . , xn equals the
solution to the subset sum problem with target sum T = a^{−1} y (mod p) and sizes S.

First let’s show that the decryption function is the inverse of the encryption function.
So suppose that the binary vector x1 , x2 , . . . , xn is encrypted as y = x1 t1 + · · · + xn tn , as
shown above. We need to show that x1 s1 + · · · + xn sn = T = a^{−1} y (mod p).

y = x1 t1 + x2 t2 + · · · + xn tn ,
∴ a^{−1} y ≡ x1 a^{−1} t1 + x2 a^{−1} t2 + · · · + xn a^{−1} tn (mod p),
≡ x1 s1 + x2 s2 + · · · + xn sn (mod p),
= x1 s1 + x2 s2 + · · · + xn sn .

The equality in the last step follows from the fact that p > s1 + s2 + · · · + sn . Also since
(s1 , s2 , . . . , sn ) is superincreasing the solution is unique.
Example 16.0.4. S = (2, 5, 10, 25) is superincreasing. Choose p = 53 and a = 10. The
public list of sizes, t, is

t1 = 20 (mod 53),
t2 = 50 (mod 53),
t3 = 100 ≡ 47 (mod 53),
t4 = 250 ≡ 38 (mod 53).

Thus t = (20, 50, 47, 38).


To encrypt (1, 0, 1, 1) compute the following

1 · 20 + 0 · 50 + 1 · 47 + 1 · 38 = 105.

To decrypt 105 we need 10^{−1} ≡ 16 (mod 53). Then T = 16 × 105 ≡ 16 × (−1) ≡ 37
(mod 53). We now solve the subset sum problem with the list (2, 5, 10, 25) and target
sum 37.

25 ≤ 37 ⇒ x4 = 1 new target : 37 − 25 = 12,


10 ≤ 12 ⇒ x3 = 1 new target : 12 − 10 = 2,
5 > 2 ⇒ x2 = 0 new target : 2 − 0 = 2,
2 ≤ 2 ⇒ x1 = 1 new target : 2 − 2 = 0.

Since 1 · 2 + 0 · 5 + 1 · 10 + 1 · 25 = 37 we have achieved the target and the solution is


correct.
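The whole example can be scripted. A sketch (function names are ours; a^{−1} is computed via Fermat's little theorem since p is prime):

```python
def mh_public_key(S, p, a):
    # Public sizes: t_i = a * s_i mod p.
    return [a * s % p for s in S]

def mh_encrypt(x, t):
    return sum(xi * ti for xi, ti in zip(x, t))

def mh_decrypt(y, S, p, a):
    T = pow(a, p - 2, p) * y % p          # T = a^{-1} y (mod p)
    x = [0] * len(S)
    for i in range(len(S) - 1, -1, -1):   # greedy superincreasing solve
        if T >= S[i]:
            x[i] = 1
            T -= S[i]
    return x if T == 0 else None

S, p, a = [2, 5, 10, 25], 53, 10
t = mh_public_key(S, p, a)
print(t)                               # [20, 50, 47, 38]
print(mh_encrypt([1, 0, 1, 1], t))     # 105
print(mh_decrypt(105, S, p, a))        # [1, 0, 1, 1]
```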
Chapter 17

The McEliece Cryptosystem

In the previous chapter we saw an example of a cryptosystem that was constructed using
an easy instance of a “hard” problem. The system that we present in this chapter is based
on the same idea and appeared in [?]. Here the hard problem is that of decoding a binary
linear code where the generator matrix is given. As an easy special case we consider the
class of Goppa codes (which include the Hamming codes).
The Goppa codes have the following properties.
✗ They are [2^m , 2^m − mt, 2t + 1]-codes.

✗ They have efficient encoding and decoding algorithms.

✗ There exist many inequivalent codes in this family all with the same parameters.
The McEliece cryptosystem has the following components.
• G is a generator matrix for a [2^m , 2^m − mt, 2t + 1] Goppa code.

• S is a k × k matrix that is invertible over Z2 , where k = 2^m − mt.

• P is an n × n permutation matrix, where n = 2^m .

• G′ = SGP .

• The message space M = (Z2 )^k — k-tuples over Z2 .

• The cipher-text space C = (Z2 )^n .

• The keys k = (G, S, P, G′ ), where G′ is made public and S, P and G are kept private.

• The encryption function is Ek (x) = xG′ + e, where x ∈ (Z2 )^k and e ∈ (Z2 )^n is a
random error of weight t.


• The decryption function is a four step process that operates on y ∈ (Z2 )^n as follows.

(i) Compute y1 = yP^{−1} .
(ii) Decode y1 , obtaining y1 = x1 + e1 , where x1 is a codeword.
(iii) Compute x′ ∈ (Z2 )^k such that x′G = x1 .
(iv) Compute x = x′S^{−1} .

Let’s show that decryption actually reverses the encryption.
Assume that x is encrypted as Ek (x) = y = xG′ + e = x(SGP ) + e. Therefore
yP^{−1} = x(SGP )P^{−1} + eP^{−1} = xSG + eP^{−1} = x1 + e1 , where x1 = (xS)G, that is x′ = xS.
Thus x′S^{−1} = xSS^{−1} = x.
Example 17.0.5. As our Goppa code we choose the [7, 4, 3] Hamming code that has
generator matrix

G = [ 1 0 0 0 1 1 0
      0 1 0 0 1 0 1
      0 0 1 0 0 1 1
      0 0 0 1 1 1 1 ].

Furthermore we choose

S = [ 1 1 0 1
      1 0 0 1
      0 1 1 1
      1 1 0 0 ],

to be invertible over Z2 and

P = [ 0 1 0 0 0 0 0
      0 0 0 1 0 0 0
      0 0 0 0 0 0 1
      1 0 0 0 0 0 0
      0 0 1 0 0 0 0
      0 0 0 0 0 1 0
      0 0 0 0 1 0 0 ].

The public generating matrix then is

G′ = SGP = [ 1 1 1 1 0 0 0
             1 1 0 0 1 0 0
             1 0 0 1 1 0 1
             0 1 0 1 1 1 0 ].

Suppose now that we would like to encrypt the plain-text x = (1, 1, 0, 1). Since the
Hamming code is a single error correcting code our random error vector has to be of
weight one. Say we choose e = (0, 0, 0, 0, 1, 0, 0). The corresponding cipher-text is

y = xG′ + e,
  = (1, 1, 0, 1) [ 1 1 1 1 0 0 0
                   1 1 0 0 1 0 0
                   1 0 0 1 1 0 1
                   0 1 0 1 1 1 0 ] + (0, 0, 0, 0, 1, 0, 0),
  = (0, 1, 1, 0, 0, 1, 0) + (0, 0, 0, 0, 1, 0, 0),
  = (0, 1, 1, 0, 1, 1, 0).

Assume now that we receive the cipher-text (0, 1, 1, 0, 1, 1, 0) and that we would like
to decrypt it. First we compute

y1 = yP^{−1} ,
   = (0, 1, 1, 0, 1, 1, 0) [ 0 0 0 1 0 0 0
                             1 0 0 0 0 0 0
                             0 0 0 0 1 0 0
                             0 1 0 0 0 0 0
                             0 0 0 0 0 0 1
                             0 0 0 0 0 1 0
                             0 0 1 0 0 0 0 ],
   = (1, 0, 0, 0, 1, 1, 1).

Next we need to decode y1 . Looking at the generator matrix we see that y1 is Hamming
distance one from the first row of G, and since the Hamming code is a single error correcting
code we decode y1 as x1 = (1, 0, 0, 0, 1, 1, 0). At this point we get x′ = (1, 0, 0, 0).
Finally we compute

x = x′S^{−1} = (1, 0, 0, 0) [ 1 1 0 1
                              1 1 0 0
                              0 1 1 1
                              1 0 0 1 ] = (1, 1, 0, 1).
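The example can be replayed over GF(2). This sketch stores each matrix as a list of rows, represents P by the coordinate permutation it induces, and decodes by brute force over the 16 codewords, which is fine at this toy size (all names are ours):

```python
from itertools import product

def mat_vec(x, M):
    """Row vector x times matrix M over GF(2) (M stored as a list of rows)."""
    return tuple(sum(x[i] * M[i][j] for i in range(len(M))) % 2
                 for j in range(len(M[0])))

G = [(1,0,0,0,1,1,0), (0,1,0,0,1,0,1), (0,0,1,0,0,1,1), (0,0,0,1,1,1,1)]
S = [(1,1,0,1), (1,0,0,1), (0,1,1,1), (1,1,0,0)]
Sinv = [(1,1,0,1), (1,1,0,0), (0,1,1,1), (1,0,0,1)]   # S^{-1} from the text
perm = [1, 3, 6, 0, 2, 5, 4]   # P moves coordinate i to coordinate perm[i]

def encrypt(x, e):
    v = mat_vec(mat_vec(x, S), G)             # xSG
    y = [0] * 7
    for i, p in enumerate(perm):              # apply P
        y[p] = v[i]
    return tuple((a + b) % 2 for a, b in zip(y, e))

def decrypt(y):
    y1 = tuple(y[perm[i]] for i in range(7))  # y P^{-1}
    for x0 in product((0, 1), repeat=4):      # nearest-codeword decoding
        if sum(a != b for a, b in zip(mat_vec(x0, G), y1)) <= 1:
            return mat_vec(x0, Sinv)          # x = x' S^{-1}

y = encrypt((1, 1, 0, 1), (0, 0, 0, 0, 1, 0, 0))
print(y)           # (0, 1, 1, 0, 1, 1, 0)
print(decrypt(y))  # (1, 1, 0, 1)
```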
Appendix A

Assignments

MATH 433D/550
Assignment 1
Due: Monday, January 21, 2002, at the start of class.

1. Recall that the reliability of a binary symmetric channel (BSC) is the probability
p, 0 ≤ p ≤ 1, that the digit sent is the digit received.
(a) [1] Would you use a BSC with p = 0? If so, how? What about a BSC with p = 1/2?
(b) [1] Explain how to convert a BSC with 0 ≤ p < 1/2 into a channel with 1/2 < p ≤ 1.

2. Let C be a binary code of length n. We define the information rate of C to be the
number i(C) = (1/n) log2 (|C|). This quantity is a measure of the proportion of each codeword
that is carrying the message, as opposed to redundancy that has been added to help deal
with errors.
(a) [1] Prove that if C is a binary code, then 0 ≤ i(C) ≤ 1. In questions (b) and (c),
suppose you are using a BSC with reliability p = 1 − 10−8 and that digits are transmitted
at the rate 107 digits per second.
(b) [3] Suppose the code C contains all of the binary words of length 11. (i) What is the
information rate of C? (ii) What is the probability that a word is transmitted incorrectly
and the errors are not detected? (iii) If the channel is in constant use, about how many
words are (probably) transmitted incorrectly each day?
(c) Now suppose the code C 0 is obtained from C by adding an extra (parity check) digit
to the words in C, so that the number of 1’s in each codeword is even.
(i) What is the information rate of C 0 ?
(ii) What is the probability that a word is transmitted incorrectly and the transmission
errors go undetected?


(iii) If the channel is in constant use, about how long do you expect must pass between
undetected incorrectly transmitted words? Express your answer as a number of days.

3. [4] Establish the following three properties of the Hamming distance (for binary codes
C):
(a) d(u, w) = 0 if and only if u = w.
(b) d(v, w) = d(w, v).
(c) d(v, w) ≤ d(v, u) + d(u, w), ∀u ∈ C.

4. Let C be the code consisting of all binary words of length 4 that have even weight.
(a) [2] Find the error patterns C detects.
(b) [2] Find the error patterns C corrects.

5. [2] Prove that the minimum distance of a linear code is the smallest weight of a non-zero
codeword.

6. [6] Prove that a code can simultaneously correct all error patterns of weight at most
t, and unambiguously detect all non-zero error patterns of weight t + 1 to d (where
t ≤ d) if and only if it has minimum distance at least t + d + 1. (For example, consider
C = {000, 111}. This single error correcting code detects all non-zero error patterns of
weight at most 2. But, if 000 is sent and 110 is received, then only one error is detected
and the received word is incorrectly decoded as 111. The ambiguity here is that it is not
clear whether the error pattern is 110 or 001.)

MATH 550
Assignment 1
Solutions
Question 1
(a) Yes. With p = 0 the probability of error equals 1, so each bit is received incorrectly. By
inverting each bit at the receiver we obtain the original bit. On the other hand a
BSC with p = 1/2 is completely unreliable. The probability of seeing a specific bit
at the receiving end equals 1/2; this situation is similar to flipping an unbiased coin
at the receiver and recording heads as 1 and tails as 0. Thus the channel is not able
to carry any information.

(b) Simply invert each bit at the receiving end.

Question 2
(a) We have 1 ≤ |C| ≤ 2^n (one codeword; all words of length n). Therefore, since log is
monotone increasing,

log2 (1)/n ≤ log2 |C|/n ≤ log2 (2^n)/n,

so that

0 ≤ i(C) ≤ 1.

(b) n = 11 and |C| = 2^11 . Then

(i) i(C) = log2 |C| / n = 1.
(ii) This code cannot detect any errors since all words of length 11 are codewords
— any codeword will be changed into another codeword by any error pattern.
The undetected error probability then is (q = 1 − p the error probability)

Pe (C) = Σ_{k=1}^{11} C(11, k) q^k p^{11−k} = 1.1 × 10^{−7} ,

where C(11, k) denotes the binomial coefficient.

(iii) 10^7 bits per second implies 864 × 10^9 bits per day; this is approximately
78 545 454 545 words (of length 11) per day. We expect a fraction of 1.1 × 10^{−7}
of these to be in error. Therefore about 8640 words per day are in error.

(c) (i) |C′| = 2^11 ; n′ = 12, so that

i(C′) = log2 |C′| / n′ = 11/12.

(ii) This parity check code can detect all error patterns of odd weight. On the
other hand an error pattern of even weight results in a received word of even
weight (either two ones cancel or both ones contribute to the weight of the
received word). Thus even weight error patterns are not detectable. Therefore

Pe (C′) = Σ_{k=2,4,...,12} C(12, k) q^k p^{12−k} = 6.6 × 10^{−15} .

(iii) We are transmitting about 72 × 10^9 words per day. From part (ii) we know
that the undetected error rate is 6.6 × 10^{−15} . Therefore about (72 × 10^9)(6.6 ×
10^{−15}) = 4.752 × 10^{−4} words per day are in error. This is the same as about 1
word error every 2104 days (≈ 5.77 years).
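Both probabilities can be checked numerically with the parameters of the question; a sketch:

```python
from math import comb

p = 1 - 1e-8                 # BSC reliability
q = 1 - p                    # per-bit error probability

# C: all words of length 11, so every nonzero error pattern goes undetected.
Pe_C = sum(comb(11, k) * q**k * p**(11 - k) for k in range(1, 12))

# C': parity bit appended, so only even-weight error patterns slip through.
Pe_Cprime = sum(comb(12, k) * q**k * p**(12 - k) for k in range(2, 13, 2))

print(f"{Pe_C:.2e}")        # about 1.1e-07
print(f"{Pe_Cprime:.2e}")   # about 6.6e-15
```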

Question 3
(a) d(u, w) = 0 ⇐⇒ u = w.
⇒:
Let u, w ∈ C and d(u, w) = 0. This means that u and w differ in 0 places, therefore
u = w.
⇐:
d(u, u) = wt(u + u) = wt(0) = 0.

(b) d(v, w) = d(w, v).


d(v, w) = wt(v + w) = wt(w + v) = d(w, v).

(c) d(v, w) ≤ d(v, u) + d(u, w), ∀ u ∈ C.

d(v, w) = wt(v + w) = wt(v + u + u + w) = d(v + u, u + w) ≤ wt(v + u) + wt(u + w) =
d(v, u) + d(u, w). The last inequality follows from the fact that d(x, y) = wt(x + y) ≤
wt(x) + wt(y) for any x, y ∈ C: every one in x + y comes from a one in x lining up
with a zero in y, or vice versa.

Question 4

C = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}

(a) Any odd weight error pattern is detectable since it changes a codeword (of even
weight) into a word of odd weight. Even weight error patterns are not detectable since
they change a codeword into a word of even weight, which is a codeword. Thus the
detectable error patterns are

{0001, 0010, 0100, 0111, 1000, 1011, 1101, 1110}

(b) It is easily verified that this code is linear, and so its minimum distance is 2 (the
smallest weight of a nonzero codeword). To be able to correct all error patterns of
weight t we need a minimum distance of at least 2t + 1; in our case this implies
t ≤ 1/2, so not even all weight one error patterns are guaranteed correctable. In
fact, adding a weight one error pattern to any of the codewords produces a received
word that is equally close to more than one codeword, so unambiguous decoding is
impossible. Thus the code cannot correct any errors.

Question 5

Let C be a linear code, then

dmin (C) = min_{u,v∈C, u≠v} d(u, v) = min_{u,v∈C, u≠v} wt(u + v) = min_{w∈C, w≠0} wt(w).

The last step follows from the fact that C is linear, so that the sum of two codewords is
again a codeword. Further, all nonzero codewords can be formed in this way, since if v is
a nonzero codeword, then v ≠ 0 and v = v + 0.

Question 6

A code C can simultaneously correct all error patterns of weight at most t and detect
all nonzero error patterns of weight t + 1 to d (d ≥ t) ⇐⇒ dmin (C) ≥ t + d + 1.
⇐:
Let dmin (C) ≥ t + d + 1 ≥ 2t + 1, since d ≥ t. By a previous theorem we now know that
C can correct all error patterns of weight t. Let u ∈ C be sent, w be received and z be
an error pattern such that t + 1 ≤ wt(z) ≤ d. Therefore t + 1 ≤ d(u, w) ≤ d. Let v ∈ C

and v ≠ u, then

d(u, v) ≤ d(u, w) + d(w, v),
∴ d(w, v) ≥ d(u, v) − d(u, w),
≥ t + d + 1 − d = t + 1.

Therefore w lies outside the t-ball around v. Since v was an arbitrary codeword, w does
not lie inside any of the t-balls that surround the codewords of C. This implies that w
will not be decoded to any codeword, as only words that lie inside a t-ball of a codeword
will be decoded to that codeword. We are therefore able to detect that more than t, but
less than d + 1 errors have occurred.
⇒:
Let C be a code that can simultaneously correct t and detect t + 1 to d (d ≥ t) errors.
The fact that C can correct t errors implies that dmin (C) ≥ 2t + 1. Let u, v ∈ C such
that d(u, v) = dmin (C). Place a ball of radius d around u and a ball of radius t around
v. By the error detection property of the code, any error pattern of weight d will not be
able to change u in such a way that it lies inside the t-ball around v. Therefore the two
balls above are disjoint. This implies that d(u, v) = dmin (C) ≥ t + d + 1.

Math 433D/550
Assignment 2

Due: Monday, February 4, 2002, in class.

1. Let S = {11000, 01111, 11110, 01010}, and C = ⟨S⟩.


(a) [3] Find both a generator matrix and a parity check matrix for C.
(b) [1] What is the dimension of C, and of C ⊥ ?
(c) An (n, k, d)-code is a linear code of length n, dimension k, and minimum distance
d. Find the parameters (n, k, d) for C and C ⊥ . (You may want to use the result of
question 3 below.)

2. Let C be the linear code with generator matrix

G = [ 1 0 0 0 1 1 1
      0 1 0 0 1 1 0
      0 0 1 0 1 0 1
      0 0 0 1 0 1 1 ].

Assign data to the words in K^4 by letting the letters A, B, . . . , P correspond to
0000, 0001, . . . , 1111, respectively.
(a) [2] Encode the message CALL HOME (ignore the space).
(b) [5] Suppose the message 0111000, 0110110, 1011001, 1011111, 1100101, 0100110 is
received. Decode this message using the CMLD procedure involving syndromes and cosets
(syndrome decoding). How many errors occurred in transmission?

3. [4] Let H be a parity check matrix for a linear code C. Prove that C has minimum
distance d if and only if any set of d − 1 rows of H is linearly independent and some
set of d rows of H is linearly dependent.

4. [4] Suppose that C is a linear code of length n with minimum distance at least 2t + 1.
Prove that every coset of C contains at most one word of weight t or less. Use this to
show that syndrome decoding corrects all error patterns of weight at most t.

5. (a) [3] Prove the Singleton bound: For an (n, k, d)-code, d − 1 ≤ n − k. (Hint: consider
a parity check matrix.)
(b) [4] An (n, k, d)-code is called maximum distance separable (MDS) if equality holds in
the Singleton bound, that is, if d = n − k + 1. Prove that the following statements are
equivalent.

(1) C is MDS,
(2) every n − k rows of the parity check matrix are linearly independent,
(3) every k columns of the generator matrix are linearly independent.
(c) [3] Show that the dual of an (n, k, n − k + 1) MDS code is an (n, n − k, k + 1) MDS
code.

MATH 550
Assignment 2
Solutions
Question 1
(a) We write the rows of S into the matrix A and put it in reduced row echelon form.
Adding row 1 to row 3, then row 2 to row 4, then row 3 to row 4 brings A to row
echelon form; adding row 2 to row 1, row 3 to rows 1 and 2, and finally row 4 to
row 3 then gives the reduced row echelon form. Therefore the generator matrix is

G = [ 1 0 0 0 1
      0 1 0 0 1
      0 0 1 0 1
      0 0 0 1 1 ],

and the parity check matrix is

H = [ 1
      1
      1
      1
      1 ].

(b) dim(C) = 4 and dim(C ⊥ ) = 1.

(c) Using question 3 we see that dmin (C) = 2. Further, GT is a parity check matrix
for C ⊥ from which we see (using question 3 again) that dmin (C ⊥ ) = 5. Therefore
C is a [5, 4, 2]-code and C ⊥ is a [5, 1, 5]-code.
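The row reduction in part (a) can be checked mechanically; here is a small sketch in Python (the function rref_gf2 is ours, not from any library):

```python
def rref_gf2(rows):
    """Reduced row echelon form of a list of equal-length 0/1 lists, mod 2."""
    rows = [r[:] for r in rows]
    pivot_row = 0
    for col in range(len(rows[0])):
        # find a row with a 1 in this column at or below pivot_row
        for r in range(pivot_row, len(rows)):
            if rows[r][col]:
                rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
                break
        else:
            continue
        # clear this column in every other row
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                rows[r] = [(a + b) % 2 for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows

S = [[1,1,0,0,0], [0,1,1,1,1], [1,1,1,1,0], [0,1,0,1,0]]
G = rref_gf2(S)
print(G)   # [[1,0,0,0,1], [0,1,0,0,1], [0,0,1,0,1], [0,0,0,1,1]]
```

The result is exactly the generator matrix G found above.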

Question 2
(a) The message CALL HOME is encoded as : 0010101 0000000 1011001 1011001
0111000 1110100 1100001 0100110.

(b) The parity check matrix for this code is

        1 1 1
        1 1 0
        1 0 1
    H = 0 1 1
        1 0 0
        0 1 0
        0 0 1

This is a parity check matrix for the (7, 4, 3) Hamming code. We can also see using
question 3 that the minimum distance is 3. Therefore this code can only correct one
error and so the coset leaders are the words of weight at most 1. The syndrome for
a coset leader of weight 1 will be the row of H corresponding to the position where
the coset leader has a one. The received words have the following syndromes.

Received Syndrome
0111000 000
0110110 101
1011001 000
1011111 110
1100101 100
0100110 000

The syndrome 101, which is row three of H, shows that an error occurred in the
third position of the second received word. The syndrome 110, which is the second
row of H, shows that an error occurred in the second position of the fourth received
word. Similarly, the fifth received word has an error in position five. Of course, the
zero syndrome indicates no errors. Decoding then produces the following words :
0111 0100 1011 1111 1100 0100. This corresponds to : HELP ME.
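The decoding above can be replayed in a few lines (a sketch; H and the letter encoding A = 0000, B = 0001, . . . are as in this question):

```python
# Rows of the (7,4,3) Hamming parity check matrix, top to bottom.
H = ["111", "110", "101", "011", "100", "010", "001"]

def decode(word):
    bits = [int(b) for b in word]
    s = 0
    for bit, row in zip(bits, H):
        if bit:
            s ^= int(row, 2)                  # syndrome = sum of selected rows
    if s:
        bits[H.index(format(s, "03b"))] ^= 1  # flip the position the syndrome names
    return "".join(map(str, bits[:4]))        # first four bits are the message

received = ["0111000", "0110110", "1011001", "1011111", "1100101", "0100110"]
letters = "".join(chr(ord("A") + int(decode(w), 2)) for w in received)
print(letters)   # HELPME
```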

Question 3

Let C be a linear code and H its parity check matrix. Then C has minimum distance d
if and only if any set of d − 1 rows of H is linearly independent and some set of d rows is

linearly dependent.
Proof :
⇐:
Let ri1 , ri2 , . . . , rid be the set of linearly dependent rows. That is ri1 + ri2 + · · · + rid = 0.
Let x = x1 x2 x3 . . . xn ∈ K n such that xj = 1 if j ∈ {i1 , i2 , . . . , id } and xj = 0 otherwise.
Then xH = ri1 + ri2 + · · · + rid = 0. This implies that wt(x) = d and x ∈ C. Let z ∈ C
with z 6= 0. Then zH = 0. Assume wt(z) ≤ d − 1. This would imply that a sum of d − 1
or fewer rows of H is equal to zero. By hypothesis any set of d − 1 rows of H is supposed
to be linearly independent (any smaller set will also be), so that their sum will be nonzero
— a contradiction. Thus wt(z) ≥ d, so that dmin (C) = d.
⇒:
Let dmin (C) = d. Then there exists x ∈ C such that wt(x) = d and x 6= 0. Further,
xH = 0, which implies that a sum of d rows of H equals zero meaning that they are linearly
dependent. Since wt(z) ≥ d for all z ∈ C, this means that if y ∈ K n and wt(y) ≤ d − 1,
yH ≠ 0. Therefore any set of d − 1 or fewer rows of H is linearly independent.

Question 4

Let C be a linear code of length n and dmin (C) ≥ 2t + 1. One of the cosets of C is C
itself. For every x ∈ C with x 6= 0, wt(x) ≥ 2t + 1. Therefore in the coset C, 0 is the
only word that has weight at most t. Let y ∈ K n with wt(y) ≤ t and consider the
coset y + C. Let z ∈ y + C, therefore z = y + c, c ∈ C. If c 6= 0, then wt(y + c) ≥ t + 1,
since y has only at most t 1’s to cancel the 1’s of c (of which there are at least 2t + 1).
Now y ∈ y + C and y = y + 0, so y is the only word in y + C that has weight at most
t. Thus the set of cosets {y + C | wt(y) ≤ t} each has a unique word of weight at most
t. All cosets are disjoint and any cosets that remain, apart from the ones above (if there
are any), will all contain words of weight at least t + 1. Therefore every coset has at most
one word of weight t or less (some cosets may have none of these words).
Let u be an error pattern of weight at most t, v ∈ C be sent and w = v +u be received.
Then wH = (u + v)H = uH + vH = uH. Therefore the coset is uniquely determined by
the error pattern. If we let the syndrome uH correspond to the coset u + C then u is the
unique word of weight at most t in u + C and so u will be chosen as the (correct) error
pattern and decoding will be successful.
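As a concrete illustration of this argument, take the (7, 4, 3) Hamming code from question 2 (so t = 1) and sort all of K 7 into cosets; since that code is perfect, "at most one word of weight ≤ t" is in fact "exactly one" here (a sketch):

```python
from itertools import product

# Generator rows of the (7,4,3) Hamming code: [I4 | A] with A the top of H.
G = ["1000111", "0100110", "0010101", "0001011"]
code = set()
for msg in product([0, 1], repeat=4):
    word = 0
    for bit, row in zip(msg, G):
        if bit:
            word ^= int(row, 2)
    code.add(word)

# Partition K^7 into cosets w + C and count words of weight <= 1 in each.
seen = set()
counts = []
for w in range(2 ** 7):
    if w in seen:
        continue
    coset = {w ^ c for c in code}
    seen |= coset
    counts.append(sum(1 for v in coset if bin(v).count("1") <= 1))
print(counts)   # [1, 1, 1, 1, 1, 1, 1, 1]
```

Every one of the 8 cosets contains exactly one word of weight at most 1, as the proof predicts.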

Question 5

(a) This can be shown in one of two ways :



(i) An (n, k, d)-code is equivalent to a code in standard form that has a parity
check matrix of the form
" #
X
.
In−k

Assume d − 1 > n − k (d − 1 ≥ n − k + 1). By question 3, any set of d − 1


rows of H is linearly independent. Therefore any set of n − k + 1 rows of H is
linearly independent. On the other hand if we take the n − k rows of In−k and
any row of X they form a linearly dependent set — any z ∈ K n−k is a linear
combination of the rows of In−k . This gives us the desired contradiction.
(ii) Here again we use the fact that the code is equivalent to one that is in standard
form. Specifically we use the fact that the generator matrix is in standard
form. Let v ∈ C be a codeword that has only one of its information bits being
nonzero. Then the remaining n − k parity bits can have weight at most n − k.
Therefore v has weight at most 1 + n − k, so that dmin (C) ≤ n − k + 1.

(b) (i) C is MDS ⇐⇒ every n − k rows of the parity check matrix are linearly inde-
pendent.
⇒:
Follows from question 3.
⇐:
We again use question three and assume that C is in standard form. Note that
the linearly dependent set of n − k + 1 rows can be taken as the n − k rows of
In−k together with a row from X.

(ii) C is MDS ⇐⇒ every k columns of the generator matrix are linearly indepen-
dent.
⇒:
Let C be an MDS code and assume some set of k columns of its generator
matrix, G, is linearly dependent, say ci1 , ci2 , . . . , cik . Consider the square
matrix M = [ci1 , ci2 , . . . , cik ]. Since M is a k × k matrix with linearly dependent
columns, M is singular, and so its rows are also linearly dependent: some nonempty
subset of the rows of M sums to zero. Summing the corresponding rows of G (which
is the same as encoding a nonzero message word) produces a nonzero codeword with
zeros in positions i1 , i2 , . . . , ik . This codeword has at least k zeros, so it has weight
at most n − k. Since C is MDS, dmin = n − k + 1, so every nonzero codeword has
weight at least n − k + 1, giving a contradiction.


⇐:
Let C be a linear code such that every k columns of the generator matrix G are
linearly independent, and assume that dmin < n − k + 1. If w ∈ C is a nonzero word
of minimum weight then wt(w) ≤ n − k, so w has at least k zeros, say in positions
i1 , i2 , . . . , ik . Let M be the k × k submatrix of G consisting of columns i1 , i2 , . . . , ik .
Now w = mG for some nonzero message word m, and since w is zero in positions
i1 , i2 , . . . , ik , the same combination of the rows of M is zero: mM = 0. Hence M is
singular, so its k columns are linearly dependent, contradicting the assumption that
every k columns of G are linearly independent.

(c) Assume C is in standard form. We know that if G and H are the generator and parity
check matrices for C, then GT and H T are the parity check and generator matrices
for C ⊥ . So from part (b) above we know that the parity check matrix for C ⊥ (GT )
has every set of k rows linearly independent and we can also find a set of k+1 linearly
dependent rows (as above). Then by question 3 this shows that dmin (C ⊥ ) = k + 1.
Therefore C ⊥ is an [n, n − k, k + 1]-code and since k + 1 = n − (n − k) + 1 it is also
MDS.

Math 433D/550
Assignment 3

Due: Thursday, February 28, 2002, in class.

1. [5] Is it true that in a self-dual code all words have even weight?

2. [4] Let C be a Hamming code of length 15. Find the number of error patterns the
extended code C ∗ will detect and the number of error patterns that C ∗ will correct.

3. [4] Count the number of codewords of weight 7 in the Golay code C23 . (Hint: Start by
proving that every word of weight 4 in K 23 is distance 3 from exactly one codeword.)

4. [4] Let G(1, 3) be the generator matrix for RM (1, 3). Decode the following received
words: (i) 01011110; (ii) 01100111; (iii) 00010100; (iv) 11001110.

5. [5] If possible, devise a single error correcting code with length 6, 4 information
digits, and using the digits 0, 1, 2, 3, 4, 5. Describe the code, an encoding procedure,
and a decoding procedure. Prove that your code corrects all single errors. What is its
information rate? If not possible, say why not.

6. Codewords x1 x2 x3 x4 x5 with decimal digits are defined by x1 x2 x3 x4 ≡ x5 (mod 9),
where x5 is the check digit.
(a) [2] Show that the parity check equation is equivalent to x1 +x2 +x3 +x4 ≡ x5 (mod 9).
(b) [3] Assuming that each single error is equally likely, what percentage of single errors
are undetected?
(c) [2] Which errors involving transposition of digits are detected?

7. Consider the code with 10 decimal digits in which the check digit x10 is the least residue
of x1 x2 · · · x9 (mod 7) (that is, 0 ≤ x10 ≤ 6). (As of my last information, this code is
used by UPS and Federal Express.)
(a) [3] Under what conditions are single errors undetected?
(b) [3] Assuming that each single error is equally likely, what percentage of single errors
are undetected?
(c) [2] Repeat (b) for errors involving transposition of digits.

MATH 550
Assignment 3
Solutions
Question 1

Let v ∈ C, where C is a self-dual code. Then since C = C ⊥ , v · u = 0 for all u ∈ C.


Specifically, v · v = 0. If v = v1 v2 · · · vn , then v · v = v1^2 + v2^2 + · · · + vn^2 . In the binary case
this equals v1 + v2 + · · · + vn which from the orthogonality equals zero. This implies that
v has an even number of 1’s, so that all v ∈ C have even weight.

Question 2

All Hamming codes have a minimum distance of 3. The extended code C ∗ has minimum
distance 4. Therefore C ∗ can detect all errors of weight 1, 2 or 3. Further C ∗ can detect
all error patterns of odd weight. This can be seen from the parity check matrix for C ∗ :

    H* =
        0 0 0 1 1
        0 0 1 0 1
        0 0 1 1 1
        0 1 0 0 1
        0 1 0 1 1
        0 1 1 0 1
        0 1 1 1 1
        1 0 0 0 1
        1 0 0 1 1
        1 0 1 0 1
        1 0 1 1 1
        1 1 0 0 1
        1 1 0 1 1
        1 1 1 0 1
        1 1 1 1 1
        0 0 0 0 1
The syndrome associated with an error pattern of odd weight is the sum of an odd number
of rows of H ∗ . Such a sum will always have a 1 in the last digit and will thus be nonzero,
enabling us to detect the error. The number of odd weight error patterns is

    C(16,1) + C(16,3) + C(16,5) + · · · + C(16,15) = (1/2) · 2^16 = 2^15 .

An even weight error pattern that is not detectable takes one codeword into another
codeword. Therefore this error pattern is itself a codeword. All codewords of C ∗ are of
even weight and therefore they are all even weight error patterns that C ∗ cannot detect.
C ∗ has 2^(2^4 − 4 − 1) = 2^11 codewords. So the number of even weight error patterns that
are detectable is

    C(16,0) + C(16,2) + C(16,4) + · · · + C(16,16) − 2^11 = (1/2) · 2^16 − 2^11 = 30720.

Thus the number of detectable error patterns is the number of odd weight error
patterns plus the number of even weight detectable error patterns. This is 2^15 + 30720 =
63488.
All error patterns of weight one are correctable since the minimum distance is 4. The
syndrome corresponding to an error pattern of weight 2 is the sum of two rows of H ∗ .
Such a sum has the form (v, 0), where v is the sum of the corresponding two rows of the
length 15 Hamming parity check matrix. Each such syndrome arises in more than one
way: every nonzero v ∈ K 4 can be written as a sum of two distinct rows of H ∗ (ignoring
the last digit) in 8 different ways, for example 0001 = 0010 + 0011 = 1000 + 1001.
Therefore the syndrome associated with error patterns of weight 2 is not unique. In other
words the coset containing an error pattern of weight 2 contains at least one other error
pattern of weight 2. Thus no error pattern of weight 2 is correctable.
An error pattern of weight k ≥ 3 is the sum of an error pattern of weight 2 and an
error pattern of weight k − 2. Therefore the syndrome associated with this error pattern
is the sum of the syndrome of the weight 2 error pattern and the syndrome of the weight
k − 2 error pattern. Since the first syndrome is not unique, these syndromes (for weight
k error patterns) can arise in more than one way. So no error pattern of weight k ≥ 3 is
correctable.
Therefore the number of correctable error patterns equals the number of single error
patterns, which is 16.
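The arithmetic can be confirmed directly:

```python
from math import comb

# Detectable error patterns for C*: all odd-weight patterns, plus the
# even-weight patterns that are not codewords.
odd = sum(comb(16, k) for k in range(1, 16, 2))
even_detectable = sum(comb(16, k) for k in range(0, 17, 2)) - 2 ** 11
print(odd, even_detectable, odd + even_detectable)   # 32768 30720 63488
```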

Question 3

The Golay code C23 is a perfect (23, 12, 7)-code. Therefore every word in K 23 is in exactly
one ball with center a codeword and radius 3. The ball around the zero codeword contains
words of weight at most 3, so no word of weight 4 is inside this ball. Let v ∈ C23 be a
codeword of weight at least 8. Then the words inside the ball around v differ from v in
1, 2 or 3 places. Therefore the words inside this ball have weight at least 5. Every nonzero
codeword in C23 has weight at least 7. So consider now a codeword u ∈ C23 of weight 7.
The ball around u contains words that differ from u in 1, 2 or 3 places. By changing 3 of
u's 1's to 0's we obtain a word of weight 4 inside this ball, so there are C(7,3) = 35 words
of weight 4 inside this ball. There are C(23,4) = 8855 words of weight 4 in K 23 and they
all have to lie inside a ball of radius 3 with center a codeword of weight 7. Since each such
ball contains 35 words of weight 4, this requires 8855/35 = 253 balls.
Therefore there are 253 codewords of weight 7.
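A one-line check of the arithmetic:

```python
from math import comb

# Weight-4 words in K^23, weight-4 words per ball, and the resulting count.
print(comb(23, 4), comb(7, 3), comb(23, 4) // comb(7, 3))   # 8855 35 253
```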

Question 4

    G(1, 3) =
        1 1 1 1 1 1 1 1
        0 1 0 1 0 1 0 1
        0 0 1 1 0 0 1 1
        0 0 0 0 1 1 1 1

Note that RM (1, 3)^⊥ = RM (3 − 1 − 1, 3) = RM (1, 3). Therefore G(1, 3)G(1, 3)^T = 0.
Also G(1, 3)^T has 4 linearly independent columns so that G(1, 3)^T is a parity check matrix
for RM (1, 3). Further dmin (RM (1, 3)) = 4.
(i) 01011110 · G(1, 3)^T = 1100 + 1110 + 1001 + 1101 + 1011 = 1101. This equals the sixth
row of G(1, 3)^T and therefore we assume an error in the sixth position and decode to
01011010.

(ii) 01100111 · G(1, 3)^T = 1100 + 1010 + 1101 + 1011 + 1111 = 1111. This is row 8 so we
assume a single error in position 8 and decode to 01100110.

(iii) 00010100 · G(1, 3)^T = 1110 + 1101 = 0011. This doesn't equal a row of G(1, 3)^T so
more than one error probably occurred. Furthermore this syndrome can arise in
two different ways : errors in positions 4 and 6 as well as errors in positions 3 and
5. So the best we can do is ask for a retransmission.

(iv) 11001110 · G(1, 3)^T = 1000 + 1100 + 1001 + 1101 + 1011 = 1011. This equals row 7
so we assume a single error in position 7 and decode to 11001100.
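The four decodings can be verified with a short script (a sketch; decode and the retransmission convention None are ours):

```python
# G(1,3) and its columns, which are the rows of G(1,3)^T.
G = ["11111111", "01010101", "00110011", "00001111"]
cols = ["".join(row[j] for row in G) for j in range(8)]

def decode(word):
    bits = [int(b) for b in word]
    s = 0
    for bit, col in zip(bits, cols):
        if bit:
            s ^= int(col, 2)                  # syndrome = sum of selected columns
    if s == 0:
        return word                           # assume no errors
    syndrome = format(s, "04b")
    if syndrome in cols:
        bits[cols.index(syndrome)] ^= 1       # single error at that position
        return "".join(map(str, bits))
    return None                               # ambiguous: request retransmission

print(decode("01011110"), decode("01100111"), decode("00010100"), decode("11001110"))
# 01011010 01100110 None 11001100
```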
Question 5

A codeword will be made up of 4 information digits x1, x2 , x3 and x4 . The two parity
digits will be chosen such that

    S1 = Σ_{i=1}^{6} i · xi ≡ 0 (mod 7),
    S2 = Σ_{i=1}^{6} xi ≡ 0 (mod 7).

Adding twice the second equation to the first we find that

    3x1 + 4x2 + 5x3 + 6x4 + x6 ≡ 0 (mod 7).

Therefore

    x6 ≡ −3x1 − 4x2 − 5x3 − 6x4 ≡ 4x1 + 3x2 + 2x3 + x4 (mod 7).
Adding the two check equations together we find

    2x1 + 3x2 + 4x3 + 5x4 + 6x5 ≡ 0 (mod 7),

so that

    6x5 ≡ −2x1 − 3x2 − 4x3 − 5x4 (mod 7),
    ∴ x5 ≡ 2x1 + 3x2 + 4x3 + 5x4 (mod 7).
Up to now we have not placed any restriction on the digits of the code. Therefore x1
through x4 could be any digit from 0 to 6 and x5 and x6 will be any digit from 0 to 6.
We will show later how to remedy this. First we show that the code can correct all single
errors.
To encode a given word [x1 , x2 , x3 , x4 ] we compute the product

                          1 0 0 0 2 4
    [x1 , x2 , x3 , x4 ]  0 1 0 0 3 3
                          0 0 1 0 4 2
                          0 0 0 1 5 1 .
Assume now that a single error of size e occurs in position i. Then

    S1 = 1 · x1 + · · · + i(xi + e) + · · · + 6x6 ≡ 0 + i · e (mod 7),
    S2 = x1 + · · · + (xi + e) + · · · + x6 ≡ 0 + e (mod 7).
Therefore by computing firstly S2 and then S1 we can find the size and position of the
error.
Let

        1 1
        2 1
    H = 3 1
        4 1
        5 1
        6 1 .
Then the decoding may be described as follows.

1. Compute rH = [S1 , S2 ], where r is the received word and H is defined above.

2. If S1 = 0 and S2 = 0 assume the word is correct.

3. If S1 ≠ 0 and S2 ≠ 0, then assume an error of size S2 occurred in position S2^{−1} S1
(mod 7).

4. If exactly one of the Si 's is nonzero and the other equal to zero, then assume that
more than one error occurred.

We now modify the code so that it only uses the digits 0 through 5. By simply
restricting the possibilities for the digits x1 , x2, x3 and x4 to {0, 1, 2, 3, 4, 5} we ensure
that they meet the requirement. The possibility still exists that x5 , x6 or both could equal
6. So among the 6^4 = 1296 codewords that have x1 , x2 , x3 and x4 in {0, 1, 2, 3, 4, 5}, we want to
remove those with x5 , x6 or both equal 6.
For x5 = 6 we have

2x1 + 3x2 + 4x3 + 5x4 ≡ 6 (mod 7),

where 0 ≤ xi ≤ 5. This is the same as having

y1 + y2 + y3 + y4 = 7k + 6,

with y1 = 2x1 , y2 = 3x2 , y3 = 4x3 and y4 = 5x4 , so that y1 ∈ {0, 2, 4, 6, 8, 10}, y2 ∈


{0, 3, 6, 9, 12, 15}, y3 ∈ {0, 4, 8, 12, 16, 20} and y4 ∈ {0, 5, 10, 15, 20, 25}.
The generating function that counts the number of solutions in this case is

    (1 + x^2 + x^4 + x^6 + x^8 + x^10 )(1 + x^3 + x^6 + x^9 + x^12 + x^15 )(1 + x^4 + x^8 + x^12 + x^16 + x^20 )(1 + x^5 + x^10 + x^15 + x^20 + x^25 ).

We want the coefficient of x^i for i = 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, where i
corresponds to the possible values 7k + 6 for k = 0, 1, 2, . . . , 9. In each case
the corresponding coefficients are 3, 10, 22, 33, 38, 36, 25, 13, 5, 0. So in total there are
3 + 10 + 22 + 33 + 38 + 36 + 25 + 13 + 5 + 0 = 185 solutions (or codewords) that
have x5 = 6.
Proceeding in the same manner as above we find that for x6 = 6 we get

4x1 + 3x2 + 2x3 + x4 ≡ 6 (mod 7),

which is the same as

y1 + y2 + y3 + y4 = 7k + 6,

with y1 ∈ {0, 4, 8, 12, 16, 20}, y2 ∈ {0, 3, 6, 9, 12, 15}, y3 ∈ {0, 2, 4, 6, 8, 10} and y4 ∈
{0, 1, 2, 3, 4, 5}.
Here the generating function is

    (1 + x^4 + x^8 + x^12 + x^16 + x^20 )(1 + x^3 + x^6 + x^9 + x^12 + x^15 )(1 + x^2 + x^4 + x^6 + x^8 + x^10 )(1 + x + x^2 + x^3 + x^4 + x^5 ).

The coefficients that we are after are the coefficients of x^i for i = 6, 13, 20, 27, 34, 41,
48, 55, 62, 69. In this case they are 8, 27, 46, 51, 36, 15, 2, 0, 0, 0. Therefore there are
8 + 27 + 46 + 51 + 36 + 15 + 2 + 0 + 0 + 0 = 185 codewords that have x6 = 6.
Lastly we need the case x5 = x6 = 6. In this case we have

    2x1 + 3x2 + 4x3 + 5x4 ≡ 6 (mod 7) and
    4x1 + 3x2 + 2x3 + x4 ≡ 6 (mod 7).

Subtracting the second equation from the first gives the necessary condition

    5x1 + 2x3 + 4x4 ≡ 0 (mod 7),

which no longer involves x2. Conversely, given a triple (x1 , x3 , x4 ) satisfying this
condition, the second equation determines x2 uniquely mod 7 (since 3 is invertible
mod 7), and the first equation then holds automatically. Such a triple yields a solution
only when the forced value of x2 lies in {0, 1, 2, 3, 4, 5}.

We first count the triples. The condition is the same as

    y1 + y2 + y3 = 7k,

where y1 = 5x1 , y2 = 2x3 and y3 = 4x4 , so that y1 ∈ {0, 5, 10, 15, 20, 25}, y2 ∈ {0, 2, 4, 6, 8, 10}
and y3 ∈ {0, 4, 8, 12, 16, 20}. The generating function for counting the number of solutions
is

    (1 + x^5 + x^10 + x^15 + x^20 + x^25 )(1 + x^2 + x^4 + x^6 + x^8 + x^10 )(1 + x^4 + x^8 + x^12 + x^16 + x^20 ).

Here we want the coefficients of x^i for i = 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70. They are
respectively 1, 1, 5, 5, 7, 7, 3, 2, 0, 0, 0, so there are 1 + 1 + 5 + 5 + 7 + 7 + 3 + 2 = 31
such triples. The forced value of x2 is x2 ≡ 5(6 − 4x1 − 2x3 − x4 ) (mod 7), and it equals
6 exactly when 4x1 + 2x3 + x4 ≡ 2 (mod 7); a direct check shows this happens for 4 of
the 31 triples. Therefore there are 31 − 4 = 27 solutions where x5 = x6 = 6.

Therefore the total number of solutions where x5 , x6 or both equal 6 is 185 + 185 − 27 =
343. So by removing these codewords from the code we get a code with 6^4 − 343 =
1296 − 343 = 953 codewords.

For the rate we notice that there are 6^4 = 1296 possible information words but that only
953 of them are used, therefore the rate is 953/1296 ≈ 0.74.
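The case counts can be rechecked by brute force over all 6^4 = 1296 information words (a sketch using the formulas for x5 and x6 derived above):

```python
from itertools import product

msgs = list(product(range(6), repeat=4))      # the 6^4 = 1296 information words

def x5(m):
    return (2 * m[0] + 3 * m[1] + 4 * m[2] + 5 * m[3]) % 7

def x6(m):
    return (4 * m[0] + 3 * m[1] + 2 * m[2] + m[3]) % 7

n5 = sum(1 for m in msgs if x5(m) == 6)       # words forcing x5 = 6
n6 = sum(1 for m in msgs if x6(m) == 6)       # words forcing x6 = 6
n56 = sum(1 for m in msgs if x5(m) == 6 and x6(m) == 6)
kept = len(msgs) - (n5 + n6 - n56)
print(n5, n6, n56, kept)   # 185 185 27 953
```

Note the joint count is 27: the subtracted congruence 5x1 + 2x3 + 4x4 ≡ 0 alone is only a necessary condition, so counting its solutions by itself overstates the case x5 = x6 = 6.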

Question 6

(a) The parity check equation is x1 x2 x3 x4 ≡ x5 (mod 9), where x1 x2 x3 x4 is the number
with digits x1 , x2 , x3 , x4 and not the product of these digits. Therefore the
parity check equation is x1 · 10^3 + x2 · 10^2 + x3 · 10^1 + x4 · 10^0 ≡ x5 (mod 9). This is
the same as 999x1 + 99x2 + 9x3 + (x1 + x2 + x3 + x4 ) ≡ x5 (mod 9). The first set
of terms is congruent to zero mod 9 so that the parity check equation is equivalent
to x1 + x2 + x3 + x4 ≡ x5 (mod 9).

(b) If an error, e, occurs in position 5 and it is to be undetectable, it has to be the


case that x1 + x2 + x3 + x4 ≡ x5 + e (mod 9). This says that e ≡ 0 (mod 9), which
corresponds to the two errors 0 → 9 and 9 → 0. Since 0 ≤ x5 ≤ 8, the only one of
these that can change x5 is 0 → 9. This error is still detectable, because the received
check digit 9 is not a legal value: x1 + x2 + x3 + x4 (mod 9) always lies between 0
and 8. So all single errors in position 5 are detectable, although this particular one
is caught by inspecting the digit rather than by the check equation alone.
If an undetectable error, e, occurs in positions 1 through 4, then x1 +x2 +x3 +x4 +e ≡
x5 (mod 9). Again this says that e ≡ 0 (mod 9) which corresponds to the errors
0 → 9 and 9 → 0.
Now in total there are 10 × 5 × 9 = 450 possible single errors : one of 10 digits is in
error, in one of 5 positions and it can be changed to one of 9 possible values. Since
all errors in position 5 are detectable we only need to consider the errors in positions
1 through 4. Of these we know that the errors 0 → 9 and 9 → 0 are undetectable.
These two types of errors can occur in any of the positions 1 through 4, so there
are 8 possible undetected errors. Therefore the percentage that is undetectable is
8/450 ≈ 1.8%.

(c) If two digits from x1 , x2 , x3 or x4 are transposed this will not be detected since
the parity check equation remains valid. As an example say that x4 and x5 are
transposed, then the check equation becomes x1 + x2 + x3 + x5 ≡ x4 (mod 9). This
is the same as x1 + x2 + x3 + x4 + x5 ≡ 2x4 (mod 9), which in turn is 2x5 ≡ 2x4
(mod 9) implying x4 ≡ x5 (mod 9). The only way in which this can occur is if
x4 = 9 and x5 = 0.
Therefore transpositions involving the check digit will be detectable as long as the
digits involved are not 9 (for x1 through x4 ) and 0 (for x5 ).
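The count in (b) can be confirmed by enumerating all single errors in the 10 digits × 5 positions × 9 changes model used above (a sketch; positions are 0-indexed, position 4 being the check digit, where every single error is detectable as argued):

```python
# An error in positions 1-4 is undetected exactly when the change is 0 <-> 9.
undetected = sum(
    1
    for pos in range(4)
    for old in range(10)
    for new in range(10)
    if new != old and (new - old) % 9 == 0
)
total = 10 * 5 * 9
print(undetected, total)   # 8 450
```

This reproduces the 8/450 ≈ 1.8% figure.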

Question 7

The check equation is x1 x2 · · · x9 ≡ x10 (mod 7) (as above). Therefore it is the same as
x1 · 10^8 + x2 · 10^7 + · · · + x9 · 10^0 ≡ x10 (mod 7).

(a) Say a single error of size e occurs in position i, 1 ≤ i ≤ 9. The check equation
becomes x1 · 10^8 + x2 · 10^7 + · · · + (xi + e) · 10^{9−i} + · · · + x9 · 10^0 ≡ x10 + e · 10^{9−i}
(mod 7). The error will be undetectable if e · 10^{9−i} ≡ 0 (mod 7). This implies that
e = ±7. This corresponds to the following errors : 0 → 7, 1 → 8, 2 → 9, 7 → 0,
8 → 1 and 9 → 2.
If an undetectable error of size e occurs in position 10, then it has to be the case
(also) that e ≡ 0 (mod 7). This corresponds to the same set of errors as above,
but now since 0 ≤ x10 ≤ 6, the only possible undetectable errors in position 10 are
0 → 7, 1 → 8 and 2 → 9. Since all these errors change x10 into something bigger
than 6, they will be detectable.

(b) Consider first errors occurring in positions 1 through 9. For each of the 9 possible
positions there are 10 possible digits for each position and for each digit and position
there are 9 possible errors. Therefore in the first nine positions there are 10× 9×9 =
810 possible errors. In the tenth position there can be any one of 7 digits and each
digit can be changed into one of 9 possibilities. Therefore there are 7 × 9 = 63 single
errors involving the tenth position. So in total there are 810 + 63 = 873 possible
single errors.
All errors in position 10 are detectable, so we only concern ourselves with the first
nine positions. In these positions there are 6 possible undetectable errors. Each of
these 6 errors can occur in one of 9 positions, so there are 9 × 6 = 54 undetectable
errors. Therefore the percentage of undetectable errors is 54/873 ≈ 6%.

(c) Let xi and xj be transposed with i < j. If i = 1, there are 9 possible transpositions :


x1 ↔ x2 , x1 ↔ x3 , . . . , x1 ↔ x10 . If i = 2 there are 8 transpositions, . . . , with i = 9
there is 1 transposition. In total there are 1 + 2 + · · · + 9 = 45 transpositions. Each
transposition that is an error is made up of an ordered pair of distinct numbers :
there are 90 of these. Each pair can be combined with each of the 45 transpositions.
So there are 90 × 45 = 4050 transpositions that are errors.
Say digits xi and xj are transposed where 1 ≤ i < j ≤ 9. The checksum becomes

    x1 · 10^8 + · · · + xj · 10^{9−i} + · · · + xi · 10^{9−j} + · · · + x9 · 10^0
    = x1 · 10^8 + · · · + xi · 10^{9−i} + · · · + xj · 10^{9−j} + · · · + x9 · 10^0
      − xi · 10^{9−i} + xj · 10^{9−i} − xj · 10^{9−j} + xi · 10^{9−j}
    ≡ x10 + (xj − xi ) · 10^{9−i} − (xj − xi ) · 10^{9−j} (mod 7).

The error will be undetectable if (xj − xi )(10^{9−i} − 10^{9−j} ) ≡ 0 (mod 7). That is if
(xj − xi ) = ±7 or if (10^{9−i} − 10^{9−j} ) ≡ 0 (mod 7). The last equation is the same as
10^{9−j} (10^{j−i} − 1) ≡ 0 (mod 7). Now 7 ∤ 10^{9−j} , so 7 | 10^{j−i} − 1. That is 10^{j−i} ≡ 1
(mod 7), and since 10 has order φ(7) = 6 modulo 7, this implies j − i ≡ 0 (mod 6). This
corresponds to j = 9, i = 3; j = 8, i = 2 and j = 7, i = 1. Therefore if the digits in
positions 1 and 7, 2 and 8 or 3 and 9 are transposed an undetectable error occurs
(regardless of the digits involved). Now there are 90 × 3 = 270 transpositions involving
these positions.
Furthermore if the size of a transposition is ±7, that is (xi − xj ) = ±7, the trans-
position is undetectable. This corresponds to the 6 transpositions 0 ↔ 7, 1 ↔ 8,
2 ↔ 9, 7 ↔ 0, 8 ↔ 1 and 9 ↔ 2. Transpositions of size ±7 have already been
considered for positions 3 and 9, 2 and 8 and positions 1 and 7, above. Thus we
are left with considering transpositions involving all the other positions. There are
(1+2+3+· · ·+8)−3 = 33 of these (the total number of transpositions involving the
first 9 positions − the 3 considered above). So there are 33 × 6 = 198 transpositions
of size 7 in the remaining positions.
The check equation is equivalent to

2x1 + 3x2 + x3 + 5x4 + 4x5 + 6x6 + 2x7 + 3x8 + x9 + 6x10 ≡ 0 (mod 7).

Therefore x10 and x6 can always be transposed and it will not be detected. There
are 90 such transpositions.
So, there are 270 + 198 + 90 = 558 undetectable transpositions. The percentage
then is 558/4050 ≈ 14%.

Math 433D/550
Assignment 4

Due: Thursday, March 28, 2002, in class.

1. [5] The following text was intercepted from an Affine cipher:


KQEREJEBCPPCJCRKIEACUZBKRVPKRBCIBQCARBJCVFCUP
KRIOFKPACUZQEPBKRXPEIIEABDKPBCPFCDCCAFIEABDKP
BCPFEQPKAZBKRHAIBKAPCCIBURCCDKDCCJCIDFUIXPAFF
ERBICZDFKABICBBENEFCUPJCVKABPCYDCCDPKBCOCPERK
IVKSCPICBRKIJPKABI
Determine the plain-text. Give a clear description of the steps you followed.

2. Suppose Bob has an RSA cryptosystem with a large modulus n which can not be
factored easily. Alice sends a message to Bob by representing the alphabetic characters
A, B, ..., Z as 0, 1, ..., 25, respectively, and then encrypting each character (i.e., number)
separately.
(a) [4] Explain how a message encrypted in this way can easily be decrypted.
(b) [2] The following cipher-text was encrypted using the scheme described above with
n = 18721 and b = 25:
365, 0, 4845, 14930, 2608, 2608, 0
Illustrate your method from (a) by decrypting this cipher-text without factoring n.
This example illustrates a protocol failure in RSA. It demonstrates that a cipher-text can
sometimes be decrypted by an adversary if the system is used in a careless way. Thus a
secure cryptosystem is not enough to assure secure communication, it must also be used
properly.

3. [5] What happens if the RSA system is set up using p and q where p is prime but q
is not? Does encryption work (is EK (x) 1-1)? Can all encrypted messages be uniquely
decrypted? Illustrate your points with an example where p and q are two digit numbers.

4. [4] Find a (hopefully small) composite odd integer n, and an integer a, 1 ≤ a ≤ n − 1


for which the Miller-Rabin algorithm answers ”prime”. Demonstrate this by stepping
through the algorithm, assuming a is generated in step 2.

5. Suppose p = 199, q = 211 and B = 1357 in a Rabin cryptosystem.


(a) [2] Determine the four square roots of 1 modulo n, where n = pq.

(b) [2] Compute Ek (32767).


(c) [2] Determine the four possible decryptions of this cipher-text y.

6. [4] Factor 262063 and 9420457 using the p − 1 method. In each case, how big does B
have to be to be successful?

7. [6] Gary’s Poor Security (GPS) public key cryptosystem has Ek (x) = ax (mod 17575),
where a the receiver’s public key, and gcd(a, n) = 1. The plain-text space and cipher-text
space are both Zn, and each element of Zn represents three alphabetic characters as in
the following examples:
DOG → 3 × 26^2 + 14 × 26 + 6 = 2398
CAT → 2 × 26^2 + 0 × 26 + 19 = 1731.
The following message has been encrypted using Gary’s public key, 1411.
7017, 17342, 5595, 16298, 12285
Explain how to break the system and decrypt the message. Do it. Show your work.

8. [5] The following message was encrypted using the Rabin cryptosystem with
n = 19177 = 127 × 151 and B = 5679:
2251, 8836, 7291, 6035
The elements of Zn correspond to triples of alphabetic characters as in question 7. Decrypt
the message. Explain how you decided among the four possible plain-texts for each
cipher-text symbol.

MATH 550
Assignment 4
Solutions

Question 1

The frequencies of the letters in the cipher-text are shown below.

Letter Frequency        Letter Frequency
A      13               O      2
B      21               P      20
C      32               Q      4
D      9                R      12
E      13               S      1
F      10               U      6
H      1                V      4
I      16               X      2
J      6                Y      1
K      20               Z      4
N      1

The most frequent letters turn out to be C, B, K, P and I. Based on this we guess that
E ↦ C and T ↦ B. That is

Ek (4) = 2, ∴ a · 4 + b = 2,
Ek (19) = 1, ∴ a · 19 + b = 1.

From this we find that a ≡ 19 (mod 26) and b ≡ 4 (mod 26). Therefore

Dk (y) = a−1 (y − b) ≡ 11(y + 22) (mod 26).

Applying Dk (y) to the cipher-text we find the following.



Dk (A) = I
Dk (O) = G
Dk (B) = T
Dk (P) = R
Dk (C) = E
Dk (Q) = C
Dk (D) = P
Dk (R) = N
Dk (E) = A
Dk (S) = Y
Dk (F) = L
Dk (U) = U
Dk (H) = H
Dk (V) = F
Dk (I) = S
Dk (X) = B
Dk (J) = D
Dk (Y) = M
Dk (K) = O
Dk (Z) = X
Dk (N) = V

Therefore the cipher-text becomes.

OCANADATERREDENOSAIEUXTONFRONTESTCEINTDEFLEUR
ONSGLORIEUXCARTONBRASSAITPORTERLEPEEILSAITPOR
TERLACROIXTONHISTOIREESTUNEEPOPEEDESPLUSBRILL
ANTSEXPLOITSETTAVALEURDEFOITREMPEEPROTEGERANO
SFOYERSETNOSDROITS

Which turns out to be the following.

Ô Canada!
Terre de nos aïeux.
Ton front est ceint,
De fleurons glorieux.
Car ton bras
Sait porter l’épée,
Il sait porter la croix.
Ton histoire est une épopée,
des plus brillants exploits.
Et ta valeur,
de foi trempée,
protégera nos foyers et nos droits.
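The decryption can be confirmed mechanically; a sketch applying Dk (y) = 11(y + 22) (mod 26) to the first line of cipher-text:

```python
# Affine decryption with a^{-1} = 11 and b = 4 (so -b ≡ 22 mod 26).
def decrypt(text):
    return "".join(
        chr(ord("A") + (11 * (ord(ch) - ord("A") + 22)) % 26) for ch in text
    )

line1 = "KQEREJEBCPPCJCRKIEACUZBKRVPKRBCIBQCARBJCVFCUP"
print(decrypt(line1))   # OCANADATERREDENOSAIEUXTONFRONTESTCEINTDEFLEUR
```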

Question 2

(a) The eavesdropper encrypts the alphabet A, B, . . . , Z using the public key. This gives
the cipher-text of each plain-text letter, and since the encryption function is one-to-one,
inverting this table decrypts each cipher-text “letter”.

(b) Encrypting the alphabet using the given parameters we find the following.

A ↦ 0            N ↦ 4845
B ↦ 1            O ↦ 1375
C ↦ 6400         P ↦ 13444
D ↦ 18718        Q ↦ 16
E ↦ 17173        R ↦ 13663
F ↦ 1759         S ↦ 1437
G ↦ 18242        T ↦ 2940
H ↦ 12359        U ↦ 10334
I ↦ 14930        V ↦ 365
J ↦ 9            W ↦ 10789
K ↦ 6279         X ↦ 8945
L ↦ 2608         Y ↦ 11373
M ↦ 4644         Z ↦ 5116

Reading off the plain-text from the table we find that the message is :
VANILLA.
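The attack in (a) is easy to script (a sketch):

```python
# Encrypt the 26 letters with the public key (n = 18721, b = 25) and
# invert the resulting table to decrypt the intercepted cipher-text.
n, b = 18721, 25
table = {pow(x, b, n): x for x in range(26)}   # cipher value -> letter index

cipher = [365, 0, 4845, 14930, 2608, 2608, 0]
plain = "".join(chr(ord("A") + table[y]) for y in cipher)
print(plain)   # VANILLA
```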

Question 3

In general the encryption function will not be one-to-one. To see this let p = 11 and q = 12.
Then n = pq = 132 = 2^2 × 3 × 11 and φ(n) = φ(4)φ(3)φ(11) = 2 × 2 × 10 = 40. Thus
if we choose b = 3 then gcd(b, φ(n)) = 1. Further a ≡ b^{−1} (mod φ(n)), so that a ≡ 27
(mod 40). We now find that Ek (2) ≡ 2^b ≡ 2^3 ≡ 8 (mod 132) and Dk (8) ≡ 8^a ≡ 8^27 ≡ 68
(mod 132). So here Dk (Ek (2)) ≠ 2. Also, Ek (68) ≡ 68^3 ≡ 8 (mod 132). Therefore two
different elements, 2 and 68, both encrypt to 8. (The trouble comes from the factor 2^2 :
since 132 is not squarefree, x^{kφ(n)+1} ≡ x (mod 132) fails for some x.)
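The failure can be confirmed directly (a sketch; the collision 2, 68 ↦ 8 is one of several):

```python
n = 132          # n = 11 * 12, with q = 12 not prime
b, a = 3, 27     # b * a = 81 ≡ 1 (mod 40), and φ(132) = 40
print(pow(2, b, n), pow(68, b, n), pow(8, a, n))   # 8 8 68
```

So 2 and 68 collide under encryption, and decrypting 8 returns 68 rather than 2.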

Question 4

Let n = 49 = 7 × 7. Then n − 1 = 48 = 2^4 × 3, therefore m = 3 in the Miller-Rabin
algorithm. If we choose a = 18 or 30, then in step 3, b = a^3 ≡ 1 (mod 49). In step 4 the
algorithm will answer “prime”, contrary to n being composite.
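Stepping through the key computation (a sketch):

```python
n = 49            # composite: 49 = 7 * 7
m, a = 3, 18      # n - 1 = 48 = 2^4 * 3, so m = 3; a = 18 is a lucky base
b = pow(a, m, n)
print(b)          # 1, so Miller-Rabin answers "prime"
```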

Question 5

p = 199, q = 211, n = pq = 41989 and B = 1357.

(a) x^2 ≡ 1 (mod 41989) ⇐⇒ x ≡ ±1 (mod 199) and x ≡ ±1 (mod 211).

From x ≡ 1 (mod 199) and x ≡ 1 (mod 211) the Chinese Remainder Theorem gives
the solution x ≡ 1 × M1 × y1 + 1 × M2 × y2 , where

    M1 = 211,                        M2 = 199,
    y1 ≡ M1^{−1} ≡ 83 (mod 199),     y2 ≡ M2^{−1} ≡ 123 (mod 211).

Therefore x ≡ 1 × 211 × 83 + 1 × 199 × 123 ≡ 1 (mod 41989).


From x ≡ 1 (mod 199) and x ≡ −1 (mod 211) the Chinese Remainder Theorem
gives the solution x ≡ 1 × M1 × y1 + (−1) × M2 × y2 ≡ 1 × 211 × 83 − 1 × 199 × 123 ≡
35025 (mod 41989).
The other two solutions will therefore be x ≡ −1 (mod 41989) and x ≡ −35025
(mod 41989).
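The same CRT computation can be scripted. A short sketch (Python 3.8+, where pow(x, −1, m) gives the modular inverse):

```python
# The four square roots of 1 (mod 41989) via the Chinese Remainder Theorem,
# combining x ≡ ±1 (mod 199) with x ≡ ±1 (mod 211).
p, q = 199, 211
n = p * q                      # 41989
M1, M2 = q, p                  # M1 = 211, M2 = 199
y1 = pow(M1, -1, p)            # 83
y2 = pow(M2, -1, q)            # 123

roots = sorted({(s * M1 * y1 + t * M2 * y2) % n
                for s in (1, -1) for t in (1, -1)})
print(roots)                   # [1, 6964, 35025, 41988], i.e. ±1 and ±35025
```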

(b) Ek (x) = x(x + B) (mod n). Therefore

Ek (32767) ≡ (32767)(32767 + 1357) ≡ 16027 (mod 41989).

(c) We know that

Ek ( ω(x + B/2) − B/2 ) = Ek (x),

where ω is a square root of 1 (mod 41989). Also 2⁻¹ ≡ 20995 (mod 41989). Therefore

ω(32767 + 1357/2) − 1357/2 ≡ ω(12451) − 21673 ≡ ω(12451) + 20316 (mod 41989).
This gives the four possible decryptions of 16027 as

ω = 1 : 1(12451) + 20316 ≡ 32767 (mod 41989),
ω = −1 : −1(12451) + 20316 ≡ 7865 (mod 41989),
ω = 35025 : 35025(12451) + 20316 ≡ 18837 (mod 41989),
ω = 6964 : 6964(12451) + 20316 ≡ 21795 (mod 41989).
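As a sanity check, each of the four candidates should encrypt back to 16027 under Ek (x) = x(x + B). A sketch:

```python
# Check part (c): each candidate x = ω(32767 + B/2) − B/2 must satisfy
# Ek(x) = x(x + B) ≡ 16027 (mod n).
n, B = 41989, 1357
half_B = B * pow(2, -1, n) % n           # B/2 ≡ 21673 (mod n)
omegas = [1, n - 1, 35025, 6964]         # the four square roots of 1 mod n

cands = [(w * (32767 + half_B) - half_B) % n for w in omegas]
print(cands)                             # [32767, 7865, 18837, 21795]
assert all(x * (x + B) % n == 16027 for x in cands)
```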

Question 6

Using the p − 1 method we find that 262063 = 521 × 503 and the first B for which the
method produces an answer is B = 13. Further 9420457 = 2351 × 4007 and the smallest
B that works in this case is B = 47.
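The workbook's exact variant of the p − 1 method is not reproduced here; the sketch below uses the common formulation with base a = 2 and exponent B! (accumulated one factor at a time), which recovers the same factors at the stated bounds:

```python
# Pollard's p − 1 method, common form: compute 2^(B!) mod n step by step,
# then take gcd with n. It succeeds when p − 1 is B-smooth for some prime
# factor p of n: here 520 = 2^3 * 5 * 13 and 2350 = 2 * 5^2 * 47.
from math import gcd

def pollard_p_minus_1(n, B):
    a = 2
    for j in range(2, B + 1):
        a = pow(a, j, n)       # after the loop, a = 2^(B!) mod n
    d = gcd(a - 1, n)
    return d if 1 < d < n else None

print(pollard_p_minus_1(262063, 13))     # 521
print(pollard_p_minus_1(9420457, 47))    # 2351
```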

Question 7

We are given that Ek (x) = ax (mod 17575), with a = 1411. From this we see that
Dk (y) = a⁻¹y (mod 17575). Therefore all we need to do is find a⁻¹ (mod 17575). We
find that a⁻¹ ≡ 16591 (mod 17575). Decrypting the given cipher-text we get :

2247 797 13070 8743 3160

From this we get

2247 = 3 × 26² + 8 × 26 + 11,
797 = 1 × 26² + 4 × 26 + 17,
13070 = 19 × 26² + 8 × 26 + 18,
8743 = 12 × 26² + 24 × 26 + 7,
3160 = 4 × 26² + 17 × 26 + 14.

Reading off the letters we find that the plain-text is : DILBERT IS MY HERO.
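The decryption and base-26 decoding can be mechanized. A sketch (the helper decode, mapping base-26 digits to letters with A = 0, is ours, not the workbook's):

```python
# Decrypt-and-decode round trip for Question 7: Ek(x) = a*x mod n,
# Dk(y) = a^(-1)*y mod n, blocks are three base-26 digits each.
n, a = 17575, 1411
a_inv = pow(a, -1, n)                    # 16591

def decode(block):
    d2, r = divmod(block, 676)           # 676 = 26^2
    d1, d0 = divmod(r, 26)
    return ''.join(chr(ord('A') + d) for d in (d2, d1, d0))

plain_blocks = [2247, 797, 13070, 8743, 3160]
# Re-encrypt each block, then decrypt and decode it again.
message = ''.join(decode(a_inv * (a * x % n) % n) for x in plain_blocks)
print(message)                           # DILBERTISMYHERO
```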

Question 8

We are given n = 127 × 151 = 19177 and B = 5679. Therefore

Dk (y) = √(y + 4⁻¹B²) − 2⁻¹B.

Now B² ≡ 14504 (mod 19177) and 4⁻¹ ≡ 14383 (mod 19177), so that 4⁻¹B² ≡ 3626
(mod 19177). Also, 2⁻¹ ≡ 9589 (mod 19177), therefore 2⁻¹B ≡ 12428 (mod 19177). We
now have

Dk (y) = √(y + 3626) − 12428 ≡ √(y + 3626) + 6749 (mod 19177).
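All the square roots below come from the same two facts: 127 ≡ 151 ≡ 3 (mod 4), so a square root modulo each prime is y^((p+1)/4) mod p, and the CRT glues the two roots together. A sketch of that helper (the name sqrts_mod_n is ours; it assumes y is a quadratic residue mod both primes, as is the case for every block here):

```python
# Square roots mod n = 127 * 151 = 19177. Both primes are ≡ 3 (mod 4),
# so a root of a quadratic residue y mod p is y^((p+1)/4) mod p; the CRT
# then yields the four roots mod n.
p, q = 127, 151
n = p * q                                # 19177

def sqrts_mod_n(y):
    rp = pow(y, (p + 1) // 4, p)         # ±rp are the roots mod 127
    rq = pow(y, (q + 1) // 4, q)         # ±rq are the roots mod 151
    yp = pow(q, -1, p)                   # 151^(-1) ≡ 90  (mod 127)
    yq = pow(p, -1, q)                   # 127^(-1) ≡ 44  (mod 151)
    return sorted({(s * rp * q * yp + t * rq * p * yq) % n
                   for s in (1, -1) for t in (1, -1)})

print(sqrts_mod_n(5877))                 # [1380, 3192, 15985, 17797]
```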

Decrypting the cipher-text we find the following.


Dk (2251) ≡ √5877 + 6749 (mod 19177). Now √5877 ≡ 5877^(128/4) ≡ ±17 (mod 127)
and √5877 ≡ 5877^(152/4) ≡ ±21 (mod 151).

From x ≡ 17 (mod 127) and x ≡ 21 (mod 151) the Chinese Remainder Theorem gives
x ≡ 17 × 151 × y1 + 21 × 127 × y2 (mod 19177). Here y1 ≡ 151⁻¹ ≡ 90 (mod 127) and
y2 ≡ 127⁻¹ ≡ 44 (mod 151). Therefore x ≡ 17 × 151 × 90 + 21 × 127 × 44 ≡ 3192
(mod 19177).
From x ≡ 17 (mod 127) and x ≡ −21 (mod 151), we find x ≡ 17 × 151 × 90 +(−21) ×
127 × 44 ≡ 17797 (mod 19177).
From x ≡ −17 (mod 127) and x ≡ 21 (mod 151), we find x ≡ −17 × 151 × 90 + 21 ×
127 × 44 ≡ 1380 (mod 19177).
From x ≡ −17 (mod 127) and x ≡ −21 (mod 151), we find x ≡ −17 × 151 × 90 +
(−21) × 127 × 44 ≡ 15985 (mod 19177).
Therefore
Dk (2251) ≡ 3192 + 6749 ≡ 9941 (mod 19177),
Dk (2251) ≡ 17797 + 6749 ≡ 5369 (mod 19177),
Dk (2251) ≡ 1380 + 6749 ≡ 8129 (mod 19177),
Dk (2251) ≡ 15985 + 6749 ≡ 3557 (mod 19177).
Giving us the following
9941 = 14 × 26² + 18 × 26 + 9,
5369 = 7 × 26² + 24 × 26 + 13,
8129 = 12 × 26² + 0 × 26 + 17,
3557 = 5 × 26² + 6 × 26 + 21.
These correspond to the plain-text triples OSJ, HYN, MAR and FGV respectively. At this
point the plain-text that holds the most promise seems to be MAR.

Dk (8836) ≡ √12462 + 6749 (mod 19177). Here we find that the four square-roots of
12462 (mod 19177) are : 18038, 13974, 5203 and 1139. Therefore
Dk (8836) ≡ 18038 + 6749 ≡ 5610 (mod 19177),
Dk (8836) ≡ 13974 + 6749 ≡ 1546 (mod 19177),
Dk (8836) ≡ 5203 + 6749 ≡ 11952 (mod 19177),
Dk (8836) ≡ 1139 + 6749 ≡ 7888 (mod 19177).
We now have
5610 = 8 × 26² + 7 × 26 + 20,
1546 = 2 × 26² + 7 × 26 + 12,
11952 = 17 × 26² + 17 × 26 + 18,
7888 = 11 × 26² + 17 × 26 + 10.

Here the corresponding plain-text is IHU, CHM, RRS, LRK. Considered on their own,
none of these seems a better choice than the others. If we combine them with the
first set, we find that CHM seems to be a good choice, as this gives MARCHM.

Dk (7291) ≡ √10917 + 6749 (mod 19177). The four square-roots of 10917 are 12519, 15567,
3610 and 6658. This gives

Dk (7291) ≡ 12519 + 6749 ≡ 91 (mod 19177),
Dk (7291) ≡ 15567 + 6749 ≡ 3139 (mod 19177),
Dk (7291) ≡ 3610 + 6749 ≡ 10359 (mod 19177),
Dk (7291) ≡ 6658 + 6749 ≡ 13407 (mod 19177).

This leads to

91 = 0 × 26² + 3 × 26 + 13,
3139 = 4 × 26² + 16 × 26 + 19,
10359 = 15 × 26² + 8 × 26 + 11,
13407 = 19 × 26² + 21 × 26 + 17.

The plain-text is ADN, EQT, PIL, TVR. Out of these four plain-texts the only one that
combines with the result so far in a sensible manner is the first one. This combined with
the result so far gives MARCHMADN.

Dk (6035) ≡ √9661 + 6749 (mod 19177). The four square-roots of 9661 are 17904, 15618,
3559 and 1273. This gives

Dk (6035) ≡ 17904 + 6749 ≡ 5476 (mod 19177),
Dk (6035) ≡ 15618 + 6749 ≡ 3190 (mod 19177),
Dk (6035) ≡ 3559 + 6749 ≡ 10308 (mod 19177),
Dk (6035) ≡ 1273 + 6749 ≡ 8022 (mod 19177).

Giving us the following

5476 = 8 × 26² + 2 × 26 + 16,
3190 = 4 × 26² + 18 × 26 + 18,
10308 = 15 × 26² + 6 × 26 + 12,
8022 = 11 × 26² + 22 × 26 + 14.

The four plain-texts are : ICQ, ESS, PGM and LWO. Here the second one seems to be
the only one that fits in with the results so far. This gives MARCHMADNESS ↦ March
Madness.
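Putting the whole of Question 8 together, a short script can list every candidate triple per block; selecting MAR, CHM, ADN, ESS reproduces the answer. A self-contained sketch (the helper names sqrts and decode are ours):

```python
# End-to-end sketch of Question 8: for each cipher block c, the candidate
# plain-texts are decode( sqrt(c + B^2/4) - B/2 ) over all four square roots.
p, q, B = 127, 151, 5679
n = p * q                                # 19177
half_B = B * pow(2, -1, n) % n           # 12428
quarter_B2 = B * B * pow(4, -1, n) % n   # 3626

def sqrts(y):                            # four roots of y mod n (y a residue)
    rp, rq = pow(y, (p + 1) // 4, p), pow(y, (q + 1) // 4, q)
    crt = lambda u, v: (u * q * pow(q, -1, p) + v * p * pow(p, -1, q)) % n
    return {crt(s * rp, t * rq) for s in (1, -1) for t in (1, -1)}

def decode(x):                           # three base-26 digits, A = 0
    d2, r = divmod(x, 676)
    d1, d0 = divmod(r, 26)
    return ''.join(chr(ord('A') + d) for d in (d2, d1, d0))

candidates = [{decode((r - half_B) % n) for r in sqrts(c + quarter_B2)}
              for c in (2251, 8836, 7291, 6035)]
print(candidates)   # one triple per block spells MAR, CHM, ADN, ESS
```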
