The Mathematics of Coding Theory, 1st Edition, by Paul Garrett
Copyright © 2004 by Pearson Education / Prentice-Hall, Inc.
ISBN 0-13-101967-8
All rights reserved.
Contents
Preface xi
1 Probability 1
1.1 Sets and functions 1
1.2 Counting 5
1.3 Preliminary ideas of probability 8
1.4 More formal view of probability 13
1.5 Random variables, expected values, variance 20
1.6 Markov's inequality, Chebysheff's inequality 27
1.7 Law of Large Numbers 27
2 Information 33
2.1 Uncertainty, acquisition of information 33
2.2 Definition of entropy 37
6 The Integers 93
6.1 The reduction algorithm 93
6.2 Divisibility 96
6.3 Factorization into primes 99
6.4 A failure of unique factorization 103
6.5 The Euclidean Algorithm 105
6.6 Equivalence relations 108
6.7 The integers modulo m 111
6.8 The finite field Z/p for p prime 115
6.9 Fermat's Little Theorem 117
6.10 Euler's theorem 118
6.11 Facts about primitive roots 120
6.12 Euler's criterion 121
6.13 Fast modular exponentiation 122
6.14 Sun-Ze's theorem 124
6.15 Euler's phi-function 128
Preface
redundancy must be designed into CD-ROM and other data storage protocols to
achieve similar robustness.
There are other uses for detection of changes in data: if the data in question is
the operating system of your computer, a change not initiated by you is probably
a sign of something bad, either failure in hardware or software, or intrusion by
hostile agents (whether software or wetware). Therefore, an important component
of systems security is implementation of a suitable procedure to detect alterations
in critical files.
In pre-internet times, various schemes were used to reduce the bulk of commu-
nication without losing the content: this influenced the design of the telegraphic
alphabet, traffic lights, shorthand, etc. With the advent of the telephone and ra-
dio, these matters became even more significant. Communication with exploratory
spacecraft having very limited resources available in deep space is a dramatic ex-
ample of how the need for efficient and accurate transmission of information has
increased in our recent history.
In this course we will begin with the model of communication and information
made explicit by Claude Shannon in the 1940's, after some preliminary forays by
Hartley and others in the preceding decades.
Many things are omitted due to lack of space and time. In spite of their
tremendous importance, we do not mention convolutional codes at all. This is
partly because there is less known about them mathematically. Concatenated codes
are mentioned only briefly. Finally, we also omit any discussion of the so-called
turbo codes. Turbo codes have been recently developed experimentally. Their
remarkably good behavior, seemingly approaching the Shannon bound, has led to
the conjecture that they are explicit solutions to the fifty-year old existence results
of Shannon. However, at this time there is insufficient understanding of the reasons
for their good behavior, and for this reason we will not attempt to study them here.
We do give a very brief introduction to geometric Goppa codes, attached to
algebraic curves, which are a natural generalization of Reed-Solomon codes (which
we discuss), and which exceed the Gilbert-Varshamov lower bound for performance.
The exercises at the ends of the chapters are mostly routine, with a few more
difficult exercises indicated by single or double asterisks. Short answers are given
at the end of the book for a good fraction of the exercises, indicated by '(ans.)'
following the exercise.
I offer my sincere thanks to the reviewers of the notes that became this volume.
They found many unfortunate errors, and offered many good ideas about improve-
ments to the text. While I did not choose to take absolutely all the advice given, I
greatly appreciate the thought and energy these people put into their reviews: John
Bowman, University of Alberta; Sergio Lopez, Ohio University; Navin Kashyap,
University of California, San Diego; James Osterburg, University of Cincinnati;
LeRoy Bearnson, Brigham Young University; David Grant, University of Colorado
at Boulder; Jose Voloch, University of Texas.
Paul Garrett
garrett@math.umn.edu
http://www.math.umn.edu/~garrett/
1
Probability
1.1 Sets and functions
1.2 Counting
1.3 Preliminary ideas of probability
1.4 More formal view of probability
1.5 Random variables, expected values, variance
1.6 Markov's inequality, Chebysheff's inequality
1.7 Law of Large Numbers
S = {1, 2, 3, 4, 5, 6, 7, 8}
which is the set of integers greater than 0 and less than 9. This set can also be
described by a rule like
$T^c = S - T = \{s \in S : s \notin T\}$
Sets can also be elements of other sets. For example, {Q, Z, R, C} is the set
with 4 elements, each of which is a familiar set of numbers. Or, one can check that
{{1,2},{1,3},{2,3}}
R² is the collection of ordered pairs of real numbers, which we use to describe points in the plane. And R³ is
the collection of ordered triples of real numbers, which we use to describe points in
three-space.
The power set of a set S is the set of subsets of S. This is sometimes denoted
by PS. Thus,
$P\emptyset = \{\emptyset\}$
although the latter notation gives no information about the nature of f in any
detail.
More rigorously, but less intuitively, we can define a function by really telling
its graph: the formal definition is that a function f : A → B is a subset of the
product A × B with the property that for every a ∈ A there is a unique b ∈ B so
that (a, b) ∈ f. Then we would write f(a) = b.
This formal definition is worth noting at least because it should make clear that
there is absolutely no requirement that a function be described by any recognizable
or simple 'formula'.
Map and mapping are common synonyms for function.
As a silly example of the formal definition of function, let f : {1, 3} → {2, 6}
be the function 'multiply-by-two', so that f(1) = 2 and f(3) = 6. Then the 'official'
definition would say that really f is the subset of the product set {1, 3} × {2, 6}
consisting of the ordered pairs (1, 2), (3, 6). That is, formally the function f is the
set
f = {(1, 2), (3, 6)}
Of course, no one usually operates this way, but it is important to have a precise
meaning underlying more intuitive usage.
A function f : A → B is surjective (or onto) if for every b ∈ B there is
a ∈ A so that f(a) = b. A function f : A → B is injective (or one-to-one) if
f(a) = f(a') implies a = a'. That is, f is injective if for every b ∈ B there is at
most one a ∈ A so that f(a) = b. A map is a bijection if it is both injective and
surjective.
The number of elements in a set is its cardinality. Two sets are said to have
the same cardinality if there is a bijection between them. Thus, this is a trick
so that we don't have to actually count two sets to see whether they have the same
number of elements. Rather, we can just pair them up by a bijection to achieve
this purpose.
Since we can count the elements in a finite set in a traditional way, it is clear
that a finite set has no bijection to a proper subset of itself. After all, a proper
subset has fewer elements.
By contrast, for infinite sets it is easily possible that proper subsets have bijec-
tions to the whole set. For example, the set A of all natural numbers and the set
E of even natural numbers have a bijection between them given by
$n \to 2n$
$f(g(b)) = \mathrm{id}_B(b) = b$
This completes the proof that if f has an inverse then it is a bijection. ///
1.2 Counting
Here we go through various standard elementary-but-important examples of count-
ing as preparation for finite probability computations. Of course, by 'counting' we
mean structured counting.
Example: Suppose we have n different things, for example the integers from 1 to
n inclusive. The question is how many different orderings or ordered listings
of these numbers are there? Rather than just tell the formula, let's quickly derive
it. The answer is obtained by noting that there are n choices for the first thing i_1,
then n − 1 remaining choices for the second thing i_2 (since we can't reuse whatever
i_1 was), n − 2 remaining choices for i_3 (since we can't reuse i_1 nor i_2, whatever they
were!), and so on down to 2 remaining choices for i_{n−1} and then just one choice for
i_n. Thus, there are
n · (n − 1) · (n − 2) · … · 2 · 1
possible orderings of n distinct things. This kind of product arises often, and there
is a notation and name for it: n-factorial, denoted n!, is the product
n! = n · (n − 1) · (n − 2) · … · 2 · 1
0! = 1
$\underbrace{n \times n \times \cdots \times n}_{k} = n^k$
there are n - 1 remaining choices for the second, since the second element must
be different from the first. For each choice of the first and second there are n - 2
remaining choices for the third, since it must be different from the first and second.
This continues, to n - (k - 1) choices for the kth for each choice of the first through
(k − 1)th, since the k − 1 distinct elements already chosen can't be reused. That is,
altogether there are
$n \cdot (n-1) \cdot (n-2) \cdots (n - (k-1)) = \frac{n!}{(n-k)!}$
ordered k-tuples of distinct elements that can be chosen from a set with n elements.
Example: How many (unordered!) subsets of k elements are there in a set of
n things? There are n possibilities for the first choice, n - 1 remaining choices for
the second (since the first item is removed), n - 2 for the third (since the first and
second items are no longer available), and so on down to n - (k -1) choices for the
kth. This number is n!/(n - k)!, but is not what we want, since it includes a count
of all different orders of choices, but subsets are not ordered. That is,
$\frac{n!}{(n-k)!} = k! \times \text{the actual number}$
since we saw in a previous example that there are k! possible orderings of k distinct
things. Thus, there are
$\frac{n!}{k!\,(n-k)!}$
choices of subsets of k elements in a set with n elements.
The number n!/k!(n - k)! also occurs often enough to warrant a name and
notation: it is called a binomial coefficient, is written
$\frac{n!}{k!\,(n-k)!} = \binom{n}{k}$
and is pronounced 'n choose k' in light of the previous example. The name 'binomial
coefficient' is explained below in the context of the Binomial Theorem.
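As an aside, a reader who wants to experiment numerically can compute binomial coefficients directly from this formula. The following short Python sketch (an illustration added here, not part of the original text) computes n!/(k!(n−k)!) and cross-checks it against Python's built-in math.comb.

    from math import factorial, comb

    def choose(n, k):
        # the number of k-element subsets of an n-element set: n!/(k!(n-k)!)
        if k < 0 or k > n:
            return 0
        return factorial(n) // (factorial(k) * factorial(n - k))

    for n in range(8):
        for k in range(n + 1):
            assert choose(n, k) == comb(n, k)

    print(choose(10, 3))   # 120 subsets of size 3 in a 10-element set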
Example: How many disjoint pairs of 3-element and 5-element subsets are there
in a set with 10 elements? We just saw that there are $\binom{10}{3}$ choices for the
first subset with 3 elements. Then the remaining part of the original set has just
10 − 3 = 7 elements, so there are $\binom{7}{5}$ choices for the second subset of 5 elements.
Therefore, there are
$\binom{10}{3}\binom{7}{5} = \frac{10!}{3!\,7!}\cdot\frac{7!}{5!\,2!} = \frac{10!\,5!}{5!\,5!\cdot 3!\,2!} = \binom{10}{5}\binom{5}{3}$
pairs of disjoint subsets of 3 and 5 elements inside a set with 10 elements. Note
that we obtain the same numerical outcome regardless of whether we first choose
the 3-element subset or the 5-element subset.
Example: How many disjoint pairs of subsets, each with k elements, are there in
a set with n elements, where 2k ≤ n? We saw that there are $\binom{n}{k}$ choices for the
first subset with k elements. Then the remaining part of the original set has just
n − k elements, so there are $\binom{n-k}{k}$ choices for the second subset of k elements.
But our counting so far inadvertently takes into account a first subset and a second
one, which is not what the question is. By now we know that there are 2! = 2
choices of ordering of two things (subsets, for example). Therefore, there are
for the ℓ-th subset. But since ordering of these subsets is inadvertently counted
here, we have to divide by ℓ! to have the actual number of families. There is some
cancellation among the factorials, so that the actual number is
This identity shows that the binomial coefficients are integers, and is the basis for
other identities as well. This identity is proven by induction, as follows. For n = 1
the assertion is immediately verified. Assume it is true for exponent n, and prove
the corresponding assertion for exponent n + 1. Thus, using the induction hypothesis,
$(x+y)^{n+1} = (x+y)\sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k} = \sum_{k=0}^{n}\binom{n}{k}\left(x^{k+1}y^{n-k} + x^{k}y^{n-k+1}\right)$
Thus, to prove the formula of the Binomial Theorem for exponent n + 1 we must
prove that for 1 ≤ k ≤ n
$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$
Indeed,
$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!} = \frac{(n+1)!}{k!\,(n-k+1)!} = \binom{n+1}{k}$
as claimed.
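A reader can also confirm this identity numerically; the sketch below (an added illustration, not from the original text) checks that $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ over a range of n and k.

    from math import comb

    # verify Pascal's identity, the key step in the induction above
    for n in range(1, 25):
        for k in range(1, n + 1):
            assert comb(n, k - 1) + comb(n, k) == comb(n + 1, k)
    print("Pascal's identity holds for all 1 <= k <= n < 25")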
rule is taken as true for 'chance': the sum of the percentages of all possible outcomes
should be 100%.
But what is 'probability'?
Example: A 'fair coin' is presumed to have equal probabilities ('equal chances')
of landing heads-up or tails-up. Since the probability of heads is the same as the
probability of tails, and since the two numbers should add up to 1, there is no
choice but to assign the probability 1/2 to both. Further, each toss of the coin is
presumed to have an outcome independent of other tosses before and after. That is,
there is no mechanism by which the outcome of one toss affects another. Now we
come to a property which we would assume to be roughly true: out of (for example)
10 coin tosses we expect about half to be heads and about half to be tails. We
expect that it would very seldom happen that 10 out of 10 tosses would all be
heads. Experience does bear this out. So far, in this vague language, there is no
obvious problem.
But, for example, it would be a mistake to be too aggressive, and say that we
should expect exactly half heads and half tails. Experimentation will show that out
of repeated batches of 10 coin flips, only about 1/4 of the time will there be exactly
5 heads and 5 tails. (Only once in about 2^10 = 1024 times will one get all heads.)
In fact, about 2/5 of the time there will be either 6 heads and 4 tails, or vice versa.
That is, a 6-4 or 4-6 distribution of outcomes is more likely than the 'expected' 5-5.
But this is not a paradox, since upon reflection our intuition might assure us
not that there will be exactly half heads and half tails, but only approximately half
and half. And we can retell the story in a better way as follows. So what does
'approximately' mean, exactly?
In a trial of n coin flips, each flip has two possible outcomes, so there are $2^n$
possible sequences of n outcomes. The assumptions that the coin is 'fair' and
that the separate coin tosses do not 'influence' each other is interpreted as saying
that each one of the $2^n$ possible sequences of coin-toss outcomes is equally likely.
Therefore, the probability of any single sequence of n outcomes is $1/2^n$. Further,
for any subset S of the set A of all $2^n$ possible sequences of outcomes, we assume
that
$P(\text{a sequence of } n \text{ tosses giving an outcome in } S) = \frac{\text{number of elements in } S}{\text{number of elements in } A} = \frac{\text{number of elements in } S}{2^n}$
Then the probability that exactly k heads will occur out of n tosses (with
0 ≤ k ≤ n) is computed as
$\frac{\binom{n}{k}}{2^n}$
as commented just above. And the probability that 6 heads and 4 tails or 4 heads
and 6 tails occur is
$\frac{\binom{10}{6} + \binom{10}{4}}{2^{10}} = \frac{420}{1024} \approx 0.41$
Perhaps not entirely surprisingly, the probability of getting exactly half heads
and half tails out of 2n flips goes down as the number of flips goes up, and in fact
goes to 0 as the number of flips goes to infinity. Nevertheless, more consistent
with our intuition, the sense that the number of heads is approximately one half is
correct. Still, in terms of the expression
$\lim_{n\to\infty} \frac{\binom{2n}{n}}{2^{2n}} = 0$
It is not so easy to verify this directly, but consideration of some numerical examples
is suggestive if not actually persuasive. Quantification of the notion that the number
of heads is approximately one half is filled in a little later by the Law of Large
Numbers.
The probability of exactly n heads in 2n flips of a fair coin, $\binom{2n}{n}/2^{2n}$, for various values of 2n:

2n = 2       0.5
2n = 4       0.375
2n = 6       0.3125
2n = 8       0.2734
2n = 10      0.2461
2n = 12      0.2256
2n = 14      0.2095
2n = 20      0.176197052002
2n = 30      0.144464448094
2n = 40      0.12537068762
2n = 50      0.112275172659
2n = 60      0.102578173009
2n = 70      0.0950254735405
2n = 80      0.0889278787739
2n = 90      0.0838711229887
2n = 100     0.0795892373872
2n = 200     0.0563484790093
2n = 300     0.046027514419
2n = 400     0.0398693019638
2n = 500     0.0356646455533
2n = 600     0.032559931335
2n = 700     0.0301464332521
2n = 800     0.0282006650947
2n = 900     0.0265887652343
2n = 1000    0.0252250181784
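The table is easy to regenerate; the following Python sketch (an added illustration, not part of the original text) computes $\binom{2n}{n}/2^{2n}$ for a few of the listed values of 2n.

    from math import comb

    # probability of exactly n heads in 2n flips of a fair coin
    for two_n in [2, 4, 6, 8, 10, 20, 100, 1000]:
        n = two_n // 2
        print(two_n, comb(two_n, n) / 2**two_n)
    # e.g. 10 -> 0.24609375, 100 -> 0.0795892..., 1000 -> 0.0252250...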
Remark: We're not really answering the question 'what is probability?', but
instead we're telling how to compute it.
One attempt to be more quantitative taken in the past, but which has several
flaws, is the limiting frequency definition of probability, described as follows in
the simplest example. Let N(n) be the number of times that a head came up in n
trials. Then as n grows larger and larger we might imagine that the ratio N(n)/n
should get 'closer and closer' to the 'probability' of heads (1/2 for a fair coin). Or,
in the language of limits, it should be that
$\lim_{n\to\infty} \frac{N(n)}{n} = \frac{1}{2}$
flips and approximate the probability, how many do we need to do? Third, how
do we know that every infinite sequence of trials will give the same limiting value?
There are many further objections to this as a fundamental definition, but we should
be aware of interpretations in this direction. A more supportable viewpoint would
make such limiting frequency assertions a consequence of other things, called the
Law of Large Numbers. We'll prove a special case of this a bit later.
Example: The next traditional example involves picking colored balls out of an
urn. Suppose, for example, that there are N balls in the urn, r red ones and
b = N - r blue ones, and that they are indistinguishable by texture, weight, size,
or in any way. Then in choosing a single ball from the urn we are 'equally likely'
to choose anyone of the N. As in the simpler case of coin flips, there are N
possibilities each of which is equally likely, and the probabilities must add up to 1,
so the probability of drawing any particular ball must be 1/N. Further, it may seem
reasonable to postulate that the probability of picking out one ball from among a
fixed subset of k would be k times the probability of picking a single ball. Granting
this, with r red balls and b blue ones, we would plausibly say that the probability
is r/N that a red ball will be chosen and b/N that a blue ball will be chosen. (We
should keep in mind that some subatomic particles do not behave in this seemingly
reasonable manner!) So without assigning meaning to probability, in some cases we
can still reach some conclusions about how to compute it.
We suppose that one draw (with replacement) has no effect on the next one,
so that they are independent. Let r(n) be the number of red balls drawn in a
sequence of n trials. Then, in parallel with the discussion just above, we would
presume that for any infinite sequence of trials
$\lim_{n\to\infty} \frac{\text{number of red balls drawn in } n \text{ draws}}{n} = \frac{r}{N}$
But, as noted above, this should not be the definition, but rather should be a
deducible consequence of whatever definition we make.
Running this in the opposite direction: if there are N balls in an urn, some
red and some blue, if r(n) denotes the number of red balls chosen in n trials, and if
$\lim_{n\to\infty} \frac{r(n)}{n} = f$
then we would suspect that
$p_1 + p_2 + \cdots + p_n = 1$
$P(H) = \frac{1}{2} \qquad\qquad P(T) = \frac{1}{2}$
This is the model of a fair coin.
A more general idea of event (sometimes called compound event) is any
subset A of the sample space, that is, of the set Ω = {x_1, ..., x_n} of all possible
events. In that context, the events {x_i} are sometimes called atomic events. The
probability of A is
$P(A) = \sum_{x_i \in A} P(x_i)$
where (to repeat) the sum is over the 'points' x_i that lie in A. The function P(·)
extended in this fashion is really what a probability measure is. The event A
occurs if any one of the x_i ∈ A occurs. Thus, for A = {x_{i_1}, ..., x_{i_k}},
$P(A) = P(x_{i_1}) + \cdots + P(x_{i_k})$
As extreme cases,
$P(\Omega) = 1$
and
$P(\emptyset) = 0$
Generally, for an event A, the event not-A is the set-theoretic complement $A^c = \Omega - A$ of A inside Ω. Then
$P(A^c) = 1 - P(A)$
$P(r_i) = \frac{1}{10} \qquad\qquad P(b_j) = \frac{1}{10}$
for all i and j. This is the model of 10 balls in an urn. Then the subsets
$P(B) = P(b_1) + P(b_2) + \cdots + P(b_7) = \frac{7}{10}$
exists and is unique. Then this limiting frequency p_i should be the probability
of the event ω_i.
Example: Consider the experiment of drawing a ball from an urn in which there
are 3 red balls, 3 blue balls, and 4 white balls (otherwise indistinguishable). As
above, we would postulate that the probability of drawing any particular individual
ball is 1/10. (These atomic events are indeed mutually exclusive, because we only
draw one ball at a time.) Thus, the 'smallest' events x_1, x_2, ..., x_10 are the possible
drawings of each one of the 10 balls. Since they have equal chances of being drawn,
the probabilities p_i = P(x_i) are all the same (and add up to 1):
$p_1 = p_2 = p_3 = \cdots = p_{10}$
Then the ('compound') event A of 'drawing a red ball' is the subset with three
elements consisting of 'draw red ball one', 'draw red ball two', and 'draw red ball
three'. Thus,
$P(A) = \frac{1}{10} + \frac{1}{10} + \frac{1}{10} = \frac{3}{10}$
Let B be the event 'draw a white ball'. Then, since A and B are disjoint events,
the probability of drawing either a red ball or a white ball is the sum:
$P(A \cup B) = P(A) + P(B) = \frac{3}{10} + \frac{4}{10} = \frac{7}{10}$
Proof: When N = 1, the probability that A occurs is p, and the binomial coefficient $\binom{1}{1}$ is 1. The probability that A does not occur is 1 − p, and $\binom{1}{0} = 1$ also.
The main part of the argument is an induction on N. Since the different trials are
independent, by assumption, we have
P(A occurs in k of N)
= P(A occurs in k of the first N − 1) · P(A does not occur in the Nth)
+ P(A occurs in k − 1 of the first N − 1) · P(A occurs in the Nth)
We can see already that the powers of p and of 1 − p will match, so it's just a matter
of proving that
$\binom{N-1}{k} + \binom{N-1}{k-1} = \binom{N}{k}$
which we already verified in proving the Binomial Theorem earlier. This completes
the induction and the proof. ///
Let Ω be a probability space, and let A be a ('compound') event with P(A) > 0.
Let B be another ('compound') event. Then the conditional probability of B given A is
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
In effect, the phrase 'given that A occurs' means that we replace the 'universe' Ω
of possible outcomes by the smaller 'universe' A of possibilities, and 'renormalize'
all the probabilities accordingly.
The formula P(B|A) = P(A ∩ B)/P(A) allows us to compute the conditional
probability in terms of the other two probabilities. In 'real-life' situations, it may
be that we know P(B|A) directly, for some other reasons. If we also know P(A),
then this gives us the formula for P(A ∩ B), namely
$P(A \cap B) = P(B|A) \cdot P(A)$
Example: What is the probability that 7 heads appear in 10 flips of a fair coin
given that at least 4 heads appear? This is a direct computation of conditional
probability:
$P(7 \text{ heads} \mid \text{at least } 4 \text{ heads}) = \frac{P(7 \text{ heads and at least } 4 \text{ heads})}{P(\text{at least } 4 \text{ heads})} = \frac{P(7 \text{ heads})}{P(\text{at least } 4 \text{ heads})}$
$= \frac{\binom{10}{7}\cdot\frac{1}{2^{10}}}{\left[\binom{10}{4}+\binom{10}{5}+\binom{10}{6}+\binom{10}{7}+\binom{10}{8}+\binom{10}{9}+\binom{10}{10}\right]\cdot\frac{1}{2^{10}}} = \frac{\binom{10}{7}}{\binom{10}{4}+\binom{10}{5}+\binom{10}{6}+\binom{10}{7}+\binom{10}{8}+\binom{10}{9}+\binom{10}{10}}$
since the requirement of 7 heads and at least 4 is simply the requirement of 7 heads.
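A quick numerical check of this conditional probability (an added sketch, not part of the original text):

    from math import comb

    p_seven = comb(10, 7) / 2**10
    p_at_least_four = sum(comb(10, k) for k in range(4, 11)) / 2**10
    # P(7 heads | at least 4 heads) = P(7 heads) / P(at least 4 heads)
    print(p_seven / p_at_least_four)
    # equivalently, the powers of 2 cancel:
    print(comb(10, 7) / sum(comb(10, k) for k in range(4, 11)))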
Two subsets A, B of a probability space Ω are independent if
P(B) = P(B|A)
and equivalently
P(A) = P(A|B)
Example: Let Ω = {10, 11, ..., 99} be the collection of all integers from 10 to 99,
inclusive. Let A be the subset of Ω consisting of integers x ∈ Ω whose ones'-place
digit is 3, and let B be the subset of integers x ∈ Ω whose tens'-place digit is 6.
Then it turns out that
$P(A \cap B) = P(A) \cdot P(B)$
so, by definition, these two ('compound') events are independent. Usually we expect
an explanation for an independence result, rather than just numerical verification
that the probabilities behave as indicated. In the present case, the point is that
there is no 'causal relation' between the ones'-place and tens'-place digits in this
example.
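This independence claim can also be verified by brute force; the following sketch (an added illustration, not from the original text) counts the relevant subsets of Ω = {10, ..., 99} exactly, using rational arithmetic to avoid rounding.

    from fractions import Fraction

    omega = range(10, 100)                     # 90 equally likely outcomes
    A = {x for x in omega if x % 10 == 3}      # ones'-place digit is 3
    B = {x for x in omega if x // 10 == 6}     # tens'-place digit is 6
    n = 90
    p_A = Fraction(len(A), n)                  # 9/90  = 1/10
    p_B = Fraction(len(B), n)                  # 10/90 = 1/9
    p_AB = Fraction(len(A & B), n)             # only 63 lies in both: 1/90
    print(p_AB == p_A * p_B)                   # True, so A and B are independent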
To model repeated events in this style, we need to use the set-theoretic idea of
cartesian product: again, the cartesian product of n sets X_1, ..., X_n is simply
the collection of all ordered n-tuples (x_1, ..., x_n) (the parentheses and commas are
mandatory), where x_i ∈ X_i. The notation is $X_1 \times X_2 \times \cdots \times X_n$.
The probability measure on Ω^n is specified by
$P\big((\omega_{i_1}, \omega_{i_2}, \ldots, \omega_{i_n})\big) = P(\omega_{i_1}) \cdot P(\omega_{i_2}) \cdots P(\omega_{i_n})$
for any n-tuple (ω_{i_1}, ω_{i_2}, ..., ω_{i_n}). It's not hard to check that with this probability
measure Ω^n is a probability space. Further, even for 'compound' events A_1, ..., A_n
in Ω, it's straightforward to show that
$P(A_1 \times \cdots \times A_n) = P(A_1) \cdot P(A_2) \cdots P(A_n)$
where A_1 × ⋯ × A_n is the cartesian product of the A_i's and naturally sits inside
the cartesian product Ω × ⋯ × Ω = Ω^n.
The idea is to imagine that (ω_{i_1}, ω_{i_2}, ..., ω_{i_n}) is the event that ω_{i_1} occurs on the
first trial, ω_{i_2} on the second, and so on until ω_{i_n} occurs on the nth. Implicit in this
model is the idea that later events are independent of earlier ones. Otherwise that
manner of assigning a probability measure on the cartesian power is not appropriate!
Example: Let Ω = {H, T} with P(H) = 1/2 and P(T) = 1/2, the fair-coin-
flipping model. To model flipping a fair coin 10 times, one approach is to look at Ω^10,
which is the set of all 10-tuples of values which are either heads or tails. Each such
10-tuple is assigned the same probability, 1/2^10. Now consider the ('compound')
event
A = exactly 7 heads in 10 flips
This subset of Ω^10 consists of all (ordered!) 10-tuples with exactly 7 heads values
among them, and (by definition) the probability of A is the number of such mul-
tiplied by 1/2^10, since each such 'atomic' event has probability 1/2^10. Thus, to
compute P(A) we need only count the number of elements of A. It is the number
of ways to choose 7 from among 10 things, which is the binomial coefficient $\binom{10}{7}$.
Thus,
$P(7 \text{ heads in } 10 \text{ flips}) = P(A) = \binom{10}{7} \cdot \frac{1}{2^{10}}$
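The product-space model can also be explored by simulation. The sketch below (an added illustration, with an arbitrarily chosen number of trials) estimates this probability empirically and compares it with the exact value.

    import random
    from math import comb

    random.seed(0)
    trials = 200_000
    hits = sum(1 for _ in range(trials)
               if sum(random.randint(0, 1) for _ in range(10)) == 7)
    print(hits / trials)            # empirical estimate of P(7 heads in 10 flips)
    print(comb(10, 7) / 2**10)      # exact value, 120/1024, about 0.117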
Example: Let Ω = {r_1, b_1, b_2} with P(r_1) = P(b_1) = P(b_2) = 1/3, modeling a
red ball and two blue balls in an urn. To model drawing with replacement 5 times,
one approach is to look at Ω^5, which is the set of all 5-tuples of values either r_1, b_1,
or b_2. Each such 5-tuple is assigned the same probability, 1/3^5. Now consider the
('compound') event
Note that the numerical value does not depend on the exact locations of the r_1's and
the b's within the 5-tuple, but only on the number of them. Thus, the number of ways
to choose the 3 locations of the distinguished value from among the 5 places is the
binomial coefficient $\binom{5}{3}$. Thus,
$P(A) = \sum_{x \in A} P(x)$
and instead only assign probabilities to a restricted class of subsets. For example,
we might assign
P([a, b]) = b - a
for any subinterval [a, b] of [0, 1], and then define
for disjoint collections of intervals [a_i, b_i]. This is a start, but we need more. In
fact, for a collection of mutually disjoint intervals
$[a_1, b_1],\ [a_2, b_2],\ [a_3, b_3],\ \ldots$
indexed by positive integers, we can compute the probability of the union by the
obvious formula
$P\Big(\bigcup_i\, [a_i, b_i]\Big) = \sum_i\, (b_i - a_i)$
$\{\omega \in \Omega : \omega \notin A\}$
P(not A) = 1 - P(A)
Further, we can repeat these two types of operations, taking countable unions of
disjoint sets, and taking complements, making ever more complicated sets whose
probability measure is definable in this example. (The totality of sets created has
a name: it is the collection of Borel sets in [0, 1].) To know that these processes
really define a probability measure requires proof!
Example: A more important example for our immediate applications is
where all the symbols s_i lie in some fixed set Ω_0. Analogous to the previous example,
we restrict our attention initially to cylinders (also called cylindrical sets), which
means sets of the form
$S(s_1, s_2, \ldots, s_n)$
We can repeat these processes indefinitely, making ever more complicated subsets
to which we can assign a probability measure.
Remark: Yes, due to tradition at least, instead of the 'f' otherwise often used
for functions, an 'X' is used, perhaps to be more consonant with the usual use of x
for a (non-random?) 'variable'. Further, there is a tradition that makes the values
of X be labeled 'x_i' (in conflict with the calculus tradition).
For a possible value x of X, we extend the notation by writing
$P(X = x) = P\big(\{\omega \in \Omega : X(\omega) = x\}\big)$
That is, the probability that X = x is defined to be the probability of the subset
of Ω on which X has the value x.
The expected value of such a random variable on a probability space Ω =
{ω_1, ..., ω_n} is defined to be
$E(X) = P(\omega_1)\,X(\omega_1) + \cdots + P(\omega_n)\,X(\omega_n) = \sum_i P(\omega_i)\,X(\omega_i)$
will be 'close to' E(X). But in fact we can prove such a thing, rather than just
imagine that it's true: again, it is a Law of Large Numbers.
The simplest models for the intuitive content of this idea have their origins
in gambling. For example, suppose Alice and Bob ('A' and 'B') have a fair coin
(meaning heads and tails both have probability 0.5) and the wager is that if the
coin shows heads Alice pays Bob a dollar, and if it shows tails Bob pays Alice a
dollar. Our intuition tells us that this is fair, and the expected value computation
corroborates this, as follows. The sample space is Ω = {ω_0, ω_1} (index '0' for heads
and '1' for tails), with each point having probability 0.5. Let X be the random
variable which measures Alice's gain (or loss):
$X(\omega_0) = -1 \qquad\qquad X(\omega_1) = +1$
Then $E(X) = 0.5 \cdot (-1) + 0.5 \cdot (+1) = 0$.
In general, a fair wager is one such that everyone's expected gain is O. (What's
the point of it then? Probably that perceptions of probabilities can differ, and that
some people like pure gambling.)
It is important to notice that an expected value is more sophisticated than the
most naive idea of 'average'. For example, suppose we choose an integer at random
in the range 1-10 and square it. With equal probabilities assigned, the expected
value of the square is
$\frac{1}{10}\left(1^2 + 2^2 + \cdots + 10^2\right) = \frac{385}{10} = 38.5$
It is not true that we can take the average of 1-10 first (namely, 5.5) and square it
(getting 30.25) to obtain the expected value.
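A direct computation (an added illustration, not part of the original text) makes the distinction concrete for this example:

    xs = range(1, 11)                      # the integers 1 through 10, equally likely
    E_X = sum(xs) / 10                     # 5.5
    E_X2 = sum(x * x for x in xs) / 10     # 38.5, the expected value of the square
    print(E_X2, E_X ** 2)                  # 38.5 versus 30.25: not the same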
Proposition: Let X and Y be two random variables on a sample space Ω =
{ω_1, ..., ω_n}, with probabilities P(ω_i) = p_i. The sum random variable X + Y is
defined in the natural way as
$(X + Y)(\omega) = X(\omega) + Y(\omega)$
Then
$E(X + Y) = E(X) + E(Y)$
///
Proposition: Let X be a random variable on a sample space Ω = {ω_1, ..., ω_n},
with probabilities P(ω_i) = p_i. Let c be a constant. The random variable cX is
defined in the natural way as
$(cX)(\omega) = c \cdot X(\omega)$
Then
$E(cX) = c \cdot E(X)$
///
Let Ω be a sample space. Let X and Y be random variables on Ω. The product
random variable XY is defined on the sample space Ω in the reasonable way:
$(XY)(\omega) = X(\omega) \cdot Y(\omega)$
These two random variables X and Y are independent random variables if for
every pair x, y of possible values of X, Y, we have
$P(X = x \text{ and } Y = y) = P(X = x) \cdot P(Y = y)$
Proposition: For independent random variables X and Y, E(XY) = E(X) · E(Y).
Proof: The expected value of the product is, by definition,
$E(XY) = \sum_{\omega \in \Omega} P(\omega)\,(XY)(\omega) = \sum_{\omega \in \Omega} P(\omega)\,X(\omega)\,Y(\omega)$
To prove the proposition gracefully it is wise to use the notation introduced above:
let x range over possible values of X and let y range over possible values of Y.
Then we can rewrite the expected value by grouping according to values of X and
Y: it is
$\sum_{x,y}\ \sum_{\omega\,:\,X(\omega)=x,\ Y(\omega)=y} P(\omega)\,X(\omega)\,Y(\omega) = \sum_{x,y}\ x\,y \sum_{\omega\,:\,X(\omega)=x,\ Y(\omega)=y} P(\omega) = \sum_{x,y} P(X = x,\ Y = y)\;x\,y$
and by independence this is
$\sum_{x,y} P(X = x)\,P(Y = y)\;x\,y = \Big(\sum_x P(X = x)\,x\Big)\Big(\sum_y P(Y = y)\,y\Big) = E(X)\,E(Y)$
///
Remark: If X and Y are not independent the conclusion of the previous proposi-
tion may be false. For example, let X and Y both be the number of heads obtained
in a single flip of a fair coin. Then XY = X = Y, and we compute that
$E(XY) = E(X) = \tfrac{1}{2} \ne \tfrac{1}{4} = \tfrac{1}{2}\cdot\tfrac{1}{2} = E(X)\,E(Y)$
$\Omega^N = \underbrace{\Omega \times \cdots \times \Omega}_{N}$
The (non-negative) square root σ of the variance σ² is the standard deviation
of X.
Finally, we compute the expected value and variance for the binomial distri-
bution. That is, fix a positive integer n, fix a real number p in the range 0 ≤ p ≤ 1,
and let Ω be the probability space consisting of all ordered n-tuples of 0's and 1's,
with
$P(\text{a particular sequence with } i \text{ 1's and } n - i \text{ 0's}) = p^i\,(1-p)^{n-i}$
Let X be the random variable on Ω defined by
X(an n-tuple) = the number of 1's in that n-tuple
Proposition: E(X) = p·n and σ²(X) = p(1 − p)·n.
Remark: The expected value assertion is certainly intuitively plausible, and there
are also easier arguments than what we give below, but it seems reasonable to warm
up to the variance computation by a similar but easier computation of the expected
value.
Proof: This computation will illustrate the use of generating functions to evaluate
naturally occurring but complicated looking expressions. Let q = 1 - p.
First, let's get an expression for the expected value of X: from the definition,
$E(X) = \sum_{i=0}^{n} i\,\binom{n}{i}\,p^i\,q^{n-i}$
An astute person who remembered the binomial theorem might remember that it
asserts exactly that the analogous summation without the factor i in front of each
term is simply the expanded form of (p + q)^n:
$\sum_{i=0}^{n} \binom{n}{i}\,p^i\,q^{n-i} = (p + q)^n$
This is encouraging! The other key point is to notice that if we differentiate the
latter expression with respect to p, without continuing to require q = 1 − p, we get
$\sum_{i=0}^{n} i\,\binom{n}{i}\,p^{i-1}\,q^{n-i} = n\,(p + q)^{n-1}$
The left-hand side is nearly the desired expression, but we're missing a power of p
throughout. To remedy this, multiply both sides of the equality by p, to obtain
$\sum_{i=0}^{n} i\,\binom{n}{i}\,p^{i}\,q^{n-i} = p\,n\,(p + q)^{n-1}$
Once again requiring that p + q = 1, this simplifies to give the expected value E(X) = pn.
To compute the variance, we first do some formal computations: let μ be E(X).
Then
$\sigma^2(X) = E\big((X - \mu)^2\big) = E\big(X^2 - 2\mu X + \mu^2\big) = E(X^2) - 2\mu\,E(X) + \mu^2 = E(X^2) - 2\mu\cdot\mu + \mu^2 = E(X^2) - \mu^2$
So to compute the variance of X the thing we need to compute is E(X²):
$E(X^2) = \sum_{k=0}^{n} P(X = k)\cdot k^2$
As usual, there are $\binom{n}{k}$ ways to have exactly k 1's, and each way occurs with
probability $p^k q^{n-k}$. Thus,
$E(X^2) = \sum_{i=0}^{n} i^2\,\binom{n}{i}\,p^i\,q^{n-i}$
This is very similar to the expression that occurred above in computing the
expected value, but now we have the extra factor i² in front of each term instead
of i. But of course we might repeat the trick we used above and see what happens:
since
$p\,\frac{\partial}{\partial p}\,p^i = i\,p^i$
then by repeating it we have
$\sum_{i=0}^{n} \binom{n}{i}\, i^2\, p^i q^{n-i} = \sum_{i=0}^{n} \binom{n}{i}\; p\frac{\partial}{\partial p}\, p\frac{\partial}{\partial p}\; p^i q^{n-i} = p\frac{\partial}{\partial p}\, p\frac{\partial}{\partial p} \sum_{i=0}^{n} \binom{n}{i}\, p^i q^{n-i} = p\frac{\partial}{\partial p}\, p\frac{\partial}{\partial p}\, (p + q)^n$
since after getting the i² out from inside the sum we can recognize the binomial
expansion. Taking derivatives gives
$p\frac{\partial}{\partial p}\, p\frac{\partial}{\partial p}\,(p+q)^n = p\frac{\partial}{\partial p}\Big(p\,n\,(p+q)^{n-1}\Big) = p\Big(n\,(p+q)^{n-1} + p\,n(n-1)\,(p+q)^{n-2}\Big)$
Using p + q = 1 gives
$E(X^2) = p\big(n + p\cdot n(n-1)\big)$
So then
$\sigma^2(X) = E(X^2) - \mu^2 = p\big(n + p\,n(n-1)\big) - (pn)^2 = pn - p^2 n = p(1 - p)\,n$
This finishes the computation of the variance of a binomial distribution. ///
Remark: The critical or skeptical reader might notice that there's something
suspicious about differentiating with respect to p in the above arguments, as if p and
q were independent variables, when in fact p + q = 1. Indeed, if a person had decided
that p was a constant, then they might feel inhibited about differentiating with
respect to it at all. But, in fact, there is no imperative to invoke the relationship
p + q = 1 until after the differentiation, so the computation is legitimate.
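The formulas E(X) = pn and σ²(X) = p(1 − p)n can also be checked numerically against the definitions; the following sketch (an added illustration, with arbitrarily chosen n and p) does so.

    from math import comb

    n, p = 12, 0.3
    q = 1 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum(k * k * pk for k, pk in enumerate(probs)) - mean**2
    print(mean, p * n)        # both 3.6 (up to floating-point rounding)
    print(var, p * q * n)     # both 2.52 (up to floating-point rounding)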
Let χ(X) be the random variable which is 1 when f(X) ≥ a and 0 otherwise. Then
$f(X) \ge a \cdot \chi(X)$
Note that the expected value of the random variable χ(X) is simply the probability
that f(X) ≥ a:
$E\,\chi(X) = \sum_x P(\chi(X) = x)\cdot x = P(\chi(X) = 0)\cdot 0 + P(\chi(X) = 1)\cdot 1 = P(f(X) \ge a)$
by the definition of χ. Taking the expected value of both sides of this and using
f(X) ≥ a · χ(X) gives
$E\,f(X) \ge a \cdot E\,\chi(X) = a \cdot P(f(X) \ge a)$
by the previous observation. ///
Corollary: (Chebysheff's Inequality) Let X be a real-valued random variable. Fix
ε > 0. Then
$P\big(|X - E(X)| \ge \varepsilon\big) \le \frac{\sigma^2(X)}{\varepsilon^2}$
Proof: This follows directly from the Markov inequality, with f(X) = (X − E(X))²
and a = ε². ///
Fix a real number p in the range 0 ≤ p ≤ 1, and put q = 1 − p. These will
be unchanged throughout this section. Let n be a positive integer, which will be
thought of as increasing. Let Ω_n be the probability space consisting of all ordered
n-tuples of 0's and 1's, and let X_n be the number of 1's in an n-tuple, so that
$P(X_n = i) = \binom{n}{i}\,p^i\,q^{n-i}$
(for 0 ≤ i ≤ n, 0 otherwise). We earlier computed the expected value EX_n = pn.
We also computed the variance
$\sigma^2(X_n) = p\,(1 - p)\,n$
Proof: We will obtain this by making a good choice of the parameter in Cheby-
sheff's inequality. We know from computations above that E(X_n) = p · n and
σ²(X_n) = p(1 − p)n. Chebysheff's inequality asserts in general that
$P\big(|X - E(X)| > t\,\sigma(X)\big) < \frac{1}{t^2}$
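To illustrate the kind of estimate this gives (an added sketch, not from the original text): for n flips of a fair coin, E(X) = n/2 and σ(X) = √n/2, so the chance of deviating from n/2 by more than t·√n/2 is less than 1/t².

    from math import sqrt, comb

    n = 100
    sigma = sqrt(n) / 2                        # 5 for n = 100
    t = 10 / sigma                             # a deviation of 10 heads is t = 2
    print("Chebysheff bound:", 1 / t**2)       # 0.25
    p_inside = sum(comb(n, k) for k in range(40, 61)) / 2**n
    print("exact P(|heads - 50| > 10):", 1 - p_inside)   # roughly 0.035, well under the bound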
Exercises
1.01 How many elements are in the set {1, 2, 2, 3, 3, 4, 5}? How many are in the
set {1, 2, {2}, 3, {3}, 4, 5}? In {1, 2, {2, 3}, 3, 4, 5}? (ans.)
1.02 Let A = {1, 2, 3, 4, 5} and B = {3, 4, 5, 6, 7}. List (without repetition) the
elements of the sets A ∪ B, A ∩ B, and of {x ∈ A : x ∉ B}. (ans.)
1.03 List all the elements of the power set (set of subsets) of {1, 2, 3}. (ans.)
1.04 Let A = {1, 2, 3} and B = {2, 3}. List (without repetition) all the elements
of the cartesian product set A × B. (ans.)
1.05 How many different ways are there to order the set {1, 2, 3, 4}? (ans.)
1.06 How many choices of 3 things from the list 1, 2, 3, ..., 9, 10 are there? (ans.)
1.07 How many subsets of {1, 2, 3, 4, 5, 6, 7} are there with exactly 4 elements?
(ans.)
1.08 How many different choices are there of an unordered pair of distinct numbers
from the set {1, 2, ..., 9, 10}? How many choices of ordered pair are there?
(ans.)
1.09 How many functions are there from the set {1, 2, 3} to the set {2, 3, 4, 5}?
(ans.)
1.10 How many injective functions are there from {1, 2, 3} to {1, 2, 3, 4}? (ans.)
1.11 How many injective functions are there from {1, 2, 3} to {1, 2, 3, 4, 5}?
1.12 How many surjective functions are there from {1, 2, 3, 4} to {1, 2, 3}? (ans.)
1.13 How many surjective functions are there from {1, 2, 3, 4, 5} to {1, 2, 3, 4}?
1.14 How many surjective functions are there from {1, 2, 3, 4, 5} to {1, 2, 3}?
1.15 Prove a formula for the number of injective functions from an m-element set
to an n-element set.
1.16 (*) Let S(m, n) be the number of surjective functions from an m-element
set to an n-element set (with m ≥ n). Prove the recursive formula
$S(m, n) = n^m - \sum_{i=1}^{n-1} \binom{n}{i}\,S(m, i)$
1.20 Verify that the sum of all binomial coefficients $\binom{n}{k}$ with 0 ≤ k ≤ n is 2^n.
(ans.)
1.21 Verify that the sum of expressions $(-1)^k \binom{n}{k}$ with 0 ≤ k ≤ n is 0.
1.22 How many subsets of all sizes are there of a set S with n elements? (ans.)
1.23 How many pairs are there of disjoint subsets A, B each with 3 elements
inside the set {1, 2, 3, 4, 5, 6, 7, 8}? (ans.)
1.24 Give a bijection from the collection 2Z of even integers to the collection Z
of all integers. (ans.)
1.25 Give a bijection from the collection of all integers to the collection of non-
negative integers. (ans.)
1.26 (*) Give a bijection from the collection of all positive integers to the collection
of all rational numbers.
1.27 (**) This illustrates a hazard in a too naive notion of a rule for forming a
set. Let S be the set of all sets which are not an element of themselves.
That is, let
S = { sets x : x ∉ x }
Is S ∈ S or is S ∉ S? (Hint: Assuming either that S is or isn't an element
of itself leads to a contradiction. What's going on?)
1.28 What is the probability of exactly 3 heads out of 10 flips of a fair coin?
(ans.)
1.29 What is the probability that there will be strictly more heads than tails out
of 10 flips of a fair coin? Out of 20 flips? (ans.)
1.30 If there are 3 red balls and 7 blue balls in an urn, what is the probability
that in two trials two red balls will be drawn? (ans.)
1.31 If there are 3 red balls and 7 blue balls in an urn, what is the probability
that in 10 trials at least 4 red balls will be drawn?
1.32 Prove that
$1 + 2 + 3 + 4 + \cdots + (n-1) + n = \frac{1}{2}\,n\,(n+1)$
1.33 A die is a small cube with numbers 1-6 on its six sides. A roll of two dice
has an outcome which is the sum of the upward-facing sides of the two, so
is an integer in the range 2-12. A die is fair if any one of its six sides is as
likely to come up as any other. What is the probability that a roll of two
fair dice will give either a '7' or an '8'? What is the probability of a '2'?
1.34 What is the probability that there will be fewer than (or exactly) N heads
out of 3N flips of a fair coin?
1.35 (*) You know that in a certain house there are two children, but you do
not know their genders. You know that each child has a 50-50 chance of
being either gender. When you go to the door and knock, a girl answers the
door. What is the probability of the other child being a boy? (False hint:
out of the 4 possibilities girl-girl, girl-boy, boy-girl, boy-boy, only the first
3 occur since you know there is at least one girl in the house. Of those 3
possibilities, in 2/3 of the cases in addition to a girl there is a boy. So (?) if
a girl answers the door then the probability is 2/3 that the other child is a
boy.) (Comment: In the submicroscopic world of elementary particles, the
behavior of the family of particles known as bosons is contrary to the correct
macroscopic principle illustrated by this exercise, while fermions behave in
the manner indicated by this exercise.)
1.36 The Birthday Paradox: Show that the probability is greater than 1/2 that,
out of a given group of 24 people, at least two will have the same birthday.
1.37 (*) The Monty Hall paradox You are in a game show in which contestants
choose one of three doors, knowing that behind one of the three is a good
prize, and behind the others nothing of any consequence. After you've chosen
one door, the gameshow host (Monty Hall) always shows you that behind
one of the other doors there is nothing and offers you the chance to change
your selection. Should you change? (What is the probability that the prize
is behind the door you did not initially choose? What is the probability that
the prize is behind the other closed door?)
1.38 (**) Suppose that two real numbers are chosen 'at random' between 0 and
1. What is the probability that their sum is greater than 1? What is the
probability that their product is greater than 1/2?
1.39 If there are 3 red balls in an urn and 7 black balls, what is the expected
number of red balls to be drawn in 20 trials (replacing whatever ball is
drawn in each trial)? (ans.)
1.40 What is the expected number of consecutive heads as a result of tossing a
fair coin? (ans.)
1.41 What is the expected number of coin flips before a head comes up (with a
fair coin)?
1.42 What is the expected number of coin flips before two consecutive heads come
up?
1.43 What is the expected distance between two 'e's in a random character stream
where "e's occur 11% of the time?
1.44 What is the expected distance between two 'ee's in a random character
stream where "e's occur 11% of the time?
1.45 Let X be the random variable defined as 'the number of heads in 10 flips
of a fair coin.' The sample space is all 2^10 different possible sequences of
outcomes of 10 flips. The expected value of X itself is 5. What is the
expected value of the random variable (X - 5)2?
1.46 (*) What is the expected number of coin flips before n consecutive heads
come up?
1.47 (*) Choose two real numbers 'at random' from the interval [0,1]. What is
the expected value of their sum? product?
1.48 Compute the variance of the random variable which tells the result of the
roll of one fair die.
1.49 Compute the variance of the random variable which tells the sum of the
result of the roll of two fair dice.
1.50 Compute the variance of the random variable which tells the sum of the
result of the roll of three fair dice.
1.51 (*) Compute the variance of the random variable which tells the sum of the
result of the roll of n fair dice.
1.52 (*) Consider a coin which has probability p of heads. Let X be the random
variable which tells how long before 2 heads in a row come up. What is the
variance of X?
1.53 Gracefully estimate the probability that in 100 flips of a fair coin the number
of heads will be at least 40 and no more than 60. (ans.)
1.54 Gracefully estimate the probability that in 1000 flips of a fair coin the num-
ber of heads will be at least 400 and no more than 600. (ans.)
1.55 Gracefully estimate the probability that in 10,000 flips of a fair coin the
number of heads will be at least 4000 and no more than 6000. (ans.)
1.56 With a coin that has probability only 1/10 of coming up heads, show that
the probability is less than 1/9 that in 100 flips the number of heads will be
more than 20. (ans.)
1.57 With a coin that has probability only 1/10 of coming up heads, show that
the probability is less than 1/900 that in 10,000 flips the number of heads
will be more than 2000.
2
Information
The words uncertainty, information, and redundancy all have some in-
tuitive content. The term entropy from thermodynamics may suggest a related
notion, namely a degree of disorder. We can make this more precise, and in our
context we will decide that the three things, uncertainty, information, and entropy,
all refer to roughly the same thing, while redundancy refers to lack of uncertainty.
Noiseless coding addresses the issue of organizing information well for trans-
mission, by adroitly removing redundancy. It does not address issues about noise or
any other sort of errors. The most visible example of noiseless coding is compres-
sion of data, although abbreviations, shorthand, and symbols are equally important
examples.
The other fundamental problem is noisy coding, more often called error-
correcting coding, meaning to adroitly add redundancy to make information
robust against noise and other errors.
The first big result in noiseless coding is that the entropy of a memoryless
source gives a lower bound on the length of a code which encodes the source.
And the average word length of such a code is bounded in terms of the entropy.
This should be interpreted as a not-too-surprising assertion that the entropy of a
source correctly embodies the notion of how much information the source emits.
The outcome of the roll of a single fair die (with faces 1-6) is more uncertain
than the toss of a coin: there are more things that can happen, each of which has
rather small probability.
On the other hand, we can talk in a similar manner about acquisition of infor-
mation. For example, in a message consisting of ordinary English, the completion
of the fragment
Because the weather forecast called for rain, she took her ...
to
Because the weather forecast called for rain, she took her umbrella.
imparts very little further information. While it's true that the sentence might
have ended boots instead, we have a clear picture of where the sentence is going.
By contrast, completion of the fragment
to
The weather forecast called for rain.
imparts a relatively large amount of information, since the first part of the sentence
gives no clues to its ending. Even more uncertainty remains in trying to complete
a sentence like
Then he surprised everyone by...
and commensurately more information is acquired when we know the completion.
In a related direction: the reason we are able to skim newspapers and other
'lightweight' text so quickly is that most of the words are not at all vital to the con-
tent, so if we ignore many of them the message still comes through: the information
content is low, and information is repeated. By contrast, technical writing is harder
to read, because it is more concise, thereby not allowing us to skip over things. It is
usually not as repetitive as more ordinary text. What 'concise' means here is that
it lacks redundancy (meaning that it does not repeat itself). Equivalently, there
is a high information rate.
Looking at the somewhat lower-level structure of language: most isolated ty-
pographical errors in ordinary text are not hard to correct. This is because of the
redundancy of natural languages such as English. For example,
is easy to correct to
The sun was shining brightly.
In fact, in this particular example, the modifier 'brightly' is hardly necessary at all:
the content would be almost identical if the word were omitted entirely. By contrast,
typographical errors are somewhat harder to detect and correct in technical writing
than in ordinary prose, because there is less redundancy, especially without a larger
context.
Note that correction of typos is a lower-level task than replacing missing words,
since it relies more upon recognition of what might or might not be an English word
rather than upon understanding the content. Corrections based upon meaning
would be called semantics-based correction, while corrections based upon mis-
spelling or grammatical errors would be syntax-based correction. Syntax-based
correction is clearly easier to automate than semantics-based correction, since the
'rules' for semantics are much more complicated than the 'rules' for spelling and
grammar (which many people find complicated enough already).
Still, not every typo is easy to fix, because sometimes they occur at critical
points in a sentence:
I cano go with you.
In this example, the 'cano' could be either 'can' with a random '0' stuck on its
end, or else either 'cannot' with two omissions or maybe 'can't' with two errors. By
contrast, errors in a different part of the message, as in
are easier to fix. In the first of these two examples, there would be a lot of infor-
mation imparted by fixing the typo, but in the second case very little. In other
words, in the first case there was high uncertainty, but in the second not.
Let's look at several examples of the loss of intelligibility of a one-line sentence
subjected to a 12% rate of random errors. That is, for purposes of this example,
we'll randomly change about 12% of the letters to something else. We'll do this
several times to see the various effects. Starting with
we get
Several things should be observed here. First, the impact on clarity and correctabil-
ity depends greatly on which letters get altered. For example, the word 'number' is
sensitive in this regard. Second, although the average rate of errors is 12%, some-
times more errors than this occur, and sometimes fewer. And the distribution of
errors is not regular. That is, a 12% error rate does not simply mean that every
8th letter is changed, but only expresses an average. Among the above 10 samples,
in at least 2 the meaning seems quite obscure. Third, using more than one of the
mangled sentences makes it very easy to infer the correct original.
With an error rate of 20%, there are frequent serious problems in intelligibility:
perhaps none of the ten samples retains its meaning if presented in isolation. From
the same phrase as above
In these 10 examples few of the words are recognizable. That is, looking for
an English word whose spelling is close to the given, presumably misspelled, word
does not succeed on a majority of the words in these garbled fragments. This is
because so many letters have been changed that there are too many equally plausible
possibilities for correction. Even using semantic information, these sentences are
mostly too garbled to allow recovery of the message.
Notice, though, that when we have, in effect, 9 retransmissions of the original
(each garbled in its own way) it is possible to make inferences about the original
message. For example, the 10 messages can have a majority vote on the correct
letter at each spot in the true message. Ironically, the fact that there are so many
different error possibilities but only one correct possibility makes it easy for the
correct message to win such votes. But a large number of repetitions is an inefficient
method for compensating for noise in a communications channel.
Another version of noise might result in erasures of some characters. Thus,
we might be assured that any letter that 'comes through' is correct, but some are
simply omitted.
One point of this discussion is that while English (or any other natural lan-
guage) has quite a bit of redundancy in it, this redundancy is unevenly distributed.
In other words, the information in English is not uniformly distributed but is con-
centrated at some spots and thin at others.
Another way to illustrate the redundancy is to recall an advertisement from
the New York subways of years ago:
F u cn rd ths, u cn gt a gd jb.
An adroit selection of about 40% of the letters was removed, but this is still intel-
ligible.
2.2 Definition of entropy
and
$P(\omega_i) \ge 0$ (for all indices i)
as usual. The (self-)information of the event ω_i is defined to be
$I(\omega_i) = -\log_2 P(\omega_i)$
Thus, a relatively unlikely event has greater (self-) information than a relatively
likely event.
For example, for flipping a fair coin, the sample space is {H, T}. Since the coin
is fair,
$P(H) = P(T) = \frac{1}{2}$
The self-information of either head or tail is
$I(H) = -\log_2 \tfrac{1}{2} = 1 \qquad\qquad I(T) = -\log_2 \tfrac{1}{2} = 1$
This simplest example motivates the name for the unit of information, the bit.
The entropy of a sample space is the expected value of the self-information of
(atomic) events in Ω. That is, with the notation as just above,
H(p_1, ..., p_n) = entropy of sample space {ω_1, ..., ω_n} with P(ω_i) = p_i
$= \sum_i P(\omega_i)\,I(\omega_i) = -\sum_i p_i\,\log_2 p_i$
We also can define the entropy of a random variable X in a similar way. Let
X be a random variable on a sample space Ω = {ω_1, ..., ω_n}. The entropy of X
can be viewed as a sort of expected value:
$H(X) = -\sum_x P(X = x)\,\log_2 P(X = x)$
where the sum is over the possible values x of X.
That is, only the probabilities matter, not their ordering or labeling.
• H(p_1, ..., p_n) ≥ 0, and is 0 only if one of the p_i's is 1. That is, uncertainty
disappears entirely only if there is no randomness present.
• H(p_1, ..., p_n) = H(p_1, ..., p_n, 0). That is, 'impossible' outcomes do not con-
tribute to uncertainty.
• $H\Big(\underbrace{\tfrac{1}{n}, \ldots, \tfrac{1}{n}}_{n}\Big) \le H\Big(\underbrace{\tfrac{1}{n+1}, \ldots, \tfrac{1}{n+1}}_{n+1}\Big)$
Remark: The logarithm is taken base 2 for historical reasons. Changing the base
of the logarithm to any other number b > 1 merely uniformly divides the values
of the entropy function by log2 b. Thus, for comparison of the relative uncertainty
of different sets of probabilities, it doesn't really matter what base is used for the
logarithm. But base 2 is traditional and also does make some answers come out
nicely. Some early work by Hartley used logarithms base 10 instead, and in that
case the unit of information or entropy is the Hartley, which possibly sounds more
exotic.
Remark: The units for entropy are also bits, since we view entropy as an ex-
pected value (thus, a kind of average) of information, whose unit is the bit. This is
compatible with the other use of bit (for binary digit), as the coin-flipping example
illustrates.
Example: The entropy in a single toss of a fair coin is
$H\big(\tfrac{1}{2}, \tfrac{1}{2}\big) = \tfrac{1}{2}\big(-(-1)\big) + \tfrac{1}{2}\big(-(-1)\big) = 1 \text{ bit}$
Indeed, one might imagine that such a coin toss is a basic unit of information.
Further, if we label the coin '0' and '1' instead of 'heads' and 'tails', then such a
coin toss exactly determines the value of a bit.
$H(\text{die}) = H\big(\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\big) = -\sum_{i=1}^{6} \tfrac{1}{6}\,\log_2 \tfrac{1}{6} = \log_2 6 \approx 2.58496250072 \text{ bits}$

$H(\text{sum of two dice}) = H\big(\tfrac{1}{36}, \tfrac{2}{36}, \tfrac{3}{36}, \tfrac{4}{36}, \tfrac{5}{36}, \tfrac{6}{36}, \tfrac{5}{36}, \tfrac{4}{36}, \tfrac{3}{36}, \tfrac{2}{36}, \tfrac{1}{36}\big) = -\tfrac{1}{36}\log_2\tfrac{1}{36} - \tfrac{2}{36}\log_2\tfrac{2}{36} - \cdots - \tfrac{1}{36}\log_2\tfrac{1}{36} \approx 3.27440191929 \text{ bits}$
Example: The entropy in a single letter of English (assuming that the various
letters will occur with probability equal to their frequencies in typical English) is
approximately
H(letter of English) :::::: 4.19
(This is based on empirical information, that 'e' occurs about 11% of the time, 't'
occurs about 9% of the time, etc.) By contrast, if all letters were equally likely,
then the entropy would be somewhat larger, about
$H\big(\tfrac{1}{26}, \ldots, \tfrac{1}{26}\big) = \log_2(26) \approx 4.7$
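These examples are easy to reproduce with a short entropy calculator (an added illustration, not part of the original text):

    from math import log2

    def H(ps):
        # entropy in bits; terms with p = 0 contribute nothing
        return -sum(p * log2(p) for p in ps if p > 0)

    print(H([0.5, 0.5]))       # 1.0 bit, the fair coin
    print(H([1/6] * 6))        # about 2.585 bits, the fair die
    print(H([1/26] * 26))      # about 4.700 bits, 26 equally likely letters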
Remark: The proof that the axioms uniquely characterize entropy is hard, and
not necessary for us, so we'll skip it. But an interested reader can certainly use
basic properties of logarithms (and a bit of algebra and basic probability) to verify
that
$H(p_1, \ldots, p_n) = -\sum_i p_i\,\log_2 p_i$
meets the conditions, even if it's not so easy to prove that nothing else does.
Joint entropy of a collection X_1, ..., X_N of random variables is defined in
the reasonably obvious manner
$H(X_1, \ldots, X_N) = -\sum P(X_1 = x_1,\ \ldots,\ X_N = x_N)\,\log_2 P(X_1 = x_1,\ \ldots,\ X_N = x_N)$
where the sum is over all tuples (x_1, ..., x_N) of possible values.
Lemma: Fix p_1, ..., p_n with each p_i ≥ 0 and Σ_i p_i = 1. Let q_1, ..., q_n vary,
subject only to the restriction that q_i ≥ 0 for all indices, and Σ_i q_i = 1. Then
$-\sum_i p_i\,\log_2 p_i \le -\sum_i p_i\,\log_2 q_i$
with equality only when q_i = p_i for all indices.
Proof: From the inequality
$\ln x \le x - 1$
we have
$\log_2 x \le (\log_2 e)(x - 1)$
with equality only for x = 1. Then replace x by q/p to obtain
$\log_2 \frac{q}{p} \le (\log_2 e)\Big(\frac{q}{p} - 1\Big)$
and then
$p\,\log_2 q \le p\,\log_2 p + (\log_2 e)(q - p)$
with equality occurring only for q = p. Replacing p, q by p_i and q_i and adding the
resulting inequalities, we have
$\sum_i p_i\,\log_2 q_i \le \sum_i p_i\,\log_2 p_i + (\log_2 e)\Big(\sum_i q_i - \sum_i p_i\Big) = \sum_i p_i\,\log_2 p_i$
Multiplying through by -1 reverses the order of inequality and gives the assertion.
///
Corollary: $H(p_1, \ldots, p_n) \le \log_2 n$, with equality occurring only when $p_i = \frac{1}{n}$ for
all indices.
Proof: This corollary follows from the previous inequality by letting $q_i = \frac{1}{n}$. ///
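Both the lemma and the corollary are easy to test numerically; the sketch below (an added illustration using randomly generated distributions) does so.

    from math import log2
    import random

    random.seed(1)

    def normalize(ws):
        s = sum(ws)
        return [w / s for w in ws]

    n = 5
    p = normalize([random.random() for _ in range(n)])
    q = normalize([random.random() for _ in range(n)])
    H_p = -sum(pi * log2(pi) for pi in p)
    cross = -sum(pi * log2(qi) for pi, qi in zip(p, q))
    print(H_p <= cross + 1e-12)       # the lemma: H(p) <= -sum p_i log2 q_i
    print(H_p <= log2(n) + 1e-12)     # the corollary: H(p) <= log2 n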
We will use the fact that for fixed $i$ we have $\sum_j r_{ij} = p_i$ and for fixed $j$ we have
$\sum_i r_{ij} = q_j$. Then compute directly:
$$H(X) + H(Y) = -\sum_i p_i \log_2 p_i - \sum_j q_j \log_2 q_j = -\sum_{i,j} r_{ij} \log_2 p_i - \sum_{i,j} r_{ij} \log_2 q_j$$
$$= -\sum_{i,j} r_{ij} \log_2 (p_i q_j) \ge -\sum_{i,j} r_{ij} \log_2 r_{ij} = H(X, Y)$$
by the lemma above, with equality only when $r_{ij} = p_i q_j$ for all $i, j$.
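This inequality is also easy to spot-check; below is a short sketch (ours; the joint distribution r is made up) comparing H(X, Y) with H(X) + H(Y):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up joint distribution r[i][j] = P(X = x_i, Y = y_j):
r = [[0.10, 0.30],
     [0.25, 0.35]]
p = [sum(row) for row in r]                               # p_i = sum_j r_ij
q = [sum(row[j] for row in r) for j in range(len(r[0]))]  # q_j = sum_i r_ij

H_joint = H([rij for row in r for rij in row])
assert H_joint <= H(p) + H(q) + 1e-12   # H(X,Y) <= H(X) + H(Y)
print(H_joint, H(p) + H(q))
```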
And then, for another random variable $Y$ on $\Omega$, define a conditional entropy by
$$H(X \mid Y) = \sum_j P(Y = y_j)\, H(X \mid Y = y_j)$$
where we use the previous notion of conditional entropy with respect to the subset
$\omega$ where $Y = y_j$. The idea here is that $H(X \mid Y)$ is the amount of uncertainty or
entropy remaining in $X$ after $Y$ is known.
It is pretty easy to check that
$$H(X \mid X) = 0$$
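As a numerical illustration (ours; the joint distribution is made up), $H(X \mid Y)$ can be computed directly from this definition; the sketch below also checks $H(X \mid X) = 0$ and the standard identity $H(X \mid Y) = H(X, Y) - H(Y)$:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def cond_entropy(r, q):
    """H(X|Y) = sum_j P(Y = y_j) * H(X | Y = y_j), given r[i][j] = P(X=x_i, Y=y_j)."""
    total = 0.0
    for j, qj in enumerate(q):
        cond = [r[i][j] / qj for i in range(len(r))]   # P(X = x_i | Y = y_j)
        total += qj * H(cond)
    return total

# Made-up joint distribution r[i][j] = P(X = x_i, Y = y_j):
r = [[0.10, 0.30],
     [0.25, 0.35]]
q = [r[0][j] + r[1][j] for j in range(2)]              # q_j = P(Y = y_j)

H_joint = H([rij for row in r for rij in row])
print(cond_entropy(r, q))          # H(X|Y)
print(H_joint - H(q))              # the same number: H(X|Y) = H(X,Y) - H(Y)

# Conditioning a variable on itself leaves no uncertainty: H(X|X) = 0.
diag = [[0.4, 0.0], [0.0, 0.6]]    # joint distribution of the pair (X, X)
print(cond_entropy(diag, [0.4, 0.6]))   # 0.0
```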