Lecture 02 – Probability
Probability is a familiar concept. We feel we know with some certainty that when we flip a coin,
P(H) = ½, and we feel we understand what that means.
Also, the mathematics of probability is fairly straightforward and simple, as mathematics goes:
P(impossible)=0
P(certain)=1
But the concept of probability can also be elusive. In fact, there are at least three different
notions of how a probabilistic statement should be interpreted.
Can you see differences in the notions of probability used in these statements? Specifically, can
you make 3 groups of statements so that each group has the same type of concept of
"probability"?
Experiment: any process or situation in which different things can happen. (example –
experiment = flip a coin once, observe value that comes up)
Outcome: one of the possible things that can happen (example – H (meaning a head comes up)).
Outcome or sample space: the set of all possible outcomes of an experiment. (example –
S = { H, T }) Note that we use capital S to denote the sample space, and we always write it as a
set, that is, as a list enclosed in curly brackets.
Outcomes in a sample space are always mutually exclusive, meaning that if one outcome
happens the other(s) cannot. For example, if you flip a coin once, H and T cannot both happen
simultaneously. The outcome of that experiment is EITHER H OR T. But the outcome space
consists of both H and T because they are both possible outcomes of the experiment.
Now that we have this vocabulary, we can point out that classical probability theory applies
ONLY in situations that can be broken down into a finite number of mutually exclusive, equally
likely elementary events.
Many gambling games can be broken down in this way. For example, consider the games in
modern casinos: card games, dice, and roulette. Other gambling situations (e.g. horse racing,
betting on football teams) often are a little more complex. The probability that the Yankees win
or lose in next week's game is not exactly 50%. The probability that Columbia's football team
wins their next game is definitely not 50-50. So the classical definition of probability doesn't
work well here.
1. How about if the coin is not fair (it comes up H more often than T)? Imagine bending the
coin or using loaded dice. For this situation (and football games and other examples), we need
to be able to deal with events that aren't equally likely.
2. How about when the number of outcomes is infinite? Imagine choosing an integer at random
from the positive integers. What is the probability that it is even? 1/∞?
2. Frequentist (empirical) Probability
Going back to that biased coin: how can we establish the probability that a biased coin will come
up heads when H and T are not equally likely to happen?
The frequentist approach is to take the biased coin and flip it 100 times. We can suppose that
for this coin there is a true probability of a head, call it p. When we flip it 100 times, we can get
an approximation of p by counting the number of times out of 100 that a head comes up. If we
could flip it an infinite number of times, we could establish the probability, p, with certainty.
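The frequentist idea can be sketched with a short simulation. This is a minimal sketch, assuming a hypothetical biased coin whose true (unknown) probability of heads is 0.7: the relative frequency of heads approaches the true p as the number of flips grows.

```python
import random

random.seed(1)  # for reproducibility

# Hypothetical biased coin: assume the true (unknown) P(H) is 0.7.
p_true = 0.7

def estimate_p(n_flips):
    """Estimate P(H) as the relative frequency of heads in n_flips flips."""
    heads = sum(1 for _ in range(n_flips) if random.random() < p_true)
    return heads / n_flips

# The estimate stabilizes near p_true as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_p(n))
```

With only 100 flips the estimate can easily be off by several percentage points; with a million flips it is very close to 0.7, which is the "flip it an infinite number of times" idea in miniature.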
Limitation: Many situations of uncertainty don't fit into this repeated-experiment framework
because it is not possible to repeat the experiment under similar conditions.
For example: how do you establish the probability that the Yankees will win their next game?
Say they are playing against the Red Sox. You could look at the historical record against the Sox,
but are they really the same teams that have played in the past? Not really. You could ask them
to play 10 games next time they play instead of 1. But those 10 games in a row won’t be
experimentally the same either due to fatigue, injury, weather, etc… All of these solutions are
impractical or invalid because the repeated experiments are not really replications.
3. Subjective probability
The only conceptual approach to probability for questions like the Yanks vs. Sox is the notion of
subjective probability or the probability as degree of belief that an event will occur or that a
proposition is true.
We won't consider theories of subjective probability in this course. We will leave that for
psychologists, philosophers, and computer scientists to worry about.
Let's review our vocabulary and look at several other simple examples of experiments.
Experiment: some game or situation in which the results are not certain
Experiment 1: Roll a single die Experiment 2: Flip a coin twice
Note that outcomes in an outcome space are mutually exclusive and exhaustive (i.e. we've
covered all the possibilities).
Here is a new concept, called an EVENT. An event is any set of outcomes. Below are 5 different
events for experiments 1 and 2. Note that events are denoted with capital letters (A, B, C, D,
and E), but the use of the letter S is reserved for the outcome or sample space.
S1 = {1, 2, 3, 4, 5, 6} S2 = {HH, HT, TH, TT}
EXPERIMENT 1 EXPERIMENT 2
A=(roll odd number)={1,3,5} D=get H on first flip={HH, HT}
B=(roll number > 2)={3,4,5,6} E=same on both flips={HH,TT}
C=(roll a six)={6}
INTERSECTION:
A and B = A ∩ B = the set of all outcomes in A AND in B (what events A and B have in common)
EXPERIMENT 1: A ∩ B = {3,5} – 3 and 5 are the outcomes that are in both A and B
EXPERIMENT 2: D ∩ E = {HH} – HH is the outcome that is in both D and E
A USEFUL GRAPHICAL DEVICE FOR REASONING ABOUT COMPOUND EVENTS IS THE VENN
DIAGRAM:
Example 1: A ∩ B is the overlap between A and B (when events A and B overlap)
Example 2: A ∩ C is empty because in this example, A and C are mutually exclusive,
meaning they have no outcomes in common (the circles do not overlap)
UNION:
A or B = A ∪ B = the set of all outcomes in A OR in B (everything in A or B, counting what
they have in common only once)
EXPERIMENT 1: A ∪ B = {1,3,4,5,6}
EXPERIMENT 2: D ∪ E = {HH,HT,TT}
VENN DIAGRAM:
Example 1: A ∪ B is everything that is shaded in (for overlapping events A and B)
Example 2: A ∪ C is still everything that is shaded, even though A and C have nothing in common.
SET COMPLEMENT:
NOT(A) = Aᶜ = A′ = the set of all outcomes NOT in A
VENN DIAGRAM:
example: Aᶜ is the shaded region outside A.
Example:
Experiment 3: You have three ping-pong balls, one red, one green, one blue, in an urn. You
draw one, place it back, and draw again, recording the color each time. This whole process is an
experiment (sampling with replacement).
Classical definition of probability applies only to situations in which there are a finite number of
equally-likely outcomes.
Definition: Assume that an outcome space S consists of a finite number of equally likely
individual outcomes, and let A represent any event (that is, any subset of S). Then define the
probability of A, P(A), as

P(A) = (# outcomes in A) / (# outcomes in S)
What is the P of drawing a red ball from an urn that contains 19 red balls, 6 white balls, and 12
black balls? # outcomes in A = 19; # outcomes in S = 19 + 6 + 12 = 37; so P(A) = 19/37.
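The classical definition is literally a matter of counting, so the urn calculation can be checked by enumerating the outcome space. A minimal sketch using the numbers from the example above:

```python
from fractions import Fraction

# Urn from the example above: 19 red, 6 white, and 12 black balls.
urn = ["red"] * 19 + ["white"] * 6 + ["black"] * 12

# Classical definition: P(A) = (# outcomes in A) / (# outcomes in S)
n_A = sum(1 for ball in urn if ball == "red")  # outcomes in A
n_S = len(urn)                                 # outcomes in S
p_red = Fraction(n_A, n_S)
print(p_red)  # 19/37
```

Using Fraction keeps the answer exact instead of a rounded decimal.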
In order to apply this classical definition of probability to situations that are complicated
enough to be interesting, we need efficient, accurate ways of counting the number of outcomes
in large outcome spaces.
A. graphical devices:
1. Outcome trees – especially useful for listing outcomes in a sequential experiment.
Example: Flip a coin, then roll a die, observing the pair that comes up:
H → H1, H2, H3, H4, H5, H6
T → T1, T2, T3, T4, T5, T6
Example: Flip a coin twice, observing the sequence:
H → HH, HT
T → TH, TT
B. combinatorics formulas:
To help us calculate the number of outcomes in an experiment, we can use a formula called the
Fundamental Principle of Combinatorics, also sometimes called the "Multiplication Principle."
If an experiment consists of successive steps with n1, n2, n3, … possibilities each, then the
number of outcomes in the outcome space is
n(S) = n1 × n2 × n3 × …
For the 2 examples above, there are n(S) = 2 * 6 = 12 outcomes for the experiment where we
flip a coin and then roll a die. There are n(S) = 2*2 = 4 outcomes for the experiment where we
flip a coin twice.
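The multiplication principle can be verified by brute-force enumeration. A sketch of the two examples above using Python's itertools:

```python
from itertools import product

# Coin flip followed by a die roll: n(S) = 2 * 6 = 12 outcomes.
coin_then_die = list(product("HT", [1, 2, 3, 4, 5, 6]))
print(len(coin_then_die))  # 12

# Two coin flips: n(S) = 2 * 2 = 4 outcomes.
two_flips = list(product("HT", repeat=2))
print(len(two_flips))  # 4
```

itertools.product builds exactly the outcome tree drawn above, one branch per tuple.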
Combinations – The combinations formula tells us the number of unordered groups of a given
size that can be selected. It answers the question: selecting from a total of n items, how many
groups of size k can I make? You can think of this type of problem as the "subcommittee"
problem.

C(n, k) = n! / (k! (n − k)!)
For example, suppose you have 10 computers you wish to network, 2 at a time. The formula
tells you that there are C(10, 2) = 10!/(2! 8!) = 45 different ways to network 2 computers out of
10, but it does not tell us what the sets of 2 actually are. Another useful graphical
representation is a table or a matrix. The table will actually help us enumerate all of the pairs.
You will notice that only half of the table is filled in. Why is this?
This is because we assume that the connection between 2 computers is reciprocal, meaning a
connection from computer 1 to computer 2 is the same as a connection from computer 2 to
computer 1. So in the case of combinations, the order does not matter.
Also, why are there X's on the diagonal? Because an object cannot be paired with itself!
1 2 3 4 5 6 7 8 9 10
1 X 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 1,10
2 X 2,3 2,4 2,5 2,6 2,7 2,8 2,9 2,10
3 X 3,4 3,5 3,6 3,7 3,8 3,9 3,10
4 X 4,5 4,6 4,7 4,8 4,9 4,10
5 X 5,6 5,7 5,8 5,9 5,10
6 X 6,7 6,8 6,9 6,10
7 X 7,8 7,9 7,10
8 X 8,9 8,10
9 X 9,10
10 X
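The 45 unordered pairs in the table can be enumerated directly, which both confirms the count and lists the pairs the formula alone cannot give us. A sketch:

```python
from itertools import combinations
from math import comb

computers = range(1, 11)  # computers 1 through 10

# Unordered pairs: each pair such as (1, 2) appears once, matching the
# half-filled table above.
pairs = list(combinations(computers, 2))
print(len(pairs))   # 45
print(comb(10, 2))  # C(10, 2) = 10! / (2! * 8!) = 45
```

Note that (2, 1) never appears in the list: itertools.combinations treats order as irrelevant, exactly like the half-filled table.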
Permutations
In the example above, we decided that order does not matter between pairs of computers. But
what happens if order is actually important? For example, suppose you set up a connection from
computer 1 to computer 10 to send information, and you need confirmation from computer 10
that it received the information from computer 1.
Consider this example: You have 10 computers you wish to network. How many different ways
can you network computers 2 at a time if the flow of information is unidirectional? (Order is
important)
Instead of using the combinations formula for this problem, because we have to consider order,
we use the permutations formula. The permutations formula also tells us the number of sets of
2 there are, but it gives us the number of ORDERED sets. It answers the question: how many
ordered arrangements of k objects can be selected from n?
P(n, k) = n! / (n − k)!

where n = total # of objects; k = # in the ordered group
Here is example of a permutations problem: You have 5 horses in a race. How many ways can
they finish?
P(5, 5) = 5! / (5 − 5)! = 5! / 0! = 120

0! = 1 by convention: there is exactly one way to arrange zero objects.
Going back to the computer integration problem, try to solve it. You have 10 computers you
wish to network. How many different ways can you network 2 computers at a time if the flow
of information is unidirectional?
P(10, 2) = 10! / (10 − 2)! = 10! / 8! = 10 × 9 = 90
From the Permutations formula, you know that there are 90 different ways to network 2
computers together out of 10 where order matters, but again, the formula does not tell us what
they actually are. We can use the matrix to help us find all of the pairs, but this time, the entire
table is filled in. Why is this?
This is because the assumption that the connection between 2 computers is reciprocal no longer
holds. Computer 1 communicating with Computer 2 is not the same as Computer 2
communicating with Computer 1. So in the case of permutations, the order matters.
Again, the diagonal of this matrix is not filled in because an object cannot be matched to itself.
[10 × 10 matrix with rows and columns labeled 1–10; every off-diagonal cell is filled in, e.g. both (1,2) and (2,1).]
Here is another example problem: You have 5 employees who have Instant Messenger, Dana,
Pat, Cameron, Kelly, and Robin. How many different conversations between 2 people can there
be if the order in which the conversation is started matters? Enumerate the conversations.
P(5, 2) = 5! / (5 − 2)! = 5! / 3! = 20
[5 × 5 matrix with rows and columns labeled D, P, C, K, R; the 20 off-diagonal cells list the ordered conversations.]
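The 20 ordered conversations can be enumerated the same way the combinations were, now keeping order. A sketch:

```python
from itertools import permutations
from math import perm

people = ["Dana", "Pat", "Cameron", "Kelly", "Robin"]

# Ordered pairs: ("Dana", "Pat") and ("Pat", "Dana") count separately,
# because who starts the conversation matters.
conversations = list(permutations(people, 2))
print(len(conversations))  # 20
print(perm(5, 2))          # P(5, 2) = 5! / 3! = 20
```

Unlike the combinations example, both orderings of each pair appear in the list.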
Example 1: Draw people from a classroom and observe whether they are male or female, and if
their major is Education, Psych, or Health. Each cell contains a frequency.
Each cell, except for the totals, is an intersection between 2 events.
We can turn all these cell frequencies into cell probabilities to make a probability (contingency)
table. If you take each frequency and divide it by the total, (n = 116), you get this table:
Union
The probability of the event A ∪ B (A or B) is: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Complement - the probability of the complement of an event A (i.e. "not A") is equal to one
minus the probability of the event:
P(Aᶜ) = 1 − P(A)
Example: Draw one voter from list with: 35% Republican, 53% Dem., 12% Ind.
P(R ∪ D) = P(R or D) = P(R) + P(D) = .35 + .53 = .88 (R and D are mutually exclusive, so P(R ∩ D) = 0)
P(not R) = 1 - P(R) = 1 - .35 = .65
Conditional Probability
Def. Let A and B be any two events for which P(B) > 0. The "conditional probability of the event
A, given the event B" is defined as:

P(A|B) = P(A ∩ B) / P(B)

Some simple algebra shows that therefore: P(A|B) P(B) = P(A ∩ B)
Ex. Choose a card at random, observe that it is red. What is the P(ace, given that it's red)?
P(ace|red) = P(ace and red) / P(red) = (2/52) / (26/52) = 2/26 = 1/13
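The card calculation can be checked with exact fractions, following the definition step by step. A sketch:

```python
from fractions import Fraction

# Standard 52-card deck: 2 red aces, 26 red cards.
p_ace_and_red = Fraction(2, 52)
p_red = Fraction(26, 52)

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_ace_given_red = p_ace_and_red / p_red
print(p_ace_given_red)  # 1/13
```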
            Males   Females   Total
Freshman    .16     .14       .30
Sophomore   .12     .14       .26
Junior      .10     .13       .23
Senior      .07     .14       .21
Total       .45     .55      1.00
Example: Draw 2 people, one at a time without replacement, from a group of 5 males and 2
females. Tree diagram:
First draw: P(M1) = 5/7, P(F1) = 2/7
Second draw: P(M2|M1) = 4/6, P(F2|M1) = 2/6; P(M2|F1) = 5/6, P(F2|F1) = 1/6
Example 3: Flip a coin three times, recording result of each toss. Outcome space will be
triples: H1H2H3, etc.
Note: for trees with more than 2 successive draws, we need conditional probabilities involving
compound events. P(H1H2H3) = P(H1) · P(H2|H1) · P(H3|H1 ∩ H2)
Example: Choose a card at random, observe that it is red. What is the (conditional)
probability that it is an ace, given that it is a red card, P(A|R)?
P(A|R) = P(A ∩ R) / P(R) = (2/52) / (26/52) = 2/26 = 1/13
What does this imply about the independence of events A and R? (Note that P(A) = 4/52 = 1/13
= P(A|R), so knowing the card is red tells us nothing about whether it is an ace: A and R are
independent.)
How about the conditional probability that it is a heart, given that it is a red
card?
P(H|R) = P(H ∩ R) / P(R) = (13/52) / (26/52) = 13/26 = 1/2
            Males   Females   Total
Freshman    .16     .14       .30
Sophomore   .12     .14       .26
Junior      .10     .13       .23
Senior      .07     .14       .21
Total       .45     .55      1.00
Check by the definition of independence: is P(Fr ∩ Sen) = P(Fr) × P(Sen)?
P(Fr ∩ Sen) = 0 (no one is both a freshman and a senior)
P(Fr) = .30
P(Sen) = .21
P(Fr) × P(Sen) = .063 ≠ 0 = P(Fr ∩ Sen),
therefore Freshman and Senior are NOT independent.
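The independence check on the contingency table can be written out in code. A sketch, with the joint probabilities taken from the table above:

```python
# Joint probabilities from the contingency table above
# (rows: class year; columns: M = males, F = females).
table = {
    "Freshman":  {"M": 0.16, "F": 0.14},
    "Sophomore": {"M": 0.12, "F": 0.14},
    "Junior":    {"M": 0.10, "F": 0.13},
    "Senior":    {"M": 0.07, "F": 0.14},
}

p_fr = sum(table["Freshman"].values())  # marginal P(Fr) = .30
p_sen = sum(table["Senior"].values())   # marginal P(Sen) = .21
p_fr_and_sen = 0.0  # no one is both a freshman and a senior

# Independence would require P(Fr and Sen) == P(Fr) * P(Sen).
print(p_fr * p_sen)  # approximately .063, not 0, so NOT independent
```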
Two events A and B are mutually exclusive if they have no outcomes in common (i.e. if A occurs,
then B can't occur, and vice versa).
Two events A and B are independent if whether B occurs makes no difference to how probable
A is (i.e. P(A) = P(A|B) = P(A|Bᶜ)).
What do these concepts have to do with each other? NOTHING! No, less than nothing:
one can prove that if A and B are mutually exclusive, then they cannot be independent (unless
P(A) = 0 or P(B) = 0).
P(A|B) = P(A ∩ B) / P(B);   P(B|A) = P(A ∩ B) / P(A)
P(B) · P(A|B) = P(A ∩ B) = P(A) · P(B|A)
Bayes’ Theorem:
P(B|A) = P(B) · P(A|B) / P(A)
Example 1: You are leaving for spring break! Since it’s March, the weather can still be
unpredictable in NYC. There is a 30% chance that it will snow on the day you are scheduled to
depart. So you call La Guardia airport and ask them for some information. They tell you that
the probability that a flight will take off on time is 0.59. They also tell you that the probability
that a flight will take off on time GIVEN that it snows is 0.10. What is the probability that it
snows, given that the flight takes off on time?
Example 2: You are leaving for spring break! Since it’s March, the weather can still be
unpredictable in NYC. There is a 30% chance that it will snow on the day you are scheduled to
depart. So you call La Guardia airport and ask them for some information. They tell you that
the probability that a flight will take off on time GIVEN that it snows is 0.10. They also tell you
that the probability that a flight will take off on time GIVEN that it doesn’t snow is 0.80. What is
the probability that it snows, given that the flight takes off on time?
Note that the denominator of the problem, P(O), is made up of 2 parts, by the law of total
probability: the probability of taking off on time when it snows and when it doesn't:
P(O) = P(O|S)P(S) + P(O|Sᶜ)P(Sᶜ) = (0.10)(0.30) + (0.80)(0.70) = 0.03 + 0.56 = 0.59.
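Both spring-break examples come down to the same Bayes calculation; the only difference is whether P(O) is given directly (Example 1) or assembled from the two conditional pieces (Example 2). A sketch with the numbers from the text:

```python
# Numbers from the spring-break examples above.
p_snow = 0.30                   # P(S): chance of snow on departure day
p_ontime_given_snow = 0.10      # P(O|S)
p_ontime_given_no_snow = 0.80   # P(O|S^c), given in Example 2

# Example 2: build P(O) by the law of total probability.
# P(O) = P(O|S)P(S) + P(O|S^c)P(S^c)
p_ontime = (p_ontime_given_snow * p_snow
            + p_ontime_given_no_snow * (1 - p_snow))
print(round(p_ontime, 2))  # 0.59, matching the value given in Example 1

# Bayes' theorem: P(S|O) = P(O|S)P(S) / P(O)
p_snow_given_ontime = p_ontime_given_snow * p_snow / p_ontime
print(round(p_snow_given_ontime, 4))  # about 0.0508
```

The answer is small: a flight taking off on time is strong evidence that it did not snow.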