
Random Variables & Expectation

(Rosen Sections 7.2 and 7.4)

Comic by Zach Wiener, http://www.smbc-comics.com


Fall 2020 UM EECS 203 Lecture 16 1
Details to Consider
(Updates from Lecture 15)
• CSE Two Attempts Rule [203, 280, 281, and more]
– Each student gets two attempts to earn a passing grade
(C or better)
– Withdrawing from the course counts as an attempt
• Appears as a “W” on your transcript
– From EECS Undergraduate Advising Office: “Elections of EECS 203, EECS 280, and
EECS 281 during WN20, SP/SU/SS20, and FA20 will not count as attempts.”
• 281 Prerequisites
– Need average grade of 2.5 between 203 and 280
– From EECS Undergraduate Advising Office: “The move of the 2.5 GPA
requirement from enforced prerequisite to advisory was approved by the College
at the beginning of the semester. This already reflects in the public course catalog
via Wolverine Access.”
• Late Drop Deadline (F20/W21 special circumstances)
– “We will also institute a more flexible withdrawal policy, allowing students to
withdraw from a course at any time up until the end of classes and not have the
course appear on their transcript.” – email from Provost Collins to Faculty, 8/20/20

Fall 2020 UM EECS 203 Lecture 15 2


Discrete Probability: Review
• Experiment: Procedure that yields an outcome
• Sample space: Set S of all possible outcomes
• Event: a subset of S (i.e., event is a set consisting of individual outcomes)
• Probability distribution (function) p : S → [0,1]
– For s ∈ S, 0 ≤ p(s) ≤ 1 (each outcome is assigned a probability)
– Σ_{s∈S} p(s) = 1 (probabilities sum to 1)

• Probability of an event E ⊆ S: p(E) = Σ_{s∈E} p(s)


• Conditional probability: p(F|E) = p(E∩F)/p(E)
• Independent events: p(E∩F) = p(E)p(F) or p(E|F)=p(E)
• Bayes' Rule: p(E|F) = p(F|E) p(E) / p(F)
Fall 2020 UM EECS 203 Lecture 16 7
Bayes Theorem (Ch 7.3)

Fall 2020 UM EECS 203 Lecture 16 8


Diagnosing a Rare Disease
• Meningitis is rare: P(m) = 1/50000 = .00002
• Meningitis causes stiff neck: P(s | m) = 0.5
• Stiff neck is not so rare: P(s) = 1/20
• You have a stiff neck. What is P(m | s), the
probability of meningitis if you have a stiff neck?

Fall 2020 UM EECS 203 Lecture 16 9


Diagnosing a Rare Disease
• Meningitis is rare: P(m) = 1/50000 = .00002
• Meningitis causes stiff neck: P(s | m) = 0.5
• Stiff neck is not so rare: P(s) = 1/20
• You have a stiff neck. What is P(m | s)?
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
• So, you are 10 times more likely to have meningitis if
you have a stiff neck!
• But it is still very unlikely – A high likelihood
(symptom match) doesn’t overcome a very low prior
(to evidence) probability.
Fall 2020 UM EECS 203 Lecture 16 10
Bayes’ Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then

    p(F | E) = p(E | F) p(F) / p(E)

(Thomas Bayes, 1702-1761)

This rule is the foundation of Bayesian methods for probabilistic reasoning, which are very powerful and widely used in artificial intelligence applications:
• For data mining, automated diagnosis, pattern recognition, statistical modeling, even evaluating scientific hypotheses.

Fall 2020 UM EECS 203 Lecture 16 11


Why is Bayes’ Rule so useful?
P(d | s) = P(s | d) P(d) / P(s)

• Diagnostic evidence P(disease | symptom) is often hard to get.
– But it's usually what you really want.
• Causal evidence P(symptom | disease) is often easier to get.
• P(disease) is easy to get.
• P(symptom) is just a normalizer.

Fall 2020 UM EECS 203 Lecture 16 12


Not So Fast My Friend!
Sure, Bayes' Theorem works great when you've been given all the numbers you need:
    P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
Often, the denominator isn't directly given. But you usually can figure it out!
• You know p(s|m) and p(m), so you know the probability of a stiff neck from meningitis:
    p(s ∩ m) = p(s | m) p(m)
• How about a stiff neck from something else?
  – Same as asking: What is the probability of a stiff neck without meningitis?
    p(s ∩ m̄) = p(s | m̄) p(m̄)
• And what's the probability of not having meningitis?
    p(m̄) = 1 − p(m)
For any events s and m,
    p(s) = p(s ∩ m) + p(s ∩ m̄)
         = p(s | m) p(m) + p(s | m̄) p(m̄)
         = p(s | m) p(m) + p(s | m̄) (1 − p(m))
[Venn diagram: the stiff-neck event s overlapping the meningitis event m]
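
A minimal Python sketch of this computation using the meningitis numbers above (the variable names and the value of p(s|m̄) are my own assumptions, chosen so that p(s) comes out near 1/20 as on the slide):

```python
# Law of total probability + Bayes' rule for the stiff-neck example.
p_m = 1 / 50000         # prior p(m): probability of meningitis
p_s_given_m = 0.5       # likelihood p(s|m): stiff neck given meningitis
p_s_given_not_m = 0.05  # assumed p(s | not m), picked so p(s) is close to 1/20

# Denominator via total probability: p(s) = p(s|m) p(m) + p(s|not m) (1 - p(m))
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)

# Bayes' rule: p(m|s) = p(s|m) p(m) / p(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(f"p(s) = {p_s:.5f}, p(m|s) = {p_m_given_s:.6f}")  # p(m|s) ≈ 0.0002
```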
Fall 2020 UM EECS 203 Lecture 16 13
Bayes’ Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then

    p(F | E) = p(E | F) p(F) / [ p(E | F) p(F) + p(E | F̄) p(F̄) ]

(Thomas Bayes, 1702-1761)

• Replace p(E) in the denominator with something equivalent.
• Why is the bottom line the same as p(E)?
• On the previous slide: p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)
• Notice the first term in the sum in the denominator is the same as the term in the numerator! Not an accident!!
Fall 2020 UM EECS 203 Lecture 16 14
Probability of an Event
The probability of an event is the sum of the
probabilities of all of the disjoint
combinations of events that include it
    p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)

[Venn diagram: event E straddling F and F̄ within the universe U]

Fall 2020 UM EECS 203 Lecture 16 15


Proving Bayes’ Theorem
From the definition of conditional probability:
    p(F | E) = p(E ∩ F) / p(E)   and   p(E ∩ F) = p(E | F) p(F)

E is the union of disjoint events: (E ∩ F) ∪ (E ∩ F̄), so
    p(E) = p(E ∩ F) + p(E ∩ F̄) = p(E | F) p(F) + p(E | F̄) p(F̄)

Plugging in:
    p(F | E) = p(E | F) p(F) / [ p(E | F) p(F) + p(E | F̄) p(F̄) ]
Fall 2020 UM EECS 203 Lecture 16 16


Useful Identities to remember
• We can show that the following equality holds for any events Y and D:
    p(Y | D) + p(Ȳ | D) = 1
• Proof:
    p(Y | D) + p(Ȳ | D) = [ p(Y ∩ D) + p(Ȳ ∩ D) ] / p(D)      (defn of conditional probability)
                        = p( (Y ∩ D) ∪ (Ȳ ∩ D) ) / p(D)       (union of disjoint events)
                        = p( (Y ∪ Ȳ) ∩ D ) / p(D)              (distributive law of sets)
                        = p(D) / p(D) = 1                       (Y ∪ Ȳ = S, the sample space)
Fall 2020 UM EECS 203 Lecture 16 17




Important Tips for Bayes’ Theorem
• When applying Bayes’ Theorem to a situation
that involves events Y and D (or any arbitrary
notation), you can use the following identities

p(Y | D) + p(Ȳ | D) = 1
p(Y | D̄) + p(Ȳ | D̄) = 1
p(Y) + p(Ȳ) = 1
p(D) + p(D̄) = 1
Fall 2020 UM EECS 203 Lecture 16 22
False positives, False negatives
• Let D be the event that a person has the disease
• Let Y be the event that a test for the disease comes back
positive
            D                 D̄
  Y         True positive     False positive
  Ȳ         False negative    True negative

We want
• High probabilities of TP and TN
• Low probabilities of FP and FN

Y ∩ D : True positive (accurate prediction)
Y ∩ D̄ : False positive (inaccurate prediction)
Ȳ ∩ D : False negative (inaccurate prediction)
Ȳ ∩ D̄ : True negative (accurate prediction)
Fall 2020 UM EECS 203 Lecture 16 23
Testing for Disease (Medical Diagnosis)
• The disease is very rare: p(D) = 1/100,000
• Testing is Accurate
  – Few False Positives: p(Y | D̄) = 0.5%
  – Few False Negatives: p(Ȳ | D) = 1%
• Suppose you get a positive result. What do you conclude?
    p(Y | D) = 1 − p(Ȳ | D) = 0.99

    p(D | Y) = p(Y | D) p(D) / [ p(Y | D) p(D) + p(Y | D̄) p(D̄) ]
             = (0.99 × 10⁻⁵) / (0.99 × 10⁻⁵ + 0.005 × (1 − 10⁻⁵))
             ≈ 0.002
Ans: If you get a positive result, there is a 0.2% chance that you have the disease.
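
A quick check of this arithmetic in Python (a sketch; the variable names are mine):

```python
from math import isclose

p_D = 1e-5               # prior: the disease is very rare
p_Y_given_D = 0.99       # sensitivity = 1 - false-negative rate (1%)
p_Y_given_notD = 0.005   # false-positive rate (0.5%)

# Bayes' theorem with the total-probability denominator
p_Y = p_Y_given_D * p_D + p_Y_given_notD * (1 - p_D)
p_D_given_Y = p_Y_given_D * p_D / p_Y

print(p_D_given_Y)  # ≈ 0.002: a positive test still leaves only ~0.2% chance of disease
assert isclose(p_D_given_Y, 0.002, rel_tol=0.05)
```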
Fall 2020 UM EECS 203 Lecture 16 24
Exercise
• When a test for steroids is given to soccer players
– 98% of the players taking steroids test positive
– 12% of the players not taking steroids test positive
• Suppose 5% of soccer players take steroids. What is
the probability that a soccer player who tests
positive takes steroids?

Fall 2020 UM EECS 203 Lecture 16 25


Exercise
• When a test for steroids is given to soccer players
– 98% of the players taking steroids test positive
– 12% of the players not taking steroids test positive
• Suppose 5% of soccer players take steroids. What is
the probability that a soccer player who tests
positive takes steroids?
Shorthand notation : S = Event that a player takes steroids
Y = Event that a player gets a positive result
    p(S | Y) = p(Y | S) p(S) / [ p(Y | S) p(S) + p(Y | S̄) p(S̄) ]
             = (0.98 × 0.05) / (0.98 × 0.05 + 0.12 × 0.95)
             ≈ 0.301
Fall 2020 UM EECS 203 Lecture 16 26
Automobile Diagnosis Example
• Say that a car engine can have 4 different
(mutually-exclusive) kinds of failures: Electrical,
Fuel, eXhaust, and Timing-belt.
– We know the probability that these happen:
p(E)=.05, p(F)=.1, p(X)=.01, p(T)=.001

• For each kind of failure, we know the probability of symptoms that include: Won't-start, Idles-rough, Stalls, Revs-high.
  – E.g., p(W|E) = .2, p(I|E) = .4, … p(W|F) = .7, …

• If the car Won't-Start, what is the probability that the car has an Electrical problem?
Fall 2020 UM EECS 203 Lecture 16 27
Automobile Diagnosis Example
If the car Won’t-Start, what is the probability that the
car has an Electrical problem?
P(E | W) = P(W | E) P(E) / P(W)
         = P(W | E) P(E) / [ P(W | E) P(E) + P(W | Ē) P(Ē) ]
         = P(W | E) P(E) / [ P(W | E) P(E) + P(W | F) P(F) + P(W | X) P(X) + P(W | T) P(T) ]

We are looking at what proportion of all of the samples for event W are also associated with event E, and at the probability of event E versus the other events.
[Venn diagram: the Won't-start event W overlapping the failure events E, F, X, T]

Fall 2020 UM EECS 203 Lecture 16 28


Generalized Bayes’ Theorem
• If E is an event from sample space S
• And F1, F2, …, Fn are mutually exclusive events that entirely cover S: ⋃_{i=1..n} Fi = S, and thus Σ_{i=1..n} p(Fi) = 1
• And the probability of all events > 0
• Then:

    P(Fj | E) = P(E | Fj) p(Fj) / Σ_{i=1..n} P(E | Fi) p(Fi)

(Thomas Bayes, 1702-1761)

In an earlier slide, the denominator was called the “normalizer”. Why?
Answer: This is what ensures that Σ_{j=1..n} P(Fj | E) = 1
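
A sketch of the generalized rule in Python, applied to the automobile example. The slides give only p(W|E) = 0.2 and p(W|F) = 0.7, so the values used below for p(W|X) and p(W|T) are made-up placeholders:

```python
# Generalized Bayes: posterior over mutually exclusive failure hypotheses given evidence W.
priors = {"E": 0.05, "F": 0.10, "X": 0.01, "T": 0.001}  # p(failure type), from the slide
likelihood = {"E": 0.2, "F": 0.7, "X": 0.3, "T": 0.9}   # p(W | failure); X and T are assumed

# Denominator: total probability of the evidence (the "normalizer")
p_W = sum(likelihood[h] * priors[h] for h in priors)

posterior = {h: likelihood[h] * priors[h] / p_W for h in priors}
print(posterior["E"])           # P(E | W)
print(sum(posterior.values()))  # 1.0 -- the normalizer makes the posteriors sum to 1
```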
Fall 2020 UM EECS 203 Lecture 16 29
Coin-toss Example
A coin whose probability of getting heads is 2/3 is tossed 8 times.
What is the probability of exactly 3 heads in the 8 tosses?

One sequence with 3 heads: THHTTHTT

Probability of this sequence: (1/3)(2/3)(2/3)(1/3)(1/3)(2/3)(1/3)(1/3) = (2/3)^3 · (1/3)^5

• So, what is the probability of any particular sequence with 3 heads?  (2/3)^3 · (1/3)^5
• How many sequences with 3 heads are there?  C(8,3)

Probability of getting 3 heads in the 8 tosses:
    P(3 heads) = C(8,3) · (2/3)^3 · (1/3)^5
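
A minimal Python check of this number (my own snippet, not from the slides):

```python
from math import comb

p_heads = 2 / 3
n, k = 8, 3
# Each particular sequence with 3 heads has probability (2/3)^3 * (1/3)^5,
# and there are C(8,3) such sequences.
p_3_heads = comb(n, k) * p_heads**k * (1 - p_heads)**(n - k)
print(p_3_heads)  # ≈ 0.0683
```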
Fall 2020 UM EECS 203 Lecture 16 39
Coin-toss Example
A coin whose probability of getting heads is 2/3 is tossed 8 times.
What is the probability of exactly 3 heads in the 8 tosses?

One sequence with 3 heads: THHTTHTT

Probability of this sequence: (1/3)(2/3)(2/3)(1/3)(1/3)(2/3)(1/3)(1/3) = (2/3)^3 · (1/3)^5

• So, what is the probability of any particular sequence with 3 heads?  (2/3)^3 · (1/3)^5
• How many sequences with 3 heads are there?  C(8,3)

Probability of getting 3 heads in the 8 tosses:
    P(3 heads) = C(8,3) · (2/3)^3 · (1/3)^5

Take-aways from the example:
• The experiment consists of repeating a simpler experiment n times.
• This simpler experiment is “simpler” because its sample space consists of two outcomes.
• In turn, the sample space of the experiment is the Cartesian product of the “simpler” two-outcome sample spaces.
• Events of interest may consist of the number of times one of the two simpler experimental outcomes occurs.
Fall 2020 UM EECS 203 Lecture 16 40
Bernoulli Trials, Binomial Experiment, and
Binomial Distribution
• Bernoulli Trial
– Experiment that has exactly 2 outcomes, S = {success, failure}
• p(success) = p
• p(failure) = q = (1-p)
• Binomial Experiment
– Repeat the Bernoulli trial n times, where each trial has the same
probability of success and all trials are mutually independent
• Binomial Distribution
– Let E be the event that k successes occur in n Bernoulli trials. Then
    p(E) = C(n,k) p^k q^(n−k)

Fall 2020 UM EECS 203 Lecture 16 41


Binomial Distribution
Binomial Distribution:
The probability of exactly k successes in n independent (and
identically distributed) Bernoulli trials is
    C(n,k) p^k q^(n−k)

• Binomial Theorem: for any x and y, (x + y)^n = Σ_{k=0..n} C(n,k) x^k y^(n−k)
• Binomial expansion of (p + q)^n:
  – Each term gives the probability of k successes
    (p + q)^n = Σ_{k=0..n} C(n,k) p^k q^(n−k) = C(n,0) q^n + C(n,1) p q^(n−1) + … + C(n,n) p^n

• What is (p + q)^n?
    (p + q)^n = (p + (1 − p))^n = 1^n = 1
• Why is this important?
Answer: The sum of the probabilities must be 1
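
A small sketch verifying that these binomial probabilities sum to 1 (my own code):

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 2 / 3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)  # 1.0 (up to floating-point error)
assert isclose(total, 1.0)
```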
Fall 2020 UM EECS 203 Lecture 16 43
Random Variables
A random variable is a function from
a sample space to the real numbers
X : S → R

[Diagram: X maps each outcome in S to a point on the real number line R]

Note: random variable is a bad choice of name!


• It is a function, not a variable
• And the function isn’t random
But this is the name we’re stuck with!
Fall 2020 UM EECS 203 Lecture 16 45
Random Variables
• A random variable is a function X : S → R
• “X = r” is the event {s∈S | X(s) = r},
• “X ≥ r” is the event {s∈S | X(s) ≥ r}, etc.


Suppose our experiment is a roll of 2 dice. S is set of pairs.


• X = sum of two dice. X((2,3)) = 5
• Y = difference between two dice. Y((2,3)) = 1
• Z = max of two dice. Z((2,3)) = 3

Fall 2020 UM EECS 203 Lecture 16 46


Random Variable: Example
• Roll 2 dice. X(s)= sum of numbers on outcome s.
• X((1,1))= 2,
• X((1,2))= X((2,1))= 3,
• X((1,3))= X((2,2))= X((3,1))= 4,
• X((1,4))= X((2,3))= X((3,2))= X((4,1))= 5,
• X((1,5))= X((2,4))= X((3,3))= X((4,2))= X((5,1))= 6,
• X((1,6))= X((2,5))= X((3,4))= X((4,3))= X((5,2))= X((6,1))= 7,
• X((2,6))= X((3,5))= X((4,4))= X((5,3))= X((6,2))= 8,
• X((3,6))= X((4,5))= X((5,4))= X((6,3))= 9,
• X((4,6))= X((5,5))= X((6,4))= 10,
• X((5,6))= X((6,5))= 11,
• X((6,6))= 12

• Events and their probabilities:
  – “X=7” = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)};  p(X=7) = 6/36 = 1/6
  – “X>10” = {(5,6), (6,5), (6,6)};  p(X>10) = 3/36 = 1/12
  – “X=7 or X=11” = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1), (5,6), (6,5)};  p(X=7 or X=11) = 8/36
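
These event probabilities can be checked by enumerating the 36 equally likely outcomes (a small Python sketch; the function and variable names are mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # sample space S for two dice
p = Fraction(1, len(outcomes))                   # each outcome has probability 1/36

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for s in outcomes if event(s))

X = lambda s: s[0] + s[1]   # random variable: sum of the two dice
print(prob(lambda s: X(s) == 7))        # 1/6
print(prob(lambda s: X(s) > 10))        # 1/12
print(prob(lambda s: X(s) in (7, 11)))  # 8/36 = 2/9
```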

Fall 2020 UM EECS 203 Lecture 16 47


Random Variable: Example
• Roll 2 dice. X = sum of two numbers on dice.
  – X((1,1)) = 2,
  – X((1,2)) = X((2,1)) = 3,
  – X((1,3)) = X((2,2)) = X((3,1)) = 4,
  – X((1,4)) = X((2,3)) = X((3,2)) = X((4,1)) = 5,
  – X((1,5)) = X((2,4)) = X((3,3)) = X((4,2)) = X((5,1)) = 6,
  – X((1,6)) = X((2,5)) = X((3,4)) = X((4,3)) = X((5,2)) = X((6,1)) = 7,
  – X((2,6)) = X((3,5)) = X((4,4)) = X((5,3)) = X((6,2)) = 8,
  – X((3,6)) = X((4,5)) = X((5,4)) = X((6,3)) = 9,
  – X((4,6)) = X((5,5)) = X((6,4)) = 10,
  – X((5,6)) = X((6,5)) = 11,
  – X((6,6)) = 12.

[Bar chart: “Probabilities of the sum of two dice” — p(X) vs. X for X = 2 through 12, peaking at p(X=7) ≈ 0.167]

Fall 2020 UM EECS 203 Lecture 16 48


Random Variable: Binomial Distribution
X is the number of successes in n Bernoulli trials where p(success) = 1/2:

    P(X = k) = C(n,k) / 2^n

[Bar charts: p(X) vs. X for Binomial(12, 1/2) and Binomial(24, 1/2); both are symmetric and peak at X = n/2]

Fall 2020 UM EECS 203 Lecture 16 49


Random Variable: Binomial Distribution
X is the number of successes in n Bernoulli trials where the probability of a success is p:

    P(X = k) = C(n,k) p^k (1 − p)^(n−k)

[Bar charts: p(X) vs. X for Binomial(24, 1/5) and Binomial(24, 4/5); they are mirror images, peaking near np = 4.8 and 19.2 respectively]

Fall 2020 UM EECS 203 Lecture 16 50


Expected Value
• Let p be a probability distribution over S.
• The expected value of X : S → R is the average of
X over S, according to (“weighted by”) the
distribution p:

    E(X) = Σ_{s∈S} p(s)·X(s) = Σ_r p(X = r)·r

The first sum is over the outcomes in the sample space S; the second sum is over the possible values r that X can take.
Fall 2020 UM EECS 203 Lecture 16 58
Expected Value
    E(X) = Σ_{s∈S} p(s)·X(s) = Σ_r p(X = r)·r

• Expected value of a roll of a (fair) die:
  • S = {1,2,3,4,5,6}, p(s) = 1/6 and X(s) = s, for all s ∈ S.
    E(X) = Σ_{s∈S} p(s)·X(s)
         = p(1)X(1) + p(2)X(2) + p(3)X(3) + p(4)X(4) + p(5)X(5) + p(6)X(6)
         = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6
         = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2

• Expected value of a single-number lottery for $10M with a 1 in 100M chance of winning:
  • Choose a number between 1 and 100,000,000; the winning number wins $10M.
  • S = {1, 2, 3, …, 10^8}, X(s) ∈ {0, 10M} = {0, 10^7}
  • p(X = 10^7) = 10^−8, p(X = 0) = 1 − p(X = 10^7) = 1 − 10^−8
  – E(X) = Σ_r r·p(X = r) = 0·p(X = 0) + 10^7·p(X = 10^7) = 0 + 10^7·10^−8 = 1/10
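
Both ways of computing E(X) for the fair die, as a short Python sketch (my own code):

```python
from fractions import Fraction

# Fair die: outcomes 1..6, each with probability 1/6, and X(s) = s.
p = {s: Fraction(1, 6) for s in range(1, 7)}
X = lambda s: s

# Sum over outcomes in S
E_over_outcomes = sum(p[s] * X(s) for s in p)

# Sum over the values r that X can take
values = {X(s) for s in p}
E_over_values = sum(r * sum(p[s] for s in p if X(s) == r) for r in values)

print(E_over_outcomes, E_over_values)  # 7/2 both ways
```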
Fall 2020 UM EECS 203 Lecture 16 59
Random Variable: Example 1
• Roll 2 dice. X= sum of two numbers on dice. What is E(X)?
– X((1,1))= 2,
– X((1,2))= X((2,1))= 3,
– X((1,3))= X((2,2))= X((3,1))= 4,
– X((1,4))= X((2,3))= X((3,2))= X((4,1))= 5,
– X((1,5))= X((2,4))= X((3,3))= X((4,2))= X((5,1))= 6,
– X((1,6))= X((2,5))= X((3,4))= X((4,3))= X((5,2))= X((6,1))= 7,
– X((2,6))= X((3,5))= X((4,4))= X((5,3))= X((6,2))= 8,
– X((3,6))= X((4,5))= X((5,4))= X((6,3))= 9,
– X((4,6))= X((5,5))= X((6,4))= 10,
– X((5,6))= X((6,5))= 11,
– X((6,6))= 12.
• E(X) = Σ_{s∈S} p(s)·X(s) would add up 36 values (every outcome)
• E(X) = Σ_r p(X = r)·r would add up 11 values (every value of X)
  – Either way, E(X) = 7.  Is there an easier way to arrive at this conclusion?
Fall 2020 UM EECS 203 Lecture 16 60
Random Variable: Example 2
• If you rolled two dice a dozen times, what would be the expected
number of times that a 7 would be rolled?
– Bernoulli trials again! There could be anywhere from 0 to 12
7’s rolled. n = 12, p = 1/6
– Let Y be the r.v. of the number of 7s in 12 rolls.
    E(Y) = Σ_{k=0..12} k·p(Y = k) = 2, where p(Y = k) = C(12, k) p^k q^(12−k)
Once again, is there an easier way to get this answer, without evaluating a 13-term sum?

Fall 2020 UM EECS 203 Lecture 16 61


Linearity of Expectations
• The expected value of the sum of random
variables is the sum of their expectations.
E( X1 + X2 ) = E( X1 ) + E( X2 )
E(a ⋅ X + b) = a ⋅ E( X) + b for any constants a,b

• Proof:


    E(X1 + X2) = Σ_{s∈S} p(s)·(X1(s) + X2(s))                  defn of E( )
               = Σ_{s∈S} p(s)·X1(s) + Σ_{s∈S} p(s)·X2(s)        algebra
               = E(X1) + E(X2)                                   defn of E( )

Fall 2020 UM EECS 203 Lecture 16 63


Linearity of Expectations… Examples
• The expected value of the sum of random variables is the sum
of their expectations.
E( X1 + X2 ) = E( X1 ) + E( X2 )
E(a ⋅ X + b) = a ⋅ E( X) + b for any constants a,b

• Example 1: X = sum of two dice, i.e., X((a,b)) = a+b
  – X1 = outcome of 1st die, X2 = outcome of 2nd die
  – X = X1 + X2
  – E(X) = E(X1) + E(X2) = 7/2 + 7/2 = 2(7/2) = 7

• Example 2: Y = number of 7s in 12 rolls of two dice
  – Consider each roll individually:
    • Yi = 1 (if roll i was a 7) or 0 (if not)
  – E(Yi) = 1·(1/6) + 0·(5/6) = 1/6
  – E(Y) = E(Y1 + Y2 + … + Y12) = E(Y1) + E(Y2) + … + E(Y12)
         = 1/6 + 1/6 + … + 1/6 = 12·(1/6) = 2
  (In general, the expected number of successes in n Bernoulli trials is np.)
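
A quick simulation sketch of Example 2 (indicator variables and linearity; my own code):

```python
import random

def sevens_in_12_rolls():
    """Count how many of 12 rolls of two dice sum to 7."""
    return sum(
        1
        for _ in range(12)
        if random.randint(1, 6) + random.randint(1, 6) == 7
    )

trials = 100_000
average = sum(sevens_in_12_rolls() for _ in range(trials)) / trials
print(average)  # ≈ 2.0, matching E(Y) = np = 12 * (1/6)
```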
Fall 2020 UM EECS 203 Lecture 16 64
The Renegade GSI problem
• The GSIs grade n students’ exams and enter them in a
spreadsheet. A renegade GSI then permutes the score
column randomly.
• What is the expected number of students who
nonetheless get assigned the correct score?
– (Assume all students and scores are distinct.)

A: 0
B: 1
C: 2
D: n/2
E: n(n-1)/2

Fall 2020 UM EECS 203 Lecture 16 65


The Renegade GSI problem… Solution
• What is the expected number of n students who end up with the correct score after a renegade GSI randomly permutes the scores?
  A: 0   B: 1   C: 2   D: n/2   E: n(n-1)/2

Solution
• Let X = the number of students who have the correct score.
• Let Xi be an indicator variable:
    Xi = 1 if the i-th student has the correct score, 0 otherwise
• Then X = X1 + X2 + X3 + … + Xn
• E(Xi) = 1·p(Xi = 1) + 0·p(Xi = 0) = 1·(1/n) + 0 = 1/n
  (p(Xi = 1) = 1/n because each student is equally likely to get any one of the n scores)
• E(X) = E(X1 + X2 + X3 + … + Xn)
       = E(X1) + E(X2) + … + E(Xn)     (by linearity of expectations)
       = n·E(X1) = n·(1/n) = 1
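
A simulation sketch of this result: the expected number of fixed points of a random permutation is 1, independent of n (my own code):

```python
import random

def correct_scores(n):
    """Shuffle n distinct scores and count students who still get their own score."""
    scores = list(range(n))
    random.shuffle(scores)
    return sum(1 for student, score in enumerate(scores) if student == score)

n, trials = 50, 100_000
average = sum(correct_scores(n) for _ in range(trials)) / trials
print(average)  # ≈ 1.0 regardless of n
```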
Fall 2020 UM EECS 203 Lecture 16 66
Geometric Distributions
• A coin has probability p of Tails. Flip it until you get the
first Tails; count the total number of flips (including the
last one).
– What is the sample space S?
– What is the probability distribution?

A r.v. X has a geometric distribution with parameter p if:
    p(X = k) = (1 − p)^(k−1) p
If X is geometrically distributed with parameter p then:
    E(X) = 1/p
• Proof…
Fall 2020 UM EECS 203 Lecture 16 68
Random Variable: Geometric Distribution
[Bar charts: p(X) vs. X for Geometric(1/5) (X = 1, …, 25) and Geometric(1/50) (X = 1, …, 100); each starts at p(X=1) = p and decays geometrically]

Fall 2020 UM EECS 203 Lecture 16 69


Geometric Distributions
• The sample space is countably infinite!
– S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, … }
• X(s) = {1, 2, 3, 4, 5, 6, …} (X = number of flips until the first Tails)
• The probability of a particular sequence is p(X = j) = (1 − p)^(j−1) p.
• So:
    E(X) = Σ_{j=1..∞} j·P(X = j) = Σ_{j=1..∞} j(1 − p)^(j−1) p = p · Σ_{j=1..∞} j(1 − p)^(j−1)

  Do you remember what this sum is off the top of your head? (Neither did I.)
    Σ_{j=1..∞} j(1 − p)^(j−1) = 1 / (1 − (1 − p))^2 = 1/p^2   (from Table 2 in Rosen)

  Thus, E(X) = p · (1/p^2) = 1/p
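
A quick numeric check of E(X) = 1/p (my own sketch; the infinite sum is truncated at a large j):

```python
p = 1 / 5
# E(X) = sum_{j>=1} j * (1-p)^(j-1) * p, truncated where the terms are negligible
E_X = sum(j * (1 - p) ** (j - 1) * p for j in range(1, 2000))
print(E_X)  # ≈ 5.0 = 1/p
```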
Fall 2020 UM EECS 203 Lecture 16 70
Geometric Distributions
• The sample space is countably infinite!
• The probability of a sequence: p(X = j) = (1 − p)^(j−1) p.

    E(X) = Σ_{j=1..∞} j·P(X = j)
         = Σ_{j=1..∞} P(X ≥ j)        (How many times is the event “X = j” counted in this sum?
                                        A handy alternative expression for E(X) when X is always a non-negative integer.)
         = Σ_{j=1..∞} (1 − p)^(j−1)
         = 1 / (1 − (1 − p)) = 1/p
Fall 2020 UM EECS 203 Lecture 16 71
Watching Seinfeld Reruns
• Every night a Seinfeld episode is drawn uniformly
at random from the 180 shows and broadcast.
• What is the expected number of nights you need
to watch to see all episodes?

• This is a tricky problem if you don’t start right!


– Use linearity of expectations.
– You know the expectation of a geometric distribution.

Fall 2020 UM EECS 203 Lecture 16 72


Seinfeld Reruns
• Let Xi→j be the number of days you have to watch to go from having watched i distinct shows to having watched j distinct shows. Then the total number of nights is X = X0→1 + X1→2 + … + X179→180.
• Each Xi→i+1 is geometric with success probability p = (180 − i)/180, so E(Xi→i+1) = 180/(180 − i).
• By linearity of expectations, E(X) = Σ_{i=0..179} 180/(180 − i) = 180·(1/180 + 1/179 + … + 1) ≈ 1039 nights.
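
A sketch of that sum in Python (my own code):

```python
n = 180
# E(X) = sum over i of E(X_{i -> i+1}) = sum_{i=0}^{n-1} n / (n - i)
expected_nights = sum(n / (n - i) for i in range(n))
print(expected_nights)  # ≈ 1039 nights to see all 180 episodes at least once
```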

Fall 2020 UM EECS 203 Lecture 16 73


Independence of Random Variables
• Random variables X and Y on a sample space S are independent if
the events “X(s)=a” and “Y(s)=b” are independent, for all a,b:
i.e., p(X(s) = a and Y(s) = b) = p(X(s) = a) · p(Y(s) = b)
Thm. If X and Y are independent r.v.s then:
E(XY) = E(X)E(Y)
(Not necessarily true if X and Y are dependent!)

• Independent (two flips of a fair coin): X1 and X2 are the number of heads on the first and second flip respectively. So E(X1) = E(X2) = 1/2.
  (X1X2)(HH) = 1·1 = 1. (X1X2)(HT) = 1·0 = 0. (X1X2)(TH) = 0·1 = 0. (X1X2)(TT) = 0·0 = 0.
  Then E(X1X2) = ¼·1 + ¼·0 + ¼·0 + ¼·0 = ¼ = E(X1)E(X2).

• Dependent: X1 is the number of heads on the first flip and X3 is the number of heads across both flips. So E(X1) = ½, E(X3) = 1.
  (X1X3)(HH) = 1·2 = 2. (X1X3)(HT) = 1·1 = 1. (X1X3)(TH) = 0·1 = 0. (X1X3)(TT) = 0·0 = 0.
  E(X1X3) = ¼·2 + ¼·1 + ¼·0 + ¼·0 = ¾ ≠ ½ = E(X1)E(X3).
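
A direct check of both cases over the four equally likely two-flip outcomes (my own sketch):

```python
from fractions import Fraction

outcomes = ["HH", "HT", "TH", "TT"]  # two fair coin flips
p = Fraction(1, 4)

X1 = lambda s: 1 if s[0] == "H" else 0  # heads on first flip
X2 = lambda s: 1 if s[1] == "H" else 0  # heads on second flip
X3 = lambda s: X1(s) + X2(s)            # heads across both flips

E = lambda f: sum(p * f(s) for s in outcomes)

print(E(lambda s: X1(s) * X2(s)), E(X1) * E(X2))  # 1/4 and 1/4: equal (independent)
print(E(lambda s: X1(s) * X3(s)), E(X1) * E(X3))  # 3/4 and 1/2: not equal (dependent)
```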

Fall 2020 UM EECS 203 Lecture 16 74


Limits of the Usefulness of Expectation
• Expectation tells us what to expect from a random variable
on average.
– E.g., when you roll a pair of dice many times, the average sum should
be close to 7.
– This doesn’t mean that a “7” is particularly likely.
• Similarly, when you wait for the bus, you might know what
the average arrival time is at that stop, …
– But even if you get to the bus stop a little early, there’s still a chance
you’ll have missed the bus.
• Sometimes helpful to know how widely the values of a
random variable are distributed.
– How early should you get to the bus stop to keep the chances of
missing the bus below some small probability?
• Next time: Variance!
Fall 2020 UM EECS 203 Lecture 16 77
Take Aways
• Bayes Theorem
• Bernoulli Trials
• Binomial Distribution
• Random Variables
– What is a RV?
– E(X):
• Two ways: Sum over outcomes or sum over values of X
• Linearity of expectation
• Common Distributions: geometric, binomial, etc.
– Independence of RVs

Fall 2020 UM EECS 203 Lecture 16 78
