
Random Variables & Expectation

(Rosen Sections 7.2 and 7.4)

Comic by Zach Wiener, http://www.smbc-comics.com


Fall 2020 UM EECS 203 Lecture 16 1
Details to Consider
(Updates from Lecture 15)
• CSE Two Attempts Rule [203, 280, 281, and more]
– Each student gets two attempts to earn a passing grade
(C or better)
– Withdrawing from the course counts as an attempt
• Appears as a “W” on your transcript
– From EECS Undergraduate Advising Office: “Elections of EECS 203, EECS 280, and
EECS 281 during WN20, SP/SU/SS20, and FA20 will not count as attempts.”
• 281 Prerequisites
– Need average grade of 2.5 between 203 and 280
– From EECS Undergraduate Advising Office: “The move of the 2.5 GPA
requirement from enforced prerequisite to advisory was approved by the College
at the beginning of the semester. This already reflects in the public course catalog
via Wolverine Access.”
• Late Drop Deadline (F20/W21 special circumstances)
– “We will also institute a more flexible withdrawal policy, allowing students to
withdraw from a course at any time up until the end of classes and not have the
course appear on their transcript.” – email from Provost Collins to Faculty, 8/20/20

Fall 2020 UM EECS 203 Lecture 15 2


Discrete Probability: Review
• Experiment: Procedure that yields an outcome
• Sample space: Set S of all possible outcomes
• Event: a subset of S (i.e., event is a set consisting of individual outcomes)
• Probability distribution (function) p : S → [0,1]
– For s ∈ S, 0 ≤ p(s) ≤ 1 (each outcome is assigned a probability)
– Σ_{s∈S} p(s) = 1 (probabilities sum to 1)

• Probability of an event E ⊆ S: p(E) = Σ_{s∈E} p(s)


• Conditional probability: p(F|E) = p(E∩F)/p(E)
• Independent events: p(E∩F) = p(E)p(F) or p(E|F)=p(E)
• Bayes' Rule: p(E|F) = p(F|E) p(E) / p(F)
Fall 2020 UM EECS 203 Lecture 16 7
Bayes Theorem (Ch 7.3)

Fall 2020 UM EECS 203 Lecture 16 8


Diagnosing a Rare Disease
• Meningitis is rare: P(m) = 1/50000 = .00002
• Meningitis causes stiff neck: P(s | m) = 0.5
• Stiff neck is not so rare: P(s) = 1/20
• You have a stiff neck. What is P(m | s), the
probability of meningitis if you have a stiff neck?

Fall 2020 UM EECS 203 Lecture 16 9


Diagnosing a Rare Disease
• Meningitis is rare: P(m) = 1/50000 = .00002
• Meningitis causes stiff neck: P(s | m) = 0.5
• Stiff neck is not so rare: P(s) = 1/20
• You have a stiff neck. What is P(m | s)?
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
• So, you are 10 times more likely to have meningitis if
you have a stiff neck!
• But it is still very unlikely – A high likelihood
(symptom match) doesn’t overcome a very low prior
(to evidence) probability.
Fall 2020 UM EECS 203 Lecture 16 10
Bayes’ Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then

    p(F | E) = p(E | F) p(F) / p(E)

(Thomas Bayes, 1702-1761)

This rule is the foundation of Bayesian methods for probabilistic reasoning, which are very powerful and widely used in artificial intelligence applications:
• For data mining, automated diagnosis, pattern recognition, statistical modeling, even evaluating scientific hypotheses.

Fall 2020 UM EECS 203 Lecture 16 11


Why is Bayes’ Rule so useful?
P(d | s) = P(s | d) P(d) / P(s)

• Diagnostic evidence P(disease | symptom) is often hard to get.
– But it's usually what you really want.
• Causal evidence P(symptom | disease) is often easier to get.
• P(disease) is easy to get.
• P(symptom) is just a normalizer.

Fall 2020 UM EECS 203 Lecture 16 12


Not So Fast My Friend!
Sure, Bayes' Theorem works great when you've been given all the numbers you need:
    P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
Often, the denominator isn't directly given. But you usually can figure it out!
• You know p(s|m) and p(m), so you know the probability of a stiff neck from meningitis:
    p(s ∩ m) = p(s | m) p(m)
• How about a stiff neck from something else?
  – Same as asking: What is the probability of a stiff neck without meningitis?
    p(s ∩ m̄) = p(s | m̄) p(m̄)
• And what's the probability of not having meningitis?
    p(m̄) = 1 − p(m)
For any events s and m,
    p(s) = p(s ∩ m) + p(s ∩ m̄)
         = p(s | m) p(m) + p(s | m̄) p(m̄)
         = p(s | m) p(m) + p(s | m̄) (1 − p(m))
[Venn diagram: the stiff-neck event s overlapping the meningitis event m]
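
A minimal Python sketch of this computation using the meningitis numbers above (the variable names and the value of p(s|m̄) are my own assumptions, chosen so that p(s) comes out near 1/20 as on the slide):

```python
# Law of total probability + Bayes' rule for the stiff-neck example.
p_m = 1 / 50000         # prior p(m): probability of meningitis
p_s_given_m = 0.5       # likelihood p(s|m): stiff neck given meningitis
p_s_given_not_m = 0.05  # assumed p(s | not m), picked so p(s) is close to 1/20

# Denominator via total probability: p(s) = p(s|m) p(m) + p(s|not m) (1 - p(m))
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)

# Bayes' rule: p(m|s) = p(s|m) p(m) / p(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(f"p(s) = {p_s:.5f}, p(m|s) = {p_m_given_s:.6f}")  # p(m|s) ≈ 0.0002
```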
Fall 2020 UM EECS 203 Lecture 16 13
Bayes’ Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then

    p(F | E) = p(E | F) p(F) / [ p(E | F) p(F) + p(E | F̄) p(F̄) ]

(Thomas Bayes, 1702-1761)

• Replace p(E) in the denominator with something equivalent.
• Why is the bottom line the same as p(E)?
• On the previous slide: p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)
• Notice the first term in the sum in the denominator is the same as the term in the numerator! Not an accident!!
Fall 2020 UM EECS 203 Lecture 16 14
Probability of an Event
The probability of an event is the sum of the
probabilities of all of the disjoint
combinations of events that include it
    p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)

[Venn diagram: event E straddling F and F̄ within the universe U]

Fall 2020 UM EECS 203 Lecture 16 15


Proving Bayes’ Theorem
From the definition of conditional probability:
    p(F | E) = p(E ∩ F) / p(E)   and   p(E ∩ F) = p(E | F) p(F)

E is the union of disjoint events: (E ∩ F) ∪ (E ∩ F̄), so
    p(E) = p(E ∩ F) + p(E ∩ F̄) = p(E | F) p(F) + p(E | F̄) p(F̄)

Plugging in:
    p(F | E) = p(E | F) p(F) / [ p(E | F) p(F) + p(E | F̄) p(F̄) ]
Fall 2020 UM EECS 203 Lecture 16 16


Useful Identities to remember
• We can show that the following equality holds for any events Y and D:
    p(Y | D) + p(Ȳ | D) = 1
• Proof:
    p(Y | D) + p(Ȳ | D) = [ p(Y ∩ D) + p(Ȳ ∩ D) ] / p(D)      (defn of conditional probability)
                        = p( (Y ∩ D) ∪ (Ȳ ∩ D) ) / p(D)       (union of disjoint events)
                        = p( (Y ∪ Ȳ) ∩ D ) / p(D)              (distributive law of sets)
                        = p(D) / p(D) = 1                       (Y ∪ Ȳ = S, the sample space)
Fall 2020 UM EECS 203 Lecture 16 17




Important Tips for Bayes’ Theorem
• When applying Bayes’ Theorem to a situation
that involves events Y and D (or any arbitrary
notation), you can use the following identities

p(Y | D) + p(Ȳ | D) = 1
p(Y | D̄) + p(Ȳ | D̄) = 1
p(Y) + p(Ȳ) = 1
p(D) + p(D̄) = 1
Fall 2020 UM EECS 203 Lecture 16 22
False positives, False negatives
• Let D be the event that a person has the disease
• Let Y be the event that a test for the disease comes back
positive
            D                 D̄
  Y         True positive     False positive
  Ȳ         False negative    True negative

We want
• High probabilities of TP and TN
• Low probabilities of FP and FN

Y ∩ D : True positive (accurate prediction)
Y ∩ D̄ : False positive (inaccurate prediction)
Ȳ ∩ D : False negative (inaccurate prediction)
Ȳ ∩ D̄ : True negative (accurate prediction)
Fall 2020 UM EECS 203 Lecture 16 23
Testing for Disease (Medical Diagnosis)
• The disease is very rare: p(D) = 1/100,000
• Testing is Accurate
  – Few False Positives: p(Y | D̄) = 0.5%
  – Few False Negatives: p(Ȳ | D) = 1%
• Suppose you get a positive result. What do you conclude?
    p(Y | D) = 1 − p(Ȳ | D) = 0.99

    p(D | Y) = p(Y | D) p(D) / [ p(Y | D) p(D) + p(Y | D̄) p(D̄) ]
             = (0.99 × 10⁻⁵) / (0.99 × 10⁻⁵ + 0.005 × (1 − 10⁻⁵))
             ≈ 0.002
Ans: If you get a positive result, there is a 0.2% chance that you have the disease.
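
A quick check of this arithmetic in Python (a sketch; the variable names are mine):

```python
from math import isclose

p_D = 1e-5               # prior: the disease is very rare
p_Y_given_D = 0.99       # sensitivity = 1 - false-negative rate (1%)
p_Y_given_notD = 0.005   # false-positive rate (0.5%)

# Bayes' theorem with the total-probability denominator
p_Y = p_Y_given_D * p_D + p_Y_given_notD * (1 - p_D)
p_D_given_Y = p_Y_given_D * p_D / p_Y

print(p_D_given_Y)  # ≈ 0.002: a positive test still leaves only ~0.2% chance of disease
assert isclose(p_D_given_Y, 0.002, rel_tol=0.05)
```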
Fall 2020 UM EECS 203 Lecture 16 24
Exercise
• When a test for steroids is given to soccer players
– 98% of the players taking steroids test positive
– 12% of the players not taking steroids test positive
• Suppose 5% of soccer players take steroids. What is
the probability that a soccer player who tests
positive takes steroids?

Fall 2020 UM EECS 203 Lecture 16 25


Exercise
• When a test for steroids is given to soccer players
– 98% of the players taking steroids test positive
– 12% of the players not taking steroids test positive
• Suppose 5% of soccer players take steroids. What is
the probability that a soccer player who tests
positive takes steroids?
Shorthand notation : S = Event that a player takes steroids
Y = Event that a player gets a positive result
    p(S | Y) = p(Y | S) p(S) / [ p(Y | S) p(S) + p(Y | S̄) p(S̄) ]
             = (0.98 × 0.05) / (0.98 × 0.05 + 0.12 × 0.95)
             ≈ 0.301
Fall 2020 UM EECS 203 Lecture 16 26
Automobile Diagnosis Example
• Say that a car engine can have 4 different
(mutually-exclusive) kinds of failures: Electrical,
Fuel, eXhaust, and Timing-belt.
– We know the probability that these happen:
p(E)=.05, p(F)=.1, p(X)=.01, p(T)=.001

• For each kind of failure, we know the probability of symptoms that include: Won't-start, Idles-rough, Stalls, Revs-high.
  – E.g., p(W|E) = .2, p(I|E) = .4, … p(W|F) = .7, …

• If the car Won't-Start, what is the probability that the car has an Electrical problem?
Fall 2020 UM EECS 203 Lecture 16 27
Automobile Diagnosis Example
If the car Won’t-Start, what is the probability that the
car has an Electrical problem?
P(E | W) = P(W | E) P(E) / P(W)
         = P(W | E) P(E) / [ P(W | E) P(E) + P(W | Ē) P(Ē) ]
         = P(W | E) P(E) / [ P(W | E) P(E) + P(W | F) P(F) + P(W | X) P(X) + P(W | T) P(T) ]

We are looking at what proportion of all of the samples for event W are also associated with event E, and at the probability of event E versus the other events.
[Venn diagram: the Won't-start event W overlapping the failure events E, F, X, T]

Fall 2020 UM EECS 203 Lecture 16 28


Generalized Bayes’ Theorem
• If E is an event from sample space S
• And F1, F2, …, Fn are mutually exclusive events that entirely cover S: ⋃_{i=1..n} Fi = S, and thus Σ_{i=1..n} p(Fi) = 1
• And the probability of all events > 0
• Then:

    P(Fj | E) = P(E | Fj) p(Fj) / Σ_{i=1..n} P(E | Fi) p(Fi)

(Thomas Bayes, 1702-1761)

In an earlier slide, the denominator was called the “normalizer”. Why?
Answer: This is what ensures that Σ_{j=1..n} P(Fj | E) = 1
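
A sketch of the generalized rule in Python, applied to the automobile example. The slides give only p(W|E) = 0.2 and p(W|F) = 0.7, so the values used below for p(W|X) and p(W|T) are made-up placeholders:

```python
# Generalized Bayes: posterior over mutually exclusive failure hypotheses given evidence W.
priors = {"E": 0.05, "F": 0.10, "X": 0.01, "T": 0.001}  # p(failure type), from the slide
likelihood = {"E": 0.2, "F": 0.7, "X": 0.3, "T": 0.9}   # p(W | failure); X and T are assumed

# Denominator: total probability of the evidence (the "normalizer")
p_W = sum(likelihood[h] * priors[h] for h in priors)

posterior = {h: likelihood[h] * priors[h] / p_W for h in priors}
print(posterior["E"])           # P(E | W)
print(sum(posterior.values()))  # 1.0 -- the normalizer makes the posteriors sum to 1
```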
Fall 2020 UM EECS 203 Lecture 16 29
Coin-toss Example
A coin whose probability of getting heads is 2/3 is tossed 8 times.
What is the probability of exactly 3 heads in the 8 tosses?

One sequence with 3 heads: THHTTHTT

Probability of this sequence: (1/3)(2/3)(2/3)(1/3)(1/3)(2/3)(1/3)(1/3) = (2/3)^3 · (1/3)^5

• So, what is the probability of any particular sequence with 3 heads?  (2/3)^3 · (1/3)^5
• How many sequences with 3 heads are there?  C(8,3)

Probability of getting 3 heads in the 8 tosses:
    P(3 heads) = C(8,3) · (2/3)^3 · (1/3)^5
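
A minimal Python check of this number (my own snippet, not from the slides):

```python
from math import comb

p_heads = 2 / 3
n, k = 8, 3
# Each particular sequence with 3 heads has probability (2/3)^3 * (1/3)^5,
# and there are C(8,3) such sequences.
p_3_heads = comb(n, k) * p_heads**k * (1 - p_heads)**(n - k)
print(p_3_heads)  # ≈ 0.0683
```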
Fall 2020 UM EECS 203 Lecture 16 39
Coin-toss Example
A coin whose probability of getting heads is 2/3 is tossed 8 times.
What is the probability of exactly 3 heads in the 8 tosses?

One sequence with 3 heads: THHTTHTT

Probability of this sequence: (1/3)(2/3)(2/3)(1/3)(1/3)(2/3)(1/3)(1/3) = (2/3)^3 · (1/3)^5

• So, what is the probability of any particular sequence with 3 heads?  (2/3)^3 · (1/3)^5
• How many sequences with 3 heads are there?  C(8,3)

Probability of getting 3 heads in the 8 tosses:
    P(3 heads) = C(8,3) · (2/3)^3 · (1/3)^5

Take-aways from the example:
• The experiment consists of repeating a simpler experiment n times.
• This simpler experiment is “simpler” because its sample space consists of two outcomes.
• In turn, the sample space of the experiment is the Cartesian product of the “simpler” two-outcome sample spaces.
• Events of interest may consist of the number of times one of the two simpler experimental outcomes occurs.
Fall 2020 UM EECS 203 Lecture 16 40
Bernoulli Trials, Binomial Experiment, and
Binomial Distribution
• Bernoulli Trial
– Experiment that has exactly 2 outcomes, S = {success, failure}
• p(success) = p
• p(failure) = q = (1-p)
• Binomial Experiment
– Repeat the Bernoulli trial n times, where each trial has the same
probability of success and all trials are mutually independent
• Binomial Distribution
– Let E be the event that k successes occur in n Bernoulli trials. Then
    p(E) = C(n,k) p^k q^(n−k)

Fall 2020 UM EECS 203 Lecture 16 41


Binomial Distribution
Binomial Distribution:
The probability of exactly k successes in n independent (and
identically distributed) Bernoulli trials is
    C(n,k) p^k q^(n−k)

• Binomial Theorem: for any x and y, (x + y)^n = Σ_{k=0..n} C(n,k) x^k y^(n−k)
• Binomial expansion of (p + q)^n:
  – Each term gives the probability of k successes
    (p + q)^n = Σ_{k=0..n} C(n,k) p^k q^(n−k) = C(n,0) q^n + C(n,1) p q^(n−1) + … + C(n,n) p^n

• What is (p + q)^n?
    (p + q)^n = (p + (1 − p))^n = 1^n = 1
• Why is this important?
Answer: The sum of the probabilities must be 1
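
A small sketch verifying that these binomial probabilities sum to 1 (my own code):

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 2 / 3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)  # 1.0 (up to floating-point error)
assert isclose(total, 1.0)
```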
Fall 2020 UM EECS 203 Lecture 16 43
Random Variables
A random variable is a function from
a sample space to the real numbers
X : S → R

[Diagram: X maps each outcome in S to a point on the real number line R]

Note: random variable is a bad choice of name!


• It is a function, not a variable
• And the function isn’t random
But this is the name we’re stuck with!
Fall 2020 UM EECS 203 Lecture 16 45
Random Variables
• A random variable is a function X : S → R
• “X = r” is the event {s∈S | X(s) = r},
• “X ≥ r” is the event {s∈S | X(s) ≥ r}, etc.


Suppose our experiment is a roll of 2 dice. S is set of pairs.


• X = sum of two dice. X((2,3)) = 5
• Y = difference between two dice. Y((2,3)) = 1
• Z = max of two dice. Z((2,3)) = 3

Fall 2020 UM EECS 203 Lecture 16 46


Random Variable: Example
• Roll 2 dice. X(s)= sum of numbers on outcome s.
• X((1,1))= 2,
• X((1,2))= X((2,1))= 3,
• X((1,3))= X((2,2))= X((3,1))= 4,
• X((1,4))= X((2,3))= X((3,2))= X((4,1))= 5,
• X((1,5))= X((2,4))= X((3,3))= X((4,2))= X((5,1))= 6,
• X((1,6))= X((2,5))= X((3,4))= X((4,3))= X((5,2))= X((6,1))= 7,
• X((2,6))= X((3,5))= X((4,4))= X((5,3))= X((6,2))= 8,
• X((3,6))= X((4,5))= X((5,4))= X((6,3))= 9,
• X((4,6))= X((5,5))= X((6,4))= 10,
• X((5,6))= X((6,5))= 11,
• X((6,6))= 12

• Events and their probabilities:
  – “X=7” = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)};  p(X=7) = 6/36 = 1/6
  – “X>10” = {(5,6), (6,5), (6,6)};  p(X>10) = 3/36 = 1/12
  – “X=7 or X=11” = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1), (5,6), (6,5)};  p(X=7 or X=11) = 8/36
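
These event probabilities can be checked by enumerating the 36 equally likely outcomes (a small Python sketch; the function and variable names are mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # sample space S for two dice
p = Fraction(1, len(outcomes))                   # each outcome has probability 1/36

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for s in outcomes if event(s))

X = lambda s: s[0] + s[1]   # random variable: sum of the two dice
print(prob(lambda s: X(s) == 7))        # 1/6
print(prob(lambda s: X(s) > 10))        # 1/12
print(prob(lambda s: X(s) in (7, 11)))  # 8/36 = 2/9
```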

Fall 2020 UM EECS 203 Lecture 16 47


Random Variable: Example
• Roll 2 dice. X = sum of two numbers on dice.
  – X((1,1)) = 2,
  – X((1,2)) = X((2,1)) = 3,
  – X((1,3)) = X((2,2)) = X((3,1)) = 4,
  – X((1,4)) = X((2,3)) = X((3,2)) = X((4,1)) = 5,
  – X((1,5)) = X((2,4)) = X((3,3)) = X((4,2)) = X((5,1)) = 6,
  – X((1,6)) = X((2,5)) = X((3,4)) = X((4,3)) = X((5,2)) = X((6,1)) = 7,
  – X((2,6)) = X((3,5)) = X((4,4)) = X((5,3)) = X((6,2)) = 8,
  – X((3,6)) = X((4,5)) = X((5,4)) = X((6,3)) = 9,
  – X((4,6)) = X((5,5)) = X((6,4)) = 10,
  – X((5,6)) = X((6,5)) = 11,
  – X((6,6)) = 12.

[Bar chart: “Probabilities of the sum of two dice” — p(X) vs. X for X = 2 through 12, peaking at p(X=7) ≈ 0.167]

Fall 2020 UM EECS 203 Lecture 16 48


Random Variable: Binomial Distribution
X is the number of successes in n Bernoulli trials where p(success) = 1/2:

    P(X = k) = C(n,k) / 2^n

[Bar charts: p(X) vs. X for Binomial(12, 1/2) and Binomial(24, 1/2); both are symmetric and peak at X = n/2]

Fall 2020 UM EECS 203 Lecture 16 49


Random Variable: Binomial Distribution
X is the number of successes in n Bernoulli trials where the probability of a success is p:

    P(X = k) = C(n,k) p^k (1 − p)^(n−k)

[Bar charts: p(X) vs. X for Binomial(24, 1/5) and Binomial(24, 4/5); they are mirror images, peaking near np = 4.8 and 19.2 respectively]

Fall 2020 UM EECS 203 Lecture 16 50


Expected Value
• Let p be a probability distribution over S.
• The expected value of X : S → R is the average of
X over S, according to (“weighted by”) the
distribution p:

    E(X) = Σ_{s∈S} p(s)·X(s) = Σ_r p(X = r)·r

The first sum is over the outcomes in the sample space S; the second sum is over the possible values r that X can take.
Fall 2020 UM EECS 203 Lecture 16 58
Expected Value
    E(X) = Σ_{s∈S} p(s)·X(s) = Σ_r p(X = r)·r

• Expected value of a roll of a (fair) die:
  • S = {1,2,3,4,5,6}, p(s) = 1/6 and X(s) = s, for all s ∈ S.
    E(X) = Σ_{s∈S} p(s)·X(s)
         = p(1)X(1) + p(2)X(2) + p(3)X(3) + p(4)X(4) + p(5)X(5) + p(6)X(6)
         = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6
         = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2

• Expected value of a single-number lottery for $10M with a 1 in 100M chance of winning:
  • Choose a number between 1 and 100,000,000; the winning number wins $10M.
  • S = {1, 2, 3, …, 10^8}, X(s) ∈ {0, 10M} = {0, 10^7}
  • p(X = 10^7) = 10^−8, p(X = 0) = 1 − p(X = 10^7) = 1 − 10^−8
  – E(X) = Σ_r r·p(X = r) = 0·p(X = 0) + 10^7·p(X = 10^7) = 0 + 10^7·10^−8 = 1/10
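
Both ways of computing E(X) for the fair die, as a short Python sketch (my own code):

```python
from fractions import Fraction

# Fair die: outcomes 1..6, each with probability 1/6, and X(s) = s.
p = {s: Fraction(1, 6) for s in range(1, 7)}
X = lambda s: s

# Sum over outcomes in S
E_over_outcomes = sum(p[s] * X(s) for s in p)

# Sum over the values r that X can take
values = {X(s) for s in p}
E_over_values = sum(r * sum(p[s] for s in p if X(s) == r) for r in values)

print(E_over_outcomes, E_over_values)  # 7/2 both ways
```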
Fall 2020 UM EECS 203 Lecture 16 59
Random Variable: Example 1
• Roll 2 dice. X= sum of two numbers on dice. What is E(X)?
– X((1,1))= 2,
– X((1,2))= X((2,1))= 3,
– X((1,3))= X((2,2))= X((3,1))= 4,
– X((1,4))= X((2,3))= X((3,2))= X((4,1))= 5,
– X((1,5))= X((2,4))= X((3,3))= X((4,2))= X((5,1))= 6,
– X((1,6))= X((2,5))= X((3,4))= X((4,3))= X((5,2))= X((6,1))= 7,
– X((2,6))= X((3,5))= X((4,4))= X((5,3))= X((6,2))= 8,
– X((3,6))= X((4,5))= X((5,4))= X((6,3))= 9,
– X((4,6))= X((5,5))= X((6,4))= 10,
– X((5,6))= X((6,5))= 11,
– X((6,6))= 12.
• E(X) = Σ_{s∈S} p(s)·X(s) would add up 36 values (every outcome)
• E(X) = Σ_r p(X = r)·r would add up 11 values (every value of X)
  – Either way, E(X) = 7.  Is there an easier way to arrive at this conclusion?
Fall 2020 UM EECS 203 Lecture 16 60
Random Variable: Example 2
• If you rolled two dice a dozen times, what would be the expected
number of times that a 7 would be rolled?
– Bernoulli trials again! There could be anywhere from 0 to 12
7’s rolled. n = 12, p = 1/6
– Let Y be the r.v. of the number of 7s in 12 rolls.
    E(Y) = Σ_{k=0..12} k·p(Y = k) = 2, where p(Y = k) = C(12, k) p^k q^(12−k)
Once again, is there an easier way to get this answer, without evaluating a 13-term sum?

Fall 2020 UM EECS 203 Lecture 16 61


Linearity of Expectations
• The expected value of the sum of random
variables is the sum of their expectations.
E( X1 + X2 ) = E( X1 ) + E( X2 )
E(a ⋅ X + b) = a ⋅ E( X) + b for any constants a,b

• Proof:


    E(X1 + X2) = Σ_{s∈S} p(s)·(X1(s) + X2(s))                  defn of E( )
               = Σ_{s∈S} p(s)·X1(s) + Σ_{s∈S} p(s)·X2(s)        algebra
               = E(X1) + E(X2)                                   defn of E( )

Fall 2020 UM EECS 203 Lecture 16 63


Linearity of Expectations… Examples
• The expected value of the sum of random variables is the sum
of their expectations.
E( X1 + X2 ) = E( X1 ) + E( X2 )
E(a ⋅ X + b) = a ⋅ E( X) + b for any constants a,b

• Example 1: X = sum of two dice, i.e., X((a,b)) = a+b
  – X1 = outcome of 1st die, X2 = outcome of 2nd die
  – X = X1 + X2
  – E(X) = E(X1) + E(X2) = 7/2 + 7/2 = 2(7/2) = 7

• Example 2: Y = number of 7s in 12 rolls of two dice
  – Consider each roll individually:
    • Yi = 1 (if roll i was a 7) or 0 (if not)
  – E(Yi) = 1·(1/6) + 0·(5/6) = 1/6
  – E(Y) = E(Y1 + Y2 + … + Y12) = E(Y1) + E(Y2) + … + E(Y12)
         = 1/6 + 1/6 + … + 1/6 = 12·(1/6) = 2
  (In general, the expected number of successes in n Bernoulli trials is np.)
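
A quick simulation sketch of Example 2 (indicator variables and linearity; my own code):

```python
import random

def sevens_in_12_rolls():
    """Count how many of 12 rolls of two dice sum to 7."""
    return sum(
        1
        for _ in range(12)
        if random.randint(1, 6) + random.randint(1, 6) == 7
    )

trials = 100_000
average = sum(sevens_in_12_rolls() for _ in range(trials)) / trials
print(average)  # ≈ 2.0, matching E(Y) = np = 12 * (1/6)
```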
Fall 2020 UM EECS 203 Lecture 16 64
The Renegade GSI problem
• The GSIs grade n students’ exams and enter them in a
spreadsheet. A renegade GSI then permutes the score
column randomly.
• What is the expected number of students who
nonetheless get assigned the correct score?
– (Assume all students and scores are distinct.)

A: 0
B: 1
C: 2
D: n/2
E: n(n-1)/2

Fall 2020 UM EECS 203 Lecture 16 65


The Renegade GSI problem… Solution
• What is the expected number of n students who end up with the correct score after a renegade GSI randomly permutes the scores?
  A: 0   B: 1   C: 2   D: n/2   E: n(n-1)/2

Solution
• Let X = the number of students who have the correct score.
• Let Xi be an indicator variable:
    Xi = 1 if the i-th student has the correct score, 0 otherwise
• Then X = X1 + X2 + X3 + … + Xn
• E(Xi) = 1·p(Xi = 1) + 0·p(Xi = 0) = 1·(1/n) + 0 = 1/n
  (p(Xi = 1) = 1/n because each student is equally likely to get any one of the n scores)
• E(X) = E(X1 + X2 + X3 + … + Xn)
       = E(X1) + E(X2) + … + E(Xn)     (by linearity of expectations)
       = n·E(X1) = n·(1/n) = 1
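
A simulation sketch of this result: the expected number of fixed points of a random permutation is 1, independent of n (my own code):

```python
import random

def correct_scores(n):
    """Shuffle n distinct scores and count students who still get their own score."""
    scores = list(range(n))
    random.shuffle(scores)
    return sum(1 for student, score in enumerate(scores) if student == score)

n, trials = 50, 100_000
average = sum(correct_scores(n) for _ in range(trials)) / trials
print(average)  # ≈ 1.0 regardless of n
```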
Fall 2020 UM EECS 203 Lecture 16 66
Geometric Distributions
• A coin has probability p of Tails. Flip it until you get the
first Tails; count the total number of flips (including the
last one).
– What is the sample space S?
– What is the probability distribution?

A r.v. X has a geometric distribution with parameter p if:
    p(X = k) = (1 − p)^(k−1) p
If X is geometrically distributed with parameter p then:
    E(X) = 1/p
• Proof…
Fall 2020 UM EECS 203 Lecture 16 68
Random Variable: Geometric Distribution
[Bar charts: p(X) vs. X for Geometric(1/5) (X = 1, …, 25) and Geometric(1/50) (X = 1, …, 100); each starts at p(X=1) = p and decays geometrically]

Fall 2020 UM EECS 203 Lecture 16 69


Geometric Distributions
• The sample space is countably infinite!
– S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, … }
• X(s) = {1, 2, 3, 4, 5, 6, …} (X = number of flips until the first Tails)
• The probability of a particular sequence is p(X = j) = (1 − p)^(j−1) p.
• So:
    E(X) = Σ_{j=1..∞} j·P(X = j) = Σ_{j=1..∞} j(1 − p)^(j−1) p = p · Σ_{j=1..∞} j(1 − p)^(j−1)

  Do you remember what this sum is off the top of your head? (Neither did I.)
    Σ_{j=1..∞} j(1 − p)^(j−1) = 1 / (1 − (1 − p))^2 = 1/p^2   (from Table 2 in Rosen)

  Thus, E(X) = p · (1/p^2) = 1/p
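
A quick numeric check of E(X) = 1/p (my own sketch; the infinite sum is truncated at a large j):

```python
p = 1 / 5
# E(X) = sum_{j>=1} j * (1-p)^(j-1) * p, truncated where the terms are negligible
E_X = sum(j * (1 - p) ** (j - 1) * p for j in range(1, 2000))
print(E_X)  # ≈ 5.0 = 1/p
```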
Fall 2020 UM EECS 203 Lecture 16 70
Geometric Distributions
• The sample space is countably infinite!
• The probability of a sequence: p(X = j) = (1 − p)^(j−1) p.

    E(X) = Σ_{j=1..∞} j·P(X = j)
         = Σ_{j=1..∞} P(X ≥ j)        (How many times is the event “X = j” counted in this sum?
                                        A handy alternative expression for E(X) when X is always a non-negative integer.)
         = Σ_{j=1..∞} (1 − p)^(j−1)
         = 1 / (1 − (1 − p)) = 1/p
Fall 2020 UM EECS 203 Lecture 16 71
Watching Seinfeld Reruns
• Every night a Seinfeld episode is drawn uniformly
at random from the 180 shows and broadcast.
• What is the expected number of nights you need
to watch to see all episodes?

• This is a tricky problem if you don’t start right!


– Use linearity of expectations.
– You know the expectation of a geometric distribution.

Fall 2020 UM EECS 203 Lecture 16 72


Seinfeld Reruns
• Let Xi→j be the number of days you have to watch to go from having watched i distinct shows to having watched j distinct shows. Then the total number of nights is X = X0→1 + X1→2 + … + X179→180.
• Each Xi→i+1 is geometric with success probability p = (180 − i)/180, so E(Xi→i+1) = 180/(180 − i).
• By linearity of expectations, E(X) = Σ_{i=0..179} 180/(180 − i) = 180·(1/180 + 1/179 + … + 1) ≈ 1039 nights.
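
A sketch of that sum in Python (my own code):

```python
n = 180
# E(X) = sum over i of E(X_{i -> i+1}) = sum_{i=0}^{n-1} n / (n - i)
expected_nights = sum(n / (n - i) for i in range(n))
print(expected_nights)  # ≈ 1039 nights to see all 180 episodes at least once
```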

Fall 2020 UM EECS 203 Lecture 16 73


Independence of Random Variables
• Random variables X and Y on a sample space S are independent if
the events “X(s)=a” and “Y(s)=b” are independent, for all a,b:
i.e., p(X(s) = a and Y(s) = b) = p(X(s) = a) · p(Y(s) = b)
Thm. If X and Y are independent r.v.s then:
E(XY) = E(X)E(Y)
(Not necessarily true if X and Y are dependent!)

• Independent (two flips of a fair coin): X1 and X2 are the number of heads on the first and second flip respectively. So E(X1) = E(X2) = 1/2.
  (X1X2)(HH) = 1·1 = 1. (X1X2)(HT) = 1·0 = 0. (X1X2)(TH) = 0·1 = 0. (X1X2)(TT) = 0·0 = 0.
  Then E(X1X2) = ¼·1 + ¼·0 + ¼·0 + ¼·0 = ¼ = E(X1)E(X2).

• Dependent: X1 is the number of heads on the first flip and X3 is the number of heads across both flips. So E(X1) = ½, E(X3) = 1.
  (X1X3)(HH) = 1·2 = 2. (X1X3)(HT) = 1·1 = 1. (X1X3)(TH) = 0·1 = 0. (X1X3)(TT) = 0·0 = 0.
  E(X1X3) = ¼·2 + ¼·1 + ¼·0 + ¼·0 = ¾ ≠ ½ = E(X1)E(X3).
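
A direct check of both cases over the four equally likely two-flip outcomes (my own sketch):

```python
from fractions import Fraction

outcomes = ["HH", "HT", "TH", "TT"]  # two fair coin flips
p = Fraction(1, 4)

X1 = lambda s: 1 if s[0] == "H" else 0  # heads on first flip
X2 = lambda s: 1 if s[1] == "H" else 0  # heads on second flip
X3 = lambda s: X1(s) + X2(s)            # heads across both flips

E = lambda f: sum(p * f(s) for s in outcomes)

print(E(lambda s: X1(s) * X2(s)), E(X1) * E(X2))  # 1/4 and 1/4: equal (independent)
print(E(lambda s: X1(s) * X3(s)), E(X1) * E(X3))  # 3/4 and 1/2: not equal (dependent)
```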

Fall 2020 UM EECS 203 Lecture 16 74


Limits of the Usefulness of Expectation
• Expectation tells us what to expect from a random variable
on average.
– E.g., when you roll a pair of dice many times, the average sum should
be close to 7.
– This doesn’t mean that a “7” is particularly likely.
• Similarly, when you wait for the bus, you might know what
the average arrival time is at that stop, …
– But even if you get to the bus stop a little early, there’s still a chance
you’ll have missed the bus.
• Sometimes helpful to know how widely the values of a
random variable are distributed.
– How early should you get to the bus stop to keep the chances of
missing the bus below some small probability?
• Next time: Variance!
Fall 2020 UM EECS 203 Lecture 16 77
Take Aways
• Bayes Theorem
• Bernoulli Trials
• Binomial Distribution
• Random Variables
– What is a RV?
– E(X):
• Two ways: Sum over outcomes or sum over values of X
• Linearity of expectation
• Common Distributions: geometric, binomial, etc.
– Independence of RVs

Fall 2020 UM EECS 203 Lecture 16 78
