8 Probability Review


Overview of Chapter

We will review the basics of probability theory.


Since uncertainty is a typical aspect of problems,
rigorous and accurate problem solving requires
using probability theory (i.e., math and logic).
• Specifically, we want you to:
– Understand probability concepts
– Use probability to model simple situations
– Interpret probability statements
– Manipulate and analyze models.
A Quick Note on Terminology
• We use the term chance event to refer to
something about which a decision maker is
uncertain. In turn, a chance event has more than
one possible outcome.

• Notation:
– Chance Events are designated with boldface letters (A)
– Outcomes are designated with lightface letters (A) and
sometimes letters with subscripts (Ai)
Sample Space, Events, and Set Theory
• Probability is a theory to study uncertainty. To understand
probability theory, we need some mathematical background
and an understanding of random phenomena.

• Envision an experiment whose result is not known in advance. The
sample space, A, consists of all possible outcomes.
– Flip a coin once. Before it lands, we are uncertain which side of the
coin will face up: A = {H, T}.
– To a cashier at the counter, the time until the next customer arrives
is also random: A = [0, ∞).
– Roll a die: A = {1, 2, 3, 4, 5, 6}.
– Flip a coin twice: A = {(H,H), (H,T), (T,H), (T,T)}.
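
As a quick illustration (this code is an added sketch, not from the original slides), the finite sample spaces above can be enumerated in a few lines of Python:

    from itertools import product

    # Sample space for a single coin flip
    one_flip = {"H", "T"}

    # Sample space for two coin flips: all ordered pairs of H and T
    two_flips = set(product("HT", repeat=2))
    print(two_flips)  # {('H','H'), ('H','T'), ('T','H'), ('T','T')} in some order

    # Sample space for one roll of a die
    one_roll = set(range(1, 7))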
A Little Probability Theory
1. Probabilities Must Lie Between 0 and 1
– Any probability P(A) satisfies 0 ≤ P(A) ≤ 1.
2. Probabilities Must Add Up
– If outcomes A1 and A2 are mutually exclusive (they cannot both
occur), then P(A1 or A2) = P(A1) + P(A2).
A Little Probability Theory
3. Total Probability Must Equal 1
– If we have a set of outcomes such that one (and
only one) of them has to occur, then the
probabilities of the outcomes must sum to 100%:
there is a 100% chance that exactly one of the
outcomes will occur.
– Thus, the set of outcomes is called collectively
exhaustive and mutually exclusive.
Venn Diagrams
Venn diagrams graphically represent probability.
The entire Venn diagram is the set of all possible outcomes (A1, A2,
and C), and the entire area of the diagram is 1, or 100%: there is a
100% chance that one of the outcomes in the diagram will occur.

The area of A1, e.g., is the probability of A1 occurring, say, 10%.
That is, just as A1 is 10% of the diagram's area, so it has a 10%
chance of happening.

A1 and A2 do not overlap since they are mutually exclusive; they
cannot both happen. Thus, their probabilities are additive, just as
A1, A2, and C must add up to 100%.
More Probability Formulas
4. Conditional Probability
– The probability of A happening given that B has
occurred
– Stated as "the probability of A given B"

P(A | B) = P(A and B) / P(B)

– "The probability of A given B is the joint probability
of A and B divided by the probability of B."
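
A minimal numeric sketch of this formula (an added illustration with made-up counts, not from the slides):

    # Hypothetical joint counts over 100 trials (made-up numbers)
    n_B = 40         # trials in which B occurred
    n_A_and_B = 15   # trials in which both A and B occurred

    # P(A|B) = P(A and B) / P(B); with raw counts the 1/100 factors cancel
    p_A_given_B = n_A_and_B / n_B
    print(p_A_given_B)  # 0.375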
More Probability Formulas
Conditional probability as a Venn diagram: the probability of a
certain stock price going up given that the Dow Jones went up (the
diagonally shaded area)
Conditional Probabilities (Venn Diagram)

• Suppose that P(E) = 0.6. What can you say about P(E | F) when
a) E and F are mutually exclusive?
b) E ⊆ F ?
c) F ⊆ E ?
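
One way to reason through these (the answers are not shown on the slide):
a) If E and F are mutually exclusive, P(E and F) = 0, so P(E | F) = 0.
b) If E ⊆ F, then E and F = E, so P(E | F) = P(E)/P(F) ≥ P(E) = 0.6, since P(F) ≤ 1.
c) If F ⊆ E, then E and F = F, so P(E | F) = P(F)/P(F) = 1.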
• A school is offering 3 language classes: Spanish, French and German.
These classes are open to any of the 100 students in the school. There are
28 students in the Spanish class, 26 in the French class and 16 in the
German class. There are 12 students that are in both Spanish and French,
4 in both Spanish and German, and 6 in both French and German. There
are 2 students taking all three classes.
a) If a student is chosen at random, what is the probability that he or she
is not in any of these classes?
b) If a student is chosen randomly, what is the probability that he or she is
taking exactly one language class?
c) If 2 students are chosen randomly, what is the probability that at least 1
is taking a language class?

[Venn diagram of the three classes: Spanish only 14, French only 10,
German only 8; Spanish and French only 10, Spanish and German only 2,
French and German only 4; all three 2; taking none of the classes 50.]
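
A small Python check of the three answers, reading the region counts off the Venn diagram above (the script itself is an added sketch, not from the slides):

    from math import comb

    total = 100
    only_s, only_f, only_g = 14, 10, 8    # exactly one class
    sf_only, sg_only, fg_only = 10, 2, 4  # exactly two classes
    all_three = 2

    taking_some = only_s + only_f + only_g + sf_only + sg_only + fg_only + all_three  # 50
    taking_none = total - taking_some                                                 # 50

    p_none = taking_none / total                        # (a) 0.5
    p_exactly_one = (only_s + only_f + only_g) / total  # (b) 0.32
    p_at_least_one = 1 - comb(taking_none, 2) / comb(total, 2)  # (c) ~0.7525

    print(p_none, p_exactly_one, round(p_at_least_one, 4))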
More Probability Formulas
5. Independence
– The probability of outcome A occurring stays the
same no matter which outcome of B has occurred:

P(A | B) = P(A), which implies P(A and B) = P(A) P(B)

– Note that independence is a special instance of
conditional probability.
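
A quick numeric sketch of checking independence from a joint probability (an added illustration with made-up numbers):

    # Made-up joint distribution in which A and B happen to be independent
    p_A, p_B = 0.5, 0.4
    p_A_and_B = 0.2            # equals p_A * p_B

    p_A_given_B = p_A_and_B / p_B
    print(p_A_given_B == p_A)  # True: P(A|B) = P(A)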
Conditional Probabilities
• If the occurrence of event B makes event A more likely, does
the occurrence of event A make event B more likely?

For example, suppose A is the event that the Nasdaq stock index
goes up on a given day, while B is the event that the weather is
nice on that particular day (15 to 25 degrees, humidity below
30%).
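
One way to see the answer before the next slide states it (this step is not spelled out in the original): if P(A | B) > P(A), then P(A and B) > P(A) P(B); dividing both sides by P(A) gives P(B | A) > P(B). So yes, the dependence runs in both directions.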
Some Additional Probability Rules
• Symmetry in independence: independence in
one direction implies independence in the other
direction
If P(Ai | Bj) = P(Ai), then P(Bj | Ai) = P(Bj)
– This is also a special case of conditional probability.
• Independent chance events are not the same as
mutually exclusive outcomes.
• Two chance events being probabilistically
dependent does not imply a causal relationship.
More Probability Formulas
• Conditional Independence: If A and B are
conditionally independent given C, i.e.,
P(A | B, C) = P(A | C), then learning the outcome of B
adds no new information regarding A if the outcome
of C is already known.
– For example, drownings are correlated with ice cream
consumption. It is unlikely that these two are causally
related. The explanation lies in a common cause; both
tend to happen during summer. Therefore, drownings
and ice cream consumption can be said to be
conditionally independent given the season.
More Probability Formulas

In these influence diagrams, A and B are conditionally independent
given C. The conditioning event can be either (a) a chance event or
(b) a decision.
More Probability Formulas
6. Complements
– One event is said to be the complement of another
if, together, the two make up everything that can
possibly happen: exactly one of them must occur.
Thus P(not A) = 1 - P(A).
More Probability Formulas
7. Total Probability of an Event
– A convenient way to calculate the probability of A:

P(A) = P(A and B) + P(A and not B)
     = P(A | B) P(B) + P(A | not B) P(not B)

The probability of outcome A
is made up of the probability
of outcome "A and B" and the
probability of outcome "A and
not B."
More Probability Formulas
8. Bayes’ Theorem
– Lets you simultaneously relate P(A|B), the chance that
an event A happened given B, and P(B|A), the chance B
happened given that event A occurred
• Via the symmetry principle of conditional probability:
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
– Lets you factor in false positives (concluding things
exist when in fact they don't) and false negatives
(concluding things don't exist when they do)
• Type I error—incorrectly rejecting a true null hypothesis
• Type II error—incorrectly not rejecting a false null hypothesis 
More Probability Formulas
Deriving Bayes' theorem
Starting with the symmetry formula:

P(B | A) P(A) = P(A and B) = P(A | B) P(B)

Rearranging it algebraically:

P(B | A) = P(A | B) P(B) / P(A)

And replacing P(A) with the formula for total probability:

P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | not B) P(not B)]
Bayes' Theorem

Let B1, B2, …, Bn be mutually exclusive events with
P(B1) + P(B2) + … + P(Bn) = 1 (exhaustive). Then

P(A) = P(A | B1) P(B1) + … + P(A | Bn) P(Bn),

and

P(Bj | A) = P(A | Bj) P(Bj) / [P(A | B1) P(B1) + … + P(A | Bn) P(Bn)].
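
A small, self-contained Python sketch of these formulas (an added illustration; the function name posterior is mine, not from the slides):

    def posterior(priors, likelihoods):
        """Bayes' theorem for mutually exclusive, exhaustive events B1..Bn.

        priors[i]      = P(Bi)
        likelihoods[i] = P(A | Bi)
        Returns [P(B1 | A), ..., P(Bn | A)].
        """
        # Denominator is the total probability P(A)
        p_A = sum(p * l for p, l in zip(priors, likelihoods))
        return [p * l / p_A for p, l in zip(priors, likelihoods)]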
Example

There are three coins in a box. One is a two-headed coin, another
is a fair coin, and the third is a biased coin that comes up heads
75 percent of the time. Suppose that one of these three coins is
selected at random and flipped. What is the conditional probability
that the randomly selected coin was the two-headed coin, given that
the flip showed heads?
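
A worked solution (not shown on the slide): the priors are each 1/3, and the likelihoods are P(H | two-headed) = 1, P(H | fair) = 0.5, P(H | biased) = 0.75. By Bayes' theorem,

P(two-headed | H) = (1/3)(1) / [(1/3)(1) + (1/3)(0.5) + (1/3)(0.75)] = 1 / 2.25 = 4/9 ≈ 0.44.

Equivalently, with the posterior sketch above, posterior([1/3, 1/3, 1/3], [1.0, 0.5, 0.75])[0] returns 4/9.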
Example (John Hinckley’s Trial)

› In 1982 John Hinckley was on trial, accused of having attempted to kill President
Reagan.
› Hinckley's lawyers attempted to sway the jurors by convincing them that Hinckley
suffered from mental illness (specifically schizophrenia).
› Approximately 1.5% of the population suffer from schizophrenia.

› According to the available information, 30% of schizophrenia patients show brain
atrophy on a CAT scan, compared to 2% of the rest of the population.

› Hinckley's CAT scan showed brain atrophy. What is the probability that Hinckley has schizophrenia? Is this a convincing argument?

P(S | A) = P(A | S) P(S) / [P(A | S) P(S) + P(A | not S) P(not S)]

= (0.3)(0.015) / [(0.3)(0.015) + (0.02)(0.985)] = 0.186.
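
A quick numeric check in Python (an added sketch; the likelihoods are as given on the slide):

    def schizophrenia_posterior(prior):
        # Likelihoods from the slide: P(atrophy | S) = 0.30, P(atrophy | not S) = 0.02
        p_a_s, p_a_not_s = 0.30, 0.02
        return p_a_s * prior / (p_a_s * prior + p_a_not_s * (1 - prior))

    print(schizophrenia_posterior(0.015))  # ~0.186 (population base rate as prior)
    print(schizophrenia_posterior(0.10))   # ~0.625 (elevated prior; the 0.63 on the next slide)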
Example (John Hinckley’s Trial)

› To improve their argument, the lawyers can test how sensitive


is the posterior probability to the prior probability.

› For example, is it a strong statement to argue that before


undergoing the CAT scan, Hinckley is more likely than the
normal person to be mentally ill?

› If the prior probability is increased to 10% (Hinckley is nine


times likely to be normal than schizophrenic).

P(S|A)=0.63.
Random Variables

› Consider a function that assigns real numbers to the events
(outcomes) in A. Such a real-valued function is a random
variable (rv).

› The rv X is a function mapping a probability space (A, P) into
the real line R.

› E.g., when rolling two fair dice, define X as the sum of the two
dice. Then, X is a random variable with P{X = 2} =
P{(1,1)}=1/36, P{X = 3} = P{(1, 2), (2, 1)}=2/36=1/18, etc.
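
A short Python sketch (an added illustration) that builds this pmf by enumerating all 36 outcomes:

    from itertools import product
    from collections import Counter
    from fractions import Fraction

    # All 36 equally likely outcomes of rolling two fair dice
    counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
    pmf = {x: Fraction(c, 36) for x, c in counts.items()}
    print(pmf[2], pmf[3])  # 1/36 1/18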
Random Variables

› If a random variable can take on only a finite or countably
infinite number of values, it is a discrete random variable,
e.g., the random variable X representing the sum of two dice.

› If a random variable can take on an uncountable number of
values, it is a continuous random variable, e.g., the random
variable H representing the height of an AUB student.

› If X is a discrete random variable, the function
fX(x) = PX(x) = P{X = x} is the probability mass function
(pmf) of X.
Uncertain Quantities (Random Variables)

• Probability distribution
– The set of probabilities associated with all possible
outcomes of an uncertain quantity
– The probabilities must add to 1 because the events
are collectively exhaustive.
• Two types:
– Discrete uncertain events and probability distributions
– Continuous uncertain events and probability
distributions
Random Variables

› The function FX(x) = P{X ≤ x} = Σ fX(y), summing over all y ≤ x,
is the cumulative distribution function (cdf) of X.

› E.g., for the random variable X giving the sum of two dice,
FX(3) = P{X = 2} + P{X = 3} = 1/36 + 2/36 = 1/12.
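
A matching Python sketch (an added illustration) for the cdf of the two-dice sum:

    from itertools import product

    def cdf(x):
        # F_X(x) = P{X <= x} for X = sum of two fair dice, by direct enumeration
        outcomes = list(product(range(1, 7), repeat=2))
        return sum(1 for d1, d2 in outcomes if d1 + d2 <= x) / len(outcomes)

    print(cdf(3))   # 3/36 ~ 0.0833
    print(cdf(12))  # 1.0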


Random Variables

› For a continuous random variable X, the cdf is defined in terms
of a function fX(x) called the density function, where

FX(x) = P{X ≤ x} = ∫ fX(y) dy, integrating y from -∞ to x.

› Fact. For a discrete random variable, fX(x) is the jump of FX
at x: fX(x) = FX(x) - FX(x-). For a continuous random variable,
fX(x) = dFX(x)/dx.
Discrete Probability Distributions
A discrete uncertain quantity is one that can
assume a finite or countable number of possible
values, e.g., the number of cars in a parking lot.

[Figures: graph of the probability mass function; graph of the cumulative distribution function]


Continuous Probability Distributions
• A continuous uncertain quantity is one that can take any value
within a range, e.g., time, speed.
• The probability of any specific value of a continuous variable
equals zero.

[Figure: graph of the cumulative distribution function]
Expected Value: Discrete Case
• The probability-weighted average of an
uncertain quantity's possible values:

E(X) = Σ xi P(X = xi), summing over all possible values xi

– Denoted as E(X) or µX
Probability Density Functions
• A function for a continuous variable in which
the area under the curve within a specific
interval represents the probability that the
uncertain quantity will fall in that interval.
• Corresponds to the probability mass function
of discrete uncertain quantities
– Whereas height represents probability for discrete
distributions, it is area that represents probability
for continuous distributions.
Variance and Standard Deviation
• Are two ways to describe how far numbers in
a probability distribution lie from the expected
value (e.g., the average or mean)
– That is, how far a set of numbers is spread out
– Variance is denoted as Var(X) or σX²:

Var(X) = E[(X - E(X))²]

– Standard deviation is denoted as σX and is the
square root of the variance
Expected Value, Variance, Standard Deviation:
The Continuous Case
• Continuous probability distributions also have
expected values, variances, and standard
deviations.
– The defining formulas use calculus (the integral
sign ∫ replaces the summation sign Σ).
Measures of variability

› The variance of a random variable X is
Var[X] = E[(X - E[X])²] = E[X²] - (E[X])².

› The standard deviation of a random variable X is
sX = √Var[X].

› The coefficient of variation of a random variable X is
CV[X] = sX / E[X].
› The variance (standard deviation) measures the spread of the
random variable around the expectation.

› The coefficient of variation is useful when comparing


variability of different alternatives.

› Note that Var[aX + b] = a² Var[X], for any real numbers a and b
and random variable X.

› In addition, if X and Y are two independent random variables,
then Var[X + Y] = Var[X] + Var[Y].
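
A quick numeric sanity check of these two identities (an added sketch using the two-dice example, not from the slides):

    from itertools import product

    # All 36 equally likely outcomes of two independent fair dice
    outcomes = list(product(range(1, 7), repeat=2))

    def mean(vals):
        return sum(vals) / len(vals)

    def var(vals):
        m = mean(vals)
        return mean([(v - m) ** 2 for v in vals])

    d1 = [a for a, b in outcomes]          # first die
    total = [a + b for a, b in outcomes]   # X + Y

    print(var(total), 2 * var(d1))                    # equal: Var[X+Y] = Var[X] + Var[Y]
    print(var([3 * a + 5 for a in d1]), 9 * var(d1))  # equal: Var[aX+b] = a^2 Var[X]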
› Independent Random Variables

Two random variables X and Y are said to be independent if
P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y} for all x and y.
› Expectation of a Random Variable

› The expectation of a discrete random variable X is
E[X] = Σ x fX(x), summing over all possible values x.

› The expectation of a continuous random variable X is
E[X] = ∫ x fX(x) dx.

› The expectation of a random variable is the long-run average:
the value obtained if the underlying experiment is repeated a
large number of times and the resulting values are averaged.

› The expectation is "linear." That is, for two random variables
X and Y, E[aX + bY] = aE[X] + bE[Y].

› In particular, E[aX + b] = aE[X] + b.

› The expectation of a function g(X) of a random variable X is
E[g(X)] = Σ g(x) fX(x) (or the corresponding integral in the
continuous case).

› An important measure is the nth moment of X, E[X^n], n = 1, 2, …
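
A brief Python sketch (an added illustration) computing E[X], the second moment, and checking linearity for the sum of two dice:

    from itertools import product

    # X = sum of two fair dice; 36 equally likely outcomes
    xs = [d1 + d2 for d1, d2 in product(range(1, 7), repeat=2)]
    n = len(xs)

    E_X = sum(xs) / n                       # first moment: 7.0
    E_X2 = sum(x ** 2 for x in xs) / n      # second moment E[X^2] ~ 54.83
    E_aXb = sum(2 * x + 3 for x in xs) / n  # E[2X + 3]

    print(E_X, E_X2)
    print(E_aXb == 2 * E_X + 3)  # True: linearity of expectation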


› Example – Oil Wildcatting
An oil company is considering two sites for an exploratory well. Due to budget
constraints, only one well can be drilled.

Site 1: Cost to Drill $100,000


Outcome Payoff
Dry -$100,000
Low Producer $150,000
High Producer $500,000

Site 2: Cost to Drill $200,000


Outcome Payoff Probability
Dry -$200,000 0.2
Low Producer $50,000 0.8
› Conditional probabilities of outcomes at Site 1:

If Dome Structure Exists (prob 0.6):
Outcome P(Outcome | Dome)
Dry 0.6
Low Producer 0.25
High Producer 0.15

If No Dome Structure Exists (prob 0.4):
Outcome P(Outcome | No Dome)
Dry 0.85
Low Producer 0.125
High Producer 0.025
› P(Dry) = P(Dry|Dome)P(Dome) + P(Dry|No Dome)P(No Dome)
= 0.6(0.6)+0.85(0.4) = 0.7.

› P(Low)=0.25(0.6)+0.125(0.4) = 0.2.

› P(High)=0.15(0.6)+0.025(0.4) = 0.1.

› Let Xi be the profit for selecting Site i, i = 1,2.


› E[X1] = EMV(Site 1) = 0.7(-100)+0.2(150)+0.1(500) = $10K.
› E[X2] = EMV(Site 2) = 0.2(-200) + 0.8(50) = $0.

› E[X1²] = 0.7(-100)² + 0.2(150)² + 0.1(500)² = 36,500 K²

› E[X2²] = 0.2(-200)² + 0.8(50)² = 10,000 K²

› σ1² = Var(X1) = 36,500 - 10² = 36,400 K², and σ1 = 190.79 K.
› σ2² = Var(X2) = 10,000 - 0² = 10,000 K², and σ2 = 100 K.
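
A short Python check of these numbers (an added sketch; payoffs in $K):

    # Payoffs in $K, paired with the total probabilities computed above
    site1 = [(-100, 0.7), (150, 0.2), (500, 0.1)]
    site2 = [(-200, 0.2), (50, 0.8)]

    def emv(lottery):
        return sum(x * p for x, p in lottery)

    def std_dev(lottery):
        m = emv(lottery)
        return sum(p * (x - m) ** 2 for x, p in lottery) ** 0.5

    print(emv(site1), std_dev(site1))  # 10.0, ~190.79
    print(emv(site2), std_dev(site2))  # 0.0, 100.0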
