Probability/Conditional Probability

Probability
Conditional Probability

Motivation

In the previous chapter, we have actually only dealt with unconditional probabilities. To be more precise, in the problems encountered in the previous chapter, the sample space is defined initially, and all probabilities are assigned with respect to that initial sample space. However, in many situations, after defining the initial sample space for a random experiment, we may get some new information about the random experiment. Hence we may need to update the sample space based on those information. The probability based on this updated sample space is known as a conditional probability.

To illustrate how we get the new information and update the sample space correspondingly, consider the following example:

Example. Suppose we randomly draw a card from the poker deck (such that every card is equally likely to be drawn). We define the sample space to be the set containing all 52 cards.

(a) Calculate the probability that an ace is drawn.

(b) Suppose an ace is drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(c) Suppose two aces are drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(d) Suppose three aces are drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(e) Suppose we instead randomly draw four cards from the deck. Calculate the probability that four aces are drawn.

Solution.

(a) The probability is ${\frac {4}{52}}$ (since there are four aces in the deck).

(b) With the given condition (an ace is removed from the deck), we know that there are only 51 cards, and 3 aces in the deck. So, the sample space is updated to contain these 51 cards. Although we cannot precisely describe the sample space since we do not know which ace is removed, we know that the sample space has 51 sample points and only 3 aces. It is also reasonable to assume that the sample points in the updated sample space are equally likely. Hence, the probability should be ${\frac {3}{51}}$ . (Notice that we are still able to calculate the probability, despite we do not know exactly what the updated sample space is.)

(c) Similar to (b), the sample space is updated to contain 50 cards and two aces. Hence, the probability is ${\frac {2}{50}}$ .

(d) Similar to (b), the sample space is updated to contain 49 cards and one ace. Hence, the probability is ${\frac {1}{49}}$ .

(e) Clearly, we can use the concept of combinatorial probability to calculate this probability: ${\frac {1}{\binom {52}{4}}}={\frac {1}{270725}}$ .

Another argument (which may be quite intuitive to many people) is to consider the four draws "one by one".

For the first draw, the probability is ${\frac {4}{52}}$ .
For the second draw, we know that an ace is drawn for the first draw, so the probability is ${\frac {3}{51}}$ (similar to (b)).
For the third draw, we know that two aces are drawn for the first two draws, so the probability is ${\frac {2}{50}}$ (similar to (c)).
For the fourth draw, we know that three aces are drawn for the first three draws, so the probability is ${\frac {1}{49}}$ (similar to (d)).

Then, we are somehow told that multiplying all the probabilities gives the desired one ("multiplication rule of probability"): ${\frac {4}{52}}\times {\frac {3}{51}}\times {\frac {2}{50}}\times {\frac {1}{49}}={\frac {1}{270725}},$ which turns out to be the same as the answer above.

It turns out that this argument is valid, but actually we have not discussed any results that justify this argument in the previous chapter. Indeed, we have implicitly used the concept of conditional probability (in 2,3,4 above), where the sample space is updated as we have new information/knowledge. (In some sense, the (conditional) probability reflect our state of knowledge about the random experiment (about the deck).)

From the example above, we are able to calculate the (conditional) probability "reasonably" through some arguments (see (b)) when the sample points in the initial sample space are equally likely. Furthermore, we can notice that the condition should be an occurrence of an event, which involves the sample points in the sample space. When the condition does not involves the sample points at all, it is irrelevant to the random experiment. For example, if the condition is "the poker deck costs $10", then this is clearly not an event in the sample space and does not involve the sample points. Also, it is irrelevant from this experiment.

To motivate the definition of conditional probability, let us consider more precisely how we obtain the (conditional) probability in (b). In (b), we are given that an ace is drawn out from the poker deck beforehand. This means that ace can never be drawn in our draw. This corresponds to the occurrence of the event (with respect to original sample space) $\{{\text{not drawing that ace from the poker deck}}\}$ (denoted by $B$ ) which consists of 51 sample points, resulting from excluding that ace from the original 52 sample points. Thus, we can regard the condition as the occurrence of event $B$ . Now, under this condition, the sample space is updated to be the set $B$ , that is, only points in $B$ are regarded as legit sample points now.

Consider part (b) again. Let us denote by $A$ the event $\{{\text{an ace is drawn from the deck}}\}$ (with respect to the original sample space).

Now, only part of the points (whose also lie in the set $B$ ) in the set $A$ are regarded as legit sample points. All other points in set $A$ are no longer legit sample points anymore under this condition. In other words, only the points in both sets $A$ and $B$ (i.e., in set $A\cap B$ ) are legit sample points in event $A$ under this condition.

In the part (b) above, only the three aces remaining in the deck (in both sets $A$ and $B$ , and hence in set $A\cap B$ ) are considered to be legit sample points. The other ace in set $A$ (the ace that is drawn out in the condition) is not considered to be a legit sample point, since that ace is not in the deck at all!

To summarize, when we want to calculate the conditional probability of event $A$ given the occurrence of event $B$ , we do the following:

We update the sample space to the set $B$ .
We only regard the sample points in set $A\cap B$ to be the (valid) sample points of event $A$ .

In the above example, we encounter a special case where the sample points in the initial sample space (assumed to be finite) are equally likely (and hence the sample points in the updated sample space $B$ should also be equally likely). In this case, using the result about combinatorial probability (in previous chapter), the conditional probability, denoted by $\mathbb {P} (A|B)$ , is given by $\mathbb {P} (A|B)={\frac {|A\cap B|}{|B|}}={\frac {{\text{number of (valid) sample points in }}A}{{\text{number of sample points in updated sample space }}B}}.$ (Notice that " $\mathbb {P} (A|B)$ " is just a notation ( $\mathbb {P} (\cdot |B)$ is a function, $A$ is the "input"). Merely " $A|B$ " means nothing. Particularly, " $A|B$ " is not an event/set.)

When the sample points are not equally likely, we can apply a theorem in the previous chapter for constructing probability measure on the updated sample space $B$ . (Here, we assume that $B$ is countable.) Particularly, since we are only regarding the sample points in set $A\cap B$ as the (valid) sample points of event $A$ , it seems that the (naive) "conditional probability" of $A$ given the occurrence of event $B$ should be given by $\mathbb {P} (A\cap B)=\sum _{\omega \in {\color {darkgreen}A\cap B}}^{}\mathbb {P} (\{\omega \})$ according to that theorem (where $\mathbb {P}$ is the probability measure in the initial probability space $(\Omega ,{\mathcal {F}},\mathbb {P} )$ ).

However, when we apply the original probability measure $\mathbb {P}$ (in the original probability space) to every singleton event in the new sample space $B$ , we face an issue: the sum of those probabilities are just $\sum _{\omega \in B}^{}\mathbb {P} (\{\omega \})=\mathbb {P} (B),$ which is not 1 in general! But that theorem requires this probability to be 1! A natural remedy to this problem is to define a new probability measure $\mathbb {P} (\cdot |B):{\mathcal {F}}\to [0,1]$ , based on the original probability measure and the above (naive) "conditional probability" $\mathbb {P} (A\cap B)$ , such that the sum must be 1. After noticing that $1={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}$ , a natural choice of such probability measure is given by $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},$ for every $A\in {\mathcal {F}}$ . The probability $\mathbb {P} (B)$ can be interpreted as the normalizing constant, and every (naive) "conditional probability" (as suggested previously) is scaled by a factor of ${\frac {1}{\mathbb {P} (B)}}$ . (It turns out that the probability measure $\mathbb {P} (\cdot |B):{\mathcal {F}}\to [0,1]$ defined in this way also satisfies all the probability axioms. We will prove this later.)

When comparing this formula with the formula for the equally likely sample points case, the two formulas look quite similar actually. In fact, we can express the formula for the equally likely sample points case in the same form as this formula (since the equally likely sample points case is actually just a special case for the theorem we are considering): ${\frac {|A\cap B|}{|B|}}={\frac {|A\cap B|/|\Omega |}{|B|/|\Omega |}}={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}.$

A graphical illustration for the above motivation.

So, now we have developed a reasonable and natural formula to calculate the conditional probability $\mathbb {P} (A|B)$ when the outcomes are equally likely (applicable to finite sample space), and the outcomes are not equally likely (only for countable sample space). It is thus natural to also use the same formula when the sample space is uncountable. This motivates the following definition of conditional probability:

Definition

Definition. (Conditional probability) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be events. Assume that $\mathbb {P} (B)>0$ . Then, The conditional probability of event $A$ given the occurrence of event $B$ , denoted by $\mathbb {P} (A|B)$ , is $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}.$

Remark.

The assumption that $\mathbb {P} (B)>0$ prevents the above formula gives an undefined value.
It follows that $\mathbb {P} (A\cap B)=\mathbb {P} (A|B)\mathbb {P} (B)$ .

This formula can be illustrated by the following tree diagram:


                *-------> A    P(A n B)
              / P(A|B)
  P(B)       /
   *------> B 
  /          \
 /            \
/              *-------> A^c
\               
 \
  \
   *------> B^c

Sometimes, the probability $\mathbb {P} (A\cap B)$ is unknown, and we are given $\mathbb {P} (A|B)$ and $\mathbb {P} (B)$ . In this case, we can apply this formula to get $\mathbb {P} (A\cap B)$ .
Also, the value of $\mathbb {P} (A|B)$ is often not stated in the question explicitly. Instead, we may have to use our intuition to understand the situation to get the conditional probability $\mathbb {P} (A|B)$ .
Besides, in a more complicated situation, it may not be clear that what the condition " $B$ " in $\mathbb {P} (A|B)$ is, and therefore we may have to decide what should we condition at, depending on the context. (When we condition on some appropriate events, we may be able to determine the conditional probability readily.)

Through this definition, we induces (in some sense) another probability space where the conditional probability is taken to be the probability measure: $(\Omega ,{\mathcal {F}},\mathbb {P} (\cdot |B))$ , where $\mathbb {P} (\cdot |B):{\mathcal {F}}\to [0,1]$ is defined by

$\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}$

for every

A\in {\mathcal {F}}

.

(Technically speaking, since we have not proved that

\mathbb {P} (\cdot |B)

satisfies all probability axioms, we cannot say that

(\Omega ,{\mathcal {F}},\mathbb {P} (\cdot |B))

is a probability space yet. But it turns out to be the case.)

Example. ( $\mathbb {P} (\cdot |B)$ is a valid probability measure) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be events. Assume that $\mathbb {P} (B)>0$ . Define the function $\mathbb {P} (\cdot |B):{\mathcal {F}}\to [0,1]$ by $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}$ for every $A\in {\mathcal {F}}$ .

Prove that $\mathbb {P} (\cdot |B)$ is a valid probability measure, that is, this function satisfies all three probability axioms.

Proof.

P1 (Nonnegativity): Since $\mathbb {P} (A\cap B)\geq 0$ and $\mathbb {P} (B)>0$ ( $\mathbb {P}$ satisfies all the probability axioms, particularly, the nonnegativity), it follows that $\mathbb {P} (A|B)\geq 0$ for every event $A\in {\mathcal {F}}$ .

P2 (Unitarity): We have $\mathbb {P} (\Omega |B)={\frac {\mathbb {P} (\Omega \cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1.$

P3 (Countable additivity): For every infinite sequence of pairwise disjoint events $A_{1},A_{2},\dotsc$ , we have $\mathbb {P} \left(\bigcup _{i=1}^{\infty }A_{i}{\bigg |}B\right)={\frac {\mathbb {P} {\big (}(A_{1}\cup A_{2}\cup \dotsb )\cap B{\big )}}{\mathbb {P} (B)}}={\frac {\mathbb {P} {\big (}(A_{1}\cap B)\cup (A_{2}\cap B)\cup \dotsb {\big )}}{\mathbb {P} (B)}}{\overset {\text{ P3 }}{=}}{\frac {\sum _{i=1}^{\infty }\mathbb {P} (A_{i}\cap B)}{\mathbb {P} (B)}}=\sum _{i=1}^{\infty }{\frac {\mathbb {P} (A_{i}\cap B)}{\underbrace {\mathbb {P} (B)} _{{\text{not involving }}i}}}{\overset {\text{ def }}{=}}\sum _{i=1}^{\infty }\mathbb {P} (A_{i}|B).$

$\Box$

Example. (Special cases for conditional probability)

Suppose $B\subseteq A$ . Then, when we update the sample space to set $B$ , the set $A$ contains all (legit) sample points in the updated sample space. Thus, the conditional probability $\mathbb {P} (A|B)$ should be 1. Alternatively and more intuitively, given that $B$ occurs, this means the realized outcome lies in set $B$ , and hence must lie in set $A$ (since $B\subseteq A$ ). So, the probability should be 1.

Formally, we can see this readily: $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1$ ( $A\cap B=B$ when $B\subseteq A$ .)

Suppose $A$ and $B$ are disjoint. Then, when we update the sample space to set $B$ , the set contains zero (legit) sample point in the updated sample space. Thus, the conditional probability $\mathbb {P} (A|B)$ should be 1. Alternatively and more intuitively, given that $B$ occurs, this means the realized outcome lies in set $B$ . So, it must not lie in set $A$ . Hence, the probability should be 0.

Formally, we can see this readily: $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (\varnothing )}{\mathbb {P} (B)}}=0$ .

Example. Suppose we roll a fair dice once. Let $E$ and $P$ be the events that even number comes up and prime number comes up respectively.

(a) Calculate $\mathbb {P} (E|P)$ .

(b) Calculate $\mathbb {P} (P|E)$ .

Solution.

(a) $\mathbb {P} (E|P)={\frac {\mathbb {P} (E\cap P)}{\mathbb {P} (P)}}={\frac {1/6}{3/6}}={\frac {1}{3}}.$ (b) $\mathbb {P} (P|E)={\frac {\mathbb {P} (P\cap E)}{\mathbb {P} (E)}}={\frac {1/6}{2/6}}={\frac {1}{3}}.$

(Alternatively, we may use the formula obtained in the motivation section since all sample points are equally likely: (a) $\mathbb {P} (E|P)={\frac {|E\cap P|}{|P|}}={\frac {1}{3}}$ ; (b) $\mathbb {P} (P|E)={\frac {|P\cap E|}{|E|}}={\frac {1}{3}}$ .)

Exercise.

(a) Calculate $\mathbb {P} (P|P\cap E)$ .

(b) Calculate $\mathbb {P} (P|P\cup E)$ .

Solution

(a) $\mathbb {P} (P|P\cap E)={\frac {\mathbb {P} (P\cap E)}{\mathbb {P} (P\cap E)}}=1.$

(b) $\mathbb {P} (P|P\cup E)={\frac {\mathbb {P} (P\cap (P\cup E))}{\mathbb {P} (P\cup E)}}={\frac {\mathbb {P} (P)}{\mathbb {P} (P\cup E)}}={\frac {3/6}{5/6}}={\frac {3}{5}}.$ (We have $P\cap (P\cup E)=P$ since $P\subseteq P\cup E$ .)

Example. Suppose Amy tosses a fair coin twice, and she tells Bob that she obtains at least one heads in the two tosses.

(a) Calculate the probability that the outcome of the two tosses is "HH" (two heads).

(b) Calculate the probability that the outcome of the two tosses is "TT" (two tails).

Solution.

(a) The probability is $\mathbb {P} (\{HH\}|\{{\text{at least one heads}}\})={\frac {\mathbb {P} (\overbrace {\{HH\}\cap \{{\text{at least one heads}}\}} ^{\{HH\}})}{\mathbb {P} (\underbrace {\{{\text{at least one heads}}\}} _{\{HH,HT,TH\}})}}={\frac {1/4}{3/4}}={\frac {1}{3}}.$

(b) Notice that $\{TT\}\cap \{{\text{at least one heads}}\}=\varnothing$ . Thus, the probability is 0.

Exercise. A student claims that the probability that the outcome of the two tosses is "HH" should be ${\frac {1}{2}}$ . He reasons as follows:

When Bob is told that there is at least one heads in the two tosses, we can update the sample space to be $B=\{{\text{exactly one head}},{\text{two heads}}\}$ . The probability is thus ${\frac {|\{HH\}|}{|B|}}={\frac {1}{2}}.$

Point out the mistakes made by the student.

Solution

The mistake is that the two outcomes in $B$ are not equally likely. Indeed, the outcome "exactly one head" in the set $B$ can actually be decomposed into two more specific outcomes: $HT$ and $TH$ . So, strictly speaking $B$ is not even a sample space!

Example. (Boy or Girl paradox) Amy is a mother with two children, where each of them is equally likely to be a boy or girl.

(a) Suppose Amy tells you that she has at least one son. Calculate the probability that Amy has two sons .

(b) Suppose Amy tells you that her older child is a son. Calculate the probability that Amy has two sons.

Solution.

Define the sample space to be $\{BB,BG,GB,GG\}$ where $BG$ represents the older child is boy and the younger child is a girl, etc. Then, all four sample points in the sample space are equally likely.

(a) The probability is $\mathbb {P} (\{{\text{2 sons}}\}|\{\geq 1{\text{ son}}\})=\mathbb {P} (\{BB\}|\{BB,BG,GB\})=\mathbb {P} (\{BB\}|\{BB,BG,GB\})={\frac {\mathbb {P} (\overbrace {\{BB\}\cap \{BB,BG,GB\}} ^{\{BB\}})}{\mathbb {P} (\{BB,BG,GB\})}}={\frac {1/4}{3/4}}={\frac {1}{3}}.$

(b) The probability is $\mathbb {P} (\{{\text{2 sons}}\}|\{{\text{older child is a son}}\})=\mathbb {P} (\{BB\}|\{BB,BG\})={\frac {\mathbb {P} (\overbrace {\{BB\}\cap \{BB,BG\}} ^{\{BB\}})}{\mathbb {P} (\{BB,BG\})}}={\frac {1/4}{2/4}}={\frac {1}{2}}.$

Remark.

This example shows that even with a small change in the given information, the conditional probability obtained can be quite different. So, we should be careful about what information is actually given in order to have a correct calculation.

Example. Consider a course in probability theory. The following summarizes the probability for a student to pass the course when the student has different scores in the mid-term exam: ${\begin{array}{cc}{\text{Interval of scores}}&{\text{Probability for passing the course}}\\\hline [90,100]&0.95\\{}[70,90)&0.9\\{}[50,70)&0.5\\{}[0,50)&0.3\\\end{array}}$

(a) Suppose the probability for getting at least 50 marks in the mid-term exam (for every student) is 0.6. Calculate the probability that a student gets less than 50 marks in the mid-term exam but pass the course.

(b) Assuming that it is equally likely for the score of mid-term exam of a student to lie in one of the above four intervals, calculate the probability that a student (with unknown score) passes the course.

Solution.

(a) Let $A$ be the event that a student gets at least 50 marks in the mid-term exam, $P$ be the event that a student passes the course. Then, the desired probability is $\mathbb {P} (A^{c}\cap P)$ . From the question, we know that $\mathbb {P} (P|A^{c})=0.3$ , and $\mathbb {P} (A)=0.6$ . Hence, the desired probability is $\mathbb {P} (A^{c}\cap P)=\mathbb {P} (P|A^{c})\mathbb {P} (A^{c})=0.3(1-\mathbb {P} (A))=0.3(0.4)=0.12.$

(b) Let $A_{1},A_{2},A_{3},A_{4}$ be the event that the score of mid-term exam of a student lies in $[90,100],[70,90),[50,70),[0,50)$ respectively. From the assumption, we have $\mathbb {P} (A_{1})=\mathbb {P} (A_{2})=\mathbb {P} (A_{3})=\mathbb {P} (A_{4})=0.25$ . Notice that the events $P\cap A_{1},P\cap A_{2},P\cap A_{3},P\cap A_{4}$ are pairwise disjoint. Also, $(P\cap A_{1})\cup (P\cap A_{2})\cup (P\cap A_{3})\cup (P\cap A_{4})=P.$ (possibly seen by Venn diagram informally) Hence, by the finite additivity, the desired probability is $\mathbb {P} (P)=\mathbb {P} (P\cap A_{1})+\mathbb {P} (P\cap A_{2})+\mathbb {P} (P\cap A_{3})+\mathbb {P} (P\cap A_{4})=\underbrace {\mathbb {P} (P|A_{1})} _{0.95}\underbrace {\mathbb {P} (A_{1})} _{0.25}+\underbrace {\mathbb {P} (P|A_{2})} _{0.9}\underbrace {\mathbb {P} (A_{2})} _{0.25}+\underbrace {\mathbb {P} (P|A_{3})} _{0.5}\underbrace {\mathbb {P} (A_{3})} _{0.25}+\underbrace {\mathbb {P} (P|A_{4})} _{0.3}\underbrace {\mathbb {P} (A_{4})} _{0.25}=0.6625.$

Exercise. Suppose the assumption for part (b) above still applies.

(a) Calculate the probability that a student gets at least 90 marks in the mid-term exam if the student passes the course. (Answer: approximately 0.3585)

(b) Calculate the probability that a student gets less than 50 marks in the mid-term exam if the student fails the course. (Answer: approximately 0.5185)

Solution

Let us continue using the notations defined in part (b) above.

(a) The desired probability is $\mathbb {P} (A_{1}|P)={\frac {\mathbb {P} (A_{1}\cap P)}{\mathbb {P} (P)}}={\frac {\mathbb {P} (P|A_{1})\mathbb {P} (A_{1})}{\mathbb {P} (P)}}={\frac {0.95(0.25)}{0.6625}}\approx 0.3585.$

(b) The desired probability is $\mathbb {P} (A_{4}|P^{c})={\frac {\mathbb {P} (A_{4}\cap P^{c})}{\mathbb {P} (P^{c})}}={\frac {\mathbb {P} (A_{4})-\mathbb {P} (A_{4}\cap P)}{1-\mathbb {P} (P)}}={\frac {\mathbb {P} (A_{4})-\mathbb {P} (P|A_{4})\mathbb {P} (A_{4})}{1-\mathbb {P} (P)}}={\frac {0.25-0.3(0.25)}{1-0.6625}}\approx 0.5185.$

Example. Amy rolls two fair six-faced dice, with one colored red and another colored blue (so that they are distinguishable), without looking at the dice. After Amy rolls the two dice, Bob tells Amy that there is at least one 6 coming up (assume Bob tells the truth). Calculate the probability that 6 comes up for both dice after hearing the information from Bob.

Solution.

The condition is there is at least one 6 coming up, and the probability of this condition can be calculated by inclusion-exclusion principle: ${\begin{aligned}\mathbb {P} (\{{\text{at least one 6 comes up}}\})&=\mathbb {P} (\{{\text{6 comes up for the red dice}}\}\cup \{{\text{6 comes up for the blue dice}}\})\\&=\mathbb {P} (\{{\text{6 comes up for the red dice}}\})+\mathbb {P} (\{{\text{6 comes up for the blue dice}}\})\\&\quad -\mathbb {P} (\underbrace {\{{\text{6 comes up for the red dice}}\}\cap \{{\text{6 comes up for the blue dice}}} _{=\{{\text{6 comes up for both dice}}\}}\})\\&={\frac {1}{6}}+{\frac {1}{6}}-{\frac {1}{36}}\\&={\frac {11}{36}}.\end{aligned}}$ Also, $\mathbb {P} (\{{\text{at least one 6 comes up}}\}\cap \{{\text{ 6 comes up for both dice}}\})=\mathbb {P} (\{{\text{6 comes up for both dice}}\})={\frac {1}{36}}.$ Thus, the probability that 6 comes up for both dice after hearing the information from Bob is $\mathbb {P} (\{{\text{6 comes up for both dice}}\}|\{{\text{at least one 6 comes up}}\})={\frac {1/36}{11/36}}={\frac {1}{11}}.$

Exercise.

1. Calculate the probability again if the blue dice is colored red, such that the two dice is not distinguishable.

Solution

Changing the color of dice does not affect the probability. Thus, the probability is still ${\frac {1}{11}}$ .

Remark.

After changing the color, although the number of sample points in the sample space changes, the sample points are not equally likely.

2. Chris claims that the desired probability in the example should be $1/6$ , and he reasons as follows:

Given that there is at least one 6 coming up, we know that 6 comes up in a dice. Now, we consider another dice, having six equally likely possible outcomes for the number coming up, namely 1,2,3,4,5 and 6. Thus, we can update sample space to $B=\{(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)\}$ where the second coordinate of each ordered pair represents the number coming up for another dice.
The desired event is that 6 comes up for both dice, that is, $A=\{(6,6)\}$ . Clearly, $A\cap B=A=\{(6,6)\}$ . It follows that the probability is ${\frac {|A\cap B|}{|B|}}={\frac {1}{6}}$ .

We know that the correct answer is $1/11$ , and not $1/6$ , but why is this claim wrong? (cf. this discussion)

Solution

The six outcomes considered include the outcome in which 6 comes up in either red or blue dice. But the sample points for which 6 comes up for the blue or red dice only (resp.) are missed. There are five of them, namely $(1,6),(2,6),(3,6),(4,6),(5,6)$ . So, the updated sample space suggested is actually not complete, and thus a wrong answer is obtained.

The complete updated sample space should be $B=\{({\color {red}1},{\color {blue}6}),({\color {red}2},{\color {blue}6}),({\color {red}3},{\color {blue}6}),({\color {red}4},{\color {blue}6}),({\color {red}5},{\color {blue}6}),({\color {red}6},{\color {blue}6}),({\color {red}6},{\color {blue}1}),({\color {red}6},{\color {blue}2}),({\color {red}6},{\color {blue}3}),({\color {red}6},{\color {blue}4}),({\color {red}6},{\color {blue}5}),({\color {red}6},{\color {blue}6})\},$ where first and second coordinate in each ordered pair represents the number coming up for red and blue dice respectively. The set $B$ containing 11 sample points. The desired event is $A=\{({\color {red}6},{\color {blue}6})\}$ . Hence, the correct answer is ${\frac {|A\cap B|}{|B|}}={\frac {1}{11}}$ .

Remark.

Chris' claim is correct if Bob tells Amy that 6 comes up for red (or blue) dice. There is difference between '6 comes up for red (or blue) dice' and 'there is at least one 6 coming up (which does not specify color)'. E.g., if Bob tells Amy that 6 comes up for the red dice, then the updated sample space does consist of 6 outcomes, as suggested by Chris:

$B=\{({\color {red}6},{\color {blue}1}),({\color {red}6},{\color {blue}2}),({\color {red}6},{\color {blue}3}),({\color {red}6},{\color {blue}4}),({\color {red}6},{\color {blue}5}),({\color {red}6},{\color {blue}6})\}$ where first and second coordinate in each ordered pair represents the number coming up for red and blue dice respectively.

Example. Suppose a poker deck is to be distributed to four players: player A,B,C,D. Calculate the probability that player C gets exactly 3 cards with diamond suit, with the condition

(a) Exactly 8 cards with diamond suit are distributed to players A and B, and then 13 of the remaining 44 cards are to be distributed randomly to player C.

(b) 26 cards are distributed to players A and B, where exactly 8 of them are with diamond suit. Then, the remaining 26 cards are to be distributed randomly and equally to players C and D.

Solution.

(a) Let us update the sample space to contain the possible distributions of the 13 of the 44 cards. The number of sample points in the updated sample space is ${\binom {44}{13}}$ (placing 13 card places for player C into 44 cards). On the other hand, the number of sample points in the event is ${\binom {5}{3}}{\binom {39}{10}}$ (placing 3 card places for player C into 5 cards with diamond suit, then placing 10 card places for player C into 39 cards with non-diamond suit). Then, the probability is ${\frac {{\binom {5}{3}}{\binom {39}{10}}}{\binom {44}{13}}}\approx 0.122.$

(b) Let us update the sample space to contain the possible distributions of the remaining 26 cards. The number of sample points in the updated sample space is ${\binom {26}{13}}$ (placing 13 card places for player C into 26 cards. Then, the remaining 13 cards are for player D). On the other hand, the number of sample points in the event is ${\binom {5}{3}}{\binom {21}{10}}$ . (placing 3 card places for player C into 5 distinguishable cards with diamond suit, then placing 10 card places for player C into 39 cards with non-diamond suit. The remaining 13 cards are for player D.) Then, the probability is ${\frac {{\binom {5}{3}}{\binom {21}{10}}}{\binom {26}{13}}}\approx 0.339.$

Exercise. Suppose 13 of the 52 cards are to be distributed to player C. Calculate the probability that player C gets exactly 3 cards with diamond suit. (Answer: approximately 0.056)

Solution

The probability is ${\frac {{\binom {13}{3}}{\binom {39}{10}}}{\binom {52}{13}}}\approx 0.056.$ (The number of sample points in the sample space is ${\binom {52}{13}}$ (placing 13 card places into 52 cards). The number of sample points in the event is ${\binom {13}{3}}{\binom {39}{10}}$ (placing 3 card places into 13 cards with diamond suit, and then placing 10 card places into 39 cards with non-diamond suit.)

Example. (Simpson's paradox) Suppose there are two treatments for a disease: treatment A and B. To test the effectiveness of the treatments, treatment A and B are applied to two different groups of patients with that disease. The following summarizes the number of patients with different results for the two treatments: ${\begin{array}{ccccc}&{\text{Men }}(M)&{\text{Men }}(M)&{\text{Women }}(W)&{\text{Women }}(W)\\&{\text{Recovered }}(R)&{\text{Dead }}(D)&{\text{Recovered }}(R)&{\text{Dead }}(D)\\\hline {\text{Treatment }}A&30&120&40&20\\{\text{Treatment }}B&50&150&15&5\\\end{array}}$ (Assume that there are only two outcomes for the treatment: $R$ or $D$ .)

(a) Calculate the probability of recovery for men receiving treatment A.

(b) Calculate the probability of recovery for men receiving treatment B.

(c) Calculate the probability of recovery for women receiving treatment A.

(d) Calculate the probability of recovery for women receiving treatment B.

(e) Calculate the probability of recovery for all patients receiving treatment A.

(f) Calculate the probability of recovery for all patients receiving treatment B.

(To calculate the probability of recovery, we consider the probability of picking a recovered patient when we select a patient from the pool of patients randomly such that every patient is equally likely to be picked, where the pool of patients depends on our conditions.)

Solution.

(a) The probability is $\mathbb {P} (R|M\cap A)={\frac {|R\cap M\cap A|}{|M\cap A|}}={\frac {30}{30+120}}={\frac {1}{5}}$ .

(b) The probability is $\mathbb {P} (R|M\cap B)={\frac {|R\cap M\cap B|}{|M\cap B|}}={\frac {50}{50+150}}={\frac {1}{4}}>\mathbb {P} (R|M\cap A)$ . (This means treatment B is better than A when applied to men only.)

(c) The probability is $\mathbb {P} (R|W\cap A)={\frac {|R\cap W\cap A|}{|W\cap A|}}={\frac {40}{40+20}}={\frac {2}{3}}$ .

(d) The probability is $\mathbb {P} (R|W\cap B)={\frac {|R\cap W\cap B|}{|W\cap B|}}={\frac {15}{15+5}}={\frac {3}{4}}>\mathbb {P} (R|W\cap A)$ . (This means treatment B is better than A when applied to women only.)

(e) The probability is $\mathbb {P} (R|A)={\frac {|R\cap A|}{|A|}}={\frac {30+40}{30+120+40+20}}={\frac {1}{3}}$ .

(f) The probability is $\mathbb {P} (R|B)={\frac {|R\cap B|}{|B|}}={\frac {50+15}{50+150+15+5}}={\frac {13}{44}}\approx 0.2955<\mathbb {P} (R|A)$ . (This means treatment B is worse than A when applied to all patients.)

Remark.

This example shows the Simpson's paradox where there is a reverse in direction of a comparison when data from several groups are combined into a single group.

The following is a generalization to the formula $\mathbb {P} (A\cap B)=\mathbb {P} (A|B)\mathbb {P} (B)$ . It is useful when we calculate the probability of the occurrence of multiple events together, by considering the events "one by one".

Proposition. (Multiplication rule of probability) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and ${\color {red}E_{1}},{\color {blue}E_{2}},\dotsc ,{\color {darkgreen}E_{n}}\in {\mathcal {F}}$ be events. Then, $\mathbb {P} ({\color {red}E_{1}}\cap {\color {blue}E_{2}}\cap \dotsb \cap {\color {darkgreen}E_{n}})=\mathbb {P} ({\color {red}E_{1}})\mathbb {P} ({\color {blue}E_{2}}|{\color {red}E_{1}})\mathbb {P} ({\color {purple}E_{3}}|{\color {red}E_{1}}\cap {\color {blue}E_{2}})\dotsb \mathbb {P} ({\color {darkgreen}E_{n}}|{\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}}).$

Proof. With the assumptions, we have by definition $\mathbb {P} ({\color {red}E_{1}})\mathbb {P} ({\color {blue}E_{2}}|{\color {red}E_{1}})\mathbb {P} ({\color {purple}E_{3}}|{\color {red}E_{1}}\cap {\color {blue}E_{2}})\dotsb \mathbb {P} ({\color {darkgreen}E_{n}}|{\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}}){\overset {\text{ def }}{=}}{\cancel {\mathbb {P} ({\color {red}E_{1}})}}\cdot {\frac {\cancel {\mathbb {P} ({\color {blue}E_{2}}\cap {\color {red}E_{1}})}}{\cancel {\mathbb {P} ({\color {red}E_{1}})}}}\cdot {\frac {\cancel {\mathbb {P} ({\color {purple}E_{3}}\cap {\color {red}E_{1}}\cap {\color {blue}E_{2}})}}{\cancel {\mathbb {P} ({\color {red}E_{1}}\cap {\color {blue}E_{2}})}}}\cdot \dotsb \cdot {\frac {\mathbb {P} ({\color {darkgreen}E_{n}}\cap {\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})}{\cancel {\mathbb {P} ({\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})}}}.$

$\Box$

Remark.

It is also known as chain rule of probability.

Example. Consider an urn that contains 20 distinguishable balls, where 5 of them is red, 6 of them is blue, 7 of them is green, and 2 of them is purple. Suppose we draw 5 balls from the urn.

(a) Calculate the probability of drawing 3 red balls and 2 blue balls from the urn, if the draw is done without replacement.

(b) Calculate the probability of drawing 3 red balls and 2 blue balls from the urn, if the draw is done with replacement.

Solution.

Let $R_{i},B_{j}$ be the event that red ball is drawn in $i$ -th draw, and blue ball is drawn in $j$ -th draw respectively.

(a) The desired probability can be expressed as $\mathbb {P} (R_{1}\cap R_{2}\cap R_{3}\cap B_{4}\cap B_{5})$ (or $\mathbb {P} (R_{1}\cap B_{2}\cap R_{3}\cap R_{4}\cap B_{5})$ , etc., but simply changing the order does not affect the probability in this case).

Method 1: Combinatorial probability. Let $\Omega$ be the sample space that contains all possible outcomes of the 5 ordered draws. The probability is $\mathbb {P} (R_{1}\cap R_{2}\cap R_{3}\cap B_{4}\cap B_{5})={\frac {|R_{1}\cap R_{2}\cap R_{3}\cap B_{4}\cap B_{5}|}{|\Omega |}}={\frac {5\times 4\times 3\times 6\times 5}{20\times 19\times 18\times 17\times 16}}\approx 0.0009674923.$

Method 2: Multiplication rule. Then, the probability is given by ${\begin{aligned}\mathbb {P} (R_{1}\cap R_{2}\cap R_{3}\cap B_{4}\cap B_{5})&=\mathbb {P} (R_{1})\mathbb {P} (R_{2}|R_{1})\mathbb {P} (R_{3}|R_{1}\cap R_{2})\mathbb {P} (B_{4}|R_{1}\cap R_{2}\cap R_{3})\mathbb {P} (B_{5}|R_{1}\cap R_{2}\cap R_{3}\cap B_{4})\\&={\frac {5}{20}}\times {\frac {4}{19}}\times {\frac {3}{18}}\times {\frac {6}{17}}\times {\frac {5}{16}}\\&\approx 0.0009674923.\end{aligned}}$ (For example, $\mathbb {P} (R_{2}|R_{1})={\frac {4}{19}}$ since the sample space is updated from the set containing 20 balls, to the set containing 19 balls, excluding the ball drawn out in the first draw, and also there are only four (valid) sample points (four red balls remaining) in $R_{2}$ in this updated sample space. Here, the initial sample space contains the 20 possible outcomes from drawing a ball from the urn.)

(b) By the multiplication rule, the probability is given by ${\begin{aligned}\mathbb {P} (R_{1}\cap R_{2}\cap R_{3}\cap B_{4}\cap B_{5})&=\mathbb {P} (R_{1})\mathbb {P} (R_{2}|R_{1})\mathbb {P} (R_{3}|R_{1}\cap R_{2})\mathbb {P} (B_{4}|R_{1}\cap R_{2}\cap R_{3})\mathbb {P} (B_{5}|R_{1}\cap R_{2}\cap R_{3}\cap B_{4})\\&=\mathbb {P} (R_{1})\mathbb {P} (R_{2})\mathbb {P} (R_{3})\mathbb {P} (B_{4})\mathbb {P} (B_{5})\\&=\left({\frac {5}{20}}\right)^{3}\left({\frac {6}{20}}\right)^{2}\\&=0.00140625.\end{aligned}}$ (The conditional probabilities are the same as the unconditional probabilities, since even the sample space is updated according to the conditions, there are not any changes at all! (since the draws are done with replacement) In fact, the outcome of a draw is not affected by the outcomes of other draws. Actually, it turns out that these events ( $R_{1},R_{2},R_{3},B_{4},B_{5}$ ) are (statistically) independent. We will discuss the formal definition of independence in more details in a later section. )

Two important theorems related to conditional probability, namely law of total probability and Bayes' theorem, will be discussed in the following section.

Law of total probability and Bayes' theorem

Sometimes, it is not an easy task to assign a suitable unconditional probability to an event. For instance, suppose Amy will perform a COVID-19 test, and the result is either positive or negative (impossible to have invalid results). Let $P=\{{\text{Amy tests positive at the COVID-19 test}}\}$ . What should be $\mathbb {P} (P)$ ? It is actually quite difficult to answer directly, since this probability is without any condition. In particular, it is unknown that whether Amy gets infected by COVID-19 or not, and clearly the infection will affect the probability assignment quite significantly.

On the other hand, it may be easier to assign/calculate related conditional/unconditional probabilities. Now, let $I=\{{\text{Amy gets infected by COVID-19}}\}$ . The conditional probability $\mathbb {P} (P|I)$ , called sensitivity, may be known based on the research on the COVID-19 test. Also, the conditional probability $\mathbb {P} (P^{c}|I^{c})$ , called specificity, may also be known based on the research. Besides, the probability $\mathbb {P} (I)$ may be obtained according to studies on COVID-19 infection for Amy's place of living. Since $\mathbb {P} (P)=\mathbb {P} (P\cap I)+\mathbb {P} (P\cap I^{c}),$ by the definition of conditional probability, we have $\mathbb {P} (P)=\mathbb {P} (P|I)\mathbb {P} (I)+\mathbb {P} (P|I^{c})\mathbb {P} (I^{c}).$ Since the conditional probability satisfies the probability axioms, we have the relation $\mathbb {P} (P|I^{c})=1-\mathbb {P} (P^{c}|I^{c})$ , and thus the value of $\mathbb {P} (P|I^{c})$ can be obtained. The remaining terms in the expression can also be obtained, as suggested by above. Thus, we can finally obtain the value of $\mathbb {P} (P)$ .

This shows that conditional probabilities can be quite helpful for calculating unconditional probabilities, especially when we condition appropriately so that the conditional probabilities, and the probability of the condition are known in some ways.

The following theorem is an important theorem that relates unconditional probabilities and conditional probabilities, as in above discussion.

Theorem. (Law of total probability)

An illustration of law of total probability.

Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space.

(finite case) Assume that $B_{1},B_{2},\dotsc ,B_{n}\in {\mathcal {F}}$ are pairwise disjoint events such that $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc ,\mathbb {P} (B_{n})>0$ and $A\subseteq B_{1}\cup B_{2}\cup \dotsb \cup B_{n}$ . Then,

$\mathbb {P} (A)=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb +\mathbb {P} (A|B_{n})\mathbb {P} (B_{n}).$

(countably infinite case) Assume that $B_{1},B_{2},\dotsc \in {\mathcal {F}}$ are pairwise disjoint events such that $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc >0$ and $A\subseteq B_{1}\cup B_{2}\cup \dotsb$ . Then,

$\mathbb {P} (A)=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb .$

Proof. Here we only prove the finite case. The proof for countably infinite case is similar and thus left as an exercise.

Under the assumptions, $B_{1},B_{2},\dotsc ,B_{n}$ are pairwise disjoint, and thus $A\cap B_{1},A\cap B_{2},\dotsc ,A\cap B_{n}$ are also pairwise disjoint (by observing that $(A\cap B_{1})\cap (A\cap B_{2})=A\cap (B_{1}\cap B_{2})=A\cap \varnothing =\varnothing$ , and other intersections have similar results). Also, since $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc ,\mathbb {P} (B_{n})>0$ , the conditional probabilities $\mathbb {P} (\cdot |B_{1}),\mathbb {P} (\cdot |B_{2}),\dotsc ,\mathbb {P} (\cdot |B_{n})$ are defined. Moreover, since $A\subseteq B_{1}\cup B_{2}\cup \dotsb \cup B_{n}$ , we can observe that $A=(A\cap B_{1})\cup (A\cap B_{2})\cup \dotsb \cup (A\cap B_{n})$ (through Venn diagram, informally). It follows that ${\begin{aligned}\mathbb {P} (A)&=\mathbb {P} ((A\cap B_{1})\cup (A\cap B_{2})\cup \dotsb \cup (A\cap B_{n}))\\&=\mathbb {P} (A\cap B_{1})+\mathbb {P} (A\cap B_{2})+\dotsb +\mathbb {P} (A\cap B_{n})&({\text{finite additivity}})\\&=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb +\mathbb {P} (A|B_{n})\mathbb {P} (B_{n}).&({\text{definition}})\\\end{aligned}}$

$\Box$

Exercise. Prove the countably infinite case for law of total probability.

Proof

Proof. Under the assumptions, $B_{1},B_{2},\dotsc ,$ are pairwise disjoint, and thus $A\cap B_{1},A\cap B_{2},\dotsc$ are also pairwise disjoint (by observing that $(A\cap B_{1})\cap (A\cap B_{2})=A\cap (B_{1}\cap B_{2})=A\cap \varnothing =\varnothing$ , and other intersections have similar results). Also, since $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc >0$ , the conditional probabilities $\mathbb {P} (\cdot |B_{1}),\mathbb {P} (\cdot |B_{2}),\dotsc$ are defined. Moreover, since $A\subseteq B_{1}\cup B_{2}\cup \dotsb$ , we can observe that $A=(A\cap B_{1})\cup (A\cap B_{2})\cup \dotsb$ (through Venn diagram, informally). It follows that ${\begin{aligned}\mathbb {P} (A)&=\mathbb {P} ((A\cap B_{1})\cup (A\cap B_{2})\cup \dotsb )\\&=\mathbb {P} (A\cap B_{1})+\mathbb {P} (A\cap B_{2})+\dotsb &({\text{countable additivity}})\\&=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb .&({\text{definition}})\\\end{aligned}}$

$\Box$

Now, suppose Amy has performed a COVID-19 test, and the result is positive! So now Amy is worrying about whether she really gets infected by COVID-19, or it is just a false positive. Therefore, she would like to know the conditional probability $\mathbb {P} (I|P)$ (the conditional probability for getting infected given testing positive). Notice that the conditional probability $\mathbb {P} (P|I)$ may be known (based on some research). However, it does not equal the conditional probability $\mathbb {P} (I|P)$ generally. (These two probabilities are referring to two different things.) So, now we are interested in knowing that whether there is formula that relates these two probabilities, which have somewhat "similar" expressions. See the following exercise for deriving the relationship between $\mathbb {P} (A|B)$ and $\mathbb {P} (B|A)$ :

Exercise. Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be events. Assume $\mathbb {P} (A)>0$ and $\mathbb {P} (B)>0$ . Propose a relationship between $\mathbb {P} (A|B)$ and $\mathbb {P} (B|A)$ , and prove it. (Hint: You may apply the definition of conditional probability on each of these two conditional probabilities. Do you notice any similarity in the expressions?)

Solution

Proposition: $\mathbb {P} (A|B)={\frac {\mathbb {P} (B|A)\mathbb {P} (A)}{\mathbb {P} (B)}}$ .

Proof. Applying the definition of conditional probability, we have $\mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},$ and $\mathbb {P} (B|A)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (A)}}.$ Notice that these two expressions have the same numerator. Thus, we have $\mathbb {P} (A|B)\mathbb {P} (B)=\mathbb {P} (B|A)\mathbb {P} (A).$ Rearranging the equation yields the desired result (the rearrangement is valid since $\mathbb {P} (A)$ and $\mathbb {P} (B)$ are nonzero).

$\Box$

Remark.

$\mathbb {P} (A)$ is known as the prior probability of event $A$ . It is called prior since this probability does not take into account the knowledge of the occurrence of event $B$ . It reflects our initial degrees of belief of event $A$ (according to the subjectivism interpretation of probability).
$\mathbb {P} (A|B)$ is known as the posterior probability of event. It is called posterior since this probability takes into account the knowledge of the occurrence of event $B$ . It reflects our adjusted degrees of belief of event $A$ as there are more new information (the occurrence of event $B$ in this case) (according to the subjectivism interpretation of probability).

This formula suggests us the way of adjusting/updating the prior probability $\mathbb {P} (A)$ to the posterior probability $\mathbb {P} (A|B)$ .
Of course, after obtaining the posterior probability, there may be another round of new information available, and we may need to treat this "posterior probability" as the "prior probability", and update it to a "new version" of posterior probability.

The following theorem is the generalization of the above result.

Theorem. (Bayes' theorem)

An illustration of Bayes' theorem.

Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space.

(finite case) Assume that $B_{1},B_{2},\dotsc ,B_{n}\in {\mathcal {F}}$ are pairwise disjoint events such that $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc ,\mathbb {P} (B_{n})>0$ and $A\subseteq B_{1}\cup B_{2}\cup \dotsb \cup B_{n}$ . Then,

$\mathbb {P} (B_{i}|A)={\frac {\mathbb {P} (A|B_{i})\mathbb {P} (B_{i})}{\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb +\mathbb {P} (A|B_{n})\mathbb {P} (B_{n})}}$

for every

i=1,2,\dotsc ,{\text{ or }}n

.

(countably infinite case) Assume that $B_{1},B_{2},\dotsc \in {\mathcal {F}}$ are pairwise disjoint events such that $\mathbb {P} (B_{1}),\mathbb {P} (B_{2}),\dotsc >0$ and $A\subseteq B_{1}\cup B_{2}\cup \dotsb$ . Then,

$\mathbb {P} (B_{i}|A)={\frac {\mathbb {P} (A|B_{i})\mathbb {P} (B_{i})}{\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb }}$

for every

i\in \mathbb {N}

.

Proof.

Finite case: Under the assumptions, we have by law of total probability $\mathbb {P} (A)=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb +\mathbb {P} (A|B_{n})\mathbb {P} (B_{n}).$ On the other hand, by the definition of conditional probability, $\mathbb {P} (A\cap B_{i})=\mathbb {P} (A|B_{i})\mathbb {P} (B_{i}).$ Since $\mathbb {P} (B_{i}|A)={\frac {\mathbb {P} (A\cap B_{i})}{\mathbb {P} (A)}}$ by the definition of conditional probability, the result follows.

Countably infinite case: Under the assumptions, we have by law of total probability $\mathbb {P} (A)=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb .$ On the other hand, by the definition of conditional probability, $\mathbb {P} (A\cap B_{i})=\mathbb {P} (A|B_{i})\mathbb {P} (B_{i}).$ Since $\mathbb {P} (B_{i}|A)={\frac {\mathbb {P} (A\cap B_{i})}{\mathbb {P} (A)}}$ by the definition of conditional probability, the result follows.

$\Box$

Example. Suppose the weather at a certain day can either be sunny or rainy, with equal probability. Amy has a probability of ${\color {darkgreen}0.8}$ ( ${\color {blue}0.3}$ ) to bring an umbrella at that day if the weather of that day is rainy (sunny). At a day, we see that Amy brings an umbrella. Calculate the probability for that day to be rainy.

Solution. Let $R,S,U$ be the events that the weather at that day is rainy, sunny and Amy brings an umbrella at that day respectively. Then, the probability that Amy brings an umbrella at that day is $\mathbb {P} (U)={\color {darkgreen}\mathbb {P} (U|R)}{\color {orangered}\mathbb {P} (R)}+{\color {blue}\mathbb {P} (U|S)}{\color {orangered}\mathbb {P} (S)}={\color {darkgreen}0.8}\cdot {\color {orangered}0.5}+{\color {blue}0.3}\cdot {\color {orangered}0.5}=0.55.$ by law of total probability.

Given that Amy brings an umbrella at that day, the probability for that day to be rainy is $\mathbb {P} (R|U)={\frac {{\color {darkgreen}\mathbb {P} (U|R)}{\color {orangered}\mathbb {P} (R)}}{\mathbb {P} (U)}}={\frac {{\color {darkgreen}0.8}\cdot {\color {orangered}0.5}}{0.55}}={\frac {8}{11}}$ (by Bayes' theorem).

Exercise. Assume that the weather can also be cloudy, such that the weather is twice as likely to be cloudy compared with sunny (or rainy). (That is, the probability for the weather to be cloudy is twice of the probability for the weather to be sunny (which equals the probability for the weather to be rainy).) Also, Amy has a probability $p$ to bring an umbrella at a day if the weather of that day is cloudy. Calculate $p$ such that the probability for a day to be rainy if Amy brings an umbrella at that day is ${\frac {4}{11}}$ instead of ${\frac {8}{11}}$ . (Answer: ${\frac {11}{20}}$ )

Solution

Let $C$ be the event that the weather is cloudy. Since the weather is twice as likely to be cloudy compared with sunny (or rainy) and also the weather can either be sunny, rainy, or cloudy, it follows that $\mathbb {P} (C)=2\mathbb {P} (R)=2\mathbb {P} (S)\implies 2\mathbb {P} (R)+\mathbb {P} (R)+\mathbb {P} (S)=1\implies \mathbb {P} (R)=\mathbb {P} (S)=0.25{\text{ and }}\mathbb {P} (C)=0.5.$ So, $\mathbb {P} (U)=0.8\cdot 0.25+0.3\cdot 0.25+p\cdot 0.5=0.5p+0.275$ . We want to set the value of $p$ such that the probability $\mathbb {P} (R|U)={\frac {4}{11}}$ . Hence, ${\frac {\mathbb {P} (U|R)\mathbb {P} (R)}{\mathbb {P} (U)}}={\frac {8}{11}}\implies {\frac {0.8\cdot 0.25}{0.5p+0.275}}={\frac {4}{11}}\implies p={\frac {11}{20}}.$

Example. Consider Morse code where dots and dashes are used to encode messages. Suppose Amy and Bob want to communicate through Morse code. However, when Amy sends a Morse code to Bob, there is interference during the transmission, and hence there is a ${\frac {1}{8}}$ probability for a dot (or dash) to be mistakenly received as a dash (or dot resp.). It is known that the dots and dashes are used in proportion 3:4 in the Morse code. From this, we may assume that the probability for a dot and dash to be sent is ${\frac {3}{7}}$ and ${\frac {4}{7}}$ respectively. Calculate the probability that a dot is really sent by Amy if Bob receives a dot.

Solution.

Let $S$ and $R$ be the event that a dot is sent and a dot is received respectively.

Then, we have $\mathbb {P} (R^{c}|S)=\mathbb {P} (R|S^{c})={\frac {1}{8}}.$ Furthermore, we have $\mathbb {P} (S)={\frac {3}{7}}{\text{ and }}\mathbb {P} (S^{c})={\frac {4}{7}}.$ Then, we have $\mathbb {P} (R)=\mathbb {P} (R|S)\mathbb {P} (S)+\mathbb {P} (R|S^{c})\mathbb {P} (S^{c})=\left(1-{\frac {1}{8}}\right)\cdot {\frac {3}{7}}+{\frac {1}{8}}\cdot {\frac {4}{7}}={\frac {25}{56}}.$ It follows that the desired probability is given by $\mathbb {P} (S|R)={\frac {\mathbb {P} (R|S)\mathbb {P} (S)}{\mathbb {P} (R)}}={\frac {(1-1/8)(3/7)}{25/56}}={\frac {21}{25}}.$

Example. Suppose a diagnosis for cancer is conducted for Bob. Let $C$ be the event that Bob has cancer, and $D$ be the event that Bob is diagnosed with cancer. Based on research, it is known that the probability for a correct diagnosis is 0.99. That is, $\mathbb {P} (D|C)=0.99{\text{ and }}\mathbb {P} (D^{c}|C^{c})=0.99.$ Based on the research on the incidence rate of cancer for the Bob's place of living, we may assume that $\mathbb {P} (C)=0.001$ . Calculate the probability that Bob has cancer if he is diagnosed with cancer.

Solution. The desired probability is given by $\mathbb {P} (C|D)$ . First, $\mathbb {P} (D)=\mathbb {P} (D|C)\mathbb {P} (C)+\underbrace {\mathbb {P} (D|C^{c})} _{1-\mathbb {P} (D^{c}|C^{c})}\mathbb {P} (C^{c})=0.99(0.001)+(1-0.99)(1-0.001)=0.01098.$ We have $\mathbb {P} (C|D)={\frac {\mathbb {P} (D|C)\mathbb {P} (C)}{\mathbb {P} (D)}}={\frac {\mathbb {P} (D|C)\mathbb {P} (C)}{0.01098}}={\frac {0.99(0.001)}{0.01098}}\approx 0.0902.$

Remark.

This result is quite counter-intuitive, since it appears that the probability for a correct diagnosis is quite high, but it turns out that this probability is quite low!

The main reason for this result is that the incidence rate of cancer is quite low.

Exercise. Calculate the probability that Bob does not have cancer if he is not diagnosed with cancer. (Answer: approximately 0.99999)

Solution

The desired probability is given by $\mathbb {P} (C^{c}|D^{c})$ . We have $\mathbb {P} (C^{c}|D^{c})={\frac {\mathbb {P} (D^{c}|C^{c})\mathbb {P} (C^{c})}{\mathbb {P} (D^{c})}}={\frac {0.99(1-0.001)}{1-0.01098}}\approx 0.99999.$ So, if Bob is not diagnosed with cancer, then he can be quite sure that he actually does not have cancer.

The following is a famous problem.

Example. (Monty Hall problem)

Suppose you are on a game show. There are three doors in front of you, labelled door 1,2,3, and you are allowed to open one of them. Among these three doors, a new car is behind one of the doors, and a goat is behind the other doors. If you open the door with a car behind it, then you will get the car. Otherwise, you get nothing.

Now, suppose you pick the door 1. Then the host Monty Hall, who knows what are behind the three doors, opens a door with a goat behind it (which is not chosen by you, of course). It turns out that Monty Hall opens door 3. After that, you are given an offer where you can keep choosing door 1, or switch your choice to door 2.

With these information, calculate the probability that the new car is behind door 2.

Solution.

Let $D_{1},D_{2},D_{3}$ be the event that a car is behind door 1,2,3 respectively. Also, let $M$ be the event that Monty Hall opens door 3. It should be reasonable to assume that a car is equally likely to be behind one of the doors. So, we have $\mathbb {P} (D_{1})=\mathbb {P} (D_{2})=\mathbb {P} (D_{3})={\frac {1}{3}}.$ Now, we consider different conditional probabilities related to $M$ , by considering three different cases:

Case 1: The car is behind door 1. Then, Monty Hall can open door 2 and 3, and it is reasonable to assume that it is equally likely for him to open one of them. Hence, the conditional probability $\mathbb {P} (M|D_{1})={\frac {1}{2}}$ .
Case 2: The car is behind door 2. Then, Monty Hall must open door 3 (since this is the only choice). Hence, the conditional probability $\mathbb {P} (M|D_{2})=1$ .
Case 3: The car is behind door 3. Then, Monty Hall can never open door 3 (where a car is behind it). Hence, the conditional probability $\mathbb {P} (M|D_{3})=0$ .

The desired probability is given by $\mathbb {P} (D_{2}|M)={\frac {\mathbb {P} (M|D_{2})\mathbb {P} (D_{2})}{\mathbb {P} (M|D_{1})\mathbb {P} (D_{1})+\mathbb {P} (M|D_{2})\mathbb {P} (D_{2})+\mathbb {P} (M|D_{3})\mathbb {P} (D_{3})}}={\frac {(1)(1/3)}{(1/2)(1/3)+(1)(1/3)+(0)(1/3)}}={\frac {2}{3}}.$

Remark.

Clearly, $\mathbb {P} (D_{3}|M)=0$ since it is impossible for the car to be behind door 3 if Monty Hall opens door 3. It follows that $\mathbb {P} (D_{1}|M)=1-\mathbb {P} (D_{1}^{c}|M)=1-\mathbb {P} (D_{2}|M)={\frac {1}{3}}<{\frac {2}{3}}=\mathbb {P} (D_{2}|M)$ . Thus, it is favourable for you to switch your choice to door 2.

Exercise. 1. A student thinks that the probability should be ${\frac {1}{2}}$ with the following argument:

After Monty Hall opens door 3, we know that a goat is behind it. This means a car is not behind door 3. Then, it is equally likely for the car to be behind one of door 1 and door 2. Hence, the probability is ${\frac {1}{2}}$ .

What is the mistake in this argument?

Solution

The mistake is that a wrong event is conditioned on for the above argument. To be more precise, through this reasoning the probability calculated is actually $\mathbb {P} (D_{1}|D_{3}^{c})={\frac {\overbrace {\mathbb {P} (D_{1}\cap D_{3}^{c})} ^{\mathbb {P} (D_{1})\;\because D_{1}\subseteq D_{3}^{c}}}{\mathbb {P} (D_{3}^{c})}}={\frac {\mathbb {P} (D_{1})}{1-\mathbb {P} (D_{3})}}={\frac {1/3}{1-1/3}}={\frac {1}{2}}.$ But $D_{3}^{c}\neq M$ . Particularly, when the car is not behind door 3, this does not mean Monty Hall must open door 3. It is possible that the car is behind door 1, where Monty Hall can open door 2 as well.

2. Suppose there are $N-3>0$ more doors in addition to the doors 1,2,3, labelled door 4,5,..., $N$ , such that there are $N$ doors in total. In this case, Monty Hall opens $N-2$ doors where a goat is behind each of them. It turns out that Monty Hall opens doors 3,4,..., $N$ . Now, you are given an offer where you can keep choosing door 1, or switch your choice to door 2. Calculate the probability that the car is behind door 2. (Answer: ${\frac {N-1}{N}}$ )

Solution

Let $D_{1},D_{2},D_{3},\dotsc ,D_{N}$ be the event that a car is behind door 1,2,3,..., $N$ respectively. Also, let $M$ be the event that Monty Hall opens doors 3,4,..., $N$ . It should be reasonable to assume that a car is equally likely to be behind one of the doors. So, we have $\mathbb {P} (D_{1})=\mathbb {P} (D_{2})=\dotsb =\mathbb {P} (D_{N})={\frac {1}{N}}.$ In this situation, we have $N$ cases.

Case 1: The car is behind door 1. Monty can then open $N-1$ of the doors 2,3,..., $N$ (or equivalently, choose one of the doors 2,3,..., $N$ to not open). We know that the event $M$ means Monty Hall chooses door 2 to not open. Thus, the probability is $\mathbb {P} (M|D_{1})={\frac {1}{N-1}}$ .
Case 2: The car is behind door 2. Monty Hall must then open of the doors 3,4,..., $N$ . Thus, the probability is $\mathbb {P} (M|D_{2})=1$ .
Case 3: The car is behind door 3. Then, Monty Hall must not open door 3, and hence must not open doors 3,4,..., $N$ . Thus, the probability is $\mathbb {P} (M|D_{3})=0$

With similar arguments, we have $\mathbb {P} (M|D_{4})=\mathbb {P} (M|D_{5})=\dotsb =\mathbb {P} (M|D_{N})=0.$ It follows that the desired probability is $\mathbb {P} (D_{2}|M)={\frac {\mathbb {P} (M|D_{2})\mathbb {P} (D_{2})}{\mathbb {P} (M|D_{1})\mathbb {P} (D_{1})+\mathbb {P} (M|D_{2})\mathbb {P} (D_{2})+\dotsb +\mathbb {P} (M|D_{N})\mathbb {P} (D_{N})}}={\frac {(1)(1/N)}{{\big (}1/(N-1){\big )}(1/N)+(1)(1/N)+0}}={\frac {1/N}{(1/N){\big (}1/(N-1)+1{\big )}}}={\frac {1}{(1+N-1)/(N-1)}}={\frac {N-1}{N}}.$

Remark.

For other cases in which another door is picked, the same result holds by symmetry (notations can be changed).

Example. (Gambler's ruin) Consider a gambling game as follows. The player starts with capital $\$k$ ( $k$ is a nonnegative integer and is specified by the player), and tosses a fair coin repeatedly until he wants to stop. If head comes up, then he gains $\$1$ . If tail comes up, then he loses $\$1$ .

Suppose Bob plays this game and decides to stop tossing when one of the following conditions are satisfied:

He runs out of money. (That is, his capital is $\$0$ .)
His capital is $\$N$ , where $N$ is a certain nonnegative integer specified by Bob.

Calculate the probability that Bob will eventually run out of all money if he starts with capital $\$k$ .

Solution. In this case, we should calculate the probability using recursive approach. First, let $A_{k}$ be the event that Bob will eventually run out of all money if he starts with capital $\$k$ . Also, let $B$ be the event that head comes up in the first toss.

Then, a crucial observation is that if the first toss gives a head, then the situation is exactly the same as that Bob starts playing the game with capital $\$k+1$ after the toss where he gains $\$1$ . Hence, we have $\mathbb {P} (A_{k}|B)=\mathbb {P} (A_{k+1})$ . Likewise, we have $\mathbb {P} (A_{k}|B^{c})=\mathbb {P} (A_{k-1})$ . (This is often the most important step for solving a problem recursively: identifying the recursive relation given (usually implicitly) in the question.)

Then, we can establish the equation $\mathbb {P} (A_{k})=\underbrace {\mathbb {P} (A_{k}|B)} _{\mathbb {P} (A_{k+1})}\mathbb {P} (B)+\underbrace {\mathbb {P} (A_{k}|B^{c})} _{\mathbb {P} (A_{k+1})}\mathbb {P} (B^{c})={\frac {1}{2}}\mathbb {P} (A_{k+1})+{\frac {1}{2}}\mathbb {P} (A_{k-1})\implies 2\mathbb {P} (A_{k})=\mathbb {P} (A_{k+1})+\mathbb {P} (A_{k-1})\implies \mathbb {P} (A_{k-1})-\mathbb {P} (A_{k})=\mathbb {P} (A_{k})-\mathbb {P} (A_{k+1}),$ which holds for every $k=1,2,\dotsc ,N-1$ (we do not include $0$ and $N$ since these are boundary values). Notice that we have $\mathbb {P} (A_{0})=1$ (since Bob immediately runs out of money if he starts with zero capital). On the other hand, we have $\mathbb {P} (A_{N})=0$ (since Bob immediately stops playing the game if he starts with capital $\$N$ , so it is impossible for him to run out of money).

Notice that the above equation implies that the decrement of the probability $\mathbb {P} (A_{i})$ when $i$ increases by one is the same for every $i=0,1,2,\dotsc ,N-1$ . Since there are in total $N$ decrements, it follows that every decrement is of value ${\frac {1}{N}}$ . Hence, $\mathbb {P} (A_{k})=\mathbb {P} (A_{0})\underbrace {-{\frac {1}{N}}-{\frac {1}{N}}-\dotsb -{\frac {1}{N}}} _{k{\text{ decrements}}}=1-{\frac {k}{N}}.$

Exercise. Suppose Bob is so greedy that $N\to \infty$ . Under this case, what does the probability tend to (that is, what is the limit of the above probability as $N\to \infty$ )? Suggest an intuitive meaning of this result.

Solution

Since $\lim _{N\to \infty }\left(1-{\frac {k}{N}}\right)=1$ , it follows that the probability ( $1-{\frac {k}{N}}$ ) tends to 1 as $N\to \infty$ . This means Bob will inevitably run out of all money if he is extremely greedy and never wants to stop at any fixed value of $N$ . This result is known as the Gambler's ruin.

Example. Player A and B take turn to roll a fair dice, where player A rolls the dice first. A player wins if he is the first player that rolls a six. Calculate the probability that player A wins.

Solution. We divide the situation into three cases:

Player A rolls a six in the first roll. (probability ${\frac {1}{6}}$ ) In this case, player A wins.
Player A does not roll a six in the first roll, and then player B rolls a six in the second roll. (probability ${\frac {5}{6}}\times {\frac {1}{6}}$ . This can be obtained by considering multiplication rule of probability. We can see that the probability for player B to roll a six in the second roll given that player A does not roll a six in the first roll is $1/6$ .) But in this case, the probability for player A to win is zero anyway.
Player A and B does not roll a six in the first and second roll respectively. (probability ${\frac {5}{6}}\times {\frac {5}{6}}$ )

Notice that in case 3, the situation is exactly the same as the original situation at the beginning where it is player A's turn to roll the dice. Hence, under the condition for case 3, the probability for player A to win is the same as the unconditional one.

Thus, we can write $\mathbb {P} ({\text{A wins}})={\frac {1}{6}}+{\frac {5}{6}}\times {\frac {5}{6}}\times \mathbb {P} ({\text{A wins}})\implies \mathbb {P} ({\text{A wins}})={\frac {6}{11}}.$

Remark.

Notice that eventually, one of the players must roll a six. Hence, it is impossible for the game to end up with a draw. This means the probability that player B wins is $1-{\frac {6}{11}}={\frac {5}{11}}$ .

Exercise. Suppose the dice is changed such that the probability for getting a six from rolling the dice is $p\in (0,1)$ (which is not necessarily ${\frac {1}{6}}$ ), while the player A still rolls the dice first. Show that the game is always favourable to player A for every $p\in (0,1)$ .

Solution

Proof. When the probability is $p$ , we can write the above equation as $\mathbb {P} ({\text{A wins}})=p+(1-p)^{2}\mathbb {P} ({\text{A wins}})\implies \mathbb {P} ({\text{A wins}})={\frac {p}{1-(1-p)^{2}}}.$ Thus, the probability that player B wins is $\mathbb {P} ({\text{B wins}})=1-{\frac {p}{1-(1-p)^{2}}}$ . We have ${\begin{aligned}\mathbb {P} ({\text{A wins}})-\mathbb {P} ({\text{B wins}})&={\frac {p}{1-(1-p)^{2}}}-1+{\frac {p}{1-(1-p)^{2}}}\\&={\frac {2p}{1-(1-p)^{2}}}-1\\&={\frac {2p-1+1-2p+p^{2}}{1-(1-p)^{2}}}\\&={\frac {p^{2}}{1-(1-p)^{2}}}.\\\end{aligned}}$ Notice that $p^{2}>0$ and $(1-p)^{2}<1\implies 1-(1-p)^{2}>0$ for every $p\in (0,1)$ . Hence, this difference is always positive, and hence $\mathbb {P} ({\text{A wins}})>\mathbb {P} ({\text{B wins}})$ always for every $p\in (0,1)$ .

$\Box$

Example. (Three Prisoners problem) There are three prisoners, A, B and C, which will originally be executed tomorrow. However, the governor chooses one of them randomly to be pardoned (and hence not executed). Only the warden knows who is to be pardoned, but is not allowed to tell the prisoners.

The three prisoners have also heard about this. So, prisoner A asks the warden

Which of the prisoners B and C will be executed? If both are to be executed, then just tell me randomly one of them.

The warden thinks a while, and tells the prisoner A that prisoner B is to be executed. The warden thinks that he does not give any information to prisoner A about whether he is to be pardoned or not, with the following reasoning:

Let $A,B,C$ be the events that A, B, or C is pardoned respectively. Also, let $W$ be the event that the warden says B will die. Since the prisoner to be pardoned is chosen randomly, we have $\mathbb {P} (A)=\mathbb {P} (B)=\mathbb {P} (C)={\frac {1}{3}}.$ After prisoner A hears the answer from warden, the conditional probability for him to be pardoned is $\mathbb {P} (A|W)={\frac {\mathbb {P} (A\cap W)}{\mathbb {P} (W)}}={\frac {\mathbb {P} (W|A)\mathbb {P} (A)}{\mathbb {P} (W|A)\mathbb {P} (A)+\mathbb {P} (W|B)\mathbb {P} (B)+\mathbb {P} (W|C)\mathbb {P} (C)}}={\frac {(1/2)(1/3)}{(1/2)(1/3)+(0)(1/3)+(1)(1/3)}}={\frac {1}{3}}.$ So, this probability is the same as the unconditional one.

However, prisoner A thinks that his probability of being pardoned is increased to ${\frac {1}{2}}$ with the following reasoning:

Given that prisoner B will be executed, either prisoner A or C will be pardoned. Hence, my probability of being pardoned is ${\frac {1}{2}}$ .

Explain why the prisoner A's reasoning is incorrect.

Solution. Prisoner A falsely interprets the warden's saying as the event $B^{c}$ , and he calculates $\mathbb {P} (A|B^{c})={\frac {\mathbb {P} (A\cap B^{c})}{\mathbb {P} (B^{c})}}={\frac {\mathbb {P} (B^{c}|A)\mathbb {P} (A)}{1-\mathbb {P} (B)}}={\frac {(1)(1/3)}{2/3}}={\frac {1}{2}}.$ But $B^{c}$ is not the same as $W$ ! Particularly, if B is not to be pardoned (that is, to be executed), then the warden will not necessarily tell that prisoner B is to be executed, since it is possible that C is also to be pardoned, and the warden tells C is to be executed instead.

Exercise. Suppose prisoner A tells the other two prisoners about the answer from the warden to his question.

(a) What is the probability that prisoner B is to be pardoned?

(b) What is the probability that prisoner C is to be pardoned?

Solution

Let us use the notations as defined by the warden's answer.

(a) The probability is $\mathbb {P} (B|W)={\frac {\mathbb {P} (B\cap W)}{\mathbb {P} (W)}}={\frac {\mathbb {P} (W|B)\mathbb {P} (B)}{\mathbb {P} (W)}}={\frac {0(1/3)}{\mathbb {P} (W)}}=0.$ That is, prisoner B knows that he must be executed!

(b) The probability is $\mathbb {P} (C|W)={\frac {\mathbb {P} (C\cap W)}{\mathbb {P} (W)}}={\frac {\mathbb {P} (W|C)\mathbb {P} (C)}{\mathbb {P} (W)}}={\frac {(1)(1/2)}{(1/2)(1/3)+(0)(1/3)+(1)(1/3)}}={\frac {2}{3}}.$ That is, prisoner C knows that he is twice as likely as before to be pardoned!

Remark.

From this, we actually know that the warden can give information to the prisoners about whether they are pardoned or not through his answer, if any of the prisoners B and C knows about this.

Independence

From the previous discussion, we know that the conditional probability of event $A$ given the occurrence of event $B$ can be interpreted as the probability of $A$ where the sample space is updated to event $B$ . In general, through this update, the probability of $A$ should be affected. But what if the probability is somehow the same as the one before the update?

If this is the case, then the occurrence of a particular event $B$ does not affect the probability of event $A$ actually. Symbolically, it means $\mathbb {P} (A|B)=\mathbb {P} (A)$ . If this holds, then we have $\mathbb {P} (B|A)={\frac {\mathbb {P} (A|B)\mathbb {P} (B)}{\mathbb {P} (A)}}={\frac {\mathbb {P} (A)\mathbb {P} (B)}{\mathbb {P} (A)}}=\mathbb {P} (B)$ . This means the occurrence of event $A$ also does not affect the probability of event $B$ . This result matches with our intuitive interpretation of the independence of two events, so it seems that it is quite reasonable to define the independence of events $A$ and $B$ as follows:

Two events $A$ and $B$ are independent if $\mathbb {P} (A|B)=\mathbb {P} (A)$ or $\mathbb {P} (B|A)=\mathbb {P} (B)$ .

However, this definition has some slight issues. If $\mathbb {P} (A)=0{\text{ or }}\mathbb {P} (B)=0$ , then some of the conditional probabilities involved may be undefined. So, for some events, we may not be able to tell whether they are independent or not using this "definition". To deal with this, we consider an alternative definition, that is equivalent to the above when $\mathbb {P} (A)\neq 0$ and $\mathbb {P} (B)\neq 0$ . To motivate that definition, we can see that $\mathbb {P} (A|B)=\mathbb {P} (A){\text{ or }}\mathbb {P} (B|A)=\mathbb {P} (B)\iff \mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B)$ , when both $\mathbb {P} (A)$ and $\mathbb {P} (B)$ are nonzero. This results in the following definition:

Definition. (Independence of two events) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be two events. Then, the events $A$ and $B$ are independent if $\mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B)$ .

Remark.

We sometimes write $A\perp \!\!\!\!\perp B$ when $A$ and $B$ are independent.

But what if there are more than two events involved? Intuitively, we may consider the following as the general "definition" of independence:

The events $E_{1},E_{2},\dotsc ,E_{n}$ are independent if $\mathbb {P} (E_{1}\cap E_{2}\cap \dotsb \cap E_{n})=\mathbb {P} (E_{1})\mathbb {P} (E_{2})\dotsb \mathbb {P} (E_{n})$ . ( $n$ is an integer that is at least 2.)

But we will get some strange results by using this as the "definition":

Example. Suppose we roll a dice twice. Let $A$ be the event that both rolls result in the same number, $B$ be the event that sum of the numbers obtained is between 7 and 10, $C$ be the event that the sum is 7 or 8 or 12. Show that $\mathbb {P} (A\cap B\cap C)=\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)$ (that is, the events $A,B,C$ are "independent" according to the above "definition") but $\mathbb {P} (B\cap C)\neq \mathbb {P} (B)\mathbb {P} (C)$ (that is, the events $B$ and $C$ are not independent).

Proof. First, we have $\mathbb {P} (A)={\frac {6}{36}}={\frac {1}{6}}$ , $\mathbb {P} (B)={\frac {6+5+4+3}{36}}={\frac {18}{36}}={\frac {1}{2}}$ , and $\mathbb {P} (C)={\frac {6+5+1}{36}}={\frac {12}{36}}={\frac {1}{3}}$ . Hence, we have $\mathbb {P} (A\cap B\cap C)=\mathbb {P} (\{{\text{4 is obtained for both rolls}}\})={\frac {1}{36}}={\frac {1}{6}}\times {\frac {1}{2}}\times {\frac {1}{3}}=\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C).$ But, $\mathbb {P} (B\cap C)=\mathbb {P} ({\text{the sum is 7 or 8}})={\frac {6+5}{36}}={\frac {11}{36}}\neq \mathbb {P} (B)\mathbb {P} (C)={\frac {1}{2}}\times {\frac {1}{3}}={\frac {1}{6}}.$

$\Box$

From this example, merely requiring $\mathbb {P} (E_{1}\cap E_{2}\cap \dotsb \cap E_{n})=\mathbb {P} (E_{1})\mathbb {P} (E_{2})\dotsb \mathbb {P} (E_{n})$ cannot ensure all pairs of events involved are independent. So, this suggests us another definition:

The events $E_{1},E_{2},\dotsc ,E_{n}$ are independent if all the pairs of events involved are independent.

However, we will again get some strange results by using this as the "definition":

Example. Consider two balls, where one is bigger than another. Both balls are equally likely to be red or blue. Let $A$ be the event that the bigger ball is red, $B$ be the event that the smaller ball is red, and $C$ be the event that both balls have the same color.

Show that all the pairs of events involved are independent, but $\mathbb {P} (C\cap A\cap B)\neq \mathbb {P} (C)\mathbb {P} (A\cap B)=\mathbb {P} (C)\mathbb {P} (A)\mathbb {P} (B)$ (that is, the events $C$ and $A\cap B$ are not independent.)

Proof. Consider the following tables containing relevant probabilities: ${\begin{array}{cccc}{\text{bigger ball}}\backslash {\text{smaller ball}}&{\color {red}{\text{red}}}&{\color {blue}{\text{blue}}}\\\hline {\color {red}{\text{red}}}&1/4({\color {darkgreen}\checkmark _{A}}{\color {brown}\checkmark _{B}}{\color {purple}\checkmark _{C}})&1/4({\color {darkgreen}\checkmark _{A}})\\{\color {blue}{\text{blue}}}&1/4({\color {brown}\checkmark _{B}})&1/4({\color {purple}\checkmark _{C}})\\\end{array}}$ From this, we can see that $\mathbb {P} (A)=\mathbb {P} (B)=\mathbb {P} (C)={\frac {1}{2}}$ . On the other hand, $\mathbb {P} (A\cap B)=\mathbb {P} (B\cap C)=\mathbb {P} (A\cap C)=\mathbb {P} (\{{\text{both balls are red}}\})={\frac {1}{4}}=\left({\frac {1}{2}}\right)^{2}.$ So, this means all the pairs of events involved are independent.

However, $\mathbb {P} (C\cap A\cap B)=\mathbb {P} (\{{\text{both balls are red}}\})={\frac {1}{4}}\neq \mathbb {P} (C)\mathbb {P} (A\cap B)=\mathbb {P} (C)\mathbb {P} (A)\mathbb {P} (B)={\frac {1}{8}}$ . (Actually, we have $\mathbb {P} (C|A\cap B)=1$ . This means given the occurrence of both events $A$ and $B$ , we know for sure that the event $C$ occurs. So, the knowledge of the occurrence of both events $A$ and $B$ affect the probability of event $C$ , although the probability of event $C$ is not affected by merely the occurrence of $A$ or merely the occurrence of $B$ . )

$\Box$

The above examples suggest us that we actually need both requirements

$\mathbb {P} (A\cap B\cap C)=\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)$ . (This ensures $\mathbb {P} (C|A\cap B)=\mathbb {P} (C),\mathbb {P} (A|B\cap C)=\mathbb {P} (A){\text{ and }}\mathbb {P} (B|A\cap C)=\mathbb {P} (B)$ . That is, if we condition on the intersection of two events, the probability of another one is not affected.)
All the pairs of events $A,B,C$ are independent. (That is, $\mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B),\mathbb {P} (B\cap C)\mathbb {P} (B)\mathbb {P} (C){\text{ and }}\mathbb {P} (A\cap C)=\mathbb {P} (A)\mathbb {P} (C)$ .

in the definition of independence of events $A,B,C$ for the definition to "make sense".

Similarly, the independence of four events $A,B,C,D$ should require

$\mathbb {P} (A\cap B\cap C\cap D)=\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)\mathbb {P} (D)$ .
All the triples of events are independent.

In other words, we need the probability of intersection of any three and any two events to be able to "split as product of probabilities of single events" as in above.

This leads us to the following general definition:

Definition. (Independence) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $E_{1},E_{2},\dotsc \in {\mathcal {F}}$ be events. Then, the events $E_{1},E_{2},\dotsc$ are (mutually) independent if $\mathbb {P} \left(\bigcap _{i\in I}E_{i}\right)=\prod _{i\in I}\mathbb {P} (E_{i})$ for every finite index set $I\subseteq \{1,2,\dotsc \}$ .

Remark.

This definition also applies to finitely many events $E_{1},E_{2},\dotsc ,E_{n}\in {\mathcal {F}}$ , where the index set $I$ is a subset of $\{1,2,\dotsc ,n\}$ instead.
If all the pairs of events involved are independent, then we say that $E_{1},E_{2},\dotsc ,E_{n}$ are pairwise independent. By definition, the mutual independence of $E_{1},E_{2},\dotsc ,E_{n}$ implies their pairwise independence (we just consider the index set with two elements in this case). But, the converse does not hold (discussed in a previous example).
Sometimes, based on the situation, two events may be assumed to be independent.

Example. Suppose Amy and Bob take turn to toss a fair coin, where Amy toss the coin first. Let $H_{1}$ and $H_{2}$ be the events that head comes up in the first and second toss respectively. Calculate the probability $\mathbb {P} (H_{1}\cap H_{2})$ .

Solution. It is reasonable to assume that $H_{1}$ and $H_{2}$ are independent. Hence, $\mathbb {P} (H_{1}\cap H_{2})=\mathbb {P} (H_{1})\mathbb {P} (H_{2})=\left({\frac {1}{2}}\right)^{2}={\frac {1}{4}}.$ (Alternatively, one may also reasonably assume that the four outcomes arising from these two tosses are equally likely, and hence conclude that $\mathbb {P} (H_{1}\cap H_{2})={\frac {1}{4}}$ .)

Example. Suppose a fair coin is tossed 10 times independently. Which of the following outcomes is more likely? (H represents head and T represents tail)

HHHHHHHHHT
HTTHTHHTHT

Solution. The probability for each of these outcomes is $\left({\frac {1}{2}}\right)^{10}=0.0009765625$ . So, they are equally likely.

Remark.

One may falsely think that the second outcome is more likely, since it seems the heads and tails are distributed more "equally". The reasoning behind this may be one falsely interpret the second outcome to be the same as "5 heads and 5 tails" (without specifying the order). But this is not the case: the second one specifies the outcome required in every toss.

Exercise. Calculate the probability that 5 heads and 5 tails are obtained in these 10 tosses. (Answer: approximately 0.246) (Hint: consider the number of different arrangements of 5 heads and 5 tails in these 10 tosses.)

Solution

We can regard the 10 tosses as 10 distinguishable cells with capacity one, and 5 "heads" to be 5 indistinguishable balls. Then, the process of arranging the outcomes is equivalent to placing the 5 balls into the 10 cells (the 5 empty cells are for "tails"). Hence, the number of arrangements is given by ${\binom {10}{5}}=252$ . Notice that every such arrangement is a sample point in the sample space (for the experiment of tossing a coin 10 times), and the probability of every singleton event is $\left({\frac {1}{2}}\right)^{10}$ . It follows that the desired probability is $\underbrace {\left({\frac {1}{2}}\right)^{10}+\left({\frac {1}{2}}\right)^{10}+\dotsb +\left({\frac {1}{2}}\right)^{10}} _{252{\text{ times}}}\approx 0.246.$

Example. Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be events. Show that if the events $A$ and $B$ are independent, then so is the events $A$ and $B^{c}$ .

Proof. Assume that the events $A$ and $B$ are independent. We wish to show that $\mathbb {P} (A\cap B^{c})=\mathbb {P} (A)\mathbb {P} (B^{c})$ .

We have ${\begin{aligned}\mathbb {P} (A\cap B^{c})&=\mathbb {P} (A)-\mathbb {P} (A\cap B)&({\text{a property}})\\&=\mathbb {P} (A)-\mathbb {P} (A)\mathbb {P} (B)&({\text{independence of }}A{\text{ and }}B)\\&=\mathbb {P} (A)(1-\mathbb {P} (B))\\&=\mathbb {P} (A)\mathbb {P} (B^{c}).&({\text{a property}})\\\end{aligned}}$ So, the events $A$ and $B^{c}$ are independent.

$\Box$

Exercise. Show that if the events $A$ and $B$ are independent,

(a) then so is the events $A^{c}$ and $B$ .

(b) then so is the events $A^{c}$ and $B^{c}$ .

Solution

(a)

Proof. Assume that the events $A$ and $B$ are independent. Then, ${\begin{aligned}\mathbb {P} (A^{c}\cap B)&=\mathbb {P} (B)-\mathbb {P} (A\cap B)&({\text{a property}})\\&=\mathbb {P} (B)-\mathbb {P} (A)\mathbb {P} (B)&({\text{independence of }}A{\text{ and }}B)\\&=\mathbb {P} (B)(1-\mathbb {P} (A))\\&=\mathbb {P} (B)\mathbb {P} (A^{c}).&({\text{a property}})\\\end{aligned}}$ So, the events $A^{c}$ and $B$ are independent.

$\Box$

Remark.

Actually, this result is symmetric to the one in the example above.

(b)

Proof. Assume that the events $A$ and $B$ are independent. Then, by (a) in this exercise, $A^{c}$ and $B$ are independent. It follows by the result in the example above that $A^{c}$ and $B^{c}$ are independent. (Take " $A$ " in the example above to be $A^{c}$ .)

$\Box$

In general, we have the following result.

Proposition. Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $E_{1},E_{2},\dotsc ,E_{n}\in {\mathcal {F}}$ be events (where $n$ is an arbitrary integer such that $n\geq 2$ ). If $E_{1},E_{2},\dotsc ,E_{n}$ are independent, then after replacing some of the events by their respective complements, the independence still holds.

Proof. Assume that the events $E_{1},E_{2},\dotsc ,E_{n}$ are independent. Then, we have $\mathbb {P} \left(\bigcap _{i=1}^{n}E_{i}\right)=\prod _{i=1}^{n}\mathbb {P} (E_{i}).$ Now, suppose we replace $E_{1}$ by $E_{1}^{c}$ , and we want to prove that the independence still holds: ${\begin{aligned}\mathbb {P} \left(E_{1}^{c}\cap \bigcap _{i=2}^{n}E_{i}\right)&=\mathbb {P} \left(\bigcap _{i=2}^{n}E_{i}\right)-\mathbb {P} \left(E_{1}\cap \bigcap _{i=2}^{n}E_{i}\right)&({\text{a property}})\\&={\color {blue}\prod _{i=2}^{n}\mathbb {P} (E_{i})}-\mathbb {P} (E_{1}){\color {blue}\prod _{i=2}^{n}\mathbb {P} (E_{i})}&({\text{independence of }}E_{1},E_{2},\dotsc ,E_{n})\\&={\color {blue}\prod _{i=2}^{n}\mathbb {P} (E_{i})}(1-\mathbb {P} (E_{1}))\\&={\color {blue}\prod _{i=2}^{n}\mathbb {P} (E_{i})}\mathbb {P} (E_{1}^{c}).\\\end{aligned}}$ Thus, $E_{1}^{c},E_{2},\dotsc ,E_{n}$ are still independent. By symmetry, we can instead replace an event other than $E_{1}$ by its respective complement, and the independence still holds.

Notice that we can split the process of replacing some of the events into multiple steps, where we replace one event in every step (that is, we replace the events one by one). Now, we can apply the above argument in each step to ensure that the resulting events are still independent. Thus, after all steps, and we finish the replacement, the independence still holds.

$\Box$

Example. Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $A,B\in {\mathcal {F}}$ be events such that $\mathbb {P} (A)=0$ and $\mathbb {P} (B)=1$ .

(a) Show that for every event $C$ , $A$ and $C$ are independent.

(b) Show that for every event $C$ , $B$ and $C$ are independent.

Solution.

(a)

Proof. For every event $C$ , we have $A\cap C\subseteq C$ . So, $\mathbb {P} (A\cap C)\leq \mathbb {P} (C)=0$ by monotonicity. On the other hand, $0\leq \mathbb {P} (A\cap C)$ by nonnegativity. These two inequalities imply that $\mathbb {P} (A\cap C)=0=\underbrace {\mathbb {P} (A)} _{0}\mathbb {P} (C)$ . Hence, the events $A$ and $C$ are independent.

$\Box$

Remark.

Notice that one cannot say that if $\mathbb {P} (A)=0$ , then $A=\varnothing$ (but the converse is true). It is possible to have nonempty event with probability zero. (We can have such event for continuous distribution (to be discussed later).)

(b)

Proof. Notice that $B=A^{c}$ . By (a), the events $A$ and $C$ are independent. It follows that $A^{c}=B$ and $C$ are independent (see a previous exercise).

$\Box$

Exercise. Are the events $A$ and $B$ independent?

Solution

They are independent. We can just apply (a) where the "event $C$ " is set to be the event $B$ .

Remark.

The meaning of this result is that knowledge of arbitrary event does not make a certain event less certain, and also does not make an impossible event possible, which is intuitive.

Example. Suppose a machine has three components: A, B, and C. Let $A,B,C$ be the event that the component A, B, and C to fail respectively. Based on the studies on the machine, it is known that $\mathbb {P} (A)=0.2,\mathbb {P} (B)=0.1{\text{ and }}\mathbb {P} (C)=0.01$ , and we can also assume that the events $A,B,C$ are (mutually) independent. Suppose the machine fails if one of the components fails, and operates normally otherwise.

(a) Calculate the probability that the machine operates normally.

(b) Given that the machine fails, calculate the probability that all three component fail.

Solution.

(a) The desired probability is given by $\mathbb {P} (A^{c}\cap B^{c}\cap C^{c})$ . We know that the events $A^{c},B^{c},C^{c}$ are independent. So, we have $\mathbb {P} (A^{c}\cap B^{c}\cap C^{c})=\mathbb {P} (A^{c})\mathbb {P} (B^{c})\mathbb {P} (C^{c})=(1-0.2)(1-0.1)(1-0.01)=0.7128.$

(b) The desired probability is given by $\mathbb {P} (A\cap B\cap C|{\text{machine fails}})$ . We have by definition ${\begin{aligned}\mathbb {P} (A\cap B\cap C|\{{\text{machine fails}}\})&={\frac {\mathbb {P} (A\cap B\cap C)}{\mathbb {P} (\{{\text{machine fails}}\})}}&(A\cap B\cap C\subseteq \{{\text{machine fails}}\})\\&={\frac {\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)}{1-0.7128}}\\&={\frac {0.2(0.1)(0.01)}{0.2872}}\\&\approx 0.000696.\end{aligned}}$

Exercise. Suppose instead the machine fails if at least two of the component fail, and operates normally otherwise. Repeat parts (a) and (b) above. (Answer: (a) 0.9774; (b) approximately 0.00885)

Solution

(a) The probability is ${\begin{aligned}\mathbb {P} (\{{\text{machine operates normally}}\})&=\mathbb {P} (A\cap B^{c}\cap C^{c})+\mathbb {P} (A^{c}\cap B\cap C^{c})+\mathbb {P} (A^{c}\cap B^{c}\cap C)+\mathbb {P} (A^{c}\cap B^{c}\cap C^{c})\\&=0.2646+0.7128\\&=0.9774.\end{aligned}}$ (b) The desired probability is ${\begin{aligned}\mathbb {P} (A\cap B\cap C|\{{\text{machine fails}}\})&={\frac {\mathbb {P} (A\cap B\cap C)}{\mathbb {P} (\{{\text{machine fails}}\})}}&(A\cap B\cap C\subseteq \{{\text{machine fails}}\})\\&={\frac {\mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)}{1-0.9774}}\\&={\frac {0.2(0.1)(0.01)}{0.0226}}\\&\approx 0.00885.\end{aligned}}$

Exercise.

(a) Show that the events $A$ and $A^{c}$ are independent if and only if $\mathbb {P} (A)=0{\text{ or }}1$ .

(b) Show that the event $A$ is independent of itself if and only if $\mathbb {P} (A)=0{\text{ or }}1$ .

(Hint: prove the "if" and "only if" parts simultaneously)

Solution

(a)

Proof. We have ${\begin{aligned}A\perp \!\!\!\!\perp A^{c}&\iff \mathbb {P} (A\cap A^{c})=\mathbb {P} (A)\mathbb {P} (A^{c})\\&\iff \underbrace {\mathbb {P} (\varnothing )} _{0}=\mathbb {P} (A)(1-\mathbb {P} (A))\\&\iff \mathbb {P} (A)=0{\text{ or }}1.\\\end{aligned}}$

$\Box$

(b)

Proof. We have ${\begin{aligned}A\perp \!\!\!\!\perp A&\iff \mathbb {P} (A\cap A)=\mathbb {P} (A)\mathbb {P} (A)\\&\iff \mathbb {P} (A)={\big (}\mathbb {P} (A){\big )}^{2}\\&\iff \mathbb {P} (A)=0{\text{ or }}1.\\\end{aligned}}$

$\Box$

Exercise. A student claims that for every event $A,B,C$ , if $A$ and $B$ are independent, and $B$ and $C$ are independent, then $A$ and $C$ are also independent.

Is the student's claim correct? If yes, prove it. If no, give a counterexample.

Solution

The student's claim is not correct. For example, consider an experiment where we toss a fair coin twice. Let $A$ be the event that head comes up in the first toss, $B$ be the event that head comes up in the second toss, $C$ be the same event as $A$ . Since all four outcomes should be equally likely, it follows that $\mathbb {P} (A\cap B)={\frac {1}{4}}=\mathbb {P} (A)\mathbb {P} (B)=\left({\frac {1}{2}}\right)^{2}.$ Thus, $A$ and $B$ to be independent, then so is $B$ and $C$ (since $C$ is the same as $A$ ). However, $A$ and $C$ are clearly not independent:

$\mathbb {P} (A\cap C)=\mathbb {P} (A)={\frac {1}{2}}$ .
But, $\mathbb {P} (A)\mathbb {P} (C)=\left({\frac {1}{2}}\right)^{2}={\frac {1}{4}}$ .

Example. Consider the following procedure to make a fair toss from a coin with unknown probability $p\in (0,1)$ of getting a head (suggested by John von Neumann):

Toss the coin twice (independently).
If both outcomes are the same, ignore them and return to step 1. Otherwise, go to next step.
Regard the first outcome as the outcome obtained.

Show that this procedure works.

Proof. Let $E$ be the event that a head is obtained at the end of this procedure. We then want to show that $\mathbb {P} (E)={\frac {1}{2}}$ (this also means the probability for obtaining a tail at the end is ${\frac {1}{2}}$ ). Now, we consider the following tree diagram:


 *---- HH ----> repeat
/
------ HT -----> E occurs
------ TH -----> E^c occurs
\
 *---- TT ----> repeat

Notice that if both outcomes are the same, and we return to step 1, then the situation is exactly the same as the initial situation. Thus, $\mathbb {P} (E|\{HH\})=\mathbb {P} (E)$ and $\mathbb {P} (E|\{TT\})=\mathbb {P} (E)$ . Also, we can calculate

$\mathbb {P} (\{HH\})=p^{2}$ .
$\mathbb {P} (\{HT\})=p(1-p)$ .
$\mathbb {P} (\{TH\})=p(1-p)$ .
$\mathbb {P} (\{TT\})=(1-p)^{2}$ .

Hence, by law of total probability, we can conclude that $\mathbb {P} (E)=p^{2}\mathbb {P} (E)+p(1-p)(1)+p(1-p)(0)+(1-p)^{2}\mathbb {P} (E)\implies \mathbb {P} (E)={\frac {p(1-p)}{\underbrace {1-p^{2}} _{(1-p)(1+p)}-(1-p)^{2}}}={\frac {p}{1+p-1+p}}={\frac {1}{2}}.$

$\Box$

Conditional independence

Conditional independence is a conditional version of independence, and has the following definition which is similar to that of independence.

Definition. (Conditional independence) Let $(\Omega ,{\mathcal {F}},\mathbb {P} )$ be a probability space, and $E_{1},E_{2},\dotsc ,C\in {\mathcal {F}}$ be events. Then, the events $E_{1},E_{2},\dotsc$ are conditionally independent given the occurrence of event $C$ if $\mathbb {P} \left(\bigcap _{i\in I}E_{i}{\bigg |}C\right)=\prod _{i\in I}\mathbb {P} (E_{i}|C)$ for every finite index set $I\subseteq \{1,2,\dotsc \}$ .

Remark.

This definition also applies to finitely many events $E_{1},E_{2},\dotsc ,E_{n}\in {\mathcal {F}}$ , where the index set $I$ is a subset of $\{1,2,\dotsc ,n\}$ instead.
In particular, if events $A$ and $B$ are conditionally independent given $C$ (assuming $\mathbb {P} (B{\color {darkgreen}{\color {darkgreen}|C}})>0$ and $\mathbb {P} ({\color {darkgreen}C})>0$ ), $\mathbb {P} (A\cap B{\color {darkgreen}{\color {darkgreen}|C}})=\mathbb {P} (A{\color {darkgreen}{\color {darkgreen}|C}})\mathbb {P} (B{\color {darkgreen}{\color {darkgreen}|C}})\iff {\frac {\mathbb {P} (A\cap B{\color {darkgreen}|C})\mathbb {P} ({\color {darkgreen}C})}{\mathbb {P} (B{\color {darkgreen}|C})\mathbb {P} ({\color {darkgreen}C})}}=\mathbb {P} (A{\color {darkgreen}|C})\iff \mathbb {P} (A|B{\color {darkgreen}\cap C})=\mathbb {P} (A{\color {darkgreen}|C}).$

This means given the occurrence of $C$ , the occurrence of $B$ does not affect the probability of $A$ .

Conditional independence of some events neither implies nor is implied by independence of them. These two concepts are not related.

Example. Suppose we select two people from a large population randomly, and we label them as person A and B. Let $A$ be the event that the birthday of person A is June 1st, $B$ be the event that the birthday of person B is July 1st, and $C$ be the event that person A and B are twins. (Assume that it is equally likely for the birthday of a person to be one of the 365 days.) Then, it should be reasonable to assume that events $A$ and $B$ are conditionally independent given $C^{c}$ . Show that the events $A$ and $B$ are not conditionally independent given $C$ .

Proof. Since $\mathbb {P} (A\cap B|C)=0$ and $\mathbb {P} (A|C)\mathbb {P} (B|C)=\mathbb {P} (A)\mathbb {P} (B)={\frac {1}{365^{2}}}$ , ( $A$ and $C$ are independent. Also, $B$ and $C$ are independent.) it follows that $\mathbb {P} (A\cap B|C)\neq \mathbb {P} (A|C)\mathbb {P} (B|C)$ .

$\Box$

Exercise. Show that the events $A$ and $B$ are independent (unconditionally) if and only if $\mathbb {P} (C)=1$ .

Solution

Proof. By law of total probability, ${\begin{aligned}\mathbb {P} (A\cap B)&=\mathbb {P} (A\cap B|C)\mathbb {P} (C)+\mathbb {P} (A\cap B|C^{c})\mathbb {P} (C^{c})\\&=\mathbb {P} (A|C)\mathbb {P} (B|C)\mathbb {P} (C)+0&(A{\text{ and }}B{\text{ are conditionally independent given }}C)\\&={\frac {\mathbb {P} (C)}{365^{2}}}.\end{aligned}}$ On the other hand, $\mathbb {P} (A)\mathbb {P} (B)={\frac {1}{365^{2}}}$ . This means $\mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B)$ if and only if $\mathbb {P} (C)=1$ .

$\Box$

Probability Spaces

Probability
Conditional Probability

Random Variables