Lesson 21: Normal Distributions: What If A Normal Isn't Standard?
Assume that X has a normal distribution with mean µ and standard deviation (SD) σ. That is, X ~ N(µ, σ).
• µ and σ could have any units: points, inches, feet, pounds, etc. Make sure µ and σ are written using the same units, however.
In the parts below, use µ = 65 inches and σ = 3.5 inches (the values used in the solution).
• a) Find P(X < 60 inches), which is the same as P(X < 5 feet). First write the corresponding probability expression for Z. Show work by using the Formula for z Scores.
• b) Find P(X > 60 inches), which is the same as P(X > 5 feet). First write the corresponding probability expression for Z.
• c) Find P(60 inches < X < 72 inches), which is the same as P(5 feet < X < 6 feet). First write the corresponding probability expression for Z. Use the Formula for z Scores when showing work.
§ Solution
• a) Find P(X < 60 inches). First take the boundary x score, 60 inches, and transform it into a z score by using the Formula for z Scores:

x = 60 inches ⇒ z = (x − µ)/σ = (60 − 65)/3.5 ≈ −1.43

Write the corresponding probability expression for Z; from a standard normal table, this probability is about 0.0764:

P(X < 60 inches) = P(Z < −1.43) ≈ 0.0764
• b) Find P(X > 60 inches). We want the complementary probability; remember that the total area under the density curve is 1:

P(X > 60 inches) = P(Z > −1.43) = 1 − P(Z < −1.43) ≈ 1 − 0.0764 = 0.9236
• c) Find P(60 inches < X < 72 inches). First take the boundary x scores, 60 inches and 72 inches, and transform them into z scores by using the Formula for z Scores:

x = 60 inches ⇒ z = (x − µ)/σ = (60 − 65)/3.5 ≈ −1.43 (found in a))
x = 72 inches ⇒ z = (x − µ)/σ = (72 − 65)/3.5 = 2.00
Write the corresponding probability expression for Z; use the standard normal table values P(Z < 2.00) ≈ 0.9772 and P(Z < −1.43) ≈ 0.0764:

P(60 inches < X < 72 inches) = P(−1.43 < Z < 2.00) ≈ 0.9772 − 0.0764 = 0.9008
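As a quick check on parts a) through c), the same probabilities can be computed directly from the normal distribution (no z table needed). This is a minimal sketch in Python using scipy.stats, with µ = 65 inches and σ = 3.5 inches as in the worked example; the small differences from the table-based answers come only from rounding z to two decimal places.

    from scipy.stats import norm

    mu, sigma = 65, 3.5   # mean and SD (inches) used in this example

    p_a = norm.cdf(60, mu, sigma)                            # a) P(X < 60)      ~= 0.0766
    p_b = norm.sf(60, mu, sigma)                             # b) P(X > 60)      ~= 0.9234
    p_c = norm.cdf(72, mu, sigma) - norm.cdf(60, mu, sigma)  # c) P(60 < X < 72) ~= 0.9007

    print(round(p_a, 4), round(p_b, 4), round(p_c, 4))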
§
Proof
Solve for x:

z = (x − µ)/σ

Multiply both sides by σ:

zσ = x − µ

Add µ to both sides:

µ + zσ = x
x = µ + zσ
§ Solution
The desired probability statement for X is: P(X < 68.6 inches) ≈ 0.85.
That is, about 85% of American women are shorter than 68.6 inches.
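This result can be reproduced with the x = µ + zσ formula from the proof above. A minimal sketch, again assuming µ = 65 inches and σ = 3.5 inches:

    from scipy.stats import norm

    mu, sigma = 65, 3.5
    z = norm.ppf(0.85)        # z score with 85% of the area to its left, ~= 1.04
    x = mu + z * sigma        # x = mu + z*sigma ~= 68.6 inches

    print(round(z, 2), round(x, 1))   # 1.04  68.6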
§
Lesson 22: The Central Limit Theorem (CLT)
X has the following probability distribution, which we will call D (X ~ D); we saw this in Lesson 15, Example 1.
Value (x)    Probability P(x)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
This is a discrete uniform distribution, but the ideas of this lesson will apply to all
discrete and continuous distributions with a finite standard deviation (which we
will assume).
Using the formulas from Lesson 16, we can find that D has:
mean, µ = 3.5
SD, σ ≈ 1.7078
For one die, it is convenient that the die result, the sum, and the mean are all
the same. For example, if the die comes up a “3,” then the sum is 3, and the mean
is 3.
We roll two standard six-sided dice, one red and one green.
We will consider:
X̄ = (X1 + X2)/2 = the sample mean (or average) of the dice
[Tables: Table of Sums (∑X) and Table of Means (X̄) for the 36 possible outcomes]
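The contents of these tables can be regenerated by enumerating all 36 equally likely (red, green) outcomes; a short sketch:

    from itertools import product
    from collections import Counter

    outcomes = list(product(range(1, 7), repeat=2))   # all 36 (red, green) pairs
    sums  = [r + g for r, g in outcomes]              # values of the sum, 2 through 12
    means = [(r + g) / 2 for r, g in outcomes]        # values of X-bar, 1 through 6

    # frequencies of each possible sum: the triangular pattern 1, 2, ..., 6, ..., 2, 1
    print(sorted(Counter(sums).items()))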
The spike plot below describes two sampling distributions: the distribution of
sums (∑ X ) as well as the distribution of means ( X ) , although they use
different scales. Possible sums are between 2 and 12, while possible means are
still between 1 and 6, with a mean of 3.5 (as for one die).
[Spike plot: distributions of the sums (2 to 12) and the means (1 to 6), both centered at 3.5]
X ~ D
   ↙   ↘
Sample: X1    X2
   ↘   ↙
Sample Mean: X̄
Imagine 1000 people, each rolling a pair of dice. Each person takes a sample of
size n = 2 from D, the original, uniform distribution. Each person finds the sample
mean of the two dice; these are values of X . For example:
              X1    X2    X̄
Sample #1      2     3    2.5
Sample #2      6     2    4.0
    ⋮          ⋮     ⋮     ⋮
Sample #1000   1     5    3.0
The relative frequency histogram (or spike plot) for the values of X should
resemble the triangular distribution. This is because of the Law of Large
Numbers (LLN).
n = 2
mean, µ_X̄ = µ = 3.5
SD or SE, σ_X̄ = σ/√n ≈ 1.7078/√2 ≈ 1.2076
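A short simulation of the 1000 samples described above; the seed and the use of numpy are my own choices. The point is that the observed mean and SD of the 1000 sample means land near µ = 3.5 and σ/√2 ≈ 1.2076, and their histogram looks roughly triangular.

    import numpy as np

    rng = np.random.default_rng(seed=1)           # seed fixed only for reproducibility
    rolls = rng.integers(1, 7, size=(1000, 2))    # 1000 samples of n = 2 fair dice
    xbars = rolls.mean(axis=1)                    # 1000 values of X-bar

    print(round(xbars.mean(), 3))   # should be near mu = 3.5
    print(round(xbars.std(), 3))    # should be near sigma / sqrt(2) ~= 1.2076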
Think About It: As n, the number of dice in a sample, increases, what will the
shape of the sampling distribution for X begin to resemble? See the next page ….
D = Uniform Standard Die Distribution; µ = 3.5 ; σ ≈ 1.7078
x    P(x)
1 1/6 ≈ 0.16667
2 1/6 ≈ 0.16667
3 1/6 ≈ 0.16667
4 1/6 ≈ 0.16667
5 1/6 ≈ 0.16667
6 1/6 ≈ 0.16667
In the figures below, n dice are to be rolled. “N” means “Normal,” not population size.
[Figure panels: sampling distributions of X̄ for n = 1, 2, 4, 8, 16, and 32]
The Central Limit Theorem (CLT) applies when the sample size is large enough; n > 30 is the usual rule of thumb.
According to the CLT for Means, if a machine rolls 32 dice (n = 32), then the sampling distribution for X̄ is approximately normal:

n = 32
mean, µ_X̄ = µ = 3.5
SD or SE, σ_X̄ = σ/√n ≈ 1.7078/√32 ≈ 0.30190
Therefore:

X̄ ~ N(µ_X̄ = 3.5, σ_X̄ ≈ 0.30190) (approximately)
We expect the sample mean to be very close to 3.5, since we expect high and low
numbers to have a strong tendency to balance each other out. The standard
error (SE) is about 0.3, much smaller than the 1.7 or so that we started with for σ .
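The same kind of simulation with n = 32 dice per sample illustrates this claim; a sketch (the sample count and seed are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(seed=2)
    xbars = rng.integers(1, 7, size=(100_000, 32)).mean(axis=1)   # 100,000 sample means, n = 32

    print(round(xbars.mean(), 3))   # near mu_X-bar = 3.5
    print(round(xbars.std(), 4))    # near sigma / sqrt(32) ~= 0.3019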
In Lesson 9, we saw the "Two SD" (2σ) Rule for Usual Values.
We can extend this to the following "Two SE" (2σ_X̄) Rule for Usual Values of a Sample Mean.

The "Two SE" (2σ_X̄) Rule for Usual Values of a Sample Mean

Use the following distribution for X̄; round off the value of σ_X̄ to 0.3.

X̄ ~ N(µ_X̄ = 3.5, σ_X̄ ≈ 0.3) (approximately)
§ Solution
Usual values of X̄ lie within two SEs of the mean: between 3.5 − 2(0.3) = 2.9 and 3.5 + 2(0.3) = 4.1.
§
When finding probabilities for the sample mean (X̄), we need to adapt our Formula for z Scores:

z = (x̄ − µ_X̄)/σ_X̄

Find the probability that the average of the 32 dice (X̄) will be between 3.0 and 4.0. That is, find P(3.0 < X̄ < 4.0).
§ Solution
mean, µ_X̄ = µ = 3.5
SD or SE, σ_X̄ = σ/√n ≈ 1.7078/√32 ≈ 0.30190

The sample size n = 32. Since n > 30, the CLT applies, so we use the following distribution for X̄:

X̄ ~ N(µ_X̄ = 3.5, σ_X̄ ≈ 0.30190) (approximately)
We want to find P(3.0 < X̄ < 4.0). Take the boundary x̄ scores, 3.0 and 4.0, and transform them into z scores by using the Formula for z Scores for Sample Means:

x̄ = 3.0 ⇒ z = (x̄ − µ_X̄)/σ_X̄ = (3.0 − 3.5)/0.30190 ≈ −1.66
x̄ = 4.0 ⇒ z = (x̄ − µ_X̄)/σ_X̄ = (4.0 − 3.5)/0.30190 ≈ 1.66

So P(3.0 < X̄ < 4.0) = P(−1.66 < Z < 1.66) ≈ 0.9030 by the standard normal table.
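The probability can also be computed directly from the approximate distribution of X̄; a sketch (the answer differs from the table-based 0.9030 only because the table rounds z to ±1.66):

    from scipy.stats import norm

    mu_xbar = 3.5
    se = 1.7078 / 32 ** 0.5    # ~= 0.30190

    p = norm.cdf(4.0, mu_xbar, se) - norm.cdf(3.0, mu_xbar, se)
    print(round(p, 4))         # ~= 0.9023, i.e., P(-1.66 < Z < 1.66) ~= 0.90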
• Note: If we are more interested in the sum (or total) of the dice, observe that P(3.0 < X̄ < 4.0) = P(96 < ∑X < 128). Here, a sum is 32 times an average. The CLT for Sums implies that:

∑(i = 1 to 32) X_i ~ N(mean = nµ = 112, SD = σ√n ≈ 9.6608) (approximately)
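The CLT-for-Sums version gives the same answer, since a sum of 32 dice is just 32 times the sample mean; a quick sketch:

    from scipy.stats import norm

    n, mu, sigma = 32, 3.5, 1.7078
    mean_sum = n * mu            # 112
    sd_sum = sigma * n ** 0.5    # ~= 9.6608

    p = norm.cdf(128, mean_sum, sd_sum) - norm.cdf(96, mean_sum, sd_sum)
    print(round(p, 4))           # ~= 0.9023, matching P(3.0 < X-bar < 4.0)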
If the sample size n is large (n > 30), or if D is approximately normal, then

X̄ ~ N(mean = µ_X̄ = µ, SD or SE = σ_X̄ = σ/√n) (approximately)
• Note: The CLT works best when D is symmetric and does not have thick
tails.
An asymmetric distribution D is analyzed in the table below. For small values of n, the asymmetry is still evident, but it becomes less of an issue by the n = 32 case.
x    P(x)
1 1/2 = 0.5
2 1/3 ≈ 0.33333
3 1/6 ≈ 0.16667
In the figures below, n dice are to be rolled. “N” means “Normal,” not population size.
[Figure panels: sampling distributions of X̄ from this asymmetric D for n = 1, 2, 4, 8, 16, and 32]
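The effect in those panels can be reproduced by sampling from this asymmetric D; a sketch (sample count and seed are arbitrary). The histogram of the sample means is visibly right-skewed for n = 2 but looks close to normal by n = 32, and the SD of the sample means shrinks like σ/√n.

    import numpy as np

    rng = np.random.default_rng(seed=3)
    values, probs = [1, 2, 3], [1/2, 1/3, 1/6]   # the asymmetric distribution D above

    for n in (2, 32):
        draws = rng.choice(values, size=(100_000, n), p=probs)
        xbars = draws.mean(axis=1)
        # print the observed mean and SD of the sample means for this n
        print(n, round(xbars.mean(), 3), round(xbars.std(), 3))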
Lesson 23: Normal Approximations to Binomial Distributions

Bin(n = 10, p = 1/2 or 0.5) is the most basic binomial distribution that can be approximated by a normal distribution.
[Figure panels: Bin(n, p = 1/2) distributions for n = 1 through n = 10]
If X ~ Bin(n, p), then:

mean, µ = np
SD, σ = √(npq)
Let X ~ Bin(n, p). If np ≥ 5 and nq ≥ 5, then:

X ~ N(mean = µ = np, SD = σ = √(npq)) (approximately)
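A sketch checking this approximation for the Bin(n = 100, p = 1/2) example used below: here µ = np = 50 and σ = √(npq) = 5, and the normal probability of each integer's "rounding interval" tracks the exact binomial probability closely (a preview of the continuity corrections in the next section).

    from scipy.stats import binom, norm

    n, p = 100, 0.5
    q = 1 - p
    mu, sigma = n * p, (n * p * q) ** 0.5    # 50 and 5.0

    for k in (45, 50, 55):
        exact  = binom.pmf(k, n, p)                                          # exact binomial P(X = k)
        approx = norm.cdf(k + 0.5, mu, sigma) - norm.cdf(k - 0.5, mu, sigma) # normal area over (k - 0.5, k + 0.5)
        print(k, round(exact, 4), round(approx, 4))   # the two agree to about three decimal places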
We use continuity corrections to adjust for the fact that we are using a continuous
distribution (a normal distribution) to approximate a discrete distribution
(a binomial distribution).
Let's say we want to find P(45 ≤ X ≤ 55), where X ~ Bin(n = 100, p = 1/2).
Since the distribution is discrete, it may matter whether we use "<" or "≤".
• We associate the integer value a in the binomial distribution with the interval (a − 0.5, a + 0.5) in the approximating normal distribution. Think: "rounding."
• We associate the integer value "45" in the binomial distribution with the interval (44.5, 45.5) in the approximating normal distribution.
• We associate the integer value "55" in the binomial distribution with the interval (54.5, 55.5) in the approximating normal distribution.
To approximate P(45 ≤ X ≤ 55) for the binomial random variable X, we will use continuity corrections and find P(44.5 ≤ X_c ≤ 55.5) for the normal random variable X_c. Think: "corrected X."
• X_c is continuous, so we may consider P(44.5 < X_c < 55.5) instead.
Approximate P(45 ≤ X ≤ 55) by following these steps:
• d) Apply continuity corrections and rewrite P(45 ≤ X ≤ 55) in terms of X_c.
• e) Find the z scores for the boundary values of x_c using the Formula for z Scores.
• f) Write the corresponding probability expression for Z.
• g) Approximate P(45 ≤ X ≤ 55). Use these hints regarding the Z distribution: P(Z < −1.10) ≈ 0.1357 and P(Z < 1.10) ≈ 0.8643.
§ Solution
• a) Describe the distribution of X.
X ~ Bin(n = 100, p = 1/2)

Here, µ = np = 50 and σ = √(npq) = √(100 · 0.5 · 0.5) = 5, and both np = 50 ≥ 5 and nq = 50 ≥ 5.

Therefore, X ~ N(µ = 50, σ = 5) (approximately).
• d) Apply continuity corrections and rewrite P(45 ≤ X ≤ 55) in terms of X_c.

From Part c), we found that P(45 ≤ X ≤ 55) ≈ P(44.5 < X_c < 55.5).
• e) Find the z scores for the boundary values of x_c using the Formula for z Scores.

x_c = 44.5 ⇒ z = (x_c − µ)/σ = (44.5 − 50)/5 = −1.10
x_c = 55.5 ⇒ z = (x_c − µ)/σ = (55.5 − 50)/5 = 1.10
• f) Write the corresponding probability expression for Z:

P(44.5 < X_c < 55.5) = P(−1.10 < Z < 1.10)
• g) Approximate P(45 ≤ X ≤ 55).

P(45 ≤ X ≤ 55) ≈ P(−1.10 < Z < 1.10) ≈ 0.8643 − 0.1357 = 0.7286
• Note: For an exact answer, we could work with the binomial distribution directly and calculate P(45) + P(46) + ... + P(55) to get a more precise result.
• If we had not applied the continuity corrections, our answer would have
been about 0.6826.
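A sketch verifying all three numbers: the exact binomial sum, the continuity-corrected normal approximation, and the uncorrected one.

    from scipy.stats import binom, norm

    n, p = 100, 0.5
    mu, sigma = 50, 5

    exact       = binom.cdf(55, n, p) - binom.cdf(44, n, p)                # P(45 <= X <= 55) ~= 0.7287
    corrected   = norm.cdf(55.5, mu, sigma) - norm.cdf(44.5, mu, sigma)    # ~= 0.7287
    uncorrected = norm.cdf(55, mu, sigma) - norm.cdf(45, mu, sigma)        # ~= 0.6827

    print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))

With the continuity correction, the approximation matches the exact answer to about four decimal places; without it, we lose almost five percentage points.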
§
Using the same X ~ Bin(n = 100, p = 1/2), approximate:
• a) The probability of at most 45 heads, P(X ≤ 45).
• b) The probability of more than 55 heads, P(X > 55).
§ Solution
• a) With a continuity correction, P(X ≤ 45) ≈ P(X_c < 45.5) = P(Z < (45.5 − 50)/5) = P(Z < −0.90) ≈ 0.1841.
• b) With a continuity correction, P(X > 55) ≈ P(X_c > 55.5) = P(Z > (55.5 − 50)/5) = P(Z > 1.10) ≈ 1 − 0.8643 = 0.1357.
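A sketch checking parts a) and b) against the exact binomial values:

    from scipy.stats import binom, norm

    n, p = 100, 0.5
    mu, sigma = 50, 5

    # a) P(X <= 45): the continuity correction uses 45.5
    print(round(norm.cdf(45.5, mu, sigma), 4), round(binom.cdf(45, n, p), 4))   # ~= 0.1841 vs ~= 0.1841

    # b) P(X > 55): the continuity correction uses 55.5
    print(round(norm.sf(55.5, mu, sigma), 4), round(binom.sf(55, n, p), 4))     # ~= 0.1357 vs ~= 0.1356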
• Normal distributions are often seen in practice. This is why the "68-95-99.7%" Rule for normal distributions is called the Empirical Rule.
The Central Limit Theorem (CLT) can be applied to a sum of many indicator variables. In our coin examples, we consider the sum of 100 indicator variables: the number of heads is X = I_1 + I_2 + ... + I_100, where I_k = 1 if the kth flip comes up heads and I_k = 0 otherwise.
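A sketch of this indicator-variable view (seed and sample count are arbitrary): each flip contributes an I_k that is 1 for heads and 0 for tails, and the number of heads is their sum, which the CLT makes approximately normal.

    import numpy as np

    rng = np.random.default_rng(seed=4)
    flips = rng.integers(0, 2, size=(100_000, 100))   # I_k values: 1 = heads, 0 = tails
    heads = flips.sum(axis=1)                         # X = I_1 + ... + I_100 for each repetition

    print(round(heads.mean(), 2))   # near np = 50
    print(round(heads.std(), 2))    # near sqrt(npq) = 5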