Chapter 5: Common Distributions: 5.1 The Normal Distribution
In this chapter we examine four of the distributions that will be frequently encountered later
in the course.
The normal distribution is the most widely used distribution in statistics. Continuous data such as mass, length, etc., can often be modelled using a normal distribution.
The normal distribution has two parameters: the mean (μ) and variance (σ²). If a random variable X has a normal distribution then we can write this as:

X ~ N[μ, σ²].

If X ~ N[μ, σ²], then

(X − μ)/σ ~ N[0, 1].

The process of subtracting the mean and dividing by the standard deviation is referred to as standardisation:

z = (x − μ)/σ.
[Figure: two normal densities plotted in R, dnorm(x) for the standard normal (left) and dnorm(x, sd = 3) for a normal with standard deviation 3 (right).]
Example:
The fully grown lengths (in mm) of a certain insect can be regarded as having the following
normal distribution:
X ~ N[64, 16].
What is the probability that an insect has length less than 59 mm?
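Standardising: P(X < 59) = P(Z < (59 − 64)/4) = P(Z < −1.25) = 1 − Φ(1.25) ≈ 0.1056. The same value can be obtained in R; the calls below are a quick check rather than part of the original notes:

pnorm(59, mean = 64, sd = 4)   # P(X < 59) for X ~ N[64, 16]; approx 0.1056
pnorm(-1.25)                   # the same probability via the standardised value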
Definition: Consider a random variable X with some distribution. The (upper) 100α% point is the value of x such that:

P(X > x) = α.

For the standard normal distribution, we will denote the (upper) 100α% point by z_α, i.e.:

P(Z > z_α) = α.
If X ~ N[μ, σ²] and Z ~ N[0, 1], then the percentage points of the two distributions are related by standardisation:

x_α = μ + σ z_α.
In statistical tables (e.g. Lindley and Scott), there is a separate percentage point table covering the most used values of α. In Lindley and Scott,

P represents 100α,
x(P) represents the value of z_α.
Extract:

P = 100α   α       x(P) = z_α
10%        0.1     1.2816
5%         0.05    1.6449
2%         0.02    2.0537
1%         0.01    2.3263
0.1%       0.001   3.0902

For example, the 10% point for the standard normal is z_0.1 = 1.2816.
Example 1:
Let X ~ N[50, 16]. Find the value of x such that P(X > x) = 0.05, i.e. find the (upper) 5% point.
If X ~ N[50, 16], then (X − 50)/4 ~ N[0, 1].
The 5% point for the standard normal is z_0.05 = 1.6449.
Thus, the 5% point for a N[50, 16] distribution can be obtained by solving (x − 50)/4 = 1.6449.
So, the 5% point is x = 50 + 1.6449 × 4 = 56.5796.
Example 2:
Let Z ~ N[0, 1]. Find the value of z such that P(Z < z) = 0.01 (i.e. find the lower 1% point).
The upper 1% point for a standard normal is z_0.01 = 2.3263. Therefore, P(Z > 2.3263) = 0.01.
By symmetry, we must also have P(Z < −2.3263) = 0.01. So, the lower 1% point is −2.3263.
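Both percentage points can be checked with R's quantile function qnorm (illustrative calls, not from the original notes):

qnorm(0.95, mean = 50, sd = 4)   # upper 5% point of N[50, 16]; approx 56.58
qnorm(0.01)                      # lower 1% point of N[0, 1]; approx -2.3263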
5.2 The Chi-Squared Distribution
5.2.1 Introduction
The chi-squared (χ²) distribution has a single parameter called the degrees of freedom; this can be any positive integer. The χ² distribution with n degrees of freedom is denoted χ²_n.
This density is written in terms of the gamma function. Some of the key properties of this function are:

Γ(x) = (x − 1)Γ(x − 1);
Γ(1/2) = √π;
Γ(x) = (x − 1)! if x is a natural number.
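For reference, the χ²_n density referred to above is

f(x) = x^{n/2 − 1} e^{−x/2} / (2^{n/2} Γ(n/2)),   x > 0.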
The degrees of freedom, n, define the shape of the χ² density. For n < 3, the density has a mode at zero. For n ≥ 3, the mode moves further away from zero as n increases. The shapes of some specific densities are given below.
[Figure: graph of several chi-squared densities, for n = 1, 2, 4 and 8.]
Extracts:

ν = 3.0              ν = 7.0
x     P(X < x)       x      P(X < x)
0.0   0.0000         1.0    0.0052
0.5   0.0811         2.0    0.0402
1.0   0.1987         3.0    0.1150
1.5   0.3177         4.0    0.2202
2.0   0.4276         5.0    0.3400
2.5   0.5247         6.0    0.4603
3.0   0.6084         7.0    0.5711
3.5   0.6792         8.0    0.6674
4.0   0.7385         9.0    0.7473
etc.                 10.0   0.8114
Example 1:
If X ~ χ²_3, then P(X < 2.5) = 0.5247.
Example 2:
Suppose X ~ χ²_7. Find P(X > 10).
Now, from tables we can find P(X < 10) = 0.8114, so P(X > 10) = 1 − 0.8114 = 0.1886.
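In R, chi-squared probabilities are given by the distribution function pchisq; the calls below check the two examples against the table values:

pchisq(2.5, df = 3)      # P(X < 2.5) for X ~ chi-squared(3); approx 0.5247
1 - pchisq(10, df = 7)   # P(X > 10) for X ~ chi-squared(7); approx 0.1886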
The 100α% point for the χ²_n distribution is denoted χ²_{n,α}. Therefore, if X ~ χ²_n, then

P(X > χ²_{n,α}) = α.

The percentage points of the χ² distribution are in a separate table in Lindley and Scott.
Extract:

P        99      95      10      5       1
ν = 1    0.000   0.004   2.706   3.841   6.635
ν = 2    0.020   0.103   4.605   5.991   9.210
ν = 3    0.115   0.352   6.251   7.815   11.34
ν = 4    0.297   0.711   7.779   9.488   13.28
ν = 5    0.554   1.145   9.236   11.07   15.09
ν = 6    0.872   1.635   10.64   12.59   16.81
ν = 7    1.239   2.167   12.02   14.07   18.48
ν = 8    1.646   2.733   13.36   15.51   20.09

For example, χ²_{5,0.1} = 9.236, so if X ~ χ²_5 then P(X > 9.236) = 0.1.
In this table, the degrees of freedom ν for the distribution are listed going down the rows and P is 100α.
The chi-squared distribution is not symmetric (unlike the normal distribution). So if we want a lower percentage point (i.e. a value of x such that P(X < x) = α), then we can't simply negate the corresponding upper percentage point. Instead we need to find χ²_{n,1−α}.
Example 1:
Let X ~ χ²_8. Find the lower 1% point (i.e. the value of x such that P(X < x) = 0.01).
The lower 1% point is denoted χ²_{8,0.99}, the value for which is 1.646.
Example 2:
Suppose X ~ χ²_10. Find the value of t for which P(X > t) = 0.1321.
Here, t would be the 13.21% point for the distribution. But 0.1321 is a non-standard value of α, so we need to use the distribution function table to find t.
P(X > t) = 0.1321, so P(X < t) = 1 − 0.1321 = 0.8679.
Going through the distribution table we find that t = 15.
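Both points can be checked with R's quantile function qchisq (a quick check on the table work above):

qchisq(0.01, df = 8)      # lower 1% point of chi-squared(8); approx 1.646
qchisq(0.8679, df = 10)   # the value t with P(X > t) = 0.1321; approx 15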
5.3 The t-Distribution
5.3.1 Introduction
Definition: Suppose that we have two independent random variables Y and Z, such that:

Y ~ N[0, 1] and Z ~ χ²_n.

Then the random variable X defined by

X = Y / √(Z/n)

is said to have a t-distribution with n degrees of freedom, written X ~ t_n.
The t-distribution is symmetric about zero and its general shape is like the bell shape of a normal distribution. However, the tails of the t-distribution can approach zero much more slowly than those of the normal distribution, i.e. the t-distribution is more heavy-tailed than the normal. The degrees of freedom define how heavy-tailed the t-distribution is.
Note:
The t-distribution with n = 1 is sometimes referred to as the Cauchy distribution. This is so
heavy tailed that its mean and variance do not exist! (This is because the integrals specifying
the mean and variance are not absolutely convergent.)
Important note:
The density of a t-distribution converges to that of the standard normal as n → ∞.
The diagram below shows how the t-distribution varies for different degrees of freedom.
[Figure: t-distribution densities for several different degrees of freedom, plotted for −3 ≤ x ≤ 3.]
5.3.2 Probabilities
Probabilities associated with the t-distribution can be looked up in tables. In Lindley and Scott, the degrees of freedom are again denoted by ν and are listed along the top of the columns. Then for each value t listed, the values in the table are the probability that X < t.
Example 1:
Let X ~ t_3. Then P(X < 2.5) = 0.9561.
Example 2:
Let X ~ t_12. Find P(X > 2.5).
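From the distribution function table for ν = 12, P(X < 2.5) can be read off, giving P(X > 2.5) = 1 − P(X < 2.5) ≈ 0.014. In R, pt is the t distribution function (illustrative calls):

pt(2.5, df = 3)        # Example 1: P(X < 2.5) for t_3; approx 0.9561
1 - pt(2.5, df = 12)   # Example 2: P(X > 2.5) for t_12; approx 0.014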
Percentage points
Example 1:
Find the 5% point for t_6.
Directly from tables, this is seen to be t_{6,0.05} = 1.943. (Thus P(X > 1.943) = 0.05.)
Example 2:
Let X ~ t_10. Find the value of t such that P(X < t) = 0.01 (i.e. find the lower 1% point).
By symmetry, the lower 1% point is the negative of the upper 1% point: t = −t_{10,0.01} = −2.764.
Note: To find non-standard percentage points (such as the 12.5% point, for example), we
need to use the t-distribution function table.
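In R, qt gives t quantiles directly, including non-standard ones (these calls are illustrative):

qt(0.95, df = 6)     # upper 5% point of t_6; approx 1.943
qt(0.01, df = 10)    # lower 1% point of t_10; approx -2.764
qt(0.875, df = 10)   # upper 12.5% point, a non-standard value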
5.4 The F-Distribution
5.4.1 Introduction
Definition: Suppose that Y ~ χ²_n and Z ~ χ²_m are independent. Then the random variable X defined by

X = (Y/n) / (Z/m)

is said to have an F-distribution with n and m degrees of freedom, written X ~ F_{n,m}.
Note: The density for the F-distribution is only defined for positive values of x. The values of the two degrees of freedom define the shape of the distribution. Plots of the F-distribution for various values of n and m are shown below.
[Figure: graphs of several F distributions (density against x), for (n, m) = (2, 2), (4, 4), (8, 8) and (20, 20); a second panel shows further F densities.]
Lindley and Scott do not have tables for looking up probabilities associated with the F-distribution.
Separate tables giving the 10%, 5%, 2.5%, 1%, 0.5% and 0.1% percentage points for F-distributions with different combinations of degrees of freedom can be found in Lindley and Scott.
We will denote the (upper) 100α% point for the F_{n,m} distribution by F_{n,m,α}. If X ~ F_{n,m}, then:

P(X > F_{n,m,α}) = α.
In the table of the 100α percentage points for the F-distribution, the first degrees of freedom is denoted ν₁ and listed along the columns. The second degrees of freedom is denoted by ν₂ and listed down the rows.
Extract (the 1% points):

         ν₁ = 1   2       3       4       5
ν₂ = 1   4052    4999    5403    5625    5764
     2   98.50   99.00   99.17   99.25   99.30
     3   34.12   30.82   29.46   28.71   28.24
     4   21.20   18.00   16.69   15.98   15.52
     5   16.26   13.27   12.06   11.39   10.97
Example:
Find the 5% point for both the F_{5,10} and the F_{10,5} distributions.
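The answers can be read from the 5% table, or computed with R's qf (shown here as a check); note how swapping the degrees of freedom changes the answer:

qf(0.95, df1 = 5, df2 = 10)   # upper 5% point of F_{5,10}; approx 3.326
qf(0.95, df1 = 10, df2 = 5)   # upper 5% point of F_{10,5}; approx 4.735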
The tables in Lindley and Scott give the upper percentage points only, i.e. they give the values of x such that P(X > x) = α, for small values of α. Since the F-distribution is not symmetric, to find lower percentage points we cannot simply use the negative of the corresponding upper percentage point:

P(X < −x) ≠ P(X > x).

The density is in fact not even defined for x < 0.
5.4.3 Finding lower percentage points
Result: Suppose that X ~ F_{n,m}. Then

X⁻¹ ~ F_{m,n}.

Proof:
X = Y/Z ~ F_{n,m} if nY ~ χ²_n and mZ ~ χ²_m.
But by definition of the F-distribution, this means that

X⁻¹ = Z/Y ~ F_{m,n},

as required.
We can use this result to find lower percentage points for F-distributions:
Important result:
The lower 100α% percentage point for the F_{n,m} distribution is the reciprocal of the upper 100α% percentage point of the F_{m,n} distribution.
Proof:
If X ~ F_{n,m} and x represents the lower 100α% percentage point for this distribution, then P(X < x) = α.
But

P(X < x) = P(1/X > 1/x) = α.

As 1/X ~ F_{m,n}, then 1/x is (by definition) the upper 100α% percentage point of the F_{m,n} distribution.
So,

x = 1/F_{m,n,α}.
Example 1:
Let X ~ F_{5,10}. Suppose we wish to find x such that P(X < x) = 0.05, i.e. we want to find the lower 5% point of the F_{5,10} distribution.
The lower 5% point of the F_{5,10} distribution is the reciprocal of the upper 5% point of the F_{10,5} distribution.
So,

x = 1/F_{10,5,0.05} = 1/4.735 = 0.2112.
Example 2:
Suppose X ~ F_{4,7}. Find the upper and lower 10% points.
The upper 10% point can be found directly from tables:

F_{4,7,0.1} = 2.961.

The lower 10% point is the reciprocal of the upper 10% point of the F_{7,4} distribution:

Lower 10% point = F_{4,7,0.9} = 1/F_{7,4,0.1} = 1/3.979 = 0.2513.
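The reciprocal relationship is easy to verify numerically in R (illustrative calls):

qf(0.90, df1 = 4, df2 = 7)       # upper 10% point of F_{4,7}; approx 2.961
qf(0.10, df1 = 4, df2 = 7)       # lower 10% point directly; approx 0.2513
1 / qf(0.90, df1 = 7, df2 = 4)   # the same value via the reciprocal result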
Exercise:
Suppose X ~ F_{2,4}. Find the upper and lower 1% points.
Note: If X ~ t_n, then X² ~ F_{1,n}.
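Chapter 6: Sampling Distributions: 6.1 Introduction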
The purpose of many statistical investigations is to learn about the distribution of some
random variable X. Many aspects of X's distribution may be of interest, but attention often
focuses on one or two particular population characteristics.
Example 1:
A bakery needs to decide how many loaves of fresh bread it should put out on its shelves each
day. If they put out too many, then they will lose money as stale bread will not sell, and if
they put out too few, then they will lose potential sales. Therefore, to help the bakery make its
order, interest might focus on the mean number of loaves, μ, usually sold on a particular day.
Example 2:
Suppose that a company has the job of packing a certain breakfast cereal into boxes, so that
each box approximately contains 500g of cereal. The weight of cereal in each box varies
around 500g due to the variability of the cereal product. The company wants to check that the
amount going into each box doesn't vary too much about 500g: weights greater than 500g will lose the company money and weights less than 500g could lead to customer dissatisfaction. In this case, attention may focus on the variability of weights in the boxes as described by σ, the standard deviation of weights.
Example 3:
When testing a new drug, a doctor might not be interested so much in the number of people
cured by the drug, but rather the proportion, π, of people who are cured by the drug.
Definition: Any quantity computed from values in a sample is called a (sample) statistic.
Example:
All the numerical summaries introduced in Chapter 2 are statistics as they are all calculated
from values in the random sample. This includes statistics such as the sample mean (which
utilises all the observations in its calculation) and the sample median (which only takes
account of the middle observations).
It is important to realise that there is a difference between population parameters and sample
statistics. The population parameter is a characteristic of the distribution of the random
variable, is typically unknown and cannot be observed. By contrast, a statistic is a
characteristic of the sample and can be observed. For example, the population mean μ has some fixed (but unknown) value. On the other hand, the sample mean, X̄, can be observed and therefore can be known for a particular sample. The observed value of X̄, however, can vary from sample to sample (as different samples will give different values of x₁, ..., xₙ). The value of a statistic, therefore, is subject to sampling variability.
The sampling distribution of a statistic describes the long-run behaviour of the statistic's
values when many different samples, each of size n, are obtained and the value of the statistic
is computed for each sample.
6.2 The sampling distribution of the sample mean
Experiment 1: We generate 500 random samples (each of size n) from N[100, 400]. For
each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 20, 50.
[Figure: histograms of the 500 sample means, in panels titled "Sampling distribution for the sample mean" for n = 5, n = 20 and n = 50; frequency on the vertical axis and the sample mean, centred near 100, on the horizontal axis.]
Observations: In each case the distribution seems roughly normal and it is clear that each of
these histograms is centred roughly at 100 (the mean of the normal distribution from which
the samples were generated). We can also see that as the sample size n increases, the
variability in the sampling distributions decreases (look carefully at the scales on the
horizontal axes).
These points can also be seen if we look at some summary statistics relating to each histogram above.
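A sketch of Experiment 1 in R, which also produces the summary statistics just mentioned (the seed is an illustrative choice, not from the original notes). Note that N[100, 400] has standard deviation 20, so by the result proved later in this chapter the sample mean should have standard deviation 20/√n:

# Sampling distribution of the sample mean: 500 samples from N[100, 400]
set.seed(1)   # illustrative seed, for reproducibility
for (n in c(5, 20, 50)) {
  xbar <- replicate(500, mean(rnorm(n, mean = 100, sd = 20)))
  cat("n =", n,
      " mean of sample means =", round(mean(xbar), 2),
      " sd of sample means =", round(sd(xbar), 2), "\n")
}
# hist(xbar) draws a histogram like those shown above (here, the n = 50 case)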
We will do a similar set of experiments to see what the sampling distribution for X̄ is like when we are not sampling from the normal distribution.
Experiment 2: We generate 500 random samples (each of size n) from a uniform U[0, 1] distribution. Again, for each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 10, 20, 50.
Note: If X ~ U[0, 1], then E[X] = 0.5 and Var[X] = 1/12 (so s.d. = 0.289).
[Figure: histograms of the 500 sample means from the U[0, 1] samples, for n = 5, 10, 20 and 50; all are centred near 0.5, with the spread of the horizontal axis narrowing as n increases.]
Observations: The shapes of the histograms relating to the sample means look increasingly more like normal distributions as n increases; this is despite the data being sampled from a uniform distribution. The histograms in each case seem to centre on 0.5 (the mean of the U[0, 1] distribution). Also, the variability of the sampling distributions is decreasing as the sample size becomes larger.
The mean and standard deviation for the data in the four situations above are given below:
[Table: mean and standard deviation of the 500 sample means, for each of n = 5, 10, 20, 50.]
Important result: Suppose X₁, ..., Xₙ are independent and identically distributed random variables, each with mean μ and variance σ². Then:
1) E[X̄] = μ;
2) Var[X̄] = σ²/n;
3) if the Xᵢ are normally distributed, then X̄ ~ N[μ, σ²/n];
4) for large n, X̄ ~ N[μ, σ²/n] approximately, whatever the distribution of the Xᵢ.
Proof:

E[X̄] = E[(1/n)(X₁ + ... + Xₙ)] = (1/n)(E[X₁] + ... + E[Xₙ]) = (1/n) × nμ = μ (as required).

Because we are assuming that the random variables are independent, we can also write:

Var[X̄] = Var[(1/n)(X₁ + ... + Xₙ)] = (1/n²)(Var[X₁] + ... + Var[Xₙ]) = (1/n²) × nσ² = σ²/n (as required).

For part (3): a linear combination of normally distributed random variables also has a normal distribution (not proved here), and the mean and variance are as given above.
Note:
Part (4) of the above result is the Central Limit Theorem, an extremely powerful and useful
result in Statistics.
Example 1:
X₁, ..., X₂₀ are independently and identically distributed N[30, 5]. Find the sampling distribution for X̄.
Since the Xᵢ are normal, part (3) of the result above gives X̄ ~ N[30, 5/20] = N[30, 0.25] exactly.
Example 2:
X₁, ..., X₄₀ are i.i.d. Po(10) random variables. What approximately is the sampling distribution for X̄?
The sample size can be considered large enough for the Central Limit Theorem to be applied. The sampling distribution can therefore be considered approximately normal. A Po(10) distribution has mean and variance equal to 10, therefore X̄ ~ N[10, 10/40] = N[10, 0.25] (roughly).
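A quick simulation check of Example 2 in R (illustrative, not part of the original notes):

# 500 sample means, each computed from 40 draws of a Po(10) distribution
xbar <- replicate(500, mean(rpois(40, lambda = 10)))
mean(xbar)   # should be close to 10
var(xbar)    # should be close to 10/40 = 0.25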
6.3 Sampling distribution of the sample proportion
To learn about π, we could observe a random sample in which each of the n observations is either a “success” or a “failure”. The sample proportion, p, is given by:

p = (number of successes)/n.

The sample proportion is clearly a sample statistic. It makes sense to use p to learn about π. We are therefore interested in the sampling distribution for p.
Experiment 1:
Suppose that we generate 500 samples of size n where each sampled value is either a success (with probability π = 0.25) or a failure (with probability 1 − π = 0.75). We then calculate the observed proportion of “successes” in each of the 500 samples. We will do this for n = 5, 10, 25 and 50.
[Figure: histograms of the 500 sample proportions, in panels titled "Sampling distribution for the sample proportion" for n = 5, 10, 20 and 50; frequency on the vertical axis and the sample proportion, p, on the horizontal axis.]
Observations:
For a sample of size 5, the possible values of p are 0, 0.2, 0.4, 0.6, 0.8 and 1. The sampling
distribution for p gives the probability of each of these 6 values. The histogram for the case n
= 5 is positively skewed.
As n increases, the histograms become more and more symmetrical and in fact when n = 50
the histogram clearly resembles a normal curve centred on 0.25. In addition, increasing the
sample size decreases the range of observed values for p.
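A sketch of this experiment in R (the seed and the choice n = 50 are illustrative):

# 500 sample proportions, each from n Bernoulli trials with success probability 0.25
set.seed(2)
n <- 50
p <- replicate(500, rbinom(1, size = n, prob = 0.25) / n)
# hist(p) gives a histogram like the n = 50 panel above, centred near 0.25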
Experiment 2:
Once again we will generate 500 samples, but this time we will have the sample sizes n = 10, 25, 50 and 100 and we will take the true proportion of successes, π, to be 0.07. So once again each observation in each sample is either a success (S) with probability 0.07, or a failure (F) with probability 0.93.
[Figure: histograms of the 500 sample proportions for π = 0.07, for n = 10, 25, 50 and 100.]
Observations:
When n = 10, the possible values for p are 0, 0.1, 0.2, …, 1. The histogram for the 500 samples is very positively skewed and no values greater than 0.4 were observed for p. [Notice how in the previous experiment, the distribution for p was not very skewed when n = 10.]
As n increases to 25 and 50, the histograms still look positively skewed. However, when the
sample size reaches 100, the histogram is beginning to look slightly more normal. Therefore
we note that in this experiment we need larger sample sizes than in Experiment 1 before the
sampling distribution for p looks approximately normal.
We also note that increasing the sample size again results in a narrowing in the range of
observed values for p.
Note: The further the value of π is from 0.5, the larger the value of n must be in order for the normal approximation of the sampling distribution for p to be accurate.
Rule of thumb:
If both nπ ≥ 5 and n(1 − π) ≥ 5, then we may use the normal approximation for p.
Important result: E[p] = π and sd[p] = √(π(1 − π)/n); moreover, for large n the sampling distribution of p is approximately normal.
Proof:
Let X = total number of successes in the sample. Then X ~ Bi[n, π] and so:

E[X] = nπ;
V[X] = nπ(1 − π), so sd[X] = √(nπ(1 − π)).

But, by definition, the sample proportion is p = X/n, and so

E[p] = E[X/n] = (1/n) E[X] = (1/n) nπ = π.

Also,

V[p] = V[X/n] = (1/n²) V[X] = (1/n²) nπ(1 − π) = π(1 − π)/n.

Taking square roots, we get the required standard error for p.
Proof of the normality approximation is simply an application of the Central Limit Theorem, so that for large n

p ~ N[π, π(1 − π)/n]

approximately.
Example 1:
Suppose that the proportion of women who believe that they are underpaid is 0.55.
a) If we had a random sample of size 10, could we assume that the sampling distribution
for p is approximately normal?
b) For a random sample of 400, what are the mean value and standard deviation for p?
c) In a sample of size 400, what is the probability that we observe the proportion of
women who believe they are underpaid to be greater than 0.6?
a) For n = 10: nπ = 5.5 ≥ 5 but n(1 − π) = 4.5 < 5, so by the rule of thumb we should not assume that the sampling distribution for p is approximately normal.
b) n = 400, so:

E[p] = π = 0.55;
V[p] = π(1 − π)/n = (0.55 × 0.45)/400 = 0.000619;
sd[p] = 0.0249.

For n = 400, nπ = 220 and n(1 − π) = 180, and so p's distribution can be considered approximately normal. Therefore:

p ~ N[0.55, 0.000619].

c) P(p > 0.6) = P(Z > (0.6 − 0.55)/0.0249) = P(Z > 2.008) = 1 − Φ(2.008) = 1 − 0.9778 = 0.0222, approximately.
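Part c) can be checked directly in R (a quick check, not part of the original notes):

1 - pnorm(0.6, mean = 0.55, sd = sqrt(0.55 * 0.45 / 400))   # approx 0.022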
Example 2:
Suppose that the true proportion of individuals with a particular disease is π = 0.02. What minimum sample size would be needed before p's distribution can be assumed to be approximately normal?
Using the rule of thumb, we need nπ ≥ 5, i.e. n ≥ 5/0.02 = 250 (the condition n(1 − π) ≥ 5 is then automatically satisfied). So a sample of at least 250 would be needed.
Exercise:
90% of the population are right-handed. In a sample of 200 people, what is the probability
that the sample proportion who are right-handed is less than 0.86?
6.4 The sampling distribution of the sample variance
When we want to learn about the variance, σ², of a population, it is natural to first look towards the sample variance, S². We are therefore interested in the sampling distribution for S².
In general, the sampling distribution for S² does not follow any fixed rules and so here we will only look at the case when X₁, ..., Xₙ are i.i.d. N[μ, σ²].
Important result:
If X₁, ..., Xₙ are i.i.d. N[μ, σ²] where μ is unknown, then

(n − 1)S²/σ² ~ χ²_{n−1}.
Experiment: We generate 500 random samples (each of size n) from a normal distribution, compute (n − 1)S²/σ² for each sample, and then demonstrate what the sampling distribution for (n − 1)S²/σ² looks like in each case. This was done for n = 3, 5, 10 and 20.
[Figure: histograms of the 500 observed values of (n − 1)S²/σ², in panels titled "Histogram for n = 3", "Histogram for n = 5", and two further panels for n = 10 and n = 20.]
Observations:
In the case when n = 3, the histogram for the sample of 500 observations of (n − 1)S²/σ² is heavily positively skewed and resembles a χ²₂ distribution. The histograms for the other cases, where n = 5, 10 and 20, also resemble chi-squared distributions (the respective degrees of freedom should be 4, 9 and 19).
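A sketch of this experiment in R for a single value of n (the seed is an illustrative choice; σ = 1 here, so the statistic is just (n − 1)S²):

# 500 values of (n - 1) * S^2 / sigma^2 from N[0, 1] samples of size n
set.seed(3)
n <- 3
stat <- replicate(500, (n - 1) * var(rnorm(n)))
hist(stat, freq = FALSE, main = "Histogram for n = 3")
curve(dchisq(x, df = n - 1), add = TRUE)   # overlay the chi-squared(n - 1) density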