Unit 5

STAT 2000 – Unit 5
Carrie Madden

Unit 5 – Test for Proportions and Analysis of Categorical Data
and Goodness-of-fit Tests

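Review of Inference for a Single Population Proportion p

Now suppose we are interested in the proportion p̂ of successes in a
sample of size n, where X is the number of successes observed:
\[ \hat{p} = \frac{X}{n} \]
The mean and standard deviation of p̂ are
\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]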

Distribution of a Sample Proportion
The Central Limit Theorem says that, if X̄ is a sample mean and the
sample size is large, then the sampling distribution of X̄ is
approximately normal. Specifically,
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0, 1) \quad \text{(approximately)} \]
But we can think of p̂ as a kind of sample mean, because we are adding
up all the successes we observe and dividing by the sample size n.

Distribution of a Sample Proportion
Result
So when the sample size n is large,
\[ Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \sim N(0, 1) \quad \text{(approximately)} \]
We can safely use this approximation provided that
np ≥ 10 and n (1 − p) ≥ 10
and that the population size is very large compared to the sample size.

Example
Example
Suppose we randomly select 200 U of M students and ask them whether
they are left- or right-handed. Assuming 10% of all people are left-handed,
what is the probability that at least 25 (12.5%) of the students in our
sample are left-handed?
Solution
Let p̂ be the proportion of students in the sample who are left-handed.
Then for a sample of size 200, the mean and standard deviation of p̂ are
\[ \mu_{\hat{p}} = p = 0.10 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{0.10(0.90)}{200}} = 0.02121 \]

Example
We calculate
np = 200 (0.10) = 20 > 10 and n (1 − p) = 200 (0.90) = 180 > 10
Since the population is large, we can use the normal distribution. The
probability that at least 12.5% of the sampled students are left-handed is:
\[ P(\hat{p} \ge 0.125) = P\left(Z \ge \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right) = P\left(Z \ge \frac{0.125 - 0.10}{\sqrt{\dfrac{0.1(0.9)}{200}}}\right) \]
\[ = P(Z \ge 1.18) = 1 - P(Z < 1.18) = 1 - 0.8810 = 0.1190 \]

Example – R code
Just as a refresher to find P(Z > 1.18)
prob<- 1-pnorm(1.18)
prob
## [1] 0.1190001
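
The same probability can also be computed without first rounding Z to two decimals. The following is a minimal sketch for the left-handedness example; the variable names are illustrative and not from the slides, and the exact binomial probability is included only as a comparison.

```r
# Left-handedness example: n = 200 students, p = 0.10, at least 25 left-handed
n <- 200
p <- 0.10
p_hat <- 25 / n

# Standard deviation of the sample proportion and the unrounded z-score
sd_p_hat <- sqrt(p * (1 - p) / n)
z <- (p_hat - p) / sd_p_hat

# Normal approximation to P(p_hat >= 0.125)
prob_normal <- pnorm(z, lower.tail = FALSE)

# Exact binomial probability P(X >= 25) = 1 - P(X <= 24), for comparison
prob_exact <- pbinom(24, size = n, prob = p, lower.tail = FALSE)

c(z = z, normal = prob_normal, exact = prob_exact)
```

The normal approximation ignores the discreteness of the count X, so the two probabilities need not agree exactly.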

Example – Airlines
Example
A large airplane making a flight from Winnipeg to Toronto has 150 seats.
The airline knows from past records that 8% of customers do not show up
for their flight, and so they regularly overbook their flights. They have sold
160 tickets for this flight. What is the probability that all passengers who
arrive will get a seat?
Solution
Let p̂ be the proportion of passengers who do not show up for their flight.
The mean and standard deviation of p̂ are
\[ \mu_{\hat{p}} = p = 0.08 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.08(0.92)}{160}} = 0.0214 \]

Example
Solution
We can calculate np = 12.8 > 10 and n (1 − p) = 147.2 > 10, so we can
safely use the normal approximation. For everyone to get a seat, at least
10 passengers, or a proportion of 10/160 = 0.0625, must not show up.
Therefore, the probability that everyone who shows up gets a seat is
\[ P(\hat{p} \ge 0.0625) = P\left(Z \ge \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right) = P\left(Z \ge \frac{0.0625 - 0.08}{\sqrt{\dfrac{0.08(0.92)}{160}}}\right) \]
\[ = P(Z \ge -0.82) = 1 - P(Z < -0.82) = 1 - 0.2061 = 0.7939 \]

Example – R code
P(Z > −0.82)
prob<- 1-pnorm(-0.82)
prob
## [1] 0.7938919

Practice Question
It is known that 20% of a certain type of lottery ticket are winners. If you
buy 100 lottery tickets, what is the approximate probability that at least
25 of them are winners?
A 0.1056
B 0.1539
C 0.2061
D 0.2578
E 0.3085

Inference for a Population Proportion
We now examine inference methods for the case where the parameter of
interest is some population proportion p.
We estimate p by the sample proportion p̂, which has mean and standard
deviation
\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

Inference for a Population Proportion
Recall that, in order to use the normal distribution when doing probability
calculations for p̂, we required that np ≥ 10 and n (1 − p) ≥ 10, and that
the sample size was small compared to the population size.
Although we won’t formally verify this for each example in this unit, these
assumptions will hold for all of them, and so the use of the normal
distribution in doing probability calculations is justified.

Confidence Interval for a Population Proportion
We take a simple random sample of n individuals and calculate the
proportion p̂ that possess some characteristic of interest. A (1 − α) 100%
confidence interval for a population proportion p is given as
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
where z* is the upper α/2 critical value from the standard normal
distribution.
Ideally, we would like to use the true standard deviation of p̂ in the
formula, but it depends on p, which we don’t know (this is the reason for
doing inference!), so we replace p by p̂ and estimate the standard
deviation by the standard error of p̂.

Example – Obama
Example
In a survey of 1000 randomly selected Americans, 58% said they approve
of the job being done by President Barack Obama. The following is a 90%
confidence interval for the true proportion of all Americans who approve of
Obama:
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.58 \pm 1.645 \sqrt{\frac{0.58(0.42)}{1000}} = 0.58 \pm 0.026 = (0.554, 0.606) \]
Interpretation of the interval: If we repeatedly selected random samples

of 1000 Americans and constructed the confidence interval in a similar
manner, then 90% of all such intervals would contain the true proportion
of all Americans who approve of the job being done by Barack Obama.
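
The interval can be reproduced in R. This is a minimal sketch using the survey values from the example (p̂ = 0.58, n = 1000); the variable names are illustrative, and `qnorm` is used to obtain the critical value instead of the table value 1.645.

```r
# 90% confidence interval for a proportion: p_hat = 0.58, n = 1000
p_hat <- 0.58
n <- 1000
conf_level <- 0.90

z_star <- qnorm(1 - (1 - conf_level) / 2)  # upper alpha/2 critical value (about 1.645)
se <- sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat
margin <- z_star * se                      # margin of error
ci <- c(lower = p_hat - margin, upper = p_hat + margin)
round(ci, 3)
```
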
Practice Question
We would like to construct a 95% confidence interval for the proportion of

people who take vitamins regularly. A random sample of 900 individuals
has been selected from a large population. It was found that 180 take
vitamins regularly. The standard error of this estimate is:
A 0.1600
B 0.0002
C 0.0261
D 0.0133
E 0.0298

Sample Size Determination
Suppose we would like to select a sample of individuals large enough to
estimate some population proportion p to within a specified margin of
error m with a given level of confidence.
\[ m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad \Rightarrow \quad n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) \]

Sample Size Determination
But we have a problem – we are at the stage where we have not yet
selected the sample, and so we don’t know the value of p̂. We will estimate
the value of p by some value p ∗ . We can either use an educated guess for
p ∗ , or we can use a conservative estimate p ∗ = 0.5, which will result in a
margin of error no greater than m, regardless of the sample proportion p̂.

Example
We therefore use the following formula to determine the required sample
size, rounding up to the next whole number:
\[ n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) \]
If we believe the value of p is relatively close to 0.5 (say, between 0.3 and
0.7), we should use p ∗ = 0.5. Otherwise, we use some educated guess.

Example
Example
Suppose we would like to take a sample large enough to estimate the true
proportion of all consumers who prefer Pepsi over Coke to within 3% with
95% confidence. We require a sample of size
\[ n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.96}{0.03}\right)^2 (0.5)(0.5) = 1067.11 \approx 1068 \]

Example
Using the conservative estimate p ∗ = 0.5 does not result in a much higher
sample size than if we had used p ∗ = 0.3 or p ∗ = 0.7, for which the
required sample size would be n = 897.
However, if we suspect the sample proportion will be quite far from 0.5,
we may want to use an educated guess for p ∗ .

Example
Example
Suppose we would like to estimate the true proportion of all Canadians
who are left-handed to within 0.02 with 90% confidence. If we use
p* = 0.5, we require a sample of size
\[ n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.02}\right)^2 (0.5)(0.5) = 1691.3 \approx 1692 \]

Example
Example
But we know the proportion of people who are left-handed is much lower
than 0.5. Suppose we believe the true proportion is somewhere close to
0.10. Using p* = 0.10, we require a sample size
\[ n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.02}\right)^2 (0.1)(0.9) = 608.9 \approx 609 \]

Example
We see that we would be taking much too large a sample if we used

p ∗ = 0.5. This would give us a confidence interval with a margin of error
much smaller than what we originally wanted.
A small margin of error is good, but we decided we were happy estimating
the true proportion to within 2%, and we see that if we use a more
reasonable value of p* (i.e., 0.10), we need to sample 1083 fewer
individuals.
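
The sample size calculations in these examples can be collected into a small helper. The sketch below is illustrative: `sample_size` is a hypothetical function name, not something from the slides, and it takes the table critical values (1.96, 1.645) as arguments so that the results match the worked examples.

```r
# Required sample size for margin of error m, critical value z_star and
# planning value p_star (hypothetical helper, not from the slides)
sample_size <- function(z_star, m, p_star = 0.5) {
  ceiling((z_star / m)^2 * p_star * (1 - p_star))  # always round up
}

n_pepsi <- sample_size(1.96, 0.03)           # Pepsi example, conservative p* = 0.5
n_left_05 <- sample_size(1.645, 0.02)        # left-handedness, p* = 0.5
n_left_01 <- sample_size(1.645, 0.02, 0.10)  # left-handedness, p* = 0.10

c(pepsi = n_pepsi, left_conservative = n_left_05, left_guess = n_left_01)
n_left_05 - n_left_01  # how many fewer individuals the educated guess requires
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed m.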

Practice Question
We would like to estimate the true proportion of people who use cell
phones while driving to within 0.05 with 95% confidence. What sample
size is required?
A 49
B 385
C 91
D 196
E 246

Practice Question
A researcher calculates that, in order to estimate the true proportion of

Canadian adults who smoke cigarettes to within 0.03 with 90% confidence,
she requires a sample of 360 Canadians. What sample size would be
required in order to estimate the true proportion of Canadian adults who

smoke cigarettes to within 0.01 with 90% confidence?
A 40
B 120
C 360
D 1,080
E 3,240

Hypothesis Tests for a Population Proportion
We can also conduct hypothesis tests for a population proportion p.
Example
The subject of the long gun registry has been very controversial in recent
years. We would like to conduct a hypothesis test, at the 5% level of
significance, to determine whether a majority of Canadians support the
registry. In a simple random sample of 850 Canadians, 459 (54%)
indicated their support for the registry.

Example
Solution
1 Level of significance:
Let α = 0.05.
2 Hypothesis:
H0 : Canadians are evenly split in their support for the long gun registry.
Ha : A majority of Canadians support the long gun registry.
Equivalently,
H0 : p = 0.5 vs. Ha : p > 0.5

Example
Solution
3 Decision rule:
Reject H0 at α = 0.05 if the p-value ≤ α.

4 Test statistic:
\[ Z = \frac{0.54 - 0.50}{\sqrt{\dfrac{0.50(0.50)}{850}}} = 2.33 \]
Notice that we are assuming that p = p0 (the value of p in the null

hypothesis) in the calculation of the test statistic, just as we assumed
µ = µ0 in calculating the test statistic when conducting a test for a
population mean.
We always calculate the test statistic assuming H0 is true.

Example
Solution
5 The p-value is
\[ P(\hat{p} \ge 0.54 \mid p = 0.50) = P(Z \ge 2.33) = 1 - P(Z < 2.33) = 1 - 0.9901 = 0.0099 \]
Here, the p-value = 0.0099 < α = 0.05, so we reject the null
hypothesis.
6 Conclusion:
We have sufficient evidence to conclude that more than 50% of
Canadians support the long gun registry.
Interpretation of the P-value: If the true proportion of Canadians who

support the gun registry was 0.5, the probability of observing a sample
proportion at least as high as 0.54 would be 0.0099.
Example
Suppose we had instead used the critical value method to conduct the test.
Our decision rule would be to reject H0 if Z ≥ z ∗ = 1.645.
We would reject H0 since Z = 2.33 > z ∗ = 1.645.
Our conclusion would be that we have sufficient evidence to conclude that

more than 50% of Canadians support the long gun registry.

Example – Gun Registry R Code
prop.test(459, 850, p = 0.50,correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 459 out of 850, null probability 0.5
## X-squared = 5.44, df = 1, p-value = 0.01968
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5063896 0.5732504
## sample estimates:
## p
## 0.54
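
Note that the call above uses the default two-sided alternative of `prop.test`, so the reported p-value (0.01968) is twice the one-sided p-value from the worked solution. As a hedged sketch, the one-sided test of Ha: p > 0.5 can be requested with the `alternative` argument; the object names `res`, `z` and `p_value` are illustrative.

```r
# One-sided test of H0: p = 0.5 vs Ha: p > 0.5 for the gun registry data
res <- prop.test(459, 850, p = 0.50, alternative = "greater", correct = FALSE)

# For a one-sample test, X-squared is the square of the Z statistic.
# Taking the square root loses the sign; Z is positive here because
# the sample proportion (0.54) exceeds the null value (0.50).
z <- sqrt(unname(res$statistic))
p_value <- res$p.value

c(z = z, p_value = p_value)
```

The small difference from the hand calculation (0.0099) comes from rounding Z to 2.33 before using the table.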

Practice Question
Shortly after the introduction of the Euro coin in Belgium, newspapers

around the world published articles claiming the coin was biased. A
hypothesis test is to be conducted to determine if the Euro coin really is
unfair. Let p denote the true proportion of all flips of the coin that would
result in heads (and so p̂ is the proportion of heads in the sample). The
hypotheses for the appropriate test of significance are:
A H0 : p = p̂ vs. Ha : p ≠ p̂
B H0 : p = 0.5 vs. Ha : p > 0.5
C H0 : p̂ = 0.5 vs. Ha : p̂ > 0.5
D H0 : p = 0.5 vs. Ha : p ≠ 0.5
E H0 : p̂ = 0.5 vs. Ha : p̂ ≠ 0.5

Example
Example
In the 2011 Canadian general election, the NDP received 30.6% of all votes
cast. The party would like to determine whether their popular support has
changed since the election, using α = 0.05. They take a simple random
sample of 500 voters, 141 of whom say they support the NDP.
Also, calculate a 95% confidence interval for the true proportion of all
voters who support the NDP.

Example
Solution
We calculate p̂ = 141/500 = 0.282. So, our 95% confidence interval is
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.282 \pm 1.96 \sqrt{\frac{0.282(0.718)}{500}} = 0.282 \pm 0.039 = (0.243, 0.321) \]
We will now conduct a hypothesis test to determine whether there is

evidence that the popular support for the NDP has changed since the last
election.

Example
Solution
1 Let α = 0.05.
2 We are testing the hypotheses:
H0 : The popular support for the NDP is the same as last election.
Ha : The popular support for the NDP has changed since the
last election.
Equivalently,
H0 : p = 0.306
Ha : p ≠ 0.306
3 Reject H0 if the p − value ≤ α = 0.05.

Example
Solution
4 The test statistic is:
\[ Z = \frac{0.282 - 0.306}{\sqrt{\dfrac{0.306(0.694)}{500}}} = -1.16 \]
5 The p-value is
\[ 2P(Z \le -1.16) = 2(0.1230) = 0.2460 \]
Since the p-value = 0.2460 > α = 0.05, we fail to reject H0 at the 5%
level of significance.
6 We have insufficient evidence to conclude that the NDP’s popular
support has changed since the last election.

Example
Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if |Z | ≥ z ∗ = 1.96 (i.e., if
Z ≤ −1.96 or Z ≥ 1.96), where z ∗ = 1.96 is the upper 0.025 critical value
for the standard normal distribution.
We would fail to reject H0 since −1.96 < z = −1.16 < 1.96.
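
Both the p-value method and the critical value method for this two-sided test can be sketched in a few lines of R. The variable names are illustrative; because Z is not rounded to two decimals here, the p-value differs slightly from the table-based 0.2460.

```r
# Two-sided z test of H0: p = 0.306 vs Ha: p != 0.306 (NDP example)
x <- 141
n <- 500
p0 <- 0.306

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value
reject <- abs(z) >= qnorm(0.975)             # critical value method, alpha = 0.05

list(z = z, p_value = p_value, reject = reject)
```
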

Practice Question
A drug company manufactures antacid that is known to be successful in
providing relief for 70% of people with heartburn. The company tests a new
formula to determine whether it is better than the current one. A total of
150 people with heartburn try the new antacid and 120 report feeling
some relief. The appropriate test statistic for testing whether the new
formula is more effective than the old formula is:
A \( Z = \dfrac{0.8 - 0.7}{\sqrt{\dfrac{0.8(0.2)}{150}}} \)
B \( Z = \dfrac{0.7 - 0.8}{\sqrt{\dfrac{0.7(0.3)}{150}}} \)
C \( Z = \dfrac{0.8 - 0.7}{\sqrt{\dfrac{0.7(0.3)}{150}}} \)
0.8 − 0.7
Power Calculations
We can also calculate the power of hypothesis tests for proportions. For
example, we conducted a test of
H0 : p = 0.50 vs. Ha : p > 0.50
for the proportion of Canadians who support the long gun registry at the
5% level of significance. What would be the power of the test if the true
proportion of Canadians who supported the registry was 0.55?
Power = P(reject H0 | p = 0.55)

Example
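Step 1:
Find the rejection rule in terms of p̂, assuming H0 is true (p = 0.50):
Reject H0 if Z ≥ 1.645, where
\[ Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \]
\[ \Rightarrow \hat{p} \ge 0.50 + 1.645 \sqrt{\frac{0.5(0.5)}{850}} \quad \Rightarrow \quad \hat{p} \ge 0.5282 \]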

Example
Step 2:
Find the probability of rejecting H0 assuming Ha is true (p = 0.55).
\[ \text{Power} = P\left(Z \ge \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right) = P\left(Z \ge \frac{0.5282 - 0.55}{\sqrt{\dfrac{0.55(0.45)}{850}}}\right) \]
\[ = P(Z \ge -1.28) = 1 - P(Z < -1.28) = 1 - 0.1003 = 0.8997 \]

Example
A health organization claims that less than one quarter of all adults smoke
cigarettes. We would like to conduct a hypothesis test, at the 10% level of
significance, to verify this claim. We will select a simple random sample of
250 adults and ask them whether they smoke. What is the power of the
test if the true proportion of adults who smoke is 0.20?
We want the power of the test if H0 : p = 0.25 vs. Ha : p < 0.25 when
p = 0.20.

Example
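Step 1:
Find the rejection rule in terms of p̂, assuming H0 is true (p = 0.25):
Reject H0 if Z ≤ −1.282, where
\[ Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \]
\[ \Rightarrow \hat{p} \le 0.25 - 1.282 \sqrt{\frac{0.25(0.75)}{250}} \quad \Rightarrow \quad \hat{p} \le 0.2149 \]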

Example
Step 2:
Find the probability of rejecting H0 assuming Ha is true (p = 0.20).
\[ \text{Power} = P\left(Z \le \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right) = P\left(Z \le \frac{0.2149 - 0.20}{\sqrt{\dfrac{0.20(0.80)}{250}}}\right) \]
\[ = P(Z \le 0.59) = 0.7224 \]
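
The two-step power calculation can be wrapped in a short function. This is a sketch under stated assumptions: `power_prop` is a hypothetical helper (not from the slides), it handles only one-sided alternatives, and it uses `qnorm` for the critical value, so its results differ from the table-based answers in the third or fourth decimal place.

```r
# Power of a one-sided z test for a proportion (hypothetical helper)
power_prop <- function(p0, p_true, n, alpha, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  se0 <- sqrt(p0 * (1 - p0) / n)          # SE under H0 (Step 1: sets the cutoff)
  se1 <- sqrt(p_true * (1 - p_true) / n)  # SE under the true proportion (Step 2)
  if (alternative == "greater") {
    cutoff <- p0 + qnorm(1 - alpha) * se0  # reject H0 if p_hat >= cutoff
    pnorm((cutoff - p_true) / se1, lower.tail = FALSE)
  } else {
    cutoff <- p0 - qnorm(1 - alpha) * se0  # reject H0 if p_hat <= cutoff
    pnorm((cutoff - p_true) / se1)
  }
}

power_gun <- power_prop(0.50, 0.55, n = 850, alpha = 0.05, alternative = "greater")
power_smoke <- power_prop(0.25, 0.20, n = 250, alpha = 0.10, alternative = "less")
c(gun_registry = power_gun, smoking = power_smoke)
```
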

Inference Comparing Two Proportions
We will now turn our attention to the case where we wish to compare two
population proportions.
Let p1 be the true proportion of all individuals in Population 1 who possess
some attribute. Let p2 be the true proportion of all individuals in
Population 2 who possess the same attribute.

We would like to estimate the difference in population proportions p1 − p2 .

To do this, we will take an SRS of size n1 from Population 1 and an SRS
of size n2 from Population 2.
We will calculate p̂1 and p̂2 , the sample proportions from the first and
second samples, respectively.

Our estimate of p1 − p2 is p̂1 − p̂2 .

When estimating p1 − p2 , we won’t know the standard deviation
of p̂1 − p̂2 , which is a function of the population proportions p1 and p2 .
We will estimate the standard deviation by the standard error of p̂1 − p̂2 :
\[ SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]

Confidence Intervals
A level C confidence interval for the difference in population proportions
p1 − p2 is
\[ (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
where z* is the upper (1 − C)/2 critical value from the standard normal
distribution.
The use of the normal distribution as an approximation is only appropriate
if n1 p̂1 , n1 (1 − p̂1 ), n2 p̂2 and n2 (1 − p̂2 ) are all greater than or equal to
ten (i.e., if there are at least ten successes and ten failures in each of the
two samples).

Example – Legalization
Example
Do older adults and young adults have different views on legalizing
marijuana in Canada? A sample of 150 young adults (aged 18 – 30) and a
sample of 120 older adults (aged 40+) were selected. Respondents were
asked if they supported the legalization of marijuana. Of the young adults,
87 indicated their support, while 54 of the older adults supported
legalization of the drug.

Let p1 be the true population proportion of all young adults who support
the legalization of marijuana and let p2 be the true population proportion
of all older adults who support it. We calculate the sample proportions:
\[ \hat{p}_1 = \frac{87}{150} = 0.58 \qquad \hat{p}_2 = \frac{54}{120} = 0.45 \]
We check
n1 p̂1 = 150 (0.58) = 87 > 10    n1 (1 − p̂1 ) = 150 (0.42) = 63 > 10
n2 p̂2 = 120 (0.45) = 54 > 10    n2 (1 − p̂2 ) = 120 (0.55) = 66 > 10
so the use of the normal approximation is justified.

Let us calculate a 95% confidence interval for the difference in the
population proportions p1 − p2 . To calculate the 95%
confidence interval, we need to first find the standard error of p̂1 − p̂2 :
\[ SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.58(1-0.58)}{150} + \frac{0.45(1-0.45)}{120}} = 0.0607 \]
Therefore, a 95% confidence interval is:
\[ (0.58 - 0.45) \pm (1.96)(0.0607) = (0.0110, 0.2490) \]

Interpretation of the confidence interval: If we took repeated samples

of the same sizes from the same populations and calculated the interval in
a similar manner, then 95% of all such intervals would contain the true
difference in population proportions of young and older adults who support
the legalization of marijuana in Canada.
If we had decided to label older adults as Population 1 and young adults as

Population 2, our interval would be
(−0.2490, −0.0110)
Negative values do not indicate negative proportions (which do not exist),

but rather negative differences in population proportions.
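
The interval for the legalization example, and the effect of swapping the population labels, can be checked with a short R sketch. The variable names are illustrative and `qnorm(0.975)` is used in place of the table value 1.96.

```r
# 95% CI for p1 - p2 in the legalization example
x <- c(young = 87, older = 54)    # number of supporters in each sample
n <- c(young = 150, older = 120)  # sample sizes
p_hat <- x / n

se <- sqrt(sum(p_hat * (1 - p_hat) / n))  # unpooled standard error
z_star <- qnorm(0.975)
diff_hat <- p_hat[["young"]] - p_hat[["older"]]
ci <- diff_hat + c(-1, 1) * z_star * se
round(ci, 4)

# Labelling older adults as Population 1 only flips the signs of the endpoints
ci_swapped <- rev(-ci)
round(ci_swapped, 4)
```
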

Practice Question
An SRS of 100 flights of a large airline (Airline 1) showed that 79 were on

time. An SRS of 150 flights of another airline (Airline 2) showed that 96
were on time. Let p1 and p2 be the proportions of all flights that are on
time for these two airlines, respectively. A 90% confidence interval for
p1 − p2 is:
A 0.15 ± 1.645 (0.068)
B 0.15 ± 1.645 (0.079)
C 0.15 ± 1.645 (0.057)
D 0.15 ± 1.645 (0.084)
E 0.15 ± 1.645 (0.093)

Practice Question
A university administrator would like to estimate the true proportion of
students at the university with student loans. She takes a random sample
of 137 students at the university and calculates a 95% confidence interval
to be (0.46, 0.59). What is the correct interpretation of this confidence
interval?
A 95% of samples of 137 students will give proportions between 0.46
and 0.59.
B 95% of similarly constructed intervals will contain the sample
proportion of students at the university with loans.
C 95% of similarly constructed intervals will contain the true proportion
of students at the university with loans.
D The probability that the true proportion is between 0.46 and 0.59 is
95%.
E Between 46% and 59% of students have loans.
Hypothesis Testing
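Recall:
For large sample sizes,
\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]
However, if the null hypothesis is true, p1 = p2 , and so p1 − p2 = 0, and
so under H0 ,
\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]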

Hypothesis Testing
But if p1 = p2 , then the proportions in the denominator are really the
same proportion, say p.
We will estimate this common proportion p (= p1 = p2 ) by the pooled
sample proportion p̂c :
\[ \hat{p}_c = \frac{\text{total successes in both samples}}{\text{total observations in both samples}} = \frac{x_1 + x_2}{n_1 + n_2} \]

Hypothesis Testing
The appropriate test statistic for this test of significance is therefore
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
The use of the normal distribution as an approximation is only appropriate
if n1 p̂1 , n1 (1 − p̂1 ), n2 p̂2 and n2 (1 − p̂2 ) are all greater than or equal to
ten (i.e., if there are at least ten successes and ten failures in each of the
two samples).

Example – Legalization Hypothesis Test
Is the true proportion of young adults who support the legalization of

marijuana in Canada greater than that for older adults?
1 Let α = 0.05.
2 We are testing the hypotheses
H0 : p1 = p2 vs. Ha : p1 > p2
3 We will reject the null hypothesis if the p-value ≤ α = 0.05.

We calculate the pooled sample proportion
\[ \hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{87 + 54}{150 + 120} = \frac{141}{270} = 0.5222 \]
4 The test statistic is
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.58 - 0.45}{\sqrt{0.5222(1-0.5222)\left(\dfrac{1}{150} + \dfrac{1}{120}\right)}} = 2.12 \]


5 The p-value is
P(Z ≥ 2.12) = 1 − P(Z < 2.12) = 0.0170
Since the p-value = 0.0170 < α = 0.05, we reject H0 .

6 There is sufficient evidence that the true proportion of young adults
who support the legalization of marijuana in Canada is greater than
that for older adults.
Interpretation of the P-value: If the true proportions of young and older
adults who support the legalization of marijuana in Canada were equal, the
probability of observing a difference in sample proportions at least as
large as 0.13 would be 0.0170.

Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if Z ≥ z* = 1.645.
We would still reject H0 , since z = 2.12 > z* = 1.645.

Example – Legalization Hypothesis Test – R Code
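# The "greater" alternative and correct = FALSE are assumed here;
# they match the header of the printed output that follows.
res <- prop.test(x = c(87, 54), n = c(150, 120),
                 alternative = "greater", correct = FALSE)

# Printing the results
res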

Example – Legalization Hypothesis Test – R Code
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: c(87, 54) out of c(150, 120)
## alternative hypothesis: greater
## 0.03013015 1.00000000
## prop 1 prop 2
## 0.58 0.45
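
To connect the R output with the hand calculation, the pooled test statistic can be computed directly and compared with `prop.test`. This is a sketch with illustrative variable names; it assumes the one-sided alternative and no continuity correction, as indicated in the output header above.

```r
# Pooled two-sample z test for the legalization example (Ha: p1 > p2)
x <- c(87, 54)
n <- c(150, 120)
p_hat <- x / n

p_c <- sum(x) / sum(n)  # pooled sample proportion
z <- (p_hat[1] - p_hat[2]) / sqrt(p_c * (1 - p_c) * sum(1 / n))
p_value <- pnorm(z, lower.tail = FALSE)  # upper-tailed p-value

# prop.test reports X-squared, which equals z^2 for two samples
res <- prop.test(x, n, alternative = "greater", correct = FALSE)

c(z = z, p_value = p_value, x_squared = unname(res$statistic))
```
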

Example – Smoking
We can also use the two-sample test for proportions to compare two
treatments in an experiment.
Example
We would like to compare the effectiveness of two popular treatments that
are designed to help smokers quit smoking. A sample of 125 smokers who
have expressed a desire to quit smoking volunteer to participate in an
experiment. The 63 subjects in Group 1 are assigned to chew nicotine gum
and the 62 subjects in Group 2 are assigned to wear a nicotine patch. At
the end of six months, 22 of the subjects in Group 1 and 17 of the
subjects in Group 2 have quit smoking.

Example – Smoking (CI)

Let p1 be the true proportion of all smokers who chew nicotine gum that
are able to quit smoking, and let p2 be the true proportion of all smokers
who wear a nicotine patch that are able to quit smoking. We calculate the
sample proportions:
\[ \hat{p}_1 = \frac{22}{63} = 0.3492 \qquad \hat{p}_2 = \frac{17}{62} = 0.2742 \]
We check
n1 p̂1 = 63 (0.3492) = 22 > 10    n1 (1 − p̂1 ) = 63 (0.6508) = 41 > 10
n2 p̂2 = 62 (0.2742) = 17 > 10    n2 (1 − p̂2 ) = 62 (0.7258) = 45 > 10
so the use of the normal approximation is justified.

We would like to construct a 95% confidence interval for the true
difference in the proportions of smokers who are able to quit when chewing
nicotine gum and when wearing a nicotine patch.
The standard error of p̂1 − p̂2 is:
\[ SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.3492(1-0.3492)}{63} + \frac{0.2742(1-0.2742)}{62}} = 0.0826 \]
Therefore, a 95% confidence interval for p1 − p2 is:
\[ (0.3492 - 0.2742) \pm 1.96(0.0826) = (-0.0869, 0.2369) \]

Example – Smoking (Hypothesis Test)
We will now conduct a hypothesis test to determine whether there is a
difference in effectiveness for nicotine gum and the nicotine patch.
1 Let α = 0.05.
2 We are testing the hypotheses: H0 : p1 = p2 vs. Ha : p1 ≠ p2
3 We will reject the null hypothesis if the p-value ≤ α = 0.05.

We calculate the pooled sample proportion
\[ \hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{22 + 17}{63 + 62} = \frac{39}{125} = 0.312 \]
4 The test statistic is
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.3492 - 0.2742}{\sqrt{0.312(1-0.312)\left(\dfrac{1}{63} + \dfrac{1}{62}\right)}} = 0.90 \]
5 The p-value is
\[ 2P(Z \ge 0.90) = 2(1 - P(Z < 0.90)) = 2(0.1841) = 0.3682 \]
Since the p-value = 0.3682 > α = 0.05, we fail to reject H0 .
6 There is insufficient evidence that there is a difference in effectiveness
between nicotine gum and the nicotine patch.

Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if |Z | ≥ z* = 1.96 (i.e., if
z ≤ −1.96 or z ≥ 1.96), where z* = 1.96 is the upper 0.025 critical value
from the standard normal distribution.
We would fail to reject H0 , since −1.96 < z = 0.90 < z* = 1.96.

##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: c(22, 17) out of c(63, 62)
## alternative hypothesis: two.sided
## -0.08681405 0.23683966
## prop 1 prop 2
## 0.3492063 0.2741935
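
The call that produced this output is not shown on the slide. A plausible reconstruction, offered as an assumption, is a two-sided `prop.test` without continuity correction; the sketch below extracts the pieces needed for the hand calculation.

```r
# Two-sided test of H0: p1 = p2 for the smoking example (assumed call)
res <- prop.test(x = c(22, 17), n = c(63, 62), correct = FALSE)

z <- sqrt(unname(res$statistic))  # |Z|; X-squared equals Z^2 for two samples
p_value <- res$p.value            # two-sided p-value
ci <- res$conf.int                # 95% CI for p1 - p2 (unpooled standard error)

list(z = z, p_value = p_value, ci = as.numeric(ci))
```

The confidence limits agree with the output above, and the p-value is close to the table-based 0.3682; the small gap comes from rounding Z to 0.90.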

Practice Question
An SRS of 200 of a certain model of 2010 car found that 50 had minor
brake defects. An SRS of 100 of the same model of 2011 car found that 10
had minor brake defects. Let p1 and p2 be the true proportions of all 2010
and 2011 cars with brake defects. We wish to conduct a hypothesis test of
H0 : p1 = p2 vs. Ha : p1 ≠ p2
at the 5% level of significance. The value of the appropriate test statistic

is:
A 3.50
B 2.74
C 4.03
D 3.06
E 3.64
Practice Questions
An SRS of 200 of a certain model of 2010 car found that 50 had minor
brake defects. An SRS of 100 of the same model of 2011 car found that 10
had minor brake defects. Let p1 and p2 be the true proportions of all 2010
and 2011 cars with brake defects. We wish to conduct a hypothesis test of
H0 : p1 = p2 vs. Ha : p1 ≠ p2
at the 5% level of significance. The test statistic is calculated to be 3.06.

What is the p-value of the test?
A 0.0006
B 0.9989
C 0.0022
D 0.0011
E 0.9978
Practice Question
We wish to conduct a test of significance to determine whether there is

evidence that the true proportion of females who smoke cigarettes is
greater than that for males. We would make a Type I Error if we conclude
that:
A pf > pm when in fact pm > pf .
B pf = pm when in fact pf > pm .
C pf ≠ pm when in fact pf > pm .
D pm > pf when in fact pf > pm .
E pf > pm when in fact pf = pm .

Contingency Tables
Consider the following contingency table classifying each individual in a
sample by both eye colour and hair colour:

                      Hair Colour
Eye Colour   Blonde   Red   Brown   Black   Grey
Brown            11     8      39      14      6
Blue             15     7      16       3     10
Green             9     4      12       2      5

The total number of subjects characterized by one value of each variable is
placed in the cell formed by the intersection of the two categories. For
example, there are 12 subjects in the sample with brown hair and green
eyes.

Contingency Tables in R
The contingency table, followed by the row totals and column totals:
##       Blonde Red Brown Black Grey
## Brown     11   8    39    14    6
## Blue      15   7    16     3   10
## Green      9   4    12     2    5
## Brown  Blue Green
##    78    51    32
## Blonde    Red  Brown  Black   Grey
##     35     19     67     19     21

Contingency Tables in R
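The following code builds the contingency table and prints it, followed by
the row totals and column totals:
# Enter the observed counts row by row (rows = eye colour, columns = hair colour)
colour <- matrix(c(11, 8, 39, 14, 6, 15, 7, 16, 3, 10, 9, 4, 12, 2, 5),
                 nrow = 3, ncol = 5, byrow = TRUE)
dimnames(colour) <- list(c("Brown", "Blue", "Green"),
                         c("Blonde", "Red", "Brown", "Black", "Grey"))
colour2 <- data.frame(colour)
colour2
rowSums(colour2)  # row totals (eye colour)
colSums(colour2)  # column totals (hair colour)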

Contingency Table
The entries in the table are referred to as observed cell frequencies and are
denoted by O.
The previous two-way table is called a 3 × 5 table, since there are three
rows and five columns (i.e., three possible eye colours and five possible
hair colours). In general, a two-way table with r rows and c columns is
called an r × c table.
Contingency tables are very useful in helping us conduct several different
tests of significance. The first use we will make of these tables is to
conduct tests of significance for the homogeneity of several populations
with respect to some variable of interest.

Example – Ecology Department

The Ecology department head is examining the course evaluations for four
different sections of an introductory course taught last semester by four
different instructors. He would like to know if the opinions of students are
homogeneous with respect to the quality of instruction they received from
their respective professors.
One question on the course evaluation reads:
“Overall, I would say this professor is. . . ”
Students indicate whether they found their professor to be Very Good,
Good, Average, Poor, or Very Poor.
The department head regroups these ratings into three categories: Positive
(i.e., Very Good or Good), Neutral (i.e., Average) and Negative (i.e., Poor
or Very Poor).
The results for a sample of students in each of the four classes are shown
in the two-way table below:

                   Section
Rating      A01   A02   A03   A04
Positive     22    16    25    10
Neutral      14    21    13    14
Negative      4    10     7    19

Inference for Homogeneous Populations
How can these data be analyzed to determine if the opinions of the
students in each of the classes are homogeneous with respect to the
quality of teaching they received?
That is, we want to test the hypotheses
H0 : Opinions of students are homogeneous with respect to the quality

of teaching they received.
Ha : Opinions of students are not homogeneous with respect to the quality
of teaching they received.

Inference for Homogeneous Populations
In other words, we are testing whether the proportions of positive, neutral
and negative ratings are the same for all four professors.
In conducting a test of significance, we must always ask,
“If the null hypothesis were true, what would we expect?”
In other words, what would we expect to see if students’ opinions really
were homogeneous for all four classes?

If all students are equally satisfied, then:
- the same proportion should rate their professor as positive for each
  section,
- the same proportion should rate their professor as neutral for each
  section, and
- the same proportion should rate their professor as negative for each
  section.

How do we estimate this common proportion for each of the three rating
categories?
To help us answer this question, we examine the table again, this time
with row, column and table totals included:
                       Section
Rating          A01   A02   A03   A04   Row Total
Positive         22    16    25    10          73
Neutral          14    21    13    14          62
Negative          4    10     7    19          40
Column Total     40    47    45    43         175

The column totals represent the sample sizes from each of the four classes
and the row totals represent the total number of positive, neutral and
negative ratings given by all students in the sample. Note that we can
obtain the table total (which in this case is 175) by adding the row totals
or the column totals.

Let us examine the row for Positive ratings. The estimated proportion p̂
of all students who rate their professor as positive is
\[ \hat{p} = \frac{\text{total number of positive ratings}}{\text{total number of students in the sample}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{73}{175} = 0.4171 \]
As such, if opinions are homogeneous for all four classes, we would expect
to see 41.71% of students in each class give a positive rating.

Expected Cell Counts

For example, the expected count of A02 students who gave a positive
rating is
\[ E = \hat{p} \times (\text{number of responses for A02}) = \frac{(\text{row 1 total})(\text{column 2 total})}{\text{table total}} = \frac{(73)(47)}{175} = 19.61 \]
By a similar argument, the expected count for the cell at the intersection
of the r th row and the c th column is
\[ E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}} \]

The expected counts for all cells are calculated similarly and are displayed in the following table, below the observed counts (in parentheses):
                              Section
Rating          A01       A02       A03       A04       Row Total
Positive        22        16        25        10        73
                (16.69)   (19.61)   (18.77)   (17.94)
Neutral         14        21        13        14        62
                (14.17)   (16.65)   (15.94)   (15.23)
Negative        4         10        7         19        40
                (9.14)    (10.74)   (10.29)   (9.83)
Column Total    40        47        45        43        175
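The expected counts in the table can be reproduced from the row and column totals alone. The following is a minimal sketch in R, assuming the observed counts are entered row by row; the object names `observed` and `expected` are illustrative and do not come from the slides.

```r
# Observed counts from the ratings table (rows: ratings, columns: sections)
observed <- matrix(c(22, 16, 25, 10,
                     14, 21, 13, 14,
                      4, 10,  7, 19),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Positive", "Neutral", "Negative"),
                                   c("A01", "A02", "A03", "A04")))

# E = (row total)(column total) / (table total), computed for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Rounded to two decimals for comparison with the table above
round(expected, 2)
```

The call to `outer()` forms every product of a row total with a column total, so dividing by the table total applies the expected-count formula to all twelve cells in one step.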

The test statistic we will use to test for homogeneity of these four populations measures how far our observed counts are from our expected counts. The test statistic is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
Under the null hypothesis of homogeneous populations, this test statistic follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns in the table.

Chi-Square Distributions
The chi-square distributions are a family of right-skewed distributions completely characterized by their degrees of freedom (i.e., the degrees of freedom is the only parameter).

Chi-Square Distributions
[Figure: Chi-Square at Various Degrees of Freedom. Density curves for df = 5, 15 and 30; horizontal axis: Chi-square, vertical axis: Density.]

Test Statistic
If the null hypothesis is true, then we will likely have observed cell counts which are quite close to their expected cell counts, and the value of the test statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
should be quite low. On the other hand, if the populations are not homogeneous, observed cell counts will differ substantially from expected cell counts and the value of the test statistic will be high.

As such, we will reject the null hypothesis of homogeneity if the value of the test statistic is high, namely if it exceeds the upper α critical value from the χ2 distribution with (r − 1) (c − 1) degrees of freedom (or equivalently, if the p-value is less than or equal to α). Selected critical values for the χ2 distribution are given in Table 5. Chi-square tests are always upper-tailed.

Example
For example, we see from the table that
P(χ2 (4) ≥ 6.74) = 0.15

Example
# Upper-tail probability P(χ2(4) ≥ 6.74)
p <- pchisq(6.74, 4, lower.tail = FALSE)
p
## [1] 0.1502827

Like the z procedures for comparing two proportions, the chi-square test
for homogeneity is an approximate method. The approximation becomes
more accurate as the observed counts in the cells become larger.
In practice, we can safely use the chi-square distribution in our tests for
homogeneity if:
no more than 20% of expected cell counts are less than five, and
there are no expected cell counts less than one.

In our example, none of the expected cell counts are less than five, and so
the chi-square approximation is justified. We will now conduct the formal
hypothesis test from the beginning.
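The two conditions above are easy to check mechanically once the expected counts are available. Here is a small sketch in R, assuming the same ratings data as in the example; the names `share_below_five` and `smallest_expected` are hypothetical and chosen only for readability.

```r
# Observed counts (rows: Positive, Neutral, Negative; columns: A01 to A04)
observed <- matrix(c(22, 16, 25, 10,
                     14, 21, 13, 14,
                      4, 10,  7, 19),
                   nrow = 3, byrow = TRUE)

# Expected counts under homogeneity
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Condition 1: no more than 20% of expected counts are less than five
share_below_five <- mean(expected < 5)

# Condition 2: no expected count is less than one
smallest_expected <- min(expected)

# TRUE when both conditions hold
conditions_met <- (share_below_five <= 0.20) && (smallest_expected >= 1)
conditions_met
```

Taking the mean of a logical matrix gives the proportion of cells for which the comparison is true, which is exactly the quantity the 20% rule refers to.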

1 Let α = 0.05.
2 H0 : Opinions of students are homogeneous with respect to the quality of teaching they received.
Ha : Opinions of students are not homogeneous with respect to the quality of teaching they received.
3 We will reject H0 if the p-value ≤ α = 0.05.

In order to compute the test statistic, we must first calculate each of the
cell chi-square values separately.
For instance, we previously calculated that the expected count for the first
row and second column (positive ratings from A02 students) is 19.61. The
observed cell count is 16, and so the cell chi-square value is
$$\frac{(O - E)^2}{E} = \frac{(16 - 19.61)^2}{19.61} = 0.66$$
Other cell chi-square values are calculated similarly and are shown in the
following table below both the observed and expected cell counts (and are
in brackets).

                              Section
Rating          A01       A02       A03       A04       Row Total
Positive        22        16        25        10        73
                (16.69)   (19.61)   (18.77)   (17.94)
                [1.69]    [0.66]    [2.07]    [3.51]
Neutral         14        21        13        14        62
                (14.17)   (16.65)   (15.94)   (15.23)
                [0.00]    [1.14]    [0.54]    [0.10]
Negative        4         10        7         19        40
                (9.14)    (10.74)   (10.29)   (9.83)
                [2.89]    [0.05]    [1.05]    [8.55]
Column Total    40        47        45        43        175

Solution
4 We can now add all the cell chi-square values to obtain the value of the test statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 1.69 + 0.66 + \cdots + 8.55 = 22.25$$
Under the null hypothesis, this test statistic follows a chi-square distribution with (2)(3) = 6 degrees of freedom.
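The cell-by-cell arithmetic can be carried out in a few lines of R. This is a sketch under the assumption that the observed counts are stored in a matrix called `observed` (an illustrative name); because it works with unrounded expected counts, the total may differ from the hand-calculated 22.25 in the second decimal place.

```r
# Observed counts (rows: Positive, Neutral, Negative; columns: A01 to A04)
observed <- matrix(c(22, 16, 25, 10,
                     14, 21, 13, 14,
                      4, 10,  7, 19),
                   nrow = 3, byrow = TRUE)

# Expected counts under the null hypothesis of homogeneity
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Cell chi-square values (O - E)^2 / E
cell_chisq <- (observed - expected)^2 / expected
round(cell_chisq, 2)

# Test statistic: the sum over all cells
chisq_stat <- sum(cell_chisq)

# Degrees of freedom: (r - 1)(c - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

c(statistic = chisq_stat, df = df)
```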

Solution
5 The p-value is
P(χ2 (6) ≥ 22.25)

We see from Table 5 that
P(χ2 (6) ≥ 20.25) = 0.0025 and P(χ2 (6) ≥ 22.46) = 0.001

Since 20.25 < χ2 = 22.25 < 22.46, our p-value is between 0.001 and 0.0025. Since the p-value < α = 0.05, we reject the null hypothesis.
6 We have sufficient evidence to conclude that opinions of students in the four sections are not homogeneous with respect to the quality of teaching they received.

Interpretation of the P-value: If opinions of students in the four sections were homogeneous with respect to the quality of teaching they received, the probability of observing a value of the test statistic at least as high as 22.25 would be between 0.001 and 0.0025.

Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject H0 if
χ2 ≥ χ2∗ = 12.59
where χ2∗ = 12.59 is the upper 0.05 critical value from the chi-square distribution with 6 degrees of freedom.
We would still reject the null hypothesis, since χ2 = 22.25 > χ2∗ = 12.59.

We can examine which cells contribute the most to the value of the test statistic in an effort to understand why opinions were found not to be homogeneous.
The highest cell chi-square values are the negative rating for A04, which is higher than expected, the positive rating for A04, which is lower than expected, and the negative rating for A01, which is lower than expected. This tells us that students in A01 liked their instructor more than average and students in A04 liked their instructor less than average.

Example – Ecology Department R Code
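# The ends of the next two lines were cut off in the source; byrow = TRUE and
# the section labels are inferred from the printed output below
opinion <- matrix(c(22,16,25,10,14,21,13,14,4,10,7,19), nrow = 3, byrow = TRUE)
dimnames(opinion) <- list(c("Positive","Neutral","Negative"), c("A01","A02","A03","A04"))
opinion
##          A01 A02 A03 A04
## Positive  22  16  25  10
## Neutral   14  21  13  14
## Negative   4  10   7  19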


mosaicplot(opinion)
[Figure: mosaic plot of opinion, with the ratings Positive, Neutral and Negative on one axis and the sections A01, A02, A03 and A04 on the other.]

chisq.test(opinion)
##
## Pearson’s Chi-squared test
##
## data: opinion

p-value
p2<-pchisq(22.25,6, lower.tail = FALSE)

p2
## [1] 0.001090816
Critical Value
q1<-qchisq(0.05,6, lower.tail = FALSE)

q1
## [1] 12.59159

Example – Archer
Five archers shoot several arrows at a target. The table below displays the
number of times each archer hit and missed the bull’s-eye on the target:
                                   Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit             25         30         30         50         25         160
Missed          10         25         10         20         25         90
Column Total    35         55         40         70         50         250
Are the archers homogeneous with respect to their accuracy?


Example – Archer
Solution
1 Let α = 0.05.
2 H0 : The five archers are homogeneous with respect to their accuracy.
Ha : The five archers are not homogeneous with respect to their accuracy.
Note that, since there are only two values of the response variable (hit or miss), we are actually testing the equality of five population proportions:
H0 : p1 = p2 = p3 = p4 = p5
Ha : At least one of the population proportions differs from the others.

Example – Archer
The chi-square test for homogeneity is in fact an extension of the z test for comparing two proportions – only the chi-square test can compare several population proportions.
Solution
3 We will reject H0 if the p-value ≤ α = 0.05.
We first calculate the expected cell counts. For example, the expected number of hits for Archer 5 is
$$E = \frac{(\text{row 1 total})(\text{column 5 total})}{\text{table total}} = \frac{(160)(50)}{250} = 32.0$$

Example – Archer
The rest of the expected cell counts are calculated similarly and are shown
in the table:
                                   Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit             25         30         30         50         25         160
                (22.4)     (35.2)     (25.6)     (44.8)     (32.0)
Missed          10         25         10         20         25         90
                (12.6)     (19.8)     (14.4)     (25.2)     (18.0)
Column Total    35         55         40         70         50         250
Note:
None of the expected cell counts are less than five, and so the chi-square
approximation is justified.

Example – Archer
We now calculate the cell chi-square values. For example, the cell
chi-square value for the number of misses for Archer 2 is
$$\frac{(O - E)^2}{E} = \frac{(25 - 19.8)^2}{19.8} = 1.37$$

Example – Archer
Other cell chi-square values are calculated similarly and are shown below
with the expected cell counts:
                                   Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit             25         30         30         50         25         160
                (22.4)     (35.2)     (25.6)     (44.8)     (32.0)
                [0.30]     [0.77]     [0.76]     [0.60]     [1.53]
Missed          10         25         10         20         25         90
                (12.6)     (19.8)     (14.4)     (25.2)     (18.0)
                [0.56]     [1.37]     [1.34]     [1.07]     [2.72]
Column Total    35         55         40         70         50         250
Example – Archer
Solution
4 The test statistic is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.30 + 0.77 + \cdots + 2.72 = 11.02$$
5 Under the null hypothesis, this test statistic follows a chi-square distribution with (r − 1) (c − 1) = (1)(4) = 4 degrees of freedom. The p-value is P(χ2 (4) ≥ 11.02).

Example – Archer
Solution
5 We see from Table 5 that
P(χ2 (4) ≥ 9.49) = 0.05 and P(χ2 (4) ≥ 11.14) = 0.025.
Since 9.49 < χ2 = 11.02 < 11.14, our p-value is between 0.025 and
0.05.

Example – Archer
Solution
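Since the p-value < α = 0.05, we reject the null hypothesis.
6 We have sufficient evidence to conclude that the five archers are not homogeneous with respect to their accuracy.
Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject H0 if
χ2 ≥ χ2∗ = 9.49
where χ2∗ = 9.49 is the upper 0.05 critical value from the chi-square distribution with 4 degrees of freedom. We would still reject the null hypothesis, since χ2 = 11.02 > χ2∗ = 9.49.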
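The archer example can also be run through R's built-in `chisq.test()`, as was done for the ratings data. The sketch below assumes the counts are entered row by row into a matrix named `archers` (an illustrative name). Since R works with unrounded cell values, its statistic may differ slightly from the hand-calculated 11.02.

```r
# Hits and misses for the five archers
archers <- matrix(c(25, 30, 30, 50, 25,
                    10, 25, 10, 20, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Hit", "Missed"),
                                  paste("Archer", 1:5)))

result <- chisq.test(archers)

result$expected    # expected cell counts
result$statistic   # chi-square test statistic
result$parameter   # degrees of freedom, (r - 1)(c - 1)
result$p.value     # upper-tail p-value

# Critical value approach at alpha = 0.05
critical_value <- qchisq(0.05, df = 4, lower.tail = FALSE)
reject_h0 <- unname(result$statistic) >= critical_value
reject_h0
```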

Practice Question
A survey is conducted in each of four regions of Canada. Respondents are
asked whether they approve of the job being done by the prime minister.
Results are shown in the table below:
Region
Rating West Prairies Central Atlantic Row Total
Approve 94 65 61 38 258
Disapprove 28 30 89 32 179
Neutral 18 15 20 10 63
Column Total 140 110 170 80 500
What are the degrees of freedom for the appropriate test statistic?
A 5
B 6
C 8
D 9
E 11
Practice Question
A survey is conducted in each of four regions of Canada. Respondents are
asked whether they approve of the job being done by the prime minister.
Results are shown in the table below:
Rating West Prairies Central Atlantic Row Total
Approve 94 65 61 38 258
Disapprove 28 30 89 32 179
Neutral 18 15 20 10 63
Column Total 140 110 170 80 500
What is the expected number of Central Canadians who disapprove of the job being done by the prime minister?
A 58.28
B 60.86
C 66.34
D 69.57
E 72.49
Z Test vs. χ2 Test
We say that a chi-square test can be used to test the equality of several population proportions. In the case where we are conducting a two-sided test comparing just two proportions, the chi-square test is in fact equivalent to the two-sample z test. It can be shown that z² = χ2 and that the p-values of the two tests are identical.

Example – Smoking
Recall the smoking experiment:

Example
We would like to compare the effectiveness of two popular treatments that
are designed to help smokers quit smoking. A sample of 125 smokers who
have expressed a desire to quit smoking volunteer to participate in an
experiment. The 63 subjects in Group 1 are assigned to chew nicotine gum
and the 62 subjects in Group 2 are assigned to wear a nicotine patch. At
the end of six months, 22 of the subjects in Group 1 and 17 of the
subjects in Group 2 have quit smoking.

Example – Smoking
We conducted a test of H0 : p1 = p2 vs. Ha : p1 ≠ p2 and we obtained a test statistic of z = 0.90 and a p-value of 0.3682.
s<-2*(1-pnorm(0.9))
s
## [1] 0.3681203

Example – Smoking
Suppose instead we constructed a 2 × 2 table for the data and conducted a chi-square test for homogeneity. The resulting table is shown below:
                        Treatment
Result          Gum         Patch       Row Total
Quit            22          17          39
                (19.656)    (19.344)
                [0.2795]    [0.2840]
Didn’t Quit     41          45          86
                (43.344)    (42.656)
                [0.1268]    [0.1288]
Column Total    63          62          125

Example – Smoking
The test statistic is found to be
χ2 = 0.2795 + . . . + 0.1288 = 0.8191
which is the same (up to rounding error) as we found using the z test (z² = (0.9)² = 0.81).
Under the null hypothesis, the test statistic follows a chi-square distribution with (r − 1)(c − 1) = (1)(1) = 1 degree of freedom.

Example – Smoking
The exact p-value is
pval<-pchisq(0.81,1, lower.tail = FALSE)

pval
## [1] 0.3681203
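The equivalence z² = χ2 can be checked numerically from the raw counts. The following sketch assumes the pooled two-sample z statistic used earlier in the course; the variable names are illustrative. Note that `chisq.test()` applies a continuity correction to 2 × 2 tables by default, so `correct = FALSE` is needed to obtain the uncorrected statistic used in these slides.

```r
# Number who quit and group sizes (Gum, Patch)
quit <- c(22, 17)
n    <- c(63, 62)

# Two-sample z statistic with the pooled proportion
p_hat    <- quit / n
p_pooled <- sum(quit) / sum(n)
se       <- sqrt(p_pooled * (1 - p_pooled) * (1 / n[1] + 1 / n[2]))
z        <- (p_hat[1] - p_hat[2]) / se

# Chi-square test for homogeneity on the same data, without continuity correction
smoking <- rbind(quit, n - quit)
dimnames(smoking) <- list(c("Quit", "Didn't Quit"), c("Gum", "Patch"))
chisq_result <- chisq.test(smoking, correct = FALSE)

# The squared z statistic and the chi-square statistic should agree
c(z_squared = z^2, chi_square = unname(chisq_result$statistic))

# So should the two-sided z p-value and the chi-square p-value
c(z_test = 2 * pnorm(-abs(z)), chi_square_test = chisq_result$p.value)
```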

Example – Smoking
Because of the large P-value, we do not reject H0 , but remember, we never accept H0 . We cannot conclude that the two treatments are homogeneous; we can only say we have insufficient evidence that they are not homogeneous.
We could conclude that homogeneity appears to be a reasonable assumption.

Practice Question
We conduct a hypothesis test of H0 : p1 = p2 vs. Ha : p1 ≠ p2 to compare the proportion of patients whose condition improves with an experimental
drug vs. a placebo. We conduct an experiment and determine the value of
the test statistic to be z = 1.75, and the p-value is 0.08. Suppose we had
instead conducted a chi-square test for homogeneity. The values of the
test statistic and the p-value would be:
A 3.06 and 0.0064
B 1.75 and 0.016
C 1.32 and 0.08
D 3.06 and 0.08
E 1.75 and 0.28

Chi-Square Test for Independence
We will now examine another situation for which a chi-square test of significance is appropriate.
In testing homogeneity among several populations with respect to some variable, we took a separate sample from each of the populations and compared them with respect to a single variable. We could choose the sample size we took from each of the populations (i.e., the column totals).
Now consider the case where we wish to study the relationship between two categorical variables. We will take one simple random sample from a single population of individuals and measure and compare the values for the two variables.

We will then conduct a test of significance to examine whether or not the two variables of interest are independent. That is, we will test the hypotheses
H0 : The two categorical variables of interest are independent.
Ha : The two categorical variables of interest are not independent (i.e., they are dependent).

Example – Money and Happiness
According to a famous saying “money can’t buy you happiness”, but are
wealth and happiness really independent?
A psychological study was conducted, in which subjects were analyzed and
categorized as either very happy, somewhat happy or unhappy. The income
levels of subjects were also examined, and each subject was categorized as
either low income, middle class, or wealthy. Let α = 0.10.

The data are displayed in the table below:
Income Levels
Happiness Low Income Middle Class Wealthy Row Total
Very Happy 7 16 13 36
Somewhat Happy 10 20 11 41
Unhappy 5 7 5 17
Column Total 22 43 29 94

Let α = 0.10. We are testing the hypotheses
H0 : Wealth and happiness are independent.
Ha : Wealth and happiness are dependent.
We will reject H0 if p-value ≤ α = 0.10.

Recall again that all hypothesis tests are conducted under the assumption
that the null hypothesis is true. So we must again ask, if the two variables
really are independent, then what would we expect to see?

For example, if wealth and happiness were independent, what would be the
expected number of individuals in the sample who are unhappy and
wealthy?
Recall:
If two events A and B are independent, then
P(A and B) = P(A)P(B)
For example, if wealth and happiness are independent, the probability of a person being unhappy (U) and wealthy (W) is
P(U and W) = P(U)P(W)

As such, the expected number of individuals in our sample who are unhappy and wealthy is
E = (total sample size) P(unhappy and wealthy)
  = (table total) P(unhappy)P(wealthy)
Of course, we don’t know the true probabilities, so we must estimate them by our sample proportions.

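The estimated probability of an individual being unhappy is
$$\hat{p}_U = \frac{\text{row 3 total}}{\text{table total}} = \frac{17}{94} = 0.1809$$
The estimated probability of an individual being wealthy is
$$\hat{p}_W = \frac{\text{column 3 total}}{\text{table total}} = \frac{29}{94} = 0.3085$$

Therefore, the expected number of individuals in the sample who are unhappy and wealthy is
$$E = (\text{total sample size})\,\hat{p}_U\,\hat{p}_W = (\text{table total}) \left(\frac{\text{row 3 total}}{\text{table total}}\right) \left(\frac{\text{column 3 total}}{\text{table total}}\right) = \frac{(\text{row 3 total})(\text{column 3 total})}{\text{table total}} = \frac{(17)(29)}{94} = 5.24$$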

By a similar line of reasoning, the expected frequency for the cell at the intersection of the r th row and the c th column is
$$E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}}$$
The formula for the expected cell count is the same as it was for the test of homogeneity, but for different reasons! The calculations for the two tests are in fact identical.
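The two routes to the expected count (multiplying estimated probabilities, or using row and column totals directly) can be compared numerically. This is a brief sketch in R, assuming the happiness data are stored in a matrix named `happiness`; all object names are illustrative.

```r
# Observed counts (rows: happiness, columns: income level)
happiness <- matrix(c( 7, 16, 13,
                      10, 20, 11,
                       5,  7,  5),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Very Happy", "Somewhat Happy", "Unhappy"),
                                    c("Low Income", "Middle Class", "Wealthy")))

n <- sum(happiness)

# Estimated marginal probabilities
p_unhappy <- unname(rowSums(happiness)["Unhappy"]) / n
p_wealthy <- unname(colSums(happiness)["Wealthy"]) / n

# Route 1: E = n * P(unhappy) * P(wealthy), using independence
expected_via_probabilities <- n * p_unhappy * p_wealthy

# Route 2: E = (row total)(column total) / (table total)
expected_via_totals <- unname(rowSums(happiness)["Unhappy"]) *
  unname(colSums(happiness)["Wealthy"]) / n

c(expected_via_probabilities, expected_via_totals)
```

Both routes give the same number because the factor n cancels one of the two denominators, which is the algebra shown above.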

Other expected cell counts are calculated similarly and are shown in the
table below:
                                Income Levels
Happiness         Low Income   Middle Class   Wealthy    Row Total
Very Happy        7            16             13         36
                  (8.43)       (16.47)        (11.11)
Somewhat Happy    10           20             11         41
                  (9.60)       (18.76)        (12.65)
Unhappy           5            7              5          17
                  (3.98)       (7.78)         (5.24)

One of the nine cells (11% < 20%) has an expected count less than five,
and all expected cell counts are greater than one, so the use of the
chi-square approximation is justified.
We will now calculate the cell chi-square values. For example, the cell
chi-square value for somewhat happy and wealthy individuals is
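$$\frac{(O - E)^2}{E} = \frac{(11 - 12.65)^2}{12.65} = 0.22$$

Other cell chi-square values are calculated similarly and are shown in the table below:
                                Income Levels
Happiness         Low Income   Middle Class   Wealthy    Row Total
Very Happy        7            16             13         36
                  (8.43)       (16.47)        (11.11)
                  [0.24]       [0.01]         [0.32]
Somewhat Happy    10           20             11         41
                  (9.60)       (18.76)        (12.65)
                  [0.02]       [0.08]         [0.22]
Unhappy           5            7              5          17
                  (3.98)       (7.78)         (5.24)
                  [0.26]       [0.08]         [0.01]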

Test Statistic
The test statistic is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.24 + 0.01 + \cdots + 0.01 = 1.24$$
Under the null hypothesis, this test statistic follows a chi-square distribution with (r − 1) (c − 1) = (2)(2) = 4 degrees of freedom. The p-value is P(χ2 (4) ≥ 1.24).

P-Value
We see from Table 5 that
P(χ2 (4) ≥ 5.38) = 0.25
Since χ2 = 1.24 < 5.38, our p-value is greater than 0.25.

Example
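Since the p-value > α = 0.10, we fail to reject the null hypothesis at the 10% level of significance. We have insufficient evidence to conclude that wealth and happiness are dependent (i.e., independence appears to be a reasonable assumption).
Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject H0 if
χ2 ≥ χ2∗ = 7.78
where χ2∗ = 7.78 is the upper 0.10 critical value from the chi-square distribution with 4 degrees of freedom. We would still fail to reject the null hypothesis, since χ2 = 1.24 < χ2∗ = 7.78.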
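For comparison with the hand calculation, the test of independence can be run with `chisq.test()`. This is a sketch assuming the counts are stored in a matrix named `happiness` (an illustrative name). R warns that the approximation may be inaccurate whenever any expected count is below five, which is stricter than the rule used in these notes, so the call is wrapped in `suppressWarnings()` here.

```r
# Observed counts (rows: happiness, columns: income level)
happiness <- matrix(c( 7, 16, 13,
                      10, 20, 11,
                       5,  7,  5),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Very Happy", "Somewhat Happy", "Unhappy"),
                                    c("Low Income", "Middle Class", "Wealthy")))

# One expected count (3.98) is below five, so R would issue a warning
independence_test <- suppressWarnings(chisq.test(happiness))

independence_test$statistic   # chi-square test statistic
independence_test$parameter   # degrees of freedom
independence_test$p.value     # p-value

# Critical value approach at alpha = 0.10
critical_value <- qchisq(0.10, df = 4, lower.tail = FALSE)
reject_h0 <- unname(independence_test$statistic) >= critical_value
reject_h0
```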

Example – Political Party
Example
The individuals in the previous example were also asked which political
party they support. We would like to conduct a hypothesis test, at the 1%
level of significance, to determine whether wealth and political preference
are independent. The data are displayed in the table below:
Income Levels
Political Party Low Income Middle Class Wealthy Row Total
Conservative 2 14 20 36
Liberal 6 10 6 22
NDP 11 12 2 25
Green 3 7 1 11

Solution
1 Let α = 0.01.
2 H0 : Wealth and political preference are independent.
Ha : Wealth and political preference are dependent.
3 We will reject H0 if the p-value ≤ α = 0.01.

Solution
We will first calculate the expected cell counts. For example, the expected number of middle class individuals in the sample who support the NDP is
$$E = \frac{(\text{row 3 total})(\text{column 2 total})}{\text{table total}} = \frac{(25)(43)}{94} = 11.44$$
Other expected cell counts are calculated similarly and are shown in the table on the following page.

Solution
                                Income Levels
Political Party   Low Income   Middle Class   Wealthy    Row Total
Conservative      2            14             20         36
                  (8.43)       (16.47)        (11.11)
Liberal           6            10             6          22
                  (5.15)       (10.06)        (6.79)
NDP               11           12             2          25
                  (5.85)       (11.44)        (7.71)
Green             3            7              1          11
                  (2.57)       (5.03)         (3.39)

Note:
Two of the 12 cells (17% < 20%) have expected counts less than five, and all expected cell counts are greater than one, so the use of the chi-square approximation is justified.
Solution
We will now calculate the cell chi-square values. For example, the cell chi-square value for wealthy individuals who support the Liberal Party is
$$\frac{(O - E)^2}{E} = \frac{(6 - 6.79)^2}{6.79} = 0.09$$
Other cell chi-square values are calculated similarly and are shown in the table on the following page.


Solution
                                Income Levels
Political Party   Low Income   Middle Class   Wealthy    Row Total
Conservative      2            14             20         36
                  (8.43)       (16.47)        (11.11)
                  [4.90]       [0.37]         [7.11]
Liberal           6            10             6          22
                  (5.15)       (10.06)        (6.79)
                  [0.14]       [0.00]         [0.09]
NDP               11           12             2          25
                  (5.85)       (11.44)        (7.71)
                  [4.53]       [0.03]         [4.23]
Green             3            7              1          11
                  (2.57)       (5.03)         (3.39)
                  [0.07]       [0.77]         [1.68]

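Solution
4 The test statistic is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 4.90 + 0.37 + \cdots + 1.68 = 23.92$$
Under the null hypothesis, this test statistic follows a chi-square distribution with (3)(2) = 6 degrees of freedom.
5 The p-value is P(χ2 (6) ≥ 23.92). We see from Table 5 that
P(χ2 (6) ≥ 22.46) = 0.001 and P(χ2 (6) ≥ 24.10) = 0.0005
Since 22.46 < χ2 = 23.92 < 24.10, our p-value is between 0.0005 and 0.001.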

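Solution
Since the p-value < α = 0.01, we reject the null hypothesis.
6 We have sufficient evidence to conclude that wealth and political preference are dependent.
Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject H0 if
χ2 ≥ χ2∗ = 16.81
where χ2∗ = 16.81 is the upper 0.01 critical value from the chi-square distribution with 6 degrees of freedom. We would still reject the null hypothesis, since χ2 = 23.92 > χ2∗ = 16.81.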

We see that the four cells that contribute the most to the test statistic are those for low income and wealthy individuals who support the Conservative Party and the NDP.
It appears that poorer individuals tend to support the NDP while wealthier
voters are more likely to support the Conservative Party.
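The cells that drive the result can be ranked programmatically. The sketch below is one possible approach, not the method used in the slides: it relies on the Pearson residuals (O − E)/√E returned by `chisq.test()`, whose squares are the cell chi-square values and whose signs show whether a count is above or below what is expected. The object names are illustrative.

```r
# Observed counts (rows: party, columns: income level)
politics <- matrix(c( 2, 14, 20,
                      6, 10,  6,
                     11, 12,  2,
                      3,  7,  1),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("Conservative", "Liberal", "NDP", "Green"),
                                   c("Low Income", "Middle Class", "Wealthy")))

# Two expected counts are below five, so R would issue a warning
party_test <- suppressWarnings(chisq.test(politics))

# Pearson residuals: positive means more observed than expected
round(party_test$residuals, 2)

# Squared residuals are the cell chi-square values
contributions <- party_test$residuals^2

# Rank the cells from largest to smallest contribution
cells  <- as.data.frame(as.table(contributions))
ranked <- cells[order(cells$Freq, decreasing = TRUE), ]
top4   <- head(ranked, 4)
top4
```

In the resulting data frame, `Var1` holds the party, `Var2` the income level and `Freq` the cell chi-square value; these column names are the defaults produced by `as.data.frame()` for a table without named dimensions.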

Practice Question
In a survey, 400 people were classified with respect to stress level and the
occurrence of migraine headaches. The data are shown in the table below:
Stress Level
Migraine Low High
Never 110 43 153
Occasional 114 80 194
Often 26 27 53
250 150 400
We would like to test whether the two variables are independent at the 5%
level of significance. What is the critical value of the test?
A 4.61
B 5.99
C 7.81
D 9.49
E 11.07
Practice Question
Stress Level
Migraine Low High
Never 110 43 153
Occasional 114 80 194
Often 26 27 53
250 150 400
What is the expected count of individuals with high stress who occasionally suffer from migraine headaches?
A 74.25
B 70.75
C 72.75
D 78.50
E 76.25
Practice Question
                  Stress Level
Migraine        Low        High
Never           110        43         153
                95.625     57.375
Occasional      114        80         194
                121.25     72.75
Often           26         27         53
                33.125     19.875
                250        150        400
What is the cell chi-square value for low stress individuals who never experience migraines?
A 1.97
B 2.04
C 2.16
D 1.78
E 1.89

Practice Questions
                  Stress Level
Migraine        Low        High
Never           110        43         153
                95.625     57.375
                2.1609     3.6016
Occasional      114        80         194
                121.25     72.75
                0.4335     0.7225
Often           26         27         53
                33.125     19.875
                1.5325     2.5542
                250        150        400
What is the p-value for the chi-square test for independence?
A between 0.001 and 0.0025
B between 0.0025 and 0.005
C between 0.005 and 0.01
D between 0.01 and 0.02
E between 0.02 and 0.025

Chi-Square Goodness-of-fit Tests
Throughout most of this course, we have conducted tests of significance concerning some population parameter while assuming the form of the population distribution is known. For example, many of our methods have required the assumption that the variable of interest follows a normal distribution. The most we have been able to do is to look at a histogram of the sample data to assess whether normality was a reasonable assumption.

Goodness-of-fit
What if we don’t know the distribution and we would like to determine if it has some specific form?
Fortunately, we have the option of conducting a formal test of significance to determine whether a random variable X follows some specific distribution. These tests are known as chi-square goodness-of-fit tests.

Example – Western Canada Pick 3
The Western Canada Pick 3 is a lottery in which players choose a three-digit number (between 000 and 999). Draws are held once per day, when a lucky three-digit number is selected.
One player has suspicions about whether the draws are truly random. She has noticed that some digits seem to come up more frequently than others.

We will conduct a hypothesis test to investigate the player’s suspicion.

Let X be a digit selected in a given Pick 3 draw. If the draw was random,
we would expect all digits 0 through 9 to be drawn with equal probability,
and so over a long period of time, we would expect an equal number of
each digit to have been drawn.

A random variable X is said to follow a discrete uniform distribution with parameters a and b (a < b) if all integer values from a to b have equal probability.
We are testing whether X has a discrete uniform distribution on the interval from a = 0 to b = 9, which would mean the probability distribution of X is
$$P(X = x) = \frac{1}{10}, \quad \text{for } x = 0, 1, \ldots, 9$$
and we write X ∼ DU (0, 9).

The parameters of any discrete uniform distribution are the minimum and
maximum values of X (in our case, 0 and 9, respectively), which enable us
to calculate probabilities of occurrence for any value of X .
We will now conduct the chi-square goodness of fit test to investigate
whether the player’s claim has any merit.

Let α = 0.05.
We are testing the hypotheses
H0 : All ten digits are equally likely to be drawn, i.e., X follows a uniform distribution.
Ha : The ten digits are not equally likely to be drawn, i.e., X does not follow a uniform distribution.
We will reject H0 if the p-value ≤ α = 0.05.

Data were tabulated for all 1000 Pick 3 draws from November 1, 2008 to
July 30, 2011. The number of times each digit was drawn is shown in the
contingency table below.
Digit 0 1 2 3 4 5 6 7 8 9
Freq. 314 285 302 307 294 277 284 321 318 298
Note that in 1000 draws, a total of 3(1000) = 3000 digits were drawn.

The first thing to do is to look at a histogram of the data to get an idea of what we can expect from the test. The histogram is shown below:
[Figure: histogram of the number of times each digit 0 through 9 was drawn; vertical axis: frequency, from 0 to 300.]
It certainly appears reasonable to assume that these data come from a population that is uniformly distributed. We still need the test, however, to verify this.

The expected frequency of each of these digits under the null hypothesis is
E = (total number of observations) P(digit) = 3000 (0.1) = 300
Note:
The use of the chi-square distribution for goodness of fit tests is
approximate, and can safely be used when all cell counts are at least
five.
Clearly this condition is satisfied in this case.

We now calculate the cell chi-square values for each of the digits. For
example, the cell chi-square value for the digit 8 is
$$\frac{(O - E)^2}{E} = \frac{(318 - 300)^2}{300} = 1.08$$
Other cell chi-square values are calculated similarly and are displayed under
the observed and expected counts in the table on the next page.

Digit              0      1      2      3      4      5      6      7      8      9
Observed           314    285    302    307    294    277    284    321    318    298
Expected           300    300    300    300    300    300    300    300    300    300
Cell chi-square    0.65   0.75   0.01   0.16   0.12   1.76   0.85   1.47   1.08   0.01
We calculate the chi-square test statistic by adding all of the cell chi-square values:
χ2 = 0.65 + 0.75 + . . . + 0.01 = 6.86

Under the null hypothesis, this test statistic follows a chi-square distribution with degrees of freedom equal to
(# of cells − 1) = 10 − 1 = 9
The degrees of freedom for goodness of fit tests are only (# of cells − 1) if
we know the values of all necessary parameters. In the above example, we
knew the maximum and minimum values, so all parameter values are
known. When we don’t know the value of a parameter, we must estimate
it and deduct one additional degree of freedom.

In general, the degrees of freedom in a chi-square goodness of fit test are
(# of cells − 1) − (# of estimated parameters)

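We see from Table 5 that P(χ2 (9) ≥ 11.39) = 0.25. Since χ2 = 6.86 < 11.39, our p-value is greater than 0.25.

Since the p-value > α = 0.05, we fail to reject the null hypothesis.
We have insufficient evidence to conclude that the selected numbers do not follow a uniform distribution (i.e., it is reasonable to assume that the distribution is uniform).
Suppose we had instead conducted the test using the critical value approach. The decision rule would be to reject H0 if
χ2 ≥ χ2∗ = 16.92
where χ2∗ = 16.92 is the upper 0.05 critical value from the chi-square distribution with 9 degrees of freedom. We would still fail to reject the null hypothesis since
χ2 = 6.86 < χ2∗ = 16.92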
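The Pick 3 goodness-of-fit test can be reproduced with `chisq.test()` by supplying the hypothesized probabilities through its `p` argument. This is a sketch assuming the digit frequencies are stored in a vector named `digit_counts` (an illustrative name); since R does not round the cell values, the statistic may differ from the hand-calculated 6.86 in the second decimal place.

```r
# Number of times each digit 0 through 9 was drawn
digit_counts <- c(314, 285, 302, 307, 294, 277, 284, 321, 318, 298)
names(digit_counts) <- 0:9

# Null hypothesis: every digit has probability 1/10
uniform_fit <- chisq.test(digit_counts, p = rep(1 / 10, 10))

uniform_fit$expected    # expected count for each digit
uniform_fit$statistic   # chi-square test statistic
uniform_fit$parameter   # degrees of freedom, (# of cells - 1)
uniform_fit$p.value     # p-value

# Critical value approach at alpha = 0.05
critical_value <- qchisq(0.05, df = 9, lower.tail = FALSE)
reject_h0 <- unname(uniform_fit$statistic) >= critical_value
reject_h0
```

No parameters are estimated from the data here, so the degrees of freedom reported by R match the (# of cells − 1) rule.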

Example – Optometrist
An optometrist believes that the distribution of eye colours in some population is as follows:
Eye Colour     Brown   Blue   Green   Other
Probability    0.5     0.1    0.2     0.2
In a sample of 150 people from the population, 66 have brown eyes, 23 have blue eyes, 35 have green eyes and 26 have another eye colour. We would like to conduct a chi-square goodness-of-fit test at the 5% level of significance to determine whether the optometrist’s proposed distribution is correct.

1 Let α = 0.05.
2 H0 : The optometrist’s proposed distribution is correct vs. Ha : The distribution differs from the one proposed.
3 Decision Rule: Reject H0 if the p-value is ≤ 0.05.

4 Test Statistic:
Eye Colour         Brown            Blue    Green    Other
Observed           66               23      35       26
Expected           150(0.5) = 75    15      30       30
Cell Chi-Square    1.08             4.27    0.833    0.533
Therefore χ2 = 1.08 + 4.27 + 0.833 + 0.533 = 6.716

5 p-value:
With (# of cells − 1) = 3 degrees of freedom, we see from Table 5 that our test statistic falls between 6.25 and 7.82, which corresponds to a p-value between 0.05 and 0.10.
pvalue<-pchisq(6.716,3, lower.tail = FALSE)

pvalue
## [1] 0.08152236

6 Conclusion:
Since our p-value is > 0.05, we fail to reject H0 . There is insufficient evidence to support the alternative; hence the optometrist’s proposed distribution is plausible.

Suppose we used the critical value method. The decision rule would be: Reject H0 if χ2 ≥ χ2∗(3, 0.05) = 7.82. Since our test statistic is less than 7.82, we reach the same conclusion.
qchisq(0.05,3,lower.tail = FALSE)
## [1] 7.814728
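The whole optometrist test can also be carried out in one call. The sketch below assumes the observed counts and proposed probabilities are stored in named vectors (`eye_counts` and `proposed` are illustrative names). Because R uses unrounded cell values, its statistic and p-value may differ slightly from the figures computed from the rounded cell chi-square values above.

```r
# Observed eye colours in the sample of 150 people
eye_counts <- c(Brown = 66, Blue = 23, Green = 35, Other = 26)

# Distribution proposed by the optometrist (null hypothesis)
proposed <- c(Brown = 0.5, Blue = 0.1, Green = 0.2, Other = 0.2)

eye_fit <- chisq.test(eye_counts, p = proposed)

eye_fit$expected    # expected counts: n times each proposed probability
eye_fit$statistic   # chi-square test statistic
eye_fit$parameter   # degrees of freedom, (# of cells - 1)
eye_fit$p.value     # p-value

# Decision at alpha = 0.05
reject_h0 <- eye_fit$p.value <= 0.05
reject_h0
```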

Practice Question
A website claims that the distribution of blood types for a certain group of
people is as follows:
Blood Type A B AB O
Probability 0.4 0.2 0.1 0.3
In a random sample of 200 people from this group, 70 had blood type A,
20 had blood type B, 30 had blood type AB and 80 had blood type O.
We would like to conduct a chi-square goodness of fit test at the 1% level
of significance to verify the website’s claim. What is the critical value of
the test?
A 7.82
B 9.49
C 13.28
D 11.35
E 6.63
Practice Question
Under the null hypothesis (that the distribution of blood types is the one
claimed by the website), the expected number of people in the sample
with blood type A is 200(0.4) = 80. Other expected counts are calculated
similarly and are shown below:
Blood Type A B AB O
Count 70 20 30 80
Expected 80 40 20 60
What is the value of the appropriate test statistic?

A 22.92
B 18.78
C 24.37
D 31.84
E 16.05
