Unit 5


STAT 2000 – Unit 5

Carrie Madden

Unit 5 – Test for Proportions and Analysis of Categorical Data
and Goodness-of-fit Tests

Review of Inference for a Single Population Proportion p

Now suppose we are interested in the proportion p̂ of successes:

\[ \hat{p} = \frac{X}{n} \]

The mean and standard deviation of p̂ are

\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]


Distribution of a Sample Proportion

The Central Limit Theorem says that, if X̄ is a sample mean and the sample
size is large, then the sampling distribution of X̄ is approximately normal.
Specifically,

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0, 1) \quad \text{(approximately)} \]

But we can think of p̂ as a kind of sample mean, because we are adding
up all the successes we observe and dividing by the sample size n.


Distribution of a Sample Proportion

Result
So when the sample size n is large,

\[ Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0, 1) \quad \text{(approximately)} \]

We can safely use this approximation provided that

np ≥ 10 and n (1 − p) ≥ 10
and that the population size is very large compared to the sample size.


Example

Example
Suppose we randomly select 200 U of M students and ask them whether
they are left- or right-handed. Assuming 10% of all people are left-handed,
what is the probability that at least 25 (12.5%) of the students in our
sample are left-handed?

Solution
Let p̂ be the proportion of students in the sample who are left-handed.
Then for a sample of size 200, the mean and standard deviation of p̂ are

\[ \mu_{\hat{p}} = p = 0.10 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{0.10(0.90)}{200}} = 0.02121 \]


Example

We calculate

np = 200 (0.10) = 20 > 10 and n (1 − p) = 200 (0.90) = 180 > 10

Since the population is large, we can use the normal distribution. The
probability that at least 12.5% of the sampled students are left-handed is:

\[
P(\hat{p} \ge 0.125)
= P\left(Z \ge \frac{0.125 - p}{\sqrt{\frac{p(1-p)}{n}}}\right)
= P\left(Z \ge \frac{0.125 - 0.10}{\sqrt{\frac{0.1(0.9)}{200}}}\right)
\]
\[
= P(Z \ge 1.18) = 1 - P(Z < 1.18) = 1 - 0.8810 = 0.1190
\]


Example – R code

Just as a refresher to find P(Z > 1.18)

prob<- 1-pnorm(1.18)
prob

## [1] 0.1190001
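
The whole left-handedness calculation can also be carried out in R without rounding z first. The following is a minimal sketch; the variable names are illustrative, and the exact binomial comparison at the end is an addition that does not appear in the slides.

```r
# Left-handedness example: n = 200 students, assumed p = 0.10
n <- 200
p <- 0.10
p_hat <- 25 / n                      # cut-off proportion, 0.125

# Rule of thumb for the normal approximation
stopifnot(n * p >= 10, n * (1 - p) >= 10)

sd_p_hat <- sqrt(p * (1 - p) / n)    # standard deviation of p-hat
z <- (p_hat - p) / sd_p_hat          # about 1.18 before rounding

prob_normal <- 1 - pnorm(z)          # normal approximation to P(p-hat >= 0.125)

# For comparison only: exact P(X >= 25) with X ~ Binomial(200, 0.10)
prob_exact <- 1 - pbinom(24, size = n, prob = p)

round(c(z = z, normal = prob_normal, exact = prob_exact), 4)
```

Because z is not rounded to two decimals here, the normal-approximation result can differ from the table-based 0.1190 in the fourth decimal place.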


Example – Airlines
Example
A large airplane making a flight from Winnipeg to Toronto has 150 seats.
The airline knows from past records that 8% of customers do not show up
for their flight, and so they regularly overbook their flights. They have sold
160 tickets for this flight. What is the probability that all passengers who
arrive will get a seat?

Solution
Let p̂ be the proportion of passengers who do not show up for their flight.
The mean and standard deviation of p̂ are

\[ \mu_{\hat{p}} = p = 0.08 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.08(0.92)}{160}} = 0.0214 \]


Example

Solution
We can calculate np = 12.8 > 10 and n(1 − p) = 147.2 > 10, so we can
safely use the normal approximation. For everyone to get a seat, at least
10 passengers, or a proportion of 10/160 = 0.0625, must not show up.
Therefore, the probability that everyone who shows up gets a seat is

\[
P(\hat{p} \ge 0.0625)
= P\left(Z \ge \frac{0.0625 - p}{\sqrt{\frac{p(1-p)}{n}}}\right)
= P\left(Z \ge \frac{0.0625 - 0.08}{\sqrt{\frac{0.08(0.92)}{160}}}\right)
\]
\[
= P(Z \ge -0.82) = 1 - P(Z < -0.82) = 1 - 0.2061 = 0.7939
\]


Example – R code

P(Z > −0.82)

prob<- 1-pnorm(-0.82)
prob

## [1] 0.7938919


Practice Question

It is known that 20% of a certain type of lottery ticket are winners. If you
buy 100 lottery tickets, what is the approximate probability that at least
25 of them are winners?
A 0.1056
B 0.1539
C 0.2061
D 0.2578
E 0.3085


Inference for a Population Proportion

We now examine inference methods for the case where the parameter of
interest is some population proportion p.
We estimate p by the sample proportion p̂, which has mean and standard
deviation

\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]


Inference for a Population Proportion

Recall that, in order to use the normal distribution when doing probability
calculations for p̂, we required that np ≥ 10 and n (1 − p) ≥ 10, and that
the sample size was small compared to the population size.
Although we won’t formally verify this for each example in this unit, these
assumptions will hold for all of them, and so the use of the normal
distribution in doing probability calculations is justified.


Confidence Interval for a Population Proportion

We take a simple random sample of n individuals and calculate the
proportion p̂ that possess some characteristic of interest. A (1 − α) 100%
confidence interval for a population proportion p is given as

\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Ideally, we would like to use the true standard deviation of p̂ in the
formula, but it depends on p, which we don't know (this is the reason for
doing inference!). We therefore estimate p by p̂, and we estimate the
standard deviation by the standard error of p̂.


Example – Obama
Example
In a survey of 1000 randomly selected Americans, 58% said they approve
of the job being done by President Barack Obama. The following is a 90%
confidence interval for the true proportion of all Americans who approve of
Obama:
\[
\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
= 0.58 \pm 1.645 \sqrt{\frac{0.58(0.42)}{1000}}
= 0.58 \pm 0.026
= (0.554, 0.606)
\]

Interpretation of the interval: If we repeatedly selected random samples
of 1000 Americans and constructed the confidence interval in a similar
manner, then 90% of all such intervals would contain the true proportion
of all Americans who approve of the job being done by Barack Obama.
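
The interval can be reproduced in R. The sketch below assumes a small helper, `prop_ci`, written for illustration only (it is not a built-in R function); it implements the large-sample formula p̂ ± z* SE used in these slides.

```r
# Large-sample confidence interval for one proportion (illustrative helper)
prop_ci <- function(p_hat, n, conf_level = 0.90) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # upper (1 - C)/2 critical value
  se <- sqrt(p_hat * (1 - p_hat) / n)          # standard error of p-hat
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

# Approval example: 58% of 1000 respondents, 90% confidence
ci <- prop_ci(p_hat = 0.58, n = 1000, conf_level = 0.90)
round(ci, 3)
```

Note that `prop.test()` reports a different (score-based) interval for a single proportion, so its limits will not match this formula exactly.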

Practice Question

We would like to construct a 95% confidence interval for the proportion of
people who take vitamins regularly. A random sample of 900 individuals
has been selected from a large population. It was found that 180 take
vitamins regularly. The standard error of this estimate is:
A 0.1600
B 0.0002
C 0.0261
D 0.0133
E 0.0298


Sample Size Determination

Suppose we would like to select a sample of individuals large enough to
estimate some population proportion p to within a specified margin of
error m with a given level of confidence.

\[ m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad \Rightarrow \quad n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p}) \]


Sample Size Determination

But we have a problem – we are at the stage where we have not yet
selected the sample, and so we don’t know the value of p̂. We will estimate
the value of p by some value p ∗ . We can either use an educated guess for
p ∗ , or we can use a conservative estimate p ∗ = 0.5, which will result in a
margin of error no greater than m, regardless of the sample proportion p̂.


Example

We therefore use the following formula to determine the required sample
size:

\[ n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) \]

If we believe the value of p is relatively close to 0.5 (say, between 0.3 and
0.7), we should use p ∗ = 0.5. Otherwise, we use some educated guess.


Example

Example
Suppose we would like to take a sample large enough to estimate the true
proportion of all consumers who prefer Pepsi over Coke to within 3% with
95% confidence. We require a sample of size

\[
n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)
= \left(\frac{1.96}{0.03}\right)^2 (0.5)(0.5)
= 1067.11 \approx 1068
\]


Example

Using the conservative estimate p ∗ = 0.5 does not result in a much higher
sample size than if we had used p ∗ = 0.3 or p ∗ = 0.7, for which the
required sample size would be n = 897.
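
The sample-size formula is easy to wrap in a function. The following is a sketch under the assumption that we always round up to the next whole individual; `sample_size` is a hypothetical helper name, not a built-in R function.

```r
# Required sample size for estimating a proportion to within margin m
# (illustrative helper; p_star defaults to the conservative value 0.5)
sample_size <- function(m, conf_level, p_star = 0.5) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z_star / m)^2 * p_star * (1 - p_star))   # always round up
}

# Pepsi vs. Coke example: within 3% with 95% confidence
n_conservative <- sample_size(m = 0.03, conf_level = 0.95)                # p* = 0.5
n_guess        <- sample_size(m = 0.03, conf_level = 0.95, p_star = 0.3)  # p* = 0.3

c(conservative = n_conservative, guess = n_guess)
```

Since p*(1 − p*) is symmetric about 0.5, `p_star = 0.7` gives the same result as `p_star = 0.3`.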

However, if we suspect the sample proportion will be quite far from 0.5,
we may want to use an educated guess for p ∗ .


Example

Example
Suppose we would like to estimate the true proportion of all Canadians
who are left-handed to within 0.02 with 90% confidence. If we use
p* = 0.5, we require a sample of size

\[
n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)
= \left(\frac{1.645}{0.02}\right)^2 (0.5)(0.5)
= 1691.3 \approx 1692
\]


Example

Example
But we know the proportion of people who are left-handed is much lower
than 0.5. Suppose we believe the true proportion is somewhere close to
0.10. Using p* = 0.10, we require a sample size

\[
n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)
= \left(\frac{1.645}{0.02}\right)^2 (0.1)(0.9)
= 608.9 \approx 609
\]


Example

We see that we would be taking much too large a sample if we used
p* = 0.5. This would give us a confidence interval with a margin of error
much smaller than what we originally wanted.

A small margin of error is good, but we decided we were happy estimating
the true proportion to within 2%, and we see that if we use a more
reasonable value of p* (i.e., 0.10), we need to sample 1083 fewer
individuals.
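
To see how much smaller the margin of error would be, here is a short sketch. It assumes the table value z* = 1.645 used in the slides and, purely for illustration, that the sample proportion turns out to be exactly 0.10.

```r
z_star <- 1.645   # table value used in the slides (qnorm(0.95) is about 1.6449)
m <- 0.02         # desired margin of error

n_conservative <- ceiling((z_star / m)^2 * 0.5 * 0.5)   # using p* = 0.5
n_guess        <- ceiling((z_star / m)^2 * 0.1 * 0.9)   # using p* = 0.10

# Margin of error actually obtained with the conservative sample size
# if the sample proportion is 0.10 (an assumed value)
m_actual <- z_star * sqrt(0.10 * 0.90 / n_conservative)

c(n_conservative = n_conservative,
  n_guess = n_guess,
  saved = n_conservative - n_guess,
  m_actual = round(m_actual, 4))
```

Under these assumptions the conservative sample gives a margin of error of roughly 0.012, well below the 0.02 that was asked for.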


Practice Question

We would like to estimate the true proportion of people who use cell
phones while driving to within 0.05 with 95% confidence. What sample
size is required?
A 49
B 385
C 91
D 196
E 246


Practice Question

A researcher calculates that, in order to estimate the true proportion of
Canadian adults who smoke cigarettes to within 0.03 with 90% confidence,
she requires a sample of 360 Canadians. What sample size would be
required in order to estimate the true proportion of Canadian adults who
smoke cigarettes to within 0.01 with 90% confidence?
A 40
B 120
C 360
D 1,080
E 3,240


Hypothesis Tests for a Population Proportion

We can also conduct hypothesis tests for a population proportion p.

Example
The subject of the long gun registry has been very controversial in recent
years. We would like to conduct a hypothesis test, at the 5% level of
significance, to determine whether a majority of Canadians support the
registry. In a simple random sample of 850 Canadians, 459 (54%)
indicated their support for the registry.


Example

Solution
1 Level of significance:

Let α = 0.05.
2 Hypothesis:

H0 : Canadians are evenly split in their support for the long gun registry.
Ha : A majority of Canadians support the long gun registry.

Equivalently,
H0 : p = 0.5 vs. Ha : p > 0.5


Example
Solution
3 Decision rule:

Reject H0 at α = 0.05 if the p-value ≤ α.

4 Test statistic:

\[
Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
= \frac{0.54 - 0.50}{\sqrt{\frac{0.50(0.50)}{850}}} = 2.33
\]

Notice that we are assuming that p = p0 (the value of p in the null
hypothesis) in the calculation of the test statistic, just as we assumed
µ = µ0 in calculating the test statistic when conducting a test for a
population mean.

We always calculate the test statistic assuming H0 is true.



Example
Solution
5 The p-value is

\[ P(\hat{p} \ge 0.54 \mid p = 0.50) = P(Z \ge 2.33) = 1 - P(Z < 2.33) = 1 - 0.9901 = 0.0099 \]

Here, the p-value = 0.0099 < α = 0.05, so we reject the null
hypothesis.
6 Conclusion:
We have sufficient evidence to conclude that more than 50% of
Canadians support the long gun registry.

Interpretation of the p-value: If the true proportion of Canadians who
support the gun registry was 0.5, the probability of observing a sample
proportion at least as high as 0.54 would be 0.0099.

Example

Suppose we had instead used the critical value method to conduct the test.
Our decision rule would be to reject H0 if Z ≥ z* = 1.645.

We would reject H0 since Z = 2.33 > z* = 1.645.

Our conclusion would be that we have sufficient evidence to conclude that
more than 50% of Canadians support the long gun registry.


Example – Gun Registry R Code

prop.test(459, 850, p = 0.50,correct = FALSE)

##
## 1-sample proportions test without continuity correction
##
## data: 459 out of 850, null probability 0.5
## X-squared = 5.44, df = 1, p-value = 0.01968
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5063896 0.5732504
## sample estimates:
## p
## 0.54
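
The call above uses the default two-sided alternative of `prop.test()`, so its p-value (0.01968) is twice the one-sided p-value found in the worked solution. A minimal sketch of the one-sided version, assuming the same counts, passes `alternative = "greater"`. The X-squared statistic that R reports is the square of the Z statistic.

```r
# One-sided test of H0: p = 0.5 vs. Ha: p > 0.5 for the gun registry data
res <- prop.test(459, 850, p = 0.50, alternative = "greater", correct = FALSE)

z <- sqrt(unname(res$statistic))   # X-squared = Z^2; Z is positive since p-hat > 0.5
p_value <- res$p.value             # one-sided p-value

round(c(z = z, p_value = p_value), 4)
```

The one-sided p-value is close to the 0.0099 obtained from the table; small differences come from rounding Z to 2.33.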


Practice Question

Shortly after the introduction of the Euro coin in Belgium, newspapers
around the world published articles claiming the coin was biased. A
hypothesis test is to be conducted to determine if the Euro coin really is
unfair. Let p denote the true proportion of all flips of the coin that would
result in heads (and so p̂ is the proportion of heads in the sample). The
hypotheses for the appropriate test of significance are:
A H0 : p = p̂ vs. Ha : p ≠ p̂
B H0 : p = 0.5 vs. Ha : p > 0.5
C H0 : p̂ = 0.5 vs. Ha : p̂ > 0.5
D H0 : p = 0.5 vs. Ha : p ≠ 0.5
E H0 : p̂ = 0.5 vs. Ha : p̂ ≠ 0.5


Example

Example
In the 2011 Canadian general election, the NDP received 30.6% of all votes
cast. The party would like to determine whether their popular support has
changed since the election, using α = 0.05. They take a simple random
sample of 500 voters, 141 of whom say they support the NDP.
Also, calculate a 95% confidence interval for the true proportion of all
voters who support the NDP.


Example

Solution
We calculate p̂ = 141/500 = 0.282. So, our 95% confidence interval is

\[
\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
= 0.282 \pm 1.96 \sqrt{\frac{0.282(0.718)}{500}}
= 0.282 \pm 0.039 = (0.243, 0.321)
\]

We will now conduct a hypothesis test to determine whether there is
evidence that the popular support for the NDP has changed since the last
election.


Example

Solution
1 Let α = 0.05.

2 We are testing the hypotheses:

H0 : The popular support for the NDP is the same as last election.
Ha : The popular support for the NDP has changed since the
last election.

Equivalently,

H0 : p = 0.306
Ha : p ≠ 0.306

3 Reject H0 if the p-value ≤ α = 0.05.


Example

Solution
4 The test statistic is:

\[ Z = \frac{0.282 - 0.306}{\sqrt{\frac{0.306(0.694)}{500}}} = -1.16 \]

5 The p-value is

\[ 2P(Z \le -1.16) = 2(0.1230) = 0.2460 \]

Since the p-value = 0.2460 > α = 0.05, we fail to reject H0 at the 5%
level of significance.
6 We have insufficient evidence to conclude that the NDP's popular
support has changed since the last election.
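
The two-sided test can be checked in R. The sketch below computes Z by hand (using p0, not p̂, in the standard deviation) and cross-checks the p-value against `prop.test()` without continuity correction; the variable names are illustrative.

```r
# NDP example: 141 of 500 voters, H0: p = 0.306 vs. Ha: p != 0.306
x  <- 141
n  <- 500
p0 <- 0.306

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard deviation uses p0
p_value <- 2 * pnorm(-abs(z))                  # two-sided p-value

# Cross-check: the chi-square test in prop.test() is equivalent to the Z test
res <- prop.test(x, n, p = p0, correct = FALSE)

round(c(z = z, p_value = p_value, prop_test = res$p.value), 4)
```

The unrounded p-value differs slightly from the table-based 0.2460 because Z is not rounded to −1.16 before the normal probability is looked up.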


Example

Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if |Z | ≥ z ∗ = 1.96 (i.e., if
Z ≤ −1.96 or Z ≥ 1.96), where z ∗ = 1.96 is the upper 0.025 critical value
for the standard normal distribution.

We would fail to reject H0 since −1.96 < z = −1.16 < 1.96.


Practice Question
A drug company manufactures an antacid that is known to be successful in
providing relief for 70% of people with heartburn. The company tests a new
formula to determine whether it is better than the current one. A total of
150 people with heartburn try the new antacid and 120 report feeling
some relief. The appropriate test statistic for testing whether the new
formula is more effective than the old formula is:

A \[ Z = \frac{0.8 - 0.7}{\sqrt{\frac{0.8(0.2)}{150}}} \]

B \[ Z = \frac{0.7 - 0.8}{\sqrt{\frac{0.7(0.3)}{150}}} \]

C \[ Z = \frac{0.8 - 0.7}{\sqrt{\frac{0.7(0.3)}{150}}} \]

0.8 − 0.7

Power Calculations

We can also calculate the power of hypothesis tests for proportions. For
example, we conducted a test of

H0 : p = 0.50 vs. Ha : p > 0.50

for the proportion of Canadians who support the long gun registry at the
5% level of significance. What would be the power of the test if the true
proportion of Canadians who supported the registry was 0.55?
Power = P(reject H0 | p = 0.55)


Example

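Step 1:
Find the rejection rule in terms of p̂, assuming H0 is true:
Reject H0 if Z ≥ 1.645, where

\[ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
\[ \Rightarrow \hat{p} \ge 0.50 + 1.645\sqrt{\frac{0.5(0.5)}{850}} \quad \Rightarrow \quad \hat{p} \ge 0.5282 \]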


Example

Step 2:
Find the probability of rejecting H0 assuming Ha is true (here, p = 0.55).

\[
\text{Power} = P(\hat{p} \ge 0.5282 \mid p = 0.55)
= P\left(Z \ge \frac{0.5282 - 0.55}{\sqrt{\frac{0.55(0.45)}{850}}}\right)
\]
\[
= P(Z \ge -1.28) = 1 - P(Z < -1.28) = 1 - 0.1003 = 0.8997
\]


Example

A health organization claims that less than one quarter of all adults smoke
cigarettes. We would like to conduct a hypothesis test, at the 10% level of
significance, to verify this claim. We will select a simple random sample of
250 adults and ask them whether they smoke. What is the power of the
test if the true proportion of adults who smoke is 0.20?
We want the power of the test if H0 : p = 0.25 vs. Ha : p < 0.25 when
p = 0.20.


Example

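Step 1:
Find the rejection rule in terms of p̂, assuming H0 is true:
Reject H0 if Z ≤ −1.282, where

\[ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
\[ \Rightarrow \hat{p} \le 0.25 - 1.282\sqrt{\frac{0.25(0.75)}{250}} \quad \Rightarrow \quad \hat{p} \le 0.2149 \]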


Example

Step 2:
Find the probability of rejecting H0 assuming Ha is true (here, p = 0.20).

\[
\text{Power} = P(\hat{p} \le 0.2149 \mid p = 0.20)
= P\left(Z \le \frac{0.2149 - 0.20}{\sqrt{\frac{0.20(0.80)}{250}}}\right)
\]
\[
= P(Z \le 0.59) = 0.7224
\]
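
Both power calculations follow the same two steps, which can be sketched as one R function. `power_prop` is a hypothetical helper written for this illustration; it handles only the two one-sided alternatives used in these examples.

```r
# Power of a one-sided, one-sample Z test for a proportion (illustrative helper)
power_prop <- function(p0, p_true, n, alpha, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  sd0 <- sqrt(p0 * (1 - p0) / n)             # SD of p-hat if H0 is true
  sd1 <- sqrt(p_true * (1 - p_true) / n)     # SD of p-hat at the true p
  z_star <- qnorm(1 - alpha)                 # one-sided critical value

  if (alternative == "greater") {
    cutoff <- p0 + z_star * sd0              # Step 1: reject H0 if p-hat >= cutoff
    1 - pnorm((cutoff - p_true) / sd1)       # Step 2: P(reject | p = p_true)
  } else {
    cutoff <- p0 - z_star * sd0              # Step 1: reject H0 if p-hat <= cutoff
    pnorm((cutoff - p_true) / sd1)           # Step 2: P(reject | p = p_true)
  }
}

power_registry <- power_prop(0.50, 0.55, n = 850, alpha = 0.05, alternative = "greater")
power_smoking  <- power_prop(0.25, 0.20, n = 250, alpha = 0.10, alternative = "less")

round(c(registry = power_registry, smoking = power_smoking), 3)
```

The results agree with the hand calculations (0.8997 and 0.7224) to about three decimal places; the small gaps come from the rounding of the cut-off and of Z in the table-based solution.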


Inference Comparing Two Proportions

We will now turn our attention to the case where we wish to compare two
population proportions.
Let p1 be the true proportion of all individuals in Population 1 who possess
some attribute. Let p2 be the true proportion of all individuals in
Population 2 who possess the same attribute.


Inference Comparing Two Proportions

We would like to estimate the difference in population proportions p1 − p2.
To do this, we will take an SRS of size n1 from Population 1 and an SRS
of size n2 from Population 2.
We will calculate p̂1 and p̂2 , the sample proportions from the first and
second samples, respectively.


Inference Comparing Two Proportions

Our estimate of p1 − p2 is p̂1 − p̂2.

When estimating p1 − p2, we obviously won't know the standard deviation
of p̂1 − p̂2, which is a function of the population proportions p1 and p2.
We will estimate the standard deviation by the standard error of p̂1 − p̂2:

\[ SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]


Confidence Intervals

A level C confidence interval for the difference in population proportions
p1 − p2 is

\[ (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]

where z* is the upper (1 − C)/2 critical value from the standard normal
distribution.
The use of the normal distribution as an approximation is only appropriate
if n1 p̂1, n1 (1 − p̂1), n2 p̂2 and n2 (1 − p̂2) are all greater than or equal to
ten (i.e., if there are at least ten successes and ten failures in each of the
two samples).


Example – Legalization

Example
Do older adults and young adults have different views on legalizing
marijuana in Canada? A sample of 150 young adults (aged 18 – 30) and a
sample of 120 older adults (aged 40+) were selected. Respondents were
asked if they supported the legalization of marijuana. Of the young adults,
87 indicated their support, while 54 of the older adults supported
legalization of the drug.


Example – Legalization

Let p1 be the true population proportion of all young adults who support
the legalization of marijuana and let p2 be the true population proportion
of all older adults who support it. We calculate the sample proportions:

\[ \hat{p}_1 = \frac{87}{150} = 0.58 \qquad \hat{p}_2 = \frac{54}{120} = 0.45 \]

We check

\[ n_1\hat{p}_1 = 150(0.58) = 87 > 10 \qquad n_1(1-\hat{p}_1) = 150(0.42) = 63 > 10 \]
\[ n_2\hat{p}_2 = 120(0.45) = 54 > 10 \qquad n_2(1-\hat{p}_2) = 120(0.55) = 66 > 10 \]

so the use of the normal approximation is justified.


Example – Legalization

Let us calculate a 95% confidence interval for the difference in the
population proportions p1 − p2.

Solution
To calculate the 95% confidence interval, we first need to find the
standard error of p̂1 − p̂2:

\[
SE(\hat{p}_1 - \hat{p}_2)
= \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
= \sqrt{\frac{0.58(1-0.58)}{150} + \frac{0.45(1-0.45)}{120}} = 0.0607
\]

Therefore, a 95% confidence interval is:

\[ (0.58 - 0.45) \pm (1.96)(0.0607) = (0.0110, 0.2490) \]
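
The standard error and interval for the legalization example can be verified with a few lines of R. This is a sketch with illustrative variable names; it uses `qnorm(0.975)` in place of the table value 1.96, which does not change the result to four decimals.

```r
# Legalization example: young adults (group 1) vs. older adults (group 2)
x <- c(87, 54)      # supporters in each sample
n <- c(150, 120)    # sample sizes

p_hat <- x / n                                # 0.58 and 0.45
se <- sqrt(sum(p_hat * (1 - p_hat) / n))      # unpooled standard error
d_hat <- p_hat[1] - p_hat[2]                  # estimated difference, 0.13

ci <- d_hat + c(-1, 1) * qnorm(0.975) * se    # 95% confidence interval

round(c(se = se, lower = ci[1], upper = ci[2]), 4)
```

Swapping the order of the two groups in `x` and `n` flips the sign of both limits, which is the relabelling effect described in the interpretation of this interval.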


Example – Legalization

Interpretation of the confidence interval: If we took repeated samples
of the same sizes from the same populations and calculated the interval in
a similar manner, then 95% of all such intervals would contain the true
difference in population proportions of young and older adults who support
the legalization of marijuana in Canada.

If we had decided to label older adults as Population 1 and young adults as
Population 2, our interval would be

(−0.2490, −0.0110)

Negative values do not indicate negative proportions (which do not exist),
but rather negative differences in population proportions.


Practice Question

An SRS of 100 flights of a large airline (Airline 1) showed that 79 were on
time. An SRS of 150 flights of another airline (Airline 2) showed that 96
were on time. Let p1 and p2 be the proportions of all flights that are on
time for these two airlines, respectively. A 90% confidence interval for
p1 − p2 is:
A 0.15 ± 1.645 (0.068)
B 0.15 ± 1.645 (0.079)
C 0.15 ± 1.645 (0.057)
D 0.15 ± 1.645 (0.084)
E 0.15 ± 1.645 (0.093)


Practice Question
A university administrator would like to estimate the true proportion of
students at the university with student loans. She takes a random sample
of 137 students at the university and calculates a 95% confidence interval
to be (0.46, 0.59). What is the correct interpretation of this confidence
interval?
A 95% of samples of 137 students will give proportions between 0.46
and 0.59.
B 95% of similarly constructed intervals will contain the sample
proportion of students at the university with loans.
C 95% of similarly constructed intervals will contain the true proportion
of students at the university with loans.
D The probability that the true proportion is between 0.46 and 0.59 is
95%.
E Between 46% and 59% of students have loans.

Hypothesis Testing

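Recall:
For large sample sizes,

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]

However, if the null hypothesis is true, p1 = p2, and so p1 − p2 = 0, and
so under H0,

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \]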


Hypothesis Testing

But if p1 = p2, then the proportions in the denominator are really the
same proportion, say p.
We will estimate this common proportion p (= p1 = p2) by the pooled
sample proportion p̂c:

\[ \hat{p}_c = \frac{\text{total successes in both samples}}{\text{total observations in both samples}} = \frac{x_1 + x_2}{n_1 + n_2} \]


Hypothesis Testing

The appropriate test statistic for this test of significance is therefore

\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]

The use of the normal distribution as an approximation is only appropriate
if n1 p̂1, n1 (1 − p̂1), n2 p̂2 and n2 (1 − p̂2) are all greater than or equal to
ten (i.e., if there are at least ten successes and ten failures in each of the
two samples).


Example – Legalization Hypothesis Test

Is the true proportion of young adults who support the legalization of
marijuana in Canada greater than that for older adults?

Solution
1 Let α = 0.05.
2 We are testing the hypotheses

H0 : p1 = p2 vs. Ha : p1 > p2

3 We will reject the null hypothesis if the p-value ≤ α = 0.05.


Example – Legalization Hypothesis Test

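We calculate the pooled sample proportion

\[ \hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{87 + 54}{150 + 120} = \frac{141}{270} = 0.5222 \]

4 The test statistic is

\[
Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
= \frac{0.58 - 0.45}{\sqrt{0.5222(1-0.5222)\left(\frac{1}{150} + \frac{1}{120}\right)}} = 2.12
\]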


Example – Legalization Hypothesis Test

5 The p-value is

\[ P(Z \ge 2.12) = 1 - P(Z < 2.12) = 1 - 0.9830 = 0.0170 \]

Since the p-value = 0.0170 < α = 0.05, we reject H0.

6 There is sufficient evidence that the true proportion of young adults
who support the legalization of marijuana in Canada is greater than
that for older adults.

Interpretation of the p-value: If the true proportions of young and older
adults who support the legalization of marijuana in Canada were equal, the
probability of observing a difference in sample proportions at least as
high as 0.13 would be 0.0170.
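
The pooled test statistic can be reproduced by hand in R. This is a sketch with illustrative variable names; squaring the resulting Z should give approximately the X-squared value that `prop.test()` reports for these data (4.5156).

```r
# Legalization example: H0: p1 = p2 vs. Ha: p1 > p2
x <- c(87, 54)
n <- c(150, 120)

p_hat  <- x / n
p_pool <- sum(x) / sum(n)                               # pooled proportion, 141/270
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))     # SE under H0

z <- (p_hat[1] - p_hat[2]) / se_pool                    # pooled Z statistic
p_value <- 1 - pnorm(z)                                 # upper-tail p-value

round(c(p_pool = p_pool, z = z, z_squared = z^2, p_value = p_value), 4)
```

Without rounding Z to 2.12 first, the p-value comes out slightly below the table-based 0.0170.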


Example – Legalization Hypothesis Test

Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if Z ≥ z* = 1.645.

We would still reject H0, since z = 2.12 > z* = 1.645.


Example – Legalization Hypothesis Test – R Code

res <- prop.test(x = c(87, 54), n = c(150, 120), alternative = "greater",
                 correct = FALSE)

# Printing the results
res


Example – Legalization Hypothesis Test – R Code

##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(87, 54) out of c(150, 120)
## X-squared = 4.5156, df = 1, p-value = 0.01679
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.03013015 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.58 0.45


Example – Smoking

We can also use the two-sample test for proportions to compare two
treatments in an experiment.
Example
We would like to compare the effectiveness of two popular treatments that
are designed to help smokers quit smoking. A sample of 125 smokers who
have expressed a desire to quit smoking volunteer to participate in an
experiment. The 63 subjects in Group 1 are assigned to chew nicotine gum
and the 62 subjects in Group 2 are assigned to wear a nicotine patch. At
the end of six months, 22 of the subjects in Group 1 and 17 of the
subjects in Group 2 have quit smoking.


Example – Smoking (CI)


Let p1 be the true proportion of all smokers who chew nicotine gum that
are able to quit smoking, and let p2 be the true proportion of all smokers
who wear a nicotine patch that are able to quit smoking. We calculate the
sample proportions:

$$\hat{p}_1 = \frac{22}{63} = 0.3492 \qquad \hat{p}_2 = \frac{17}{62} = 0.2742$$

We check

$$n_1 \hat{p}_1 = 63(0.3492) = 22 > 10 \qquad n_1(1 - \hat{p}_1) = 63(0.6508) = 41 > 10$$

$$n_2 \hat{p}_2 = 62(0.2742) = 17 > 10 \qquad n_2(1 - \hat{p}_2) = 62(0.7258) = 45 > 10$$

so the use of the normal approximation is justified.



Example – Smoking (CI)

We would like to construct a 95% confidence interval for the true
difference between the proportions of smokers who are able to quit when
chewing nicotine gum and when wearing a nicotine patch.
The standard error of p̂1 − p̂2 is:

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$SE = \sqrt{\frac{0.3492(1 - 0.3492)}{63} + \frac{0.2742(1 - 0.2742)}{62}} = 0.0826$$


Example – Smoking (CI)

Therefore, a 95% confidence interval for p1 − p2 is:


(0.3492 − 0.2742) ± 1.96(0.0826) = (−0.0869, 0.2369)
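The condition check, standard error, and confidence interval for the smoking example can be reproduced in R. This is a minimal sketch using only the counts given in the example (22 of 63 and 17 of 62); the object names are assumptions made for illustration.

```r
x <- c(22, 17)   # subjects who quit: gum, patch
n <- c(63, 62)   # group sizes
p_hat <- x / n   # sample proportions

# Normal approximation check: successes and failures in each group
# should all be at least 10
rbind(successes = x, failures = n - x)

# Unpooled standard error of p_hat1 - p_hat2
se <- sqrt(sum(p_hat * (1 - p_hat) / n))

# 95% confidence interval for p1 - p2
z_star <- qnorm(0.975)
ci <- (p_hat[1] - p_hat[2]) + c(-1, 1) * z_star * se

round(se, 4)
round(ci, 4)
```

Because R carries full precision, the endpoints can differ from the hand calculation in the fourth decimal place.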


Example – Smoking (Hypothesis Test)

We will now conduct a hypothesis test to determine whether there is a
difference in effectiveness between nicotine gum and the nicotine patch.
1 Let α = 0.05.
2 We are testing the hypotheses: H0 : p1 = p2 vs. Ha : p1 ̸= p2
3 We will reject the null hypothesis if the p-value ≤ α = 0.05.


Example – Smoking (Hypothesis Test)

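We calculate the pooled sample proportion

$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{22 + 17}{63 + 62} = \frac{39}{125} = 0.312$$

4 The test statistic is:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.3492 - 0.2742}{\sqrt{0.312(1 - 0.312)\left(\frac{1}{63} + \frac{1}{62}\right)}} = 0.90$$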


Example – Smoking (Hypothesis Test)

5 The p-value is
2P(Z ≥ 0.90) = 2(1 − P(Z < 0.90)) = 2(0.1841) = 0.3682
Since the p-value = 0.3682 > α = 0.05, we fail to reject H0.
6 There is insufficient evidence that there is a difference in effectiveness
between nicotine gum and the nicotine patch.


Example – Smoking (Hypothesis Test)

Suppose we had instead used the critical value method to conduct the
test. Our decision rule would be to reject H0 if |Z | ≥ z ∗ = 1.96 (i.e., if
z ≤ −1.96 or z ≥ 1.96), where z ∗ = 1.96 is the upper 0.025 critical value
from the standard normal distribution.
We would fail to reject H0, since −1.96 < z = 0.90 < z∗ = 1.96.
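The two-sided test can be carried out in R both by hand and with prop.test. The sketch below assumes the counts from the smoking example; the slides do not show the call used for this example, so the prop.test call here (with correct = FALSE and a two-sided alternative) is an assumed form rather than the original code.

```r
x <- c(22, 17)   # subjects who quit: gum, patch
n <- c(63, 62)   # group sizes

# Pooled z test computed by hand
p_hat   <- x / n
p_pool  <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z       <- (p_hat[1] - p_hat[2]) / se_pool

p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
z_star  <- qnorm(0.975)                            # upper 0.025 critical value

round(c(z = z, p_value = p_value, z_star = z_star), 4)
abs(z) >= z_star   # FALSE means we fail to reject H0

# The same test with prop.test (no continuity correction)
res <- prop.test(x = x, n = n, alternative = "two.sided", correct = FALSE)
res
```

The X-squared value reported by prop.test is the square of the z statistic, so the two approaches give the same p-value.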


Example – Smoking (Hypothesis Test)

##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: c(22, 17) out of c(63, 62)
## X-squared = 0.81912, df = 1, p-value = 0.3654
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.08681405 0.23683966
## sample estimates:
## prop 1 prop 2
## 0.3492063 0.2741935


Practice Question
An SRS of 200 of a certain model of 2010 car found that 50 had minor
brake defects. An SRS of 100 of the same model of 2011 car found that 10
had minor brake defects. Let p1 and p2 be the true proportions of all 2010
and 2011 cars with brake defects. We wish to conduct a hypothesis test of

H0 : p1 = p2 vs. Ha : p1 ̸= p2

at the 5% level of significance. The value of the appropriate test statistic
is:
A 3.50
B 2.74
C 4.03
D 3.06
E 3.64

Practice Questions
An SRS of 200 of a certain model of 2010 car found that 50 had minor
brake defects. An SRS of 100 of the same model of 2011 car found that 10
had minor brake defects. Let p1 and p2 be the true proportions of all 2010
and 2011 cars with brake defects. We wish to conduct a hypothesis test of

H0 : p1 = p2 vs. Ha : p1 ̸= p2

at the 5% level of significance. The test statistic is calculated to be 3.06.


What is the p-value of the test?
A 0.0006
B 0.9989
C 0.0022
D 0.0011
E 0.9978

Practice Question

We wish to conduct a test of significance to determine whether there is
evidence that the true proportion of females who smoke cigarettes is
greater than that for males. We would make a Type I Error if we conclude
that:
A pf > pm when in fact pm > pf .
B pf = pm when in fact pf > pm .
C pf ̸= pm when in fact pf > pm .
D pm > pf when in fact pf > pm .
E pf > pm when in fact pf = pm .


Contingency Tables

Consider the following contingency table classifying each individual in a
sample by both eye colour and hair colour:

                        Hair Colour
Eye Colour    Blonde     Red   Brown   Black    Grey
Brown             11       8      39      14       6
Blue              15       7      16       3      10
Green              9       4      12       2       5

The total number of subjects characterized by one value of each variable is
placed in the cell formed by the intersection of the two categories. For
example, there are 12 subjects in the sample with brown hair and green
eyes.


Contingency Tables in R

The R output below shows the contingency table, followed by the row totals and
column totals:

##       Blonde Red Brown Black Grey
## Brown     11   8    39    14    6
## Blue      15   7    16     3   10
## Green      9   4    12     2    5

## Brown  Blue Green
##    78    51    32

## Blonde    Red  Brown  Black   Grey
##     35     19     67     19     21


Contingency Tables in R

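The following R code produces the contingency table, followed by the row totals
and column totals:

# Enter the observed counts row by row (eye colour in rows, hair colour in columns)
colour <- matrix(c(11, 8, 39, 14, 6, 15, 7, 16, 3, 10, 9, 4, 12, 2, 5),
                 nrow = 3, ncol = 5, byrow = TRUE)

# Label the rows (eye colour) and the columns (hair colour)
dimnames(colour) <- list(c("Brown", "Blue", "Green"),
                         c("Blonde", "Red", "Brown", "Black", "Grey"))

# Convert to a data frame and print the table
colour2 <- data.frame(colour)
colour2

# Row totals (eye colour) and column totals (hair colour)
rowSums(colour2)
colSums(colour2)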


Contingency Table

The entries in the table are referred to as observed cell frequencies and are
denoted by O.
The previous two-way table is called a 3 × 5 table, since there are three
rows and five columns (i.e., three possible eye colours and five possible
hair colours). In general, a two-way table with r rows and c columns is
called an r × c table.
Contingency tables are very useful in helping us conduct several different
tests of significance. The first use we will make of these tables is to
conduct tests of significance for the homogeneity of several populations
with respect to some variable of interest.


Example – Ecology Department


The Ecology department head is examining the course evaluations for four
different sections of an introductory course taught last semester by four
different instructors. He would like to know if the opinions of students are
homogeneous with respect to the quality of instruction they received from
their respective professors.
One question on the course evaluation reads:

“Overall, I would say this professor is. . . ”

Students indicate whether they found their professor to be Very Good,
Good, Average, Poor, or Very Poor.
The department head regroups these ratings into three categories: Positive
(i.e., Very Good or Good), Neutral (i.e., Average) and Negative (i.e., Poor
or Very Poor).

Example – Ecology Department

The results for a sample of students in each of the four classes are shown
in the two-way table below:

Section
Rating A01 A02 A03 A04
Positive 22 16 25 10
Neutral 14 21 13 14
Negative 4 10 7 19


Inference for Homogeneous Populations

How can these data be analyzed to determine if the opinions of the
students in each of the classes are homogeneous with respect to the
quality of teaching they received?
That is, we want to test the hypotheses

H0 : Opinions of students are homogeneous with respect to the quality
of teaching they received.
Ha : Opinions of students are not homogeneous with respect to the quality
of teaching they received.


Inference for Homogeneous Populations

In other words, we are testing whether the proportions of positive, neutral
and negative ratings are the same for all four professors.
In conducting a test of significance, we must always ask,

“If the null hypothesis were true, what would we expect?”

In other words, what would we expect to see if students’ opinions really
were homogeneous for all four classes?


Inference for Homogeneous Populations

If all students are equally satisfied, then:

the same proportion should rate their professor as positive for each
section,
the same proportion should rate their professor as neutral for each
section, and
the same proportion should rate their professor as negative for each
section.


Inference for Homogeneous Populations

How do we estimate this common proportion for each of the three rating
categories?
To help us answer this question, we examine the table again, this time
with row, column and table totals included:

Section
Rating A01 A02 A03 A04 Row Total
Positive 22 16 25 10 73
Neutral 14 21 13 14 62
Negative 4 10 7 19 40
Column Total 40 47 45 43 175


Inference for Homogeneous Populations

The column totals represent the sample sizes from each of the four classes
and the row totals represent the total number of positive, neutral and
negative ratings given by all students in the sample. Note that we can
obtain the table total (which in this case is 175) by adding the row totals
or the column totals.



Inference for Homogeneous Populations

Let us examine the row for Positive ratings. The estimated proportion p̂
of all students who rate their professor as positive is

$$\hat{p} = \frac{\text{total number of positive ratings}}{\text{total number of students in the sample}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{73}{175} = 0.4171$$

As such, if opinions are homogeneous for all four classes, we would expect
to see 41.71% of students in each class give a positive rating.


Expected Cell Counts


For example, the expected count of A02 students who gave a positive
rating is

$$E = (\hat{p})(\text{\# of responses for A02}) = \frac{(\text{row 1 total})(\text{column 2 total})}{\text{table total}} = \frac{(73)(47)}{175} = 19.61$$

By a similar argument, the expected count for the cell at the intersection
of the r-th row and the c-th column is

$$E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}}$$
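The expected count formula can be applied to every cell at once in R. The sketch below assumes the observed course evaluation counts from the Ecology example; using outer() on the row and column totals is one convenient way to do this, not necessarily how the slides computed the values.

```r
# Observed counts: ratings in rows, sections in columns
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Positive", "Neutral", "Negative"),
                                  c("A01", "A02", "A03", "A04")))

# (row total) x (column total) / (table total) for every cell
expected <- outer(rowSums(opinion), colSums(opinion)) / sum(opinion)

round(expected, 2)
round(expected[1, 2], 2)   # Positive ratings in A02
```

The row and column totals of the expected counts always match those of the observed counts, which is a useful check on the arithmetic.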


Example – Ecology Department

The expected counts for all cells are calculated similarly and are displayed
in the following table below the observed counts (in parentheses):

                              Section
Rating              A01       A02       A03       A04   Row Total
Positive             22        16        25        10          73
                (16.69)   (19.61)   (18.77)   (17.94)
Neutral              14        21        13        14          62
                (14.17)   (16.65)   (15.94)   (15.23)
Negative              4        10         7        19          40
                 (9.14)   (10.74)   (10.29)    (9.83)
Column Total         40        47        45        43         175


Example – Ecology Department

The test statistic we will use to test for homogeneity of these four
populations measures how far our observed counts are from our
expected counts. The test statistic is

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

Under the null hypothesis of homogeneous populations, this test statistic
follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom.


Chi-Square Distributions

The chi-square distributions are a family of right-skewed distributions
completely characterized by their degrees of freedom (i.e., the degrees of
freedom is the only parameter).


Chi-Square Distributions
[Figure: Chi-Square at Various Degrees of Freedom. Density curves for 5, 15 and 30 degrees of freedom; horizontal axis: Chi-square (0 to 60), vertical axis: Density.]
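A plot of this kind can be drawn with base R graphics. The sketch below is an assumed reconstruction that uses dchisq for the three degrees of freedom named in the figure legend (5, 15 and 30); it is not the code used to produce the original figure.

```r
# Grid of chi-square values and the degrees of freedom to compare
x   <- seq(0, 60, length.out = 601)
dfs <- c(5, 15, 30)

# One column of density values per degrees-of-freedom setting
dens <- sapply(dfs, function(k) dchisq(x, df = k))

# Overlay the three density curves
matplot(x, dens, type = "l", lty = 1, col = 1:3,
        xlab = "Chi-square", ylab = "Density",
        main = "Chi-Square at Various Degrees of Freedom")
legend("topright", legend = paste("df =", dfs), col = 1:3, lty = 1)
```

As the degrees of freedom increase, the peak of the curve moves to the right and the distribution becomes less skewed.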


Test Statistic

If the null hypothesis is true, then we will likely have observed cell counts
which are quite close to their expected cell counts and the value of the
test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

should be quite low. On the other hand, if the populations are not
homogeneous, observed cell counts will differ substantially from expected
cell counts and the value of the test statistic will be high.
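To see how the statistic is assembled from the individual cells, it can be computed by hand in R. The helper chisq_by_hand below is a hypothetical function written for illustration (it is not part of base R), applied here to the Ecology department counts.

```r
# Hypothetical helper: chi-square statistic for a two-way table of counts
chisq_by_hand <- function(observed) {
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  cells <- (observed - expected)^2 / expected   # cell chi-square values
  list(cells = cells,
       statistic = sum(cells),
       df = (nrow(observed) - 1) * (ncol(observed) - 1))
}

opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Positive", "Neutral", "Negative"),
                                  c("A01", "A02", "A03", "A04")))

out <- chisq_by_hand(opinion)
round(out$cells, 2)   # contribution of each cell
out$statistic         # sum over all cells
out$df                # (r - 1)(c - 1)
```

Cells whose observed counts are far from the expected counts contribute large values to the sum, which is why a large statistic counts as evidence against homogeneity.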


Inference for Homogeneous Populations

As such, we will reject the null hypothesis of homogeneity if the value of
the test statistic is high, namely if it exceeds the upper α critical value
from the χ2 distribution with (r − 1) (c − 1) degrees of freedom (or
equivalently, if the p-value is less than or equal to α). Selected critical
values for the χ2 distribution are given in Table 5. Chi-square tests are
always upper-tailed.


Example

For example, we see from the table that

P(χ2 (4) ≥ 6.74) = 0.15


Example

p<-pchisq(6.74,4, lower.tail = FALSE)


p

## [1] 0.1502827


Inference for Homogeneous Populations

Like the z procedures for comparing two proportions, the chi-square test
for homogeneity is an approximate method. The approximation becomes
more accurate as the observed counts in the cells become larger.
In practice, we can safely use the chi-square distribution in our tests for
homogeneity if:

no more than 20% of expected cell counts are less than five, and
there are no expected cell counts less than one.
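These two conditions are easy to check in R once the expected counts are available. The function check_chisq_conditions below is a hypothetical helper written for this illustration; it applies the two rules above to a table of observed counts.

```r
# Hypothetical helper: check the rule-of-thumb conditions for a chi-square test
check_chisq_conditions <- function(observed) {
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  share_below_5 <- mean(expected < 5)   # proportion of expected counts below 5
  list(share_below_5 = share_below_5,
       min_expected  = min(expected),
       ok = share_below_5 <= 0.20 && min(expected) >= 1)
}

# Ecology department counts: ratings in rows, sections in columns
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE)

result <- check_chisq_conditions(opinion)
result$share_below_5
result$min_expected
result$ok
```

For reference, chisq.test also returns the expected counts in the expected component of its result, and it issues a warning when the approximation may be poor.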


Inference for Homogeneous Populations

In our example, none of the expected cell counts are less than five, and so
the chi-square approximation is justified. We will now conduct the formal
hypothesis test from the beginning.


Example – Ecology Department

1 Let α = 0.05.
2 We are testing the hypotheses

H0 : Opinions of students are homogeneous with respect to
the quality of teaching they received.
Ha : Opinions of students are not homogeneous with respect to
the quality of teaching they received.

3 We will reject H0 if the p-value ≤ α = 0.05.


Example – Ecology Department

In order to compute the test statistic, we must first calculate each of the
cell chi-square values separately.
For instance, we previously calculated that the expected count for the first
row and second column (positive ratings from A02 students) is 19.61. The
observed cell count is 16, and so the cell chi-square value is

$$\frac{(O - E)^2}{E} = \frac{(16 - 19.61)^2}{19.61} = 0.66$$

Other cell chi-square values are calculated similarly and are shown in the
following table below both the observed and expected cell counts (in
square brackets).


Example – Ecology Department

                              Section
Rating              A01       A02       A03       A04   Row Total
Positive             22        16        25        10          73
                (16.69)   (19.61)   (18.77)   (17.94)
                 [1.69]    [0.66]    [2.07]    [3.51]
Neutral              14        21        13        14          62
                (14.17)   (16.65)   (15.94)   (15.23)
                 [0.00]    [1.14]    [0.54]    [0.10]
Negative              4        10         7        19          40
                 (9.14)   (10.74)   (10.29)    (9.83)
                 [2.89]    [0.05]    [1.05]    [8.55]
Column Total         40        47        45        43         175


Example – Ecology Department

Solution
4 We can now add all the cell chi-square values to obtain the value of
the test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 1.69 + 0.66 + \ldots + 8.55 = 22.25$$

Under the null hypothesis, this test statistic follows a
chi-square distribution with (r − 1)(c − 1) = (2)(3) = 6 degrees of freedom.


Example – Ecology Department

Solution
5 The p-value is

P(χ2 (6) ≥ 22.25)


We see from Table 5 that this p-value lies between 0.001 and 0.0025.


Example – Ecology Department

Interpretation of the p-value: If opinions of students in the four
sections were homogeneous with respect to the quality of teaching they
received, the probability of observing a value of the test statistic at least as
high as 22.25 would be between 0.001 and 0.0025.
Suppose we had instead conducted the test using the critical value
approach. The decision rule would be to reject H0 if

χ2 ≥ χ2∗ = 12.59

where χ2∗ = 12.59 is the upper 0.05 critical value from the chi-square
distribution with 6 degrees of freedom.
We would still reject the null hypothesis, since χ2 = 22.25 > χ2∗ = 12.59.


Example – Ecology Department

We can examine which cells contribute the most to the value of the test
statistic in an effort to understand why opinions were found not to be
homogeneous.
The highest cell chi-square values belong to the negative ratings for A04,
whose count is higher than expected, the positive ratings for A04, whose
count is lower than expected, and the negative ratings for A01, whose count
is lower than expected.
This tells us that students in A01 liked their instructor more than average
and students in A04 liked their instructor less than average.
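The cells that contribute most to the test statistic can also be ranked in R. The sketch below assumes the Ecology department counts and uses the residuals component returned by chisq.test (the Pearson residuals, whose squares are the cell chi-square values); the data frame layout is just one illustrative way to display them.

```r
opinion <- matrix(c(22, 16, 25, 10,
                    14, 21, 13, 14,
                     4, 10,  7, 19),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Positive", "Neutral", "Negative"),
                                  c("A01", "A02", "A03", "A04")))

test <- chisq.test(opinion)

# One row per cell: observed and expected counts, contribution, and direction
cells <- data.frame(
  rating       = rownames(opinion)[row(opinion)],
  section      = colnames(opinion)[col(opinion)],
  observed     = as.vector(opinion),
  expected     = as.vector(test$expected),
  contribution = as.vector(test$residuals^2)
)
cells$direction <- ifelse(cells$observed > cells$expected,
                          "higher than expected", "lower than expected")

# Three largest contributions to the chi-square statistic
top <- cells[order(-cells$contribution), ][1:3, ]
top
```

Looking at the sign of each residual, rather than only its square, shows whether a count is above or below what homogeneity would predict.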


Example – Ecology Department

From Table 5, we see that

P(χ2 (6) ≥ 20.25) = 0.0025 and P(χ2 (6) ≥ 22.46) = 0.001

Since 20.25 < χ2 = 22.25 < 22.46, our p-value is between 0.001 and
0.0025. Since the p-value < α = 0.05, we reject the null hypothesis.
6 We have sufficient evidence to conclude that opinions of students in the
four sections are not homogeneous with respect to the quality of teaching
they received.


Example – Ecology Department R Code

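# Enter the observed counts row by row (ratings in rows, sections in columns)
opinion <- matrix(c(22, 16, 25, 10, 14, 21, 13, 14, 4, 10, 7, 19),
                  nrow = 3, ncol = 4, byrow = TRUE)

# Label the rows (rating) and the columns (section)
dimnames(opinion) <- list(c("Positive", "Neutral", "Negative"),
                          c("A01", "A02", "A03", "A04"))

opinion

##          A01 A02 A03 A04
## Positive  22  16  25  10
## Neutral   14  21  13  14
## Negative   4  10   7  19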


Example – Ecology Department R Code


mosaicplot(opinion)
[Figure: mosaic plot titled "opinion", showing the ratings (Positive, Neutral, Negative) against the sections (A01, A02, A03, A04).]


Example – Ecology Department R Code

chisq.test(opinion)

##
## Pearson’s Chi-squared test
##
## data: opinion
## X-squared = 22.268, df = 6, p-value = 0.001083


Example – Ecology Department R Code

p-value

p2<-pchisq(22.25,6, lower.tail = FALSE)


p2

## [1] 0.001090816

Critical Value

q1<-qchisq(0.05,6, lower.tail = FALSE)


q1

## [1] 12.59159


Example – Archer

Five archers shoot several arrows at a target. The table below displays the
number of times each archer hit and missed the bull’s-eye on the target:

                                      Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit                   25         30         30         50         25         160
Missed                10         25         10         20         25          90
Column Total          35         55         40         70         50         250

Are the archers homogeneous with respect to their accuracy?


Example – Archer
Solution
1 Let α = 0.05.

2 We are testing the hypotheses

H0 : The five archers are homogeneous with respect to their accuracy.
Ha : The five archers are not homogeneous with respect to their
accuracy.

Note that, since there are only two values of the response variable
(hit or miss), we are actually testing the equality of five population
proportions, where p1, ..., p5 are the true proportions of shots that hit
the bull’s-eye for the five archers.

H0 : p1 = p2 = p3 = p4 = p5
Ha : At least one of the population proportions differs from the others.


Example – Archer

The chi-square test for homogeneity is in fact an extension of the z test
for comparing two proportions; the difference is that the chi-square test
can compare several population proportions.
Solution
3 We will reject H0 if the p-value ≤ α = 0.05.

We first calculate the expected cell counts. For example, the expected
number of hits for Archer 5 is

$$E = \frac{(\text{row 1 total})(\text{column 5 total})}{\text{table total}} = \frac{(160)(50)}{250} = 32.0$$


Example – Archer

The rest of the expected cell counts are calculated similarly and are shown
in the table:

                                      Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit                   25         30         30         50         25         160
                  (22.4)     (35.2)     (25.6)     (44.8)     (32.0)
Missed                10         25         10         20         25          90
                  (12.6)     (19.8)     (14.4)     (25.2)     (18.0)
Column Total          35         55         40         70         50         250

Note:
None of the expected cell counts are less than five, and so the chi-square
approximation is justified.


Example – Archer

We now calculate the cell chi-square values. For example, the cell
chi-square value for the number of misses for Archer 2 is

$$\frac{(O - E)^2}{E} = \frac{(25 - 19.8)^2}{19.8} = 1.37$$


Example – Archer

Other cell chi-square values are calculated similarly and are shown in
square brackets below the expected cell counts:

                                      Archer
Result          Archer 1   Archer 2   Archer 3   Archer 4   Archer 5   Row Total
Hit                   25         30         30         50         25         160
                  (22.4)     (35.2)     (25.6)     (44.8)     (32.0)
                  [0.30]     [0.77]     [0.76]     [0.60]     [1.53]
Missed                10         25         10         20         25          90
                  (12.6)     (19.8)     (14.4)     (25.2)     (18.0)
                  [0.56]     [1.37]     [1.34]     [1.07]     [2.72]
Column Total          35         55         40         70         50         250

Example – Archer

Solution
4 The test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.30 + 0.77 + \ldots + 2.72 = 11.02$$

5 Under the null hypothesis, this test statistic follows a chi-square
distribution with (r − 1)(c − 1) = (1)(4) = 4 degrees of freedom.
The p-value is P(χ2 (4) ≥ 11.02).


Example – Archer

Solution
5 We see from Table 5 that

P(χ2 (4) ≥ 9.49) = 0.05 and P(χ2 (4) ≥ 11.14) = 0.025.

Since 9.49 < χ2 = 11.02 < 11.14, our p-value is between 0.025 and
0.05.


Example – Archer

Solution
Since the p-value < α = 0.05, we reject the null hypothesis.
6 We have sufficient evidence to conclude that the five archers are not
homogeneous with respect to their accuracy.

Suppose we had instead conducted the test using the critical value
approach. The decision rule would be to reject H0 if

χ2 ≥ χ2∗ = 9.49

where χ2∗ = 9.49 is the upper 0.05 critical value from the chi-square
distribution with 4 degrees of freedom.
We would still reject the null hypothesis, since χ2 = 11.02 > χ2∗ = 9.49.
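The archer example can be run in R in the same way as the Ecology department example. The sketch below assumes the hit and miss counts from the table; the object name archers is illustrative. Note that R works with unrounded expected counts and cell values, so its statistic (about 11.00) comes out slightly below the 11.02 obtained by hand from the table; the decision is the same.

```r
# Observed counts: results in rows, archers in columns
archers <- matrix(c(25, 30, 30, 50, 25,
                    10, 25, 10, 20, 25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Hit", "Missed"),
                                  paste("Archer", 1:5)))

res <- chisq.test(archers)

res$expected    # expected cell counts under homogeneity
res$statistic   # chi-square test statistic
res$parameter   # degrees of freedom, (r - 1)(c - 1)
res$p.value     # p-value

# Upper 0.05 critical value for 4 degrees of freedom
qchisq(0.05, df = 4, lower.tail = FALSE)
```

The p-value reported by R falls between 0.025 and 0.05, in agreement with the bounds found from Table 5.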


Practice Question
A survey is conducted in each of four regions of Canada. Respondents are
asked whether they approve of the job being done by the prime minister.
Results are shown in the table below:
Region
Rating West Prairies Central Atlantic Row Total
Approve 94 65 61 38 258
Disapprove 28 30 89 32 179
Neutral 18 15 20 10 63
Column Total 140 110 170 80 500

What are the degrees of freedom for the appropriate test statistic?
A 5
B 6
C 8
D 9
E 11

Practice Question
A survey is conducted in each of four regions of Canada. Respondents are
asked whether they approve of the job being done by the prime minister.
Results are shown in the table below:
Rating West Prairies Central Atlantic Row Total
Approve 94 65 61 38 258
Disapprove 28 30 89 32 179
Neutral 18 15 20 10 63
Column Total 140 110 170 80 500

What is the expected number of Central Canadians who disapprove of the
job being done by the prime minister?
A 58.28
B 60.86
C 66.34
D 69.57
E 72.49

Z Test vs. χ2 Test

We have said that a chi-square test can be used to test the equality of several
population proportions. In the case where we are conducting a two-sided
test comparing just two proportions, the chi-square test is in fact
equivalent to the two-sample z test. It can be shown that z² = χ² and the
p-values of the two tests are identical.


Example – Smoking

Recall the smoking experiment:


Example
We would like to compare the effectiveness of two popular treatments that
are designed to help smokers quit smoking. A sample of 125 smokers who
have expressed a desire to quit smoking volunteer to participate in an
experiment. The 63 subjects in Group 1 are assigned to chew nicotine gum
and the 62 subjects in Group 2 are assigned to wear a nicotine patch. At
the end of six months, 22 of the subjects in Group 1 and 17 of the
subjects in Group 2 have quit smoking.


Example – Smoking

We conducted a test of H0 : p1 = p2 vs. Ha : p1 ̸= p2 and we obtained a
test statistic of z = 0.90 and a p-value of 0.3682.

s<-2*(1-pnorm(0.9))
s

## [1] 0.3681203


Example – Smoking

Suppose instead we constructed a 2 × 2 table for the data and conducted
a chi-square test for homogeneity. The resulting table is shown below, with
expected counts in parentheses and cell chi-square values in square brackets:

                     Treatment
Result               Gum      Patch   Row Total
Quit                  22         17          39
                (19.656)   (19.344)
                [0.2795]   [0.2840]
Didn’t Quit           41         45          86
                (43.344)   (42.656)
                [0.1268]   [0.1288]
Column Total          63         62         125


Example – Smoking

The test statistic is found to be

χ2 = 0.2795 + . . . + 0.1288
= 0.8191

which is the same (up to rounding error) as the square of the test statistic
we found using the z test (z² = (0.90)² = 0.81).
Under the null hypothesis, the test statistic follows a chi-square
distribution with (r − 1)(c − 1) = (1)(1) = 1 degree of freedom.
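The equivalence of the two tests can be verified numerically in R without rounding z first. The sketch below assumes the smoking counts from the 2 × 2 table; correct = FALSE is needed because chisq.test applies a continuity correction to 2 × 2 tables by default.

```r
# Observed 2 x 2 table: result in rows, treatment in columns
quit <- matrix(c(22, 17,
                 41, 45),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Quit", "Didn't Quit"), c("Gum", "Patch")))

# Chi-square test for homogeneity, without continuity correction
chi <- chisq.test(quit, correct = FALSE)

# Two-sample pooled z test computed from the same counts
x <- unname(quit["Quit", ])
n <- unname(colSums(quit))
p_pool <- sum(x) / sum(n)
z <- (x[1] / n[1] - x[2] / n[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z_squared = z^2, chi_squared = unname(chi$statistic))
c(z_p_value = z_p_value, chi_p_value = chi$p.value)
```

With unrounded values, the square of the z statistic and the chi-square statistic agree, as do the two p-values.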


Example – Smoking

The exact p-value is

pval<-pchisq(0.81,1, lower.tail = FALSE)


pval

## [1] 0.3681203


Example – Smoking

Because of the large p-value, we do not reject H0, but remember, we
never accept H0. We cannot conclude that the two treatments are
homogeneous; we can only say we have insufficient evidence that they are
not homogeneous.
We could conclude that homogeneity appears to be a reasonable
assumption.


Practice Question

We conduct a hypothesis test of H0 : p1 = p2 vs. Ha : p1 ̸= p2 to compare
the proportion of patients whose condition improves with an experimental
drug vs. a placebo. We conduct an experiment and determine the value of
the test statistic to be z = 1.75, and the p-value is 0.08. Suppose we had
instead conducted a chi-square test for homogeneity. The values of the
test statistic and the p-value would be:
A 3.06 and 0.0064
B 1.75 and 0.016
C 1.32 and 0.08
D 3.06 and 0.08
E 1.75 and 0.28


Chi-Square Test for Independence

We will now examine another situation for which a chi-square test of
significance is appropriate.
In testing homogeneity among several populations with respect to some
variable, we took a separate sample from each of the populations and
compared them with respect to a single variable. We could choose the
sample size we took from each of the populations (i.e., the column totals).
Now consider the case where we wish to study the relationship between
two categorical variables. We will take one simple random sample
from a single population of individuals and measure and compare the
values for the two variables.


Chi-Square Test for Independence

We will then conduct a test of significance to examine whether or not the
two variables of interest are independent. That is, we will test the
hypotheses

H0: The two categorical variables of interest are independent.
Ha: The two categorical variables of interest are not independent
(i.e., they are dependent).


Example – Money and Happiness

According to a famous saying “money can’t buy you happiness”, but are
wealth and happiness really independent?
A psychological study was conducted, in which subjects were analyzed and
categorized as either very happy, somewhat happy or unhappy. The income
levels of subjects were also examined, and each subject was categorized as
either low income, middle class, or wealthy. Let α = 0.10.


Example – Money and Happiness

The data are displayed in the table below:

Income Levels
Happiness Low Income Middle Class Wealthy Row Total
Very Happy 7 16 13 36
Somewhat Happy 10 20 11 41
Unhappy 5 7 5 17
Column Total 22 43 29 94


Example – Money and Happiness

Let α = 0.10. We are testing the hypotheses

H0: Wealth and happiness are independent.
Ha: Wealth and happiness are dependent.

We will reject H0 if the p-value ≤ α = 0.10.


Recall again that all hypothesis tests are conducted under the assumption
that the null hypothesis is true. So we must again ask, if the two variables
really are independent, then what would we expect to see?


Chi-Square Test for Independence

For example, if wealth and happiness were independent, what would be the
expected number of individuals in the sample who are unhappy and
wealthy?
Recall: if two events A and B are independent, then

\[ P(A \text{ and } B) = P(A)\,P(B) \]

For example, if wealth and happiness are independent, the probability of a
person being unhappy (U) and wealthy (W) is

\[ P(U \text{ and } W) = P(U)\,P(W) \]


Chi-Square Test for Independence

As such, the expected number of individuals in our sample who are
unhappy and wealthy is

\[
E = (\text{total sample size})\,P(\text{unhappy and wealthy})
  = (\text{table total})\,P(\text{unhappy})\,P(\text{wealthy})
\]

Of course, we don’t know the true probabilities, so we must estimate
them by our sample proportions.


Example – Money and Happiness

The estimated probability of an individual being unhappy is

\[ \hat{p}_U = \frac{\text{row 3 total}}{\text{table total}} = \frac{17}{94} = 0.1809 \]

The estimated probability of an individual being wealthy is

\[ \hat{p}_W = \frac{\text{column 3 total}}{\text{table total}} = \frac{29}{94} = 0.3085 \]


Expected Cell Counts

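Therefore, the expected number of individuals in the sample who are
unhappy and wealthy is

\[
E = (\text{total sample size})\,\hat{p}_U\,\hat{p}_W
  = (\text{table total})
    \left(\frac{\text{row 3 total}}{\text{table total}}\right)
    \left(\frac{\text{column 3 total}}{\text{table total}}\right)
  = \frac{(\text{row 3 total})(\text{column 3 total})}{\text{table total}}
  = \frac{(17)(29)}{94} = 5.24
\]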


Expected Cell Counts

By a similar line of reasoning, the expected frequency for the cell at the
intersection of the rth row and the cth column is

\[ E = \frac{(\text{row } r \text{ total})(\text{column } c \text{ total})}{\text{table total}} \]

The formula for the expected cell count is the same as it was for the test
of homogeneity, but for different reasons! The two tests are in fact
computationally identical.
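
The expected-count formula applies to every cell at once, which makes it easy to compute in R. The sketch below is one possible way to do it, using outer() on the row and column totals of the money and happiness data; the object names happy and expected are illustrative.

```r
# Observed counts from the money and happiness example (filled column by column)
happy <- matrix(c(7, 10, 5,
                  16, 20, 7,
                  13, 11, 5), nrow = 3,
                dimnames = list(
                  Happiness = c("Very Happy", "Somewhat Happy", "Unhappy"),
                  Income = c("Low Income", "Middle Class", "Wealthy")))

# E[r, c] = (row r total)(column c total) / (table total)
expected <- outer(rowSums(happy), colSums(happy)) / sum(happy)

round(expected, 2)
expected[3, 3]      # unhappy and wealthy: (17)(29)/94, about 5.24
```

Note that the expected counts always add up to the table total, just as the observed counts do.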


Example – Money and Happiness

Other expected cell counts are calculated similarly and are shown in the
table below:

                                Income Levels
Happiness         Low Income   Middle Class   Wealthy   Row Total
Very Happy        7            16             13        36
                  (8.43)       (16.47)        (11.11)
Somewhat Happy    10           20             11        41
                  (9.60)       (18.76)        (12.65)
Unhappy           5            7              5         17
                  (3.98)       (7.78)         (5.24)
Column Total      22           43             29        94


Example – Money and Happiness

One of the nine cells (11% < 20%) has an expected count less than five,
and all expected cell counts are greater than one, so the use of the
chi-square approximation is justified.
We will now calculate the cell chi-square values. For example, the cell
chi-square value for somewhat happy and wealthy individuals is

\[ \frac{(O - E)^2}{E} = \frac{(11 - 12.65)^2}{12.65} = 0.22 \]


Example – Money and Happiness

Other cell chi-square values are calculated similarly and are shown in the
table below:

                                Income Levels
Happiness         Low Income   Middle Class   Wealthy   Row Total
Very Happy        7            16             13        36
                  (8.43)       (16.47)        (11.11)
                  [0.24]       [0.01]         [0.32]
Somewhat Happy    10           20             11        41
                  (9.60)       (18.76)        (12.65)
                  [0.02]       [0.08]         [0.22]
Unhappy           5            7              5         17
                  (3.98)       (7.78)         (5.24)
                  [0.26]       [0.08]         [0.01]
Column Total      22           43             29        94
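
The conditions for the chi-square approximation and the cell chi-square values can also be computed directly. This is a hand-calculation sketch in R rather than a call to a packaged test; the names observed, expected and cells are chosen for illustration.

```r
# Observed counts (rows: very happy, somewhat happy, unhappy;
# columns: low income, middle class, wealthy)
observed <- matrix(c(7, 10, 5, 16, 20, 7, 13, 11, 5), nrow = 3)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Conditions for using the chi-square approximation
mean(expected < 5)      # proportion of cells with expected count below 5
all(expected > 1)       # TRUE when every expected count exceeds 1

# Cell chi-square values (O - E)^2 / E and their total
cells <- (observed - expected)^2 / expected
round(cells, 2)
sum(cells)              # about 1.24
```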


Test Statistic

The test statistic is

\[
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
       = 0.24 + 0.01 + \dots + 0.01 = 1.24
\]

Under the null hypothesis, this test statistic follows a chi-square
distribution with (r − 1)(c − 1) = (2)(2) = 4 degrees of freedom. The
p-value is P(χ²(4) ≥ 1.24).


P-Value

We see from Table 5 that

P(χ2 (4) ≥ 5.38) = 0.25

Since χ2 = 1.24 < 5.38, our p-value is greater than 0.25.
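
The table only bounds the p-value. If software is available, the exact value can be obtained with pchisq(), in the same way as in the smoking example; the variable name p_value below is illustrative.

```r
# Upper-tail probability of a chi-square(4) variable at the observed statistic
p_value <- pchisq(1.24, df = 4, lower.tail = FALSE)
p_value            # roughly 0.87
p_value > 0.25     # TRUE, consistent with the bound read from the table
```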


Example

Since the p-value > α = 0.10, we fail to reject the null hypothesis at the
10% level of significance. We have insufficient evidence to conclude that
wealth and happiness are dependent (i.e., independence appears to be a
reasonable assumption).
Suppose we had instead conducted the test using the critical value
approach. The decision rule would be to reject H0 if

χ2 ≥ χ2∗ = 7.78

where χ2∗ = 7.78 is the upper 0.10 critical value from the chi-square
distribution with 4 degrees of freedom. We would still fail to reject the
null hypothesis, since χ2 = 1.24 < χ2∗ = 7.78.
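
The critical value can also be found with qchisq() instead of the table. A short sketch, with crit as an illustrative variable name:

```r
# Upper 0.10 critical value of the chi-square distribution with 4 df
crit <- qchisq(0.10, df = 4, lower.tail = FALSE)
crit             # about 7.78
1.24 >= crit     # FALSE, so we fail to reject H0
```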


Example – Political Party

Example
The individuals in the previous example were also asked which political
party they support. We would like to conduct a hypothesis test, at the 1%
level of significance, to determine whether wealth and political preference
are independent. The data are displayed in the table below:
Income Levels
Political Party Low Income Middle Class Wealthy Row Total
Conservative 2 14 20 36
Liberal 6 10 6 22
NDP 11 12 2 25
Green 3 7 1 11
Column Total 22 43 29 94


Example – Political Party

Solution
1 Let α = 0.01.

2 We are testing the hypotheses

H0 : Wealth and political preference are independent.


Ha : Wealth and political preference are dependent.

3 We will reject H0 if the p-value ≤ α = 0.01.


Example – Political Party

Solution
We will first calculate the expected cell counts. For example, the expected
number of middle class individuals in the sample who support the NDP is

\[ E = \frac{(\text{row 3 total})(\text{column 2 total})}{\text{table total}} = \frac{(25)(43)}{94} = 11.44 \]

Other expected cell counts are calculated similarly and are shown in the
table on the following page.


Example – Political Party

Solution
                                Income Levels
Political Party   Low Income   Middle Class   Wealthy   Row Total
Conservative      2            14             20        36
                  (8.43)       (16.47)        (11.11)
Liberal           6            10             6         22
                  (5.15)       (10.06)        (6.79)
NDP               11           12             2         25
                  (5.85)       (11.44)        (7.71)
Green             3            7              1         11
                  (2.57)       (5.03)         (3.39)
Column Total      22           43             29        94


Example – Political Party

Note:
Two of the 12 cells (17% < 20%) have expected counts less than five, and
all expected cell counts are greater than one, so the use of the chi-square
approximation is justified.

Solution
We will now calculate the cell chi-square values. For example, the cell
chi-square value for wealthy individuals who support the Liberal Party is

\[ \frac{(O - E)^2}{E} = \frac{(6 - 6.79)^2}{6.79} = 0.09 \]
Other cell chi-square values are calculated similarly and are shown in the
table on the following page.


Example – Political Party


Solution
                                Income Levels
Political Party   Low Income   Middle Class   Wealthy   Row Total
Conservative      2            14             20        36
                  (8.43)       (16.47)        (11.11)
                  [4.90]       [0.37]         [7.11]
Liberal           6            10             6         22
                  (5.15)       (10.06)        (6.79)
                  [0.14]       [0.00]         [0.09]
NDP               11           12             2         25
                  (5.85)       (11.44)        (7.71)
                  [4.53]       [0.03]         [4.23]
Green             3            7              1         11
                  (2.57)       (5.03)         (3.39)
                  [0.07]       [0.77]         [1.68]
Column Total      22           43             29        94


Example – Political Party

Solution
4 The test statistic is:

\[
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
       = 4.90 + 0.37 + \dots + 1.68 = 23.92
\]

Under the null hypothesis, this test statistic follows a chi-square
distribution with (r − 1)(c − 1) = (3)(2) = 6 degrees of freedom.
5 The p-value is P(χ²(6) ≥ 23.92). We see from Table 5 that
P(χ²(6) ≥ 22.46) = 0.001 and P(χ²(6) ≥ 24.10) = 0.0005.
Since 22.46 < χ² = 23.92 < 24.10, our p-value is between 0.0005
and 0.001.
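
The whole test can be run in one call. The following sketch assumes R's built-in chisq.test(); the object names party and fit are illustrative. Because two expected counts are below five, R issues a warning about the approximation, which is suppressed here only to keep the output short.

```r
# Observed counts (filled column by column: low income, middle class, wealthy)
party <- matrix(c(2, 6, 11, 3,
                  14, 10, 12, 7,
                  20, 6, 2, 1), nrow = 4,
                dimnames = list(
                  Party = c("Conservative", "Liberal", "NDP", "Green"),
                  Income = c("Low Income", "Middle Class", "Wealthy")))

# suppressWarnings() hides the small-expected-count warning
fit <- suppressWarnings(chisq.test(party))

fit$statistic    # about 23.94 (the hand value 23.92 sums rounded cell values)
fit$parameter    # 6 degrees of freedom
fit$p.value      # falls between 0.0005 and 0.001
```

The small difference between the software value and the hand calculation comes only from rounding the expected counts and cell values to two decimals.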


Example – Political Party

Solution
Since the p-value < α = 0.01, we reject the null hypothesis.
6 We have sufficient evidence to conclude that wealth and political
preference are dependent.

Suppose we had instead conducted the test using the critical value
approach. The decision rule would be to reject H0 if

χ2 ≥ χ2∗ = 16.81

where χ2∗ = 16.81 is the upper 0.01 critical value from the chi-square
distribution with 6 degrees of freedom. We would still reject the null
hypothesis, since χ2 = 23.92 > χ2∗ = 16.81


Example – Political Party

We see that the four cells that contribute the most to the test statistic
are for low income and wealthy individuals who support the Conservative
Party and the NDP.
It appears that poorer individuals tend to support the NDP while wealthier
voters are more likely to support the Conservative Party.
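
One way to find the cells that drive the test statistic is to sort the cell chi-square values. The sketch below is illustrative only; the names cells, ord and top4 and the Direction column are assumptions introduced here, not part of the original analysis.

```r
party <- matrix(c(2, 6, 11, 3,
                  14, 10, 12, 7,
                  20, 6, 2, 1), nrow = 4,
                dimnames = list(c("Conservative", "Liberal", "NDP", "Green"),
                                c("Low Income", "Middle Class", "Wealthy")))

expected <- outer(rowSums(party), colSums(party)) / sum(party)
cells <- (party - expected)^2 / expected

# Positions of the four largest cell chi-square values
ord <- order(cells, decreasing = TRUE)[1:4]

top4 <- data.frame(
  Party = rownames(party)[row(cells)[ord]],
  Income = colnames(party)[col(cells)[ord]],
  CellChiSq = round(cells[ord], 2),
  # whether the observed count is above or below the expected count
  Direction = ifelse((party - expected)[ord] > 0,
                     "more than expected", "fewer than expected"))
top4
```

The Direction column shows which way each large cell departs from independence, which is what supports the interpretation above.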


Practice Question
In a survey, 400 people were classified with respect to stress level and the
occurrence of migraine headaches. The data are shown in the table below:
                  Stress Level
Migraine       Low     High    Total
Never          110     43      153
Occasional     114     80      194
Often          26      27      53
Total          250     150     400

We would like to test whether the two variables are independent at the 5%
level of significance. What is the critical value of the test?
A 4.61
B 5.99
C 7.81
D 9.49
E 11.07

Practice Question
In a survey, 400 people were classified with respect to stress level and the
occurrence of migraine headaches. The data are shown in the table below:
                  Stress Level
Migraine       Low     High    Total
Never          110     43      153
Occasional     114     80      194
Often          26      27      53
Total          250     150     400

What is the expected count of individuals with high stress who
occasionally suffer from migraine headaches?
A 74.25
B 70.75
C 72.75
D 78.50
E 76.25

Practice Question
In a survey, 400 people were classified with respect to stress level and the
occurrence of migraine headaches. The data are shown in the table below:
                  Stress Level
Migraine       Low         High        Total
Never          110         43          153
               (95.625)    (57.375)
Occasional     114         80          194
               (121.25)    (72.75)
Often          26          27          53
               (33.125)    (19.875)
Total          250         150         400

What is the cell chi-square value for low stress individuals who never
experience migraines?
A 1.97
B 2.04
C 2.16
D 1.78
E 1.89


Practice Questions

                  Stress Level
Migraine       Low         High        Total
Never          110         43          153
               (95.625)    (57.375)
               [2.1609]    [3.6016]
Occasional     114         80          194
               (121.25)    (72.75)
               [0.4335]    [0.7225]
Often          26          27          53
               (33.125)    (19.875)
               [1.5325]    [2.5542]
Total          250         150         400

What is the p-value for the chi-square test for independence?
A between 0.001 and 0.0025
B between 0.0025 and 0.005
C between 0.005 and 0.01
D between 0.01 and 0.02
E between 0.02 and 0.025


Chi-Square Goodness-of-fit Tests

Throughout most of this course, we have conducted tests of significance
concerning some population parameter while assuming the form of the
population distribution is known. For example, many of our methods have
required the assumption that the variable of interest follows a normal
distribution. The most we have been able to do is to look at a histogram of
the sample data to assess whether normality was a reasonable assumption.


Goodness-of-fit

What if we don’t know the distribution and we would like to determine if
it has some specific form?
Fortunately, we have the option of conducting a formal test of significance
to determine whether a random variable X follows some specific
distribution. These tests are known as chi-square goodness-of-fit tests.


Example – Western Canada Pick 3

The Western Canada Pick 3 is a lottery in which players choose a three-digit
number (between 000 and 999). Draws are held once per day, when a
lucky three-digit number is selected.
One player has suspicions about whether the draws are truly random. She
has noticed that some digits seem to come up more frequently than others.


Example – Western Canada Pick 3

We will conduct a hypothesis test to investigate the player’s suspicion.


Let X be a digit selected in a given Pick 3 draw. If the draw was random,
we would expect all digits 0 through 9 to be drawn with equal probability,
and so over a long period of time, we would expect an equal number of
each digit to have been drawn.


Example – Western Canada Pick 3

A random variable X is said to follow a discrete uniform distribution with
parameters a and b (a < b) if all integer values from a to b have equal
probability.
We are testing whether X has a discrete uniform distribution on the
interval from a = 0 to b = 9, which would mean the probability
distribution of X is

\[ P(X = x) = \frac{1}{10}, \quad \text{for } x = 0, 1, \dots, 9 \]

and we write X ∼ DU(0, 9).


Example – Western Canada Pick 3

The parameters of any discrete uniform distribution are the minimum and
maximum values of X (in our case, 0 and 9, respectively), which enable us
to calculate probabilities of occurrence for any value of X .
We will now conduct the chi-square goodness of fit test to investigate
whether the player’s claim has any merit.


Example – Western Canada Pick 3

Let α = 0.05.
We are testing the hypotheses

H0: All ten digits are equally likely to be drawn,
i.e., X follows a uniform distribution.
Ha: The ten digits are not equally likely to be drawn,
i.e., X does not follow a uniform distribution.

We will reject H0 if the p-value ≤ α = 0.05.


Example – Western Canada Pick 3

Data were tabulated for all 1000 Pick 3 draws from November 1, 2008 to
July 30, 2011. The number of times each digit was drawn is shown in the
contingency table below.

Digit 0 1 2 3 4 5 6 7 8 9
Freq. 314 285 302 307 294 277 284 321 318 298

Note that in 1000 draws, a total of 3(1000) = 3000 digits were drawn.


Example – Western Canada Pick 3

The first thing to do is to look at a histogram of the data to get an idea of
what we can expect from the test. The histogram is shown below:

[Figure: histogram of the digit frequencies; horizontal axis from 0 to 10, vertical axis from 0 to 300]

It certainly appears reasonable to assume that these data come from a
population that is uniformly distributed. We still need the test, however,
to verify this.


Example – Western Canada Pick 3

The expected frequency of each of these digits under the null hypothesis is

\[ E = (\text{total number of observations})\,P(\text{digit}) = 3000\,(0.1) = 300 \]

Note:
The use of the chi-square distribution for goodness-of-fit tests is
approximate, and it can safely be used when all expected cell counts are
at least five.

Clearly this condition is satisfied in this case.


Example – Western Canada Pick 3

We now calculate the cell chi-square values for each of the digits. For
example, the cell chi-square value for the digit 8 is

\[ \frac{(O - E)^2}{E} = \frac{(318 - 300)^2}{300} = 1.08 \]
Other cell chi-square values are calculated similarly and are displayed under
the observed and expected counts in the table on the next page.


Example – Western Canada Pick 3

Digit        0      1      2      3      4      5      6      7      8      9
Freq.        314    285    302    307    294    277    284    321    318    298
Expected     300    300    300    300    300    300    300    300    300    300
Cell χ²      0.65   0.75   0.01   0.16   0.12   1.76   0.85   1.47   1.08   0.01

We calculate the chi-square test statistic by adding all of the cell
chi-square values:

\[ \chi^2 = 0.65 + 0.75 + \dots + 0.01 = 6.86 \]
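
The same goodness-of-fit calculation can be sketched in R with chisq.test(), passing the hypothesized probabilities through its p argument. The names digits_obs and fit are illustrative.

```r
# Observed digit frequencies from the 1000 draws (3000 digits in total)
digits_obs <- c(314, 285, 302, 307, 294, 277, 284, 321, 318, 298)
names(digits_obs) <- 0:9

# H0: each of the ten digits has probability 1/10
fit <- chisq.test(digits_obs, p = rep(1 / 10, 10))

fit$expected     # 300 for every digit
fit$statistic    # 6.88 when the cell values are not rounded
fit$parameter    # 10 - 1 = 9 degrees of freedom
```

The software value differs slightly from the hand calculation because the cell chi-square values in the table are rounded to two decimals before being added. The conclusion of the test is not affected.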


Example – Western Canada Pick 3

Under the null hypothesis, this test statistic follows a chi-square
distribution with degrees of freedom equal to

(# of cells − 1) = 10 − 1 = 9

The degrees of freedom for goodness of fit tests are only (# of cells − 1) if
we know the values of all necessary parameters. In the above example, we
knew the maximum and minimum values, so all parameter values are
known. When we don’t know the value of a parameter, we must estimate
it and deduct one additional degree of freedom.


Example – Western Canada Pick 3

In general, the degrees of freedom in a chi-square goodness of fit test are

(# of cells − 1) − (# of estimated parameters)


Example – Western Canada Pick 3

We see from Table 5 that P(χ²(9) ≥ 11.39) = 0.25. Since χ² = 6.86 < 11.39,
our p-value is greater than 0.25.


Example – Western Canada Pick 3


Since the p-value > α = 0.05, we fail to reject the null hypothesis.
We have insufficient evidence to conclude that the selected numbers do
not follow a uniform distribution (i.e., it is reasonable to assume that the
distribution is uniform).
Suppose we had instead conducted the test using the critical value
approach. The decision rule would be to reject H0 if

χ² ≥ χ²* = 16.92

where χ²* = 16.92 is the upper 0.05 critical value from the chi-square
distribution with 9 degrees of freedom.
We would still fail to reject the null hypothesis since

χ² = 6.86 < χ²* = 16.92
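
Both the critical value and the p-value for this test can be checked in R. A brief sketch, with crit and p_value as illustrative names:

```r
# Upper 0.05 critical value of the chi-square distribution with 9 df
crit <- qchisq(0.05, df = 9, lower.tail = FALSE)
crit               # about 16.92

# Exact p-value for the observed statistic
p_value <- pchisq(6.86, df = 9, lower.tail = FALSE)
p_value > 0.25     # TRUE, in line with the table-based bound
```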


Example – Optometrist

An optometrist believes that the distribution of eye colours in some
population is as follows:

Eye Colour     Brown   Blue   Green   Other
Probability    0.5     0.1    0.2     0.2

In a sample of 150 people from the population, 66 have brown eyes, 23
have blue eyes, 35 have green eyes and 26 have another eye colour. We
would like to conduct a chi-square goodness-of-fit test at the 5% level of
significance to determine whether the optometrist’s proposed distribution
is correct.


Example – Optometrist

1 Let α = 0.05.
2 H0: The optometrist’s proposed distribution is correct vs. Ha: The
distribution differs from the one proposed.
3 Decision Rule: Reject H0 if the p-value ≤ 0.05.


Example – Optometrist

4 Test Statistic:
Eye Colour         Brown            Blue    Green   Other
Observed           66               23      35      26
Expected           150(0.5) = 75    15      30      30
Cell Chi-Square    1.08             4.27    0.833   0.533
Therefore χ² = 1.08 + 4.27 + 0.833 + 0.533 = 6.716
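
As a check on the hand calculation, the expected counts and cell values can be computed in R. This sketch assumes chisq.test() with the optometrist's proposed probabilities passed as p; the names eyes, probs and fit are illustrative.

```r
# Observed counts and the proposed distribution
eyes <- c(Brown = 66, Blue = 23, Green = 35, Other = 26)
probs <- c(0.5, 0.1, 0.2, 0.2)

fit <- chisq.test(eyes, p = probs)

fit$expected                               # 75, 15, 30, 30
(eyes - fit$expected)^2 / fit$expected     # cell chi-square values
fit$statistic                              # about 6.71
```

The unrounded statistic is very slightly smaller than the hand-computed value because the cell chi-square values in the table are rounded before being added.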


Example – Optometrist

5 p-value:
Following the row for (# of cells − 1) = 4 − 1 = 3 degrees of freedom in
the chi-square table, we see that our test statistic falls between 6.25 and
7.81, which corresponds to a p-value between 0.05 and 0.10. The exact
p-value is

pvalue <- pchisq(6.716, 3, lower.tail = FALSE)
pvalue

## [1] 0.08152236


Example – Optometrist

6 Conclusion:
Since our p-value is > 0.05, we fail to reject H0. There is insufficient
evidence to support the alternative, hence the optometrist’s
proposition is plausible.


Example – Optometrist

Suppose we used the critical value method. The decision rule would be:
Reject H0 if χ² ≥ χ²* = 7.81, the upper 0.05 critical value of the
chi-square distribution with 3 degrees of freedom. Since our test statistic
is less than 7.81, we reach the same conclusion.

qchisq(0.05,3,lower.tail = FALSE)

## [1] 7.814728


Practice Question
A website claims that the distribution of blood types for a certain group of
people is as follows:

Blood Type A B AB O
Probability 0.4 0.2 0.1 0.3

In a random sample of 200 people from this group, 70 had blood type A,
20 had blood type B, 30 had blood type AB and 80 had blood type O.
We would like to conduct a chi-square goodness of fit test at the 1% level
of significance to verify the website’s claim. What is the critical value of
the test?
A 7.82
B 9.49
C 13.28
D 11.35
E 6.63

Practice Question
Under the null hypothesis (that the distribution of blood types is the one
claimed by the website), the expected number of people in the sample
with blood type A is 200(0.4) = 80. Other expected counts are calculated
similarly and are shown below:

Blood Type A B AB O
Count 70 20 30 80
Expected 80 40 20 60

What is the value of the appropriate test statistic?


A 22.92
B 18.78
C 24.37
D 31.84
E 16.05