
Chi-square Analysis

The Chi-Square Test for Goodness-of-Fit


• The chi-square test for goodness-of-fit uses frequency data from a
sample to test hypotheses about the shape or proportions of a
population.
• Each individual in the sample is classified into one category on the
scale of measurement.
• The data, called observed frequencies, simply count how many
individuals from the sample are in each category.
The Chi-Square Test for Goodness-of-Fit
(cont.)
• The null hypothesis specifies the proportion of the population that
should be in each category.
• The proportions from the null hypothesis are used to compute
expected frequencies that describe how the sample would appear if
it were in perfect agreement with the null hypothesis.
The Chi-Square Test for Independence
• The second chi-square test, the chi-square test for
independence, can be used and interpreted in two
different ways:
1. Testing hypotheses about the
relationship between two variables in a
population, or
2. Testing hypotheses about differences between
proportions for two or more populations.
The Chi-Square Test for Independence
(cont.)
• Although the two versions of the test for
independence appear to be different, they are
equivalent and they are interchangeable.
• The first version of the test emphasizes the
relationship between chi-square and a correlation,
because both procedures examine the relationship
between two variables.
The Chi-Square Test for Independence
(cont.)
• The second version of the test emphasizes the
relationship between chi-square and an independent-
measures t test (or ANOVA) because both tests use
data from two (or more) samples to test hypotheses
about the difference between two (or more)
populations.
The Chi-Square Test for Independence
(cont.)
• The first version of the chi-square test for independence views the
data as one sample in which each individual is classified on two
different variables.
• The data are usually presented in a matrix with the categories for one
variable defining the rows and the categories of the second variable
defining the columns.
The Chi-Square Test for Independence
(cont.)
• The data, called observed frequencies, simply show
how many individuals from the sample are in each
cell of the matrix.
• The null hypothesis for this test states that there is no
relationship between the two variables; that is, the
two variables are independent.
The Chi-Square Test for Independence
(cont.)
• The second version of the test for independence
views the data as two (or more) separate samples
representing the different populations being
compared.
• The same variable is measured for each sample by
classifying individual subjects into categories of the
variable.
• The data are presented in a matrix with the different
samples defining the rows and the categories of the
variable defining the columns.
The Chi-Square Test for Independence
(cont.)
• The data, again called observed frequencies, show
how many individuals are in each cell of the matrix.
• The null hypothesis for this test states that the
proportions (the distribution across categories) are
the same for all of the populations.
The Chi-Square Test for Independence
(cont.)
• Both chi-square tests use the same statistic. The
calculation of the chi-square statistic requires two
steps:

1. The null hypothesis is used to construct an
idealized sample distribution of expected frequencies
that describes how the sample would look if the data
were in perfect agreement with the null hypothesis.
The Chi-Square Test for Independence
(cont.)
For the goodness-of-fit test, the expected frequency ($f_e$) for each
category is obtained by

$$f_e = pn$$

(p is the proportion from the null hypothesis and n is the size
of the sample)

For the test for independence, the expected frequency for each
cell in the matrix is obtained by

$$f_e = \frac{(\text{row total})(\text{column total})}{n}$$
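The two expected-frequency rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the small data sets are hypothetical and were chosen only so the arithmetic is easy to check by hand.

```python
def expected_goodness_of_fit(proportions, n):
    """Expected frequency for each category: fe = p * n."""
    return [p * n for p in proportions]


def expected_independence(observed):
    """Expected frequency for each cell: (row total)(column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical data, used only for illustration.
gof_expected = expected_goodness_of_fit([0.25, 0.25, 0.50], 40)
ind_expected = expected_independence([[10, 20], [30, 40]])

print(gof_expected)  # [10.0, 10.0, 20.0]
print(ind_expected)  # [[12.0, 18.0], [28.0, 42.0]]
```

Note that the expected frequencies need not be whole numbers, and that in both cases they add up to the sample size n.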
The Chi-Square Test for Independence
(cont.)
2. A chi-square statistic is computed to measure the amount of
discrepancy between the ideal sample (expected frequencies from
H0) and the actual sample data (the observed frequencies, $f_o$).

A large discrepancy results in a large value for chi-square and
indicates that the data do not fit the null hypothesis and the
hypothesis should be rejected.
The Chi-Square Test for Independence
(cont.)
The calculation of chi-square is the same for all
chi-square tests:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

The fact that chi-square tests do not require scores
from an interval or ratio scale makes these tests a
valuable alternative to the t tests, ANOVA, or
correlation, because they can be used with data
measured on a nominal or an ordinal scale.
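A minimal sketch of the chi-square formula in Python, assuming the observed and expected frequencies are supplied as two lists in the same category order. The function name and the sample counts are hypothetical and serve only to illustrate the calculation.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories (or cells)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts: three categories, n = 40.
observed = [12, 8, 20]
expected = [10.0, 10.0, 20.0]

# (12-10)^2/10 + (8-10)^2/10 + (20-20)^2/20 = 0.4 + 0.4 + 0 = 0.8
statistic = chi_square(observed, expected)
print(round(statistic, 4))
```

For a test for independence the same function can be applied after flattening the matrix of cells into a single list.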
Measuring Effect Size for the Chi-Square
Test for Independence
• When both variables in the chi-square test for
independence consist of exactly two categories (the
data form a 2x2 matrix), it is possible to re-code the
categories as 0 and 1 for each variable and then
compute a correlation known as a phi-coefficient that
measures the strength of the relationship.
Measuring Effect Size for the Chi-Square
Test for Independence (cont.)
• The value of the phi-coefficient, or the squared value,
which is equivalent to r², is used to measure the
effect size.
• When there are more than two categories for one (or
both) of the variables, then you can measure effect
size using a modified version of the phi-coefficient
known as Cramér's V.
• The value of V is evaluated much the same as a
correlation.
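The slides do not state formulas for these two effect sizes. The sketch below assumes the standard textbook definitions, phi = sqrt(χ²/n) and V = sqrt(χ²/(n · df*)), where df* is the smaller of (rows – 1) and (columns – 1). The function names and the numbers passed in are hypothetical.

```python
import math


def phi_coefficient(chi_sq, n):
    """Effect size for a 2x2 table (assumed definition: sqrt(chi^2 / n))."""
    return math.sqrt(chi_sq / n)


def cramers_v(chi_sq, n, n_rows, n_cols):
    """Effect size for larger tables (assumed definition: sqrt(chi^2 / (n * df*)))."""
    df_star = min(n_rows - 1, n_cols - 1)
    if df_star < 1:
        raise ValueError("the table needs at least two rows and two columns")
    return math.sqrt(chi_sq / (n * df_star))


# Hypothetical values: chi-square = 9.0 from a 2x2 table with n = 100.
print(round(phi_coefficient(9.0, 100), 2))    # about 0.30
# Hypothetical values: chi-square = 18.0 from a 3x4 table with n = 100.
print(round(cramers_v(18.0, 100, 3, 4), 2))   # about 0.30
```

Under these definitions V reduces to phi for a 2x2 table, because df* is then 1.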
Chi-Square Applications
Goodness-of-Fit Tests
• The Question:
• Does the distribution of sample data resemble a specified
probability distribution, such as:
• the binomial, hypergeometric, or Poisson discrete distributions.
• the uniform, normal, or exponential continuous distributions.
• a predefined probability distribution.
• Hypotheses:
• H0: $\pi_j$ = values expected    H1: $\pi_j \neq$ values expected
where $\sum_j \pi_j = 1$.
Goodness-of-Fit Tests
• Rejection Region:
• Degrees of Freedom = k – 1 – m
• where k = # of categories, m = # of parameters estimated from the sample data

• Uniform Discrete: m = 0 so df = k – 1
• Binomial: m = 0 when π is known, so df = k – 1
m = 1 when π is unknown, so df = k – 2
• Poisson: m = 1 since µ usually estimated, df = k – 2
• Normal: m = 2 when µ and σ estimated, df = k – 3
• Exponential: m = 1 since µ usually estimated, df = k – 2
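The degrees-of-freedom rule listed above is simple enough to express as a small helper. This is only a sketch; the function name is hypothetical, and the class counts in the examples are made up.

```python
def gof_degrees_of_freedom(k, m=0):
    """df = k - 1 - m, with k categories and m parameters estimated from the sample."""
    df = k - 1 - m
    if df < 1:
        raise ValueError("too few categories for the number of estimated parameters")
    return df


# Hypothetical examples.
print(gof_degrees_of_freedom(6))        # uniform discrete, 6 categories -> 5
print(gof_degrees_of_freedom(6, m=1))   # Poisson with estimated mean    -> 4
print(gof_degrees_of_freedom(6, m=2))   # normal, mean and sd estimated  -> 3
```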
Goodness-of-Fit Tests
• Test Statistic:

$$\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j}$$

where $O_j$ = actual number observed in each class
$E_j$ = expected number, $\pi_j \cdot n$
Goodness-of-Fit: An
Example
• Problem 13.20: In a study of vehicle ownership, it has been found
that 13.5% of U.S. households do not own a vehicle, with 33.7%
owning 1 vehicle, 33.5% owning 2 vehicles, and 19.3% owning 3 or
more vehicles. The data for a random sample of 100 households in
a resort community are summarized below. At the 0.05 level of
significance, can we conclude that the vehicle-ownership
distribution in this community differs from that of the
nation as a whole?
# Vehicles Owned    # Households
0                   20
1                   35
2                   23
3 or more           22
Goodness-of-Fit: An
Example
# Vehicles    Oj    Ej      (Oj – Ej)²/Ej
0             20    13.5    3.1296
1             35    33.7    0.0501
2             23    33.5    3.2910
3+            22    19.3    0.3777
                            Sum = 6.8484
I. H0: π0 = 0.135, π1 = 0.337, π2 = 0.335, π3+ = 0.193
Vehicle-ownership distribution in this community is the same as
it is in the nation as a whole.
H1: At least one of the proportions does not equal the stated
value. Vehicle-ownership distribution in this community is not
the same as it is in the nation as a whole.
Goodness-of-Fit: An
Example
II. Rejection Region:
α = 0.05
df = k – 1 – m = 4 – 1 – 0 = 3
Critical value: χ² = 7.815 (area 0.95 below, "Do Not Reject H0"; area 0.05 above, "Reject H0")
If χ² > 7.815, reject H0.
III. Test Statistic:
χ² = 6.8484

IV. Conclusion: Since the test statistic of χ² = 6.8484 falls below the
critical value of χ² = 7.815, we do not reject H0 at the 0.05 level of
significance.
V. Implications: There is not enough evidence to show that vehicle
ownership in this community differs from that in the nation as a
whole.
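The vehicle-ownership example can be reproduced with a short Python sketch. The proportions, counts, and critical value are taken from the slides; the variable names are hypothetical, and the critical value is hard-coded from the chi-square table rather than computed.

```python
# Null-hypothesis proportions and observed counts from Problem 13.20
# (categories: 0, 1, 2, and 3 or more vehicles).
null_proportions = [0.135, 0.337, 0.335, 0.193]
observed = [20, 35, 23, 22]
CRITICAL_VALUE = 7.815  # table value quoted in the slides for df = 3, alpha = 0.05

n = sum(observed)
expected = [p * n for p in null_proportions]
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(terms)
reject_h0 = chi_sq > CRITICAL_VALUE

for o, e, t in zip(observed, expected, terms):
    print(f"O = {o:3d}   E = {e:5.1f}   (O - E)^2 / E = {t:.4f}")
print(f"chi-square = {chi_sq:.4f}   reject H0: {reject_h0}")
```

The unrounded sum comes out near 6.8485; the 6.8484 shown in the slides results from adding the four terms after each was rounded to four decimals. Either way the statistic is below 7.815, so H0 is not rejected.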
Chi-Square Tests of Independence
Between Two Variables
• The Question:
• Are the two variables independent? If the two variables of
interest are independent, then
• the way elements are distributed across the various levels of one
variable does not affect how they are distributed across the levels of
the other.
• the probability of an element falling in any level of the second
variable is unaffected by knowing its level on the first dimension.
An Integrated Definition of Independence
• From basic probability:
If two events are independent,
P(A and B) = P(A) • P(B)

• In the Chi-Square Test of Independence:
If two variables are independent,
P(row i and column j) = P(row i) • P(column j)
Chi-Square Tests of
Independence
• Hypotheses:
• H0: The two variables are independent.
• H1: The two variables are not independent.
• Rejection Region:
• Degrees of freedom = (r – 1)(k – 1), where r = # of rows and k = # of columns
• Test Statistic:

$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Chi-Square Tests of
Independence
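• Calculating expected values

$$E_{ij} = P(\text{row}_i \text{ and column}_j)\, n = P(\text{row}_i)\, P(\text{column}_j)\, n$$

$$E_{ij} = \frac{\#\text{ elements in row } i}{n} \cdot \frac{\#\text{ elements in column } j}{n} \cdot n$$

Cancelling a factor of n from the numerator and the denominator,

$$E_{ij} = \frac{(\#\text{ elements in row } i)(\#\text{ elements in column } j)}{n}$$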
Chi-Square Tests of
Independence
An Example, Problem 13.35: Researchers in a California
community have asked a sample of 175 automobile owners
to select their favorite from three popular automotive
magazines. Of the 111 import owners in the sample, 54
selected Car and Driver, 25 selected Motor Trend, and 32
selected Road & Track. Of the 64 domestic-make owners in
the sample, 19 selected Car and Driver, 22 selected Motor
Trend, and 23 selected Road & Track. At the 0.05 level, is
import/domestic ownership independent of magazine
preference? Based on the chi-square table, what is the most
accurate statement that can be made about the p-value for
the test?
Chi-Square Tests of
Independence
• First, arrange the data in a table.
                  Car and      Motor       Road &
                  Driver (1)   Trend (2)   Track (3)   Totals
Import (Imp)      54           25          32          111
Domestic (Dom)    19           22          23          64
Totals            73           47          55          175
• Second, compute the expected values and
contributions to χ² for each of the six cells.
• Then proceed to the hypothesis test.
Chi-Square Tests of
Independence
                          Car and      Motor       Road &
                          Driver (1)   Trend (2)   Track (3)
Import (Imp):    O        54           25          32
                 E        46.3029      29.8114     34.8857
   χ² contribution        1.2795       0.7765      0.2387

Domestic (Dom):  O        19           22          23
                 E        26.6971      17.1886     20.1143
   χ² contribution        2.2192       1.3468      0.4140

Σ χ² contributions = 6.2747
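The expected values and cell contributions in this table can be checked with a short Python sketch. The observed counts come from Problem 13.35; the variable names are hypothetical.

```python
# Observed counts from Problem 13.35.
# Columns: Car and Driver, Motor Trend, Road & Track.
observed = [
    [54, 25, 32],  # import owners
    [19, 22, 23],  # domestic owners
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = (row total)(column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Contribution of each cell: (O - E)^2 / E
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_sq = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for exp_row, con_row in zip(expected, contributions):
    print([round(e, 4) for e in exp_row], [round(c, 4) for c in con_row])
print(f"chi-square = {chi_sq:.4f}, df = {df}")
```

The total may differ from 6.2747 in the last decimal place, since the slides add contributions that were already rounded.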


Chi-Square Tests of
Independence
• I. Hypotheses:
H0: Type of magazine and auto ownership are
independent.
H1: Type of magazine and auto ownership are not
independent.
• II. Rejection Region:
α = 0.05
df = (r – 1)(k – 1)
= (2 – 1) • (3 – 1)
= 1 • 2 = 2
Critical value: χ² = 5.991 (area 0.95 below, "Do Not Reject H0"; area 0.05 above, "Reject H0")
If χ² > 5.991, reject H0.
Chi-Square Tests of
Independence
• III. Test Statistic:
χ² = 6.2747
• IV. Conclusion:
Since the test statistic of 6.2747 falls beyond the critical value of
5.991, we reject the null hypothesis at the 0.05 level of significance.
• V. Implications:
There is enough evidence to show that magazine preference is not
independent of import/domestic auto ownership.
• p-value: In a cell on a Microsoft Excel spreadsheet, type:
=CHIDIST(6.2747,2). The answer is: p-value = 0.043398
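For readers without Excel, the same p-value can be approximated in Python. The sketch below relies on a special case: with exactly 2 degrees of freedom the upper-tail area of the chi-square distribution is exp(–χ²/2). It does not apply to other df values, for which a statistics library would be the usual choice. The function name is hypothetical.

```python
import math


def chi_square_p_value_df2(chi_sq):
    """Upper-tail probability for a chi-square statistic with df = 2 only."""
    if chi_sq < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-chi_sq / 2.0)


p_value = chi_square_p_value_df2(6.2747)
print(f"p-value = {p_value:.6f}")   # about 0.0434, in line with the CHIDIST result
print(p_value < 0.05)               # consistent with rejecting H0 at the 0.05 level
```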
Chi-Square Tests of Multiple
π's
• The Question:
• Are the multiple population proportions all equal to
each other?

• Hypotheses:
• H0: π1 = π2 = ... = πk
• H1: At least one of the population proportions differs
from the others.
Chi-Square Tests of
Multiple π's
• Rejection Region:
Degrees of freedom: df = (k – 1)

• Test Statistic:

$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Chi-Square Tests of Multiple π's
• Some applications:
• A Scenic America study of billboards found that 70% of the
billboards in a sample observed in Baltimore advertised
alcohol or tobacco products, compared to 50% in Detroit
and 54% in St. Louis.
• It has been reported that 4.9% of all U.S. households
burned wood as the main heating fuel in 1983, compared
to 4.6% in 1960 and 3.4% in 1980.
Chi-Square Tests of
Multiple π's
• Comparison of:
• The Chi-Square Goodness-of-Fit Test:
The proportions being tested sum to one and the
categories are exhaustive.
• The Chi-Square Test of Multiple Proportions:
The proportions being tested need not sum to one, since each
describes a separate population.
Chi-Square Test of a Single
Population Variance
• The Question:
• Does the value of the sample variance differ from the
value of the assumed population variance?

• Hypotheses:
• H0: σ² {=, ≤, ≥} a specific value.
• H1: σ² {≠, >, <} a specific value.
Chi-Square Test of a Single
Population Variance
• Rejection Region:
Degrees of freedom: df = n – 1
where n = sample size

• Test Statistic:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
