Chapter Four
Chi-square Distributions
What is the chi-square (χ²) distribution?
Characteristics of the chi-square distribution
1. It is a continuous distribution.
2. It has a single parameter: the degrees of freedom, ν.
3. The mean and variance of the chi-square distribution are ν and 2ν, respectively.
Thus, both the mean and the variance depend on the degrees of freedom.
4. It is based on a comparison of the observed sample data (results)
with the results expected under the assumption that the null
hypothesis is true.
6. It is a right-skewed distribution, and only non-negative
values of the variable χ² are possible.
❑ The skewness decreases as ν increases; as ν increases
without limit, the distribution approaches a normal distribution.
❑ It extends indefinitely in the positive direction.
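The mean and variance stated above can be checked with a small simulation. This is only a sketch: it relies on the standard fact (not stated in this section) that a chi-square variable with ν degrees of freedom is the sum of ν squared independent standard normal variables, and the function name, number of draws, and seed are illustrative choices.

```python
import random
import statistics

def simulate_chi_square(df, n_draws=20000, seed=0):
    """Draw chi-square variates as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]

# The sample mean should be close to df and the sample variance close to 2 * df.
for df in (2, 5, 20):
    draws = simulate_chi_square(df)
    print(df, round(statistics.mean(draws), 2), round(statistics.pvariance(draws), 2))
```

Every simulated value is non-negative, which matches the characteristic that only non-negative values of χ² are possible.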
Areas of application
Test For The Independence Between Two Variables
➢ The χ² test of independence is used to analyze the frequencies of two
variables with multiple categories to determine whether the two
variables are independent or related to each other when a single
sample is selected.
➢ The sample data are arranged in a two-way table called a contingency table.
➢ Because the χ² test of independence uses a contingency table, the test
is sometimes referred to as contingency analysis (contingency
table test).
➢ The χ² test of independence is used to analyze cases like:
▪ Whether employee absenteeism is independent of job classification
▪ Whether beer preference is independent of sex (gender)
▪ Whether favorite sport is independent of nationality.
▪ Whether type of financial investment is independent of geographic
region.
Where: Oij (fo) = observed frequency for the contingency table category in row i and column j.
Eij (fe) = expected frequency for the contingency table category in row i and column j.
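For reference, the standard form of the test statistic written with these symbols is shown here. The expected-frequency rule and the degrees of freedom are the ones used in the worked solutions of this chapter; R and C denote the number of rows and columns, and n is the total sample size.

```latex
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad
\nu = (R - 1)(C - 1)
```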
χ² = 21.174
4. Reject the null hypothesis that choice of TV program is independent of income
level.
Example 2
A human resource manager at EAGLE Inc. was interested in knowing whether the
voluntary absence behavior of the firm’s employees was independent of marital status.
The employee files contained data on marital status and voluntary absenteeism
behavior for a sample of 500 employees, as shown below.
                             Marital Status
Absence behavior    Married   Divorced   Widowed   Single   Total
Often absent             36         16        14       34     100
Seldom absent            64         34        20       82     200
Never absent             50         50        16       84     200
Total                   150        100        50      200     500
Test the hypothesis that absence behavior is independent of marital status at a
significance level of 1%.
Solution 2
1. Ho: Voluntary absence behavior is independent of marital status
Ha: Voluntary absence behavior and marital status are dependent
2. α = 0.01, ν = (R-1)(C-1) = (3-1)(4-1) = 6,
χ²(α, ν) = χ²(0.01, 6) = 16.81. Reject Ho if sample χ² > 16.81.
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
        36                   30               36         1.200
        64                   60               16         0.267
        50                   60              100         1.667
        16                   20               16         0.800
        34                   40               36         0.900
        50                   40              100         2.500
        14                   10               16         1.600
        20                   20                0         0.000
        16                   20               16         0.800
        34                   40               36         0.900
        82                   80                4         0.050
        84                   80               16         0.200
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 10.883$
4. Do not reject Ho, because 10.883 is less than 16.81. Voluntary absence and marital
status are independent.
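The expected frequencies and the sample χ² in Solution 2 can be reproduced with a short script. This is a minimal sketch using only the standard library; the function name is illustrative, and the critical value 16.81 is taken from step 2 rather than computed.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected table) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of each cell: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Rows: often, seldom, never absent. Columns: married, divorced, widowed, single.
absence = [
    [36, 16, 14, 34],
    [64, 34, 20, 82],
    [50, 50, 16, 84],
]
stat, df, expected = chi_square_independence(absence)

CRITICAL_0_01_DF6 = 16.81  # table value quoted in step 2
print(round(stat, 3), df, "reject Ho" if stat > CRITICAL_0_01_DF6 else "do not reject Ho")
```

The same function applies to any R × C table, since the expected frequencies depend only on the row and column totals.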
Example 3
The personnel administrator of XYZ Company provided the following data as an example of
selection among 40 male and 40 female applicants for 12 open positions.
                  Applicant Status
           Selected   Not selected   Total
Male           7           33          40
Female         5           35          40
Total         12           68          80
a. The χ² test of independence was suggested as a way of determining whether the decision to hire 7
males and 5 females should be interpreted as having a selection bias in favor of males.
Conduct the test of independence using α = 0.10. What is your conclusion?
b. Using the same test, would the decision to hire 8 males and 4 females suggest concern for a
selection bias?
c. How many males could be hired for the 12 open positions before the procedure would
raise concern for a selection bias?
Solution a
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the
applicant are independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the
applicant are not independent).
2. α = 0.1, ν = (R-1)(C-1) = (2-1)(2-1) = 1, χ²(0.1, 1) = 2.71. Reject Ho if sample χ² > 2.71.
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
         7                    6               1         0.1667
        33                   34               1         0.0294
         5                    6               1         0.1667
        35                   34               1         0.0294
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.3922$
4. Do not reject Ho, because 0.392 is less than 2.71. There is no selection bias in favor
of male applicants.
Solution b
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the
applicant are independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the
applicant are not independent).
2. α = 0.1, ν = (R-1)(C-1) = (2-1)(2-1) = 1, χ²(0.1, 1) = 2.71. Reject Ho if sample χ² > 2.71.
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
         8                    6               4         0.6667
        32                   34               4         0.1176
         4                    6               4         0.6667
        36                   34               4         0.1176
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.5686$
4. Do not reject Ho, because 1.569 is less than 2.71. There is no selection bias in favor
of male applicants.
Solution c
There is no shortcut method to answer this question. Therefore, let’s proceed by trial, increasing the
number of male applicants who are accepted and decreasing the number of female applicants accordingly.
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant
are independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are
not independent).
2. α = 0.1, ν = (R-1)(C-1) = (2-1)(2-1) = 1, χ²(0.1, 1) = 2.71. Reject Ho if sample χ² > 2.71.
3. Sample χ²
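The trial-and-error search described in Solution c can be sketched as a loop over the number of males hired. This is an illustration under the setup of Example 3 (40 male and 40 female applicants, 12 positions, critical value 2.71 from step 2); the function and variable names are hypothetical.

```python
def chi_square_2x2(males_selected, positions=12, males=40, females=40):
    """Sample chi-square for the selected / not selected table of Example 3."""
    females_selected = positions - males_selected
    observed = [
        [males_selected, males - males_selected],
        [females_selected, females - females_selected],
    ]
    n = males + females
    row_totals = [males, females]
    col_totals = [positions, n - positions]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_0_10_DF1 = 2.71  # table value quoted in step 2

# Try 6 males (an even split) up to 12 males (every position).
results = {m: chi_square_2x2(m) for m in range(6, 13)}
for m, stat in results.items():
    print(m, round(stat, 4), "reject Ho" if stat > CRITICAL_0_10_DF1 else "do not reject Ho")

# Smallest number of males hired for which the test rejects independence.
first_flagged = min(m for m, stat in results.items() if stat > CRITICAL_0_10_DF1)
```

With 7 and 8 males the loop reproduces the statistics of Solutions a and b (0.3922 and 1.5686). Under these assumptions the statistic first exceeds 2.71 at 9 males, so up to 8 males could be hired before the test raises concern.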
Solution
1. Ho: P1 = 9/16; P2 =3/16; P3 = 3/16; P4 = 1/16
Ha: One or more of the proportions are not equal to the proportions given in the Ho.
2. α = 0.05, ν = K-1 = 4-1 = 3, χ²(0.05, 3) = 7.81. Reject Ho if sample χ² > 7.81.
3. Test statistic (sample χ²), n = 800
Class              Observed freq (fo)   Expected freq (fe = npi)   (fo-fe)²   (fo-fe)²/fe
Current                   439                     450                 121        0.269
Moderately late           168                     150                 324        2.160
Very late                 133                     150                 289        1.927
Uncollectible              60                      50                 100        2.000
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 6.356$
4. Do not reject Ho, because 6.356 is less than 7.81.
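The goodness-of-fit calculation above follows one pattern: expected frequency = n × hypothesized proportion, then sum (fo - fe)²/fe over the classes. A minimal sketch of that pattern, with an illustrative function name and the critical value 7.81 taken from step 2:

```python
def chi_square_gof(observed, probabilities):
    """Return (statistic, expected frequencies) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Classes: current, moderately late, very late, uncollectible.
observed = [439, 168, 133, 60]
hypothesized = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
stat, expected = chi_square_gof(observed, hypothesized)

CRITICAL_0_05_DF3 = 7.81  # table value quoted in step 2
print(expected, round(stat, 3), stat > CRITICAL_0_05_DF3)
```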
Solution
1. Ho: People have no color preference with this product; P1 = P2 = P3 = 1/3
Ha: People have color preference with this product
2. α = 0.05, ν = K-1 = 3-1 = 2, χ²(0.05, 2) = 5.99. Reject Ho if sample χ² is greater than 5.99.
3. Sample χ²
Class     Observed freq (fo)   Expected freq (fe = npi; pi = 1/3)   (fo-fe)²   (fo-fe)²/fe
Red              60                          40                       400        10.00
Blue             20                          40                       400        10.00
Yellow           40                          40                         0         0.00
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 20.00$
4. Reject Ho, because 20 > 5.99. This means that customers do have a color preference. It
appears that red is the most popular color and blue is the least popular.
Example 3
Rating Sciences, Inc., a TV program rating service, surveyed 600 families where the television
was turned on during prime time on weeknights. They found the following numbers of
viewers tuned to the various networks.
Name of the network   Type            Number of viewers
NBC                   Commercial             210
CBS                   Commercial             170
ABC                   Commercial             165
PBS                   Noncommercial           55
Total                                        600
a) Test the hypothesis that all four networks have the same proportion of viewers during this
prime time period. Use α = 0.05.
b) Eliminate the results for PBS and repeat the test of hypothesis for the three commercial
networks, using α = 0.05.
c) Test the hypothesis that each of the three major networks has 30% of the weeknight prime
time market and PBS has 10%, using α = 0.005.
Solution a
1. Ho: All four networks have the same proportion of viewers; P1 = P2 = P3 = P4 = 1/4.
Ha: The four networks do not all have the same proportion of viewers.
2. α = 0.05, ν = K-1 = 4-1 = 3
χ²(α, ν) = χ²(0.05, 3) = 7.81. Reject Ho if sample χ² is greater than 7.81.
3. Sample χ²
Class   Observed freq (fo)   Expected freq (fe = npi; pi = 1/4)   (fo-fe)²   (fo-fe)²/fe
NBC            210                         150                      3,600      24.0000
CBS            170                         150                        400       2.6667
ABC            165                         150                        225       1.5000
PBS             55                         150                      9,025      60.1667
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 88.3334$
4. Reject Ho, because 88.3334 is greater than 7.81.
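The three parts of Example 3 differ only in which classes are kept and which proportions are hypothesized, so they can be explored with one helper. This is a sketch, not the deck's own solution for parts b and c: the critical values 7.81 and 5.99 appear elsewhere in this chapter, while 12.84 (α = 0.005, ν = 3) is an assumption taken from a standard chi-square table.

```python
def chi_square_gof(observed, probabilities):
    """Sample chi-square for observed counts against hypothesized proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))

viewers = {"NBC": 210, "CBS": 170, "ABC": 165, "PBS": 55}

# a) Four networks, equal proportions.
part_a = chi_square_gof(list(viewers.values()), [0.25] * 4)

# b) Commercial networks only, equal proportions (n drops to 545).
commercial = [viewers[name] for name in ("NBC", "CBS", "ABC")]
part_b = chi_square_gof(commercial, [1 / 3] * 3)

# c) 30% for each commercial network and 10% for PBS.
part_c = chi_square_gof(list(viewers.values()), [0.30, 0.30, 0.30, 0.10])

# (statistic, critical value) pairs; 12.84 is an assumed standard-table value.
for label, stat, critical in (("a", part_a, 7.81), ("b", part_b, 5.99), ("c", part_c, 12.84)):
    print(label, round(stat, 4), "reject Ho" if stat > critical else "do not reject Ho")
```

Dropping a class changes both n and the degrees of freedom, which is why part b is compared against the ν = 2 critical value.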
Frequency 10 41 60 20 6 3
Solution (total sample size = 140)
1. Ho: The frequency distribution is binomial with n = 5 and P = 0.4
Ha: The frequency distribution is not binomial with n = 5 and P = 0.4
2. α = 0.05, ν = K-1-m = 5-1-0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49. Reject Ho if sample χ² is greater than 9.49.
3. Sample χ²
No. of sales   Prob. with        Observed    Expected freq    (fo-fe)²   (fo-fe)²/fe
per day        n = 5, p = 0.4    freq (fo)   (fe = 140 pi)
0                 .0778             10          10.892          0.7957      0.0731
1                 .2592             41          36.288         22.2029      0.6119
2                 .3456             60          48.384        134.9315      2.7888
3                 .2304             20          32.256        150.2095      4.6567
4 & 5             .0870              9          12.18          10.1124      0.8302
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 8.9607$
4. Do not reject Ho, because 8.9607 is less than 9.49. The data are well described by the
binomial distribution with n = 5 and P = 0.4.
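The binomial probabilities and the pooled last class in the table above can be computed directly. This sketch uses unrounded probabilities, so the statistic differs from the table's 8.9607 in the third decimal place; the function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def pooled_binomial_fit(observed, n_trials, p):
    """observed[k] is the frequency of k successes; the last class pools all larger counts."""
    total = sum(observed)
    probs = [binomial_pmf(k, n_trials, p) for k in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))  # pooled tail class, here 4 and 5 sales
    expected = [total * q for q in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, probs, expected

# Frequencies for 0, 1, 2, 3 sales per day, with 4 and 5 pooled (6 + 3 = 9).
sales = [10, 41, 60, 20, 9]
stat, probs, expected = pooled_binomial_fit(sales, n_trials=5, p=0.4)

print([round(q, 4) for q in probs])
print(round(stat, 3))
```

Pooling the 4 and 5 classes keeps the smallest expected frequency from becoming very small, which is the usual reason for combining sparse classes before computing χ².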
Example (Binomial) 2
A professional baseball player, Philippos, was at bat five times in each
of 100 games. Philippos claims that he has a probability of 0.4 of
getting a hit each time he goes to bat. Test his claim at the 0.05 level by
seeing if the following data are distributed binomially.
Solution
1. Ho: The frequency distribution is described by a binomial distribution with n = 5, P = 0.4
Ha: The frequency distribution is not described by a binomial distribution with n = 5, P = 0.4
2. α = 0.05, ν = K-1-m = 5-1-0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49. Reject Ho if sample χ² > 9.49.
3. Sample χ²
No. of hits   No. of games with      Prob. with        Expected freq    (fo-fe)²   (fo-fe)²/fe
per game      that no. of hits (fo)  n = 5, P = 0.4    (fe = 100 pi)
0                    12                 .0778              7.78          17.8084      2.2890
1                    38                 .2592             25.92         145.9264      5.6299
2                    27                 .3456             34.56          57.1536      1.6538
3                    17                 .2304             23.04          36.4816      1.5834
4 & 5                 6                 .0870              8.70           7.2900      0.8379
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 11.9940$
4. Reject Ho, because 11.994 is greater than 9.49. The number of hits per game is not
binomially distributed with n = 5 and P = 0.4.
Example (Binomial) 3
The Ethiopian postal service is interested in modeling the “mangled letter” problem.
It has been suggested that any letter sent to a certain area has a 0.15 chance of being
mangled. Since the post office is so big, it can be assumed that the chances of two letters
being mangled are independent. A sample of 310 people was selected, and two test
letters were mailed to each of them. The number of people receiving zero, one, or two
mangled letters was 260, 40, and 10, respectively. At the 0.10 level of significance, is
it reasonable to conclude that the number of mangled letters received by people
follows a binomial distribution with P = 0.15?
Solution
1. Ho: The number of mangled letters received by people follows a binomial distribution with
n = 2, P = 0.15.
Ha: The number of mangled letters received by people does not follow a binomial distribution
with n = 2, P = 0.15.
2. α = 0.1, ν = K-1-m = 3-1-0 = 2
χ²(α, ν) = χ²(0.1, 2) = 4.61. Reject Ho if sample χ² > 4.61.
3. Sample χ²
No. of mangled   Observed     Prob. with         Expected freq    (fo-fe)²    (fo-fe)²/fe
letters          freq. (fo)   n = 2, P = 0.15    (fe = 310 pi)
0                   260          0.7225            223.9750       1297.8006      5.7944
1                    40          0.2550             79.0500       1524.9025     19.2904
2                    10          0.0225              6.9750          9.1506      1.3119
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 26.3967$
4. Reject Ho, because 26.3967 is greater than 4.61. The number of mangled letters received
is not binomially distributed with n = 2 and P = 0.15.
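The probabilities in the table above follow from the binomial formula with n = 2 and P = 0.15. The short derivation below is added as a worked check; the symbol k for the number of mangled letters is introduced here for illustration.

```latex
P(X = k) = \binom{2}{k} (0.15)^{k} (0.85)^{2-k}, \qquad k = 0, 1, 2

P(X = 0) = (0.85)^2 = 0.7225, \quad
P(X = 1) = 2(0.15)(0.85) = 0.2550, \quad
P(X = 2) = (0.15)^2 = 0.0225

f_e = 310 \times P(X = k): \quad 223.975, \quad 79.05, \quad 6.975
```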
Example (Poisson) 1
It is hypothesized that the number of breakdowns per month of a computer
system at a major university follows a Poisson distribution with μ = 2. The
data below show the observed number of breakdowns per month during a
sample of 100 months. Use a 5% level of significance and test the null
hypothesis.
Observed freq. 7 18 25 17 12 5
Solution
Before solving the question, we first have to estimate the arrival rate per minute from the
sample; hence one degree of freedom is lost.
$\lambda = \frac{\sum (\text{number of arrivals} \times \text{observed frequency})}{\sum \text{observed frequency}} = \frac{0(7) + 1(18) + 2(25) + 3(17) + 4(12) + 5(5)}{84} = \frac{192}{84} \approx 2.3$ customers/min
1. Ho: The arrival of customers at a bank is Poisson distributed with λ = 2.3
Ha: The arrival of customers at a bank is not Poisson distributed with λ = 2.3
2. α = 0.05, ν = K-1-m = 6-1-1 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.488. Reject Ho if sample χ² > 9.488.
3. Sample χ²
Number of    Observed     Prob. with   Expected freq   (fo-fe)²   (fo-fe)²/fe
arrivals     freq. (fo)   λ = 2.3      (fe = 84 pi)
0                7         0.1003         8.4252        2.0312      0.2411
1               18         0.2306        19.3704        1.8778      0.0969
2               25         0.2652        22.2768        7.4158      0.3329
3               17         0.2033        17.0772        0.0060      0.0003
4               12         0.1169         9.8196        4.7541      0.4841
5 or more        5         0.0837         7.0308        4.1241      0.5866
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.742$
4. Do not reject Ho, because 1.742 is less than 9.488.
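The Poisson fit above, including the estimate of λ from the sample, can be sketched as follows. As in the solution, the "5 or more" class is counted as exactly 5 when estimating λ, and λ is rounded to 2.3; the probabilities are left unrounded, so the result agrees with the table only to about two decimal places. Names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Observed frequencies for 0, 1, 2, 3, 4 and "5 or more" arrivals.
observed = [7, 18, 25, 17, 12, 5]
n = sum(observed)

# Estimate lambda from the sample, rounded to one decimal as in the solution.
lam = round(sum(k * f for k, f in enumerate(observed)) / n, 1)

probs = [poisson_pmf(k, lam) for k in range(len(observed) - 1)]
probs.append(1.0 - sum(probs))  # open-ended last class: 5 or more
expected = [n * q for q in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One parameter (lambda) was estimated from the data, so m = 1.
df = len(observed) - 1 - 1
print(n, lam, df, round(stat, 3))
```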
Solution
1. Ho: The attitude scores are normally distributed with μ = 24.9 and σ = 7.194
Ha: The attitude scores are not normally distributed with μ = 24.9 and σ = 7.194
2. α = 0.05, ν = K-1-m = 6-1-0 = 5
χ²(α, ν) = χ²(0.05, 5) = 11.07. Reject Ho if sample χ² > 11.07.
3. Sample χ²
• With $Z = \frac{X - \mu}{\sigma}$, the expected probability of each category can be obtained as follows:
4. Do not reject Ho. The attitude scores are normally distributed with mean 24.9 and
standard deviation 7.194.
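The Z-transformation in step 3 turns each class boundary into a standard normal value, and the expected probability of a class is the difference between the cumulative probabilities at its two boundaries. The sketch below illustrates this with `statistics.NormalDist`; the class boundaries are hypothetical, chosen only to produce six classes (K = 6), because the original class limits are not available in this section.

```python
from statistics import NormalDist

def normal_class_probabilities(boundaries, mu, sigma):
    """Probabilities of the classes (-inf, b1], (b1, b2], ..., (bk, +inf)."""
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

MU, SIGMA = 24.9, 7.194          # values stated in Ho
boundaries = [10, 20, 25, 30, 40]  # hypothetical class limits, not from the source

# Z value of each boundary: Z = (X - mu) / sigma.
z_values = [(b - MU) / SIGMA for b in boundaries]
probs = normal_class_probabilities(boundaries, MU, SIGMA)

print([round(z, 3) for z in z_values])
print([round(p, 4) for p in probs])
```

Multiplying each probability by the sample size gives the expected frequency fe for that class, after which χ² is computed exactly as in the earlier examples.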