Chi Square Test For Normal Distribution - A Case
12.5:
ARRIVALS    FREQUENCY
0           14
1           31
2           47
3           41
4           29
5           21
6           10
7            5
8            2
Total      200
To determine whether the number of arrivals per minute follows a Poisson distribution, the null and alternative hypotheses are as follows:

$H_0$: The number of arrivals per minute follows a Poisson distribution
$H_1$: The number of arrivals per minute does not follow a Poisson distribution

Since the Poisson distribution has one parameter, its mean $\lambda$, either a specified value can be included as part of the null and alternative hypotheses, or the parameter can be estimated from the sample data. In this example, to estimate the average number of arrivals, you need to refer back to Equation (3.15) on page 111. Using Equation (3.15) and the computations in Table 12.13,
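$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{580}{200} = 2.90$$

where $m_j$ is the number of arrivals in class $j$, $f_j$ is the frequency of that class, $c$ is the number of classes, and $n$ is the sample size.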
TABLE 12.13 Computation of the sample average number of arrivals from the frequency distribution of arrivals per minute

ARRIVALS (mj)    FREQUENCY (fj)    mj fj
0                14                  0
1                31                 31
2                47                 94
3                41                123
4                29                116
5                21                105
6                10                 60
7                 5                 35
8                 2                 16
Total           200                580
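The weighted-mean computation behind Table 12.13 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and do not come from the text.

```python
# Arrivals per minute (m_j) and their frequencies (f_j), as listed in Table 12.13.
arrivals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
frequencies = [14, 31, 47, 41, 29, 21, 10, 5, 2]

n = sum(frequencies)  # sample size
weighted_sum = sum(m * f for m, f in zip(arrivals, frequencies))  # sum of m_j * f_j
sample_mean = weighted_sum / n  # estimate of the Poisson mean

print(n, weighted_sum, sample_mean)
```

The sample mean obtained this way is the value used as the estimate of the Poisson mean in the rest of the example.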
This value of the sample mean is used as the estimate of $\lambda$ for the purposes of finding the probabilities from the tables of the Poisson distribution (Table E.7). From Table E.7, for $\lambda = 2.9$, the probability of X successes (X = 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 or more) can be determined. The theoretical frequency for each category is obtained by multiplying the appropriate Poisson probability by the sample size n. These results are summarized in Table 12.14.
TABLE 12.14 Actual and theoretical frequencies of the arrivals per minute

ARRIVALS     ACTUAL FREQUENCY fo    PROBABILITY P(X)    THEORETICAL FREQUENCY fe = n P(X)
0            14                     0.0550              11.00
1            31                     0.1596              31.92
2            47                     0.2314              46.28
3            41                     0.2237              44.74
4            29                     0.1622              32.44
5            21                     0.0940              18.80
6            10                     0.0455               9.10
7             5                     0.0188               3.76
8             2                     0.0068               1.36
9 or more     0                     0.0030               0.60
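The probabilities in Table 12.14 come from a printed Poisson table, but they can also be computed directly from the Poisson formula. The sketch below uses only the standard library; `poisson_pmf` is a hypothetical helper name, and because the printed table is rounded to four decimals, the computed expected frequencies can differ from the tabled ones in the second decimal place.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.9  # sample mean used as the estimate of the Poisson mean
n = 200    # sample size

# Probabilities for X = 0..8, then "9 or more" as the remaining probability.
probs = [poisson_pmf(x, lam) for x in range(9)]
probs.append(1.0 - sum(probs))

# Theoretical frequency fe = n * P(X) for each category.
expected = [n * p for p in probs]

for label, p, fe in zip(list(range(9)) + ["9 or more"], probs, expected):
    print(label, round(p, 4), round(fe, 2))
```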
Observe from Table 12.14 that the theoretical frequency of 9 or more arrivals is less than 1.0. In order to have all categories contain a frequency of 1.0 or greater, the category 9 or more is combined with the category of 8 arrivals. The chi-square test for determining whether the data follow a specific probability distribution is computed using Equation (12.8).
$$\chi^2_{k-p-1} = \sum_{k} \frac{(f_o - f_e)^2}{f_e} \qquad (12.8)$$

where
$f_o$ = observed frequency
$f_e$ = theoretical or expected frequency
$k$ = number of categories or classes remaining after combining classes
$p$ = number of parameters estimated from the data
Returning to the example concerning the arrivals at the bank, nine categories remain (0, 1, 2, 3, 4, 5, 6, 7, 8 or more). Since the mean of the Poisson distribution has been estimated from the data, the number of degrees of freedom is $k - p - 1 = 9 - 1 - 1 = 7$.
Using the 0.05 level of significance, from Table E.4, the critical value of $\chi^2$ with 7 degrees of freedom is 14.067. The decision rule is: Reject $H_0$ if $\chi^2 > 14.067$; otherwise do not reject $H_0$. From Table 12.15, since $\chi^2 = 2.28954 < 14.067$, the decision is not to reject $H_0$. There is insufficient evidence to conclude that the arrivals per minute do not fit a Poisson distribution.
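TABLE 12.15 Computation of the chi-square test statistic for the arrivals per minute

ARRIVALS     fo     fe       (fo - fe)    (fo - fe)^2    (fo - fe)^2/fe
0            14     11.00     3.00         9.0000        0.81818
1            31     31.92    -0.92         0.8464        0.02652
2            47     46.28     0.72         0.5184        0.01120
3            41     44.74    -3.74        13.9876        0.31264
4            29     32.44    -3.44        11.8336        0.36478
5            21     18.80     2.20         4.8400        0.25745
6            10      9.10     0.90         0.8100        0.08901
7             5      3.76     1.24         1.5376        0.40894
8 or more     2      1.96     0.04         0.0016        0.00082
Total       200    200.00                                2.28954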
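The steps from Table 12.14 to the test decision (combine the sparse last category, apply Equation (12.8), compare with the critical value) can be sketched as follows. The helper `combine_tail` is a hypothetical name introduced for illustration, and the critical value is simply copied from the text rather than computed.

```python
# Observed and theoretical frequencies for 0, 1, ..., 8, "9 or more" (Table 12.14).
observed = [14, 31, 47, 41, 29, 21, 10, 5, 2, 0]
expected = [11.00, 31.92, 46.28, 44.74, 32.44, 18.80, 9.10, 3.76, 1.36, 0.60]

def combine_tail(values, k):
    """Keep the first k - 1 categories and merge all remaining ones into a k-th."""
    return values[:k - 1] + [sum(values[k - 1:])]

# "9 or more" has fe < 1.0, so it is merged with the category of 8 arrivals.
obs = combine_tail(observed, 9)
exp = combine_tail(expected, 9)

# Equation (12.8): sum of (fo - fe)^2 / fe over the remaining categories.
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(obs, exp))

p = 1                      # one parameter (the Poisson mean) estimated from the data
df = len(obs) - p - 1      # k - p - 1
critical_value = 14.067    # chi-square, 7 df, 0.05 level, as quoted from Table E.4
reject_h0 = chi_square > critical_value

print(round(chi_square, 5), df, reject_h0)
```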
From Table E.2, the area below Z = -4.22 is approximately 0.0000. To compute the area between -10.0 and -5.0, the area below -5.0 is computed as follows:

$$Z = \frac{-5.0 - 10.149}{4.773} = -3.17$$
From Table E.2, the area below Z = -3.17 is approximately 0.00076. Thus, the area between -10.0 and -5.0 is the difference between the area below -5.0 and the area below -10.0, which is 0.00076 - 0.0000 = 0.00076. Continuing, to compute the area between -5.0 and 0.0, the area below 0.0 is computed as follows:

$$Z = \frac{0.0 - 10.149}{4.773} = -2.13$$
From Table E.2, the area below Z = -2.13 is approximately 0.0166. Thus the area between -5.0 and 0.0 is the difference between the area below 0.0 and the area below -5.0, which is 0.0166 - 0.00076 = 0.01584. In a similar manner, the area in each class interval can be computed. The complete set of computations needed to find the area and expected frequency in each class is summarized in Table 12.16.
TABLE 12.16 Computation of the area and expected frequencies in each class interval for the 5-year annualized returns

CLASSES             AREA BELOW    AREA IN CLASS    fe = n P(X)
Below -10.0         0.00000       0.00000           0.00000
-10.0 but < -5.0    0.00076       0.00076           0.12008
-5.0 but < 0.0      0.01660       0.01584           2.50272
0.0 but < 5.0       0.14010       0.12350          19.51300
5.0 but < 10.0      0.48800       0.34790          54.96820
10.0 but < 15.0     0.84610       0.35810          56.57980
15.0 but < 20.0     0.98030       0.13420          21.20360
20.0 but < 25.0     0.99906       0.01876           2.96408
25.0 but < 30.0     1.00000       0.00094           0.14852
30.0 or more        1.00000       0.00000           0.00000
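The class areas in Table 12.16 can be approximated without a printed normal table. The sketch below assumes the sample mean 10.149 and standard deviation 4.773 that appear in the Z computations above, and a sample size of 158, which is inferred from the table (each expected frequency divided by its class area) rather than stated in the text. Z values are rounded to two decimals to mimic a table lookup, so results agree with the table only approximately.

```python
import math

def normal_cdf(z):
    """Area below z under the standardized normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, std = 10.149, 4.773  # taken from the Z computations in the text
n = 158                    # assumption: inferred from Table 12.16, not stated

# Upper boundaries of all classes except the open-ended last one.
boundaries = [-10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]

# Z for each boundary, rounded to two decimals as in a printed table.
z_scores = [round((x - mean) / std, 2) for x in boundaries]

# Area below each boundary; the last class ("30.0 or more") ends at area 1.0.
area_below = [normal_cdf(z) for z in z_scores] + [1.0]

# Area in a class = area below its upper boundary minus area below its lower one.
area_in_class = [area_below[0]] + [b - a for a, b in zip(area_below, area_below[1:])]

# Expected frequency in each class.
expected = [n * a for a in area_in_class]

for z, a, fe in zip(z_scores + [None], area_in_class, expected):
    print(z, round(a, 5), round(fe, 4))
```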
Observe from Table 12.16 that the theoretical frequencies of the classes below -10.0, between -10.0 and -5.0, between 25.0 and 30.0, and 30.0 or more are all less than 1.0. In order to have all categories contain a frequency of 1.0 or greater, the categories below -10.0 and between -10.0 and -5.0 are combined with the category -5.0 to 0.0, and the categories between 25.0 and 30.0 and 30.0 or more are combined with the category 20.0 to 25.0. The chi-square test for determining whether the data follow a specific probability distribution is computed using Equation (12.8) on page CD12-2. In this example, after combining classes, 6 classes remain. Since the population mean and standard deviation have been estimated from the sample data, the number of degrees of freedom is equal to $k - p - 1 = 6 - 2 - 1 = 3$. Using a level of significance of 0.05, the critical value of chi-square with 3 degrees of freedom is 7.815. Table 12.17 summarizes the computations for the chi-square test.
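TABLE 12.17 Computation of the chi-square test statistic for the 5-year annualized returns

CLASSES            fo     fe          (fo - fe)    (fo - fe)^2    (fo - fe)^2/fe
Below 0.0           4      2.62280     1.37720      1.89668       0.72315
0.0 but < 5.0      14     19.51300    -5.51300     30.39317       1.55759
5.0 but < 10.0     58     54.96820     3.03180      9.19181       0.16722
10.0 but < 15.0    61     56.57980     4.42020     19.53817       0.34532
15.0 but < 20.0    17     21.20360    -4.20360     17.67025       0.83336
20.0 and above      4      3.11260     0.88740      0.78748       0.25300
Total             158    158.00000                                3.87963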
From Table 12.17, since $\chi^2 = 3.87963 < 7.815$, the decision is not to reject $H_0$. Thus there is insufficient evidence to conclude that the 5-year annualized return does not fit a normal distribution.
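The pooling rule and the resulting test for the normal case can be sketched in the same way. The function `pool_small_tails` is a hypothetical helper that merges end classes inward until each end class has an expected frequency of at least 1.0; the observed frequencies are the already-combined values from Table 12.17, and the critical value is copied from the text.

```python
def pool_small_tails(expected, minimum=1.0):
    """Merge end classes inward until both end classes reach the minimum."""
    pooled = list(expected)
    while len(pooled) > 1 and pooled[0] < minimum:
        pooled[1] += pooled[0]    # fold the first class into its neighbor
        del pooled[0]
    while len(pooled) > 1 and pooled[-1] < minimum:
        pooled[-2] += pooled[-1]  # fold the last class into its neighbor
        del pooled[-1]
    return pooled

# Expected frequencies for the ten original classes (Table 12.16).
expected_all = [0.00000, 0.12008, 2.50272, 19.51300, 54.96820,
                56.57980, 21.20360, 2.96408, 0.14852, 0.00000]
expected = pool_small_tails(expected_all)

# Observed frequencies for the six combined classes (Table 12.17).
observed = [4, 14, 58, 61, 17, 4]

# Equation (12.8) applied to the combined classes.
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

p = 2                        # mean and standard deviation estimated from the data
df = len(expected) - p - 1   # k - p - 1
critical_value = 7.815       # chi-square, 3 df, 0.05 level, as quoted in the text
reject_h0 = chi_square > critical_value

print(len(expected), round(chi_square, 5), df, reject_h0)
```

Pooling only from the two ends matches this example, where the sparse classes sit in the tails; a distribution with a sparse interior class would need a different merging rule.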
Does the distribution of service interruptions follow a Poisson distribution? (Use the 0.01 level of significance.)

12.50 Referring to the data in problem 12.49, at the 0.01 level of significance, does the distribution of service interruptions follow a Poisson distribution with a population mean of 1.5 interruptions per day?

12.51 The manager of a commercial mortgage department of a large bank has collected data during the past two years concerning the number of commercial mortgages approved per week. The results from these two years (104 weeks) indicated the following:

NUMBER OF COMMERCIAL MORTGAGES APPROVED    FREQUENCY
0                                          13
1                                          25
2                                          32
3                                          17
4                                           9
5                                           6
6                                           1
7                                           1
Total                                     104

Does the distribution of commercial mortgages approved per week follow a Poisson distribution? (Use the 0.01 level of significance.)

12.52 A random sample of 500 car batteries revealed the following distribution of battery life (in years).

LIFE (IN YEARS) FREQUENCY
0 1 2 3 4 5 6

For these data, $\bar{X} = 2.80$ and $S = 0.97$. At the 0.05 level of significance, does battery life follow a normal distribution?

12.53 A random sample of 500 long distance telephone calls revealed the following distribution of call length (in minutes).

LENGTH (IN MINUTES) FREQUENCY

a. Compute the mean and standard deviation of this frequency distribution.
b. At the 0.05 level of significance, does call length follow a normal distribution?