Tests of Hypothesis
Hypothesis testing examines the validity of an assumption about a population parameter (technically described as the null hypothesis) with a view to choosing between two conflicting hypotheses about the value of that parameter. It helps us decide, on the basis of sample data, whether a hypothesis about the population is likely to be true or false. Statisticians have developed several tests of hypotheses (also known as tests of significance), which can be classified as: (a) parametric tests, or standard tests of hypotheses; and (b) non-parametric tests, or distribution-free tests of hypotheses.
Parametric tests usually assume certain properties of the parent population from which samples are drawn. Assumptions such as that the observations come from a normal population, that the sample size is large, or that population parameters like the mean and variance are known must hold before parametric tests can be used. But there are situations in which the researcher cannot, or does not want to, make such assumptions. In such situations we use statistical methods for testing hypotheses called non-parametric tests, because they do not depend on any assumption about the parameters of the parent population. Besides, most non-parametric tests assume only nominal or ordinal data, whereas parametric tests require measurement on at least an interval scale.
Because they make fewer assumptions and use less of the information in the data, non-parametric tests generally need more observations than parametric tests to achieve the same size of Type I and Type II errors.
The t-test is based on the t-distribution. It is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, when the sample(s) are small and the population variance is not known. In that case the sample variance is used as an estimate of the population variance. When two samples are related, we use the paired t-test (also known as the difference test) to judge the significance of the mean of the differences between the two related samples. The t-test can also be used to judge the significance of simple and partial correlation coefficients. The test statistic t is calculated from the sample data and compared with its critical value from the t-distribution at the specified level of significance and the relevant degrees of freedom. The critical value is read from a table that gives values of t for different levels of significance and degrees of freedom. The null hypothesis is then accepted or rejected. It may be noted that the t-test is applied mainly to small sample(s) when the population variance is unknown.

The χ²-test is based on the chi-square distribution. As a parametric test, it is used for comparing a sample variance with a theoretical population variance.

The F-test is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. It is also used for judging the significance of multiple correlation coefficients. The test statistic F is calculated and compared with its critical value, which is read from F-ratio tables for the degrees of freedom of the greater and smaller variances at the specified level of significance. The null hypothesis is then accepted or rejected.
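The χ² and F statistics described above are simple ratios of variances. Below is a minimal Python sketch of how they might be computed. It assumes the standard-library `statistics.variance` function, which uses the n - 1 divisor. The function names are hypothetical, and the values they return must still be compared with χ² or F table values.

```python
import statistics

def chi_square_for_variance(sample, population_variance):
    """Chi-square statistic (n - 1) * s^2 / sigma_p^2 comparing a sample
    variance with a hypothesised population variance; returns (chi2, d.f.)."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance with divisor n - 1
    return (n - 1) * s2 / population_variance, n - 1

def variance_ratio(sample_a, sample_b):
    """F ratio for two independent samples, with the greater variance in the
    numerator as F-ratio tables expect; returns (F, v1, v2)."""
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    if va >= vb:
        return va / vb, len(sample_a) - 1, len(sample_b) - 1
    return vb / va, len(sample_b) - 1, len(sample_a) - 1

# Example: the two groups of chicken weights used in Illustration 5
F, v1, v2 = variance_ratio([12, 15, 11, 16, 14, 14, 16], [8, 10, 14, 10, 13])
print(f"F = {F:.3f} with v1 = {v1}, v2 = {v2}")
```

Putting the greater variance in the numerator keeps F at 1 or more, so only the upper tail of the F table is needed.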
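2. Population normal, population finite, sample size large or small, but variance of the population known; Ha may be one-sided or two-sided. In such a situation the z-test is used, and the test statistic z is worked out as under (using the finite population multiplier):
$z = \frac{\bar{X} - \mu_{H_0}}{\frac{\sigma_p}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$
where $\bar{X}$ is the sample mean, $\mu_{H_0}$ the hypothesised population mean, $\sigma_p$ the population standard deviation, n the sample size and N the population size.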
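3. Population normal, population infinite, sample size small and variance of the population unknown; Ha may be one-sided or two-sided. In such a situation the t-test is used, and the test statistic t is worked out as under:
$t = \frac{\bar{X} - \mu_{H_0}}{\sigma_s/\sqrt{n}}$ with d.f. = (n - 1), where $\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$ is the sample standard deviation.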
4. Population normal, population finite, sample size small and variance of the population unknown; Ha may be one-sided or two-sided. In such a situation the t-test is used, and the test statistic t is worked out as under (using the finite population multiplier):
$t = \frac{\bar{X} - \mu_{H_0}}{\frac{\sigma_s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$ with d.f. = (n - 1).
5. Population may not be normal but sample size is large; variance of the population may be known or unknown; Ha may be one-sided or two-sided. In such a situation we use the z-test and work out the test statistic z as under:
$z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p/\sqrt{n}}$
(This applies in the case of an infinite population when the variance of the population is known. When the variance is not known, we use $\sigma_s$ in place of $\sigma_p$ in this formula.)
or
$z = \frac{\bar{X} - \mu_{H_0}}{\frac{\sigma_p}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}}$
(This applies in the case of a finite population when the variance of the population is known. When the variance is not known, we use $\sigma_s$ in place of $\sigma_p$ in this formula.)
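All the single-mean statistics listed above share one structure: a difference divided by a standard error, optionally scaled by the finite population multiplier. Below is a minimal Python sketch. It assumes the caller supplies σp when the population standard deviation is known and σs otherwise. `one_mean_statistic` is a hypothetical helper, and its result must still be compared with the appropriate z or t table value.

```python
import math

def one_mean_statistic(xbar, mu_h0, sd, n, N=None):
    """Return (xbar - mu_h0) / standard error for a test of one mean.

    sd is sigma_p when the population SD is known (z-test); otherwise pass
    the sample SD sigma_s (t-test with n - 1 d.f. for small samples, or z
    for large samples). If the population size N is given, the standard
    error is multiplied by the finite population multiplier
    sqrt((N - n) / (N - 1)).
    """
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return (xbar - mu_h0) / se

print(round(one_mean_statistic(67.47, 67.39, 1.30, 400), 3))  # infinite population
print(round(one_mean_statistic(300, 320, 75, 5, N=20), 3))    # finite population, N = 20
```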
Illustration 1
A sample of 400 male students is found to have a mean height of 67.47 inches. Can it reasonably be regarded as a sample from a large population with mean height 67.39 inches and standard deviation 1.30 inches? Test at 5% level of significance.
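Solution: Taking the null hypothesis that the mean height of the population is equal to 67.39 inches, we can write:
$H_0: \mu_{H_0} = 67.39$ inches, $H_a: \mu_{H_0} \neq 67.39$ inches
and the given information as $\bar{X} = 67.47$ inches, $\sigma_p = 1.30$ inches, n = 400. Assuming the population to be normal, we can work out the test statistic z as under:
$z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p/\sqrt{n}} = \frac{67.47 - 67.39}{1.30/\sqrt{400}} = \frac{0.08}{0.065} = 1.231$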
As Ha is two-sided in the given question, we shall apply a two-tailed test for determining the rejection region at 5% level of significance. Using the normal curve area table, it comes to:
R: |z| > 1.96
The observed value of z is 1.231, which is in the acceptance region since R: |z| > 1.96, and thus H0 is accepted. We may conclude that the given sample (with mean height 67.47 inches) can be regarded as having been taken from a population with mean height 67.39 inches and standard deviation 1.30 inches at 5% level of significance.
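The arithmetic of Illustration 1 can be checked with a short Python sketch. It assumes only the standard library. `NormalDist().inv_cdf` replaces the printed normal curve area table for the two-tailed 5% critical value.

```python
import math
from statistics import NormalDist

# Data of Illustration 1
xbar, mu_h0, sigma_p, n = 67.47, 67.39, 1.30, 400
alpha = 0.05

z = (xbar - mu_h0) / (sigma_p / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
reject_h0 = abs(z) > z_crit
print(f"z = {z:.3f}; rejection region |z| > {z_crit:.2f}; reject H0: {reject_h0}")
```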
Illustration 2
Suppose we are interested in a population of 20 industrial units of the same size, all of which are experiencing excessive labour turnover problems. The past records show that the mean of the distribution of annual turnover is 320 employees, with a standard deviation of 75 employees. A sample of 5 of these industrial units is taken at random, which gives a mean annual turnover of 300 employees. Is the sample mean consistent with the population mean? Test at 5% level.
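Solution: Taking the null hypothesis that the population mean is 320 employees, we can write:
$H_0: \mu_{H_0} = 320$ employees, $H_a: \mu_{H_0} \neq 320$ employees
and the given information as $\bar{X} = 300$, $\sigma_p = 75$, n = 5, N = 20. Assuming the population to be normal, and since the population is finite, we can work out the test statistic z as under (using the finite population multiplier):
$z = \frac{\bar{X} - \mu_{H_0}}{\frac{\sigma_p}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}} = \frac{300 - 320}{\frac{75}{\sqrt{5}}\sqrt{\frac{20-5}{20-1}}} = \frac{-20}{33.54 \times 0.8885} = -0.67$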
As Ha is two-sided in the given question, we shall apply a two-tailed test for determining the rejection region at 5% level of significance. Using the normal curve area table, it comes to:
R: |z| > 1.96
The observed value of z is -0.67, which is in the acceptance region since R: |z| > 1.96, and thus H0 is accepted. We may conclude that the sample mean is consistent with the population mean, i.e., the population mean of 320 employees is supported by the sample results.
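Below is a Python sketch of Illustration 2, again using only the standard library. It shows how the finite population multiplier $\sqrt{(N-n)/(N-1)}$ shrinks the standard error when the sample is a sizeable fraction of a small population.

```python
import math
from statistics import NormalDist

# Data of Illustration 2: finite population of N = 20 units
N, n = 20, 5
mu_h0, sigma_p, xbar = 320, 75, 300

fpm = math.sqrt((N - n) / (N - 1))        # finite population multiplier, about 0.8885
se = sigma_p / math.sqrt(n) * fpm         # corrected standard error, about 29.80
z = (xbar - mu_h0) / se
z_crit = NormalDist().inv_cdf(0.975)      # two-tailed 5% critical value
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}; reject H0: {reject_h0}")
```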
HYPOTHESIS TESTING FOR DIFFERENCE BETWEEN MEANS
In many decision situations we may be interested in knowing whether the parameters of two populations are alike or different. For instance, we may want to test whether female workers earn less than male workers for the same job. We now explain the technique of hypothesis testing for the difference between means. The null hypothesis for testing the difference between means is generally stated as $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the mean of one population and $\mu_2$ is the mean of the second population, both populations being assumed normal. The alternative hypothesis may be of the not-equal-to, less-than or greater-than type, as stated earlier. Accordingly, we determine the acceptance and rejection regions for testing the hypothesis. There may be different situations when we examine the significance of the difference between two means, but the following may be taken as the usual ones:
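1. Population variances are known, or the samples happen to be large: In this situation we use the z-test for difference in means and work out the test statistic z as under:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_{p1}^2}{n_1} + \frac{\sigma_{p2}^2}{n_2}}}$
When the population variances are not known, the sample variances $\sigma_{s1}^2$ and $\sigma_{s2}^2$ are used in their place.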
2. Samples happen to be large but are presumed to have been drawn from the same population whose variance is known: In this situation we use the z-test for difference in means and work out the test statistic z as under:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
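In case $\sigma_p$ is not known, we use $\sigma_{1.2}$ (the combined standard deviation of the two samples) in its place, calculating it as under:
$\sigma_{1.2} = \sqrt{\frac{n_1(\sigma_{s1}^2 + D_1^2) + n_2(\sigma_{s2}^2 + D_2^2)}{n_1 + n_2}}$
where $D_1 = \bar{X}_1 - \bar{X}_{1.2}$, $D_2 = \bar{X}_2 - \bar{X}_{1.2}$ and $\bar{X}_{1.2} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$.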
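3. Samples happen to be small, and the population variances are not known but are assumed to be equal: In this situation we use the t-test for difference in means and work out the test statistic t as under:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sum(X_{1i} - \bar{X}_1)^2 + \sum(X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
with d.f. = (n1 + n2 - 2).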
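The three two-sample cases, together with the combined standard deviation $\sigma_{1.2}$, can be written as small functions. The following is a minimal Python sketch. The helper names are hypothetical, and the statistics still have to be compared with z or t table values.

```python
import math

def z_known_variances(x1, x2, var1, var2, n1, n2):
    """Case 1: z = (X1 - X2) / sqrt(var1/n1 + var2/n2)."""
    return (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)

def z_same_population(x1, x2, var_p, n1, n2):
    """Case 2: both samples from one population with variance var_p."""
    return (x1 - x2) / math.sqrt(var_p * (1 / n1 + 1 / n2))

def combined_sd(x1, x2, sd1, sd2, n1, n2):
    """Combined SD sigma_1.2, used in case 2 when sigma_p is unknown."""
    xc = (n1 * x1 + n2 * x2) / (n1 + n2)   # combined mean X_1.2
    d1, d2 = x1 - xc, x2 - xc              # D1 and D2
    return math.sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2))

def pooled_t(sample1, sample2):
    """Case 3: t with pooled sum of squares; returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss = sum((x - m1) ** 2 for x in sample1) + sum((x - m2) ** 2 for x in sample2)
    se = math.sqrt(ss / (n1 + n2 - 2)) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2
```

Example: `z_same_population(200, 220, 11 ** 2, 100, 150)` reproduces the statistic of Illustration 3.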
Illustration 3
The mean produce of wheat of a sample of 100 fields is 200 lbs. per acre, with a standard deviation of 10 lbs. Another sample of 150 fields gives a mean of 220 lbs., with a standard deviation of 12 lbs. Can the two samples be considered to have been taken from the same population whose standard deviation is 11 lbs.? Use 5 per cent level of significance.
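Solution: Taking the null hypothesis that the means of the two populations do not differ, we can write:
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
and the given information as $\bar{X}_1 = 200$, $\bar{X}_2 = 220$, $n_1 = 100$, $n_2 = 150$, $\sigma_p = 11$ lbs. The samples are large and are presumed to come from the same population, whose standard deviation is known. We therefore work out the test statistic z as under:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{200 - 220}{\sqrt{11^2\left(\frac{1}{100} + \frac{1}{150}\right)}} = \frac{-20}{1.42} = -14.08$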
As Ha is two-sided, we shall apply a two-tailed test for determining the rejection region at 5 per cent level of significance. Using the normal curve area table, it comes to:
R: |z| > 1.96
The observed value of z is -14.08, which falls in the rejection region, and thus we reject H0. We conclude, at 5 per cent level of significance, that the two samples cannot be considered to have been taken from the same population whose standard deviation is 11 lbs. This means that the difference between the means of the two samples is statistically significant and not due to sampling fluctuations.
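Below is a minimal Python check of Illustration 3. It applies the "same population, known variance" formula with σp = 11 lbs and uses `statistics.NormalDist` in place of the normal table.

```python
import math
from statistics import NormalDist

# Data of Illustration 3
x1, x2, n1, n2 = 200, 220, 100, 150
sigma_p = 11  # common population SD in lbs

z = (x1 - x2) / math.sqrt(sigma_p ** 2 * (1 / n1 + 1 / n2))
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed 5% critical value
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}; reject H0: {reject_h0}")
```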
Illustration 4
A simple random sampling survey in respect of the monthly earnings of semi-skilled workers in two cities gives the following statistical information:
Table
City | Mean monthly earnings (Rs.) | Standard deviation of sample data of monthly earnings (Rs.) | Size of sample
A | 695 | 40 | 200
B | 710 | 60 | 175
Test the hypothesis at 5 per cent level that there is no difference between the monthly earnings of workers in the two cities.
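Solution: Taking the null hypothesis that there is no difference between the mean monthly earnings of workers in the two cities, we can write:
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
As the sample sizes are large, we shall use the z-test for difference in means, assuming the populations to be normal, and work out the test statistic z as under:
$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_{s1}^2}{n_1} + \frac{\sigma_{s2}^2}{n_2}}} = \frac{695 - 710}{\sqrt{\frac{40^2}{200} + \frac{60^2}{175}}} = \frac{-15}{5.345} = -2.806$
(Since the population variances are not known, we have used the sample variances as estimates of the population variances.)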
As Ha is two-sided, we shall apply a two-tailed test for determining the rejection region at 5 per cent level of significance. Using the normal curve area table, it comes to:
R: |z| > 1.96
The observed value of z is -2.806, which falls in the rejection region, and thus we reject H0 at 5 per cent level and conclude that the earnings of workers in the two cities differ significantly.
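Below is a minimal Python check of Illustration 4. The sample standard deviations stand in for the unknown population values, as the text describes. Only the standard library is assumed.

```python
import math
from statistics import NormalDist

# Data of Illustration 4: (mean, sample SD, sample size) for cities A and B
x1, s1, n1 = 695, 40, 200
x2, s2, n2 = 710, 60, 175

z = (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed 5% critical value
reject_h0 = abs(z) > z_crit
print(f"z = {z:.3f}; reject H0: {reject_h0}")
```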
Illustration 5
A group of seven-week-old chickens reared on a high-protein diet weigh 12, 15, 11, 16, 14, 14 and 16 ounces. A second group of five chickens, similarly treated except that they receive a low-protein diet, weigh 8, 10, 14, 10 and 13 ounces. Test at 5 per cent level whether there is significant evidence that additional protein has increased the weight of the chickens. Use assumed mean A1 = 10 for the sample of 7 and assumed mean A2 = 8 for the sample of 5 chickens in your calculations.
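Solution: Taking the null hypothesis that additional protein has not increased the weight of the chickens, we can write:
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 > \mu_2$
The alternative is one-sided because we want to conclude that additional protein has increased the weight of the chickens. Since the variances of the populations are not known and the samples are small, we shall use the t-test for difference in means, assuming the populations to be normal. We work out the test statistic t as under:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sum(X_{1i} - \bar{X}_1)^2 + \sum(X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$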
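with d.f. = (n1 + n2 - 2). From the sample data we work out $\bar{X}_1$, $\bar{X}_2$ and the sums of squared deviations, taking the high-protein diet sample as sample one and the low-protein diet sample as sample two. Using the assumed means A1 = 10 and A2 = 8:
$\bar{X}_1 = 10 + 28/7 = 14$ and $\bar{X}_2 = 8 + 15/5 = 11$ ounces
$\sum(X_{1i} - \bar{X}_1)^2 = 134 - 7(14 - 10)^2 = 22$ and $\sum(X_{2i} - \bar{X}_2)^2 = 69 - 5(11 - 8)^2 = 24$
Hence
$t = \frac{14 - 11}{\sqrt{\frac{22 + 24}{7 + 5 - 2}} \times \sqrt{\frac{1}{7} + \frac{1}{5}}} = \frac{3}{1.256} = 2.39$ with d.f. = 10.
As Ha is one-sided, we apply a one-tailed test. The table value of t at 5 per cent level for 10 d.f. is 1.812. The observed value of t (2.39) exceeds it and so falls in the rejection region. We therefore reject H0 and conclude that additional protein has increased the weight of the chickens.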
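The assumed-mean shortcut used in Illustration 5 rests on the identity $\sum(X - \bar{X})^2 = \sum d^2 - n(\bar{X} - A)^2$ with $d = X - A$. Below is a minimal Python sketch of that calculation followed by the pooled t statistic. `assumed_mean_summary` is a hypothetical helper. The critical value 1.812 (one-tailed, 5%, 10 d.f.) is copied from a t table, because Python's standard library has no t-distribution.

```python
import math

def assumed_mean_summary(data, assumed_mean):
    """Mean and sum of squared deviations from the mean, computed with the
    assumed-mean shortcut sum((X - mean)^2) = sum(d^2) - n * (mean - A)^2."""
    n = len(data)
    d = [x - assumed_mean for x in data]          # deviations from A
    mean = assumed_mean + sum(d) / n
    ss = sum(v * v for v in d) - n * (mean - assumed_mean) ** 2
    return mean, ss

high = [12, 15, 11, 16, 14, 14, 16]   # sample one: high-protein diet
low = [8, 10, 14, 10, 13]             # sample two: low-protein diet
m1, ss1 = assumed_mean_summary(high, 10)  # A1 = 10
m2, ss2 = assumed_mean_summary(low, 8)    # A2 = 8
n1, n2 = len(high), len(low)
df = n1 + n2 - 2
t = (m1 - m2) / (math.sqrt((ss1 + ss2) / df) * math.sqrt(1 / n1 + 1 / n2))

T_CRIT_ONE_TAILED = 1.812  # t-table value for 5% one-tailed, 10 d.f.
print(f"t = {t:.2f} with {df} d.f.; reject H0: {t > T_CRIT_ONE_TAILED}")
```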
The table value of F at 5 per cent level for v1 = 8 and v2= 7 is 3.73. Since the calculated value of F is greater than the table value, the F ratio is significant at 5 per cent level. Accordingly we reject H0 and conclude that the difference is significant.
(a) In case of simple correlation coefficient: We use the t-test and calculate the test statistic as under:
$t = r\sqrt{\frac{n-2}{1-r^2}}$
with (n - 2) degrees of freedom, r being the coefficient of simple correlation between x and y. This calculated value of t is then compared with its table value. If the calculated value is less than the table value, we accept the null hypothesis at the given level of significance and may infer that there is no relationship of statistical significance between the two variables.
(b) In case of partial correlation coefficient: We use the t-test and calculate the test statistic as under:
$t = r_p\sqrt{\frac{n-k}{1-r_p^2}}$
with (n - k) degrees of freedom, n being the number of paired observations, k the number of variables involved and $r_p$ the coefficient of partial correlation. If the table value of t is greater than the calculated value, we may accept the null hypothesis and infer that there is no correlation.
(c) In case of multiple correlation coefficient: We use the F-test and work out the test statistic as under:
$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$
where R is any multiple coefficient of correlation, k the number of variables involved and n the number of paired observations. The test is performed by entering tables of the F-distribution with two sets of degrees of freedom: v1 = k - 1 for the variance in the numerator and v2 = n - k for the variance in the denominator. If the calculated value of F is less than the table value, we may infer that there is no statistical evidence of significant correlation.
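The three correlation tests reduce to one-line formulas. Below is a minimal Python sketch of them. The function names and the example inputs (r = 0.5 with n = 27; R = 0.6 with n = 23, k = 3) are hypothetical and serve only as illustration. The results must still be compared with t or F table values.

```python
import math

def t_for_simple_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); returns (t, d.f. = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2

def t_for_partial_r(r_p, n, k):
    """t = r_p * sqrt((n - k) / (1 - r_p^2)); returns (t, d.f. = n - k)."""
    return r_p * math.sqrt((n - k) / (1 - r_p ** 2)), n - k

def f_for_multiple_R(R, n, k):
    """F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)); returns (F, v1, v2)."""
    return (R ** 2 / (k - 1)) / ((1 - R ** 2) / (n - k)), k - 1, n - k

# Hypothetical inputs, for illustration only
print(t_for_simple_r(0.5, 27))
print(f_for_multiple_R(0.6, 23, 3))
```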