Chapter 9 Multiple Regression Analysis: The Problem of Inference
INFERENCE
With the normality assumption, the OLS estimators of the partial regression coefficients are best linear unbiased estimators (BLUE). Moreover, the estimators β̂1, β̂2, and β̂3 are themselves normally distributed with means equal to the true β1, β2, and β3, and with variances that can be estimated from the sample data.
Note that the df are now n − 3 because we first need to estimate the three partial regression
coefficients, which therefore put three restrictions on the residual sum of squares (RSS)
(following this logic in the four-variable case there will be n − 4 df, and so on). Therefore, the
t distribution can be used to establish confidence intervals as well as test statistical hypotheses
about the true population partial regression coefficients. Similarly, the χ2 distribution can be
used to test hypotheses about the true σ2.
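As an illustration, a 95 percent confidence interval for a partial regression coefficient follows directly from the estimate, its standard error, and the t critical value; a minimal sketch in Python (the coefficient estimate and standard error below are hypothetical):

```python
from scipy import stats

n, k = 64, 3                       # sample size and number of estimated coefficients
df = n - k                         # 61 degrees of freedom
beta_hat, se = -0.0056, 0.0020     # hypothetical estimate and standard error

t_crit = stats.t.ppf(0.975, df)    # two-sided 95% critical value
lo, hi = beta_hat - t_crit * se, beta_hat + t_crit * se
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```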
The t test can be used to test a hypothesis about any individual partial regression coefficient. To illustrate the mechanics, consider the child mortality regression of CM on per capita GNP (PGNP) and the female literacy rate (FLR) introduced earlier.
The null hypothesis states that, with X3 held constant, X2 has no influence on Y. To test the null hypothesis, we use the t test. If the computed |t| value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.
If we have 64 observations, the degrees of freedom are 61 (64 − 3). The t table does not list 61 df; the closest entry is 60 df. Using 60 df and a significance level α of 5 percent, the critical t value is 2.0 for a two-tail test or 1.671 for a one-tail test.
We need to decide whether to use a one-tail or a two-tail t test. For instance, since a priori child mortality and per capita GNP are expected to be negatively related, we should use the one-tail test. That is, the null and alternative hypotheses should be H0: β2 ≥ 0 against H1: β2 < 0.
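The tabulated critical values quoted above can be verified directly; a minimal sketch:

```python
from scipy import stats

df, alpha = 60, 0.05                         # closest tabulated df to the actual 61

t_two_tail = stats.t.ppf(1 - alpha / 2, df)  # ~2.000
t_one_tail = stats.t.ppf(1 - alpha, df)      # ~1.671
print(f"two-tail: {t_two_tail:.3f}, one-tail: {t_one_tail:.3f}")
```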
Whether we use the t test of significance or the confidence interval estimation, we reach the same conclusion. This should not be surprising in view of the close connection between confidence interval estimation and hypothesis testing.
Remember that the t-testing procedure is based on the assumption that the error term follows the normal distribution. Although we cannot directly observe the error term, we can observe the residuals. Taking the mortality regression as an example, consider the histogram of its residuals:

[Figure: histogram of the residuals from the child mortality regression]

From the histogram it seems that the residuals are normally distributed. We can also compute the Jarque–Bera (JB) test of normality. In our case the JB value is 0.5594 with a p value of 0.76. Therefore, it seems that the error term in our example follows the normal distribution.
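The JB statistic combines the skewness and kurtosis of the residuals; a minimal sketch of the computation (the final line applies it to simulated normal data as a stand-in for the actual residuals):

```python
import numpy as np
from scipy import stats

def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4): S = skewness, K = (raw) kurtosis."""
    n = len(resid)
    s = stats.skew(resid)
    k = stats.kurtosis(resid, fisher=False)   # normal distribution has K = 3
    jb = n / 6.0 * (s**2 + (k - 3.0)**2 / 4.0)
    p_value = 1 - stats.chi2.cdf(jb, df=2)    # JB ~ chi-square with 2 df under H0
    return jb, p_value

# e.g., applied to simulated normal residuals (stand-in for regression residuals)
print(jarque_bera(np.random.default_rng(0).normal(size=64)))
```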
Testing the overall significance of the sample regression
Consider the null hypothesis H0: β2 = β3 = 0. This is a joint hypothesis that β2 and β3 are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, whether Y is linearly related to both X2 and X3.
A joint hypothesis cannot be tested by testing the significance of β̂2 and β̂3 individually.
Testing a series of single hypotheses is not equivalent to testing those same hypotheses jointly.
The intuitive reason for this is that in a joint test of several hypotheses any single hypothesis is
“affected” by the information in the other hypotheses.
We cannot use the usual t test to test the joint hypothesis that the true partial slope coefficients are simultaneously zero. However, this joint hypothesis can be tested by the analysis of variance (ANOVA) technique.
TSS has, as usual, n − 1 df and RSS has n − 3 df for reasons already discussed. ESS has 2 df since it is a function of the two slope estimators β̂2 and β̂3.
Now it can be shown that, under the assumption of a normal distribution for ui and the null hypothesis β2 = β3 = 0, the variable

F = (ESS/2) / (RSS/(n − 3))

follows the F distribution with 2 and n − 3 df.
If the F value exceeds the critical F value from the F table, we reject H0; otherwise we do not
reject it. Alternatively, if the p value of the observed F is sufficiently low, we can reject H0.
The p value of obtaining an F value of as much as 73.8325 or greater is almost zero, leading to
the rejection of the hypothesis that together PGNP and FLR have no effect on child mortality.
If you were to use the conventional 5 percent level-of-significance value, the critical F value
for 2 df in the numerator and 60 df in the denominator is about 3.15, or about 4.98 if you were
to use the 1 percent level of significance. Obviously, the observed F of about 74 far exceeds
any of these critical F values.
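These critical values and the p value can be reproduced directly; a minimal sketch:

```python
from scipy import stats

f_obs, df1, df2 = 73.8325, 2, 60

print(stats.f.ppf(0.95, df1, df2))       # ~3.15, the 5 percent critical value
print(stats.f.ppf(0.99, df1, df2))       # ~4.98, the 1 percent critical value
print(1 - stats.f.cdf(f_obs, df1, df2))  # p value: essentially zero
```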
There is an intimate relationship between the coefficient of determination R² and the F test used in the analysis of variance. The two are related as follows:

F = (R²/(k − 1)) / ((1 − R²)/(n − k))

where k is the number of parameters estimated, including the intercept.
When R² = 0, F is zero ipso facto. The larger the R², the greater the F value. In the limit, when R² = 1, F is infinite. Thus, the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R². In other words, testing the null hypothesis β2 = β3 = 0 is equivalent to testing the null hypothesis that the (population) R² is zero.
One advantage of the F test expressed in terms of R² is its ease of computation: all that one needs to know is the R² value.
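As an illustration, we can invert the R² formula to back out the R² implied by the F value reported earlier and confirm that the formula reproduces it; a minimal sketch:

```python
def f_from_r2(r2, n, k):
    """Overall-significance F computed from R^2 (k = parameters incl. intercept)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Invert F = (R^2/(k-1)) / ((1-R^2)/(n-k)) for the reported F of 73.8325
F, df1, df2 = 73.8325, 2, 61
r2_implied = df1 * F / (df1 * F + df2)
print(round(r2_implied, 4))                        # ~0.7077
print(round(f_from_r2(r2_implied, n=64, k=3), 4))  # reproduces 73.8325
```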
The incremental contribution of an explanatory variable

One does not want to include a variable that adds little to ESS; by the same token, one does not want to exclude a variable (or variables) that substantially increases ESS. But how does one decide whether an X variable significantly reduces RSS? The analysis of variance technique can be easily extended to answer this question.
To assess the incremental contribution of X3 after allowing for the contribution of X2, we form

F = ((ESSnew − ESSold)/number of new regressors) / (RSSnew/(n − number of parameters in the new model))

where ESSnew = ESS under the new model (i.e., after adding the new regressor X3), ESSold = ESS under the old model, and RSSnew = RSS under the new model (i.e., after taking into account all the regressors).
If you use the R² version of the F test, make sure that the dependent variable in the new and the old models is the same. If the dependent variables differ, use the ESS/RSS version of the F test given above.
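A minimal sketch of the incremental F test with simulated (hypothetical) data; statsmodels' compare_f_test performs the same computation:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: does adding X3 contribute significantly beyond X2?
rng = np.random.default_rng(0)
n = 64
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 0.5 * x3 + rng.normal(size=n)

old = sm.OLS(y, sm.add_constant(x2)).fit()
new = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()

# ESS_new - ESS_old equals RSS_old - RSS_new, since TSS is the same in both models
m = 1                                             # number of added regressors
f = ((old.ssr - new.ssr) / m) / (new.ssr / new.df_resid)
print(f)
print(new.compare_f_test(old))                    # (F, p value, df difference)
```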
The F-test procedure just outlined provides a formal method of deciding whether a variable
should be added to a regression model. Often researchers are faced with the task of choosing
from several competing models involving the same dependent variable but with different
explanatory variables. As a matter of ad hoc choice, these researchers frequently choose the model that gives the highest adjusted R². Therefore, if the inclusion of a variable increases the adjusted R², it is retained in the model even though it may not reduce RSS significantly in the statistical sense. When does the adjusted R² increase? It can be shown that the adjusted R² will increase if the t value of the coefficient of the newly added variable is larger than 1 in absolute value, where the t value is computed under the hypothesis that the population value of that coefficient is zero.
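The |t| > 1 rule is easy to verify numerically; a minimal sketch with simulated (hypothetical) data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical check: adding a regressor raises adjusted R^2 exactly when |t| > 1
rng = np.random.default_rng(1)
n = 64
x2, x3 = rng.normal(size=n), rng.normal(size=n)   # x3 is only weakly relevant
y = 1.0 + 2.0 * x2 + 0.15 * x3 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(x2)).fit()
big = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()

t_new = big.tvalues[-1]                           # t value of the added variable
print(f"|t| of new variable: {abs(t_new):.3f}")
print(f"adj R^2: {small.rsquared_adj:.4f} -> {big.rsquared_adj:.4f}")
# adj R^2 rises iff |t_new| > 1
```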
When to add a group of variables
Can we develop a similar rule for deciding whether it is worth adding (or dropping) a group of variables from a model? Yes: if adding (dropping) a group of variables to the model gives an F value greater (less) than 1, the adjusted R² will increase (decrease). One can thereby easily find out whether the addition (subtraction) of a group of variables significantly increases (decreases) the explanatory power of a regression model.
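A minimal sketch of the group version, again with simulated (hypothetical) data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical illustration: group F value versus the change in adjusted R^2
rng = np.random.default_rng(2)
n = 64
X_base = rng.normal(size=(n, 1))
X_extra = rng.normal(size=(n, 2))     # a group of two candidate variables
y = 1.0 + 2.0 * X_base[:, 0] + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(X_base)).fit()
big = sm.OLS(y, sm.add_constant(np.hstack([X_base, X_extra]))).fit()

f_group, p, _ = big.compare_f_test(small)
print(f"group F = {f_group:.3f}, "
      f"adj R^2 change = {big.rsquared_adj - small.rsquared_adj:+.4f}")
# F > 1 goes with a rise in adjusted R^2; F < 1 with a fall.
```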
Testing linear equality restrictions

Consider, for example, the Cobb–Douglas production function in log-linear form,

ln Yi = β0 + β2 ln X2i + β3 ln X3i + ui

where β0 = ln β1. Now if there are constant returns to scale, economic theory would suggest that

β2 + β3 = 1
How does one find out if there are constant returns to scale, that is, if the restriction is valid? The t-test and F-test approaches can be used.

t-test approach

Estimate the unrestricted regression and test the restriction directly with t = (β̂2 + β̂3 − 1)/se(β̂2 + β̂3), which follows the t distribution with n − 3 df under the null hypothesis of constant returns to scale.

F-test approach

Alternatively, impose the restriction, estimate the restricted regression (restricted least squares), and compare it with the unrestricted one:

F = ((RSSR − RSSUR)/m) / (RSSUR/(n − k))

where m is the number of linear restrictions (here, 1) and k is the number of parameters in the unrestricted regression.
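Both approaches are readily carried out in statsmodels; a minimal sketch with simulated (hypothetical) Cobb–Douglas data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical log-linear Cobb-Douglas data; test the restriction beta2 + beta3 = 1
rng = np.random.default_rng(3)
n = 64
ln_x2, ln_x3 = rng.normal(size=n), rng.normal(size=n)
ln_y = 0.5 + 0.6 * ln_x2 + 0.4 * ln_x3 + 0.1 * rng.normal(size=n)

model = sm.OLS(ln_y, sm.add_constant(np.column_stack([ln_x2, ln_x3]))).fit()

# With an unnamed ndarray, statsmodels labels the coefficients const, x1, x2
print(model.t_test("x1 + x2 = 1"))   # t-test approach
print(model.f_test("x1 + x2 = 1"))   # F-test (restricted least squares) approach
```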
Chow test
When we use a regression model involving time series data, it may happen that there is a
structural change in the relationship between the regressand and the regressors. By structural
change, we mean that the values of the parameters of the model do not remain the same through
the entire time period. Sometimes the structural change may be due to external forces (e.g., the
oil embargoes imposed by the OPEC oil cartel in 1973 and 1979 or the Gulf War of 1990–
1991), policy changes (such as the switch from a fixed exchange-rate system to a flexible
exchange-rate system around 1973), actions taken by Congress (e.g., the tax changes initiated
by President Reagan in his two terms in office or changes in the minimum wage rate), or a
variety of other causes.
How do we find out whether a structural change has in fact occurred? To be specific, consider data on disposable personal income and personal savings, in billions of dollars, for the United States for the period 1970–1995. Suppose we want to estimate a
simple savings function that relates savings (Y) to disposable personal income DPI (X). Since
we have the data, we can obtain an OLS regression of Y on X. But if we do that, we are
maintaining that the relationship between savings and DPI has not changed much over the span
of 26 years. That may be a tall assumption. For example, it is well known that in 1982 the
United States suffered its worst peacetime recession. The civilian unemployment rate that year
reached 9.7 percent, the highest since 1948. An event such as this might disturb the relationship
between savings and DPI. To see if this happened, let us divide our sample data into two time
periods: 1970–1981 and 1982–1995, the pre- and post-1982 recession periods.
We therefore consider three regressions:

1970–1981: Yt = λ1 + λ2 Xt + u1t (n1 = 12 observations)
1982–1995: Yt = γ1 + γ2 Xt + u2t (n2 = 14 observations)
1970–1995: Yt = α1 + α2 Xt + ut (n = n1 + n2 = 26 observations)

The third regression assumes that there is no difference between the two time periods and therefore estimates the relationship between savings and DPI for the entire period of 26 observations. In other words, it assumes that the intercept as well as the slope coefficient remains the same over the whole period; that is, there is no structural change. If this is in fact the situation, then α1 = λ1 = γ1 and α2 = λ2 = γ2. The first and second regressions assume that the regressions in the two time periods are different; that is, the intercepts and the slope coefficients are different, as indicated by the subscripted parameters.
The mechanics of the Chow test are as follows:

i. Estimate the pooled regression over the whole period and obtain its residual sum of squares, RSSR (the restricted RSS), with n1 + n2 − k df, where k is the number of parameters estimated (here, 2).

ii. Estimate the first sub-period regression and obtain its residual sum of squares RSS1, with n1 − k df.

iii. Estimate the second sub-period regression and obtain its residual sum of squares RSS2, with n2 − k df.

iv. Since the two sub-period regressions are deemed independent, add RSS1 and RSS2 to obtain the unrestricted residual sum of squares, RSSUR = RSS1 + RSS2, with n1 + n2 − 2k df.

v. The idea behind the Chow test is that if in fact there is no structural change, then RSSR and RSSUR should not be statistically different. Therefore, we form the ratio

F = ((RSSR − RSSUR)/k) / (RSSUR/(n1 + n2 − 2k))

Chow has shown that, under the null hypothesis that the two regressions are (statistically) the same (i.e., no structural change or break), this F ratio follows the F distribution with k and (n1 + n2 − 2k) df in the numerator and denominator, respectively.
vi. We do not reject the null hypothesis of parameter stability (i.e., no structural change) if the
computed F value in an application does not exceed the critical F value obtained from the
F table at the chosen level of significance (or the p value). In this case we may be justified
in using the pooled (restricted) regression. Contrarily, if the computed F value exceeds the
critical F value, we reject the hypothesis of parameter stability, in which case the pooled
regression is of dubious value, to say the least.
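A minimal sketch of the test, assuming the savings (y) and DPI (x) series have been split into the two sub-periods (all variable names hypothetical):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def chow_test(y1, X1, y2, X2):
    """Chow test for a structural break; X1 and X2 must include the constant."""
    n1, k = X1.shape
    n2 = X2.shape[0]
    rss_ur = sm.OLS(y1, X1).fit().ssr + sm.OLS(y2, X2).fit().ssr   # RSS1 + RSS2
    rss_r = sm.OLS(np.concatenate([y1, y2]),
                   np.vstack([X1, X2])).fit().ssr                  # pooled RSS
    f = ((rss_r - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))
    p = 1 - stats.f.cdf(f, k, n1 + n2 - 2 * k)
    return f, p

# Usage (hypothetical arrays): split the savings/DPI sample at 1982, e.g.
# f, p = chow_test(y_7081, sm.add_constant(x_7081),
#                  y_8295, sm.add_constant(x_8295))
```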