1 GLM: Regression
1.1 Simple Regression
1.1.1 Background
Simple regression involves predicting one quantitative variable (called a dependent
variable) from another quantitative variable (called the independent or predictor variable). The
terms dependent and independent imply predictability but do not necessarily imply causality.
The most common notation in regression is to let Y denote the dependent variable and X, the
independent variable. The phrase “regress (name of dependent variable) on (name of
independent variable)” is often used. For example, “regress receptor levels on age” denotes that
receptor level is the dependent variable and age is the independent variable.
In simple regression, one fits a straight line through the data points and then uses that line
to predict values of the dependent variable from values of the independent variable. The
fundamental equation for the predicted value is
Ŷ = α + βX
where Ŷ is the predicted value of the dependent variable, α is the intercept of the line (i.e., the place where it crosses the vertical axis, which is the same as the predicted value of Y when X = 0), and β is the slope of the line.
We can write a similar equation for an observed value of Y. Because one will never be
able to predict all of the observed Ys with perfect accuracy, there will be prediction errors. A
prediction error is the difference between an observed value for the dependent variable and its
predicted value, i.e., Y − Yˆ . Letting E denote a prediction error, then the equation for an
observed Y is
Y = Yˆ + E = α + β X + E .
In simple regression, the estimates of parameters α and β are those that minimize the sum
of squared prediction errors. That is, calculate the prediction error for the first observation and square it, then do the same for all other observations in the sample. Finally, add together all
the squared prediction errors. The result is termed the error sum of squares. Parameters that
minimize the error sum of squares are called least squares estimates.
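In SAS, a least-squares fit of this kind is obtained with PROC REG. The following is a minimal sketch only: the data set name CORTEX is hypothetical, and the variable names anticipate the receptor example that follows.

PROC REG DATA=cortex;       /* hypothetical data set name */
   MODEL receptor = age;    /* least-squares regression of receptor on age */
RUN;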
We illustrate simple regression by considering the problem of change in the number of a
particular receptor in human cortex with age. To investigate this, a researcher obtains cortex
from a series of post mortems, extracts protein, and then performs a binding assay for the
receptor.
The first step is to screen the data as outlined in Section X.X and construct a scatter plot. Figure 1.1 illustrates the plot, along with the
regression line. (We discuss this line later).
In the present example, there do not appear to be any disconnected data points. (Later we
shall demonstrate the effects of outliers).
Figure 1.1 Example of a scatter plot and the regression line (line of best fit).
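A scatter plot like the one in Figure 1.1 can be requested from SAS/GRAPH; the sketch below is only one way to do it, and the data set name CORTEX is again hypothetical. The INTERPOL=RL option asks for a least-squares regression line to be overlaid on the points.

SYMBOL1 VALUE=dot INTERPOL=rl;   /* plot points and overlay the regression line */
PROC GPLOT DATA=cortex;
   PLOT receptor*age;            /* dependent variable on the vertical axis */
RUN;
QUIT;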
Observed values of Receptor Number in the cortex, however, will not always be equal to
their predicted values. Hence, simple regression adds an error term when it writes the equation
for observed values of the dependent variable:
Receptor = R̂eceptor + Error = α + β·Age + E.
Note that we abbreviate error as E. Regression procedures obtain estimates of the population
parameters α and β by minimizing the sum of squared error, the summation being taken over all
observations in the data set.
Figure 1.2 Output from a simple regression predicting the quantity of receptors in human
cortex as a function of age.
The REG Procedure
Model: MODEL2
Dependent Variable: receptor Receptor Binding fmol/mg
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 7.55991 7.55991 9.02 0.0060
Error 25 20.95872 0.83835
Corrected Total 26 28.51863
Root MSE 0.91561 R-Square 0.2651
Dependent Mean 4.90630 Adj R-Sq 0.2357
Coeff Var 18.66202
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 9.43824 1.51942 6.21 <.0001
age 1 -0.06052 0.02015 -3.00 0.0060
With only a single independent variable, the analysis of variance table can be skipped
because it will lead to the same inference as the table of parameter estimates. The estimate of the
intercept (i.e., the estimate of α in Equation X.1) is 9.44. If the relationship between receptor
number and age were linear throughout the lifespan, then this would be the predicted receptor concentration at birth. (Although this is the correct mathematical interpretation of the intercept, one should never extrapolate the regression line beyond the range of the data at hand. Hence, it is best to regard the receptor concentration at birth as an unknown quantity best examined with empirical data.)
The slope of the regression line (i.e., the estimate of parameter β in Equation X.1) is -.06.
The minus sign implies a negative or inverse relationship—increasing age results in lower
receptor numbers. The value of the estimate implies that a one-year increase in age is associated
with a .06 reduction in receptor concentration (measured in this hypothetical example as fmoles
per mg of protein). Taking the estimates of α and β and placing them into Equation X.1 gives
R̂eceptor = 9.44 − .06·Age.
These numbers define the regression line in Figure 1.1. This line is sometimes referred to as the
line of best fit because the estimates are based on minimizing the sum of squared error.
The standard error for this parameter estimate is an estimated standard error, so the
appropriate test statistic for the hypothesis that β = 0 is the t statistic. The value here (-3.00) is
large and its associated p value (.006) is less than .05. Hence, we would conclude that there is
evidence for a change in receptor number with age. In write-ups of a regression result, be careful with the degrees of freedom: the column labeled DF in Figure 1.2 equals 1 because only a single parameter is being tested, but the actual degrees of freedom for the t test equal the degrees of freedom for error in the model, in this case, 25.
Note that there is also a test that the intercept is 0. Usually—but not always—this test is
unimportant. You may also have noticed that the p value of the regression coefficient is identical
to the fourth decimal place to that of the F test in the ANOVA table. This is no coincidence. It
is due to the fact that there is only a single independent variable in the model.
1.1.3 Assumptions
At this point it is useful to examine the assumptions underlying regression analysis
because they will apply to the other sections in this chapter and beyond.
1.1.3.1 Linearity
Simple regression assumes that the relationship between the IV and the DV is linear.
Figure 1.3 illustrates two different forms of a nonlinear relationship.
Usually, the effect of fitting a straight line to data with a strong nonlinear relationship is
to reduce power. This is especially true when the relationship is U-shaped or inverted U-shaped
because the slope of the linear term will be close to 0. Hence, the danger is that it might fool the
analyst into concluding that there was no relationship.
The best way to diagnose linearity is to construct a scatterplot along the lines of Figure
1.1 or Figure 1.3 and visually inspect it. If you suspect the relationship may be nonlinear, then
you can test that using polynomial regression (see Section 1.3.3).
If the residuals in this case were normally distributed, then they would tend to form a
straight line. It is quite clear that they do not, so one should question this regression.
Often, the solution to non-normal residuals can be found in a transformation of the
independent and/or the dependent variable. For the present example, the solution is clear—
regress Y on the log of X and not on X itself. Figure 1.5 gives the residual plots for this
regression.
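In SAS, this remedy amounts to computing the log in a DATA step and then rerunning the regression. The data set and variable names below (XYDATA, X, Y) are placeholders for illustration.

DATA xylog;
   SET xydata;          /* hypothetical input data set */
   logx = LOG(x);       /* natural log of the independent variable */
RUN;

PROC REG DATA=xylog;
   MODEL y = logx;      /* regress Y on log(X) rather than on X */
RUN;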
Heteroscedasticity of the form in Panel B of Figure 1.6 usually results from a scaling
problem. Often taking a square root or log transform of Y will remove the heteroscedasticity.
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 26.31758 5.26743 5.00 <.0001
age 1 0.42592 0.20180 2.11 0.0450
The results of the regression analysis show a significant effect of age, and both the
regression coefficient and the regression line suggest that the relationship is positive—like a
good wine, retrieval from working memory improves with age. But these results depend entirely
on Ralph.
Figure 1.9 and Figure 1.10 repeat the scatter plot and the regression but this time after
removing Ralph from the data set. Notice how the scatter plot gives a very different impression
of the relationship between age and working memory. Even if one were to erase Ralph and the
regression line from the previous scatter plot, the remaining data points give little hint of the
negative relationship clearly apparent in Figure 1.9. Furthermore, this negative relationship is
significant.
Figure 1.9 The same scatterplot after removing the outlier.
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 82.71226 21.35160 3.87 0.0007
age 1 -2.13679 0.96343 -2.22 0.0363
This example, albeit extreme, should impress on the reader the importance of screening
data in preparation for data analysis. By happenstance, the sample contained an elderly
gentleman with an excellent memory, and including him in the analysis gives very misleading
information about the relationship between age and memory. In general, the effect of an outlier
on tests of statistical significance is unpredictable. It can—as it did in this example—retain statistical significance but switch the direction of the effect. In other cases, outliers can produce statistical significance where none exists, or they can mask findings that are in fact significant. In almost all cases, however, inclusion of an outlier can seriously bias parameter estimates.
A common application of multiple regression occurs when baseline variables are entered into the regression model. If participants are truly
randomized to control and experimental conditions, then a baseline variable will not correlate
with the treatment variable but will usually correlate with the outcome measure.
A simple regression fits a one-dimensional model (a straight line) to data points
distributed in two-dimensional space. Similarly, a multiple regression with two independent
variables fits a two-dimensional model (a plane) to data points distributed in three-dimensional
space. A multiple regression with three independent variables fits a three-dimensional model to
data points located in four-dimensional space—a task that cannot be easily visualized but can be
dealt with in the world of mathematics.
The interpretation of the parameters is easiest to learn by considering two independent
variables. Parameter β1 gives the predicted change in the dependent variable per unit change in
X1 holding variable X2 constant. Expressed in different terms, if one fixed X2 at any value, then a
one-unit change in X1 predicts a change of β1 units in Y. Similarly, if one fixed X1 at any value,
then a one-unit change in X2 predicts a change of β2 units in Y. In general, βi gives the predicted
change in Y for a one-unit change in Xi holding all other independent variables in the model
constant. The term controlling for is usually used to refer to the phrase “holding all other
variables constant.” For instance, “βi measures the effect (predictive effect, not necessarily
causal effect) of variable Xi controlling for X1, X2, … .”
We illustrate multiple regression by elaborating on the data set used in simple regression.
Suppose that the receptor was a nicotinic receptor. Use of nicotine could upregulate the number
of receptors and we all know that smokers die young. Could the relationship between receptor
number and age be due to the effects of smoking? To check for this, we will include among the
assays one for cotinine, a metabolite of nicotine. We can now use cotinine levels as a control
variable in a multiple regression that predicts the levels of nicotinic receptors in cortex from age
and cotinine levels.
1.2.2 How to Do It
In variable-selection (stepwise) methods, a series of models is fitted to the data and independent variables are added, kept, or deleted based on some statistical criteria. Complete estimation, in which all of the independent variables are entered at once, should be the choice for the vast majority of problems in neuroscience and must always be used for planned experiments.¹
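In SAS, complete estimation of this model requires nothing more than listing both independent variables on the MODEL statement. The sketch below is illustrative: the data set name NICOTINIC is hypothetical, while the variable names match the output that follows (Figure 1.11 in the text).

PROC REG DATA=nicotinic;
   MODEL receptor = age cotinine;   /* complete estimation: both IVs entered at once */
RUN;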
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 15.02076 7.51038 13.35 0.0001
Error 24 13.49787 0.56241
Corrected Total 26 28.51863
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 7.33197 1.37229 5.34 <.0001
age 1 -0.05382 0.01661 -3.24 0.0035
cotinine 1 0.03823 0.01050 3.64 0.0013
¹ Variable-selection techniques and stepwise regression were useful in the past, when it took considerable time to compute regressions by hand. In the modern era, it is very easy to compute regressions for all possible subsets of IVs. One can then accept the model that best fits predetermined criteria. For this reason, we will not discuss variable-selection methods in this text.

The next item for inspection is R2, called the squared multiple correlation, the quantity R being the multiple correlation. As in simple regression, R is the correlation between the predicted values and the observed values of the dependent variable. Squaring R gives the proportion of variance in the dependent variable explained by the model (i.e., by all of the independent variables). R2 is a measure of effect size (see Section X.X) for the whole model.
Here, the value of R2 is .53, so 53% of the variance in receptor levels is predictable from the model, that is, from both age and cotinine. R2 values from nested models can be meaningfully compared by subtraction. Two models are nested when all the independent variables in the smaller model are contained in the larger model. R2 values for non-nested models should not be compared with each other (see Section 1.3.1). The R2 from the simple regression of receptor on age was .27 (see Figure 1.2). Because the simple regression is nested within the current model, their R2s can be compared. Hence, we can say that adding cotinine to the prediction equation explains an additional (.53 − .27) = .26, or 26%, of the variance in nicotinic receptors. Model comparisons are such an important part of regression that we devote a whole section to them (Section 1.3.1), and we explain the testing of interactions and polynomial regression in multiple regression along these lines (Sections 1.3.2.3 and 1.3.3.1).
The R2 and its significance inform us that we have overall predictability much better than
chance would allow. The table of parameter estimates and their significance can help us decide
which independent variables contribute to that significant predictability. In Figure 1.11 both age
and cotinine are statistically significant. Hence, there are contributions from both age and
cotinine to density of nicotinic receptors. The decline in nicotinic receptors with age was not due
to the fact that smokers elevate their receptor levels and also die at younger ages. Instead, receptor concentration decreases with age even when controlling for cotinine levels. A good
write-up might be: “Increased cotinine levels significantly predicted increased receptor numbers
(b = .038, t(24) = 3.64, p = .001). Even controlling for effect of cotinine, however, receptor
concentration still significantly declined with age (b= -.054, t(24) = -3.24, p = .004). Hence the
relationship between age and receptor number cannot be explained solely in terms of cotinine
concentrations in brain.”
Note how this latter write-up expresses the significance of the regression coefficients,
gives the direction of their effect (positive for cotinine, negative for age), and provides a clear
summary of the results in terms of the reason for performing the regression in the first place
(despite controlling for cotinine, age is still significant).
Instead of following a slavish and formulaic write-up (which is usually quite uninteresting), the analyst is urged to be creative in explaining the results in light of the major hypotheses that motivated the analysis.
1.2.3 Assumptions
The assumptions of multiple regression are the same as those for simple linear regression:
linearity, normality of residuals, and equality of residual variances. Two of these—normality of
residuals and equality of residual variances—can be checked in the same way as these
assumptions are assessed in simple regression.
1.2.3.1 Linearity
If there are k independent variables, then the inclusion of the dependent variable gives a
problem in (k + 1) dimensional space. It is not possible to construct plots for visual inspection in
more than three-dimensional space. So how does one assess linearity? The answer is that there is
no exact mathematical way. Some statisticians recommend constructing scatter plots of all
variables in the analysis taken two at a time. Indeed, software packages such as the Interactive Data Analysis feature of SAS allow one to do this with only a few point-and-clicks. A second method is to construct plots of the residuals as a function of each independent variable. A U- or inverted U-shaped plot suggests nonlinearity. Finally, an analysis of the residuals for outliers and/or influential data points may give indications of nonlinearity.
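One way to obtain these residual plots in SAS is to save the residuals with an OUTPUT statement and then plot them against each independent variable. The data set names below are assumptions made for illustration.

PROC REG DATA=nicotinic;
   MODEL receptor = age cotinine;
   OUTPUT OUT=regout P=predicted R=resid;   /* save predicted values and residuals */
RUN;

PROC GPLOT DATA=regout;
   PLOT resid*age resid*cotinine;           /* look for U- or inverted-U patterns */
RUN;
QUIT;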
With a single independent variable, outliers and influential data points can often be spotted by visual inspection of a graph. The situation becomes more complicated as the number of IVs increases. For example, when there are three X variables, the data occupy a four-dimensional space (the three Xs plus the Y variable), something very difficult for us humans to conceptualize, let alone plot.
Most statisticians recommend two processes to deal with outliers and influential data
points. The first is the inspection of the residuals or errors. The second is the inspection of
statistics designed to identify a multivariate outlier and/or influential data points. We shall
speak of each in turn.
² There are several variations on how these residuals may be calculated. The major variation is whether the observation in question is included or excluded from the regression model when its error (residual) is calculated. Consult the manual for your software to make certain that you know what the residuals mean.
1.2.4.2 Multicollinearity
Multicollinearity is usually defined as a state that exists when two or more of the independent variables are highly correlated. A more precise definition may be developed by imagining that we computed a series of multiple regressions: in each regression, one of the independent variables becomes the dependent variable and all of the other IVs are the predictors.
Then multicollinearity would occur when the R2 for at least one of these regressions was high.
Note that multicollinearity applies to the X variables only. It is not influenced in any way by the
extent to which the X variables correlate with the Y variable. Hence, R2 for the model is not
influenced by multicollinearity. Instead, multicollinearity increases the standard errors of the
regression coefficients, thus making it more difficult to detect whether a coefficient is in fact
significant.
As with many statistical phenomena, multicollinearity is not an either-or state, akin to falling off a cliff. Instead, regressions descend gradually into multicollinearity as the correlations among the X variables increase. The central issue for the analyst is to identify the point at which multicollinearity becomes such a problem that it compromises the interpretation of a regression.
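Most packages provide diagnostics for this purpose. In PROC REG, for example, the VIF and TOL options on the MODEL statement print variance inflation factors and tolerances for each predictor; the data set and variable names below are hypothetical. Very large VIFs (equivalently, very small tolerances) flag predictors whose variance is largely shared with the other IVs.

PROC REG DATA=obsdata;
   MODEL y = x1 x2 x3 / VIF TOL;   /* variance inflation factors and tolerances */
RUN;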
Most designs in neuroscience do not have to worry about multicollinearity, except for the
very important situation of statistical interactions (which we deal with below). Why? Most
designs are experimental and hence, the independent variables will not be correlated (if there are
an equal number of observations in each cell) or very weakly correlated (if the number is close to
being equal in each cell). Outside of statistical interactions, the most likely situation that could
induce multicollinearity in experimental designs occurs when two or more highly correlated
variables are entered into the equation as control variables. Generally, however,
multicollinearity is a problem most often encountered in observational studies.
One remedy is to subject the correlated independent variables to a principal component analysis and use the resulting principal component scores as the IVs. This technique is sometimes referred to as regression on principal components.
and
Ŷ = α + β2X2 + β3X3.
Each of these three smaller models is nested within the larger model because all of the predictor variables on the right-hand side of the equations are predictor variables in the larger model.
In contrast, none of the following models is nested within the larger model:
Ŷ = α + β1X1 + β4X4,
Ŷ = α + β2X2 + β4X4,
Ŷ = α + β3X3 + β4X4,
and
Ŷ = α + β4X4.
Even though each of these four models is smaller than the larger model, they are not nested within it. Why? Because they all contain variable X4, which is not contained in the larger model. Hence, it is not possible to compare these four models with the larger model.
³ More advanced methods can permit the assessment of non-nested models. They are, however, beyond the purview of this book.
To examine model comparisons, we will first develop the general case of comparing any
two nested models. After that, we will examine the special case of comparing two models that
differ in one and only one parameter.
F(1, 48) = [(.425 − .4084) / (1 − .425)] · [(53 − 3 − 1 − 1) / 1] = 1.39
The critical value for this F is 4.04. Because the observed F is less than its critical value, the
observed F is not significant. Hence, dropping X3 from the model does not significantly worsen
fit. (Note that if we had started with the reduced model and compared it to a model that added X3,
then we could have stated that adding X3 to the model does not significantly increase R2).
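This comparison can also be carried out by hand or, as in the minimal sketch below, in a short DATA step that simply encodes the calculation above. The R2 values are those reported in the text, and PROBF is the cumulative F distribution function.

DATA nested_f;
   r2_full    = 0.425;    /* R-square of the larger model              */
   r2_reduced = 0.4084;   /* R-square of the reduced (nested) model    */
   n = 53;                /* number of observations                    */
   k = 3;                 /* predictors retained in the reduced model  */
   m = 1;                 /* predictors dropped from the larger model  */
   df1 = m;
   df2 = n - k - m - 1;
   f = ((r2_full - r2_reduced) / (1 - r2_full)) * (df2 / df1);
   p = 1 - PROBF(f, df1, df2);   /* p value for the observed F */
RUN;

PROC PRINT DATA=nested_f;
RUN;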
Figure 1.12 Model Comparisons: A General Model with Four Predictors.
Model Comparisons
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 4 15.14742 3.78685 8.87 <.0001
Error 48 20.49560 0.42699
Corrected Total 52 35.64302
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
A second method is to use the TEST statement. The SAS syntax for this along with the
results of the test statement is given in Figure 1.13. Note that the F statistic for the TEST
statement is the same (within rounding error) as the one calculated above. Note also that the p
value for the F statistic is identical to the p value for the t statistic in the original, general model
presented earlier in Figure 1.12. This is no coincidence. The F statistic in Figure 1.13 is actually
the square of the t statistic in Figure 1.12. Both statistics answer the same question: Can β3 be
set to 0 without sacrificing predictability? In mathematics, two methods that answer the same
question with the same data must give the same answer. Hence, the t statistic for a parameter in
a general model is the same as the F statistic in a model comparison that drops that parameter
and only that parameter.
Figure 1.13 Model Comparisons: Example of the TEST statement in SAS for a single
predictor variable.
SAS Syntax:
TITLE Model Comparisons;
PROC REG DATA=ModelComparison2;
MODEL Y = X1 X2 X3 X4;
TEST X3 = 0;
RUN;
SAS Output (NOTE: Only results of the TEST statement are shown):
Model Comparisons
Test that beta3 = 0
Mean
Source DF Square F Value Pr > F
Numerator 1 0.59084 1.38 0.2453
Denominator 48 0.42699
Can two or more independent variables be dropped from a model simultaneously without a significant loss of prediction? The t statistics for individual predictors apply to dropping one and only one parameter from the model. They do not necessarily inform us of the effect of dropping more than one parameter from the model.
As an analogy, imagine that your ship sinks and you are left with two floatation
devices—a life-jacket and a circular life-saver. You can discard the life-saver and still float for a
long time because of the life-jacket. Similarly, you can discard the life-jacket, keeping the life-
saver, and still float for a long time. But does this imply that you can safely throw away both the life-jacket and the life-saver and still remain afloat until rescue arrives?
Hence, it is quite legitimate to ask whether both X2 and X4 can be dropped from the general model without a significant sacrifice in prediction. The first method of doing this is to use the TEST statement with PROC REG (or, of course, an equivalent statement in another statistical package). The SAS syntax is shown in the upper part of Figure 1.14 and the results are provided in the lower part of that Figure. The F statistic is 3.67 and its p value is .03. Hence, we cannot simultaneously set β2 and β4 to 0. At least one of these—and perhaps both—is important for prediction. In the original, general model, we either lacked the power or had conditions such as multicollinearity that prevented us from detecting significance.
Figure 1.14 Model Comparisons: Example of the TEST statement in SAS for two predictor variables.
SAS Syntax:
PROC REG DATA=ModelComparison2;
MODEL Y = X1 X2 X3 X4;
TEST X2 = 0, X4 = 0;
RUN;
SAS Output (NOTE: Only results of the TEST statement are shown):
Model Comparisons
Test that beta2 = 0 and beta4=0
Mean
Source DF Square F Value Pr > F
The second method for model comparisons is to run the reduced model with only X1 and
X3 as predictors and then compute the F statistic using Equation X.X. We will not show all the
results from this reduced model, but simply give its R2 (.337). Hence, in terms of the algebraic
quantities in Equation X.X, the R2 of the larger model (k + m predictors) is .425, the R2 of the reduced model (k predictors) is .337, N = 53, k = 2, and m = 2. The F statistic becomes
F(2, 48) = [(.425 − .337) / (1 − .425)] · [(53 − 2 − 2 − 1) / 2] = 3.67
The critical value for F with 2 and 48 df is 3.19. Because the observed F is larger than the
critical value, we reject the smaller model. Eliminating both X2 and X4 from the model
significantly worsens fit.
1.3.2 Interactions
In everyday language, we often say that two variables “interact” in predicting a third
variable, meaning that both variables are important for the prediction. In the GLM, however, the
term “interaction” has a more precise meaning. A statistical interaction between two variables
implies that the slope (or curve) for one independent variable differs in shape as a function of the
second variable. For example, a statistical interaction between dose of drug and sex implies that
the dose-response curves for males and females differ in shape. In different words, an
interaction implies that the effect of a dose is different for men and women.
An interaction can be looked upon as a nonadditive contribution of two (or more)
variables to the prediction of Y. We explore this viewpoint on interactions further in the
discussion of a two-by-two factorial design (Section 1.3.2.1) but urge the reader to view
interactions as the extent to which nonadditive factors contribute to prediction.
Let us first consider the case of two independent variables. In multiple regression,
modeling an interaction begins by creating a new variable that is the product of the two variables
involved in the interaction: e.g., X3 = X1*X2 denotes that the new variable (X3) represents the
interaction between variables X1 and X2. Then the two original variables plus the new variable
become the independent variables in the regression model. Hence, the model with the interactive
term is
Ŷ = α + β1X1 + β2X2 + β3X3
or, writing it in terms of the original variables,
Ŷ = α + β1X1 + β2X2 + β3X1X2.
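In practice, the product variable is computed in a DATA step before the regression is run. A minimal sketch, with hypothetical data set names:

DATA interact;
   SET mydata;           /* hypothetical input data set            */
   x3 = x1 * x2;         /* product term carries the interaction   */
RUN;

PROC REG DATA=interact;
   MODEL y = x1 x2 x3;   /* main effects plus the interaction term */
RUN;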
With three independent variables, we can construct three new variables that are the
product of any two of the three variables: X4 = X1*X2, X5 = X1*X3, and X6 = X2*X3. These are
called two-way interactions because they model the interaction between two variables. In
addition, we can model a three-way interaction by calculating yet another new variable that is
the product of all three independent variables: X7 = X1*X2*X3. The model, expressed in terms of
the original variables, is
Ŷ = α + β1X1 + β2X2 + β3X3 + β4X1X2 + β5X1X3 + β6X2X3 + β7X1X2X3.
With four independent variables, there would be six possible two-way interactions, four possible three-way interactions (X1*X2*X3, X1*X2*X4, X1*X3*X4, and X2*X3*X4), and a four-way interaction (X1*X2*X3*X4). Usually, higher-order interactions such as a four-way interaction are ignored because they are very difficult to interpret.
Let us examine a specific problem to illustrate interactions. It has long been known that
testosterone induces sexual activity in castrated male rats. The recovery of sexual activity is also
a function of prior sexual experience—rats with high levels of previous experience have higher
post testosterone sexual activity than those with less experience. Suppose that a lab is
investigating a new compound with a resemblance to testosterone. The experimenters raised
male rats in two ways—one group had no opportunity to mate with females while the other was
allowed to have sexual activity. Rats were then surgically castrated and divided into three
groups: (1) controls, (2) those given 10 mgs of the new drug per unit weight, and (3) those given
15 mgs. All groups were then allowed access to females in estrus and a composite index of
sexual activity was derived. The results from this hypothetical study are depicted in Figure 1.15.
Figure 1.15 Mean sexual activity (± 1 standard error) of rats with and without prior sexual
experience as a function of dose of a testosterone-like compound.
In principle, the procedure for assessing an interaction model is to fit two regression
models. The first of these may be termed the main effects model, and it does not have the
interaction term. The second regression model has the same variables as the first but includes
the interaction term. The purpose is to assess the significance of the interaction term in the
second model. (In practice, one can attain the same result by fitting a model that includes the
main effects and the interaction and then assessing the t statistic for the regression coefficient for
the interaction term—see Section 1.3.1.2. The notion of fitting two models, however, greatly
aids in the interpretation of the regression coefficients, as the following algebraic exercise will
illustrate.)
For the main effects model, we start with two independent variables. The first,
Experience, is dummy coded by assigning a 0 to rats with no previous sexual experience and a 1
to rats with prior experience. The second variable is Dose with values of 0, 10, and 15. Now
examine the regression equation in this model:
Ŷ = α + β1·Experience + β2·Dose. (X.X)
Substitute the numeric values for Experience to get the predicted values for rats with no experience (Ŷ0) and those with prior experience (Ŷ1):
Ŷ0 = α + β2·Dose,
Ŷ1 = (α + β1) + β2·Dose.
Notice that the equation for the inexperienced rats is a simple regression with intercept α and
slope β2. The equation for the experienced rats is also a simple regression, but here the intercept
is now (α + β1) while the slope remains the same at β2. Hence, the main effects model fits two
simple regressions—one for the inexperienced, the other for the experienced group—allowing
the intercepts to differ between groups but constraining the slopes to be equal. Consequently, the
regression lines for main effects models will be parallel. Parallel regression lines are not
idiosyncratic to this example—they will always be predicted for a main effects model.
Note the careful phrasing above about the intercept. The main effect model permits or
allows the intercepts to differ. It does not force them to be different. Because the intercept for
the inexperienced group is α and the intercept for the experienced rats is (α + β1), a test of
parameter β1 is a test for equality of intercepts. If β1 = 0, then the intercepts are the same; if the
hypothesis that β1 = 0 is rejected, then there is evidence for different intercepts.
Now examine the regression equation that includes the interaction term:
Ŷ = α + β1·Experience + β2·Dose + β3·Experience·Dose. (X.X)
Substituting the numeric values for Experience gives the equations for the two groups of rats as
Ŷ0 = α + β2·Dose, (X.X)
Ŷ1 = (α + β1) + (β2 + β3)·Dose. (X.X)
Once again, the equation for the inexperienced rats is a simple regression with intercept α and
slope β2. The equation for the experienced rats also remains a simple regression. The intercept,
however, is now (α + β1) and the slope is now (β2 + β3). Hence, the interaction term allows the
slopes for the two groups to differ in addition to the intercepts. Furthermore, parameter β3, the
coefficient for the interaction term in Equation X.X, provides the test for differing slopes. When
β3 is 0, then the slopes for the two groups are the same; if we reject the hypothesis that β3 = 0,
then we have evidence for different slopes. When the two groups have different slopes, their
regression lines will not be parallel. (See Figure 1.18 and Figure 1.19 for, respectively, examples
of parallel and nonparallel regression slopes).
To summarize, main effects models allow intercepts to differ among groups but force their slopes to be equal. Interaction models permit different intercepts and also allow different slopes.⁴ Hence, a significant interaction rejects the null hypothesis that the slopes are parallel.
Although the example used groups, this principle extends to continuous independent variables.
For a continuous X1, the main effects model predicts that the slopes will be parallel for any set of
specific values of X1. The interaction model provides a test for parallel slopes.
Figure 1.16 provides output from PROC GLM in SAS for the main effects model. The main effects model fits the data well—R2 = .56, F(2, 57) = 36.09, p < .0001. Both of the regression coefficients are significant. For Experience, b = 2.51, t(57) = 4.64, p < .0001, and for Dose, b = 0.31, t(57) = 7.11, p < .0001. These findings agree with previous research, so there is evidence that the new compound has physiological activity like testosterone.
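The two models discussed here might be specified in PROC GLM roughly as follows; the data set name RATSEX is hypothetical, and the variable names follow the output in Figure 1.16. With no CLASS statement, PROC GLM treats the variables as numeric and forms the product term directly from the MODEL statement.

PROC GLM DATA=ratsex;                  /* main effects model */
   MODEL response = experience dose;
RUN;
QUIT;

PROC GLM DATA=ratsex;                  /* interaction model */
   MODEL response = experience dose experience*dose;
RUN;
QUIT;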
⁴ The terms homogeneity of slopes and heterogeneity of slopes are sometimes used to refer to, respectively, parallel and non-parallel slopes.
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 316.2206 158.1103 36.09 <.0001
Error 57 249.7392 4.3814
Corrected Total 59 565.9598
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 11.32785714 0.52578024 21.54 <.0001
experience 2.51000000 0.54045600 4.64 <.0001
dose 0.30825714 0.04333288 7.11 <.0001
Results from fitting the interactive model are given in Figure 1.17. This interaction
model is also significant, R2 = .59, F(3,56) = 27.19, p < .0001. The appropriate statistical test for
the interaction is for the coefficient for Experience*Dose. The coefficient is b = .182 and we can
reject the null hypothesis that it is a random draw from a sampling distribution with a mean of
0 (t(56) = 2.17, p = .034). Hence, we can reject the hypothesis that the regression lines are
parallel for the rats with and without prior sexual experience.
Figure 1.17 Regression results for the interaction model.
Dependent Variable: Response Sexual Activity Index
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 335.5551 111.8517 27.19 <.0001
Error 56 230.4048 4.1144
Corrected Total 59 565.9598
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 12.08642857 0.61810091 19.55 <.0001
experience 0.99285714 0.87412669 1.14 0.2609
dose 0.21722857 0.05938521 3.66 0.0006
experience*dose 0.18205714 0.08398338 2.17 0.0344
Note that the coefficient for Experience in Figure 1.17 is not significant (b = .99, t(56) =
1.14, p = .26). This does not imply that Experience plays no role in the recovery of sexual
function. Why? Because the coefficients in interactive models do not have the same
interpretation as the coefficients in main effects models. To examine the meaning of a
coefficient in an interactive model, one must substitute numeric values into the regression
equation—as we did above—and then examine their meaning. In Equation X.X, the coefficient
for Experience is β1, and from Equations X.X and X.X, we see that β1 tests whether the
intercepts for the inexperienced and experienced rats differ. Consequently, the lack of
significance for Experience implies that the two groups have the same intercept. Substantively,
this means that in the absence of the drug (i.e., Dose = 0) there is no difference in sexual activity
for rats with and without prior sexual experience. Hence, the difference in means for the two
Vehicle groups in Figure 1.15 is not statistically significant.
By substituting the numeric values of the regression coefficients into Equations X.X, we
can examine the interaction between Experience and Dose:
Ŷ0 = 12.09 + .22·Dose, (X.X)
Ŷ1 = 13.08 + .40·Dose. (X.X)
For naïve rats, a 1mg increase in the drug will increase sexual activity by .22 units. In
experienced rats, however, there will be almost a two-fold increase (.40 units). Hence,
experienced rats are more sensitive to the drug than inexperienced rats.
Table 1.1 Layout of a two by two factorial design.

                          Factor 2:
Factor 1:          Control          Treatment
Control
Treatment
In concrete terms, let us assume that the mean of the group receiving both Control
treatments is 10.3. Suppose that Treatment 1 increases the response by 4.6 units. Hence, the
mean for those observations who are Controls for Factor 2 but Treatments for Factor 1 will be
10.3 + 4.6 = 14.9. Let us further assume that Treatment 2 increases the response by 2.4 units.
Then the mean for the observations who are Controls for Factor 1 but Treatments for Factor 2
will be 10.3 + 2.4 = 12.7.
If the Treatments are additive, then the predicted value for those Observations receiving
both Treatments will be the base rate (10.3) plus the effect of Treatment 1 (4.6) plus the effect of
Treatment 2 (2.4) or 10.3 + 4.6 + 2.4 = 17.3. Hence, the additive model predicts the
Treatment/Treatment cell in Table 1.1. If the mean of this cell differs significantly from this
predicted value, then we have evidence that the additive model is false. Thus, the test of an
additive model (i.e., Main Effects model) versus a non-additive model (i.e., Interactive Model)
consists in how well the mean of the Treatment/Treatment cell is predicted by the Main Effects
of Treatment 1 and Treatment 2.
To place this design into a regression analysis, we code two independent variables: X1 is
the independent variable for Factor 1 and it is dummy coded as 0 = Control and 1 =
Experimental; X2 is the independent variable for the second Factor and it is similarly coded as 0
= Control and 1 = Experimental. The Main Effects model (i.e., the additive model) is
Ŷ = α + β1X1 + β2X2,
and the interactive model is
Ŷ = α + β1X1 + β2X2 + β3X1X2.
Clearly, the main effects model is a subset of the interactive model that stipulates that β3 = 0.
We can now substitute the dummy codes for X1 and X2 to obtain the predicted values for the
dependent variable in any of the four cells in this design. For example, the predicted value for a
Control for Factor 1 (i.e., X1 = 0) and a Treatment for X2 is
Ŷ = α + β1(0) + β2(1) + β3(0)(1) = α + β2.
Filling all the predicted values into the empty cells of Table 1.1 gives the algebraic expressions
in Table 1.2.
Table 1.2 Predicted values of the dependent variable in a two by two factorial design with
interaction.
                          Factor 2:
Factor 1:          Control          Treatment
Control            α                α + β2
Treatment          α + β1           α + β1 + β2 + β3
Here, the intercept in the regression model (i.e., α) is the predicted value for the two
Control conditions. Parameter β1 gives the effect of Treatment for Factor 1, and parameter β2
gives the effect of Treatment for Factor 2. If the effect of both Treatments is additive (i.e., the
Main Effects model), then the predicted value for those observations that have received both Treatments is
Ŷ = α + β1 + β2.
If, on the other hand, the effects of both Treatments are non-additive (or interactive) then
the predicted value will differ from the Main Effects model. Hence, the predicted value of this
cell becomes
Ŷ = α + β1 + β2 + β3.
Consequently, a test that β3 = 0 effectively tests whether the relationship between Treatment 1
and Treatment 2 is additive (β3 is not significant) or non-additive (β3 is significant). If the test
that β3 = 0 is not significant, then we prefer the Main Effects model. Otherwise, we favor the
Interactive Model.
Consider first the additive (Main Effects) model and the slope of the regression of Y on X1 at any fixed value of X2. That is, the predicted value of Y given any specific value of X2—a quantity that we denote as (Ŷ | X2)—will be
(Ŷ | X2) = (α + β2X2) + β1X1. (X.X)
In this equation, the intercept is the quantity (α + β2X2) which will indeed depend on X2, but the
slope of the regression line is always β1 which does not depend on X2. To illustrate this, let α =
.7, β1 = .4, and β2 = 1.3. Now fix X2 at any three values that you wish, substitute each of these
into Equation X.X, and draw the three regression lines. No matter what three values you select,
the regression lines will be parallel as illustrated in Figure 1.18.
Figure 1.18 Regression lines from a main effects model.
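The same point can be verified numerically. The following DATA step is a sketch that uses the α, β1, and β2 values above; the three fixed values of X2 and the plotting choices are arbitrary. Every generated line has slope .4, so the plotted lines are parallel.

DATA parallel;
   alpha = 0.7;  b1 = 0.4;  b2 = 1.3;      /* parameter values from the text */
   DO x2 = 0, 1, 2;                        /* any three fixed values of X2   */
      DO x1 = 0 TO 10;
         yhat = (alpha + b2*x2) + b1*x1;   /* intercept shifts with X2; slope stays b1 */
         OUTPUT;
      END;
   END;
RUN;

SYMBOL1 INTERPOL=join;  SYMBOL2 INTERPOL=join;  SYMBOL3 INTERPOL=join;
PROC GPLOT DATA=parallel;
   PLOT yhat*x1 = x2;                      /* one line for each value of X2 */
RUN;
QUIT;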
In the interaction model, by contrast, the slope of the regression line of Y on X1 depends on the value of X2. To illustrate, keep α, β1, and β2 as before but let β3 =
.8. Then, for the same three values of X2 used to compute the lines in Figure 1.18, we have the
regression lines in Figure 1.19.
Figure 1.19 Regression lines from an interaction model.
In the example, we fixed the value(s) of X2 and then examined the slope of the regression
line of Y on X1. The same principles, however, hold if we were to fix values of X1 and examine
the slopes of the regression lines of Y on X2. For example, according to the additive, Main
Effects model, the equation is
Ŷ = α + β1X1 + β2X2 = (α + β1X1) + β2X2.
Again, the intercept depends on the value of X1, but the slope does not.
The non-additive or Interactive Model is
Ŷ = α + β1X1 + β2X2 + β3X1X2 = (α + β1X1) + (β2 + β3X1)X2.
Here, both the intercept and the slope depend on the value of X1.
As in the case of the two by two factorial, the procedure for dealing with interactions
involving continuous variables is to fit two different regression models. The first of these
contains the terms without an interaction. That is,
Ŷ = α + β1X1 + β2X2.
The second model contains the interaction:
Ŷ = α + β1X1 + β2X2 + β3X1X2.
⁵ The operative phrase here is “can induce multicollinearity.” Interactions do not always induce multicollinearity. Rather than learn all the conditions under which interactions do and do not result in multicollinearity, we urge the reader to use the algorithm outlined in the text—statistically test for interactions and, if the test is not significant, eliminate the interaction term and rerun the model. This simple algorithm will always work.
(1) Fit the most plausible general model first. The most plausible general model should
not necessarily be the one with the highest possible interaction term, but the model with
the highest plausible interaction term. In fitting the general model, always include all
lower-order interactions, and always include all of the variables that form the
interaction. For example, if the highest plausible interaction is a three-way interaction,
then the model should include all of the two-way interactions as well as the three
variables that compose the interaction.
(2) If the highest plausible interaction term is significant, then accept that model. (One
may, however, wish to test whether other terms in the model can be eliminated). If the
interaction is not significant, then remove it from the model, consider the reduced model
as the next plausible general model, and proceed to step one.
NOTE: In some cases, this algorithm may result in evaluating a model with several
interactions of the same order. For example, suppose that the initial general model
contained a non-significant three-way interaction. Then, the next model will have three
two-way interactions as the next highest interactive terms. What follows next is an
exercise in logic. If all three of these interactions are not significant, then we know that
we can eliminate any one of them without a significant loss of fit. We do not know,
however, whether or not we can eliminate any two of them (or even all three of them)
without a significant loss of fit. In this case, it is safest to test the fit by comparing the
model with all three interactions to the model with none of the two-way interactions. If
there is no significant difference between the two, then all of the two-way interactions
can be safely eliminated. If there is a significant loss of fit, then we know that at least
two of the interactions are needed. Here, the next step would be to compare each of the three models that have one and only one two-way interaction to the model with all three two-
way interactions.
At the end of the day, one should be left with a parsimonious model that still
satisfactorily predicts the dependent variable. Inferences about the predictor variables
should be based on the significance of their coefficients in this reduced model.
There will be three two-way interactions among the three treatments—X1 and X2, X1 and
X3, and X2 and X3. These may be calculated as the products of the two variables involved in the
interaction. For instance, X1X2 = X1 * X2.
Finally, there will be the three-way interaction. Again, this will be the product of all three variables: X1X2X3 = X1 * X2 * X3. Hence, the general model is
Ŷ = α + β1X1 + β2X2 + β3X3 + β4X1X2 + β5X1X3 + β6X2X3 + β7X1X2X3.
The logic is to fit this general model first and then test for the significance of the three-way
interaction. If it is significant, it will be kept in all subsequent models. If it is not significant,
then it will be dropped from the model. Next, the two-way interactions will be evaluated. The
ultimate goal is to arrive at a parsimonious model that explains the data well without sacrificing
predictability.
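In SAS, the products are again computed in a DATA step and the general model is then fitted with PROC REG; each reduced model simply drops terms from the MODEL statement. The data set names here are hypothetical.

DATA design3b;
   SET design3;                      /* hypothetical input data set */
   x1x2 = x1*x2;                     /* two-way interactions        */
   x1x3 = x1*x3;
   x2x3 = x2*x3;
   x1x2x3 = x1*x2*x3;                /* three-way interaction       */
RUN;

PROC REG DATA=design3b;
   MODEL y = x1 x2 x3 x1x2 x1x3 x2x3 x1x2x3;   /* the general model */
RUN;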
The output from the general regression model is given in Figure 1.21. The coefficient for
the three-way interaction is .7625 and it is not close to being significant (t = 0.54, p = .59).
Hence, the next step is to create a reduced model without the three-way interaction. This will
serve as a baseline model for testing the two-way interactions. Before leaving Figure 1.21, note
that there is only one significant regression coefficient, the one for X3. The overall R2 for this
model is very high (R2 = .65, not shown in Figure 1.21). The lack of significance for the
coefficients coupled with the high R2 strongly suggests multicollinearity.
Figure 1.21 Interactions: Regression coefficients for the general model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
General Model
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
The coefficients for the first reduced model are presented in Figure 1.22. The coefficient
for the interaction term X1X2 is significant. Given that the effect of multicollinearity is usually to
make it difficult to detect significance when it is indeed present, this suggests that the interaction
between X1 and X2 is meaningful. But what of the other two-way interactions?
Figure 1.22 Interactions: Regression coefficients from the first reduced model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
Reduced Model 1
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
It is clear that we can delete X1X3 from the model (t = 1.25, p = .22) or we can delete
X2X3 from the model (t = 1.71, p = .09). So the major question is whether we can delete both of
them without significantly worsening the fit of the model. (Recall that the lack of significance of
X1X3 and X2X3 does not necessarily imply that both can safely be removed from the model—see
Section 1.3.1.3).
All good statistical packages have provisions that allow one to delete more than one IV
from the model and then assess the fit. In PROC REG in SAS, that provision is given by the
TEST statement. The output from a test for setting the coefficient for X1X3 to 0 and
simultaneously setting the coefficient for X2X3 to 0 is given in Figure 1.23. Here the test statistic
is an F ratio and its associated p value (.12) is not significant. Hence, both of these coefficients
can be set to 0 without a significant loss of fit.
Figure 1.23 Interactions: Test of the significance of both interaction terms X1X3 and X2X3.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
Reduced Model 1
We now run a second reduced model, this time eliminating both of the two-way
interactions. The results from this reduced model are given in Figure 1.24. Note that all of the
coefficients are now significant. Hence, we cannot reduce this model any further. This second
reduced model becomes our final model.
Figure 1.24 Interactions: Regression coefficients from the second reduced model.
QMIN: Regression
Model Comparisons: Interactions in an experimental design
Reduced Model 2
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Note the difference in the significance of the main effects in the general model (Figure
1.21) versus the final model (Figure 1.24). In the final model, all three main effects were
significant. In the general model, only X3 was significant. Note that the interaction X1X2 was
also significant in the final, but not the general, model. The reason for this is multicollinearity.
The three main effects have a correlation of .38 with the three-way interaction, and each two-
way interaction correlates .65 with X1X2X3. Also, in the general model, the tolerance for X1X2X3
is quite low (.14). When the three-way interaction is dropped and the model is reduced to its
final form, then the maximum tolerance is .33—still small, but given that each coefficient is
significant, there is no way to reduce the model without sacrificing predictability.
If we had interpreted the coefficients from the general model, we would have erroneously concluded that only Treatment 3 had an effect on the response. In fact, Treatments 1 and 2, as well as their interaction, also influenced the response. Hence, we state a cardinal rule for analyzing interaction terms in regression, ANOVA, and the GLM.
Never, under any circumstances, interpret the significance of lower-order coefficients in a model that contains a nonsignificant higher-order interaction. Always remove the nonsignificant higher-order interaction from the model and rerun the GLM.
⁶ An “original independent variable” may be a variable originally recorded in the data set or some transform of that variable (e.g., a log or square root transform).
Mathematically, a straight line ranges from negative infinity to positive infinity. For any concrete data set, however, the straight line applies only to the range of values of the independent variable in the data set.
To illustrate, consider a linear regression of weight on height in humans. Fitting a
straight line through the data points might generate the mathematical prediction that a person
who is -1.7 meters in stature should weigh -322 kilograms. Mathematically, this prediction is
correct, but it is quite illogical from the common-sense view because no human can be -1.7 meters in height.
In a similar way, when we fit a polynomial model with a square term for an independent
variable, i.e.,
Ŷ = α + β1X + β2X²,
we fit a parabola to the data. Mathematically, that parabola takes the form of the curve in Figure 1.25a (or that curve flipped on its head), and that curve extends from negative infinity to positive infinity along the horizontal axis and from the minimum of the curve to positive (or negative)
infinity along the vertical axis. For any specific data set, however, the range of values on the
horizontal axis will be limited. Hence, the practical effect of fitting a model with the X2
independent variable to the data will be to take a “slice” from the parabola in Figure 1.25a (or a
“slice” from an inverted form of that parabola). Examples of those slices—i.e., the types of
curves that could be fitted to real data—are illustrated in panels (b) through (d) of Figure 1.25.
[Figure 1.25, panels (a) through (d): the full parabola and examples of the “slices” that might be fitted to real data.]
The first step in fitting polynomial regression is to formulate an a priori hypothesis about
the form of the curve. In experimental studies this hypothesis should guide the original design of
the study. For example, in the 5-HT study, we assume that 5-HT levels will increase, reach a
peak, and then decrease to baseline. Hence, we expect at least a quadratic relationship, but we
should also be prepared to fit a cubic or even a quartic to test whether the rate of increase in the
initial stage equals the rate of decrease in the latter stages.
In terms of fitting polynomial models, think of the term X2 as X * X or an interaction
between X and itself. Similarly, X3 may be viewed as the three-way interaction of variable X
with itself. According to this perspective, polynomial regression amounts to a series of
interactions involving one variable with itself.
Hence, the algorithm developed for interactions also applies to polynomial regression. In
concrete terms this algorithm is stated in Figure 1.28
(1) Fit the model with the highest plausible polynomial term.
(2) If the coefficient for the highest plausible polynomial term is significant, then accept that model (although one may wish to test the significance of lower-order coefficients in that model). If the coefficient for the highest plausible polynomial term is not significant, then drop that term from the model, consider the next highest polynomial term as the most plausible polynomial term, and go to (1).
In short, one starts with the largest plausible polynomial and then tests for the significance of the largest term and only the largest term. If that coefficient is significant, then accept the model (although one may subsequently test for the significance of other terms in the model). If the term is not significant, then eliminate it from the model, rerun the regression, and examine the significance of the next highest term. Continue this procedure to reduce the polynomial to the lowest significant order.
In the 5-HIAA data, we would start with a quartic and then test the quartic term. If the
quartic is significant, then we stop there and accept that model. If it is not significant, then we
delete the quartic term, fit a cubic polynomial, and examine the significance of the regression
coefficient for the cubic term. If it is significant, then we accept the cubic polynomial as the
final model. If it is not significant, then we delete the cubic term and fit a quadratic model. If
the quadratic term is significant, then we accept the quadratic. Otherwise, we drop the quadratic
term and fit the simple linear model. If the linear model is significant, then we accept that model.
Otherwise, we conclude that there is no change in 5-HIAA over time.
Figure 1.29 gives the output from PROC REG in SAS that fitted up to a quartic
polynomial to the 5-HT data. In this output, variable time2 is the square of time, time3 is the
cube of time, etc. Only that section of the output that pertains to the regression coefficients is
shown. For completeness, all regressions from the linear to the quartic are shown.
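The polynomial terms themselves are just powers of time computed in a DATA step. A sketch of the kind of syntax that could produce output like Figure 1.29 is given below; the input data set name SEROTONIN and the dependent variable name HIAA are assumptions made for illustration.

DATA poly;
   SET serotonin;            /* hypothetical input data set */
   time2 = time*time;        /* quadratic term */
   time3 = time2*time;       /* cubic term     */
   time4 = time3*time;       /* quartic term   */
RUN;

PROC REG DATA=poly;
   MODEL hiaa = time time2 time3 time4;   /* start with the quartic model */
RUN;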
We would begin with the quartic model given in the last part of the table. The coefficient
for the quartic is not significant (b = .0043, t = .40, df = 91, p = .69). Hence, we would eliminate
the quartic and fit a cubic. Recall here the importance of the algorithm. Our interest is only in
the quartic term. If we had examined the significance of all terms in the model, we might have erroneously concluded that there was no change in 5-HIAA over time.
The next step is to eliminate the quartic term and run the cubic model. The coefficient
for the cubic is significant (b = .0399, t = 2.14, df = 92, p = .03), so we would accept this model
as the most parsimonious polynomial that does not sacrifice predictability. The plot of the
predicted means in Figure 1.27 was derived from the parameters of the cubic model.
The inclusion of the linear and the quadratic regressions in Figure 1.29 suggests an
important lesson—be wary of starting with the lowest term and working up. Had we fitted the
linear model first, then we might have been tempted to say there was no significant change over
time because the linear term was not significant (b = -.133, t = -1.68, df = 94, p = .10). It is quite
true that this coefficient is not significant, but the only thing that this tells us is that the best
fitting straight line through the means in Figure 1.27 has a slope near 0. There may be nonlinear
effects of time on 5-HIAA.
Figure 1.30 presents the hierarchical (Type I SS) solution for fitting the quartic polynomial to the 5-HIAA data set referenced above.7 (Recall that variable “time1” is the linear term, “time2” is the quadratic, and so on.)
Figure 1.30 Hierarchical solution (Type I SS) for fitting a quartic polynomial model to the
5-HIAA data set.
The quartic term (time4) is not significant (F(1, 94) = 0.16, p = .69). The cubic term
(Time3), however, is significant (F(1, 94) = 4.55, p = .04). Hence, we select the cubic
polynomial as the best model. We would then run the cubic model and use the coefficients from
this model to obtain predicted values of 5-HIAA as a function of time. The results would be the
same as if we had followed the algorithm presented in Figure 1.28.
7 This solution was generated from PROC GLM in SAS.
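A minimal sketch of PROC GLM statements that could produce a hierarchical solution of this form is given below. The data set name (hiaa2) and the variable names are hypothetical; the SS1 option requests the Type I (sequential) sums of squares.

proc glm data=hiaa2;
   /* time1 = linear, time2 = quadratic, time3 = cubic, time4 = quartic */
   model hiaa = time1 time2 time3 time4 / ss1;
run;
quit;

Because the Type I sums of squares are sequential, each term is tested after all lower-order terms have been entered, which is the same ordering used by the step-down algorithm in Figure 1.28.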
The values of X at which a fitted polynomial reaches its maxima and minima are found by setting
the first derivative equal to zero and solving for the roots of polynomials. Good statistical
packages will always have a routine for finding such roots 8 . For a polynomial of
order k, the derivative equation will have (k – 1) roots. Select only those roots that are within (or very close to being
within) the range of values of X in the data set.
Table 1.3 Equations for finding the maxima and minima of some polynomial functions.

Order of Polynomial    Equation
2                      $\beta_1 + 2\beta_2 X = 0$
3                      $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 = 0$
4                      $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 + 4\beta_4 X^3 = 0$
5                      $\beta_1 + 2\beta_2 X + 3\beta_3 X^2 + 4\beta_4 X^3 + 5\beta_5 X^4 = 0$
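These equations are simply the first derivative of the fitted polynomial set equal to zero. As a brief worked illustration for the cubic case used above:

$$
\hat{Y} = \alpha + \beta_1 X + \beta_2 X^2 + \beta_3 X^3
\quad\Longrightarrow\quad
\frac{d\hat{Y}}{dX} = \beta_1 + 2\beta_2 X + 3\beta_3 X^2 = 0 .
$$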
As an example, substituting the regression coefficients for the serotonin time-course
example into the equation in Table 1.3 for a cubic polynomial gives
$3.46558 - 2(0.71051)X + 3(0.03994)X^2 = 0$,
$3.46558 - 1.42102X + 0.11982X^2 = 0$.
Note that it is recommended practice to use several decimal places in solving for the roots of
polynomials. Using the POLYROOT function in PROC IML of SAS, we find that the time unit
giving maximal response was 3.4. Minimal response was at time unit 8.4.
8 Sometimes the routine will be described as finding “zeros” of a polynomial.
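As a sketch of how the roots can be obtained, the POLYROOT function in SAS PROC IML takes the coefficients in order of descending powers of X and returns the real and imaginary parts of each root. The numbers below come from the quadratic equation given above; the name coef is arbitrary.

proc iml;
   /* coefficients of .11982*X**2 - 1.42102*X + 3.46558, highest power first */
   coef  = {0.11982 -1.42102 3.46558};
   roots = polyroot(coef);
   print roots;   /* real parts approximately 3.4 and 8.4; imaginary parts 0 */
quit;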
group variable. Finally, perform the polynomial regressions as outlined above in Sections
1.3.3.1 and 1.3.3.1.1.
Figure 1.31 Mean (+/- one standard error) relaxation scores of six categories of cigarette
use after a brief period of abstinence. Also shown are the best fitting linear, quadratic, and
cubic polynomials.
In analyzing these data, we used the hierarchical method specified in Section 1.3.3.1.1
and started with a quartic model. The polynomial variables were called Poly1 (the linear term)
through Poly4 (the quartic term). Output from the hierarchical SS from PROC GLM in SAS is
given in Figure 1.32.
Figure 1.32 Hierarchical solution (Type I SS) from fitting a quartic polynomial to the
cigarette-abstinence data.
The quartic model overfits the data because the quartic term is not significant. The first
three terms, however, are significant, so we settle on the cubic polynomial as the best model.
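For concreteness, a sketch of how the polynomial variables for the ordered groups might have been constructed is given below. The data set (smoke) and the variables (ciguse and relax) are hypothetical, with ciguse assumed to score the six categories 1 through 6 in the order shown in Figure 1.31.

data smoke2;
   set smoke;              /* hypothetical data set */
   poly1 = ciguse;         /* linear term: ordered categories scored 1-6 */
   poly2 = poly1*poly1;    /* quadratic term */
   poly3 = poly2*poly1;    /* cubic term */
   poly4 = poly3*poly1;    /* quartic term */
run;

The hierarchical (Type I SS) analysis would then proceed exactly as in the PROC GLM step shown earlier for the 5-HIAA data.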
As should be clear from this description, the mechanics of fitting polynomials to ordered
groups are the same as fitting polynomials where the independent variable is measured on an
interval or ratio scale. The difference lies in the interpretation of the results. With an interval or
ratio scale, one can interpolate between groups and calculate points of maximal and minimal
response. With ordered groups, however, such interpolations and calculations are valid only to the
extent that the underlying order of the groups approaches an interval scale.
The polynomial functions for ordered groups should be used to interpret differences in
the group means. Hence, the model can be used to make inferences about the groups, but the
nature of those inferences depends on the nature of the groups. Let us illustrate. Return to
Figure 1.31 and compare the predicted values for the linear model to those from the quadratic
model. The linear model predicts that increasing exposure to tobacco is associated with
decreased relaxation. The quadratic model—which, recall, fits better than the linear model—
reveals something more. The curve for the first three groups is flat. This suggests that those
who have never smoked, are ex-smokers, or occasionally smoke have similar levels of
relaxation. Once we get to daily smokers, however, the curve descends in a “dose-dependent”
fashion. This pattern could be interpreted as a difference between those currently addicted to
nicotine and those not currently addicted.
The cubic curve adds to prediction in two ways. First, it suggests a meaningful difference
among the first three groups. Those who have never smoked may differ in their tendency to relax
from those who have taken up smoking, either in the past (the abstinent group) or only
occasionally. This may have less to do with the addictive and physiological properties of nicotine
and more to do with the participants’ environments and personalities during the period of
maximal risk for sampling cigarettes. The statistical analysis cannot prove this, but it acts as a
good heuristic that can guide future research into this area.
The second difference between the cubic and the quadratic curve is the “dose-response”
portion of the curve for daily smokers. The quadratic curve predicts an almost linear decrease in
relaxation from the 4th group (daily smokers, less than 1 pack) to the 6th (daily smokers, more than 2
packs). The cubic curve, on the other hand, agrees with the observed means in suggesting that
the “dose-response” curve flattens after a certain point. Because the data consist of ordered
groups and not a firm quantitative estimate of dose, one should not make strong claims about
where the predictions asymptote. One could, however, use the form of the curve to guide the
design of further studies into this area.
Figure 1.33 Observed means and means predicted from the best fitting regression model
for dose-response curves from a control group and a group administered a drug agonist.
To answer this question, we write a model that includes both polynomial terms (in order
to model the dose-response relation) and interaction terms (in order to test whether the shape of
the dose-response curve differs between the Control and the Agonist groups). Let us begin by dummy
coding the group variable such that 0 = Control and 1 = Agonist. We will then fit a series of
regression models, starting with the most general model and then reducing it to a parsimonious
one that still predicts well. Finally, after we settle on the best model, we will substitute into the
regression equation a value of 0 for the Control group and 1 for the Agonist group to interpret the
meaning of the model’s parameters.
Figure 1.34 presents the statistics for overall fit and the parameter estimates from two
regression models. The first order of business is to ascertain the most parsimonious model that
explains the data without sacrificing explanatory power. Model 1 fits the main
effect for agonist, the linear and quadratic polynomial terms for dose, an interaction between agonist and the linear
term for dose (agonist*dose), and an interaction between agonist and the quadratic term for dose
(agonist*dose*dose).
Figure 1.34 Results of regressing dependent variable Response on independent variables
group (Control vs. Agonist) and dose of drug, including the polynomial effects of drug and
the interactions of group and dose of drug.
In the general model, the three-way interaction is not significant (t(174) = -1.26, p = .21).
Hence, this term is removed from the model and the regression is rerun.
In the reduced model, both the quadratic term for dose (dose*dose) and the interaction
between agonist and dose (agonist*dose) are significant. Hence, we accept this model.
(Note: One could also drop the variable agonist from the model and rerun it. We hold off doing
that in order to explain the meaning of the coefficient for agonist.)
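A minimal sketch of how these models might be set up in SAS follows. The data set (doseresp) and the variable names (response, group, dose) are hypothetical, and the model labels General and Reduced are arbitrary; the dummy code and product terms are built in a DATA step, and the general and reduced models are fitted with separate MODEL statements.

data doseresp2;
   set doseresp;                      /* hypothetical data set */
   agonist  = (group = 'Agonist');    /* dummy code: 0 = Control, 1 = Agonist */
   dosesq   = dose*dose;              /* quadratic term for dose */
   ag_dose  = agonist*dose;           /* agonist x linear dose interaction */
   ag_dose2 = agonist*dosesq;         /* agonist x quadratic dose interaction */
run;

proc reg data=doseresp2;
   General: model response = agonist dose dosesq ag_dose ag_dose2;
   Reduced: model response = agonist dose dosesq ag_dose;
run;

Comparing the two MODEL statements mirrors the model-reduction strategy described above: the three-way product (ag_dose2) is dropped when its coefficient is not significant.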
Having arrived at a satisfactory model, we must now perform some algebraic
manipulations to derive the meaning of the parameters. The general equation for the reduced model is
$\hat{Y} = \alpha + \beta_1 \mathrm{Agonist} + \beta_2 \mathrm{Dose} + \beta_3 \mathrm{Dose}^2 + \beta_4 \mathrm{Agonist} \cdot \mathrm{Dose}$.
Because we have coded Controls as 0 and Agonists as 1, we have the following two equations
for the Control and the Agonist groups:
$\hat{Y}_{\mathrm{Control}} = \alpha + \beta_2 \mathrm{Dose} + \beta_3 \mathrm{Dose}^2$, (X.X)
and
$\hat{Y}_{\mathrm{Agonist}} = (\alpha + \beta_1) + (\beta_2 + \beta_4) \mathrm{Dose} + \beta_3 \mathrm{Dose}^2$. (X.X)
1.5 Tables
Table 1.1 Schematic of a two by two factorial design............................................................... 1-28
Table 1.2 Predicted values of the dependent variable in a two by two factorial design with
interaction. ................................................................................................................................. 1-29
Table 1.3 Equations for finding the maxima and minima of some polynomial functions......... 1-44
1.6 Figures
Figure 1.1 Example of a scatter plot and the regression line (line of best fit). ........................... 1-2
Figure 1.2 Output from a simple regression predicting the quantity of receptors in human cortex
as a function of age. ..................................................................................................................... 1-4
Figure 1.3 Examples of nonlinear relationships. ......................................................................... 1-5
Figure 1.4 Example of residuals that are not normally distributed.............................................. 1-6
Figure 1.5 Example of normally distributed residuals................................................................. 1-7
Figure 1.6 Examples of equal variance of residuals (homoscedasticity) and unequal variance of
residuals (heteroscedasticity). ...................................................................................................... 1-7
Figure 1.7 Example of a scatter plot containing an outlier. ......................................................... 1-8
Figure 1.8 Results from a regression with an outlier. .................................................................. 1-9
Figure 1.9 The same scatterplot after removing the outlier. ........................................................ 1-9
Figure 1.10 Results of the regression with the outlier removed. ............................................... 1-10
Figure 1.11 Multiple regression of receptor number on age and cotinine. ................................ 1-12
Figure 1.12 Model Comparisons: A General Model with Four Predictors,............................... 1-20
Figure 1.13 Model Comparisons: Example of the TEST statement in SAS for a single predictor
variable....................................................................................................................................... 1-21
Figure 1.14 Model Comparisons: Example of the TEST statement in SAS for two predictor
variables. .................................................................................................................................... 1-23
Figure 1.15 Mean sexual activity (± 1 standard error) of rats with and without prior sexual
experience as a function of dose of a testosterone-like compound............................................ 1-25
Figure 1.16 Regression results for the main effect model. ........................................................ 1-27
Figure 1.17 Regression results for the interaction model. ......................................................... 1-27
Figure 1.18 Regression lines from a main effects model........................................................... 1-30
Figure 1.19 Regression lines from an interaction model. .......................................................... 1-31
Figure 1.20 Model comparison algorithm for interactions. ....................................................... 1-33
Figure 1.21 Interactions: Regression coefficients for the general model. ................................. 1-34
Figure 1.22 Interactions: Regression coefficients from the first reduced model....................... 1-35
Figure 1.23 Interactions: Test of the significance of both interaction terms X1X3 and X2X3........ 1-35
Figure 1.24 Interactions: Regression coefficients from the second reduced model. ................. 1-36
Figure 1.25 Examples of quadratic curves................................................................................. 1-38
Figure 1.26 Examples of cubic curves....................................................................................... 1-39
Figure 1.27 Assays of 5-HIAA in CSF as a function of time. ................................ 1-40
Figure 1.28 Algorithm for fitting polynomial regression models.............................................. 1-41
Figure 1.29 Output from regression of 5-HIAA on the polynomials of time. ........................... 1-42
Figure 1.30 Hierarchical solution (Type I SS) for fitting a quartic polynomial model to the 5-
HIAA data set. ........................................................................................................................... 1-43
Figure 1.31 Mean (+/- one standard error) relaxation scores of six categories of cigarette use after
a brief period of abstinence. Also shown are the best fitting linear, quadratic, and cubic
polynomials................................................................................................................................ 1-45
Figure 1.32 Hierarchical solution (Type I SS) from fitting a quartic polynomial to the cigarette-
abstinence data. .......................................................................................................................... 1-46
Figure 1.33 Observed means and means predicted from the best fitting regression model for
dose-response curves from a control group and a group administered a drug agonist. ............. 1-47
Figure 1.34 Results of regressing dependent variable Response on independent variables group
(Control vs. Agonist) and dose of drug, including the polynomial effects of drug and the
interactions of group and dose of drug. ..................................................................................... 1-48