
6: Regression and Multiple Regression

Objectives
Calculate regressions with one independent variable
Calculate regressions with multiple independent variables
Create a scatterplot of predicted and actual values
Calculate residuals and predicted values
Regression
Regression allows you to predict one variable based on another. In this
chapter we will focus on linear regression, that is, relationships that are linear (a line)
rather than curvilinear (a curve) in nature. Let's begin with the example used in the text
in which mental health symptoms are predicted from stress.
Open symptoms and stress.sav.
Select Analyze/Regression/Linear.

Select symptoms as the Dependent variable and stress as the Independent
variable. Then, click on Statistics to explore our options. The following dialog box
will appear.

As you can see, there are many options. We will focus only on information covered
in the textbook. Estimates and Model Fit are selected by default. Leave them that
way. Then select Descriptives and Part and partial correlations. SPSS will then
calculate the mean and standard deviation for each variable in the equation and the
correlation between the two variables. Then, click Continue.

At the main dialog box, click on Plots so we can see our options.
It looks like we can create scatterplots here. Click Help to see what the abbreviations
represent. I'd like to plot the Dependent variable against the predicted values to see
how close they are. Select Dependnt for Y and Adjpred for X. Adjpred is the adjusted
prediction. Use Help/Topics/Index to find out what this means for yourself. Then, click
Continue.

In the main dialog box, click Save, and the dialog box to the left will appear. For
Predicted Values, select Unstandardized and Standardized. For Residuals, also select
Unstandardized and Standardized. Now, SPSS will save the predicted values of
symptoms based on the regression equation, and the residual or difference between the
predicted values and actual values of symptoms, in the data file. This is a nice
feature. Remember, the standardized values are based on z-score transformations of the
data whereas the unstandardized values are based on the raw data. Click Continue.

Finally, click on Options.

Including a constant in the equation is selected by default. This simply means
that you want both a slope and an intercept (the constant). That's good. We
will always leave this checked. Excluding cases listwise is also fine. We do not have
any missing cases in this example anyway. Click Continue, and then Ok in
the main dialog box. The output follows.
Descriptive Statistics

            Mean   Std. Deviation     N
SYMPTOMS   90.70            20.27   107
STRESS     21.47            13.10   107

Correlations

                                SYMPTOMS   STRESS
Pearson Correlation  SYMPTOMS      1.000     .506
                     STRESS         .506    1.000
Sig. (1-tailed)      SYMPTOMS          .     .000
                     STRESS         .000        .
N                    SYMPTOMS        107      107
                     STRESS          107      107

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       STRESS(a)           .                   Enter

a. All requested variables entered.
b. Dependent Variable: SYMPTOMS

Model Summary(b)

Model        R   R Square   Adjusted R Square   Std. Error of the Estimate
1      .506(a)       .256                .249                        17.56

a. Predictors: (Constant), STRESS
b. Dependent Variable: SYMPTOMS
ANOVA(b)

Model 1        Sum of Squares    df   Mean Square        F      Sig.
  Regression        11148.382     1     11148.382   36.145   .000(a)
  Residual          32386.048   105       308.439
  Total             43534.430   106

a. Predictors: (Constant), STRESS
b. Dependent Variable: SYMPTOMS

Coefficients(a)

                 Unstandardized    Standardized
                  Coefficients     Coefficients                        Correlations
Model 1            B   Std. Error      Beta          t     Sig.   Zero-order   Partial   Part
  (Constant)  73.890        3.271                22.587    .000
  STRESS        .783         .130      .506       6.012    .000         .506      .506   .506

a. Dependent Variable: SYMPTOMS

Charts

[Scatterplot of SYMPTOMS against the Regression Adjusted (Press) Predicted Value, as requested in the Plots dialog, appears here.]

How does our output compare to the output presented in the textbook? Take a moment
to identify all of the key pieces of information: find r², find the ANOVA used to test
the significance of the model, and find the regression coefficients used to calculate the
regression equation. One difference is that the text did not include the scatterplot.
What do you think of the scatterplot? Does it help you see that predicting symptoms
based on stress gives a pretty good estimate? You could add a line of best fit to the
scatterplot using what you learned in Chapter 5.
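By the way, every choice we made in the dialogs corresponds to a piece of the REGRESSION command that SPSS runs behind the scenes (you can see it for yourself by clicking Paste instead of Ok in the main dialog box). A rough sketch of the syntax for this analysis is shown below, assuming the variables are named symptoms and stress as in this file, and leaving out the Plots and Options requests; the exact subcommands your version of SPSS pastes may differ.

    * Approximate syntax for the simple regression set up through the dialogs.
    REGRESSION
      /DESCRIPTIVES MEAN STDDEV CORR SIG N
      /STATISTICS COEFF R ANOVA ZPP
      /DEPENDENT symptoms
      /METHOD=ENTER stress
      /SAVE PRED ZPRED RESID ZRESID.

Pasting syntax like this into a syntax window is also a convenient way to keep a record of exactly what you ran.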
Now, click Window/Symptoms and stress.sav and look at the new data (residuals and
predicted values) in your file. A small sample is below. Note how they are named
and labeled.

Let's use what we know about the regression equation to check the accuracy of the
scores created by SPSS. We will focus on the unstandardized predicted and residual
values. This is also a great opportunity to learn how to use the Transform menus to
perform calculations based on existing data.
We know from the regression equation that:
Predicted Symptoms, or Ŷ = 73.890 + .783*Stress
We also know that the residual can be computed as follows:
Residual = Y - Ŷ, or Symptoms - Predicted Symptoms.
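To make this concrete, consider a hypothetical participant with a stress score of 30. That person's predicted symptom score would be 73.890 + .783(30) = 97.38, and if his or her actual symptom score were 90, the residual would be 90 - 97.38 = -7.38.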
We'll use the Transform menu to calculate these values ourselves and then compare them
to the values SPSS saved for us.
In the Data Editor window, select Transform/Compute. Name your Target Variable
sympred and Label it symptoms predicted. Put the formula 73.890 + .783*stress in the
Numeric Expression box. Then, click Ok.

Check the Data Editor to see if your new variable is there, and compare it to pre_1.
Are they the same? The only difference I see is that our variable is only expressed to 2
decimal places. But, the values agree.
Follow similar steps to calculate the residual. Click on Transform/Compute. Name
your Target Variable sympres and Label it symptoms residual. Put the formula
symptoms-sympred in the Numeric Expression box by double clicking the two preexisting variables and typing a minus sign between them. Then, click Ok.
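If you would rather type than point and click, the same two calculations can be written as COMPUTE commands in a syntax window. Here is a minimal sketch; the names sympred and sympres are simply the Target Variable names chosen above:

    * Predicted symptoms from the regression equation.
    COMPUTE sympred = 73.890 + 0.783 * stress.
    * Residual = actual symptoms minus predicted symptoms.
    COMPUTE sympres = symptoms - sympred.
    EXECUTE.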
Compare these values to res_1. Again they agree. A portion of the new data file is
below.

Now that you are confident that the predicted and residual values computed by
SPSS are exactly what you intended, you won't ever need to calculate them yourself
again. You can simply rely on the values computed by SPSS through the Save command.
Multiple Regression
Now, let's move on to multiple regression. We will predict the dependent
variable from multiple independent variables. This time we will use the course
evaluation data to predict the overall rating of lectures based on ratings of teaching skills,
the instructor's knowledge of the material, and expected grade.
Open course evaluation.sav. You may want to save symptoms and stress.sav first to
keep the residuals. That's up to you.
Select Analyze/Regression/Linear.
Select overall as the Dependent variable, and teach, knowledge, and grade as the
Independents. Since there are multiple independent variables, we need to think about
the Method of entry. As noted in the text, stepwise procedures are seductive but can be
misleading, so we want to select Enter, meaning all of the predictors will be entered
simultaneously.

Click Statistics and select Descriptives and Part and partial correlations. Click
Continue.
Click Plots and select Dependnt as Y and Adjpred as X. Click Continue.
Click Save and select the Residuals and Predicted values of your choice. Click
Continue.
Click Ok at the main dialog box. The output follows.

Descriptive Statistics

            Mean   Std. Deviation    N
OVERALL     3.55              .61   50
TEACH       3.66              .53   50
KNOWLEDG    4.18              .41   50
GRADE       3.49              .35   50

Correlations

                               OVERALL   TEACH   KNOWLEDG   GRADE
Pearson Correlation  OVERALL     1.000    .804       .682    .301
                     TEACH        .804   1.000       .526    .469
                     KNOWLEDG     .682    .526      1.000    .224
                     GRADE        .301    .469       .224   1.000
Sig. (1-tailed)      OVERALL         .    .000       .000    .017
                     TEACH        .000       .       .000    .000
                     KNOWLEDG     .000    .000          .    .059
                     GRADE        .017    .000       .059       .
N                    OVERALL        50      50         50      50
                     TEACH          50      50         50      50
                     KNOWLEDG       50      50         50      50
                     GRADE          50      50         50      50

Variables Entered/Removed(b)

Model   Variables Entered           Variables Removed   Method
1       GRADE, KNOWLEDG, TEACH(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: OVERALL

Model Summary(b)

Model        R   R Square   Adjusted R Square   Std. Error of the Estimate
1      .863(a)       .745                .728                          .32

a. Predictors: (Constant), GRADE, KNOWLEDG, TEACH
b. Dependent Variable: OVERALL

ANOVA(b)

Model 1        Sum of Squares   df   Mean Square        F      Sig.
  Regression           13.737    3         4.579   44.741   .000(a)
  Residual              4.708   46          .102
  Total                18.445   49

a. Predictors: (Constant), GRADE, KNOWLEDG, TEACH
b. Dependent Variable: OVERALL

Coefficients(a)

                 Unstandardized    Standardized
                  Coefficients     Coefficients
Model 1            B   Std. Error      Beta           t    Sig.
  (Constant)   -.927         .596                -1.556    .127
  TEACH         .759         .112      .658       6.804    .000
  KNOWLEDG      .534         .132      .355       4.052    .000
  GRADE        -.153         .147     -.088      -1.037    .305

a. Dependent Variable: OVERALL

Charts

[Scatterplot — Dependent Variable: OVERALL. OVERALL (Y axis, 2.0 to 5.0) plotted against the Regression Adjusted (Press) Predicted Value (X axis, 2.0 to 5.0).]

Compare this output to the results in the text. Notice that the values are the same, but the
styles are different, since the output in the book (an earlier edition) is from Minitab, a
different data analysis program.
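From the Coefficients table, the unstandardized regression equation is Predicted Overall = -.927 + .759(TEACH) + .534(KNOWLEDG) - .153(GRADE). For example, a hypothetical course rated 4.0 on teaching skills, 4.5 on knowledge, and 3.5 on expected grade would have a predicted overall rating of -.927 + .759(4.0) + .534(4.5) - .153(3.5), or about 3.98. And as before, the whole analysis can be run as a single REGRESSION command; here is a rough sketch, using the variable names as they appear in the output (your file may spell knowledg slightly differently) and again omitting the Plots request:

    * Approximate syntax for the multiple regression set up through the dialogs.
    REGRESSION
      /DESCRIPTIVES MEAN STDDEV CORR SIG N
      /STATISTICS COEFF R ANOVA ZPP
      /DEPENDENT overall
      /METHOD=ENTER teach knowledg grade
      /SAVE PRED RESID.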
Exit SPSS. It's up to you to decide whether you want to save the changes to the data file
and the output file.
In this chapter, you have learned to use SPSS to calculate simple and multiple
regressions. You have also learned how to use the built-in menus to calculate descriptives,
residuals, and predicted values, and to create various scatterplots. As you can see, SPSS
has really simplified the process. Complete the following exercises to increase your
comfort and familiarity with all of the options.
Exercises
1. Using the data in course evaluation.sav, predict overall quality from expected grade.
2. To increase your comfort with Transform, calculate the predicted overall score
based on the regression equation from the previous exercise. Then calculate the
residual. Did you encounter any problems?
3. Using data in HeightWeight.sav, predict weight from height and gender. Compare
your results to the output in Table 11.6 of the textbook.
4. Using the data in cancer patients.sav, predict distress at time 2 from distress at
time 1, blame person, and blame behavior. Compare your output to the results
presented in Table 11.7 in the textbook.
