Simple Regression With SPSS
Suppose the management team of a retail chain wants to develop a strategy for forecasting annual sales. The following data have been gathered from a random sample of existing stores:

Store   Square Footage   Annual Sales ($)
1       1726             3681
2       1642             3895
3       2816             6653
4       5555             9543
5       1292             3418
6       2208             5563
7       1313             3660
8       1102             2694
9       3151             5468
10      1516             2898
11      5161             10674
12      4567             7585
13      5841             11760
14      3008             4085
We can enter the data into SPSS by typing it directly into the Data Editor, or by cutting and pasting:
Next, by clicking on Variable View, we can apply variable and value labels where appropriate:
Assuming, for now, that if a relationship exists between the two variables, it is linear in nature, we can generate a simple Scatterplot (or Scatter Diagram) for the data. This is accomplished with the command sequence:
[Scatterplot of Annual Sales ($) against Square Footage of Store]
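For readers who want to reproduce the steps outside SPSS, here is a minimal Python sketch; pandas and matplotlib are my own choices, not part of the SPSS exercise:

    import pandas as pd
    import matplotlib.pyplot as plt

    # The sampled store data from the table above.
    sqft  = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
             3151, 1516, 5161, 4567, 5841, 3008]
    sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
             5468, 2898, 10674, 7585, 11760, 4085]
    df = pd.DataFrame({"sqft": sqft, "sales": sales})

    # Simple scatterplot, analogous to the SPSS Scatter Diagram.
    ax = df.plot.scatter(x="sqft", y="sales")
    ax.set_xlabel("Square Footage of Store")
    ax.set_ylabel("Annual Sales ($)")
    plt.show()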
We can generate a simple straight-line equation from the output produced when using the Enter method in regression (Analyze → Regression → Linear), which yields:
Variables Entered/Removed(b)

Model   Variables Entered            Variables Removed   Method
1       Square Footage of Store(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Sales Revenue of Store
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .954a   .910       .902                936.8500

a. Predictors: (Constant), Square Footage of Store
ANOVA(b)

Model          Sum of Squares   df   F         Sig.
1  Regression  (SSR)            1    121.009   .000a
   Residual    (SSE)            12
   Total       (SST)            13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
Coefficients(a)

Model                               B          t        95% Confidence Interval for B
                                                        Lower Bound      Upper Bound
1  (Constant)               (b0)    901.247    1.757    -216.534         2019.027
   Square Footage of Store  (b1)    1.686      11.000   1.352            2.020

a. Dependent Variable: Sales Revenue of Store
So then

Yi = 901.247 + 1.686Xi

(noting that no direct interpretation of the Y intercept at 0 square feet is possible; the intercept represents the portion of annual sales that varies due to factors other than store size)
and where

SST (total sum of squares) = SSR + SSE = the sum of the squared differences between each observed value of Y and Y-Bar
SSR (regression sum of squares) = the sum of the squared differences between each predicted value of Y and Y-Bar
SSE (error sum of squares) = the sum of the squared differences between each observed value of Y and its predicted value

Coefficient of Determination = SSR/SST = 0.91 (sample)
Standard Error of the Estimate = SYX = SQRT[ SSE / (n - 2) ] = 936.85
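As a cross-check on these formulas, a short sketch using numpy and statsmodels (an assumption on my part; df is the data frame built in the earlier sketch, and the model object is reused in the sketches that follow):

    import numpy as np
    import statsmodels.api as sm

    X = sm.add_constant(df["sqft"])       # adds the intercept column (b0)
    model = sm.OLS(df["sales"], X).fit()  # ordinary least squares on all predictors at once

    b0, b1 = model.params                 # approx. 901.247 and 1.686 per the SPSS output
    y_hat = model.fittedvalues
    y_bar = df["sales"].mean()

    sst = ((df["sales"] - y_bar) ** 2).sum()  # total sum of squares
    ssr = ((y_hat - y_bar) ** 2).sum()        # regression sum of squares
    sse = ((df["sales"] - y_hat) ** 2).sum()  # error sum of squares

    r_sq = ssr / sst                          # approx. 0.910, matching R Square
    s_yx = np.sqrt(sse / (len(df) - 2))       # approx. 936.85, the standard error of the estimate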
Before relying on this model, we should check the standard regression assumptions.

1. Normality - the assumption that the errors around the regression line are normally distributed for each value of X. SPSS can produce a Normal P-P plot and a histogram of the standardized residuals for this purpose:

[Normal P-P Plot of Regression Standardized Residual - Dependent Variable: Sales Revenue of Store]

[Histogram of the standardized residuals: Mean = 0.00, Std. Dev = .96, N = 14]
Of course, the assessment of normality by visually scanning the data leaves some statisticians unsettled, so I usually add an appropriate test of normality conducted on the data:

Variable        n    A-D     p-value
Stand. Resid.   14   0.348   0.503
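The same kind of check can be run outside SPSS with scipy (again my own choice). Note that scipy.stats.anderson reports the statistic against critical values rather than a p-value, and the statistic is the same for raw or standardized residuals, since the mean and standard deviation are estimated from the data:

    from scipy.stats import anderson

    # Anderson-Darling normality test on the regression residuals.
    result = anderson(model.resid, dist="norm")
    print(result.statistic)           # the A-D statistic
    print(result.critical_values)     # compare against these critical values
    print(result.significance_level)  # the corresponding significance levels (%)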
2. Homoscedasticity - the assumption that the variability of the data around the regression line is constant for all values of X. In other words, the error must be independent of X. Generally, this assumption may be tested by plotting the X values against the raw residuals for Y. In SPSS, this is done by requesting a scatterplot of the saved variables:
[Scatterplot of the Unstandardized Residuals against Square Footage of Store]
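A comparable plot can be sketched with matplotlib, reusing the model object fitted earlier:

    import matplotlib.pyplot as plt

    # Plot X against the unstandardized (raw) residuals; under homoscedasticity
    # the vertical spread should look roughly constant across the range of X.
    plt.scatter(df["sqft"], model.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Square Footage of Store")
    plt.ylabel("Unstandardized Residual")
    plt.show()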
Other authors, including those who wrote the SPSS routine, choose to plot the X values against the Studentized Residuals (Standardized Residuals Adjusted for their distance from the average X value) rather than the Unstandardized (raw) Residuals. SPSS will generate this plot automatically (select this under the Plots panel):
[Scatterplot of the Studentized Residuals against Square Footage of Store]
Correlations

                                           Square Footage   Unstandardized   Studentized
                                           of Store         Residual         Residual
Square Footage    Pearson Correlation      1.000            .000             .015
of Store          Sig. (2-tailed)          .                1.000            .959
                  N                        14               14               14
Unstandardized    Pearson Correlation      .000             1.000            .999**
Residual          Sig. (2-tailed)          1.000            .                .000
                  N                        14               14               14
Studentized       Pearson Correlation      .015             .999**           1.000
Residual          Sig. (2-tailed)          .959             .000             .
                  N                        14               14               14

**. Correlation is significant at the 0.01 level (2-tailed).
It should be noted that the distribution of the data also suggests that an assumption of linearity is reasonable at this point.

3. Independence of the Errors - the assumption that no autocorrelation is present. This is generally evaluated by plotting the residuals in the order or sequence in which the original data were collected. This approach, when meaningful, uses the Durbin-Watson statistic and the associated tables of critical values. SPSS can generate this value when requested as part of the Model Summary:
Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .954a   .910       .902                936.8500                     2.446

Change Statistics: R Square Change = .910, F Change = 121.009, df1 = 1, df2 = 12, Sig. F Change = .000

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
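statsmodels exposes the same statistic; a minimal sketch, assuming the rows of the data frame are in the order the data were collected:

    from statsmodels.stats.stattools import durbin_watson

    # Values near 2 suggest no first-order autocorrelation
    # (SPSS reports 2.446 for these data).
    dw = durbin_watson(model.resid)
    print(dw)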
A number of other statistics for residual analysis are also available in SPSS:
Residuals Statistics(a)

                                    Minimum     Maximum      Mean        Std. Deviation   N
Predicted Value                     2759.3672   10749.96     5826.9286   2858.2959        14
Std. Predicted Value                -1.073      1.722        .000        1.000            14
Standard Error of Predicted Value   250.7362    512.8126     345.3026    81.3831          14
Adjusted Predicted Value            2771.8208   10518.55     5804.4373   2830.7178        14
Residual                            -1888.14    1070.6108    -3.25E-13   900.0964         14
Std. Residual                       -2.015      1.143        .000        .961             14
Stud. Residual                      -2.092      1.288        .011        1.035            14
Deleted Residual                    -2033.82    1442.1392    22.4913     1049.3911        14
Stud. Deleted Residual              -2.512      1.329        -.014       1.111            14
Mahal. Distance                     .003        2.967        .929        .901             14
Cook's Distance                     .001        .355         .086        .103             14
Centered Leverage Value             .000        .228         .071        .069             14

a. Dependent Variable: Sales Revenue of Store
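Several of these quantities can be reproduced with statsmodels' OLSInfluence; the mapping to SPSS column names below is my own reading of the output, not an official equivalence:

    from statsmodels.stats.outliers_influence import OLSInfluence

    infl = OLSInfluence(model)
    stud_resid   = infl.resid_studentized_internal  # cf. "Stud. Residual"
    stud_deleted = infl.resid_studentized_external  # cf. "Stud. Deleted Residual"
    cooks_d      = infl.cooks_distance[0]           # cf. "Cook's Distance"
    leverage     = infl.hat_matrix_diag             # hat values; SPSS's Centered
                                                    # Leverage Value is leverage - 1/n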
Returning to the Coefficients and ANOVA tables:

Coefficients(a)

Model                          B          t        95% Confidence Interval for B
                                                   Lower Bound      Upper Bound
1  (Constant)                  901.247    1.757    -216.534         2019.027
   Square Footage of Store     1.686      11.000   1.352            2.020

a. Dependent Variable: Sales Revenue of Store

ANOVA(b)

Model          df   F         Sig.
1  Regression  1    121.009   .000a
   Residual    12
   Total       13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
noting that t², as expected, equals F (11.000² ≈ 121.009), and the p-values are therefore equal. Note that SPSS also provides the confidence interval associated with the slope. Finally, SPSS allows you to calculate and store both Confidence and Prediction Limits for the observed data. After you generate the scatterplot, double-click on the chart to open the Chart Editor:
[Scatterplot with the fitted regression line; Rsq = 0.9098]
The saved confidence limits (LCL, UCL) and prediction limits (LPL, UPL) for each store:

Store   LCL           UCL            LPL           UPL
1       3135.52558    4487.50548     1661.27256    5961.75850
2       2976.95430    4362.80609     1514.25297    5825.50741
3       5102.73145    6196.07384     3536.24581    7762.55948
4       9232.70820    11302.74446    7979.09247    12556.36019
5       2309.22155    3850.24435     897.92860     5261.53731
6       4028.95209    5219.51308     2497.98206    6750.48311
7       2349.56701    3880.71656     935.07592     5295.20765
8       1942.80866    3575.92595     560.87909     4957.85553
9       5663.35086    6765.16486     4100.00127    8328.51446
10      2737.79303    4177.06134     1293.06683    5621.78754
11      8677.59067    10529.18763    7362.03125    11844.74705
12      7827.42925    9376.22071     6418.64584    10785.00412
13      9632.63839    11867.28348    8422.94738    13076.97449
14      5426.83323    6519.44789     3860.07783    8086.20329
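The analogous limits can be computed with the fitted model from the earlier sketches; the 95% level here is an assumption matching the usual SPSS default:

    # Confidence limits apply to the mean response at each X; prediction limits
    # apply to an individual new observation, so they are always wider.
    frame = model.get_prediction(X).summary_frame(alpha=0.05)
    print(frame[["mean_ci_lower", "mean_ci_upper",   # cf. LCL, UCL
                 "obs_ci_lower", "obs_ci_upper"]])   # cf. LPL, UPL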