Simple Regression With SPSS
Suppose the management team of a retail chain wants to develop a strategy for forecasting annual sales. The following data have been gathered from a random sample of existing stores:

Store   Square Footage   Annual Sales ($)
1       1726             3681
2       1642             3895
3       2816             6653
4       5555             9543
5       1292             3418
6       2208             5563
7       1313             3660
8       1102             2694
9       3151             5468
10      1516             2898
11      5161             10674
12      4567             7585
13      5841             11760
14      3008             4085
We can enter the data into SPSS by typing it directly into the Data Editor, or by cutting and pasting:
Next, by clicking on Variable View, we can apply variable and value labels where appropriate:
Assuming, for now, that if a relationship exists between the two variables, it is linear in nature, we can generate a simple Scatterplot (or Scatter Diagram) for the data. This is accomplished with the command sequence:
[Scatterplot of Annual Sales ($) against Square Footage of Store]
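For readers who want to reproduce the steps outside SPSS, here is a minimal Python sketch; pandas and matplotlib are my own choices, not part of the SPSS exercise:

    import pandas as pd
    import matplotlib.pyplot as plt

    # The sampled store data from the table above.
    sqft  = [1726, 1642, 2816, 5555, 1292, 2208, 1313, 1102,
             3151, 1516, 5161, 4567, 5841, 3008]
    sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660, 2694,
             5468, 2898, 10674, 7585, 11760, 4085]
    df = pd.DataFrame({"sqft": sqft, "sales": sales})

    # Simple scatterplot, analogous to the SPSS Scatter Diagram.
    ax = df.plot.scatter(x="sqft", y="sales")
    ax.set_xlabel("Square Footage of Store")
    ax.set_ylabel("Annual Sales ($)")
    plt.show()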
We can generate a simple straight-line equation from the output produced when using the Enter method in regression (Analyze → Regression → Linear), which yields:
Variables Entered/Removed(b)

Model   Variables Entered            Variables Removed   Method
1       Square Footage of Store(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Sales Revenue of Store
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .954a   .910       .902                936.8500

a. Predictors: (Constant), Square Footage of Store
ANOVA(b)

Model          Sum of Squares   df   F         Sig.
1  Regression  (SSR)            1    121.009   .000a
   Residual    (SSE)            12
   Total       (SST)            13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
Coefficients(a)

Model                               B          t        95% Confidence Interval for B
                                                        Lower Bound      Upper Bound
1  (Constant)               (b0)    901.247    1.757    -216.534         2019.027
   Square Footage of Store  (b1)    1.686      11.000   1.352            2.020

a. Dependent Variable: Sales Revenue of Store
So then

Yi = 901.247 + 1.686Xi

(noting that no direct interpretation of the Y intercept at 0 square feet is possible; the intercept represents the portion of annual sales that varies due to factors other than store size)
and where

SST (total sum of squares) = SSR + SSE = the sum of the squared differences between each observed value of Y and Y-Bar
SSR (regression sum of squares) = the sum of the squared differences between each predicted value of Y and Y-Bar
SSE (error sum of squares) = the sum of the squared differences between each observed value of Y and its predicted value

Coefficient of Determination = SSR/SST = 0.91 (sample)
Standard Error of the Estimate = SYX = SQRT[ SSE / (n - 2) ] = 936.85
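As a cross-check on these formulas, a short sketch using numpy and statsmodels (an assumption on my part; df is the data frame built in the earlier sketch, and the model object is reused in the sketches that follow):

    import numpy as np
    import statsmodels.api as sm

    X = sm.add_constant(df["sqft"])       # adds the intercept column (b0)
    model = sm.OLS(df["sales"], X).fit()  # ordinary least squares on all predictors at once

    b0, b1 = model.params                 # approx. 901.247 and 1.686 per the SPSS output
    y_hat = model.fittedvalues
    y_bar = df["sales"].mean()

    sst = ((df["sales"] - y_bar) ** 2).sum()  # total sum of squares
    ssr = ((y_hat - y_bar) ** 2).sum()        # regression sum of squares
    sse = ((df["sales"] - y_hat) ** 2).sum()  # error sum of squares

    r_sq = ssr / sst                          # approx. 0.910, matching R Square
    s_yx = np.sqrt(sse / (len(df) - 2))       # approx. 936.85, the standard error of the estimate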
Before relying on this model, we should check the standard regression assumptions.

1. Normality - the assumption that the errors around the regression line are normally distributed for each value of X. SPSS can produce a Normal P-P plot and a histogram of the standardized residuals for this purpose:

[Normal P-P Plot of Regression Standardized Residual - Dependent Variable: Sales Revenue of Store]

[Histogram of the standardized residuals: Mean = 0.00, Std. Dev = .96, N = 14]
Of course, the assessment of normality by visually scanning the data leaves some statisticians unsettled, so I usually add an appropriate test of normality conducted on the data:

Variable        n    A-D     p-value
Stand. Resid.   14   0.348   0.503
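The same kind of check can be run outside SPSS with scipy (again my own choice). Note that scipy.stats.anderson reports the statistic against critical values rather than a p-value, and the statistic is the same for raw or standardized residuals, since the mean and standard deviation are estimated from the data:

    from scipy.stats import anderson

    # Anderson-Darling normality test on the regression residuals.
    result = anderson(model.resid, dist="norm")
    print(result.statistic)           # the A-D statistic
    print(result.critical_values)     # compare against these critical values
    print(result.significance_level)  # the corresponding significance levels (%)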
2. Homoscedasticity - the assumption that the variability of the data around the regression line is constant for all values of X. In other words, the error must be independent of X. Generally, this assumption may be tested by plotting the X values against the raw residuals for Y. In SPSS, this is done by requesting a scatterplot of the saved variables:
[Scatterplot of the Unstandardized Residuals against Square Footage of Store]
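A comparable plot can be sketched with matplotlib, reusing the model object fitted earlier:

    import matplotlib.pyplot as plt

    # Plot X against the unstandardized (raw) residuals; under homoscedasticity
    # the vertical spread should look roughly constant across the range of X.
    plt.scatter(df["sqft"], model.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Square Footage of Store")
    plt.ylabel("Unstandardized Residual")
    plt.show()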
Other authors, including those who wrote the SPSS routine, choose to plot the X values against the Studentized Residuals (Standardized Residuals Adjusted for their distance from the average X value) rather than the Unstandardized (raw) Residuals. SPSS will generate this plot automatically (select this under the Plots panel):
[Scatterplot of the Studentized Residuals against Square Footage of Store]
Correlations

                                           Square Footage   Unstandardized   Studentized
                                           of Store         Residual         Residual
Square Footage    Pearson Correlation      1.000            .000             .015
of Store          Sig. (2-tailed)          .                1.000            .959
                  N                        14               14               14
Unstandardized    Pearson Correlation      .000             1.000            .999**
Residual          Sig. (2-tailed)          1.000            .                .000
                  N                        14               14               14
Studentized       Pearson Correlation      .015             .999**           1.000
Residual          Sig. (2-tailed)          .959             .000             .
                  N                        14               14               14

**. Correlation is significant at the 0.01 level (2-tailed).
It should be noted that the distribution of the data also suggests that an assumption of linearity is reasonable at this point.

3. Independence of the Errors - the assumption that no autocorrelation is present. This is generally evaluated by plotting the residuals in the order or sequence in which the original data were collected. This approach, when meaningful, uses the Durbin-Watson statistic and the associated tables of critical values. SPSS can generate this value when requested as part of the Model Summary:
Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .954a   .910       .902                936.8500                     2.446

Change Statistics: R Square Change = .910, F Change = 121.009, df1 = 1, df2 = 12, Sig. F Change = .000

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
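statsmodels exposes the same statistic; a minimal sketch, assuming the rows of the data frame are in the order the data were collected:

    from statsmodels.stats.stattools import durbin_watson

    # Values near 2 suggest no first-order autocorrelation
    # (SPSS reports 2.446 for these data).
    dw = durbin_watson(model.resid)
    print(dw)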
A number of other statistics for residual analysis are also available in SPSS:
Residuals Statistics(a)

                                    Minimum     Maximum      Mean        Std. Deviation   N
Predicted Value                     2759.3672   10749.96     5826.9286   2858.2959        14
Std. Predicted Value                -1.073      1.722        .000        1.000            14
Standard Error of Predicted Value   250.7362    512.8126     345.3026    81.3831          14
Adjusted Predicted Value            2771.8208   10518.55     5804.4373   2830.7178        14
Residual                            -1888.14    1070.6108    -3.25E-13   900.0964         14
Std. Residual                       -2.015      1.143        .000        .961             14
Stud. Residual                      -2.092      1.288        .011        1.035            14
Deleted Residual                    -2033.82    1442.1392    22.4913     1049.3911        14
Stud. Deleted Residual              -2.512      1.329        -.014       1.111            14
Mahal. Distance                     .003        2.967        .929        .901             14
Cook's Distance                     .001        .355         .086        .103             14
Centered Leverage Value             .000        .228         .071        .069             14

a. Dependent Variable: Sales Revenue of Store
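Several of these quantities can be reproduced with statsmodels' OLSInfluence; the mapping to SPSS column names below is my own reading of the output, not an official equivalence:

    from statsmodels.stats.outliers_influence import OLSInfluence

    infl = OLSInfluence(model)
    stud_resid   = infl.resid_studentized_internal  # cf. "Stud. Residual"
    stud_deleted = infl.resid_studentized_external  # cf. "Stud. Deleted Residual"
    cooks_d      = infl.cooks_distance[0]           # cf. "Cook's Distance"
    leverage     = infl.hat_matrix_diag             # hat values; SPSS's Centered
                                                    # Leverage Value is leverage - 1/n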
Returning to the Coefficients and ANOVA tables:

Coefficients(a)

Model                          B          t        95% Confidence Interval for B
                                                   Lower Bound      Upper Bound
1  (Constant)                  901.247    1.757    -216.534         2019.027
   Square Footage of Store     1.686      11.000   1.352            2.020

a. Dependent Variable: Sales Revenue of Store

ANOVA(b)

Model          df   F         Sig.
1  Regression  1    121.009   .000a
   Residual    12
   Total       13

a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
noting that t², as expected, equals F (11.000² ≈ 121.009), and the p-values are therefore equal. Note that SPSS also provides the confidence interval associated with the slope. Finally, SPSS allows you to calculate and store both Confidence and Prediction Limits for the observed data. After you generate the scatterplot, double-click on the chart to open the Chart Editor:
[Scatterplot with the fitted regression line; Rsq = 0.9098]
The saved confidence limits (LCL, UCL) and prediction limits (LPL, UPL) for each store:

Store   LCL           UCL            LPL           UPL
1       3135.52558    4487.50548     1661.27256    5961.75850
2       2976.95430    4362.80609     1514.25297    5825.50741
3       5102.73145    6196.07384     3536.24581    7762.55948
4       9232.70820    11302.74446    7979.09247    12556.36019
5       2309.22155    3850.24435     897.92860     5261.53731
6       4028.95209    5219.51308     2497.98206    6750.48311
7       2349.56701    3880.71656     935.07592     5295.20765
8       1942.80866    3575.92595     560.87909     4957.85553
9       5663.35086    6765.16486     4100.00127    8328.51446
10      2737.79303    4177.06134     1293.06683    5621.78754
11      8677.59067    10529.18763    7362.03125    11844.74705
12      7827.42925    9376.22071     6418.64584    10785.00412
13      9632.63839    11867.28348    8422.94738    13076.97449
14      5426.83323    6519.44789     3860.07783    8086.20329
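The analogous limits can be computed with the fitted model from the earlier sketches; the 95% level here is an assumption matching the usual SPSS default:

    # Confidence limits apply to the mean response at each X; prediction limits
    # apply to an individual new observation, so they are always wider.
    frame = model.get_prediction(X).summary_frame(alpha=0.05)
    print(frame[["mean_ci_lower", "mean_ci_upper",   # cf. LCL, UCL
                 "obs_ci_lower", "obs_ci_upper"]])   # cf. LPL, UPL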