Ch18 Multiple Regression
Multiple Regression
Introduction
In this chapter we extend the simple linear regression model, and allow for any number of independent variables. We expect to build a model that fits the data better than the simple linear regression model.
We all believe that weight is affected by the amount of calories consumed. Yet, the actual effect is different from one individual to another. Therefore, a simple linear relationship leaves much unexplained error.
[Figure: weight versus calories consumed]
In an attempt to reduce the unexplained errors, we'll add a second explanatory (independent) variable.
If we believe a person's height also explains his or her weight, we can add this variable to our model. The result is a multiple regression model.
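With calories consumed and height as the two explanatory variables, the model can be sketched as follows. The symbols $\beta_0, \beta_1, \beta_2$ (coefficients) and $\varepsilon$ (error term) are conventional notation assumed here; the text does not spell them out.

```latex
\text{Weight} = \beta_0 + \beta_1\,\text{Calories} + \beta_2\,\text{Height} + \varepsilon
```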
We shall use the computer printout to assess the model:
  - How well does it fit the data?
  - Is it useful?
  - Are any required conditions violated?
Factors considered: market awareness, customers, community, physical.

x1 Rooms: number of hotel and motel rooms within 3 miles of the site.
x2 Nearest: distance to the nearest La Quinta inn.
x3 Office space
x4 Enrollment: college enrollment.
x5 Income: median household income.
x6 Distance: distance to downtown.
This is the sample regression equation (sometimes called the prediction equation):
MARGIN = 38.14 - 0.0076 ROOMS + 1.65 NEAREST + 0.02 OFFICE + 0.21 COLLEGE + 0.41 INCOME - 0.23 DISTTWN
               Coefficients   Standard Error   t Stat
Intercept      38.13858       6.992948         5.453862
Number         -0.00762       0.001255         -6.06871
Nearest        1.646237       0.632837         2.601361
Office Space   0.019766       0.00341          5.795594
Enrollment     0.211783       0.133428         1.587246
Income         0.413122       0.139552         2.960337
Distance       -0.22526       0.178709         -1.26048
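As a minimal sketch of how the prediction equation is used, the snippet below plugs one site's characteristics into the estimated coefficients. The coefficients are copied from the printout; the site values in `example_site` are illustrative assumptions, not data from the text, and their units are assumed to match those used when the model was fitted.

```python
# Estimated coefficients copied from the printout (margin is in percent).
INTERCEPT = 38.13858
COEFFICIENTS = {
    "rooms": -0.00762,
    "nearest": 1.646237,
    "office": 0.019766,
    "college": 0.211783,
    "income": 0.413122,
    "disttwn": -0.22526,
}


def predict_margin(site):
    """Return the predicted operating margin for one site."""
    return INTERCEPT + sum(b * site[name] for name, b in COEFFICIENTS.items())


# Hypothetical site, chosen only to illustrate the calculation.
example_site = {
    "rooms": 3815,
    "nearest": 0.9,
    "office": 476,
    "college": 24.5,
    "income": 35,
    "disttwn": 11.2,
}

margin = predict_margin(example_site)
print(f"Predicted margin: {margin:.2f}%")
```

Each coefficient is the estimated change in margin per unit change in its variable, holding the other variables fixed, so the sum is simply the intercept plus six such contributions.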
$$s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$$
From the printout, $s_\varepsilon = 5.5121$. Calculating the mean value of y, we have $\bar{y} = 45.739$.
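The standard error of estimate can be reproduced from the sums of squares. This is a small sketch using SSE, n, and k as they appear in the regression printout in this chapter; comparing $s_\varepsilon$ with $\bar{y}$ is one rough way to judge whether the error is large relative to the typical margin.

```python
import math

sse = 2825.626   # residual sum of squares (ANOVA printout)
n = 100          # observations
k = 6            # independent variables
y_bar = 45.739   # mean of the dependent variable

# Standard error of estimate: sqrt(SSE / (n - k - 1))
s_e = math.sqrt(sse / (n - k - 1))

# Size of the standard error relative to the mean of y
relative_error = s_e / y_bar

print(f"s_e = {s_e:.4f}")
print(f"s_e / y_bar = {relative_error:.3f}")
```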
Because $R^2 = 1 - SSE/SS(\text{Total})$, SSE (and thus $s_\varepsilon$) affects the value of $R^2$.
Coefficient of Determination
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.724611
R Square            0.525062
Adjusted R Square   0.49442
Standard Error      5.512084
Observations        100

ANOVA
             df   SS         MS         F          Significance F
Regression   6    3123.832   520.6387   17.13581   3.03E-13
Residual     93   2825.626   30.38307
Total        99   5949.458

            P-value    Lower 95%   Upper 95%
Intercept   1.11E-14   56.78049    88.12874
ROOMS       2.77E-08   -0.01011    -0.00513
NEAREST     0.010803   -2.90292    -0.38955
OFFICE      9.24E-08   0.012993    0.026538
COLLEGE     0.115851   -0.05318    0.476744
INCOME      0.003899   -0.69025    -0.136
DISTTWN     0.210651   -0.12962    0.580138
From the printout, $R^2 = 0.5251$; that is, 52.51% of the variability in the margin values is explained by this model.
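The printed values of R Square and Adjusted R Square can be checked from the ANOVA sums of squares. The sketch below assumes the usual definitions, $R^2 = 1 - SSE/SS(\text{Total})$ and an adjusted version that divides each sum of squares by its degrees of freedom.

```python
sse = 2825.626        # residual sum of squares
ss_total = 5949.458   # total sum of squares
n, k = 100, 6         # observations, independent variables

# Coefficient of determination
r_squared = 1 - sse / ss_total

# Adjusted for degrees of freedom
adj_r_squared = 1 - (sse / (n - k - 1)) / (ss_total / (n - 1))

print(f"R^2          = {r_squared:.6f}")
print(f"Adjusted R^2 = {adj_r_squared:.5f}")
```

The adjusted value is smaller because it penalizes the model for using six independent variables.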
            Coefficients   Standard Error   t Stat
Intercept   72.45461       7.893104         9.179483
ROOMS       -0.00762       0.001255         -6.06871
NEAREST     -1.64624       0.632837         -2.60136
OFFICE      0.019766       0.00341          5.795594
COLLEGE     0.211783       0.133428         1.587246
INCOME      -0.41312       0.139552         -2.96034
DISTTWN     0.225258       0.178709         1.260475
If the errors are small, SSR will be close to SS(Total) and the ratio SSR/SSE will be large. This leads to the F ratio test presented next.
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n-k-1}$$
The ratio MSR/MSE is F-distributed:
$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$
Rejection region: $F > F_{\alpha,k,n-k-1}$
Note: A large F results from a large SSR, which indicates that much of the variation in y is explained by the regression model; this is when the model is useful. Hence, the null hypothesis (which states that the model is not useful) should be rejected when F is sufficiently large. Therefore, the rejection region has the form $F > F_{\alpha,k,n-k-1}$.
The F ratio test is performed using the ANOVA portion of the regression output.

ANOVA
             df           SS               MS                             F                    Significance F
Regression   k = 6        SSR = 3123.832   MSR = SSR/k = 520.6387         MSR/MSE = 17.13581   3.03382E-13
Residual     n-k-1 = 93   SSE = 2825.626   MSE = SSE/(n-k-1) = 30.38307
Total        n-1 = 99     5949.458

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   72.45461       7.893104         9.179483   1.11E-14   56.78049    88.12874
ROOMS       -0.00762       0.001255         -6.06871   2.77E-08   -0.01011    -0.00513
NEAREST     -1.64624       0.632837         -2.60136   0.010803   -2.90292    -0.38955
OFFICE      0.019766       0.00341          5.795594   9.24E-08   0.012993    0.026538
COLLEGE     0.211783       0.133428         1.587246   0.115851   -0.05318    0.476744
INCOME      -0.41312       0.139552         -2.96034   0.003899   -0.69025    -0.136
DISTTWN     0.225258       0.178709         1.260475   0.210651   -0.12962    0.580138
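The F statistic in the ANOVA table can be recomputed from its sums of squares and degrees of freedom. The sketch below does that and then makes the test decision from the printed Significance F (the p-value of the F test). The 5% significance level is an assumption chosen for illustration.

```python
ssr = 3123.832          # regression sum of squares
sse = 2825.626          # residual sum of squares
n, k = 100, 6           # observations, independent variables
significance_f = 3.03e-13   # p-value of the F test, from the printout
alpha = 0.05            # assumed significance level

msr = ssr / k                 # mean square for regression
mse = sse / (n - k - 1)       # mean square for error
f_stat = msr / mse            # F ratio

# H0: all slope coefficients are zero (the model is not useful).
reject_h0 = significance_f < alpha

print(f"MSR = {msr:.4f}, MSE = {mse:.5f}, F = {f_stat:.5f}")
print("Model is useful" if reject_h0 else "No evidence that the model is useful")
```

Comparing the p-value with alpha is equivalent to comparing F with the critical value $F_{\alpha,k,n-k-1}$, and it avoids looking that value up in a table.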
.23%.
Test statistic
$$t = \frac{b_i - \beta_i}{s_{b_i}}, \qquad \text{d.f.} = n - k - 1$$
For example, a test for $\beta_1$: t = (-0.007618 - 0)/0.001255 ≈ -6.07. Suppose $\alpha = 0.01$. The critical value is $t_{0.005,\,100-6-1} = t_{0.005,\,93} \approx 2.63$, so there is sufficient evidence to reject H0 at the 1% significance level. Moreover, the p-value of the test is $2.77 \times 10^{-8}$, so H0 is strongly rejected. The number of rooms is linearly related to the margin.
               Coefficients   P-value
Intercept      38.13858       4.04E-07
Number         -0.007618      2.77E-08
Nearest        1.646237       0.010803
Office Space   0.019766       9.24E-08
Enrollment     0.211783       0.115851
Income         0.413122       0.003899
Distance       -0.225258      0.210651
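The same test can be repeated for every coefficient. The sketch below recomputes each t statistic as $b_i / s_{b_i}$ (testing $\beta_i = 0$) and uses the printed p-values to decide which variables are linearly related to the margin. Coefficients, standard errors, and p-values are copied from the printouts above; the 5% significance level is an assumption.

```python
# (name, coefficient, standard error, p-value) from the printouts
ROWS = [
    ("Intercept",    38.13858,  6.992948, 4.04e-07),
    ("Number",       -0.007618, 0.001255, 2.77e-08),
    ("Nearest",      1.646237,  0.632837, 0.010803),
    ("Office Space", 0.019766,  0.00341,  9.24e-08),
    ("Enrollment",   0.211783,  0.133428, 0.115851),
    ("Income",       0.413122,  0.139552, 0.003899),
    ("Distance",     -0.225258, 0.178709, 0.210651),
]
ALPHA = 0.05  # assumed significance level


def t_statistic(b, s_b, beta_0=0.0):
    """t = (b - beta_0) / s_b, with n - k - 1 degrees of freedom."""
    return (b - beta_0) / s_b


significant = []
for name, b, s_b, p_value in ROWS:
    t = t_statistic(b, s_b)
    reject = p_value < ALPHA
    if reject and name != "Intercept":
        significant.append(name)
    print(f"{name:<13} t = {t:8.3f}  p = {p_value:.3g}  reject H0: {reject}")

print("Linearly related to margin:", ", ".join(significant))
```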
Interpretation
Interpretation of the regression results for this model
  - The number of hotel and motel rooms, distance to the nearest motel, the amount of office space, and the median household income are linearly related to the operating margin.
  - Student enrollment and distance from downtown are not linearly related to the margin.
  - Preferable locations have only a few other motels nearby and much office space, and the surrounding households are affluent.
The model can be used to learn about the relationships between the independent variables xi and the dependent variable y by interpreting the coefficients bi.
Predicted margin: 37.09149
It is predicted that the operating margin will lie within 25.4% and 48.8%, with 95% confidence (prediction interval: 25.39527 to 48.78771).
It is expected that the average operating margin of all sites that fit this category falls within 33% and 41.2%, with 95% confidence (interval: 32.96972 to 41.21326).
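Car color is represented by two indicator (dummy) variables: I1 = 1 for a white car and I2 = 1 for a silver car. For a car of any other color, I1 = 0; I2 = 0.
[Figure: price versus odometer reading, with separate lines for white, silver, and other-color cars]
For example, the equations for a white car and for a car of any other color can be written as
White color: Price = 16.837 - .0591(Odometer) + .0911(1) + .3304(0)
Other color: Price = 16.837 - .0591(Odometer) + .0911(0) + .3304(0)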
A silver car sells, on average, for $330.40 more than a car in the Other color category.
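The effect of the color indicators can be seen by evaluating the fitted equation for each category at the same odometer reading. This is a sketch using the coefficients from the printout for this example. It assumes that price is measured in thousands of dollars (which is what the $330.40 interpretation implies) and that I1 marks white and I2 marks silver; the odometer value of 40 is a hypothetical input in whatever units the model was fitted with.

```python
B0 = 16.83725         # intercept
B_ODOMETER = -0.059123
B_WHITE = 0.091131    # coefficient of I1 (assumed: white)
B_SILVER = 0.330368   # coefficient of I2 (assumed: silver)


def predicted_price(odometer, color):
    """Predicted auction price (assumed to be in $1000s) for a given color."""
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return B0 + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


odometer = 40  # hypothetical reading, for illustration only
for color in ("white", "silver", "other"):
    print(f"{color:<6} {predicted_price(odometer, color):.3f}")

# The gap between two colors does not depend on the odometer reading.
silver_premium = predicted_price(odometer, "silver") - predicted_price(odometer, "other")
print(f"Silver vs. other: ${silver_premium * 1000:.1f}")
```

Because the indicators only shift the intercept, the three fitted lines are parallel: every color shares the same odometer slope.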
There is insufficient evidence to infer that a white car and a car of another color sell for different auction prices.
There is sufficient evidence to infer that a silver car sells for a higher price than a car in the Other color category.
            Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
Intercept   16.83725       0.1971054        85.42255    2.28E-92   16.446      17.2285
Odometer    -0.059123      0.0050653        -11.67219   4.04E-20   -0.069177   -0.049068
I1          0.091131       0.0728916        1.250224    0.214257   -0.053558   0.235819
I2          0.330368       0.0816498        4.046157    0.000105   0.168294    0.492442
It is now believed that the type of undergraduate degree should be included in the model.
I1 = 0; I2 = 0; I3 = 0
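A qualitative variable with four categories needs three indicator variables, with the remaining category represented by I1 = 0; I2 = 0; I3 = 0. The sketch below shows one way to build such indicators. The category labels ("A", "B", "C", "other") are hypothetical placeholders, since the text does not list the degree types.

```python
# Hypothetical degree categories; the last one is the baseline.
CATEGORIES = ["A", "B", "C", "other"]
BASELINE = "other"


def encode_degree(degree):
    """Return [I1, I2, I3]: one indicator per non-baseline category."""
    if degree not in CATEGORIES:
        raise ValueError(f"unknown category: {degree!r}")
    return [1 if degree == c else 0 for c in CATEGORIES if c != BASELINE]


for degree in CATEGORIES:
    print(degree, encode_degree(degree))
```

Using one indicator fewer than the number of categories keeps the indicators from being perfectly collinear with the intercept.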
Coefficients   Standard Error   t Stat
0.189814       1.406734         0.134932
-0.00606       0.113968         -0.05317
0.012793       0.001356         9.432831
0.098182       0.030323         3.237862
-0.34499       0.223728         -1.54199
0.705725       0.240529         2.934058
0.034805       0.209401         0.166211
Regression analysis is extensively employed in cases of equal pay for equal work.
Analysis and Interpretation
  - The model fits the data quite well.
  - The model is very useful.
  - Experience is a variable strongly related to salary.
  - There is no evidence of sex discrimination.
Coefficients   Standard Error   t Stat
-5835.1        16082.8          -0.36282
2118.898       1018.486         2.08044
4099.338       317.1936         12.92377
1850.985       3703.07          0.499851
Analysis and Interpretation
Further studying the data, we find:
  - Average experience for women is 12 years.
  - Average experience for men is 17 years.
  - Average salary for female managers is $76,189.
  - Average salary for male managers is $97,832.
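These averages help explain why a large raw salary gap is compatible with finding no evidence of discrimination. The rough calculation below assumes that the coefficient 4099.338 (the one with t Stat 12.92) belongs to years of experience; the printout's row labels are not shown, so this pairing is an assumption. It also ignores differences in any other variable.

```python
avg_salary_men = 97832
avg_salary_women = 76189
avg_experience_men = 17     # years
avg_experience_women = 12   # years
b_experience = 4099.338     # assumed to be the experience coefficient

# Raw difference in average salary
salary_gap = avg_salary_men - avg_salary_women

# Part of the gap the model attributes to the experience difference
gap_from_experience = b_experience * (avg_experience_men - avg_experience_women)

remainder = salary_gap - gap_from_experience

print(f"Raw salary gap:             ${salary_gap:,.0f}")
print(f"Attributable to experience: ${gap_from_experience:,.2f}")
print(f"Remainder:                  ${remainder:,.2f}")
```

Under these assumptions, most of the difference in average salary is accounted for by the five-year difference in average experience.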
ANOVA
             df   SS         MS         F          Significance F
Regression   3    5.74E+10   1.91E+10   72.28735   1.55E-24
Residual     96   2.54E+10   2.65E+08
Total        99   8.29E+10

Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
-5835.1        16082.8          -0.36282   0.71754    -37759.2    26089.02
2118.898       1018.486         2.08044    0.040149   97.21837    4140.578
4099.338       317.1936         12.92377   9.89E-23   3469.714    4728.963
1850.985       3703.07          0.499851   0.618323   -5499.56    9201.527
Review problems