Introduction To Linear Regression and Correlation Analysis
Scatter Diagrams
[Figure: Two Variable Relationships. Panels: (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship.]
Correlation
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
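SAMPLE CORRELATION COEFFICIENT
algebraic equivalent:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} $$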
 Sales   Years
     y       x        xy           y²      x²
   487       3     1,461      237,169       9
   445       5     2,225      198,025      25
   272       2       544       73,984       4
   641       8     5,128      410,881      64
   187       2       374       34,969       4
   440       6     2,640      193,600      36
   346       7     2,422      119,716      49
   238       1       238       56,644       1
   312       4     1,248       97,344      16
   269       2       538       72,361       4
   655       9     5,895      429,025      81
   563       6     3,378      316,969      36
Σ 4,855     55    26,091    2,240,687     329
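$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} = \frac{12(26{,}091) - 55(4{,}855)}{\sqrt{\left[12(329) - (55)^2\right]\left[12(2{,}240{,}687) - (4{,}855)^2\right]}} = 0.8325 $$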
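The calculation above can be checked with a minimal Python sketch. It assumes the twelve (years, sales) pairs from the table and uses only the standard library; the function name `sample_correlation` is illustrative and does not come from the slides.

```python
import math

# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]


def sample_correlation(x, y):
    """Sample correlation coefficient via the algebraic (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = sample_correlation(years, sales)
print(f"r = {r:.4f}")
```

The sums formula is used here because it mirrors the hand calculation; the deviation form gives the same value.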
[Figure: data points plotted around the line y = 10, with residuals +7, +1, +4, -2, -5, -5.]
$$ \text{Residual} = y - \hat{y} $$
• The best fit line is the one that minimizes the sum of the squares of the
residuals (errors).
• The error is the difference between the actual data point and the point on
the line.
• SSE (Sum of Squared Errors) = $(-5)^2 + 7^2 + 1^2 + (-2)^2 + 4^2 + (-5)^2 = 120$
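As a quick check of the arithmetic, the SSE can be computed directly from the six residuals. This is a minimal sketch that assumes the residual values read from the figure.

```python
# Residuals (actual y minus the value on the line), as read from the figure.
residuals = [-5, 7, 1, -2, 4, -5]

# SSE is the sum of the squared residuals.
sse = sum(e ** 2 for e in residuals)
print(sse)  # 120
```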
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where:
y = Value of the dependent variable
x = Value of the independent variable
$\beta_0$ = Population's y-intercept
$\beta_1$ = Slope of the population regression line
$\varepsilon$ = Error term, or residual
Simple Linear Regression
Analysis
The simple linear regression model has four assumptions:
• Individual values of the error terms, $\varepsilon_i$, are statistically independent of one another.
• The distribution of all possible values of $\varepsilon$ is normal.
• The distributions of possible $\varepsilon_i$ values have equal variances for all values of x.
• The means of the dependent variable, y, for all specified values of the independent variable, x, can be connected by a straight line called the population regression model.
REGRESSION COEFFICIENTS
In the simple regression model, there
are two coefficients: the intercept and
the slope.
[Figure: sales (y) versus Years with Company (x). At x = 4 the observed value is 312 and the value on the line is 390, so Residual = 312 - 390 = -78.]
ESTIMATED REGRESSION MODEL
(SAMPLE MODEL)
$$ \hat{y}_i = b_0 + b_1 x $$
where:
ŷ= Estimated, or predicted, y value
b0 = Unbiased estimate of the regression intercept
b1 = Unbiased estimate of the regression slope
x = Value of the independent variable
LEAST SQUARES EQUATIONS
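$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent:
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$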
$$ SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy $$
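The shortcut formula, SSE as the sum of y² minus b0 times the sum of y minus b1 times the sum of xy, should agree with the definition of SSE as the sum of squared residuals. The sketch below checks this on the sales data, assuming the twelve (years, sales) pairs from the data table; the variable names are illustrative.

```python
# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
sum_x, sum_y = sum(years), sum(sales)
sum_xy = sum(x * y for x, y in zip(years, sales))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in sales)

# Least squares coefficients (algebraic form).
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Shortcut formula versus the definition, the sum of (y - y_hat)^2.
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))

print(f"shortcut: {sse_shortcut:.2f}, direct: {sse_direct:.2f}")
```

The shortcut only holds when b0 and b1 are the least squares estimates, which is why both are computed from the same sums first.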
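$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} = \frac{26{,}091 - \frac{55(4{,}855)}{12}}{329 - \frac{(55)^2}{12}} = 49.9101 $$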
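The slope calculation can be reproduced, and the intercept obtained, with a short Python sketch. It assumes the twelve (years, sales) pairs from the data table and computes the slope in both the deviation form and the algebraic form to show that they agree; the variable names are illustrative.

```python
# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

# Deviation form: sum of cross-deviations over sum of squared x deviations.
b1_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales)) / sum(
    (x - x_bar) ** 2 for x in years
)

# Algebraic form, using raw sums as in the hand calculation.
sum_xy = sum(x * y for x, y in zip(years, sales))
sum_x2 = sum(x * x for x in years)
b1 = (sum_xy - sum(years) * sum(sales) / n) / (sum_x2 - sum(years) ** 2 / n)

# Intercept from the means and the slope.
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```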
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.832534056
R Square             0.693112955
Adjusted R Square    0.662424251
Standard Error       92.10553441
Observations         12

ANOVA
              df    SS             MS             F              Significance F
Regression     1    191600.622     191600.622     22.58527906    0.000777416
Residual      10    84834.29469    8483.429469
Total         11    276434.9167

                      Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept             175.8288191     54.98988674       3.197475563    0.00953244     53.30369475    298.3539434
Years with Midwest    49.91007584     10.50208428       4.752397191    0.000777416    26.50996978    73.3101819
SUM OF RESIDUALS
$$ \sum (y - \hat{y}) = 0 $$
SUM OF SQUARED RESIDUALS
$$ \sum (y - \hat{y})^2 $$
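Both properties can be checked numerically on the sales data. The sketch below assumes the twelve (years, sales) pairs from the data table; `fit_line` is an illustrative helper, not something defined in the slides.

```python
# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]


def fit_line(x, y):
    """Return the least squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(years, sales)
residuals = [y - (b0 + b1 * x) for x, y in zip(years, sales)]

sum_residuals = sum(residuals)  # zero up to floating-point rounding
sum_sq_residuals = sum(e ** 2 for e in residuals)

print(f"sum of residuals: {sum_residuals:.6f}")
print(f"sum of squared residuals: {sum_sq_residuals:.2f}")
```

The sum of squared residuals printed here should match the Residual SS reported in the ANOVA table.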
$$ TSS = \sum (y - \bar{y})^2 $$
where:
TSS = Total sum of squares
n = Sample size
y = Values of the dependent variable
$\bar{y}$ = Average value of the dependent variable
$$ SSE = \sum (y - \hat{y})^2 $$
where:
SSE = Sum of squares error
n = Sample size
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value
$$ SSR = \sum (\hat{y} - \bar{y})^2 $$
where:
SSR = Sum of squares regression
$\bar{y}$ = Average value of the dependent variable
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value
SUMS OF SQUARES
$$ TSS = SSE + SSR $$
$$ R^2 = \frac{SSR}{TSS} $$
$$ R^2 = \frac{SSR}{TSS} = \frac{191{,}600.62}{276{,}434.92} = 0.6931 $$
COEFFICIENT OF DETERMINATION
SINGLE INDEPENDENT VARIABLE CASE
$$ R^2 = r^2 $$
where:
$R^2$ = Coefficient of determination
r = Simple correlation coefficient
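The three sums of squares and the two routes to the coefficient of determination can be compared in a few lines of Python. This is a sketch that assumes the twelve (years, sales) pairs from the data table; the variable names are illustrative.

```python
import math

# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
x_bar, y_bar = sum(years) / n, sum(sales) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
sxx = sum((x - x_bar) ** 2 for x in years)

# Least squares fit and fitted values.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in years]

# Sums of squares as defined above.
tss = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Coefficient of determination, and the correlation coefficient for comparison.
r_squared = ssr / tss
r = sxy / math.sqrt(sxx * tss)

print(f"TSS = {tss:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

With a single independent variable the two printed values should coincide, and SSE plus SSR should reproduce TSS.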
STANDARD DEVIATION OF THE
REGRESSION SLOPE COEFFICIENT
(POPULATION)
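$$ \sigma_{b_1} = \frac{\sigma_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} $$
where:
$\sigma_{b_1}$ = Standard deviation of the regression slope coefficient
$\sigma_\varepsilon$ = Population standard error of the estimate

$$ s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} $$
where:
$s_\varepsilon$ = Sample standard error of the estimate
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model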
ESTIMATOR FOR THE STANDARD
DEVIATION OF THE REGRESSION SLOPE
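$$ s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}} $$
where:
$s_{b_1}$ = Estimate of the standard error of the least squares slope
$s_\varepsilon = \sqrt{\frac{SSE}{n - 2}}$ = Sample standard error of the estimate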
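For the sales data, the standard error of the estimate and the standard error of the slope can be computed as follows. This is a minimal sketch assuming the twelve (years, sales) pairs from the data table; the names `s_e` and `s_b1` are illustrative. The results can be compared with the Standard Error entries in the summary output.

```python
import math

# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
k = 1  # one independent variable in simple regression
x_bar, y_bar = sum(years) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))

# Standard error of the estimate: sqrt(SSE / (n - k - 1)).
s_e = math.sqrt(sse / (n - k - 1))

# Standard error of the slope: s_e / sqrt(sum of squared x deviations).
s_b1 = s_e / math.sqrt(sxx)

print(f"s_e = {s_e:.4f}, s_b1 = {s_b1:.4f}")
```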
TEST STATISTIC FOR TEST OF
SIGNIFICANCE OF THE REGRESSION SLOPE
$$ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad df = n - 2 $$
where:
$b_1$ = Sample regression slope coefficient
$\beta_1$ = Hypothesized slope
$s_{b_1}$ = Estimator of the standard error of the slope
Significance Test of Regression Slope
$$ H_0: \beta_1 = 0.0 $$
$$ H_A: \beta_1 \neq 0.0 $$
$$ \alpha = 0.05 $$
[Figure: t distribution with a rejection region in each tail, each of area $\alpha/2 = 0.025$.]
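The test can be sketched in Python for the sales data. The code assumes the twelve (years, sales) pairs from the data table. The two-tailed critical value 2.228 for α = 0.05 with n - 2 = 10 degrees of freedom is taken from a standard t table and is not stated in the slides; all variable names are illustrative.

```python
import math

# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
x_bar, y_bar = sum(years) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

# Test statistic for H0: beta1 = 0 against HA: beta1 != 0.
hypothesized_slope = 0.0
t_stat = (b1 - hypothesized_slope) / s_b1

# Assumption: critical value from a standard t table (alpha/2 = 0.025, df = 10).
t_critical = 2.228
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The computed statistic can be compared with the t Stat reported for the slope in the summary output.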
$$ MSR = \frac{SSR}{k} $$
where:
SSR = Sum of squares regression
k = Number of independent variables in the model
$$ MSE = \frac{SSE}{n - k - 1} $$
where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model
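Significance Test
$$ H_0: \beta_1 = 0.0 $$
$$ H_A: \beta_1 \neq 0.0 $$
$$ \alpha = 0.05 $$
F Ratio:
$$ F = \frac{MSR}{MSE} = \frac{191{,}600.6}{8{,}483.43} = 22.59 $$
[Figure: F distribution with a rejection region of area $\alpha = 0.05$ to the right of the critical value F = 4.96.]
Since F = 22.59 > 4.96, reject $H_0$ and conclude that the regression model explains a significant amount of the variation in the dependent variable.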
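The F ratio can be reproduced from the raw data with the sketch below. It assumes the twelve (years, sales) pairs from the data table and takes the critical value 4.96 from the slide; the variable names are illustrative.

```python
# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
k = 1  # number of independent variables
x_bar, y_bar = sum(years) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in years]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))

# Mean squares and the F ratio.
msr = ssr / k
mse = sse / (n - k - 1)
f_ratio = msr / mse

# Critical value for alpha = 0.05, as given on the slide.
f_critical = 4.96
reject_h0 = f_ratio > f_critical

print(f"MSR = {msr:.1f}, MSE = {mse:.2f}, F = {f_ratio:.2f}, reject H0: {reject_h0}")
```

In simple regression the F ratio equals the square of the slope's t statistic, so this test and the t test on the slope lead to the same conclusion.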
Simple Regression Steps
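Confidence interval estimate for the regression slope:
$$ b_1 \pm t_{\alpha/2} s_{b_1} $$
or equivalently:
$$ b_1 \pm t_{\alpha/2} \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}, \qquad df = n - 2 $$
Confidence interval for the average y, given $x_p$:
$$ \hat{y} \pm t_{\alpha/2} s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$
where:
$\hat{y}$ = Point estimate of the dependent variable
t = Critical value with n - 2 d.f.
$s_\varepsilon$ = Standard error of the estimate
n = Sample size
$x_p$ = Specific value of the independent variable
$\bar{x}$ = Mean of independent variable observations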
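Both intervals can be sketched in Python for the sales data. The code assumes the twelve (years, sales) pairs from the data table. The critical value 2.2281 (95% confidence, 10 degrees of freedom) is taken from a standard t table and the choice x_p = 4 is a hypothetical example; neither is stated in the slides.

```python
import math

# Years with the company (x) and sales (y), from the data table.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]

n = len(years)
x_bar, y_bar = sum(years) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(sxx)

# Assumption: t critical value for 95% confidence with n - 2 = 10 d.f.
t_crit = 2.2281

# Confidence interval for the slope: b1 +/- t * s_b1.
slope_lower = b1 - t_crit * s_b1
slope_upper = b1 + t_crit * s_b1

# Confidence interval for the average y at a hypothetical x_p.
x_p = 4
y_hat_p = b0 + b1 * x_p
half_width = t_crit * s_e * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
mean_lower, mean_upper = y_hat_p - half_width, y_hat_p + half_width

print(f"slope: [{slope_lower:.2f}, {slope_upper:.2f}]")
print(f"average y at x_p = {x_p}: [{mean_lower:.2f}, {mean_upper:.2f}]")
```

The slope interval can be compared with the Lower 95% and Upper 95% columns of the summary output. The interval for the average y widens as x_p moves away from the mean of x.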