Correlation and Simple Linear Regression
A Scatter Diagram is a chart that shows the relationship between two variables.
The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
Independent variable, x, or dependent variable, y?
Weight (y)                  Height (x)
Sales calls (x)             Units sold (y)
Saving (y)                  Income (x)
Revision hours (x)          Grading (y)
Pressure of balloon (y)     Temperature in a room (x)
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
Also called Pearson’s r and Pearson’s product moment correlation coefficient.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
[Figure: scatter plot of Y against X, both axes from 0 to 10, r = 1. Perfect Positive Linear Correlation]
[Figure: scatter plot of Y against X, both axes from 0 to 10, r = -1]
Nonlinear Correlation
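As a rough illustration of these cases, the sketch below computes r for three small made-up data sets: points on a rising line, points on a falling line, and points on a U-shaped curve. The data and the helper name `pearson_r` are assumptions for illustration only; they do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the sums-of-squares shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xy / sqrt(ss_xx * ss_yy)

xs = list(range(0, 11))              # made-up x values 0..10
rising = [x for x in xs]             # y = x
falling = [10 - x for x in xs]       # y = 10 - x
curved = [(x - 5) ** 2 for x in xs]  # y = (x - 5)^2, a U-shaped curve

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(round(r_rising, 4), round(r_falling, 4), round(r_curved, 4))
```

In this made-up case the curved data follow an exact rule, yet r comes out at zero, because r only measures how well a straight line describes the points.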
Example 1
The manager of MRM System randomly selected 10 sales representatives and determined the number of sales calls each one made last month and the number of units of the product he/she sold.

Sales Representative    Number of sales calls    Number of units sold
Mohd Ali                         14                       28
Budiyanto                        35                       66
Cheng Long                       22                       38
Fathimah                         29                       70
Hashim                            6                       22
Kamarul                          15                       27
Rajagopal                        17                       28
Roslina                          20                       47
Swee Lee                         12                       14
Siti Rahimah                     29                       68

Represent the above information in a Scatter Diagram.
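One way to get a quick look at the scatter diagram without any plotting library is a rough text rendering, sketched below. The function name `text_scatter` and the grid size are hypothetical choices; in practice a plotting package would normally be used instead. Sales calls are placed on the horizontal axis (x) and units sold on the vertical axis (y).

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows with one '*' per (x, y) point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the chart
    return ["".join(line) for line in grid]

# Example 1 data: sales calls (x) and units sold (y)
calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]

rows = text_scatter(calls, units)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
```

The points drift from the lower left to the upper right, which suggests a direct (positive) relationship between calls and units sold.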
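The coefficient of correlation is

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$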
For each representative, compute x², y² and xy. For example, in the first row xy = 14 × 28 = 392.

    x      y      x²      y²      xy
   14     28     196     784     392
   35     66    1225    4356    2310
   22     38     484    1444     836
   29     70     841    4900    2030
    6     22      36     484     132
   15     27     225     729     405
   17     28     289     784     476
   20     47     400    2209     940
   12     14     144     196     168
   29     68     841    4624    1972
  199    408    4681   20510    9661   (column totals: Σx, Σy, Σx², Σy², Σxy)
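The column totals in the table above can be reproduced with a few lines of Python. This is only a checking sketch; the variable names are illustrative.

```python
# Example 1 data: sales calls (x) and units sold (y)
calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]

sum_x = sum(calls)                                # Σx
sum_y = sum(units)                                # Σy
sum_x2 = sum(x * x for x in calls)                # Σx²
sum_y2 = sum(y * y for y in units)                # Σy²
sum_xy = sum(x * y for x, y in zip(calls, units)) # Σxy

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 199 408 4681 20510 9661
```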
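We calculate the coefficient of correlation from the following formula.

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$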
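Applied to the Example 1 totals (n = 10, Σx = 199, Σy = 408, Σx² = 4681, Σy² = 20510, Σxy = 9661), the formula can be evaluated as in the sketch below. The slides do not state the resulting r for Example 1, so the printed value is a computed result rather than a quoted one.

```python
from math import sqrt

# Totals taken from the Example 1 table
n = 10
sum_x, sum_y = 199, 408
sum_x2, sum_y2, sum_xy = 4681, 20510, 9661

ss_xx = sum_x2 - sum_x ** 2 / n      # Σx² - (Σx)²/n
ss_yy = sum_y2 - sum_y ** 2 / n      # Σy² - (Σy)²/n
ss_xy = sum_xy - sum_x * sum_y / n   # Σxy - (Σx)(Σy)/n

r = ss_xy / sqrt(ss_xx * ss_yy)

print(round(ss_xx, 1), round(ss_yy, 1), round(ss_xy, 1))
print(round(r, 4))
```

The value of r is a little above 0.92, which points to a strong direct relationship between sales calls and units sold.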
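The slope b and intercept a of the regression line are

$$b = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\,\bar{x}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$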
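A minimal sketch of these two formulas, applied to the Example 1 sales data. The function name `least_squares_line` is hypothetical, and the slides do not report a and b for Example 1, so the printed values are computed here rather than quoted.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx                # slope
    a = sum_y / n - b * sum_x / n    # intercept: y-bar minus b times x-bar
    return a, b

# Example 1 data: sales calls (x) and units sold (y)
calls = [14, 35, 22, 29, 6, 15, 17, 20, 12, 29]
units = [28, 66, 38, 70, 22, 27, 28, 47, 14, 68]

a, b = least_squares_line(calls, units)
print(round(a, 4), round(b, 4))
```

The slope is roughly 2.14, so under this fitted line each additional sales call is associated with about two more units sold.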
Hour 12 9 16 3 15 5 16
GPA 3.52 3.31 3.75 2.10 4.00 1.69 3.74
[Figure: scatter diagram of the data above; horizontal axis ticks from 2 to 18, vertical axis ticks from 0.00 to 4.00]
r = 0.9252
There is a strong, positive correlation between GPA and the number of hours studied by students.
The Least Squares Regression Line is

$$\hat{y} = 1.4703 + 0.1555x$$

with a = 1.4703 and b = 0.1555.
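The figures quoted for the hours and GPA data can be checked with the short sketch below, which recomputes r, b and a from the seven data pairs. The final prediction for a student who studies 10 hours is a hypothetical input added for illustration; it is not part of the slides.

```python
from math import sqrt

hours = [12, 9, 16, 3, 15, 5, 16]
gpa = [3.52, 3.31, 3.75, 2.10, 4.00, 1.69, 3.74]

n = len(hours)
sum_x, sum_y = sum(hours), sum(gpa)
ss_xx = sum(x * x for x in hours) - sum_x ** 2 / n
ss_yy = sum(y * y for y in gpa) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(hours, gpa)) - sum_x * sum_y / n

r = ss_xy / sqrt(ss_xx * ss_yy)   # coefficient of correlation
b = ss_xy / ss_xx                 # slope
a = sum_y / n - b * sum_x / n     # intercept

print(round(r, 4), round(b, 4), round(a, 4))

# Hypothetical use of the fitted line: estimated GPA for 10 hours of study
predicted = a + b * 10
print(round(predicted, 2))
```

The rounded results agree with the values on the slide: r = 0.9252, b = 0.1555 and a = 1.4703.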