Correlation and Regression
UNIT - III
Unit-4
CORRELATION AND REGRESSION
Unit Structure :
4.0 Objectives
4.1 Introduction
4.2 Types of Correlation
4.3 Measurement of Correlation
4.4 Rank Correlation
4.5 Regression Analysis
4.0 OBJECTIVES
4.1 INTRODUCTION
Between these variables we can note that there exists some sort of
interrelationship or cause-and-effect relationship, i.e. a change in the
value of one variable brings about a change in the value of the other.
Such a relationship is called correlation.
Correlation analysis therefore gives an idea of the nature and extent of
the relationship between the two variables in bivariate data.
[Figure 4.2: graph of Y against X]
(iii) Perfect Negative Correlation:
If the graph of the values of the variables is a straight line with
negative slope, as shown in Figure 4.3, we say there is a perfect negative
correlation between X and Y. Here r = -1.
[Figure 4.3: graph of Y against X]
(iv) Imperfect Negative Correlation:
If the graph of the values of X and Y shows a band of points running from
the upper left corner to the lower right corner, as shown in Figure 4.4,
we say that there is an imperfect negative correlation. Here -1 < r < 0.
[Figure 4.4: graph of Y against X]
(v) Zero Correlation:
If the graph of the values of X and Y does not show any of the above
trends, we say that there is zero correlation between X and Y. Such a
graph can be a straight line parallel to one of the axes, as shown in
Figures 4.5 and 4.6, or may be completely scattered, as shown in Figure
4.7. Here r = 0.
[Figures 4.5, 4.6 and 4.7: graphs of Y against X]
Figure 4.5 shows that an increase in the values of Y has no effect on the
value of X, which remains the same; hence zero correlation. Figure 4.6
shows that an increase in the values of X has no effect on the value of Y,
which remains the same; hence zero correlation. Figure 4.7 shows that the
points are completely scattered on the graph and follow no particular
trend; hence there is no correlation, or zero correlation, between X and Y.
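The classification above can be expressed as a small lookup on the value of r. The following is a minimal sketch, not part of the original text: the function name `classify_correlation` and the tolerance `tol` are illustrative assumptions, and the two positive cases are assumed to mirror the negative ones (r = +1 and 0 < r < 1).

```python
def classify_correlation(r: float, tol: float = 1e-9) -> str:
    """Name the type of correlation indicated by a coefficient r.

    `tol` is a hypothetical tolerance for comparing floating-point values.
    """
    if not -1.0 - tol <= r <= 1.0 + tol:
        raise ValueError("r must lie between -1 and +1")
    if abs(r - 1.0) <= tol:
        return "perfect positive"
    if abs(r + 1.0) <= tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "zero"
    return "imperfect positive" if r > 0 else "imperfect negative"


for value in (-1.0, -0.68, 0.0):
    print(value, "->", classify_correlation(value))
```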
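$$\text{S.D.}(x) = \sigma_x = \sqrt{\frac{1}{n}\sum (x-\bar{x})^2} = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}$$
and
$$\text{S.D.}(y) = \sigma_y = \sqrt{\frac{1}{n}\sum (y-\bar{y})^2} = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}$$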
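Remark: We can also calculate this coefficient by using the formula
given by
$$r = \frac{\frac{1}{n}\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\frac{1}{n}\sum (x-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y-\bar{y})^2}} = \frac{\frac{\sum xy}{n} - \bar{x}\bar{y}}{\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}\,\sqrt{\frac{\sum y^2}{n} - \bar{y}^2}}$$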
SOLVED EXAMPLES :
  X      Y      XY      X²      Y²
 12      7      84     144      49
 10     14     140     100     196
 20      6     120     400      36
 13     12     156     169     144
 15     11     165     225     121
Totals: $\sum x = 70$, $\sum y = 50$, $\sum xy = 665$, $\sum x^2 = 1038$, $\sum y^2 = 546$ and $n = 5$.
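$$r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \, \sigma_y}$$
Where,
$$\bar{x} = \frac{\sum x}{n} = \frac{70}{5} = 14, \qquad \bar{y} = \frac{\sum y}{n} = \frac{50}{5} = 10$$
$$\mathrm{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{665}{5} - 14 \times 10 = 133 - 140 = -7$$
$$\sigma_x = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2} = \sqrt{\frac{1038}{5} - 14^2} = \sqrt{11.6} \approx 3.41$$
$$\sigma_y = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} = \sqrt{\frac{546}{5} - 10^2} = \sqrt{9.2} \approx 3.03$$
$$r = \frac{-7}{3.41 \times 3.03} \approx -0.68$$
$$\therefore r = -0.68$$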
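The calculation above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` is an illustrative choice, and the code uses the computational form of the formula (sums of products and squares divided by n).

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r = Cov(x, y) / (sd_x * sd_y), dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    sd_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return cov / (sd_x * sd_y)


# Data from the table of the solved example above.
xs = [12, 10, 20, 13, 15]
ys = [7, 14, 6, 12, 11]
r = pearson_r(xs, ys)
print(round(r, 2))  # -0.68
```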
Table of calculation:
Marks in     Marks in
Maths (X)    Accounts (Y)      XY       X²       Y²
   28           30            840      784      900
   25           40           1000      625     1600
   32           50           1600     1024     2500
   16           18            288      256      324
   20           25            500      400      625
   15           12            180      225      144
   19           11            209      361      121
   17           21            357      289      441
   40           45           1800     1600     2025
   30           35           1050      900     1225
Totals: $\sum x = 242$, $\sum y = 287$, $\sum xy = 7824$, $\sum x^2 = 6464$, $\sum y^2 = 9905$ and $n = 10$.
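$$\mathrm{Cov}(x,y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{7824}{10} - 24.2 \times 28.7$$
$$\sigma_x = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2} = \sqrt{\frac{6464}{10} - 24.2^2}$$
$$\sigma_y = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} = \sqrt{\frac{9905}{10} - 28.7^2}$$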
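The three expressions above can be evaluated directly from the column totals of the table. The sketch below is an illustration rather than part of the original solution; the variable names are arbitrary, and only the totals given in the table are used.

```python
import math

# Column totals taken from the table of calculation above.
n = 10
sum_x, sum_y = 242, 287
sum_xy, sum_x2, sum_y2 = 7824, 6464, 9905

mean_x = sum_x / n                      # 24.2
mean_y = sum_y / n                      # 28.7
cov_xy = sum_xy / n - mean_x * mean_y   # covariance
sd_x = math.sqrt(sum_x2 / n - mean_x ** 2)
sd_y = math.sqrt(sum_y2 / n - mean_y ** 2)
r = cov_xy / (sd_x * sd_y)

print(f"Cov(x, y) = {cov_xy:.2f}")
print(f"S.D.(x) = {sd_x:.2f}, S.D.(y) = {sd_y:.2f}")
print(f"r = {r:.2f}")
```

With these totals r comes out at about 0.87, i.e. a strong positive correlation between the marks in the two subjects.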
Remarks: We can note that the value of R always lies between -1 and +1.
A positive value of R indicates positive correlation (association) in the
rank allocation, whereas a negative value of R indicates negative
correlation (association) in the rank allocation.
SOLVED EXAMPLES:
Example 3
a) When ranks are given:-
The data given below are the ranks assigned by two judges to 8 participants.
Calculate the coefficient of rank correlation.
Example: The data given below are the scores assigned by two judges to 10
participants in a singing competition. Calculate Spearman's rank
correlation coefficient.
Note: In this example some of the ranks are fractional, e.g. 4.5, because
participants with equal scores share the average of the ranks they occupy.
To allow for these ties, in the calculation of R we add a correction
factor (C.F.) to $\sum d^2$, calculated as follows.
Repeated value    Frequency (m)    m(m² - 1)
     35                2           2 × (2² - 1) = 6
     28                2           6
     26                2           6
Total                              $\sum m(m^2-1) = 18$
$$\text{C.F.} = \frac{\sum (m^3 - m)}{12} = \frac{18}{12} = 1.5$$
$$\therefore \sum d^2 = 117.5 + 1.5 = 119$$
We use this value in the calculation of R.
Now Spearman's rank correlation coefficient is given by
$$R = 1 - \frac{6\sum d^2}{n(n^2-1)}$$
Substituting the values we get,
$$R = 1 - \frac{6 \times 119}{10(10^2-1)} = 1 - 0.72 = 0.28$$
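The correction for repeated values can be sketched in a few lines. The function below is an illustration only: the name `spearman_with_ties` and its parameters are assumptions, and it starts from the uncorrected sum of squared rank differences (117.5) rather than from the raw scores.

```python
def spearman_with_ties(sum_d2, n, tie_sizes):
    """Spearman's R with the correction factor added to the sum of d^2.

    sum_d2    : uncorrected sum of squared rank differences
    n         : number of pairs
    tie_sizes : frequency m of each repeated value
    """
    correction = sum(m ** 3 - m for m in tie_sizes) / 12
    corrected_sum_d2 = sum_d2 + correction
    return 1 - 6 * corrected_sum_d2 / (n * (n ** 2 - 1))


# Values from the example: three scores (35, 28, 26) are each repeated twice.
R = spearman_with_ties(sum_d2=117.5, n=10, tie_sizes=[2, 2, 2])
print(round(R, 2))  # 0.28
```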
EXERCISE I
10. Calculate the coefficient of rank correlation from the data given below.
X: 40 33 60 59 50 55 48
Y: 70 60 85 75 72 82 69
Marks in Maths:    15 20 28 12 40 60 20 80
Marks in Accounts: 40 30 50 30 20 10 25 60
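(i) Variable x depends on variable y. In this case we can find the value
of x if we know the value of y. This is called regression of x on y.
(ii) Variable y depends on variable x. In this case we can find the value
of y if we know the value of x. This is called regression of y on x.
Hence there are two regressions:
Regression of X on Y: $(x-\bar{x}) = b_{xy}(y-\bar{y})$, with $b_{xy} = \dfrac{\mathrm{Cov}(x,y)}{V(y)}$
Regression of Y on X: $(y-\bar{y}) = b_{yx}(x-\bar{x})$, with $b_{yx} = \dfrac{\mathrm{Cov}(x,y)}{V(x)}$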
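Where,
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}) = \frac{1}{n}\sum xy - \bar{x}\bar{y}$$
$$V(x) = \frac{1}{n}\sum (x-\bar{x})^2 = \frac{1}{n}\sum x^2 - \bar{x}^2 \quad \text{and} \quad V(y) = \frac{1}{n}\sum (y-\bar{y})^2 = \frac{1}{n}\sum y^2 - \bar{y}^2$$
Use: the regression of X on Y is used to find X; the regression of Y on X
is used to find Y.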
SOLVED EXAMPLES
Example 1:
Obtain the two regression equations and hence find the value of x when
y = 25.
Data:-
  X      Y      X²       Y²      XY
  8     15      64      225     120
 10     20     100      400     200
 12     30     144      900     360
 15     40     225     1600     600
 20     45     400     2025     900
Totals: $\sum x = 65$, $\sum y = 150$, $\sum x^2 = 933$, $\sum y^2 = 5150$, $\sum xy = 2180$ and $n = 5$.
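Where,
$$\bar{x} = \frac{1}{n}\sum x = \frac{65}{5} = 13 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{150}{5} = 30$$
Also,
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{2180}{5} - 13 \times 30 = 436 - 390 = 46$$
$$V(x) = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{933}{5} - 13^2 = 186.6 - 169 = 17.6$$
$$V(y) = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{5150}{5} - 30^2 = 1030 - 900 = 130$$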
Now we find,
Regression coefficient of X on Y:
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{V(y)} = \frac{46}{130}$$
Regression coefficient of Y on X:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{V(x)} = \frac{46}{17.6}$$
$$\therefore b_{xy} = 0.35 \quad \text{and} \quad b_{yx} = 2.61$$
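The steps of this example can be reproduced in code, including the estimate of x at y = 25 that the question asks for. This is a minimal sketch: the function name `regression_coefficients` is an illustrative assumption, and the estimate uses the x-on-y line in the form (x - mean of x) = bxy (y - mean of y).

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, bxy, byx) with Cov and V computed by dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return mean_x, mean_y, cov / var_y, cov / var_x


# Data of Example 1.
xs = [8, 10, 12, 15, 20]
ys = [15, 20, 30, 40, 45]
mean_x, mean_y, bxy, byx = regression_coefficients(xs, ys)

# Regression of x on y, evaluated at y = 25.
x_at_25 = mean_x + bxy * (25 - mean_y)

print(round(bxy, 2), round(byx, 2))  # 0.35 2.61
print(round(x_at_25, 2))
```

Using the unrounded coefficient gives x of about 11.23 when y = 25; with the rounded value 0.35 the same line gives 11.25.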
Remark:
From the above example we can note some points about regression
coefficients.
(i) Both regression coefficients carry the same sign (+ or -).
(ii) Both regression coefficients cannot be greater than 1 in magnitude
at the same time, e.g. -1.25 and -1.32 is not possible.
(iii) The product of the two regression coefficients $b_{xy}$ and $b_{yx}$
cannot exceed 1, i.e. $b_{xy} \times b_{yx} \le 1$. Here 0.35 × 2.61 = 0.91 < 1.
(Check this always.)
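These checks are easy to automate after computing the two coefficients. The following is a minimal sketch; the function name `check_regression_coefficients` and the dictionary keys are illustrative assumptions, not notation from the text.

```python
def check_regression_coefficients(bxy: float, byx: float) -> dict:
    """Apply the three checks from the remark to a pair of coefficients."""
    product = bxy * byx
    return {
        # Same sign: the product is positive (or both coefficients are zero).
        "same_sign": product > 0 or (bxy == 0 and byx == 0),
        # Both coefficients must not exceed 1 in magnitude simultaneously.
        "not_both_above_one": not (abs(bxy) > 1 and abs(byx) > 1),
        # The product equals r squared, so it cannot exceed 1.
        "product_at_most_one": product <= 1,
    }


ok = check_regression_coefficients(0.35, 2.61)      # values from Example 1
bad = check_regression_coefficients(-1.25, -1.32)   # the impossible pair
print(ok)
print(bad)
```

A pair that fails any of the checks usually signals an arithmetic slip, or that the two coefficients have been interchanged.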
Example 2:
Obtain the two regression equations and hence find the value of y when
x=10
Data:-
  X      Y      XY       X²      Y²
 12     25     300      144     625
 20     18     360      400     324
  8     17     136       64     289
 14     13     182      196     169
 16     15     240      256     225
Totals: $\sum x = 70$, $\sum y = 88$, $\sum xy = 1218$, $\sum x^2 = 1060$, $\sum y^2 = 1632$ and $n = 5$.
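Where,
$$\bar{x} = \frac{1}{n}\sum x = \frac{70}{5} = 14 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{88}{5} = 17.6$$
Also,
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{1218}{5} - 14 \times 17.6 = 243.6 - 246.4 = -2.8$$
$$V(x) = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1060}{5} - 14^2 = 212 - 196 = 16$$
$$V(y) = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{1632}{5} - 17.6^2 = 326.4 - 309.76 = 16.64$$
$$\therefore \mathrm{Cov}(x,y) = -2.8, \quad V(x) = 16, \quad V(y) = 16.64$$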
Now we find,
Regression coefficient of X on Y:
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{V(y)} = \frac{-2.8}{16.64}$$
Regression coefficient of Y on X:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{V(x)} = \frac{-2.8}{16}$$
$$\therefore b_{xy} = -0.168 \quad \text{and} \quad b_{yx} = -0.175$$
Example 3:
The following data give the experience of machine operators and
their performance rating, given by the number of good parts turned out per
100 pieces.
Operator:                1   2   3   4   5   6   7   8
Experience (in years):  16  12  18   4   3  10   5  12
Performance rating:     87  88  89  68  78  80  75  83
Obtain the two regression equations and estimate the performance rating of
an operator who has put in 15 years of service.
  X      Y       XY      X²       Y²
 16     87     1392     256     7569
 12     88     1056     144     7744
 18     89     1602     324     7921
  4     68      272      16     4624
  3     78      234       9     6084
 10     80      800     100     6400
  5     75      375      25     5625
 12     83      996     144     6889
Totals: $\sum x = 80$, $\sum y = 648$, $\sum xy = 6727$, $\sum x^2 = 1018$, $\sum y^2 = 52856$ and $n = 8$.
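Where,
$$\bar{x} = \frac{1}{n}\sum x = \frac{80}{8} = 10 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{648}{8} = 81$$
Also,
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum xy - \bar{x}\bar{y} = \frac{6727}{8} - 10 \times 81 = 840.875 - 810 = 30.875$$
$$V(x) = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{1018}{8} - 10^2 = 127.25 - 100 = 27.25$$
$$V(y) = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{52856}{8} - 81^2 = 6607 - 6561 = 46$$
$$\therefore \mathrm{Cov}(x,y) = 30.875, \quad V(x) = 27.25, \quad V(y) = 46$$
Now we find,
Regression coefficient of X on Y:
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{V(y)} = \frac{30.875}{46} \approx 0.67$$
Regression coefficient of Y on X:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{V(x)} = \frac{30.875}{27.25} \approx 1.13$$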
Now to estimate the performance rating (y) when experience (x) = 15, we use
the regression equation of y on x, $(y - 81) = 1.13(x - 10)$:
$$\therefore (y - 81) = 1.13(15 - 10)$$
$$\therefore y = 81 + 5.65 = 86.65$$
Hence the estimated performance rating for an operator with 15 years of
experience is 86.65, i.e. approximately 87.
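Also consider,
$$b_{xy} \times b_{yx} = r\frac{\sigma_x}{\sigma_y} \times r\frac{\sigma_y}{\sigma_x} = r^2, \quad \text{i.e.} \quad r = \pm\sqrt{b_{xy} \times b_{yx}}$$
where r takes the same sign as the two regression coefficients.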
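The relation between r and the two regression coefficients, including the choice of sign, can be sketched as follows. The function name `r_from_coefficients` is an illustrative assumption; the inputs are the covariances and variances found in Examples 1 and 2 above.

```python
import math


def r_from_coefficients(bxy: float, byx: float) -> float:
    """Recover r from the two regression coefficients, keeping their sign."""
    product = bxy * byx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    sign = -1.0 if (bxy < 0 or byx < 0) else 1.0
    return sign * math.sqrt(product)


# Example 1: Cov = 46, V(x) = 17.6, V(y) = 130 (positive coefficients).
r1 = r_from_coefficients(46 / 130, 46 / 17.6)
# Example 2: Cov = -2.8, V(x) = 16, V(y) = 16.64 (negative coefficients).
r2 = r_from_coefficients(-2.8 / 16.64, -2.8 / 16)

print(round(r1, 2), round(r2, 2))
```

The square root alone is always non-negative, so the sign has to be taken from the coefficients; in Example 2 this gives a small negative r.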
Example 5:
You are given the following information about advertising expenditure and sales:
----------------------------------------------------------------
             Exp. on Advertisement        Sales
             (Rs. in lakh)                (Rs. in lakh)
----------------------------------------------------------------
Mean               10                       90
S.D.                3                       12
----------------------------------------------------------------
The coefficient of correlation between sales and expenditure on
advertisement is 0.8. Obtain the two regression equations.
Therefore we have,
$\bar{x} = 10$, $\bar{y} = 90$, $\sigma_x = 3$, $\sigma_y = 12$ and $r = 0.8$
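Now, using the above results we can write the two regression equations as
$$(x-\bar{x}) = r\frac{\sigma_x}{\sigma_y}(y-\bar{y}) \qquad \text{x on y (i)}$$
$$(y-\bar{y}) = r\frac{\sigma_y}{\sigma_x}(x-\bar{x}) \qquad \text{y on x (ii)}$$
Substituting the values in the equations we get,
$$(x-10) = 0.8 \times \frac{3}{12}(y-90)$$
$$\text{i.e.} \quad x - 10 = 0.2(y-90) \qquad \text{x on y (i)}$$
$$\text{also} \quad (y-90) = 0.8 \times \frac{12}{3}(x-10)$$
$$\text{i.e.} \quad y - 90 = 3.2(x-10) \qquad \text{y on x (ii)}$$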
Now when the expenditure on advertisement (x) is 15, we can find the sales
from equation (ii) as,
$$y - 90 = 3.2(15 - 10)$$
$$\therefore y = 90 + 16 = 106$$
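When only the means, standard deviations and r are known, both regression lines follow directly from those five numbers. The sketch below illustrates this with the figures of the example; the function name `regression_lines` and the returned helper functions are assumptions made for illustration.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Build both regression lines from summary statistics.

    Returns (bxy, byx, x_on_y, y_on_x), where the last two are functions
    that estimate x from y and y from x respectively.
    """
    bxy = r * sd_x / sd_y
    byx = r * sd_y / sd_x

    def x_on_y(y):
        return mean_x + bxy * (y - mean_y)

    def y_on_x(x):
        return mean_y + byx * (x - mean_x)

    return bxy, byx, x_on_y, y_on_x


# Advertising expenditure (x) and sales (y), both in Rs. lakh.
bxy, byx, x_on_y, y_on_x = regression_lines(10, 90, 3, 12, 0.8)
print(round(bxy, 2), round(byx, 2))   # 0.2 3.2
print(round(y_on_x(15), 2))           # 106.0
```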
                        X      Y
Mean                   40     45
Standard deviation     10      9
Solution: We have,
$\bar{x} = 40$, $\bar{y} = 45$, $\sigma_x = 10$, $\sigma_y = 9$ and $r = 0.5$
Example 7:
Find the marks in Mathematics of a student who has scored 65 marks in
Accountancy, given,
Therefore we have,
$\bar{x} = 70$, $\bar{y} = 80$, $\sigma_x = 8$, $\sigma_y = 10$ and $r = 0.64$
Example 8:
From the following regression equations, find the means $\bar{x}$, $\bar{y}$, and $\sigma_x$, $\sigma_y$ and r:
3x - 2y - 10 = 0, 24x - 25y + 145 = 0
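One way to approach a problem of this kind is sketched below; it is an illustration under stated assumptions, not the worked solution of the text. Both regression lines pass through the point of means, so solving the two equations simultaneously gives the means. The line to be read as "x on y" is chosen so that the product of the two regression coefficients does not exceed 1. The function name `analyse_regression_pair` is hypothetical, each equation is passed as a tuple (a, b, c) meaning ax + by + c = 0, and all x and y coefficients are assumed non-zero.

```python
import math


def analyse_regression_pair(eq1, eq2):
    """Return (mean_x, mean_y, bxy, byx, r) for two regression lines."""
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines do not meet in a single point")
    # Cramer's rule: the intersection is the point of means.
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det

    # Option A: eq1 is x on y (slope -b1/a1), eq2 is y on x (slope -a2/b2).
    # Option B: the roles are interchanged.
    options = [(-b1 / a1, -a2 / b2), (-b2 / a2, -a1 / b1)]
    for bxy, byx in options:
        product = bxy * byx
        if 0 <= product <= 1:
            sign = -1.0 if bxy < 0 else 1.0
            return mean_x, mean_y, bxy, byx, sign * math.sqrt(product)
    raise ValueError("no valid assignment of the two regression lines")


mean_x, mean_y, bxy, byx, r = analyse_regression_pair((3, -2, -10), (24, -25, 145))
print(mean_x, mean_y, round(bxy, 3), round(byx, 3), round(r, 2))
```

For the two equations above this reading gives means of 20 and 25, with the first equation taken as x on y and r of about 0.8. The products for the two possible assignments are reciprocals of each other, so at most one of them can be at most 1.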
Example 9:
Find the mean values $\bar{x}$, $\bar{y}$ and r from the two regression equations
3x+2y-26=0 and x+y-31=0. Also find $\sigma_x$ when $\sigma_y = 3$.
$$-1.5 = -\frac{0.16 \times 3}{\sigma_x}$$
$$\therefore \sigma_x = \frac{0.48}{1.5} = 0.32$$
EXERCISES
1. What is meant by regression? Explain the use of regression in
statistical analysis.
2. Why are there two regressions? Justify.
3. State the difference between correlation and regression.
4. Obtain the two regression equations from the data given below.
X: 7 4 6 5 8
Y: 6 5 9 8 2
Hence estimate y when x = 10.
5. The data given below are the years of experience (x) and monthly
wages (y) for a group of workers. Obtain the two regression equations
and approximate the monthly wages of a worker who has completed
15 years of service.
Experience (in years):         11  7  9  5  8  6  10
Monthly wages (in '000 Rs.):   10  8  6  8  9  7  11
6. The following results are obtained for bivariate data. Obtain the two
regression equations and find y when x = 12.
Student no.:       1   2   3   4   5   6   7   8   9  10
Marks in Maths:   13  18   9   6  14  10  20  28  21  16
Marks in Stats:   12  25  11   7  16  12  24  25  22  20
8. The data given below are the price and demand for a certain
commodity over a period of 7 years. Find the regression equation of Price
on Demand and hence obtain the most likely demand for the commodity in the
year 2008, when its price is Rs. 23.
Price (in Rs.):       15  12  18   22   19   21   25
Demand (100 units):   89  86  90  105  100  110  115
12. Find the values of $\bar{x}$, $\bar{y}$ and r from the two regression equations given
below: 3x+2y-26=0 and 6x+y-31=0. Also find $\sigma_x$ when $\sigma_y = 3$.
14. The two regression equations for a certain data set were y = x+5 and
16x = 9y-94. Find the values of $\bar{x}$, $\bar{y}$ and r. Also find the S.D. of y when
the S.D. of x is 2.4.