© Ncert Not To Be Republished: Correlation
1. INTRODUCTION
2. TYPES OF RELATIONSHIP
3. TECHNIQUES FOR MEASURING CORRELATION
Types of Correlation
Scatter Diagram
Activity
$$\bar{X} = \frac{\sum X}{N}; \qquad \bar{Y} = \frac{\sum Y}{N}$$

$$\sigma_x^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{\sum X^2}{N} - \bar{X}^2$$
and

$$\sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{\sum Y^2}{N} - \bar{Y}^2$$
$$\mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum xy}{N}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are the deviations of the ith values of X and Y from their mean values respectively.
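The definitions above translate directly into code. The following is a minimal Python sketch, assuming population (divide-by-N) formulas as in the text; the function names are illustrative, and the data are the schooling (X) and yield (Y) figures of Example 1. It also checks that the two forms of the variance agree.

```python
def mean(values):
    # Arithmetic mean: ΣX / N
    return sum(values) / len(values)

def variance(values):
    # Population variance: Σ(X - X̄)² / N
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_shortcut(values):
    # Equivalent form: ΣX² / N - X̄²
    return sum(v * v for v in values) / len(values) - mean(values) ** 2

def covariance(xs, ys):
    # Cov(X, Y) = Σxy / N, with x = X - X̄ and y = Y - Ȳ
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

X = [0, 2, 4, 6, 8, 10, 12]   # years of schooling (Example 1)
Y = [4, 4, 6, 10, 10, 8, 7]   # annual yield per acre in '000 Rs (Example 1)

print(variance(X), variance_shortcut(X))  # 16.0 16.0
print(covariance(X, Y))                   # 6.0
```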
The sign of the covariance between X and Y determines the sign of the correlation coefficient, since the standard deviations are always positive (for variables that are not constant). If the covariance is zero, the correlation coefficient is also zero. The product moment correlation, or Karl Pearson's measure of correlation, is given by
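$$r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \ldots(1)$$

or

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} \qquad \ldots(2)$$

or

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \frac{(\sum X)^2}{N}}\,\sqrt{\sum Y^2 - \frac{(\sum Y)^2}{N}}} \qquad \ldots(3)$$

or

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \qquad \ldots(4)$$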
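As a sketch, assuming the Example 1 data, the four formulas can be coded side by side to confirm that they give the same r. The function names `r_formula1` to `r_formula4` are hypothetical labels, not from the text.

```python
from math import sqrt

def r_formula1(xs, ys):
    # (1): r = Σxy / (N σx σy), using population standard deviations
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sxy / (n * sx * sy)

def r_formula2(xs, ys):
    # (2): deviations from the means; the N's cancel
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_formula3(xs, ys):
    # (3): raw sums ΣXY, ΣX², ΣY² corrected by (ΣX)(ΣY)/N etc.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / (sqrt(sxx - sx ** 2 / n) * sqrt(syy - sy ** 2 / n))

def r_formula4(xs, ys):
    # (4): numerator and denominator of (3) multiplied by N
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]
for f in (r_formula1, r_formula2, r_formula3, r_formula4):
    print(f.__name__, round(f(X, Y), 3))  # each prints 0.644
```

Formula (4) avoids divisions until the last step, which is why it is convenient for hand calculation with whole numbers.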
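$$U = \frac{X - A}{B}; \qquad V = \frac{Y - C}{D}$$

where A and C are assumed means of X and Y respectively, and B and D are common factors (assumed positive; if exactly one of them is negative, the sign of r is reversed). Then

$$r_{XY} = r_{UV}$$

This property is used to calculate the correlation coefficient in a highly simplified manner, as in the step deviation method.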
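A minimal sketch of the change-of-origin-and-scale property, reusing the Example 1 data. The values chosen for A, B, C and D are hypothetical illustrations, not from the text.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    # Karl Pearson's r from deviations about the means (formula (2))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]

# Hypothetical assumed means (A, C) and positive common factors (B, D)
A, B, C, D = 6, 2, 7, 1
U = [(x - A) / B for x in X]
V = [(y - C) / D for y in Y]
print(isclose(pearson_r(X, Y), pearson_r(U, V)))  # True

# If exactly one factor is negative, only the sign of r changes
V_neg = [(y - C) / -D for y in Y]
print(round(pearson_r(U, V_neg), 3))  # -0.644
```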
As you read in Chapter 1, statistical methods are no substitute for common sense. Here is another example that highlights the need to understand the data properly.
Example 1
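No. of years of schooling of farmers:    0    2    4    6    8   10   12
Annual yield per acre in '000 Rs:        4    4    6   10   10    8    7

The quantities needed for $\sum xy$, $\sigma_x$ and $\sigma_y$ are calculated in Table 7.1.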
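$$\sigma_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{112}{7}}$$

$$\sigma_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{N}} = \sqrt{\frac{38}{7}}$$

Substituting these values in formula (1):

$$r = \frac{42/7}{\sqrt{112/7}\,\sqrt{38/7}} = 0.644$$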
The same value can be obtained
from formula (2) also.
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{42}{\sqrt{112}\,\sqrt{38}} = 0.644 \qquad \ldots(2)$$
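$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\sum X^2 - \frac{(\sum X)^2}{N}}\,\sqrt{\sum Y^2 - \frac{(\sum Y)^2}{N}}} \qquad \ldots(3)$$

Formula (3) works directly with the sums $\sum XY$, $\sum X^2$ and $\sum Y^2$. Here $\sum XY = 336$, $\sum X^2 = 364$ and $\sum Y^2 = 381$, so

$$r = \frac{336 - \frac{42 \times 49}{7}}{\sqrt{364 - \frac{(42)^2}{7}}\,\sqrt{381 - \frac{(49)^2}{7}}} = \frac{42}{\sqrt{112}\,\sqrt{38}} = 0.644$$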
TABLE 7.1
Calculation of r between years of schooling of farmers and annual yield

X = years of education; Y = annual yield per acre in '000 Rs

 X    X − X̄   (X − X̄)²    Y    Y − Ȳ   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
 0     −6        36       4     −3        9             18
 2     −4        16       4     −3        9             12
 4     −2         4       6     −1        1              2
 6      0         0      10      3        9              0
 8      2         4      10      3        9              6
10      4        16       8      1        1              4
12      6        36       7      0        0              0

ΣX = 42;  Σ(X − X̄)² = 112;  ΣY = 49;  Σ(Y − Ȳ)² = 38;  Σ(X − X̄)(Y − Ȳ) = 42 (the numerator)
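The columns and totals of Table 7.1 can be reproduced with a short Python sketch; the variable names are illustrative.

```python
from math import sqrt

X = [0, 2, 4, 6, 8, 10, 12]   # years of education
Y = [4, 4, 6, 10, 10, 8, 7]   # annual yield per acre in '000 Rs

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n   # 6.0 and 7.0

# One row per farmer: X, X - X̄, (X - X̄)², Y, Y - Ȳ, (Y - Ȳ)², (X - X̄)(Y - Ȳ)
rows = []
for x, y in zip(X, Y):
    dx, dy = x - x_bar, y - y_bar
    rows.append((x, dx, dx * dx, y, dy, dy * dy, dx * dy))

for row in rows:
    print(*row)

# Column totals; the plain deviation columns always sum to zero
totals = [sum(col) for col in zip(*rows)]
print("Totals:", totals)

# Formula (2) from the totals
r = totals[6] / (sqrt(totals[2]) * sqrt(totals[5]))
print(round(r, 3))  # 0.644
```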
Year
1992-93    14    24
1993-94    17    23
1994-95    18    26
1995-96    17    27
1996-97    16    25
1997-98    12    25
1998-99    16    23
1999-00    11    25
2000-01     8    24
2001-02    10    23
Activity
TABLE 7.2
a property of r: r is independent of change in origin and scale. Using this property to simplify the calculation is known as the step deviation method. It involves transforming the variables X and Y as follows:

$$U = \frac{X - A}{h}; \qquad V = \frac{Y - B}{k}$$

where A and B are assumed means, and h and k are common factors. Then $r_{UV} = r_{XY}$.
Price index (X):                  120    150    190    220    230
Money supply in Rs crores (Y):   1800   2000   2500   2700   3000
TABLE 7.3
U = (X − 100)/10;  V = (Y − 1700)/100

  U     V     U²     V²     UV
  2     1      4      1      2
  5     3     25      9     15
  9     8     81     64     72
 12    10    144    100    120
 13    13    169    169    169

ΣU = 41;  ΣV = 35;  ΣU² = 423;  ΣV² = 343;  ΣUV = 378
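$$r = \frac{\sum UV - \frac{(\sum U)(\sum V)}{N}}{\sqrt{\sum U^2 - \frac{(\sum U)^2}{N}}\,\sqrt{\sum V^2 - \frac{(\sum V)^2}{N}}} \qquad \ldots(3)$$

$$= \frac{378 - \frac{41 \times 35}{5}}{\sqrt{423 - \frac{(41)^2}{5}}\,\sqrt{343 - \frac{(35)^2}{5}}} = \frac{91}{\sqrt{86.8}\,\sqrt{98}} \approx 0.987$$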
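A sketch of the step deviation calculation in Table 7.3, assuming the transformation U = (X − 100)/10 and V = (Y − 1700)/100 used there. It also applies the same formula directly to X and Y to show that the two results agree; the helper name `r_from_sums` is hypothetical.

```python
from math import isclose, sqrt

def r_from_sums(xs, ys):
    # Formula (3): works on raw values, so it applies to X, Y or to U, V
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / (sqrt(sxx - sx ** 2 / n) * sqrt(syy - sy ** 2 / n))

X = [120, 150, 190, 220, 230]        # price index
Y = [1800, 2000, 2500, 2700, 3000]   # money supply in Rs crores

# Step deviations of Table 7.3: assumed means 100 and 1700, common factors 10 and 100
U = [(x - 100) / 10 for x in X]      # [2.0, 5.0, 9.0, 12.0, 13.0]
V = [(y - 1700) / 100 for y in Y]    # [1.0, 3.0, 8.0, 10.0, 13.0]

r_uv = r_from_sums(U, V)
r_xy = r_from_sums(X, Y)
print(round(r_uv, 3), round(r_xy, 3))  # 0.987 0.987
```

The transformed values are small whole numbers, which is the whole point of the method when working by hand.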
$$r_s = 1 - \frac{6\sum D^2}{n^3 - n} \qquad \ldots(4)$$

where n is the number of observations and D is the deviation of the ranks assigned to one variable from those assigned to the other variable. When ranks are repeated, the formula is

$$r_s = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)}$$

where $\frac{m_1^3 - m_1}{12}, \frac{m_2^3 - m_2}{12}, \ldots$ are the corresponding correction factors. A correction is needed for every repeated value of both variables: if three values are repeated, there will be a correction for each of them. In each correction term, m denotes the number of times the value concerned is repeated.
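A minimal sketch of the rank correlation formula with the correction for repeated ranks, assuming that rank 1 goes to the largest value and that tied values share the average of their positions. The helper names and the small data set are hypothetical.

```python
def average_ranks(values):
    # Rank 1 = largest value; a value repeated m times gets the average of its m positions
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def correction_factor(values):
    # Σ (m³ - m)/12 over distinct values; values that occur once add 0
    return sum((values.count(v) ** 3 - values.count(v)) / 12 for v in set(values))

def spearman_with_ties(xs, ys):
    # r_s = 1 - 6[ΣD² + corrections for both variables] / (n³ - n)
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * (sum_d2 + correction_factor(xs) + correction_factor(ys)) / (n ** 3 - n)

# Hypothetical data: 20 appears twice in xs, giving one correction of (2³ - 2)/12 = 0.5
xs = [10, 20, 20, 30]
ys = [1, 2, 3, 4]
print(average_ranks(xs))                      # [4.0, 2.5, 2.5, 1.0]
print(round(spearman_with_ties(xs, ys), 3))   # 0.9
```

When no value is repeated every correction is zero, so the same function reduces to formula (4).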
All the properties of the simple correlation coefficient are applicable here. Like the Pearsonian coefficient of correlation, it lies between −1 and 1. However, it is generally not as accurate as the ordinary method. This is due to the fact that all the information
Example 3
Ranks given by three judges:

Judge A   Judge B   Judge C
   1         2         1
   2         4         3
   3         1         5
   4         5         2
   5         3         4
The rank correlation between A and B is calculated as follows:

 A    B    D    D²
 1    2   −1    1
 2    4   −2    4
 3    1    2    4
 4    5   −1    1
 5    3    2    4
Total          ΣD² = 14

$$r_s = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 14}{5^3 - 5} = 1 - \frac{84}{120} = 1 - 0.7 = 0.3 \qquad \ldots(4)$$

The rank correlation between A and C is calculated as follows:

 A    C    D    D²
 1    1    0    0
 2    3   −1    1
 3    5   −2    4
 4    2    2    4
 5    4    1    1
Total          ΣD² = 10

$$r_s = 1 - \frac{6 \times 10}{5^3 - 5} = 1 - \frac{60}{120} = 0.5$$

Example 4
We are given the percentages of marks secured by 5 students in Economics and Statistics. The ranks have to be worked out first, and then the rank correlation is calculated.

Student   Marks in         Marks in        Ranking in          Ranking in
          Statistics (X)   Economics (Y)   Statistics (R_X)    Economics (R_Y)
A         85               60              1                   1
B         60               48              4                   5
C         55               49              5                   4
D         65               50              3                   3
E         75               55              2                   2
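The rank correlations in Examples 3 and 4 can be checked with formula (4). This sketch assumes no repeated ranks; the Example 4 value is computed here from the ranks in the table rather than quoted from the text, and the function names are hypothetical.

```python
def rank_correlation(r1, r2):
    # Formula (4): r_s = 1 - 6ΣD² / (n³ - n), valid when no ranks are repeated
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def ranks_descending(values):
    # Rank 1 goes to the highest mark; assumes no ties
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# Example 3: ranks given by judges A, B and C
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 4, 1, 5, 3]
judge_c = [1, 3, 5, 2, 4]
print(round(rank_correlation(judge_a, judge_b), 2))  # 0.3
print(round(rank_correlation(judge_a, judge_c), 2))  # 0.5

# Example 4: marks of students A to E
stats = [85, 60, 55, 65, 75]
econ = [60, 48, 49, 50, 55]
print(ranks_descending(stats), ranks_descending(econ))  # [1, 4, 5, 3, 2] [1, 5, 4, 3, 2]
print(round(rank_correlation(ranks_descending(stats), ranks_descending(econ)), 2))  # 0.9
```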
Example 5
X:   25   45   35   40   15   19   35   42
Y:   55   60   30   35   40   42   36   48

$$r_s = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)}$$

where $\frac{m_1^3 - m_1}{12}, \frac{m_2^3 - m_2}{12}, \ldots$ are their corresponding correction factors.

X has the value 35 at both the 4th and 5th rank. Hence both are given the average rank, i.e., $\frac{4 + 5}{2}$ = 4.5th rank.

 X    Y    Rank of X (R')   Rank of Y (R'')   D = R' − R''   D²
25   55         6                 2                4          16
45   60         1                 1                0           0
35   30         4.5               8               −3.5        12.25
40   35         3                 7               −4          16
15   40         8                 5                3           9
19   42         7                 4                3           9
35   36         4.5               6               −1.5         2.25
42   48         2                 3               −1           1
Total                                                   ΣD² = 65.5

$$r_s = 1 - \frac{6\left[\sum D^2 + \frac{m^3 - m}{12}\right]}{n^3 - n} \qquad \ldots(5)$$

Substituting the values of these expressions, with m = 2 for the repeated value 35:

$$\frac{m^3 - m}{12} = \frac{2^3 - 2}{12} = \frac{1}{2}$$

Using this equation,

$$r_s = 1 - \frac{6(65.5 + 0.5)}{8^3 - 8} = 1 - \frac{396}{504} = 1 - 0.786 = 0.214$$
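Example 5 can be reproduced in a short sketch, assuming the data as tabulated above, rank 1 for the largest value and average ranks for ties. The helper names are illustrative, not from the text.

```python
def average_ranks(values):
    # Rank 1 = largest value; a value repeated m times gets the average of its m positions
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def correction_factor(values):
    # Σ (m³ - m)/12 over distinct values; values that occur once add 0
    return sum((values.count(v) ** 3 - values.count(v)) / 12 for v in set(values))

X = [25, 45, 35, 40, 15, 19, 35, 42]
Y = [55, 60, 30, 35, 40, 42, 36, 48]

rx, ry = average_ranks(X), average_ranks(Y)              # R' and R''
d = [a - b for a, b in zip(rx, ry)]                      # D = R' - R''
sum_d2 = sum(v * v for v in d)                           # ΣD²
correction = correction_factor(X) + correction_factor(Y) # only 35 repeats (m = 2)

n = len(X)
r_s = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)      # formula (5)
print(rx)
print(d)
print(sum_d2, correction, round(r_s, 3))  # 65.5 0.5 0.214
```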
4. CONCLUSION
We have discussed some techniques
for studying the relationship between
Recap
EXERCISES
5. Of the following three measures, which can measure any type of relationship?
(i) Karl Pearson's coefficient of correlation
(ii) Spearman's rank correlation
(iii) Scatter diagram
8. Can r lie outside the −1 and 1 range depending on the type of data?
9. Does correlation imply causation?
10. When is rank correlation more precise than simple correlation coefficient?
11. Does zero correlation mean independence?
13. Collect the price of five vegetables from your local market every day for a
week. Calculate their correlation coefficients. Interpret the result.
14. Measure the height of your classmates. Ask them the height of their
benchmate. Calculate the correlation coefficient of these two variables.
Interpret the result.
15. List some variables where accurate measurement is difficult.
16. Interpret the values of r as 1, −1 and 0.
17. Why does the rank correlation coefficient differ from the Pearsonian correlation coefficient?
18. Calculate the correlation coefficient between the heights of fathers in inches (X) and their sons (Y):
X:   65   66   57   67   68   69   70   72
Y:   67   56   65   68   72   72   69   71
(Ans. r = 0.603)
1
1
1
1
2
4
3
9
3
6
4
8
5
10
7
14
8
16
Activity