Correlation and Regression


UNIT - III

Unit-4
CORRELATION AND REGRESSION
Unit Structure :

4.0 Objectives
4.1 Introduction
4.2 Types of Correlation
4.3 Measurement of Correlation
4.4 Rank Correlation
4.5 Regression Analysis

4.0 OBJECTIVES

• To understand the relationship between two relevant characteristics of a statistical unit.
• To learn how to obtain a numerical measure of the relationship between two variables.
• To use the mathematical relationship between two variables to estimate the value of one variable from the other.
• To use the mathematical relationship to obtain statistical constants such as means and S.D.s.

4.1 INTRODUCTION

In statistical analysis we often study two or more relevant characteristics together in terms of their interrelation or interdependence, e.g. the interrelationship among production, sales and profits of a company; among rainfall, fertilizers, yield and profits to farmers; or between the price and demand of a commodity.

When we collect information (data) on two such characteristics, it is called bivariate data. It is generally denoted by (X, Y), where X and Y are the variables representing the values of the two characteristics.
Following are some examples of bivariate data.
a) Income and expenditure of workers.
b) Marks of students in the two subjects of Maths and Accounts.
c) Heights of husband and wife in a couple.
d) Sales and profits of a company.

Between these variables we can note that there exists some sort of interrelationship (which may or may not be one of cause and effect), i.e. a change in the value of one variable is accompanied by a change in the value of the other variable. Such a relationship is called correlation.

Therefore, correlation analysis gives an idea of the nature and extent of the relationship between the two variables in bivariate data.

4.2 TYPES OF CORRELATION

There are two types of correlation:
a) Positive correlation and b) Negative correlation.

4.2.1 Positive correlation: When the relationship between the variables X and Y is such that an increase (or decrease) in X is accompanied by an increase (or decrease) in Y, i.e. there is a direct relation between X and Y, the correlation is said to be positive. In particular, when every change in X is matched by an exactly proportional change in Y in the same direction, the correlation is perfect and positive. e.g. Sales and Profits have positive correlation.

4.2.2 Negative correlation: When the relationship between the variables X and Y is such that an increase (or decrease) in X is accompanied by a decrease (or increase) in Y, i.e. there is an inverse relation between X and Y, the correlation is said to be negative. In particular, when every change in X is matched by an exactly proportional change in Y in the opposite direction, the correlation is perfect and negative. e.g. Price and Demand have negative correlation.

4.3 MEASUREMENT OF CORRELATION

The extent of correlation can be measured by any of the following methods:
• Scatter diagrams
• Karl Pearson’s co-efficient of correlation
• Spearman’s Rank correlation

4.3.1 Scatter Diagram: The scatter diagram is a chart prepared by plotting the values of X and Y as the points (X, Y) on a graph. The pattern of the points is used to explain the nature of the correlation, as follows. In each figure X is on the horizontal axis and Y on the vertical axis.

(i) Perfect Positive Correlation:
If the points lie on a straight line with positive slope, as shown in Figure 4.1, we say there is a perfect positive correlation between X and Y. Here r = 1.
[Fig. 4.1]

(ii) Imperfect Positive Correlation:
If the points form a band running from the lower left corner to the upper right corner, as shown in Figure 4.2, we say that there is an imperfect positive correlation. Here 0 < r < 1.
[Fig. 4.2]

(iii) Perfect Negative Correlation:
If the points lie on a straight line with negative slope, as shown in Figure 4.3, we say there is a perfect negative correlation between X and Y. Here r = –1.
[Fig. 4.3]

(iv) Imperfect Negative Correlation:
If the points form a band running from the upper left corner to the lower right corner, as shown in Figure 4.4, we say that there is an imperfect negative correlation. Here –1 < r < 0.
[Fig. 4.4]

(v) Zero Correlation:
If the points do not show any of the above trends, we say that there is zero correlation between X and Y. The graph can be a straight line parallel to one of the axes, as shown in Figures 4.5 and 4.6, or the points may be completely scattered, as shown in Figure 4.7. Here r = 0.
[Fig. 4.5] [Fig. 4.6] [Fig. 4.7]

Figure 4.5 shows that an increase in the values of Y has no effect on the value of X, which remains the same; hence zero correlation. Figure 4.6 shows that an increase in the values of X has no effect on the value of Y, which remains the same; hence zero correlation. (Strictly, when one variable is constant its standard deviation is zero, so the formula for r in 4.3.2 is undefined; such cases are treated here as zero correlation.) Figure 4.7 shows that the points are completely scattered on the graph and show no particular trend; hence there is no correlation, or zero correlation, between X and Y.

4.3.2 Karl Pearson’s co-efficient of correlation.


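This co-efficient provides a numerical measure of the correlation between the variables X and Y. It was suggested by Prof. Karl Pearson and is calculated by the formula

$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y}$$

where Cov(x, y) is the covariance between x and y, $\sigma_x$ is the standard deviation of x and $\sigma_y$ is the standard deviation of y.

Also,

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y}$$

$$\text{S.D.}(x) = \sigma_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}$$

$$\text{S.D.}(y) = \sigma_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2} = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}$$

Remark: We can also calculate this co-efficient directly by using the formula

$$r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\;\sqrt{\frac{1}{n}\sum (y - \bar{y})^2}} = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}\;\sqrt{\frac{\sum y^2}{n} - \bar{y}^2}}$$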

Pearson’s correlation co-efficient is also called the ‘product moment correlation co-efficient’.

Properties of correlation co-efficient ‘r’

The value of ‘r’ can be positive (+) or negative (–).
The value of ‘r’ always lies between –1 and +1, i.e. –1 ≤ r ≤ +1.

Significance of ‘r’ equal to –1, +1 and 0

When ‘r’ = +1, the correlation is perfect and positive.
When ‘r’ = –1, the correlation is perfect and negative.
When there is no correlation, ‘r’ = 0.
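The extreme values of ‘r’ can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `pearson_r` and the two illustrative data sets (points on a rising and on a falling straight line) are assumptions made for the demonstration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r = Cov(x, y) / (sigma_x * sigma_y), using the 1/n formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 3 for x in xs]     # illustrative points on a line with positive slope
falling = [10 - 3 * x for x in xs]   # illustrative points on a line with negative slope

r_rising = pearson_r(xs, rising)     # expected to be +1 up to rounding
r_falling = pearson_r(xs, falling)   # expected to be -1 up to rounding
print(round(r_rising, 6), round(r_falling, 6))
```

Any data lying exactly on a non-horizontal, non-vertical straight line should give r = +1 or r = –1 in this way, whatever the slope.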

SOLVED EXAMPLES :

Example 1: Calculate Karl Pearson’s correlation coefficient from the following data.
X: 12 10 20 13 15
Y: 7 14 6 12 11

Solution: Table of calculation,

X      Y      XY      X²      Y²
12     7      84      144     49
10     14     140     100     196
20     6      120     400     36
13     12     156     169     144
15     11     165     225     121
$\sum x = 70$, $\sum y = 50$, $\sum xy = 665$, $\sum x^2 = 1038$, $\sum y^2 = 546$ and n = 5

Pearson’s correlation coefficient r is given by

$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y}$$

where

$$\bar{x} = \frac{\sum x}{n} = \frac{70}{5} = 14, \qquad \bar{y} = \frac{\sum y}{n} = \frac{50}{5} = 10$$

$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{665}{5} - 14 \times 10 = 133 - 140 = -7$$

$$\sigma_x = \sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2} = \sqrt{\frac{1038}{5} - 14^2} = \sqrt{11.6} = 3.41$$

$$\sigma_y = \sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2} = \sqrt{\frac{546}{5} - 10^2} = \sqrt{9.2} = 3.03$$

∴ Cov(x, y) = –7, $\sigma_x = 3.41$ and $\sigma_y = 3.03$

Substituting the values in the formula for r we get

$$r = \frac{-7}{3.41 \times 3.03} = -0.68$$

∴ r = –0.68
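The calculation above can be reproduced from the column totals alone. The sketch below is illustrative only; the function name `pearson_from_sums` and its parameter names are assumptions, not notation from the text.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """r from the totals of the calculation table (shortcut formulas)."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    cov = sum_xy / n - mean_x * mean_y        # Cov(x, y)
    sd_x = sqrt(sum_x2 / n - mean_x ** 2)     # sigma_x
    sd_y = sqrt(sum_y2 / n - mean_y ** 2)     # sigma_y
    return cov / (sd_x * sd_y)

X = [12, 10, 20, 13, 15]
Y = [7, 14, 6, 12, 11]

# Build the same totals as the last row of the table
totals = dict(
    n=len(X),
    sum_x=sum(X),
    sum_y=sum(Y),
    sum_xy=sum(x * y for x, y in zip(X, Y)),
    sum_x2=sum(x * x for x in X),
    sum_y2=sum(y * y for y in Y),
)
r_example1 = pearson_from_sums(**totals)
print(totals)
print(round(r_example1, 2))
```

Working from totals is convenient for exercises that supply only n and the sums rather than the raw pairs.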

Example 2: Let us calculate the co-efficient of correlation between the marks of students in the subjects of Maths and Accounts in a certain test.

Table of calculation:

Marks in    Marks in
Maths       Accounts
X           Y           XY       X²       Y²
28          30          840      784      900
25          40          1000     625      1600
32          50          1600     1024     2500
16          18          288      256      324
20          25          500      400      625
15          12          180      225      144
19          11          209      361      121
17          21          357      289      441
40          45          1800     1600     2025
30          35          1050     900      1225
$\sum x = 242$, $\sum y = 287$, $\sum xy = 7824$, $\sum x^2 = 6464$, $\sum y^2 = 9905$ and n = 10

Now Pearson’s co-efficient of correlation is given by the formula

$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y}$$

where

$$\bar{x} = \frac{\sum x}{n} = \frac{242}{10} = 24.2, \qquad \bar{y} = \frac{\sum y}{n} = \frac{287}{10} = 28.7$$

$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{7824}{10} - 24.2 \times 28.7 = 782.4 - 694.54 = 87.86$$

$$\sigma_x = \sqrt{\frac{6464}{10} - 24.2^2} = \sqrt{60.76} = 7.79, \qquad \sigma_y = \sqrt{\frac{9905}{10} - 28.7^2} = \sqrt{166.81} = 12.92$$

∴ Cov(x, y) = 87.86, $\sigma_x = 7.79$ and $\sigma_y = 12.92$

Substituting the values in the formula for r we get

$$r = \frac{87.86}{7.79 \times 12.92} = 0.87$$

∴ r = 0.87
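As a cross-check, the covariance can be computed both from deviations about the means and from the shortcut form; the two should agree. This is a small illustrative sketch using the marks of this example; the variable names are assumptions.

```python
from math import sqrt

maths = [28, 25, 32, 16, 20, 15, 19, 17, 40, 30]
accounts = [30, 40, 50, 18, 25, 12, 11, 21, 45, 35]

n = len(maths)
mean_x = sum(maths) / n
mean_y = sum(accounts) / n

# Deviation form: (1/n) * sum of (x - mean_x)(y - mean_y)
cov_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(maths, accounts)) / n
# Shortcut form: (1/n) * sum of xy, minus mean_x * mean_y
cov_short = sum(x * y for x, y in zip(maths, accounts)) / n - mean_x * mean_y

sd_x = sqrt(sum((x - mean_x) ** 2 for x in maths) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in accounts) / n)

r_example2 = cov_dev / (sd_x * sd_y)
print(round(cov_dev, 2), round(cov_short, 2), round(r_example2, 2))
```

The shortcut form is usually quicker by hand, while the deviation form tends to be less sensitive to rounding when the means are large.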

4.4 RANK CORRELATION

In many practical situations we do not have scores on the characteristics, but only the ranks (order of preference) decided by two or more observers. Suppose a singing competition of 10 participants is judged by two judges A and B, who rank or assign scores to the participants on the basis of their performance. It is quite possible that the ranks or scores assigned are not equal for all the participants. A difference in the ranks or scores assigned indicates a difference of opinion between the judges. Rank correlation studies the association in the ranking of the observations by two or more observers. The extent of association in the rank allocation by the two judges is measured by the co-efficient of rank correlation ‘R’. This co-efficient was developed by the British psychologist Charles Edward Spearman in 1904.

Mathematically, Spearman’s rank correlation co-efficient is defined as

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = rank difference and n = number of pairs.

Remarks: The value of ‘R’ always lies between –1 and +1. A positive value of ‘R’ indicates positive correlation (association) in the rank allocation, whereas a negative value of ‘R’ indicates negative correlation (association) in the rank allocation.

SOLVED EXAMPLES:

Example 3
a) When ranks are given:-
The data given below are the ranks assigned by two judges to 8 participants. Calculate the co-efficient of rank correlation.

Participant    Ranks by Judge    Rank difference
No.            A       B         square d²
1              5       4         (5–4)² = 1
2              6       8         4
3              7       1         36
4              1       7         36
5              8       5         9
6              2       6         16
7              3       2         1
8              4       3         1
N = 8                  Total     $\sum d^2 = 104$

Spearman’s rank correlation co-efficient is given by

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Substituting the values from the table we get,

$$R = 1 - \frac{6 \times 104}{8(8^2 - 1)} = 1 - \frac{624}{504} = -0.24$$

The value of the correlation co-efficient is –0.24. This indicates that there is a negative association in the rank allocation by the two judges A and B.
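The substitution above can be checked with a short script. This is a minimal sketch; the function name `spearman_from_ranks` is an assumption, and it presumes that the ranks contain no ties.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """R = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks.

    Returns the coefficient together with sum(d^2)."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

# Ranks given by judges A and B to the 8 participants
judge_a = [5, 6, 7, 1, 8, 2, 3, 4]
judge_b = [4, 8, 1, 7, 5, 6, 2, 3]

R_example3, sum_d2 = spearman_from_ranks(judge_a, judge_b)
print(sum_d2, round(R_example3, 3))
```

With these ranks the sum of squared differences is 104 and R is about –0.238, i.e. a weak negative association.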

b) When scores are given:-

Example 4
The data given below are the marks given by two Examiners to a set of 10 students in an aptitude test. Calculate Spearman’s Rank correlation co-efficient ‘R’.

Student    Marks by Examiner    Ranks          Rank difference
No.        A       B            RA     RB      square d²
1          85      80           2      2       0
2          56      60           8      7       1
3          45      50           10     10      0
4          65      62           6      6       0
5          96      90           1      1       0
6          52      55           9      8       1
7          80      75           3      4       1
8          75      68           5      5       0
9          78      77           4      3       1
10         60      53           7      9       4
N = 10                                 Total   $\sum d^2 = 8$

Now Spearman’s rank correlation co-efficient is given by

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Substituting the values from the table we get,

$$R = 1 - \frac{6 \times 8}{10(10^2 - 1)} = 1 - 0.0485 = 0.95$$

The value of the correlation co-efficient is +0.95. This indicates that there is a strong positive association in the assessments of the two examiners, A and B.

c) Case of repeated values:-

It is quite possible that two participants are assigned the same score by a judge. In such cases the rank allocation and the calculation of rank correlation can be explained as follows.

Example: The data given below are the scores assigned by two judges to 10 participants in a singing competition. Calculate Spearman’s Rank correlation co-efficient.

Participant    Score assigned by Judges    Ranks            Rank difference
No.            A       B                   RA      RB       square d²
1              28      35                  8.5     6        (8.5–6)² = 6.25
2              40      26                  3       9.5      42.25
3              35      42                  4.5     3        2.25
4              25      26                  10      9.5      0.25
5              28      33                  8.5     7        2.25
6              35      45                  4.5     2        6.25
7              50      32                  1       8        49
8              48      51                  2       1        1
9              32      39                  6       4        4
10             30      36                  7       5        4
N = 10                                             Total    $\sum d^2 = 117.5$

Explanation:- In the columns of A and B there is repetition of scores, so while assigning the ranks we first rank the scores as if they were all different and then give each repeated score the average of the ranks it occupies.
e.g. In column A the score 35 appears 2 times, at positions 4 and 5 in the order of ranking, so we calculate the average rank as (4+5)/2 = 4.5. Hence the rank assigned to each is 4.5. The other repeated scores are ranked in the same manner.

Note: In this example some of the ranks are fractional, e.g. 4.5, because of the averaging. To allow for the repeated values, in the calculation of ‘R’ we add a correction factor (C.F.) to $\sum d^2$, calculated as follows.

Table of correction factor (C.F.)

Value       Frequency
repeated    m            m(m²–1)
35          2            2×(2²–1) = 6
28          2            6
26          2            6
                 Total   $\sum m(m^2 - 1) = 18$

$$\text{C.F.} = \frac{\sum (m^3 - m)}{12} = \frac{18}{12} = 1.5$$

∴ $\sum d^2 = 117.5 + 1.5 = 119$
We use this value in the calculation of ‘R’.
Now Spearman’s rank correlation co-efficient is given by

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Substituting the values we get,

$$R = 1 - \frac{6 \times 119}{10(10^2 - 1)} = 1 - 0.72 = 0.28$$
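The average-rank procedure and the correction factor described above can be sketched as follows. The helper names `average_ranks`, `correction_factor` and `spearman_with_ties` are assumptions made for illustration; the scores are those of the two judges in this example.

```python
def average_ranks(scores):
    """Rank from highest (rank 1) to lowest; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1           # first position occupied by this score
        count = ordered.count(value)               # number of repetitions (m)
        rank_of[value] = first + (count - 1) / 2   # average of the occupied positions
    return [rank_of[s] for s in scores]

def correction_factor(*score_lists):
    """C.F. = sum of (m^3 - m) over every repeated value, divided by 12."""
    total = 0
    for scores in score_lists:
        for value in set(scores):
            m = scores.count(value)
            if m > 1:
                total += m ** 3 - m
    return total / 12

def spearman_with_ties(scores_a, scores_b):
    ranks_a = average_ranks(scores_a)
    ranks_b = average_ranks(scores_b)
    n = len(scores_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    adjusted = sum_d2 + correction_factor(scores_a, scores_b)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1)), sum_d2, adjusted

judge_a = [28, 40, 35, 25, 28, 35, 50, 48, 32, 30]
judge_b = [35, 26, 42, 26, 33, 45, 32, 51, 39, 36]

R_ties, sum_d2, adjusted = spearman_with_ties(judge_a, judge_b)
print(average_ranks(judge_a))
print(sum_d2, adjusted, round(R_ties, 2))
```

Note that m(m²–1) and m³–m are the same quantity, so the table of correction factors and the formula for C.F. agree.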

EXERCISE I

1. What is meant by correlation? Explain the types of correlation with suitable examples.

2. What is a scatter diagram? Draw different scatter diagrams to explain the correlation between two variables x and y.

3. State the significance of ‘r’ = +1, –1 and 0.

4. Calculate the coefficient of correlation r from the following data.
X: 18 12 16 14 10 15 17 13
Y: 9 13 20 15 11 24 26 22

5. The following table gives the price and demand of a certain commodity over a period of 8 months. Calculate Pearson’s coefficient of correlation.
Price: 15 12 23 25 18 17 11 19
Demand: 45 30 60 65 48 45 28 50

6. The following results are obtained on certain bivariate data.
(i) n = 10, $\sum x = 75$, $\sum y = 70$, $\sum x^2 = 480$, $\sum y^2 = 600$, $\sum xy = 540$
(ii) n = 15, $\sum x = 60$, $\sum y = 85$, $\sum x^2 = 520$, $\sum y^2 = 1200$, $\sum xy = -340$
Calculate Pearson’s correlation coefficient in each case.

7. The following results are available on certain bivariate data:
(i) $\sum (x - \bar{x})(y - \bar{y}) = 120$, $\sum (x - \bar{x})^2 = 150$, $\sum (y - \bar{y})^2 = 145$
(ii) $\sum (x - \bar{x})(y - \bar{y}) = -122$, $\sum (x - \bar{x})^2 = 136$, $\sum (y - \bar{y})^2 = 148$
Find the correlation coefficient.

8. Calculate Pearson’s coefficient of correlation from the given information on a bivariate series:
No. of pairs: 25
Sum of x values: 300
Sum of y values: 375
Sum of squares of x values: 9000
Sum of squares of y values: 6500
Sum of the products of x and y values: 4000

9. The ranks assigned to 8 participants by two judges are as follows. Calculate Spearman’s Rank correlation coefficient ‘R’.
Participant No.:    1 2 3 4 5 6 7 8
Ranks by Judge I:   5 3 4 6 1 8 7 2
Ranks by Judge II:  6 8 3 7 1 5 4 2

10. Calculate the coefficient of rank correlation from the data given below.
X: 40 33 60 59 50 55 48
Y: 70 60 85 75 72 82 69

11. Marks given by two Judges to a group of 10 participants are as follows. Calculate the coefficient of rank correlation.
Marks by Judge A: 52 53 42 60 45 41 37 38 25 27
Marks by Judge B: 65 68 43 38 77 48 35 30 25 50

12. An examination of 8 applicants for a clerical post was conducted by a bank. The marks obtained by the applicants in the subjects of Mathematics and Accountancy were as follows. Calculate the rank correlation coefficient.
Applicant:          A  B  C  D  E  F  G  H
Marks in Maths:     15 20 28 12 40 60 20 80
Marks in Accounts:  40 30 50 30 20 10 25 60

4.5 REGRESSION ANALYSIS

While correlation analysis studies the nature and extent of the interrelationship between the two variables X and Y, regression analysis helps us to estimate or approximate the value of one variable when we know the value of the other. Therefore we can define ‘Regression’ as the estimation (prediction) of one variable from the other variable when the two are correlated. e.g. We can estimate the demand for a commodity if we know its price.

Why are there two regressions?

When the variables X and Y are correlated there are two possibilities:

(i) Variable X depends on variable Y. In this case we can find the value of x if we know the value of y. This is called regression of x on y.
(ii) Variable Y depends on variable X. In this case we can find the value of y if we know the value of x. This is called regression of y on x.

Hence there are two regressions,

(a) Regression of X on Y; (b) Regression of Y on X.

4.5.1 Formulas on Regression equations

                         Regression of X on Y                        Regression of Y on X
Assumption:              X depends on Y                              Y depends on X
Regression equation:     $(x - \bar{x}) = b_{xy}(y - \bar{y})$       $(y - \bar{y}) = b_{yx}(x - \bar{x})$
Regression co-efficient: $b_{xy} = \frac{\mathrm{Cov}(x, y)}{V(y)}$  $b_{yx} = \frac{\mathrm{Cov}(x, y)}{V(x)}$
Use:                     To find X                                   To find Y

Where,

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y}$$

$$V(x) = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum x^2 - \bar{x}^2 \quad \text{and} \quad V(y) = \frac{1}{n}\sum (y - \bar{y})^2 = \frac{1}{n}\sum y^2 - \bar{y}^2$$

SOLVED EXAMPLES

Example 1:
Obtain the two regression equations and hence find the value of x when y = 25.
Data:-
X      Y      X²      Y²       XY
8      15     64      225      120
10     20     100     400      200
12     30     144     900      360
15     40     225     1600     600
20     45     400     2025     900
$\sum x = 65$, $\sum y = 150$, $\sum x^2 = 933$, $\sum y^2 = 5150$, $\sum xy = 2180$ and n = 5

Now the two regression equations are,

$(x - \bar{x}) = b_{xy}(y - \bar{y})$ ------- x on y (i)
$(y - \bar{y}) = b_{yx}(x - \bar{x})$ ------- y on x (ii)

Where,

$$\bar{x} = \frac{1}{n}\sum x = \frac{65}{5} = 13 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{150}{5} = 30$$

Also,

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y} = \frac{2180}{5} - 13 \times 30 = 436 - 390 = 46$$

$$V(x) = \frac{1}{n}\sum x^2 - \bar{x}^2 = \frac{933}{5} - 13^2 = 186.6 - 169 = 17.6$$

$$V(y) = \frac{1}{n}\sum y^2 - \bar{y}^2 = \frac{5150}{5} - 30^2 = 1030 - 900 = 130$$

∴ Cov(x, y) = 46, V(x) = 17.6, V(y) = 130

Now we find,

Regression co-efficient of X on Y: $b_{xy} = \frac{\mathrm{Cov}(x, y)}{V(y)} = \frac{46}{130} = 0.35$
Regression co-efficient of Y on X: $b_{yx} = \frac{\mathrm{Cov}(x, y)}{V(x)} = \frac{46}{17.6} = 2.61$

∴ $b_{xy} = 0.35$ and $b_{yx} = 2.61$

Now substituting the values of $\bar{x}$, $\bar{y}$, $b_{xy}$ and $b_{yx}$ in the regression equations we get,
(x – 13) = 0.35(y – 30) ------- x on y (i)
(y – 30) = 2.61(x – 13) ------- y on x (ii)
as the two regression equations.

Now to estimate x when y = 25, we use the regression equation of x on y:
∴ (x – 13) = 0.35(25 – 30)
∴ x = 13 – 1.75 = 11.25
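The same steps can be carried out programmatically. The sketch below is illustrative; the names `regression_coefficients`, `estimate_x` and `estimate_y` are assumptions. Because it keeps full precision for the coefficients, the estimate differs slightly from the hand calculation, which rounds $b_{xy}$ to 0.35 first.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, bxy, byx) using the 1/n formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return mean_x, mean_y, cov / var_y, cov / var_x

X = [8, 10, 12, 15, 20]
Y = [15, 20, 30, 40, 45]
mean_x, mean_y, bxy, byx = regression_coefficients(X, Y)

def estimate_x(y):
    """Regression of x on y: x - mean_x = bxy * (y - mean_y)."""
    return mean_x + bxy * (y - mean_y)

def estimate_y(x):
    """Regression of y on x: y - mean_y = byx * (x - mean_x)."""
    return mean_y + byx * (x - mean_x)

x_at_25 = estimate_x(25)
print(round(bxy, 2), round(byx, 2), round(x_at_25, 2))
```

The line of x on y is used to estimate x, and the line of y on x to estimate y; the two lines are not interchangeable unless the correlation is perfect.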

Remark:
From the above example we can note some points about regression coefficients.
• Both the regression coefficients carry the same sign (+ or –).
• Both the regression coefficients cannot be numerically greater than 1 at the same time (e.g. –1.25 and –1.32 is not possible).
• The product of the two regression coefficients $b_{xy}$ and $b_{yx}$ can never exceed 1, i.e. $b_{xy} \times b_{yx} \le 1$. Here 0.35 × 2.61 = 0.91 < 1. (Check this always.)
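These checks are easy to automate when verifying an answer. The following is a hedged sketch; the function name `check_regression_coefficients` and the second and third pairs of test values are assumptions chosen only to illustrate a failing case.

```python
def check_regression_coefficients(bxy, byx):
    """Check the two basic properties of a pair of regression coefficients."""
    product = bxy * byx
    same_sign = product >= 0             # both positive, both negative, or zero
    product_at_most_one = product <= 1   # the product can never exceed 1
    return {
        "product": product,
        "same_sign": same_sign,
        "product_at_most_one": product_at_most_one,
        "valid": same_sign and product_at_most_one,
    }

example = check_regression_coefficients(0.35, 2.61)      # values from the example above
too_large = check_regression_coefficients(-1.25, -1.32)  # both exceed 1 numerically
mixed_signs = check_regression_coefficients(0.5, -0.4)   # hypothetical pair with opposite signs

print(example["valid"], too_large["valid"], mixed_signs["valid"])
```

A pair that fails either check usually points to an arithmetic slip or to the two equations having been assigned the wrong way round.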

Example 2:
Obtain the two regression equations and hence find the value of y when x = 10.
Data:-
X      Y      XY      X²      Y²
12     25     300     144     625
20     18     360     400     324
8      17     136     64      289
14     13     182     196     169
16     15     240     256     225
$\sum x = 70$, $\sum y = 88$, $\sum xy = 1218$, $\sum x^2 = 1060$, $\sum y^2 = 1632$ and n = 5

Now the two regression equations are,

$(x - \bar{x}) = b_{xy}(y - \bar{y})$ ------- x on y (i)
$(y - \bar{y}) = b_{yx}(x - \bar{x})$ ------- y on x (ii)

Where,

$$\bar{x} = \frac{1}{n}\sum x = \frac{70}{5} = 14 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{88}{5} = 17.6$$

Also,

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y} = \frac{1218}{5} - 14 \times 17.6 = 243.6 - 246.4 = -2.8$$

$$V(x) = \frac{1060}{5} - 14^2 = 212 - 196 = 16, \qquad V(y) = \frac{1632}{5} - 17.6^2 = 326.4 - 309.76 = 16.64$$

∴ Cov(x, y) = –2.8, V(x) = 16, V(y) = 16.64

Now we find,

Regression co-efficient of X on Y: $b_{xy} = \frac{\mathrm{Cov}(x, y)}{V(y)} = \frac{-2.8}{16.64} = -0.168$
Regression co-efficient of Y on X: $b_{yx} = \frac{\mathrm{Cov}(x, y)}{V(x)} = \frac{-2.8}{16} = -0.175$

∴ $b_{xy} = -0.168$ and $b_{yx} = -0.175$

Now substituting the values of $\bar{x}$, $\bar{y}$, $b_{xy}$ and $b_{yx}$ in the regression equations we get,

(x – 14) = –0.168(y – 17.6) ------- x on y (i)
(y – 17.6) = –0.175(x – 14) ------- y on x (ii)
as the two regression equations.

Now to estimate y when x = 10, we use the regression equation of y on x:
∴ (y – 17.6) = –0.175(10 – 14)
∴ y = 17.6 + 0.7 = 18.3

Example 3:
The following data give the experience of machine operators and their performance rating, given by the number of good parts turned out per 100 pieces.

Operator:              1   2   3   4   5   6   7   8
Experience (in years): 16  12  18  4   3   10  5   12
Performance rating:    87  88  89  68  78  80  75  83

Obtain the two regression equations and estimate the performance rating of an operator who has put in 15 years of service.

Solution: We define the variables,
X: Experience, Y: Performance rating
Table of calculations:

X      Y      XY       X²      Y²
16     87     1392     256     7569
12     88     1056     144     7744
18     89     1602     324     7921
4      68     272      16      4624
3      78     234      9       6084
10     80     800      100     6400
5      75     375      25      5625
12     83     996      144     6889
$\sum x = 80$, $\sum y = 648$, $\sum xy = 6727$, $\sum x^2 = 1018$, $\sum y^2 = 52856$ and n = 8

Now the two regression equations are,

$(x - \bar{x}) = b_{xy}(y - \bar{y})$ ------- x on y (i)
$(y - \bar{y}) = b_{yx}(x - \bar{x})$ ------- y on x (ii)

Where,

$$\bar{x} = \frac{1}{n}\sum x = \frac{80}{8} = 10 \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum y = \frac{648}{8} = 81$$

Also,

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y} = \frac{6727}{8} - 10 \times 81 = 840.875 - 810 = 30.875$$

$$V(x) = \frac{1018}{8} - 10^2 = 127.25 - 100 = 27.25, \qquad V(y) = \frac{52856}{8} - 81^2 = 6607 - 6561 = 46$$

∴ Cov(x, y) = 30.875, V(x) = 27.25, V(y) = 46

Now we find,

Regression co-efficient of X on Y: $b_{xy} = \frac{\mathrm{Cov}(x, y)}{V(y)} = \frac{30.875}{46} = 0.67$
Regression co-efficient of Y on X: $b_{yx} = \frac{\mathrm{Cov}(x, y)}{V(x)} = \frac{30.875}{27.25} = 1.13$

∴ $b_{xy} = 0.67$ and $b_{yx} = 1.13$

Now substituting the values of $\bar{x}$, $\bar{y}$, $b_{xy}$ and $b_{yx}$ in the regression equations we get,
(x – 10) = 0.67(y – 81) ------- x on y (i)
(y – 81) = 1.13(x – 10) ------- y on x (ii)
as the two regression equations.

Now to estimate the performance rating (y) when experience (x) = 15, we use the regression equation of y on x:
∴ (y – 81) = 1.13(15 – 10)
∴ y = 81 + 5.65 = 86.65

Hence the estimated performance rating for an operator with 15 years of experience is 86.65, i.e. approximately 87.
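When only the column totals are at hand (as in several of the exercises), the regression coefficients can be computed from them directly. This is a minimal sketch under that assumption; the function name `regression_from_totals` and the dictionary keys are illustrative, and the totals are those of the operators' table.

```python
def regression_from_totals(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Means, covariance and both regression coefficients from table totals."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    cov = sum_xy / n - mean_x * mean_y
    var_x = sum_x2 / n - mean_x ** 2
    var_y = sum_y2 / n - mean_y ** 2
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "cov": cov,
        "bxy": cov / var_y,   # regression co-efficient of x on y
        "byx": cov / var_x,   # regression co-efficient of y on x
    }

fit = regression_from_totals(n=8, sum_x=80, sum_y=648,
                             sum_xy=6727, sum_x2=1018, sum_y2=52856)

# Estimated performance rating after 15 years, from the line of y on x
rating_at_15 = fit["mean_y"] + fit["byx"] * (15 - fit["mean_x"])
print(round(fit["bxy"], 2), round(fit["byx"], 2), round(rating_at_15, 2))
```

The rounded coefficients come out as 0.67 and 1.13, and the estimated rating rounds to 87.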

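4.5.2 Regression coefficients in terms of correlation coefficient.

We can also obtain the regression coefficients $b_{xy}$ and $b_{yx}$ from the standard deviations $\sigma_x$, $\sigma_y$ and the correlation coefficient ‘r’ using the formulas

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \quad \text{and} \quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Also consider,

$$b_{xy} \times b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \cdot r\,\frac{\sigma_y}{\sigma_x} = r^2, \quad \text{i.e.} \quad r = \pm\sqrt{b_{xy} \times b_{yx}}$$

Hence the correlation coefficient ‘r’ is, in magnitude, the geometric mean of the regression coefficients $b_{xy}$ and $b_{yx}$, and it carries their common sign.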
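The relation between r and the two regression coefficients can be sketched in code. The function name `r_from_regression_coefficients` is an assumption, and the numbers used (r = 0.8, S.D.s 3 and 12) are illustrative values only.

```python
from math import sqrt, copysign

def r_from_regression_coefficients(bxy, byx):
    """r = +/- sqrt(bxy * byx), taking the common sign of the coefficients."""
    product = bxy * byx
    if product < 0 or product > 1:
        raise ValueError("bxy and byx must have the same sign and a product of at most 1")
    return copysign(sqrt(product), byx)

# Illustrative values: build the coefficients from r and the S.D.s, then recover r
r, sd_x, sd_y = 0.8, 3, 12
bxy = r * sd_x / sd_y     # 0.2
byx = r * sd_y / sd_x     # 3.2
r_recovered = r_from_regression_coefficients(bxy, byx)

# A negative pair gives a negative r
r_negative = r_from_regression_coefficients(-1 / 6, -1.5)
print(round(r_recovered, 4), round(r_negative, 4))
```

Taking the sign from the coefficients matters: the square root alone would always return a non-negative value.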

Example 5:
You are given the following information about advertising expenditure and sales:

---------------------------------------------------------------------------------------
          Exp. on Advertisement      Sales
          (Rs. in Lakh)              (Rs. in Lakh)
---------------------------------------------------------------------------------------
Mean      10                         90
S.D.      3                          12
---------------------------------------------------------------------------------------
The coefficient of correlation between sales and expenditure on advertisement is 0.8. Obtain the two regression equations.

Find the likely sales when the advertisement budget is Rs. 15 Lakh.

Solution: We define the variables,
X: Expenditure on advertisement
Y: Sales achieved.

Therefore we have,
$\bar{x} = 10$, $\bar{y} = 90$, $\sigma_x = 3$, $\sigma_y = 12$ and r = 0.8

Now, using the above results we can write the two regression equations as

$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ------- x on y (i)
$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ------- y on x (ii)

Substituting the values in the equations we get,
$(x - 10) = 0.8 \times \frac{3}{12}(y - 90)$
i.e. x – 10 = 0.2(y – 90) ------- x on y (i)

also $(y - 90) = 0.8 \times \frac{12}{3}(x - 10)$
i.e. y – 90 = 3.2(x – 10) ------- y on x (ii)

Now when the expenditure on advertisement (x) is 15, we can find the sales from eqn (ii) as,
y – 90 = 3.2(15 – 10)
∴ y = 90 + 16 = 106

Thus the likely sales are Rs. 106 Lakh.
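When only the means, standard deviations and r are known, both regression lines follow directly. The sketch below is illustrative; the function name `regression_lines` and the returned helper functions are assumptions, and the figures are those of the advertising example.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Build both regression lines from summary statistics."""
    bxy = r * sd_x / sd_y
    byx = r * sd_y / sd_x

    def x_on_y(y):
        return mean_x + bxy * (y - mean_y)

    def y_on_x(x):
        return mean_y + byx * (x - mean_x)

    return bxy, byx, x_on_y, y_on_x

# Advertising example: means 10 and 90, S.D.s 3 and 12, r = 0.8
bxy, byx, x_on_y, y_on_x = regression_lines(10, 90, 3, 12, 0.8)
sales_at_15 = y_on_x(15)
print(round(bxy, 2), round(byx, 2), round(sales_at_15, 2))
```

Both lines pass through the point of means, so `y_on_x(10)` returns 90 and `x_on_y(90)` returns 10.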

Example 6: Compute the two regression equations on the basis of the following information:

                      X     Y
Mean                  40    45
Standard deviation    10    9

Karl Pearson’s coefficient of correlation between x and y = 0.50.

Also estimate the value of x when y = 48 using the appropriate equation.

Solution: We have,
$\bar{x} = 40$, $\bar{y} = 45$, $\sigma_x = 10$, $\sigma_y = 9$ and r = 0.5

Now, we can write the two regression equations as

$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ------- x on y (i)
$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ------- y on x (ii)

Substituting the values in the equations we get,

$(x - 40) = 0.5 \times \frac{10}{9}(y - 45)$
i.e. x – 40 = 0.556(y – 45) ------- eqn of x on y (i)
and $(y - 45) = 0.5 \times \frac{9}{10}(x - 40)$
i.e. y – 45 = 0.45(x – 40) ------- eqn of y on x (ii)

Now when y is 48, we can find x from eqn (i) as,

x – 40 = 0.556(48 – 45)
∴ x = 40 + 1.67 = 41.67


Example 7:
Find the marks in Mathematics of a student who has scored 65 marks in Accountancy, given:

Average marks in Mathematics: 70
Average marks in Accountancy: 80
Standard deviation of marks in Mathematics: 8
Standard deviation of marks in Accountancy: 10

The coefficient of correlation between the marks in Mathematics and the marks in Accountancy is 0.64.

Solution: We define the variables,
X: Marks in Mathematics
Y: Marks in Accountancy

Therefore we have,
$\bar{x} = 70$, $\bar{y} = 80$, $\sigma_x = 8$, $\sigma_y = 10$ and r = 0.64

Since we want to approximate the marks in Mathematics (x), we obtain the regression equation of x on y, which is given by

$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ------- x on y (i)

Substituting the values we get,
$(x - 70) = 0.64 \times \frac{8}{10}(y - 80)$
i.e. x – 70 = 0.512(y – 80)

Therefore, when marks in Accountancy (y) = 65,

x – 70 = 0.512(65 – 80)
∴ x = 70 – 7.68 = 62.32, i.e. 62 approx.


Use of regression equations to find the means $\bar{x}$, $\bar{y}$, the S.D.s $\sigma_x$, $\sigma_y$ and the correlation coefficient ‘r’

Just as we can obtain the regression equations from the values of the means, standard deviations and correlation coefficient ‘r’, we can get these values back from the regression equations.

Note that a regression equation is a linear equation in the two variables x and y. Therefore, a linear equation of the type Ax + By + C = 0 or y = a + bx represents a regression equation.

e.g. 3x + 5y – 15 = 0 and 2x + 7y + 10 = 0 represent two regression equations.

Both regression lines pass through the point ($\bar{x}$, $\bar{y}$), so the values of the means $\bar{x}$, $\bar{y}$ can be obtained by solving the two equations as simultaneous equations.

Example 8:
From the following regression equations, find the means $\bar{x}$, $\bar{y}$ and ‘r’.
3x – 2y – 10 = 0, 24x – 25y + 145 = 0

Solution: The two regression equations are,

3x – 2y – 10 = 0 -------- (i)
24x – 25y + 145 = 0 --- (ii)

Now for $\bar{x}$ and $\bar{y}$ we solve the two equations as simultaneous equations.

Therefore, by (i) × 8 and (ii) × 1, we get

24x – 16y – 80 = 0
24x – 25y + 145 = 0
Subtracting, 9y – 225 = 0, so y = 225/9 = 25
Putting y = 25 in eqn (i), we get
3x – 2(25) – 10 = 0
3x – 60 = 0, so x = 60/3 = 20
Hence $\bar{x} = 20$ and $\bar{y} = 25$.

Now to find ‘r’ we express the equations in the form y = a + bx.

So, from eqns (i) and (ii),
$y = \frac{3}{2}x - \frac{10}{2}$ and $y = \frac{24}{25}x + \frac{145}{25}$
∴ $b_1 = \frac{3}{2} = 1.5$ and $b_2 = \frac{24}{25} = 0.96$
Since $b_1 > b_2$ (i.e. $b_2$ is numerically smaller, irrespective of sign),

∴ Equation (ii) is the regression of y on x and $b_{yx} = 0.96$

Hence eqn (i) is the regression of x on y and $b_{xy} = 1/1.5 = 0.67$

Now we find $r = \sqrt{b_{xy} \times b_{yx}}$, i.e. $r = \sqrt{0.67 \times 0.96} = +0.80$

(The sign of ‘r’ is the same as the sign of the regression coefficients.)

The two equations fix only the ratio of the standard deviations, $\sigma_y / \sigma_x = b_{yx} / r = 0.96 / 0.8 = 1.2$; to find $\sigma_x$ or $\sigma_y$ individually, one of them must be given.
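The procedure of this example (solve for the means, decide which equation is which, then take the signed square root) can be sketched as below. The function name `analyse_regression_pair` and the (A, B, C) tuple convention for Ax + By + C = 0 are assumptions for illustration; the sketch also assumes that neither line is parallel to an axis. Instead of comparing slopes, it picks the assignment for which the product of the coefficients lies between 0 and 1.

```python
from math import sqrt, copysign

def analyse_regression_pair(line1, line2):
    """Each line is (A, B, C), meaning A*x + B*y + C = 0."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # Point of intersection = (mean_x, mean_y), by Cramer's rule
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det

    # Try both assignments; the valid one has 0 <= bxy * byx <= 1
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        byx = -y_on_x[0] / y_on_x[1]   # slope when solved for y
        bxy = -x_on_y[1] / x_on_y[0]   # slope when solved for x
        product = bxy * byx
        if 0 <= product <= 1:
            r = copysign(sqrt(product), byx)
            return {"mean_x": mean_x, "mean_y": mean_y,
                    "bxy": bxy, "byx": byx, "r": r}
    raise ValueError("the equations are not a valid pair of regression lines")

result = analyse_regression_pair((3, -2, -10), (24, -25, 145))
print(result["mean_x"], result["mean_y"],
      round(result["bxy"], 2), round(result["byx"], 2), round(result["r"], 2))
```

For the two equations of this example the product of the coefficients is (2/3)(0.96) = 0.64, so the sketch gives means 20 and 25 and r = 0.8.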

Example 9:
Find the mean values of x and y, and r, from the two regression equations 3x + 2y – 26 = 0 and 6x + y – 31 = 0. Also find $\sigma_x$ when $\sigma_y = 3$.

Solution: The two regression equations are,

3x + 2y – 26 = 0 -------- (i)
6x + y – 31 = 0 ---------- (ii)

Now for $\bar{x}$ and $\bar{y}$ we solve the two equations as simultaneous equations.

Therefore, by (i) × 2 and (ii) × 1, we get

6x + 4y – 52 = 0
6x + y – 31 = 0
Subtracting, 3y – 21 = 0, ∴ y = 21/3 = 7

Putting y = 7 in eqn (i), we get

3x + 2(7) – 26 = 0
3x – 12 = 0, so x = 12/3 = 4
Hence $\bar{x} = 4$ and $\bar{y} = 7$.

Now to find ‘r’ we express the equations in the form y = a + bx.

So, from eqns (i) and (ii),
$y = -\frac{3}{2}x + \frac{26}{2}$ and $y = -6x + 31$
∴ $b_1 = -\frac{3}{2} = -1.5$ and $b_2 = -6$
Since $|b_1| < |b_2|$ (i.e. $b_1$ is numerically smaller, irrespective of sign),

∴ Equation (i) is the regression of y on x and $b_{yx} = -1.5$

Hence eqn (ii) is the regression of x on y and $b_{xy} = -1/6 = -0.167$

Now we find $r = \pm\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.167 \times 1.5} = -\sqrt{0.25} = -0.5$

Note: The sign of ‘r’ is the same as the sign of the regression coefficients.

Now to find $\sigma_x$ when $\sigma_y = 3$, we use the formula

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

$$-1.5 = -\frac{0.5 \times 3}{\sigma_x}$$

∴ $\sigma_x = \frac{1.5}{1.5} = 1$

Hence the means are $\bar{x} = 4$, $\bar{y} = 7$, r = –0.5 and $\sigma_x = 1$.
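The last step can be verified numerically from the two regression coefficients of this example, $b_{yx} = -3/2$ and $b_{xy} = -1/6$. This is a small sketch; the variable names are assumptions. It recovers r from the product of the coefficients and then solves each of the two formulas of section 4.5.2 for $\sigma_x$, which should give the same answer.

```python
from math import sqrt, copysign

byx = -3 / 2   # from 3x + 2y - 26 = 0 solved for y
bxy = -1 / 6   # from 6x + y - 31 = 0 solved for x

# r has the common sign of the regression coefficients
r = copysign(sqrt(bxy * byx), byx)

sd_y = 3
sd_x_from_byx = r * sd_y / byx   # from byx = r * sd_y / sd_x
sd_x_from_bxy = bxy * sd_y / r   # from bxy = r * sd_x / sd_y

print(round(r, 2), round(sd_x_from_byx, 2), round(sd_x_from_bxy, 2))
```

Since $b_{xy} \times b_{yx} = 0.25$, this gives r = –0.5 and $\sigma_x = 1$ by either route.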

EXERCISES
1. What is meant by Regression? Explain the use of regression in statistical analysis.
2. Why are there two Regressions? Justify.
3. State the difference between Correlation and Regression.
4. Obtain the two regression equations from the data given below.
X: 7 4 6 5 8
Y: 6 5 9 8 2
Hence estimate y when x = 10.
5. The data given below are the years of experience (x) and monthly wages (y) for a group of workers. Obtain the two regression equations and approximate the monthly wages of a worker who has completed 15 years of service.

Experience (in years):        11 7 9 5 8 6 10
Monthly wages (in ‘000 Rs.):  10 8 6 8 9 7 11

6. The following results are obtained for bivariate data. Obtain the two regression equations and find y when x = 12.

n = 15, $\sum x = 130$, $\sum y = 220$, $\sum x^2 = 2288$, $\sum y^2 = 5506$, $\sum xy = 3467$

7. Marks scored by a group of 10 students in the subjects of Maths and Stats in a class test are given below. Obtain a suitable regression equation to find the marks in Stats of a student who has scored 25 marks in Maths.

Student no.:     1  2  3  4  5  6  7  8  9  10
Marks in Maths:  13 18 9  6  14 10 20 28 21 16
Marks in Stats:  12 25 11 7  16 12 24 25 22 20

8. The data given below are the price and demand for a certain commodity over a period of 7 years. Find the regression equation of Demand on Price and hence obtain the most likely demand in the year 2008, when its price is Rs. 23.

Year:                2001 2002 2003 2004 2005 2006 2007
Price (in Rs.):      15   12   18   22   19   21   25
Demand (100 units):  89   86   90   105  100  110  115

9. For bivariate data the following results were obtained:
$\bar{x} = 53.2$, $\bar{y} = 27.9$, $\sigma_x = 4.8$, $\sigma_y$ = .4 and r = 0.75
Obtain the two regression equations and find the most probable value of x when y = 25.

10. A sample of 50 students in a school gave the following statistics about the marks of students in the subjects of Mathematics and Science:
---------------------------------------------------------------------------------------
Subjects:    Mathematics    Science
---------------------------------------------------------------------------------------
Mean         58             79
S.D.         12             18
---------------------------------------------------------------------------------------
The coefficient of correlation between the marks in Mathematics and the marks in Science is 0.8. Obtain the two regression equations and approximate the marks in Mathematics of a student whose score in Science is 65.

11. It is known that advertisement promotes the sales of a company. The company’s previous records give the following results.
---------------------------------------------------------------------------------------
          Expenditure on Advertisement    Sales
          (Rs. in Lakh)                   (Rs. in Lakh)
---------------------------------------------------------------------------------------
Mean      15                              190
S.D.      6                               20
---------------------------------------------------------------------------------------
The coefficient of correlation between sales and expenditure on advertisement is 0.6. Using the regression equation, find the likely sales when the advertisement budget is Rs. 25 Lakh.

12. Find the values of $\bar{x}$, $\bar{y}$ and r from the two regression equations given below: 3x + 2y – 26 = 0 and 6x + y – 31 = 0. Also find $\sigma_x$ when $\sigma_y = 3$.

13. Two random variables have the regression equations 5x + 7y – 22 = 0 and 6x + 2y – 20 = 0. Find the mean values of x and y. Also find the S.D. of x when the S.D. of y = 5.

14. The two regression equations for certain data were y = x + 5 and 16x = 9y – 94. Find the values of $\bar{x}$, $\bar{y}$ and r. Also find the S.D. of y when the S.D. of x is 2.4.


