Weekly Wages in Rs. No. of Persons Weekly Wages in Rs. No of Persons
7. Calculate the mean and standard deviation of the following frequency distribution.
Marks           | 20–29 | 30–39 | 40–49 | 50–59 | 60–69 | 70–79 | 80–89 | 90–99
No. of students |   5   |  12   |  15   |  20   |  18   |  10   |   6   |   4
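To check an answer to Question 7, here is a minimal Python sketch of the grouped-data formulas. It assumes each class is represented by its mid-point (24.5, 34.5, …) and uses step deviations u = (x – origin)/width with the population form σ² = Σfu²/N – (Σfu/N)². The function name `grouped_mean_sd` is ours, not the text's.

```python
import math

def grouped_mean_sd(midpoints, freqs, origin, width):
    """Mean and standard deviation of a grouped frequency table.

    Works with step deviations u = (x - origin) / width and rescales at the end.
    """
    n = sum(freqs)
    us = [(x - origin) / width for x in midpoints]
    sum_fu = sum(f * u for f, u in zip(freqs, us))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, us))
    mean = origin + width * sum_fu / n
    var_u = sum_fu2 / n - (sum_fu / n) ** 2   # population variance of u
    return mean, width * math.sqrt(var_u)

# Question 7: classes 20-29, 30-39, ..., 90-99 represented by their mid-points
mids = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [5, 12, 15, 20, 18, 10, 6, 4]
mean, sd = grouped_mean_sd(mids, freqs, origin=54.5, width=10)
print(f"mean = {mean:.2f}, s.d. = {sd:.2f}")
```

With the table of Question 7 this prints mean = 56.50, s.d. = 17.65. The choice of origin and width only simplifies hand arithmetic; any origin gives the same result.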
8. Compute the standard deviation from the following distribution of marks obtained by 90 students:
Ans. 17.65.
9. The following table shows the Marks obtained by 100 candidates in an examination. Calculate
Marks             | 1–10 | 11–20 | 21–30 | 31–40 | 41–50 | 51–60
No. of candidates |  3   |  16   |  26   |  31   |  16   |   8
(g) mode, (h) $\dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$, (i) coefficient of standard deviation, (j) variance
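Items (g) to (j) look like part of a list of measures. Assuming they refer to the table of Question 9, this Python sketch computes them. It also assumes class boundaries 0.5, 10.5, …, 60.5 (obtained by moving each inclusive limit by 0.5), the usual grouped-mode formula L + (f1 – f0)/(2f1 – f0 – f2) × h, and "coefficient of standard deviation" taken as σ/mean.

```python
import math

# Question 9: marks 1-10, 11-20, ..., 51-60 (boundaries assumed at 0.5, 10.5, ...)
lower = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5]   # lower class boundaries (assumed)
width = 10
freqs = [3, 16, 26, 31, 16, 8]
n = sum(freqs)

mids = [lo + width / 2 for lo in lower]
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
sd = math.sqrt(variance)

# Grouped mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h, in the modal class
k = freqs.index(max(freqs))
f1 = freqs[k]
f0 = freqs[k - 1] if k > 0 else 0
f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
mode = lower[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

skewness = (mean - mode) / sd   # (Mean - Mode) / Standard deviation
coeff_sd = sd / mean            # coefficient of standard deviation

print(f"mean={mean:.2f} mode={mode:.2f} variance={variance:.2f} "
      f"s.d.={sd:.2f} skewness={skewness:.3f} coeff. of s.d.={coeff_sd:.3f}")
```

Under these assumptions the mean is 32, the mode 33 and the variance 152.75.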
11. The expected value of a random variable X is 2 and its variance is 1; then the variance of 3X + 4 is
(a) 9 (b) 7 (c) 3 (d) 13 (A.M.I.E.T.E., Dec. 2004) Ans. (a)
12. The expected value of a random variable X is 3 and its variance is 2. Then the variance of 2X + 5 is
(a) 8 (b) 9 (c) 10 (d) 11 (A.M.I.E.T.E., June 2006) Ans. (a)
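Both questions rest on the rule Var(aX + b) = a² Var(X). This Python sketch checks the rule exactly with fractions. The two distributions are hypothetical, chosen only to have the stated mean and variance; any distribution with those moments gives the same answer.

```python
from fractions import Fraction as F

def mean_var(dist):
    """Exact mean and variance of a finite distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var

def affine(dist, a, b):
    """Distribution of aX + b (assumes a != 0, so the values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}

# Question 11: a hypothetical X with E(X) = 2, Var(X) = 1
x11 = {F(1): F(1, 2), F(3): F(1, 2)}
# Question 12: a hypothetical X with E(X) = 3, Var(X) = 2
x12 = {F(1): F(1, 4), F(3): F(1, 2), F(5): F(1, 4)}

var11 = mean_var(affine(x11, 3, 4))[1]   # Var(3X + 4)
var12 = mean_var(affine(x12, 2, 5))[1]   # Var(2X + 5)
print(var11, var12)
```

This prints 9 8, matching answers (a) in both questions: the constant b shifts the mean but not the spread.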
10.16 CORRELATION
Whenever two variables x and y are so related that an increase in the one is accompanied by
an increase or decrease in the other, then the variables are said to be correlated.
For example, the yield of a crop varies with the amount of rainfall.
If an increase in one variable corresponds to an increase in the other, the correlation is said to be
positive. If an increase in one corresponds to a decrease in the other, the correlation is said to be
negative. If there is no relationship between the two variables, they are said to be independent.
Perfect Correlation: If two variables vary in such a way that their ratio is always constant, then
the correlation is said to be perfect.
10.17 SCATTER OR DOT-DIAGRAM
When we plot the corresponding values of two variables, taking one along the x-axis and the other
along the y-axis, we obtain a collection of dots.
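$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{\text{Covariance}(x, y)}{\sqrt{\text{variance } x \times \text{variance } y}} $$
where $X = x - \bar{x}$, $Y = y - \bar{y}$, i.e. X, Y are the deviations measured from their respective means,
$$ p = \text{co-variance} = \frac{\sum XY}{n} $$
and $\sigma_x$, $\sigma_y$ are the standard deviations of these series.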
Example 15. Ten students got the following percentage of marks in Economics and Statistics.
Roll No.            | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
Marks in Economics  | 78 | 36 | 98 | 25 | 75 | 82 | 90 | 62 | 65 | 39
Marks in Statistics | 84 | 51 | 91 | 60 | 68 | 62 | 86 | 58 | 53 | 47
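x        | y        | X = x – 65 | Y = y – 66 | X²         | Y²         | XY
78       | 84       | 13         | 18         | 169        | 324        | 234
36       | 51       | –29        | –15        | 841        | 225        | 435
98       | 91       | 33         | 25         | 1089       | 625        | 825
25       | 60       | –40        | –6         | 1600       | 36         | 240
75       | 68       | 10         | 2          | 100        | 4          | 20
82       | 62       | 17         | –4         | 289        | 16         | –68
90       | 86       | 25         | 20         | 625        | 400        | 500
62       | 58       | –3         | –8         | 9          | 64         | 24
65       | 53       | 0          | –13        | 0          | 169        | 0
39       | 47       | –26        | –19        | 676        | 361        | 494
Σx = 650 | Σy = 660 | ΣX = 0     | ΣY = 0     | ΣX² = 5398 | ΣY² = 2224 | ΣXY = 2704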
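Here $\bar{x} = 650/10 = 65$, $\bar{y} = 660/10 = 66$, $\sum X^2 = 5398$, $\sum Y^2 = 2224$, $\sum XY = 2704$.
$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{2704}{\sqrt{5398 \times 2224}} = \frac{2704}{73.47 \times 47.16} = \frac{2704}{3464.9} = 0.78 \quad \text{Ans.} $$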
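The working above can be reproduced in a few lines of Python. This sketch computes r from deviations about the means, ΣXY/√(ΣX²ΣY²), and also through the covariance form Cov(x, y)/(σxσy), to show that the two forms agree.

```python
import math

econ = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stat = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
n = len(econ)

x_bar = sum(econ) / n            # 65.0
y_bar = sum(stat) / n            # 66.0
X = [x - x_bar for x in econ]    # deviations from the mean
Y = [y - y_bar for y in stat]

sxy = sum(a * b for a, b in zip(X, Y))   # ΣXY
sxx = sum(a * a for a in X)              # ΣX²
syy = sum(b * b for b in Y)              # ΣY²
r = sxy / math.sqrt(sxx * syy)

# Same value through covariance / (σx σy), with population (divide-by-n) forms
cov = sxy / n
r_cov = cov / (math.sqrt(sxx / n) * math.sqrt(syy / n))
print(sxx, syy, sxy, round(r, 4), round(r_cov, 4))
```

The sums reproduce ΣX² = 5398, ΣY² = 2224 and ΣXY = 2704, and both forms give r ≈ 0.78.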
Example 16. Find the coefficient of correlation between the age and the sum assured from
the following table.
Solution. Let the sum assured be denoted by x and the age group by y, and let
$$ x' = \frac{x - 30{,}000}{10{,}000}, \qquad y' = \frac{y - 45}{10} $$
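(x' = –2, –1, 0, 1, 2 correspond to sums assured of 10,000, 20,000, 30,000, 40,000 and 50,000. Each cell shows the frequency f, with the product fx'y' in brackets.)
Age group (y) | y'  | x' = –2 | x' = –1 | x' = 0 | x' = 1  | x' = 2 | f       | fy'        | fy'²        | fx'y'
20–30 (25)    | –2  | 4 (16)  | 6 (12)  | 3 (0)  | 7 (–14) | 1 (–4) | 21      | –42        | 84          | 10
30–40 (35)    | –1  | 2 (4)   | 8 (8)   | 15 (0) | 7 (–7)  | 1 (–2) | 33      | –33        | 33          | 3
40–50 (45)    | 0   | 3 (0)   | 9 (0)   | 12 (0) | 6 (0)   | 2 (0)  | 32      | 0          | 0           | 0
50–60 (55)    | 1   | 8 (–16) | 4 (–4)  | 2 (0)  | –       | –      | 14      | 14         | 14          | –20
f (column)    |     | 17      | 27      | 32     | 20      | 4      | N = 100 | Σfy' = –61 | Σfy'² = 131 | Σfx'y' = –7
fx'           |     | –34     | –27     | 0      | 20      | 8      | Σfx' = –33
fx'²          |     | 68      | 27      | 0      | 20      | 16     | Σfx'² = 131
fx'y'         |     | 4       | 16      | 0      | –21     | –6     | Σfx'y' = –7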
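The printed solution stops at the column and row totals. This Python sketch finishes the arithmetic with the usual product-moment formula for a bivariate frequency table, r = [Σfx'y' – (Σfx')(Σfy')/N] / √{[Σfx'² – (Σfx')²/N][Σfy'² – (Σfy')²/N]}. The frequencies are read off the table above; the final value is what these totals give, not a figure quoted by the text.

```python
import math

# Coded values: x' = (sum assured - 30,000) / 10,000, y' = (age mid-point - 45) / 10
x_codes = [-2, -1, 0, 1, 2]
y_codes = [-2, -1, 0, 1]           # age groups 20-30, 30-40, 40-50, 50-60
freq = [                            # rows: age groups; columns: x' = -2 .. 2
    [4, 6, 3, 7, 1],
    [2, 8, 15, 7, 1],
    [3, 9, 12, 6, 2],
    [8, 4, 2, 0, 0],
]

N = sum_fx = sum_fy = sum_fx2 = sum_fy2 = sum_fxy = 0
for y, row in zip(y_codes, freq):
    for x, f in zip(x_codes, row):
        N += f
        sum_fx += f * x
        sum_fy += f * y
        sum_fx2 += f * x * x
        sum_fy2 += f * y * y
        sum_fxy += f * x * y

num = sum_fxy - sum_fx * sum_fy / N
den = math.sqrt((sum_fx2 - sum_fx ** 2 / N) * (sum_fy2 - sum_fy ** 2 / N))
r = num / den
print(N, sum_fx, sum_fy, sum_fx2, sum_fy2, sum_fxy, round(r, 3))
```

The totals match the table (N = 100, Σfx' = –33, Σfy' = –61, Σfx'² = Σfy'² = 131, Σfx'y' = –7), and they give r ≈ –0.26.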
Example 17. Calculate the coefficient of correlation for the following table:
y (marks) \ x (age) | 0–4 | 4–8 | 8–12 | 12–16
0–5                 | 7   | –   | –    | –
5–10                | 6   | 8   | –    | –
10–15               | –   | 5   | 3    | –
15–20               | –   | 7   | 2    | –
20–25               | –   | –   | –    | 9
Solution. Replace the class intervals for x and y by their mid-points and then let
$$ X' = \frac{x - 10}{4} \quad\text{and}\quad Y' = \frac{y - 12.5}{5} $$
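(Each cell shows the frequency f, with the product fX'Y' in brackets.)
Marks (y)  | Mid-point y | Y' | x = 2 (X' = –2) | x = 6 (X' = –1) | x = 10 (X' = 0) | x = 14 (X' = 1) | f      | fY'       | fY'²       | fX'Y'
0–5        | 2.5         | –2 | 7 (28)          | –               | –               | –               | 7      | –14       | 28         | 28
5–10       | 7.5         | –1 | 6 (12)          | 8 (8)           | –               | –               | 14     | –14       | 14         | 20
10–15      | 12.5        | 0  | –               | 5 (0)           | 3 (0)           | –               | 8      | 0         | 0          | 0
15–20      | 17.5        | 1  | –               | 7 (–7)          | 2 (0)           | –               | 9      | 9         | 9          | –7
20–25      | 22.5        | 2  | –               | –               | –               | 9 (18)          | 9      | 18        | 36         | 18
f (column) |             |    | 13              | 20              | 5               | 9               | N = 47 | ΣfY' = –1 | ΣfY'² = 87 | ΣfX'Y' = 59
fX'        |             |    | –26             | –20             | 0               | 9               | ΣfX' = –37
fX'²       |             |    | 52              | 20              | 0               | 9               | ΣfX'² = 81
fX'Y'      |             |    | 40              | 1               | 0               | 18              | ΣfX'Y' = 59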
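Here, $\sum fX' = -37$, $\sum fX'^2 = 81$, $\sum fY' = -1$, $\sum fY'^2 = 87$, $\sum fX'Y' = 59$ and $N = 47$.
$$ r = \frac{\dfrac{\sum fX'Y'}{N} - \dfrac{\sum fX'}{N}\cdot\dfrac{\sum fY'}{N}}{\sqrt{\dfrac{\sum fX'^2}{N} - \left(\dfrac{\sum fX'}{N}\right)^2}\;\sqrt{\dfrac{\sum fY'^2}{N} - \left(\dfrac{\sum fY'}{N}\right)^2}} $$
$$ = \frac{\dfrac{59}{47} - \left(\dfrac{-37}{47}\right)\left(\dfrac{-1}{47}\right)}{\sqrt{\dfrac{81}{47} - \left(\dfrac{-37}{47}\right)^2}\;\sqrt{\dfrac{87}{47} - \left(\dfrac{-1}{47}\right)^2}} = \frac{1.255 - 0.017}{\sqrt{1.723 - 0.620}\;\sqrt{1.851 - 0.0005}} $$
$$ = \frac{1.238}{\sqrt{1.103}\,\sqrt{1.8505}} = \frac{1.238}{1.05 \times 1.36} = \frac{1.238}{1.428} = 0.87 \quad \text{Ans.} $$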
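The coding X' = (x – 10)/4, Y' = (y – 12.5)/5 works because r is unchanged by a change of origin and scale. This Python sketch checks that for Example 17 by computing r once from the raw mid-points and once from the coded values. The helper `pearson_weighted` is a hypothetical name of ours.

```python
import math

def pearson_weighted(triples):
    """Pearson r for (x, y, frequency) triples."""
    n = sum(f for _, _, f in triples)
    mx = sum(f * x for x, _, f in triples) / n
    my = sum(f * y for _, y, f in triples) / n
    sxy = sum(f * (x - mx) * (y - my) for x, y, f in triples)
    sxx = sum(f * (x - mx) ** 2 for x, _, f in triples)
    syy = sum(f * (y - my) ** 2 for _, y, f in triples)
    return sxy / math.sqrt(sxx * syy)

# Example 17: (age mid-point, marks mid-point, frequency); empty cells omitted
cells = [(2, 2.5, 7), (2, 7.5, 6), (6, 7.5, 8), (6, 12.5, 5), (10, 12.5, 3),
         (6, 17.5, 7), (10, 17.5, 2), (14, 22.5, 9)]
coded = [((x - 10) / 4, (y - 12.5) / 5, f) for x, y, f in cells]

r_raw = pearson_weighted(cells)
r_coded = pearson_weighted(coded)
print(round(r_raw, 4), round(r_coded, 4))
```

Both calls agree and give r ≈ 0.87, as in the worked solution.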
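10.20 SPEARMAN’S RANK CORRELATION
$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
Solution. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the ranks of n individuals corresponding to two characteristics.
Assuming no two individuals are equal in either classification, each variable takes the values 1, 2, 3, ..., n, and hence their arithmetic means are each
$$ \frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2} $$
Let $X = x - \frac{n+1}{2}$ and $Y = y - \frac{n+1}{2}$ be the deviations of the ranks from their mean. Then
$$ d = X - Y = \left(x - \frac{n+1}{2}\right) - \left(y - \frac{n+1}{2}\right) = x - y $$
$$ \sum X^2 = \sum\left(x - \frac{n+1}{2}\right)^2 = \sum x^2 - (n+1)\sum x + n\left(\frac{n+1}{2}\right)^2 $$
$$ = \frac{n(n+1)(2n+1)}{6} - (n+1)\cdot\frac{n(n+1)}{2} + \frac{n(n+1)^2}{4} = \frac{n(n^2 - 1)}{12} $$
Clearly, $\sum X = \sum Y = 0$ and $\sum Y^2 = \sum X^2 = \dfrac{n(n^2 - 1)}{12}$.
Hence
$$ \sum d^2 = \sum(X - Y)^2 = \sum X^2 + \sum Y^2 - 2\sum XY $$
$$ \sum XY = \frac{1}{2}\left[\frac{n(n^2 - 1)}{6} - \sum d^2\right] $$
$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{\dfrac{1}{2}\left[\dfrac{n(n^2 - 1)}{6} - \sum d^2\right]}{\dfrac{n(n^2 - 1)}{12}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{Ans.} $$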
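The derivation can be checked by brute force. This Python sketch uses exact fractions and, for small n, every possible ranking, to confirm two facts: ΣX² = n(n² – 1)/12, and Pearson's r computed on the ranks equals 1 – 6Σd²/(n(n² – 1)) when there are no ties. The function name `check_spearman` is ours.

```python
from fractions import Fraction
from itertools import permutations

def check_spearman(n):
    """For every ranking y of 1..n against x = 1..n, compare Pearson's r on the
    ranks with 1 - 6*sum(d^2) / (n(n^2 - 1)), using exact fractions."""
    xs = list(range(1, n + 1))
    mean = Fraction(n + 1, 2)
    sxx = sum((x - mean) ** 2 for x in xs)
    assert sxx == Fraction(n * (n * n - 1), 12)   # ΣX² = n(n² - 1)/12
    for ys in permutations(xs):
        # ys is a permutation of xs, so ΣY² = ΣX² and sqrt(ΣX² ΣY²) = ΣX²
        sxy = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
        r_pearson = sxy / sxx
        d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
        r_formula = 1 - Fraction(6 * d2, n * (n * n - 1))
        if r_pearson != r_formula:
            return False
    return True

print(all(check_spearman(n) for n in range(2, 7)))
```

This prints True: for untied ranks the rank formula is exactly Pearson's r applied to the ranks.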
10.21 SPEARMAN’S RANK CORRELATION COEFFICIENT
$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where r denotes the rank coefficient of correlation and d refers to the difference between the ranks of paired items in the two series.
Example 18. Compute Spearman’s rank correlation coefficient r for the following data:
Person             | A | B  | C | D | E | F | G | H | I | J
Rank in statistics | 9 | 10 | 6 | 5 | 7 | 2 | 4 | 8 | 1 | 3
Rank in income     | 1 | 2  | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
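Solution.
Person | Rank in statistics (R1) | Rank in income (R2) | d = R1 – R2 | d²
A      | 9                       | 1                   | 8           | 64
B      | 10                      | 2                   | 8           | 64
C      | 6                       | 3                   | 3           | 9
D      | 5                       | 4                   | 1           | 1
E      | 7                       | 5                   | 2           | 4
F      | 2                       | 6                   | –4          | 16
G      | 4                       | 7                   | –3          | 9
H      | 8                       | 8                   | 0           | 0
I      | 1                       | 9                   | –8          | 64
J      | 3                       | 10                  | –7          | 49
       |                         |                     |             | Σd² = 280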
$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(100 - 1)} = 1 - 1.697 = -0.697 \quad \text{Ans.} $$
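The same computation in Python, as a small sketch of the rank formula for untied ranks (the function name `spearman_from_ranks` is ours). Exact fractions show that the answer is –23/33.

```python
from fractions import Fraction

def spearman_from_ranks(r1, r2):
    """Spearman's rank correlation for untied ranks: 1 - 6Σd² / (n(n² - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return d2, 1 - Fraction(6 * d2, n * (n * n - 1))

stats_rank = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]    # persons A..J
income_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d2, rho = spearman_from_ranks(stats_rank, income_rank)
print(d2, rho, round(float(rho), 3))
```

This prints 280 -23/33 -0.697, agreeing with Σd² = 280 and r = –0.697 above.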
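Example 19. Establish the formula $\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$,
where r is the correlation coefficient between x and y.
$$ \sigma_{x-y}^2 = \frac{\sum\left[(x - y) - \overline{(x - y)}\right]^2}{n} $$
$$ \overline{(x - y)} = \text{mean of the } (x - y) \text{ series} = \text{mean of } x - \text{mean of } y = \bar{x} - \bar{y} $$
$$ \sigma_{x-y}^2 = \frac{\sum\left[(x - y) - (\bar{x} - \bar{y})\right]^2}{n} = \frac{\sum\left[(x - \bar{x}) - (y - \bar{y})\right]^2}{n} $$
$$ = \frac{\sum\left[(x - \bar{x})^2 + (y - \bar{y})^2 - 2(x - \bar{x})(y - \bar{y})\right]}{n} = \frac{\sum(x - \bar{x})^2}{n} + \frac{\sum(y - \bar{y})^2}{n} - \frac{2\sum(x - \bar{x})(y - \bar{y})}{n} $$
$$ = \sigma_x^2 + \sigma_y^2 - \frac{2\sum(x - \bar{x})(y - \bar{y})}{n} \qquad \ldots(1) $$
We know that
$$ r = \frac{\sum(x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y} \quad\text{or}\quad \frac{\sum(x - \bar{x})(y - \bar{y})}{n} = r\sigma_x\sigma_y $$
Putting this value in (1), we get $\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$. Proved.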
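A quick numerical check of the identity in Python. Any paired data will do; this sketch reuses the marks of Example 15 and population (divide-by-n) standard deviations, which is the convention of the proof.

```python
import math

def pstdev(v):
    """Population standard deviation (divides by n)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / (pstdev(xs) * pstdev(ys))

x = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
y = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]

lhs = pstdev([a - b for a, b in zip(x, y)]) ** 2          # σ²(x - y)
sx, sy, r = pstdev(x), pstdev(y), pearson(x, y)
rhs = sx ** 2 + sy ** 2 - 2 * r * sx * sy                # σx² + σy² - 2rσxσy
print(round(lhs, 6), round(rhs, 6))
```

Both sides come out as 221.4 for these data, which is (5398 + 2224 – 2 × 2704)/10.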
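Example 20. If X and Y are uncorrelated random variables, find the coefficient of correlation
between X + Y and X – Y.
Solution. Let u = X + Y and v = X – Y. Then
$$ r = \frac{\sum(u - \bar{u})(v - \bar{v})}{n\sigma_u\sigma_v} $$
Now $u = X + Y$, $\bar{u} = \bar{X} + \bar{Y}$; similarly $\bar{v} = \bar{X} - \bar{Y}$. Writing $x = X - \bar{X}$ and $y = Y - \bar{Y}$,
$$ \sum(u - \bar{u})(v - \bar{v}) = \sum\left[(X - \bar{X}) + (Y - \bar{Y})\right]\left[(X - \bar{X}) - (Y - \bar{Y})\right] = \sum(x + y)(x - y) = \sum x^2 - \sum y^2 = n\sigma_x^2 - n\sigma_y^2 $$
Also
$$ \sigma_u^2 = \frac{1}{n}\sum(u - \bar{u})^2 = \frac{1}{n}\sum\left[(X - \bar{X}) + (Y - \bar{Y})\right]^2 = \frac{1}{n}\sum(x + y)^2 = \frac{1}{n}\left(\sum x^2 + \sum y^2 + 2\sum xy\right) = \sigma_x^2 + \sigma_y^2 $$
(as X and Y are not correlated, we have $\sum xy = 0$). Similarly $\sigma_v^2 = \sigma_x^2 + \sigma_y^2$.
$$ r = \frac{\sum(u - \bar{u})(v - \bar{v})}{n\sigma_u\sigma_v} = \frac{n(\sigma_x^2 - \sigma_y^2)}{n\sqrt{\sigma_x^2 + \sigma_y^2}\,\sqrt{\sigma_x^2 + \sigma_y^2}} = \frac{\sigma_x^2 - \sigma_y^2}{\sigma_x^2 + \sigma_y^2} \quad \text{Ans.} $$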
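This exact check of the result in Python uses a hypothetical sample built so that X and Y are exactly uncorrelated: Y is symmetric about the middle value of X, so Σ(X – X̄)(Y – Ȳ) = 0. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

def pvar(v):
    """Population variance, exact."""
    m = Fraction(sum(v), len(v))
    return sum((a - m) ** 2 for a in v) / len(v)

def pcov(a, b):
    """Population covariance, exact."""
    ma, mb = Fraction(sum(a), len(a)), Fraction(sum(b), len(b))
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

# Hypothetical sample with cov(X, Y) = 0 exactly
X = [1, 2, 3, 4, 5]
Y = [2, -1, -2, -1, 2]
assert pcov(X, Y) == 0

u = [a + b for a, b in zip(X, Y)]
v = [a - b for a, b in zip(X, Y)]
assert pvar(u) == pvar(v)               # both equal σx² + σy², as in the proof
r_uv = pcov(u, v) / pvar(u)             # = cov(u, v) / (σu σv) since σu = σv
predicted = (pvar(X) - pvar(Y)) / (pvar(X) + pvar(Y))
print(r_uv, predicted)
```

Here σx² = 2 and σy² = 14/5, and both numbers printed are –1/6.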
10.22 REGRESSION
If the scatter diagram indicates some relationship between two variables x and y, then the dots of
the scatter diagram will be concentrated around a curve. This curve is called the curve of regression.
Regression analysis is the method used for estimating the unknown values of one variable
corresponding to the known value of another variable.
10.23 LINE OF REGRESSION
When the curve is a straight line, it is called a line of regression. A line of regression is the
straight line which gives the best fit, in the least-squares sense, to the given data.
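10.24 EQUATIONS TO THE LINES OF REGRESSION
Let
$$ y = a + bx \qquad \ldots(1) $$
be the equation of the line of regression of y on x.
Let $P(x_r, y_r)$ be any point (dot) of the scatter diagram. From the figure,
$$ PR = y_r, \qquad QR = a + bx_r, \qquad PQ = PR - QR = y_r - a - bx_r $$
Let S be the sum of the squares of such distances; then
$$ S = \sum(y - a - bx)^2 $$
According to the principle of least squares, we have to choose a and b so that S is minimum. The method of least squares gives the condition for the minimum value of S:
$$ \frac{\partial S}{\partial a} = -2\sum(y - a - bx), \qquad \frac{\partial S}{\partial b} = -2\sum(y - a - bx)x $$
$$ \frac{\partial S}{\partial a} = 0, \quad \frac{\partial S}{\partial b} = 0 \quad \text{for } S \text{ minimum} $$
i.e.
$$ \sum(y - a - bx) = 0 \;\Rightarrow\; \sum y - na - b\sum x = 0 \;\Rightarrow\; \sum y = na + b\sum x \qquad \ldots(2) $$
and
$$ \sum(xy - ax - bx^2) = 0 \;\Rightarrow\; \sum xy - a\sum x - b\sum x^2 = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2 \qquad \ldots(3) $$
Dividing (2) by n, we get
$$ \frac{\sum y}{n} = a + b\frac{\sum x}{n} \;\Rightarrow\; \bar{y} = a + b\bar{x} $$
where $\bar{x}$ and $\bar{y}$ are the means of the x series and the y series.
This shows that $(\bar{x}, \bar{y})$ lies on the line of regression (1). Shifting the origin to $(\bar{x}, \bar{y})$, equation (3) becomes
$$ \sum(x - \bar{x})(y - \bar{y}) = a\sum(x - \bar{x}) + b\sum(x - \bar{x})^2 $$
But $\sum(x - \bar{x}) = 0$, i.e. $\sum(x - \bar{x})(y - \bar{y}) = b\sum(x - \bar{x})^2$, so
$$ b = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} \qquad \ldots(4) $$
We know
$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{\sum XY}{n\sqrt{\dfrac{\sum X^2}{n}\cdot\dfrac{\sum Y^2}{n}}} = \frac{\sum XY}{n\sigma_x\sigma_y} \;\Rightarrow\; \sum XY = nr\sigma_x\sigma_y $$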
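Putting $\sum XY = nr\sigma_x\sigma_y$ and $\sum X^2 = n\sigma_x^2$ in (4),
$$ b = \frac{nr\sigma_x\sigma_y}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x} $$
i.e. the slope of the line of regression is $b = r\dfrac{\sigma_y}{\sigma_x}$.
The line of regression passes through $(\bar{x}, \bar{y})$. Hence the equation to the line of regression is
$$ y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) $$
Similarly the regression line of x on y is
$$ x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) $$
Note. $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ are known as the coefficients of regression, and
$$ b_{yx}\cdot b_{xy} = r\frac{\sigma_y}{\sigma_x}\cdot r\frac{\sigma_x}{\sigma_y} = r^2 $$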
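The derivation can be checked numerically. This Python sketch solves the normal equations (2) and (3) by Cramer's rule and confirms, on the data of Example 15, that the slope equals ΣXY/ΣX² and rσy/σx and that the line passes through (x̄, ȳ). The function name `fit_line` is ours.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b x from the normal equations
    Σy = n a + b Σx and Σxy = a Σx + b Σx², solved by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

x = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
y = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
a, b = fit_line(x, y)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sd_x = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
sd_y = math.sqrt(sum((v - my) ** 2 for v in y) / n)
r = sum((p - mx) * (q - my) for p, q in zip(x, y)) / (n * sd_x * sd_y)

print(round(b, 4), round(2704 / 5398, 4), round(r * sd_y / sd_x, 4))
print(round(a + b * mx, 6), my)     # the fitted line passes through (x̄, ȳ)
```

All three slope values agree (about 0.5009), and a + b·x̄ reproduces ȳ = 66.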
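Example 21. If θ is the acute angle between the two regression lines in the case of two variables
x and y, show that
$$ \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} $$
where r, σx, σy have their usual meanings. Explain the significance when r = 0 and r = ±1.
(A.M.I.E., Winter 2001)
Solution. Lines of regression are
$$ y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad \ldots(1) \qquad m_1 = r\frac{\sigma_y}{\sigma_x} $$
and
$$ x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad \ldots(2) \qquad m_2 = \frac{\sigma_y}{r\sigma_x} $$
$$ \tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2} = \frac{\dfrac{1}{r}\dfrac{\sigma_y}{\sigma_x} - r\dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{\dfrac{1 - r^2}{r}\cdot\dfrac{\sigma_y}{\sigma_x}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \qquad \ldots(3) \quad \text{Proved.} $$
(a) If r = 0, then there is no linear relationship between the two variables; they are uncorrelated.
On putting r = 0 in (3), we get tan θ = ∞, θ = π/2. So the lines (1) and (2) are
perpendicular. (A.M.I.E., Summer 1998)
(b) If r = 1 or –1:
On putting these values of r in (3), we get tan θ = 0, i.e. θ = 0 or π,
i.e. lines (1) and (2) coincide.
The correlation between the variables is perfect. Ans.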
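A numerical check of formula (3) in Python: for a few sample values of r, σx and σy (chosen only for illustration), it compares the angle from the closed form with the angle computed directly from the two slopes m1 and m2. Taking |r| so that θ is the acute angle when r is negative is our choice in this sketch; r = 0 is excluded because m2 is then undefined.

```python
import math

def angle_formula(r, sx, sy):
    """Acute angle from tan θ = (1 - r²)/|r| · σx σy / (σx² + σy²)."""
    return math.atan((1 - r * r) / abs(r) * sx * sy / (sx ** 2 + sy ** 2))

def angle_from_slopes(r, sx, sy):
    """Acute angle between y = m1 x and y = m2 x, with m1, m2 as in Example 21."""
    m1 = r * sy / sx          # regression of y on x
    m2 = sy / (r * sx)        # regression of x on y, rewritten as dy/dx
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))

samples = [(0.6, 15, 2.5), (-0.49, 2.0, 3.0), (0.99, 1.0, 1.0)]
for r, sx, sy in samples:
    print(r, round(angle_formula(r, sx, sy), 6), round(angle_from_slopes(r, sx, sy), 6))
```

The two columns agree, and the angle shrinks towards 0 as |r| approaches 1, which is case (b).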
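Example 22. Find the correlation coefficient between x and y, when the lines of regression are:
$$ 2x - 9y + 6 = 0 \quad\text{and}\quad x - 2y + 1 = 0 $$
Solution. Let the line of regression of x on y be 2x – 9y + 6 = 0.
Then the line of regression of y on x is x – 2y + 1 = 0.
$$ 2x - 9y + 6 = 0 \;\Rightarrow\; x = \frac{9}{2}y - 3 \;\Rightarrow\; b_{xy} = \frac{9}{2} $$
$$ x - 2y + 1 = 0 \;\Rightarrow\; y = \frac{1}{2}x + \frac{1}{2} \;\Rightarrow\; b_{yx} = \frac{1}{2} $$
$$ r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\frac{9}{2}\cdot\frac{1}{2}} = \frac{3}{2} > 1, $$
which is not possible.
So our choice of regression lines is incorrect.
The regression line of x on y is x – 2y + 1 = 0,
and the regression line of y on x is 2x – 9y + 6 = 0.
$$ x - 2y + 1 = 0 \;\Rightarrow\; x = 2y - 1 \;\Rightarrow\; b_{xy} = 2 $$
$$ 2x - 9y + 6 = 0 \;\Rightarrow\; y = \frac{2}{9}x + \frac{2}{3} \;\Rightarrow\; b_{yx} = \frac{2}{9} $$
$$ r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{2 \cdot \frac{2}{9}} = \frac{2}{3} $$
Hence the correlation coefficient between x and y is 2/3.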
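The trial-and-error step above can be automated. This Python sketch writes each line as (p, q, c) for px + qy + c = 0, reads the equations as 2x – 9y + 6 = 0 and x – 2y + 1 = 0, and tries both assignments of "x on y" and "y on x". It keeps the one with 0 ≤ bxy·byx ≤ 1. The function name `r_from_lines` is hypothetical, and the sketch assumes no coefficient it divides by is zero.

```python
from fractions import Fraction
import math

def r_from_lines(line1, line2):
    """Each line is (p, q, c) meaning p*x + q*y + c = 0. Try line1 as the
    regression of x on y and line2 as y on x, then the other way round;
    return (bxy, byx, r²) for the first assignment with 0 <= bxy*byx <= 1."""
    for xy_line, yx_line in [(line1, line2), (line2, line1)]:
        p1, q1, _ = xy_line
        p2, q2, _ = yx_line
        bxy = Fraction(-q1, p1)      # x = bxy*y + constant
        byx = Fraction(-p2, q2)      # y = byx*x + constant
        if 0 <= bxy * byx <= 1:
            return bxy, byx, bxy * byx
    raise ValueError("neither assignment gives a valid pair of regression lines")

bxy, byx, r2 = r_from_lines((2, -9, 6), (1, -2, 1))
r = math.copysign(math.sqrt(float(r2)), float(bxy))   # r has the sign of the coefficients
print(bxy, byx, r2, round(r, 4))
```

The first assignment is rejected (r² = 9/4), and the second gives bxy = 2, byx = 2/9 and r = 2/3.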
Example 23. The following regression equations were obtained from a correlation table:
y = 0.516x + 33.73, x = 0.512y + 32.52
Find the value of (a) the correlation coefficient, (b) the mean of the x’s and (c) the mean of the y’s.
Solution.
$$ y = 0.516x + 33.73 \qquad \ldots(1) $$
$$ x = 0.512y + 32.52 \qquad \ldots(2) $$
(a) From (1), $r\dfrac{\sigma_y}{\sigma_x} = 0.516$ ...(3)
From (2), $r\dfrac{\sigma_x}{\sigma_y} = 0.512$ ...(4)
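The printed solution breaks off after (4). This Python sketch finishes the arithmetic. It reads the equations as y = 0.516x + 33.73 and x = 0.512y + 32.52, and assumes positive signs because the source layout dropped the operators. r² is the product of the two regression coefficients, and (x̄, ȳ) is where the two lines meet.

```python
import math

# Example 23, written as y = a1 + b1*x (y on x) and x = a2 + b2*y (x on y)
a1, b1 = 33.73, 0.516
a2, b2 = 32.52, 0.512

r = math.copysign(math.sqrt(b1 * b2), b1)   # r² = byx * bxy; r has their common sign
# Both lines pass through (x̄, ȳ): substitute ȳ = a1 + b1*x̄ into x̄ = a2 + b2*ȳ
x_bar = (a2 + b2 * a1) / (1 - b1 * b2)
y_bar = a1 + b1 * x_bar
print(round(r, 3), round(x_bar, 2), round(y_bar, 2))
```

Under that reading, r ≈ 0.514, x̄ ≈ 67.67 and ȳ ≈ 68.65.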
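Example 24. The two regression equations of the variables x and y are
x = 19.13 – 0.87y and y = 11.64 – 0.50x.
Find (i) the mean of the x’s; (ii) the mean of the y’s; (iii) the correlation coefficient between x and y.
Solution.
$$ x = 19.13 - 0.87y \qquad \ldots(1) $$
$$ y = 11.64 - 0.50x \qquad \ldots(2) $$
As (1) and (2) pass through $(\bar{x}, \bar{y})$:
$$ \bar{x} = 19.13 - 0.87\bar{y} \qquad \ldots(3) $$
$$ \bar{y} = 11.64 - 0.50\bar{x} \qquad \ldots(4) $$
On solving (3) and (4) we get
$$ \bar{x} = 15.935, \qquad \bar{y} = 3.67 $$
From (1), $r\dfrac{\sigma_x}{\sigma_y} = -0.87$ ...(5)
From (2), $r\dfrac{\sigma_y}{\sigma_x} = -0.50$ ...(6)
As σx and σy are always positive, r is negative.
Multiplying (5) and (6), we get
$$ r\frac{\sigma_x}{\sigma_y}\cdot r\frac{\sigma_y}{\sigma_x} = (-0.87)(-0.50) \;\Rightarrow\; r^2 = 0.435 \;\Rightarrow\; r = -0.66 \quad \text{Ans.} $$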
Example 25. The regression equations calculated from a given set of observations for two
random variables are
x = –0.4y + 6.4 and y = –0.6x + 4.6
Calculate $\bar{x}$, $\bar{y}$ and r.
Solution. The regression equations are
$$ x = -0.4y + 6.4 \qquad \ldots(1) $$
$$ y = -0.6x + 4.6 \qquad \ldots(2) $$
From (1), coefficient of regression of x on y: $r\dfrac{\sigma_x}{\sigma_y} = -0.4$ ...(3)
From (2), coefficient of regression of y on x: $r\dfrac{\sigma_y}{\sigma_x} = -0.6$ ...(4)
From (3) and (4),
$$ r\frac{\sigma_x}{\sigma_y}\cdot r\frac{\sigma_y}{\sigma_x} = (-0.4)(-0.6) \;\Rightarrow\; r^2 = 0.24 \;\Rightarrow\; r = \pm 0.49 $$
In (3) and (4), σx and σy are (always) positive, so r is negative:
r = –0.49
To find $\bar{x}$ and $\bar{y}$ we solve equations (1) and (2) simultaneously; their point of intersection
is $(\bar{x}, \bar{y})$:
$\bar{x} = 6$, $\bar{y} = 1$ Ans.
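Example 26. Show that the geometric mean of the coefficients of regression is the coefficient of
correlation.
Solution. The coefficients of regression are $r\dfrac{\sigma_y}{\sigma_x}$ and $r\dfrac{\sigma_x}{\sigma_y}$.
$$ \text{G.M.} = \sqrt{r\frac{\sigma_y}{\sigma_x}\cdot r\frac{\sigma_x}{\sigma_y}} = \sqrt{r^2} = r $$
(the root being taken with the common sign of the two coefficients).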
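As a small illustration of Example 26, this Python sketch takes r as the signed geometric mean of the two regression coefficients and applies it to the coefficients found in Examples 24 and 25. The function name `r_from_coefficients` is ours.

```python
import math

def r_from_coefficients(byx, bxy):
    """r as the geometric mean of the regression coefficients,
    taken with their common sign."""
    if byx * bxy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(byx * bxy), byx)

r24 = r_from_coefficients(-0.50, -0.87)   # Example 24
r25 = r_from_coefficients(-0.6, -0.4)     # Example 25
print(round(r24, 2), round(r25, 2))
```

This prints -0.66 -0.49, the answers of Examples 24 and 25.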
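equations:
$$ \sum y = na + b\sum x \quad\text{or}\quad 40 = 8a + 56b \qquad \ldots(1) $$
$$ \sum xy = a\sum x + b\sum x^2 \quad\text{or}\quad 364 = 56a + 524b \qquad \ldots(2) $$
On solving (1) and (2) we get
$$ a = \frac{6}{11} \quad\text{and}\quad b = \frac{7}{11} $$
The equation of the required line is
$$ y = \frac{6}{11} + \frac{7}{11}x \quad\text{or}\quad 7x - 11y + 6 = 0 \quad \text{Ans.} $$
If x = 10,
$$ y = \frac{6}{11} + \frac{7}{11}(10) = \frac{76}{11} = 6\frac{10}{11} \quad \text{Ans.} $$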
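The normal equations above can be solved exactly in Python. This sketch reads n = 8, Σx = 56, Σy = 40, Σx² = 524 and Σxy = 364 off equations (1) and (2), then applies Cramer's rule with fractions. The function name `solve_normal_equations` is ours.

```python
from fractions import Fraction

def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve Σy = n a + b Σx and Σxy = a Σx + b Σx² exactly (Cramer's rule)."""
    det = n * sum_x2 - sum_x ** 2
    a = Fraction(sum_y * sum_x2 - sum_x * sum_xy, det)
    b = Fraction(n * sum_xy - sum_x * sum_y, det)
    return a, b

a, b = solve_normal_equations(n=8, sum_x=56, sum_y=40, sum_x2=524, sum_xy=364)
print(a, b, a + b * 10)   # intercept, slope, estimate of y at x = 10
```

This prints 6/11 7/11 76/11, agreeing with the line 7x – 11y + 6 = 0 and the estimate y = 6 10/11 at x = 10.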
Example 29. In a study between the amount of rainfall and the quantity of air pollution
removed, the following data were collected.
Daily rainfall (in 0.01 cm) | 4.3  | 4.5  | 5.9  | 5.6  | 6.1  | 5.2  | 3.8  | 2.1
Pollution removed (mg/m³)   | 12.6 | 12.1 | 11.6 | 11.8 | 11.4 | 11.8 | 13.2 | 14.1
Find the regression line of y on x. (A.M.I.E., Summer 2000)
Solution.
S.No. | x          | y          | xy           | x²
1     | 4.3        | 12.6       | 54.18        | 18.49
2     | 4.5        | 12.1       | 54.45        | 20.25
3     | 5.9        | 11.6       | 68.44        | 34.81
4     | 5.6        | 11.8       | 66.08        | 31.36
5     | 6.1        | 11.4       | 69.54        | 37.21
6     | 5.2        | 11.8       | 61.36        | 27.04
7     | 3.8        | 13.2       | 50.16        | 14.44
8     | 2.1        | 14.1       | 29.61        | 4.41
Total | Σx = 37.5  | Σy = 98.6  | Σxy = 453.82 | Σx² = 188.01
Let y = a + bx be the equation of the line of regression of y on x, where a and b are given by
the following equations:
$$ \sum y = na + b\sum x \;\Rightarrow\; 98.6 = 8a + 37.5b \qquad \ldots(1) $$
$$ \sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 453.82 = 37.5a + 188.01b \qquad \ldots(2) $$
On solving (1) and (2), we get a = 15.53 and b = –0.684.
The equation of the line of regression is y = 15.53 – 0.684x Ans.
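As a cross-check of equations (1) and (2), this Python sketch computes the least-squares line of y on x straight from the raw data. It rebuilds the sums and solves the normal equations in closed form.

```python
rain = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1]            # x, daily rainfall in 0.01 cm
removed = [12.6, 12.1, 11.6, 11.8, 11.4, 11.8, 13.2, 14.1]  # y, mg/m³
n = len(rain)
sx, sy = sum(rain), sum(removed)
sxx = sum(x * x for x in rain)
sxy = sum(x * y for x, y in zip(rain, removed))

# Normal equations: Σy = n a + b Σx,  Σxy = a Σx + b Σx²
det = n * sxx - sx * sx
b = (n * sxy - sx * sy) / det
a = (sy - b * sx) / n
print(f"a = {a:.2f}, b = {b:.3f}")
```

This prints a = 15.53, b = -0.684, i.e. y = 15.53 – 0.684x.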
Example 30. The following data regarding the heights (y) and the weights (x) of 100
college students are given:
Σx = 15000, Σx² = 2272500, Σy = 6800, Σy² = 463025, Σxy = 1022250
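Find the correlation coefficient between height and weight and state the equation of regression
of height on weight.
Solution.
$$ \bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150, \qquad \bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68 $$
$$ \sigma_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{2272500}{100} - \left(\frac{15000}{100}\right)^2} = \sqrt{22725 - 22500} = \sqrt{225} = 15 $$
$$ \sigma_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{463025}{100} - \left(\frac{6800}{100}\right)^2} = \sqrt{4630.25 - 4624} = \sqrt{6.25} = 2.5 $$
$$ r = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{\sigma_x\sigma_y} = \frac{\dfrac{1022250}{100} - (150)(68)}{15 \times 2.5} = \frac{10222.5 - 10200}{15 \times 2.5} = \frac{22.5}{15 \times 2.5} = \frac{1.5}{2.5} = 0.6 $$
For the regression equation of y on x, we have
$$ y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}), \qquad y - 68 = 0.6 \times \frac{2.5}{15}(x - 150) $$
$$ y - 68 = \frac{1}{10}(x - 150) \quad\text{or}\quad 10y - 680 = x - 150 $$
$$ 10y = x + 530 \quad \text{Ans.} $$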
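The same working from the summary totals in Python. This sketch uses population (divide-by-n) standard deviations, as the solution does, and returns the slope and intercept of the regression of height on weight.

```python
import math

n = 100
sum_x, sum_x2 = 15000, 2272500      # weights
sum_y, sum_y2 = 6800, 463025        # heights
sum_xy = 1022250

x_bar, y_bar = sum_x / n, sum_y / n
sd_x = math.sqrt(sum_x2 / n - x_bar ** 2)
sd_y = math.sqrt(sum_y2 / n - y_bar ** 2)
r = (sum_xy / n - x_bar * y_bar) / (sd_x * sd_y)

slope = r * sd_y / sd_x             # regression of y (height) on x (weight)
intercept = y_bar - slope * x_bar
print(x_bar, y_bar, sd_x, sd_y, round(r, 4), round(slope, 4), round(intercept, 4))
```

This gives x̄ = 150, ȳ = 68, σx = 15, σy = 2.5, r = 0.6, and the line y = 0.1x + 53, which is 10y = x + 530.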
10.25 ERROR OF PREDICTION
The deviation of a predicted value from the corresponding observed value is the error of prediction. The standard error of
prediction is the root mean square of these deviations:
$$ E_{yx} = \sqrt{\frac{\sum(y - y_r)^2}{n}} $$
where y is the actual value and $y_r$ the predicted value.
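To make the definition concrete, this Python sketch computes E_yx directly from its definition on the data of Example 29. The predicted values y_r come from the regression line of y on x. The closed form σy√(1 – r²) is printed alongside for comparison; it anticipates the result proved in Example 31 below.

```python
import math

x = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1]
y = [12.6, 12.1, 11.6, 11.8, 11.4, 11.8, 13.2, 14.1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sd_x = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
sd_y = math.sqrt(sum((v - my) ** 2 for v in y) / n)
r = sum((p - mx) * (q - my) for p, q in zip(x, y)) / (n * sd_x * sd_y)

# Predicted values y_r from the regression line of y on x
y_pred = [my + r * sd_y / sd_x * (v - mx) for v in x]
E_yx = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / n)
print(round(E_yx, 4), round(sd_y * math.sqrt(1 - r * r), 4))
```

The two printed numbers agree, so for these data the standard error of prediction equals σy√(1 – r²).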
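Example 31. Prove that
(i) $E_{yx} = \sigma_y\sqrt{1 - r^2}$  (ii) $E_{xy} = \sigma_x\sqrt{1 - r^2}$
Solution. The equation of the line of regression of y on x is
$$ y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \;\Rightarrow\; y_r = \bar{y} + r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) $$
So,
$$ E_{yx} = \left[\frac{\sum(y - y_r)^2}{n}\right]^{1/2} = \left[\frac{1}{n}\sum\left\{(y - \bar{y}) - r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2\right]^{1/2} $$
$$ = \left[\frac{1}{n}\left\{\sum(y - \bar{y})^2 + \frac{r^2\sigma_y^2}{\sigma_x^2}\sum(x - \bar{x})^2 - \frac{2r\sigma_y}{\sigma_x}\sum(x - \bar{x})(y - \bar{y})\right\}\right]^{1/2} $$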