
CHAPTER NINE

Curve Fitting and Principle of Least Squares
9·1. Curve Fitting. Let $(x_i, y_i)$; i = 1, 2, ..., n be a given set of n pairs
of values, X being the independent variable and Y the dependent variable. The
general problem in curve fitting is to find, if possible, an analytic expression
of the form y = f(x) for the functional relationship suggested by the given data.
Fitting of curves to a set of numerical data is of considerable importance,
theoretical as well as practical. Theoretically it is useful in the study of
correlation and regression, e.g., lines of regression can be regarded as fitting of
linear curves to the given bivariate distribution (c.f. § 10·8·1). In practical
statistics it enables us to represent the relationship between two variables by
simple algebraic expressions, e.g., polynomials, exponential or logarithmic functions.
Moreover, it may be used to estimate the values of one variable which would
correspond to the specified values of the other variable.
9·1·1. Fitting of a straight line. Let us consider the fitting of a straight
line
\[ Y = a + bX \qquad \ldots(9·1) \]
to a set of n points $(x_i, y_i)$; i = 1, 2, ..., n. Equation (9·1) represents a family of
straight lines for different values of the arbitrary constants 'a' and 'b'. The problem
is to determine 'a' and 'b' so that the line (9·1) is the line of "best fit".
The term 'best fit' is interpreted in accordance with Legendre's principle
of least squares, which consists in minimising the sum of the squares of the
deviations of the actual values of y from their estimated values as given
by the line of best fit.
Let $P_i(x_i, y_i)$ be any general point in the scatter diagram (§ 10·2).
Draw $P_iM$ perpendicular to the x-axis, meeting the line (9·1) in $H_i$. The
abscissa of $H_i$ is $x_i$ and, since $H_i$ lies on (9·1), its ordinate is $a + bx_i$.
Hence the co-ordinates of $H_i$ are $(x_i, a + bx_i)$. The length
\[ P_iH_i = P_iM - H_iM = y_i - (a + bx_i) \]
is called the error of estimate or the residual for $y_i$.
[Figure: the point $P_i$, the line (9·1) and the perpendicular $P_iM$ on the X-axis.]
Fundamentals of Mathematical Statistics

According to the principle of least squares, we have to determine a and b so that
\[ E = \sum_{i=1}^{n} P_iH_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2 \]
is minimum. From the principle of maxima and minima, the partial derivatives of
E with respect to (w.r.t.) a and b should vanish separately, i.e.,
\[ \frac{\partial E}{\partial a} = 0 = -2\sum_{i=1}^{n} (y_i - a - bx_i) \quad\text{and}\quad \frac{\partial E}{\partial b} = 0 = -2\sum_{i=1}^{n} x_i (y_i - a - bx_i) \qquad \ldots(9·2) \]

\[ \Rightarrow\quad \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \quad\text{and}\quad \sum_{i=1}^{n} x_iy_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \qquad \ldots(9·2a) \]

Equations (9·2) and (9·2a) are known as the normal equations for estimating
a and b.
All the quantities $\sum x_i$, $\sum x_i^2$, $\sum y_i$ and $\sum x_iy_i$ can be obtained from
the given set of points $(x_i, y_i)$; i = 1, 2, ..., n and the equations (9·2a) can be
solved for a and b. With the values of a and b so obtained, equation (9·1) is
the line of best fit to the given set of points $(x_i, y_i)$; i = 1, 2, ..., n.
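As an illustration, the normal equations (9·2a) can be solved directly once the four sums are known. The following is a minimal sketch in Python, not part of the original text; the function name `fit_line` is a hypothetical choice, and the data used are those of Example 9·1 of this chapter.

```python
def fit_line(xs, ys):
    """Solve the normal equations (9·2a) for the line Y = a + bX."""
    n = len(xs)
    sx = sum(xs)                               # sum of x_i
    sy = sum(ys)                               # sum of y_i
    sxx = sum(x * x for x in xs)               # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    det = n * sxx - sx * sx                    # zero only if all x_i coincide
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Data of Example 9·1.
a, b = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
print(round(a, 3), round(b, 3))
```

The closed forms for a and b used here are simply the solution of the two linear equations (9·2a) by elimination.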
Remark. The equation of the line of best fit of Y on X is obtained on
eliminating a and b in (9·1) and (9·2a) and can be expressed in the determinant
form as follows:
\[ \begin{vmatrix} y & x & 1 \\ \sum y_i & \sum x_i & n \\ \sum x_iy_i & \sum x_i^2 & \sum x_i \end{vmatrix} = 0 \qquad \ldots(9·2b) \]
9·1·2. Fitting of second degree parabola. Let
\[ Y = a + bX + cX^2 \qquad \ldots(9·3) \]
be the second degree parabola of best fit to the set of n points $(x_i, y_i)$; i = 1, 2, ..., n.
Using the principle of least squares, we have to determine the constants a,
b and c so that
\[ E = \sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2)^2 \]
is minimum.
Equating to zero the partial derivatives of E with respect to a, b and c
separately, we get the normal equations for estimating a, b and c as
\[ \frac{\partial E}{\partial a} = 0 = -2\sum (y_i - a - bx_i - cx_i^2) \]
\[ \frac{\partial E}{\partial b} = 0 = -2\sum x_i (y_i - a - bx_i - cx_i^2) \qquad \ldots(9·4) \]
\[ \frac{\partial E}{\partial c} = 0 = -2\sum x_i^2 (y_i - a - bx_i - cx_i^2) \]
\[ \Rightarrow\quad \sum y_i = na + b\sum x_i + c\sum x_i^2 \]
\[ \sum x_iy_i = a\sum x_i + b\sum x_i^2 + c\sum x_i^3 \qquad \ldots(9·4a) \]
\[ \sum x_i^2y_i = a\sum x_i^2 + b\sum x_i^3 + c\sum x_i^4 \]
summation taken over i from 1 to n.
For a given set of points $(x_i, y_i)$; i = 1, 2, ..., n, equations (9·4a) can be solved
for a, b and c, and with these values of a, b and c, (9·3) is the parabola of best
fit.
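Remark. Eliminating a, b and c in (9·3) and (9·4a), the parabola of best fit
of Y on X is given by
\[ \begin{vmatrix} y & x^2 & x & 1 \\ \sum y_i & \sum x_i^2 & \sum x_i & n \\ \sum x_iy_i & \sum x_i^3 & \sum x_i^2 & \sum x_i \\ \sum x_i^2y_i & \sum x_i^4 & \sum x_i^3 & \sum x_i^2 \end{vmatrix} = 0 \qquad \ldots(9·4b) \]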

9·1·3. Fitting of Polynomial of kth Degree. If
\[ Y = a_0 + a_1X + a_2X^2 + \ldots + a_kX^k \qquad \ldots(9·5) \]
is the kth degree polynomial of best fit to the set of points $(x_i, y_i)$; i = 1, 2, ...,
n, the constants $a_0, a_1, a_2, \ldots, a_k$ are to be obtained so that
\[ E = \sum_{i=1}^{n} (y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k)^2 \]
is minimum. Thus the normal equations for estimating $a_0, a_1, \ldots, a_k$ are obtained
on equating to zero the partial derivatives of E w.r.t. $a_0, a_1, \ldots, a_k$ separately,
i.e.,
\[ \frac{\partial E}{\partial a_0} = 0 = -2\sum (y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k) \]
\[ \frac{\partial E}{\partial a_1} = 0 = -2\sum x_i (y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k) \qquad \ldots(9·6) \]
\[ \vdots \]
\[ \frac{\partial E}{\partial a_k} = 0 = -2\sum x_i^k (y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k) \]
\[ \Rightarrow\quad \sum y_i = na_0 + a_1\sum x_i + a_2\sum x_i^2 + \ldots + a_k\sum x_i^k \]
\[ \sum x_iy_i = a_0\sum x_i + a_1\sum x_i^2 + a_2\sum x_i^3 + \ldots + a_k\sum x_i^{k+1} \qquad \ldots(9·6a) \]
\[ \vdots \]
\[ \sum x_i^ky_i = a_0\sum x_i^k + a_1\sum x_i^{k+1} + a_2\sum x_i^{k+2} + \ldots + a_k\sum x_i^{2k} \]
summation extended over i from 1 to n. These are (k + 1) equations in (k + 1)
unknowns $a_0, a_1, a_2, \ldots, a_k$ and can be solved with the help of algebra.
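The system (9·6a) can be set up and solved mechanically. The sketch below is an illustration only, not the book's procedure: it builds the (k + 1) normal equations from the power sums and solves them by Gauss-Jordan elimination, which is one assumed choice of "algebra" among several. The name `fit_polynomial` is hypothetical; the data are those of Example 9·2.

```python
def fit_polynomial(xs, ys, k):
    """Least-squares polynomial of degree k via the normal equations (9·6a).

    Returns the coefficients [a0, a1, ..., ak].
    """
    m = k + 1
    # power_sums[p] = sum of x_i^p for p = 0, 1, ..., 2k (power_sums[0] = n)
    power_sums = [sum(x ** p for x in xs) for p in range(2 * k + 1)]
    # Row j of the augmented matrix encodes
    #   a0*S_j + a1*S_(j+1) + ... + ak*S_(j+k) = sum of x_i^j * y_i
    aug = [
        [power_sums[j + r] for r in range(m)]
        + [sum((x ** j) * y for x, y in zip(xs, ys))]
        for j in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[j][m] / aug[j][j] for j in range(m)]


# Data of Example 9·2 (second degree parabola, k = 2).
coeffs = fit_polynomial([0, 1, 2, 3, 4], [1.0, 1.8, 1.3, 2.5, 6.3], k=2)
print([round(c, 2) for c in coeffs])
```

For large k the normal equations become numerically ill-conditioned, so this direct approach is best kept to low degrees.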
Remark. It has been found that in all the above cases, the values of the
second order derivatives, viz., $\frac{\partial^2 E}{\partial a_0^2}, \frac{\partial^2 E}{\partial a_1^2}, \ldots$ come out to be positive at the
points $a_0, a_1, \ldots, a_k$, the solutions of the 'normal equations'. Hence they
provide minima of E. For proof see Remark 1 to § 10·7·1, Lines of Regression.
Example 9·1. Fit a straight line to the following data.
X:  1    2    3    4    6    8
Y:  2·4  3    3·6  4    5    6
Solution. Let the line be Y = a + bX.
          X      Y      X²      XY
          1     2·4      1      2·4
          2     3·0      4      6·0
          3     3·6      9     10·8
          4     4·0     16     16·0
          6     5·0     36     30·0
          8     6·0     64     48·0
Total    24    24·0    130    113·2
Using normal equations (9·2a), we get
24 = 6a + 24b and 113·2 = 24a + 130b
Solving these equations, we get a = 1·976 and b = 0·506.
Example 9·2. Fit a parabola of second degree to the following data:
X:  0    1    2    3    4
Y:  1·0  1·8  1·3  2·5  6·3
(Delhi Univ. B.Sc., Oct. 1992)
Solution. Let Y = a + bX + cX² be the second degree parabola.
          X      Y     X²     X³     X⁴     XY     X²Y
          0     1·0     0      0      0     0       0
          1     1·8     1      1      1     1·8     1·8
          2     1·3     4      8     16     2·6     5·2
          3     2·5     9     27     81     7·5    22·5
          4     6·3    16     64    256    25·2   100·8
Total    10    12·9    30    100    354    37·1   130·3
Using normal equations (9·4a), we get
12·9 = 5a + 10b + 30c; 37·1 = 10a + 30b + 100c;
130·3 = 30a + 100b + 354c
Solving these equations, we get a = 1·42, b = -1·07 and c = 0·55. Thus the
required equation of the second degree parabola is
Y = 1·42 - 1·07X + 0·55X²
Remark. If the values which X and Y take are large, the calculation of
ΣX, ΣX², ΣXY, ..., becomes quite tedious and the solution of the normal
equations is also quite cumbersome. In this case the arithmetic is reduced to a great
extent by a suitable change of origin in X or (and) in Y.

9·1·4. Change of origin. Let us suppose that the values of X are given
to be equidistant at an interval of h, i.e., X takes the values, (say), a, a + h,
a + 2h, ... If n is odd, i.e., n = 2m + 1 (say), we take
\[ U = \frac{X - (\text{middle term})}{\text{interval}} = \frac{X - (a + mh)}{h} \]
Now U takes the values -m, -(m - 1), ..., -1, 0, 1, ..., (m - 1), m, so that
ΣU = ΣU³ = 0.
If n is even, i.e., n = 2m (say), then there are two middle terms, viz., mth
and (m + 1)th terms, which are a + (m - 1)h and a + mh. In this case we take
\[ U = \frac{X - (\text{mean of two middle terms})}{\tfrac{1}{2}(\text{interval})} = \frac{X - [a + \tfrac{1}{2}(2m - 1)h]}{\tfrac{1}{2}h} = \frac{2X - 2a - (2m - 1)h}{h} \qquad \ldots(9·7) \]
Now for X = a, a + h, ..., a + (2m - 1)h; U takes the values
-(2m - 1), -(2m - 3), ..., -3, -1, 1, 3, ..., (2m - 3), (2m - 1).
Again we see that ΣU = ΣU³ = 0.
Example 9·3. The weights of a calf taken at weekly intervals are given
below. Fit a straight line using the method of least squares and calculate the
average rate of growth per week.
Age (X):      1     2     3     4     5     6     7     8     9      10
Weight (Y):  52·5  58·7  65·0  70·2  75·4  81·1  87·2  95·5  102·2  108·4
Solution. Let the variables age and weight be denoted by X and Y
respectively.
Here n = 10, i.e., even, and the values of X are equidistant at an interval
of unity, i.e., h = 1. Thus we take
\[ U = \frac{X - \tfrac{1}{2}(5 + 6)}{\tfrac{1}{2}} = 2X - 11 \]
Let the least-square line of Y on U be Y = a + bU.
The normal equations for estimating a and b are
ΣY = na + bΣU and ΣUY = aΣU + bΣU²
          X       Y       U     U²      UY
          1     52·5     -9     81    -472·5
          2     58·7     -7     49    -410·9
          3     65·0     -5     25    -325·0
          4     70·2     -3      9    -210·6
          5     75·4     -1      1     -75·4
          6     81·1      1      1      81·1
          7     87·2      3      9     261·6
          8     95·5      5     25     477·5
          9    102·2      7     49     715·4
         10    108·4      9     81     975·6
Total          796·2      0    330    1016·8
Thus the normal equations are
796·2 = 10a + 0 × b and 1016·8 = a × 0 + 330b,
which give a = 79·62 and b = 1016·8/330 = 3·08 (approx.).
∴ The least square line of Y on U is
Y = 79·62 + 3·08U
Hence the line of best fit of Y on X is
Y = 79·62 + 3·08 (2X - 11) ⇒ Y = 45·74 + 6·16X
The weights of the calf (as given by the line of best fit Y = A + BX) after
1, 2, 3, ... weeks are A + B, A + 2B, A + 3B, ..., respectively. Hence the average
rate of growth per week is B units, i.e., 6·16 units.
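The change of origin of § 9·1·4 can be sketched in code. The following Python fragment is an illustration under stated assumptions: the X values are sorted and equidistant, and the helper name `fit_line_coded` is hypothetical. Because ΣU = 0, the two normal equations decouple and each constant is obtained by a single division.

```python
def fit_line_coded(xs, ys):
    """Fit Y = A + BX for sorted, equidistant xs via the coded variable U."""
    n = len(xs)
    h = xs[1] - xs[0]                      # common interval
    if n % 2 == 1:
        middle = xs[n // 2]                # single middle term
        scale = h                          # U = (X - middle) / h
    else:
        middle = (xs[n // 2 - 1] + xs[n // 2]) / 2   # mean of two middle terms
        scale = h / 2                      # U = (X - middle) / (h/2), as in (9·7)
    us = [(x - middle) / scale for x in xs]
    # Since sum(U) = 0, the normal equations reduce to
    #   sum(Y) = n*a   and   sum(U*Y) = b*sum(U^2).
    a = sum(ys) / n
    b = sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)
    # Substitute U = (X - middle)/scale back to obtain Y = A + BX.
    B = b / scale
    A = a - b * middle / scale
    return A, B


# Data of Example 9·3.
weights = [52.5, 58.7, 65.0, 70.2, 75.4, 81.1, 87.2, 95.5, 102.2, 108.4]
A, B = fit_line_coded(list(range(1, 11)), weights)
print(round(A, 2), round(B, 2))
```

Carrying b unrounded gives an intercept of about 45·73 rather than the 45·74 of the worked solution, which uses b rounded to 3·08; the slope agrees at 6·16.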

EXERCISE 9 (a)
1. (a) $(x_i, y_i)$; i = 1, 2, ..., n, give the co-ordinates of n points in a plane.
It is proposed to fit a straight line Y = aX + b to those points such that the sum
of the squares of the perpendiculars from those n points to the line is a
minimum. Find the constants a and b. Use the above method to fit a straight line
to the following points:
X:  0    1    2    3    4
Y:  1    1·8  3·3  4·5  6·3
Ans. Y = 0·72 + 1·33X
(b) Fit a straight line of the form Y = AX + B to the following data:
X:  0   5   10  15  20  25  30
Y:  10  14  19  25  31  36  39
2. Show that the line of best fit to the following data is given by
Y = -0·5X + 8
X:  6  7  7  8  8  8  9  9  10
Y:  5  5  4  5  4  3  4  3  3
3. (a) How do you define the term "line of best fit"? Give the normal
equations generally used to obtain such a line. Fit a straight line and a parabolic
curve to the following data:
X:  1·0  1·5  2·0  2·5  3·0  3·5  4·0
Y:  1·1  1·3  1·6  2·6  2·7  3·4  4·1
Ans. Y = 1·04 - 0·20X + 0·24X²
(b) Fit a straight line to the following data. Plot the observed and the
expected values in a graph and examine whether the straight line gives an adequate
fit.
X:  1   2   3   4   5   6   7   8
Y:  55  46  40  38  33  30  29  30
4. An experiment is conducted to verify the law of falling under gravity
expressed by S = ½gt²
where S is the distance fallen at time t and g is a gravitational constant. The
following results are obtained:
t (seconds):  1   2   3    4    5
S (feet):     15  70  140  250  380
Taking S as the dependent variable, fit a straight line to the data by the
method of least squares in a manner that you can estimate g. What is the estimate
of g?
5. (a) Explain the method of fitting a second degree parabola by using the
principle of least squares.
(b) Fit a parabola Y = a + bX + cX² to the following data:
X:  1    2    3    4     5     6     7
Y:  2·3  5·2  9·7  16·5  29·4  35·5  54·4
6. Fit a second degree parabola to the following data taking X as the
independent variable:
X:  1  2  3  4  5   6   7   8   9
Y:  2  6  7  8  10  11  11  10  9
Ans. Y = -1 + 3·55X - 0·27X²
7. In a spectroscopic method for determining the per cent X of natural
rubber-content of vulcanizates, the variable Y used is 1 + log₁₀ r, where r is the
ratio of transmission at two selected wavelengths. In order to establish a
relationship between X and Y, the following data were obtained:
X:  0     20    40    60    80    100
Y:  2·19  2·65  3·16  3·57  3·93  4·27
Using least square method, fit a parabola. Comment on your results.
8. Fit a second degree curve Y = a + bX + cX² to the following data relating
to profit of a certain company.
Year:                       1980  1982  1984  1986  1988
Profit in lakhs of rupees:  125   140   165   195   230
Estimate the profit in the year 1995.
Ans. Y = 114 + 7·2X + 3·15X²
9. Explain the method of least squares of fitting a curve to the given mass
of data:
X:  -2   -1   0    1    2
Y:  y₁   y₂   y₃   y₄   y₅
Fit a parabola Y = a + bX + c(X² - 2), by the method of least squares and
show that
\[ a = \bar{y}, \quad b = \tfrac{1}{10}(-2y_1 - y_2 + y_4 + 2y_5), \quad c = \tfrac{1}{14}(2y_1 - y_2 - 2y_3 - y_4 + 2y_5) \]

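10. Show that the best fitting linear function for the points $(x_1, y_1)$,
$(x_2, y_2)$, ..., $(x_n, y_n)$ may be expressed in the form
\[ \begin{vmatrix} x & y & 1 \\ \sum x_i & \sum y_i & n \\ \sum x_i^2 & \sum x_iy_i & \sum x_i \end{vmatrix} = 0, \quad (i = 1, 2, \ldots, n). \]
Show that the line passes through the mean point $(\bar{x}, \bar{y})$.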
9·2. Most Plausible Solution of a System of Linear Equations.
Method of least squares is helpful in finding the most plausible values of the
variables satisfying a system of independent linear equations whose number is
more than the number of variables under study. Consider the following set of
m equations in n variables X, Y, Z, ..., T:
\[ a_1X + b_1Y + c_1Z + \ldots + k_1T = l_1 \]
\[ a_2X + b_2Y + c_2Z + \ldots + k_2T = l_2 \qquad \ldots(9·8) \]
\[ \vdots \]
\[ a_mX + b_mY + c_mZ + \ldots + k_mT = l_m \]
where $a_i, b_i, \ldots, l_i$; i = 1, 2, ..., m are constants.
If m = n, the system of equations (9·8) can be solved uniquely with the help
of algebra. If m > n, it is not possible to determine a unique solution X, Y, Z, ...,
T which will satisfy the system (9·8). In this case we find the values of X, Y,
Z, ..., T which will satisfy the system (9·8) as nearly as possible.
Legendre's principle of least squares consists in minimising the sum of the
squares of the 'residuals' or the 'errors'. If
\[ E_i = a_iX + b_iY + c_iZ + \ldots + k_iT - l_i; \quad i = 1, 2, \ldots, m \]
is the residual for the ith equation, then we have to determine X, Y, Z, ..., T so that
\[ U = \sum_{i=1}^{m} E_i^2 = \sum_{i=1}^{m} (a_iX + b_iY + c_iZ + \ldots + k_iT - l_i)^2 \]
is minimum.
Using the principle of maxima and minima in differential calculus, the
partial derivatives of U w.r.t. X, Y, Z, ..., T should vanish separately.
Thus
\[ \frac{\partial U}{\partial X} = 0 \;\Rightarrow\; \sum_{i=1}^{m} a_i (a_iX + b_iY + c_iZ + \ldots + k_iT - l_i) = 0 \]
\[ \frac{\partial U}{\partial Y} = 0 \;\Rightarrow\; \sum_{i=1}^{m} b_i (a_iX + b_iY + c_iZ + \ldots + k_iT - l_i) = 0 \qquad \ldots(9·9) \]
\[ \vdots \]
\[ \frac{\partial U}{\partial T} = 0 \;\Rightarrow\; \sum_{i=1}^{m} k_i (a_iX + b_iY + c_iZ + \ldots + k_iT - l_i) = 0 \]
These are known as the normal equations for X, Y, Z, ..., T respectively.
Thus we have n normal equations in n unknowns X, Y, Z, ..., T and their
unique solution gives the best or the most plausible solution of the system (9·8).
Here we see that the normal equation for any variable is obtained by
multiplying each equation by the coefficient of the variable in that equation and
then adding all the resulting equations.
Example 9·4. Find the most plausible values of X and Y from the
following equations:
X - 5Y + 4 = 0, 2X - 3Y + 5 = 0
X + 2Y - 3 = 0, 4X + 3Y + 1 = 0
Solution. Normal equation for X is
1 · (X - 5Y + 4) + 2 (2X - 3Y + 5) + 1 · (X + 2Y - 3) + 4 (4X + 3Y + 1) = 0
⇒ 22X + 3Y + 15 = 0 ...(*)
Normal equation for Y is
-5 (X - 5Y + 4) - 3 (2X - 3Y + 5) + 2 (X + 2Y - 3) + 3 (4X + 3Y + 1) = 0
⇒ 3X + 47Y - 38 = 0 ...(**)
Solving (*) and (**), we get X = -0·799 and Y = 0·86.
Hence the most plausible values of X and Y are X = -0·80 (approx.) and
Y = 0·86 (approx.)
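The rule "multiply each equation by the coefficient of the variable and add" translates directly into code. The sketch below is illustrative only: `most_plausible_values` is a hypothetical name, the equations are assumed to be written in the form of (9·8) with the constant on the right-hand side, and Gauss-Jordan elimination is an assumed choice for solving the resulting normal equations (9·9). It is applied to the four equations of Example 9·4.

```python
def most_plausible_values(coeffs, consts):
    """Least-squares solution of an overdetermined linear system.

    coeffs[i] holds the coefficients (a_i, b_i, ...) of equation i and
    consts[i] its right-hand side l_i, as in (9·8).
    """
    n = len(coeffs[0])
    # Normal equation for variable j: multiply every equation by its
    # coefficient of variable j and add the results, as in (9·9).
    aug = [
        [sum(row[j] * row[c] for row in coeffs) for c in range(n)]
        + [sum(row[j] * l for row, l in zip(coeffs, consts))]
        for j in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[j][n] / aug[j][j] for j in range(n)]


# Example 9·4, rewritten as X - 5Y = -4, 2X - 3Y = -5, X + 2Y = 3, 4X + 3Y = -1.
x, y = most_plausible_values(
    [[1, -5], [2, -3], [1, 2], [4, 3]],
    [-4, -5, 3, -1],
)
print(round(x, 3), round(y, 3))
```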
EXERCISE 9 (b)
1. Find the most plausible values of X and Y from the following equations:
(i) X + Y = 3·01, 2X - Y = 0·03,
X + 3Y = 7·03, 3X + Y = 4·97.
Ans. X = 1·0003, Y = 2·0007.
(ii) X + Y = 3, X - Y = 2,
X + 2Y - 4 = 0, X = 2Y + 1.
2. Find the most plausible values of X, Y and Z from the following
equations:
X - Y + 2Z = 3, 3X + 2Y - 5Z = 5, 4X + Y + 4Z = 21 and -X + 3Y + 3Z = 14
Ans. X = 2·47, Y = 3·55, Z = 1·92.
9·3. Conversion of Data to Linear Form. Sometimes it may happen that
the original data is not in a linear form but can be reduced to linear form by
some simple transformation of variables. We will illustrate this by considering
the following curves:
(a) Fitting of a Power Curve
\[ Y = aX^b \qquad \ldots(9·10) \]
to a set of n points.
Taking logarithm of both sides, we get
log Y = log a + b log X
⇒ U = A + bV
where U = log Y, A = log a and V = log X.
This is a linear equation in V and U.
Normal equations for estimating A and b are
\[ \sum U = nA + b\sum V \quad\text{and}\quad \sum UV = A\sum V + b\sum V^2 \qquad \ldots(9·10a) \]
These equations can be solved for A and b and consequently, we get
a = antilog (A)
With the values of a and b so obtained, (9·10) is the curve of best fit to
the set of n points.
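A short sketch of this reduction to linear form follows. It is an illustration rather than the book's computation: the function name `fit_power_curve` and the sample data (generated exactly from Y = 3X^0·5) are assumptions made for the demonstration. Note that the method minimises the squared errors in log Y, not in Y itself.

```python
import math


def fit_power_curve(xs, ys):
    """Fit Y = a * X**b through the normal equations (9·10a).

    All xs and ys must be positive so that their logarithms exist.
    """
    vs = [math.log10(x) for x in xs]           # V = log X
    us = [math.log10(y) for y in ys]           # U = log Y
    n = len(xs)
    sv, su = sum(vs), sum(us)
    svv = sum(v * v for v in vs)
    suv = sum(u * v for u, v in zip(us, vs))
    b = (n * suv - sv * su) / (n * svv - sv * sv)
    A = (su - b * sv) / n                      # A = log a
    return 10 ** A, b                          # a = antilog(A)


# Hypothetical data lying exactly on Y = 3 * X**0.5.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 0.5 for x in xs]
a, b = fit_power_curve(xs, ys)
print(round(a, 4), round(b, 4))
```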
(b) Fitting of Exponential Curves. (i) Y = ab^X, (ii) Y = ae^{bX}
to a set of n points.
(i)
\[ Y = ab^X \qquad \ldots(9·11) \]
Taking logarithm of both sides, we get
log Y = log a + X log b
⇒ U = A + BX
where U = log Y, A = log a and B = log b.
This is a linear equation in X and U.
The normal equations for estimating A and B are
\[ \sum U = nA + B\sum X \quad\text{and}\quad \sum XU = A\sum X + B\sum X^2 \qquad \ldots(9·11a) \]
Solving these equations for A and B, we finally get
a = antilog (A) and b = antilog (B)
With these values of a and b, (9·11) is the curve of best fit to the given
set of n points.
(ii)
\[ Y = ae^{bX} \qquad \ldots(9·12) \]
log Y = log a + bX log e = log a + (b log e) X
⇒ U = A + BX
where U = log Y, A = log a and B = b log e.
This is a linear equation in X and U.
Thus the normal equations are
\[ \sum U = nA + B\sum X \quad\text{and}\quad \sum XU = A\sum X + B\sum X^2 \qquad \ldots(9·12a) \]
From these we find A and B and consequently
\[ a = \text{antilog}(A) \quad\text{and}\quad b = \frac{B}{\log e} \]
Example 9·5. Fit an exponential curve of the form Y = ab^X to the
following data:
X:  1    2    3    4    5    6    7    8
Y:  1·0  1·2  1·8  2·5  3·6  4·7  6·6  9·1
Solution.
          X      Y     U = log Y      XU       X²
          1     1·0     0·0000      0·0000      1
          2     1·2     0·0792      0·1584      4
          3     1·8     0·2553      0·7659      9
          4     2·5     0·3979      1·5916     16
          5     3·6     0·5563      2·7815     25
          6     4·7     0·6721      4·0326     36
          7     6·6     0·8195      5·7365     49
          8     9·1     0·9590      7·6720     64
Total    36    30·5     3·7393     22·7385    204
(9·11a) gives the normal equations as
3·7393 = 8A + 36B
and 22·7385 = 36A + 204B
Solving, we get
B = 0·1408 and A = -0·1662 = $\bar{1}$·8338
∴ b = antilog B = 1·383 and a = antilog A = 0·6821
Hence the equation of the required curve is
Y = 0·6821 (1·38)^X.
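The computation of Example 9·5 can be reproduced with a few lines of code. This is a sketch under the assumption that common (base 10) logarithms are used, as in the worked table; the function name `fit_exponential` is hypothetical. Because the code carries full floating-point precision instead of four-figure logarithms, its results agree with the worked solution only to about three significant figures.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * b**X through the normal equations (9·11a)."""
    us = [math.log10(y) for y in ys]           # U = log Y
    n = len(xs)
    sx, su = sum(xs), sum(us)
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    B = (n * sxu - sx * su) / (n * sxx - sx * sx)   # B = log b
    A = (su - B * sx) / n                           # A = log a
    return 10 ** A, 10 ** B


# Data of Example 9·5.
a, b = fit_exponential(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1],
)
print(round(a, 3), round(b, 3))
```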
Example 9·6. Derive the least square equations for fitting a curve of the
type Y = aX + (b/X), to a set of n points $(x_i, y_i)$; i = 1, 2, ..., n.
Solution. The error of estimate $E_i$ for the ith point $(x_i, y_i)$ is given by
\[ E_i = y_i - ax_i - \frac{b}{x_i} \]
According to the principle of least squares, we have to determine the values of a
and b so that the sum of the squares of errors, viz.,
\[ E = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} \left( y_i - ax_i - \frac{b}{x_i} \right)^2 \]
is minimum.
Consequently, the normal equations are
\[ \frac{\partial E}{\partial a} = 0 = -2\sum_{i=1}^{n} x_i \left( y_i - ax_i - \frac{b}{x_i} \right) \]
\[ \frac{\partial E}{\partial b} = 0 = -2\sum_{i=1}^{n} \frac{1}{x_i} \left( y_i - ax_i - \frac{b}{x_i} \right) \]
which on simplification give
\[ \sum_{i=1}^{n} x_iy_i = a\sum_{i=1}^{n} x_i^2 + nb \]
and
\[ \sum_{i=1}^{n} \frac{y_i}{x_i} = na + b\sum_{i=1}^{n} \frac{1}{x_i^2} \]
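The two normal equations just derived form a 2 × 2 linear system in a and b. A minimal sketch of their solution follows; the function name `fit_ax_plus_b_over_x` and the test data (generated exactly from Y = 2X + 3/X) are assumptions for illustration, not taken from the text.

```python
def fit_ax_plus_b_over_x(xs, ys):
    """Fit Y = a*X + b/X using the normal equations of Example 9·6.

    No x value may be zero, since 1/x appears in the model.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)                   # sum of x^2
    sinv = sum(1 / (x * x) for x in xs)            # sum of 1/x^2
    sxy = sum(x * y for x, y in zip(xs, ys))       # sum of x*y
    syx = sum(y / x for x, y in zip(xs, ys))       # sum of y/x
    # Solve   sxy = a*sxx + n*b
    #         syx = n*a   + b*sinv
    det = sxx * sinv - n * n
    if det == 0:
        raise ValueError("normal equations are singular")
    a = (sxy * sinv - n * syx) / det
    b = (sxx * syx - n * sxy) / det
    return a, b


# Hypothetical data lying exactly on Y = 2X + 3/X.
xs = [1, 2, 3, 4]
ys = [2 * x + 3 / x for x in xs]
a, b = fit_ax_plus_b_over_x(xs, ys)
print(round(a, 6), round(b, 6))
```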
Example 9·7. Three independent measurements on each of the
three angles A, B, C of a triangle are as follows:
  A      B      C
39·5   60·3   80·1
39·3   62·2   80·3
39·6   60·1   80·4
Obtain the best estimates of the three angles taking into account the relation that
the sum of the angles is equal to 180°.
Solution. Let the three observations on A be denoted by $x_1, x_2, x_3$, on B
by $y_1, y_2, y_3$ and on C by $z_1, z_2, z_3$. Let $\theta_1, \theta_2$ be the best estimates for A and
B respectively.
According to the principle of least squares, our problem is to estimate
$\theta_1$ and $\theta_2$, so that
\[ E = \sum (x_i - \theta_1)^2 + \sum (y_i - \theta_2)^2 + \sum (z_i - 180 + \theta_1 + \theta_2)^2 \]
is minimum, summation being taken over i from 1 to 3.
Equating to zero the partial derivatives of E w.r.t. $\theta_1$ and $\theta_2$ (and dividing
by 2), the normal equations are
\[ \frac{\partial E}{\partial \theta_1} = 0 \;\Rightarrow\; -\sum (x_i - \theta_1) + \sum (z_i - 180 + \theta_1 + \theta_2) = 0 \qquad \ldots(*) \]
\[ \frac{\partial E}{\partial \theta_2} = 0 \;\Rightarrow\; -\sum (y_i - \theta_2) + \sum (z_i - 180 + \theta_1 + \theta_2) = 0 \qquad \ldots(**) \]
From (*) and (**), we get
\[ 3\theta_1 - \sum x_i + \sum z_i - 540 + 3\theta_1 + 3\theta_2 = 0 \]
\[ 3\theta_2 - \sum y_i + \sum z_i - 540 + 3\theta_1 + 3\theta_2 = 0 \qquad \ldots(***) \]
But Σ$x_i$ = 39·5 + 39·3 + 39·6 = 118·4
Σ$y_i$ = 60·3 + 62·2 + 60·1 = 182·6
Σ$z_i$ = 80·1 + 80·3 + 80·4 = 240·8
Substituting in (***), we get
6θ₁ + 3θ₂ - 417·6 = 0 and 3θ₁ + 6θ₂ - 481·8 = 0
∴ $\hat{A} = \hat{\theta}_1$ = 39·27, $\hat{B} = \hat{\theta}_2$ = 60·66 and $\hat{C} = 180 - \hat{\theta}_1 - \hat{\theta}_2$ = 80·07
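The normal equations of this example can be solved in closed form. The sketch below generalises them, as an assumption beyond the text, to m observations on each angle, in which case (*) and (**) become 2mθ₁ + mθ₂ = Σx - Σz + 180m and mθ₁ + 2mθ₂ = Σy - Σz + 180m; for m = 3 these are exactly the equations above. The function name `adjust_angles` is hypothetical.

```python
def adjust_angles(obs_a, obs_b, obs_c, total=180.0):
    """Least-squares estimates of three angles constrained to sum to `total`.

    Assumes the same number m of observations on each angle.
    """
    m = len(obs_a)
    sx, sy, sz = sum(obs_a), sum(obs_b), sum(obs_c)
    # Normal equations:  2m*t1 +  m*t2 = r1
    #                     m*t1 + 2m*t2 = r2
    r1 = sx - sz + m * total
    r2 = sy - sz + m * total
    t1 = (2 * r1 - r2) / (3 * m)
    t2 = (2 * r2 - r1) / (3 * m)
    return t1, t2, total - t1 - t2


# Data of Example 9·7.
t1, t2, t3 = adjust_angles(
    [39.5, 39.3, 39.6],
    [60.3, 62.2, 60.1],
    [80.1, 80.3, 80.4],
)
print(round(t1, 2), round(t2, 2), round(t3, 2))
```

Unrounded, the second estimate is 60·667; the worked solution quotes 60·66 so that the three rounded estimates add to exactly 180.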

9·4. Selection of Type of Curve to be Fitted. The greatest limitation of the
method of curve fitting by the principle of least squares is the choice of the
mathematical curve to be fitted to the given data. The choice of a particular
curve for describing the given data requires great skill, intelligence and expertise.
The graph of the given data enables us to have a fairly good idea about the
type of the curve to be fitted. The graph will clearly reveal if the trend is linear
(straight line) or curvilinear (non-linear). If the graph exhibits a curvilinear trend
then further approximations to the type of trend curve can be obtained on
plotting the data on a semi-logarithmic scale. A careful study of the graph
obtained on plotting the data on an arithmetic or semi-logarithmic scale often
provides adequate basis for selecting the type of the curve. The various types
of curves that may be used to describe the given data in practice are: [If $y_x$ is
the value of the dependent variable corresponding to the value x of the
independent variable]
(i) A straight line: $y_x = a + bx$
(ii) Second degree parabola: $y_x = a + bx + cx^2$
(iii) kth degree polynomial: $y_x = a_0 + a_1x + a_2x^2 + \ldots + a_kx^k$
(iv) Exponential curve: $y_x = ab^x$
⇒ $\log y_x = \log a + x \log b = A + Bx$, (say).
(v) Second degree curve fitted to logarithms: $y_x = ab^xc^{x^2}$
⇒ $\log y_x = \log a + x \log b + x^2 \log c = A + Bx + Cx^2$, (say).
(vi) Growth curves:
(a) $y_x = a + bc^x$ (Modified Exponential Curve)
(b) $y_x = ab^{c^x}$ (Gompertz Curve)
⇒ $\log y_x = \log a + c^x \log b = A + Bc^x$, (say)
(c) $y_x = \dfrac{k}{1 + \exp(a + bx)}$, b < 0 (Logistic Curve)
For deciding about the type of curve to be fitted to a given set of data,
the following points may be helpful:
(i) When the $y_x$ series is found to be increasing by equal absolute amounts,
the straight line curve is used. In this case, the graph of the data will give a
straight line graph.
(ii) The logarithmic straight line (exponential curve $y_x = ab^x$) is used when
the series is increasing or decreasing by a constant percentage rather than a
constant absolute amount. In this case, the data plotted on a semi-logarithmic
scale will give a straight line graph.
(iii) Second degree curve fitted to logarithms may be tried if the data plotted
on a semi-logarithmic scale is not a straight-line graph but shows curvature, being
concave either upward or downward.
For further guidelines, the following statistical tests based on the calculus
of finite differences [c.f. Chapter 17] may be applied.
We know that for a polynomial $y_x$ of nth degree in x,
\[ \Delta^r y_x = \begin{cases} \text{constant}, & r = n \\ 0, & r > n \end{cases} \]
where Δ is the difference operator given by $\Delta y_x = y_{x+h} - y_x$, h being the interval
of differencing, and $\Delta^r y_x$ is the rth order difference of $y_x$.
1. If $\Delta y_x$ = constant, use straight line curve.
2. If $\Delta^2 y_x$ = constant, use a second degree (parabolic) curve.
3. If $\Delta(\log y_x)$ = constant, use exponential curve.
4. If $\Delta^2(\log y_x)$ = constant, use second degree curve fitted to logarithms.
5. If $\Delta y_x$ tends to decrease by a constant percentage, use modified
exponential curve.
6. If $\Delta y_x$ resembles a skewed frequency curve, use a Gompertz curve or
Logistic curve.
7. The growth curves, viz., modified exponential, Gompertz and Logistic
curves, can be approximated by the constancy of the ratios
\[ \frac{\Delta y_x}{\Delta y_{x-1}}, \quad \frac{\Delta \log y_x}{\Delta \log y_{x-1}}, \quad \frac{\Delta (1/y_x)}{\Delta (1/y_{x-1})}, \]
respectively for all possible values of x.
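The first two tests in the list above can be sketched as follows. This is an illustration only: the helper names `forward_differences` and `is_roughly_constant`, the tolerance, and the sample series (generated from an exact quadratic at unit intervals) are all assumptions. With real data the differences are never exactly constant, so the tolerance would have to be chosen by judgement.

```python
def forward_differences(values, order):
    """Return the differences of the given order of an equally spaced series."""
    diffs = list(values)
    for _ in range(order):
        # Each pass replaces the series by its first differences.
        diffs = [later - earlier for earlier, later in zip(diffs, diffs[1:])]
    return diffs


def is_roughly_constant(values, tol=1e-9):
    """True when all values lie within `tol` of one another."""
    return max(values) - min(values) <= tol


# Hypothetical series from y_x = 2 + 3x + 0.5x^2 at x = 0, 1, ..., 7.
ys = [2 + 3 * x + 0.5 * x * x for x in range(8)]
first = forward_differences(ys, 1)
second = forward_differences(ys, 2)
print(is_roughly_constant(first), is_roughly_constant(second))
```

For this series the first differences vary while the second differences are constant, which by test 2 points to a second degree parabola.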
EXERCISE 9 (c)
1. Describe the method of fitting the following curves:
(i) Y = ae^{bX}, (ii) Y = aX^b.
2. (a) Fit an equation of the form Y = ab^X to the following data:
X:  2    3      4      5      6
Y:  144  172·8  207·4  248·8  298·6
Ans. Y = (101·3) (1·196)^X
(b) Fit a curve of the type Y = ab^X to the following data:
X:  2    3     4     5     6
Y:  8·3  15·4  33·1  65·2  127·4
Estimate Y when X = 4·5, 7 and 3·5.
3. Fit a curve of the form Y = bc^X to the following data:
Year (X):                1951  1952  1953  1954  1955  1956  1957
Production in tons (Y):  201   263   314   395   427   504   612
4. In an experiment in which the growth of duck weed under certain
conditions was measured, the following results were obtained:
Weeks (X):          0   1   2   3   4    5    6    7    8
No. of fronds (Y):  20  30  52  77  135  211  326  550  1052
Assuming the relationship of the form Y = ae^{bX}, find the best values of a
and b by the method of least squares.
5. For the data given below, find the equation to the best fitting exponential
curve of the form Y = ae^{bX}.
X:  1    2    3     4     5      6
Y:  1·6  4·5  13·8  40·2  125·0  300·0
Ans. Y = (0·557) e^{1·05X}
6. Fit the curve Y = aX² + (b/X) to the following data:
X:  1      2     3     4
Y:  -1·51  0·99  3·88  7·66
7. The following table gives corresponding values of two variables
X and Y.
X:  1    2    3    4     5
Y:  1·8  5·1  8·9  14·1  19·8
It is found that they are connected by a law of the form Y = aX + bX², where a and
b are constants. Find the best values of a and b by the method of least squares.
Calculate the value of Y for X = 2.
Ans. a = 1·521; b = 0·49; 5·006
8. The following pairs of observations were noted in experimental work on
cosmic rays. Find, by the method of least squares, the best values of a and b
for the equation log R = a - bC which fits the data and estimate the most
probable value of R for C = 20·7.
C:  14    15    16    17   18
R:  24·1  20·5  14·0  7·3  5·0
9. (a) Explain the principle of least squares and describe its applications
in fitting a curve of the form Y = a exp (bX + cX²).
(b) Fit an indifference curve of the type XY = b + aX to the data given
below:
Consumption of Commodity X : 2 3 4
Consumption of Commodity Y : 3 1· 5 6 7· 5
Hint. Y = a + (b/X). Now proceed as in Example 9·6.
Ans. XY = 1·3X + 1·7
10. (a) Show that the parabola of best fit for the points
$(x_1, y_1)$; $(x_2, y_2)$; ...; $(x_{2n+1}, y_{2n+1})$
where the values of x are in A.P. with common difference unity and $\bar{x} = 0$, can be
expressed in the form
\[ \begin{vmatrix} y & x^2 & x & 1 \\ \sum y_i & \dfrac{n(n+1)(2n+1)}{3} & 0 & 2n+1 \\ \sum x_iy_i & 0 & \dfrac{n(n+1)(2n+1)}{3} & 0 \\ \sum x_i^2y_i & \dfrac{n(n+1)(2n+1)(3n^2+3n-1)}{15} & 0 & \dfrac{n(n+1)(2n+1)}{3} \end{vmatrix} = 0 \]
[Delhi Univ. B.A. (Pass), 1981]
Hint. Use (9·4b), with
$x_i = a + i$; i = 1, 2, ..., (2n + 1). Since $\bar{x} = 0$,
\[ \sum x_i = (2n+1)a + \sum i \;\Rightarrow\; 0 = (2n+1)a + \frac{(2n+1)(2n+2)}{2} \;\Rightarrow\; a = -(n+1) \]
\[ \sum x_i^2 = \sum (a + i)^2 = (2n+1)a^2 + \sum i^2 + 2a\sum i \]
and so on, for $\sum x_i^3$ and $\sum x_i^4$.
\[ \left[ \sum_{i=1}^{m} i^2 = \frac{m(m+1)(2m+1)}{6}; \quad \sum_{i=1}^{m} i^3 = \left[ \frac{m(m+1)}{2} \right]^2 \quad\text{and}\quad \sum_{i=1}^{m} i^4 = \frac{1}{30}\, m(m+1)(2m+1)(3m^2+3m-1) \right] \]
(b) When do we prefer logarithmic curve to ordinary curve?
9·5. Curve Fitting by Orthogonal Polynomials. Suppose that the
polynomial of pth degree of Y on X is
\[ Y = a_0 + a_1X + a_2X^2 + \ldots + a_pX^p \qquad \ldots(9·13) \]
The normal equations for determining the constants $a_i$'s are obtained by
the principle of least squares by minimising the residual or error sum of squares
\[ E = \sum (y - a_0 - a_1x - a_2x^2 - \ldots - a_px^p)^2 \qquad \ldots(9·14) \]
summation being extended over the given set of observations. The normal
equations are:
\[ \frac{\partial E}{\partial a_j} = 0, \quad (j = 0, 1, 2, \ldots, p) \]
i.e.,
\[ \sum x^j (y - a_0 - a_1x - a_2x^2 - \ldots - a_px^p) = 0, \quad [j = 0, 1, 2, \ldots, p] \qquad \ldots(9·15) \]
Assume that X and Y are measured from their means (and this we can do
without any loss of generality) so that
\[ \mu_r = \mu_r' = E(X^r) = \frac{1}{N} \sum x^r \]
and write,
\[ \mu_{j1} = \frac{1}{N} \sum x^j y, \]
Theory of Attributes
11·1. Introduction. Literally, an attribute means a quality or
characteristic. Theory of attributes deals with qualitative characteristics which are
not amenable to quantitative measurements and hence need slightly different
statistical treatment from that of the variables. Examples of attributes are
drinking, smoking, blindness, health, honesty, etc. An attribute may be marked
by its presence (possession) or absence (dispossession) in a member of given
population. It may be pointed out that the methods of statistical analysis
applicable to the study of variables can also be used to a great extent in the
theory of attributes and vice-versa. For example, the presence or absence of an
attribute may be regarded as changes in the values of a variable which can
possess only two values, viz., 0 and 1. For the sake of clarity and simplicity, the
theory of attributes has been developed independently.
11·2. Notations. Suppose the population is divided into two classes,
according to the presence or absence of a single attribute. The positive class,
which denotes the presence of the attribute, is generally written in capital Roman
letters such as A, B, C, D etc. and the negative class, denoting the absence of
the attribute, is written in corresponding small Greek letters such as α, β, γ, δ,
etc. For example, if A represents the attribute sickness and B represents
blindness, then α and β represent the attributes non-sickness (health) and sight
respectively. The two classes, viz., A (possession of the attribute) and α
(dispossession of the attribute) are said to be complementary classes and the
attribute α used in the sense of not-A is often called the complementary
attribute of A. Similarly β, γ, δ are the complementary attributes of B, C, D
respectively.
The combinations of attributes are denoted by grouping together the letters
concerned, e.g., AB is the combination of the attributes A and B. Thus for the
attributes A (sickness) and B (smoking), AB would mean the simultaneous
possession of sickness and smoking. Similarly Aβ will represent sickness and
non-smoking, αB non-sickness (health) and smoking, and αβ non-sickness and
non-smoking.
If a third attribute be included to represent, say, male, then ABC will stand
for sick males who are smokers. Similar interpretations can be given to ABγ,
AβC, Aβγ, etc.
11·3. Dichotomy. If the universe (population) is divided into two
sub-classes or complementary classes and no more, with respect to each of the
attributes A, B, C etc., the division or classification is said to be 'dichotomous
classification'. The classification is termed manifold if each class is further
sub-divided.
11·4. Classes and Class Frequencies. Different attributes in
themselves are called different classes and the number of observations assigned to
them are called class frequencies, which are denoted by bracketing the
class-symbols. Thus (A) stands for the frequency of A and (AB) for the number of
objects possessing the attribute AB.
Remark. Class frequencies of the type (A), (AB), (ABC) etc. are known as
positive frequencies; (α), (αβ), (αβγ) etc. are known as negative frequencies
and (αB), (AβC), (αβC) etc. are called the contrary frequencies.
11·4·1. Order of Classes and Class Frequencies. A class
represented by n attributes is called a class of nth order and the corresponding
frequency as the frequency of the nth order. Thus (A) is a class frequency of order
1; (AB), (AC), (βγ) etc. are class frequencies of second order; (ABC), (Aβγ),
(αβC) etc. are frequencies of third order and so on. N, the total number of
members of the population, without any specification of attributes, is reckoned
as a frequency of zero-order.
.Thus in a dichotolpouS cl~ification with respect to n attributes, the

number of class frequencies of order •r' is ( :. ). 21', since r attributes our of n

can be selected in (nr ) ways and each of the r attributes contributes two

symbols, one representing the positive part (e.g" A) and the other the negative'
part (e,g .• a), Thus the total number of class frequencies of all orders, for n
attributes is :
"
~ (nr ) 2' = 1 + (~ ) 2+( ~ .) 22 + ... '+ ( : ) 2~ =(1 + 2)" = 3"
... (11-1)
Remarks 1. lit particular, for.n attributes, the tom! number of class
frequencies of different orders are given as follows:

Order o 1 2 r n

No. of frequencies 1 2"

2. Since in the case of n attributes, the positive class frequency of order r

has ( n,. ) elements,. their. total number is :

3. In ease of 3 attributes A, B ar!d C, the total num~r 9f class frequencies


is 33 =27, as given below:
'11JeorY ofAttributea 11·3

Order    Frequencies
0        N
1        (A), (B), (C)
         (α), (β), (γ)
2        (AB), (Aβ), (αB), (αβ)
         (AC), (Aγ), (αC), (αγ)
         (BC), (Bγ), (βC), (βγ)
3        (ABC), (ABγ), (AβC), (Aβγ)
         (αBC), (αBγ), (αβC), (αβγ)
... (11·2)
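The count in (11·1) can be checked by direct enumeration. The following is a minimal sketch, not part of the text: the function name `class_symbols` is hypothetical, and the convention of writing a negative attribute as a lower-case letter (a for α, b for β, c for γ) is an assumption made only so that the symbols can be typed.

```python
from itertools import combinations, product

def class_symbols(attributes):
    """Return {order: [class symbols]} for the given attribute letters.

    Upper case marks the positive attribute (A); lower case marks its
    negative (a stands for alpha, b for beta, c for gamma).
    """
    by_order = {}
    for r in range(len(attributes) + 1):
        symbols = []
        # choose which r attributes appear in the symbol ...
        for chosen in combinations(attributes, r):
            # ... and let each chosen attribute be positive or negative
            for signs in product((True, False), repeat=r):
                symbol = "".join(a.upper() if s else a.lower()
                                 for a, s in zip(chosen, signs))
                symbols.append(symbol or "N")   # order 0 is N itself
        by_order[r] = symbols
    return by_order

classes = class_symbols("ABC")
counts = {order: len(symbols) for order, symbols in classes.items()}
print(counts)                 # {0: 1, 1: 6, 2: 12, 3: 8}
print(sum(counts.values()))   # 27
```

For three attributes the counts per order agree with the rows of table (11·2), and their total is 3^3 = 27.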
11·4·2. Relation Between Class Frequencies. All the class
frequencies of various orders are not independent of each other and any class
frequency can always be expressed in terms of class frequencies of higher order.
Thus
N = (A) + (α) = (B) + (β) = (C) + (γ), etc.
Also, since each of these A's or α's can either be B's or β's, we have
(A) = (AB) + (Aβ) and (α) = (αB) + (αβ)
Similarly (B) = (AB) + (αB) and (β) = (Aβ) + (αβ)
(AB) = (ABC) + (ABγ), (Aβ) = (AβC) + (Aβγ)
(αB) = (αBC) + (αBγ), (αβ) = (αβC) + (αβγ)
and so on. Thus
(A) = (AB) + (Aβ) = (ABC) + (ABγ) + (AβC) + (Aβγ)
(β) = (Aβ) + (αβ) = (AβC) + (Aβγ) + (αβC) + (αβγ), etc.
The classes of highest order are called the ultimate classes and their
frequencies, the ultimate class frequencies. Thus in case of n attributes, the
ultimate class frequencies will be the frequencies of nth order. For example, the
class frequencies (ABC), (ABγ), (AβC), (Aβγ), (αBC), (αBγ), (αβC), (αβγ)
are the ultimate frequencies for three attributes A, B and C.
Remarks 1. In case of n attributes, the ultimate class frequencies each
contain n symbols and since each symbol may be written in two ways, viz.,
positive part and negative part, e.g., A or α, B or β, etc., the total number of
ultimate class frequencies is 2^n.
2. By expressing any class frequency in terms of the class frequencies of
higher order, we can express it ultimately as the sum of some of the 2^n ultimate
class frequencies.
3. The ultimate class frequencies, taken together, specify the data
completely.
4. The set of ultimate class frequencies is not the only set which specifies
the data completely. In fact any set of class frequencies which are (i) 2^n in
number and (ii) algebraically independent of each other, will specify
the data completely. Such a set is called the Fundamental Set. For example, the
positive class frequencies form such a set. Thus for n = 2, the set of positive
class frequencies, 2^2 = 4 in number (c.f. 11·2), is N, (A), (B), (AB). If we are given these

frequencies, then it is obvious from the table that the remaining frequencies,
viz., (Aβ), (αβ), (αB), (α) and (β) can be obtained by subtraction, e.g., given:

          A       α       Total
B         (AB)    -       (B)
β         -       -       (β)
Total     (A)     (α)     N

we get
(α) = N - (A), (β) = N - (B)
(Aβ) = (A) - (AB), (αB) = (B) - (AB)
(αβ) = (α) - (αB) = N - (A) - (B) + (AB)
11·5. Class Symbols as Operators. Let us write symbolically
A.N = (A) ... (11·3)
which means that the operation of dichotomising N according to A gives the
class frequency equal to (A). Similarly, we write
α.N = (α)
Adding, we get
A.N + α.N = (A) + (α)
⇒ (A + α).N = (A) + (α)
⇒ (A + α).N = N
⇒ A + α = 1
Thus in symbolic expressions we can replace A by (1 - α) and α by (1 - A).
Similarly, B can be replaced by (1 - β) and β by (1 - B), and so on.
Dichotomising (B) according to A, let us write
A.(B) = (AB)
Similarly, B.(A) = (BA)
⇒ A.(B) = B.(A) = (AB) = AB.N,
which amounts to dichotomising N according to AB.
For example:
(αβ) = αβ.N = (1 - A)(1 - B).N = N - A.N - B.N + AB.N
= N - [(A) + (B)] + (AB)
(αβγ) = αβγ.N = (1 - A)(1 - B)(1 - C).N
= N - A.N - B.N - C.N + AB.N + AC.N + BC.N - ABC.N
= N - [(A) + (B) + (C)] + [(AB) + (AC) + (BC)] - (ABC)
(ABγ) = ABγ.N = AB(1 - C).N = AB.N - ABC.N
= (AB) - (ABC)
(αβC) = (1 - A)(1 - B)C.N = (C - AC - BC + ABC).N
= (C) - (AC) - (BC) + (ABC)
and so on.
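The operator rule "replace each negative symbol by one minus the positive symbol and multiply out" is mechanical, so it can be sketched in a few lines. The helper names `expand` and `evaluate` below are hypothetical, and passing the absent attributes as a string of capital letters is an assumption of this illustration only.

```python
from itertools import combinations

def expand(positive, negative):
    """Expand a class frequency into positive class frequencies.

    `positive` lists the attributes present (e.g. "C") and `negative` the
    attributes absent (e.g. "AB" for alpha, beta). Every absent attribute X
    is replaced by (1 - X) and the product is multiplied out, which gives a
    list of (sign, symbol) terms; the empty symbol is written "N".
    """
    terms = []
    for r in range(len(negative) + 1):
        for subset in combinations(negative, r):
            symbol = "".join(sorted(positive + "".join(subset)))
            terms.append(((-1) ** r, symbol or "N"))
    return terms

def evaluate(positive, negative, freq):
    """Numerical value of the class frequency, given the positive frequencies."""
    return sum(sign * freq[symbol] for sign, symbol in expand(positive, negative))

# (alpha beta C) = (C) - (AC) - (BC) + (ABC)
print(expand("C", "AB"))   # [(1, 'C'), (-1, 'AC'), (-1, 'BC'), (1, 'ABC')]
```

`evaluate` expects a dictionary of positive class frequencies keyed by symbol ("N", "A", "AB", ...), so any class frequency can be computed once the 2^n positive frequencies are known.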
Example 11·1. An investigation of 23,713 households was made in an
urban and rural mixed locality. Of these 1,618 were farmers, 2,015 well-to-do
and 770 families were having at least one graduate. Of these graduate families
335 were those of farmers and 428 were well-to-do; also 587 well-to-do families
were those of farmers and out of them only 156 were having at least one of their
family members as graduate. Obtain all the ultimate class frequencies.
Solution. Let the attribute 'farming' be denoted by A, the attribute 'well-
to-do' by B and 'having at least one graduate' by C. Then in the usual notations,
we are given
N = 23713, (A) = 1618, (B) = 2015, (C) = 770, (AB) = 587,
(BC) = 428, (AC) = 335 and (ABC) = 156.
For three attributes A, B, C, the number of ultimate class frequencies is
2^3 = 8, one of them being (ABC) = 156. The remaining frequencies are obtained
below:
(ABγ) = (AB) - (ABC) = 587 - 156 = 431
(AβC) = (AC) - (ABC) = 335 - 156 = 179
(Aβγ) = (A) - (AB) - (AC) + (ABC)
= 1618 - 587 - 335 + 156 = 852
(αBC) = (BC) - (ABC) = 428 - 156 = 272
(αBγ) = (B) - (AB) - (BC) + (ABC)
= 2015 - 587 - 428 + 156 = 1156
(αβC) = (C) - (AC) - (BC) + (ABC) = 770 - 335 - 428 + 156 = 163
(αβγ) = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)
= 23713 - 1618 - 2015 - 770 + 587 + 335 + 428 - 156 = 20504
Example 11·2. (a) Given the following ultimate class frequencies, find
the frequencies of the positive classes:
(ABC) = 149, (ABγ) = 738, (AβC) = 225, (Aβγ) = 1,196,
(αBC) = 204, (αBγ) = 1,762, (αβC) = 171 and (αβγ) = 21,842
(b) Find the remaining class frequencies, given the following data:
N = 23,713, (A) = 1,618, (B) = 2,015, (C) = 770,
(AB) = 587, (AC) = 428, (BC) = 335, (ABC) = 156
Solution. (a) (A) = (ABC) + (ABγ) + (AβC) + (Aβγ) = 2,308
(B) = (ABC) + (ABγ) + (αBC) + (αBγ) = 2,853
(C) = (ABC) + (AβC) + (αBC) + (αβC) = 749
(AB) = (ABC) + (ABγ) = 887
(AC) = (ABC) + (AβC) = 374
(BC) = (ABC) + (αBC) = 353
and N = [(ABC) + (ABγ) + (AβC) + (Aβγ) + (αBC) + (αBγ)
+ (αβC) + (αβγ)] = 26,287
(b) For three attributes, there are 3^3 = 27 class frequencies in all. Thus we
have to determine the remaining 19 class frequencies:
Order 1:
(α) = N - (A) = 22,095; (β) = N - (B) = 21,698; (γ) = N - (C) = 22,943
Order 2:
(Aβ) = (A) - (AB) = 1,031
(αB) = (B) - (AB) = 1,428
(αβ) = (α) - (αB) = 20,667
(Aγ) = (A) - (AC) = 1,190
(αC) = (C) - (AC) = 342
(αγ) = (α) - (αC) = 21,753
(Bγ) = (B) - (BC) = 1,680
(βC) = (C) - (BC) = 435
(βγ) = (β) - (βC) = 21,263
Order 3:
(ABγ) = (AB) - (ABC) = 431
(AβC) = (AC) - (ABC) = 272
(Aβγ) = (Aβ) - (AβC) = 759
(αBC) = (BC) - (ABC) = 179
(αBγ) = (αB) - (αBC) = 1,249
(αβC) = (βC) - (AβC) = 163
(αβγ) = (αβ) - (αβC) = 20,504
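Part (a) of this example is a pure summation: each positive class frequency is the total of the ultimate class frequencies whose symbols contain it. A minimal sketch follows; the function name `positive_from_ultimate` is hypothetical, and the lower-case letters a, b, c standing for α, β, γ are an assumed typing convention.

```python
from itertools import combinations

def positive_from_ultimate(ultimate):
    """Sum ultimate class frequencies into every positive class frequency.

    Keys are ultimate class symbols such as "ABc" (upper case = attribute
    present, lower case = attribute absent).
    """
    attributes = sorted({ch.upper() for key in ultimate for ch in key})
    positive = {"N": sum(ultimate.values())}
    for r in range(1, len(attributes) + 1):
        for chosen in combinations(attributes, r):
            # e.g. (AB) is the total of all ultimate classes containing A and B
            positive["".join(chosen)] = sum(
                f for key, f in ultimate.items() if all(a in key for a in chosen))
    return positive

# Ultimate class frequencies of Example 11·2 (a)
ultimate = {"ABC": 149, "ABc": 738, "AbC": 225, "Abc": 1196,
            "aBC": 204, "aBc": 1762, "abC": 171, "abc": 21842}
positive = positive_from_ultimate(ultimate)
print(positive["A"], positive["B"], positive["C"])   # 2308 2853 749
print(positive["N"])                                 # 26287
```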
Example 11·3. Show that for n attributes $A_1, A_2, A_3, \dots, A_n$,
$(A_1 A_2 A_3 \dots A_n) \ge (A_1) + (A_2) + (A_3) + \dots + (A_n) - (n - 1)N$ ... (11·4)
where N is the total number of observations.
Solution. We have
$(\alpha_1 \alpha_2) = \alpha_1 \alpha_2 . N = (1 - A_1)(1 - A_2) . N = N - (A_1) - (A_2) + (A_1 A_2)$
Since a class frequency is always non-negative, we have
$(\alpha_1 \alpha_2) \ge 0 \Rightarrow (A_1 A_2) \ge (A_1) + (A_2) - N$ ... (*)
It follows that (11·4) is true for 2 attributes.
Let us now suppose that (11·4) is true for r attributes $A_1, A_2, \dots, A_r$, so
that
$(A_1 A_2 A_3 \dots A_r) \ge (A_1) + (A_2) + (A_3) + \dots + (A_r) - (r - 1)N$
Replacing the attribute $A_r$ by the compound attribute $A_r A_{r+1}$, we get
$(A_1 A_2 \dots A_r A_{r+1}) \ge (A_1) + (A_2) + \dots + (A_{r-1}) + (A_r A_{r+1}) - (r - 1)N$
$\ge (A_1) + (A_2) + \dots + (A_{r-1}) + [(A_r) + (A_{r+1}) - N] - (r - 1)N$ [From (*)]
$= (A_1) + (A_2) + \dots + (A_r) + (A_{r+1}) - rN$
This implies that if (11·4) is true for n = r, it is also true for n = r + 1
attributes. But we have seen in (*) that (11·4) is true for n = 2. Hence by
mathematical induction, the result is true for all positive integral values of n.
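Example 11·4. Show that if A occurs in a larger proportion of the cases
where B is than where B is not, then B will occur in a larger proportion of cases
where A is than where A is not.
Solution. The problem can be restated as follows:
Given $\frac{(AB)}{(B)} > \frac{(A\beta)}{(\beta)}$, prove that $\frac{(AB)}{(A)} > \frac{(\alpha B)}{(\alpha)}$.
Now
$\frac{(AB)}{(B)} > \frac{(A\beta)}{(\beta)} \Rightarrow \frac{(\beta)}{(B)} > \frac{(A\beta)}{(AB)}$
$\Rightarrow 1 + \frac{(\beta)}{(B)} > 1 + \frac{(A\beta)}{(AB)}$
$\Rightarrow \frac{N}{(B)} > \frac{(A)}{(AB)}$
$\Rightarrow \frac{N}{(A)} > \frac{(B)}{(AB)}$
$\Rightarrow \frac{(A) + (\alpha)}{(A)} > \frac{(AB) + (\alpha B)}{(AB)}$
$\Rightarrow 1 + \frac{(\alpha)}{(A)} > 1 + \frac{(\alpha B)}{(AB)}$
$\Rightarrow \frac{(\alpha)}{(A)} > \frac{(\alpha B)}{(AB)}$
$\Rightarrow \frac{(AB)}{(A)} > \frac{(\alpha B)}{(\alpha)}$, as required.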

EXERCISE 11 (a)
1. (a) Explain the following: (i) Order of a class, (ii) Ultimate classes and
(iii) Fundamental set of class frequencies.
(b) What is meant by a class frequency of (i) first order, (ii) third order?
How would you express a class frequency of first order in terms of class
frequencies of third order?
2. What is dichotomy? Show that the continued dichotomy according to n
attributes gives rise to 3^n classes.
3. (a) Given that (AB) = 150, (Aβ) = 230, (αB) = 260, (αβ) = 2,340; find
the other frequencies and the value of N.
(b) Given the following frequencies of the positive classes, find the
frequencies of the rest of the classes:
(A) = 977, (AB) = 453, (ABC) = 127, (B) = 1,185, (AC) = 284,
N = 12,000, (C) = 596, and (BC) = 250.
Ans. (Aβ) = 524, (αB) = 732, (αβ) = 10,291, (Bγ) = 935, (βC) = 346,
(βγ) = 10,469, (Aγ) = 693, (αC) = 312, (ABγ) = 326, (αBC) = 123,
(αBγ) = 609, (AβC) = 157, (Aβγ) = 367, (αβC) = 189, (αβγ) = 10,102.
4. Given the following data, find frequencies of (i) the remaining positive
classes; and (ii) the ultimate classes:
N = 1,800, (A) = 850, (B) = 780, (C) = 326, (ABγ) = 200, (AβC) = 94,
(αBC) = 72, and (ABC) = 50.
5. (a) Measurements are made on a thousand husbands and a thousand
wives. If the measurements of the husbands exceed the measurements of the
wives in 800 cases for one measurement, in 700 cases for another and in 660
cases for both measurements, in how many cases will both measurements on the
wife exceed the measurements on the husband?
Ans. 160
(b) An unofficial political study was made about the recent changes in the
Indian political scene and it was found that 919 Indira Gandhi Congress
supporters and 1,268 Organisation Congress supporters wanted socialistic
economy, whereas 310 Indira Gandhi Congress supporters and 503 supporters of
the Organisation Congress wanted capitalistic economy in the country. Find out
the total number of Indira Gandhi's and that of the Organisation's supporters,
giving the number of capitalistic economy's and of the socialistic economy's
votaries, out of the individuals who were surveyed.
6. At a competitive examination at which 600 graduates appeared, boys
outnumbered girls by 96. Those qualifying for interview exceeded in number
those failing to qualify by 310. The number of Science graduate boys
interviewed was 300 while among the Arts graduate girls there were 25 who
failed to qualify for interview. Altogether there were only 135 Arts graduates and
33 among them failed to qualify. Boys who failed to qualify numbered 18.
Find (i) the number of boys who qualified for interview,
(ii) the total number of Science graduate boys appearing, and
(iii) the number of Science graduate girls who qualified.
Ans. (i) 330, (ii) 310, and (iii) 53.
7. 100 children took three examinations A, B and C; 40 passed the first, 39
passed the second and 48 passed the third, 10 passed all the three, 21 failed all
three, 9 passed the first two and failed the third, 19 failed the first two and passed
the third. Find how many children passed at least two examinations. Show that
for the question asked certain of the given frequencies are not necessary. Which
are they?
Ans. 38. Only frequencies required are (C), (αβC), (ABγ).
8. In a university examination, which was indeed very tough, 50% at least
failed in "Statistics", 75% at least in "Topology", 82% at least in "Functional
Analysis" and 96% at least in "Applied Mathematics". How many at least failed
in all the four? (Ans. 3%)
Hint. Use the result in Example 11·3, Page 11·6.
9. If a collection contains N items, each of which is characterized by one
or more of the attributes A, B, C and D, show that with the usual notations
(i) (ABCD) ≥ (A) + (B) + (C) + (D) - 3N, and
(ii) (ABCD) = (ABD) + (ACD) - (AD) + (ADβγ),
where β and γ represent the characteristics of the absence of B and C
respectively.
10. Given (A) = (α) = (B) = (β) = ½N; show that (AB) = (αβ), (Aβ) = (αB).
11. Given that (A) = (α) = (B) = (β) = (C) = (γ) = ½N
and also that (ABC) = (αβγ), show that 2(ABC) = (AB) + (AC) + (BC) - ½N.
11·6. Consistency of Data. Any class frequencies which have been
or might have been observed within one and the same population are said to be
consistent if they conform with one another and do not in any way conflict. For
example, the figures (A) = 20, (AB) = 25 are inconsistent as (AB) cannot be
greater than (A), if they are observed from the same population.
'Consistency' of a set of class frequencies may be defined as the property
that none of them is negative; otherwise, the data for class frequencies are said to
be 'inconsistent'.
Since any class frequency can be expressed as the sum of some of the
ultimate class frequencies, it is necessarily non-negative if all the ultimate class
frequencies are non-negative. This provides a criterion for testing the consistency
of the data. In fact, we have the following theorem.
Theorem 11·1. "The necessary and sufficient condition for the
consistency of a set of independent class frequencies is that no ultimate class
frequency is negative."
Remark. We can test the consistency of a set of 2^n algebraically
independent class frequencies by calculating the ultimate class frequencies. If any
one of them is negative, the given data are inconsistent.
11·6·1. Conditions for Consistency of Data. Criteria for
consistency of class frequencies are obtained by using Theorem 11·1. For a single
attribute A we have conditions of consistency as follows:
(i) (A) ≥ 0
(ii) (α) ≥ 0 ⇒ (A) ≤ N ... (11·5)
For two attributes A and B, the conditions of consistency are:
(i) (AB) ≥ 0
(ii) (Aβ) ≥ 0 ⇒ (AB) ≤ (A)
(iii) (αB) ≥ 0 ⇒ (AB) ≤ (B)
(iv) (αβ) ≥ 0 ⇒ (AB) ≥ (A) + (B) - N ... (11·6)
Conditions of consistency for three attributes A, B and C are:
(i) (ABC) ≥ 0
(ii) (ABγ) ≥ 0 ⇒ (ABC) ≤ (AB)
(iii) (AβC) ≥ 0 ⇒ (ABC) ≤ (AC)
(iv) (αBC) ≥ 0 ⇒ (ABC) ≤ (BC)
(v) (Aβγ) ≥ 0 ⇒ (ABC) ≥ (AB) + (AC) - (A)
(vi) (αBγ) ≥ 0 ⇒ (ABC) ≥ (AB) + (BC) - (B)
(vii) (αβC) ≥ 0 ⇒ (ABC) ≥ (AC) + (BC) - (C)
(viii) (αβγ) ≥ 0 ⇒ (ABC) ≤ (AB) + (BC) + (AC) - (A) - (B) - (C) + N
... (11·7)
(i) and (viii) in (11·7) give:
(AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N
Similarly
(ii) and (vii) ⇒ (AC) + (BC) - (AB) ≤ (C)
(iii) and (vi) ⇒ (AB) + (BC) - (AC) ≤ (B)
(iv) and (v) ⇒ (AB) + (AC) - (BC) ≤ (A) ... (11·8)
Remark. As already pointed out [c.f. Remarks (3) and (4), § 11·4·2], 2^n
algebraically independent class frequencies are necessary to specify the data
completely, one such set being the set of ultimate class frequencies and the other
being the set of positive class frequencies. If the data supplied are incomplete so
that it is not possible to determine all the class frequencies, then the conditions
(11·5), (11·6) and (11·8) for one, two and three attributes respectively, enable us
to assign the limits within which an unknown class frequency can lie.
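The test described in Theorem 11·1 (compute every ultimate class frequency and look for a negative one) can be sketched as follows. The names `ultimate_frequencies` and `is_consistent` are hypothetical, the positive frequencies are assumed to be supplied as a dictionary keyed by symbol ("N", "A", "AB", ...), and lower-case letters again stand for the negative attributes.

```python
from itertools import combinations, product

def ultimate_frequencies(positive, attributes):
    """Compute all 2^n ultimate class frequencies from the positive ones."""
    result = {}
    for signs in product((True, False), repeat=len(attributes)):
        present = [a for a, s in zip(attributes, signs) if s]
        absent = [a for a, s in zip(attributes, signs) if not s]
        total = 0
        # multiply out the (1 - X) factor of every absent attribute X
        for r in range(len(absent) + 1):
            for subset in combinations(absent, r):
                symbol = "".join(sorted(present + list(subset))) or "N"
                total += (-1) ** r * positive[symbol]
        label = "".join(a if s else a.lower() for a, s in zip(attributes, signs))
        result[label] = total
    return result

def is_consistent(positive, attributes):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(f >= 0 for f in ultimate_frequencies(positive, attributes).values())

# Positive class frequencies of Example 11·1
data = {"N": 23713, "A": 1618, "B": 2015, "C": 770,
        "AB": 587, "AC": 335, "BC": 428, "ABC": 156}
print(is_consistent(data, "ABC"))                   # True
print(ultimate_frequencies(data, "ABC")["abc"])     # 20504
```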
Example 11·5. Examine the consistency of the following data:
N = 1,000, (A) = 600, (B) = 500, (AB) = 50, the symbols having their
usual meaning.
Solution. We have
(αβ) = N - (A) - (B) + (AB) = 1000 - 600 - 500 + 50 = -50.
Since (αβ) < 0, the data are inconsistent.
Example 11·6. Among the adult population of a certain town 50 per
cent are males, 60 per cent are wage-earners and 50 per cent are 45 years of age
or over, 10 per cent of the males are not wage-earners and 40 per cent of the
males are under 45. Make the best possible inference about the limits within
which the percentage of persons (male or female) of 45 years or over who are
wage-earners must lie.
Solution. Let N = 100. Then denoting males by A, wage-earners by B and
45 years of age or over by C, we are given:
N = 100, (A) = 50, (B) = 60, (C) = 50
$(A\beta) = \frac{10}{100} \times 50 = 5$, $(A\gamma) = \frac{40}{100} \times 50 = 20$
∴ (AB) = (A) - (Aβ) = 45, (AC) = (A) - (Aγ) = 30
We are required to find the limits for (BC).
Conditions of consistency (11·8) give
(i) (AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N
⇒ (BC) ≥ 50 + 60 + 50 - 100 - 45 - 30 = -15
(ii) (AB) + (AC) - (BC) ≤ (A)
⇒ (BC) ≥ (AB) + (AC) - (A) = 45 + 30 - 50 = 25
(iii) (AB) + (BC) - (AC) ≤ (B)
⇒ (BC) ≤ (B) + (AC) - (AB) = 60 + 30 - 45 = 45
(iv) (AC) + (BC) - (AB) ≤ (C)
⇒ (BC) ≤ (C) + (AB) - (AC) = 50 + 45 - 30 = 65
(i) to (iv) ⇒ 25 ≤ (BC) ≤ 45
Hence the percentage of wage-earning population of 45 years or over must
lie between 25 and 45.
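The four inequalities used in this solution can be collected into one small routine. This is a sketch under stated assumptions: the function name `bc_limits` is hypothetical, and the extra bounds 0 ≤ (BC) ≤ min{(B), (C)} are added here from the two-attribute conditions (11·6); they are not part of (11·8) itself.

```python
def bc_limits(N, A, B, C, AB, AC):
    """Limits for (BC) from conditions (11·8), given the other positive frequencies."""
    lower = max(
        A + B + C - N - AB - AC,   # from (i)
        AB + AC - A,               # from (ii)
        0,                         # assumption: (BC) >= 0, as in (11·6)
    )
    upper = min(
        B + AC - AB,               # from (iii)
        C + AB - AC,               # from (iv)
        B, C,                      # assumption: (BC) <= (B) and (BC) <= (C)
    )
    return lower, upper

# Data of Example 11·6
print(bc_limits(N=100, A=50, B=60, C=50, AB=45, AC=30))   # (25, 45)
```

If the returned lower limit exceeds the upper limit, no value of (BC) satisfies the conditions and the given frequencies are themselves inconsistent.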
Example 11·7. In a series of houses actually invaded by smallpox, 70%
of the inhabitants are attacked and 85% have been vaccinated. What is the lowest
percentage of the vaccinated that must have been attacked?
Solution. Let A and B denote the attributes of the inhabitants being
attacked and vaccinated respectively. Then we are given:
N = 100, (A) = 70 and (B) = 85
Consistency condition gives:
(AB) ≥ (A) + (B) - N ⇒ (AB) ≥ 55
Hence the lowest percentage of the vaccinated inhabitants who have been
attacked is
$\frac{(AB)}{(B)} \times 100 = \frac{55}{85} \times 100$ = 64·7%

Example 11·8. Show that if
$\frac{(A)}{N} = x$, $\frac{(B)}{N} = 2x$, $\frac{(C)}{N} = 3x$
and $\frac{(AB)}{N} = \frac{(BC)}{N} = \frac{(CA)}{N} = y$,
then the value of neither x nor y can exceed 1/4.
Solution. Conditions of consistency give:
(AB) ≤ (A) ⇒ Ny ≤ Nx ⇒ y ≤ x ... (i)
Also (BC) ≥ (B) + (C) - N
$\Rightarrow \frac{(BC)}{N} \ge \frac{(B)}{N} + \frac{(C)}{N} - 1$
⇒ y ≥ 2x + 3x - 1
⇒ 5x - 1 ≤ y ... (ii)
(i) and (ii) give
5x - 1 ≤ x ⇒ 4x ≤ 1 ⇒ x ≤ 1/4 ... (iii)
Thus from (i) and (iii) we have y ≤ x ≤ 1/4, which establishes the result.
Example 11·9. Show that (i) if all A's are B's and all B's are C's then
all A's are C's, (ii) if all A's are B's and no B's are C's then no A's are C's.
Solution. (i) All A's are B's ⇒ (AB) = (A)
and all B's are C's ⇒ (BC) = (B) ... (*)
To prove (AC) = (A).
We have (AB) + (BC) - (AC) ≤ (B)
⇒ (A) + (B) - (AC) ≤ (B) [Using (*)]
⇒ (A) ≤ (AC) ⇒ (AC) ≥ (A)
But since (AC) ≤ (A), we have (AC) = (A), as desired.
(ii) We are given (AB) = (A) and (BC) = 0 and we want to prove (AC) = 0.
We have
(AB) + (AC) - (BC) ≤ (A)
⇒ (A) + (AC) - 0 ≤ (A)
⇒ (AC) ≤ 0
And since (AC) ≥ 0, we must have (AC) = 0.

EXERCISE 11(b)
1. What do you understand by consistency of given data? How do you
check it?
2. (a) If a report gives the following frequencies as actually observed, show
that there must be a misprint or mistake of some sort, and that possibly the
misprint consists in the dropping of 1 before 85 given as the frequency (BC):
N = 1000, (A) = 510, (B) = 490, (C) = 427, (AB) = 189, (AC) = 140, (BC) = 85.
(b) A student reported the results of a survey in the following manner, in
terms of the usual notations:
N = 1000, (A) = 525, (B) = 312, (C) = 470, (AB) = 42, (BC) = 86,
(AC) = 147, and (ABC) = 25.
Examine the consistency of the above data.
(c) Examine the consistency and adequacy of the following data to determine
the frequencies of the remaining positive and ultimate classes.
N = 10,000, (A) = 1087, (B) = 486, (C) = 877,
(CAβ) = 281, (Cαβ) = 86, (γAB) = 78, (ABC) = 57
3. Given that (A) = (B) = (C) = ½N and 80 per cent of A's are B's, 75 per
cent of A's are C's, find the limits to the percentage of B's that are C's.
Ans. 55% and 95%.
4. If (A) = 50, (B) = 60, (C) = 50, (Aβ) = 5, (Aγ) = 20, N = 100, find the
greatest and the least possible values of (BC) so that the data may be consistent.
Ans. 25 ≤ (BC) ≤ 45
5. If 1,000 = N = (5/3)(A) = 2(B) = 2·5(C) = 5(AB), and (AC) = (BC), what
should be the minimum value of (BC)?
Ans. 150
6. Given that (A) = (B) = (C) = ½N = 50 and (AB) = 30, (AC) = 25, find
the limits within which (BC) will lie.
7. In a university examination 65% of the candidates passed in English,
90% passed in the second language and 60% passed in the optional subjects.
Find how many at least should have passed the whole examination.
Ans. 15%. Hint. Use Example 11·3.
8. A market investigator returns the following data. Of 1,000 people
consulted 811 liked chocolates, 752 liked toffees and 418 liked boiled sweets,
570 liked both chocolates and toffees, 356 liked chocolates and boiled sweets and
348 liked toffees and boiled sweets, 297 liked all three. Show that this
information as it stands must be incorrect.
9. (a) In a school, 50 per cent of the students are boys, 60 per cent are
Hindus and 50 per cent are 10 years of age or over. Twenty per cent of the boys
are not Hindus and 40 per cent of the boys are under 10. What conclusions can
you draw in regard to percentage of Hindu students of 10 years or over?
(b) In a college, 50 per cent of the students are boys, 60 per cent of the
students are above 18 years and 80 per cent receive scholarships. 35 per cent of
the students are boys above 18 years of age, 45 per cent are boys receiving
scholarships, and 42 per cent are above 18 years and receive scholarships.
Determine the limits to the proportion of boys above 18 years who are in receipt
of scholarships.
Ans. Between 30 and 32.
10. The following summary appears in a report on a survey covering 1,000
fields. Scrutinise the numbers and point out if there is any mistake or misprint
in them.

Manured fields                                          510
Irrigated fields                                        490
Fields growing improved varieties                       427
Fields both irrigated and manured                       189
Fields both manured and growing improved varieties      140
Fields both irrigated and growing improved varieties     85
Hint. Let A: manured fields;
B: irrigated fields
and C: growing improved varieties; then (αβγ) < 0.
11. A social survey in a village revealed that there were more uneducated
employed males than educated ones; there were more educated employed males
than uneducated unemployed males. There were more educated unemployed under
35 years of age than employed uneducated males over 35 years of age. Show that
there are more uneducated employed males under 35 years of age than educated
unemployed males over 35 years of age.
12. In a war between White and Red forces, there are more Red soldiers than
White, there are more armed Whites than unarmed Reds, there are fewer armed
Reds with ammunition than unarmed Whites without ammunition. Show that
there are more armed Reds without ammunition than unarmed Whites with
ammunition.
13. Given that (A) = (B) = (C) = ½N; (AB)/N = (AC)/N = p, find what must be
the greatest and least values of p in order that we may infer that (BC)/N exceeds
any given value, say q.
Ans. $\frac{1}{4}(1 - 2q) \le p \le \frac{1}{4}(1 + 2q)$.
11·7. Independence of Attributes. Two attributes A and B are said
to be independent if there exists no relationship of any kind between them. If A
and B are independent, we would expect (i) the same proportion of A's amongst
B's as amongst β's, (ii) the proportion of B's amongst A's is same as that
amongst the α's. For example, if insanity and deafness are independent, the
proportion of the insane people among deafs and non-deafs must be same.
11·7·1. Criterion of Independence. If A and B are independent, then
(i) in § 11·7 gives
$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)}$ ... (11·9)
$\Rightarrow 1 - \frac{(AB)}{(B)} = 1 - \frac{(A\beta)}{(\beta)}$
$\Rightarrow \frac{(\alpha B)}{(B)} = \frac{(\alpha\beta)}{(\beta)}$ ... (11·9a)
Similarly, (ii) in § 11·7 gives
$\frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}$ ... (11·10)
$\Rightarrow 1 - \frac{(AB)}{(A)} = 1 - \frac{(\alpha B)}{(\alpha)}$
$\Rightarrow \frac{(A\beta)}{(A)} = \frac{(\alpha\beta)}{(\alpha)}$ ... (11·10a)
In fact (11·9) ⇒ (11·10) and vice-versa.
For example, (11·9) gives
$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)} = \frac{(AB) + (A\beta)}{(B) + (\beta)} = \frac{(A)}{N}$
$\Rightarrow \frac{(AB)}{(A)} = \frac{(B)}{N} = \frac{(B) - (AB)}{N - (A)} = \frac{(\alpha B)}{(\alpha)}$ ... (11·10b)
which is (11·10). Similarly, starting from (11·10), we would arrive at (11·9).
It becomes easier to grasp the nature of the above relations if the frequencies
are supposed to be grouped into a table with two rows and two columns as
follows:

Attributes    A       α       Total
B             (AB)    (αB)    (B)
β             (Aβ)    (αβ)    (β)
Total         (A)     (α)     N
A second criterion of independence may be obtained in terms of the class
frequencies of first order. (11·10b) gives
$(AB) = \frac{(A)(B)}{N}$ ... (11·11)
$\Rightarrow \frac{(AB)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N}$ ... (11·11a)
which leads to the following important fundamental rule:
"If the attributes A and B are independent, the proportion of AB's in the
population is equal to the product of the proportions of A's and B's in the
population."
We may obtain a third criterion of independence in terms of second order
class frequencies, as follows:
$(AB)(\alpha\beta) = \frac{(A)(B)}{N} \cdot \frac{(\alpha)(\beta)}{N}$ [Using (11·11)]
$= \frac{(A)(\beta)}{N} \cdot \frac{(\alpha)(B)}{N} = (A\beta)(\alpha B)$
$\Rightarrow \frac{(AB)}{(\alpha B)} = \frac{(A\beta)}{(\alpha\beta)}$ ... (11·12)
Aliter. (11·12) may also be obtained from (11·9) and (11·9a) as explained
below. Dividing (11·9) by (11·9a), we get
$\frac{(AB)}{(\alpha B)} = \frac{(A\beta)}{(\alpha\beta)}$
⇒ (AB)(αβ) = (Aβ)(αB)
Similarly, (11·10) and (11·10a) give the same result.
11·7·2. Symbols (AB)₀ and δ. Let us write
$(AB)_0 = \frac{(A)(B)}{N}$ ... (11·13)
which is the value of (AB) under the hypothesis that the attributes A and B are
independent.
Let
$\delta = (AB) - (AB)_0$ ... (11·14)
denote the excess of (AB) over (AB)₀. Then
$\delta = (AB) - \frac{(A)(B)}{N} = \frac{1}{N}[N(AB) - (A)(B)]$
$= \frac{1}{N}[\{(AB) + (A\beta) + (\alpha B) + (\alpha\beta)\}(AB) - \{(AB) + (A\beta)\}\{(AB) + (\alpha B)\}]$
$= \frac{1}{N}[(AB)(\alpha\beta) - (A\beta)(\alpha B)]$ [On simplification]
⇒ δ = 0, if A and B are independent [c.f. (11·12)] ... (11·15)
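The two expressions for δ, the definition (11·14) and the cross-product form (11·15), can be compared numerically. The sketch below uses exact fractions so that the comparison is not affected by rounding; the function name `delta` and the figures in the usage lines are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def delta(AB, Ab, aB, ab):
    """Return delta = (AB) - (A)(B)/N from the four second-order frequencies.

    Ab, aB and ab stand for (A beta), (alpha B) and (alpha beta).
    """
    N = AB + Ab + aB + ab
    A, B = AB + Ab, AB + aB
    by_definition = AB - Fraction(A * B, N)             # (11·14)
    by_cross_product = Fraction(AB * ab - Ab * aB, N)   # (11·15)
    assert by_definition == by_cross_product
    return by_definition

# Hypothetical frequencies, not taken from the text
print(delta(30, 20, 10, 40))   # 10
```

A positive value means (AB) exceeds its value (AB)₀ under independence, a negative value means it falls short, and zero corresponds to independence.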
Example 11·10. If δ = (AB) - (AB)₀, then with usual notations, prove
that
(i) [(A) - (α)][(B) - (β)] + 2Nδ = (AB)² + (αβ)² - (Aβ)² - (αB)²
(ii) $\delta = \frac{(B)(\beta)}{N} \left\{ \frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)} \right\} = \frac{(A)(\alpha)}{N} \left\{ \frac{(AB)}{(A)} - \frac{(\alpha B)}{(\alpha)} \right\}$
Solution. (i) We have $\delta = (AB) - (AB)_0 = (AB) - \frac{(A)(B)}{N}$
L.H.S. = [(A) - (α)][(B) - (β)] + 2Nδ
= [(AB) + (Aβ) - (αB) - (αβ)][(AB) + (αB) - (Aβ) - (αβ)]
$+ 2N \left[ (AB) - \frac{(A)(B)}{N} \right]$
= [{(AB) - (αβ)} + {(Aβ) - (αB)}][{(AB) - (αβ)} - {(Aβ) - (αB)}]
+ 2[N(AB) - (A)(B)]
= [(AB) - (αβ)]² - [(Aβ) - (αB)]²
+ 2[(AB){(AB) + (Aβ) + (αB) + (αβ)} - {(AB) + (Aβ)}{(AB) + (αB)}]
= [(AB)² + (αβ)² - 2(AB)(αβ)]
- [(Aβ)² + (αB)² - 2(Aβ)(αB)] + 2[(AB)(αβ) - (Aβ)(αB)]
(On simplification)
= (AB)² + (αβ)² - (Aβ)² - (αB)² = R.H.S.
(ii) $\frac{(B)(\beta)}{N} \left[ \frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)} \right] = \frac{1}{N} \left[ (\beta)(AB) - (B)(A\beta) \right]$
$= \frac{1}{N} \left[ (AB)\{N - (B)\} - (B)\{(A) - (AB)\} \right]$
$= \frac{1}{N} \left[ N(AB) - (A)(B) \right] = (AB) - \frac{(A)(B)}{N} = \delta$
Since δ is symmetric in A and B, by interchanging A and B, we will
obtain the second result.
11·8. Association of Attributes. Two attributes A and B are said
to be associated if they are not independent but are related in some way or the
other. They are said to be
positively associated if $(AB) > \frac{(A)(B)}{N}$
and negatively associated if $(AB) < \frac{(A)(B)}{N}$ ... (11·16)
In other words, two attributes A and B are positively associated if δ > 0,
negatively associated if δ < 0 and are independent if δ = 0 (c.f. § 11·7·2).
Remarks 1. Two attributes A and B are said to be completely associated
if A cannot occur without B, though B may occur without A and vice-versa. In
other words, for complete association either all A's are B's, i.e., (AB) = (A) or all
B's are A's, i.e., (AB) = (B), according as either A's or B's are in a minority.
Similarly, complete dissociation means that no A's are B's, i.e., (AB) = 0 or no
α's are β's, i.e., (αβ) = 0 or, more generally, when either of these statements is
true.
2. It should be carefully noted that the word 'association' used in Statistics
is technically different from the general notion of association as used in day-to-
day life. Ordinarily, two attributes are said to be associated if they occur together
in a number of cases. But statistically two attributes are said to be associated if
they occur together in a larger number of cases than expected if they were
independent, i.e., if δ = (AB) - (A)(B)/N > 0. In Statistics, the statement that
"some A's are B's", however great the proportion, does not necessarily imply
association between them. Thus to find out if two attributes are associated, we
must know (A), (B), (AB) and N. Incomplete information will not enable us to
conclude anything about association between them. For example, consider the
following statement:
"90 per cent of the people who drink alcohol die before reaching the age of
75 years. Hence drinking is bad for longevity of life."
The inference drawn is not correct, since the given information is not
complete for drawing any valid conclusions about association. It might happen
that 95% of the people who do not drink die before reaching 75 years of age. In
that case drinking might be found good for longevity of life.
3. Sampling fluctuations. If δ ≠ 0 and its value is fairly small, then it is
possible that this association is just by chance (commonly termed as due to
fluctuations of sampling) and not really significant of any real association
between the attributes. We should not, therefore, draw hasty conclusions about
association or dissociation unless δ, the difference between (AB) and its expected
value (under the hypothesis of independence) (A)(B)/N, is significant. The
problem 'how much difference is to be regarded as significant' will be discussed
in detail in Chapters 12 (Large sample test for attributes) and 13 (Chi-square test
of goodness of fit). This point has been raised here only as a precautionary
measure to warn the reader against drawing hasty inferences.
11·8·1. Yule's Coefficient of Association. As a measure of the
intensity of association between two attributes A and B, G. Udny Yule gave the
coefficient of association Q, defined as follows:
$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{N\delta}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$ ... (11·17)
If A and B are independent, δ = 0 ⇒ Q = 0.
If A and B are completely associated, then
either (AB) = (A) ⇒ (Aβ) = 0
or (AB) = (B) ⇒ (αB) = 0
and in each case Q = +1.
If A and B are in complete dissociation then either (AB) = 0 or (αβ) = 0 and
we get Q = -1.
Hence -1 ≤ Q ≤ 1 ... (11·18)
Remark. An important property of Q is that it is independent of the
relative proportion of A's or α's in the data. Thus if all the terms containing A
in Q are multiplied by a constant, k (say), its value remains unaltered. Similarly
for B, β and α. This property renders it specially useful in situations where the
proportions are arbitrary, e.g., experiments.
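The invariance property stated in the Remark is easy to see numerically. In the sketch below the function name `yule_q` is hypothetical and the four frequencies are invented purely for illustration; exact fractions are used so that the two values of Q can be compared for strict equality.

```python
from fractions import Fraction

def yule_q(AB, Ab, aB, ab):
    """Yule's coefficient of association, as in (11·17).

    Ab, aB and ab stand for (A beta), (alpha B) and (alpha beta).
    """
    return Fraction(AB * ab - Ab * aB, AB * ab + Ab * aB)

# Hypothetical 2 x 2 frequencies, chosen only for illustration
AB, Ab, aB, ab = 30, 20, 10, 40
q = yule_q(AB, Ab, aB, ab)

# Multiply every frequency that contains A by a constant k: Q is unchanged
k = 3
q_scaled = yule_q(k * AB, k * Ab, aB, ab)
print(q, q_scaled)   # 5/7 5/7
```

The limiting values follow in the same way: putting (Aβ) = 0 or (αB) = 0 gives Q = 1, and putting (AB) = 0 or (αβ) = 0 gives Q = -1.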
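11·8·2. Coefficient of Colligation. Another coefficient with the
same properties as Q is the coefficient of colligation Y, given by
$Y = \left\{ 1 - \sqrt{\frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}} \right\} \Big/ \left\{ 1 + \sqrt{\frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}} \right\}$ ... (11·19)
Remarks 1. Obviously Q = 0 ⇒ $Y = \frac{1 - 1}{1 + 1} = 0$,
Q = -1 ⇒ Y = -1 and Q = 1 ⇒ Y = 1 and conversely.
2. If we let $k = \frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}$, so that
$Y = \frac{1 - \sqrt{k}}{1 + \sqrt{k}} \Rightarrow Y^2 = \frac{1 + k - 2\sqrt{k}}{1 + k + 2\sqrt{k}}$
$1 + Y^2 = \frac{2(1 + k)}{1 + k + 2\sqrt{k}} = \frac{2(1 + k)}{(1 + \sqrt{k})^2}$
$\frac{2Y}{1 + Y^2} = \frac{2(1 - \sqrt{k})(1 + \sqrt{k})}{2(1 + k)} = \frac{1 - k}{1 + k}$
$= \frac{1 - \frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}{1 + \frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}} = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
$\Rightarrow Q = \frac{2Y}{1 + Y^2}$ ... (11·20)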
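Relation (11·20) can be checked numerically for any table in which (AB)(αβ) is not zero. This is a minimal sketch: the function names `yule_q` and `colligation_y` are hypothetical, and the frequencies are invented for illustration only.

```python
import math

def yule_q(AB, Ab, aB, ab):
    """Yule's coefficient of association (11·17)."""
    return (AB * ab - Ab * aB) / (AB * ab + Ab * aB)

def colligation_y(AB, Ab, aB, ab):
    """Coefficient of colligation (11·19); requires (AB)(alpha beta) > 0."""
    root_k = math.sqrt((Ab * aB) / (AB * ab))
    return (1 - root_k) / (1 + root_k)

# Hypothetical 2 x 2 frequencies, chosen only for illustration
q = yule_q(30, 20, 10, 40)
y = colligation_y(30, 20, 10, 40)
print(math.isclose(q, 2 * y / (1 + y * y)))   # True
```

Floating-point arithmetic is used here because Y involves a square root, so the comparison is made with `math.isclose` rather than strict equality.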
Example 11·11. Find if A and B are independent, positively associated
or negatively associated, in each of the following cases:
(i) N = 1000, (A) = 470, (B) = 620, and (AB) = 320.
(ii) (A) = 490, (AB) = 294, (α) = 570, and (αB) = 380.
(iii) (AB) = 256, (αB) = 768, (Aβ) = 48, and (αβ) = 144.
Solution. (i) $\delta = (AB) - \frac{(A)(B)}{N} = 320 - \frac{470 \times 620}{1000}$ = 320 - 291·4 = 28·6
Since δ > 0, A and B are positively associated.
(ii) We have N = (A) + (α) = 490 + 570 = 1060
(B) = (AB) + (αB) = 294 + 380 = 674
∴ $\delta = (AB) - \frac{(A)(B)}{N} = 294 - \frac{490 \times 674}{1060}$ = 294 - 311·6 < 0
Hence A and B are negatively associated.
(iii) (A) = (AB) + (Aβ) = 256 + 48 = 304
(B) = (AB) + (αB) = 256 + 768 = 1024
N = (AB) + (Aβ) + (αB) + (αβ) = 256 + 48 + 768 + 144 = 1216
∴ $\delta = (AB) - \frac{(A)(B)}{N} = 256 - \frac{304 \times 1024}{1216}$ = 256 - 256 = 0
Hence, A and B are independent.
Aliter. Since all the four frequencies of order 2 are given, using (11·15),
we have
$\delta = \frac{1}{N}[(AB)(\alpha\beta) - (A\beta)(\alpha B)] = \frac{1}{N}[256 \times 144 - 48 \times 768]$
$= \frac{256}{N}[144 - 48 \times 3] = 0$
⇒ A and B are independent.
Example 11·12. Investigate the association between darkness of eye-
colour in father and son from the following data:
Fathers with dark eyes and sons with dark eyes: 50
Fathers with dark eyes and sons with not dark eyes: 79
Fathers with not dark eyes and sons with dark eyes: 89
Fathers with not dark eyes and sons with not dark eyes: 782
Also tabulate for comparison the frequencies that would have been observed
had there been no heredity.
Solution. Let A: Dark eye-colour of father and
B: Dark eye-colour of son.
Then we are given (AB) = 50, (Aβ) = 79, (αB) = 89, (αβ) = 782
$Q = \frac{50 \times 782 - 79 \times 89}{50 \times 782 + 79 \times 89} = \frac{32069}{46131}$ = +0·69
Hence there is a fairly high degree of positive association between the eye
colour of fathers and sons.
We have (A) = (AB) + (Aβ) = 50 + 79 = 129
(B) = (AB) + (αB) = 50 + 89 = 139
(α) = (αB) + (αβ) = 89 + 782 = 871
(β) = (Aβ) + (αβ) = 79 + 782 = 861
N = (A) + (α) = 129 + 871 = 1000
Under the condition of no heredity, i.e., independence of attributes A and B,
we have
$(AB)_0 = \frac{(A)(B)}{N} = \frac{129 \times 139}{1000} = 18$, $(A\beta)_0 = \frac{(A)(\beta)}{N} = \frac{129 \times 861}{1000} = 111$
$(\alpha B)_0 = \frac{(\alpha)(B)}{N} = \frac{871 \times 139}{1000} = 121$, $(\alpha\beta)_0 = \frac{(\alpha)(\beta)}{N} = \frac{871 \times 861}{1000} = 750$
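The four "no heredity" frequencies above are simply the products of the marginal totals divided by N. A minimal sketch of that computation, with a hypothetical function name and with lower-case letters standing for the negative attributes:

```python
def expected_under_independence(AB, Ab, aB, ab):
    """Frequencies expected in a 2 x 2 table if A and B were independent."""
    N = AB + Ab + aB + ab
    A, a = AB + Ab, aB + ab      # (A) and (alpha)
    B, b = AB + aB, Ab + ab      # (B) and (beta)
    return {"AB": A * B / N, "Ab": A * b / N, "aB": a * B / N, "ab": a * b / N}

# Observed frequencies of Example 11·12
expected = expected_under_independence(50, 79, 89, 782)
rounded = {symbol: round(value) for symbol, value in expected.items()}
print(rounded)   # {'AB': 18, 'Ab': 111, 'aB': 121, 'ab': 750}
```

The unrounded values are 17·931, 111·069, 121·069 and 749·931; the text quotes them to the nearest whole number.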
Example 11·13. Can vaccination be regarded as a preventive measure for
small-pox from the data given below?
'Of 1482 persons in a locality exposed to small-pox, 368 in all were
attacked.'
'Of 1482 persons, 343 had been vaccinated and of these only 35 were
attacked.'
Solution. Let A denote the attribute of attack by small-pox and B that of
vaccination. Then the given data are:
N = 1482, (A) = 368, (B) = 343 and (AB) = 35
(αβ) = N − (A) − (B) + (AB) = 1482 − 368 − 343 + 35 = 806
(Aβ) = (A) − (AB) = 368 − 35 = 333
(αB) = (B) − (AB) = 343 − 35 = 308
∴ Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)] = (35 × 806 − 333 × 308)/(35 × 806 + 333 × 308) = −0·57
Thus, there is negative association between A and B, i.e., between 'attacked'
and 'vaccinated'. In other words, there is positive association between not
attacked and vaccinated. Hence vaccination can be regarded as a preventive
measure for small-pox.
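When only N, (A), (B) and (AB) are given, as in this example, the remaining ultimate class frequencies must be derived before Q can be computed. A minimal Python sketch of that step follows; the function names are hypothetical, and the consistency check (no ultimate class frequency may be negative) is added here as an assumption about how one might guard the input.

```python
def ultimate_frequencies(n, a, b, ab):
    """Derive (AB), (A beta), (alpha B), (alpha beta) from N, (A), (B), (AB)."""
    a_beta = a - ab
    alpha_b = b - ab
    alpha_beta = n - a - b + ab
    cells = (ab, a_beta, alpha_b, alpha_beta)
    # Consistency condition: no ultimate class frequency may be negative.
    if min(cells) < 0:
        raise ValueError("inconsistent data: negative ultimate class frequency")
    return cells


def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association."""
    cross_main = ab * alpha_beta
    cross_off = a_beta * alpha_b
    return (cross_main - cross_off) / (cross_main + cross_off)


# Small-pox data: N = 1482, (A) = 368, (B) = 343, (AB) = 35
cells = ultimate_frequencies(1482, 368, 343, 35)
print(cells)
print(round(yule_q(*cells), 2))
```

Because Q is symmetric in the two attributes, interchanging the roles of (A) and (B) leaves its value unchanged.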
EXERCISE 11(c)
1. (a) What do you mean by independence of attributes? Give a criterion
of independence for attributes A and B.
(b) What are the various methods of finding whether two attributes are
associated, dissociated or independent? Deduce any one such measure of
association.
(c) When are two attributes said to be positively associated and negatively
associated? Also define complete association and dissociation of two attributes.
(d) Derive an expression for a measure of association between two
attributes.
(e) What is association of attributes? Write a note on the strength of
association and how it is measured.
(f) Find whether the attributes α and β are positively associated, negatively
associated or independent. Given (AB) = 500, (α) = 800, (B) = [illegible], N = 1500.
2. (a) Define Yule's coefficient of association and the coefficient of
colligation. Establish the following relation between coefficient of association
Q and coefficient of colligation Y:
Q = 2Y/(1 + Y²)
(b) For the following table, give Yule's coefficient of association (Q) and
coefficient of colligation (Y). Examine the cases (i) bc = 0, (ii) ad = 0, and (iii)
ad = bc.
          B    not B
A         a    b
not A     c    d
Ans. Q = 1 = Y if bc = 0 and Q = −1 = Y if ad = 0 and Q = 0 if ad = bc.
(c) Prove that in the usual notations Q = 2Y/(1 + Y²). What is the range of
values for Q?
(d) If an attribute A is known to be completely associated with an attribute
B, (i) what do you infer about the association between α and β? (α and β are
equivalent to 'not A' and 'not B' respectively), (ii) α and B?
3. (a) The following table is reproduced from a memoir written by Karl
Pearson:
                              Eye colour in son
                              Not light    Light
Eye colour     Not light      230          148
in father      Light          151          471

Discuss if the colour of son's eyes is associated with that of father.
Ans. Yes. Positively associated, Q = 0·66.
(b) The following table shows the result of inoculation against cholera.
                    Not attacked    Attacked
Inoculated          431             5
Not inoculated      291             9
Examine the effect of inoculation in controlling susceptibility to cholera.
Ans. Inoculation is effective in controlling cholera.
4. (a) Find the association between proficiency in English and in Hindi
among candidates at a certain test if 245 of them passed in Hindi, 285 failed in
Hindi, 190 failed in Hindi but passed in English and 147 passed in both.
(b) The male population of a state is 250 lakhs. The number of literate
males is 20 lakhs and total number of male criminals is 26 thousand. The
number of literate male criminals is 2 thousand. Do you find any association
between literacy and criminality?
Ans. Literacy and criminality are positively associated.

(c) From the following particulars find whether blindness and baldness are
associated:
Total population 1,62,64,000
Number of baldheaded 24,441
Number of blind 7,26~
Number of baldheaded blind 221
5. In a certain investigation carried on with regard to 500 graduates and
1500 non-graduates, it was found that the number of employed graduates was
450 while the number of unemployed non-graduates was 300. In the second
investigation 5000 cases were examined. The number of non-graduates was 3000
and the number of employed non-graduates was 2500. The number of graduates
who were found to be employed was 1600.
Calculate the coefficient of association between graduation and employment
in both the investigations.
Can any definite conclusion be drawn from the coefficients?
Ans. Q (First Investigation) = +0·38, Q (Second Investigation) = −0·11
6. (a) Three aptitude tests A, B, C were given to 200 apprentice trainees.
From amongst them 80 passed test A, 78 passed test B and 96 passed the third
test. While 20 passed all the three tests, 42 failed all the three, 18 passed A and
B but failed C and 38 failed A and B but passed the third. Determine (i) how
many trainees passed at least two of the three tests and (ii) whether the
performances in tests A and B are associated.
Ans. (i) 76, (ii) Q = 0·3
(b) In a survey of a population of 12000, information is gathered regarding
three attributes A, B and C. In the usual notations,
(A) = 980, (AB) = 450, (ABC) = 130,
(B) = 1190, (AC) = 280, (C) = 600 and (BC) = 250.
Find: (i) (αβγ), (ii) Q_AB = Coefficient of Association between A and B.
Comment on your findings.
7. A group of 1000 fathers was studied and it was found that 12·9% had
dark eyes. Among them the ratio of those having sons with dark eyes to those
having sons with not dark eyes was 1 : 1·58. The number of cases where fathers
and sons both did not have dark eyes was 782. Calculate coefficient of
association between darkness of eye colour in father and son. Give the
frequencies that would have been observed had there been completely no heredity.
Hint. (AB) = 50, (Aβ) = 79, (αB) = 89 and (αβ) = 782.
8. A census revealed the following figures of the blind and the insane in
two age-groups in a certain population:
                                      Age-group       Age-group
                                      15-25 years     over 25 years
Total population                      2,70,000        1,60,200
Number of blind                       1,000           2,000
Number of insane                      6,000           1,000
Number of insane among the blind      19              9
(i) Obtain a measure of association between blindness and insanity for each
age-group.
(ii) Which group shows more association or disassociation (if any)?
9. Show that if (AB)₁, (αB)₁, (Aβ)₁, (αβ)₁ and (AB)₂, (αB)₂, (Aβ)₂ and
(αβ)₂ be two aggregates corresponding to the same values of (A), (B), (α) and
(β), then
(AB)₁ − (AB)₂ = (αB)₂ − (αB)₁ = (Aβ)₂ − (Aβ)₁ = (αβ)₁ − (αβ)₂

10. Show that if δ = (AB) − (A)(B)/N, then

δ = (1/N) [(AB)(αβ) − (Aβ)(αB)]

OBJECTIVE TYPE QUESTIONS


1. State, giving reasons, whether each of the following statements is true
or false:
(i) There is no difference between correlation and association.
(ii) All the class frequencies of various orders are independent of each other.
(iii) If the attributes A and B are positively associated, then α and B are also
positively associated.
(iv) Square of Yule's coefficient of association cannot exceed 1.
(v) Yule's coefficient of association cannot be negative.
(vi) For two attributes A and B, the coefficient of association Q is 0·36. If
each ultimate class frequency is doubled then Q is 0·72.
(vii) If (AB) = 10, (αB) = 15, (Aβ) = 20 and (αβ) = 30, then A and B are
associated.
(viii) If every item which possesses an attribute A possesses the attribute B
as well, then the coefficient of association between A and B is 1.
2. Indicate the correct answer:
(i) In case of two attributes A and B, the ultimate class frequencies are:
(a) (A), (b) (AB), (c) (α), (d) (B).
(ii) The condition for the consistency of a set of independent class
frequencies is that no ultimate class frequency is (a) zero, (b) positive,
(c) negative.

(iii) Attributes A and B are said to be independent if
(a) (AB) > (A)(B)/N, (b) (AB) = (A)(B)/N, (c) (AB) < (A)(B)/N

(iv) Attributes A and B are said to be positively associated if
(a) (AB)/(B) < (Aβ)/(β), (b) (AB)/(B) = (Aβ)/(β), (c) (AB)/(B) > (Aβ)/(β), (d) (AB)/(A) < (αB)/(α)
(v) If N = 50, (A) = 35, (B) = 25, (AB) = 15, then the attributes A and B
are said to be:
(a) correlated, (b) independent, (c) negatively associated, (d) positively
associated
(vi) When there is a perfect positive association between two attributes, Q
would be (a) zero, (b) −0·9, (c) −1, (d) +1.
