Curve Fitting and Principle of Least Squares
9·1. Curve Fitting. Let $(x_i, y_i)$; $i = 1, 2, \ldots, n$ be a given set of n pairs of values, X being the independent variable and Y the dependent variable. The general problem in curve fitting is to find, if possible, an analytic expression of the form $y = f(x)$ for the functional relationship suggested by the given data.
Fitting of curves to a set of numerical data is of considerable importance, theoretical as well as practical. Theoretically it is useful in the study of correlation and regression, e.g., lines of regression can be regarded as fitting of linear curves to the given bivariate distribution (cf. § 10·8·1). In practical statistics it enables us to represent the relationship between two variables by simple algebraic expressions, e.g., polynomials, exponential or logarithmic functions. Moreover, it may be used to estimate the values of one variable which would correspond to the specified values of the other variable.
9·1·1. Fitting of a straight line. Let us consider the fitting of a straight line
$$Y = a + bX \qquad \ldots(9{\cdot}1)$$
to a set of n points $(x_i, y_i)$; $i = 1, 2, \ldots, n$. Equation (9·1) represents a family of straight lines for different values of the arbitrary constants 'a' and 'b'. The problem is to determine 'a' and 'b' so that the line (9·1) is the line of "best fit".
The term 'best fit' is interpreted in accordance with Legendre's principle of least squares, which consists in minimising the sum of the squares of the deviations of the actual values of y from their estimated values as given by the line of best fit.
Let $P_i(x_i, y_i)$ be any general point in the scatter diagram (§ 10·2). Draw $P_iM$ perpendicular to the x-axis, meeting the line (9·1) in $H_i$. The abscissa of $H_i$ is $x_i$ and, since $H_i$ lies on (9·1), its ordinate is $a + bx_i$. Hence the co-ordinates of $H_i$ are $(x_i, a + bx_i)$.
$$P_iH_i = P_iM - H_iM = y_i - (a + bx_i)$$
is called the error of estimate or the residual for $y_i$.
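According to the principle of least squares, we have to determine a and b so that
$$E = \sum_{i=1}^{n} P_iH_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$
is minimum. From the principle of maxima and minima, the partial derivatives of E with respect to (w.r.t.) a and b should vanish separately, i.e.,
$$\frac{\partial E}{\partial a} = 0 = -2\sum_{i=1}^{n}(y_i - a - bx_i) \quad\text{and}\quad \frac{\partial E}{\partial b} = 0 = -2\sum_{i=1}^{n} x_i(y_i - a - bx_i) \qquad \ldots(9{\cdot}2)$$
$$\Rightarrow\quad \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \quad\text{and}\quad \sum_{i=1}^{n} x_iy_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \qquad \ldots(9{\cdot}2a)$$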
Equations (9·2) and (9·2a) are known as the normal equations for estimating a and b.
All the quantities $\sum x_i$, $\sum x_i^2$, $\sum y_i$ and $\sum x_iy_i$ (each sum running over $i = 1$ to $n$) can be obtained from the given set of points $(x_i, y_i)$; $i = 1, 2, \ldots, n$, and the equations (9·2a) can be solved for a and b. With the values of a and b so obtained, equation (9·1) is the line of best fit to the given set of points $(x_i, y_i)$; $i = 1, 2, \ldots, n$.
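The solution of the two normal equations can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fit_line` and the sample data are assumptions made for the example, not part of the text.

```python
def fit_line(xs, ys):
    """Solve the normal equations of Y = a + bX for a and b.

    Normal equations:  sum(y)  = n*a      + b*sum(x)
                       sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the 2 x 2 system
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Illustrative data (assumed for this sketch).
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 3.3, 4.5, 6.3]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

The second normal equation is eliminated by Cramer's rule, which is adequate for a 2 x 2 system; the guard on `det` covers the degenerate case in which every x is the same.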
Remark. The equation of the line of best fit of y on x is obtained on eliminating a and b in (9·1) and (9·2a) and can be expressed in the determinant form as follows:
$$\begin{vmatrix} y & x & 1 \\ \sum y_i & \sum x_i & n \\ \sum x_iy_i & \sum x_i^2 & \sum x_i \end{vmatrix} = 0 \qquad \ldots(9{\cdot}2b)$$
9·1·2. Fitting of second degree parabola. Let
$$Y = a + bX + cX^2 \qquad \ldots(9{\cdot}3)$$
be the second degree parabola of best fit to the set of n points $(x_i, y_i)$; $i = 1, 2, \ldots, n$. Using the principle of least squares, we have to determine the constants a, b and c so that
$$E = \sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2)^2$$
is minimum.
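Equating to zero the partial derivatives of E with respect to a, b and c separately, we get the normal equations for estimating a, b and c as
$$\left.\begin{aligned}
\frac{\partial E}{\partial a} &= 0 = -2\sum (y_i - a - bx_i - cx_i^2) \\
\frac{\partial E}{\partial b} &= 0 = -2\sum x_i(y_i - a - bx_i - cx_i^2) \\
\frac{\partial E}{\partial c} &= 0 = -2\sum x_i^2(y_i - a - bx_i - cx_i^2)
\end{aligned}\right\} \qquad \ldots(9{\cdot}4)$$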
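The three normal equations are linear in a, b and c, so they can be solved by elimination. The sketch below is one way to do it in plain Python; `solve_linear` and `fit_parabola` are hypothetical helper names and the data are made up so that the exact answer is known.

```python
def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola(xs, ys):
    """Return (a, b, c) of Y = a + bX + cX^2 from the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]          # s[k] = sum x^k, s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sum x^k y
    matrix = [
        [s[0], s[1], s[2]],   # sum y     = n a      + b sum x   + c sum x^2
        [s[1], s[2], s[3]],   # sum x y   = a sum x  + b sum x^2 + c sum x^3
        [s[2], s[3], s[4]],   # sum x^2 y = a sum x^2 + b sum x^3 + c sum x^4
    ]
    return solve_linear(matrix, t)


# Made-up data lying exactly on Y = 1 + 2X + 3X^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x * x for x in xs]
print(fit_parabola(xs, ys))
```

Because the sample points lie exactly on a parabola, the fitted constants should reproduce 1, 2 and 3 up to rounding error, which is a convenient check on the normal equations.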
$$\begin{vmatrix}
y & x^2 & x & 1 \\
\sum y_i & \sum x_i^2 & \sum x_i & n \\
\sum x_iy_i & \sum x_i^3 & \sum x_i^2 & \sum x_i \\
\sum x_i^2y_i & \sum x_i^4 & \sum x_i^3 & \sum x_i^2
\end{vmatrix} = 0 \qquad \ldots(9{\cdot}4b)$$
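is minimum. Thus the normal equations for estimating $a_0, a_1, \ldots, a_k$ are obtained on equating to zero the partial derivatives of E w.r.t. $a_0, a_1, \ldots, a_k$ separately, i.e.,
$$\left.\begin{aligned}
\frac{\partial E}{\partial a_0} &= 0 = -2\sum (y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k) \\
\frac{\partial E}{\partial a_1} &= 0 = -2\sum x_i(y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k) \\
&\;\;\vdots \\
\frac{\partial E}{\partial a_k} &= 0 = -2\sum x_i^k(y_i - a_0 - a_1x_i - a_2x_i^2 - \ldots - a_kx_i^k)
\end{aligned}\right\} \qquad \ldots(9{\cdot}6)$$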
8        95·5    5    25     477·5
9       102·2    7    49     715·4
10      108·4    9    81     975·6
Total   796·2    0   330    1016·8
Thus the normal equations are
796·2 = 10a + 0 × b and 1016·8 = a × 0 + 330b,
which give a = 79·62 and b = 1016·8/330 = 3·08 (approx).
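When the transformed independent variable sums to zero, the two normal equations separate and each constant comes from a single division. A minimal sketch using the totals quoted above; the function name `fit_centred` and the label `u` for the transformed variable are assumptions made for this example.

```python
def fit_centred(n, sum_y, sum_uu, sum_uy):
    """Solve sum_y = n*a and sum_uy = b*sum_uu, valid only when sum(u) = 0."""
    a = sum_y / n
    b = sum_uy / sum_uu
    return a, b


# Totals taken from the last row of the table.
a, b = fit_centred(n=10, sum_y=796.2, sum_uu=330.0, sum_uy=1016.8)
print(round(a, 2), round(b, 2))
```

Choosing the origin so that the sum of the independent variable vanishes removes the cross terms, which is why no simultaneous solution is needed here.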
EXERCISE 9 (a)
1. (a) $(x_i, y_i)$; $i = 1, 2, \ldots, n$, give the co-ordinates of n points in a plane. It is proposed to fit a straight line $Y = aX + b$ to those points such that the sum of the squares of the perpendiculars from those n points to the line is a minimum. Find the constants a and b. Use the above method to fit a straight line to the following points:
X: 0 1 2 3 4
Y: 1 1·8 3·3 4·5 6·3
Ans. Y = 0·72 + 1·33X
(b) Fit a straight line of the form $Y = AX + B$ to the following data:
X: 0 5 10 15 20 25 30
Y: 10 14 19 25 31 36 39
2. Show that the line of best fit to the following data is given by
Y = −0·5X + 8
X: 6 7 7 8 8 8 9 9 10
Y: 5 5 4 5 4 3 4 3 3
3. (a) How do you define the term "line of best fit"? Give the normal equations generally used to obtain such a line. Fit a straight line and a parabolic curve to the following data:
X: 1·0 1·5 2·0 2·5 3·0 3·5 4·0
Y: 1·1 1·3 1·6 2·6 2·7 3·4 4·1
Ans. $Y = 1{\cdot}04 - 0{\cdot}20X + 0{\cdot}24X^2$
(b) Fit a straight line to the following data. Plot the observed and the expected values in a graph and examine whether the straight line gives an adequate fit.
x: 1 2 3 4 5 6 7 8
y: 55 46 40 38 33 30 29 30
4. An experiment is conducted to verify the law of falling under gravity expressed by
$$S = \tfrac{1}{2}gt^2$$
where S is the distance fallen at time t and g is a gravitational constant. The following results are obtained:
t (seconds): 1 2 3 4 5
S (feet): 15 70 140 250 380
Taking S as the dependent variable, fit a straight line to the data by the method of least squares in a manner that you can estimate g. What is the estimate of g?
5. (a) Explain the method of fitting a second degree parabola by using the principle of least squares.
(b) Fit a parabola $Y = a + bx + cx^2$ to the following data:
X: 1 2 3 4 5 6 7
Y: 2·3 5·2 9·7 16·5 29·4 35·5 54·4
6. Fit a second degree parabola to the following data taking X as the independent variable:
X: 1 2 3 4 5 6 7 8 9
Y: 2 6 7 8 10 11 11 10 9
Ans. $Y = -1 + 3{\cdot}55X - 0{\cdot}27X^2$
7. In a spectroscopic method for determining the per cent X of natural rubber-content of vulcanizates, the variable Y used is $1 + \log_{10} r$, where r is the ratio of transmission at two selected wavelengths. In order to establish a relationship between X and Y, the following data were obtained:
X: 0 20 40 60 80 100
Y: 2·19 2·65 3·16 3·57 3·93 4·27
Using least square method, fit a parabola. Comment on your results.
8. Fit a second degree curve $Y = a + bX + cX^2$ to the following data relating to profit of a certain company.
Year: 1980 1982 1984 1986 1988
Profit in lakhs of rupees: 125 140 165 195 230
Estimate the profit in the year 1995.
Ans. $Y = 114 + 7{\cdot}2X + 3{\cdot}15X^2$
9. Explain the method of least squares of fitting a curve to the given mass of data:
X: −2 −1 0 1 2
Y: $y_1$ $y_2$ $y_3$ $y_4$ $y_5$
Fit a parabola $Y = a + bX + c(X^2 - 2)$ by the method of least squares and show that
$$a = \bar{y}, \quad b = \frac{1}{10}(-2y_1 - y_2 + y_4 + 2y_5), \quad c = \frac{1}{14}(2y_1 - y_2 - 2y_3 - y_4 + 2y_5).$$
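10. Show that the best fitting linear function for the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ may be expressed in the form
$$\begin{vmatrix} x & y & 1 \\ \sum x_i & \sum y_i & n \\ \sum x_i^2 & \sum x_iy_i & \sum x_i \end{vmatrix} = 0, \quad (i = 1, 2, \ldots, n).$$
Show that the line passes through the mean point $(\bar{x}, \bar{y})$.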
9·2. Most Plausible Solution of a System of Linear Equations. Method of least squares is helpful in finding the most plausible values of the variables satisfying a system of independent linear equations whose number is more than the number of variables under study. Consider the following set of m equations in n variables X, Y, Z, ..., T:
$$\left.\begin{aligned}
a_1X + b_1Y + c_1Z + \ldots + k_1T &= l_1 \\
a_2X + b_2Y + c_2Z + \ldots + k_2T &= l_2 \\
&\;\;\vdots \\
a_mX + b_mY + c_mZ + \ldots + k_mT &= l_m
\end{aligned}\right\} \qquad \ldots(9{\cdot}8)$$
where $a_i, b_i, \ldots, l_i$; $i = 1, 2, \ldots, m$ are constants.
If m = n, the system of equations (9·8) can be solved uniquely with the help of algebra. If m > n, it is not possible to determine a unique solution X, Y, Z, ..., T which will satisfy the system (9·8). In this case we find the values of X, Y, Z, ..., T which will satisfy the system (9·8) as nearly as possible.
Legendre's principle of least squares consists in minimising the sum of the squares of the 'residuals' or the 'errors'. If
$$E_i = a_iX + b_iY + c_iZ + \ldots + k_iT - l_i; \quad i = 1, 2, \ldots, m$$
is the residual for the ith equation, then we have to determine X, Y, Z, ..., T so that
$$U = \sum_{i=1}^{m} E_i^2 = \sum_{i=1}^{m} (a_iX + b_iY + c_iZ + \ldots + k_iT - l_i)^2$$
is minimum.
Using the principle of maxima and minima in differential calculus, the partial derivatives of 'U' w.r.t. X, Y, Z, ..., T should vanish separately.
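Setting each partial derivative of U to zero yields one linear equation per unknown, and the resulting square system can be solved directly. The sketch below builds those equations from the coefficient rows and solves them by elimination; `solve` and `most_plausible` are hypothetical names and the three-equation system is invented for illustration.

```python
def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def most_plausible(coeffs, consts):
    """Least-squares values of the unknowns for m equations in n unknowns (m >= n).

    coeffs[i] holds the coefficients of the i-th equation, consts[i] its right side.
    dU/d(unknown p) = 0 gives: sum_i coeffs[i][p] * (row_i . v - consts[i]) = 0.
    """
    m, n = len(coeffs), len(coeffs[0])
    lhs = [[sum(coeffs[i][p] * coeffs[i][q] for i in range(m)) for q in range(n)]
           for p in range(n)]
    rhs = [sum(coeffs[i][p] * consts[i] for i in range(m)) for p in range(n)]
    return solve(lhs, rhs)


# Invented example: X + Y = 3, X - Y = 1, 2X + Y = 5 (three equations, two unknowns).
print(most_plausible([[1, 1], [1, -1], [2, 1]], [3, 1, 5]))
```

The invented system happens to be consistent, so the least-squares values coincide with its exact solution; with conflicting equations the same code returns the compromise values that minimise U.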
$$a = \text{antilog}\,(A) \quad\text{and}\quad b = \frac{B}{\log e}$$
Example 9·5. Fit an exponential curve of the form $Y = ab^X$ to the following data:
X: 1 2 3 4 5 6 7 8
Y: 1·0 1·2 1·8 2·5 3·6 4·7 6·6 9·1
Solution.
X       Y      U = log Y    XU         X²
1       1·0    0·0000       0·0000     1
2       1·2    0·0792       0·1584     4
3       1·8    0·2553       0·7659     9
4       2·5    0·3979       1·5916     16
5       3·6    0·5563       2·7815     25
6       4·7    0·6721       4·0326     36
7       6·6    0·8195       5·7365     49
8       9·1    0·9590       7·6720     64
Total   36     30·5   3·7393   22·7385   204
(9·11a) gives the normal equations as
3·7393 = 8A + 36B
and 22·7385 = 36A + 204B
Solving, we get
B = 0·1408 and A = −0·1662 = $\bar{1}{\cdot}8338$
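The same computation can be sketched in Python by taking common logarithms of Y and fitting a straight line to the transformed values. The function name `fit_exponential` is an assumption for this sketch; because it uses full-precision logarithms rather than four-figure tables, the last digits may differ slightly from the hand computation.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * b**X by least squares on U = log10(Y) = A + B*X."""
    us = [math.log10(y) for y in ys]
    n = len(xs)
    sx, su = sum(xs), sum(us)
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    det = n * sxx - sx * sx
    B = (n * sxu - sx * su) / det   # slope of the transformed line
    A = (su - B * sx) / n           # intercept of the transformed line
    return A, B, 10 ** A, 10 ** B   # a = antilog A, b = antilog B


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
A, B, a, b = fit_exponential(xs, ys)
print(f"A = {A:.4f}, B = {B:.4f}, a = {a:.4f}, b = {b:.4f}")
```

Note that this minimises the squared errors of log Y, not of Y itself, which is the usual consequence of linearising an exponential curve.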
$$E_i = \left(y_i - ax_i - \frac{b}{x_i}\right)$$
According to the principle of least squares, we have to determine the values of a and b so that the sum of the squares of errors E, viz.,
$$E = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} \left(y_i - ax_i - \frac{b}{x_i}\right)^2$$
is minimum.
Consequently, the normal equations are
$$\frac{\partial E}{\partial a} = 0 = -2\sum_{i=1}^{n} x_i\left(y_i - ax_i - \frac{b}{x_i}\right)$$
$$\frac{\partial E}{\partial b} = 0 = -2\sum_{i=1}^{n} \frac{1}{x_i}\left(y_i - ax_i - \frac{b}{x_i}\right)$$
which on simplification give
$$\sum_{i=1}^{n} x_iy_i = a\sum_{i=1}^{n} x_i^2 + nb \quad\text{and}\quad \sum_{i=1}^{n} \frac{y_i}{x_i} = na + b\sum_{i=1}^{n} \frac{1}{x_i^2}$$
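These two simplified equations form a 2 x 2 linear system in a and b. A short Python sketch follows; the name `fit_ax_plus_b_over_x` and the test data, generated from a known curve, are assumptions made for the illustration.

```python
def fit_ax_plus_b_over_x(xs, ys):
    """Fit Y = a*X + b/X from its two normal equations (all x must be non-zero).

    sum(x*y) = a*sum(x^2) + n*b
    sum(y/x) = n*a        + b*sum(1/x^2)
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sinv = sum(1.0 / (x * x) for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syx = sum(y / x for x, y in zip(xs, ys))
    det = sxx * sinv - n * n
    if det == 0:
        raise ValueError("system is singular; at least two distinct |x| are needed")
    a = (sxy * sinv - n * syx) / det
    b = (sxx * syx - n * sxy) / det
    return a, b


# Made-up data lying exactly on Y = 2X + 3/X.
xs = [1, 2, 3, 4]
ys = [2 * x + 3 / x for x in xs]
print(fit_ax_plus_b_over_x(xs, ys))
```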
Example 9·7. Three independent measurements on each of the three angles A, B, C of a triangle are as follows:
A       B       C
39·5    60·3    80·1
39·3    62·2    80·3
39·6    60·1    80·4
Obtain the best estimates of the three angles taking into account the relation that the sum of the angles is equal to 180°.
Solution. Let the three observations on A be denoted by $x_1, x_2, x_3$, on B by $y_1, y_2, y_3$ and on C by $z_1, z_2, z_3$. Let $\theta_1, \theta_2$ be the best estimates for A and B respectively.
According to the principle of least squares, our problem is to estimate $\theta_1$ and $\theta_2$, so that
$$E = \sum (x_i - \theta_1)^2 + \sum (y_i - \theta_2)^2 + \sum (z_i - 180 + \theta_1 + \theta_2)^2$$
is minimum, summation being taken over i from 1 to 3.
Equating to zero the partial derivatives of E w.r.t. $\theta_1$ and $\theta_2$, the normal equations are
$$\frac{\partial E}{\partial \theta_1} = 0 = -2\sum (x_i - \theta_1) + 2\sum (z_i - 180 + \theta_1 + \theta_2) \qquad \ldots(*)$$
$$\frac{\partial E}{\partial \theta_2} = 0 = -2\sum (y_i - \theta_2) + 2\sum (z_i - 180 + \theta_1 + \theta_2) \qquad \ldots(**)$$
From (*) and (**), we get
$$\left.\begin{aligned}
3\theta_1 - \sum x_i + \sum z_i - 540 + 3\theta_1 + 3\theta_2 &= 0 \\
3\theta_2 - \sum y_i + \sum z_i - 540 + 3\theta_1 + 3\theta_2 &= 0
\end{aligned}\right\} \qquad \ldots(***)$$
But
$\sum x_i$ = 39·5 + 39·3 + 39·6 = 118·4
$\sum y_i$ = 60·3 + 62·2 + 60·1 = 182·6
$\sum z_i$ = 80·1 + 80·3 + 80·4 = 240·8
Substituting in (***), we get
$6\theta_1 + 3\theta_2$ − 417·6 = 0 and $3\theta_1 + 6\theta_2$ − 481·8 = 0
∴ $\hat{A} = \theta_1$ = 39·27, $\hat{B} = \theta_2$ = 60·67 and $\hat{C} = 180 - \theta_1 - \theta_2$ = 80·07 (approx).
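The arithmetic of this example can be checked with a few lines of Python. The function name `adjust_angles` is hypothetical, and the sketch assumes the same number of observations on each angle, as in the example.

```python
def adjust_angles(obs_a, obs_b, obs_c, total=180.0):
    """Least-squares estimates of three angles constrained to sum to `total`.

    With k observations per angle the normal equations reduce to
        2k*t1 +  k*t2 = sum(a) - sum(c) + k*total
         k*t1 + 2k*t2 = sum(b) - sum(c) + k*total
    """
    k = len(obs_a)
    r1 = sum(obs_a) - sum(obs_c) + k * total
    r2 = sum(obs_b) - sum(obs_c) + k * total
    t1 = (2 * r1 - r2) / (3 * k)
    t2 = (2 * r2 - r1) / (3 * k)
    return t1, t2, total - t1 - t2


est = adjust_angles([39.5, 39.3, 39.6], [60.3, 62.2, 60.1], [80.1, 80.3, 80.4])
print([round(v, 2) for v in est])
```

The third angle is not estimated separately; it is fixed by the constraint once the first two are known, which is how the relation between the angles enters the least-squares problem.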
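$$\begin{vmatrix}
y & x^2 & x & 1 \\
\sum y_i & \dfrac{n(n+1)(2n+1)}{3} & 0 & 2n+1 \\
\sum x_iy_i & 0 & \dfrac{n(n+1)(2n+1)}{3} & 0 \\
\sum x_i^2y_i & \dfrac{n(n+1)(2n+1)(3n^2+3n-1)}{15} & 0 & \dfrac{n(n+1)(2n+1)}{3}
\end{vmatrix} = 0$$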
[Delhi Univ. B.A. (Pass), 198'1)
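Hint. Use (9·4b), with $x_i = a + i$; $i = 1, 2, \ldots, (2n+1)$. Since $\bar{x} = 0$,
$$\sum x_i = (2n+1)a + \sum i \;\Rightarrow\; 0 = (2n+1)a + \frac{(2n+1)(2n+2)}{2} \;\Rightarrow\; a = -(n+1)$$
$$\sum x_i^2 = \sum (a + i)^2 = (2n+1)a^2 + \sum i^2 + 2a\sum i$$
and so on, for $\sum x_i^3$ and $\sum x_i^4$.
$$\left[\;\sum_{i=1}^{m} i^2 = \frac{m(m+1)(2m+1)}{6}; \quad \sum_{i=1}^{m} i^3 = \left[\frac{m(m+1)}{2}\right]^2 \quad\text{and}\quad \sum_{i=1}^{m} i^4 = \frac{1}{30}\,m(m+1)(2m+1)(3m^2+3m-1)\;\right]$$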
(b) When do we prefer a logarithmic curve to an ordinary curve?
9·5. Curve Fitting by Orthogonal Polynomials. Suppose that the polynomial of pth degree of Y on X is
$$Y = a_0 + a_1X + a_2X^2 + \ldots + a_pX^p \qquad \ldots(9{\cdot}13)$$
The normal equations for determining the constants $a_i$'s are obtained by the principle of least squares by minimising the residual or error sum of squares
$$E = \sum (y - a_0 - a_1x - a_2x^2 - \ldots - a_px^p)^2 \qquad \ldots(9{\cdot}14)$$
summation being extended over the given set of observations. The normal equations are:
them are called class frequencies which are denoted by bracketing the class-symbols. Thus $(A)$ stands for the frequency of A and $(AB)$ for the number of objects possessing the attribute AB.
Remark. Class frequencies of the type $(A)$, $(AB)$, $(ABC)$ etc. are known as positive frequencies; $(\alpha)$, $(\alpha\beta)$, $(\alpha\beta\gamma)$ etc. are known as negative frequencies and $(\alpha B)$, $(A\beta C)$, $(\alpha\beta C)$ etc. are called the contrary frequencies.
11·4·1. Order of Classes and Class Frequencies. A class represented by n attributes is called a class of nth order and the corresponding frequency as the frequency of the nth order. Thus $(A)$ is a class frequency of order 1; $(AB)$, $(AC)$, $(\beta\gamma)$ etc. are class frequencies of second order; $(ABC)$, $(A\beta\gamma)$, $(\alpha\beta C)$ etc. are frequencies of third order and so on. N, the total number of members of the population, without any specification of attributes, is reckoned as a frequency of zero-order.
Thus in a dichotomous classification with respect to n attributes, the r attributes specifying a class of order r can be selected in $\binom{n}{r}$ ways, and each of the r attributes contributes two symbols, one representing the positive part (e.g., $A$) and the other the negative part (e.g., $\alpha$). Hence the number of class frequencies of order r is $\binom{n}{r}2^r$, and the total number of class frequencies of all orders, for n attributes, is:
$$\sum_{r=0}^{n} \binom{n}{r} 2^r = 1 + \binom{n}{1}2 + \binom{n}{2}2^2 + \ldots + \binom{n}{n}2^n = (1 + 2)^n = 3^n \qquad \ldots(11{\cdot}1)$$
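The count can be confirmed by brute-force enumeration. In the sketch below each attribute is either left out of the class symbol, taken as positive, or taken as negative; lower-case Latin letters stand in for the Greek letters, and the function names are assumptions made for this example.

```python
from itertools import product
from math import comb


def count_by_order(n):
    """Number of class frequencies of each order r = 0..n, i.e. C(n, r) * 2**r."""
    return [comb(n, r) * 2 ** r for r in range(n + 1)]


def enumerate_classes(letters):
    """List every class symbol; lower case denotes the negative attribute."""
    classes = []
    for choice in product((None, True, False), repeat=len(letters)):
        symbol = "".join(
            letter.upper() if positive else letter.lower()
            for letter, positive in zip(letters, choice)
            if positive is not None          # None means the attribute is unspecified
        )
        classes.append(symbol or "N")        # the empty symbol is the whole population
    return classes


print(count_by_order(3), sum(count_by_order(3)))
print(len(enumerate_classes("ABC")))
```

The three choices per attribute are exactly why the total is a power of three, which is the combinatorial content of the binomial sum.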
Remarks 1. In particular, for n attributes, the total number of class frequencies of different orders are given as follows:
Order: 0 1 2 ... r ... n
Order    Frequencies
0        $N$
1        $(A)$ $(B)$ $(C)$
         $(\alpha)$ $(\beta)$ $(\gamma)$
2        $(AB)$ $(A\beta)$ $(\alpha B)$ $(\alpha\beta)$
         $(AC)$ $(A\gamma)$ $(\alpha C)$ $(\alpha\gamma)$
         $(BC)$ $(B\gamma)$ $(\beta C)$ $(\beta\gamma)$
frequencies, then it is obvious from the table that the remaining frequencies, viz., $(A\beta)$, $(\alpha\beta)$, $(\alpha B)$, $(\alpha)$ and $(\beta)$ can be obtained by subtraction, e.g., given:
          A        α        Total
B         (AB)     -        (B)
β         -        -        (β)
Total     (A)      (α)      N
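Now
$$\begin{aligned}
\frac{(AB)}{(B)} > \frac{(A\beta)}{(\beta)} &\;\Rightarrow\; \frac{(\beta)}{(B)} > \frac{(A\beta)}{(AB)} \\
&\;\Rightarrow\; 1 + \frac{(\beta)}{(B)} > 1 + \frac{(A\beta)}{(AB)} \\
&\;\Rightarrow\; \frac{N}{(B)} > \frac{(A)}{(AB)} \\
&\;\Rightarrow\; \frac{N}{(A)} > \frac{(B)}{(AB)} \\
&\;\Rightarrow\; \frac{(A) + (\alpha)}{(A)} > \frac{(AB) + (\alpha B)}{(AB)} \\
&\;\Rightarrow\; 1 + \frac{(\alpha)}{(A)} > 1 + \frac{(\alpha B)}{(AB)} \\
&\;\Rightarrow\; \frac{(\alpha)}{(A)} > \frac{(\alpha B)}{(AB)} \\
&\;\Rightarrow\; \frac{(AB)}{(A)} > \frac{(\alpha B)}{(\alpha)}, \text{ as required.}
\end{aligned}$$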
EXERCISE 11 (a)
1. (a) Explain the following: (i) Order of a class, (ii) Ultimate classes and (iii) Fundamental set of class frequencies.
(b) What is meant by a class-frequency of (i) first order, (ii) third order? How would you express a class frequency of first order in terms of class frequencies of third order?
2. What is dichotomy? Show that the continued dichotomy according to n attributes gives rise to $3^n$ classes.
3. (a) Given that $(AB) = 150$, $(A\beta) = 230$, $(\alpha B) = 260$, $(\alpha\beta) = 2{,}340$; find the other frequencies and the value of N.
(b) Given the following frequencies of the positive classes, find the frequencies of the rest of the classes:
$(A) = 977$, $(AB) = 453$, $(ABC) = 127$, $(B) = 1{,}185$, $(AC) = 284$, $N = 12{,}000$, $(C) = 596$, and $(BC) = 250$.
Ans. $(A\beta) = 524$, $(\alpha B) = 732$, $(\alpha\beta) = 10{,}291$, $(B\gamma) = 935$, $(\beta C) = 346$, $(\beta\gamma) = 10{,}469$, $(A\gamma) = 693$, $(\alpha C) = 312$, $(AB\gamma) = 326$, $(\alpha BC) = 123$, $(\alpha B\gamma) = 609$, $(A\beta C) = 157$, $(A\beta\gamma) = 367$, $(\alpha\beta C) = 189$, $(\alpha\beta\gamma) = 10{,}102$.
4. Given the following data, find frequencies of (i) the remaining positive classes; and (ii) the ultimate classes:
$N = 1{,}800$, $(A) = 850$, $(B) = 780$, $(C) = 326$, $(AB\gamma) = 200$, $(A\beta C) = 94$, $(\alpha BC) = 72$, and $(ABC) = 50$.
5. (a) Measurements are made on a thousand husbands and a thousand wives. If the measurements of the husbands exceed the measurements of the wives in 800 cases for one measurement, in 700 cases for another and in 660 cases for both measurements, in how many cases will both measurements on the wife exceed the measurements on the husband?
Ans. 160
(b) An unofficial political study was made about the recent changes in the Indian political scene and it was found that 919 Indira Gandhi Congress supporters and 1,268 Organisation Congress supporters wanted socialistic economy, whereas 310 Indira Gandhi Congress supporters and 503 supporters of the Organisation Congress wanted capitalistic economy in the country. Find out the total number of Indira Gandhi's and that of the Organisation's supporters, giving the number of capitalistic economy's and of the socialistic economy's votaries, out of the individuals who were surveyed.
6. At a competitive examination at which 600 graduates appeared, boys outnumbered girls by 96. Those qualifying for interview exceeded in number those failing to qualify by 310. The number of Science graduate boys interviewed was 300 while among the Arts graduate girls there were 25 who failed to qualify for interview. Altogether there were only 135 Arts graduates and 33 among them failed to qualify. Boys who failed to qualify numbered 18.
Find (i) the number of boys who qualified for interview,
(ii) the total number of Science graduate boys appearing, and
(iii) the number of Science graduate girls who qualified.
Ans. (i) 330, (ii) 310, and (iii) 53.
7. 100 children took three examinations A, B and C; 40 passed the first, 39 passed the second and 48 passed the third, 10 passed all the three, 21 failed all three, 9 passed the first two and failed the third, 19 failed the first two and passed the third. Find how many children passed at least two examinations. Show that for the question asked certain of the given frequencies are not necessary. Which are they?
Ans. 38. Only frequencies required are $(C)$, $(\alpha\beta C)$, $(AB\gamma)$.
8. In a university examination, which was indeed very tough, 50% at least failed in "Statistics", 75% at least in "Topology", 82% at least in "Functional Analysis" and 96% at least in "Applied Mathematics". How many at least failed in all the four? (Ans. 3%)
Hint. Use the result in Example 11·3, page 11·6.
9. If a collection contains N items, each of which is characterized by one or more of the attributes A, B, C and D, show that with the usual notations
(i) $(ABCD) \ge (A) + (B) + (C) + (D) - 3N$, and
(ii) $(ABCD) = (ABD) + (ACD) - (AD) + (AD\beta\gamma)$,
where $\beta$ and $\gamma$ represent the characteristics of the absence of B and C respectively.
10. Given $(A) = (\alpha) = (B) = (\beta) = \frac{1}{2}N$; show that $(AB) = (\alpha\beta)$, $(A\beta) = (\alpha B)$.
11. Given that $(A) = (\alpha) = (B) = (\beta) = (C) = (\gamma) = \frac{1}{2}N$ and also that $(ABC) = (\alpha\beta\gamma)$, show that $2(ABC) = (AB) + (AC) + (BC) - \frac{1}{2}N$.
11·6. Consistency of Data. Any class frequencies which have been or might have been observed within one and the same population are said to be consistent if they conform with one another and do not in any way conflict. For example, the figures $(A) = 20$, $(AB) = 25$ are inconsistent as $(AB)$ cannot be greater than $(A)$, if they are observed from the same population.
'Consistency' of a set of class frequencies may be defined as the property that none of them is negative; otherwise, the data for class frequencies are said to be 'inconsistent'.
Since any class frequency can be expressed as the sum of some of the ultimate class frequencies, it is necessarily non-negative if all the ultimate class frequencies are non-negative. This provides a criterion for testing the consistency of the data. In fact, we have the following theorem.
Theorem 11·1. "The necessary and sufficient condition for the consistency of a set of independent class frequencies is that no ultimate class frequency is negative."
Remark. We can test the consistency of a set of $2^n$ algebraically independent class frequencies by calculating the ultimate class frequencies. If any one of them is negative, the given data are inconsistent.
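For three attributes this test is easy to mechanise. The sketch below computes the eight ultimate class frequencies from the positive class frequencies and reports whether any is negative; the function names, the use of lower-case letters for the negative attributes, and the sample figures are assumptions made for the illustration.

```python
def ultimate_frequencies(N, A, B, C, AB, AC, BC, ABC):
    """Ultimate class frequencies for three attributes (lower case = negative)."""
    return {
        "ABC": ABC,
        "ABc": AB - ABC,
        "AbC": AC - ABC,
        "aBC": BC - ABC,
        "Abc": A - AB - AC + ABC,
        "aBc": B - AB - BC + ABC,
        "abC": C - AC - BC + ABC,
        "abc": N - A - B - C + AB + AC + BC - ABC,
    }


def is_consistent(freqs):
    """Data are consistent only if no ultimate class frequency is negative."""
    return all(value >= 0 for value in freqs.values())


# Illustrative survey figures (assumed for this sketch).
freqs = ultimate_frequencies(N=1000, A=811, B=752, C=418,
                             AB=570, AC=356, BC=348, ABC=297)
print(freqs)
print("consistent" if is_consistent(freqs) else "inconsistent")
```

A useful side check is that the eight ultimate frequencies always add up to N, whether or not the data are consistent.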
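11·6·1. Conditions for Consistency of Data. Criteria for consistency of class frequencies are obtained by using Theorem 11·1. For a single attribute A we have conditions of consistency as follows:
$$\left.\begin{aligned}
&(i)\;\; (A) \ge 0 \\
&(ii)\;\; (\alpha) \ge 0 \;\Rightarrow\; (A) \le N
\end{aligned}\right\} \qquad \ldots(11{\cdot}5)$$
For two attributes A and B, the conditions of consistency are:
$$\left.\begin{aligned}
&(i)\;\; (AB) \ge 0 \\
&(ii)\;\; (A\beta) \ge 0 \;\Rightarrow\; (AB) \le (A) \\
&(iii)\;\; (\alpha B) \ge 0 \;\Rightarrow\; (AB) \le (B) \\
&(iv)\;\; (\alpha\beta) \ge 0 \;\Rightarrow\; (AB) \ge (A) + (B) - N
\end{aligned}\right\} \qquad \ldots(11{\cdot}6)$$
Conditions of consistency for three attributes A, B and C are:
$$\left.\begin{aligned}
&(i)\;\; (ABC) \ge 0 \\
&(ii)\;\; (AB\gamma) \ge 0 \;\Rightarrow\; (ABC) \le (AB) \\
&(iii)\;\; (A\beta C) \ge 0 \;\Rightarrow\; (ABC) \le (AC) \\
&(iv)\;\; (\alpha BC) \ge 0 \;\Rightarrow\; (ABC) \le (BC) \\
&(v)\;\; (A\beta\gamma) \ge 0 \;\Rightarrow\; (ABC) \ge (AB) + (AC) - (A) \\
&(vi)\;\; (\alpha B\gamma) \ge 0 \;\Rightarrow\; (ABC) \ge (AB) + (BC) - (B) \\
&(vii)\;\; (\alpha\beta C) \ge 0 \;\Rightarrow\; (ABC) \ge (AC) + (BC) - (C) \\
&(viii)\;\; (\alpha\beta\gamma) \ge 0 \;\Rightarrow\; (ABC) \le (AB) + (BC) + (AC) - (A) - (B) - (C) + N
\end{aligned}\right\} \qquad \ldots(11{\cdot}7)$$
(i) and (viii) in (11·7) give the first of the following conditions, and the others follow similarly:
$$\left.\begin{aligned}
(i) \text{ and } (viii) &\;\Rightarrow\; (AB) + (BC) + (AC) \ge (A) + (B) + (C) - N \\
(ii) \text{ and } (vii) &\;\Rightarrow\; (AC) + (BC) - (AB) \le (C) \\
(iii) \text{ and } (vi) &\;\Rightarrow\; (AB) + (BC) - (AC) \le (B) \\
(iv) \text{ and } (v) &\;\Rightarrow\; (AB) + (AC) - (BC) \le (A)
\end{aligned}\right\} \qquad \ldots(11{\cdot}8)$$
Remark. As already pointed out [cf. Remarks (3) and (4), § 11·4·2], $2^n$ algebraically independent class frequencies are necessary to specify the data completely, one such set being the set of ultimate class frequencies and the other being the set of positive class frequencies. If the data supplied are incomplete so that it is not possible to determine all the class frequencies, then the conditions (11·5), (11·6) and (11·8) for one, two and three attributes respectively, enable us to assign the limits within which an unknown class frequency can lie.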
Example 11·5. Examine the consistency of the following data:
$N = 1{,}000$, $(A) = 600$, $(B) = 500$, $(AB) = 50$, the symbols having their usual meaning.
Solution. We have
$$(\alpha\beta) = N - (A) - (B) + (AB) = 1000 - 600 - 500 + 50 = -50.$$
Since $(\alpha\beta) < 0$, the data are inconsistent.
Example 11·6. Among the adult population of a certain town 50 per cent are males, 60 per cent are wage-earners and 50 per cent are 45 years of age or over, 10 per cent of the males are not wage-earners and 40 per cent of the males are under 45. Make the best possible inference about the limits within which the percentage of persons (male or female) of 45 years or over are wage-earners.
Solution. Let $N = 100$. Then denoting males by A, wage-earners by B and 45 years of age or over by C, we are given:
$$N = 100, \quad (A) = 50, \quad (B) = 60, \quad (C) = 50$$
$$(A\beta) = \frac{10}{100} \times 50 = 5, \quad (A\gamma) = \frac{40}{100} \times 50 = 20$$
$$\therefore\quad (AB) = (A) - (A\beta) = 45, \quad (AC) = (A) - (A\gamma) = 30$$
We are required to find the limits for $(BC)$.
Conditions of consistency (11·8) give
(i) $(AB) + (BC) + (AC) \ge (A) + (B) + (C) - N$
$\Rightarrow\; (BC) \ge 50 + 60 + 50 - 100 - 45 - 30 = -15$
(ii) $(AB) + (AC) - (BC) \le (A)$
$\Rightarrow\; (BC) \ge (AB) + (AC) - (A) = 45 + 30 - 50 = 25$
(iii) $(AB) + (BC) - (AC) \le (B)$
$\Rightarrow\; (BC) \le (B) + (AC) - (AB) = 60 + 30 - 45 = 45$
(iv) $(AC) + (BC) - (AB) \le (C)$
$\Rightarrow\; (BC) \le (C) + (AB) - (AC) = 50 + 45 - 30 = 65$
(i) to (iv) $\Rightarrow\; 25 \le (BC) \le 45$
Hence the percentage of wage-earning population of 45 years or over must lie between 25 and 45.
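The four inequalities used in this solution can be collected into one small routine. This is a sketch under stated assumptions: the name `bc_limits` is hypothetical, and the routine also includes the elementary two-attribute bounds on (BC), which the worked solution does not need because they are weaker here.

```python
def bc_limits(N, A, B, C, AB, AC):
    """Lower and upper limits for (BC) implied by the consistency conditions."""
    lower = max(
        0,                        # (BC) >= 0
        B + C - N,                # two-attribute condition on B and C
        A + B + C - N - AB - AC,  # condition (i)
        AB + AC - A,              # condition (ii)
    )
    upper = min(
        B, C,                     # (BC) cannot exceed (B) or (C)
        B + AC - AB,              # condition (iii)
        C + AB - AC,              # condition (iv)
    )
    return lower, upper


print(bc_limits(N=100, A=50, B=60, C=50, AB=45, AC=30))
```

If the returned lower limit exceeds the upper limit, the supplied frequencies cannot come from one population, so the same routine doubles as a consistency check.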
Example 11·7. In a series of houses actually invaded by smallpox, 70% of the inhabitants are attacked and 85% have been vaccinated. What is the lowest percentage of the vaccinated that must have been attacked?
Solution. Let A and B denote the attributes of the inhabitants being attacked and vaccinated respectively. Then we are given:
$$N = 100, \quad (A) = 70 \quad\text{and}\quad (B) = 85$$
Consistency condition gives:
$$(AB) \ge (A) + (B) - N \;\Rightarrow\; (AB) \ge 55$$
Hence the lowest percentage of inhabitants vaccinated, who have been attacked, is
$$\frac{(AB)}{(B)} \times 100 = \frac{55}{85} \times 100 = 64{\cdot}7\%$$
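$$\Rightarrow\quad \frac{(BC)}{N} \ge \frac{(B)}{N} + \frac{(C)}{N} - 1$$
$$\Rightarrow\quad y \ge 2x + 3x - 1 \quad\Rightarrow\quad 5x - 1 \le y \qquad \ldots(ii)$$
(i) and (ii) give
$$5x - 1 \le x \;\Rightarrow\; 4x \le 1 \;\Rightarrow\; x \le \tfrac{1}{4} \qquad \ldots(iii)$$
Thus from (i) and (iii) we have $y \le x \le \frac{1}{4}$, which establishes the result.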
Example 11·9. Show that (i) If all A's are B's and all B's are C's then all A's are C's, (ii) If all A's are B's and no B's are C's then no A's are C's.
Solution. (i) All A's are B's $\Rightarrow (AB) = (A)$
and all B's are C's $\Rightarrow (BC) = (B)$ ...(*)
EXERCISE l1(b)
1. What do you understand by consistency of given data? How do you check it?
2. (a) If a report gives the following frequencies as actually observed, show that there must be a misprint or mistake of some sort, and that possibly the misprint consists in the dropping of 1 before 85 given as the frequency $(BC)$:
$N = 1000$, $(A) = 510$, $(B) = 490$, $(C) = 427$, $(AB) = 189$, $(AC) = 140$, $(BC) = 85$.
(b) A student reported the results of a survey in the following manner, in terms of the usual notations:
$N = 1000$, $(A) = 525$, $(B) = 312$, $(C) = 470$, $(AB) = 42$, $(BC) = 86$, $(AC) = 147$, and $(ABC) = 25$.
Examine the consistency of the above data.
(c) Examine the consistency and adequacy of the following data to determine the frequencies of the remaining positive and ultimate classes.
$N = 10{,}000$, $(A) = 1087$, $(B) = 486$, $(C) = 877$,
$(CA\beta) = 281$, $(C\alpha\beta) = 86$, $(\gamma AB) = 78$, $(ABC) = 57$
3. Given that $(A) = (B) = (C) = \frac{1}{2}N$ and 80 per cent of A's are B's, 75 per cent of A's are C's, find the limits to the percentage of B's that are C's.
Ans. 55% and 95%.
4. If $(A) = 50$, $(B) = 60$, $(C) = 50$, $(A\beta) = 5$, $(A\gamma) = 20$, $N = 100$, find the greatest and the least possible values of $(BC)$ so that the data may be consistent.
Ans. $25 \le (BC) \le 45$
5. If $1{,}000 = N = \frac{5}{3}(A) = 2(B) = 2{\cdot}5\,(C) = 5(AB)$, and $(AC) = (BC)$, what should be the minimum value of $(BC)$?
Ans. 150
6. Given that $(A) = (B) = (C) = \frac{1}{2}N = 50$ and $(AB) = 30$, $(AC) = 25$, find the limits within which $(BC)$ will lie.
7. In a university examination 65% of the candidates passed in English, 90% passed in the second language and 60% passed in the optional subjects. Find how many at least should have passed the whole examination.
Ans. 15%. Hint. Use Example 11·3.
8. A market investigator returns the following data. Of 1,000 people consulted 811 liked chocolates, 752 liked toffees and 418 liked boiled sweets, 570 liked both chocolates and toffees, 356 liked chocolates and boiled sweets and 348 liked toffees and boiled sweets, 297 liked all three. Show that this information as it stands must be incorrect.
9. (a) In a school, 50 per cent of the students are boys, 60 per cent are Hindus and 50 per cent are 10 years of age or over. Twenty per cent of the boys are not Hindus and 40 per cent of the boys are under 10. What conclusions can you draw in regard to percentage of Hindu students of 10 years or over?
(b) In a college, 50 per cent of the students are boys, 60 per cent of the students are above 18 years and 80 per cent receive scholarships. 35 per cent of the students are boys above 18 years of age, 45 per cent are boys receiving scholarships, and 42 per cent are above 18 years and receive scholarships. Determine the limits to the proportion of boys above 18 years who are in receipt of scholarships.
Ans. Between 30 and 32.
10. The following summary appears in a report on a survey covering 1,000 fields. Scrutinise the numbers and point out if there is any mistake or misprint in them.
$$1 - \frac{(AB)}{(A)} = 1 - \frac{(\alpha B)}{(\alpha)}$$
Attributes    A        α        Total
B             (AB)     (αB)     (B)
β             (Aβ)     (αβ)     (β)
$$\frac{(AB)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N} \qquad \ldots(11{\cdot}11a)$$
"[f the attributes A and,B are independent, the proportion of AB' s in the
population is equal t() the product of the proport!o14S of A's qnd B's in the
population."
We may obtain a third criterion of indepen4{mce in terms of second order
class frequencies, as follows.
("AB) ( A.) - ,<A) (B)
. • al-' N'
~ ~!&OO.
N -
(gl@(Using11.11)
N • N
(AB) . (aM
(AB)
=(A~).
~
(aB)} ... (11·12)
(aB) = (ap)
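Aliter. (11·12) may also be obtained from (11·9) and (11·9a) as explained below:
$$\begin{aligned}
&{} + 2N\left[(AB) - \frac{(A)(B)}{N}\right] \\
&= [\{(AB) - (\alpha\beta)\} + \{(A\beta) - (\alpha B)\}]\,[\{(AB) - (\alpha\beta)\} - \{(A\beta) - (\alpha B)\}] + 2[N(AB) - (A)(B)] \\
&= [(AB) - (\alpha\beta)]^2 - [(A\beta) - (\alpha B)]^2 + 2[(AB)\{(AB) + (A\beta) + (\alpha B) + (\alpha\beta)\} - \{(AB) + (A\beta)\}\{(AB) + (\alpha B)\}] \\
&= [(AB)^2 + (\alpha\beta)^2 - 2(AB)(\alpha\beta)] - [(A\beta)^2 + (\alpha B)^2 - 2(A\beta)(\alpha B)] + 2[(AB)(\alpha\beta) - (A\beta)(\alpha B)] \quad \text{(on simplification)} \\
&= (AB)^2 + (\alpha\beta)^2 - (A\beta)^2 - (\alpha B)^2 = \text{R.H.S.}
\end{aligned}$$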
$$Y = \left\{1 - \sqrt{\frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}\right\} \Bigg/ \left\{1 + \sqrt{\frac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}\right\} \qquad \ldots(11{\cdot}19)$$
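Remarks 1. Obviously $Q = 0 \Rightarrow Y = \dfrac{1 - 1}{1 + 1} = 0$; $Q = -1 \Rightarrow Y = -1$ and $Q = 1 \Rightarrow Y = 1$, and conversely.
2. If we let $\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)} = k$, so that
$$Y = \frac{1 - \sqrt{k}}{1 + \sqrt{k}} \;\Rightarrow\; Y^2 = \frac{1 + k - 2\sqrt{k}}{1 + k + 2\sqrt{k}}$$
$$1 + Y^2 = \frac{2(1 + k)}{1 + k + 2\sqrt{k}} = \frac{2(1 + k)}{(1 + \sqrt{k})^2}$$
$$\frac{2Y}{1 + Y^2} = \frac{2(1 - \sqrt{k})}{1 + \sqrt{k}} \times \frac{(1 + \sqrt{k})^2}{2(1 + k)} = \frac{1 - k}{1 + k}$$
$$= \frac{1 - \dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}{1 + \dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}} = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
$$\therefore\quad Q = \frac{2Y}{1 + Y^2} \qquad \ldots(11{\cdot}20)$$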
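The relation between the two coefficients is easy to verify numerically. The sketch below computes Q and Y from the four second-order frequencies and checks the identity; the function names are hypothetical and the frequencies are chosen only as an illustration.

```python
import math


def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of association Q from the four ultimate frequencies."""
    cross_1 = ab * alpha_beta
    cross_2 = a_beta * alpha_b
    return (cross_1 - cross_2) / (cross_1 + cross_2)


def yule_y(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of colligation Y, written without the ratio k to avoid dividing by zero."""
    root_1 = math.sqrt(ab * alpha_beta)
    root_2 = math.sqrt(a_beta * alpha_b)
    return (root_1 - root_2) / (root_1 + root_2)


# Illustrative frequencies (AB), (A beta), (alpha B), (alpha beta).
cells = (50, 79, 89, 782)
q = yule_q(*cells)
y = yule_y(*cells)
print(round(q, 4), round(y, 4), round(2 * y / (1 + y * y), 4))
```

Multiplying numerator and denominator of Y by the square root of (AB)(αβ) gives the form used in `yule_y`, which is algebraically the same as the definition but stays finite when (AB)(αβ) is zero.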
Example 11·11. Find if A and B are independent, positively associated or negatively associated, in each of the following cases:
(i) $N = 1000$, $(A) = 470$, $(B) = 620$, and $(AB) = 320$.
(ii) $(A) = 490$, $(AB) = 294$, $(\alpha) = 510$, and $(\alpha B) = 380$.
(iii) $(AB) = 256$, $(\alpha B) = 768$, $(A\beta) = 48$, and $(\alpha\beta) = 144$.
Solution. (i) $\delta = (AB) - \dfrac{(A)(B)}{N}$
$$= \frac{256}{N}\,[144 - 48 \times 3] = 0$$
$\Rightarrow$ A and B are independent.
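All three cases can be settled by comparing (AB) with its expected value (A)(B)/N under independence. A minimal sketch, assuming the hypothetical function name `association` and integer frequencies so that the comparison is exact:

```python
def association(N, A, B, AB):
    """Classify the association of A and B from N, (A), (B) and (AB).

    The sign of N*(AB) - (A)*(B) equals the sign of
    delta = (AB) - (A)(B)/N, and avoids floating-point division.
    """
    cross = N * AB - A * B
    if cross > 0:
        return "positively associated"
    if cross < 0:
        return "negatively associated"
    return "independent"


# Case (i): all four quantities are given directly.
print(association(N=1000, A=470, B=620, AB=320))

# Case (ii): N = (A) + (alpha) and (B) = (AB) + (alpha B).
print(association(N=490 + 510, A=490, B=294 + 380, AB=294))

# Case (iii): build N, (A), (B) from the four ultimate frequencies.
ab, alpha_b, a_beta, alpha_beta = 256, 768, 48, 144
print(association(N=ab + alpha_b + a_beta + alpha_beta,
                  A=ab + a_beta, B=ab + alpha_b, AB=ab))
```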
Example 11·12. Investigate the association between darkness of eye-colour in father and son from the following data:
Fathers with dark eyes and sons with dark eyes: 50
Fathers with dark eyes and sons with not dark eyes: 79
Fathers with not dark eyes and sons with dark eyes: 89
Fathers with not dark eyes and sons with not dark eyes: 782
Also tabulate for comparison the frequencies that would have been observed had there been no heredity.
Solution. Let A: Dark eye-colour of father and B: Dark eye-colour of son.
Then we are given $(AB) = 50$, $(A\beta) = 79$, $(\alpha B) = 89$, $(\alpha\beta) = 782$.
Discuss if the colour of son's eyes is associated with that of father.
Ans. Yes. Positively associated, Q = 0·66.
(b) The following table shows the result of inoculation against cholera.
                  Not attacked    Attacked
Inoculated        431             5
Not-inoculated    291             9
Examine the effect of inoculation in controlling susceptibility to cholera.
Ans. Inoculation is effective in controlling cholera.
4. (a) Find the association between proficiency in English and in Hindi among candidates at a certain test if 245 of them passed in Hindi, 285 failed in Hindi, 190 failed in Hindi but passed in English and 147 passed in both.
(b) The male population of a state is 250 lakhs. The number of literate males is 20 lakhs and total number of male criminals is 26 thousand. The number of literate male criminals is 2 thousand. Do you find any association between literacy and criminality?
Ans. Literacy and criminality are negatively associated, since $(AB) = 2{,}000$ is less than $(A)(B)/N = 2{,}080$.
(c) From the following particulars find whether blindness and baldness are associated:
Total population 1,62,64,000
Number of baldheaded 24,441
Number of blind 7,26~
Number of baldheaded blind 221
5. In a certain investigation carried on with regard to 500 graduates and 1500 non-graduates, it was found that the number of employed graduates was 450 while the number of unemployed non-graduates was 300. In the second investigation 5000 cases were examined. The number of non-graduates was 3000 and the number of employed non-graduates was 2500. The number of graduates who were found to be employed was 1600.
Calculate the coefficient of association between graduation and employment in both the investigations.
Can any definite conclusion be drawn from the coefficients?
Ans. Q (First Investigation) = 0·38, Q (Second Investigation) = −0·11
6. (a) Three aptitude tests A, B, C were given to 200 apprentice trainees. From amongst them 80 passed test A, 78 passed test B and 96 passed the third test. While 20 passed all the three tests, 42 failed all the three, 18 passed A and B but failed C and 38 failed A and B but passed the third. Determine (i) how many trainees passed at least two of the three tests and (ii) whether the performances in tests A and B are associated. Ans. (i) 76, (ii) Q = 0·3
(b) In a survey of a population of 12000, information is gathered regarding three attributes A, B and C. In the usual notations,
$(A) = 980$, $(AB) = 450$, $(ABC) = 130$,
$(B) = 1190$, $(AC) = 280$, $(C) = 600$ and $(BC) = 250$.
Find: (i) $(\alpha\beta\gamma)$, (ii) $Q_{AB}$ = Coefficient of Association between A and B.
Comment on your findings.
7. A group of 1000 fathers was studied and it was found that 12·9% had dark eyes. Among them the ratio of those having sons with dark eyes to those having sons with not dark eyes was 1 : 1·58. The number of cases where fathers and sons both did not have dark eyes was 782. Calculate coefficient of association between darkness of eye colour in father and son. Give the frequencies that would have been observed had there been completely no heredity.
Hint. $(AB) = 50$, $(A\beta) = 79$, $(\alpha B) = 89$ and $(\alpha\beta) = 782$.
8. A census revealed the following figures of the blind and the insane in two age-groups in a certain population:
                                    Age-Group        Age-Group
                                    15-25 years      over 25 years
Total population                    2,70,000         1,60,200
Number of blind                     1,000            2,000
Number of insane                    6,000            1,000
Number of insane among the blind    19               9
(i) Obtain a measure of association between blindness and insanity for each age-group.
(ii) Which group shows more association or dis-association (if any)?
9. Show that if $(AB)_1$, $(\alpha B)_1$, $(A\beta)_1$, $(\alpha\beta)_1$ and $(AB)_2$, $(\alpha B)_2$, $(A\beta)_2$, $(\alpha\beta)_2$ be two aggregates corresponding to the same values of $(A)$, $(B)$, $(\alpha)$ and $(\beta)$, then
$$(AB)_1 - (AB)_2 = (\alpha B)_2 - (\alpha B)_1 = (A\beta)_2 - (A\beta)_1 = (\alpha\beta)_1 - (\alpha\beta)_2$$