Neyman 1934
Wiley and Royal Statistical Society are collaborating with JSTOR to digitize, preserve and extend access to
Journal of the Royal Statistical Society.
CONTENTS.
I. Introductory
II. Mathematical Theories underlying the Representative Method
   1. The theory of probabilities a posteriori and the work of R. A. Fisher
   2. The choice of the estimates
III. Different Aspects of the Representative Method
   1. The method of random sampling
   2. The method of purposive selection
IV. Comparison of the two Methods of Sampling
   1. The estimates of Bowley and of Gini and Galvani
   2. The hypotheses underlying both methods and the conditions of practical work
   3. Numerical illustration
V. Conclusions
VI. Appendix
I. INTRODUCTORY.
the Italian statisticians C. Gini and L. Galvani were faced with the
problem of the choice between the two principles of sampling, when
they undertook to select a sample from the data of the Italian General
Census of 1921. All the data were already worked out and
published and the original sheets containing information about
individual families were to be destroyed. In order to make possible
any further research, the need for which might be felt in the future,
it was decided to keep for a longer time a fairly large sample of the
census data, amounting to about 15 per cent. of the same.
The chief purpose of the work is stated by the authors as follows: *
"To obtain a sample which would be representative of the whole
country with respect to its chief demographic, social, economic and
geographic characteristics."
At the beginning of the work the original data were already
sorted by provinces, districts (circondari) and communes, and the
authors state that the easiest method of obtaining the sample was
to select data in accordance with the division of the country in
administrative units. As the purpose of the sample was among
others to allow local comparisons to be made in the future, the authors
expressed the view that the selection of the sample, taking administrative
units as elements, was the only possible one.
For various reasons, which, however, the authors do not describe,
it was impossible to take as an element of sampling an administrative
unit smaller than a commune. They did not, however, think it
satisfactory to use communes as units of selection because (p. 3 loc.
cit.) their large number (8,354) would make it difficult to apply the
method of purposive selection. So finally the authors fixed districts
(circondari) to serve as units of sampling. The total number of the
districts in which Italy is divided amounts to 214. The number of
the districts to be included in the sample was 29, that is to say,
about 13.5 per cent. of the total number of districts.
Having thus fixed the units of selection, the authors proceed to
the choice of the principle of sampling: should it be random sampling
or purposive selection? To solve this dilemma they calculate the
probability, $\pi$, that the mean income of persons included in a random
sample of k = 29 districts drawn from their universe of K = 214
districts will differ from its universe-value by not more than 1.5 per
cent. The approximate value of this probability being very small,
about $\pi$ = .08, the authors decided that the principle of sampling
to choose was that of purposive selection.†
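A rough sketch of the kind of calculation described above may help fix ideas. It is not the computation of Gini and Galvani, whose details are not given here: it assumes a simple random sample of districts drawn without replacement, a normal approximation for the sample mean, and a hypothetical coefficient of variation `cv` of the district values. The function name and the trial values of `cv` are illustrative only.

```python
import math

def prob_within(rel_tol, cv, k, K):
    """Approximate probability that the mean of a simple random sample of k
    units, drawn without replacement from K units, differs from the population
    mean by at most rel_tol times that mean.

    cv is a HYPOTHETICAL coefficient of variation of the unit values
    (standard deviation divided by mean); it is not given in the source.
    """
    # Relative standard error of the sample mean, with the finite
    # population correction (K - k) / (K - 1).
    rel_se = cv * math.sqrt((K - k) / (k * (K - 1)))
    z = rel_tol / rel_se
    # P(|Z| <= z) for a standard normal variate Z.
    return math.erf(z / math.sqrt(2))

# k = 29 districts out of K = 214, tolerance of 1.5 per cent.
for cv in (0.2, 0.5, 1.0):
    print(cv, round(prob_within(0.015, cv, 29, 214), 3))
```

Under these assumptions the probability falls quickly as the districts become more variable, which is the sense in which so small a random sample of large units can hardly be expected to reproduce the universe-value closely.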
The quotation from the Report of the Commission of the Inter-
* Annali di Statistica, Ser. VI. Vol. IV. p. 1. 1929.
† It may be noted, however, that the choice of the principle seems to have
been predetermined by the previous choice of the unit of sampling.
II. MATHEMATICAL THEORIES UNDERLYING THE REPRESENTATIVE METHOD.
1. The Theory of Probabilities a posteriori and the work of R. A. Fisher.
Obviously the problem of the representative method is par
excellence the problem of statistical estimation. We are interested in
characteristics of a certain population, say $\pi$, which it is either
impossible or at least very difficult to study in detail, and we try to
estimate these characteristics basing our judgment on the sample.
Until recently it has been usually assumed that the accurate solution
of such a problem requires the knowledge of probabilities a priori
attached to different admissible hypotheses concerning the values
of the collective characters * of the population $\pi$. Accordingly, the
memoir of A. L. Bowley may be regarded as divided into two parts.
Each question is treated from two points of view: (a) The population
$\pi$ is supposed to be known; the question to be answered is: what
could be the samples from this population? (b) We know the sample
and are concerned with the probabilities a posteriori to be ascribed to
different hypotheses concerning the population.
In sections which I classify as (a) we are on the safe ground of
classical theory of probability, reducible to the theory of combinations.†
In sections (b), however, we are met with conclusions based,
* This is a translation of the terminology used by Bruns and Orzecki. Any
characteristic of the population or sample is a collective character.
† In this respect I should like to call attention to the remarkable paper of
the late L. March published in Metron, Vol. VI. There is practically no question
of probabilities and many classical theorems of this theory are reduced to the
theory of combinations.
Here I should like to quote the words of Laplace, that the theory
of probability is in fact but the good common sense which is reduced
to formulae. It is able to express in exact terms what the sound
minds feel by a sort of instinct, sometimes without being able to give
good reasons for their beliefs.
2. The Choice of the Estimates.
However, it may be observed that there remains the question
of the choice of the collective characters of the samples which would
be most suitable for the purpose of the construction of confidence
intervals and thus for the purposes of estimation. The requirements
with regard to these characters in practical statistics could be
formulated as follows:
1. They must follow a frequency distribution which is already
tabled or may be easily calculated.
2. The resulting confidence intervals should be as narrow as
possible.
The first of these requirements is somewhat opportunistic, but
I believe as far as the practical work is concerned this condition
should be borne in mind.*
Collective characters of the samples which satisfy both conditions
quoted above and which may be used in the most common cases, are
supplied by the elegant method of A. A. Markoff,† used by him when
dealing with the theory of least squares. The method is not a new
one, but as it was published in Russian it is not generally known.*
This method, combined with some results of R. A. Fisher and of
E. S. Pearson concerning the extension of "Student's" distribution
allows us to build up the theory of different aspects of representative
method to the last details.
Suppose $\theta$ is a certain collective character of a population and
$$x_1, x_2, \ldots, x_n$$
. . . (4)
$$\pi_1, \pi_2, \ldots \qquad (6)$$
from which we may draw random samples. Let
where the a's are some known coefficients. Markoff gives now the
method of finding linear functions of the x's determined by samples
from all the populations, namely,
$$\theta'_j = \lambda_{11}x_{11} + \lambda_{12}x_{12} + \ldots + \lambda_{1n_1}x_{1n_1} + \ldots$$
such, that whatever the value of unknown $\theta_j$:
(a) Mean $\theta'_j$ in repeated samples = $\theta_j$.
(b) Standard error of $\theta'_j$ is less than that of any other linear
function, satisfying (a).
The details concerning this method are given in Note II of the
Appendix.
It is worth considering the statistical meaning of the two conditions
(a), (b), when combined with the fact that if the number of
observations is large, the distribution of $\theta'$ in repeated sampling tends
to be, and for practical purposes is actually normal. The condition
(a) means that the most frequent values of $\theta'$ will be those close to
$\theta$. Therefore, if $\psi$ is some linear function of the x's, which does not
satisfy the condition (a), but instead the condition,
Mean $\psi$ in repeated samples = $\theta + \Delta$, (say),
then, using $\psi$ as an estimate of $\theta$, we should commit systematic
errors, which most frequently would be near $\Delta$. Such estimates as
$\psi$ are called biased.
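The distinction between an estimate satisfying condition (a) and a biased one can be checked by brute force on a small example. The sketch below is only an illustration: the population values are hypothetical, the samples are all equally likely subsets of fixed size, and the "biased" estimate is simply a linear function whose weights do not sum to one. Exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population; theta is its mean (the collective character sought).
population = [Fraction(v) for v in (2, 3, 5, 8, 13, 21)]
n = 3
theta = sum(population) / len(population)

def expectation(estimator):
    """Mean of the estimator over all equally likely samples of size n."""
    samples = list(combinations(population, n))
    return sum(estimator(s) for s in samples) / len(samples)

def unbiased(sample):
    # Linear function with weights 1/n, which sum to one.
    return sum(sample) / len(sample)

def biased(sample):
    # Linear function with weights 1.2/n, which do not sum to one.
    return Fraction(6, 5) * sum(sample) / len(sample)

print(expectation(unbiased) - theta)   # systematic error of the first estimate
print(expectation(biased) - theta)     # systematic error Delta of the second
```

In repeated sampling the first estimate centres on the population value, while the second is displaced by a fixed amount, here one fifth of the population mean, however many samples are drawn.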
The condition (b) assures us that when using $\theta'$'s as estimates of
* Sometimes, in special problems, even this knowledge is not required.
X __ .(21)
X' X .(26)
This may be shown on the following simple example. Suppose
VOL. XCVII. PART IV. X
TABLE I.
          x      y      v     vx     vy
I.       .07    .09    100      7      9
II.      .09    .09    400     36     36
III.     .11    .12    100     11     12
IV.      .13    .12    900    117    108
Owing to the fact that the control y has only two different values,
.09 and .12, there is no question about the hypothesis H concerning
the linearity of regression, which is certainly satisfied. The regression
line passes through the points with co-ordinates (y = .09,
x = .08) and (y = .12, x = .12). Thus the coefficient of regression
g = 4/3. Assume now we have a sample from the above population,
which includes the whole of it and calculate the estimate X' of
X = .114. We shall have
X Y.= = *114
+ (X>, -x)= -*014
weighted regression equation
$$x = b_0 + b_1 y^{(1)} + b_2 y^{(2)} + \ldots + b_s y^{(s)} \qquad (28)$$
is found by minimizing the sum of squares
$$\sum_i v_i \left(x_i - b_0 - b_1 y_i^{(1)} - \ldots - b_s y_i^{(s)}\right)^2 \qquad (29)$$
X' )
E(vixs . . . . (30)
which is considered as an estimate of the unknown mean X.
Simple mathematical analysis of the situation proved (see Note
III) that this estimate is consistent when a special hypothesis, H',
about the linearity of regression of $x_i$ on $y_i$ holds good, and even
that it is the best linear estimate under an additional condition, $H_1$,
concerning the variation of the $\bar{x}_i$ in strata corresponding to different
fixed values of y and v.
The hypothesis H' consists in the assumption that the regression
of x on y is linear not only if we consider the whole population $\pi$ of
the districts, but also if we consider only districts composed of a
fixed number of individuals. It is seen that the hypothesis H' is
still more limiting than the hypothesis H.
The other condition, $H_1$, is as follows. Consider a stratum, $\pi'$,
defined by the values y = y' and v = v' and consider the districts
belonging to this stratum. Let
$$\bar{x}_1, \bar{x}_2, \ldots \qquad (31)$$
be the values of the means $\bar{x}$ corresponding to these districts. The
hypothesis, say $H_1$, under which the estimate of X proposed by Gini
and Galvani is the best linear estimate, consists in the assumption
that the standard deviation, say $\sigma'$, of the $\bar{x}_i$ corresponding to the
stratum $\pi'$ may be presented by the formula
$$\sigma' = \frac{\sigma}{\sqrt{v'}} \qquad (32)$$
FIG. 1.
estimate of the sum of the u's for the whole stratum. Summing
these estimates for all strata, we get the best estimate of the sum
of u's forthe whole population. To get an estimate of X it remains
only to divide the estimate of the sum of u's by the sum of v's,
which may be known or may be estimated by the same method.
Thus the final estimate of X, say X'', is either
X" = S(M u) . . . . .
(36)
MKo
EMi Si E(lM+&S,)) (39)
where $S_i^2$ stands for $M_i\sigma_i^2/(M_i - 1)$. We see that only the middle
term of the right-hand side depends upon the values of the m's.
The other terms remain constant whatever the system of m's, provided
their sum, $m_0$, remains unchanged. Thus the method of
diminishing the value of $\sigma^2$ consists in diminishing the middle term
of the right-hand side of (39). This has its minimum value, zero,
when the numbers $m_i$ are proportional to the products $M_iS_i$. Thus
if it is possible to estimate the variances $\sigma_i^2$ of the u's within any
given stratum, the most favourable system of $m_i$'s is not that for
which the $m_i$ are proportional to the $M_i$. Denote the three terms
of the right-hand side of (39) respectively by A, B and -C. If we
assume that the $m_i$'s are proportional to $M_i$, then we shall find that
the term B = C and the variance $\sigma^2$ is reduced to
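The effect of the choice of the $m_i$ can be illustrated numerically. The sketch below does not reproduce equation (39); it uses the standard textbook variance of a stratified estimate of a population total under sampling without replacement, and a decomposition of that variance into a constant part and a non-negative part that vanishes when the $m_i$ are proportional to $M_iS_i$. The stratum sizes `M`, the values `S` and the total sample size `m0` are hypothetical, and the allocations are left as real numbers rather than rounded to integers.

```python
import math

def variance_of_total(M, S, m):
    """Variance of the stratified estimate sum_i M_i * (sample mean of stratum i),
    with S_i^2 = M_i * sigma_i^2 / (M_i - 1) and m_i units drawn without
    replacement from the M_i units of stratum i."""
    return sum(Mi**2 * Si**2 / mi * (1 - mi / Mi) for Mi, Si, mi in zip(M, S, m))

def proportional(M, m0):
    """m_i proportional to M_i."""
    return [m0 * Mi / sum(M) for Mi in M]

def optimum(M, S, m0):
    """m_i proportional to the products M_i * S_i."""
    total = sum(Mi * Si for Mi, Si in zip(M, S))
    return [m0 * Mi * Si / total for Mi, Si in zip(M, S)]

def middle_term(M, S, m):
    """The only part of the variance depending on the m's for fixed m0;
    it is zero when m_i is proportional to M_i * S_i."""
    m0 = sum(m)
    A = sum(Mi * Si for Mi, Si in zip(M, S))
    return sum(mi * (Mi * Si / mi - A / m0) ** 2 for Mi, Si, mi in zip(M, S, m))

# Hypothetical strata.
M = [100, 400, 500]
S = [1.0, 2.0, 5.0]
m0 = 100

v_prop = variance_of_total(M, S, proportional(M, m0))
v_opt = variance_of_total(M, S, optimum(M, S, m0))
print(v_prop, v_opt)
```

With these hypothetical figures the proportional system gives a variance of about 127,800 against about 101,400 for the system proportional to $M_iS_i$; the difference is exactly the middle term evaluated at the proportional allocation.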
          y = -1.             y = 0.              y = +1.
 No.     u      v      No.    u      v      No.    u      v
  1     -17     1       4     1      3       7     20     3
  2     -18     2       5     0      2       8     18     2
  3     -19     3       6    -1      1       9     16     1
Totals  -54     6             0      6             54     6
Means   x(-1) = -9            x(0) = 0             x(1) = 9
TABLE III.
Populations. TT2. T3. IT4. All popul.
Here X' and X'' mean the estimates of X, (i) obtained by method
proposed by Gini and Galvani, and (ii) calculated from the formula
(35). $\Delta'$ = X' - X and $\Delta''$ = X'' - X represent the errors of
these estimates. It will be seen that the estimate X'' gives generally
better results. But this is not an essential point in the example,
as it is easy to construct another in which the estimate X' would be
the better. In fact, the accuracy of X'' is connected with the
variability of the u's within the strata. If in single strata corresponding
to different values of y, the variation of the u's is very large,
then the results obtained by using X'' would not be very good.
The comparison between two methods could perhaps be worked out
arithmetically if we were to consider second order strata. But this
would extend the example to the point of losing its illustrative
properties.
What is important to note is that the results obtained by using
X' get worse and worse with the departure from the linearity of
regression. This last circumstance does not affect the accuracy
of X'' at all. On the other hand, a change in the values of $\sigma_i$ would
affect X''.
V. CONCLUSIONS.
FIG. III. (Horizontal axis: AGE.)
TABLE IV.
Age Distribution of Polish Workers.
Males.
VI. APPENDIX.
Note I.
Suppose we are taking samples, $\Sigma$, from some population $\pi$.
We are interested in a certain collective character of this population,
say $\theta$. Denote by x a collective character of the sample $\Sigma$ and
suppose that we have been able to deduce its frequency distribution,
say $p(x \mid \theta)$, in repeated samples and that this is dependent on the
unknown collective character, $\theta$, of the population $\pi$.
The collective characters I am speaking about are arbitrary.
The position may be illustrated, for instance, by supposing that
the collective character $\theta$ is the proportion of a certain type of
individuals in the population $\pi$, and x the proportion of the same
type of individuals in the sample. The distribution of x is then a
binomial, depending upon the value of $\theta$.
Denote now by $\varphi(\theta)$ the unknown probability distribution a
priori of $\theta$. Suppose that the general conditions of sampling and
the properties of the collective characters $\theta$ and x define certain
values which these characters may possess. In the example I
mentioned above, $\theta$, the proportion of individuals of the given type
in the population may be any number between 0 and 1. On the
other hand, x, the proportion of these individuals in the sample,
say of n, could have values of the form k/n, k being an integer
$0 \le k \le n$.
The new form of the problem of estimation of the collective
character $\theta$ may be stated as follows: given any positive number
$\varepsilon < 1$, to associate with any possible value of x an interval
such that if we accept the rule of stating that the unknown value
of the collective character $\theta$ is contained within the limits
$$\theta_1(x') \le \theta \le \theta_2(x') \qquad (2)$$
every time the actual sampling provides us with the value x = x',
the probability of our being wrong is less than or at most equal to
$1 - \varepsilon$, and this whatever the probability law a priori, $\varphi(\theta)$.
The value of $\varepsilon$, chosen in a quite arbitrary manner, I propose
* Interesting remarks in this respect are to be found in the excellent book of
M. Ezekiel: Methods of Correlation Analysis (1930).
FIG. IV.
$$P(\theta') = \varepsilon \qquad (5)$$
In the case, however, when the variate x is not continuous, for
instance if it follows the binomial law of frequency, we should have
only
$$P(\theta') \ge \varepsilon \qquad (6)$$
For the sake of definiteness, we shall assume that the interval
$[x_1(\theta'), x_2(\theta')]$ is chosen to be the shortest possible satisfying the
condition (6). Such an interval will be called the interval of acceptance,
corresponding to the chosen value of $\varepsilon$. Suppose now we
have found the intervals of acceptance, corresponding to all possible
values of $\theta$. Now join all left-hand side boundaries of the intervals
of acceptance by a continuous line, which may be a smooth curve
or a polygon. Denote this line by LL. Another line, say ll, will
join the right-hand side boundaries of the intervals of acceptance.
The two lines LL and ll will be boundaries of a certain belt, which I
shall call the confidence belt, CB.
Consider now the points, say A with coordinates $(x, \theta)$, thus
representing combinations of all possible values of x and $\theta$. The
confidence belt CB as defined above has the fundamental property
that whatever the probability law a priori $\varphi(\theta)$, the probability of
having the point A inside of the confidence belt is equal to or larger
than the chosen value of the confidence coefficient. This probability
may be represented either by means of integrals or by means of the
sums extending over all possible positions of the point A inside CB,
according to the properties of the variates x and $\theta$, which may be
continuous or not. Thus if $P_{CB}$ is the probability under consideration,
we should have either the expression
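or
$$P_{CB} = \sum_{\theta} \varphi(\theta) \int_{x_1(\theta)}^{x_2(\theta)} p(x \mid \theta)\, dx, \qquad (9)$$
or finally
$$P_{CB} = \int \varphi(\theta) \sum_{x = x_1(\theta)}^{x_2(\theta)} p(x \mid \theta)\, d\theta, \qquad (10)$$
$$P_{CB} \ge \varepsilon. \qquad (13)$$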
The construction of the confidence belt is quite independent
of any arbitrary assumption concerning the values of $\theta$. If the
confidence belt is constructed, we may affirm that the point A will
lie inside of the belt. This statement may be erroneous, but the
probability of the error is either equal to or less than $1 - \varepsilon$, thus is
as small as desired.
The solution of the problem of estimation consists in constructing
the confidence belt and in affirming that the point A, representing
the combination of some possible value of x with some possible
value of $\theta$, will lie inside of the belt. When observation provides
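The construction just described can be sketched for the binomial example of this Note. The code below is an illustration under stated assumptions, not the construction used for the published graphs: the sample size `n`, the coefficient `eps` and the grid of values of $\theta$ are hypothetical choices, and the belt is only traced on that grid. For each $\theta$ it finds the shortest run of consecutive counts whose total probability is at least `eps` (the interval of acceptance) and then reads the belt vertically to obtain limits for $\theta$ from an observed count.

```python
from math import comb

def pmf(k, n, theta):
    """Binomial probability of k individuals of the given type in a sample of n."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def acceptance_interval(n, theta, eps):
    """Shortest run of consecutive counts lo..hi with total probability >= eps."""
    best = None
    for lo in range(n + 1):
        total = 0.0
        for hi in range(lo, n + 1):
            total += pmf(hi, n, theta)
            if total >= eps:
                if best is None or hi - lo < best[1] - best[0]:
                    best = (lo, hi)
                break
    return best

# Hypothetical choices: sample of 20, confidence coefficient .95,
# values of theta on a grid of step .01.
n, eps = 20, 0.95
grid = [i / 100 for i in range(101)]
belt = {theta: acceptance_interval(n, theta, eps) for theta in grid}

def confidence_limits(k):
    """Smallest and largest grid values of theta whose interval of acceptance contains k."""
    inside = [theta for theta, (lo, hi) in belt.items() if lo <= k <= hi]
    return min(inside), max(inside)

def coverage(theta):
    """Probability, for the given theta, that the sample point falls inside the belt."""
    lo, hi = belt[theta]
    return sum(pmf(k, n, theta) for k in range(lo, hi + 1))

print(confidence_limits(10))
print(min(coverage(theta) for theta in grid))
```

Whatever value of $\theta$ on the grid is the true one, the probability of the point falling inside the belt is at least `eps`, and no a priori law for $\theta$ enters the computation at any step.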
I (Xiais)= bs
As there are s linear equations with regard to n > s unknown coefficients
$\lambda$ we may generally make a choice among all possible
systems satisfying (9) in order to satisfy (6).
Using the known formula for the variance of a linear function
of n variables $x_i$, we may write the condition (6) in the form
$$\sigma^2_{\theta'} = \sum_i (\lambda_i^2 \sigma_i^2) + 2\sum_{i<j} (\lambda_i \lambda_j \sigma_i \sigma_j r_{ij}) \qquad (10)$$
>(Ps
)*2** * * * * (13)
q1 q2,. . . (14)
found by minimizing the sum of squares
0 .(Mit) *(17)
and then try to find such values of the coefficients $\lambda$ which would
satisfy the conditions
$$E(\theta') = \theta. \qquad (19)$$
E(xi3) =. i *(21)
where $\lambda_i$ stands for the mean value of $\lambda_{ij}$ calculated from (23), that
is,
X Mt . . . (25)
M?,
Xi Mt.. (26)
It will be noticed that this result holds good not only whatever
be the unknown $u_{ij}$, but also whatever the standard deviations $\sigma_i$.
Thus if we denote by $\bar{x}_i$ the mean value of $x_{ij}$ for any stratum, then the
function
6' E (M) . (27)
i=l1
The estimate $\theta'$ is, of course, the one which could be suggested
on purely intuitive grounds. The advantage of using Markoff's
method consists (i) in avoiding biased estimates, which may be
sometimes used when their choice is based on intuition only,
and (ii) in finding the best linear estimates. It would be rather
difficult to fulfil this last condition on intuitive grounds only.
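The two properties claimed for the estimate can be verified by complete enumeration on a toy stratified population. The sketch below assumes the estimate has the familiar stratified form, the sum of $M_i\bar{x}_i$ divided by the total number of individuals; the strata, their values and the numbers drawn are hypothetical. It compares this estimate with another linear unbiased estimate that uses only the first unit drawn from each stratum, over all equally likely ordered samples.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical strata of values u_ij and numbers m_i drawn from each.
strata = [[1, 2, 6], [10, 14, 15, 21]]
m = [2, 2]
M = [len(s) for s in strata]
M0 = sum(M)
theta = Fraction(sum(sum(s) for s in strata), M0)   # population mean

# Every ordered sample without replacement, drawn independently in each stratum.
samples = list(product(*(permutations(s, mi) for s, mi in zip(strata, m))))

def stratified_mean(sample):
    """Sum of M_i * (sample mean of stratum i) divided by M0."""
    return sum(Fraction(Mi, M0) * Fraction(sum(part), len(part))
               for Mi, part in zip(M, sample))

def first_unit_only(sample):
    """Another linear unbiased estimate: only the first unit drawn in each stratum."""
    return sum(Fraction(Mi, M0) * part[0] for Mi, part in zip(M, sample))

def mean_and_variance(estimator):
    """Exact mean and variance of the estimator over all equally likely samples."""
    values = [estimator(s) for s in samples]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

print(mean_and_variance(stratified_mean))
print(mean_and_variance(first_unit_only))
```

Both estimates are unbiased, but the stratified mean has the smaller variance; a choice made on intuition alone could satisfy condition (a) and still fail condition (b).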
Note III. The consistency and the efficiency of the estimate of C. Gini
and L. Galvani.
Consider a population $\pi$ of districts divided into second order
strata, $\pi_{yv}$, according to the values of the control y and the number
v of individuals in the districts. Thus any district in the stratum
$\pi_{yv}$ contains the same number of individuals v and the value of the
control corresponding to each district is also the same y. Denote
by $M_{yv}$ and $m_{yv}$ the total number of districts contained in $\pi_{yv}$ and the
number of them to be included in the sample. The letters $u_{yvi}$ and
$x_{yvi}$ will denote the values of the character sought, x, associated
respectively with the i-th district of the stratum $\pi_{yv}$ and with the
i-th district out of the $m_{yv}$ of them, which have been selected from
this stratum. The letters $\bar{u}_{yv}$ and $\bar{x}_{yv}$ will denote the means of $u_{yvi}$
and $x_{yvi}$ corresponding to the stratum $\pi_{yv}$ and to the partial sample
of districts, drawn from this stratum. The standard deviation of
$u_{yvi}$ will be denoted by $\sigma_{yv}$. Finally, X and Y will denote the
weighted means of the character sought x and of the control, calculated
for the whole population $\pi$. The sample weighted means will
be denoted by $X_\Sigma$ and $Y_\Sigma$. Denote further by
$$W = \sum_y \sum_v (M_{yv} v) \qquad (1)$$
We shall have
X - Y(Myvuy * .
-(2)
yv
I
Y l2'(MYvy) .(3)
_ yV
- (4)
E(MYVV)
y v
YE(rnyvvy)
E(Xyn) Y vgu X_
E(X~) V
YY,( ) -
Xe (say). . . . (7)
y V
Now we wish to have
X= X,.(8)
whenever (6) is satisfied.
The equations (6) and (8) may be written in the following form
YM2YVv(y Y) 0
O .(9) .
y v
E-ny,,v(uyv -X) = 0, .(10)
yv
(myv) = w .(12)
y v
The values of the $\lambda$'s must be so chosen that this last expression
should be identically equal to X whatever the unknown value of A.
Thus we should have, say
fa-(Pv) 1.(20)
yv i
= ?
Y- (YYG(Xyvz)) (21)
U
y v
a20, >2
E{ G2 v E(X2yvi) - 2_ LY(ytXY'Y))}J . (22)
2 ,i
yvf 1 - y}=o
P :2
- (Xy j) )YV)-oX
y (24)
Uxvi myv
ymv (O + 3Y) MY -
m (26)
myv yv
pV _ qxyv(Myv1) (30)
92 Y- (say). (33)
672
S2YV -, (34)
which will lead to minimizing (32) when getting the best linear
estimate of X. I think it would be exceedingly difficult to find
instances in social, vital or economic statistics in which the hypothesis
$H_1$ would be true. However, it may be true in some engineering
problems.
I want to emphasize that the general problem of estimation of
any given collective character $\theta$ of a population $\pi$ must be considered
from two different points of view: (i) Given a sample from the
population, obtained in some known manner, what arithmetical
procedure will give us an unbiased and a most accurate estimate
of $\theta$? (ii) What method of sampling will give samples, allowing the
most accurate estimates? These two aspects of the problem may
be traced in any theoretical research concerning the representative
method. Often the solutions proposed are based only on intuition
and require theoretical justification. The principle of the purposive
selection method, advising selection of samples such that the weighted
sample means of the controls should be equal to their population
values, is an intuitive solution of the problem (ii). The methods
which have been proposed to estimate the unknown weighted population
mean X are the solutions of the problem of kind (i). In the first
part of the present Note I have considered the conditions under
which the intuitive solutions of the problem (i) are justified. Now
I shall proceed to consider whether, and if so under what
conditions, the solution of the problem of kind (ii) is justified.
To do so I shall assume that the hypothesis H' is satisfied and
shall consider the variance of the best linear estimate, $\theta'$, of the
unknown weighted mean X. Its expression will involve the numbers
$m_{yv}$ of districts selected for the sample from each stratum $\pi_{yv}$. It
will be possible to see what system of these numbers $m_{yv}$ would
minimize the value of the variance $\sigma^2_{\theta'}$ of $\theta'$. We shall see that under
certain, rather limiting, conditions, the system of $m_{yv}$ minimizing
$\sigma^2_{\theta'}$ will be the system for which the sample weighted mean of the
control is equal to its population value.
1 (yy(Qyvy))2
a20o -f1+ (9
=20 Qyv) E-(QYVy2)y_(QYV)- (y(Qyvy))2J
Considering this formula, we see that it will provide a small
value of $\sigma^2_{\theta'}$ if we succeed in minimizing say
I YT I I . . . . .(40)
SE(QYVY)I
y EE(M1v ) .(41)
C Myv-1 yv .. (43)
TABLE I.
Population $\pi_1$.
y= -1. y= +1.
v=1. v = 6. v = 1. v = 6.
Stratum I. Stratum II. Stratum III. Stratum IV.
TABLE II.
Population $\pi_2$.
y = -1. y = +1.
=u =- 44 u= -2
U2 =0 U2 = -1 U2 =+2 U2 =+1
- -=U3= +2 U3 +4
ib =-1~~~f =
=- I = +1 uf =: +1
a2 =Q'1 =1 a2 = Q1 = 6 u2 = Q'1 = I a2 Q 1= 6
TABLE III.
Population $\pi_3$.
y=-1. y=+1.
v=1. v = 6. v=1. v = 6.
Stratum I. Stratum II. Stratum III. Stratum IV.
U =-1 U =-3 U +1 I =3
0r2 = 1 2= 6 U2 1 u2 = 6
have been misled by it. I can only say I have read it at the time
it appeared and since, and I read Dr. Neyman's elucidation of it
yesterday with great care. I am referring to Dr. Neyman's confidence
limits. I am not at all sure that the "confidence" is not a
"confidence trick." Put in a simple form I think the method is as
follows: Given that in a sample of 1,000 taken at random, there
are 1 in 10 with the defined quality, and given that the population
from which the sample was drawn contained any proportion between
120 and 80 per thousand, then the chance of such an occurrence is
less than one in twenty (approx.). Actual figures, of course, do not
matter. That margin between 120 and 80 per thousand in the
assumed population is shown on the vertical of the confidence belt
in the very illuminating graphs which Dr. Neyman has given. Does
that really take us any further? Do we know more than was known
to Todhunter? Does it take us beyond Karl Pearson and Edgeworth?
Does it really lead us towards what we need, the chance that in the
universe which we are sampling the proportion is within these
certain limits? I think it does not. I think we are in the position
of knowing that either an improbable event has occurred or the
proportion in the population is within the limits. To balance these
things we must make an estimate and form a judgment as to the
likelihood of the proportion in the universe, the very thing that is
supposed to be eliminated. I do not say that we are making crude
judgments that everything is equal throughout the possible range,
but I think we are making some assumption or we have not got any
further. I do not know that I have expressed my thoughts quite
accurately, but it is not a thing that has occurred to me for the
first time this evening; it is the difficulty I have felt since the method
was first propounded. The statement of the theory is not convincing,
and until I am convinced I am doubtful of its validity.
I regret that in opening up that subject I have distracted attention
from Dr. Neyman's paper, but since he has made that an integral
part of his paper, I think it a proper occasion on which to make this
kind of statement.
With reference to my formula, quoted in Section IV, equation 24,
I must admit that the original passage is obscure. In Dr. Neyman's
notation and with only one control, my estimate would be *
X1 _ X-X + (rax - rayryx)ax. aaIaa.
where the a's are the weights attached to the x's in the weighted
average. The second term is zero, if there is no correlation between
the weights and the divergencies, e's, from the linear regression
equation.† In other cases, K should be regarded as attached to the
error term negatively, since the weighted average of the e is not zero,
but -K. The formula is then "consistent." I am not, however,
* The complete formula in the notation I used may be written
X = X?,+ {ux. RxlRll . oaIi/a}
where Rx= raxrau'rav . . . r I pay . . rvPav I
know what was the state of affairs in a general population, and ask
what was one likely to get in particular cases? One might be
convinced, for instance, that pennies such as were provided by the
Mint were fairly symmetrical, and on the basis of that it might be
said that the theoretical probability of heads was so and so, say
50 per cent. It might then be said, supposing a coin was tossed 100
times, what should one expect? Should there be surprise if there
were only 30 heads, or if there were 90 heads, or should one expect to
get about 50? Without any mathematical technique it was perhaps
sufficient there to say that there was an a priori probability of
one-half.
But there was the converse problem. A coin had been tossed
100 times and fallen heads 100 times. What kind of a universe of
coins had it come from? Had it come from that kind of universe
of coin in which the side with the head was more likely to show up
than the side with the tail?
The classical theory provided us with a method which, when
limited in its application to the field for which it was intended, was
perfectly legitimate. According to this theory, if we knew the
a priori probabilities, that the penny was an Epsom Downs penny,
or that it was an ordinary Mint penny, equally likely to fall heads or
tails, or if we knew that it was a penny three times more likely to
fall heads than tails, then if 70 heads had been observed in 100
trials, we could say what were the respective probabilities that the
penny was an ordinary penny or an Epsom Downs penny.
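The classical calculation referred to here can be written out in a few lines once the a priori probabilities are supposed known. The sketch below is an illustration only: it considers just two kinds of penny, an ordinary one (probability of heads one-half) and one three times more likely to fall heads than tails (probability three-quarters), and it assumes, purely for the example, that the two kinds are equally probable a priori. The function and variable names are hypothetical.

```python
from math import comb

def posterior(priors, heads_probs, heads, trials):
    """Probabilities a posteriori of each kind of penny, given the observed
    number of heads, by the classical theorem of inverse probability."""
    likelihoods = [comb(trials, heads) * p**heads * (1 - p)**(trials - heads)
                   for p in heads_probs]
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# ASSUMED a priori probabilities (equal); the whole difficulty discussed in the
# text is that in practice these are not known.
priors = [0.5, 0.5]
heads_probs = [0.5, 0.75]   # ordinary penny, penny three times likelier heads

post = posterior(priors, heads_probs, heads=70, trials=100)
print(post)
```

With 70 heads in 100 trials and equal a priori probabilities almost all the probability a posteriori goes to the second kind of penny; a different a priori law would give a different answer, which is precisely the criticism reported in the next paragraph.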
The criticisms which had been made of the so-called theory of
inverse probability and of Bernoulli's theorem had always rested
not on the accuracy of the theory itself, but on the correctness of its
application, because in most cases these things were not known
a priori at all.
Given the actual probability in the universe, it was possible to
make probability statements about the sort of thing one was likely
to get in a sample. These probability statements usually provided a
measure for the probability that a certain inequality should be true.
Referring to Equation No. 4 in Dr. Neyman's Appendix I, in that
equation x was something that belonged to the observed sample, and
it was imagined there that we knew for the moment the particular
property of the universe.
His own criticism of that particular equation, and of the whole
structure placed thereon, was that it was nothing new; it was not a
departure from the various earlier attempts that had been made.
What was actually done was this: A particular $\theta$ was chosen; an
inequality was written down, and more values of $\theta$ were chosen.
In each of these inequalities, $\varepsilon$, which occurred on the right-hand
side of the equation, occurred in the definition of inequality on the
left side, and had been left out of the equation by Dr. Neyman.
When all the points which arose from all the inequalities which could
be got by considering all permissible values of $\theta$ had been marked, it
would be found that we were no further than the man who said,
"Let us suppose that the a priori probability in the universe is
distributed in a particular way. Let us suppose that it fulfils a
y2
and
$$\mu^2 = \frac{\sigma_y^2 (1 - r^2)}{n - 2} \left(1 + \frac{(X - \bar{x})^2}{\sigma_x^2}\right)$$
where $\bar{x}$ and $\bar{y}$ mean the sample means of the two variables, $\sigma_x^2$ and $\sigma_y^2$ the sample
variances, X the known population mean of x, Y' the estimate of the population
mean of y, $\mu^2$ the estimate of the variance of the same, a the sample
regression coefficient of y on x and r the sample correlation coefficient. As
the difference $|X - \bar{x}|$ is practically never zero, in order to attain the greatest
accuracy of the estimate Y', it is advisable to try to minimize $|X - \bar{x}|$ keeping
$\sigma_x$ as large as possible. In order to do so, it is necessary to include in the
sample individuals both with very large values of x and with very small ones.
Then the difference $|X - \bar{x}|$ will be small and $\sigma_x$ large, and $\mu^2$ will approach its
minimum value $\sigma_y^2(1 - r^2)/(n - 2)$.
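The advice in this footnote can be illustrated by computing the factor by which the variance exceeds its minimum value, namely one plus the squared difference between the population mean and the sample mean of x divided by the sample variance of x. This reading of the formula is itself a reconstruction from the stated minimum value, and the two samples and the population mean `X` below are hypothetical.

```python
def variance_factor(xs, X):
    """Factor 1 + (X - xbar)^2 / sigma_x^2 multiplying the minimum variance,
    with sigma_x^2 the sample variance of x (divisor n)."""
    n = len(xs)
    xbar = sum(xs) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    return 1 + (X - xbar) ** 2 / var_x

X = 12  # hypothetical known population mean of x

clustered = [9, 10, 11, 10, 9, 11]   # all values of x close together
spread = [1, 2, 11, 12, 22, 23]      # very small and very large values of x

print(variance_factor(clustered, X))
print(variance_factor(spread, X))
```

The sample containing both very small and very large values of x keeps the factor close to one, while the clustered sample, whose mean lies only two units from X, inflates the variance several times over.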
I did not expect that the sections of my paper dealing with the
new form of the problem of estimation would play so large a part in
the discussion. It is at least gratifying that the criticisms were so
divergent that one of the speakers could say that everything in it
is doubtful, and another that it is nothing new. I considered it
necessary to include these sections in my paper, as otherwise it would
not be complete, and it would be justifiable to ask why I am not
troubling to consider the problems which Professor Bowley termed
the inverse problems.
Detailed comments on all questions raised in the discussion on the
confidence intervals would require too much space. In fact, to clear
up the matter entirely, a separate publication is needed. As this
is in preparation, I shall limit myself only to one or two remarks,
which may clear away certain obvious misunderstandings.
It has been suggested in the discussion that I used the term
"confidence coefficient" instead of the term "fiducial probability."
This is certainly a misunderstanding. The term confidence coefficient
is not synonymous to the term probability. It means an
arbitrarily chosen value of the probability of our being right when
applying a certain rule of behaviour. The relation of the concept
of the confidence coefficient to that of probability may be compared
to the relation between the concepts of the "price" and "money"
(this, if we accept the definition of the "price" as "a certain amount
of money which has been fixed by the merchant . . ."). Perhaps
a still better comparison is provided by the terms "rate of
interest" and "money." The analogy here is less superficial than
one would expect. Banks are working at a certain rate of interest,
which is being fixed once for a longer period, and just this constancy
led to the introduction of the term "rate of interest." The validity
of probability statements in the new form of the problem of
estimation, which has been here so extensively discussed, depends on
the permanent use of a system of confidence intervals. This system
as a whole (not separate intervals) corresponds to a fixed probability
that our predictions are correct, and certainly there is a definite
advantage in having a special term to denote this value of the
probability. It would allow us, for example, to use the convenient
expressions like the following: the seed-testing station in X is
working with the confidence coefficient .95, etc.
Another important misunderstanding, which I think it useful
to clear up now, is contained in the following remarks of Professor
Bowley concerning the theory of the confidence intervals:
"(a) Does it really lead us towards what we need, the chance that,
in the universe which we are sampling, the proportion is within
these certain limits? I think it does not; (b) I think we are in the
position of knowing that either an improbable event has occurred,
or the proportion in the population is within the limits."
I have marked the two sentences with letters (a) and (b) as I
shall have to comment on them separately.
The sentence (a) contains the statement of the problem of estimation
in the form of Bayes. Simple algebra shows that the solution
of this problem must depend upon the probability law a priori.