Statistics, History of
Alain Desrosières, INSEE, Paris, France
The history of statistics can be told in several ways: as a quantitative description of the social world, or as a branch of applied
mathematics making it possible to build models in any area featuring large numbers. There are many historical connections
between these narratives. This history began in the eighteenth century, with German Statistik and English political arithmetic.
In the nineteenth century Quetelet established a relation between the probabilistic theory of the average man and the
gathering of official statistics by governments, making extensive use of the Gaussian distribution and of the relative stability of
macrosocial phenomena. In the 1890s, Francis Galton and Karl Pearson created a mathematical theory of statistical
distributions, and introduced the ideas of variance, regression, correlation, and the chi-square test. That theory was then developed by
Ronald Fisher, Jerzy Neyman, and Egon Pearson into the theory of inferential statistics, which gave a strong probabilistic foundation
to estimation, tests, maximum likelihood, and sampling theory. Simultaneously, official statistical bureaus, compiling
administrative statistics and organizing censuses and sample surveys, were also developed. These institutions, often
perceived only as suppliers of data assumed to reflect reality, are also places where this reality is instituted through
co-constructed operations of social representation, public action, and statistical measurement.
The word statistics today has several different meanings. For the public, and even for many people specializing in social studies, it designates numbers and measurements relating to the social world: population, gross national product, and unemployment, for instance. For academics in Statistics Departments, however, it designates a branch of applied mathematics making it possible to build models in any area featuring large numbers, not necessarily dealing with society.

History alone can explain this dual meaning. Statistics appeared at the beginning of the nineteenth century with the meaning of a quantified description of human-community characteristics. It brought together two previous traditions: that of German Statistik and that of English political arithmetic (Lazarsfeld, 1977).

When it originated in Germany in the seventeenth and eighteenth centuries, the Statistik of Hermann Conring (1606–81) and Gottfried Achenwall (1719–79) was a means for classifying the knowledge needed by kings. This science of the state included history, law, political science, economics, and geography, that is, a major part of what later became the subjects of social studies, but presented from the point of view of their utility for the state. These various forms of knowledge did not necessarily entail quantitative measurements: they were basically associated with the description of specific territories in all their aspects. This territorial connotation of the word statistics would persist for a long time in the nineteenth century.

Independently of the German Statistik, the English tradition of political arithmetic had developed methods for analyzing numbers and calculations, on the basis of parochial records of baptisms, marriages, and deaths. Such methods, originally developed by John Graunt (1620–74) in his work on bills of mortality, were then systematized by William Petty (1627–87). They were used, among other things, to assess the population of a kingdom and to draw up the first forms of life insurance. They constitute the origin of modern demography.

Nineteenth-century statistics were therefore a fairly informal combination of these two traditions: taxonomy and numbering. At the beginning of the twentieth century, statistics further became a mathematical method for the analysis of facts (social or not) involving large numbers and for inference on the basis of such collections of facts. This branch of mathematics is generally associated with probability theory, developed in the seventeenth and eighteenth centuries, which, in the nineteenth century, influenced many branches of both the social and natural sciences (Gigerenzer et al., 1989). The word statistics is still used in both ways today, and both of these uses are still related, of course, insofar as quantitative social studies use, in varied proportions, inference tools provided by mathematical statistics. The probability calculus, for its part, grounds the credibility of statistical measurements resulting from surveys, together with random sampling and confidence intervals.

The diversity of meanings of the word statistics, maintained until today, has been heightened by the massive development, beginning in the 1830s, of bureaus of statistics, administrative institutions distinct from universities, in charge of collecting, processing, and transmitting quantified information on the population, the economy, employment, living conditions, etc. Starting in the 1940s, these bureaus of statistics became important data suppliers for empirical social studies, then in full growth. Their history is therefore an integral part of these sciences, especially in the second half of the twentieth century, during which the mathematical methods developed by university statisticians were increasingly used by the so-called official statisticians. Consequently the history of statistics is a crossroads history connecting very different fields, which are covered in other articles of this encyclopedia: general problems of the quantification of social studies (Theodore Porter), mathematical sampling methods (Stephen E. Fienberg and J. M. Tanur), survey research (Martin Bulmer), demography (Simon Szreter), econometrics, etc. The history of these different fields was the object of much research in the 1980s
and 1990s, some examples of which are indicated in the Bibliography. Its main interest is to underscore the increasingly close connections between the so-called internal dimensions (history of the technical tools), the external ones (history of the institutions), and those related to the construction of social studies objects, born from the interaction between the three foci constituted by university research, administrations in charge of social problems, and bureaus of statistics. This co-construction of objects makes it possible to join historiographies that not long ago were distinct. Three key moments of this history will be mentioned here: Adolphe Quetelet and the average man (1830s), Karl Pearson and correlation (1890s), and the establishment of large systems for the production and processing of statistics.

Quetelet, the Average Man, and Moral Statistics

The cognitive history of statistics can be presented as that of the tension and sliding between two foci: the measurement of uncertainty (Stigler, 1986), resulting from the work of eighteenth-century astronomers and physicists, and the reduction of diversity, which would be taken up by social studies. Statistics is a way of taming chance (Hacking, 1990) in two different ways: chance and uncertainty related to protocols of observation, and chance and dispersion related to the diversity and the indetermination of the world itself. The Belgian astronomer and statistician Adolphe Quetelet (1796–1874) is the essential character in the transition between the world of uncertain measurement of the probability proponents (Carl Friedrich Gauss, Pierre-Simon de Laplace) and that of the regularities resulting from diversity, thanks to his having transferred, around 1830, the concept of average from the natural sciences to the human sciences, through the construction of a new being, the average man.

As early as the eighteenth century, specificities appeared from observations in large numbers: drawing balls out of urns, gambling, successive measurements of the position of a star, sex ratios (male and female births), or the mortality resulting from preventive smallpox inoculation, for instance. The radical innovation of this century was to connect these very different phenomena thanks to the common perspective provided by the law of large numbers formulated by Jacques Bernoulli in 1713. If draws from a constant urn containing white and black balls are repeated a large number of times, the observed share of white balls converges toward that actually contained by the urn. Considered by some as a mathematical theorem and by others as an experimental result, this law was at the crossroads of the two currents in epistemological science: one hypothetical-deductive, the other empirical-inductive.

Beginning in the 1830s, the statistical description of observations in large numbers became a regular activity of the state. Previously reserved for princes, this information, henceforth available to enlightened men, was related to the population, to births, marriages, and deaths, to suicides and crimes, to epidemics, to foreign trade, schools, jails, and hospitals. It was generally a by-product of administrative activity, not the result of special surveys. Only the population census, the showcase product of nineteenth-century statistics, was the object of regular surveys. These statistics were published in volumes with heterogeneous contents, but their very existence suggests that the characteristics of society were henceforth a matter of scientific law and no longer of judicial law, that is, of observed regularity and not of the normative decisions of political power. Quetelet was the man who orchestrated this new way of thinking about the social world. In the 1830s and 1840s, he set up administrative and social networks for the production of statistics and established, until the beginning of the twentieth century, how statistics were to be interpreted.

This interpretation is the result of the combination of two ideas developed from the law of large numbers: the generality of the normal distribution (or, in Quetelet's vocabulary, the law of possibilities) and the regularity of certain yearly statistics. As early as 1738, Abraham de Moivre, seeking to determine the convergence conditions for the law of large numbers, had formulated the mathematical expression of the future Gaussian law as the limit of a binomial distribution. Then Laplace (1749–1827) had shown that this law constituted a good representation of the distribution of measurement errors in astronomy, hence the name that Quetelet and his contemporaries also used to designate it: the law of errors (the expression normal law, under which it is known today, would not be introduced until the late nineteenth century by Karl Pearson).

Quetelet's daring intellectual masterstroke was to bring together two forms: on the one hand the law of errors in observation, and on the other, the law of distribution of certain body measurements of individuals in a population, such as the height of conscripts in a regiment. The similar Gaussian appearance of these two forms of distribution justified the invention of a new being with a promise of notable posterity in social studies: the average man. Thus, Quetelet restricted the calculation and the legitimate use of averages to cases where the distribution of the observations had a Gaussian shape, analogous to that of the distribution of the astronomical observations of a star. Reasoning on that basis, just as prior to this distribution there was a real star (the cause of the Gaussian-shaped distribution), prior to the equally Gaussian distribution of the height of conscripts there was a being of a reality comparable to the existence of the star. Quetelet's average man is also the constant cause, prior to the observed controlled variability. He is a sort of model, of which specific individuals are imperfect copies.

The second part of this cognitive construction, which is so important in the later uses of statistics in social studies, is the attention drawn by the remarkable regularity of series of statistics, such as those of marriages, suicides, or crimes. Just as series of draws from an urn reveal a regularity in the observed frequency of white balls, the regularity in the rates of suicide or crime can be interpreted as resulting from series of draws from a population, some of the members of which are affected with a propensity to suicide or crime. The average man is therefore endowed not only with physical attributes but also moral ones, such as these propensities. Here again, just as the average heights of conscripts are stable, whereas individual heights are dispersed, crime or suicide rates are just as stable, whereas these acts are eminently individual and unpredictable. This form of statistics, then called moral, signaled the beginning of sociology, a science of society radically distinct from a science of the
individual, such as psychology (Porter, 1986). Quetelet's reasoning would ground the one developed by Durkheim in Suicide: A Study in Sociology (1897).

This way of using statistical regularity to back the idea of the existence of a society ruled by specific laws, distinct from those governing individual behavior, dominated nineteenth-century, and, in part, twentieth-century social studies. Around 1900, however, another approach appeared, this one centered on two ideas: the distribution (no longer just the average) of observations, and the correlation between two or several variables, observed in individuals (no longer just in groups, such as territories).

Distribution, Correlation, and Causality

This shift of interest from the average individual to the distributions and hierarchies among individuals was connected to the rise, in late-nineteenth-century Victorian England, of a eugenicist and hereditarian current of thought inspired by Darwin (MacKenzie, 1981). Its two advocates were Francis Galton (1822–1911), a cousin of Darwin, and Karl Pearson (1857–1936). In their attempt to measure biological heredity, which was central to their political construction, they created a radically new statistical tool that made it possible to conceive partial causality. Such causality had been absent from all previous forms of thought, for which A either is or is not the cause of B, but cannot be so somewhat or incompletely. Yet Galton's research on heredity led to such a formulation: the parents' height explains the children's, but does not entirely determine it. The taller fathers are, the taller are their sons on average, but, for a father's given height, the dispersion of the sons' heights is great. This formalization of heredity led to the two related ideas of regression and correlation, later to be extensively used in social studies as symptoms of causality.

Pearson, however, greatly influenced by the antirealist philosophy of the physicist Ernst Mach, challenged the idea of causality, which according to him was metaphysical, and stuck to the one of correlation, which he described with the help of contingency tables (Pearson, 1911: Chap. 5). For him, scientific laws are only summaries, brief descriptions in mental stenography, abridged formulas, a condensation of perception routines for future use and forecasting. Such formulas are the limits of observations that never perfectly respect the strict functional laws. The correlation coefficient makes it possible to measure the strength of the connection, between zero (independence) and one (strict dependence). Thus, in this conception of science associated by Pearson with the budding field of mathematical statistics, the reality of things can only be invoked for pragmatic ends and provided that the perception routines are maintained. Similarly, causality can only be invoked insofar as it is a proven correlation, therefore predictable with a fairly high probability. Pearson's pointed formulations would constitute, in the early twentieth century, one of the foci of the epistemology of statistics applied to social studies. Others, in contrast, would seek to give new meaning to the concepts of reality and causality by defining them differently. These discussions were strongly related to the aims of the statistical work, strained between scientific knowledge and decisions.

Current mathematical statistics proceed from the works of Karl Pearson and his successors: his son Egon Pearson (1895–1980), the Polish mathematician Jerzy Neyman (1894–1981), the statistician and pioneer of agricultural experimentation Ronald Fisher (1890–1962), and finally the engineer and beer brewer William Gosset, alias Student (1876–1937). These developments were the result of an increasingly thorough integration of so-called inferential statistics into probabilistic models. The interpretation of these constructions is always stretched between two perspectives: the one of science, which aims to prove or test hypotheses, with truth as its goal, and the one of action, which aims to make the best decision, with efficiency as its goal. This tension explains a number of controversies that opposed the founders of inferential statistics in the 1930s. Indeed, the essential innovations were often directly generated within the framework of research as applied to economic issues, for instance in the cases of Gosset and Fisher.

Gosset was employed in a brewery. He developed product quality-control techniques based on a small number of samples. He needed to appraise the variances and laws of distribution of parameters calculated on observations not complying with the supposed law of large numbers. Fisher, who worked in an agricultural research center, could only carry out a limited number of controlled tests. He mitigated this limitation by artificially creating a randomness, itself controlled, for variables other than those for which he was trying to measure the effect. This randomization technique thus introduced probabilistic chance into the very heart of the experimental process. Unlike Karl Pearson, Gosset and Fisher used distinct notations to designate, on the one hand, the theoretical parameter of a probability distribution (a mean, a variance, a correlation) and on the other, the estimate of this parameter, calculated on the basis of observations so insufficient in number that it was impossible to disregard the gap between these two values, theoretical and estimated.

This new system of notation marked a decisive turning point: it enabled an inferential statistics based on probabilistic models. This form of statistics was developed in two directions. The estimation of parameters, which took into account a set of recorded data, presupposed that the model was true. The information produced by the model was combined with the data, but nothing indicated whether the model and the data were in agreement. In contrast, the hypothesis tests allowed this agreement to be tested and, if necessary, the model to be modified: this was the inventive part of inferential statistics. In wondering whether a set of events could plausibly have occurred if a model were true, one compared these events, explicitly or otherwise, to those that would have occurred if the model were true, and made a judgment about the gap between these two sets of events.

This judgment could itself be made according to two different perspectives, which were the object of vivid controversy between Fisher on the one hand, and Neyman and Egon Pearson on the other. Fisher's test was placed in a perspective of truth and science: a theoretical hypothesis was judged plausible or was rejected, after consideration of the observed data. Neyman and Pearson's test, in contrast, was aimed at decision making and action. One evaluated the respective costs of accepting a false hypothesis and of rejecting a true one,
described as errors of Type II and Type I, respectively. These two different aims, truth and economy, although supported by close probabilistic formalism, led to practically incommensurable argumentative worlds, as was shown by the dialogue of the deaf between Fisher on one side, and Neyman and Pearson on the other (Gigerenzer et al., 1989: pp. 90–109).

Official Statistics and the Construction of the State

At the same time as mathematical statistics were developing, so-called official statistics were also being developed in bureaus of statistics, for a long time on an independent course. The latter did not use the new mathematical tools until the 1930s in the United States and the 1950s in Europe, in particular when the random sample-survey method was used to study employment or household budgets. Yet in the 1840s, Quetelet had already actively pushed for such bureaus to be set up in different countries, and for their scientification with the tools of the time. In 1853, he had begun organizing meetings of the International Congress of Statistics, which led to the establishment in 1885 of the International Statistical Institute (which still exists and includes mathematicians and official statisticians). One could write the history of these bureaus as an aspect of the more general history of the construction of the state, insofar as they developed and legitimized a common language specifically combining the authority of science and that of the state (Anderson 1988; Desrosières 1998; Patriarca 1996).

More precisely, every period of the history of a state could be characterized by the list of questions socially judged to be social and consequently put on the agenda of official statistics. Three interdependent foci were thus co-constructed: representation, action, and statistics, that is, a way of describing and interpreting social phenomena (to which social studies would increasingly contribute), a method for determining state intervention and public action, and finally, a list of statistical variables and procedures aimed at measuring them.

Thus, for example, in England in the second third of the nineteenth century, poverty, mortality, and epidemic morbidity were followed closely in terms of a detailed geographical distribution (counties) by the General Register Office (GRO), set up by William Farr in 1837. England's economic liberalism and the Poor Law Amendment Act of 1834 (which led to the creation of workhouses) were consistent with this form of statistics. In the 1880s and 1890s, Galton and Pearson's hereditarian eugenics would compete with this environmentalism, which explained poverty in terms of the beginning of the century (Szreter 1996).

In all the important countries (including Great Britain) of the 1890s and 1900s, however, the work of the bureaus of statistics was linked to the fairly general development of a specific labor law and the first so-called social welfare legislation, such as Bismarck's in Germany, or that developed in the Nordic countries in the 1890s. It is significant that the application of the sample survey method (then called representative survey) was first tested in Norway in 1895, precisely in view of preparing a new law enacting general pension funds and invalidity insurance: this suggests the consistency of the political, technical, and cognitive dimensions of this co-construction.

These forms of consistency are found in the statistics systems that were extensively developed, at a different scale, after 1945. At that time, public policies were governed by a number of issues: the regulation of the macroeconomic balance as seen through the Keynesian model, the reduction of social inequalities and the struggle against unemployment thanks to social-welfare systems, the democratization of school, etc. Some people then spoke of a revolution in government statistics (Duncan and Shelton 1978), and underscored its four components, which have largely shaped the present statistics systems. National accounting, a vast construction integrating a large number of statistics from different sources, was the instrument on which the macroeconomic models resulting from the Keynesian analysis were based. Sample surveys made it possible to study a much broader range of issues and to accumulate quantitative descriptions of the social world, which were unthinkable at a time when the observation techniques were limited to censuses and monographs. Statistical coordination, an apparently strictly administrative affair, was indispensable to make consistent the observations resulting from different fields. Finally, beginning in 1960, the generalization of computer data processing radically transformed the activity of bureaus of statistics.

So, official statistics, placed at the junction between social studies, mathematics, and information on public policies, has become an important research component in the social studies. Given, however, that from the institutional standpoint it is generally placed outside, it is often hardly perceived by those who seek to draw up a panorama of these sciences. In fact, the way bureaus of statistics operate and are integrated into administrative and scientific contexts varies a lot from one country to another, so a history and a sociology of social studies cannot omit examining these institutions, which are often perceived as mere suppliers of data assumed to reflect reality, when they are actually places where this reality is instituted through co-constructed operations of social representation, public action, and statistical measurement.
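
As an illustration only, the probabilistic reasoning behind the random sample-survey method discussed in this article (draws from an urn, the Gaussian limit of the binomial distribution, confidence intervals) can be sketched in a few lines of Python. The population, its 8% rate, the sample sizes, and the function name `survey_proportion` are all hypothetical choices made for this sketch, and the interval relies on the usual normal approximation rather than on any procedure attributed to a particular bureau of statistics.

```python
import math
import random


def survey_proportion(population, sample_size, rng):
    """Estimate a population share from a simple random sample.

    `population` is a list of 0/1 values (a hypothetical "urn").
    Returns the estimated share and an approximate 95% confidence interval.
    """
    # Draw without replacement, as a survey would.
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size

    # Normal (Gaussian) approximation to the binomial distribution.
    # Assumption: the finite-population correction is ignored.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / sample_size)
    z = 1.96  # approximate 97.5th percentile of the standard normal law
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return p_hat, (low, high)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical urn: 100,000 people, 8% of whom carry the trait (coded 1).
    population = [1] * 8_000 + [0] * 92_000
    for n in (100, 1_000, 10_000):
        p_hat, (low, high) = survey_proportion(population, n, rng)
        print(f"n={n:>6}: estimate={p_hat:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

As the sample grows, the estimate settles near the share actually contained in the urn and the interval narrows roughly in proportion to 1/sqrt(n), which is the law of large numbers seen from the surveyor's side.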
