Introduction To Statistics Data Collection
Objective: The aim of the present lesson is to enable the students to understand
the meaning, definition, nature, importance and limitations of
statistics.
1.1 INTRODUCTION
At the micro level, individual firms, howsoever small or large, produce extensive
statistics on their operations. The annual reports of companies contain a variety of data
These data are often field data, collected by employing scientific survey techniques.
Unless regularly updated, such data are the product of a one-time effort and have limited
use beyond the situation that may have called for their collection. A student knows
physics, and others. It is a discipline, which scientifically deals with data, and is often
described as the science of data. In dealing with statistics as data, statistics has
In the beginning, it may be noted that the word ‘statistics’ is used rather curiously in
two senses: plural and singular. In the plural sense, it refers to a set of figures or data. In
the singular sense, statistics refers to the whole body of tools that are used to collect
data, organise and interpret them and, finally, to draw conclusions from them. It should
be noted that both the aspects of statistics are important if the quantitative data are to
methodology, we could not know the right procedure to extract from the data the
information they contain. Similarly, if our data are defective, inadequate, or
inaccurate, we could not reach the right conclusions even though our subject is well
developed.
A.L. Bowley has defined statistics as: (i) statistics is the science of counting, (ii)
Statistics may rightly be called the science of averages, and (iii) statistics is the science
Boddington defined it as: Statistics is the science of estimates and probabilities. Further,
W.I. King has defined Statistics in a wider context: the science of Statistics is the method
of judging collective, natural or social phenomena from the results obtained by the
Seligman stated that statistics is a science that deals with the methods of collecting,
some light on any sphere of enquiry. Spiegel defines statistics, highlighting its role in
scientific method for collecting, organising, summarising, presenting and analysing
data as well as drawing valid conclusions and making reasonable decisions on the basis
of such analysis. According to Prof. Horace Secrist, Statistics is the aggregate of facts,
systematic manner for a pre-determined purpose, and placed in relation to each other.
From the above definitions, we can highlight the major characteristics of statistics as
follows:
(i) Statistics are the aggregates of facts. It means a single figure is not statistics.
For example, national income of a country for a single year is not statistics but
(iii) Statistics must be reasonably accurate. Wrong figures, if analysed, will lead to
accurate figures.
haphazard manner, they will not be reliable and will lead to misleading
conclusions.
(vi) Lastly, Statistics should be placed in relation to each other. If one collects data
unrelated to each other, then such data will be confusing and will not lead to any
logical conclusions. Data should be comparable over time and over space.
Statistical data are the basic raw material of statistics. Data may relate to an activity of
our interest, a phenomenon, or a problem situation under study. They derive as a result
of the process of measuring, counting and/or observing. Statistical data, therefore, refer
classified. Any object, subject, phenomenon, or activity that generates data through this
process is termed as a variable. In other words, a variable is one that shows a degree of
variability when successive measurements are recorded. In statistics, data are classified
into two broad categories: quantitative data and qualitative data. This classification is
Quantitative data are those that can be quantified in definite units of measurement.
continuous variable is the one that can assume any value between any two points
on a line segment, thus representing an interval of values. The values are quite
precise and close to each other, yet distinguishably different. All characteristics
etc., represent continuous variables. Thus, the data recorded on these and similar
continuous variable assumes the finest unit of measurement. Finest in the sense
(ii) Discrete data are the values assumed by a discrete variable. A discrete variable
is the one whose outcomes are measured in fixed numbers. Such data are
essentially count data. These are derived from a process of counting, such as the
flights at an airport, and the defective items in a consignment received for sale,
characteristic is qualitative in nature when its observations are defined and noted in
terms of the presence or absence of a certain attribute in discrete numbers. These data
(i) Nominal data are the outcome of classification into two or more categories of
undergraduates, and post-graduates), all result in nominal data. Given any
(ii) Rank data, on the other hand, are the result of assigning ranks to specify order
in terms of the integers 1,2,3, ..., n. Ranks may be assigned according to the
Data sources could be seen as of two types, viz., secondary and primary. The two can
be defined as under:
(i) Secondary data: They already exist in some form: published or unpublished -
(ii) Primary data: Those data which do not already exist in any form, and thus have
to be collected for the first time from the primary source(s). By their very nature,
these data require fresh and first-time collection covering the whole population
There are two major divisions of statistics: descriptive statistics and inferential
statistics. Descriptive statistics deals with collecting, summarizing, and
simplifying data, which are otherwise quite unwieldy and voluminous. It seeks to
achieve this in a manner that meaningful conclusions can be readily drawn from the
data. Descriptive statistics may thus be seen as comprising methods of bringing out
and highlighting the latent characteristics present in a set of numerical data. It not
interpretations.
The first step in any scientific inquiry is to collect data relevant to the problem in hand.
When the inquiry relates to physical and/or biological sciences, data collection is
normally an integral part of the experiment itself. In fact, the very manner in which an
experiment is designed, determines the kind of data it would require and/or generate.
The problem of identifying the nature and the kind of the relevant data is thus
the case of physical sciences. In the case of social sciences, where the required data are
respondents, the problem is not so simply resolved. For one thing, designing the
questionnaire itself is a critical initial problem. For another, the number of respondents
to be accessed for data collection and the criteria for selecting them have their own
implications and importance for the quality of results obtained. Further, the data have
been collected, these are assembled, organized, and presented in the form of appropriate
tables to make them readable. Wherever needed, figures, diagrams, charts, and graphs
are also used for better presentation of the data. A useful tabular and graphic
presentation of data will require that the raw data be properly classified in accordance
to be carried out.
A well thought-out and sharp data classification facilitates easy description of the
measures of central tendency, dispersion, skewness, and kurtosis, which constitute the
essential scope of descriptive statistics. These form a large part of the subject matter of
any basic textbook on the subject, and thus they are being discussed in that order here
as well.
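The four descriptive measures named above can be illustrated with a minimal sketch. It assumes population (divide-by-n) moments and uses a hypothetical series of monthly sales figures; the function name describe and the data are illustrative only and do not come from the lesson.

```python
import math


def describe(values):
    """Return mean, standard deviation, skewness and kurtosis.

    Uses population (divide-by-n) central moments; other textbooks may
    use sample (divide-by-n-1) versions, which give slightly different values.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,                # central tendency
        "std_dev": math.sqrt(m2),    # dispersion
        "skewness": m3 / m2 ** 1.5,  # asymmetry (0 for a symmetric series)
        "kurtosis": m4 / m2 ** 2,    # peakedness (3 for a normal distribution)
    }


# Hypothetical monthly sales figures (illustrative data only)
sales = [12, 15, 15, 18, 20, 22, 30]
for name, value in describe(sales).items():
    print(f"{name:>9}: {value:.3f}")
```

A positive skewness here would indicate that the series has a longer tail towards the higher values.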
Inferential statistics, also known as inductive statistics, goes beyond describing a
presenting the related data. Instead, it consists of methods that are used for drawing
of knowledge about a part of that totality. The totality of observations about which an
The part of totality, which is observed for data collection and analysis to gain
The desired information about a given population of our interest may also be collected
by observing all the units comprising the population. This total coverage is called a
census. Getting the desired value for the population through a census is not always
feasible and practical for various reasons. Apart from time and money considerations
making the census operations prohibitive, observing each individual unit of the
population with reference to any data characteristic may at times involve even
destructive testing. In such cases, obviously, the only recourse available is to employ
the partial or incomplete information gathered through a sample for the purpose. This
is precisely what inferential statistics does. Thus, obtaining a particular value from the
sample information and using it for drawing an inference about the entire population
underlies the subject matter of inferential statistics. Consider a situation in which one is
required to know the average body weight of all the college students in a given
cosmopolitan city during a certain year. A quick and easy way to do this is to record the
weight of only 500 students, from out of a total strength of, say, 10000, or an unknown
total strength, take the average, and use this average based on incomplete weight data
to represent the average body weight of all the college students. In a different situation,
one may have to repeat this exercise for some future year and use the quick estimate of
average body weight for a comparison. This may be needed, for example, to decide
whether the weight of the college students has undergone a significant change over the
years compared.
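The body-weight illustration can be sketched in a few lines of code. The population below is simulated purely for illustration: the normal distribution, the mean of 60 kg and the standard deviation of 10 kg are assumptions, not figures from the lesson. Only the sizes (500 out of 10000) follow the text.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: body weights (kg) of 10,000 college students
population = [random.gauss(60, 10) for _ in range(10_000)]

# Observe only 500 students, chosen at random without replacement
sample = random.sample(population, 500)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"Population mean (normally unknown): {population_mean:.2f} kg")
print(f"Sample mean (the estimate):         {sample_mean:.2f} kg")
print(f"Estimation error:                   {sample_mean - population_mean:+.2f} kg")
```

In practice the population mean is not available for comparison; it is computed here only to show that an estimate based on a sample is close to, but not identical with, the population value.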
example, an inspection of a sample of five battery cells drawn from a given lot may
reveal that all the five cells are in perfectly good condition. This information may be
used to conclude that the entire lot is good enough to buy or not.
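How far a clean sample of five cells can be trusted may be worked out directly. The sketch below assumes a lot of 100 cells and a simple random sample drawn without replacement; the lot size and the numbers of defective cells are hypothetical, and the helper name prob_all_good is illustrative.

```python
from math import comb


def prob_all_good(lot_size, defective, sample_size=5):
    """Probability that a random sample drawn without replacement
    contains no defective item (hypergeometric probability)."""
    good = lot_size - defective
    return comb(good, sample_size) / comb(lot_size, sample_size)


# Hypothetical lot of 100 cells with varying numbers of defective cells
for defective in (0, 5, 10, 20):
    p = prob_all_good(100, defective)
    print(f"{defective:>2} defective in lot -> P(all 5 sampled cells good) = {p:.3f}")
```

Under these assumptions, even a lot with 10 defective cells passes the five-cell inspection more often than not, which is why the risk attached to such an inference needs to be evaluated.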
Since this inference is based on the examination of a sample of a limited number of cells,
it is quite possible that not all the cells in the lot are in order. It is also possible that all
the items that may be included in the sample are unsatisfactory. This may be used to
conclude that the entire lot is of unsatisfactory quality, whereas the fact may indeed be
otherwise. It may, thus, be noticed that there is always a risk of an inference about a
population being incorrect when based on the knowledge of a limited sample. The
rescue in such situations lies in evaluating such risks. For this, statistics provides the
decisions taken on the basis of sample information being incorrect. This requires an
understanding of the what, why, and how of probability and probability distributions to
equip ourselves with methods of drawing statistical inferences and estimating the
Apart from the methods comprising the scope of descriptive and inferential branches
of statistics, statistics also consists of methods of dealing with a few other issues of
specific nature. Since these methods are essentially descriptive in nature, they have
been discussed here as part of the descriptive statistics. These are mainly concerned
(i) It often becomes necessary to examine how two paired data sets are related. For
example, we may have data on the sales of a product and the expenditure
incurred on its advertisement for a specified number of years. Given that sales
the nature of relationship between the two and quantify the degree of that
(ii) Situations occur quite often when we require averaging (or totalling) of data on
example, price of cloth may be quoted per meter of length and that of wheat per
apply to such price/quantity data, special techniques needed for the purpose are
activity with a view to determining its future behaviour. For example, when
analysis of relevant sales data over time. The more complex the activity, the
more varied the data requirements. For profit maximising and future sales
planning, forecast of likely sales growth rate is crucial. This needs careful
collection and analysis of past sales data. All such concerns are taken care of
(iv) Obtaining the most likely future estimates on any aspect(s) relating to a business
or economic activity has indeed been engaging the minds of all concerned. This
regression, correlation, and time series analyses together help develop the basic
which statistical data are analysed, interpreted, and the inferences drawn for
decision-making.
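The degree of relationship between two paired data sets, such as sales and advertisement expenditure, is commonly quantified by a correlation coefficient. The sketch below computes Pearson's coefficient by hand; the choice of this particular coefficient and the yearly figures are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two paired series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly figures (illustrative data only)
advertising = [10, 12, 15, 18, 20]
sales = [100, 115, 140, 160, 185]

r = pearson_r(advertising, sales)
print(f"Correlation between advertising and sales: {r:.4f}")
```

A value close to +1 indicates a strong positive linear relationship in the data; it does not by itself establish that advertising causes sales.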
Though generic in nature and versatile in their applications, statistical methods have
come to be widely used, especially in all matters concerning business and economics.
These are also being increasingly used in biology, medicine, agriculture, psychology,
and education. The scope of application of these methods has started opening and
finds them of increasing relevance for examining the political behaviour and it is, of
course, no surprise to find even historians using statistical data, for history is essentially past
There are three major functions in any business enterprise in which the statistical
(i) The planning of operations: This may relate to either special projects or to
(ii) The setting up of standards: This may relate to the size of employment,
volume of sales, fixation of quality norms for the manufactured product, norms
achieved against the norm or target set earlier. In case the production has fallen
setting standards, and control-are separate, but in practice they are very much
interrelated.
Different authors have highlighted the importance of Statistics in business. For instance,
Croxton and Cowden give numerous uses of Statistics in business such as project
planning, budgetary planning and control, inventory planning and control, quality
control, marketing, production and personnel administration. Within these also they
have specified certain areas where Statistics is very relevant. Another author, Irving
number of areas where statistics is extremely useful. These are: customer wants and
inspection, packaging and shipping, sales and complaints, inventory and maintenance,
arising in the course of business operations are multitudinous. As such, one may do no
more than highlight some of the more important ones to emphasise the relevance of
statistics to the business world. In the sphere of production, for example, statistics can
Statistical quality control methods are used to ensure the production of quality goods.
This is achieved by identifying and rejecting defective or substandard goods. The sales targets
can be fixed on the basis of sales forecasts, which are made by using various methods of
forecasting. Analysis of sales effected against the targets set earlier would indicate the
deficiency in achievement, which may be on account of several causes: (i) targets were
too high and unrealistic, (ii) salesmen's performance has been poor, (iii) increase in
competition, (iv) poor quality of the company's product, and so on. These factors
management. Here, one is concerned with the fixation of wage rates, incentive norms
and performance appraisal of individual employees. The concept of productivity is very
awarded to the workers. Comparisons of wages and productivity are undertaken in order
Statistical methods could also be used to ascertain the efficacy of a certain product, say,
the formation of two comparable groups of asthma patients. One group is given this
new medicine for a specified period and the other one is treated with the usual
medicines. Records are maintained for the two groups for the specified period. This
record is then analysed to ascertain if there is any significant difference in the recovery
of the two groups. If the difference is really significant statistically, the new medicine
is commercially launched.
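Whether the difference in recovery between the two groups is statistically significant can be checked in several ways. The sketch below uses a two-proportion z-test, which is one common choice and is not named in the lesson; the group sizes and recovery counts are hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Returns the z statistic and the two-sided p-value, using the
    normal approximation with a pooled proportion.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # Two-sided tail area of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical trial: 50 patients per group (illustrative data only)
# New medicine: 40 recovered; usual medicines: 28 recovered
z, p_value = two_proportion_z_test(40, 50, 28, 50)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

With these made-up figures the p-value falls below the conventional 5 per cent level, so the difference would be called statistically significant. The normal approximation assumes reasonably large groups.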
(i) There are certain phenomena or concepts where statistics cannot be used. This
(ii) Statistics reveal the average behaviour, the normal or the general trend. An
situation may lead to a wrong conclusion and sometimes may be disastrous. For
example, one may be misguided when told that the average depth of a river from
one bank to the other is four feet, when there may be some points in between
where its depth is far more than four feet. On this understanding, one may enter
relevant or useful in other situations or cases. For example, secondary data (i.e.,
data originally collected by someone else) may not be useful for the other
person.
(iv) Statistics are not 100 per cent precise as is Mathematics or Accountancy.
to cover all the units or elements comprising the universe. The results may not
based on the same size of sample but different sample units may yield different
results.
in statistics, but such a relationship does not indicate 'cause and effect'
the two variables. In such cases, it is the user who has to interpret the results
(vii) A major limitation of statistics is that it does not reveal all pertaining to a certain
cover. Similarly, there are some other aspects related to the problem on hand,
which are also not covered. The user of Statistics has to be well informed and
should interpret Statistics keeping in mind all other aspects having relevance on
Apart from the limitations of statistics mentioned above, there are misuses of it. Many
people, knowingly or unknowingly, use statistical data in the wrong manner. Let us see
what the main misuses of statistics are so that they can be avoided when one has
to use statistical data. The misuse of Statistics may take several forms, some of which
absence of the source, the reader does not know how far the data are reliable.
(ii) Defective data: Another misuse is that sometimes one gives defective data. This
point. This apart, the definition used to denote a certain phenomenon may be
definition may include even those who are employed, though partially. The
universe. The sample may turn out to be unrepresentative of the universe. One
may choose a sample just on the basis of convenience. He may collect the
sample.
(iv) Inadequate sample: Earlier, we have seen that a sample that is unrepresentative
of the universe is a major misuse of statistics. This apart, at times one may
city we may find that there are 1,00,000 households. When we have to conduct
only 0.1 per cent of the universe. A survey based on such a small sample may
comparisons from the data collected. For instance, one may construct an index
of production choosing the base year where the production was much less. Then
he may compare the subsequent year's production from this low base.
attempted. Such a comparison is wrong. Likewise, when data are not properly
classified or when changes in the composition of population in the two years are
not taken into consideration, comparisons of such data would be unfair as they
For example, while making projections of population in the next five years, one
may assume a lower rate of growth though the past two years indicate otherwise.
Sometimes one may not be sure about the changes in business environment in
the near future. In such a case, one may use an assumption that may turn out to
average. Suppose a series contains extreme values, one too high and the
other too low, such as 800 and 50. The use of an arithmetic average in such a
case may give a wrong idea. Instead, the harmonic mean would be proper in such a
case.
(vii) Confusion of correlation and causation: In statistics, several times one has to
examine the relationship between two variables. A close relationship between the two
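Returning to the choice of average discussed above, the effect of extreme values can be seen by computing several averages on the same series. The values 800 and 50 come from the text; the remaining values in the longer series are hypothetical, and the median is included only for comparison.

```python
import statistics

# The two extreme values mentioned in the text
extremes = [800, 50]
print("Arithmetic mean of 800 and 50:", statistics.mean(extremes))
print("Harmonic mean of 800 and 50:  ", round(statistics.harmonic_mean(extremes), 2))

# Hypothetical series containing the same two extreme values
series = [50, 180, 200, 210, 220, 800]
arithmetic = statistics.mean(series)
harmonic = statistics.harmonic_mean(series)
median = statistics.median(series)

print(f"Arithmetic mean: {arithmetic:.2f}")
print(f"Harmonic mean:   {harmonic:.2f}")
print(f"Median:          {median:.2f}")
```

The three averages differ widely for the same data, which is why the average must be chosen to suit the nature of the series rather than by habit.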
1.8 SUMMARY
coverage, and scope. At the macro level, these are data on gross national product and
At the micro level, individual firms, howsoever small or large, produce extensive
statistics on their operations. The annual reports of companies contain a variety of data
These data are often field data, collected by employing scientific survey techniques.
Unless regularly updated, such data are the product of a one-time effort and have limited
use beyond the situation that may have called for their collection. A student knows
physics, and others. It is a discipline, which scientifically deals with data, and is often
described as the science of data. In dealing with statistics as data, statistics has
1. Define Statistics. Explain its types, and importance to trade, commerce and
business.
4. What are the major limitations of Statistics? Explain with suitable examples.