Introduction To Statistics Data Collection


Introduction to Business Statistics

Objective: The aim of the present lesson is to enable the students to understand
the meaning, definition, nature, importance and limitations of
statistics.

1.1 INTRODUCTION

For a layman, ‘Statistics’ means numerical information expressed in quantitative


terms. This information may relate to objects, subjects, activities, phenomena, or
regions of space. As a matter of fact, data have no limits as to their reference,
coverage, and scope. At the macro level, these are data on gross national product
and shares of agriculture, manufacturing, and services in GDP (Gross Domestic
Product).

At the micro level, individual firms, howsoever small or large, produce extensive

statistics on their operations. The annual reports of companies contain a variety of data

on sales, production, expenditure, inventories, capital employed, and other activities.

These data are often field data, collected by employing scientific survey techniques.

Unless regularly updated, such data are the product of a one-time effort and have limited

use beyond the situation that may have called for their collection. A student knows

statistics more intimately as a subject of study like economics, mathematics, chemistry,

physics, and others. It is a discipline, which scientifically deals with data, and is often

described as the science of data. In dealing with statistics as data, statistics has

developed appropriate methods of collecting, presenting, summarizing, and analysing

data, and thus consists of a body of these methods.

1.2 MEANING AND DEFINITIONS OF STATISTICS

In the beginning, it may be noted that the word ‘statistics’ is used rather curiously in

two senses: plural and singular. In the plural sense, it refers to a set of figures or data. In

the singular sense, statistics refers to the whole body of tools that are used to collect

data, organise and interpret them and, finally, to draw conclusions from them. It should

be noted that both the aspects of statistics are important if the quantitative data are to

serve their purpose. If statistics, as a subject, is inadequate and consists of poor

methodology, we could not know the right procedure to extract from the data the

information they contain. Similarly, if our data are defective, inadequate,
or inaccurate, we could not reach the right conclusions even though our subject is well

developed.

A.L. Bowley has defined statistics as: (i) statistics is the science of counting, (ii)

Statistics may rightly be called the science of averages, and (iii) statistics is the science

of measurement of social organism regarded as a whole in all its manifestations.

Boddington defined it thus: statistics is the science of estimates and probabilities. Further,

W.I. King defined statistics in a wider context: the science of statistics is the method

of judging collective, natural or social phenomena from the results obtained by the

analysis or enumeration or collection of estimates.

Seligman stated that statistics is a science that deals with the methods of collecting,

classifying, presenting, comparing and interpreting numerical data collected to throw

some light on any sphere of enquiry. Spiegel defines statistics highlighting its role in

decision-making particularly under uncertainty, as follows: statistics is concerned with

scientific methods for collecting, organising, summarising, presenting and analysing

data as well as drawing valid conclusions and making reasonable decisions on the basis

of such analysis. According to Prof. Horace Secrist, Statistics is the aggregate of facts,

affected to a marked extent by multiplicity of causes, numerically expressed,

enumerated or estimated according to reasonable standards of accuracy, collected in a

systematic manner for a pre-determined purpose, and placed in relation to each other.

From the above definitions, we can highlight the major characteristics of statistics as

follows:

(i) Statistics are the aggregates of facts. It means a single figure is not statistics.

For example, national income of a country for a single year is not statistics but

the same for two or more years is statistics.


(ii) Statistics are affected by a number of factors. For example, sale of a product

depends on a number of factors such as its price, quality, competition, the

income of the consumers, and so on.

(iii) Statistics must be reasonably accurate. Wrong figures, if analysed, will lead to

erroneous conclusions. Hence, it is necessary that conclusions must be based on

accurate figures.

(iv) Statistics must be collected in a systematic manner. If data are collected in a

haphazard manner, they will not be reliable and will lead to misleading

conclusions.

(v) Statistics must be collected for a pre-determined purpose.

(vi) Lastly, Statistics should be placed in relation to each other. If one collects data

unrelated to each other, then such data will be confusing and will not lead to any

logical conclusions. Data should be comparable over time and over space.

1.3 TYPES OF DATA AND DATA SOURCES

Statistical data are the basic raw material of statistics. Data may relate to an activity of

our interest, a phenomenon, or a problem situation under study. They derive as a result

of the process of measuring, counting and/or observing. Statistical data, therefore, refer

to those aspects of a problem situation that can be measured, quantified, counted, or

classified. Any object, subject, phenomenon, or activity that generates data through this

process is termed a variable. In other words, a variable is one that shows a degree of

variability when successive measurements are recorded. In statistics, data are classified

into two broad categories: quantitative data and qualitative data. This classification is

based on the kind of characteristics that are measured.

Quantitative data are those that can be quantified in definite units of measurement.

These refer to characteristics whose successive measurements yield quantifiable

observations. Depending on the nature of the variable observed for measurement,

quantitative data can be further categorized as continuous and discrete data.


Obviously, a variable may be a continuous variable or a discrete variable.

(i) Continuous data represent the numerical values of a continuous variable. A

continuous variable is the one that can assume any value between any two points

on a line segment, thus representing an interval of values. The values are quite

precise and close to each other, yet distinguishably different. All characteristics

such as weight, length, height, thickness, velocity, temperature, tensile strength,

etc., represent continuous variables. Thus, the data recorded on these and similar

other characteristics are called continuous data. It may be noted that a

continuous variable assumes the finest unit of measurement. Finest in the sense

that it enables measurements to the maximum degree of precision.

(ii) Discrete data are the values assumed by a discrete variable. A discrete variable

is the one whose outcomes are measured in fixed numbers. Such data are

essentially count data. These are derived from a process of counting, such as the

number of items possessing or not possessing a certain characteristic. The

number of customers visiting a departmental store every day, the incoming

flights at an airport, and the defective items in a consignment received for sale,

are all examples of discrete data.

Qualitative data refer to qualitative characteristics of a subject or an object. A

characteristic is qualitative in nature when its observations are defined and noted in

terms of the presence or absence of a certain attribute in discrete numbers. These data

are further classified as nominal and rank data.

(i) Nominal data are the outcome of classification into two or more categories of

items or units comprising a sample or a population according to some quality

characteristic. Classification of students according to sex (as males and

females), of workers according to skill (as skilled, semi-skilled, and unskilled),

and of employees according to the level of education (as matriculates,

undergraduates, and post-graduates), all result in nominal data. Given any

such basis of classification, it is always possible to assign each item to a


particular class and make a summation of items belonging to each class. The

count data so obtained are called nominal data.

(ii) Rank data, on the other hand, are the result of assigning ranks to specify order

in terms of the integers 1,2,3, ..., n. Ranks may be assigned according to the

level of performance in a test, a contest, a competition, an interview, or a show.

The candidates appearing in an interview, for example, may be assigned ranks

in integers ranging from 1 to n, depending on their performance in the interview.

Ranks so assigned can be viewed as the continuous values of a

variable involving performance as the quality characteristic.
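
The four kinds of data described above can be sketched in a few lines of Python. This is only an illustration: every figure and name below is invented, and the variable names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical observations, invented purely for illustration.
weights_kg = [61.4, 58.25, 72.9, 66.05]   # continuous: any value within an interval
daily_customers = [120, 98, 143, 110]     # discrete: whole-number counts
skill_levels = ["skilled", "unskilled", "semi-skilled", "skilled", "skilled"]
interview_scores = {"Asha": 78, "Ben": 91, "Chen": 85}

# Nominal data: assign each item to a class and count the items in each class.
nominal_counts = Counter(skill_levels)

# Rank data: assign the integers 1..n in order of performance (1 = best score).
ordered = sorted(interview_scores, key=interview_scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

print(nominal_counts)
print(ranks)
```

The counts per class are the nominal data, and the integers assigned to the candidates are the rank data.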

Data sources could be seen as of two types, viz., secondary and primary. The two can

be defined as under:

(i) Secondary data: They already exist in some form, published or unpublished,

in an identifiable secondary source. They are, generally, available from

published source(s), though not necessarily in the form actually required.

(ii) Primary data: Those data which do not already exist in any form, and thus have

to be collected for the first time from the primary source(s). By their very nature,

these data require fresh and first-time collection covering the whole population

or a sample drawn from it.

1.4 TYPES OF STATISTICS

There are two major divisions of statistics: descriptive statistics and inferential

statistics. Descriptive statistics deals with collecting, summarizing, and

simplifying data, which are otherwise quite unwieldy and voluminous. It seeks to

achieve this in a manner that meaningful conclusions can be readily drawn from the

data. Descriptive statistics may thus be seen as comprising methods of bringing out

and highlighting the latent characteristics present in a set of numerical data. It not

only facilitates an understanding of the data and systematic reporting thereof, but

also makes them amenable to further discussion, analysis, and

interpretations.

The first step in any scientific inquiry is to collect data relevant to the problem in hand.

When the inquiry relates to physical and/or biological sciences, data collection is

normally an integral part of the experiment itself. In fact, the very manner in which an

experiment is designed, determines the kind of data it would require and/or generate.

The problem of identifying the nature and the kind of the relevant data is thus

automatically resolved as soon as the design of experiment is finalized. It is possible in

the case of physical sciences. In the case of social sciences, where the required data are

often collected through a questionnaire from a number of carefully selected

respondents, the problem is not that simply resolved. For one thing, designing the

questionnaire itself is a critical initial problem. For another, the number of respondents

to be accessed for data collection and the criteria for selecting them have their own

implications and importance for the quality of results obtained. Further, once the data have

been collected, they are assembled, organized, and presented in the form of appropriate

tables to make them readable. Wherever needed, figures, diagrams, charts, and graphs

are also used for better presentation of the data. A useful tabular and graphic

presentation of data will require that the raw data be properly classified in accordance

with the objectives of investigation and the relational analysis to be carried out.

A well thought-out and sharp data classification facilitates easy description of the

hidden data characteristics by means of a variety of summary measures. These include

measures of central tendency, dispersion, skewness, and kurtosis, which constitute the

essential scope of descriptive statistics. These form a large part of the subject matter of

any basic textbook on the subject, and thus they are being discussed in that order here

as well.
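
As a rough illustration of the summary measures named above, the sketch below computes a mean, a median, a standard deviation, and moment-based skewness and kurtosis for a small data set. The sales figures are invented, and the helper name `moment_summary` is an assumption made for this example. Other definitions of skewness and kurtosis exist; this is only one common choice.

```python
import statistics

def moment_summary(values):
    """Return basic descriptive measures for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,                        # central tendency
        "median": statistics.median(values), # central tendency
        "std_dev": sd,                       # dispersion
        "skewness": m3 / sd ** 3,            # asymmetry of the distribution
        "kurtosis": m4 / sd ** 4,            # peakedness / tail weight
    }

# Hypothetical monthly sales (in thousands); the last value is an outlier.
monthly_sales = [12, 15, 11, 14, 13, 40]
summary = moment_summary(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The single large value pulls the mean well above the median and gives a positive skewness, which is the kind of hidden characteristic these measures are meant to bring out.
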
Inferential statistics, also known as inductive statistics, goes beyond describing a

given problem situation by means of collecting, summarizing, and meaningfully

presenting the related data. Instead, it consists of methods that are used for drawing

inferences, or making broad generalizations, about a totality of observations on the basis

of knowledge about a part of that totality. The totality of observations about which an

inference may be drawn, or a generalization made, is called a population or a universe.

The part of totality, which is observed for data collection and analysis to gain

knowledge about the population, is called a sample.

The desired information about a given population of interest may also be collected

by observing all the units comprising the population. This total coverage is called a

census. Getting the desired value for the population through census is not always

feasible and practical for various reasons. Apart from time and money considerations

making the census operations prohibitive, observing each individual unit of the

population with reference to any data characteristic may at times involve even

destructive testing. In such cases, obviously, the only recourse available is to employ

the partial or incomplete information gathered through a sample for the purpose. This

is precisely what inferential statistics does. Thus, obtaining a particular value from the

sample information and using it for drawing an inference about the entire population

underlies the subject matter of inferential statistics. Consider a situation in which one is

required to know the average body weight of all the college students in a given

cosmopolitan city during a certain year. A quick and easy way to do this is to record the

weight of only 500 students, from out of a total strength of, say, 10000, or an unknown

total strength, take the average, and use this average based on incomplete weight data

to represent the average body weight of all the college students. In a different situation,

one may have to repeat this exercise for some future year and use the quick estimate of

average body weight for a comparison. This may be needed, for example, to decide
whether the weight of the college students has undergone a significant change over the

years compared.
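
The body-weight example can be imitated with simulated data. The sketch below is only an illustration: the population of 10,000 weights is generated artificially, and the assumed average of 60 kg and spread of 8 kg are invented for the purpose.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Hypothetical population: body weights (kg) of 10,000 college students.
population = [rng.gauss(60, 8) for _ in range(10_000)]

# Record the weight of only 500 students, drawn at random without replacement.
sample = rng.sample(population, 500)

population_mean = statistics.fmean(population)  # normally unknown in practice
sample_mean = statistics.fmean(sample)          # the quick estimate

print(f"population mean: {population_mean:.2f} kg")
print(f"sample mean:     {sample_mean:.2f} kg")
```

In practice the population mean is not available; it is computed here only to show how close the sample-based estimate comes to it.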

Inferential statistics helps to evaluate the risks involved in reaching inferences or

generalizations about an unknown population on the basis of sample information. For

example, an inspection of a sample of five battery cells drawn from a given lot may

reveal that all the five cells are in perfectly good condition. This information may be

used to conclude that the entire lot is good enough to buy or not.

Since this inference is based on the examination of a sample of limited number of cells,

it is quite possible that not all the cells in the lot are in order. It is also possible that all

the items that may be included in the sample are unsatisfactory. This may be used to

conclude that the entire lot is of unsatisfactory quality, whereas the fact may indeed be

otherwise. It may, thus, be noticed that there is always a risk of an inference about a

population being incorrect when based on the knowledge of a limited sample. The

rescue in such situations lies in evaluating such risks. For this, statistics provides the

necessary methods. These centre on quantifying, in probabilistic terms, the chances of

decisions taken on the basis of sample information being incorrect. This requires an

understanding of the what, why, and how of probability and probability distributions to

equip ourselves with methods of drawing statistical inferences and estimating the

degree of reliability of these inferences.
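
The risk in the battery-cell example can be quantified with a simple probability calculation. The sketch below assumes a lot of 100 cells of which 10 are defective; both figures are invented, since the text gives no lot size. It computes the chance that a random sample of five cells contains no defective cell at all.

```python
from math import comb

def prob_all_good(lot_size, defective, sample_size):
    """Probability that a random sample (without replacement) has no defectives."""
    good = lot_size - defective
    # Ways to choose the sample from good cells only, over all possible samples.
    return comb(good, sample_size) / comb(lot_size, sample_size)

# Hypothetical lot: 100 cells, 10 of them defective, sample of 5 inspected.
risk = prob_all_good(lot_size=100, defective=10, sample_size=5)
print(f"P(all five sampled cells are good) = {risk:.3f}")
```

Under these assumed figures, more than half of all possible samples would show five good cells even though one cell in ten is defective, which is why an inference from such a sample needs a stated probability of being wrong.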

1.5 SCOPE OF STATISTICS

Apart from the methods comprising the scope of descriptive and inferential branches

of statistics, statistics also consists of methods of dealing with a few other issues of

specific nature. Since these methods are essentially descriptive in nature, they have

been discussed here as part of the descriptive statistics. These are mainly concerned

with the following:

(i) It often becomes necessary to examine how two paired data sets are related. For

example, we may have data on the sales of a product and the expenditure
incurred on its advertisement for a specified number of years. Given that sales

and advertisement expenditure are related to each other, it is useful to examine

the nature of relationship between the two and quantify the degree of that

relationship. As this requires use of appropriate statistical methods, these fall

under the purview of what we call regression and correlation analysis.

(ii) Situations occur quite often when we require averaging (or totalling) of data on

prices and/or quantities expressed in different units of measurement. For

example, price of cloth may be quoted per meter of length and that of wheat per

kilogram of weight. Since ordinary methods of totalling and averaging do not

apply to such price/quantity data, special techniques needed for the purpose are

developed under index numbers.

(iii) Many a time, it becomes necessary to examine the past performance of an

activity with a view to determining its future behaviour. For example, when

engaged in the production of a commodity, monthly product sales are an

important measure of evaluating performance. This requires compilation and

analysis of relevant sales data over time. The more complex the activity, the

more varied the data requirements. For profit maximising and future sales

planning, forecast of likely sales growth rate is crucial. This needs careful

collection and analysis of past sales data. All such concerns are taken care of

under time series analysis.

(iv) Obtaining the most likely future estimates on any aspect(s) relating to a business

or economic activity has indeed been engaging the minds of all concerned. This

is particularly important when it relates to product sales and demand, which

serve the necessary basis of production scheduling and planning. The

regression, correlation, and time series analyses together help develop the basic

methodology to do the needful. Thus, the study of methods and techniques of

obtaining the likely estimates on business/economic variables comprises the


scope of what we do under business forecasting. Keeping in view the importance

of inferential statistics, the scope of statistics may finally be restated as

consisting of statistical methods which facilitate decision-making under

conditions of uncertainty. While the term statistical methods is often used to

cover the subject of statistics as a whole, in particular it refers to methods by

which statistical data are analysed, interpreted, and the inferences drawn for

decision-making.
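
Items (i) and (ii) above can be made concrete with a short sketch. It computes a correlation coefficient and a least-squares line for paired sales and advertising data, and a weighted price index for goods quoted in different units. All figures are invented, the function names are assumptions made for this example, and the base-quantity weighting shown is only one of several index number formulas.

```python
import math

def pearson_r(x, y):
    """Degree of linear relationship between two paired data sets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Slope and intercept of the regression line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def weighted_price_index(base_prices, current_prices, base_quantities):
    """Cost of the base-period basket at current prices, base period = 100."""
    base_cost = sum(p * q for p, q in zip(base_prices, base_quantities))
    current_cost = sum(p * q for p, q in zip(current_prices, base_quantities))
    return current_cost / base_cost * 100

# Hypothetical yearly advertising expenditure and sales.
advertising = [10, 12, 15, 18, 20]
sales = [100, 110, 130, 150, 160]
r = pearson_r(advertising, sales)
slope, intercept = least_squares_line(advertising, sales)

# Hypothetical prices: cloth per metre and wheat per kilogram.
index = weighted_price_index([50, 20], [60, 22], [10, 100])

print(f"correlation: {r:.3f}, slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"price index: {index:.1f}")
```

Weighting each price by a quantity converts both goods into money terms, which is what allows prices quoted per metre and per kilogram to be combined in one figure.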

Though generic in nature and versatile in their applications, statistical methods have

come to be widely used, especially in all matters concerning business and economics.

These are also being increasingly used in biology, medicine, agriculture, psychology,

and education. The scope of application of these methods has started opening and

expanding in a number of social science disciplines as well. Even a political scientist

finds them of increasing relevance for examining the political behaviour and it is, of

course, no surprise to find even historians using statistical data, for history is essentially past

data presented in certain actual format.

1.6 IMPORTANCE OF STATISTICS IN BUSINESS

There are three major functions in any business enterprise in which the statistical

methods are useful. These are as follows:

(i) The planning of operations: This may relate to either special projects or to

the recurring activities of a firm over a specified period.

(ii) The setting up of standards: This may relate to the size of employment,

volume of sales, fixation of quality norms for the manufactured product, norms

for the daily output, and so forth.

(iii) The function of control: This involves comparison of actual production

achieved against the norm or target set earlier. In case the production has fallen

short of the target, it suggests remedial measures so that such a deficiency

does not occur again.


A point worth noting is that although these three functions (planning of operations,

setting standards, and control) are separate, in practice they are very much

interrelated.

Different authors have highlighted the importance of Statistics in business. For instance,

Croxton and Cowden give numerous uses of Statistics in business such as project

planning, budgetary planning and control, inventory planning and control, quality

control, marketing, production and personnel administration. Within these also they

have specified certain areas where Statistics is very relevant. Another author, Irving

W. Burr, dealing with the place of statistics in an industrial organisation, specifies a

number of areas where statistics is extremely useful. These are: customer wants and

market research, development design and specification, purchasing, production,

inspection, packaging and shipping, sales and complaints, inventory and maintenance,

costs, management control, industrial engineering and research. Statistical problems

arising in the course of business operations are multitudinous. As such, one may do no

more than highlight some of the more important ones to emphasise the relevance of

statistics to the business world. In the sphere of production, for example, statistics can

be useful in various ways.

Statistical quality control methods are used to ensure the production of quality goods.

This is achieved by identifying and rejecting defective or substandard goods. Sales targets

can be fixed on the basis of sales forecasts, which are made using various methods of

forecasting. Analysis of sales effected against the targets set earlier would indicate the

deficiency in achievement, which may be on account of several causes: (i) targets were

too high and unrealistic, (ii) salesmen's performance has been poor, (iii) competition has

emerged or increased, (iv) the company's product is of poor quality, and so on. These factors

can be further investigated.

Another sphere in business where statistical methods can be used is personnel

management. Here, one is concerned with the fixation of wage rates, incentive norms,
and performance appraisal of individual employees. The concept of productivity is very

relevant here. On the basis of measurement of productivity, the productivity bonus is

awarded to the workers. Comparisons of wages and productivity are undertaken in order

to ensure increases in industrial productivity.

Statistical methods could also be used to ascertain the efficacy of a certain product, say,

medicine. For example, a pharmaceutical company has developed a new medicine in

the treatment of bronchial asthma. Before launching it on commercial basis, it wants to

ascertain the effectiveness of this medicine. It undertakes an experiment involving

the formation of two comparable groups of asthma patients. One group is given this

new medicine for a specified period and the other one is treated with the usual

medicines. Records are maintained for the two groups for the specified period. This

record is then analysed to ascertain if there is any significant difference in the recovery

of the two groups. If the difference is really significant statistically, the new medicine

is commercially launched.
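
One way to check whether the difference between the two groups is statistically significant is a two-proportion z-test. The sketch below is an illustration under stated assumptions: the text does not say which test the company would use, the recovery counts are invented, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(recovered_a, n_a, recovered_b, n_b):
    """Return the z statistic and two-sided p-value for a difference in proportions."""
    p_a = recovered_a / n_a
    p_b = recovered_b / n_b
    # Pooled recovery rate under the hypothesis that the treatments do not differ.
    pooled = (recovered_a + recovered_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical records: 70 of 100 patients recovered on the new medicine,
# 50 of 100 on the usual medicines.
z, p_value = two_proportion_z_test(70, 100, 50, 100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) would indicate that a difference this large is unlikely to arise by chance alone if the two treatments were equally effective.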

1.7 LIMITATIONS OF STATISTICS

Statistics has a number of limitations. The pertinent ones are as follows:

(i) There are certain phenomena or concepts where statistics cannot be used. This

is because these phenomena or concepts are not amenable to measurement. For

example, beauty, intelligence, and courage cannot be quantified. Statistics has no

place in all such cases where quantification is not possible.

(ii) Statistics reveal the average behaviour, the normal or the general trend. The

'average' concept, if applied to an individual or a particular

situation may lead to a wrong conclusion and sometimes may be disastrous. For

example, one may be misguided when told that the average depth of a river from

one bank to the other is four feet, when there may be some points in between

where its depth is far more than four feet. On this understanding, one may enter

those points having greater depth, which may be hazardous.


(iii) Since statistics are collected for a particular purpose, such data may not be

relevant or useful in other situations or cases. For example, secondary data (i.e.,

data originally collected by someone else) may not be useful for the other

person.

(iv) Statistics are not 100 per cent precise as is Mathematics or Accountancy.

Those who use statistics should be aware of this limitation.

(v) In statistical surveys, sampling is generally used as it is not physically possible

to cover all the units or elements comprising the universe. The results may not

be appropriate as far as the universe is concerned. Moreover, different surveys

based on the same size of sample but different sample units may yield different

results.

(vi) At times, association or relationship between two or more variables is studied

in statistics, but such a relationship does not indicate a 'cause and effect'

relationship. It simply shows the similarity or dissimilarity in the movement of

the two variables. In such cases, it is the user who has to interpret the results

carefully, pointing out the type of relationship obtained.

(vii) A major limitation of statistics is that it does not reveal all pertaining to a certain

phenomenon. There is some background information that statistics does not

cover. Similarly, there are some other aspects related to the problem on hand,

which are also not covered. The user of Statistics has to be well informed and

should interpret Statistics keeping in mind all other aspects having relevance on

the given problem.
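
Limitations (ii) and (v) can be seen in a small numerical sketch. The river depths and the universe of 1,000 units below are invented for illustration, and the seeds are arbitrary.

```python
import random
import statistics

# Limitation (ii): an average hides the extremes.
# Hypothetical depths (in feet) measured from one bank to the other.
depths_ft = [1, 2, 3, 9, 7, 4, 2, 4]
average_depth = statistics.fmean(depths_ft)
deepest_point = max(depths_ft)
print(f"average depth: {average_depth} ft, deepest point: {deepest_point} ft")

# Limitation (v): samples of the same size but made up of different units
# generally give different results.
universe = list(range(1, 1001))  # hypothetical universe of 1,000 values
sample_a = random.Random(1).sample(universe, 50)
sample_b = random.Random(2).sample(universe, 50)
mean_a = statistics.fmean(sample_a)
mean_b = statistics.fmean(sample_b)
print(f"sample A mean: {mean_a:.1f}, sample B mean: {mean_b:.1f}")
```

The average depth is four feet, yet one point is more than twice as deep, which is exactly the situation the river example warns about.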

Apart from the limitations of statistics mentioned above, there are misuses of it. Many

people, knowingly or unknowingly, use statistical data in a wrong manner. Let us see

what the main misuses of statistics are so that the same could be avoided when one has

to use statistical data. The misuse of Statistics may take several forms some of which

are explained below.


(i) Sources of data not given: At times, the source of data is not given. In the

absence of the source, the reader does not know how far the data are reliable.

Further, if he wants to refer to the original source, he is unable to do so.

(ii) Defective data: Another misuse is that sometimes one gives defective data. This

may be done knowingly in order to defend one's position or to prove a particular

point. This apart, the definition used to denote a certain phenomenon may be

defective. For example, in case of data relating to unemployed persons, the

definition may include even those who are employed, though partially. The

question here is how far it is justified to include partially employed persons

amongst unemployed ones.

(iii) Unrepresentative sample: In statistics, several times one has to conduct a

survey, which necessitates choosing a sample from the given population or

universe. The sample may turn out to be unrepresentative of the universe. One

may choose a sample just on the basis of convenience. He may collect the

desired information from either his friends or nearby respondents in his

neighbourhood even though such respondents do not constitute a representative

sample.

(iv) Inadequate sample: Earlier, we have seen that a sample that is unrepresentative

of the universe is a major misuse of statistics. This apart, at times one may

conduct a survey based on an extremely inadequate sample. For example, in a

city we may find that there are 1,00,000 households. When we have to conduct

a household survey, we may take a sample of merely 100 households comprising

only 0.1 per cent of the universe. A survey based on such a small sample may

not yield right information.

(v) Unfair Comparisons: An important misuse of statistics is making unfair

comparisons from the data collected. For instance, one may construct an index
of production choosing the base year where the production was much less. Then

he may compare the subsequent year's production from this low base.

Such a comparison will undoubtedly give a rosy picture of the production

though in reality it is not so. Another source of unfair comparisons could be

when one makes absolute comparisons instead of relative ones. An absolute

comparison of two figures, say, of production or export, may show a good

increase, but in relative terms it may turn out to be very negligible. Another

example of unfair comparison is when the population in two cities is different,

but a comparison of overall death rates and deaths by a particular disease is

attempted. Such a comparison is wrong. Likewise, when data are not properly

classified or when changes in the composition of population in the two years are

not taken into consideration, comparisons of such data would be unfair as they

would lead to misleading conclusions.

(vi) Unwarranted conclusions: Another misuse of statistics may be on account of

unwarranted conclusions. This may be as a result of making false assumptions.

For example, while making projections of population in the next five years, one

may assume a lower rate of growth though the past two years indicate otherwise.

Sometimes one may not be sure about the changes in business environment in

the near future. In such a case, one may use an assumption that may turn out to

be wrong. Another source of unwarranted conclusion may be the use of wrong

average. Suppose in a series there are extreme values, one is too high while the

other is too low, such as 800 and 50. The use of an arithmetic average in such a

case may give a wrong idea. Instead, harmonic mean would be proper in such a

case.

(vii) Confusion of correlation and causation: In statistics, one often has to

examine the relationship between two variables. A close relationship between the two

variables may not establish a cause-and-effect relationship in the sense that one

variable is the cause and the other is the effect. It should be taken as a measure of the

degree of association rather than as evidence of a causal relationship.
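
Three of the misuses above involve simple arithmetic that can be checked directly. The sketch below uses the figures given in the text where they exist (100 households out of 1,00,000, and the extreme values 800 and 50). The production series is invented, and the function name `index_on_base` is an assumption made for this example.

```python
import statistics

# (iv) Inadequate sample: 100 households out of 1,00,000.
sampling_fraction_pct = 100 / 100_000 * 100
print(f"sampling fraction: {sampling_fraction_pct:.1f} per cent")

# (v) Unfair comparison: the same series indexed on a low base year and on a
# normal base year. The production figures are hypothetical.
def index_on_base(series, base_position):
    """Express every value as a percentage of the value at base_position."""
    base = series[base_position]
    return [value / base * 100 for value in series]

production = [50, 100, 110]  # year 1 was an unusually poor year
print("base = year 1:", index_on_base(production, 0))
print("base = year 2:", index_on_base(production, 1))

# (vi) Wrong average: arithmetic and harmonic means of the extreme values.
extremes = [800, 50]
arithmetic = statistics.mean(extremes)
harmonic = statistics.harmonic_mean(extremes)
print(f"arithmetic mean: {arithmetic}, harmonic mean: {harmonic:.2f}")
```

On the low base, the latest year appears to have more than doubled, whereas on the normal base it shows a rise of only about ten per cent. The two averages of 800 and 50 also differ widely, so the choice of average strongly affects the conclusion drawn.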

1.8 SUMMARY

In a summarized manner, ‘Statistics’ means numerical information expressed in

quantitative terms. As a matter of fact, data have no limits as to their reference,

coverage, and scope. At the macro level, these are data on gross national product and

shares of agriculture, manufacturing, and services in GDP (Gross Domestic Product).

At the micro level, individual firms, howsoever small or large, produce extensive

statistics on their operations. The annual reports of companies contain a variety of data

on sales, production, expenditure, inventories, capital employed, and other activities.

These data are often field data, collected by employing scientific survey techniques.

Unless regularly updated, such data are the product of a one-time effort and have limited

use beyond the situation that may have called for their collection. A student knows

statistics more intimately as a subject of study like economics, mathematics, chemistry,

physics, and others. It is a discipline, which scientifically deals with data, and is often

described as the science of data. In dealing with statistics as data, statistics has

developed appropriate methods of collecting, presenting, summarizing, and analysing

data, and thus consists of a body of these methods.

1.9 SELF-TEST QUESTIONS

1. Define Statistics. Explain its types, and importance to trade, commerce and

business.

2. “Statistics is all-pervading”. Elucidate this statement.

3. Write a note on the scope and limitations of Statistics.

4. What are the major limitations of Statistics? Explain with suitable examples.

5. Distinguish between descriptive Statistics and inferential Statistics.


1.10 SUGGESTED READINGS

1. Gupta, S. P.: Statistical Methods, Sultan Chand and Sons, New Delhi.

2. Hooda, R. P.: Statistics for Business and Economics, Macmillan, New Delhi.

3. Hein, L. W.: Quantitative Approach to Managerial Decisions, Prentice Hall,

NJ.

4. Levin, Richard I. and David S. Rubin: Statistics for Management, Prentice

Hall, New Delhi.

5. Lawrance B. Moore: Statistics for Business & Economics, Harper Collins,

NY.

6. Watsham, Terry J. and Keith Parramore: Quantitative Methods in Finance,

International Thomson Business Press, London.
