Chapter 1 - Definition and Uses of Statistics - 1
Statistical Method
Limitations of statistics: Although Statistics has applications in almost all sciences
(social, physical, and natural), it has its own limitations as well, which restrict its scope
and utility.
a) Statistics does not study qualitative phenomena: Since Statistics deals with
numerical data, it cannot be applied to problems which cannot be
stated and expressed quantitatively. Qualitative characteristics such as honesty,
poverty, welfare, beauty or health cannot be measured directly in quantitative terms.
However, these subjective concepts can be related indirectly to
numerical data by assigning particular scores.
b) Statistics does not study individuals: Statistics deals with aggregates of
data rather than single observations. It helps in taking decisions about a population
based on sample information.
c) Statistics can be misused: Statistics are liable to be misused. For proper use of
statistics one should have enough skill, knowledge and experience to draw accurate
and sensible conclusions. Further, valid results cannot be drawn from the use of
statistics unless one has a proper understanding of the subject to which it is
applied.
Dummy Variable: A variable which takes only two values, 0 or 1. For
example, yes or no, present or absent, etc.
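As a minimal sketch of the idea, the Python snippet below codes a small list of yes/no answers as a dummy variable. The responses are hypothetical and are not taken from these notes; the coding 1 = "yes", 0 = "no" is one possible convention.

```python
# Hypothetical survey answers (illustrative only, not from the notes).
responses = ["yes", "no", "yes", "yes", "no"]

# Code "yes" as 1 and "no" as 0 to obtain a dummy variable.
dummy = [1 if answer == "yes" else 0 for answer in responses]

# The mean of a 0/1 variable equals the proportion of 1s ("yes" answers).
proportion_yes = sum(dummy) / len(dummy)
print(dummy, proportion_yes)
```

One convenient consequence of the 0/1 coding is that the average of the dummy variable is the proportion of observations in the category coded 1.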
Primary Data: Data collected directly from respondents through surveys
using a questionnaire or schedule are known as primary data.
Secondary Data: Data collected from published journals, books or other
sources are known as secondary data.
Discrete Variable: A discrete variable may (but does not necessarily) have a finite number
of values. However, the most common type of discrete variable that we will encounter produces a
response that comes from a counting process. Examples include the number of students
enrolled in a class, the number of university credits earned by a student at the end of a
particular semester, and the number of insurance claims filed following a particular
hurricane in a particular state.
Continuous Variable: A continuous variable may take any value within a given range of
real numbers and usually arises from a measurement (not a counting) process.
Examples of continuous numerical variables include height, weight, time, distance and
temperature. Someone might say that he is 6 feet (or 72 inches) tall, but his height
could actually be 72.1 inches, 71.8 inches or some other similar number, depending on
the accuracy of the instrument used to measure height. Other examples of continuous
numerical variables include the weight of cereal boxes, the time to run a race, and the
distance between two cities.
Measurement and Scaling Concept:
Measurement: Measurement is the assignment of numbers or other symbols to
characteristics of objects according to certain pre-specified rules. Note that what we
measure is not the object, but some characteristic of it. Thus, we do not measure objects-
Scale: A Scale may be defined as any series of items that are arranged progressively
according to value or magnitude into which an item can be placed according to its
quantification. It can be defined as a continuous spectrum or series of categories.
Nominal Scale: It is the simplest type of measurement scale, in which the numbers or letters
assigned to objects serve only as labels or tags for identifying and classifying objects, with a
strict one-to-one correspondence between the numbers and the objects.
Example: In business research we may code males as 1 and females as 2.
These two numbers are nothing but labels.
Ordinal Scale: It is a type of scale that arranges objects or alternatives according to their
magnitudes in an ordered relationship. When the respondents are ordered, ordinal values
are assigned. Thus it is possible to determine whether an object has more or less of a
characteristic than some other object.
Example: In business research, if we ask respondents to rate companies as excellent, good, fair or poor,
we know that excellent is higher than good, but not by how much.
Interval Scale: It is another type of scale that not only arranges objects according to their
magnitudes but also distinguishes their ordered arrangement in units of equal intervals.
An interval scale contains all the information of an ordinal scale, and it also allows you to
compare the differences between objects. The difference between any two adjacent scale values is
identical to the difference between any other two adjacent values of the scale; that is,
there is a constant or equal interval between scale values.
Ratio Scale: A ratio scale possesses all the properties of the nominal, ordinal and interval
scales and, in addition, an absolute zero point. Thus, in ratio
scales we can identify or classify objects, rank the objects and compare intervals or
differences. It is also meaningful to compute ratios of scale values.
Example: Money and weight are ratio-scaled because they possess an absolute zero and interval
properties.
Scale: Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Social Security numbers, numbering of football players
  Marketing examples: Brand numbers, store types, sex classification
  Numerical operation: Counting
  Permissible statistics (descriptive): Percentages, mode
  Permissible statistics (inferential): Chi-square, binomial test
Scale: Ordinal
  Basic characteristics: Numbers indicate the relative positions of the objects but not the magnitude of differences between them
  Common examples: Quality rankings, rankings of teams in a tournament
  Marketing examples: Preference rankings, market position, social class
  Numerical operation: Rank ordering
  Permissible statistics (descriptive): Percentile, median
  Permissible statistics (inferential): Rank-order correlation, Friedman ANOVA
Frequency distribution:
If the variation within the data set is not very wide, then it is wise to construct an ungrouped
frequency distribution for summarizing the data. If the number of observations
becomes large, an ungrouped distribution becomes difficult and time
consuming to construct. To summarize the data further into a grouped frequency distribution
table, the following steps should be taken:
i) select an appropriate number of non-overlapping class intervals
ii) determine the width of the class interval
iii) determine class limits (or boundaries) for each class interval to avoid
overlapping.
Step 1: Determine the range: From the given data set, find the lowest value and the
highest value. The range is the difference between the highest value and the lowest value.
Step 2: Determine the number of classes: If K denotes the number of classes and N the
total number of observations, then K is the smallest exponent of 2
such that $2^K \geq N$.
Another way to find the value of K is Sturges' rule: $K = 1 + 3.322 \log_{10} N$,
where $\log_{10} N$ is the logarithm (base 10) of the total number of observations.
Step 3: Determine the width or the interval of classes: For constructing the frequency
distribution, determine a suitable class interval i:
$i = \frac{\text{Range}}{K} = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$
Both K and i should be rounded upward, possibly to the next larger integer.
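Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the inputs N = 50 and range = 65 are illustrative values chosen for the sketch.

```python
import math

def classes_power_of_two(n):
    """Smallest K such that 2**K >= n (the power-of-two rule of Step 2)."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

def classes_sturges(n):
    """Sturges' rule, K = 1 + 3.322 * log10(n), returned unrounded."""
    return 1 + 3.322 * math.log10(n)

# Illustrative inputs (assumed for this sketch).
n_obs = 50
data_range = 65

k = classes_power_of_two(n_obs)               # number of classes
width = math.ceil(data_range / k)             # class width, rounded upward
width_sturges = data_range / classes_sturges(n_obs)

print(k, width, round(width_sturges, 2))
```

The two rules need not agree exactly; in practice the width is often adjusted to a convenient round number close to either result.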
Step 4: Determine the class limits (boundaries): The limits of each class interval
should be clearly defined so that each observation (element) of the data set belongs to one
and only one class. The class intervals may be of the inclusive type, which are non-overlapping, such as 20-29,
30-39, etc. Sometimes we also need the exclusive type of class, where the upper limit of
each class is excluded from that class (such as 20-30, 30-40, 40-50, etc.).
Step 5: Mid-point of class interval: The class mid-point is the point halfway between
the boundaries of each class. That is, it is the average of the upper limit and the lower
limit of the class.
Step 6: Tally marks: Each observation of the data set is matched with
its class and a tally mark is put against that class. After the whole
data set has been processed, the tallies of each class are counted and the count is written against the class. These counts are
known as frequencies.
Step 7: Cumulative frequency (less than): If the frequency of each class is added
to the frequencies of all preceding classes, starting from the top, the results are known as
"less than" cumulative frequencies. If the same is done starting from the bottom, the results are known as
"more than" cumulative frequencies. If we divide the frequency of each class by the total
frequency, the result is known as the relative frequency.
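Steps 5 and 7 can be illustrated with a short Python sketch. The class limits and frequencies below are hypothetical values chosen only for illustration; they do not come from any example in these notes.

```python
from itertools import accumulate

# Hypothetical exclusive-type classes and their frequencies.
classes = [(20, 30), (30, 40), (40, 50)]
freqs = [4, 10, 6]

# Step 5: mid-point = average of the lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 7: "less than" cumulative frequencies accumulate from the top,
# "more than" cumulative frequencies accumulate from the bottom.
cum_less_than = list(accumulate(freqs))
cum_more_than = list(accumulate(reversed(freqs)))[::-1]

# Relative frequency = class frequency / total frequency.
total = sum(freqs)
relative = [f / total for f in freqs]

for row in zip(classes, midpoints, freqs, cum_less_than, cum_more_than, relative):
    print(row)
```

The last "less than" value and the first "more than" value both equal the total frequency, which is a quick check on the arithmetic.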
Example 1: The following data show the total time (in hours) worked by 30 machinists.
Construct a frequency distribution.
90 88 90 89 90 84 86 90 84 89 93 84 90 94 91
94 93 93 92 92 85 88 86 91 87 94 89 85 90 95
Solution: Here the variation within the data set is not very wide, so we construct an
ungrouped frequency distribution as follows:
Working hours      84 85 86 87 88 89 90 91 92 93 94 95
No. of machinists   3  2  2  1  2  3  6  2  2  3  3  1
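The tally for Example 1 can be checked with a small Python sketch. The variable names are illustrative; the data are the 30 observations listed in the example.

```python
from collections import Counter

# Working hours of the 30 machinists from Example 1.
hours = [90, 88, 90, 89, 90, 84, 86, 90, 84, 89, 93, 84, 90, 94, 91,
         94, 93, 93, 92, 92, 85, 88, 86, 91, 87, 94, 89, 85, 90, 95]

# Counter tallies how many times each distinct value occurs.
freq = Counter(hours)

# Print the ungrouped frequency distribution in increasing order of value.
for value in sorted(freq):
    print(value, freq[value])
```

Each distinct value acts as its own class here, which is practical only because the data take just twelve different values.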
Example 2: The following data show the weekly overtime (in hours) of 50 employees in a
reputed fashion design company. Construct a frequency distribution by taking a suitable
class interval.
22 77 79 82 65 50 65 73 60 33 75 66 65 30 63 41 55
65 67 62 45 49 75 59 55 54 51 28 39 25 50 48 68 55
81 35 65 65 79 61 45 53 81 49 37 57 78 27 87 77
Solution: Here $2^6 = 64 \geq 50$, so the number of classes is K = 6. The range of the data
is 87 - 22 = 65. Therefore the width of the class interval is 65/6 = 10.83 ≅ 11. On the other hand,
by Sturges' rule,
suitable class interval $= \frac{65}{1 + 3.322 \log_{10} 50} \approx 9.78 \cong 10$.
Here the nearest convenient value of the width is 10 (we select a multiple of 5).
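A grouped frequency distribution for Example 2 with width 10 could be tallied as in the sketch below. The starting point 20 and the exclusive-type classes (20-30, 30-40, and so on, upper limit excluded) are assumptions made for this sketch; the notes do not state which class limits were used.

```python
# Weekly overtime (in hours) of the 50 employees from Example 2.
overtime = [22, 77, 79, 82, 65, 50, 65, 73, 60, 33, 75, 66, 65, 30, 63, 41, 55,
            65, 67, 62, 45, 49, 75, 59, 55, 54, 51, 28, 39, 25, 50, 48, 68, 55,
            81, 35, 65, 65, 79, 61, 45, 53, 81, 49, 37, 57, 78, 27, 87, 77]

width = 10   # class width chosen in the solution
start = 20   # assumed lower limit of the first class

# Enough classes to cover the largest observation.
n_classes = (max(overtime) - start) // width + 1
counts = [0] * n_classes

# Exclusive-type classes: a value equal to an upper limit falls in the next class.
for value in overtime:
    counts[(value - start) // width] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {count}")
```

With these assumed limits, seven classes are needed to cover the range from 22 to 87, one more than the K = 6 suggested by the power-of-two rule, because the width was rounded down from 10.83 to 10.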
Example: The management of a factory wants to know the monthly working pattern of
the workers of the factory. For this purpose, a survey was conducted on 48 randomly selected
workers of the factory. The following data give the number of hours worked per month by
the 48 workers.
140 165 103 110 130 144 133 204 175 156 187 195
162 161 167 184 151 149 157 124 87 71 79 155
164 40 94 113 108 146 122 87 69 164 116 203
121 128 149 148 30 93 114 104 150 62 143 42
Once we carefully define a problem, we will need to collect data for making decisions.
Often the number of observations collected is so large that the actual findings of the study
are unclear. For this reason, it is necessary to summarize data in such a way that a clear
and accurate picture emerges. Unfortunately, there is no single method or way to describe
data. Rather, the appropriate approach is typically problem-specific, depending on two factors:
the type of data and the purpose of the study. Tables and graphs help us to gain a better
understanding of data and provide visual support for improved decision making.
(i) A graph should be clear and simple; a complicated graph defeats its own
purpose.
(ii) A graph should be completely self explanatory.
(iii) The origin, the vertical and the horizontal scales should be so chosen that a
graph does not convey a false impression about the nature of the data.
Limitation of graphs:
(i) They may be misleading, unless drawn and studied with care.
(ii) The conclusions drawn from the graphs should normally be regarded as
tentative and therefore, the graphs are no substitute for more critical statistical
analysis.
Types of diagrams:
(i) bar diagram, (ii) pie diagram, (iii) histogram, (iv) frequency polygon, (v) line diagram,
(vi) ogive curve (vii) scatter diagram.
Bar diagrams and pie diagrams are mainly used for representing qualitative data. The
former is also frequently used for depicting numerical values of a given item over a
period of time. The histogram, frequency polygon and cumulative frequency polygon are
used to represent frequency distributions. The line diagram is widely used to study the
changes in the values of a variable with the passage of time. The scatter diagram is very
useful in studying the interrelationship of two variables.
Bar Diagram: This diagram is drawn by constructing a series of blocks of equal width,
with the heights of the blocks (rectangles) proportional to the values corresponding to
different time periods or categories. The following table shows the distribution of the
expenditure budget (in crore taka) of different sectors of the country in the year 2012.
Now if we put the categories (sectors) on the x-axis and the expenditure on the y-axis, then the
diagram will be a bar diagram where the widths of the bars are equal but the heights are
proportional to the expenditure of the sectors.
[Figure: bar diagram of expenditure by sector; x-axis: Agriculture, Education, Industry, Transport, Others; y-axis scale: 0 to 100]
An interesting and useful extension of the simple bar chart can be used when components
of individual categories are also of interest. For example, the following table shows the
number of students enrolled in three business majors at BUP for three different years.
[Figure: component bar chart of enrollment by major (Accounting, Marketing, Finance & Banking) for 2001-02, 2002-03 and 2003-04; y-axis scale: 0 to 100]
This information can be shown in a bar chart by breaking down the total number of
students for each year so that the three components are visually distinguished. Such a chart is
called a component bar chart. This graph allows us to make visual comparisons of
totals and individual components. In this example it appears that the increase in
enrollment between 2001 and 2004 was almost uniform over the three majors.
Pie diagram: Pie charts are also used to describe categorical data. If we want to draw
attention to the proportion of frequencies in each category, then we will probably use a
pie chart to depict the division of a whole into its constituent parts. The circle represents
the total, and the segments cut from its center depict shares of that total. Following Table
shows the distribution of monthly expenditure of the students of BUP.
[Figure: pie diagram of monthly expenditure; categories: Food, Clothing, House rent, Education, Miscellaneous]
[Figure: histogram; x-axis: Audit Time (classes 10-15, 15-20, 20-25, 25-30, 30-35); y-axis: No. of Client, 0 to 9]
[Figure: Frequency Polygon; x-axis: Mid value (15, 25, 35, 45, 55, 65); y-axis: Frequency, 0 to 14]
Ogive (less than): On the x-axis we plot the upper limit of each class and on the y-axis we plot
the corresponding "less than" cumulative frequency.
Frequency              5  7 11  8  4  2
Cumulative frequency   5 12 23 31 35 37
[Figure: Ogive Curve; x-axis: Upper limits (10, 15, 20, 25, 30, 35); y-axis: Cumulative frequency, 0 to 40]
Line diagram: If we are given the values of a variable at different points of time, the set
of values is known as a time series. The line diagram is used to represent this type of
data. In this diagram time is represented along the x-axis and the variable is plotted along
the y-axis. Thus we get a point for each time period, and successive points, when
connected by straight lines, give the desired diagram.
Year of enrollment      2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
No. of students ('00)     12   14   15   17   18   15   16   19   20   21
[Figure: line diagram titled "Enrollment"; x-axis: periods 1 to 10; y-axis scale: 0 to 25]
a) When the emphasis is on the movement of a variable rather than on its actual
magnitude.
b) When several series are compared on the same chart.
c) When estimates or forecasts of a variable are to be obtained or displayed
graphically.
Scatter diagram: Sometimes the data consist of paired values of two related variables, and
the statistical problem is to investigate the inter-relationship between the variables.
Examples of such related variables are height and weight, income and expenditure,
and price and consumption. When the given pairs of values are plotted on ordinary graph
paper, we get a scatter diagram. If the plotted points form an upward trend on the graph
paper, then the relationship between the variables is positive. If they form a downward trend,
then the relationship between the two variables is negative.
Expense of Ad. 10 12 15 20 23 9 6 7 11 12 13
Sales (in lac) 14 17 23 21 25 11 8 9 14 13 27
[Figure: scatter diagram titled "Sales"; x-axis scale: 0 to 25; y-axis scale: 0 to 30]
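The direction of the relationship read off a scatter diagram can also be checked numerically. The sketch below uses the Pearson correlation coefficient, which is not introduced in these notes and is brought in here only as one common way to summarize an upward or downward trend. The function name is illustrative; the data are the advertisement expense and sales figures tabulated above.

```python
import math

# Paired observations from the table above.
ad_expense = [10, 12, 15, 20, 23, 9, 6, 7, 11, 12, 13]
sales = [14, 17, 23, 21, 25, 11, 8, 9, 14, 13, 27]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(ad_expense, sales)

# A positive r corresponds to an upward trend, a negative r to a downward trend.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(direction)
```

For these data the coefficient is positive, which agrees with the upward pattern one would expect to see when the pairs are plotted.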
The stem of an observation is the leading digit or digits and the leaf of an observation is
the trailing digit. All the stem values are listed in order in a column, a vertical line
is drawn beside them, and then all the corresponding leaf values are recorded for each
stem in a row, to the right of the vertical line.
a) Divide each observation into two parts: the stem and the leaf.
b) List the stems in a column, with a vertical line to their right.
c) For each observation, record the leaf portion in the same row as its corresponding
stem.
d) Order the leaves from lowest to highest in each stem.
Example: The prices (in taka) of 20 different brands of walking shoes are given below:
45 70 70 55 75 73 70 65 68 60
74 80 83 58 68 85 90 64 75 82
Construct a stem and leaf plot to display the distribution of the data.
Stem Leaf
4 5
5 5 8
6 5 8 0 8 4
7 0 0 5 3 0 4 5
8 0 3 5 2
9 0
Stem Leaf
4 5
5 5 8
6 0 4 5 8 8
7 0 0 0 3 4 5 5
8 0 2 3 5
9 0
From the display it is seen that the lowest price of a walking shoe is 45 taka and the highest is 90 taka.
The most common price is 70 taka.
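Steps a) to d) can be sketched in Python as follows. The function name is illustrative, and the sketch assumes two-digit values, so that the stem is the tens digit and the leaf is the units digit; the data are the 20 shoe prices of the example.

```python
from collections import defaultdict

# The 20 walking-shoe prices (in taka) from the example.
prices = [45, 70, 70, 55, 75, 73, 70, 65, 68, 60,
          74, 80, 83, 58, 68, 85, 90, 64, 75, 82]

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for two-digit values (stem = tens digit)."""
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)   # step a): split into stem and leaf
        plot[stem].append(leaf)          # step c): record leaf beside its stem
    # steps b) and d): stems in increasing order, leaves sorted within each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

display = stem_and_leaf(prices)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The printed rows correspond to the ordered display above, with a vertical bar standing in for the vertical line.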