CH 2


Method of Data Presentations

CHAPTER 2

Chapter Contents
2.1 Types of data
2.2 Data collection methods
2.3 Frequency distribution
2.4 Pictorial presentation
2.5 Review questions

2.1 Types of data

Before looking at the methods used to collect data (information), we first consider the types of data. Based on their source, statistical data may be classified into two basic types.

Definition 2.1. Primary data (original data) refers to data that are collected for the first time, i.e., data obtained from primary sources.

Primary statistical data can be obtained from:
(i) Census – complete enumeration of all the units of the population.
(ii) Surveys – the study of a representative part of a population.
(iii) Experimentation – observations from experiments carried out in laboratories and research centers.
(iv) Administrative processes, e.g. records of births and deaths.

Advantages of using primary data

• Contains exactly the data needed
• It is more reliable and clearer
• Contains more detailed information

Disadvantages of using primary data

• The cost of data collection is high
• It is time consuming
• The non-response rate may be high

Definition 2.2. Secondary data refers to already existing information, i.e., data obtained from secondary sources such as publications, newspapers, annual reports and the internet.

Secondary statistical data can be obtained from:

(i) Publications, e.g. extracts from publications
(ii) Research/media organizations
(iii) Educational institutions

Advantages of using secondary data

• Cheap: it is less expensive to gather.
• Saves time: the results are available promptly.
• It can be obtained quickly.

Disadvantages of using secondary data

• It may lack accuracy.
• The method by which the data were originally collected is not known to the user of the data.
Now we shall discuss the methods used for collecting data. After the problem to be solved has been identified, collecting data is the first stage of any statistical investigation.

2.2 Methods of collection of data


One of the first problems a statistician faces is obtaining data. Data doesn't just happen; data must be collected. It is important to obtain "good data", because the inferences ultimately made will be based on the sample statistics. Good data is data that accurately represents the population from which it was taken.
There are various methods we can use to collect data, such as observation, interview, questionnaire and written documents. The method used depends on the nature of the problem and the type of data to be collected.

Activity 2.1. What are the techniques or instruments of data collection? Discuss.

Some of the methods include:


1. Extraction of data from records/written documents: examples of data obtained through this technique include transport data, hospital data, prison data and accident figures.
2. Questionnaire – A set of questions or statements is assembled to get information on a variable (or a set of variables). The entire package of questions or statements is called a questionnaire. The respondents are usually people, who answer the questions or statements on the questionnaire. Copies of the questionnaire can be administered in person by its user or sent to people by post. The method is the cheapest, it is very effective and a single researcher can conduct it. Its disadvantages are a low response rate (for mailed questionnaires) and its unsuitability for an illiterate community.
3. Observation: Observational methods are used mostly in scientific inquiry, where data are observed directly from controlled experiments. They are used more in the natural sciences, through laboratory work, than in the social sciences. However, they are also very useful for studying small communities and institutions.
4. Interviewing: In this method, the person collecting the data (the interviewer) asks the respondent (the interviewee) direct questions. The interviewer has to go to the interviewees personally to collect the required information verbally. This method can also be carried out under different schemes.
Observation and interviewing are used, for example, to collect information on land area, crop output and exchange activities (purchase and sale prices).

Classification and Tabulation


Definition 2.3. Classification of data is the process of arranging items (data) into homogeneous groups, classes or categories according to their similarities and/or differences.

For example, when letters are sorted in a post office, they are classified by city and then further arranged by street.

Definition 2.4. Tabulation of data: The process of placing classified data into tabular form is known as tabulation.

Definition 2.5. A table is a systematic arrangement of statistical data in rows and columns.

Rows are horizontal arrangements, whereas columns are vertical arrangements. A table may be simple, double or complex depending on the type of classification.

Activity 2.3. What is the difference between classification and tabulation? Discuss.

2.3 Frequency Distribution


Some terms
- Raw data: are collected data that have not been organized numerically.
- An array: is an arrangement (sorting) of raw numerical data in either increasing or decreasing order of magnitude.
- Frequency: is the number of values that occur in a specific class of the distribution.
- Tallies: are marks used to facilitate counting.
Uses of an array: It makes the range of the data set easy to find. It also gives some idea of the general characteristics of the distribution.
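An array is simply the raw data sorted by magnitude, so its two ends give the lowest and highest values directly. The short Python sketch below shows this with a small list of made-up values that do not come from the text.

```python
# Hypothetical raw data (made up for this example), used only to illustrate an array.
raw_data = [23, 50, 38, 12, 75, 42]

# An array: the raw data sorted in increasing order of magnitude.
array = sorted(raw_data)

# The smallest and largest values sit at the two ends of the array,
# so the range R = H - L can be read off immediately.
lowest, highest = array[0], array[-1]
data_range = highest - lowest

print(array)       # [12, 23, 38, 42, 50, 75]
print(data_range)  # 63
```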

Definition 2.6. A frequency distribution is a grouping of data into categories that shows the number of observations in each mutually exclusive category. In other words, a frequency distribution presents raw data in table form, using classes with their corresponding frequencies.

Note 2.1. In general, grouping data requires two things: (i) dividing the data into a certain number of class intervals and (ii) finding how many observations (values) fall into each interval.

Uses of frequency distribution

The following are the uses of frequency distributions.


1. To present data in a meaningful way.
2. To determine the shape of the distribution.
3. To make calculations simple.
4. To make drawing charts and graphs easier.
5. To make comparisons among different sets of data.

Types of frequency distributions


There are three basic types of frequency distributions: (A) categorical frequency distributions, (B) ungrouped frequency distributions and (C) grouped frequency distributions. Let's see how to prepare each type.

(A) Categorical frequency distribution

It is used for data that can be placed in specific categories, such as nominal- or ordinal-level data. For example: blood type, with values A, B, AB and O.

ILLUSTRATION 2.1. Twenty-five army inductees were given a blood test to determine their blood type. The recorded data are shown below.

Data 1: Blood type data
A B B AB O O A
O B AB B B B
O A O A O O
O AB AB A O B

Construct a suitable frequency distribution for this data.

Solution: The data (blood type) are nominal, so we use a categorical frequency distribution to present them.

Table 2.1. Blood type of army inductees in a blood test


Class (blood type) Tally Frequency
A ///// 5
B /////// 7
O ///////// 9
AB //// 4
Total 25
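The tallying in Table 2.1 amounts to counting how often each category occurs. Below is a minimal Python sketch that uses the standard-library `collections.Counter`. The 25 blood types are typed in from Data 1, and the tally column is printed with "/" marks in the style of the table.

```python
from collections import Counter

# Blood types of the 25 inductees, row by row as listed in Data 1.
blood_types = (
    "A B B AB O O A "
    "O B AB B B B "
    "O A O A O O "
    "O AB AB A O B"
).split()

# Count how many observations fall in each category.
frequencies = Counter(blood_types)

# Print class, tally and frequency in the category order of Table 2.1.
for blood_type in ["A", "B", "O", "AB"]:
    count = frequencies[blood_type]
    print(f"{blood_type:<3} {'/' * count:<10} {count}")
print("Total", sum(frequencies.values()))
```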

(B) Ungrouped frequency distribution

We use this type of frequency distribution to present raw data when the range of the data values is small. It is mostly useful for small sets of discrete data. Discrete data are generated by counting.

ILLUSTRATION 2.2. A survey taken in a restaurant recorded the following numbers of cups of coffee consumed with each meal. Construct a suitable frequency distribution for this data.

Data 2: Coffee consumed (in cups)
0, 2, 2, 1, 1, 2, 3, 5, 3, 2,
2, 2, 1, 0, 1, 2, 4, 2, 0, 1,
0, 1, 4, 4, 2, 2, 0, 1, 1, 5

Solution:
Step1 Find the range of the data: R = 5 − 0 = 5. Since the range is small, classes consisting of a single data value can be used. They are 0, 1, 2, 3, 4 and 5.
Step2 Make a table with two columns (class and frequency).
Step3 Tally the data.
Step4 Complete the frequency (f) column.

Table 2.2. Coffee consumption of people surveyed at a restaurant


Class (cups of coffee consumed) Tally Frequency (number of people)
0 ///// 5
1 //////// 8
2 ////////// 10
3 // 2
4 /// 3
5 // 2
Total 30
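Steps 1 to 4 of Illustration 2.2 can be sketched in Python as follows. The single-value classes run from the smallest to the largest observation. Under this choice, any value in that range that was never observed would still get a row, with frequency 0.

```python
from collections import Counter

# Cups of coffee consumed with each meal (Data 2).
cups = [0, 2, 2, 1, 1, 2, 3, 5, 3, 2,
        2, 2, 1, 0, 1, 2, 4, 2, 0, 1,
        0, 1, 4, 4, 2, 2, 0, 1, 1, 5]

# Step 1: the range is small, so each single value becomes its own class.
data_range = max(cups) - min(cups)
classes = range(min(cups), max(cups) + 1)

# Steps 2-4: tally the data and complete the frequency column.
counts = Counter(cups)
table = [(value, counts[value]) for value in classes]

print("R =", data_range)
for value, f in table:
    print(value, "/" * f, f)
print("Total", sum(f for _, f in table))
```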

(C) Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes that are more than one unit wide. The concepts below are used when data are presented in a grouped frequency distribution (g.f.d.).
1. Lower class limits (LCL): represents the smallest data value that can be included in the
class.
2. Upper class limits (UCL): represents the largest data value that can be included in the
class.
3. Class boundaries (CBs): these are numbers used to separate the classes so that there are no gaps in the frequency distribution. The class limits should have the same decimal place value as the data, but the class boundaries have one additional decimal place and always end in the digit 5.
4. Unit of measurement (U): is the smallest difference between any two values of the variable being studied. It is needed to convert the class limits into class boundaries: half of U is subtracted from each lower class limit and added to each upper class limit.
5. The class width (W): for a class in a frequency distribution, the width is found by subtracting the lower (or upper) class limit of the previous class from the lower (or upper) class limit of the class. Equivalently, it is the difference between the UCB and the LCB of any class.
6. Cumulative frequency (cf): shows how many values are accumulated up to and including a specific class.
7. Relative frequency (rf): is the frequency of a class divided by the total frequency (i.e. the sum of all frequencies). Multiplied by 100, it gives the percentage of values falling in that class.
8. Class mark or midpoint: is denoted by Xmi or CMi. It is obtained by adding the upper and lower class boundaries and dividing by 2, or by adding the LCL and the UCL and dividing by 2.
9. The class interval: is the range of values of each class. For example, the class interval 10 – 20 contains all the values between 10 and 20. Its midpoint is 15, its lower class limit is 10 and its upper class limit is 20.
10. Class: is one of the distinct, non-overlapping groups into which the data are divided.
11. Open-end classes: A class in a frequency table that has either no lower class limit or no upper class limit is called an open-end class. We avoid open-end classes in practice because they create problems in calculations. For example:
Weights (Pounds) No of Persons
Below 110 6
110 - 120 12
120 - 130 20
130 - 140 10
Above 140 2
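A few lines of Python can check how class limits, class boundaries, class width and class mark relate to one another. The sketch assumes integer data, so the unit of measurement is U = 1. The input classes 10-19, 20-29 and 30-39 are the first three classes of Table 2.3, and the helper name `describe_classes` is hypothetical.

```python
def describe_classes(limits, unit):
    """Hypothetical helper: derive boundaries, midpoints and widths from (LCL, UCL) pairs.

    `unit` is the unit of measurement U; half of it is subtracted from each
    lower class limit and added to each upper class limit.
    """
    half = unit / 2
    rows = []
    for lcl, ucl in limits:
        lcb, ucb = lcl - half, ucl + half   # class boundaries
        midpoint = (lcl + ucl) / 2          # same as (lcb + ucb) / 2
        width = ucb - lcb                   # W = UCB - LCB
        rows.append((lcl, ucl, lcb, ucb, midpoint, width))
    return rows

rows = describe_classes([(10, 19), (20, 29), (30, 39)], unit=1)
for row in rows:
    print(row)  # first row: (10, 19, 9.5, 19.5, 14.5, 10.0)
```

The width of 10 also equals the difference between successive lower class limits (20 − 10), which matches the first description of W above.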

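Definition 2.7 (Cumulative frequency, less-than type). The less-than cumulative frequency of a class is the sum of the frequencies accumulated up to the upper class boundary (UCB) of that class. For k classes with frequencies $f_1, f_2, \ldots, f_k$, the less-than cumulative frequencies are

$$\text{Lcf} = f_1,\; f_1 + f_2,\; \ldots,\; f_1 + f_2 + \cdots + f_k$$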

Note 2.2. The relative frequency shows what fractional part or portion of
the total frequencies belongs to the corresponding class. The sum of all the
relative frequencies in the frequency distribution is always one.
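As an example, applying Note 2.2 to the blood-type frequencies of Table 2.1 gives the relative frequencies and percentages computed below. The small Python sketch also checks that the relative frequencies sum to one, up to floating-point rounding.

```python
# Frequencies from Table 2.1 (blood types of 25 inductees).
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency; x 100 gives a percentage.
relative = {k: f / total for k, f in frequencies.items()}
percent = {k: 100 * rf for k, rf in relative.items()}

for k in frequencies:
    print(f"{k:<3} rf = {relative[k]:.2f}  ({percent[k]:.0f}%)")

# The relative frequencies add up to one (up to floating-point rounding).
print(round(sum(relative.values()), 10))
```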

Note 2.3. The Xmi is the numerical location of the center of the class interval. Midpoints are needed for graphing the frequency polygon. They are also used in computing the mean and the standard deviation.

Rules to construct a frequency distribution

1. There should be between five and twenty classes, i.e. 5 ≤ k ≤ 20.
2. The class width should preferably be an odd number. This ensures that the midpoint of each class has the same decimal place value as the data.
3. The classes must be mutually exclusive. Mutually exclusive classes have non-overlapping class limits, so a data value cannot be placed into two classes.
4. The classes must be continuous (there should be no gaps between them).
5. The classes must be exhaustive: there should be enough classes to accommodate all the data.
6. The classes must be of equal width, unless there are open-ended classes.

Summary table for constructing a grouped frequency distribution

Step1 Find the range.


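$R = H - L$   (2.3.1)

where H is the highest and L the lowest data value.

Step2 Select the number of classes desired.

$k = 1 + 3.322 \log_{10} n$   (2.3.2)

where n is the number of observations.

Step3 Find the class width and round up if necessary.

$W = \dfrac{R}{k}$   (2.3.3)
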
Step4 Select a starting point (usually the lowest value); this becomes the lower limit of the 1st class. Then add the width successively to get the lower limits of the next classes.
Step5 Find U (unit of measurement).
Step6 Find the upper class limits.
Step7 Find the class boundaries.
Step8 Tally the data.
Step9 Find the frequencies.
Step10 Find the cumulative frequencies.
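Steps 1 to 7 can be sketched as a small Python function. The sketch makes three assumptions: a base-10 logarithm in (2.3.2), k rounded to the nearest whole number, and W rounded up to the next integer. These are the choices made in Illustration 2.3. The function name `build_classes` and its parameters are hypothetical, and tallying (Steps 8 to 10) is left out.

```python
import math

def build_classes(low, high, n, unit=1, start=None):
    """Hypothetical helper following Steps 1-7 of the summary.

    low, high: smallest (L) and largest (H) data values; n: number of values;
    unit: unit of measurement U (Step 5); start: lower limit of the first class
    (defaults to the lowest value, as suggested in Step 4).
    """
    r = high - low                                   # Step 1: R = H - L
    k = round(1 + 3.322 * math.log10(n))             # Step 2: number of classes
    w = math.ceil(r / k)                             # Step 3: W = R / k, rounded up
    lcl = low if start is None else start            # Step 4: starting point
    classes = []
    while lcl <= high:                               # keep going until H is covered
        ucl = lcl + w - unit                         # Step 6: upper class limit
        boundaries = (lcl - unit / 2, ucl + unit / 2)  # Step 7: class boundaries
        classes.append(((lcl, ucl), boundaries))
        lcl += w                                     # next lower class limit
    return r, k, w, classes

# Summary values of the marks in Illustration 2.3: L = 12, H = 77, n = 50, start at 10.
r, k, w, classes = build_classes(12, 77, 50, start=10)
print(r, k, w)  # 65 7 10
for limits, boundaries in classes:
    print(limits, boundaries)
```

The loop stops once the highest value is covered, so the number of classes actually produced can differ slightly from k when a different starting point is chosen.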

ILLUSTRATION 2.3. Construct a frequency distribution, with a suitable class interval size, for the marks obtained by the 50 students of a class given below.

23, 50, 38, 42, 63, 75, 12,
33, 26, 39, 35, 47, 43, 52, 56,
59, 64, 77, 15, 21, 51, 54, 72,
68, 36, 65, 52, 60, 27, 34, 47,
48, 55, 58, 59, 62, 51, 48, 50,
41, 57, 65, 54, 43, 56, 44, 30,
46, 67, 53.

Solution:
Step1 Arrange the marks in ascending order: 12, 15, ..., 75, 77. Then find the range. Minimum value = 12; maximum value = 77 ⇒ R = 77 − 12 = 65.
Step2 Decide the number of classes: k = 1 + 3.322 log n = 1 + 3.322 log 50 ≈ 6.64, or approximately 7.
Step3 Class interval size (h or w) = R/k = 65/7 ≈ 9.3 ≈ 10.
Step4 Select a starting point. Let it be 10; it is taken as LCL₁, the lower class limit of the first class.
Step5 The completed table is given below.

Table 2.3. Mark distribution of 50 students

Class  C.L    C.B        Xmi   fi  <F  >F
1      10-19  9.5-19.5   14.5  2   2   50
2      20-29  19.5-29.5  24.5  4   6   48
3      30-39  29.5-39.5  34.5  7   13  44
4      40-49  39.5-49.5  44.5  10  23  37
5      50-59  49.5-59.5  54.5  16  39  27
6      60-69  59.5-69.5  64.5  8   47  11
7      70-79  69.5-79.5  74.5  3   50  3

C.L = class limits, C.B = class boundaries, Xmi = class mark, fi = number of students, <F = less-than cumulative frequency, >F = more-than cumulative frequency.

Note 2.4. To find the class boundaries, we take half of the difference between the lower class limit of the 2nd class and the upper class limit of the 1st class: (20 − 19)/2 = 0.5. This value is subtracted from each lower class limit and added to each upper class limit to get the required class boundaries.
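The whole of Table 2.3 can be reproduced from the raw marks. The Python sketch below hard-codes the class limits chosen in Steps 3 and 4 (width 10, starting at 10). It counts the marks in each class and then derives the boundaries, midpoints and both kinds of cumulative frequency. The more-than value of a class is the total minus the less-than value of the previous class.

```python
from itertools import accumulate

# Marks of the 50 students in Illustration 2.3.
marks = [23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56,
         59, 64, 77, 15, 21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47,
         48, 55, 58, 59, 62, 51, 48, 50, 41, 57, 65, 54, 43, 56, 44, 30,
         46, 67, 53]

# Class limits 10-19, 20-29, ..., 70-79 (width 10, starting point 10).
limits = [(lcl, lcl + 9) for lcl in range(10, 80, 10)]

# Tally: count the marks that fall inside each pair of class limits.
freqs = [sum(lcl <= m <= ucl for m in marks) for lcl, ucl in limits]

# Less-than cf is a running total; more-than cf = n minus the previous less-than cf.
less_than = list(accumulate(freqs))
n = len(marks)
more_than = [n - c + f for c, f in zip(less_than, freqs)]

for (lcl, ucl), f, lf, mf in zip(limits, freqs, less_than, more_than):
    midpoint = (lcl + ucl) / 2
    print(f"{lcl}-{ucl}  {lcl - 0.5}-{ucl + 0.5}  {midpoint}  {f}  {lf}  {mf}")
```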

Activity 2.4. Find the less-than and more-than cumulative frequencies.

Table 2.4. Distribution of female workers in different garment industries


Number of workers <100 100-150 150-200 200-250 250-300 300+
Number of industries (f) 10 28 65 35 15 8

Solution:

Table 2.5. Distribution of female workers in different garment industries


Number of workers <100 100-150 150-200 200-250 250-300 300+
Number of industries 10 28 65 35 15 8
<F 10 38 103 138 153 161
>F 161 151 123 58 23 8
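Here is a short Python sketch of Activity 2.4. Less-than cumulative frequencies are running totals from the first class, and more-than cumulative frequencies are running totals from the last class. As a check, the last less-than value and the first more-than value must both equal the total number of industries, 10 + 28 + 65 + 35 + 15 + 8 = 161.

```python
from itertools import accumulate

# Frequencies from Table 2.4 for the classes <100, 100-150, ..., 300+.
classes = ["<100", "100-150", "150-200", "200-250", "250-300", "300+"]
f = [10, 28, 65, 35, 15, 8]

# Less-than type: running total from the first class onwards.
less_than = list(accumulate(f))

# More-than type: running total from the last class backwards.
more_than = list(accumulate(reversed(f)))[::-1]

for c, lf, mf in zip(classes, less_than, more_than):
    print(f"{c:<8} <F = {lf:<4} >F = {mf}")
```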

2.4 Graphs and Diagrams


Diagrams and graphs are among the most effective and interesting alternative ways to present statistical data. Data organized in a frequency distribution can therefore also be displayed diagrammatically or graphically. Usually, diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for presenting continuous data.

Uses of graphs and diagrams


1. They are attractive to the eye. Because graphs are eye-catching, they convey messages easily. For example, when reading books or newspapers, you look at the pictures first.

2. They are helpful for memorizing facts, because the impressions created by diagrams and graphs stay in the mind for a long time.

3. They facilitate comparison. They help one make quick and accurate comparisons of data and bring out hidden facts and relationships. The information presented can be understood at a glance.

Graphs for quantitative data

These include the histogram, the frequency polygon, the cumulative frequency curve (ogive curve), the line graph, etc.
1. Histogram – is a graph that displays the data by using vertical bars
[rectangles] of various heights to represent the frequencies.

Figure 2.1. Mark distribution of 50 students shown in Table 2.3 (histogram of marks; x-axis: marks, y-axis: frequency).

2. The frequency polygon – is a graph that displays the data by using lines that connect the points plotted for the frequencies at the midpoints of the classes. Each point's height above its midpoint is the class frequency. The dotted lines at both ends are not part of the data; they are needed to close the polygon.
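The points of the frequency polygon for the marks in Table 2.3 are listed by the sketch below. It assumes the usual construction: the polygon is closed by adding a zero-frequency point one class width before the first midpoint and one class width after the last midpoint. These added points are the dotted ends of the polygon.

```python
# Class midpoints and frequencies from Table 2.3 (marks of 50 students).
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5]
freqs = [2, 4, 7, 10, 16, 8, 3]
width = 10  # class width W

# One (midpoint, frequency) point per class ...
points = list(zip(midpoints, freqs))

# ... plus a zero-frequency point at each end (midpoints of the empty classes
# just before the first and just after the last class) to close the polygon.
points = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]
print(points)
```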

Figure 2.2. Mark distribution of 50 students shown in Table 2.3 (frequency polygon; x-axis: marks (midpoint), y-axis: frequency).

3. The Ogive curve – is a graph that represents the cumulative frequencies for the classes in a frequency distribution. The less-than ogive shows visually how many values are below a certain upper class boundary; the greater-than ogive shows how many values are above a certain lower class boundary.
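The plotting coordinates of the two ogives for Table 2.3 are computed below. The sketch assumes a common convention. The less-than ogive plots <F against the upper class boundaries, starting from 0 at the first lower boundary. The greater-than ogive plots >F against the lower class boundaries, ending at 0 at the last upper boundary.

```python
from itertools import accumulate

# Class boundaries and frequencies from Table 2.3.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
upper_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [2, 4, 7, 10, 16, 8, 3]
n = sum(freqs)

# Less-than ogive: cumulative frequency at each upper class boundary,
# starting from 0 at the lower boundary of the first class.
less_cf = list(accumulate(freqs))
less_ogive = [(lower_boundaries[0], 0)] + list(zip(upper_boundaries, less_cf))

# Greater-than ogive: number of values above each lower class boundary,
# ending at 0 at the upper boundary of the last class.
more_cf = [n - c + f for c, f in zip(less_cf, freqs)]
more_ogive = list(zip(lower_boundaries, more_cf)) + [(upper_boundaries[-1], 0)]

print(less_ogive)
print(more_ogive)
```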

Figure 2.3. Less-than ogive (y-axis: <F) and greater-than ogive (y-axis: >F); x-axis: marks.

Diagrams for qualitative data

Types of Diagrams/Charts: These include bar charts, pie charts, etc.


4. Simple Bar Chart: A simple bar chart is used to represent data involving only one variable classified on a spatial, quantitative or temporal basis. In a simple bar chart, we draw bars of equal width but variable length, i.e. the magnitude of a quantity is represented by the height or length of the bars. The following steps are undertaken in drawing a simple bar diagram.
• Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate place on the paper.
• Take the basis of classification along the horizontal line (X-axis) and the observed variable along the vertical line (Y-axis), or vice versa.
• Mark bars of equal breadth for each class. Between two classes, leave a gap equal to the bar breadth, or at least half of it.
• Finally, mark the values of the given variable to draw the required bars.

ILLUSTRATION 2.4. Draw a simple bar diagram to represent the profits of a bank for 5 years.

Table 2.6. Profits of a bank for 5 years


Years 1989 1990 1991 1992 1993
Profit (million $) 10 12 18 25 42

Solution:

Figure 2.4. Profits of a bank for 5 years

5. Multiple Bar Charts – A multiple bar diagram represents two or more sets of interrelated data, which makes it easy to compare more than one phenomenon. It is drawn with the technique of the simple bar chart, but different shades, colors or dots distinguish the different phenomena. Multiple bar charts are used when the total of the different phenomena is not meaningful.

ILLUSTRATION 2.5. Draw a multiple bar chart to represent the imports and exports of Canada (values in $) for the years 1991 to 1995.
Years Imports Exports
Solution:

Figure 2.5. The import and export of Canada: from 1991 - 1995.

6. Component Bar Chart: A sub-divided or component bar chart is used to represent data in which the total magnitude is divided into different parts or components. In this diagram, we first draw a simple bar for each class showing the total magnitude of that class. We then divide each bar into parts in the ratio of the various components. This type of diagram shows the variation in the different components within each class as well as between different classes. A sub-divided bar diagram is also known as a component bar chart or stacked chart.
7. Pie chart: A pie chart is a circle divided into sections according to the percentage of frequencies in each class/category of the distribution. A pie chart can be used to compare the whole with its components. Circles are drawn with radii proportional to the square root of the quantities, because the area of a circle is πr² (so the areas are then proportional to the quantities).

ILLUSTRATION 2.6. The following table gives the details of the monthly budget of a family. Represent these figures by a suitable diagram.

Table 2.7. Monthly budget of a family


Item of Expenditure Family Budget Angle of sectors Percentage
Food 144
Clothing 24
House Rent 96
Fuel and Lighting 24
Miscellaneous 72
Total 360
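A sector's angle is the item's share of the total multiplied by 360°, so its percentage is angle / 360 × 100. The Python sketch below assumes that the numbers listed in Table 2.7 are the sector angles, as their total of 360 suggests, and recovers the percentage column from them. The table does not give the family budget amounts, so they are not computed. The helper `sector_angle` is a hypothetical name for the forward calculation from an amount to an angle.

```python
# Values listed in Table 2.7, assumed here to be the sector angles (they sum to 360).
angles = {
    "Food": 144,
    "Clothing": 24,
    "House Rent": 96,
    "Fuel and Lighting": 24,
    "Miscellaneous": 72,
}

def sector_angle(value, total):
    """Hypothetical helper: angle of a pie-chart sector, (value / total) x 360 degrees."""
    return value / total * 360

# Percentage of each item: its share of the full circle times 100.
percentages = {item: angle / 360 * 100 for item, angle in angles.items()}

for item, pct in percentages.items():
    print(f"{item:<18} {angles[item]:>4} deg  {pct:5.2f}%")
print("Total", sum(angles.values()), "deg")

# Forward direction: an item making up 40% of the total gets a 144-degree sector.
print(sector_angle(40, 100))
```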

Solution: The necessary computations are given below:

Figure 2.6. Monthly budget of a family

2.5 Review questions


I. The following table shows the types of cars manufactured by a certain company during 1972-1975.
Car type 1972 1973 1974 1975
Toyota 400 300 380 450
Nissan 260 340 350 390
Isuzu 330 310 445 470
Construct a component bar chart and a multiple bar chart.
II. You want to find out whether your respondents smoke cigarettes, how many they smoke and what they think about smoking in public places. Design a short questionnaire of no more than five questions to establish this.
III. The daily emission (in tons) of sulfur oxides from an industrial plant is shown below. Answer the questions that follow based on these data.
15.8 26.4 17.3 11.2 23.9 24.8 18.7 13.9 9.0 13.2
22.7 9.8 6.2 14.7 17.5 26.1 12.8 28.6 17.6 23.7
26.8 22.7 18.0 20.5 11.0 20.9 15.5 19.4 16.7 10.7
19.1 15.2 22.9 26.6 20.4 21.4 19.2 21.6 16.9 19.0
18.5 23.0 24.6 20.1 16.2 18.0 7.7 13.5 23.5 14.5
14.4 29.6 19.4 17.0 20.8 24.3 22.5 24.6 18.4 18.1
8.3 21.9 12.3 22.3 13.3 11.8 19.3 20.0 25.7 31.8
25.9 10.5 15.9 27.5 18.1 17.9 9.4 24.1 20.1 28.5
1. Construct a frequency distribution table.
2. Draw i) a histogram, ii) a frequency polygon, iii) an ogive curve.
