Stat 1&2

Financial Statistics
Copyright
© All Rights Reserved
CHAPTER - ONE: INTRODUCTION

1.1 Definition and Classification of Statistics

Definition of Statistics
The word 'statistics' is derived from the Latin word 'status', which means a "political state." Clearly,
statistics is closely linked with the administrative affairs of a state, such as facts and figures
regarding the defense force, population, housing, food, financial resources, etc.

The word 'statistics' has several meanings.

• In the first place, it is a plural noun that describes a collection of numerical data, such as
employment statistics, accident statistics, population statistics, economic statistics, and
agricultural statistics. It is in this sense that the word 'statistics' is usually understood by a layman.

• Secondly, the word 'statistics' as a singular noun describes a branch of applied
mathematics whose purpose is to provide methods for dealing with collections of data and
extracting information from them in compact form by tabulating, summarizing, and
analyzing the numerical data or a set of observations.

Classification of Statistics
Statistics may be divided into two main branches:
(1) Descriptive Statistics (2) Inferential Statistics
STATISTICS
   Descriptive Statistics: collecting, organizing, summarizing, presenting
   Inferential Statistics: making inferences, hypothesis testing, determining relationships

• Descriptive statistics includes statistical methods for the collection, presentation, and
characterization of a set of data in order to describe the various features of the data.
• In general, methods of descriptive statistics include graphic methods (bar chart, pie chart, etc.)
and numeric measures (mean, median, variance, etc.).
• Descriptive statistics do not, however, allow us to draw conclusions beyond the data we
have analyzed. They are simply a way to describe data.
• Meaningful and pertinent information cannot be obtained from raw data unless the data are summarized
with the tools of descriptive statistics.
• Descriptive statistics, therefore, allow us to present the data in a more meaningful way
that makes the data easier to interpret.

• For example, a biologist collected blood samples from 10 students in the biology
department to study blood types and obtained the following data:

O A O AB A O O B A O
• A summary measure, such as the proportion of students in the sample with blood type O
(5 out of 10, or 50%), is an example of descriptive statistics.
• We can also describe the data using bar or pie charts.
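
The blood-type summary above can be reproduced with a short sketch. It is only an illustration: the variable names are our own, and the data are the ten observations listed in the example.

```python
from collections import Counter

# Blood types of the 10 sampled students, as listed in the example.
blood_types = ["O", "A", "O", "AB", "A", "O", "O", "B", "A", "O"]

counts = Counter(blood_types)   # frequency of each blood type
n = len(blood_types)            # sample size

# Proportion of each blood type in the sample (a descriptive summary).
proportions = {blood: count / n for blood, count in counts.items()}

for blood in sorted(proportions):
    print(f"{blood}: {counts[blood]} of {n} ({proportions[blood]:.0%})")
```

The proportions describe only these ten students; saying anything about all biology students would require the inferential methods discussed next.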

• Inferential statistics: Inferential statistics is used to make valid inferences about a larger
group from sample data, which helps managers and professionals make effective decisions.

• Inferential statistics includes statistical methods that facilitate estimating the
characteristics of a population, or making decisions concerning a population, on the basis of
sample results.
• In this regard, methods such as estimation and hypothesis testing are examples of inferential
statistics.
• Example: If an HR manager uses the average salary of employees in the
production department to estimate the overall average salary across all departments in
the firm, the method falls under inferential statistics.

• Note: Statistical methods such as estimation, prediction, and hypothesis testing belong to
inferential statistics. Researchers draw conclusions from the collected
sample data about the characteristics of the larger population from which the samples are
taken. Whenever generalizations or decisions are made from incomplete (sample)
information, the method used is considered inferential statistics.

1.2 Stages in statistical investigation

A statistical study might involve the following four stages: collecting data, organizing and presenting
the collected data, analyzing the data, and interpreting the results.

Stage 1: Data collection: this stage involves acquiring data related to the problem at hand.
Stage 2: Organizing and presenting data: this stage involves classifying or sorting the
collected data based on some characteristics or attributes such as age, sex, marital status, etc.
Further, we may use tables, graphs, charts, and so on to present the data.

Stage 3: Data analysis: a thorough scrutiny or analysis of the data is necessary in order to reach
conclusions or provide answers to a problem. The analysis might require simple or sophisticated
statistical tools, depending on the type of answers that have to be provided.
Stage 4: Interpretation of the result: a statistical analysis has to be followed by
conclusions in order to support a decision. This
last step of a statistical study is referred to as interpretation.

1.3 Definition of some terms

Population: the totality (collection) of all individuals, objects, or items under consideration.
• It consists of all elements, individuals, items, or objects whose characteristics are being
studied.
• The population that is being studied is called the target population.
Sample: a portion of the population selected for study.
Sample survey: the technique of collecting information from a portion of the population.
Census survey: a survey that includes every member of the population.
Variable: a characteristic under study that assumes different values for different elements.
Quantitative variable: a variable that can be measured numerically.
• The data collected on a quantitative variable are called quantitative data.
• Examples include weight, height, number of students in a class, number of car accidents, etc.
Qualitative variable: a variable that cannot assume a numerical value but can be classified into
two or more non-numerical categories.
• The data collected on such a variable are called qualitative or categorical data.
• Examples include sex, blood type, marital status, religion, etc.
Discrete variable: a variable whose values are countable.
• Examples include the number of patients in a hospital, the number of white blood cells in a droplet
of a blood sample, the number of rodents per plot of farmland, etc.
Continuous variable: a variable that can assume any numerical value over a certain interval or
intervals.
• Examples include the weight of newborn babies, the height of seedlings, temperature
measurements, etc.
Parameter: a statistical measure obtained from population data.
• Examples include the population mean, proportion, and variance.
Statistic: a statistical measure obtained from sample data.
• Examples include the sample mean, proportion, and variance.

Unit of analysis: the type of thing being measured in the data, such as persons,
families, households, states, nations, etc.

1.4 Applications, uses and limitations of statistics

Application of statistics
• Statistics has become a very important subject area, and
various statistical tools are used to solve problems in everyday life, research,
marketing, planning, production and quality control, and other areas.
Nevertheless, statistics has its own limitations, and it can also be misused.

Limitation of statistics
• Statistics deals only with those subjects of inquiry that can be quantitatively
measured and numerically expressed.

• Statistics deals only with aggregates of facts; no importance is attached to individual items.

• Statistical data are only approximately, not mathematically, correct.

• Statistics is liable to be misused.

• Hence, expertise in the subject is essential, and honesty is very important in the use of
statistics.

1.5 Scales of measurement

If we use different types of measurement scales having different levels of refinement to measure
one and the same object, we obtain different amounts and types of information about a variable
under consideration.
Formally, we distinguish among four levels of measurement scales, and, therefore, among four
types of data.

Nominal scale: the simplest measurement scale.

• Values on a nominal scale are used merely to categorize the quantity being measured;
there is no natural ordering of the levels or values of the scale.

• For example, the sex of an individual may be male or female; there is no natural ordering of the two sexes.

• Other examples include religion, blood type, eye color, marital status, etc.

• The values of a nominal scale can be coded using numbers; however, we cannot
perform meaningful mathematical operations on the numbers used as codes.

Ordinal scale: this measurement scale is similar to the nominal scale, but the levels or categories
can be ranked or ordered.

• That is, we can compare levels or categories of the scale.

• Therefore, this scale of measurement gives better information on the quantities being
measured than the nominal scale does.

• For example, the living standard of a family can be poor, medium, or high.

• These categories can be ordered: poor is less than medium, and medium is less than high.
However, the distance or magnitude between the levels, say between poor and
medium, is not clearly known.

Interval scale: this measurement scale shares the ordering (ranking) and labeling properties of the
ordinal scale of measurement.

• In addition, the distance or magnitude between two values is clearly known (meaningful).

• However, it lacks a true zero point (i.e., the zero point is not meaningful).

• For example, consider the temperature of an object in degrees centigrade or Fahrenheit. If the temperature
of an object is zero degrees centigrade, it doesn't mean that the object lacks heat.

• Hence zero is an arbitrary point on the scale.

• It doesn't make sense to say that 80° F is twice as hot as 40° F; in centigrade the ratio would
be 6; neither ratio is meaningful.

• We can do subtraction and addition on interval-level data, but division and multiplication are
not meaningful.
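
The temperature example can be checked numerically. The sketch below is only an illustration with names of our own choosing; it converts the two Fahrenheit readings to centigrade and compares ratios and differences on both scales.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (deg_f - 32) * 5 / 9

hot_f, cool_f = 80.0, 40.0
hot_c = fahrenheit_to_celsius(hot_f)     # about 26.67 degrees centigrade
cool_c = fahrenheit_to_celsius(cool_f)   # about 4.44 degrees centigrade

# Ratios depend on the arbitrary zero point, so they disagree across scales.
ratio_f = hot_f / cool_f   # 2 on the Fahrenheit scale
ratio_c = hot_c / cool_c   # about 6 on the centigrade scale

# Differences only change by the fixed unit factor 5/9, so they stay meaningful.
diff_f = hot_f - cool_f
diff_c = hot_c - cool_c

print(f"ratio in F: {ratio_f:.2f}, ratio in C: {ratio_c:.2f}")
print(f"difference in F: {diff_f:.2f}, difference in C: {diff_c:.2f}")
```

The same pair of temperatures gives a ratio of 2 on one scale and about 6 on the other, which is why ratios are not used with interval-level data.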

Ratio scale: the highest level of measurement scale. It shares the ordering, labeling, and
meaningful-distance properties of the interval scale.

• In addition, it has a true or meaningful zero point.

• The existence of a true zero makes the ratio of two measures meaningful.

• For instance, if your salary is 1000 birr and your wife's is 2000 birr, we can say that your wife
earns twice as much as you do.

• If you don't have any source of income, your income is zero on this scale, and this is a
meaningful assignment.

• Other examples include weight, height, volume measurements, etc.

• We can do subtraction, addition, multiplication, and division on ratio-level data.

• The most precise scale is the ratio scale, and the least precise is the nominal scale.

• Ratio- and interval-level data are classified as quantitative variables, and nominal- and
ordinal-level data are classified as qualitative variables.

CHAPTER - TWO

METHODS OF DATA COLLECTION AND PRESENTATION

Objectives:
After completing this unit, you should be able to
 Organize data using frequency distribution.
 Present data using suitable graphs or diagrams.
Introduction

The amount of data collected in real-life situations is often too large to work with directly, so we need
methods to organize it. One such method is grouping, that is, putting data into groups rather than treating
each observation individually.

In fact, raw data provide little, if any, information to decision makers. Thus, they need a means of
converting the raw data into useful information. Hence, the purpose of this chapter is to introduce
tools used for data presentation.

2.1 Classification and tabulation of data

The use of classifying and tabulating data is to display the points of similarity and dissimilarity; to
save mental strain by systematic condensation and suppression of irrelevant detail; to enable one to
form a mental picture of objects of perception; and to prepare the ground for comparison and
inference.
Types of classification
1. Geographical- in terms of cities, districts, countries etc.
2. Chronological - on the basis of time
3. Qualitative - according to some qualitative characteristics.
4. Quantitative – in terms of magnitude.
One can also use a combination of these to classify data.

Tabulation: tables may be classified according to the number of characteristics used for tabulation.
1. Simple or one way table: it uses only one characteristic or variable for classification.

Example 2.1: Students who took Introduction to Statistics in 1998 E.C., by gender.
Gender    Number
Male      2000
Female     700

2. Two-way tables: it uses two characteristics for classification.

Example 2.2: Students who took Introduction to Statistics in 1998 E.C., by age and gender.
Age             Number of males    Number of females
19 and below          200                 180
20-25                1415                 385
26 and above          385                 135

3. Higher-order tables: these result when we use more than two characteristics for classification. For
instance, we can classify the students who took Introduction to Statistics in 1998 E.C. by age, gender,
and faculty.

2.2 Introduction to methods of data collection

There are many data collection techniques that can be used to collect data for a study. There
are two types of data: primary and secondary. Primary data refers to statistical material
that the investigator originates for the purpose of the inquiry. Secondary data, on the other hand,
refers to statistical material that is not originated by the investigator, but is
obtained from someone else's records.

Primary methods of data collection: Methods that aim at collecting primary data are
termed primary methods. These may involve data collection using observation, personal
interviews, self-administered questionnaires, mailed questionnaires, etc.

Secondary method of data collection: Secondary data can be obtained from published or
unpublished documents: reports, journals, magazines, articles, etc.
Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is
statistical data when the numbers are

• Comparable
• Meaningful and
• Collected for a well-defined objective
Raw data: collected data that have not been organized numerically.
Example: 25, 10, 32, 18, 6, 93, 4.
Array: an arrangement of raw numerical data in ascending or descending order of magnitude.
• It enables us to find the range of the data set easily, and it also gives us some idea about
the general characteristics of the distribution.
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.

Primary source: a source of data that supplies first-hand information for the
immediate purpose.
• Primary data: data originally collected for the immediate purpose.
- Primary data are more expensive to collect than secondary data.
Secondary source: individuals or agencies that supply data originally collected for other
purposes by them or by others.
- Usually, these are published or unpublished materials, records, reports, etc.
• Secondary data: data collected from a secondary source.
The process of data collection from a primary source may involve field trials, laboratory
experiments, surveys (sample surveys and census surveys), etc.

2.3 Methods of Data Presentation


2.3.1 Frequency distributions

In this section, we will concentrate on some of the frequently used methods of organizing data. The
easiest method of organizing data is a frequency distribution, which converts raw data into a
meaningful pattern for statistical analysis.
The main uses of a frequency distribution are
• to organize data in a meaningful, intelligible way.
• to enable one to determine the nature or shape of the distribution: how the observations
cluster around a central value, and how the values spread around the center of the data.
• to facilitate computational procedures for measures of average and spread.
• to enable one to draw charts and graphs for the presentation of data.
• to enable one to make comparisons between data sets.
Frequency distribution: a grouping of data into categories showing the number of observations in
each mutually exclusive category.
Array: data put in an ascending or descending order of magnitude.
Grouped data: data presented in the form of a frequency distribution.
Frequency: the number of observations corresponding to a fixed value or to a class of values.
Relative frequency: the number obtained when the frequency of a class is divided by total number
of observations.
Generally, there are three basic types of frequency distributions: Categorical, Ungrouped and
Grouped frequency distributions.

1. Categorical frequency distribution


– the data are usually qualitative
– the scales of measurements for the data are usually nominal or ordinal

The categorical frequency distribution is used for data which can be placed in specific categories
such as nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, blood type, marital status, or major field of study would use categorical frequency
distributions.

Example 1.1: The following data are the political party affiliations of a sample of 40 statistics
students. D, R, and O stand for Democratic, Republican, and Other, respectively.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.

Table 2.12: Number of students by political party affiliation.

Class         Frequency    Relative frequency
Democratic       13             0.325
Republican       18             0.450
Other             9             0.225
Total            40             1.000
Example 1.2: Last year, thirty students took the Stat 273 course, and their grades were as follows. Construct an
appropriate frequency distribution for these data.

B B C B A C

D C C C B B

B A B C D C

A F B F C A

B C C A C D

There are five kinds of grades: A, B, C, D and F which may be used as the classes for constructing the
distribution. The procedure for constructing a frequency distribution for categorical data is given below.

STEP 1. Construct a table as shown below.

Class    Tally    Frequency    Percent*
(I)      (II)     (III)        (IV)
A
B
C
D
F

STEP 2. Tally the data and place the results in column (II).
STEP 3. Count the tallies and put the results in column (III).
STEP 4. Calculate the percentage of the frequencies in each class using the formula
$\% = \frac{f}{n} \times 100$
where f = frequency of the class (the result in column (III)) and
n = total number of observations.
* Percentages are normally not part of a frequency distribution, but they can be included since they are
important in different statistical analyses.

STEP 5. [For checking] Find the totals of columns (III) and (IV), and verify that they equal n (the total
number of observations) and 100% (up to rounding), respectively.
Finally, the frequency distribution is as follows.

Class    Tally            Frequency    Percent*
(I)      (II)             (III)        (IV)
A        /////                5          16.7
B        ///// ////           9          30.0
C        ///// ///// /       11          36.7
D        ///                  3          10.0
F        //                   2           6.7
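
The five steps above can be sketched in a few lines. This is a minimal illustration under our own naming, using the thirty grades from Example 1.2; the tally strings are grouped in fives only to mirror the table.

```python
from collections import Counter

# Grades of the thirty students from Example 1.2, row by row.
grades = (
    "B B C B A C "
    "D C C C B B "
    "B A B C D C "
    "A F B F C A "
    "B C C A C D"
).split()

n = len(grades)              # total number of observations
frequency = Counter(grades)  # Steps 2 and 3: tally and count each class

print(f"{'Class':<7}{'Tally':<16}{'Frequency':>9}{'Percent':>9}")
for grade in ["A", "B", "C", "D", "F"]:
    f = frequency[grade]
    # Build a tally string in groups of five marks.
    groups = ["/" * 5] * (f // 5) + (["/" * (f % 5)] if f % 5 else [])
    percent = f / n * 100    # Step 4: percent = (f / n) * 100
    print(f"{grade:<7}{' '.join(groups):<16}{f:>9}{percent:>9.1f}")

# Step 5: the frequencies must add up to n.
assert sum(frequency.values()) == n
```

Because each percentage is rounded to one decimal place, the printed percentages may add up to slightly more or less than 100.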

2. Ungrouped frequency distribution

An ungrouped frequency distribution is a table of all potential raw score values that could occur in the
data, along with their corresponding frequencies. It is often constructed for
a small data set or a discrete variable.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the
collected data. Then make a columnar table of all potential raw score values, arranged in order of magnitude,
with the number of times each value is repeated, i.e., the frequency of that value. To facilitate
counting, tallies can be used.

Example 2.1: The following data are the ages in years of 20 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.

Construct a frequency distribution for these data.

STEP 1. Find the range of the data:
Range = Maximum observation − Minimum observation = 42 − 29 = 13
STEP 2. Construct a table, tally the data, and complete the frequency column. The frequency distribution
is as follows.
Age    Tally    Frequency
29     /            1
30     ////         4
31     /            1
32     ///          3
33     /            1
35     //           2
36     //           2
37     /            1
39     /            1
41     ///          3
42     /            1
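
As a rough check of the table above, the range and the frequency of each observed age can be computed directly. The names below are illustrative, and only values that actually occur in the data are listed, as in the table.

```python
from collections import Counter

# Ages (in years) of the 20 women from the example above.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# Step 1: range = maximum observation - minimum observation.
data_range = max(ages) - min(ages)

# Step 2: count how often each age occurs.
frequency = Counter(ages)

print(f"Range = {max(ages)} - {min(ages)} = {data_range}")
print(f"{'Age':<5}{'Tally':<7}{'Frequency':>9}")
for age in sorted(frequency):
    print(f"{age:<5}{'/' * frequency[age]:<7}{frequency[age]:>9}")
```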

3. Grouped frequency distribution

Components of a grouped frequency distribution


Class limits: the values of a variable which typically serve to identify the classes of a frequency
distribution. They are sometimes referred to as nominal or apparent limits. The smaller and the
larger values are known as the lower and the upper-class limits, respectively. They should be
selected in such a way that they have the same number of significant places or units of measurement
as the observations to be classified.

Class boundaries: the precise points which separate various classes rather than the values included
in any one of the classes. They are sometimes referred to as exact or true limits. They leave no
space for ambiguity and overlapping. A class boundary is located mid-way between the upper-class
limit of a class and the lower-class limit of the next higher class. They are carried out to one more
decimal place than the class limits.

Class mark: the point which divides the class into two equal parts. It is also known as the class
mid-point. It can be determined by dividing the sum of the two limits, or the sum of the two
boundaries, by 2.

Class width: the length of a class, i.e., the difference between its upper and lower class boundaries.


Example 2.3: The following data are the weights in kg of 40 individuals who participated in a diet
program for weight loss:

70 64 99 55 64 89 87 65 62 38 67 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 81 80 98 51 36 63 66 85 79 83 70

By grouping data into classes, we can make the data much easier to read and understand. We group
these data by 10s. The smallest weight is 36 kg, so the first class of weights is 31 kg up to and
including 40 kg.
Table 3.1: Distribution of weights.
Class       Class boundary    Count (Frequency)
31 – 40      30.5-40.5              3
41 – 50      40.5-50.5              2
51 – 60      50.5-60.5              8
61 – 70      60.5-70.5             12
71 – 80      70.5-80.5              5
81 – 90      80.5-90.5              6
91 – 100     90.5-100.5             4
Total                              40

For this example, the first class is ‘31-40’. Lower limit of this class = 31; upper limit = 40. The
lower class boundary = 30.5; upper class boundary = 40.5. The width of the class = upper class
boundary - lower class boundary = 40.5-30.5 = 10. The class mark (class mid-point) of this class is
(31+40)/2 = 35.5. The values 36, 39, 38 are included in this class. Therefore, the frequency of this
class is 3.
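
The quantities worked out for the first class can be computed for every class in Table 3.1. The sketch below is illustrative only: the names are ours, and it assumes the weights are recorded to the nearest 1 kg, so boundaries sit half a unit outside the limits.

```python
# Weights (kg) of the 40 participants from Example 2.3.
weights = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
           99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

unit = 1  # assumption: weights are recorded to the nearest 1 kg

# Class limits used in Table 3.1: 31-40, 41-50, ..., 91-100.
class_limits = [(lower, lower + 9) for lower in range(31, 100, 10)]

rows = []
for lower, upper in class_limits:
    lcb = lower - unit / 2                 # lower class boundary
    ucb = upper + unit / 2                 # upper class boundary
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lcb, ucb),
        "width": ucb - lcb,                # class width from the boundaries
        "mark": (lower + upper) / 2,       # class mark (mid-point)
        "frequency": sum(lower <= w <= upper for w in weights),
    })

for row in rows:
    print(row)
```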

• Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper-class boundary for the given class.
• Cumulative frequency (Cf) more than type: the total frequency of all values (observations)
greater than or equal to the lower-class boundary for the given class.
• A tabular arrangement of class intervals together with their corresponding cumulative frequencies
(either more than or less than type, as defined above) is called a cumulative frequency distribution.

Steps for construction of a grouped frequency distribution

STEP 1. Find the maximum (Max) and the minimum (Min) observations, and then compute their range, R:
$R = \text{Max} - \text{Min}$
STEP 2. Fix the desired number of classes (k). There are two ways to fix k:
– Fix k arbitrarily between 6 and 20, or
– Use Sturges' formula: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency, and round
this value of k up to get an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes, and round the result up to
get an integer value: $W = \frac{R}{k}$
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point is the lower
limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower
limits.
STEP 5. Find the upper-class limits. To find the upper-class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to this
upper limit to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary, UCB = upper class
boundary, and U = one unit of measurement. The class boundaries are also halfway between the upper limit
of one class and the lower limit of the next class.

STEP 7. Tally the data and find the frequencies.
STEP 8. (If necessary) Find the cumulative frequencies (more than and less than types).
Example 3.1: The number of hours that 40 employees spent on their jobs over the last 7 working days is given below.

62 50 35 36 31 43 43 43

41 31 65 30 41 58 49 41

37 62 27 47 65 50 45 48

27 53 40 29 63 34 44 32

58 61 38 41 26 50 47 37

Construct a suitable frequency distribution for these data using 8 classes.

STEP 1. Max = 65, Min = 26, so that R = 65 − 26 = 39.
STEP 2. It is already determined to construct a frequency distribution having 8 classes.
STEP 3. Class width: W = 39/8 = 4.875 ≈ 5 (rounded up).
STEP 4. Starting point = 26 = lower limit of the first class. And hence the lower-class limits become
26 31 36 41 46 51 56 61

STEP 5. Upper limit of the first class = 31-1 = 30. And hence the upper-class limits become
30 35 40 45 50 55 60 65

The lower and the upper-class limits (Steps 4 and 5) can be written as follows.

Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65

STEP 6. By subtracting 0.5 units of measurement from the lower-class limits and by adding 0.5 units of
measurement to the upper-class limits, we can get the lower- and upper-class boundaries as follows.

Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5 – 40.5
40.5 – 45.5
45.5 – 50.5
50.5 – 55.5
55.5 – 60.5
60.5 – 65.5

STEPS 7 and 8 are displayed in the following table (columns 3 and 4 for Step 7; columns 5 and 6 for Step 8).

Class limits   Class boundaries   Tally         Frequency   Cf (less than type)   Cf (more than type)
26 – 30        25.5 – 30.5        /////             5               5                    40
31 – 35        30.5 – 35.5        /////             5              10                    35
36 – 40        35.5 – 40.5        /////             5              15                    30
41 – 45        40.5 – 45.5        ///// ////        9              24                    25
46 – 50        45.5 – 50.5        ///// //          7              31                    16
51 – 55        50.5 – 55.5        /                 1              32                     9
56 – 60        55.5 – 60.5        //                2              34                     8
61 – 65        60.5 – 65.5        ///// /           6              40                     6
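
The whole construction in Example 3.1 can be sketched as a small function. This is a hedged illustration, not a general-purpose routine: the function name `grouped_frequency_table` and its parameters are our own, it assumes integer-valued data with a unit of measurement of 1, it starts the first class at the minimum value, and it raises an error if the resulting classes fail to cover the maximum.

```python
import math

# Hours worked by the 40 employees in Example 3.1.
hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

def grouped_frequency_table(data, k, unit=1):
    """Build a grouped frequency table with k classes (hypothetical helper)."""
    data_range = max(data) - min(data)                  # Step 1
    width = math.ceil(data_range / k)                   # Step 3: round W up
    lower_limits = [min(data) + i * width for i in range(k)]   # Step 4
    if lower_limits[-1] + width - unit < max(data):
        raise ValueError("classes do not cover the maximum; use a larger width")
    rows, cf_less, remaining = [], 0, len(data)
    for lcl in lower_limits:
        ucl = lcl + width - unit                        # Step 5
        freq = sum(lcl <= x <= ucl for x in data)       # Step 7
        cf_less += freq                                 # Step 8 (less than type)
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),   # Step 6
            "frequency": freq,
            "cf_less": cf_less,
            "cf_more": remaining,                       # Step 8 (more than type)
        })
        remaining -= freq
    return rows

table = grouped_frequency_table(hours, k=8)
for row in table:
    print(row)
```

With k = 8 the function yields a width of 5 and the same eight classes, frequencies, and cumulative frequencies as the table above.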

Example 3.2: The following data are on the number of minutes to travel from home to work for a
group of automobile workers.

28 25 48 37 41 19 32 26 16 23 23 29 36

31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for this data.

Solution:

• Range = 48 − 16 = 32

• K = 1 + 3.322 log10(25) = 5.64 ≈ 6

• W = 32/6 = 5.33, which is rounded up to the nearest integer, i.e., W = 6.

Let the lower limit of the first class be 16. Then the frequency distribution is as follows:
Class limit    Class boundaries    Tally         Frequency
16-21          15.5-21.5           ///               3
22-27          21.5-27.5           ///// /           6
28-33          27.5-33.5           ///// ///         8
34-39          33.5-39.5           ////              4
40-45          39.5-45.5           ///               3
46-51          45.5-51.5           /                 1
Total                                               25
Table 3.2: The distribution of the time in minutes spent by automobile workers to travel from home
to work.

Time (in minutes)    Number of workers
16-21                       3
22-27                       6
28-33                       8
34-39                       4
40-45                       3
46-51                       1
Total                      25

This frequency distribution is more understandable than the raw data. We can see some features of
the data from this table. For instance, most observations (14 of 25) fall in the second and third
classes. This implies that many workers took around 22 to 33 minutes to travel from home to
work.
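
The number of classes and the class width used in Example 3.2 follow from Sturges' formula. The sketch below simply replays that arithmetic for the 25 travel times; the variable names are our own.

```python
import math

# Travel times (minutes) of the 25 automobile workers in Example 3.2.
travel_minutes = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
                  31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(travel_minutes)
data_range = max(travel_minutes) - min(travel_minutes)

# Sturges' formula: k = 1 + 3.322 * log10(N), rounded up to an integer.
k_exact = 1 + 3.322 * math.log10(n)
k = math.ceil(k_exact)

# Class width: range divided by the number of classes, rounded up.
width = math.ceil(data_range / k)

print(f"N = {n}, range = {data_range}")
print(f"k = {k_exact:.2f}, rounded up to {k}")
print(f"W = {data_range / k:.2f}, rounded up to {width}")
```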

Activity 2.2
1. In a biology experiment the lengths of 25 worms, measured to the nearest 0.1cm, were:
9.5 8.1 5.1 6.6 9.3 9.1 6.5 5.0 6.9 7.6 9.3 8.3 6.0
6.2 7.4 7.7 7.8 7.9 7.0 7.8 5.4 9.8 6.3 7.5 8.4
Construct a frequency distribution for the data using Sturges' rule for the number of classes. What
do you think about the typical length of these worms?

Types of grouped frequency distributions

Based on the type of frequency assigned to the classes, we have three types of grouped frequency
distributions:
• Absolute frequency distribution
• Relative frequency distribution
• Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (e.g., Example 3.2 and Table 3.2) are
absolute frequency distributions because the frequencies assigned are absolute frequencies.

Definition 2.1: A relative frequency distribution is a distribution which specifies
the frequency of a class relative to the total frequency.
Example 3.3: Convert the absolute frequency distribution in Example 3.2 to a relative frequency
distribution.
Solution: First, we find the relative frequency of each class. The relative frequency of a class is the
frequency of the class divided by the total number of observations. For instance, the relative frequency of the
first class is 3/25 = 0.12, the relative frequency of the second class is 6/25 = 0.24, and so on. The resulting
relative frequency distribution is shown in Table 3.3.
Table 3.3: The distribution of the time in minutes spent by automobile workers to travel from home to
work.
Time (in minutes)    Relative frequency
16-21                     0.12
22-27                     0.24
28-33                     0.32
34-39                     0.16
40-45                     0.12
46-51                     0.04
Total                     1
Note: Proportion may also be changed to percentages to obtain a percentage relative frequency distribution.

Example 3.4: Convert the above relative frequency distribution to a percentage relative frequency
distribution.

Solution: We simply multiply the relative frequencies of the above relative frequency distribution by 100.

Table 3.4: The distribution of the time in minutes spent by automobile workers to travel from home to work.

Time (in minutes)    Relative frequency (%)
16-21                      12
22-27                      24
28-33                      32
34-39                      16
40-45                      12
46-51                       4
Total                     100
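
The conversions in Examples 3.3 and 3.4 amount to one division and one multiplication per class. A minimal sketch, with illustrative names and the frequencies taken from Table 3.2:

```python
# Classes and absolute frequencies from Table 3.2.
classes = ["16-21", "22-27", "28-33", "34-39", "40-45", "46-51"]
frequencies = [3, 6, 8, 4, 3, 1]

total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

# Percentage relative frequency = relative frequency * 100.
percent = [100 * r for r in relative]

print(f"{'Time (min)':<12}{'Relative':>10}{'Percent':>10}")
for label, r, p in zip(classes, relative, percent):
    print(f"{label:<12}{r:>10.2f}{p:>10.0f}")
print(f"{'Total':<12}{sum(relative):>10.2f}{sum(percent):>10.0f}")
```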

Definition 2.2: Cumulative frequency refers to the number of observations that
are below a specified value or that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the
observations are bounded from above or from below, we can have a cumulative less than or a
cumulative more than frequency distribution, respectively.

Example 3.5: Convert the absolute frequency distribution in Example 3.2 into:

i) a cumulative frequency distribution of the less than type.
ii) a cumulative frequency distribution of the more than type.

Solution:
We use the class boundaries to form cumulative frequencies.

Table 3.5: The less than and more than type cumulative frequency distributions of the time in
minutes spent by automobile workers to travel from home to work.

Time (in minutes)    Cf (less than type)    Cf (more than type)
15.5 – 21.5                  3                      25
21.5 – 27.5                  9                      22
27.5 – 33.5                 17                      16
33.5 – 39.5                 21                       8
39.5 – 45.5                 24                       4
45.5 – 51.5                 25                       1
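
Both cumulative columns of Table 3.5 are running totals of the class frequencies, taken from the top for the less than type and from the bottom for the more than type. A short sketch with illustrative names:

```python
from itertools import accumulate

# Class boundaries and absolute frequencies from Example 3.2.
boundaries = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5),
              (33.5, 39.5), (39.5, 45.5), (45.5, 51.5)]
frequencies = [3, 6, 8, 4, 3, 1]

# Less than type: observations at or below each upper class boundary.
cf_less = list(accumulate(frequencies))

# More than type: observations at or above each lower class boundary,
# i.e. a running total taken from the last class backwards.
cf_more = list(accumulate(reversed(frequencies)))[::-1]

for (low, high), less, more in zip(boundaries, cf_less, cf_more):
    print(f"{low:4.1f} - {high:4.1f}   less than {high}: {less:>2}   more than {low}: {more:>2}")
```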

Activity 2.3
1. The following are the scores of 32 students who took a statistics test:
55 70 80 75 90 80 60 100 95 70 75 85 80 80 70 95

100 80 85 70 85 90 80 75 85 70 90 60 80 70 85 80

Organize this data set using an absolute frequency distribution consisting of 7 classes. Start the first class
with the minimum value in the data set. Construct also the relative frequency distribution, the less than
cumulative frequency distribution, and the more than cumulative frequency distribution. What do you
think about the typical score of these students? How many students score below the lower limit of the
third class?

2.3.2 Diagrammatic and graphical presentation of data

2.3.2.1 Graphs for quantitative data


1. Histogram
A histogram consists of a set of adjacent rectangles whose bases are marked off by class boundaries
(not class limits) along the horizontal axis and whose heights are proportional to the frequencies
associated with the respective classes.

To construct a histogram from a data set:


1. Arrange the data in increasing order.
2. Choose class intervals so that all data points are covered.
3. Construct a frequency table.
4. Draw adjacent bars having heights determined by the frequencies in step 3.

The importance of a histogram is that it enables us to organize and present data graphically so as to
draw attention to certain important features of the data. For instance, a histogram can often indicate
how symmetric the data are; how spread out the data are; whether there are intervals having high
levels of data concentration; whether there are gaps in the data; and whether some data values are
far apart from others.

Example: Construct a histogram for the frequency distribution of the time spent by the automobile
workers.

Table 3.6: The distribution of the time in minutes spent by automobile workers to travel from home
to work.

Time (in minutes)    Class mark    Number of workers
15.5 – 21.5          18.5          3
21.5 – 27.5          24.5          6
27.5 – 33.5          30.5          8
33.5 – 39.5          36.5          4
39.5 – 45.5          42.5          3
45.5 – 51.5          48.5          1
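
As a rough illustration of the construction steps, the sketch below prints a text version of the histogram, turned on its side so that each bar runs horizontally. It is not a plotting routine. It uses only the class boundaries and frequencies from the table, and the variable names are assumptions made for this example.

```python
# Class boundaries and frequencies of the travel-time distribution.
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
frequencies = [3, 6, 8, 4, 3, 1]

# Each class mark is the midpoint of the lower and upper class boundary.
class_marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# One row per class; the bar length is proportional to the class frequency.
rows = [
    f"{lo:4.1f} - {hi:4.1f} | {'#' * f} ({f})"
    for lo, hi, f in zip(boundaries, boundaries[1:], frequencies)
]
print("\n".join(rows))
```

Because consecutive classes share a boundary, the bars of the real histogram touch each other. The text version only shows their relative heights.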

2. Frequency Polygon

A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis
and their respective class marks along the horizontal axis. The plotted points are then joined by straight
line segments.

Example: Draw a frequency polygon presenting the following data.

Class Boundaries    Class Mark    Frequency    Cumulative Frequency    Cumulative Frequency
                                               (less than type)        (more than type)
5.5 – 11.5          8.5           2            2                       20
11.5 – 17.5         14.5          2            4                       18
17.5 – 23.5         20.5          7            11                      16
23.5 – 29.5         26.5          4            15                      9
29.5 – 35.5         32.5          3            18                      5
35.5 – 41.5         38.5          2            20                      2
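
The points to be plotted for the frequency polygon are the pairs (class mark, frequency). A minimal sketch in Python, using the class marks and frequencies from the table (the variable names are illustrative only):

```python
# Class marks and frequencies taken from the table.
class_marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]
frequencies = [2, 2, 7, 4, 3, 2]

# Each vertex of the polygon is a (class mark, frequency) pair.
points = list(zip(class_marks, frequencies))

# The highest vertex identifies the class with the largest frequency.
peak_mark, peak_frequency = max(points, key=lambda point: point[1])

for mark, freq in points:
    print(f"class mark {mark:4.1f} -> frequency {freq}")
```

Here the highest vertex lies at the class mark 20.5, the class with frequency 7.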

[Figure: frequency polygon of the above data; horizontal axis: class marks (8.5 to 38.5), vertical axis: frequency]

3. Cumulative Frequency Polygon (Ogive)

A cumulative frequency polygon can be drawn on either a less than or a more than cumulative frequency
basis. Place the class boundaries along the horizontal axis (upper boundaries for the less than type, lower
boundaries for the more than type) and the corresponding cumulative frequencies along the vertical axis.
Then join the plotted points by a free-hand curve.

Example: The data in the above example can be presented using either a less than or a more than cumulative
frequency polygon, as given in (i) and (ii) below, respectively.

(i) Less than type cumulative frequency curve

[Figure: less than type ogive; horizontal axis: upper class boundaries (11.5 to 41.5), vertical axis: less than type cumulative frequencies]

(ii) More than type cumulative frequency curve


[Figure: more than type ogive; horizontal axis: lower class boundaries (5.5 to 35.5), vertical axis: more than type cumulative frequencies]
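
The coordinates behind the two ogives can be listed as follows. This is a sketch only: the cumulative frequencies are copied from the table of the example, and the names used are assumptions for illustration.

```python
# Class boundaries and cumulative frequencies copied from the example table.
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
cf_less_than = [2, 4, 11, 15, 18, 20]
cf_more_than = [20, 18, 16, 9, 5, 2]

# Less than ogive: each cumulative frequency is plotted at its upper class boundary.
less_than_points = list(zip(boundaries[1:], cf_less_than))

# More than ogive: each cumulative frequency is plotted at its lower class boundary.
more_than_points = list(zip(boundaries[:-1], cf_more_than))

print("less than ogive:", less_than_points)
print("more than ogive:", more_than_points)
```

The less than ogive rises to the total number of observations (20) at the last upper boundary, while the more than ogive starts from that total at the first lower boundary and falls.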

4. Line graph

Data from a frequency table can be pictured by a line graph, which plots the successive values on the
horizontal axis and indicates the corresponding frequency by the height of a vertical line. This method
of data presentation is especially suitable for discrete data. For instance, data on the number of family
members, the number of car accidents, or the number of defective items produced by machines can be
presented well using a line graph.

Example: The following data are on the number of seeds germinated out of six seeds planted in
each of 50 pots.
1 1 1 2 6 3 3 4 2 4 3 2 1 5 2 1 3 6 2 2 3 1 1 4 3
2 2 2 2 3 0 3 1 2 1 2 3 1 1 3 3 2 1 2 1 1 3 1 5 1

Construct a line graph for this data.
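
Before the line graph can be drawn, the frequency of each value is needed. The sketch below tallies the data in Python. It assumes the 50 observations are entered as single-digit counts between 0 and 6, one per pot, and the variable names are illustrative.

```python
from collections import Counter

# The 50 observations, one per pot, entered as single-digit counts (0 to 6).
data = (
    "1 1 1 2 6 3 3 4 2 4 3 2 1 5 2 1 3 6 2 2 3 1 1 4 3 "
    "2 2 2 2 3 0 3 1 2 1 2 3 1 1 3 3 2 1 2 1 1 3 1 5 1"
)
seeds = [int(token) for token in data.split()]

# Frequency of each possible number of germinated seeds.
counts = Counter(seeds)

# A text line graph: one vertical stroke per pot, drawn horizontally here.
for value in range(7):
    print(f"{value}: {'|' * counts[value]} ({counts[value]})")
```

In the actual line graph, the values 0 to 6 go on the horizontal axis and a vertical line of height equal to the frequency is drawn above each value.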

2.3.2.2 Graphs for qualitative data

1. Bar-charts
i) Simple bar charts: are diagrammatic representations of data in which the data are
represented by a series of vertical or horizontal bars, the height (or length) of each bar
indicating the size of the figure represented.

Example: Draw a bar chart for the following coffee production data.

Table: Coffee production from 1990 to 1995.

Production year                     1990    1991    1992    1993    1994    1995
Amount of coffee (in 1000 tons)     50      75      92      64      100     120

[Figure: simple bar chart of coffee production; horizontal axis: production year (1990 to 1995), vertical axis: amount of coffee in 1000 tons]

ii) Component bar charts: are like ordinary bar charts except that the bars are subdivided
into two or more component parts. They are used to represent a total figure in terms of its
components. The components are proportional in size to the component parts of the total
quantity being represented by each bar.
a. Actual component bar charts: are charts in which the overall height of the bar and the
individual component lengths represent actual figures.
Example: Draw an actual component bar chart for the following data on production of coffee (in
1000 tons).
Table: Coffee production from 1991 to 1993 by region.

Production year                   1991    1992    1993
Amount of coffee    Region A      80      85      90
(in 1000 tons)      Region B      120     165     120
                    Total         200     250     210

[Figure: actual component bar chart of coffee production by region (A and B); horizontal axis: production year (1991 to 1993), vertical axis: amount of coffee in 1000 tons]

b. Percentage component bar charts: are charts in which the individual component lengths
represent percentages of the overall total. Note that a series of such bars will all be
of the same total height, i.e. 100 percent.
Example: Draw a percentage component bar chart for the above data on production of coffee (in
1000 tons).

Solution: First convert the component figures into percentage forms of their corresponding totals to
get the following result.
Table: Coffee production from 1991 to 1993 by region (in percent).

Production year                   1991    1992    1993
Amount of coffee    Region A      40      34      42.9
(in percent)        Region B      60      66      57.1
                    Total         100     100     100
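
The conversion to percentages can be verified with a few lines of Python. This is a minimal sketch that uses the regional figures of the coffee example. Rounding to one decimal place is assumed so as to match the table, and the names are illustrative.

```python
# Coffee production (in 1000 tons) by region for 1991, 1992 and 1993.
years = [1991, 1992, 1993]
production = {"Region A": [80, 85, 90], "Region B": [120, 165, 120]}

# Yearly totals: the sum over the regions, taken column by column.
totals = [sum(column) for column in zip(*production.values())]

# Each component expressed as a percentage of its year's total.
percentages = {
    region: [round(100 * value / total, 1) for value, total in zip(values, totals)]
    for region, values in production.items()
}

for region, values in percentages.items():
    print(region, dict(zip(years, values)))
```

Every bar of the percentage component chart then has the same total height of 100 percent, and only the split between the regions differs from year to year.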

[Figure: percentage component bar chart of coffee production by region (A and B); horizontal axis: production year (1991 to 1993), vertical axis: amount of coffee in percent]

iii) Multiple bar charts: are charts in which figures are shown as separate bars adjoining
each other. The height of each bar represents the actual value of the component figures.

Example: Draw a multiple bar chart for the data on production of coffee.

[Figure: multiple bar chart of coffee production by region (A and B); horizontal axis: production year (1991 to 1993), vertical axis: amount of coffee in 1000 tons]

2. Pie-chart
A pie chart is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
• Calculate the percentage frequency of each component. It is given by $\frac{f_i}{n} \times 100$,
  where $f_i$ is the figure for component $i$ and $n$ is the total of all components.
• Calculate the degree measure of each sector. It is given by $\frac{f_i}{n} \times 360^\circ$.
• Draw the circle and mark off the sectors using a compass and a protractor.

Example: Draw a pie-chart to represent the following data on a certain family expenditure.

Table: Data on a certain family expenditure.

Item                      Food     Clothing   House rent   Fuel & light   Miscellaneous   Total
Expenditure (in birr)     50       30         20           15             35              150
Percentage frequencies    33.33    20         13.33        10             23.33           100
Angles of the sectors     120°     72°        48°          36°            84°             360°
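
The percentage frequencies and sector angles in the table follow directly from the two construction formulas. A short Python sketch, with the expenditure figures taken from the example and the names chosen for illustration:

```python
# Family expenditure (in birr) per item.
expenditure = {
    "Food": 50,
    "Clothing": 30,
    "House rent": 20,
    "Fuel & light": 15,
    "Miscellaneous": 35,
}

# n is the total of all components.
n = sum(expenditure.values())

# Percentage frequency and sector angle (in degrees) of each component.
percent = {item: round(f * 100 / n, 2) for item, f in expenditure.items()}
angle = {item: round(f * 360 / n, 2) for item, f in expenditure.items()}

for item in expenditure:
    print(f"{item:14s} {percent[item]:6.2f} %   {angle[item]:6.1f} degrees")
```

The angles add up to 360 degrees, which serves as a quick check before the sectors are drawn with the protractor.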

[Figure: pie chart of the family expenditure with sectors for Food, Clothing, House rent, Fuel and light, and Miscellaneous]

Activity 2.4

1. The following data are the blood types of 50 volunteers at a blood plasma donation clinic:

O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB A O
O A B A A A O B O O A O A B O AB A O

a. Organize this data using a categorical frequency distribution.
b. Present the data using both a pie chart and a bar chart.

2. The following table gives the number of deaths in a certain country in 1987 due to accidents for
individuals in various classifications.

Classification Number of deaths

Pedestrians 1699

Bicyclists 280

Motorcyclists 650

Automobile drivers 1327

Represent the data using both a bar chart and a pie chart. Which of the charts is more
informative?

3. Pictogram

A pictogram is a device used to represent data by means of pictures or small symbols. It is customary
to represent a unit value of the data by a standard symbol or picture, and the whole quantity by an
appropriate number of repetitions of that symbol. The symbol should be simple and easy to understand.

Example: The following table shows the orange production of a plantation for the production years
1990 to 1993. Represent the data by a pictogram.

Table: Orange production from 1990 to 1993.

Production year    1990    1991    1992    1993
Amount (in kg)     3000    3850    3500    5000
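
To draw the pictogram, a unit value per symbol has to be chosen first. The text does not fix this value, so the sketch below assumes, purely for illustration, that one symbol stands for 500 kg of oranges. The names are also hypothetical.

```python
# Assumption for illustration only: one symbol represents 500 kg of oranges.
KG_PER_SYMBOL = 500

# Orange production (in kg) per production year.
production = {1990: 3000, 1991: 3850, 1992: 3500, 1993: 5000}

# Number of symbols needed for each year (may include a fraction of a symbol).
symbols = {year: kg / KG_PER_SYMBOL for year, kg in production.items()}

for year, count in symbols.items():
    whole = int(count)
    # A "+" marks a partial symbol that would be drawn as a fraction of the picture.
    partial = "+" if count > whole else ""
    print(f"{year}: {'O' * whole}{partial}  ({count:g} symbols)")
```

With this choice, 1991 needs seven full symbols and a partial one. A different unit value would change the number of repetitions but not the relative lengths of the rows.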
