Measures Of: Central Tendency & Dispersion

This document discusses measures of central tendency including the mean, median, and mode. It defines each measure and provides examples of how to calculate them using both discrete and continuous data sets. The mean is the average value obtained by dividing the total of all values by the number of values. The median is the middle value when data is arranged in order. The mode is the value that occurs most frequently. The document outlines advantages and disadvantages of each measure and concludes that measures of central tendency describe the center or typical value in a data set.

Measures of Central Tendency & Dispersion
Mean, deviations and variations
Introduction:
▪ Measures of central tendency are statistical
measures which describe the position of a
distribution.
▪ They are also called statistics of location, and
are the complement of statistics of dispersion,
which provide information concerning the
variance or distribution of observations.
▪ In the univariate context, the mean, median and
mode are the most commonly used measures of
central tendency.
▪ They are computable values that describe
the behavior of the center of a distribution.
The value or figure which represents the whole
series is neither the lowest value in the series nor the
highest; it lies somewhere between these two extremes.
1. The average represents all the measurements made
on a group, and gives a concise description of the
group as a whole.
2. When two or more groups are measured, the central
tendency provides the basis of comparison between
them.
Definition

Simpson and Kafka defined it as: “A measure of central tendency
is a typical value around which other figures congregate.”

Waugh expressed it as: “An average stands for the whole group of
which it forms a part yet represents the whole.”
1. Arithmetic Mean
The arithmetic mean is a mathematical
average and the most popular measure
of central tendency. It is frequently referred to
simply as the ‘mean’. It is obtained by dividing the sum of the
values of all observations in a series (ƩX) by
the number of items (N) constituting the series.
Thus, the mean of a set of numbers X1, X2,
X3, ..., XN, denoted by x̅, is defined
as
$$ \bar{x} = \frac{\sum X}{N} $$
The mean is the most widely used average in statistics. It is
found by adding up all the values in the data and dividing by
how many values there are.

Notation: If the data values are x1, x2, x3, ..., xn, then the
mean is

$$ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n} $$

where \bar{x} is the mean symbol and \sum x_i means the total of all the
x values.
Note: The mean takes into account every piece of
data, so it is affected by outliers in the data. The
median is preferred over the mean if the data
contains outliers or is skewed.
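
As a rough illustration of the definition and of the note on outliers, the Python sketch below computes the mean of a small data set with and without an extreme value. The data values and the helper name `arithmetic_mean` are made up for this illustration and do not come from the slides.

```python
from statistics import median


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


# Illustrative data (assumed, not taken from the slides)
data = [2, 4, 6, 8]
with_outlier = data + [100]

print(arithmetic_mean(data))          # mean of the original values
print(arithmetic_mean(with_outlier))  # the outlier pulls the mean upwards
print(median(with_outlier))           # the median stays close to the bulk of the data
```

In this sketch a single extreme value moves the mean from 5 to 24, while the median of the extended data is still 6.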
If data are presented in a frequency table:

Value Frequency
x1 f1
x2 f2
… …
xn fn

then the mean is

$$ \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum x_i f_i}{\sum f_i} $$
Example: The table shows the results of a survey
into household size. Find the mean size.

To find the mean, we add a third column (x × f) to the table.

Household size, x    Frequency, f    x × f
1                    20              20
2                    28              56
3                    25              75
4                    19              76
5                    16              80
6                    6               36

TOTAL                114             343

Mean = 343 ÷ 114 = 3.01 (to 2 decimal places)
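
The household-size calculation can be checked with a few lines of Python. This is only a sketch: the function name `frequency_mean` is an illustrative choice, and the values and frequencies are copied from the table above.

```python
def frequency_mean(values, freqs):
    """Mean of data given as a frequency table: sum(x * f) / sum(f)."""
    total = sum(x * f for x, f in zip(values, freqs))
    n = sum(freqs)
    return total / n


sizes = [1, 2, 3, 4, 5, 6]        # household size, x
freqs = [20, 28, 25, 19, 16, 6]   # frequency, f

household_mean = frequency_mean(sizes, freqs)
print(round(household_mean, 2))   # 3.01
```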
Advantages of Mean:
• It is easy to understand and simple to
calculate.
• It is based on all the values.
• It is rigidly defined.
• It can be calculated even if some of the details of
the data are lacking, provided the total and the
number of items are known.
• It is not based on the position in the
series.
Disadvantages of Mean:
• It is affected by extreme values.
• It cannot be calculated for open-ended
classes.
• It cannot be located graphically.
• It can give misleading conclusions when the data
are skewed or contain outliers.
2. Median
The median is the central value of the distribution, that is, the
value which divides the distribution into two equal parts,
each part containing an equal number of items. Thus it is
the central value of the variable when the values are
arranged in order of magnitude.
Connor defined it as: “The median is that value of
the variable which divides the group into two equal
parts, one part comprising all values greater, and the
other, all values less than the median.”
EXAMPLES - Median
Example 1: The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.

Example 2: The heights of four basketball players, in inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80.
With an even number of values, the median is the mean of the two
middle values: (75 + 76) ÷ 2 = 75.5.
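
The two median examples can be reproduced with a short Python sketch. The helper name `median_of` is an illustrative choice; the standard library's `statistics.median` would give the same results.

```python
def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([21, 25, 19, 20, 22]))  # ages of the five students -> 21
print(median_of([76, 73, 80, 75]))      # heights of the four players -> 75.5
```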


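Calculation of Median – Discrete series:
i. Arrange the data in ascending or descending
order.

ii. Calculate the cumulative frequencies.

iii. Apply the formula: the median is the value of the
((N + 1)/2)-th item, that is, the first value whose cumulative
frequency reaches (N + 1)/2, where N is the total frequency.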
Calculation of median – Continuous series

For the calculation of the median in a continuous
frequency distribution the following formula
is employed. Algebraically,

$$ M = L_1 + \frac{\frac{N}{2} - cf}{f} \times i $$

where
• L1 is the lower class boundary of the
median class
• N is the total frequency of the distribution
• cf is the cumulative frequency of the
class immediately before the median class
• f is the frequency of the median class
• i is the class interval of the median class
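Example: Median of grouped data in a
distribution of respondents by age
Age group    Frequency (f)    Cumulative frequency (cf)
0-20         15               15
20-40        32               47
40-60        54               101
60-80        30               131
80-100       19               150
Total        150
Here N/2 = 150 ÷ 2 = 75, which is first reached in the class 40-60
(cumulative frequency 101), so the median class is 40-60, with
L1 = 40, cf = 47, f = 54 and i = 20.
Median (M) = 40 + ((75 − 47) ÷ 54) × 20
= 40 + (28 ÷ 54) × 20
= 40 + 0.52 × 20
= 40 + 10.37
= 50.37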
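
A minimal Python sketch of the continuous-series formula, applied to the age distribution above. The function name `grouped_median` and the `(lower, upper, frequency)` tuple layout are assumptions made for this illustration.

```python
def grouped_median(classes):
    """Median of grouped data: L1 + ((N/2 - cf) / f) * i.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f


ages = [(0, 20, 15), (20, 40, 32), (40, 60, 54), (60, 80, 30), (80, 100, 19)]
age_median = grouped_median(ages)
print(round(age_median, 2))  # 50.37
```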
Advantages of Median:
• The median can be calculated in all distributions.

• The median can be understood even by common people.

• The median can be ascertained even when there are extreme
items, since it is not affected by them.

• It can be located graphically.

• It is most useful when dealing with qualitative data.

Disadvantages of Median:
• It is not based on all the values.
• It is not capable of further mathematical
treatment.
• It is affected by fluctuations of sampling.
• In the case of an even number of values, it may not be a
value from the data.
3. Mode
➢ The mode is the most frequent value or score
in the distribution.

➢ It is defined as the value of the item that occurs
most often in a series.

➢ It is denoted by the capital letter Z.

➢ It corresponds to the highest point of the frequency
distribution curve.
Croxton and Cowden defined it as: “The mode of a
distribution is the value at the point around which the items
tend to be most heavily concentrated. It may be regarded as
the most typical of a series of values.”

The exact value of the mode can be obtained by the
following formula.

$$ Z = L_1 + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times i $$

where
L1: lower boundary of the modal class
f1: frequency of the modal class
f0: frequency of the class preceding the modal class
f2: frequency of the class succeeding the modal class
i: width of the modal class

Measures of central tendency
• Mode
• It is the value that occurs most frequently.

Interval    Midpoint    Frequency (fi)
_________________________________
1 – 3       2           18
4 – 6       5           27
7 – 9       8           34
10 – 12     11          22
13 – 15     14          13
_________________________________
Total                   114

Here the modal class is 7 – 9, since it has the highest frequency (34).
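Example: Calculate the mode for the distribution of
monthly rent paid by civil servants in Accra

Monthly rent (GH¢)    Number of civil servants (f)
500-1000              5
1000-1500             10
1500-2000             8
2000-2500             16
2500-3000             14
3000 & above          12
Total                 65

The modal class is 2000-2500, which has the highest frequency, so
L1 = 2000, f1 = 16, f0 = 8, f2 = 14 and i = 500.
Z = 2000 + ((16 − 8) ÷ (2 × 16 − 8 − 14)) × 500
Z = 2000 + (8 ÷ 10) × 500
Z = 2000 + 0.8 × 500 = 2000 + 400
Z = 2400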
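
The rent example can be checked with the Python sketch below. The function name `grouped_mode` and the tuple layout are illustrative assumptions. The open-ended class "3000 & above" is given an assumed upper bound of 3500 purely so that it fits the layout; that bound does not affect the result because the class is not the modal class.

```python
def grouped_mode(classes):
    """Mode of grouped data: L1 + (f1 - f0) / (2*f1 - f0 - f2) * i.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    # Index of the modal class (the class with the highest frequency)
    k = max(range(len(classes)), key=lambda j: classes[j][2])
    lower, upper, f1 = classes[k]
    f0 = classes[k - 1][2] if k > 0 else 0                  # preceding class
    f2 = classes[k + 1][2] if k < len(classes) - 1 else 0   # succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring frequencies equal the modal frequency")
    return lower + (f1 - f0) / denominator * (upper - lower)


rents = [
    (500, 1000, 5),
    (1000, 1500, 10),
    (1500, 2000, 8),
    (2000, 2500, 16),
    (2500, 3000, 14),
    (3000, 3500, 12),  # "3000 & above": upper bound 3500 is an assumption
]
rent_mode = grouped_mode(rents)
print(round(rent_mode, 2))  # 2400.0
```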
Advantages of Mode:
• The mode is readily comprehensible and easily calculated.
• It is often the most representative value of the data, since it
is the value that occurs most frequently.
• It is not affected by extreme values.
• The value of the mode can also be determined
graphically.
• It is usually an actual value of an important part of
the series.
Disadvantages of Mode:

• It is not based on all observations.

• It is not capable of further mathematical
manipulation.
• The mode is affected to a great extent by sampling
fluctuations.
• The choice of grouping has a great influence on the value
of the mode.
Conclusion
• A measure of central tendency is a measure
that tells us where the middle of a data set
lies.

• The mean is the most common measure of central
tendency. It is simply the sum of the numbers
divided by the number of numbers in a set of
data. This is also known as the average.
• The median is the number in the middle when
the numbers in a set of data are arranged in
ascending or descending order. If the number of
values in a data set is even, then the median is the
mean of the two middle numbers.

• The mode is the value that occurs most frequently in a
set of data.
Trial works
The following are the monthly salaries of the 30 employees of a company in
Accra:
91 139 126 119 100 87 65 77 99 95 108 127 86 148 116
76 69 88 112 118 89 116 97 105 95 80 86 106 93 135
During the last festive period, the company gave bonuses of 10, 15, 20,
25, 30, 35, 40, 45 and 50 to its employees in the respective salary groups:
exceeding 60 but not exceeding 70, exceeding 70 but not exceeding
80, and so on up to exceeding 140 but not exceeding 150.
• Put the above data into a frequency distribution table.
• Find the average bonus paid per employee
• Find the median bonus
• Find the median salary
• Find the modal bonus
• Find the modal salary
Measures of Dispersion
Definition
• Measures of dispersion are descriptive statistics that describe how
similar the scores in a set are to each other
• The more similar the scores are to each other, the lower the measure of
dispersion will be
• The less similar the scores are to each other, the higher the measure of
dispersion will be
• In general, the more spread out a distribution is, the larger the measure of
dispersion will be


Measures of Dispersion
• Which of the distributions of scores has the larger dispersion?
[Figure: two charts of scores, one above the other, each with the horizontal axis running from 1 to 10 and the vertical axis from 0 to 125]
The upper distribution has more dispersion because the scores are
more spread out. That is, they are less similar to each other.
Measures of Variability or Dispersion

• The dispersion of a distribution reveals how the
observations are spread out or scattered on each side of
the center.
• Measuring the dispersion, scatter, or variation of a
distribution is as important as locating the central
tendency.
• If the dispersion is small, it indicates high uniformity of the
observations in the distribution.
• Absence of dispersion in the data indicates perfect
uniformity. This situation arises when all observations in
the distribution are identical.
• If this were the case, description of any single observation
would suffice.
Purpose of Measuring Dispersion

• A measure of dispersion serves two purposes.
• First, it is one of the most important quantities used to characterize a
frequency distribution.
• Second, it affords a basis of comparison between two or more frequency
distributions.
• The study of dispersion derives its importance from the fact that various
distributions may have exactly the same averages but substantial
differences in their variability.
Measures of Dispersion
❖Range
❖Percentile range
❖Quartile deviation
❖Mean deviation
❖Variance and standard deviation

Relative measures of dispersion
• Coefficient of variation
• Coefficient of mean deviation
• Coefficient of range
• Coefficient of quartile deviation
Range
• The simplest and crudest measure of dispersion is the
range. This is defined as the difference between the
largest and the smallest values in the distribution. If
x1, x2, ..., xn are the values of observations in a
sample, then the range (R) of the variable X is given by:

$$ R(x_1, x_2, \dots, x_n) = \max\{x_1, x_2, \dots, x_n\} - \min\{x_1, x_2, \dots, x_n\} $$


The Range

• What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
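
The question above can be answered with a one-line calculation. The Python sketch below uses a helper named `value_range` (an illustrative name) that subtracts the smallest value from the largest.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


data = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(value_range(data))  # 9 - 1 = 8
```

Because only the two extreme values enter the calculation, rearranging or changing the values in between leaves the range unchanged.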

When To Use the Range
• The range is used when
  • you have ordinal data or
  • you are presenting your results to people with little or no knowledge of
    statistics
• The range is rarely used in scientific work as it is fairly insensitive
• It depends on only two scores in the set of data, the largest (XL) and the smallest (XS)
• Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9

Mean Deviation
• The mean deviation is an average of the absolute
deviations of individual observations from the central
value of a series. The average deviation about the mean is

$$ MD(\bar{x}) = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{n} $$

• k = number of classes
• xi = mid-point of the i-th class
• fi = frequency of the i-th class
• n = total number of observations (the sum of the fi)
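
A small Python sketch of the mean deviation about the mean for data in a frequency table. The function name `mean_deviation` is an illustrative choice, and the values and frequencies are borrowed from the distribution table used in the standard deviation section of this document (x = 3, 5, 7, 8, 9 with f = 2, 3, 2, 2, 1).

```python
def mean_deviation(values, freqs):
    """Mean deviation about the mean: sum(f * |x - mean|) / n."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


values = [3, 5, 7, 8, 9]
freqs = [2, 3, 2, 2, 1]
md = mean_deviation(values, freqs)
print(round(md, 2))  # mean is 6, so MD = (6 + 3 + 2 + 4 + 3) / 10 = 1.8
```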
Standard Deviation

• Standard deviation is the positive square root of the
mean-square deviations of the observations from
their arithmetic mean.

Population: $$ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} $$

Statistic (sample): $$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} $$

$$ SD = \sqrt{\text{variance}} $$
Example: The mid-day temperatures (in °C) recorded for
one week in June were: 21, 23, 24, 19, 19, 20, 21

First we find the mean:

$$ \bar{x} = \frac{21 + 23 + \dots + 21}{7} = \frac{147}{7} = 21 $$ °C

xi    xi − x̄    (xi − x̄)²
21    0         0
23    2         4
24    3         9
19    -2        4
19    -2        4
20    -1        1
21    0         0

Total:          22

$$ \text{variance} = \frac{\sum (x_i - \bar{x})^2}{n} $$

Here the divisor n is used, which treats the seven readings as the whole population.
So variance = 22 ÷ 7 = 3.143
So, s.d. = 1.77°C (3 s.f.)
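There is an alternative formula which is usually a more
convenient way to find the variance:

$$ \text{variance} = \frac{\sum (x_i - \bar{x})^2}{n} $$

But,
$$ \sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2 x_i \bar{x} + \bar{x}^2) $$
$$ = \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 $$
$$ = \sum x_i^2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2 $$
$$ = \sum x_i^2 - n\bar{x}^2 $$

So,
$$ \text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2, \qquad \text{s.d.} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} $$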
Example (continued): Looking again at the temperature
data for June: 21, 23, 24, 19, 19, 20, 21

We know that
$$ \bar{x} = \frac{147}{7} = 21 $$ °C

Also,
$$ \sum x_i^2 = 21^2 + 23^2 + \dots + 21^2 = 3109 $$

So,
$$ \text{variance} = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{3109}{7} - 21^2 = 3.143 $$
s.d. = 1.77 °C
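
Both routes to the variance can be compared numerically. The Python sketch below uses the June temperature data and the divisor n, as in the worked example; the variable names are illustrative.

```python
from math import sqrt

temps = [21, 23, 24, 19, 19, 20, 21]  # mid-day temperatures in °C
n = len(temps)
mean = sum(temps) / n

# Definition: mean of the squared deviations from the mean
var_definition = sum((x - mean) ** 2 for x in temps) / n

# Alternative formula: mean of the squares minus the square of the mean
var_shortcut = sum(x * x for x in temps) / n - mean ** 2

sd = sqrt(var_definition)
print(round(mean, 2), round(var_definition, 3), round(var_shortcut, 3), round(sd, 2))
# 21.0 3.143 3.143 1.77
```

The two variances agree up to floating-point rounding, which is what the algebra in the derivation predicts.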
When the data is presented in a frequency table, the
formula for finding the standard deviation needs to be
adjusted slightly, weighting each value by its frequency:

$$ \text{s.d.} = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2}, \qquad \bar{x} = \frac{\sum f_i x_i}{\sum f_i} $$

Example: A class of 20 students were asked how
many times they exercise in a normal week.
Find the mean and the standard deviation.

Number of times exercise taken    Frequency
0                                 5
1                                 3
2                                 5
3                                 4
4                                 2
5                                 1
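
The slide leaves this example as an exercise. One way to work it is sketched below in Python, assuming the frequency-weighted version of the alternative variance formula with divisor n (the same convention as the temperature example). The function name `frequency_mean_sd` is an illustrative choice.

```python
from math import sqrt


def frequency_mean_sd(values, freqs):
    """Mean and standard deviation (divisor n) for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2
    return mean, sqrt(variance)


times = [0, 1, 2, 3, 4, 5]   # number of times exercise taken
freqs = [5, 3, 5, 4, 2, 1]   # number of students

ex_mean, ex_sd = frequency_mean_sd(times, freqs)
print(round(ex_mean, 2), round(ex_sd, 2))
```

Under these assumptions the sums are Σf·x = 38 and Σf·x² = 116, giving a mean of 1.9 and a variance of 116 ÷ 20 − 1.9² = 2.19.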
If data is presented in a grouped frequency table, it is only
possible to estimate the mean and the standard deviation.
This is because the exact data values are not known.
An estimate is obtained by using the mid-point of an interval to
represent each of the values in that interval.

Example: The table shows the annual mileage
for the employees of an insurance company.
Estimate the mean and standard deviation.

Annual mileage, x        Frequency
0 ≤ x < 5000             6
5000 ≤ x < 10,000        17
10,000 ≤ x < 15,000      14
15,000 ≤ x < 20,000      5
20,000 ≤ x < 30,000      3
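
The slide does not show the working for this example. A possible sketch in Python follows, using the mid-point of each interval as described above and assuming the divisor-n variance formula; all variable names are illustrative.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each mileage class
classes = [
    (0, 5000, 6),
    (5000, 10000, 17),
    (10000, 15000, 14),
    (15000, 20000, 5),
    (20000, 30000, 3),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
n = sum(freqs)

# Each observation in a class is represented by the class mid-point
est_mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
est_var = sum(f * m * m for m, f in zip(midpoints, freqs)) / n - est_mean ** 2
est_sd = sqrt(est_var)

print(round(est_mean), round(est_sd))
```

Note that the last class is twice as wide as the others, so its mid-point is 25,000 rather than 22,500.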
Example: Find the standard deviation of ungrouped
data

Family No.    1  2  3  4  5  6  7  8  9  10
Size (xi)     3  3  4  4  5  5  6  6  7  7

Here,
$$ \bar{x} = \frac{\sum x_i}{n} = \frac{50}{10} = 5 $$

Family No.    1   2   3   4   5   6   7   8   9   10   Total
xi            3   3   4   4   5   5   6   6   7   7    50
xi − x̄        -2  -2  -1  -1  0   0   1   1   2   2    0
(xi − x̄)²     4   4   1   1   0   0   1   1   4   4    20
xi²           9   9   16  16  25  25  36  36  49  49   270

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{20}{9} = 2.22, \qquad s = \sqrt{2.22} = 1.49 $$
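
For ungrouped data like the family sizes, Python's standard `statistics` module already uses the n − 1 divisor in `variance` and `stdev`, so it can serve as a quick check of the hand calculation. This is a sketch for verification only.

```python
from statistics import mean, stdev, variance

sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]  # family sizes from the table

sample_mean = mean(sizes)
sample_var = variance(sizes)  # sample variance, divisor n - 1
sample_sd = stdev(sizes)      # square root of the sample variance

print(sample_mean, round(sample_var, 2), round(sample_sd, 2))
```

The module also offers `pvariance` and `pstdev`, which divide by n and correspond to the population formula.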
Find the standard deviation from a distribution table

xi      fi    fi xi    fi xi²    xi − x̄    (xi − x̄)²    fi (xi − x̄)²
3       2     6        18        -3        9            18
5       3     15       75        -1        1            3
7       2     14       98        1         1            2
8       2     16       128       2         4            8
9       1     9        81        3         9            9
Total   10    60       400       -         -            40

$$ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{60}{10} = 6 $$

$$ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{40}{9} = 4.44 $$
Computational Formula Example
X         X²          X − μ         (X − μ)²
9         81          2             4
8         64          1             1
6         36          -1            1
5         25          -2            4
8         64          1             1
6         36          -1            1
ΣX = 42   ΣX² = 306   Σ(X − μ) = 0  Σ(X − μ)² = 12

Relative Measures of Dispersion

• To compare the extent of variation of different distributions, whether
they have differing or identical units of measurement, it is necessary to
consider other measures that express the absolute deviation in a
relative form.

• These measures are usually expressed in the form of coefficients and
are pure numbers, independent of the units of measurement.
Coefficient of Variation
• A coefficient of variation is computed as the ratio of the standard
deviation of the distribution to the mean of the same distribution,
expressed as a percentage.

$$ CV = \frac{s_x}{\bar{x}} \times 100 $$
Example-3: Comments on children in a community
        Height     Weight
Mean    40 inch    10 kg
SD      5 inch     2 kg
CV      12.5%      20%

• Since the coefficient of variation for weight is greater
than that of height, we would tend to conclude that
weight has more variability than height in the
population.
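
The comparison in this example can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is an illustrative choice; the means and standard deviations are those given in the table.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return sd * 100 / mean


cv_height = coefficient_of_variation(5, 40)   # inches
cv_weight = coefficient_of_variation(2, 10)   # kilograms

print(cv_height, cv_weight)  # 12.5 20.0
```

Because the units cancel in the ratio, the two percentages can be compared directly even though height and weight are measured in different units.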
CLASS TEST 2 AND ASSIGNMENT 2

THE
END
THANK
YOU