Unit 1: Measures of Central Tendency: Module 6: Descriptive Statistical Measures
Numerical values that tend to locate in some sense the middle of a set of data when
arranged in increasing or decreasing order are called measures of central tendency or
central location. The term average is often associated with these measures, which are the
mean, median, mode, and midrange. In this unit, we will walk through the simple steps in
computing these measures.
MEAN
a. Arithmetic Mean. It is obtained by adding all the observations and dividing the sum by
the number of observations; thus, it is called a computational average.
2. Sample Mean: If $x_1, x_2, \ldots, x_n$ represent the data values from a finite sample of size $n$,
the sample mean $\bar{x}$ ("x bar") is given by
$$\bar{x} = \frac{\sum x_i}{n}$$
The symbol $\sum x_i$, read "summation of x sub i", means that we take the sum of all the
values in the data set. It uses the Greek letter Σ "sigma" (not the letter E!). Note that the
data values do not need to be arranged in any order when the mean is computed. For
data sets with many values, $\sum x_i$ can be computed using the Statistics mode of a
scientific calculator.
Example 1:
Suppose you chose ten people who entered the campus and whose ages are as
follows: 15, 25, 18, 20, 25, 18, 18, 20, 20, 25. What is the mean age of this sample?
Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited. 77
Solution:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{15 + 25 + 18 + 20 + 25 + 18 + 18 + 20 + 20 + 25}{10} = 20.40$$
The mean age of the sample is 20.40 years.
Note that your solution does not have to include the list of all values being added.
For instance, you can simply write
$$\bar{x} = \frac{\sum x_i}{n} = \frac{204}{10} = 20.40.$$
R Script
# Create the data vector (for small samples)
ages <- c(15, 25, 18, 20, 25, 18, 18, 20, 20, 25)
# Compute the mean
mean(ages)
[1] 20.4
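The definition of the mean can also be applied step by step in R. The following is a minimal sketch that reuses the ages from Example 1; the names total, n, and manual_mean are introduced here only for illustration.

```r
# Ages from Example 1
ages <- c(15, 25, 18, 20, 25, 18, 18, 20, 20, 25)

# Apply the definition: sum of the observations divided by their number
total <- sum(ages)        # 204
n <- length(ages)         # 10
manual_mean <- total / n

manual_mean               # 20.4, the same value that mean(ages) returns
```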
Example 2:
A student took 5 subjects last semester. Find his average if his final grades were as
follows:
  Units   Grade
    3     1.75
    5     2.50
    3     2.25
    2     1.50
    4     3.0
Solution:
The grades will serve as the data values $x_i$ and the units will be the corresponding
weights $w_i$.
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{3(1.75) + 5(2.50) + 3(2.25) + 2(1.50) + 4(3.0)}{3 + 5 + 3 + 2 + 4} = \frac{39.5}{17} \approx 2.32$$
The weighted average of the student is 2.32.
R Script
# Create the data vectors
grade <- c(1.75, 2.50, 2.25, 1.50, 3.0)
units <- c(3, 5, 3, 2, 4)
# Compute the weighted mean
weighted.mean(grade, units)
[1] 2.323529
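To see what weighted.mean() does with these inputs, the weighted average can be computed from its definition. This is a minimal sketch; weighted_sum, total_units, and manual_wmean are illustrative names, not part of the original script.

```r
# Grades (data values) and units (weights) from Example 2
grade <- c(1.75, 2.50, 2.25, 1.50, 3.0)
units <- c(3, 5, 3, 2, 4)

# Numerator: each grade multiplied by its units, then summed
weighted_sum <- sum(units * grade)    # 39.5
# Denominator: total number of units
total_units <- sum(units)             # 17

manual_wmean <- weighted_sum / total_units
round(manual_wmean, 2)                # 2.32
```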
MODE
The mode of a data set is the value in the distribution with the highest frequency. It locates
the point where the observation values occur with the greatest density. It can be used for
quantitative as well as qualitative data. The mode of a population is denoted by $\hat{\mu}$ ("mu
hat") while that of a sample is denoted by $\hat{x}$ ("x hat").
A data set can have one mode, more than one mode, or no mode.
When two data values occur with the same greatest frequency, then the data set has
two modes and is called bimodal.
When more than two data values occur with the same greatest frequency, each of
those values is a mode and the data set is said to be multimodal.
When no data value is repeated, or if all data values are repeated the same number of
times, we say that there is no mode.
Example 3:
Observe the given ungrouped data below:
a. 1, 2, 3, 4, 5, 6, 7 (No mode)
b. 15.2, 12.3, 4.6, 12.3, 6.5, 12.3, 5.5 (There is one mode, $\hat{x} = 12.3$)
c. 15, 12, 4, 15, 4, 6, 5 (There are two modes, $\hat{x} = 15$ and $\hat{x} = 4$, so the data set is bimodal)
d. 3, 4, 5, 1, 3, 2, 4, 5, 7, 10 (There are three modes, $\hat{x} = 3$, $\hat{x} = 4$, and $\hat{x} = 5$, so the data set is
multimodal.)
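R Script
Here we present hypothetical examples, considering both numeric data and
nonnumeric data. In each output, the first row lists the distinct values and the second
row lists their frequencies, from most frequent to least frequent.
values
 3  4  5  1  2  7 10
 2  2  2  1  1  1  1
labels
A2 A3 A1 A4
 5  3  2  2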
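Base R has no built-in function for the statistical mode (the mode() function reports how an object is stored, not its most frequent value), so frequency tables are a common route. The sketch below is one possible approach, not necessarily the script behind the output above: find_modes is a hypothetical helper, the values vector is taken from Example 3(d), and the labels vector is an assumed reconstruction chosen only so that its frequencies match the second table.

```r
# Hypothetical helper: returns every value that attains the highest frequency,
# or NULL when two or more distinct values all occur equally often (no mode)
find_modes <- function(x) {
  counts <- table(x)                          # frequency of each distinct value
  is_top <- as.vector(counts == max(counts))  # TRUE where the frequency is highest
  if (length(counts) > 1 && all(is_top)) {
    return(NULL)
  }
  names(counts)[is_top]                       # modes, returned as character strings
}

values <- c(3, 4, 5, 1, 3, 2, 4, 5, 7, 10)    # data from Example 3(d)
labels <- rep(c("A1", "A2", "A3", "A4"), times = c(2, 5, 3, 2))  # assumed data

# Frequency tables sorted from most to least frequent
sort(table(values), decreasing = TRUE)
sort(table(labels), decreasing = TRUE)

find_modes(values)   # "3" "4" "5"
find_modes(labels)   # "A2"
find_modes(1:7)      # NULL
```

Because the modes are read from the table's names, numeric modes come back as text; wrap the result in as.numeric() if numbers are needed.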
MEDIAN
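The median of a data set is the value that divides the distribution into two equal parts (after
arranging the values in ascending or descending order). As such, it is a positional average.
The median $\tilde{\mu}$ ("mu curl" or "mu tilde") of the population or $\tilde{x}$ ("x curl" or "x tilde") of the sample can be
determined using the following formula:
$$\tilde{\mu} = \begin{cases} x_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\ \dfrac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2} & \text{if } N \text{ is even} \end{cases} \qquad \tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$
where $N$ denotes the population size and $n$ is the sample size. Note that $\frac{N+1}{2}$, $\frac{N}{2}$, $\frac{N}{2}+1$, $\frac{n+1}{2}$, $\frac{n}{2}$,
and $\frac{n}{2}+1$ are all subscripts, referring to the position of the data value in the data set after
being arranged in increasing (or decreasing) order. For example, $x_7$ refers to the seventh
data value in the sequence, while $x_4$ is the fourth value in the data set.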
Example 4:
A retail outlet selling a particular product sold this many packs in the past few days: 90,
92, 93, 88, 95, 88, 97, 87, and 98. What is the median number of packs sold?
Solution:
Ordering the data from least to greatest and labeling these values, we get:
$x_1$  $x_2$  $x_3$  $x_4$  $x_5$  $x_6$  $x_7$  $x_8$  $x_9$
 87     88     88     90     92     93     95     97     98
Since $n = 9$ (odd),
$$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{9+1}{2}} = x_5 = 92$$
The median number of packs sold is 92. (Four days sold more packs than 92 and four
days sold fewer than 92.)
R Script
# Create the data vector
packs.sold <- c(90, 92, 93, 88, 95, 88, 97, 87, 98)
# Compute the median
median(packs.sold)
[1] 92
Example 5:
The ages of 10 college students are listed below. Find the median.
18, 24, 20, 35, 19, 23, 26, 23, 19, 20
Solution:
Ordering the data from least to greatest and labeling these values, we get:
$x_1$  $x_2$  $x_3$  $x_4$  $x_5$  $x_6$  $x_7$  $x_8$  $x_9$  $x_{10}$
 18     19     19     20     20     23     23     24     26     35
Since $n = 10$ (even),
$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} = \frac{x_{\frac{10}{2}} + x_{\frac{10}{2}+1}}{2} = \frac{x_5 + x_6}{2} = \frac{20 + 23}{2} = 21.5$$
The median age of the college students is 21.5 years.
R Script
# Create the data vector
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)
# Compute the median
median(ages)
[1] 21.5
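The positional rule used in Examples 4 and 5 (the middle value when the sample size is odd, the average of the two middle values when it is even) can be written out explicitly. The following is a minimal sketch; positional_median is a hypothetical helper written for illustration, and in practice the built-in median() does the same job.

```r
# Hypothetical helper implementing the positional rule for the median
positional_median <- function(x) {
  x <- sort(x)                        # arrange the values in increasing order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

packs.sold <- c(90, 92, 93, 88, 95, 88, 97, 87, 98)   # Example 4, n = 9
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)     # Example 5, n = 10

positional_median(packs.sold)   # 92
positional_median(ages)         # 21.5
```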
2. Only the middle scores or measurements are considered in the computation of the
median.
3. Very high or very low scores do not affect the median.
4. When there are extreme values in the data set (interval or ratio data), that is, the
distribution is markedly skewed, it is more appropriate to use the median than the
mean since the extreme values affect the mean.
5. The median is used as a basis for determining whether cases fall within the upper half or
the lower half of a data distribution.
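The effect of an extreme value on the mean, and its lack of effect on the median, can be illustrated numerically. The sketch below reuses the ages from Example 5 and, purely as a hypothetical variation, replaces the largest age (35) with 85; ages_extreme is an illustrative name.

```r
# Ages from Example 5
ages <- c(18, 24, 20, 35, 19, 23, 26, 23, 19, 20)

# Hypothetical variation: the largest age (35) becomes an extreme value (85)
ages_extreme <- replace(ages, ages == 35, 85)

mean(ages)             # 22.7
mean(ages_extreme)     # 27.7, pulled upward by the single extreme value

median(ages)           # 21.5
median(ages_extreme)   # 21.5, unchanged because the middle values are the same
```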
MIDRANGE
Another measure of center is the midrange. Because the midrange uses only the maximum
and minimum values, it is too sensitive to those extremes, so the midrange is rarely
used. However, the midrange does have three redeeming features:
1. It is very easy to compute.
2. It helps to reinforce the important point that there are several different ways to define
the center of a data set.
3. It is sometimes incorrectly used for the median, so confusion can be reduced by
clearly defining the midrange along with the median.
The midrange of a data set is the measure of center that is the value midway between the
maximum and minimum values in the original data set. It is found by adding the maximum
data value to the minimum data value and then dividing the sum by 2, as in the following
formula:
$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2}$$
Example 6:
Find the midrange of these values representing the sales, in pesos, of a restaurant on five
business days:
27,531; 15,684; 5,638; 27,997; and 25,433.
Solution:
The midrange is found as follows:
$$\text{midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2} = \frac{27997 + 5638}{2} = 16817.50$$
The midrange is P 16,817.50.
R Script
# Create the data vector
sales <- c(27531, 15684, 5638, 27997, 25433)
# Compute the midrange
(max(sales) + min(sales)) / 2
[1] 16817.5
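Since base R has no dedicated midrange function, the formula can be wrapped in a small helper. The sketch below is one way to do it; midrange is a hypothetical function name, and range() is the base R function that returns the minimum and maximum together. The median is printed alongside to show how far apart the two measures can be for the same data.

```r
# Hypothetical helper: midrange = (maximum + minimum) / 2
midrange <- function(x) {
  sum(range(x)) / 2      # range(x) returns c(min(x), max(x))
}

# Sales from Example 6
sales <- c(27531, 15684, 5638, 27997, 25433)

midrange(sales)   # 16817.5
median(sales)     # 25433, so the midrange should not be mistaken for the median
```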
Suppose we wish to determine central tendency measures for the numeric variables in a data
frame. The sapply() function can be used to compute a measure for all variables in a data
set at once. For this example, we use the "salaries.csv" file. In RStudio, open a new file,
select R Script, and execute the following script.
library(pander)                        # provides pander() for formatted output
salaries <- read.csv("salaries.csv")   # assumes the file is in the working directory
averages <- sapply(salaries, mean)
pander(averages)
X     rank   discipline   yrs.since.phd   yrs.service   sex   salary
199   NA     NA           22.31           17.61         NA    113706
# Notice that sapply() returns NA (with a warning) for the nonnumeric
# variables in the data set, since a mean cannot be computed for them.
# To avoid this problem, exclude the nonnumeric variables using bracket
# notation with the negated column number/s of the nonnumeric variable/s.
# For example, to drop only one variable, say "rank", which is in column 2:
averages <- sapply(salaries[-2], mean)
pander(averages)
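When several columns are nonnumeric, listing their positions by hand can be avoided by letting R detect the numeric columns. The following is a minimal sketch of that alternative; the data frame toy and its values are made up for illustration and are not taken from salaries.csv.

```r
# Hypothetical toy data frame (made-up values, not from salaries.csv)
toy <- data.frame(
  rank = c("Prof", "AsstProf", "Prof", "AssocProf"),
  yrs.service = c(18, 3, 25, 10),
  salary = c(120000, 80000, 130000, 95000)
)

# Flag the numeric columns, then compute the mean of those columns only
numeric_cols <- sapply(toy, is.numeric)       # FALSE for rank, TRUE for the rest
averages <- sapply(toy[numeric_cols], mean)

averages   # yrs.service = 14, salary = 106250
```

The same pattern works with median in place of mean.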
Suppose we would like to present the mean salary of the teachers grouped according to
rank. We can generate the statistical measures in RStudio by using the tapply() and
aggregate() functions. Check out the following script.
# Statistics by Group
# Using the tapply function, we generate the mean salary for each group of
# teachers based on rank.
output1 <- tapply(salaries$salary, salaries$rank, mean)
pander(output1)
# Using the aggregate function, we generate the same statistical measures for
# the same groups.
output2 <- aggregate(salary ~ rank, salaries, mean)
pander(output2)
rank salary
AssocProf 93876
AsstProf 80776
Prof 126772
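The same grouping functions accept other measures of central tendency, such as the median. The sketch below uses a made-up data frame named toy (not salaries.csv) to show that tapply() and aggregate() return the same group means and that the summary function can be swapped.

```r
# Hypothetical toy data frame (made-up values, not from salaries.csv)
toy <- data.frame(
  rank = c("Prof", "AsstProf", "Prof", "AssocProf", "AsstProf", "Prof"),
  salary = c(120000, 80000, 130000, 95000, 78000, 125000)
)

# Mean and median salary per rank using tapply (groups are ordered alphabetically)
mean_by_rank <- tapply(toy$salary, toy$rank, mean)
median_by_rank <- tapply(toy$salary, toy$rank, median)

# The same group means using aggregate, returned as a data frame
agg <- aggregate(salary ~ rank, data = toy, FUN = mean)

mean_by_rank     # AssocProf 95000, AsstProf 79000, Prof 125000
median_by_rank   # AssocProf 95000, AsstProf 79000, Prof 125000
agg
```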
Find the mean, median, mode and midrange of the following data set on the total weight,
in kilograms, of ready-to-cook chicken inasal leg quarters sold by a frozen foods retail store
during selected days of June and July. Express your answers up to 2 decimal places.
35.2 7.0 24.0 42.4 33.0 27.5 24.0 21.0 8.0 45.6 25.9 14.8 29.8 21.0
17.5 9.7 40.0 18.8 57.9 21.0 12.0 12.0 19.6 51.5 12.0 36.8 13.7 32.8
12.0 10.5 22.5 19.5 37.5 35.0 10.5 33.6 14.5 36.5 17.9 26.9 12.0 41.5
Using RStudio, solve the following problems as directed. Submit a single .docx file
containing the output of R for each problem and submit also the saved RStudio script.
Summarize your answers for each problem with a conclusion. Save your files as
LRA6-1<LASTNAME>.docx and LRA6-1<LASTNAME>.R.
1. Find the mean, median, mode, and midrange for the following data set
representing the number of applications for a fiber internet plan received in a day
by a service provider, over the past 30 working days. (10 points)
45 46 48 53 54 55 56 59 62 63
65 66 66 69 69 70 71 71 73 73
74 75 75 75 77 78 81 82 82 83
2. A BS Accountancy student received the following final grades in his course during
the second semester of his sophomore year. Find his general weighted average if his
final grades were as follows. Would he be part of the Dean’s List for the semester if
the cutoff grade is 88? (5 points)
COURSE NO.   DESCRIPTIVE TITLE                                        UNITS   FINAL GRADE
CFE 104      CICM Missionary Identity                                   3         89
GSTS         Science, Technology, and Society                           3         84
GMATH        Mathematics in the Modern World                            3         89
FIT AQ       Physical Activity Towards Health & Fitness (Aquatics)      2         91
AE 221       Intermediate Accounting 3                                  3         88
AE 222       Accounting Information Systems                             3         95
BLR 221      Business Laws and Regulations 2                            3         87
CMPC 221     Accounting for Business Combinations                       3         89
INCTAXa      Income Taxation                                            6         87
3. Below is the number of units produced by a factory in the last 33 days of production.
Assuming the data to be a sample, compute the mean, median, mode and
midrange. (10 points)
322 343 348 358 361 366 374 376 386 390 396
329 344 349 359 362 366 375 377 389 392 397
333 347 351 360 365 367 376 379 390 395 398
4. The table that follows shows the time (in minutes) it takes for customers to wait in line
before being served at a fast food restaurant. Assuming the data to be a sample,
compute the mean, median, mode and midrange. (10 points)
3.2 3.3 3.5 3.9 4.1 4.4 4.7 4.8 5.2 5.6
5.6 5.7 5.8 6.0 6.2 6.3 6.4 6.5 6.7 6.7
6.9 7.0 7.2 7.5 8.0 8.8 8.9 9.4 9.7 9.9
10.0 11.3 12.4 12.5 14.8 15.0 16.5 16.8 17.2 19.3
Congratulations! You just completed all the modules and units for the Prelims.
You are now ready to take the examination.
Because you were diligent with your studies, you will surely ace the exam.