Lecture 2b - Describing Data-Numerical
Quantitative Methods
By Handema M.
Section Goals
After completing this section, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Compute the range, variance, and standard deviation and know what these values mean
Construct and interpret a box-and-whisker plot
Compute and explain the coefficient of variation and z-scores
Use numerical measures along with graphs, charts, and tables to describe data
Section Topics
Measures of Center and Location
Mean, median, mode, geometric mean, midrange
Other Measures of Location
Weighted mean, percentiles, quartiles
Measures of Variation
Range, interquartile range, variance and standard deviation, coefficient of variation
[Diagram: Summary Measures overview, including the coefficient of variation]
Measures of Center and Location
Overview
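Center and Location
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
Weighted mean: $\bar{X}_W = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$ (sample), $\mu_W = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$ (population)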
Mean (Arithmetic Average)
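Sample mean (n = sample size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population mean (N = population size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$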
Mean (Arithmetic Average)
(continued)
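Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Replacing 5 with the extreme value 10 pulls the mean from 3 up to 4: the mean is affected by extreme values.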
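A minimal Python sketch of the mean calculation above. It assumes the two plotted data sets are 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10, as the sums 15 and 20 suggest; the helper name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

data_1 = [1, 2, 3, 4, 5]
data_2 = [1, 2, 3, 4, 10]   # same data, but 5 replaced by the extreme value 10

print(arithmetic_mean(data_1))  # 3.0
print(arithmetic_mean(data_2))  # 4.0
```

The standard library's `statistics.mean` gives the same values.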
Median
Not affected by extreme values
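Example: Median = 3 in both data sets, even though one of them contains an extreme value.
Mode (the value that occurs most often)
Example: one data set (plotted on a 0 to 14 scale) has Mode = 5; another (plotted on a 0 to 6 scale), in which no value occurs more often than the others, has No Mode.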
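A short sketch of the median and mode using Python's standard `statistics` module. The median calls reuse the two data sets assumed in the mean example; the mode data sets are hypothetical, chosen only to mirror the "Mode = 5" and "No Mode" cases. `modes` is an illustrative helper: `statistics.multimode` returns every value when all values tie, so the "no mode" case is handled explicitly.

```python
import statistics

# Median: middle value of the ranked data
print(statistics.median([1, 2, 3, 4, 5]))    # 3
print(statistics.median([1, 2, 3, 4, 10]))   # 3 (the extreme value 10 does not move it)

def modes(values):
    """Most frequent value(s); an empty list means 'no mode' (no value repeats)."""
    if len(set(values)) == len(values):
        return []
    return statistics.multimode(values)

print(modes([0, 3, 5, 5, 5, 7, 9]))   # [5]  (hypothetical data)
print(modes([0, 1, 2, 3, 4, 5, 6]))   # []   (hypothetical data: no mode)
```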
Weighted Mean
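Example: Sample of 26 Repair Projects
Weighted mean number of days to complete:

Days to Complete (x_i)    Frequency (w_i)
5                         4
6                         12
7                         8
8                         2

$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$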
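The same weighted-mean arithmetic as a small Python sketch; `weighted_mean` is an illustrative helper name, and the frequencies serve as the weights.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

days = [5, 6, 7, 8]          # days to complete
frequency = [4, 12, 8, 2]    # number of projects (the weights)

x_w = weighted_mean(days, frequency)
print(round(x_w, 2))  # 6.31
```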
Review Example
[Figure: five house prices: $2,000,000; $500,000; $300,000; $100,000; $100,000]
Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum = $3,000,000)
Mean: $3,000,000/5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
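A minimal sketch of these summary statistics with Python's `statistics` module, using the five house prices listed above.

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # 3,000,000 / 5
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(f"{mean_price:,.0f} {median_price:,.0f} {mode_price:,.0f}")  # 600,000 300,000 100,000
```

Because the mode (100,000) is below the median (300,000), which is below the mean (600,000), these prices follow the Mode < Median < Mean pattern of a right-skewed distribution.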
Left-skewed (longer tail extends to the left): Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed (longer tail extends to the right): Mode < Median < Mean
Other Location Measures
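Other measures of location: percentiles and quartiles (Q1, Q2, Q3)
Quartiles split the ranked data into four equal parts: Q1 = 25th percentile, Q2 = 50th percentile (the median), Q3 = 75th percentile.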
Example: Find the first quartile
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 = 25th percentile, so find the position $\frac{25}{100}(n + 1) = \frac{25}{100}(9 + 1) = 2.5$
Position 2.5 lies halfway between the 2nd and 3rd values (12 and 13), so Q1 = 12.5
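A sketch of the (p/100)(n + 1) position rule with linear interpolation, applied to the ordered array above. The function name `percentile` is illustrative. As a cross-check, the standard library's `statistics.quantiles` uses the same (n + 1) positioning with its default `method="exclusive"`. Software packages use several percentile conventions, so other tools may report slightly different quartiles.

```python
import statistics

def percentile(data, p):
    """p-th percentile via the position rule (p/100)(n + 1), interpolating
    linearly between neighbouring ranked values. Positions outside 1..n are clamped."""
    x = sorted(data)
    n = len(x)
    pos = p / 100 * (n + 1)           # 1-based position in the ordered array
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    lower = int(pos)                  # whole part: rank of the lower neighbour
    frac = pos - lower                # fractional part: how far toward the next value
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile(data, 25))             # 12.5 (position 2.5: halfway between 12 and 13)
print(statistics.quantiles(data, n=4))  # [12.5, 16.0, 19.5]
```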
Box and Whisker Plot
A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box-and-whisker plots, each marking Q1, Q2, and Q3]
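A minimal sketch of the 5-number summary behind a box-and-whisker plot, using the ordered array from the quartile example. It assumes the (n + 1) quartile convention via `statistics.quantiles`; `five_number_summary` is an illustrative name.

```python
import statistics

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) for a box-and-whisker plot.
    Quartiles use statistics.quantiles' default 'exclusive' (n + 1) method."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(five_number_summary(data))  # (11, 12.5, 16.0, 19.5, 22)
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum. Note that plotting libraries may draw whiskers differently; for example, matplotlib's `boxplot` by default stops the whiskers at 1.5 × IQR and shows points beyond that separately.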
Box-and-Whisker Plot Example
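Five-number summary: Minimum = 0, Q1 = 2, Median = 3, Q3 = 5, Maximum = 27
These data are strongly right-skewed, as the plot shows: the upper whisker (5 to 27) is far longer than the lower whisker (0 to 2).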
Measures of Variation
[Diagram: measures of variation, including sample variance and sample standard deviation]
[Figure: two distributions with the same center but different variation]
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example: data plotted on a 0 to 14 scale, with smallest value 1 and largest value 14:
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two data sets on a 7 to 12 scale, spread in different ways]
Both have Range = 12 - 7 = 5
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
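A small Python sketch reproducing the two range calculations above; `data_range` is an illustrative helper name.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
with_5 = base + [5]
with_120 = base + [120]   # the same data with one outlier

print(data_range(with_5))    # 4
print(data_range(with_120))  # 119
```

A single outlier changes the range from 4 to 119, even though 24 of the 25 values are identical.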
Interquartile Range
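Interquartile range (IQR) = Q3 - Q1: the spread of the middle 50% of the data, so it is much less affected by outliers than the range.
Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70 (each of the four segments holds 25% of the data)
Interquartile range = 57 - 30 = 27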
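A sketch contrasting the IQR with the range. It reuses the ordered array from the quartile example and a hypothetical variant in which the largest value 22 is replaced by an outlier of 220; quartiles follow the (n + 1) convention of `statistics.quantiles`, and `iqr` is an illustrative name.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1 ('exclusive' (n + 1) quartile method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
outlier = [11, 12, 13, 16, 16, 17, 18, 21, 220]   # hypothetical: 22 replaced by 220

print(iqr(data), max(data) - min(data))           # 7.0 11
print(iqr(outlier), max(outlier) - min(outlier))  # 7.0 209
```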
Variance
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
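A minimal sketch of the two variance formulas, differing only in the divisor (n - 1 versus N). The data are illustrative values, not from the lecture; the results are cross-checked against the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population).

```python
import statistics

def sample_variance(x):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

def population_variance(x):
    """sigma^2: squared deviations from the population mean, divided by N."""
    N = len(x)
    mu = sum(x) / N
    return sum((xi - mu) ** 2 for xi in x) / N

values = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative data with mean 5
print(population_variance(values))  # 4.0 (32 / 8)
print(sample_variance(values))      # about 4.571 (32 / 7)
```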
Standard Deviation
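Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$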
Calculation Example:
Sample Standard Deviation
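Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$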
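A Python sketch of this calculation, step by step: the mean, the sum of squared deviations (36 + 16 + 4 + 1 + 1 + 4 + 4 + 64 = 130), and the square root of that sum divided by n - 1. `statistics.stdev` is used as a cross-check.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                       # 16.0
ss = sum((x - x_bar) ** 2 for x in data)    # sum of squared deviations: 130.0
s = math.sqrt(ss / (n - 1))                 # sqrt(130 / 7)

print(x_bar, ss, round(s, 4))               # 16.0 130.0 4.3095
print(round(statistics.stdev(data), 4))     # 4.3095
```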
Comparing Standard Deviations
[Dot plots of three data sets on an 11 to 21 scale]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
All three data sets have the same mean, but they are spread differently, so their standard deviations differ.
Coefficient of Variation
Measures relative variation
Always expressed as a percentage (%)
Shows variation relative to the mean
Can be used to compare two or more sets of data measured in different units
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
Comparing Coefficients of Variation
Stock A:
Average price last year = K50
Standard deviation = K5
$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{K5}{K50} \times 100\% = 10\%$
Stock B:
Average price last year = K100
Standard deviation = K5
$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{K5}{K100} \times 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
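A minimal sketch of the coefficient of variation applied to the two stocks; `coefficient_of_variation` is an illustrative helper name.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = K5, mean = K50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = K5, mean = K100
print(cv_a, cv_b)  # 10.0 5.0
```

Because the currency unit cancels in the ratio, the same function could compare data sets measured in different units.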
Grouped Data
Mean
Median
Mode
The Empirical Rule
If the data distribution is approximately bell-shaped:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
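The empirical-rule percentages are rounded probabilities from the normal distribution. For a normal variable, the share within k standard deviations of the mean equals erf(k / √2), which this sketch computes with `math.erf`. The helper name `normal_within` is illustrative.

```python
import math

def normal_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {normal_within(k):.1%}")
```

The printed shares are about 68.3%, 95.4%, and 99.7%.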
Chebyshev’s Theorem
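Regardless of how the data are distributed, at least (1 - 1/k²) of the values lie within k standard deviations of the mean (the bound is informative only for k > 1).
Examples:
k = 1 (μ ± 1σ): at least (1 - 1/1²) = 0% of the values
k = 2 (μ ± 2σ): at least (1 - 1/2²) = 75% of the values
k = 3 (μ ± 3σ): at least (1 - 1/3²) ≈ 89% of the values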
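A sketch that evaluates the Chebyshev lower bound 1 - 1/k² and, for comparison only, the exact share for a normal distribution. The helper name `chebyshev_lower_bound` is illustrative.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean, for any distribution."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    normal = math.erf(k / math.sqrt(2))   # exact share for a normal distribution
    print(f"k={k}: at least {bound:.0%} (normal distribution: {normal:.1%})")
```

The bounds print as 0%, 75%, and 89%. They are far below the normal-distribution shares because Chebyshev's theorem must hold for every distribution shape.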
Standardized Data Values
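A standardized value (z-score) gives the number of standard deviations a value lies above or below the mean.
Population: $z = \frac{x - \mu}{\sigma}$
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score
Sample: $z = \frac{x - \bar{x}}{s}$
where:
x = original data value
x̄ = sample mean
s = sample standard deviation
z = standard score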
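A minimal sketch of sample z-scores, applied to the sample from the standard deviation calculation example (mean 16, s ≈ 4.3095); `z_score` is an illustrative helper name.

```python
import statistics

def z_score(x, center, spread):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - center) / spread

data = [10, 12, 14, 15, 17, 18, 18, 24]
x_bar = statistics.mean(data)   # 16
s = statistics.stdev(data)      # about 4.3095

print(round(z_score(24, x_bar, s), 2))   # 1.86
print(round(z_score(10, x_bar, s), 2))   # -1.39
```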
Excel output: Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
K2,000,000
500,000
300,000
100,000
100,000
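A Python sketch that reproduces several of the rows Excel's Descriptive Statistics output reports, using the house price data. Excel also reports kurtosis and skewness, which are omitted here. The function name `describe` is illustrative.

```python
import math
import statistics

def describe(values):
    """A subset of the statistics shown in Excel's Descriptive Statistics output."""
    n = len(values)
    s = statistics.stdev(values)                  # sample standard deviation (n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": s / math.sqrt(n),       # standard error of the mean
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]   # house prices (K)
for name, value in describe(prices).items():
    print(f"{name:<20}{value:>20,.2f}")
```

For these prices the sample standard deviation is 800,000 and the sample variance is 640,000,000,000.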
Section Summary