Lecture 2b - Describing Data-Numerical

BBA 240
Quantitative Methods
Describing Data: Numerical
By Handema M.
Section Goals
After completing this section, you should be able to:
• Compute and interpret the mean, median, and mode for a set of data
• Compute the range, variance, and standard deviation and know what these values mean
• Construct and interpret a box and whiskers plot
• Compute and explain the coefficient of variation and z scores
• Use numerical measures along with graphs, charts, and tables to describe data
Section Topics
• Measures of Center and Location
  - Mean, median, mode, geometric mean, midrange
• Other Measures of Location
  - Weighted mean, percentiles, quartiles
• Measures of Variation
  - Range, interquartile range, variance and standard deviation, coefficient of variation
Summary Measures
Describing Data Numerically
• Center and Location: mean, median, mode, weighted mean
• Other Measures of Location: percentiles, quartiles
• Variation: range, interquartile range, variance, standard deviation, coefficient of variation
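Measures of Center and Location
Overview
Center and Location: mean, median, mode, weighted mean
• Sample:
  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$
• Population:
  $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$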
Mean (Arithmetic Average)
• The mean is the arithmetic average of the data values
  - Sample mean (n = sample size):
    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
  - Population mean (N = population size):
    $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Mean (Arithmetic Average)
(continued)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis; left panel "Mean = 3", right panel "Mean = 4"]
  Left: $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
  Right: $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
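The effect of a single extreme value on the mean can be checked with a short sketch. It uses the two five-value data sets from the slide; the variable names are illustrative only.

```python
from statistics import mean

# The two data sets from the slide; only the last value differs.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_without = mean(without_outlier)  # 15 / 5
mean_with = mean(with_outlier)        # 20 / 5

print(mean_without, mean_with)  # 3 4
```

Replacing one value (5 by 10) moves the mean from 3 to 4, even though four of the five observations are unchanged.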
Median
• Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis; left panel "Median = 3", right panel "Median = 3"]
• In an ordered array, the median is the “middle” number
  - If n or N is odd, the median is the middle number
  - If n or N is even, the median is the average of the two middle numbers
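The odd/even rule can be written as a small function. This is a minimal sketch; the function name is hypothetical, and the first two test lists are assumed to be the same data sets used on the mean slide.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the rule applies to an ordered array
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 4, 5]))   # 3
print(median([1, 2, 3, 4, 10]))  # 3
print(median([1, 2, 3, 4]))      # 2.5
```

The first two calls give the same result, which illustrates that the extreme value 10 does not move the median.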
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two dot plots; left panel (axis 0 to 14) "Mode = 5", right panel (axis 0 to 6) "No Mode"]
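A sketch of how the three cases (one mode, no mode, several modes) might be detected. The data values below are hypothetical, since the points in the dot plots are not recoverable, and the convention "no value repeats means no mode" is an assumption that matches the slide.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) of a non-empty list, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 1, 3, 5, 5, 5, 9]))        # [5]
print(modes([1, 2, 3, 4]))                 # []
print(modes(["A", "B", "A", "B", "C"]))    # ['A', 'B']
```

The last call uses categorical labels, for which a mean or median would not be defined.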
Weighted Mean
• Used when values are grouped by frequency or relative importance
Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$
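The repair-project calculation can be reproduced in a few lines. This is a minimal sketch using the frequencies as weights; the variable names are illustrative.

```python
# Days to complete (values) and number of projects (weights) from the example.
days = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]

weighted_sum = sum(w * x for w, x in zip(frequency, days))  # 20 + 72 + 56 + 16
total_weight = sum(frequency)                               # number of projects
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight, round(weighted_mean, 2))  # 164 26 6.31
```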
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
[Figure: the five houses, labeled $2,000 K, $500 K, $300 K, $100 K, $100 K]
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
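As a cross-check of the house-price figures, the three measures can be computed with Python's standard `statistics` module. This is a sketch for verification only.

```python
from statistics import mean, median, mode

# The five house prices from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(prices)      # 3,000,000 / 5
price_median = median(prices)  # middle value once the prices are ranked
price_mode = mode(prices)      # most frequent value

print(price_mean, price_median, price_mode)  # 600000 300000 100000
```

The single expensive house pulls the mean to twice the median, which is why the three measures disagree so strongly here.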
Which measure of location
is the “best”?
• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to extreme values
  - Example: Median home prices may be reported for a region because they are less sensitive to outliers
Shape of a Distribution
• Describes how data is distributed
• Symmetric or skewed
  - Left-skewed (longer tail extends to left): Mean < Median < Mode
  - Symmetric: Mean = Median = Mode
  - Right-skewed (longer tail extends to right): Mode < Median < Mean
Other Location Measures
Other Measures of Location: percentiles, quartiles
• Percentiles: the pth percentile in a data array (where 0 ≤ p ≤ 100):
  - p% are less than or equal to this value
  - (100 – p)% are greater than or equal to this value
• Quartiles:
  - 1st quartile = 25th percentile
  - 2nd quartile = 50th percentile = median
  - 3rd quartile = 75th percentile
Percentiles
• The pth percentile in an ordered array of n values is the value in the ith position, where
  $i = \frac{p}{100}(n + 1)$
• Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
  $i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$
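The position formula is simple enough to express directly. A minimal sketch follows; the function name is hypothetical.

```python
def percentile_position(p, n):
    """1-based position i = (p / 100) * (n + 1) of the p-th percentile among n ordered values."""
    # Multiplying before dividing keeps the arithmetic exact for whole-number inputs.
    return p * (n + 1) / 100


print(percentile_position(60, 19))  # 12.0
```

A fractional result (for example 2.5) means the percentile lies between two neighbouring values of the ordered array.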
Quartiles
• Quartiles split the ranked data into 4 equal groups
  25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• Example: Find the first quartile
  Sample Data in Ordered Array (n = 9): 11 12 13 16 16 17 18 21 22
  Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
  so use the value halfway between the 2nd and 3rd values,
  so Q1 = 12.5
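The quartile example can be traced in code. This sketch applies the (n + 1) position rule and interpolates linearly when the position is fractional; clamping positions that fall outside 1..n to the nearest end of the array is an assumption added here, not something the lecture specifies.

```python
def percentile(values, p):
    """p-th percentile using position (p / 100) * (n + 1) with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100
    position = min(max(position, 1), n)  # assumption: clamp to the ends of the array
    lower = int(position)                # whole part of the 1-based position
    fraction = position - lower
    if lower == n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile(data, 25))  # 12.5
print(percentile(data, 50))  # 16.0
print(percentile(data, 75))  # 19.5
```

Other textbooks and software packages use slightly different position rules, so quartiles computed elsewhere may differ a little from these values.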
Box and Whisker Plot
• A graphical display of data using the 5-number summary:
  Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box and whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the data in each of the four segments]
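The 5-number summary that a box and whisker plot displays can be computed as sketched below. The data are the eleven values used in this lecture's box-and-whisker example; `statistics.quantiles` with `method="exclusive"` is used on the assumption that it matches the (n + 1) position rule from the percentile slides.

```python
from statistics import quantiles

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))

print(five_number_summary)  # (0, 2.0, 3.0, 5.0, 27)
```

The long distance from Q3 (5) to the maximum (27), compared with the short distance from the minimum (0) to Q1 (2), is what makes the right whisker much longer than the left.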
Shape of Box and Whisker Plots
• The box and central line are centered between the endpoints if the data are symmetric around the median
• A box and whisker plot can be shown in either vertical or horizontal format
Distribution Shape and
Box and Whisker Plot
[Figure: three box and whisker plots, each marking Q1, Q2, Q3; panels "Left-Skewed", "Symmetric", "Right-Skewed"]
Box-and-Whisker Plot Example
• Below is a box-and-whisker plot for the following data:
  0 2 2 2 3 3 4 5 5 10 27
  Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
[Figure: box-and-whisker plot with axis ticks at 0, 2, 3, 5, 27]
• This data is very right skewed, as the plot depicts
Measures of Variation
Variation
• Range
• Interquartile Range
• Variance
  - Population Variance
  - Sample Variance
• Standard Deviation
  - Population Standard Deviation
  - Sample Standard Deviation
• Coefficient of Variation
Variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
  Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with differently distributed points; both have Range = 12 - 7 = 5]
• Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
  Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
  Range = 120 - 1 = 119
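The outlier sensitivity of the range can be reproduced with the two 25-value data sets from the slide. A minimal sketch; the helper name is hypothetical.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then 4 and 5 (the first data set on the slide).
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
# Same data with the last value replaced by the outlier 120.
with_outlier = base[:-1] + [120]

print(value_range(base), value_range(with_outlier))  # 4 119
```

Changing one observation out of 25 multiplies the range by almost 30, while the bulk of the data is unchanged.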
Interquartile Range
• Can eliminate some outlier problems by using the interquartile range
• Eliminate some high- and low-valued observations and calculate the range from the remaining values.
• Interquartile range = 3rd quartile – 1st quartile

Interquartile Range
Example:
X minimum    Q1    Median (Q2)    Q3    X maximum
12           30    45             57    70
(25% of the values lie in each of the four segments)
Interquartile range = 57 – 30 = 27
Variance
• Average of squared deviations of values from the mean
  - Sample variance:
    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  - Population variance:
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
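The only difference between the two variance formulas is the divisor, which the following sketch makes explicit. The function names and the eight data values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """sigma^2: sum of squared deviations from the population mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values; mean 5, squared deviations sum to 32
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.57
```

For the same data the sample variance is always the larger of the two, because it divides the same sum by n - 1 instead of N.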
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
  - Sample standard deviation:
    $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  - Population standard deviation:
    $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
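Calculation Example:
Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{36 + 16 + 4 + 1 + 1 + 4 + 4 + 64}{7}} = \sqrt{\frac{130}{7}} = 4.3095$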
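The calculation can be checked step by step. This sketch computes the squared deviations for the eight sample values explicitly and compares the result with `statistics.stdev`; the squared deviations sum to 130, giving s ≈ 4.3095.

```python
from math import sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                # 128 / 8 = 16
squared_deviations = [(x - x_bar) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)             # 36 + 16 + 4 + 1 + 1 + 4 + 4 + 64
s = sqrt(sum_of_squares / (n - 1))

print(x_bar, sum_of_squares, round(s, 4))  # 16.0 130.0 4.3095
print(round(stdev(data), 4))               # 4.3095 (library result for comparison)
```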
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, each with Mean = 15.5]
• Data A: s = 3.338
• Data B: s = 0.9258
• Data C: s = 4.57
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
  - Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
  - Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Comparing Coefficient
of Variation
• Stock A:
  - Average price last year = K50
  - Standard deviation = K5
  $CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\text{K}5}{\text{K}50} \cdot 100\% = 10\%$
• Stock B:
  - Average price last year = K100
  - Standard deviation = K5
  $CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\text{K}5}{\text{K}100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
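The stock comparison reduces to one division per stock. A minimal sketch; the function name is hypothetical, and the inputs are the kwacha figures from the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std_dev * 100 / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: s = K5, mean = K50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = K5, mean = K100

print(cv_a, cv_b)  # 10.0 5.0
```

Because the currency unit cancels in the ratio, the two percentages can be compared directly even when the underlying quantities are measured in different units.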
Grouped Data
• Mean
• Median
• Mode
The Empirical Rule
• If the data distribution is bell-shaped, then the interval:
  - μ ± 1σ contains about 68% of the values in the population or the sample
[Figure: bell curve over X with the central 68% marked between μ - 1σ and μ + 1σ]
The Empirical Rule
(continued)
  - μ ± 2σ contains about 95% of the values in the population or the sample
  - μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: bell curves with 95% marked over μ ± 2σ and 99.7% marked over μ ± 3σ]
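The 68%, 95%, and 99.7% figures can be recovered under the assumption that "bell-shaped" means exactly normal. For a normal distribution, the share of values within k standard deviations of the mean is erf(k / √2); the sketch below evaluates it with the standard library. The function name is hypothetical.

```python
from math import erf, sqrt


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Real data are only approximately bell-shaped, so observed shares will usually be close to, not equal to, these values.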
Chebyshev’s Theorem
• Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
• Examples:
  At least                      within
  $(1 - 1/1^2)$ = 0%            k = 1 (μ ± 1σ)
  $(1 - 1/2^2)$ = 75%           k = 2 (μ ± 2σ)
  $(1 - 1/3^2)$ = 89%           k = 3 (μ ± 3σ)
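The bound in the examples is a one-line formula. This sketch evaluates it for the three values of k on the slide; the function name is hypothetical, and rejecting k below 1 is a choice made here because the bound would be negative and therefore uninformative.

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean, for any distribution."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2


for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k) * 100))
# 1 0
# 2 75
# 3 89
```

These are guaranteed minimums for any distribution, which is why they are lower than the 68%, 95%, and 99.7% that apply only to bell-shaped data.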
Standardized Data Values
• A standardized data value refers to the number of standard deviations a value is from the mean
• Standardized data values are sometimes referred to as z-scores
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
• x = original data value
• μ = population mean
• σ = population standard deviation
• z = standard score (number of standard deviations x is from μ)

Standardized Sample Values
$z = \frac{x - \bar{x}}{s}$
where:
• x = original data value
• $\bar{x}$ = sample mean
• s = sample standard deviation
• z = standard score (number of standard deviations x is from $\bar{x}$)
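Sample z-scores can be computed as sketched below. The data are the eight values from the sample standard deviation example earlier in this lecture (mean 16); the function name is hypothetical.

```python
from statistics import mean, stdev


def z_scores(sample):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in sample]


data = [10, 12, 14, 15, 17, 18, 18, 24]
z = z_scores(data)

print(round(z[0], 2), round(z[-1], 2))  # about -1.39 and 1.86
```

The value 24 lies about 1.86 sample standard deviations above the mean, and 10 lies about 1.39 below it; the z-scores of a sample always sum to zero.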

Using Microsoft Excel
• Descriptive Statistics are easy to obtain from Microsoft Excel
• Use menu choice: tools / data analysis / descriptive statistics
• Enter details in dialog box
Using Excel
• Use menu choice: tools / data analysis / descriptive statistics
Using Excel
(continued)
• Enter dialog box details
• Check box for summary statistics
• Click OK
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
K2,000,000
500,000
300,000
100,000
100,000
Section Summary
• Described measures of center and location
  - Mean, median, mode, geometric mean, midrange
• Discussed percentiles and quartiles
• Described measures of variation
  - Range, interquartile range, variance, standard deviation, coefficient of variation
• Created Box and Whisker Plots
Section Summary
(continued)
• Illustrated distribution shapes
  - Symmetric, skewed
• Discussed Chebyshev’s Theorem
• Calculated standardized data values
Thank You….
