4 Numerical Methods For Describing Data
3 4 5 8 10
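Suppose we caught a sample of 6 fish from the lake:
3 4 5 6 8 10
With an even number of observations, the median is the average of the two middle values, so the median length is (5 + 6)/2 = 5.5.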
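The median rule can be sketched in Python: sort the observations, then take the middle one, or the average of the two middle ones when the count is even. The helper name `median_length` is an illustrative assumption, not something from the slides.

```python
def median_length(lengths):
    """Return the median of a list of numbers."""
    ordered = sorted(lengths)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle observation.
        return ordered[mid]
    # Even count: average of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


six_fish = [3, 4, 5, 6, 8, 10]
print(median_length(six_fish))  # 5.5
```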
Measures of Central Tendency
Mean is the arithmetic average.
Formula:
$\bar{x} = \frac{\sum x}{n}$
Suppose we caught a sample of 6 fish from
the lake. Find the mean length of the fish.
3 4 5 6 8 10
$\bar{x} = \frac{3 + 4 + 5 + 6 + 8 + 10}{6} = \frac{36}{6} = 6$
Now find how each observation deviates
from the mean.
$x$      $(x - \bar{x})$
3        -3   (3 - 6)
4        -2
5        -1
6         0
8         2
10        4
Sum       0
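As a quick check of the table, the deviations from the mean can be computed directly; they always sum to zero (up to floating-point rounding). The variable names below are illustrative.

```python
fish = [3, 4, 5, 6, 8, 10]

mean = sum(fish) / len(fish)
deviations = [x - mean for x in fish]

print(mean)             # 6.0
print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```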
Imagine a ruler with pennies placed at
3”, 4”, 5”, 6”, 8” and 10”.
To balance the ruler on your finger, you would need to place your finger at the mean of 6.
The mean is the balance point of a distribution.
What happens to the median & mean if the 10-inch fish were 15 inches long instead?
3 4 5 6 8 15
What happens to the median & mean if the 15-inch fish were 20 inches long?
3 4 5 6 8 20
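One way to explore these two questions is to compute both statistics for each version of the sample. This is a minimal sketch; the dictionary labels are illustrative.

```python
from statistics import median

samples = {
    "largest fish = 10": [3, 4, 5, 6, 8, 10],
    "largest fish = 15": [3, 4, 5, 6, 8, 15],
    "largest fish = 20": [3, 4, 5, 6, 8, 20],
}

results = {}
for label, lengths in samples.items():
    sample_mean = sum(lengths) / len(lengths)
    sample_median = median(lengths)
    results[label] = (sample_mean, sample_median)
    print(f"{label}: mean = {sample_mean:.2f}, median = {sample_median}")
```

The mean climbs from 6 to about 6.83 and then about 7.67 as the largest value grows, while the median stays at 5.5 in all three cases.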
Some statistics that are not affected by
extreme values . . .
Suppose we caught a sample of 20 fish with
the following lengths. Create a histogram
for the lengths of fish. (Use a class width of 1.)
3 5 6 10 6 7 7 8 4 5
6 4 7 5 9 9 8 7 6 8
Mean = 6.5
Median = 6.5
Suppose we caught a sample of 20 fish with
the following lengths. Create a histogram
for the lengths of fish. (Use a class width of 1.)
3 5 6 10 15 7 3 3 4 5
6 4 12 5 3 4 8 13 11 9
Mean = 6.8
Median = 5.5
3 5 6 10 10 7 10 8 9 5
6 4 9 10 9 9 10 7 10 8
Mean = 7.75
Median = 8.5
Recap:
• In a symmetrical distribution, the mean
and median are equal.
• In a skewed distribution, the mean is
pulled in the direction of the skewness.
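The recap can be checked against the three samples of 20 fish. In this sketch the labels "roughly symmetric", "skewed right", and "skewed left" are a reading of the data, not wording from the slides.

```python
from statistics import median

samples = {
    "roughly symmetric": [3, 5, 6, 10, 6, 7, 7, 8, 4, 5,
                          6, 4, 7, 5, 9, 9, 8, 7, 6, 8],
    "skewed right": [3, 5, 6, 10, 15, 7, 3, 3, 4, 5,
                     6, 4, 12, 5, 3, 4, 8, 13, 11, 9],
    "skewed left": [3, 5, 6, 10, 10, 7, 10, 8, 9, 5,
                    6, 4, 9, 10, 9, 9, 10, 7, 10, 8],
}

summary = {}
for label, lengths in samples.items():
    sample_mean = sum(lengths) / len(lengths)
    sample_median = median(lengths)
    summary[label] = (sample_mean, sample_median)
    # A mean above the median suggests a longer right tail; below, a longer left tail.
    print(f"{label}: mean = {sample_mean}, median = {sample_median}")
```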
12 14 19 20 22 24 25 26 26 50
Mean = 23.8
Dropping the smallest and largest observations (a trimmed mean):
14 19 20 22 24 25 26 26
$\bar{x}_T = \frac{176}{8} = 22$
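A trimmed mean like the one above can be sketched as follows. The function name `trimmed_mean` and its parameter `k` (how many observations to drop from each end) are assumptions made for illustration.

```python
def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 50]
print(sum(data) / len(data))  # 23.8
print(trimmed_mean(data))     # 22.0
```

Dropping one value from each end of ten observations trims 10% from each side, which removes the influence of the extreme value 50.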
What values are used to describe
categorical data?
Suppose that each person in a sample of 15 cell
phone users is asked if he or she is satisfied
with the cell phone service.
Range =
largest observation – smallest observation
The first two data sets have a range of 50 (70 - 20), but the third data set has a much smaller range of 10.
[Figure: dotplots of three data sets on a common axis from 20 to 70]
Measures of Variability
Another measure of the variability in a
data set uses the deviations from the
mean $(x - \bar{x})$.
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
When calculating the sample variance, we divide by the
degrees of freedom (n – 1) instead of n because
dividing by n tends to underestimate the population
variance, so n – 1 tends to produce better estimates.
$x$      $(x - \bar{x})$    $(x - \bar{x})^2$
3        -3                 9
4        -2                 4
5        -1                 1
6         0                 0
8         2                 4
10        4                 16
Sum       0                 34
$s^2 = \frac{34}{5} = 6.8$
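The sample variance of the six fish lengths can be worked through in code, following the formula column by column. The cross-check uses `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
from statistics import variance

fish = [3, 4, 5, 6, 8, 10]
n = len(fish)

mean = sum(fish) / n
squared_deviations = [(x - mean) ** 2 for x in fish]
sample_variance = sum(squared_deviations) / (n - 1)

print(sum(squared_deviations))  # 34.0
print(sample_variance)          # 6.8

# The standard library agrees with the hand calculation.
assert abs(variance(fish) - sample_variance) < 1e-12
```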
The square root of variance is called standard
deviation.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
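For the six fish lengths used earlier (3, 4, 5, 6, 8, 10), the standard deviation is the square root of the sample variance. This is a short sketch; `statistics.stdev` is the standard-library sample standard deviation.

```python
import math
from statistics import stdev

fish = [3, 4, 5, 6, 8, 10]
n = len(fish)
mean = sum(fish) / n

# Square root of the sum of squared deviations divided by n - 1.
s = math.sqrt(sum((x - mean) ** 2 for x in fish) / (n - 1))

print(round(s, 3))            # 2.608
print(round(stdev(fish), 3))  # 2.608
```

Unlike the variance, the standard deviation is in the same units as the data (inches here).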
Interquartile range (iqr) is the range of
the middle half of the data.
iqr = Q3 – Q1
The Chronicle of Higher Education (2009-2010
issue) published the accompanying data on the
percentage of the population with a bachelor’s or
higher degree in 2007 for each of the 50 states
and the District of Columbia.
21 27 26 19 30 35 35 26 47 26
27 30 24 29 22 24 29 20 20 27
35 38 25 31 19 24 27 27 23 34
25 32 26 24 22 28 26 30 23 25
22 25 29 33 34 30 17 25 23 34
26
iqr = 30 – 24 = 6
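The quartiles behind this result can be reproduced with a small sketch. It assumes the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd; other software may use slightly different quartile rules. The helper name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]   # excludes the middle value when n is odd
    upper = ordered[-half:]
    return median(lower), median(upper)


percentages = [
    21, 27, 26, 19, 30, 35, 35, 26, 47, 26,
    27, 30, 24, 29, 22, 24, 29, 20, 20, 27,
    35, 38, 25, 31, 19, 24, 27, 27, 23, 34,
    25, 32, 26, 24, 22, 28, 26, 30, 23, 25,
    22, 25, 29, 33, 34, 30, 17, 25, 23, 34,
    26,
]

q1, q3 = quartiles(percentages)
print(q1, q3, q3 - q1)  # 24 30 6
```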
Another graph: Boxplots
What are some advantages of boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (unlike
histograms)
• used with medium or large data
sets (n > 10)
• useful for comparative displays
Boxplots
When to use: univariate numerical data
To describe: comment on the center, spread, and shape of the
distribution and on any unusual features
Remember the data on the percentage of the
population with a bachelor’s or higher degree in
2007 for each of the 50 states and the District of
Columbia.
17 19 19 20 20 21 22 22 22 23
23 23 24 24 24 24 25 25 25 25
25 26 26 26 26 26 26 27 27 27
27 27 28 29 29 29 30 30 30 30
31 32 33 34 34 34 35 35 35 38
47
[Figure: boxplot of the data; horizontal axis "Percentages", from 10 to 50]
Modified boxplots
To display outliers:
• Identify mild & extreme outliers
An observation is an outlier if it is more than
1.5(iqr) away from the nearest quartile, that is, if it is
smaller than Q1 – 1.5(iqr) or larger than Q3 + 1.5(iqr).
An outlier is extreme if it is more than 3(iqr)
away from the nearest quartile; otherwise it is mild.
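Q1 – 1.5(iqr) = 24 – 1.5(6) = 15
Q3 + 1.5(iqr) = 30 + 1.5(6) = 39
Q3 + 3(iqr) = 30 + 3(6) = 48
The only observation outside the interval from 15 to 39 is 47. Because 47 is less than 48, it is a mild outlier.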
[Figure: modified boxplot of the data; horizontal axis "Percentages", from 10 to 50]
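The outlier rule can be sketched as a small function applied to the education data, using Q1 = 24 and Q3 = 30. The function name `classify_outliers` and its return format are assumptions made for illustration.

```python
def classify_outliers(values, q1, q3):
    """Split outliers into (mild, extreme) using the 1.5(iqr) and 3(iqr) rules."""
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for x in values:
        if x < extreme_low or x > extreme_high:
            extreme.append(x)
        elif x < mild_low or x > mild_high:
            mild.append(x)
    return mild, extreme


percentages = [
    17, 19, 19, 20, 20, 21, 22, 22, 22, 23,
    23, 23, 24, 24, 24, 24, 25, 25, 25, 25,
    25, 26, 26, 26, 26, 26, 26, 27, 27, 27,
    27, 27, 28, 29, 29, 29, 30, 30, 30, 30,
    31, 32, 33, 34, 34, 34, 35, 35, 35, 38,
    47,
]

mild, extreme = classify_outliers(percentages, q1=24, q3=30)
print(mild, extreme)  # [47] []
```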
[Figure: example boxplots. Panels: symmetrical boxplots; approximately symmetrical boxplot; skewed boxplot]
The 2009-2010 salaries of NBA players
published on the web site hoopshype.com were
used to construct the comparative boxplot of
salary data for five teams.
Interpreting Center & Variability
Chebyshev’s Rule –
$z\text{-score} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
What do these z-scores mean?
$z = \frac{62 - 56}{3.5} = 1.714$
$z = \frac{69 - 65}{2.8} = 1.429$
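The two z-scores can be reproduced with a one-line function. The name `z_score` and its parameter names are illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


print(round(z_score(62, 56, 3.5), 3))  # 1.714
print(round(z_score(69, 65, 2.8), 3))  # 1.429
```

Both values are above their means; the first is about 1.7 standard deviations above and the second about 1.4, so the first is relatively farther from its mean even though the raw value is smaller.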