Introduction To Probability and Statistics Twelfth Edition
Chapter 2
Describing Data
with Numerical Measures
Some graphic screen captures from Seeing Statistics ® Copyright ©2006 Brooks/Cole
Some images © 2001-(current year) www.arttoday.com A division of Thomson Learning, Inc.
Describing Data with Numerical Measures
• Graphical methods may not always be
sufficient for describing data.
• Numerical measures can be created for
both populations and samples.
– A parameter is a numerical descriptive
measure calculated for a population.
– A statistic is a numerical descriptive
measure calculated for a sample.
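Example: for the n = 5 measurements 2, 9, 11, 5, 6,
$$\bar{x} = \frac{\sum x_i}{n} = \frac{2 + 9 + 11 + 5 + 6}{5} = \frac{33}{5} = 6.6$$
[Figure: relative frequency histogram, n = 25; horizontal axis Quarts (0 to 5)]
• Median? m = 2
• Mode = 2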
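A minimal Python sketch of the sample mean for the five measurements above, using only the standard library. The helper name `sample_mean` is illustrative, not from the text; the median of the same sample is added for comparison.

```python
import statistics

def sample_mean(xs):
    """Sample mean: the sum of the measurements divided by n."""
    return sum(xs) / len(xs)

data = [2, 9, 11, 5, 6]
print(sample_mean(data))        # 6.6  (33 / 5)
print(statistics.median(data))  # 6    (middle of the sorted sample 2, 5, 6, 9, 11)
```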
The Standard Deviation
• In calculating the variance, we squared all
of the deviations, and in doing so changed
the scale of the measurements.
• To return this measure of variability to the
original units of measure, we calculate the
standard deviation, the positive square
root of the variance.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
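The population and sample versions differ only in the divisor of the variance (n versus n − 1); the square root step is the same. A short sketch using Python's `statistics` module, whose `pvariance`/`pstdev` divide by n and `variance`/`stdev` divide by n − 1. Reusing the five measurements from the mean example is our choice, not the slide's.

```python
import math
import statistics

data = [2, 9, 11, 5, 6]

# Sample variance and standard deviation: divisor n - 1.
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Population variance and standard deviation: divisor n
# (treats the data as the entire population).
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

# Either way, the standard deviation is the positive square root of the variance.
assert math.isclose(s, math.sqrt(s2))
assert math.isclose(sigma, math.sqrt(sigma2))
print(s2, round(s, 2))          # 12.3, about 3.51
print(sigma2, round(sigma, 2))  # 9.84, about 3.14
```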
Two Ways to Calculate
the Sample Variance
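Use the Definition Formula:

x_i    x_i − x̄    (x_i − x̄)²
  5       −4          16
 12        3           9
  6       −3           9
  8       −1           1
 14        5          25
Sum 45     0          60

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{60}{4} = 15$$
$$s = \sqrt{s^2} = \sqrt{15} = 3.87$$

Use the Computing Formula:

x_i    x_i²
  5      25
 12     144
  6      36
  8      64
 14     196
Sum 45   465

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{465 - \frac{45^2}{5}}{4} = \frac{60}{4} = 15$$
$$s = \sqrt{s^2} = \sqrt{15} = 3.87$$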
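Both formulas can be checked in a few lines. This is a minimal sketch; the function names `variance_definition` and `variance_computing` are hypothetical labels for the two formulas above.

```python
import math

def variance_definition(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def variance_computing(xs):
    """s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [5, 12, 6, 8, 14]
print(variance_definition(data))                       # 15.0  (60 / 4)
print(variance_computing(data))                        # 15.0  ((465 - 405) / 4)
print(round(math.sqrt(variance_definition(data)), 2))  # 3.87
```

The computing formula is convenient by hand, but in floating point it subtracts two large, nearly equal quantities when the values are large relative to their spread, so the definition formula is usually the safer choice in code.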
Some Notes
[Figure: relative frequency histogram of n = 50 measurements, classes from 25 to 73; x̄ = 44.9, s = 10.73]
• Do the actual proportions in the three intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s agree with those given by Tchebysheff's Theorem?
– Yes. Tchebysheff's Theorem must be true for any data set.
• Do they agree with the Empirical Rule?
– No. Not very well.
• Why or why not?
– The data distribution is not very mound-shaped; it is skewed right.
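The raw measurements are not reproduced on the slide, so this sketch only computes the three interval endpoints from x̄ = 44.9 and s = 10.73 and the counts each rule predicts. We infer n = 50 from the relative-frequency axis (multiples of 1/50); that is an assumption. `k_sd_intervals` is a hypothetical helper name.

```python
def k_sd_intervals(xbar, s, ks=(1, 2, 3)):
    """Return (k, lower, upper) for each interval xbar ± k*s."""
    return [(k, xbar - k * s, xbar + k * s) for k in ks]

# Summary statistics read from the slide; n = 50 is inferred (assumption).
xbar, s, n = 44.9, 10.73, 50

# Empirical Rule proportions (appropriate only for mound-shaped data).
empirical = {1: 0.68, 2: 0.95, 3: 0.997}

for k, low, high in k_sd_intervals(xbar, s):
    tcheb = 1 - 1 / k**2  # Tchebysheff lower bound, valid for any data set
    print(f"xbar ± {k}s = ({low:.2f}, {high:.2f}): "
          f"Tchebysheff: at least {tcheb * n:.1f} of {n}; "
          f"Empirical Rule: about {empirical[k] * n:.1f} of {n}")
```

Counting how many of the actual measurements fall in each interval and comparing with these predictions is exactly the check the questions above ask for.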
Measures of Relative Standing
• Where does one particular measurement stand in
relation to the other measurements in the data set?
• How many standard deviations away from the
mean does the measurement lie? This is measured
by the z-score.
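$$z\text{-score} = \frac{x - \bar{x}}{s}$$
Suppose $\bar{x} = 5$ and $s = 2$. For the measurement x = 9,
$$z = \frac{9 - 5}{2} = \frac{4}{2} = 2$$
x = 9 lies z = 2 standard deviations from the mean.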
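The z-score formula translates directly into code. A minimal sketch; `z_score` is a hypothetical helper name, and the sign of the result shows whether x lies above or below the mean.

```python
def z_score(x, xbar, s):
    """Number of standard deviations x lies from the mean (positive = above)."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - xbar) / s

print(z_score(9, 5, 2))  # 2.0   (2 standard deviations above the mean)
print(z_score(1, 5, 2))  # -2.0  (2 standard deviations below the mean)
```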
z-Scores
• From Tchebysheff’s Theorem and the Empirical Rule
– At least 3/4 and more likely 95% of measurements lie within
2 standard deviations of the mean.
– At least 8/9 and more likely 99.7% of measurements lie
within 3 standard deviations of the mean.
• z-scores between –2 and 2 are not unusual. z-scores should not
be more than 3 in absolute value. z-scores larger than 3 in
absolute value would indicate a possible outlier.
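The |z| > 3 guideline can be applied to a whole data set. This is a sketch only: `possible_outliers` is a hypothetical name, and the 12-value data set is made up for illustration.

```python
import statistics

def possible_outliers(xs, cutoff=3.0):
    """Return measurements whose sample z-score exceeds `cutoff` in absolute value."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [x for x in xs if abs((x - xbar) / s) > cutoff]

data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 10, 100]  # illustrative data
print(possible_outliers(data))  # [100]
```

Two caveats. A single extreme value inflates s and so shrinks its own z-score. Also, with the n − 1 divisor no sample z-score can exceed (n − 1)/√n in absolute value, so for n ≤ 10 this rule can never flag anything.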
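[Figure: the pth percentile x, with p% of the measurements below it and (100 − p)% above it; box plot showing Q1, the median m, and Q3, with an outlier marked *]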
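Example
With Q1 = 292.5 and Q3 = 340:
IQR = 340 − 292.5 = 47.5
Lower fence = 292.5 − 1.5(47.5) = 221.25
Upper fence = 340 + 1.5(47.5) = 411.25
Outlier: x = 520, which lies above the upper fence.
[Figure: box plot showing Q1, the median m, and Q3, with the outlier marked *]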
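The fence arithmetic above as a small sketch. `iqr_fences` is a hypothetical helper name; the 1.5 multiplier is the one used in the example.

```python
def iqr_fences(q1, q3, multiplier=1.5):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

lower, upper = iqr_fences(292.5, 340)
print(lower, upper)                # 221.25 411.25
print(520 < lower or 520 > upper)  # True: x = 520 is outside the fences
```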
Interpreting Box Plots
Median line in center of box and whiskers
of equal length—symmetric distribution
Median line left of center and long right
whisker—skewed right
Median line right of center and long left
whisker—skewed left
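2. Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$
3. Standard deviation
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$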
4. A rough approximation for s is $s \approx R/4$, where R is the range (largest minus smallest measurement).
The divisor can be adjusted depending on the sample size.
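A sketch of the range approximation, compared with the actual s. The divisor defaults to 4 as in the text. `range_approx_s` is a hypothetical name, and the data reuse the five measurements from the sample-variance example.

```python
import statistics

def range_approx_s(xs, divisor=4):
    """Rough estimate of s: the range R = max - min, divided by `divisor`."""
    return (max(xs) - min(xs)) / divisor

data = [5, 12, 6, 8, 14]
print(range_approx_s(data))   # 2.25  (R = 9, 9 / 4)
print(statistics.stdev(data)) # actual s, about 3.87
```

With only n = 5 measurements, R/4 noticeably underestimates s. This is the kind of case where the text notes that the divisor can be adjusted for sample size.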
Key Concepts
III. Tchebysheff’s Theorem and the Empirical Rule
1. Use Tchebysheff’s Theorem for any data set, regardless of
its shape or size.
a. At least $1 - 1/k^2$ of the measurements lie within k
standard deviations of the mean.
b. This is only a lower bound; there may be more
measurements in the interval.
2. The Empirical Rule can be used only for relatively mound-
shaped data sets.
– Approximately 68%, 95%, and 99.7% of the measurements
are within one, two, and three standard deviations of the
mean, respectively.
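Tchebysheff's bound can be checked against real data by counting. This is a minimal sketch: the function names are hypothetical, the data come from the sample-variance example, and the assertion in the loop encodes the theorem's guarantee.

```python
import statistics

def proportion_within(xs, k):
    """Fraction of measurements inside the interval xbar ± k*s."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    return sum(1 for x in xs if abs(x - xbar) < k * s) / len(xs)

def tchebysheff_bound(k):
    """Tchebysheff lower bound 1 - 1/k^2 (informative only for k > 1)."""
    return 1 - 1 / k**2

data = [5, 12, 6, 8, 14]  # xbar = 9, s ≈ 3.87
for k in (1, 2, 3):
    actual = proportion_within(data, k)
    assert actual >= tchebysheff_bound(k)  # must hold for any data set
    print(k, actual, round(tchebysheff_bound(k), 3))
```

For k = 1 the bound is 0, which is why the slides state Tchebysheff's Theorem for k = 2 and k = 3 only.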
Key Concepts
IV. Measures of Relative Standing
1. Sample z-score: $z = \dfrac{x - \bar{x}}{s}$
2. pth percentile: p% of the measurements are smaller, and
(100 − p)% are larger.
3. Lower quartile, Q1: position of Q1 = .25(n + 1)
4. Upper quartile, Q3: position of Q3 = .75(n + 1)
5. Interquartile range: IQR = Q3 − Q1
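The position rules .25(n + 1) and .75(n + 1) can be sketched in code. The rules above do not say how to handle a fractional position. This sketch assumes linear interpolation between the neighbouring ordered values and clamps the position to 1..n. The function name `quartile` is hypothetical.

```python
import math

def quartile(xs, p):
    """Value at 1-based position p*(n + 1) in the sorted data,
    interpolating linearly when the position is fractional (assumption)."""
    ordered = sorted(xs)
    n = len(ordered)
    pos = min(max(p * (n + 1), 1), n)  # keep the position inside 1..n
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [5, 12, 6, 8, 14]  # sorted: 5, 6, 8, 12, 14; positions 1.5 and 4.5
q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
print(q1, q3, q3 - q1)    # 5.5 13.0 7.5
```

For this example the result agrees with Python's `statistics.quantiles(data, n=4)`, whose default "exclusive" method also places quantiles at (n + 1)p positions.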
V. Box Plots
1. Box plots are used for detecting outliers and shapes of
distributions.
2. Q1 and Q3 form the ends of the box. The median line is in
the interior of the box.