QM Statistic Notes
Class     Frequency (f)
90-99     4
80-89     6
70-79     4
60-69     3
50-59     2
40-49     1
Note that the sum of the frequency column is equal to 20, the number of test
scores.
Additional Terminology:
Lower Class Limit: The least value that can belong to a class.
Upper Class Limit: The greatest value that can belong to a class.
Class Width: The difference between the upper (or lower) class
limits of consecutive classes. All classes should have the same
class width.
Class Midpoint: The middle value of each data class. To find
the class midpoint, average the upper and lower class limits.
$\text{class midpoint} = \dfrac{\text{upper limit} + \text{lower limit}}{2}$
ascending, highest if descending.) NOTE: if the lower limit is 5 and the class width
Mathematical Notation:
In this course, the following symbols and variables will have the meanings given
below (unless otherwise specified.)
Variables
x = data value
n = number of values in a sample data set
N = number of values in a population data set
f = frequency of a data class
Symbol
Σ (sigma) = the sum of all values for the following variable or
expression.
Ex: Using our notation, we can write the statement that the sum of the
frequencies in a frequency table should equal the number of values in the
sample data set as follows:
$\sum f = n$
Cumulative Frequency:
Class     Frequency (f)     Cumulative Frequency
90-99     4                 4
80-89     6                 10
70-79     4                 14
60-69     3                 17
50-59     2                 19
40-49     1                 20
Notice that the last entry in the cumulative frequency column is n = 20.
Class Exercise: Add a cumulative frequency column to the table of gasoline
purchases.
Relative Frequency:
$\text{relative frequency} = \dfrac{f}{n}$
Ex:
Class     Frequency (f)     Cumulative Frequency     Relative Frequency (f / n)
90-99     4                 4                        .20
80-89     6                 10                       .30
70-79     4                 14                       .20
60-69     3                 17                       .15
50-59     2                 19                       .10
40-49     1                 20                       .05
Note: $\sum \dfrac{f}{n} = 1$
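The cumulative and relative frequency columns can be checked with a few lines of Python. This is only a sketch built on the exam-score table above; the variable names are our own and are not part of the course notation.

```python
from itertools import accumulate

# Class labels and frequencies from the exam-score table above.
classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
freqs = [4, 6, 4, 3, 2, 1]

n = sum(freqs)                          # sum of f equals n
cumulative = list(accumulate(freqs))    # running total of f, top to bottom
relative = [f / n for f in freqs]       # f / n for each class

# Print the four columns side by side.
for row in zip(classes, freqs, cumulative, relative):
    print("{:<6} {:>3} {:>4} {:>6.2f}".format(*row))
```

The last cumulative entry should equal n, and the relative frequencies should add up to 1 (apart from rounding).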
Histograms:
Notice that the bar for each class is centered at the class midpoint, and the bars
for successive classes touch.
Class Exercise:
Construct a histogram for the frequency table of gasoline purchases.
Frequency Polygon:
Class Exercise:
Construct a frequency polygon from the gasoline purchase frequency table.
Now construct a cumulative frequency polygon (ogive) from the gasoline
purchase frequency table. What does the slope of the line segments tell you in
either case? What does a line segment with zero slope (flat) tell you in a
cumulative frequency polygon?
A stem and leaf plot displays the exact data values: the leftmost
digit(s) of each value form the stem, and the rightmost digit(s)
form the leaves.
Ex: Given the previous statistics exam grades for 20 statistics students, let us
create a stem and leaf plot.
97, 92, 88, 75, 83, 67, 89, 55, 72, 78, 81, 91, 57, 63, 67, 74, 87, 84, 98, 46
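One way to build the plot is sketched below in Python, assuming two-digit scores so that the tens digit is the stem and the ones digit is the leaf. The function name `stem_and_leaf` is our own choice for illustration.

```python
from collections import defaultdict

grades = [97, 92, 88, 75, 83, 67, 89, 55, 72, 78,
          81, 91, 57, 63, 67, 74, 87, 84, 98, 46]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):        # sorting first keeps each row of leaves in order
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(grades)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Counting the leaves on each stem gives back the class frequencies from the frequency table.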
$\mu = \dfrac{\sum x}{N}$ (population mean)
Median: the middle value of a data set when the values are placed in
number order. If n is odd, the middle value is the median. If n is even, the mean of
the two middle values is the median.
Exercise: Find the median value for the set of quiz scores.
Find the median if the low score of 1 is dropped.
Mode: the data value (or values) that appears the largest
number of times in the set. If no data value is repeated, we say that there is
no mode.
Exercise: Find the mode(s) of the quiz score data set.
Outlier: a data entry far removed from the other entries in the data set.
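As a quick illustration, the sketch below applies these definitions to the 20 exam grades used for the stem and leaf plot (not the quiz scores, which are left as the exercise). It relies on Python's standard `statistics` module; note that `multimode` returns every value when nothing repeats, whereas in this course we would say there is no mode.

```python
import statistics

grades = [97, 92, 88, 75, 83, 67, 89, 55, 72, 78,
          81, 91, 57, 63, 67, 74, 87, 84, 98, 46]

mean = statistics.mean(grades)         # sum of x divided by n
median = statistics.median(grades)     # n = 20 is even: mean of the two middle values
modes = statistics.multimode(grades)   # value(s) appearing the largest number of times

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```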
Weighted mean: $\bar{x} = \dfrac{\sum (x \cdot w)}{\sum w}$
Note: This is equivalent to counting each data value the number of times given by its weight.
Ex: Grade point average. We assign the letter grades the number values A=4,
B=3, C=2, D=1, F=0, and then each grade value is counted into the GPA
according to the number of credits earned (the course's weight) with that
course grade.
Course grade. The final grade in a course is calculated according to the
following scale: quizzes count for 15%, the average of 3 exams counts for 60%,
and the final exam counts for 25%. We can weight the score for each component
of the final grade by its percentage to calculate the final grade.
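The two examples above follow the same pattern, which can be sketched as one small Python function. The grades, credits, and scores below are made-up values chosen only for illustration; they are not the exercise data.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: 3 credits of A, 4 credits of B, 3 credits of C.
grade_points = [4, 3, 2]
credits = [3, 4, 3]
gpa = weighted_mean(grade_points, credits)

# Hypothetical course: quiz average, exam average, final exam score,
# weighted 15%, 60%, 25% as in the scale above.
scores = [80, 70, 90]
weights = [0.15, 0.60, 0.25]
final_grade = weighted_mean(scores, weights)

print("GPA:", gpa)
print("final grade:", round(final_grade, 2))
```

Because the function divides by the sum of the weights, the weights do not have to add up to 1; credits work just as well as percentages.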
Exercises (use the previous page's information):
Calculate the GPA of a student who has earned 12 credits of As, 21 credits
of Bs, 5 credits of Cs and 3 credits of Ds.
Calculate the final score for a student who has scored 95 on quizzes, has
exam scores of 83, 94, and 77, and a final exam score of 88.
Given the frequency distribution of a data set, we can make the best estimate of
the mean for the data set by using a weighted mean.
1. Calculate the class midpoint for each data class. (Our data
values for calculating the weighted mean.)
$\text{class midpoint} = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
2. Use the frequency of the data class as the weight for each data class.
3. Calculate the weighted mean by the weighted mean formula, or:
$\bar{x} = \dfrac{\sum (x_{\text{mid}} \cdot f)}{\sum f}$
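The three steps can be sketched in Python as follows. The frequency table here is a small made-up one (so the exercise below is left for you), and the function names are our own.

```python
def class_midpoint(lower, upper):
    """Step 1: average the lower and upper class limits."""
    return (lower + upper) / 2

def grouped_mean(table):
    """Estimate the mean from rows of (lower limit, upper limit, frequency)."""
    n = sum(f for _, _, f in table)                       # sum of f
    weighted_total = sum(class_midpoint(lo, hi) * f       # steps 2 and 3:
                         for lo, hi, f in table)          # sum of (x_mid * f)
    return weighted_total / n

# Hypothetical frequency distribution, for illustration only.
table = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]
estimate = grouped_mean(table)
print("estimated mean:", estimate)
```

The result is only an estimate, since every value in a class is treated as if it sat at the class midpoint.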
Exercise: Estimate the mean of the data set whose frequency distribution is
given by:
Class     Frequency (f)
90-99     4
80-89     6
70-79     4
60-69     3
50-59     2
40-49     1
1. Range: the difference between the largest and smallest data values
in a data set.
$\text{range} = \text{highest value} - \text{lowest value}$
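$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ (population standard deviation)
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ (sample standard deviation)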
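As a rough check on hand calculations, the range and both versions of the standard deviation can be computed directly in Python. The sketch below uses the 20 exam grades from the stem and leaf example and assumes the usual formulas (divide the sum of squared deviations by n - 1 for a sample, by N for a population); treating the same list both ways is for comparison only.

```python
import math

grades = [97, 92, 88, 75, 83, 67, 89, 55, 72, 78,
          81, 91, 57, 63, 67, 74, 87, 84, 98, 46]

n = len(grades)
data_range = max(grades) - min(grades)          # highest value - lowest value

mean = sum(grades) / n
squared_devs = sum((x - mean) ** 2 for x in grades)

s = math.sqrt(squared_devs / (n - 1))           # sample standard deviation
sigma = math.sqrt(squared_devs / n)             # population standard deviation

print("range:", data_range)
print("sample standard deviation:", round(s, 2))
print("population standard deviation:", round(sigma, 2))
```

The standard library functions `statistics.stdev` and `statistics.pstdev` should agree with `s` and `sigma` respectively.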
Given the frequency distribution of a data set, we can make the best estimate of
the standard deviation for the data set by using the same technique as for the mean.
1. Calculate the class midpoint for each data class. These will be our data values
for calculating the standard deviation.
2. Use the frequency of the data class as the weight for each data class midpoint.
(That is, multiply by the frequency rather than having to sum that many times.)
3. Calculate the standard deviation by using the formula:
$s = \sqrt{\dfrac{\sum (x_{\text{mid}} - \bar{x})^2 f}{n - 1}}$ (sample)
OR
$\sigma = \sqrt{\dfrac{\sum (x_{\text{mid}} - \mu)^2 f}{N}}$ (population)
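A minimal Python sketch of these steps is given below, again on a small made-up frequency table rather than the exercise data. The function name `grouped_std` and its `sample` flag are our own; the mean used inside is itself the grouped estimate from the class midpoints.

```python
import math

def grouped_std(table, sample=True):
    """Estimate the standard deviation from rows of (lower, upper, frequency)."""
    n = sum(f for _, _, f in table)
    mids = [((lo + hi) / 2, f) for lo, hi, f in table]      # step 1: midpoints
    mean = sum(m * f for m, f in mids) / n                  # grouped mean
    squared_devs = sum((m - mean) ** 2 * f                  # step 2: weight each
                       for m, f in mids)                    # squared deviation by f
    divisor = n - 1 if sample else n                        # step 3: sample or population
    return math.sqrt(squared_devs / divisor)

# Hypothetical frequency distribution, for illustration only.
table = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]
s = grouped_std(table)
sigma = grouped_std(table, sample=False)
print("sample estimate:", round(s, 2))
print("population estimate:", round(sigma, 2))
```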
Exercise: Estimate the standard deviation of the data set whose frequency
distribution is given by:
Class     Frequency (f)
90-99     4
80-89     6
70-79     4
60-69     3
50-59     2
40-49     1
The standard deviation of a data set is an important quantity because it limits the
proportion of data values that can lie very far (high or low) from the mean.
The Empirical Rule (68-95-99.7 Rule)
Applies only to bell-shaped distributions.
Approximately 68% of data values lie within 1 standard deviation of
the mean.
Approximately 95% of data values lie within 2 standard deviations of
the mean.
Approximately 99.7% of data values lie within 3 standard deviations of
the mean.
Ex: Men's heights have a bell-shaped distribution with a mean of
69.2 inches and a standard deviation of 2.9 inches. Between what
heights does 95% of the male population lie?
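The arithmetic for this example is just the mean plus or minus k standard deviations. A short Python sketch, with a helper name of our own choosing, lists all three Empirical Rule intervals for the men's heights:

```python
def empirical_intervals(mean, sd):
    """Return {k: (low, high, approx. percent)} for k = 1, 2, 3 standard deviations."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (round(mean - k * sd, 2), round(mean + k * sd, 2), pct)
            for k, pct in coverage.items()}

# Men's heights: mean 69.2 inches, standard deviation 2.9 inches.
intervals = empirical_intervals(69.2, 2.9)
for k, (low, high, pct) in intervals.items():
    print(f"about {pct}% of heights lie between {low} and {high} inches")
```

For k = 2 this gives 63.4 to 75.0 inches. These percentages are approximations and apply only because the distribution is bell-shaped.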
Chebychev's Theorem
Applies to any data set.
The portion (%) of data values that must be within k (for k > 1) standard
deviations of the mean is at least $1 - \dfrac{1}{k^2}$.
Note: Chebychev's Theorem gives only cautious lower bounds for the proportion of data
values, whereas the Empirical Rule gives approximations. If a data distribution is known to
be bell-shaped, the Empirical Rule should be used.
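To see how cautious the bound is, the sketch below evaluates 1 - 1/k^2 for a few values of k and prints it next to the Empirical Rule figure. The function name is our own.

```python
def chebychev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

empirical = {2: 0.95, 3: 0.997}   # Empirical Rule approximations (bell-shaped data only)

for k in (2, 3):
    bound = chebychev_lower_bound(k)
    print(f"k = {k}: at least {bound:.1%} (any data set), "
          f"about {empirical[k]:.1%} (bell-shaped)")
```

For k = 2 the theorem guarantees at least 75%, compared with roughly 95% for bell-shaped data.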
2.5 Measures of Position:
Fractiles divide a data set into consecutive intervals so that each interval has (at
least approximately) the same number of data values. The most common fractiles
are percentiles, quartiles, and deciles.
Note: There are 99 percentiles P1-P99, 3 quartiles Q1-Q3, and 9 deciles D1-D9.
Note: P50 = Q2 = D5 = Median
Ex: Using the quiz scores from before, put them in order, then find and
interpret Q1-Q3.
Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8
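One common way to find the quartiles is sketched below: Q2 is the median of the whole ordered set, Q1 is the median of the lower half, and Q3 is the median of the upper half (leaving the middle value out of both halves when n is odd). This convention is an assumption on our part; calculators and software sometimes use slightly different rules and can give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    lower = data[:half]       # values below the middle position
    upper = data[-half:]      # values above the middle position
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))

quiz_scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]
q1, q2, q3 = quartiles(quiz_scores)
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```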
A Box (and Whisker) Plot illustrates the range, Q1, Q2 (median) and Q3. Let's draw one and
discuss it.
(We can also do all of this on our calculators, anyone interested?) (p 125)
Ex: If your doctor tells you your 3-year-old is in the 50th percentile for height
and the 35th percentile for weight, what does that mean?
The Standard Score:
The standard score (or z-score) of a data value is the number of standard
deviations that the value lies above or below the mean for a bell-shaped
distribution.
Think about it. The larger the z-score, the ____________________the mean.
The ______________________the mean, the ___________ the percentage of
data between the mean and that z-score.
Standard Scores can be calculated using the formula:
$z = \dfrac{x - \mu}{\sigma}$
Exercise: Men have a mean height of 69.2 inches with a standard deviation
of 2.9 inches. Find the standard (z-)score of a man who is:
6 feet tall
51/2
63
Now find the percentile for the last two men above.
Note: The z-score of a value is positive if the value is above the mean and
negative if it is below the mean. The mean itself always has a z-score of _____.
A data value is considered to be unusual if it is more than two standard
deviations from the mean. A data value is unusually high if it has a z-score
larger than 2 and unusually low if it has a z-score of less than -2.
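The z-score calculation and the "unusual" test can be sketched in Python as follows. The mean and standard deviation are the men's height figures used above; the three heights tested are made-up values for illustration, not the ones in the exercise, and the function names are our own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def classify(z):
    """Apply the rule above: |z| > 2 marks a value as unusual."""
    if z > 2:
        return "unusually high"
    if z < -2:
        return "unusually low"
    return "not unusual"

MEAN_HEIGHT = 69.2   # inches
SD_HEIGHT = 2.9      # inches

for height in (78, 70, 62):   # hypothetical heights in inches
    z = z_score(height, MEAN_HEIGHT, SD_HEIGHT)
    print(f"{height} in: z = {z:.2f} ({classify(z)})")
```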
Think about the Empirical Rule and Chebychev's Theorem. Why does this make
sense?
Ex: p112 #32, 36 (look at uses & abuses charts p 115)