Chapter 2: Descriptive Statistics
DATA COLLECTION AND PRESENTATION
What is data?
• Data refers to facts or figures from which conclusions can be drawn.
• It is the information collected, organized, analyzed, and interpreted by statisticians.
• It is needed whenever we undertake studies or research designed to answer particular problems, or to provide a basis on which certain decisions may be formulated.
Kinds of Statistical Data
1. Qualitative data - classificatory data, e.g., sex, religion, citizenship
2. Quantitative data - either counts or measures, e.g., weekly allowance
   Note: Qualitative data can be transformed into quantitative data by coding, e.g., Female = 1, Male = 0.
3. Primary data - information gathered directly from an original source
4. Secondary data - information taken from published or unpublished data that were previously gathered by other individuals or agencies
3. Experimental method - used when the objective is to determine the cause-and-effect relationship of certain variables under controlled conditions. Experimental data are used to test hypotheses about the significance of the effects of one or more controlled variables on certain characteristics of the unit of analysis.
2. Box head - portion of the table which consists of the spanner and column heads or captions describing the data in each column
   a. Column head - basic unit of the box head; descriptive title placed directly above the column to which it refers
   b. Spanner head - title under which column heads are further classified
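Class      Frequency, f
1 – 4      4
5 – 8      5
9 – 12     3
13 – 16    4
17 – 20    2
In each class, the first number is the lower class limit and the second number is the upper class limit. The second column lists the frequencies.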
Class      Frequency, f    Difference between consecutive lower limits
1 – 4      4
5 – 8      5               5 – 1 = 4
9 – 12     3               9 – 5 = 4
13 – 16    4               13 – 9 = 4
17 – 20    2               17 – 13 = 4
The class width is 4.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Larson & Farber, Elementary Statistics: Picturing the World, 3e 29
Constructing a Frequency Distribution
Example continued:
3. The minimum data entry of 18 may be used for the lower limit of
the first class. To find the lower class limits of the remaining
classes, add the width (8) to each lower limit.
The lower class limits are 18, 26, 34, 42, and 50.
The upper class limits are 25, 33, 41, 49, and 57.
4. Make a tally mark for each data entry in the appropriate class.
5. The number of tally marks for a class is the frequency for that
class.
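Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not part of the original example: the function name `frequency_distribution` and its parameters are assumptions, and the choice of five classes is read off the five lower class limits listed above.

```python
# Ages of the 30 students from the example above.
ages = [
    18, 20, 21, 27, 29, 20,
    19, 30, 32, 19, 34, 19,
    24, 29, 18, 37, 38, 22,
    30, 39, 32, 44, 33, 46,
    54, 49, 18, 51, 21, 21,
]


def frequency_distribution(data, first_lower, width, num_classes):
    """Return a list of ((lower, upper), frequency) pairs for integer data."""
    table = []
    for i in range(num_classes):
        lower = first_lower + i * width   # step 3: add the width to each lower limit
        upper = lower + width - 1         # upper limit is one less than the next lower limit
        count = sum(1 for x in data if lower <= x <= upper)  # steps 4 and 5: tally and count
        table.append(((lower, upper), count))
    return table


ages_table = frequency_distribution(ages, first_lower=18, width=8, num_classes=5)
for (lower, upper), f in ages_table:
    print(f"{lower} - {upper}: {f}")
```

The printed frequencies should add up to the sample size, which is a quick check that no entry was missed or counted twice.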
Ages of Students
Class (ages)    Frequency, f (number of students)
18 – 25         13
26 – 33         8
34 – 41         4
42 – 49         3
50 – 57         2
                Σf = 30
Check that the sum equals the number in the sample.
$\text{Midpoint} = \frac{1 + 4}{2} = \frac{5}{2} = 2.5$
$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$

Class      Frequency, f    Relative frequency
1 – 4      4               0.222
           Σf = 18

$\text{Relative frequency} = \frac{f}{n} = \frac{4}{18} \approx 0.222$
Relative Frequency
Example:
Find the relative frequencies for the “Ages of Students” frequency
distribution.
Class      Frequency, f    Relative frequency (portion of students)
18 – 25    13              0.433
26 – 33    8               0.267
34 – 41    4               0.133
42 – 49    3               0.1
50 – 57    2               0.067
           Σf = 30         Σ(f/n) = 1

For the first class: $\frac{f}{n} = \frac{13}{30} \approx 0.433$
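The relative frequencies can be reproduced with a short Python sketch. The dictionary layout and the variable names are illustrative assumptions; the frequencies are those of the "Ages of Students" distribution.

```python
# Class frequencies from the "Ages of Students" distribution.
frequencies = {"18-25": 13, "26-33": 8, "34-41": 4, "42-49": 3, "50-57": 2}

n = sum(frequencies.values())  # sample size

# Relative frequency of a class = f / n.
relative = {cls: f / n for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(f"{cls}: {rf:.3f}")
```

The relative frequencies should sum to 1 (up to rounding), which mirrors the check in the last row of the table.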
Cumulative Frequency
The cumulative frequency of a class is the sum of the frequency
for that class and all the previous classes.
Ages of Students
Class      Frequency, f    Cumulative frequency
18 – 25    13              13
26 – 33    + 8             21
34 – 41    + 4             25
42 – 49    + 3             28
50 – 57    + 2             30 (total number of students)
           Σf = 30
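A running total like the one in this table can be sketched with `itertools.accumulate` from the Python standard library. The variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies for 18-25, 26-33, 34-41, 42-49, 50-57.
frequencies = [13, 8, 4, 3, 2]

# Each cumulative frequency is the frequency of the class plus all previous classes.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency equals the sample size, here the total number of students.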
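[Figure: histogram of "Ages of Students". Horizontal axis: age (in years), with a broken axis and class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]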
Frequency Polygon
A frequency polygon is a line graph that emphasizes the continuous change in frequencies.
[Figure: frequency polygon of "Ages of Students". Horizontal axis: age (in years), with a broken axis and midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, 61.5. Vertical axis: frequency f. The line is extended to the x-axis.]
[Figure: relative frequency histogram. Vertical axis: relative frequency (portion of students); the tallest bar is 0.433.]
Type                        Frequency    Relative frequency
Motor Vehicle               43,500       0.578
Falls                       12,200       0.162
Poison                      6,400        0.085
Drowning                    4,600        0.061
Fire                        4,200        0.056
Ingestion of Food/Object    2,900        0.039
Firearms                    1,400        0.019
                            n = 75,200
Pie Chart
Next, find the central angle. To find the central angle, multiply the
relative frequency by 360°.
Type                        Frequency    Relative frequency    Angle
Motor Vehicle               43,500       0.578                 208.2°
Falls                       12,200       0.162                 58.4°
Poison                      6,400        0.085                 30.6°
Drowning                    4,600        0.061                 22.0°
Fire                        4,200        0.056                 20.1°
Ingestion of Food/Object    2,900        0.039                 13.9°
Firearms                    1,400        0.019                 6.7°
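The central angles can be reproduced with the sketch below. The dictionary and variable names are illustrative assumptions. Note that the angles in the table match the unrounded relative frequencies (f / n); multiplying the rounded value 0.578 by 360° would give about 208.1° rather than 208.2°.

```python
# Frequencies by type, taken from the table above.
frequencies = {
    "Motor Vehicle": 43_500,
    "Falls": 12_200,
    "Poison": 6_400,
    "Drowning": 4_600,
    "Fire": 4_200,
    "Ingestion of Food/Object": 2_900,
    "Firearms": 1_400,
}

n = sum(frequencies.values())

# Central angle = relative frequency * 360 degrees, using the unrounded f / n.
angles = {kind: f / n * 360 for kind, f in frequencies.items()}

for kind, angle in angles.items():
    print(f"{kind}: {angle:.1f} degrees")
```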
[Figure: pie chart. Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]
Measures of Central Tendency
Mean
A measure of central tendency is a value that represents a typical,
or central, entry of a data set. The three most commonly used
measures of central tendency are the mean, the median, and the
mode.
The mean of a data set is the sum of the data entries divided by the
number of entries.
Population mean ("mu"): $\mu = \frac{\sum x}{N}$
Sample mean ("x-bar"): $\bar{x} = \frac{\sum x}{n}$
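Example:
Calculate the population mean of the ages of the seven employees.
53 32 61 57 39 44 57
$\mu = \frac{\sum x}{N} = \frac{343}{7} = 49$
The mean age of the employees is 49 years.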
Example:
Calculate the median age of the seven employees.
53 32 61 57 39 44 57
To find the median, sort the data.
32 39 44 53 57 57 61
The median age of the employees is 53 years.
Mode
The mode of a data set is the data entry that occurs with the
greatest frequency. If no entry is repeated, the data set has no
mode. If two entries occur with the same greatest frequency, each
entry is a mode and the data set is called bimodal.
Example:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs the most times.
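As a cross-check, the three measures for the seven employees can be computed with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative assumptions.

```python
import statistics

# Ages of the seven employees from the examples above.
ages = [53, 32, 61, 57, 39, 44, 57]

mean_age = statistics.mean(ages)      # sum of the entries divided by the number of entries
median_age = statistics.median(ages)  # middle entry of the sorted data
mode_age = statistics.mode(ages)      # entry that occurs with the greatest frequency

print(mean_age, median_age, mode_age)
```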
An outlier is a data entry that is far removed from the other entries
in the data set.
Comparing the Mean, Median and Mode
Example:
A 29-year-old employee joins the company and the ages of the
employees are now:
53 32 61 57 39 44 57 29
Recalculate the mean, the median, and the mode. Which measure
of central tendency was affected when this new age was added?
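One way to explore this question is to compute the three measures before and after the new age is added. The sketch below uses Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption.

```python
import statistics


def summarize(data):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


before = [53, 32, 61, 57, 39, 44, 57]
after = before + [29]  # the 29-year-old employee joins

print("before:", summarize(before))
print("after: ", summarize(after))
```

With eight entries the median is the mean of the two middle values of the sorted data, 44 and 53. In this data set the mean and the median both change, while the mode stays at 57.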
Example:
Grades in a statistics class are weighted as follows:
Tests are worth 50% of the grade, homework is worth 30% of the
grade and the final is worth 20% of the grade. A student receives a
total of 80 points on tests, 100 points on homework, and 85 points
on his final. What is his current grade?
Weighted Mean
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{100} = 0.87$
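The weighted mean for this example can be sketched as follows. The function name `weighted_mean` is an illustrative assumption, and the weights are written here as fractions (0.50, 0.30, 0.20) rather than percentages, so the result comes out on the 0 to 100 point scale; it corresponds to the 0.87 (87%) shown above.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


scores = [80, 100, 85]          # tests, homework, final
weights = [0.50, 0.30, 0.20]    # share of the grade for each category

grade = weighted_mean(scores, weights)
print(f"{grade:.1f}")
```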
Example:
The following frequency distribution represents the ages of 30
students in a statistics class. Find the mean of the frequency
distribution.
Mean of a Frequency Distribution
Class      Class midpoint, x    f         (x · f)
18 – 25    21.5                 13        279.5
26 – 33    29.5                 8         236.0
34 – 41    37.5                 4         150.0
42 – 49    45.5                 3         136.5
50 – 57    53.5                 2         107.0
                                n = 30    Σ = 909.0
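The table can be turned into the mean of the frequency distribution by dividing the sum of the (x · f) column by n. A minimal Python sketch, with illustrative variable names:

```python
# Class midpoints and frequencies from the table above.
midpoints = [21.5, 29.5, 37.5, 45.5, 53.5]
frequencies = [13, 8, 4, 3, 2]

n = sum(frequencies)                                       # number of entries
total = sum(x * f for x, f in zip(midpoints, frequencies)) # sum of (x * f)
grouped_mean = total / n

print(f"sum of x*f = {total}, n = {n}, mean = {grouped_mean:.1f}")
```

The result, about 30.3 years, is an approximation of the mean age because each entry is represented by the midpoint of its class.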
10 Annual Incomes
15,000  20,000  22,000  24,000  25,000  25,000  26,000  28,000  30,000  35,000
[Figure: histogram of income, with $25,000 marked on the horizontal axis. Vertical axis: frequency f, 0 to 5.]
mean = median = mode = $25,000
Skewed Left Distribution
10 Annual Incomes
0  20,000  22,000  24,000  25,000  25,000  26,000  28,000  30,000  35,000
[Figure: histogram of income, with $25,000 marked on the horizontal axis. Vertical axis: frequency f, 0 to 5.]
mean = $23,500
median = mode = $25,000
Mean < Median
Skewed Right Distribution
10 Annual Incomes
15,000  20,000  22,000  24,000  25,000  25,000  26,000  28,000  30,000  1,000,000
[Figure: histogram of income, with $25,000 marked on the horizontal axis. Vertical axis: frequency f, 0 to 5.]
mean = $121,500
median = mode = $25,000
Mean > Median
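The effect of a single extreme income on the mean, but not on the median, can be checked with a short sketch. The dictionary keys and variable names are illustrative assumptions; the three data sets are the income lists from the slides.

```python
import statistics

middle = [20_000, 22_000, 24_000, 25_000, 25_000, 26_000, 28_000, 30_000]

income_sets = {
    "symmetric": [15_000] + middle + [35_000],
    "skewed left": [0] + middle + [35_000],
    "skewed right": [15_000] + middle + [1_000_000],
}

for name, incomes in income_sets.items():
    mean = statistics.mean(incomes)
    median = statistics.median(incomes)
    print(f"{name}: mean = {mean:,.0f}, median = {median:,.0f}")
```

In each case the median stays at $25,000, while the mean is pulled toward the extreme entry.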
Summary of Shapes of Distributions
Symmetric and uniform distributions: Mean = Median
Example:
The following data are the closing prices for a certain stock
on ten successive Fridays. Find the range.
Stock 56 56 57 58 61 63 63 67 67 67
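A minimal sketch of this example, assuming the usual definition of the range as the maximum entry minus the minimum entry (the definition itself is not shown in this section):

```python
# Closing prices on ten successive Fridays, from the example above.
prices = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]

# Range = maximum entry - minimum entry.
data_range = max(prices) - min(prices)
print(data_range)
```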
Example:
The following data are the closing prices for a certain stock on five successive Fridays. Find the deviation of each price.
The mean stock price is μ = 305/5 = 61.

Stock, x      Deviation, x – μ
56            56 – 61 = –5
58            58 – 61 = –3
61            61 – 61 = 0
63            63 – 61 = 2
67            67 – 61 = 6
Σx = 305      Σ(x – μ) = 0
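Guidelines
In Words                                             In Symbols
1. Find the mean of the population data set.         $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry.                 $x - \mu$
3. Square each deviation.                            $(x - \mu)^2$
4. Add to get the sum of squares.                    $SS_x = \sum (x - \mu)^2$
5. Divide by N to get the population variance.       $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get       $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
   the population standard deviation.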
Guidelines
In Words                                             In Symbols
1. Find the mean of the sample data set.             $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry.                 $x - \bar{x}$
3. Square each deviation.                            $(x - \bar{x})^2$
4. Add to get the sum of squares.                    $SS_x = \sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance.       $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to get       $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
   the sample standard deviation.
Example:
The following data are the closing prices for a certain stock on five
successive Fridays. The population mean is 61. Find the population
standard deviation.
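The example can be worked by following the population guidelines step by step. The sketch below uses the five closing prices from the deviation example (56, 58, 61, 63, 67); the variable names are illustrative assumptions, and `statistics.pstdev` from the standard library is used only as a cross-check.

```python
import math
import statistics

prices = [56, 58, 61, 63, 67]
N = len(prices)

mu = sum(prices) / N                               # step 1: population mean
deviations = [x - mu for x in prices]              # step 2: deviation of each entry
squared = [d ** 2 for d in deviations]             # step 3: square each deviation
sum_of_squares = sum(squared)                      # step 4: sum of squares
variance = sum_of_squares / N                      # step 5: population variance
sigma = math.sqrt(variance)                        # step 6: population standard deviation

print(f"SS = {sum_of_squares}, variance = {variance:.1f}, sigma = {sigma:.2f}")

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, statistics.pstdev(prices))
```

If the same five prices were treated as a sample instead, step 5 would divide by n – 1 = 4, and `statistics.stdev` would be the matching library call.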
Always positive!
[Figure: two frequency histograms. Horizontal axis: data value (2 to 6). Vertical axis: frequency (0 to 14). Left: x̄ = 4, s = 1.18. Right: x̄ = 4, s = 0.]
Empirical Rule (68-95-99.7%)
For data with a (symmetric) bell-shaped distribution, the standard
deviation has the following characteristics.
1. About 68% of the data lie within one standard deviation of the
mean.
2. About 95% of the data lie within two standard deviations of the
mean.
3. About 99.7% of the data lie within three standard deviations of
the mean.
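The three percentages can be checked against an exactly normal (bell-shaped) distribution with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only an illustration under that normality assumption; real data sets will match the rule approximately. The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```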
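[Figure: bell-shaped curve with the horizontal axis marked from –4 to 4 standard deviations. About 68% of the data lie within 1 standard deviation of the mean (34% on each side). A further 13.5% lie on each side between 1 and 2 standard deviations, and 2.35% on each side between 2 and 3 standard deviations.]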
For k = 3: In any data set, at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9}$, or 88.9%, of the data lie within 3 standard deviations of the mean.
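The bound used in the k = 3 case is the one given by Chebyshev's theorem, 1 – 1/k² for k > 1. A minimal sketch, with an illustrative function name:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
```

Unlike the Empirical Rule, this bound does not assume a bell-shaped distribution, which is why its guarantees are weaker.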
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$
where n = Σf is the number of entries in the data set, and x is the
data value or the midpoint of an interval.
Example:
The following frequency distribution represents the ages of 30
students in a statistics class. The mean age of the students is 30.3
years. Find the standard deviation of the frequency distribution.
Standard Deviation for Grouped Data
The mean age of the students is 30.3 years.
Class      x       f         x – x̄     (x – x̄)²    (x – x̄)² f
18 – 25    21.5    13        –8.8      77.44       1006.72
26 – 33    29.5    8         –0.8      0.64        5.12
34 – 41    37.5    4         7.2       51.84       207.36
42 – 49    45.5    3         15.2      231.04      693.12
50 – 57    53.5    2         23.2      538.24      1076.48
                   n = 30                          Σ = 2988.80

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{2988.8}{29}} = \sqrt{103.06} \approx 10.2$
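The grouped-data calculation can be sketched in Python as below. The variable names are illustrative assumptions; the midpoints and frequencies come from the table, and the mean is recomputed from them rather than typed in.

```python
import math

# Class midpoints and frequencies from the table above.
midpoints = [21.5, 29.5, 37.5, 45.5, 53.5]
frequencies = [13, 8, 4, 3, 2]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n  # about 30.3

# Sum of (x - mean)^2 * f over all classes.
weighted_ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))

# Sample standard deviation for grouped data: divide by n - 1.
s = math.sqrt(weighted_ss / (n - 1))

print(f"sum = {weighted_ss:.2f}, s = {s:.1f}")
```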
[Figure: number line from 0 to 100 marked at 25, 50, and 75, showing the quartiles Q1, Q2 (the median), and Q3.]
About one fourth of the students score 37 or less; about one half score
43 or less; and about three fourths score 48 or less.
Interquartile Range
The interquartile range (IQR) of a data set is the difference
between the third and first quartiles.
Interquartile range (IQR) = Q3 – Q1.
Example:
The quartiles for 15 quiz scores are listed below. Find the
interquartile range.
Q1 = 37 Q2 = 43 Q3 = 48
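Applying the definition to the quartiles given in this example, the calculation works out as the following worked step:

```latex
\mathrm{IQR} = Q_3 - Q_1 = 48 - 37 = 11
```

So the middle half of the quiz scores spans 11 points.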
[Figure: box-and-whisker plot of the quiz scores marked at 28, 37, 43, 48, and 55, on an axis running from 28 to 56.]
Standard Scores
The standard score, or z-score, represents the number of standard
deviations that a data value, x, falls from the mean, μ.
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$
Example:
The test scores for all statistics finals at ormia College have a
mean of 78 and standard deviation of 7. Find the z-score for
a.) a test score of 85,
b.) a test score of 70,
c.) a test score of 78.
a.) μ = 78, σ = 7, x = 85
$z = \frac{x - \mu}{\sigma} = \frac{85 - 78}{7} = 1.0$
This score is 1 standard deviation higher than the mean.
b.) μ = 78, σ = 7, x = 70
$z = \frac{x - \mu}{\sigma} = \frac{70 - 78}{7} \approx -1.14$
This score is 1.14 standard deviations lower than the mean.
c.) μ = 78, σ = 7, x = 78
$z = \frac{x - \mu}{\sigma} = \frac{78 - 78}{7} = 0$
This score is the same as the mean.
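The three z-scores can be reproduced with a one-line function. The function name `z_score` is an illustrative assumption; the mean of 78 and standard deviation of 7 are those given in the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 78, 7
for score in (85, 70, 78):
    print(f"x = {score}: z = {z_score(score, mu, sigma):.2f}")
```

A positive z-score means the value is above the mean, a negative one means it is below, and zero means it equals the mean.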