Chapter 2, Descriptive Statistics


UNIT 2

DATA COLLECTION AND PRESENTATION
What is data?
• Data refers to facts or figures from which conclusions can be drawn.
• Data is information collected, organized, analyzed, and interpreted by statisticians.
• Data is needed whenever we undertake studies or research designed to answer particular problems, or to provide a basis on which certain decisions may be formulated.
Kinds of Statistical Data
1. Qualitative data - classificatory data, e.g. sex, religion, citizenship
2. Quantitative data - either counts or measures, e.g. weekly allowance
Note: Qualitative data can be transformed into quantitative data by coding, e.g. Female = 1, Male = 0.
3. Primary data - information gathered directly from an original source
4. Secondary data - information taken from published or unpublished data that were previously gathered by other individuals or agencies

Larson & Farber, Elementary Statistics: Picturing the World, 3e 2


Advantages of primary over secondary data:
1. Primary data frequently give detailed definitions of terms and accurate statistical units used in the experiment or in the survey.
2. Primary data lend more relevance to the researcher's study because of his direct participation in the project.
3. Primary data are more reliable because of their first-hand nature.

Methods of Data Collection
1. Survey method - the desired information is obtained either through personal interview or by distributing questionnaires to respondents.
2. Observation method - the desired information is obtained by observing and recording the behavior of persons, organizations, etc., but only at the time of occurrence.
- Direct observation can be used to discover a variety of types of information including aspects of social and economic behavior.

3. Experimental method - used when the objective is to determine the cause-and-effect relationship of certain variables under controlled conditions. Experimental data are used to test hypotheses on the significance of the effects of one or more controlled variables on certain characteristics of the unit of analysis.



Forms of Data Presentation
A. Textual Presentation - an expository form describing a set of information; a useful manner of presenting limited amounts of information.
Examples:
1. The carabao, or kalabaw in Filipino, is a type of water buffalo in our country. Carabaos are usually associated with farmers because these animals are used on farms. Their life span is about 18 to 20 years. A female carabao can give one calf each year.
2. Now is the time for you to shine! Join SEARCH for the GOLDEN DIVA, June 12, 2010, 6 pm @ Plaza Musica
1st Prize - 20,000 Php
2nd Prize - 15,000 Php
3rd Prize - 10,000 Php

3. The Republic of the Philippines has more than 7,000 islands.

B. Tabular Presentation - the process of condensing classified data and arranging them in a table.
Types of tables
1. General or reference table - used mainly as a repository of information;
its primary purpose is to present data in such a way that individual items
may easily be found by a reader; it is often placed in an appendix.

2. Summary or text table - usually small in size and designed to guide the reader in analyzing the data; usually accompanies a text discussion.
Parts of a Formal Statistical Table:
1. Heading - consists of the table number, table title, and head note, when necessary
a. Table number - identifies and positions a table within a series; it is preceded by the word "Table" with a capital T
b. Table title - the what, how classified, where and when of the table; a brief statement of the nature, classification, geographic area and time reference of the data.

c. Head note - an optional part; a statement enclosed in brackets [ ] or in parentheses that appears between the table title and the top rule of the table, or after the title

2. Box head - portion of the table which consists of the spanner and
column heads or captions describing the data in each column

a. Column head - basic unit of the box head; descriptive title placed
directly above the column to which it refers

b. Spanner head - title under which column heads are further classified

3. Stub - contains the stub head, center heads and line captions; the first column on the left, where the line descriptions are.
a. Stub head - describes the stub listing as a whole in terms of the classification presented
b. Center head - describes a group of line captions
c. Line caption - describes the data on a given row



4. Field (body) - depository of information appearing in the cells.
5. Footnote - statement qualifying or explaining the information presented in, or omitted from specific cells, columns or lines

Frequency Distributions and Their Graphs

Every day we come across a lot of information in the form of facts, numerical figures, tables, graphs, etc. These are provided by newspapers, television, magazines and other means of communication.
These may relate to cricket batting or bowling averages, profits of a company, temperatures of cities, expenditures in various sectors of a five-year plan, polling results, and so on.
These facts or figures, which are numerical or otherwise, collected with a definite purpose, are called data. Data is the plural form of the Latin word datum. Of course, the word 'data' is not new to you. You have studied data and data handling in earlier classes.

Collection of Data
Let us begin with an exercise on gathering data by performing the following activity.
Activity: Divide the students of your class into four groups. Allot each group the work of collecting one of the following kinds of data:
• Ages of 20 students of your class.
• Number of absentees on each day in your class for a month.
• Heights of 15 plants in or around your school.

Presentation of Data
As soon as the work of collecting data is over, the investigator has to find ways to present them in a form that is meaningful, easily understood and shows the main features at a glance. Let us now recall the various ways of presenting data through some examples.
Example: Consider the marks obtained by 10 students in a mathematics test, as given below:

55  36  95  73  60  42  25  78  75  62

Data in this form are called raw data.
By looking at the data in this form, can you find the highest and the lowest marks? Did it take you some time to search for the maximum and minimum scores? Wouldn't it be less time-consuming if these scores were arranged in ascending or descending order? So let us arrange the marks in ascending order:
25  36  42  55  60  62  73  75  78  95
Now we can clearly see that the lowest mark is 25 and the highest mark is 95. The difference between the highest and the lowest values in the data is called the range of the data. So the range in this case is 95 – 25 = 70.
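The ordering and range calculation can be sketched in Python. This is only an illustration, assuming the ten marks are held in a list; the variable names are chosen here and are not part of the slides.

```python
# Marks of the 10 students, in the order they were collected (raw data).
marks = [55, 36, 95, 73, 60, 42, 25, 78, 75, 62]

# Arrange the raw data in ascending order.
ordered = sorted(marks)

# The range is the difference between the highest and the lowest values.
lowest, highest = ordered[0], ordered[-1]
data_range = highest - lowest

print(ordered)
print("lowest:", lowest, "highest:", highest, "range:", data_range)
```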

Presenting data in ascending or descending order can be quite time-consuming, particularly when the number of observations is large, as in the next example.
Example: Consider the marks obtained (out of 100 marks) by 30 students of Class IX of a school:
10   20   36   92   95   40   50   56   60   70
92   88   80   70   72   70   36   40   36   40
92   40   50   50   56   60   70   60   60   88

Frequency Distribution
Recall that the number of students who have obtained a certain number of marks is called the frequency of those marks. For instance, 4 students got 70 marks, so the frequency of 70 marks is 4. To make the data more easily understandable, we write them in a table, as given below:
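An ungrouped frequency table of this kind can be produced by counting how often each mark occurs. The sketch below uses Python's `collections.Counter` on the 30 marks listed in the example; the variable names are illustrative assumptions.

```python
from collections import Counter

# Marks of the 30 students of Class IX, as listed in the example.
marks = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
         92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
         92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

# Counter maps each distinct mark to the number of students who got it.
frequency = Counter(marks)

# Print the table with the marks in ascending order.
print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:5d}  {frequency[mark]:9d}")
```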

The table above is called an ungrouped frequency distribution table, or simply a frequency distribution table. Note that you can also use tally marks in preparing these tables, as in the next example.
Example: 100 plants each were planted in 100 schools during Van Mahotsava. After one month, the numbers of plants that survived were recorded as:
 95   67   28   32   65   65   69   33   98   96
76   42   32   38   42   40   40   69   95   92
75   83   76   83   85   62   37   65   63   42
89   65   73   81   49   52   64   76   83   92
93   68   52   79   81   83   59   82   75   82
86   90   44   62   31   36   38   42   39   83
87   56   58   23   35   76   83   85   30   68
69   83   86   43   45   39   83   75   66   83
92   75   89   66   91   27   88   89   93   42
 53   69   90   55   66   49   52   83   34   36

To present such a large amount of data so that a reader can make sense of it easily, we condense it into groups like 20-29, 30-39, ..., 90-99 (since our data range from 23 to 98). These groupings are called 'classes' or 'class intervals', and their size is called the class size or class width, which is 10 in this case. In each of these classes, the least number is called the lower class limit and the greatest number is called the upper class limit; e.g., in 20-29, 20 is the lower class limit and 29 is the upper class limit.
Also, recall that using tally marks, the data above can be condensed in tabular form as follows:

Presenting data in this form simplifies and condenses them and lets us observe certain important features at a glance. This is called a grouped frequency distribution table. Here we can easily observe that 50% or more of the plants survived in 8 + 18 + 10 + 23 + 12 = 71 schools.
We observe that the classes in the table above are non-overlapping. Note that we could also have made more classes of smaller size, or fewer classes of larger size. For instance, the intervals could have been 22-26, 27-31, and so on. So there is no hard and fast rule about this, except that the classes should not overlap.
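The grouping into classes of width 10 can be sketched as follows. This is a minimal illustration, assuming the 100 survival counts from the example are stored in a list; integer division is just one convenient way to find the class of each value.

```python
from collections import Counter

# Number of surviving plants in each of the 100 schools.
survived = [95, 67, 28, 32, 65, 65, 69, 33, 98, 96,
            76, 42, 32, 38, 42, 40, 40, 69, 95, 92,
            75, 83, 76, 83, 85, 62, 37, 65, 63, 42,
            89, 65, 73, 81, 49, 52, 64, 76, 83, 92,
            93, 68, 52, 79, 81, 83, 59, 82, 75, 82,
            86, 90, 44, 62, 31, 36, 38, 42, 39, 83,
            87, 56, 58, 23, 35, 76, 83, 85, 30, 68,
            69, 83, 86, 43, 45, 39, 83, 75, 66, 83,
            92, 75, 89, 66, 91, 27, 88, 89, 93, 42,
            53, 69, 90, 55, 66, 49, 52, 83, 34, 36]

class_width = 10

# Key each value by the lower class limit of its class, e.g. 67 -> 60.
grouped = Counter((value // class_width) * class_width for value in survived)

for lower in sorted(grouped):
    upper = lower + class_width - 1
    print(f"{lower}-{upper}: {grouped[lower]}")

# Schools where at least 50 of the 100 plants (50%) survived.
schools_50_or_more = sum(count for lower, count in grouped.items() if lower >= 50)
print("Schools with 50% or more surviving:", schools_50_or_more)
```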

A frequency distribution is a table that shows classes or intervals of data with a count of the number of entries in each class. The frequency f of a class is the number of data points in the class.

Class      Frequency, f
1 – 4      4
5 – 8      5
9 – 12     3
13 – 16    4
17 – 20    2

In each class, the first number is the lower class limit and the second number is the upper class limit. The counts in the second column are the frequencies.

The class width is the distance between lower (or upper) limits of consecutive classes.

Class      Frequency, f   Difference of lower limits
1 – 4      4
5 – 8      5              5 – 1 = 4
9 – 12     3              9 – 5 = 4
13 – 16    4              13 – 9 = 4
17 – 20    2              17 – 13 = 4

The class width is 4.

The range is the difference between the maximum and minimum data entries.
Constructing a Frequency Distribution
Guidelines
1. Decide on the number of classes to include. The number of classes
should be between 5 and 20; otherwise, it may be difficult to detect any
patterns.
2. Find the class width as follows. Determine the range of the data, divide
the range by the number of classes, and round up to the next convenient
number.
3. Find the class limits. You can use the minimum entry as the lower limit
of the first class. To find the remaining lower limits, add the class width
to the lower limit of the preceding class. Then find the upper class
limits.
4. Make a tally mark for each data entry in the row of the appropriate class.
5. Count the tally marks to find the total frequency f for each class.

Example:
The following data represent the ages of 30 students in a statistics class. Construct a frequency distribution that has five classes.

Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Example continued:

1. The number of classes (5) is stated in the problem.

2. The minimum data entry is 18 and the maximum entry is 54, so the range is 54 – 18 = 36. Divide the range by the number of classes to find the class width.

\( \text{Class width} = \frac{36}{5} = 7.2 \), rounded up to 8.
3. The minimum data entry of 18 may be used for the lower limit of
the first class. To find the lower class limits of the remaining
classes, add the width (8) to each lower limit.

The lower class limits are 18, 26, 34, 42, and 50.
The upper class limits are 25, 33, 41, 49, and 57.
4. Make a tally mark for each data entry in the appropriate class.

5. The number of tally marks for a class is the frequency for that
class.
Example continued:

Ages of Students
Class      Frequency, f
18 – 25    13
26 – 33    8
34 – 41    4
42 – 49    3
50 – 57    2
           Σf = 30

Check that the sum of the frequencies equals the number of entries in the sample.
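The guideline steps can be sketched in Python for the ages data. This is an illustration under stated assumptions: the data are whole numbers, so each upper limit is one less than the next lower limit, and `math.ceil` stands in for "round up to the next convenient number" (it would not increase a quotient that is already a whole number).

```python
import math

# Ages of the 30 students, as listed in the example.
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]

num_classes = 5  # Step 1: stated in the problem.

# Step 2: divide the range by the number of classes and round up.
data_range = max(ages) - min(ages)
class_width = math.ceil(data_range / num_classes)

# Step 3: lower limits start at the minimum entry and increase by the width.
lower_limits = [min(ages) + i * class_width for i in range(num_classes)]
upper_limits = [lower + class_width - 1 for lower in lower_limits]

# Steps 4 and 5: count the entries that fall in each class.
frequencies = [sum(1 for age in ages if lower <= age <= upper)
               for lower, upper in zip(lower_limits, upper_limits)]

for lower, upper, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{lower} - {upper}: {f}")
print("Sum of frequencies:", sum(frequencies))
```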



Midpoint
The midpoint of a class is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called the class mark.

\( \text{Midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2} \)

Class      Frequency, f   Midpoint
1 – 4      4              2.5

\( \text{Midpoint} = \frac{1 + 4}{2} = \frac{5}{2} = 2.5 \)

Example:
Find the midpoints for the “Ages of Students” frequency distribution.

Ages of Students
Class      Frequency, f   Midpoint
18 – 25    13             21.5
26 – 33    8              29.5
34 – 41    4              37.5
42 – 49    3              45.5
50 – 57    2              53.5
           Σf = 30

For the first class, 18 + 25 = 43 and 43 ÷ 2 = 21.5.
Relative Frequency
The relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.

\( \text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n} \)

Class      Frequency, f   Relative frequency
1 – 4      4              0.222
           Σf = 18

\( \text{Relative frequency} = \frac{f}{n} = \frac{4}{18} \approx 0.222 \)
Example:
Find the relative frequencies for the “Ages of Students” frequency distribution.

Class      Frequency, f   Relative frequency
18 – 25    13             0.433
26 – 33    8              0.267
34 – 41    4              0.133
42 – 49    3              0.1
50 – 57    2              0.067
           Σf = 30        Σ(f/n) = 1

For the first class, the portion of students is \( \frac{f}{n} = \frac{13}{30} \approx 0.433 \).
Cumulative Frequency
The cumulative frequency of a class is the sum of the frequency for that class and all the previous classes.

Ages of Students
Class      Frequency, f   Cumulative frequency
18 – 25    13             13
26 – 33    8              21
34 – 41    4              25
42 – 49    3              28
50 – 57    2              30
           Σf = 30

The cumulative frequency of the last class equals the total number of students.
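Midpoints, relative frequencies and cumulative frequencies can all be derived from the class limits and frequencies. The sketch below assumes the “Ages of Students” classes are stored as (lower, upper) pairs; `itertools.accumulate` produces the running totals.

```python
from itertools import accumulate

# Class limits and frequencies of the "Ages of Students" distribution.
classes = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]
frequencies = [13, 8, 4, 3, 2]
n = sum(frequencies)  # sample size

# Midpoint: (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency: f / n.
relative = [f / n for f in frequencies]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (lower, upper), f, m, r, c in zip(classes, frequencies, midpoints,
                                      relative, cumulative):
    print(f"{lower}-{upper}  f={f:2d}  midpoint={m}  "
          f"relative={r:.3f}  cumulative={c}")
```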


Frequency Histogram
A frequency histogram is a bar graph that represents the
frequency distribution of a data set.
1. The horizontal scale is quantitative and measures the data
values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars must touch.

Class boundaries are the numbers that separate the classes without forming gaps between them.
The horizontal scale of a histogram can be marked with either the class boundaries or the midpoints.
Class Boundaries
Example:
Find the class boundaries for the “Ages of Students” frequency distribution.

Ages of Students
Class      Frequency, f   Class boundaries
18 – 25    13             17.5 – 25.5
26 – 33    8              25.5 – 33.5
34 – 41    4              33.5 – 41.5
42 – 49    3              41.5 – 49.5
50 – 57    2              49.5 – 57.5
           Σf = 30

The distance from the upper limit of the first class to the lower limit of the second class is 1. Half this distance is 0.5, so subtract 0.5 from each lower limit and add 0.5 to each upper limit.
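The boundary calculation can be sketched as below. It assumes the gap between consecutive classes is the same throughout the table, as it is for the “Ages of Students” classes.

```python
# Class limits of the "Ages of Students" distribution.
classes = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]

# Distance from the upper limit of the first class to the lower limit
# of the second class, and half of that distance.
gap = classes[1][0] - classes[0][1]
half_gap = gap / 2

# Move each lower limit down and each upper limit up by half the gap.
boundaries = [(lower - half_gap, upper + half_gap) for lower, upper in classes]

for (lower, upper), (low_b, high_b) in zip(classes, boundaries):
    print(f"{lower}-{upper}: {low_b} to {high_b}")
```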


Example:
Draw a frequency histogram for the “Ages of Students” frequency distribution. Use the class boundaries.

[Figure: frequency histogram “Ages of Students”. Horizontal axis: Age (in years), broken axis, marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]
Frequency Polygon
A frequency polygon is a line graph that emphasizes the continuous change in frequencies.

[Figure: frequency polygon “Ages of Students”. Horizontal axis: Age (in years), broken axis, marked at the midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, 61.5. Vertical axis: frequency f. The line is extended to the x-axis.]


Relative Frequency Histogram
A relative frequency histogram has the same shape and the same horizontal scale as the corresponding frequency histogram.

[Figure: relative frequency histogram “Ages of Students”. Horizontal axis: Age (in years), marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. Vertical axis: relative frequency (portion of students). Bar heights: 0.433, 0.267, 0.133, 0.1, 0.067.]
Pie Chart
A pie chart is a circle that is divided into sectors that represent categories. The area of each sector is proportional to the frequency of each category.

Accidental Deaths in the USA in 2002
Type                        Frequency
Motor Vehicle               43,500
Falls                       12,200
Poison                      6,400
Drowning                    4,600
Fire                        4,200
Ingestion of Food/Object    2,900
Firearms                    1,400
(Source: US Dept. of Transportation)
To create a pie chart for the data, find the relative frequency (percent) of each category.

Type                        Frequency   Relative frequency
Motor Vehicle               43,500      0.578
Falls                       12,200      0.162
Poison                      6,400       0.085
Drowning                    4,600       0.061
Fire                        4,200       0.056
Ingestion of Food/Object    2,900       0.039
Firearms                    1,400       0.019
                            n = 75,200
Next, find the central angle of each sector. To find the central angle, multiply the relative frequency by 360°.

Type                        Frequency   Relative frequency   Angle
Motor Vehicle               43,500      0.578                208.2°
Falls                       12,200      0.162                58.4°
Poison                      6,400       0.085                30.6°
Drowning                    4,600       0.061                22.0°
Fire                        4,200       0.056                20.1°
Ingestion of Food/Object    2,900       0.039                13.9°
Firearms                    1,400       0.019                6.7°
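The relative frequencies and central angles can be computed as sketched below. The dictionary layout is an assumption made for illustration; the angles are computed from the unrounded relative frequencies and rounded only for display.

```python
# Accidental deaths by type (frequencies from the table above).
deaths = {
    "Motor Vehicle": 43500,
    "Falls": 12200,
    "Poison": 6400,
    "Drowning": 4600,
    "Fire": 4200,
    "Ingestion of Food/Object": 2900,
    "Firearms": 1400,
}

n = sum(deaths.values())

# Relative frequency f / n and central angle (relative frequency x 360).
relative = {kind: f / n for kind, f in deaths.items()}
angles = {kind: r * 360 for kind, r in relative.items()}

for kind in deaths:
    print(f"{kind:26s} {deaths[kind]:6d} {relative[kind]:.3f} {angles[kind]:6.1f}")
```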
[Figure: pie chart “Accidental Deaths in the USA in 2002”. Sectors: Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%.]


Chapter 3

Measures of Central Tendency
Mean
A measure of central tendency is a value that represents a typical,
or central, entry of a data set. The three most commonly used
measures of central tendency are the mean, the median, and the
mode.

The mean of a data set is the sum of the data entries divided by the number of entries.

Population mean: \( \mu = \frac{\sum x}{N} \) (μ is read “mu”)
Sample mean: \( \bar{x} = \frac{\sum x}{n} \) (x̄ is read “x-bar”)


Example:
The following are the ages of all seven employees of a small company:

53 32 61 57 39 44 57
Calculate the population mean.

Add the ages and divide by 7:
\( \mu = \frac{\sum x}{N} = \frac{343}{7} = 49 \) years

The mean age of the employees is 49 years.


Median
The median of a data set is the value that lies in the middle of the
data when the data set is ordered. If the data set has an odd number
of entries, the median is the middle data entry. If the data set has an
even number of entries, the median is the mean of the two middle
data entries.

Example:
Calculate the median age of the seven employees.
53 32 61 57 39 44 57
To find the median, sort the data.
32 39 44 53 57 57 61
The median age of the employees is 53 years.
Mode
The mode of a data set is the data entry that occurs with the
greatest frequency. If no entry is repeated, the data set has no
mode. If two entries occur with the same greatest frequency, each
entry is a mode and the data set is called bimodal.

Example:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs the most times.

An outlier is a data entry that is far removed from the other entries
in the data set.
Comparing the Mean, Median and Mode
Example:
A 29-year-old employee joins the company and the ages of the
employees are now:
53 32 61 57 39 44 57 29

Recalculate the mean, the median, and the mode. Which measure
of central tendency was affected when this new age was added?

Mean = 46.5
Median = 48.5
Mode = 57

The mean takes every value into account, but is affected by the outlier. The median and mode are less influenced by extreme values.
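The three measures can be recomputed with Python's standard `statistics` module, before and after the new employee joins. This is a sketch for checking the arithmetic; the list names are illustrative.

```python
import statistics

# Ages of the seven employees, then with the 29-year-old added.
ages = [53, 32, 61, 57, 39, 44, 57]
ages_with_new = ages + [29]

for label, data in (("seven employees", ages),
                    ("eight employees", ages_with_new)):
    mean = statistics.mean(data)      # sum of entries / number of entries
    median = statistics.median(data)  # middle value of the ordered data
    mode = statistics.mode(data)      # most frequent entry
    print(f"{label}: mean={mean}, median={median}, mode={mode}")
```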
Weighted Mean

A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by
\( \bar{x} = \frac{\sum (x \cdot w)}{\sum w} \)
where w is the weight of each entry x.

Example:
Grades in a statistics class are weighted as follows:
Tests are worth 50% of the grade, homework is worth 30% of the
grade and the final is worth 20% of the grade. A student receives a
total of 80 points on tests, 100 points on homework, and 85 points
on his final. What is his current grade?
Begin by organizing the data in a table.

Source      Score, x   Weight, w   x·w
Tests       80         0.50        40
Homework    100        0.30        30
Final       85         0.20        17
                       Σw = 1.00   Σ(x·w) = 87

\( \bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{1.00} = 87 \)

The student’s current grade is 87%.
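The weighted mean can be checked with a few lines of Python. The dictionaries below are an illustrative way to pair each score with its weight; the weights are written as decimals, so they sum to 1.

```python
# Scores and weights for each graded component.
scores = {"Tests": 80, "Homework": 100, "Final": 85}
weights = {"Tests": 0.50, "Homework": 0.30, "Final": 0.20}

# Weighted mean = sum of (score x weight) / sum of weights.
weighted_sum = sum(scores[source] * weights[source] for source in scores)
total_weight = sum(weights.values())
weighted_mean = weighted_sum / total_weight

print(f"Current grade: {weighted_mean:.1f}%")
```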



Mean of a Frequency Distribution
The mean of a frequency distribution for a sample is approximated by
\( \bar{x} = \frac{\sum (x \cdot f)}{n} \), where \( n = \sum f \),
and x and f are the midpoints and frequencies of the classes.

Example:
The following frequency distribution represents the ages of 30
students in a statistics class. Find the mean of the frequency
distribution.

Class      Class midpoint, x   f        x·f
18 – 25    21.5                13       279.5
26 – 33    29.5                8        236.0
34 – 41    37.5                4        150.0
42 – 49    45.5                3        136.5
50 – 57    53.5                2        107.0
                               n = 30   Σ(x·f) = 909.0

\( \bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{909}{30} = 30.3 \)

The mean age of the students is 30.3 years.
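The grouped-data mean can be sketched as follows, using the class midpoints in place of the individual ages. Because midpoints stand in for the actual entries, the result is an approximation of the mean of the raw data.

```python
# Class midpoints and frequencies of the "Ages of Students" distribution.
midpoints = [21.5, 29.5, 37.5, 45.5, 53.5]
frequencies = [13, 8, 4, 3, 2]

n = sum(frequencies)  # n equals the sum of the frequencies

# Sum of (midpoint x frequency), divided by n.
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
grouped_mean = sum_xf / n

print(f"n = {n}, sum of x*f = {sum_xf}, mean = {grouped_mean:.1f}")
```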
Shapes of Distributions
A frequency distribution is symmetric when a vertical line can be
drawn through the middle of a graph of the distribution and the
resulting halves are approximately the mirror images.
A frequency distribution is uniform (or rectangular) when all
entries, or classes, in the distribution have equal frequencies. A
uniform distribution is also symmetric.
A frequency distribution is skewed if the “tail” of the graph
elongates more to one side than to the other. A distribution is
skewed left (negatively skewed) if its tail extends to the left. A
distribution is skewed right (positively skewed) if its tail extends
to the right.



Symmetric Distribution

10 Annual Incomes: 15,000; 20,000; 22,000; 24,000; 25,000; 25,000; 26,000; 28,000; 30,000; 35,000

[Figure: histogram of income with vertical axis f from 0 to 5, centered at $25,000.]

mean = median = mode = $25,000
Skewed Left Distribution
10 Annual Incomes: 0; 20,000; 22,000; 24,000; 25,000; 25,000; 26,000; 28,000; 30,000; 35,000

[Figure: histogram of income with vertical axis f from 0 to 5, with $25,000 marked and the tail extending to the left.]

mean = $23,500
median = mode = $25,000
Mean < Median
Skewed Right Distribution

10 Annual Incomes: 15,000; 20,000; 22,000; 24,000; 25,000; 25,000; 26,000; 28,000; 30,000; 1,000,000

[Figure: histogram of income with vertical axis f from 0 to 5, with $25,000 marked and the tail extending to the right.]

mean = $121,500
median = mode = $25,000
Mean > Median
Summary of Shapes of Distributions
Symmetric or uniform: Mean = Median
Skewed right: Mean > Median
Skewed left: Mean < Median
§ 2.4 Measures of Variation
Range
The range of a data set is the difference between the maximum and minimum data entries in the set.
Range = (Maximum data entry) – (Minimum data entry)

Example:
The following data are the closing prices for a certain stock
on ten successive Fridays. Find the range.

Stock 56 56 57 58 61 63 63 67 67 67

The range is 67 – 56 = 11.



Deviation
The deviation of an entry x in a population data set is the difference
between the entry and the mean μ of the data set.
Deviation of x = x – μ

Example:
The following data are the closing prices for a certain stock on five successive Fridays. Find the deviation of each price.

The mean stock price is μ = 305/5 = 61.

Stock, x   Deviation, x – μ
56         56 – 61 = –5
58         58 – 61 = –3
61         61 – 61 = 0
63         63 – 61 = 2
67         67 – 61 = 6
Σx = 305   Σ(x – μ) = 0


Variance and Standard Deviation
The population variance of a population data set of N entries is
\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \) (σ² is read “sigma squared”).

The population standard deviation of a population data set of N entries is the square root of the population variance:
\( \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \) (σ is read “sigma”).


Finding the Population Standard Deviation

Guidelines
1. Find the mean of the population data set: \( \mu = \frac{\sum x}{N} \)
2. Find the deviation of each entry: \( x - \mu \)
3. Square each deviation: \( (x - \mu)^2 \)
4. Add to get the sum of squares: \( SS_x = \sum (x - \mu)^2 \)
5. Divide by N to get the population variance: \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \)
6. Find the square root of the variance to get the population standard deviation: \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \)


Finding the Sample Standard Deviation

Guidelines
1. Find the mean of the sample data set: \( \bar{x} = \frac{\sum x}{n} \)
2. Find the deviation of each entry: \( x - \bar{x} \)
3. Square each deviation: \( (x - \bar{x})^2 \)
4. Add to get the sum of squares: \( SS_x = \sum (x - \bar{x})^2 \)
5. Divide by n – 1 to get the sample variance: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
6. Find the square root of the variance to get the sample standard deviation: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)


Finding the Population Standard Deviation

Example:
The following data are the closing prices for a certain stock on five successive Fridays. The population mean is 61. Find the population standard deviation.

Stock, x   Deviation, x – μ   Squared, (x – μ)²
56         –5                 25
58         –3                 9
61         0                  0
63         2                  4
67         6                  36
Σx = 305   Σ(x – μ) = 0       Σ(x – μ)² = 74

The squared deviations are never negative.

\( SS_x = \sum (x - \mu)^2 = 74 \)
\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{74}{5} = 14.8 \)
\( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{14.8} \approx 3.85 \)

The population standard deviation is about $3.85.
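The six guideline steps for the population standard deviation can be followed line by line in Python. This is a sketch for the five closing prices; `statistics.pstdev` from the standard library is included only as a cross-check of the hand calculation.

```python
import math
import statistics

# Closing prices on five successive Fridays (treated as a population).
prices = [56, 58, 61, 63, 67]
N = len(prices)

mu = sum(prices) / N                           # 1. population mean
deviations = [x - mu for x in prices]          # 2. deviation of each entry
squared = [d ** 2 for d in deviations]         # 3. square each deviation
sum_of_squares = sum(squared)                  # 4. sum of squares
variance = sum_of_squares / N                  # 5. population variance
std_dev = math.sqrt(variance)                  # 6. population standard deviation

print(f"mean={mu}, SS={sum_of_squares}, variance={variance}, sigma={std_dev:.2f}")

# Cross-check with the standard library's population standard deviation.
check = statistics.pstdev(prices)
```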
Interpreting Standard Deviation
When interpreting standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

[Figure: two histograms, each with horizontal axis “Data value” (2, 4, 6) and vertical axis “Frequency” (0 to 14). Left: x̄ = 4, s = 1.18. Right: x̄ = 4, s = 0.]
Empirical Rule (68-95-99.7%)
For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics.

1. About 68% of the data lie within one standard deviation of the mean.
2. About 95% of the data lie within two standard deviations of the mean.
3. About 99.7% of the data lie within three standard deviations of the mean.


[Figure: bell-shaped curve with horizontal axis from –4 to 4 standard deviations. 68% of the data lie within 1 standard deviation (34% on each side of the mean), 95% within 2 standard deviations (a further 13.5% on each side), and 99.7% within 3 standard deviations (a further 2.35% on each side).]


Using the Empirical Rule
Example:
The mean value of homes on a street is $125 thousand with a standard deviation of $5 thousand. The data set has a bell-shaped distribution. Estimate the percent of homes valued between $120 thousand and $130 thousand.

[Figure: bell-shaped curve with horizontal axis from 105 to 145 in steps of 5; the region from μ – σ = 120 to μ + σ = 130 is marked 68%.]

About 68% of the houses have a value between $120 thousand and $130 thousand.
Chebychev’s Theorem
The Empirical Rule is only used for symmetric, bell-shaped distributions.

Chebychev’s Theorem can be used for any distribution, regardless of the shape.


The portion of any data set lying within k standard deviations (k > 1) of the mean is at least
\( 1 - \frac{1}{k^2} \).

For k = 2: In any data set, at least \( 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} \), or 75%, of the data lie within 2 standard deviations of the mean.

For k = 3: In any data set, at least \( 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \), or 88.9%, of the data lie within 3 standard deviations of the mean.
Using Chebychev’s Theorem
Example:
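The mean time in a women’s 400-meter dash is 52.4 seconds with a standard deviation of 2.2 seconds. At least 75% of the women’s times will fall between what two values?

By Chebychev’s Theorem with k = 2, at least 75% of the data lie within 2 standard deviations of the mean: 52.4 – 2(2.2) = 48 and 52.4 + 2(2.2) = 56.8.

[Figure: number line marked 45.8, 48, 50.2, 52.4, 54.6, 56.8, 59, with the interval of 2 standard deviations on each side of the mean highlighted.]

At least 75% of the women’s 400-meter dash times will fall between 48 and 56.8 seconds.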
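The bound and the interval in this example can be computed as sketched below. The function name `chebychev_bound` is chosen here for illustration and is not from the slides.

```python
def chebychev_bound(k):
    """Minimum portion of any data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Women's 400-meter dash: mean 52.4 s, standard deviation 2.2 s.
mean, std_dev, k = 52.4, 2.2, 2

# Interval of k standard deviations on each side of the mean.
low = mean - k * std_dev
high = mean + k * std_dev

print(f"At least {chebychev_bound(k):.0%} of times lie between "
      f"{low:.1f} and {high:.1f} seconds")
```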
Standard Deviation for Grouped Data

Sample standard deviation: \( s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \)
where n = Σf is the number of entries in the data set, and x is the data value or the midpoint of an interval.

Example:
The following frequency distribution represents the ages of 30
students in a statistics class. The mean age of the students is 30.3
years. Find the standard deviation of the frequency distribution.

Class      x      f    x – x̄    (x – x̄)²   (x – x̄)² f
18 – 25    21.5   13   –8.8     77.44      1006.72
26 – 33    29.5   8    –0.8     0.64       5.12
34 – 41    37.5   4    7.2      51.84      207.36
42 – 49    45.5   3    15.2     231.04     693.12
50 – 57    53.5   2    23.2     538.24     1076.48
           n = 30                          Σ = 2988.80

\( s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{2988.8}{29}} \approx \sqrt{103.06} \approx 10.2 \)

The standard deviation of the ages is 10.2 years.
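The grouped-data calculation can be reproduced with the midpoints and frequencies from the table. This sketch follows the sample formula with n – 1 in the denominator; as with the grouped mean, it approximates the standard deviation of the raw ages.

```python
import math

# Class midpoints and frequencies of the "Ages of Students" distribution.
midpoints = [21.5, 29.5, 37.5, 45.5, 53.5]
frequencies = [13, 8, 4, 3, 2]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations from the mean, weighted by class frequency.
weighted_ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(weighted_ss / (n - 1))

print(f"mean={mean:.1f}, weighted sum of squares={weighted_ss:.2f}, s={s:.1f}")
```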


§ 2.5 Measures of Position
Quartiles
The three quartiles, Q1, Q2, and Q3, approximately divide an
ordered data set into four equal parts.

[Figure: number line from 0 to 100 marked at 25, 50 and 75, with Q1, Q2 (the median) and Q3 at the 25, 50 and 75 marks.]

Q1 is the median of the data below Q2. Q3 is the median of the data above Q2.


Finding Quartiles
Example:
The quiz scores for 15 students are listed below. Find the first, second and third quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38

Order the data.

Lower half: 28 30 33 37 37 38 42
Q2 (median): 43
Upper half: 43 44 45 48 48 51 55

Q1 = 37, Q2 = 43, Q3 = 48

About one fourth of the students scored 37 or less; about one half scored 43 or less; and about three fourths scored 48 or less.
Interquartile Range
The interquartile range (IQR) of a data set is the difference
between the third and first quartiles.
Interquartile range (IQR) = Q3 – Q1.

Example:
The quartiles for 15 quiz scores are listed below. Find the
interquartile range.
Q1 = 37 Q2 = 43 Q3 = 48

IQR = Q3 – Q1 = 48 – 37 = 11

The quiz scores in the middle portion of the data set vary by at most 11 points.
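The quartiles and the interquartile range can be computed with the median-of-halves method used in the example. The function below is an illustrative sketch; it assumes that the median is left out of both halves when the number of entries is odd, as in the quiz-score example. Other software may use different quartile conventions and give slightly different values.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of entries, skip the middle entry itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1  # interquartile range

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```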



Box and Whisker Plot
A box-and-whisker plot is an exploratory data analysis tool that
highlights the important features of a data set.
The five-number summary is used to draw the graph.
• The minimum entry
• Q1
• Q2 (median)
• Q3
• The maximum entry
Example:
Use the data from the 15 quiz scores to draw a box-and-whisker
plot.
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55
Five-number summary
• The minimum entry: 28
• Q1: 37
• Q2 (median): 43
• Q3: 48
• The maximum entry: 55

[Figure: box-and-whisker plot “Quiz Scores” over a horizontal axis from 28 to 56 in steps of 4. The whiskers run from 28 to 37 and from 48 to 55; the box spans 37 to 48 with the median marked at 43.]
Standard Scores
The standard score, or z-score, represents the number of standard deviations that a data value, x, falls from the mean, μ.
\( z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} \)

Example:
The test scores for all statistics finals at ormia College have a
mean of 78 and standard deviation of 7. Find the z-score for
a.) a test score of 85,
b.) a test score of 70,
c.) a test score of 78.

Example continued:
a.) μ = 78, σ = 7, x = 85
\( z = \frac{x - \mu}{\sigma} = \frac{85 - 78}{7} = 1.0 \)
This score is 1 standard deviation higher than the mean.

b.) μ = 78, σ = 7, x = 70
\( z = \frac{x - \mu}{\sigma} = \frac{70 - 78}{7} \approx -1.14 \)
This score is 1.14 standard deviations lower than the mean.

c.) μ = 78, σ = 7, x = 78
\( z = \frac{x - \mu}{\sigma} = \frac{78 - 78}{7} = 0 \)
This score is the same as the mean.


Relative Z-Scores
Example:
John received a 75 on a test whose class mean was 73.2 with a
standard deviation of 4.5. Samantha received a 68.6 on a test whose
class mean was 65 with a standard deviation of 3.9. Which student
had the better test score?

John’s z-score: \( z = \frac{x - \mu}{\sigma} = \frac{75 - 73.2}{4.5} = 0.4 \)
Samantha’s z-score: \( z = \frac{x - \mu}{\sigma} = \frac{68.6 - 65}{3.9} \approx 0.92 \)

John’s score was 0.4 standard deviations higher than the mean, while Samantha’s score was 0.92 standard deviations higher than the mean. Relative to their classes, Samantha’s test score was better than John’s.
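The z-score comparisons can be checked with a small helper. The function name `z_score` is an illustrative choice; it implements the standard-score formula (value minus mean, divided by standard deviation).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Statistics finals: mean 78, standard deviation 7.
for score in (85, 70, 78):
    print(f"score {score}: z = {z_score(score, 78, 7):.2f}")

# Comparing two students from different classes.
john = z_score(75, 73.2, 4.5)
samantha = z_score(68.6, 65, 3.9)
better = "Samantha" if samantha > john else "John"

print(f"John: {john:.2f}, Samantha: {samantha:.2f}, better relative score: {better}")
```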
