Chapter 2 Describing Data Using Tables and Graphs


2.1 Raw data



Raw data are data recorded in the sequence in which they are collected, before they are
processed or ranked.

Table 2.1: The weights of 20 students in kg (Quantitative raw data)

61 68 65 67 68 71 69 63 74 64
66 65 62 67 60 73 69 70 70 71

Table 2.2: The grades of MAA161 of 20 students (Qualitative raw data)

A B C A C B B A B C
B A B B B A C D D B

Arrays
an arrangement of numerical raw data in ascending order or descending order of
magnitude

60 , 61 , 62 , 63 , 64 , 65 , 65 , 66 , 67 , 67 ,
68 , 68 , 69 , 69 , 70 , 70 , 71 , 71 , 73 , 74
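As a small illustration, the array above can be reproduced by sorting the raw weights of Table 2.1. This is only a sketch; the variable names are chosen here for illustration.

```python
# Raw weights (kg) from Table 2.1, in the order they were recorded.
weights = [61, 68, 65, 67, 68, 71, 69, 63, 74, 64,
           66, 65, 62, 67, 60, 73, 69, 70, 70, 71]

ascending = sorted(weights)                 # array in ascending order
descending = sorted(weights, reverse=True)  # array in descending order

print(ascending)
print(descending)
```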


Ungrouped data
Contains information on each member of a sample or population individually
Examples: Data presented in Table 2.1 and Table 2.2


2.2 Organizing and Graphing Qualitative Data

Frequency distributions for qualitative data
A tabular arrangement that lists all categories and the number of elements that belong to
each of the categories.

Example 2.1
A sample of 25 students who were planning to go to college was taken. The majors they
intended to choose were:
Economics MIS Economics Business Business
Business Business Other Other Other
BS BS MIS Other MIS
Other Business MIS Business Other
Economics MIS Other Other MIS
Construct a frequency distribution table for these data.



Solution

Major        Tally        Frequency
Business     |||| |       6
Economics    |||          3
MIS          |||| |       6
BS           ||           2
Others       |||| |||     8
                          Sum = 25
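The tally can also be checked mechanically. The sketch below counts the 25 responses of Example 2.1 with Python's collections.Counter; the list simply retypes the data, and the variable names are illustrative.

```python
from collections import Counter

# The 25 intended majors from Example 2.1, row by row.
majors = [
    "Economics", "MIS", "Economics", "Business", "Business",
    "Business", "Business", "Other", "Other", "Other",
    "BS", "BS", "MIS", "Other", "MIS",
    "Other", "Business", "MIS", "Business", "Other",
    "Economics", "MIS", "Other", "Other", "MIS",
]

freq = Counter(majors)  # maps each category to its frequency

for major, f in freq.items():
    print(f"{major:<10} {f}")
print("Sum =", sum(freq.values()))
```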


Relative frequency and percentage distributions

Tabular arrangement that lists the relative frequencies and percentages for all categories.
$$\text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{percentage} = (\text{relative frequency}) \times 100\%$$


Example 2.2
Determine the relative frequency and percentage distributions for the data in Example 2.1.

Solution
Major        Relative Frequency   Percentage
Business     6/25                 24
Economics    3/25                 12
MIS          6/25                 24
BS           2/25                 8
Others       8/25                 32
             Sum = 1.00           Sum = 100
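A minimal sketch of the same calculation, starting from the frequencies found in Example 2.1 (the dictionary names are illustrative):

```python
# Frequencies from the solution of Example 2.1.
freq = {"Business": 6, "Economics": 3, "MIS": 6, "BS": 2, "Others": 8}

total = sum(freq.values())                                      # sum of all frequencies
rel_freq = {major: f / total for major, f in freq.items()}     # f / sum of f
percentage = {major: r * 100 for major, r in rel_freq.items()}  # relative frequency x 100%

for major in freq:
    print(f"{major:<10} {rel_freq[major]:.2f} {percentage[major]:.0f}")
```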



Graphical presentation of qualitative data
- Bar Graphs (bar chart)
A graph made of bars whose heights represent the frequencies of respective
categories.

Example 2.3
Construct a bar chart for the data in Example 2.1.








- Pie Chart
A circle divided into portions that represent the relative frequencies or percentages of
a population or a sample belonging to different categories.

Example 2.4
Construct a pie chart for the data in Example 2.1.

Major        Relative Frequency   Angle Size
Business     6/25                 (6/25) × 360° = 86.4°
Economics    3/25                 (3/25) × 360° = 43.2°
MIS          6/25                 (6/25) × 360° = 86.4°
BS           2/25                 (2/25) × 360° = 28.8°
Others       8/25                 (8/25) × 360° = 115.2°
             Sum = 1.00           Sum = 360°
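The angle of each slice is the relative frequency multiplied by 360 degrees. A short sketch of that arithmetic, with illustrative variable names:

```python
# Frequencies from Example 2.1.
freq = {"Business": 6, "Economics": 3, "MIS": 6, "BS": 2, "Others": 8}
total = sum(freq.values())

# Angle of each slice in degrees: relative frequency x 360.
angles = {major: f / total * 360 for major, f in freq.items()}

for major, angle in angles.items():
    print(f"{major:<10} {angle:.1f}")
```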


























2.3 Organizing and Graphing Quantitative data

Frequency Distribution for quantitative data
Lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.

Note:
- Generally, the grouping process destroys some of the original information
- The classes are non-overlapping i.e. each value belongs to one and only one class

Class
an interval that includes all the values that fall within two numbers, the lower and upper
limits

Class limits
endpoints of each interval

Class boundary
the dividing line between two classes
is given by the midpoint of the upper limit of one class and the lower limit of the next
higher class

Class width / class size
is the difference between the upper and lower class boundaries
$$\text{class width} = \text{upper boundary} - \text{lower boundary}$$

Class mark / class midpoint
is the midpoint of the class interval
$$\text{class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
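The definitions above can be tried on three adjacent classes with limits 135-156, 157-178 and 179-200 (the classes used later in Example 2.5). The sketch computes the boundaries, width and mark of the middle class; the variable names are illustrative.

```python
# Class limits (lower, upper) of three adjacent classes.
limits = [(135, 156), (157, 178), (179, 200)]

prev_upper = limits[0][1]   # upper limit of the class below
lower, upper = limits[1]    # limits of the class of interest
next_lower = limits[2][0]   # lower limit of the class above

lower_boundary = (prev_upper + lower) / 2   # midpoint of 156 and 157
upper_boundary = (upper + next_lower) / 2   # midpoint of 178 and 179
class_width = upper_boundary - lower_boundary
class_mark = (lower + upper) / 2

print(lower_boundary, upper_boundary, class_width, class_mark)
```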


Constructing frequency distribution tables

1. Determine the number of classes. One guide is Sturges' formula,
$$c = 1 + 3.3 \log n$$
where c is the number of classes and
n is the number of observations in the data set.

2. Determine the class interval or width ( w )
Must cover at least the distance from the smallest value (L) in the raw data up to the
largest value (H)
$$\text{approximate class width} = \frac{\text{largest value } (H) - \text{smallest value } (L)}{\text{number of classes}}$$

- The class width is usually rounded up to some convenient number.
- The rounding of this number may slightly change the number of classes initially
intended.

3. Determine the lower limit of the first class or the starting point.

Any convenient number that is equal to or less than the smallest value in the data set
can be used as the lower limit of the first class.
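The first two steps can be sketched as follows. This assumes that log in Sturges' formula is the base-10 logarithm; the function names are made up for this illustration. The figures n = 30, L = 135, H = 242 and 5 classes are those of Example 2.5.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, c = 1 + 3.3 log n (log taken as base 10)."""
    return 1 + 3.3 * math.log10(n)

def approximate_width(smallest, largest, classes):
    """Approximate class width, (H - L) / number of classes."""
    return (largest - smallest) / classes

c = sturges_classes(30)              # about 5.9; a guide only, not a fixed rule
w = approximate_width(135, 242, 5)   # 21.4 when 5 classes are used
w_rounded = math.ceil(w)             # rounded up to a convenient number

print(round(c, 2), w, w_rounded)
```

The formula only suggests a number of classes; the final choice, like the rounding of the width, is a matter of convenience.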

Example 2.5
Table 2.3 gives the total home runs hit by all players of each of the 30 Major League Baseball
teams during the 2004 season.

Table 2.3: Total home runs hit by Major League Baseball teams
Team                   Home Runs   Team                        Home Runs
Arizona                135         Milwaukee                   135
Atlanta                178         Minnesota                   191
Baltimore              169         Montreal (now Washington)   151
Boston                 222         New York Mets               185
Chicago Cubs           235         New York Yankees            242
Chicago White Sox      242         Oakland                     189
Cincinnati             194         Philadelphia                215
Cleveland              184         Pittsburgh                  142
Colorado               202         St. Louis                   214
Detroit                201         San Diego                   139
Florida                148         San Francisco               183
Houston                187         Seattle                     136
Kansas City            150         Tampa Bay                   145
Anaheim Angels         162         Texas                       227
Los Angeles Dodgers    203         Toronto                     145

Solution:
$$\text{approximate width of each class} = \frac{242 - 135}{5} = 21.4 \quad (\approx 22)$$


Total Home Runs   Tally          f
135-156           |||| ||||      10
157-178           |||            3
179-200           |||| ||        7
201-222           |||| |         6
223-244           ||||           4
                                 Σf = 30
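The class frequencies can be verified by counting how many values of Table 2.3 fall within each pair of class limits. A minimal sketch, assuming a starting point of 135, a width of 22 and 5 classes as in the solution:

```python
# Home runs of the 30 teams in Table 2.3.
home_runs = [135, 178, 169, 222, 235, 242, 194, 184, 202, 201,
             148, 187, 150, 162, 203, 135, 191, 151, 185, 242,
             189, 215, 142, 214, 139, 183, 136, 145, 227, 145]

start, width, n_classes = 135, 22, 5

# Class limits: 135-156, 157-178, ..., 223-244.
classes = [(start + i * width, start + (i + 1) * width - 1)
           for i in range(n_classes)]

# Frequency of a class = number of values between its limits (inclusive).
freq = [sum(lo <= x <= hi for x in home_runs) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo}-{hi}  {f}")
print("Sum =", sum(freq))
```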



Relative frequency and percentage distributions

$$\text{relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{percentage} = (\text{relative frequency}) \times 100\%$$


Example 2.6
Calculate the relative frequency and percentage distributions for the data in Example 2.5.

Solution
Total Home Runs   Class Boundaries   Relative Frequency   Percentage
135-156           134.5-156.5        0.333                33.3
157-178           156.5-178.5        0.100                10.0
179-200           178.5-200.5        0.233                23.3
201-222           200.5-222.5        0.200                20.0
223-244           222.5-244.5        0.133                13.3
                                     Sum = 0.999          Sum = 99.9%
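The sketch below reproduces this table and shows why the sums come to 0.999 and 99.9% rather than 1 and 100%: each relative frequency is rounded to three decimals before summing. It assumes that adjacent class limits are one unit apart, so each boundary lies 0.5 beyond a limit.

```python
classes = [(135, 156), (157, 178), (179, 200), (201, 222), (223, 244)]
freq = [10, 3, 7, 6, 4]
total = sum(freq)

rows = []
for (lo, hi), f in zip(classes, freq):
    boundaries = (lo - 0.5, hi + 0.5)   # assumes limits of adjacent classes differ by 1
    rel = round(f / total, 3)           # relative frequency, rounded to 3 decimals
    rows.append((boundaries, rel, round(rel * 100, 1)))

# Sum of the rounded relative frequencies, as shown in the table.
rounded_sum = round(sum(rel for _, rel, _ in rows), 3)

for boundaries, rel, pct in rows:
    print(boundaries, rel, pct)
print("Sum =", rounded_sum)
```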


Grouped (quantitative) data can be displayed in a histogram or a polygon.



Graphing Grouped Data

Histogram

Three types of histogram
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram

A frequency histogram consists of a set of rectangles having
a) bases on the horizontal axis, with centres at the class marks and lengths equal to the
class interval sizes
b) areas proportional to the class frequencies

If the class intervals all have equal size,
the heights of the rectangles are proportional to the class frequencies;
otherwise,
the heights of the rectangles must be adjusted.
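One common way to adjust the heights for unequal class widths is to divide each frequency by its class width, so that the areas stay proportional to the frequencies. The sketch below uses hypothetical boundaries and frequencies that do not come from this chapter.

```python
# Hypothetical class boundaries and frequencies (not from the chapter's data).
boundaries = [0, 10, 20, 40]   # the last class is twice as wide as the others
freq = [5, 8, 6]

widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Adjusted height = frequency / class width.
heights = [f / w for f, w in zip(freq, widths)]

# Area of each rectangle = height x width, which recovers the frequency.
areas = [h * w for h, w in zip(heights, widths)]

print(widths, heights, areas)
```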


Procedures to draw a histogram:
1. Mark the class boundary of each interval on the horizontal axis.
2. For each class, mark the frequencies (or relative frequencies or percentages) on
the vertical axis.
3. Draw a bar for each class so that its height represents the frequency of that class.
(There are no gaps between the bars.)
4. Label the histogram.
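As a rough text-only illustration of these steps, the sketch below prints one bar per class for the frequency distribution of Example 2.5. The bars run horizontally rather than vertically, and each # stands for one team; a real histogram would be drawn with a plotting tool.

```python
# Class boundaries and frequencies from Examples 2.5 and 2.6.
boundaries = [134.5, 156.5, 178.5, 200.5, 222.5, 244.5]
freq = [10, 3, 7, 6, 4]

lines = []
for lo, hi, f in zip(boundaries, boundaries[1:], freq):
    # One bar per class; its length represents the class frequency.
    lines.append(f"{lo:5.1f}-{hi:5.1f} | {'#' * f} {f}")

print("Total home runs (class boundaries) | frequency")
print("\n".join(lines))
```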






Polygon

A line graph formed by joining the midpoints of the tops of successive bars in a
histogram.
Two more classes (with zero frequencies) are then added, one at each end, and their
midpoints are marked so that the polygon meets the horizontal axis.

Three types of polygon:
1. Frequency polygon
2. Relative frequency polygon
3. Percentage polygon


Table 2.4: Frequency distribution for Table 2.3
Total Home Runs   Class mark   Frequency
135-156           145.5        10
157-178           167.5        3
179-200           189.5        7
201-222           211.5        6
223-244           233.5        4





Figure 2.1: Frequency histogram for Table 2.4



Figure 2.2: Relative frequency histogram for Table 2.4















Figure 2.3: Frequency polygon for Table 2.4


For a very large data set, as the number of classes is increased (and the width of classes is
decreased), the frequency polygon eventually becomes a smooth curve. Such a curve is called
a frequency distribution curve or simply a frequency curve. Figure 2.4 shows the frequency
curve for a large data set with a large number of classes.



Figure 2.4: Frequency distribution curve


Single-Valued Classes
Single-valued classes are used if the observations in a data set assume only a few distinct
values; the classes are made of single values and not of intervals.
They are useful for discrete data with only a few possible values.


Example 2.7
A sample of 40 randomly selected households from a city produced the following data on the
number of vehicles owned:

5 1 2 4 1 3 1 2 1 3
2 1 2 0 2 1 0 2 1 2
1 5 2 1 1 1 2 1 2 2
1 4 1 3 1 1 1 4 1 3
Construct a frequency distribution table for these data.

Solution

Vehicles owned   Number of households (f)
0                2
1                18
2                11
3                4
4                3
5                2
                 Σf = 40
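With single-valued classes, each distinct value is its own class, so the table is a direct count of the 40 observations. A minimal sketch using collections.Counter:

```python
from collections import Counter

# Number of vehicles owned by each of the 40 households in Example 2.7.
vehicles = [5, 1, 2, 4, 1, 3, 1, 2, 1, 3,
            2, 1, 2, 0, 2, 1, 0, 2, 1, 2,
            1, 5, 2, 1, 1, 1, 2, 1, 2, 2,
            1, 4, 1, 3, 1, 1, 1, 4, 1, 3]

freq = Counter(vehicles)

# One row per distinct value, in increasing order.
table = [(value, freq[value]) for value in sorted(freq)]

for value, f in table:
    print(value, f)
```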



The frequency distribution can be displayed in a bar graph.





2.4 Shapes of Histograms
Symmetric
- Identical on both sides of its central point.





Skewed
- the tail on one side is longer than the tail on the other side.





Uniform or rectangular
- has the same frequency for each class.







(a) and (b) Symmetric frequency curves.
(c) Frequency curve skewed to the right.
(d) Frequency curve skewed to the left.



2.5 Cumulative frequency distribution

A table that presents the total number of values that fall below the upper boundary of
each class.
It is constructed for quantitative data only.
$$\text{cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{sum of all frequencies in the data set}}$$
$$\text{cumulative percentage} = (\text{cumulative relative frequency}) \times 100\%$$


Example 2.8:
Using the frequency distribution of Table 2.4, reproduced here, prepare a cumulative
frequency distribution for the total home runs.

Total Home Runs Frequency
135-156 10
157-178 3
179-200 7
201-222 6
223-244 4





Table 2.5: Cumulative frequency distribution for Table 2.4
Total Home Runs Cumulative frequency
<156.5 10
<178.5 10+3=13
<200.5 10+3+7=20
<222.5 10+3+7+6=26
<244.5 10+3+7+6+4=30


Table 2.6: Cumulative relative frequency and cumulative percentage for Table 2.4
Total Home Runs   Cumulative relative frequency   Cumulative percentage
<156.5            10/30=0.333                     33.3
<178.5            13/30=0.433                     43.3
<200.5            20/30=0.667                     66.7
<222.5            26/30=0.867                     86.7
<244.5            30/30=1.000                     100.0

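The running totals in the two tables above can be generated with itertools.accumulate. This is a sketch of the arithmetic only, with illustrative variable names.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Table 2.4.
upper_boundaries = [156.5, 178.5, 200.5, 222.5, 244.5]
freq = [10, 3, 7, 6, 4]

cum_freq = list(accumulate(freq))            # running totals of the frequencies
total = cum_freq[-1]                         # sum of all frequencies
cum_rel = [cf / total for cf in cum_freq]    # cumulative relative frequencies
cum_pct = [r * 100 for r in cum_rel]         # cumulative percentages

for ub, cf, r, p in zip(upper_boundaries, cum_freq, cum_rel, cum_pct):
    print(f"<{ub}  {cf:2d}  {r:.3f}  {p:.1f}")
```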


Ogive / Cumulative frequency curve
A curve drawn for the cumulative frequency distribution by joining with straight lines the
dots marked above the upper boundaries of classes at heights equal to the cumulative
frequencies of respective classes.

Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper
boundary of the last class.
2. If relative cumulative frequency is used in place of cumulative frequency, the graph
is called a relative cumulative frequency curve or percentage ogive.


Figure 2.5: Ogive for the cumulative frequency distribution of Table 2.4






2.6 Stem-and-leaf displays

Each value is divided into two portions - a stem and a leaf. The leaves for each stem are
shown separately in a display.

Note:
1. It is constructed only for quantitative data.
2. It has an advantage over a frequency distribution: we do not lose information on
individual observations.

Example 2.9
The following are the scores of 30 college students on a statistics test.

75 69 83 52 72 84 80 81 77 96
61 64 65 76 71 79 86 87 71 79
72 87 68 92 93 50 57 95 92 98
Construct a stem-and-leaf display.




Figure 2.6: Stem-and-leaf display of test scores

Figure 2.7: Ranked stem-and-leaf display of test scores
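A ranked stem-and-leaf display for the scores of Example 2.9 can be sketched as follows, taking the tens digit as the stem and the units digit as the leaf. The layout of the printed lines is an assumption and may differ from the figures.

```python
from collections import defaultdict

# Test scores of the 30 students in Example 2.9.
scores = [75, 69, 83, 52, 72, 84, 80, 81, 77, 96,
          61, 64, 65, 76, 71, 79, 86, 87, 71, 79,
          72, 87, 68, 92, 93, 50, 57, 95, 92, 98]

stems = defaultdict(list)
for score in scores:
    stems[score // 10].append(score % 10)   # stem = tens digit, leaf = units digit

display = []
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in sorted(stems[stem]))  # ranked leaves
    display.append(f"{stem} | {leaves}")

print("\n".join(display))
```

Dropping the call to sorted on the leaves gives the unranked display, with leaves in the order the scores were recorded.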








Example 2.10
The following data are the monthly rents paid by a sample of 30 households selected from a
city.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280
Construct a stem-and-leaf display for these data.

Solution










































Example 2.11
The following stem-and-leaf display is prepared for the number of hours that 25 students
spent working on a computer during the past month.

0 6
1 1 7 9
2 2 6
3 2 4 7 8
4 1 5 6 9 9
5 3 6 8
6 2 4 4 5 7
7
8 5 6

Prepare a new stem-and-leaf display by grouping the stems.
























Note:
The leaves for each stem of a group are separated by an asterisk (*).
If the stem does not contain a leaf, indicate the leaf by two consecutive asterisks.








Dotplot display
Displays the data of a sample by representing each piece of data with a dot positioned
along a scale (horizontal scale or vertical scale).
The frequency of the values is represented along the other scale.

Example 2.12
A sample of 19 exam grades was randomly selected from a large class:
76 74 82 96 66 76 78 72 52 68
86 84 62 76 78 92 82 74 88
Construct a dotplot of these data.
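A text approximation of the dotplot can be sketched by printing one dot per observation next to each grade. Only the grades that occur are listed, so the spacing of the scale is not preserved as it would be in a drawn dotplot.

```python
from collections import Counter

# The 19 exam grades from the example above.
grades = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68,
          86, 84, 62, 76, 78, 92, 82, 74, 88]

counts = Counter(grades)

rows = []
for grade in sorted(counts):
    # One dot per occurrence of the grade.
    rows.append(f"{grade:3d} | {'.' * counts[grade]}")

print("\n".join(rows))
```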
