ENGDAT1 - Module2 (Review) PDF
Interval Size
$$ii = \frac{\left(LO + \frac{Ic}{2}\right) - \left(SO - \frac{Ic}{2}\right)}{Kr}$$
where
ii = the initial estimate of the interval size
LO = the largest observed value
Ic = the smallest increment of change in data
SO = smallest observed value
Kr = the K value obtained from Sturges’ rule, rounded upwards
Excess Space
EXCESS = Space available – Required Space, where
Space available = Kr*I (I = the actual interval size, that is, ii rounded up)
Required Space = [LO + Ic/2] – [SO – Ic/2]
Table Construction
With the interval size and the number of intervals known, we can construct the frequency distribution table by first dividing the excess between the lowest and the highest ends of the data.
GRAPHICAL METHODS FOR DESCRIBING QUANTITATIVE DATA
Frequency Histogram – a bar graph representation of a frequency distribution table. The class boundaries (CB) are marked along the horizontal axis and the frequencies along the vertical axis. Each interval is drawn as a bar whose sides are set by the class boundaries and whose height is the corresponding frequency.
Frequency Polygon – uses class midpoints (CM) to represent the intervals. The class midpoint is computed as the average of the lower class limit (LCL) and the upper class limit (UCL). Class limits are the visible limits of the intervals in the frequency distribution table.
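A minimal Python sketch of how the class midpoints (used by the frequency polygon) and the class boundaries (used by the histogram) follow from the class limits. It assumes whole-number data, so the smallest increment of change is Ic = 1 and each boundary lies Ic/2 = 0.5 outside its visible limit. The function name `midpoints_and_boundaries` is hypothetical, and the limits are those of the rounding-down Juan Auto Repair table.

```python
def midpoints_and_boundaries(limits, ic=1):
    """Return (class midpoints, class boundaries) for a list of (LCL, UCL) pairs.

    ic is the smallest increment of change in the data (1 for whole numbers).
    """
    # CM = (LCL + UCL) / 2
    midpoints = [(lcl + ucl) / 2 for lcl, ucl in limits]
    # Boundaries sit half an increment outside the visible limits.
    boundaries = [(lcl - ic / 2, ucl + ic / 2) for lcl, ucl in limits]
    return midpoints, boundaries


# Class limits of the Juan Auto Repair table (rounding-down version).
limits = [(50, 58), (59, 67), (68, 76), (77, 85), (86, 94), (95, 103), (104, 112)]
cm, cb = midpoints_and_boundaries(limits)
print(cm)      # [54.0, 63.0, 72.0, 81.0, 90.0, 99.0, 108.0]
print(cb[0])   # (49.5, 58.5)
```

Adjacent boundaries coincide (58.5 closes the first class and opens the second), which is why histogram bars touch.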
Example: Juan Auto Repair
The manager of Juan Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken, and the costs of parts, rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution
Guidelines for Selecting Number of Classes
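Sturges' rule is commonly written as K = 1 + 3.322 log n (log base 10), which is approximately 1 + log2(n). Assuming that form, this short sketch computes K and rounds it upward to get Kr; for the n = 50 Juan Auto Repair invoices it gives the Kr = 7 used in the interval-size calculation. The function name `sturges_k` is hypothetical.

```python
import math


def sturges_k(n):
    """Number of classes by Sturges' rule, K = 1 + 3.322 log10(n), and K rounded upward (Kr)."""
    k = 1 + 3.322 * math.log10(n)
    return k, math.ceil(k)


k, kr = sturges_k(50)
print(round(k, 3), kr)   # 6.644 7
```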
Interval Size or Class Width
Initial estimate of class size = [(109 + ½) – (52 – ½)]/7 = (109.5 – 51.5)/7 = 8.2857 ≈ 9 = I (actual class size)
Note: The initial estimate of class size is always rounded up; how many decimal places are kept depends on the type of data that you have obtained:
If the data are integers, I = 9.
If the data are in one decimal place, I = 8.3.
If the data are in two decimal places, I = 8.29.
If the data are in three decimal places, I = 8.286, and so on.
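The rounding rule can be sketched with exact fractions so that floating-point error never pushes a value across a ceiling. This assumes, as the note does, that the estimate is always rounded up to the stated number of decimal places; the helper `round_up` is hypothetical. Like the note, it reuses the integer-data estimate 58/7 (about 8.2857) only to show how the kept precision changes.

```python
import math
from fractions import Fraction


def round_up(value, decimals=0):
    """Round a non-negative value UP to the given number of decimal places, exactly."""
    scale = 10 ** decimals
    return Fraction(math.ceil(Fraction(value) * scale), scale)


# Initial estimate for the Juan Auto Repair data: [(109 + 1/2) - (52 - 1/2)] / 7 = 58/7
ii = ((Fraction(109) + Fraction(1, 2)) - (Fraction(52) - Fraction(1, 2))) / 7

for d in range(4):
    print(d, float(round_up(ii, d)))   # 0 9.0 / 1 8.3 / 2 8.29 / 3 8.286
```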
Computation of Excess Space
Excess Space = Space available – Required Space
where
Space available = Kr*I
Required Space = [LO + Ic/2] – [SO – Ic/2]
Therefore, in our example,
Space available = 7*9 = 63
Required Space = [109 + ½] – [52 – ½] = 58
Excess Space = 63 – 58 = 5
Dividing the Excess Space
Divide the excess space between the lowest and highest ends of the data set; that is, divide the excess space into 2.
In our example, the excess space is 5.
It is an odd number, so it cannot be divided exactly into 2 without producing a value with one decimal place, which is 2.5.
Since our data are integers (whole numbers), the two parts of the excess space must also be integers, so we round 2.5 either down or up.
In case of rounding down, we subtract 2 from the lowest end (51.5 – 2 = 49.5) and add 3 to the highest end (109.5 + 3 = 112.5).
In case of rounding up, we subtract 3 from the lowest end (51.5 – 3 = 48.5) and add 2 to the highest end (109.5 + 2 = 111.5).
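A minimal sketch of the whole excess-space construction, from required space to the list of class limits. It assumes whole-number data (Ic = 1), so the first lower class limit sits half a unit above the lowered lowest end. `class_limits` and its `rounding` parameter are hypothetical names.

```python
def class_limits(so, lo, kr, width, rounding="down"):
    """Class limits (LCL, UCL) from the excess-space method, for whole-number data (Ic = 1)."""
    required = (lo + 0.5) - (so - 0.5)       # Required Space = [LO + Ic/2] - [SO - Ic/2]
    excess = int(kr * width - required)      # Excess = Space available (Kr * I) - Required Space
    if excess < 0:
        raise ValueError("Kr * I is smaller than the required space")
    if rounding == "down":
        below = excess // 2                  # odd excess: smaller share goes below (5 -> 2)
    else:
        below = excess - excess // 2         # odd excess: larger share goes below (5 -> 3)
    # The lowest end moves from SO - 0.5 to SO - 0.5 - below, so the first LCL is SO - below;
    # the rest of the excess (excess - below) ends up above LO + 0.5.
    first_lcl = so - below
    return [(first_lcl + k * width, first_lcl + k * width + width - 1) for k in range(kr)]


print(class_limits(52, 109, kr=7, width=9, rounding="down"))
# [(50, 58), (59, 67), (68, 76), (77, 85), (86, 94), (95, 103), (104, 112)]
print(class_limits(52, 109, kr=7, width=9, rounding="up"))
# [(49, 57), (58, 66), (67, 75), (76, 84), (85, 93), (94, 102), (103, 111)]
```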
Example: Juan Auto Repair
Frequency Distribution in the case of rounding down
Cost ($)   Frequency
50-58      2
59-67      7
68-76      17
77-85      10
86-94      4
95-103     6
104-112    4
Total      50
Frequency Distribution in the case of rounding up
Cost ($)   Frequency
49-57      2
58-66      6
67-75      17
76-84      10
85-93      5
94-102     6
103-111    4
Total      50
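To check both tables, this short sketch tallies the 50 invoice costs into each set of class limits, counting a value in a class when LCL ≤ value ≤ UCL (whole-number limits leave no gaps). The helper `frequencies` is a hypothetical name.

```python
data = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def frequencies(values, limits):
    """Count how many values fall within each (LCL, UCL) class, limits inclusive."""
    return [sum(lcl <= v <= ucl for v in values) for lcl, ucl in limits]


down = [(50 + 9 * k, 58 + 9 * k) for k in range(7)]   # 50-58, 59-67, ..., 104-112
up = [(49 + 9 * k, 57 + 9 * k) for k in range(7)]     # 49-57, 58-66, ..., 103-111

print(frequencies(data, down))   # [2, 7, 17, 10, 4, 6, 4]
print(frequencies(data, up))     # [2, 6, 17, 10, 5, 6, 4]
```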
Relative Frequency and Percent Frequency Distributions
Example: Juan Auto Repair
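The relative frequency of a class is f/n and its percent frequency is 100·f/n. This sketch applies both to the rounding-down table (the choice of table is an assumption here) and uses `Fraction` so the proportions stay exact and sum to exactly 1.

```python
from fractions import Fraction

classes = ["50-58", "59-67", "68-76", "77-85", "86-94", "95-103", "104-112"]
freq = [2, 7, 17, 10, 4, 6, 4]          # rounding-down table
n = sum(freq)                            # 50 invoices

relative = [Fraction(f, n) for f in freq]   # relative frequency = f / n
percent = [100 * r for r in relative]       # percent frequency = 100 * f / n

for c, f, r, p in zip(classes, freq, relative, percent):
    print(f"{c:>8} {f:>3} {float(r):>6.2f} {float(p):>6.1f}%")
```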
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Cumulative Distributions
Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
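A sketch of the three cumulative distributions for the rounding-down Juan Auto Repair table, built with `itertools.accumulate` (a running total of the class frequencies). The upper limits listed are that table's upper class limits.

```python
from itertools import accumulate

upper_limits = [58, 67, 76, 85, 94, 103, 112]   # UCLs of the rounding-down table
freq = [2, 7, 17, 10, 4, 6, 4]
n = sum(freq)

cum_freq = list(accumulate(freq))            # number of items <= each upper limit
cum_rel = [c / n for c in cum_freq]          # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]    # cumulative percent frequency

for ucl, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {ucl:>3}: {c:>2}  {r:.2f}  {p:.0f}%")
```

The last cumulative frequency always equals n, so the last cumulative relative frequency is 1 and the last cumulative percent is 100%.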
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The cumulative value (one of the above) of each class is plotted as a point above the upper class boundary of that class.
The plotted points are connected by straight lines.
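A sketch of the points an ogive would connect, using the cumulative percent frequencies of the rounding-down Juan Auto Repair table. It assumes the common convention of plotting each class at its upper class boundary and of adding a starting point of 0 at the lower boundary of the first class.

```python
from itertools import accumulate

lower_boundary = 49.5                       # lower boundary of the first class
upper_boundaries = [58.5, 67.5, 76.5, 85.5, 94.5, 103.5, 112.5]
freq = [2, 7, 17, 10, 4, 6, 4]
n = sum(freq)

# Start the ogive at zero on the lowest boundary, then add one point per class.
points = [(lower_boundary, 0.0)] + [
    (ub, 100 * c / n) for ub, c in zip(upper_boundaries, accumulate(freq))
]
print(points)
```

If matplotlib is installed, `plt.plot(*zip(*points), marker="o")` (after `import matplotlib.pyplot as plt`) joins these points with straight lines.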
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
One such technique is the stem-and-leaf display.
A stem-and-leaf display shows both the rank order and the shape of the distribution of the data.
It is similar to a histogram turned on its side, but it has the advantage of showing the actual data values.
Stem-and-Leaf Display
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit of each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
[Figure: example stem-and-leaf display; recovered rows 8 | 5 7 and 9 | 3 6 7 8; label "Leaf Units"]
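A sketch that builds a stem-and-leaf display for the 50 Juan Auto Repair costs, assuming a leaf unit of 1: the last digit is the leaf and the remaining digits form the stem, so `10 | 9` stands for 109. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

data = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Return display lines: stem = all digits except the last, leaf = last digit (leaf unit 1)."""
    leaves = defaultdict(list)
    for v in sorted(values):                      # sorting first puts each stem's leaves in rank order
        leaves[v // 10].append(v % 10)
    width = len(str(max(leaves)))                 # right-align one- and two-digit stems
    stems = range(min(leaves), max(leaves) + 1)   # keep any empty stems in the display
    return [f"{s:>{width}} | {' '.join(str(d) for d in leaves[s])}" for s in stems]


lines = stem_and_leaf(data)
print("\n".join(lines))
```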
MEAN
$$\bar{X} = \frac{\sum X}{n}$$
where X = the raw data or observations
n = the number of observations or values
Grouped Data Mean
$$\bar{X} = \frac{\sum fx}{n}$$
where f = frequency of each class interval
x = class midpoints
n = total number of observations
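A sketch of both means for the Juan Auto Repair data. The grouped version uses the class midpoints and frequencies of the rounding-down table (an assumed choice), so it only approximates the raw-data mean: every value in a class is treated as if it sat at the class midpoint.

```python
data = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

n = len(data)
mean = sum(data) / n                 # X-bar = (sum of X) / n
print(mean)                          # 78.98

# Grouped: class midpoints x and frequencies f of the rounding-down table.
midpoints = [54, 63, 72, 81, 90, 99, 108]
freq = [2, 7, 17, 10, 4, 6, 4]
grouped_mean = sum(f * x for f, x in zip(freq, midpoints)) / sum(freq)   # (sum of fx) / n
print(grouped_mean)                  # 79.38
```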
MEDIAN – the middle value in arrayed data (data that have been arranged in ascending order). When the number of observations is even, the median is the average of the two middle values.
Grouped Median
$$\tilde{X} = LCB_{med} + \left[\frac{\frac{n}{2} - F}{f_{med}}\right] I$$
where:
LCBmed = lower class boundary of the median class
n = number of observations
F = cumulative frequency of the class before the median class
fmed = frequency of the median class
I = class interval size
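A sketch of the grouped-median formula applied to the rounding-down Juan Auto Repair table (lower class boundaries 49.5, 58.5, and so on, with I = 9). There n/2 = 25, the median class is 68-76 with F = 9 and fmed = 17, so the median is 67.5 + (16/17)(9), about 75.97. `grouped_median` is a hypothetical helper.

```python
def grouped_median(lcb, width, freq):
    """Median of grouped data: LCB_med + [(n/2 - F) / f_med] * I.

    lcb   -- lower class boundaries, lowest class first
    width -- class interval size I
    freq  -- class frequencies
    """
    n = sum(freq)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0                                   # F: cumulative frequency before the current class
    for boundary, f in zip(lcb, freq):
        if cumulative + f >= n / 2:                  # first class whose cumulative frequency reaches n/2
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f


lcb = [49.5, 58.5, 67.5, 76.5, 85.5, 94.5, 103.5]
freq = [2, 7, 17, 10, 4, 6, 4]
print(round(grouped_median(lcb, 9, freq), 2))   # 75.97
```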
MODE – the value that occurs most often in a data set.
Grouped Mode formula
$$M = LCB_{mod} + \left[\frac{\Delta f_1}{\Delta f_1 + \Delta f_2}\right] I$$
where:
LCBmod = lower class boundary of the modal class
Δf1 = the difference between the frequency of the modal class and the class immediately before it
Δf2 = the difference between the frequency of the modal class and the class immediately after it
I = size of the class interval
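A sketch of the grouped-mode formula on the rounding-down Juan Auto Repair table: the modal class is 68-76 (f = 17), so Δf1 = 17 − 7 = 10 and Δf2 = 17 − 10 = 7, giving 67.5 + (10/17)(9), about 72.79. Treating a missing neighboring class (when the modal class is first or last) as frequency 0 is an assumption of this sketch, as is the helper name `grouped_mode`.

```python
def grouped_mode(lcb, width, freq):
    """Mode of grouped data: LCB_mod + [df1 / (df1 + df2)] * I."""
    m = max(range(len(freq)), key=freq.__getitem__)      # index of the modal class (first one if tied)
    before = freq[m - 1] if m > 0 else 0                 # class immediately before (assumed 0 if none)
    after = freq[m + 1] if m + 1 < len(freq) else 0      # class immediately after (assumed 0 if none)
    d1, d2 = freq[m] - before, freq[m] - after
    if d1 + d2 == 0:
        raise ValueError("modal class has the same frequency as both neighbors")
    return lcb[m] + d1 / (d1 + d2) * width


lcb = [49.5, 58.5, 67.5, 76.5, 85.5, 94.5, 103.5]
freq = [2, 7, 17, 10, 4, 6, 4]
print(round(grouped_mode(lcb, 9, freq), 2))   # 72.79
```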
RANGE – the simplest measure of spread or variability. It is the difference between the highest score and the lowest score in any given set of data or distribution.
For data grouped into intervals, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.
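As a worked example with the Juan Auto Repair data: the highest and lowest observed costs are 109 and 52. For the grouped case, assuming the rounding-down table, the class boundaries run from 49.5 to 112.5.

$$\text{Range} = 109 - 52 = 57$$

$$\text{Range}_{\text{grouped}} = 112.5 - 49.5 = 63$$

The grouped range equals the space available, Kr × I = 7 × 9 = 63, because the classes span exactly that space.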
Standard Deviation – the most useful measure of variability. It is a special form of average deviation from the mean and is affected by every individual value in the distribution.
Ungrouped Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
where X = the individual values of all the items
μ = the population mean
N = the population size
Grouped Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum f (X - \mu)^2}{N}}$$
where X = the class midpoints
f = frequency of each class interval
μ = the population mean
N = the population size
Ungrouped Sample Standard Deviation
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where X = the individual values of all the items
X̄ = the sample mean
n = sample size
Grouped Sample Standard Deviation
$$s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$$
where X = the class midpoints
f = frequency of each class interval
X̄ = the sample mean
n = sample size
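A sketch of the four standard-deviation formulas for the Juan Auto Repair costs. Purely for illustration, it treats the 50 invoices as a population when computing σ and as a sample when computing s; the grouped versions use the rounding-down table. The helpers `sd` and `grouped_sd` are hypothetical, and the ungrouped results are cross-checked against Python's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics

data = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def sd(values, sample=False):
    """Ungrouped standard deviation: divide by N (population) or by n - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)          # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))


def grouped_sd(midpoints, freq, sample=False):
    """Grouped standard deviation: X = class midpoints, each squared deviation weighted by f."""
    n = sum(freq)
    mean = sum(f * x for f, x in zip(freq, midpoints)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freq, midpoints))
    return math.sqrt(ss / (n - 1 if sample else n))


midpoints = [54, 63, 72, 81, 90, 99, 108]   # rounding-down table
freq = [2, 7, 17, 10, 4, 6, 4]

print(round(sd(data), 2), round(sd(data, sample=True), 2))
print(round(grouped_sd(midpoints, freq), 2), round(grouped_sd(midpoints, freq, sample=True), 2))

# Cross-check the ungrouped versions against the standard library.
assert math.isclose(sd(data), statistics.pstdev(data))
assert math.isclose(sd(data, sample=True), statistics.stdev(data))
```

Dividing by n − 1 instead of N makes s slightly larger than σ for the same values; the difference shrinks as the sample grows.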