Lecture 1 PDF
(MTH60403/ENG 2123)
Distinguish between discrete and continuous data
Construct frequency and relative frequency tables for
grouped and ungrouped discrete data
Determine class boundaries, class intervals and central
values for discrete and continuous data
Construct a histogram and a frequency polygon
Determine the mean, median and mode of grouped and
ungrouped data
Determine the range, variance and standard deviation of
discrete data
Measure dispersion of data using the normal and
standard normal curves.
Topics to be covered
Introduction
Arrangement of data
Histograms
Measure of central tendency
Dispersion
Frequency polygons
Frequency curves
Normal distribution curve
Standardized normal curve
Introduction
Statistics as a discipline is the development and
application of methods to collect, analyze and interpret
data.
Arrangement of data
A set of data:
28 31 29 27 30 29 29 26 30 28
28 29 27 26 32 28 32 31 25 30
27 30 29 30 28 29 31 27 28 28
Once the data is in ascending order:
25 26 26 27 27 27 27 28 28 28
28 28 28 28 29 29 29 29 29 29
30 30 30 30 30 31 31 31 32 32
It can be entered into a table. The number of occasions on
which any particular value occurs is called the frequency,
denoted by f.
Value    Number of times (f)
25       1
26       2
27       4
28       7
29       6
30       5
31       3
32       2
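The tally of values into frequencies can be sketched in Python. This is a minimal illustration, not part of the original slides; the names `data` and `frequency` are chosen here for the example.

```python
from collections import Counter

# The 30 raw values from the data set, in the order they were recorded.
data = [
    28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
    28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
    27, 30, 29, 30, 28, 29, 31, 27, 28, 28,
]

# Counter tallies how many times each value occurs (its frequency f).
frequency = Counter(data)

# Print the frequency table in ascending order of value.
for value in sorted(frequency):
    print(value, frequency[value])
```

The frequencies should add up to the number of observations, which gives a quick check on the tally.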
Grouped Data
If the range of values of the variable is large, it is often
helpful to consider these values arranged in regular
groups or classes.
Grouping with Continuous Data
The lengths (in mm) of 40 spindles were measured as follows:
20.90 20.57 20.86 20.74 20.82 20.63 20.53 20.89 20.75 20.65
20.71 21.03 20.72 20.41 20.94 20.75 20.79 20.65 21.08 20.89
20.50 20.88 20.97 20.78 20.61 20.92 21.07 21.16 20.80 20.77
20.82 20.72 20.60 20.90 20.86 20.68 20.75 20.88 20.56 20.94
Lowest value = 20.41
Highest value = 21.16
So form classes from 20.40 to 21.20 at 0.10 intervals.
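One way to sort the 40 lengths into these classes is sketched below in Python. It assumes each class includes its lower end and excludes its upper end (for example 20.40 up to, but not including, 20.50); that convention and the names `START`, `WIDTH` and `N_CLASSES` are assumptions made for this illustration.

```python
from collections import Counter

# The 40 spindle lengths in mm.
lengths = [
    20.90, 20.57, 20.86, 20.74, 20.82, 20.63, 20.53, 20.89, 20.75, 20.65,
    20.71, 21.03, 20.72, 20.41, 20.94, 20.75, 20.79, 20.65, 21.08, 20.89,
    20.50, 20.88, 20.97, 20.78, 20.61, 20.92, 21.07, 21.16, 20.80, 20.77,
    20.82, 20.72, 20.60, 20.90, 20.86, 20.68, 20.75, 20.88, 20.56, 20.94,
]

# Work in whole hundredths of a millimetre so that class membership is
# decided with exact integer arithmetic rather than floating point.
hundredths = [round(x * 100) for x in lengths]

START, WIDTH, N_CLASSES = 2040, 10, 8  # 20.40 mm, 0.10 mm wide, 8 classes

# Class index 0 covers 20.40 to under 20.50, index 1 covers 20.50 to under 20.60, ...
tally = Counter((h - START) // WIDTH for h in hundredths)
frequencies = [tally.get(k, 0) for k in range(N_CLASSES)]

for k, f in enumerate(frequencies):
    lower = (START + k * WIDTH) / 100
    print(f"{lower:.2f} to under {lower + 0.10:.2f}: {f}")
```

Converting to integer hundredths first avoids values such as 20.80 landing in the wrong class because of binary rounding.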
With continuous data the group boundaries are given
to the same number of significant figures or decimal
places as the data.
[Table: the lengths (in mm) of the 40 spindles arranged in
classes of width 0.10 from 20.40 to 21.20]
Relative Frequency
If the frequency of any one group is divided by the
sum of the frequencies the ratio is called the relative
frequency of that group. Relative frequencies can be
expressed as percentages:
\[ \frac{1}{40} \times 100 = 2.5\% \qquad \frac{9}{40} \times 100 = 22.5\% \]
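The same calculation for every class can be sketched in Python. The class frequencies below are not quoted from the slides: they are an assumed tally of the 40 spindle lengths into classes of width 0.10 starting at 20.40, with each class including its lower end.

```python
# Assumed class frequencies for the 40 spindle lengths (see the note above).
frequencies = [1, 4, 6, 10, 9, 6, 3, 1]

total = sum(frequencies)

# Relative frequency of each class, expressed as a percentage of the total.
relative_percent = [100 * f / total for f in frequencies]

for f, pct in zip(frequencies, relative_percent):
    print(f"{f:2d} / {total} = {pct}%")
```

The percentages should sum to 100, which is a useful check on the table.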
Rounding off Data
If the value 21.7 is expressed to two significant
figures, the result is rounded up to 22. Similarly, 21.4
is rounded down to 21.
To maintain consistency of group boundaries, middle
values will always be rounded up, so that 21.5 is
rounded up to 22 and 42.5 is rounded up to 43.
Therefore, when a result is quoted to two significant
figures as 37 on a continuous scale, this includes all
possible values from 36.5 up to, but not including, 37.5:
\[ 36.5 \le x < 37.5 \]
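The "middle values always round up" rule can be sketched in Python with the standard `decimal` module. This matters in practice because Python's built-in `round` sends exact halves to the nearest even number, so `round(42.5)` gives 42 rather than 43. The helper name `round_half_up` is chosen for this example, and it is intended for positive values only.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str) -> int:
    """Round a positive decimal string to the nearest integer, with .5 rounded up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for text in ["21.7", "21.4", "21.5", "42.5", "36.5", "37.49"]:
    print(text, "->", round_half_up(text))
```

Values are passed as strings so that they are read as exact decimals instead of binary floating-point approximations.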
Class Boundaries
A class or group boundary lies midway between the data
values. For example, for data given to one decimal place in
the class or group labelled:
7.1 – 7.3
(a) The class values 7.1 and 7.3 are the lower and upper limits
of the class and their difference gives the class width.
(b) The class boundaries are 0.05 below the lower class limit
and 0.05 above the upper class limit, that is, 7.05 and 7.35.
(c) The class interval is the difference between the upper and
lower class boundaries, here 7.35 − 7.05 = 0.30.
(d) The central value (or mid-value) of the class interval lies
midway between the lower and upper class boundaries, that is,
one half of their sum: (7.05 + 7.35)/2 = 7.20.
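These quantities can be computed for the class 7.1 – 7.3 with a short Python sketch. It assumes the data are recorded to one decimal place (so each boundary sits 0.05 beyond the limit) and takes the central value as the midpoint of the two boundaries; the function name `class_summary` and its `precision` parameter are chosen for this example.

```python
from decimal import Decimal


def class_summary(lower_limit: str, upper_limit: str, precision: str = "0.1"):
    """Return boundaries, interval and central value for one class."""
    lo, hi = Decimal(lower_limit), Decimal(upper_limit)
    half_step = Decimal(precision) / 2  # 0.05 for data given to 1 decimal place

    lower_boundary = lo - half_step
    upper_boundary = hi + half_step
    return {
        "lower_boundary": lower_boundary,
        "upper_boundary": upper_boundary,
        "class_interval": upper_boundary - lower_boundary,
        "central_value": (lower_boundary + upper_boundary) / 2,
    }


summary = class_summary("7.1", "7.3")
for name, value in summary.items():
    print(name, value)
```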
These terms can be summarized using the class 7.1 – 7.3
(inclusive) as an example:
[Diagram: lower class boundary 7.05, lower class limit 7.1,
central value 7.2, upper class limit 7.3, upper class boundary
7.35; class interval 0.30]
Histograms
Frequency histogram
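A histogram is a graphical representation of a frequency
distribution in which vertical rectangular blocks are drawn
so that:
(a) the centre of the base indicates the central value of
the class;
(b) the area of the rectangle represents the class frequency.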
For example, the measurement of the lengths of 50
brass rods gave the following frequency distribution:
This gives rise to the histogram:
Measure of central tendency
Mean
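The arithmetic mean \(\bar{x}\) of a set of n observations is
their average:
\[ \text{mean} = \frac{\text{sum of observations}}{\text{number of observations}}, \quad \text{that is} \quad \bar{x} = \frac{\sum x}{n} \]
When each value x occurs with frequency f:
\[ \bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f} \]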
For the 30 values tabulated earlier (25 to 32):
\[ \bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f} = \frac{862}{30} = 28.73 \]
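As a check on the arithmetic, the mean of a frequency distribution can be recomputed in a few lines of Python. This sketch uses the frequency table of the 30 values from 25 to 32; the variable names are chosen for this example.

```python
# Frequency table: value x -> frequency f.
frequency = {25: 1, 26: 2, 27: 4, 28: 7, 29: 6, 30: 5, 31: 3, 32: 2}

sum_xf = sum(x * f for x, f in frequency.items())  # sum of x times f
n = sum(frequency.values())                        # sum of f

mean = sum_xf / n
print(sum_xf, n, round(mean, 2))
```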
Coding for calculating the mean
A good deal of tedious work can be avoided by coding with
a false mean. It involves converting the x-values into
simpler values for the calculation and then converting
back again for the final result.
[Table: values x with frequencies f and coded values x_c, with
the false mean marked]
\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333 \text{ to 4 dp} \]
Decoding for calculating the mean
Decoding requires the coding process to be reversed.
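\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333 \text{ to 4 dp, where } x_c = \frac{x - 30.8}{0.2} \]
Therefore:
\[ \bar{x} = (-0.0333) \times 0.2 + 30.8 = 30.79 \text{ to 2 dp} \]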
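Coding and decoding can be sketched in Python. Because the table behind the slide's totals is not reproduced here, the sketch applies the method to the earlier frequency table of 30 values, with a false mean of 28 and a unit of 1; both are illustrative choices, not values from the slides. The last lines decode the slide's own totals (a coded sum of −2.0 over 60 readings, false mean 30.8, unit 0.2).

```python
from fractions import Fraction

# Frequency table of the 30 values: value x -> frequency f.
frequency = {25: 1, 26: 2, 27: 4, 28: 7, 29: 6, 30: 5, 31: 3, 32: 2}

FALSE_MEAN = 28  # illustrative choice near the middle of the range
UNIT = 1         # illustrative choice: spacing between successive x-values

n = sum(frequency.values())

# Coding: x_c = (x - false mean) / unit, then average the coded values.
coded_sum = sum(Fraction(x - FALSE_MEAN, UNIT) * f for x, f in frequency.items())
coded_mean = coded_sum / n

# Decoding: reverse the coding to recover the true mean.
decoded_mean = coded_mean * UNIT + FALSE_MEAN

# The direct calculation, for comparison.
direct_mean = Fraction(sum(x * f for x, f in frequency.items()), n)
print(coded_sum, coded_mean, decoded_mean, direct_mean)

# Decoding the totals quoted on the slide.
slide_mean = (-2.0 / 60) * 0.2 + 30.8
print(round(slide_mean, 2))
```

Exact fractions are used for the first part so that the decoded and direct means can be compared without rounding error.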
Coding with a grouped frequency distribution
This procedure is similar where the false mean is the
centre value of a convenient class.
\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22 \]
Decoding with a grouped frequency distribution
Decoding again requires the coding process to be reversed.
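\[ \bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22 \text{, where } x_c = \frac{x_m - 2.30}{0.03} \]
Therefore:
\[ \bar{x}_m = (0.22) \times 0.03 + 2.30 = 2.3066 \text{ to 4 dp} \]
giving:
\[ \bar{x} = 2.307 \text{ to 3 dp} \]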
Mode of a set of data
The mode of a set of data is that value of the variable
that occurs most often.
Median of a set of data
The median is the value of the middle datum when the
data is arranged in ascending or descending order.
If there is an even number of values the median is the
average of the two middle data.
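For ungrouped data, the mode and median can be found with Python's standard `statistics` module. The sketch below applies them to the 30 values used earlier in the lecture; since there is an even number of values, the median is the average of the 15th and 16th values in order.

```python
import statistics

# The 30 raw values from the earlier data set.
data = [
    28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
    28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
    27, 30, 29, 30, 28, 29, 31, 27, 28, 28,
]

mode = statistics.mode(data)      # value that occurs most often
median = statistics.median(data)  # middle value of the sorted data

print("mode:", mode)
print("median:", median)
```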
Median with grouped frequency distribution
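In the case of grouped data the median divides the population
of the block of the histogram that contains it (here the
largest block) into two parts A and B.
In this frequency distribution the class frequencies are
6, 12, 15, A + B, 13, 9, 5, where A + B = 20.
The median has half of the total frequency on each side, so
6 + 12 + 15 + A = B + 13 + 9 + 5, that is, 33 + A = B + 27.
With A + B = 20 this gives A = 7 and B = 13.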
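The split of the median class into A and B can be sketched in Python by accumulating class frequencies until half of the total is reached. The class frequencies are those of the distribution above; the variable names are chosen for this example.

```python
# Class frequencies in order; the fourth class (20) is the one to be split.
frequencies = [6, 12, 15, 20, 13, 9, 5]

total = sum(frequencies)
half = total / 2  # the median has this much frequency on each side

cumulative = 0
for index, f in enumerate(frequencies):
    if cumulative + f >= half:
        A = half - cumulative  # part of the median class below the median
        B = f - A              # part of the median class above the median
        median_class = index
        break
    cumulative += f

print(median_class, A, B)
```

Locating the median value itself would also need the class boundaries, which this sketch does not assume.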
Mean, Mode, Median
How do we know which is the correct measure of location
to use in a given situation?
For example, the set 26, 27, 28, 29, 30 has a mean of 28,
and the set 5, 19, 20, 36, 60 also has a mean of 28.
The mean alone does not show that the second set is far
more widely spread than the first.
Dispersion
Range
The simplest measure of dispersion is the range – the
difference between the highest and the lowest values.
Standard Deviation
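The standard deviation is the most widely used measure of
dispersion. The variance of a set of data is the average of
the square of the difference in value of a datum from the
mean:
\[ \text{variance} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n} \]
This has the disadvantage of being measured in the square of
the units of the data. The standard deviation \(\sigma\) is the
square root of the variance:
\[ \sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}} \]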
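Range, variance and standard deviation can be compared on the two sets quoted earlier, which share a mean of 28. The Python sketch below divides by n, as in the definition of variance given here; the function name `summarize` is chosen for this example.

```python
import math


def summarize(values):
    """Return mean, range, variance (dividing by n) and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "variance": variance,
        "sd": math.sqrt(variance),
    }


tight = summarize([26, 27, 28, 29, 30])
spread = summarize([5, 19, 20, 36, 60])

print(tight)
print(spread)
```

Both sets have the same mean, but the second has a much larger range and standard deviation, which is exactly the difference a measure of dispersion is meant to expose.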
Standard Deviation (Alternative formula)
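Since:
\[
\begin{aligned}
\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}
  &= \frac{\sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right)}{n} \\
  &= \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2}{n} \\
  &= \frac{\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n} \\
  &= \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2 \\
  &= \overline{x^2} - \bar{x}^2
\end{aligned}
\]
That is:
\[ \sigma = \sqrt{\overline{x^2} - \bar{x}^2} \]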
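The agreement between the definition of variance and the alternative formula can be checked numerically. The Python sketch below uses the 30 values from the start of the lecture and exact fractions, so the two results can be compared without rounding error; the variable names are chosen for this example.

```python
import math
from fractions import Fraction

# The 30 raw values from the earlier data set.
data = [
    28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
    28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
    27, 30, 29, 30, 28, 29, 31, 27, 28, 28,
]

n = len(data)
mean = Fraction(sum(data), n)

# Definition: average of the squared deviations from the mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Alternative formula: mean of the squares minus the square of the mean.
mean_of_squares = Fraction(sum(x * x for x in data), n)
var_shortcut = mean_of_squares - mean ** 2

print(var_definition, var_shortcut)
print(math.sqrt(var_shortcut))  # standard deviation
```

The alternative formula needs only the running totals of x and x², which is why it is convenient for hand calculation.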
A question to try
Frequency Polygons and Frequency Curves
The area A beneath the curve represents the total frequency
of the variable.
Normal Distribution Curve
When very large numbers of observations are made and
the range is divided into a very large number of ‘narrow’
classes, the resulting frequency curve, in many cases,
approximates closely to a standard curve known as the
normal distribution curve, which has a characteristic
bell-shaped formation.
The normal distribution curve is symmetrical about its centre
line, which coincides with the mean of the observations, so
the areas to the left and right of the mean are equal:
A_L = A_R.
Values within 1 standard deviation of the mean
There are two points on the normal distribution curve where the concavity
switches, one from concave to convex and the other from convex to concave.
The horizontal distance of each of these two points from the mean line is one
standard deviation.
Of the area beneath the normal distribution curve, 68% lies
within 1 standard deviation of the mean.
Values within 3 standard deviations of the mean
Of the area beneath the normal distribution curve, 99.7% lies
within 3 standard deviations of the mean.
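These percentages can be reproduced from the normal distribution itself. For a normal curve, the fraction of the area within k standard deviations of the mean is erf(k/√2), which Python's standard `math` module can evaluate. The function name `area_within` is chosen for this example; the value for k = 2 is included for completeness.

```python
import math


def area_within(k: float) -> float:
    """Fraction of the area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * area_within(k):.1f}%")
```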
Standardized Normal Curve