Introduction To Statistics and Statistical Inference


INTRODUCTION TO STATISTICS AND STATISTICAL INFERENCE

Training on Teaching Basic Statistics for Tertiary Level Teachers
Summer 2008

Note: Most of the slides were taken from Elementary Statistics: A Handbook of Slide Presentation, prepared by Z.V.J. Albacea, C.E. Reano, R.V. Collado, L.N. Comia and N.A. Tandang in 2005 for the Institute of Statistics, CAS, UP Los Banos.
TEACHING BASIC STATISTICS ….

Florence Nightingale on Statistics

• “...the most important science in the whole world: for upon it depends the practical application of every other science and of every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience.”
• “To understand God's thoughts, we must study statistics, for these are the measures of His purpose.”

Session 1.2

Realities about Statistics

• The man in the street distrusts statistics and despises [his image of] statisticians, those who diligently collect irrelevant facts and figures and use them to manipulate society.
  “There are three kinds of lies: lies, damned lies, and statistics” – Mark Twain
• One cannot go about without statistics.
  “Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein

Session 1.3

Definition of Statistics

• plural sense: numerical facts, e.g. CPI, peso-dollar exchange rate
• singular sense: scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.

Session 1.4

History of Statistics

• The term statistics came from the Latin phrase “ratio status”, which means the study of practical politics or the statesman’s art.
• In the middle of the 18th century, the term statistik (a term due to Achenwall) was used, a German term defined as “the political science of several countries”.
• From statistik came statistics, defined as a statement in figures and facts of the present condition of a state.

Session 1.5

Application of Statistics

• Diverse applications
“During the 20th Century statistical thinking and methodology have become the scientific framework for literally dozens of fields including education, agriculture, economics, biology, and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology, and physics. In other words, we have grown from a small obscure field into a big obscure field.” – Brad Efron

Session 1.6

Application of Statistics

• Comparing the effects of five kinds of fertilizers on the yield of a particular variety of corn
• Determining the income distribution of Filipino families
• Comparing the effectiveness of two diet programs
• Prediction of daily temperatures
• Evaluation of student performance

Session 1.7

Two Aims of Statistics

Statistics aims to uncover structure in data, to explain variation…
• Descriptive
• Inferential

Session 1.8

Areas of Statistics

Descriptive statistics
• methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a large group

Inferential statistics
• methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data

Session 1.9

Example of Descriptive Statistics

Present the Philippine population by constructing a graph indicating the total number of Filipinos counted during the last census by age group and sex.

Session 1.10

Example of Inferential Statistics

A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants.

Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

Session 1.11

Inferential Statistics

[Diagram: a Smaller Set (n units/observations) taken from a Larger Set (N units/observations); Inferences and Generalizations go from the smaller set to the larger set]

Session 1.12

Key Definitions
• The universe/physical population is the collection of things or observational units under consideration.
• A variable is a characteristic observed or measured on every unit of the universe.
• The statistical population is the set of all possible values of the variable.
• Measurement is the process of determining the value or label of the variable based on what has been observed.
• An observation is the realized value of the variable.
• Data is the collection of all observations.

Session 1.13

Key Definitions

• Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha) and β (beta).
• Statistics are numerical measures of a sample.

Session 1.14

Types of Variables

Qualitative variable
• Describes the quality or character of something
Quantitative variable
• Describes the amount or number of something
a. Discrete
• countable
b. Continuous
• Measurable (measured using a continuous scale such as kilos, cms, grams)
c. Constant

[Diagram: VARIABLES split into Qualitative and Quantitative; Quantitative splits into Discrete and Continuous]

Session 1.15

Levels of Measurement
1. Nominal scale
• Numbers or symbols used to classify units into distinct categories
2. Ordinal scale
• Accounts for order; no indication of distance between positions
3. Interval scale
• Equal intervals (fixed unit of measurement); no absolute zero
4. Ratio scale
• Has absolute zero

Session 1.16

Methods of Collecting Data

• Objective Method
• Subjective Method
• Use of Existing Records

Session 1.17

Methods of Presenting Data

• Textual
• Tabular
• Graphical

Session 1.18

Summary Measures

[Diagram: summary measures grouped into four families]
Location: Maximum, Minimum, Percentile, Quartile, Decile, Median; Central Tendency (Mean, Median, Mode)
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness
Kurtosis

Session 1.19

Measures of Central Tendency

• A single value that is used to identify the “center” of the data
• it is thought of as a typical value of the distribution
• precise yet simple
• most representative value of the data

Session 1.20

Mean

• Most common measure of the center
• Also known as arithmetic average

Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Sample Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
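The mean formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is an illustrative choice, and the data are the eight sample values used in the standard deviation computation slide of this deck.

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Sample data from the standard deviation computation slide (n = 8).
sample = [10, 12, 14, 15, 17, 18, 18, 24]
sample_mean = arithmetic_mean(sample)
print(sample_mean)
```

The same function serves for a population mean when the list holds all N population values; only the interpretation changes.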

Session 1.21

Properties of the Mean

• may not be an actual observation in the data set
• can be applied to data measured on at least the interval level
• easy to compute
• every observation contributes to the value of the mean

Session 1.22

Properties of the Mean

• subgroup means can be combined to come up with a group mean (use weighted mean)
• easily affected by extreme values

[Figure: two number lines, one from 0 to 10 with Mean = 5 and one from 0 to 14 with Mean = 6]
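Both properties can be illustrated with a short Python sketch. The two subgroups and the extreme value 160 are made-up numbers chosen for illustration (they are not from the slides), and `combined_mean` is a hypothetical helper name.

```python
from statistics import mean


def combined_mean(subgroup_means, subgroup_sizes):
    """Weighted mean of subgroup means, using subgroup sizes as weights."""
    total = sum(m * n for m, n in zip(subgroup_means, subgroup_sizes))
    return total / sum(subgroup_sizes)


group_a = [10, 12, 14]            # illustrative values
group_b = [15, 17, 18, 18, 24]    # illustrative values

# Combining the two subgroup means reproduces the mean of the pooled data.
overall = combined_mean(
    [mean(group_a), mean(group_b)],
    [len(group_a), len(group_b)],
)
pooled = mean(group_a + group_b)

# A single extreme value pulls the mean far away from the bulk of the data.
with_extreme = mean(group_a + group_b + [160])
print(overall, pooled, with_extreme)
```

Weighting by subgroup size matters: a plain average of the two subgroup means would give 15.2 here, not the pooled mean.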

Session 1.23

Median

• Divides the ordered observations into two equal parts
• If the number of observations is odd, the median is the middle number.
• If the number of observations is even, the median is the average of the 2 middle numbers.
• The sample median is denoted as $\tilde{x}$ while the population median is denoted as $\tilde{\mu}$.
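The odd/even rule can be sketched in Python as follows. The function name `median_of` is an illustrative choice; the odd-sized data set is the pulse-rate example from the range slide and the even-sized one is the sample from the standard deviation computation slide.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]  # n = 15 (odd)
sample = [10, 12, 14, 15, 17, 18, 18, 24]                              # n = 8 (even)

print(median_of(pulse))   # the 8th of the 15 ordered values
print(median_of(sample))  # average of the 4th and 5th ordered values
```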

Session 1.24

Properties of a Median

• may not be an actual observation in the data set
• can be applied to data measured on at least the ordinal level
• a positional measure; not affected by extreme values

[Figure: two number lines, one from 0 to 10 and one from 0 to 14; Median = 5]

Session 1.25

Mode

• occurs most frequently
• nominal average
• may or may not exist

[Figure: two number lines, one from 0 to 6 labeled "No Mode" and one from 0 to 14 labeled "Mode = 9"]
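A possible way to compute the mode in Python is sketched below. It assumes one common convention: when every distinct value occurs equally often, the data set is reported as having no mode. The function name `modes` and the small data sets are illustrative only.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention assumed here: if all distinct values tie, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([0, 1, 2, 3, 4, 5, 6]))       # every value occurs once: no mode
print(modes([1, 3, 9, 9, 9, 12]))         # a single mode
print(modes(["a", "b", "a", "b", "c"]))   # qualitative data, two modes
```

Because it only counts occurrences, the same function works for nominal (qualitative) data as well as numbers.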

Session 1.26

Properties of a Mode

• can be used for qualitative as well as quantitative data
• may not be unique
• not affected by extreme values
• can be computed for ungrouped and grouped data

Session 1.27

Mean, Median & Mode

Use the mean when:
• sampling stability is desired
• other measures are to be computed

Session 1.28

Mean, Median & Mode

Use the median when:
• the exact midpoint of the distribution is desired
• there are extreme observations

Session 1.29

Mean, Median & Mode

Use the mode when:
• the "typical" value is desired
• the dataset is measured on a nominal scale

Session 1.30

Measures of Location

• A Measure of Location summarizes a data set by giving a value within the range of the data values that describes its location relative to the entire data set arranged according to magnitude (called an array).

Some Common Measures:
• Minimum, Maximum
• Percentiles, Deciles, Quartiles

Session 1.31

Maximum and Minimum

• Minimum is the smallest value in the data set, denoted as MIN.
• Maximum is the largest value in the data set, denoted as MAX.

Session 1.32

Percentiles

• Numerical measures that give the position of a data value relative to the entire data set.
• Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
• The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.

Session 1.33

EXAMPLE

Suppose LJ was told that relative to the other scores on a certain test, his score was the 95th percentile.
• This means that (at least) 95% of those who took the test had scores less than or equal to LJ’s score, while (at least) 5% had scores greater than or equal to LJ’s.

Session 1.34

Deciles

• Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
• The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile, and so on.

Session 1.35

Quartiles

• Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
• The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, also the median; and the 3rd quartile is the 75th percentile.
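The correspondence between percentiles, deciles and quartiles can be checked with Python's standard `statistics.quantiles` function. This is a sketch under one assumption: the function's default "exclusive" interpolation method is used, and other textbooks or software may interpolate differently and give slightly different cut points. The data are the pulse rates from the range example in these slides.

```python
from statistics import quantiles

# Pulse rates of 15 residents (data from the range example in these slides).
pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

percentiles = quantiles(pulse, n=100)  # 99 cut points: P1 ... P99
deciles = quantiles(pulse, n=10)       # 9 cut points:  D1 ... D9
quartiles = quantiles(pulse, n=4)      # 3 cut points:  Q1, Q2, Q3

# Pj is stored at index j - 1.
p10, p25, p50, p75 = (percentiles[j - 1] for j in (10, 25, 50, 75))

print("Q1, Q2, Q3:", quartiles)
print("P25, P50, P75:", p25, p50, p75)
print("D1 vs P10:", deciles[0], p10)
```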

Session 1.36

Measures of Variation

A measure of variation is a
single value that is used to
describe the spread of the
distribution
A measure of central tendency
alone does not uniquely
describe a distribution

Session 1.37

A look at dispersion…

[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Session 1.38

Two Types of Measures of Dispersion

Absolute Measures of Dispersion:
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
Relative Measure of Dispersion:
• Coefficient of Variation

Session 1.39

Range (R)
The difference between the maximum and
minimum value in a data set, i.e.
R = MAX – MIN
Example: Pulse rates of 15 male residents of a
certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

R = 85 - 54 = 31

Session 1.40

Some Properties of the Range

• The larger the value of the range, the more dispersed the observations are.
• It is quick and easy to understand.
• A rough measure of dispersion.

Session 1.41

Inter-Quartile Range (IQR)


The difference between the third quartile and
first quartile, i.e.
IQR = Q3 – Q1
Example: Pulse rates of 15 residents of a
certain village

54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

IQR = 78 - 60 = 18
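The range and IQR for the pulse-rate data can be reproduced in Python. This sketch assumes the default "exclusive" method of `statistics.quantiles`, which happens to give the same Q1 = 60 and Q3 = 78 as the slide; other quartile conventions can give different values.

```python
from statistics import quantiles

pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

# Range: R = MAX - MIN
data_range = max(pulse) - min(pulse)

# Inter-quartile range: IQR = Q3 - Q1
q1, q2, q3 = quantiles(pulse, n=4)
iqr = q3 - q1

print("R =", data_range)
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```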

Session 1.42

Some Properties of IQR

• Reduces the influence of extreme values.
• Not as easy to calculate as the Range.

Session 1.43

Variance

• important measure of variation
• shows variation about the mean

Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Session 1.44

Standard Deviation (SD)

• most important measure of variation
• square root of Variance
• has the same units as the original data

Population SD:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample SD:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Session 1.45

Computation of Standard Deviation

(Sample) Data: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} \approx 4.309$$

Session 1.46
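The computation on the preceding slide can be verified with a short Python sketch. The helper name `sample_variance` is an illustrative choice; the result is cross-checked against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


sample = [10, 12, 14, 15, 17, 18, 18, 24]

s2 = sample_variance(sample)  # 130 / 7
s = sqrt(s2)                  # standard deviation, in the units of the data

print(round(s2, 3), round(s, 3))
print(round(stdev(sample), 3))  # library result for comparison
```

For a population, the divisor would be N instead of n - 1 (`statistics.pvariance` and `statistics.pstdev` in the standard library).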

Remarks on Standard Deviation

• If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
• If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.

Session 1.47

Comparing Standard Deviations

(comparable only when units of measure are the same and the means are not too different from each other)

[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Session 1.48

Comparing Standard Deviations

Example: Team A - Heights of five marathon players in inches

65" 65" 65" 65" 65"

Mean = 65"
s = 0

Session 1.49

Comparing Standard Deviations

Example: Team B - Heights of five marathon players in inches

62" 67" 66" 70" 60"

Mean = 65"
s = 4.0"

Session 1.50

Properties of Standard Deviation

• It is the most widely used measure of dispersion. (Chebyshev’s Inequality)
• It is based on all the items and is rigidly defined.
• It is used to test the reliability of measures calculated from samples.
• The standard deviation is sensitive to the presence of extreme values.
• It is not easy to calculate by hand (unlike the range).

Session 1.51

Chebyshev’s Rule

• It permits us to make statements about the percentage of observations that must be within a specified number of standard deviations from the mean.
• The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number larger than 1.
• This rule applies to any distribution.
Session 1.52

Chebyshev’s Rule

For any data set with mean (μ) and standard deviation (SD), the following statements apply:
• At least 75% of the observations are within 2SD of its mean.
• At least 88.9% of the observations are within 3SD of its mean.

Session 1.53

Illustration

[Figure: the interval within 2SD of the mean, labeled "At least 75%"]

At least 75% of the observations are within 2SD of its mean.

Session 1.54

Example
The midterm exam scores of 100 STAT 1 students last semester had a mean of 65 and a standard deviation of 8 points.
Applying Chebyshev’s Rule, we can say that:
1. At least 75% of the students had scores between 49 and 81 (65 ± 2 × 8).
2. At least 88.9% of the students had scores between 41 and 89 (65 ± 3 × 8).
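The bounds and intervals in this example can be reproduced with a small Python sketch. The function names `chebyshev_bound` and `chebyshev_interval` are illustrative choices, not part of any library.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval covering everything within k SDs of the mean."""
    return (mean - k * sd, mean + k * sd)


# Midterm exam example: mean = 65, SD = 8
for k in (2, 3):
    low, high = chebyshev_interval(65, 8, k)
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of scores lie between {low} and {high}")
```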

Session 1.55

Coefficient of Variation (CV)

• measure of relative variation
• usually expressed in percent
• shows variation relative to mean
• used to compare 2 or more groups
• Formula:

$$CV = \left(\frac{\text{SD}}{\text{Mean}}\right) \times 100\%$$

Session 1.56

Comparing CVs

• Stock A: Average Price = P50
  SD = P5
  CV = 10%
• Stock B: Average Price = P100
  SD = P5
  CV = 5%
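The two CVs can be checked with a minimal Python sketch; `coefficient_of_variation` is an illustrative helper name.

```python
def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100 * sd / mean


cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Both stocks have the same SD, but relative to its average price Stock A varies twice as much as Stock B.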

Session 1.57

Measure of Skewness

• Describes the degree of departure of the distribution of the data from symmetry.
• The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as

$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$$
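The coefficient of skewness can be sketched in Python as below. One assumption is made: the sample standard deviation (`statistics.stdev`) is used for SD, since the slide does not say whether the sample or population SD is intended. The first two data sets come from other slides in this deck; the third is a made-up example.

```python
from statistics import mean, median, stdev


def pearson_skewness(values):
    """SK = 3 * (Mean - Median) / SD, using the sample SD (an assumption)."""
    return 3 * (mean(values) - median(values)) / stdev(values)


sample = [10, 12, 14, 15, 17, 18, 18, 24]                              # mean = median = 16
pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]   # mean below median
right_tail = [1, 2, 3, 4, 20]                                          # illustrative values

print(pearson_skewness(sample))      # zero: mean equals median
print(pearson_skewness(pulse))       # negative: mean is below the median
print(pearson_skewness(right_tail))  # positive: mean is above the median
```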
Session 1.58

What is Symmetry?

A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the “mirror image” of the distribution to the right of the mean.
Likewise, a symmetric distribution has SK = 0 since its mean is equal to its median and its mode.

Session 1.59

Measure of Skewness

SK > 0: positively skewed
SK < 0: negatively skewed

Session 1.60

Measure of Kurtosis
• Describes the extent of peakedness or flatness of the distribution of the data.
• Measured by the coefficient of kurtosis (K), computed as

$$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3$$
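The coefficient of kurtosis can be computed directly from its definition. This is a minimal sketch that treats the data as a whole population (dividing by N, as in the formula); the function name and the small data sets are illustrative only.

```python
def kurtosis_coefficient(values):
    """K = sum((X_i - mu)^4) / (N * sigma^4) - 3, with population mu and sigma."""
    n = len(values)
    if n == 0:
        raise ValueError("kurtosis of an empty data set is undefined")
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance, sigma^2
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]               # evenly spread values (illustrative)
peaked = [0, 5, 5, 5, 5, 5, 5, 10]   # most values at the center (illustrative)

print(kurtosis_coefficient(flat))    # negative
print(kurtosis_coefficient(peaked))  # positive
```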

Session 1.61

Measure of Kurtosis

K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic

Session 1.62

Box-and-Whiskers Plot

• Concerned with the symmetry of the distribution and incorporates measures of location in order to study the variability of the observations.
• Also called a box plot; it is a graphical display of the 5-number summary (Min, Q1, Q2, Q3, and Max).
• Suitable for identifying outliers.

Session 1.63

Box-and-Whiskers Plot
The diagram is made up of a box which
lies between the first and third
quartiles.
The whiskers are the straight lines
extending from the ends of the box to
the smallest and largest values that
are not outliers.

Session 1.64

Steps to Construct a Box-and-Whiskers plot

Step 1: Draw a rectangular box whose left edge is at Q1 and whose right edge is at Q3, so the box width is the IQR. Then draw a vertical line segment inside the box where the median is found.

[Figure: box with Q1 = 75, Md = 78, and Q3 = 85]

Session 1.65

Steps to Construct a Box-and-Whiskers plot

Step 2: Place marks at distances 1.5 IQR from either end of the box. (1.5 IQR = 15)

[Figure: box with Q1 = 75, Md = 78, and Q3 = 85, with marks at 60 and 100, each 1.5 IQR from the nearer end of the box]

Session 1.66

Steps to Construct a Box-and-Whiskers plot

Step 3: Draw the horizontal line segments known as the “whiskers” from each end of the box to the largest and smallest values in the data set that are not outliers.
(An observation more than 1.5 IQR beyond either end of the box is an outlier.)

Session 1.67

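Steps to Construct a Box-and-Whiskers plot

Step 4: For every outlier, draw a dot. If two or more dots have the same values, draw the dots side by side.

[Figure: completed box plot with Q1 = 75, Md = 78, and Q3 = 85, marks at 60 and 100 (1.5 IQR from each end of the box), and 55 and 98 also labeled on the axis; outliers are drawn as dots]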
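The outlier rule used in the steps can be sketched in Python. The quartiles Q1 = 75 and Q3 = 85 are the ones shown in the figures; the list of observations is hypothetical (the slides do not give the raw data), and the function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Marks placed 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def split_outliers(values, q1, q3):
    """Separate observations inside the fences from those beyond them."""
    low, high = outlier_fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inside, outliers


lower, upper = outlier_fences(75, 85)      # quartiles from the figures

values = [55, 55, 72, 75, 78, 85, 98]      # hypothetical observations
inside, outliers = split_outliers(values, 75, 85)
whiskers = (min(inside), max(inside))      # whisker ends: extreme non-outliers

print("marks at", lower, "and", upper)
print("outliers:", outliers)
print("whiskers reach", whiskers)
```

In this sketch a value lying exactly on a mark is not counted as an outlier, following the "beyond 1.5 IQR" wording of Step 3.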

Session 1.68
