Module 1 Statistical Inference
Dr. Basheer Ahmad Samim
06:13 PM 1
Course Outline
1. Review of Descriptive Statistics and SPSS
2. Random Variable and Mathematical Expectation
3. Discrete Probability Distributions (Binomial, Poisson)
4. Continuous Probability Distribution (Normal)
5. Sampling Theory
6. Confidence Intervals
7. Hypotheses Testing
8. Goodness of Fit
9. Regression and Correlation with ANOVA
10. Multiple Regression
All topics will be SPSS oriented.
Recommended Readings (Books)
Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)
Statistical Methods for Practice and Research, Ajai S. Gaur and Sanjaya S. Gaur
Attendance Policy
16-Weeks Teaching
16-Lectures (32-Attendance)
Roll call twice: once before the break and once after the break
At least 80% (24) attendance is compulsory to be eligible for the Final Examination
No Roll Call after First Ten(5) minutes
Mode of Teaching
Lecture
SPSS Workshop
Discussion Session
Mode of Assessment
Quizzes (15%)
Assignments (15%)
Class Performance (5%)
Mid Term Test (25%)
Final Examination (40%)
Questionnaire
Variable
A characteristic or
property that varies
from individual to
individual.
Constant
A characteristic or
property that does not
change from individual
to individual.
Types of Variables
Qualitative
Quantitative
  Discrete
  Continuous
Nominal Scale
Variable categories are mutually
exclusive and exhaustive.
Variable categories have no
logical order.
Eye Color, Hair Color, Gender.
Ordinal Scale
Data categories are mutually
exclusive and exhaustive.
Data classifications are ranked or
ordered according to the
particular trait they possess.
Level of Knowledge about SPSS
Interval Scale
Data categories are mutually exclusive
and exhaustive.
Data classifications are ranked or
ordered according to the particular trait
they possess.
Equal differences in the characteristic are
represented by equal differences in
the measurements.
Temperature, Shoe Size and IQ scores
Ratio Scale
Data categories are mutually exclusive and
exhaustive.
Data classifications are ranked or ordered
according to the particular trait they
possess.
Equal differences in the characteristic are
represented by equal differences in the
measurements.
The zero point is the absence of the
characteristic.
Height, Weight, Distance.
Measurement Scales
Data
The information collected
for any kind of investigation.
Usually Numerical but can
be Qualitative.
Primary Data
The initial material collected
during the research process.
The information collected
directly from the respondent.
Personal Investigation, Through Investigator, Through Questionnaire,
Through Local Sources, Through Telephone
Secondary Data
The information
collected and processed
by the people other than
the researcher
Government Organizations, Semi-Government
Organizations,
Data Collection
Any of the following methods may be
adopted:
(a) Personal interview
(b) Direct observation
(c) Mail interview (internet interview)
(d) Telephone interview
What are the pros and cons of each?
Data management
Office Editing,
Post Coding,
Data entry and Verification.
Data organization and
Analysis
Preparing data for analysis,
Extracting descriptive measures
from the data,
Using advanced statistical
techniques to analyze the data
and draw inferences from it.
Measures of Central Tendency
Arithmetic Mean
Quantiles
(Median, Quartiles, Deciles, Percentiles)
Mode
Arithmetic Mean
A value obtained by dividing the sum of all the observations by
their number.
$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
Arithmetic Mean
The marks obtained by 8 students are:
67 72 68 70 65 68 75 63
$\bar{X} = \dfrac{67 + 72 + \cdots + 63}{8} = \dfrac{548}{8} = 68.5$ Marks
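The worked example above can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is chosen here for illustration and is not part of the course material.

```python
# Marks of the 8 students from the worked example.
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Sum of all the observations divided by their number."""
    return sum(values) / len(values)


total = sum(marks)
mean_marks = arithmetic_mean(marks)
print(total, mean_marks)  # 548 68.5
```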
Quantiles
For individual observations or a discrete frequency
distribution, the ith quartile, jth decile and kth
percentile are located in the array/discrete frequency
distribution by the following relations:
$Q_i = \dfrac{i(n+1)}{4}$th observation in the distribution, $i = 1, 2, 3$
$D_j = \dfrac{j(n+1)}{10}$th observation in the distribution, $j = 1, 2, \ldots, 9$
$P_k = \dfrac{k(n+1)}{100}$th observation in the distribution, $k = 1, 2, \ldots, 99$
Quartiles
The weekly TV Watching times (Hours):
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21
Ordered array:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Quartiles
$Q_1 = \dfrac{1(20+1)}{4}$th observation in the distribution
= 5.25th observation in the distribution
= 5th obs. + 0.25{6th obs. - 5th obs.}
= 21 + 0.25{25 - 21} = 22.0 Hours
Quartiles
$Q_2 = \dfrac{2(20+1)}{4}$th observation in the distribution
= 10.50th observation in the distribution
= 10th obs. + 0.50{11th obs. - 10th obs.}
= 30 + 0.50{31 - 30} = 30.5 Hours
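The location rule used in these calculations can be sketched in Python as follows. The helper name `quantile_n_plus_1` and its `parts` argument (4 for quartiles, 10 for deciles, 100 for percentiles) are assumptions made for this illustration, and the data are the ordered array as printed on the slide.

```python
# Ordered array of weekly TV watching times (hours), as printed on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]


def quantile_n_plus_1(sorted_values, i, parts):
    """i-th quantile located at the i(n+1)/parts-th observation.

    A fractional position is resolved by linear interpolation between
    the two neighbouring observations, as in the worked example.
    """
    n = len(sorted_values)
    position = i * (n + 1) / parts      # 1-based position, e.g. 5.25
    if position <= 1:                   # before the first observation
        return float(sorted_values[0])
    if position >= n:                   # beyond the last observation
        return float(sorted_values[-1])
    whole = int(position)               # e.g. 5
    frac = position - whole             # e.g. 0.25
    lower = sorted_values[whole - 1]    # the whole-th observation
    upper = sorted_values[whole]        # the next observation
    return lower + frac * (upper - lower)


q1 = quantile_n_plus_1(tv_hours, 1, 4)
q2 = quantile_n_plus_1(tv_hours, 2, 4)
d5 = quantile_n_plus_1(tv_hours, 5, 10)
p50 = quantile_n_plus_1(tv_hours, 50, 100)
print(q1, q2, d5, p50)  # 22.0 30.5 30.5 30.5
```

The second quartile, fifth decile and fiftieth percentile coincide because all three are located at the same 10.5th observation.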
Quantiles
Mode
The mode is the value that occurs
most frequently (the maximum number
of times) in a set of data or
sequence of observations.
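As a small illustration of this definition, the snippet below finds the mode of the students' marks used earlier for the arithmetic mean. It relies on `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value tied for the highest frequency; choosing the marks data here is an assumption made for the example.

```python
from statistics import multimode

# Marks of the 8 students from the arithmetic mean example.
marks = [67, 72, 68, 70, 65, 68, 75, 63]

# 68 is the only value that occurs twice, so it is the mode.
modes = multimode(marks)
print(modes)  # [68]

# A data set can have more than one mode when frequencies tie.
tied = multimode([1, 1, 2, 2, 3])
print(tied)  # [1, 2]
```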
Mode
The total automobile sales (in millions) in
the United States for the last 14 years.
Measures of Variation
Measures of variation quantify the
variation present among the values
of a data set; that is, they measure
the spread of the values in the data.
Absolute Measures of
Dispersion
Range
Quartile Deviation
Mean (Average) Deviation
Variance and Standard Deviation
Relative Measures of
Dispersion
Coefficient of Range
Coefficient of Quartile Deviation
Coefficient of Mean Deviation
Coefficient of Variation (CV)
Range
Difference between the largest
and the smallest observations
Disadvantages of the Range
Ignores the way in which data are distributed: two data sets plotted
on the same axis from 7 to 12 both have Range = 12 - 7 = 5, whatever
the pattern of values in between.
Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
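The sensitivity of the range to a single extreme value can be reproduced with a short Python sketch. The two lists repeat the data sets shown above; the function name `data_range` is illustrative only.

```python
def data_range(values):
    """Difference between the largest and the smallest observations."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then 4 and a final value of 5 or 120.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_a = data_range(without_outlier)
range_b = data_range(with_outlier)
print(range_a, range_b)  # 4 119
```

Changing one observation out of 25 moves the range from 4 to 119, although the other 24 values are identical.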
Inter-quartile Range (IQR)
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
Inter-quartile Range = Q3 - Q1 = 57 - 30 = 27
The Mean (absolute) Deviation
Mean Deviation is the average of absolute
deviations taken from the mean value.
  $X$     $(X - \bar{X})$     $|X - \bar{X}|$
  8           3                   3
  5           0                   0
  2          -3                   3
Total         0                   6
$MD = \dfrac{\sum |x - \bar{x}|}{n} = \dfrac{6}{3} = 2$
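The table above can be reproduced step by step in Python. This is a minimal sketch; the function name `mean_deviation` is an assumption made for the example.

```python
def mean_deviation(values):
    """Average of the absolute deviations taken from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n


x = [8, 5, 2]
mean_x = sum(x) / len(x)                      # 5.0
deviations = [v - mean_x for v in x]          # [3.0, 0.0, -3.0], sums to 0
abs_deviations = [abs(d) for d in deviations] # [3.0, 0.0, 3.0], sums to 6
md = mean_deviation(x)
print(sum(deviations), sum(abs_deviations), md)  # 0.0 6.0 2.0
```

The plain deviations always sum to zero, which is why the absolute values are averaged instead.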
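Variance
Variance is the average of the squared deviations taken from
the mean value.
  X (cm)     (X - Mean)^2     X^2
    4            36            16
    6            16            36
    9             1            81
   12             4           144
   13             9           169
   16            36           256
Total 60        102           702
(i) $S^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{102}{6} = 17\ \text{cm}^2$
(ii) $S^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2 = \dfrac{702}{6} - \left(\dfrac{60}{6}\right)^2 = 117 - 100 = 17\ \text{cm}^2$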
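Both formulas can be verified against the table with a short Python sketch. Following the slide, the divisor is n (not n - 1); the variable names are illustrative.

```python
import math

# Observations (cm) from the variance table.
x = [4, 6, 9, 12, 13, 16]
n = len(x)
mean_x = sum(x) / n                                   # 60 / 6 = 10.0

# (i) Definition: average of the squared deviations from the mean.
sum_sq_dev = sum((v - mean_x) ** 2 for v in x)        # 102.0
var_definition = sum_sq_dev / n                       # 17.0

# (ii) Shortcut: mean of the squares minus the square of the mean.
sum_sq = sum(v ** 2 for v in x)                       # 702
var_shortcut = sum_sq / n - (sum(x) / n) ** 2         # 117.0 - 100.0

std_dev = math.sqrt(var_definition)                   # in cm
print(var_definition, var_shortcut)  # 17.0 17.0
```

The standard deviation is the square root of the variance and is expressed in the original unit (cm) rather than in squared units.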
Comparing Standard Deviations
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
[Figure: dot plots of Data A, B and C on a common axis from 11 to 21]
•The smaller the standard deviation, the more tightly
clustered the scores around the mean
•The larger the standard deviation, the more spread out
the scores from the mean
Relative Measures of Variation
Coefficient of Range $= \dfrac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}}$
Coefficient of Quartile Deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
Coefficient of Mean Deviation $= \dfrac{MD}{\text{Mean}}$
Coefficient of Variation (CV)
$CV = \dfrac{S}{\bar{X}} \times 100\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \dfrac{S}{\bar{X}} \times 100\% = \dfrac{\$5}{\$100} \times 100\% = 5\%$
but stock B is less variable relative to its price
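The Stock B calculation can be written as a small Python function. This is a sketch only: the name `coefficient_of_variation` is chosen for illustration, and only the Stock B figures given above are used.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


# Stock B: average price last year $100, standard deviation $5.
cv_b = coefficient_of_variation(5, 100)
print(cv_b)  # 5.0
```

Because the result is a percentage with no unit, it can be compared across data sets whose means or units differ.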
Appropriate Choice of Measure
of Variability
If data are symmetric, with no serious
outliers, use range and standard
deviation.
If data are skewed, and/or have serious
outliers, use IQR.
If comparing variation across two data
sets, use coefficient of variation (C.V)
Five Number Summary
The five number summary of a data set consists of the
minimum value, the first quartile, the second quartile, the
third quartile and the maximum value written in that order:
Min, Q1, Q2, Q3, Max.
Five Number Summary
The weekly TV viewing times (in hours).
25 41 27 32 43 66 35 31 15 5
34 26 32 38 16 30 38 30 20 21
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 37 38 41 43 66
Five Number Summary
LOCATION of $Q_1$: $\dfrac{1(20+1)}{4}$th obs. in the data = 5.25th obs.
VALUE of $Q_1$: 5th obs. + 0.25{6th obs. - 5th obs.} = 21 + 0.25{25 - 21} = 22.0 Hrs
LOCATION of $Q_2$: $\dfrac{2(20+1)}{4}$th obs. in the data = 10.50th obs.
VALUE of $Q_2$: 10th obs. + 0.50{11th obs. - 10th obs.} = 30 + 0.50{31 - 30} = 30.5 Hrs
LOCATION of $Q_3$: $\dfrac{3(20+1)}{4}$th obs. in the data = 15.75th obs.
VALUE of $Q_3$: 15th obs. + 0.75{16th obs. - 15th obs.} = 35 + 0.75{37 - 35} = 36.5 Hrs
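The whole five number summary can be assembled in Python as sketched below. The helper names `quartile` and `five_number_summary` are assumptions made for this example, and the input is the ordered array as printed on the slide.

```python
# Ordered array of weekly TV viewing times (hours), as printed on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]


def quartile(sorted_values, i):
    """i-th quartile at the i(n+1)/4-th observation (i = 1, 2, 3)."""
    n = len(sorted_values)
    if n < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    position = i * (n + 1) / 4              # LOCATION, e.g. 15.75
    whole = int(position)                   # e.g. 15
    frac = position - whole                 # e.g. 0.75
    lower = sorted_values[whole - 1]        # positions are 1-based
    upper = sorted_values[min(whole, n - 1)]
    return lower + frac * (upper - lower)   # VALUE by interpolation


def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in that order."""
    ordered = sorted(values)
    return (ordered[0],
            quartile(ordered, 1),
            quartile(ordered, 2),
            quartile(ordered, 3),
            ordered[-1])


summary = five_number_summary(tv_hours)
print(summary)  # (5, 22.0, 30.5, 36.5, 66)
```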
Construction of Box-Plot
1. Start the box from Q1 and end at Q3
2. Within the box draw a line to represent Q2
3. Draw lower whisker to Min. Value up to Q1
4. Draw upper whisker from Q3 up to Max. Value
[Figure: box plot labelled Q1, Q2, Q3 and Max Value]
Construction of Box-Plot
1. Q1=22.0 Q3=36.5
2. Q2=30.5
3. Minimum Value=5.0
4. Maximum Value=66.0
[Figure: box plot drawn on a vertical axis from 0 to 70]
Interpretation of Box-Plot
Inner and Outer Fences
If Q1=22.0 Q2=30.5 Q3=36.5
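The fence formulas themselves are not preserved on this slide. The sketch below therefore assumes the common Tukey convention, which may differ from the one used in the lecture: inner fences at Q1 - 1.5(IQR) and Q3 + 1.5(IQR), outer fences at Q1 - 3(IQR) and Q3 + 3(IQR). The labels "mild" and "extreme" are likewise part of that assumed convention.

```python
# Quartiles from the slide; the data are the ordered TV viewing times.
q1, q3 = 22.0, 36.5
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

iqr = q3 - q1                                   # 14.5

# ASSUMPTION: Tukey's 1.5 and 3 multipliers (not stated on the slide).
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Values beyond the outer fences.
extreme = [v for v in tv_hours
           if v < outer_fences[0] or v > outer_fences[1]]
# Values beyond the inner fences but still inside the outer fences.
mild = [v for v in tv_hours
        if (v < inner_fences[0] or v > inner_fences[1]) and v not in extreme]

print(inner_fences, outer_fences)  # (0.25, 58.25) (-21.5, 80.0)
print(mild, extreme)               # [66] []
```

Under this assumed convention the value 66 falls between the upper inner and upper outer fences, and no value lies beyond the outer fences.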
Identification of the Outliers
3. The values that lie outside outer
[Figure: box plots for Female and Male with outliers marked]
Standardized Variable
A variable that has mean "0" and variance "1" is
called a standardized variable.
Values of a standardized variable are called
standard scores.
Standard scores are unit-less.
Construction:
$Z = \dfrac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}}$
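Standardized Variable
  $X$     $(X - \bar{X})^2$     $Z$         $(Z - \bar{Z})^2$
   3          25            -1.3624        1.8561
   6           4            -0.5450        0.2970
  11           9             0.8174        0.6682
  12          16             1.0899        1.1879
Total 32      54             0             4.009
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{32}{4} = 8$
$S_x^2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{54}{4} = 13.5$, so $S_x = 3.67$
$Z = \dfrac{X - \bar{X}}{S_x} = \dfrac{X - 8}{3.67}$
$\bar{Z} = \dfrac{\sum Z}{n} = \dfrac{0}{4} = 0$
$S_z^2 = \dfrac{\sum (Z - \bar{Z})^2}{n} = \dfrac{4.009}{4} \approx 1$
Variable Z has mean "0" and variance "1", so Z is a standard variable.
Standard Score at X=11 is $Z = \dfrac{X - \bar{X}}{S_x} = \dfrac{11 - 8}{3.67} = 0.8174$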
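The standardization of X = 3, 6, 11, 12 can be checked with the Python sketch below. The function name `standardize` is illustrative. The code keeps the full-precision standard deviation, whereas the slide rounds it to 3.67, so the printed scores differ from the slide's in the third decimal place.

```python
import math


def standardize(values):
    """Return the standard scores (value - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n   # divisor n, as on the slide
    std_dev = math.sqrt(variance)
    return [(v - mean) / std_dev for v in values]


x = [3, 6, 11, 12]
z = standardize(x)

# The standardized variable should have mean 0 and variance 1.
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)

print([round(v, 4) for v in z])
print(round(z_mean, 10), round(z_var, 10))
```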
Performance evaluation by z-scores
The industry in which sales rep Mr. Atif works has mean
annual sales=$2,500
standard deviation=$500.
The industry in which sales rep Mr. Asad works has mean
annual sales=$4,800
standard deviation=$600.
Last year Mr. Atif’s sales were $4,000 and
Mr. Asad’s sales were $6,000.
Which of the representatives would you hire
if you have one sales position to fill?
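One way to approach the question is to convert each representative's sales into a z-score relative to his own industry, using the figures stated above. This is a sketch of the comparison only, with the function name `z_score` chosen for illustration; it is not a hiring rule.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value exceeds the mean."""
    return (value - mean) / std_dev


# Mr. Atif: industry mean $2,500, standard deviation $500, sales $4,000.
z_atif = z_score(4000, 2500, 500)
# Mr. Asad: industry mean $4,800, standard deviation $600, sales $6,000.
z_asad = z_score(6000, 4800, 600)

print(z_atif, z_asad)  # 3.0 2.0
```

Mr. Asad sold more in absolute terms, but Mr. Atif's sales are 3 standard deviations above his industry mean compared with 2 for Mr. Asad, so Mr. Atif performed better relative to his industry.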
Performance evaluation by z-scores
Sales rep. Atif: $\bar{X}_B = \$2{,}500$
Sales rep. Asad: $\bar{X}_P = \$4{,}800$
$\bar{X} \pm 3S$ contains about 99.7% of values
[Figure: distribution curve marked at $\bar{X} \pm 2S$ and $\bar{X} \pm 3S$]
Measures of Skewness
A distribution in which the values equidistant
from the centre have equal frequencies is defined
to be symmetrical and any departure from
symmetry is called skewness.
Measures of Skewness
A distribution is positively skewed, if the observations
tend to concentrate more at the lower end of the possible
values of the variable than the upper end. A positively
skewed frequency curve has a longer tail on the right
hand side
Measures of Skewness
A distribution is negatively skewed, if the
observations tend to concentrate more at the upper
end of the possible values of the variable than the
lower end. A negatively skewed frequency curve has
a longer tail on the left side.
Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a
unimodal (single humped) distribution.
• When the values of a variable are highly concentrated around
the mode, the peak of the curve becomes relatively high; the
curve is Leptokurtic.
• When the values of a variable have low concentration
around the mode, the peak of the curve becomes relatively
flat; the curve is Platykurtic.
• A curve that is neither very peaked nor very flat-topped, and
is taken as the basis for comparison, is called
Mesokurtic/Normal.
Measures of Kurtosis
Measures of Kurtosis
Coefficient of Kurtosis $= \dfrac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2}$
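The coefficient can be computed directly from its definition, as in the Python sketch below. The function name `coefficient_of_kurtosis` is illustrative, and reusing the six observations from the variance example is an assumption made only to have numbers to work with. For reference, this moment ratio equals 3 for a normal (mesokurtic) distribution.

```python
def coefficient_of_kurtosis(values):
    """n * sum of fourth-power deviations / (sum of squared deviations)^2."""
    n = len(values)
    mean = sum(values) / n
    sum_dev2 = sum((v - mean) ** 2 for v in values)
    sum_dev4 = sum((v - mean) ** 4 for v in values)
    return n * sum_dev4 / sum_dev2 ** 2


# Observations (cm) reused from the variance example.
x = [4, 6, 9, 12, 13, 16]
kurt = coefficient_of_kurtosis(x)

# Deviations are -6, -4, -1, 2, 3, 6: squares sum to 102,
# fourth powers sum to 2946, so kurt = 6 * 2946 / 102**2.
print(round(kurt, 3))  # 1.699
```

A value below 3, as here, points to a flatter (platykurtic) shape, although six observations are far too few for a reliable judgement.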