Inferential Statistics Lecture
Descriptive vs Inferential
• Descriptive statistics is a branch of statistics used to summarize and describe the characteristics of a dataset.
• Inferential statistics is a branch of statistics used to make inferences or predictions about a population based on a sample of data.
Measure              Sample (statistic)   Population (parameter)
Arithmetic mean      x̄                    µ
Standard deviation   s                    σ
Number of items      n                    N
Proportion           p                    π
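A minimal sketch in Python (standard library `statistics` module) of how the sample statistics in the table are computed; the data values are made up for illustration. `stdev` divides by n − 1 and gives the sample standard deviation s, while `pstdev` divides by N and gives σ when the data are the whole population.

```python
from statistics import mean, stdev, pstdev

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)          # number of items, n
x_bar = mean(data)     # sample mean, x̄
s = stdev(data)        # sample standard deviation, s (divides by n - 1)
sigma = pstdev(data)   # population standard deviation, σ (divides by N)

# Sample proportion p: share of items meeting a condition (here: value above 4)
p = sum(1 for value in data if value > 4) / n

print(n, x_bar, round(s, 3), sigma, p)
```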
Introduction
• Parametric tests are tests that assume a normal distribution, with data measured at the interval or ratio level.
• The t-test is used to compare two means: the means of two independent samples (groups), or the means of correlated samples before and after treatment.
• ANOVA, on the other hand, is used to compare the means of two or more independent groups.
Power Analysis
• It is normally conducted before data collection.
• Power analysis is applied because the investigator ideally wants the smallest adequate sample: larger samples are usually costlier than smaller ones.
• A critically important aspect of study
design is determining the appropriate
sample size to answer the research
question
• The power of your study is the probability
that you will find a significant difference
or relationship if a difference or
relationship truly exists in the population
• Power analysis is directly related to hypothesis testing and is usually conducted before data collection.
• Power analysis is used to determine the smallest sample size that still gives adequate power, without exhausting the resources of the research study.
Resources:
Financial capacity (not all research is cheap and affordable)
Intellectual and research capacity
Time
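As a sketch only: for the simple case of a two-sided one-sample test of a mean with known σ, a common normal-approximation formula is n = ((z₁₋α/₂ + z₁₋β) · σ / δ)², where δ is the smallest true difference worth detecting and 1 − β is the desired power. The function name and the example numbers below are hypothetical, not from the lecture.

```python
import math
from statistics import NormalDist

def sample_size_one_mean(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for a two-sided one-sample z-test to detect a true
    difference `delta` with the requested power (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: a sample size must be a whole number

# Hypothetical example: detect a 5-unit difference when sigma = 10
print(sample_size_one_mean(delta=5, sigma=10))
print(sample_size_one_mean(delta=5, sigma=10, power=0.90))
```

Asking for more power, or for the ability to detect a smaller difference, pushes the required sample size up, which is exactly the cost trade-off described above.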
Why determine sample size?
• Ex: A study is proposed to evaluate a new screening test for Down syndrome.
For evaluation, pregnant women will be asked to provide a blood sample and undergo amniocentesis.
H0 = null hypothesis
Statistical Power
• Adequate power is what allows a study to find a significant difference when a difference truly exists in the population.
TYPE I ERROR
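• The best way to increase your study’s power is to increase your sample size: power rises as the sample size grows, although not in direct proportion.
• However, avoid increasing your sample size far beyond what your power analysis indicates. Extra subjects cost resources, and with a very large sample even trivially small differences become statistically significant. Rejecting a null hypothesis that is actually true is a false positive, also known as an alpha or Type I error.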
TYPE II ERROR
• Even if your null hypothesis is indeed false, an underpowered study may fail to find significant results. In other words, you will have a false negative, also known as a beta or Type II error.
• Some factors that will affect the power of
your study include sample size, significance
level (α), effect size, and the type of
statistical analysis you plan to conduct.
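SUMMARY OF THE POSSIBLE DECISIONS THAT WE COULD MAKE WITH STATISTICAL POWER:
Decision      H0 is true                  H0 is false
Accept H0     Correct decision (1 − α)    Type II error (β)
Reject H0     Type I error (α)            Correct decision: power (1 − β)
• Power analysis is a very tedious task to compute by hand.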
• $Z = \dfrac{x - \mu}{\sigma}$
Z = standardized value (z-score) of x
x = value of the normally distributed variable
µ = population mean
σ = population standard deviation
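A one-line sketch of the standardization in Python (the function name `z_score` is hypothetical):

```python
def z_score(x, mu, sigma):
    """Standardize x: the number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

print(z_score(40, 35, 5))  # 1.0: 40 is one standard deviation above 35
```

Python 3.9 and later also provide the same calculation as `statistics.NormalDist(mu, sigma).zscore(x)`.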
• The age of students is normally distributed with a mean of 35 years and a standard deviation of 5 years. If a student is randomly picked, find the probability that the age of the student:
i) Lies between 35 and 40
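• $z = \dfrac{x - \mu}{\sigma}$: for x = 35, $z = \dfrac{35 - 35}{5} = 0$; for x = 40, $z = \dfrac{40 - 35}{5} = 1$
• We need to find the probability that Z lies between 0 and 1:
• P(0 < Z < 1) ≈ 0.3413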
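The table lookup can be checked with Python's `statistics.NormalDist`, whose `cdf` method gives Φ, the standard normal cumulative probability (a sketch, not part of the original lecture):

```python
from statistics import NormalDist

mu, sigma = 35, 5          # mean and standard deviation of age (years)
std_normal = NormalDist()  # Z ~ N(0, 1)

z_low = (35 - mu) / sigma   # 0.0
z_high = (40 - mu) / sigma  # 1.0

# P(0 < Z < 1) = Φ(1) − Φ(0)
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(prob, 4))  # 0.3413
```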
Exercise
ii) Lies between 30 and 40
iii) Is below 29 yrs
iv) Lies between 25 and 30
v) Lies beyond 45 yrs
vi) Lies beyond 30 yrs
vii) Lies below 25 years
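To check your answers, a small helper (hypothetical name `prob_between`) built on `statistics.NormalDist` can evaluate any of these probabilities directly on the original scale; passing `None` leaves a bound open for "below" and "beyond" questions.

```python
from statistics import NormalDist

ages = NormalDist(mu=35, sigma=5)  # age distribution from the example

def prob_between(dist, low=None, high=None):
    """P(low < X < high); use None for an open bound ("below" / "beyond")."""
    upper = dist.cdf(high) if high is not None else 1.0
    lower = dist.cdf(low) if low is not None else 0.0
    return upper - lower

# v) beyond 45 years: P(X > 45)
print(round(prob_between(ages, low=45), 4))
```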
Assignment
The time taken to finish a statistics exam is normally
distributed with a mean of 130 minutes and standard deviation
of 14.5 minutes.
A. If a student is randomly selected, find the probability that the
student finishes the exam
i. Before 1 hr 30 minutes
ii. After 1 hr 30 minutes
iii. Between 120 and 150 minutes
B. If the exam is to take 2 hrs 30 minutes, find the percentage of
students who don’t complete the exam
Hypothesis Testing
• A hypothesis is an opinion, claim or belief
about a certain issue.
• Hypothesis testing is a scientific method
of checking whether a claim/opinion is
correct or incorrect.
• Hypothesis tests are broadly classified into two types:
i. Parametric tests
ii. Non-parametric tests
Parametric test
• This is a test in which the population parameters (especially the standard deviation) are known or assumed.
• The type of distribution is known to be the normal distribution.
e.g. the Z (normal) test
Non-parametric test
• This is a test in which the parameters are not known.
• The type of distribution is also not known.
• E.g. chi-square test.
• Hypothesis testing involves two hypotheses:
i. The Null hypothesis
ii. The Alternative hypothesis
[Figure: two-tail test, showing the lower and upper critical values]
ii) One-tail test: in this case one is concerned with only one side of the curve. There are two types here:
a) Right (upper) tail test: in this case the researcher is concerned with greater than, more than, better than or above a given value.
The decision rule is that if the calculated value is greater than the critical value, reject H0.
[Figure: right-tail test, showing the acceptance region, the critical region and the critical value]
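b) Left (lower) tail test: in this case the researcher is concerned with less than, fewer than, worse than or below a given value.
The decision rule is that if the calculated value is less than the critical value, reject H0.
[Figure: left-tail test, showing the acceptance region, the critical region and the critical value]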
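The decision rules for the three kinds of test can be written as one small function. This is a sketch for a Z test only: the function name `z_decision` and its `tail` argument are hypothetical, and the critical values are taken from the standard normal distribution via `statistics.NormalDist().inv_cdf`.

```python
from statistics import NormalDist

def z_decision(z_calc, alpha=0.05, tail="two"):
    """Return True if H0 is rejected for a Z test, False if it is accepted.

    tail: 'right' (Ha: mu > mu0), 'left' (Ha: mu < mu0) or 'two' (Ha: mu != mu0).
    """
    std_normal = NormalDist()
    if tail == "right":
        # Reject if the calculated value exceeds the upper critical value
        return z_calc > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # Reject if the calculated value is below the (negative) lower critical value
        return z_calc < std_normal.inv_cdf(alpha)
    if tail == "two":
        # Reject if the calculated value falls beyond either critical value
        return abs(z_calc) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left' or 'two'")

print(z_decision(1.8, alpha=0.05, tail="right"))  # True: 1.8 > 1.645
print(z_decision(1.8, alpha=0.05, tail="two"))    # False: 1.8 < 1.960
```

The same calculated value can lead to different decisions depending on whether the test is one-tailed or two-tailed, which is why the alternative hypothesis must be fixed before looking at the data.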
E.g.
• The School of Allied Health claims that the average performance of students in the national exam is 48%. Formulate the null hypothesis and its alternative.
Two-tail test
H0: μ=48%
Ha: μ≠48%
[Figure: two-tail test, with critical regions in both tails beyond the lower and upper critical values and the acceptance region between them]
One-tail test
• Right tail test: H0: μ = 48%, Ha: μ > 48%
• Left tail test: H0: μ = 48%, Ha: μ < 48%
Results of Hypothesis Testing
• There are four possible results in
hypothesis testing:
i. Accepting a true null hypothesis. This is a
correct decision.
ii. Rejecting a false null hypothesis. This is a
correct decision.
iii. Rejecting a true null hypothesis. This is a wrong decision and is known as a Type I (or type α or A) error.
iv. Accepting a false null hypothesis. This is a wrong decision and is known as a Type II (or type β or B) error.
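Outcome iii can be explored by simulation. The sketch below (Python; the function name, seed and number of trials are arbitrary choices) repeatedly draws samples for which H0 is true and counts how often a two-sided z-test at α = 0.05 rejects it. The Type I error rate stays close to α whatever the sample size.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(n, alpha=0.05, trials=10_000, seed=1):
    """Fraction of trials in which a two-sided z-test rejects a TRUE H0.

    Data are drawn from N(0, 1), so H0: mu = 0 holds and sigma = 1 is known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n - 0) / (1 / math.sqrt(n))  # z = (x̄ − μ0) / (σ / √n)
        if abs(z) > z_crit:
            rejections += 1  # a true H0 was rejected: Type I error
    return rejections / trials

for n in (10, 100):
    print(n, type_i_error_rate(n))
```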
Acceptance and rejection regions
• The acceptance region is the area within which the null hypothesis is accepted.
It depends on the confidence level, which is usually expressed as a percentage, e.g. 99%, 95% and 90%.
• The rejection region is the area in which, if the test statistic falls, the null hypothesis is rejected.
It depends on the level of significance (α), e.g. 0.01, 0.05 and 0.1.
• The acceptance region and the rejection region
are separated by critical values that are
obtained from statistical tables.
• The critical values depend on:
i. The type of distribution
ii. The sample size
iii. The level of significance (α = 1 − level of confidence)
iv. The degrees of freedom (d.f)
v. Whether dealing with a one-tail or two-tail test.
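Most common critical values for Z tests (normal distribution)
α       One-tail test    Two-tail test
0.10    1.282            ±1.645
0.05    1.645            ±1.960
0.01    2.326            ±2.576
(For a left-tail test the one-tail critical value is negative, e.g. −1.645 at α = 0.05.)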
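These critical values are quantiles of the standard normal distribution, so they can be reproduced with `statistics.NormalDist().inv_cdf` (a sketch, rounded to three decimals as in printed tables):

```python
from statistics import NormalDist

std_normal = NormalDist()
critical = {}
for alpha in (0.10, 0.05, 0.01):
    one_tail = std_normal.inv_cdf(1 - alpha)      # right-tail critical value
    two_tail = std_normal.inv_cdf(1 - alpha / 2)  # ± value for a two-tail test
    critical[alpha] = (round(one_tail, 3), round(two_tail, 3))
    print(f"alpha={alpha:.2f}  one-tail: {one_tail:.3f}  two-tail: ±{two_tail:.3f}")
```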
Three types of t-test:
• One-sample t-test
• Independent (unpaired) two-sample t-test: two different groups
• Dependent (correlated/paired) t-test: before and after treatment
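A sketch of how the test statistic is formed for each of the three types, using only the standard library. The function names and data are hypothetical, and the independent-samples version assumes equal variances (pooled standard deviation). Each function returns the t statistic and its degrees of freedom, which would then be compared with a critical value from a t table.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (x̄ - μ0) / (s / √n), with n - 1 degrees of freedom."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1

def independent_t(group1, group2):
    """Pooled-variance two-sample t (equal variances assumed), n1 + n2 - 2 df."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2
                  + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def paired_t(before, after):
    """Paired t: a one-sample t on the differences (after - before), n - 1 df."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

# Hypothetical before/after scores for the same three subjects
print(paired_t([1, 2, 3], [2, 4, 6]))
```

The paired version shows why dependent samples are treated differently: each subject is compared with itself, so the analysis reduces to a one-sample test on the differences.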
When to use T-test?
[Figure: placebo (sugar pill) given to the control group]
• Case 2: α = 0.01, df = 10
• Significance level (α): 0.01
• Degrees of freedom (df): 10
• The upper-tail chi-square critical value for these settings is approximately 23.209.
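The 23.209 figure can be reproduced without tables. As a sketch (Python, standard library only), the upper-tail probability of a chi-square distribution with an even number of degrees of freedom has the closed form e^(−x/2) · Σ (x/2)^i / i! for i = 0 … df/2 − 1, and bisection then finds the value whose upper-tail area equals α. This shortcut only holds for even df; the function names are hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of freedom."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def chi2_critical_even_df(alpha, df, lo=0.0, hi=200.0):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid  # upper tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical_even_df(0.01, 10), 3))
```

For odd degrees of freedom a general routine, such as SciPy's `scipy.stats.chi2.ppf`, would be needed instead.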
IMPORTANT TERMS
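• HYPOTHESIS TESTING
A hypothesis is a statement about one or more parameters of a population or populations; hypothesis testing uses sample data to decide whether to reject it.
• ≠ in the alternative hypothesis: TWO-TAILED TEST
• < or > in the alternative hypothesis (≤ or ≥ may appear in the null): ONE-TAILED TEST
Whether a test is one-tailed or two-tailed always depends on the alternative hypothesis.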
TYPES OF T-TEST
✓ ONE SAMPLE T-TEST
Compares one sample mean to a known population mean.
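Hypotheses: H0: μ = μ0, Ha: μ ≠ μ0 (μ0 = the known population mean)
✓ TWO SAMPLE INDEPENDENT T-TEST
Compares the means of two independent groups.
Examples:
Group 1: Males
Group 2: Females
Research on the effects of Salbutamol vs. Ambroxol treatment in patients of AMCM
Group 1: Salbutamol
Group 2: Ambroxol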
✓ TWO SAMPLE DEPENDENT T-TEST
Makes use of dependent (paired) samples.
The samples are connected because they are paired: the measurements are taken on the same person or thing before and after treatment.
Example: Effect of using social networking sites on Medtech students
Group 1: Before using social networking sites
Group 2: After using social networking sites