Inferential Statistics Lecture


INFERENTIAL STATISTICS

Descriptive vs Inferential
Descriptive statistics
• Is a branch of statistics used to summarize and describe the
characteristics of a dataset.
• Involves calculating summary measures such as the mean,
median, mode, range, standard deviation and variance.

Inferential statistics
• Is a branch of statistics used to make inferences or predictions
about a population based on a sample of data.
• Involves using statistical tests, such as hypothesis tests and
regression analysis.
Introduction
• It is the process of drawing
conclusions about attributes of a
population based upon information
contained in a sample (taken from the
population).

• It is divided into two broad categories:

i. Statistical Estimation and

ii. Hypothesis Testing


Statistical Estimation
• It is the procedure of using a sample statistic to estimate a
population parameter
• It is divided into two:
i. Point estimation (where an estimate of a
population parameter is given by a single
number) and
ii. Interval estimation (where an estimate of a
population parameter is given by a range within which the
parameter may be considered to lie)
• Symbols for sample statistics and population parameters are as
follows.

                        Sample statistic    Population parameter
Arithmetic mean         x̄                   µ
Standard deviation      s                   σ
Number of items         n                   N
Proportion              p                   π
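The difference between a point estimate and an interval estimate can be illustrated with a short sketch. It is not part of the original slides: the sample values are made up, and the interval assumes a normal approximation (x̄ ± z·s/√n), which is only a rough choice for small samples.

```python
import math
from statistics import NormalDist, mean, stdev

def estimate_mean(sample, confidence=0.95):
    """Return (point estimate, (lower, upper)) for the population mean.

    Assumption: a z-based interval x_bar +/- z * s / sqrt(n).
    """
    n = len(sample)
    x_bar = mean(sample)          # point estimate of mu
    s = stdev(sample)             # sample standard deviation
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(n)
    return x_bar, (x_bar - margin, x_bar + margin)

# Hypothetical sample of 8 measurements
point, (low, high) = estimate_mean([52, 48, 50, 55, 47, 51, 49, 53])
print(f"point estimate: {point:.2f}")
print(f"interval estimate: {low:.2f} to {high:.2f}")
```

The single number is the point estimate of µ; the pair of numbers is the interval estimate within which µ may be considered to lie.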
Introduction
• The parametric tests are tests that
require normal distribution, and the
levels of measurement are expressed in
interval or ratio data
• The t-test is used to compare two means,
the means of two independent samples or
two independent groups and the means
of correlated samples before and after
treatment.
• The ANOVA on the other hand is used in
comparing the means of two or more
independent groups
Power Analysis
• Is normally conducted before the data
collection
• The reason for applying power analysis is
that, ideally, the investigator desires a
smaller sample because larger
samples are often costlier than
smaller samples.
Power Analysis
• A critically important aspect of study
design is determining the appropriate
sample size to answer the research
question
• The power of your study is the probability
that you will find a significant difference
or relationship if a difference or
relationship truly exists in the population
• Power analysis is directly related to the
tests for hypothesis and is usually
conducted before the data collection
Cont.
• Power Analysis is used to
determine the smallest sample
size that would ideally give the best
results without exhausting the
resources of the research study
Resources:
✓ Financial capabilities (since not all
research is
considered to be cheap and affordable)
✓ Intellectual and research capacity
✓ Time
Why determine
sample size?
• Ex: A study is proposed to evaluate a
new screening test for Down
syndrome.
✓ For evaluation, pregnant women will be
asked to provide a blood sample and
undergo amniocentesis

• Cost per respondent:
✓ Blood sample test: 12k
✓ Amniocentesis: 45k
✓ Total per respondent: 57k
Is it efficient?
What if it doesn’t work?
Statistical Power
• Used to calculate the minimum sample size needed to
produce a reasonable level of accuracy.

H0 = null hypothesis

• Type II Error (failing to reject a false H0) - indicates that our
study lacks power; in order to increase power, we need to
increase the sample size.
Statistical Power
• The power of your study is the probability
that you will find a significant difference
or relationship if a difference or
relationship truly exists in the population
• To find a significant difference in the
study
TYPE I ERROR
• The best way to increase your study’s
power is to increase your sample size
(power rises as the sample size rises).
• However, resist increasing your sample
size beyond what your power analysis
indicates: an oversized sample wastes
resources and can make trivially small
differences statistically significant.
• The risk of a False Positive, also known
as a Type I error, is set by the
significance level (α), not by the sample size.
TYPE II ERROR
• Even if your null hypothesis is indeed false,
if your study is underpowered, you will not
find significant results. In other words, you
will have False Negative or a Beta, also
known as a Type II error.
• Some factors that will affect the power of
your study include sample size, significance
level (α), effect size, and the type of
statistical analysis you plan to conduct.
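To see how sample size, significance level (α), power and effect size interact, here is a minimal sketch. It is an assumption added for illustration, not a method given in the slides: it uses the common normal-approximation formula for a two-sided one-sample z-test, n = ((z_{α/2} + z_β)·σ / δ)², where δ is the smallest difference worth detecting and σ is the assumed standard deviation.

```python
import math
from statistics import NormalDist

def sample_size_one_mean(effect, sigma, alpha=0.05, power=0.80):
    """Approximate minimum n for a two-sided one-sample z-test.

    effect: smallest difference in means worth detecting (delta)
    sigma:  assumed population standard deviation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off for Type I error
    z_beta = NormalDist().inv_cdf(power)           # cut-off for Type II error
    n = ((z_alpha + z_beta) * sigma / effect) ** 2
    return math.ceil(n)

# Hypothetical inputs: detect a difference of half a standard deviation
print(sample_size_one_mean(effect=0.5, sigma=1.0))               # alpha 0.05, power 0.80
print(sample_size_one_mean(effect=0.5, sigma=1.0, power=0.90))   # more power needs more data
```

Smaller effects, a stricter α or a higher target power all push the required sample size up.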
SUMMARY OF THE POSSIBLE DECISIONS
THAT WE COULD MAKE WITH STATISTICAL
POWER:
• Power analysis is a very tedious task to compute.

• Our goal here is to know that this exists and that it
can help us in our future studies and research
TYPES OF POWER
ANALYSIS
NORMAL DISTRIBUTION
Normal Distribution
• Is a probability distribution which is used
to determine probabilities of
continuous variables
• Examples of continuous variables are
Distances, Times, Weights, Heights,
Capacity, etc.
• Usually continuous variables are those,
which can be measured by using the
appropriate units of measurement.
Characteristics of normal
distribution curve
✓ The total area under the curve is 1,
which is equivalent to the maximum
value of probability
✓ The line of symmetry (at the mean) divides the curve
into two equal halves
✓ The two ends of the normal distribution
curve continuously approach the
horizontal axis but never touch it.
✓ It has only one mode
✓ The values of the mean, mode and
median are all equal
Standard Normal distribution
curve
• This is a probability distribution
used to estimate the probabilities of
all normally distributed variables
• It has a mean of zero and a
standard deviation of 1.
[Figure: standard normal curve centred at Z = 0]
Reading a standard normal
distribution table
• Ex. Given Z=1.22
• Probability of Z > 1.22: This is the area under the curve to
the right of Z = 1.22.
• Using a Z-table
• P(Z > 1.22) = 1 - P(Z < 1.22) ≈ 0.1112

• Ex. Given Z=-2.75


• Probability of Z < -2.75: This is the area under the curve to
the left of Z = -2.75.
• Using a Z-table
• P(Z < -2.75) ≈ 0.0030
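The two table look-ups above can be checked without a printed Z-table. The sketch below is an illustration only; the helper name `phi` is an assumption, and it computes the standard normal cumulative probability from the error function in Python's standard library.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_right = 1.0 - phi(1.22)   # area to the right of Z = 1.22
p_left = phi(-2.75)         # area to the left of Z = -2.75

print(round(p_right, 4))    # 0.1112
print(round(p_left, 4))     # 0.003
```

The same helper covers "between" questions: the probability that Z lies between a and b is `phi(b) - phi(a)`.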
Exercise
• Using the standard normal distribution table obtain the
probability that z
i. Lies between zero and 1.64
ii. Lies between zero and -2.99
iii. Lies between 1.34 and 2.57
iv. Lies between -1.76 and -3.25
v. Lies between -1.39 and 2.49
vi. Lies above 1.27
vii. Lies above -3.07
viii. Lies below 3.20
ix. Lies below -1.44
Standardizing variables
• Normally distributed variables can be transformed into
their standard form using

$$Z = \frac{x - \mu}{\sigma}$$

Where x = Value to be standardized

Z = Standardized value of x

µ = population mean

σ = population standard deviation
• The age of students is normally distributed with a
mean age of 35 years and a standard deviation of 5
years. If a student is randomly picked, find the
probability that the age of the student
i) Lies between 35 and 40

• Standardize both limits using $z = \frac{x - \mu}{\sigma}$:
for x = 35, z = (35 - 35)/5 = 0; for x = 40, z = (40 - 35)/5 = 1
• We need to find the
probability that Z lies
between 0 and 1
• P(0 < Z < 1) ≈ 0.3413
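The worked example can be reproduced in a few lines. This is a sketch using `statistics.NormalDist` from the Python standard library; the function name `prob_between` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def prob_between(low, high, mu, sigma):
    """P(low < X < high) for X ~ Normal(mu, sigma), via standardization."""
    z_low = (low - mu) / sigma     # standardize the lower limit
    z_high = (high - mu) / sigma   # standardize the upper limit
    std_normal = NormalDist()      # mean 0, standard deviation 1
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Ages with mean 35 and standard deviation 5: P(35 < age < 40)
print(round(prob_between(35, 40, mu=35, sigma=5), 4))  # 0.3413
```

The remaining exercise items follow the same pattern; for one-sided questions, use a single `cdf` value or one minus it.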
Exercise
ii) Lies between 30 and 40
iii) Is below 29 yrs
iv)Lies between 25 and 30
v) Lies beyond 45 yrs
vi) Lies beyond 30 yrs
vii) Lies below 25 years
Exercise
• Lies beyond 45 yrs
Assignment
The time taken to finish a statistics exam is normally
distributed with a mean of 130 minutes and standard deviation
of 14.5 minutes.
A. If a student is randomly selected, find the probability that the
student finishes the exam
i. Before 1 hr 30 minutes
ii. After 1hr 30 minutes
iii. Between 120 and 150 minutes
B. If the exam is to take 2hrs 30minutes, find the percentage of
students who don’t complete the exam
Hypothesis Testing
• A hypothesis is an opinion, claim or belief
about a certain issue.
• Hypothesis testing is a scientific method
of checking whether a claim/opinion is
correct or incorrect.
• Broadly classified into two:
i. Parametric tests
ii. Non parametric tests
Parametric test
• This is a test where the parameters
(especially standard deviation) are
known or assumed to be constant.
• The type of distribution is known to be
normal distribution
➢ e.g. the Normal (Z) test
Non parametric test
• Is a test where the parameters are
not known
• The type of distribution is also not
known.
• E.g. chi square test.
• Hypothesis testing involves two hypotheses:
i. The Null hypothesis
ii. The Alternative hypothesis

The Null Hypothesis (H0)


• This refers to the hypothesis to be tested. It is usually
stated in the negative.
• It usually involves a population parameter and not a
sample statistic.
• It usually contains an equal sign and may take ≤, = or ≥.
• It may be rejected or not rejected.
The Alternative Hypothesis
(H1, Ha or HA)
• This is usually the opposite of the null
hypothesis. Therefore, the Null hypothesis
and its alternative are mutually exclusive
i.e. they cannot be both correct at the
same time.
• It is also about a population parameter.
• It is usually what the researcher is
interested in ‘proving’.
• It does not contain an equal sign and may
take >, ≠ or <.
• It may be rejected or not.
Formulating H0 and its Ha
• When formulating the H0 and its
alternative, one may be interested
in two types of tests:
i. Two tails test: In this case one is
concerned with both sides of the
curve. It tests whether the parameter
is not equal to a given value.
The decision rule is that if the absolute value of the
calculated value is more than the critical value,
reject the H0
[Figure: two tails test. Acceptance region in the centre, with a critical region in each tail beyond the lower and upper critical values.]
ii) One tail test: in this case one is
concerned with only one side of the curve.
There are two types here:
a) Right (Upper) tail test: In this case the
researcher is concerned with the greater
than, more than, better than or above a
given value.
The decision rule is that if the calculated value is greater
than the critical value, reject H0

b) Left (Lower) tail test: in this case, one is
concerned with the lesser than, worse than
or below a given value.
The decision rule is that if the calculated value is
less than the critical value, reject H0
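Right tail test
[Figure: acceptance region to the left, critical region in the right tail beyond the critical value.]
Left Tail test
[Figure: acceptance region to the right, critical region in the left tail below the critical value.]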
E.g.
• The School of Allied Health claims that
the average performance of students in the
national exam is 48%. Formulate the null
hypothesis and its alternative.
Two tails test
H0: μ=48%
Ha: μ≠48%
[Figure: two tails test. Acceptance region in the centre, critical regions beyond the lower and upper critical values.]
One tail test
• Right tail test
H0: μ=48%
Ha: μ>48%
• Left tail test
H0: μ=48%
Ha: μ<48%
Result of Hypothesis
testing
• There are four possible results in
hypothesis testing:
i. Accepting a true null hypothesis. This is a
correct decision.
ii. Rejecting a false null hypothesis. This is a
correct decision.
iii. Rejecting a true null hypothesis. This is a
wrong decision and is known as Type I (or
type α or A) error .
iv. Accepting a false null hypothesis. This is a
wrong decision and is known as Type II (or type
β or B) Error.
Acceptance and rejection
regions
• Acceptance region is the area within
which the null hypothesis is
accepted (not rejected).
➢ It depends on the confidence level, which is
usually expressed as a percentage, e.g. 99%,
95% and 90%.
• Rejection region is the area in which, if the
test statistic falls, the null hypothesis is rejected.
➢ It usually depends on the level of significance (α),
e.g. 0.01, 0.05 and 0.1
• The acceptance region and the rejection region
are separated by critical values that are
obtained from statistical tables.
• The critical values depend on:
i. The type of distribution
ii. The sample size
iii. The level of significance (α) (=1-level of confidence)
iv. The degrees of freedom (d.f)
v. Whether dealing with a one tail or two tails test
Most common critical
values for Z tests
Critical Values (for normal distribution)

Level of         Two tailed    One tailed test    One tailed test
significance     test          (upper tail)       (lower tail)
10%              ±1.65         +1.28              -1.28
5%               ±1.96         +1.65              -1.65
1%               ±2.58         +2.33              -2.33
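The tabulated Z critical values can be regenerated from the inverse of the standard normal distribution. The sketch below is illustrative; `z_critical` is a hypothetical helper name. Note that the exact values (for example 1.645) are conventionally rounded to two decimals in printed tables (1.65).

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Positive critical value of Z for a given significance level.

    tails=2 splits alpha across both tails; tails=1 puts it all in one tail.
    For a lower-tail test, use the negative of the returned value.
    """
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.01):
    two = z_critical(alpha, tails=2)
    one = z_critical(alpha, tails=1)
    print(f"alpha={alpha:.2f}  two-tailed=±{two:.3f}  one-tailed=±{one:.3f}")
```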


T-TEST
• Is a type of inferential statistic used to
determine if there is a significant difference
between 2 means.

Three types:
• One Sample t-test
• Two Independent/Unpaired t-test - 2 different
groups
• Two Correlated/Dependent/Paired t-test - before
and after treatment
When to use T-test?
Placebo: Control
Group (sugar pill)

The control group is the
group given a
placebo, or sugar pill.

Hence, the pill has no
effect on the control
group. This group is the BASIS
or
COMPARATOR.
When to use T-test?
Experimental group -
this is the group that
receives the
actual drug being
tested in the
experiment.
• The control group may
show 13 out of 20
recoveries.
• While the group
taking the new drug has
16 out of 20
recoveries.
Finding critical values
from the t-table
• The Student t-distribution table is used when
the sample size is less than 30 (n<30)
• It is also used if the population standard deviation is not
known.
• The critical values from a t-table depend on
a) Level of significance (α)
b) Number of sampled items
c) Degrees of freedom (d.f.).
NOTE: d.f. = n - 1 for a one-sample test.
d.f. = n1 + n2 - 2 for a two-sample test (the combined sample size minus 2)
d) Whether it is a two tails test or a one tail test.
Ex.
• Find the critical values from a t
table given the following.
i. Confidence level = 95% and two
tails test
ii. α = 0.05 and right tail test
iii. level of confidence = 90% and left
tail test
Ex.
• Let's assume we have a sample size
of 20. So, df = 20 - 1 = 19.
▪ Two-tailed test with 95% confidence level (α =
0.05)
➢ We need to find the t-value that corresponds to α/2 = 0.025
and df = 19.
➢ Looking at a t-table, we find the critical values to be
approximately ±2.093.
Ex.
• Let's assume we have a sample size of
20. So, df = 20 - 1 = 19.
▪ Right-tailed test with α = 0.05:

➢ Find the t-value that corresponds to α = 0.05 and df = 19.


➢ Looking at a t-table, we find the critical value to be
approximately 1.729.

▪ Left-tailed test with 90% confidence level (α = 0.10):

➢ Find the t-value that corresponds to α = 0.10 and df = 19.
➢ Looking at a t-table, we find the critical value to be
approximately -1.328. (Note the negative sign, indicating the
left tail.)
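When no t-table is at hand, the critical values can be approximated numerically. The following is a rough sketch using only the Python standard library, not a replacement for a statistics package: it integrates the Student t density with Simpson's rule and finds the cut-off by bisection. The function names and the search range (0 to 100) are assumptions made for this illustration.

```python
import math

def t_cdf(x, df, steps=1000):
    """P(T < x) for Student's t, by Simpson's rule on [0, |x|] plus symmetry."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, tails=1):
    """Positive critical value; negate it for a left-tailed test."""
    target = 1 - alpha / tails
    low, high = 0.0, 100.0             # assumed search range
    for _ in range(50):                # bisection
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

df = 20 - 1
print(round(t_critical(0.05, df, tails=2), 3))   # two-tailed, alpha = 0.05
print(round(t_critical(0.05, df, tails=1), 3))   # right-tailed, alpha = 0.05
print(round(-t_critical(0.10, df, tails=1), 3))  # left-tailed, alpha = 0.10
```

The printed values should agree with a t-table for df = 19 to about three decimal places.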
Finding critical values
from a chi square table
• Chi square test is a non parametric test
that compares the observed frequency
with the expected frequency.
• The critical values from a χ2 depend on:
i. Level of significance
ii. Degrees of freedom
df = (number of rows - 1) x (number of
columns - 1)
Ex.
• Find the critical values from chi square
table given the following
i. Level of confidence =90% d.f.=5
ii. α=0.01, d.f.= 10
Level of confidence =90%
d.f.=5
• Case 1: 90% Confidence Level, df = 5
➢ Significance Level (α): 0.10
➢ Degrees of Freedom (df): 5
➢ The critical value is approximately 9.236.

• Case 2: α = 0.01, df = 10
➢ Significance Level (α): 0.01
➢ Degrees of Freedom (df): 10
➢ The critical value is approximately 23.209.
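The two chi square critical values can be cross-checked numerically. The sketch below is an illustration under stated assumptions, using only the Python standard library: the chi square cumulative probability is computed from the series expansion of the regularized incomplete gamma function, and the critical value is found by bisection over an assumed range of 0 to 200.

```python
import math

def chi2_cdf(x, df):
    """P(X < x) for a chi square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a                 # first term of the series
    total = term
    n = 1
    while term > 1e-15 * total:    # add terms until they stop mattering
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value with upper-tail area alpha (the usual chi square table entry)."""
    low, high = 0.0, 200.0         # assumed search range
    for _ in range(60):            # bisection
        mid = (low + high) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(chi2_critical(0.10, 5), 3))    # Case 1
print(round(chi2_critical(0.01, 10), 3))   # Case 2
```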
IMPORTANT TERMS
• HYPOTHESIS
➢ Used to mean a statement about one or more
parameters of a population or populations.

• PARAMETRIC TEST – involves two or more
hypotheses
• Testing the validity of one of these
statements through a statistical test is what
we call hypothesis testing.
HYPOTHESIS TESTING
• H0

• NULL HYPOTHESIS - Refers to the statement about
the absence of any effect claimed for a certain action.
Asserts the absence of difference between the observed
and the expected values.

➢ It negates the effect or difference that you expect
to find in a certain study/research
➢ Null: invalidity; equating something to the value of 0.
HYPOTHESIS TESTING
• Ha or H1
• ALTERNATIVE HYPOTHESIS - Refers to the assertion
contradicting the null hypothesis
➢ It is the companion of the null hypothesis; the two are always stated together

• If the null hypothesis is TRUE, the alternative hypothesis is
FALSE, and vice versa
H0: μ=μo ; Ha: μ ≠ μo
• This shows that your null hypothesis and alternative
hypothesis should always make contradicting statements.
• Notation for hypotheses in a t-test (a parametric test):
➢ Mean of each group: μ (mu)
➢ Null hypothesis: H0
➢ Alternative hypothesis: Ha / H1
➢ ≠ : unequal
Level of Significance
• The probability of rejecting the null hypothesis when it is true.
Usually represented by α (alpha).

• Commonly used values range from 0.01 to 0.10
(particularly 0.05 = 5%).
➢ 0.01 = 1%
➢ 0.10 = 10%

• A lower α means a lower chance of rejecting a true null hypothesis, which
is better because it makes a significant result
more credible.

At α = 5%, if H0 is true: 5% chance = REJECT H0 | 95% chance = DO NOT REJECT H0
Confidence Level
• Often confused with the Confidence Interval: the confidence level is the
probability (e.g. 95%), while the confidence interval is the range of values built at that level
• Complement of the Level of Significance (1 − α)
➢ If 5% LOS; 95% Confidence level
• Probability of not rejecting the null hypothesis when it is true
• There should always be a bigger chance of not
rejecting a true null hypothesis, because a significant
result is then more credible
Critical Value
• Tabular value – shows the area of rejection and the area of
acceptance

• Basis of accepting or rejecting the null hypothesis

2.575 as the critical value


Critical Value
• Assume that the critical value for a two-tailed test is 2.575

• If the computed value is higher than +2.575 or lower than -2.575, it falls in
the rejection region and we reject the null hypothesis.

• On the other hand, if the computed value lies between -2.575 and +2.575, we
do not reject the null hypothesis.

2.575 as the critical value


Critical Value
• If your t-test result lies between -2.575 and +2.575, that is, within the
critical values, then we do not reject the null
hypothesis. Hence, the result is not statistically significant.

• On the other hand, if it is greater than +2.575 or less than -2.575,
then you will reject the null hypothesis.

2.575 as the critical value


ONE or TWO TAILED
TEST
• Determined by the Alternative Hypothesis

• Two-tailed Test: Both tails of the diagram are used; the alternative
hypothesis states only that the parameter is not equal to the given value, so it
may be greater than or less than it

≠ (TWO-TAILED TEST)

• One-tailed Test: Happens if only one tail of the graph is
used

< or > (ONE-TAILED TEST)
Whether your test is one-tailed or two-tailed always depends on
the alternative hypothesis.
TYPES OF T-TEST
✓ ONE SAMPLE T-TEST
➢ Compares one sample mean to a known population mean.

Example: Comparing the average height of a sample of
students to the known average height of the general
population

Hypotheses:

• Null Hypothesis (H₀): μ = μ₀ (The mean of the population the sample comes from is equal
to the known population mean.)

• Alternative Hypothesis (H₁): μ ≠ μ₀ (That mean is
different from the known population mean.)
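A one-sample t statistic is computed as t = (x̄ − μ₀) / (s / √n) with n − 1 degrees of freedom. The sketch below illustrates this with made-up heights and an assumed population mean of 163 cm; neither number comes from the slides. The critical value 2.571 is the two-tailed t-table entry for α = 0.05 and df = 5.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1

# Hypothetical heights (cm) of six students
heights = [160, 165, 170, 158, 172, 168]
t, df = one_sample_t(heights, mu0=163)

critical = 2.571  # t-table, two-tailed, alpha = 0.05, df = 5
decision = "reject H0" if abs(t) > critical else "do not reject H0"
print(f"t = {t:.3f}, df = {df}: {decision}")
```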
TYPES OF T-TEST
✓ INDEPENDENT SAMPLE T-TEST
➢ Compares the means of two independent samples

Example: Research conducted on the effectiveness of a new
workout routine in obese patients.

• Group 1: Obese patients with new workout (S)

• Group 2: Obese patients doing old workout (P)

• Null Hypothesis (H₀): μS = μP (There is no significant
difference in mean weight loss between the two groups.)

• Alternative Hypothesis (H₁): μS > μP (The mean weight
loss of the new workout group is significantly greater than
the old workout group.)
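For two independent groups, one common form of the test statistic pools the two sample variances. The sketch below assumes equal variances in both groups (the pooled-variance version, chosen here for illustration) and uses made-up weight-loss figures.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom (n1 + n2 - 2).

    Assumption: both groups share the same population variance.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df

# Hypothetical weight loss (kg): new workout (S) vs old workout (P)
new_workout = [4.1, 5.0, 3.8, 4.6, 5.2]
old_workout = [3.0, 3.9, 2.8, 3.5, 3.3]
t, df = independent_t(new_workout, old_workout)
print(f"t = {t:.3f}, df = {df}")
```

For the right-tailed alternative μS > μP, H0 is rejected when t exceeds the one-tailed critical value for the chosen α and df.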
TYPES OF T-TEST
✓ TWO SAMPLE INDEPENDENT T-TEST
➢ Makes use of unpaired samples.
➢ Most common form of t-test
➢ It helps you to compare the means of two sets of data.
➢ Random pick from a different type of population
Examples: Prevalence of Alzheimer’s in Males and Females
of Baguio City, Benguet.
▪ Group 1: Males
▪ Group 2: Females
Research on the Effects of Salbutamol vs. Ambroxol
treatment in patients of AMCM
▪ Group 1: Salbutamol
▪ Group 2: Ambroxol
TYPES OF T-TEST
✓ TWO SAMPLE DEPENDENT T-TEST
➢ Makes use of dependent samples or paired samples.
➢ The samples are connected because they are paired:
they are tests done on the same person or thing, before
and after treatment.
Examples: Effect of using Social Networking Site on
Medtech Students
▪ Group 1: Before social network
▪ Group 2: After social network

• Stress difference of students before and after
BioStatistics Exam
▪ Group 1: Before BioStat exam
▪ Group 2: After BioStat exam
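For paired samples, the test works on the difference within each pair: t = d̄ / (s_d / √n), with n − 1 degrees of freedom. The sketch below uses made-up stress scores for the same five students before and after an exam; the scores and the function name are assumptions for illustration.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and degrees of freedom from before/after scores."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical stress scores for the same five students
before_exam = [62, 70, 58, 66, 73]
after_exam = [55, 64, 57, 60, 65]
t, df = paired_t(before_exam, after_exam)
print(f"t = {t:.3f}, df = {df}")
```

A negative t here means the scores went down after the exam, because the differences are taken as after minus before.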
ANOVA (Analysis of
Variance) – F-test
• Uses the F-distribution
• Logical extension of the t-test
• When your research involves one or two variables,
each with multiple categories, ANOVA (F-test) is
the statistical tool for testing your hypotheses.
TYPES OF ANOVA
• ONE WAY ANOVA
➢ Only one variable is being tested
➢ A one way ANOVA is used to compare two or more
means with the same variable or category.

• The groups being compared must be independent
(unrelated), and the comparison uses the
F-distribution.
Ex.
1. What is the relationship between the number of hours
worked (per week) and health, as measured by BMI,
among employees working in AMCM?
➢ POSSIBLE GROUPS: BMI Result - Underweight, Obese,
Normal, etc.

2. A group of psychiatric patients are trying three
different therapies: counseling, medication and
biofeedback. You want to see if one therapy is better
than the others.

➢ POSSIBLE GROUPS: Counseling, Medication, Biofeedback
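For the therapy example, the one way ANOVA F statistic is the ratio of the variation between group means to the variation within groups. The sketch below is illustrative only; the improvement scores are invented and the function name is an assumption.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical improvement scores for three therapies
counseling = [5, 7, 6, 8]
medication = [9, 8, 10, 9]
biofeedback = [6, 5, 7, 6]
f_stat, df1, df2 = one_way_anova_f([counseling, medication, biofeedback])
print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

H0 (all group means are equal) is rejected when F exceeds the F-table critical value for the chosen α and these degrees of freedom.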


TYPES OF ANOVA
• TWO WAY ANOVA
➢ Two variables (factors) are being tested at the same time
➢ With a Two Way ANOVA, there are two independent variables
(categories or factors).
➢ If your experiment has a quantitative outcome and you have
two categorical explanatory variables, a two way ANOVA is
appropriate.
Is there an interaction between income and gender for
anxiety level at job interviews?
➢ POSSIBLE GROUPS: Gender - Male, Female,
Trans; Income - Middle, Rich, Poor

Forty-five AUF MT students were randomly assigned to
one of three instructors and to one of three methods of
teaching. Achievement was measured on a test
administered at the end of the term.
➢ POSSIBLE GROUPS: Instructors - 1, 2, 3; Methods -
Verbal, Visual, Kinesthetic, etc.
