Biostatistics M1-1


Dr. Nguyen Thi Van Anh


PMAB department
Content
• Introduction: parametric and non-parametric tests

• Check normality of data

• Independent t-test

• Paired t-test

• Pearson and Spearman correlation test


Introduction

Descriptive statistics
• Collecting
• Summarizing
• Presenting data

Inferential statistics
• Analyzing data
• Conclusion about the population


How can we use statistics to extrapolate from a sample to the population?

Gaussian (normal) distribution (bell-shaped) → Parametric tests
Ex: t-test, ANOVA test

Non-gaussian (non-normal) distribution → Non-parametric tests
Ex: Mann-Whitney test, Wilcoxon test
Introduction
Parametric tests
• Assume normal distribution
• Handle interval/ratio data
• Results can be significantly affected by outliers
• More statistical power

Non-parametric tests
• No assumed normal distribution
• Handle ordinal/nominal data
• Results are not significantly affected by outliers
• Less statistical power
Measurement and measurement scales
Measurement scale: 4 types of scale, ordered so that each later scale
has all the properties of the earlier scales, plus new properties.

• Nominal scale: uses numbers just to distinguish categories

• Ordinal scale: observations can be ranked

• Interval scale: distance (not ratio) between 2 measurements is known

• Ratio scale: equality of ratios and intervals may be determined


Examples of measurement scales

• Nominal scale:

Gender: male = 1, female = 2

“Scale” simply labels objects


Examples of measurement scales
• Ordinal scale
Numbers are used to place subjects in order
But there is no information on the differences (intervals) between categories
Examples of measurement scales

• Interval scale:
Temperature: Celsius or Fahrenheit

Same difference between 5°C (41°F) and 15°C (59°F) as between 20°C (68°F) and 30°C (86°F)

• The interval difference is meaningful
• BUT, we cannot defend ratio relationships
• Zero point does not mean a true zero or absence of quantity

0°C does not mean lack of heat

We can’t say: 80°C is twice as hot as 40°C
Examples of measurement scales

Ratio scale
Height, weight, length, time…
We can say “100 kg is double 50 kg”
0 is meaningful (“absence of characteristics”)

• Have a true zero
• Ratio is meaningful
Variable
[Figure: types of variable; legible labels: "(Numerical)" and "(Quanlitative)"]
Parametric and Non-parametric test

Parametric tests: make assumptions about parameters of the population
Non-parametric tests: make no assumptions

Check the normality of data


Normal distribution

The distribution gives information about

• a typical value (a center) around which data are spread
• the variability of values (the spread of the distribution)
• the shape of the distribution (whether it is symmetric or skewed, …)
Normal distribution
Symmetric distribution: right half is a mirror image of left half
Asymmetric distribution = skewed distribution
Normal distribution
Check the normality
• Graphical tests

• Analysis of Skewness and Kurtosis

• Statistical tests

• Transformation
Check the normality – Graphical tests
Histogram (a simple check):
Check the normality – Graphical tests
Histogram – smaller samples:
Check the normality – Graphical tests
QQ plot: scatter diagram
Check the normality – Graphical tests
Box plot:
Check the normality – Skewness and Kurtosis test

• P < 0.05: non-normal
• P > 0.05: normal

Compare your data with a known distribution (normal distribution)
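
As a rough illustration of the quantities behind this check, the sketch below computes moment-based skewness and excess kurtosis with plain Python. The function name `sample_moments` and both data sets are made up for illustration. It only computes the two descriptive statistics (both are close to 0 for normal data); it does not compute the P value, which a statistics package would report.

```python
def sample_moments(data):
    """Return (skewness, excess kurtosis) from the central moments of data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # a normal distribution gives 0
    return skewness, excess_kurtosis


# Hypothetical data: one symmetric sample and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

skew_sym, kurt_sym = sample_moments(symmetric)
skew_right, kurt_right = sample_moments(right_skewed)

print("symmetric:   ", skew_sym, kurt_sym)
print("right-skewed:", skew_right, kurt_right)
```

A skewness near 0 suggests symmetry, and a clearly positive value suggests a right tail.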


Check the normality – Transformation

• When a sample is not normally distributed

→ Transform data → transformed data is more normal

• Ex: f(x) = log(x)
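
The effect of f(x) = log(x) can be sketched on a small right-skewed sample. The data below are hypothetical, and the helper `skewness` is an illustrative moment-based measure, not a formal normality test. The natural logarithm is assumed, and it only applies to strictly positive values.

```python
import math


def skewness(data):
    """Moment-based skewness: about 0 for symmetric data, positive for a right tail."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (all strictly positive).
raw = [1, 2, 3, 5, 8, 13, 40, 100]
logged = [math.log(x) for x in raw]  # f(x) = log(x)

skew_raw = skewness(raw)
skew_log = skewness(logged)

print("skewness before transform:", round(skew_raw, 2))
print("skewness after transform: ", round(skew_log, 2))
```

For this sample the skewness moves much closer to 0 after the transformation, which is what "more normal" means here.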




T-tests
• Independent sample t-test
• Paired t-test
The null – alternative hypothesis
• Null hypothesis (H0):
- The statement we try to find evidence against
- P-value < α (significance level) → Reject H0
- P-value > α → Fail to reject H0
The null – alternative hypothesis
• Alternative hypothesis (HA):
- The statement we try to demonstrate
- Reject H0 → accept HA
- Fail to reject H0 → do not accept HA
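
The decision rule above can be written as a tiny function. This is only a sketch: the name `decide` is hypothetical, and treating a P-value exactly equal to α as "fail to reject" is an assumption, since the slides only cover the strict inequalities.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: P-value < alpha -> reject H0, otherwise fail to reject H0."""
    if p_value < alpha:
        return "reject H0 (accept HA)"
    # Assumption: p_value == alpha is treated as "fail to reject".
    return "fail to reject H0 (do not accept HA)"


print(decide(0.01))  # P-value below the 0.05 significance level
print(decide(0.20))  # P-value above the 0.05 significance level
```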
T-test
• Comparing two population means

• Examples:
2 methods to purify water: Method A: use a carbon filter
Method B: use a certain enzyme
µ1: mean bacteria count (method A)
µ2: mean bacteria count (method B)
We want to check if µ1 > µ2 → µ1 − µ2 > 0
T-test

Are the data independent samples or dependent samples?


Independent/dependent samples
Are the data independent samples or dependent samples?

Independent samples:
- The sample collected from one population has no relationship with
the sample collected from the other population

Dependent samples:
- Each measurement in 1 sample is matched or paired with
a particular measurement in the other sample
Independent/dependent samples
Example: Compare whether people give a higher taste rating to
Coke or Pepsi. To avoid psychological effects, people should
taste the drink blind.
A. Randomly assign half of the subjects to taste Coke and the
other half to taste Pepsi
B. All subjects taste both Coke and Pepsi. The drinks should be
given in a random order.
Independent sample t-test
• Evaluate whether 2 means from 2 samples of the same
dependent variable are significantly different from one another

• Used only with interval or ratio data, not with nominal or ordinal data

• Used when the data are reasonably normally distributed


Independent sample t - test
• Null hypothesis H0: “There is no significant difference between
the two groups in terms of the dependent variable”

• Alternative hypothesis HA: “There is a significant difference
between the two groups in terms of the dependent variable”
Independent sample t - test
Ex: Same dependent variable – baby birth weight

Independent variable – 2 groups of expectant mothers

- Group 1: drink < 2 bottles of water per day

- Group 2: drink > 2 bottles of water per day


Independent sample t - test
Ex: H0: No significant difference in baby birth weight
between the two groups of mothers

HA: There is a significant difference in baby birth weight
between the two groups of mothers

[Figure labels: "Daily water drinking", "Expectant group"]
Independent sample t - test
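The formula for the independent samples t-test

t-value = (x̄1 − x̄2) / SE, where SE = Sp · √(1/N1 + 1/N2)
(x̄1, x̄2: sample means; Sp: pooled standard deviation)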
Independent sample t - test

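[Figure: two sample means, 6 and 10]

t-value = (difference between the two means) / (standard error of the difference)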
Independent sample t - test

df = N1 + N2 -2
Independent sample t - test

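Sp = √[((N1 − 1)·s1² + (N2 − 1)·s2²) / (N1 + N2 − 2)]
(pooled standard deviation; s1², s2² are the two sample variances)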
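
A minimal sketch of the pooled standard deviation and the resulting t-value, assuming the usual equal-variance (pooled) form of the independent samples t-test. The function names and the two small samples are hypothetical.

```python
import math
import statistics


def pooled_sd(sample1, sample2):
    """Pooled standard deviation Sp and degrees of freedom df = N1 + N2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance (divides by N - 1)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    return sp, df


def t_value(sample1, sample2):
    """t = (mean1 - mean2) / (Sp * sqrt(1/N1 + 1/N2))."""
    sp, df = pooled_sd(sample1, sample2)
    se = sp * math.sqrt(1 / len(sample1) + 1 / len(sample2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, df


# Hypothetical measurements for two independent groups.
group_1 = [10.0, 12.0, 11.0, 13.0, 14.0]
group_2 = [8.0, 9.0, 7.0, 10.0, 6.0]

sp, df = pooled_sd(group_1, group_2)
t, _ = t_value(group_1, group_2)
print("Sp =", round(sp, 3), "df =", df, "t =", round(t, 3))
```

The t-value is then compared with the critical t value for the same df.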
Independent sample t - test

Determine critical t value:

Degrees of freedom → standard t value used to determine whether the
difference between the 2 means is significant
Independent sample t - test
(1 − α)100% confidence interval for the difference between
two means:
(x̄1 − x̄2) ± t(α/2, df) · Sp · √(1/N1 + 1/N2)
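
For the pooled-variance case, such an interval can be sketched as the difference in means plus or minus a critical t value times the standard error. The function name `mean_difference_ci` and the summary numbers are hypothetical. The critical value is passed in by hand: 2.306 is the two-sided 95% table value for df = 8.

```python
import math


def mean_difference_ci(mean1, mean2, sp, n1, n2, t_critical):
    """Interval (mean1 - mean2) +/- t_critical * Sp * sqrt(1/N1 + 1/N2)."""
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    margin = t_critical * se
    return diff - margin, diff + margin


# Hypothetical summary statistics: two groups of 5, pooled variance 2.5.
low, high = mean_difference_ci(
    mean1=12.0, mean2=8.0, sp=math.sqrt(2.5), n1=5, n2=5,
    t_critical=2.306,  # t table value for alpha/2 = 0.025, df = 8
)
print("95% CI for the difference:", round(low, 3), "to", round(high, 3))
```

If the interval does not contain 0, the two means differ at the corresponding significance level.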
Independent sample t - test
• Let’s begin with Computation 1
Specific version

Simplified version

Need to calculate: sample size, variance


Independent sample t - test
H0: µ1 − µ2 ≤ 0; HA: µ1 − µ2 > 0
If N1 = N2 = 30, variance = 2.0, mean 1 = 10, mean 2 = 6
→ SE = √(2.0 × (1/30 + 1/30)) = 0.365
→ t = (10 − 6) / 0.365 = 10.95
Look for the t-critical value t(α) at df = N1 + N2 − 2
• t > t-critical: reject H0
• t < t-critical: fail to reject H0

α = 0.05, df = 58 → t-critical value = 1.671
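
The arithmetic of this example can be checked with a few lines of Python. The sketch assumes the stated variance of 2.0 is the common (pooled) variance of both groups. Under that assumption the standard error comes to about 0.365, which is the value consistent with t = 10.95. The critical value 1.671 is copied from the slide, not computed.

```python
import math

n1 = n2 = 30
pooled_variance = 2.0  # assumed to be shared by both groups
mean1, mean2 = 10.0, 6.0

se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))  # standard error of the difference
t = (mean1 - mean2) / se
df = n1 + n2 - 2

t_critical = 1.671  # one-tailed, alpha = 0.05, taken from the slide
reject_h0 = t > t_critical

print("SE =", round(se, 3))
print("t =", round(t, 2), "df =", df)
print("reject H0:", reject_h0)
```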


Independent sample t - test
t = 10.95 > 1.671 → reject H0

The mean weight of babies whose mothers drink less than 2 bottles
of water per day is statistically significantly greater than the mean
weight of babies whose mothers drink more than 2 bottles of water
per day
Independent sample t - test
• Same variance

Subtracting distributions with similar variances yields more stable results

Variances must be tested for similarity

• Different variance

Subtracting distributions with different variances yields less stable results


Independent sample t - test
Exercise: In a packing plant, a machine packs cartons with jars. It is
supposed that a new machine will pack faster on the average than
the machine currently used. To test that hypothesis, the times it
takes each machine to pack ten cartons are recorded.

Can we conclude that the new machine packs faster? (α = 0.05)
Independent sample t - test

Assumption 1: Independent samples? Yes

Assumption 2: Normal population? n1, n2 < 30


Independent sample t - test

Assumption 3: Same variances? Yes

Rule of thumb: if s1/s2 is between 0.5 and 2, assume the same variance
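
The rule of thumb can be sketched as a ratio of sample standard deviations. The function name `variances_look_equal` is hypothetical, the bounds are treated as inclusive (an assumption), and the packing times below are invented for illustration because the exercise's data table is not reproduced here.

```python
import statistics


def variances_look_equal(sample1, sample2, low=0.5, high=2.0):
    """Return (verdict, ratio), where ratio = s1 / s2 and verdict is low <= ratio <= high."""
    ratio = statistics.stdev(sample1) / statistics.stdev(sample2)
    return low <= ratio <= high, ratio


# Hypothetical packing times in seconds (not the exercise data).
new_machine = [42.1, 41.0, 41.3, 41.8, 42.4]
current_machine = [42.7, 43.6, 43.8, 43.3, 42.5]

similar, ratio = variances_look_equal(new_machine, current_machine)
print("s1/s2 =", round(ratio, 2), "-> same variance assumed:", similar)

# A deliberately mismatched pair: the spreads differ by a factor of 10.
mismatched, bad_ratio = variances_look_equal([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print("s1/s2 =", round(bad_ratio, 2), "-> same variance assumed:", mismatched)
```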


Independent sample t - test

Example: Calculate the 99% CI for the difference between the mean time
it takes the new machine to pack 10 cartons and the mean time it takes
the current machine to pack 10 cartons.
Independent sample t - test
df = 18

Interpret the results:
