T-Tests, ANOVAs & Regression, and Their Application to the Statistical Analysis of Neuroimaging
OVERVIEW
• Basics, populations and samples
• T-tests
• ANOVA
• Beware!
• Summary Part 1
• Part 2
Basics
• Hypotheses
  – H0 = null hypothesis
  – H1 = experimental/research hypothesis
• Descriptive vs inferential statistics
• (Gaussian) distributions
Sample
(of a population)
t-tests and distributions
[Figure: two sample means with 95% CI error bars]
t = (x̄₁ − x̄₂) / s_(x̄₁−x̄₂)
Reporting convention: t = 11.456, df = 9, p < 0.001
Formula cont.
t = (x̄₁ − x̄₂) / s_(x̄₁−x̄₂)
where the standard error of the difference is
s_(x̄₁−x̄₂) = √(s₁²/n₁ + s₂²/n₂)
[Figure: distributions of Cond. 1 and Cond. 2]
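The t statistic above can be computed by hand in a few lines. A minimal sketch in plain Python, using the slide's standard-error formula and made-up condition values:

```python
import math

def independent_t(x, y):
    """Independent-samples t, following the slide's formula:
    t = (mean1 - mean2) / s, with s = sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(x), len(y)
    m1 = sum(x) / n1
    m2 = sum(y) / n2
    # sample variances (divide by n - 1)
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)  # standard error of the difference
    return (m1 - m2) / se

# hypothetical data for two conditions
cond1 = [10.1, 11.2, 9.8, 10.5, 10.9]
cond2 = [8.0, 8.4, 7.9, 8.8, 8.2]
t = independent_t(cond1, cond2)
```

In practice one would use a statistics library that also returns the p-value; the point here is only that the formula is a ratio of the mean difference to its standard error.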
Types of t-tests
• Independent samples
• Related samples (also called dependent-means test)
[Figure: two-tailed test with 2.5% in each tail vs one-tailed test with 5% in one tail; a one-sample t-test compares a sample mean with a known value]
Comparison of more than 2 samples
"Tell me the difference between these groups…" "Thank God I have ANOVA!"
ANOVA in VWFA (2x2)
• Is activation in VWFA a) different for naming and reading and b) influenced by age, and if so (a + b) how so?
• H1 & H0
• H2 & H0
• H3 & H0
[Figure: bar chart with 95% CI error bars; conditions comp and infer, left hemisphere vs right hemisphere, old group]
ANOVA
• ANalysis Of VAriance (ANOVA)
  – Still compares the differences in means between groups, but it uses the variance of the data to "decide" whether the means are different
• F-statistic
  – Magnitude of the difference between the different conditions
  – The p-value associated with F is the probability that the differences between groups could occur by chance if the null hypothesis is true
  – Need for post-hoc testing (ANOVA can tell you that there is an effect but not where)
Reporting convention: F = 65.58, df = (4, 45), p < .001
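The F statistic is the ratio of between-group to within-group variance. A minimal one-way ANOVA sketch in plain Python, with illustrative numbers:

```python
def one_way_anova_F(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square."""
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total observations
    grand = sum(sum(g) for g in groups) / n   # grand mean
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

# three hypothetical groups of scores
F, df_b, df_w = one_way_anova_F([[1, 2, 3], [2, 3, 4], [7, 8, 9]])
```

A large F says the group means differ more than the within-group noise would predict, but (as the slide notes) post-hoc tests are still needed to say which groups differ.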
Types of ANOVAs
• 2-way ANOVA for independent groups: different participants in Condition I and Condition II
• Repeated-measures ANOVA: the same participants in Condition I and Condition II
• Mixed ANOVA: a mixture of both
NOTE: You may have more than 2 levels in each condition/task
BEWARE!
• Errors
– Type I: false positives
– Type II: false negatives
Correlation
- How linear is the relationship between two variables? (descriptive)
Regression
- How well does a linear model explain my data? (inferential)
Correlation:
- How much does the value of one variable depend on the value of the other?
Covariance
cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n
sign of covariance = sign of correlation
[Figure: three scatterplots]
Positive correlation: cov > 0. Negative correlation: cov < 0. No correlation: cov ≈ 0.
How to describe correlation (2):
r_xy = cov(x, y) / (s_x · s_y)   (s = standard deviation of the sample)
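A small sketch of r computed exactly as above, the covariance divided by the product of the standard deviations (plain Python, population forms throughout so the n's cancel):

```python
import math

def pearson_r(x, y):
    """r = cov(x, y) / (s_x * s_y)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (s_x * s_y)
```

Perfectly linear data gives r = 1 (or −1 for a negative slope), matching the sign rule above: the sign of the covariance is the sign of the correlation.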
Problems:
- It is sensitive to outliers
[Figure: scatterplot with a single outlier at (25, 7.498)]
Preliminaries:
Linear dependence between 2 variables
Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other.
y = mx + n   (1)
m = (y₂ − y₁) / (x₂ − x₁)
[Figure: straight line through (x₁, y₁) and (x₂, y₂); m = slope, n = intercept]
Examples: m = energy needed to boil one litre of water, n = 0;
m = price of one coffeepot, n = fixed tax/commission to add
Fitting data to a straight line (or vice versa):
Here, ŷ = ax + b
– ŷ: predicted value of y
– a: slope of regression line
– b: intercept
[Figure: scatterplot with fitted line ŷ = ax + b; for each point, ŷᵢ = predicted, yᵢ = observed, εᵢ = residual]
Residual error (εᵢ): difference between obtained and predicted values of y (i.e. yᵢ − ŷᵢ)
The best-fit line (values of b and a) is the one that minimises the sum of squared errors (SSerror) = Σ(yᵢ − ŷᵢ)²
Adjusting the straight line to the data:
• Minimise Σ(yᵢ − ŷᵢ)², which is Σ(yᵢ − (axᵢ + b))²
• The minimum SSerror is at the bottom of the curve, where the gradient is zero
  – and this can be found with calculus:
a = r·s_y / s_x   b = ȳ − a·x̄
• This calculation can always be done, whatever the data!
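The closed-form solution can be sketched directly. Note that a = r·s_y/s_x simplifies to cov(x, y)/s_x², which is what this hypothetical helper computes (plain Python, made-up data):

```python
def fit_line(x, y):
    """Least-squares fit of y = ax + b:
    a = r * s_y / s_x = cov(x, y) / var(x),  b = ybar - a * xbar."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    a = cov / var_x      # slope
    b = my - a * mx      # intercept
    return a, b

# exactly linear toy data: y = 2x + 1
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

As the slide says, this always yields an answer, whatever the data; whether the line is a *good* model is the separate question addressed next.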
How good is the model?
• We can calculate the regression line for any data, but how well does it fit the
data?
• From this we can see that the greater the correlation the smaller the error
variance, so the better our prediction
Is the model significant?
• F-statistic:
F(df_ŷ, df_error) = s_ŷ² / s_error² = … = r²(n − 2) / (1 − r²)
• And it follows that:
t(n−2) = r·√(n − 2) / √(1 − r²)
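Both statistics follow from r and n alone, so a couple of lines suffice (a sketch of the two formulas above; note that F = t² for simple regression, a handy consistency check):

```python
import math

def regression_F_t(r, n):
    """F and t for a simple (one-predictor) regression, from the
    correlation coefficient r and the sample size n:
    F(1, n-2) = r^2 (n - 2) / (1 - r^2)
    t(n-2)    = r * sqrt(n - 2) / sqrt(1 - r^2)"""
    F = r ** 2 * (n - 2) / (1 - r ** 2)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return F, t

# hypothetical values: r = 0.5 across n = 12 observations
F, t = regression_F_t(0.5, 12)
```

The resulting F (with 1 and n−2 degrees of freedom) or t (with n−2) is then looked up against its distribution to get a p-value.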
• The different x variables are combined in a linear way and each has its own regression coefficient:
y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ + ε
[Figure: regression plane ŷ = b₀ + b₁x₁ + b₂x₂ fitted in (x₁, x₂, y) space]
Multiple regression in SPM:
y: voxel value
GLM: given a set of values yᵢ (voxel value at a given position for a sample of images) and a set of explanatory variables xᵢ (group, factors, age, TIV, … for VBM, or condition, movement parameters, … for fMRI), find the (hyper)plane nearest to all the points. The coefficients defining the plane are named b₁, b₂, …, bₙ.
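A toy version of this GLM fit, using NumPy's least-squares solver on a hypothetical design matrix for one voxel; the regressors (constant, group, age) and all numbers are made up for illustration:

```python
import numpy as np

# Design matrix X: one row per scan, one column per explanatory variable.
# Columns here: constant (for b0), group (0/1), age.
X = np.array([
    [1, 0, 21],
    [1, 0, 25],
    [1, 0, 30],
    [1, 1, 22],
    [1, 1, 27],
    [1, 1, 33],
], dtype=float)
# y: this voxel's value in each of the 6 images
y = np.array([3.1, 3.4, 3.9, 5.2, 5.5, 6.1])

# Least-squares coefficients b: the (hyper)plane nearest all the points
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b            # fitted values
residuals = y - y_hat    # the epsilon term
```

SPM builds exactly this kind of design matrix for every voxel at once and then tests contrasts of the b's; this sketch only shows the least-squares step for a single voxel.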