Anova: Analysis of Variation: Math 243 Lecture R. Pruim
Informal Investigation
Graphical investigation:
  side-by-side box plots
  multiple histograms
Whether the differences between the groups are significant depends on:
  the difference in the means
  the standard deviations of each group
  the sample sizes
ANOVA determines the P-value from the F statistic.
[Figure: days (vertical axis, ticks from 5 to 9) by treatment group (A, B, P)]
Assumptions of ANOVA
each group is approximately normal
  check this by looking at histograms and/or normal quantile plots, or use assumptions
  ANOVA can handle some nonnormality, but not severe outliers
standard deviations of each group are approximately equal
  rule of thumb: ratio of largest to smallest sample st. dev. must be less than 2:1
Normality Check
We should check for normality using:
  assumptions about the population
  histograms for each group
  normal quantile plot for each group
With such small data sets, there isn't really a good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.

Compare largest and smallest standard deviations:
  largest: 1.764
  smallest: 1.458
  1.458 x 2 = 2.916 > 1.764, so the ratio is less than 2:1
Note: variance ratio of 4:1 is equivalent.
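
The rule of thumb can be checked in a few lines. This is only a sketch using the two standard deviations quoted above; the variable names are illustrative and not part of the lecture.

```python
# Largest and smallest group standard deviations quoted above.
largest_sd = 1.764
smallest_sd = 1.458

sd_ratio = largest_sd / smallest_sd
var_ratio = largest_sd**2 / smallest_sd**2

# Rule of thumb: sd ratio below 2, which is the same as variance ratio below 4.
sd_ok = sd_ratio < 2
var_ok = var_ratio < 4

print(f"sd ratio = {sd_ratio:.3f}, passes: {sd_ok}")
print(f"variance ratio = {var_ratio:.3f}, passes: {var_ok}")
```

Because the variance ratio is just the square of the sd ratio, the two checks always agree.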
Notation:
  $x_{ij}$ = observation $j$ in group $i$
  $\bar{x}_i$ = mean of group $i$
  $\bar{x}$ = overall mean
The ANOVA F-statistic is the ratio of the Between Group Variation to the Within Group Variation:
$F = \dfrac{\text{Between}}{\text{Within}} = \dfrac{MSG}{MSE}$
R ANOVA Output
            Df Sum Sq Mean Sq F value Pr(>F)
treatment    2   34.7    17.4    6.45 0.0063 **
Residuals   22   59.3     2.7
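
As a rough check on how the columns of the R table relate, the mean squares and F value can be recomputed from Df and Sum Sq. The inputs below are the rounded values printed in the table, so the result agrees with R only up to rounding.

```python
# Values read from the R ANOVA table above (already rounded by R).
df_treatment, ss_treatment = 2, 34.7
df_residuals, ss_residuals = 22, 59.3

ms_treatment = ss_treatment / df_treatment   # between-group mean square
ms_residuals = ss_residuals / df_residuals   # within-group mean square
f_value = ms_treatment / ms_residuals

print(f"Mean Sq (treatment) = {ms_treatment:.2f}")
print(f"Mean Sq (Residuals) = {ms_residuals:.2f}")
print(f"F = {f_value:.2f}")
```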
Residual for each observation: $x_{ij} - \bar{x}_i$
Group  Count  Sum   Average   Variance
1      3      18    6         0.49
2      4      23.8  5.95      0.176667
3      3      22.6  7.533333  0.123333
df for groups: 1 less than number of groups
total df: 1 less than number of individuals (just like other situations)
data  group
5.3   1
6.0   1
6.7   1
5.5   2
6.2   2
6.4   2
5.7   2
7.5   3
7.2   3
7.9   3
TOTAL: sum of each column of squared deviations (a sum of squares)
TOTAL/df: the corresponding mean square
F = MSG / MSE = 2.56367/0.25095 = 10.21575
df for error (within groups): # of data values - # of groups (equals df for each group added together)
df for groups (between groups): 1 less than # of groups
$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2$
$SST = \sum_{obs} (x_{ij} - \bar{x})^2$
$SSG = \sum_{obs} (\bar{x}_i - \bar{x})^2$
F = MSG / MSE
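
The whole calculation for the ten-value example can be sketched in plain Python. This is an illustration rather than the spreadsheet used in the lecture; the dictionary layout and variable names are assumptions, and it should give an F value of about 10.2.

```python
from statistics import mean

# Small example data set from the table above, keyed by group number.
groups = {
    1: [5.3, 6.0, 6.7],
    2: [5.5, 6.2, 6.4, 5.7],
    3: [7.5, 7.2, 7.9],
}

all_values = [x for xs in groups.values() for x in xs]
grand_mean = mean(all_values)

# Between-group and within-group sums of squares.
ssg = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
sse = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

dfg = len(groups) - 1                 # 1 less than # of groups
dfe = len(all_values) - len(groups)   # # of data values - # of groups

msg = ssg / dfg
mse = sse / dfe
f_stat = msg / mse

print(f"MSG = {msg:.5f}, MSE = {mse:.5f}, F = {f_stat:.5f}")
```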
So How big is F?
Since F is MSG / MSE, a large value of F indicates more variation between groups than within groups.
$s^2 = \dfrac{\sum_{obs} (x_{ij} - \bar{x})^2}{n - 1} = \dfrac{SST}{DFT} = MST$
So $SST = (n - 1) s^2$, and $MST = s^2$. That is, SST and MST measure the TOTAL variation in the data set.
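$s_i^2 = \dfrac{\sum_{j} (x_{ij} - \bar{x}_i)^2}{n_i - 1} = \dfrac{SS[\text{Within Group } i]}{df_i}$
So $SS[\text{Within Group } i] = (s_i^2)(df_i)$. This means that we can compute SSE from the standard deviations and sizes (df) of each group:
$SSE = \sum_{groups} s_i^2 (n_i - 1) = \sum_{groups} s_i^2 (df_i)$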
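
Both shortcuts can be tried on the ten-value example: SST from the overall sample variance, and SSE from each group's variance and df. This is a sketch using Python's standard `statistics.variance` (the sample variance, with divisor n - 1); the list layout is an assumption made for illustration.

```python
from statistics import variance

# Small example data set, one list per group.
groups = [
    [5.3, 6.0, 6.7],
    [5.5, 6.2, 6.4, 5.7],
    [7.5, 7.2, 7.9],
]
all_values = [x for xs in groups for x in xs]

# SST from the overall sample variance: SST = (n - 1) * s^2
n = len(all_values)
sst = (n - 1) * variance(all_values)

# SSE from each group's variance and df: SSE = sum of s_i^2 * (n_i - 1)
sse = sum(variance(xs) * (len(xs) - 1) for xs in groups)

print(f"SST = {sst:.4f}")
print(f"SSE = {sse:.4f}")
```

Only summary statistics are needed for SSE, which is why it can be recovered from a table of group sizes and standard deviations.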
In Summary
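$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = s^2 (DFT)$
$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2 = \sum_{groups} s_i^2 (df_i)$
$SSG = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_{groups} n_i (\bar{x}_i - \bar{x})^2$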
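
As a worked instance of the group form of SSG, here are the numbers for the ten-value example (group sizes 3, 4, 3; group means 6, 5.95, 7.5333). The overall mean 6.44 is computed here as 64.4/10 and is not quoted in the tables.

```latex
\begin{aligned}
\bar{x} &= \frac{18 + 23.8 + 22.6}{10} = 6.44 \\
SSG &= \sum_{groups} n_i (\bar{x}_i - \bar{x})^2 \\
    &= 3(6 - 6.44)^2 + 4(5.95 - 6.44)^2 + 3(7.5333 - 6.44)^2 \\
    &\approx 0.5808 + 0.9604 + 3.5861 \\
    &\approx 5.1273
\end{aligned}
```

Dividing by the 2 degrees of freedom for groups gives MSG of about 2.5637.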
$R^2$ Statistic
$R^2$ gives the percent of variance due to between-group variation
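
A minimal sketch of this calculation for the treatment data, assuming the usual definition $R^2 = SSG / SST$ with $SST = SSG + SSE$ (the lecture does not spell the formula out), and using the rounded Sum Sq values from the R ANOVA table:

```python
# Rounded sums of squares from the R ANOVA table.
ss_treatment = 34.7   # between-group sum of squares (SSG)
ss_residuals = 59.3   # within-group sum of squares (SSE)

# Assumed decomposition: SST = SSG + SSE.
ss_total = ss_treatment + ss_residuals
r_squared = ss_treatment / ss_total

print(f"R^2 = {r_squared:.3f}")
```

On these numbers, a bit over a third of the total variation is between groups.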
Level  N
A      8
B      8
P      9

Pooled StDev =

[Plot: Individual 95% CIs For Mean Based on Pooled StDev, one interval per level; axis ticks at 7.5, 9.0, 10.5]
Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test
95% confidence
Use alpha = 0.0199 for each test.
Intervals for (column level mean) - (row level mean)

            A         B
B      -3.685
        0.435
P      -4.863    -3.238
       -0.859     0.766
Only P vs A is significant (both endpoints of its interval have the same sign, so the interval does not contain 0)
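
The same-sign check is easy to automate. The sketch below uses the three intervals quoted above; the pair labels assume the "(column level mean) - (row level mean)" layout, and the helper name `excludes_zero` is made up for illustration.

```python
# Intervals quoted above, as (lower, upper) for the difference in means.
intervals = {
    "A - B": (-3.685, 0.435),
    "A - P": (-4.863, -0.859),
    "B - P": (-3.238, 0.766),
}

def excludes_zero(lower, upper):
    """True when both endpoints have the same sign, so 0 is outside the interval."""
    return lower * upper > 0

significant = [pair for pair, (lo, hi) in intervals.items() if excludes_zero(lo, hi)]
print(significant)
```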
Tukey's Method in R
  Tukey multiple comparisons of means
    95% family-wise confidence level

       diff       lwr     upr
B-A  1.6250  -0.43650  3.6865
P-A  2.8611   0.85769  4.8645
P-B  1.2361  -0.76731  3.2395