Analysis of Variance
Analysis of variance (ANOVA) is a technique for testing the significance of the differences among
more than two sample means.
Steps in ANOVA:
1. Determine one estimate of the population variance from the variance among the sample means.
2. Determine a second estimate of the population variance from the variance within the samples.
3. Compare these two estimates. If they are approximately equal in value, accept the null
hypothesis.
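The two estimates can be sketched in a few lines of Python. This is a minimal illustration, assuming all samples have the same size n; the function name `two_variance_estimates` and the toy data are hypothetical and are not taken from the notes.

```python
from statistics import mean, variance

def two_variance_estimates(samples):
    """Return (between, within) estimates of the population variance.

    Assumption of this sketch: every sample has the same size n.
    """
    n = len(samples[0])
    assert all(len(s) == n for s in samples), "equal sample sizes assumed"
    sample_means = [mean(s) for s in samples]
    # Estimate 1: n times the variance among the sample means.
    between = n * variance(sample_means)
    # Estimate 2: average of the variances within the samples.
    within = mean(variance(s) for s in samples)
    return between, within

# Hypothetical toy data, used only for illustration.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
between, within = two_variance_estimates(toy)
print(between, within, between / within)
```

For the toy data the first estimate is much larger than the second, which is the situation in which the null hypothesis of equal means would be doubted.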
Source of variation   Sum of squares                         Degrees of freedom   Mean square                                            F-ratio
Between samples       Sum of squares between samples (SSC)   $\nu_1 = K - 1$      Mean square between samples, $MSC = \frac{SSC}{K-1}$   $F_c = \frac{MSC}{MSE}$
Within samples        Sum of squares within samples (SSE)    $\nu_2 = N - K$      Mean square within samples, $MSE = \frac{SSE}{N-K}$
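1. Total $T = \sum X_1 + \sum X_2 + \cdots$
2. Correction Factor (C.F.) $= \frac{T^2}{N}$
3. TSS (Total Sum of Squares) $= \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N}$
4. SSC (Sum of Squares between samples) $= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
5. MSC $= \frac{SSC}{df} = \frac{SSC}{K-1}$
6. SSE (Sum of Squares within samples) $= TSS - SSC$
7. MSE $= \frac{SSE}{df} = \frac{SSE}{N-K}$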
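The working formulas translate directly into code. The following is a sketch only: the function name `one_way_anova`, the dictionary keys and the toy data are hypothetical choices, and unequal sample sizes are allowed because the formulas divide each squared total by its own sample size.

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor (shortcut) formulas."""
    K = len(samples)                       # number of samples
    N = sum(len(s) for s in samples)       # total number of observations
    T = sum(sum(s) for s in samples)       # grand total
    CF = T ** 2 / N                        # correction factor
    TSS = sum(x ** 2 for s in samples for x in s) - CF
    SSC = sum(sum(s) ** 2 / len(s) for s in samples) - CF
    SSE = TSS - SSC
    MSC = SSC / (K - 1)
    MSE = SSE / (N - K)
    return {"T": T, "CF": CF, "TSS": TSS, "SSC": SSC, "SSE": SSE,
            "MSC": MSC, "MSE": MSE, "F": MSC / MSE}

# Hypothetical toy data with unequal sample sizes.
toy_result = one_way_anova([[1, 2, 3], [2, 3, 4, 5], [6, 7]])
for name, value in toy_result.items():
    print(name, round(value, 3))
```

Returning every intermediate quantity makes it easy to compare a hand computation with the code step by step.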
1. A common test was given to a number of students taken at random from a particular class of
the four departments concerned to assess the significance of possible variation in performance.
Make an analysis of variance of the following data. (Take the level of significance as 5%.)

Departments
C     M     E     I
9     12    17    13
10    13    17    12
13    11    15    12
9     14    9     18
9     5     7     15
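Sample I (C)         Sample II (M)        Sample III (E)       Sample IV (I)
$X_1$    $X_1^2$     $X_2$    $X_2^2$     $X_3$    $X_3^2$     $X_4$    $X_4^2$
9        81          12       144         17       289         13       169
10       100         13       169         17       289         12       144
13       169         11       121         15       225         12       144
9        81          14       196         9        81          18       324
9        81          5        25          7        49          15       225
50       512         55       655         65       933         70       1006     (totals)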
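Step 1: Total $T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 50 + 55 + 65 + 70 = 240$
Step 2: Correction Factor (C.F.) $= \frac{T^2}{N} = \frac{(240)^2}{20} = 2880$
Step 3: Total Sum of Squares (TSS) = Sum of squares of all items - C.F.
$= \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N}$
$= 512 + 655 + 933 + 1006 - 2880$
$= 226$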
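Step 4: SSC (Sum of Squares between samples) $= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
$= \left( \frac{(50)^2}{5} + \frac{(55)^2}{5} + \frac{(65)^2}{5} + \frac{(70)^2}{5} \right) - 2880$
$= 2930 - 2880 = 50$
Step 5: MSC = Mean square between samples $= \frac{SSC}{df} = \frac{50}{4 - 1} = 16.67$
Step 6: Sum of Squares within samples (SSE) $= TSS - SSC = 226 - 50 = 176$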
Step 7: MSE = Mean square within samples $= \frac{SSE}{df} = \frac{176}{20 - 4} = 11$

Source of variation   Sum of squares                               Degrees of freedom              Mean square                                                    F-ratio
Between samples       Sum of squares between samples (SSC) = 50    $\nu_1 = K - 1 = 3$             Mean square between samples, $MSC = \frac{SSC}{K-1} = 16.67$   $F_c = \frac{MSC}{MSE} = 1.515$
Within samples        Sum of squares within samples (SSE) = 176    $\nu_2 = N - K = 20 - 4 = 16$   Mean square within samples, $MSE = \frac{SSE}{N-K} = 11$
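As a cross-check, the same sums of squares can be obtained directly from deviations about the means rather than from the correction-factor formulas. The sketch below uses the department scores from the problem; the variable names are illustrative only.

```python
from statistics import mean

# Scores of the four departments, as given in the problem.
samples = {
    "C": [9, 10, 13, 9, 9],
    "M": [12, 13, 11, 14, 5],
    "E": [17, 17, 15, 9, 7],
    "I": [13, 12, 12, 18, 15],
}
all_values = [x for s in samples.values() for x in s]
grand_mean = mean(all_values)

# Between samples: n_i * (sample mean - grand mean)^2, summed over samples.
ssc = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples.values())
# Within samples: squared deviations of each score from its own sample mean.
sse = sum((x - mean(s)) ** 2 for s in samples.values() for x in s)

K, N = len(samples), len(all_values)
msc = ssc / (K - 1)
mse = sse / (N - K)
f_ratio = msc / mse
print(ssc, sse, round(msc, 2), mse, round(f_ratio, 3))
```

The definition-based computation reproduces SSC = 50, SSE = 176 and an F-ratio of about 1.515, in agreement with the shortcut method.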
2. Three different machines are used for a production. On the basis of the outputs, set up a one-way
ANOVA table and test whether the machines are equally effective.

Outputs
Machine I    Machine II    Machine III
10           9             20
15           7             16
11           5             10
10           6             14
Given that the value of F at 5% level of significance for (2,9) df is 4.26.
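Solution:
Source of variation   Sum of squares                                  Degrees of freedom             Mean square                                                     F-ratio
Between samples       Sum of squares between samples (SSC) = 162.17   $\nu_1 = K - 1 = 3 - 1 = 2$    Mean square between samples, $MSC = \frac{SSC}{K-1} = 81.085$   $F_c = \frac{MSC}{MSE} = 5.95$
Within samples        Sum of squares within samples (SSE) = 122.75    $\nu_2 = N - K = 12 - 3 = 9$   Mean square within samples, $MSE = \frac{SSE}{N-K} = 13.63$

Conclusion: Since $F_c = 5.95$ is greater than the tabulated value 4.26, the null hypothesis is rejected.
The machines are not equally effective.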
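Source of variation                       Sum of squares   Degrees of freedom   Mean square                        F-ratio
Between columns (k = number of columns)   SSC              $k - 1$              $MSC = \frac{SSC}{k-1}$            $F_C = \frac{MSC}{MSE}$
Between rows (r = number of rows)         SSR              $r - 1$              $MSR = \frac{SSR}{r-1}$            $F_R = \frac{MSR}{MSE}$
Residual (or) error                       SSE              $(k-1)(r-1)$         $MSE = \frac{SSE}{(r-1)(k-1)}$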
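The entries of the two-way table can be computed with a short function. This is a sketch under the assumption of exactly one observation per cell; the name `two_way_anova`, the returned keys and the toy table are hypothetical.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    `table` is a list of r rows, each holding k values.
    """
    r, k = len(table), len(table[0])
    N = r * k
    T = sum(sum(row) for row in table)
    CF = T ** 2 / N
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    SSC = sum(c ** 2 for c in col_totals) / r - CF   # each column has r values
    SSR = sum(t ** 2 for t in row_totals) / k - CF   # each row has k values
    TSS = sum(x ** 2 for row in table for x in row) - CF
    SSE = TSS - SSC - SSR
    MSC = SSC / (k - 1)
    MSR = SSR / (r - 1)
    MSE = SSE / ((r - 1) * (k - 1))
    return {"SSC": SSC, "SSR": SSR, "SSE": SSE, "MSC": MSC, "MSR": MSR,
            "MSE": MSE, "FC": MSC / MSE, "FR": MSR / MSE}

# Hypothetical 3 x 3 toy table.
toy_two_way = two_way_anova([[1, 2, 4], [2, 4, 3], [4, 5, 8]])
for name, value in toy_two_way.items():
    print(name, round(value, 3))
```

The residual sum of squares is obtained by subtraction, exactly as in the hand method, so rounding in SSC or SSR carries over into SSE.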
          Rations
Breed     P     Q     R     S     Total
1         6     3     2     9     20
2         1     3     8     7     19
3         7     3     5     2     17
Total     14    9     15    18    56 (T)
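Step 1: Total T = 56
Step 2: Correction Factor CF $= \frac{T^2}{N} = \frac{(56)^2}{12} = 261.33$
Step 3: SSC = Sum of Squares between columns (rations)
$= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
$= \left( \frac{(14)^2}{3} + \frac{(9)^2}{3} + \frac{(15)^2}{3} + \frac{(18)^2}{3} \right) - 261.33$
$= 14$
Step 4: SSR = Sum of Squares between rows (breeds)
$= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
$= \left( \frac{(20)^2}{4} + \frac{(19)^2}{4} + \frac{(17)^2}{4} \right) - 261.33$
$= 1.17$
Step 5: Total Sum of Squares (TSS) = Sum of squares of all values - CF
$= 340 - 261.33 = 78.67$
Step 6: SSE = Residual $= TSS - (SSC + SSR) = 78.67 - 14 - 1.17 = 63.5$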
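Source of variation   Sum of squares   Degrees of freedom      Mean square                                F-ratio
Between columns       SSC = 14         $k - 1 = 4 - 1 = 3$     $MSC = \frac{SSC}{k-1} = 4.67$             $F_C = \frac{MSE}{MSC} = 2.26$
Between rows          SSR = 1.17       $r - 1 = 3 - 1 = 2$     $MSR = \frac{SSR}{r-1} = 0.585$            $F_R = \frac{MSE}{MSR} = 18.08$
Residual (or) error   SSE = 63.5       $(k-1)(r-1) = 6$        $MSE = \frac{SSE}{(r-1)(k-1)} = 10.58$

Since MSE is larger than both MSC and MSR, each F-ratio is formed with MSE in the numerator, so
$F_C$ has (6, 3) degrees of freedom and $F_R$ has (6, 2) degrees of freedom.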
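The F-ratios in this problem, 2.26 and 18.08, are consistent with placing the larger mean square in the numerator. A small helper makes that convention explicit; the function name `variance_ratio` is hypothetical, and the mean squares are the rounded values 4.67, 0.585 and 10.58 from the solution.

```python
def variance_ratio(ms_effect, df_effect, ms_error, df_error):
    """Return (F, numerator df, denominator df) with the larger mean square on top."""
    if ms_effect >= ms_error:
        return ms_effect / ms_error, df_effect, df_error
    # The error mean square is larger, so the ratio and the df are swapped.
    return ms_error / ms_effect, df_error, df_effect

# Rounded mean squares and degrees of freedom from the worked solution.
MSC, MSR, MSE = 4.67, 0.585, 10.58
fc = variance_ratio(MSC, 3, MSE, 6)
fr = variance_ratio(MSR, 2, MSE, 6)
print(fc)
print(fr)
```

Note that swapping the mean squares also swaps the degrees of freedom used to look up the tabulated value.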
2. The following data represent the number of units of production per day turned out by 5
different workers using 4 different types of machines.
          Machine Type
Workers   A     B     C     D
1         44    38    47    36
2         46    40    52    43
3         34    36    44    32
4         43    38    46    33
5         38    42    49    39
a) Test whether the mean production is the same for the different machine types.
b) Test whether the 5 men differ with respect to mean productivity.
Null hypothesis: i) The mean productivity is the same for four different machines.
ii) 5 men do not differ with respect to mean productivity.
Since the values are large, code the data by subtracting 40 from each value.

          Machine Type
Workers   A     B     C     D      Total
1         4     -2    7     -4     5
2         6     0     12    3      21
3         -6    -4    4     -8     -14
4         3     -2    6     -7     0
5         -2    2     9     -1     8
Total     5     -6    38    -17    T = 20
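Step 1: Total T = 20
Step 2: Correction Factor CF $= \frac{T^2}{N} = \frac{(20)^2}{20} = \frac{400}{20} = 20$
Step 3: SSC = Sum of Squares between columns (machines)
$= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
$= \left( \frac{(5)^2}{5} + \frac{(-6)^2}{5} + \frac{(38)^2}{5} + \frac{(-17)^2}{5} \right) - 20$
$= 338.8$
Step 4: SSR = Sum of Squares between rows (workers)
$= \left( \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots \right) - \frac{T^2}{N}$
$= \left( \frac{(5)^2}{4} + \frac{(21)^2}{4} + \frac{(-14)^2}{4} + \frac{(0)^2}{4} + \frac{(8)^2}{4} \right) - 20$
$= 161.5$
Step 5: Total Sum of Squares (TSS) = Sum of squares of all values - CF
$= \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N}$
$= 594 - 20$
$= 574$
Step 6: SSE = Residual
$= TSS - (SSC + SSR)$
$= 574 - 338.8 - 161.5$
$= 73.7$
Step 7: MSC = Mean square between columns $= \frac{SSC}{df} = \frac{338.8}{3} = 112.933$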
Source of variation                       Sum of squares   Degrees of freedom      Mean square                                F-ratio
Between columns (k = number of columns)   SSC = 338.8      $k - 1 = 4 - 1 = 3$     $MSC = \frac{SSC}{k-1} = 112.933$          $F_C = \frac{MSC}{MSE} = 18.38$
Between rows (r = number of rows)         SSR = 161.5      $r - 1 = 5 - 1 = 4$     $MSR = \frac{SSR}{r-1} = 40.375$           $F_R = \frac{MSR}{MSE} = 6.574$
Residual (or) error                       SSE = 73.7       $(k-1)(r-1) = 12$       $MSE = \frac{SSE}{(r-1)(k-1)} = 6.142$
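Subtracting 40 from every value shifts all totals but leaves every sum of squares unchanged, which is why the coded data can be analysed in place of the original data. The sketch below checks this numerically; the helper name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(table):
    """Return (SSC, SSR, SSE) for an r x k table with one value per cell."""
    r, k = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    cf = total ** 2 / (r * k)
    ssc = sum(sum(row[j] for row in table) ** 2 for j in range(k)) / r - cf
    ssr = sum(sum(row) ** 2 for row in table) / k - cf
    tss = sum(x ** 2 for row in table for x in row) - cf
    return ssc, ssr, tss - ssc - ssr

# Rows are workers 1 to 5; columns are machine types A, B, C, D.
original = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
coded = [[x - 40 for x in row] for row in original]

ss_original = sums_of_squares(original)
ss_coded = sums_of_squares(coded)
print([round(v, 3) for v in ss_original])
print([round(v, 3) for v in ss_coded])
```

Both calls give SSC = 338.8, SSR = 161.5 and SSE = 73.7 up to floating-point rounding, so the F-ratios are unaffected by the coding.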
          Salesmen
States    I     II    III   IV
A         6     5     3     8
B         8     9     6     5
C         10    7     8     7
Set up the analysis of variance table and test whether there is any significant difference i) between
sales by the firm's salesmen and ii) between sales in the three states.
Solution:
Null hypothesis: i) There is no significant difference between the sales by the firm's salesmen, and
ii) there is no significant difference between sales in the three states.
Source of variation                       Sum of squares   Degrees of freedom      Mean square                               F-ratio
Between columns (k = number of columns)   SSC = 8.334      $k - 1 = 4 - 1 = 3$     $MSC = \frac{SSC}{k-1} = 2.778$           $F_C = \frac{MSC}{MSE} = 0.81$
Between rows (r = number of rows)         SSR = 12.667     $r - 1 = 3 - 1 = 2$     $MSR = \frac{SSR}{r-1} = 6.334$           $F_R = \frac{MSR}{MSE} = 1.84$
Residual (or) error                       SSE = 20.666     $(k-1)(r-1) = 6$        $MSE = \frac{SSE}{(r-1)(k-1)} = 3.444$
fertility, each block containing k plots. Thus the plots in each block will be of homogeneous
fertility as far as possible.
Within each block, the k treatments are given to the k plots in a perfectly random
manner, such that each treatment occurs only once in any block. But the same k treatments
are repeated from block to block.
Null hypothesis: Rows and columns are homogeneous
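The within-block randomisation described above can be sketched as follows. This is an illustrative sketch only; the function name `randomized_block_layout` and the `seed` parameter are hypothetical, and the seed is there just to make a run repeatable.

```python
import random

def randomized_block_layout(treatments, n_blocks, seed=None):
    """Assign every treatment once to each block, in random order within the block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)   # the same k treatments in every block
        rng.shuffle(block)         # random plot order inside this block
        layout.append(block)
    return layout

layout = randomized_block_layout(["A", "B", "C"], n_blocks=4, seed=1)
for number, block in enumerate(layout, start=1):
    print("Block", number, block)
```

Each block is a random permutation of the same treatments, so every treatment occurs exactly once per block while the order varies from block to block.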
3. Latin Square Design (LSD) (three-factor classification)
We consider an agricultural experiment in which $n^2$ plots are taken and arranged in the form
of an $n \times n$ square, such that the plots in each row are as homogeneous as possible with
respect to one factor of classification (say, soil fertility) and the plots in each column are
as homogeneous as possible with respect to another factor of classification (say, seed quality).
The n treatments are given to these plots such that each treatment occurs only once in each row
and only once in each column.
The various possible arrangements obtained in this manner are known as Latin squares of order
n. Here rows, columns and letters stand for the three factors, say fertility, seed quality and
treatment respectively.
Null hypothesis: Rows, columns and letters are homogeneous.
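The defining property of a Latin square, each treatment once in every row and once in every column, is easy to check in code. The sketch below builds one arrangement by cyclic shifting and verifies the property; both function names are hypothetical, and the cyclic construction is only one of the many possible Latin squares of a given order.

```python
def cyclic_latin_square(treatments):
    """Build an n x n Latin square by shifting the treatment list one place per row."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    # Every row must have length n and contain all n symbols.
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    # Every column must contain all n symbols as well.
    return all({square[i][j] for i in range(n)} == symbols for j in range(n))

square = cyclic_latin_square(["A", "B", "C"])
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

In practice the rows, columns and letters of such a square would be randomised before use; the cyclic square is shown only because it is simple to construct.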
Comparison of RBD & LSD:
1. The number of replications of each treatment is equal to the number of treatments in LSD,
whereas there is no such restriction on treatments and replications in RBD.
2. LSD can be performed on a square field, while RBD can be performed either on a square field
or a rectangular field.
3. LSD is known to be suitable for the case when the number of treatments is between 5 and 12,
whereas RBD can be used for any number of treatments.
4. The main advantage of LSD is that it controls the effect of two extraneous variables, whereas
RBD controls the effect of only one extraneous variable. Hence the experimental error is
reduced to a larger extent in LSD than in RBD.
1. Three varieties A, B, C of a crop are tested in an RBD with four replications. The plot yields in
pounds are as follows.
A 6     C 5     A 8     B 9
C 8     A 4     B 6     C 9
B 7     B 6     C 10    A 6
Analyse the experimental yields and state your conclusion.
Solution:
Null hypothesis: There is no significant difference between varieties (rows) and between yields
of blocks (columns).
Source of variation                       Sum of squares   Degrees of freedom      Mean square                               F-ratio
Between columns (k = number of columns)   SSC = 18         $k - 1 = 4 - 1 = 3$     $MSC = \frac{SSC}{k-1} = 6$               $F_C = \frac{MSC}{MSE} = 3.6$
Between rows (r = number of rows)         SSR = 8          $r - 1 = 3 - 1 = 2$     $MSR = \frac{SSR}{r-1} = 4$               $F_R = \frac{MSR}{MSE} = 2.4$
Residual (or) error                       SSE = 10         $(k-1)(r-1) = 6$        $MSE = \frac{SSE}{(r-1)(k-1)} = 1.667$
Tabulated value: i) F for (3, 6) df at the 5% level of significance is 4.76.
ii) F for (2, 6) df at the 5% level of significance is 5.14.
Conclusion: i) Since $F_C = 3.6 < 4.76$, there is no significant difference between the yields of the blocks.
ii) Since $F_R = 2.4 < 5.14$, there is no significant difference between the varieties.
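The sums of squares in this solution can be checked by first rearranging the field layout into a variety-by-block table. The sketch below assumes the plot data are read as three rows of four plots, with each column forming one block (replication); that reading of the layout is an assumption, although it does reproduce the stated values SSC = 18, SSR = 8 and SSE = 10.

```python
# Field layout: three rows of four plots; each column is taken as one block.
# This row/column reading of the layout is an assumption of the sketch.
layout = [
    [("A", 6), ("C", 5), ("A", 8), ("B", 9)],
    [("C", 8), ("A", 4), ("B", 6), ("C", 9)],
    [("B", 7), ("B", 6), ("C", 10), ("A", 6)],
]
n_blocks = 4

# Rearrange into variety -> yields indexed by block.
yields = {}
for row in layout:
    for block, (variety, value) in enumerate(row):
        yields.setdefault(variety, [None] * n_blocks)[block] = value

table = [yields[v] for v in sorted(yields)]   # rows A, B, C; columns blocks 1 to 4
r, k = len(table), len(table[0])
T = sum(sum(row) for row in table)
CF = T ** 2 / (r * k)
SSC = sum(sum(row[j] for row in table) ** 2 for j in range(k)) / r - CF  # blocks
SSR = sum(sum(row) ** 2 for row in table) / k - CF                       # varieties
SSE = sum(x ** 2 for row in table for x in row) - CF - SSC - SSR
MSE = SSE / ((r - 1) * (k - 1))
FC = (SSC / (k - 1)) / MSE
FR = (SSR / (r - 1)) / MSE
print(SSC, SSR, SSE, round(FC, 2), round(FR, 2))
```

Because each variety appears exactly once in every block, the rearranged table has no empty cells and the two-way formulas apply directly.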
2. The following data resulted from an experiment to compare three burners B1, B2 and B3. A
Latin square design was used, as the tests were made on 3 engines and were spread over 3 days.
Day 1
Day 2
Day - 3
Engine 1
B1 - 16
Engine 2
B2 -17
Engine - 3
B3 - 20
B2 - 16
B3 - 15
B3 - 21
B1 - 15
B2 - 13
B1 - 12