Design and Analysis of Experiments: Tropical Animal Feeding: A Manual For Research Workers
Chapter 8
The methodology
Once the objectives are clear, the methodology can be considered. This
should be planned to provide the data to answer the questions raised and
to satisfy the needs of the researcher and also others who may wish to
adopt the findings and apply them in other situations. It must also be
possible within the confines of the resources available (land, animals,
buildings, pens, laboratory equipment, etc.). Some of these problems
(such as numbers of replicates and land resources) may be overcome by
conducting the research 'on-farm', which also has important implications
for short-cutting the process of research application or technology
transfer.
Types of data
Many of the measurements made in this type of work will be of the kind
that are called 'continuous variables': weight, food intake, blood levels,
etc. The pattern of variation of such variables usually conforms to the
'normal distribution'. They can be analyzed by a range of tools called
parametric statistics, including regression analysis and analysis of
variance.
Certain variables of the type 'success/failure', 'germinated/not
germinated', 'conceived/not conceived' are 'discontinuous variables' and
the variation conforms to the binomial distribution. Also amongst this
type may be records of the type 'class 1, class 2 or class 3', where a
measurement 1.1 or 1.2 is not possible. These conform to the Poisson
distribution. These data cannot be analyzed by techniques like analysis
of variance but require 'non-parametric statistics'. However, in many
cases, such data can be 'transformed' using mathematical devices (e.g.
logarithm, square root, etc.) to make them conform to a normal
distribution. Percentage data should also be transformed.
Types of analysis
Although analysis of variance (ANOVA) and regression are well-known
techniques, both can be carried out using a newer method called the
General Linear Model (GLM), which has a number of advantages. It fits
a 'model' to the data and predicts the means and variance from the
model. Equivalent examples of MINITAB instructions are:
Older command          GLM equivalent
REGRESS C1 1 C2        GLM C1=C2;
                       COVARIATE C2.
REGRESS C1 2 C2 C3     GLM C1=C2+C3;
                       COVARIATE C2 C3.
ANOVA C1=C5            GLM C1=C5
ANOVA C1=C5+C6         GLM C1=C5+C6
No equivalent          GLM C1=C2+C5;
                       COVARIATE C2.
No equivalent          GLM C1=C2+C3+C5+C6;
                       COVARIATE C2 C3.
Types of variables
From the above, it is clear that some variables are suited to 'regression
analysis' and some to ANOVA. The former are called continuous
variables and the latter are discrete variables. Levels of fertilizer, levels
of feed protein, etc. are continuous variables and can be analyzed in
GLM with the 'COVAR' subcommand. Discrete variables like variety
of crop, breed of livestock, etc. are analyzed with ANOVA or the
equivalent GLM command.
Number of treatments
The number of treatments which can be applied may depend on what is
available and the amount of experimental resources. However, with
continuous variables, it is always better to have more levels of a factor
where possible. For example, with 30 experimental units, more
information on the type of response to a factor will be obtained with 5
levels and 6 replicates than with 3 levels and 10 replicates. Very little is
lost in precision whereas much is gained in knowledge about the shape
of the response (linear, quadratic or cubic). Thus we can find the
maximum or optimum response level of a factor.
Numbers of replicates
An experiment uses a sample of a population as the experimental unit.
In general, the more replicates that are used, the smaller the difference
that can be detected. However, experimental facilities are always limited
and therefore it is important to be economical with the use of resources.
The overriding rule is never to have fewer than 3 replicates per
(sub-)treatment. A more precise estimate of the numbers required
to detect the desired percentage difference with a t-test is given by the
formula:
\[ \text{Expected difference (\%)} = \frac{t \times CV}{\sqrt{r}} \]
where r is the number of replicates, CV is the coefficient of variation
(%) and t is the tabulated value of t.
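The formula above can be rearranged to estimate the number of replicates. The following is a minimal sketch, assuming the formula reads difference = t x CV / sqrt(r), that t is taken as approximately 2, and that the minimum of 3 replicates is enforced. The function names and the example CV values are illustrative and not part of the manual.

```python
import math

def replicates_needed(cv_percent, difference_percent, t_value=2.0):
    """Solve difference = t * CV / sqrt(r) for r and round up.

    t_value = 2.0 is an assumed approximation of the tabulated t.
    The result is never allowed to fall below 3 replicates.
    """
    r = (t_value * cv_percent / difference_percent) ** 2
    return max(3, math.ceil(r))

def detectable_difference(cv_percent, replicates, t_value=2.0):
    """Percentage difference detectable with a given number of replicates."""
    return t_value * cv_percent / math.sqrt(replicates)

# Hypothetical example: CV of 15 %, wish to detect a 10 % difference.
print(replicates_needed(cv_percent=15, difference_percent=10))
print(detectable_difference(cv_percent=10, replicates=4))
```

Because r depends on the square of CV/difference, halving the difference to be detected roughly quadruples the number of replicates required.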
The actual size of the experiment will vary with both the number of
replicates and the number of treatments. Fewer replicates are needed in
factorial experiments where the overall total is greater. Again, as a
general rule, ensure that the design has at least 15 degrees of freedom for
error (residual degrees of freedom).
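The rule of 15 residual degrees of freedom can be checked before the experiment starts. The sketch below assumes the simplest case, a single factor in a completely randomized design with no blocks, where the error degrees of freedom are the number of experimental units minus the number of treatments. The function name is illustrative.

```python
def error_df_crd(levels, replicates):
    """Residual d.f. for a one-factor completely randomized design.

    Assumes equal replication and no blocking or covariates.
    """
    units = levels * replicates
    return units - levels

# The two layouts of 30 units discussed under 'Number of treatments'.
for levels, reps in [(5, 6), (3, 10)]:
    df = error_df_crd(levels, reps)
    print(levels, "levels x", reps, "replicates:", df, "error d.f.",
          "(adequate)" if df >= 15 else "(too few)")
```

Both layouts comfortably exceed 15 error degrees of freedom, which is consistent with the statement that little precision is lost by using more levels.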
Blocks
Blocking is a way to deal with known sources of variation which may be
sites on a gradient of fertility down a slope, different litters of pigs,
different farms, etc. Each block contains all treatments with replicates.
The analysis enables the variable 'block' to be measured and removed
from the error variation, eg:
Explanation:
The data consist of two sets of values (two treatments) stored in C1 and
C2. These are listed with the MINITAB command 'PRINT'. Then the
data are compared using a 't-test' with the command 'TWOSAMPLE'.
The printout shows the means, standard deviations and standard error of
the means and calculated t value. The probability value of 0.026 is less
than 0.05 and therefore the null hypothesis that C2 is NOT different to
C1 is rejected, i.e. C2 is significantly greater than C1 (P<0.05).
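The calculation behind a two-sample t-test can be sketched as follows. The data are hypothetical (the manual's own values are not reproduced here), and the unpooled form of the test is assumed. The tabulated t value of 2.306 for 8 d.f. at P=0.05 is taken from standard tables.

```python
import math
import statistics

def two_sample_t(sample1, sample2):
    """Return (t, approximate d.f.) without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1   # squared SE of mean 1
    v2 = statistics.variance(sample2) / n2   # squared SE of mean 2
    t = (statistics.mean(sample2) - statistics.mean(sample1)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

c1 = [10, 12, 11, 13, 9]     # hypothetical treatment 1
c2 = [14, 15, 13, 16, 12]    # hypothetical treatment 2
t, df = two_sample_t(c1, c2)
T_TABLE_8DF_P05 = 2.306      # two-sided tabulated value for 8 d.f.
print(round(t, 3), round(df, 1), abs(t) > T_TABLE_8DF_P05)
```

A calculated t larger than the tabulated value corresponds to a probability below 0.05, which is the comparison the printout makes automatically.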
Explanation:
The two variables are stored in columns C1-C2 and labelled Energy and
Oil. The GLM model to test is C2=C1 and the subcommand
COVARIATE C1 (abbreviated to 'cova C1') tells MINITAB to treat C1
as a continuous variable and not a discrete series of treatments. The
probability value (P=0.011) tells us that there IS a significant
relationship between Energy and Oil (P<0.05) and the equation is given
below. Badly fitting data are also indicated. The constant and
coefficient of the regression equation are given and the equation can be
derived as:
When more than two variables are involved, these may be included in the
model to give a multiple regression analysis. Only significant factors
should be included in the equation. The COEFFICIENT OF
DETERMINATION (r²) is found by dividing the SSx by the SStotal.
In this case:
(Equations with r² less than 70% should not be used for prediction).
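The constant, coefficient and coefficient of determination can be computed by hand as a check on the printout. This is a minimal sketch for a single predictor, using hypothetical Energy and Oil values. Here SSx is taken to mean the sum of squares explained by the regression on x.

```python
def linear_regression(x, y):
    """Least-squares line y = constant + coefficient * x, plus r-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    coefficient = sxy / sxx
    constant = mean_y - coefficient * mean_x
    ss_regression = coefficient * sxy      # SS explained by x
    r_squared = ss_regression / ss_total
    return constant, coefficient, r_squared

energy = [1, 2, 3, 4, 5]               # hypothetical values
oil = [2.1, 3.9, 6.2, 7.8, 10.0]       # hypothetical values
constant, coefficient, r_squared = linear_regression(energy, oil)
print(round(constant, 2), round(coefficient, 2), round(100 * r_squared, 1))
```

Multiplying r² by 100 gives the percentage form used in the 70% guideline.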
Explanation:
The TABLE command is used to give the means and standard deviations
of the treatments. These are the figures that should be presented in a
published paper. Then the GLM test is used as shown to produce the
analysis of variance table.
From the results shown it can be seen that the effect of treatment is
highly significant (P<0.001). A significant F test must be obtained
before it is valid to compare treatments by a t-test.
The final table lists the means and pooled standard deviation of the
mean. This is used to test for differences. The least significant
difference is t x SE of the difference. In cases where there are a
reasonable number of replicates, t will be approximately 2. Therefore
differences between means greater than 2 x SE(difference) are
significant. In this example, there are significant differences between all
treatments.
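The least significant difference can be sketched as below. The example assumes equal replication, so that the SE of a difference between two means is sqrt(2 x residual mean square / r), and it takes t as approximately 2. The treatment means and residual mean square are hypothetical.

```python
import math
from itertools import combinations

def least_significant_difference(residual_ms, replicates, t_value=2.0):
    """LSD = t x SE(difference), assuming equal replication."""
    se_difference = math.sqrt(2.0 * residual_ms / replicates)
    return t_value * se_difference

treatment_means = {"A": 20.0, "B": 24.5, "C": 25.0}   # hypothetical
lsd = least_significant_difference(residual_ms=4.0, replicates=8)

# Pairs of treatments whose means differ by more than the LSD.
significant_pairs = [
    (a, b) for a, b in combinations(sorted(treatment_means), 2)
    if abs(treatment_means[a] - treatment_means[b]) > lsd
]
print(lsd, significant_pairs)
```

In this hypothetical case A differs from both B and C, whereas B and C do not differ from each other.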
              Animal
Period    1    2    3    4
   1      B    A    D    C
   2      A    D    C    B
   3      C    B    A    D
   4      D    C    B    A
Explanation:
The analysis shows a significant effect of feed (P<0.01); the table of
means is given at the top of this page, together with their standard
deviations.
In general, a 4x4 (or better, a 6x6) latin square is suitable for this
type of experiment. The design can be chosen at random from lists of
latin square designs in statistical textbooks.
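Where no textbook list is to hand, a randomized latin square can be generated by computer. The sketch below is one simple approach and not the textbook method: it starts from a cyclic square and randomly permutes rows, columns and treatment letters. It does not sample evenly from every possible latin square. The function name is illustrative.

```python
import random

def random_latin_square(treatments, rng=random):
    """Randomize a cyclic latin square (rows = periods, columns = animals)."""
    n = len(treatments)
    # Cyclic base square: each row is the previous row shifted by one.
    square = [[(row + col) % n for col in range(n)] for row in range(n)]
    rng.shuffle(square)                          # permute the rows
    col_order = list(range(n))
    rng.shuffle(col_order)                       # permute the columns
    square = [[row[c] for c in col_order] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                          # permute the letters
    return [[labels[i] for i in row] for row in square]

design = random_latin_square("ABCD", random.Random(42))
for period, row in enumerate(design, start=1):
    print(period, " ".join(row))
```

Every treatment appears once in each period and once for each animal, whatever the random seed.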
  2          3        3        3        9
        640.75   650.09   711.54   667.46
         65.69    18.29    22.36    48.96
  3          3        3        3        9
        649.58   737.08   627.28   671.32
         40.65    59.81    54.66    67.68
ALL          9        9        9       27
        601.15   674.71   663.17   646.34
         80.60    64.84    50.98    71.94
CELL CONTENTS --
    LWG:N
        MEAN
        STD DEV
Explanation:
Both energy and protein have significant effects. In addition there is a
significant interaction between energy and protein, that is, the effect of
one depends on the level of the other.
Introducing covariates
Another known source of variation may be a continuous variable such as
previous milk yield, starting weight, previous performance, etc. This is
very often the case in milking experiments with cows or goats when the
experimental animals will almost certainly have different yields and be
at different stages of lactation. The following example is an experiment
with three treatments to measure the effect on the milk yield of cows.
Initial yield is stored in the data table as the variable 'init' and the
analysis is as follows:
Explanation:
If the analysis had been performed without including initial milk yield as
a covariate, no significant differences between treatments would have
been found. However, with the inclusion of the term 'init' as a covariate,
there is a significant effect of treatment (P<0.05).
The final table shows the treatment means adjusted for initial milk
yield. Treatment 1 again differs significantly from the other two.
                   Numerator DF    Seq MS         F        P
Energy                        1    135343    165.47    0.000
Protein                       1     45991     56.23    0.000
Energy*Energy                 1      4514      5.52    0.022
Protein*Protein               1      6682      8.17    0.006
Energy*Protein                1       414      0.51    0.479
Explanation:
The first TABLE gives the means for each sub-treatment with standard
deviations. The mean for each main treatment is shown at the right hand
side and bottom of the table. Then the analysis of variance is performed.
Notice that both Energy and Protein are set as continuous variables with
the subcommand COVA Energy Protein. Notice also an additional
subcommand TEST. This requires some explanation.
TEST is used as a sub-command to GLM to force MINITAB to use
the sequential sums-of-squares and consequent mean squares in the test
of significance, rather than the adjusted sums-of-squares and mean
squares, which is the default action. The difference between them is that
the adjusted sum-of-squares refers to each factor when all the others
have been accounted for; the sequential sum-of-squares is calculated
sequentially from the top so that each factor is taken out in turn.
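The difference between the two kinds of sums-of-squares can be seen numerically with a linear and a quadratic term, which are never independent. The following is a sketch with hypothetical data, using closed-form least squares for two predictors. It is not MINITAB's own algorithm, and the function name is illustrative.

```python
def sums_of_squares(x1, x2, y):
    """Sequential (x1 then x2) and adjusted SS for a two-term model."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    ss_x1_alone = s1y ** 2 / s11            # x1 fitted on its own
    ss_x2_alone = s2y ** 2 / s22            # x2 fitted on its own
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det      # coefficients with both fitted
    b2 = (s11 * s2y - s12 * s1y) / det
    ss_both = b1 * s1y + b2 * s2y
    sequential = {"x1": ss_x1_alone, "x2": ss_both - ss_x1_alone}
    adjusted = {"x1": ss_both - ss_x2_alone, "x2": ss_both - ss_x1_alone}
    return sequential, adjusted

level = [1, 2, 3, 4, 5]                     # hypothetical factor levels
level_squared = [v * v for v in level]      # quadratic term
response = [2.0, 4.6, 5.9, 6.6, 5.9]        # hypothetical response
sequential, adjusted = sums_of_squares(level, level_squared, response)
print({k: round(v, 3) for k, v in sequential.items()})
print({k: round(v, 3) for k, v in adjusted.items()})
```

The last term fitted has the same value in both tables, but the linear term is much smaller once it is adjusted for the quadratic term. A gap of this kind is the sign of non-independence described in the text.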
The TEST sub-command should always be used when the factors are
NOT independent, as is inevitably the case with linear, quadratic and
cubic effects (X, X*X, X*X*X). In other experiments where the
sequential sums-of-squares and adjusted sums-of-squares are very
different, non-independence is implied and the TEST sub-command
should be used to force the use of the sequential sums-of-squares. The
factors tested by the above commands are:
This will test the main effects and the interaction (FEED, SYSTEM and
FEED*SYSTEM). This could not be used in the above example because
we excluded some of the more complex interactions.
GLM is a powerful tool for dealing with unbalanced designs and has fewer
limitations. A fuller explanation of the use of GLM for unbalanced
designs is given below.
The chi-squared statistic is rather like the SS in that it is the square of the
difference between the observed result and the expected result (if the
results were averaged between the two treatments). We compute a value
for each cell, then sum the values for all the cells and compare the value
with the value in tables.
If the total chi-squared value is GREATER than the tabulated value, then
there is a significant difference between the rows or treatments.
The data should be entered into MINITAB in two columns and the
MINITAB command CHISQUARE used as follows:
MTB > chis c1 c2
             C1        C2     Total
    1       149        51       200
         165.50     34.50
    2       182        18       200
         165.50     34.50
Total       331        69       400
Note that for a 2x2 table there is one degree of freedom (only one
comparison possible). Look up the tables on the line for 1 d.f.
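The chi-squared calculation described above can be reproduced in a few lines. This sketch uses the counts 149/51 and 182/18, which are the values consistent with the column totals of 331 and 69 and the row totals of 200. The tabulated value of 3.84 for 1 d.f. at P=0.05 comes from standard tables. The function name is illustrative.

```python
def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_squared([[149, 51], [182, 18]])
CHI2_TABLE_1DF_P05 = 3.84    # tabulated value, 1 d.f., P=0.05
print(round(chi2, 2), df, chi2 > CHI2_TABLE_1DF_P05)
```

The expected values computed inside the loop (165.50 and 34.50) match those printed under each observed count.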
Four treatments are applied to 100 cows each and the results measured
as 'conceived' or 'failed' to conceive:
First, compute the chi-squared value for the whole table (3 d.f.):
Total treatment effect (3 df) chi² = 58.549 > 11.3 significant (P<0.01)
Now combine rows 1+2 and 3+4 into a 2x2 table and calculate
chi-squared (1 df) to calculate the energy effect and combine rows 1+3
and 2+4 into another 2x2 table and calculate the chi-squared to test the
protein effect:
Energy effect (1 df) chi² = 32.080 > 6.63 significant (P<0.01)
Protein effect (1 df) chi² = 7.709 > 6.63 significant (P<0.01)
Subtract the energy and protein chi-squared values from the total
chi-squared to get the remaining effect which is due to the interaction.
Energy x protein (1 df) chi² = 58.549 - 32.080 - 7.709 = 18.760 > 6.63 significant (P<0.01)
It is only when we have 225 cows per treatment that we can detect the
10% difference in fertility (P<0.05), which is an important practical
difference.
Exact probabilities
Occasionally it is possible to obtain only limited amounts of data, for
example, if to obtain data would destroy experimental units. When the
numbers in a 2 x 2 table are very small, it may be best to compute exact
probabilities rather than to rely on the chi-squared approximation.
Example:
 6   1    7            7   0    7
 2   4    6    and     1   5    6
 8   5   13            8   5   13
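\[ P = \frac{n_{1.}!\, n_{2.}!\, n_{.1}!\, n_{.2}!}{n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!\, n_{..}!} \]
where n! = n(n-1)...1 and 0! = 1.
\[ P = \frac{7!\, 6!\, 8!\, 5!}{5!\, 2!\, 3!\, 3!\, 13!} = 0.3263 \]
\[ P = \frac{7!\, 6!\, 8!\, 5!}{6!\, 1!\, 2!\, 4!\, 13!} = 0.0816 \]
\[ P = \frac{7!\, 6!\, 8!\, 5!}{7!\, 0!\, 1!\, 5!\, 13!} = 0.0047 \]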
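The three probabilities above can be checked with a short calculation. This is a minimal sketch of the factorial formula as given. The function name and the argument order (cells read row by row) are illustrative choices.

```python
from math import factorial

def table_probability(n11, n12, n21, n22):
    """Exact probability of one 2 x 2 table with fixed margins."""
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    total = row1 + row2
    numerator = (factorial(row1) * factorial(row2)
                 * factorial(col1) * factorial(col2))
    denominator = (factorial(n11) * factorial(n12) * factorial(n21)
                   * factorial(n22) * factorial(total))
    return numerator / denominator

# Tables with row totals 7 and 6 and column totals 8 and 5.
for cells in [(5, 2, 3, 3), (6, 1, 2, 4), (7, 0, 1, 5)]:
    print(cells, round(table_probability(*cells), 4))
```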
All the above can be used as quick tests without having to make
assumptions about the nature of the population, its type of distribution
and variance. However, where it is possible to make the necessary
assumptions for the use of ANOVA, etc., more information (on means,
variance, etc.) will be obtained.
$\sqrt{x + 0.5}$ should be used when some of the values are <10 and
especially when zeros are present.
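As a brief illustration, the transformation can be applied to a column of counts before analysis. The counts below are hypothetical and the function name is illustrative.

```python
import math

def sqrt_transform(counts):
    """Apply sqrt(x + 0.5) to each count; zeros remain usable."""
    return [math.sqrt(x + 0.5) for x in counts]

counts = [0, 3, 7, 12]                     # hypothetical small counts
transformed = sqrt_transform(counts)
print([round(v, 3) for v in transformed])
```

The analysis is then carried out on the transformed column rather than on the raw counts.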
We might do the same in C2, with mean 12 and SD±1 and perform an
ANOVA on the two columns.
The technique can be used to simulate factorial experiments,
randomized block designs, latin squares, etc., using appropriate columns
for different effects and variances. These can be summed to produce the
simulated values for the data column and the appropriate analysis
performed.
It is a good method to 'practise' statistics, while gaining an
appreciation of the effects of numbers, different levels of variation and
different methods of analysis.
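The same kind of practice can be carried out outside MINITAB. The sketch below simulates two columns of normally distributed data and computes a one-way analysis of variance by hand. The mean of 10 for the first column, the 10 values per column and the random seed are assumptions made for illustration. Only the mean of 12 and SD of 1 for the second column come from the text.

```python
import random
import statistics

def simulate_column(mean, sd, n, rng):
    """Draw n values from a normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def one_way_f(groups):
    """F ratio and degrees of freedom for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

rng = random.Random(1)                     # fixed seed for repeatability
c1 = simulate_column(10, 1, 10, rng)       # assumed mean 10, SD 1
c2 = simulate_column(12, 1, 10, rng)       # mean 12, SD 1 as in the text
f_ratio, df_between, df_within = one_way_f([c1, c2])
print(round(f_ratio, 2), df_between, df_within)
```

Changing the means, the SD or the number of values and re-running shows how each affects the F ratio.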