Fixed and Random Linear Models (GNDU 2020)
y = Xb + e
• y = vector of observed values of the dependent variable
• X = design matrix: observations of the predictor variables in the
assumed linear model
• b = vector of model parameters (to be estimated)
• e = vector of residuals
BLUE
• Both the OLS and GLS solutions are also
called the Best Linear Unbiased Estimator (or
BLUE for short)
• Whether the OLS or GLS form is used
depends on the assumed covariance
structure for the residuals
– Special case of Var(e) = σ²_e I -- OLS
– All others, i.e., Var(e) = R -- GLS
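For reference, the closed forms of these two estimators (standard results, stated here for completeness) are:

b_OLS = (X^T X)^-1 X^T y                 when Var(e) = σ²_e I
b_GLS = (X^T R^-1 X)^-1 X^T R^-1 y       when Var(e) = R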
Linear Models
One tries to explain a dependent variable y as a linear
function of a number of independent (or predictor)
variables.
Linear Models
As with a univariate regression (y = a + bx + e), the model
parameters are typically chosen by least squares, i.e., chosen
to minimize the sum of squared residuals, Σ e_i^2
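As a minimal illustration (hypothetical data, numpy only), the least-squares slope and intercept of the univariate model can be computed directly from their closed forms:

import numpy as np

# hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# least-squares estimates minimize the sum of squared residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

e = y - (a + b * x)          # residuals
print(a, b, np.sum(e ** 2))  # intercept, slope, sum of squared residuals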
Experimental design and X
• The structure of X determines not only which
parameters are estimable, but also the expected
sample variances, as Var(b) = k (X^T X)^-1
• Experimental design determines the structure of X
before an experiment (of course, missing data almost
always means the final X is different from the
proposed X)
• Different criteria are used to define an optimal design. Let V =
(X^T X)^-1. The idea is to choose a design for X, given the
constraints of the experiment, that:
– A-optimality: minimizes tr(V)
– D-optimality: minimizes det(V)
– E-optimality: minimizes the leading eigenvalue of V
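A sketch of how the three criteria could be compared for a candidate design (hypothetical X; numpy only):

import numpy as np

# hypothetical candidate design matrix (rows = observations, columns = parameters)
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

V = np.linalg.inv(X.T @ X)

A_crit = np.trace(V)                    # A-optimality: minimize tr(V)
D_crit = np.linalg.det(V)               # D-optimality: minimize det(V)
E_crit = np.max(np.linalg.eigvalsh(V))  # E-optimality: minimize the leading eigenvalue of V
print(A_crit, D_crit, E_crit)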
Sum of squared residuals can be written as
Σ e_i^2 = e^T e = (y - Xb)^T (y - Xb)
Ordinary Least Squares (OLS)
Minimizing the sum of squared residuals gives the OLS solution,
b = (X^T X)^-1 X^T y
Sample Variances/Covariances
The residual variance can be estimated as
σ²_e = Σ e_i^2 / (n - p), where the e_i are the estimated residuals
and p is the number of estimated parameters (the rank of X)
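A compact numpy sketch of these quantities (hypothetical data; np.linalg.lstsq could equally be used):

import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])           # design matrix with intercept
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)      # hypothetical data

b_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS solution (X^T X)^-1 X^T y
e_hat = y - X @ b_hat                          # estimated residuals
sigma2_e = e_hat @ e_hat / (n - X.shape[1])    # residual variance, SSE / (n - p)
var_b = sigma2_e * np.linalg.inv(X.T @ X)      # sampling (co)variances of b_hat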
Example: Regression Through the Origin
y_i = b x_i + e_i
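Minimizing Σ e_i^2 = Σ (y_i - b x_i)^2 over b gives the single-parameter solution (a standard result):

b = Σ x_i y_i / Σ x_i^2,    Var(b) = σ²_e / Σ x_i^2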
Polynomial Regressions
The GLM can easily handle any function of the observed
predictor variables, provided the parameters to estimate
are still linear, e.g. y = a + b_1 f(x) + b_2 g(x) + … + e
Quadratic regression: y = a + b_1 x + b_2 x^2 + e
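A minimal sketch (hypothetical data): the quadratic fit is still ordinary least squares, just with an extra x^2 column in the design matrix:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 40)
y = 0.5 - 1.0 * x + 2.0 * x**2 + rng.normal(0, 0.3, x.size)  # hypothetical data

X = np.column_stack([np.ones_like(x), x, x**2])  # columns: 1, x, x^2
b_hat = np.linalg.solve(X.T @ X, X.T @ y)        # estimates of a, b_1, b_2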
Interaction Effects
Interaction terms (e.g. sex x age) are handled similarly
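For instance (hypothetical coding of the predictors), the interaction column is just the elementwise product of the two predictor columns:

import numpy as np

sex = np.array([0, 0, 1, 1, 0, 1], dtype=float)  # hypothetical 0/1 coding
age = np.array([3.0, 5.0, 3.0, 5.0, 4.0, 4.0])

# columns: intercept, sex, age, sex x age interaction
X = np.column_stack([np.ones_like(age), sex, age, sex * age])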
Generalized Least Squares (GLS)
Suppose the residuals no longer have the same
variance (i.e., display heteroscedasticity). Clearly
we do not wish to minimize the unweighted sum
of squared residuals, because those residuals with
smaller variance should receive more weight.
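A sketch of weighted (generalized) least squares with a diagonal R, each observation weighted by the inverse of its residual variance (hypothetical variances and data):

import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0, 10, n)
var_e = 0.5 + 0.3 * x                              # hypothetical heteroscedastic variances
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(var_e))  # hypothetical data

X = np.column_stack([np.ones(n), x])
R_inv = np.diag(1.0 / var_e)                       # R is diagonal here

# GLS solution: (X^T R^-1 X)^-1 X^T R^-1 y
b_gls = np.linalg.solve(X.T @ R_inv @ X, X.T @ R_inv @ y)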
Model diagnostics
• It’s all about the residuals
• Plot the residuals
– Quick and easy screen for outliers
– Plot e against y or the fitted values yhat
• Test for normality among estimated residuals
– Q-Q plot
– Shapiro-Wilk test
– If non-normal, try transformations, such as log
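A quick numpy/scipy sketch of these checks on the estimated residuals (hypothetical data; scipy.stats.shapiro is one implementation of the Shapiro-Wilk test):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 60)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 60)  # hypothetical data

X = np.column_stack([np.ones_like(x), x])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b_hat
e_hat = y - y_hat                           # estimated residuals

# rough screen for outliers: residuals more than ~3 SD from zero
outliers = np.abs(e_hat) > 3 * e_hat.std()

# Shapiro-Wilk test of normality for the residuals
stat, p_value = stats.shapiro(e_hat)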
Fixed vs. Random Effects
In linear models we are trying to accomplish two goals:
estimating the values of the model parameters and estimating
any appropriate variances.
Environmental effects
• Consider yield data measured over several years in a
series of plots.
• Standard to treat year-to-year variation at a specific
site as being random effects
• Often the plot effects (mean value over years) are
also treated as random.
• For example, consider plants grouped in growing
region i, location j within that region, and year
(season) k for that location-region combination
– E = R_i + L_ij + e_ijk
– Typically R is treated as a fixed effect, while L and e are
random effects, L_ij ~ N(0, σ²_L) and e_ijk ~ N(0, σ²_e)
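A small simulation sketch of this structure (all values hypothetical): region effects entered as fixed means, location and residual effects drawn as random deviations:

import numpy as np

rng = np.random.default_rng(4)
n_regions, n_locs, n_years = 3, 4, 5
region_effect = np.array([10.0, 12.0, 9.0])     # hypothetical fixed region effects
sigma2_L, sigma2_e = 2.0, 1.0                   # variance components

records = []
for i in range(n_regions):
    for j in range(n_locs):
        L_ij = rng.normal(0, np.sqrt(sigma2_L))  # random location-within-region effect
        for k in range(n_years):
            e_ijk = rng.normal(0, np.sqrt(sigma2_e))
            records.append((i, j, k, region_effect[i] + L_ij + e_ijk))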
Random models
• With a random model, one is assuming that not all
“levels” of a factor are observed. Rather, the observed
levels are a subset of values drawn from some
underlying distribution
– For example, year to year variation in rainfall at a
location. Each year is a random sample from the
long-term distribution of rainfall values
– Typically, assume a functional form for this
underlying distribution (e.g., normal with mean 0)
and then use observations to estimate the
distribution parameters (here, the variance)
Random models
• Let’s go back to treating yearly effects as random
• If we assume these are uncorrelated, we use only one
degree of freedom (the variance), but this makes
assumptions about the covariance structure
– Standard: uncorrelated
– Option: some sort of autocorrelation process, say with a
yearly decay of ρ (which must also be estimated)
• Conversely, the year effects could all be treated as fixed;
this uses k degrees of freedom for k years, but makes no
assumptions about their relationships (covariance
structure)
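A sketch of the autocorrelated option: an AR(1)-style covariance matrix for k yearly effects with a decay parameter rho (hypothetical values):

import numpy as np

k = 6             # number of years
rho = 0.4         # hypothetical yearly decay of the correlation
sigma2_year = 2.0

# covariance between years t and s decays as rho^|t - s|
lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
G_year = sigma2_year * rho ** lags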
Identifiability
• Recall that a fixed effect is said to be
estimable if we can obtain a unique estimate
for it (either because X is of full rank, or because
a generalized inverse returns a unique estimate)
– Lack of estimability arises because the experimental
design confounds effects
• The analogous term for random models is
identifiability
– The variance components have unique estimates
The general mixed model
y = Xb + Zu + e
where b is a vector of fixed effects, u a vector of random effects
with u ~ (0, G), and e the vector of residuals with e ~ (0, R)
Observe y, X, Z. Estimate the fixed effects b and the variance
components (G and R); predict the random effects u.
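Given G and R (in practice these must themselves be estimated), the fixed-effect estimates and random-effect predictions can be obtained from Henderson's mixed model equations; a minimal numpy sketch with hypothetical matrices:

import numpy as np

rng = np.random.default_rng(5)
n, p, q = 12, 2, 4                             # observations, fixed effects, random effects
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
Z = np.kron(np.eye(q), np.ones((n // q, 1)))   # each random level observed n/q times
G = 2.0 * np.eye(q)                            # Var(u), hypothetical
R = 1.0 * np.eye(n)                            # Var(e), hypothetical
u = rng.multivariate_normal(np.zeros(q), G)
y = X @ np.array([1.0, 3.0]) + Z @ u + rng.multivariate_normal(np.zeros(n), R)

Ri = np.linalg.inv(R)
# Henderson's mixed model equations: coefficient matrix and right-hand side
C = np.block([[X.T @ Ri @ X, X.T @ Ri @ Z],
              [Z.T @ Ri @ X, Z.T @ Ri @ Z + np.linalg.inv(G)]])
rhs = np.concatenate([X.T @ Ri @ y, Z.T @ Ri @ y])
sol = np.linalg.solve(C, rhs)
b_hat, u_hat = sol[:p], sol[p:]                # BLUE of b, BLUP of u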
Mixture models
• Under a mixture model, an observation potentially
comes from one of several different distributions, so that
the density function is p_1 f_1 + p_2 f_2 + p_3 f_3
– The mixture proportions p_i sum to one
– The f_i represent different distributions, e.g., normal with mean µ_i
and variance σ²
• Mixture models come up in QTL mapping -- an
individual could have QTL genotype QQ, Qq, or qq
– See Lynch & Walsh Chapter 13
• They also come up in codon models of evolution, where
a site may be neutral, deleterious, or advantageous,
each with a different distribution of selection coefficients
– See Walsh & Lynch (volume 2A website), Chapters 10,11
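A sketch of evaluating such a three-component normal mixture density (hypothetical proportions and means, common variance):

import numpy as np
from scipy.stats import norm

p = np.array([0.25, 0.50, 0.25])  # hypothetical mixture proportions, sum to one
mu = np.array([-1.0, 0.0, 2.0])   # hypothetical component means
sigma2 = 1.0                      # common variance

def mixture_density(y):
    # p_1 f_1(y) + p_2 f_2(y) + p_3 f_3(y), each f_i a normal density
    return np.sum(p * norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))

print(mixture_density(0.5))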
Generalized linear models