Aitken's GLS
Powell
Department of Economics
University of California, Berkeley

The Generalized Regression Model

Departures from the standard assumption of a scalar covariance matrix, that is, $V(y) = \sigma^2 I$, yield a particular extension of the classical regression model known as the generalized regression model, or sometimes the generalized classical regression model. A concise statement of the assumptions on the $N$-dimensional vector $y$ of dependent variables and the $N \times K$ matrix $X$ of regressors is:

1. (Linear Expectation) $E(y) = X\beta$, for some $K$-dimensional vector of unknown regression coefficients $\beta$.

2. (Nonscalar Covariance Matrix) $V(y) \equiv E[(y - E(y))(y - E(y))'] = \Sigma$, for some positive definite $(N \times N)$ matrix $\Sigma$.

3. (Nonstochastic Regressors) The $(N \times K)$ matrix $X$ is nonrandom.

4. (Full Rank Regressors) The rank of the matrix $X$ is $K$, or, equivalently, the $(K \times K)$ matrix $X'X$ is invertible.

To this set of assumptions is sometimes appended the following assumption, which yields the generalized normal regression model:

5. (Multinormality) The vector $y$ has a multivariate normal distribution.

Often the matrix $\Sigma$ will be written as $\Sigma = \sigma^2 \Omega$, where $\sigma^2$ is an unknown scaling parameter; as is true for the best linear unbiased estimator for the classical regression model (namely, classical LS), the BLU estimator for the generalized regression model does not depend upon the value of $\sigma^2$. Typically the matrix $\Omega$ will also depend upon unknown parameters, but extension of the Gauss-Markov arguments to this model will require $\Omega$ to be known exactly.
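As a concrete illustration, the following numpy sketch simulates one draw of $y$ from a generalized regression model with a nonscalar covariance matrix. The design matrix, coefficient values, and the particular $\Omega$ are arbitrary choices for illustration, not part of the notes.

```python
# A minimal sketch of the generalized regression model: E(y) = X beta and
# V(y) = sigma^2 * Omega, where Omega is positive definite but not the identity.
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma2 = 6, 2, 1.5
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # nonrandom, full-rank regressors
beta = np.array([2.0, -1.0])

# An arbitrary positive definite Omega (here an equicorrelation pattern).
Omega = 0.5 * np.eye(N) + 0.5 * np.ones((N, N))

# Draw y with mean X beta and covariance sigma^2 * Omega.
y = rng.multivariate_normal(X @ beta, sigma2 * Omega)
print(y)
```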
There are several varieties of linear models which yield a nonscalar covariance matrix, each with its own nomenclature; four leading examples, to be studied in greater detail later, are:

Seemingly Unrelated Regression Model: The matrix $\Sigma$ has a Kronecker product form (to be defined later); that is, $\Sigma = \Omega \otimes I_N$, where $\Omega$ is a $J \times J$ covariance matrix and $I_N$ is an $N \times N$ identity matrix, where now $\dim(y) = J \cdot N$. Such a model arises when $N$ observations on each of $J$ dependent variables $y_{ij}$ are assumed to be generated from separate linear models $y_{ij} = x_{ij}'\beta_j + \varepsilon_{ij}$, where the errors $\varepsilon_{ij}$ are assumed to satisfy the assumptions of the standard regression model for $j$ fixed, but with $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) \equiv \omega_{jk}$, which might be nonzero if equations $j$ and $k$ are related through common components in their error terms.

Models of Heteroskedasticity (different scatter): The matrix $\Sigma$ is diagonal, i.e., $\Sigma = \mathrm{diag}[\sigma_{ii}]$; this arises from the model $y_i = x_i'\beta + c_i\varepsilon_i$, where $c_i = \sqrt{\sigma_{ii}}$ and $\varepsilon_i$ satisfies the assumptions of the standard regression model.
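The first two of these covariance structures are easy to build explicitly; the numpy sketch below does so for small, arbitrary dimensions and parameter values (all numbers are illustrative assumptions, not taken from the notes). The remaining two structures follow.

```python
# Constructing the SUR (Kronecker-product) and heteroskedastic (diagonal)
# covariance matrices for toy dimensions.
import numpy as np

# --- Seemingly unrelated regressions: Sigma = Omega kron I_N ---
N, J = 4, 2                       # N observations on each of J equations
Omega = np.array([[1.0, 0.3],     # J x J cross-equation error covariance
                  [0.3, 2.0]])
Sigma_sur = np.kron(Omega, np.eye(N))   # (J*N) x (J*N) covariance of the stacked y

# --- Heteroskedasticity: Sigma = diag(sigma_ii) ---
sigma_ii = np.array([0.5, 1.0, 1.5, 2.0])   # observation-specific variances
Sigma_het = np.diag(sigma_ii)
# Equivalent error representation: y_i = x_i' beta + c_i * eps_i with c_i = sqrt(sigma_ii).
c = np.sqrt(sigma_ii)

print(Sigma_sur.shape)      # (8, 8)
print(np.diag(Sigma_het))
```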
Models of Serial Correlation (or autocorrelated errors): The matrix $\Sigma$ is band-diagonal, i.e., $[\sigma_{ij}] = [c(|i - j|)]$ for some autocovariance function $c(\cdot)$. A model which generates a special case is $y_t = x_t'\beta + u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, where $\varepsilon_t$ satisfies the assumptions of the standard regression model with $\mathrm{Var}(\varepsilon_t) \equiv \sigma^2(1 - \rho^2)$. For this case, known as the first-order serial correlation model, the autocovariance function takes the form $c(s) = \sigma^2\rho^s$.

Panel Data Models (or pooled cross-section / time series models): The matrix $\Sigma$ again has Kronecker product form, with
$$\Sigma = \sigma_\varepsilon^2 I_{NT} + \sigma_u^2\,(I_N \otimes \iota_T\iota_T') + \sigma_v^2\,(\iota_N\iota_N' \otimes I_T),$$
where $\sigma_\varepsilon^2$, $\sigma_u^2$, and $\sigma_v^2$ are positive constants and $\iota_N$ denotes an $N$-dimensional column vector of ones, etc. This case arises for a doubly-indexed dependent variable $y = \mathrm{vec}([y_{it}]')$ with $\dim(y) = NT$, satisfying the structural equation
$$y_{it} = x_{it}'\beta + u_i + v_t + \varepsilon_{it},$$
where the error terms $\{u_i\}$, $\{v_t\}$, and $\{\varepsilon_{it}\}$ are all mutually uncorrelated with variances $\sigma_u^2$, $\sigma_v^2$, and $\sigma_\varepsilon^2$, respectively. There are many other variants, which typically combine two or more of these sources of differing variances and/or nonzero correlations.

Properties of Classical Least Squares

Recalling the algebraic form of the LS estimator, $\hat\beta_{LS} = (X'X)^{-1}X'y$, this estimator remains unbiased under the assumptions of the generalized regression model; as before, assumptions 1, 3, and 4 imply
$$E(\hat\beta_{LS}) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta.$$
But the form of the covariance matrix for $\hat\beta_{LS}$ under the generalized regression model differs from that under the classical regression model:
$$V(\hat\beta_{LS}) = (X'X)^{-1}X'V(y)X(X'X)^{-1} = (X'X)^{-1}X'\Sigma X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1},$$
which generally does not equal $\sigma^2(X'X)^{-1}$ except for special forms of $\Omega$ (like $\Omega = I$). Also, in general,
$$E(s^2) \equiv E\left[\frac{1}{N-K}\,(y - X\hat\beta_{LS})'(y - X\hat\beta_{LS})\right] \neq \sigma^2.$$
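To make this concrete, the following numpy sketch compares the correct "sandwich" covariance matrix of $\hat\beta_{LS}$ with the classical formula $\sigma^2(X'X)^{-1}$, using a first-order serial correlation $\Omega$; the design matrix and parameter values are arbitrary illustrative choices.

```python
# Under a nonscalar covariance matrix, the true covariance of the LS estimator
# is (X'X)^{-1} X' Sigma X (X'X)^{-1}, which generally differs from sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
N, sigma2, rho = 50, 1.0, 0.7
X = np.column_stack([np.ones(N), rng.normal(size=N)])

# First-order serial correlation: Omega_{ts} = rho^|t - s|.
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

XtX_inv = np.linalg.inv(X.T @ X)
V_true = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv   # correct covariance of the LS estimator
V_naive = sigma2 * XtX_inv                               # classical formula, generally wrong here

print(np.sqrt(np.diag(V_true)))    # true standard errors
print(np.sqrt(np.diag(V_naive)))   # naive standard errors
```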
Similar conclusions hold for the large-sample properties of $\hat\beta_{LS}$. Assuming
$$\mathrm{plim}\ \frac{1}{N}X'X = D,$$
$$\mathrm{plim}\ \frac{1}{N}X'\Omega X = C,$$
and assuming suitable limit theorems are applicable, the classical LS estimator will have an asymptotically normal distribution,
$$\sqrt{N}\,(\hat\beta_{LS} - \beta) \overset{d}{\to} N\!\left(0,\ \sigma^2 D^{-1}CD^{-1}\right).$$
The bias and inconsistency of the usual estimator $\hat V(\hat\beta_{LS}) = s^2(X'X)^{-1}$ of the covariance matrix of $\hat\beta_{LS}$ mean that the standard normal-based inference procedures will be incorrect. There are two types of solution to this problem: either construct a consistent estimator of the asymptotic covariance matrix of $\hat\beta_{LS}$, or find an alternative estimation method to LS which does not suffer from this problem. Finally, the classical LS estimator is no longer best linear unbiased in general; the BLU estimator $\hat\beta_{GLS}$, the generalized least squares estimator, was derived by Aitken and is named after him.

Aitken's Generalized Least Squares

To derive the form of the best linear unbiased estimator for the generalized regression model, it is first useful to define a square root $H$ of the matrix $\Omega^{-1}$ as a matrix satisfying $\Omega^{-1} = H'H$, which implies $H\Omega H' = I_N$. In fact, several such matrices $H$ exist, so, for convenience, we can assume $H = H'$. Now, to derive the form of the BLU estimator of $\beta$ for the generalized regression model under the assumption that $\Omega$ is known, define $y^* \equiv Hy$ and $X^* \equiv HX$; by the usual mean-variance calculations,
$$E(y^*) = HX\beta = X^*\beta$$
and
$$V(y^*) = HV(y)H' = H[\sigma^2\Omega]H' = \sigma^2 I_N.$$
Since $X^* = HX$ is clearly nonrandom with full column rank if $X$ satisfies assumptions 3 and 4, the classical regression model applies to $y^*$ and $X^*$, so the Gauss-Markov theorem implies that the best linear (in terms of $y^*$) unbiased estimator of $\beta$ is
$$\hat\beta_{GLS} \equiv (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
But since this estimator is also linear in the original dependent variable $y$, it follows that this generalized least squares (GLS) estimator is best linear unbiased in terms of $y$ as well. Also, the usual estimator of the scalar variance parameter $\sigma^2$ will be unbiased if $y^*$ and $X^*$ are used:
$$s^2_{GLS} \equiv \frac{1}{N-K}\,(y^* - X^*\hat\beta_{GLS})'(y^* - X^*\hat\beta_{GLS}) = \frac{1}{N-K}\,(y - X\hat\beta_{GLS})'\Omega^{-1}(y - X\hat\beta_{GLS}).$$
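The transformation argument above is easy to verify numerically. The following sketch, on simulated data with an AR(1)-type $\Omega$ of my own choosing, forms the symmetric square root $H$ of $\Omega^{-1}$, runs classical LS on $(y^*, X^*)$, and checks that the result agrees with the direct formula $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$.

```python
# GLS via the square-root transformation H (with H'H = Omega^{-1}) versus the
# direct weighted formula; the two coincide up to numerical precision.
import numpy as np

rng = np.random.default_rng(2)
N, K = 40, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, -0.5])

# Known AR(1)-type Omega for this exercise.
rho = 0.7
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
Omega_inv = np.linalg.inv(Omega)

y = X @ beta + rng.multivariate_normal(np.zeros(N), Omega)

# Symmetric square root of Omega^{-1} via its eigendecomposition (so H = H').
vals, vecs = np.linalg.eigh(Omega_inv)
H = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

y_star, X_star = H @ y, H @ X
b_gls_transform = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
b_gls_direct = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

resid = y - X @ b_gls_direct
s2_gls = (resid @ Omega_inv @ resid) / (N - K)

print(np.allclose(b_gls_transform, b_gls_direct))   # True
print(b_gls_direct, s2_gls)
```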
If the multinormality assumption 5 is imposed as well, then the existing results for classical LS imply that $\hat\beta_{GLS}$ is also multinormal,
$$\hat\beta_{GLS} \sim N\!\left(\beta,\ \sigma^2(X'\Omega^{-1}X)^{-1}\right),$$
and is independent of $s^2_{GLS}$, with
$$\frac{(N-K)\,s^2_{GLS}}{\sigma^2} \sim \chi^2_{N-K}.$$
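A quick Monte Carlo sketch of this exact distribution result is given below; the simulation settings (a diagonal, known $\Omega$ and normal errors) are arbitrary choices of mine. The statistic $(N-K)s^2_{GLS}/\sigma^2$ should behave as a chi-squared variate with $N-K$ degrees of freedom, so its simulated mean and variance should be close to $N-K$ and $2(N-K)$.

```python
# Monte Carlo check: with normal errors and known Omega,
# (N - K) * s2_GLS / sigma^2 is chi-squared with N - K degrees of freedom.
import numpy as np

rng = np.random.default_rng(3)
N, K, sigma2, reps = 30, 2, 2.0, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([0.5, 1.0])
Omega = np.diag(rng.uniform(0.5, 2.0, size=N))   # known heteroskedastic Omega
Omega_inv = np.linalg.inv(Omega)
L = np.linalg.cholesky(sigma2 * Omega)

stats = np.empty(reps)
for r in range(reps):
    y = X @ beta + L @ rng.normal(size=N)        # y ~ N(X beta, sigma^2 * Omega)
    b = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    e = y - X @ b
    s2 = (e @ Omega_inv @ e) / (N - K)
    stats[r] = (N - K) * s2 / sigma2

print(stats.mean(), N - K)        # should be close
print(stats.var(), 2 * (N - K))   # should be close
```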
And the same arguments for consistency of the classical LS estimator under the classical regression model imply the corresponding large-sample results for the GLS estimator under the generalized regression model, assuming the usual limit theorems are applicable:
$$\sqrt{N}\,(\hat\beta_{GLS} - \beta) \overset{d}{\to} N(0,\ V), \qquad V \equiv \sigma^2\left[\mathrm{plim}\ \frac{1}{N}X'\Omega^{-1}X\right]^{-1}.$$
It is worth noting in passing that the definition of the squared multiple correlation coefficient $R^2$ generally must be adjusted for the GLS estimator. Even if one column of the original matrix of regressors $X$ has elements identically equal to one, that is not generally true for the transformed regressors $X^* = HX$. Thus, the correct restricted sum of squares in the denominator of the $R^2$ formula (imposing the restriction that all coefficients except the intercept are zero) is different from the usual $(y - \bar y\,\iota_N)'(y - \bar y\,\iota_N)$.

Feasible GLS

If the matrix $\Omega$ involves unknown quantities, there are (at least) three possible strategies for inference:

1. Parametrize the matrix $\Omega$ in terms of a finite $p$-dimensional vector of unknown parameters $\theta$, $\Omega = \Omega(\theta)$, which is constructed so that $\Omega(0) = I$, and conduct a diagnostic test of the null hypothesis $H_0: \Omega = I$, i.e., $\theta = 0$. (The form of this test will depend upon the particular parametric structure of $\Omega(\theta)$.) If that test fails to reject, common practice is to conclude that the classical regression model is appropriate, and to use the usual LS methods for inference.

2. Again parametrize the matrix $\Omega = \Omega(\theta)$ in terms of a finite-dimensional parameter vector $\theta$, and use the classical LS residuals $e \equiv y - X\hat\beta_{LS}$ to obtain consistent estimators $\hat\theta$ and $\hat\Omega = \Omega(\hat\theta)$ of $\theta$ and $\Omega$. (The details of this step depend upon the particular model, e.g., heteroskedasticity, serial correlation, etc.) Then replace the unknown $\Omega$ with the estimated $\hat\Omega$ in the formula for GLS, yielding the feasible GLS estimator
$$\hat\beta_{FGLS} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y,$$
with a corresponding estimator $s^2_{FGLS}$ of $\sigma^2$. Because $\hat\Omega$ is typically a function of $y$, this estimator will no longer be linear in $y$, and the finite-sample results for GLS will no longer be applicable; nevertheless, it is often possible to find conditions under which FGLS is asymptotically equivalent to GLS,
$$\sqrt{N}\left(\hat\beta_{FGLS} - \hat\beta_{GLS}\right) \overset{p}{\to} 0,$$
so that
$$\sqrt{N}\left(\hat\beta_{FGLS} - \beta\right) \overset{d}{\to} N(0,\ V),$$
where
$$V = \mathrm{plim}\ s^2_{FGLS}\left[\frac{1}{N}X'\hat\Omega^{-1}X\right]^{-1}.$$
(A numerical sketch of this construction appears after this list.)
3. If the parametric form $\Omega(\theta)$ assumed for the covariance matrix is incorrect (for example, if it is mistakenly assumed that $\Omega(\theta) = I$), then application of the usual calculations will yield a particular form for the asymptotically normal distribution of $\hat\beta_{FGLS}$:
$$\sqrt{N}\left(\hat\beta_{FGLS} - \beta\right) \overset{d}{\to} N\!\left(0,\ D^{-1}CD^{-1}\right),$$
with
$$D \equiv \mathrm{plim}\ \frac{1}{N}X'\hat\Omega^{-1}X \qquad \text{and} \qquad C \equiv \mathrm{plim}\ \frac{1}{N}X'\hat\Omega^{-1}\Sigma\,\hat\Omega^{-1}X,$$
for $\Sigma = V(y)$ as above. Consistent estimation of $D$ is immediate; the trick is to find a consistent estimator of the matrix $C$ without imposing a parametric structure on the matrix $\Sigma$. The feasibility of consistent estimation of the asymptotic covariance matrix $D^{-1}CD^{-1}$, termed robust covariance matrix estimation, will vary with the particular restrictions on distributional heterogeneity and dependence imposed (e.g., i.i.d. sampling, stationary data, etc.).
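The following numpy sketch illustrates strategies 2 and 3 in the simplest heteroskedastic setting. The particular variance function $\mathrm{Var}(y_i) \propto \exp(\theta x_i)$, the auxiliary regression of $\log e_i^2$ on $x_i$ used to estimate $\theta$, and all parameter values are illustrative assumptions of mine, not prescriptions from the notes; the robust covariance estimate at the end is the familiar heteroskedasticity-consistent (White-type) sandwich for the LS estimator, the leading case in which $C$ can be estimated without a parametric model for $\Sigma$.

```python
# Feasible GLS with an estimated heteroskedasticity pattern (strategy 2) and a
# heteroskedasticity-robust sandwich covariance for LS (strategy 3).
import numpy as np

rng = np.random.default_rng(4)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
beta, theta_true = np.array([1.0, 2.0]), 0.8
y = X @ beta + np.sqrt(np.exp(theta_true * x)) * rng.normal(size=N)

# Step 1: classical LS and its residuals.
b_ls = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ls

# Step 2 (feasible GLS): estimate theta from log(e_i^2) = const + theta * x_i + noise,
# build Omega_hat = diag(exp(theta_hat * x_i)) up to scale, and reweight.
Z = np.column_stack([np.ones(N), x])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ np.log(e ** 2))
omega_hat = np.exp(gamma[1] * x)                  # scale factor is absorbed into sigma^2
W = 1.0 / omega_hat                               # Omega_hat^{-1} is diagonal
b_fgls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))

# Step 3 (robust covariance for LS): sandwich (X'X)^{-1} [sum e_i^2 x_i x_i'] (X'X)^{-1},
# imposing no parametric structure on Sigma.
XtX_inv = np.linalg.inv(X.T @ X)
C_hat = (X * (e ** 2)[:, None]).T @ X
V_robust = XtX_inv @ C_hat @ XtX_inv

print(b_ls, b_fgls)
print(np.sqrt(np.diag(V_robust)))                 # robust standard errors for b_ls
```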