Name: Adewole Oreoluwa Adesina Matric. No.: RUN/ACC/19/8261 Course Code: Eco 307
The classical OLS assumptions place restrictions on the specified model, the explanatory variables and the error terms. Assumption 1 requires the specified model to be linear in parameters, but it does not require the model to be linear in variables. Equations 1 and 2 depict models which are linear in both parameters and variables:
Yᵢ = β₁ + β₂Xᵢ + εᵢ        (1)
Yᵢ = β₁ + β₂X₁ᵢ + β₃X₂ᵢ + εᵢ        (2)
In order for OLS to work, the specified model has to be linear in parameters. Note that if the true relationship between Y and the parameters is non-linear, it is not possible to estimate the coefficients in any meaningful way. Equation 3 shows an empirical model in which the parameter β₂ is of quadratic nature:
Yᵢ = β₁ + β₂²Xᵢ + εᵢ        (3)
Assumption 1 requires the model to be linear in parameters, so OLS is not able to estimate Equation 3 in any meaningful way. However, Assumption 1 does not require the model to be linear in variables. OLS will produce a meaningful estimate of β₂ in Equation 4:
Yᵢ = β₁ + β₂Xᵢ² + εᵢ        (4)
Using the method of ordinary least squares (OLS) therefore allows us to estimate models which are linear in parameters, even if they are non-linear in variables. Conversely, it is not possible to estimate models which are non-linear in parameters, even if they are linear in variables.
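As a quick numerical illustration (a minimal sketch with simulated data in numpy; the parameter values are illustrative assumptions, not part of the coursework text), Equation 4 can be estimated by OLS simply by treating X² as a regressor:

import numpy as np

# Simulated data from a model that is linear in parameters but
# non-linear in variables: Y = 2 + 0.5*X^2 + e (illustrative values)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
e = rng.normal(0, 1, 200)
Y = 2 + 0.5 * X**2 + e

# OLS still applies: regress Y on a constant and on X squared
Z = np.column_stack([np.ones_like(X), X**2])
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(beta_hat)  # approximately [2.0, 0.5]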
Since the assumption of fixed (non-stochastic) regressors is not always plausible, the regressors may be stochastic in nature. The term "stochastic regressor" means that the regressors, i.e. the explanatory variables, are random rather than fixed from sample to sample. The basic assumptions in the case of stochastic regressors are: (i) X, Y and e are random; (ii) the pairs (X, Y) are obtained from i.i.d. sampling; (iii) E(e|X) = 0; (iv) X takes at least two values; (v) Var(e|X) = σ²; and (vi) e is normal. The variables X, Y and e are already defined in Section 1. It has been recognised, however, that in numerous applications X might be stochastic and e might not be normal. This may give rise to three problems: (a) X is non-stochastic and e is non-normal, (b) X is stochastic and e is normal, and (c) X is stochastic and e is non-normal. The basic problem of the stochastic regressor in the General Linear Model (GLM) is that the least squares estimators may not be unbiased, because when taking the expectation of the random vector b (the OLS estimator),

E[b] = β + E[(X′X)⁻¹X′e],

the second term E[(X′X)⁻¹X′e] may not vanish. This can happen because E[(X′X)⁻¹X′e] cannot be expressed as (X′X)⁻¹X′E[e], as in the case of fixed regressors, nor, owing to the stochastic nature of X, can the two factors be separated. Depending on the dependence between X and e, the OLS estimators may be (i) unbiased, efficient and consistent, (ii) biased but consistent, or (iii) biased but inconsistent. Statistical inference also becomes more difficult. The appropriateness of OLS thus depends on the stochastic dependence between the matrix X and the error vector e. Three dependency structures are defined: in the first case the error and the regressors are stochastically independent; in the second the error term and the regressors are contemporaneously uncorrelated; and in the third the error term and the regressors are contemporaneously correlated. In the first two cases OLS estimation is feasible, since the estimators are consistent. In the third case an alternative estimation procedure is needed, since the OLS estimators are biased and inconsistent. Instrumental variable (IV) regression can be applied in such cases.
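The sketch below shows, under illustrative assumptions (simulated data, a made-up instrument z, numpy only), how two-stage least squares (2SLS) recovers the slope when OLS does not:

import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)          # instrument: correlated with x, not with e
u = rng.normal(size=n)          # common shock that makes x endogenous
e = u + rng.normal(size=n)      # error term, correlated with x through u
x = 1 + 0.8 * z + u             # stochastic regressor, correlated with e
y = 2 + 3 * x + e               # true slope is 3

X = np.column_stack([np.ones(n), x])

# OLS: biased and inconsistent here because E(e|x) != 0
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: stage 1 regresses x on z, stage 2 uses the fitted values
Z1 = np.column_stack([np.ones(n), z])
x_hat = Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x_hat])
b_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print("OLS slope: ", b_ols[1])   # biased away from 3
print("2SLS slope:", b_iv[1])    # close to 3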
3. Zero mean value of the disturbance (error) term
The error term accounts for the variation in the dependent variable that the independent variables do not explain. Random chance alone should determine the values of the error term. For your model to be unbiased, the average value of the error term must equal zero. Suppose the average error is +7. This non-zero average error indicates that our model systematically under-predicts the observed values. Statisticians refer to systematic error like this as bias, and it signifies that our model is inadequate because it is not correct on average.
Stated another way, we want the expected value of the error to equal zero. If the expected
value is +7 rather than zero, part of the error term is predictable, and we should add that
information to the regression model itself. We want only random error left for the error
term.
In other words, the distribution of error terms has zero mean and does not depend on the independent variables (the X's). Thus, there must be no relationship between the X's and the error term.
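A small simulation (illustrative numbers only) makes the point concrete: when the error term has mean +7, the predictable part of the error is absorbed into the estimated intercept, which is exactly the sense in which that information belongs in the model itself:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
e = rng.normal(7, 1, 1000)      # error term with mean +7 instead of zero
y = 2 + 3 * x + e               # true intercept 2, true slope 3

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # intercept is about 9 = 2 + 7; the slope is still about 3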
Homoscedasticity requires that the Y populations corresponding to various X values have the same variance; that is, the variance neither increases nor decreases as X increases.
Heteroscedasticity is present when the variance is unequal, i.e. the variance of the Y population varies with X. One of the most popular examples quoted for heteroscedasticity is savings-versus-income data. As incomes grow, people have more discretionary income and thus a wider choice about its disposition. Regressing savings on income is therefore likely to reveal a variance that increases with income, because people have more choices regarding their saving decisions.
Other reasons for heteroscedasticity include the presence of outliers and the omission of important variables. OLS works by minimising the residual sum of squares (RSS), and it gives equal weight to all observations. Thus, when the RSS is minimised to compute the estimates, the observations with higher variance exert a larger pull on the fitted equation, and the betas estimated by OLS on heteroscedastic data no longer have minimum variance. The betas estimated with heteroscedastic data will therefore have a higher variance and thus high standard errors.
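As an illustration, the sketch below (simulated savings-style data; statsmodels is assumed to be available) contrasts the usual OLS standard errors with heteroscedasticity-robust ones:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
income = rng.uniform(1, 10, 500)
# Error variance grows with income: classic heteroscedasticity
e = rng.normal(0, 0.5 * income)
savings = 0.5 + 0.2 * income + e

X = sm.add_constant(income)
ols = sm.OLS(savings, X)
print(ols.fit().bse)                  # usual (invalid) standard errors
print(ols.fit(cov_type="HC1").bse)    # heteroscedasticity-robust errors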
One observation of the error term should not predict the next observation. For instance, if
the error for one observation is positive and that systematically increases the probability
that the following error is positive, that is a positive correlation. If the subsequent error is
more likely to have the opposite sign, that is a negative correlation. This problem is
known both as serial correlation and autocorrelation. Serial correlation is most likely to occur in time series data. For example, if sales are unexpectedly high on one day, then they are likely to be higher
than average on the next day. This type of correlation isn’t an unreasonable expectation
for some subject areas, such as inflation rates, GDP, unemployment, and so on.
Assess this assumption by graphing the residuals in the order in which the data were collected; you want to see randomness in the plot. In the graph for a sales model, for example, a cyclical pattern in the residuals indicates positive autocorrelation. This assumption is most likely to be violated in time series regression models; with cross-sectional data, intuition says there is usually no need to investigate it, although you can still check. If autocorrelation is present in the model, you can try taking lags of the independent variables to correct for the trend component. If you do not correct for autocorrelation, then the OLS estimates won't be efficient, and the usual standard errors and tests will be unreliable.
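One quick numerical check (a sketch; the AR(1) error process and its coefficient are illustrative assumptions) is the Durbin-Watson statistic, which is close to 2 for uncorrelated residuals and falls well below 2 under positive autocorrelation:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(0, 10, n)

# Build AR(1) errors: each error depends on the previous one
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # well below 2: positive autocorrelation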
But the "true" errors 𝜀 may well be correlated with them, and this is what counts
as endogeneity.
To keep things simple, consider the regression model (you might see this described as the underlying "data generating process" or "DGP", the theoretical model that we assume generates the values of y):

yᵢ = β₁ + β₂xᵢ + εᵢ

When we estimate this regression model on the available data, we get

yᵢ = β̂₁ + β̂₂xᵢ + ε̂ᵢ
Because of the way OLS works (the normal equations force the residuals to be orthogonal to the regressors), the residuals ε̂ will be uncorrelated with x. But that doesn't mean we have avoided endogeneity; it just means that we can't detect it by analysing the correlation between ε̂ and x, which will be (up to numerical error) zero. And because the OLS assumptions have been breached, we are no longer guaranteed the nice properties, such as unbiasedness, that we enjoy so much about OLS. Our estimate β̂₂ will be biased.
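This is easy to verify numerically. In the sketch below (an illustrative data generating process, not from the coursework), x is built to be correlated with the true errors; the fitted residuals are nonetheless uncorrelated with x, while the slope estimate is biased:

import numpy as np

rng = np.random.default_rng(5)
n = 10000
u = rng.normal(size=n)
x = u + rng.normal(size=n)       # x is correlated with the true error
eps = u + rng.normal(size=n)     # true errors: Cov(x, eps) > 0
y = 1 + 2 * x + eps              # true slope is 2

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

print("corr(x, true errors):", np.corrcoef(x, eps)[0, 1])    # clearly non-zero
print("corr(x, residuals):  ", np.corrcoef(x, resid)[0, 1])  # ~0 by construction
print("estimated slope:", b[1])                              # biased above 2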
This is seen immediately when you look at the problem: if the number of parameters to be estimated and the number of observations are equal, then Ordinary Least Squares fits the sample exactly and leaves no degrees of freedom for estimating the error variance, so the observations must outnumber the parameters. Relatedly, OLS does not require the error term to be normally distributed in order to produce unbiased estimates with the minimum variance. However, satisfying the normality assumption allows you to perform statistical hypothesis testing and to generate reliable confidence intervals and prediction intervals.
Another important OLS (Ordinary Least Squares) assumption is that when you want to run a regression, the sample must be drawn randomly from the population. When this doesn't occur, you are basically running the risk of introducing an unknown factor into your analysis that the model won't take into account. Notice that this assumption also makes it clear that your independent variable causes the dependent variable (in theory). So, simply put, OLS is a causal statistical method that investigates the ability of the independent variable to predict the dependent variable. This means that causality is assumed to run from the X's to Y, and not the other way round.
The easiest way to determine whether the residuals follow a normal distribution is to assess a normal probability plot. If the residuals follow the straight line on this type of graph, they are approximately normally distributed.
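A minimal way to produce such a plot (a sketch under the assumption that statsmodels and matplotlib are available; the data are simulated):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1, 200)

res = sm.OLS(y, sm.add_constant(x)).fit()

# Normal probability (Q-Q) plot of the residuals: points close to the
# reference line indicate approximately normal residuals
sm.qqplot(res.resid, line="s")
plt.show()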
(Note that a model such as Y = β₁ + β₂(1/X) is non-linear in the variable X but still linear in the parameters, so it can be estimated by OLS.)
OLS estimation provides the Best Linear Unbiased Estimate (BLUE) of beta if all assumptions of the classical linear regression model are satisfied.
What does BLUE mean? It means that the OLS method gives the estimates that are:
Linear:
The estimator is a linear function of the dependent variable Y.
Unbiased:
Its expected or average value is equal to the true value.
Efficient:
It has minimum variance. An unbiased estimator with the least variance is known as an efficient estimator.
If all assumptions of the Linear Regression are satisfied, OLS gives us the best linear unbiased
estimates.
For a classical linear regression model with multiple regressors (explanatory variables), there should be no exact linear relationship between the explanatory variables. The term multicollinearity was introduced by Ragnar Frisch; originally it meant the existence of a "perfect" linear relationship among some or all of the explanatory variables of the model, i.e.

λ₁X₁ + λ₂X₂ + ⋯ + λₖXₖ = 0,

where λ₁, λ₂, ⋯, λₖ are constants that are not all zero simultaneously, and X₁ = 1 for all observations to allow for the intercept term. Nowadays, the term multicollinearity is used not only for the case of perfect multicollinearity but also for the case of non-perfect collinearity (the case where the X variables are intercorrelated, but not perfectly so), i.e.

λ₁X₁ + λ₂X₂ + ⋯ + λₖXₖ + υᵢ = 0,

where υᵢ is a stochastic error term.
In the case of a perfect linear relationship among the explanatory variables (the correlation coefficient between them is one), the parameters become indeterminate (it is impossible to obtain a value for each parameter separately) and the method of least squares breaks down. However, if the regressors are not intercorrelated at all, the variables are called orthogonal and there is no multicollinearity problem in estimating the parameters.
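For the non-perfect case, a standard diagnostic is the variance inflation factor (VIF). The sketch below (simulated data in which x₂ is nearly collinear with x₁; statsmodels assumed available) shows how a VIF far above 10 flags the problem:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF for each regressor (values above ~10 signal serious collinearity)
for i in range(1, X.shape[1]):
    print(f"VIF x{i}:", variance_inflation_factor(X, i))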