Econometrics
In simple linear regression we try to fit the best-fit line, whereas in multiple regression we try to fit the hyperplane which best describes the relationship.
Assumptions
1. The model is linear in parameters
Which means that the power of the beta coefficients must be 1.
In other words there should be a linear relationship between the dependent and independent variables (otherwise we may have to use a nonlinear model). Linear regression, however, always means linearity in parameters, irrespective of linearity in the explanatory variables.
We can check for linearity through a residuals vs fitted plot.
2. Random sampling
3. No perfect collinearity among the explanatory variables
4. Zero conditional mean of the error term, E(u|x) = 0
All the above 4 assumptions (MLR.1-MLR.4) make sure that the OLS estimator is unbiased.
5. The conditional variance of the error term is constant (no heteroscedasticity) and there is no autocorrelation.
Under MLR.1-6, beta hat follows a normal distribution with mean beta (the population parameter) and variance Var(Bj). The idea behind the proof is that each beta hat can be expressed as a linear combination of the error terms.
Gauss-Markov theorem: When MLR assumptions 1-5 are satisfied, the OLS estimators are unbiased and have minimum variance among all linear unbiased estimators (they are BLUE).
Consistency: The value of the OLS estimator approaches the true value of the population parameter as the sample size increases. Every beta hat has its own sampling distribution. If the estimator is consistent, the distribution of Bj hat becomes more and more tightly concentrated around Bj as the sample size grows. As n tends to infinity, the distribution of Bj hat collapses to the single point Bj.
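The collapsing-distribution idea can be checked with a small simulation. This is a sketch of my own (variable names and the data-generating process are assumptions, not from the notes): the spread of the OLS slope estimates shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta = 2.0

def slope_estimates(n, reps=200):
    """OLS slope of y = 1 + true_beta*x + u for `reps` random samples of size n."""
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + true_beta * x + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        slopes.append(b[1])
    return np.array(slopes)

# The sampling distribution of beta hat tightens around true_beta as n grows.
sd_small = slope_estimates(50).std()
sd_large = slope_estimates(5000).std()
```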
Efficiency: The OLS estimators are efficient in the sense that they have the minimum variance among all linear unbiased estimators.
Standard error of OLS estimates: the standard deviation of the OLS estimates around the true population parameters.
Note that as SSTj (the total sample variation in xj) goes to 0, the variance of the OLS estimator goes to infinity (thus we need some variation in x).
Var(b1 tilde), where we leave out x2, is always smaller than Var(b1 hat), unless x1 and x2 are uncorrelated in the sample, in which case the two estimators b1 tilde and b1 hat are the same.
The higher variance of the estimator of b1 is the cost of including an irrelevant variable in the model: even if b2 is zero, if x1 and x2 happen to have a high Rj2, the variance of b1 hat will be higher.
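The two variance formulas can be compared numerically. A hedged sketch (sigma2 and the data-generating process are assumptions for illustration): Var(b1 hat) = sigma^2 / (SST1 * (1 - R12^2)) when x2 is included, versus Var(b1 tilde) = sigma^2 / SST1 when it is left out.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)    # x1 and x2 are correlated by construction
sigma2 = 1.0                          # assumed error variance

sst1 = ((x1 - x1.mean()) ** 2).sum()
r12 = np.corrcoef(x1, x2)[0, 1] ** 2  # R^2 from the regression of x1 on x2
var_without_x2 = sigma2 / sst1
var_with_x2 = sigma2 / (sst1 * (1 - r12))
# Including the correlated (possibly irrelevant) regressor x2 inflates Var(b1).
```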
Reasons for bias:
1. Omitted variable
2. Selection bias
Effect:
The actual impact of the independent variable could be under- or overestimated, depending on the direction of the bias.
Overfitting:
Overfitting a model is a condition where a statistical model begins to describe the random error
(noise) in the data rather than the relationships between variables. This leads to misleading R2,
low bias and high variance.
Underfitting:
Underfitting is a scenario in data science where a data model is unable to capture the
relationship between the input and output variables accurately, generating a high error rate on
both the training set and unseen data. High bias and low variance are good indicators of
underfitting.
In other words, when a model is underspecified it yields biased regression coefficients, and the standard errors are overestimated, thus widening the confidence intervals.
When the regression model is overspecified, the coefficients remain unbiased but the standard errors are inflated.
t and F statistic:
The F statistic for testing exclusion of a single variable is equal to the square of the corresponding t statistic.
So the t statistic is better for testing a single coefficient, while the F statistic is better for joint hypothesis tests about several coefficients.
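The F = t^2 identity above can be verified directly. A hedged sketch with simulated data (the model and coefficients are assumptions for illustration): we drop one variable, compute the exclusion F statistic, and compare it to the squared t statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])   # unrestricted model
Xr = np.column_stack([np.ones(n), x1])      # restricted model: exclude x2
b = np.linalg.lstsq(X, y, rcond=None)[0]
ssr_ur = ((y - X @ b) ** 2).sum()
br = np.linalg.lstsq(Xr, y, rcond=None)[0]
ssr_r = ((y - Xr @ br) ** 2).sum()

df = n - X.shape[1]                         # n - k - 1
sigma2_hat = ssr_ur / df
se_b2 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[2, 2])
t_stat = b[2] / se_b2
F_stat = (ssr_r - ssr_ur) / (ssr_ur / df)   # one restriction in the numerator
```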
Outliers:
Outliers are those observations which differ significantly from the other data points. They can affect the normality of the data in the form of skewness and bias the results.
Detection:
1. Can be detected using box plots and scatter plots. The z-score can also be used for detecting outliers.
2. The discrepancy or influence of outliers can be assessed through studentized residuals. Studentized (or standardized) residuals tell us how large the residuals are in standardized terms. An observation with a standardized residual of more than 3 in absolute value can be considered an outlier.
3. For normally distributed data, 99.7% of observations lie within 3 standard deviations, i.e. about 0.3% of outliers could be tolerable. If the percentage of outliers is significantly higher than that, then we need to treat the outliers.
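The standardized-residual rule can be sketched in a few lines. This is an illustrative version with simulated data and a planted outlier (a simple standardization; full studentization would also use leverage values from the hat matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
y[10] += 12.0                          # plant one large outlier at index 10

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
# Standardized residuals: residuals divided by their estimated std deviation.
std_resid = resid / resid.std(ddof=2)
outliers = np.where(np.abs(std_resid) > 3)[0]   # the |z| > 3 rule of thumb
```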
Imputation:
Outliers which are due to error can be removed. But outliers which are due to a special cause and might recur in the future should not be removed.
Outliers can also be replaced with the median value.
Scaling of variables
If the dependent variable is multiplied by c, then all the coefficients (and their standard errors) are multiplied by c.
If an independent variable is multiplied by c, then its coefficient (and standard error) is divided by c.
For example, when we multiply an x variable by 16:
That variable's beta value and standard error are divided by 16
The R2 and SSR values stay the same
As the coefficient and its standard error shrink by the same factor, the t statistic remains the same
The endpoints of the confidence interval for that coefficient are 16 times smaller
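These scaling rules are easy to verify. A hedged sketch with simulated data (the model is an assumption for illustration): rescaling x by 16 divides its slope by 16 and leaves R2 untouched.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)

def fit(xcol):
    """OLS of y on a constant and xcol; returns coefficients and R^2."""
    X = np.column_stack([np.ones(n), xcol])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return b, r2

b_orig, r2_orig = fit(x)
b_scaled, r2_scaled = fit(16 * x)   # multiply the regressor by 16
```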
Standardization of all the variables: when we standardize the y and x variables, the resulting beta coefficients tell us the standard deviation change in y when x changes by 1 standard deviation, and there is no intercept term present.
Quadratic terms:
Quadratic functions are also used quite often in applied economics to capture decreasing or increasing marginal effects (polynomial regression).
Before a certain level x has a positive impact on y and after that the impact is negative; in such cases the quadratic term is important.
As x^2 and x^3 are nonlinear functions of x, they do not violate the assumption of no perfect collinearity.
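With a quadratic term, the marginal effect of x is b1 + 2*b2*x, so the impact switches sign at the turning point x* = -b1/(2*b2). A hedged sketch (the coefficients are assumed for illustration; the data are noise-free so OLS recovers them essentially exactly):

```python
import numpy as np

x = np.linspace(-5, 5, 101)
y = 2.0 + 3.0 * x - 0.5 * x ** 2      # assumed true quadratic relationship

X = np.column_stack([np.ones_like(x), x, x ** 2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
turning_point = -b1 / (2 * b2)        # where the marginal effect changes sign
```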
Interaction terms
An interaction effect occurs when the effect of an independent variable on the dependent variable changes depending on the value of another independent variable.
Sometimes along with individual effects of variables, the variables act together in such a way
that they reinforce the effect on y.
Eg: while considering the effect of rainfall and fertilizers, we need to take into account the interaction of both, which makes the hyperplane curve upward (reinforces the effect on y).
Sometimes, it is natural for the partial effect, elasticity, or semi-elasticity of the dependent variable with respect to an explanatory variable to depend on the magnitude of yet another explanatory variable.
For example, when we consider the price of a player, we need to consider the interaction effect between goals and assists.
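The "partial effect depends on another variable" idea can be written out. A hedged sketch (coefficients assumed for illustration; noise-free data so OLS recovers them): with y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the partial effect of x1 is b1 + b3*x2.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2   # assumed model with interaction

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
# Partial effect of x1 evaluated at two different values of x2:
effect_at_x2_0 = b[1] + b[3] * 0.0
effect_at_x2_1 = b[1] + b[3] * 1.0
```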
R squared
Adjusted R2 imposes a penalty for adding insignificant explanatory variables, because SSR never goes up (and usually falls) as more independent variables are added, so plain R2 never decreases. The formula for adjusted R2 shows that it depends explicitly on k, the number of independent variables.
Adjusted R2 increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
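That "if and only if" claim can be checked numerically. A hedged sketch with simulated data (the model is an assumption; x2 is designed to be nearly irrelevant): we fit the model with and without x2 and compare the change in adjusted R2 against |t| on x2.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 0.01 * x2 + rng.normal(size=n)

def fit(X):
    """OLS fit; returns coefficients, SSR, and adjusted R^2."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = ((y - X @ b) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    k = X.shape[1] - 1                      # regressors excluding the intercept
    adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
    return b, ssr, adj_r2

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
_, _, adj_small = fit(X_small)
b_big, ssr_big, adj_big = fit(X_big)
sigma2_hat = ssr_big / (n - 3)
t_x2 = b_big[2] / np.sqrt(sigma2_hat * np.linalg.inv(X_big.T @ X_big)[2, 2])
# Adjusted R^2 rises exactly when |t_x2| > 1.
```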
Adjusted R2 can also help us in selecting between two models with the same number of (but different) explanatory variables, i.e. nonnested models.
The F statistic only allows us to compare nested models, where one model (the restricted one) is a special case of the other.
However, adjusted R2 cannot help us compare models whose dependent variables have different functional forms.
Log model
Expressing variables in log form gives us the interpretations in terms of percentages
Logs are often used for dollar amounts that are always positive, as well as for variables such as
population, especially when there is a lot of variation.
Taking the log greatly reduces the variation of a variable, making OLS estimates less prone to outlier influence.
Auxiliary Regression: A regression used to compute a test statistic, such as the test statistics for heteroskedasticity and serial correlation, or any other regression that does not estimate the model of primary interest.
Multicollinearity
Multicollinearity is a phenomenon when one predictor is correlated with other predictor
variables.
Perfect multicollinearity:
When one independent variable can be expressed as linear combination of other
independent variables
Intuition :
Our beta coefficients tell us the change in y as x changes, keeping the other variables constant.
However, when we have both age and experience in our model, holding age constant as experience increases doesn't really make sense.
So when some of the variables which should have been significant in the model have insignificant t statistics, we have a problem of multicollinearity on our hands.
Impact:
● It inflates the variances (standard errors), which results in insignificant t statistics and p values.
● The overall model fit is not affected, i.e. the R2 and F statistic remain significant.
● If we are using the model only for prediction, then multicollinearity doesn't create a problem (as standard errors are not required for predictions), so we get the same predictions irrespective of multicollinearity.
Diagnosis:
1. Check the correlation between all pairs of x variables.
2. If VIF is greater than 10 then we have very high multicollinearity.
VIFj = 1 / (1 - Rj2), where Rj2 is the R2 from the regression of xj on the other x variables.
The higher Rj2, the higher the VIF, and thus the higher the variance of our estimator.
VIF is called the variance inflation factor as it tells us by how much the variance is inflated because of multicollinearity.
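The VIF computation is just an auxiliary regression plus the formula above. A hedged sketch with simulated, deliberately collinear regressors:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is nearly a copy of x1

# Auxiliary regression of x2 on the other regressor(s) plus a constant.
Z = np.column_stack([np.ones(n), x1])
g = np.linalg.lstsq(Z, x2, rcond=None)[0]
resid = x2 - Z @ g
rj2 = 1 - (resid ** 2).sum() / ((x2 - x2.mean()) ** 2).sum()
vif = 1 / (1 - rj2)                  # VIF_j = 1 / (1 - Rj^2)
```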
Remedy:
1. Do nothing if we are using the model only for predictions.
2. Remove the variables with high multicollinearity (results won't be affected if we remove the age variable and just keep the experience variable).
3. If the variables shouldn't be correlated in reality but are correlated in the sample, the multicollinearity may be due to sampling error, and we can increase our sample size or take a different sample.
4. Transformation of variables: log transformation, first-difference form.
Heteroscedasticity
When the variance of the error term (equivalently, of y conditional on x) is not constant as we move along the x variables, we have a case of heteroscedasticity.
Causes:
● Outliers
● Learning effect where mistakes decrease with time.
● Error variance of spending tend to rise with income.
Impact:
1. Even in the presence of heteroscedasticity the OLS estimators are still unbiased and consistent (this depends on MLR.1 through MLR.4, not on homoscedasticity), so we still get correct predictions.
2. Heteroscedasticity does not affect R2 or adjusted R2 (since these estimate population quantities that are not conditional on X).
3. However, the OLS estimators no longer have minimum variance, i.e. they are no longer efficient.
4. The usual standard errors are biased (often underestimated), so the t and F statistics are no longer valid and the confidence intervals are unrealistically wide or narrow.
5. We can instead calculate heteroscedasticity-robust variances and standard errors (with a different formula), which are valid as the sample size becomes sufficiently large; the heteroscedasticity-robust t statistics are also asymptotically t distributed.
Heteroscedasticity is present if the variance of u varies with the x values.
Diagnosis:
The first step is always to plot the residuals and check for heteroscedasticity (residuals vs fitted plot). If the residuals are funnel shaped then we have heteroscedasticity.
1. Breusch-Pagan test: we run the regression of the squared residuals (as ui is not observable) on the xi and look at the p value of the F test.
If the p value is above 0.05 then we fail to reject H0, and heteroscedasticity is not detected.
If the p value is less than 0.05 then we reject the null hypothesis; our squared residuals are a function of the xi, which indicates the presence of heteroscedasticity.
2. White test: we run the regression of the squared residuals on the xi, the cross products of the xi and the squared values of the xi.
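The squared-residual regression in the first test can be sketched directly. A hedged illustration with simulated data (the common LM form n*R2 is used; compare it to a chi-squared critical value in practice): the statistic should be large when the error variance grows with x and small when it is constant.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(1, 5, size=n)

def lm_stat(y):
    """LM statistic n*R^2 from regressing squared OLS residuals on x."""
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ b) ** 2                       # squared residuals
    g = np.linalg.lstsq(X, u2, rcond=None)[0]
    r2 = 1 - ((u2 - X @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
    return n * r2

y_hom = 1 + 2 * x + rng.normal(size=n)          # constant error variance
y_het = 1 + 2 * x + x * rng.normal(size=n)      # error sd grows with x
lm_hom, lm_het = lm_stat(y_hom), lm_stat(y_het)
```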
Remedy
1. Weighted least squares, where we assign weights to the observations based on the error variance (the higher the variance, the smaller the weight), which neutralises the changing error variances.
2. Taking logs can also help in making the error variance constant.
3. White's (heteroscedasticity-robust) standard errors correct for the heteroscedasticity and can be used instead of the normal standard errors for hypothesis testing.
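Weighted least squares can be sketched as a transformation. A hedged illustration (here the variance function h(x) = x^2 is assumed known, which is rarely true in practice): dividing every term of the model by sqrt(h(x)) restores a constant error variance, and OLS on the transformed data is WLS.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)   # error sd proportional to x

# Transform: y/x = b0*(1/x) + b1 + u/x, where u/x now has constant variance.
w = 1.0 / x                                  # 1 / sqrt(h(x)) with h(x) = x^2
Xw = np.column_stack([w, np.ones(n)])        # transformed intercept, then slope
yw = y * w
b0_wls, b1_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
# b1_wls estimates the slope (true value 2.0 in this simulation).
```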
https://online.stat.psu.edu/stat501/lesson/7/7.3
econometrics complete guide
Partial R2
Suppose we have 3 predictors and we wish to know what percentage of variation not explained
by x1 is explained by x2 and x3 (proportion of variance explained by x2 & x3 that cannot be
explained by x1).
Categorical data notes
Predictor type
1.Qualitative-yes or no
Which can be converted to dummies
2.Quantitative variable
For qualitative variables
We get two different regression estimates- one with the presence of variable and one when the
variable is absent
If we don't use n-1 dummies:
Suppose x2 = 1 when the mother smokes. If we ignore the dummy variable trap and also add x3 = 1 when the mother doesn't smoke, then whenever x2 is 1, x3 is 0 (and vice versa), so together with the intercept they are perfectly collinear.
When you run the regression of x2 on x3, the resulting R2 is 1, so the VIF goes to infinity.
Thus we use n-1 dummies.
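The trap is easy to see in matrix terms: the two dummy columns sum to the intercept column, so the design matrix loses a rank and X'X is singular. A small deterministic sketch (the dummy values are made up for illustration):

```python
import numpy as np

smokes = np.array([1, 0, 1, 1, 0, 0, 1, 0])
not_smokes = 1 - smokes                            # the redundant second dummy

# With the intercept plus BOTH dummies, smokes + not_smokes equals the
# intercept column, so the matrix is rank deficient (perfect collinearity).
X_trap = np.column_stack([np.ones(8), smokes, not_smokes])
X_ok = np.column_stack([np.ones(8), smokes])       # n-1 dummies: no trap

rank_trap = np.linalg.matrix_rank(X_trap)          # 2, one less than 3 columns
rank_ok = np.linalg.matrix_rank(X_ok)              # 2, full column rank
```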
When two predictors do not interact, the predictors are said to have an additive effect on the dependent variable.
Two predictors interact when the effect of one predictor on the response variable depends on the value of the other.
Model building
An estimator is unbiased when the average value of the statistic computed over all possible random samples equals the value of the population parameter.
Autocorrelation:
When our error terms are correlated with the preceding error terms (you can predict the movement of the error term in the next time period from its preceding movement), they are said to be autocorrelated. (Autocorrelation can also affect non-time-series data.)
The error term is then no longer random.
Causes:
1.Omitted variable: when we leave out some important variable
Eg: fitting a regression of stock price on time alone might result in autocorrelated error terms, so we need to add the S&P 500 as another explanatory variable, which is also important in explaining our model.
2. Incorrect functional form
Eg: fitting a regression of weight lifted just on age will also result in autocorrelation, as the weight lifted decreases after a certain age is crossed, so we also need to add a quadratic term in age to explain the change in weight lifted correctly.
Impact:
1. Although the coefficients are unbiased, autocorrelation affects (typically underestimates) the standard errors, making the t statistics and p values unreliable (leading us to think that predictors are significant even when they are not).
2. As the standard errors are too small (for a time series with positive serial correlation and an independent variable that grows over time), the t statistics are too large, the p values look significant even when they aren't, and the confidence intervals are too narrow.
Diagnosis:
1.Durbin watson test:
H0: no first order autocorrelation; H1: first order autocorrelation.
It looks at successive error terms to determine autocorrelation at lag 1; the DW statistic is calculated, and the rejection region (which depends on the number of x variables) is looked up.
The Durbin-Watson statistic always has a value between 0 and 4. A value of 2.0 indicates no autocorrelation detected in the sample. Values from 0 to less than 2 point to positive autocorrelation and values from 2 to 4 indicate negative autocorrelation.
● Positive autocorrelation is when a positive error term at time t tends to be followed by a positive error term at time t+1.
● Negative autocorrelation is when a negative error term at time t tends to be followed by a negative error term at time t+1.
● Order of autocorrelation: when the error at time t is correlated with the error at time t-n, we have nth order autocorrelation.
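The DW statistic itself is a one-line formula, DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). A hedged sketch on simulated residuals (the AR(1) coefficient 0.8 is an assumption): it lands near 2 for uncorrelated errors and well below 2 under positive autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300

def durbin_watson(e):
    """Durbin-Watson statistic of a residual series."""
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()

e_white = rng.normal(size=n)        # no autocorrelation
e_ar1 = np.zeros(n)                 # positive first-order autocorrelation
for t in range(1, n):
    e_ar1[t] = 0.8 * e_ar1[t - 1] + rng.normal()

dw_white = durbin_watson(e_white)   # close to 2
dw_ar1 = durbin_watson(e_ar1)       # well below 2
```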
2. A correlogram: we can check for a pattern by plotting the autocorrelation across lags. A pattern in the results is an indication of autocorrelation; values significantly different from zero should be looked at with suspicion.
Remedies:
1.add in the omitted variable
2.correct the functional form issues
3.AR 1 procedure
4.cochrane orcutt procedure.
Akaike Information Criteria (AIC):
In simple terms, AIC estimates the relative amount of information lost by a given model. So the
less information lost the higher the quality of the model. Therefore, we always prefer models
with minimum AIC.
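Model selection by AIC can be sketched with one common regression form of the criterion, AIC = n*ln(SSR/n) + 2*(number of parameters); this specific formula is an assumption here, as the notes do not give one. With data simulated from a quadratic, the quadratic model should get the smaller AIC.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
x = rng.uniform(-3, 3, size=n)
y = 1.0 + 2.0 * x - 1.0 * x ** 2 + rng.normal(size=n)

def aic(X):
    """AIC = n*ln(SSR/n) + 2*k, with k counting all estimated coefficients."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = ((y - X @ b) ** 2).sum()
    k = X.shape[1]
    return n * np.log(ssr / n) + 2 * k

aic_linear = aic(np.column_stack([np.ones(n), x]))
aic_quad = aic(np.column_stack([np.ones(n), x, x ** 2]))
# The better-specified quadratic model has the lower AIC.
```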
Misspecification:
1. Omitted variable:
When we omit a variable (thinking that its beta coefficient is 0) which should have been included in the model.
Consequences:
● Autocorrelation
● Incorrectly estimated variance (which affects the methods of inference)
● The OLS coefficient estimators are biased
2. Inclusion of an irrelevant variable:
Consequences:
● The R2 is not accurate (the adjusted R2 decreases with the addition of each irrelevant variable)
● Our model becomes unnecessarily complicated
The ARIMAX model is similar to a multivariate regression model, but it allows us to take advantage of autocorrelation that may be present in the residuals of the regression to improve the accuracy of a forecast.