Econometrics


In simple linear regression we try to fit the best-fit line, whereas in multiple regression we try to fit the
hyperplane that best describes the relationship.

Mean squared error:


Measures the average of the squares of the errors, that is, the average squared difference
between the estimated values and the actual values.
The mean of the squared residuals shows how poorly the model fits the dataset. The best
model is thus the one with the minimum MSE.
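
For reference, with n observations, actual values y_i and fitted values ŷ_i, the formula is:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```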

Assumptions
1. The dependent variable is linear in parameters
This means that the power of each beta coefficient must be 1.
In other words, there should be a linear relationship between the dependent and independent
variables (otherwise we may have to use a nonlinear model). Linear regression, however,
always means linearity in parameters, irrespective of linearity in the explanatory variables.
We can check for linearity through a residuals vs fitted plot.

A curved pattern in the residuals vs fitted plot suggests that a nonlinear model would be better.
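
A minimal sketch of this check in Python; the DataFrame df and the column names y, x1, x2 are assumptions for illustration:

```python
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Fit the regression (df and the column names are placeholders)
res = smf.ols("y ~ x1 + x2", data=df).fit()

# Residuals vs fitted plot: a random scatter around zero supports linearity,
# a curved pattern suggests a nonlinear specification
plt.scatter(res.fittedvalues, res.resid, alpha=0.6)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```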

2. There should be random sampling of observations

This is done to reduce the chances of selecting outliers and to have at least some variance in the x
values for us to proceed with regression analysis (which is achievable through random
sampling).
3. The conditional mean of the error term must equal zero.
In other words, the regression model should be correctly specified. A correctly specified
regression model is one that contains all the relevant predictors, transformations and
interaction terms. If the conditional mean of a random variable given another random variable is
zero, then the covariance between those variables is also zero; this is another implication of
assumption 3.
Failure of the zero conditional mean assumption causes OLS to be biased
(omitted variable bias, as the conditional expectation is not zero because of the omitted variable).

4. No perfect multicollinearity, i.e. an independent variable cannot be expressed as a linear
combination of the other independent variables.

The above 4 assumptions make sure that the OLS estimator is unbiased.

5. The conditional variance of the error term is constant (no heteroscedasticity) and there is no
autocorrelation.

6. Optional: the error terms should be normally distributed.


Because u is the sum of many different unobserved factors affecting y, we can invoke the
central limit theorem to conclude that u has an approximately normal distribution.
As the beta coefficients are linear functions of the error terms, they are also normally
distributed.

We can use QQ plots to check for normality.

● The assumptions of normality, constant variance, and independence (error terms are iid) in linear models are about the residuals.

Under MLR.1-6, each β̂j follows a normal distribution with mean βj (the population
parameter) and variance Var(β̂j). This holds because each β̂j can be expressed as a
linear combination of the error terms.

Gauss-Markov theorem: when MLR assumptions 1-5 are satisfied, the OLS estimators are unbiased
and have minimum variance among all linear unbiased estimators.

Properties of Ols estimator


● Linear: each estimator is a linear function of the observed y values.
● Unbiased: OLS is unbiased under Assumptions MLR.1 through MLR.4, meaning that the
procedure by which the OLS estimates are obtained is unbiased when applied across
multiple samples; the average value of the estimated β̂ coefficients over those samples
equals the population value.
● Minimum variance.
These 3 properties make the OLS estimator the Best Linear Unbiased Estimator (BLUE).

Consistency: the value of the OLS estimator approaches the true value of the population parameter as
the sample size increases. Every β̂j has its own sampling distribution; if the estimator is consistent,
the distribution of β̂j becomes more and more tightly concentrated around βj as the sample size
grows. As n tends to infinity, the distribution of β̂j collapses to the single point βj.

Efficiency: the OLS estimators are efficient in that they have the minimum variance among all linear
unbiased estimators.

Standard error of the regression: the SE of the regression is the standard deviation of the y values
about the estimated regression line; it estimates the standard deviation of the error term.

Standard error of the OLS estimates: the standard deviation of the OLS estimates around the true
population parameters.

Note that as SSTj (the variation in xj) goes to 0, the variance of the OLS estimator goes to infinity (thus we
need some variation in x).
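
For reference, the standard sampling-variance formula under MLR.1-5, which is behind these statements:

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)},
\qquad SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2
```

where Rj² is the R² from regressing xj on the other regressors. The variance blows up as SSTj goes to 0 or as Rj² goes to 1.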

Var(β̃1), where we leave out x2, is always smaller than Var(β̂1), unless x1 and x2 are
uncorrelated in the sample, in which case the two estimators β̃1 and β̂1 are the same.

The higher variance for the estimator of β1 is the cost of including an irrelevant variable in the
model: even if β2 is zero, if x1 and x2 happen to have a high Rj², the variance of β̂1 will be
higher.

● Impact of omitting a variable: omitted variable bias

Unless β2 is 0 or x1 and x2 are uncorrelated, we will have omitted variable bias (see the expression below).
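
The standard two-regressor bias expression (true model y = β0 + β1·x1 + β2·x2 + u, but x2 is omitted):

```latex
\operatorname{E}(\tilde{\beta}_1) = \beta_1 + \beta_2\,\tilde{\delta}_1
```

where δ̃1 is the slope from regressing x2 on x1. The bias is zero only if β2 = 0 or x1 and x2 are uncorrelated (δ̃1 = 0).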


Endogeneity
Arises when an independent variable is correlated with the error term, so the error term impacts the
dependent variable not only directly but also through its correlation with the x variable.

Reasons:
1. Omitted variable
2. Selection bias

Effect:
The estimated impact of the independent variable is biased (it could be under- or overestimated).

Overfitting:
Overfitting is a condition where a statistical model begins to describe the random error
(noise) in the data rather than the relationships between variables. This leads to a misleadingly high R²,
low bias and high variance.

Underfitting:
Underfitting is a scenario where a model is unable to capture the
relationship between the input and output variables accurately, generating a high error rate on
both the training set and unseen data. High bias and low variance are good indicators of
underfitting.
In other words, when a model is underspecified it yields biased regression coefficients, and the
standard errors are overestimated, thus widening the confidence intervals.
When the regression model is overspecified, although it yields unbiased coefficients, it gives
inflated standard errors.

Transforming data to a normal distribution:


https://www.analyticsvidhya.com/blog/2021/05/how-to-transform-features-into-normal-gaussian-
distribution/

Testing relevance of the variables:


● We run a restricted regression (excluding the variables we think are insignificant) and an
unrestricted regression.
● The null hypothesis is that the beta coefficients on the excluded variables all equal
zero.
● Using individual t statistics here isn't right, as the individual t statistics may each be insignificant,
so we would mostly fail to reject the null hypothesis even when the variables matter jointly.
● So we use the F statistic, which considers the R² (or SSR) from both regressions.
● If the F statistic is significant, we reject the null hypothesis, so the variables do have an
impact on the dependent variable.
● If we fail to reject the null, the variables are jointly insignificant and we can drop them
(see the sketch below).
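
A minimal sketch of this restricted-vs-unrestricted comparison with statsmodels; df and the choice of x3 and x4 as the candidate variables are assumptions:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Unrestricted model includes the candidate variables; restricted model drops them
unrestricted = smf.ols("y ~ x1 + x2 + x3 + x4", data=df).fit()
restricted = smf.ols("y ~ x1 + x2", data=df).fit()

# Comparing nested OLS models reports the F statistic and its p value
# for H0: the coefficients on x3 and x4 are jointly zero
print(anova_lm(restricted, unrestricted))

# Equivalent joint test straight from the unrestricted fit
print(unrestricted.f_test("x3 = 0, x4 = 0"))
```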

t and F statistic:
The F statistic for testing exclusion of a single variable is equal to the square of the corresponding t
statistic.
So the t statistic is better for a single coefficient, while the F statistic is better for joint hypothesis tests
about several coefficients.

Outliers:
Outliers are observations which differ significantly from the other data points. They can affect
the normality of the data in the form of skewness and can bias the results.

Detection:
1. Outliers can be detected using box plots and scatter plots. The z score can also be used for detecting
outliers.
2. The discrepancy or influence of outliers can be seen through studentized residuals.
Studentized (standardized) residuals tell us how large the residuals are in standardized terms.
An observation with a standardized residual of more than 3 can be considered an outlier.
3. For normal data, 99.7% of observations lie within 3 standard deviations of the mean, so roughly
0.3% of observations being flagged as outliers is tolerable. If the percentage of outliers is significantly
higher than that, we need to treat the outliers.

Imputation:
Outliers that are due to error can be removed, but outliers that are due to a special cause and
might recur in the future shouldn't be removed.
Outliers can also be replaced with the median value.

Scaling of variables
If the dependent variable is multiplied by c, then all the coefficients are multiplied by c.
If an independent variable is multiplied by c, then its coefficient (and its standard error) is divided by c.
For example, when we multiply an x variable by 16:
The beta value and its standard error are divided by 16.
SSR and the R² value stay the same (the fitted values and residuals don't change).
As the standard error is 16 times smaller, the t statistic remains the same.
The endpoints of the confidence interval for that coefficient are 16 times smaller.
Standardization of all the variables: when we standardize the y and x variables,
the resulting beta coefficients tell us the standard deviation change in y when x
changes by 1 standard deviation, and there is no intercept term present.
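
A small sketch of the standardized (beta-coefficient) regression described above; a numeric DataFrame df with columns y, x1, x2 is an assumption:

```python
import statsmodels.api as sm

# Standardize y and the x variables: subtract the mean, divide by the std dev
z = (df - df.mean()) / df.std()

# Regression on the standardized variables without an intercept:
# each coefficient is the sd change in y for a 1-sd change in that x
X = z[["x1", "x2"]]
y = z["y"]
beta = sm.OLS(y, X).fit()   # no add_constant: the intercept is zero by construction
print(beta.params)
```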

Quadratic terms:
Quadratic functions are used quite often in applied economics to capture decreasing or
increasing marginal effects (polynomial regression).
If below a certain level x has a positive impact on y and beyond it the impact is negative, then in such
cases the quadratic term is important.
Since x^2, x^3 are nonlinear functions of x, they don't violate the assumption of no perfect (linear) multicollinearity.

Interaction terms
An interaction effect occurs when the effect of an independent variable on the dependent variable
changes depending on the value of another independent variable.
Sometimes, along with the individual effects of the variables, the variables act together in such a way
that they reinforce the effect on y.
E.g. when considering the effect of rainfall and fertilizer, we need to take into account the
interaction of both, which makes the regression surface curve upward (reinforcing the effect on y).

Sometimes it is natural for the partial effect, elasticity, or semi-elasticity of the dependent
variable with respect to an explanatory variable to depend on the magnitude of yet another
explanatory variable.
For example, when we model the price of a player, we may need to consider the interaction effect between
goals and assists.
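
In a formula interface this is just an extra product term; a sketch with statsmodels (df and the column names price, goals, assists are assumptions based on the example above):

```python
import statsmodels.formula.api as smf

# goals:assists adds the interaction term; goals * assists expands to goals + assists + goals:assists
model = smf.ols("price ~ goals + assists + goals:assists", data=df).fit()
print(model.summary())
```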
R squared
The usual R² never falls (and usually rises) as more independent variables are added, because
SSR never goes up. Adjusted R² imposes a penalty for adding insignificant explanatory variables:
its formula depends explicitly on k, the number of independent variables.
Adjusted R² increases if, and only if, the t statistic on the new variable is greater than one in
absolute value.
Adjusted R² can also help us select between two models with the same number of (but different)
explanatory variables, i.e. nonnested models.
The F statistic only allows us to compare nested models, where one model (the restricted one) is a special
case of the other.
However, adjusted R² cannot help us compare models whose dependent variables
have different functional forms.
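
For reference, the adjusted R² formula, with n observations and k independent variables (excluding the intercept):

```latex
\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
```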
Log model
Expressing variables in log form gives us interpretations in terms of percentages.
Logs are often used for dollar amounts that are always positive, as well as for variables such as
population, especially when there is a lot of variation.
Taking the log greatly reduces the variation of a variable, making OLS estimates less prone to
outlier influence.

Auxiliary regression: a regression used to compute a test statistic, such as the test statistics for
heteroskedasticity and serial correlation, or any other regression that does not estimate the
model of primary interest.

Multicollinearity
Multicollinearity is a phenomenon where one predictor is highly correlated with other predictor
variables.
Perfect multicollinearity:
When one independent variable can be expressed as an exact linear combination of the other
independent variables.

Intuition :
Our beta coefficients tell us the change in y as x changes, keeping the other variables constant.
However, when we have both age and experience in our model, holding age constant as
experience increases doesn't really make sense.
So when some variables which should have been significant in the model have
insignificant t statistics, we may have a multicollinearity problem on our hands.

Impact:
● It inflates the variances (standard errors), which results in insignificant t statistics and p values.
● The overall model fit is not affected, i.e. R² and the F statistic can still be significant.
● If we are using the model only for prediction, multicollinearity doesn't create a
problem (standard errors are not required for predictions), so we get the same
predictions irrespective of multicollinearity.
Diagnosis:
1. Check the correlation between all pairs of x variables.
2. If the VIF is greater than 10, we have very high multicollinearity.
VIFj = 1 / (1 - Rj²)
Rj² = R² from the regression of xj on the other x variables.
The higher Rj² is, the higher the VIF, and thus the higher the variance of the estimator.
The VIF is called the variance inflation factor because it tells us by how much the variance is inflated
because of multicollinearity (see the sketch below).
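
A minimal sketch of the VIF calculation with statsmodels; df and the predictor names x1, x2, x3 are assumptions:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Design matrix of the predictors plus a constant (df is assumed)
X = sm.add_constant(df[["x1", "x2", "x3"]])

# VIF_j = 1 / (1 - Rj^2); values above ~10 indicate severe multicollinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```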

Remedy:
1. Do nothing if we are using the model only for predictions.
2. Remove the variables with high multicollinearity (results won't be affected if we
remove the age variable and just keep the experience variable).
3. If the variables shouldn't be correlated in reality but are correlated in the sample, the
multicollinearity may be due to sampling error, and we can increase our sample size or
take a different sample.
4. Transform the variables, e.g. log transformation or first-difference form.

Heteroscedasticity
When the variance of the error term (and hence of y given x) is not constant as we move along the
x variables, we have a case of heteroscedasticity.

Causes:
● Outliers
● Learning effects, where mistakes decrease with time.
● The error variance of spending tends to rise with income.

Impact:
1. Even in the presence of heteroscedasticity the OLS estimators are still unbiased and consistent
(this depends only on MLR.1 through MLR.4; the homoscedasticity assumption is not needed for
unbiasedness), so predictions are still valid.
2. R² and adjusted R² are not affected, since these estimate population quantities that are not
conditional on X.
3. However, the OLS estimators no longer have minimum variance, i.e. they are no longer efficient.
4. The usual standard errors are biased (often underestimated), so the usual t and F statistics and
p values are no longer valid, and confidence intervals are unrealistically wide or narrow.
5. Heteroscedasticity-robust variances and standard errors can still be calculated (with a different
formula); they are valid as the sample size becomes sufficiently large, and the heteroscedasticity-robust
t statistics are asymptotically t distributed.

Diagnosis:
The first step is always to plot the error terms and check for heteroscedasticity (residuals vs
fitted plot). If the residuals are funnel shaped, we have heteroscedasticity.

Thus, to test formally for heteroscedasticity, we check whether u is a function of the xi.


In both the BP test and the White test (where we also add the interaction and squared terms of the xi), we
run a regression of the squared residuals on the xi terms.
1. BP test:
H0: all the coefficients obtained from the regression of ûi² on the xi equal 0 (homoscedastic error terms).

We run the regression of the squared residuals (since the true ui are not available) on the xi and look at the p
value of the F (or LM) test.
If the p value is above 0.05, we fail to reject H0, and thus heteroscedasticity is not present.
If the p value is less than 0.05, we reject the null hypothesis; the squared residual is a function
of the xi, which indicates the presence of heteroscedasticity.

2. White test:
We run the regression of the squared residuals on the xi, the cross products of the xi, and the squared
values of the xi.

H0: the coefficients of this regression equal 0.

Instead of this tedious process, a shortcut is to run the regression of ûi² on the fitted values of y and the
squared fitted values (we use fitted values as they are functions of all the independent variables).
If the p value of this regression's test statistic is greater than 0.05, there is no evidence of
heteroscedasticity (see the sketch below).
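
A minimal sketch of both tests using statsmodels' built-in diagnostics; df and the column names are assumptions:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

res = smf.ols("y ~ x1 + x2", data=df).fit()
exog = res.model.exog  # constant plus the x variables

# Breusch-Pagan: regresses squared residuals on the x's; a small p value indicates heteroscedasticity
bp_lm, bp_lm_p, bp_f, bp_f_p = het_breuschpagan(res.resid, exog)
print("BP LM p value:", bp_lm_p, "F p value:", bp_f_p)

# White: also includes squares and cross products of the x's
w_lm, w_lm_p, w_f, w_f_p = het_white(res.resid, exog)
print("White LM p value:", w_lm_p, "F p value:", w_f_p)
```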

3. Goldfeld-Quandt test:

The observations are arranged in ascending order of the x variable, the central observations are omitted,
and the data is divided into two parts. Two separate regressions are run and an F statistic is calculated
as the ratio of the error variances from the two parts. If the F value is greater than the critical value, we
reject the null hypothesis of homoscedasticity; if it is less than the critical value, we conclude
homoscedasticity.

Remedy
1. Weighted least squares, where we weight the observations based on the error variance (the higher the
variance, the smaller the weight), which neutralises the changing error variance.

The resulting (weighted) errors are homoscedastic.

2. Taking logs can also help make the error variance constant.
3. White's heteroscedasticity-robust standard errors correct for the heteroscedasticity and can be used
instead of the usual standard errors for hypothesis testing (see the sketch below).
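
A minimal sketch of remedies 1 and 3 with statsmodels; df, the column names, and the assumed variance structure for the weights are all illustrative assumptions:

```python
import statsmodels.formula.api as smf

# Heteroscedasticity-robust (White) standard errors: same coefficients,
# corrected standard errors, t statistics and p values
robust = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC1")
print(robust.summary())

# Weighted least squares, assuming the error variance is proportional to x1
# (this weighting scheme is an illustrative assumption, not a general rule)
wls = smf.wls("y ~ x1 + x2", data=df, weights=1.0 / df["x1"]).fit()
print(wls.summary())
```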

https://online.stat.psu.edu/stat501/lesson/7/7.3
econometrics complete guide
Partial R2
Suppose we have 3 predictors and we wish to know what percentage of the variation not explained
by x1 is explained by x2 and x3 (the proportion of variance explained by x2 and x3 that cannot be
explained by x1).
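
In terms of error sums of squares (SSE) from the reduced and full models, this is:

```latex
R^2_{x_2, x_3 \mid x_1} = \frac{SSE(x_1) - SSE(x_1, x_2, x_3)}{SSE(x_1)}
```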
Categorical data notes
Predictor types:
1. Qualitative (e.g. yes or no), which can be converted to dummies.
2. Quantitative variables.
For qualitative variables we get two different regression estimates: one when the category is present and
one when it is absent.
If we don't use n-1 dummies: suppose x2 = 1 when the mother smokes and, ignoring the dummy variable
trap, we also add x3 = 1 when the mother doesn't smoke. Now whenever x2 is 1, x3 is 0, so they are
perfectly correlated. When you run the regression of x2 on x3, the resulting R² is 1, which pushes the VIF
to infinity.
Thus we use n-1 dummies (see the sketch below).
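
A small sketch with pandas; df and the column name "smoker" are assumptions based on the example above:

```python
import pandas as pd

# drop_first=True keeps n-1 dummies per categorical variable,
# avoiding the dummy variable trap (perfect multicollinearity)
dummies = pd.get_dummies(df["smoker"], prefix="smoker", drop_first=True)
df_model = pd.concat([df.drop(columns="smoker"), dummies], axis=1)
```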
When two predictors do not interact, the predictors are said to have an additive effect on the
dependent variable.
Two predictors interact when the effect of one predictor on the response variable depends on the value
of the other.
Model building
An estimator is unbiased when the average value of the statistic, computed across all possible
random samples, equals the value of the population parameter.

Autocorrelation:
When the error terms are correlated with the preceding error terms (you can predict the value
of the error term in the next time period based on its preceding values), they are said to be
autocorrelated (autocorrelation can also affect non-time-series data).
The error term is then no longer random.

Causes:
1. Omitted variable: when we leave out some important variable.
E.g. fitting a regression of a stock price on time alone might result in autocorrelated error terms,
so we need to add the S&P 500 as another explanatory variable, which is also important in
explaining our model.
2. Incorrect functional form.
E.g. fitting a regression of weight lifted on age alone will also result in autocorrelation,
as the weight lifted decreases after a certain age is crossed,
so we also need to add a quadratic term in age to explain the change in weight lifted correctly.

Impact:
1. Although the coefficients are unbiased, autocorrelation affects (typically underestimates) the standard
errors, making the p values and t statistics unreliable (leading us to think that predictors are significant
even when they are not).
2. Because the standard errors are too small, the t statistics are too large, p values look significant even
when they aren't, and the confidence intervals are too narrow.
3. Standard errors that are too small are especially likely for a time series with positive serial correlation
and an independent variable that grows over time.

Diagnosis:
1. Durbin-Watson test:
H0: no first-order autocorrelation; H1: first-order autocorrelation.
It looks at successive error terms to determine autocorrelation at lag 1; the DW statistic is
calculated (its critical values depend on the number of x variables and observations) and then the
rejection region is determined.
The Durbin-Watson statistic always has a value between 0 and 4. A value of 2.0
indicates there is no autocorrelation detected in the sample. Values from 0 to less than 2 point to
positive autocorrelation, and values from 2 to 4 indicate negative autocorrelation (see the sketch
after the bullets below).

● Positive autocorrelation is when a positive error term tends to be followed by a positive error term at
time t+1.
● Negative autocorrelation is when a positive error term tends to be followed by a negative error term at
time t+1 (errors alternate in sign).
● Order of autocorrelation: when the error term is correlated with the error term n periods earlier (lag n),
we have autocorrelation of order n.
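
A minimal sketch of the Durbin-Watson check on the residuals of an OLS fit; df and the column names are assumptions:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

res = smf.ols("y ~ x1 + x2", data=df).fit()

# Close to 2 suggests no first-order autocorrelation;
# well below 2 suggests positive, well above 2 negative autocorrelation
print(durbin_watson(res.resid))
```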

2. A correlogram: we can check for a pattern by plotting the autocorrelation of the residuals across lags. A
systematic pattern in the results is an indication of autocorrelation; any values significantly different from
zero should be looked at with suspicion.

3. Breusch-Godfrey test:

H0: no autocorrelation; H1: autocorrelation of any order up to lag p.
To test for autocorrelation of order higher than 1, we use this test.
● We run an auxiliary regression of the current residuals on the lagged residuals (and the regressors):
et = B0 + B1·et-1 + B2·et-2 + … + Bp·et-p + ut
● We obtain the chi-square statistic from this regression (based on its R²; it follows a chi-square
distribution with p (lags) degrees of freedom), and if it is significant, we reject the null
hypothesis and conclude that there is autocorrelation (see the sketch below).
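
A minimal sketch with statsmodels; df, the column names, and the lag choice of 4 are assumptions:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

res = smf.ols("y ~ x1 + x2", data=df).fit()

# Tests H0: no autocorrelation up to lag p (here p = 4, an arbitrary choice)
lm, lm_p, fval, f_p = acorr_breusch_godfrey(res, nlags=4)
print("LM p value:", lm_p, "F p value:", f_p)
```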

Remedies:
1. Add in the omitted variable.
2. Correct the functional form issues.
3. The AR(1) procedure.
4. The Cochrane-Orcutt procedure.
Akaike Information Criteria (AIC):
In simple terms, AIC estimates the relative amount of information lost by a given model: the less
information lost, the higher the quality of the model. Therefore, we always prefer models
with the minimum AIC.
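
For reference, with k estimated parameters and maximized likelihood L̂:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```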

Misspecification:

1. Omitted variable:
When we omit a variable (thinking that its beta coefficient is 0) which should have been
included in the model.
Consequences:
● Autocorrelation
● Incorrectly estimated variances (which affects the methods of inference).
● The OLS coefficient estimators are biased.

2. Addition of irrelevant variables:

Here we think that the beta coefficient of the variable is not zero, whereas in reality the variable
has no significant effect on the dependent variable. Even here the OLS estimators are unbiased,
and the confidence intervals and hypothesis testing procedures remain valid (although, as noted
earlier, the variances of the estimators can increase if the irrelevant variable is correlated with the
other regressors).

Consequences:
● The R² is not an accurate guide (the adjusted R² decreases with the addition of each
irrelevant variable).
● Our model becomes unnecessarily complicated.

An ARIMAX model is similar to a multivariate regression model, but it allows us to take advantage of
autocorrelation that may be present in the residuals of the regression to improve the accuracy of a
forecast.
