Econometrics 1 Cumulative Final Study Guide
STUDY GUIDE:
1. Section 1: Econometrics
a. Purposes
b. Process
c. Model
d. Properties of variables
2. Section 2: Review of statistics
a. Random variables
b. PDFs
c. CDFs
d. Expected Value
e. Variance
f. Correlation coefficient
3. Section 3: Simple Regression
Analysis
a. Assumptions
b. Statistical inference
c. OLS method
d. Properties of estimators
e. Deriving regression coefficients
(simple model)
f. Residual
g. Variances (precision) of
regression coefficients
h. Standard error of regression
coefficients
i. Sum of squares
j. R2
k. Random/nonrandom components
l. Types of data
4. Section 4: Hypothesis testing
a. Formation of null
b. One vs two-tailed tests
c. testing hypotheses relating to
regression coefficients
d. Confidence intervals
e. Type 1 / 2 Errors
f. P-value
g. F-tests
h. Interpreting Stata Tables
i. ANOVA table
5. Section 5: Multiple Regression
Analysis
a. Definition
b. Assumptions
c. Deriving a fitted model
d. Efficiency
e. Precision
f. Standard errors of coefficients
g. Multicollinearity
h. Further Analysis of Variance
i. Adjusted/Corrected R2
j. Hedonic pricing (prediction)
6. Section 6: Nonlinear models &
transformation of variables
a. Linearity in var/par
b. Linearizing functions nonlin in var
c. Linearizing functions nonlin in
params
d. Log model
e. Semilog models
f. Economies of scale
g. Linear/semilog/loglog table
h. Disturbance terms
i. Comparing linear & logarithmic
specifications
j. Making RSS comparable: Box &
Cox
k. Models with quadratic and
interactive variables
l. Quadratic model
m. Interactive explanatory variables
n. Ramsey's RESET test
o. Nonlinear regression
7. Section 7: Dummy Variables
a. Definition
b. reasons for use
c. Combining multiple functions into
one
d. Standard errors & hypothesis
testing of dummies
e. More elaborate dummies
f. Fitting individual implicit costs
g. Changing reference categories
h. Dummy trap
i. Multiple sets of dummies
j. Slope dummy (interactive)
k. F-testing dummies
l. Dummies in log/semilog
m. Chow test
n. Dummies vs chow
Properties of R2 (even the obscure ones). Know when, why, and by how much R2 changes in
specific
F-tests and degrees of freedom
Sums of squares
Variances
Correlations
Differences between simple and multiple regression assumptions and processes
Estimators: finding them, properties, etc.
OLS methods
R2! It seems to come up on every test
Errors (probabilities, definitions, etc)
Properties of the null hypothesis
Be sure to make a good note sheet. Try to analyze the trends of test questions in the past, and
figure out what to expect on this.
Good luck! Feel free to email me or send a message directly through Notehall with questions, rather
than leaving a nasty review if I made a small mistake. If you liked the guide, please rate it!
Section 1: Introduction
Econometrics: the application of statistical and mathematical methods to the analysis of economic data, with the
purpose of giving empirical content to economic theories and verifying/refuting them (Maddala)
Purposes of econometrics
1. Put empirical content to theory
2. Conduct hypothesis tests
3. Forecast
Process of econometrics
Theory: Identify a theorized relationship, figure out what you need from the theory to address your
problem
Collect data
Answer the problem by describing it, testing your hypothesis, and forecasting
Econometric model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_K X_K + u$
Y: dependent/explained/caused/endogenous variable
Properties of variables
EX: The outcomes of rolling a die are specific integers. There is no possibility of getting a 1.5
Continuous RVs can take on any value in a given range (infinite # of possibilities)
o EX: The temp in a room can fall at any value between 55 and 85 degrees. This means that it
could be 60.74274 or 55.1123 degrees
The chance of getting a specific value of X in this case is 0, because the prob would equal $1/\infty$
Probability distributions: pdfs are formulas that give the prob of getting different values of the RV
Uniform distributions: the chances of getting specific values of the RV X are equal for all outcomes.
If a pdf is distributed uniformly, the chances of getting a specific X value is constant across a range,
and is zero outside of the range
The base * height approach works for uniform dists, but you'll need integrals for cdfs
To derive a cdf:
o Take the integral of the pdf: $\int_0^C \frac{1}{4}\,dX = \frac{X}{4}\Big|_0^C = \frac{C}{4}$, where C is any given X value, and $\frac{1}{4}$ is the height of the pdf
The height of the cdf, F(X), at a given X value gives the chance of getting a value between $-\infty$ and X
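The base * height idea can be sketched in a few lines of Python. This is only an illustration: it assumes the room temperature from the earlier example is uniformly distributed between 55 and 85 degrees (the notes do not say that), and the function name `uniform_cdf` is made up for this sketch.

```python
def uniform_cdf(x, low, high):
    """Return P(X <= x) for X uniformly distributed on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    # Area of the rectangle from low to x: base (x - low) times height 1/(high - low).
    return (x - low) / (high - low)


# Assumed example: room temperature uniform between 55 and 85 degrees.
p_below_70 = uniform_cdf(70, 55, 85)
p_between_60_and_70 = uniform_cdf(70, 55, 85) - uniform_cdf(60, 55, 85)
print(p_below_70, p_between_60_and_70)
```

The prob of landing in an interval is the difference between two cdf heights, which is why the prob of any single exact value is zero.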
EV of a discrete RV: the weighted avg of all outcomes, each weighted by its associated prob
o $E(X) = x_1 p_1 + \dots + x_n p_n$
EV of a continuous RV: $E(X) = \int x f(x)\,dx$
Var(X): $\sigma_X^2 = E[(X - \mu_X)^2] = (x_1 - \mu_X)^2 p_1 + \dots + (x_n - \mu_X)^2 p_n$
= (1
1
)2
= (
2 = (
) 2
has a sample
2, then its sample mean
(
variance of 2 =
1 )
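Pop cov: $\sigma_{XY} = Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Sample cov: $s_{XY} = \frac{1}{n-1}\sum (X_i - \bar{X})(Y_i - \bar{Y})$
Pop corr coef: $\rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2 \sigma_Y^2}}$
The pop corr coef is better than the cov because it is not subject to changes in the units of measure
Ranges btwn -1 and 1, w/ neg numbers showing a negative correlation, pos showing positive
Sample corr coef: $r_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$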
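To see why the corr coef is preferred over the covariance, here is a small Python sketch. The height/weight numbers are invented for illustration and the helper names are hypothetical. Changing the units of one variable rescales the covariance but should leave the correlation unchanged.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_corr(xs, ys):
    """Sample correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: heights in meters and weights in kg.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 68.0, 70.0, 80.0, 85.0]
height_cm = [100 * h for h in height_m]  # same data, different units

print(sample_cov(height_m, weight_kg), sample_cov(height_cm, weight_kg))
print(sample_corr(height_m, weight_kg), sample_corr(height_cm, weight_kg))
```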
Linear: $Y = \beta_1 + \beta_2 X + u$
Not linear: $Y = \beta_1 X^{\beta_2} + u$
2) There is some variation in the regressor in the sample, and is measured without error
o
A population that is unique and varied could be referred to as heterogeneous, one with little
variation is homogenous
Notice to the right that the homogenous pop would NOT provide data that would allow us to
extrapolate to low & high values of X
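3) The disturbance term has zero expectation: $E(u_i) = 0$
4) The disturbance term is homoscedastic (its values will have a constant pop variance)
$\sigma_{u_i}^2 = \sigma_u^2$ for all i
5) The values of the disturbance term are distributed independently of each other. This means that u is not subject to autocorrelation (no systematic association btwn $u_i$ & $u_j$): their pop covariance is equal to zero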
While your sample won't describe the entire population perfectly, estimating an econometric model
allows you to get a good guess of population characteristics
Note that a fitted model fits the sample data better than it fits the population. This is by design.
Ordinary least squares method: best way to obtain estimators. The Gauss-Markov theorem states that OLS
is BLUE:
Unbiased [$E(b_j) = \beta_j$]
Estimator properties:
1 2 with 1 2
2) Efficiency: want pdf to be as concentrated as possible around the mean (want pop var to be as
small as possible)
3) Consistency: an estimator is said to be consistent if it
has a prob limit, so that its dist closes into a spike
as the sample size grows. The spike is located at the true
value of the characteristic you are trying to estimate.
Note that this is convergence in probability, not the central limit theorem (the CLT says the dist of the estimator becomes approximately normal in large samples)
Deriving estimators of regression coefs (simple model)
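$b_1 = \bar{Y} - b_2 \bar{X}$ (intercept)
$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ (slope)
We've already covered this extensively, but if you want further proof check out pages 85 through 92
Define the residual for each observation: $e_i = Y_i - \hat{Y}_i$. The residual is the vertical distance between the actual and
fitted values of Y
$RSS = \sum e_i^2$ (e.g. with 3 observations, $RSS = e_1^2 + e_2^2 + e_3^2$)
First-order conditions for a minimum: take partial derivatives of RSS and set them equal to zero
o $\frac{\partial RSS}{\partial b_1} = 0$, $\frac{\partial RSS}{\partial b_2} = 0$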
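$\sigma_{b_1}^2 = \frac{\sigma_u^2}{n}\left(1 + \frac{\bar{X}^2}{MSD(X)}\right)$
$\sigma_{b_2}^2 = \frac{\sigma_u^2}{n\,MSD(X)} = \frac{\sigma_u^2}{\sum (X_i - \bar{X})^2}$
$MSD(X) = \frac{1}{n}\sum (X_i - \bar{X})^2$
$s.e.(b_1) = \sqrt{s_u^2 \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right]}$
$s.e.(b_2) = \sqrt{\frac{s_u^2}{\sum (X_i - \bar{X})^2}}$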
However, it does not tell you whether your estimate comes from the middle of the pdf, or one
of the tails
With a greater
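Explained SS: $ESS = \sum (\hat{Y}_i - \bar{Y})^2$
Unexplained SS: $RSS = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
Total SS: $TSS = \sum (Y_i - \bar{Y})^2 = ESS + RSS$
$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$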
Its value can be thought of as the proportion of the variation in Y explained by the model
The addition of variables to a model can never decrease R2, and will typically increase it, but often by negligible amounts
Added observations can decrease R2. If you add an observation that's an outlier in the population (very
different from the rest), it can contribute to the fitted model being less accurate
Note that in the case of multiple reg models, it is generally impossible to measure each explanatory variable's
contribution to the overall R2
I noticed that the properties of R2 (even some obscure ones) have shown up in the T/F questions in
every test (old and current). Make sure you are very comfortable answering questions about it
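Since R2 questions keep coming up, it may help to compute the sums of squares by hand once. The sketch below fits a simple regression with the usual OLS formulas on made-up data (the numbers and function names are assumptions for illustration) and checks that TSS = ESS + RSS, so that ESS/TSS and 1 - RSS/TSS give the same R2.

```python
def fit_simple_ols(xs, ys):
    """Return (b1, b2) for the fitted line Y-hat = b1 + b2 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b1 = y_bar - b2 * x_bar
    return b1, b2


def sums_of_squares(xs, ys, b1, b2):
    """Return (TSS, ESS, RSS) for the fitted line."""
    y_bar = sum(ys) / len(ys)
    fitted = [b1 + b2 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)
    ess = sum((f - y_bar) ** 2 for f in fitted)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return tss, ess, rss


# Made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2 = fit_simple_ols(xs, ys)
tss, ess, rss = sums_of_squares(xs, ys, b1, b2)
r_squared = ess / tss
print(b1, b2, r_squared)
```

The decomposition TSS = ESS + RSS only holds when the fitted line comes from OLS with an intercept, which is the case here.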
Random/nonrandom components: With the model we're using, Y depends on the non-random X
according to $Y = \beta_1 + \beta_2 X + u$
Nonrandom component of Y: $\beta_1 + \beta_2 X$
Random component: u
Types of data
Cross-sectional data: observations relating to units of observation at one moment in time (units could
be ppl, households, companies, countries, etc)
Time series data: repeated observations through time on the same subjects (ex: quarterly GDP)
Panel data: basically a hybrid of the two above, repeated observations on the same elements through
time
If you can, always specify the null as the thing that you don't believe (straw person principle),
and the HA as the thing you want to prove to be true
Closed parameters: you must account for all possible values of the parameter in H0 & H1.
Closed: $H_0: \beta_1 \geq 0$, $H_1: \beta_1 < 0$
Not closed: $H_0: \beta_1 = 0$, $H_1: \beta_1 < 0$
Note that this null/alternative does not account for positive values of $\beta_1$
Two-tail test: testing whether your parameter has a specific value or not. EX: $H_0: \beta_2 = 3$, $H_A: \beta_2 \neq 3$
Tcrit will be lower in the one-sided test, making it easier to prove your belief
We originally looked at hypotheses through the z-test, but now we've realized that there are too many
factors at play to use such a simple test (ex: df)
o In the case of simple regression coefficients, we test hypotheses with the t-distribution
The t dist is symmetric and bell shaped, but has heavier tails. This means that it's more likely to
produce values that are very far from the true mean
Reject $H_0$ if the discrepancy between $b_2$ and $\beta_2^0$ is too great.
To find the critical value, refer to the t table, and find where the confidence interval (top of the
$t = \frac{b_2 - \beta_2^0}{s.e.(b_2)}$, where $\beta_2^0$ is the value of $\beta_2$ under $H_0$
The t-test helps you to determine whether X has some effect on Y, rather than a specific effect
The significance level is the risk of a Type 1 error you are willing to accept. Rejecting at the 5% level means you're
pretty sure of the result, but rejecting at the 1% level means you're VERY sure
If the t stat is VERY high, try testing it at the 0.1% level (99.9% confident). This reduces the risk of a
Type 1 Error
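True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = b_1 + b_2 X$
A hypothetical value $\beta_2^0$ is rejected if $\frac{b_2 - \beta_2^0}{s.e.(b_2)} > t_{crit}$ or $\frac{b_2 - \beta_2^0}{s.e.(b_2)} < -t_{crit}$
Confidence interval (all hypothetical values that are not rejected): $b_2 - s.e.(b_2) \times t_{crit} \leq \beta_2 \leq b_2 + s.e.(b_2) \times t_{crit}$
Ex: $1.999 \leq \beta_2 \leq 2.911$. In this case, we would reject hypothetical values below 1.999
or above 2.911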
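The link between the t-test and the confidence interval can be sketched as follows. The coefficient, standard error, and critical value below are placeholders chosen for round numbers, not values from the course; in practice the critical value comes from the t table for your df and significance level.

```python
def t_statistic(b2, hypothesized_beta2, se_b2):
    """t = (estimate - hypothesized value) / standard error."""
    return (b2 - hypothesized_beta2) / se_b2


def confidence_interval(b2, se_b2, t_crit):
    """All hypothetical values of beta2 that a two-tailed test would not reject."""
    return b2 - t_crit * se_b2, b2 + t_crit * se_b2


# Hypothetical regression output (assumed numbers).
b2, se_b2 = 2.0, 0.5
t_crit = 2.0  # placeholder; look up the real value in a t table

low, high = confidence_interval(b2, se_b2, t_crit)
print(low, high)
print(t_statistic(b2, 1.5, se_b2))  # inside the interval, so |t| < t_crit
print(t_statistic(b2, 3.5, se_b2))  # outside the interval, so |t| > t_crit
```

A hypothesized value sits inside the interval exactly when its two-tailed t-test fails to reject, which is why the two approaches always agree.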
Error types
Type 1 error: where the null is rejected when it's actually true
o Prb(type 1) is the size of the rejection region. In a 5% test, a true null is rejected 5% of the
time
Type 2 error: when the null isn't rejected, but it's false
o Prb(type 2) is beta. Find your rejection region's t or z value, then subtract its cumulative prb from 1 (ex: z=1,
prb(type 2) = 1 - 0.8413 = 0.1587)
To see type 1 / 2 errors graphically, look at problem 3 on the winter 2010 MT2, and IV(d) on this
year's MT2
P>|t|: the p-value is the prb of obtaining a t stat at least as extreme as the one observed as a matter of chance (if the null is true)
A p-value less than 0.01 means that the prb is less than 1%, which means that the null would be
rejected at the 1% level
The p-value approach tells you more than the 5%/1% approach, bc it gives the exact prb of a type 1
error if the null were true
F-tests:
Even if there is no relationship between Y & X, in any sample there may appear to be one
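In other words, F-tests give you a critical value for R2, which gives a cutoff point for when you can
declare that the relationship between Y & X is significant (for a given significance level)
o $F(k-1, n-k) = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1 - R^2)/(n-k)}$
o In the simple model (k = 2): $F(1, n-2) = \frac{R^2}{(1 - R^2)/(n-2)}$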
The critical value of F gives a cutoff point (just like t), at which point you can conclude that the
variables are correlated
o
To find the critical value of F, refer to the corresponding table which matches your confidence
interval
Find where the df(num) and df(denom) intersect. This gives you Fcrit
If your calculated value of F is greater than Fcrit, you can reject the null
For our purposes we only need to find Fcrit that lies to one side of the distribution (think of it as
a one-tailed test)
$F = t^2$ is only true in simple models! F & t play very different roles in multi reg models
Proof on p. 147
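In a simple regression, the F stat for the equation equals the square of the t stat on the slope. The sketch below checks this numerically on made-up data (the sample is an assumption for illustration), using the R2 form of the F stat and the usual standard error of the slope.

```python
import math

# Made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.5, 5.0, 8.5, 9.0, 13.0]

n, k = len(xs), 2  # k = 2 parameters: intercept and slope
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b1 = y_bar - b2 * x_bar

rss = sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))
tss = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - rss / tss

# F stat from R2, and t stat for H0: beta2 = 0.
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
se_b2 = math.sqrt((rss / (n - k)) / sxx)
t_stat = b2 / se_b2

print(f_stat, t_stat ** 2)
```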
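ANOVA table:
Source | SS | df | MS = SS/df
Model | ESS | K-1 | ESS/(K-1)
Residual | RSS | N-K | RSS/(N-K)
Total | TSS | N-1 | TSS/(N-1)
N = number of observations, K = number of parameters (including the intercept)
Multiple reg models have multiple independent variables (with two of them, the fitted model is a plane in 3D)
$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
EX: if $\hat{Y} = -12 + 2.2\,S + 0.7\,EXP$, where Y is hourly earnings, S is years of education, and EXP is years of experience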
-12 implies that one with no exp and no education would have to pay $12 per hour to work.
This is not realistic (solution later)
The coef on S implies an extra $2.20/hr for each year of education completed
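$E(u_i) = 0$
$\sigma_{u_i}^2 = \sigma_u^2$
$e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$
Thus, $RSS = \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i})^2$
To get the first-order conditions for a minimum, take the partial derivatives of RSS with respect to b1, b2, b3 and set them equal to zero:
$\frac{\partial RSS}{\partial b_1} = -2 \sum (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0$
$\frac{\partial RSS}{\partial b_2} = -2 \sum X_{2i} (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0$
$\frac{\partial RSS}{\partial b_3} = -2 \sum X_{3i} (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0$
Solving: $b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3$
$b_2 = \frac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) \sum (X_{3i} - \bar{X}_3)^2 - \sum (X_{3i} - \bar{X}_3)(Y_i - \bar{Y}) \sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2 \sum (X_{3i} - \bar{X}_3)^2 - \left(\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)\right)^2}$
$b_3$: switch all of the X2's with X3's and vice versa in the b2 above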
Efficiency
The Gauss-Markov theorem proves that, when the regression model assumptions hold, OLS yields the most efficient linear unbiased estimators of the parameters
(lowest possible variance)
Precision (multireg)
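True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$
Popvar of b2: $\sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r_{X_2 X_3}^2}$
$\sigma_u^2$: pop variance of u
$r_{X_2 X_3}$: corr btwn X2 & X3
Replace $\sum (X_{2i} - \bar{X}_2)^2$ with $\sum (X_{3i} - \bar{X}_3)^2$ to get pop variance of b3
$s.e.(b_2) = \sqrt{\frac{s_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r_{X_2 X_3}^2}}$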
Multicollinearity (multico):
If N & MSD are large, and var(u) is small, you could get good estimates even when the correlation is high
Multico must be caused by a combination of high corr and one of the other components being
unhelpful
Multico is most common in time series data, since the same subjects are analyzed over time
Overcoming multico:
The variance of a coef depends on 4 components: 1) $\sigma_u^2$ 2) $n$ 3) $MSD(X)$ 4) $r_{X_2 X_3}$
Direct methods attempt to improve the conditions that are responsible for the variances in the reg
coefs
1) Reducing $\sigma_u^2$
Indirect methods
1) If correlated variables measure a similar concept, it might make sense to combine them into an
overall index
a. Ex: Vocabulary and grammar should be correlated
2) Drop some correlated variables that have insignificant coefs
a. You risk introducing bias if said variable really should be in the model
3) Use extraneous info concerning the coef of one of the variables
a. Ex: Using data found in a cross-sectional study in a time series data
b. See pg 174&175 for an extended example
4) Use a theoretical restriction (hypothetical relationship among the params of a reg model)
a. EX: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
b. Assume that $X_2$ (grammar skills) is equally important as $X_3$ (vocabulary) when determining
overall public speaking skills
i. $\beta_2 = \beta_3$
c. Now, $Y = \beta_1 + \beta_2 (X_2 + X_3) + u$
Using an F-test to see if joint marginal contribution of a group of variables is significant (when adding
more)
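Given model:
o $Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K + u$, with K parameters (including the intercept)
Add variables to get: $Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K + \beta_{K+1} X_{K+1} + \dots + \beta_M X_M + u$
You now have explained an additional SS equal to ESSM - ESSK, using up an additional M-K df
F-test verbally: F = (improvement in fit / extra df used up) / (remaining RSS / df remaining)
Under the H0: additional variables contribute nothing to the original equation:
$F(M-K, N-M) = \frac{(ESS_M - ESS_K)/(M-K)}{RSS_M/(N-M)}$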
See table below. The upper half gives the ANOVA for the explanatory power of the original K-1
variables. The lower half gives it for the joint marginal contribution of the new variables
Source | SS | df | SS/df | F-stat
Explained by original variables | ESSK | K-1 | ESSK / (K-1) | [ESSK / (K-1)] / [RSSK / (N-K)]
Residual | RSSK | N-K | RSSK / (N-K) |
Explained by new variables | ESSM - ESSK = RSSK - RSSM | M-K | (RSSK - RSSM) / (M-K) | [(RSSK - RSSM) / (M-K)] / [RSSM / (N-M)]
Residual | RSSM | N-M | RSSM / (N-M) |
Adjusted/corrected R2:
Remember how R2 can't decrease with additional variables? Adj. R2 compensates for this by imposing a
penalty for increasing K
$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k} = R^2 - \frac{k-1}{n-k}(1 - R^2)$
It can be shown that the addition of a new variable to a model will cause adj R2 to go up if and only if
the absolute value of its t-stat is greater than 1
This means that if adj R2 increases when K goes up, it doesn't necessarily mean that the coef of the new
variable is significantly different from zero
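A quick numerical sketch of the penalty, using the standard formula adj R2 = 1 - (1 - R2)(n - 1)/(n - k). The R2 values and sample size below are invented: adding a variable nudges R2 up from 0.50 to 0.51, but adj R2 falls because the extra parameter costs a df.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k parameters (including the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Hypothetical numbers: same sample, one extra regressor added.
before = adjusted_r2(0.50, n=11, k=3)
after = adjusted_r2(0.51, n=11, k=4)
print(before, after)
```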
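Hedonic pricing: assumes that the value of a good is determined by the combined values of its components
Fitted model: $\hat{P}_i = b_1 + \sum_j b_j X_{ji}$
It's natural to predict that the price of the new variety should be given by:
$\hat{P}^* = b_1 + \sum_j b_j X_j^*$
Start by assuming that the good only has one relevant characteristic and we have fitted the simple reg
model $\hat{P}_i = b_1 + b_2 X_i$
Because the new variety of the good has the characteristic X = X*: $\hat{P}^* = b_1 + b_2 X^*$
Prediction error (PE): difference btwn actual ($P^*$) and predicted price ($\hat{P}^*$)
o $PE = P^* - \hat{P}^*$
Assume model applies to the new good, therefore the actual price is
$P^* = \beta_1 + \beta_2 X^* + u^*$
$PE = (\beta_1 + \beta_2 X^* + u^*) - (b_1 + b_2 X^*)$
$E(PE) = E(\beta_1 + \beta_2 X^* + u^*) - E(b_1 + b_2 X^*)$
$= \beta_1 + \beta_2 X^* + E(u^*) - E(b_1) - X^* E(b_2)$
$= \beta_1 + \beta_2 X^* - \beta_1 - \beta_2 X^*$
$E(PE) = 0$
The expected prediction error is zero: the prediction is right on average, though any single prediction will not be exact
Popvar of the PE
o $\sigma_{PE}^2 = \sigma_u^2 \left\{1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right\}$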
Obvious implications: the further the value of X* is from the sample mean, the larger its
popvar. Also, var(PE) goes down as N goes up
Prediction interval: $\hat{P}^* - t_{crit} \times s.e.(PE) < P^* < \hat{P}^* + t_{crit} \times s.e.(PE)$
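The two implications of the prediction error variance can be checked numerically. This sketch uses the standard simple-regression formula var(PE) = var(u) * {1 + 1/n + (X* - Xbar)^2 / sum (Xi - Xbar)^2}; the X values and var(u) = 1 are made up, and the function name is hypothetical.

```python
def prediction_error_variance(sigma_u2, xs, x_star):
    """Pop variance of the prediction error at X = x_star in a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_u2 * (1 + 1 / n + (x_star - x_bar) ** 2 / sxx)


# Made-up sample of X values with mean 3, and an assumed var(u) of 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
at_mean = prediction_error_variance(1.0, xs, x_star=3.0)
far_from_mean = prediction_error_variance(1.0, xs, x_star=6.0)
print(at_mean, far_from_mean)
```

The variance is smallest when X* equals the sample mean and grows with the squared distance from it, so predictions far outside the data range deserve wide intervals.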
Linear in variables (lin in var): every term consists of a straightforward variable times a parameter
Linear in parameters (lin in par): every term consists of a straightforward parameter times a
variable
Given
This transformation is only cosmetic, but it gives us a function that is linear in var & par
1 +
1 + 2 +
Logarithmic/loglinear model
Regardless of how Y & X are related mathematically, or their definitions, the elasticity of Y WRT X is
the proportional (%) change in Y for a given proportional change in X: elasticity $= \frac{dY/Y}{dX/X} = \frac{dY/dX}{Y/X}$
EX: if Y is demand for a commodity, and X is income, this defines the income elasticity of
demand for that good
Rewrite:
Therefore,
, then
Semilog models
Therefore,
o
Note that this same function can be made linear in params by logging both sides:
Only the left side is logarithmic in variables ($\log \beta_1$ means it's logarithmic in
parameters), still making it a semilog model
Economies of scale
The Economist defines economies of scale as factors that cause the average cost of producing
something to fall as the volume of output increases
o
Example: cell phone providers. Smaller companies are unable to compete due to the expensive
infrastructure that the market requires (like service towers)
However, these seemingly large costs are negligible for very large firms. Through their ability to
make large investments, they are able to exploit the fact that not everyone is able to produce
as cheaply as them
In the question on Midterm 2, the function was given as $\ln(cost) = \beta_1 + \beta_2 \ln(Q) + u$, with a null
hypothesis being that it was a competitive market. You believed that economies of scale were present,
estimating a slope of 0.6
As can be seen in the table below, the slope coefficient of this log model ($\beta_2$) is an elasticity: it tells you the
% change in costs associated with a 1% increase in quantity
On the test question, you found a slope of 0.6, implying that increasing Q by 1% would be
accompanied by a 0.6% increase in cost
This would imply that economies of scale are present, since in a perfectly competitive market, the
% increase in cost would be equal to the % change in output
o This would mean that under the null (economies of scale NOT present), $\beta_2 \geq 1$, and in the
alternative (economies of scale ARE present), $\beta_2 < 1$
Note: I don't understand why we use LN sometimes, and LOG sometimes. I figured that they're
interchangeable in this context, but I have not yet confirmed it. I suggest asking before the test if you're
confused
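Overview:
Model | Equation | Interpretation of $\beta_2$
Linear model | $Y = \beta_1 + \beta_2 X$ | effect of a change in X on Y
Semilog A | $Y = \beta_1 + \beta_2 \log X$ | effect of a 1% change in X on the level of Y
Semilog B | $\log Y = \beta_1 + \beta_2 X$ | effect of a change in X on the % change in Y
Log/Log | $\log Y = \beta_1 + \beta_2 \log X$ | effect of a 1% change in X on the % change in Y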
u needs to be an additive term (+u) that satisfies the conditions of the reg model. If this is untrue, least
squares reg will not have its normal properties, making tests invalid
If v=1, then the random factor u = log v is 0 (as multiplying by 1 doesn't change the model)
This showed that to obtain an additive disturbance term in the logged equation, we need to start with a multiplicative one in
the original equation
If the term was additive in the original, then taking the log of the right side would not linearize it, as
the log of a sum can't be split into separate terms. You would have to use a nonlinear technique
(later)
Comparing linear and logarithmic specifications
Sometimes looking at the scatter plot can tell you if it's linear or not, but not
always (ex to the right)
The problem with choosing btwn 2 models is that RSS and R2 can't be
compared between different functional forms of Y
o If they're both similar, you can scale the observations of Y so that RSS in
lin/log models are directly comparable (Box & Cox)
Making RSS comparable between a linear and log model (Box & Cox):
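1. Calculate geometric mean of Y values in the sample. This is equal to the exponential of the mean of
logY: geometric mean $= e^{\frac{1}{n}\sum \log Y_i}$
2. Scale the observations on Y by dividing by this figure: $Y_i^* = Y_i /$ geometric mean
3. Regress the linear model using $Y^*$ instead of Y, and the log model using $\log Y^*$ instead of $\log Y$
The RSSs are now comparable, and the lower value provides the better fit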
Do not use this method to find coefs. This is solely for deciding preferred model
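The scaling step can be sketched in Python as below. The Y values are invented and the function names are hypothetical; the sketch only shows the geometric-mean scaling itself, not the two regressions that would follow.

```python
import math


def geometric_mean(ys):
    """Exponential of the mean of log Y (requires all Y > 0)."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))


def scale_by_geometric_mean(ys):
    """Divide every observation on Y by the sample geometric mean."""
    gm = geometric_mean(ys)
    return [y / gm for y in ys]


# Made-up observations on Y.
ys = [1.0, 4.0, 16.0]
y_star = scale_by_geometric_mean(ys)
print(geometric_mean(ys), y_star)
```

After scaling, the geometric mean of Y* is 1, which is what puts the RSS from the linear model (on Y*) and from the log model (on log Y*) on a comparable footing.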
Interpretation of coefs:
Quad: can't follow the normal rule of holding other variables fixed to measure another's effect on Y
because it's not possible to change X2 while keeping $X_2^2$ fixed
This is also the case in the inter model, since X2 also appears as X2X3
Quadratic model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u$
The marginal effect of X2 on Y is $\frac{dY}{dX_2} = \beta_2 + 2\beta_3 X_2$
This means that $\beta_2$ has a different interpretation than in the ordinary model ($Y = \beta_1 + \beta_2 X_2 + u$), where $\beta_2$ is the effect of a unit change in X2 on Y
In the quad model, $\beta_2$ is the effect of a unit change in X2 on Y for the special case where X2 = 0
Only $\beta_1$ has a conventional interpretation. It is the value of Y (apart from random component) when X2=0
There's another problem. We've seen that the intercept of a regression usually doesn't have a sensible
meaning if X2 = 0 is outside the data range
o
Quadratics are justified due to the concepts of diminishing marginal returns (parabolic shape)
As higher order terms are added, the fit will be improved slightly, but it will be sample specific
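Ex: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_2 X_3 + u$
Rewrite: $Y = \beta_1 + (\beta_2 + \beta_4 X_3) X_2 + \beta_3 X_3 + u$
This shows that the marginal effect of X2 on Y (holding X3 constant) is $(\beta_2 + \beta_4 X_3)$, and that $\beta_2$ may be seen as the marginal effect of X2 on Y when X3 = 0
Similarly, the marginal effect of X3 on Y (holding X2
constant) is $(\beta_3 + \beta_4 X_2)$, and $\beta_3$ may be seen as the marginal effect of X3 on Y when X2 = 0
o If X3 = 0 is a long way outside of the range of X3 in the sample, the interpretation of the
estimate of $\beta_2$ as an estimate of the marginal effect of X2 when X3=0 should be treated with
caution.
Sometimes the estimate will be completely implausible, like giving a literal explanation of the y-intercept of a model
We have just run into a similar problem here, with the interpretation of $\beta_2$ in the quadratic
specification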
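It's often of interest to compare estimates of the effect of X2 & X3 on Y in models excluding and
including the interactive term. To make them comparable, measure the variables as deviations from their sample means:
o $X_2^* = X_2 - \bar{X}_2$
o $X_3^* = X_3 - \bar{X}_3$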
I couldnt figure out how to get asterisks to go directly above the Xs with the software I used to type these
functions. Note that in the functions above and below, the star near the middle of the terms should be over
the X (as in
2 )
Where
Now, the coefs of X2 & X3 give the marginal effect of their variables if the other is held at the
sample mean
Rewrite:
o
o
Adding quad terms of Xs, and interactive terms to the specification is one way of investigating the
possibility of nonlinearity in Y
If there are a lot of explanatory variables in the model, we might want to have some sort of evidence
of nonlinearity before spending too much time manipulating them
Run reg in original form, save the fitted values of depvar (Y-hat)
By definition: $\hat{Y} = b_1 + b_2 X_2 + \dots + b_K X_K$
o If $\hat{Y}^2$ is added to the reg specification, it should pick up quad or inter nonlinearity, without
necessarily being highly correlated with any X variables (and consuming only one DF)
If the t-stat of the coef of $\hat{Y}^2$ is significant, some kind of nonlin is likely to be present
This does not tell you WHICH kind of nonlin is present in your data, and may fail to detect
other types of nonlin
In principle, we could include higher powers of $\hat{Y}$, but most don't think this is worthwhile
, and you want to obtain estimates of the betas, given data on Y & X
Note that this cannot be transformed to obtain a linear relationship, so it's not possible to apply the
regular reg procedure
But, we can still use the process of minimizing RSS to estimate params
This nonlinear regression algorithm is a simple method that uses the principle of RSS minimization
1) Guess plausible values for the params
2) Calculate the predicted values of Y from the data on X using these values as the params
3) Calculate residuals for each observation and find RSS
4) Make small changes in one or more of your estimates of the params
5) Calculate the new predicted values of Y, residuals, and RSS
6) If the new RSS is smaller than the original, your new estimates of the parameters are better.
Take them as your new starting point
7) Repeat steps 4, 5, and 6 again and again until you are unable to reduce RSS any further
8) Conclude that you have minimized RSS, and describe the final estimates of the params as the
least squares estimates
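The eight steps above can be sketched as a simple trial-and-error search in Python. This is a minimal illustration, not the routine any stats package uses. The model $Y = \beta_1 X^{\beta_2}$, the data, the starting guesses, and the rule of halving the step size when no small change helps are all assumptions made for this sketch.

```python
def rss(predict, params, xs, ys):
    """Residual sum of squares for the given parameter values (steps 2 and 3)."""
    return sum((y - predict(params, x)) ** 2 for x, y in zip(xs, ys))


def fit_by_search(predict, start, xs, ys, step=0.5, tol=1e-8, max_iter=10000):
    """Minimize RSS by repeatedly trying small changes to one param at a time."""
    params = list(start)                      # step 1: plausible starting guesses
    best = rss(predict, params, xs, ys)
    for _ in range(max_iter):
        if step <= tol:
            break                             # step 8: cannot reduce RSS any further
        improved = False
        for j in range(len(params)):
            for delta in (step, -step):       # step 4: small change in one param
                trial = params[:]
                trial[j] += delta
                trial_rss = rss(predict, trial, xs, ys)   # step 5
                if trial_rss < best:          # step 6: keep the better estimates
                    params, best, improved = trial, trial_rss, True
        if not improved:
            step /= 2                         # try smaller changes (step 7 repeats)
    return params, best


def power_model(params, x):
    """Hypothetical nonlinear model: Y = beta1 * X ** beta2."""
    return params[0] * x ** params[1]


# Made-up data generated exactly from Y = 2 * X ** 0.5 (no disturbance term).
xs = [float(x) for x in range(1, 10)]
ys = [2.0 * x ** 0.5 for x in xs]

start = [1.0, 1.0]
start_rss = rss(power_model, start, xs, ys)
estimates, final_rss = fit_by_search(power_model, start, xs, ys)
print(estimates, final_rss)
```

Real nonlinear least squares routines choose their steps far more cleverly, but the logic is the same: keep any change in the params that lowers RSS and stop when none does.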
Dummies are treated just like ordinary variables, despite the fact that they only have 2 possible
values
Better than having to use more than one reg model within sample
The example of types of schooling in Shanghai that persists throughout Chapter 5 is great for
explaining dummies, and will be used here
changes as N (#
$\beta_1$ is the FC (fixed cost) or intercept because it doesn't vary with the # of students
EX: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$
$\delta$ is the associated increase in overhead when an OCC school is being looked at (changes intercept, but
not slope. This will be addressed later)
From the Stata output of the data we get fitted values $b_1$, $d$, $b_2$ for $\beta_1$, $\delta$, $\beta_2$
Setting OCC equal to 0 and 1, respectively, we can obtain the implicit costs for the two types of
schools:
Regular (OCC = 0): $\widehat{COST} = b_1 + b_2 N$
Occupational (OCC = 1): $\widehat{COST} = (b_1 + d) + b_2 N$
The intercept implies an annual overhead cost of -34,000 Yuan for regular schools. The negative value is
not at all realistic, and you should immediately realize that the model is misspecified (doesn't pass the
laugh test)
The N-coef of 331 implies a constant slope in both categories (fixed later)
$H_0: \delta = 0$, $H_1: \delta \neq 0$
If the t stat is greater than tcrit, conclude that the special schools are significantly more expensive
than regular ones
Standard errors are usually given in the outputs. Make sure they are included in the appropriate t-tests
More elaborate dummies: extension to more than 2 categories and multiple sets of DVs
Now, we are going to make models with 4 possible categories of schools: General (GEN), Technical
(TECH), Skilled Worker schools (WORKER), and Vocational (VOC)
Pick one reference category (refcat) to which the basic equation applies. You should always start with
the dominant or most normal category (unless you have good reason to do otherwise)
In the school example, general schools are picked as the first refcat
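New model:
o $COST = \beta_1 + \delta_T TECH + \delta_W WORKER + \delta_V VOC + \beta_2 N + u$
Where $\delta_T$, $\delta_W$, $\delta_V$ = extra overhead
required by specific special schools in addition to the overhead of general ones, and
TECH/WORKER/VOC are dummies which will equal 1 when true and zero otherwise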
o
Two dummies cannot equal 1 simultaneously in this model. One will equal 1, the rest zero. This
will change later when we use 2 separate qualitative measurements of individual schools (with
the RES
Do not make a DV for the reference category. This is why it's called the omitted category. Note that
GEN is the reference category in the case above
Notice that TECH schools require an additional 154,000 Yuan of overhead over general schools
Notice that the general school has a negative number again (despite being measured against
itself). Something's wrong w/ the model
Changing the reference category (refcat): replace the variable and its associated parameter ($\delta$) with one
for your previous omitted category
R2, coefs for other variables, t-stats for other variables, F-stat for the whole equation all stay
the same
Standard errors and the interpretations of the t-tests are the only things that change
DV trap:
If it were possible to calculate coefs, you couldn't interpret them. Dummies change the intercept, and
there would be no definition of the base-level intercept ($\beta_1$)
Ex:
o $Y = \beta_1 X_1 + \beta_2 X_2 + u$, where $X_1 = 1$ in all observations
Suppose there are M dummy categories, and you define DVs D1, ..., DM
Since one DV will equal one, and the rest zero, the sum of DVs will always be 1
The intercept is the product of $\beta_1$ and a special variable equal to 1 in all observations
o This means that for all observations, the sum of the DVs is equal to this special variable
As a consequence, this model is subject to a special case of exact multico, preventing the calculation of
coefs
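The trap is easy to see with a tiny made-up sample. The sketch below builds a dummy for every school category (the six observations are invented; the category names follow the school example) and shows that the dummies add up to the intercept's column of 1s, which is the exact multico. Dropping the reference category breaks the identity.

```python
# Made-up sample of six schools.
school_types = ["GEN", "TECH", "WORKER", "VOC", "GEN", "TECH"]
categories = ["GEN", "TECH", "WORKER", "VOC"]


def make_dummies(values, cats):
    """One 0/1 column per category in cats."""
    return {c: [1 if v == c else 0 for v in values] for c in cats}


def row_sums(dummies, n_obs):
    """Sum of all dummy columns, observation by observation."""
    return [sum(column[i] for column in dummies.values()) for i in range(n_obs)]


n_obs = len(school_types)
intercept_column = [1] * n_obs

# Dummy trap: one dummy for EVERY category, plus an intercept.
all_dummies = make_dummies(school_types, categories)
in_trap = row_sums(all_dummies, n_obs) == intercept_column

# Fix: omit the reference category (GEN).
safe_dummies = make_dummies(school_types, categories[1:])
safe_sums = row_sums(safe_dummies, n_obs)

print(in_trap, safe_sums)
```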
EX:
Regression output:
This implies that OCC schools cost 110,000 more Yuan, and residential schools as a whole cost 58,000
more
The 4 combos of OCC and RES can be broken into individual implicit cost functions by the 4 combos of OCC
and RES:
* I didn't include the other 2 because they're pretty easy to find, and I think this process has been
pretty well-covered. Check pg 238 if you need clarification
Slope DV (interactive):
drops assumption that the slope of the reg is the same for each category of qualitative variables, since
it would be unrealistic for MC to remain fixed by each school
Slope DV: N*OCC
Effect: allows the coefs of N for OCC schools to be greater than that of regular schools
Setting OCC equal to zero gets rid of the OCC conditions (steeper slope and intercept) and gives
you the original cost function for reg schools: $COST = \beta_1 + \beta_2 N + u$
$\lambda$ is the incremental marginal cost associated w/ OCC schools, just like $\delta$ is the incremental
overhead cost
The addition of these variables has allowed the OCC and regular schools to have their own
slopes (MC) and intercepts (FC) while remaining a part of the same function: $COST = \beta_1 + \delta\,OCC + \beta_2 N + \lambda\,N \times OCC + u$
If the Y-int is negative for a data set with no negative values, it's probably misspecified. Likely due
to an MC which was a compromise between the slopes of regular and occ schools
F-tests of dummies
The joint explanatory power of the intercept and slope dummies can be tested with the normal F-tests,
comparing RSS when dummies are included and excluded
$H_0: \delta = \lambda = 0$
Numerator: Cost in df: additional K estimated (# of DVs)
The term
If
If
Therefore, (
In general, there will be an improvement (RSSP - RSSA - RSSB) when the sample is split up
However, there's a price to pay when splitting it. K extra df have been used up, since instead of K
params for the pooled regression, there are now 2K params
After breaking up the sample, we are still left with (RSSA + RSSB) (unexplained) sum of squares of the
residuals, and N-2K df remaining
We can now test whether the improved fit due to splitting the sample is significant w/ a special
F-test known as a Chow Test:
$F(K, N-2K) = \frac{(RSS_P - RSS_A - RSS_B)/K}{(RSS_A + RSS_B)/(N-2K)}$
Which is distributed w/ K and (N-2K) df under the null (no improvement in fit)
If FStat > FCrit, use the separate regressions (because the improvement is significant)
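Once the three regressions have been run, the Chow stat is just arithmetic on their RSS values. In the sketch below the RSS figures, K, and N are invented for illustration; the resulting F would still have to be compared with Fcrit from the table for (K, N-2K) df.

```python
def chow_f_statistic(rss_pooled, rss_a, rss_b, k, n):
    """Chow F stat: improvement in fit per df used up, over remaining RSS per df."""
    improvement = rss_pooled - rss_a - rss_b
    remaining = rss_a + rss_b
    return (improvement / k) / (remaining / (n - 2 * k))


# Hypothetical results from the pooled regression and the two subsample regressions.
f_stat = chow_f_statistic(rss_pooled=100.0, rss_a=30.0, rss_b=40.0, k=2, n=24)
print(f_stat)
```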
Chow is quick. You just run 3 regs and calculate test stats
o However, it doesn't tell you how the functions differ (if they do)
DV gives you more info bc you can perform t-tests on individual dummy coefs. This may show where
the FNs differ (if they do)
o DV takes longer bc you have to define a DV for each intercept and each slope coef
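Fitted model | True model: $Y = \beta_1 + \beta_2 X_2 + u$ | True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
$\hat{Y} = b_1 + b_2 X_2$ | Correct specification. No problems | $\beta_3 X_3$ left out. This means coefs are biased (in general)
$\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$ | Redundant variable included. Coefs are unbiased (in general) but inefficient | Correct specification. No problems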
But, you are unaware of the importance of X3. You think the model should be (simple reg) $Y = \beta_1 + \beta_2 X_2 + u$
Instead of: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Because the true model has multiple regressors, b2 from the simple regression is subject to omitted
variable bias
If X3 is omitted from the reg model, X2 will appear to have a double effect
It will have a direct effect and a proxy effect by mimicking the effects of X3
$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}$
The sign of the bias depends on $\beta_3$ and on $\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)$ (which
is the numerator in the sample correlation btwn X2 and X3, $r_{X_2 X_3}$); the denom of the corr coef
will always be positive
If $\beta_3$ is positive and the correlation is pos, the bias will be positive, and b2 will tend to overestimate $\beta_2$
The direction of the bias could just as likely be negative. It depends on the sign of the true
coef of the omitted variable, and on the sign of the correlation btwn the included and omitted
variables
Omitting a variable that should be included makes the SEs of the coefs and the test stats invalid
(generally)
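The direct effect plus proxy effect can be shown with a toy example. To keep the arithmetic exact, the sketch below generates Y from a true model with NO disturbance term (an assumption made purely for illustration, along with the invented X values and betas), then regresses Y on X2 alone. The simple-regression slope comes out as beta2 plus beta3 times the slope of X3 on X2.

```python
def slope(xs, ys):
    """Simple-regression slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


# Made-up regressors that are positively correlated with each other.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Assumed true model: Y = 1 + 2*X2 + 3*X3 (disturbance term left out on purpose).
beta1, beta2, beta3 = 1.0, 2.0, 3.0
y = [beta1 + beta2 * a + beta3 * b for a, b in zip(x2, x3)]

b2_simple = slope(x2, y)                         # X3 wrongly omitted
predicted = beta2 + beta3 * slope(x2, x3)        # direct effect + proxy effect
print(b2_simple, predicted)
```

Here beta3 and the correlation are both positive, so b2 overestimates beta2, matching the sign rule in the notes.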
Suppose that the true model is represented by a simple regression, but you think it's a multiple reg model. You
estimate b2 with the multiple regression of Y on X2 and X3
Instead of the simple regression of Y on X2
Generally, adding a redundant variable doesn't cause bias, it just causes inefficient estimates
If you regress Y on X2 and X3, b2 will be an unbiased estimator of $\beta_2$, and b3 will be an unbiased
estimator of zero (as long as the reg model assumptions hold)
Proxy Variables
Used when you are unable to get data on a variable that you think should be included, or if it's too
difficult to measure
o In this case, it's usually a good idea to use a proxy variable to stand in for the missing variable
(instead of simply dropping it)
o EX: Since socioeconomic status (SES) isn't easily measurable, use income to stand in
Leaving the variable out can cause reg to suffer from omitted variable bias, making statistical
tests invalid
The results from your proxy reg may indirectly shed light on the influence of the missing
variable
EX (linear restrictions): $\beta_2 = \beta_3$, $\beta_2 + \beta_3 = 1$
Nonlinear restriction: $\beta_2 = \beta_3 \beta_4$
Run reg on restricted and unrestricted forms, denoting RSS as RRSS for the restricted model, and URSS
for the unrestricted one
The restriction makes it harder to fit a model, so $RRSS \geq URSS$, generally being strictly greater
We want to test if the improvement in fit when going from the restricted to unrestricted model is
significant
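As in chapter 3, F = (improvement in fit / extra df used up) / (RSS remaining / df remaining)
In this case, the improvement is RRSS-URSS, one additional df is used up in the unrestricted model
(bc there's one more parameter to estimate), and the RSS remaining is URSS
o $F(1, N-K) = \frac{RRSS - URSS}{URSS/(N-K)}$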
First argument for the distribution of the F stat: we are testing ONE restriction
Alternatively, reparameterize the model so that the restriction can be written as $\theta = 0$
$\theta$ will become the coef of one of the variables in the model, and a t-test of $H_0: \theta = 0$ is
effectively a t-test of the restriction
Example on p273
If the estimate of $\theta$ is not significantly different from zero, we can drop it and use the restricted
version
Multiple restrictions: F-test can be applied to test whether several restrictions are valid simultaneously
Let RRSS be for the model where all P restrictions have been imposed
o $F(P, N-K) = \frac{(RRSS - URSS)/P}{URSS/(N-K)}$
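The restriction F stat can be sketched as one small function. The RSS values, P, N, and K below are invented for illustration. The second call treats the Chow test as a special case, on the assumption that the pooled regression is the restricted model (K restrictions) and the two separate regressions together form the unrestricted model with 2K params.

```python
def restriction_f_statistic(rrss, urss, p, n, k):
    """F(p, n - k) for testing p restrictions; k = params in the unrestricted model."""
    return ((rrss - urss) / p) / (urss / (n - k))


# Hypothetical example: 2 restrictions, 30 observations, 5 params unrestricted.
f_stat = restriction_f_statistic(rrss=120.0, urss=100.0, p=2, n=30, k=5)

# Chow test viewed the same way: RRSS = pooled RSS, URSS = RSSA + RSSB,
# P = K restrictions, and 2K params in the unrestricted (split) model.
K, N = 2, 24
chow_as_restriction = restriction_f_statistic(
    rrss=100.0, urss=30.0 + 40.0, p=K, n=N, k=2 * K)

print(f_stat, chow_as_restriction)
```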
The t-stat can only be used to test single restrictions in isolation (one at a time)
Zero restrictions
The testing of multiple zero restrictions is a special case of testing multiple restrictions
o
The test of the joint explanatory power of a group of explanatory variables can be thought of in this
way
The F stat for the equation as a whole can also be thought of in this way
RRSS is
Where URSS & UESS are for the original unrestricted model