ECON6001 F2021 Topic4


ECON6001: Applied Econometrics

S&W: Chapter 8
Nonlinear Regression Functions

Dr. Gedeon Lim

Copyright © 2019 Pearson Education Ltd. All Rights Reserved.



Outline
1. Why do we need to account for non-linearities?
2. Nonlinear functions of one variable: polynomials and logs
3. Nonlinear functions of two variables: interactions
4. Application to the California Test Score data set


Nonlinear regression function


• The regression function so far has been linear in the X’s
• But the linear approximation is not always a good one
• Goal today: Allow for non-linearities


The TestScore – STR relation looks linear (maybe)…


But the TestScore – Income relation looks nonlinear...


• Report by Adam Ozimek at Moody’s Analytics


• Is there anything odd about the way the relationship between
county-level Trump support and immigration levels is modeled?



Introduction

• Think back to the California class size, income and test score example
• Consider the multivariate regression we ran:
$$\text{testscr}_i = \beta_0 + \beta_1\,\text{str}_i + \beta_2\,\text{income}_i + u_i$$

• Interpretation:
– A $1,000 increase in income (income is measured in $’000s) is associated with a β2 change in test scores …
– … regardless of whether the $1,000 is given to a ________ family
– This linearity assumption may be unrealistic

• Non-linear regression functions allow the predicted change in Y associated with a change in X to vary with X.
• Idea: fit curves to the data, using quadratic and logarithmic regressions

Nonlinear Population Regression Functions – General Ideas (SW Section 8.1)
If a relation between Y and X is nonlinear:
• The effect on Y of a change in X depends on the value of X; that is, the marginal effect of X is not constant. For example, the effect of class size (X) on test scores (Y) might depend on how big the class was to begin with, so a reduction from 30 to 25 might have a different effect than a reduction from 20 to 15.
• A linear regression is mis-specified: the functional form is wrong.
• The estimator of the effect on Y of X is biased: in general it isn’t even right on average.
• The solution is to estimate a regression function that is nonlinear in X.

Two ways to add non-linearities


• Effect on Y of a change in X depends on the value of X (polynomials and logs)
• In contrast (later): Effect on Y of a change in X1 depends on the value of another variable X2 (interaction terms)
$$\text{testscr}_i = \beta_0 + \beta_1\,STR_i + \varepsilon_i$$
• I.e., the slope coefficient β1 “changes”
→ Let’s start with an example of the first


Quadratics (and other polynomials)


• A regression can approximate a curved relationship between Y and X by adding higher-order powers of X (X², X³, etc.) to the right-hand side.
• The quadratic (squared) regression function is written as:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$$

• Quadratic functions fit U-shapes (parabolas) to the data. The coefficients tell us where the U is centered, how flat or steep it is, and whether it opens up or down.
• BUT: β1 no longer represents the predicted change in Y associated with a one-unit change in X because…
• … the predicted change in Y associated with a one-unit change in X also depends on _____________________.


Effect on Y of a change in X depends on the value of X: TestScore – Income relation
Income_i = average district income in the ith district (thousands of dollars per capita)
Quadratic specification:
$$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i$$
Cubic specification:
$$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i$$


The estimated quadratic regression function


(a) Plot the predicted values

$$\widehat{TestScore} = \underset{(2.9)}{607.3} + \underset{(0.27)}{3.85}\,Income_i - \underset{(0.0048)}{0.0423}\,(Income_i)^2$$


Data-wise: Estimation of the quadratic specification in STATA
generate avginc2 = avginc*avginc;     /* create a new regressor: income squared */
reg testscr avginc avginc2, r;

Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  428.52
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5562
                                                       Root MSE      =  12.724

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   3.850995   .2680941    14.36   0.000      3.32401    4.377979
     avginc2 |  -.0423085   .0047803    -8.85   0.000     -.051705   -.0329119
       _cons |   607.3017   2.901754   209.29   0.000     601.5978    613.0056
------------------------------------------------------------------------------
Test of non-linearity: What does the null hypothesis mean?
H0: β2 = 0
H1: β2 ≠ 0
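As a rough check on the test of non-linearity, the t-statistic for H0: β2 = 0 can be recomputed by hand from the Stata output. The sketch below uses Python only as a calculator; the variable names are illustrative, and the 1.96 cutoff assumes the usual large-sample normal approximation.

```python
# Coefficient and robust standard error on avginc2, copied from the Stata output.
beta2_hat = -0.0423085
se_beta2 = 0.0047803

# t-statistic for H0: beta2 = 0 (the population regression is linear in income)
t_stat = beta2_hat / se_beta2

# Two-sided 5% critical value under the large-sample normal approximation (assumed)
critical_5pct = 1.96
reject_linearity = abs(t_stat) > critical_5pct

print(f"t = {t_stat:.2f}; reject H0 at the 5% level: {reject_linearity}")
```

The result matches the t column reported by Stata for avginc2, so the linear specification is rejected in favor of the quadratic.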

Simple calculus: Interpreting the estimated regression function
(b) Compute the slope, evaluated at various values of X

$$\widehat{TestScore} = \underset{(2.9)}{607.3} + \underset{(0.27)}{3.85}\,Income_i - \underset{(0.0048)}{0.0423}\,(Income_i)^2$$

Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:
$$\Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4$$
$$\frac{d\,TestScore}{d\,Income} = \beta_1 + 2\beta_2\,Income$$
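The before-and-after calculation and the derivative can be compared numerically. This is a minimal sketch using the rounded coefficients from the estimated quadratic; the function names `predicted_test_score` and `slope` are illustrative, not part of the course materials.

```python
def predicted_test_score(income):
    """Predicted TestScore from the estimated quadratic; income is in $'000s per capita."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2


def slope(income):
    """Derivative of the fitted curve: beta1 + 2 * beta2 * income."""
    return 3.85 + 2 * (-0.0423) * income


for start in (5, 10, 40):
    # Discrete change for a one-unit ($1,000) increase starting from `start`
    change = predicted_test_score(start + 1) - predicted_test_score(start)
    print(f"income {start} -> {start + 1}: change = {change:.2f}, slope at {start} = {slope(start):.2f}")
```

The one-unit change shrinks as income rises, which is the sense in which the effect of X depends on the level of X. The derivative is close to, but not identical to, the discrete one-unit change.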

Interpreting the estimated regression function

• To see how predicted test scores change with a one unit change in income, we can no longer use only the coefficient on Income.
• We need to specify a particular level of income because now the relationship between X and Y depends on the level of X.
• We must calculate this at each level of X we are interested in

Income | Predicted TestScore = 607.3 + 3.9 Income – 0.04 Income² | Change in predicted TestScore
10     |                                                          |
11     |                                                          |
40     |                                                          |
41     |                                                          |


General Approach to Modeling Non-Linearities
1. Identify possible non-linear relationships
• Economic theory/plot the data

2. Specify and estimate non-linear functions using OLS


3. Determine if non-linear spec is important: t and F-stats
4. Plot estimated nonlinear regression function
5. Estimate effect on Y of a change in X



The general nonlinear population regression function
$$Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i, \quad i = 1, \ldots, n$$

Assumptions
1. E(ui | X1i, X2i,…, Xki) = 0 (same), so f is the conditional expectation of Y given the X’s.
2. (X1i,…, Xki, Yi) are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical condition depends on the specific f ).
4. No perfect multicollinearity (same idea; the precise statement depends on the specific f ).
The expected difference in Y associated with a difference in X1, holding X2,…, Xk constant, is
$$\Delta Y = f(X_1 + \Delta X_1, X_2, \ldots, X_k) - f(X_1, X_2, \ldots, X_k)$$

Nonlinear Functions of a Single Independent Variable (SW Section 8.2)
• Effect on Y of a change in X depends on the value of X
(polynomials and logs)
We’ll look at two complementary approaches:
1. Polynomials in X
Inclusion of quadratic, cubic, or higher-degree polynomials
2. Logarithmic transformations
Take log(Y) and/or log(X): a “percentages” interpretation of
the coefficients


1. Polynomials in X (generalizing before)

Approximate the population regression function by a polynomial:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$$

• r = 1 (linear), r = 2 (quadratic), r = 3 (cubic)
• This is (still) a linear multiple regression model: estimate using OLS
• Estimation, hypothesis testing, etc. proceed as in the multiple regression model
• General test for linearity (F statistic with r − 1 restrictions, F(r−1, ∞)):
H0: β2 = β3 = … = βr = 0
H1: at least one of β2, …, βr is non-zero

Example: the TestScore – Income relation


Incomei = average district income in the ith district (thousands of
dollars per capita)
Quadratic specification:
$$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i$$
Cubic specification:
$$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i$$


Estimation of a cubic specification in STATA (1 of 2)

gen avginc3 = avginc*avginc2;     /* create the cubic regressor */
reg testscr avginc avginc2 avginc3, r;

Regression with robust standard errors                 Number of obs =     420
                                                       F(  3,   416) =  270.18
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5584
                                                       Root MSE      =  12.707

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   5.018677   .7073505     7.10   0.000     3.628251    6.409104
     avginc2 |  -.0958052   .0289537    -3.31   0.001    -.1527191   -.0388913
     avginc3 |   .0006855   .0003471     1.98   0.049     3.27e-06    .0013677
       _cons |    600.079   5.102062   117.61   0.000     590.0499     610.108
------------------------------------------------------------------------------


Estimation of a cubic specification in STATA (2 of 2)

Testing the null hypothesis of linearity against the alternative that the population regression is quadratic and/or cubic, that is, a polynomial of degree up to 3:
H0: population coefficients on Income² and Income³ = 0
H1: at least one of these coefficients is nonzero.
test avginc2 avginc3
 ( 1)  avginc2 = 0.0
 ( 2)  avginc3 = 0.0

       F(  2,   416) =   37.69
            Prob > F =    0.0000

The hypothesis that the population regression is linear is rejected at the 1% significance level against the alternative that it is a polynomial of degree up to 3.


Summary: polynomial regression functions

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$$

• Estimation: by OLS after defining new regressors
• The individual coefficients have complicated interpretations
• To interpret the estimated regression function:
– plot predicted values as a function of x
– compute predicted ΔY/ΔX for different values of x
• Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of) variable(s).
• Choice of degree r
– plot the data; t- and F-tests, check sensitivity of estimated effects; judgment.
– Or use model selection criteria (later)


2. Logarithmic functions of Y and/or X

• Logarithms are another way to allow curvature in a regression
• Logarithms help describe relationships in terms of percentage changes, a fact that relies on this result from calculus:
$$\ln(x + \Delta x) - \ln(x) = \ln\!\left(1 + \frac{\Delta x}{x}\right) \cong \frac{\Delta x}{x} \qquad \left(\text{calculus: } \frac{d\ln(x)}{dx} = \frac{1}{x}\right)$$
Numerically:
ln(1.01) = .00995 ≅ .01;
ln(1.10) = .0953 ≅ .10 (sort of)
• Note: the percentage-change interpretation is accurate only when the change is relatively small (<10%)
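To see how quickly the approximation ln(1 + Δx/x) ≅ Δx/x degrades, the log difference can be tabulated against the proportional change. The sketch below is illustrative only; `log_diff` is a hypothetical helper name and the list of changes is an arbitrary choice.

```python
import math


def log_diff(proportional_change):
    """Exact ln(x + dx) - ln(x) when dx/x equals proportional_change."""
    return math.log(1 + proportional_change)


for p in (0.01, 0.05, 0.10, 0.25, 0.50):
    exact = log_diff(p)
    # The gap between p and ln(1 + p) is the approximation error
    print(f"dx/x = {p:.2f}  ln(1 + dx/x) = {exact:.5f}  error = {p - exact:.5f}")
```

The error is negligible at 1% and still small at 10%, but grows noticeably for larger changes, which is why the percentage interpretation is usually reserved for small changes.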

The three log regression specifications:

Case              Population regression function
I.   linear-log   Yi = β0 + β1ln(Xi) + ui
II.  log-linear   ln(Yi) = β0 + β1Xi + ui
III. log-log      ln(Yi) = β0 + β1ln(Xi) + ui

• Computation of the OLS coefficients stays the same, but the interpretation of the slope coefficient differs
• Apply the general “before and after” rule: “what is the change in Y for a given change in X?”
• Each has a natural interpretation (for small changes in X )

I. Linear-log: 1% Δ in X → 0.01β1 Δ in Y

Compute Y “before” and “after” changing X:
Y = β0 + β1ln(X ) (“before”)
Now change X: Y + ΔY = β0 + β1ln(X + ΔX ) (“after”)
Subtract (“after”) – (“before”): ΔY = β1[ln(X + ΔX ) – ln(X )]
$$\text{now } \ln(X + \Delta X) - \ln(X) \cong \frac{\Delta X}{X}, \quad \text{so } \Delta Y \cong \beta_1 \frac{\Delta X}{X}, \quad \text{or } \beta_1 \cong \frac{\Delta Y}{\Delta X / X} \ (\text{small } \Delta X)$$

I. Linear-log: 1% Δ in X → 0.01β1 Δ in Y

Yi = β0 + β1ln(Xi) + ui
for small ΔX,
$$\beta_1 \cong \frac{\Delta Y}{\Delta X / X}$$
Now 100 × ΔX/X = percentage change in X, so a 1% increase in X (multiplying X by 1.01) is associated with a .01β1 change in Y.

(1% increase in X → .01 increase in ln(X ) → .01β1 increase in Y )


Same example: TestScore vs. ln(Income)

• Define ln(Income) (instead of taking polynomials)
• Estimate by OLS:
$$\widehat{TestScore} = \underset{(3.8)}{557.8} + \underset{(1.40)}{36.42} \times \ln(Income_i)$$

• Interpretation: If Income increases by 1%, TestScore increases by 0.36 points
• Standard errors, confidence intervals, R² – all the usual tools of regression apply here.
• How does this compare to the cubic model?
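The 0.36-point interpretation can be checked against the exact before-and-after calculation from the estimated linear-log function. The sketch below assumes income is measured in thousands of dollars, as elsewhere in these slides; the function name `linear_log_change` is illustrative.

```python
import math

BETA1_HAT = 36.42  # estimated coefficient on ln(Income) from the linear-log regression


def linear_log_change(income_before, income_after):
    """Exact change in predicted TestScore when income moves between two levels."""
    return BETA1_HAT * (math.log(income_after) - math.log(income_before))


# A 1% increase from any starting level: approximately 0.01 * beta1
print(f"1% increase: {linear_log_change(10, 10.1):.2f} points")

# Equal $1,000 increases at different income levels give different changes
print(f"10 -> 11: {linear_log_change(10, 11):.2f} points")
print(f"40 -> 41: {linear_log_change(40, 41):.2f} points")
```

As with the quadratic, the same $1,000 increase is associated with a much smaller predicted change at high income than at low income.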


The linear-log and cubic regression functions


II. Log-linear population regression function

• A 1-unit change in X → 100β1% Δ in Y

ln(Y ) = β0 + β1X (b)
Now change X: ln(Y + ΔY ) = β0 + β1(X + ΔX ) (a)
Subtract (a) – (b): ln(Y + ΔY ) – ln(Y ) = β1ΔX
$$\text{so } \frac{\Delta Y}{Y} \cong \beta_1 \Delta X, \quad \text{or } \beta_1 \cong \frac{\Delta Y / Y}{\Delta X} \ (\text{small } \Delta X)$$


II. Log-linear population regression function

ln(Yi) = β0 + β1Xi + ui
$$\text{for small } \Delta X, \quad \beta_1 \cong \frac{\Delta Y / Y}{\Delta X}$$
• Now 100 × ΔY/Y = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.
• 1 unit increase in X → β1 increase in ln(Y ) → 100β1% increase in Y



II. Log-linear population regression function - Example
• Consider the following relationship between income and years of education (also including controls for age and gender):

$$\ln(income_i) = \beta_0 + \beta_1\,educ_i + \beta_2\,age_i + \beta_3\,female_i + u_i$$

• Here are estimates of that relationship


II. Log-linear population regression function - Example

• Interpret the coefficient on educ. Is it statistically significant?

• Interpret the coefficient on female. Is it statistically significant?


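III. Log-log population regression function

ln(Yi) = β0 + β1ln(Xi) + ui (b)
Now change X: ln(Y + ΔY ) = β0 + β1ln(X + ΔX ) (a)
Subtract (a) – (b): ln(Y + ΔY ) – ln(Y ) = β1[ln(X + ΔX ) – ln(X )]
$$\text{so } \frac{\Delta Y}{Y} \cong \beta_1 \frac{\Delta X}{X}, \quad \text{or } \beta_1 \cong \frac{\Delta Y / Y}{\Delta X / X} \ (\text{small } \Delta X)$$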


III. Log-log population regression function

ln(Yi) = β0 + β1ln(Xi) + ui
for small ΔX,
$$\beta_1 \cong \frac{\Delta Y / Y}{\Delta X / X}$$
Now 100 × ΔY/Y = percentage change in Y, and 100 × ΔX/X = percentage change in X, so a 1% change in X is associated with a β1% change in Y.

In the log-log specification, β1 has the interpretation of an elasticity.


Example: ln(TestScore) vs. ln(Income)

• Define: ln(TestScore), and the new regressor, ln(Income)
• Linear regression → estimate by OLS:
$$\widehat{\ln(TestScore)} = \underset{(0.006)}{6.336} + \underset{(0.0021)}{0.0554} \times \ln(Income_i)$$

A 1% increase in Income is associated with an increase of .0554% in TestScore


Example: ln(TestScore) vs. ln(Income)

$$\widehat{\ln(TestScore)} = \underset{(0.006)}{6.336} + \underset{(0.0021)}{0.0554} \times \ln(Income_i)$$

• E.g.: Suppose income increases from $10,000 to $11,000 (by 10%).
• TestScore increases by ≈ .0554 × 10% = .554%.
• If TestScore = 650, this corresponds to an increase of 3.6 points (0.00554 × 650).
• How does this compare to the log-linear model?
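Because a 10% change sits at the edge of where the log approximation works well, it can be useful to compare the elasticity shortcut with the exact implied change. The sketch below is an illustration under the estimated coefficient 0.0554 and the starting score of 650 used in the example; the variable names are assumptions.

```python
BETA1_HAT = 0.0554   # estimated elasticity of TestScore with respect to Income
BASE_SCORE = 650     # starting test score used in the example

# Shortcut: percentage change in Y is roughly beta1 times the percentage change in X
approx_pct = BETA1_HAT * 10

# Exact: ln(Y_after / Y_before) = beta1 * ln(11 / 10), so Y_after / Y_before = 1.1 ** beta1
exact_pct = 100 * (1.1 ** BETA1_HAT - 1)

approx_points = BASE_SCORE * approx_pct / 100
exact_points = BASE_SCORE * exact_pct / 100

print(f"approximate: {approx_pct:.3f}% ({approx_points:.2f} points)")
print(f"exact:       {exact_pct:.3f}% ({exact_points:.2f} points)")
```

The two answers differ only slightly here, which is consistent with the rule of thumb that the percentage interpretation is reasonable for changes of about 10% or less.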


The log-linear and log-log specifications:

• Note vertical axis


• Neither seems to fit as well as the cubic or linear-log, at least
based on visual inspection (formal comparison is difficult
because the dependent variables differ)

Logarithms – Example 2
• Here’s an estimated relationship between neighborhood pollution and housing prices:
$$\widehat{\ln(price)} = 9.23 - 0.72\,\ln(nox) + 0.31\,rooms$$
– price = median housing price in the neighborhood
– nox = amount of nitrous oxide in the air (parts per million)
– rooms = mean number of rooms in houses in the neighborhood

• Interpret the coefficient on log(nox):

• Interpret the coefficient on rooms:


The three log regression specifications:

Case                                      Population regression function
I.   linear-log: 1% ↑ X → 0.01β1 ΔY       Yi = β0 + β1ln(Xi) + ui
II.  log-linear: 1 unit ↑ X → 100β1% ΔY   ln(Yi) = β0 + β1Xi + ui
III. log-log: 1% ↑ X → β1% ΔY             ln(Yi) = β0 + β1ln(Xi) + ui
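The rules in this table are small-change approximations. One way to see where they start to bend is to apply them to the housing-price estimates from Example 2 (coefficient −0.72 on ln(nox), 0.31 on rooms) and compare with the exact implied change. This is a hedged sketch; the helper names are hypothetical and not from the course materials.

```python
import math


def log_linear_pct_change(beta1, delta_x=1.0):
    """Exact % change in Y for a delta_x change in X when ln(Y) = b0 + b1*X."""
    return 100 * (math.exp(beta1 * delta_x) - 1)


def log_log_pct_change(beta1, pct_change_x=1.0):
    """Exact % change in Y for a pct_change_x % change in X when ln(Y) = b0 + b1*ln(X)."""
    return 100 * ((1 + pct_change_x / 100) ** beta1 - 1)


# Case II (rooms): rule of thumb says 100 * 0.31 = 31%
print(f"one more room: rule 31.0%, exact {log_linear_pct_change(0.31):.1f}%")

# Case III (nox): rule of thumb says -0.72% for a 1% increase
print(f"1% more nox:   rule -0.72%, exact {log_log_pct_change(-0.72):.2f}%")
```

The elasticity rule is nearly exact for a 1% change, while the log-linear rule is off by several percentage points for a coefficient as large as 0.31, since a one-unit change is then no longer "small".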


Summary: Logarithmic transformations


• Three cases, differing in whether Y and/or X is transformed by
taking logarithms.
• The regression is linear in the new variable(s) ln(Y ) and/or ln(X ),
and the coefficients can be estimated by OLS.
• Hypothesis tests and confidence intervals are now implemented
and interpreted “as usual.”
• The interpretation of β1 differs from case to case.
The choice of specification (functional form) should be guided by
judgment (which interpretation makes the most sense in your
application?), tests, and plotting predicted values


Two ways to add non-linearities


• Previously: We allowed effect on Y of a change in X to depend
on the value of X (polynomials and logs)
• Now: Effect on Y of a change in X1 depends on the value of another variable X2 (interaction terms)
$$\text{testscr}_i = \beta_0 + \beta_1\,STR_i + \varepsilon_i$$
• I.e., the slope coefficient β1 “changes”
→ Done with the first, let’s move on to the second


Interactions Between Independent Variables


• Effect on Y of a change in X1 depends on the value of another variable X2
• E.g.: Reduction in class sizes might be more effective under certain circumstances
• Perhaps smaller classes help more if there are many English learners, who need individual attention
• That is, ΔTestScore/ΔSTR might depend on PctEL
• More generally, ΔY/ΔX1 might depend on X2


Interactions Between Independent Variables


• More generally: interactions are used to answer questions such as:
– Is the association between class sizes and test scores the same for classes with more vs. fewer English language learners?
– Is the association between wages and years of education the same for immigrants and non-immigrants?

• Q: How do we model such “interactions” between X1 and X2?

• A: By defining a variable called an interaction
– The multiplicative product of two explanatory variables (X1 × X2)
– Add it to the usual regression

• Three cases:
– Binary X Binary
– Binary X Continuous
– Continuous X Continuous

Binary X Binary
Yi = β0 + β1D1i + β2D2i + ui
• D1i, D2i are binary (dummy variables)
• Say,
$$D_{1i} = \begin{cases} 1 & \text{if college degree} \\ 0 & \text{otherwise} \end{cases}, \qquad D_{2i} = \begin{cases} 1 & \text{if female} \\ 0 & \text{otherwise} \end{cases}$$
• β1 is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn’t depend on the value of D2.
• If, however, we think that the effect of changing D1 depends on D2: include the “interaction term” D1i × D2i
Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui


Binary X Binary: Interpreting the coefficients


Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui
General rule: compare the various cases
E(Yi|D1i = 0, D2i = d2) = β0 + β2d2 (b)
E(Yi|D1i = 1, D2i = d2) = β0 + β1 + β2d2 + β3d2 (a)
subtract (a) – (b):
E(Yi|D1i = 1, D2i = d2) – E(Yi|D1i = 0, D2i = d2) = β1 + β3d2
• The effect of D1 depends on d2 (what we wanted)
• β3 = increment to the effect of D1, when D2 = 1


Example: TestScore, STR, English learners (1 of 2)


Let
$$HiSTR = \begin{cases} 1 & \text{if } STR \ge 20 \\ 0 & \text{if } STR < 20 \end{cases} \quad \text{and} \quad HiEL = \begin{cases} 1 & \text{if } PctEL \ge 10 \\ 0 & \text{if } PctEL < 10 \end{cases}$$

$$\widehat{TestScore} = \underset{(1.4)}{664.1} - \underset{(2.3)}{18.2}\,HiEL - \underset{(1.9)}{1.9}\,HiSTR - \underset{(3.1)}{3.5}\,(HiSTR \times HiEL)$$

• “Effect” of HiSTR when HiEL = 0 is –1.9
• “Effect” of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4
• One interpretation: Class size reduction is estimated to have a bigger effect when the percent of English learners is large (or?)
• This interaction isn’t statistically significant: t = –3.5/3.1 = –1.13

Example: TestScore, STR, English learners (2 of 2)


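Let
$$HiSTR = \begin{cases} 1 & \text{if } STR \ge 20 \\ 0 & \text{if } STR < 20 \end{cases} \quad \text{and} \quad HiEL = \begin{cases} 1 & \text{if } PctEL \ge 10 \\ 0 & \text{if } PctEL < 10 \end{cases}$$

$$\widehat{TestScore} = \underset{(1.4)}{664.1} - \underset{(2.3)}{18.2}\,HiEL - \underset{(1.9)}{1.9}\,HiSTR - \underset{(3.1)}{3.5}\,(HiSTR \times HiEL)$$

• Can you relate these coefficients to the following table of group (“cell”) means?

            Low STR    High STR
Low EL      664.1      662.2
High EL     645.9      640.5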
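One way to relate the coefficients to the cell means is to evaluate the fitted regression at each of the four dummy combinations. The sketch below does this with the reported estimates; `predicted_cell_mean` is an illustrative name.

```python
def predicted_cell_mean(hi_str, hi_el):
    """Fitted TestScore for a given (HiSTR, HiEL) pair of 0/1 dummies."""
    return 664.1 - 18.2 * hi_el - 1.9 * hi_str - 3.5 * (hi_str * hi_el)


for hi_el in (0, 1):
    for hi_str in (0, 1):
        mean = predicted_cell_mean(hi_str, hi_el)
        print(f"HiEL={hi_el}, HiSTR={hi_str}: {mean:.1f}")

# The interaction coefficient is the difference-in-differences of the cell means
effect_when_low_el = predicted_cell_mean(1, 0) - predicted_cell_mean(0, 0)
effect_when_high_el = predicted_cell_mean(1, 1) - predicted_cell_mean(0, 1)
interaction = effect_when_high_el - effect_when_low_el
print(f"interaction = {interaction:.1f}")
```

With two dummies and their interaction the model is saturated, so the four fitted values coincide with the four cell means in the table.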


Continuous X Binary: Motivation


• Consider this regression of wages on years of education (beyond 8th grade)
$$wage_i = \beta_0 + \beta_1\,immigrant + \beta_2\,educ + u_i$$
• Using OLS, we can estimate this with data from the 2016 American Community Survey on 30-50 year old residents (N=17,288):


Continuous X Binary: Motivation


• What’s the estimated change in wages associated with 1 more
year of schooling?
• For immigrants?

• For non-immigrants?

• Here’s a graphical version of the model:


Continuous X Binary: Motivation


• Model assumes relationship between education and wages is
_________ for immigrants and non-immigrants, which may not be
realistic.
• If that relationship is, for example, stronger for immigrants than
nonimmigrants, the graph would look like:


Continuous X Binary Interaction


• We want the regression model to allow the ___________ (not
just the ___________) to differ by immigration status.

• We can do so by creating an interaction between the dummy variable immigrant and the continuous variable educ by multiplying the two:


Continuous X Binary Interaction


• To see why that interaction variable helps, consider this regression model:
$$wage_i = \beta_0 + \beta_1\,immigrant + \beta_2\,educ + \beta_3\,immigXeduc + u_i$$

• What is the sample regression function for:


– Non-immigrants?

– Immigrants?


Continuous X Binary Interaction


• What’s the predicted wage change associated with 1 more year of education for
– Non-immigrants?
– Immigrants?

• What is the interpretation of
– β̂1?
– β̂3?


Continuous X Binary Interaction


• Here are the actual results from the data:

• What’s the predicted wage change associated with 1 more year of education for
– Non-immigrants?
– Immigrants?

• Which slope is steeper? Is the difference in slopes statistically significant?


Continuous X Binary Interaction


• Which slope is steeper? Is the difference in slopes statistically
significant?
• Here’s the graphical version of the regression model:


Binary-continuous interactions: the two regression lines (1 of 2)
Yi = β0 + β1Di + β2Xi + β3(Di × Xi) + ui
Observations with Di = 0 (the “D = 0” group):
Yi = β0 + β2Xi + ui The D = 0 regression line
Observations with Di = 1 (the “D = 1” group):
Yi = β0 + β1 + β2Xi + β3Xi + ui
= (β0 + β1) + (β2 + β3)Xi + ui The D = 1 regression line


Binary-continuous interactions: the two regression lines (2 of 2)


Graph (c): omitting lower order terms


• Always include the lower order terms!
• Imagine if you omit the lower order term for immigrants
$$wage_i = \beta_0 + \beta_1\,immigrant + \beta_2\,educ + \beta_3\,immigXeduc + u_i$$
becomes…
$$wage_i = \beta_0 + 0 \times immigrant + \beta_2\,educ + \beta_3\,immigXeduc + u_i$$

                           Intercept for educ    Slope for educ
Non-immigrant (immig=0)
Immigrant (immig=1)

• Implication: no difference between immigrants and non-immigrants when educ = 0
• Distorts slope estimates, very rarely justified

Interpreting the coefficients


Yi = β0 + β1Di + β2Xi + β3(Di × Xi) + ui
General rule: compare the various cases
Y = β0 + β1D + β2X + β3(D × X ) (b)
Now change X:
Y + ΔY = β0 + β1D + β2(X + ΔX ) + β3[D × (X + ΔX)] (a)
subtract (a) − (b):
$$\Delta Y = \beta_2 \Delta X + \beta_3 D \Delta X \quad \text{or} \quad \frac{\Delta Y}{\Delta X} = \beta_2 + \beta_3 D$$
• The effect of X depends on D (what we wanted)
• β3 = increment to the effect of X, when D = 1

Example: TestScore, STR, HiEL (=1 if PctEL ≥ 10) (1 of 2)

$$\widehat{TestScore} = \underset{(11.9)}{682.2} - \underset{(0.59)}{0.97}\,STR + \underset{(19.5)}{5.6}\,HiEL - \underset{(0.97)}{1.28}\,(STR \times HiEL)$$
• When HiEL = 0:
$$\widehat{TestScore} = 682.2 - 0.97\,STR$$
• When HiEL = 1:
$$\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6 - 1.28\,STR = 687.8 - 2.25\,STR$$

• Two regression lines: one for each HiEL group.
• Class size reduction is estimated to have a larger effect when the percent of English learners is large.
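The two regression lines can be recovered mechanically from the four coefficients, which is a handy check when interpreting any binary-continuous interaction. The sketch below uses the reported estimates; the function names and the evaluation point STR = 20 are illustrative choices.

```python
def intercept(hi_el):
    """Intercept of the fitted line for the given HiEL group (0 or 1)."""
    return 682.2 + 5.6 * hi_el


def slope_on_str(hi_el):
    """Slope on STR for the given HiEL group: beta_STR + beta_interaction * HiEL."""
    return -0.97 - 1.28 * hi_el


def predicted_test_score(str_ratio, hi_el):
    """Fitted TestScore at a given student-teacher ratio and HiEL group."""
    return intercept(hi_el) + slope_on_str(hi_el) * str_ratio


for hi_el in (0, 1):
    print(
        f"HiEL={hi_el}: TestScore = {intercept(hi_el):.1f} + ({slope_on_str(hi_el):.2f}) * STR; "
        f"at STR=20: {predicted_test_score(20, hi_el):.1f}"
    )
```

The HiEL = 1 line has the higher intercept but the steeper negative slope, so within the range of class sizes in the data it lies below the HiEL = 0 line.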


Example: TestScore, STR, HiEL (=1 if PctEL ≥ 10) (2 of 2)

$$\widehat{TestScore} = \underset{(11.9)}{682.2} - \underset{(0.59)}{0.97}\,STR + \underset{(19.5)}{5.6}\,HiEL - \underset{(0.97)}{1.28}\,(STR \times HiEL)$$
• The two regression lines have the same slope ↔ the coefficient on STR×HiEL is zero: t = –1.28/0.97 = –1.32
• The two regression lines have the same intercept ↔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
• The two regression lines are the same ↔ population coefficient on HiEL = 0 and population coefficient on STR×HiEL = 0: F = 89.94 (p-value < .001) (!!)
• We reject the joint hypothesis but neither individual hypothesis (how can this be?)

(c) Interactions between two continuous variables
Yi = β0 + β1X1i + β2X2i + ui
• X1, X2 are continuous
• As specified, the effect of X1 doesn’t depend on X2
• As specified, the effect of X2 doesn’t depend on X1
• To allow the effect of X1 to depend on X2, include the
“interaction term” X1i × X2i as a regressor:
Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui


Interpreting the coefficients:


Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui
General rule: compare the various cases
Y = β0 + β1X1 + β2X2 + β3(X1 × X2) (b)
Now change X1:
Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2 + β3[(X1 + ΔX1) × X2] (a)
subtract (a) – (b):
$$\Delta Y = \beta_1 \Delta X_1 + \beta_3 X_2 \Delta X_1 \quad \text{or} \quad \frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2$$

• The effect of X1 depends on X2 (what we wanted)
• β3 = increment to the effect of X1 from a unit change in X2

Example: TestScore, STR, PctEL (1 of 2)



$$\widehat{TestScore} = \underset{(11.8)}{686.3} - \underset{(0.59)}{1.12}\,STR - \underset{(0.37)}{0.67}\,PctEL + \underset{(0.019)}{.0012}\,(STR \times PctEL)$$
The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:
$$\frac{\Delta TestScore}{\Delta STR} = -1.12 + .0012\,PctEL$$

PctEL    ΔTestScore/ΔSTR
0        –1.12
20%      –1.12 + .0012 × 20 = –1.10
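The same marginal-effect formula can be evaluated over a wider range of PctEL values to see how little the estimated effect of STR moves. The sketch below assumes PctEL is measured in percentage points (so 20 means 20%), as in the table; `effect_of_str` and the chosen grid of values are illustrative.

```python
def effect_of_str(pct_el):
    """Estimated change in TestScore per one-unit change in STR at a given PctEL."""
    return -1.12 + 0.0012 * pct_el


for pct_el in (0, 20, 50, 80):
    print(f"PctEL = {pct_el:>2}: dTestScore/dSTR = {effect_of_str(pct_el):.3f}")
```

Even at very high shares of English learners the estimated effect barely changes, which is consistent with the small and statistically insignificant interaction coefficient.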

Example: TestScore, STR, PctEL (2 of 2)

$$\widehat{TestScore} = \underset{(11.8)}{686.3} - \underset{(0.59)}{1.12}\,STR - \underset{(0.37)}{0.67}\,PctEL + \underset{(0.019)}{.0012}\,(STR \times PctEL)$$

• Does population coefficient on STR×PctEL = 0?
t = .0012/.019 = .06 → can’t reject null at 5% level
• Does population coefficient on STR = 0?
t = –1.12/0.59 = –1.90 → can’t reject null at 5% level
• Do the coefficients on both STR and STR×PctEL = 0?
F = 3.89 (p-value = .021) → reject null at 5% level (Why? high but imperfect multicollinearity)

Application: Nonlinear Effects on Test Scores of the Student-Teacher Ratio (SW Section 8.4)
Nonlinear specifications let us examine more nuanced questions about the Test score – STR relation, such as:
1. Are there nonlinear effects of class size reduction on test scores? (Does a reduction from 35 to 30 have the same effect as a reduction from 20 to 15?)
2. Are there nonlinear interactions between PctEL and STR? (Are small classes more effective when there are many English learners?)


Strategy for Question #1 (different effects for different STR?)
• Estimate linear and nonlinear functions of STR, holding constant relevant demographic variables
– PctEL
– Income (remember the nonlinear TestScore-Income relation!)
– LunchPCT (fraction on free/subsidized lunch)

• See whether adding the nonlinear terms makes an “economically important” quantitative difference
• Test for whether nonlinear terms are significant


Strategy for Question #2 (interactions between PctEL and STR?)
• Estimate linear and nonlinear functions of STR, interacted with PctEL.
• If the specification is nonlinear (with STR, STR², STR³), add interactions with all the terms so the entire functional form can be different, depending on the level of PctEL.
• We will use a binary-continuous interaction specification by adding HiEL×STR, HiEL×STR², and HiEL×STR³.


What is a good “base” specification? (1 of 3)


• The TestScore – Income relation:
• The logarithmic specification is better behaved near the extremes
of the sample, especially for large values of income.


What is a good “base” specification? (2 of 3)


TABLE 8.3 Nonlinear Regression Models of Test Scores


What is a good “base” specification? (3 of 3)


TABLE 8.3 (Continued)


Tests of joint hypotheses:

What can you conclude about question #1?


About question #2?

Summary: Nonlinear Regression Functions

• Using functions of the independent variables as regressors allows nonlinear regression functions to be recast as multiple regressions.
• Estimation and inference proceed in the same way.
• If you ever forget how to interpret coefficients, try predicting outcomes for various groups in your data one at a time (different X’s).
• Interactions are often used to explore heterogeneity, i.e. whether the relationship between two variables differs by a third variable (gender, income, race).
• Post-midterms: We will use interactions to implement a method of causal inference called differences-in-differences.
