ECON6001 F2021 Topic4

ECON6001: Applied Econometrics
S&W: Chapter 8
Nonlinear Regression Functions
Dr. Gedeon Lim
Copyright © 2019 Pearson Education Ltd. All Rights Reserved.

Outline
1. Why do we need to account for non-linearities?
2. Nonlinear functions of one variable: polynomials and logs
3. Nonlinear functions of two variables: interactions
4. Application to the California Test Score data set

Nonlinear regression function

• The regression function so far has been linear in the X’s
• But the linear approximation is not always a good one
• Goal today: Allow for non-linearities

The TestScore – STR relation looks linear (maybe)…

But the TestScore – Income relation looks nonlinear...

• Report by Adam Ozimek at Moody’s Analytics

• Is there anything odd about the way the relationship between
county-level Trump support and immigration levels is modeled?

Introduction
• Think back to the California class size, income, and test score example
• Consider the multivariate regression we ran:
\[ testscr_i = \beta_0 + \beta_1\, str_i + \beta_2\, income_i + u_i \]
• Interpretation:
– An additional $1,000 of income (income is measured in $’000s) is associated with a β2 change in test scores …
– … regardless of whether the $1,000 is given to ________ family
– This linearity assumption may be unrealistic
• Non-linear regression functions allow the predicted change in Y associated with a change in X to vary with X.
• Idea: fit curves to the data, using quadratic and logarithmic regressions

Nonlinear Population Regression Functions – General Ideas (SW Section 8.1)
If a relation between Y and X is nonlinear:
• The effect on Y of a change in X depends on the value of X; that is, the marginal effect of X is not constant. E.g., the effect of class size (X) on test scores (Y) might depend on how big the class was to begin with, so reducing class size from 30 to 25 might have a different effect than reducing it from 20 to 15.
• A linear regression is mis-specified: the functional form is wrong.
• The estimator of the effect on Y of X is biased: in general it isn’t even right on average.
• The solution is to estimate a regression function that is nonlinear in X.

Two ways to add non-linearities

• Effect on Y of a change in X depends on the value of X (polynomials and logs)
• In contrast (later): effect on Y of a change in X1 depends on the value of another variable X2 (interaction terms)
\[ testscr_i = \beta_0 + \beta_1\, STR_i + \varepsilon_i \]
• I.e., the slope coefficient β1 “changes”
→ Let’s start with an example of the first


Quadratics (and other polynomials)

• A regression can approximate a wide range of smooth relationships between Y and X by adding higher-order powers of X (X², X³, etc.) to the right-hand side.
• The quadratic (squared) regression function is written as:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i \]
• Quadratic functions fit U-shapes (parabolas) to the data. The coefficients tell us where the U is centered, how flat or steep it is, and whether it opens up or down.
• BUT: β1 no longer represents the predicted change in Y associated with a one-unit change in X because…
• … the predicted change in Y associated with a one-unit change in X also depends on _____________________.

Effect on Y of a change in X depends on the value of X: TestScore – Income relation

Income_i = average district income in the i-th district (thousands of dollars per capita)
Quadratic specification:
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + u_i \]
Cubic specification:
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + \beta_3 Income_i^3 + u_i \]

The estimated quadratic regression function

(a) Plot the predicted values
\[ \widehat{TestScore} = 607.3 + 3.85\, Income_i - 0.0423\, Income_i^2 \]
(2.9) (0.27) (0.0048)

Data-wise: Estimation of the quadratic specification in STATA
generate avginc2 = avginc*avginc    // create the squared regressor
reg testscr avginc avginc2, r       // OLS with heteroskedasticity-robust standard errors
Regression with robust standard errors Number of obs = 420

F( 2, 417) = 428.52
Prob > F = 0.0000
R-squared = 0.5562
Root MSE = 12.724
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------
Test of non-linearity: What does the null hypothesis mean?
H0: β2 = 0
H1: β2 ≠ 0

Simple calculus: Interpreting the estimated regression function
(b) Compute the slope, evaluated at various values of X
\[ \widehat{TestScore} = 607.3 + 3.85\, Income_i - 0.0423\, Income_i^2 \]
(2.9) (0.27) (0.0048)
Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:
\[ \Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4 \]
Slope at a given income level:
\[ \frac{d\,TestScore}{d\,Income} = \beta_1 + 2\beta_2\, Income \]

Interpreting the estimated regression function

• To see how predicted test scores change with a one-unit change in income, we can no longer use only the coefficient on Income.
• We need to specify a particular level of income because now the relationship between X and Y depends on the level of X.
• We must calculate this at each level of X we are interested in:
\[ \widehat{TestScore} = 607.3 + 3.9\, Income - 0.04\, Income^2 \]
Income | Predicted TestScore | Change in predicted TestScore
10 | |
11 | |
40 | |
41 | |
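
The "before and after" calculation for the quadratic fit can be sketched in a few lines of Python. This is only an illustration: the coefficients are copied from the estimated quadratic regression (607.3, 3.85, −0.0423), the function names are made up for this sketch, and the income levels 25 and 45 are arbitrary illustrative choices alongside the $5,000 to $6,000 example.

```python
# Coefficients copied from the estimated quadratic regression in the slides.
B0, B1, B2 = 607.3, 3.85, -0.0423


def predicted_score(income):
    """Predicted TestScore at a district income level (thousands of dollars)."""
    return B0 + B1 * income + B2 * income ** 2


def predicted_change(income_from, income_to):
    """'After' minus 'before' prediction for a move in income."""
    return predicted_score(income_to) - predicted_score(income_from)


def slope(income):
    """Derivative of the fitted curve: b1 + 2 * b2 * Income."""
    return B1 + 2 * B2 * income


# Illustrative income levels (assumed for this sketch, in thousands of dollars).
for start in (5, 25, 45):
    change = predicted_change(start, start + 1)
    print(f"Income {start} -> {start + 1}: change = {change:.2f}, slope at {start} = {slope(start):.2f}")
```

The one-unit change and the derivative differ slightly because the derivative is the slope at a single point, while the change is measured over a whole unit of income. Both shrink as income rises, which is the sense in which the effect of X depends on X.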

General Approach to Modeling Non-Linearities
1. Identify possible non-linear relationships
• Economic theory/plot the data
2. Specify and estimate non-linear functions using OLS
3. Determine whether the non-linear specification matters: t- and F-statistics
4. Plot the estimated nonlinear regression function
5. Estimate the effect on Y of a change in X

The general nonlinear population regression function
Yi = f (X1i, X2i,…, Xki) + ui, i = 1,…, n
Assumptions
1. E(ui| X1i, X2i,…, Xki) = 0 (same), so f is the conditional expectation
of Y given the X’s.
2. (X1i,…, Xki, Yi) are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical condition depends
on the specific f ).
4. No perfect multicollinearity (same idea; the precise statement depends on
the specific f ).
The expected difference in Y associated with a difference in X1, holding X2,…,
Xk constant is
ΔY = f (X1 + ΔX1, X2,…, Xk) – f (X1, X2,…, Xk)
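
The expected-difference formula applies to any regression function f, which a small Python helper can make concrete. The function `f_example` below is purely hypothetical (its coefficients are invented for illustration and are not estimates from these slides); only the "after minus before" logic comes from the formula.

```python
def expected_difference(f, x, dx1):
    """Delta Y = f(X1 + dX1, X2, ..., Xk) - f(X1, X2, ..., Xk).

    x is a tuple (X1, ..., Xk); only the first regressor is changed.
    """
    x1, rest = x[0], x[1:]
    return f(x1 + dx1, *rest) - f(x1, *rest)


# Hypothetical nonlinear regression function, invented for illustration only.
def f_example(x1, x2):
    return 2.0 + 0.5 * x1 - 0.1 * x1 ** 2 + 0.3 * x2


# The same one-unit change in X1 has a different effect at different starting values.
print(expected_difference(f_example, (1.0, 4.0), 1.0))
print(expected_difference(f_example, (3.0, 4.0), 1.0))
```

Because X2 is held fixed in both evaluations, its contribution cancels, so the difference isolates the effect of X1.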

Nonlinear Functions of a Single Independent Variable (SW Section 8.2)
• Effect on Y of a change in X depends on the value of X
(polynomials and logs)
We’ll look at two complementary approaches:
1. Polynomials in X
Inclusion of quadratic, cubic, or higher-degree polynomials
2. Logarithmic transformations
Take log(Y) and/or log(X): a “percentages” interpretation of
the coefficients

1. Polynomials in X (generalizing before)

Approximate the population regression function by a polynomial:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i \]
• r = 1 (linear), r = 2 (quadratic), r = 3 (cubic)
• This is (still) a linear multiple regression model, so estimate it using OLS
• Estimation, hypothesis testing, etc. proceed as in the multiple regression model
• General test for linearity (F with r − 1 and ∞ degrees of freedom):
H0: β2 = β3 = … = βr = 0
H1: at least one of β2, …, βr is non-zero

Example: the TestScore – Income relation

Income_i = average district income in the i-th district (thousands of dollars per capita)
Quadratic specification:
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + u_i \]
Cubic specification:
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + \beta_3 Income_i^3 + u_i \]

Estimation of a cubic specification in STATA (1 of 2)
gen avginc3 = avginc*avginc2            // create the cubic regressor
reg testscr avginc avginc2 avginc3, r   // OLS with heteroskedasticity-robust standard errors
Regression with robust standard errors Number of obs = 420

F( 3, 416) = 270.18
Prob > F = 0.0000
R-squared = 0.5584
Root MSE = 12.707
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 5.018677 .7073505 7.10 0.000 3.628251 6.409104
avginc2 | -.0958052 .0289537 -3.31 0.001 -.1527191 -.0388913
avginc3 | .0006855 .0003471 1.98 0.049 3.27e-06 .0013677
_cons | 600.079 5.102062 117.61 0.000 590.0499 610.108
------------------------------------------------------------------------------

Estimation of a cubic specification in STATA (2 of 2)
Testing the null hypothesis of linearity against the alternative that the population regression is quadratic and/or cubic, that is, a polynomial of degree up to 3:
H0: population coefficients on Income² and Income³ are both 0
H1: at least one of these coefficients is nonzero.
test avginc2 avginc3
( 1) avginc2 = 0.0
( 2) avginc3 = 0.0
F( 2, 416) = 37.69
Prob > F = 0.0000
The hypothesis that the population regression is linear is rejected at the 1% significance level against the alternative that it is a polynomial of degree up to 3.

Summary: polynomial regression functions

\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i \]
• Estimation: by OLS after defining new regressors
• The individual coefficients have complicated interpretations
• To interpret the estimated regression function:
– plot predicted values as a function of X
– compute predicted ΔY/ΔX for different values of X
• Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of) variable(s).
• Choice of degree r:
– plot the data; use t- and F-tests; check the sensitivity of estimated effects; apply judgment.
– Or use model selection criteria (later)

2. Logarithmic functions of Y and/or X

• Logarithms are another way to allow curvature in a regression
• Logarithms help describe relationships in terms of percentage changes, a fact that relies on this result from calculus:
\[ \ln(x + \Delta x) - \ln(x) = \ln\left(1 + \frac{\Delta x}{x}\right) \cong \frac{\Delta x}{x} \qquad \left(\text{calculus: } \frac{d\ln(x)}{dx} = \frac{1}{x}\right) \]
Numerically:
ln(1.01) = .00995 ≅ .01;
ln(1.10) = .0953 ≅ .10 (sort of)
• Note: the percentage-change interpretation is accurate only when the change is relatively small (<10%)
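
A quick numerical check shows how the approximation ln(1 + Δx/x) ≅ Δx/x degrades as the change grows. The sketch below reproduces the two values from the slide (1% and 10%) and adds 5% and 25% as extra illustrative cases chosen for this example.

```python
import math


def log_difference_and_approx(pct_change):
    """Return (exact log difference, approximation dx/x) for a given percentage change."""
    ratio = pct_change / 100.0
    return math.log(1.0 + ratio), ratio


# 1% and 10% are the cases on the slide; 5% and 25% are added for illustration.
for pct in (1, 5, 10, 25):
    exact, approx = log_difference_and_approx(pct)
    print(f"{pct:>2}% change: ln(1 + dx/x) = {exact:.5f}, dx/x = {approx:.2f}")
```

The gap between the two columns widens with the size of the change, which is why the percentage interpretation is reserved for small changes.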

The three log regression specifications:
Case | Population regression function
I. linear-log | Yi = β0 + β1ln(Xi) + ui
II. log-linear | ln(Yi) = β0 + β1Xi + ui
III. log-log | ln(Yi) = β0 + β1ln(Xi) + ui
• The OLS coefficients are computed the same way, but the interpretation of the slope coefficient differs
• Apply the general “before and after” rule: “what is the change in Y for a given change in X?”
• Each has a natural interpretation (for small changes in X)

I. Linear-log: a 1% Δ in X → 0.01β1 Δ in Y

Compute Y “before” and “after” changing X:
Y = β0 + β1ln(X) (“before”)
Now change X: Y + ΔY = β0 + β1ln(X + ΔX) (“after”)
Subtract (“after”) – (“before”): ΔY = β1[ln(X + ΔX) – ln(X)]
\[ \text{now } \ln(X + \Delta X) - \ln(X) \cong \frac{\Delta X}{X} \]
\[ \text{so } \Delta Y \cong \beta_1 \frac{\Delta X}{X} \]
\[ \text{or } \beta_1 \cong \frac{\Delta Y}{\Delta X / X} \quad (\text{small } \Delta X) \]

I. Linear-log: a 1% Δ in X → 0.01β1 Δ in Y

Yi = β0 + β1ln(Xi) + ui
for small ΔX,
\[ \beta_1 \cong \frac{\Delta Y}{\Delta X / X} \]
Now 100 × ΔX/X = percentage change in X, so a 1% increase in X (multiplying X by 1.01) is associated with a .01β1 change in Y.
(1% increase in X → .01 increase in ln(X) → .01β1 increase in Y)

Same example: TestScore vs. ln(Income)

• Define ln(Income) (instead of taking polynomials)
• Estimate by OLS:
\[ \widehat{TestScore} = 557.8 + 36.42 \times \ln(Income_i) \]
(3.8) (1.40)
• Interpretation: If Income increases by 1%, TestScore increases by 0.36 points
• Standard errors, confidence intervals, R² – all the usual tools of regression apply here.
• How does this compare to the cubic model?
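
The 0.36-point interpretation can be checked against the exact "before and after" difference. The sketch below uses the estimated linear-log coefficients from the slide (557.8 and 36.42); the starting income of 20 (thousand dollars) is an arbitrary value assumed for illustration, and the function names are invented for this example.

```python
import math

# Coefficients copied from the estimated linear-log regression in the slides.
B0, B1 = 557.8, 36.42


def predicted_score(income):
    """Predicted TestScore at a district income level (thousands of dollars)."""
    return B0 + B1 * math.log(income)


def exact_change_for_1pct(income):
    """Exact predicted change when income is multiplied by 1.01."""
    return predicted_score(1.01 * income) - predicted_score(income)


approx_change = 0.01 * B1  # the rule of thumb: 1% increase in X -> 0.01 * beta1 change in Y

income_level = 20.0  # assumed starting income for this sketch
print(f"approximate: {approx_change:.4f}, exact: {exact_change_for_1pct(income_level):.4f}")
```

In the linear-log model the exact change for a 1% increase is β1 × ln(1.01) at every income level, so the choice of starting income does not matter here.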

The linear-log and cubic regression functions

II. Log-linear population regression function

• A 1-unit change in X → 100β1% Δ in Y
ln(Y) = β0 + β1X (b)
Now change X: ln(Y + ΔY) = β0 + β1(X + ΔX) (a)
Subtract (a) – (b): ln(Y + ΔY) – ln(Y) = β1ΔX
\[ \text{so } \frac{\Delta Y}{Y} \cong \beta_1 \Delta X \]
\[ \text{or } \beta_1 \cong \frac{\Delta Y / Y}{\Delta X} \]
ln(Yi) = β0 + β1Xi + ui
\[ \text{for small } \Delta X, \quad \beta_1 \cong \frac{\Delta Y / Y}{\Delta X} \]
• Now 100 × ΔY/Y = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.
• 1 unit increase in X → β1 increase in ln(Y) → 100β1% increase in Y
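
The 100β1% rule is itself an approximation. In the log-linear model the exact percentage change in Y for a change ΔX is 100 × (exp(β1ΔX) − 1), which follows from exponentiating ln(Y + ΔY) − ln(Y) = β1ΔX. The sketch below compares the two; the β1 values are hypothetical and chosen only to show when the approximation holds up.

```python
import math


def pct_change_in_y(beta1, dx=1.0):
    """Return (approximate, exact) percentage change in Y for a change dx in X."""
    approx = 100.0 * beta1 * dx
    exact = 100.0 * (math.exp(beta1 * dx) - 1.0)
    return approx, exact


# Hypothetical slope coefficients, not estimates from the slides.
for beta1 in (0.01, 0.05, 0.20, 0.50):
    approx, exact = pct_change_in_y(beta1)
    print(f"beta1 = {beta1:.2f}: approx {approx:.1f}%, exact {exact:.1f}%")
```

For small coefficients the two figures nearly coincide; for large ones the exact change is noticeably bigger than 100β1%.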

II. Log-linear population regression function - Example
• Consider the following relationship between income and years of education (also including controls for age and gender):
\[ \ln(income) = \beta_0 + \beta_1\, educ + \beta_2\, age + \beta_3\, female + u_i \]
• Here are estimates of that relationship
• Interpret the coefficient on educ. Is it statistically significant?
• Interpret the coefficient on female. Is it statistically significant?

III. Log-log population regression function

ln(Yi) = β0 + β1ln(Xi) + ui (b)
Now change X: ln(Y + ΔY) = β0 + β1ln(X + ΔX) (a)
Subtract (a) – (b): ln(Y + ΔY) – ln(Y) = β1[ln(X + ΔX) – ln(X)]
\[ \text{so } \frac{\Delta Y}{Y} \cong \beta_1 \frac{\Delta X}{X} \]
\[ \text{or } \beta_1 \cong \frac{\Delta Y / Y}{\Delta X / X} \]

III. Log-log population regression function

ln(Yi) = β0 + β1ln(Xi) + ui
for small ΔX,
\[ \beta_1 \cong \frac{\Delta Y / Y}{\Delta X / X} \]
Now 100 × ΔY/Y = percentage change in Y, and 100 × ΔX/X = percentage change in X, so a 1% change in X is associated with a β1% change in Y.
In the log-log specification, β1 has the interpretation of an elasticity.

Example: ln(TestScore) vs. ln(Income)

• Define: ln(TestScore), and the new regressor, ln(Income)
• Linear regression → estimate by OLS:
\[ \widehat{\ln(TestScore)} = 6.336 + 0.0554 \times \ln(Income_i) \]
(0.006) (0.0021)
A 1% increase in Income is associated with an increase of .0554% in TestScore

Example: ln(TestScore) vs. ln(Income)

\[ \widehat{\ln(TestScore)} = 6.336 + 0.0554 \times \ln(Income_i) \]
(0.006) (0.0021)
• E.g.: Suppose income increases from $10,000 to $11,000 (by 10%).
• TestScore increases by ≈ .0554 × 10% = .554%.
• If TestScore = 650, this corresponds to an increase of 3.6 points (0.00554 × 650).
• How does this compare to the log-linear model?
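
Since a 10% change sits at the edge of where the log approximation is reliable, it can help to compare the elasticity shortcut with the exact change implied by the log-log model, where Y scales by (X_new / X_old)^β1. The sketch below uses the estimated elasticity 0.0554 and the 650-point baseline from the slide; the function names are invented for this example.

```python
# Elasticity copied from the estimated log-log regression in the slides.
ELASTICITY = 0.0554


def approx_pct_change(pct_change_in_x):
    """Shortcut: a p% change in X is associated with an (elasticity * p)% change in Y."""
    return ELASTICITY * pct_change_in_x


def exact_pct_change(x_from, x_to):
    """Exact percentage change in Y implied by the log-log model."""
    return 100.0 * ((x_to / x_from) ** ELASTICITY - 1.0)


baseline_score = 650.0  # baseline TestScore used on the slide
approx = approx_pct_change(10.0)
exact = exact_pct_change(10.0, 11.0)
print(f"approx: {approx:.3f}% ({approx / 100 * baseline_score:.2f} points)")
print(f"exact:  {exact:.3f}% ({exact / 100 * baseline_score:.2f} points)")
```

The shortcut slightly overstates the increase here, consistent with the earlier caveat that ln(1.10) is only roughly .10.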

The log-linear and log-log specifications:
• Note vertical axis

• Neither seems to fit as well as the cubic or linear-log, at least
based on visual inspection (formal comparison is difficult
because the dependent variables differ)

Logarithms – Example 2
• Here’s an estimated relationship between neighborhood pollution and housing prices:
\[ \widehat{\ln(price)} = 9.23 - 0.72 \ln(nox) + 0.31\, rooms \]
– price = median housing price in the neighborhood
– nox = amount of nitrous oxide in the air (parts per million)
– rooms = mean number of rooms in houses in the neighborhood
• Interpret the coefficient on log(nox):
• Interpret the coefficient on rooms:

The three log regression specifications:

Case | Population regression function
I. linear-log: 1% ↑ X → 0.01β1 ΔY | Yi = β0 + β1ln(Xi) + ui
II. log-linear: 1 unit ↑ X → 100β1% ΔY | ln(Yi) = β0 + β1Xi + ui
III. log-log: 1% ↑ X → β1% ΔY | ln(Yi) = β0 + β1ln(Xi) + ui

Summary: Logarithmic transformations

• Three cases, differing in whether Y and/or X is transformed by
taking logarithms.
• The regression is linear in the new variable(s) ln(Y ) and/or ln(X ),
and the coefficients can be estimated by OLS.
• Hypothesis tests and confidence intervals are now implemented
and interpreted “as usual.”
• The interpretation of β1 differs from case to case.
The choice of specification (functional form) should be guided by
judgment (which interpretation makes the most sense in your
application?), tests, and plotting predicted values

Two ways to add non-linearities

• Previously: we allowed the effect on Y of a change in X to depend on the value of X (polynomials and logs)
• Now: the effect on Y of a change in X1 depends on the value of another variable X2 (interaction terms)
\[ testscr_i = \beta_0 + \beta_1\, STR_i + \varepsilon_i \]
• I.e., the slope coefficient β1 “changes”
→ Done with the first; let’s move on to the second

Interactions Between Independent Variables

• Effect on Y of a change in X1 depends on the value of another variable X2
• E.g.: Reduction in class sizes might be more effective under certain circumstances
• Perhaps smaller classes help more if there are many English learners, who need individual attention
• That is, ΔTestScore/ΔSTR might depend on PctEL
• More generally, ΔY/ΔX1 might depend on X2

Interactions Between Independent Variables

• More generally: used to answer questions such as:
– Is the association between class sizes and test scores the same for classes with more vs. fewer English language learners?
– Is the association between wages and years of education the same for immigrants and non-immigrants?
• Q: How do we model such “interactions” between X1 and X2?
• A: By defining a variable called an interaction
– The multiplicative product of two explanatory variables (X1 × X2)
– Add it to the usual regression
• Three cases
– Binary × Binary
– Binary × Continuous
– Continuous × Continuous

Binary × Binary
Yi = β0 + β1D1i + β2D2i + ui
• D1i, D2i are binary (dummy variables)
• Say, D1i = 1 if college degree, 0 otherwise; D2i = 1 if female, 0 otherwise
• β1 is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn’t depend on the value of D2.
• If, however, we think that the effect of changing D1 depends on D2: include the “interaction term” D1i × D2i
Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui

Binary X Binary: Interpreting the coefficients

Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui
General rule: compare the various cases
E(Yi|D1i = 0, D2i = d2) = β0 + β2d2 (b)
E(Yi|D1i = 1, D2i = d2) = β0 + β1 + β2d2 + β3d2 (a)
subtract (a) – (b):
E(Yi|D1i = 1, D2i = d2) – E(Yi|D1i = 0, D2i = d2) = β1 + β3d2
• The effect of D1 depends on d2 (what we wanted)
• β3 = increment to the effect of D1, when D2 = 1

Example: TestScore, STR, English learners (1 of 2)

Let HiSTR = 1 if STR ≥ 20, 0 if STR < 20, and HiEL = 1 if PctEL ≥ 10, 0 if PctEL < 10
\[ \widehat{TestScore} = 664.1 - 18.2\, HiEL - 1.9\, HiSTR - 3.5\, (HiSTR \times HiEL) \]
(1.4) (2.3) (1.9) (3.1)
• “Effect” of HiSTR when HiEL = 0 is –1.9
• “Effect” of HiSTR when HiEL = 1 is –1.9 – 3.5 = –5.4
• One interpretation: Class size reduction is estimated to have a bigger effect when the percent of English learners is large (or?)
• This interaction isn’t statistically significant: t = –3.5/3.1 = –1.13

Example: TestScore, STR, English learners (2 of 2)

Let HiSTR = 1 if STR ≥ 20, 0 if STR < 20, and HiEL = 1 if PctEL ≥ 10, 0 if PctEL < 10
\[ \widehat{TestScore} = 664.1 - 18.2\, HiEL - 1.9\, HiSTR - 3.5\, (HiSTR \times HiEL) \]
(1.4) (2.3) (1.9) (3.1)
• Can you relate these coefficients to the following table of group (“cell”) means?
 | Low STR | High STR
Low EL | 664.1 | 662.2
High EL | 645.9 | 640.5
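
One way to see the link between the coefficients and the cell means is to plug each combination of the two dummies into the estimated regression. The sketch below uses the four estimates from the slide; the function name is invented for illustration.

```python
# Coefficients copied from the estimated binary-binary interaction regression in the slides.
B0, B_HIEL, B_HISTR, B_INTERACTION = 664.1, -18.2, -1.9, -3.5


def predicted_cell_mean(hi_str, hi_el):
    """Predicted TestScore for a cell defined by the two dummies (each 0 or 1)."""
    return B0 + B_HIEL * hi_el + B_HISTR * hi_str + B_INTERACTION * hi_str * hi_el


for hi_el in (0, 1):
    for hi_str in (0, 1):
        label = f"{'High' if hi_el else 'Low'} EL, {'High' if hi_str else 'Low'} STR"
        print(f"{label}: {predicted_cell_mean(hi_str, hi_el):.1f}")
```

With two dummies and their interaction the model has four coefficients for four cells, so the predictions reproduce the table of group means.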

Continuous × Binary: Motivation

• Consider this regression of wages on years of education (beyond 8th grade)
\[ wage_i = \beta_0 + \beta_1\, immigrant + \beta_2\, educ + u_i \]
• Using OLS, we can estimate this with data from the 2016 American Community Survey on 30-50 year old residents (N=17,288):

• What’s the estimated change in wages associated with 1 more year of schooling?
• For immigrants?
• For non-immigrants?
• Here’s a graphical version of the model:


• Model assumes relationship between education and wages is
_________ for immigrants and non-immigrants, which may not be
realistic.
• If that relationship is, for example, stronger for immigrants than
nonimmigrants, the graph would look like:

Continuous × Binary Interaction

• We want the regression model to allow the ___________ (not just the ___________) to differ by immigration status.
• We can do so by creating an interaction between the dummy variable immigrant and the continuous variable educ by multiplying the two:

• To see why that interaction variable helps, consider this regression model:
\[ wage_i = \beta_0 + \beta_1\, immigrant + \beta_2\, educ + \beta_3\, immigXeduc + u_i \]
• What is the sample regression function for:
– Non-immigrants?
– Immigrants?

• What’s the predicted wage change associated with 1 more year of education for
– Non-immigrants?
– Immigrants?
• What is the interpretation of
• β̂1
• β̂3

• Here are the actual results from the data:
• What’s the predicted wage change associated with 1 more year of education for
– Non-immigrants?
– Immigrants?
• Which slope is steeper? Is the difference in slopes statistically significant?

• Here’s the graphical version of the regression model:

Binary-continuous interactions: the two regression lines (1 of 2)
Yi = β0 + β1Di + β2Xi + β3(Di × Xi) + ui
Observations with Di = 0 (the “D = 0” group):
Yi = β0 + β2Xi + ui The D = 0 regression line
Observations with Di = 1 (the “D = 1” group):
Yi = β0 + β1 + β2Xi + β3Xi + ui
= (β0 + β1) + (β2 + β3)Xi + ui The D = 1 regression line

Binary-continuous interactions: the two regression lines (2 of 2)

Graph (c): omitting lower order terms

• Always include the lower-order terms!
• Imagine that you omit the lower-order term for immigrants:
\[ wage_i = \beta_0 + \beta_1\, immigrant + \beta_2\, educ + \beta_3\, immigXeduc + u_i \]
becomes…
\[ wage_i = \beta_0 + 0 \times immigrant + \beta_2\, educ + \beta_3\, immigXeduc + u_i \]
 | Intercept for educ | Slope for educ
Non-immigrant (immig=0) | |
Immigrant (immig=1) | |
• Implication: no difference between immigrants and non-immigrants when educ = 0
• Distorts slope estimates; very rarely justified

Interpreting the coefficients

Yi = β0 + β1Di + β2Xi + β3(Di × Xi) + ui
Y = β0 + β1D + β2X + β3(D × X) (b)
Now change X:
Y + ΔY = β0 + β1D + β2(X + ΔX) + β3[D × (X + ΔX)] (a)
subtract (a) − (b):
\[ \Delta Y = \beta_2 \Delta X + \beta_3 D \Delta X \quad \text{or} \quad \frac{\Delta Y}{\Delta X} = \beta_2 + \beta_3 D \]
• The effect of X depends on D (what we wanted)
• β3 = increment to the effect of X, when D = 1

Example: TestScore, STR, HiEL (=1 if PctEL ≥ 10) (1 of 2)
\[ \widehat{TestScore} = 682.2 - 0.97\, STR + 5.6\, HiEL - 1.28\, (STR \times HiEL) \]
(11.9) (0.59) (19.5) (0.97)
• When HiEL = 0:
\[ \widehat{TestScore} = 682.2 - 0.97\, STR \]
• When HiEL = 1:
\[ \widehat{TestScore} = 682.2 - 0.97\, STR + 5.6 - 1.28\, STR = 687.8 - 2.25\, STR \]
• Two regression lines: one for each HiEL group.
• Class size reduction is estimated to have a larger effect when the percent of English learners is large.
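
The two group-specific lines follow mechanically from the four coefficients: the dummy shifts the intercept and the interaction shifts the slope. A minimal sketch, using the estimates from the slide and a helper name invented for illustration:

```python
def group_line(b0, b_x, b_d, b_dx, d):
    """Return (intercept, slope) of the regression line for the group with dummy value d.

    Model: Y = b0 + b_x * X + b_d * D + b_dx * (D * X).
    """
    intercept = b0 + b_d * d
    slope = b_x + b_dx * d
    return intercept, slope


# Estimates copied from the slide: constant, STR, HiEL, STR x HiEL.
B0, B_STR, B_HIEL, B_STR_HIEL = 682.2, -0.97, 5.6, -1.28

for hi_el in (0, 1):
    intercept, slope = group_line(B0, B_STR, B_HIEL, B_STR_HIEL, hi_el)
    print(f"HiEL = {hi_el}: TestScore = {intercept:.1f} + ({slope:.2f}) * STR")
```

The same helper applies to any binary-continuous interaction, such as the wage, educ, and immigrant model, once its estimates are available.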

Example: TestScore, STR, HiEL (=1 if PctEL ≥ 10) (2 of 2)

\[ \widehat{TestScore} = 682.2 - 0.97\, STR + 5.6\, HiEL - 1.28\, (STR \times HiEL) \]
(11.9) (0.59) (19.5) (0.97)
• The two regression lines have the same slope ↔ the coefficient on STR×HiEL is zero: t = –1.28/0.97 = –1.32
• The two regression lines have the same intercept ↔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
• The two regression lines are the same ↔ population coefficient on HiEL = 0 and population coefficient on STR×HiEL = 0: F = 89.94 (p-value < .001) (!!)
• We reject the joint hypothesis but neither individual hypothesis (how can this be?)

(c) Interactions between two continuous variables
Yi = β0 + β1X1i + β2X2i + ui
• X1, X2 are continuous
• As specified, the effect of X1 doesn’t depend on X2
• As specified, the effect of X2 doesn’t depend on X1
• To allow the effect of X1 to depend on X2, include the
“interaction term” X1i × X2i as a regressor:
Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui

Interpreting the coefficients:

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui
Y = β0 + β1X1 + β2X2 + β3(X1 × X2) (b)
Now change X1:
Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2 + β3[(X1 + ΔX1) × X2] (a)
subtract (a) – (b):
\[ \Delta Y = \beta_1 \Delta X_1 + \beta_3 X_2 \Delta X_1 \quad \text{or} \quad \frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2 \]
• The effect of X1 depends on X2 (what we wanted)
• β3 = increment to the effect of X1 from a unit change in X2

Example: TestScore, STR, PctEL (1 of 2)

\[ \widehat{TestScore} = 686.3 - 1.12\, STR - 0.67\, PctEL + 0.0012\, (STR \times PctEL) \]
(11.8) (0.59) (0.37) (0.019)
The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:
\[ \frac{\Delta TestScore}{\Delta STR} = -1.12 + 0.0012\, PctEL \]
PctEL | ΔTestScore/ΔSTR
0 | –1.12
20% | –1.12 + .0012 × 20 = –1.10
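
The slope table can be extended to any value of PctEL with a one-line function. The sketch below uses the two relevant estimates from the slide (−1.12 and 0.0012); the PctEL value of 50 is an extra illustrative case assumed for this example.

```python
# Estimates copied from the slide: coefficient on STR and on STR x PctEL.
B_STR, B_STR_PCTEL = -1.12, 0.0012


def effect_of_str(pct_el):
    """Estimated change in TestScore per unit change in STR at a given PctEL."""
    return B_STR + B_STR_PCTEL * pct_el


# 0 and 20 appear in the slide's table; 50 is added for illustration.
for pct_el in (0, 20, 50):
    print(f"PctEL = {pct_el}: dTestScore/dSTR = {effect_of_str(pct_el):.3f}")
```

The effect barely moves across these values because the interaction coefficient is tiny relative to the main coefficient on STR.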

Example: TestScore, STR, PctEL (2 of 2)

\[ \widehat{TestScore} = 686.3 - 1.12\, STR - 0.67\, PctEL + 0.0012\, (STR \times PctEL) \]
(11.8) (0.59) (0.37) (0.019)
• Does population coefficient on STR×PctEL = 0?
t = .0012/.019 = .06 → can’t reject null at 5% level
• Does population coefficient on STR = 0?
t = –1.12/0.59 = –1.90 → can’t reject null at 5% level
• Do the coefficients on both STR and STR×PctEL = 0?
F = 3.89 (p-value = .021) → reject null at 5% level (Why? high but imperfect multicollinearity)

Application: Nonlinear Effects on Test Scores of the Student-Teacher Ratio (SW Section 8.4)
Nonlinear specifications let us examine more nuanced questions
about the Test score – STR relation, such as:
1. Are there nonlinear effects of class size reduction on test
scores? (Does a reduction from 35 to 30 have same effect as a
reduction from 20 to 15?)
2. Are there nonlinear interactions between PctEL and STR? (Are
small classes more effective when there are many English
learners?)

Strategy for Question #1 (different effects for different STR?)
• Estimate linear and nonlinear functions of STR, holding constant relevant demographic variables
– PctEL
– Income (remember the nonlinear TestScore-Income relation!)
– LunchPCT (fraction on free/subsidized lunch)
• See whether adding the nonlinear terms makes an “economically important” quantitative difference
• Test for whether nonlinear terms are significant

Strategy for Question #2 (interactions between PctEL and STR?)
• Estimate linear and nonlinear functions of STR, interacted with PctEL.
• If the specification is nonlinear (with STR, STR², STR³), add interactions with all the terms so the entire functional form can differ depending on the level of PctEL.
• We will use a binary-continuous interaction specification by adding HiEL×STR, HiEL×STR², and HiEL×STR³.

What is a good “base” specification? (1 of 3)

• The TestScore – Income relation:
• The logarithmic specification is better behaved near the extremes
of the sample, especially for large values of income.

TABLE 8.3 Nonlinear Regression Models of Test Scores

TABLE 8.3 (Continued)

Tests of joint hypotheses:
What can you conclude about question #1?
About question #2?

Summary: Nonlinear Regression Functions
• Using functions of the independent variables lets us recast nonlinear regression functions as multiple regressions.
• Estimation and inference proceed in the same way
• If you ever forget how to interpret coefficients, try predicting outcomes for various groups in your data one at a time (different values of X).
• Interactions are often used to explore heterogeneity, i.e. whether the relationship between two variables differs by a third variable (gender, income, race)
• Post-midterms: We will use interactions to implement a method of causal inference called differences-in-differences.
