
Extensions beyond linear regression

Topics in Data Science

David Rossell, UPF


Reading material

• Section 1: Basu & Michailidis (LASSO theory for time series, at Box)
• Section 2: Rockova & George (reading paper at Box)
• Sections 3-4: Hastie, Tibshirani & Wainwright, Ch 3.1-3.4; Gelman et al, Ch 16.
References
Basu, Michailidis. Regularized estimation in sparse high-dimensional time series
models. The Annals of Statistics 2015
Gelman et al. Bayesian data analysis (3rd ed). CRC press.
Hastie, Tibshirani, Wainwright. Statistical Learning with Sparsity. CRC Press
Rockova, George. Fast Bayesian factor analysis with automatic rotations to sparsity.
JASA, 2015
Outline

1 Time series

2 Factor regression

3 Generalized linear models

4 Beyond standard GLMs

5 Application to polarization and segregation


Until now we assumed
$$y \sim N(X\beta, \phi I)$$
If $y = (y_1, \ldots, y_T)$ is a time series, independence is not reasonable

Example: AR(1) model
$$y_t = \eta y_{t-1} + x_t^\top \beta + \epsilon_t$$
where $|\eta| < 1$ and $\epsilon_t \sim N(0, \phi)$ indep $t = 1, \ldots, T$. Likelihood:
$$p(y \mid \eta, \beta, \phi, X) = \prod_{t=1}^T p(y_t \mid y_{t-1}, \eta, \beta, \phi, x_t) = N(y; \mu, \phi\Sigma)$$

Defining $y_0 = 0$, algebra shows
• $\mu = X\beta$
• $\sigma_{t,t} = \sum_{i=0}^{t-1} (\eta^2)^i \approx \frac{1}{1 - \eta^2}$
• $\sigma_{t,t-j} = \eta^j \sigma_{t-j,t-j} \Rightarrow \mathrm{cor}(y_t, y_{t-j}) \approx \eta^j$


Main novelty: η features in the covariance
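A quick simulation can confirm the approximate correlation structure. A minimal R sketch (eta, Tn and phi are arbitrary choices for illustration):

set.seed(1)
Tn <- 10^4; eta <- 0.7; phi <- 1
y <- numeric(Tn)
for (t in 2:Tn) y[t] <- eta * y[t-1] + rnorm(1, sd=sqrt(phi))  # AR(1) with x_t'beta = 0
acf(y, lag.max=5, plot=FALSE)$acf[2:6]  # empirical cor(y_t, y_{t-j}), j = 1,...,5
eta^(1:5)                               # theoretical approximation eta^j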
Penalized likelihood

$$\min_{\beta, \Sigma} \; (y - X\beta)^\top \Sigma^{-1} (y - X\beta) + h(\beta, \Sigma)$$

Easy to minimize over $\beta$ for given $\Sigma$, but we also need to minimize over $\Sigma$

• Not too hard for AR(r) models ($\Sigma^{-1}$ has 0's after r lags)
• Function convex in $\beta$, not necessarily in $(\beta, \Sigma)$
Bayesian inference

$$p(\gamma \mid y) \propto p(\gamma) \int N(y; X_\gamma \beta_\gamma, \phi\Sigma)\, p(\beta, \phi, \Sigma \mid \gamma)\, d\beta\, d\phi\, d\Sigma$$

Closed form given $\Sigma$, but we also need to integrate over $\Sigma$


Approx for AR(r) models

Let $D$ be the $(T-r) \times r$ matrix of lagged outcomes, $d_{tj} = y_{t-j}$

$$\begin{pmatrix} y_{r+1} \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} y_r & y_{r-1} & \cdots & y_1 \\ \vdots & & & \vdots \\ y_{T-1} & y_{T-2} & \cdots & y_{T-r} \end{pmatrix} \begin{pmatrix} \eta_1 \\ \vdots \\ \eta_r \end{pmatrix} + X\beta + \epsilon = D\eta + X\beta + \epsilon$$

where $\epsilon \sim N(0, \phi I)$
• Can infer r via variable selection on the $\eta$'s. Hierarchical restrictions?
• We lose the first r observations. OK if $r \ll T$.

Formally equivalent to

$$\log p(y_{r+1:T} \mid y_{1:r}, \beta, \phi, \eta) = \log p(y \mid \beta, \phi, \eta) - \log p(y_{1:r} \mid \beta, \phi, \eta)$$
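Conditional on the first r observations, the approximation is an ordinary linear regression, so standard penalties and BVS apply directly. A sketch of the construction (makeD is an illustrative helper name):

makeD <- function(y, r) {
  Tn <- length(y)
  D <- sapply(1:r, function(j) y[(r+1-j):(Tn-j)])  # column j holds y_{t-j}, rows t = r+1,...,T
  colnames(D) <- paste0("lag", 1:r)
  D
}
# Then regress y[(r+1):Tn] on cbind(D, X[(r+1):Tn, ]) with any penalty/prior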


Approx for MA models

Consider the MA(1) model
$$y_t = x_t^\top \beta + \alpha \epsilon_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, \phi) \text{ iid}$$
Since $\epsilon_{t-1} = y_{t-1} - x_{t-1}^\top \beta - \alpha \epsilon_{t-2}$, this implies
$$y_t = x_t^\top \beta - \alpha x_{t-1}^\top \beta + \alpha y_{t-1} - \alpha^2 \epsilon_{t-2} + \epsilon_t = \ldots = \sum_{j=0}^{t} (-1)^j \alpha^j x_{t-j}^\top \beta + \sum_{j=1}^{t} (-1)^{j+1} \alpha^j y_{t-j} + \epsilon_t$$
This regresses on lagged x's and y's, imposing restrictions on the parameters.

Alternative: if $|\alpha| < 1$ this can be approximated with r lags
$$y = D_y \eta_y + X\beta + D_x \eta_x + \epsilon$$
where $\eta_y, \eta_x$ are now unrestricted and $D_y, D_x$ contain lagged y's and X's


Canada macroeconomy
(data(Canada) in R package vars)

Outcome: quarterly labour productivity in 1980-2000


• e: 100*ln(civil employment)
• U: unemployment rate
• rw: 100*ln(100* manufacturing real wage)

[Figure: residuals under the iid model (+ 3 covariates) vs. time]
Residuals from AR(5) model (+ 3 covariates)

[Figure: residuals vs. time and residual ACF]
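A sketch reproducing the iid-model diagnostics (variable roles as described above):

library(vars)
data(Canada)
canada <- as.data.frame(Canada)
fit.iid <- lm(prod ~ e + U + rw, data=canada)  # iid model + 3 covariates
plot(resid(fit.iid), type='l', ylab='Residuals')
acf(resid(fit.iid))  # check serial dependence; add 5 lags of prod for the AR(5) fit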

Let’s run BVS with Zellner + BetaBin(1,1) priors


iid model

Top models                     Marginal inclusion
Model        p(γ | y)          Variable   P(γj = 1 | y)
e, rw        0.461             e          0.991
e, rw, u     0.337             rw         0.807
e, u         0.156             u          0.502
u            0.037
rw, u        0.009
...

AR(5) model

Top models                     Marginal inclusion
Model         p(γ | y)         Variable   P(γj = 1 | y)
lag1          0.231            e          0.109
u, lag1       0.129            rw         0.222
lag1, lag2    0.065            u          0.282
lag1, lag3    0.054            lag1       1.000
lag1, lag4    0.054            lag2       0.178
lag1, lag5    0.050            lag3       0.148
...                            lag4       0.167
                               lag5       0.181
Extensions to panel data/VAR

Let $y_t \in \mathbb{R}^q$, $x_t \in \mathbb{R}^p$. Consider
$$y_t = A_1 y_{t-1} + \ldots + A_r y_{t-r} + B_0 x_t + \ldots + B_m x_{t-m} + \epsilon_t$$
$$\epsilon_t \sim N(0, \Sigma) \text{ indep } t = 1, \ldots, T$$
This implies $(y_1^\top, \ldots, y_T^\top)^\top$ is jointly Normal

• Dimension of the A's and B's is high ⇒ sparsity more important
• Algebra tractable, log-likelihood quadratic in the A's & B's
VAR & Graphical models

Let $y_t$ be endogenous, $x_t$ exogenous, $\epsilon_t \sim N(0, \phi I)$. Model
$$A_0 y_t = A_1 y_{t-1} + \ldots + A_r y_{t-r} + B_0 x_t + \ldots + B_m x_{t-m} + \epsilon_t$$
$$\Rightarrow y_t = \tilde{A}_1 y_{t-1} + \ldots + \tilde{A}_r y_{t-r} + \tilde{B}_0 x_t + \ldots + \tilde{B}_m x_{t-m} + \tilde{\epsilon}_t$$
where $\tilde{A}_j = A_0^{-1} A_j$, $\tilde{B}_j = A_0^{-1} B_j$, $\tilde{\epsilon}_t \sim N(0, \Sigma)$, $\Sigma = (A_0 A_0^\top)^{-1}$

• $\tilde{A}_1, \ldots, \tilde{A}_r, \tilde{B}_1, \ldots, \tilde{B}_m$ and $\Sigma$ are identifiable
• $A_0$ not identifiable ⇒ $A_j = A_0 \tilde{A}_j$ and $B_j = A_0 \tilde{B}_j$ not identifiable

Idea: if $A_0$ has enough 0's, then $\Sigma = (A_0 A_0^\top)^{-1}$ has a unique solution. Maximize
$$\log p(y \mid A_0, A_1, \ldots, A_r, B_0, \ldots, B_m) - h(A_0)$$
where $h()$ is a penalty inducing 0's in $A_0$
Example
(by Moneta 2007) Quarterly US data from 1947-1994 (T = 188)
• R: nominal interest rate
• I: investment per capita
• Y: GNP per capita
• C: consumption per capita
• M: real balances (ratio money/price level)
• ∆P: inflation

[Figure: graphical structure among R, I, Y, C, M, ∆P implied by the estimated $\hat{A}_0$]


Time-varying coefficients: $y_t = x_t^\top \beta_t + \epsilon_t$

Option 1. Penalize changes in $\beta_t$, say $\sum_{t=1}^{T-1} \|\beta_{t+1} - \beta_t\|_1$

Option 2. Changepoints in $\beta_t$: consider $t_1 < \ldots < t_K$, reparameterize
$$y_t = \sum_{k=1}^K x_t^\top \theta_k I(t > t_k) + \epsilon_t$$
• If $t \in [t_1, t_2)$ then $y_t = x_t^\top \theta_1 + \epsilon_t$
• If $t \in [t_2, t_3)$ then $y_t = x_t^\top (\theta_1 + \theta_2) + \epsilon_t$
• ...

$$\begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} x_1^\top & 0 & \cdots & 0 \\ x_2^\top & 0 & \cdots & 0 \\ x_3^\top & x_3^\top & \cdots & 0 \\ \vdots & & & \vdots \\ x_T^\top & x_T^\top & \cdots & x_T^\top \end{pmatrix} \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_K \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_T \end{pmatrix}$$
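A sketch building the expanded changepoint design above (makeCP and the changepoints are illustrative names/values):

makeCP <- function(X, tk) {
  Tn <- nrow(X)
  # block k equals x_t' for t > t_k and 0 otherwise
  do.call(cbind, lapply(tk, function(t0) X * (seq_len(Tn) > t0)))
}
# W <- makeCP(X, tk=c(0, 30, 60))  # t_1 = 0 gives the baseline theta_1
# lm(y ~ W - 1), possibly adding an L1 penalty on the theta's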
Outline

1 Time series

2 Factor regression

3 Generalized linear models

4 Beyond standard GLMs

5 Application to polarization and segregation


Factor regression

We have n observations, q outcomes, p known predictors, r factors
$$\underset{n \times q}{Y} = \underset{n \times p}{X}\,\underset{p \times q}{B} + \underset{n \times r}{Z}\,\underset{r \times q}{M} + \underset{n \times q}{E}$$
• X are p observed predictors, B regression coefficients
• Z are r unobserved factors, M are factor loadings
• E has i-th row $e_i \sim N(0, \Sigma)$ iid $i = 1, \ldots, n$, $\Sigma$ diagonal

The log-likelihood, given Z:
$$\log p(Y \mid Z, B, M, \Sigma) = -\frac{n}{2} \log(|\Sigma|) - \frac{1}{2} \sum_{i=1}^n \|y_i - B^\top x_i - M^\top z_i\|_2^2$$
Complete with $p(Z \mid B, M, \Sigma)$, add likelihood penalties / priors as usual
$$p(Y \mid B, M, \Sigma) = \int p(Y \mid Z, B, M, \Sigma)\, p(Z \mid B, M, \Sigma)\, dZ$$
Example (Rockova & George, JASA 2015)

Let $z_i \sim N(0, I)$ indep. To encourage loadings $\hat{m}_{ij} = 0$, set $g_0 < g_1$ in a mixture of two Laplace densities:
$$p(M) = \prod_{i,j} \left[ \pi L(m_{ij}; 0, g_0) + (1 - \pi) L(m_{ij}; 0, g_1) \right]$$

n = 48 job applicants, p = 15 characteristics (10-point scale). No X's


Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 σ̂jj
Application 0.89 1.3 2.08 0 0 0.16
Appearance 1.01 0.02 0.14 1.6 0 0.18
Academic ability 0.2 0.5 -0.3 0.2 -0.1 1.84
Likability 1.41 0.03 0.46 0.35 2.29 0.16
Self-confidence 2.02 0 0 0 0 1.24
Lucidity 2.81 0 0 0 0 1.28
Honesty 0 0 0 0 0 2.49
Salesmanship 3.13 0 0 0 0 1.30
Experience 0.87 3.11 0 0 0 0.24
Drive 2.51 0 0 0 0 1.42
Ambition 2.61 0 0 0 0 1.20
Grasp 2.72 0 0 0 0 1.20
Potential 2.79 0 0 0 0 1.39
Keenness 1.67 0 0 0 0 2.00
Suitability 1.85 1.91 0 0 0 1.84
GDP nowcasting (Ferrara & Simoni 2018)

Outcome: Euro area GDP growth rate at quarter t (2005-2016)
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_g^\top x_{t,g} + \beta_s x_{t,s} + \beta_h x_{t,h} + \epsilon_t$$
• $x_{t,g}$: weekly Google Trends data (searches in broad categories)
• $x_{t,s}$: monthly sentiment index (European Commission survey)
• $x_{t,h}$: biweekly industrial production (Eurostat)

Google data reduces MSE mainly in weeks 1-3 (relative to no Google data)

$x_{t,g}$ has p = 1776 de-seasonalized weekly changes per category/country
• 26 search categories, 296 subcategories
• Belgium, France, Germany, Italy, Netherlands, Spain

Strategy 1: add $x_{t,g}$ with LASSO penalties

Strategy 2: let $z_t$ be PCA factors from $x_{t,g}$
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_g^\top z_t + \beta_s x_{t,s} + \beta_h x_{t,h} + \epsilon_t$$
Note: $z_t$ given by the eigendecomposition of $X^\top X$

Strategy 3: let $z_t$ be sparse PCA factors
$$\max_{Z,M} \log p(X_g \mid Z, M) = \min_{Z,M} \sum_{t=1}^T \|x_{g,t} - M^\top z_t\|_2^2 + \lambda_1 \sum_{i,j} m_{ij}^2 + \lambda_2 \sum_{i,j} |m_{ij}|$$

Note: $z_t$ given by an iterative LARS-LASSO algorithm
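A minimal sketch of Strategy 2; Xg, y and the other regressors are placeholders, and Strategy 3 would swap prcomp for a sparse PCA routine (e.g. elasticnet::spca):

r <- 3
pca <- prcomp(Xg, center=TRUE, scale.=TRUE)
Z <- pca$x[, 1:r]                     # factor scores z_t from the eigendecomposition
# fit <- lm(y ~ ylag1 + Z + xs + xh)  # second-stage nowcasting regression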


Predictive accuracy

Authors selected r = 3 factors (ad hoc). Data until 04/2014 used to fit the model, the rest to assess MSE

[Figure: out-of-sample MSE by week of the quarter for LASSO, PCA and SPCA]
Factor loadings

[Figure: loadings for factors 1-2 by category/country, SPCA vs. PCA]

Ideas on how to potentially improve this analysis?
Outline

1 Time series

2 Factor regression

3 Generalized linear models

4 Beyond standard GLMs

5 Application to polarization and segregation


In many problems $y_i$ is not a continuous variable
• Binary $y_i \in \{0, 1\}$
• Discrete $y_i \in \{1, \ldots, G\}$ (or ordinal)
• Counts $y_i \in \mathbb{N}$

GLMs: common framework to deal with these cases
$$p(y_i \mid x_i, \beta, \phi) = p(y_i \mid x_i^\top \beta, \phi)$$
$$g(E(y_i \mid x_i)) = x_i^\top \beta$$
• g: known invertible link function
• φ: nuisance parameter vector (may be absent)

Also, $p(y_i \mid x_i, \beta, \phi)$ is assumed to be in the exponential family of distributions (includes Normal, Binomial, Poisson etc.)
Examples

Logistic regression: $p(y_i \mid x_i, \beta) = \text{Bern}(y_i; \mu_i)$, $\mu_i = P(y_i = 1 \mid x_i^\top \beta)$
$$\log\left(\frac{\mu_i}{1 - \mu_i}\right) = x_i^\top \beta$$

Poisson regression: $p(y_i \mid x_i, \beta) = \text{Poisson}(y_i; \mu_i)$, $\mu_i = E(y_i \mid x_i)$
$$\log \mu_i = x_i^\top \beta$$
Can lack flexibility to capture the variance, since $\text{Var}(y_i \mid x_i, \beta) = \mu_i$

Negative Binomial regression: $p(y_i \mid x_i, \beta) = \text{NegBin}(y_i; \mu_i, \phi)$,
$$\mu_i = E(y_i \mid x_i), \quad \text{Var}(y_i \mid x_i) = \mu_i \phi$$

Multinomial regression, ordinal regression...


Penalized log-likelihood

$$\min_{\beta, \phi} \; -\log p(y \mid \beta, \phi) + h(\beta) = -\sum_{i=1}^n \log p(y_i \mid \beta, \phi) + h(\beta)$$

For most GLMs $\log p(y \mid \beta, \phi)$ is concave in $\beta$ and depends on X through a linear function

Theory: most LASSO, ALASSO, Bayesian results etc. carry over to GLMs: consistency, MSE, variable screening & selection

Bayesian framework: the integrated likelihood $p(y \mid \gamma)$ has no closed form, but convexity facilitates a Laplace approximation
Example: Poisson regression

Log-likelihood: sum of linear + concave functions in β

If $y_i \sim \text{Poisson}(y_i; \mu_i)$, $\mu_i = E(y_i \mid x_i)$ where $\log \mu_i = x_i^\top \beta$, then
$$\log p(y \mid \beta) = \log \prod_{i=1}^n \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} = c + \sum_{i=1}^n y_i x_i^\top \beta - e^{x_i^\top \beta}$$

Example: data PoissonExample (glmnet package)
• n = 500, p = 20
• Simulate Poisson y with $\beta^* = (-0.75, -0.5, 0.5, 0.75, 1, 0, \ldots, 0)$
• Add LASSO penalty (convex optimization; see the sketch below)
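A sketch matching this simulation (the true β* as above):

library(glmnet)
set.seed(1)
n <- 500; p <- 20
beta <- c(-0.75, -0.5, 0.5, 0.75, 1, rep(0, p-5))
x <- matrix(rnorm(n*p), n, p)
y <- rpois(n, exp(drop(x %*% beta)))
cvfit <- cv.glmnet(x, y, family='poisson')        # cross-validated lambda
b <- as.numeric(coef(cvfit, s='lambda.min'))[-1]  # drop the intercept
which(b != 0)                                     # selected variables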
LASSO analysis
(glmnet with family='poisson')

[Figure: histogram of $y_i$ and LASSO coefficient path vs. L1 norm]

Cross-validated λ: correctly selects the first 5 variables


Bayesian analysis
(modelSelection with family='poisson')

[Figure: BMA and BMS point estimates for the 20 coefficients]

Model γ              p(γ | y)
1, 2, 3, 4, 5        0.721
1, 2, 3, 4, 5, 10    0.128
1, 2, 3, 4, 5, 12    0.018
...                  ...
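A hedged sketch of the Bayesian fit with mombf, using the simulated data above (argument names may vary across package versions):

library(mombf)
fit <- modelSelection(y, x=x, family='poisson',
                      priorCoef=zellnerprior(tau=n),  # Zellner prior
                      priorDelta=modelbbprior(1,1))   # BetaBin(1,1) on models
head(postProb(fit))  # posterior model probabilities p(gamma | y)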
Outline

1 Time series

2 Factor regression

3 Generalized linear models

4 Beyond standard GLMs

5 Application to polarization and segregation


Limitations of standard GLMs

Data may be over- or under-dispersed relative to what the model predicts
• Linear models: $E(y_i) = x_i^\top \beta$, $\text{Var}(y_i) = \phi$. β and φ are "decoupled"
• Poisson regression: $\log E(y_i) = x_i^\top \beta$, $\text{Var}(y_i) = E(y_i)$
• Binomial regression: $\text{logit}\, E(y_i) = x_i^\top \beta$, $\text{Var}(y_i) = n_i E(y_i)(1 - E(y_i))$

Data may have more 0's than predicted by the model. Let $\mu_i = E(y_i \mid x_i)$
• Poisson: $P(y_i = 0 \mid x_i) = e^{-\mu_i}$
• Binomial: $P(y_i = 0 \mid x_i) = (1 - \mu_i/n_i)^{n_i}$

Common solutions
1 Models with an additional variance parameter, e.g. NegBinom
2 Zero-inflated and hurdle models

Why is it a problem in practice?

Dispersion. If $z_{ij} \sim \text{Bern}(\mu_i)$ iid then $y_i = \sum_{j=1}^{n_i} z_{ij} \sim \text{Bin}(n_i, \mu_i)$
• What if not independent across j? e.g. measures from the same school, company, region...
• What if $\mu_i$ is not constant across j? e.g. unobserved covariates

Zero counts
• Zeroes may depend on x's in a different way, e.g. never visiting a doctor, never applying for welfare
• Under-reporting (e.g. self-reported depression), lack of sensitivity (e.g. counting people in satellite images)
Vaccine adverse events
(Rose et al, J Biopharm Stat 2006, 16: 463-81)

Goal: assess safety of Anthrax vaccine from a randomized clinical trial
• n = 1005 study participants, 4 study injections (0, 2, 4, 24 weeks), giving 4020 observations
• 5 treatments: full-dose SQ or IM, reduced dose, placebo SQ or IM
• Outcome: number of systemic adverse events after each injection (participants experienced 0-12 unique events per injection)

Observed mean 1.51 and variance 2.90 across the 4020 observations, a variance-to-mean ratio of 1.92, indicating some over-dispersion
Negative Binomial

Def. $y_i$ = number of successes in $\text{Bern}(\mu_i)$ trials until r failures ⇒ $y_i \sim \text{NegBin}(r, \mu_i)$
$$E(y_i) = \frac{r \mu_i}{1 - \mu_i}; \quad \text{Var}(y_i) = \frac{E(y_i)}{1 - \mu_i}$$
$$p(y_i \mid r, \mu_i) = \binom{y_i + r - 1}{y_i} (1 - \mu_i)^r \mu_i^{y_i}$$

Often parameterized as $\log E(y_i \mid x_i) = x_i^\top \beta$, $\text{Var}(y_i) = \phi E(y_i \mid x_i)$

Note that $\phi = 1/(1 - \mu_i) > 1$, i.e. NegBin captures over-dispersion

MLE. Initialize $\hat{\phi} = 1$, iterate until convergence (see the sketch below)
1 $\hat{\beta} = \arg\max_\beta p(y \mid \beta, \hat{\phi})$ (convex optimization)
2 $\hat{\phi} = \arg\max_\phi p(y \mid \hat{\beta}, \phi)$ (univariate optimization)

MCMC. Gibbs or HMC easy to apply
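A sketch of the alternating MLE using MASS (glm.nb wraps essentially this loop; y and x are placeholders):

library(MASS)
theta <- 1  # init dispersion parameter
for (it in 1:25) {
  fit <- glm(y ~ x, family=negative.binomial(theta))  # beta-step: convex GLM fit
  theta <- theta.ml(y, fitted(fit))                   # phi-step: univariate MLE
}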
Gibbs sampling

Trick: suppose we model
$$y_i \mid \lambda_i \sim \text{Poi}(\lambda_i), \quad \lambda_i \mid r, \mu_i \sim \text{Gamma}(r, \mu_i/(1 - \mu_i)) \text{ (shape-scale)}$$
This implies $E(y_i) = E(\lambda_i) = r \mu_i / (1 - \mu_i)$. Further,
$$p(y_i \mid r, \mu_i) = \int \text{Poi}(y_i; \lambda_i)\, p(\lambda_i \mid r, \mu_i)\, d\lambda_i = \text{NegBin}(y_i; r, \mu_i)$$

Let $\log E(y_i) = x_i^\top \beta$. Gibbs on the augmented parameter $(\lambda_1, \ldots, \lambda_n, \beta, r)$: writing $s_i = \mu_i/(1 - \mu_i)$ for the prior scale,
$$p(\lambda_i \mid y_i, r, \mu_i) \propto \text{Poi}(y_i; \lambda_i)\, G(\lambda_i; r, s_i) \propto G\left(\lambda_i;\; r + y_i,\; \frac{s_i}{1 + s_i}\right)$$

Sample from $p(\beta, r \mid \lambda_1, \ldots, \lambda_n, y)$ using Gamma regression methods
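A one-line sketch of the λ-update under the shape-scale parameterization above (names illustrative):

# s = mu/(1-mu) is the prior scale; posterior is Gamma(r + y, scale s/(1+s))
update_lambda <- function(y, r, s) rgamma(length(y), shape=r+y, scale=s/(1+s))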


Example: Twitter German elections 2017

(with R. Knudsen & S. Majó, Oxford Reuters Institute for Investigative Journalism)

Data: n = 11,503 tweets (21/08-25/09) by candidates/parties

Outcomes: number of favourites, number of retweets

Predictors
• Number of followers
• Party affiliation
• Party vs. candidate
• % of tweet talking about 40 topics (Supervised LDA, to be seen)
Outcomes vs. number of followers and party

[Figure: favourites and retweets vs. number of followers, by party (AfD, CDU, CSU, FDP, Gruene, Linke, SPD) and by account type (candidate vs. party)]

• Many 0's, strong asymmetry. Normal model inadequate
• BIC strongly prefers NegBin over Poisson
• Also use BIC to select variables
MLE results (SE). Topic 1 uses words related to crime, violence & migration

                          Favourites        Retweets
log-followers:candidate 0.745 (0.013) 0.648 (0.011)
log-followers:party 0.445 (0.013) 0.384 (0.011)
AfD - -
CDU −3.028 (0.107) −3.181 (0.094)
CSU −2.884 (0.193) −3.096 (0.173)
FDP −1.929 (0.115) −2.350 (0.100)
Gruene −2.490 (0.100) −2.541 (0.087)
Linke −1.996 (0.099) −1.961 (0.085)
SPD −1.903 (0.100) −2.172 (0.087)
topic1 −5.573 (0.450) −5.436 (0.396)
CDU:topic1 8.829 (2.464) 5.186 (2.124)
CSU:topic1 13.533 (3.369) 10.679 (2.902)
FDP:topic1 10.180 (2.321) 12.270 (2.021)
Gruene:topic1 1.092 (1.759) 3.037 (1.547)
Linke:topic1 −2.784 (1.185) 0.650 (0.933)
SPD:topic1 2.019 (1.408) 2.313 (1.208)

+ 10 other topics, candidate vs. party, candidate:party interaction


Outcomes vs. % of topic 1

[Figure: favourite and retweet counts vs. proportion of topic 1, by party and by account type (candidate vs. party)]
Models for zeroes

Zero-inflated Poisson. With prob $1 - \eta_i$ we observe $y_i = 0$, where
$$\text{logit}(\eta_i) = x_i^\top \mu$$
and with prob $\eta_i$ we observe $y_i \sim \text{Poi}(\mu_i)$, $\log(\mu_i) = x_i^\top \beta$

Hurdle models. With prob $1 - \eta_i$ we get $y_i = 0$, else $y_i \sim \text{Poi}(\mu_i) I(y_i \geq 1)$

Key distinction
• Zero-inflated: zeroes can be "structural" or from the non-zero process
$$P(y_i = 0) = 1 - \eta_i + \eta_i e^{-\mu_i}$$
• Hurdle: all zeroes are "structural"
$$P(y_i = 0) = 1 - \eta_i$$

Binomial/NegBin versions available
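A sketch of fitting both with pscl (formula syntax "count part | zero part"; mydata and the covariates are placeholders):

library(pscl)
zip <- zeroinfl(y ~ x1 + x2 | x1 + x2, data=mydata, dist='poisson')  # zero-inflated
hur <- hurdle(y ~ x1 + x2 | x1 + x2, data=mydata, dist='poisson')    # hurdle
c(AIC(zip), AIC(hur))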


Example: Vaccine adverse events

Models for number of adverse events vs. treatment group + covariates

                        # Param    BIC      AIC
Poisson                 13         14033    13951
Zero-inflated Poisson   26         13549    13386
Hurdle Poisson          26         13550    13386
NegBin                  14         13303    13215
Zero-infl NegBin        27         13332    13162
Hurdle NegBin           27         13340    13170

[Figure from Rose et al: observed vs. predicted frequencies]
Health care usage
(dataset NMES1988 from AER R package)
(example from http://data.library.virginia.edu/getting-started-with-hurdle-models)

n = 4,406 individuals aged ≥ 66 covered by Medicare


• Outcome: number of physician visits
• Predictors: gender, years of education, number of chronic conditions, hospital stays, private insurance, health (3 categories)
Poisson regression
MLE SE P-value
Intercept 1.029 0.024 <0.0001
hospital 0.165 0.006 <0.0001
health - poor 0.248 0.018 <0.0001
health - excellent -0.362 0.030 <0.0001
chronic 0.147 0.005 <0.0001
gender - male -0.112 0.013 <0.0001
school 0.026 0.002 <0.0001
insurance - yes 0.202 0.017 <0.0001
Histogram of number of visits

Poisson regression. Observed 0's: 683. Predicted 0's: 47

[Figure: histogram of the number of physician visits (0 to 89)]
Poisson hurdle model
Model for counts. SE same as before, but MLE differs
MLE SE P-value
Intercept 1.406 0.024 < 2e-16
hospital 0.159 0.006 < 2e-16
health - poor 0.254 0.018 < 2e-16
health - excellent -0.304 0.031 < 2e-16
chronic 0.102 0.005 < 2e-16
gender - male -0.062 0.013 1.86e-06
school 0.019 0.002 < 2e-16
insurance - yes 0.081 0.017 2.37e-06

Model for P(yi > 0 | xi ) (logistic regression)


MLE SE P-value
Intercept 0.043 0.140 0.7577
hospital 0.312 0.091 0.0006
health - poor -0.009 0.161 0.9568
health - excellent -0.290 0.143 0.0424
chronic 0.535 0.045 < 2e-16
gender - male -0.416 0.088 2.09e-06
school 0.059 0.012 1.05e-06
insurance - yes 0.747 0.101 1.30e-13
Assess model fit

Red line: counts predicted by model; bars: observed counts

[Figure: observed vs. predicted frequencies for the Poisson hurdle and NegBin hurdle models]


Compare inference

Model for counts


Hurdle-Poisson Hurdle-NB
MLE SE P-value MLE SE P-value
Intercept 1.406 0.024 <0.0001 1.198 0.059 <0.0001
hospital 0.159 0.006 <0.0001 0.212 0.021 <0.0001
health - poor 0.254 0.018 <0.0001 0.316 0.048 <0.0001
health - excellent -0.304 0.031 <0.0001 -0.332 0.066 <0.0001
chronic 0.102 0.005 <0.0001 0.126 0.012 <0.0001
gender - male -0.062 0.013 <0.0001 -0.068 0.032 0.0351
school 0.019 0.002 <0.0001 0.021 0.005 <0.0001
insurance - yes 0.081 0.017 <0.0001 0.100 0.043 0.0188

• Higher uncertainty in Hurdle-NB


• Same model for P(yi > 0 | xi ) (logistic regression, as before)
R code

library(AER)
data("NMES1988")
nmes= NMES1988[, c(1,6:8,13,15,18)]
plot(table(nmes$visits), ylab='Frequency', xlab='Number of visits')

mod1= glm(visits ~ ., data = nmes, family = "poisson")
summary(mod1)
mu= predict(mod1, type="response") # predicted mean count for each individual
exp= sum(dpois(x=0, lambda=mu))    # sum prob of a 0 count for each mean
round(exp)                         # predicted 0's
sum(nmes$visits < 1)               # observed 0's

#Poisson hurdle model
library(pscl)
library(countreg)
mod.hurdle= hurdle(visits ~ ., data=nmes, dist="poisson")
mod.hurdle.nb= hurdle(visits ~ ., data=nmes, dist="negbin")
summary(mod.hurdle)
summary(mod.hurdle.nb)

rootogram(mod.hurdle, max=80, xlab='Nb of visits', style='standing', scale='raw')
rootogram(mod.hurdle.nb, max=80, xlab='Nb of visits', style='standing', scale='raw')
R software for count data

Zeileis A, Kleiber C, Jackman S (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8). https://www.jstatsoft.org/article/view/v027i08

R package mpath: $L_1$, $L_2$, SCAD and MCP penalties for Poisson/NegBin and zero-inflated Poisson/NegBin
Outline

1 Time series

2 Factor regression

3 Generalized linear models

4 Beyond standard GLMs

5 Application to polarization and segregation


(Gentzkow, Shapiro, Taddy. Measuring polarization in high-dim data. NBER 2017)

Regression framework for high-dimensional count outcomes


• Political polarization: partisanship of USA congressional speech
1873-2016
• Residential segregation by political affiliation
Consider polarization. Parties employ different terms
• Democrats: estate taxes, undocumented workers, tax breaks for the
wealthy...
• Republicans: death taxes, illegal aliens, tax reform...
• Orlando nightclub killing of 49 people in 2016: “mass shooting” for
Democrats, “terrorism” for Republicans
Has partisanship evolved in time? Can we predict party from speech?
Data from US congress records 1873-2016 (digital text from HeinOnline)
• Select speeches by Democrats/Republicans (7,732 speakers), 36,161 unique speaker-sessions in total
• Count two-word phrases (bigrams): p = 508,351 phrases with count ≥ 10 in at least one session
• Predictors: party, state, chamber, gender

Methodology can affect the findings


• Jensen et al (2012): partisanship increased recently but was even
higher in the past. Analysis based on correlations
• Peterson & Spirling (2016): measure partisanship via
machine-learning. Suggests polarization even in fictitious data
• Gentzkow et al show that MLE may lead to misleading results
Note: in 1996-2014 speech polarization > voting polarization (Lauderdale
& Herzog 2016)
Model

• $y_{itj}$: phrase count for speaker i, session t, phrase j
• $m_{it} = \sum_j y_{itj}$: total speech of i at session t
• $z_i = 1$ if speaker i is Republican, $z_i = -1$ if Democrat
• $x_{it}$: speaker characteristics (possibly time-varying)

Let $y_{it} \sim \text{Mult}(m_{it}, q_t(z_i, x_{it}))$. The probability of uttering phrase j is
$$q_{tj}(z_i, x_{it}) = \frac{e^{u_{itj}}}{\sum_l e^{u_{itl}}} \quad \text{(multinomial logistic regression)}, \qquad u_{itj} = \alpha_{jt} + z_i \varphi_{jt} + x_{it}^\top \beta_{jt}$$

Methodological issues
• Sparse counts, many parameters: regularization
• Computational: multinomial likelihood hard, find a proxy
Measuring partisanship

Idea: if $q_t(1, x)$ is very different from $q_t(-1, x)$ ⇒ partisanship

Suppose a neutral observer hears a word; what probability would she assign to the speaker's true party?
• Prior P(Republican) = P(Democrat) = 0.5
• If the speaker is Republican, she chooses phrase j with prob $q_{tj}(1, x)$
• Given word j, the probability of the speaker being Republican is
$$\rho_{tj}(x) = \frac{q_{tj}(1, x)}{q_{tj}(1, x) + q_{tj}(-1, x)}$$

Define partisanship of speech at x as the expected $\rho_{tj}$:
$$\pi_t(x) = \frac{1}{2} q_t(1, x)^\top \rho_t(x) + \frac{1}{2} q_t(-1, x)^\top (1 - \rho_t(x))$$
Let $s_t$ be the number of speakers at session t. Average partisanship:
$$\bar{\pi}_t = \frac{1}{s_t} \sum_{i=1}^{s_t} \pi_t(x_{it})$$
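A minimal sketch computing the plug-in partisanship for one session from the two phrase-probability vectors (qR = $q_t(1,x)$, qD = $q_t(-1,x)$; names illustrative):

partisanship <- function(qR, qD) {
  rho <- qR / (qR + qD)  # posterior P(Republican | phrase j)
  0.5 * sum(qR * rho) + 0.5 * sum(qD * (1 - rho))
}
partisanship(qR=c(.5,.5), qD=c(.5,.5))  # identical speech gives 0.5 (no partisanship)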
MLE: $\hat{q}_t$ from multinomial-logistic regression, $\hat{\rho}_t$ trivial. Then
$$\hat{\pi}_t(x) = 0.5\, \hat{q}_t(1, x)^\top \hat{\rho}_t(x) + 0.5\, \hat{q}_t(-1, x)^\top (1 - \hat{\rho}_t(x))$$
• $\hat{\pi}_t$ is biased (Jensen's inequality), important for low counts
• $\hat{q}_t$ and $\hat{\rho}_t$ are correlated ⇒ inflates $\text{Var}(\hat{\pi}_t)$

Option 1: use different data for $\hat{q}_t$ and $\hat{\rho}_t$, then average
Option 2: regularization. The authors use the LASSO penalty
$$\sum_j \lambda_j |\varphi_{jt}| + 10^{-5} \lambda_j \left( |\alpha_{jt}| + \|\beta_{jt}\|_1 \right)$$
• $\lambda_j$ chosen to minimize BIC (also tried 5-fold CV)
• The $10^{-5}$ factor means $x_{it}$ is less penalized than party. Alternatives to $10^{-5}$ were also tried
Computation

The multinomial likelihood requires $q_{itj}(z_i, x_{it}) = e^{u_{itj}} / \sum_l e^{u_{itl}}$, $u_{itj} = \alpha_{jt} + z_i \varphi_{jt} + x_{it}^\top \beta_{jt}$. Instead, the authors use a Poisson approximation
$$y_{itj} \sim \text{Poisson}\left( e^{\alpha_{jt} + z_i \varphi_{jt} + x_{it}^\top \beta_{jt}} \right)$$
• Avoids computing sums over phrases, i.e. $\sum_l e^{u_{itl}}$
• Parallel computation of $u_{itj}$ across phrases j
Other tricks to speed up the LASSO path etc.

Remark. If $v_j \sim \text{Poi}(\mu_j)$ indep $j = 1, \ldots, J$, then
$$v_1, \ldots, v_J \,\Big|\, \sum_j v_j \sim \text{Mult}\left( \sum_j v_j,\; \frac{\mu}{\sum_j \mu_j} \right)$$
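A quick simulation check of the remark (mu is an arbitrary choice):

set.seed(1)
mu <- c(2, 5, 3)
v <- replicate(10^5, rpois(3, mu))
cond <- v[, colSums(v) == 10]  # condition on the total being 10
rowMeans(cond) / 10            # approx mu/sum(mu) = (0.2, 0.5, 0.3)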
MLE

Partisanship: original & "random" data (permuted party affiliation $z_i$)

[Figure 1, Panel A: average partisanship from the plug-in MLE, real vs. random series, 1870-2010. Bands are 90% intervals obtained by sub-sampling; the "random" series assigns each speaker's party at random with probability equal to the share of Republicans in the sessions in which the speaker is active]

[Figure 1, Panel B: standardized polarization from Jensen et al (2012), real vs. random series, 1870-2010]
Option 1. Obtain $\hat{\rho}$, $\hat{q}$ independently

For each individual i: $\hat{q}$ uses only data from all $j \neq i$, $\hat{\rho}$ from i

[Figure 2: average partisanship from the leave-out estimator, real vs. random series, 1870-2010]
Option 2. LASSO

[Figure 3, Panel A: average partisanship from the preferred penalized estimator, real vs. random series, 1870-2010. Same pattern as the MLE, but smaller variance]

[Figure 3, Panel B: post-1976 series with key events marked: C-SPAN, C-SPAN2, Contract with America]

In 1994 Republicans won. Newt Gingrich led the platform Contract with America, which used focus groups/polling to set rhetoric resonating with voters
[Figure 3, Panel B detail: average partisanship 1976-2016, with presidential terms (Ford, Carter, Reagan, Bush, Clinton, Bush, Obama) and key events (C-SPAN, C-SPAN2, Contract with America) marked]

Consultant Frank Luntz was involved in the 1994 campaign. When asked if "language can change a paradigm", he answered:

"I don't believe it – I know it. I've seen it with my own eyes... I watched in 1994 when the group of Republicans got together and said: 'We're going to do this completely differently than it's ever been done before...'" (Luntz 2004)
Partisanship using multiple words

Partisanship measures the predictive power of hearing 1 phrase. What if we used > 1 phrase?

[Figure 4: expected posterior probability that a neutral observer would place on a speaker's true party vs. number of phrases heard, for sessions 1873-1874, 1989-1990 and 2007-2008, with one minute of speech marked; computed for each speaker i and session t given characteristics $x_{it}$]
Partisanship by topics

Partisanship if we only used words from one (manually defined) topic

[Figure 7: average partisanship 1870-2010 by topic (alcohol, budget, business, crime, defense, economy, education, elections, environment, federalism, foreign, government, health, immigration, justice), each panel also showing topic frequency]
Residential segregation

Goal: relationship between where an individual buys a property and
• Party affiliation (from American National Election Studies 2015 & Pew Research Center 2016)
• Political contributions (from Federal Election Commission 2015)

Same model as before
• $y_{itj} = 1$ if individual i buys a property at location j at time t ($m_{it} = 1$ properties per individual)
• $z_i$: party affiliation / party donated to
• No covariates $x_{it}$
[Figure, Panel A: average residential partisanship by party identification (j indexing counties), 1955-2010; left: MLE, right: preferred penalized estimator, real vs. random series]

[Figure, Panel B: average residential partisanship by party campaign contributions (j indexing zipcodes), 1980-2015; left: MLE, right: penalized estimator without covariates, real vs. random series]

Notes: the "random" series assigns each respondent/contributor a party at random with probability equal to the share of respondents/contributors who are Republican in that year.