Extensions Beyond Linear Regression: Topics in Data Science
Outline
1 Time series
2 Factor regression
Consider the AR(1) regression model
$$y_t = \eta y_{t-1} + x_t'\beta + \epsilon_t$$
where $|\eta| < 1$ and $\epsilon_t \sim N(0, \phi)$ indep. $t = 1, \ldots, T$. Likelihood:
$$p(y \mid \eta, \beta, \phi, X) = \prod_{t=1}^{T} p(y_t \mid y_{t-1}, \eta, \beta, \phi, x_t) = N(y; \mu, \phi \Sigma)$$
Easy to minimize over $\beta$ for given $\Sigma$, but we also need to minimize over $\Sigma$
• Not too hard for AR(r) models ($\Sigma^{-1}$ has 0's after r lags)
• Function convex in $\beta$, not necessarily in $(\beta, \Sigma)$
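A minimal fitting sketch (my own, on simulated data, not from the slides): conditioning on $y_1$, the MLE of $(\eta, \beta)$ is ordinary least squares with the lagged outcome as an extra regressor.

set.seed(1)
T <- 200
x <- matrix(rnorm(2 * T), T, 2)           # two covariates
beta <- c(1, -0.5); eta <- 0.7; phi <- 1
y <- numeric(T); y[1] <- rnorm(1)
for (t in 2:T) y[t] <- eta * y[t - 1] + sum(x[t, ] * beta) + rnorm(1, sd = sqrt(phi))

# Conditional MLE given y_1: least squares on lagged y plus covariates
fit <- lm(y[-1] ~ y[-T] + x[-1, ])
coef(fit)                                 # intercept, eta-hat, beta-hat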
Bayesian inference
$$p(\gamma \mid y) \propto p(\gamma) \int N(y; X_\gamma \beta_\gamma, \phi \Sigma)\, p(\beta, \phi, \Sigma \mid \gamma)\, d\beta\, d\phi\, d\Sigma$$
where $\epsilon \sim N(0, \phi I)$
• Can infer r via variable selection on the $\eta$'s. Hierarchical restrictions?
• We lose the first r observations. OK if $r \ll T$.
Formally equivalent to
$$y = D_y \eta_y + X \beta + D_x \eta_x + \epsilon$$
where $D_y$ collects lags of $y$ and $D_x$ lags of $x$: an ordinary linear regression on an expanded design matrix.
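To make the equivalence concrete, a sketch (mine, reusing the simulated y and x above; for brevity it builds only the outcome-lag matrix $D_y$) using base R's embed():

# embed(y, r + 1): column 1 is y_t, columns 2..(r+1) are lags 1..r
r <- 5
Dy <- embed(y, r + 1)
fit_ar <- lm(Dy[, 1] ~ Dy[, -1] + x[(r + 1):T, ])
coef(fit_ar)          # eta_1, ..., eta_r, then beta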
[Figure: residuals over time under the iid model (+ 3 covariates), and residuals over time plus ACF by lag from the AR(5) model (+ 3 covariates).]
Top models

Model       p(γ | y)
e, rw       .461
e, rw, u    .337
e, u        .156
u           .037
rw, u       .009
...

Marginal inclusion probabilities

Variable    P(γj = 1 | y)
e           0.991
rw          0.807
u           0.502
AR(5) model

Let $y_t \in \mathbb{R}^q$, $x_t \in \mathbb{R}^p$. Consider a model with $\epsilon_t \sim N(0, \Sigma)$ indep. $t = 1, \ldots, T$.

[Diagram: graph relating R, I, Y, M, ∆P and C.]

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{pmatrix} =
\begin{pmatrix}
x_1' & 0 & 0 & \cdots & 0 \\
x_2' & x_2' & 0 & \cdots & 0 \\
x_3' & x_3' & x_3' & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
x_T' & x_T' & x_T' & \cdots & x_T'
\end{pmatrix}
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_K \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_T \end{pmatrix}$$

i.e., schematically, $y_t = x_t'(\theta_1 + \cdots + \theta_{k(t)}) + \epsilon_t$: coefficients that change over time, with the $\theta_k$'s acting as increments.
Factor regression
$$\underset{n\times q}{Y} = \underset{n\times p}{X}\,\underset{p\times q}{B} + \underset{n\times r}{Z}\,\underset{r\times q}{M} + \underset{n\times q}{E}$$
Example: Google search data. $x_{t,g}$ has p = 1776 de-seasonalized weekly changes for category/country
• 26 search categories, 296 subcategories
• Belgium, France, Germany, Italy, Netherlands, Spain
Google data reduces MSE mainly in weeks 1–3 (relative to no Google data)
Strategy 1: add $x_{t,g}$ with LASSO penalties

[Figure: out-of-sample MSE over weeks 1–12 for LASSO, PCA and SPCA.]
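A minimal sketch (my own, on simulated data) contrasting the two strategies: LASSO directly on the high-dimensional predictors via glmnet, versus regressing on a few principal-component factors.

library(glmnet)
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rep(0.5, 5)) + rnorm(n)

# Strategy 1: LASSO on all p predictors, lambda chosen by cross-validation
cvfit <- cv.glmnet(X, y)

# Strategy 2: extract r principal-component factors Z, then regress y on Z
r <- 4
Z <- prcomp(X, scale. = TRUE)$x[, 1:r]
fit_pca <- lm(y ~ Z)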
Factor loadings

[Figure: factor loadings for F1 and F2 under Sparse PCA (SPCA) vs. PCA, plotted against category by country (1–1800).]

Ideas on how to potentially improve this analysis?
Count data regression
Poisson regression: $\log \mu_i = x_i'\beta$. Penalized maximum likelihood solves
$$\min_{\beta, \phi}\; -\log p(y \mid \beta, \phi) + h(\beta) = -\sum_{i=1}^{n} \log p(y_i \mid \beta, \phi) + h(\beta)$$
For the Poisson model
$$\log p(y \mid \beta) = \log \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} = c + \sum_{i=1}^{n} \left( y_i x_i'\beta - e^{x_i'\beta} \right)$$
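A short sketch (mine, simulated data) of the two fits this objective covers: unpenalized MLE via glm, and $h(\beta) = \lambda \lVert\beta\rVert_1$ via glmnet.

library(glmnet)
set.seed(1)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(0.5, -0.5, 0.3, rep(0, p - 3))
y <- rpois(n, exp(drop(X %*% beta)))

fit_mle <- glm(y ~ X, family = poisson)            # h(beta) = 0
fit_l1  <- cv.glmnet(X, y, family = "poisson")     # h(beta) = lambda * ||beta||_1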
[Figure: histogram of the counts y, and LASSO coefficient paths against the L1 norm.]
Bayesian model averaging (BMA) and selection (BMS)

Model γ              p(γ | y)
1, 2, 3, 4, 5        0.721
1, 2, 3, 4, 5, 10    0.128
1, 2, 3, 4, 5, 12    0.018
...                  ...

[Figure: estimated coefficients β for variables 1–20 under BMA and BMS.]
Zero counts and overdispersion
Data may have more 0's than predicted by the model. Let $\mu_i = E(y_i \mid x_i)$
• Poisson: $P(y_i = 0 \mid x_i) = e^{-\mu_i}$
• Binomial: $P(y_i = 0 \mid x_i) = (1 - \mu_i/n_i)^{n_i}$
Common solutions
1 Models with an additional variance parameter, e.g. NegBinom
2 Zero-inflated and hurdle models
Why is it a problem in practice?
Dispersion. If $z_{ij} \sim \text{Bern}(\mu_i)$ iid then $y_i = \sum_{j=1}^{n_i} z_{ij} \sim \text{Bin}(n_i, \mu_i)$
• What if not indep. across j? e.g. measures from the same school, company, region... (see the simulation sketch below)
• What if $\mu_i$ is not constant across j? e.g. unobserved covariates
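A quick simulation (mine, not from the slides) of how a shared unobserved effect breaks the binomial variance: drawing one success probability per group correlates the $z_{ij}$ within the group and inflates $\text{Var}(y_i)$ beyond $n_i\mu_i(1-\mu_i)$.

set.seed(1)
n_i <- 20; mu <- 0.3; n_sim <- 1e5

# iid Bernoulli trials: variance matches n_i * mu * (1 - mu) = 4.2
y_bin <- rbinom(n_sim, size = n_i, prob = mu)

# one shared p per group, with E(p) = 0.3: same mean, inflated variance
p_shared <- rbeta(n_sim, 3, 7)
y_over <- rbinom(n_sim, size = n_i, prob = p_shared)

c(var(y_bin), var(y_over))    # approx 4.2 vs approx 11.5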
Zero counts
• Zeroes depend on the x's in a different way, e.g. never visiting a doctor, never applying for welfare
• Under-reporting (e.g. self-reported depression), lack of sensitivity (e.g. counting people in satellite images)
Vaccine adverse events
(Rose et al, J Biopharm Stat 2006, 16: 463-81)

Motivation
Goal: assess safety of Anthrax vaccine from a randomized clinical trial
• n = 1005 participants, 4 study injections (0, 2, 4, 24 weeks)
• 5 treatments: full-dose SQ or IM, reduced dose, placebo SQ or IM
• Outcome: number of adverse events after each injection

From the paper:

"[...] In addition we assessed the general fit of the models by computing Pearson's chi-square using the predicted counts. Lastly, we used the Akaike (AIC) and Bayesian (BIC) information criteria to compare models.

5. RESULTS

We computed the empirical mean and variance for our systemic adverse event endpoint. Total number of systemic adverse events was recorded after each of the four injections for the 1005 study participants, which results in 4020 observations. Study participants experienced from 0–12 unique systemic adverse events after an injection. The observed mean and variance using all 4020 observations are 1.51 and 2.90, respectively. Our observed variance to mean ratio is 1.92, which indicates some over-dispersion. Figure 1 compares the observed count distribution [...]"
Negative Binomial

Def. $y_i$ = number of successes in Bern($\mu_i$) trials until r failures
$\Rightarrow y_i \sim \text{NegBin}(r, \mu_i)$

$$E(y_i) = \frac{r \mu_i}{1 - \mu_i}; \qquad \text{Var}(y_i) = \frac{E(y_i)}{1 - \mu_i}$$

$$p(y_i \mid r, \mu_i) = \binom{y_i + r - 1}{y_i} (1 - \mu_i)^r \mu_i^{y_i}$$

Equivalently, a Gamma–Poisson mixture:
$$y_i \mid \lambda_i \sim \text{Poi}(\lambda_i), \qquad \lambda_i \mid r, \mu_i \sim \text{Gamma}(r, \mu_i/(1 - \mu_i)).$$
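A quick check (my own sketch) that R's rnbinom matches this parameterization and the Gamma–Poisson mixture; rgamma is called with scale = μ/(1−μ) so that E(λ) = rμ/(1−μ).

set.seed(1)
r <- 3; mu <- 0.4; n_sim <- 1e6

# Direct draws: with prob = 1 - mu, rnbinom counts successes before r failures
y1 <- rnbinom(n_sim, size = r, prob = 1 - mu)

# Gamma-Poisson mixture with the same mean r * mu / (1 - mu)
lambda <- rgamma(n_sim, shape = r, scale = mu / (1 - mu))
y2 <- rpois(n_sim, lambda)

c(mean(y1), mean(y2), r * mu / (1 - mu))       # all approx 2
c(var(y1), var(y2), r * mu / (1 - mu)^2)       # all approx 3.33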
Example: political tweets by German parties and candidates
(with R. Knudsen & S. Majó, Oxford Reuters Institute for Investigative Journalism)
Outcomes: number of favourites and retweets per tweet
Predictors
• Number of followers
• Party affiliation
• Party vs. candidate account
• % of the tweet talking about each of 40 topics (Supervised LDA, to be seen later)
[Figure: Outcomes vs. number of followers and party. Scatter plots of favourites and retweets against number of followers (log scales), split into candidate vs. party accounts and coloured by party: AfD, CDU, CSU, FDP, Gruene, Linke, SPD.]
MLE results (SE)

                          Favourites         Retweets
log-followers:candidate   0.745 (0.013)      0.648 (0.011)
log-followers:party       0.445 (0.013)      0.384 (0.011)
AfD                       -                  -
CDU                       −3.028 (0.107)     −3.181 (0.094)
CSU                       −2.884 (0.193)     −3.096 (0.173)
FDP                       −1.929 (0.115)     −2.350 (0.100)
Gruene                    −2.490 (0.100)     −2.541 (0.087)
Linke                     −1.996 (0.099)     −1.961 (0.085)
SPD                       −1.903 (0.100)     −2.172 (0.087)
topic1                    −5.573 (0.450)     −5.436 (0.396)
CDU:topic1                8.829 (2.464)      5.186 (2.124)
CSU:topic1                13.533 (3.369)     10.679 (2.902)
FDP:topic1                10.180 (2.321)     12.270 (2.021)
Gruene:topic1             1.092 (1.759)      3.037 (1.547)
Linke:topic1              −2.784 (1.185)     0.650 (0.933)
SPD:topic1                2.019 (1.408)      2.313 (1.208)

Topic 1 uses words related to crime, violence & migration.
[Figure: favourites and retweets vs. proportion of topic 1 in the tweet (log scales), by party and by candidate vs. party account, with fitted curves.]
Models for zeroes

Zero-inflated Poisson. With probability $1 - \eta_i$ we observe a structural zero $y_i = 0$, where
$$\text{logit}(\eta_i) = x_i'\gamma$$
Key distinction (see the simulation sketch below)
• Zero-inflated: zeroes can be "structural" or from the non-zero process
$$P(y_i = 0) = 1 - \eta_i + \eta_i e^{-\mu_i}$$
• Hurdle: all zeroes are "structural"
$$P(y_i = 0) = 1 - \eta_i$$
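A small simulation (mine) contrasting the two zero mechanisms; eta and mu are fixed scalars here rather than regression functions, to keep it short.

set.seed(1)
n <- 1e5; eta <- 0.7; mu <- 2

# Zero-inflated Poisson: structural zero w.p. 1 - eta, else a Poisson draw
z <- rbinom(n, 1, eta)
y_zip <- ifelse(z == 1, rpois(n, mu), 0)
mean(y_zip == 0)                      # approx 1 - eta + eta * exp(-mu)

# Hurdle Poisson: zero w.p. 1 - eta, else a zero-truncated Poisson draw
y_pos <- qpois(runif(n, dpois(0, mu), 1), mu)   # truncation via inverse CDF
y_hur <- ifelse(rbinom(n, 1, eta) == 1, y_pos, 0)
mean(y_hur == 0)                      # approx 1 - eta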
[Figure: histogram of the number of visits, 0–89.]
Poisson hurdle model

Model for counts. SEs roughly the same as before, but the MLE differs

                     MLE      SE      P-value
Intercept            1.406    0.024   < 2e-16
hospital             0.159    0.006   < 2e-16
health - poor        0.254    0.018   < 2e-16
health - excellent   −0.304   0.031   < 2e-16
chronic              0.102    0.005   < 2e-16
gender - male        −0.062   0.013   1.86e-06
school               0.019    0.002   < 2e-16
insurance - yes      0.081    0.017   2.37e-06
[Figure: rootograms of observed vs. fitted frequencies for the hurdle Poisson and hurdle negative binomial models, 0–80 visits.]
rootogram(mod.hurdle, max=80, xlab='Nb of visits', style='standing', scale='raw')
rootogram(mod.hurdle.nb, max=80, xlab='Nb of visits', style='standing', scale='raw')
R software for count data
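A hedged sketch of a typical workflow; glm.nb is from MASS, zeroinfl and hurdle from pscl, and rootogram() as called above is, to my knowledge, from the countreg package (R-Forge). The data frame `dat` and outcome `visits` are hypothetical names; the formula y ~ . | . specifies the count model and the zero model.

library(MASS)   # glm.nb: negative binomial regression
library(pscl)   # zeroinfl, hurdle

mod.pois      <- glm(visits ~ ., data = dat, family = poisson)
mod.nb        <- glm.nb(visits ~ ., data = dat)
mod.zip       <- zeroinfl(visits ~ . | ., data = dat, dist = "poisson")
mod.hurdle    <- hurdle(visits ~ . | ., data = dat, dist = "poisson")
mod.hurdle.nb <- hurdle(visits ~ . | ., data = dat, dist = "negbin")

AIC(mod.pois, mod.nb, mod.zip, mod.hurdle, mod.hurdle.nb)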
Multinomial models for text
Let $y_{it} \sim \text{Mult}(m_{it}, q_t(z_i, x_{it}))$. The probability of uttering phrase j is
$$q_{tj}(z_i, x_{it}) = \frac{e^{u_{itj}}}{\sum_l e^{u_{itl}}} \quad \text{(multinomial logistic regression)}$$
where $u_{itj} = \alpha_{jt} + z_i \varphi_{jt} + x_{it}'\beta_{jt}$

Methodological issues
• Sparse counts, many parameters: regularization
• Computational: multinomial likelihood hard, find a proxy
Measuring partisanship

Idea: if $q_t(1, x)$ is very different from $q_t(-1, x)$ ⇒ partisanship

Suppose a neutral observer hears a phrase; what probability would she assign to the speaker's true party?
• Prior P(Republican) = P(Democrat) = 0.5
• If the speaker is Republican, she chooses phrase j with probability $q_{tj}(1, x)$
• Given phrase j, the posterior probability that the speaker is Republican is
$$\rho_{tj}(x) = \frac{q_{tj}(1, x)}{q_{tj}(1, x) + q_{tj}(-1, x)}$$
Average partisanship is then the expected posterior
$$\hat\pi_t(x) = 0.5\, \hat q_t(1, x)'\hat\rho_t(x) + 0.5\, \hat q_t(-1, x)'(1 - \hat\rho_t(x))$$
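A tiny numerical sketch (mine) of these formulas: given phrase-probability vectors qR and qD for the two parties, compute the phrase-level posteriors and the average partisanship; 0.5 means phrases carry no party signal.

avg_partisanship <- function(qR, qD) {
  rho <- qR / (qR + qD)                 # posterior P(Republican | phrase j)
  0.5 * sum(qR * rho) + 0.5 * sum(qD * (1 - rho))
}

qR <- c(0.5, 0.3, 0.2); qD <- c(0.5, 0.3, 0.2)
avg_partisanship(qR, qD)                # identical speech: exactly 0.5

qR <- c(0.8, 0.1, 0.1); qD <- c(0.1, 0.1, 0.8)
avg_partisanship(qR, qD)                # divergent speech: approx 0.77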
Option 1: use different data for $\hat q_t$ and $\hat\rho_t$, then average

Option 2: regularization. The authors use a LASSO penalty
$$\sum_j \lambda_j |\varphi_{jt}| + 10^{-5}\lambda_j\left(|\alpha_{jt}| + \lVert\beta_{jt}\rVert_1\right)$$
• Avoids computing sums over phrases, i.e. $\sum_l e^{u_{itl}}$
• Parallel computation of $u_{itj}$ across phrases j
Other tricks to speed up the LASSO path, etc.
Remark. If $v_j \sim \text{Poi}(\mu_j)$ indep. $j = 1, \ldots, J$, then
$$v_1, \ldots, v_J \,\Big|\, \sum_j v_j \;\sim\; \text{Mult}\left(\sum_j v_j,\; \frac{1}{\sum_j \mu_j}\,\mu\right)$$
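This is why independent Poisson regressions, which parallelize across phrases, can proxy the multinomial likelihood. A quick simulation check (mine):

set.seed(1)
mu <- c(2, 5, 3)
v <- matrix(rpois(3 * 1e5, mu), nrow = 3)   # each column: independent Poissons

# Condition on the total being 10; compare with multinomial probabilities
keep <- colSums(v) == 10
rowMeans(v[, keep]) / 10                    # approx mu / sum(mu) = 0.2, 0.5, 0.3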
MLE

Partisanship: original & "random" data (permuted party affiliation $z_i$)

[Figure 1: Average Partisanship and Polarization of Speech, Plug-in Estimates. Panel A: average partisanship from the maximum likelihood estimator $\hat{p}_t^{MLE}$, real vs. random series; Panel B: standardized polarization.]

Notes: Panel A plots the average partisanship series from the maximum likelihood estimator $\hat{p}_t^{MLE}$ defined in Section 4.1. "Real" series is from actual data; "random" series is from hypothetical data in which each speaker's party is randomly assigned with the probability that the speaker is Republican equal to the average share of speakers who are Republican in the sessions in which the speaker is active. The shaded region around each series represents a pointwise confidence interval obtained by subsampling (Politis et al. 1999). [...]
Option 1. Obtain $\hat\rho$, $\hat q$ independently: for each individual i, $\hat q$ uses only data from all $j \neq i$, and $\hat\rho$ uses the data from i.

[Figure 2: Average Partisanship of Speech, Leave-out Estimates ($\hat{p}_t^{LO}$); real vs. random series, 1870–2010.]

Notes: Figure shows the average partisanship series from the leave-out estimator $\hat{p}_t^{LO}$ defined in Section 4.1. "Real" series is from actual data; "random" series is from hypothetical data in which each speaker's party is randomly assigned with the probability that the speaker is Republican equal to the average share of [...]
Option 2. LASSO

[Figure 3: Average Partisanship of Speech, Penalized Estimates; full series and a zoomed panel from 1976 to 2016.]

Notes: Panel A shows the results from our preferred penalized estimator defined in Section 4.2. "Real" series is from actual data; "random" series is from hypothetical data in which each speaker's party is randomly assigned with the probability that the speaker is Republican equal to the average share of speakers who are Republican in the sessions in [...]
Consultant Frank Luntz was involved in the 1994 campaign. When asked if "language can change a paradigm", he answered:

"I don't believe it – I know it. I've seen it with my own eyes... I watched in 1994 when the group of Republicans got together and said: 'We're going to do this completely differently than it's ever been done before...'" (Luntz 2004)
Partisanship using multiple words

Partisanship measures the predictive power of using one phrase. What if we used more than one phrase?

[Figure 4: Informativeness of Speech by Speech Length and Session. Expected posterior (0.5–1.0) vs. number of phrases (0–100) for sessions 1873–1874, 1989–1990 and 2007–2008, with "one minute of speech" marked.]

Notes: For each speaker i and session t we calculate, given characteristics $x_{it}$, the expected posterior that an observer with a neutral prior would place on a speaker's true party after hearing a given number of [...]
Partisanship by topics

Partisanship if we only used words from one (manually defined) topic

[Figure 7: Partisanship by Topic. Average partisanship series, 1870–2010, for topics including alcohol, budget and business.]
Residential segregation

[Figure: average partisanship estimated by MLE (left) and by the preferred penalized (LASSO) specification (right), real vs. random series; Panel A uses data with j indexing counties, Panel B uses party campaign contribution data with j indexing zipcodes, roughly 1955–2015.]

Notes: [...] data with j indexing counties. Plots in Panel B show the average partisanship from party contribution data with j indexing zipcodes. For each panel, the left plot shows the maximum likelihood estimator defined in 4.1; the right plot shows the penalized estimator without covariates [...]. "Real" series is from actual data; "random" series is from hypothetical data in which each respondent/contributor is assigned a party with the probability that the respondent/contributor is Republican [...]