Evaluating neural network explanation methods using hybrid documents and morphosyntactic agreement

Nina Poerner, Benjamin Roth & Hinrich Schütze

Center for Information and Language Processing
LMU Munich, Germany
[email protected]

Abstract

The behavior of deep neural networks (DNNs) is hard to understand. This makes it necessary to explore post hoc explanation methods. We conduct the first comprehensive evaluation of explanation methods for NLP. To this end, we design two novel evaluation paradigms that cover two important classes of NLP problems: small context and large context problems. Both paradigms require no manual annotation and are therefore broadly applicable. We also introduce LIMSSE, an explanation method inspired by LIME that is designed for NLP. We show empirically that LIMSSE, LRP and DeepLIFT are the most effective explanation methods and recommend them for explaining DNNs in NLP.

1 Introduction

DNNs are complex models that combine linear transformations with different types of nonlinearities. If the model is deep, i.e., has many layers, then its behavior during training and inference is notoriously hard to understand.

This is a problem for both scientific methodology and real-world deployment. Scientific methodology demands that we understand our models. In the real world, a decision (e.g., "your blog post is offensive and has been removed") by itself is often insufficient; in addition, an explanation of the decision may be required (e.g., "our system flagged the following words as offensive"). The European Union plans to mandate that intelligent systems used for sensitive applications provide such explanations (European General Data Protection Regulation, expected 2018, cf. Goodman and Flaxman (2016)).

A number of post hoc explanation methods for DNNs have been proposed. Due to the complexity of the DNNs they explain, these methods are necessarily approximations and come with their own sources of error. At this point, it is not clear which of these methods to use when reliable explanations for a specific DNN architecture are needed.

Definitions. (i) A task method solves an NLP problem, e.g., a GRU that predicts sentiment. (ii) An explanation method explains the behavior of a task method on a specific input. For our purpose, it is a function φ(t, k, X) that assigns real-valued relevance scores for a target class k (e.g., positive) to positions t in an input text X (e.g., "great food"). For this example, an explanation method might assign: φ(1, k, X) > φ(2, k, X). (iii) An (explanation) evaluation paradigm quantitatively evaluates explanation methods for a task method, e.g., by assigning them accuracies.

Contributions. (i) We present novel evaluation paradigms for explanation methods for two classes of common NLP tasks (see §2). Crucially, neither paradigm requires manual annotations and our methodology is therefore broadly applicable. (ii) Using these paradigms, we perform a comprehensive evaluation of explanation methods for NLP (§3). We cover the most important classes of task methods, RNNs and CNNs, as well as the recently proposed Quasi-RNNs. (iii) We introduce LIMSSE (§3.6), an explanation method inspired by LIME (Ribeiro et al., 2016) that is designed for word-order sensitive task methods (e.g., RNNs, CNNs). We show empirically that LIMSSE, LRP (Bach et al., 2015) and DeepLIFT (Shrikumar et al., 2017) are the most effective explanation methods (§4): LRP and DeepLIFT are the most consistent methods, while LIMSSE wins the hybrid document experiment.

    tasks                  sentiment analysis, morphological prediction, ...
    task methods           CNN, GRU, LSTM, ...
    explanation methods    LIMSSE, LRP, DeepLIFT, ...
    evaluation paradigms   hybrid document, morphosyntactic agreement

Table 1: Terminology with examples.
Figure 1: Top: sci.electronics post (not hybrid). Underlined: Manual relevance ground truth. Green: evidence for sci.electronics. Task method: CNN. Bottom: hybrid newsgroup post, classified talk.politics.mideast. Green: evidence for talk.politics.mideast. Underlined: talk.politics.mideast fragment. Task method: QGRU. Italics: OOV. Bold: rmax position. See supplementary for full texts. (Methods shown: lrp on the top post; gradL2_1p and limssems_s on the bottom post.)

2 Evaluation paradigms

In this section, we introduce two novel evaluation paradigms for explanation methods on two types of common NLP tasks, small context tasks and large context tasks. Small context tasks are defined as those that can be solved by finding short, self-contained indicators, such as words and phrases, and weighing them up (i.e., tasks where CNNs with pooling can be expected to perform well). We design the hybrid document paradigm for evaluating explanation methods on small context tasks. Large context tasks require the correct handling of long-distance dependencies, such as subject-verb agreement.[1] We design the morphosyntactic agreement paradigm for evaluating explanation methods on large context tasks.

[1] Consider deciding the number of [verb] in "the children in the green house said that the big telescope [verb]" vs. "the children in the green house who broke the big telescope [verb]". The local contexts of "children" or "[verb]" do not suffice to solve this problem; instead, the large context of the entire sentence has to be considered.

We could also use human judgments for evaluation. While we use Mohseni and Ragan (2018)'s manual relevance benchmark for comparison, there are two issues with it: (i) Due to the cost of human labor, it is limited in size and domain. (ii) More importantly, a good explanation method should not reflect what humans attend to, but what task methods attend to. For instance, the family name "Kolstad" has 11 out of its 13 appearances in the 20 newsgroups corpus in sci.electronics posts. Thus, task methods probably learn it as a sci.electronics indicator. Indeed, the explanation method in Fig 1 (top) marks "Kolstad" as relevant, but the human annotator does not.

2.1 Small context: Hybrid document paradigm

Given a collection of documents, hybrid documents are created by randomly concatenating document fragments. We assume that, on average, the most relevant input for a class k in a hybrid document is located in a fragment that stems from a document with gold label k. Hence, an explanation method succeeds if it places maximal relevance for k inside the correct fragment.

Formally, let x_t be a word inside hybrid document X that originates from a document X_0 with gold label y(X_0). x_t's gold label y(X, t) is set to y(X_0). Let f(X) be the class assigned to the hybrid document by a task method, and let φ be an explanation method as defined above. Let rmax(X, φ) denote the position of the maximally relevant word in X for the predicted class f(X). If this maximally relevant word comes from a document with the correct gold label, the explanation method is awarded a hit:

$$\mathrm{hit}(\phi, X) = \mathbb{I}\big[y\big(X, \mathrm{rmax}(X, \phi)\big) = f(X)\big] \quad (1)$$

where I[P] is 1 if P is true and 0 otherwise. In Fig 1 (bottom), the explanation method gradL2_1p places rmax outside the correct (underlined) fragment. Therefore, it does not get a hit point, while limssems_s does.

The pointing game accuracy of an explanation method is calculated as its total number of hit points divided by the number of possible hit points. This is a form of the pointing game paradigm from computer vision (Zhang et al., 2016).
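To make Eq 1 concrete, the following is a minimal sketch of the hit criterion and the pointing game accuracy, assuming per-word relevance scores and per-word gold labels have already been computed for a hybrid document; function and variable names are illustrative, not part of the released code.

```python
import numpy as np

def hit(relevance, word_gold_labels, predicted_class):
    """Eq 1: 1 if the maximally relevant word originates from a
    fragment whose gold label equals the predicted class, else 0."""
    rmax = int(np.argmax(relevance))          # position of the most relevant word
    return int(word_gold_labels[rmax] == predicted_class)

def pointing_game_accuracy(explanations):
    """explanations: list of (relevance scores, per-word gold labels, f(X)) tuples,
    one per hybrid document that was not discarded."""
    hits = [hit(r, y, f) for r, y, f in explanations]
    return sum(hits) / len(hits)              # hit points / possible hit points

# toy usage: three words from a label-0 document, two from a label-1 document
relevance = np.array([0.1, 0.7, 0.2, 0.05, 0.3])
labels = np.array([0, 0, 0, 1, 1])
print(hit(relevance, labels, predicted_class=0))  # 1: rmax falls in the label-0 fragment
```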
2.2 Large context: Morphosyntactic agreement paradigm

Many natural languages display morphosyntactic agreement between words v and w. A DNN that predicts the agreeing feature in w should pay attention to v. For example, in the sentence "the children with the telescope are home", the number of the verb (plural for "are") can be predicted from the subject ("children") without looking at the verb. If the language allows for v and w to be far apart (Fig 3, top), successful task methods have to be able to handle large contexts.

Linzen et al. (2016) show that English verb number can be predicted by a unidirectional LSTM with accuracy > 99%, based on left context alone. When a task method predicts the correct number, we expect successful explanation methods to place maximal relevance on the subject:

$$\mathrm{hit}_{\mathrm{target}}(\phi, X) = \mathbb{I}\big[\mathrm{rmax}(X, \phi) = \mathrm{target}(X)\big]$$

where target(X) is the location of the subject, and rmax is calculated as above. Regardless of whether the prediction is correct, we expect rmax to fall onto a noun that has the predicted number:

$$\mathrm{hit}_{\mathrm{feat}}(\phi, X) = \mathbb{I}\big[\mathrm{feat}\big(X, \mathrm{rmax}(X, \phi)\big) = f(X)\big]$$

where feat(X, t) is the morphological feature (here: number) of x_t. In Fig 2, rmax on "link" gives a hit_target point (and a hit_feat point), rmax on "editor" gives a hit_feat point. gradL2_Rs does not get any points as "history" is not a plural noun.

Figure 2: Top: verb context classified singular. Green: evidence for singular. Task method: GRU. Bottom: verb context classified plural. Green: evidence for plural. Task method: LSTM. Underlined: subject. Bold: rmax position. (Methods shown: graddot_Rs, lrp and limssebb on "the link provided by the editor above [encourages ...]"; gradL2_Rs, occ1 and limssems_s on "few if any events in history [are ...]".)

Labels for this task can be automatically generated using part-of-speech taggers and parsers, which are available for many languages.
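As with the hybrid document paradigm, the two hit criteria are easy to compute once relevance scores and the automatic annotation are available. The following is a minimal sketch, assuming per-token relevance scores, the subject position, a per-token number feature and the predicted number are given; variable names are illustrative.

```python
import numpy as np

def agreement_hits(relevance, subject_pos, number_feats, predicted_number):
    """Scoring for the morphosyntactic agreement paradigm.

    relevance        : relevance score per token in the verb's left context
    subject_pos      : target(X), index of the subject
    number_feats     : feat(X, t) per token, e.g. "Sg", "Pl" or None
    predicted_number : f(X), the number predicted by the task method
    Returns (hit_target, hit_feat)."""
    rmax = int(np.argmax(relevance))
    hit_target = int(rmax == subject_pos)
    hit_feat = int(number_feats[rmax] == predicted_number)
    return hit_target, hit_feat

# toy usage: "the children with the telescope [are ...]"
rel = np.array([0.0, 0.9, 0.1, 0.0, 0.4])
feats = [None, "Pl", None, None, "Sg"]
print(agreement_hits(rel, subject_pos=1, number_feats=feats, predicted_number="Pl"))  # (1, 1)
```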
3 Explanation methods

In this section, we define the explanation methods that will be evaluated. For our purpose, explanation methods produce word relevance scores φ(t, k, X), which are specific to a given class k and a given input X. φ(t, k, X) > φ(t', k, X) means that x_t contributed more than x_{t'} to the task method's (potential) decision to classify X as k.

3.1 Gradient-based explanation methods

Gradient-based explanation methods approximate the contribution of some DNN input i to some output o with o's gradient with respect to i (Simonyan et al., 2014). In the following, we consider two output functions o(k, X), the unnormalized class score s(k, X) and the class probability p(k|X):

$$s(k, X) = \vec{w}_k \cdot \vec{h}(X) + b_k \quad (2)$$

$$p(k|X) = \frac{\exp\big(s(k, X)\big)}{\sum_{k'=1}^{K} \exp\big(s(k', X)\big)} \quad (3)$$

where k is the target class, $\vec{h}(X)$ the document representation (e.g., an RNN's final hidden layer), $\vec{w}_k$ (resp. $b_k$) k's weight vector (resp. bias).

The simple gradient of o(k, X) w.r.t. i is:

$$\mathrm{grad}_1(i, k, X) = \frac{\partial o(k, X)}{\partial i} \quad (4)$$

grad_1 underestimates the importance of inputs that saturate a nonlinearity (Shrikumar et al., 2017). To address this, Sundararajan et al. (2017) integrate over all gradients on a linear interpolation α ∈ [0, 1] between a baseline input X̄ (here: all-zero embeddings) and X:

$$\mathrm{grad}_R(i, k, X) = \int_{\alpha=0}^{1} \frac{\partial o\big(k, \bar{X} + \alpha(X - \bar{X})\big)}{\partial i}\, \partial\alpha \;\approx\; \frac{1}{M} \sum_{m=1}^{M} \frac{\partial o\big(k, \bar{X} + \frac{m}{M}(X - \bar{X})\big)}{\partial i} \quad (5)$$

where M is a big enough constant (here: 50).

In NLP, symbolic inputs (e.g., words) are often represented as one-hot vectors $\vec{x}_t \in \{1, 0\}^{|V|}$ and embedded via a real-valued matrix: $\vec{e}_t = \mathbf{M}\vec{x}_t$. Gradients are computed with respect to individual entries of $E = [\vec{e}_1 \ldots \vec{e}_{|X|}]$. Bansal et al. (2016) and Hechtlinger (2016) use the L2 norm to reduce vectors of gradients to single values:

$$\phi_{\mathrm{gradL2}}(t, k, X) = ||\mathrm{grad}(\vec{e}_t, k, E)|| \quad (6)$$

where $\mathrm{grad}(\vec{e}_t, k, E)$ is a vector of elementwise gradients w.r.t. $\vec{e}_t$. Denil et al. (2015) use the dot product of the gradient vector and the embedding[2], i.e., the gradient of the "hot" entry in $\vec{x}_t$:

$$\phi_{\mathrm{graddot}}(t, k, X) = \vec{e}_t \cdot \mathrm{grad}(\vec{e}_t, k, E) \quad (7)$$

We use "grad_1" for Eq 4, "grad_R" for Eq 5, "p" for Eq 3, "s" for Eq 2, "L2" for Eq 6 and "dot" for Eq 7. This gives us eight explanation methods: gradL2_1s, gradL2_1p, graddot_1s, graddot_1p, gradL2_Rs, gradL2_Rp, graddot_Rs, graddot_Rp.

[2] For graddot_R, replace $\vec{e}_t$ with $\vec{e}_t - \bar{\vec{e}}_t$. Since our baseline embeddings are all-zeros, this is equivalent.
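As an illustration of Eqs 4–7, the following sketch computes graddot and gradL2 relevances, including the Riemann approximation of Eq 5, with TensorFlow. It assumes a callable `score_fn` that maps an embedded document of shape [1, T, d] to unnormalized class scores; this helper and its name are illustrative and not part of the paper's released code.

```python
import tensorflow as tf

def gradient_relevances(score_fn, embeddings, k, integration_steps=1):
    """embeddings: tf.Tensor of shape [1, T, d] (the rows e_1 ... e_|X|).
    integration_steps=1 gives the simple gradient (Eq 4); larger values give
    the Riemann approximation of integrated gradients (Eq 5), with an all-zero baseline."""
    grads = tf.zeros_like(embeddings)
    for m in range(1, integration_steps + 1):
        scaled = embeddings * (m / integration_steps)   # X_bar + (m/M)(X - X_bar), X_bar = 0
        with tf.GradientTape() as tape:
            tape.watch(scaled)
            output = score_fn(scaled)[0, k]             # o(k, X), here the class score s(k, X)
        grads += tape.gradient(output, scaled)
    grads /= integration_steps
    grad_dot = tf.reduce_sum(grads * embeddings, axis=-1)  # Eq 7: e_t . grad(e_t, k, E)
    grad_l2 = tf.norm(grads, axis=-1)                       # Eq 6: ||grad(e_t, k, E)||
    return grad_dot[0], grad_l2[0]                          # one relevance per position t
```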
3.2 Layer-wise relevance propagation

Layer-wise relevance propagation (LRP) is a backpropagation-based explanation method developed for fully connected neural networks and CNNs (Bach et al., 2015) and later extended to LSTMs (Arras et al., 2017b). In this paper, we use Epsilon LRP (Eq 58, Bach et al. (2015)). Remember that the activation of neuron j, a_j, is the sum of weighted upstream activations, Σ_i a_i w_{i,j}, plus bias b_j, squeezed through some nonlinearity. We denote the pre-nonlinearity activation of j as a'_j. The relevance of j, R(j), is distributed to upstream neurons i proportionally to the contribution that i makes to a'_j in the forward pass:

$$R(i) = \sum_j R(j)\, \frac{a_i w_{i,j}}{a'_j + \epsilon\mathrm{sign}(a'_j)} \quad (8)$$

This ensures that relevance is conserved between layers, with the exception of relevance attributed to b_j. To prevent numerical instabilities, ϵsign(a') returns −ϵ if a' < 0 and ϵ otherwise. We set ϵ = .001. The full algorithm is:

$$R(L_{k'}) = s(k, X)\,\mathbb{I}[k' = k]$$
... recursive application of Eq 8 ...
$$\phi_{\mathrm{lrp}}(t, k, X) = \sum_{j=1}^{\dim(\vec{e}_t)} R(e_{t,j})$$

where L is the final layer, k the target class and R(e_{t,j}) the relevance of dimension j in the t'th embedding vector. For ϵ → 0 and provided that all nonlinearities up to the unnormalized class score are relu, Epsilon LRP is equivalent to the product of input and raw score gradient (here: graddot_1s) (Kindermans et al., 2016). In our experiments, the second requirement holds only for CNNs.

Experiments by Ancona et al. (2017) (see §6) suggest that LRP does not work well for LSTMs if all neurons – including gates – participate in backpropagation. We therefore use Arras et al. (2017b)'s modification and treat sigmoid-activated gates as time step-specific weights rather than neurons. For instance, the relevance of LSTM candidate vector $\vec{g}_t$ is calculated from memory vector $\vec{c}_t$ and input gate vector $\vec{i}_t$ as

$$R(g_{t,d}) = R(c_{t,d})\, \frac{g_{t,d} \cdot i_{t,d}}{c_{t,d} + \epsilon\mathrm{sign}(c_{t,d})}$$

This is equivalent to applying Eq 8 while treating $\vec{i}_t$ as a diagonal weight matrix. The gate neurons in $\vec{i}_t$ do not receive any relevance themselves. See supplementary material for formal definitions of Epsilon LRP for different architectures.

3.3 DeepLIFT

DeepLIFT (Shrikumar et al., 2017) is another backpropagation-based explanation method. Unlike LRP, it does not explain s(k, X), but s(k, X) − s(k, X̄), where X̄ is some baseline input (here: all-zero embeddings). Following Ancona et al. (2018) (Eq 4), we use this backpropagation rule:

$$R(i) = \sum_j R(j)\, \frac{a_i w_{i,j} - \bar{a}_i w_{i,j}}{a'_j - \bar{a}'_j + \epsilon\mathrm{sign}(a'_j - \bar{a}'_j)}$$

where ā refers to the forward pass of the baseline. Note that the original method has a different mechanism for avoiding small denominators; we use ϵsign for compatibility with LRP. The DeepLIFT algorithm is started with R(L_{k'}) = (s(k, X) − s(k, X̄)) I[k' = k]. On gated (Q)RNNs, we proceed analogous to LRP and treat gates as weights.

3.4 Cell decomposition for gated RNNs

The cell decomposition explanation method for LSTMs (Murdoch and Szlam, 2017) decomposes the unnormalized class score s(k, X) (Eq 2) into additive contributions. For every time step t, we compute how much of $\vec{c}_t$ "survives" until the final step T and contributes to s(k, X). This is achieved by applying all future forget gates $\vec{f}$, the final tanh nonlinearity, the final output gate $\vec{o}_T$, as well as the class weights of k to $\vec{c}_t$. We call this quantity "net load of t for class k":

$$\mathrm{nl}(t, k, X) = \vec{w}_k \cdot \Big( \vec{o}_T \odot \tanh\big( (\prod_{j=t+1}^{T} \vec{f}_j) \odot \vec{c}_t \big) \Big)$$

where $\prod$ and $\odot$ are applied elementwise. The relevance of t is its gain in net load relative to t − 1: φ_decomp(t, k, X) = nl(t, k, X) − nl(t − 1, k, X). For GRU, we change the definition of net load:

$$\mathrm{nl}(t, k, X) = \vec{w}_k \cdot \Big( (\prod_{j=t+1}^{T} \vec{z}_j) \odot \vec{h}_t \Big)$$

where $\vec{z}$ are GRU update gates.
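To make the shared backpropagation rule of §3.2 and §3.3 concrete, here is a minimal numpy sketch of Eq 8 for one fully connected layer; passing baseline activations yields the DeepLIFT variant of §3.3. This is an illustrative re-implementation under those assumptions, not the code released with the paper.

```python
import numpy as np

def eps_sign(x, eps=1e-3):
    """epsilon * sign(x), with sign(0) treated as positive."""
    return np.where(x < 0, -eps, eps)

def backprop_relevance(R_out, a_in, W, a_pre, a_in_bar=None, a_pre_bar=None):
    """Epsilon LRP (Eq 8) through one fully connected layer.

    R_out : relevance of the layer's output neurons, shape [n_out]
    a_in  : upstream activations a_i, shape [n_in]
    W     : weight matrix w_{i,j}, shape [n_in, n_out]
    a_pre : pre-nonlinearity activations a'_j, shape [n_out]
    Passing baseline activations a_in_bar / a_pre_bar applies the DeepLIFT rule
    of §3.3 (differences to the baseline forward pass) instead."""
    if a_in_bar is None:
        a_in_bar = np.zeros_like(a_in)
        a_pre_bar = np.zeros_like(a_pre)
    denom = (a_pre - a_pre_bar) + eps_sign(a_pre - a_pre_bar)
    contrib = (a_in - a_in_bar)[:, None] * W      # (a_i - a_bar_i) * w_{i,j}
    return (contrib / denom[None, :]) @ R_out     # sum over downstream neurons j
```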
3.5 Input perturbation methods

Input perturbation methods assume that the removal or masking of relevant inputs changes the output (Zeiler and Fergus, 2014). Omission-based methods remove inputs completely (Kádár et al., 2017), while occlusion-based methods replace them with a baseline (Li et al., 2016b). In computer vision, perturbations are usually applied to patches, as neighboring pixels tend to correlate (Zintgraf et al., 2017). To calculate the omitN (resp. occN) relevance of word x_t, we delete (resp. occlude), one at a time, all N-grams that contain x_t, and average the change in the unnormalized class score from Eq 2:

$$\phi_{[\mathrm{omit}|\mathrm{occ}]N}(t, k, X) = \frac{1}{N} \sum_{j=1}^{N} \Big( s(k, [\vec{e}_1 \ldots \vec{e}_{|X|}]) - s(k, [\vec{e}_1 \ldots \vec{e}_{t-N-1+j}] \Vert \bar{E} \Vert [\vec{e}_{t+j} \ldots \vec{e}_{|X|}]) \Big)$$

where $\vec{e}_t$ are embedding vectors, ∥ denotes concatenation and $\bar{E}$ is either a sequence of length zero (φ_omit) or a sequence of N baseline (here: all-zero) embedding vectors (φ_occ).
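A minimal sketch of omitN / occN under these definitions follows, assuming a callable `score` that maps an embedding matrix of shape [T', d] to the class score s(k, ·); the helper and its names are illustrative.

```python
import numpy as np

def perturbation_relevance(score, E, t, N, mode="occ"):
    """omitN / occN relevance of word t (0-indexed) with an all-zero baseline.

    score : callable mapping an embedding matrix [T', d] to the class score s(k, .)
    E     : embedding matrix [T, d]
    Deletes (omit) or zeroes out (occ) every N-gram containing position t and
    averages the resulting drop in the class score (Eq above)."""
    full = score(E)
    deltas = []
    for j in range(N):
        start = max(t - N + 1 + j, 0)            # window chosen so that position t is covered
        end = min(t + 1 + j, len(E))
        if mode == "omit":
            perturbed = np.concatenate([E[:start], E[end:]], axis=0)
        else:                                    # "occ": replace with all-zero baseline vectors
            perturbed = E.copy()
            perturbed[start:end] = 0.0
        deltas.append(full - score(perturbed))
    return float(np.mean(deltas))
```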
3.6 LIMSSE: LIME for NLP

Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016) is a framework for explaining predictions of complex classifiers. LIME approximates the behavior of classifier f in the neighborhood of input X with an interpretable (here: linear) model. The interpretable model is trained on samples Z_1 ... Z_N (here: N = 3000), which are randomly drawn from X, with "gold labels" f(Z_1) ... f(Z_N).

Since RNNs and CNNs respect word order, we cannot use the bag of words sampling method from the original description of LIME. Instead, we introduce Local Interpretable Model-agnostic Substring-based Explanations (LIMSSE). LIMSSE uniformly samples a length l_n (here: 1 ≤ l_n ≤ 6) and a starting point s_n, which define the substring $Z_n = [\vec{x}_{s_n} \ldots \vec{x}_{s_n + l_n - 1}]$. To the linear model, Z_n is represented by a binary vector $\vec{z}_n \in \{0, 1\}^{|X|}$, where $z_{n,t} = \mathbb{I}[s_n \le t < s_n + l_n]$.

We learn a linear weight vector $\hat{\vec{v}}_k \in \mathbb{R}^{|X|}$, whose entries are word relevances for k, i.e., $\phi_{\mathrm{limsse}}(t, k, X) = \hat{v}_{k,t}$. To optimize it, we experiment with three loss functions. The first, which we will refer to as limssebb, assumes that our DNN is a total black box that delivers only a classification:

$$\hat{\vec{v}}_k = \arg\min_{\vec{v}_k} \sum_n -\Big( \log\big(\sigma(\vec{z}_n \cdot \vec{v}_k)\big)\, \mathbb{I}[f(Z_n) = k] + \log\big(1 - \sigma(\vec{z}_n \cdot \vec{v}_k)\big)\, \mathbb{I}[f(Z_n) \neq k] \Big)$$

where f(Z_n) = argmax_{k'} p(k'|Z_n). The black-box approach is maximally general, but insensitive to the magnitude of evidence found in Z_n. Hence, we also test magnitude-sensitive loss functions:

$$\hat{\vec{v}}_k = \arg\min_{\vec{v}_k} \sum_n \big( \vec{z}_n \cdot \vec{v}_k - o(k, Z_n) \big)^2$$

where o(k, Z_n) is one of s(k, Z_n) or p(k|Z_n). We refer to these as limssems_s and limssems_p.
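The following is a minimal sketch of the magnitude-sensitive variant: sample substrings, query the task method, and fit the linear relevance vector by least squares. The sampling ranges follow the description above; the scoring function `score` and other names are illustrative assumptions.

```python
import numpy as np

def limsse_ms(score, tokens, n_samples=3000, max_len=6, seed=0):
    """Magnitude-sensitive LIMSSE: returns one relevance score per token.

    score  : callable mapping a token list (a substring Z_n) to o(k, Z_n),
             e.g. the unnormalized class score s(k, Z_n)
    tokens : the input text X as a list of words"""
    rng = np.random.default_rng(seed)
    T = len(tokens)
    Z = np.zeros((n_samples, T))                 # binary substring indicators z_n
    y = np.zeros(n_samples)                      # "gold" outputs o(k, Z_n)
    for n in range(n_samples):
        length = rng.integers(1, max_len + 1)    # uniform length, 1 <= l_n <= 6
        start = rng.integers(0, T)               # uniform starting point s_n
        end = min(start + length, T)
        Z[n, start:end] = 1.0
        y[n] = score(tokens[start:end])
    # least-squares fit: minimizes sum_n (z_n . v_k - o(k, Z_n))^2
    v_k, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return v_k                                   # phi_limsse(t, k, X) = v_k[t]
```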
4 Experiments

4.1 Hybrid document experiment

For the hybrid document experiment, we use the 20 newsgroups corpus (topic classification) (Lang, 1995) and reviews from the 10th yelp dataset challenge (binary sentiment analysis)[3]. We train five DNNs per corpus: a bidirectional GRU (Cho et al., 2014), a bidirectional LSTM (Hochreiter and Schmidhuber, 1997), a 1D CNN with global max pooling (Collobert et al., 2011), a bidirectional Quasi-GRU (QGRU), and a bidirectional Quasi-LSTM (QLSTM). The Quasi-RNNs are 1D CNNs with a feature-wise gated recursive pooling layer (Bradbury et al., 2017). Word embeddings are R^300 and initialized with pre-trained GloVe embeddings (Pennington et al., 2014)[4]. The main layer has a hidden size of 150 (bidirectional architectures: 75 dimensions per direction). For the QRNNs and CNN, we use a kernel width of 5. In all five architectures, the resulting document representation is projected to 20 (resp. two) dimensions using a fully connected layer, followed by a softmax. See supplementary material for details on training and regularization.

After training, we sentence-tokenize the test sets, shuffle the sentences, concatenate ten sentences at a time and classify the resulting hybrid documents. Documents that are assigned a class that is not the gold label of at least one constituent word are discarded (yelp: < 0.1%; 20 newsgroups: 14% - 20%). On the remaining documents, we use the explanation methods from §3 to find the maximally relevant word for each prediction. The random baseline samples the maximally relevant word from a uniform distribution.

For reference, we also evaluate on a human judgment benchmark (Mohseni and Ragan (2018), Table 2, C11-C15).

[3] www.yelp.com/dataset_challenge
[4] http://nlp.stanford.edu/data/glove.840B.300d.zip
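For concreteness, a sketch of the hybrid document construction described above, assuming sentence-tokenized test documents with gold labels (the paper uses NLTK for tokenization; the helper below is illustrative):

```python
import random

def make_hybrid_documents(sentences, labels, per_doc=10, seed=0):
    """sentences: list of token lists (sentence-tokenized test set)
    labels   : gold label of the document each sentence came from
    Returns hybrid documents as (tokens, per-word gold labels) pairs."""
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)                               # shuffle sentences across documents
    hybrids = []
    for i in range(0, len(order) - per_doc + 1, per_doc):
        chunk = order[i:i + per_doc]                 # concatenate ten sentences at a time
        tokens = [w for j in chunk for w in sentences[j]]
        word_labels = [labels[j] for j in chunk for _ in sentences[j]]
        hybrids.append((tokens, word_labels))
    return hybrids
```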
Column groups (within each group, task methods are ordered GRU, QGRU, LSTM, QLSTM, CNN; the morphosyntactic groups C16-C27 have no CNN column):
  C01-C05: hybrid document experiment, yelp
  C06-C10: hybrid document experiment, 20 newsgroups
  C11-C15: manual ground truth, 20 newsgroups
  C16-C19: morphosyntactic agreement, hit_target, f(X) = y(X)
  C20-C23: morphosyntactic agreement, hit_feat, f(X) = y(X)
  C24-C27: morphosyntactic agreement, hit_feat, f(X) ≠ y(X)

φ           | C01-C05             | C06-C10             | C11-C15             | C16-C19         | C20-C23         | C24-C27
gradL2_1s   | .61 .68 .67 .70 .68 | .45 .47 .25 .33 .79 | .26 .31 .07 .18 .74 | .48 .23 .63 .19 | .52 .27 .73 .22 | .09 .11 .19 .19
gradL2_1p   | .57 .67 .67 .70 .74 | .40 .43 .26 .34 .70 | .18 .35 .07 .13 .66 | .48 .22 .63 .18 | .53 .26 .73 .21 | .09 .09 .18 .11
gradL2_Rs   | .71 .66 .69 .71 .70 | .58 .32 .26 .21 .82 | .23 .15 .11 .08 .76 | .69 .67 .68 .51 | .73 .70 .75 .55 | .19 .22 .20 .20
gradL2_Rp   | .71 .70 .72 .71 .77 | .56 .34 .30 .23 .81 | .13 .08 .14 .01 .78 | .68 .77 .50 .70 | .74 .82 .54 .78 | .19 .21 .19 .30
graddot_1s  | .88 .85 .81 .77 .86 | .79 .76 .59 .72 .89 | .80 .70 .14 .47 .79 | .81 .62 .73 .56 | .85 .66 .81 .59 | .42 .34 .46 .36
graddot_1p  | .92 .88 .84 .79 .95 | .78 .72 .59 .72 .81 | .71 .59 .20 .44 .69 | .79 .58 .74 .54 | .83 .61 .81 .56 | .41 .33 .46 .35
graddot_Rs  | .84 .90 .85 .87 .87 | .81 .68 .60 .68 .89 | .82 .64 .21 .26 .80 | .90 .87 .78 .84 | .94 .92 .83 .89 | .54 .51 .46 .52
graddot_Rp  | .86 .89 .84 .89 .96 | .80 .69 .62 .73 .89 | .80 .53 .40 .54 .78 | .87 .85 .68 .84 | .93 .92 .74 .93 | .53 .48 .42 .51
omit1       | .79 .82 .85 .87 .61 | .78 .75 .54 .76 .82 | .80 .48 .33 .48 .65 | .81 .81 .79 .80 | .86 .87 .86 .84 | .43 .45 .44 .45
omit3       | .89 .80 .89 .88 .59 | .79 .71 .72 .81 .76 | .77 .37 .36 .49 .61 | .74 .77 .73 .73 | .82 .84 .82 .79 | .41 .45 .42 .46
omit7       | .92 .88 .91 .91 .70 | .79 .77 .77 .84 .84 | .77 .49 .44 .55 .65 | .76 .80 .66 .74 | .85 .88 .78 .80 | .40 .48 .43 .47
occ1        | .80 .71 .74 .84 .61 | .78 .73 .60 .77 .82 | .77 .49 .19 .10 .65 | .91 .85 .86 .86 | .94 .88 .89 .88 | .50 .44 .46 .47
occ3        | .92 .61 .93 .85 .59 | .78 .63 .74 .74 .76 | .74 .37 .32 .35 .61 | .74 .73 .71 .72 | .78 .76 .76 .76 | .43 .37 .41 .43
occ7        | .92 .77 .93 .90 .70 | .78 .62 .74 .77 .84 | .74 .35 .43 .39 .65 | .64 .65 .63 .65 | .73 .73 .72 .73 | .36 .35 .39 .43
decomp      | .79 .88 .92 .88  -  | .75 .79 .77 .80  -  | .54 .36 .72 .51  -  | .84 .87 .86 .90 | .90 .93 .92 .96 | .52 .58 .57 .63
lrp         | .92 .87 .91 .84 .86 | .82 .83 .79 .85 .89 | .85 .72 .74 .81 .79 | .90 .90 .86 .91 | .95 .95 .91 .95 | .58 .60 .52 .63
deeplift    | .91 .89 .94 .85 .87 | .82 .83 .78 .84 .89 | .84 .72 .70 .81 .80 | .91 .90 .85 .91 | .95 .95 .90 .95 | .59 .59 .52 .63
limssebb    | .81 .82 .83 .84 .78 | .78 .81 .78 .80 .84 | .52 .53 .53 .54 .57 | .43 .41 .44 .42 | .54 .51 .56 .52 | .39 .43 .42 .41
limssems_s  | .94 .94 .93 .93 .91 | .85 .87 .83 .86 .89 | .85 .84 .76 .84 .82 | .62 .62 .67 .63 | .75 .74 .82 .75 | .52 .53 .55 .53
limssems_p  | .87 .88 .85 .86 .94 | .85 .86 .83 .86 .90 | .81 .80 .74 .76 .76 | .62 .62 .67 .63 | .75 .74 .82 .75 | .51 .53 .55 .53
random      | .69 .67 .70 .69 .66 | .20 .19 .22 .22 .21 | .09 .09 .06 .06 .08 | .27 .27 .27 .27 | .33 .33 .33 .33 | .12 .13 .12 .12
last        |  -   -   -   -   -  |  -   -   -   -   -  |  -   -   -   -   -  | .66 .67 .66 .67 | .76 .77 .76 .77 | .21 .27 .25 .26

N: 7551 ≤ N ≤ 7554 (C01-C05); 3022 ≤ N ≤ 3230 (C06-C10); 137 ≤ N ≤ 150 (C11-C15); N ≈ 1400000 (C16-C23); N ≈ 20000 (C24-C27)

Table 2: Pointing game accuracies in hybrid document experiment (left), on manually annotated benchmark (middle) and in morphosyntactic agreement experiment (right). hit_target (resp. hit_feat): maximal relevance on subject (resp. on noun with the predicted number feature). In the original table, the top explanation method per column is marked in bold and methods within 5 points of the top are underlined.

It contains 188 documents from the 20 newsgroups test set (classes sci.med and sci.electronics), with one manually created list of relevant words per document. We discard documents that are incorrectly classified (20% - 27%) and define: hit(φ, X) = I[rmax(X, φ) ∈ gt(X)], where gt(X) is the manual ground truth.

4.2 Morphosyntactic agreement experiment

For the morphosyntactic agreement experiment, we use automatically annotated English Wikipedia sentences by Linzen et al. (2016)[5]. For our purpose, a sample consists of: all words preceding the verb: X = [x_1 ... x_T]; part-of-speech (POS) tags: pos(X, t) ∈ {VBZ, VBP, NN, NNS, ...}; and the position of the subject: target(X) ∈ [1, T]. The number feature is derived from the POS:

$$\mathrm{feat}(X, t) = \begin{cases} \mathrm{Sg} & \text{if } \mathrm{pos}(X, t) \in \{\mathrm{VBZ}, \mathrm{NN}\} \\ \mathrm{Pl} & \text{if } \mathrm{pos}(X, t) \in \{\mathrm{VBP}, \mathrm{NNS}\} \\ \text{n/a} & \text{otherwise} \end{cases}$$

The gold label of a sentence is the number of its verb, i.e., y(X) = feat(X, T + 1).

[5] www.tallinzen.net/media/rnn_agreement/agr_50_mostcommon_10K.tsv.gz

As task methods, we replicate Linzen et al. (2016)'s unidirectional LSTM (R^50 randomly initialized word embeddings, hidden size 50). We also train unidirectional GRU, QGRU and QLSTM architectures with the same dimensionality. We use the explanation methods from §3 to find the most relevant word for predictions on the test set. As described in §2.2, explanation methods are awarded a hit_target (resp. hit_feat) point if this word is the subject (resp. a noun with the predicted number feature). For reference, we use a random baseline as well as a baseline that assumes that the most relevant word directly precedes the verb.

5 Discussion

5.1 Explanation methods

Our experiments suggest that explanation methods for neural NLP differ in quality.

As in previous work (see §6), gradient L2 norm (gradL2) performs poorly, especially on RNNs. We assume that this is due to its inability to distinguish relevances for and against k.

Gradient embedding dot product (graddot) is competitive on CNN (Table 2, graddot_1p C05, graddot_1s C10, C15), presumably because relu is linear on positive inputs, so gradients are exact instead of approximate.
Figure 3: Top: verb context classified singular. Task method: LSTM. Bottom: hybrid yelp review, classified positive. Task method: QLSTM. (Methods shown: decomp, deeplift and limssems_p on the verb context "initially a pagan culture , detailed information about the return of the christian religion to the islands during the norse-era [is ...]"; lrp and limssems_p on the yelp review.)

graddot also has decent performance for GRU (graddot_1p C01, graddot_Rs C{06, 11, 16, 20, 24}), perhaps because GRU hidden activations are always in [-1,1], where tanh and σ are approximately linear.

Integrated gradient (grad_R) mostly outperforms simple gradient (grad_1), though not consistently (C01, C07). Contrary to expectation, integration did not help much with the failure of the gradient method on LSTM on 20 newsgroups (graddot_1 vs. graddot_R in C08, C13), which we had assumed to be due to saturation of tanh on large absolute activations in $\vec{c}$. Smaller intervals may be needed to approximate the integration, however, this means additional computational cost.

The gradient of s(k, X) performs better or similar to the gradient of p(k|X). The main exception is yelp (graddot_1s vs. graddot_1p, C01-C05). This is probably due to conflation by p(k|X) of evidence for k (numerator in Eq 3) and against competitor classes (denominator). In a two-class scenario, there is little incentive to keep classes separate, leading to information flow through the denominator. In future work, we will replace the two-way softmax with a one-way sigmoid such that φ(t, 0, X) := −φ(t, 1, X).

LRP and DeepLIFT are the most consistent explanation methods across evaluation paradigms and task methods. (The comparatively low pointing game accuracies on the yelp QRNNs and CNN (C02, C04, C05) are probably due to the fact that they explain s(k, .) in a two-way softmax, see above.) On CNN (C05, C10, C15), LRP and graddot_1s perform almost identically, suggesting that they are indeed quasi-equivalent on this architecture (see §3.2). On (Q)RNNs, modified LRP and DeepLIFT appear to be superior to the gradient method (lrp vs. graddot_1s, deeplift vs. graddot_Rs, C01-C04, C06-C09, C11-C14, C16-C27).

Decomposition performs well on LSTM, especially in the morphosyntactic agreement experiment, but it is inconsistent on other architectures. Gated RNNs have a long-term additive and a multiplicative pathway, and the decomposition method only detects information traveling via the additive one. Miao et al. (2016) show qualitatively that GRUs often reorganize long-term memory abruptly, which might explain the difference between LSTM and GRU. QRNNs only have additive recurrent connections; however, given that $\vec{c}_t$ (resp. $\vec{h}_t$) is calculated by convolution over several time steps, decomposition relevance can be incorrectly attributed inside that window. This likely is the reason for the stark difference between the performance of decomposition on QRNNs in the hybrid document experiment and on the manually labeled data (C07, C09 vs. C12, C14). Overall, we do not recommend the decomposition method, because it fails to take into account all routes by which information can be propagated.

Omission and occlusion produce inconsistent results in the hybrid document experiment. Shrikumar et al. (2017) show that perturbation methods can lack sensitivity when there are more relevant inputs than the "perturbation window" covers. In the morphosyntactic agreement experiment, omission is not competitive; we assume that this is because it interferes too much with syntactic structure. occ1 does better (esp. C16-C19), possibly because an all-zero "placeholder" is less disruptive than word removal. But despite some high scores, it is less consistent than other explanation methods.

Magnitude-sensitive LIMSSE (limssems) consistently outperforms black-box LIMSSE (limssebb), which suggests that numerical outputs should be used for approximation where possible. In the hybrid document experiment, magnitude-sensitive LIMSSE outperforms the other explanation methods (exceptions: C03, C05). However, it fails in the morphosyntactic agreement experiment (C16-C27). In fact, we expect LIMSSE to be unsuited for large context problems, as it cannot discover dependencies whose range is bigger than a given text sample. In Fig 3 (top), limssems_p highlights any singular noun without taking into account how that noun fits into the overall syntactic structure.
5.2 Evaluation paradigms

The assumptions made by our automatic evaluation paradigms have exceptions: (i) the correlation between fragment of origin and relevance does not always hold (e.g., a positive review may contain negative fragments, and will almost certainly contain neutral fragments); (ii) in morphological prediction, we cannot always expect the subject to be the only predictor for number. In Fig 2 (bottom) for example, "few" is a reasonable clue for plural despite not being a noun. This imperfect ground truth means that absolute pointing game accuracies should be taken with a grain of salt; but we argue that this does not invalidate them for comparisons.

We also point out that there are characteristics of explanations that may be desirable but are not reflected by the pointing game. Consider Fig 3 (bottom). Both explanations get hit points, but the lrp explanation appears "cleaner" than limssems_p, with relevance concentrated on fewer tokens.

6 Related work

6.1 Explanation methods

Explanation methods can be divided into local and global methods (Doshi-Velez and Kim, 2017). Global methods infer general statements about what a DNN has learned, e.g., by clustering documents (Aubakirova and Bansal, 2016) or n-grams (Kádár et al., 2017) according to the neurons that they activate. Li et al. (2016a) compare embeddings of specific words with reference points to measure how drastically they were changed during training. In computer vision, Simonyan et al. (2014) optimize the input space to maximize the activation of a specific neuron. Global explanation methods are of limited value for explaining a specific prediction as they represent average behavior. Therefore, we focus on local methods.

Local explanation methods explain a decision taken for one specific input at a time. We have attempted to include all important local methods for NLP in our experiments (see §3). We do not address self-explanatory models (e.g., attention (Bahdanau et al., 2015) or rationale models (Lei et al., 2016)), as these are very specific architectures that may not be applicable to all tasks.

6.2 Explanation evaluation

According to Doshi-Velez and Kim (2017)'s taxonomy of explanation evaluation paradigms, application-grounded paradigms test how well an explanation method helps real users solve real tasks (e.g., doctors judge automatic diagnoses); human-grounded paradigms rely on proxy tasks (e.g., humans rank task methods based on explanations); functionally-grounded paradigms work without human input, like our approach.

Arras et al. (2016) (cf. Samek et al. (2016)) propose a functionally-grounded explanation evaluation paradigm for NLP where words in a correctly (resp. incorrectly) classified document are deleted in descending (resp. ascending) order of relevance. They assume that the fewer words must be deleted to reduce (resp. increase) accuracy, the better the explanations. According to this metric, LRP (§3.2) outperforms gradL2 on CNNs (Arras et al., 2016) and LSTMs (Arras et al., 2017b) on 20 newsgroups. Ancona et al. (2017) perform the same experiment with a binary sentiment analysis LSTM. Their graph shows occ1, graddot_1 and graddot_R tied in first place, while LRP, DeepLIFT and the gradient L1 norm lag behind. Note that their treatment of LSTM gates in LRP / DeepLIFT differs from our implementation.

An issue with the word deletion paradigm is that it uses syntactically broken inputs, which may introduce artefacts (Sundararajan et al., 2017). In our hybrid document paradigm, inputs are syntactically intact (though semantically incoherent at the document level); the morphosyntactic agreement paradigm uses unmodified inputs.

Another class of functionally-grounded evaluation paradigms interprets the performance of a secondary task method, on inputs that are derived from (or altered by) an explanation method, as a proxy for the quality of that explanation method. Murdoch and Szlam (2017) build a rule-based classifier from the most relevant phrases in a corpus (task method: LSTM). The classifier based on decomp (§3.4) outperforms the gradient-based classifier, which is in line with our results. Arras et al. (2017a) build document representations by summing over word embeddings weighted by relevance scores (task method: CNN). They show that K-nearest neighbor performs better on document representations derived with LRP than on those derived with gradL2, which also matches our results.
Denil et al. (2015) condense documents by extracting top-K relevant sentences, and let the original task method (CNN) classify them. The accuracy loss, relative to uncondensed documents, is smaller for graddot than for heuristic baselines.

In the domain of human-based evaluation paradigms, Ribeiro et al. (2016) compare different variants of LIME (§3.6) by how well they help non-experts clean a corpus from words that lead to overfitting. Selvaraju et al. (2017) assess how well explanation methods help non-experts identify the more accurate out of two object recognition CNNs. These experiments come closer to real use cases than functionally-grounded paradigms; however, they are less scalable.

7 Summary

We conducted the first comprehensive evaluation of explanation methods for NLP, an important undertaking because there is a need for understanding the behavior of DNNs.

To conduct this study, we introduced evaluation paradigms for explanation methods for two classes of NLP tasks, small context tasks (e.g., topic classification) and large context tasks (e.g., morphological prediction). Neither paradigm requires manual annotations. We also introduced LIMSSE, a substring-based explanation method inspired by LIME and designed for NLP.

Based on our experimental results, we recommend LRP, DeepLIFT and LIMSSE for small context tasks and LRP and DeepLIFT for large context tasks, on all five DNN architectures that we tested. On CNNs and possibly GRUs, the (integrated) gradient embedding dot product is a good alternative to DeepLIFT and LRP.

8 Code

Our implementation of LIMSSE, the gradient, perturbation and decomposition methods can be found in our branch of the keras package: www.github.com/NPoe/keras. To re-run our experiments, see scripts in www.github.com/NPoe/neural-nlp-explanation-experiment. Our LRP implementation (same repository) is adapted from Arras et al. (2017b)[6].

[6] https://github.com/ArrasL/LRP_for_LSTM

9 Acknowledgement

We gratefully acknowledge funding for this work by the European Research Council (ERC #740516).

References

Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. 2017. A unified view of gradient-based attribution methods for deep neural networks. In Conference on Neural Information Processing Systems, Long Beach, USA.

Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. 2018. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, Vancouver, Canada.

Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2016. Explaining predictions of non-linear classifiers in NLP. In First Workshop on Representation Learning for NLP, pages 1–7, Berlin, Germany.

Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017a. What is relevant in a text document?: An interpretable machine learning approach. PloS one, 12(8):e0181142.

Leila Arras, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017b. Explaining recurrent neural network predictions in sentiment analysis. In Eighth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 159–168, Copenhagen, Denmark.

Malika Aubakirova and Mohit Bansal. 2016. Interpreting neural networks to improve politeness comprehension. In Empirical Methods in Natural Language Processing, pages 2035–2041, Austin, USA.

Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, San Diego, USA.

Trapit Bansal, David Belanger, and Andrew McCallum. 2016. Ask the GRU: Multi-task learning for deep text recommendations. In ACM Conference on Recommender Systems, pages 107–114, Boston, USA.
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: Analyzing text with the natural language toolkit. O'Reilly Media.

James Bradbury, Stephen Merity, Caiming Xiong, and Richard Socher. 2017. Quasi-recurrent neural networks. In International Conference on Learning Representations, Toulon, France.

Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111, Doha, Qatar.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537.

Misha Denil, Alban Demiraj, and Nando de Freitas. 2015. Extraction of salient sentences from labelled documents. In International Conference on Learning Representations, San Diego, USA.

Finale Doshi-Velez and Been Kim. 2017. A roadmap for a rigorous science of interpretability. CoRR, abs/1702.08608.

Bryce Goodman and Seth Flaxman. 2016. European union regulations on algorithmic decision-making and a "right to explanation". In ICML Workshop on Human Interpretability in Machine Learning, pages 26–30, New York, USA.

Yotam Hechtlinger. 2016. Interpretation of prediction models using the input gradient. In Conference on Neural Information Processing Systems, Barcelona, Spain.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.

Akos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2017. Representation of linguistic form and function in recurrent neural networks. Computational Linguistics, 43(4):761–780.

Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. 2016. Investigating the influence of noise and distractors on the interpretation of neural networks. In Conference on Neural Information Processing Systems, Barcelona, Spain.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations, San Diego, USA.

Ken Lang. 1995. Newsweeder: Learning to filter netnews. In International Conference on Machine Learning, pages 331–339, Tahoe City, USA.

Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Empirical Methods in Natural Language Processing, pages 107–117, Austin, USA.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016a. Visualizing and understanding neural models in NLP. In NAACL-HLT, pages 681–691, San Diego, USA.

Jiwei Li, Will Monroe, and Dan Jurafsky. 2016b. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220.

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.

Yajie Miao, Jinyu Li, Yongqiang Wang, Shi-Xiong Zhang, and Yifan Gong. 2016. Simplifying long short-term memory acoustic models for fast training and decoding. In International Conference on Acoustics, Speech and Signal Processing, pages 2284–2288.

Sina Mohseni and Eric D Ragan. 2018. A human-grounded evaluation benchmark for local explanations of machine learning. arXiv preprint arXiv:1801.05075.

W James Murdoch and Arthur Szlam. 2017. Automatic rule extraction from long short term memory networks. In International Conference on Learning Representations, Toulon, France.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, San Francisco, California.

Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. 2016. Evaluating the visualization of what a deep neural network has learned. IEEE transactions on neural networks and learning systems, 28(11):2660–2673.
Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In IEEE Conference on Computer Vision and Pattern Recognition, pages 618–626, Honolulu, Hawaii.

Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In International Conference on Machine Learning, pages 3145–3153, Sydney, Australia.

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. In International Conference on Learning Representations, Banff, Canada.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In International Conference on Machine Learning, Sydney, Australia.

Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833, Zürich, Switzerland.

Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. 2016. Top-down neural attention by excitation backprop. In European Conference on Computer Vision, pages 543–559, Amsterdam, Netherlands.

Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. 2017. Visualizing deep neural network decisions: Prediction difference analysis. In International Conference on Learning Representations, Toulon, France.
10 Supplementary material

11 Corpora and data preprocessing

The 20 newsgroups corpus (Lang, 1995) was downloaded using the Python sklearn package (Pedregosa et al., 2011), removing all headers, footers and quotes. The corpus contains 18,846 posts and comes with a training and test set. We randomly split the latter into a heldout and a test set.

For sentiment analysis we use the Pennsylvania subset of the 10th yelp dataset challenge[7]. It contains 206,338 reviews with 1 to 5 star ratings. 1 or 2 stars are mapped to "negative", 4 or 5 stars to "positive", 3 star reviews are discarded. We randomly split the data into training, heldout and test sets (90%/5%/5%). On both corpora, we use NLTK (Bird et al., 2009) for word and sentence tokenization. Words with a frequency rank above 50000 are mapped to oov. To create hybrid documents, we sentence-tokenize the test sets, shuffle, and then concatenate ten sentences at a time.

The manually annotated 20 newsgroups documents were obtained from Mohseni and Ragan (2018)[8]. The relevance ground truth consists of one list of lowercased word types per document. There are a number of mismatches between the ground truth and the documents (e.g., one list contains rays but its document only contains x-rays). This made some reverse engineering necessary: Given X and its list, we add t to gt(X) if lower-cased x_t is a prefix or suffix of at least one word type in the list.

For the morphosyntactic agreement experiment, we use Linzen et al. (2016)'s corpus of 1,577,211 English Wikipedia sentences with automatic morphosyntactic annotation[9]. We replicate the original dataset sizes (9% train, 1% heldout, 90% test). Like in the original corpus, words with a frequency rank above 10,000 are replaced by their part-of-speech tag.

[7] www.yelp.com/dataset_challenge
[8] http://github.com/SinaMohseni/ML-Interpretability-Evaluation-Benchmark
[9] www.tallinzen.net/media/rnn_agreement/agr_50_mostcommon_10K.tsv.gz

12 Neural networks

Every neural network used in our paper is made up of a word embedding matrix, followed by a core layer, followed by a fully-connected layer with softmax activation.

In the hybrid document experiment, the |V| × 300 embedding matrix is initialized with GloVe embeddings (Pennington et al., 2014)[10], which are fine-tuned during training. The core layer is a bidirectional Gated Recurrent Unit (GRU, Cho et al. (2014)), bidirectional Long-Short Term Memory Network (LSTM, Hochreiter and Schmidhuber (1997)), bidirectional Quasi-GRU or Quasi-LSTM (Bradbury et al., 2017), or a 1D Convolutional Neural Network (CNN) with global max pooling (Collobert et al., 2011). In all cases, the core layer has a hidden size of 150 (bidirectional architectures: 75 per direction); for QRNNs and CNN, we use a kernel width of 5. For regularization, we use 50% dropout between layers and on hidden-to-hidden connections (GRU/LSTM only).

We minimize categorical crossentropy using Adam (Kingma and Ba, 2015), with learning rate 0.001, β1 = 0.9, β2 = 0.999 and batch size 8. Heldout accuracy is monitored; after two stagnant epochs, the learning rate is halved, and after 5 (yelp), resp. 25 (20 newsgroups), stagnant epochs, training is stopped and the model from the best epoch is stored. Final test set accuracies are .964/.954/.965/.959/.957 on yelp and .727/.716/.730/.735/.705 on 20 newsgroups (GRU/QGRU/LSTM/QLSTM/CNN).

[10] http://nlp.stanford.edu/data/glove.840B.300d.zip

In the morphosyntactic agreement experiment, the |V| × 50 embedding matrix is randomly initialized. All (Q)RNNs are unidirectional and have a hidden size of 50. QRNN kernel width is 5. The core layer is followed by a fully connected 50 × 2 layer with softmax activation. We minimize categorical crossentropy using Adam (see above), with early stopping after 20 epochs based on heldout accuracy, and a batch size of 16. Final test set accuracies are .991/.985/.990/.986 (GRU/QGRU/LSTM/QLSTM). Contrary to Linzen et al. (2016), we do not train an ensemble.

12.1 GRU

$$\begin{aligned}
\vec{h}_0 &= 0 \\
\vec{z}_t &= \sigma(\mathbf{V}_z \vec{e}_t + \mathbf{U}_z \vec{h}_{t-1} + \vec{b}_z) \\
\vec{r}_t &= \sigma(\mathbf{V}_r \vec{e}_t + \mathbf{U}_r \vec{h}_{t-1} + \vec{b}_r) \\
\vec{g}'_t &= \mathbf{V} \vec{e}_t + \mathbf{U}(\vec{r}_t \odot \vec{h}_{t-1}) + \vec{b} \\
\vec{g}_t &= \tanh(\vec{g}'_t) \\
\vec{h}_t &= \vec{z}_t \odot \vec{h}_{t-1} + (\vec{1} - \vec{z}_t) \odot \vec{g}_t
\end{aligned}$$

12.2 QGRU

$$\begin{aligned}
Z &= \sigma(\mathbf{V}_z \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b}_z) \\
G' &= \mathbf{V} \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b} \\
G &= \tanh(G') \\
\vec{h}_0 &= 0 \\
\vec{h}_t &= \vec{z}_t \odot \vec{h}_{t-1} + (1 - \vec{z}_t) \odot \vec{g}_t
\end{aligned}$$

12.3 LSTM

$$\begin{aligned}
\vec{c}_0 &= \vec{h}_0 = 0 \\
\vec{i}_t &= \sigma(\mathbf{V}_i \vec{e}_t + \mathbf{U}_i \vec{h}_{t-1} + \vec{b}_i) \\
\vec{f}_t &= \sigma(\mathbf{V}_f \vec{e}_t + \mathbf{U}_f \vec{h}_{t-1} + \vec{b}_f) \\
\vec{o}_t &= \sigma(\mathbf{V}_o \vec{e}_t + \mathbf{U}_o \vec{h}_{t-1} + \vec{b}_o) \\
\vec{g}'_t &= \mathbf{V} \vec{e}_t + \mathbf{U} \vec{h}_{t-1} + \vec{b} \\
\vec{g}_t &= \tanh(\vec{g}'_t) \\
\vec{c}_t &= \vec{f}_t \odot \vec{c}_{t-1} + \vec{i}_t \odot \vec{g}_t \\
\vec{h}_t &= \vec{o}_t \odot \tanh(\vec{c}_t)
\end{aligned}$$

12.4 QLSTM

$$\begin{aligned}
I &= \sigma(\mathbf{V}_i \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b}_i) \\
F &= \sigma(\mathbf{V}_f \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b}_f) \\
O &= \sigma(\mathbf{V}_o \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b}_o) \\
G' &= \mathbf{V} \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T] + \vec{b} \\
G &= \tanh(G') \\
\vec{h}_0 &= \vec{c}_0 = 0 \\
\vec{c}_t &= \vec{f}_t \odot \vec{c}_{t-1} + \vec{i}_t \odot \vec{g}_t \\
\vec{h}_t &= \vec{o}_t \odot \tanh(\vec{c}_t)
\end{aligned}$$
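The gated recursive pooling that turns the convolutional outputs of 12.2 and 12.4 into hidden states is a simple elementwise recurrence. Below is a minimal numpy sketch under the assumption that the gate and candidate matrices (Z, G for the QGRU; I, F, O, G for the QLSTM) have already been computed by the convolutions above; it is illustrative, not the Keras implementation used in the experiments.

```python
import numpy as np

def qgru_pool(Z, G):
    """QGRU recursive pooling (12.2): h_t = z_t * h_{t-1} + (1 - z_t) * g_t.
    Z, G: arrays of shape [T, hidden]; returns all hidden states [T, hidden]."""
    h = np.zeros(Z.shape[1])
    states = []
    for z_t, g_t in zip(Z, G):
        h = z_t * h + (1.0 - z_t) * g_t
        states.append(h)
    return np.stack(states)

def qlstm_pool(I, F, O, G):
    """QLSTM recursive pooling (12.4): c_t = f_t * c_{t-1} + i_t * g_t, h_t = o_t * tanh(c_t)."""
    c = np.zeros(I.shape[1])
    states = []
    for i_t, f_t, o_t, g_t in zip(I, F, O, G):
        c = f_t * c + i_t * g_t
        states.append(o_t * np.tanh(c))
    return np.stack(states)
```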
12.5 CNN

$$\begin{aligned}
G' &= \mathbf{V} \star [0 \ldots \vec{e}_1 \ldots \vec{e}_T \ldots 0] + \vec{b} \\
G &= \mathrm{relu}(G') \\
h_d &= \max_t(g_{t,d})
\end{aligned}$$

13 RGB coding in examples

$$\begin{aligned}
\phi'(t, k, X) &= \frac{\phi(t, k, X)}{\max_{t'}\big(1.1\,|\phi(t', k, X)|\big)} \\
R(t, k, X) &= \phi'(t, k, X)\,\mathbb{I}[\phi(t, k, X) < 0] \\
G(t, k, X) &= \phi'(t, k, X)\,\mathbb{I}[\phi(t, k, X) > 0] \\
B(t, k, X) &= 0
\end{aligned}$$

14 Epsilon LRP and DeepLIFT

In the following, we assume that the hidden layer relevance vector R(h) (resp. R(h_T)) has been backpropagated by the upstream fully connected layer using equations from Sections 3.2 and 3.3 (main paper). DeepLIFT can be derived by replacing h, g, g', e, c with h − h̄, g − ḡ, g' − ḡ', e − ē, c − c̄. F is CNN / QRNN kernel width.

14.1 GRU

$$\begin{aligned}
R(g_{t,d}) &= R(h_{t,d})\, \frac{g_{t,d} \cdot (1 - z_{t,d})}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} \\
R(e_{t,d}) &= \sum_{j=1}^{\dim(\vec{g}_t)} R(g_{t,j})\, \frac{e_{t,d} \cdot v_{d,j}}{g'_{t,j} + \epsilon\mathrm{sign}(g'_{t,j})} \\
R(h_{t-1,d}) &= R(h_{t,d})\, \frac{h_{t-1,d} \cdot z_{t,d}}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} + \sum_{j=1}^{\dim(\vec{g}_t)} R(g_{t,j})\, \frac{h_{t-1,d} \cdot r_{t,d} \cdot u_{d,j}}{g'_{t,j} + \epsilon\mathrm{sign}(g'_{t,j})}
\end{aligned}$$

14.2 QGRU

$$\begin{aligned}
R(g_{t,d}) &= R(h_{t,d})\, \frac{g_{t,d} \cdot (1 - z_{t,d})}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} \\
R(h_{t-1,d}) &= R(h_{t,d})\, \frac{h_{t-1,d} \cdot z_{t,d}}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} \\
R(e_{t,d}) &= \sum_{j=1}^{\dim(\vec{g}_t)} \sum_{k=0}^{F-1} R(g_{t+k,j})\, \frac{e_{t,d} \cdot v_{k,d,j}}{g'_{t+k,j} + \epsilon\mathrm{sign}(g'_{t+k,j})}
\end{aligned}$$

14.3 LSTM

$$\begin{aligned}
R(c_{T+1,d}) &= 0 \\
R(c_{t,d}) &= R(h_{t,d})\, \frac{\tanh(c_{t,d}) \cdot o_{t,d}}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} + R(c_{t+1,d})\, \frac{c_{t,d} \cdot f_{t+1,d}}{c_{t+1,d} + \epsilon\mathrm{sign}(c_{t+1,d})} \\
R(g_{t,d}) &= R(c_{t,d})\, \frac{g_{t,d} \cdot i_{t,d}}{c_{t,d} + \epsilon\mathrm{sign}(c_{t,d})} \\
R(e_{t,d}) &= \sum_{j=1}^{\dim(\vec{g}_t)} R(g_{t,j})\, \frac{e_{t,d} \cdot v_{d,j}}{g'_{t,j} + \epsilon\mathrm{sign}(g'_{t,j})} \\
R(h_{t-1,d}) &= \sum_{j=1}^{\dim(\vec{g}_t)} R(g_{t,j})\, \frac{h_{t-1,d} \cdot u_{d,j}}{g'_{t,j} + \epsilon\mathrm{sign}(g'_{t,j})}
\end{aligned}$$

14.4 QLSTM

$$\begin{aligned}
R(c_{T+1,d}) &= 0 \\
R(c_{t,d}) &= R(h_{t,d})\, \frac{\tanh(c_{t,d}) \cdot o_{t,d}}{h_{t,d} + \epsilon\mathrm{sign}(h_{t,d})} + R(c_{t+1,d})\, \frac{c_{t,d} \cdot f_{t+1,d}}{c_{t+1,d} + \epsilon\mathrm{sign}(c_{t+1,d})} \\
R(g_{t,d}) &= R(c_{t,d})\, \frac{g_{t,d} \cdot i_{t,d}}{c_{t,d} + \epsilon\mathrm{sign}(c_{t,d})} \\
R(e_{t,d}) &= \sum_{j=1}^{\dim(\vec{g}_t)} \sum_{k=0}^{F-1} R(g_{t+k,j})\, \frac{e_{t,d} \cdot v_{k,d,j}}{g'_{t+k,j} + \epsilon\mathrm{sign}(g'_{t+k,j})}
\end{aligned}$$

14.5 CNN

$$\begin{aligned}
F' &= \frac{F - 1}{2} \\
R(g_{t,d}) &= R(h_d) \cdot \mathbb{I}[\mathrm{argmax}_{t'}(g_{t',d}) = t] \\
R(e_{t,d}) &= \sum_{j=1}^{\dim(\vec{g})} \sum_{k=-F'}^{F'} R(g_{t+k,j})\, \frac{e_{t,d} \cdot v_{k,d,j}}{g'_{t+k,j} + \epsilon\mathrm{sign}(g'_{t+k,j})}
\end{aligned}$$
[Heatmaps omitted in this text version: one row per explanation method (gradL2 and graddot variants, omit1/3/7, occ1/3/7, decomp, lrp, deeplift, limsse variants) for each of the GRU, QGRU, LSTM and QLSTM task methods, over the verb contexts "few if any events in history [are ...]" and "the link provided by the editor above [gives ...]".]

Figure 4: Verb context classified plural. Green (resp. red): evidence for (resp. against) the prediction. Underlined: subject. Bold: rmax position.

Figure 5: Verb context classified singular. Green (resp. red): evidence for (resp. against) the prediction. Underlined: subject. Bold: rmax position.
[Heatmaps omitted in this text version: one row per explanation method for the GRU and QLSTM task methods, over the verb context "i like the fact that there is n't an editor making news decisions , that nearly any news story published [has ...]".]

Figure 6: Verb context classified singular by GRU and plural by QLSTM. Green (resp. red): evidence for (resp. against) the prediction. Underlined: subject. Bold: rmax position.
[Heatmaps omitted in this text version: one row per explanation method over the sci.electronics post by Joel Kolstad ("Can Radio Freq . Be Used To Measure Distance ?").]

Figure 7: sci.electronics post (not hybrid). Underlined: Manual relevance ground truth. Green (resp. red): evidence for (resp. against) sci.electronics. Task method: CNN. Italics: OOV. Bold: rmax position.
[Heatmaps omitted in this text version: one row per explanation method over the sci.med post ("CS 'gas' and allergic response- Ques .").]

Figure 8: sci.med post (not hybrid). Underlined: Manual relevance ground truth. Green (resp. red): evidence for (resp. against) sci.med. Task method: GRU. Italics: OOV. Bold: rmax position.
[Heatmaps omitted in this text version: one row per explanation method over a hybrid newsgroup post beginning "If you find faith to be honest , show me how ."]

Figure 9: Hybrid newsgroup post, classified talk.politics.mideast. Green (resp. red): evidence for (resp. against) talk.politics.mideast. Underlined: talk.politics.mideast fragment. Italics: OOV. Task method: QGRU. Bold: rmax position.
[Heatmaps omitted in this text version: one row per explanation method over a hybrid newsgroup post beginning "Fair enough . 2 . H. Rahmi , ed ."]

Figure 10: Hybrid newsgroup post, classified comp.windows.x. Green (resp. red): evidence for (resp. against) comp.windows.x. Underlined: comp.windows.x fragment. Italics: OOV. Task method: LSTM. Bold: rmax position. The telephone numbers in the last sentence appear in 3 comp.windows.x posts but nowhere else in the corpus.
[Heatmaps omitted in this text version: one row per explanation method over a hybrid newsgroup post beginning "Sorry for any confusion I may have created ."]

Figure 11: Hybrid newsgroup post, classified comp.windows.x. Green (resp. red): evidence for (resp. against) comp.windows.x. Underlined: comp.windows.x fragment. Italics: OOV. Task method: QLSTM. Bold: rmax position.
[Figure not recoverable from the text extraction: each panel repeats the same hybrid yelp review, once per explanation method (gradient and grad·dot variants, omit-1/3/7, occ-1/3/7, decomp, lrp, deeplift, and the limsse variants); the colored relevance highlighting is lost.]

Figure 12: Hybrid yelp review, classified positive. Green (resp. red): evidence for (resp. against)
positive. Underlined: positive fragments. Italics: OOV. Task method: GRU. Bold: rmax position.
[Figure not recoverable from the text extraction: each panel repeats the same hybrid yelp review, once per explanation method (gradient and grad·dot variants, omit-1/3/7, occ-1/3/7, decomp, lrp, deeplift, and the limsse variants); the colored relevance highlighting is lost.]

Figure 13: Hybrid yelp review, classified negative. Green (resp. red): evidence for (resp. against)
negative. Underlined: negative fragments. Italics: OOV. Task method: LSTM. Bold: rmax position.
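
The captions above refer to the rmax position, i.e. the token that an explanation method marks as most relevant for the predicted class, and to fragments whose source documents carry that class. As a rough illustration of how such a hybrid-document check can be computed, here is a minimal Python sketch. It is not the authors' code; fragments, predict and relevance are hypothetical stand-ins for the concatenated fragments, the task method, and any one of the explanation methods shown in the figures.

    # Hedged sketch (not the authors' implementation) of a hybrid-document
    # "pointing" check as visualized in Figures 11-13.
    from typing import Callable, List, Sequence, Tuple

    def pointing_hit(
        fragments: Sequence[Tuple[List[str], int]],          # (tokens, gold label) per fragment
        predict: Callable[[List[str]], int],                  # task method: tokens -> predicted class
        relevance: Callable[[List[str], int], List[float]],   # explanation method: one score per token
    ) -> bool:
        # Build the hybrid document and remember which fragment each token came from.
        tokens, origin = [], []
        for frag_id, (frag_tokens, _) in enumerate(fragments):
            tokens.extend(frag_tokens)
            origin.extend([frag_id] * len(frag_tokens))

        predicted_class = predict(tokens)
        scores = relevance(tokens, predicted_class)

        # rmax position: the token marked most relevant for the predicted class
        # (shown in bold in the figures).
        r_max = max(range(len(tokens)), key=lambda t: scores[t])

        # Hit if that token lies in a fragment whose gold label matches the
        # prediction (the underlined fragments in the figures).
        return fragments[origin[r_max]][1] == predicted_class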
