
Evaluating the Visualization of What a Deep Neural Network Has Learned

Wojciech Samek†, Member, IEEE, Alexander Binder†, Grégoire Montavon, Sebastian Bach, and Klaus-Robert Müller, Member, IEEE

Abstract—Deep Neural Networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multi-layer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the "importance" of individual pixels wrt the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012 and MIT Places data sets. Our main result is that the recently proposed Layer-wise Relevance Propagation (LRP) algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of neural network performance.

Index Terms—Convolutional Neural Networks, Explanation, Heatmapping, Relevance Models, Image Classification.

This work was supported by the Brain Korea 21 Plus Program through the National Research Foundation of Korea funded by the Ministry of Education, by the grant DFG (MU 987/17-1), and by the German Ministry for Education and Research as Berlin Big Data Center BBDC (01IS14013A). This publication only reflects the authors' views. Funding agencies are not liable for any use that may be made of the information contained herein. Asterisks indicate corresponding authors.
∗W. Samek is with Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: [email protected]).
∗A. Binder is with the ISTD Pillar, Singapore University of Technology and Design (SUTD), Singapore, and with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany (e-mail: alexander [email protected]).
G. Montavon is with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany (e-mail: [email protected]).
S. Bach is with Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: [email protected]).
∗K.-R. Müller is with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany, and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, Korea (e-mail: [email protected]).
†WS and AB contributed equally.

I. INTRODUCTION

Deep Neural Networks (DNNs) are powerful methods for solving large scale real world problems such as automated image classification [1], [2], [3], [4], natural language processing [5], [6], human action recognition [7], [8], or physics [9]; see also [10]. Since DNN training methodologies (unsupervised pretraining, dropout, parallelization, GPUs etc.) have been improved [11], DNNs are recently able to harvest extremely large amounts of training data and can thus achieve record performances in many research fields. At the same time, DNNs are generally conceived as black box methods, and users might consider this lack of transparency a drawback in practice. Namely, it is difficult to intuitively and quantitatively understand the result of DNN inference, i.e., for an individual novel input data point, what made the trained DNN model arrive at a particular response. Note that this aspect differs from feature selection [12], where the question is: which features are on average salient for the ensemble of training data.

Only recently, the transparency problem has been receiving more attention for general nonlinear estimators [13], [14], [15]. Several methods have been developed to understand what a DNN has learned [16], [17], [18]. While a large body of work on DNNs is dedicated to visualizing particular neurons or neuron layers [1], [19], [20], [21], [22], [23], [24], we focus here on methods which visualize the impact of particular regions of a given and fixed single image on the prediction for this image. Zeiler and Fergus [19] have proposed a network propagation technique to identify patterns in a given input image that are linked to a particular DNN prediction. This method runs a backward algorithm that reuses the weights at each layer to propagate the prediction from the output down to the input layer, leading to the creation of meaningful patterns in input space. This approach was designed for a particular type of neural network, namely convolutional nets with max-pooling and rectified linear units. A limitation of the deconvolution method is the absence of a particular theoretical criterion that would directly connect the predicted output to the produced pattern in a quantifiable way. Furthermore, the usage of image-specific information for generating the backprojections in this method is limited to max-pooling layers alone. Further previous work has focused on understanding non-linear learning methods such as DNNs or kernel methods [14], [25], [26], essentially by sensitivity analysis in the sense of scores based on partial derivatives at the given sample. Partial derivatives look at local sensitivities detached from the decision boundary of the classifier. Simonyan et al. [26] applied partial derivatives for visualizing input sensitivities in images classified by a deep neural network. Note that although [26] describes a Taylor series, it relies on partial derivatives at the given image for the computation of results. In a strict sense, partial derivatives do not explain a classifier's decision
("what speaks for the presence of a car in the image"), but rather tell us what change would make the image more or less belong to the category car. As shown later, these two types of explanations lead to very different results in practice. An approach, Layer-wise Relevance Propagation (LRP), which is applicable to arbitrary types of neural unit activities (even if they are non-continuous) and to general DNN architectures, has been proposed by Bach et al. [27]. This work aims at explaining the difference of a prediction f(x) relative to the neutral state f(x) = 0. The LRP method relies on a conservation principle to propagate the prediction back without using gradients. This principle ensures that the network output activity is fully redistributed through the layers of a DNN onto the input variables, i.e., neither positive nor negative evidence is lost.

Fig. 1. Comparison of three exemplary heatmaps for the image of a '3'. Left: The randomly generated heatmap lacks interpretable information. Middle: The segmentation heatmap focuses on the whole digit without indicating what parts of the image were particularly relevant for classification. Since it does not suffice to consider only the highlighted pixels for distinguishing an image of a '3' from images of an '8' or a '9', this heatmap is not useful for explaining classification decisions. Right: A relevance heatmap indicates which parts of the image are used by the classifier. Here the heatmap reflects human intuition very well because the horizontal bar together with the missing stroke on the left are strong evidence that the image depicts a '3' and not any other digit.
In the following we will denote the visualizations produced by the above methods as heatmaps. While per se a heatmap is an interesting and intuitive tool that can already help achieve transparency, it is difficult to quantitatively evaluate the quality of a heatmap. In other words we may ask: what exactly makes a "good" heatmap? A human may be able to intuitively assess the quality of a heatmap, e.g., by matching with a prior of what is regarded as being relevant (see Figure 1). For practical applications, however, an automated, objective and quantitative measure for assessing heatmap quality becomes necessary. Note that the validation of heatmap quality is important if we want to use it as input for further analysis. For example, we could run computationally more expensive algorithms only on relevant regions in the image, where relevance is detected by a heatmap.

In this paper we contribute by
• pointing to the issue of how to objectively evaluate the quality of heatmaps. To the best of our knowledge this question has not been raised so far.
• introducing a generic framework for evaluating heatmaps which extends the approach in [27] from binary inputs to color images.
• comparing three different heatmap computation methods on three large data sets and noting that the relevance-based LRP algorithm [27] is more suitable for explaining the classification decisions of DNNs than the sensitivity-based approach [26] and the deconvolution method [19].
• investigating the use of heatmaps for assessment of neural network performance.

The next section briefly introduces three existing methods for computing heatmaps. Section III discusses the heatmap evaluation problem and presents a generic framework for this task. Two experimental results are presented in Section IV: the first experiment compares different heatmapping algorithms on the SUN397 [28], ILSVRC2012 [29] and MIT Places [30] data sets, and the second experiment investigates the correlation between heatmap quality and neural network performance on the CIFAR-10 data set [31]. We conclude the paper in Section V and give an outlook.

II. UNDERSTANDING DNN PREDICTION

In the following we focus on images, but the presented techniques are applicable to any type of input domain whose elements can be processed by a neural network.

Let us consider an image x ∈ R^d, decomposable as a set of pixel values x = {x_p} where p denotes a particular pixel, and a classification function f : R^d → R_+. The function value f(x) can be interpreted as a score indicating the certainty of the presence of a certain type of object(s) in the image. Such functions can be learned very well by a deep neural network. Throughout the paper we assume neural networks to consist of multiple layers of neurons, where neurons are activated as

    a_j^{(l+1)} = \sigma\Big( \sum_i z_{ij} + b_j^{(l+1)} \Big)    (1)

    with  z_{ij} = a_i^{(l)} w_{ij}^{(l,l+1)}    (2)

The sum runs over all lower-layer neurons that are connected to neuron j, where a_i^{(l)} is the activation of a neuron i in the previous layer, and where z_{ij} is the contribution of neuron i at layer l to the activation of the neuron j at layer l+1. The function σ is a nonlinear, monotonically increasing activation function, w_{ij}^{(l,l+1)} is the weight and b_j^{(l+1)} is the bias term.
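For concreteness, one layer of Eqs. (1)-(2) amounts to the following computation. This is a minimal NumPy sketch (not taken from the paper; shapes and names are illustrative, and a ReLU stands in for the generic activation σ):

    import numpy as np

    def forward_layer(a_prev, W, b):
        """One layer of Eqs. (1)-(2). a_prev: (m,), W: (m, n), b: (n,)."""
        z = a_prev[:, None] * W                     # contributions z_ij, shape (m, n)
        return np.maximum(0.0, z.sum(axis=0) + b)   # ReLU as the activation sigma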
A heatmap h = {h_p} assigns each pixel p a value h_p = H(x, f, p) according to some function H, typically derived from a class discriminant f. Since h has the same dimensionality as x, it can be visualized as an image. In the following we review three recent methods for computing heatmaps, all of them performing a backward propagation pass on the network: (1) a sensitivity analysis based on neural network partial derivatives, (2) the so-called deconvolution method and (3) the layer-wise relevance propagation algorithm. Figure 2 briefly summarizes the methods.
[Figure 2: schematic overview, "Heatmapping Methods for DNNs" (input, classification, explanation as heatmap). Summary of the figure's comparison table:
Sensitivity Analysis (Simonyan et al., 2014). Propagation: chain rule for computing derivatives. Heatmap: local sensitivity (what makes a bird more/less a bird). Applicable to: any network with continuous, locally differentiable neurons. Drawback: the heatmap does not fully explain the image classification. Functions view: local.
Deconvolution Method (Zeiler and Fergus, 2014). Propagation: backward mapping function. Heatmap: matching input pattern for the classified object in the image. Relation to the classification output: not specified. Applicable to: convolutional networks with max-pooling and rectified linear units. Drawbacks: (i) no direct correspondence between heatmap scores and the contribution of pixels to the classification; (ii) image-specific information only from max-pooling layers. Functions view: global.
LRP Algorithm (Bach et al., 2015). Propagation: backward mapping function with conservation principles. Heatmap: explanatory input pattern that indicates evidence for and against a bird. Applicable to: any network with monotonous activations (even non-continuous units). Functions view: global.]

Fig. 2. Comparison of the three heatmap computation methods used in this paper. Left: Sensitivity heatmaps are based on partial derivatives, i.e., measure
which pixels, when changed, would make the image belong less or more to a category (local explanations). The method is applicable to generic architectures
with differentiable units. Middle: The deconvolution method applies a convolutional network g to the output of another convolutional network f . Network g is
constructed in a way to “undo” the operations performed by f . Since negative evidence is discarded and scores are not normalized during the backpropagation,
the relation between heatmap scores and the classification output f (x) is unclear. Right: Layer-wise Relevance Propagation (LRP) exactly decomposes the
classification output f (x) into pixel relevances by observing the layer-wise conservation principle, i.e., evidence for or against a category is not lost. The
algorithm does not use gradients and is therefore applicable to generic architectures (including nets with non-continuous units). LRP globally explains the
classification decision and heatmap scores have a clear interpretation as evidence for or against a category.

A. Sensitivity Heatmaps

A well-known tool for interpreting non-linear classifiers is sensitivity analysis [14]. It was used by Simonyan et al. [26] to compute saliency maps of images classified by neural networks. In this approach the sensitivity of a pixel h_p is computed by using the norm ‖·‖_{ℓ_q} over partial derivatives ([26] used q = ∞) for the color channel c of a pixel p:

    h_p = \Big\| \Big( \frac{\partial}{\partial x_{p,c}} f(x) \Big)_{c \in (r,g,b)} \Big\|_{\ell_q}    (3)

This quantity measures how much small changes in the pixel value locally affect the network output. Large values of h_p denote pixels which largely affect the classification function f
if changed. Note that the direction of change (i.e., the sign of the partial derivative) is lost when using the norm. Partial derivatives are obtained efficiently by running the backpropagation algorithm [32] throughout the multiple layers of the network. The backpropagation rule from one layer to another layer, where x^{(l)} and x^{(l+1)} denote the neuron activities at two consecutive layers, is given by:

    \frac{\partial f}{\partial x^{(l)}} = \frac{\partial x^{(l+1)}}{\partial x^{(l)}} \, \frac{\partial f}{\partial x^{(l+1)}}    (4)

The backpropagation algorithm performs the following operations in the various layers:

Unpooling: The gradient signal is redirected onto the input neuron(s) to which the corresponding output neuron is sensitive. In the case of max-pooling, the input neuron in question is the one with maximum activation value.

Nonlinearity: Denoting by z_i^{(l)} the preactivation of the ith neuron of the lth layer, backpropagating the signal through a rectified linear unit (ReLU) defined by the map z_i^{(l)} → max(0, z_i^{(l)}) corresponds to multiplying the backpropagated gradient signal by the indicator function 1_{z_i^{(l)} > 0}.

Filtering: The gradient signal is convolved by a transposed version of the convolutional filter used in the forward pass.

Of particular interest, the multiplication of the signal by an indicator function in the rectification layer makes the backward mapping discontinuous, and consequently strongly local. Thus, the gradient on which the heatmap is based is expected to be mostly composed of local features (e.g., what makes a given car look more/less like a car) and few global features (e.g., what are all the features that compose a given car). Note that the gradient gives for every pixel a direction in RGB-space in which the prediction increases or decreases, but it does not directly indicate whether a particular region contains evidence for or against the prediction made by the classifier. We compute heatmaps by using Eq. 3 with the norms q ∈ {2, ∞}.
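As an illustration of Eqs. (3)-(4), the following NumPy sketch computes a sensitivity heatmap for a small two-layer ReLU network by manual backpropagation. The network shape, the channel-last pixel layout and all variable names are illustrative assumptions, not the authors' implementation:

    import numpy as np

    def sensitivity_heatmap(x, W1, b1, W2, b2, q=2):
        """x: flattened image of shape (d,) with d = 3 * n_pixels (channel-last);
        f(x) = W2 @ relu(W1 @ x + b1) + b2 is a scalar class score."""
        z1 = W1 @ x + b1                 # hidden pre-activations (forward pass)
        # Backward pass (Eq. 4): df/dx through the two layers.
        grad_z1 = W2 * (z1 > 0)          # ReLU rule: multiply by indicator 1_{z>0}
        grad_x = W1.T @ grad_z1          # transposed filtering step
        # Pool the three color channels of each pixel with the l_q norm (Eq. 3).
        g = np.abs(grad_x.reshape(-1, 3))
        if np.isinf(q):
            return g.max(axis=1)         # q = infinity, as used in [26]
        return (g ** q).sum(axis=1) ** (1.0 / q)

Calling the function with q=np.inf reproduces the ∞-norm variant, with q=2 the 2-norm variant used in our experiments.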
B. Deconvolution Heatmaps

Another method for heatmap computation was proposed in [19] and uses a process termed deconvolution. Similarly to the backpropagation method used to compute the function's gradient, the idea of the deconvolution approach is to map the activations from the network's output back to pixel space using the backpropagation rule

    R^{(l)} = m_{\mathrm{dec}}(R^{(l+1)}; \theta^{(l,l+1)}).

Here, R^{(l)}, R^{(l+1)} denote the backward signal as it is backpropagated from one layer to the previous layer, m_dec is a predefined function that may be different for each layer, and θ^{(l,l+1)} is the set of parameters connecting two layers of neurons. This method was designed for a convolutional net with max-pooling and rectified linear units, but it could in principle also be adapted for other types of architectures. The following set of rules is applied to compute deconvolution heatmaps.

Unpooling: The locations of the maxima within each pooling region are recorded and these recordings are used to place the relevance signal from the layer above into the appropriate locations. For deconvolution this seems to be the only place, besides the classifier output, where image information from the forward pass is used in order to arrive at an image-specific explanation.

Nonlinearity: The relevance signal at a ReLU layer is passed through a ReLU function during the deconvolution process.

Filtering: In a convolution layer, the transposed versions of the trained filters are used to backpropagate the relevance signal. This projection does not depend on the neuron activations x^{(l)}.

The unpooling and filtering rules are the same as those derived from gradient propagation (i.e., those used in Section II-A). The propagation rule for the ReLU nonlinearity differs from backpropagation: here, the backpropagated signal is not multiplied by a discontinuous indicator function, but is instead passed through a rectification function similar to the one used in the forward pass. Note that unlike the indicator function, the rectification function is continuous. This continuity in the backward mapping procedure enables the capture of more global features that can in principle be useful to represent evidence for the whole object to be predicted. Note also that the deconvolution method only implicitly takes into account properties of individual images through the unpooling operation. The backprojection over filtering layers is independent of the individual image. Thus, when applied to neural networks without a pooling layer, the deconvolution method will not provide individual (image specific) explanations, but rather average salient features (see Figure 3). Note also that negative evidence (R^{(l+1)} < 0) is discarded during the backpropagation due to the application of the ReLU function. Furthermore, the backward signal is not normalized layer-wise, so that a few dominant R^{(l)} may largely determine the final heatmap scores. Due to the suppression of negative evidence and the lack of normalization, the relation between the heatmap scores and the classification output cannot be expressed analytically but is implicit to the above algorithmic procedure.

For deconvolution we apply the same color channel pooling methods (2-norm, ∞-norm) as for sensitivity analysis.
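The difference between the gradient and deconvolution backward rules at a ReLU layer, described above, can be stated in two lines. A schematic sketch, where R denotes the backward signal and z the forward pre-activations (illustrative, not the authors' code):

    import numpy as np

    def relu_backward_gradient(R, z):
        # Sensitivity analysis: multiply by the discontinuous indicator 1_{z > 0};
        # the result depends on the forward activations of this particular image.
        return R * (z > 0)

    def relu_backward_deconv(R, z):
        # Deconvolution: rectify the backward signal itself; negative evidence
        # (R < 0) is discarded, and z (image-specific information) is unused.
        return np.maximum(0.0, R)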
C. Relevance Heatmaps

Layer-wise Relevance Propagation (LRP) [27] is a principled approach to decompose a classification decision into pixel-wise relevances indicating the contributions of a pixel to the overall classification score. The approach is derived from a layer-wise conservation principle [27], which forces the propagated quantity (e.g., evidence for a predicted class) to be preserved between neurons of two adjacent layers. Denoting by R_i^{(l)} the relevance associated to the ith neuron of layer l and by R_j^{(l+1)} the relevance associated to the jth neuron in the next layer, the conservation principle requires that

    \sum_i R_i^{(l)} = \sum_j R_j^{(l+1)}    (5)

where the sums run over all neurons of the respective layers. Applying this rule repeatedly for all layers, the heatmap resulting from LRP satisfies \sum_p h_p = f(x) where h_p = R_p^{(1)}, and is said to be consistent with the evidence for the predicted
class. Stricter definitions of conservation that involve only subsets of neurons can further impose that relevance is locally redistributed in the lower layers. The propagation rules for each type of layer are given below:

Unpooling: Like for the previous approaches, the backward signal is redirected proportionally onto the location for which the activation was recorded in the forward pass.

Nonlinearity: The backward signal is simply propagated onto the lower layer, ignoring the rectification operation. Note that this propagation rule satisfies Equation 5.

Filtering: Bach et al. [27] proposed two relevance propagation rules for this layer that satisfy Equation 5. Let z_{ij} = a_i^{(l)} w_{ij}^{(l,l+1)} be the weighted activation of neuron i onto neuron j in the next layer. The first rule is given by:

    R_i^{(l)} = \sum_j \frac{z_{ij}}{\sum_{i'} z_{i'j} + \epsilon \, \mathrm{sign}\big(\sum_{i'} z_{i'j}\big)} \, R_j^{(l+1)}    (6)

The intuition behind this rule is that lower-layer neurons that mostly contribute to the activation of the higher-layer neuron receive a larger share of the relevance R_j of the neuron j. The neuron i then collects the relevance associated to its contribution from all upper-layer neurons j. A downside of this propagation rule (at least if ε = 0) is that the denominator may tend to zero if lower-level contributions to neuron j cancel each other out. The numerical instability can be overcome by setting ε > 0. However, in that case the conservation idea is relaxed in order to gain better numerical properties. A way to achieve exact conservation is to separate the positive and negative activations in the relevance propagation formula, which yields the second rule:

    R_i^{(l)} = \sum_j \Big( \alpha \cdot \frac{z_{ij}^+}{\sum_{i'} z_{i'j}^+} + \beta \cdot \frac{z_{ij}^-}{\sum_{i'} z_{i'j}^-} \Big) R_j^{(l+1)}.    (7)

Here, z_{ij}^+ and z_{ij}^- denote the positive and negative part of z_{ij} respectively, such that z_{ij}^+ + z_{ij}^- = z_{ij}. We enforce α + β = 1 in order for the relevance propagation equations to be conservative layer-wise. It should be emphasized that unlike gradient-based techniques, the LRP formula is applicable to non-differentiable neuron activation functions. In the experiments section we use for consistency the same settings as in [27] without having optimized the parameters, namely the LRP variant from Equation (7) with α = 2 and β = −1 (denoted as LRP in subsequent figures), and twice LRP from Equation (6), with ε = 0.01 and ε = 100.
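A minimal sketch of the two filtering-layer rules (Eqs. 6-7) for one dense layer, assuming a holds the lower-layer activations, W the weights, and R_out the upper-layer relevances (illustrative NumPy, not the authors' implementation; the tiny stabilizers in the α,β rule are an added numerical safeguard):

    import numpy as np

    def lrp_epsilon(a, W, R_out, eps=0.01):
        """Eq. (6): a has shape (m,), W shape (m, n), R_out shape (n,)."""
        Z = a[:, None] * W                      # weighted activations z_ij
        denom = Z.sum(axis=0)                   # sum_i' z_i'j for each upper neuron j
        denom = denom + eps * np.sign(denom)    # epsilon-stabilized denominator
        return (Z / denom * R_out).sum(axis=1)  # redistribute R_j, collect over j

    def lrp_alphabeta(a, W, R_out, alpha=2.0, beta=-1.0):
        """Eq. (7) with alpha + beta = 1; conserves relevance exactly."""
        Z = a[:, None] * W
        Zp, Zn = np.maximum(Z, 0.0), np.minimum(Z, 0.0)
        frac = alpha * Zp / (Zp.sum(axis=0) + 1e-12) \
             + beta * Zn / (Zn.sum(axis=0) - 1e-12)
        return (frac * R_out).sum(axis=1)

Up to the stabilizers, the returned relevances sum to R_out.sum(), i.e., Equation 5 holds layer-wise.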
Same as for the deconvolution heatmap, the LRP algorithm does not multiply its backward signal by a discontinuous function. Therefore, relevance heatmaps also favor the emergence of global features that allow for a full explanation of the class to be predicted. In addition, a heatmap produced by LRP has the following technical advantages over sensitivity and deconvolution: (1) Localized relevance conservation ensures a proper global redistribution of relevance in the pixel space. (2) By summing relevance on each color channel, heatmaps can be directly interpreted as a measure of total relevance per pixel, without having to compute a norm. This allows for negative evidence (i.e., parts of the image that speak against the neural network classification decision). (3) Finally, LRP's rule for filtering layers takes into account both the filter weights and the lower-layer neuron activations. This allows for individual explanations even in a neural network without pooling layers.

To demonstrate these advantages on a simple example, we compare the explanations provided by the deconvolution method and LRP for a neural network without pooling layers trained on the MNIST data set (see [27] for details). One can see in Figure 3 that LRP provides individual explanations for all images in the sense that when the digit in the image is slightly rotated, then the heatmap adapts to this rotation and highlights the relevant regions of this particular rotated digit. The deconvolution heatmap, on the other hand, is not image-specific because it only depends on the weights and not on the neuron activations. If pooling layers were present in the network, then the deconvolution approach would implicitly adapt to the specific image through the unpooling operation. Still, we consider this information important to be included when backprojecting over filtering layers, because neurons with large activations for a specific image should be regarded as more relevant, and should thus backproject a larger share of the relevance. Apart from this drawback, one can see in Figure 3 that LRP responses can be well interpreted as positive evidence (red color) and negative evidence (blue color) of a classification decision. In particular, when backpropagating the (artificial) classification decision that the image has been classified as '9', LRP provides a very intuitive explanation, namely that in the left upper part of the image the missing stroke closing the loop (blue color) speaks against the fact that this is a '9', whereas the missing stroke in the left lower part of the image (red color) supports this decision. The deconvolution method does not allow such an interpretation, because it loses negative evidence while backprojecting over ReLU layers and does not use image specific information.

III. EVALUATING HEATMAPS

A. What makes a good heatmap?

Although humans are able to intuitively assess the quality of a heatmap by matching with prior knowledge and experience of what is regarded as being relevant, defining objective criteria for heatmap quality is very difficult. In this paper we refrain from mimicking the complex human heatmap evaluation process, which includes attention, interest point models and perception models of saliency [33], [34], [35], [36], because we are interested in the relevance of the heatmap for the classifier decision. We use the classifier's output and a perturbation method to objectively assess the quality (see Section III-C). When comparing different heatmapping algorithms one should be aware that heatmap quality does not only depend on the algorithm used to compute the heatmap, but also on the performance of the classifier, whose efficiency largely depends on the model used and the amount and quality of available training data. A random classifier will provide random heatmaps. Also, if the training data does not contain images of the digit '3', then the classifier cannot know that the absence of strokes in the left part of the image (see example

Fig. 3. Comparison of LRP and deconvolution heatmaps for a neural network without pooling layers. The heatmaps in the left panel explain why the image
was classified as ‘3’, whereas heatmaps on the right side explain the classification decision ‘9’. The LRP heatmaps (second row) visualize both positive and
negative evidence and are tailored to each individual image. The deconvolution method does not use neural activations, thus fails to provide image specific
heatmaps for this network (last row). The heatmaps also do not allow to distinguish between positive and negative evidence. Since negative evidence is
discarded when backpropagating through the ReLU layer, deconvolution heatmaps are not able to explain what speaks against the classification decision ‘9’.

in Figure 1) is important for distinguishing the digit '3' from the digits '8' and '9'. Thus, explanations can only be as good as the data provided to the classifier.

Furthermore, one should keep in mind that a heatmap always represents the classifier's view, i.e., explanations neither need to match human intuition nor focus on the object of interest. A heatmap is not a segmentation mask (see Figure 1); on the contrary, missing evidence or the context may be very important for classification. Also, image statistics may be highly discriminative, i.e., evidence for a class does not need to be localized. From our experience, heatmaps become more intuitive with higher classification accuracy (see Section IV-D), but there is no guarantee that human and classifier explanations match. Regarding general quality criteria, we believe that heatmaps should have low "complexity", i.e., be as sparse and non-random as possible. Only the relevant parts in the images should be highlighted and not more. We use complexity, measured in terms of image entropy or the file size of the compressed heatmap image, as an auxiliary criterion to assess heatmap quality.

B. Salient Features vs. Individual Explanations

Salient features represent average explanations of what distinguishes one image category from another. For individual images these explanations may be meaningless or even wrong. For instance, salient features for the class 'bicycle' may be the wheels and the handlebar. However, in some images a bicycle may be partly occluded so that these parts of a bike are not visible. In these images salient features fail to explain the classifier's decision (which still may be correct). Individual explanations, on the other hand, do not target the "average case", but focus on the particular image and may identify other parts of the bike or the context (e.g., presence of a cyclist) as being good explanations of the classifier's decision.

C. Heatmap Evaluation Framework

To evaluate the quality of a heatmap we consider a greedy iterative procedure that consists of measuring how the class encoded in the image (e.g., as measured by the function f) disappears when we progressively remove information from the image x, a process referred to as region perturbation, at the specified locations. The method is a generalization of the approach presented in [27], where the perturbation process is a state flip of the associated binary pixel values (single pixel perturbation). The method that we propose here applies more generally to any set of locations (e.g., local windows) and any local perturbation process such as local randomization or blurring.

We define a heatmap as an ordered set of locations in the image, where these locations might lie on a predefined grid:

    O = (r_1, r_2, \dots, r_L)    (8)

Each location r_p is for example a two-dimensional vector encoding the horizontal and vertical position on a grid of pixels. The ordering can either be chosen by hand, or be induced by a heatmapping function h_p = H(x, f, r_p), typically derived from a class discriminant f (see methods in Section II). The scores {h_p} indicate how important the given location r_p of the image is for representing the image class. The ordering induced by the heatmapping function is such that for all indices of the ordered sequence O, the following property holds:

    (i < j) \Leftrightarrow H(x, f, r_i) > H(x, f, r_j)    (9)

Thus, locations in the image that are most relevant for the class encoded by the classifier function f will be found at the beginning of the sequence O. Conversely, regions of the image that are mostly irrelevant will be positioned at the end of the sequence.

We consider a region perturbation process that follows the ordered sequence of locations. We call this process most relevant first, abbreviated as MoRF. The recursive formula is:

    x_{\mathrm{MoRF}}^{(0)} = x    (10)
    \forall\, 1 \le k \le L:\; x_{\mathrm{MoRF}}^{(k)} = g(x_{\mathrm{MoRF}}^{(k-1)}, r_k)

where the function g removes information of the image x_{\mathrm{MoRF}}^{(k-1)} at a specified location r_k (i.e., a single pixel or a local
neighborhood) in the image. Throughout the paper we use a function g which replaces all pixels in a 9×9 neighborhood around r_k by values sampled randomly from a uniform distribution. When comparing different heatmaps using a fixed g(x, r_k), our focus is typically only on the highly relevant regions (i.e., the sorting of the h_p values on the non-relevant regions is not important). The quantity of interest in this case is the area over the MoRF perturbation curve (AOPC):

    \mathrm{AOPC} = \frac{1}{L+1} \Big\langle \sum_{k=0}^{L} f(x_{\mathrm{MoRF}}^{(0)}) - f(x_{\mathrm{MoRF}}^{(k)}) \Big\rangle_{p(x)}    (11)

where ⟨·⟩_{p(x)} denotes the average over all images in the data set. An ordering of regions such that the most sensitive regions are ranked first implies a steep decrease of the MoRF curve, and thus a larger AOPC.
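Putting Eqs. (10)-(11) together, a per-image MoRF/AOPC computation might look as follows. This is a sketch under illustrative assumptions: predict maps a (flattened) image to the class score f(x), regions is a list of index arrays covering the entries of each 9×9 window, and a region's relevance is taken to be the sum of its pixel scores:

    import numpy as np

    def aopc_single_image(x, h, regions, predict, rng, L=100):
        """MoRF perturbation (Eq. 10) and the per-image term of AOPC (Eq. 11)."""
        # Order regions by total heatmap relevance, most relevant first (Eq. 9).
        order = np.argsort([-h[r].sum() for r in regions])
        x_pert = x.copy()
        f0 = predict(x)                # f(x_MoRF^(0))
        drops = [0.0]                  # the k = 0 term of the sum is zero
        for k in range(min(L, len(regions))):
            idx = regions[order[k]]
            x_pert[idx] = rng.uniform(0.0, 1.0, size=idx.shape)  # uniform noise
            drops.append(f0 - predict(x_pert))
        return np.mean(drops)          # averaging over images then gives AOPC

Here rng = np.random.default_rng(); repeating the perturbation several times and averaging, as done in Section IV-A, reduces the sampling noise.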
IV. EXPERIMENTAL RESULTS

In this section we use the proposed heatmap evaluation procedure to compare heatmaps computed with the LRP algorithm [27], the deconvolution approach [19] and the sensitivity-based method [26] (Section IV-B) to a random order baseline. Exemplary heatmaps produced with these algorithms are displayed and discussed in Section IV-C. At the end of this section we briefly investigate the correlation between heatmap quality and network performance.

A. Setup

We demonstrate the results on a classifier for the MIT Places data set [30], provided by the authors of this data set, and on the Caffe reference model [37] for ImageNet. We kept the classifiers unchanged. The MIT Places classifier is used for two testing data sets. Firstly, we compute the AOPC values over 5040 images from the MIT Places testing set. Secondly, we use AOPC averages over 5040 images from the SUN397 data set [28], as was done in [38]. We ensured that the category labels of the images used were included in the MIT Places label set. Furthermore, for the ImageNet classifier we report results on the first 5040 images of the ILSVRC2012 data set. The heatmaps are computed for all methods for the predicted label, so that our perturbation analysis is a fully unsupervised method during the test stage. Perturbation is applied to 9×9 non-overlapping regions, each covering 0.157% of the image. We replace all pixels in a region by values sampled randomly from a uniform distribution. The choice of a uniform distribution as region perturbation follows one assumption: we consider a region highly relevant if replacing the information in this region in arbitrary ways reduces the prediction score of the classifier; we do not want to restrict the analysis to highly specialized information removal schemes. In order to reduce the effect of randomness we repeat the process 10 times. For each ordering we perturb the first 100 regions, resulting in 15.7% of the image being exchanged. Running the experiments for 2 configurations of perturbations, each with 5040 images, takes roughly 36 hours on a workstation with 20 (10×2) Xeon HT-Cores. Given the above running time and the large number of configurations reported here, we considered the choice of 5040 images as sample size a good compromise between the representativity of our results and computing time.

B. Quantitative Comparison of Heatmapping Methods

We quantitatively compare the quality of heatmaps generated by the three algorithms described in Section II. As a baseline we also compute the AOPC curves for random heatmaps (i.e., random ordering O). Figure 4 displays the AOPC values as a function of the perturbation steps (i.e., L) relative to the random baseline.

From the figure one can see that heatmaps computed by LRP have the largest AOPC values, i.e., they better identify the relevant (wrt the classification task) pixels in the image than heatmaps produced with sensitivity analysis or the deconvolution approach. This holds for all three data sets. The ε-LRP formula (see Eq. 6) performs slightly better than α,β-LRP (see Eq. 7); however, we expect both LRP variants to have similar performance when optimizing for the parameters (here we use the same settings as in [27]). The deconvolution method performs as the closest competitor and significantly outperforms the random baseline. Since LRP distinguishes between positive and negative evidence and normalizes the scores properly, it provides less noisy heatmaps than the deconvolution approach (see Section IV-C), which results in better quantitative performance. As stated above, sensitivity analysis targets a slightly different problem and thus provides quantitatively and qualitatively suboptimal explanations of the classifier's decision. Sensitivity provides local explanations, but may fail to capture the global features of a particular class. In this context see also the works of Szegedy [21], Goodfellow [22] and Nguyen [39], in which changing an image as a whole by a minor perturbation leads to a flip in the class labels, and in which rainbow-colored noise images are constructed with high classification accuracy.

The heatmaps computed on the ILSVRC2012 data set are qualitatively better (according to our AOPC measure) than heatmaps computed on the other two data sets. One reason for this is that the ILSVRC2012 images contain more objects and less cluttered scenes than images from the SUN397 and MIT Places data sets, i.e., it is easier (also for humans) to capture the relevant parts of the image. Also, the AOPC difference between the random baseline and the other heatmapping methods is much smaller for the latter two data sets than for ILSVRC2012, because cluttered scenes contain evidence almost everywhere in the image, whereas the background is less important for object categories.

An interesting phenomenon is the performance difference of sensitivity heatmaps computed on the SUN397 and MIT Places data sets: in the former case the AOPC curve of sensitivity heatmaps is even below the curve computed with a random ranking of regions, whereas for the latter data set the sensitivity heatmaps are (at least initially) clearly better. Note that in both cases the same classifier [30], trained on the MIT Places data, was used. The difference between these data sets is that SUN397 images lie outside the data manifold (i.e., the images of MIT Places used to train the classifier), so that partial derivatives need to explain local variations of the classification function f(x) in an area of image space where f has not been trained properly. This effect is not so strong for the MIT Places test data, because they are much closer to the images used to
[Figure 4 plots, for SUN397, ILSVRC2012 and MIT Places, the AOPC relative to the random baseline as a function of the perturbation steps (0 to 100), for the LRP variants, the deconvolution method and sensitivity analysis.]

Fig. 4. Comparison of the three heatmapping methods relative to the random baseline. The LRP algorithms have largest AOPC values, i.e., best explain the
classifier’s decision, for all three data sets.

train the classifier. Since both LRP and deconvolution provide global explanations, they are less affected by this off-manifold testing.

We performed the above evaluation also for both Caffe networks in the training phase, in which the dropout layers were active. The results are qualitatively the same as the ones shown above. The LRP algorithm, which was explicitly designed to explain the classifier's decision, performs significantly better than the other heatmapping approaches. We would like to stress that LRP does not artificially benefit from the way we evaluate heatmaps, as region perturbation is based on an assumption (good heatmaps should rank pixels according to relevance wrt classification) which is independent of the relevance conservation principle that is used in LRP. Note that LRP was originally designed for binary classifiers in which f(x) = 0 denotes maximal uncertainty about the prediction. The classifiers used here were trained with a different multiclass objective, namely that it suffices for the correct class to have the highest score. One can expect that in such a setup the state of maximal uncertainty is given by a positive value rather than f(x) = 0. In that sense the setup here slightly disfavours LRP. However, we refrained from retraining because it was important for us, firstly, to use classifiers provided by other researchers in an unmodified manner, and, secondly, to evaluate the robustness of LRP when applied in the popular multi-class setup.

As stated before, heatmaps can also be evaluated wrt their complexity (i.e., sparsity and randomness of the explanations). Good heatmaps highlight the relevant regions and not more, whereas suboptimal heatmaps may contain lots of irrelevant information and noise. In this sense good heatmaps should be better compressible than noisy ones. Table I compares the average file size of heatmaps (saved as png and jpeg (quality 90) images) computed with the three methods. The file sizes reflect the performance reported in Figure 4, i.e., LRP has the best performance on all three data sets and its heatmaps have the smallest file size (which means that they are well compressible, i.e., have low complexity). The second best method is the deconvolution algorithm, and the sensitivity approach performs worst wrt both measures. These differences in file size are highly significant.

TABLE I
COMPARISON OF AVERAGE FILE SIZE OF THE HEATMAP IMAGES IN KB. LRP HEATMAPS HAVE SMALLEST SIZE, I.E., LOWEST COMPLEXITY.

Data set (png)     Sensitivity ℓ2   Deconvolution ℓ2   LRP
SUN397             183              166                154
ILSVRC2012         177              164                154
MIT Places         183              167                155

Data set (jpeg)    Sensitivity ℓ2   Deconvolution ℓ2   LRP
SUN397             25               20                 14
ILSVRC2012         22               18                 13
MIT Places         25               20                 14

The scatter plots in Figure 5 show that for almost all images of the three data sets the LRP heatmap png files are smaller (i.e., less complex) than the corresponding deconvolution and sensitivity files (the same holds for the jpeg files). Additionally, we report results obtained using another complexity measure, namely MATLAB's entropy function. Also according to this measure, LRP heatmaps are less complex (see boxplots in Figure 5) than heatmaps computed with the sensitivity and deconvolution methods.
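The file-size complexity measure is straightforward to reproduce. A sketch (assuming Pillow is available; the rescaling to an 8-bit grayscale image is an illustrative choice, not necessarily the exact encoding used in our experiments):

    import io
    import numpy as np
    from PIL import Image

    def png_complexity(h):
        """Bytes of the PNG-compressed heatmap: a proxy for its complexity."""
        h8 = np.uint8(255 * (h - h.min()) / (np.ptp(h) + 1e-12))  # scale to [0, 255]
        buf = io.BytesIO()
        Image.fromarray(h8).save(buf, format="PNG")
        return buf.tell()  # smaller file = better compressible = less complex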
C. Qualitative Comparison of Heatmapping Methods

In Figure 6 the heatmaps of the first 8 images of each data set are visualized. The quantitative results presented above are in line with the subjective impressions. The sensitivity and deconvolution heatmaps are noisier and less sparse than the heatmaps computed with the LRP algorithm, reflecting the results obtained in Section IV-B. For SUN397 and MIT Places the sensitivity heatmaps are close to random, whereas both LRP and deconvolution highlight some structural elements in the scene. We remark that this bad performance of sensitivity heatmaps does not contradict results like [21], [22]. In the former works, an image gets modified as a whole, while in this work we are considering the quality of selecting local regions and ordering them. Furthermore, gradients require movement in a very particular direction for reducing the prediction, while we are looking for the most relevant regions in the sense that changing them in any way will likely destroy the prediction. The deconvolution and LRP algorithms capture more global (and more relevant) features than the sensitivity approach.
[Figure 5, top: scatter plots of LRP heatmap file size vs. sensitivity and deconvolution heatmap file size (in KB, roughly 100-210) for SUN397, ILSVRC2012 and MIT Places. Bottom: boxplots of heatmap entropy for sensitivity, deconvolution and LRP on each data set.]

Fig. 5. Comparison of heatmap complexity, measured in terms of file size (top) and image entropy (bottom).

D. Heatmap Quality and Neural Network Performance

In the last experiment we briefly show that the quality of a heatmap, as measured by AOPC, provides information about the overall DNN performance. The intuitive explanation for this is that well-trained DNNs capture the relevant structures in an image much better, and thus produce more meaningful heatmaps, than poorly trained networks which rather rely on global image statistics. Thus, by evaluating the quality of a heatmap using the proposed procedure we can potentially assess the network performance, at least for classifiers that were based on the same network topology. Note that this procedure is based on perturbation of the input of the classifier with the highest predicted score. Thus this evaluation method is purely unsupervised and does not require labels of the testing images. Figure 7 depicts the AOPC values and the performance for different training iterations of a DNN for the CIFAR-10 data set [31]. We did not perform these experiments on a larger data set since the effect can still be observed nicely at this modest data size. The correlation between both curves indicates that heatmaps contain information which can potentially be used to judge the quality of the network. This paper did not intend to profoundly investigate the relation between network performance and heatmap quality; this is a topic for future research.

Fig. 7. Evaluation of network performance by using AOPC on the CIFAR-10 data set. [The figure plots AOPC (0-50) and classification performance (45%-75%) against the number of training iterations (200-5000).]

V. CONCLUSION

Research in DNNs has traditionally focused on improving the quality, algorithmics or the speed of a neural

Fig. 6. Qualitative comparison of the three heatmapping methods.


network model. We have studied an orthogonal research direction in our manuscript, namely, we have contributed to furthering the understanding and transparency of the decision making implemented by a trained DNN: for this we have focused on the heatmap concept that, e.g., in a computer vision application, is able to attribute the contribution of individual pixels to the DNN inference result for a novel data sample. While heatmaps allow a better intuition about what has been learned by the network, we tackled the so far open problem of quantifying the quality of a heatmap. In this manner, different heatmap algorithms can be compared quantitatively and their properties and limits can be related. We proposed a region perturbation strategy that is based on the idea that flipping the most salient pixels first should lead to high performance decay. A large AOPC value as a function of perturbation steps was shown to provide a good measure for a very informative heatmap. We also showed quantitatively and qualitatively that sensitivity maps and heatmaps computed with the deconvolution algorithm are much noisier than heatmaps computed with the LRP method, and are thus less suitable for identifying the most important regions wrt the classification task. Above all, we provided first evidence that heatmaps may be useful for assessment of neural network performance. Bringing this idea into practical application will be a topic of future research. Concluding, we have provided the basis for an accurate quantification of heatmap quality. Note that a good heatmap can not only be used for a better understanding of DNNs but also for a prioritization of image regions. Thus, regions of an individual image with high heatmap values could be subjected to more detailed analysis. This could in the future allow highly time efficient processing of the data only where it matters.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Adv. in NIPS 25, 2012, pp. 1106–1114.
[2] D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Deep neural networks segment neuronal membranes in electron microscopy images," in Adv. in NIPS, 2012, pp. 2852–2860.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," CoRR, vol. abs/1409.4842, 2014.
[4] D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. of CVPR. IEEE, 2012, pp. 3642–3649.
[5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493–2537, 2011.
[6] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proc. of EMNLP, 2013, pp. 1631–1642.
[7] S. Ji, W. Xu, M. Yang, and K. Yu, "3d convolutional neural networks for human action recognition," in Proc. of ICML, 2010, pp. 495–502.
[8] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng, "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis," in Proc. of CVPR, 2011, pp. 3361–3368.
[9] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, "Machine learning of molecular electronic properties in chemical compound space," New Journal of Physics, vol. 15, no. 9, p. 095003, 2013.
[10] G. Montavon, G. B. Orr, and K.-R. Müller, Eds., Neural Networks: Tricks of the Trade, Reloaded, 2nd ed., ser. Lecture Notes in Computer Science (LNCS). Springer, 2012, vol. 7700.
[11] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 9–48.
[12] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
[13] M. L. Braun, J. Buhmann, and K.-R. Müller, "On relevant dimensions in kernel feature spaces," Journal of Machine Learning Research, vol. 9, pp. 1875–1908, Aug 2008.
[14] D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller, "How to explain individual classification decisions," Journal of Machine Learning Research, vol. 11, pp. 1803–1831, 2010.
[15] K. Hansen, D. Baehrens, T. Schroeter, M. Rupp, and K.-R. Müller, "Visual interpretation of kernel-based prediction models," Molecular Informatics, vol. 30, no. 9, pp. 817–826, 2011.
[16] D. Erhan, A. Courville, and Y. Bengio, "Understanding representations learned in deep architectures," Universite de Montreal/DIRO, Montreal, QC, Canada, Tech. Rep. 1355, 2010.
[17] G. Montavon, M. Braun, and K.-R. Müller, "Kernel analysis of deep networks," Journal of Machine Learning Research, vol. 12, pp. 2563–2581, 2011.
[18] G. Montavon, M. L. Braun, T. Krueger, and K.-R. Müller, "Analyzing local structure in kernel-based learning: Explanation, complexity and reliability assessment," Signal Processing Magazine, IEEE, vol. 30, no. 4, pp. 62–74, 2013.
[19] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in ECCV, 2014, pp. 818–833.
[20] A. Mahendran and A. Vedaldi, "Understanding deep image representations by inverting them," CoRR, vol. abs/1412.0035, 2014.
[21] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," CoRR, vol. abs/1312.6199, 2013.
[22] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," CoRR, vol. abs/1412.6572, 2014.
[23] J. Yosinski, J. Clune, A. M. Nguyen, T. Fuchs, and H. Lipson, "Understanding neural networks through deep visualization," CoRR, vol. abs/1506.06579, 2015.
[24] A. Dosovitskiy and T. Brox, "Inverting convolutional networks with convolutional networks," CoRR, vol. abs/1506.02753, 2015.
[25] P. M. Rasmussen, T. Schmah, K. H. Madsen, T. E. Lund, G. Yourganov, S. C. Strother, and L. K. Hansen, "Visualization of nonlinear classification models in neuroimaging - signed sensitivity maps," in BIOSIGNALS, 2012, pp. 254–263.
[26] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep inside convolutional networks: Visualising image classification models and saliency maps," in ICLR Workshop, 2014.
[27] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLOS ONE, vol. 10, no. 7, p. e0130140, 2015.
[28] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Proc. of CVPR, 2010, pp. 3485–3492.
[29] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), pp. 1–42, April 2015.
[30] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, "Learning deep features for scene recognition using places database," in Adv. in NIPS, 2014, pp. 487–495.
[31] A. Krizhevsky, "Learning multiple layers of features from tiny images," Master's thesis, University of Toronto, 2009.
[32] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[33] L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention," Vision Research, vol. 40, no. 10, pp. 1489–1506, 2000.
[34] D. J. Heeger, E. P. Simoncelli, and J. A. Movshon, "Computational models of cortical visual processing," Proceedings of the National Academy of Sciences, vol. 93, no. 2, pp. 623–627, 1996.
[35] E. P. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annual Review of Neuroscience, vol. 24, no. 1, pp. 1193–1216, 2001.
[36] L. Itti and C. Koch, "Computational modelling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194–203, 2001.
[37] Y. Jia, "Caffe: An open source convolutional architecture for fast feature embedding," http://caffe.berkeleyvision.org/, 2013.
[38] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Object detectors emerge in deep scene cnns," arXiv:1412.6856, 2014.
[39] A. M. Nguyen, J. Yosinski, and J. Clune, "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," CoRR, vol. abs/1412.1897, 2014.

APPENDIX

CHOOSING A PERTURBATION METHOD

An ideal region perturbation method effectively removes information without introducing spurious structures. Furthermore, it neither significantly disrupts image statistics nor moves the corrupted image far away from the data manifold. In the following we propose four different region perturbation functions g(x, r_p):

Uniform: replaces each pixel near r_p by an RGB value sampled from a uniform distribution U.

Dirichlet: replaces each pixel near r_p by an RGB value sampled from a four-dimensional Dirichlet distribution D. By sampling from this distribution we retain the image statistics.

Constant: replaces each pixel near r_p by a constant value. Here we use the average RGB value computed over all images at this location. Note that this does not mean that all pixels near r_p are replaced by the same value.

Blur: blurs the pixels near r_p with a Gaussian filter with σ = 3. This is the only method which retains local information.
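The four region perturbation functions could be sketched as follows (illustrative NumPy/SciPy, not the authors' code; mask marks the pixels near r_p, mean_image is the per-location data set average, and the Dirichlet branch is only schematic, since the modified four-dimensional sampling scheme derived below is not fully specified here):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def perturb(x, mask, method, rng, mean_image=None, alpha=(1.0, 1.0, 1.0, 1.0)):
        """x: image in [0, 1] of shape (H, W, 3); mask: boolean array (H, W)."""
        out = x.copy()
        n = int(mask.sum())
        if method == "uniform":
            out[mask] = rng.uniform(0.0, 1.0, size=(n, 3))
        elif method == "dirichlet":
            # Draw (r, g, b, slack) from a 4-dim Dirichlet and drop the slack
            # coordinate, so that r, g, b lie in [0, 1] (schematic only).
            out[mask] = rng.dirichlet(alpha, size=n)[:, :3]
        elif method == "constant":
            out[mask] = mean_image[mask]   # per-location average, not one value
        elif method == "blur":
            blurred = np.stack([gaussian_filter(x[..., c], sigma=3)
                                for c in range(3)], axis=-1)
            out[mask] = blurred[mask]      # only the selected region is replaced
        return out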
the highest linear layer while continuously removing the most
and least relevant information, respectively. One can see in
Note that using a uniform distribution for sampling from [0, 1]^3 implies a mean of 0.5 and a standard deviation of 0.5·sqrt(1/3) for each pixel. However, we observed for those images which we analyzed that the image mean for certain pixels goes above 0.55, which is not surprising, as pixels in the top of the images often show bright sky. One possibility for retaining the natural image statistics is to use the average RGB value computed over all images at each location (see method "Constant"). Another possibility is to sample from a distribution which fits the data well. In the case of one pixel with one color channel, a natural choice would be to fit a beta distribution. The beta distribution generalizes in higher dimensions to a Dirichlet distribution. Since the three-dimensional Dirichlet distribution does not fit the required condition r, g, b ∈ [0, 1], r + g + b ∈ [0, 3], we derive a sampling scheme based on a modified four-dimensional Dirichlet distribution.

EVALUATING PERTURBATION METHODS

We define an alternative region perturbation process where locations are considered in reverse order. We call this process least relevant first, abbreviated as LeRF. In that case, the perturbation process is defined by a new recursion formula:

    x_{\mathrm{LeRF}}^{(0)} = x    (12)
    \forall\, 1 \le k \le L:\; x_{\mathrm{LeRF}}^{(k)} = g(x_{\mathrm{LeRF}}^{(k-1)}, r_{L+1-k}).

We expect in this second case that the class information in the image should be very stable and close to the original value for small k, and only drop quickly to zero as k approaches L.

We would like the region perturbation process to destroy the class information for highly relevant regions, and leave the class intact for the least relevant regions. The idea behind this is to maintain a balance between staying on the data manifold and being able to alter the image in a way which can be sensed by the classifier. To quantify this property, we propose to monitor the gap between the two region perturbation processes, LeRF and MoRF:

    \mathrm{ABPC} = \frac{1}{L+1} \Big\langle \sum_{k=0}^{L} f(x_{\mathrm{LeRF}}^{(k)}) - f(x_{\mathrm{MoRF}}^{(k)}) \Big\rangle_{p(x)}    (13)

where f is a class scoring function (not necessarily the same as the one used for computing the heatmaps), and where ⟨·⟩_{p(x)} measures the expectation of the measured quantity with respect to the distribution of input images. The area between perturbation curves (ABPC) is an indicator of how good a heatmapping technique is, how good the class scoring function f that induces the heatmaps is, and also how good a region perturbation process is at removing local information in the image without introducing spurious structure in it. Large ABPC values are desirable.
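Monitoring the gap of Eq. (13) then amounts to running both perturbation processes in parallel. A sketch under illustrative assumptions (predict returns the class score f(x), g applies one region perturbation as above, order ranks regions by decreasing relevance):

    import numpy as np

    def abpc_single_image(x, regions, order, predict, g, rng, L=100):
        """Per-image term of ABPC (Eq. 13); the k = 0 term vanishes since
        both processes start from the unperturbed image x."""
        x_morf, x_lerf = x.copy(), x.copy()
        gap = 0.0
        for k in range(L):
            x_morf = g(x_morf, regions[order[k]], rng)          # most relevant first
            x_lerf = g(x_lerf, regions[order[-(k + 1)]], rng)   # least relevant first
            gap += predict(x_lerf) - predict(x_morf)
        return gap / (L + 1)   # averaging this term over images gives ABPC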
COMPARISON OF PERTURBATION METHODS ON SUN397

Figure 8 depicts the results of this comparison for the four functions g presented before when applied to the SUN397 data set. For each test image we compute heatmaps using LRP. Subsequently, we measure the classification output from the highest linear layer while continuously removing the most and the least relevant information, respectively. One can see in Figure 8 that blurring fails to remove information: the MoRF curve is relatively flat, thus the DNN does not lose the ability to classify an image even though information from relevant regions is being continuously corrupted by g(x, r_k). Similarly, replacing the pixel values in r_k by constant values does not abruptly decrease the score. In both cases the DNN can cope with losing an increasing portion of relevant information. The two random methods, Uniform and Dirichlet, effectively remove information and have significantly larger ABPC values. Although the curves of both region perturbation processes, MoRF and LeRF, show a steeper decrease than in the case of Constant and Blur, the relative score decline is much larger, resulting in larger ABPC values.
[Figure 8 panels (score decline vs. perturbation steps, 0-75): Uniform (ABPC = 243.69), Dirichlet (ABPC = 239.8), Constant (ABPC = 193.65), Blur (ABPC = 113.75).]
Fig. 8. Comparison of the four image corruption schemes on the SUN397
data set. Dashed and solid lines are LeRF and MoRF curves, respectively.
