A Survey on Deep Learning in Medical Image Analysis

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi,
Mohsen Ghafoorian, Jeroen A.W.M. van der Laak, Bram van Ginneken, Clara I. Sánchez

Diagnostic Image Analysis Group


Radboud University Medical Center
Nijmegen, The Netherlands
arXiv:1702.05747v2 [cs.CV] 4 Jun 2017

Abstract
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for
analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis
and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of
deep learning for image classification, object detection, segmentation, registration, and other tasks. Concise overviews
are provided of studies per application area: neuro, retinal, pulmonary, digital pathology, breast, cardiac, abdominal,
musculoskeletal. We end with a summary of the current state-of-the-art, a critical discussion of open challenges and
directions for future research.
Keywords: deep learning, convolutional neural networks, medical imaging, survey

1. Introduction

As soon as it was possible to scan and load medical images into a computer, researchers have built systems for automated analysis. Initially, from the 1970s to the 1990s, medical image analysis was done with sequential application of low-level pixel processing (edge and line detector filters, region growing) and mathematical modeling (fitting lines, circles and ellipses) to construct compound rule-based systems that solved particular tasks. There is an analogy with expert systems with many if-then-else statements that were popular in artificial intelligence in the same period. These expert systems have been described as GOFAI (good old-fashioned artificial intelligence) (Haugeland, 1985) and were often brittle, similar to rule-based image processing systems.

At the end of the 1990s, supervised techniques, where training data is used to develop a system, were becoming increasingly popular in medical image analysis. Examples include active shape models (for segmentation), atlas methods (where the atlases that are fit to new data form the training data), and the concept of feature extraction and use of statistical classifiers (for computer-aided detection and diagnosis). This pattern recognition or machine learning approach is still very popular and forms the basis of many successful commercially available medical image analysis systems. Thus, we have seen a shift from systems that are completely designed by humans to systems that are trained by computers using example data from which feature vectors are extracted. Computer algorithms determine the optimal decision boundary in the high-dimensional feature space. A crucial step in the design of such systems is the extraction of discriminant features from the images. This process is still done by human researchers and, as such, one speaks of systems with handcrafted features.

A logical next step is to let computers learn the features that optimally represent the data for the problem at hand. This concept lies at the basis of many deep learning algorithms: models (networks) composed of many layers that transform input data (e.g. images) to outputs (e.g. disease present/absent) while learning increasingly higher level features. The most successful type of models for image analysis to date are convolutional neural networks (CNNs). CNNs contain many layers that transform their input with convolution filters of a small extent. Work on CNNs has been done since the late seventies (Fukushima, 1980) and they were already applied to medical image analysis in 1995 by Lo et al. (1995). They saw their first successful real-world application in LeNet (LeCun et al., 1998) for hand-written digit recognition. Despite these initial successes, the use of CNNs did not gather momentum until various new techniques were developed for efficiently training deep networks, and advances were made in core computing systems. The watershed was the contribution of Krizhevsky et al. (2012) to the ImageNet challenge in December 2012. The proposed CNN, called AlexNet, won that competition by a large margin. In subsequent years, further progress has been made using related but deeper architectures (Russakovsky et al., 2014). In computer vision, deep convolutional networks have now become the technique of choice.

The medical image analysis community has taken notice of these pivotal developments. However, the transition from systems that use handcrafted features to systems that learn features from the data has been gradual. Before the breakthrough of AlexNet, many different techniques to learn features were popular. Bengio et al. (2013) provide a thorough review of these techniques. They include principal component analysis, clustering of image patches, dictionary approaches, and many more. Bengio et al. (2013) introduce CNNs that are trained end-to-end only at the end of their review in a section entitled Global training of deep models. In this survey, we focus particularly on such deep models, and do not include the more traditional feature learning approaches that have been applied to medical images. For a broader review on the application of deep learning in health informatics we refer to Ravi et al. (2017), where medical image analysis is briefly touched upon.

Applications of deep learning to medical image analysis first started to appear at workshops and conferences, and then in journals. The number of papers grew rapidly in 2015 and 2016. This is illustrated in Figure 1. The topic is now dominant at major conferences and a first special issue appeared in IEEE Transactions on Medical Imaging in May 2016 (Greenspan et al., 2016).

One dedicated review on application of deep learning to medical image analysis was published by Shen et al. (2017). Although they cover a substantial amount of work, we feel that important areas of the field were not represented. To give an example, no work on retinal image analysis was covered. The motivation for our review was to offer a comprehensive overview of (almost) all fields in medical imaging, both from an application and a methodology-driven perspective. This also includes overview tables of all publications which readers can use to quickly assess the field. Last, we leveraged our own experience with the application of deep learning methods to medical image analysis to provide readers with a dedicated discussion section covering the state-of-the-art, open challenges and overview of research directions and technologies that will become important in the future.

This survey includes over 300 papers, most of them recent, on a wide variety of applications of deep learning in medical image analysis. To identify relevant contributions PubMed was queried for papers containing ("convolutional" OR "deep learning") in title or abstract. ArXiv was searched for papers mentioning one of a set of terms related to medical imaging. Additionally, conference proceedings for MICCAI (including workshops), SPIE, ISBI and EMBC were searched based on titles of papers. We checked references in all selected papers and consulted colleagues. We excluded papers that did not report results on medical image data or only used standard feed-forward neural networks with handcrafted features. When overlapping work had been reported in multiple publications, only the publication(s) deemed most important were included. We expect the search terms used to cover most, if not all, of the work incorporating deep learning methods. The last update to the included papers was on February 1, 2017. The appendix describes the search process in more detail.

Summarizing, with this survey we aim to:

• show that deep learning techniques have permeated the entire field of medical image analysis;

• identify the challenges for successful application of deep learning to medical imaging tasks;

• highlight specific contributions which solve or circumvent these challenges.

The rest of this survey is structured as follows. In Section 2 we introduce the main deep learning techniques that have been used for medical image analysis and that are referred to throughout the survey. Section 3 describes the contributions of deep learning to canonical tasks in medical image analysis: classification, detection, segmentation, registration, retrieval, image generation and enhancement. Section 4 discusses obtained results and open challenges in different application areas: neuro, ophthalmic, pulmonary, digital pathology and cell imaging, breast, cardiac, abdominal, musculoskeletal, and remaining miscellaneous applications. We end with a summary, a critical discussion and an outlook for future research.

2. Overview of deep learning methods

The goal of this section is to provide a formal introduction and definition of the deep learning concepts,
techniques and architectures that we found in the medical image analysis papers surveyed in this work.

[Figure 1 (charts not reproduced). Four bar charts of the number of papers: per year, 2012 to 2017, split by network type (All, CNN, RBM, RNN, AE, Other, Multiple); per task (Segmentation (Organ, substructure), Detection (Object), Classification (Exam), Classification (Object), Other, Detection (Organ, region, landmark), Segmentation (Object), Registration); per imaging modality (MRI, Microscopy, CT, Ultrasound, X-ray, Mammography, Other, Multiple, Color fundus photos); and per application area (Pathology, Brain, Other, Lung, Abdomen, Cardiac, Breast, Bone, Retina, Multiple).]

Figure 1: Breakdown of the papers included in this survey in year of publication, task addressed (Section 3), imaging modality, and application area (Section 4). The number of papers for 2017 has been extrapolated from the papers published in January.

2.1. Learning algorithms

Machine learning methods are generally divided into supervised and unsupervised learning algorithms, although there are many nuances. In supervised learning, a model is presented with a dataset $D = \{x, y\}_{n=1}^{N}$ of input features $x$ and label $y$ pairs, where $y$ typically represents an instance of a fixed set of classes. In the case of regression tasks $y$ can also be a vector with continuous values. Supervised training typically amounts to finding model parameters $\Theta$ that best predict the data based on a loss function $L(y, \hat{y})$. Here $\hat{y}$ denotes the output of the model obtained by feeding a data point $x$ to the function $f(x; \Theta)$ that represents the model.

Unsupervised learning algorithms process data without labels and are trained to find patterns, such as latent subspaces. Examples of traditional unsupervised learning algorithms are principal component analysis and clustering methods. Unsupervised training can be performed under many different loss functions. One example is reconstruction loss $L(x, \hat{x})$ where the model has to learn to reconstruct its input, often through a lower-dimensional or noisy representation.

2.2. Neural Networks

Neural networks are a type of learning algorithm which forms the basis of most deep learning methods. A neural network comprises neurons or units with some activation $a$ and parameters $\Theta = \{W, B\}$, where $W$ is a set of weights and $B$ a set of biases. The activation represents a linear combination of the input $x$ to the neuron and the parameters, followed by an element-wise non-linearity $\sigma(\cdot)$, referred to as a transfer function:

$a = \sigma(w^{T} x + b)$. (1)

Typical transfer functions for traditional neural networks are the sigmoid and hyperbolic tangent function. The multi-layered perceptrons (MLP), the most well-known of the traditional neural networks, have several layers of these transformations:

$f(x; \Theta) = \sigma(W^{T} \sigma(W^{T} \ldots \sigma(W^{T} x + b)) + b)$. (2)

Here, $W$ is a matrix comprising columns $w_k$, associated with activation $k$ in the output. Layers in between the input and output are often referred to as 'hidden' layers. When a neural network contains multiple hidden layers it is typically considered a 'deep' neural network, hence the term 'deep learning'.

At the final layer of the network the activations are mapped to a distribution over classes $P(y|x; \Theta)$ through
a softmax function:

$P(y|x; \Theta) = \mathrm{softmax}(x; \Theta) = \frac{e^{w_i^{T} x + b_i}}{\sum_{k=1}^{K} e^{w_k^{T} x + b_k}}$, (3)

where $w_i$ indicates the weight vector leading to the output node associated with class $i$. A schematic representation of a three-layer MLP is shown in Figure 2.

Maximum likelihood with stochastic gradient descent is currently the most popular method to fit parameters $\Theta$ to a dataset $D$. In stochastic gradient descent a small subset of the data, a mini-batch, is used for each gradient update instead of the full data set. Optimizing maximum likelihood in practice amounts to minimizing the negative log-likelihood:

$\arg\min_{\Theta} - \sum_{n=1}^{N} \log \left[ P(y_n | x_n; \Theta) \right]$. (4)

This results in the binary cross-entropy loss for two-class problems and the categorical cross-entropy for multi-class tasks. A downside of this approach is that it typically does not optimize the quantity we are interested in directly, such as area under the receiver-operating characteristic (ROC) curve or common evaluation measures for segmentation, such as the Dice coefficient.

For a long time, deep neural networks (DNN) were considered hard to train efficiently. They only gained popularity in 2006 (Bengio et al., 2007; Hinton and Salakhutdinov, 2006; Hinton et al., 2006) when it was shown that training DNNs layer-by-layer in an unsupervised manner (pre-training), followed by supervised fine-tuning of the stacked network, could result in good performance. Two popular architectures trained in such a way are stacked auto-encoders (SAEs) and deep belief networks (DBNs). However, these techniques are rather complex and require a significant amount of engineering to generate satisfactory results.

Currently, the most popular models are trained end-to-end in a supervised fashion, greatly simplifying the training process. The most popular architectures are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs are currently most widely used in (medical) image analysis, although RNNs are gaining popularity. The following sections will give a brief overview of each of these methods, starting with the most popular ones, and discussing their differences and potential challenges when applied to medical problems.

2.3. Convolutional Neural Networks (CNNs)

There are two key differences between MLPs and CNNs. First, in CNNs weights in the network are shared in such a way that the network performs convolution operations on images. This way, the model does not need to learn separate detectors for the same object occurring at different positions in an image, making the network equivariant with respect to translations of the input. It also drastically reduces the number of parameters that need to be learned (i.e. the number of weights no longer depends on the size of the input image). An example of a 1D CNN is shown in Figure 2.

At each layer, the input image is convolved with a set of $K$ kernels $W = \{W_1, W_2, \ldots, W_K\}$ and biases $B = \{b_1, \ldots, b_K\}$ are added, each generating a new feature map $X_k$. These features are subjected to an element-wise non-linear transform $\sigma(\cdot)$ and the same process is repeated for every convolutional layer $l$:

$X_k^{l} = \sigma\left( W_k^{l-1} * X^{l-1} + b_k^{l-1} \right)$. (5)

The second key difference between CNNs and MLPs is the typical incorporation of pooling layers in CNNs, where pixel values of neighborhoods are aggregated using a permutation invariant function, typically the max or mean operation. This induces a certain amount of translation invariance and again reduces the number of parameters in the network. At the end of the convolutional stream of the network, fully-connected layers (i.e. regular neural network layers) are usually added, where weights are no longer shared. Similar to MLPs, a distribution over classes is generated by feeding the activations in the final layer through a softmax function and the network is trained using maximum likelihood.

2.4. Deep CNN Architectures

Given the prevalence of CNNs in medical image analysis, we elaborate on the most common architectures and architectural differences among the widely used models.

2.4.1. General classification architectures

LeNet (LeCun et al., 1998) and AlexNet (Krizhevsky et al., 2012), introduced over a decade later, were in essence very similar models. Both networks were relatively shallow, consisting of two and five convolutional layers, respectively, and employed kernels with large receptive fields in layers close to the input and smaller kernels closer to the output. AlexNet did incorporate rectified linear units instead of the hyperbolic tangent as activation function.
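As a rough illustration of Eqs. (3) and (4), the following minimal Python sketch computes the softmax distribution for one input and the negative log-likelihood over a small mini-batch. The function names, the toy weights and the two-sample batch are illustrative assumptions and are not taken from any of the surveyed papers; a real system would also compute gradients and update the parameters.

```python
import math

def class_scores(x, weights, biases):
    """Compute w_k^T x + b_k for every class k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def softmax(scores):
    """Map raw class scores to probabilities (Eq. 3)."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(batch, weights, biases):
    """Sum of -log P(y_n | x_n; Theta) over a mini-batch (Eq. 4)."""
    loss = 0.0
    for x, y in batch:
        probs = softmax(class_scores(x, weights, biases))
        loss -= math.log(probs[y])
    return loss

# Hypothetical toy parameters: two input features, three classes.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
mini_batch = [([2.0, 0.0], 0), ([0.0, 2.0], 1)]

probs = softmax(class_scores([2.0, 0.0], weights, biases))
loss = negative_log_likelihood(mini_batch, weights, biases)
print(probs, loss)
```

For a single sample the summand is the categorical cross-entropy between the one-hot label and the predicted distribution, which is why the two names are often used interchangeably.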
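The weight sharing and pooling described in Section 2.3 can be sketched in a few lines of Python for the 1D case. This is only an illustration under stated assumptions: the kernel, the toy signal, the choice of a rectified linear unit as $\sigma$, and all function names are hypothetical, and the kernel is applied without flipping (a cross-correlation), as is common in deep learning software.

```python
def conv1d_valid(signal, kernel, bias):
    """Slide one shared kernel over the signal ('valid' positions only)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Element-wise non-linearity sigma (rectified linear unit assumed)."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Aggregate non-overlapping neighborhoods with the max operation."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def conv_layer(signal, kernels, biases, pool_size=2):
    """One pooled feature map per kernel: sigma(W_k * X + b_k), cf. Eq. (5)."""
    return [max_pool(relu(conv1d_valid(signal, w, b)), pool_size)
            for w, b in zip(kernels, biases)]

# Toy input with a single step, and the same input shifted by one position.
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]  # responds to a rising edge

feature_map = relu(conv1d_valid(signal, edge_kernel, 0.0))
shifted_map = relu(conv1d_valid(shifted, edge_kernel, 0.0))
pooled = conv_layer(signal, [edge_kernel], [0.0])[0]
print(feature_map, shifted_map, pooled)
```

Shifting the input by one position shifts the feature map by one position (translation equivariance), and the layer has only two weights and one bias regardless of the length of the input.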
After 2012 the exploration of novel architectures took off, and in the last three years there is a preference for far deeper models. By stacking smaller kernels, instead of using a single layer of kernels with a large receptive field, a similar function can be represented with fewer parameters. These deeper architectures generally have a lower memory footprint during inference, which enables their deployment on mobile computing devices such as smartphones. Simonyan and Zisserman (2014) were the first to explore much deeper networks, and employed small, fixed size kernels in each layer. A 19-layer model often referred to as VGG19 or OxfordNet won the ImageNet challenge of 2014.

On top of the deeper networks, more complex building blocks have been introduced that improve the efficiency of the training procedure and again reduce the number of parameters. Szegedy et al. (2014) introduced a 22-layer network named GoogLeNet, also referred to as Inception, which made use of so-called inception blocks (Lin et al., 2013), a module that replaces the mapping defined in Eq. (5) with a set of convolutions of different sizes. Similar to the stacking of small kernels, this allows a similar function to be represented with fewer parameters. The ResNet architecture (He et al., 2015) won the ImageNet challenge in 2015 and consisted of so-called ResNet-blocks. Rather than learning a function, the residual block only learns the residual and is thereby pre-conditioned towards learning mappings in each layer that are close to the identity function. This way, even deeper models can be trained effectively.

Since 2014, the performance on the ImageNet benchmark has saturated and it is difficult to assess whether the small increases in performance can really be attributed to 'better' and more sophisticated architectures. The advantage of the lower memory footprint these models provide is typically not as important for medical applications. Consequently, AlexNet or other simple models such as VGG are still popular for medical data, though recent landmark studies all use a version of GoogLeNet called Inception v3 (Gulshan et al., 2016; Esteva et al., 2017; Liu et al., 2017). Whether this is due to a superior architecture or simply because the model is a default choice in popular software packages is again difficult to assess.

2.4.2. Multi-stream architectures

The default CNN architecture can easily accommodate multiple sources of information or representations of the input, in the form of channels presented to the input layer. This idea can be taken further and channels can be merged at any point in the network. Under the intuition that different tasks require different ways of fusion, multi-stream architectures are being explored. These models, also referred to as dual pathway architectures (Kamnitsas et al., 2017), have two main applications at the time of writing: (1) multi-scale image analysis and (2) 2.5D classification; both relevant for medical image processing tasks.

For the detection of abnormalities, context is often an important cue. The most straightforward way to increase context is to feed larger patches to the network, but this can significantly increase the number of parameters and memory requirements of a network. Consequently, architectures have been investigated where context is added in a down-scaled representation in addition to high resolution local information. To the best of our knowledge, the multi-stream multi-scale architecture was first explored by Farabet et al. (2013), who used it for segmentation in natural images. Several medical applications have also successfully used this concept (Kamnitsas et al., 2017; Moeskops et al., 2016a; Song et al., 2015; Yang et al., 2016c).

As so much methodology is still developed on natural images, the challenge of applying deep learning techniques to the medical domain often lies in adapting existing architectures to, for instance, different input formats such as three-dimensional data. In early applications of CNNs to such volumetric data, full 3D convolutions and the resulting large number of parameters were circumvented by dividing the Volume of Interest (VOI) into slices which are fed as different streams to a network. Prasoon et al. (2013) were the first to use this approach for knee cartilage segmentation. Similarly, the network can be fed with multiple angled patches from the 3D-space in a multi-stream fashion, which has been applied by various authors in the context of medical imaging (Roth et al., 2016b; Setio et al., 2016). These approaches are also referred to as 2.5D classification.

2.4.3. Segmentation Architectures

Segmentation is a common task in both natural and medical image analysis and to tackle this, CNNs can simply be used to classify each pixel in the image individually, by presenting it with patches extracted around the particular pixel. A drawback of this naive 'sliding-window' approach is that input patches from neighboring pixels have huge overlap and the same convolutions are computed many times. Fortunately, the convolution and dot product are both linear operators and thus inner products can be written as convolutions and vice versa. By rewriting the fully connected layers as convolutions, the CNN can take input images larger than it was trained on and produce a likelihood map, rather than an output for a single pixel. The resulting 'fully convolutional
network' (fCNN) can then be applied to an entire input image or volume in an efficient fashion.

However, because of pooling layers, this may result in output with a far lower resolution than the input. 'Shift-and-stitch' (Long et al., 2015) is one of several methods proposed to prevent this decrease in resolution. The fCNN is applied to shifted versions of the input image. By stitching the result together, one obtains a full resolution version of the final output, minus the pixels lost due to the 'valid' convolutions.

Ronneberger et al. (2015) took the idea of the fCNN one step further and proposed the U-net architecture, comprising a 'regular' fCNN followed by an upsampling part where 'up'-convolutions are used to increase the image size, coined contractive and expansive paths. Although this is not the first paper to introduce learned upsampling paths in convolutional neural networks (e.g. Long et al. (2015)), the authors combined it with so-called skip-connections to directly connect opposing contracting and expanding convolutional layers. A similar approach was used by Çiçek et al. (2016) for 3D data. Milletari et al. (2016b) proposed an extension to the U-Net layout that incorporates ResNet-like residual blocks and a Dice loss layer, rather than the conventional cross-entropy, that directly minimizes this commonly used segmentation error measure.

2.5. Recurrent Neural Networks (RNNs)

Traditionally, RNNs were developed for discrete sequence analysis. They can be seen as a generalization of MLPs because both the input and output can be of varying length, making them suitable for tasks such as machine translation where a sentence of the source and target language are the input and output. In a classification setting, the model learns a distribution over classes $P(y|x_1, x_2, \ldots, x_T; \Theta)$ given a sequence $x_1, x_2, \ldots, x_T$, rather than a single input vector $x$.

The plain RNN maintains a latent or hidden state $h$ at time $t$ that is the output of a non-linear mapping from its input $x_t$ and the previous state $h_{t-1}$:

$h_t = \sigma(W x_t + R h_{t-1} + b)$, (6)

where weight matrices $W$ and $R$ are shared over time. For classification, one or more fully connected layers are typically added followed by a softmax to map the sequence to a posterior over the classes.

$P(y|x_1, x_2, \ldots, x_T; \Theta) = \mathrm{softmax}(h_T; W_{out}, b_{out})$. (7)

Since the gradient needs to be backpropagated from the output through time, RNNs are inherently deep (in time) and consequently suffer from the same problems with training as regular deep neural networks (Bengio et al., 1994). To this end, several specialized memory units have been developed, the earliest and most popular being the Long Short Term Memory (LSTM) cell (Hochreiter and Schmidhuber, 1997). The Gated Recurrent Unit (Cho et al., 2014) is a recent simplification of the LSTM and is also commonly used.

Although initially proposed for one-dimensional input, RNNs are increasingly applied to images. In natural images 'pixelRNNs' are used as autoregressive models, generative models that can eventually produce new images similar to samples in the training set. For medical applications, they have been used for segmentation problems, with promising results (Stollenga et al., 2015) in the MRBrainS challenge.

2.6. Unsupervised models

2.6.1. Auto-encoders (AEs) and Stacked Auto-encoders (SAEs)

AEs are simple networks that are trained to reconstruct the input $x$ on the output layer $x'$ through one hidden layer $h$. They are governed by a weight matrix $W_{x,h}$ and bias $b_{x,h}$ from input to hidden state and $W_{h,x'}$ with corresponding bias $b_{h,x'}$ from the hidden layer to the reconstruction. A non-linear function is used to compute the hidden activation:

$h = \sigma(W_{x,h} x + b_{x,h})$. (8)

Additionally, the dimension of the hidden layer $|h|$ is taken to be smaller than $|x|$. This way, the data is projected onto a lower dimensional subspace representing a dominant latent structure in the input. Regularization or sparsity constraints can be employed to enhance the discovery process. If the hidden layer had the same size as the input and no further non-linearities were added, the model would simply learn the identity function.

The denoising auto-encoder (Vincent et al., 2010) is another solution to prevent the model from learning a trivial solution. Here the model is trained to reconstruct the input from a noise corrupted version (typically salt-and-pepper noise). SAEs (or deep AEs) are formed by placing auto-encoder layers on top of each other. In medical applications surveyed in this work, auto-encoder layers were often trained individually ('greedily') after which the full network was fine-tuned using supervised training to make a prediction.
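The recurrence of Eq. (6) and the classification step of Eq. (7) from Section 2.5 can be illustrated with a minimal Python sketch. The choice of tanh for $\sigma$, the zero initial state, the toy parameter values and all function names are assumptions made for this example only; no training is shown.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W, R, b):
    """h_t = sigma(W x_t + R h_{t-1} + b), with sigma = tanh assumed (Eq. 6)."""
    wx = matvec(W, x_t)
    rh = matvec(R, h_prev)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, rh, b)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_classify(sequence, W, R, b, W_out, b_out):
    """Run the RNN over a sequence of any length and classify from h_T (Eq. 7)."""
    h = [0.0] * len(b)        # initial hidden state, assumed to be zero
    for x_t in sequence:      # the same W, R and b are reused at every time step
        h = rnn_step(x_t, h, W, R, b)
    scores = [s + c for s, c in zip(matvec(W_out, h), b_out)]
    return softmax(scores)

# Hypothetical parameters: 1 input feature, 2 hidden units, 2 classes.
W = [[0.5], [-0.5]]
R = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
W_out = [[1.0, -1.0], [-1.0, 1.0]]
b_out = [0.0, 0.0]

short = rnn_classify([[1.0], [1.0]], W, R, b, W_out, b_out)
longer = rnn_classify([[1.0]] * 5, W, R, b, W_out, b_out)
print(short, longer)
```

Because the parameters are shared over time, the same function handles the two-step and the five-step sequence without any change in the number of weights.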
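A forward pass through a denoising auto-encoder, as described in Section 2.6.1, might look like the sketch below. It is untrained and purely illustrative: the random initial weights, the sigmoid output layer, the squared-error reconstruction loss, the corruption fraction and all names are assumptions for this example, not details from the surveyed work.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(matrix, vector, bias):
    """Compute matrix @ vector + bias for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]

def encode(x, W_xh, b_xh):
    """h = sigma(W_xh x + b_xh), Eq. (8)."""
    return [sigmoid(z) for z in affine(W_xh, x, b_xh)]

def decode(h, W_hx, b_hx):
    """Reconstruction x' from the hidden layer (sigmoid output assumed)."""
    return [sigmoid(z) for z in affine(W_hx, h, b_hx)]

def salt_and_pepper(x, fraction, rng):
    """Replace a random fraction of the inputs by 0 or 1."""
    return [rng.choice([0.0, 1.0]) if rng.random() < fraction else v
            for v in x]

def reconstruction_loss(x, x_rec):
    """Squared error between the clean input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

rng = random.Random(0)
n_in, n_hidden = 6, 3  # |h| < |x|, so the code is a compressed representation
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_xh = [0.0] * n_hidden
W_hx = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b_hx = [0.0] * n_in

x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
x_noisy = salt_and_pepper(x, 0.3, rng)   # corrupted input fed to the encoder
h = encode(x_noisy, W_xh, b_xh)
x_rec = decode(h, W_hx, b_hx)
loss = reconstruction_loss(x, x_rec)     # compared against the clean input
print(h, loss)
```

The key design point is that the loss compares the reconstruction with the clean input rather than the corrupted one, so copying the input through the network is no longer an optimal solution.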
2.6.2. Restricted Boltzmann Machines (RBMs) and 2.7. Hardware and Software
Deep Belief Networks (DBNs)
One of the main contributors to steep rise of deep
RBMs (Hinton, 2010) are a type of Markov Ran- learning has been the widespread availability of GPU
dom Field (MRF), constituting an input layer or visi- and GPU-computing libraries (CUDA, OpenCL). GPUs
ble layer x = (x1 , x2 , . . . , xN ) and a hidden layer h = are highly parallel computing engines, which have an
(h1 , h2 , . . . , h M ) that carries the latent feature represen- order of magnitude more execution threads than central
tation. The connections between the nodes are bi- processing units (CPUs). With current hardware, deep
directional, so given an input vector x one can obtain learning on GPUs is typically 10 to 30 times faster than
the latent feature representation h and also vice versa. on CPUs.
As such, the RBM is a generative model, and we can sample from it and generate new data points. In analogy to physical systems, an energy function is defined for a particular state (x, h) of input and hidden units:

$$E(x, h) = -h^T W x - c^T x - b^T h, \tag{9}$$

with c and b bias terms. The probability of the ‘state’ of the system is defined by passing the energy to an exponential and normalizing:

$$p(x, h) = \frac{1}{Z} \exp\{-E(x, h)\}. \tag{10}$$

Computing the partition function Z is generally intractable. However, conditional inference in the form of computing h conditioned on x or vice versa is tractable and results in a simple formula:

$$P(h_j \mid x) = \frac{1}{1 + \exp\{-b_j - W_j x\}}. \tag{11}$$

Since the network is symmetric, a similar expression holds for $P(x_i \mid h)$.

DBNs (Bengio et al., 2007; Hinton et al., 2006) are essentially SAEs where the AE layers are replaced by RBMs. Training of the individual layers is, again, done in an unsupervised manner. Final fine-tuning is performed by adding a linear classifier to the top layer of the DBN and performing a supervised optimization.

2.6.3. Variational Auto-Encoders and Generative Adversarial Networks

Recently, two novel unsupervised architectures were introduced: the variational auto-encoder (VAE) (Kingma and Welling, 2013) and the generative adversarial network (GAN) (Goodfellow et al., 2014). There are no peer-reviewed papers applying these methods to medical images yet, but applications in natural images are promising. We will elaborate on their potential in the discussion.

Next to hardware, the other driving force behind the popularity of deep learning methods is the wide availability of open source software packages. These libraries provide efficient GPU implementations of important operations in neural networks, such as convolutions, allowing the user to implement ideas at a high level rather than worrying about efficient low-level implementations. At the time of writing, the most popular packages were (in alphabetical order):

• Caffe (Jia et al., 2014). Provides C++ and Python interfaces, developed by graduate students at UC Berkeley.

• TensorFlow (Abadi et al., 2016). Provides C++ and Python interfaces, developed by Google and used by Google Research.

• Theano (Bastien et al., 2012). Provides a Python interface, developed by the MILA lab in Montreal.

• Torch (Collobert et al., 2011). Provides a Lua interface and is used by, among others, Facebook AI Research.

There are third-party packages written on top of one or more of these frameworks, such as Lasagne (https://github.com/Lasagne/Lasagne) or Keras (https://keras.io/). It goes beyond the scope of this paper to discuss all these packages in detail.

3. Deep Learning Uses in Medical Imaging

3.1. Classification

3.1.1. Image/exam classification

Image or exam classification was one of the first areas in which deep learning made a major contribution to medical image analysis. In exam classification one typically has one or multiple images (an exam) as input with a single diagnostic variable as output (e.g., disease present or not). In such a setting, every diagnostic exam is a sample and dataset sizes are typically
Figure 2: Node graphs of 1D representations of architectures commonly used in medical imaging. a) Auto-encoder, b) restricted Boltzmann machine, c) recurrent neural network, d) convolutional neural network, e) multi-stream convolutional neural network, f) U-net (with a single downsampling stage). Legend: input node, hidden node, output node, probabilistic node; weighted connection, weighted connection with shared weights (similar colors indicate shared weights), pooling connection; concatenate, up-convolution, down-sample, up-sample.

small compared to those in computer vision (e.g., hundreds/thousands vs. millions of samples). The popularity of transfer learning for such applications is therefore not surprising.

Transfer learning is essentially the use of pre-trained networks (typically on natural images) to try to work around the (perceived) requirement of large data sets for deep network training. Two transfer learning strategies were identified: (1) using a pre-trained network as a feature extractor and (2) fine-tuning a pre-trained network on medical data. The former strategy has the extra benefit of not requiring one to train a deep network at all, allowing the extracted features to be easily plugged into existing image analysis pipelines. Both strategies are popular and have been widely applied. However, few authors perform a thorough investigation into which strategy gives the best result. The two papers that do, Antony et al. (2016) and Kim et al. (2016a), offer conflicting results. In the case of Antony et al. (2016), fine-tuning clearly outperformed feature extraction, achieving 57.6% accuracy in multi-class grade assessment of knee osteoarthritis versus 53.4%. Kim et al. (2016a), however, showed that using a CNN as a feature extractor outperformed fine-tuning in cytopathology image classification accuracy (70.5% versus 69.1%). If any guidance can be given as to which strategy might be most successful, we would refer the reader to two recent papers, published in high-ranking journals, which fine-tuned a pre-trained version of Google’s Inception v3 architecture on medical data and achieved (near) human expert
performance (Esteva et al., 2017; Gulshan et al., 2016). As far as the authors are aware, such results have not yet been achieved by simply using pre-trained networks as feature extractors.

With respect to the type of deep networks that are commonly used in exam classification, a timeline similar to computer vision is apparent. The medical imaging community initially focused on unsupervised pre-training and network architectures like SAEs and RBMs. The first papers applying these techniques for exam classification appeared in 2013 and focused on neuroimaging. Brosch and Tam (2013), Plis et al. (2014), Suk and Shen (2013), and Suk et al. (2014) applied DBNs and SAEs to classify patients as having Alzheimer’s disease based on brain Magnetic Resonance Imaging (MRI). Recently, a clear shift towards CNNs can be observed. Out of the 47 papers published on exam classification in 2015, 2016, and 2017, 36 are using CNNs, 5 are based on AEs and 6 on RBMs. The application areas of these methods are very diverse, ranging from brain MRI to retinal imaging and digital pathology to lung computed tomography (CT).

In the more recent papers using CNNs authors also often train their own network architectures from scratch instead of using pre-trained networks. Menegola et al. (2016) performed some experiments comparing training from scratch to fine-tuning of pre-trained networks and showed that fine-tuning worked better given a small data set of around 1000 images of skin lesions. However, these experiments are too small in scale to draw any general conclusions.

Three papers used an architecture leveraging the unique attributes of medical data: two use 3D convolutions (Hosseini-Asl et al., 2016; Payan and Montana, 2015) instead of 2D to classify patients as having Alzheimer’s disease; Kawahara et al. (2016b) applied a CNN-like architecture to a brain connectivity graph derived from MRI diffusion-tensor imaging (DTI). In order to do this, they developed several new layers which formed the basis of their network, so-called edge-to-edge, edge-to-node, and node-to-graph layers. They used their network to predict brain development and showed that they outperformed existing methods in assessing cognitive and motor scores.

Summarizing, in exam classification CNNs are the current standard technique. Especially CNNs pre-trained on natural images have shown surprisingly strong results, challenging the accuracy of human experts in some tasks. Last, authors have shown that CNNs can be adapted to leverage intrinsic structure of medical images.

3.1.2. Object or lesion classification

Object classification usually focuses on the classification of a small (previously identified) part of the medical image into two or more classes (e.g. nodule classification in chest CT). For many of these tasks both local information on lesion appearance and global contextual information on lesion location are required for accurate classification. This combination is typically not possible in generic deep learning architectures. Several authors have used multi-stream architectures to resolve this in a multi-scale fashion (Section 2.4.2). Shen et al. (2015b) used three CNNs, each of which takes a nodule patch at a different scale as input. The resulting feature outputs of the three CNNs are then concatenated to form the final feature vector. A somewhat similar approach was followed by Kawahara and Hamarneh (2016), who used a multi-stream CNN to classify skin lesions, where each stream works on a different resolution of the image. Gao et al. (2015) proposed to use a combination of CNNs and RNNs for grading nuclear cataracts in slit-lamp images, where CNN filters were pre-trained. This combination allows the processing of all contextual information regardless of image size. Incorporating 3D information is also often a necessity for good performance in object classification tasks in medical imaging. As images in computer vision tend to be 2D natural images, networks developed in those scenarios do not directly leverage 3D information. Authors have used different approaches to integrate 3D in an effective manner with custom architectures. Setio et al. (2016) used a multi-stream CNN to classify points of interest in chest CT as a nodule or non-nodule. Up to nine differently oriented patches extracted from the candidate were used in separate streams and merged in the fully-connected layers to obtain the final classification output. In contrast, Nie et al. (2016c) exploited the 3D nature of MRI by training a 3D CNN to assess survival in patients suffering from high-grade gliomas.

Almost all recent papers prefer the use of end-to-end trained CNNs. In some cases other architectures and approaches are used, such as RBMs (van Tulder and de Bruijne, 2016; Zhang et al., 2016c), SAEs (Cheng et al., 2016a) and convolutional sparse auto-encoders (CSAE) (Kallenberg et al., 2016). The major difference between CSAE and a classic CNN is the usage of unsupervised pre-training with sparse auto-encoders.

An interesting approach, especially in cases where object annotation to generate training data is expensive, is the integration of multiple instance learning (MIL) and deep learning. Xu et al. (2014) investigated the use of a MIL-framework with both supervised and unsupervised feature learning approaches as well as handcrafted features. The results demonstrated that the performance of the MIL-framework was superior to that of handcrafted features and closely approached the performance of a fully supervised method. We expect such approaches to be popular in the future as well, as obtaining high-quality annotated medical data is challenging.

Overall, object classification sees less use of pre-trained networks compared to exam classification, mostly due to the need for incorporation of contextual or three-dimensional information. Several authors have found innovative solutions to add this information to deep networks with good results, and as such we expect deep learning to become even more prominent for this task in the near future.

3.2. Detection

3.2.1. Organ, region and landmark localization

Anatomical object localization (in space or time), such as organs or landmarks, has been an important pre-processing step in segmentation tasks or in the clinical workflow for therapy planning and intervention. Localization in medical imaging often requires parsing of 3D volumes. To solve 3D data parsing with deep learning algorithms, several approaches have been proposed that treat the 3D space as a composition of 2D orthogonal planes. Yang et al. (2015) identified landmarks on the distal femur surface by processing three independent sets of 2D MRI slices (one for each plane) with regular CNNs. The 3D position of the landmark was defined as the intersection of the three 2D slices with the highest classification output. de Vos et al. (2016b) went one step further and localized regions of interest (ROIs) around anatomical regions (heart, aortic arch, and descending aorta) by identifying a rectangular 3D bounding box after 2D parsing of the 3D CT volume. Pre-trained CNN architectures, as well as RBMs, have been used for the same purpose (Cai et al., 2016b; Chen et al., 2015b; Kumar et al., 2016), overcoming the lack of data to learn better feature representations. All these studies cast the localization task as a classification task and as such generic deep learning architectures and learning processes can be leveraged.

Other authors try to modify the network learning process to directly predict locations. For example, Payer et al. (2016) proposed to directly regress landmark locations with CNNs. They used landmark maps, where each landmark is represented by a Gaussian, as ground truth input data and the network is directly trained to predict this landmark map. Another interesting approach was published by Ghesu et al. (2016a), in which reinforcement learning is applied to the identification of landmarks. The authors showed promising results in several tasks: 2D cardiac MRI and ultrasound (US) and 3D head/neck CT.

Due to its increased complexity, only a few methods addressed the direct localization of landmarks and regions in the 3D image space. Zheng et al. (2015) reduced this complexity by decomposing 3D convolution as three one-dimensional convolutions for carotid artery bifurcation detection in CT data. Ghesu et al. (2016b) proposed a sparse adaptive deep neural network powered by marginal space learning in order to deal with data complexity in the detection of the aortic valve in 3D transesophageal echocardiogram.

CNNs have also been used for the localization of scan planes or key frames in temporal data. Baumgartner et al. (2016) trained CNNs on video frame data to detect up to 12 standardized scan planes in mid-pregnancy fetal US. Furthermore, they used saliency maps to obtain a rough localization of the object of interest in the scan plane (e.g. brain, spine). RNNs, particularly LSTM-RNNs, have also been used to exploit the temporal information contained in medical videos, another type of high dimensional data. Chen et al. (2015a), for example, employed LSTM models to incorporate temporal information of consecutive sequences in US videos for fetal standard plane detection. Kong et al. (2016) combined an LSTM-RNN with a CNN to detect the end-diastole and end-systole frames in cine-MRI of the heart.

Concluding, localization through 2D image classification with CNNs seems to be the most popular strategy overall to identify organs, regions and landmarks, with good results. However, several recent papers expand on this concept by modifying the learning process such that accurate localization is directly emphasized, with promising results. We expect such strategies to be explored further as they show that deep learning techniques can be adapted to a wide range of localization tasks (e.g. multiple landmarks). RNNs have shown promise in localization in the temporal domain, and multi-dimensional RNNs could play a role in spatial localization as well.

3.2.2. Object or lesion detection

The detection of objects of interest or lesions in images is a key part of diagnosis and is one of the most labor-intensive tasks for clinicians. Typically, the tasks consist of the localization and identification of small lesions in the full image space. There has been a long research tradition in computer-aided detection systems that are designed to automatically detect lesions, improving the
detection accuracy or decreasing the reading time of human experts. Interestingly, the first object detection system using CNNs was already proposed in 1995, using a CNN with four layers to detect nodules in x-ray images (Lo et al., 1995).

Most of the published deep learning object detection systems still use CNNs to perform pixel (or voxel) classification, after which some form of post processing is applied to obtain object candidates. As the classification task performed at each pixel is essentially object classification, CNN architecture and methodology are very similar to those in section 3.1.2. The incorporation of contextual or 3D information is also handled using multi-stream CNNs (Section 2.4.2), for example by Barbu et al. (2016) and Roth et al. (2016b). Teramoto et al. (2016) used a multi-stream CNN to integrate CT and Positron Emission Tomography (PET) data. Dou et al. (2016c) used a 3D CNN to find micro-bleeds in brain MRI. Last, as the annotation burden to generate training data can be as significant as in object classification, weakly-supervised deep learning has been explored by Hwang and Kim (2016), who adopted such a strategy for the detection of nodules in chest radiographs and lesions in mammography.

There are some aspects which are significantly different between object detection and object classification. One key point is that because every pixel is classified, typically the class balance is skewed severely towards the non-object class in a training setting. To add insult to injury, usually the majority of the non-object samples are easy to discriminate, preventing the deep learning method from focusing on the challenging samples. van Grinsven et al. (2016) proposed a selective data sampling scheme in which wrongly classified samples were fed back to the network more often to focus on challenging areas in retinal images. Last, as classifying each pixel in a sliding window fashion results in orders of magnitude of redundant calculation, fCNNs, as used in Wolterink et al. (2016), are an important aspect of an object detection pipeline as well.

Challenges in meaningful application of deep learning algorithms in object detection are thus mostly similar to those in object classification. Only a few papers directly address issues specific to object detection like class imbalance/hard-negative mining or efficient pixel/voxel-wise processing of images. We expect that more emphasis will be given to those areas in the near future, for example in the application of multi-stream networks in a fully convolutional fashion.

3.3. Segmentation

3.3.1. Organ and substructure segmentation

The segmentation of organs and other substructures in medical images allows quantitative analysis of clinical parameters related to volume and shape, as, for example, in cardiac or brain analysis. Furthermore, it is often an important first step in computer-aided detection pipelines. The task of segmentation is typically defined as identifying the set of voxels which make up either the contour or the interior of the object(s) of interest. Segmentation is the most common subject of papers applying deep learning to medical imaging (Figure 1), and as such has also seen the widest variety in methodology, including the development of unique CNN-based segmentation architectures and the wider application of RNNs.

The most well-known, in medical image analysis, of these novel CNN architectures is U-net, published by Ronneberger et al. (2015) (section 2.4.3). The two main architectural novelties in U-net are the combination of an equal number of upsampling and downsampling layers and the use of skip connections. Although learned upsampling layers have been proposed before, U-net combines them with so-called skip connections between opposing convolution and deconvolution layers, which concatenate features from the contracting and expanding paths. From a training perspective this means that entire images/scans can be processed by U-net in one forward pass, resulting in a segmentation map directly. This allows U-net to take into account the full context of the image, which can be an advantage in contrast to patch-based CNNs. Furthermore, in an extended paper by Çiçek et al. (2016), it is shown that a full 3D segmentation can be achieved by feeding U-net with a few 2D annotated slices from the same volume. Other authors have also built derivatives of the U-net architecture; Milletari et al. (2016b), for example, proposed a 3D variant of the U-net architecture, called V-net, performing 3D image segmentation using 3D convolutional layers with an objective function directly based on the Dice coefficient. Drozdzal et al. (2016) investigated the use of short ResNet-like skip connections in addition to the long skip-connections in a regular U-net.

RNNs have recently become more popular for segmentation tasks. For example, Xie et al. (2016b) used a spatial clockwork RNN to segment the perimysium in H&E-histopathology images. This network takes into account prior information from both the row and column predecessors of the current patch. To incorporate bidirectional information from both left/top and right/bottom neighbors, the RNN is applied four times
in different orientations and the end result is concatenated and fed to a fully-connected layer. This produces the final output for a single patch. Stollenga et al. (2015) were the first to use a 3D LSTM-RNN with convolutional layers in six directions. Andermatt et al. (2016) used a 3D RNN with gated recurrent units to segment gray and white matter in a brain MRI data set. Chen et al. (2016d) combined bi-directional LSTM-RNNs with 2D U-net-like architectures to segment structures in anisotropic 3D electron microscopy images. Last, Poudel et al. (2016) combined a 2D U-net architecture with a gated recurrent unit to perform 3D segmentation.

Although these specific segmentation architectures offered compelling advantages, many authors have also obtained excellent segmentation results with patch-trained neural networks. One of the earliest papers covering medical image segmentation with deep learning algorithms used such a strategy and was published by Ciresan et al. (2012). They applied pixel-wise segmentation of membranes in electron microscopy imagery in a sliding window fashion. Most recent papers now use fCNNs (subsection 2.4.3) in preference over sliding-window-based classification to reduce redundant computation.

fCNNs have also been extended to 3D and have been applied to multiple targets at once: Korez et al. (2016) used 3D fCNNs to generate vertebral body likelihood maps which drove deformable models for vertebral body segmentation in MR images, Zhou et al. (2016) segmented nineteen targets in the human torso, and Moeskops et al. (2016b) trained a single fCNN to segment brain MRI, the pectoral muscle in breast MRI, and the coronary arteries in cardiac CT angiography (CTA).

One challenge with voxel classification approaches is that they sometimes lead to spurious responses. To combat this, groups have tried to combine fCNNs with graphical models like MRFs (Shakeri et al., 2016; Song et al., 2015) and Conditional Random Fields (CRFs) (Alansary et al., 2016; Cai et al., 2016a; Christ et al., 2016; Dou et al., 2016c; Fu et al., 2016a; Gao et al., 2016c) to refine the segmentation output. In most of the cases, graphical models are applied on top of the likelihood map produced by CNNs or fCNNs and act as label regularizers.

Summarizing, segmentation in medical imaging has seen a huge influx of deep learning related methods. Custom architectures have been created to directly target the segmentation task. These have obtained promising results, rivaling and often improving over results obtained with fCNNs.

3.3.2. Lesion segmentation

Segmentation of lesions combines the challenges of object detection and organ and substructure segmentation in the application of deep learning algorithms. Global and local context are typically needed to perform accurate segmentation, such that multi-stream networks with different scales or non-uniformly sampled patches are used as in, for example, Kamnitsas et al. (2017) and Ghafoorian et al. (2016b). In lesion segmentation we have also seen the application of U-net and similar architectures to leverage both this global and local context. The architecture used by Wang et al. (2015), similar to the U-net, consists of the same downsampling and upsampling paths, but does not use skip connections. Another U-net-like architecture was used by Brosch et al. (2016) to segment white matter lesions in brain MRI. However, they used 3D convolutions and a single skip connection between the first convolutional and last deconvolutional layers.

One other challenge that lesion segmentation shares with object detection is class imbalance, as most voxels/pixels in an image are from the non-diseased class. Some papers combat this by adapting the loss function: Brosch et al. (2016) defined it to be a weighted combination of the sensitivity and the specificity, with a larger weight for the specificity to make it less sensitive to the data imbalance. Others balance the data set by performing data augmentation on positive samples (Kamnitsas et al., 2017; Litjens et al., 2016; Pereira et al., 2016).

Thus lesion segmentation sees a mixture of approaches used in object detection and organ segmentation. Developments in these two areas will most likely naturally propagate to lesion segmentation as the existing challenges are also mostly similar.

3.4. Registration

Registration (i.e. spatial alignment) of medical images is a common image analysis task in which a coordinate transform is calculated from one medical image to another. Often this is performed in an iterative framework where a specific type of (non-)parametric transformation is assumed and a pre-determined metric (e.g. L2-norm) is optimized. Although segmentation and lesion detection are more popular topics for deep learning, researchers have found that deep networks can be beneficial in getting the best possible registration performance. Broadly speaking, two strategies are prevalent in current literature: (1) using deep-learning networks to estimate a similarity measure for two images to drive an iterative optimization strategy, and (2) directly predicting transformation parameters using deep regression networks.
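The iterative registration framework described above can be illustrated with a deliberately small sketch: a 1D "image", a one-parameter transformation (an integer translation), and the sum of squared differences as the pre-determined metric. The function names (`shift`, `ssd`, `register_translation`) and the exhaustive search over candidate shifts are illustrative assumptions and are not taken from any of the surveyed papers; practical methods use richer transformation models and gradient-based optimizers.

```python
def shift(signal, t):
    """Translate a 1D signal by t samples, padding with zeros."""
    n = len(signal)
    out = [0.0] * n
    for i, value in enumerate(signal):
        j = i + t
        if 0 <= j < n:
            out[j] = value
    return out


def ssd(a, b):
    """Sum of squared differences (squared L2 distance) between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def register_translation(fixed, moving, max_shift):
    """Return the integer shift of `moving` that minimizes the SSD to `fixed`."""
    best_t, best_cost = 0, ssd(fixed, moving)
    for t in range(-max_shift, max_shift + 1):
        cost = ssd(fixed, shift(moving, t))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy data (hypothetical): the moving signal is the fixed one displaced by two samples.
fixed = [0, 0, 1, 3, 1, 0, 0, 0]
moving = shift(fixed, 2)
best_t, best_cost = register_translation(fixed, moving, max_shift=3)
print(best_t, best_cost)
```

In terms of the two strategies listed above, strategy (1) would roughly correspond to replacing the hand-written `ssd` with a learned similarity measure while keeping the search loop, whereas strategy (2) would replace the search loop itself with a network that outputs the transformation parameter directly.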
Wu et al. (2013), Simonovsky et al. (2016), and Cheng et al. (2015) used the first strategy to try to optimize registration algorithms. Cheng et al. (2015) used two types of stacked auto-encoders to assess the local similarity between CT and MRI images of the head. Both auto-encoders take vectorized image patches of CT and MRI and reconstruct them through four layers. After the networks are pre-trained using unsupervised patch reconstruction they are fine-tuned using two prediction layers stacked on top of the third layer of the SAE. These prediction layers determine whether two patches are similar (class 1) or dissimilar (class 2). Simonovsky et al. (2016) used a similar strategy, albeit with CNNs, to estimate a similarity cost between two patches from differing modalities. However, they also presented a way to use the derivative of this metric to directly optimize the transformation parameters, which are decoupled from the network itself. Last, Wu et al. (2013) combined independent subspace analysis and convolutional layers to extract features from input patches in an unsupervised manner. The resultant feature vectors are used to drive the HAMMER registration algorithm instead of handcrafted features.

Miao et al. (2016) and Yang et al. (2016d) used deep learning algorithms to directly predict the registration transform parameters given input images. Miao et al. (2016) leveraged CNNs to perform 3D model to 2D x-ray registration to assess the pose and location of an implanted object during surgery. In total the transformation has six parameters: two translational, one scaling and three angular parameters. They parameterize the feature space in steps of 20 degrees for two angular parameters and train a separate CNN to predict the update to the transformation parameters given a digitally reconstructed x-ray of the 3D model and the actual intra-operative x-ray. The CNNs are trained with artificial examples generated by manually adapting the transformation parameters for the input training data. They showed that their approach has significantly higher registration success rates than traditional, purely intensity-based registration methods. Yang et al. (2016d) tackled the problem of prior/current registration in brain MRI using the OASIS data set. They used the large deformation diffeomorphic metric mapping (LDDMM) registration methodology as a basis. This method takes as input an initial momentum value for each pixel which is then evolved over time to obtain the final transformation. However, the calculation of the initial momentum map is often an expensive procedure. The authors circumvent this by training a U-net-like architecture to predict the x- and y-momentum map given the input images. They obtain visually similar results but with significantly improved execution time: 1500x speed-up for 2D and 66x speed-up for 3D.

In contrast to classification and segmentation, the research community seems not to have settled yet on the best way to integrate deep learning techniques in registration methods. Not many papers have yet appeared on the subject and existing ones each have a distinctly different approach. Thus, giving recommendations on which method is most promising seems inappropriate. However, we expect to see many more contributions of deep learning to medical image registration in the near future.

3.5. Other tasks in medical imaging

3.5.1. Content-based image retrieval

Content-based image retrieval (CBIR) is a technique for knowledge discovery in massive databases and offers the possibility to identify similar case histories, understand rare disorders, and, ultimately, improve patient care. The major challenge in the development of CBIR methods is extracting effective feature representations from the pixel-level information and associating them with meaningful concepts. The ability of deep CNN models to learn rich features at multiple levels of abstraction has elicited interest from the CBIR community.

All current approaches use (pre-trained) CNNs to extract feature descriptors from medical images. Anavi et al. (2016) and Liu et al. (2016b) applied their methods to databases of X-ray images. Both used a five-layer CNN and extracted features from the fully-connected layers. Anavi et al. (2016) used the last layer and a pre-trained network. Their best results were obtained by feeding these features to a one-vs-all support vector machine (SVM) classifier to obtain the distance metric. They showed that incorporating gender information resulted in better performance than just CNN features. Liu et al. (2016b) used the penultimate fully-connected layer and a custom CNN trained to classify X-rays into 193 classes to obtain the descriptive feature vector. After descriptor binarization and data retrieval using Hamming separation values, the performance was inferior to the state of the art, which the authors attributed to small patch sizes of 96 pixels. The method proposed by Shah et al. (2016) combines CNN feature descriptors with hashing forests. One thousand features were extracted for overlapping patches in prostate MRI volumes, after which a large feature matrix was constructed over all volumes. Hashing forests were then used to compress this into descriptors for each volume.

Content-based image retrieval as a whole has thus not seen many successful applications of deep learning
(2016c), 3T and 7T brain MRI in Bahrami et al. (2016),
PET from MRI in Li et al. (2014), and CT from MRI in
Nie et al. (2016a). Li et al. (2014) even showed that one
can use these generated images in computer-aided diag-
nosis systems for Alzheimer’s disease when the original
data is missing or not acquired.
With multi-stream CNNs super-resolution images
can be generated from multiple low-resolution inputs
(section 2.4.2). In Oktay et al. (2016), multi-stream net-
works reconstructed high-resolution cardiac MRI from
one or more low-resolution input MRI volumes. Not
only can this strategy be used to infer missing spatial in-
formation, but can also be leveraged in other domains;
for example, inferring advanced MRI diffusion parame-
ters from limited data (Golkov et al., 2016). Other im-
age enhancement applications like intensity normaliza-
tion and denoising have seen only limited application of
deep learning algorithms. Janowczyk et al. (2016a) used
SAEs to normalize H&E-stained histopathology images
whereas Benou et al. (2016) used CNNs to perform denoising in DCE-MRI time-series.

Image generation has seen impressive results with very creative applications of deep networks in significantly differing tasks. One can only expect the number of tasks to increase further in the future.

Figure 3: Collage of some medical imaging applications in which deep learning has achieved state-of-the-art results. From top-left to bottom-right: mammographic mass classification (Kooi et al., 2016), segmentation of lesions in the brain (top ranking in BRATS, ISLES and MRBrains challenges, image from Ghafoorian et al. (2016b)), leak detection in airway tree segmentation (Charbonnier et al., 2017), diabetic retinopathy classification (Kaggle Diabetic Retinopathy challenge 2015, image from van Grinsven et al. (2016)), prostate segmentation (top rank in PROMISE12 challenge), nodule classification (top ranking in LUNA16 challenge), breast cancer metastases detection in lymph nodes (top ranking and human expert performance in CAMELYON16), human expert performance in skin lesion classification (Esteva et al., 2017), and state-of-the-art bone suppression in x-rays, image from Yang et al. (2016c).

methods yet, but given the results in other areas it seems only a matter of time. An interesting avenue of research could be the direct training of deep networks for the retrieval task itself.

3.5.2. Image Generation and Enhancement

A variety of image generation and enhancement methods using deep architectures have been proposed, ranging from removing obstructing elements in images, normalizing images, and improving image quality to data completion and pattern discovery.

In image generation, 2D or 3D CNNs are used to convert one input image into another. Typically these architectures lack the pooling layers present in classification networks. These systems are then trained with a data set in which both the input and the desired output are present, defining the differences between the generated and desired output as the loss function. Examples are regular and bone-suppressed X-ray in Yang et al.

3.5.3. Combining Image Data With Reports

The combination of text reports and medical image data has led to two avenues of research: (1) leveraging reports to improve image classification accuracy (Schlegl et al., 2015), and (2) generating text reports from images (Kisilev et al., 2016; Shin et al., 2015, 2016a; Wang et al., 2016e); the latter inspired by recent caption generation papers from natural images (Karpathy and Fei-Fei, 2015). To the best of our knowledge, the first step towards leveraging reports was taken by Schlegl et al. (2015), who argued that large amounts of annotated data may be difficult to acquire and proposed to add semantic descriptions from reports as labels. The system was trained on sets of images along with their textual descriptions and was taught to predict semantic class labels during test time. They showed that semantic information increases classification accuracy for a variety of pathologies in Optical Coherence Tomography (OCT) images.

Shin et al. (2015) and Wang et al. (2016e) mined semantic interactions between radiology reports and images from a large data set extracted from a PACS system. They employed latent Dirichlet allocation (LDA), a type of stochastic model that generates a distribution over a vocabulary of topics based on words in a document. In a later work, Shin et al. (2016a) proposed a sys-
Table 1: Overview of papers using deep learning techniques for brain image analysis. All works use MRI unless otherwise mentioned.

Reference Method Application; remarks


Disorder classification (AD, MCI, Schizophrenia)
Brosch and Tam (2013) DBN AD/HC classification; Deep belief networks with convolutional RBMs for manifold learning
Plis et al. (2014) DBN Deep belief networks evaluated on brain network estimation, Schizophrenia and Huntington’s disease classification
Suk and Shen (2013) SAE AD/MCI classification; Stacked auto encoders with supervised fine tuning
Suk et al. (2014) RBM AD/MCI/HC classification; Deep Boltzmann Machines on MRI and PET modalities
Payan and Montana (2015) CNN AD/MCI/HC classification; 3D CNN pre-trained with sparse auto-encoders
Suk et al. (2015) SAE AD/MCI/HC classification; SAE for latent feature extraction on a large set of hand-crafted features from MRI and PET
Hosseini-Asl et al. (2016) CNN AD/MCI/HC classification; 3D CNN pre-trained with a 3D convolutional auto-encoder on fMRI data
Kim et al. (2016b) ANN Schizophrenia/NH classification on fMRI; Neural network showing advantage of pre-training with SAEs, and L1 sparsification
Ortiz et al. (2016) DBN AD/MCI/HC classification; An ensemble of Deep belief networks, with their votes fused using an SVM classifier
Pinaya et al. (2016) DBN Schizophrenia/NH classification; DBN pre-training followed by supervised fine-tuning
Sarraf and Tofighi (2016) CNN AD/HC classification; Adapted LeNet-5 architecture on fMRI data
Suk et al. (2016) SAE MCI/HC classification of fMRI data; Stacked auto-encoders for feature extraction, HMM as a generative model on top
Suk and Shen (2016) CNN AD/MCI/HC classification; CNN on sparse representations created by regression models
Shi et al. (2017) ANN AD/MCI/HC classification; Multi-modal stacked deep polynomial networks with an SVM classifier on top using MRI and PET
Tissue/anatomy/lesion/tumor segmentation
Guo et al. (2014) SAE Hippocampus segmentation; SAE for representation learning used for target/atlas patch similarity measurement
de Brebisson and Montana (2015) CNN Anatomical segmentation; fusing multi-scale 2D patches with a 3D patch using a CNN
Choi and Jin (2016) CNN Striatum segmentation; Two-stage (global/local) approximations with 3D CNNs
Stollenga et al. (2015) RNN Tissue segmentation; PyraMiD-LSTM, best brain segmentation results on MRBrainS13 (and competitive results on EM-ISBI12)
Zhang et al. (2015) CNN Tissue segmentation; multi-modal 2D CNN
Andermatt et al. (2016) RNN Tissue segmentation; two convolutional gated recurrent units in different directions for each dimension
Bao and Chung (2016) CNN Anatomical segmentation; Multi-scale late fusion CNN with random walker as a novel label consistency method
Birenbaum and Greenspan (2016) CNN Lesion segmentation; Multi-view (2.5D) CNN concatenating features from previous time step for a longitudinal analysis
Brosch et al. (2016) CNN Lesion segmentation; Convolutional encoder-decoder network with shortcut connections and convolutional RBM pretraining
Chen et al. (2016a) CNN Tissue segmentation; 3D res-net combining features from different layers
Ghafoorian et al. (2016b) CNN Lesion segmentation; CNN trained on non-uniformly sampled patches to integrate a larger context with a foveation effect
Ghafoorian et al. (2016a) CNN Lesion segmentation; multi-scale CNN with late fusion that integrates anatomical location information into network
Havaei et al. (2016b) CNN Tumor segmentation; CNN handling missing modalities with abstraction layer that transforms feature maps to their statistics
Havaei et al. (2016a) CNN Tumor segmentation; two-pathway CNN with different receptive fields
Kamnitsas et al. (2017) CNN Tumor segmentation; 3D multi-scale fully convolutional network with CRF for label consistency
Kleesiek et al. (2016) CNN Brain extraction; 3D fully convolutional CNN on multi-modal input
Mansoor et al. (2016) SAE Visual pathway segmentation; Learning appearance features from SAE for steering the shape model for segmentation
Milletari et al. (2016a) CNN Anatomical segmentation on MRI and US; Hough-voting to acquire mapping from CNN features to full patch segmentations
Moeskops et al. (2016a) CNN Tissue segmentation; CNN trained on multiple patch sizes
Nie et al. (2016b) CNN Infant tissue segmentation; FCN with a late fusion method on different modalities
Pereira et al. (2016) CNN Tumor segmentation; CNN on multiple modality input
Shakeri et al. (2016) CNN Anatomical segmentation; FCN followed by Markov random fields
Zhao and Jia (2016) CNN Tumor segmentation; Multi-scale CNN with a late fusion architecture
Lesion/tumor detection and classification
Pan et al. (2015) CNN Tumor grading; 2D tumor patch classification using a CNN
Dou et al. (2015) ISA Microbleed detection; 3D stacked Independent Subspace Analysis for candidate feature extraction, SVM classification
Dou et al. (2016c) CNN Microbleed detection; 3D FCN for candidate segmentation followed by a 3D CNN as false positive reduction
Ghafoorian et al. (2017) CNN Lacune detection; FCN for candidate segmentation then a multi-scale 3D CNN with anatomical features as false positive reduction
Survival/disease activity/development prediction
Kawahara et al. (2016b) CNN Neurodevelopment prediction; CNN with specially-designed edge-to-edge, edge-to-node and node-to-graph conv. layers for brain nets
Nie et al. (2016c) CNN Survival prediction; features from a multi-modal 3D CNN are fused with hand-crafted features to train an SVM
Yoo et al. (2016) CNN Disease activity prediction; Training a CNN on the Euclidean distance transform of the lesion masks as the input
van der Burgh et al. (2017) CNN Survival prediction; DBN on MRI and fusing it with clinical characteristics and structural connectivity data
Image construction/enhancement
Li et al. (2014) CNN Image construction; 3D CNN for constructing PET from MR images
Bahrami et al. (2016) CNN Image construction; 3D CNN for constructing 7T-like images from 3T MRI
Benou et al. (2016) SAE Denoising DCE-MRI; using an ensemble of denoising SAE (pretrained with RBMs)
Golkov et al. (2016) CNN Image construction; Per-pixel neural network to predict complex diffusion parameters based on fewer measurements
Hoffmann et al. (2016) ANN Image construction; Deep neural nets with SRelu nonlinearity for thermal image construction
Nie et al. (2016a) CNN Image construction; 3D fully convolutional network for constructing CT from MR images
Sevetlidis et al. (2016) ANN Image construction; Encoder-decoder network for synthesizing one MR modality from another
Other
Brosch et al. (2014) DBN Manifold Learning; DBN with conv. RBM layers for modeling the variability in brain morphology and lesion distribution in MS
Cheng et al. (2015) ANN Similarity measurement; neural network fusing the moving and reference image patches, pretrained with SAE
Huang et al. (2016) RBM fMRI blind source separation; RBM for both internal and functional interaction-induced latent sources detection
Simonovsky et al. (2016) CNN Similarity measurement; 3D CNN estimating similarity between reference and moving images stacked in the input
Wu et al. (2013) ISA Correspondence detection in deformable registration; stacked convolutional ISA for unsupervised feature learning
Yang et al. (2016d) CNN Image registration; Conv. encoder-decoder net. predicting momentum in x and y directions, given the moving and fixed image patches

tem to generate descriptions from chest X-rays. A CNN was employed to generate a representation of an image one label at a time, which was then used to train an RNN to generate a sequence of MeSH keywords. Kisilev et al. (2016) used a completely different approach and predicted categorical BI-RADS descriptors for breast lesions. In their work they focused on three descriptors used in mammography: shape, margin, and density,
Table 2: Overview of papers using deep learning techniques for retinal image analysis. All works use CNNs.

Color fundus images: segmentation of anatomical structures and quality assessment


Fu et al. (2016b) Blood vessel segmentation; CNN combined with CRF to model long-range pixel interactions
Fu et al. (2016a) Blood vessel segmentation; extending the approach by Fu et al. (2016b) by reformulating CRF as RNN
Mahapatra et al. (2016) Image quality assessment; classification output using CNN-based features combined with the output using saliency maps
Maninis et al. (2016) Segmentation of blood vessels and optic disk; VGG-19 network extended with specialized layers for each segmentation task
Wu et al. (2016) Blood vessel segmentation; patch-based CNN followed by mapping PCA solution of last layer feature maps to full segmentation
Zilly et al. (2017) Segmentation of the optic disk and the optic cup; simple CNN with filters sequentially learned using boosting
Color fundus images: detection of abnormalities and diseases
Chen et al. (2015d) Glaucoma detection; end-to-end CNN, the input is a patch centered at the optic disk
Abràmoff et al. (2016) Diabetic retinopathy detection; end-to-end CNN, outperforms traditional method, evaluated on a public dataset
Burlina et al. (2016) Age-related macular degeneration detection; uses overfeat pretrained network for feature extraction
van Grinsven et al. (2016) Hemorrhage detection; CNN dynamically trained using selective data sampling to perform hard negative mining
Gulshan et al. (2016) Diabetic retinopathy detection; Inception network, performance comparable to a panel of seven certified ophthalmologists
Prentasic and Loncaric (2016) Hard exudate detection; end-to-end CNN combined with the outputs of traditional classifiers for detection of landmarks
Worrall et al. (2016) Retinopathy of prematurity detection; fine-tuned ImageNet trained GoogLeNet, feature map visualization to highlight disease
Work in other imaging modalities
Gao et al. (2015) Cataract classification in slit lamp images; CNN followed by a set of recursive neural networks to extract higher order features
Schlegl et al. (2015) Fluid segmentation in OCT; weakly supervised CNN improved with semantic descriptors from clinical reports
Prentasic et al. (2016) Blood vessel segmentation in OCT angiography; simple CNN, segmentation of several capillary networks

each of which has its own class label. The system was fed with the image data and region proposals and predicted the correct label for each descriptor (e.g. for shape either oval, round, or irregular).

Given the wealth of data that is available in PACS systems in terms of images and corresponding diagnostic reports, it seems like an ideal avenue for future deep learning research. One could expect that advances in captioning natural images will in time be applied to these data sets as well.

4. Anatomical application areas

This section presents an overview of deep learning contributions to the various application areas in medical imaging. We highlight some key contributions and discuss performance of systems on large data sets and on public challenge data sets. All these challenges are listed on http://www.grand-challenge.org.

4.1. Brain

DNNs have been extensively used for brain image analysis in several different application domains (Table 1). A large number of studies address classification of Alzheimer’s disease and segmentation of brain tissue and anatomical structures (e.g. the hippocampus). Other important areas are detection and segmentation of lesions (e.g. tumors, white matter lesions, lacunes, micro-bleeds).

Apart from the methods that aim for a scan-level classification (e.g. Alzheimer diagnosis), most methods learn mappings from local patches to representations and subsequently from representations to labels. However, the local patches might lack the contextual information required for tasks where anatomical information is paramount (e.g. white matter lesion segmentation). To tackle this, Ghafoorian et al. (2016b) used non-uniformly sampled patches, gradually lowering the sampling rate toward the patch sides to span a larger context. An alternative strategy used by many groups is multi-scale analysis and a fusion of representations in a fully-connected layer.

Even though brain images are 3D volumes in all surveyed studies, most methods work in 2D, analyzing the 3D volumes slice-by-slice. This is often motivated by either the reduced computational requirements or the thick slices relative to in-plane resolution in some data sets. More recent publications have also employed 3D networks.

DNNs have completely taken over many brain image analysis challenges. In the 2014 and 2015 brain tumor segmentation challenges (BRATS), the 2015 longitudinal multiple sclerosis lesion segmentation challenge, the 2015 ischemic stroke lesion segmentation challenge (ISLES), and the 2013 MR brain image segmentation challenge (MRBrains), the top ranking teams to date have all used CNNs. Almost all of the aforementioned methods concentrate on brain MR images. We expect that other brain imaging modalities such as CT and US can also benefit from deep learning based analysis.

4.2. Eye

Ophthalmic imaging has developed rapidly over the past years, but only recently are deep learning algorithms being applied to eye image understanding. As summarized in Table 2, most works employ simple
Table 3: Overview of papers using deep learning techniques for chest x-ray image analysis.

Reference Application Remarks


Lo et al. (1995) Nodule detection Classifies candidates from small patches with two-layer CNN, each layer with twelve 5 × 5 filters
Anavi et al. (2015) Image retrieval Combines classical features with those from pre-trained CNN for image retrieval using SVM
Bar et al. (2015) Pathology detection Features from a pre-trained CNN and low level features are used to detect various diseases
Anavi et al. (2016) Image retrieval Continuation of Anavi et al. (2015), adding age and gender as features
Bar et al. (2016) Pathology detection Continuation of Bar et al. (2015), more experiments and adding feature selection
Cicero et al. (2016) Pathology detection GoogLeNet CNN detects five common abnormalities, trained and validated on a large data set
Hwang et al. (2016) Tuberculosis detection Processes entire radiographs with a pre-trained fine-tuned network with 6 convolution layers
Kim and Hwang (2016) Tuberculosis detection MIL framework produces heat map of suspicious regions via deconvolution
Shin et al. (2016a) Pathology detection CNN detects 17 diseases, large data set (7k images), recurrent networks produce short captions
Rajkomar et al. (2017) Frontal/lateral classification Pre-trained CNN performs frontal/lateral classification task
Yang et al. (2016c) Bone suppression Cascade of CNNs at increasing resolution learns bone images from gradients of radiographs
Wang et al. (2016a) Nodule classification Combines classical features with CNN features from pre-trained ImageNet CNN

Table 4: Overview of papers using deep learning techniques for chest CT image analysis.

Reference Application; remarks


Segmentation
Charbonnier et al. (2017) Airway segmentation where multi-view CNN classifies candidate branches as true airways or leaks
Nodule detection and analysis
Ciompi et al. (2015) Used a standard feature extractor and a pre-trained CNN to classify detected lesions as benign peri-fissural nodules
van Ginneken et al. (2015) Detects nodules with pre-trained CNN features from orthogonal patches around candidate, classified with SVM
Shen et al. (2015b) Three CNNs at different scales estimate nodule malignancy scores of radiologists (LIDC-IDRI data set)
Chen et al. (2016e) Combines features from CNN, SDAE and classical features to characterize nodules from LIDC-IDRI data set
Ciompi et al. (2016) Multi-stream CNN to classify nodules into subtypes: solid, part-solid, non-solid, calcified, spiculated, perifissural
Dou et al. (2016b) Uses 3D CNN around nodule candidates; ranks #1 in LUNA16 nodule detection challenge
Li et al. (2016a) Detects nodules with 2D CNN that processes small patches around a nodule
Setio et al. (2016) Detects nodules with end-to-end trained multi-stream CNN with 9 patches per candidate
Shen et al. (2016) 3D CNN classifies volume centered on nodule as benign/malignant, results are combined to patient level prediction
Sun et al. (2016b) Same dataset as Shen et al. (2015b), compares CNN, DBN, SDAE and classical computer-aided diagnosis schemes
Teramoto et al. (2016) Combines features extracted from 2 orthogonal CT patches and a PET patch
Interstitial lung disease
Anthimopoulos et al. (2016) Classification of 2D patches into interstitial lung texture classes using a standard CNN
Christodoulidis et al. (2017) 2D interstitial pattern classification with CNNs pre-trained with a variety of texture data sets
Gao et al. (2016c) Propagates manually drawn segmentations using CNN and CRF for more accurate interstitial lung disease reference
Gao et al. (2016a) AlexNet applied to large parts of 2D CT slices to detect presence of interstitial patterns
Gao et al. (2016b) Uses regression to predict area covered in 2D slice with a particular interstitial pattern
Tarando et al. (2016) Combines existing computer-aided diagnosis system and CNN to classify lung texture patterns
van Tulder and de Bruijne (2016) Classification of lung texture and airways using an optimal set of filters derived from DBNs and RBMs
Other applications
Tajbakhsh et al. (2015a) Multi-stream CNN to detect pulmonary embolism from candidates obtained from a tobogganing algorithm
Carneiro et al. (2016) Predicts 5-year mortality from thick slice CT scans and segmentation masks
de Vos et al. (2016a) Identifies the slice of interest and determines the distance between CT slices

CNNs for the analysis of color fundus imaging (CFI). A wide variety of applications are addressed: segmentation of anatomical structures, segmentation and detection of retinal abnormalities, diagnosis of eye diseases, and image quality assessment.

In 2015, Kaggle organized a diabetic retinopathy detection competition: Over 35,000 color fundus images were provided to train algorithms to predict the severity of disease in 53,000 test images. The majority of the 661 teams that entered the competition applied deep learning and four teams achieved performance above that of humans, all using end-to-end CNNs. Recently Gulshan et al. (2016) performed a thorough analysis of the performance of a Google Inception v3 network for diabetic retinopathy detection, showing performance comparable to a panel of seven certified ophthalmologists.

4.3. Chest

In thoracic image analysis of both radiography and computed tomography, the detection, characterization, and classification of nodules is the most commonly addressed application. Many works add features derived from deep networks to existing feature sets or compare
Table 5: Overview of papers using deep learning for digital pathology images. The staining and imaging modality abbreviations used in the table are
as follows: H&E: hematoxylin and eosin staining, TIL: Tumor-infiltrating lymphocytes, BCC: Basal cell carcinoma, IHC: immunohistochemistry,
RM: Romanowsky, EM: Electron microscopy, PC: Phase contrast, FL: Fluorescent, IFL: Immunofluorescent, TPM: Two-photon microscopy, CM:
Confocal microscopy, Pap: Papanicolaou.

Reference Topic Staining/Modality Method


Nucleus detection, segmentation, and classification
Cireşan et al. (2013) Mitosis detection H&E CNN-based pixel classifier
Cruz-Roa et al. (2013) Detection of basal cell carcinoma H&E Convolutional auto-encoder neural network
Malon and Cosatto (2013) Mitosis detection H&E Combines shape-based features with CNN
Wang et al. (2014) Mitosis detection H&E Cascaded ensemble of CNN and handcrafted features
Ferrari et al. (2015) Bacterial colony counting Culture plate CNN-based patch classifier
Ronneberger et al. (2015) Cell segmentation EM U-Net with deformation augmentation
Shkolyar et al. (2015) Mitosis detection Live-imaging CNN-based patch classifier
Song et al. (2015) Segmentation of cytoplasm and nuclei H&E Multi-scale CNN and graph-partitioning-based method
Xie et al. (2015a) Nucleus detection Ki-67 CNN model that learns the voting offset vectors and voting confidence
Xie et al. (2015b) Nucleus detection H&E, Ki-67 CNN-based structured regression model for cell detection
Akram et al. (2016) Cell segmentation FL, PC, H&E fCNN for cell bounding box proposal and CNN for segmentation
Albarqouni et al. (2016) Mitosis detection H&E Incorporated ‘crowd sourcing’ layer into the CNN framework
Bauer et al. (2016) Nucleus classification IHC CNN-based patch classifier
Chen et al. (2016b) Mitosis detection H&E Deep regression network (DRN)
Gao et al. (2016e) Nucleus classification IFL Classification of Hep2-cells with CNN
Han et al. (2016) Nucleus classification IFL Classification of Hep2-cells with CNN
Janowczyk et al. (2016b) Nucleus segmentation H&E Resolution adaptive deep hierarchical learning scheme
Kashif et al. (2016) Nucleus detection H&E Combination of CNN and hand-crafted features
Mao and Yin (2016) Mitosis detection PC Hierarchical CNNs for patch sequence classification
Mishra et al. (2016) Classification of mitochondria EM CNN-based patch classifier
Phan et al. (2016) Nucleus classification FL Classification of Hep2-cells using transfer learning (pre-trained CNN)
Romo-Bucheli et al. (2016) Tubule nuclei detection H&E CNN-based classification of pre-selected candidate nuclei
Sirinukunwattana et al. (2016) Nucleus detection and classification H&E CNN with spatially constrained regression
Song et al. (2017) Cell segmentation H&E Multi-scale CNN
Turkki et al. (2016) TIL detection H&E CNN-based classification of superpixels
Veta et al. (2016) Nuclear area measurement H&E A CNN directly measures nucleus area without requiring segmentation
Wang et al. (2016d) Subtype cell detection H&E Combination of two CNNs for joint cell detection and classification
Xie et al. (2016a) Nucleus detection and cell counting FL and H&E Microscopy cell counting with fully convolutional regression networks
Xing et al. (2016) Nucleus segmentation H&E, IHC CNN and selection-based sparse shape model
Xu et al. (2016b) Nucleus detection H&E Stacked sparse auto-encoders (SSAE)
Xu and Huang (2016) Nucleus detection Various General deep learning framework to detect cells in whole-slide images
Yang et al. (2016b) Glial cell segmentation TPM fCNN with an iterative k-terminal cut algorithm
Yao et al. (2016) Nucleus classification H&E Classifies cellular tissue into tumor, lymphocyte, and stromal
Zhao et al. (2016) Classification of leukocytes RM CNN-based patch classifier
Large organ segmentation
Ciresan et al. (2012) Segmentation of neuronal membranes EM Ensemble of several CNNs with different architectures
Kainz et al. (2015) Segmentation of colon glands H&E Used two CNNs to segment glands and their separating structures
Apou et al. (2016) Detection of lobular structures in breast IHC Combined the outputs of a CNN and a texture classification system
BenTaieb and Hamarneh (2016) Segmentation of colon glands H&E fCNN with a loss accounting for smoothness and object interactions
BenTaieb et al. (2016) Segmentation of colon glands H&E A multi-loss fCNN to perform both segmentation and classification
Chen et al. (2016d) Neuronal membrane and fungus segmentation EM Combination of bi-directional LSTM-RNNs and kU-Nets
Chen et al. (2017) Segmentation of colon glands H&E Deep contour-aware CNN
Çiçek et al. (2016) Segmentation of xenopus kidney CM 3D U-Net
Drozdzal et al. (2016) Segmentation of neuronal structures EM fCNN with skip connections
Li et al. (2016b) Segmentation of colon glands H&E Compares CNN with an SVM using hand-crafted features
Teikari et al. (2016) Volumetric vascular segmentation FL Hybrid 2D-3D CNN architecture
Wang et al. (2016c) Segmentation of messy and muscle regions H&E Conditional random field jointly trained with an fCNN
Xie et al. (2016b) Perimysium segmentation H&E 2D spatial clockwork RNN
Xu et al. (2016d) Segmentation of colon glands H&E Used three CNNs to predict gland and contour pixels
Xu et al. (2016a) Segmenting epithelium & stroma H&E, IHC CNNs applied to over-segmented image regions (superpixels)
Detection and classification of disease
Cruz-Roa et al. (2014) Detection of invasive ductal carcinoma H&E CNN-based patch classifier
Xu et al. (2014) Patch-level classification of colon cancer H&E Multiple instance learning framework with CNN features
Bychkov et al. (2016) Outcome prediction of colorectal cancer H&E Extracted CNN features from epithelial tissue for prediction
Chang et al. (2017) Multiple cancer tissue classification Various Transfer learning using multi-scale convolutional sparse coding
Günhan Ertosun and Rubin (2015) Grading glioma H&E Ensemble of CNNs
Källén et al. (2016) Predicting Gleason score H&E OverFeat pre-trained network as feature extractor
Kim et al. (2016a) Thyroid cytopathology classification H&E, RM & Pap Fine-tuning pre-trained AlexNet
Litjens et al. (2016) Detection of prostate and breast cancer H&E fCNN-based pixel classifier
Quinn et al. (2016) Malaria, tuberculosis and parasites detection Light microscopy CNN-based patch classifier
Rezaeilouyeh et al. (2016) Gleason grading and breast cancer detection H&E The system incorporates shearlet features inside a CNN
Schaumberg et al. (2016) SPOP mutation prediction of prostate cancer H&E Ensemble of ResNets
Wang et al. (2016b) Metastases detection in lymph node H&E Ensemble of CNNs with hard negative mining
Other pathology applications
Janowczyk et al. (2016a) Stain normalization H&E Used SAE for classifying tissue and subsequent histogram matching
Janowczyk and Madabhushi (2016) Deep learning tutorial Various Covers different detection, segmentation, and classification tasks
Sethi et al. (2016) Comparison of normalization algorithms H&E Presents effectiveness of stain normalization for application of CNNs

CNNs with classical machine learning approaches us- and AMIDA 2013, GLAS for gland segmentation and,
ing handcrafted features. In chest X-ray, several groups CAMELYON16 and TUPAC for processing breast can-
detect multiple diseases with a single system. In CT cer tissue samples.
the detection of textural patterns indicative of intersti- In both ICPR 2012 and the AMIDA13 challenges on
tial lung diseases is also a popular research topic. mitosis detection the IDSIA team outperformed other
Chest radiography is the most common radiological algorithms with a CNN based approach (Cireşan et al.,
exam; several works use a large set of images with text 2013). The same team had the highest performing sys-
reports to train systems that combine CNNs for image tem in EM 2012 (Ciresan et al., 2012) for 2D segmen-
analysis and RNNs for text analysis. This is a branch of tation of neuronal processes. In their approach, the task
research we expect to see more of in the near future. of segmenting membranes of neurons was performed
In a recent challenge for nodule detection in CT, by mild smoothing and thresholding of the output of a
LUNA16, CNN architectures were used by all top per- CNN, which computes pixel probabilities.
forming systems. This is in contrast with a previ-
GLAS addressed the problem of gland instance seg-
ous lung nodule detection challenge, ANODE09, where
mentation in colorectal cancer tissue samples. Xu et al.
handcrafted features were used to classify nodule candi-
(2016d) achieved the highest rank using three CNN
dates. The best systems in LUNA16 still rely on nodule
models. The first CNN classifies pixels as gland ver-
candidates computed by rule-based image processing,
sus non-gland. From each feature map of the first
but systems that use deep networks for candidate detection also performed very well (e.g. U-net). Estimating the probability that an individual has lung cancer from a CT scan is an important topic: it is the objective of the Kaggle Data Science Bowl 2017, with $1 million in prizes and more than one thousand participating teams.

4.4. Digital pathology and microscopy
The growing availability of large-scale gigapixel whole-slide images (WSI) of tissue specimens has made digital pathology and microscopy a very popular application area for deep learning techniques. The developed techniques applied to this domain focus on three broad challenges: (1) detecting, segmenting, or classifying nuclei, (2) segmentation of large organs, and (3) detecting and classifying the disease of interest at the lesion- or WSI-level. Table 5 presents an overview for each of these categories.
Deep learning techniques have also been applied for normalization of histopathology images. Color normalization is an important research area in histopathology image analysis. In Janowczyk et al. (2016a), a method for stain normalization of hematoxylin and eosin (H&E) stained histopathology images was presented based on deep sparse auto-encoders. Recently, the importance of color normalization was demonstrated by Sethi et al. (2016) for CNN-based tissue classification in H&E stained images.
The introduction of grand challenges in digital pathology has fostered the development of computerized digital pathology techniques. The challenges that evaluated existing and new approaches for analysis of digital pathology images are: EM segmentation challenge 2012 for the 2D segmentation of neuronal processes, mitosis detection challenges in ICPR 2012
CNN, edge information is extracted using the holistically nested edge technique, which uses side convolutions to produce an edge map. Finally, a third CNN merges gland and edge maps to produce the final segmentation.
CAMELYON16 was the first challenge to provide participants with WSIs. Contrary to other medical imaging applications, the availability of large amounts of annotated data in this challenge allowed for training very deep models such as 22-layer GoogLeNet (Szegedy et al., 2014), 16-layer VGG-Net (Simonyan and Zisserman, 2014), and 101-layer ResNet (He et al., 2015). The top-five performing systems used one of these architectures. The best performing solution in the CAMELYON16 challenge was presented in Wang et al. (2016b). This method is based on an ensemble of two GoogLeNet architectures, one trained with and one without hard-negative mining to tackle the challenge. The latest submission of this team using the WSI standardization algorithm by Ehteshami Bejnordi et al. (2016) achieved an AUC of 0.9935 for task 2, which outperformed the AUC of a pathologist (AUC = 0.966) who independently scored the complete test set.
The recently held TUPAC challenge addressed detection of mitosis in breast cancer tissue, and prediction of tumor grading at the WSI level. The top performing system by Paeng et al. (2016) achieved the highest performance in all tasks. The method has three main components: (1) finding high cell density regions, (2) using a CNN to detect mitoses in the regions of interest, (3) converting the results of mitosis detection to a feature vector for each WSI and using an SVM classifier to compute the tumor proliferation and molecular data scores.
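The CAMELYON16 results quoted above are reported as AUC values. As a reminder of what that number measures, the following is a minimal sketch, assuming binary labels and real-valued scores. The function name `roc_auc` and the toy data are illustrative assumptions and are not taken from any of the cited systems. It computes the AUC as the fraction of (positive, negative) pairs that the scores rank correctly, counting ties as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: sequence of 0/1 ground-truth values (1 = positive).
    scores: sequence of real-valued model outputs, higher = more positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0   # positive scored above negative
            elif p == n:
                correctly_ranked += 0.5   # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy example with made-up scores: 3 of the 4 pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The quadratic pair loop is chosen for readability; for test sets of realistic size a sort-based implementation would be preferable.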
Table 6: Overview of papers using deep learning techniques for breast image analysis. MG = mammography; TS = tomosynthesis; US = ultrasound;
ADN = Adaptive Deconvolution Network.

Reference Modality Method Application; remarks


Sahiner et al. (1996) MG CNN First application of a CNN to mammography
Jamieson et al. (2012) MG, US ADN Four layer ADN, an early form of CNN for mass classification
Fonseca et al. (2015) MG CNN Pre-trained network extracted features classified with SVM for breast density estimation
Akselrod-Ballin et al. (2016) MG CNN Use a modified region proposal CNN (R-CNN) for the localization and classification of masses
Arevalo et al. (2016) MG CNN Lesion classification, combination with hand-crafted features gave the best performance
Dalmis et al. (2017) MRI CNN Breast and fibroglandular tissue segmentation
Dubrovina et al. (2016) MG CNN Tissue classification using regular CNNs
Dhungel et al. (2016) MG CNN Different CNNs combined with hand-crafted features
Fotin et al. (2016) TS CNN Improved the state of the art for mass detection in tomosynthesis
Hwang and Kim (2016) MG CNN Weakly supervised CNN for localization of masses
Huynh et al. (2016) MG CNN Pre-trained CNN on natural image patches applied to mass classification
Kallenberg et al. (2016) MG SAE Unsupervised CNN feature learning with SAE for breast density classification
Kisilev et al. (2016) MG CNN R-CNN combined with multi-class loss trained on semantic descriptions of potential masses
Kooi et al. (2016) MG CNN Improved the state of the art for mass detection and showed human-level performance on a patch level
Qiu et al. (2016) MG CNN CNN for direct classification of future risk of developing cancer based on negative mammograms
Samala et al. (2016a) TS CNN Microcalcification detection
Samala et al. (2016b) TS CNN Pre-trained CNN on mammographic masses transferred to tomosynthesis
Sun et al. (2016a) MG CNN Semi-supervised CNN for classification of masses
Zhang et al. (2016c) US RBM Classification benign vs. malignant with shear wave elastography
Kooi et al. (2017) MG CNN Pre-trained CNN on mass/normal patches to discriminate malignant masses from (benign) cysts
Wang et al. (2017) MG CNN Detection of cardiovascular disease based on vessel calcification

Table 7: Overview of papers using deep learning techniques for cardiac image analysis.

Reference Modality Method Application; remarks


Emad et al. (2015) MRI CNN Left ventricle slice detection; simple CNN indicates if structure is present
Avendi et al. (2016) MRI CNN Left ventricle segmentation; AE used to initialize filters because training data set was small
Kong et al. (2016) MRI RNN Identification of end-diastole and end-systole frames from cardiac sequences
Oktay et al. (2016) MRI CNN Super-resolution; U-net/ResNet hybrid, compares favorably with standard superresolution methods
Poudel et al. (2016) MRI RNN Left ventricle segmentation; RNN processes stack of slices, evaluated on several public datasets
Rupprecht et al. (2016) MRI CNN Cardiac structure segmentation; patch-based CNNs integrated in active contour framework
Tran (2016) MRI CNN Left and right ventricle segmentation; 2D fCNN architecture, evaluated on several public data sets
Yang et al. (2016a) MRI CNN Left ventricle segmentation; CNN combined with multi-atlas segmentation
Zhang et al. (2016b) MRI CNN Identifying presence of apex and base slices in cardiac exam for quality assessment
Ngo et al. (2017) MRI DBN Left ventricle segmentation; DBN is used to initialize a level set framework
Carneiro et al. (2012) US DBN Left ventricle segmentation; DBN embedded in system using landmarks and non-rigid registration
Carneiro and Nascimento (2013) US DBN Left ventricle tracking; extension of Carneiro et al. (2012) for tracking
Chen et al. (2016c) US CNN Structure segmentation in 5 different 2D views; uses transfer learning
Ghesu et al. (2016b) US CNN 3D aortic valve detection and segmentation; uses shallow and deeper sparse networks
Nascimento and Carneiro (2016) US DBN Left ventricle segmentation; DBN applied to patches steers multi-atlas segmentation process
Moradi et al. (2016a) US CNN Automatic generation of text descriptions for Doppler US images of cardiac valves using doc2vec
Gülsün et al. (2016) CT CNN Coronary centerline extraction; CNN classifies paths as correct or leakages
Lessmann et al. (2016) CT CNN Coronary calcium detection in low dose ungated CT using multi-stream CNN (3 views)
Moradi et al. (2016b) CT CNN Labeling of 2D slices from cardiac CT exams; comparison with handcrafted features
de Vos et al. (2016b) CT CNN Detect bounding boxes by slice classification and combining 3 orthogonal 2D CNNs
Wolterink et al. (2016) CT CNN Coronary calcium detection in gated CTA; compares 3D CNN with multi-stream 2D CNNs
Zreik et al. (2016) CT CNN Left ventricle segmentation; multi-stream CNN (3 views) voxel classification
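Several entries in Table 7 describe multi-stream CNNs that look at three orthogonal 2D views of a 3D scan. The sketch below shows one way such views could be pulled from a volume. It assumes the volume is a nested list indexed as `volume[z][y][x]`, and the helper name `orthogonal_slices` is hypothetical. The cited papers typically crop patches around the voxel of interest and may resample the data, both of which are omitted here.

```python
def orthogonal_slices(volume, z, y, x):
    """Return the axial, coronal and sagittal planes through voxel (z, y, x).

    volume is assumed to be a nested list indexed as volume[z][y][x].
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if not (0 <= z < depth and 0 <= y < height and 0 <= x < width):
        raise IndexError("voxel lies outside the volume")

    # Axial plane: z fixed, rows run over y, columns over x.
    axial = [row[:] for row in volume[z]]
    # Coronal plane: y fixed, rows run over z, columns over x.
    coronal = [volume[k][y][:] for k in range(depth)]
    # Sagittal plane: x fixed, rows run over z, columns over y.
    sagittal = [[volume[k][j][x] for j in range(height)] for k in range(depth)]
    return axial, coronal, sagittal


# Toy 2x2x2 volume whose voxel values encode their own coordinates (100z + 10y + x).
toy_volume = [[[100 * k + 10 * j + i for i in range(2)] for j in range(2)]
              for k in range(2)]
axial, coronal, sagittal = orthogonal_slices(toy_volume, z=1, y=0, x=1)
print(axial)     # [[100, 101], [110, 111]]
print(coronal)   # [[0, 1], [100, 101]]
print(sagittal)  # [[1, 11], [101, 111]]
```

Each of the three planes would then be fed to its own 2D stream, with the stream outputs combined later in the network or by a separate classifier.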

4.5. Breast
One of the earliest DNN applications from Sahiner et al. (1996) was on breast imaging. Recently, interest has returned, which resulted in significant advances over the state of the art, achieving the performance of human readers on ROIs (Kooi et al., 2016). Since most breast imaging techniques are two dimensional, methods successful in natural images can easily be transferred. With one exception, the only task addressed is the detection of breast cancer; this consisted of three subtasks: (1) detection and classification of mass-like lesions, (2) detection and classification of micro-calcifications, and (3) breast cancer risk scoring of images. Mammography is by far the most common modality and has consequently enjoyed the most attention. Work on tomosynthesis, US, and shear wave elastography is still scarce, and we have only one paper that analyzed breast MRI with deep learning; these other modalities will likely receive more attention in the next few years. Table 6 summarizes the literature and main messages.
Since many countries have screening initiatives for breast cancer, there should be massive amounts of data available, especially for mammography, and therefore enough opportunities for deep models to flourish. Unfortunately, large public digital databases are unavailable and consequently older scanned screen-film data sets are still in use. Challenges such as the recently launched DREAM challenge have not yet had the desired success.
As a result, many papers used small data sets, resulting in mixed performance. Several projects have addressed this issue by exploring semi-supervised learning (Sun et al., 2016a), weakly supervised learning (Hwang and Kim, 2016), and transfer learning (Kooi et al., 2017; Samala et al., 2016b). Another method combines deep models with handcrafted features (Dhungel et al., 2016), which have been shown to be complementary still, even for very big data sets (Kooi et al., 2016). State-of-the-art techniques for mass-like lesion detection and classification tend to follow a two-stage pipeline with a candidate detector; this design reduces the image to a set of potentially malignant lesions, which are fed to a deep CNN (Fotin et al., 2016; Kooi et al., 2016). Alternatives use a region proposal network (R-CNN) that bypasses the cascaded approach (Akselrod-Ballin et al., 2016; Kisilev et al., 2016).
When large data sets are available, good results can be obtained. At the SPIE Medical Imaging conference of 2016, a researcher from a leading company in the mammography CAD field told a packed conference room how a few weeks of experiments with a standard architecture (AlexNet) - trained on the company’s proprietary database - yielded a performance that was superior to what years of engineering handcrafted feature systems had achieved (Fotin et al., 2016).

4.6. Cardiac
Deep learning has been applied to many aspects of cardiac image analysis; the literature is summarized in Table 7. MRI is the most researched modality and left ventricle segmentation the most common task, but the applications are highly diverse: segmentation, tracking, slice classification, image quality assessment, automated calcium scoring and coronary centerline tracking, and super-resolution.
Most papers used simple 2D CNNs and analyzed the 3D and often 4D data slice by slice; the exception is Wolterink et al. (2016) where 3D CNNs were used. DBNs are used in four papers, but these all originated from the same author group. The DBNs are only used for feature extraction and are integrated in compound segmentation frameworks. Two papers are exceptional because they combined CNNs with RNNs: Poudel et al. (2016) introduced a recurrent connection within the U-net architecture to segment the left ventricle slice by slice and learn what information to remember from the previous slices when segmenting the next one. Kong et al. (2016) used an architecture with a standard 2D CNN and an LSTM to perform temporal regression to identify specific frames in a cardiac sequence. Many papers use publicly available data. The largest challenge in this field was the 2015 Kaggle Data Science Bowl where the goal was to automatically measure end-systolic and end-diastolic volumes in cardiac MRI. 192 teams competed for $200,000 in prize money and the top ranking teams all used deep learning, in particular fCNN or U-net segmentation schemes.

4.7. Abdomen
Most papers on the abdomen aimed to localize and segment organs, mainly the liver, kidneys, bladder, and pancreas (Table 8). Two papers address liver tumor segmentation. The main modality is MRI for prostate analysis and CT for all other organs. The colon is the only area where various applications were addressed, but always in a straightforward manner: a CNN was used as a feature extractor and these features were used for classification.
It is interesting to note that in two segmentation challenges - SLIVER07 for liver and PROMISE12 for prostate - more traditional image analysis methods were dominant up until 2016. In PROMISE12, the current second and third in rank among the automatic methods used active appearance models. The algorithm from IMorphics was ranked first for almost five years (now ranked second). However, a 3D fCNN similar to U-net (Yu et al., 2017) has recently taken the top position. This paper has an interesting approach where a sum operation was used instead of the concatenation operation used in U-net, making it a hybrid between a ResNet and U-net architecture. Also in SLIVER07 - a 10-year-old liver segmentation challenge - CNNs have started to appear in 2016 at the top of the leaderboard, replacing previously dominant methods focused on shape and appearance modeling.

4.8. Musculoskeletal
Musculoskeletal images have also been analyzed by deep learning algorithms for segmentation and identification of bone, joint, and associated soft tissue abnormalities in diverse imaging modalities. The works are summarized in Table 9.
Table 8: Overview of papers using deep learning for abdominal image analysis.

Reference Topic Modality Method Remarks


Multiple
Hu et al. (2016a) Segmentation CT CNN 3D CNN with time-implicit level sets for segmentation of liver, spleen and kidneys
Segmentation tasks in liver imaging
Li et al. (2015) Lesion CT CNN 2D 17×17 patch-based classification, Ben-Cohen et al. (2016) repeats this approach
Vivanti et al. (2015) Lesion CT CNN 2D CNN for liver tumor segmentation in follow-up CT taking baseline CT as input
Ben-Cohen et al. (2016) Liver CT CNN 2D CNN similar to U-net, but without cross-connections; good results on SLIVER07
Christ et al. (2016) Liver & tumor CT CNN U-net, cascaded fCNN and dense 3D CRF
Dou et al. (2016a) Liver CT CNN 3D CNN with conditional random field; good results on SLIVER07
Hoogi et al. (2016) Lesion CT/MRI CNN 2D CNN obtained probabilities are used to drive active contour model
Hu et al. (2016b) Liver CT CNN 3D CNN with surface evolution of a shape prior; good results on SLIVER07
Lu et al. (2017) Liver CT CNN 3D CNN, competitive results on SLIVER07
Kidneys
Lu et al. (2016) Localization CT CNN Combines local patch and slice based CNN
Ravishankar et al. (2016b) Localization US CNN Combines CNN with classical features to detect regions around kidneys
Thong et al. (2016) Segmentation CT CNN 2D CNN with 43×43 patches, tested on 20 scans
Pancreas segmentation in CT
Farag et al. (2015) Segmentation CT CNN Approach with elements similar to Roth et al. (2015b)
Roth et al. (2015b) Segmentation CT CNN Orthogonal patches from superpixel regions are fed into CNNs in three different ways
Cai et al. (2016a) Segmentation CT CNN 2 CNNs detect inside and boundary of organ, initializes conditional random field
Roth et al. (2016a) Segmentation CT CNN 2 CNNs detect inside and boundary of pancreas, combined with random forests
Colon
Tajbakhsh et al. (2015b) Polyp detection Colonoscopy CNN CNN computes additional features, improving existing scheme
Liu et al. (2016a) Colitis detection CT CNN Pre-trained ImageNet CNN generates features for linear SVM
Nappi et al. (2016) Polyp detection CT CNN Substantial reduction of false positives using pre-trained and fine-tuned CNN
Tachibana et al. (2016) Electronic cleansing CT CNN Voxel classification in dual energy CT, material other than soft tissue is removed
Zhang et al. (2017) Polyp detection Colonoscopy CNN Pre-trained ImageNet CNN for feature extraction, two SVMs for cascaded classification
Prostate segmentation in MRI
Liao et al. (2013) Application of stacked independent subspace analysis networks
Cheng et al. (2016b) CNN produces energy map for 2D slice based active appearance segmentation
Guo et al. (2016) Stacked sparse auto-encoders extract features from patches, input to atlas matching and a deformable model
Milletari et al. (2016b) 3D U-net based CNN architecture with objective function that directly optimizes Dice coefficient, ranks #5 in PROMISE12
Yu et al. (2017) 3D fully convolutional network, hybrid between a ResNet and U-net architecture, ranks #1 on PROMISE12
Prostate
Azizi et al. (2016) Lesion classification US DBN DBN learns features from temporal US to classify prostate lesions benign/malignant
Shah et al. (2016) CBIR MRI CNN Features from pre-trained CNN combined with features from hashing forest
Zhu et al. (2017) Lesion classification MRI SAE Learns features from multiple modalities, hierarchical random forest for classification
Bladder
Cha et al. (2016) Segmentation CT CNN CNN patch classification used as initialization for level set
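Table 8 mentions an objective function that directly optimizes the Dice coefficient. As a rough illustration of the idea, the sketch below computes the Dice overlap for binary masks and a differentiable "soft" variant for probability maps, both given as flat lists. The function names and the smoothing constant `eps` are assumptions made for this example. This is one common formulation and is not claimed to match the exact loss used in the cited paper.

```python
def dice_coefficient(pred, truth):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks (flat lists)."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


def soft_dice_loss(probs, truth, eps=1e-6):
    """1 - soft Dice, with foreground probabilities in place of hard labels.

    eps is a small smoothing term (an assumed value) that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, truth))
    denominator = sum(probs) + sum(truth) + eps
    return 1.0 - (2.0 * intersection + eps) / denominator


# Toy masks: prediction marks two voxels, ground truth marks one of them.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))  # 2*1 / (2+1), about 0.667
# A confident, perfect prediction gives a loss of (numerically) zero.
print(soft_dice_loss([1.0, 0.0], [1, 0]))
```

Because the soft version is a smooth function of the predicted probabilities, it can be minimized with gradient descent, which is what makes Dice usable as a training objective rather than only as an evaluation metric.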

Table 9: Overview of papers using deep learning for musculoskeletal image analysis.

Reference Modality Application; remarks


Prasoon et al. (2013) MRI Knee cartilage segmentation using multi-stream CNNs
Chen et al. (2015c) CT Vertebrae localization; joint learning of vertebrae appearance and dependency on neighbors using CNN
Roth et al. (2015c) CT Sclerotic metastases detection; random 2D views are analyzed by CNN and aggregated
Shen et al. (2015a) CT Vertebrae localization and segmentation; CNN for segmenting vertebrae and for center detection
Suzani et al. (2015) MRI Vertebrae localization, identification and segmentation; CNN used for initial localization
Yang et al. (2015) MRI Anatomical landmark detection; uses CNN for slice classification for presence of landmark
Antony et al. (2016) X-ray Osteoarthritis grading; pre-trained ImageNet CNN fine-tuned on knee X-rays
Cai et al. (2016b) CT, MRI Vertebrae localization; RBM determines position, orientation and label of vertebrae
Golan et al. (2016) US Hip dysplasia detection; CNN with adversarial component detects structures and performs measurements
Korez et al. (2016) MRI Vertebral bodies segmentation; voxel probabilities obtained with a 3D CNN are input to deformable model
Jamaludin et al. (2016) MRI Automatic spine scoring; VGG-19 CNN analyzes vertebral discs and finds lesion hotspots
Miao et al. (2016) X-ray Total Knee Arthroplasty kinematics by real-time 2D/3D registration using CNN
Roth et al. (2016c) CT Posterior-element fractures detection; CNN for 2.5D patch-based analysis
Štern et al. (2016) MRI Hand age estimation; 2D regression CNN analyzes 13 bones
Forsberg et al. (2017) MRI Vertebrae detection and labeling; outputs of two CNNs are input to graphical model
Spampinato et al. (2017) X-ray Skeletal bone age assessment; comparison among several deep learning approaches for the task at hand

A surprising number of complete applications with promising results are available; one that stands out is Jamaludin et al. (2016) who trained their system with 12K discs and claimed near-human performances across four different radiological scoring tasks.

4.9. Other
This final section lists papers that address multiple applications (Table 10) and a variety of other applications (Table 11).
It is remarkable that one single architecture or approach based on deep learning can be applied without modifications to different tasks; this illustrates the versatility of deep learning and its general applicability. In some works, pre-trained architectures are used, sometimes trained with images from a completely different domain. Several authors analyze the effect of fine-tuning a network by training it with a small data set of images from the intended application domain. Combining features extracted by a CNN with ‘traditional’ features is also commonly seen.
From Table 11, the large number of papers that address obstetric applications stands out. Most papers address the groundwork, such as selecting an appropriate frame from a US stream. More work on automated measurements with deep learning in these US sequences is likely to follow.
The second area where CNNs are rapidly improving the state of the art is dermoscopic image analysis. For a long time, diagnosing skin cancer from photographs was considered very difficult and out of reach for computers. Many studies focused only on images obtained with specialized cameras, and recent systems based on deep networks produced promising results. A recent work by Esteva et al. (2017) demonstrated excellent results with training a recent standard architecture (Google’s Inception v3) on a data set of both dermoscopic and standard photographic images. This data set was two orders of magnitude larger than what was used in the literature before. In a thorough evaluation, the proposed system performed on par with 30 board-certified dermatologists.

5. Discussion

Overview
From the 308 papers reviewed in this survey, it is evident that deep learning has pervaded every aspect of medical image analysis. This has happened extremely quickly: the vast majority of contributions, 242 papers, were published in 2016 or the first month of 2017. A large diversity of deep architectures are covered. The earliest studies used pre-trained CNNs as feature extractors. The fact that these pre-trained networks could simply be downloaded and directly applied to any medical image facilitated their use. Moreover, in this approach already existing systems based on handcrafted features could simply be extended. In the last two years, however, we have seen that end-to-end trained CNNs have become the preferred approach for medical imaging interpretation (see Figure 1). Such CNNs are often integrated into existing image analysis pipelines and replace traditional handcrafted machine learning methods. This is the approach followed by the largest group of papers in this survey and we can confidently state that this is the current standard practice.

Key aspects of successful deep learning methods
After reviewing so many papers one would expect to be able to distill the perfect deep learning method and architecture for each individual task and application area. Although convolutional neural networks (and derivatives) are now clearly the top performers in most medical image analysis competitions, one striking conclusion we can draw is that the exact architecture is not the most important determinant in getting a good solution. We have seen, for example in challenges like the Kaggle Diabetic Retinopathy Challenge, that many researchers use the exact same architectures, the same type of networks, but have widely varying results. A key aspect that is often overlooked is that expert knowledge about the task to be solved can provide advantages that go beyond adding more layers to a CNN. Groups and researchers that obtain good performance when applying deep learning algorithms often differentiate themselves in aspects outside of the deep network, like novel data preprocessing or augmentation techniques. An example is that the best performing method in the CAMELYON16 challenge improved significantly (AUC from 0.92 to 0.99) by adding a stain normalization pre-processing step to improve generalization without changing the CNN. Other papers focus on data augmentation strategies to make networks more robust, and they report that these strategies are essential to obtain good performance. An example is the elastic deformations that were applied in the original U-Net paper (Ronneberger et al., 2015).
Augmentation and pre-processing are, of course, not the only key contributors to good solutions. Several researchers have shown that designing architectures incorporating unique task-specific properties can obtain better results than straightforward CNNs. Two examples which we encountered several times are multi-view
Table 10: Overview of papers using a single deep learning approach for different tasks. DQN = Deep Q-Network

Reference Task Modality Method Remarks


Shin et al. (2013) Heart, kidney, liver segmentation MRI SAE SAE to learn temporal/spatial features on 2D + time DCE-MRI
Roth et al. (2015a) 2D slice classification CT CNN Automatically classifying slices in 5 anatomical regions
Shin et al. (2015) 2D key image labeling CT, MRI CNN Text and 2D image analysis on a diverse set of 780 thousand images
Cheng et al. (2016a) Various detection tasks US, CT AE, CNN Detection of breast lesions in US and pulmonary nodules in CT
Ghesu et al. (2016a) Landmark detection US, CT, MRI CNN, DQN Reinforcement learning with CNN features, cardiac MR/US, head&neck CT
Liu et al. (2016b) Image retrieval X-ray CNN Combines CNN feature with Radon transform, evaluated on IRMA database
Merkow et al. (2016) Vascular network segmentation CT, MRI CNN Framework to find various vascular networks
Moeskops et al. (2016b) Various segmentation tasks MRI, CT CNN Single architecture to segment 6 brain tissues, pectoral muscle & coronaries
Roth et al. (2016b) Various detection tasks CT CNN Multi-stream CNN to detect sclerotic lesions, lymph nodes and polyps
Shin et al. (2016b) Abnormality detection CT CNN Compares architectures for detecting interstitial disease and lymph nodes
Tajbakhsh et al. (2016) Abnormality detection CT, US CNN Compares pre-trained with fully trained networks for three detection tasks
Wang et al. (2016e) 2D key image labeling CT, MRI CNN Text concept clustering, related to Shin et al. (2015)
Yan et al. (2016) 2D slice classification CT CNN Automatically classifying CT slices in 12 anatomical regions
Zhou et al. (2016) Thorax-abdomen segmentation CT CNN 21 structures are segmented with 3 orthogonal 2D fCNNs and majority voting
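The last entry of Table 10 fuses the outputs of three orthogonal 2D networks by majority voting. A minimal sketch of voxel-wise voting is given below, assuming each network has already produced a flat list of integer labels of equal length. The function name `majority_vote` and the tie-breaking rule (the earliest-listed view wins) are assumptions for illustration; the source does not specify how ties are handled.

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse several per-voxel label lists into one by majority voting.

    label_maps: list of equally long label lists, one per view or network.
    Ties are resolved in favor of the label that appears first among the
    views (an assumed rule, chosen for determinism).
    """
    if not label_maps:
        raise ValueError("at least one label map is required")
    n_voxels = len(label_maps[0])
    if any(len(m) != n_voxels for m in label_maps):
        raise ValueError("all label maps must have the same length")

    fused = []
    for votes in zip(*label_maps):
        # most_common keeps first-encountered order among equal counts.
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Toy predictions from three views for four voxels.
axial_pred = [0, 1, 2, 1]
coronal_pred = [0, 1, 1, 2]
sagittal_pred = [1, 1, 2, 3]
print(majority_vote([axial_pred, coronal_pred, sagittal_pred]))  # [0, 1, 2, 1]
```

The last voxel receives three different votes, so the result there depends entirely on the tie-breaking rule; averaging class probabilities instead of hard labels is a common alternative that avoids such ties.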

Table 11: Overview of papers using deep learning for various image analysis tasks.

Reference Task Modality Method Remarks


Fetal imaging
Chen et al. (2015b) Frame labeling US CNN Locates abdominal plane from fetal ultrasound videos
Chen et al. (2015a) Frame labeling US RNN Same task as Chen et al. (2015b), now using RNNs
Baumgartner et al. (2016) Frame labeling US CNN Labeling 12 standard frames in 1003 mid pregnancy fetal US videos
Gao et al. (2016d) Frame labeling US CNN 4 class frame classification using transfer learning with pre-trained networks
Kumar et al. (2016) Frame labeling US CNN 12 standard anatomical planes, CNN extracts features for support vector machine
Rajchl et al. (2016b) Segmentation with non expert labels MRI CNN Crowd-sourcing annotation efforts to segment brain structures
Rajchl et al. (2016a) Segmentation given bounding box MRI CNN CNN and CRF for segmentation of structures
Ravishankar et al. (2016a) Quantification US CNN Hybrid system using CNN and texture features to find abdominal circumference
Yu et al. (2016b) Left ventricle segmentation US CNN Frame-by-frame segmentation by dynamically fine-tuning CNN to the latest frame
Dermatology
Codella et al. (2015) Melanoma detection in dermoscopic images CNN Features from pre-trained CNN combined with other features
Demyanov et al. (2016) Pattern identification in dermoscopic images CNN Comparison to simpler networks and simple machine learning
Kawahara et al. (2016a) 5 and 10-class classification photographic images CNN Pre-trained CNN for feature extraction at two image resolutions
Kawahara and Hamarneh (2016) 10-class classification photographic images CNN Extending Kawahara et al. (2016a) now training multi-resolution CNN end-to-end
Yu et al. (2016a) Melanoma detection in dermoscopic images CNN Deep residual networks for lesion segmentation and classification, winner ISIC16
Menegola et al. (2016) Classification of dermoscopic images CNN Various pre-training and fine-tuning strategies are compared
Esteva et al. (2017) Classification of photographic and dermoscopic images CNN Inception CNN trained on 129k images; compares favorably to 29 dermatologists
Lymph nodes
Roth et al. (2014) Lymph node detection CT CNN Introduces multi-stream framework of 2D CNNs with orthogonal patches
Barbu et al. (2016) Lymph node detection CT CNN Compares effect of different loss functions
Nogues et al. (2016) Lymph node detection CT CNN 2 fCNNs, for inside and for contour of lymph nodes, are combined in a CRF
Other
Wang et al. (2015) Wound segmentation photographs CNN Additional detection of infection risk and healing progress
Ypsilantis et al. (2015) Chemotherapy response prediction PET CNN CNN outperforms classical radiomics features in patients with esophageal cancer
Zheng et al. (2015) Carotid artery bifurcation detection CT CNN Two stage detection process, CNNs combined with Haar features
Alansary et al. (2016) Placenta segmentation MRI CNN 3D multi-stream CNN with extension for motion correction
Fritscher et al. (2016) Head&Neck tumor segmentation CT CNN 3 orthogonal patches in 2D CNNs, combined with other features
Jaumard-Hakoun et al. (2016) Tongue contour extraction US RBM Analysis of tongue motion during speech, combines auto-encoders with RBMs
Payer et al. (2016) Hand landmark detection X-ray CNN Various architectures are compared
Quinn et al. (2016) Disease detection microscopy CNN Smartphone mounted on microscope detects malaria, tuberculosis & parasite eggs
Smistad and Løvstakken (2016) Vessel detection and segmentation US CNN Femoral and carotid vessels analyzed with standard fCNN
Twinanda et al. (2017) Task recognition in laparoscopy Videos CNN Fine-tuned AlexNet applied to video frames
Xu et al. (2016c) Cervical dysplasia cervigrams CNN Fine-tuned pre-trained network with added non-imaging features
Xue et al. (2016) Esophageal microvessel classification Microscopy CNN Simple CNN used for feature extraction
Zhang et al. (2016a) Image reconstruction CT CNN Reconstructing from limited angle measurements, reducing reconstruction artefacts
Lekadir et al. (2017) Carotid plaque classification US CNN Simple CNN for characterization of carotid plaque composition in ultrasound
Ma et al. (2017) Thyroid nodule detection US CNN CNN and standard features combined for 2D US analysis

and multi-scale networks. Other, often underestimated, parts of network design are the network input size and receptive field (i.e. the area in input space that contributes to a single output unit). Input sizes should be selected considering, for example, the required resolution and context to solve a problem. One might increase the size of the patch to obtain more context, but without changing the receptive field of the network this might not be beneficial. As a standard sanity check, researchers could perform the same task themselves via visual assessment of the network input. If they, or domain experts, cannot achieve good performance, the chance that you need to modify your network input or architecture is high.
The last aspect we want to touch on is model hyper-parameter optimization (e.g. learning rate, dropout rate), which can help squeeze out extra performance from a network. We believe this is of secondary importance to performance compared with the previously discussed topics and training data quality. Disappointingly, no clear recipe can be given to obtain the best set of hyper-parameters as it is a highly empirical exercise. Most researchers fall back to an intuition-based random search (Bergstra and Bengio, 2012), which often seems to work well enough. Some basic tips have been covered before by Bengio (2012). Researchers have also looked at Bayesian methods for hyper-parameter optimization (Snoek et al., 2012), but this has not been applied in medical image analysis as far as we are aware.

Unique challenges in medical image analysis
It is clear that applying deep learning algorithms to medical image analysis presents several unique challenges. The lack of large training data sets is often mentioned as an obstacle. However, this notion is only partially correct. The use of PACS systems in radiology has been routine in most western hospitals for at least a decade and these are filled with millions of images. There are few other domains where this magnitude of imaging data, acquired for specific purposes, is digitally available in well-structured archives. PACS-like systems are not as broadly used for other specialties in medicine, like ophthalmology and pathology, but this is changing as imaging becomes more prevalent across disciplines. We are also seeing that increasingly large public data sets are made available: Esteva et al. (2017) used 18 public data sets and more than 10^5 training images; in the Kaggle diabetic retinopathy competition a similar number of retinal images were released; and several chest x-ray studies used more than 10^4 images.
The main challenge is thus not the availability of image data itself, but the acquisition of relevant annotations/labeling for these images. Traditionally PACS systems store free-text reports by radiologists describing their findings. Turning these reports into accurate annotations or structured labels in an automated manner
Given the complexity of leveraging free-text reports from PACS or similar systems to train algorithms, researchers generally request domain experts (e.g. radiologists, pathologists) to make task-specific annotations for the image data. Labeling a sufficiently large dataset can take a significant amount of time, and this is problematic. For example, to train deep learning systems for segmentation in radiology, 3D slice-by-slice annotations often need to be made and this is very time consuming. Thus, learning efficiently from limited data is an important area of research in medical image analysis. A recent paper focused on training a deep learning segmentation system for 3D segmentation using only sparse 2D segmentations (Çiçek et al., 2016). Multiple-instance or active learning approaches might also offer benefit in some cases, and have recently been pursued in the context of deep learning (Yan et al., 2016). One can also consider leveraging non-expert labels via crowd-sourcing (Rajchl et al., 2016b). Other potential solutions can be found within the medical field itself; in histopathology one can sometimes use specific immunohistochemical stains to highlight regions of interest, reducing the need for expert experience (Turkki et al., 2016).
Even when data is annotated by domain experts, label noise can be a significant limiting factor in developing algorithms, whereas in computer vision the noise in the labeling of images is typically relatively low. To give an example, a widely used dataset for evaluating image analysis algorithms to detect nodules in lung CT is the LIDC-IDRI dataset (Armato et al., 2011). In this dataset pulmonary nodules were annotated by four radiologists independently. Subsequently the readers reviewed each other's annotations but no consensus was forced. It turned out that the number of nodules they did not unanimously agree on was three times larger than the number they did fully agree on. Training a deep learning system on such data requires careful consideration of how to deal with noise and un-
requires sophisticated text-mining methods, which is an certainty in the reference standard. One could think
important field of study in itself where deep learning is of solutions like incorporating labeling uncertainty di-
also widely used nowadays. With the introduction of rectly in the loss function, but this is still an open chal-
structured reporting into several areas of medicine, ex- lenge.
tracting labels to data is expected to become easier in the In medical imaging often classification or segmenta-
future. For example, there are already papers appearing tion is presented as a binary task: normal versus ab-
which directly leverage BI-RADS categorizations by ra- normal, object versus background. However, this is of-
diologist to train deep networks (Kisilev et al., 2016) or ten a gross simplification as both classes can be highly
semantic descriptions in analyzing optical coherence to- heterogeneous. For example, the normal category of-
mography images (Schlegl et al., 2015). We expect the ten consists of completely normal tissue but also sev-
amount of research in optimally leveraging free-text and eral categories of benign findings, which can be rare,
structured reports for network training to increase in the and may occasionally include a wide variety of imag-
near future. ing artifacts. This often leads to systems that are ex-
25
tremely good at excluding the most common normal sification, where the anatomical location of the patch is
subclasses, but fail miserably on several rare ones. A often unknown to network. One solution would be to
straightforward solution would be to turn the deep learn- feed the entire image to the deep network and use a dif-
ing system in a multi-class system by providing it with ferent type of evaluation to drive learning, as was done
detailed annotations of all possible subclasses. Obvi- by, for example, Milletari et al. (2016b), who designed
ously this again compounds the issue of limited avail- a loss function based on the Dice coefficient. This also
ability of expert time for annotating and is therefore of- takes advantage of the fact that medical images are of-
ten simply not feasible. Some researchers have specif- ten acquired using a relatively static protocol, where the
ically looked into tackling this imbalance by incorpo- anatomy is always roughly in the same position and at
rating intelligence in the training process itself, by ap- the same scale. However, as mentioned above, if the
plying selective sampling (van Grinsven et al., 2016) or receptive field of the network is small feeding in the en-
hard negative mining (Wang et al., 2016b). However, tire image offers no benefit. Furthermore, feeding full
such strategies typically fail when there is substantial images to the network is not always feasible due to, for
noise in the reference standard. Additional methods for example, memory constraints. In some cases this might
dealing with within-class heterogeneity would be highly be solved in the near future due to advances in GPU
welcome. technology, but in others, for example digital pathology
Another data-related challenge is class imbalance. In with its gigapixel-sized images, other strategies have to
medical imaging, images for the abnormal class might be invented.
be challenging to find, depending on the task at hand.
As an example, the implementation of breast cancer Outlook
screening programs has resulted in vast databases of Although most of the challenges mentioned above
mammograms that have been established at many lo- have not been adequately tackled yet, several high-
cations world-wide. However, the majority of these im- profile successes of deep learning in medical imaging
ages are normal and do not contain any suspicious le- have been reported, such as the work by Esteva et al.
sions. When a mammogram does contain a suspicious (2017) and Gulshan et al. (2016) in the fields of derma-
lesion this is often not cancerous, and even most can- tology and ophthalmology. Both papers show that it is
cerous lesions will not lead to the death of a patient. possible to outperform medical experts in certain tasks
Designing deep learning systems that are adept at han- using deep learning for image classification. However,
dling this class imbalance is another important area of we feel it is important to put these papers into context
research. A typical strategy we encountered in current relative to medical image analysis in general, as most
literature is the application of specific data augmenta- tasks can by no means be considered ’solved’. One as-
tion algorithms to just the underrepresented class, for pect to consider is that both Esteva et al. (2017) and Gul-
example scaling and rotation transforms to generate new shan et al. (2016) focus on small 2D color image classi-
lesions. Pereira et al. (2016) performed a thorough eval- fication, which is relatively similar to the tasks that have
uation of data augmentation strategies for brain lesion been tackled in computer vision (e.g. ImageNet). This
segmentation to combat class imbalance. allows them to take advantage of well-explored network
In medical image analysis useful information is not architectures like ResNet and VGG-Net which have
just contained within the images themselves. Physicians shown to have excellent results in these tasks. However,
often leverage a wealth of data on patient history, age, there is no guarantee that these architectures are optimal
demographics and others to arrive at better decisions. in for example regressions/detection tasks. It also al-
Some authors have already investigated combining this lowed the authors to use networks that were pre-trained
information into deep learning networks in a straight- on a very well-labeled dataset of millions of natural im-
forward manner (Kooi et al., 2017). However, as these ages, which helps combat the lack of similarly large,
authors note, the improvements that were obtained were labeled medical datasets. In contrast, in most medical
not as large as expected. One of the challenges is to bal- imaging tasks 3D gray-scale or multi-channel images
ance the number of imaging features in the deep learn- are used for which pre-trained networks or architectures
ing network (typically thousands) with the number of dont exist. In addition this data typically has very spe-
clinical features (typically only a handful) to prevent the cific challenges, like anisotropic voxel sizes, small reg-
clinical features from being drowned out. Physicians istration errors between varying channels (e.g. in multi-
often also need to use anatomical information to come parametric MRI) or varying intensity ranges. Although
to an accurate diagnosis. However, many deep learning many tasks in medical image analysis can be postulated
systems in medical imaging are still based on patch clas- as a classification problem, this might not always be the
optimal strategy as it typically requires some form of post-processing with non-deep learning methods (e.g. counting, segmentation or regression tasks). An interesting example is the paper by Sirinukunwattana et al. (2016), which details a method directly predicting the center locations of nuclei and shows that this outperforms classification-based center localization. Nonetheless, the papers by Esteva et al. (2017) and Gulshan et al. (2016) do show what ideally is possible with deep learning methods that are well-engineered for specific medical image analysis tasks.

Looking at current trends in the machine learning community with respect to deep learning, we identify a key area which can be highly relevant for medical imaging and is receiving (renewed) interest: unsupervised learning. The renaissance of neural networks started around 2006 with the popularization of greedy layer-wise pre-training of neural networks in an unsupervised manner. This was quickly superseded by fully supervised methods which became the standard after the success of AlexNet during the ImageNet competition of 2012, and most papers in this survey follow a supervised approach. However, interest in unsupervised training strategies has remained and recently has regained traction.

Unsupervised methods are attractive as they allow (initial) network training with the wealth of unlabeled data available in the world. Another reason to assume that unsupervised methods will still have a significant role to play is the analogy to human learning, which seems to be much more data efficient and also happens to some extent in an unsupervised manner; we can learn to recognize objects and structures without knowing the specific label. We only need very limited supervision to categorize these recognized objects into classes. Two novel unsupervised strategies which we expect to have an impact in medical imaging are variational auto-encoders (VAEs), introduced by Kingma and Welling (2013) and generative adversarial networks (GANs), introduced by Goodfellow et al. (2014). The former merges variational Bayesian graphical models with neural networks as encoders/decoders. The latter uses two competing convolutional neural networks where one is generating artificial data samples and the other is discriminating artificial from real samples. Both have stochastic components and are generative networks. Most importantly, they can be trained end-to-end and learn representative features in a completely unsupervised manner. As we discussed in previous paragraphs, obtaining large amounts of unlabeled medical data is generally much easier than labeled data and unsupervised methods like VAEs and GANs could optimally leverage this wealth of information.

Finally, deep learning methods have often been described as 'black boxes'. Especially in medicine, where accountability is important and can have serious legal consequences, it is often not enough to have a good prediction system. This system also has to be able to articulate itself in a certain way. Several strategies have been developed to understand what intermediate layers of convolutional networks are responding to, for example deconvolution networks (Zeiler and Fergus, 2014), guided back-propagation (Springenberg et al., 2014) or deep Taylor decomposition (Montavon et al., 2017). Other researchers have tied prediction to textual representations of the image (i.e. captioning) (Karpathy and Fei-Fei, 2015), which is another useful avenue to understand what a network is perceiving. Last, some groups have tried to combine Bayesian statistics with deep networks to obtain true network uncertainty estimates (Kendall and Gal, 2017). This would allow physicians to assess when the network is giving unreliable predictions. Leveraging these techniques in the application of deep learning methods to medical image analysis could accelerate acceptance of deep learning applications among clinicians, and among patients. We also foresee deep learning approaches will be used for related tasks in medical imaging, mostly unexplored, such as image reconstruction (Wang, 2016). Deep learning will thus not only have a great impact in medical image analysis, but in medical imaging as a whole.

Acknowledgments

The authors would like to thank members of the Diagnostic Image Analysis Group for discussions and suggestions. This research was funded by grants KUN 2012-5577, KUN 2014-7032, and KUN 2015-7970 of the Dutch Cancer Society.

Appendix A: Literature selection

PubMed was searched for papers containing "convolutional" OR "deep learning" in any field. We specifically did not include the term neural network here as this would result in an enormous amount of 'false positive' papers covering brain research. This search initially gave over 700 hits. ArXiv was searched for papers mentioning one of a set of terms related to medical imaging. The exact search string was: 'abs:((medical OR mri OR "magnetic resonance" OR CT OR "computed tomography" OR ultrasound OR pathology OR xray OR x-ray OR radiograph OR mammography OR
fundus OR OCT) AND ("deep learning" OR convolutional OR cnn OR "neural network"))'. Conference proceedings for MICCAI (including workshops), SPIE, ISBI and EMBC were searched based on titles of papers. Again we looked for mentions of 'deep learning' or 'convolutional' or 'neural network'. We went over all these papers and excluded the ones that did not discuss medical imaging (e.g. applications to genetics, chemistry), only used handcrafted features in combination with neural networks, or only referenced deep learning as future work. When in doubt whether a paper should be included we read the abstract and when the exact methodology was still unclear we read the paper itself. We checked references in all selected papers iteratively and consulted colleagues to identify any papers which were missed by our initial search. When largely overlapping work had been reported in multiple publications, only the publication deemed most important was included. A typical example here was arXiv preprints that were subsequently published or conference contributions which were expanded and published in journals.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
Abràmoff, M. D., Lou, Y., Erginay, A., Clarida, W., Amelon, R., Folk, J. C., Niemeijer, M., 2016. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci 57 (13), 5200–5206.
Akram, S. U., Kannala, J., Eklund, L., Heikkilä, J., 2016. Cell segmentation proposal network for microscopy image analysis. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 21–29.
Akselrod-Ballin, A., Karlinsky, L., Alpert, S., Hasoul, S., Ben-Ari, R., Barkan, E., 2016. A region based convolutional network for tumor detection and classification in breast mammography. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 197–205.
Alansary, A., Kamnitsas, K., Davidson, A., Khlebnikov, R., Rajchl, M., Malamateniou, C., Rutherford, M., Hajnal, J. V., Glocker, B., Rueckert, D., Kainz, B., 2016. Fast fully automatic segmentation of the human placenta from motion corrupted MRI. In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 589–597.
Albarqouni, S., Baur, C., Achilles, F., Belagiannis, V., Demirci, S., Navab, N., 2016. AggNet: Deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans Med Imaging 35, 1313–1321.
Anavi, Y., Kogan, I., Gelbart, E., Geva, O., Greenspan, H., 2015. A comparative study for chest radiograph image retrieval using binary texture and deep learning classification. In: Conf Proc IEEE Eng Med Biol Soc. pp. 2940–2943.
Anavi, Y., Kogan, I., Gelbart, E., Geva, O., Greenspan, H., 2016. Visualizing and enhancing a deep learning framework using patients age and gender for chest X-ray image retrieval. In: Medical Imaging. Vol. 9785 of Proceedings of the SPIE. p. 978510.
Andermatt, S., Pezold, S., Cattin, P., 2016. Multi-dimensional gated recurrent units for the segmentation of biomedical 3D-data. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 142–151.
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S., 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans Med Imaging 35 (5), 1207–1216.
Antony, J., McGuinness, K., Connor, N. E. O., Moran, K., 2016. Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks. arXiv:1609.02469.
Apou, G., Schaadt, N. S., Naegel, B., Forestier, G., Schönmeyer, R., Feuerhake, F., Wemmert, C., Grote, A., 2016. Detection of lobular structures in normal breast tissue. Comput Biol Med 74, 91–102.
Arevalo, J., González, F. A., Ramos-Pollán, R., Oliveira, J. L., Guevara Lopez, M. A., 2016. Representation learning for mammography mass lesion classification with convolutional neural networks. Comput Methods Programs Biomed 127, 248–257.
Armato, S. G., McLennan, G., Bidaut, L., McNitt-Gray, M. F., Meyer, C. R., Reeves, A. P., Zhao, B., Aberle, D. R., Henschke, C. I., Hoffman, E. A., Kazerooni, E. A., MacMahon, H., Beek, E. J. R. V., Yankelevitz, D., Biancardi, A. M., Bland, P. H., Brown, M. S., Engelmann, R. M., Laderach, G. E., Max, D., Pais, R. C., Qing, D. P. Y., Roberts, R. Y., Smith, A. R., Starkey, A., Batrah, P., Caligiuri, P., Farooqi, A., Gladish, G. W., Jude, C. M., Munden, R. F., Petkovska, I., Quint, L. E., Schwartz, L. H., Sundaram, B., Dodd, L. E., Fenimore, C., Gur, D., Petrick, N., Freymann, J., Kirby, J., Hughes, B., Casteele, A. V., Gupte, S., Sallamm, M., Heath, M. D., Kuhn, M. H., Dharaiya, E., Burns, R., Fryd, D. S., Salganicoff, M., Anand, V., Shreter, U., Vastagh, S., Croft, B. Y., 2011. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 38, 915–931.
Avendi, M., Kheradvar, A., Jafarkhani, H., 2016. A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med Image Anal 30, 108–119.
Azizi, S., Imani, F., Ghavidel, S., Tahmasebi, A., Kwak, J. T., Xu, S., Turkbey, B., Choyke, P., Pinto, P., Wood, B., Mousavi, P., Abolmaesumi, P., 2016. Detection of prostate cancer using temporal sequences of ultrasound data: a large clinical feasibility study. Int J Comput Assist Radiol Surg 11 (6), 947–956.
Bahrami, K., Shi, F., Rekik, I., Shen, D., 2016. Convolutional neural network for reconstruction of 7T-like images from 3T MRI using appearance and anatomical features. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 39–47.
Bao, S., Chung, A. C., 2016. Multi-scale structured CNN with label consistency for brain MR image segmentation. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 1–5.
Bar, Y., Diamant, I., Wolf, L., Greenspan, H., 2015. Deep learning with non-medical training used for chest pathology identification. In: Medical Imaging. Vol. 9414 of Proceedings of the SPIE. p. 94140V.
Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H., 2016. Chest pathology identification using deep feature selection with non-medical training. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 1–5.
Barbu, A., Lu, L., Roth, H., Seff, A., Summers, R. M., 2016. An analysis of robust cost functions for CNN in computer-aided diagnosis. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2016, 1–6.
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., Bouchard, N., Warde-Farley, D., Bengio, Y., 2012. Theano: new features and speed improvements. In: Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.
Bauer, S., Carion, N., Schäffler, P., Fuchs, T., Wild, P., Buhmann, J. M., 2016. Multi-organ cancer classification and survival analysis. arXiv:1606.00897.
Baumgartner, C. F., Kamnitsas, K., Matthew, J., Smith, S., Kainz, B., Rueckert, D., 2016. Real-time standard scan plane detection and localisation in fetal ultrasound using fully convolutional neural networks. In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 203–211.
Ben-Cohen, A., Diamant, I., Klang, E., Amitai, M., Greenspan, H., 2016. Dlmia. In: International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Vol. 10008 of Lect Notes Comput Sci. pp. 77–85.
Bengio, Y., 2012. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, pp. 437–478.
Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: A review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35 (8), 1798–1828.
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. pp. 153–160.
Bengio, Y., Simard, P., Frasconi, P., 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5, 157–166.
Benou, A., Veksler, R., Friedman, A., Raviv, T. R., 2016. De-noising of contrast-enhanced mri sequences by an ensemble of expert deep neural networks. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 95–110.
BenTaieb, A., Hamarneh, G., 2016. Topology aware fully convolutional networks for histology gland segmentation. In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 460–468.
BenTaieb, A., Kawahara, J., Hamarneh, G., 2016. Multi-loss convolutional networks for gland analysis in microscopy. In: IEEE Int Symp Biomedical Imaging. pp. 642–645.
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J Mach Learn Res 13 (1), 281–305.
Birenbaum, A., Greenspan, H., 2016. Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 58–67.
Brosch, T., Tam, R., 2013. Manifold learning of brain MRIs by deep learning. In: Med Image Comput Comput Assist Interv. Vol. 8150 of Lect Notes Comput Sci. pp. 633–640.
Brosch, T., Tang, L. Y., Yoo, Y., Li, D. K., Traboulsee, A., Tam, R., 2016. Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to Multiple Sclerosis lesion segmentation. IEEE Trans Med Imaging 35 (5), 1229–1239.
Brosch, T., Yoo, Y., Li, D. K. B., Traboulsee, A., Tam, R., 2014. Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. In: Med Image Comput Comput Assist Interv. Vol. 8674 of Lect Notes Comput Sci. pp. 462–469.
Burlina, P., Freund, D. E., Joshi, N., Wolfson, Y., Bressler, N. M., 2016. Detection of age-related macular degeneration via deep learning. In: IEEE Int Symp Biomedical Imaging. pp. 184–188.
Bychkov, D., Turkki, R., Haglund, C., Linder, N., Lundin, J., 2016. Deep learning for tissue microarray image-based outcome prediction in patients with colorectal cancer. In: Medical Imaging. Vol. 9791 of Proceedings of the SPIE. p. 979115.
Cai, J., Lu, L., Zhang, Z., Xing, F., Yang, L., Yin, Q., 2016a. Pancreas segmentation in mri using graph-based decision fusion on convolutional neural networks. In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 442–450.
Cai, Y., Landis, M., Laidley, D. T., Kornecki, A., Lum, A., Li, S., 2016b. Multi-modal vertebrae recognition using transformed deep convolution network. Comput Med Imaging Graph 51, 11–19.
Carneiro, G., Nascimento, J. C., 2013. Combining multiple dynamic models and deep learning architectures for tracking the left ventricle endocardium in ultrasound data. IEEE Trans Pattern Anal Mach Intell 35, 2592–2607.
Carneiro, G., Nascimento, J. C., Freitas, A., 2012. The segmentation of the left ventricle of the heart from ultrasound data using deep learning architectures and derivative-based search methods. IEEE Trans Image Process, 968–982.
Carneiro, G., Oakden-Rayner, L., Bradley, A. P., Nascimento, J., Palmer, L., 2016. Automated 5-year mortality prediction using deep learning and radiomics features from chest computed tomography. arXiv:1607.00267.
Cha, K. H., Hadjiiski, L. M., Samala, R. K., Chan, H.-P., Cohan, R. H., Caoili, E. M., Paramagul, C., Alva, A., Weizer, A. Z., Dec. 2016. Bladder cancer segmentation in CT for treatment response assessment: Application of deep-learning convolution neural network-a pilot study. Tomography 2, 421–429.
Chang, H., Han, J., Zhong, C., Snijders, A., Mao, J.-H., Jan. 2017. Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE transactions on pattern analysis and machine intelligence.
Charbonnier, J., van Rikxoort, E., Setio, A., Schaefer-Prokop, C., van Ginneken, B., Ciompi, F., 2017. Improving airway segmentation in computed tomography using leak detection with convolutional networks. Med Image Anal 36, 52–60.
Chen, H., Dou, Q., Ni, D., Cheng, J.-Z., Qin, J., Li, S., Heng, P.-A., 2015a. Automatic fetal ultrasound standard plane detection using knowledge transferred recurrent neural networks. In: Med Image Comput Comput Assist Interv. Vol. 9349 of Lect Notes Comput Sci. Cham, pp. 507–514.
Chen, H., Dou, Q., Yu, L., Heng, P.-A., 2016a. VoxResNet: Deep voxelwise residual networks for volumetric brain segmentation. arXiv:1608.05895.
Chen, H., Ni, D., Qin, J., Li, S., Yang, X., Wang, T., Heng, P. A., 2015b. Standard plane localization in fetal ultrasound via domain transferred deep neural networks. IEEE J Biomed Health Inform 19 (5), 1627–1636.
Chen, H., Qi, X., Yu, L., Heng, P.-A., 2017. DCAN: Deep contour-aware networks for accurate gland segmentation. Med Image Anal 36, 135–146.
Chen, H., Shen, C., Qin, J., Ni, D., Shi, L., Cheng, J. C. Y., Heng, P.-A., 2015c. Automatic localization and identification of vertebrae in spine CT via a joint learning model with deep neural networks. In: Med Image Comput Comput Assist Interv. Vol. 9349 of Lect Notes Comput Sci. pp. 515–522.
Chen, H., Wang, X., Heng, P. A., 2016b. Automated mitosis detection with deep regression networks. In: IEEE Int Symp Biomedical Imaging. pp. 1204–1207.
Chen, H., Zheng, Y., Park, J.-H., Heng, P.-A., Zhou, S. K., 2016c. Iterative multi-domain regularized deep learning for anatomical structure detection and segmentation from ultrasound images. In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 487–495.
Chen, J., Yang, L., Zhang, Y., Alber, M., Chen, D. Z., 2016d. Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In: Advances in Neural Information Processing Systems. pp. 3036–3044.
Chen, S., Qin, J., Ji, X., Lei, B., Wang, T., Ni, D., Cheng, J.-Z., 2016e. ing Systems. pp. 2843–2851.
Automatic scoring of multiple semantic attributes with multi-task Codella, N., Cai, J., Abedini, M., Garnavi, R., Halpern, A., Smith,
feature leverage: A study on pulmonary nodules in CT images. J. R., 2015. Deep learning, sparse coding, and svm for melanoma
IEEE Trans Med Imaging, in press. recognition in dermoscopy images. In: International Workshop on
Chen, X., Xu, Y., Wong, D. W. K., Wong, T. Y., Liu, J., 2015d. Glau- Machine Learning in Medical Imaging. pp. 118–126.
coma detection based on deep convolutional neural network. In: Collobert, R., Kavukcuoglu, K., Farabet, C., 2011. Torch7: A matlab-
Conf Proc IEEE Eng Med Biol Soc. pp. 715–718. like environment for machine learning. In: Advances in Neural
Cheng, J.-Z., Ni, D., Chou, Y.-H., Qin, J., Tiu, C.-M., Chang, Y.- Information Processing Systems.
C., Huang, C.-S., Shen, D., Chen, C.-M., 2016a. Computer-Aided Cruz-Roa, A., Basavanhally, A., González, F., Gilmore, H., Feldman,
Diagnosis with deep learning architecture: Applications to breast M., Ganesan, S., Shih, N., Tomaszewski, J., Madabhushi, A., 2014.
lesions in US images and pulmonary nodules in CT scans. Nat Sci Automatic detection of invasive ductal carcinoma in whole slide
Rep 6, 24454. images with convolutional neural networks. In: Medical Imaging.
Cheng, R., Roth, H. R., Lu, L., Wang, S., Turkbey, B., Gandler, W., Vol. 9041 of Proceedings of the SPIE. p. 904103.
McCreedy, E. S., Agarwal, H. K., Choyke, P., Summers, R. M.,
McAuliffe, M. J., 2016b. Active appearance model and deep learn-
ing for more accurate prostate segmentation on MRI. In: Medical
Imaging. Vol. 9784 of Proceedings of the SPIE. p. 97842I.
Cheng, X., Zhang, L., Zheng, Y., 2015. Deep similarity learning for
multimodal medical images. Computer Methods in Biomechanics
and Biomedical Engineering, 1–5.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares,
F., Schwenk, H., Bengio, Y., 2014. Learning phrase representa-
tions using rnn encoder-decoder for statistical machine translation.
arXiv:1406.1078.
Choi, H., Jin, K. H., 2016. Fast and robust segmentation of the stria-
tum using deep convolutional neural networks. Journal of Neuro-
science Methods 274, 146–153.
Christ, P. F., Elshaer, M. E. A., Ettlinger, F., Tatavarty, S., Bickel, M.,
Bilic, P., Rempfler, M., Armbruster, M., Hofmann, F., D’Anastasi,
M., et al., 2016. Automatic liver and lesion segmentation in CT
using cascaded fully convolutional neural networks and 3D condi-
tional random fields. In: Med Image Comput Comput Assist Interv.
Vol. 9901 of Lect Notes Comput Sci. pp. 415–423.
Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A.,
Mougiakakou, S., 2017. Multi-source transfer learning with convo-
lutional neural networks for lung pattern analysis. IEEE J Biomed
Health Inform 21, 76–84.
Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., Ronneberger,
O., 2016. 3D U-Net: Learning dense volumetric segmentation
from sparse annotation. In: Med Image Comput Comput Assist
Interv. Vol. 9901 of Lect Notes Comput Sci. Springer, pp. 424–
432.
Cicero, M., Bilbily, A., Colak, E., Dowdell, T., Gray, B., Perampal-
adas, K., Barfett, J., 2016. Training and validating a deep convo-
lutional neural network for computer-aided detection and classifi-
cation of abnormalities on frontal chest radiographs. Invest Radiol,
in press.
Ciompi, F., Chung, K., van Riel, S. J., Setio, A. A. A., Gerke, P. K., Ja-
cobs, C., Scholten, E. T., Schaefer-Prokop, C. M., Wille, M. M. W.,
Marchiano, A., Pastorino, U., Prokop, M., van Ginneken, B., 2016.
Towards automatic pulmonary nodule management in lung cancer
screening with deep learning. arXiv:1610.09157.
Ciompi, F., de Hoop, B., van Riel, S. J., Chung, K., Scholten, E. T.,
Oudkerk, M., de Jong, P. A., Prokop, M., van Ginneken, B.,
2015. Automatic classification of pulmonary peri-fissural nodules
in computed tomography using an ensemble of 2D views and a
convolutional neural network out-of-the-box. Med Image Anal 26,
195–202.
Cireşan, D. C., Giusti, A., Gambardella, L. M., Schmidhuber, J., 2013.
Mitosis detection in breast cancer histology images with deep neu-
ral networks. In: Med Image Comput Comput Assist Interv. Vol.
8150 of Lect Notes Comput Sci. pp. 411–418.
Ciresan, D., Giusti, A., Gambardella, L. M., Schmidhuber, J., 2012.
Deep neural networks segment neuronal membranes in electron
microscopy images. In: Advances in Neural Information Process-
Cruz-Roa, A. A., Ovalle, J. E. A., Madabhushi, A., Osorio, F. A. G.,
2013. A deep learning architecture for image representation, visual
interpretability and automated basal-cell carcinoma cancer detec-
tion. In: Med Image Comput Comput Assist Interv. Vol. 8150 of
Lect Notes Comput Sci. pp. 403–410.
Dalmis, M., Litjens, G., Holland, K., Setio, A., Mann, R., Karsse-
meijer, N., Gubern-Mérida, A., Feb. 2017. Using deep learning to
segment breast and fibroglandular tissue in mri volumes. Medical
physics 44, 533–546.
de Brebisson, A., Montana, G., 2015. Deep neural networks for
anatomical brain segmentation. In: Comput Vis Pattern Recognit.
pp. 20–28.
de Vos, B. D., Viergever, M. A., de Jong, P. A., Išgum, I., 2016a. Au-
tomatic slice identification in 3D medical images with a ConvNet
regressor. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp.
161–169.
de Vos, B. D., Wolterink, J. M., de Jong, P. A., Viergever, M. A.,
Išgum, I., 2016b. 2D image classification for 3D anatomy localiza-
tion: employing deep convolutional neural networks. In: Medical
Imaging. Vol. 9784 of Proceedings of the SPIE. p. 97841Y.
Demyanov, S., Chakravorty, R., Abedini, M., Halpern, A., Garnavi,
R., 2016. Classification of dermoscopy patterns using deep convo-
lutional neural networks. In: IEEE Int Symp Biomedical Imaging.
pp. 364–368.
Dhungel, N., Carneiro, G., Bradley, A. P., 2016. The automated learn-
ing of deep features for breast mass classification from mammo-
grams. In: Med Image Comput Comput Assist Interv. Vol. 9901 of
Lect Notes Comput Sci. Springer, pp. 106–114.
Dou, Q., Chen, H., Jin, Y., Yu, L., Qin, J., Heng, P.-A., 2016a. 3D
deeply supervised network for automatic liver segmentation from
CT volumes. arXiv:1607.00582.
Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P. A., 2016b. Multi-level
contextual 3D CNNs for false positive reduction in pulmonary nod-
ule detection, in press.
Dou, Q., Chen, H., Yu, L., Shi, L., Wang, D., Mok, V. C., Heng,
P. A., 2015. Automatic cerebral microbleeds detection from MR
images via independent subspace analysis based hierarchical fea-
tures. Conf Proc IEEE Eng Med Biol Soc, 7933–7936.
Dou, Q., Chen, H., Yu, L., Zhao, L., Qin, J., Wang, D., Mok, V. C.,
Shi, L., Heng, P.-A., 2016c. Automatic detection of cerebral mi-
crobleeds from MR images via 3D convolutional neural networks.
IEEE Trans Med Imaging 35, 1182–1195.
Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.,
2016. The importance of skip connections in biomedical image
segmentation. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci.
pp. 179–187.
Dubrovina, A., Kisilev, P., Ginsburg, B., Hashoul, S., Kimmel, R.,
2016. Computational mammography using deep neural networks.
Computer Methods in Biomechanics and Biomedical Engineering:
Imaging & Visualization, 1–5.
Ehteshami Bejnordi, B., Litjens, G., Timofeeva, N., Otte-Holler, I.,
Homeyer, A., Karssemeijer, N., van der Laak, J., Sep 2016. Stain

specific standardization of whole-slide histopathological images.
IEEE Trans Med Imaging 35 (2), 404–415.
URL http://dx.doi.org/10.1109/TMI.2015.2476509
Emad, O., Yassine, I. A., Fahmy, A. S., 2015. Automatic localization
of the left ventricle in cardiac MRI images using deep learning. In:
Conf Proc IEEE Eng Med Biol Soc. pp. 683–686.
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau,
H. M., Thrun, S., 2017. Dermatologist-level classification of skin
cancer with deep neural networks. Nature 542, 115–118.
Farabet, C., Couprie, C., Najman, L., LeCun, Y., 2013. Learning hier-
archical features for scene labeling. IEEE Trans Pattern Anal Mach
Intell 35 (8), 1915–1929.
Farag, A., Lu, L., Roth, H. R., Liu, J., Turkbey, E., Summers,
R. M., 2015. A bottom-up approach for pancreas segmenta-
tion using cascaded superpixels and (deep) image patch labeling.
arXiv:1505.06236.
Ferrari, A., Lombardi, S., Signoroni, A., 2015. Bacterial colony
counting by convolutional neural networks. Conf Proc IEEE Eng
Med Biol Soc, 7458–7461.
Fonseca, P., Mendoza, J., Wainer, J., Ferrer, J., Pinto, J., Guerrero,
J., Castaneda, B., 2015. Automatic breast density classification
using a convolutional neural network architecture search proce-
dure. In: Medical Imaging. Vol. 9413 of Proceedings of the SPIE.
p. 941428.
Forsberg, D., Sjöblom, E., Sunshine, J. L., 2017. Detection and label-
ing of vertebrae in MR images using deep learning with clinical
annotations as training data. J Digit Imaging, in press.
Fotin, S. V., Yin, Y., Haldankar, H., Hoffmeister, J. W., Periaswamy,
S., 2016. Detection of soft tissue densities from digital breast to-
mosynthesis: comparison of conventional and deep learning ap-
proaches. In: Medical Imaging. Vol. 9785 of Proceedings of the
SPIE. p. 97850X.
Fritscher, K., Raudaschl, P., Zaffino, P., Spadea, M. F., Sharp, G. C.,
Schubert, R., 2016. Deep neural networks for fast segmentation of
3D medical images. In: Med Image Comput Comput Assist Interv.
Vol. 9901 of Lect Notes Comput Sci. pp. 158–165.
Fu, H., Xu, Y., Lin, S., Kee Wong, D. W., Liu, J., 2016a. Deepves-
sel: Retinal vessel segmentation via deep learning and conditional
random field. In: Med Image Comput Comput Assist Interv. Vol.
9901 of Lect Notes Comput Sci. pp. 132–139.
Fu, H., Xu, Y., Wong, D. W. K., Liu, J., 2016b. Retinal vessel seg-
mentation via deep learning network and fully-connected condi-
tional random fields. In: IEEE Int Symp Biomedical Imaging. pp.
698–701.
Fukushima, K., 1980. Neocognitron: A self-organizing neural net-
work model for a mechanism of pattern recognition unaffected by
shift in position. Biol Cybern 36 (4), 193–202.
Gao, M., Bagci, U., Lu, L., Wu, A., Buty, M., Shin, H.-C., Roth,
H., Papadakis, G. Z., Depeursinge, A., Summers, R. M., Xu, Z.,
Mollura, D. J., 2016a. Holistic classification of CT attenuation
patterns for interstitial lung diseases via deep convolutional neu-
ral networks. Computer Methods in Biomechanics and Biomedical
Engineering: Imaging & Visualization, 1–6.
Gao, M., Xu, Z., Lu, L., Harrison, A. P., Summers, R. M., Mollura,
D. J., 2016b. Multi-label deep regression and unordered pooling
for holistic interstitial lung disease pattern detection. In: Machine
Learning in Medical Imaging. Vol. 10019 of Lect Notes Comput
Sci. pp. 147–155.
Gao, M., Xu, Z., Lu, L., Nogues, I., Summers, R., Mollura, D., 2016c.
Segmentation label propagation using deep convolutional neural
networks and dense conditional random field. In: IEEE Int Symp
Biomedical Imaging. pp. 1265–1268.
Gao, X., Lin, S., Wong, T. Y., 2015. Automatic feature learning
to grade nuclear cataracts based on deep learning. IEEE Trans
Biomed Eng 62 (11), 2693–2701.
Gao, Y., Maraci, M. A., Noble, J. A., 2016d. Describing ultrasound
video content using deep convolutional neural networks. In: IEEE
Int Symp Biomedical Imaging. pp. 787–790.
Gao, Z., Wang, L., Zhou, L., Zhang, J., 2016e. Hep-2 cell image
classification with deep convolutional neural networks. Journal of
Biomedical and Health Informatics.
Ghafoorian, M., Karssemeijer, N., Heskes, T., Bergkamp, M.,
Wissink, J., Obels, J., Keizer, K., de Leeuw, F.-E., van Ginneken,
B., Marchiori, E., Platel, B., 2017. Deep multi-scale location-
aware 3d convolutional neural networks for automated detection
of lacunes of presumed vascular origin. NeuroImage: Clinical, in
press.
Ghafoorian, M., Karssemeijer, N., Heskes, T., van Uden, I., Sanchez,
C., Litjens, G., de Leeuw, F.-E., van Ginneken, B., Marchiori,
E., Platel, B., 2016a. Location sensitive deep convolutional neu-
ral networks for segmentation of white matter hyperintensities.
arXiv:1610.04834.
Ghafoorian, M., Karssemeijer, N., Heskes, T., van Uden, I. W. M.,
de Leeuw, F.-E., Marchiori, E., van Ginneken, B., Platel, B.,
2016b. Non-uniform patch sampling with deep convolutional neu-
ral networks for white matter hyperintensity segmentation. In:
IEEE Int Symp Biomedical Imaging. pp. 1414–1417.
Ghesu, F. C., Georgescu, B., Mansi, T., Neumann, D., Hornegger, J.,
Comaniciu, D., 2016a. An artificial agent for anatomical landmark
detection in medical images. In: Med Image Comput Comput As-
sist Interv. Vol. 9901 of Lect Notes Comput Sci.
Ghesu, F. C., Krubasik, E., Georgescu, B., Singh, V., Zheng, Y.,
Hornegger, J., Comaniciu, D., 2016b. Marginal space deep learn-
ing: Efficient architecture for volumetric image parsing. IEEE
Trans Med Imaging 35, 1217–1228.
Golan, D., Donner, Y., Mansi, C., Jaremko, J., Ramachandran, M.,
2016. Fully automating Graf's method for DDH diagnosis using
deep convolutional neural networks. In: DLMIA. Vol. 10008 of
Lect Notes Comput Sci. pp. 130–141.
Golkov, V., Dosovitskiy, A., Sperl, J., Menzel, M., Czisch, M.,
Samann, P., Brox, T., Cremers, D., 2016. q-Space deep learning:
Twelve-fold shorter and model-free diffusion MRI scans. IEEE
Trans Med Imaging 35, 1344–1351.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley,
D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversar-
ial nets. arXiv:1406.2661.
Greenspan, H., Summers, R. M., van Ginneken, B., 2016. Deep learn-
ing in medical imaging: Overview and future promise of an excit-
ing new technique. IEEE Trans Med Imaging 35 (5), 1153–1159.
Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D.,
Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T.,
Cuadros, J., Kim, R., Raman, R., Nelson, P. C., Mega, J. L., Web-
ster, D. R., Dec. 2016. Development and validation of a deep learn-
ing algorithm for detection of diabetic retinopathy in retinal fundus
photographs. JAMA 316, 2402–2410.
Gülsün, M. A., Funka-Lea, G., Sharma, P., Rapaka, S., Zheng, Y.,
2016. Coronary centerline extraction via optimal flow paths and
CNN path pruning. In: Med Image Comput Comput Assist Interv.
Vol. 9902 of Lect Notes Comput Sci. Springer, pp. 317–325.
Günhan Ertosun, M., Rubin, D. L., 2015. Automated grading of
gliomas using deep learning in digital pathology images: a modu-
lar approach with ensemble of convolutional neural networks. In:
AMIA Annual Symposium. pp. 1899–1908.
Guo, Y., Gao, Y., Shen, D., 2016. Deformable MR prostate segmen-
tation via deep feature learning and sparse patch matching. IEEE
Trans Med Imaging 35 (4), 1077–1089.
Guo, Y., Wu, G., Commander, L. A., Szary, S., Jewells, V., Lin, W.,
Shen, D., 2014. Segmenting hippocampus from infant brains by
sparse patch matching with deep-learned features. In: Med Image
Comput Comput Assist Interv. Vol. 8674 of Lect Notes Comput

Sci. pp. 308–315.
Han, X.-H., Lei, J., Chen, Y.-W., 2016. HEp-2 cell classification using
K-support spatial pooling in deep CNNs. In: DLMIA. Vol. 10008
of Lect Notes Comput Sci. pp. 3–11.
Haugeland, J., 1985. Artificial intelligence: the very idea. The MIT
Press, Cambridge, Mass.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Ben-
gio, Y., Pal, C., Jodoin, P.-M., Larochelle, H., 2016a. Brain tumor
segmentation with Deep Neural Networks. Med Image Anal 35,
18–31.
Havaei, M., Guizard, N., Chapados, N., Bengio, Y., 2016b. HeMIS:
Hetero-modal image segmentation. In: Med Image Comput Com-
put Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 469–
477.
He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep residual learning for
image recognition. arXiv:1512.03385.
Hinton, G., 2010. A practical guide to training restricted Boltzmann
machines. Momentum 9 (1), 926.
Hinton, G. E., Osindero, S., Teh, Y.-W., 2006. A fast learning algo-
rithm for deep belief nets. Neural Comput 18, 1527–1554.
Hinton, G. E., Salakhutdinov, R. R., 2006. Reducing the dimensional-
ity of data with neural networks. Science 313, 504–507.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neu-
ral Computation 9 (8), 1735–1780.
Hoffmann, N., Koch, E., Steiner, G., Petersohn, U., Kirsch, M., 2016.
Learning thermal process representations for intraoperative analy-
sis of cortical perfusion during ischemic strokes. In: DLMIA. Vol.
10008 of Lect Notes Comput Sci. pp. 152–160.
Hoogi, A., Subramaniam, A., Veerapaneni, R., Rubin, D., 2016.
Adaptive estimation of active contour parameters using convolu-
tional neural networks and texture analysis. IEEE Trans Med Imag-
ing.
Hosseini-Asl, E., Gimel’farb, G., El-Baz, A., 2016. Alzheimer’s dis-
ease diagnostics by a deeply supervised adaptable 3D convolu-
tional network. arXiv:1607.00556.
Hu, P., Wu, F., Peng, J., Bao, Y., Chen, F., Kong, D., Nov. 2016a.
Automatic abdominal multi-organ segmentation using deep convo-
lutional neural network and time-implicit level sets. Int J Comput
Assist Radiol Surg.
Hu, P., Wu, F., Peng, J., Liang, P., Kong, D., Dec. 2016b. Automatic
3D liver segmentation based on deep learning and globally opti-
mized surface evolution. Phys Med Biol 61, 8676–8698.
Huang, H., Hu, X., Han, J., Lv, J., Liu, N., Guo, L., Liu, T., 2016.
Latent source mining in FMRI data via deep neural network. In:
IEEE Int Symp Biomedical Imaging. pp. 638–641.
Huynh, B. Q., Li, H., Giger, M. L., Jul 2016. Digital mammographic
tumor classification using transfer learning from deep convolu-
tional neural networks. J Med Imaging 3, 034501.
Hwang, S., Kim, H., 2016. Self-transfer learning for fully weakly su-
pervised object localization. arXiv:1602.01625.
Hwang, S., Kim, H.-E., Jeong, J., Kim, H.-J., 2016. A novel approach
for tuberculosis screening based on deep convolutional neural net-
works. In: Medical Imaging. Vol. 9785 of Proceedings of the SPIE.
pp. 97852W–1.
Jamaludin, A., Kadir, T., Zisserman, A., 2016. SpineNet: Automati-
cally pinpointing classification evidence in spinal MRIs. In: Med
Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Comput Sci. pp. 166–175.
Jamieson, A. R., Drukker, K., Giger, M. L., 2012. Breast image fea-
ture learning with adaptive deconvolutional networks. In: Medical
Imaging. Vol. 8315 of Proceedings of the SPIE. p. 831506.
Janowczyk, A., Basavanhally, A., Madabhushi, A., 2016a. Stain nor-
malization using sparse autoencoders (StaNoSA): Application to
digital pathology. Comput Med Imaging Graph, in press.
Janowczyk, A., Doyle, S., Gilmore, H., Madabhushi, A., 2016b.
A resolution adaptive deep hierarchical (RADHicaL) learning
scheme applied to nuclear segmentation of digital pathology im-
ages. Computer Methods in Biomechanics and Biomedical Engi-
neering: Imaging & Visualization, 1–7.
Janowczyk, A., Madabhushi, A., 2016. Deep learning for digital
pathology image analysis: A comprehensive tutorial with selected
use cases. Journal of pathology informatics 7, 29.
Jaumard-Hakoun, A., Xu, K., Roussel-Ragot, P., Dreyfus, G., Denby,
B., 2016. Tongue contour extraction from ultrasound images based
on deep neural network. arXiv:1605.05912.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick,
R., Guadarrama, S., Darrell, T., 2014. Caffe: Convolutional archi-
tecture for fast feature embedding. In: Proceedings of the 22nd
ACM International Conference on Multimedia. pp. 675–678.
Kainz, P., Pfeiffer, M., Urschler, M., 2015. Semantic segmentation
of colon glands with deep convolutional neural networks and total
variation segmentation. arXiv:1511.06919.
Källén, H., Molin, J., Heyden, A., Lundström, C., Åström, K., 2016. To-
wards grading gleason score using generically trained deep convo-
lutional neural networks. In: IEEE Int Symp Biomedical Imaging.
pp. 1163–1167.
Kallenberg, M., Petersen, K., Nielsen, M., Ng, A., Diao, P., Igel, C.,
Vachon, C., Holland, K., Karssemeijer, N., Lillholm, M., 2016.
Unsupervised deep learning applied to breast density segmenta-
tion and mammographic risk scoring. IEEE Trans Med Imaging
35, 1322–1331.
Kamnitsas, K., Ledig, C., Newcombe, V. F., Simpson, J. P., Kane,
A. D., Menon, D. K., Rueckert, D., Glocker, B., 2017. Efficient
multi-scale 3D CNN with fully connected CRF for accurate brain
lesion segmentation. Med Image Anal 36, 61–78.
Karpathy, A., Fei-Fei, L., June 2015. Deep visual-semantic align-
ments for generating image descriptions. In: Comput Vis Pattern
Recognit. arXiv:1412.2306.
Kashif, M. N., Raza, S. E. A., Sirinukunwattana, K., Arif, M., Ra-
jpoot, N., 2016. Handcrafted features with convolutional neural
networks for detection of tumor cells in histology images. In: IEEE
Int Symp Biomedical Imaging. pp. 1029–1032.
Kawahara, J., BenTaieb, A., Hamarneh, G., 2016a. Deep features to
classify skin lesions. In: IEEE Int Symp Biomedical Imaging. pp.
1397–1400.
Kawahara, J., Brown, C. J., Miller, S. P., Booth, B. G., Chau, V.,
Grunau, R. E., Zwicker, J. G., Hamarneh, G., 2016b. Brain-
NetCNN: Convolutional neural networks for brain networks; to-
wards predicting neurodevelopment. NeuroImage.
Kawahara, J., Hamarneh, G., 2016. Multi-resolution-tract CNN with
hybrid pretrained and skin-lesion trained layers. In: Machine
Learning in Medical Imaging. Vol. 10019 of Lect Notes Comput
Sci. pp. 164–171.
Kendall, A., Gal, Y., 2017. What uncertainties do we need in bayesian
deep learning for computer vision? arXiv:1703.04977.
Kim, E., Cortre-Real, M., Baloch, Z., 2016a. A deep semantic mobile
application for thyroid cytopathology. In: Medical Imaging. Vol.
9789 of Proceedings of the SPIE. p. 97890A.
Kim, H., Hwang, S., 2016. Scale-invariant feature learning using
deconvolutional neural networks for weakly-supervised semantic
segmentation. arXiv:1602.04984.
Kim, J., Calhoun, V. D., Shim, E., Lee, J.-H., 2016b. Deep neural net-
work with weight sparsity control and pre-training extracts hierar-
chical features and enhances classification performance: Evidence
from whole-brain resting-state functional connectivity patterns of
schizophrenia. NeuroImage 124, 127–146.
Kingma, D. P., Welling, M., 2013. Auto-encoding variational bayes.
arXiv:1312.6114.
Kisilev, P., Sason, E., Barkan, E., Hashoul, S., 2016. Medical image
description using multi-task-loss CNN. In: International Workshop

on Large-Scale Annotation of Biomedical Data and Expert Label
Synthesis. Springer, pp. 121–129.
Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K.,
Bendszus, M., Biller, A., 2016. Deep MRI brain extraction: A 3D
convolutional neural network for skull stripping. NeuroImage 129,
460–469.
Kong, B., Zhan, Y., Shin, M., Denny, T., Zhang, S., 2016. Recognizing
end-diastole and end-systole frames via deep temporal regression
network. In: Med Image Comput Comput Assist Interv. Vol. 9901
of Lect Notes Comput Sci. pp. 264–272.
Kooi, T., Litjens, G., van Ginneken, B., Gubern-Mérida, A., Sánchez,
C. I., Mann, R., den Heeten, A., Karssemeijer, N., 2016. Large
scale deep learning for computer aided detection of mammo-
graphic lesions. Med Image Anal 35, 303–312.
Kooi, T., van Ginneken, B., Karssemeijer, N., den Heeten, A., 2017.
Discriminating solitary cysts from soft tissue lesions in mammog-
raphy using a pretrained deep convolutional neural network. Med-
ical Physics.
Korez, R., Likar, B., Pernuš, F., Vrtovec, T., 2016. Model-based seg-
mentation of vertebral bodies from MR images with 3D CNNs. In:
Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Comput Sci. Springer, pp. 433–441.
Krizhevsky, A., Sutskever, I., Hinton, G., 2012. Imagenet classifi-
cation with deep convolutional neural networks. In: Advances in
Neural Information Processing Systems. pp. 1097–1105.
Kumar, A., Sridar, P., Quinton, A., Kumar, R. K., Feng, D., Nanan,
R., Kim, J., 2016. Plane identification in fetal ultrasound images
using saliency maps and convolutional neural networks. In: IEEE
Int Symp Biomedical Imaging. pp. 791–794.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based
learning applied to document recognition. Proceedings of the IEEE
86, 2278–2324.
Lekadir, K., Galimzianova, A., Betriu, A., Del Mar Vila, M., Igual,
L., Rubin, D. L., Fernandez, E., Radeva, P., Napel, S., Jan. 2017.
A convolutional neural network for automatic characterization of
plaque composition in carotid ultrasound. IEEE J Biomed Health
Inform 21, 48–55.
Lessmann, N., Isgum, I., Setio, A. A., de Vos, B. D., Ciompi, F.,
de Jong, P. A., Oudkerk, M., Mali, W. P. T. M., Viergever, M. A.,
van Ginneken, B., 2016. Deep convolutional neural networks for
automatic coronary calcium scoring in a screening study with low-
dose chest CT. In: Medical Imaging. Vol. 9785 of Proceedings of
the SPIE. pp. 978511–1 – 978511–6.
Li, R., Zhang, W., Suk, H.-I., Wang, L., Li, J., Shen, D., Ji, S., 2014.
Deep learning based imaging data completion for improved brain
disease diagnosis. In: Med Image Comput Comput Assist Interv.
Vol. 8675 of Lect Notes Comput Sci. pp. 305–312.
Li, W., Cao, P., Zhao, D., Wang, J., 2016a. Pulmonary nodule clas-
sification with deep convolutional neural networks on computed
tomography images. Computational and Mathematical Methods in
Medicine, 6215085.
Li, W., Jia, F., Hu, Q., 2015. Automatic segmentation of liver tumor
in CT images with deep convolutional neural networks. Journal of
Computer and Communications 3 (11), 146–151.
Li, W., Manivannan, S., Akbar, S., Zhang, J., Trucco, E., McKenna,
S. J., 2016b. Gland segmentation in colon histology images using
hand-crafted features and convolutional neural networks. In: IEEE
Int Symp Biomedical Imaging. pp. 1405–1408.
Liao, S., Gao, Y., Oto, A., Shen, D., 2013. Representation learn-
ing: A unified deep learning framework for automatic prostate mr
segmentation. In: Med Image Comput Comput Assist Interv. Vol.
8150 of Lect Notes Comput Sci. pp. 254–261.
Lin, M., Chen, Q., Yan, S., 2013. Network in network.
arXiv:1312.4400.
Litjens, G., Sánchez, C. I., Timofeeva, N., Hermsen, M., Nagtegaal,
I., Kovacs, I., Hulsbergen-van de Kaa, C., Bult, P., van Ginneken,
B., van der Laak, J., 2016. Deep learning as a tool for increased
accuracy and efficiency of histopathological diagnosis. Nat Sci Rep
6, 26286.
Liu, J., Wang, D., Wei, Z., Lu, L., Kim, L., Turkbey, E., Summers,
R. M., 2016a. Colitis detection on computed tomography using re-
gional convolutional neural networks. In: IEEE Int Symp Biomed-
ical Imaging. pp. 863–866.
Liu, X., Tizhoosh, H. R., Kofman, J., 2016b. Generating binary tags
for fast medical image retrieval based on convolutional nets and
Radon transform. In: International Joint Conference on Neural
Networks. arXiv:1604.04676.
Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G. E., Kohlberger, T.,
Boyko, A., Venugopalan, S., Timofeev, A., Nelson, P. Q., Corrado,
G. S., Hipp, J. D., Peng, L., Stumpe, M. C., 2017. Detecting cancer
metastases on gigapixel pathology images. arXiv:1703.02442.
Lo, S.-C., Lou, S.-L., Lin, J.-S., Freedman, M. T., Chien, M. V., Mun,
S. K., 1995. Artificial convolution neural network techniques and
applications for lung nodule detection. IEEE Trans Med Imaging
14, 711–718.
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional net-
works for semantic segmentation. arXiv:1411.4038.
Lu, F., Wu, F., Hu, P., Peng, Z., Kong, D., Feb. 2017. Automatic 3D
liver location and segmentation via convolutional neural network
and graph cut. Int J Comput Assist Radiol Surg 12, 171–182.
Lu, X., Xu, D., Liu, D., 2016. Robust 3d organ localization with dual
learning architectures and fusion. In: DLMIA. Vol. 10008 of Lect
Notes Comput Sci. pp. 12–20.
Ma, J., Wu, F., Zhu, J., Xu, D., Kong, D., Jan 2017. A pre-trained
convolutional neural network based method for thyroid nodule di-
agnosis. Ultrasonics 73, 221–230.
Mahapatra, D., Roy, P. K., Sedai, S., Garnavi, R., 2016. Retinal image
quality classification using saliency maps and CNNs. In: Machine
Learning in Medical Imaging. Vol. 10019 of Lect Notes Comput
Sci. pp. 172–179.
Malon, C. D., Cosatto, E., 2013. Classification of mitotic figures with
convolutional neural networks and seeded blob features. Journal of
pathology informatics.
Maninis, K.-K., Pont-Tuset, J., Arbeláez, P., Gool, L., 2016. Deep reti-
nal image understanding. In: Med Image Comput Comput Assist
Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 140–148.
Mansoor, A., Cerrolaza, J., Idrees, R., Biggs, E., Alsharid, M., Avery,
R., Linguraru, M. G., 2016. Deep learning guided partitioned shape
model for anterior visual pathway segmentation. IEEE Trans Med
Imaging 35 (8), 1856–1865.
Mao, Y., Yin, Z., 2016. A hierarchical convolutional neural net-
work for mitosis detection in phase-contrast microscopy images.
In: Med Image Comput Comput Assist Interv. Vol. 9901 of Lect
Notes Comput Sci. pp. 685–692.
Menegola, A., Fornaciali, M., Pires, R., Avila, S., Valle, E., 2016. To-
wards automated melanoma screening: Exploring transfer learning
schemes. arXiv:1609.01228.
Merkow, J., Kriegman, D., Marsden, A., Tu, Z., 2016. Dense volume-
to-volume vascular boundary detection. arXiv:1605.08401.
Miao, S., Wang, Z. J., Liao, R., 2016. A CNN regression approach
for real-time 2D/3D registration. IEEE Trans Med Imaging 35 (5),
1352–1363.
Milletari, F., Ahmadi, S.-A., Kroll, C., Plate, A., Rozanski, V.,
Maiostre, J., Levin, J., Dietrich, O., Ertl-Wagner, B., Bötzel, K.,
Navab, N., 2016a. Hough-CNN: Deep learning for segmentation
of deep brain regions in MRI and ultrasound. arXiv:1601.07014.
Milletari, F., Navab, N., Ahmadi, S.-A., 2016b. V-Net: Fully convolu-
tional neural networks for volumetric medical image segmentation.
arXiv:1606.04797.
Mishra, M., Schmitt, S., Wang, L., Strasser, M. K., Marr, C., Navab,

N., Zischka, H., Peng, T., 2016. Structure-based assessment of 699–702.
cancerous mitochondria using deep networks. In: IEEE Int Symp Payan, A., Montana, G., 2015. Predicting Alzheimer’s disease:
Biomedical Imaging. pp. 545–548. a neuroimaging study with 3D convolutional neural networks.
Moeskops, P., Viergever, M. A., Mendrik, A. M., de Vries, L. S., Ben- arXiv:1502.02506.
ders, M. J. N. L., Isgum, I., 2016a. Automatic segmentation of Payer, C., Stern, D., Bischof, H., Urschler, M., 2016. Regressing
MR brain images with a convolutional neural network. IEEE Trans heatmaps for multiple landmark localization using CNNs. In: Med
Med Imaging 35 (5), 1252–1262. Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Moeskops, P., Wolterink, J. M., Velden, B. H. M., Gilhuijs, K. G. A., Comput Sci. pp. 230–238.
Leiner, T., Viergever, M. A., Isgum, I., 2016b. Deep learning for Pereira, S., Pinto, A., Alves, V., Silva, C. A., 2016. Brain tumor
multi-task medical image segmentation in multiple modalities. In: segmentation using convolutional neural networks in MRI images.
Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes IEEE Trans Med Imaging.
Comput Sci. pp. 478–486. Phan, H. T. H., Kumar, A., Kim, J., Feng, D., 2016. Transfer learning
Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.- of a convolutional neural network for HEp-2 cell image classifica-
R., 2017. Explaining nonlinear classification decisions with deep tion. In: IEEE Int Symp Biomedical Imaging. pp. 1208–1211.
taylor decomposition. Pattern Recognition 65, 211–222. Pinaya, W. H. L., Gadelha, A., Doyle, O. M., Noto, C., Zugman, A.,
Moradi, M., Guo, Y., Gur, Y., Negahdar, M., Syeda-Mahmood, Cordeiro, Q., Jackowski, A. P., Bressan, R. A., Sato, J. R., Dec.
T., 2016a. A cross-modality neural network transform for semi- 2016. Using deep belief network modelling to characterize dif-
automatic medical image annotation. In: Med Image Comput ferences in brain morphometry in schizophrenia. Nat Sci Rep 6,
Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 38897.
300–307. Plis, S. M., Hjelm, D. R., Salakhutdinov, R., Allen, E. A., Bockholt,
Moradi, M., Gur, Y., Wang, H., Prasanna, P., Syeda-Mahmood, T., H. J., Long, J. D., Johnson, H. J., Paulsen, J. S., Turner, J. A., Cal-
2016b. A hybrid learning approach for semantic labeling of cardiac houn, V. D., 2014. Deep learning for neuroimaging: a validation
CT slices and recognition of body position. In: IEEE Int Symp study. Frontiers in Neuroscience.
Biomedical Imaging. Poudel, R. P. K., Lamata, P., Montana, G., 2016. Recurrent fully con-
Nappi, J. J., Hironaka, T., Regge, D., Yoshida, H., 2016. Deep transfer volutional neural networks for multi-slice MRI cardiac segmenta-
learning of virtual endoluminal views for the detection of polyps in tion. arXiv:1608.03974.
CT colonography. In: Medical Imaging. Proceedings of the SPIE. Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.,
p. 97852B. 2013. Deep feature learning for knee cartilage segmentation using
Nascimento, J. C., Carneiro, G., 2016. Multi-atlas segmentation using a triplanar convolutional neural network. In: Med Image Comput
manifold learning with deep belief networks. In: IEEE Int Symp Comput Assist Interv. Vol. 8150 of Lect Notes Comput Sci. pp.
Biomedical Imaging. pp. 867–871. 246–253.
Ngo, T. A., Lu, Z., Carneiro, G., 2017. Combining deep learning and Prentasic, P., Heisler, M., Mammo, Z., Lee, S., Merkur, A., Navajas,
level set for the automated segmentation of the left ventricle of the E., Beg, M. F., Sarunic, M., Loncaric, S., 2016. Segmentation of
heart from cardiac cine magnetic resonance. Med Image Anal 35, the foveal microvasculature using deep learning networks. Journal
159–171. of Biomedical Optics 21, 75008.
Nie, D., Cao, X., Gao, Y., Wang, L., Shen, D., 2016a. Estimating CT Prentasic, P., Loncaric, S., 2016. Detection of exudates in fundus pho-
image from MRI data using 3D fully convolutional networks. In: tographs using deep neural networks and anatomical landmark de-
DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 170–178. tection fusion. Comput Methods Programs Biomed 137, 281–292.
Nie, D., Wang, L., Gao, Y., Shen, D., 2016b. Fully convolutional net- Qiu, Y., Wang, Y., Yan, S., Tan, M., Cheng, S., Liu, H., Zheng, B.,
works for multi-modality isointense infant brain image segmenta- 2016. An initial investigation on developing a new method to pre-
tion. In: IEEE Int Symp Biomedical Imaging. pp. 1342–1345. dict short-term breast cancer risk based on deep learning technol-
Nie, D., Zhang, H., Adeli, E., Liu, L., Shen, D., 2016c. 3D deep ogy. In: Medical Imaging. Vol. 9785 of Proceedings of the SPIE.
learning for multi-modal imaging-guided survival time prediction p. 978521.
of brain tumor patients. In: Med Image Comput Comput Assist Quinn, J. A., Nakasi, R., Mugagga, P. K. B., Byanyima, P., Lubega,
Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 212–220. W., Andama, A., 2016. Deep convolutional neural networks for
Nogues, I., Lu, L., Wang, X., Roth, H., Bertasius, G., Lay, N., Shi, J., microscopy-based point of care diagnostics. arXiv:1608.02989.
Tsehay, Y., Summers, R. M., 2016. Automatic lymph node cluster Rajchl, M., Lee, M. C., Oktay, O., Kamnitsas, K., Passerat-Palmbach,
segmentation using holistically-nested neural networks and struc- J., Bai, W., Kainz, B., Rueckert, D., 2016a. DeepCut: Object
tured optimization in CT images. In: Med Image Comput Comput segmentation from bounding box annotations using convolutional
Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp. 388–397. neural networks. IEEE Trans Med Imaging, in press.
Oktay, O., Bai, W., Lee, M., Guerrero, R., Kamnitsas, K., Caballero, Rajchl, M., Lee, M. C., Schrans, F., Davidson, A., Passerat-Palmbach,
J., Marvao, A., Cook, S., O’Regan, D., Rueckert, D., 2016. Multi- J., Tarroni, G., Alansary, A., Oktay, O., Kainz, B., Rueck-
input cardiac image super-resolution using convolutional neural ert, D., 2016b. Learning under distributed weak supervision.
networks. In: Med Image Comput Comput Assist Interv. Vol. 9902 arXiv:1606.01100.
of Lect Notes Comput Sci. pp. 246–254. Rajkomar, A., Lingam, S., Taylor, A. G., Blum, M., Mongan, J., 2017.
Ortiz, A., Munilla, J., Górriz, J. M., Ramírez, J., 2016. Ensem- High-throughput classification of radiographs using deep convolu-
bles of deep learning architectures for the early diagnosis of the tional neural networks. J Digit Imaging 30, 95–101.
Alzheimer’s disease. International Journal of Neural Systems 26, Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J.,
1650025. Lo, B., Yang, G.-Z., Jan. 2017. Deep learning for health informat-
Paeng, K., Hwang, S., Park, S., Kim, M., Kim, S., 2016. A uni- ics. IEEE J Biomed Health Inform 21, 4–21.
fied framework for tumor proliferation score prediction in breast Ravishankar, H., Prabhu, S. M., Vaidya, V., Singhal, N., 2016a.
histopathology. arXiv:1612.07180. Hybrid approach for automatic segmentation of fetal abdomen
Pan, Y., Huang, W., Lin, Z., Zhu, W., Zhou, J., Wong, J., Ding, Z., from ultrasound images using deep learning. In: IEEE Int Symp
2015. Brain tumor grading based on neural networks and convolu- Biomedical Imaging. pp. 779–782.
tional neural networks. In: Conf Proc IEEE Eng Med Biol Soc. pp. Ravishankar, H., Sudhakar, P., Venkataramani, R., Thiruvenkadam,

S., Annangi, P., Babu, N., Vaidya, V., 2016b. Understanding the
mechanisms of deep transfer learning for medical images. In:
DLMIA. Vol. 10008 of Lect Notes Comput Sci. pp. 188–196.
Rezaeilouyeh, H., Mollahosseini, A., Mahoor, M. H., 2016. Micro-
scopic medical image classification framework via deep learning
and shearlet transform. Journal of Medical Imaging 3 (4), 044501.
Romo-Bucheli, D., Janowczyk, A., Gilmore, H., Romero, E., Mad-
abhushi, A., Sep 2016. Automated tubule nuclei quantification and
correlation with Oncotype DX risk categories in ER+ breast cancer
whole slide images. Nat Sci Rep 6, 32706.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional
networks for biomedical image segmentation. In: Med Image
Comput Comput Assist Interv. Vol. 9351 of Lect Notes Comput
Sci. pp. 234–241.
Roth, H. R., Lee, C. T., Shin, H.-C., Seff, A., Kim, L., Yao, J., Lu,
L., Summers, R. M., 2015a. Anatomy-specific classification of
medical images using deep convolutional nets. In: IEEE Int Symp
Biomedical Imaging. pp. 101–104.
Roth, H. R., Lu, L., Farag, A., Shin, H.-C., Liu, J., Turkbey, E. B.,
Summers, R. M., 2015b. DeepOrgan: Multi-level deep convolu-
tional networks for automated pancreas segmentation. In: Med Im-
age Comput Comput Assist Interv. Vol. 9349 of Lect Notes Com-
put Sci. pp. 556–564.
Roth, H. R., Lu, L., Farag, A., Sohn, A., Summers, R. M., 2016a.
Spatial aggregation of holistically-nested networks for automated
pancreas segmentation. In: Med Image Comput Comput Assist In-
terv. Vol. 9901 of Lect Notes Comput Sci. pp. 451–459.
Roth, H. R., Lu, L., Liu, J., Yao, J., Seff, A., Cherry, K., Kim, L.,
Summers, R. M., 2016b. Improving computer-aided detection us-
ing convolutional neural networks and random view aggregation.
IEEE Trans Med Imaging 35 (5), 1170–1181.
Roth, H. R., Lu, L., Seff, A., Cherry, K. M., Hoffman, J., Wang, S.,
Liu, J., Turkbey, E., Summers, R. M., 2014. A new 2.5D repre-
sentation for lymph node detection using random sets of deep con-
volutional neural network observations. In: Med Image Comput
Comput Assist Interv. Vol. 8673 of Lect Notes Comput Sci. pp.
520–527.
Roth, H. R., Wang, Y., Yao, J., Lu, L., Burns, J. E., Summers, R. M.,
2016c. Deep convolutional networks for automated detection of
posterior-element fractures on spine CT. In: Medical Imaging. Vol.
9785 of Proceedings of the SPIE. p. 97850P.
Roth, H. R., Yao, J., Lu, L., Stieger, J., Burns, J. E., Summers, R. M.,
2015c. Detection of sclerotic spine metastases via random aggrega-
tion of deep convolutional neural network classifications. In: Re-
cent Advances in Computational Methods and Clinical Applica-
tions for Spine Imaging. Vol. 20 of Lecture Notes in Computational
Vision and Biomechanics. pp. 3–12.
Rupprecht, C., Huaroc, E., Baust, M., Navab, N., 2016. Deep active
contours. arXiv:1607.05074.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S.,
Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C.,
Fei-Fei, L., 2014. ImageNet large scale visual recognition chal-
lenge. Int J Comput Vis 115 (3), 1–42.
Sahiner, B., Chan, H.-P., Petrick, N., Wei, D., Helvie, M. A., Adler,
D. D., Goodsitt, M. M., 1996. Classification of mass and normal
breast tissue: a convolution neural network classifier with spatial
domain and texture images. IEEE Trans Med Imaging 15, 598–
610.
Samala, R. K., Chan, H.-P., Hadjiiski, L., Cha, K., Helvie, M. A.,
2016a. Deep-learning convolution neural network for computer-
aided detection of microcalcifications in digital breast tomosyn-
thesis. In: Medical Imaging. Vol. 9785 of Proceedings of the SPIE.
p. 97850Y.
Samala, R. K., Chan, H.-P., Hadjiiski, L., Helvie, M. A., Wei, J., Cha,
K., 2016b. Mass detection in digital breast tomosynthesis: Deep
convolutional neural network with transfer learning from mam-
mography. Medical Physics 43 (12), 6654–6666.
Sarraf, S., Tofighi, G., 2016. Classification of Alzheimer’s disease us-
ing fMRI data and deep learning convolutional neural networks.
arXiv:1603.08631.
Schaumberg, A. J., Rubin, M. A., Fuchs, T. J., 2016. H&E-stained
whole slide deep learning predicts SPOP mutation state in prostate
cancer. bioRxiv:064279.
Schlegl, T., Waldstein, S. M., Vogl, W.-D., Schmidt-Erfurth, U.,
Langs, G., 2015. Predicting semantic descriptions from medical
images with convolutional neural networks. In: Inf Process Med
Imaging. Vol. 9123 of Lect Notes Comput Sci. pp. 437–448.
Sethi, A., Sha, L., Vahadane, A. R., Deaton, R. J., Kumar, N., Macias,
V., Gann, P. H., 2016. Empirical comparison of color normalization
methods for epithelial-stromal classification in H and E images. J
Pathol Inform 7, 17.
Setio, A. A. A., Ciompi, F., Litjens, G., Gerke, P., Jacobs, C., van Riel,
S., Wille, M. W., Naqibullah, M., Sanchez, C., van Ginneken, B.,
2016. Pulmonary nodule detection in CT images: false positive re-
duction using multi-view convolutional networks. IEEE Trans Med
Imaging 35 (5), 1160–1169.
Sevetlidis, V., Giuffrida, M. V., Tsaftaris, S. A., Jan. 2016. Whole
image synthesis using a deep encoder-decoder network. In: Simu-
lation and Synthesis in Medical Imaging. Vol. 9968 of Lect Notes
Comput Sci. pp. 127–137.
Shah, A., Conjeti, S., Navab, N., Katouzian, A., 2016. Deeply learnt
hashing forests for content based image retrieval in prostate MR
images. In: Medical Imaging. Vol. 9784 of Proceedings of the
SPIE. p. 978414.
Shakeri, M., Tsogkas, S., Ferrante, E., Lippe, S., Kadoury, S., Para-
gios, N., Kokkinos, I., 2016. Sub-cortical brain structure segmen-
tation using F-CNNs. In: IEEE Int Symp Biomedical Imaging. pp.
269–272.
Shen, D., Wu, G., Suk, H.-I., Mar. 2017. Deep learning in medical
image analysis. Annu Rev Biomed Eng.
Shen, W., Yang, F., Mu, W., Yang, C., Yang, X., Tian, J., 2015a.
Automatic localization of vertebrae based on convolutional neural
networks. In: Medical Imaging. Vol. 9413 of Proceedings of the
SPIE. p. 94132E.
Shen, W., Zhou, M., Yang, F., Dong, D., Yang, C., Zang, Y., Tian,
J., 2016. Learning from experts: Developing transferable deep fea-
tures for patient-level lung cancer prediction. In: Med Image Com-
put Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp.
124–131.
Shen, W., Zhou, M., Yang, F., Yang, C., Tian, J., 2015b. Multi-scale
convolutional neural networks for lung nodule classification. In:
Inf Process Med Imaging. Vol. 9123 of Lect Notes Comput Sci.
pp. 588–599.
Shi, J., Zheng, X., Li, Y., Zhang, Q., Ying, S., Jan. 2017. Mul-
timodal neuroimaging feature learning with multimodal stacked
deep polynomial networks for diagnosis of Alzheimer’s disease.
IEEE J Biomed Health Inform, in press.
Shin, H.-C., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R. M., 2015.
Interleaved text/image deep mining on a very large-scale radiology
database. In: Comput Vis Pattern Recognit. pp. 1090–1099.
Shin, H.-C., Orton, M. R., Collins, D. J., Doran, S. J., Leach, M. O.,
2013. Stacked autoencoders for unsupervised feature learning and
multiple organ detection in a pilot study using 4D patient data.
IEEE Trans Pattern Anal Mach Intell 35, 1930–1943.
Shin, H.-C., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J.,
Summers, R. M., 2016a. Learning to read chest x-rays: Re-
current neural cascade model for automated image annotation.
arXiv:1603.08486.
Shin, H.-C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I.,
Yao, J., Mollura, D., Summers, R. M., 2016b. Deep convolu-

tional neural networks for computer-aided detection: CNN archi-
tectures, dataset characteristics and transfer learning. IEEE Trans
Med Imaging 35 (5), 1285–1298.
Shkolyar, A., Gefen, A., Benayahu, D., Greenspan, H., 2015. Au-
tomatic detection of cell divisions (mitosis) in live-imaging mi-
croscopy images using convolutional neural networks. In: Conf
Proc IEEE Eng Med Biol Soc. pp. 743–746.
Simonovsky, M., Gutiérrez-Becker, B., Mateus, D., Navab, N., Ko-
modakis, N., 2016. A deep metric for multimodal registration. In:
Med Image Comput Comput Assist Interv. Vol. 9902 of Lect Notes
Comput Sci. pp. 10–18.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional net-
works for large-scale image recognition. arXiv:1409.1556.
Sirinukunwattana, K., Raza, S. E. A., Tsang, Y.-W., Snead, D. R.,
Cree, I. A., Rajpoot, N. M., 2016. Locality sensitive deep learning
for detection and classification of nuclei in routine colon cancer
histology images. IEEE Trans Med Imaging 35 (5), 1196–1206.
Smistad, E., Løvstakken, L., 2016. Vessel detection in ultrasound im-
ages using deep convolutional neural networks. In: DLMIA. Vol.
10008 of Lect Notes Comput Sci. pp. 30–38.
Snoek, J., Larochelle, H., Adams, R. P., 2012. Practical bayesian op-
timization of machine learning algorithms. In: Advances in Neural
Information Processing Systems. pp. 2951–2959.
Song, Y., Tan, E.-L., Jiang, X., Cheng, J.-Z., Ni, D., Chen, S., Lei,
B., Wang, T., Sep 2017. Accurate cervical cell segmentation from
overlapping clumps in pap smear images. IEEE Trans Med Imag-
ing 36, 288–300.
Song, Y., Zhang, L., Chen, S., Ni, D., Lei, B., Wang, T., 2015. Accu-
rate segmentation of cervical cytoplasm and nuclei based on mul-
tiscale convolutional network and graph partitioning. IEEE Trans
Biomed Eng 62 (10), 2421–2433.
Spampinato, C., Palazzo, S., Giordano, D., Aldinucci, M., Leonardi,
R., Feb. 2017. Deep learning for automated skeletal bone age as-
sessment in X-ray images. Med Image Anal 36, 41–51.
Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M., 2014.
Striving for simplicity: The all convolutional net. arXiv preprint
arXiv:1412.6806.
Štern, D., Payer, C., Lepetit, V., Urschler, M., 2016. Automated age
estimation from hand MRI volumes using deep learning. In: Med
Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Comput Sci. pp. 194–202.
Stollenga, M. F., Byeon, W., Liwicki, M., Schmidhuber, J., 2015. Par-
allel multi-dimensional LSTM, with application to fast biomedical
volumetric image segmentation. In: Advances in Neural Informa-
tion Processing Systems. pp. 2998–3006.
Suk, H.-I., Lee, S.-W., Shen, D., 2014. Hierarchical feature repre-
sentation and multimodal fusion with deep learning for AD/MCI
diagnosis. NeuroImage 101, 569–582.
Suk, H.-I., Lee, S.-W., Shen, D., 2015. Latent feature representa-
tion with stacked auto-encoder for AD/MCI diagnosis. Brain Struct
Funct 220, 841–859.
Suk, H.-I., Shen, D., 2013. Deep learning-based feature representation
for AD/MCI classification. In: Med Image Comput Comput Assist
Interv. Vol. 8150 of Lect Notes Comput Sci. pp. 583–590.
Suk, H.-I., Shen, D., 2016. Deep ensemble sparse regression network
for Alzheimer’s disease diagnosis. In: Med Image Comput Comput
Assist Interv. Vol. 10019 of Lect Notes Comput Sci. pp. 113–121.
Suk, H.-I., Wee, C.-Y., Lee, S.-W., Shen, D., 2016. State-space model
with deep learning for functional dynamics estimation in resting-
state fMRI. NeuroImage 129, 292–307.
Sun, W., Tseng, T.-L. B., Zhang, J., Qian, W., 2016a. Enhancing deep
convolutional neural network scheme for breast cancer diagnosis
with unlabeled data. Comput Med Imaging Graph.
Sun, W., Zheng, B., Qian, W., 2016b. Computer aided lung cancer
diagnosis with deep learning algorithms. In: Medical Imaging. Vol.
9785 of Proceedings of the SPIE. p. 97850Z.
Suzani, A., Rasoulian, A., Seitel, A., Fels, S., Rohling, R., Abolmae-
sumi, P., 2015. Deep learning for automatic localization, identifi-
cation, and segmentation of vertebral bodies in volumetric mr im-
ages. In: Medical Imaging. Vol. 9415 of Proceedings of the SPIE.
p. 941514.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D.,
Erhan, D., Vanhoucke, V., Rabinovich, A., 2014. Going deeper
with convolutions. arXiv:1409.4842.
Tachibana, R., Näppi, J. J., Hironaka, T., Kim, S. H., Yoshida, H.,
2016. Deep learning for electronic cleansing in dual-energy ct
colonography. In: Medical Imaging. Vol. 9785 of Proceedings of
the SPIE. p. 97851M.
Tajbakhsh, N., Gotway, M. B., Liang, J., 2015a. Computer-aided pul-
monary embolism detection using a novel vessel-aligned multi-
planar image representation and convolutional neural networks. In:
Med Image Comput Comput Assist Interv. Vol. 9350 of Lect Notes
Comput Sci. pp. 62–69.
Tajbakhsh, N., Gurudu, S. R., Liang, J., 2015b. A comprehensive
computer-aided polyp detection system for colonoscopy videos.
In: Inf Process Med Imaging. Vol. 9123 of Lect Notes Comput
Sci. pp. 327–338.
Tajbakhsh, N., Shin, J. Y., Gurudu, S. R., Hurst, R. T., Kendall, C. B.,
Gotway, M. B., Liang, J., 2016. Convolutional neural networks for
medical image analysis: Fine tuning or full training? IEEE Trans
Med Imaging 35 (5), 1299–1312.
Tarando, S. R., Fetita, C., Faccinetto, A., Yves, P., 2016. Increasing
CAD system efficacy for lung texture analysis using a convolu-
tional network. In: Medical Imaging. Vol. 9785 of Proceedings of
the SPIE. pp. 97850Q–97850Q.
Teikari, P., Santos, M., Poon, C., Hynynen, K., 2016. Deep learn-
ing convolutional networks for multiphoton microscopy vascula-
ture segmentation. arXiv:1606.02382.
Teramoto, A., Fujita, H., Yamamuro, O., Tamaki, T., 2016. Auto-
mated detection of pulmonary nodules in PET/CT images: Ensem-
ble false-positive reduction using a convolutional neural network
technique. Med Phys 43, 2821–2827.
Thong, W., Kadoury, S., Piché, N., Pal, C. J., 2016. Convolutional
networks for kidney segmentation in contrast-enhanced CT scans.
Computer Methods in Biomechanics and Biomedical Engineering:
Imaging & Visualization, 1–6.
Tran, P. V., 2016. A fully convolutional neural network for cardiac
segmentation in short-axis MRI. arXiv:1604.00494.
Turkki, R., Linder, N., Kovanen, P. E., Pellinen, T., Lundin, J., 2016.
Antibody-supervised deep learning for quantification of tumor-
infiltrating immune cells in hematoxylin and eosin stained breast
cancer samples. Journal of pathology informatics 7, 38.
Twinanda, A. P., Shehata, S., Mutter, D., Marescaux, J., de Mathelin,
M., Padoy, N., 2017. Endonet: A deep architecture for recognition
tasks on laparoscopic videos. IEEE Trans Med Imaging 36, 86–97.
van der Burgh, H. K., Schmidt, R., Westeneng, H.-J., de Reus, M. A.,
van den Berg, L. H., van den Heuvel, M. P., 2017. Deep learning
predictions of survival based on MRI in amyotrophic lateral scle-
rosis. NeuroImage. Clinical 13, 361–369.
van Ginneken, B., Setio, A. A., Jacobs, C., Ciompi, F., 2015. Off-the-
shelf convolutional neural network features for pulmonary nod-
ule detection in computed tomography scans. In: IEEE Int Symp
Biomedical Imaging. pp. 286–289.
van Grinsven, M. J. J. P., van Ginneken, B., Hoyng, C. B., Theelen,
T., Sánchez, C. I., 2016. Fast convolutional neural network train-
ing using selective data sampling: Application to hemorrhage de-
tection in color fundus images. IEEE Trans Med Imaging 35 (5),
1273–1284.
van Tulder, G., de Bruijne, M., 2016. Combining generative and
discriminative representation learning for lung CT analysis with

convolutional Restricted Boltzmann Machines. IEEE Trans Med
Imaging 35 (5), 1262–1272.
Veta, M., van Diest, P. J., Pluim, J. P. W., 2016. Cutting out the mid-
dleman: measuring nuclear area in histopathology slides without
segmentation. In: Med Image Comput Comput Assist Interv. Vol.
9901 of Lect Notes Comput Sci. pp. 632–639.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.,
2010. Stacked denoising autoencoders: Learning useful represen-
tations in a deep network with a local denoising criterion. J Mach
Learn Res 11, 3371–3408.
Vivanti, R., Ephrat, A., Joskowicz, L., Karaaslan, O., Lev-Cohain,
N., Sosna, J., 2015. Automatic liver tumor segmentation in follow-
up ct studies using convolutional neural networks. In: Proc.
Patch-Based Methods in Medical Image Processing Workshop,
MICCAI.–2015. pp. 54–61.
Wang, C., Elazab, A., Wu, J., Hu, Q., Nov. 2016a. Lung nodule clas-
sification using deep feature fusion in chest radiography. Comput
Med Imaging Graph.
Wang, C., Yan, X., Smith, M., Kochhar, K., Rubin, M., Warren, S. M.,
Wrobel, J., Lee, H., 2015. A unified framework for automatic
wound segmentation and analysis with deep convolutional neural
networks. In: Conf Proc IEEE Eng Med Biol Soc. pp. 2415–2418.
Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A. H.,
2016b. Deep learning for identifying metastatic breast cancer.
arXiv:1606.05718.
Wang, G., 2016. A perspective on deep imaging. IEEE Access 4,
8914–8924.
Wang, H., Cruz-Roa, A., Basavanhally, A., Gilmore, H., Shih, N.,
Feldman, M., Tomaszewski, J., Gonzalez, F., Madabhushi, A.,
2014. Mitosis detection in breast cancer pathology images by com-
bining handcrafted and convolutional neural network features. J
Med Imaging 1, 034003.
Wang, J., Ding, H., Azamian, F., Zhou, B., Iribarren, C., Molloi, S.,
Baldi, P., 2017. Detecting cardiovascular disease from mammo-
grams with deep learning. IEEE Trans Med Imaging.
Wang, J., MacKenzie, J. D., Ramachandran, R., Chen, D. Z., 2016c.
A deep learning approach for semantic segmentation in histology
tissue images. In: Med Image Comput Comput Assist Interv. Vol.
9901 of Lect Notes Comput Sci. Springer, pp. 176–184.
Wang, S., Yao, J., Xu, Z., Huang, J., 2016d. Subtype cell detection
with an accelerated deep convolution neural network. In: Med Im-
age Comput Comput Assist Interv. Vol. 9901 of Lect Notes Com-
put Sci. pp. 640–648.
Wang, X., Lu, L., Shin, H.-c., Kim, L., Nogues, I., Yao, J., Sum-
mers, R., 2016e. Unsupervised category discovery via looped
deep pseudo-task optimization using a large scale radiology image
database. arXiv:1603.07965.
Wolterink, J. M., Leiner, T., de Vos, B. D., van Hamersvelt, R. W.,
Viergever, M. A., Isgum, I., 2016. Automatic coronary artery cal-
cium scoring in cardiac CT angiography using paired convolu-
tional neural networks. Med Image Anal 34, 123–136.
Worrall, D. E., Wilson, C. M., Brostow, G. J., 2016. Automated
retinopathy of prematurity case detection with convolutional neu-
ral networks. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci.
pp. 68–76.
Wu, A., Xu, Z., Gao, M., Buty, M., Mollura, D. J., 2016. Deep vessel
tracking: A generalized probabilistic approach via deep learning.
In: IEEE Int Symp Biomedical Imaging. pp. 1363–1367.
Wu, G., Kim, M., Wang, Q., Gao, Y., Liao, S., Shen, D., 2013. Unsu-
pervised deep feature learning for deformable registration of MR
brain images. In: Med Image Comput Comput Assist Interv. Vol.
8150 of Lect Notes Comput Sci. pp. 649–656.
Xie, W., Noble, J. A., Zisserman, A., 2016a. Microscopy cell count-
ing and detection with fully convolutional regression networks.
Computer Methods in Biomechanics and Biomedical Engineering:
Imaging & Visualization, 1–10.
Xie, Y., Kong, X., Xing, F., Liu, F., Su, H., Yang, L., 2015a. Deep vot-
ing: A robust approach toward nucleus localization in microscopy
images. In: Med Image Comput Comput Assist Interv. Vol. 9351
of Lect Notes Comput Sci. pp. 374–382.
Xie, Y., Xing, F., Kong, X., Su, H., Yang, L., 2015b. Beyond classi-
fication: Structured regression for robust cell detection using con-
volutional neural network. In: Med Image Comput Comput Assist
Interv. Vol. 9351 of Lect Notes Comput Sci. pp. 358–365.
Xie, Y., Zhang, Z., Sapkota, M., Yang, L., 2016b. Spatial clock-
work recurrent neural network for muscle perimysium segmen-
tation. In: International Conference on Medical Image Comput-
ing and Computer-Assisted Intervention. Vol. 9901 of Lect Notes
Comput Sci. Springer, pp. 185–193.
Xing, F., Xie, Y., Yang, L., 2016. An automatic learning-based frame-
work for robust nucleus segmentation. IEEE Trans Med Imaging
35 (2), 550–566.
Xu, J., Luo, X., Wang, G., Gilmore, H., Madabhushi, A., 2016a. A
deep convolutional neural network for segmenting and classifying
epithelial and stromal regions in histopathological images. Neuro-
computing 191, 214–223.
Xu, J., Xiang, L., Liu, Q., Gilmore, H., Wu, J., Tang, J., Madabhushi,
A., 2016b. Stacked sparse autoencoder (ssae) for nuclei detection
on breast cancer histopathology images. IEEE Trans Med Imaging
35, 119–130.
Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D. N., 2016c. Mul-
timodal deep learning for cervical dysplasia diagnosis. In: Med Im-
age Comput Comput Assist Interv. Vol. 9901 of Lect Notes Com-
put Sci. pp. 115–123.
Xu, Y., Li, Y., Liu, M., Wang, Y., Lai, M., Chang, E. I.-C., 2016d.
Gland instance segmentation by deep multichannel side supervi-
sion. arXiv:1607.03222.
Xu, Y., Mo, T., Feng, Q., Zhong, P., Lai, M., Chang, E. I. C., 2014.
Deep learning of feature representation with multiple instance
learning for medical image analysis. In: IEEE International Con-
ference on Acoustics, Speech and Signal Processing (ICASSP). pp.
1626–1630.
Xu, Z., Huang, J., 2016. Detecting 10,000 Cells in one second. In:
Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Comput Sci. pp. 676–684.
Xue, D.-X., Zhang, R., Feng, H., Wang, Y.-L., 2016. CNN-SVM for
microvascular morphological type recognition with data augmen-
tation. J Med Biol Eng 36, 755–764.
Yan, Z., Zhan, Y., Peng, Z., Liao, S., Shinagawa, Y., Zhang, S.,
Metaxas, D. N., Zhou, X. S., 2016. Multi-instance deep learning:
Discover discriminative local anatomies for bodypart recognition.
IEEE Trans Med Imaging 35 (5), 1332–1343.
Yang, D., Zhang, S., Yan, Z., Tan, C., Li, K., Metaxas, D., 2015.
Automated anatomical landmark detection on distal femur surface
using convolutional neural network. In: IEEE Int Symp Biomedi-
cal Imaging. pp. 17–21.
Yang, H., Sun, J., Li, H., Wang, L., Xu, Z., 2016a. Deep fusion net
for multi-atlas segmentation: Application to cardiac mr images. In:
Med Image Comput Comput Assist Interv. Vol. 9901 of Lect Notes
Comput Sci. pp. 521–528.
Yang, L., Zhang, Y., Guldner, I. H., Zhang, S., Chen, D. Z., 2016b. 3d
segmentation of glial cells using fully convolutional networks and
k-terminal cut. In: Med Image Comput Comput Assist Interv. Vol.
9901 of Lect Notes Comput Sci. Springer, pp. 658–666.
Yang, W., Chen, Y., Liu, Y., Zhong, L., Qin, G., Lu, Z., Feng, Q.,
Chen, W., 2016c. Cascade of multi-scale convolutional neural net-
works for bone suppression of chest radiographs in gradient do-
main. Med Image Anal 35, 421–433.
Yang, X., Kwitt, R., Niethammer, M., 2016d. Fast predictive image
registration. In: DLMIA. Vol. 10008 of Lect Notes Comput Sci.

pp. 48–57.
Yao, J., Wang, S., Zhu, X., Huang, J., 2016. Imaging biomarker dis-
covery for lung cancer survival prediction. In: Med Image Comput
Comput Assist Interv. Vol. 9901 of Lect Notes Comput Sci. pp.
649–657.
Yoo, Y., Tang, L. W., Brosch, T., Li, D. K. B., Metz, L., Traboulsee,
A., Tam, R., 2016. Deep learning of brain lesion patterns for pre-
dicting future disease activity in patients with early symptoms of
multiple sclerosis. In: DLMIA. Vol. 10008 of Lect Notes Comput
Sci. pp. 86–94.
Ypsilantis, P.-P., Siddique, M., Sohn, H.-M., Davies, A., Cook, G.,
Goh, V., Montana, G., 2015. Predicting response to neoadjuvant
chemotherapy with pet imaging using convolutional neural net-
works. PLoS ONE 10 (9), 1–18.
Yu, L., Chen, H., Dou, Q., Qin, J., Heng, P. A., 2016a. Automated
melanoma recognition in dermoscopy images via very deep resid-
ual networks. IEEE Trans Med Imaging, in press.
Yu, L., Guo, Y., Wang, Y., Yu, J., Chen, P., Nov. 2016b. Segmentation
of fetal left ventricle in echocardiographic sequences based on dy-
namic convolutional neural networks. IEEE Trans Biomed Eng, in
press.
Yu, L., Yang, X., Chen, H., Qin, J., Heng, P. A., 2017. Volumetric
convnets with mixed residual connections for automated prostate
segmentation from 3D MR images. In: Thirty-First AAAI Confer-
ence on Artificial Intelligence.
Zeiler, M. D., Fergus, R., 2014. Visualizing and understanding convo-
lutional networks. In: European Conference on Computer Vision.
pp. 818–833.
Zhang, H., Li, L., Qiao, K., Wang, L., Yan, B., Li, L., Hu, G., 2016a.
Image prediction for limited-angle tomography via deep learning
with convolutional neural network. arXiv:1607.08707.
Zhang, L., Gooya, A., Dong, B. H. R., Petersen, S. E., Medrano-
Gracia, K. P., Frangi, A. F., 2016b. Automated quality assessment
of cardiac MR images using convolutional neural networks. In:
SASHIMI. Vol. 9968 of Lect Notes Comput Sci. pp. 138–145.
Zhang, Q., Xiao, Y., Dai, W., Suo, J., Wang, C., Shi, J., Zheng, H.,
2016c. Deep learning based classification of breast tumors with
shear-wave elastography. Ultrasonics 72, 150–157.
Zhang, R., Zheng, Y., Mak, T. W. C., Yu, R., Wong, S. H., Lau, J.
Y. W., Poon, C. C. Y., Jan. 2017. Automatic detection and classifi-
cation of colorectal polyps by transferring low-level CNN features
from nonmedical domain. IEEE J Biomed Health Inform 21, 41–
47.
Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D., 2015.
Deep convolutional neural networks for multi-modality isointense
infant brain image segmentation. NeuroImage 108, 214–224.
Zhao, J., Zhang, M., Zhou, Z., Chu, J., Cao, F., Nov. 2016. Automatic
detection and classification of leukocytes using convolutional neu-
ral networks. Medical & Biological Engineering & Computing.
Zhao, L., Jia, K., 2016. Multiscale CNNs for brain tumor segmenta-
tion and diagnosis. Computational and Mathematical Methods in
Medicine 2016, 8356294.
Zheng, Y., Liu, D., Georgescu, B., Nguyen, H., Comaniciu, D., 2015.
3D deep learning for efficient and robust landmark detection in
volumetric data. In: Med Image Comput Comput Assist Interv.
Vol. 9349 of Lect Notes Comput Sci. pp. 565–572.
Zhou, X., Ito, T., Takayama, R., Wang, S., Hara, T., Fujita, H., 2016.
Three-dimensional CT image segmentation by combining 2D fully
convolutional network with 3D majority voting. In: DLMIA. Vol.
10008 of Lect Notes Comput Sci. pp. 111–120.
Zhu, Y., Wang, L., Liu, M., Qian, C., Yousuf, A., Oto, A., Shen,
D., Jan. 2017. MRI based prostate cancer detection with high-level
representation and hierarchical classification. Med Phys, in press.
Zilly, J., Buhmann, J. M., Mahapatra, D., 2017. Glaucoma detection
using entropy sampling and ensemble learning for automatic optic
cup and disc segmentation. Comput Med Imaging Graph 55, 28–
41.
Zreik, M., Leiner, T., de Vos, B., van Hamersvelt, R., Viergever, M.,
Isgum, I., 2016. Automatic segmentation of the left ventricle in
cardiac CT angiography using convolutional neural networks. In:
IEEE Int Symp Biomedical Imaging. pp. 40–43.

