1 s2.0 S0169260717301451 Main - Read PDF
Article info
Article history:
Received 10 February 2017
Revised 24 December 2017
Accepted 10 January 2018
Keywords:
Deep learning
Convolutional Neural Network
Transfer learning
Computer-aided Diagnosis
Breast cancer
Breast mass lesion classification
Abstract
Background and objective: Radiologists often have a hard time classifying mammography mass lesions which leads to unnecessary breast biopsies to remove suspicions and this ends up adding exorbitant expenses to an already burdened patient and health care system.
Methods: In this paper we developed a Computer-aided Diagnosis (CAD) system based on deep Convolutional Neural Networks (CNN) that aims to help the radiologist classify mammography mass lesions. Deep learning usually requires large datasets to train networks of a certain depth from scratch. Transfer learning is an effective method to deal with relatively small datasets as in the case of medical images, although it can be tricky as we can easily start overfitting.
Results: In this work, we explore the importance of transfer learning and we experimentally determine the best fine-tuning strategy to adopt when training a CNN model. We were able to successfully fine-tune some of the recent, most powerful CNNs and achieved better results compared to other state-of-the-art methods which classified the same public datasets. For instance we achieved 97.35% accuracy and 0.98 AUC on the DDSM database, 95.50% accuracy and 0.97 AUC on the INbreast database and 96.67% accuracy and 0.96 AUC on the BCDR database. Furthermore, after pre-processing and normalizing all the extracted Regions of Interest (ROIs) from the full mammograms, we merged all the datasets to build one large set of images and used it to fine-tune our CNNs. The CNN model which achieved the best results, a 98.94% accuracy, was used as a baseline to build the Breast Cancer Screening Framework. To evaluate the proposed CAD system and its efficiency to classify new images, we tested it on an independent database (MIAS) and got 98.23% accuracy and 0.99 AUC.
Conclusion: The results obtained demonstrate that the proposed framework is performant and can indeed be used to predict if the mass lesions are benign or malignant.
© 2018 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.cmpb.2018.01.011
0169-2607/© 2018 Elsevier B.V. All rights reserved.
20 H. Chougrad et al. / Computer Methods and Programs in Biomedicine 157 (2018) 19–30
Fig. 2. Pre-processing of the mammograms; first row of the figure gives the example of a benign lesion and the second row a malignant lesion. (a) the original mammogram
(b) illustration of the location and boundaries of the lesions annotated by imaging specialists (c) the cropped region of interest (d) the normalized ROI after applying Global
Contrast Normalization. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
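Panel (d) of Fig. 2 relies on Global Contrast Normalization (GCN). The following is a minimal sketch of that step, assuming the common formulation in which the per-image mean is subtracted and the result is divided by the per-image standard deviation. The exact variant used for the ROIs is not given in the text, so the function name and the `scale` and `eps` parameters are illustrative assumptions, and plain Python lists stand in for image arrays.

```python
import math

def global_contrast_normalize(roi, scale=1.0, eps=1e-8):
    """Return a GCN-normalized copy of a 2-D grayscale ROI (list of rows).

    Assumed formulation (not specified in the paper): subtract the image
    mean, then divide by the image standard deviation.
    """
    pixels = [float(p) for row in roi for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = max(math.sqrt(variance), eps)  # eps guards against flat images
    return [[scale * (float(p) - mean) / std for p in row] for row in roi]

# Two hypothetical ROIs with the same structure but different brightness
# and contrast, as could happen with two different scanners.
dark = [[10, 20], [30, 40]]
bright = [[110, 130], [150, 170]]

normalized_dark = global_contrast_normalize(dark)
normalized_bright = global_contrast_normalize(bright)
```

In this toy case both ROIs map to the same normalized values, which is the property that makes it reasonable to merge images coming from different databases.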
Table 1
Total number of layers of each model, including the added fully-connected layers (5 dense layers added to each model), and the number of convolutional layers fine-tuned under each fine-tuning strategy. Note that a dense layer or convolutional layer followed by a non-linearity is counted as one layer.
Model        Total number of layers   Last 1 convolutional block   Last 2 convolutional blocks   Last 3 convolutional blocks
VGG16        23                       4                            8                             12
ResNet50     179                      12                           22                            34
Inceptionv3  221                      25                           44                            58
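The VGG16 row of Table 1 can be reproduced with a small sketch that decides which layers are frozen for a given fine-tuning strategy. It assumes the standard VGG16 layout (five blocks of 2, 2, 3, 3 and 3 convolutions, each closed by a pooling layer) and counts the pooling layer as part of its block. The layer names and the `head_*` entries are illustrative placeholders, not the names used in our implementation.

```python
# Convolutions per block in the standard VGG16 layout (assumption);
# every block ends with one pooling layer.
VGG16_BLOCKS = [2, 2, 3, 3, 3]
N_HEAD_LAYERS = 5  # fully-connected layers added on top (Table 1 caption)

def build_layer_list():
    """Return [(layer_name, block_index)]; block_index is None for the head."""
    layers = []
    for block, n_conv in enumerate(VGG16_BLOCKS, start=1):
        for i in range(1, n_conv + 1):
            layers.append((f"block{block}_conv{i}", block))
        layers.append((f"block{block}_pool", block))
    for i in range(1, N_HEAD_LAYERS + 1):
        layers.append((f"head_{i}", None))  # hypothetical names
    return layers

def trainable_flags(layers, n_blocks):
    """Unfreeze the head plus the last `n_blocks` convolutional blocks."""
    first_unfrozen = len(VGG16_BLOCKS) - n_blocks + 1
    return {
        name: block is None or block >= first_unfrozen
        for name, block in layers
    }

layers = build_layer_list()
for k in (0, 1, 2, 3):
    flags = trainable_flags(layers, k)
    unfrozen = sum(flags[name] for name, block in layers if block is not None)
    print(f"{k} block(s) fine-tuned: {unfrozen} unfrozen layers below the head")
```

Under these assumptions the list holds 23 layers, and unfreezing the last 1, 2 or 3 blocks releases 4, 8 or 12 layers below the head. With a Keras model, such flags would typically be applied by setting the `trainable` attribute of each layer before compiling.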
ResNet50: ResNet50 is one of the models proposed in the deep residual learning for image recognition work [22] by the Microsoft research team. The authors came up with a simple and elegant idea: they take a standard deep CNN and add shortcut connections that bypass a few convolutional layers at a time. The shortcut connections create residual blocks, where the output of the convolutional layers is added to the block's input tensor. For instance, the ResNet50 model is composed of 50 layers of similar blocks with shortcut connections. These connections keep the computation low and at the same time provide rich combinations of features.
The ResNet50 model used has one convolutional layer followed by a batch normalization layer, and has two pooling layers in between which there is a total of 16 residual modules. Two kinds of residual modules are alternated, one that has 4 convolutional layers and another with 3 convolutional layers, and each convolutional layer is followed by batch normalization. The residual block with 4 convolutional layers is the first one used, followed by two or more residual blocks with 3 convolutional layers, and so on. The implemented ResNet50 model achieves a 7.8% top-5 error on ImageNet.
Inception v3: The Google research team with Christian Szegedy was mainly focused on reducing the computational burden of CNNs while maintaining the same level of performance. They introduced a new module named the "inception module" which, for the most part, can be described as 4 parallel pathways of 1 × 1, 3 × 3 and 5 × 5 convolution filters. Because of the parallel network implementation, in addition to the down-sampling layers in each block, the model's execution time beats that of VGG or ResNet.
The research team proposed many models over the years, each more complex than the last; the Inception v3 [23] model was introduced at about the same time as ResNet. The network was built with some new design principles, for example the use of 3 × 3 convolutions instead of 5 × 5 or 7 × 7 in the inception modules, the expansion of width at each layer to increase the combination of features for the next layer, and the aim of constructing a network with a computational budget balanced between its depth and width.
The Inception v3 we implemented has 5 convolutional layers, each one followed by a batch normalization layer, 2 pooling layers and 11 inception modules. The inception modules used contain different numbers of paths and convolution layers. The authors of Inception v3 did not define a single "Inception cell" that is repeatedly applied to downscale the input. Therefore, the inception modules used consist of 4, 6, 7, 9 or 10 convolutional layers followed by batch normalization and one pooling layer. The model implemented by Chollet [30] achieves a 7.8% top-5 error on ImageNet, the same as ResNet50.
2.2.3. Transfer learning and fine-tuning
As we begin to explore deep learning models for more specialized domains, the quantity of available data gets scarce. Even though we have impressive training methods nowadays, training deep learning models on small quantities of data is very difficult. The current paradigm used to deal with this issue is the use of pre-trained neural networks [19].
The authors in [31] demonstrated that transfer of knowledge in networks could be achieved by first training a neural network on a domain for which there is a large amount of data, and then retraining that network on a related but different domain via fine-tuning its weights. The authors of [13,14] were able to show that transfer learning can be beneficial even between two unrelated domains (natural vs. medical). The advantage of using pre-trained models extends beyond the limited data issue, as it was proven to be an effective initialization technique for many complex models [32,33].
We propose to investigate the adequacy of this technique for our case of study, either to deal with our limited data or as a way of initializing the models. Fig. 3 gives the schema for the models setup. The CNN models are built differently but the same procedure is applied to all of them:
• For starters, we kept the original network architectures up to the fully-connected layers.
• The original fully-connected layers were built for the ImageNet dataset with 1000 outputs for 1000 class categories. We removed these last fully-connected layers and built our own fully-connected model, on top of the convolutional part of the models, suited to our number of classes (i.e. 2 classes, "Benign" and "Malignant").
The new customized models (VGG16, ResNet50, Inception v3) are trained on the different datasets while adopting different training strategies. We first conduct an ablation study where we initialize our models randomly. Then, we use the models as fixed feature extractors and use the Softmax layer on top as a classifier. Finally, we adopt a fine-tuning strategy and study the impact of the fraction fine-tuned on the final results (Fig. 3).
The first convolutional layers of a CNN learn generic features and behave more like edge detectors, which should be useful to many tasks, but the following layers become progressively more specific to the details of the classes contained in the dataset [34]. In accordance with this statement, and since mammographic mass lesion images are very different from ImageNet images, we propose to fine-tune our models to adjust the features of the last convolutional blocks and make them more data-specific; we fine-tune the weights of the pre-trained networks using the new set of images by resuming the backpropagation on the unfrozen layers.
In the experimental section, we propose a detailed study of the impact on the final results of the chosen fraction of convolutional layers (unfrozen layers) to fine-tune. Table 1 gives the number of layers of each model and the number of layers we choose to fine-tune, while the rest of the model is frozen, for the different fine-tuning strategies adopted (1 block, 2 blocks, 3 blocks and all the blocks). Since the models are very different, a convolutional block varies from one model to another as shown in Fig. 3. For VGG16 a convolutional block contains 3 convolutional operations followed by an activation and a pooling layer. In the case of ResNet50 the convolutional block is a residual block, while for Inception v3 it is an inception module.
2.2.4. Regularization and the choice of hyper-parameters
Choosing the right parameters when fine-tuning is tricky. The optimization is done using Stochastic Gradient Descent (SGD)
rather than an adaptive learning rate optimizer, to make sure the magnitude of the learning rate stays small and does not wreck the previously learned features [35].
When training the fully-connected model we used the adaptive ADAM optimizer [36] (Adaptive Moment Estimation). The method is designed to combine the advantages of two recently popular methods, AdaGrad and RMSProp; it computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients.
On the other hand, when fine-tuning we used the SGD optimizer with a small learning rate. We chose an initial learning rate of 1e-4 and it was divided by 10 each time the validation error stopped improving. We also adopted an early stopping strategy to monitor the validation loss with a patience set to 20 epochs, i.e. the number of epochs to wait for an improvement before stopping if no progress is noted on the validation set.
Additionally, to improve the results and avoid overfitting, we used some tricks: we performed data augmentation, L2 regularization and dropout.
To ensure that our model generalizes well, we used L2 regularization (weight decay) to penalize large weights and prefer smaller ones. The L2 regularization penalty operates on the weight matrix W and is written as: $R(W) = \sum_i \sum_j W_{i,j}^2$.
The loss function then becomes: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W)$
where λ is a hyper-parameter which controls the amount of regularization we are applying. We used λ = 1 as it gave the best results.
Moreover, we added a dropout layer [37] which randomly turns off activations at training time with a probability of 0.5. The randomly selected subset of activations is set to zero, which prevents a unit in one layer from relying too strongly on a single unit in the previous layer (Fig. 3).
Fig. 3. Schema representing the different architectures and implementations of the models using transfer learning while adopting a fine-tuning strategy for some of the last convolutional blocks. The top part of the figure gives an overview of some of the layers composing each model; each layer is represented with a different color (keys in the bottom-right of the figure). The bottom-left part of the figure gives a detailed architecture of the customized fully-connected layers added to the bottom convolutional part of each model; note that the randomly turned-off activations in the dropout layer are represented with dotted circles. The three differently-colored dotted rectangular selections represent the different implementations of the models, i.e. in each implementation the selected layers in the rectangle are fine-tuned while the rest of the model's layers are frozen.
{1FT = only one convolutional layer + the fully-connected part are fine-tuned while the rest of the layers are frozen;
2FT = two convolutional layers + the fully-connected part are fine-tuned while the rest of the layers are frozen;
3FT = three convolutional layers + the fully-connected part are fine-tuned while the rest of the layers are frozen;
BN = batch-normalization}.
2.2.5. The Breast Cancer Screening Framework
After fine-tuning the CNNs, we saved the weights of each model in HDF5 format and the structure in JSON format. The model achieving the highest performance (see Table 2), i.e. the Inception v3 model trained on the merged database, which we will refer to
as "Inception v3-MD", was used as a baseline to build the Breast Cancer Screening Framework.
To evaluate the performance of the Inception v3-MD model, we tested it on an independent database. Initially, we one-hot encoded all of the database labels, i.e. each label was represented by a vector p of size K = 2 corresponding to the number of output classes. p is composed of the value 1 for the correct class and 0 for the other class.
Next, we fed the test images to the model and got the output probabilities from the Softmax classifier:
$f_i(y) = \frac{\exp(y_i)}{\sum_j \exp(y_j)}$
The Softmax function gives normalized class probabilities for the output being "Benign" or "Malignant"; these probabilities add up to 1.
We could then measure the accuracy of the model by comparing two vectors: the Softmax vector that comes out of the classifier and contains the probabilities of the classes, and the one-hot encoded label vector.
To measure the distance between these two probability vectors, we use the cross-entropy loss:
$L_i = -\log\left(\frac{\exp(f_{y_i})}{\sum_j \exp(f_j)}\right)$
where $f_j$ is the jth element of the vector of class scores f. We denote the distance between the two vectors with D, the Softmax function with f and the label vector with p. The cross-entropy between a "true" distribution p and an estimated distribution f is defined as:
$D(f, p) = -\sum_i p_i \log(f_i)$
When the ith entry corresponds to the correct class, $p_i = 1$, the cost (i.e. distance) becomes $-\log(f_i)$, and when the ith entry corresponds to the incorrect class, $p_i = 0$, the entry in $f_i$ becomes irrelevant for the cost.
To further evaluate the general applicability of the model, we built a user-friendly interface based on Inception v3-MD to classify new images.
We used Python and Tkinter to create the GUI (Graphical User Interface) and Keras [30] with Theano [38] to manage the model. The framework takes an image as input and displays the predicted class label as output (Fig. 4).
Table 2
Summary of the results obtained when fine-tuning the CNNs on the datasets.
Dataset   N of images   Model   Accuracy (%)   Std (%)   Time elapsed (min)
3. Experimental results
The extracted ROIs were of size r × r (r = 300); we rescaled them to 224 × 224 so that they are compatible with the original size of the ImageNet images which were used to train the original CNNs.
We train and evaluate the CNNs using a stratified 5-fold cross validation. The mean accuracy, standard deviation and time elapsed for training each model are reported and used for comparing the different setups.
First, we investigate the advantage that transfer learning, through the use of pre-trained weights as an initialization for the CNNs, has over training the models from scratch with a random initialization (Fig. 5). We then use the pre-trained models as fixed feature extractors and the Softmax layer on top as a classifier; we call this a 0 fine-tuning strategy (Fig. 6). Finally, we carry out an experiment to find the optimal number of layers we need to fine-tune for each model in order to get the best performance. We test fine-tuning one, two, three and all of the convolutional blocks of our models and we examine their performance on the DDSM, BCDR and INbreast databases. The blocks were fine-tuned for 90 epochs with a batch size of 128 images. We used the Keras library [30] with Theano [38] as a backend and a CUDA-enabled NVIDIA GTX 980M GPU.
3.1. Random initialization vs. transfer learning
On the one hand, we randomly initialize all our models and train them on the datasets, and on the other we use the pre-trained models as an initialization for our models. Fig. 5 gives the results of the comparison between the two different setups trained on DDSM, BCDR and INbreast.
Random initialization merely samples each weight from a standard distribution with a low deviation. The idea is to pick weight values at random following a distribution which helps the optimization process converge to a meaningful solution.
The network weights were initialized to small random numbers generated from a Gaussian distribution; in our case the values were between 0 and 0.05. The low deviation biases the network towards a simple zero solution, without the bad repercussions of actually initializing the weights to 0 [35].
As an alternative, we perform transfer learning through the use of pre-trained weights, obtained from training the CNNs on ImageNet, as an initialization for our network weights. In Fig. 5 we compare random initialization with the transfer learning strategy (0 fine-tuning).
The 0 fine-tuning strategy, which we can also refer to as the feature extraction mechanism, is the basic way of doing transfer learning. We first remove the classification part of the networks (i.e. the fully-connected layers), which was responsible for giving the probabilities of an image belonging to each of the 1000 classes in ImageNet. Then, we use the remaining part of the models as a fixed feature extractor that computes the CNN codes of each image from our datasets. We finally train a Softmax classifier on the obtained high-dimensional CNN codes (i.e. feature vectors).
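The Softmax classifier and the cross-entropy distance defined in Section 2.2.5 can be illustrated with a short sketch for the two-class case. The raw scores below are hypothetical values chosen for the example, not outputs of our models, and the function names are ours.

```python
import math

def softmax(scores):
    """Normalized class probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """D(f, p) = -sum_i p_i * log(f_i) for a one-hot label vector p.

    Entries with p_i = 0 are skipped: they are irrelevant for the cost.
    """
    return -sum(p * math.log(f) for f, p in zip(probs, one_hot) if p > 0)

CLASSES = ("Benign", "Malignant")
scores = [0.5, 2.0]  # hypothetical raw scores from the last layer
label = [0, 1]       # one-hot encoding of the "Malignant" class (K = 2)

probs = softmax(scores)
loss = cross_entropy(probs, label)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, probs, loss)
```

Because the label vector is one-hot, the loss reduces to the negative log-probability assigned to the correct class, as stated in the text.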
Fig. 4. Screenshots of the obtained result from our Breast Cancer Screening Framework based on the Inception v3-MD model; Case of an image containing a suspected
malignant mass lesion.
Fig. 7. The plot of accuracy and loss over the epochs for the Inception v3 model trained on the merged dataset (Inception v3-MD); the plots help in tuning the model and
its hyper-parameters while monitoring its performance on the train and validation sets.
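Curves such as those in Fig. 7 are what the training schedule of Section 2.2.4 reacts to: the learning rate starts at 1e-4 and is divided by 10 when the validation error stops improving, and training stops after 20 epochs without progress. The sketch below replays a validation-loss history under that rule. The `lr_patience` parameter is an assumption, since the text does not say how many stagnant epochs trigger a learning-rate cut, and the function name is illustrative.

```python
def run_schedule(val_losses, lr=1e-4, factor=0.1, lr_patience=5, stop_patience=20):
    """Replay a validation-loss history and return (stop_epoch, final_lr).

    lr_patience is an assumed value; stop_patience=20 and the initial
    learning rate of 1e-4 follow Section 2.2.4.
    """
    best = float("inf")
    since_best = 0  # epochs since the best validation loss
    since_cut = 0   # stagnant epochs since the last learning-rate cut
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_cut = loss, 0, 0
            continue
        since_best += 1
        since_cut += 1
        if since_cut >= lr_patience:
            lr *= factor  # divide the learning rate by 10
            since_cut = 0
        if since_best >= stop_patience:
            return epoch, lr  # early stopping
    return len(val_losses), lr

# Hypothetical history: three improving epochs followed by a plateau.
history = [1.0, 0.8, 0.7] + [0.7] * 25
stop_epoch, final_lr = run_schedule(history)
print(stop_epoch, final_lr)
```

In Keras, comparable behavior is usually obtained with learning-rate and early-stopping callbacks rather than a hand-written loop; the loop form is used here only to make the rule explicit.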
raphy images in a supervised way, and then combined the obtained features with hand-crafted descriptors to classify mass lesions from the BCDR database. They achieved an AUC of 0.826 and we achieved 0.96 on the same dataset.
To evaluate the performance of our proposed Breast Cancer Screening Framework we used the MIAS dataset. We calculated both the AUC and the overall classification accuracy on the test images and got 0.99 and 98.23% respectively. In contrast, an evaluation of the same dataset by Peng et al. [15] using ANN (Artificial Neural Networks) with texture features extracted from the images resulted in a 96% accuracy.
4. Discussion
To build a powerful end-to-end classification tool for breast cancer screening, we explored various setups and approaches.
First, we investigated the advantage of transfer learning over random initialization. We tested the performance of all our networks with the two approaches while training on three public datasets. The results demonstrated that initialization with pre-trained weights is advantageous, which may be due to the fact that the weights are already familiar with some universal features and patterns that were learned from ImageNet, as opposed to random weights.
We then examined the possible ways of doing transfer learning. We adopted a "0 fine-tuning" strategy where we used the CNNs as feature extractors and then classified the resulting CNN codes using a Softmax classifier. Afterwards, we started unfreezing the last convolutional blocks one by one until 3 blocks, where the accuracy started to drop. For the purpose of the experiment, we further pushed the fine-tuning strategy to the extreme by fine-tuning all of the CNN layers, and we evaluated the performance in each case. The results indicated that fine-tuning is beneficial and can lead to a better performance (i.e. if we compare "0 fine-tuning" to "1 fine-tuning" and "1 fine-tuning" to "2 fine-tuning"), but too much fine-tuning, as for example in the "All fine-tuning" strategy, leads to worse results. We found that the optimal number of blocks to fine-tune was 2 convolutional blocks. This enabled us to keep the first layers, which learn generic features, and fine-tune only the last layers to make them learn more data-specific features.
Transfer learning and fine-tuning allowed us to use the learned ImageNet weights of different deep learning models as an initialization for our CNNs, and fine-tune them so that they can differentiate malignant breast mass lesions from benign ones.
The obtained results show a clear improvement over other proposed methods. Many of the works which classified mammography mass lesions employed simple neural networks, shallow Convolutional Neural Networks (i.e. not deep enough) or the combination of extracted CNN features with other hand-crafted descriptors. However, the most interesting aspect of CNNs is the end-to-end learning, leading to a better performance while using less complex algorithms. The better performance comes from the fact that the internal components self-optimize to maximize the overall system performance. Compared to traditional neural networks, CNNs also reduce the computational cost as they have fewer parameters and are easier to train.
When comparing the obtained results of each CNN fine-tuned on the different datasets, we noticed that the depth of the model as well as its architecture affects its performance. The best results for each dataset were obtained using the Inception v3 model, which also happens to be the deepest of the networks. Inception v3 seems to be more suited for fine-tuning, maybe because of its architecture, which is deep but not stacked up, making it less sensitive to the vanishing gradient problem. As a result, fine-tuning the pre-trained Inception v3 model enabled us to achieve a better performance compared to the state-of-the-art methods which classified the same public datasets we used, in terms of both accuracy and AUC.
Intensity normalization is an important preprocessing step in medical imaging. During image acquisition, different scanners and parameters are used for scanning different patients, or sometimes even the same patient, which may result in large intensity variations. Those variations can be more flagrant from one set of data to another (different illumination conditions, materials, expert in charge, etc.). This intensity variation can greatly undermine the performance of the proposed system for mammography analysis. Consequently, before using the images and especially before merging all datasets, we used GCN normalization to reduce the intensity variation between images, which may have been taken under different conditions. The normalization helps phase out the intensity variations caused by the various lighting conditions, so that we can effectively reduce intra-variations between images from the same dataset and inter-variations between images from different datasets.
To perform the transfer learning, we used datasets of different sizes and the results obtained indicated the existence of a correlation between the amount of training data and the performance of the models. Thus, we combined all datasets to build one large set of images. The merged dataset was used to fine-tune the networks. The Inception v3 model outperformed the other networks on this set of data, achieving 98.94% accuracy.
A deep CNN composed of many layers trained on a small dataset should have a large entropic capacity. The model is then able to store a lot of information, which gives it the potential to be highly accurate by exploiting more features. However, this also puts it more at risk of storing irrelevant features. To modulate the entropic capacity of our models, we first enlarged our datasets through data augmentation. Then, we only fine-tuned the last two convolutional blocks of the models to get more dataset-specific features. In addition, we applied L2 regularization and dropout to disrupt complex co-adaptations on training data, and so we made the models focus on the more significant features from the images, for a better generalization. Furthermore, all the models were meticulously monitored to examine their performance on the training data and the validation data in order to optimize the hyper-parameters and select the best model. The latter was then used to assess both training and test sets simultaneously, to ensure that the model is not overfitting and that it performs equally well on never-seen data (the test data).
After tuning the models and choosing the best hyper-parameters, we trained one final model for each CNN using a stratified 5-fold cross-validation with all the data, and we computed the mean accuracy, standard deviation and time elapsed for each experiment to evaluate the performance.
We used the Inception v3 model fine-tuned on the merged dataset to develop a powerful classification tool. The Breast Cancer Screening Framework can be used as a Computer-aided Diagnosis system that classifies mammography mass lesions. To evaluate the framework, we tested it using new images from the MIAS database, and we achieved an area under the curve (AUC) of 0.99. The results obtained outperform human performance by a large margin, with radiologists achieving a 0.82 AUC according to [40].
The developed framework could predict and provide the correct diagnosis for 98.23% of the images from MIAS, 97.35% from DDSM, 95.50% from INbreast and 96.67% from BCDR. The results obtained from the receiver operating characteristic (ROC) curve analysis showed a high true-positive rate for all previous datasets, which means a high probability of correctly identifying malignant mass lesions as being cancerous.
Fig. 9 illustrates some examples of images that were misclassified (red frames) versus others that were correctly classified (green frames).
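The two metrics reported throughout the Discussion, accuracy and AUC, can be computed from the predicted malignancy probabilities as sketched below. The AUC is obtained here through its rank interpretation (the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one), which is equivalent to the area under the ROC curve. The labels and scores are hypothetical, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    Ties between a malignant and a benign score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical test set: 1 = malignant, 0 = benign.
labels = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20]

print("AUC:", roc_auc(labels, scores))
print("Accuracy:", accuracy(labels, scores))
```

Unlike accuracy, the AUC does not depend on the chosen threshold, which is why both values are reported side by side.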
Fig. 9. Examples of regions of interest containing mass lesions; the first row contains benign lesions and the second row contains malignant lesions; the misclassified
images are framed by a red bounding box (the 3 images on the left in both rows) and the correctly classified by a green one (the 3 images on the right in both rows). (For
interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Examining the misclassified images, we can see that the texture of some of the benign and malignant images is similar. One possibility is that this is due to high breast density.
It is well known that cancer is more difficult to detect in mammograms of women with radiographically dense breasts [41]. Breasts are made up of lobules, ducts, and fatty and fibrous connective tissue. The breasts are dense in the presence of a lot of glandular tissue and not much fat. On mammograms, dense breast tissue looks white. Breast masses or tumors also look white, hence the dense tissue can hide tumors. On the other hand, fatty tissue looks almost black. On a black background it is easier to identify a tumor that looks white (Fig. 9). Therefore, mammograms can be less accurate in women with dense breasts.
This suggests that we can further improve our framework if we carefully select the images to train on and give the model more challenging examples to learn from. We can also include additional imaging techniques in the learning process, such as breast ultrasound or breast MRI (Magnetic Resonance Imaging), to help get a clearer view of the breast, especially for cases with high breast density [42,43].
5. Conclusions
In summary, we can conclude that integrating the recent well-engineered deep learning CNNs through transfer learning into the screening mechanism brings an apparent improvement compared to other approaches. The proposed fine-tuning strategy improves the state-of-the-art classification accuracy on many public datasets. The Inception v3 model trained on the merged dataset, which achieved the best accuracy rate overall, was used to develop a mass lesion classification tool. The Breast Cancer Screening Framework devised could successfully classify many "never-seen" images of mammography mass lesions. It provided highly accurate diagnoses when distinguishing benign from malignant lesions. Therefore, its output could be used as a "second opinion" to assist the radiologist in giving more accurate diagnoses.
Our future work includes using deeper architectures as well as more challenging images to deal with the hindrance caused by mammograms of highly dense breasts. Besides, we suppose that it can also be beneficial to incorporate other imaging techniques in combination with mammography in the learning process, to help build a robust and powerful breast mass lesion classification tool.
Conflict of interest statement
No conflict of interest.
Acknowledgment
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The BCDR database used in this work was a courtesy of MA Guevara Lopez and coauthors, Breast Cancer Digital Repository Consortium.
The INBreast database used in this work was a courtesy of the Breast Research Group, INESC Porto, Portugal.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cmpb.2018.01.011.
References
[1] R.L. Siegel, K.D. Miller, A. Jemal, Cancer Stat., CA. Cancer J. Clin. 67 (2017) 7–30, doi:10.3322/caac.21387.
[2] I. Schreer, Dense breast tissue as an important risk factor for breast cancer and implications for early detection, Breast Care 4 (2009) 89–92, doi:10.1159/000211954.
[3] K. Kerlikowske, P.A. Carney, B. Geller, M.T. Mandelson, S.H. Taplin, K. Malvin, V. Ernster, N. Urban, G. Cutter, R. Rosenberg, R. Ballard-Barbash, Performance of screening mammography among women with and without a first-degree relative with breast cancer, Ann. Intern. Med. 133 (2000) 855–863.
[4] L. Berlin, Radiologic errors, past, present and future, Diagnosis 1 (2014) 79–84, doi:10.1515/dx-2013-0012.
[5] J.S. Whang, S.R. Baker, R. Patel, L. Luk, A. Castro, The causes of medical malpractice suits against radiologists in the United States, Radiology 266 (2013) 548–554, doi:10.1148/radiol.12111119.
[6] E.A. Sickles, Periodic mammographic follow-up of probably benign lesions: results in 3,184 consecutive cases, Radiology 179 (1991) 463–468, doi:10.1148/radiology.179.2.2014293.
[7] Y. Bengio, Learning deep architectures for AI, Found. Trends® Mach. Learn. 2 (2009) 1–127, doi:10.1561/2200000006.
[8] Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 1798–1828, doi:10.1109/TPAMI.2013.50.
[9] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117, doi:10.1016/j.neunet.2014.09.003.
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444, doi:10.1038/nature14539.
[11] Y. LeCun, K. Kavukcuoglu, C. Farabet, Convolutional networks and applications in vision, in: Proc. 2010 IEEE Int. Symp. Circuits Syst., 2010, pp. 253–256, doi:10.1109/ISCAS.2010.5537907.
[12] F. Ciompi, B. de Hoop, S.J. van Riel, K. Chung, E.T. Scholten, M. Oudkerk, P.A. de Jong, M. Prokop, B. van Ginneken, Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box, Med. Image Anal. 26 (2015) 195–202, doi:10.1016/j.media.2015.08.001.
[13] H. Chen, D. Ni, J. Qin, S. Li, X. Yang, T. Wang, P.A. Heng, Standard plane localization in fetal ultrasound via domain transferred deep neural networks, IEEE J. Biomed. Health Inf. 19 (2015) 1627–1636, doi:10.1109/JBHI.2015.2425041.
[14] H.C. Shin, H.R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, R.M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging 35 (2016) 1285–1298, doi:10.1109/TMI.2016.2528162.
[15] W. Peng, R.V. Mayorga, E.M.A. Hussein, An automated confirmatory system for analysis of mammograms, Comput. Methods Programs Biomed. 125 (2016) 134–144, doi:10.1016/j.cmpb.2015.09.019.
[16] G. Carneiro, J. Nascimento, A.P. Bradley, Unregistered multiview mammogram analysis with pre-trained deep learning models, in: Int. Conf. Med. Image Comput. Comput.-Assist. Interv., Springer, 2015, pp. 652–660. http://link.springer.com/chapter/10.1007/978-3-319-24574-4_78.
[17] J. Arevalo, F.A. González, R. Ramos-Pollán, J.L. Oliveira, M.A. Guevara Lopez, Representation learning for mammography mass lesion classification with convolutional neural networks, Comput. Methods Programs Biomed. 127 (2016) 248–257, doi:10.1016/j.cmpb.2015.12.014.
[18] Z. Jiao, X. Gao, Y. Wang, J. Li, A deep feature based framework for breast masses classification, Neurocomputing 197 (2016) 221–231, doi:10.1016/j.neucom.2016.02.060.
[19] M. Oquab, L. Bottou, I. Laptev, J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1717–1724. http://www.cv-foundation.org/openaccess/content_cvpr_2014/html/Oquab_Learning_and_Transferring_2014_CVPR_paper.html (accessed January 27, 2017).
[20] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman, Return of the devil in the details: delving deep into convolutional nets, ArXiv14053531 Cs. (2014). http://arxiv.org/abs/1405.3531 (accessed June 2, 2017).
[21] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv14091556 Cs. (2014). http://arxiv.org/abs/1409.1556 (accessed February 10, 2017).
[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826.
[24] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in: 2009 IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255, doi:10.1109/CVPR.2009.5206848.
[25] A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of data, IEEE Intell. Syst. 24 (2009) 8–12, doi:10.1109/MIS.2009.36.
[26] M. Heath, K. Bowyer, D. Kopans, R. Moore, W.P. Kegelmeyer, The digital database for screening mammography, in: Proc. 5th Int. Workshop Digit. Mammogr., Medical Physics Publishing, 2000, pp. 212–218. https://www3.nd.edu/~kwb/Heath_EtAl_IWDM_2000.pdf (accessed January 27, 2017).
[27] M.G. Lopez, N. Posada, D.C. Moura, R.R. Pollán, J.M.F. Valiente, C.S. Ortega, M. Solar, G. Diaz-Herrero, I. Ramos, J. Loureiro, et al., BCDR: a breast cancer digital repository, in: 15th Int. Conf. Exp. Mech., 2012. http://paginas.fe.up.pt/clme/icem15/ICEM15_CD/data/papers/3004.pdf (accessed January 27, 2017).
[28] I.C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M.J. Cardoso, J.S. Cardoso, INbreast: toward a full-field digital mammographic database, Acad. Radiol. 19 (2012) 236–248, doi:10.1016/j.acra.2011.09.014.
[29] S.C. Wong, A. Gatt, V. Stamatescu, M.D. McDonnell, Understanding data augmentation for classification: when to warp?, ArXiv160908764 Cs. (2016). http://arxiv.org/abs/1609.08764.
[30] F. Chollet, Keras, GitHub, 2015. https://github.com/fchollet/keras.
[31] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, in: PMLR, 2012, pp. 17–36. http://proceedings.mlr.press/v27/bengio12a.html (accessed October 19, 2017).
[32] H. Lakkaraju, R. Socher, C. Manning, Aspect specific sentiment analysis using hierarchical deep learning, in: NIPS Workshop Deep Learn. Represent. Learn., 2014.
[33] M. Jaderberg, K. Simonyan, A. Zisserman, Spatial transformer networks, in: Adv. Neural Inf. Process. Syst., 2015, pp. 2017–2025.
[34] J. Yosinski, J. Clune, Y. Bengio, H. Lipson, How transferable are features in deep neural networks?, in: Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.
[35] Y. Bengio, Practical recommendations for gradient-based training of deep architectures, in: Neural Netw. Tricks Trade, Springer, Berlin, Heidelberg, 2012, pp. 437–478, doi:10.1007/978-3-642-35289-8_26.
[36] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, ArXiv14126980 Cs. (2014). http://arxiv.org/abs/1412.6980 (accessed January 31, 2017).
[37] N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014) 1929–1958.
[38] Theano Development Team, Theano: a python framework for fast computation of mathematical expressions, ArXiv E-Prints. abs/1605.02688 (2016). http://arxiv.org/abs/1605.02688.
[39] J. Suckling, et al., The mammographic image analysis society digital mammogram database, Excerpta Medica Int. Congr. Ser. 1069 (1994) 375–378.
[40] J.G. Elmore, S.L. Jackson, L. Abraham, D.L. Miglioretti, P.A. Carney, B.M. Geller, B.C. Yankaskas, K. Kerlikowske, T. Onega, R.D. Rosenberg, E.A. Sickles, D.S.M. Buist, Variability in interpretive performance at screening mammography and radiologists’ characteristics associated with accuracy, Radiology 253 (2009) 641–651, doi:10.1148/radiol.2533082308.
[41] J.E. Joy, E.E. Penhoet, D.B. Petitti, Institute of Medicine (US) and National Research Council (US) Committee on New Approaches to Early Detection and Diagnosis of Breast Cancer, Benefits and Limitations of Mammography, National Academies Press, US, 2005. https://www.ncbi.nlm.nih.gov/books/NBK22311/ (accessed June 8, 2017).
[42] W.A. Berg, Z. Zhang, D. Lehrer, R.A. Jong, E.D. Pisano, R.G. Barr, M. Böhm-Vélez, M.C. Mahoney, W.P. Evans, L.H. Larsen, M.J. Morton, E.B. Mendelson, D.M. Farria, J.B. Cormack, H.S. Marques, A. Adams, N.M. Yeh, G. Gabrielli, Detection of breast cancer with addition of annual screening ultrasound or a single screening MRI to mammography in women with elevated breast cancer risk, JAMA J. Am. Med. Assoc. 307 (2012) 1394–1404, doi:10.1001/jama.2012.388.
[43] D.S. Salem, R.M. Kamal, S.M. Mansour, L.A. Salah, R. Wessam, Breast imaging in the young: the role of magnetic resonance imaging in breast cancer screening, diagnosis and follow-up, J. Thorac. Dis. 5 (2013) S9–S18, doi:10.3978/j.issn.2072-1439.2013.05.02.