Interpretable and Annotation
Jaime Cardoso · Hien Van Nguyen ·
Nicholas Heller et al. (Eds.)
LNCS 12446
Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA
Editors
Jaime Cardoso
University of Porto
Porto, Portugal

Hien Van Nguyen
University of Houston
Houston, TX, USA
Nicholas Heller
University of Minnesota
Minneapolis, MN, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Additional Workshop Editors
It is our genuine honor and great pleasure to welcome you to the Third Workshop on
Interpretability of Machine Intelligence in Medical Image Computing (iMIMIC 2020),
a satellite event at the 23rd International Conference on Medical Image Computing and
Computer Assisted Intervention (MICCAI 2020). Following in the footsteps of the two
previous successful meetings in Granada, Spain (2018) and Shenzhen, China (2019),
we gathered for this new edition.
iMIMIC is a single-track, half-day workshop consisting of high-quality, previously
unpublished papers, presented either orally or as a poster, intended to act as a forum for
research groups, engineers, and practitioners to present recent algorithmic developments, new results, and promising future directions in interpretability of machine
intelligence in medical image computing. Machine learning systems are achieving remarkable performance at the cost of increased complexity. Hence, they become less
interpretable, which may cause distrust, potentially limiting clinical acceptance. As
these systems are pervasively being introduced to critical domains, such as medical
image computing and computer assisted intervention, it becomes imperative to develop
methodologies allowing insight into their decision making. Such methodologies would
help physicians to decide whether they should follow and trust automatic decisions.
Additionally, interpretable machine learning methods could facilitate defining the legal
framework of their clinical deployment. Ultimately, interpretability is closely related to
AI safety in healthcare.
This year’s iMIMIC was held virtually on October 4, 2020 (the host conference had been planned for Lima, Peru), and was hosted by INESC TEC and the University of Coimbra, with the support of the University of Porto and CISUC, all located in Portugal. There was a very positive response to the
call for papers for iMIMIC 2020. We received 18 full papers from 10 countries; each paper was reviewed by at least three reviewers, and 8 were accepted for presentation at the workshop. The accepted papers present fresh ideas on interpretability in settings such as regression, multiple instance learning, weakly supervised learning, local annotations, classifier re-training, and model pruning.
The high quality of the scientific program of iMIMIC 2020 was due first to the
authors who submitted excellent contributions and second to the dedicated collaboration of the International Program Committee and the other researchers who reviewed
the papers. We would like to thank all the authors for submitting their contributions and
for sharing their research activities.
We are particularly indebted to the Program Committee members and to all the
reviewers for their precious evaluations, which permitted us to set up this publication.
We were also very pleased to benefit from the participation of the invited speakers
Himabindu Lakkaraju, Harvard University, USA, and Wojciech Samek, Fraunhofer
HHI, Germany. We would like to express our sincere gratitude to these world-renowned experts.
Welcome to the Second International Workshop on Medical Image Learning with Less
Labels and Imperfect Data (MIL3iD 2020). The MIL3iD 2020 proceedings contain 11 high-quality eight-page papers that were selected through a rigorous peer-review process.
We hope this workshop will create a forum for discussing best practices in medical
image learning with label scarcity and data imperfection. This forum is urgently needed because the issues of label noise and data scarcity are highly practical but largely under-investigated in the medical image analysis community. Traditional approaches
for dealing with these challenges include transfer learning, active learning, denoising,
and sparse representation. The majority of these algorithms were developed prior to the
recent advances of deep learning and might not benefit from the power of deep
networks. The revision and improvement of these techniques in the new light of deep
learning are long overdue.
This workshop potentially helps answer many important questions. For example,
several recent studies found that deep networks are robust to massive random label
noises but more sensitive to structured label noises. What implication do these findings
have on dealing with noisy medical data? Recent work on Bayesian neural networks
demonstrates the feasibility of estimating uncertainty due to the lack of training data. In
other words, it enables our classifiers to be aware of what they do not know. Such a
framework is important for medical applications where safety is critical. How can researchers in the MICCAI community leverage this approach to improve their systems’ robustness in the case of data scarcity? Our prior work shows that a variant of capsule networks generalizes better than convolutional neural networks with an order of magnitude less training data. This gives rise to an interesting question: are there better classes of networks that intrinsically require less labeled data for learning? Humans have traditionally had an edge over deep networks when it comes to learning from small amounts of data. However, recent work on one-shot deep learning has surpassed human performance in an image recognition task using only a few training samples per task. Do these
results still hold for medical image analysis tasks?
The proceedings of the workshop are published as a joint LNCS volume alongside
other satellite events organized in conjunction with MICCAI. In addition to the LNCS
volume, to promote transparency, the papers’ reviews and preprints are publicly available on the workshop website: https://www.hvnguyen.com/lesslabelsimperfectdataml2020. Besides the papers, the abstracts, slides, and posters presented during the workshop will be made publicly available on the MIL3iD website.
We would like to thank all the speakers and authors for joining our workshop, the
Program Committee for their excellent work with the peer reviews, and the workshop
chairs and editors for their help with the organization of the second MIL3iD workshop.
This volume contains the proceedings of the 5th International Workshop on Large-scale Annotation of Biomedical data and Expert Label Synthesis (LABELS 2020),
which was held on October 8, 2020, in conjunction with the 23rd International
Conference on Medical Image Computing and Computer Assisted Intervention
(MICCAI 2020), originally planned for Lima, Peru, but ultimately held virtually, due to
the COVID-19 pandemic. The first workshop in the LABELS series was held in 2016
in Athens, Greece. This was followed by workshops in Quebec City, Canada, in 2017,
Granada, Spain, in 2018, and Shenzhen, China, in 2019.
As data-hungry methods continue to drive advancements in medical imaging, the
need for high-quality annotated data to train and validate these methods continues to
grow. Further, with the pressing need to address health disparities and to prevent
learned systems from internalizing biases, there has never been a greater need for
thorough study and discussion of best practices in data collection and annotation. For
the past four years, LABELS has aimed to facilitate exactly this.
Following the success of the previous four LABELS workshops, the fifth workshop
was planned for 2020. This year’s edition of the workshop included invited talks by
Anand Malpani (Johns Hopkins University, USA) and Amber Simpson (Queen’s
University, Canada), as well as several papers and abstracts. After peer review, a total
of 10 papers and 3 abstracts were selected. The papers appear in this volume, and the
abstracts are available on the workshop website: https://miccailabels.org. The research
presented this year ranged from how to quantify and mitigate demographic biases, to
probing the reproducibility of expert labels, to new tools for more efficient annotation
of emerging image modalities. LABELS takes pride in the fact that theoretical novelty is not a prerequisite for work presented at the workshop; instead, the event embraces the messy, tedious reality of medical image collection and annotation in an effort to expose and formalize its underlying principles.
We would like to thank all the speakers and authors for joining our workshop, the
Program Committee for their excellent work with the peer reviews, our sponsors –
Retinai and Auris Health – for their support, and the workshop chairs for their help with
the organization of the fifth LABELS workshop.
August 2020 Nicholas Heller
Raphael Sznitman
Veronika Cheplygina
Diana Mateus
Emanuele Trucco
Samaneh Abbasi
LABELS 2020 Organization
iMIMIC 2020
MIL3iD 2020
LABELS 2020
1 Introduction
Computed tomography (CT) images are taken for a variety of medical reasons,
including diagnosis of bone fractures, internal bleeding, and tumors. Often CT scans show the spine or sections of the spine. Even for an experienced radiologist, manually searching for fractured vertebrae in such a CT image is a
time-consuming task, and is often not conducted, unless it was the primary
c Springer Nature Switzerland AG 2020
J. Cardoso et al. (Eds.): iMIMIC 2020/MIL3iD 2020/LABELS 2020, LNCS 12446, pp. 3–12, 2020.
https://doi.org/10.1007/978-3-030-61166-8_1
4 E. B. Yilmaz et al.
2 Methods
Given a 3D patch of a CT image containing a centered vertebra, the task of
our proposed models is to decide if the vertebra has an osteoporotic fracture
or not. Note that this means not only distinguishing between Genant score 0
and larger than 0, but also distinguishing whether a deformity is degenerative or
1 We also conducted experiments with a 3D ResNet18 variant [7] that are out of scope for this publication but are in line with the presented results.
Assessing Attribution Maps for Vertebral Fracture Classifiers 5
direction. A learning rate of 10⁻⁴ and a loss weight of 1 were used for all outputs, i.e., the total loss is the unweighted sum of the losses at all three output nodes.
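The loss combination described above amounts to a weighted sum that degenerates to a plain sum when all weights are 1. The function name and the three-way split below are illustrative, not taken from the paper's code:

```python
def total_loss(losses, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-output losses; with all loss weights set to 1
    (as in the text), this reduces to the unweighted sum over the three
    output nodes."""
    return sum(w * l for w, l in zip(weights, losses))
```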
When using fixed noise values for SmoothGrad and with the exception of
Guided BackProp, the above methods satisfy implementation invariance: Two
models that compute the same function should, given any fixed input image,
have the same attribution maps.
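The implementation-invariance property can be illustrated with a minimal NumPy sketch of SmoothGrad's averaging step; `grad_fn` and the two toy "models" below are hypothetical stand-ins for a trained CNN's input gradient, not the paper's actual networks:

```python
import numpy as np

def smoothgrad(grad_fn, x, sigma=0.15, n_samples=50, seed=0):
    """SmoothGrad [15]: average the input gradient over noisy copies of x.
    Fixing the noise (via the seed) makes the result depend only on the
    function grad_fn computes, not on how the model is parameterized."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        total += grad_fn(x + rng.normal(0.0, sigma, size=x.shape))
    return total / n_samples

# Two differently parameterized models computing the same function
# f(x) = sum(x**2); both have the input gradient 2x.
grad_a = lambda z: 2.0 * z
grad_b = lambda z: 2.0 * (z + 1.0) - 2.0

x = np.array([1.0, -2.0, 3.0])
map_a = smoothgrad(grad_a, x)
map_b = smoothgrad(grad_b, x)
# With fixed noise, both parameterizations yield identical attribution maps.
```

Because both gradients are the same function of the input, the two attribution maps agree, which is exactly what implementation invariance demands.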
3 Experiments
The dataset used in this study contains 159 low-dose CT images of distinct
patients (136 female), typically showing vertebrae T5–L4. The images were
acquired in seven centers participating in the Diagnostik Bilanz study of the BioAsset project [6]. SpineAnalyzer™ [1] was used by a radiologist to annotate, for each visible vertebra, the Genant score [5], “deformity percentages” indicating height reduction, vertebra centers on a 2D sagittal slice, and a differential diagnosis indicating whether the vertebra shows a “deformity” (1019 cases), an “osteoporotic fracture” (128 cases), is “unevaluable” (due to noise, 5 cases), or is “normal” (802 cases). For the binary classification task, vertebrae with an “osteoporotic fracture” are labeled 1, “deformity” and “normal” correspond to 0, and “unevaluable” vertebrae were excluded. The lateral coordinate was computed using a state-of-the-art vertebra localization tool [11] and manually checked for correctness. Centered on these coordinates, a 3D patch of fixed size is extracted from the CT image for each vertebra, serving as input for the respective CNN. As the only preprocessing steps, images were scaled to a voxel resolution of 1 × 1 × 3 mm³ (longitudinal × anteroposterior × lateral) and
Hounsfield-values were divided by 2048. The images were split on patient level
into four subsets defining a 4-fold cross-validation setup. In each run, two data
subsets were used as training data, one as validation data (for early stopping and
choosing the classification threshold) and one subset as test data. To address the class imbalance, each vertebra in each training mini-batch was drawn at random such that, with 50% probability, a vertebra labeled 1 was chosen. The
data augmentation methods listed in Sect. 2.1 were applied to each vertebra in
each mini-batch.
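The 50% oversampling scheme described above can be sketched as follows; the function and variable names are hypothetical, and the paper's actual data pipeline may differ:

```python
import random

def balanced_batch(positives, negatives, batch_size, rng=None):
    """Draw a mini-batch in which each slot holds a fracture case (label 1)
    with 50% probability, counteracting the heavy class imbalance
    (128 fractures vs. ~1800 non-fractures in the dataset)."""
    rng = rng or random.Random()
    return [rng.choice(positives if rng.random() < 0.5 else negatives)
            for _ in range(batch_size)]
```

Sampling with replacement from the minority class is the usual trade-off here: fracture cases recur across batches, but each batch presents the classifier with a roughly balanced label distribution.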
Table 1. Vertebral fracture discrimination results for the U-Net prefix and the Custom CNN. Mean and standard deviation are computed across 4 folds. Bold: best per column. ROC-AUC: area under the ROC curve. AP: average precision.
Fig. 2. Attribution maps and sanity checks for the U-Net prefix (left) and the Custom
CNN (right). Top: Comparison of the central sagittal slices of attribution maps using
a true positive L3 vertebra as example, where blue, white and red pixels correspond to
negative, near-zero and positive values. The “Baseline” row shows the attribution maps
for a trained instance of the corresponding CNN; the following rows show resulting maps
when performing the described sanity checks. Images were normalized individually by
division with the maximum absolute value. Bottom: A quantitative comparison, where
colored bars display the SSIM values averaged across the vertebrae in the respective
test sets and black bars show the Pearson correlation of the models’ outputs. Error
bars indicate the standard deviation across vertebrae.
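The per-image normalization and the output-correlation bars described in the caption can be sketched as follows. This is a simplified stand-in (the SSIM computation itself is omitted), and the function names are assumptions:

```python
import numpy as np

def normalize_map(attr):
    """Scale an attribution map into [-1, 1] by dividing by its maximum
    absolute value, as done for the individual images in Fig. 2."""
    m = np.abs(attr).max()
    return attr / m if m > 0 else attr

def output_correlation(outputs_a, outputs_b):
    """Pearson correlation between two models' outputs on the same test
    set (the black bars in Fig. 2)."""
    return float(np.corrcoef(outputs_a, outputs_b)[0, 1])
```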
“Random Weights” and “Random Labels”. In contrast, the SSIM of the attribution maps was only slightly larger, if at all (see Fig. 2 and Sect. 3.1).
of similar accuracy can produce different attribution maps, which matches the
results of [18]. Gradient * Input, designed to reduce noise in the explanations,
highlights mostly the outline of the input structure, which was also observed
by [2]. SmoothGrad, also designed to reduce noise, behaved similarly to the original gradient, both visually and quantitatively.
In conclusion, the explanations exhibit a strong dependence on the model
architecture, the realization of the parameters, and the precise position of the
target object of interest. Since explanations of a model’s decision would be most
helpful to convince physicians that automated approaches perform trustworthy
evaluations, future work should address the implications of these findings for clinical practice.
References
1. SpineAnalyzer. Optasia Medical Ltd., Cheadle Hulme, United Kingdom (2013)
2. Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity
checks for saliency maps. In: Advances in Neural Information Processing Systems,
Montréal, Canada, pp. 9525–9536. Curran Associates Inc. (2018)
3. Ancona, M., Ceolini, E., Öztireli, C., Gross, M.: Gradient-based attribution methods. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 169–191. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_9
4. Erhan, D., Bengio, Y., Courville, A., Vincent, P.: Visualizing higher-layer features of a deep network. Technical report, Université de Montréal (2009)
5. Genant, H.K., Wu, C.Y., van Kuijk, C., Nevitt, M.C.: Vertebral fracture assessment
using a semiquantitative technique. J. Bone Miner. Res. 8(9), 1137–1148 (1993)
6. Glüer, C.C., et al.: New horizons for the in vivo assessment of major aspects of bone
quality microstructure and material properties assessed by Quantitative Computed
Tomography and Quantitative Ultrasound methods developed by the BioAsset
consortium. Osteologie 22, 223–233 (2013)
7. Haarburger, C., et al.: Multi scale curriculum CNN for context-aware breast MRI
malignancy classification. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol.
11767, pp. 495–503. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32251-9_54
8. Husseini, M., Sekuboyina, A., Bayat, A., Menze, B.H., Loeffler, M., Kirschke, J.S.:
Conditioned variational auto-encoder for detecting osteoporotic vertebral fractures.
In: Cai, Y., Wang, L., Audette, M., Zheng, G., Li, S. (eds.) CSI 2019. LNCS,
vol. 11963, pp. 29–38. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39752-4_3
9. Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: Brain
tumor segmentation and radiomics survival prediction: contribution to the BRATS
2017 challenge. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (eds.)
BrainLes 2017. LNCS, vol. 10670, pp. 287–297. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75238-9_25
10. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, San Diego, May 2015
11. Mader, A.O., Lorenz, C., von Berg, J., Meyer, C.: Automatically localizing a large
set of spatially correlated key points: a case study in spine imaging. In: Shen, D.,
et al. (eds.) MICCAI 2019. LNCS, vol. 11769, pp. 384–392. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_43
12. Nicolaes, J., et al.: Detection of vertebral fractures in CT using 3D convolutional
neural networks. In: Cai, Y., Wang, L., Audette, M., Zheng, G., Li, S. (eds.) CSI
2019. LNCS, vol. 11963, pp. 3–14. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39752-4_1
13. Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., Wilson, A.G.: Averaging
weights leads to wider optima and better generalization. In: Uncertainty in Artificial Intelligence, Monterey, California, pp. 876–885. AUAI Press, Corvallis, March 2018
14. Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences. arXiv preprint (2016)
15. Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint (2017)
16. Springenberg, J., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity:
the all convolutional net. In: International Conference on Learning Representations
(2015)
17. Tomita, N., Cheung, Y.Y., Hassanpour, S.: Deep neural networks for automatic
detection of osteoporotic vertebral fractures on CT scans. Comput. Biol. Med. 98,
8–15 (2018)
18. Young, K., Booth, G., Simpson, B., Dutton, R., Shrapnel, S.: Deep neural network
or dermatologist? In: Suzuki, K., et al. (eds.) ML-CDS/IMIMIC -2019. LNCS,
vol. 11797, pp. 48–55. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33850-3_6
than many Jews, my fellows, a defender to excess of the Pharisaic traditions.”
We have no reason to infer that Paul, fifteen or twenty years later, exaggerated his violence the better to attest: I was converted in spite of myself, with no merit of my own, without anything having prepared me for it.
What is strange is rather the detached tone of his confession; not a word suggests that the memory of his violence tormented him with remorse. Later, he would explain quite simply to Timothy why he was able to find grace before God:
“Christ Jesus appointed me to his service, I who was formerly a blasphemer, a persecutor, a tormentor. He took pity on me because I had acted in ignorance, in unbelief.”
Saul’s fury thus sprang from an exasperated zeal for the religion he believed to be the only true one. His cruelties would find an explanation in Pascal’s sharp remark:
“Never is evil done so fully or so cheerfully as when it is done out of conscience.”
But we must also understand what the soul of a first-century Jew could be, and what the world around him was like.
It would be a great mistake to picture Israel as, above all, a ferocious people. In its history, acts of mercy and tenderness are nothing out of the ordinary. Upon the harshness of temperaments the divine precept laid its balm:
“You shall love the Lord your God with all your heart, with all your strength.”
Between Iahvé and his people, a principle of gentleness tempered fear:
“The Lord your God carried you,” Moses told them, “as a man carries his little son (on his shoulder) [45].”
[45] Deuteronomy I, 31.
Atrocious to us, these expiatory rites were far less so than those of the idolaters, who offered their sons to the pyre of Moloch, or mutilated themselves, as the priests of Cybele did, in public, in frenzy. They called the Jews to penitence, commemorating the punishments with which Iahvé had struck their impious or fornicating fathers. They prefigured the substituted victim, this one willing and perfect, the Christ pierced with thorns, scourged, reviled. But in brutal souls they aroused a taste for blood, a kind of lustful excitement twisted into a drunkenness of slaughter.
Moreover, enslaved to iniquitous masters, the Jews, even while bowing their backs, had meditated dreadful reprisals. If their worship or the Law was touched, they resisted savagely, and the repressions were inexorable. When Antiochus Epiphanes undertook to Hellenize Jerusalem and to set up a statue of Zeus in the Temple, and when he had forbidden circumcision, the Pharisees persisted in having newborns circumcised. All who were denounced were beaten with rods, mutilated, crucified; and the executioners, after strangling the children, hung their corpses around the necks of the crucified [58]. When Herod had a golden eagle nailed over the gate of the Temple, two doctors of the Law, Judas and Mathias, tore it down at high noon, before the crowd, and smashed it with axes. Arrested, they justified their violence with this single argument: “We have avenged the outrage done to God and the honor of the Law whose disciples we are.” To unleash a furious movement, it was enough that Pilate wished to parade through the streets of Jerusalem military standards bearing the medallion of Caesar [59]. Caligula, when he tried to impose his statue in the Temple as the “new Jupiter,” nearly roused all Judea against Rome.
[58] Josephus, Antiq., bk. XII, VII.
[59] Id., bk. XVIII, IV.
*
* *
And then there were the endless quibbles over impurities, over what is permitted or forbidden on the Sabbath [86]. There was the casuistry of damages that do or do not entail a penalty:
[86] Need we recall the dispute between Hillel and Chammaï over this grave point: whether, on a feast day, one might eat an egg laid that same day?
“If a cock, fluttering from one place to another, causes damage by its contact, the owner shall be liable for the whole damage. But if the damage was caused by the wind of its wings, the owner need pay only half the value [87].”
[87] Baba Gama, trans. Schwab, p. 12.
Be that as it may, between him and Saul an antithesis breaks out: here is a Master, measured, supple, a theorist of indulgence, and his disciple acts against his doctrine as much as a Jacobin of ’93 belied a Necker or a Montesquieu.
It would be idle to try to elucidate this problem, as it would be to settle whether or not Paul was a rabbi. Very often the disciple is the opposite of the master, just as the son is the negation of the father. Gamaliel had a son who was a fanatic, hostile to the Christians. Saul, in his youth, was someone fiercely independent. Drawn to extremes, he followed in his hatreds the ardor of his energies. If he admired Gamaliel’s learning and authority, he judged his liberalism dangerous. As a Jew, was he wrong?
One hypothesis seems absurd, appalling: that of conceiving the Christian faith smothered in its first growth. Viewed in purely human terms, it could have been, had it been exterminated persistently and without mercy. But the emperors would persecute it systematically only in the second and third centuries, when it would be too late to kill it. And the Jewish persecution was brief, intermittent, irresolute. A higher power thwarted it, paralyzed it. In vain would Herod keep Peter bound with two chains between the soldiers. An Angel would touch the chains; they would come undone; of itself, the iron gate would open. And Saul, at the very moment he believes himself victorious over Jesus the Nazarene, is about to become his slave, “the vessel of election.”
II
SAUL THE SEER