A Practical Guide To Artificial Intelligence-Based Image Analysis in Radiology
Both radiology and artificial intelligence (AI) have a long history: X-rays were discovered by W.C. Röntgen in 1895,1 and the term "AI" was first introduced at a conference in Dartmouth in 1956.2 The vision of "artificial brains" is even older and was known to the 2 pioneers of computing, Alan Turing and Konrad Zuse.3

There is considerable imprecision in the use of the terms AI, machine learning (ML), and deep learning (DL). Artificial intelligence is an umbrella term encompassing any technique that enables computers to mimic human intelligence. Machine learning is a subclass of AI techniques and describes algorithms that self-improve upon exposure to new data.4 Deep learning is a subset of ML algorithms that make use of multilayered neural networks.5 Here, we will use AI as the umbrella term for consistency. Both radiology and computing have evolved to the point where the application of AI to radiology problems has become feasible: radiology has become digital, with all data stored in radiology information system (RIS)/picture archiving and communication system (PACS) archives, and AI has matured for automated image analysis. This development has been driven by increased computational power, cheaper data storage, and higher data transfer rates. It has also led to a pronounced increase in articles on AI in radiology in recent years, with a wide range of possible applications such as finding detection,6–11 segmentation,12–17 classification,18–21 and outcome prediction.22–24 As the data stored in radiology departments are patient-centered and clinical data are at best present in an unstructured fashion, this data conglomerate as a whole is not "AI ready," and a multistep pipeline from data search and download to quantification of diagnostic performance is needed to implement AI projects.

Although it is not the aim of this article to describe ML techniques (for details, see Chartrand et al5 and Kohli et al25), we aim to guide readers through the pipeline required for an AI project in automated image analysis.

DATA SEARCH AND DOWNLOAD

Two typical search queries in the context of AI projects illustrate the requirements: (1) "identify all radiographs and corresponding reports of the wrist in our RIS/PACS that either confirm or exclude a fracture," and (2) "identify all computed tomography [CT] pulmonary angiograms with corresponding reports performed at our institution between January 01, 2018, and January 6, 2019, on scanner type X with the question whether there is pulmonary embolism or not, and separate examinations containing pulmonary embolism from those that do not." While almost all RIS/PACS applications might handle the first query, processing complex queries such as the second one is not feasible. Finally, many clinical RIS/PACS systems do not allow for a batch-wise export of image and text data. Consequently, reorganization of the data according to a study-centered view is needed.

The information required for search queries is contained in DICOM tags and the radiology reports.26 Four steps are needed to realize data reorganization: (1) retrieval of DICOM tags (from PACS) and radiology reports (from RIS); (2) merging and storing these data in a database; (3) design of a tool for searching this database with an intuitively understandable user interface and swift full-text search capability on the report texts by indexing (Figure 1 displays an example RIS/PACS search engine [SE] developed and used at our institution); and (4) connectivity of the RIS/PACS-SE to clinical RIS/PACS databases that allow for an easy export of data to tables (technical image information and reports) and to secondary image processing applications (image data).

FIGURE 1. Exemplary, in-house developed RIS/PACS search platform allowing for flexible data queries including full-text search on radiology reports.

Note that a basic version of a study-centered RIS/PACS-SE requires only the readout of a limited number of standard DICOM tags (eg, study date [0008,0020], study description [0008,1030], modality [0008,0060]). The integration of further tags allows for more specific queries (eg, contrast/bolus agent [0018,0010]). However, missing data, inconsistent labeling over time, and the diverse usage of private data element tags are limiting factors. To overcome this problem, establishing a unified diction for all DICOM tags of interest and logging any changes over time are important measures to assure high data quality.
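As a minimal illustration of reorganization steps (1) and (2) above, the following sketch (assuming the open-source pydicom package and a hypothetical export folder) reads a handful of standard DICOM tags and collects them in a table that a search tool could index; the paths, tag selection, and output format are illustrative only.

```python
# Minimal sketch: reading standard DICOM tags with pydicom and collecting
# them in a CSV table for later indexing. Paths are hypothetical.
import csv
from pathlib import Path

import pydicom  # third-party package: pip install pydicom

ARCHIVE_DIR = Path("/data/dicom_export")  # hypothetical export folder
OUTPUT_CSV = Path("dicom_index.csv")

fieldnames = ["file", "study_date", "study_description", "modality",
              "contrast_agent", "accession_number"]
rows = []
for dcm_file in ARCHIVE_DIR.rglob("*.dcm"):
    # stop_before_pixels skips the image data; only the header is needed here
    ds = pydicom.dcmread(dcm_file, stop_before_pixels=True)
    rows.append({
        "file": str(dcm_file),
        "study_date": ds.get("StudyDate", ""),                # (0008,0020)
        "study_description": ds.get("StudyDescription", ""),  # (0008,1030)
        "modality": ds.get("Modality", ""),                   # (0008,0060)
        "contrast_agent": ds.get("ContrastBolusAgent", ""),   # (0018,0010)
        "accession_number": ds.get("AccessionNumber", ""),    # join key
    })

with OUTPUT_CSV.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

The accession number (or a comparable identifier) can then serve as the key for merging the tag table with the report texts exported from the RIS.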
REPORT TEXT CURATION
Although unsupervised ML approaches are rarely used in radiology,27–29 most AI projects require classification of examinations for compiling the training and testing dataset. However, manually labeling data on an examination level (eg, "fracture" vs "no fracture") causes enormous costs for 2 reasons: first, AI projects benefit from large amounts of data, and second, labeling requires a certain level of medical expertise, necessitating the involvement of physicians. Automated retrospective extraction of labels from radiology reports is an answer to this challenge. However, despite efforts toward standardization in recent years,30–32 radiology reports still mostly consist of unstructured or semistructured texts. A simple text query is not sufficient due to the plethora of words describing the same findings and further complicating factors such as negations. Natural language processing (NLP) is a potential solution to this problem. It allows for a transfer of continuous texts to labels. The performance of NLP methods, which were historically based on manually drafted lexical rule systems, has improved dramatically in recent years by incorporating ML approaches such as support vector machines, random forests, and deep convolutional neural networks (DCNNs).33–35 This significantly reduces human input, as only a small subset of the text data has to be labeled manually. Although these methods achieve excellent sensitivity and specificity, a residual level of inaccuracy is inherent to NLP: Pons and colleagues reviewed NLP in radiology research projects and found sensitivities ranging from 71% to 98%.36 The acceptable amount of inaccuracy depends on the amount of data and the specific research question. That AI algorithms can handle a certain amount of wrong labels has been demonstrated, among others, by Annarumma and colleagues, who used NLP-derived labels of over 470,000 adult chest radiographs to successfully train 2 DCNNs for triaging based on imaging information.37 For many projects, the labeling of reports on the level of examinations is sufficient. However, labeling of additional information can be useful to identify subsets of examinations with specific features (eg, acute or chronic pulmonary embolism). As both manual labeling and NLP approaches require manual input, an easy-to-use labeling tool is important. Natural language processing can be implemented with the software packages mentioned below in section Software and Hardware Requirements. Whenever a 100% correctly labeled dataset is warranted, manual labeling pre hoc or post hoc remains an option. For specific questions, the design, adaption, and use of NLP libraries such as peFinder38 or NILE39 is an alternative. Challenging factors for NLP in the context of radiology encompass ambiguity of abbreviations, dealing with uncertainty and inconclusiveness of reports, and the presence of spelling errors.40
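As a hedged sketch of the ML-based report labeling described above, the following example uses scikit-learn70 to train a simple text classifier (TF-IDF features with logistic regression) on a small manually labeled subset of reports; the report texts and labels shown are invented for illustration.

```python
# Minimal sketch: deriving examination-level labels from report texts with
# a supervised text classifier (TF-IDF features + logistic regression).
# The tiny training set below is purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small manually labeled subset of reports (1 = fracture, 0 = no fracture)
train_reports = [
    "Cortical disruption of the distal radius consistent with fracture.",
    "No evidence of an acute fracture or dislocation.",
    "Transverse fracture of the scaphoid with minimal displacement.",
    "Bones are intact. No fracture line is seen.",
]
train_labels = [1, 0, 1, 0]

# Word bigrams help capture simple negations such as "no fracture"
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(),
)
model.fit(train_reports, train_labels)

# Apply the classifier to the large pool of unlabeled reports
new_reports = ["No acute fracture.", "Fracture of the ulnar styloid."]
print(model.predict(new_reports))        # predicted labels
print(model.predict_proba(new_reports))  # scores usable for thresholding
```

In a real project, the residual error rate of such a classifier should be quantified on a held-out, manually labeled sample before its labels are used for training image analysis algorithms.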
IMAGE DATA CURATION

There are 3 levels of detail for image labeling: whole image classification, object detection, and object segmentation (see Fig. 2).

The goal of whole image classification is to assign 1 class label per examination (eg, class label "fracture" assigned to a radiograph). If this label can be extracted from the corresponding radiology reports, no further labeling of image data is required. This allows for instant labeling of a huge number of examinations, as demonstrated by Annarumma et al.37

Object detection requires the assignment of 1 label per object (eg, lung tumors) and includes information on where an object is located. A method frequently used for object detection tasks is that of bounding boxes, that is, rectangular boxes with an assigned object class that contain an object of interest (Fig. 2B). They are also called "weak annotations" because there is no delineation of finding borders.41 Bounding boxes are provided in many open datasets for object recognition, for example, Open Images V4, with over 15 million boxes belonging to 600 everyday object classes such as "coffee cup,"42 and were also used in the field of radiology to detect pulmonary nodules on radiographs43 and colitis on abdominal CT scans.44 Usually, there is more than 1 object class. For example, if the aim is to predict TNM classes of lesions in patients with lung cancer, at least 4 (T) + 3 (N) + 1 (M) labels are needed. Before starting the annotation, the definition of meaningful, preferably mutually exclusive, labels is important. It should be kept in mind that AI algorithms need a sufficient number of examples for each category to be able to correctly map inputs to output classes.45 How many examples are needed depends on the distinctness of the categories and the quality of the training data: for example, creating an algorithm separating completely black from completely white images, corresponding to high distinctness and perfect data quality, would need only few examples. One can refer to the sample size used in previous studies with similar questions. The information on how many examples were eventually available for each category should always be reported.
FIGURE 2. Three levels of image labels, exemplified on the CT image of an adenocarcinoma of the lung: (A) whole image labeling with the label “tumor”
assigned to the whole image, (B) object detection with a light blue bounding box containing the tumor, and (C) object segmentation with tumor
borders delineated in light blue.
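A minimal sketch of how such weak annotations might be stored is given below; the JSON layout and field names are our own illustrative choice, not an established standard such as the formats used by the public datasets cited above.

```python
# Minimal sketch: storing bounding-box ("weak") annotations for object
# detection. The JSON layout and all field names are illustrative.
import json

annotations = [
    {
        "examination_id": "CT-000123",  # hypothetical identifier
        "slice_index": 57,              # axial slice containing the finding
        "label": "lung_tumor",          # object class
        # box as pixel coordinates: x/y of top-left corner, width, height
        "bbox": {"x": 212, "y": 148, "w": 34, "h": 29},
    },
    {
        "examination_id": "CT-000123",
        "slice_index": 58,
        "label": "lung_tumor",
        "bbox": {"x": 210, "y": 150, "w": 36, "h": 27},
    },
]

with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```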
For many medical imaging projects, be it for reasons of quantification (eg, volumes) or secondary analyses such as radiomics, the exact demarcation of objects is of interest. Therefore, full segmentation is needed. In semantic segmentation, every pixel (2-dimensional [2D]) or voxel (3-dimensional [3D]) of an image dataset is assigned to a class (eg, "lung tumor" and "background"). This results in a distinct definition of object boundaries. If the subclasses of each class are further distinguished (eg, tumor 1, tumor 2, etc), one speaks of instance segmentation.46 There is a range of segmentation methods from fully manual over semiautomated to fully automated segmentation. For manual approaches, substantial intrarater and interrater variabilities have been reported.47,48 It is therefore crucial to quantify that variability, preferably by using multiple annotators and indicating a measure of reliability such as the Dice score.49 The newest generation of fully automated segmentation algorithms is based on U-shaped CNNs50; however, traditional image processing techniques such as region growing51 and other threshold-based methods52 are also frequently applied. Another method is patch-wise segmentation, in which each pixel/voxel of a target image is labeled by comparing the patch with the pixel/voxel at its center with a database of manually labeled patches.53 Semiautomated approaches require minimal user interaction, for example, a single click to mark a lung tumor, after which segmentation is executed by an algorithm.
For the creation of segmentation masks, many open-source software programs are available, for example, 3D Slicer,54 ITK-SNAP,55 or MITK.56 There are 2 important general requirements for these tools: first, ease of use with multiple segmentation techniques at hand, as labeling/segmentation tasks are time-consuming, and second, easy export of labels/segmentation masks in formats that allow for interchangeability with secondary programs while maintaining orientation in space. The Neuroimaging Informatics Technology Initiative (NIfTI) format is widely used for segmentation masks. Although, in recent years, image processing tasks have become the domain of DCNNs, classification problems are also solved with other ML approaches such as random forests or support vector machines.
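The following minimal sketch shows such an export using the open-source nibabel package, one of several Python libraries that read and write the NIfTI format; the file names and the thresholding stand-in for a real segmentation are illustrative.

```python
# Minimal sketch: saving a binary segmentation mask in NIfTI format with
# nibabel, reusing the affine of the source volume so orientation in
# space is preserved. File names are hypothetical.
import nibabel as nib
import numpy as np

# Load the source volume to reuse its spatial metadata
source = nib.load("ct_volume.nii.gz")
volume = source.get_fdata()

# Illustrative mask: simple thresholding as a stand-in for a real manual
# or automated segmentation (1 = object, 0 = background)
mask = (volume > 300).astype(np.uint8)

# Reusing source.affine keeps the voxel-to-world mapping, so the mask
# stays aligned with the image in secondary programs
mask_img = nib.Nifti1Image(mask, affine=source.affine)
nib.save(mask_img, "ct_volume_mask.nii.gz")
```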
TRAINING, VALIDATION, AND TESTING

As in other areas of ML, it is important to differentiate between the training dataset, the validation dataset, and the testing dataset. On the training dataset, the model fit is performed, and the model "learns" to map input to output data. The validation dataset is used to repeatedly assess the model's performance during the training phase and to tune its hyperparameters. These are parameters set before training, such as the learning rate of a DCNN, that is, the pace of adaption during training.57 It is considered good practice to repeat this training-validation cycle multiple times using different subsets for training and validation to make use of all data and increase the robustness of the model, a procedure known as cross-validation. Finally, the model's predictive performance should be tested on the testing dataset. It is important that data contained in the testing dataset are not used for training or validation, as otherwise the derived performance measures would be too optimistic and the generalizability of the model would be negatively affected. The performance of the algorithm on both the validation and the test dataset should be reported. The performance on the test dataset is expected to be worse than on the validation dataset, as hyperparameters were optimized on the latter. The numerical relation of training, validation, and test datasets depends on the amount of data available and the classification accuracy (smaller N and higher accuracy: more data assigned to the training/validation sets). There is no fixed rule, but a ratio of 2/3 for training and 1/3 for validation and testing is common.58
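As a minimal sketch of this splitting scheme using scikit-learn,70 the following example holds out a test set, runs cross-validation on the remaining data, and only then reads the final performance off the test set; the feature matrix and labels are random placeholders.

```python
# Minimal sketch: train/validation/test handling with a held-out test set
# and cross-validation on the training portion. X and y are placeholders
# for image-derived features and examination-level labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))    # placeholder feature matrix
y = rng.integers(0, 2, size=300)  # placeholder binary labels

# Hold out a test set that is never touched during training or validation;
# here 1/3, in line with the common 2/3 : 1/3 ratio mentioned above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=42
)

# 5-fold cross-validation on the training data: each fold serves once as
# the validation set, so all training data are used for validation
model = RandomForestClassifier(random_state=42)
val_scores = cross_val_score(model, X_train, y_train, cv=5)
print("validation accuracy per fold:", val_scores)

# Only after model selection is final performance read off the test set
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```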
FIGURE 4. Simplified pipeline for the creation of an AI-based image analysis algorithm in radiology. After identification and download of report texts and image data from the local RIS/PACS archives, report texts are transferred to a curated database containing the information of interest. This can be achieved with manual labeling or automated techniques like the ones from NLP. Labeled image datasets are created by manual, semiautomatic, or automatic labeling. Then, a study dataset can be compiled and divided into training, validation, and testing datasets. Training and validation are sometimes run on the same dataset ("cross-validation"). The test dataset must not be used for training or validation, to obtain realistic performance measures. Finally, the AI-based image analysis algorithm is ready for use.

STATISTICAL EVALUATION

As mentioned previously, the outputs of AI algorithms for image analysis in radiology are (1) classes on an examination/image level (eg, radiograph contains fracture: yes/no), (2) detected objects (eg, lung tumor detected: yes/no), and (3) segmentation masks (eg, of a lung tumor). They require different evaluation methods.

1. The output of AI algorithms predicting a class of an examination is a score. It can be thought of as a measure of certainty that an input belongs to a target class, for example, that a radiograph shows a fracture. A threshold is then defined, and a class is attributed to each examination (eg, if the prediction score of a radiograph is greater than 0.5, then the label "fracture-yes" is given). In the common case of a binary classification task, the results can be displayed in a confusion matrix (see Fig. 3A) describing true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) findings. Frequently derived performance metrics are sensitivity and specificity. Sensitivity, commonly called recall in data science, is calculated as TP / (TP + FN); specificity is calculated as TN / (TN + FP). The interpretation of these measures depends on the actual use case: although one might accept a sensitivity of 0.8 in research, this sensitivity is unacceptable for a clinically deployed worklist prioritization tool flagging examinations that contain acute abdominal bleeding. A related measure is the number of FP findings per case (FPF/c). It is valuable for assessing the clinical relevance of an algorithm: a high sensitivity is clinically useless when the FPF/c is too high, as many FP cases obstruct clinical workflows and lower acceptance among radiologists. There are many examples of algorithms with FPF/c values of up to 40 that would clearly fail in a clinical context.59–61
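A minimal sketch of this evaluation step is given below: prediction scores are thresholded at 0.5, and sensitivity and specificity are derived from the confusion matrix with scikit-learn70; the scores and ground truth labels are placeholders, and the FP-per-case line is a simplified stand-in for the detection-task definition of FPF/c.

```python
# Minimal sketch: thresholding prediction scores and deriving sensitivity
# and specificity from the confusion matrix. Scores and labels are
# placeholders for real algorithm output and ground truth.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # ground truth
scores = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3])  # model scores

y_pred = (scores > 0.5).astype(int)  # threshold of 0.5, as in the text

# For binary labels 0/1, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)    # TP / (TP + FN), ie, recall
specificity = tn / (tn + fp)    # TN / (TN + FP)
fp_per_case = fp / len(y_true)  # simplified stand-in for FPF/c

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"FP per case={fp_per_case:.2f}")
```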
First, image data in radiology are less reproducible compared with most nonmedical imaging data. Heterogeneity is introduced by multiple hardware vendors and unstandardized scanner parameters that differ significantly from one radiology department to another and even between scanners in the same institution.72 Imaging artifacts further reduce reproducibility.

Second, the output categories are not as distinct as in nonmedical domains: a human reader has no difficulty differentiating a boat from a cat; in radiology, however, there is relevant interobserver variability.73 In many cases, it is even impossible to make a definite call (eg, differentiating a lymph node metastasis from an inflammatory lymph node in a positron emission tomography/CT without a histology report). This translates to a situation where the compilation of high-quality ground truth datasets in radiology is very challenging. Furthermore, it is demanding to create a sufficient amount of high-quality ground truth data in radiology due to the wide range of entities encountered in radiological practice and the enormous time and expertise needed for label creation.

Third, as a result of the above, datasets in radiology AI projects are smaller by orders of magnitude than datasets in nonmedical domains, for example, the ImageNet challenge with 1.4 million labels. There are potential remedies: data augmentation, compilation of multicenter public datasets, and crowd-based approaches. Data augmentation is a common practice in radiology AI projects. It means that the original dataset is augmented with transformed variants of the original images (eg, by scale transformations or rotations).74 It is essential to specify if or to what extent augmented data were used and which method was used for their generation. The compilation of multicenter public datasets is promising, as the huge cost of compiling datasets in the medical domain can be shared and generalizability is increased. Important examples are the LIDC-IDRI database for lung nodules containing 7371 marked pulmonary lesions on CTs75 and the ChestX-ray8 dataset provided by the National Institutes of Health containing more than 100,000 frontal-view radiographs with 8 disease labels.76 Crowd-labeling approaches have so far only rarely been implemented in the medical imaging domain, for example, for the annotation of lung cancer77 and the annotation of mitotic activity on histologic images of breast cancer.78
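As a hedged illustration of the augmentation practice described above, the following sketch augments a placeholder 2D image with rotated and rescaled variants using SciPy; the angles and zoom factors are arbitrary choices, not recommendations.

```python
# Minimal sketch: augmenting an image dataset with rotated and scaled
# variants of the original image. Angles and zoom factors are arbitrary.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.normal(size=(256, 256))  # placeholder for a real 2D image

augmented = [image]
for angle in (-10, -5, 5, 10):
    # In-plane rotation; reshape=False keeps the original matrix size
    augmented.append(ndimage.rotate(image, angle, reshape=False, mode="nearest"))
for factor in (0.9, 1.1):
    zoomed = ndimage.zoom(image, factor)
    # Crop or pad back to 256 x 256 so all variants share one shape
    if factor > 1:
        offset = (zoomed.shape[0] - 256) // 2
        zoomed = zoomed[offset:offset + 256, offset:offset + 256]
    else:
        pad = (256 - zoomed.shape[0]) // 2
        zoomed = np.pad(zoomed, ((pad, 256 - zoomed.shape[0] - pad),) * 2)
    augmented.append(zoomed)

print(len(augmented), "images after augmentation")  # 1 original + 6 variants
```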
A fourth peculiarity of image data in radiology is that they are often in 3 dimensions (eg, CT, positron emission tomography/CT, magnetic resonance imaging) or even 4 dimensions (eg, dynamic cardiac imaging). This is especially relevant for DCNNs initially designed for the processing of nonmedical 2D image data. Compatibility is often reached by greatly lowering the resolution (standard matrix: 256 × 256) and treating 3D datasets as a series of subsequent 2D images, thereby losing information.

Fifth, there are regional differences in the incidence of findings. Although tuberculosis is a frequent finding in India, its incidence is negligible in most high-income countries.79 An algorithm trained on data from one region might therefore have serious trouble classifying medical data from another region. A possible remedy is fine-tuning or retraining of algorithms on regional data.

THE FUTURE OF THE DATA BASIS OF AI PROJECTS IN RADIOLOGY

As of today, data acquisition in radiology departments, and in hospitals in general, is not AI ready. Data are stored in fragmented, mutually noncompatible IT systems. Radiology reports consist of unstructured or semistructured texts. Therefore, the pipeline of an AI project in radiology requires auxiliary steps such as retrieval and reorganization of data and text mining. However, auxiliary steps add some imprecision. As data are the fuel of AI projects, we should strive for more sophisticated ways of acquiring and storing medical data to foster accessibility and data quality. Efforts toward structured reporting in recent years go in that direction.80,81 At the end of this process, we should have tools at hand that convert the information provided by radiologists into structured reports (via a report engine) and at the same time fill a databank. This would render many auxiliary tools such as NLP unnecessary and allow for an instant analysis of clean data. From the hospital's perspective, this would enable big data cross-departmental projects that are not feasible today. Regarding image analysis, we should push for prospective labeling of image data instead of retrospective labeling. During the normal reading process, the radiologist identifies and even measures many pathologic findings. At the moment, this costly label information is subsequently lost. To change this, we need to modify current PACS software to (a) ensure data accessibility by saving measurements and labels in an interchangeable format and (b) add tools that allow for quick annotation, for example, based on semiautomated object segmentation. When the right infrastructure is set up, it will quickly accumulate structured data, given the high number of examinations performed in radiology departments around the world.

These data will then serve as the foundation for a new generation of algorithms that enhance radiology reports with quantitative measurements, help radiologists not to miss lesions, and prioritize examinations with critical findings in worklists, thereby improving the quality provided by radiology departments.

CONCLUSIONS

In this article, we reviewed the current technology stack for AI projects in radiology. Figure 4 summarizes the steps that are required for a successful AI project within a radiology department. We also created awareness for the challenges and their potential solutions, and provided a short outlook on future perspectives. Thereby, we hope to encourage radiologists to launch their own AI image analysis projects and to help enable them to make an objective appraisal of articles on AI-based software in radiology.

REFERENCES
1. Röntgen WC. Über eine neue Art von Strahlen. Sitzungsberichte der Physikalisch-Medizinischen Gesellschaft zu Würzburg. 1895:2–16.
2. Buchanan BG. A (very) brief history of artificial intelligence. AI Mag. 2005;26:53–60.
3. McCorduck P. Machines Who Think. Wellesley, MA: A K Peters; 2004.
4. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349:255–260.
5. Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. Radiographics. 2017;37:2113–2131.
6. Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet. 2018;392:2388–2396.
7. Santin M, Brama C, Théro H, et al. Detecting abnormal thyroid cartilages on CT using deep learning. Diagn Interv Imaging. 2019;100:251–257.
8. Winkel DJ, Heye T, Weikert TJ, et al. Evaluation of an AI-based detection software for acute findings in abdominal computed tomography scans: toward an automated work list prioritization of routine CT examinations. Invest Radiol. 2019;54:55–59.
9. Mannil M, von Spiczak J, Manka R, et al. Texture analysis and machine learning for detecting myocardial infarction in noncontrast low-dose computed tomography. Invest Radiol. 2018;53:338–343.
10. Kim Y, Lee KJ, Sunwoo L, et al. Deep learning in diagnosis of maxillary sinusitis using conventional radiography. Invest Radiol. 2019;54:7–15.
11. Zhang N, Yang G, Gao Z, et al. Deep learning for diagnosis of chronic myocardial infarction on nonenhanced cardiac cine MRI. Radiology. 2019;291:606–617.
12. Zheng Y, Ai D, Mu J, et al. Automatic liver segmentation based on appearance and context information. Biomed Eng Online. 2017;16:16.
13. Zhu W, Huang Y, Zeng L, et al. AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med Phys. 2019;46:576–589.
14. Couteaux V, Si-Mohamed S, Renard-Penna R, et al. Kidney cortex segmentation in 2D CT with U-Nets ensemble aggregation. Diagn Interv Imaging. 2019;100:211–217.
15. Perkuhn M, Stavrinou P, Thiele F, et al. Clinical evaluation of a multiparametric deep learning model for glioblastoma segmentation using heterogeneous magnetic resonance imaging data from clinical routine. Invest Radiol. 2018;53:1.
16. Lin L, Dou Q, Jin YM, et al. Deep learning for automated contouring of primary tumor volumes by MRI for nasopharyngeal carcinoma. Radiology. 2019;291:677–686.
17. Dreizin D, Zhou Y, Zhang Y, et al. Performance of a deep learning algorithm for automated segmentation and quantification of traumatic pelvic hematomas on CT. J Digit Imaging. 2019.
18. Nishio M, Sugiyama O, Yakami M, et al. Computer-aided diagnosis of lung nodule classification between benign nodule, primary lung cancer, and metastatic lung cancer at different image size using deep convolutional neural network with transfer learning. PLoS One. 2018;13:e0200721.
19. Dalmış MU, Gubern-Mérida A, Vreemann S, et al. Artificial intelligence-based classification of breast lesions imaged with a multiparametric breast MRI protocol with ultrafast DCE-MRI, T2, and DWI. Invest Radiol. 2019;54:325–332.
20. Nakagawa M, Nakaura T, Namimoto T, et al. Machine learning based on multiparametric magnetic resonance imaging to differentiate glioblastoma multiforme from primary cerebral nervous system lymphoma. Eur J Radiol. 2018;108:147–154.
21. Dunnmon JA, Yi D, Langlotz CP, et al. Assessment of convolutional neural networks for automated classification of chest radiographs. Radiology. 2019;290:537–544.
22. Sun R, Limkin EJ, Vakalopoulou M, et al. A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol. 2018;19:1180–1191.
23. Meng Y, Zhang Y, Dong D, et al. Novel radiomic signature as a prognostic biomarker for locally advanced rectal cancer. J Magn Reson Imaging. 2018;48:605–614.
24. Buizza G, Toma-Dasu I, Lazzeroni M, et al. Early tumor response prediction for lung cancer patients using novel longitudinal pattern features from sequential PET/CT image scans. Phys Med. 2018;54:21–29.
25. Kohli M, Prevedello LM, Filice RW, et al. Implementing machine learning in radiology practice and research. Am J Roentgenol. 2017;208:754–760.
26. NEMA. PS3.1 DICOM PS3.1 2019a–Introduction and Overview. 2019. Available at: http://dicom.nema.org/medical/dicom/current/output/pdf/part01.pdf. Accessed April 16, 2019.
27. Hussein S, Kandel P, Bolan CW, et al. Lung and pancreatic tumor characterization in the deep learning era: novel supervised and unsupervised learning approaches. IEEE Trans Med Imaging. 2019;1–11.
28. Lee CC, Yang HC, Lin CJ, et al. Intervening nidal brain parenchyma and risk of radiation-induced changes after radiosurgery for brain arteriovenous malformation: a study using an unsupervised machine learning algorithm. World Neurosurg. 2019.
29. Li H, Galperin-Aizenberg M, Pryma D, et al. Unsupervised machine learning of radiomic features for predicting treatment response and overall survival of early stage non-small cell lung cancer patients treated with stereotactic body radiation therapy. Radiother Oncol. 2018;129:218–226.
30. Larson DB. Strategies for implementing a standardized structured radiology reporting program. Radiographics. 2018;38:1705–1716.
31. Shea LAG, Towbin AJ. The state of structured reporting: the nuance of standardized language. Pediatr Radiol. 2019;49:500–508.
32. Herts BR, Gandhi NS, Schneider E, et al. How we do it: creating consistent structure and content in abdominal radiology report templates. Am J Roentgenol. 2019;212:490–496.
33. Brown AD, Kachura JR. Natural language processing of radiology reports in patients with hepatocellular carcinoma to predict radiology resource utilization. J Am Coll Radiol. 2019;16:840–844.
34. Chen MC, Ball RL, Yang L, et al. Deep learning to classify radiology free-text reports. Radiology. 2018;286:845–852.
35. Li AY, Elliot N. Natural language processing to identify ureteric stones in radiology reports. J Med Imaging Radiat Oncol. 2019;1754-9485.12861.
36. Pons E, Braun LM, Hunink MG, et al. Natural language processing in radiology: a systematic review. Radiology. 2016;279:329–343.
37. Annarumma M, Withey SJ, Bakewell RJ, et al. Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology. 2019;291:272.
38. Chapman BE, Lee S, Kang HP, et al. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. 2011;44:728–737.
39. Yu S, Cai T. A short introduction to NILE. Available at: https://arxiv.org/pdf/1311.6063.pdf. Accessed May 3, 2019.
40. Iroju OG, Olaleke JO. A systematic review of natural language processing in healthcare. Int J Inf Technol Comput Sci. 2015;8:44–50.
41. Rajchl M, Koch LM, Ledig C, et al. Employing weak annotations for medical image analysis problems. Available at: http://labelme.csail.mit.edu/. Accessed April 29, 2019.
42. Kuznetsova A, Rom H, Alldrin N, et al. The Open Images Dataset V4: unified image classification, object detection, and visual relationship detection at scale. 2018. Available at: http://arxiv.org/abs/1811.00982. Accessed April 29, 2019.
43. Pesce E, Withey S, Ypsilantis PP, et al. Learning to detect chest radiographs containing pulmonary lesions using visual attention networks. 2019. Available at: https://arxiv.org/pdf/1712.00996.pdf. Accessed April 29, 2019.
44. Wang S, Zhou M, Liu ZZ, et al. Central focused convolutional neural networks: developing a data-driven model for lung nodule segmentation. Med Image Anal. 2017;40:172–183.
45. Figueroa RL, Zeng-Treitler Q, Kandula S, et al. Predicting sample size required for classification performance. BMC Med Inform Decis Mak. 2012;12:8.
46. Romera-Paredes B, Torr PHS. Recurrent instance segmentation. Available at: https://arxiv.org/pdf/1511.08250.pdf. Accessed May 9, 2019.
47. Yu HJ, Chang A, Fukuda Y, et al. Comparative study of intra-operator variability in manual and semi-automatic segmentation of knee cartilage. Osteoarthr Cartil. 2016;24:S296–S297.
48. Saha A, Grimm LJ, Harowicz M, et al. Interobserver variability in identification of breast tumors in MRI and its implications for prognostic biomarkers and radiogenomics. Med Phys. 2016;43(8 Part 1):4558–4564.
49. Taha AA, Hanbury A. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging. 2015;15:29.
50. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. 2015. Available at: https://arxiv.org/pdf/1505.04597.pdf. Accessed June 20, 2019.
51. Adams R, Bischof L. Seeded region growing. 1994. Available at: https://pdfs.semanticscholar.org/db44/31b2a552d0f3d250df38b2c60959f404536f.pdf. Accessed May 9, 2019.
52. Al-amri SS, Kalyankar NV, Khamitkar SD. Image segmentation by using threshold techniques. 2010. Available at: http://arxiv.org/abs/1005.4020. Accessed May 9, 2019.
53. Mechrez R, Goldberger J, Greenspan H. Patch-based segmentation with spatial consistency: application to MS lesions in brain MRI. Int J Biomed Imaging. 2016;2016:1–13.
54. Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging. 2012;30:1323–1341.
55. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006;31:1116–1128.
56. Goch C, Metzger J, Nolden M. Tutorial: medical image processing with MITK introduction and new developments. In: Maier-Hein KH, Deserno TM, Handels H, et al, eds. Bildverarbeitung für die Medizin 2017. Informatik aktuell. Berlin, Heidelberg, Germany: Springer Vieweg; 2017.
57. Probst P, Boulesteix AL, Bischl B. Tunability: importance of hyperparameters of machine learning algorithms. 2018. Available at: https://arxiv.org/pdf/1802.09596.pdf. Accessed May 14, 2019.
58. Dobbin KK, Simon RM. Optimally splitting cases for training and testing high dimensional classifiers. BMC Med Genomics. 2011;4:31.
59. Özkan H, Osman O, Şahin S, et al. A novel method for pulmonary embolism detection in CTA images. Comput Methods Programs Biomed. 2014;113:757–766.
60. Zhou C, Chan HP, Sahiner B, et al. Computer-aided detection of pulmonary embolism in computed tomographic pulmonary angiography (CTPA): performance evaluation with independent data sets. Med Phys. 2009;36:3385–3396.
61. Liang J, Bi J. Computer aided detection of pulmonary embolism with tobogganing and multiple instance classification in CT pulmonary angiography. Inf Process Med Imaging. 2007;20:630–641.
62. Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. Hoboken, NJ: Wiley; 2013.
63. Halligan S, Altman DG, Mallett S. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach. Eur Radiol. 2015;25:932.
64. Vanderlooy S, Hüllermeier E. A critical analysis of variants of the AUC. Mach Learn. 2008;72:247–262.
65. Hossin M, Sulaiman MN. A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manag Process. 2015;5.
66. Everson RM, Fieldsend JE. Multi-class ROC analysis from a multi-objective optimisation perspective. 2013. Available at: http://hdl.handle.net/10871/15243. Accessed May 6, 2019.
67. Landgrebe TC, Duin RP. Approximating the multiclass ROC by pairwise analysis. 2007. Available at: www.elsevier.com/locate/patrec. Accessed May 6, 2019.
68. Hand DJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn. 2001;45:171–186.
69. Kluyver T, Ragan-Kelley B, Pérez F, et al. Jupyter Notebooks: a publishing format for reproducible computational workflows. 2016. Available at: https://nbviewer.jupyter.org/. Accessed April 17, 2019.
70. Buitinck L, Louppe G, Blondel M, et al. API design for machine learning software: experiences from the scikit-learn project. 2013. Available at: http://arxiv.org/abs/1309.0238. Accessed April 17, 2019.
71. Dettmers T. A full hardware guide to deep learning. Available at: https://timdettmers.com/2018/12/16/deep-learning-hardware-guide/. Accessed April 29, 2019.
72. Lubner MG, Smith AD, Sandrasegaran K, et al. CT texture analysis: definitions, applications, biologic correlates, and challenges. Radiographics. 2017;37:1483–1503.
73. Ambinder EB, Mullen LA, Falomo E, et al. Variability in individual radiologist BI-RADS 3 usage at a large academic center: what's the cause and what should we do about it? Acad Radiol. 2019;26:915–922.
74. Roth HR, Lu L, Liu J, et al. Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE Trans Med Imaging. 2016;35:1170–1181.
75. Armato SG, McLennan G, Bidaut L, et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys. 2011;38:915–931.
76. Wang X, Peng Y, Lu L, et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. 2017. Available at: http://arxiv.org/abs/1705.02315. Accessed November 30, 2018.
77. Kalpathy-Cramer J, Beers A, Mamonov A, et al. Crowds Cure Cancer: data collected at the RSNA 2017 annual meeting. The Cancer Imaging Archive. Available at: https://wiki.cancerimagingarchive.net/display/DOI/Crowds+Cure+Cancer%3A+Data+collected+at+the+RSNA+2017+annual+meeting. Accessed July 20, 2019.
78. Albarqouni S, Baur C, Achilles F, et al. AggNet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans Med Imaging. 2016;35:1313–1321.
79. World Health Organization. Global tuberculosis report 2018. Available at: https://www.who.int/tb/publications/global_report/en/. Accessed April 17, 2019.
80. Kahn CE, Langlotz CP, Burnside ES, et al. Toward best practices in radiology reporting. Radiology. 2009;252:852–856.
81. European Society of Radiology (ESR). ESR paper on structured reporting in radiology. Insights Imaging. 2018;9:1–7.