Deep Learning For Cardiovascular Medicine: A Practical Primer
Received 29 September 2018; revised 2 November 2018; editorial decision 10 January 2019; accepted 22 January 2019
Deep learning (DL) is a branch of machine learning (ML) showing increasing promise in medicine, to assist in data classification, novel disease phenotyping and complex decision making. Deep learning is a form of ML typically implemented via multi-layered neural networks. Deep learning has been accelerated by recent advances in computer hardware and algorithms and is increasingly applied in e-commerce, finance, and voice and image recognition to learn and classify complex datasets. The current medical literature shows both strengths and limitations of DL. Strengths of DL include its ability to automate medical image interpretation, enhance clinical decision-making, identify novel phenotypes, and select better treatment pathways in complex diseases. Deep learning may be well-suited to cardiovascular medicine, in which haemodynamic and electrophysiological indices are increasingly captured on a continuous basis by wearable devices, and to tasks such as image segmentation in cardiac imaging. However, DL also has significant weaknesses including difficulties in interpreting its models (the ‘black-box’ criticism), its need for extensive adjudicated (‘labelled’) data in training, lack of standardization in design, lack of data-efficiency in training, limited applicability to clinical trials, and other factors. Thus, the optimal clinical application of DL requires careful formulation of solvable problems, selection of the most appropriate DL algorithms and data, and balanced interpretation of results. This review synthesizes the current state of DL for cardiovascular clinicians and investigators, and provides technical context to appreciate the promise, pitfalls, near-term challenges, and opportunities for this exciting new area.
Keywords Big data • Artificial intelligence • Deep learning • Cardiovascular medicine • Precision medicine
identifies patterns in large amounts of data that are typically annotated (‘labelled’) by humans, such as the presence or absence of reduced systolic function on an echocardiogram or atrial fibrillation (AF) on an electrocardiogram (ECG). Supervised learning may be implemented using neural networks, long used for medical pattern recognition in cardiology,11,12 neuroscience,13–15 and other fields yet still limited in clinical use. Unsupervised learning, on the other hand, analyses large amounts of unlabelled data to identify hidden patterns or natural structure in data,16 which greatly increases the volume of data that can be analysed (e.g. from large electronic medical records) at the potential cost of data quality and interpretability. Reinforcement learning trains software to make decisions that maximize a ‘reward’ function,17 which may address a clinical problem (e.g. improve ST-segment elevation myocardial infarction outcomes, or reduce error in ECG diagnosis).

Deep learning is a specific type of ML inspired by the way that the human brain processes data, and enabled by hardware advances such as graphics processing units (GPUs),16,18,19 vast catalogues of labelled data, and advances in computer science theory. To date, most implementations in medicine have used convolutional neural networks (CNNs). Following the ‘AI winter’ from the 1980s, when early rule-based and neural network applications were limited by hardware and algorithmic constraints, DL has accelerated supervised, unsupervised, and reinforcement learning. In 2016, DeepMind’s AlphaGo Zero4,5 beat the world champion in the ancient Chinese game of Go, and deep Q-learning6 proved as accurate as a professional human player in 49 interactive video games. In 2018, DL applications by DeepMind rivalled humans in the 3D multiplayer videogame Quake III Arena.7 Figure 3 provides a schematic of how neural networks, a common basic architecture for DL, could be used to classify an ECG.

Implementing deep learning in cardiovascular applications

Hardware and software considerations

Historically, ML was computationally expensive and performed by scientists using supercomputers or high-end workstations with multi-core processors. However, due to the highly parallelizable nature of DL algorithms, GPUs designed for gaming have enabled DL to be performed on desktop machines. Although professional-quality GPUs are still relatively expensive, DL can be performed using cloud services such as Amazon AWS or Google Cloud. Deep learning software packages are almost uniformly open-source, which means they are freely available with few constraints for academic research. Furthermore, many pre-trained DL models can be downloaded and repurposed for new tasks, including AlexNet,20 VGG Network,21 InceptionNet,22 and ResNet,23 which are described further in following sections. In fact, downloading pre-trained models and repurposing them for new datasets avoids many of the time-consuming and computationally expensive steps of DL. Table 1 demonstrates step-by-step an example DL process in cardiovascular medicine.

Table 1 A guide to approaching deep learning applications in cardiovascular medicine research and clinical practice
AI, artificial intelligence; DL, deep learning; EHR, electronic health record; GPU, graphics processing unit; HFpEF, heart failure with preserved ejection fraction; NGS, next-generation sequencing; NPV, negative predictive value; PAD, peripheral arterial disease; PCSK9, proprotein convertase subtilisin/kexin type 9; PPV, positive predictive value; ROC curve, receiver-operating characteristic curve.

Selecting a deep learning software package and modelling strategy

The first practical step for DL is to choose an appropriate software package to work with, such as Keras, TensorFlow, or others. Keras is often used as a starting point, as it can be used in a relatively straightforward fashion with high-level programming languages, most commonly Python. Supplementary material online, Table S1 summarizes some platforms and related programming languages for DL. Second, one should choose a DL model appropriate for the problem under consideration, which may be pre-existing (pre-trained models) or require development of a custom model (novel models). In general, owing to time and computational constraints, pre-trained models are mostly used first, and then tailored to investigators’ problems or datasets by modifying outcome-related layers [i.e. the last few layers or the final (softmax) layer] to optimize their results. This concept is called transfer learning.24 For example, if a pre-trained model was trained to accurately predict diastolic dysfunction using global longitudinal strain in HFrEF patients in a large database, we might be able to use the resultant knowledge to predict diastolic dysfunction in HFpEF patients. There are several pre-trained models such as AlexNet,20 which recognizes visual patterns directly from pixel

intrinsicoid deflection.26 Since a key advantage of DL is to learn complex features, overly complex preprocessing of data into features may also impede performance. One general goal of DL may be summarized as letting the algorithm automatically perform feature selec-
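The transfer learning idea described in this section (keep the layers of a pre-trained model and retrain only the final softmax layer on the new dataset) can be illustrated with a deliberately small, framework-free sketch. This is not the authors' pipeline: the frozen weights, the toy two-feature dataset, and the names `frozen_features`, `train_new_head`, and `predict` are all hypothetical, and a real project would more likely load a published network through a package such as Keras.

```python
import math

# Hypothetical "pre-trained" layer: fixed weights standing in for features
# learned on a large source dataset. They are frozen (never updated below).
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def frozen_features(x):
    """Apply the frozen layer followed by a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_new_head(data, n_classes, epochs=200, lr=0.1):
    """Train only a new final softmax layer on top of the frozen features."""
    n_feat = len(FROZEN_WEIGHTS)
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in data:
            h = frozen_features(x)
            scores = [sum(w * hi for w, hi in zip(W[k], h)) + b[k] for k in range(n_classes)]
            p = softmax(scores)
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                b[k] -= lr * g
                for j in range(n_feat):
                    W[k][j] -= lr * g * h[j]
    return W, b


def predict(x, W, b):
    """Return the class with the highest score from the retrained head."""
    h = frozen_features(x)
    scores = [sum(w * hi for w, hi in zip(W[k], h)) + b[k] for k in range(len(W))]
    return scores.index(max(scores))


# Toy target dataset (illustrative only): class 0 if the first input exceeds the second.
target_data = [([2.0, 1.0], 0), ([3.0, 0.0], 0), ([1.0, 0.2], 0),
               ([0.0, 1.0], 1), ([1.0, 3.0], 1), ([0.5, 2.0], 1)]

W, b = train_new_head(target_data, n_classes=2)
print([predict(x, W, b) for x, _ in target_data])
```

Only `W` and `b` (the new head) change during training; the frozen layer is reused untouched, which is why this approach needs far less data and computation than training a full network from scratch.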
clinical indices as supported by others.58 In a novel study applying DL to retinal fundus photographs trained on 284 335 patients, DL predicted MACE with c-statistic 0.70 in two validation populations (12 026 and 999 patients).59 Several reports have used DL to define structural heart disease from echocardiography and MR imaging (Table 2). Deep learning can diagnose structural disease from limited echocardiographic views.44 DeepVentricle,71 a DL application for cardiac MRI images, received FDA clearance for clinical use.72 Bai et al.62 applied DL to automatically segment 93 500 labelled MR images in 4875 subjects from the UK Biobank with similar accuracy to experts for segmenting left and right ventricles on short-axis images and left and right atrium on long-axis images.
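Several of the segmentation studies summarized in Table 2 report agreement with expert contours as the Dice metric. As a minimal sketch of how that overlap score is computed, assuming binary masks stored as nested lists (the function name `dice_coefficient` and the tiny 4 x 4 masks are illustrative, not taken from any cited study):

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical expert and model masks of a left-ventricular cavity (1 = cavity pixel).
expert_mask = [[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]]
model_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]]

print(round(dice_coefficient(model_mask, expert_mask), 3))  # 5 shared pixels of 6 + 6 -> 0.833
```

A Dice value of 1 indicates identical masks and 0 indicates no overlap, so it rewards overlap but says nothing about how far apart the boundaries lie, which is why the studies in Table 2 often pair it with surface-distance measures.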
Table 2 Deep learning applications in cardiovascular medicine

First author | Disease application | Images/slides (N) | Type of images | Results | Algorithms
Arrhythmias
Bumgarner et al.60 | AF detection | 169 records | A rhythm strip from Kardia mobile phone | 93% sensitivity and 84% specificity for AF detection | Kardia Band automated algorithm
Pyakillya et al.50 | ECG classification | 8528 records | ECG records | 86% accuracy for classification results | Deep CNN
Hannun et al.49 | Arrhythmias detection (i.e. VT, Mobitz I, Mobitz II) | 64 121 ECG records | ECG records | The model outperforms the average cardiologist performance on most arrhythmias (measured by sequence level accuracy and set level accuracy) | Deep CNN
Tison et al.8 | AF detection | 9000 ECGs | ECG from smartphone | On recumbent ECG, c-statistic 0.97; on ambulatory ECG, c-statistic 0.72 | Deep CNN
Xia et al.48 | AF detection | 123 848 samples | ECG signal (STFT and SWT) | 98.29% accuracy for STFT and 98.63% accuracy for SWT | Deep CNN
Cardiac MRI
Avendi et al.63 | Image segmentation (LV shape detection) | 45 MRI datasets | Cardiac MRI images | 90% accuracy (in terms of the Dice metric) | Deep CNN and stacked autoencoders
Bai et al.62 | Image segmentation (LV and RV) | 93 500 images | Cardiac MRI images | Reported as Dice metric, mean surface distance, and Hausdorff surface distance | Deep CNN (a fully CNN)
Luo et al.64 | Image segmentation (LV volumes) | 1140 subjects | Cardiac MRI images | High accuracy in LV volumes prediction (measured accuracy by root mean squared error) | Deep CNN
Oktay et al.65 | Image quality (i.e. LV segmentation, motion tracking) | 1233 healthy adult subjects | Cardiac MRI images | Directly compare image quality and segmentation results (i.e. LV cavity volume differences or surface-to-surface distances) | Deep CNN (super resolution approaches)
Oktay et al.47 | Image segmentation (cardiac pathologies) | 1200 images | Cardiac MRI images | Reported as Dice metric, mean surface distance, and Hausdorff surface distance | Deep CNN
Echocardiography
Dong et al.66 | LV segmentation | 60 subjects | Echocardiography images | Reported as Dice metric, mean surface distance, and Hausdorff surface distance | Deep GAN (VoxelAtlasGAN)
Gao et al.67 | 8 echo views classification | 432 video images | Echocardiography images | 92.1% accuracy for classification results | Deep CNN
Knackstedt et al.68 | Assessment of LV volumes and EF | 432 video images | Echocardiography images | 92.1% accuracy for classification results | Deep CNN
Luong et al.43 | Assessment of echo image quality | 6916 images | Echocardiography images | The average absolute error of the model compared with manual feedback scoring was 0.68 ± 0.58 | Deep CNN
Madani et al.44 | 15 echo views recognition | 223 787 images | Echocardiography images | 91.7% accuracy for classification results (F-score 0.904 ± SD 0.058) | Deep CNN
Heart failure
Nirschl et al.69 | Clinical heart failure classification | 209 patients | H&E stained whole-slide images | >93% accuracy for both training and testing datasets vs. 75% accuracy for pathologists | Deep CNN
Seah et al.70 | Prediction of CHF | 103 489 images | Chest X-rays | Model achieved an AUC of 0.82 (at a cut-off BNP of 100 ng/L) | Deep GAN
Islam et al.105 | Pulmonary oedema detection | 7284 images | Chest X-rays | The same architecture does not perform well across all abnormalities | Deep CNN
Myocardial perfusion imaging
Betancur et al.56 | Prediction of obstructive CAD | 1638 patients | Fast SPECT MPI | AUC for disease prediction by deep learning was higher than for total perfusion deficit | Deep CNN

AF, atrial fibrillation; AUC, the area under the receiver operating characteristic curve; CAD, coronary artery disease; CHF, congestive heart failure; CNN, convolutional neural network; ECG, electrocardiogram; EF, ejection fraction; GAN, generative adversarial network; HCM, hypertrophic cardiomyopathy; LV, left ventricle; MPI, myocardial perfusion imaging; MRI, magnetic resonance imaging; RV, right ventricle; SPECT, single-photon emission computerized tomography;
Deep learning for outcomes prediction in heart failure

Several studies have applied ML to predict outcomes in heart failure (HF).41,73,74 Choi et al.75 applied neural networks to detect new onset HF from electronic health records in 3884 patients who developed incident HF and 28 903 who did not, linking time-stamped events (disease diagnosis, medication and procedure orders). Networks provided a c-statistic for incident HF of 0.78 (12-month observation) and 0.88 (18-month observation), both significantly higher than the best baseline method. Medved et al.76 compared the International Heart Transplantation Survival Algorithm (IHTSA), developed using DL training, with the Index for Mortality Prediction After Cardiac Transplantation (IMPACT), in transplant recipients from 1997 to 2011 from the UNOS registry. In 27 705 patients, using those before 2009 for training and those from 2009 for validation, DL provided a c-statistic for 1-year survival of 0.654 for IHTSA, which reduced error by 1 compared with the IMPACT model (c-statistic 0.608). These results, while modest, show promise for DL beyond current clinical indices. Future studies may apply DL to multi-variable data (e.g. histopathology, echo, ECG, labs, multi-omics, wearable technology) to study HF outcomes. The recent BIOSTAT-CHF trial, a large registry for risk prediction for HF in 11 European countries, has multi-level data that could be used to reclassify HF patients.77

Deep learning for arrhythmia detection and phenotyping

Several studies have used DL to diagnose AF from the ECG. Tison et al.8 trained a DL network on 9750 ambulatory smartwatch ECGs, then applied it to 12-lead ECGs. The network performed well in 51 recumbent patients before cardioversion (c-statistic 0.97 vs. 0.91 for current ECG algorithms), but less well in a cohort with ambulatory ECGs (c-statistic 0.72, sensitivity 67.7% and specificity 67.6%). A separate study used AI to diagnose AF from electrical ECG sensors in a smartphone case (Kardia), or a watch-strap which communicates via Bluetooth to a smartphone (Kardia Band).60 In 100 patients with 169 simultaneous wearable and traditional ECGs, 57 recordings were uninterpretable. For interpretable ECGs, the device identified AF with a κ coefficient of 0.77 (sensitivity 93%, specificity 84%), although physician interpretation improved results further.60 Thus, while these data are promising, further advances in analytic algorithms and sensor technology are needed for automatic use. Emerging sensors beyond optical sensors (photoplethysmography) in the Apple Watch8 include changes in facial reflectance to visible light,78 bioimpedance in weighing scales79 and others. The accuracy of each sensor needs validation since, in recent comparisons against gold standards, wearable sensors had acceptable accuracy for resting heart rate yet not for ambulatory exercise heart rates nor energy expenditure.80 Current ESC guidelines provide a Class I recommendation for the opportunistic screening for silent AF in patients >65 years of age by pulse or ECG rhythm strip, based on evidence on cost effectiveness.81 Therefore, integrating DL into wearable technology for intermittent screening for silent AF may be cost effective by preventing sequelae such as stroke. Finally, ML shows promise in identifying novel arrhythmia pheno-

has several limitations. Problems are that the c-index is only a measure of discrimination, ignores calibration indices and has no single universally accepted c-statistic cut-off or range of acceptable c-statistics.92,93 This and other limitations can be addressed by calibra-
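The point that the c-statistic measures discrimination only, and is blind to calibration, can be made concrete with a small worked sketch. The code below computes the c-statistic as the proportion of (event, non-event) pairs ranked correctly, then shows that halving every predicted risk leaves it unchanged. The Brier score is used here only as one convenient calibration-sensitive measure and is an assumption of this example rather than a method named in the text; the eight predicted risks are invented for illustration.

```python
def c_statistic(scores, outcomes):
    """Fraction of (event, non-event) pairs in which the event case received
    the higher predicted risk; ties count as one half."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1.0 if e > n else 0.5 if e == n else 0.0
                     for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))


def brier_score(scores, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(scores)


# Hypothetical predicted risks and observed outcomes (1 = event occurred).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
halved = [r / 2 for r in risks]  # same ranking, systematically lower risks

print(c_statistic(risks, outcomes), c_statistic(halved, outcomes))
print(brier_score(risks, outcomes), brier_score(halved, outcomes))
```

Because halving preserves the ordering of patients, both models have the same c-statistic (14 of 15 pairs concordant), yet their predicted probabilities differ in how well they match observed outcomes, which is the aspect that discrimination alone cannot capture.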
solutions, technical, and ethical challenges it poses in medicine. At the undergraduate and graduate level, such training may focus on its complementary role to biostatistics, and on detailed software programming and hardware aspects. In medical education, implementation of

analysis may not only identify hidden information in complex, heterogeneous datasets, but also may bridge the gap between disease pathogenesis, genotypes, phenotypes to enable personalized medicine. However, to transform cardiovascular care, DL will have to ad-
8. Tison GH, Sanchez JM, Ballinger B, Singh A, Olgin JE, Pletcher MJ, Vittinghoff E, Lee ES, Fan SM, Gladstone RA, Mikell C, Sohoni N, Hsieh J, Marcus GM. Passive detection of atrial fibrillation using a commercially available smartwatch. JAMA Cardiol 2018;3:409–416.
9. Bello GA, Dawes TJ, Duan J, Biffi C, de Marvao A, Howard LS, Gibbs JSR,
environments. ICLR 2018: International Conference on Learning Representations, Vancouver, BC, Canada.
35. Peng XB, Andrychowicz M, Zaremba W, Abbeel P. Sim-to-real transfer of robotic control with dynamics randomization. 2018. IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.
53. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542:115.
54. Mundhra D, Cheluvaraju B, Rampure J, Rai Dastidar T, Analyzing microscopic images of peripheral blood smear using deep learning. In: Jorge Cardoso (ed.),
74. Miller K, Hettinger C, Humpherys J, Jarvis T, Kartchner D. Forward thinking: building deep random forests. arXiv preprint arXiv:1705.07366. 2017.
75. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 2017;24:361–370.
96. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1–73.
97. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven
99. Krittanawong C, Bomback AS, Baber U, Bangalore S, Messerli FH, Wilson Tang WH. Future direction for using artificial intelligence to predict and manage hypertension. Curr Hypertens Rep 2018;20:75.
100. Krittanawong C. Future physicians in the era of precision cardiovascular medicine. Circulation 2017;136:1572–1574.