Locating Abnormal Heartbeats in ECG Segments Based On Deep Weakly Supervised Learning

Biomedical Signal Processing and Control 68 (2021) 102674
Yanni Tong (a,1), Yinan Sun (a,1), Peng Zhou (a), Yang Shen (a), Hua Jiang (b), Xianzheng Sha (a,*), Shijie Chang (a,*)
(a) Division of Biomedical Engineering, China Medical University, Shenyang, PR China
(b) Department of Cardiovascular Medicine, The First Affiliated Hospital of China Medical University, Shenyang, PR China
ARTICLE INFO

Keywords:
Electrocardiogram
Multiple instance learning
CNN

ABSTRACT

Electrocardiogram (ECG) examination plays a routine and crucial role in many aspects of clinical diagnosis, so an auxiliary diagnosis system that can extract effective information from ECG is valuable. In this study, we propose a novel multi-instance neural network (MINN) model that detects abnormal ECG segments and, at the same time, locates the abnormal heartbeats within them. The model is built on a convolutional neural network (CNN) and trained under the framework of multiple instance learning. During training it takes into account the interaction between individual heartbeats and the whole ECG segment, so that the two levels of prediction constrain each other. The MIT-BIH arrhythmia database and the CMUH database, provided by the First Hospital of China Medical University, are used as data sources, and 44,332 ECG segments are extracted from the two databases to develop the model. We test the model on ECG segments containing 5, 10, 15 and 20 heartbeats. The best-performing MINN configuration achieves an AUC of 0.9922 and a sensitivity of 0.9809 in detecting abnormal ECG segments, and a sensitivity of 0.9473 in locating abnormal heartbeats. These results indicate that the system can provide useful classification and localization information and has the potential to be applied to the analysis of long-term ECG records.
* Corresponding authors at: Division of Biomedical Engineering, China Medical University, Shenyang, 110121, PR China.
E-mail addresses: [email protected] (X. Sha), [email protected] (S. Chang).
1 These two authors contributed equally to this work.
https://doi.org/10.1016/j.bspc.2021.102674
Received 29 January 2021; Received in revised form 7 April 2021; Accepted 24 April 2021
Available online 4 May 2021
1746-8094/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

The electrocardiogram (ECG) is a foundational 12-lead examination that reveals the overall condition of the heart by recording the electrophysiological activity of the myocardial cells that make up its elementary functional units. ECG diagnosis either relies on experienced cardiologists drawing conclusions directly from ectopic morphologies and rhythms, or combines their reading with specific parameters computed by algorithms. Typically, the explicit information provided by those algorithms is heart rate, interval durations and axis deviation; disease categories are occasionally given, but with limited accuracy.

Over the years, many algorithms for ECG analysis have emerged. Recent analysis models are mostly based on deep learning (DL), with which tasks such as ECG feature generation and fusion, data augmentation, localization of key points of ECG waves, and arrhythmia detection can be accomplished [1–5]. Since models may perform well on a dataset obtained from particular acquisition facilities and countries but only barely acceptably when the dataset is switched, the practicability of those models still needs further exploration. Because of problems such as misjudged disease types and false negative cases, it is currently unrealistic to rely entirely on automatic computer diagnosis instead of clinical experts. ECG analysis algorithms are therefore still being optimized, mainly by refining classifier structures, transforming and diversifying datasets, and choosing learning styles. For example, building a larger ECG training dataset with data collected from over 50,000 patients and employing a modified residual network architecture effectively and competitively improved classifier performance [6]. As another example, a generic model can be pre-trained on a common database and then retrained on a dataset containing the ECGs of specific patients, making the model patient-specific [7]. The models in both studies were trained under the supervised learning framework, which is the usual practice in the early stages of a DL field.

Supervised learning usually requires plentiful data with precise annotations, such as categories, contours and the positions of target areas. However, producing high-quality labels requires domain experts, which is a complicated task. With the development of DL, implementations increasingly try to reduce the proportion of manual work in the overall project: manual labels are still needed, but the model no longer depends on them to trace every detail. Researchers strive to train models on datasets with rough labels so that the models can infer fine-grained information, and studies applying weakly supervised learning keep emerging. A common form in those studies is to provide only the categories of the training objects and, based on them, to build an algorithm that can predict which parts of each object are the dominant factors for its classification. In clinical use, weakly supervised learning is usually applied to segmenting and classifying medical images such as pathology, microscopy and ultrasound images [8–10]. This learning style simplifies the labeling of datasets while achieving results competitive with supervised learning [11–14].

During ECG examination, changes within a period of ECG such as atypical waves and arrhythmias, which indicate certain cardiovascular diseases, are sometimes not evident and may not be easily perceived by inexperienced physicians. It is therefore practical to indicate the abnormal heartbeats in ECG signals to avoid missed diagnoses. In this study, we propose an ECG screener (MINN) that gives the probability of an ECG segment being positive and, at the same time, picks out which specific heartbeats in it are abnormal. The screener is built on a convolutional neural network (CNN) and trained by multiple instance learning (MIL), a form of weakly supervised learning. MIL allows the screener to be trained on ECG segments that carry only segment-level category labels while still learning to predict the fine, heartbeat-level labels inside each segment, that is, to locate the abnormal heartbeats in the corresponding segments. The analysis results offered by our model can give physicians more intuitive and helpful information and support the diagnostic process more effectively.

2. Materials and methods

2.1. Overview

The workflow of our study is delineated in Fig. 1. Note that the model has two output ports: one judges whether the whole ECG segment is healthy, and the other outputs heat maps giving the probability of each heartbeat being anomalous. Two databases were used to train and test the proposed model.

Fig. 1. Workflow of our proposed method.

2.2. Data source

In this study, ECG signals from the MIT-BIH arrhythmia database [15] and the CMUH database were used. The MIT-BIH arrhythmia database includes 48 half-hour two-channel ECG recordings from 47 subjects, sampled at 360 Hz. Previous studies using the MIT-BIH database [1,17–19] often analyzed lead II signals, so we also used lead II signals for all ECG records in this study. The CMUH database contains 12-lead ECG records, sampled at 560 Hz, collected from patients hospitalized in the First Hospital of China Medical University between 2013 and 2017. Here, we used only the lead II signals from the CMUH database, each 10 s long. For both databases, cardiologists provided annotation files for all heartbeats.

2.3. Preprocessing

First, wavelet transform was used to eliminate baseline drift [16]. Next, the long signals were cropped into shorter segments of different lengths, so that the analytical performance of the model could be observed and compared for signals of different lengths. Since the model trained on MIT-BIH data would be tested on CMUH data, the sampling rates had to be made uniform; thus, the CMUH signals were downsampled to 360 Hz.
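The two signal operations of Section 2.3 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the paper does not give the wavelet family, decomposition level or resampling filter, so the sketch assumes a Haar wavelet, treats the level-9 approximation (blocks of 2^9 = 512 samples) as the baseline, and uses plain linear interpolation for the 560 Hz to 360 Hz conversion, whereas a production pipeline would normally low-pass filter first to avoid aliasing. The function names are hypothetical.

```python
import numpy as np


def haar_baseline(x: np.ndarray, level: int = 9) -> np.ndarray:
    """Baseline estimate: Haar approximation at `level` with all detail coefficients zeroed.

    Reconstructing a Haar decomposition from its level-L approximation alone
    yields the mean of each block of 2**L samples, held constant over the block.
    """
    n = len(x)
    block = 2 ** level
    pad = (-n) % block                       # pad so the length is a multiple of the block
    xp = np.pad(x, (0, pad), mode="edge")
    block_means = xp.reshape(-1, block).mean(axis=1)
    return np.repeat(block_means, block)[:n]


def remove_baseline(x: np.ndarray, level: int = 9) -> np.ndarray:
    """Subtract the low-frequency Haar approximation (baseline drift) from the signal."""
    return x - haar_baseline(x, level)


def resample_linear(x: np.ndarray, fs_in: float, fs_out: float) -> np.ndarray:
    """Resample by linear interpolation (no anti-aliasing filter; sketch only)."""
    n_out = int(round(len(x) * fs_out / fs_in))
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, x)


if __name__ == "__main__":
    fs_cmuh, fs_target = 560.0, 360.0
    t = np.arange(int(10 * fs_cmuh)) / fs_cmuh             # one 10 s CMUH-like record
    toy = np.sin(2 * np.pi * 1.2 * t) + 0.8 * t / 10        # toy oscillation plus linear drift
    cleaned = remove_baseline(toy)                          # same order as the text: drift first
    resampled = resample_linear(cleaned, fs_cmuh, fs_target)
    print(len(toy), len(resampled))
```

The piecewise-constant baseline is the simplest wavelet choice; smoother wavelets (for example Daubechies) would avoid the small steps it leaves at block boundaries.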
2.4. Data partition

2.4.1. Dataset derived from MIT-BIH database

We selected the 45 records of the MIT-BIH database that contain lead II signals (all records except '102', '104' and '114'). These ECG signals include various types of heartbeats, such as right bundle branch block (R), left bundle branch block (L), atrial premature contraction (A), premature ventricular contraction (V), junctional premature (J) and junctional escape (j). After cutting these long signals into segments with different numbers of heartbeats, we divided the segments into datasets A, B, C and D, which contain 5, 10, 15 and 20 heartbeats per segment, respectively. Fig. 2 shows samples of the preprocessed data from the four datasets, and Table 1 counts the heartbeat types contained in each dataset. The training and test segments were assigned at a ratio of 8:2.

Fig. 2. Illustration of ECG segments in four datasets.

Table 1
Number of heartbeats and ECG segments involved in each dataset.
Type of heartbeat   Dataset A          Dataset B          Dataset C          Dataset D
                    Training  Test     Training  Test     Training  Test     Training  Test
N                   58414     14405    58064     14649    58235     14386    58267     14262
L                   6412      1642     6352      1691     6283      1733     6498      1517
R                   5700      1538     5974      1253     5759      1454     5546      1657
a                   116       34       114       34       122       25       120       27
V                   5680      1381     5626      1422     5497      1524     5729      1292
F                   634       162      600       195      638       157      636       158
A                   2032      498      2053      476      2078      428      1933      579
j                   176       53       190       39       184       45       184       43
/                   2824      788      2869      733      2925      672      2818      766
!                   359       90       350       70       345       45       280       80
x                   157       36       173       19       167       24       154       37
f                   197       63       222       38       219       41       177       83
Other Ab            1954      475      1943      481      1908      511      1918      499
Total Ab            26241     6760     26466     6451     26125     6659     25993     6738
Segments            16931     4233     8453      2110     5624      1403     4213      1050
This table lists the numbers of healthy and abnormal heartbeats. The abbreviations of the heartbeat types are taken from the MIT-BIH database documentation, which provides the annotation of each heartbeat. Less frequent heartbeat types are merged into the "Other Ab" row, which gives their total. "Segments" is the total number of ECG segments in each dataset, and "Ab" is short for abnormal.

2.4.2. Dataset derived from CMUH database

This dataset consists of lead II ECG data from 315 people, extracted from the CMUH database. Fig. 3 displays a sample ECG segment from the dataset, and the number of heartbeats of each type in the dataset is given in Table 2.
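The segmentation of Section 2.4.1 can be sketched as follows: consecutive annotated heartbeats are grouped into non-overlapping bags of m beats, each bag is labeled abnormal if any of its beats is abnormal, and the bags are split 8:2 at random. The paper does not say how a single heartbeat was delimited, so this sketch assumes a hypothetical fixed window of 2 × half_width samples centred on each annotated R peak, counts every annotation other than 'N' as abnormal, and assumes non-beat annotations have already been filtered out. All names here are illustrative.

```python
import numpy as np

NORMAL = "N"  # MIT-BIH symbol for a normal beat; every other beat symbol counts as abnormal here


def make_bags(signal, r_peaks, symbols, m, half_width):
    """Cut one record into non-overlapping bags of m consecutive heartbeats.

    Each heartbeat (instance) is a window of 2 * half_width samples centred on
    its R peak, so one bag has shape (m, s) with s = 2 * half_width.
    Returns X (n_bags, m, s), bag labels Y (n_bags,), instance labels Yk (n_bags, m).
    """
    beats, labels = [], []
    for r, sym in zip(r_peaks, symbols):
        if r - half_width < 0 or r + half_width > len(signal):
            continue                                   # window would run off the record
        beats.append(signal[r - half_width:r + half_width])
        labels.append(0 if sym == NORMAL else 1)
    n_bags = len(beats) // m                           # drop the incomplete tail
    X = np.asarray(beats[:n_bags * m]).reshape(n_bags, m, 2 * half_width)
    Yk = np.asarray(labels[:n_bags * m]).reshape(n_bags, m)
    Y = Yk.max(axis=1)                                 # MIL rule: abnormal iff any beat is abnormal
    return X, Y, Yk


def split_8_2(n, seed=0):
    """Random 8:2 train/test split of bag indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.8 * n))
    return idx[:n_train], idx[n_train:]


if __name__ == "__main__":
    sig = np.arange(1000, dtype=float)                 # toy signal
    peaks = list(range(50, 1000, 80))                  # 12 toy R peaks
    syms = ["N"] * 12
    syms[7] = "V"
    X, Y, Yk = make_bags(sig, peaks, syms, m=5, half_width=40)
    print(X.shape, Y, Yk)
```

Only the bag labels Y would be given to MINN during training; Yk is kept solely for evaluating the heartbeat-level predictions.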
Fig. 3. Illustration of ECG segments in CMUH dataset.

Table 2
Number of heartbeats for different types.
Type of heartbeat   Number
N                   1801
L                   700
R                   590
V                   59
Segments            315

2.5. Multiple instance learning

The form of weakly supervised learning adopted in this study is multiple instance learning (MIL), first proposed by Dietterich et al. to study the prediction of drug activity [20]. MIL rests on two essential notions, bags and instances. A bag consists of multiple instances; it is deemed negative if all of its instances are negative, and positive if at least one instance is positive. Similarly, an ECG segment is diagnosed as healthy only if none of its heartbeats is abnormal. Following the notions of bags and instances in MIL, we therefore defined bags as ECG segments and instances as heartbeats.

The training dataset is expressed as S = {(Xi, Yi), i = 1, 2, ..., n}, in which Xi denotes the ith ECG segment and Yi is the corresponding label: Yi = 1 when at least one aberrant heartbeat appears in the segment, and Yi = 0 when all heartbeats in the segment are healthy. Bag Xi is composed of instances Xik (k = 1, 2, ..., m), where m is the number of instances, and Yik is the label of Xik. Hence, the relation between Yi and Yik can be stated as Yi = 1 ⟺ ∃k such that Yik = 1, and Yi = 0 ⟺ ∀k, Yik = 0. We then define Ŷi as the predicted probability that an ECG segment is aberrant, and Ŷik as the predicted probability that the kth heartbeat of the ith segment is aberrant; these two probabilities are produced by the two output ports of MINN. Ŷik is interpreted as Ŷik = p(Yik = 1 | Xik, θ), where θ = {ωz, bz, ∀z} denotes the parameters (weights ωz and biases bz) of every layer z of MINN. During training, MINN receives only the coarse labels Yi that indicate the class of each ECG segment; the role of MIL is to make the model learn to predict the fine labels Yik that specify which heartbeats are problematic.

There are generally two ways to express the relation between Ŷi and Ŷik. One is max pooling, Ŷi = max_k Ŷik. This expression emphasizes the most discriminative instance but may overlook the contributions of the other instances, and it truncates the derivative ∂Ŷi/∂Ŷik, which is zero for every instance except the maximum [9]. To let the model consider the impact of all instances in a bag rather than only the instance with the maximum activation, the other way uses global functions that aggregate the instance-level predictions, so that more instances take part in deriving the bag-level prediction. We employed both approaches to link Ŷi and Ŷik in this study and compared their performance. The global functions we chose were integrated segmentation and recognition (ISR) [21], log-sum-exponential (LSE) [22] with r equal to 1, 2.5 and 5, and noisy-or [22]:

$$\hat{Y}_i = f(\hat{Y}_{ik}) = \frac{\sum_{k=1}^{m} \frac{\hat{Y}_{ik}}{1-\hat{Y}_{ik}}}{1 + \sum_{k=1}^{m} \frac{\hat{Y}_{ik}}{1-\hat{Y}_{ik}}} \qquad \text{(ISR)} \quad (1)$$

$$\hat{Y}_i = f(\hat{Y}_{ik}) = \frac{1}{r} \ln\left( \frac{1}{m} \sum_{k=1}^{m} \exp\left(r \hat{Y}_{ik}\right) \right) \qquad \text{(LSE)} \quad (2)$$

$$\hat{Y}_i = f(\hat{Y}_{ik}) = 1 - \prod_{k=1}^{m} \left(1 - \hat{Y}_{ik}\right) \qquad \text{(noisy-or)} \quad (3)$$

These formulations aggregate the instance probabilities Ŷik into the bag-level probability Ŷi and assign similar weights to similar instances in a bag. The loss function adopted in this study is defined by:

$$L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log\left[ \hat{Y}_i^{\,Y_i} \left(1-\hat{Y}_i\right)^{1-Y_i} \right] \quad (4)$$

Through backpropagation, the parameters θ of the model are updated so as to minimize the loss function.
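To make Eqs. (1)–(4) concrete, the sketch below implements the aggregation functions and the bag loss in NumPy, together with their analytic gradients ∂Ŷi/∂Ŷik. The gradients illustrate the truncation argument above: under max pooling only the most probable heartbeat receives a gradient, whereas ISR, LSE and noisy-or pass a non-zero gradient to every heartbeat. It reads Eq. (4) as the usual binary cross-entropy; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

EPS = 1e-7


def pool_max(p):
    return float(np.max(p))                            # Ŷi = max_k Ŷik


def pool_isr(p):
    p = np.clip(p, EPS, 1.0 - EPS)                     # keep the odds p / (1 - p) finite
    v = np.sum(p / (1.0 - p))                          # Eq. (1)
    return float(v / (1.0 + v))


def pool_lse(p, r):
    return float(np.log(np.mean(np.exp(r * p))) / r)   # Eq. (2)


def pool_noisy_or(p):
    return float(1.0 - np.prod(1.0 - p))               # Eq. (3)


def bag_loss(y_true, y_pred):
    """Eq. (4) as binary cross-entropy averaged over the n bags."""
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))


def pooling_grads(p, r=2.5):
    """Analytic dŶi/dŶik for every instance k (p: instance probabilities, 0 < p < 1)."""
    g_max = (p == p.max()).astype(float)               # only the arg-max instance gets a gradient
    v = np.sum(p / (1.0 - p))
    g_isr = 1.0 / ((1.0 + v) ** 2 * (1.0 - p) ** 2)    # dŶi/dv * dv/dŶik
    w = np.exp(r * p)
    g_lse = w / w.sum()                                # softmax weights over instances
    g_nor = np.array([np.prod(np.delete(1.0 - p, k)) for k in range(len(p))])
    return {"max": g_max, "isr": g_isr, "lse": g_lse, "noisy-or": g_nor}


if __name__ == "__main__":
    p = np.array([0.9, 0.1, 0.2])                      # toy Ŷik for a 3-beat bag
    print(pool_max(p), pool_isr(p), pool_lse(p, 5.0), pool_noisy_or(p))
    for name, g in pooling_grads(p).items():
        print(name, np.round(g, 3))
```

Note that LSE approaches the mean as r goes to 0 and the maximum as r grows, which is consistent with the trend the Results report for larger r.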
Fig. 4. CNN model structure. The front part consists of five CONV blocks: conv1_64, conv2_128, conv3_256, conv4_512 and conv5_512. The kernel sizes of the convolutional and max pooling layers are 1 × 9 and 1 × 3, respectively. Each output of a convolutional layer is activated by a rectified linear unit (ReLU) [32] and then undergoes batch normalization (BN) [33]. The numbers of neurons in the three fully connected layers are 2048, 1024 and 1. Dropout [34] is applied between the fully connected layers to prevent overfitting.
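The data flow summarized in Fig. 4 can be illustrated with a toy NumPy forward pass. This is a shape-level sketch under loud assumptions, not the authors' network: a single 1 × 9 convolution with ReLU and one 1 × 3 max pooling (stride 3 assumed) stand in for the five CONV blocks, each instance head is a single dense layer instead of three (2048, 1024, 1), BN and dropout are omitted, the weights are random, and "separate submodules" is interpreted as unshared dense weights per heartbeat. The beat length s = 180 and all function names are hypothetical.

```python
import numpy as np


def conv_1x9(x, w, b):
    """'Same'-padded 1 x 9 convolution (cross-correlation) along the time axis.

    x: (m, s, c_in), w: (9, c_in, c_out), b: (c_out,) -> (m, s, c_out).
    The same kernel slides over every heartbeat row, and rows never mix.
    """
    k, s = w.shape[0], x.shape[1]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2), (0, 0)))
    windows = np.stack([xp[:, t:t + s, :] for t in range(k)], axis=2)  # (m, s, k, c_in)
    return np.einsum("mtkc,kco->mto", windows, w) + b


def maxpool_1x3(x):
    """1 x 3 max pooling along time; a stride of 3 is assumed."""
    m, s, c = x.shape
    s3 = s // 3
    return x[:, :s3 * 3].reshape(m, s3, 3, c).max(axis=2)


def init_params(m, s, c=8, seed=0):
    """Random toy weights (hypothetical sizes; not the trained MINN)."""
    rng = np.random.default_rng(seed)
    return {
        "w_conv": 0.1 * rng.normal(size=(9, 1, c)),
        "b_conv": np.zeros(c),
        "w_head": 0.01 * rng.normal(size=(m, (s // 3) * c)),  # one weight row per instance head
        "b_head": np.zeros(m),
    }


def minn_toy_forward(bag, params):
    """bag: (m, s) stacked heartbeats -> (bag probability Ŷi, instance probabilities Ŷik)."""
    h = bag[:, :, None]                                                   # (m, s, 1)
    h = np.maximum(conv_1x9(h, params["w_conv"], params["b_conv"]), 0.0)  # conv + ReLU
    h = maxpool_1x3(h)                                                    # (m, s // 3, c)
    feats = h.reshape(h.shape[0], -1)                                     # one feature row per heartbeat
    logits = np.einsum("mf,mf->m", feats, params["w_head"]) + params["b_head"]  # unshared heads
    y_ik = 1.0 / (1.0 + np.exp(-logits))                                  # sigmoid
    y_i = 1.0 - np.prod(1.0 - y_ik)                                       # MIL layer: noisy-or, Eq. (3)
    return y_i, y_ik


if __name__ == "__main__":
    m, s = 5, 180                                   # dataset A bag; s is a hypothetical beat length
    bag = np.random.default_rng(1).normal(size=(m, s))
    y_i, y_ik = minn_toy_forward(bag, init_params(m, s))
    print(y_i, y_ik.shape)
```

Because the 1 × 9 kernel spans a single row, features of different heartbeats only meet again in the MIL layer, which is what lets each head emit its own heartbeat probability.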
Fig. 5. ROC curves of MINN and VGG-16 tested on the four MIT-BIH datasets. The symbol '*' marks the three models with the highest AUC values. The true positive rate (TPR) and false positive rate (FPR) equal TP/(TP + FN) and FP/(FP + TN), respectively, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives.

2.6. CNN architecture

From LeNet, VGG and ResNet to the U-Net and GAN architectures in common use today, one constant component of these networks is the convolutional layer. In recent years, as convolutional layers have gained an edge in image processing, they have also been widely used to process one-dimensional time series such as ECG [23–27] and electroencephalogram (EEG) [28–30] signals.

Here, we built a CNN-based model adapted to the analysis of one-dimensional ECG data and combined it with MIL so that it can locate anomalous heartbeats in long ECG signals. The architecture of the model is displayed in Fig. 4. The heartbeats Xik of an ECG segment Xi are stacked in parallel to form the model input, whose shape is m × s, where m is the number of heartbeats and s is the number of sampling points in each heartbeat. Inspired by the convolutional modules of VGG net [31], we built a similar structure for the CONV blocks of MINN, in which alternately stacked convolutional and pooling layers extract features from the signals. While tuning the parameters, we found that appropriately increasing the convolutional kernel size helps capture detailed information in ECG signals and improves classification, so the larger 1 × 9 kernel was chosen for the convolutions on the generated feature maps. Separate submodules are attached to the end of the CONV blocks, and their number varies with the number of instances in the bag. These submodules form different branches that process the instances independently in parallel. Each of them consists of three fully connected layers, which select features and activate the relevant ones once the classification is determined. The last fully connected layer uses a sigmoid function to generate the instance-level predictions Ŷik. The parallel submodules are followed by a MIL layer, which aggregates the Ŷik through the function f(·) defined above. The previously separated instances are merged back into bags after the MIL layer, which can be expressed as

$$\hat{Y}_i = f(\hat{Y}_{ik}) = f\left(\sigma\left(h(x_{ik}, \theta)\right)\right), \qquad \sigma(v) = \frac{1}{1 + \exp(-v)}$$

where h(xik, θ) is the output of the last dense layer of the submodule that processes instance k of bag i. Logically, the better the Ŷi predictions, the higher the probability of capturing abnormal heartbeats precisely; conversely, the selection of heartbeats influences the recognition of ECG segments. Thus, Ŷi and Ŷik constrain each other during training.

3. Results

3.1. Experiment A

The main purpose of this experiment was to obtain a well-trained MINN model and test it on the MIT-BIH datasets. The experiment was run on an NVIDIA GeForce GTX 1080 Ti with 11 GB of memory, using Keras 2.1.2 and Python 3.6.5.

3.1.1. Evaluation on bag-level prediction

A major premise for locating abnormal heartbeats well in an ECG of a given length is that the ECG segments are classified accurately, that is, that MINN performs well on bag-level predictions. This section evaluates the binary classification performance of MINN.
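The bag-level metrics used in this section can be sketched in a few lines of NumPy. The sketch assumes a decision threshold of 0.5 on Ŷi for Sen and Spec (the paper does not state its threshold) and computes AUC through its Mann-Whitney form, the probability that a randomly chosen abnormal segment scores higher than a randomly chosen normal one, which equals the area under the empirical ROC curve. The DeLong confidence intervals and P-values are not reproduced here, and the function names are illustrative.

```python
import numpy as np


def sen_spec(y_true, y_prob, threshold=0.5):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) at a probability threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(y_true, y_prob):
    """AUC = P(score of a random positive bag > score of a random negative bag); ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    pos, neg = y_prob[y_true], y_prob[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0]                     # toy bag labels Yi
    p = [0.9, 0.4, 0.6, 0.1]             # toy bag probabilities Ŷi
    print(sen_spec(y, p), auc_mann_whitney(y, p))
```

The pairwise comparison is quadratic in the number of segments, which is fine for a few thousand test bags; a rank-based formula scales better for larger sets.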
Here we compared the performance of MINN with different MIL functions. We also used the original VGG-16 as a comparison model. The CONV blocks of MINN were designed based on the structure of VGG-16; the changes we made were adding BN layers to the CONV blocks and resizing the kernels. All models were trained for 200 epochs with a mini-batch size of 32, using the Adam optimizer [35] with β1 = 0.9, β2 = 0.999 and a learning rate of 0.001. The learning rate was reduced by a factor of 10 whenever the training loss did not decrease for three successive epochs.

After training, receiver operating characteristic (ROC) curves were constructed from the test results Ŷi, and the areas under the curves (AUC) were calculated. The test results are presented in Fig. 5. The DeLong test [36] was used to calculate the AUC with 95 % confidence intervals (CI) (Fig. 6) and the P-values. MINN with the ISR, maxpool and noisy-or functions obtained the highest AUC values on all of the test datasets. With the significance level set to 0.05, we can draw the following conclusions: (i) On each dataset, MINN with the ISR, maxpool and noisy-or functions outperforms the original VGG-16 model (P < 6 × 10⁻⁵). This means that an appropriate network architecture can improve detection. Moreover, unlike VGG-16, MINN can not only distinguish between abnormal and normal ECG segments but also find the abnormal heartbeats within them. (ii) In general, the LSE function does not significantly improve, and can even reduce, the predictive ability of the model. As recommended in [37,38], we adopted low r values for LSE and found that the predictions improve as r increases. (iii) The differences between MINN with the ISR, maxpool and noisy-or functions are not significant (P > 0.05). (iv) As the length of the ECG segments increases, MINN's ability to classify them tends to decline.

Based on the AUC results, we chose the ISR, maxpool and noisy-or functions, which performed better in distinguishing problematic ECG segments, and calculated their sensitivity (Sen) and specificity (Spec) in predicting Yi. The results are summarized in Table 3. MINN achieves balanced Sen and Spec, which means the proposed model is equally capable of recognizing normal and abnormal ECG segments.

Fig. 6. AUC (95 % CI) values of MINN with different MIL functions.

Fig. 7. RMSE measured by MINN with different MIL functions.

Fig. 8. Violin plots of Ŷik predicted by MINN with the ISR, maxpool and noisy-or functions.

Table 3
Performance of MINN tested on four datasets and comparison among three global functions.
Method          Dataset A       Dataset B       Dataset C       Dataset D
                Sen     Spec    Sen     Spec    Sen     Spec    Sen     Spec
MINN_ISR        0.9835  0.9857  0.9593  0.9839  0.9792  0.9759  0.9676  0.9811
MINN_maxpool    0.9864  0.9913  0.9779  0.9860  0.9641  0.9703  0.9705  0.9892
MINN_noisy-or   0.9864  0.9876  0.9822  0.9882  0.9618  0.9685  0.9809  0.9865

Table 4
Performance of MINN on instance-level and comparison among three global functions.
Method          Dataset A       Dataset B       Dataset C       Dataset D
                Sen     Spec    Sen     Spec    Sen     Spec    Sen     Spec
MINN_ISR        0.9212  0.9926  0.8830  0.9931  0.8308  0.9925  0.7832  0.9862
MINN_maxpool    0.9552  0.9954  0.9051  0.9950  0.8485  0.9912  0.8292  0.9893
MINN_noisy-or   0.9473  0.9956  0.8933  0.9939  0.8187  0.9914  0.8271  0.9832

Fig. 9. Visualization of instance-level predictions Ŷik. (A)–(D) depict the probability maps of ECG segments of different lengths. The symbols "N", "L", etc. are the ground truth.

Fig. 10. ROC curves of MINN and VGG-16 tested on CMUH dataset.

3.1.2. Evaluation on instance-level prediction

This section evaluates the instance-level prediction, which reflects the ability to capture abnormal heartbeats in ECG segments. First, the RMSE (root mean square error) between Yik and Ŷik was calculated (Fig. 7). The results reveal that, for a given test dataset, the deviation between Yik and Ŷik differs between MIL functions, and that ISR, maxpool and noisy-or let MINN achieve a lower instance-level RMSE. The violin plots in Fig. 8 show the distribution of Ŷik produced by these three MIL functions. The Ŷik values are densely concentrated near 0 and 1, but there are also some obvious outlying points, which represent the false positive and false negative instances in the predictions.

Table 4 shows the Sen and Spec of MINN in predicting Yik. Compared with the bag-level performance on the same test dataset, Sen drops considerably, while Spec barely changes. This means that although the model distinguishes ECG segments well, it still cannot locate all the abnormal heartbeats in them precisely. This follows from the principle of MIL: Ŷi is judged as 0 only if all Ŷik in the corresponding bag Xi are judged as 0, and Ŷi is judged as 1 if any Ŷik in Xi is judged as 1. Hence, an excellent Spec at the bag level portends a high Spec at the instance level, whereas a high Sen at the bag level only means that at least one abnormal heartbeat in each positive ECG segment is picked out, not that every abnormal heartbeat is detected.
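The gap between bag-level and instance-level sensitivity described above can be reproduced with a small hypothetical example. The numbers below are invented for illustration and are not taken from the paper; max pooling is used as the MIL layer and a 0.5 threshold is assumed. Both bags are classified correctly, yet only one of the three abnormal heartbeats in the positive bag crosses the threshold.

```python
import numpy as np


def instance_rmse(yk_true, yk_prob):
    """RMSE between instance labels Yik and predicted probabilities Ŷik over all heartbeats."""
    diff = np.asarray(yk_prob, dtype=float) - np.asarray(yk_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def sen_spec_flat(y_true, y_prob, threshold=0.5):
    """Sen and Spec over all entries (works for bag vectors and instance matrices alike)."""
    t = np.ravel(y_true).astype(bool)
    p = np.ravel(y_prob) >= threshold
    return np.sum(p & t) / np.sum(t), np.sum(~p & ~t) / np.sum(~t)


# Two toy bags of 5 heartbeats each (hypothetical values).
yk_true = np.array([[0, 1, 0, 1, 1],
                    [0, 0, 0, 0, 0]])
yk_prob = np.array([[0.1, 0.9, 0.2, 0.4, 0.3],    # only one of three abnormal beats exceeds 0.5
                    [0.1, 0.2, 0.1, 0.3, 0.2]])

bag_true = yk_true.max(axis=1)
bag_prob = yk_prob.max(axis=1)                    # max pooling as the MIL layer

print("bag-level Sen/Spec:", sen_spec_flat(bag_true, bag_prob))        # both bags correct
print("instance-level Sen/Spec:", sen_spec_flat(yk_true, yk_prob))     # Sen drops to 1/3
print("instance RMSE:", round(instance_rmse(yk_true, yk_prob), 3))
```

Instance-level Spec stays perfect here because the negative bag forces all of its heartbeats below the threshold, mirroring the asymmetry between Sen and Spec seen in Table 4.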
For each test sample, we visualize the last dense layer of MINN, which outputs Ŷik, and map the probabilities back onto the original signal. Fig. 9 illustrates the resulting instance-level probability maps. The maps indicate that the proposed model can locate abnormal heartbeats, which is meaningful for auxiliary diagnosis.

3.2. Experiment B

The main purpose of this experiment was to evaluate the generalization ability of MINN by testing the trained model on a brand-new dataset, the CMUH dataset, which the model had never encountered before. This makes the test results a more realistic reflection of how our algorithm would perform in clinical practice.

3.2.1. Evaluation on bag-level prediction

The ROC curves drawn from the models' predictions of Ŷi are shown in Fig. 10. Although MINN performs worse on the CMUH dataset, which comes from a new database not covered by the training process, MINN with the ISR function still achieves the maximum AUC of 0.9846 (95 % CI, 0.9708–0.9985), and MINN with the maxpool and noisy-or functions achieves AUC values of 0.9477 (95 % CI, 0.9243–0.9712) and 0.9520 (95 % CI, 0.9281–0.9758), respectively. Table 5 lists the Sen and Spec results. Compared with the test on the MIT-BIH datasets, Sen remains relatively stable while Spec declines considerably, indicating that MINN frequently mistakes normal segments for abnormal ones. However, on a completely new dataset that differs from the MIT-BIH datasets in country, population and acquisition device, the model with the ISR function still achieves a Sen of 0.9763. This indicates that the model keeps false negatives low, which is important in clinical diagnosis.

3.2.2. Evaluation on instance-level prediction

Table 6 presents the instance-level evaluation metrics, and Fig. 11 shows how MINN locates abnormal heartbeats.

Table 5
Sensitivity and specificity for classifying ECG segments.
Method          Sen     Spec
MINN_ISR        0.9763  0.8288
MINN_maxpool    0.9467  0.6507
MINN_noisy-or   0.9290  0.8082

Table 6
Sensitivity and specificity for locating abnormal heartbeats.
Method          Sen     Spec
MINN_ISR        0.8844  0.8662
MINN_maxpool    0.7591  0.8551
MINN_noisy-or   0.8013  0.9039

Fig. 11. Illustration of heat maps of ECG segments. The symbols "N", "L", etc. in the figure are the ground truth.

4. Discussion

Clinically, when diagnosing an ECG, doctors refer to parameters calculated by computers, and that information assists them in making final judgments. When designing auxiliary diagnosis systems, it is therefore essential to consider which messages the system should provide to best support doctors in interpreting ECGs, especially in low- and middle-income regions. Those regions are limited in medical resources and often lack seasoned doctors, and they account for more than 75 % of cardiovascular disease deaths [39]. Given this situation, we aim to create an ECG analysis system that better assists physicians, especially less experienced ones. Previous ECG auxiliary diagnostic tools generally tended to give ECG types directly, which may interfere with physicians' judgments. To address this shortcoming, our model gives the probability of each heartbeat being abnormal in the form of a heat map, which can serve as a reference in diagnosis. MIL, the weakly supervised learning mode adopted in our study, has been widely used in many areas of computer-aided diagnosis [40–43] and spares people from spending a great deal of time labeling training datasets. Compared with models trained under supervised learning, the advantage of our model is that it achieves heartbeat detection without labeling any heartbeat in the training datasets.

For model evaluation, we used data from databases built in different countries and populations to test the model. This largely simulates the scenario in which the model would actually be applied to clinical ECG diagnosis. The test results revealed good generalization of MINN, which makes the model a promising candidate for clinical application. However, there are also some problems and limitations in our experiments. In clinical diagnosis, false negative events deserve attention. Therefore, based on the test results on the MIT-BIH datasets, we counted how many problematic ECG segments MINN mistook for healthy ones, namely the false negatives. According to the theory of MIL, an ECG segment is considered normal when the model judges every heartbeat in the segment to be normal. For the false negative segments, we counted the types of abnormal heartbeats and the number of each type. The results are shown in Fig. 12. Types 'A', 'F', 'a' and 'j' are easily regarded as normal. This may be because (1) the difference between the non-sinus P wave of type 'A' (atrial premature contraction) and the sinus P wave of type 'N' is not
Fig. 12. Graphical representation of false negative heartbeats.
apparent and thus easily overlooked; (2) for types 'F' (fusion of ventricular and normal), 'a' (aberrated atrial premature) and 'j' (junctional escape), relatively few heartbeats were included in the training dataset, which prevented the model from learning their characteristics well.
The test results on the CMUH dataset indicate that MINN does not recognize the normal class in CMUH as well as it does in MIT-BIH. We reviewed samples of the normal class in both datasets and show some of them in Fig. 13. From observing these ECG signals, we consider that the lower Spec may be caused by discrepancies in the morphology of normal waveforms between the two datasets. In particular, the CMUH database was not used in training, which may have left the model unable to recognize normal ECGs from the CMUH database.

Fig. 13. Illustration of normal ECG data in both datasets.

5. Conclusion
In this study, we proposed a novel and practical function to enrich ECG analysis systems: integrating the strengths of MIL with the classification capability of CNN to locate abnormal portions of an ECG. The model aims to improve auxiliary diagnosis systems so that they offer valuable information to physicians rather than replace them. The test results show that the model can locate abnormal heartbeats while classifying ECG signals. However, one limitation of our study is that we cannot at present verify the generalization ability of our model on more standard databases and real-time databases. Because ECG databases with heartbeat annotations are scarce, we will increase cooperation with hospitals in future to collect more real-time data with marked heartbeats to train the model and improve its robustness. Another limitation is that the model can only be applied to ECG segments with a fixed number of heartbeats and cannot perform heartbeat analysis on ECG segments with uneven segmentation. Future work should make the model capable of analyzing ECG signals of any length.
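The MIL principle behind the model (a segment is abnormal if at least one of its heartbeats is abnormal, and normal only if all of them are) means heartbeat-level scores can both classify a segment and point to the abnormal beats. The minimal Python sketch below illustrates that idea under stated assumptions. It is not the MINN implementation. The per-heartbeat probabilities are made-up inputs standing in for CNN outputs. Max pooling and noisy-OR are two common MIL pooling functions chosen for illustration, not necessarily the functions compared in the paper. The 0.5 threshold and the helper names (`max_pool`, `noisy_or`, `locate_abnormal`, `sen_spec`) are hypothetical. Sensitivity and specificity use the standard definitions TP/(TP+FN) and TN/(TN+FP).

```python
"""Minimal MIL sketch: heartbeat (instance) scores -> segment (bag) label + localization.

Illustrative only; NOT the MINN implementation. Inputs are hypothetical
per-heartbeat abnormality probabilities standing in for CNN outputs.
"""
from typing import List, Sequence, Tuple


def max_pool(instance_probs: Sequence[float]) -> float:
    """Bag probability = largest instance probability (one abnormal beat suffices)."""
    return max(instance_probs)


def noisy_or(instance_probs: Sequence[float]) -> float:
    """Bag probability = 1 - prod(1 - p_k): the segment is normal only if every beat is."""
    prob_all_normal = 1.0
    for p in instance_probs:
        prob_all_normal *= 1.0 - p
    return 1.0 - prob_all_normal


def locate_abnormal(instance_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of heartbeats whose probability reaches the (hypothetical) threshold."""
    return [k for k, p in enumerate(instance_probs) if p >= threshold]


def sen_spec(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sen = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sen, spec


if __name__ == "__main__":
    # Made-up scores for a 5-heartbeat segment; beat 2 looks abnormal.
    probs = [0.05, 0.10, 0.92, 0.08, 0.15]
    print(max_pool(probs))               # 0.92
    print(round(noisy_or(probs), 3))     # 0.947
    print(int(max_pool(probs) >= 0.5))   # 1 -> segment flagged abnormal
    print(locate_abnormal(probs))        # [2]

    # Beat-level evaluation against made-up reference labels.
    y_true = [0, 0, 1, 0, 1]
    y_pred = [1 if k in locate_abnormal(probs) else 0 for k in range(len(probs))]
    print(sen_spec(y_true, y_pred))      # (0.5, 1.0)
```

Max pooling lets a single confident beat decide the segment label, while noisy-OR accumulates evidence from several moderately suspicious beats. Both functions accept bags of any length. The sketch does not show whether MINN's CNN front end could be extended in the same way.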
CRediT authorship contribution statement
Yanni Tong: Methodology, Software, Writing - original draft. Yinan Sun: Methodology, Writing - review & editing. Peng Zhou: Software. Yang Shen: Software. Hua Jiang: Validation. Xianzheng Sha: Supervision, Writing - review & editing. Shijie Chang: Writing - review & editing.

Acknowledgement
This work was supported in part by the National Science Fund of Liaoning under Grant 2018-64, and in part by the Big Data Research for Health Science of China Medical University under Grant Key Project 6.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
[1] M. Guanglong, W. Xiangqing, Y. Junsheng, ECG signal classification algorithm based on fusion features, J. Phys. Conf. Ser. 1207 (2019), https://doi.org/10.1088/1742-6596/1207/1/012003.
[2] M.Z. Poh, Y.C. Poh, P.H. Chan, C.K. Wong, L. Pun, W.W.C. Leung, Y.F. Wong, M.M.Y. Wong, D.W.S. Chu, C.W. Siu, Diagnostic assessment of a deep learning system for detecting atrial fibrillation in pulse waveforms, Heart (2018), https://doi.org/10.1136/heartjnl-2018-313147.
[3] M. Hammad, S. Zhang, K. Wang, A novel two-dimensional ECG feature extraction and classification algorithm based on convolution neural network for human authentication, Future Gener. Comput. Syst. 101 (2019) 180–196, https://doi.org/10.1016/j.future.2019.06.008.
[4] Y. Xia, N. Wulan, K. Wang, H. Zhang, Detecting atrial fibrillation by deep convolutional neural networks, Comput. Biol. Med. 93 (2018) 84–92, https://doi.org/10.1016/j.compbiomed.2017.12.007.
[5] X. Zhai, C. Tin, Automated ECG classification using dual heartbeat coupling based on convolutional neural network, IEEE Access 6 (2018) 27465–27472, https://doi.org/10.1109/ACCESS.2018.2833841.
[6] A.Y. Hannun, P. Rajpurkar, M. Haghpanahi, G.H. Tison, C. Bourn, M.P. Turakhia, A.Y. Ng, Publisher correction: cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network, Nat. Med. 25 (1) (2019) 65–69, https://doi.org/10.1038/s41591-018-0268-3; Nat. Med. 25 (2019) 530, https://doi.org/10.1038/s41591-019-0359-9.
[7] Y. Li, Y. Pang, J. Wang, X. Li, Patient-specific ECG classification by deeper CNN from generic to dedicated, Neurocomputing 314 (2018) 336–346, https://doi.org/10.1016/j.neucom.2018.06.068.
[8] Y. Xu, J.Y. Zhu, E. Chang, Z. Tu, Multiple clustered instance learning for histopathology cancer image classification, segmentation and clustering, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2012) 964–971, https://doi.org/10.1109/CVPR.2012.6247772.
[9] Z. Jia, X. Huang, E.I. Chang, Y. Xu, Constrained deep weak supervision for histopathology image segmentation, IEEE Trans. Med. Imaging 36 (2017) 2376–2388, https://doi.org/10.1109/TMI.2017.2724070.
[10] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, E.I. Chang, Deep learning of feature representation with multiple instance learning for medical image analysis, ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc. (2014) 1626–1630, https://doi.org/10.1109/ICASSP.2014.6853873.
[11] A. Vezhnevets, J.M. Buhmann, Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2010) 3249–3256, https://doi.org/10.1109/CVPR.2010.5540060.
[12] P.O. Pinheiro, R. Collobert, From image-level to pixel-level labeling with convolutional networks, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2015) 1713–1721, https://doi.org/10.1109/CVPR.2015.7298780.
[13] D. Pathak, P. Krahenbuhl, T. Darrell, Constrained convolutional neural networks for weakly supervised segmentation, Proc. IEEE Int. Conf. Comput. Vis. (2015) 1796–1804, https://doi.org/10.1109/ICCV.2015.209.
[14] J. Feng, Z.H. Zhou, Deep MIML network, 31st AAAI Conf. Artif. Intell. AAAI 2017 (2017) 1884–1890.
[15] G.B. Moody, R.G. Mark, The impact of the MIT-BIH arrhythmia database, IEEE Eng. Med. Biol. Mag. 20 (2001) 45–50, https://doi.org/10.1109/51.932724.
[16] M. Arif, I.A. Malagore, F.A. Afsar, Detection and localization of myocardial infarction using K-nearest neighbor classifier, J. Med. Syst. 36 (2012) 279–289, https://doi.org/10.1007/s10916-010-9474-3.
[17] U.R. Acharya, H. Fujita, O.S. Lih, Y. Hagiwara, J.H. Tan, M. Adam, Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network, Inf. Sci. (Ny). 405 (2017) 81–90, https://doi.org/10.1016/j.ins.2017.04.012.
[18] S. Kiranyaz, T. Ince, M. Gabbouj, Real-time patient-specific ECG classification by 1-D convolutional neural networks, IEEE Trans. Biomed. Eng. (2016), https://doi.org/10.1109/TBME.2015.2468589.
[19] S.L. Oh, E.Y.K. Ng, R.S. Tan, U.R. Acharya, Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats, Comput. Biol. Med. (2018), https://doi.org/10.1016/j.compbiomed.2018.06.002.
[20] T.G. Dietterich, R.H. Lathrop, T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artif. Intell. 89 (1997) 31–71, https://doi.org/10.1016/s0004-3702(96)00034-3.
[21] P. Viola, J.C. Platt, C. Zhang, Multiple Instance boosting for object detection, Adv. Neural Inf. Process. Syst. (2005) 1417–1424.
[22] Y. Xu, J.Y. Zhu, E.I.C. Chang, M. Lai, Z. Tu, Weakly supervised histopathology cancer image segmentation and classification, Med. Image Anal. 18 (2014) 591–604, https://doi.org/10.1016/j.media.2014.01.010.
[23] Y. Zhao, J. Xiong, Y. Hou, M. Zhu, Y. Lu, Y. Xu, J. Teliewubai, W. Liu, X. Xu, X. Li, Z. Liu, W. Peng, X. Zhao, Y. Zhang, Y. Xu, Early detection of ST-segment elevated myocardial infarction by artificial intelligence with 12-lead electrocardiogram, Int. J. Cardiol. 317 (2020) 223–230, https://doi.org/10.1016/j.ijcard.2020.04.089.
[24] S.M. Mathews, C. Kambhamettu, K.E. Barner, A novel application of deep learning for single-lead ECG classification, Comput. Biol. Med. 99 (2018) 53–62, https://doi.org/10.1016/j.compbiomed.2018.05.013.
[25] S. Hong, Y. Zhou, J. Shang, C. Xiao, J. Sun, Opportunities and challenges of deep learning methods for electrocardiogram data: a systematic review, Comput. Biol. Med. 122 (2020), 103801, https://doi.org/10.1016/j.compbiomed.2020.103801.
[26] Z. Yao, Z. Zhu, Y. Chen, Atrial fibrillation detection by multi-scale convolutional neural networks, 20th Int. Conf. Inf. Fusion, Fusion 2017 - Proc. (2017), https://doi.org/10.23919/ICIF.2017.8009782.
[27] U.R. Acharya, H. Fujita, S.L. Oh, Y. Hagiwara, J.H. Tan, M. Adam, Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals, Inf. Sci. (Ny). 415–416 (2017) 190–198, https://doi.org/10.1016/j.ins.2017.06.027.
[28] R. Schirrmeister, L. Gemein, K. Eggensperger, F. Hutter, T. Ball, Deep learning with convolutional neural networks for decoding and visualization of EEG pathology, 2017 IEEE Signal Process. Med. Biol. Symp. SPMB 2017 - Proc. (2017), https://doi.org/10.1109/SPMB.2017.8257015.
[29] R.T. Schirrmeister, J.T. Springenberg, L.D.J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep learning with convolutional neural networks for EEG decoding and visualization, Hum. Brain Mapp. (2017), https://doi.org/10.1002/hbm.23730.
[30] Y.R. Tabar, U. Halici, A novel deep learning approach for classification of EEG motor imagery signals, J. Neural Eng. (2017), https://doi.org/10.1088/1741-2560/14/1/016003.
[31] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc. (2015).
[32] X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, J. Mach. Learn. Res. 15 (2011) 315–323.
[33] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, 32nd Int. Conf. Mach. Learn. ICML 2015 (2015) 448–456.
[34] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014) 1929–1958.
[35] D.P. Kingma, J.L. Ba, Adam: a method for stochastic optimization, 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc. (2015) 1–15.
[36] E. DeLong, D. DeLong, D. Clarke-Pearson, Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics 44 (1988) 837–845, https://doi.org/10.2307/2531595.
[37] J. Ramon, L. De Raedt, Multi instance neural networks, ICML Work. (2000) 53–60.
[38] O.Z. Kraus, J.L. Ba, B.J. Frey, Classifying and segmenting microscopy images with deep multiple instance learning, Bioinformatics 32 (2016) i52–i59, https://doi.org/10.1093/bioinformatics/btw252.
[39] A.H. Ribeiro, M.H. Ribeiro, G.M.M. Paixão, D.M. Oliveira, P.R. Gomes, J.A. Canazart, M.P.S. Ferreira, C.R. Andersson, P.W. Macfarlane, M. Wagner, T.B. Schön, A.L.P. Ribeiro, Automatic diagnosis of the 12-lead ECG using a deep neural network, Nat. Commun. 11 (2020) 1–9, https://doi.org/10.1038/s41467-020-15432-4.
[40] M.M. Dundar, S. Badve, V.C. Raykar, R.K. Jain, O. Sertel, M.N. Gurcan, A multiple instance learning approach toward optimal classification of pathology slides, Proc. - Int. Conf. Pattern Recognit. (2010) 2732–2735, https://doi.org/10.1109/ICPR.2010.669.
[41] L. Sun, Y. Lu, K. Yang, S. Li, ECG analysis using multiple instance learning for myocardial infarction detection, IEEE Trans. Biomed. Eng. 59 (2012) 3348–3356, https://doi.org/10.1109/TBME.2012.2213597.
[42] D. Wang, A. Khosla, R. Gargeya, H. Irshad, A.H. Beck, Deep Learning for Identifying Metastatic Breast Cancer, 2016, pp. 1–6, http://arxiv.org/abs/1606.05718.
[43] Y. Xu, Y. Li, Z. Shen, Z. Wu, T. Gao, Y. Fan, M. Lai, E.I.C. Chang, Parallel multiple instance learning for extremely large histopathology image analysis, BMC Bioinformatics 18 (2017) 1–15, https://doi.org/10.1186/s12859-017-1768-8.

