1 s2.0 S0010482522008502 Main
ARTICLE INFO

Keywords: Electrocardiogram (ECG); Long short-term memory network (LSTM); Deep learning; Synthetic minority oversampling technique (SMOTE); Ensemble technique; Convolutional neural network (CNN)

ABSTRACT

Cardiovascular disease (CVD) is the most fatal disease in the world, so its accurate and automated detection in the early stages will support medical experts in timely diagnosis and treatment, which can save many lives. Much research has been carried out in this regard, but because of the problem of data imbalance in the medical and health care sector, it may not provide the desired results in all aspects. To overcome this problem, a sequential ensemble technique is proposed that detects 6 types of cardiac arrhythmias on large imbalanced ECG datasets. The data imbalance of the ECG dataset is addressed with a hybrid data resampling technique called “Synthetic Minority Oversampling Technique and Tomek Link (SMOTE + Tomek)”. The sequential ensemble technique employs two distinct deep learning models: a Convolutional Neural Network (CNN) and a hybrid model, CNN with Long Short-Term Memory Network (CNN-LSTM). The two standard datasets “MIT-BIH arrhythmias database” (MITDB) and “PTB diagnostic database” (PTBDB) are combined, and 123,998 ECG beats are extracted for model validation. In this work, the three models (CNN, CNN-LSTM, and the ensemble approach) were tested on four kinds of ECG datasets: the original (imbalanced) data, the data sampled using random oversampling, the data sampled using SMOTE, and the dataset resampled using the SMOTE + Tomek algorithm. The overall highest accuracy of 99.02% was obtained on the SMOTE + Tomek sampled dataset by the ensemble technique, and the minority class accuracy (recall) is improved by 20% compared to the imbalanced data.
1. Introduction

Cardiac disorders, generally known as cardiovascular diseases (CVDs), are the most common and fatal diseases affecting the functionality of the human heart, and they incorporate many different conditions of the heart. CVDs are mainly divided into three core categories: blood vessel problems such as coronary or peripheral arterial disease, heart rhythm irregularity (cardiac arrhythmias), and cardiomyopathy (cardiac muscle disorders). Cardiac arrhythmias have been chosen for this work because they are based on the electrical activity of the heart, which can be analyzed using electrocardiogram (ECG) signals.

The death rate due to CVD is one of the highest in the world [1], and in low and middle-income nations such as India the rate is especially high, exceeding 3/4 of all deaths. Every year, 17.9 million people die due to CVDs, and this may reach up to 23 million by 2030 [2]. Lack of awareness and inadequate medical care lead to an overabundance of CVD in low-middle-income countries, which can be reduced if the disease is detected early and correctly and treated with appropriate medical care. Cardiac arrhythmia, which is a type of CVD, can be detected from the ECG signal, which represents the electrical activity of the heart and is the most widely used means of detecting abnormalities of the heart. Here, arrhythmias refer to irregular, fast, or slow ECG rhythms, which may be severe or non-serious depending on the type of irregularity. Patients suffering from CVDs may lose their lives if their cardiac arrhythmias are detected incorrectly or manually. Automatic, accurate, and early detection of arrhythmias has a very important role in the diagnosis and treatment of CVDs by medical experts and doctors.
* Corresponding author. Department of Electrical Engineering, Indian Institute of Technology(ISM), Dhanbad, India.
E-mail addresses: [email protected] (H.M. Rai), [email protected] (K. Chatterjee), [email protected] (S. Dashkevych).
https://doi.org/10.1016/j.compbiomed.2022.106142
Received 31 March 2022; Received in revised form 4 September 2022; Accepted 18 September 2022
Available online 22 September 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.
H.M. Rai et al. Computers in Biology and Medicine 150 (2022) 106142
The major objective of this work is to provide assistance to health care professionals by detecting arrhythmias in an automated, accurate, and time-efficient manner. Our aim is also to design a neural-based classifier that can automatically categorize the abnormalities present in the heart by analyzing the ECG waveform. Furthermore, training the classifiers on large ECG datasets allows them to learn the morphology of the ECG signals and accurately predict arrhythmias.

Many researchers have made significant contributions to the field of arrhythmia detection utilizing ECG patterns over the course of the last few decades. Automatic detection systems for the identification of arrhythmias have been widely researched, with machine learning (ML) being used to assess and diagnose cardiac irregularity as a result. To do so, a range of publicly accessible datasets is employed to evaluate the efficacy of the machine learning approaches used for arrhythmia detection. The most widely used ECG databases are MITDB, PTBDB, AFDB (“MIT-BIH AF Database”), NSRDB (“MIT-BIH NSR Database”), INCART (“St.-Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database”), CUDB (“Creighton University Ventricular Tachyarrhythmia Database”), SVDB (“MIT-BIH Supraventricular Arrhythmia Database”), PAFDB (“PAF prediction challenge database”), EDB (“European ST-T Database”), LTSTDB (“Long-Term ST database”), and ESC (“European Society of Cardiology ST-T Database”).

In order to automate the categorization of arrhythmias, several research efforts have been devoted to techniques such as neural networks (NN) [3–5], SVM (support vector machine) [3,6–9], decision tree [10–13], RBF (radial basis function) [14–17], KNN (K-Nearest Neighbors) [18–22], PNN (probabilistic neural network) [3,23], ensemble classifiers [24,25], and hybrid classifiers [13,26–30] for classifying arrhythmias into different classes.

The accuracy of machine learning-based classifiers depends on preprocessing, such as de-noising and smoothing of the electrocardiogram data, and on the extraction of features. Because the features extracted from the dataset are perhaps the most crucial aspect of classical machine learning classifiers, studies have tried a great variety of features to improve the effectiveness of their classification techniques. The most widely incorporated features are WT (wavelet transform) based features [4,5,13,15,30,31], PCA (principal component analysis) based features [6,8,9,15], morphology-based features [10,31,32], and statistical features [6,14,31].

The classic machine learning-based prediction approach is heavily reliant on the sorts of features that are provided to the classifier to make predictions. The performance of a classifier is therefore largely determined by the features; the same classifier with the same database and the same parameters may achieve different accuracy for different kinds of features. The work of feature extraction is entirely dependent on the application; there are no predefined criteria or parameters to be used in the extraction of the features. For example, numerous researchers have employed a range of variables to predict arrhythmias using an ML-based classifier in order to predict cardiovascular disease. Furthermore, it takes a significant amount of time, resources, and effort to extract the features from the preprocessed datasets before giving them to the classifier. Additionally, an erroneous, incorrect, or unrelated feature may have a negative impact on the performance of the classifier. A huge dataset will necessitate more time and effort, and certain technologies that operate fine on smaller datasets may not be suitable for large datasets.

Deep learning models, which include deep neural networks (DNN), have been proposed to address the issues (especially feature extraction) that arise when using the conventional ML-based approach for detection tasks. A DNN model autonomously extracts features without the need for human interaction or the use of any other tools or methods. Aside from that, it works well on big datasets and with extended learning, decreases the amount of effort and time required for feature extraction, and produces excellent performance measure values. The utilization of deep learning approaches has grown in popularity and prominence in medical healthcare, notably in biological signal processing and biomedical imaging, during the last several years.

It has only been a few years since Deep Neural Network (DNN) models such as CNN, Recurrent Neural Network (RNN), LSTM, Bi-LSTM, Transformer models, Autoencoders, and Generative Adversarial Networks (GAN) began to be used to automate the feature extraction, segmentation, and classification of ECG cardiovascular problems.

Kiranyaz et al. [33] suggested an accurate and quick ECG classification method using a 1-D (one-dimensional) CNN, which was patient-specific and can fuse feature extraction and classification into an integrated learner. The 1D CNN model achieved better performance on the classification of two types of beats, supraventricular ectopic beats (SVEB) and ventricular ectopic beats (VEB). Rahhal et al. [34] proposed a DL approach for the dynamic classification of ECG arrhythmias. The suggested model is constructed using a deep CNN model, with the exception of the final layer (“Representation Layer”), which has been adjusted to be a softmax layer. The preprocessed and feature-extracted data, together with the proposed model, result in high accuracy in arrhythmia classification. Acharya et al. [35] designed a CNN model for the automatic recognition of coronary artery disease (CAD), with 4 convolutional layers, 4 maximum pooling layers, and 3 fully connected layers, using ECG signal segments of 2 s and 5 s. The designed deep CNN model categorized the ECG beats into normal and abnormal classes with 94.95% accuracy, 93.72% sensitivity, and 95.18% specificity for Net-1 (2 s) and 95.11% accuracy, 91.13% sensitivity, and 95.88% specificity for Net-2 (5 s). Oh et al. [36] proposed an automatic system combining the long short-term memory network (LSTM) and convolutional neural network (CNN) for the classification of five arrhythmia types: NSR, LBBB, RBBB, APC, and PVC. Their contribution was to use variable-length ECG segments from the MIT-BIH database. The CNN-LSTM system provided 98.10% accuracy, 97.50% sensitivity, and 98.70% specificity with a 10-fold cross-validation scheme. Acharya et al. [37] provided a new tool for automatic differentiation of ventricular arrhythmias into two classes, shockable and non-shockable, from 2 s ECG signal segments. An 11-layer CNN was used to process the segmented ECG signals; the proposed model was validated using 10-fold cross-validation and obtained maximum specificity, sensitivity, and accuracy of 91.04%, 95.32%, and 93.18%, respectively. Hasan and Bhattacharjee [38] presented a technique to identify multiple heart diseases using a 1D-CNN, where a modified ECG waveform is applied as the input to the network. To obtain the modified ECG signal, the decomposition of each ECG signal through empirical mode decomposition (EMD) is combined with a higher-order intrinsic mode function (IMF). The algorithm was validated on three databases, PTB, Saint-Petersburg, and MIT-BIH, and the accuracy obtained for these databases was 98.24%, 99.71%, and 97.70%, respectively. Andersen et al. [39] developed an end-to-end model combining CNN and RNN to extract higher-order features from ECG RR-interval (RRI) segments for the classification of NSR (normal sinus rhythm) and AF (atrial fibrillation). The training and validation of the CNN-RNN model were performed on 3 different databases with a total of 89 subjects. On 5-fold cross-validation, it provided 96.95% specificity and 98.98% sensitivity. The CNN-RNN algorithm was also validated on an unseen test dataset, where it provided 86.04% sensitivity and 98.96% specificity. Shaker et al. [40] proposed a new data augmentation method using generative adversarial networks (GAN) to address the data imbalance problem in the classification of ECG signals. The experimental results showed that the data produced by the GAN improve the efficiency and performance of the classifier compared to the original imbalanced dataset. Their proposed model using GAN and CNN attained a precision of more than 90%, accuracy of more than 98%, sensitivity of more than 97.7%, and specificity of more than 97.4%. Atal and Singh [41] proposed the automated method for
arrhythmias classification by applying an optimization-based deep convolutional neural network (DCNN). The Multi-purpose Bat algorithm (MOBA) was combined with the Rider Optimization Algorithm to develop a new algorithm called the Bat-rider optimization algorithm (BaROA). The algorithms were validated on the MITDB, and the classifier performance was analyzed based on the evaluation metrics sensitivity, specificity, and accuracy, which were 93.98%, 95%, and 93.19%, respectively.

Along with ECG signal processing, deep learning, machine learning, and conventional neural network models are also used for the segmentation and detection of critical segments in image datasets. The authors of [42–46] applied multiple image segmentation approaches to various types of biomedical images, including chest radiography images, endoscopic images, lung cancer images, color fundus images, and COVID-19 X-ray images, respectively. The authors in Ref. [47] employed an LSTM model for a musculoskeletal rehabilitation evaluation system on EMG signals, while the authors of [48] used a graph-based ELM (Extreme Learning Machine) for the identification of epileptic seizures from unbalanced EEG signals.

The literature review on the detection or prediction of cardiac arrhythmia using deep learning models shows that most researchers have used the MITDB database, as in ML-based work, but they have also combined several databases to make the dataset larger. Researchers have applied a variety of deep learning approaches and obtained good performance values when classifying cardiac arrhythmias from ECG signals, so finding their limitations is a very difficult task. Despite these extensive efforts, the investigation conducted indicates that there is still room for progress: accuracy and other performance parameters can be improved further. As the amount of data increases, the performance of the classifiers generally decreases, which needs to be addressed. The latest hybrid and fast DNN classifiers can be employed to improve accuracy and simulation time. Data imbalance is also a major issue that needs to be addressed.

Therefore, to address the issues and research gaps identified during the literature survey, a CNN-based deep learning model with an ensemble technique has been proposed. This work is an extension of our previous work [49], which has been modified in a few key ways, described in the following paragraphs. The main contributions of this work are:

a. A novel hybrid SMOTE + Tomek dataset balancing algorithm to resolve the data imbalance problem for detecting 6 types of arrhythmias from ECG signals.
b. To the authors’ knowledge, the approach described here is the first of its kind to combine the SMOTE + Tomek oversampling and undersampling techniques in order to balance the ECG dataset for arrhythmia classification.
c. Two different CNN-based DL classifiers, the CNN model and the hybrid CNN-LSTM model, are designed and accompanied by a sequential ensemble model for the prediction of arrhythmia on a large ECG dataset.
d. Four independent experiments have been performed to validate model performance on (i) the imbalanced training dataset (99,198 beats), (ii) the random oversampled training dataset (454,248 beats), (iii) the SMOTE oversampled training dataset (454,248 beats), and (iv) the SMOTE + Tomek link sampled training dataset (452,552 beats).
e. Verification of the proposed model on two combined benchmark imbalanced ECG datasets, MITDB and PTBDB, comprising 123,998 ECG beats.
f. More than 10% increase in the accuracy of minority class data compared to the imbalanced ECG data.

The work is organized as follows: Section 2 explains the materials and methods used in this manuscript, which include ECG datasets, data balancing approaches, and deep learning models, among others. Section 3 provides a comprehensive explanation of the proposed method that has been used in this study. Section 4 provides a comprehensive review of the results acquired as well as a discussion of them. Section 5 presents a summary of the paper’s results and finishes with suggestions for future study possibilities.

2. Materials & methods

2.1. ECG database

This work merges two highly authenticated and widely used databases (MITDB and PTBDB) for the investigation of cardiac arrhythmias, both of which are available online in open access. The MITDB contains electrocardiogram recordings from 47 patients totaling 48 records, each recording lasting half an hour. This ECG database may be acquired from the PhysioNet website [50]; it is an open-access, publicly accessible database that can be downloaded for free. The ECG recordings were annotated by 2 expert cardiologists after being digitized at a 360 Hz sampling rate and are available as two-channel recordings. Approximately 110,000 annotated beats with different arrhythmia types are available in this database [51]. As shown in Table 1, the description utilized for this dataset is divided into five types of beats, which are defined by the AAMI (“Association for the Advancement of Medical Instrumentation”) standard. The MITDB has 5 types of ECG beats as per the AAMI standard, categorized as “N”, “S”, “V”, “F”, and “Q”.

Table 1
Beat annotations of MITDB according to the AAMI standard.

| Annotation | Types of ECG beats | No. of beats | Dataset |
|---|---|---|---|
| N | Normal; NE (“Nodal escape”); RBB (“Right bundle branch block”) and LBB (“Left bundle branch block”) beats; AE (“Atrial escape”) | 94,635 | MITDB + PTBDB |
| S | NP (“Nodal premature”); AAP (“Aberrant atrial premature”); SVP (“Supra-ventricular premature”); AP (“Atrial premature”) | 2779 | MITDB |
| V | VE (“Ventricular escape”); PVC (“Premature ventricular contraction”) | 7236 | MITDB |
| F | Fusion of ventricular and normal (FVN) | 803 | MITDB |
| Q | Paced; Unclassifiable; FPN (“Fusion of paced and normal”) | 8039 | MITDB |
| M | Myocardial Infarction (MI) | 10,506 | PTBDB |

The PTBDB is also an open-access database, well known for its myocardial infarction (MI) ECG beats. It contains a total of 549 electrocardiogram records from 290 people of different age groups, all of which are publicly accessible for download from the PhysioNet website [52]. A distinctive feature of this database is that it has 368 recordings of 148 persons who have had a myocardial infarction (MI) and 52 healthy control individuals; in addition, there are records with 7 other forms of arrhythmias. The PTBDB ECG signals were digitized at a sampling frequency of 1000 Hz; lead II recordings from the 12 leads were selected for this study, and two kinds of beats, MI and normal (healthy controls), were used for analysis. Both datasets (MITDB and PTBDB) were downloaded after segmentation from the Kaggle repository, where they were uploaded and contributed by Refs. [53,54]. In this Kaggle repository, MITDB has 109,446 ECG beats of 5 classes as per the AAMI standard, and PTBDB has 14,552 ECG signals of two categories, the MI class and the healthy control (normal) class.

The 109,446 ECG beat samples from MITDB are distributed over the 5 types of arrhythmia as follows: 90,589 beats of “N” type, 2779 beats of “S” type, 7236 beats of “V” type, 803 beats of “F” type, and 8039 beats of “Q” type. According to the AAMI standard, “N” represents non-ectopic beats, “V” represents ventricular ectopic beats, “S” represents supraventricular ectopic beats, and “Q” represents unknown or
unclassified beats, and “F” represents fusion beats. Similarly, the PTBDB used for this work has a total of 14,552 beats, which are classified into two classes: abnormal (MI) with 10,506 beats and normal with 4046 beats. Finally, both datasets are concatenated, resulting in a total of 123,998 ECG signals, of which 94,635 are normal beats.

The combined dataset includes 123,998 ECG beats in total, each represented by 188 data points. Because the raw beats differ in length, they cannot be fed directly to the proposed models. Our proposed model does not focus on preprocessing techniques; the complete beat is used as the input to the classifier, so every beat must be brought to the same size. Zero padding is therefore used to give each beat an equal size of 188 and a time interval of 187 (N-1). The classifier requires the same input size for every data sample, and zero padding achieves this by simply appending zeros from the end of the actual beat up to 188 data points. To apply zero padding, we first identified the beat with the largest sample size and then made all of the ECG beats the same size as this longest one (188 samples). Fig. 1 shows the visualization of each ECG signal beat based on its arrhythmia type; in this work, 6 arrhythmia types will be predicted.

2.1.1. Data balancing

In machine learning-based classifiers, the number of training samples should be approximately equal for each class to make the best use of the data. Small differences between the class sizes have little influence on classifier performance, but if the gap is considerable, even the best classifier may perform poorly on the under-represented class. A dataset with a large difference between its class sizes is said to be imbalanced, and such datasets may not be well suited to proper training of the classifier model. In an imbalanced dataset, the class with the largest sample size is called the “majority class” and the class with the smallest sample size is called the “minority class”. The majority class has more influence during classifier training and dominates the minority class [55,56]. Overfitting and underfitting issues for the majority and minority data classes, respectively, may occur because of this data imbalance. Owing to its large amount of sample data, the majority class is trained more extensively, while the smaller quantity of data may result in inadequate training of the minority class; the classifier therefore favors the majority class, and more majority-class samples are predicted correctly than minority-class samples. One of the issues with an unbalanced dataset is that this effect may not be visible in the overall accuracy of the classifier prediction result; as a consequence, classifier performance must be evaluated using a range of performance evaluation metrics to ensure that it is accurate.

The dataset used in this study, which is an integration of the MITDB and PTBDB, is highly imbalanced, with the majority class (Normal class) containing 94,635 ECG beats and the smallest minority class (Fusion class) containing only 803 ECG beats. There are 6 classes in this dataset, with the Normal class (94,635 ECG beats) in the majority and all other classes (M class with 10,506 beats, Q class with 8039 beats, V class with 7236 beats, S class with 2779 beats, and F class with 803 beats) in the minority.

2.1.2. Random Oversampling

Random Oversampling (RO) is one of the most widely used methods for handling the data imbalance problem; the main reasons for selecting it are that it is very fast, simple to implement, and an excellent starting point. Resampling balances the data either by adding samples to the minority classes or by removing samples from the majority classes. Depending on whether samples are removed from the majority classes or added to the minority classes, resampling is divided into undersampling and oversampling methods. Undersampling reduces the number of samples in the majority classes; this technique is more suitable for huge datasets, for example, sentiment detection using Facebook data. In oversampling, samples are added to increase the size of the minority class; this is more suitable for most cases where datasets are not sufficiently large [55].

2.1.3. SMOTE and Tomek links sampling

Data imbalance has been addressed in this paper with the hybrid SMOTE + Tomek data balancing (resampling) strategy, which is presented in this subsection. Hybrid data resampling methods integrate data oversampling (which increases the dataset) and data undersampling (which decreases the dataset) to produce a single dataset. In the SMOTE approach, the minority classes are enlarged by generating artificial data samples (oversampling) until they match the size of the majority class. Because this approach creates synthetic samples rather than just duplicating the original ones, the created data are comparable to the original dataset but not identical to it. Using the K-Nearest Neighbor (KNN) technique, SMOTE oversamples the minority dataset by selecting the nearest neighbors from the minority data samples and then generating additional samples based on
Fig. 1. Visualization of each ECG signal beat based on its arrhythmias type.
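The SMOTE + Tomek idea from Section 2.1.3 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes the standard definitions (SMOTE interpolates between a minority sample and one of its k nearest minority neighbours; a Tomek link is a pair of mutual nearest neighbours with different labels), handles only a two-class toy problem, and drops both members of each link, which is one common choice among several. The function names `k_nearest`, `smote`, `tomek_links`, and `smote_tomek` and the toy data are hypothetical.

```python
import random
from math import dist


def k_nearest(point, pool, k):
    """Return the k members of `pool` closest to `point`, skipping `point` itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: dist(point, p))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples on line segments between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def tomek_links(samples, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different labels."""
    def nearest(i):
        return min((j for j in range(len(samples)) if j != i),
                   key=lambda j: dist(samples[i], samples[j]))
    nn = [nearest(i) for i in range(len(samples))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def smote_tomek(samples, labels, minority_label, k=3, seed=0):
    """Oversample the minority class to the majority size, then remove Tomek links.
    Assumption: two classes only, and both members of each link are dropped."""
    minority = [s for s, l in zip(samples, labels) if l == minority_label]
    n_major = len(samples) - len(minority)
    new = smote(minority, max(0, n_major - len(minority)), k, seed)
    samples = list(samples) + new
    labels = list(labels) + [minority_label] * len(new)
    drop = {idx for pair in tomek_links(samples, labels) for idx in pair}
    keep = [i for i in range(len(samples)) if i not in drop]
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy two-class data: six majority points near the origin, three minority points near (1, 1).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.2, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
y = [0] * 6 + [1] * 3
X_bal, y_bal = smote_tomek(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

On well-separated toy clusters like these no Tomek links exist, so the cleaning step removes nothing; on overlapping classes it trims samples near the class boundary, which is consistent with the SMOTE + Tomek training set (452,552 beats) being slightly smaller than the SMOTE one (454,248 beats).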
The CNN model is the most widely used and one of the most established types of deep learning models. It began with the 5-layer LeNet model [59], while EfficientNet-B7 [60,61] has 813 layers. The CNN is a DL-based model that was primarily intended and developed for image processing tasks such as object recognition and segmentation. It has since become more general and is widely used in a variety of applications, including signal processing, image processing, object detection, image segmentation, speech processing, and many others. A CNN has built-in filters that are capable of automatically extracting features from the input data, which is what allows it to be used across these applications. In this study, we used a modified deep CNN model for the ECG arrhythmia prediction task, since a CNN has a straightforward architecture that can be quickly deployed and adjusted depending on the task at hand.
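To make the idea of built-in filters concrete, the following minimal pure-Python sketch applies a single 1-D filter, a ReLU, and max pooling to a toy signal. It is an illustration only, not the model used in this work: in a trained CNN the filter weights are learned, whereas the `kernel` here is a hand-picked, hypothetical peak detector, and the `beat` values are invented.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) sliding dot product of `kernel` over `signal`."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(0, len(signal) - n + 1, stride)]


def relu(values):
    """Keep positive responses, set negative ones to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest value in each non-overlapping window of `size`."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


beat = [0.0, 0.1, 0.0, 1.0, 0.2, 0.0, 0.1, 0.0]  # toy signal with one sharp peak
kernel = [-1.0, 2.0, -1.0]                        # hypothetical filter that responds to peaks

feature_map = conv1d(beat, kernel)   # 8 samples, kernel of 3 -> 6 responses
features = max_pool1d(relu(feature_map))
print(features)
```

The largest pooled response sits at the position of the sharp peak, which is the sense in which a convolution filter acts as a feature extractor; stacking many learned filters replaces the hand-crafted features of classical ML pipelines.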
Fig. 2. The example of class visualization of (a) original dataset (b) SMOTE sampled dataset (c) SMOTE + Tomek sampled dataset.

The CNN model structure is constructed from certain well-known layers and layouts, which may be readily tweaked or chosen depending on the task. The conventional CNN architecture consists of several operations: convolution, batch normalization, non-linearization (ReLu), pooling (maximum or average), and dropout; it also includes one fully-connected (FC) layer in the complete structure [62]. The basic and initial operation of a CNN model is convolution, which is performed by a convolution layer (Conv) whose form depends on the data type: Conv1D for one-dimensional data, Conv2D for two-dimensional data, and Conv3D for three-dimensional data in a Python environment. Because it includes an inbuilt filter, this layer is primarily responsible for convolution operations and feature extraction. Depending on the requirements, the feature extraction may be varied by adjusting the filter size and stride of the convolution layer. Because real-world applications are generally non-linear, the ReLu layer performs non-linear operations that make use of the activation function. This activation
function brings non-linearity to the CNN model. The pooling layer aims to lower the number of features, that is, it reduces the dimensionality of the features retrieved by the Conv layer. The pooling layer also reduces the number of parameters to learn, which enhances processing speed. There are two types of pooling layers: maximum pooling layers (MaxPool), which choose the maximum values from the feature map, and average pooling layers (AvgPool), which take the average of the features [63]. The batch normalization (BN) layer is an optional layer that users may or may not utilize depending on the workload. To accelerate the learning process of the network, input characteristics are standardized; this normalization may also be applied to the network’s hidden layers. During the training phase, the BN layer normalizes the output of the convolution layer by computing the mean and variance of the current layer. The CNN model’s last layer is the fully connected (FC) layer, in which each neuron from the previous layer is connected with each neuron of the following layer. The FC layer’s primary function is to categorize the provided input data into distinct output classes by using high-level features from the convolution and pooling layers.

2.3. LSTM model

Considering that ECG data is predominantly a time series that depends on previous information, the LSTM network was employed because it is the most appropriate deep learning model, developed specifically for time series and sequential data. Additionally, when an ML or DL-based model is used for time-series data, the model suffers from Short-Term Memory (STM), and the LSTM model gives a solution to this problem as well. Short-term memory suffers information loss, or is unable to carry information from an earlier timestamp to a later one, if the sequence of data is long. The main reason behind this is the “vanishing gradient problem”, as gradients play an important role in the weight update of the network. If the gradient becomes extremely small, the weights do not update much or are unable to contribute sufficiently to the learning. If the gradient update becomes too small in the earlier layers of STM, these layers stop learning, so the network may forget what it saw previously in the long data sequence; this is known as short-term memory.

The LSTM model’s structure makes use of a number of gates in its design, each of which is responsible for the flow of information in the LSTM. These gates have an important structure that allows them to keep significant information and dispose of unnecessary information. For precise and efficient forecasts, this operation procedure of gates brings forward the

“xt” represents the current input state, tan the tanh activation function, w the corresponding weight vector, xt the input, b the bias, (X) pointwise multiplication, and (+) pointwise summation. The structure of a single-stage LSTM model with all states, gates and activation functions is illustrated in Fig. 3 [65].

2.4. Activation functions

Activation functions introduce nonlinearity into the network; non-linearity makes the network powerful enough to learn complex data, and most real-world problems are non-linear in nature. Three types of activation functions are used in this work across the CNN and LSTM architectures: “tanh”, “sigmoid”, and “Relu”, where tanh and sigmoid are used in the LSTM network and Relu is used in the CNN network. The sigmoid activation function is also known as the logistic activation function, and its output ranges between 0 and 1 [66]. The “tanh” function is a rescaled version of the sigmoid activation function whose output varies between −1 and 1. The Relu activation is one of the simplest activation functions and provides only non-negative output. Equations (7)–(9) represent the sigmoid, tanh, and Relu activation functions, respectively. The pictorial view of the sigmoid and tanh activation functions is shown in Fig. 4.

$$\mathrm{Sigmoid}(y) = \frac{1}{1 + e^{-y}} \tag{7}$$

$$\tanh(y) = \frac{2}{1 + e^{-2y}} - 1 = 2\,\mathrm{Sigmoid}(2y) - 1 \tag{8}$$

$$\mathrm{Relu}(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{9}$$

3. Proposed methodology

Fig. 5 depicts the methods used to detect cardiac dysfunction in strongly imbalanced data using the deep CNN, the hybrid CNN-LSTM, and the ensemble method. In this work, 6 types of ECG arrhythmias are predicted, comprising the 5 AAMI-recommended arrhythmia classes and one MI arrhythmia. Individual arrhythmia types include multiple beat types, and a large number of samples have been utilized for intensive training of the model.

This section provides a high-level summary of the approach and the flow diagram used for the prediction of abnormality from a large and imbalanced ECG dataset. The raw input is comprised of segmented ECG rhythm data, while the objective outcome is composed of six
crucial information included in the lengthier data sequence that has different kinds of anticipated arrhythmias. The primary stage of the
been collected. The fundamental ideas of LSTM functioning are found in proposed methodology incorporates the preparation of the segmented
gates and cell states, as well as concise summaries of gate and cell states, Electrocardiogram data. After that in the 2nd stage, the CNN model is
respectively [64]. The single-stage LSTM model mainly consists of two used for automated feature extraction as well as for training the data.
states, Hidden state (ht) and Cell state (Ct) along with three types of
gates, input Gate (it), forget Gate(ft), and Output gate (Ot) along with
two states, and The mathematical representation of all gates and states
are given in equations (1)–(6).
it = σ (wi [ht− 1 , xt ] + bi ) (1)
( )
ft = σ wf [ht− 1 , xt ] + bf (2)
̂
Ct = tanh(wc [ht− 1 , xt ] + bc ) (4)
Ct = ft *Ct− 1 + it *̂
Ct (5)
ht = Ot *tanhCt (6)
where, ̂
Ct - candidate, σ - Sigmoid activation function, h-hidden state, Fig. 3. The single-stage architecture of the LSTM model.
H.M. Rai et al. Computers in Biology and Medicine 150 (2022) 106142
Fig. 4. Visual representation of the sigmoid (red), tanh (green), and ReLU (blue) activation functions.
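The gate equations (1)–(6) and the activation functions (7)–(9) can be sketched in a few lines of plain Python. This is only an illustrative scalar version, not the authors' implementation: the function names, the dictionary layout of the weights, and the toy weight values are assumptions made for readability, and a real LSTM layer operates on vectors and matrices.

```python
import math

def sigmoid(y):
    """Eq. (7): logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def tanh(y):
    """Eq. (8) written via the sigmoid: tanh(y) = 2*sigmoid(2y) - 1."""
    return 2.0 * sigmoid(2.0 * y) - 1.0

def relu(x):
    """Eq. (9): x for x >= 0, otherwise 0."""
    return x if x >= 0 else 0.0

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One scalar LSTM step following Eqs. (1)-(6).

    w[g] = (weight on h_prev, weight on x_t) and b[g] is the bias of
    gate g in {"i", "f", "o", "c"}; scalars keep the sketch readable.
    """
    def affine(g):
        return w[g][0] * h_prev + w[g][1] * x_t + b[g]

    i_t = sigmoid(affine("i"))         # input gate, Eq. (1)
    f_t = sigmoid(affine("f"))         # forget gate, Eq. (2)
    o_t = sigmoid(affine("o"))         # output gate, Eq. (3)
    c_hat = tanh(affine("c"))          # candidate cell state, Eq. (4)
    c_t = f_t * c_prev + i_t * c_hat   # new cell state, Eq. (5)
    h_t = o_t * tanh(c_t)              # new hidden state, Eq. (6)
    return h_t, c_t

# Hypothetical toy weights and a short input sequence.
weights = {g: (0.5, 1.0) for g in "ifoc"}
biases = {g: 0.0 for g in "ifoc"}
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2]:
    h, c = lstm_step(x, h, c, weights, biases)
print(h, c)
```

Because the hidden state is an output gate in (0, 1) multiplied by a tanh value, it always stays strictly between −1 and 1, whatever the input sequence.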
The hybrid CNN-LSTM model is employed in the third stage, and in the final stage the CNN-LSTM model passes its output to the FC layer, which classifies arrhythmias into six classes based on their features.
The data augmentation (DA) approach has been used in many studies to generate additional artificial samples, although it is not recommended for time series or signals. In this work, oversampling has been applied instead of DA because applying DA to ECG rhythms might turn a beat into another type of arrhythmia by changing its morphology. Because LSTM is well suited to time-series data, the integration of CNN and LSTM is used to predict arrhythmias from ECG rhythms. To further improve the overall result, the sequential ensemble technique has been applied by averaging the best-weighted outcomes of the CNN and CNN-LSTM models. The detailed structure of the designed and proposed models, along with the proposed data balancing technique, is also explained in this section.
3.1. Data preparation
Because the developed models use a deep CNN architecture, which recovers very fine details from the input signal, de-noising and smoothing of the data are not necessary for this work; instead, data preparation has been carried out. Because the proposed model uses the LSTM network, it is important to prepare the ECG time-series data in such a manner that the time interval of the input dataset is provided to the model. Most researchers have analyzed the ECG signal based on its features, such as statistical or morphological features, or based on
its frequency, amplitude, or a combination of many features. In contrast, we have not extracted any features from the dataset: the raw data, organized according to the time period, is provided directly to the proposed model for the prediction of arrhythmias. Because the ECG signals used in this work are time-series data that vary from one timestamp to the next, the data is prepared by determining the time period in which the signal appears. The time interval of the ECG waveform is computed by subtracting the present time-stamped signal ($ECG_t$) from the upcoming time-stamped ECG signal ($ECG_{t+1}$). The time interval ($\Delta ECG$) is given by equation (10):
$$\Delta ECG = ECG_{t+1} - ECG_t \tag{10}$$
$$\text{gradient} = \frac{ECG_{t+1} - ECG_t}{\text{Unit time}} = \frac{\Delta ECG}{t} = \frac{d(ECG)}{dt} \tag{11}$$
Hence the gradient of the time-series ECG data is computed as the ratio of the time interval value to the unit time of the data sample, as expressed in equation (11). The loss and accuracy curves are plotted using the gradient and the time interval data.
3.2. Distribution of dataset
The dataset must also be divided for the purposes of training and verifying the performance of the classifier employed in the classification task. A small training dataset leads to inefficient training of the model, which may result in poorer validation outcomes than a larger training dataset would give. Therefore, when distributing the dataset, it is desirable to allocate enough data from each class, in the same ratio, to train the model appropriately. A total of 123,998 ECG beats are used to predict six kinds of arrhythmias. The complete dataset is separated into two parts: the training dataset and the test dataset. The training dataset is used to train the DL models, while the test dataset is used to evaluate the classifier's performance. After experimenting with different ratios, we found that the 80:20 ratio produced the best results, in which 80% of the dataset is used for training the models and the remaining 20% for assessing their performance. The proposed deep CNN and hybrid CNN-LSTM models are trained using 80% of the total dataset (80% of 123,998 = 99,198), while the remaining 20% (20% of 123,998 = 24,800) is used for testing. The test dataset is kept reserved and is not exposed during the training process. Additionally, to evaluate the training performance of the proposed models, the training dataset is further partitioned into a training dataset and a validation dataset with an 80:20 split. In the end, 80% of the training dataset (80% of 99,198 = 79,358) is used to train the models, with the remaining 20% (20% of 99,198 = 19,840) used to validate the training performance.
3.3. Data balancing (sampling)
For an unbalanced dataset, balancing strategies generate an equal or adequate number of data samples for each class in order to train the classifier effectively. The dataset used in this work, which combines MITDB and PTBDB, contains six types of ECG beats and is highly imbalanced. Because data are either synthetically generated or simply duplicated or resampled to make all classes equal, data balancing is also referred to as data sampling in some contexts. In this work, the SMOTE + Tomek algorithm has been used to balance the ECG datasets, and the results have been compared with those of two other data balancing methods, RO and SMOTE. The datasets were distributed in an 80:20 ratio as training and testing datasets, as described in section 3.2. The overall number of training beats is 99,198, accounting for 80% of the complete database, and the numbers of training beats in classes N, S, V, F, Q, and M are 75,708, 2,223, 5,789, 642, 6,431, and 8,405, respectively. The original dataset is therefore significantly unbalanced, with the classes contributing to the total training beats as follows: “N” class 76%, “S” class 2.2%, “V” class 5.8%, “F” class 0.6%, “Q” class 6.5%, and “M” class 8.5%. All three data balancing techniques have been applied only to the training dataset; the test dataset is exposed neither to the training process nor to the data balancing process, because our objective is to preprocess the dataset so that the proposed model learns well and therefore gives the best results even on the unbalanced test dataset.
In the 1st stage, we experimented with the random oversampling (RO) technique, in which additional data samples are added to the minority classes to make all classes equal. Oversampling is applied to the training data by adding beats to all the minority classes, “S”, “V”, “F”, “Q”, and “M”, because they have fewer samples. After applying the RO technique, the total number of beats increases to 454,248, with each class contributing equally, 16.7% (75,708) of the total training beats. These 454,248 ECG beats obtained after RO are used for training the deep learning models, with 75,708 beats from each class.
In the next experiment, the SMOTE oversampling technique is applied to the training samples; this method produces synthetic samples of the minority classes instead of copying existing samples as the RO method does. The full oversampling procedure has five steps, each of which creates synthetic samples of one minority class until it matches the majority class. In the 1st step, SMOTE is applied between the “F” class (the smallest of all) and the majority class “N”, and both become equal after oversampling (N = F = 75,708). In the 2nd step, SMOTE is applied between the next minority class, “S”, and the majority class “N”, and this process is repeated until all minority classes equal the majority class. With this method too, the number of samples after SMOTE oversampling reaches 454,248, with each class having an equal number of samples (75,708), i.e. 16.7% of the total ECG beats. However, simply equalizing the class sizes does not solve the data imbalance issue properly; the classes also need to be separated from one another.
The method we propose to solve the data imbalance problem is SMOTE + Tomek link, in which SMOTE generates synthetic data for the minority class while the Tomek link removes overlapping majority-class samples along the class boundary. It therefore not only balances the data samples between the minority and majority classes but also clearly separates the data samples of different classes. SMOTE oversampling and Tomek link undersampling were initially developed for binary-class data, but in this work we have applied them to a multi-class dataset (6 classes). We have also used both techniques jointly as a hybrid method, which solves the data imbalance problem and at the same time creates an explicit separation between the data of each class, helping the model to learn well. The SMOTE + Tomek hybrid resampling technique has also been implemented in five steps, which are presented in detail in Table 2.
In the 1st step, Input Class = (75708, 2223, 5789, 642, 6431, 8405) is the number of ECG beats in the imbalanced data, where Class_i represents the types of ECG beats in the original dataset (n, s, v, f, q, m). First we find the majority class (Mj = n = 75,708) and the minority class (Mn = f = 642) and then apply the SMOTE + Tomek technique to generate synthetic minority samples and to remove the overlapping majority samples. After the 1st step, Class_i is updated to (75305, 2223, 5789, 75708, 6431, 8405), where the majority class “n” is reduced from 75,708 to 75,305 and the minority class is oversampled from 642 to 75,708. This process is repeated until the 5th step, and the final output class samples are updated to (75305, 75388, 75441, 75412, 75298,
Table 2
Distribution of ECG beats in train and test dataset before and after sampling.
Arrhythmias Total ECG Beats Training Beats RO SMOTE SMOTE + Tomek Test Beats
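The class shares before and after sampling can be recomputed from the training counts quoted in Section 3.3. The short check below is only a sketch: the class counts are copied from the text, while the dictionary keys and the helper name `shares` are illustrative assumptions.

```python
classes = ["N", "S", "V", "F", "Q", "M"]

# Training beats per class, as reported in Section 3.3.
counts = {
    "original":      [75708, 2223, 5789, 642, 6431, 8405],
    "RO / SMOTE":    [75708] * 6,
    "SMOTE + Tomek": [75305, 75388, 75441, 75412, 75298, 75708],
}

def shares(values):
    """Return the total and each class's percentage of that total."""
    total = sum(values)
    return total, [round(100 * v / total, 2) for v in values]

for name, values in counts.items():
    total, pct = shares(values)
    print(f"{name:14s} total={total:6d}", dict(zip(classes, pct)))
```

Running it reproduces the three training-set sizes used in the experiments and shows that the difference between the SMOTE and SMOTE + Tomek totals equals the number of beats removed by the Tomek link step.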
75708) after SMOTE + Tomek is applied. The total number of training beats becomes 452,552 after balancing with the SMOTE + Tomek algorithm, with the classes (n, s, v, f, q, m) contributing 16.64%, 16.66%, 16.67%, 16.66%, 16.64%, and 16.73% of the total training samples, respectively. The Tomek link is able to remove only 1,696 beats from the majority class because of the huge amount of data; hence a total of 452,552 beats are used to train the proposed deep learning models.
The percentage contribution of the arrhythmia classes to the training data, before sampling and after applying RO, SMOTE, and SMOTE + Tomek, is illustrated in Fig. 6. The types of arrhythmia classes, the total number of ECG beats, the class-wise training beats, the number of beats after sampling with each method, and the test data samples are listed in Table 2.
3.4. Proposed deep CNN model
In this manuscript, two CNN-based deep neural network models are designed using rectangular filter sizes, and a sequential ensemble technique integrates both models. The developed deep CNN model is detailed layer by layer in this section, and the architectural view of the model is shown in Fig. 7. The deep CNN model is divided into five major parts, referred to as groups, plus one fully connected (FC) layer with a softmax activation function, for a total of 21 layers. Each group has four layers: a convolution layer (Conv2D), a batch normalization (BN) layer, a non-linearity layer (ReLU), and a combined maximum pooling and dropout layer (Maxpool-dropout). Because each ECG segment includes 188 values (187 ECG data points and one target value), the input ECG signal applied to the CNN model has size 187 × 2 × 1. Convolution layers with rectangular filters of varying size (‘f’) are used in all five groups, along with a varying number of filters (N), or feature maps, for each. The convolution layer in the first group receives an input signal with dimensions 187 × 2 × 1 and contains 1025 rectangular filters of size 5 × 2, followed by a BN layer, a ReLU layer, and a maximum pooling layer with dimensions 2 × 1 and a dropout rate of 0.25. The second through fourth groups have a similar structure, all with a filter size of 2 × 1, a BN layer, MaxPool (2,1), and dropout (0.25); only the number of filters differs, being 512, 256, and 128, respectively. The fifth group differs from the previous four: it uses a Conv2D layer with a filter size of 4 × 1 and 64 filters, an average pooling layer (AvgPool) rather than a maximum pooling layer, and no dropout; only the BN and ReLU layers are the same. The BN layer, ReLU, and stride = 1 are the same throughout the deep CNN model. More filters were used in the early groups to extract minute features from the input ECG signals, and rectangular filter sizes were chosen to fit the shape of the ECG waveform. In addition, the batch normalization layer is used to normalize the data after each layer, and the dropout layer to speed up the training process. In the final group, we employed an average pooling layer rather than a maximum pooling layer in order to average the features extracted throughout the procedure rather than keep only the maximum values. A fully connected layer of size 64, together with the softmax layer, is used as the model's final layer to classify the training data into six groups. Fig. 8 shows the structure of the deep CNN model layer by layer and group by group.
3.5. Proposed CNN-LSTM model
For the recognition of six types of cardiac abnormalities, the CNN-LSTM hybrid DL model is structured with 24 layers; the layer-wise architecture of the proposed model is presented in Fig. 8. Like the proposed CNN model, this model is organized in major groups; it has six, where the first five groups have four layers each and the sixth and last group has only three layers. The first three groups of the proposed CNN-LSTM model are identical to those of the deep CNN model, as shown in Fig. 8, and the remaining three groups (4th to 6th) are modified, including the addition of the LSTM layer, to obtain optimum results. The first three groups include a Conv layer, BN layer, ReLU layer, and Maximum pooling
Fig. 6. Comparison of the percentage of samples for each class before and after resampling.
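The two ingredients of SMOTE + Tomek resampling can be illustrated on a toy two-class problem. The sketch below is a simplified pure-Python illustration, not the implementation used in the paper: the function names, the toy 2-D points, the neighbour count `k`, and the seed are assumptions, and real ECG beats are 187-sample vectors spread over six classes. SMOTE creates a synthetic sample by interpolating between a minority sample and one of its nearest minority neighbours; a Tomek link is a pair of mutual nearest neighbours with different labels, and here the majority-class member of each link is removed.

```python
import math
import random

def nearest(idx, points, candidates):
    """Index in `candidates` of the point closest to points[idx]."""
    return min((j for j in candidates if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(minority[i], minority[j]))[:k]
        j = rng.choice(others)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic

def tomek_links(points, labels):
    """Pairs (i, j) with different labels that are mutual nearest neighbours."""
    all_idx = range(len(points))
    nn = [nearest(i, points, all_idx) for i in all_idx]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

# Hypothetical toy data: six majority ("N") and three minority ("F") points.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.9, 0.9), (0.25, 0.15)]
minority = [(1.0, 1.0), (1.2, 1.1), (1.1, 0.8)]

# Step 1: oversample the minority class up to the majority size.
new_points = smote(minority, n_new=len(majority) - len(minority))
points = majority + minority + new_points
labels = ["N"] * len(majority) + ["F"] * (len(minority) + len(new_points))

# Step 2: drop the majority member of every Tomek link.
links = tomek_links(points, labels)
drop = {i for pair in links for i in pair if labels[i] == "N"}
balanced = [(p, l) for i, (p, l) in enumerate(zip(points, labels)) if i not in drop]
print(len(points), "points before cleaning,", len(balanced), "after")
```

For a multi-class dataset such as the one used here, the same two steps would be repeated for each minority class against the majority class.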
and Dropout layer, as in the CNN model, with the same filter sizes and numbers of filters, but the last three groups (4th, 5th, and 6th) do not use a Maxpool layer, and their filter numbers and sizes vary from group to group.
The 4th group of the CNN-LSTM model includes a Conv2D layer with a 3 × 1 filter size and 256 filters, along with a BN layer, a ReLU layer, and a dropout rate of 0.25. Similarly, the 5th group consists of a Conv2D layer with a filter size of 5 × 1 and 128 filters, along with BN, ReLU, and a 0.25 dropout rate. The 4th and 5th groups are similar except for the filter size and number of filters, but the 6th and final group is entirely different and includes the LSTM layer. This last group has only three elements: a convolution layer with a filter size of 1 × 1 and 128 filters, a reshape layer to resize the signal, and an LSTM layer of size 64. The input to the LSTM layer in the 6th group is the output of the reshape layer, and its output is passed to the last layer of the CNN-LSTM model. That last layer, outside the six groups, is the fully connected layer with a softmax activation function, which classifies the ECG signal features into 6 classes.
3.6. Sequential ensemble method
With few exceptions, the ensemble methodology is a post-processing tool rather than a type of classifier model. In this methodology, the results of the best models are combined by averaging or by other strategies to improve the overall classification performance. A number of ensemble techniques may be applied to improve the overall outcome. The most straightforward ensemble methods include weighted averaging, maximum voting, and averaging. More advanced ensemble approaches include boosting, blending, stacking, and bagging, among others. The ensemble technique is structured so that the outputs of two or more classifiers are blended in order to optimize the overall result. The basic models used for classification or regression are referred to as base models or base learners, and the model that combines the outcomes of these base learners, that is, the overall ensemble model, is referred to as the meta learner. Base models, or base learners, are used in both classification and regression [67,68]. The use of ensemble learning is motivated, first, by the desire to improve the efficiency and accuracy of classification or regression; second, by the need to overcome the problems associated with dataset imbalance; and third, by the need to account for the bias that arises during classifier training. The ensemble method was developed to accomplish these goals.
To improve the overall result, we have employed a sequential, averaging ensemble method that combines the base learners by averaging the prediction results of both models. In the first stage, the proposed CNN model is trained on the training dataset, and in the second stage it predicts arrhythmias on the test dataset. The proposed CNN-LSTM model is then trained on the same training dataset and produces predictions for the same test dataset. Lastly, the average of both prediction results is calculated to determine the overall outcome of the ensemble method, as shown in Fig. 9.
3.7. Performance and parameter evaluation
The proposed classifiers' output is in the form of a confusion matrix, which is difficult for a general audience to interpret directly. Additionally, the performance of the classifier, as well as the accuracy for each target class, must be confirmed using statistical measures. Accuracy and error rate are the most often used performance evaluation metrics (PEM), but researchers use a range of other metrics depending on their needs and applications. The overall performance of a deep learning model cannot be estimated from accuracy alone, since accuracy does not reflect performance on minority-class samples or on any specific class. To evaluate the performance of the proposed models, we have computed seven types of PEM from the confusion matrix: Acc (“Accuracy”), Re (“Recall”), F1 (“F1-Score”), Pr (“Precision”), CER (“Classification Error Rate”), Sp (“Specificity”), and OAcc (“Overall Accuracy”), which are given in equations (12)–(18).
$$\text{Precision (Pr\%)} = \frac{TP}{TP + FP} * 100 \tag{12}$$
$$\text{Recall (Re\%)} = \frac{TP}{TP + FN} * 100 \tag{13}$$
$$\text{Specificity (Sp\%)} = \frac{TN}{TN + FP} * 100 \tag{14}$$
$$\text{Classification Error Rate (CER\%)} = \frac{FP + FN}{\text{Total beats}} * 100 \tag{15}$$
$$\text{Accuracy (Acc\%)} = \frac{TP + TN}{\text{Total beats}} * 100 \tag{16}$$
$$\text{F1-Score (F1\%)} = \frac{2 * \text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}} * 100 \tag{17}$$
$$\text{Overall Accuracy (OAcc\%)} = \frac{\sum TP}{\text{Total beats}} * 100 \tag{18}$$
Here, TP, TN, FP, and FN represent True Positive, True Negative, False Positive, and False Negative, respectively, and * indicates multiplication.
4. Result and experimentation
A total of 123,998 ECG beats are used in the experiments, drawn from two databases, MITDB and PTBDB, and covering six arrhythmia classes: the N class, the S class, the V class, the F class, the Q class, and the M class. On the training data, 4 experiments have been carried out
Fig. 8. The Architecture of CNN Model (Left), CNN-LSTM model (Right) and combined Ensemble Technique.
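The averaging step of the ensemble shown in Fig. 8 can be sketched as follows. The probability vectors are hypothetical softmax outputs for a single beat and the function name is an assumption; the sketch only illustrates averaging the two models' class probabilities and picking the most likely class.

```python
CLASSES = ["N", "S", "V", "F", "Q", "M"]

def average_ensemble(probs_a, probs_b):
    """Average two models' class-probability vectors and pick the top class."""
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best], avg

# Hypothetical softmax outputs of the two base learners for one ECG beat.
cnn_probs      = [0.10, 0.05, 0.05, 0.45, 0.05, 0.30]
cnn_lstm_probs = [0.05, 0.05, 0.05, 0.30, 0.05, 0.50]

label, avg = average_ensemble(cnn_probs, cnn_lstm_probs)
print(label, [round(p, 3) for p in avg])
```

In this toy case the CNN alone would predict the F class, but the CNN-LSTM is more confident about the M class, so the averaged prediction follows the more confident base learner.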
independently: the first on the imbalanced dataset, the second on the RO (“random oversampled”) training dataset, the third on the SMOTE-resampled dataset, and the last on the SMOTE + Tomek resampled dataset. The imbalanced dataset has been used only for the first experiment; the remaining experiments are performed on balanced datasets. An ensemble approach is also used to assess performance by averaging the prediction results from both deep learning models. Two distinct deep learning models are used to validate the training performance: the CNN model and the CNN-LSTM model. In all, 123,998 ECG beats were used for evaluating the deep neural network
performance for classifying 6 types of arrhythmias. 80% of the total dataset (99,198 beats) has been used for training the CNN and CNN-LSTM models, and the remaining 20% of the total 123,998 ECG beats (24,800) has been set aside for testing the models' efficiency. The models were validated on Kaggle with GPU support in a Python environment using the TensorFlow and Keras libraries. Experiments have been carried out on laptops with the following hardware configuration: 1 TB hard drive, 8 GB RAM, NVIDIA GPU, and i5 core Pentium CPU.
Hyperparameters are crucial in training the model since they directly regulate the DL model's training behavior and strongly affect training performance. Selecting appropriate hyperparameters is therefore critical to the effective learning of DNN classifiers, as an incorrect choice might result in poor learning and low prediction performance.
For example, if the learning rate of the network is too high, training may fail to converge; on the other hand, if it is too low, the network may fail to learn the required patterns in the data. In this way, an efficient search across the space of possible hyperparameters is an advantage in selecting the optimal ones, and it also plays a significant role in handling the large number of trials involved when several hyperparameters are tuned simultaneously [69].
Careful tuning was found to give better outcomes when training deep neural networks with many classes of hyperparameters. Each of the two DL models (CNN and CNN-LSTM) has a set of hyperparameters used for optimization, and both use the same input features for learning, verification, and validation. The DL models are fed the 187 × 2 × 1 raw ECG waveform, and the final hyperparameters used throughout the learning phase include 100 epochs, a batch size of 128, the Adam optimizer, a learning rate of 1e-3, a decay of 1e-7, and a momentum of 0.9, among others. Since the dataset has been divided into three parts (“training, validation, and test”), the training dataset has been used to train the proposed models, the validation dataset to validate the training performance, and the testing dataset to produce the predictions. For both proposed models, the training results are presented as accuracy and loss graphs, and the predictions are presented as a confusion matrix, which has been used to compute the seven types of validation metrics, tabulated for the purpose of visualization.
4.1. Arrhythmia detection on imbalanced data
The first experiment was carried out on the original, unbalanced training dataset: both presented models, CNN and CNN-LSTM, were trained and predictions were produced on the 24,800-beat test dataset, with the sequential ensemble approach used to reach the final result. The 99,198 training beats are further divided into training and validation datasets in the ratio 80:20. 80% of the training beats (79,358) have been used for training, and the remaining 19,840 for validating the training performance. The 24,800 test beats are reserved for testing the models in each experiment and are never exposed to the training process. First the CNN model and then the CNN-LSTM model was trained on the imbalanced training dataset for up to 100 epochs. Figs. 10 and 11 show the accuracy and loss histories of the CNN and CNN-LSTM models, respectively, on the training and validation datasets without any balancing strategy. From the training history on the imbalanced dataset, it was observed that both models learn well throughout training on both the training and validation datasets.
The confusion matrix plot of the prediction results obtained on the 24,800-beat test dataset is shown in Fig. 12. Predictions are made with both proposed models and the ensemble approach after training for 100 epochs.
The prediction results in terms of performance evaluation metrics are listed in Table 3 for both models and the ensemble technique. Both the CNN and CNN-LSTM models provide good overall arrhythmia detection accuracy, 98.58% and 98.79%, respectively, and the ensemble technique provides a slightly better overall accuracy of 98.83%. However, although both proposed models learned well on the training data and produce good overall accuracy on the test data, the average class accuracy (recall %) is considerably lower: 91.19%, 92.69%, and 92.31% for the CNN, CNN-LSTM, and sequential ensemble approach, respectively. Similarly, the minority class accuracy (F class) is 70.19%, 78.88%, and 75.16% for the three respective methods, a large gap from the overall accuracy that needs to be closed. This difference is because of the
Fig. 10. The CNN model’s learning curves (accuracy & loss) on imbalanced data.
Fig. 11. The CNN-LSTM model’s learning curves (accuracy & loss) on imbalanced data.
Table 3
The performance analysis of the proposed models trained on imbalanced data. TP, FN, FP, and TN are counts of predicted ECG beats.
Method     Class    TP     FN   FP   TN     %Acc   %Pr    %Re    %Sp    %F1    %CER  OAcc (%)
CNN        N        18827  100  201  5672   98.79  98.94  99.47  96.58  99.21  1.21  98.58
           S        467    89   50   24194  99.44  90.33  83.99  99.79  87.05  0.56  -
           V        1403   44   49   23304  99.63  96.63  96.96  99.79  96.79  0.38  -
           F        113    48   17   24622  99.74  86.92  70.19  99.93  77.66  0.26  -
           Q        1599   9    18   23174  99.89  98.89  99.44  99.92  99.16  0.11  -
           M        2040   61   16   22683  99.69  99.22  97.10  99.93  98.15  0.31  -
           Average  -      -    -    -      99.53  95.15  91.19  99.32  93.00  0.47  -
CNN-LSTM   N        18842  85   163  5710   99.00  99.14  99.55  97.22  99.35  1.00  98.79
           S        461    95   37   24207  99.47  92.57  82.91  99.85  87.48  0.53  -
           V        1403   44   52   23301  99.61  96.43  96.96  99.78  96.69  0.39  -
           F        127    34   15   24624  99.80  89.44  78.88  99.94  83.83  0.20  -
           Q        1599   9    16   23176  99.90  99.01  99.44  99.93  99.22  0.10  -
           M        2067   34   18   22681  99.79  99.14  98.38  99.92  98.76  0.21  -
           Average  -      -    -    -      99.60  95.95  92.69  99.44  94.22  0.40  -
ENSEMBLE   N        18850  77   169  5704   99.01  99.11  99.59  97.12  99.35  0.99  98.83
           S        468    88   30   24214  99.52  93.98  84.17  99.88  88.80  0.48  -
           V        1406   41   53   23300  99.62  96.37  97.17  99.77  96.77  0.38  -
           F        121    40   14   24625  99.78  89.63  75.16  99.94  81.76  0.22  -
           Q        1601   7    13   23179  99.92  99.19  99.56  99.94  99.38  0.08  -
           M        2063   38   12   22687  99.80  99.42  98.19  99.95  98.80  0.20  -
           Average  -      -    -    -      99.61  96.28  92.31  99.43  94.14  0.39  -
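The per-class metrics in Table 3 follow from equations (12)–(18) applied to each row's TP, FN, FP, and TN counts. The sketch below recomputes them for the CNN rows; the counts are copied from Table 3, while the function and variable names are illustrative assumptions. Precision and recall are kept as fractions inside the F1 formula so that the final multiplication by 100 yields a percentage.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class metrics of Eqs. (12)-(17), returned as percentages."""
    total = tp + fn + fp + tn
    pr = tp / (tp + fp)   # precision as a fraction
    re = tp / (tp + fn)   # recall as a fraction
    return {
        "Acc": 100 * (tp + tn) / total,
        "Pr": 100 * pr,
        "Re": 100 * re,
        "Sp": 100 * tn / (tn + fp),
        "F1": 100 * 2 * pr * re / (pr + re),
        "CER": 100 * (fp + fn) / total,
    }

# (TP, FN, FP, TN) per class for the CNN model, copied from Table 3.
cnn = {
    "N": (18827, 100, 201, 5672),
    "S": (467, 89, 50, 24194),
    "V": (1403, 44, 49, 23304),
    "F": (113, 48, 17, 24622),
    "Q": (1599, 9, 18, 23174),
    "M": (2040, 61, 16, 22683),
}

total_beats = sum(cnn["N"])  # every row sums to the size of the test set
overall_acc = 100 * sum(tp for tp, *_ in cnn.values()) / total_beats  # Eq. (18)
avg_recall = sum(class_metrics(*row)["Re"] for row in cnn.values()) / len(cnn)

for name, row in cnn.items():
    print(name, {k: round(v, 2) for k, v in class_metrics(*row).items()})
print("OAcc:", round(overall_acc, 2), "average recall:", round(avg_recall, 2))
```

The gap between the overall accuracy and the average recall is what the recall of the small F class drives, which is why accuracy alone is not a sufficient measure on imbalanced data.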
huge data imbalance problem, the models are provided with 75,708 ECG beats from the N class whereas only 645 beats come from the minority (F) class. Hence the models learn well on the classes that have adequate training data but cannot learn well from the very small (minority) sample data.

4.2. Arrhythmias prediction on RO dataset

The second experiment was performed with the proposed models on a randomly oversampled dataset obtained using the RO technique. The RO technique is applied to the ECG training beats, which creates 454,248 ECG beats in total, with 75,708 samples from each class. This resampled data was further divided into training and validation sets with an 80:20 ratio; the 24,800 test beats had already been reserved. Hence 363,417 beats are used for training and 90,850 for validation. In this case too, the data was used to train the CNN and CNN-LSTM DL classifiers, and the outcomes obtained were then ensembled. The training and validation accuracy and loss histories on the oversampled dataset are shown in Fig. 13 and Fig. 14. From the training history it is observed that, because of the large number of similar samples, the models do not learn after a few iterations, and accuracy and loss are 100% and 0% on both the training and validation data.

Fig. 15 shows the confusion matrices of the prediction results achieved by both models (CNN and CNN-LSTM) trained on the randomly oversampled dataset and by the ensemble technique.

Table 4 summarizes the prediction results in terms of all assessment criteria, demonstrating that although class accuracy (recall percentage) has increased, overall accuracy has decreased compared to the imbalanced database. Overall accuracy for the CNN, the CNN-LSTM, and the ensemble approach is 98.53%, 98.55%, and 98.74%, respectively, while average recall is 94.82%, 94.67%, and 95.29%. Similarly, using the CNN model, the CNN-LSTM model, and the ensemble approach, the minority (F) class accuracy increased to 82.61%, 82.61%, and 84.47%, respectively, compared to the previous approach (imbalanced dataset). As a result, replicated copies of the samples may improve the individual class accuracy for a given class, but overall accuracy falls. Increasing the number of samples alone is not enough to increase the classification performance of all classes as well as the overall accuracy; the additional samples must be unique to achieve this improvement.

4.3. Arrhythmias prediction on SMOTE sampled dataset

The third experiment applies synthetic oversampling to the minority data classes; in this case, we have applied SMOTE oversampling to the training samples. The 99,198 training beats were oversampled to 454,248 by generating synthetic samples for all classes except the majority class, so that every minority class becomes equal in size to the "N" class at 75,708 beats. As with the RO technique, 363,417 training and 90,850 validation samples have been used to train the models. The training and validation accuracy and loss histories of the proposed models are shown in Fig. 16 and Fig. 17, and from the training history it was observed that the models learn well throughout the iterations.

The models trained on the SMOTE sampled dataset are utilized for the prediction of arrhythmias into 6 classes, and their performance measures are charted in terms of the confusion matrix illustrated in Fig. 18. From the confusion matrix, it is observed that the overall accuracy is 98.57% using the CNN model, 98.79% using the CNN-LSTM model, and 98.91% using the ensemble technique, which is quite good and improved compared to the previous experiments.

The performance assessment measures have been derived from the confusion matrix and are tabulated in Table 5. When applied to the test dataset, the prediction results demonstrate that the overall accuracy has increased somewhat, while the recall (average class accuracy) has remained unchanged, but the minority class accuracy (F class) has been greatly improved. The proposed CNN and CNN-LSTM models, as well as the sequential ensemble approach, achieve average recall values of 93.08%, 93.94%, and 94.37%, respectively, and minority (F) class accuracy of 81.99%, 84.47%, and 85.71%, respectively. Consequently, compared to the imbalanced dataset, the minority class accuracy of the "F" class has risen by around 10%, but compared to the RO dataset there has only been a marginal improvement. In particular, this reveals that the models trained on the RO and SMOTE sampled datasets are highly comparable, particularly in the case of minority classes.

4.4. Arrhythmia prediction on SMOTE + Tomek sampled data

The final experiment was performed using the proposed models on SMOTE + Tomek link resampled data, a hybrid method that combines oversampling and undersampling. In this case, the hybrid SMOTE + Tomek link technique is utilized to oversample all the minority classes by creating synthetic samples, and also to downsample some majority class data that overlaps with the minority classes on the borderline (shown in Fig. 2). This technique generates 452,552 ECG beats from the 99,198 training beats. Of the oversampled beats, 80% (362,042) were used to train the models and 90,510 beats were used for validation. The model learning history on the training and validation datasets is plotted in Fig. 19 and Fig. 20. From the learning history it is found that, even after creating a huge number of synthetic samples, the models learn quite well.

Finally, the models trained on this huge dataset are given the test dataset for prediction, and the outcome is displayed as a confusion matrix in Fig. 21. The performance evaluation metrics of the prediction results for all methods are derived from the confusion matrix and tabulated in Table 6. From the prediction results, it is observed that the models significantly improve the overall accuracy as well as the average class accuracy and the minority class accuracy. On the SMOTE + Tomek balanced dataset, the overall accuracy is 98.91% using the CNN model, 99.06% using the CNN-LSTM model, and 99.10% using the ensemble approach. Furthermore, the mean recall values for the three approaches are 94.95%, 95.74%, and 96.28%, respectively. The accuracy of the minority class increased as well, rising to 85.71%, 88.20%, and 90.68%, respectively, representing a 20% increase, from 75.16% on the imbalanced data to 90.68% on the SMOTE + Tomek balanced data.

Following the successful completion of four experiments on four different datasets, including an imbalanced and balanced ECG dataset, it
Fig. 13. The CNN model’s learning curves (accuracy & loss) on an RO data.
Fig. 14. The CNN-LSTM model’s learning curves (accuracy & loss) on an RO data.
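As a minimal sketch of the random oversampling (RO) idea behind these learning curves, the snippet below duplicates randomly chosen samples of each smaller class until every class matches the largest one. The function name `random_oversample` and the toy labels are hypothetical and only illustrate the principle; the source does not state which implementation was used. With six classes balanced to the 75,708 beats of the N class, this scheme yields the 6 × 75,708 = 454,248 beats reported for the RO dataset.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Extra items are exact copies of existing samples: no new information is added.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy example: integers stand in for ECG beats (illustrative data only).
labels = ["N"] * 8 + ["F"] * 2 + ["S"] * 3
samples = list(range(len(labels)))
bal_x, bal_y = random_oversample(samples, labels)
print(Counter(bal_y))
```

Because every added sample is a copy, the balanced set contains no values beyond the original ones, which is consistent with the observation that the models stop learning after a few iterations on RO data.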
Table 4
The performance analysis of the proposed models trained on a RO data.
Method Class TP FN FP TN (predicted ECG beats) %Acc %Pr %Re %Sp %F1 %CER OAcc (%)
CNN N 18700 227 83 5790 98.75 99.56 98.80 98.59 99.18 1.25 98.53
S 508 48 134 24110 99.27 79.13 91.37 99.45 84.81 0.73
V 1407 40 51 23302 99.63 96.50 97.24 99.78 96.87 0.37
F 133 28 54 24585 99.67 71.12 82.61 99.78 76.44 0.33
Q 1601 7 16 23176 99.91 99.01 99.56 99.93 99.29 0.09
M 2087 14 26 22673 99.84 98.77 99.33 99.89 99.05 0.16
Average 99.51 90.68 94.82 99.57 92.60 0.49
CNN-LSTM N 18707 220 87 5786 98.76 99.54 98.84 98.52 99.19 1.24 98.55
S 502 54 119 24125 99.30 80.84 90.29 99.51 85.30 0.70
V 1405 42 51 23302 99.63 96.50 97.10 99.78 96.80 0.38
F 133 28 41 24598 99.72 76.44 82.61 99.83 79.40 0.28
Q 1603 5 19 23173 99.90 98.83 99.69 99.92 99.26 0.10
M 2090 11 43 22656 99.78 97.98 99.48 99.81 98.72 0.22
Average 99.52 91.69 94.67 99.56 93.11 0.48
ENSEMBLE N 18737 190 76 5797 98.93 99.60 99.00 98.71 99.30 1.07 98.74
S 509 47 110 24134 99.37 82.23 91.55 99.55 86.64 0.63
V 1409 38 40 23313 99.69 97.24 97.37 99.83 97.31 0.31
F 136 25 45 24594 99.72 75.14 84.47 99.82 79.53 0.28
Q 1604 4 18 23174 99.91 98.89 99.75 99.92 99.32 0.09
M 2093 8 23 22676 99.88 98.91 99.62 99.90 99.26 0.13
Average 99.58 92.00 95.29 99.62 93.56 0.42
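The columns of the table can be reproduced from the four confusion counts of each class. The sketch below assumes the standard one-vs-rest definitions of accuracy, precision, recall, specificity, and F1, and takes %CER to be 100 minus %Acc; these assumptions match the tabulated CNN values to two decimals, but the function name `beat_metrics` and the dictionary layout are illustrative only.

```python
def beat_metrics(tp, fn, fp, tn):
    """Per-class metrics in percent from one-vs-rest confusion counts."""
    total = tp + fn + fp + tn
    acc = 100.0 * (tp + tn) / total
    pr = 100.0 * tp / (tp + fp)      # precision
    re = 100.0 * tp / (tp + fn)      # recall (class accuracy)
    sp = 100.0 * tn / (tn + fp)      # specificity
    f1 = 2.0 * pr * re / (pr + re)
    cer = 100.0 - acc                # assumed: error rate is the complement of accuracy
    return {"Acc": acc, "Pr": pr, "Re": re, "Sp": sp, "F1": f1, "CER": cer}


# CNN rows of Table 4 (RO data): class -> (TP, FN, FP, TN)
cnn_ro = {
    "N": (18700, 227, 83, 5790),
    "S": (508, 48, 134, 24110),
    "V": (1407, 40, 51, 23302),
    "F": (133, 28, 54, 24585),
    "Q": (1601, 7, 16, 23176),
    "M": (2087, 14, 26, 22673),
}

n_metrics = beat_metrics(*cnn_ro["N"])
total_beats = sum(cnn_ro["N"])  # every row sums to the 24,800 test beats
# Overall accuracy: correctly classified beats of all classes over all test beats.
overall_acc = 100.0 * sum(tp for tp, _, _, _ in cnn_ro.values()) / total_beats

print({name: round(value, 2) for name, value in n_metrics.items()})
print(round(overall_acc, 2))
```

For the N class this gives values that round to the first CNN row of the table, and the overall accuracy rounds to the 98.53% reported for the CNN model.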
has been determined that the proposed CNN and CNN-LSTM models are both efficient and effective in terms of the performance evaluation metrics, even when applied to a large test dataset of 24,800 beats. However, the ensemble technique, in which the prediction results from both models are averaged before the final prediction is made, had the best overall performance in all of the trials. Thus, when compared to the individual classifiers, it can be concluded that the ensemble approach has a significant impact on the prediction outcome.

We have also compared the performance of all models (CNN, CNN-LSTM, and the ensemble classifier) across all four experiments: without data balancing, RO, SMOTE oversampling, and SMOTE + Tomek data sampling. From Table 7, it can be seen that the average (Avg.) recall percentage, which represents the individual class accuracy, increases from 91.19% to 96.28%; similarly, the minority (F) class accuracy improves from 70.19% to 90.68%, which means the minority class accuracy increases by approximately 20% using the proposed method.

5. Discussion

From Table 7, we conclude that the proposed hybrid SMOTE + Tomek resampling technique provides a better solution to the
Fig. 16. The CNN model’s learning curves (accuracy & loss) on SMOTE sampled data.
Fig. 17. The CNN-LSTM model’s learning curves (accuracy & loss) on SMOTE sampled data.
Fig. 18. Confusion matrix visualization of proposed techniques on SMOTE sampled data.
imbalanced data problem.

When we train the models on data oversampled by the RO technique, the improvements in OAcc, Avg. Acc, and minority class accuracy are not significant, as the RO method only replicates the same set of samples. Although it provides a sufficient number of samples in each class for deep learning models to learn from, the models only see the same data again and again, so they do not give better results when different test data is provided.

The SMOTE data sampling technique is a better solution than RO, as it does not replicate the same set of data; instead, it produces synthetic samples based on the data pattern. Therefore the models learn well from each class, which now has sufficient and diverse data, and provide significant improvements in the accuracy of the minority class, as can be seen in Table 7. The only drawback of this technique is that it does not consider the boundaries between different ranges of data, so the data may overlap.

To overcome this limitation of the SMOTE sampling method, we have hybridized it by combining SMOTE oversampling with the Tomek link under-sampling method: SMOTE oversampling generates heterogeneous synthetic samples, and Tomek link under-sampling removes overlapping majority class data across the boundaries of the individual classes. Therefore, the hybrid SMOTE + Tomek method addresses the data imbalance problem of the ECG dataset and provides an approximately 20% increase in the accuracy of the minority class (Recall%) in comparison to the original, imbalanced data (Table 7).

Over the years, several researchers have endeavored to detect arrhythmia from ECG datasets using various deep learning approaches. They used various datasets, types of arrhythmia, methodologies, and data balancing techniques in their studies. In Table 8, we compare the effectiveness of the proposed models against recently published state-of-the-art approaches. In the performance comparison, the number of ECG beats classified and the number of ECG leads used in the analysis have been excluded from consideration. The compared publications used a variety of ECG leads, datasets, and numbers of arrhythmia classes; however, the majority of them make use of the MITDB.

The majority of automated arrhythmia prediction systems using ECG datasets that have been developed over the years have utilized the MITDB database for the validation of model performance; this can also be observed from the state-of-the-art table, which shows that all of the
16
H.M. Rai et al. Computers in Biology and Medicine 150 (2022) 106142
Table 5
The performance analysis of the proposed models trained on SMOTE sampled data.
Method Class TP FN FP TN (predicted ECG beats) %Acc %Pr %Re %Sp %F1 %CER OAcc (%)
CNN N 18810 117 178 5695 98.81 99.06 99.38 97.99 99.22 1.19 98.57
S 470 86 52 24192 99.44 90.04 84.53 99.65 87.20 0.56
V 1386 61 33 23320 99.62 97.67 95.78 99.74 96.72 0.38
F 132 29 42 24597 99.71 75.86 81.99 99.88 78.81 0.29
Q 1592 16 18 23174 99.86 98.88 99.00 99.93 98.94 0.14
M 2055 46 32 22667 99.69 98.47 97.81 99.80 98.14 0.31
Average 99.52 93.33 93.08 99.50 93.17 0.48
CNN-LSTM N 18827 100 141 5732 99.03 99.26 99.47 98.29 99.36 0.97 98.79
S 475 81 40 24204 99.51 92.23 85.43 99.67 88.70 0.49
V 1392 55 24 23329 99.68 98.31 96.20 99.76 97.24 0.32
F 136 25 37 24602 99.75 78.61 84.47 99.90 81.44 0.25
Q 1599 9 17 23175 99.90 98.95 99.44 99.96 99.19 0.10
M 2072 29 40 22659 99.72 98.11 98.62 99.87 98.36 0.28
Average 99.60 94.24 93.94 99.57 94.05 0.40
ENSEMBLE N 18835 92 132 5741 99.10 99.30 99.51 98.42 99.41 0.90 98.91
S 477 79 36 24208 99.54 92.98 85.79 99.67 89.24 0.46
V 1399 48 25 23328 99.71 98.24 96.68 99.79 97.46 0.29
F 138 23 36 24603 99.76 79.31 85.71 99.91 82.39 0.24
Q 1601 7 18 23174 99.90 98.89 99.56 99.97 99.23 0.10
M 2079 22 24 22675 99.81 98.86 98.95 99.90 98.91 0.19
Average 99.64 94.60 94.37 99.61 94.44 0.36
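The synthetic samples behind these results come from SMOTE, which creates a new minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a minimal standard-library sketch of that interpolation step only; the function name `smote_like`, the choice of `k`, and the toy 2-D points are assumptions for illustration and are not the configuration used in this work.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy minority class with four 2-D feature vectors (illustrative data only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two real minority samples, so the new data is distinct from the originals but follows their pattern, unlike the exact copies produced by random oversampling.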
Fig. 19. The CNN model’s learning curves (accuracy & loss) on SMOTE + Tomek sampled data.
Fig. 20. The CNN-LSTM model’s learning curves (accuracy & loss) on SMOTE + Tomek sampled data.
Fig. 21. Confusion matrix visualization of proposed techniques on an imbalanced SMOTE + Tomek sampled data.
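The ensemble results shown in these confusion matrices are obtained by averaging the outputs of the two models before the final prediction is made. A minimal sketch of that averaging step follows, assuming each model outputs one probability per class for every beat; the probability values and the function name `ensemble_predict` are hypothetical.

```python
CLASSES = ["N", "S", "V", "F", "Q", "M"]


def ensemble_predict(probs_a, probs_b):
    """Average two models' class probabilities per beat and pick the top class."""
    predictions = []
    for pa, pb in zip(probs_a, probs_b):
        avg = [(a + b) / 2.0 for a, b in zip(pa, pb)]
        best = max(range(len(avg)), key=avg.__getitem__)
        predictions.append(CLASSES[best])
    return predictions


# Hypothetical outputs for three beats (one row per beat, one column per class).
cnn_probs = [
    [0.60, 0.10, 0.10, 0.20, 0.00, 0.00],
    [0.45, 0.00, 0.00, 0.55, 0.00, 0.00],
    [0.55, 0.00, 0.00, 0.45, 0.00, 0.00],
]
cnn_lstm_probs = [
    [0.70, 0.10, 0.10, 0.10, 0.00, 0.00],
    [0.20, 0.00, 0.00, 0.80, 0.00, 0.00],
    [0.30, 0.00, 0.00, 0.70, 0.00, 0.00],
]
print(ensemble_predict(cnn_probs, cnn_lstm_probs))
```

In the third beat the two models disagree (N versus F), and the averaged probabilities favour the more confident model, which is how averaging can correct individual misclassifications.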
Table 6
The performance analysis of the proposed models trained on SMOTE + Tomek sampled data.
Method Class TP FN FP TN (predicted ECG beats) %Acc %Pr %Re %Sp %F1 %CER OAcc (%)
CNN N 18827 100 100 5773 99.19 99.47 99.47 98.30 99.47 0.81 98.91
S 501 55 42 24202 99.61 92.27 90.11 99.77 91.17 0.39
V 1401 46 45 23308 99.63 96.89 96.82 99.80 96.85 0.37
F 138 23 43 24596 99.73 76.24 85.71 99.91 80.70 0.27
Q 1592 16 18 23174 99.86 98.88 99.00 99.93 98.94 0.14
M 2071 30 22 22677 99.79 98.95 98.57 99.87 98.76 0.21
Average 99.64 93.78 94.95 99.60 94.32 0.36
CNN-LSTM N 18830 97 78 5795 99.29 99.59 99.49 98.35 99.54 0.71 99.06
S 505 51 38 24206 99.64 93.00 90.83 99.79 91.90 0.36
V 1410 37 35 23318 99.71 97.58 97.44 99.84 97.51 0.29
F 142 19 42 24597 99.75 77.17 88.20 99.92 82.32 0.25
Q 1598 10 13 23179 99.91 99.19 99.38 99.96 99.29 0.09
M 2082 19 27 22672 99.81 98.72 99.10 99.92 98.91 0.19
Average 99.69 94.21 95.74 99.63 94.91 0.31
ENSEMBLE N 18836 91 88 5785 99.28 99.53 99.52 98.45 99.53 0.72 99.10
S 510 46 32 24212 99.69 94.10 91.73 99.81 92.90 0.31
V 1409 38 30 23323 99.73 97.92 97.37 99.84 97.64 0.27
F 146 15 33 24606 99.81 81.56 90.68 99.94 85.88 0.19
Q 1600 8 17 23175 99.90 98.95 99.50 99.97 99.22 0.10
M 2077 24 22 22677 99.81 98.95 98.86 99.89 98.90 0.19
Average 99.70 95.17 96.28 99.65 95.68 0.30
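The under-sampling half of the SMOTE + Tomek method relies on Tomek links: pairs of samples from different classes that are each other's nearest neighbour. The sketch below finds such pairs and drops the majority-class member of each one, in line with the description that overlapping majority class data is removed. The helper names, the treatment of "N" as the only majority class, and the 1-D toy data are assumptions for illustration.

```python
def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(i, points):
    """Index of the nearest other point to points[i]."""
    candidates = (j for j in range(len(points)) if j != i)
    return min(candidates, key=lambda j: dist2(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j) of mutual nearest neighbours that carry different labels."""
    nn = [nearest(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_majority_in_links(points, labels, majority):
    """Remove the majority-class member of every Tomek link."""
    drop = {idx for pair in tomek_links(points, labels)
            for idx in pair if labels[idx] == majority}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy 1-D data: the N sample at 1.0 sits right next to the F sample at 1.2.
points = [[0.0], [1.0], [1.2], [3.0], [4.0]]
labels = ["N", "N", "F", "N", "N"]
links = tomek_links(points, labels)
clean_points, clean_labels = drop_majority_in_links(points, labels, majority="N")
print(links, clean_points, clean_labels)
```

Only the borderline N sample is removed; the samples at 3.0 and 4.0 are also mutual nearest neighbours but share a label, so they do not form a Tomek link.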
Table 7
Comparison of prediction performance of all experimentation.
Method Experimentation on Dataset Avg. Acc(%) Avg. Pr(%) Avg. Re(%) Avg. Sp(%) Avg. F1(%) Avg. CER(%) OAcc (%) Minority class Acc (%)
Table 8
Performance comparison of the proposed methodology with state-of-art technique.
Literature Technique Database Data balancing methods Number of Classes Performance (%)
works of literature, with the exception of [73–76], have utilized MITDB. In this work, the two authenticated databases (MITDB and PTBDB) are merged in order to maximize the number of ECG beats available for effective training of deep learning models while simultaneously maintaining the heterogeneity of the data samples collected.

The authors of [70–76] have taken the data imbalance concerns of the ECG dataset into consideration and have employed a variety of techniques to deal with them, as shown in Table 8. Furthermore, the authors of Refs. [34,35,37–39,71] have utilized a combined dataset, and only the authors of Refs. [37,38,40,72,75] have detected more types of arrhythmia than our work. For comparison of model performance, accuracy in percent is mainly stated in the comparison table, except in cases [37,73,76] where the accuracy was not calculated in the study. Only [72] reported a higher accuracy (99.26%) than our proposed work, but they used only the MITDB in their experimentation. It should also be emphasized that none of this research has taken into account the accuracy of the majority and minority classes, or the resampling of ECG data using the hybrid SMOTE + Tomek algorithm.

To identify the six kinds of arrhythmias, we have presented three distinct methodologies, including two deep learning models, CNN and CNN-LSTM, as well as one sequential ensemble approach, all of which have been tested on two benchmarked datasets, the MITDB and the PTBDB. Three distinct data balancing strategies have been used, including the proposed hybrid SMOTE + Tomek data resampling methodology, to tackle the imbalanced class issue of ECG datasets, which has been shown to be effective.

Computing individual class metrics such as recall and precision makes the classifier's performance easy to analyze: imbalanced datasets lead to poor classifier performance, but this is difficult to see from the accuracy of the models alone. In addition, four sets of tests were conducted on four different datasets: an imbalanced dataset, an RO dataset, a SMOTE sampled dataset, and the proposed hybrid SMOTE + Tomek balanced dataset. All of these experiments have been carried out using the suggested CNN deep learning model and CNN-LSTM hybridized model, as well as the sequential ensemble technique. The most advantageous aspect of the suggested technique is that no feature extraction or preprocessing tasks were performed in order to predict the occurrence of ECG arrhythmias. On the proposed balanced dataset, each of the three proposed approaches has an average classification accuracy of more than 99.5%. The ensemble strategy on the SMOTE + Tomek balanced dataset obtained an overall accuracy of 99.10%, and it also improved the accuracy of the minority class by more than 20%, according to the results.

The main advantages of our work are as follows:

• Three distinct prediction techniques: two DNN models (CNN and CNN-LSTM) and the sequential ensemble approach.
• Two benchmarked datasets integrated to generate a large imbalanced dataset of 123,998 ECG beats.
• First work to combine the MITDB and PTBDB datasets to predict 6 types of arrhythmias.
• More than 99.5% average accuracy and over 99% overall accuracy using the proposed model.
• Hybrid SMOTE + Tomek balancing algorithm to solve the ECG data imbalance issue.
• Improvement of more than 20% in the classification accuracy of minority class samples using the proposed technique.
• No preprocessing or feature extraction method needed for predicting arrhythmias from the ECG dataset.

There are a few drawbacks to this work, including the fact that it has not been confirmed using a real-time dataset and that it has not been built with hardware compatibility.

6. Conclusion and future scope

In this work, the automated prediction of six different types of cardiac arrhythmia from a large imbalanced ECG dataset was accomplished using three techniques: two deep neural network models (CNN and CNN-LSTM) and a sequential ensemble approach. A total of 123,998 ECG beats from two integrated benchmark ECG datasets (MITDB and PTBDB) are utilized for training and validation of the models. The dataset used is highly imbalanced, hence three different resampling methods, random oversampling, SMOTE, and SMOTE + Tomek, have been used for balancing the dataset. Four experiments were performed on the imbalanced and balanced training datasets. The 6 types of cardiac arrhythmias are predicted from 24,800 ECG beats and the results are evaluated using six types of performance assessment metrics. The average class accuracy (Recall) is improved from 92.3% to 95.54%, and the minority class accuracy is improved by about 20% compared to the imbalanced dataset. The proposed model performed well on a large ECG dataset compared to state-of-the-art techniques. The offered CNN models provide decent scores (more than 99%) even on the imbalanced dataset, which shows that they are adaptive and efficient.

Although the suggested model performs well and with high accuracy, there is still room for development. A GAN ("generative adversarial network") will be used in future work to produce synthetic data samples of the minority classes in order to increase the total number of data samples and also to evaluate the model performance between the two groups. Additionally, we will employ more recent models, such as a transformer model, in conjunction with the CNN model, as well as a cross-validation strategy, in order to further increase the classification accuracy.

Declaration of competing interest

No conflict of interest is involved for this manuscript.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compbiomed.2022.106142.

References

[1] WHO, Cardiovascular Diseases (CVDs), 2017. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds. Accessed 9 Jun 2020.
[2] WHF, World Heart Day 2019 - World Heart Federation, 2019. https://www.world-heart-federation.org/world-heart-day/world-heart-day-2019/. Accessed 7 Jul 2020.
[3] R.J. Martis, U.R. Acharya, L.C. Min, ECG beat classification using PCA, LDA, ICA and Discrete Wavelet Transform, Biomed. Signal Process Control 8 (2013) 437–448, https://doi.org/10.1016/j.bspc.2013.01.005.
[4] S. Banerjee, M. Mitra, ECG beat classification based on discrete wavelet transformation and nearest neighbour classifier, J. Med. Eng. Technol. 37 (2013) 264–272, https://doi.org/10.3109/03091902.2013.794251.
[5] M. Thomas, M.K. Das, S. Ari, Automatic ECG arrhythmia classification using dual tree complex wavelet based features, AEU - Int J Electron Commun 69 (2015) 715–721, https://doi.org/10.1016/j.aeue.2014.12.013.
[6] A.F. Khalaf, M.I. Owis, I.A. Yassine, A novel technique for cardiac arrhythmia classification using spectral correlation and support vector machines, Expert Syst. Appl. 42 (2015) 8361–8368, https://doi.org/10.1016/j.eswa.2015.06.046.
[7] S. Chen, W. Hua, Z. Li, et al., Heartbeat classification using projected and dynamic features of ECG signal, Biomed. Signal Process Control 31 (2017) 165–173, https://doi.org/10.1016/j.bspc.2016.07.010.
[8] M.K. Moridani, M. Abdi Zadeh, Z. Shahiazar Mazraeh, An efficient automated algorithm for distinguishing normal and abnormal ECG signal, Irbm 40 (2019) 332–340, https://doi.org/10.1016/j.irbm.2019.09.002.
[9] W. Yang, Y. Si, D. Wang, G. Zhang, A novel method for identifying electrocardiograms using an independent component analysis and principal component analysis network, Meas J Int Meas Confed 152 (2020), 107363, https://doi.org/10.1016/j.measurement.2019.107363.
[10] J. Park, K. Kang, PcHD: personalized classification of heartbeat types using a decision tree, Comput. Biol. Med. 54 (2014) 79–88, https://doi.org/10.1016/j.compbiomed.2014.08.013.
[11] Hemanth KS. Doreswamy, Performance evaluation of predictive engineering [37] U.R. Acharya, H. Fujita, S.L. Oh, et al., Automated identification of shockable and
materials data sets, Artif Intell Syst ans Mach Learn 3 (2011) 1–8. non-shockable life-threatening ventricular arrhythmias using convolutional neural
[12] F.I. Alarsan, M. Younes, Analysis and classification of heart diseases using network, Future Generat. Comput. Syst. 79 (2018) 952–959, https://doi.org/
heartbeat features and machine learning algorithms, J Big Data 6 (2019) 1–15, 10.1016/j.future.2017.08.039.
https://doi.org/10.1186/s40537-019-0244-x. [38] N.I. Hasan, A. Bhattacharjee, Deep learning approach to cardiovascular disease
[13] A. Hernandez-Matamoros, H. Fujita, E. Escamilla-Hernandez, et al., Recognition of classification employing modified ECG signal from empirical mode decomposition,
ECG signals using wavelet based on atomic functions, Biocybern. Biomed. Eng. 40 Biomed. Signal Process Control 52 (2019) 128–140, https://doi.org/10.1016/j.
(2020) 803–814, https://doi.org/10.1016/j.bbe.2020.02.007. bspc.2019.04.005.
[14] A. Ebrahimzadeh, B. Shakiba, A. Khazaee, Detection of electrocardiogram signals [39] R.S. Andersen, A. Peimankar, S. Puthusserypady, A deep learning approach for
using an efficient method, Appl. Soft Comput. J 22 (2014) 108–117, https://doi. real-time detection of atrial fibrillation, Expert Syst. Appl. 115 (2019) 465–473,
org/10.1016/j.asoc.2014.05.003. https://doi.org/10.1016/j.eswa.2018.08.011.
[15] F.A. Elhaj, N. Salim, A.R. Harris, et al., Arrhythmia recognition and classification [40] A.M. Shaker, M. Tantawi, H.A. Shedeed, M.F. Tolba, Generalization of
using combined linear and nonlinear features of ECG signals, Comput. Methods convolutional neural networks for ECG classification using generative adversarial
Progr. Biomed. 127 (2016) 52–63, https://doi.org/10.1016/j.cmpb.2015.12.024. networks, IEEE Access 8 (2020) 35592–35605, https://doi.org/10.1109/
[16] X. Dong, C. Wang, W. Si, ECG beat classification via deterministic learning, ACCESS.2020.2974712.
Neurocomputing 240 (2017) 1–12, https://doi.org/10.1016/j. [41] D.K. Atal, M. Singh, Arrhythmia classification with ECG signals based on the
neucom.2017.02.056. optimization-enabled deep convolutional neural network, Comput. Methods Progr.
[17] R. Singh, R. Mehta, N. Rajpal, Efficient wavelet families for ECG classification using Biomed. 196 (2020), 105607, https://doi.org/10.1016/j.cmpb.2020.105607.
neural classifiers, Procedia Comput. Sci. 132 (2018) 11–21, https://doi.org/ [42] H. Su, D. Zhao, H. Elmannai, et al., Multilevel threshold image segmentation for
10.1016/j.procs.2018.05.054. COVID-19 chest radiography: a framework using horizontal and vertical multiverse
[18] S. Jayalalith, D. Susan, S. Kumari, B. Archana, K-Nearest neighbour method of optimization, Comput. Biol. Med. 146 (2022), 105618, https://doi.org/10.1016/j.
analysing the ECG signal (to find out the different disorders related to heart), compbiomed.2022.105618.
J. Appl. Sci. 14 (2014) 1628–1632, https://doi.org/10.3923/jas.2014.1628.1632. [43] S. Wang, Y. Cong, H. Zhu, et al., Multi-scale context-guided deep network for
[19] İ. Kayikcioglu, F. Akdeniz, C. Köse, T. Kayikcioglu, Time-frequency approach to automated lesion segmentation with endoscopy images of gastrointestinal tract,
ECG classification of myocardial infarction, Comput. Electr. Eng. 84 (2020), IEEE J Biomed Heal Informatics 25 (2021) 514–525, https://doi.org/10.1109/
https://doi.org/10.1016/j.compeleceng.2020.106621. JBHI.2020.2997760.
[20] R. Saini, N. Bindal, P. Bansal, Classification of heart diseases from ECG signals [44] B. He, W. Hu, K. Zhang, et al., Image segmentation algorithm of lung cancer based
using wavelet transform and kNN classifier, in: International Conference on on neural network model, Expet Syst. 39 (2022), https://doi.org/10.1111/
Computing, Communication and Automation, ICCCA 2015, Institute of Electrical exsy.12822.
and Electronics Engineers Inc., 2015, pp. 1208–1215. [45] S. Tang, F. Yu, Construction and verification of retinal vessel segmentation
[21] A.A. Savostin, D.V. Ritter, G.V. Savostina, Using the K-nearest neighbors algorithm algorithm for color fundus image under BP neural network model, J Supercomput
for automated detection of myocardial infarction by electrocardiogram data 77 (2021) 3870–3884, https://doi.org/10.1007/s11227-020-03422-8.
entries, Pattern Recogn. Image Anal. 29 (2019) 730–737, https://doi.org/10.1134/ [46] A. Qi, D. Zhao, F. Yu, et al., Directional mutation and crossover boosted ant colony
S1054661819040151. optimization with application to COVID-19 X-ray image segmentation, Comput.
[22] S.M. Qaisar, M. Krichen, F. Jallouli, Multirate ECG processing and k-nearest Biol. Med. 148 (2022), 105810, https://doi.org/10.1016/j.
neighbor classifier based efficient arrhythmia diagnosis, in: Lecture Notes in compbiomed.2022.105810.
Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and [47] Y. Dai, J. Wu, Y. Fan, et al., MSEva: a musculoskeletal rehabilitation evaluation
Lecture Notes in Bioinformatics), Springer, 2020, pp. 329–337. system based on EMG signals, ACM Trans. Sens. Netw. (2022), https://doi.org/
[23] A. Ghahremani, S. Nabavi, H. Nateghi, Fast and noise-tolerant method of ECG 10.1145/3522739.
beats classification using wavelet features and fractal dimension, in: Proceeding, [48] J. Zhou, X. Zhang, Z. Jiang, Recognition of imbalanced epileptic EEG signals by a
2010 IEEE Student Conf Res Dev - Eng Innov beyond, 2010, pp. 310–313, https:// graph-based extreme learning machine, Wireless Commun. Mobile Comput.
doi.org/10.1109/SCORED.2010.5704023. SCOReD 2010. (2021), https://doi.org/10.1155/2021/5871684, 2021.
[24] S.H. Wang, V.V. Govindaraj, J.M. Górriz, et al., Covid-19 classification by FGCNet [49] H.M. Rai, K. Chatterjee, Hybrid CNN-LSTM deep learning model and ensemble
with deep feature fusion from graph convolutional network and convolutional technique for automatic detection of myocardial infarction using big ECG data,
neural network, Inf. Fusion 67 (2021) 208–229, https://doi.org/10.1016/j. Appl. Intell. (2021), https://doi.org/10.1007/s10489-021-02696-6.
inffus.2020.10.004. [50] Physionet, MIT-BIH Arrhythmia Database-V1, 2005.
[25] Y.D. Zhang, Z. Dong, S.H. Wang, et al., Advances in multimodal data fusion in [51] A.L. Goldberger, L.A.N. Amaral, L. Glass, et al., PhysioBank, PhysioToolkit, and
neuroimaging: overview, challenges, and novel orientation, Inf. Fusion 64 (2020) PhysioNet: components of a new research resource for complex physiologic signals,
149–187, https://doi.org/10.1016/j.inffus.2020.07.006. Circulation 101 (2000) 23.
[26] S. Dilmac, M. Korurek, ECG heart beat classification method based on modified [52] R. Bousseljot, D. Kreiseler, A.N. Schnabel, The PTB diagnostic ECG database,
ABC algorithm, Appl. Soft Comput. J 36 (2015) 641–655, https://doi.org/10.1016/j.asoc.2015.07.010.
Biomed. Tech. 40 (1995) 317, https://doi.org/10.13026/C28C71.
[27] S. Raj, K.C. Ray, O. Shankar, Cardiac arrhythmia beat classification using DOST and PSO tuned SVM, Comput. Methods Progr. Biomed. 136 (2016) 163–177, https://doi.org/10.1016/j.cmpb.2016.08.016.
[28] S. Raj, K.C. Ray, Sparse representation of ECG signals for automated recognition of cardiac arrhythmias, Expert Syst. Appl. 105 (2018) 49–64, https://doi.org/10.1016/j.eswa.2018.03.038.
[29] V. Bhagyalakshmi, R.V. Pujeri, G.D. Devanagavi, GB-SVNN: genetic BAT assisted support vector neural network for arrhythmia classification using ECG signals, J King Saud Univ - Comput Inf Sci. (2018), https://doi.org/10.1016/j.jksuci.2018.02.005.
[30] A. Diker, D. Avci, E. Avci, M. Gedikpinar, A new technique for ECG signal classification genetic algorithm Wavelet Kernel extreme learning machine, Optik 180 (2019) 46–55, https://doi.org/10.1016/j.ijleo.2018.11.065.
[31] H. Shi, H. Wang, Y. Huang, et al., A hierarchical method based on weighted extreme gradient boosting in ECG heartbeat classification, Comput. Methods Progr. Biomed. 171 (2019) 1–10, https://doi.org/10.1016/j.cmpb.2019.02.005.
[32] M. Hammad, A. Maher, K. Wang, et al., Detection of abnormal heart conditions based on characteristics of ECG signals, Meas J Int Meas Confed 125 (2018) 634–644, https://doi.org/10.1016/j.measurement.2018.05.033.
[33] S. Kiranyaz, T. Ince, R. Hamila, M. Gabbouj, Convolutional neural networks for patient-specific ECG classification, Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS 2015-Novem (2015) 2608–2611, https://doi.org/10.1109/EMBC.2015.7318926.
[34] M.M.A. Rahhal, Y. Bazi, H. Alhichri, et al., Deep learning approach for active classification of electrocardiogram signals, Inf. Sci. 345 (2016) 340–354, https://doi.org/10.1016/j.ins.2016.01.082.
[35] U.R. Acharya, H. Fujita, O.S. Lih, et al., Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network, Knowl. Base Syst. 132 (2017) 62–71, https://doi.org/10.1016/j.knosys.2017.06.003.
[36] S.L. Oh, E.Y.K. Ng, R.S. Tan, U.R. Acharya, Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats, Comput. Biol. Med. 102 (2018) 278–287, https://doi.org/10.1016/j.compbiomed.2018.06.002.
[53] S. Fazeli, ECG heartbeat categorization dataset, in: Kaggle, 2018. https://www.kaggle.com/shayanfazeli/heartbeat. Accessed 21 Aug 2020.
[54] M. Kachuee, S. Fazeli, M. Sarrafzadeh, ECG heartbeat classification: a deep transferable representation, in: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018, 2018, pp. 443–444.
[55] A. Somasundaram, U.S. Reddy, Data imbalance: effects and solutions for classification of large and highly imbalanced data, Proc 1st Int Conf Res Eng Comput Technol (ICRECT 2016) (2016) 28–34.
[56] K. Madasamy, M. Ramaswami, Data imbalance and classifiers: impact and solutions from a big data perspective, Int. J. Comput. Intell. Res. 13 (2017) 2267–2281.
[57] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique, 2002.
[58] Younes Charfaoui, Resampling to properly handle imbalanced datasets in machine learning, in: Heartbeat, 2019. https://heartbeat.fritz.ai/resampling-to-properly-handle-imbalanced-datasets-in-machine-learning-64d82c16ceaa. Accessed 28 Aug 2020.
[59] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE (1998) 1–46, https://doi.org/10.1109/5.726791.
[60] M. Tan, Q.V. Le, EfficientNet: rethinking model scaling for convolutional neural networks, in: 36th Int Conf Mach Learn ICML 2019 2019-June, 2019, pp. 10691–10700.
[61] Vardan Agarwal, Complete architectural details of all EfficientNet Models, in: TowardsDataScience, 2020. https://towardsdatascience.com/complete-architectural-details-of-all-efficientnet-models-5fd5b736142. Accessed 30 Jan 2022.
[62] A. Escontrela, Convolutional Neural Networks from the Ground up - Towards Data Science. towardsdatascience.com, 2020.
[63] A. Kabir Anaraki, M. Ayati, F. Kazemi, Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms, Biocybern. Biomed. Eng. 39 (2019) 63–74, https://doi.org/10.1016/j.bbe.2018.10.004.
[64] M. Phi, Illustrated Guide to LSTM's and GRU's: a step by step explanation, in: Medium.com, 2018. https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21. Accessed 23 Aug 2020.
[65] X. Yuan, L. Li, Y. Wang, Nonlinear dynamic soft sensor modeling with supervised long short-term memory network, IEEE Trans. Ind. Inf. 16 (2020) 3168–3176, https://doi.org/10.1109/TII.2019.2902129.
[66] N. Omkar, Activation functions with derivative and Python code: sigmoid vs tanh vs Relu, in: Medium.com, 2019. https://medium.com/@omkar.nallagoni/activation-functions-with-derivative-and-python-code-sigmoid-vs-tanh-vs-relu-44d23915c1f4.
[67] V. Kotu, B. Deshpande, Data science process, Data Sci. (2019) 19–37, https://doi.org/10.1016/b978-0-12-814761-0.00002-2.
[68] E. Lutins, Ensemble methods in machine learning: what are they and why use them?, in: towardsdatascience.com, 2017. https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f. Accessed 11 Dec 2020.
[69] J. Leonel, Hyperparameters in machine/deep learning, in: Medium.com, 2019. https://medium.com/@jorgesleonel/hyperparameters-in-machine-deep-learning-ca69ad10b981. Accessed 23 May 2020.
[70] K.N.V.P.S. Rajesh, R. Dhuli, Classification of imbalanced ECG beats using re-sampling techniques and AdaBoost ensemble classifier, Biomed. Signal Process Control 41 (2018) 242–254, https://doi.org/10.1016/j.bspc.2017.12.004.
[71] J. Jiang, H. Zhang, D. Pi, C. Dai, A novel multi-module neural network system for imbalanced heartbeats classification, Expert Syst. Appl. X 1 (2019), 100003.
[72] J. Gao, H. Zhang, P. Lu, Z. Wang, An effective LSTM recurrent network to detect arrhythmia on imbalanced ECG dataset, J Healthc Eng (2019), https://doi.org/10.1155/2019/6320651.
[73] A. Darmawahyuni, S. Nurmaini, Sukemi, et al., Deep learning with a recurrent network structure in the sequence modeling of imbalanced data for ECG-rhythm classifier, Algorithms 12 (2019) 1–12, https://doi.org/10.3390/a12060118.
[74] L.D. Sharma, R.K. Sunkaria, Myocardial infarction detection and localization using optimal features based lead specific approach, IRBM 41 (2020) 58–70, https://doi.org/10.1016/j.irbm.2019.09.003.
[75] S. Samir Abdelmoneem, H. Hanafy Said, A. Anwar Saad, Arrhythmia disease classification and mobile based system design, J Phys Conf Ser 1447 (2020), https://doi.org/10.1088/1742-6596/1447/1/012014.
[76] G. Petmezas, K. Haris, L. Stefanopoulos, et al., Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets, Biomed. Signal Process Control 63 (2021), https://doi.org/10.1016/j.bspc.2020.102194.