
TELKOMNIKA Telecommunication Computing Electronics and Control

Vol. 22, No. 1, February 2024, pp. 76~85


ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v22i1.25357

Integration of PSO-based advanced supervised learning techniques for classification data mining to predict heart failure

Mesran¹, Remuz Mb Kmurawak², Agus Perdana Windarto³

¹ Department of Computer Science, Universitas Budi Darma, Medan, Indonesia
² Department of Information Systems, Universitas Cenderawasih, Jayapura, Indonesia
³ Department of Information Systems, STIKOM Tunas Bangsa, Pematang Siantar, Indonesia

Article Info

Article history:
Received May 26, 2023
Revised Aug 16, 2023
Accepted Aug 30, 2023

Keywords:
Advanced supervised learning
Classification
Heart failure
Optimization
Particle swarm optimization

ABSTRACT

Heart failure (HF) is a global health threat, requiring urgent research in its classification. This study
proposes a novel approach for HF classification by integrating advanced supervised learning (ASL) and
particle swarm optimization (PSO). ASL techniques like bagging and AdaBoost are employed within the
PSO+ASL optimization model to enhance prediction accuracy. PSO optimizes model weights and bias,
while ASL addresses overfitting or underfitting issues. Split validation and cross-validation (70:30, 80:20,
90:10 with k-fold=10) are used for further optimization. The testing phase involves 12 classifiers in five
groups: decision tree models (DTM), support vector machines (SVM), Naïve Bayes classifier models
(NBCM), logistic regression models (LRM), and lazy models (LM). The proposed approach is evaluated
with an HF patient dataset from https://www.kaggle.com, and results are compared across the standard
model, PSO optimization, and PSO+ASL. Experimental findings demonstrate the superiority of the proposed
approach, achieving higher accuracy in HF prediction. The PSO+ASL optimization model with the k-nearest
neighbor (k-NN) method exhibits the best classification performance. It consistently achieves the highest
accuracy across all tests on dataset composition ratios, with 100% accuracy, f-measure, sensitivity, and
specificity values, and an area under the curve (AUC) of 1. The proposed approach serves as a reliable tool
for early detection and prevention of HF.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Mesran
Department of Computer Science, Universitas Budi Darma
Jl. Sisingamangaraja No. 338, Siti Rejo I, Medan Kota District, Medan City, North Sumatra, Indonesia
Email: [email protected]

1. INTRODUCTION
Heart failure (HF) is a medical condition that is characterized by a complex set of symptoms rather
than a specific disease [1]. It occurs when the ventricle struggles to fill or empty with blood, making it
challenging for the heart to meet the body’s circulation needs. Common symptoms include shortness of
breath, swollen ankles, and fatigue, while signs such as high jugular venous pressure, pulmonary crackles,
and peripheral edema may also be present, indicating structural and/or functional cardiac or non-cardiac
abnormalities [2], [3]. In Indonesia, heart disease is the leading cause of death, and HF represents a
significant portion of these cases [4]. Approximately 5% of the country’s population is estimated to suffer
from HF [5]. Furthermore, the fatality rate is significant, with up to 17.2% of all HF patients dying during
their initial hospitalization, regardless of a history of heart attacks. Additionally, 11.3% of patients died
within a year of starting treatment, while another 17% required repeated hospitalizations due to worsening

Journal homepage: http://telkomnika.uad.ac.id



HF. These patients are typically hospitalized at least once a year after diagnosis, with an average age of 58.
Data from the Basic Health Research Data (Riskesdas) for 2013 and 2018 show an increasing trend in heart
disease, rising from 0.5% in 2013 to 1.5% in 2018. Heart disease, including HF, is associated with significant
healthcare costs, with IDR 7.7 trillion spent on it in 2021, according to data from the Social Security
Administering Body for Health (BPJS). These statistics emphasize the importance of early detection and
treatment of HF. Traditional diagnosis of HF relies on the patient’s medical history, physical tests, and the
doctor’s examination of related symptoms [3], [6]. Angiographic techniques are one of the most reliable
conventional methods for diagnosing HF [7]. However, this method requires specialized expertise and comes
with a high cost and potential side effects [8].
While there have been efforts to achieve high predictive performance and identify relevant risk
factors associated with HF, the emergence of artificial intelligence (AI) tools and machine learning (ML)
algorithms [9], [10] in recent years has provided powerful diagnostic aids [11]. These tools can extract
knowledge from large amounts of data, which may be difficult or impossible for humans to achieve [12], [13].
By employing ML-based decision-making approaches, doctors can detect the risk of HF and provide
necessary treatments and recommendations to manage these risks [14]. Early detection and treatment using
ML techniques have the potential to significantly improve patient survival rates. Consequently, several studies
have utilized ML for the diagnosis [15]–[19] and prediction of HF, such as determining the likelihood of a
patient having a disease history that may cause HF, such as hypertension, diabetes, or hyperlipidemia [20]–[23].
Various classification algorithms, including decision trees [24]–[26], support vector machines (SVM) [27],
Naïve Bayes [28], and neural networks [29] have been used for HF prediction. Despite these efforts,
accurately predicting HF remains a significant challenge. Comparison and benchmarking results of ML
classifiers have shown no significant differences in performance [30], and no single classifier has proven to
be the best for all datasets.
Our study aims to address the existing gap in accurately predicting heart failure using machine
learning techniques. Despite various efforts, no single classifier has proven to be the best for all datasets. In
this research, we present a novel approach that incorporates advanced supervised learning (ASL) [29]–[31]
and particle swarm optimization (PSO) [32], [33] techniques to optimize classification results. Moreover, we
employed split and cross-validation techniques with varying composition ratios of 70:30, 80:20, and 90:10,
using k-fold=10, and tested twelve classifiers sorted into five groups: decision tree models (DTM), SVM,
Naïve Bayes classifier models (NBCM), logistic regression models (LRM), and lazy models (LM). The
selection of these classifiers was based on several considerations. Firstly, previous studies have shown that
various classification algorithms, such as decision trees [22]–[24], SVM [25], Naïve Bayes [26], and neural
networks [27], have been used for HF prediction. These algorithms have demonstrated their effectiveness in
handling complex datasets and have been widely employed in HF research. Secondly, the rationale behind
choosing multiple classifiers lies in the understanding that no single classifier has proven to be the best for all
datasets or consistently outperforms others. Comparison and benchmarking results of ML classifiers have
shown no significant differences in performance [28]. Therefore, by employing a diverse set of classifiers,
the paper aims to explore the strengths and weaknesses of each algorithm and identify the most suitable
classifiers for HF classification. By evaluating the PSO and ASL algorithms on 12 classifiers grouped into
five categories, this study aims to assess the strengths and weaknesses of each classifier and determine the
most appropriate one for HF classification. This research makes a significant contribution by offering a more
precise approach to diagnosing heart failure, leading to early detection and improved patient outcomes.
Furthermore, our findings can guide future research endeavors aimed at enhancing the diagnosis and
treatment of heart failure. The integration of AI and ML techniques [31], [32] in healthcare holds great
promise for enhancing patient well-being and reducing healthcare expenses.

2. METHOD
The primary objective of this study is to enhance the classification performance of 12 classifiers
through the integration of ASL and PSO techniques. A comprehensive evaluation of classifier performance
was conducted using a combination of split tests and cross-validation. The training and test data were
partitioned into different ratios, namely 70:30, 80:20, and 90:10, with a k-fold value of 10. To assess the
effectiveness of the proposed model, data from HF patients were employed. By subjecting the classifiers to
this dataset, the study aimed to improve their classification performance.
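The validation scheme described above can be sketched in plain Python. This is only one plausible reading of how split validation and 10-fold cross-validation are combined (hold out a test portion first, then cross-validate within the training portion); the function names `split_indices` and `k_fold`, the shuffling seed, and the rounding of the cut point are illustrative assumptions rather than details given in the paper.

```python
import random

def split_indices(n, train_pct, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_pct / 100)
    return idx[:cut], idx[cut:]

def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k folds of near-equal size."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

N_SAMPLES = 918  # size of the combined dataset reported in the paper

for train_pct in (70, 80, 90):
    train_idx, test_idx = split_indices(N_SAMPLES, train_pct)
    val_sizes = [len(val) for _, val in k_fold(train_idx, k=10)]
    print(f"{train_pct}:{100 - train_pct}", len(train_idx), len(test_idx), val_sizes)
```

Each classifier would then be fitted on the `train` part of every fold, scored on the matching validation part, and finally checked against the held-out test indices.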

2.1. Data preparation and processing


For this study, a dataset combining five heart disease datasets from different sources was obtained from
Kaggle (https://www.kaggle.com). The source datasets are Cleveland (303 observations), Hungarian (294
observations), Switzerland (123 observations), Long Beach VA (200 observations), and Statlog (Heart)
(270 observations). The sources total 1,190 observations; after duplicate records are removed, the combined
dataset consists of 918 observations and encompasses

twelve variables, with eleven variables serving as inputs and one variable acting as the output (label). Each
variable’s subset was tailored according to the specific requirements of the study. The subsequent section
provides a comprehensive description of the variables utilized in the HF study.
The study utilized a sample dataset (complete data can be accessed at https://shorturl.at/klvS2), as
presented in Table 1, consisting of various parameters related to patients. These parameters include the age of
the patient in years, the sex of the patient (M for male and F for female), the type of chest pain experienced
(TA for typical angina, ATA for atypical angina, NAP for non-anginal pain, and ASY for asymptomatic), the
resting blood pressure (RestingBP) in mm Hg, the serum cholesterol level in mg/dl, the fasting blood sugar
(FastingBS) (1 if FastingBS > 120 mg/dl, 0 otherwise), the results of the resting electrocardiogram
(RestingECG) (normal, ST for ST-T wave abnormality, and LVH for probable or definite left ventricular
hypertrophy), the maximum heart rate achieved (MaxHR) (numeric value between 60 and 202), the presence
of exercise-induced angina (Y for yes and N for no), the Oldpeak value (ST depression at exercise), the slope of
the peak exercise ST segment (up for upsloping, flat for flat, and down for downsloping), and the output class
indicating the presence of heart disease (1 for HF and 0 for normal).
The table represents data for predicting heart failure. It includes information about patients’ age,
gender, chest pain type, resting blood pressure (RestingBP), cholesterol levels, fasting blood sugar
(FastingBS), the results of the resting electrocardiogram (RestingECG), the maximum heart rate achieved
(MaxHR), exercise-induced angina, ST depression at exercise, ST slope, and heart disease condition. The
data consists of 918 patients, where each row represents one patient’s information. This data can be used to
build a predictive model that will help identify the risk factors associated with heart failure. Through
analyzing this data, researchers can gain insights into patterns or correlations between the different variables
that may contribute to the onset of heart failure. Ultimately, this data has tremendous potential to inform
clinical decisions and improve patient outcomes.
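Most classifiers need numeric inputs, so the categorical variables described above have to be encoded before training. The sketch below shows one simple way to do that for a single patient record. The dictionary keys, the integer codes, and the `encode` function are hypothetical choices made for illustration; the paper does not state which encoding was used, and one-hot encoding would be an equally reasonable alternative.

```python
# Category-to-integer maps. The integer codes are arbitrary illustrative choices.
SEX = {"M": 1, "F": 0}
CHEST_PAIN = {"TA": 0, "ATA": 1, "NAP": 2, "ASY": 3}
RESTING_ECG = {"Normal": 0, "ST": 1, "LVH": 2}
EXERCISE_ANGINA = {"N": 0, "Y": 1}
ST_SLOPE = {"Up": 0, "Flat": 1, "Down": 2}

def encode(record):
    """Turn one patient record (a dict) into 11 numeric features and a 0/1 label."""
    features = [
        record["Age"],
        SEX[record["Sex"]],
        CHEST_PAIN[record["ChestPainType"]],
        record["RestingBP"],
        record["Cholesterol"],
        record["FastingBS"],
        RESTING_ECG[record["RestingECG"]],
        record["MaxHR"],
        EXERCISE_ANGINA[record["ExerciseAngina"]],
        record["Oldpeak"],
        ST_SLOPE[record["ST_Slope"]],
    ]
    label = 1 if record["HeartDisease"] == "HF" else 0
    return features, label

# Patient 4 from Table 1.
patient_4 = {
    "Age": 48, "Sex": "F", "ChestPainType": "ASY", "RestingBP": 138,
    "Cholesterol": 214, "FastingBS": 0, "RestingECG": "Normal", "MaxHR": 108,
    "ExerciseAngina": "Y", "Oldpeak": 1.5, "ST_Slope": "Flat", "HeartDisease": "HF",
}
print(encode(patient_4))
```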

2.2. Proposed model architecture


The proposed model architecture aims to enhance the accuracy of predicting HF by leveraging
advanced ML techniques, such as PSO-based algorithms and supervised learning algorithms. Through the
selection of pertinent features and optimization of model parameters, this approach enables more precise
predictions, which can be instrumental for healthcare professionals in making informed decisions regarding
patient care. To ensure the robustness of the proposed approach, a combination of split validation and
cross-validation was implemented, utilizing different composition ratios of 70:30, 80:20, and 90:10, with a
k-fold value of 10. To evaluate the effectiveness of the model, twelve classifiers were employed and grouped
into five categories, namely DTM, SVM, NBCM, LRM, and LM.
The confusion matrix and area under the receiver operating characteristic curve (AUC) are utilized
for model evaluation in the classification task due to their ability to comprehensively assess the performance
of the classification model [33], [34]. The confusion matrix allows for a detailed analysis of the model’s
predictions compared to the actual labels, enabling an evaluation of its accuracy in classifying instances into
different classes. Additionally, the selection of AUC serves to measure the overall performance of the
classifier [35]. AUC represents the classifier’s capacity to distinguish between positive and negative
examples at various classification thresholds. It provides a concise summary of the classifier’s performance
in a single value, making it particularly valuable when adjusting the classification threshold based on specific
applications or domains [36].
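The metrics reported later (accuracy, f-measure, sensitivity, and specificity) all follow from the four cells of the confusion matrix. The following minimal sketch shows the standard definitions; the helper names and the toy label vectors are assumptions introduced for illustration and are not data from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Derive the usual summary metrics from the confusion-matrix cells."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": 2 * precision * sensitivity / denom if denom else 0.0,
    }

# Toy labels (1 = HF, 0 = normal), invented for this example only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)            # (3, 1, 3, 1)
print(metrics(*counts))  # every metric equals 0.75 for this toy case
```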

Table 1. Dataset
No   Age  Sex  Chest pain type  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  Exercise angina  Oldpeak  ST_slope  Heart disease
1    40   M    ATA              140        289          0          Normal      172    N                0        Up        Normal
2    49   F    NAP              160        180          0          Normal      156    N                1        Flat      HF
3    37   M    ATA              130        283          0          ST          98     N                0        Up        Normal
4    48   F    ASY              138        214          0          Normal      108    Y                1.5      Flat      HF
5    54   M    NAP              150        195          0          Normal      122    N                0        Up        Normal
6    39   M    NAP              120        339          0          Normal      170    N                0        Up        Normal
7    45   F    ATA              130        237          0          Normal      170    N                0        Up        Normal
8    54   M    ATA              110        208          0          Normal      142    N                0        Up        Normal
...  ...  ...  ...              ...        ...          ...        ...         ...    ...              ...      ...       ...
914  45   M    TA               110        264          0          Normal      132    N                1.2      Flat      HF
915  68   M    ASY              144        193          1          Normal      141    N                3.4      Flat      HF
916  57   M    ASY              130        131          0          Normal      115    Y                1.2      Flat      HF
917  57   F    ATA              130        236          0          LVH         174    N                0        Flat      HF
918  38   M    NAP              138        175          0          Normal      173    N                0        Up        Normal


2.3. Model training and evaluation


The implementation of ASL and PSO techniques for data mining classification in predicting HF
involves several key steps. These steps, including data preparation, model selection, hyperparameter
optimization, training, evaluation, and reporting, are crucial for constructing a precise and reliable
classification model for HF prediction. It is worth noting that during the training phase, each model utilizes
k-fold=10, a cross-validation technique. This ensures robustness and generalizability of the models. Figure 1
provides a visual representation of these steps, highlighting their significance in the overall process.
Furthermore, Table 2 presents the optimization technique used in the study, a combination of PSO and
ASL, as pseudocode that outlines the combined approach step by step. The table serves as a reference point
for understanding the methodology employed in the study and shows how PSO and ASL are integrated in the
optimization process.
The pseudocode for ASL can be explained as follows. The ASL algorithm takes as input the training data T,
the number of base classifiers B, the subspace size S, the learning rate alpha, and the number of iterations
(also written T in Table 2, although it is a separate quantity from the training data). It aims to create an
ensemble classifier model. The algorithm starts by initializing the base classifiers and their corresponding
weights. For each base classifier, a random subspace of features is selected. The base classifier is trained
using a subset of the training data with the selected features. The weight for each base classifier is
calculated based on its classification error on a validation set. Next, the base classifiers are combined using
weighted majority voting. For each instance in the training data, the ensemble output vector Y is initialized
to zero. Each base classifier classifies the instance, and the weighted output is added to the ensemble output.
The ensemble output vector is then normalized to obtain a probability distribution. The weights for the base
classifiers are updated based on the error rate on this instance. The algorithm repeats this process for the
specified number of iterations. Finally, the ensemble classifier model is returned. These algorithms will be
compared with a standard classification model consisting of 12 classifiers.
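The core of the ASL description, base classifiers trained on random feature subspaces and combined by weighted majority voting, can be illustrated with a small self-contained sketch. This is not the authors' implementation. It assumes decision stumps as base classifiers, bootstrap resampling for the bagging part, an AdaBoost-style weight 0.5·ln((1−err)/err), and it computes each error on the training data rather than on a separate validation set; the per-instance weight update and outer iterations of Table 2 are omitted. All names and the toy data are hypothetical.

```python
import math
import random

def train_stump(X, y, features):
    """Pick the (feature, threshold, polarity) with the lowest error on (X, y)."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [1 if polarity * (row[f] - thr) > 0 else 0 for row in X]
                err = sum(p != t for p, t in zip(preds, y)) / len(y)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best[1:]

def stump_predict(stump, row):
    f, thr, polarity = stump
    return 1 if polarity * (row[f] - thr) > 0 else 0

def fit_ensemble(X, y, n_base=5, subspace=2, seed=0):
    """Train n_base stumps, each on a bootstrap sample and a random feature subspace."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    ensemble = []
    for _ in range(n_base):
        feats = rng.sample(range(n_features), subspace)   # random subspace
        boot = [rng.randrange(n) for _ in range(n)]       # bootstrap sample (bagging)
        stump = train_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        err = sum(stump_predict(stump, r) != t for r, t in zip(X, y)) / n
        err = min(max(err, 1e-6), 1 - 1e-6)               # keep the logarithm finite
        weight = 0.5 * math.log((1 - err) / err)          # AdaBoost-style weight
        ensemble.append((stump, weight))
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote: each stump votes +1 (HF) or -1 (normal)."""
    score = sum(w * (1 if stump_predict(s, row) == 1 else -1) for s, w in ensemble)
    return 1 if score > 0 else 0

# Toy data with three features, invented for illustration.
X_demo = [[1, 10, 0.1], [2, 11, 0.2], [3, 12, 0.3], [4, 13, 0.4],
          [6, 20, 0.9], [7, 21, 0.8], [8, 22, 0.7], [9, 23, 0.6]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_ensemble(X_demo, y_demo, n_base=7, subspace=2, seed=1)
train_acc = sum(predict(model, r) == t for r, t in zip(X_demo, y_demo)) / len(y_demo)
print("training accuracy:", train_acc)
```

A stump whose error exceeds 0.5 receives a negative weight, so its vote is effectively inverted rather than discarded.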
In simple terms, Table 2 shows how the PSO and ASL algorithms can improve classification performance in
predicting heart failure. The PSO algorithm can be used to select an optimal feature subset from the heart
failure dataset, while ASL combines bagging and boosting techniques to form a more reliable ensemble
classifier with diverse base classifiers. The output from the ensemble classifier can then be used as input for
the PSO algorithm to optimize the parameters in the classification model. By using these two algorithms
together, the quality of the output from each base classifier can be improved, and the most important features
can be selected to form the feature subspace, resulting in a more accurate and reliable classification model
for predicting heart failure.

Figure 1. Proposed model


Table 2. Pseudocode combination algorithms

Algorithm 1. PSO
    initialize population of particles
    for each particle in population do:
        initialize particle position and velocity
        evaluate particle fitness
        update personal best position and fitness
    end for

    initialize global best position and fitness
    repeat until termination condition is met do:
        for each particle in population do:
            update particle velocity based on current and previous positions
            update particle position
            evaluate particle fitness
            if particle has better fitness than personal best then:
                update personal best position and fitness
            end if
            if particle has better fitness than global best then:
                update global best position and fitness
            end if
        end for
    end repeat

Algorithm 2. ASL (bagging + boosting)
    input: training data T, number of base classifiers B, subspace size S, learning rate alpha,
           number of iterations T
    output: ensemble classifier model
    for t = 1 to T do:
        // Initialize base classifiers and weights
        for b = 1 to B do:
            // Randomly select subspace of features
            select S features at random
            // Train base classifier on subspace of features
            train base classifier using subset of T with selected features
            // Calculate weight for each base classifier
            calculate weight for base classifier based on classification error on validation set
        end for
        // Combine base classifiers using weighted majority voting
        for each test instance in T do:
            initialize ensemble output vector Y to zero
            for b = 1 to B do:
                // Classify instance using base classifier and add to ensemble output
                classify test instance using base classifier b and add weighted output to Y
            end for
            // Normalize output vector to get probability distribution
            normalize Y
            // Update weights for base classifiers based on error rate on this instance
            update weight for each base classifier based on error rate on this instance
        end for
    end for

    // Return ensemble classifier model
    return ensemble model
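Algorithm 1 can be turned into a short runnable sketch. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the `sphere` objective are assumptions chosen for illustration, since the paper does not list its PSO settings. The sketch minimizes its fitness function; in the study's setting, the fitness would instead be something like a classifier's validation error for a candidate parameter vector.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialize positions, velocities, and personal bests.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    # Initialize the global best from the personal bests.
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_fit = pso(sphere, dim=3)
print(best, best_fit)
```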

In addition to developing an accurate and robust model, it is also essential to evaluate the model’s
accuracy in predicting HF. This is carried out through the confusion matrix and the receiver operating
characteristic (ROC) curve with its area under the curve (AUC). The ROC curve is created from values
calculated from the confusion matrix and plots the true positive rate (TPR) against the false positive rate
(FPR), where:
a) FPR = False Positive / (False Positive + True Negative);
b) TPR = True Positive / (True Positive + False Negative).
A classifier is rated BAD if the resulting curve lies close to the baseline, the diagonal that starts at the
point (0, 0), and GOOD if the curve approaches the point (0, 1).
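To make the FPR and TPR definitions concrete, the sketch below sweeps a decision threshold over predicted scores, collects the resulting (FPR, TPR) points, and integrates them with the trapezoidal rule to obtain the AUC. The function names and the toy scores are assumptions for illustration, and the code expects both classes to be present in the labels.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Instances with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example (1 = HF, 0 = normal), invented for illustration.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(y_true, scores)))        # 8/9, about 0.889
print(auc(roc_points(y_true, [0.5] * 6)))     # 0.5: constant scores give the baseline
```

A curve that reaches the point (0, 1) yields an AUC of 1, while a model that assigns the same score to every instance stays on the diagonal with an AUC of 0.5.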

3. RESULTS AND DISCUSSION


This section presents the experiments conducted and the results obtained for 12 standard model
classifiers, PSO optimization, and PSO+ASL optimization. The aim is to compare and explore which model
produced the best results for the HF classification. To evaluate these models, a combination of split and
cross-validation was used with different compositions of 70:30, 80:20, and 90:10, with k-fold=10. The
model performance was evaluated using various metrics such as accuracy, f-measure, sensitivity, specificity, and
AUC. The evaluation steps for each model are summarized in Table 3, and the AUC values for each model
are presented in Table 4.
Across all 12 classifiers listed in Table 3, there was a noticeable improvement in accuracy for
both the PSO and PSO+ASL optimization models. Compared to the standard model, the gains ranged from
under 1 to about 35 percentage points. Notably, k-NN showed a large improvement for all dataset ratios,
with an average increase of 27.99 percentage points across both optimization models. The AUC values for
the classifier models, as summarized in Table 4, also showed improvement ranging from 0.0204 to 0.077
compared to the standard model. In this case, k-NN with PSO+ASL achieved a “very good classification”
with an AUC value of 1 for all dataset ratio compositions.
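The 27.99 figure quoted above is consistent with averaging the six k-NN accuracy gains (PSO and PSO+ASL at each of the three ratios) computed from Table 3. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
# k-NN accuracy (%) from Table 3, ordered 70:30, 80:20, 90:10.
standard = [68.10, 66.86, 64.99]
pso = [100.0, 83.92, 83.92]
pso_asl = [100.0, 100.0, 100.0]

# Gain over the standard model, in percentage points.
gains_pso = [round(p - s, 2) for p, s in zip(pso, standard)]
gains_pso_asl = [round(p - s, 2) for p, s in zip(pso_asl, standard)]

all_gains = gains_pso + gains_pso_asl
mean_gain = round(sum(all_gains) / len(all_gains), 2)

print(gains_pso)      # [31.9, 17.06, 18.93]
print(gains_pso_asl)  # [31.9, 33.14, 35.01]
print(mean_gain)      # 27.99
```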
From Table 5, it can be observed that the combination of PSO and ASL yields larger accuracy
improvements than PSO alone for many, but not all, classifiers. In the 70:30 dataset split, PSO+ASL gave
clearly larger gains for gradient boosted tree, random tree, and Naïve Bayes (Kernel), and smaller additional
gains for decision tree, SVM (LibSVM), Naïve Bayes, and LR (SVM). SVM and k-NN showed no change
relative to PSO, while random forest, SVM (linear), and LR gained less than with PSO alone. In the 80:20
dataset split, PSO+ASL matched or exceeded PSO for most classifiers, with the largest increases seen in
gradient boosted tree, random tree, and k-NN; the exceptions were random forest, SVM (LibSVM), and
LR (SVM). In the 90:10 dataset split, PSO+ASL improved on PSO for every classifier except random forest.
Overall, the use of PSO+ASL can improve classification performance for most classifier types and dataset
splits, but the appropriate algorithm should be chosen depending on the characteristics of the dataset used for
predicting heart failure. The information provided in Table 5 is represented graphically in Figure 2.
The results obtained from grouping the classifiers, as depicted in Figure 3, reveal that the LM
group achieved the highest average accuracy, reaching 100% with PSO+ASL. Averaged over the PSO and
PSO+ASL models, this corresponds to an increase of 31.9, 25.1, and 26.97 percentage points for the 70:30,
80:20, and 90:10 ratios, respectively, when compared to the standard model. For more detailed information
regarding the average accuracy value per group, please refer to Table 6.
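The group averages in Table 6 are plain means of the per-classifier accuracies in Table 3, and the LM increases quoted above follow from the LM row. The sketch below reproduces both calculations for the standard model at the 70:30 ratio and for the LM group; the dictionary layout and names are illustrative assumptions.

```python
# Standard-model accuracy (%) at the 70:30 ratio, copied from Table 3.
standard_70_30 = {
    "DTM": [87.25, 85.07, 84.45, 70.44],
    "SVM": [86.94, 72.63, 87.10],
    "NBCM": [86.47, 84.91],
    "LRM": [86.47, 85.85],
    "LM": [68.10],
}
group_mean = {g: round(sum(v) / len(v), 2) for g, v in standard_70_30.items()}
print(group_mean)  # matches the first column of Table 6

# LM row of Table 6: (standard, PSO, PSO+ASL) accuracy per ratio.
lm = {
    "70:30": (68.10, 100.0, 100.0),
    "80:20": (66.86, 83.92, 100.0),
    "90:10": (64.99, 83.92, 100.0),
}
# Mean of the PSO gain and the PSO+ASL gain over the standard model.
lm_gain = {r: round(((p - s) + (a - s)) / 2, 2) for r, (s, p, a) in lm.items()}
print(lm_gain)     # {'70:30': 31.9, '80:20': 25.1, '90:10': 26.97}
```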

Table 3. Evaluation for each classifier model (accuracy (%))
                             70:30                         80:20                         90:10
Group Classifier             Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL
DTM   Decision tree          87.25     88.34     88.81     85.55     87.33     87.34     85.59     86.43     87.40
      Random forest          85.07     87.72     87.25     86.10     87.87     87.73     84.86     87.05     86.43
      Gradient boosted tree  84.45     87.42     100       86.38     88.69     100       86.08     87.77     100
      Random tree            70.44     82.10     92.53     74.55     82.56     91.96     79.17     81.49     91.65
SVM   SVM                    86.94     87.87     87.87     85.83     86.77     87.60     85.47     87.05     87.29
      SVM (LibSVM)           72.63     81.65     83.36     72.21     85.83     81.88     72.41     79.66     85.59
      SVM (linear)           87.10     88.49     87.71     85.68     87.32     87.60     85.59     86.91     87.29
NBCM  Naïve Bayes            86.47     88.01     88.80     86.93     88.01     88.01     86.45     87.77     88.62
      Naïve Bayes (Kernel)   84.91     89.24     90.98     85.01     87.05     90.05     85.10     87.54     89.35
LRM   LR                     86.47     88.80     88.34     86.51     87.21     87.74     86.20     87.30     87.53
      LR (SVM)               85.85     88.02     88.49     84.74     87.74     87.60     84.98     87.05     87.29
LM    k-NN                   68.10     100       100       66.86     83.92     100       64.99     83.92     100

Table 4. Evaluation for each classifier model (AUC)
                             70:30                         80:20                         90:10
Group Classifier             Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL
DTM   Decision tree          0.863     0.8830    0.9180    0.847     0.8690    0.9140    0.843     0.8490    0.9060
      Random forest          0.896     0.9080    0.9240    0.908     0.9130    0.9230    0.908     0.9150    0.9220
      Gradient boosted tree  0.922     0.9240    1         0.93      0.9260    1         0.927     0.9240    1
      Random tree            0.702     0.8180    0.9480    0.735     0.8360    0.9450    0.804     0.8250    0.9350
SVM   SVM                    0.928     0.9290    0.9330    0.925     0.9250    0.9240    0.923     0.9180    0.9180
      SVM (LibSVM)           0.784     0.8960    0.8460    0.774     0.9120    0.8370    0.778     0.8540    0.8500
      SVM (linear)           0.923     0.9340    0.9320    0.921     0.9240    0.9240    0.922     0.9170    0.9180
NBCM  Naïve Bayes            0.913     0.9260    0.9450    0.919     0.9250    0.9280    0.917     0.9230    0.9260
      Naïve Bayes (Kernel)   0.898     0.9450    0.9700    0.905     0.9220    0.9620    0.907     0.9120    0.9520
LRM   LR                     0.931     0.9360    0.9260    0.927     0.9290    0.9120    0.926     0.9260    0.9030
      LR (SVM)               0.932     0.9330    0.8900    0.925     0.9250    0.8830    0.924     0.9220    0.8710
LM    k-NN                   0.5       0.5000    1         0.5       0.5000    1         0.5       0.5000    1

Table 5. The improved average accuracy of each classifier
                             70:30               80:20               90:10
Group Classifier             PSO       PSO+ASL   PSO       PSO+ASL   PSO       PSO+ASL
DTM   Decision tree          1.09      1.56      1.78      1.79      0.84      1.81
      Random forest          2.65      2.18      1.77      1.63      2.19      1.57
      Gradient boosted tree  2.97      15.55     2.31      13.62     1.69      13.92
      Random tree            11.66     22.09     8.01      17.41     2.32      12.48
SVM   SVM                    0.93      0.93      0.94      1.77      1.58      1.82
      SVM (LibSVM)           9.02      10.73     13.62     9.67      7.25      13.18
      SVM (linear)           1.39      0.61      1.64      1.92      1.32      1.70
NBCM  Naïve Bayes            1.54      2.33      1.08      1.08      1.32      2.17
      Naïve Bayes (Kernel)   4.33      6.07      2.04      5.04      2.44      4.25
LRM   LR                     2.33      1.87      0.70      1.23      1.10      1.33
      LR (SVM)               2.17      2.64      3.00      2.86      2.07      2.31
LM    k-NN                   31.90     31.90     17.06     33.14     18.93     35.01


Figure 2. Improvement in the average accuracy of each classifier (70:30, 80:20, 90:10)

Figure 3. Average accuracy by group (70:30, 80:20, 90:10)

Table 6. The results of the average value of accuracy by group
       70:30                         80:20                         90:10
Group  Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL   Standard  PSO       PSO+ASL
DTM    81.80     86.40     92.15     83.15     86.61     91.76     83.93     85.69     91.37
SVM    82.22     86.00     86.31     81.24     86.64     85.69     81.16     84.54     86.72
NBCM   85.69     88.63     89.89     85.97     87.53     89.03     85.78     87.66     88.99
LRM    86.16     88.41     88.42     85.63     87.48     87.67     85.59     87.18     87.41
LM     68.10     100       100       66.86     83.92     100       64.99     83.92     100


The combination of PSO and ASL in the HF disease classification study demonstrated that the k-NN
method outperformed all other classifiers across all dataset ratio compositions (70:30, 80:20, and 90:10 with
k-fold=10). The analysis results are visually represented by the performance vector in Figure 4. Specifically,
the true positive (TP) value, representing the number of true positives, is 287, indicating accurate prediction
of HF disease classification. The false positive (FP) value, which represents the number of false positives, is
0, indicating no instances of negative data being incorrectly classified as positive data (70:30 ratio).
Similarly, for dataset ratios of 80:20 and 90:10, the true positive values are 328 and 369, respectively,
indicating correct classification of positive data for HF disease. In both cases, the false positive value remains
at 0, indicating accurate prediction of negative data.

Figure 4. Performance results of the k-NN algorithm (70:30, 80:20, and 90:10)

4. CONCLUSION
The study on the integration of ASL and PSO techniques for classification data mining to predict HF
has yielded promising results. The primary goal of the study was to enhance the accuracy of traditional ML
algorithms in classifying HF patients based on various clinical characteristics. To achieve this, twelve
classifiers were employed and categorized into five groups: DTM, SVM, NBCM, LRM, and LM. The
parameters of these algorithms were optimized using ASL and PSO techniques, while a combination of split
validation and cross-validation with composition ratios of 70:30, 80:20, and 90:10, along with a k-fold value
of 10, was utilized. The results indicated that ASL and PSO techniques outperformed the conventional ML
algorithms in terms of accuracy and AUC. However, it is important to note that the study had certain
limitations, such as a small sample size and the absence of external validation, which warrant further
investigation to assess the effectiveness of ASL and PSO techniques in a broader patient population. In
conclusion, this research demonstrates that the utilization of PSO-based ASL techniques for classification data
mining holds significant implications for clinical practice and improved patient outcomes in predicting HF.

REFERENCES
[1] S. Q. Duong et al., “Identification of patients at risk of new onset heart failure: Utilizing a large statewide health information
exchange to train and validate a risk prediction model,” PLoS One, vol. 16, no. 12, pp. 1–13, 2021, doi:
10.1371/journal.pone.0260885.
[2] S. Saepudin, P. Ball, and H. Morrissey, “Development of prediction model for identifying heart failure patients with high risk of
developing hyponatremia,” J. Kedokt. dan Kesehat. Indones., vol. 10, no. 2, pp. 121–131, 2019, doi:
10.20885/jkki.vol10.iss2.art4.
[3] E. E. Tripoliti, T. G. Papadopoulos, G. S. Karanasiou, K. K. Naka, and D. I. Fotiadis, “Heart Failure: Diagnosis, Severity
Estimation and Prediction of Adverse Events Through Machine Learning Techniques,” Comput. Struct. Biotechnol. J., vol. 15, pp.
26–47, 2017, doi: 10.1016/j.csbj.2016.11.001.
[4] A. W. Sugiyarto, A. M. Abadi, and Sumarna, “Classification of heart disease based on PCG signal using CNN,” Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 19, no. 5, pp. 1697–1706, 2021, doi:
10.12928/TELKOMNIKA.v19i5.20486.


[5] T. N. Nguyen, T. H. Nguyen, and V. T. Ngo, “Artifact elimination in ECG signal using wavelet transform,” Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 18, no. 2, pp. 936–944, 2020, doi: 10.12928/TELKOMNIKA.V18I2.14403.
[6] Yajuan Wang et al., “Early Detection of Heart Failure with Varying Prediction Windows by Structured and Unstructured Data in
Electronic Health Records,” HHS Public Access, vol. 176, no. 1, pp. 139–148, 2018, doi: 10.1109/EMBC.2015.7318907.Early.
[7] T. Chen, S. Zhao, S. Shao, and S. Zheng, “Non-invasive diagnosis methods of coronary disease based on wavelet denoising and
sound analyzing,” Saudi J. Biol. Sci., vol. 24, no. 3, pp. 526–536, 2017, doi: 10.1016/j.sjbs.2017.01.023.
[8] A. F. AlOthman, A. R. W. Sait, and T. A. Alhussain, “Detecting Coronary Artery Disease from Computed Tomography Images
Using a Deep Learning Technique,” Diagnostics, vol. 12, no. 9, 2022, doi: 10.3390/diagnostics12092073.
[9] A. P. Windarto and T. Herawan, Decision Support System on Determination of Contraception Tools as an Effort to Suppress the
Number of Growth Ratios in Indonesia, vol. 730, Springer Nature Singapore Pte Ltd, 2021. doi:
https://doi.org/10.1007/978-981-33-4597-3_69.
[10] A. P. Windarto and T. Herawan, K-Means Algorithm with Rapidminer in Clustering School Participation Rate in Indonesia.
Springer Nature Singapore Pte Ltd, 2021. doi: https://doi.org/10.1007/978-981-33-4597-3_70.
[11] A. Al Bataineh and S. Manacek, “MLP-PSO Hybrid Algorithm for Heart Disease Prediction,” J. Pers. Med., vol. 12, no. 8, 2022,
doi: 10.3390/jpm12081208.
[12] I. D. Mienye and Y. Sun, “Improved heart disease prediction using particle swarm optimization based stacked sparse
autoencoder,” Electron., vol. 10, no. 19, 2021, doi: 10.3390/electronics10192347.
[13] S. I. Novichasari and I. S. Wibisono, “Particle Swarm Optimization For Improved Accuracy of Disease Diagnosis,” J. Appl. Intell.
Syst., vol. 5, no. 2, pp. 57–68, 2020.
[14] C. Krittanawong et al., “Integration of novel monitoring devices with machine learning technology for scalable cardiovascular
management,” Nat. Rev. Cardiol., vol. 18, no. 2, pp. 75–91, 2021, doi: 10.1038/s41569-020-00445-9.
[15] A. Javeed, S. U. Khan, L. Ali, S. Ali, Y. Imrana, and A. Rahman, “Machine Learning-Based Automated Diagnostic Systems
Developed for Heart Failure Prediction Using Different Types of Data Modalities: A Systematic Review and Future Directions,”
Comput. Math. Methods Med., vol. 2022, 2022, doi: 10.1155/2022/9288452.
[16] A. Guo, M. Pasque, F. Loh, D. L. Mann, and P. R. O. Payne, “Heart Failure Diagnosis, Readmission, and Mortality Prediction
Using Machine Learning and Artificial Intelligence Models,” Curr. Epidemiol. Reports, vol. 7, no. 4, pp. 212–219, 2020, doi:
10.1007/s40471-020-00259-w.
[17] D. J. Choi, J. J. Park, T. Ali, and S. Lee, “Artificial intelligence for the diagnosis of heart failure,” npj Digit. Med., vol. 3, no. 1,
2020, doi: 10.1038/s41746-020-0261-3.
[18] D. K. Plati et al., “A machine learning approach for chronic heart failure diagnosis,” Diagnostics, vol. 11, no. 10, pp. 1–15, 2021,
doi: 10.3390/diagnostics11101863.
[19] S. Kordnoori, H. Mostafaei, M. Rostamy-Malkhalifeh, and M. Ostadrahimi, “Diagnosis of Heart Disease Using Feature Selection
Methods Based On Recurrent Fuzzy Neural Networks,” IPTEK J. Technol. Sci., vol. 32, no. 2, p. 64, 2021, doi:
10.12962/j20882033.v32i2.7075.
[20] Q. Bai, C. Su, W. Tang, and Y. Li, “Machine learning to predict end stage kidney disease in chronic kidney disease,” Sci. Rep.,
vol. 12, no. 1, pp. 1–8, 2022, doi: 10.1038/s41598-022-12316-z.
[21] M. E. Grams et al., “Predicting timing of clinical outcomes in patients with chronic kidney disease and severely decreased
glomerular filtration rate,” Kidney Int., vol. 93, no. 6, pp. 1442–1451, 2018, doi: 10.1016/j.kint.2018.01.009.
[22] E. Dovgan et al., “Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney
disease patients,” PLoS One, vol. 15, no. 6, pp. 1–18, 2020, doi: 10.1371/journal.pone.0233976.
[23] C. L. Ramspek et al., “Predicting Kidney Failure, Cardiovascular Disease and Death in Advanced CKD Patients,” Kidney Int.
Reports, vol. 7, no. 10, pp. 2230–2241, 2022, doi: 10.1016/j.ekir.2022.07.165.
[24] M. Ozcan and S. Peker, “A classification and regression tree algorithm for heart disease modeling and prediction,” Healthc. Anal.,
vol. 3, no. December 2022, p. 100130, 2023, doi: 10.1016/j.health.2022.100130.
[25] M. Q. Syafi, “Increasing Accuracy of Heart Disease Classification on C4.5 Algorithm Based on Information Gain Ratio and
Particle Swarm Optimization Using Adaboost Ensemble,” J. Adv. Inf. Syst. Technol., vol. 4, no. 1, pp. 100–112, 2022.
[26] M. K. Iliyas and I. S. Shaikh, “Prediction of Heart Disease Using Decision Tree,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol.
6, no. 3, pp. 530–532, 2016.
[27] E. Owusu, P. Boakye-Sekyerehene, J. K. Appati, and J. Y. Ludu, “Computer-Aided Diagnostics of Heart Disease Risk Prediction
Using Boosting Support Vector Machine,” Comput. Intell. Neurosci., vol. 2021, 2021, doi: 10.1155/2021/3152618.
[28] V. S. K. Reddy, P. Meghana, N. V. S. Reddy, and B. A. Rao, “Prediction on Cardiovascular disease using Decision tree and Naïve
Bayes classifiers,” J. Phys. Conf. Ser., vol. 2161, no. 1, 2022, doi: 10.1088/1742-6596/2161/1/012015.
[29] S. Jabbedari Khiabani, A. Batani, and E. Khanmohammadi, “A hybrid decision support system for heart failure diagnosis using
neural networks and statistical process control,” Healthc. Anal., vol. 2, p. 100110, 2022, doi: 10.1016/j.health.2022.100110.
[30] M. Yuvalı, B. Yaman, and Ö. Tosun, “Classification Comparison of Machine Learning Algorithms Using Two Independent CAD
Datasets,” Mathematics, vol. 10, no. 3, 2022, doi: 10.3390/math10030311.
[31] R. Zaib and O. Ourabah, “Large Scale Data Using K-Means,” Mesopotamian J. Big Data, pp. 38–47, 2023, doi:
10.58496/mjbd/2023/006.
[32] M. Alajanbi, D. Malerba, and H. Liu, “Distributed Reduced Convolution Neural Networks,” Mesopotamian J. Big Data, pp. 26–
29, 2021, doi: 10.58496/mjbd/2021/005.
[33] A. Tharwat, “Classification assessment methods,” Appl. Comput. Informatics, vol. 17, no. 1, pp. 168–192, 2018, doi:
10.1016/j.aci.2018.08.003.
[34] P. H. Kasani, J. E. Lee, C. Park, C. H. Yun, J. W. Jang, and S. A. Lee, “Evaluation of nutritional status and clinical depression
classification using an explainable machine learning method,” Front. Nutr., vol. 10, 2023, doi: 10.3389/fnut.2023.1165854.
[35] I. Markoulidakis, I. Rallis, I. Georgoulas, G. Kopsiaftis, A. Doulamis, and N. Doulamis, “Multiclass Confusion Matrix Reduction
Method and Its Application on Net Promoter Score Classification Problem,” Technologies, vol. 9, no. 4, 2021, doi:
10.3390/technologies9040081.
[36] D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary
classification evaluation,” BMC Genomics, vol. 21, no. 1, pp. 1–13, 2020, doi: 10.1186/s12864-019-6413-7.

BIOGRAPHIES OF AUTHORS

Mesran was born in Medan on August 24, 1978. He completed his
master’s degree in Computer Science in 2008 at Universitas Putra Indonesia. He has been a
permanent lecturer in the Informatics Engineering program at STMIK Budi Darma since 2005.
He can be contacted at email: [email protected].

Remuz Mb Kmurawak is a passionate and enthusiastic lecturer who
approaches their work with care and dedication, aiming to inspire and motivate students to
achieve academic success. Their objective extends beyond imparting knowledge; they strive to
foster the development of high-caliber students equipped with practical learning skills. The
author demonstrates excellent communication skills, recognizing its vital role in promoting
teamwork and attaining clear objectives. With a master’s degree in Information Technology,
the author possesses a solid foundation in data analysis and extensive experience as a lecturer
in the Information System Department at Cenderawasih University. Additionally, they actively
participate in the Papuan Information and Communication Technology Council, showcasing
their commitment to the field. In the past, the author worked as a Data Analyst at PT Probindo
Artika Jaya from 2008 to 2011, further enhancing their expertise and practical application of
knowledge. He can be contacted at email: [email protected].

Agus Perdana Windarto was born in Pematangsiantar on August
30th, 1986. They completed their master’s degree in Computer Science in 2014 at Universitas
Putra Indonesia ‘YPTK’ Padang, and are currently pursuing their doctorate (Ph.D.) degree at
the same university. The author has been an active lecturer at STIKOM Tunas Bangsa since
2012, teaching in the Information Systems program. Their research focuses on artificial
intelligence (decision support systems, expert systems, data mining, neural networks, fuzzy
logic, deep learning, and genetic algorithms). Additionally, the author has served as a reviewer
for various nationally accredited journals (SINTA 2 - SINTA 6) and manages a community
called “Pemburu Jurnal” at STIKOM Tunas Bangsa. They have won multiple research grant
proposals from DIKTI (twice in 2018-2019), DIKTI Community Service Grant (once in 2019),
PKM-P Grant (as a student advisor in 2018), and PKM-AI Grant (as a student advisor in
2019). The author is also part of the Relawan Jurnal Indonesia (RJI) community in North
Sumatra, the Data Science Indonesia Researchers Association (PDSI), the Forum of Higher
Education Communities (FKPT), and is a co-founder of the Yayasan Adwitiya Basurata
Inovasi (Yayasan Abivasi) foundation with fellow professors. He can be contacted at email:
[email protected] or [email protected].
