Corresponding Author:
Mesran
Department of Computer Science, Universitas Budi Darma
Jl. Sisingamangaraja No. 338, Siti Rejo I, Medan Kota District, Medan City, North Sumatra, Indonesia
Email: [email protected]
1. INTRODUCTION
Heart failure (HF) is a medical condition that is characterized by a complex set of symptoms rather
than a specific disease [1]. It occurs when the ventricle struggles to fill or empty with blood, making it
challenging for the heart to meet the body’s circulation needs. Common symptoms include shortness of
breath, swollen ankles, and fatigue, while signs such as high jugular venous pressure, pulmonary crackles,
and peripheral edema may also be present, indicating structural and/or functional cardiac or non-cardiac
abnormalities [2], [3]. In Indonesia, heart disease is the leading cause of death, and HF represents a
significant portion of these cases [4]. Approximately 5% of the country’s population is estimated to suffer
from HF [5]. Furthermore, the fatality rate is significant, with up to 17.2% of all HF patients dying during
their initial hospitalization, regardless of a history of heart attacks. Additionally, 11.3% of patients died
within a year of starting treatment, while another 17% required repeated hospitalizations due to worsening
HF. These patients are typically hospitalized at least once a year after diagnosis, with an average age of 58.
Data from the Basic Health Research Data (Riskesdas) for 2013 and 2018 show an increasing trend in heart
disease, rising from 0.5% in 2013 to 1.5% in 2018. Heart disease, including HF, is associated with significant
healthcare costs, with IDR 7.7 trillion spent on it in 2021, according to data from the Social Security
Administering Body for Health (BPJS). These statistics emphasize the importance of early detection and
treatment of HF. Traditional diagnosis of HF relies on the patient’s medical history, physical tests, and the
doctor’s examination of related symptoms [3], [6]. Angiographic techniques are one of the most reliable
conventional methods for diagnosing HF [7]. However, this method requires specialized expertise and comes
with a high cost and potential side effects [8].
While there have been efforts to achieve high predictive performance and identify relevant risk
factors associated with HF, the emergence of artificial intelligence (AI) tools and machine learning (ML)
algorithms [9], [10] in recent years has provided powerful diagnostic aids [11]. These tools can extract
knowledge from large amounts of data, which may be difficult or impossible for humans to achieve [12], [13].
By employing ML-based decision-making approaches, doctors can detect the risk of HF and provide
necessary treatments and recommendations to manage these risks [14]. Early detection and treatment using
ML techniques have the potential to significantly improve patient survival rates. Consequently, several studies
have utilized ML for the diagnosis [15]–[19] and prediction of HF, for example by estimating the likelihood that a
patient has a disease history that may cause HF, including hypertension, diabetes, or hyperlipidemia [20]–[23].
Various classification algorithms, including decision trees [24]–[26], support vector machines (SVM) [27],
Naïve Bayes [28], and neural networks [29] have been used for HF prediction. Despite these efforts,
accurately predicting HF remains a significant challenge. Comparison and benchmarking results of ML
classifiers have shown no significant differences in performance [30], and no single classifier has proven to
be the best for all datasets.
Our study aims to address the existing gap in accurately predicting heart failure using machine
learning techniques. Despite various efforts, no single classifier has proven to be the best for all datasets. In
this research, we present a novel approach that incorporates advanced supervised learning (ASL) [29]–[31]
and particle swarm optimization (PSO) [32], [33] techniques to optimize classification results. Moreover, we
employed split and cross-validation techniques with varying composition ratios of 70:30, 80:20, and 90:10,
using k-fold=10, and tested twelve classifiers sorted into five groups: decision tree models (DTM), SVM,
Naïve Bayes classifier models (NBCM), logistic regression models (LRM), and lazy models (LM). The
selection of these classifiers was based on two considerations. First, previous studies have applied
various classification algorithms, such as decision trees [24]–[26], SVM [27], Naïve Bayes [28], and neural
networks [29], to HF prediction. These algorithms have proven effective in handling complex datasets and
are widely employed in HF research. Second, no single classifier has proven to be the best for all
datasets or to consistently outperform the others: comparison and benchmarking of ML classifiers have
shown no significant differences in performance [30]. Therefore, by evaluating the PSO and ASL algorithms
on 12 classifiers grouped into five categories, this study aims to assess the strengths and weaknesses of
each classifier and determine the most appropriate one for HF classification. This research makes a
significant contribution by offering a more
precise approach to diagnosing heart failure, leading to early detection and improved patient outcomes.
Furthermore, our findings can guide future research endeavors aimed at enhancing the diagnosis and
treatment of heart failure. The integration of AI and ML techniques [31], [32] in healthcare holds great
promise for enhancing patient well-being and reducing healthcare expenses.
2. METHOD
The primary objective of this study is to enhance the classification performance of 12 classifiers
through the integration of ASL and PSO techniques. A comprehensive evaluation of classifier performance
was conducted using a combination of split tests and cross-validation. The training and test data were
partitioned into different ratios, namely 70:30, 80:20, and 90:10, with a k-fold value of 10. To assess the
effectiveness of the proposed model, data from HF patients were employed. By subjecting the classifiers to
this dataset, the study aimed to improve their classification performance.
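The validation scheme described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that the data are first divided by a hold-out split and that 10-fold cross-validation is then run on the training portion, which the text does not specify. The function names `holdout_split` and `k_fold_indices` and the fixed seed are hypothetical; only the three ratios, k = 10, and the dataset size of 918 patients come from the paper.

```python
import random


def holdout_split(n_samples, train_ratio, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_ratio)
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


N_PATIENTS = 918  # dataset size reported in the paper

split_sizes = {}
for train_ratio in (0.7, 0.8, 0.9):
    train_idx, test_idx = holdout_split(N_PATIENTS, train_ratio)
    split_sizes[train_ratio] = (len(train_idx), len(test_idx))
    for fold_train, fold_val in k_fold_indices(train_idx, k=10):
        # A classifier would be fitted on fold_train and scored on fold_val here.
        assert not set(fold_train) & set(fold_val)

print(split_sizes)
```

Under these assumptions the held-out test indices are never seen during cross-validation, so they can serve as a final check of each tuned classifier.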
The dataset comprises twelve variables, with eleven variables serving as inputs and one variable acting as the output (label). Each
variable’s subset was tailored according to the specific requirements of the study. The subsequent section
provides a comprehensive description of the variables utilized in the HF study.
The study utilized a sample dataset (complete data can be accessed at https://shorturl.at/klvS2), as
presented in Table 1, consisting of various parameters related to patients. These parameters include the age of
the patient in years, the sex of the patient (M for male and F for female), the type of chest pain experienced
(TA for typical angina, ATA for atypical angina, NAP for non-anginal pain, and ASY for asymptomatic), the
resting blood pressure (RestingBP) in mm Hg, the serum cholesterol level in mg/dl, the fasting blood sugar
(FastingBS) (1 if FastingBS > 120 mg/dl, 0 otherwise), the results of the resting electrocardiogram
(RestingECG) (Normal, ST for ST-T wave abnormality, and LVH for probable or definite left ventricular
hypertrophy), the maximum heart rate achieved (MaxHR) (numeric value between 60 and 202), the presence
of exercise-induced angina (Y for yes and N for no), the Oldpeak value (ST depression at exercise), the slope of
the peak exercise ST segment (Up for upsloping, Flat for flat, and Down for downsloping), and the output class
indicating the presence of heart disease (1 for HF and 0 for normal).
The table represents data for predicting heart failure. It includes information about patients’ age,
gender, chest pain type, resting blood pressure (RestingBP), cholesterol levels, fasting blood sugar
(FastingBS), the results of the resting electrocardiogram (RestingECG), the maximum heart rate achieved
(MaxHR), exercise-induced angina, ST depression at exercise, ST slope, and heart disease condition. The
data consists of 918 patients, where each row represents one patient’s information. This data can be used to
build a predictive model that will help identify the risk factors associated with heart failure. Through
analyzing this data, researchers can gain insights into patterns or correlations between the different variables
that may contribute to the onset of heart failure. Ultimately, this data has tremendous potential to inform
clinical decisions and improve patient outcomes.
Table 1. Dataset
No   Age  Sex  Chest pain type  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  Exercise angina  Oldpeak  ST_slope  Heart disease
1    40   M    ATA              140        289          0          Normal      172    N                0        Up        Normal
2    49   F    NAP              160        180          0          Normal      156    N                1        Flat      HF
3    37   M    ATA              130        283          0          ST          98     N                0        Up        Normal
4    48   F    ASY              138        214          0          Normal      108    Y                1.5      Flat      HF
5    54   M    NAP              150        195          0          Normal      122    N                0        Up        Normal
6    39   M    NAP              120        339          0          Normal      170    N                0        Up        Normal
7    45   F    ATA              130        237          0          Normal      170    N                0        Up        Normal
8    54   M    ATA              110        208          0          Normal      142    N                0        Up        Normal
…    …    …    …                …          …            …          …           …      …                …        …         …
914  45   M    TA               110        264          0          Normal      132    N                1.2      Flat      HF
915  68   M    ASY              144        193          1          Normal      141    N                3.4      Flat      HF
916  57   M    ASY              130        131          0          Normal      115    Y                1.2      Flat      HF
917  57   F    ATA              130        236          0          LVH         174    N                0        Flat      HF
918  38   M    NAP              138        175          0          Normal      173    N                0        Up        Normal
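To make the variable coding concrete, the sketch below parses one data row of Table 1 into a typed record and turns it into a numeric feature vector. The record class, the function names, and the one-hot encoding of the categorical variables are assumptions made for illustration; the paper does not state how the authors encoded the inputs. Only the column order, the category codes, and the HF/Normal labels are taken from the text and the table.

```python
from dataclasses import dataclass

# Category codes as listed in the dataset description.
CHEST_PAIN = ("TA", "ATA", "NAP", "ASY")
RESTING_ECG = ("Normal", "ST", "LVH")
ST_SLOPE = ("Up", "Flat", "Down")


@dataclass
class PatientRecord:
    age: int
    sex: str
    chest_pain_type: str
    resting_bp: int
    cholesterol: int
    fasting_bs: int
    resting_ecg: str
    max_hr: int
    exercise_angina: str
    oldpeak: float
    st_slope: str
    heart_disease: int  # 1 for HF, 0 for normal


def parse_row(line):
    """Parse one whitespace-separated data row of Table 1 (row number dropped)."""
    _, age, sex, cp, bp, chol, fbs, ecg, hr, angina, oldpeak, slope, label = line.split()
    return PatientRecord(int(age), sex, cp, int(bp), int(chol), int(fbs), ecg,
                         int(hr), angina, float(oldpeak), slope,
                         1 if label == "HF" else 0)


def one_hot(value, categories):
    """Return a 0/1 indicator list with a 1 at the position of value."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode(record):
    """Hypothetical encoding: numeric and binary inputs first, then one-hot groups."""
    features = [
        float(record.age),
        1.0 if record.sex == "M" else 0.0,
        float(record.resting_bp),
        float(record.cholesterol),
        float(record.fasting_bs),
        float(record.max_hr),
        1.0 if record.exercise_angina == "Y" else 0.0,
        record.oldpeak,
    ]
    features += one_hot(record.chest_pain_type, CHEST_PAIN)
    features += one_hot(record.resting_ecg, RESTING_ECG)
    features += one_hot(record.st_slope, ST_SLOPE)
    return features, record.heart_disease


row4 = parse_row("4 48 F ASY 138 214 0 Normal 108 Y 1.5 Flat HF")
features4, label4 = encode(row4)
print(len(features4), label4)
```

With this particular encoding the eleven inputs expand to 18 numeric features; a different encoding (for example, integer codes for the categorical variables) would be equally compatible with the description in the paper.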
TELKOMNIKA Telecommun Comput El Control, Vol. 22, No. 1, February 2024: 76-85
Integration of PSO-based advanced supervised learning techniques for classification data … (Mesran)
80 ISSN: 1693-6930
In addition to developing an accurate and robust model, it is also essential to evaluate the model’s
accuracy in predicting HF. This is carried out through the confusion matrix and the receiver operating
characteristic (ROC) curve with its area under the curve (AUC). The ROC curve is constructed from values
calculated from the confusion matrix and plots the true positive rate (TPR) against the false positive
rate (FPR), where:
a) $\mathrm{FPR} = \dfrac{\text{False Positive}}{\text{False Positive} + \text{True Negative}}$;
b) $\mathrm{TPR} = \dfrac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$.
A classifier is rated BAD if the resulting curve lies close to the baseline, that is, the diagonal line starting at the
point (0, 0), and GOOD if the curve approaches the point (0, 1).
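The two rates and the area under the resulting curve can be computed as in the following sketch. It is illustrative only: the labels and scores are toy values invented for the example, not results from the study, and the function names are hypothetical. The ROC points are obtained by sweeping a decision threshold over the scores, and the AUC is approximated with the trapezoidal rule.

```python
def rates(tp, fp, tn, fn):
    """Return (FPR, TPR) computed from confusion-matrix counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


def roc_points(labels, scores):
    """Sweep thresholds from high to low and collect (FPR, TPR) points."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
        tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
        points.append(rates(tp, fp, tn, fn))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (assumption): 1 = HF, 0 = normal, with made-up classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 4))
```

A curve that hugs the diagonal gives an AUC near 0.5, while a curve that passes through the point (0, 1) gives an AUC of 1.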
than using only PSO in all classifiers, with the most significant increase seen in random tree and k-NN.
However, the results were less consistent in the 90:10 dataset split, with some classifiers showing
improvements with PSO+ASL, such as SVM, Naïve Bayes (Kernel), and LR (SVM), while others such as
decision tree, gradient boosted tree, and random tree showed a decrease in accuracy. Overall, the use of
PSO+ASL algorithms can improve classification performance for some classifier types and dataset splits, but
the appropriate algorithm should be chosen depending on the characteristics of the dataset used for predicting
heart failure. The information provided in Table 5 can be effectively represented and understood through the
graphical representation presented in Figure 2.
The results obtained from grouping the classifiers, as depicted in Figure 3, reveal that the LM
classifier achieved the highest average accuracy value of 100%. This corresponds to an average increase of
31.9%, 25.1%, and 26.97% for the 70:30, 80:20, and 90:10 ratios, respectively, when compared to the
standard model. For more detailed information regarding the average accuracy value per group, please refer
to Table 6.
Figure 2. Improvement in the average accuracy of each classifier (70:30, 80:20, 90:10)
Figure 3. Average accuracy by classifier group (70:30, 80:20, 90:10)
The combination of PSO and ASL in the HF disease classification study demonstrated that the k-NN
method outperformed all other classifiers across all dataset ratio compositions (70:30, 80:20, and 90:10 with
k-fold=10). The analysis results are visually represented by the performance vector in Figure 4. Specifically,
the true positive (TP) value, representing the number of true positives, is 287, indicating accurate prediction
of HF disease classification. The false positive (FP) value, which represents the number of false positives, is
0, indicating no instances of negative data being incorrectly classified as positive data (70:30 ratio).
Similarly, for dataset ratios of 80:20 and 90:10, the true positive values are 328 and 369, respectively,
indicating correct classification of positive data for HF disease. In both cases, the false positive value remains
at 0, indicating accurate prediction of negative data.
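As a small worked check of these figures, the precision implied by the reported counts can be computed directly. Only the TP and FP values quoted above are used; the false negative and true negative counts are not quoted in this passage, so recall and accuracy are deliberately not derived here. The helper name `precision` is hypothetical.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, FP) pairs quoted in the text for k-NN with PSO and ASL.
reported = {"70:30": (287, 0), "80:20": (328, 0), "90:10": (369, 0)}

precisions = {ratio: precision(tp, fp) for ratio, (tp, fp) in reported.items()}
print(precisions)
```

Because FP is 0 in all three cases, every case predicted as HF was indeed an HF case, which corresponds to a precision of 1.0 for each ratio.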
Figure 4. Performance results of the k-NN algorithm (70:30, 80:20, and 90:10)
4. CONCLUSION
The study on the integration of ASL and PSO techniques for classification data mining to predict HF
has yielded promising results. The primary goal of the study was to enhance the accuracy of traditional ML
algorithms in classifying HF patients based on various clinical characteristics. To achieve this, twelve
classifiers were employed and categorized into five groups: DTM, SVM, NBCM, LRM, and LM. The
parameters of these algorithms were optimized using ASL and PSO techniques, while a combination of split
validation and cross-validation with composition ratios of 70:30, 80:20, and 90:10, along with a k-fold value
of 10, was utilized. The results indicated that ASL and PSO techniques outperformed the conventional ML
algorithms in terms of accuracy and AUC. However, it is important to note that the study had certain
limitations, such as a small sample size and the absence of external validation, which warrant further
investigation to assess the effectiveness of ASL and PSO techniques in a broader patient population. In
conclusion, this research demonstrates that the utilization of PSO-based ASL techniques for classification data
mining holds significant implications for clinical practice and improved patient outcomes in predicting HF.
REFERENCES
[1] S. Q. Duong et al., “Identification of patients at risk of new onset heart failure: Utilizing a large statewide health information
exchange to train and validate a risk prediction model,” PLoS One, vol. 16, no. 12, pp. 1–13, 2021, doi:
10.1371/journal.pone.0260885.
[2] S. Saepudin, P. Ball, and H. Morrissey, “Development of prediction model for identifying heart failure patients with high risk of
developing hyponatremia,” J. Kedokt. dan Kesehat. Indones., vol. 10, no. 2, pp. 121–131, 2019, doi:
10.20885/jkki.vol10.iss2.art4.
[3] E. E. Tripoliti, T. G. Papadopoulos, G. S. Karanasiou, K. K. Naka, and D. I. Fotiadis, “Heart Failure: Diagnosis, Severity
Estimation and Prediction of Adverse Events Through Machine Learning Techniques,” Comput. Struct. Biotechnol. J., vol. 15, pp.
26–47, 2017, doi: 10.1016/j.csbj.2016.11.001.
[4] A. W. Sugiyarto, A. M. Abadi, and Sumarna, “Classification of heart disease based on PCG signal using CNN,” Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 19, no. 5, pp. 1697–1706, 2021, doi:
10.12928/TELKOMNIKA.v19i5.20486.
[5] T. N. Nguyen, T. H. Nguyen, and V. T. Ngo, “Artifact elimination in ECG signal using wavelet transform,” Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 18, no. 2, pp. 936–944, 2020, doi: 10.12928/TELKOMNIKA.V18I2.14403.
[6] Y. Wang et al., “Early Detection of Heart Failure with Varying Prediction Windows by Structured and Unstructured Data in
Electronic Health Records,” HHS Public Access, vol. 176, no. 1, pp. 139–148, 2018, doi: 10.1109/EMBC.2015.7318907.
[7] T. Chen, S. Zhao, S. Shao, and S. Zheng, “Non-invasive diagnosis methods of coronary disease based on wavelet denoising and
sound analyzing,” Saudi J. Biol. Sci., vol. 24, no. 3, pp. 526–536, 2017, doi: 10.1016/j.sjbs.2017.01.023.
[8] A. F. AlOthman, A. R. W. Sait, and T. A. Alhussain, “Detecting Coronary Artery Disease from Computed Tomography Images
Using a Deep Learning Technique,” Diagnostics, vol. 12, no. 9, 2022, doi: 10.3390/diagnostics12092073.
[9] A. P. Windarto and T. Herawan, Decision Support System on Determination of Contraception Tools as an Effort to Suppress the
Number of Growth Ratios in Indonesia, vol. 730, Springer Nature Singapore Pte Ltd, 2021. doi: 10.1007/978-981-33-4597-3_69.
[10] A. P. Windarto and T. Herawan, K-Means Algorithm with Rapidminer in Clustering School Participation Rate in Indonesia.
Springer Nature Singapore Pte Ltd, 2021. doi: 10.1007/978-981-33-4597-3_70.
[11] A. Al Bataineh and S. Manacek, “MLP-PSO Hybrid Algorithm for Heart Disease Prediction,” J. Pers. Med., vol. 12, no. 8, 2022,
doi: 10.3390/jpm12081208.
[12] I. D. Mienye and Y. Sun, “Improved heart disease prediction using particle swarm optimization based stacked sparse
autoencoder,” Electron., vol. 10, no. 19, 2021, doi: 10.3390/electronics10192347.
[13] S. I. Novichasari and I. S. Wibisono, “Particle Swarm Optimization For Improved Accuracy of Disease Diagnosis,” J. Appl. Intell.
Syst., vol. 5, no. 2, pp. 57–68, 2020.
[14] C. Krittanawong et al., “Integration of novel monitoring devices with machine learning technology for scalable cardiovascular
management,” Nat. Rev. Cardiol., vol. 18, no. 2, pp. 75–91, 2021, doi: 10.1038/s41569-020-00445-9.
[15] A. Javeed, S. U. Khan, L. Ali, S. Ali, Y. Imrana, and A. Rahman, “Machine Learning-Based Automated Diagnostic Systems
Developed for Heart Failure Prediction Using Different Types of Data Modalities: A Systematic Review and Future Directions,”
Comput. Math. Methods Med., vol. 2022, 2022, doi: 10.1155/2022/9288452.
[16] A. Guo, M. Pasque, F. Loh, D. L. Mann, and P. R. O. Payne, “Heart Failure Diagnosis, Readmission, and Mortality Prediction
Using Machine Learning and Artificial Intelligence Models,” Curr. Epidemiol. Reports, vol. 7, no. 4, pp. 212–219, 2020, doi:
10.1007/s40471-020-00259-w.
[17] D. J. Choi, J. J. Park, T. Ali, and S. Lee, “Artificial intelligence for the diagnosis of heart failure,” npj Digit. Med., vol. 3, no. 1,
2020, doi: 10.1038/s41746-020-0261-3.
[18] D. K. Plati et al., “A machine learning approach for chronic heart failure diagnosis,” Diagnostics, vol. 11, no. 10, pp. 1–15, 2021,
doi: 10.3390/diagnostics11101863.
[19] S. Kordnoori, H. Mostafaei, M. Rostamy-Malkhalifeh, and M. Ostadrahimi, “Diagnosis of Heart Disease Using Feature Selection
Methods Based On Recurrent Fuzzy Neural Networks,” IPTEK J. Technol. Sci., vol. 32, no. 2, p. 64, 2021, doi:
10.12962/j20882033.v32i2.7075.
[20] Q. Bai, C. Su, W. Tang, and Y. Li, “Machine learning to predict end stage kidney disease in chronic kidney disease,” Sci. Rep.,
vol. 12, no. 1, pp. 1–8, 2022, doi: 10.1038/s41598-022-12316-z.
[21] M. E. Grams et al., “Predicting timing of clinical outcomes in patients with chronic kidney disease and severely decreased
glomerular filtration rate,” Kidney Int., vol. 93, no. 6, pp. 1442–1451, 2018, doi: 10.1016/j.kint.2018.01.009.
[22] E. Dovgan et al., “Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney
disease patients,” PLoS One, vol. 15, no. 6, pp. 1–18, 2020, doi: 10.1371/journal.pone.0233976.
[23] C. L. Ramspek et al., “Predicting Kidney Failure, Cardiovascular Disease and Death in Advanced CKD Patients,” Kidney Int.
Reports, vol. 7, no. 10, pp. 2230–2241, 2022, doi: 10.1016/j.ekir.2022.07.165.
[24] M. Ozcan and S. Peker, “A classification and regression tree algorithm for heart disease modeling and prediction,” Healthc. Anal.,
vol. 3, p. 100130, 2023, doi: 10.1016/j.health.2022.100130.
[25] M. Q. Syafi, “Increasing Accuracy of Heart Disease Classification on C4.5 Algorithm Based on Information Gain Ratio and
Particle Swarm Optimization Using Adaboost Ensemble,” J. Adv. Inf. Syst. Technol., vol. 4, no. 1, pp. 100–112, 2022.
[26] M. K. Iliyas and I. S. Shaikh, “Prediction of Heart Disease Using Decision Tree,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol.
6, no. 3, pp. 530–532, 2016.
[27] E. Owusu, P. Boakye-Sekyerehene, J. K. Appati, and J. Y. Ludu, “Computer-Aided Diagnostics of Heart Disease Risk Prediction
Using Boosting Support Vector Machine,” Comput. Intell. Neurosci., vol. 2021, 2021, doi: 10.1155/2021/3152618.
[28] V. S. K. Reddy, P. Meghana, N. V. S. Reddy, and B. A. Rao, “Prediction on Cardiovascular disease using Decision tree and Naïve
Bayes classifiers,” J. Phys. Conf. Ser., vol. 2161, no. 1, 2022, doi: 10.1088/1742-6596/2161/1/012015.
[29] S. Jabbedari Khiabani, A. Batani, and E. Khanmohammadi, “A hybrid decision support system for heart failure diagnosis using
neural networks and statistical process control,” Healthc. Anal., vol. 2, p. 100110, 2022, doi: 10.1016/j.health.2022.100110.
[30] M. Yuvalı, B. Yaman, and Ö. Tosun, “Classification Comparison of Machine Learning Algorithms Using Two Independent CAD
Datasets,” Mathematics, vol. 10, no. 3, 2022, doi: 10.3390/math10030311.
[31] R. Zaib and O. Ourabah, “Large Scale Data Using K-Means,” Mesopotamian J. Big Data, pp. 38–47, 2023, doi:
10.58496/mjbd/2023/006.
[32] M. Alajanbi, D. Malerba, and H. Liu, “Distributed Reduced Convolution Neural Networks,” Mesopotamian J. Big Data, pp. 26–29,
2021, doi: 10.58496/mjbd/2021/005.
[33] A. Tharwat, “Classification assessment methods,” Appl. Comput. Informatics, vol. 17, no. 1, pp. 168–192, 2018, doi:
10.1016/j.aci.2018.08.003.
[34] P. H. Kasani, J. E. Lee, C. Park, C. H. Yun, J. W. Jang, and S. A. Lee, “Evaluation of nutritional status and clinical depression
classification using an explainable machine learning method,” Front. Nutr., vol. 10, 2023, doi: 10.3389/fnut.2023.1165854.
[35] I. Markoulidakis, I. Rallis, I. Georgoulas, G. Kopsiaftis, A. Doulamis, and N. Doulamis, “Multiclass Confusion Matrix Reduction
Method and Its Application on Net Promoter Score Classification Problem,” Technologies, vol. 9, no. 4, 2021, doi:
10.3390/technologies9040081.
[36] D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary
classification evaluation,” BMC Genomics, vol. 21, no. 1, pp. 1–13, 2020, doi: 10.1186/s12864-019-6413-7.
BIOGRAPHIES OF AUTHORS
Mesran was born in Medan on August 24, 1978. He completed his master’s degree in Computer Science
in 2008 at Universitas Putra Indonesia. He has been teaching at STMIK Budi Darma since 2005 as a
permanent lecturer in the Informatics Engineering program. He can be contacted at email: [email protected].