
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(2): 140-151

Published online June 25, 2023

https://doi.org/10.5391/IJFIS.2023.23.2.140

© The Korean Institute of Intelligent Systems

Imbalanced Learning in Heart Disease Categorization: Improving Minority Class Prediction Accuracy Using the SMOTE Algorithm

Mediana Aryuni¹, Suko Adiarto², Eka Miranda¹, Evaristus Didik Madyatmadja¹, Albert Verasius Dian Sano³, and Elvin Sestomi¹

¹Department of Information Systems, Bina Nusantara University, Jakarta, Indonesia
²Department of Cardiology and Vascular Medicine, University of Indonesia, Jakarta, Indonesia
³Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Correspondence to: Mediana Aryuni (mediana.aryuni@binus.ac.id)

Received: December 5, 2022; Revised: April 6, 2023; Accepted: May 22, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Imbalanced class distributions occur frequently in medical data mining and typically produce classifiers with low predictive accuracy for the minority class. This study constructs classifier models for imbalanced data using the SMOTE oversampling algorithm and a heart disease dataset obtained from Harapan Kita Hospital. Five classification models were evaluated: logistic regression (LR), decision tree (DT), random forest (RF), bagging LR, and bagging DT. SMOTE improved prediction performance on the imbalanced data, particularly for the minority class: across the five models, the minority-class F1-score rose from between 15% and 30% without SMOTE to between 66% and 78% with SMOTE.
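SMOTE (Synthetic Minority Over-sampling Technique) balances the training data by creating new minority-class samples instead of duplicating existing ones: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbors. The following is a simplified plain-Python sketch of that interpolation step, not the implementation used in the study (this section does not say which library the authors used). It assumes purely numeric features, Euclidean distance, k = 5 by default, and base samples drawn at random; the function name `smote` is illustrative. Tested implementations exist, for example the `SMOTE` class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class samples by SMOTE-style interpolation.

    minority: list of numeric feature tuples belonging to the minority class only.
    k: number of nearest minority neighbors to choose from (5 is a common default).
    Illustrative sketch only; assumes numeric features and Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbors: every other minority sample, sorted by Euclidean distance.
        others = minority[:i] + minority[i + 1:]
        nearest = sorted(others, key=lambda q: math.dist(base, q))[:k]
        neighbor = rng.choice(nearest)
        gap = rng.random()  # random position along the segment, in [0, 1)
        # The new point lies between base and neighbor in every coordinate.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
    majority_size = 10  # hypothetical majority-class count
    # Oversample until both classes have the same number of samples.
    new_samples = smote(minority, n_synthetic=majority_size - len(minority), k=2)
    print(len(minority) + len(new_samples))  # 10
```

Because synthetic points are interpolated rather than copied, the classifier sees a denser minority region instead of repeated identical records, which is the usual argument for SMOTE over random oversampling.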

Keywords: Heart disease, Prediction, Imbalanced, Accuracy, SMOTE

No potential conflicts of interest relevant to this article were reported.

Mediana Aryuni is a Faculty Member in the School of Information Systems, Bina Nusantara University, Jakarta, Indonesia. Her research interests are data mining and text mining. E-mail: mediana.aryuni@binus.ac.id

Suko Adiarto is a cardiologist subspecializing in interventional cardiology and a member of the medical staff at the Department of Cardiology and Vascular Medicine, University of Indonesia-National Heart Center Harapan Kita. His research interests focus on the vascular field. E-mail: sukoadiarto@gmail.com

Eka Miranda is a Faculty Member in the School of Information Systems, Bina Nusantara University, Jakarta, Indonesia. Her research interests are data mining and text mining. E-mail: ekamiranda@binus.ac.id

Evaristus Didik Madyatmadja is a Faculty Member in the School of Information Systems, Bina Nusantara University, Jakarta, Indonesia. His research interests are data mining, e-government, and smart cities. E-mail: emadyatmadja@binus.edu

Albert Verasius Dian Sano is a Faculty Member in the School of Computer Science, Bina Nusantara University, Jakarta, Indonesia. His research interests are data mining, e-government, and smart cities. E-mail: albert.sano@binus.edu

Elvin Sestomi is a student in the School of Information Systems, Bina Nusantara University, Jakarta, Indonesia. Her research interests are business intelligence and advanced databases. E-mail: elvin.sestomi@binus.ac.id


Figure 1. Data preprocessing.

Figure 2. Model development.
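Two of the five models are bagging ensembles (bagging LR and bagging DT). As a rough illustration of what bagging (bootstrap aggregating) does, the sketch below trains several copies of a base learner on bootstrap resamples of the training data, each drawn with replacement and of the same size as the original, and combines their predictions by majority vote. It is not the authors' implementation: to stay short and dependency-free, a decision stump (a one-level decision tree, via the hypothetical helper `fit_stump`) stands in for the full decision trees and logistic regression used in the study, and `n_estimators=25` is an arbitrary choice.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump (one-level decision tree) by maximizing training accuracy.

    Stand-in base learner for illustration; the study used full decision trees and LR.
    """
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Each side predicts its majority label; an empty side uses the overall majority.
            lab_l = Counter(left).most_common(1)[0][0] if left else overall
            lab_r = Counter(right).most_common(1)[0][0] if right else overall
            correct = left.count(lab_l) + right.count(lab_r)
            if best is None or correct > best[0]:
                best = (correct, f, t, lab_l, lab_r)
    _, f, t, lab_l, lab_r = best
    return lambda row: lab_l if row[f] <= t else lab_r


def fit_bagging(X, y, n_estimators=25, seed=0, fit_base=fit_stump):
    """Bootstrap aggregating: train each base model on a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of size n
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Majority vote over the ensemble's predictions.
        return Counter(m(row) for m in models).most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_bagging(X, y)
    print(model((-1.0,)), model((20.0,)))
```

Any function with the same `fit_base(X, y) -> predict` shape can be passed in, which mirrors how the study applies the same bagging scheme to two different base learners.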

Figure 3. Example ROC curve and AUC [18].
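The models are compared by their ROC curves and the area under the curve (AUC). A minimal way to compute both is to sweep the decision threshold from the highest score to the lowest, record the false-positive rate (FPR) and true-positive rate (TPR) at each step, and integrate the resulting curve with the trapezoidal rule. The sketch below assumes binary labels with 1 marking the positive (here, minority) class and one score per sample; the function names are illustrative, and libraries such as scikit-learn provide tested equivalents (`roc_curve`, `roc_auc_score`).

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low score.

    y_true: 1 for the positive (e.g. minority) class, 0 otherwise.
    scores: predicted probability or score for the positive class.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both positive and negative samples")
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(auc(pts))  # 0.75
```

In the example, three of the four positive-negative pairs are ranked correctly, which is why the AUC equals 0.75: the AUC can be read as the probability that a random positive sample scores higher than a random negative one.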

Figure 4. ROC curves of the five classifier models without the SMOTE algorithm: (a) LR, (b) DT, (c) RF, (d) bagging LR, (e) bagging DT.

Figure 5. ROC curves of the five classifier models with the SMOTE algorithm: (a) LR, (b) DT, (c) RF, (d) bagging LR, (e) bagging DT.

Table 1. Attribute descriptions.

| Attribute name | Description |
|---|---|
| gender | male or female |
| age | age in years |
| eritrosit | number of red blood cells (erythrocytes) in the patient's blood |
| hematokrit | percentage of red blood cells in the blood (hematocrit) |
| hemoglobin | amount of hemoglobin, the iron-containing, oxygen-carrying protein, in the patient's blood |
| hermch | average mass of hemoglobin per red blood cell in the patient's blood sample (MCH) |
| khermchc | average hemoglobin concentration in the patient's red blood cells (MCHC) |
| leukosit | number of white blood cells (leukocytes) in the patient's blood |
| trombosit | number of platelets (thrombocytes), which enable blood clotting, in the patient's blood |
| diagnose | class label (target) |
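Table 1 lists nine input attributes plus the `diagnose` label. This section does not detail the preprocessing in Figure 1, so the following sketch only illustrates two generic steps common for data of this shape: turning a patient record into a numeric feature vector (assuming, hypothetically, that gender is stored as "male"/"female" and encoded as 1/0) and a stratified train/test split that keeps the minority class's share the same in both parts. The helper names `to_vector` and `stratified_split` and the 20% test fraction are illustrative assumptions, not details from the study.

```python
import random
from collections import defaultdict

# Input attributes in the order of Table 1; "diagnose" is the class label.
FEATURES = ["gender", "age", "eritrosit", "hematokrit", "hemoglobin",
            "hermch", "khermchc", "leukosit", "trombosit"]


def to_vector(record):
    """Convert one record (a dict keyed by Table 1 names) into a numeric feature tuple.

    Assumption: gender is stored as "male"/"female" and encoded as 1.0/0.0.
    """
    values = []
    for name in FEATURES:
        value = record[name]
        if name == "gender":
            value = 1.0 if value == "male" else 0.0
        values.append(float(value))
    return tuple(values)


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records so each diagnose class keeps about the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in records:
        by_class[r["diagnose"]].append(r)
    train, test = [], []
    for group in by_class.values():
        group = group[:]  # shuffle a copy, not the caller's list
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

When oversampling is used, it is commonly applied to the training portion only, so that synthetic samples never appear in the test set; this section does not state how the study handled that step.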

Table 2. Performance of each model without the SMOTE algorithm.

| Model | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|
| LR | 47 | 50 | 49 | 64 |
| DT | 70 | 60 | 63 | 62 |
| RF | 71 | 58 | 61 | 70 |
| Bagging LR | 72 | 51 | 50 | 69 |
| Bagging DT | 98 | 66 | 73 | 97 |

Table 3. Performance of each model with the SMOTE algorithm.

| Model | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|
| LR | 66 | 66 | 66 | 72 |
| DT | 76 | 75 | 75 | 82 |
| RF | 78 | 78 | 78 | 87 |
| Bagging LR | 66 | 66 | 66 | 73 |
| Bagging DT | 83 | 83 | 83 | 92 |

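Table 4. Performance of each model in predicting the minority class before and after applying the SMOTE algorithm.

| Model | Precision before (%) | Precision after (%) | Recall before (%) | Recall after (%) | F1-score before (%) | F1-score after (%) |
|---|---|---|---|---|---|---|
| LR | 9 | 66 | 67 | 67 | 15 | 66 |
| DT | 10 | 70 | 81 | 84 | 18 | 76 |
| RF | 14 | 75 | 82 | 81 | 24 | 78 |
| Bagging LR | 9 | 66 | 67 | 67 | 15 | 66 |
| Bagging DT | 19 | 81 | 76 | 76 | 30 | 78 |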
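Table 4 reports precision, recall, and F1-score for the minority class alone. The sketch below shows how these per-class scores are computed from predictions, treating the minority class as the positive label (assumed here to be encoded as 1), and checks two rows of Table 4: because F1 is the harmonic mean of precision and recall, the reported F1 values follow from the reported precision and recall up to rounding. The function names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def minority_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for a single class, here the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # DT before SMOTE (Table 4): precision 10%, recall 81%
    print(round(100 * f1(0.10, 0.81)))  # 18
    # RF after SMOTE (Table 4): precision 75%, recall 81%
    print(round(100 * f1(0.75, 0.81)))  # 78
```

Read through these definitions, a precision of 10% with a recall of 81% means the model detected most minority cases but only about one in ten of its minority-class predictions was correct; the harmonic mean keeps F1 low in that situation.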
