International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(2): 140-151
Published online June 25, 2023
https://doi.org/10.5391/IJFIS.2023.23.2.140
© The Korean Institute of Intelligent Systems
Mediana Aryuni1, Suko Adiarto2, Eka Miranda1, Evaristus Didik Madyatmadja1, Albert Verasius Dian Sano3, and Elvin Sestomi1
1Department of Information Systems, Bina Nusantara University, Jakarta, Indonesia
2Department of Cardiology and Vascular Medicine, University of Indonesia, Jakarta, Indonesia
3Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
Correspondence to: Mediana Aryuni (mediana.aryuni@binus.ac.id)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In medical data mining, class imbalance is common and typically leads to classifiers with low predictive accuracy for the minority class. This study constructs classifier models for imbalanced data using the SMOTE oversampling algorithm and a heart disease dataset obtained from Harapan Kita Hospital. The classification models used logistic regression, decision tree, random forest, bagging logistic regression, and bagging decision tree. SMOTE improved prediction accuracy on the imbalanced data, particularly for the minority class.
Keywords: Heart disease, Prediction, Imbalanced, Accuracy, SMOTE
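The SMOTE oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and the code shows only the core idea of creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only (hypothetical helper, not a library API).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values, not the hospital data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Each synthetic point lies on the line segment between two real minority samples, so the minority class grows without exact duplication. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contain no synthetic samples.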
No potential conflicts of interest relevant to this article were reported.
Data preprocessing.
Model development.
ROC AUC curve example [
ROC curves of five classifier models without SMOTE algorithm: (a) LR, (b) DT, (c) RF, (d) bagging LR, (e) bagging DT.
ROC curves of five classifier models with SMOTE algorithm: (a) LR, (b) DT, (c) RF, (d) bagging LR, (e) bagging DT.
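The ROC curves in these figures are summarised by an AUC value. As a rough sketch of what that number measures, AUC can be computed as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores below are assumptions for illustration, not values from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks the positive class, scores are predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ranked correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is how the AUC columns in the performance tables can be read.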
Table 1. Attribute description.
| Attribute name | Description |
|---|---|
| | male or female |
| | age in years |
| | the number of red blood cells in the patient’s blood vessels |
| | percentage of red blood cells in blood |
| | the amount of oxygen and iron-carrying protein in the patient’s body |
| | the average mass of hemoglobin per red blood cell in a blood sample from a patient |
| | the patient’s average hemoglobin concentration per red blood cell |
| | white blood cell number in the patient’s blood vessels |
| | the number of platelets in the patient’s blood for blood clotting |
| | class label (class target) |
Table 2. Performance of each model without SMOTE algorithm.
| Model | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|
| LR | 47 | 50 | 49 | 64 |
| DT | 70 | 60 | 63 | 62 |
| RF | 71 | 58 | 61 | 70 |
| Bagging LR | 72 | 51 | 50 | 69 |
| Bagging DT | 98 | 66 | 73 | 97 |
Table 3. Performance of each model with SMOTE algorithm.
| Model | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
|---|---|---|---|---|
| LR | 66 | 66 | 66 | 72 |
| DT | 76 | 75 | 75 | 82 |
| RF | 78 | 78 | 78 | 87 |
| Bagging LR | 66 | 66 | 66 | 73 |
| Bagging DT | 83 | 83 | 83 | 92 |
Table 4. Performance of each model before and after using SMOTE algorithm in predicting minority class.
| Model | Precision before (%) | Precision after (%) | Recall before (%) | Recall after (%) | F1-score before (%) | F1-score after (%) |
|---|---|---|---|---|---|---|
| LR | 9 | 66 | 67 | 67 | 15 | 66 |
| DT | 10 | 70 | 81 | 84 | 18 | 76 |
| RF | 14 | 75 | 82 | 81 | 24 | 78 |
| Bagging LR | 9 | 66 | 67 | 67 | 15 | 66 |
| Bagging DT | 19 | 81 | 76 | 76 | 30 | 78 |
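The F1-scores in Table 4 can be related to the precision and recall columns through the standard definition of F1 as the harmonic mean of the two. The sketch below recomputes the second row of the table (precision 10 to 70, recall 81 to 84); the helper name `f1_score` is hypothetical, and because the table reports rounded percentages, recomputed values for other rows may differ from the printed ones by about a point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Second row of Table 4, minority class, values in percent.
before = f1_score(10, 81)  # without SMOTE
after = f1_score(70, 84)   # with SMOTE
print(round(before), round(after))  # 18 76
```

The example shows why the F1-score before SMOTE stays low despite high recall: the harmonic mean is pulled toward the smaller of the two values, so a precision of 10% dominates the result.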