International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(3): 231-241
Published online September 25, 2024
https://doi.org/10.5391/IJFIS.2024.24.3.231
© The Korean Institute of Intelligent Systems
Amirthalakshmi Thirumalai Maadapoosi¹, Velan Balamurugan², V. Vedanarayanan², Sahaya Anselin Nisha², and R. Narmadha²
¹Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India
²Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai, India
Correspondence to: Amirthalakshmi Thirumalai Maadapoosi (amirthatm@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Heart disease is currently one of the leading causes of death worldwide, and predicting it is a significant challenge in clinical data analysis. Machine learning is useful for generating judgments and predictions from the large amounts of data generated by healthcare organizations. Using patient heart disease data, this study presents a heart disease prediction module based on an artificial neural network (ANN) classifier tuned by grey wolf optimization (GWO). The effectiveness of the module relies on the optimal tuning of the ANN weights by the GWO algorithm, which models the hierarchy-based prey-hunting behavior of grey wolves; this tuning enhances the prediction accuracy of the model. The model was evaluated in terms of accuracy, sensitivity, specificity, and F1-score, which reached 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively, on the combined Statlog + Cleveland + Hungary dataset. The experimental results show that the proposed GWO-ANN module compares favorably with conventional models.
Keywords: Grey wolf optimization, Heart disease, Machine learning, Artificial neural network
By 2030, the number of deaths due to cardiovascular disease is expected to increase to 23.3 million [1]. Blood vessels carry oxygen to the heart and brain; if these vessels become blocked or narrowed, heart disease or stroke may occur. According to the World Health Organization, approximately 12 million people are affected by heart disease each year, and 80% of those affected die as a result of this condition [2]. The key risk factors for heart disease include high cholesterol, high blood pressure, stress, alcohol use, a sedentary lifestyle, obesity, and diabetes, and these factors can aid in the detection of cardiac disease. High blood pressure causes thickening of the arterial walls, resulting in obstruction and an increased mortality rate [3]. Early detection and treatment of cardiac disease can help to prevent it and lower patient mortality [4]. Many researchers have used supervised and unsupervised machine learning (ML) algorithms to diagnose and predict cardiac disease, because ML approaches help extract usable knowledge from large datasets [5].
These algorithms have found widespread use in healthcare, speech recognition, cosmology, and education, and several techniques have been proposed for detecting patterns in large datasets [6]. In healthcare, ML is widely used to diagnose diseases, support informed decisions, and identify patients with significant risk factors. Healthcare organizations collect massive volumes of data, and ML provides several models for fast training and data analysis [7]. During training, these algorithms search a large solution space for optimal solutions. Model performance can be assessed with measures such as accuracy, sensitivity, specificity, precision, and F1-score. Performance can be improved by searching for the optimal hyperparameters and applying them to the model; well-tuned hyperparameters yield more robust models and help prevent overfitting or underfitting [4, 8].
This study presents an optimization-tuned ML model for identifying cardiovascular disease from patient heart disease data. An artificial neural network (ANN) classifier is used to classify cardiac disease, and its weights are optimized using the grey wolf optimization (GWO) method. To make the classification effective, significant statistical features, namely the mean, variance, kurtosis, standard deviation, and skewness, are extracted from the input data. The proposed model is described in full, and the efficacy of the proposed classification model is confirmed experimentally. In addition, its performance is compared with that of traditional classification techniques to assess the benefits of the proposed heart disease prediction strategy.
The major contributions of this research are summarized as follows:
• A novel artificial intelligence (AI)-based heart disease prediction strategy is designed and developed for advanced healthcare systems.
• The GWO algorithm is used to enhance the prediction accuracy of the ANN classifier by optimally tuning the weights and parameters.
• The performance of the proposed GWO-ANN model is validated through comparison with existing heart disease prediction methods.
The remainder of this paper is organized as follows: Section 2 outlines existing heart disease classification methods and the issues that they face. The proposed GWO-based ANN model for heart disease classification is described in Section 3. The results of the proposed model are discussed in Section 4, and the study is concluded in Section 5.
This section discusses the existing techniques for heart disease classification and the problems associated with them, which motivated the development of the proposed model.
Abdel-Basset et al. [9] developed an Internet of Things (IoT)-based approach for identifying and monitoring cardiac patients. The purpose of this healthcare model is to increase diagnostic precision, since more accurate diagnoses could reduce mortality and treatment costs. Thiyagaraj and Suseendran [10] developed a heart disease detection method that combines particle swarm optimization (PSO) and rough sets (RS) with transductive support vector machines (TSVM). This strategy reduces data redundancy and improves data integrity. Data are normalized using the z-score method, and PSO is then used to select the best subset of features, reducing the computational cost and improving the prediction performance. Finally, a radial basis function TSVM classifier is used for heart disease prediction. Shah et al. [11] developed an automated diagnostic approach for cardiac disease that exploits feature selection and extraction models to assess relevant feature subsets. Magesh and Swarnalatha [12] developed a model for detecting heart disease using Cleveland cardiac samples, in which cluster-based decision tree (DT) learning is used to identify cardiac problems. The original set is split according to the target label distribution, the potential class is determined from the samples with elevated distribution, and entropy is used to determine the characteristics of each class set for heart disease diagnosis. Nilashi et al. [13] used ML algorithms to develop a prediction approach for heart disease detection that combines unsupervised and supervised learning. For missing-value imputation, the approach uses a self-organizing map, a fuzzy SVM, and principal component analysis (PCA). Furthermore, incremental PCA and the fuzzy SVM are designed for incremental learning to reduce the computation time in disease prediction.
The challenges associated with the existing methods for heart disease classification are as follows:
• Heart disease data require feature selection, which is complicated by variation among samples and the absence of feature magnitudes. Furthermore, interpreting cardiac echo images is difficult because there are no defined criteria for accurately inferring information from them [1].
• Large amounts of raw data are generated in the digital age, and extracting useful information from them is a significant challenge and an active area of research. Moreover, the information systems used for disease diagnosis contain several ambiguities that make them difficult to handle [4].
• For the diagnosis of cardiac disease, patient data include test results, video clips, and demographic information. Mining valuable information from large amounts of data is time consuming [9].
Heart disease is the most common cause of death worldwide. India is in the midst of an industrial revolution that has resulted in lifestyle changes that contribute to heart disease. Despite the information contained in medical data, clinicians diagnose diseases based on their experience and individual opinions, which can lead to inaccurate diagnoses and high diagnostic expenses, thereby affecting the quality of services offered by hospitals to patients [14]. Medical data mining aims to discover useful information for accurate diagnoses. Disease therapy is difficult and time consuming in healthcare settings, and because cardiac disease can arise from many interacting causes, its identification is a multi-layered problem. Hence, an automated model for the accurate prediction of heart disease is needed.
This study presents a method for predicting heart disease using an ML model and patient data. The data are analyzed so that the cardiac problems of individuals can be anticipated automatically, and appropriate tuning of the ANN parameters improves the effectiveness of the system in predicting cardiac diseases. Figure 1 provides a diagrammatic representation of the proposed model for heart disease prediction. ANNs are computational models inspired by the neural architecture of the human brain and consist of interconnected layers of nodes, or neurons. Each connection has an associated weight, and ANNs learn complex patterns and relationships in the data through a training process that typically uses algorithms such as backpropagation. They are widely used for classification, regression, and pattern recognition owing to their ability to model nonlinear relationships and handle high-dimensional data. Their performance can be enhanced by integrating them with optimization algorithms such as GWO. GWO is a nature-inspired metaheuristic that mimics the social hierarchy and hunting behavior of grey wolves. It involves four types of wolves (alpha, beta, delta, and omega) that represent the hierarchy within the pack. The algorithm iteratively updates the positions of the wolves based on the best solutions found thus far, guided by the alpha, beta, and delta wolves. Combining ANNs with GWO leverages the strengths of both: GWO optimizes the weights and biases of the ANN, which improves the ability of the network to find optimal or near-optimal solutions. This hybrid approach enhances the learning capability and overall performance of the ANN.
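As a minimal sketch of how a grey wolf's position can represent the ANN weights and biases, the helpers below flatten the parameters of a one-hidden-layer network into a single vector and rebuild them. The hidden-layer size (8 in the usage below) and the helper names `layer_shapes`, `pack`, and `unpack` are hypothetical; the network size is not stated here.

```python
import numpy as np

def layer_shapes(n_in, n_hidden, n_out):
    """Shapes of W1, b1, W2, b2 for a network with one hidden layer."""
    return [(n_in, n_hidden), (n_hidden,), (n_hidden, n_out), (n_out,)]

def pack(params):
    """Flatten a list of weight and bias arrays into one position vector."""
    return np.concatenate([np.ravel(p) for p in params])

def unpack(theta, shapes):
    """Rebuild the weight and bias arrays from a flat position vector."""
    sizes = [int(np.prod(s)) for s in shapes]
    if len(theta) != sum(sizes):
        raise ValueError("position vector length does not match the network shapes")
    params, start = [], 0
    for shape, size in zip(shapes, sizes):
        params.append(np.asarray(theta[start:start + size]).reshape(shape))
        start += size
    return params

# Usage: a hypothetical 5-8-1 network is encoded as one wolf position.
shapes = layer_shapes(5, 8, 1)
rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=sum(int(np.prod(s)) for s in shapes))
W1, b1, W2, b2 = unpack(theta, shapes)
```

With 5 inputs, 8 hidden neurons, and 1 output, each wolf is a 57-dimensional vector (40 + 8 + 8 + 1), so GWO searches a 57-dimensional space.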
Consider the input heart disease medical dataset, which contains the clinical records of the patients.
Pre-processing prepares the input data for the prediction process and makes subsequent processing more efficient. In this step, the medical records are converted into diagnostic values.
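The conversion of records into diagnostic values is not specified in detail. A minimal sketch, assuming categorical fields are mapped to fixed numeric codes and numeric fields are passed through unchanged; the field names and codes in `CATEGORY_CODES` and the helper `to_diagnostic_values` are hypothetical.

```python
from typing import Dict, List

# Hypothetical categorical codes; the actual encoding is not given.
CATEGORY_CODES: Dict[str, Dict[str, float]] = {
    "sex": {"M": 1.0, "F": 0.0},
    "exercise_angina": {"Y": 1.0, "N": 0.0},
    "chest_pain": {"TA": 0.0, "ATA": 1.0, "NAP": 2.0, "ASY": 3.0},
}

def to_diagnostic_values(record: Dict[str, object], fields: List[str]) -> List[float]:
    """Convert one patient record into numeric values in a fixed field order."""
    values = []
    for name in fields:
        raw = record[name]
        if name in CATEGORY_CODES:
            codes = CATEGORY_CODES[name]
            if raw not in codes:
                raise ValueError(f"unknown value {raw!r} for field {name!r}")
            values.append(codes[raw])
        else:
            values.append(float(raw))  # numeric fields pass through unchanged
    return values
```

For example, the record `{"age": 54, "sex": "M", "chest_pain": "ASY", "cholesterol": 239, "exercise_angina": "N"}` with that field order becomes `[54.0, 1.0, 3.0, 239.0, 0.0]`. Raising on unknown categories avoids silently feeding a wrong code to the classifier.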
After pre-processing, features are extracted from the heart disease data. Statistical features, namely the mean, standard deviation, variance, kurtosis, and skewness, are computed from the pre-processed data. Statistical features allow large amounts of data to be summarized so that common patterns and trends can be identified and turned into useful information. The extracted features are described below.
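A sketch of the standard (population) definitions of these five statistics, assuming each is computed over the $N$ values $x_1, \dots, x_N$ of a single pre-processed record. Whether sample ($N-1$) normalization or excess kurtosis (kurtosis minus 3) was used is not stated, so these choices are assumptions.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2},
\\[4pt]
\text{Skewness} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^{3}, \qquad
\text{Kurtosis} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^{4}
```

Skewness measures the asymmetry of the values around the mean, and kurtosis measures how heavy their tails are relative to the variance.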
The variance captures small deviations in the data over time, which can help improve the diagnosis.
The feature vector is created from the input data, which include all vital statistics of the patient: the five statistical features (mean, standard deviation, variance, kurtosis, and skewness) are combined into a single feature vector.
The dimension of the feature vector is [1×5], and the feature vector is used as the input by the ANN to categorize the input as normal or abnormal.
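The following sketch computes the [1×5] feature vector from one pre-processed record, in the order listed above (mean, standard deviation, variance, kurtosis, skewness). It assumes population statistics and non-excess (Pearson) kurtosis; `statistical_features` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def statistical_features(x) -> np.ndarray:
    """Return the [1 x 5] vector [mean, std, variance, kurtosis, skewness]."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = x.var()              # population variance (ddof=0)
    std = np.sqrt(var)
    if std == 0:
        raise ValueError("constant record: kurtosis and skewness are undefined")
    z = (x - mu) / std         # standardized values
    kurt = np.mean(z ** 4)     # Pearson (non-excess) kurtosis
    skew = np.mean(z ** 3)
    return np.array([[mu, std, var, kurt, skew]])
```

For example, `statistical_features([1, 2, 3, 4])` gives a mean of 2.5, a variance of 1.25, a kurtosis of 1.64, and a skewness of 0, as expected for symmetric values.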
Heart disease can be classified using ML algorithms, which detect conditions without the need for human intervention. ANNs have shown advantages over other ML approaches in a range of applications. An ANN is a mathematical model inspired by biological neural networks; it is composed of artificial neurons that process information through weighted connections. ANNs are computational models with many interconnections and simple, highly parallel processing units. The ANN is modeled on the human nervous system and uses layers of neurons to analyze the data that it receives as input. Data flow through the network by multiplying the inputs by connection weights [15], and each neuron applies an activation function to its weighted inputs to produce its output. Tuning these weights with the GWO algorithm allows the output of the ANN to be driven toward the desired target. The construction of the ANN is illustrated in Figure 2.
The output of the ANN is obtained by applying the activation function of each neuron to the weighted sum of its inputs, layer by layer.
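A minimal sketch of the forward pass described above, assuming one hidden layer with tanh activation, a sigmoid output interpreted as the probability of the abnormal class, and a 0.5 decision threshold. None of these choices is stated for the proposed model, and `ann_forward` and `classify` are illustrative names.

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2):
    """One-hidden-layer ANN: tanh hidden units, sigmoid output in (0, 1)."""
    h = np.tanh(np.asarray(x, dtype=float) @ W1 + b1)  # weighted sum + activation
    z = h @ W2 + b2                                     # output neuron's weighted sum
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, W1, b1, W2, b2, threshold=0.5):
    """Label each input row as 'abnormal' or 'normal' from the output value."""
    p = ann_forward(x, W1, b1, W2, b2).ravel()
    return ["abnormal" if pi >= threshold else "normal" for pi in p]
```

Here `x` has shape (n, 5) for n feature vectors; in the GWO-ANN setting, `W1`, `b1`, `W2`, and `b2` would come from a wolf's position rather than from backpropagation.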
The GWO technique is a metaheuristic based on the hierarchy-based hunting behavior of grey wolves, members of the Canidae family [16]. Its hunting-inspired search is dynamic, which helps it address convergence problems: GWO balances exploration and exploitation and is therefore less prone than many other optimization techniques to converging prematurely to local optima. The grey wolf pack follows a hunting hierarchy consisting of a leading grey wolf (alpha), a representative grey wolf (beta), an obeyer grey wolf (delta), and a scapegoat grey wolf (omega).
As stated previously, the hunting behavior of the pack is organized into hierarchical levels. The leading grey wolf is at the top of the hierarchy and is responsible for decisions about resting, hunting, and other activities. During hunting, the leading grey wolf directs the other grey wolves; it is not necessarily the strongest member of the pack, but it is the best at managing it. At the second level, the representative grey wolf assists the leading grey wolf in decision-making and relays its decisions to the subordinate grey wolves in the search area. The scapegoat grey wolves are at the bottom of the hierarchy and must submit to all the other wolves. The obeyer grey wolves lie directly above the scapegoats, follow the orders of the leading and representative grey wolves, and guard the pack. The GWO algorithm comprises four primary phases.
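In the standard formulation of GWO introduced by Mirjalili et al. (2014), the four phases are commonly described as encircling, hunting, attacking (exploitation), and searching (exploration). The sketch below gives those standard equations; it is an assumption that the phases used here follow them exactly, and the notation ($\vec{X}_p$ for the prey position, $T$ for the iteration budget, $\odot$ for the element-wise product, $\vec{r}_1, \vec{r}_2$ uniform random vectors in $[0, 1]$) is illustrative.

```latex
% Encircling the prey
\vec{D} = \left|\vec{C}\odot\vec{X}_p(t) - \vec{X}(t)\right|, \qquad
\vec{X}(t+1) = \vec{X}_p(t) - \vec{A}\odot\vec{D}
\\[4pt]
\vec{A} = 2a\,\vec{r}_1 - a, \qquad \vec{C} = 2\,\vec{r}_2, \qquad a = 2 - \frac{2t}{T}
\\[4pt]
% Hunting: the prey position is estimated from the alpha, beta, and delta wolves
\vec{D}_\alpha = \left|\vec{C}_1\odot\vec{X}_\alpha - \vec{X}\right|, \quad
\vec{D}_\beta = \left|\vec{C}_2\odot\vec{X}_\beta - \vec{X}\right|, \quad
\vec{D}_\delta = \left|\vec{C}_3\odot\vec{X}_\delta - \vec{X}\right|
\\[4pt]
\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\odot\vec{D}_\alpha, \quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2\odot\vec{D}_\beta, \quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3\odot\vec{D}_\delta
\\[4pt]
\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}
```

When $|A| < 1$, a wolf moves toward the estimated prey (attacking); when $|A| > 1$, it moves away to search new regions. Because $a$ decreases linearly from 2 to 0, the search shifts gradually from exploration to exploitation.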
This section discusses the outcomes of the GWO-ANN prediction module, and a comparative analysis demonstrates the performance of the GWO-ANN in cardiovascular disease prediction.
The experiments were implemented in Python and run on a 64-bit Windows 10 computer with 16 GB of RAM.
The Statlog, Cleveland, and Hungary databases were integrated based on 11 common features, resulting in what is described as the world’s largest heart disease dataset [18].
To validate the efficacy of the proposed technique, methods including the multilayer perceptron (MLP) classifier, SVM classifier, random forest (RF), AdaBoost classifier [19–22], and ANN classifier [8] were compared with the proposed GWO-ANN classifier in terms of the accuracy, sensitivity, specificity, and F1-score.
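The four evaluation measures can be computed from the confusion matrix as in the sketch below, assuming label 1 denotes heart disease (the positive class). The F1-score also depends on precision, which is not among the reported measures. `binary_metrics` is an illustrative helper.

```python
from typing import Dict, Sequence

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1-score in percent (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": 100 * accuracy, "sensitivity": 100 * sensitivity,
            "specificity": 100 * specificity, "f1": 100 * f1}
```

For example, with `y_true = [1, 1, 1, 0, 0]` and `y_pred = [1, 1, 0, 0, 1]`, the accuracy is 60%, the sensitivity and F1-score are about 66.67%, and the specificity is 50%.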
Table 2 compares the performance of the conventional techniques and the proposed model in heart disease classification.
Figure 3 presents a comparative analysis of these methods using the heart disease database. Figure 3(a) presents a comparative evaluation of the methods in terms of the accuracy. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN methods for the training percentage of 70% were 80.898%, 81.3107%, 90.4011%, 84.8175%, 90.8367%, and 92.4979%, respectively. Figure 3(b) presents a comparative evaluation in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 77.7198%, 82.9382%, 90.3623%, 89.4911%, 90.7979%, and 94.3203%, respectively.
Figure 3(c) presents a comparative evaluation in terms of the specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 71.3489%, 75.6583%, 86.9725%, 84.8229%, 88.8337%, and 90.4949%, respectively. Figure 3(d) shows the comparative evaluation in terms of the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 81.0367%, 82.401%, 90.7993%, 87.5715%, 92.1061%, and 95.6285%, respectively.
Figure 4 shows a comparative evaluation using the Statlog + Cleveland + Hungary database. Figure 4(a) shows a comparative evaluation of the accuracy of the methods. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 54.0108%, 77.8053%, 89.3536%, 87.9337%, 91.5205%, and 92.8122%, respectively. Figure 4(b) presents a comparative evaluation of the methods in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 70.4158%, 75.9977%, 92.2176%, 91.0989%, 93.3803%, and 94.6347%, respectively.
Figure 4(c) presents a comparative evaluation of the methods in terms of their specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 52.0078%, 81.8617%, 86.7021%, 85.242%, 89.5175%, and 90.8093%, respectively. Figure 4(d) presents a comparative evaluation based on the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 57.1414%, 75.6547%, 92.1683%, 90.7796%, 94.6511%, and 95.9429%, respectively.
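Results are reported at a training percentage of 70%, but the way the split was drawn is not stated. A minimal sketch, assuming a simple random hold-out split with a fixed seed; `holdout_split` is hypothetical, and a stratified split would be a common alternative when the classes are imbalanced.

```python
import numpy as np

def holdout_split(X, y, train_pct=70.0, seed=0):
    """Randomly assign train_pct percent of the samples to training, the rest to testing."""
    X, y = np.asarray(X), np.asarray(y)
    if not 0 < train_pct < 100:
        raise ValueError("train_pct must be between 0 and 100")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                  # shuffled sample indices
    n_train = int(round(len(y) * train_pct / 100))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]
```

Fixing the seed makes the split reproducible, so different classifiers can be compared on the same training and test samples.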
The proposed method attained the highest accuracy among the compared methods: 92.50% on the heart disease dataset and 92.92% on the Statlog + Cleveland + Hungary database. On the latter, its sensitivity, specificity, and F1-score were 94.75%, 90.92%, and 96.06%, respectively. These results indicate that the proposed model improves heart disease classification, matching or exceeding the compared methods in accuracy, sensitivity, specificity, and F1-score.
This study has presented an ML-based heart disease prediction module for the diagnosis of cardiovascular diseases from heart disease data. The data were first pre-processed to eliminate artifacts. Significant statistical features, namely the mean, standard deviation, variance, kurtosis, and skewness, were then extracted and combined into a feature vector that served as the input to the ANN classifier. The performance of the ANN was enhanced by optimally tuning its weights with the GWO algorithm, which is inspired by the hierarchy-based hunting behavior of grey wolves. The efficiency of the proposed model was verified using accuracy, sensitivity, specificity, and F1-score, which reached 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively, on the Statlog + Cleveland + Hungary database. In future work, the accuracy of the model will be further enhanced by tuning the ANN classifier with hybrid metaheuristic optimization algorithms. Furthermore, the method can be tested in real-time medical applications.
No potential conflict of interest relevant to this article was reported.
Table 1. Pseudocode of GWO algorithm.
Sl. no. | Pseudocode of GWO optimization algorithm |
---|---|
1 | Initialize grey wolf population and coefficients |
2 | Calculate fitness value for grey wolves |
3 | For all grey wolves |
4 | { |
5 | |
6 | |
7 | |
8 | } |
9 | While |
10 | { |
11 | Update the grey wolf position as per Eq. (20) |
12 | } |
13 | Else |
14 | { |
15 | Update the coefficients, |
16 | Evaluate fitness for all grey wolves |
17 | Update the solutions |
18 | |
19 | } |
20 | EndWhile |
21 | EndFor |
22 | Return |
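The loop outlined in Table 1 can be sketched in Python as a generic minimizer. This is a minimal sketch of the standard GWO update (Mirjalili et al., 2014), assuming box bounds `[lb, ub]`, a linear schedule for the coefficient `a`, and that alpha, beta, and delta are the three best solutions found so far; `gwo_minimize` and its default population size and iteration count are hypothetical, not the authors' settings.

```python
import numpy as np

def gwo_minimize(fitness, dim, lb, ub, n_wolves=20, max_iter=200, seed=0):
    """Minimize `fitness` over the box [lb, ub]^dim with the grey wolf optimizer."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))   # initialize the pack
    leaders = np.zeros((3, dim))                    # alpha, beta, delta positions
    leader_scores = np.full(3, np.inf)              # best three scores so far

    for t in range(max_iter):
        scores = np.array([fitness(x) for x in X])  # evaluate every wolf
        # Keep the three best solutions seen so far as alpha, beta, delta.
        pool = np.vstack([leaders, X])
        pool_scores = np.concatenate([leader_scores, scores])
        top = np.argsort(pool_scores)[:3]
        leaders, leader_scores = pool[top], pool_scores[top]

        a = 2.0 - 2.0 * t / max_iter                # decreases linearly from 2 to 0
        for i in range(n_wolves):
            estimates = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a    # |A| > 1 explores, |A| < 1 exploits
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - X[i])        # distance to this leader
                estimates.append(leader - A * D)
            X[i] = np.clip(np.mean(estimates, axis=0), lb, ub)  # average of the three moves

    # Evaluate the final moves before returning the alpha wolf.
    scores = np.array([fitness(x) for x in X])
    best = int(np.argmin(scores))
    if scores[best] < leader_scores[0]:
        return X[best].copy(), float(scores[best])
    return leaders[0].copy(), float(leader_scores[0])
```

For example, `gwo_minimize(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10.0, ub=10.0)` drives the sphere function close to zero. To tune an ANN, `fitness` would decode the position vector into network weights and return a training error, with `dim` equal to the number of weights and biases; the choice of error measure is an assumption.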
Table 2. Comparison of heart disease classification techniques.
S. no. | Dataset | Metric | MLP | SVM | RF | AdaBoost | ANN | Proposed GWO-ANN |
---|---|---|---|---|---|---|---|---|
1 | Heart disease dataset | Accuracy (%) | 81.19 | 85.01 | 90.52 | 89.09 | 91.83 | 92.50 |
1 | Heart disease dataset | Sensitivity (%) | 85.48 | 90.61 | 92.34 | 91.23 | 93.65 | 94.32 |
1 | Heart disease dataset | Specificity (%) | 76.30 | 78.61 | 88.52 | 87.09 | 90.49 | 90.49 |
1 | Heart disease dataset | F1-score (%) | 85.06 | 88.65 | 93.65 | 92.22 | 94.96 | 95.63 |
2 | Statlog + Cleveland + Hungary database | Accuracy (%) | 85.65 | 86.93 | 90.36 | 89.22 | 91.63 | 92.92 |
2 | Statlog + Cleveland + Hungary database | Sensitivity (%) | 85.81 | 88.45 | 92.33 | 91.21 | 93.46 | 94.75 |
2 | Statlog + Cleveland + Hungary database | Specificity (%) | 83.66 | 86.20 | 88.35 | 87.22 | 89.63 | 90.92 |
2 | Statlog + Cleveland + Hungary database | F1-score (%) | 88.37 | 89.70 | 93.49 | 92.35 | 94.76 | 96.06 |
International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(3): 231-241
Published online September 25, 2024 https://doi.org/10.5391/IJFIS.2024.24.3.231
Copyright © The Korean Institute of Intelligent Systems.
Amirthalakshmi Thirumalai Maadapoosi1, Velan Balamurugan2, V. Vedanarayanan2, Sahaya Anselin Nisha2, and R. Narmadha2
1Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India
2Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai, India
Correspondence to:Amirthalakshmi Thirumalai Maadapoosi (amirthatm@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Heart disease is currently one of the leading causes of death worldwide. Predicting cardiac diseases is a significant challenge in clinical data analysis. Machine learning is useful for generating judgments and predictions based on the significant amount of data generated by healthcare businesses. Using patient heart disease data, this research presents a heart disease prediction module using the grey wolf optimization (GWO)-tuned artificial neural network (ANN) classifier. The effectiveness of the proposed module relies on the optimal tuning of the weights of the ANN classifier using the GWO algorithm, which involves the hierarchy-based prey-hunting features of grey wolves. This enhances the prediction accuracy of the proposed model. The efficiency of the proposed model was analyzed in terms of performance indices such as accuracy, sensitivity, specificity, and F1-score, which were determined to be 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. The experimental results obtained using the proposed GWO-ANN module validated the efficiency of the proposed strategy compared with conventional models.
Keywords: Grey wolf optimization, Heart disease, Machine learning, Artificial neural network
By 2030, the number of deaths due to cardiovascular disease is expected to increase to 23.3 million [1]. The veins of the heart transport oxygen; if these vessels become blocked or restricted, heart disease or stroke may occur. According to the World Health Organization, approximately 12 million people are affected by heart disease each year, and 80% of those affected die as a result of this condition [2]. The key variables that affect the heart include high cholesterol, high blood pressure, tension, stress, alcohol use, a sedentary lifestyle, obesity, and diabetes. These characteristics can aid in the detection of cardiac disease. Higher blood pressure causes thickening of the arterial walls, resulting in obstruction and an increased mortality rate [3]. The early detection and treatment of cardiac disease can help to prevent and lower patient death rates [4]. Several researchers in the medical field have used various supervised and unsupervised machine learning (ML) algorithms to diagnose and predict cardiac disease. ML approaches aid in the extraction of usable knowledge from large datasets [5].
These algorithms have found widespread use in healthcare, speech recognition, cosmology, and education. Several techniques have been proposed for detecting various patterns in huge datasets [6]. ML is widely used in the healthcare industry to diagnose diseases and make informed decisions. This aids in the classification of patients with significant risk factors. Healthcare businesses collect massive volumes of data, and ML provides several models for fast training and data analysis [7]. These algorithms seek a large search space for optimal solutions by training a dataset. To assess the performance of the models, many measures such as accuracy, sensitivity, specificity, precision, and F1-score may be utilized. The optimum hyperparameters are identified and applied to the ML models by modifying the hyperparameters, which enhances the performance. Robust models can be created by tweaking the hyperparameters. The hyperparameters can be tuned to prevent overfitting or underfitting [4, 8].
This study presents a unique optimization-controlled ML model for cardiovascular disease identification based on patient heart disease data. An artificial neural network (ANN) classifier is used to classify cardiac disease, and the weights of the ANN classifier are optimized using the grey wolf optimization (GWO) method. To make the classification process effective, significant statistical characteristics such as mean, variance, kurtosis, standard deviation, and skewness are retrieved from the input data. Furthermore, the findings of this study reveal a complete description of the proposed model, allowing the efficacy of the proposed categorization model to be confirmed. In addition, a comprehensive analysis is presented based on the performance of traditional classification techniques to assess the benefits of the proposed heart disease strategy.
The major contributions of this research are summarized as follows:
• A novel artificial intelligence (AI)-based heart disease prediction strategy is designed and developed for advanced healthcare systems.
• The GWO algorithm is used to enhance the prediction accuracy of the ANN classifier by optimally tuning the weights and parameters.
• The performance of the ANN model is validated by comparing the proposed and existing methods for the heart disease prediction module.
The remainder of this paper is organized as follows: Section 2 outlines modern heart disease categorization methodologies, as well as the issues that they face. The proposed GWO-based ANN model for heart disease classification is described in Section 3. The outputs of the proposed model are discussed in Section 4, and the study is concluded in Section 5.
The existing techniques for heart disease categorization and the problems associated with the models are discussed in this section, serving as inspiration for the creation of the proposed model.
A literature review of existing methods is presented herein. Abdel-Basset et al. [9] developed an Internet of Things (IoT)-based approach for identifying and monitoring cardiac patients. The purpose of the healthcare model is to increase diagnostic precision. The precise answer to the model could lead to a reduction in mortality and treatment costs. Particle swarm optimization (PSO) and rough sets (RS) with transductive support vector machines (TSVM) was created by Thiyagaraj and Suseendran [10] for heart disease detection. To reduce data redundancy, this strategy increases data integrity. Data are normalized using the zero-score method. PSO is then used to select the best subset of characteristics, reducing the computing cost and improving the prediction performance. Finally, a radial basis function TSVM classifier is used for heart disease prediction. Shah et al. [11] developed an automated diagnostic approach for cardiac disease. Using the advantages of feature selection and extraction models, this technique assesses relevant feature subsets. Magesh and Swarnalatha [12] developed a model for detecting heart disease using Cleveland cardiac samples. Cluster-based decision tree (DT) learning is used to identify cardiac problems. The intended label distribution is used to split the original set and the potential class is determined using the elevated distribution samples. Entropy is employed to determine the characteristics of each class set for heart disease diagnosis. Nilashi et al. [13] used ML algorithms to develop a prediction approach for heart disease detection. This system combines unsupervised and supervised learning to identify cardiac problems. For missing value assertion, the approach uses a self-organizing map, fuzzy SVM, and principal component analysis (PCA). Furthermore, incremental PCA and the fuzzy SVM are designed for incremental data learning to reduce the computation time in disease prediction.
The challenges associated with the existing methods for heart disease classification are as follows:
• The problem with heart disease data is that they require feature selection, which is complicated by the difference in samples and the absence of feature magnitudes. Furthermore, the explanation of cardiac echo images is difficult because of the lack of defined criteria for accurately inferring the data [1].
• Large amounts of original data have been created during the digital age. Gathering useful data is a significant challenge and is an ideal area for exploration. Several ambiguities are present in the information systems that are used for disease diagnosis, making them difficult to handle [4].
• For the diagnosis of cardiac disease, patient data include test results, video clips, and demographic information. Mining valuable information from large amounts of data is time consuming [9].
Heart disease is the most common cause of death worldwide. India is in the midst of an industrial revolution that has resulted in lifestyle changes that contribute to heart disease. Despite the information contained in medical data, clinicians diagnose diseases based on their experience and individual opinions, which can lead to inaccurate diagnoses and high diagnostic expenses, thereby affecting the quality of services offered by hospitals to patients [14]. Medical data mining aims to discover useful information for accurate diagnoses. Disease therapy is difficult and time consuming in healthcare settings. The identification of cardiac diseases resulting from several circumstances is considered a multi-layered problem. Hence, it is necessary to develop an automated model for the accurate prediction of heart disease.
This study presents a method for predicting heart disease using an ML model and patient data. The data are analyzed so that the cardiac problems of individuals can be anticipated automatically. Appropriate tuning of the ANN parameters aids the system in improving its effectiveness in predicting cardiac diseases. Figure 1 provides a diagrammatic representation of the proposed model for heart disease prediction. ANNs are computational models inspired by the neural architecture of the human brain, and consist of interconnected layers of nodes or neurons. Each connection has an associated weight, and ANNs can learn complex patterns and relationships in the data through a training process that typically uses algorithms such as backpropagation. They are widely used for tasks such as classification, regression, and pattern recognition owing to their ability to model nonlinear relationships and handle high-dimensional data. Integrating ANNs with optimization algorithms such as the GWO algorithm can enhance their performance. GWO is a nature-inspired metaheuristic that mimics the social hierarchy and hunting behavior of grey wolves. It involves four types of wolves–alpha, beta, delta, and omega–representing the hierarchy within the pack. The algorithm iteratively updates the positions of the wolves based on the best solutions found thus far, guided by the alpha, beta, and delta wolves. Combining ANNs with GWO leverages the strengths of both, using GWO to optimize the weights and biases of the ANN, thereby improving the ability of the network to find optimal or near-optimal solutions to various complex problems. This hybrid approach enhances the learning capabilities and overall performance of the ANN, making it effective for a wide range of applications.
Consider the input heart disease medical dataset
where
Pre-processing is an important phase that prepares the input data for the prediction process and makes subsequent processing more efficient. In this step, the medical records are converted into diagnostic values.
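The conversion of records into diagnostic values is not spelled out in detail here, so the following is only a minimal sketch under stated assumptions: each record is a dictionary of raw string fields, categorical fields are mapped to numeric codes, and incomplete records are discarded. The field names, the `CATEGORY_CODES` table, and the `preprocess` function are all hypothetical.

```python
# Hypothetical encoding table: categorical labels -> numeric codes (assumed, not from the study).
CATEGORY_CODES = {
    "sex": {"M": 1.0, "F": 0.0},
    "exercise_angina": {"Y": 1.0, "N": 0.0},
}


def preprocess(records):
    """Convert raw patient records (dicts of strings) into lists of numeric values.

    Records with a missing field are dropped; categorical fields are mapped
    through CATEGORY_CODES and every other field is cast to float.
    """
    cleaned = []
    for rec in records:
        if any(v is None or v == "" for v in rec.values()):
            continue  # skip incomplete records
        row = []
        for field, value in rec.items():
            if field in CATEGORY_CODES:
                row.append(CATEGORY_CODES[field][value])
            else:
                row.append(float(value))
        cleaned.append(row)
    return cleaned
```

For example, a complete record `{"age": "54", "sex": "M", "cholesterol": "239", "exercise_angina": "N"}` becomes `[54.0, 1.0, 239.0, 0.0]`, while a record with a missing cholesterol value is dropped.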
Once pre-processing is complete, features are extracted from the heart disease data. Five statistical features, namely the mean, standard deviation, variance, kurtosis, and skewness, are computed from the pre-processed data. Statistical features allow large amounts of data to be summarized, revealing common patterns and trends that can be turned into useful information. The extracted features are described in detail below.
where
The variance measures the spread of the data about the mean, so it captures even small deviations in the data, which can help the diagnosis process.
where
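The feature vector is created from the input data, which include the vital statistics of the patient: the extracted statistical features are combined to form the feature vector, expressed as
$$F = [\text{mean},\ \text{standard deviation},\ \text{variance},\ \text{kurtosis},\ \text{skewness}]$$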
The dimension of the feature vector is [1×5], and the feature vector is used as the input by the ANN to categorize the input as normal or abnormal.
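The five statistics can be sketched as follows, assuming they are computed over the numeric values of a single pre-processed patient record. The use of population (biased) moments and of Fisher's excess kurtosis (0 for a normal distribution) are assumptions, and the function name `statistical_features` is hypothetical.

```python
import math


def statistical_features(values):
    """Return the 1x5 feature vector [mean, std, variance, kurtosis, skewness].

    Population (biased) central moments are used; kurtosis is Fisher's excess
    kurtosis. Constant inputs (zero variance) get kurtosis and skewness of 0.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return [mean, std, m2, kurtosis, skewness]
```

For the values `[1, 2, 3, 4, 5]`, this gives a mean of 3.0, a variance of 2.0 (standard deviation of about 1.414), an excess kurtosis of about -1.3, and a skewness of 0.0.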
Heart disease can be classified using ML algorithms, which detect conditions without the need for human interaction. ANNs offer several advantages over other ML approaches in a range of applications, notably their ability to model nonlinear relationships. An ANN is a mathematical model inspired by biological neural networks: it is composed of interconnected artificial neurons, which are simple processors that operate in parallel and process information through weighted connections. As in the human brain, the neurons are organized into layers that analyze the data received as input. Data flow through the network by multiplying the inputs by connection weights [15]; each neuron combines its weighted inputs and passes the result through an activation function. Tuning these weights with the GWO algorithm allows the output of the ANN to be driven toward the desired target. The construction of the ANN is illustrated in Figure 2.
The output of the ANN can be mathematically expressed as
where
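Because GWO searches over real-valued position vectors, one common way to tune an ANN with it is to store all weights and biases in a single flat vector. The sketch below assumes a network with one hidden layer and sigmoid activations. The parameter layout and the names `n_params` and `ann_output` are hypothetical, and the exact architecture of the network in Figure 2 may differ.

```python
import math


def sigmoid(z):
    """Numerically stable logistic activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def n_params(n_in, n_hidden):
    """Weights and biases in an n_in -> n_hidden -> 1 network."""
    return n_hidden * n_in + n_hidden + n_hidden + 1


def ann_output(params, x, n_hidden):
    """Forward pass reading weights from the flat vector `params`.

    Assumed layout: hidden weights (one row per hidden neuron), hidden biases,
    output weights, output bias.
    """
    n_in = len(x)
    if len(params) != n_params(n_in, n_hidden):
        raise ValueError("parameter vector has the wrong length")
    idx = n_hidden * n_in
    w_hidden = [params[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    b_hidden = params[idx:idx + n_hidden]
    w_out = params[idx + n_hidden:idx + 2 * n_hidden]
    b_out = params[idx + 2 * n_hidden]
    # Each hidden neuron: weighted sum of inputs plus bias, then activation.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hidden[j], x)) + b_hidden[j])
              for j in range(n_hidden)]
    # Output neuron: weighted sum of hidden activations plus bias, then activation.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

With a [1×5] feature vector and, for example, three hidden neurons, the vector has `n_params(5, 3) == 22` entries. An output of at least 0.5 could then be read as "abnormal", although this threshold and label mapping are also assumptions.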
The GWO technique is a metaheuristic based on the hierarchy-based hunting behavior of grey wolves, which are members of the Canidae family [16]. Modeling this hunting behavior gives the search the dynamic character needed to address convergence concerns. By exploiting the dynamic features of the wolf population, GWO aims to balance the exploitation and exploration phases and to reach the global optimum more reliably than optimization techniques that frequently converge to local optimal solutions. The pack follows a hunting hierarchy consisting of a leading (alpha) grey wolf, a representative (beta) grey wolf, obeyer (delta) grey wolves, and scapegoat (omega) grey wolves.
As stated previously, the hunting behavior of the pack is organized into hierarchical levels. The leading grey wolf sits at the top of the hierarchy and makes decisions about resting, hunting, and other activities, and the rest of the pack follows its lead during the hunt. The leading grey wolf is not necessarily the strongest member of the pack, but it is the best at managing it, and it is the only wolf in the pack allowed to mate. At the second level, the representative grey wolf advises the leading grey wolf and passes its decisions on to the subordinate grey wolves. The obeyer grey wolves rank directly above the scapegoat grey wolves; they follow the orders of the leading and representative grey wolves and act as sentinels that guard the pack. The scapegoat grey wolves are at the bottom of the hierarchy, must submit to all the other wolves, and are the last allowed to eat. The proposed GWO algorithm comprises four primary phases, as follows.
where
where
where
where
where
where
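As an illustration of these phases, the sketch below follows the standard GWO update rules of Mirjalili et al.: encircling the prey, hunting guided by the alpha, beta, and delta wolves, and a coefficient $a$ that decreases linearly from 2 to 0 to shift the search from exploration to exploitation. It is not a transcription of this study's own equations (e.g., Eq. (20) in Table 1), which may differ in detail. The function name, default parameters, search bounds, and the elitist retention of the three best positions are assumptions.

```python
import random


def gwo(fitness, dim, n_wolves=20, n_iter=200, lb=-10.0, ub=10.0, seed=0):
    """Minimise `fitness` over the box [lb, ub]^dim with the standard GWO.

    Returns the best position found and its fitness value.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    # Alpha, beta and delta: the three best positions found so far.
    leaders = sorted(wolves, key=fitness)[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])     # encircling: distance to the leader
                    guided.append(leader[d] - A * D)  # step relative to that leader
                # Hunting: average the three leader-guided positions, then clip.
                new_x.append(min(max(sum(guided) / 3.0, lb), ub))
            new_wolves.append(new_x)
        wolves = new_wolves
        # Re-rank: keep the three fittest among the new pack and the old leaders.
        leaders = sorted(wolves + leaders, key=fitness)[:3]
    return leaders[0], fitness(leaders[0])
```

To tune an ANN, each wolf position would hold the flattened vector of weights and biases, with `dim` equal to its length, and `fitness` would return the training error of the network when that vector is used as its parameters. For a quick sanity check, minimizing the sphere function `lambda x: sum(v * v for v in x)` should drive the best fitness close to zero.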
This section discusses the outcomes of the GWO-ANN prediction module, and a comparative analysis demonstrates the performance of the GWO-ANN in cardiovascular disease prediction.
The experiments were implemented in Python and run on a Windows 10 64-bit computer with 16 GB of RAM.
These databases were combined on the basis of 11 common attributes, resulting in what is reported to be the largest heart disease dataset [18].
To validate the efficacy of the proposed technique, methods including the multilayer perceptron (MLP) classifier, SVM classifier, random forest (RF), AdaBoost classifier [19–22], and ANN classifier [8] were compared with the proposed GWO-ANN classifier in terms of the accuracy, sensitivity, specificity, and F1-score.
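The four performance measures can be computed from the confusion matrix of a binary classifier. The sketch below uses the standard definitions and treats label 1 (abnormal, i.e., heart disease) as the positive class, which is an assumption about the labeling. Note that the F1-score requires precision, which is not reported separately here. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1-score (in %) for 0/1 labels,
    with 1 (abnormal) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": 100.0 * accuracy, "sensitivity": 100.0 * sensitivity,
            "specificity": 100.0 * specificity, "f1": 100.0 * f1}
```

For `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 1, 0, 0, 0, 1, 1, 0]` (3 TP, 3 TN, 1 FP, 1 FN), all four measures equal 75%.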
This section compares the performance of conventional heart disease classification techniques with that of the proposed model (Table 2).
Figure 3 presents a comparative analysis of these methods using the heart disease database. Figure 3(a) presents a comparative evaluation of the methods in terms of the accuracy. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN methods for the training percentage of 70% were 80.898%, 81.3107%, 90.4011%, 84.8175%, 90.8367%, and 92.4979%, respectively. Figure 3(b) presents a comparative evaluation in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 77.7198%, 82.9382%, 90.3623%, 89.4911%, 90.7979%, and 94.3203%, respectively.
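The "training percentage of 70%" setting can be illustrated with a simple shuffled split, in which 70% of the samples train the classifier and the remaining 30% are used for evaluation. Whether the study shuffled or stratified its data is not stated, so the random, unstratified split below is an assumption, and the function name is hypothetical.

```python
import random


def split_by_training_percentage(rows, labels, train_pct=70, seed=0):
    """Shuffle the samples and place train_pct % of them in the training set.

    Returns (train_rows, train_labels, test_rows, test_labels).
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_train = round(len(idx) * train_pct / 100)
    train, test = idx[:n_train], idx[n_train:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

With 10 samples and the default setting, 7 samples go to training and 3 to testing, and each row keeps its own label.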
Figure 3(c) presents a comparative evaluation in terms of the specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 71.3489%, 75.6583%, 86.9725%, 84.8229%, 88.8337%, and 90.4949%, respectively. Figure 3(d) shows the comparative evaluation in terms of the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 81.0367%, 82.401%, 90.7993%, 87.5715%, 92.1061%, and 95.6285%, respectively.
Figure 4 shows a comparative evaluation using the Statlog + Cleveland + Hungary database. Figure 4(a) shows a comparative evaluation of the accuracy of the methods. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 54.0108%, 77.8053%, 89.3536%, 87.9337%, 91.5205%, and 92.8122%, respectively. Figure 4(b) presents a comparative evaluation of the methods in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 70.4158%, 75.9977%, 92.2176%, 91.0989%, 93.3803%, and 94.6347%, respectively.
Figure 4(c) presents a comparative evaluation of the methods in terms of their specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 52.0078%, 81.8617%, 86.7021%, 85.242%, 89.5175%, and 90.8093%, respectively. Figure 4(d) presents a comparative evaluation based on the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 57.1414%, 75.6547%, 92.1683%, 90.7796%, 94.6511%, and 95.9429%, respectively.
The proposed method attained the highest accuracy among the compared methods, with 92.50% on the heart disease dataset and 92.92% on the Statlog + Cleveland + Hungary database. On the latter database, its sensitivity, specificity, and F1-score were 94.75%, 90.92%, and 96.06%, respectively. These results indicate that the proposed model improves heart disease classification, matching or exceeding the compared methods in accuracy, sensitivity, specificity, and F1-score on both datasets in Table 2.
This study presented an ML-based heart disease prediction module for diagnosing cardiovascular diseases from heart disease data. The data were first pre-processed to eliminate artifacts. Statistical features, namely the mean, standard deviation, variance, kurtosis, and skewness, were then extracted to form a feature vector that served as the input to the proposed ANN classifier. The performance of the ANN was enhanced by optimally tuning its weights with the GWO algorithm, which is modeled on the hierarchy-based hunting behavior of grey wolves. The efficiency of the proposed model was verified using accuracy, sensitivity, specificity, and F1-score, which reached 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. Future work will aim to improve the accuracy of the model further by tuning the ANN classifier with hybrid metaheuristic optimization algorithms and to test the method in real-time medical applications.
Figure 1. Block diagram of the proposed heart disease detection module.
Figure 2. Structure of the neural network.
Figure 3. Comparative analysis based on the heart disease database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.
Figure 4. Comparative evaluation based on the Statlog + Cleveland + Hungary database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.
Table 1. Pseudocode of the GWO algorithm.
Sl. no. | Pseudocode of GWO optimization algorithm |
---|---|
1 | Initialize grey wolf population and coefficients |
2 | Calculate fitness value for grey wolves |
3 | For all grey wolves |
4 | { |
5 | |
6 | |
7 | |
8 | } |
9 | While |
10 | { |
11 | Update the grey wolf position as per Eq. (20) |
12 | } |
13 | Else |
14 | { |
15 | Update the coefficients, |
16 | Evaluate fitness for all grey wolves |
17 | Update the solutions |
18 | |
19 | } |
20 | EndWhile |
21 | EndFor |
22 | Return |
Table 2. Comparison of heart disease classification techniques.
| S. no. | Dataset | Metric | MLP | SVM | RF | AdaBoost | ANN | Proposed GWO-ANN |
|---|---|---|---|---|---|---|---|---|
| 1 | Heart disease dataset | Accuracy (%) | 81.19 | 85.01 | 90.52 | 89.09 | 91.83 | 92.50 |
| | | Sensitivity (%) | 85.48 | 90.61 | 92.34 | 91.23 | 93.65 | 94.32 |
| | | Specificity (%) | 76.30 | 78.61 | 88.52 | 87.09 | 90.49 | 90.49 |
| | | F1-score (%) | 85.06 | 88.65 | 93.65 | 92.22 | 94.96 | 95.63 |
| 2 | Statlog + Cleveland + Hungary database | Accuracy (%) | 85.65 | 86.93 | 90.36 | 89.22 | 91.63 | 92.92 |
| | | Sensitivity (%) | 85.81 | 88.45 | 92.33 | 91.21 | 93.46 | 94.75 |
| | | Specificity (%) | 83.66 | 86.20 | 88.35 | 87.22 | 89.63 | 90.92 |
| | | F1-score (%) | 88.37 | 89.70 | 93.49 | 92.35 | 94.76 | 96.06 |