
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(3): 231-241

Published online September 25, 2024

https://doi.org/10.5391/IJFIS.2024.24.3.231

© The Korean Institute of Intelligent Systems

Grey Wolf Optimization-Based Artificial Neural Network in the Development of an Automated Heart Disease Prediction Model

Amirthalakshmi Thirumalai Maadapoosi¹, Velan Balamurugan², V. Vedanarayanan², Sahaya Anselin Nisha², and R. Narmadha²

¹Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India
²Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai, India

Correspondence to:
Amirthalakshmi Thirumalai Maadapoosi (amirthatm@gmail.com)

Received: May 12, 2022; Revised: January 27, 2023; Accepted: June 10, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Heart disease is currently one of the leading causes of death worldwide. Predicting cardiac diseases is a significant challenge in clinical data analysis. Machine learning is useful for generating judgments and predictions based on the significant amount of data generated by healthcare businesses. Using patient heart disease data, this research presents a heart disease prediction module using the grey wolf optimization (GWO)-tuned artificial neural network (ANN) classifier. The effectiveness of the proposed module relies on the optimal tuning of the weights of the ANN classifier using the GWO algorithm, which involves the hierarchy-based prey-hunting features of grey wolves. This enhances the prediction accuracy of the proposed model. The efficiency of the proposed model was analyzed in terms of performance indices such as accuracy, sensitivity, specificity, and F1-score, which were determined to be 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. The experimental results obtained using the proposed GWO-ANN module validated the efficiency of the proposed strategy compared with conventional models.

Keywords: Grey wolf optimization, Heart disease, Machine learning, Artificial neural network

By 2030, the number of deaths due to cardiovascular disease is expected to increase to 23.3 million [1]. The coronary arteries supply the heart with oxygenated blood; if these vessels become blocked or narrowed, heart disease or stroke may occur. According to the World Health Organization, approximately 12 million people are affected by heart disease each year, and 80% of those affected die as a result of this condition [2]. The key variables that affect the heart include high cholesterol, high blood pressure, tension, stress, alcohol use, a sedentary lifestyle, obesity, and diabetes. These characteristics can aid in the detection of cardiac disease. High blood pressure causes thickening of the arterial walls, resulting in obstruction and an increased mortality rate [3]. The early detection and treatment of cardiac disease can help to prevent and lower patient death rates [4]. Several researchers in the medical field have used various supervised and unsupervised machine learning (ML) algorithms to diagnose and predict cardiac disease. ML approaches aid in the extraction of usable knowledge from large datasets [5].

These algorithms have found widespread use in healthcare, speech recognition, cosmology, and education. Several techniques have been proposed for detecting various patterns in huge datasets [6]. ML is widely used in the healthcare industry to diagnose diseases and make informed decisions, which aids in the classification of patients with significant risk factors. Healthcare businesses collect massive volumes of data, and ML provides several models for fast training and data analysis [7]. These algorithms search a large space for optimal solutions by training on a dataset. To assess the performance of the models, measures such as accuracy, sensitivity, specificity, precision, and F1-score may be used. Tuning the hyperparameters of an ML model identifies the optimum settings, which enhances performance, yields more robust models, and helps to prevent overfitting or underfitting [4, 8].

This study presents a unique optimization-controlled ML model for cardiovascular disease identification based on patient heart disease data. An artificial neural network (ANN) classifier is used to classify cardiac disease, and the weights of the ANN classifier are optimized using the grey wolf optimization (GWO) method. To make the classification process effective, significant statistical characteristics such as mean, variance, kurtosis, standard deviation, and skewness are retrieved from the input data. Furthermore, the findings of this study reveal a complete description of the proposed model, allowing the efficacy of the proposed categorization model to be confirmed. In addition, a comprehensive analysis is presented based on the performance of traditional classification techniques to assess the benefits of the proposed heart disease strategy.

The major contributions of this research are summarized as follows:

  • A novel artificial intelligence (AI)-based heart disease prediction strategy is designed and developed for advanced healthcare systems.

  • The GWO algorithm is used to enhance the prediction accuracy of the ANN classifier by optimally tuning the weights and parameters.

  • The performance of the ANN model is validated by comparing the proposed and existing methods for the heart disease prediction module.

The remainder of this paper is organized as follows: Section 2 outlines modern heart disease categorization methodologies, as well as the issues that they face. The proposed GWO-based ANN model for heart disease classification is described in Section 3. The outputs of the proposed model are discussed in Section 4, and the study is concluded in Section 5.

The existing techniques for heart disease categorization and the problems associated with the models are discussed in this section, serving as inspiration for the creation of the proposed model.

2.1 Related Works

A literature review of existing methods is presented herein. Abdel-Basset et al. [9] developed an Internet of Things (IoT)-based approach for identifying and monitoring cardiac patients. The purpose of the healthcare model is to increase diagnostic precision, and precise model output could lead to a reduction in mortality and treatment costs. Thiyagaraj and Suseendran [10] combined particle swarm optimization (PSO) and rough sets (RS) with transductive support vector machines (TSVM) for heart disease detection. This strategy reduces data redundancy to increase data integrity. Data are normalized using the z-score method. PSO is then used to select the best subset of characteristics, reducing the computing cost and improving the prediction performance. Finally, a radial basis function TSVM classifier is used for heart disease prediction. Shah et al. [11] developed an automated diagnostic approach for cardiac disease. Using the advantages of feature selection and extraction models, this technique assesses relevant feature subsets. Magesh and Swarnalatha [12] developed a model for detecting heart disease using Cleveland cardiac samples. Cluster-based decision tree (DT) learning is used to identify cardiac problems. The target label distribution is used to split the original set, and the potential class is determined using the samples with the highest distribution. Entropy is employed to determine the characteristics of each class set for heart disease diagnosis. Nilashi et al. [13] used ML algorithms to develop a prediction approach for heart disease detection. This system combines unsupervised and supervised learning to identify cardiac problems. For missing value imputation, the approach uses a self-organizing map, fuzzy SVM, and principal component analysis (PCA). Furthermore, incremental PCA and the fuzzy SVM are designed for incremental data learning to reduce the computation time in disease prediction.

2.2 Problem Statement

The challenges associated with the existing methods for heart disease classification are as follows:

  • The problem with heart disease data is that they require feature selection, which is complicated by the difference in samples and the absence of feature magnitudes. Furthermore, the explanation of cardiac echo images is difficult because of the lack of defined criteria for accurately inferring the data [1].

  • Large amounts of original data have been created during the digital age. Gathering useful data is a significant challenge and is an ideal area for exploration. Several ambiguities are present in the information systems that are used for disease diagnosis, making them difficult to handle [4].

  • For the diagnosis of cardiac disease, patient data include test results, video clips, and demographic information. Mining valuable information from large amounts of data is time consuming [9].

2.3 Motivation

Heart disease is the most common cause of death worldwide. India is in the midst of an industrial revolution that has resulted in lifestyle changes that contribute to heart disease. Despite the information contained in medical data, clinicians diagnose diseases based on their experience and individual opinions, which can lead to inaccurate diagnoses and high diagnostic expenses, thereby affecting the quality of services offered by hospitals to patients [14]. Medical data mining aims to discover useful information for accurate diagnoses. Disease therapy is difficult and time consuming in healthcare settings. The identification of cardiac diseases resulting from several circumstances is considered a multi-layered problem. Hence, it is necessary to develop an automated model for the accurate prediction of heart disease.

This study presents a method for predicting heart disease using an ML model and patient data. The data are analyzed so that the cardiac problems of individuals can be anticipated automatically. Appropriate tuning of the ANN parameters aids the system in improving its effectiveness in predicting cardiac diseases. Figure 1 provides a diagrammatic representation of the proposed model for heart disease prediction. ANNs are computational models inspired by the neural architecture of the human brain, and consist of interconnected layers of nodes or neurons. Each connection has an associated weight, and ANNs can learn complex patterns and relationships in the data through a training process that typically uses algorithms such as backpropagation. They are widely used for tasks such as classification, regression, and pattern recognition owing to their ability to model nonlinear relationships and handle high-dimensional data. Integrating ANNs with optimization algorithms such as the GWO algorithm can enhance their performance. GWO is a nature-inspired metaheuristic that mimics the social hierarchy and hunting behavior of grey wolves. It involves four types of wolves–alpha, beta, delta, and omega–representing the hierarchy within the pack. The algorithm iteratively updates the positions of the wolves based on the best solutions found thus far, guided by the alpha, beta, and delta wolves. Combining ANNs with GWO leverages the strengths of both, using GWO to optimize the weights and biases of the ANN, thereby improving the ability of the network to find optimal or near-optimal solutions to various complex problems. This hybrid approach enhances the learning capabilities and overall performance of the ANN, making it effective for a wide range of applications.

3.1 Input Heart Disease Data

Consider the input heart disease medical dataset I with various attributes, which is expressed as

$$I = \{I_{A,B}\}; \quad (1 \le A \le p); \quad (1 \le B \le q),$$

where $I_{A,B}$ denotes the $B$th attribute of the $A$th data record, $p$ specifies the total number of data records, and $q$ is the total number of attributes in each record. The dimension of the database is $[p \times q]$.

3.2 Pre-processing of Heart Disease Data

The purpose of pre-processing is to make the input data processing more efficient. Pre-processing is an important phase that prepares the input data for the prediction process. In this step, medical records are converted into diagnostic values as part of the data pre-processing.

3.3 Feature Extraction Process

The features are obtained from the heart disease data once the pre-processing stage is completed. Statistical features, such as mean, standard deviation, variance, kurtosis, and skewness are retrieved from the pre-processed data. With the aid of statistical features, enormous amounts of data can be analyzed to find common patterns and trends, and then transformed into useful information. The extracted features are described in detail below.

a) Mean: The mean is a statistical characteristic that estimates the average of all data instances, that is, their sum divided by the total number of instances, and is expressed as

$$f_M = \frac{1}{Q}\sum_{a=1}^{Q} K_a,$$

where $Q$ is the total number of data points and $K_a$ denotes the $a$th data point.

b) Variance: The variance is the average of the squared deviations of the individual data points from the mean, and is represented as

$$f_V = \frac{1}{Q-1}\sum_{a=1}^{Q}(K_a - f_M)^2.$$

The variance measure tracks minute deviations in data over time, which helps to enhance the diagnosis process considerably.

c) Standard deviation: The standard deviation is the square root of the variance and measures how widely the heart disease data are scattered about the mean. The standard deviation equation is as follows:

$$f_{SD} = \sqrt{\frac{1}{Q-1}\sum_{a=1}^{Q}(K_a - f_M)^2}.$$

d) Skewness: The skewness measures the symmetry, or lack of symmetry, of the data distribution and is based on the third central moment. The skewness of a perfectly symmetric dataset is zero; the normal distribution, for example, has zero skewness. The formula for skewness is as follows:

$$f_S = \frac{\sum_{a=1}^{Q}(K_a - f_M)^3}{Q\, f_{SD}^3}.$$

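e) Kurtosis: The kurtosis measures the weight of the tails of a distribution relative to a Gaussian distribution and is based on the fourth central moment. With the formula below, a Gaussian distribution gives a value of approximately 3, so that the excess kurtosis (the kurtosis minus 3) is zero in the Gaussian case. The kurtosis formula is expressed as follows:

$$f_K = \frac{\sum_{a=1}^{Q}(K_a - f_M)^4}{Q\, f_{SD}^4},$$

where $f_{SD}$ is the standard deviation, $f_M$ is the mean, and $K_a$ indicates the $a$th measure of $K$.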

3.4 Establishment of the Feature Vector

The feature vector is created using the input data, which include all vital statistics of the patient. These features are combined to form a feature vector, which is expressed as

$$f_{vector} = \{f_M, f_V, f_{SD}, f_K, f_S\}.$$

The dimension of the feature vector is [1×5], and the feature vector is used as the input by the ANN to categorize the input as normal or abnormal.
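As an illustration, the five statistics and the resulting [1×5] feature vector could be computed as in the following Python sketch. The function name `statistical_features` and the plain-list input are assumptions made for this example and are not taken from the original implementation; the formulas follow the definitions of the mean, variance, standard deviation, skewness, and kurtosis given above.

```python
import math


def statistical_features(record):
    """Return [f_M, f_V, f_SD, f_K, f_S] for one record (a list of numbers)."""
    q = len(record)
    if q < 2:
        raise ValueError("need at least two values to compute the sample variance")
    f_m = sum(record) / q                                  # mean
    f_v = sum((k - f_m) ** 2 for k in record) / (q - 1)    # variance (Q - 1 divisor)
    f_sd = math.sqrt(f_v)                                  # standard deviation
    if f_sd == 0.0:
        raise ValueError("constant record: skewness and kurtosis are undefined")
    f_s = sum((k - f_m) ** 3 for k in record) / (q * f_sd ** 3)  # skewness
    f_k = sum((k - f_m) ** 4 for k in record) / (q * f_sd ** 4)  # kurtosis
    # Same ordering as the feature vector: mean, variance, SD, kurtosis, skewness.
    return [f_m, f_v, f_sd, f_k, f_s]


if __name__ == "__main__":
    print(statistical_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A constant record is rejected explicitly because the skewness and kurtosis divide by powers of the standard deviation.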

3.5 ANN Classifier in Heart Disease Classification

Heart disease can be classified using ML algorithms, which detect conditions without the need for human interaction. ANNs offer benefits over other ML approaches in a range of applications. An ANN is a mathematical model based on a biological neural network and is composed of a group of artificial neurons that process information through their interconnections. ANNs are computational models with many interconnections and simple, highly parallel processing units. The primary premise behind the ANN is that it mimics the human brain, with layers of neurons that analyze the data received as input. Data flow through the network as each input is multiplied by a weight [15]; the weighted inputs are combined and passed through the activation function of the neuron. Appropriate weight tuning by the GWO algorithm allows the output of the ANN to be controlled as desired. The construction of the ANN is illustrated in Figure 2.

The output of the ANN can be mathematically expressed as

$$D_{ANN} = P\left(\sum_{n=1}^{D} G_{en}\, r_n\right) + b,$$

where $P$ is the activation function, $r_n$ is the $n$th input, $G_{en}$ is the corresponding weight, and $b$ is the bias term. The ANN classifier is used to classify the input as normal or diseased, and the benefit of employing an ANN [15] is that the classifier automatically executes the classification operation without requiring human intervention. The classification accuracy depends on the accurate tuning of the weights and parameters of the ANN classifier by the GWO algorithm.
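The output equation can be illustrated for a single neuron as follows. This is a minimal sketch and not the network used in the study: the logistic activation chosen for P, the 0.5 decision threshold, the flat layout of the candidate vector (weights followed by one bias), and all function names are assumptions. The misclassification rate is shown as one plausible fitness that a weight-tuning optimizer such as GWO could minimize.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, an assumed choice for P."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ann_output(weights, inputs, bias, activation=sigmoid):
    """Activation of the weighted input sum, with the bias added afterwards as in the text."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(g * r for g, r in zip(weights, inputs))
    return activation(weighted_sum) + bias


def classification_error(position, samples, labels, threshold=0.5):
    """Fraction of misclassified samples for one candidate weight vector.

    `position` is a flat vector: all weights followed by a single bias.
    Labels are 1 (diseased) or 0 (normal).
    """
    weights, bias = position[:-1], position[-1]
    wrong = 0
    for features, label in zip(samples, labels):
        predicted = 1 if ann_output(weights, features, bias) >= threshold else 0
        if predicted != label:
            wrong += 1
    return wrong / len(samples)
```

Expressing the classifier as a function of one flat vector is what lets a population-based optimizer treat each candidate position as a complete set of weights.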

GWO is a metaheuristic based on the hierarchy-based hunting characteristics of grey wolves, which belong to the Canidae family [16]. Because the method is modeled on hunting behavior, it adapts dynamically during the search, which helps to address convergence concerns. This improves the ability of the algorithm to achieve better global optimal convergence than other optimization techniques that frequently converge to local optimal solutions, with a better balance between the exploitation and exploration phases. The hunting hierarchy of grey wolves comprises a leading grey wolf (alpha), a representative grey wolf (beta), an obeyer grey wolf (delta), and a scapegoat grey wolf (omega).

As stated previously, hunting is organized by hierarchical levels, with the leading grey wolf at the top of the hierarchy and in charge of decisions on resting, hunting, and other activities. During hunting, the leading grey wolf directs the other grey wolves. The leading grey wolf is not necessarily the physically strongest member of the pack, but it is the best at managing the group. At the second level, the representative grey wolf assists the leading grey wolf by relaying its decisions to the subordinate grey wolves in the search area. The obeyer grey wolves rank directly above the scapegoat grey wolves and mostly follow the orders of the leading grey wolves. The scapegoat grey wolves remain at the bottom of the hierarchy, guarding the entire pack. The GWO algorithm comprises the following phases.

a) Social tracking hierarchy: The search is guided by the grey wolf hierarchy of leading, representative, obeyer, and scapegoat grey wolves, with the leading grey wolf representing the best solution among all grey wolves. The representative and obeyer grey wolves are considered the second- and third-best solutions, respectively, whereas the remaining grey wolves are regarded as scapegoat grey wolves.

b) Surrounding phase: The grey wolf surrounds the prey in this phase based on the needs and convenience of the grey wolf. In the surrounding phase, the grey wolf is modeled as

$$E_{bc}^{s+1} = E_R^{s} - U \cdot C,$$

where $E_{bc}^{s+1}$ indicates the updated position of the $b$th grey wolf along the $c$th coordinate axis, $E_R^{s}$ represents the location of the prey, $C$ represents the distance between the prey and the grey wolf, and $U$ is a coefficient vector. The distance vector is formulated as

$$C = |W_1 \cdot E_R^{s} - E^{s}|,$$

where $E^{s}$ represents the location of the grey wolf at time $s$ and $W_1$ represents a coefficient vector. The coefficient vectors $U$ and $W_1$ are mathematically modeled as

$$U = 2g \cdot m_1 - g, \quad W_1 = 2 m_2,$$

where $g$ progressively decreases from 2 to 0 as the number of iterations increases, and $m_1$ and $m_2$ denote random vectors with components ranging from 0 to 1.
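A single surrounding-phase update could be sketched in Python as follows. The function names `encircle` and `control_parameter` are hypothetical, and the linear decay of g is an assumption: the text states only that g decreases progressively from 2 to 0. Random numbers are drawn independently for each coordinate.

```python
import random


def control_parameter(s, s_max):
    """Value of g at iteration s, falling linearly from 2 to 0 (assumed schedule)."""
    return 2.0 * (1.0 - s / s_max)


def encircle(wolf, prey, g, rng=random):
    """Return the updated wolf position after one surrounding-phase step."""
    new_position = []
    for e_wolf, e_prey in zip(wolf, prey):
        m1, m2 = rng.random(), rng.random()   # random numbers in [0, 1)
        u = 2.0 * g * m1 - g                  # coefficient U, in [-g, g)
        w1 = 2.0 * m2                         # coefficient W1, in [0, 2)
        c = abs(w1 * e_prey - e_wolf)         # distance between prey and wolf
        new_position.append(e_prey - u * c)   # updated coordinate
    return new_position


if __name__ == "__main__":
    rng = random.Random(0)
    print(encircle([5.0, -3.0], [1.0, 2.0], control_parameter(0, 100), rng))
```

When g reaches 0 the coefficient U vanishes, so the wolf lands exactly on the prey position; larger values of g allow steps that overshoot the prey, which is what drives exploration early in the run.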

c) Grey wolf hunting phase: Grey wolves can recognize prey, and the hunting process is carried out with the help of the leader grey wolf, representative grey wolf, and obeyer grey wolf. The corresponding distance vectors are as follows:

$$C_J = |W_1 E_R - E|, \quad C_K = |W_2 E_R - E|, \quad C_L = |W_3 E_R - E|,$$

where $C_J$, $C_K$, and $C_L$ represent the distances associated with the leader, representative, and obeyer grey wolves at the time of hunting, and $W_1$, $W_2$, and $W_3$ are random coefficients that balance the exploration and exploitation phases. The candidate positions are expressed as

$$E_1 = E_J - U_1 C_J, \quad E_2 = E_K - U_2 C_K, \quad E_3 = E_L - U_3 C_L,$$

where $U_1$, $U_2$, and $U_3$ indicate the coefficient vectors, which act as tunable parameters in the optimization process, and $E_J$, $E_K$, and $E_L$ are the best, second-best, and third-best solutions at time $t$. In general, the hunting process among grey wolves is based on the first three best solutions, according to the roles of the leader, representative, and obeyer grey wolves [18]:

$$E_{bc,g}^{s+1} = \frac{E_1 + E_2 + E_3}{3},$$

where $E_1$, $E_2$, and $E_3$ are the positions estimated from the leader, representative, and obeyer grey wolves, respectively, with reference to the three best hunting locations. The hunting is led by the distance between the grey wolf and prey, as well as the tunable coefficient parameters, as shown in the above modeling. As a result,

$$E^{t+1} = \left[\frac{(E_J + E_K + E_L) - (U_1 C_J + U_2 C_K + U_3 C_L)}{3}\right]. \tag{20}$$

Eq. (20) is the update rule of the GWO method, which is based on the constraints imposed by the grey wolf hierarchy-based hunting characteristics. Using Eq. (20), the GWO method improves convergence toward the global best solution.
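For completeness, Eq. (20) follows directly from substituting the three candidate positions into their average and grouping the leader positions and the coefficient-weighted distances. The derivation below uses only the symbols already defined in this section:

```latex
\begin{aligned}
E^{t+1} &= \frac{E_1 + E_2 + E_3}{3} \\
        &= \frac{(E_J - U_1 C_J) + (E_K - U_2 C_K) + (E_L - U_3 C_L)}{3} \\
        &= \frac{(E_J + E_K + E_L) - (U_1 C_J + U_2 C_K + U_3 C_L)}{3}.
\end{aligned}
```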

d) Social dominance stage: The grey wolves only probe the prey inside the search region during this phase and pick the ideal local sites to attack the prey.

e) Exploration stage: The grey wolves, including the leader, representative, obeyer, and scapegoat, investigate the region beyond the current search area in this phase to arrive at the best possible solution. The pseudocode of the GWO algorithm is presented in Table 1.
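The phases described above can be combined into a compact loop, sketched here in Python under several assumptions: the function name `grey_wolf_optimize`, the population size, the bounds, the clipping of positions to those bounds, and the linear decay of g are illustrative choices rather than settings reported in the paper. As in the standard GWO formulation, the distance for each candidate is measured to the corresponding leader position, which stands in for the unknown prey location.

```python
import random


def grey_wolf_optimize(fitness, dim, n_wolves=20, s_max=100,
                       lower=-1.0, upper=1.0, seed=0):
    """Minimize `fitness` over `dim` variables with a basic GWO loop."""
    if n_wolves < 3:
        raise ValueError("at least three wolves are needed for the three leaders")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best, best_score = None, float("inf")

    for s in range(s_max):
        # Rank the pack: the three best wolves act as E_J, E_K, and E_L.
        ranked = sorted(wolves, key=fitness)
        leaders = [list(w) for w in ranked[:3]]
        score = fitness(leaders[0])
        if score < best_score:
            best, best_score = list(leaders[0]), score

        g = 2.0 * (1.0 - s / s_max)  # decreases from 2 towards 0 (linear, assumed)
        for wolf in wolves:
            for c in range(dim):
                candidates = []
                for leader in leaders:
                    u = 2.0 * g * rng.random() - g           # coefficient U
                    w = 2.0 * rng.random()                   # coefficient W
                    dist = abs(w * leader[c] - wolf[c])      # C_J, C_K, C_L
                    candidates.append(leader[c] - u * dist)  # E_1, E_2, E_3
                # Average of the three candidates, clipped to the search bounds.
                wolf[c] = min(upper, max(lower, sum(candidates) / 3.0))

    # Include the pack produced by the final position update.
    final = min(wolves, key=fitness)
    final_score = fitness(final)
    if final_score < best_score:
        best, best_score = list(final), final_score
    return best, best_score


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(grey_wolf_optimize(sphere, dim=3, lower=-5.0, upper=5.0))
```

To tune a classifier, `fitness` would map a wolf position to the classification error of the network whose weights are given by that position, so that minimizing the fitness corresponds to maximizing accuracy.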

This section discusses the outcomes of the GWO-ANN prediction module, and a comparative analysis demonstrates the performance of the GWO-ANN in cardiovascular disease prediction.

5.1 Experimental Setup

The study was conducted using Python, which was installed on a Windows 10 64-bit computer with 16 GB of RAM.

5.2 Database Description

a) Heart disease database: The heart disease database was used as the dataset in the proposed cardiovascular disease prediction module. Of the 76 attributes in this database, only 14 are frequently assessed for research purposes. Using these data, an experimental study was conducted to determine the presence or absence of cardiac disorders [17].

b) Statlog + Cleveland + Hungary database: This dataset was created by integrating several datasets that were previously accessible individually but had never been merged. The Heart Disease UCI and Statlog + Cleveland + Hungary datasets, both available on Kaggle, are essential resources for predicting heart disease using ML. The Heart Disease UCI dataset contains 303 instances with 14 attributes, including age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, thalassemia, and a target variable indicating the presence of heart disease. This dataset allows for comprehensive analysis and feature engineering, making it useful for developing predictive models. The Statlog + Cleveland + Hungary dataset combines data from three major heart disease studies, and offers a robust dataset with 918 instances and the same 14 attributes. Merging records from the Cleveland, Statlog, and Hungary datasets provides a more extensive and diverse set of patient information, enhancing the generalizability and robustness of heart disease prediction models. This dataset is particularly valuable for analyzing trends and creating models that can predict heart disease across different populations.

These databases were integrated based on 11 common attributes, resulting in one of the largest publicly available heart disease datasets [18].

5.3 Evaluation Metrics

a) Accuracy: In medical data classification, the accuracy is the proportion of all instances that the classifier labels correctly.

b) Sensitivity: The sensitivity is the proportion of actual positives that are correctly detected by the classifier.

c) Specificity: The specificity is the proportion of actual negatives that are correctly detected by the classifier.

d) F1-score: The F1-score is the harmonic mean of the precision and recall (sensitivity), and summarizes the performance of the model on the positive class.
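The four metrics can be computed from the counts of a binary confusion matrix, as in the following Python sketch. The function name `classification_metrics` and the example counts are illustrative assumptions; the sketch also assumes that every denominator is nonzero, that is, that both classes occur and at least one positive prediction is made.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and F1-score from confusion-matrix counts.

    tp/fn: diseased cases predicted as diseased/normal.
    tn/fp: normal cases predicted as normal/diseased.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)      # recall, or true-positive rate
    specificity = tn / (tn + fp)      # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the experiments.
    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

Because the F1-score is a harmonic mean, it is pulled toward the smaller of precision and recall, so it cannot exceed either the larger of the two or their arithmetic mean.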

5.4 Comparative Methods

To validate the efficacy of the proposed technique, methods including the multilayer perceptron (MLP) classifier, SVM classifier, random forest (RF), AdaBoost classifier [19–22], and ANN classifier [8] were compared with the proposed GWO-ANN classifier in terms of the accuracy, sensitivity, specificity, and F1-score.

5.5 Comparative Analysis

This section presents a comparative evaluation of the approaches used in the classification of heart disease using conventional techniques and the proposed model (Table 2).

a) Analysis based on heart disease database

Figure 3 presents a comparative analysis of these methods using the heart disease database. Figure 3(a) presents a comparative evaluation of the methods in terms of the accuracy. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN methods for the training percentage of 70% were 80.898%, 81.3107%, 90.4011%, 84.8175%, 90.8367%, and 92.4979%, respectively. Figure 3(b) presents a comparative evaluation in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 77.7198%, 82.9382%, 90.3623%, 89.4911%, 90.7979%, and 94.3203%, respectively.

Figure 3(c) presents a comparative evaluation in terms of the specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 71.3489%, 75.6583%, 86.9725%, 84.8229%, 88.8337%, and 90.4949%, respectively. Figure 3(d) shows the comparative evaluation in terms of the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 81.0367%, 82.401%, 90.7993%, 87.5715%, 92.1061%, and 95.6285%, respectively.

b) Evaluation based on Statlog + Cleveland + Hungary database

Figure 4 shows a comparative evaluation using the Statlog + Cleveland + Hungary database. Figure 4(a) shows a comparative evaluation of the accuracy of the methods. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 54.0108%, 77.8053%, 89.3536%, 87.9337%, 91.5205%, and 92.8122%, respectively. Figure 4(b) presents a comparative evaluation of the methods in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 70.4158%, 75.9977%, 92.2176%, 91.0989%, 93.3803%, and 94.6347%, respectively.

Figure 4(c) presents a comparative evaluation of the methods in terms of their specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 52.0078%, 81.8617%, 86.7021%, 85.242%, 89.5175%, and 90.8093%, respectively. Figure 4(d) presents a comparative evaluation based on the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 57.1414%, 75.6547%, 92.1683%, 90.7796%, 94.6511%, and 95.9429%, respectively.

The proposed method attained the highest accuracy of 92.50% for the heart disease dataset and 92.92% for the Statlog + Cleveland + Hungary database (Table 2). For the latter database, the sensitivity, specificity, and F1-score of the proposed method were 94.75%, 90.92%, and 96.06%, respectively. These results indicate that the proposed model enables enhanced heart disease classification, with better measures of accuracy, sensitivity, specificity, and F1-score than the compared methods.

This study has presented an ML-based heart disease prediction module for the diagnosis of cardiovascular diseases by analyzing heart disease data. The heart disease data were initially preprocessed to eliminate artifacts. Significant statistical features such as the mean, standard deviation, variance, kurtosis, and skewness were then extracted from the data, through which a feature vector was developed that acted as the input to the proposed ANN classifier. The performance of the ANN was enhanced by optimally tuning the weights using the GWO algorithm, which is derived by considering the hierarchy-based hunting characteristic features of grey wolves. The efficiency of the proposed model was verified using performance measures including the accuracy, sensitivity, specificity, and F1-score, with results of 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. In future, the accuracy of the model will be further enhanced by optimally tuning the ANN classifier using hybrid metaheuristic optimization algorithms. Furthermore, this method can be tested for real-time applications in the medical field.

Fig. 1. Block diagram of proposed heart disease detection module.

Fig. 2. Structure of neural network.

Fig. 3. Comparative analysis based on heart disease database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.

Fig. 4. Comparative evaluation based on Statlog + Cleveland + Hungary database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.


Table 1. Pseudocode of GWO algorithm.

| Sl. no. | Pseudocode of GWO optimization algorithm |
|---|---|
| 1 | Initialize grey wolf population and coefficients |
| 2 | Calculate fitness value for grey wolves |
| 3 | For all grey wolves |
| 4 | { |
| 5 | $E_J$ as best grey wolf solution |
| 6 | $E_K$ as second-best grey wolf solution |
| 7 | $E_L$ as third-best grey wolf solution |
| 8 | } |
| 9 | EndFor |
| 10 | While $s < s_{max}$ |
| 11 | { |
| 12 | Update the grey wolf position as per Eq. (20) |
| 13 | Update the coefficients, C, m |
| 14 | Evaluate fitness for all grey wolves |
| 15 | Update the solutions $E_J$, $E_K$, $E_L$ |
| 16 | s = s + 1 |
| 17 | } |
| 18 | EndWhile |
| 19 | Return |

Table 2. Comparison of heart disease classification techniques.

| S. no. | Dataset | Metric | MLP | SVM | RF | AdaBoost | ANN | Proposed GWO-ANN |
|---|---|---|---|---|---|---|---|---|
| 1 | Heart disease dataset | Accuracy (%) | 81.19 | 85.01 | 90.52 | 89.09 | 91.83 | 92.50 |
| | | Sensitivity (%) | 85.48 | 90.61 | 92.34 | 91.23 | 93.65 | 94.32 |
| | | Specificity (%) | 76.30 | 78.61 | 88.52 | 87.09 | 90.49 | 90.49 |
| | | F1-score (%) | 85.06 | 88.65 | 93.65 | 92.22 | 94.96 | 95.63 |
| 2 | Statlog + Cleveland + Hungary database | Accuracy (%) | 85.65 | 86.93 | 90.36 | 89.22 | 91.63 | 92.92 |
| | | Sensitivity (%) | 85.81 | 88.45 | 92.33 | 91.21 | 93.46 | 94.75 |
| | | Specificity (%) | 83.66 | 86.20 | 88.35 | 87.22 | 89.63 | 90.92 |
| | | F1-score (%) | 88.37 | 89.70 | 93.49 | 92.35 | 94.76 | 96.06 |

  1. Mohan, S, Thirumalai, C, and Srivastava, G (2019). Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 7, 81542-81554. https://doi.org/10.1109/ACCESS.2019.2923707
  2. Said, IU, Adam, AH, and Garko, AB (2015). Association rule mining on medical data to predict heart disease. International Journal of Science Technology and Management. 4, 26-35.
  3. Chauhan, A, Jain, A, Sharma, P, and Deep, V. Heart disease prediction using evolutionary rule learning., Proceedings of 2018 4th International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, India, 2018, pp.1-4. https://doi.org/10.1109/CIACT.2018.8480271
  4. Valarmathi, R, and Sheela, T (2021). Heart disease prediction using hyper parameter optimization (HPO) tuning. Biomedical Signal Processing and Control. 70. article no 103033. https://doi.org/10.1016/j.bspc.2021.103033
  5. Alotaibi, FS (2019). Implementation of machine learning model to predict heart failure disease. International Journal of Advanced Computer Science and Applications. 10, 261-268. https://doi.org/10.14569/IJACSA.2019.0100637
  6. Motwani, M, Dey, D, Berman, DS, Germano, G, Achenbach, S, and Al-Mallah, MH (2017). Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. European Heart Journal. 38, 500-507. https://doi.org/10.1093/eurheartj/ehw188
    CrossRef
  7. Condie, T, Mineiro, P, Polyzotis, N, and Weimer, M . Machine learning for big data., Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 2013, pp.1242-1244. https://doi.org/10.1109/ICDE.2013.6544913
    CrossRef
  8. Pan, Y, Fu, M, Cheng, B, Tao, X, and Guo, J (2020). Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access. 8, 189503-189512. https://doi.org/10.1109/ACCESS.2020.3026214
    CrossRef
  9. Abdel-Basset, M, Gamal, A, Manogaran, G, Son, LH, and Long, HV (2020). A novel group decision making model based on neutrosophic sets for heart disease diagnosis. Multimedia Tools and Applications. 79, 9977-10002. https://doi.org/10.1007/s11042-019-07742-7
    CrossRef
  10. Thiyagaraj, M, and Suseendran, G. (2020) . Enhanced prediction of heart disease using particle swarm optimization and rough sets with transductive support vector machines classifier. Data Management, Analytics and Innovation, 141-152. https://doi.org/10.1007/978-981-13-9364-811
    CrossRef
  11. Shah, SMS, Shah, FA, Hussain, SA, and Batool, S (2020). Support vector machines-based heart disease diagnosis using feature subset, wrapping selection and extraction methods. Computers & Electrical Engineering. 84. article no 106628. https://doi.org/10.1016/j.compeleceng.2020.106628
    CrossRef
  12. Magesh, G, and Swarnalatha, P (2021). Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evolutionary Intelligence. 14, 583-593. https://doi.org/10.1007/s12065-019-00336-0
    CrossRef
  13. Nilashi, M, Ahmadi, H, Manaf, AA, Rashid, TA, Samad, S, Shahmoradi, L, Aljojo, N, and Akbari, E (2020). Coronary heart disease diagnosis through self-organizing map and fuzzy support vector machine with incremental updates. International Journal of Fuzzy Systems. 22, 1376-1388. https://doi.org/10.1007/s40815-020-00828-7
    CrossRef
  14. Bharti, B, and Singh, SN . Analytical study of heart disease prediction comparing with different algorithms., International Conference on Computing, Communication & Automation, Greater Noida, India, 2015, pp.78-82. https://doi.org/10.1109/CCAA.2015.7148347
    CrossRef
  15. Akgul, M, Sonmez, OE, and Ozcan, T (2019). Diagnosis of heart disease using an intelligent method: a hybrid ANN–GA approach. Intelligent and Fuzzy Systems in Big Data Analytics and Decision Making. Cham, Switzerland: Springer, pp. 1250-1257 https://doi.org/10.1007/978-3-030-23756-1147
    CrossRef
  16. Mirjalili, S, Saremi, S, Mirjalili, SM, and Coelho, LDS (2016). Multi-objective grey wolf optimizer: a novel algorithm for multi-criterion optimization. Expert Systems with Applications. 47, 106-119. https://doi.org/10.1016/j.eswa.2015.10.039
    CrossRef
  17. Kaggle, (2020). UCI Heart Disease Data. Available: https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data
  18. Kaggle, (2019). Heart disease dataset (comprehensive): statlog + cleveland + hungary dataset. Available: https://www.kaggle.com/datasets/sid321axn/heartstatlog-cleveland-hungary-final
  19. Deepika, D, and Balaji, N (2022). Effective heart disease prediction using novel MLP-EBMDA approach. Biomedical Signal Processing and Control. 72. article no 103318. https://doi.org/10.1016/j.bspc.2021.103318
    CrossRef
  20. Mythili, T, Mukherji, D, Padalia, N, and Naidu, A (2013). A heart disease prediction model using SVM-decision treeslogistic regression (SDL). International Journal of Computer Applications. 68, 11-15. https://doi.org/10.5120/11662-7250
    CrossRef
  21. Jabbar, MA, Deekshatulu, BL, and Chandra, P (2016). Intelligent heart disease prediction system using random forest and evolutionary approach. Journal of Network and Innovative Computing. 4, 175-184.
  22. Rani, P, Kumar, R, Ahmed, NMS, and Jain, A (2021). A decision support system for heart disease prediction based upon machine learning. Journal of Reliable Intelligent Environments. 7, 263-275. https://doi.org/10.1007/s40860-021-00133-6
    CrossRef

Amirthalakshmi Thirumalai Maadapoosi is currently working as an associate professor in the Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India. She completed her B.E. in electronics and communication engineering at Pallavan College of Engineering, affiliated with Anna University. She received her M.Tech. in VLSI design from Sathyabama Institute of Science and Technology, Chennai, and obtained her Ph.D. degree from the same institute in 2019. She has published 29 papers in various reputed international journals and has 8 published patents. Her research interests include VLSI design, low power VLSI design, image processing, fuzzy logic, and microelectronics.

V. Balamurugan is currently working as a professor in the Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai, India. He received his B.E. degree in electronics and communication engineering from Periyar University in 2004, his M.Tech. degree in VLSI design from Sathyabama University in 2006, and his Ph.D. from Sathyabama University in 2018. He has 16 years of teaching experience in various engineering colleges. He has published more than 35 papers in various journals and conferences. His research interests include VLSI design, digital signal and image processing, and nanoelectronics.

V. Vedanarayanan is an associate professor in Electronics and Communication Engineering at Sathyabama Institute of Science and Technology, Chennai, and has 20 years of teaching experience. He completed a B.E. in electrical and electronics engineering at Madras University and an M.E. in electronics and controls engineering at Sathyabama University, Chennai. He received his doctorate in medical image processing from Sathyabama Institute of Science and Technology (Deemed to be University), Chennai. He has published around 40 research papers in Scopus-indexed refereed international journals and nearly 20 in Web of Science-indexed journals. His areas of research include medical image analysis, computational intelligence, and sustainable and green systems. Dr. Vedanarayanan is also an active member of IEEE, IANG, etc. He has served as a resource person, handling various hands-on training programs and workshops for faculty and students at other premier institutions. He was also selected for the IEEE EMBS Worldwide Student Mentor Program from June to December 2023. As a student mentor, he has guided students in multiple competitions, hackathons, and project exhibitions, and has received appreciation certificates from Indian government agencies such as MSME and MHRD.

Sahaya Anselin Nisha received her B.E. degree in electronics and communication engineering in April 2002, her M.E. degree in applied electronics with Gold Medal from Sathyabama University, India, in 2006, and her Ph.D. from Sathyabama University in 2013. She has 17 years of teaching experience in various engineering colleges. She is currently working as a professor in the Department of Electronics and Communication Engineering at Sathyabama University, Chennai, India. She has published more than 35 papers in various journals and conferences. Her research interests include wireless communication, UWB communication systems, multiple antenna systems, RF circuit design, and microwave communication. She is a life fellow of the Institution of Electronics and Telecommunication Engineers (IETE), India, the Antenna Test and Measurement Society (ATMS), and the Institute of Electrical and Electronics Engineers (IEEE).

R. Narmadha began her career modestly and went on to be awarded a doctorate degree by Sathyabama University. At present she is working as an associate professor in the Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai. She has 21 years of teaching and research experience in the field of wireless communication, particularly network security. She has completed a seed money project from Sathyabama Institute. She has published more than 35 papers in international and national peer-reviewed journals and conferences, and has written a book chapter in the field of communication networks. She received an award of excellence for “Mid carrier research service in engineering” from SMU University, Singapore. She is also an active MIE (Corporate Member) of the Institution of Engineers (India) and a life member of the Indian Society for Technical Education (ISTE). Her research interests include signal and image processing, data science, and IoT.

International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(3): 231-241

Published online September 25, 2024 https://doi.org/10.5391/IJFIS.2024.24.3.231

Copyright © The Korean Institute of Intelligent Systems.

Grey Wolf Optimization-Based Artificial Neural Network in the Development of an Automated Heart Disease Prediction Model

Amirthalakshmi Thirumalai Maadapoosi1, Velan Balamurugan2, V. Vedanarayanan2, Sahaya Anselin Nisha2, and R. Narmadha2

1Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, India
2Department of Electronics and Communication Engineering, Sathyabama Institute of Science and Technology, Chennai, India

Correspondence to:Amirthalakshmi Thirumalai Maadapoosi (amirthatm@gmail.com)

Received: May 12, 2022; Revised: January 27, 2023; Accepted: June 10, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Heart disease is currently one of the leading causes of death worldwide. Predicting cardiac diseases is a significant challenge in clinical data analysis. Machine learning is useful for generating judgments and predictions based on the significant amount of data generated by healthcare businesses. Using patient heart disease data, this research presents a heart disease prediction module using the grey wolf optimization (GWO)-tuned artificial neural network (ANN) classifier. The effectiveness of the proposed module relies on the optimal tuning of the weights of the ANN classifier using the GWO algorithm, which involves the hierarchy-based prey-hunting features of grey wolves. This enhances the prediction accuracy of the proposed model. The efficiency of the proposed model was analyzed in terms of performance indices such as accuracy, sensitivity, specificity, and F1-score, which were determined to be 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. The experimental results obtained using the proposed GWO-ANN module validated the efficiency of the proposed strategy compared with conventional models.

Keywords: Grey wolf optimization, Heart disease, Machine learning, Artificial neural network

1. Introduction

By 2030, the number of deaths due to cardiovascular disease is expected to increase to 23.3 million [1]. The veins of the heart transport oxygen; if these vessels become blocked or restricted, heart disease or stroke may occur. According to the World Health Organization, approximately 12 million people are affected by heart disease each year, and 80% of those affected die as a result of this condition [2]. The key variables that affect the heart include high cholesterol, high blood pressure, tension, stress, alcohol use, a sedentary lifestyle, obesity, and diabetes. These characteristics can aid in the detection of cardiac disease. Higher blood pressure causes thickening of the arterial walls, resulting in obstruction and an increased mortality rate [3]. The early detection and treatment of cardiac disease can help to prevent and lower patient death rates [4]. Several researchers in the medical field have used various supervised and unsupervised machine learning (ML) algorithms to diagnose and predict cardiac disease. ML approaches aid in the extraction of usable knowledge from large datasets [5].

These algorithms have found widespread use in healthcare, speech recognition, cosmology, and education. Several techniques have been proposed for detecting various patterns in huge datasets [6]. ML is widely used in the healthcare industry to diagnose diseases and make informed decisions, which aids in the classification of patients with significant risk factors. Healthcare businesses collect massive volumes of data, and ML provides several models for fast training and data analysis [7]. During training on a dataset, these algorithms search a large space for optimal solutions. The performance of the models can be assessed using measures such as accuracy, sensitivity, specificity, precision, and F1-score. Tuning the hyperparameters identifies the optimum settings for an ML model, which enhances its performance, yields more robust models, and helps to prevent overfitting or underfitting [4, 8].

This study presents a unique optimization-controlled ML model for cardiovascular disease identification based on patient heart disease data. An artificial neural network (ANN) classifier is used to classify cardiac disease, and the weights of the ANN classifier are optimized using the grey wolf optimization (GWO) method. To make the classification process effective, significant statistical characteristics such as mean, variance, kurtosis, standard deviation, and skewness are retrieved from the input data. Furthermore, the findings of this study reveal a complete description of the proposed model, allowing the efficacy of the proposed categorization model to be confirmed. In addition, a comprehensive analysis is presented based on the performance of traditional classification techniques to assess the benefits of the proposed heart disease strategy.

The major contributions of this research are summarized as follows:

  • A novel artificial intelligence (AI)-based heart disease prediction strategy is designed and developed for advanced healthcare systems.

  • The GWO algorithm is used to enhance the prediction accuracy of the ANN classifier by optimally tuning the weights and parameters.

  • The performance of the ANN model is validated by comparing the proposed and existing methods for the heart disease prediction module.

The remainder of this paper is organized as follows: Section 2 outlines modern heart disease categorization methodologies, as well as the issues that they face. The proposed GWO-based ANN model for heart disease classification is described in Section 3. The outputs of the proposed model are discussed in Section 4, and the study is concluded in Section 5.

2. Literature Survey

The existing techniques for heart disease categorization and the problems associated with the models are discussed in this section, serving as inspiration for the creation of the proposed model.

2.1 Related Works

This section reviews existing methods. Abdel-Basset et al. [9] developed an Internet of Things (IoT)-based approach for identifying and monitoring cardiac patients. The purpose of this healthcare model is to increase diagnostic precision; accurate output from the model could reduce mortality and treatment costs. Thiyagaraj and Suseendran [10] combined particle swarm optimization (PSO) and rough sets (RS) with transductive support vector machines (TSVM) for heart disease detection. This strategy reduces data redundancy and thereby improves data integrity. The data are normalized using the z-score method. PSO is then used to select the best subset of characteristics, reducing the computing cost and improving the prediction performance. Finally, a radial basis function TSVM classifier is used for heart disease prediction. Shah et al. [11] developed an automated diagnostic approach for cardiac disease. Using the advantages of feature selection and extraction models, this technique assesses relevant feature subsets. Magesh and Swarnalatha [12] developed a model for detecting heart disease using Cleveland cardiac samples. Cluster-based decision tree (DT) learning is used to identify cardiac problems. The intended label distribution is used to split the original set, and the potential class is determined using the elevated distribution samples. Entropy is employed to determine the characteristics of each class set for heart disease diagnosis. Nilashi et al. [13] used ML algorithms to develop a prediction approach for heart disease detection. This system combines unsupervised and supervised learning to identify cardiac problems. For missing value imputation, the approach uses a self-organizing map, fuzzy SVM, and principal component analysis (PCA). Furthermore, incremental PCA and the fuzzy SVM are designed for incremental data learning to reduce the computation time in disease prediction.

2.2 Problem Statement

The challenges associated with the existing methods for heart disease classification are as follows:

  • The problem with heart disease data is that they require feature selection, which is complicated by the difference in samples and the absence of feature magnitudes. Furthermore, the explanation of cardiac echo images is difficult because of the lack of defined criteria for accurately inferring the data [1].

  • Large amounts of original data have been created during the digital age. Gathering useful data is a significant challenge and is an ideal area for exploration. Several ambiguities are present in the information systems that are used for disease diagnosis, making them difficult to handle [4].

  • For the diagnosis of cardiac disease, patient data include test results, video clips, and demographic information. Mining valuable information from large amounts of data is time consuming [9].

2.3 Motivation

Heart disease is the most common cause of death worldwide. India is in the midst of an industrial revolution that has resulted in lifestyle changes that contribute to heart disease. Despite the information contained in medical data, clinicians diagnose diseases based on their experience and individual opinions, which can lead to inaccurate diagnoses and high diagnostic expenses, thereby affecting the quality of services offered by hospitals to patients [14]. Medical data mining aims to discover useful information for accurate diagnoses. Disease therapy is difficult and time consuming in healthcare settings. The identification of cardiac diseases resulting from several circumstances is considered a multi-layered problem. Hence, it is necessary to develop an automated model for the accurate prediction of heart disease.

3. Proposed Heart Disease Prediction Method using GWO-tuned ANN Classifier

This study presents a method for predicting heart disease using an ML model and patient data. The data are analyzed so that the cardiac problems of individuals can be anticipated automatically. Appropriate tuning of the ANN parameters aids the system in improving its effectiveness in predicting cardiac diseases. Figure 1 provides a diagrammatic representation of the proposed model for heart disease prediction. ANNs are computational models inspired by the neural architecture of the human brain, and consist of interconnected layers of nodes or neurons. Each connection has an associated weight, and ANNs can learn complex patterns and relationships in the data through a training process that typically uses algorithms such as backpropagation. They are widely used for tasks such as classification, regression, and pattern recognition owing to their ability to model nonlinear relationships and handle high-dimensional data. Integrating ANNs with optimization algorithms such as the GWO algorithm can enhance their performance. GWO is a nature-inspired metaheuristic that mimics the social hierarchy and hunting behavior of grey wolves. It involves four types of wolves–alpha, beta, delta, and omega–representing the hierarchy within the pack. The algorithm iteratively updates the positions of the wolves based on the best solutions found thus far, guided by the alpha, beta, and delta wolves. Combining ANNs with GWO leverages the strengths of both, using GWO to optimize the weights and biases of the ANN, thereby improving the ability of the network to find optimal or near-optimal solutions to various complex problems. This hybrid approach enhances the learning capabilities and overall performance of the ANN, making it effective for a wide range of applications.

3.1 Input Heart Disease Data

Consider the input heart disease medical dataset I with various attributes, which is expressed as

$$I = \{I_{A,B}\}; \quad (1 \le A \le p); \quad (1 \le B \le q),$$

where $I_{A,B}$ denotes the $B$th attribute of the $A$th data record, $p$ specifies the total number of data records, and $q$ is the total number of attributes in each record. The dimension of the database is indicated as [p × q].

3.2 Pre-processing of Heart Disease Data

The purpose of pre-processing is to make the input data processing more efficient. Pre-processing is an important phase that prepares the input data for the prediction process. In this step, medical records are converted into diagnostic values as part of the data pre-processing.

3.3 Feature Extraction Process

The features are obtained from the heart disease data once the pre-processing stage is completed. Statistical features, such as mean, standard deviation, variance, kurtosis, and skewness are retrieved from the pre-processed data. With the aid of statistical features, enormous amounts of data can be analyzed to find common patterns and trends, and then transformed into useful information. The extracted features are described in detail below.

a) Mean: The mean is the average of all data values, that is, their sum divided by the number of values, and is expressed as

$$f_M = \frac{1}{Q} \sum_{a=1}^{Q} K_a,$$

where $Q$ is the total number of data points and $K_a$ denotes the $a$th data point.

b) Variance: The variance is the average of the squared deviations of the individual data points from the mean, and is represented as

$$f_V = \frac{1}{Q-1} \sum_{a=1}^{Q} (K_a - f_M)^2.$$

The variance measure tracks minute deviations in data over time, which helps to enhance the diagnosis process considerably.

c) Standard deviation: The standard deviation is the square root of the variance, and measures how widely the heart disease data are scattered about the mean. The standard deviation equation is as follows:

$$f_{SD} = \sqrt{\frac{1}{Q-1} \sum_{a=1}^{Q} (K_a - f_M)^2}.$$

d) Skewness: The skewness measures the asymmetry of a distribution and is computed from the third central moment. A perfectly symmetric dataset has a skewness of zero; the normal distribution, being symmetric, also has zero skewness. The formula for skewness is as follows:

$$f_S = \frac{\sum_{a=1}^{Q} (K_a - f_M)^3}{Q \, f_{SD}^{3}}.$$

e) Kurtosis: The kurtosis measures the weight of the tails of a distribution relative to its center. With the formula below, a Gaussian distribution yields a value of approximately 3, so that its excess kurtosis (the value minus 3) is zero. The kurtosis formula is expressed as follows:

$$f_K = \frac{\sum_{a=1}^{Q} (K_a - f_M)^4}{Q \, f_{SD}^{4}},$$

where $f_{SD}$ is the standard deviation, $f_M$ is the mean, and $K_a$ indicates the $a$th measure of $K$.

3.4 Establishment of the Feature Vector

The feature vector is created using the input data, which include all vital statistics of the patient. These features are combined to form a feature vector, which is expressed as

$$f_{vector} = \{f_M, f_V, f_{SD}, f_K, f_S\}.$$

The dimension of the feature vector is [1×5], and the feature vector is used as the input by the ANN to categorize the input as normal or abnormal.
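The five statistical features and the resulting [1×5] vector can be sketched in a few lines of NumPy. This is a minimal illustration that follows the formulas of Section 3.3 literally (divisor Q−1 for the variance, divisor Q for skewness and kurtosis); the function name `statistical_features` and the sample record are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def statistical_features(k):
    """Return [f_M, f_V, f_SD, f_K, f_S] for one 1-D record (hypothetical helper)."""
    k = np.asarray(k, dtype=float)
    q = k.size
    if q < 2:
        raise ValueError("at least two values are needed for the variance")
    f_m = k.sum() / q                          # mean
    dev = k - f_m                              # deviations from the mean
    f_v = (dev ** 2).sum() / (q - 1)           # variance with divisor Q - 1
    f_sd = np.sqrt(f_v)                        # standard deviation
    if f_sd == 0:
        raise ValueError("constant record: skewness and kurtosis are undefined")
    f_s = (dev ** 3).sum() / (q * f_sd ** 3)   # skewness
    f_k = (dev ** 4).sum() / (q * f_sd ** 4)   # kurtosis (not excess kurtosis)
    # Same ordering as the feature vector in the text: mean, variance, SD, kurtosis, skewness
    return np.array([f_m, f_v, f_sd, f_k, f_s])

# Made-up record, used only to show the call
record = [1.0, 2.0, 3.0, 4.0, 5.0]
features = statistical_features(record)
print(features.shape)  # (5,)
```

Library routines such as `scipy.stats.skew` and `scipy.stats.kurtosis` use slightly different normalizations by default, so their values may differ from this literal transcription.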

3.5 ANN Classifier in Heart Disease Classification

Heart disease can be classified using ML algorithms, which detect conditions without the need for human interaction. The results obtained with ANNs offer many benefits in a range of applications compared with other ML approaches. An ANN is a mathematical model based on a biological neural network composed of a group of artificial neurons that calculate information using a connection mechanism. ANNs are computer models with many interconnections and simple, highly parallel processors. The primary premise behind the development of the ANN is that it is based on human characteristics and has neuron layers for analyzing the data that it receives as input. Multiplying the ANN input by weights [15] represents the data flow. Mathematical functions are used to assess the weights, resulting in the activation function of the neuron. The appropriate weight-tuning strategy of the GWO algorithm allows the output of the ANN to be controlled as desired. The construction of the ANN is illustrated in Figure 2.

The output of the ANN can be mathematically expressed as

$$D_{ANN} = P\left(\sum_{n=1}^{D} G_{en} r_n\right) + b,$$

where $P$ is the activation function, $r_n$ is the $n$th input, $G_{en}$ is the corresponding weight, $D$ is the number of inputs, and $b$ is the bias term. An ANN classifier is used to classify the input as normal or diseased, and the benefit of employing an ANN [15] is that the classifier automatically executes the classification operation without requiring human intervention. The classification accuracy depends on the accurate tuning of the hyperparameters of the ANN classifier by the GWO algorithm.
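To make the output equation concrete, the sketch below evaluates it for a single output neuron and wraps it in an error function that an optimizer such as GWO could minimize. The sigmoid activation, the 0.5 decision threshold, the packing of the parameters as `[weights..., bias]`, and the toy data are all assumptions for illustration; the paper does not specify them. The bias is added outside the activation, as written in the equation above.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation (an assumed choice for P)."""
    return 1.0 / (1.0 + np.exp(-z))

def ann_output(r, g, b, activation=sigmoid):
    """Evaluate D_ANN = P(sum_n G_n * r_n) + b for one output neuron."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    return activation(np.dot(g, r)) + b

def classification_error(params, x, y, threshold=0.5):
    """Fraction of misclassified rows; params is [weights..., bias] (hypothetical layout)."""
    weights, bias = params[:-1], params[-1]
    scores = np.array([ann_output(row, weights, bias) for row in x])
    predicted = (scores >= threshold).astype(int)   # 1 = diseased, 0 = normal
    return float(np.mean(predicted != np.asarray(y)))

# Toy data, invented for the example: two records with two inputs each
x = np.array([[-2.0, 0.0], [2.0, 0.0]])
y = np.array([0, 1])
params = np.array([1.0, 0.0, 0.0])   # two weights followed by the bias
print(classification_error(params, x, y))  # 0.0
```

Expressing the classifier as a function of one flat parameter vector is a common way to let a population-based optimizer treat each candidate weight set as a single position in the search space.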

4. GWO Algorithm in ANN Classifier

The GWO technique is a metaheuristic based on the hierarchy-based hunting characteristics of the Canidae family [16]. The aspects of the optimization method focused on hunting behaviors indicate the dynamic nature of addressing convergence concerns. The GWO algorithm solves this problem by leveraging the dynamic features of the Canidae population. This improves the ability of the algorithm to achieve better global optimal convergence than other optimization techniques that frequently converge to local optimal solutions, with a better balance between the exploitation and exploration phases. Among grey wolves, hunting follows a hierarchy that includes a leading grey wolf, a representative grey wolf, an obeyer grey wolf, and a scapegoat grey wolf.

As stated previously, the hierarchy-based hunting experience of the population is supported by hierarchical levels, with the leading grey wolf at the top of the hierarchy and in charge of devising solutions for resting, hunting, and other activities. During hunting, the leader grey wolf teaches the other grey wolves, despite the fact that the leading grey wolf is not the strongest and does not have any mates within the group, but remains the best grey wolf in the group. At the second-order level, the representative grey wolf directs the leading grey wolf by transferring the decision to the subordinate grey wolves in the search area. The scapegoat grey wolves remain at the bottom of the food chain, guarding the entire force. The obeyer grey wolves lie directly above the scapegoat grey wolf and mostly follow the orders of the leading grey wolves. The proposed GWO algorithm comprises four primary phases, as follows.

a) Social tracking hierarchy: The leading, representative, obeyer, and scapegoat grey wolves are pursued by the grey wolf hierarchy, with the leading grey wolf being the strongest among all grey wolves. The representative and obeyer grey wolves compete for second and third place, respectively, and are considered as the second- and third-best grey wolves, whereas the remaining grey wolves in strength are regarded as scapegoat grey wolves.

b) Surrounding phase: The grey wolf surrounds the prey in this phase based on the needs and convenience of the grey wolf. In the surrounding phase, the grey wolf is modeled as

$$E_{bc}^{s+1} = E_R^{s} - U \cdot C,$$

where $E_{bc}^{s+1}$ indicates the position of the $b$th grey wolf at the $c$th coordinate axis and represents the updated location. Let $E_R^{s}$ represent the location of the prey, $C$ represent the space between the prey and grey wolf, and $U$ be the coefficient vector. The distance vector is formulated as

$$C = \left| W_1 \cdot E_R^{s} - E^{s} \right|,$$

where $E^{s}$ represents the location of the grey wolf at time $s$ and $W_1$ represents the coefficient vector. The coefficient vectors $U$ and $W_1$ are mathematically modeled as

$$U = 2g \cdot m_1 - g, \qquad W_1 = 2 m_2,$$

where $g$ progressively decreases from 2 to 0 as the number of iterations increases, and $m_1$ and $m_2$ denote random vectors ranging from 0 to 1.

c) Grey wolf hunting phase: Grey wolves can recognize prey, and the hunting process is carried out with the help of the leader grey wolf, representative grey wolf, and obeyer grey wolf. The general characteristics of grey wolves are as follows:

$$C_J = \left| W_1 E_R - E \right|, \qquad C_K = \left| W_2 E_R - E \right|, \qquad C_L = \left| W_3 E_R - E \right|,$$

where $C_J$, $C_K$, and $C_L$ represent the current route of the best grey wolf at the time of hunting. $W_1$, $W_2$, and $W_3$ symbolize the random limits of the grey wolf in balancing the exploration and exploitation phases. The distance vectors are expressed as

$$E_1 = E_J - U_1 C_J, \qquad E_2 = E_K - U_2 C_K, \qquad E_3 = E_L - U_3 C_L,$$

where $U_1$, $U_2$, and $U_3$ indicate the coefficient vectors and $E_J$ is the best solution at time $t$. $U_1$, $U_2$, and $U_3$ represent the tunable parameters in the optimization process. In general, the hunting process among grey wolves is based on the first three best solutions, according to the roles of the leader, representative, and obeyer grey wolves [18].

$$E_{bc,g}^{s+1} = \frac{E_1 + E_2 + E_3}{3},$$

where $E_1$ indicates the position of the leader grey wolf, $E_2$ is the position of the representative grey wolf, and $E_3$ is the position of the obeyer grey wolf of the optimization procedure, with reference to the three best hunting locations. The hunting is led by the space between the grey wolf and prey, as well as the tunable coefficient parameters, as shown in the above modeling. As a result,

$$E^{t+1} = \left[ \frac{(E_J + E_K + E_L) - (U_1 C_J + U_2 C_K + U_3 C_L)}{3} \right]. \tag{20}$$

Eq. (20) is the updated formulation of the GWO optimization method, which is based on the constraints imposed by the grey wolf hierarchy-based hunting characteristics. Using Eq. (20), the GWO optimization method improves the accomplishment of the global best solution.

d) Social dominance stage: The grey wolves only probe the prey inside the search region during this phase and pick the ideal local sites to attack the prey.

e) Exploration stage: The grey wolves, including the leader, representative, obeyer, and scapegoat, investigate the environment outside the resolution space in this phase and arrive at the best possible conclusion. The pseudocode of the GWO algorithm is presented in Table 1.
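The phases described in this section can be sketched as a short optimization loop. The code below is a hedged illustration rather than the authors' implementation: the function name `gwo_minimize`, the population size, the iteration count, the search bounds, and the sphere test function are assumptions. Because the prey location is unknown in practice, the sketch follows the common GWO convention of measuring each distance against the corresponding leading wolf (leader, representative, obeyer) and then averages the three candidate positions as in Eq. (20).

```python
import numpy as np

def gwo_minimize(objective, dim, n_wolves=20, n_iter=100,
                 lower=-1.0, upper=1.0, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic GWO loop (sketch)."""
    if n_wolves < 3:
        raise ValueError("need at least three wolves (leader, representative, obeyer)")
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    best_pos, best_fit = wolves[0].copy(), np.inf
    for s in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_fit:           # remember the best solution seen so far
            best_fit = float(fitness[order[0]])
            best_pos = wolves[order[0]].copy()
        # Three best wolves: leader, representative, obeyer
        leaders = [wolves[i].copy() for i in order[:3]]
        g = 2.0 * (1.0 - s / n_iter)               # decreases from 2 toward 0
        for b in range(n_wolves):
            moves = []
            for lead in leaders:
                u = 2.0 * g * rng.random(dim) - g  # U = 2 g m1 - g
                w = 2.0 * rng.random(dim)          # W = 2 m2
                c = np.abs(w * lead - wolves[b])   # distance to this leading wolf
                moves.append(lead - u * c)         # candidate position E_1, E_2 or E_3
            # Average of the three candidates, clipped to the search bounds
            wolves[b] = np.clip(np.mean(moves, axis=0), lower, upper)
    # Evaluate the final positions once more
    fitness = np.array([objective(w) for w in wolves])
    i = int(np.argmin(fitness))
    if fitness[i] < best_fit:
        best_fit, best_pos = float(fitness[i]), wolves[i].copy()
    return best_pos, best_fit

# Toy objective (sphere function), used only to exercise the loop
sphere = lambda w: float(np.sum(w ** 2))
best_pos, best_fit = gwo_minimize(sphere, dim=5)
print(best_fit)
```

For tuning an ANN, the objective would instead map a wolf position to a set of network weights and return the classification error on the training data, so that each wolf represents one candidate weight configuration.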

5. Results and Discussion

This section discusses the outcomes of the GWO-ANN prediction module, and a comparative analysis demonstrates the performance of the GWO-ANN in cardiovascular disease prediction.

5.1 Experimental Setup

The study was conducted using Python, which was installed on a Windows 10 64-bit computer with 16 GB of RAM.

5.2 Database Description

a) Heart disease database: The heart disease database was used as the dataset in the proposed cardiovascular disease prediction module. Of the 76 attributes in this database, only 14 are frequently assessed for research purposes. Using these data, an experimental study was conducted to determine the presence or absence of cardiac disorders [17].

b) Statlog + Cleveland + Hungary database: This dataset was created by integrating several datasets that were previously accessible individually but had never been merged. The Heart Disease UCI and Statlog + Cleveland + Hungary datasets, both available on Kaggle, are essential resources for predicting heart disease using ML. The Heart Disease UCI dataset contains 303 instances with 14 attributes, including age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, thalassemia, and a target variable indicating the presence of heart disease. This dataset allows for comprehensive analysis and feature engineering, making it useful for developing predictive models. The Statlog + Cleveland + Hungary dataset combines data from three major heart disease studies, and offers a robust dataset with 918 instances and the same 14 attributes. Merging records from the Cleveland, Statlog, and Hungary datasets provides a more extensive and diverse set of patient information, enhancing the generalizability and robustness of heart disease prediction models. This dataset is particularly valuable for analyzing trends and creating models that can predict heart disease across different populations.

These databases were integrated on the basis of 11 common attributes, resulting in one of the largest publicly available heart disease datasets [18].
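The integration on common attributes described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the Kaggle dataset: the function name `merge_on_common_attributes`, the column names, and the sample records are all hypothetical, and only the columns shared by every source are retained.

```python
def merge_on_common_attributes(*sources):
    """Concatenate record lists, keeping only the attributes shared by all records.

    Each source is a list of dicts (one dict per patient record).
    Returns the ordered list of shared attributes and the merged records.
    """
    records = [record for source in sources for record in source]
    if not records:
        return [], []
    # Attributes present in every record of every source.
    common = set(records[0])
    for record in records[1:]:
        common &= set(record)
    # Preserve the column order of the first record for readability.
    ordered = [name for name in records[0] if name in common]
    merged = [{name: record[name] for name in ordered} for record in records]
    return ordered, merged


# Hypothetical toy records; real sources would be read from the dataset files.
cleveland = [
    {"age": 63, "sex": 1, "resting_bp": 145, "cholesterol": 233, "thal": 6, "target": 1},
]
hungary = [
    {"age": 41, "sex": 0, "resting_bp": 130, "cholesterol": 204, "target": 0},
    {"age": 54, "sex": 1, "resting_bp": 140, "cholesterol": 239, "target": 1},
]

columns, merged = merge_on_common_attributes(cleveland, hungary)
print(columns)
print(len(merged))
```

In this toy case the `thal` column is dropped because it is missing from the second source, which mirrors the idea of restricting the merged dataset to the shared attributes.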

5.3 Evaluation Metrics

a) Accuracy: In medical data classification, the accuracy is the proportion of all cases, positive and negative, that the classifier labels correctly.

b) Sensitivity: The sensitivity is the proportion of actual positive cases (patients with heart disease) that the classifier correctly detects.

c) Specificity: The specificity is the proportion of actual negative cases that the classifier correctly identifies.

d) F1-score: The F1-score is the harmonic mean of the precision and recall, and summarizes how well the model balances false positives and false negatives.
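The four metrics can be computed from the confusion-matrix counts of a binary classifier, as in the sketch below. It assumes that label 1 denotes the presence of heart disease and label 0 its absence, and that the F1-score is the harmonic mean of precision and recall. The function name `classification_metrics` and the sample labels are illustrative and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Illustrative labels only (1 = disease present, 0 = disease absent).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Multiplying each value by 100 gives the percentages in the form reported in this section.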

5.4 Comparative Methods

To validate the efficacy of the proposed technique, the multilayer perceptron (MLP) classifier, SVM classifier, random forest (RF), AdaBoost classifier [19-22], and ANN classifier [8] were compared with the proposed GWO-ANN classifier in terms of the accuracy, sensitivity, specificity, and F1-score.

5.5 Comparative Analysis

This section presents a comparative evaluation of the approaches used in the classification of heart disease using conventional techniques and the proposed model (Table 2).

a) Analysis based on heart disease database

Figure 3 presents a comparative analysis of these methods using the heart disease database. Figure 3(a) presents a comparative evaluation of the methods in terms of the accuracy. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN methods for the training percentage of 70% were 80.898%, 81.3107%, 90.4011%, 84.8175%, 90.8367%, and 92.4979%, respectively. Figure 3(b) presents a comparative evaluation in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 77.7198%, 82.9382%, 90.3623%, 89.4911%, 90.7979%, and 94.3203%, respectively.

Figure 3(c) presents a comparative evaluation in terms of the specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 71.3489%, 75.6583%, 86.9725%, 84.8229%, 88.8337%, and 90.4949%, respectively. Figure 3(d) shows the comparative evaluation in terms of the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 81.0367%, 82.401%, 90.7993%, 87.5715%, 92.1061%, and 95.6285%, respectively.

b) Evaluation based on Statlog + Cleveland + Hungary database

Figure 4 shows a comparative evaluation using the Statlog + Cleveland + Hungary database. Figure 4(a) shows a comparative evaluation of the accuracy of the methods. The accuracies of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 54.0108%, 77.8053%, 89.3536%, 87.9337%, 91.5205%, and 92.8122%, respectively. Figure 4(b) presents a comparative evaluation of the methods in terms of the sensitivity. The sensitivities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 70.4158%, 75.9977%, 92.2176%, 91.0989%, 93.3803%, and 94.6347%, respectively.

Figure 4(c) presents a comparative evaluation of the methods in terms of their specificity. The specificities of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 52.0078%, 81.8617%, 86.7021%, 85.242%, 89.5175%, and 90.8093%, respectively. Figure 4(d) presents a comparative evaluation based on the F1-score. The F1-scores of the MLP, SVM, RF, AdaBoost, ANN, and proposed GWO-ANN for the training percentage of 70% were 57.1414%, 75.6547%, 92.1683%, 90.7796%, 94.6511%, and 95.9429%, respectively.

The proposed method attained the highest accuracy on both datasets: 92.50% for the heart disease dataset and 92.92% for the Statlog + Cleveland + Hungary database. On the Statlog + Cleveland + Hungary database, the sensitivity, specificity, and F1-score of the proposed method were 94.75%, 90.92%, and 96.06%, respectively. These results indicate that the proposed model improves heart disease classification relative to the compared methods in terms of the accuracy, sensitivity, specificity, and F1-score.
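The margin of the proposed GWO-ANN over the plain ANN baseline can be read directly from Table 2. The short sketch below computes these differences in percentage points. The values are copied from the ANN and Proposed GWO-ANN columns of Table 2, while the names `TABLE_2` and `gains_over_ann` are illustrative.

```python
# (ANN, proposed GWO-ANN) values in percent, copied from Table 2.
TABLE_2 = {
    "Heart disease dataset": {
        "Accuracy": (91.83, 92.50),
        "Sensitivity": (93.65, 94.32),
        "Specificity": (90.49, 90.49),
        "F1-score": (94.96, 95.63),
    },
    "Statlog + Cleveland + Hungary database": {
        "Accuracy": (91.63, 92.92),
        "Sensitivity": (93.46, 94.75),
        "Specificity": (89.63, 90.92),
        "F1-score": (94.76, 96.06),
    },
}


def gains_over_ann(table):
    """Return the GWO-ANN minus ANN difference, in percentage points, per metric."""
    return {
        dataset: {
            metric: round(gwo_ann - ann, 2)
            for metric, (ann, gwo_ann) in metrics.items()
        }
        for dataset, metrics in table.items()
    }


gains = gains_over_ann(TABLE_2)
for dataset, metrics in gains.items():
    for metric, gain in metrics.items():
        print(f"{dataset} | {metric}: {gain:+.2f} percentage points")
```

For the accuracy, this gives a gain of 0.67 percentage points on the heart disease dataset and 1.29 percentage points on the Statlog + Cleveland + Hungary database.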

6. Conclusion

This study has presented an ML-based heart disease prediction module for the diagnosis of cardiovascular diseases by analyzing heart disease data. The heart disease data were first preprocessed to eliminate artifacts. Statistical features, namely the mean, standard deviation, variance, kurtosis, and skewness, were then extracted from the data to form a feature vector that served as the input to the proposed ANN classifier. The performance of the ANN was enhanced by optimally tuning its weights using the GWO algorithm, which is modeled on the hierarchy-based hunting behavior of grey wolves. The efficiency of the proposed model was verified using performance measures including the accuracy, sensitivity, specificity, and F1-score, with results of 92.9245%, 94.7469%, 90.9215%, and 96.0551%, respectively. In the future, the accuracy of the model will be further enhanced by tuning the ANN classifier with hybrid metaheuristic optimization algorithms. Furthermore, this method can be tested in real-time applications in the medical field.

Figure 1. Block diagram of proposed heart disease detection module.

The International Journal of Fuzzy Logic and Intelligent Systems 2024; 24: 231-241 https://doi.org/10.5391/IJFIS.2024.24.3.231

Figure 2. Structure of neural network.


Figure 3. Comparative analysis based on heart disease database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.


Figure 4. Comparative evaluation based on Statlog + Cleveland + Hungary database: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1-score.


Table 1. Pseudocode of GWO algorithm.

Sl. no.  Pseudocode of GWO optimization algorithm
1        Initialize grey wolf population and coefficients
2        Calculate fitness value for grey wolves
3        For all grey wolves
4        {
5          EJ as best grey wolf solution
6          EK as best grey wolf solution
7          EL as best grey wolf solution
8        }
9        While s < smax
10       {
11         Update the grey wolf position as per Eq. (20)
12       }
13       Else
14       {
15         Update the coefficients, C, m
16         Evaluate fitness for all grey wolves
17         Update the solutions EJ, EK, EL
18         s = s + 1
19       }
20       EndWhile
21       EndFor
22       Return
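The steps listed in Table 1 can be illustrated with a minimal sketch of the standard grey wolf optimizer. Several assumptions apply: the three tracked leaders are taken to correspond to EJ, EK, and EL; the position update is taken to be the usual average of the three leader-guided estimates, which is assumed here to be what Eq. (20) expresses; the coefficient names `a`, `A`, and `C` follow the common GWO formulation and may not match the symbols used in the paper; and a simple sphere function stands in for the ANN training error. It is not the authors' implementation.

```python
import random


def sphere(position):
    """Stand-in fitness (lower is better); the paper's fitness is assumed to be ANN error."""
    return sum(x * x for x in position)


def gwo_minimize(fitness, dim, n_wolves=20, max_iter=200,
                 lower=-5.0, upper=5.0, seed=0):
    """Minimize `fitness` over a box using a basic grey wolf optimizer."""
    rng = random.Random(seed)
    # Initialize the grey wolf population at random positions.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    # Score the population and keep the three best solutions as leaders.
    leaders = sorted(((fitness(w), list(w)) for w in wolves),
                     key=lambda pair: pair[0])[:3]

    for s in range(max_iter):
        # Control coefficient decreases linearly from 2 towards 0.
        a = 2.0 * (1.0 - s / max_iter)
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # New coordinate is the mean of the three estimates, clipped to the box.
                wolf[d] = min(upper, max(lower, estimate / 3.0))
        # Re-evaluate fitness and update the three leaders.
        scored = leaders + [(fitness(w), list(w)) for w in wolves]
        leaders = sorted(scored, key=lambda pair: pair[0])[:3]

    best_fitness, best_position = leaders[0]
    return best_position, best_fitness


best_position, best_fitness = gwo_minimize(sphere, dim=3)
print(best_position, best_fitness)
```

To tune a classifier in the manner the paper describes, `fitness` would instead map a flat vector of ANN weights to a classification error on the training data, with `dim` equal to the total number of weights. That mapping is not shown here.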

Table 2. Comparison of heart disease classification techniques.

S. no.  Dataset                                 Metric           MLP    SVM    RF     AdaBoost  ANN    Proposed GWO-ANN
1       Heart disease dataset                   Accuracy (%)     81.19  85.01  90.52  89.09     91.83  92.50
1       Heart disease dataset                   Sensitivity (%)  85.48  90.61  92.34  91.23     93.65  94.32
1       Heart disease dataset                   Specificity (%)  76.30  78.61  88.52  87.09     90.49  90.49
1       Heart disease dataset                   F1-score (%)     85.06  88.65  93.65  92.22     94.96  95.63
2       Statlog + Cleveland + Hungary database  Accuracy (%)     85.65  86.93  90.36  89.22     91.63  92.92
2       Statlog + Cleveland + Hungary database  Sensitivity (%)  85.81  88.45  92.33  91.21     93.46  94.75
2       Statlog + Cleveland + Hungary database  Specificity (%)  83.66  86.20  88.35  87.22     89.63  90.92
2       Statlog + Cleveland + Hungary database  F1-score (%)     88.37  89.70  93.49  92.35     94.76  96.06

References

  1. Mohan, S, Thirumalai, C, and Srivastava, G (2019). Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 7, 81542-81554. https://doi.org/10.1109/ACCESS.2019.2923707
  2. Said, IU, Adam, AH, and Garko, AB (2015). Association rule mining on medical data to predict heart disease. International Journal of Science Technology and Management. 4, 26-35.
  3. Chauhan, A, Jain, A, Sharma, P, and Deep, V. Heart disease prediction using evolutionary rule learning. Proceedings of 2018 4th International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, India, 2018, pp. 1-4. https://doi.org/10.1109/CIACT.2018.8480271
  4. Valarmathi, R, and Sheela, T (2021). Heart disease prediction using hyper parameter optimization (HPO) tuning. Biomedical Signal Processing and Control. 70, article no. 103033. https://doi.org/10.1016/j.bspc.2021.103033
  5. Alotaibi, FS (2019). Implementation of machine learning model to predict heart failure disease. International Journal of Advanced Computer Science and Applications. 10, 261-268. https://doi.org/10.14569/IJACSA.2019.0100637
  6. Motwani, M, Dey, D, Berman, DS, Germano, G, Achenbach, S, and Al-Mallah, MH (2017). Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. European Heart Journal. 38, 500-507. https://doi.org/10.1093/eurheartj/ehw188
  7. Condie, T, Mineiro, P, Polyzotis, N, and Weimer, M. Machine learning for big data. Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 2013, pp. 1242-1244. https://doi.org/10.1109/ICDE.2013.6544913
  8. Pan, Y, Fu, M, Cheng, B, Tao, X, and Guo, J (2020). Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access. 8, 189503-189512. https://doi.org/10.1109/ACCESS.2020.3026214
  9. Abdel-Basset, M, Gamal, A, Manogaran, G, Son, LH, and Long, HV (2020). A novel group decision making model based on neutrosophic sets for heart disease diagnosis. Multimedia Tools and Applications. 79, 9977-10002. https://doi.org/10.1007/s11042-019-07742-7
  10. Thiyagaraj, M, and Suseendran, G (2020). Enhanced prediction of heart disease using particle swarm optimization and rough sets with transductive support vector machines classifier. Data Management, Analytics and Innovation, pp. 141-152. https://doi.org/10.1007/978-981-13-9364-8_11
  11. Shah, SMS, Shah, FA, Hussain, SA, and Batool, S (2020). Support vector machines-based heart disease diagnosis using feature subset, wrapping selection and extraction methods. Computers & Electrical Engineering. 84, article no. 106628. https://doi.org/10.1016/j.compeleceng.2020.106628
  12. Magesh, G, and Swarnalatha, P (2021). Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evolutionary Intelligence. 14, 583-593. https://doi.org/10.1007/s12065-019-00336-0
  13. Nilashi, M, Ahmadi, H, Manaf, AA, Rashid, TA, Samad, S, Shahmoradi, L, Aljojo, N, and Akbari, E (2020). Coronary heart disease diagnosis through self-organizing map and fuzzy support vector machine with incremental updates. International Journal of Fuzzy Systems. 22, 1376-1388. https://doi.org/10.1007/s40815-020-00828-7
  14. Bharti, B, and Singh, SN. Analytical study of heart disease prediction comparing with different algorithms. International Conference on Computing, Communication & Automation, Greater Noida, India, 2015, pp. 78-82. https://doi.org/10.1109/CCAA.2015.7148347
  15. Akgul, M, Sonmez, OE, and Ozcan, T (2019). Diagnosis of heart disease using an intelligent method: a hybrid ANN–GA approach. Intelligent and Fuzzy Systems in Big Data Analytics and Decision Making. Cham, Switzerland: Springer, pp. 1250-1257. https://doi.org/10.1007/978-3-030-23756-1_147
  16. Mirjalili, S, Saremi, S, Mirjalili, SM, and Coelho, LDS (2016). Multi-objective grey wolf optimizer: a novel algorithm for multi-criterion optimization. Expert Systems with Applications. 47, 106-119. https://doi.org/10.1016/j.eswa.2015.10.039
  17. Kaggle (2020). UCI Heart Disease Data. Available: https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data
  18. Kaggle (2019). Heart disease dataset (comprehensive): statlog + cleveland + hungary dataset. Available: https://www.kaggle.com/datasets/sid321axn/heartstatlog-cleveland-hungary-final
  19. Deepika, D, and Balaji, N (2022). Effective heart disease prediction using novel MLP-EBMDA approach. Biomedical Signal Processing and Control. 72, article no. 103318. https://doi.org/10.1016/j.bspc.2021.103318
  20. Mythili, T, Mukherji, D, Padalia, N, and Naidu, A (2013). A heart disease prediction model using SVM-decision trees-logistic regression (SDL). International Journal of Computer Applications. 68, 11-15. https://doi.org/10.5120/11662-7250
  21. Jabbar, MA, Deekshatulu, BL, and Chandra, P (2016). Intelligent heart disease prediction system using random forest and evolutionary approach. Journal of Network and Innovative Computing. 4, 175-184.
  22. Rani, P, Kumar, R, Ahmed, NMS, and Jain, A (2021). A decision support system for heart disease prediction based upon machine learning. Journal of Reliable Intelligent Environments. 7, 263-275. https://doi.org/10.1007/s40860-021-00133-6
