Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(2): 117-129

Published online June 25, 2023

https://doi.org/10.5391/IJFIS.2023.23.2.117

© The Korean Institute of Intelligent Systems

Improved Swarm Intelligence Optimization Model with Mutual Information Estimation for Feature Selection in Microarray Bioinformatics Datasets for Diseases Diagnosis

Peddarapu Rama Krishna and Pothuraju Rajarajeswari

Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India

Correspondence to :
Peddarapu Rama Krishna (peddarapuramakrishna@gmail.com)

Received: January 28, 2023; Revised: May 22, 2023; Accepted: June 19, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Bioinformatics has emerged as a promising field with innovative applications in various biological domains. Microarray data analysis has become the preferred technology for estimating gene expression and diagnosing diseases. However, processing and computing the vast amount of information contained in microarray data pose significant challenges. This study focuses on developing an effective feature selection model for disease diagnosis using microarray datasets. The proposed approach, mutual fuzzy swarm optimization (MFSO), uses mutual information (MI) values to compute features in a microarray gene dataset. The computed MI values are then applied to a fuzzy expert system (FES) for sample classification. To improve classification accuracy, fuzzy logic is iteratively estimated for unclassified instances in the microarray sample. The feature selection model employs a particle swarm optimization model to extract microarray sample features based on the derived MI. The swarm optimization movement is guided by an objective function. The expert system integrates fuzzy logic with human expert knowledge to perform medical diagnoses, with a specific focus on the selected features. The proposed MFSO model utilizes if-then rules to estimate the membership values of the microarray dataset features. Through simulation analysis, the performance of the proposed MFSO model is evaluated using sensitivity, specificity, and ROC values for five different microarray datasets. The results demonstrate improved performance compared to existing methods. In conclusion, this study presents a novel approach for effective feature selection in a microarray dataset for disease diagnosis. The proposed MFSO model integrates MI computation, fuzzy logic, and expert knowledge to achieve improved classification accuracy. The simulation analysis validates the effectiveness of the proposed model, highlighting its superior performance compared to existing methods.

Keywords: Microarray, Fuzzy expert system (FES), Particle swarm optimization, Bioinformatics, Feature selection

Bioinformatics is an interdisciplinary field that encompasses statistics, biology, computer science, and management. It involves modeling and analyzing data related to biological sequences, genome content, and arrangement to predict macromolecule structures. With the advancement of genomic data, bioinformatics has gained prominence in biological and medical research, and the number of published accounts has increased over the years. Bioinformatics combines information science and computational biology, which are closely interrelated. It is an established discipline that draws on computing and on the expertise of engineers, statisticians, and mathematicians to support information management in healthcare.

Microarray data are widely regarded as effective tools for examining DNA expression in large-scale gene sequences. Microarray technology facilitates the interaction between gene expression and biology, enabling genome-wide analysis. It involves the parallel processing of gene profiles through a hybridization model. The evaluation of eDNA expression at the genome level allows for the potential application of microarrays in the biomedical community. Researchers have reported the use of microarray data in pharmaceutical drug development for selection, assessment, and quality control, as well as in disease diagnosis and monitoring to derive adverse effects from therapeutic interventions. Tumors can be classified using microarray technology based on molecular characteristics, providing an easier alternative to traditional diagnostic schemes.

Microarray techniques have been utilized in the study of carcinogenesis, drug discovery, and the subclassification of cancer cells. Accurate diagnosis of cancer has been demonstrated through the effective classification of microarray data. Gene expression profiles of cancer tissues have been used to assess cancer status using gene expression classifier profile databases. Microarray data typically have high-dimensional gene expression dimensions, which pose challenges in terms of noise measurement and data analysis due to differences in sample quantities. Moreover, the imbalance in label class samples further complicates the analysis. Reducing the dimensionality of microarray data using computer-based systems can mitigate these challenges. However, the analysis of high-dimensional, low and skewed data incurs higher computation costs and can result in poor learning and classification processes. Therefore, microarray data can significantly impact disease identification and estimation. The microarray classification process involves two steps: gene selection and the construction of a classification model to generate accurate predictions from the data.

Bioinformatics has emerged as a promising and innovative field with applications in various biological domains. In particular, the analysis of microarray data has gained significant attention in disease diagnosis based on gene expression patterns. However, processing and computing large volumes of information contained in microarray data present substantial challenges. This study aims to address the need for an effective feature selection model for disease diagnosis using microarray datasets, thereby contributing to the field. The specific objectives of this study include constructing a feature selection model and evaluating its performance in accurately diagnosing diseases. The research gap lies in the development of a feature selection model capable of effectively handling the challenges associated with microarray data. While previous studies have explored various feature selection methods, there is a need for an approach that can optimize the selection process to improve disease diagnosis accuracy using microarray datasets. This study aims to fill this gap by proposing a novel approach, the mutual fuzzy swarm optimization (MFSO) model, and evaluating its performance against existing methods. By utilizing mutual information (MI) computation, fuzzy logic, and expert knowledge, the MFSO model aims to enhance classification accuracy and improve disease diagnosis performance. A simulation analysis using multiple microarray datasets is conducted to compare the proposed model with existing methods to demonstrate its superiority.

This study aims to address the research gap in effective feature selection for disease diagnosis using microarray datasets. The proposed MFSO model introduces novel approaches and techniques to improve accuracy and performance. Evaluating the model against existing methods will provide evidence of its effectiveness and contribute to the advancement of bioinformatics for disease diagnosis.

The contributions of this study are as follows:

  • Development of an effective feature selection model: This study proposes the MFSO model as a novel approach for feature selection in disease diagnosis using microarray datasets. The MFSO model integrates MI computation, fuzzy logic, and expert knowledge to optimize the selection process and improve the classification accuracy. By developing this model, this study contributes to the advancement of feature selection techniques specifically tailored for microarray data analysis.

  • Improved accuracy in disease diagnosis: The proposed MFSO model aims to enhance the accuracy of disease diagnosis using microarray datasets. Through the integration of MI, fuzzy logic, and expert knowledge, the model provides an effective approach for identifying relevant features and classifying samples. Achieving higher accuracy rates improves the reliability and effectiveness of disease diagnosis using microarray data.

  • Comparison with existing methods: This study conducts a thorough evaluation of the proposed MFSO model by comparing its performance with that of existing feature selection methods. By benchmarking the MFSO model against established approaches, this study provides evidence of its superiority and demonstrates its potential for practical applications in bioinformatics. The comparison identifies the strengths and advantages of the proposed model over existing methods, contributing to the existing body of knowledge.

  • Addressing the research gap: The study focuses on the need for an effective feature selection model for disease diagnosis using microarray datasets, addressing a specific research gap in bioinformatics. By proposing a novel approach, this study fills an important gap in the literature and offers a valuable contribution to bioinformatics research.

The study contributes to the field of bioinformatics by introducing a novel feature selection model, improving the accuracy of disease diagnosis, providing a comparative analysis with existing methods, and addressing research gaps. These contributions have implications for both the theoretical understanding and practical application of bioinformatics in disease diagnosis.

Microarray technology is used to compute the genome probe, generating data points in a high-dimensional space based on the genome size. However, microarray data classification often involves large sample sizes.

In [12], an optimization model was presented for feature selection and classification of medical data. The developed model utilizes the centroid mutation-based Search and Rescue (cmSAR) optimization algorithm that incorporates a k-nearest neighbor (kNN) classifier for disease classification.

The cmSAR model selects features from an optimal group by estimating two classes: convergence and local searches. It employs a fuzzy logic system that considers multivalue fuzzy set logic for the mutation operations. Statistical results demonstrated the effectiveness of the metaheuristics-based optimization model for classification. The disease dataset used in the study comprised 15 disease data features extracted from the UCI dataset. Additionally, the cmSAR model uses the CEC-C06 2019 objective function and performs classification based on the Friedman and Bonferroni–Dunn tests. The examination results confirmed that this cmSAR model achieves improved performance for medical datasets.

In another study [13], phonocardiogram (PCG) signal analysis was conducted by applying the decomposition of the intrinsic mode function (IMF) Hilbert-Huang transform. The analysis employed the mel-frequency cepstral coefficient for feature extraction and the Hilbert-Huang transform for signal classification. Classifiers such as kNN, multilayer perceptron, support vector machine (SVM), and deep neural network (DNN) were employed for classification. This PCG model uses multiple classification classes, including healthy, aortic stenosis, mitral stenosis, mitral regurgitation, and mitral valve prolapse, and employs a five-fold cross-validation approach. The DNN model demonstrated higher precision, recall, F1-score, and accuracy values of 98.9%, 98.7%, 98.8%, and 98.9%, respectively.

In [14], a four-step hybrid ensemble feature selection algorithm was examined using cross-validation. The deployed model utilizes a filter method with an ensemble-weighted score based on ranking features using a sequential forward selection model to create a wrapper-based optimal feature subset. The optimal subset features were sampled to perform the classification. An experimental analysis was performed on 20 benchmark datasets based on the dimensionalities. The hybrid approach exhibited a higher dataset dimensionality. The proposed hybrid approach combines the feature selection algorithm with classifiers such as naive Bayes, radial basis function, SVM, kNN, and random forest to perform classification. The experimental results demonstrated that the proposed methodology exhibits higher accuracy, specificity, sensitivity, AUC, and F1-score for the selected features. The results exhibited improved performance in achieving competitiveness with state-of-the-art methods.

In [15], a framework for disease diagnosis and classification was constructed. The developed GENE model performs gene selection using a multiobjective player selection strategy-based hybrid population search (MOPS-HPS). The model uses a multifiltering adaptive parameter to perform gene selection. Using a wrapper-based scheme, a rotational blending operator is employed for the transmute binary search space through stochastic search phase estimation to perform classification. The effectiveness of the developed GENEmops model was evaluated on 16 biological datasets and its performance was compared to that of eight binary and eight multi-class models. GENEmops exhibited improved intelligent computational multiclass features with enhanced efficiency.

Neighborhood estimation based on the Euclidean distance was performed to evaluate minority classes. High-dimensional microarray features were evaluated based on the Euclidean distance settings, assuming equal weights for the different features. An effective SMOTE-weighted Minkowski distance computation model was developed for estimating samples from the minority class [16].

The model features were evaluated in a relevant classification task, performing attribute selection with a threshold value on the weights. To assess the predictive performance of the model, an experimental analysis was conducted on the 42-imbalance classes. The analysis aimed to compare the performance with the traditional SMOTE approach in both low- and high-dimensional settings, while ensuring that complexity was not affected.

A binary shuffled frog leaping algorithm (BSFLA) metaheuristic model was implemented in [17] for gene selection. The gene subset was computed based on optimal features, considering 20 subsets of genes extracted from the original dataset. The gene subset was implemented using the kNN classifier model and compared with an artificial neural network (ANN) and SVM. The comparative analysis showed improved performance through particle swarm optimization.

In [18], a relief attribute evaluation was deployed with the light gradient boosting machine (GBM) and DNN model for classifying non-cancer and cancer cells using 20 machine-learning techniques. The analysis was performed on 111 patients with 22,278 features using Python and Weka software. The evaluation metrics used were accuracy, precision, recall, and AUC, utilizing decision trees, naive Bayes, AdaBoost, quantitative descriptive analysis, and DNN models. The classification network demonstrated exceptional performance by achieving 100% accuracy on the microarray dataset, effectively distinguishing between non-cancerous and cancerous cells.

A framework for feature selection was developed in [11] to increase feature discrimination and stability. Features were selected using aggregation strategies within the sampled dataset. These strategies were combined with a multiclass model to achieve classification accuracy. The stability features were examined using binary and multiclass metrics. The simulation results indicated that this method has a higher stability score compared to other classifiers.

In [19], a feature-level ensemble model was constructed for microarray classification using feature subspace partitioning (referred to as fuzzy logic and adaptive evolutionary optimizer-based feature selection for optimization of fuzzy inference systems). The ensemble classification model consists of three steps: generating subsets through partitioning, feature selection, and aggregation of the ranking function. The developed framework model uses a microarray dataset, considering parameters such as accuracy, stability, and runtime.

The proposed methodology introduces the MFSO model to estimate MI for optimizing variables in microarray datasets. The MFSO model computes the optimization of microarray variables to extract features from the dataset.

(1) Fuzzy logic: Fuzzy logic is a mathematical framework that addresses uncertainty and imprecision by allowing the representation and manipulation of vague or fuzzy concepts. It extends classical binary logic by introducing the concept of partial truth, where membership functions (MFs) quantify the degree to which an element belongs to a particular set. In this study, fuzzy logic is utilized to estimate the membership values of features in the microarray dataset using if-then rules.

The proposed MFSO model incorporates fuzzy logic by iteratively estimating the membership values for unclassified instances in the microarray sample. By utilizing if-then rules and membership estimation, the model integrates human expert knowledge and fuzzy reasoning to improve classification accuracy and enhance disease diagnosis performance.

(2) Particle swarm optimization (PSO): PSO is a metaheuristic optimization algorithm inspired by the behaviors of bird flocks and fish schools. It is used to solve optimization problems by simulating the social behavior of particles in a multi-dimensional search space. Each particle represents a potential solution and its movements are influenced by its own best position (pbest) and the global best position (gbest) found by the swarm.
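
The pbest/gbest mechanism described above can be illustrated with a minimal sketch of textbook PSO. This is not the MFSO implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the `sphere` test objective are all assumptions chosen only for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iters=100,
                 lower=-5.0, upper=5.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook PSO: each particle is pulled toward its pbest and the swarm gbest."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + attraction to own best + attraction to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the permissible range.
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=2)
print(best_x, best_val)
```

For feature selection, the objective would score a candidate feature subset rather than a point in a continuous test function; the update rule itself stays the same.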

In the proposed MFSO model, PSO is employed to extract microarray sample features based on the derived MI. The swarm optimization movement is guided by an objective function that aims to optimize the feature selection process. By leveraging the PSO, the model explores the search space and identifies the most relevant and informative features for disease diagnosis in microarray datasets.

Fuzzy logic and PSO are selected for this study because of their ability to handle uncertainty, optimize feature selection, and integrate expert knowledge. By incorporating these methodologies into the proposed MFSO model, this study aims to leverage their strengths and benefits to enhance the accuracy and effectiveness of disease diagnosis using microarray datasets.

3.1 Mutual Information

MI is a measure that captures the mutual dependence between two variables. The proposed MFSO model consists of a stochastic system with a variable input X and an output Y. The data are represented as a discrete variable X with Nx possible values and a variable Y with Ny possible values. The initial uncertainty, expressed as the entropy estimation H(Y), is defined by Eq. (1):

$$H(Y) = -\sum_{j=1}^{N_y} P(y_j)\,\log P(y_j). \tag{1}$$

Here, the probability value of Y is denoted as P(yj). The uncertainty of the system output Y is evaluated based on input X using the conditional entropy value H(Y | X), defined by Eq. (2):

$$H(Y \mid X) = -\sum_{i=1}^{N_x} P(x_i)\left[\sum_{j=1}^{N_y} P(y_j \mid x_i)\,\log P(y_j \mid x_i)\right]. \tag{2}$$

In Eq. (2), the conditional probability of the output value yj given the input value xi is denoted as P(yj | xi), and P(xi) is the probability of xi. The indices i and j run over the possible values of X and Y, respectively. The difference H(Y) − H(Y | X) is the reduction in the uncertainty of the system output once the input is known. The MI between random variables X and Y is denoted I(Y; X) and defined by Eq. (3):

$$I(Y;X) = H(Y) - H(Y \mid X). \tag{3}$$

MI thus measures the average reduction in the uncertainty of the random variable Y obtained by observing X. Gene selection is performed by treating the class label of a sample as Y and the expression of a gene as X. The process of MI computation in the MFSO is presented in Figure 1.
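
Eqs. (1)–(3) can be checked on a toy example. The sketch below estimates the probabilities from sample counts and uses base-2 logarithms; both choices are assumptions, since the text does not fix the estimator or the log base. The discretized expression labels ("low", "high") are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(Y) from Eq. (1), with probabilities estimated as relative frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(Y | X) from Eq. (2): entropy of Y within each value of X, weighted by P(x)."""
    n = len(x)
    total = 0.0
    for xi, count_x in Counter(x).items():
        y_given_x = [yj for xv, yj in zip(x, y) if xv == xi]
        total += (count_x / n) * entropy(y_given_x)
    return total


def mutual_information(x, y):
    """I(Y; X) from Eq. (3)."""
    return entropy(y) - conditional_entropy(x, y)


labels = [0, 0, 1, 1]                                 # class label Y
informative_gene = ["low", "low", "high", "high"]     # determines Y exactly
uninformative_gene = ["low", "high", "low", "high"]   # independent of Y

mi_informative = mutual_information(informative_gene, labels)
mi_uninformative = mutual_information(uninformative_gene, labels)
print(mi_informative, mi_uninformative)
```

A gene whose discretized expression fully determines the class label attains I(Y; X) = H(Y), while a gene that is independent of the label attains zero.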

The steps of the MFSO are as follows:

  1. Compute the gene expression sample ′S′.

  2. Evaluate each input gene in ′S′ using the following equations:

    a. Estimate the initial entropy value using Eq. (1).

    b. Compute the conditional entropy using Eq. (2).

    c. Evaluate the MI between the gene expressions using Eq. (3).

  3. Based on the estimated MI values, store and sort the genes.

  4. For every gene 1, 2, ..., n:

    a. Partition the sample population ′S′ into training and testing sequences Ta and Tb, respectively.

    b. Use the optimization model to measure the fuzzy set values for training and testing with variables Ta and Tb for the input gene.

    c. Classify the sample based on the testing sample features Te; unclassified samples are represented as Tf.

  5. For every unclassified sample Tf, Steps 2–4 are repeated until the condition for achieving optimal classification accuracy is met. Here, the unclassified sample value is denoted as α, starting with 1.
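
The control flow of the steps above can be sketched as follows. This is only an outline under stated assumptions: the MI scores are taken as already computed, the `evaluate` callback is a hypothetical stand-in for the partition, fuzzy-set estimation, and classification of Step 4, and the `target_accuracy` stopping threshold is an illustrative choice not given in the text.

```python
def rank_genes_by_mi(mi_scores):
    """Step 3: sort gene identifiers by descending MI."""
    return sorted(mi_scores, key=mi_scores.get, reverse=True)


def select_genes(mi_scores, evaluate, target_accuracy=0.95):
    """Add genes in MI order until the classifier reaches the target accuracy.

    `evaluate(genes)` is a hypothetical callback that trains and tests a
    classifier on the given gene subset and returns its accuracy.
    """
    selected = []
    accuracy = 0.0
    for gene in rank_genes_by_mi(mi_scores):
        selected.append(gene)
        accuracy = evaluate(selected)
        if accuracy >= target_accuracy:
            break
    return selected, accuracy


# Toy data: hypothetical MI scores and a lookup table standing in for a classifier.
mi_scores = {"g1": 0.12, "g2": 0.80, "g3": 0.45}
accuracy_by_size = {1: 0.80, 2: 0.96, 3: 0.97}


def toy_evaluate(genes):
    return accuracy_by_size[len(genes)]


selected, accuracy = select_genes(mi_scores, toy_evaluate)
print(selected, accuracy)
```

In this toy run the loop stops after the two highest-ranked genes, because adding the second gene already meets the assumed threshold.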

3.2 MFSO Fuzzy Expert System

The microarray data for the gene expression elements are computed based on the fuzzy expert system (FES), whose architecture is illustrated in Figure 2. With the FES, the MFs are aggregated using the if-then rules to derive decisions regarding the data. The prediction class is indicated using discrete sample labels with nonlinear mapping of the input and output spaces.

The FES model is built from linguistic if-then rules of the general form “IF A THEN B,” where the propositions A and B are the premise and the consequence of the rule, respectively. The uncertainty between the variables is evaluated based on the linguistic variables and the fuzzy set values in the if-then rules. The fuzzy logic model is crucial for accumulating and summarizing data to derive decision-relevant information. The universal fuzzy set F with discourse Z is presented in Eq. (4):

$$F = \{(z, \mu_A(z)) \mid z \in Z\}. \tag{4}$$

In Eq. (4), μA(z) denotes the degree of membership of z in F. The FES uses the Mamdani inference system to match the gene input against the if-then rules of the fuzzy process. Fuzzy weights are computed based on each rule degree in the fuzzy inference system. The output class is aggregated through specified fuzzy rules, and the output confidence is the input value for the determined class. Using the PSO algorithm, the if-then rule set is evaluated for the MF. Gene selection is evaluated using FES, which is significant in the microarray dataset with higher dimensionality. The optimization model considers MI for estimating the variables during gene selection. The MF in the variable is computed using the if-then rule based on the Boolean logic function to estimate the true or false values in the data samples. The generalized rule mode of the proposed MFSO model is computed using Eq. (5):

$$\text{Rule}_j:\ \text{if } x_{p1} == A_{j1} \text{ and } \cdots \text{ and } x_{pn} == A_{jn} \text{ then class} = C_j. \tag{5}$$

Here, the gene input values are represented by xp1, ..., xpn, and the antecedent fuzzy sets are denoted by Aj1, ..., Ajn. The label class output is defined as Cj with a quantitative rule-based model with if-then rules. The fuzzy relationship between the variables is computed using the if-then rules for the input and output variables.
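
A rule of the form in Eq. (5) can be evaluated as in the sketch below. It makes two assumptions that the text does not confirm: the "and" of the antecedents is taken as the minimum of the membership degrees (a common choice in Mamdani-type systems), and the predicted class is that of the single strongest rule. The rule base, linguistic labels, and membership degrees are hypothetical.

```python
def firing_strength(antecedents, sample_memberships):
    """Degree to which a sample matches a rule: min over the antecedent memberships."""
    return min(sample_memberships[gene][label]
               for gene, label in enumerate(antecedents))


def classify(rules, sample_memberships):
    """Return (class, strength) of the strongest rule, or (None, 0.0) if no rule fires."""
    best_class, best_strength = None, 0.0
    for antecedents, label in rules:
        strength = firing_strength(antecedents, sample_memberships)
        if strength > best_strength:
            best_class, best_strength = label, strength
    return best_class, best_strength


# Hypothetical two-gene rule base: (A_j1, A_j2) -> C_j
rules = [
    (("high", "low"), "disease"),
    (("low", "high"), "healthy"),
]

# Membership degree of each gene's expression in each linguistic label.
sample = [
    {"low": 0.2, "high": 0.8},   # gene x_p1
    {"low": 0.7, "high": 0.3},   # gene x_p2
]

predicted, strength = classify(rules, sample)
print(predicted, strength)
```

A sample for which every rule has zero firing strength is returned as `None`, which corresponds to the unclassified case handled by the iterative procedure in Section 3.1.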

The relationships between the variables are computed using the Cartesian product universal set model. The universal set X is evaluated with the fuzzy set A with ordered pairs based on Eq. (6):

$$A = \{(x, \mu_A(x)) \mid x \in X\}. \tag{6}$$

In this case, the degree of membership of the input variable x in A is denoted as μA(x) and is estimated using trapezoidal and triangular MFs within the FES.
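
Triangular and trapezoidal MFs have standard piecewise-linear definitions, sketched below. The breakpoints `a`, `b`, `c`, and `d` and the example expression values are hypothetical; the text does not report the parameter values used in the MFSO model.

```python
def triangular(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF with feet at a and d and plateau on [b, c] (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Membership of a hypothetical expression value in a "medium" fuzzy set.
print(triangular(2.5, 0.0, 5.0, 10.0))
print(trapezoidal(7.0, 0.0, 2.0, 6.0, 8.0))
```

Tuning the MFs, as described in Section 3.3, then amounts to searching over these breakpoints.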

3.3 MFSO PSO model

The MFSO PSO model uses food sources for the phases through a random search model to derive the decision-making process for the selection of food sources. The PSO model used to select features from the gene data is illustrated in Figure 3.

3.3.1 Data representation

The MFSO model computes the optimal rule-set model with PSO. The estimation of the variable is based on the optimal rule set with combinational optimization features in the genes. Particle swarm features are computed based on the tuned MF for the allocated points uniformly in the gene expression range. Tuning with the membership model is evaluated using continuous optimization features in the dataset. The optimal variables eliminate the overall relationship between gene features, as presented in Figure 4.

The PSO model initializes the parameter values by computing the particles in parts that are within the permissible range. A particle swarm within a permissible range is evaluated and effectively processed using the constructed path. The path explored in the optimization model features is evaluated based on a combination of discrete values to derive the decision values based on the objective function. The objective function features are estimated using the objective function model. If the process is yes, the optimal path is selected, and the variables are computed with an updated pheromone and a permissible range set using the iteration process. Each iteration value is evaluated for a random particle swarm to derive the optimum solution.

The particle swarm between variables is used to compute the shortest optimal path for the computation of the edges to estimate the optimal path. To minimize the interaction between the swarms in the region, the movement values are based on global and local functions to derive the optimal path in the region. The flow of the PSO model is presented in Figure 5.

The MFSO model is computed based on the initial variable matrix to obtain the optimal solution for gene features. The gene features considered in this model are blood cells, pancreatic islets, liver cells, and neurons. The initial random solution space primary matrix is computed using Eq. (7):

$$P = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}. \tag{7}$$

In Eq. (7), the features are represented as Xi, where i = 1, 2, ..., N indexes the different feature variables. The evaluation process takes into account the maximum number of cells, population size in each iteration, and the cost associated with each cell. Different function criteria are considered to determine the target problem. The MFSO model consists of global and local cells that store cost values as memory. The self-renewal process involves considering different feature-area models. At this stage, all particle information is shared among cells based on the cost value associated with the selected genes or features for gene selection.

The PSO model demonstrated enhanced estimation of the swarm path movement through the utilization of a self-estimation model derived using Eq. (8):

$$SW_{\mathrm{Optimum}}(t+1) = S_r \times SC_{\mathrm{Optimum}}(t). \tag{8}$$

MFSO is performed using the iteration value denoted as t to estimate the best or optimal solution in every iteration process. The random number is represented as Sr, and it lies between 0 and 1. The current iteration value required to achieve the optimal solution for the variable is denoted as SWOptimum(t+1). The process continues until the optimal swarm value is derived to achieve the best result using the objective function to compute the fitness value, as presented in Eq. (9):

$$x_{ij}(t+1) = \mu_{ij} + \phi\,\bigl(\mu_{ij}(t) - \mu_{kj}(t)\bigr). \tag{9}$$

Based on the PSO algorithm, a new swarm movement is computed using Eq. (9). Here, μij denotes the jth dimension of the ith solution, t is the iteration number, μkj is the jth dimension of a randomly selected solution k, and ϕ is a random number in the range [−1, 1].
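
The movement in Eq. (9) can be sketched as below. The sketch assumes that a single dimension j and a single partner solution k ≠ i are drawn uniformly at random per move, which is one reading of the description and also the form of the neighbor-search step commonly used in artificial bee colony algorithms. The function name and the toy population are hypothetical.

```python
import random


def neighbor_update(solutions, i, rng):
    """Apply Eq. (9) to one randomly chosen dimension of solution i.

    Returns the candidate solution and the index j of the perturbed dimension.
    """
    n, dim = len(solutions), len(solutions[i])
    k = rng.choice([idx for idx in range(n) if idx != i])  # random partner, k != i
    j = rng.randrange(dim)                                 # random dimension
    phi = rng.uniform(-1.0, 1.0)
    candidate = solutions[i][:]                            # leave the original intact
    candidate[j] = solutions[i][j] + phi * (solutions[i][j] - solutions[k][j])
    return candidate, j


population = [[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]]
candidate, changed_dim = neighbor_update(population, 0, random.Random(42))
print(candidate, changed_dim)
```

Because ϕ lies in [−1, 1], the candidate never moves farther from the current solution than the distance to the chosen partner in that dimension, so the step size shrinks as the swarm converges.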

The swarm path ω random variable is in the range of 0–1. The selected optimal best path is related to path estimation to derive the optimal solution. The optimal solution value is computed using Eq. (10):

$$P_i = \frac{SC_{\mathrm{Optimum}}}{\sum_{i=1}^{N} SC_i}. \tag{10}$$

In Eq. (10), the features for comparison are denoted as Pi, the cost of the features is represented as SCi, and the final cost of the population is defined as N. These values are used to determine the best solution through the implementation of the PSO algorithm. The algorithm employs different parameter features to establish the control parameters for PSO, which are initialized as variables. The evaluation of the PSO model is based on several factors, including population size (P), dimension (D), maximum iteration (I), trials (T), and maximum number of genes (N). Initially, a random population is generated, and these variables are estimated based on the initial population. The features in the microarray dataset are evaluated using Eq. (11):

$$S_{fit}(t) = \begin{cases} S_i, & S_i \ge 0, \\ 1 + \lvert S_i \rvert, & S_i < 0. \end{cases} \tag{11}$$

Therefore, the maximal fitness function Sfit(t) is evaluated using an objective algorithm minimization process. Si represents a parameter or variable that influences the fitness score at time t. The fitness score is calculated based on the given conditions to assign appropriate values to Sfit(t). At the same time instances, the optimal solution is computed to estimate the constraints. The fitness value of the gene is evaluated using the best value in the gene, which is derived using Eq. (12):

$$SC_{\mathrm{Optimum}}(t+1) = S_r \times SC_{\mathrm{Optimum}}(t). \tag{12}$$

The feature computation is performed using the greedy selection model, considering both old and new fitness values. The relative features of the variables are evaluated with global and local feature estimations using a fuzzy set model. Through feature estimation, the optimal cost for the features is determined to improve processing speed. The termination condition is set to achieve the maximum number of iterations, allowing the evaluation of the output and derivation of the best solution features through the iteration process. The pseudocode representation of the proposed MFSO model is given in Algorithm 1.
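
The fitness mapping of Eq. (11) and the greedy selection between old and new solutions can be sketched as follows. The sketch assumes that a larger Sfit is preferred and that ties keep the old solution; neither convention is stated explicitly in the text, and the function names and toy feature subsets are hypothetical.

```python
def fitness(s):
    """Eq. (11): S if S >= 0, otherwise 1 + |S|."""
    return s if s >= 0 else 1.0 + abs(s)


def greedy_select(old_solution, old_score, new_solution, new_score):
    """Keep whichever solution has the higher fitness (assumed convention).

    Ties keep the old solution, so a move is accepted only on strict improvement.
    """
    if fitness(new_score) > fitness(old_score):
        return new_solution, new_score
    return old_solution, old_score


current, current_score = ["g2", "g3"], 0.25
candidate, candidate_score = ["g2", "g5"], -0.5

kept, kept_score = greedy_select(current, current_score, candidate, candidate_score)
print(kept, kept_score, fitness(kept_score))
```

Accepting a move only when it improves the fitness means that the best value found never degrades from one iteration to the next.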

The MFSO model utilizes a fuzzy-set model to classify a microarray dataset consisting of five different datasets: type 2 diabetes (T2D), colon cancer, prostate cancer, insulin sensitive resistance (ISR), and leukemia. The dataset comprises a two-class gene expression profile that is publicly accessible. Table 1 provides an overview of the microarray dataset used for feature selection.

The results of the proposed MFSO model are obtained through 30 independent trials to compute unique values and estimate the control parameters and convergence values. The PSO model feature is estimated based on the optimization function that considers swarm movement, as shown in Table 2.

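Table 11. Comparison of membership function

Mean number of membership functions (μMF)

| Dataset  | BCGA | PSO | GSA | HCA  | MFSO |
|----------|------|-----|-----|------|------|
| T2D      | 0.11 | 0.9 | 2.3 | 2.13 | 4.08 |
| ISR      | 0.6  | 1.3 | 2.2 | 2.28 | 3.98 |
| Colon    | 0.1  | 1.1 | 2.1 | 2.23 | 3.45 |
| Leukemia | 0.2  | 1.3 | 2.1 | 2.46 | 3.06 |
| Prostate | 0.4  | 1.2 | 2.3 | 2.69 | 4.67 |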

The MFSO model utilizes the fuzzy set rule to generate a specific set of rules that eliminates irrelevant genes when deriving outcome measures. The convergence learning values for the proposed MFSO model are listed in Table 3.

The performance of the proposed MFSO model is evaluated using a conventional optimization model in the task of feature extraction. The optimization models considered for the analysis are the binary coded genetic algorithm (BCGA), genetic swarm optimization (GSA), real coded genetic algorithm (RCGA), and PSO. It is observed that the proposed MFSO model increased the fitness function value for an iteration value between 40 and 70. The results indicate that the optimal number of iterations for the microarray dataset is 140. The label features for the microarray dataset used to compute the average and worst fitness values increase significantly, as presented in Table 4.

The gene id (GID) generates the MF estimation, as presented in Table 4. The execution MF increases with the simultaneous estimation of the fine-tuned gene number and the MF through the mapped linguistic label in the self-evident variable’s computation features. The MFSO model uses a fuzzy set rule assessment scheme to interpret the feature set in a microarray dataset. The sample microarray dataset features are used to estimate the optimal rule set value based on the MFSO model, as indicated in Table 5.

Among the provided rule datasets, T2D Rule set 3 demonstrates the highest accuracy, which can be attributed to several factors.

  • Value of Ncorrect: T2D Rule set 3 exhibits a relatively high number of correctly classified instances (Ncorrect = 13). This indicates the effectiveness of the rules in accurately predicting the target variable or classifying instances.

  • Value of coverage: Although T2D Rule set 3 has a lower coverage value (coverage = 47.32), it focuses on specific patterns or conditions in the data, resulting in higher accuracy. By targeting these specific instances or patterns, this rule set achieves improved accuracy in classification.

  • Rule set complexity: T2D Rule set 3 may have a higher complexity than other rule sets, either in terms of the number of rules or the complexity of individual rules. This increased complexity enables a more fine-grained and accurate classification of instances, leading to higher overall accuracy.

  • Training data characteristics: The training data used to generate T2D Rule set 3 possesses distinct patterns or characteristics that are captured well by the rules in this particular rule set. These patterns are likely indicative of the target variable, contributing to a higher accuracy in prediction.

Notably, the accuracy of a rule set can vary depending on the specific dataset and problem domain. The aforementioned factors collectively contribute to the higher accuracy observed in T2D Rule set 3 within the given dataset.

Additionally, the interpretability variables of the proposed MFSO model are computed for the estimated features. The MFSO model utilizes estimated microarray feature selection, which is comparatively presented in Table 6.

A comparative analysis is conducted to assess the convergence values of the proposed MFSO model in comparison to existing GA and HCA models. The results indicate that the proposed MFSO model significantly outperforms the conventional GA and HCA models. The generated fuzzy set model enabled the identification of the minimal number of genes required. The results indicate that the proposed MFSO model achieves higher interpretability than the GA and HCA models. The generalized computations of the dataset features across different datasets are presented in Table 7.

Furthermore, the comparative analysis of the generalization ability reveals that the MFSO model exhibits a minimal number of incorrectly classified samples for the dataset. Thus, the proposed MFSO model significantly enhances the performance of existing methods for disease diagnosis. Table 8 provides a comparison of the sensitivity and specificity of the MFSO model.

The comparative analysis of specificity and sensitivity values for different datasets using the proposed MFSO model demonstrates higher values compared to existing classification models (Figure 6). Additionally, the analysis of false-positive and false-negative values shows significantly lower values compared to existing classifier models. Table 9 presents the mean coverage values for the different datasets.
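As a reminder of how the two metrics relate to false negatives and false positives, the following minimal Python sketch computes sensitivity and specificity for a two-class problem. The function name and the label vectors are hypothetical illustrations and are not taken from the article's datasets or from Table 8.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # diseased sample correctly flagged
            else:
                fn += 1  # diseased sample missed (false negative)
        else:
            if pred == positive:
                fp += 1  # healthy sample wrongly flagged (false positive)
            else:
                tn += 1  # healthy sample correctly cleared
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical labels: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Fewer false negatives raise sensitivity and fewer false positives raise specificity, which is why the two pairs of quantities move together in the comparison.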

The mean coverage values for different datasets indicate that the MFSO model achieves a higher mean coverage value. Specifically, for the T2D dataset, the mean coverage is estimated to be 35, which is substantially higher than that of the existing model. Table 10 provides a comparative estimation of variables with different features. The MFs estimated using the proposed MFSO model are listed in Table 11.

The results reveal that the proposed MFSO model achieves more variables compared to the features of the conventional classifier model. The feature selection estimation with the proposed MFSO model is effective, and the computation of MFs for the features yields higher values compared to the existing model.

Several related studies have proposed different feature-selection and classification models. For instance, Houssein et al. [12] presented an optimization model for medical data feature selection and classification using a centroid mutation model with an integrated rescue optimization algorithm. Arslan and Karhan [13] focused on categorizing PCG signals using the decomposition of the IMF with the Hilbert-Huang transform and employed various classification techniques. In [14], the authors proposed a hybrid ensemble feature selection algorithm that combines the filter method with ensemble weighted scores and a wrapper-based optimal feature subset. Agarwalla and Mukhopadhyay [15] constructed a framework for disease diagnosis and classification using a hybrid population search based on a multi-objective player selection strategy. Maldonado et al. [16] performed neighborhood estimation based on the Euclidean distance to evaluate minority classes. Finally, Dash et al. [17] implemented the BSFLA metaheuristic algorithm for optimizing the feature selection process.

Bioinformatics provides microarray data for processing and detecting various diseases. This study introduces an MI-based feature selection model for microarray data. The proposed MFSO model uses a fuzzy-set model to determine the feature set using if-then rules. The MFSO model uses MF estimation with swarm-based objective function estimation. The proposed MFSO model exhibits improved performance compared to the existing model, with a classification accuracy of 99%.

We would like to express our gratitude to all the individuals and organizations who contributed to the successful completion of this research. Their support and assistance were invaluable in conducting the study and achieving the results presented here.
  1. Mabarti, I (2020). Implementation of minimum redundancy maximum relevance (MRMR) and genetic algorithm (GA) for microarray data classification with C4.5 decision tree. Journal of Data Science and Its Applications. 3, 38-47. https://doi.org/10.34818/jdsa.2020.3.37
  2. Dash, R (2021). An adaptive harmony search approach for gene selection and classification of high dimensional medical data. Journal of King Saud University-Computer and Information Sciences. 33, 195-207. https://doi.org/10.1016/j.jksuci.2018.02.013
  3. Haznedar, B, Arslan, MT, and Kalinli, A (2021). Optimizing ANFIS using simulated annealing algorithm for classification of microarray gene expression cancer data. Medical & Biological Engineering & Computing. 59, 497-509. https://doi.org/10.1007/s11517-021-02331-z
  4. Tabares-Soto, R, Orozco-Arias, S, Romero-Cano, V, Bucheli, VS, Rodríguez-Sotelo, JL, and Jimenez-Varon, CF (2020). A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data. PeerJ Computer Science. 6. article no. e270
  5. Kourou, K, Rigas, G, Papaloukas, C, Mitsis, M, and Fotiadis, DI (2020). Cancer classification from time series microarray data through regulatory dynamic Bayesian networks. Computers in Biology and Medicine. 116. article no. 103577
  6. Rostami, M, Forouzandeh, S, Berahmand, K, Soltani, M, Shahsavari, M, and Oussalah, M (2022). Gene selection for microarray data classification via multi-objective graph theoretic-based method. Artificial Intelligence in Medicine. 123. article no. 102228
  7. Houssein, EH, Abdelminaam, DS, Hassan, HN, Al-Sayed, MM, and Nabil, E (2021). A hybrid barnacles mating optimizer algorithm with support vector machines for gene selection of microarray cancer classification. IEEE Access. 9, 64895-64905. https://doi.org/10.1109/ACCESS.2021.3075942
  8. Cathryn, RH, Kumar, SU, Younes, S, Zayed, H, and Doss, CGP (2022). A review of bioinformatics tools and web servers in different microarray platforms used in cancer research. Advances in Protein Chemistry and Structural Biology. 131, 85-164. https://doi.org/10.1016/bs.apcsb.2022.05.002
  9. Alomari, OA, Makhadmeh, SN, Al-Betar, MA, Alyasseri, ZAA, Doush, IA, Abasi, AK, Awadallah, MA, and Zitar, RA (2021). Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowledge-Based Systems. 223. article no. 107034
  10. Alhenawi, EA, Al-Sayyed, R, Hudaib, A, and Mirjalili, S (2022). Feature selection methods on gene expression microarray data for cancer classification: a systematic review. Computers in Biology and Medicine. 140. article no. 105051
  11. Wang, A, Liu, H, Yang, J, and Chen, G (2022). Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data. Computers in Biology and Medicine. 142. article no. 105208
  12. Houssein, EH, Saber, E, Ali, AA, and Wazery, YM (2022). Centroid mutation-based search and rescue optimization algorithm for feature selection and classification. Expert Systems with Applications. 191. article no. 116235
  13. Arslan, O, and Karhan, M (2022). Effect of Hilbert-Huang transform on classification of PCG signals using machine learning. Journal of King Saud University-Computer and Information Sciences. 34, 9915-9925. https://doi.org/10.1016/j.jksuci.2021.12.019
  14. Singh, N, and Singh, P (2021). A hybrid ensemble-filter wrapper feature selection approach for medical data classification. Chemometrics and Intelligent Laboratory Systems. 217. article no. 104396
  15. Agarwalla, P, and Mukhopadhyay, S (2022). GENEmops: supervised feature selection from high dimensional biomedical dataset. Applied Soft Computing. 123. article no. 108963
  16. Maldonado, S, Vairetti, C, Fernandez, A, and Herrera, F (2022). FW-SMOTE: a feature-weighted oversampling approach for imbalanced classification. Pattern Recognition. 124. article no. 108511
  17. Dash, R, Dash, R, and Rautray, R (2022). An evolutionary framework based microarray gene selection and classification approach using binary shuffled frog leaping algorithm. Journal of King Saud University-Computer and Information Sciences. 34, 880-891. https://doi.org/10.1016/j.jksuci.2019.04.002
  18. Nazari, E, Aghemiri, M, Avan, A, Mehrabian, A, and Tabesh, H (2021). Machine learning approaches for classification of colorectal cancer with and without feature selection method on microarray data. Gene Reports. 25. article no. 101419
  19. Nosrati, V, and Rahmani, M (2022). An ensemble framework for microarray data classification based on feature subspace partitioning. Computers in Biology and Medicine. 148. article no. 105820

Peddarapu Ramakrishna has been a research scholar in the Department of Computer Science and Engineering at the Koneru Lakshmaiah Education Foundation (KL Deemed to be University), Vaddeswaram, Andhra Pradesh, India, since July 2019. He received his B.Tech. and M.Tech. degrees in Computer Science and Engineering from JNTU University (JNTUH), Hyderabad, in 2004 and 2010, respectively. His research interests include machine learning, artificial intelligence, and deep learning. Currently, he is conducting research in biotechnology. He is a member of the CSI and ISTE. He also works as an Assistant Professor in the Department of Computer Science and Engineering at VNR Vignana Jyothi Institute of Engineering and Technology, Bachupally, Hyderabad, Telangana. He has 18 years of teaching experience. He has published more than 15 research articles in ESCI and Scopus/international journals. E-mail: peddarapuramakrishna@gmail.com

Pothuraju Rajarajeswari completed her Ph.D. in Computer Science and Engineering and is currently working as a Professor at KL University, Guntur, Andhra Pradesh. She has 22 years of academic teaching experience. She has guided many M.Tech. and Ph.D. students. Her areas of interest include artificial intelligence, intelligent systems, data mining, machine learning, and bioinformatics. She has published more than 65 research articles in SCI/ESCI and Scopus/international journals. E-mail: rajilikhitha@gmail.com



Keywords: Microarray, Fuzzy expert system (FES), Particle swarm optimization, Bioinformatics, Feature selection

1. Introduction

Bioinformatics is an interdisciplinary field that encompasses statistics, biology, computer science, and management. It involves modeling and analyzing data related to biological sequences, genome content, and arrangement to predict macromolecule structures. With the advancement of genomic data, bioinformatics has gained prominence in biological and medical research and has received growing attention over the years. Bioinformatics combines information science and computational biology, which are closely interrelated. It draws on the work of computer scientists, engineers, statisticians, and mathematicians to support information management in healthcare.

Microarray data are widely regarded as effective tools for examining DNA expression in large-scale gene sequences. Microarray technology facilitates the interaction between gene expression and biology, enabling genome-wide analysis. It involves the parallel processing of gene profiles through a hybridization model. The evaluation of eDNA expression at the genome level allows for the potential application of microarrays in the biomedical community. Researchers have reported the use of microarray data in pharmaceutical drug development for selection, assessment, and quality control, as well as in disease diagnosis and monitoring to derive adverse effects from therapeutic interventions. Tumors can be classified using microarray technology based on molecular characteristics, providing an easier alternative to traditional diagnostic schemes.

Microarray techniques have been utilized in the study of carcinogenesis, drug discovery, and the subclassification of cancer cells. Accurate diagnosis of cancer has been demonstrated through the effective classification of microarray data. Gene expression profiles of cancer tissues have been used to assess cancer status using gene expression classifier profile databases. Microarray data are typically high-dimensional, which poses challenges in terms of noise measurement and data analysis due to differences in sample quantities. Moreover, the imbalance in label class samples further complicates the analysis. Reducing the dimensionality of microarray data using computer-based systems can mitigate these challenges. However, the analysis of high-dimensional, low-sample-size, and skewed data incurs higher computation costs and can result in poor learning and classification processes. Therefore, microarray data can significantly impact disease identification and estimation. The microarray classification process involves two steps: gene selection and the construction of a classification model to generate accurate predictions from the data.

Bioinformatics has emerged as a promising and innovative field with applications in various biological domains. In particular, the analysis of microarray data has gained significant attention in disease diagnosis based on gene expression patterns. However, processing and computing the large volumes of information contained in microarray data present substantial challenges. This study aims to address the need for an effective feature selection model for disease diagnosis using microarray datasets. The specific objectives of this study include constructing a feature selection model and evaluating its performance in accurately diagnosing diseases. The research gap lies in the development of a feature selection model capable of effectively handling the challenges associated with microarray data. While previous studies have explored various feature selection methods, there is a need for an approach that can optimize the selection process to improve disease diagnosis accuracy using microarray datasets. This study aims to fill this gap by proposing a novel approach, the mutual fuzzy swarm optimization (MFSO) model, and evaluating its performance against existing methods. By utilizing mutual information (MI) computation, fuzzy logic, and expert knowledge, the MFSO model aims to enhance classification accuracy and improve disease diagnosis performance. A simulation analysis using multiple microarray datasets is conducted to compare the proposed model with existing methods.

This study aims to address the research gap in effective feature selection for disease diagnosis using microarray datasets. The proposed MFSO model introduces novel approaches and techniques to improve accuracy and performance. Evaluating the model against existing methods will provide evidence of its effectiveness and contribute to the advancement of bioinformatics for disease diagnosis.

The contributions of this study are as follows:

  • Development of an effective feature selection model: This study proposes the MFSO model as a novel approach for feature selection in disease diagnosis using microarray datasets. The MFSO model integrates MI computation, fuzzy logic, and expert knowledge to optimize the selection process and improve the classification accuracy. By developing this model, this study contributes to the advancement of feature selection techniques specifically tailored for microarray data analysis.

  • Improved accuracy in disease diagnosis: The proposed MFSO model aims to enhance the accuracy of disease diagnosis using microarray datasets. Through the integration of MI, fuzzy logic, and expert knowledge, the model provides an effective approach for identifying relevant features and classifying samples. Achieving higher accuracy rates improves the reliability and effectiveness of disease diagnosis using microarray data.

  • Comparison with existing methods: This study conducts a thorough evaluation of the proposed MFSO model by comparing its performance with that of existing feature selection methods. By benchmarking the MFSO model against established approaches, this study provides evidence of its superiority and demonstrates its potential for practical applications in bioinformatics. The comparison identifies the strengths and advantages of the proposed model over existing methods, contributing to the existing body of knowledge.

  • Addressing the research gap: The study focuses on the need for an effective feature selection model for disease diagnosis using microarray datasets, addressing a specific research gap in bioinformatics. By proposing a novel approach, this study fills an important gap in the literature and offers a valuable contribution to bioinformatics research.

The study contributes to the field of bioinformatics by introducing a novel feature selection model, improving the accuracy of disease diagnosis, providing a comparative analysis with existing methods, and addressing research gaps. These contributions have implications for both the theoretical understanding and practical application of bioinformatics in disease diagnosis.

2. Related Works

Microarray technology is used to compute the genome probe, generating data points in a high-dimensional space based on the genome size. However, microarray data classification often involves large sample sizes.

In [12], an optimization model was presented for feature selection and classification of medical data. The developed model utilizes the centroid mutation-based Search and Rescue (cmSAR) optimization algorithm that incorporates a k-nearest neighbor (kNN) classifier for disease classification.

The cmSAR model selects features from an optimal group by estimating two classes: convergence and local searches. It employs a fuzzy logic system that considers multivalue fuzzy set logic for the mutation operations. Statistical results demonstrated the effectiveness of the metaheuristics-based optimization model for classification. The disease dataset used in the study comprised 15 disease data features extracted from the UCI dataset. Additionally, the cmSAR model uses the CEC-C06 2019 objective function and performs classification based on the Friedman and Bonferroni–Dunn tests. The examination results confirmed that the cmSAR model achieves improved performance for medical datasets.

In another study [13], phonocardiogram (PCG) signal analysis was conducted by decomposing the signal into intrinsic mode functions (IMFs) with the Hilbert-Huang transform. The analysis employed the mel-frequency cepstral coefficient for feature extraction and the Hilbert-Huang transform for signal classification. Classifiers such as kNN, multilayer perceptron, support vector machine (SVM), and deep neural network (DNN) were employed for classification. This PCG model uses multiple classification classes, including healthy, aortic stenosis, mitral stenosis, mitral regurgitation, and mitral valve prolapse, and employs a five-fold cross-validation approach. The DNN model demonstrated higher precision, recall, F1-score, and accuracy values of 98.9%, 98.7%, 98.8%, and 98.9%, respectively.

In [14], a four-step hybrid ensemble feature selection algorithm was examined using cross-validation. The deployed model utilizes a filter method with an ensemble-weighted score based on ranking features using a sequential forward selection model to create a wrapper-based optimal feature subset. The optimal subset features were sampled to perform the classification. An experimental analysis was performed on 20 benchmark datasets based on the dimensionalities. The hybrid approach exhibited a higher dataset dimensionality. The proposed hybrid approach employs classifiers such as naive Bayes, radial basis function, SVM, kNN, and random forest to perform classification. The experimental results demonstrated that the proposed methodology exhibits higher accuracy, specificity, sensitivity, AUC, and F1-score for the selected features. The results were competitive with state-of-the-art methods.

In [15], a framework for disease diagnosis and classification was constructed. The developed GENEmops model performs gene selection using a multiobjective player selection strategy-based hybrid population search (MOPS-HPS). The model uses a multifiltering adaptive parameter to perform gene selection. Using a wrapper-based scheme, a rotational blending operator is employed to transmute the binary search space through stochastic search phase estimation to perform classification. The effectiveness of the developed GENEmops model was evaluated on 16 biological datasets and its performance was compared to that of eight binary and eight multi-class models. GENEmops exhibited improved intelligent computational multiclass features with enhanced efficiency.

Neighborhood estimation based on the Euclidean distance was performed to evaluate minority classes. High-dimensional microarray features were evaluated based on the Euclidean distance settings, assuming equal weights for the different features. An effective SMOTE-weighted Minkowski distance computation model was developed for estimating samples from the minority class [16].

The model features were evaluated in a relevant classification task, performing attributes selection with a threshold value of weights. To assess the predictive performance of the model, an experimental analysis was conducted on the 42-imbalance classes. The analysis aimed to compare the performance with the traditional SMOTE approach in both low- and high-dimensional settings, while ensuring that complexity was not affected.

A binary shuffled frog leaping algorithm (BSFLA) meta-heuristics model is implemented in [17] for gene selection. The gene subset was computed based on optimal features, considering 20 subsets of genes extracted from the original dataset. The gene subset was implemented using the kNN classifier model and compared with an artificial neural network (ANN) and SVM. The comparative analysis showed improved performance through particle swarm optimization.

In [18], a relief attribute evaluation was deployed with the light gradient boosting machine (GBM) and DNN model for classifying non-cancer and cancer cells using 20 machine-learning techniques. The analysis was performed on 111 patients with 22,278 features using Python and Weka software. The evaluation metrics used were accuracy, precision, recall, and AUC, utilizing decision trees, naive Bayes, AdaBoost, quantitative descriptive analysis, and DNN models. The classification network demonstrated exceptional performance by achieving 100% accuracy on the microarray dataset, effectively distinguishing between non-cancerous and cancerous cells.

A framework for feature selection was developed in [11] to increase feature discrimination and stability. Features were selected using aggregation strategies within the sampled dataset. These strategies were combined with a multiclass model to achieve classification accuracy. The stability features were examined using binary and multiclass metrics. The simulation results indicated that this method has a higher stability score compared to other classifiers.

In [19], a feature-level ensemble model was constructed for microarray classification using feature subspace partitioning (referred to as fuzzy logic and adaptive evolutionary optimizer-based feature selection for optimization of fuzzy inference systems). The ensemble classification model consists of three steps: generating subsets through partitioning, feature selection, and aggregation of the ranking function. The developed framework model uses a microarray dataset, considering parameters such as accuracy, stability, and runtime.

3. Proposed Methodology

The proposed methodology introduces the MFSO model to estimate MI for optimizing variables in microarray datasets. The MFSO model computes the optimization of microarray variables to extract features from the dataset.

(1) Fuzzy logic: Fuzzy logic is a mathematical framework that addresses uncertainty and imprecision by allowing the representation and manipulation of vague or fuzzy concepts. It extends classical binary logic by introducing the concept of partial truth, where membership functions (MFs) quantify the degree to which an element belongs to a particular set. In this study, fuzzy logic is utilized to estimate the membership values of features in the microarray dataset using if-then rules.

The proposed MFSO model incorporates fuzzy logic by iteratively estimating the membership values for unclassified instances in the microarray sample. By utilizing if-then rules and membership estimation, the model integrates human expert knowledge and fuzzy reasoning to improve classification accuracy and enhance disease diagnosis performance.
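To make the idea of partial membership and if-then rules concrete, the following minimal Python sketch fuzzifies a normalized gene expression level with triangular MFs and evaluates the firing strength of one rule. The triangular shape, the three linguistic labels, their breakpoints, and the use of the minimum operator for AND are all assumptions chosen for illustration; the source does not specify the MF shape used by the MFSO model.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge


# Hypothetical linguistic labels for an expression level normalized to [0, 1].
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x):
    """Map a crisp value to its membership degree in every linguistic label."""
    return {name: triangular(x, *abc) for name, abc in LABELS.items()}


def rule_strength(memberships_a, label_a, memberships_b, label_b):
    """Firing strength of: IF gene_a is label_a AND gene_b is label_b THEN class.

    AND is modeled with the minimum operator (an assumption).
    """
    return min(memberships_a[label_a], memberships_b[label_b])


gene_a = fuzzify(0.8)  # mostly "high", partly "medium"
gene_b = fuzzify(0.3)  # partly "low", mostly "medium"
strength = rule_strength(gene_a, "high", gene_b, "low")
print(gene_a, gene_b, strength)
```

A value such as 0.8 belongs to "medium" and "high" at the same time, each to a different degree, which is the partial truth that distinguishes fuzzy sets from crisp thresholds.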

(2) Particle swarm optimization (PSO): PSO is a metaheuristic optimization algorithm inspired by the behaviors of bird flocks and fish schools. It is used to solve optimization problems by simulating the social behavior of particles in a multi-dimensional search space. Each particle represents a potential solution and its movements are influenced by its own best position (pbest) and the global best position (gbest) found by the swarm.

In the proposed MFSO model, PSO is employed to extract microarray sample features based on the derived MI. The swarm optimization movement is guided by an objective function that aims to optimize the feature selection process. By leveraging the PSO, the model explores the search space and identifies the most relevant and informative features for disease diagnosis in microarray datasets.
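The role of pbest and gbest in the particle movement can be sketched with a canonical PSO loop. This is a generic textbook-style sketch, not the authors' implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the bounds, and the `sphere` stand-in objective (which is minimized) are all assumptions, whereas the MFSO objective is driven by MI and classification performance.

```python
import random


def pso(objective, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a canonical PSO; returns (gbest, value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Stand-in objective (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val, history = pso(sphere, dim=5)
print(best_val)
```

For feature selection, each dimension would typically encode whether a gene is kept, for example by thresholding the position, and the objective would score the resulting subset; that encoding is not specified in this section and is left out of the sketch.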

Fuzzy logic and PSO are selected for this study because of their ability to handle uncertainty, optimize feature selection, and integrate expert knowledge. By incorporating these methodologies into the proposed MFSO model, this study aims to leverage their strengths and benefits to enhance the accuracy and effectiveness of disease diagnosis using microarray datasets.

3.1 Mutual Information

MI is a measure that captures the mutual dependence between two variables. The proposed MFSO model treats the data as a stochastic system with an input variable X and an output variable Y. X is a discrete variable with Nx possible values, and Y is a discrete variable with Ny possible values. The initial uncertainty about Y, expressed as the entropy H(Y), is defined by Eq. (1):

$$H(Y) = -\sum_{j=1}^{N_y} P(y_j)\,\log P(y_j). \tag{1}$$

Here, the probability value of Y is denoted as P(yj). The uncertainty of the system output Y is evaluated based on input X using the conditional entropy value H(Y | X), defined by Eq. (2):

$$H(Y \mid X) = -\sum_{i=1}^{N_x} P(x_i) \left[ \sum_{j=1}^{N_y} P(y_j \mid x_i)\,\log P(y_j \mid x_i) \right]. \tag{2}$$

In Eq. (2), P(xi) denotes the probability of input value xi, and P(yj | xi) denotes the conditional probability of output value yj given input value xi. The indices i and j run over the possible values of X and Y, respectively. The difference H(Y) − H(Y | X) is the reduction in the uncertainty of the system output obtained by observing the input. The MI between random variables X and Y is denoted as I(Y; X) and is defined by Eq. (3):

$$I(Y; X) = H(Y) - H(Y \mid X). \tag{3}$$

MI therefore measures the average reduction in uncertainty about the random variable Y that results from observing X. Gene selection is performed by taking the class label of each sample as Y and the expression of a gene as the related variable X. The process of MI computation in the MFSO is presented in Figure 1.
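Eqs. (1) to (3) can be illustrated with a short Python sketch for discrete sequences. This is a minimal example rather than the authors' implementation: the function names, the toy data, and the choice of base-2 logarithms (the source does not state a base) are assumptions made here for illustration.

```python
import math
from collections import Counter

def entropy(y):
    """H(Y) = -sum_j P(y_j) log2 P(y_j), following Eq. (1)."""
    n = len(y)
    return -sum((c / n) * math.log2(c / n) for c in Counter(y).values())

def conditional_entropy(x, y):
    """H(Y|X) = sum_i P(x_i) * H(Y | X = x_i), which is Eq. (2) regrouped."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum((len(ys) / n) * entropy(ys) for ys in groups.values())

def mutual_information(x, y):
    """I(Y;X) = H(Y) - H(Y|X), following Eq. (3)."""
    return entropy(y) - conditional_entropy(x, y)

# Toy data (hypothetical): discretized expression of one gene and class labels.
label = [0, 0, 1, 1]
mi_perfect = mutual_information(["low", "low", "high", "high"], label)
mi_none = mutual_information(["low", "high", "low", "high"], label)
print(mi_perfect, mi_none)
```

In the first case the gene value determines the class, so the MI equals the full entropy of the label; in the second the gene carries no information about the class and the MI is zero.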

The steps of the MFSO are as follows:

  • 1. Compute the gene expression sample 'S'.

  • 2. Evaluate each input gene in 'S' as follows:

    • a. Estimate the initial entropy value using Eq. (1).

    • b. Compute the conditional entropy using Eq. (2).

    • c. Evaluate the MI for the gene expression using Eq. (3).

  • 3. Store the genes and sort them by their estimated MI values.

  • 4. For every gene 1, 2, ..., n:

    • a. Partition the sample population 'S' into training and testing sequences Ta and Tb, respectively.

    • b. Use the optimization model to measure the fuzzy set values for the input gene on the training and testing sequences Ta and Tb.

    • c. Classify the samples based on the testing sample features Te; unclassified samples are represented as Tf.

  • For every unclassified sample Te, Steps 2–4 are repeated until the condition for achieving optimal classification accuracy is met. Here, the unclassified sample value is denoted as α, starting with 1.
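Steps 2 and 3 above (scoring each gene by MI and sorting) can be sketched as follows. The median-split discretization, the function names, and the toy sample are assumptions introduced for illustration; the source does not specify how continuous expression values are binned, and the fuzzy classification loop of Step 4 is not covered here.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(Y;X) in bits, computed from joint and marginal counts of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def discretize(values):
    """Median split into 'low'/'high' (an assumed, illustrative binning)."""
    s = sorted(values)
    median = (s[(len(s) - 1) // 2] + s[len(s) // 2]) / 2
    return ["high" if v > median else "low" for v in values]

def rank_genes(expression, labels):
    """Score every gene by MI with the class label and sort in descending order."""
    scores = {gene: mutual_information(discretize(values), labels)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample S: three genes measured on six samples from two classes.
S = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # separates the two classes
    "gene_b": [0.5, 0.1, 0.9, 0.4, 0.8, 0.2],  # weakly related to the class
    "gene_c": [0.1, 0.9, 0.2, 1.0, 0.3, 1.1],  # weakly related to the class
}
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(S, labels)
print(ranking)
```

The gene whose discretized expression matches the class split is ranked first; the top of this list would then be passed on to the fuzzy classification stage.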

3.2 MFSO Fuzzy Expert System

The gene expression elements of the microarray data are processed by the fuzzy expert system (FES), whose architecture is illustrated in Figure 2. In the FES, the membership functions (MFs) are aggregated using if-then rules to derive decisions regarding the data. The predicted class is indicated by discrete sample labels through a nonlinear mapping from the input space to the output space.

The FES model is built from linguistic if-then rules of the general form “IF A THEN B,” where the propositions A and B are the premise and the consequence of the rule, respectively. The uncertainty between the variables is evaluated based on the linguistic variables and the fuzzy set values in the if-then rules. The fuzzy logic (FL) model is crucial for accumulating and summarizing data to derive decision-relevant information. A fuzzy set F over the universe of discourse Z is presented in Eq. (4):

$$F = \{(z, \mu_A(z)) \mid z \in Z\}. \tag{4}$$

In Eq. (4), μA(z) is the MF that gives the degree of membership of z in F. The FES uses the Mamdani inference system to match the gene inputs against the if-then rules of the fuzzy process. Fuzzy weights are computed based on the degree of each rule in the fuzzy inference system. The output class is aggregated through the specified fuzzy rules, and the output confidence is the input value for the determined class. Using the PSO algorithm, the if-then rule set is evaluated for the MF. Gene selection is evaluated using the FES, which is significant for microarray datasets with high dimensionality. The optimization model considers MI for estimating the variables during gene selection. The MF of a variable is computed using the if-then rule based on the Boolean logic function to estimate the true or false values in the data samples. The generalized rule form of the proposed MFSO model is given by Eq. (5):

$$\text{Rule}_j:\ \text{if } x_{p1} == A_{j1} \text{ and } \ldots \text{ and } x_{pn} == A_{jn} \text{ then class} = C_j. \tag{5}$$

Here, the gene input values are represented by xp1, ..., xpn, and the antecedent fuzzy sets are denoted by Aj1, ..., Ajn. The output class label is defined as Cj in a quantitative rule-based model with if-then rules. The fuzzy relationship between the variables is computed using the if-then rules for the input and output variables.

The relationships between the variables are computed using the Cartesian product universal set model. The universal set X is evaluated with the fuzzy set A with ordered pairs based on Eq. (6):

$$A = \{(x, \mu_A(x)) \mid x \in X\}. \tag{6}$$

In this case, μA(x) denotes the degree of membership of the input variable x in A. It is estimated using trapezoidal and triangular functions, which the FES employs as the MFs of the variables.
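The triangular and trapezoidal MFs and the rule form of Eq. (5) can be illustrated with a small Python sketch. The breakpoints of the linguistic labels, the two rules, and the helper names are hypothetical, and the sketch covers only rule matching with the minimum operator for "and" (the usual choice in Mamdani-style inference), not the authors' full inference and tuning procedure.

```python
def triangular(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical linguistic labels for an expression value normalized to [0, 1].
LABELS = {
    "low": lambda x: trapezoidal(x, -0.1, 0.0, 0.2, 0.5),
    "medium": lambda x: triangular(x, 0.2, 0.5, 0.8),
    "high": lambda x: trapezoidal(x, 0.5, 0.8, 1.0, 1.1),
}

def rule_strength(inputs, antecedents):
    """Firing strength of 'if x_p1 is A_j1 and ... and x_pn is A_jn', using min for 'and'."""
    return min(LABELS[label](x) for x, label in zip(inputs, antecedents))

# Two hypothetical rules over two genes: (antecedent labels, class label C_j).
rules = [(("low", "medium"), "class_0"), (("high", "medium"), "class_1")]
sample = (0.9, 0.5)
strengths = {cls: rule_strength(sample, ante) for ante, cls in rules}
predicted = max(strengths, key=strengths.get)
print(strengths, predicted)
```

For this sample the first gene is fully "high" and not at all "low", so only the second rule fires and the sample is assigned to its class.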

3.3 MFSO PSO model

The PSO component of the MFSO model treats candidate solutions as food sources and uses a random search across its phases to decide which food sources are selected. The PSO model used to select features from the gene data is illustrated in Figure 3.

3.3.1 Data representation

The MFSO model computes the optimal rule-set model with PSO. The estimation of the variable is based on the optimal rule set with combinational optimization features in the genes. Particle swarm features are computed based on the tuned MF for the allocated points uniformly in the gene expression range. Tuning with the membership model is evaluated using continuous optimization features in the dataset. The optimal variables eliminate the overall relationship between gene features, as presented in Figure 4.

The PSO model initializes the parameter values by placing the particles within the permissible range. The swarm within this range is evaluated and processed along the constructed path. The path explored by the optimization model is evaluated as a combination of discrete values, from which the decision values are derived using the objective function. If the objective criterion is satisfied, the optimal path is selected, and the variables are recomputed with an updated pheromone and permissible range in the next iteration. In each iteration, a random particle swarm is evaluated to derive the optimum solution.

The particle swarm between variables is used to compute the shortest optimal path for the computation of the edges to estimate the optimal path. To minimize the interaction between the swarms in the region, the movement values are based on global and local functions to derive the optimal path in the region. The flow of the PSO model is presented in Figure 5.

The MFSO model is computed based on the initial variable matrix to obtain the optimal solution for gene features. The gene features considered in this model are blood cells, pancreatic islets, liver cells, and neurons. The initial random solution space primary matrix is computed using Eq. (7):

$$P = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}. \tag{7}$$

In Eq. (7), the features are represented as Xi, where i = 1, 2, ..., N indexes the different feature variables. The evaluation process takes into account the maximum number of cells, the population size in each iteration, and the cost associated with each cell. Different function criteria are considered to determine the target problem. The MFSO model consists of global and local cells that store cost values as memory. The self-renewal process involves considering different feature-area models. At this stage, all particle information is shared among cells based on the cost value associated with the selected genes or features for gene selection.

The PSO model demonstrated enhanced estimation of the swarm path movement through the utilization of a self-estimation model derived using Eq. (8):

$$SW_{\text{Optimum}}(t+1) = S_r \times SC_{\text{Optimum}}(t). \tag{8}$$

MFSO is performed using the iteration value denoted as t to estimate the best or optimal solution in every iteration process. The random number is represented as Sr, and it lies between 0 and 1. The current iteration value required to achieve the optimal solution for the variable is denoted as SWOptimum(t+1). The process continues until the optimal swarm value is derived to achieve the best result using the objective function to compute the fitness value, as presented in Eq. (9):

$$x_{ij}(t+1) = \mu_{ij} + \phi\,(\mu_{ij}(t) - \mu_{kj}(t)). \tag{9}$$

Based on the PSO algorithm, a new swarm movement is computed using Eq. (9). Here, μij denotes dimension j of the ith feature solution, t is the iteration number, μkj is dimension j of a randomly chosen solution k, and ϕ is a random variable in the range of −1 to 1.
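A single application of Eq. (9) can be sketched in Python as follows. The function name, the toy swarm, and the rule that the partner k differs from i are assumptions added here; the source only states that μkj and ϕ are random.

```python
import random

def move_particle(mu, i, j, rng):
    """Eq. (9): x_ij(t+1) = mu_ij + phi * (mu_ij - mu_kj) for one dimension j."""
    # Assumption: the partner solution k is drawn uniformly from the other particles.
    k = rng.choice([idx for idx in range(len(mu)) if idx != i])
    phi = rng.uniform(-1.0, 1.0)  # random step factor in [-1, 1]
    candidate = list(mu[i])       # copy so the current swarm is left unchanged
    candidate[j] = mu[i][j] + phi * (mu[i][j] - mu[k][j])
    return candidate

rng = random.Random(0)            # fixed seed for a repeatable illustration
swarm = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
new_position = move_particle(swarm, i=0, j=1, rng=rng)
print(new_position)
```

Only the chosen dimension changes, and the size of the step is bounded by the distance between the particle and its randomly selected partner in that dimension.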

The swarm path ω random variable is in the range of 0–1. The selected optimal best path is related to path estimation to derive the optimal solution. The optimal solution value is computed using Eq. (10):

$$P_i = \frac{SC_{\text{Optimum}}}{\sum_{i=1}^{N} SC_i}. \tag{10}$$

In Eq. (10), the features for comparison are denoted as Pi, the cost of the features is represented as SCi, and the final cost of the population is defined as N. These values are used to determine the best solution through the implementation of the PSO algorithm. The algorithm employs different parameter features to establish the control parameters for PSO, which are initialized as variables. The evaluation of the PSO model is based on several factors, including population size (P), dimension (D), maximum iteration (I), trials (T), and maximum number of genes (N). Initially, a random population is generated, and these variables are estimated based on the initial population. The features in the microarray dataset are evaluated using Eq. (11):

$$S_{fit}(t) = \begin{cases} S_i, & S_i \ge 0, \\ 1 + \operatorname{abs}(S_i), & S_i < 0. \end{cases} \tag{11}$$

Therefore, the maximal fitness function Sfit(t) is evaluated using an objective algorithm minimization process. Si represents a parameter or variable that influences the fitness score at time t. The fitness score is calculated based on the given conditions to assign appropriate values to Sfit(t). At the same time instances, the optimal solution is computed to estimate the constraints. The fitness value of the gene is evaluated using the best value in the gene, which is derived using Eq. (12):

$$SC_{\text{Optimum}}(t+1) = S_r * SC_{\text{Optimum}}(t). \tag{12}$$

The feature computation is performed using the greedy selection model, considering both old and new fitness values. The relative features of the variables are evaluated with global and local feature estimations using a fuzzy set model. Through feature estimation, the optimal cost for the features is determined to improve processing speed. The termination condition is set to achieve the maximum number of iterations, allowing the evaluation of the output and derivation of the best solution features through the iteration process. The pseudocode of the proposed MFSO model is given in Algorithm 1.
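The piecewise fitness of Eq. (11) and the greedy comparison of old and new fitness values can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that the first branch of Eq. (11) applies when Si ≥ 0, that a larger fitness is better, and that ties keep the old solution. The `score` callable is a hypothetical stand-in for the objective that produces Si.

```python
def fitness(s_i):
    """Piecewise fitness as read from Eq. (11): S_i if S_i >= 0, else 1 + |S_i|."""
    return s_i if s_i >= 0 else 1.0 + abs(s_i)

def greedy_select(old, new, score):
    """Keep the candidate with the larger fitness; ties keep the old solution."""
    return new if fitness(score(new)) > fitness(score(old)) else old

# Hypothetical objective: the score of a solution is its first component.
score = lambda solution: solution[0]

kept_new = greedy_select([0.2], [0.9], score)  # new fitness 0.9 > old fitness 0.2
kept_old = greedy_select([0.9], [0.2], score)  # new fitness 0.2 < old fitness 0.9
print(fitness(2.0), fitness(-0.5), kept_new, kept_old)
```

Under these assumptions, a candidate produced by the swarm movement replaces the stored solution only when it improves the fitness.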

4. Dataset

The MFSO model utilizes a fuzzy-set model to classify a microarray dataset consisting of five different datasets: type 2 diabetes (T2D), colon cancer, prostate cancer, insulin sensitive resistance (ISR), and leukemia. The dataset comprises a two-class gene expression profile that is publicly accessible. Table 1 provides an overview of the microarray dataset used for feature selection.

5. Results and Discussion

The results of the proposed MFSO model are obtained through 30 independent trials to compute unique values and estimate the control parameters and convergence values. The PSO model feature is estimated based on the optimization function that considers swarm movement, as shown in Table 2.

The MFSO model utilizes the fuzzy set rule to generate a specific set of rules that eliminate irrelevant genes when deriving outcome measures. The convergence learning values for the proposed MFSO model are listed in Table 3.

The performance of the proposed MFSO model is compared against conventional optimization models in the task of feature extraction. The optimization models considered for the analysis are the binary coded genetic algorithm (BCGA), genetic swarm optimization (GSA), real coded genetic algorithm (RCGA), and PSO. The proposed MFSO model increased the fitness function value for iteration values between 40 and 70. The results indicate that the optimal number of iterations for the microarray dataset is 140. The average and worst fitness values computed for the label features of the microarray dataset increase significantly, as presented in Table 4.

The gene id (GID) generates the MF estimation, as presented in Table 4. The execution MF increases with the simultaneous estimation of the fine-tuned gene number and the MF through the mapped linguistic label in the self-evident variable’s computation features. The MFSO model uses a fuzzy set rule assessment scheme to interpret the feature set in a microarray dataset. The sample microarray dataset features are used to estimate the optimal rule set value based on the MFSO model, as indicated in Table 5.

Among the provided rule datasets, T2D Rule set 3 demonstrates the highest accuracy, which can be attributed to several factors.

  • The value of Ncorrect: T2D Rule set 3 exhibits a relatively high number of correctly classified instances (Ncorrect = 13). This indicates the effectiveness of the rules in accurately predicting the target variable or classifying instances.

  • Value of coverage: Although T2D Rule set 3 has a lower coverage value (coverage = 47.32), it focuses on specific patterns or conditions in the data, resulting in higher accuracy. By targeting these specific instances or patterns, this rule set achieves improved accuracy in classification.

  • Rule set complexity: T2D Rule set 3 may have a higher complexity than other rule sets, either in terms of the number of rules or the complexity of individual rules. This increased complexity enables a more fine-grained and accurate classification of instances, leading to higher overall accuracy.

  • Training data characteristics: The training data used to generate T2D Rule set 3 possesses distinct patterns or characteristics that are captured well by the rules in this particular rule set. These patterns are likely indicative of the target variable, contributing to a higher accuracy in prediction.

Notably, the accuracy of a rule set can vary depending on the specific dataset and problem domain. The aforementioned factors collectively contribute to the higher accuracy observed in T2D Rule set 3 within the given dataset.

Additionally, the interpretability variables of the proposed MFSO model are computed for the estimated features. The MFSO model utilizes estimated microarray feature selection, which is comparatively presented in Table 6.

A comparative analysis is conducted to assess the convergence values of the proposed MFSO model in comparison to existing GA and HCA models. The results indicate that the proposed MFSO model significantly outperforms the conventional GA and HCA models. The generated fuzzy set model enabled the identification of the minimal number of genes required. The results indicate that the proposed MFSO model achieves higher interpretability than the GA and HCA models. The generalized computations of the dataset features across different datasets are presented in Table 7.

Furthermore, the comparative analysis of the generalization ability reveals that the MFSO model exhibits a minimal number of incorrectly classified samples for the dataset. Thus, the proposed MFSO model significantly enhances the performance of existing methods for disease diagnosis. Table 8 provides a comparison of the sensitivity and specificity of the MFSO model.

The comparative analysis of specificity and sensitivity values for different datasets using the proposed MFSO model demonstrates higher values compared to existing classification models (Figure 6). Additionally, the analysis of false-positive and false-negative values shows significantly lower values compared to existing classifier models. Table 9 presents the mean coverage values for the different datasets.
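The four measures compared here can be derived from the confusion counts of a two-class classifier. The sketch below assumes the standard rate definitions and that both classes occur in the true labels; the source does not state the exact formulas behind its reported values, so the function name, the dictionary keys, and the toy labels are illustrative only.

```python
def diagnostic_rates(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and false-positive/false-negative rates for two classes."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),          # true-positive rate
        "specificity": tn / (tn + fp),          # true-negative rate
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical predictions for eight samples (1 = diseased, 0 = healthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
rates = diagnostic_rates(y_true, y_pred)
print(rates)
```

With three of four positives and three of four negatives classified correctly, sensitivity and specificity are both 0.75 and the two error rates are both 0.25.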

The mean coverage values for different datasets indicate that the MFSO model achieves a higher mean coverage value. Specifically, for the T2D dataset, the mean coverage is estimated to be 35, which is substantially higher than that of the existing model. Table 10 provides a comparative estimation of variables with different features. The MFs estimated using the proposed MFSO model are listed in Table 11.

The results reveal that the proposed MFSO model achieves more variables compared to the features of the conventional classifier model. The feature selection estimation with the proposed MFSO model is effective, and the computation of MFs for the features yields higher values compared to the existing model.

Several related studies have proposed different feature-selection and classification models. For instance, Houssein et al. [12] presented an optimization model for medical data feature selection and classification using a centroid mutation model with an integrated search and rescue optimization algorithm. Arslan and Karhan [13] focused on classifying PCG signals using IMF decomposition with the Hilbert-Huang transform and employed various classification techniques. Singh and Singh [14] proposed a hybrid ensemble feature selection algorithm that combines the filter method with ensemble weighted scores and a wrapper-based optimal feature subset. Agarwalla and Mukhopadhyay [15] constructed a framework for disease diagnosis and classification using a hybrid population search based on a multi-objective player selection strategy. Maldonado et al. [16] performed neighborhood estimation based on the Euclidean distance to evaluate minority classes. Finally, Dash et al. [17] implemented the BSFLA metaheuristic algorithm for optimizing the feature selection process.

6. Conclusion

Bioinformatics provides microarray data for processing and detecting various diseases. This study introduces an MI-based feature selection model for microarray data. The proposed MFSO model uses a fuzzy-set model to determine the feature set using if-then rules. The MFSO model uses MF estimation with swarm-based objective function estimation. The proposed MFSO model exhibits improved performance compared to the existing model, with a classification accuracy of 99%.

Figure 1. Flow diagram showing MI computation in the MFSO.

The International Journal of Fuzzy Logic and Intelligent Systems 2023; 23: 117-129. https://doi.org/10.5391/IJFIS.2023.23.2.117

Figure 2. Design of the fuzzy system.


Figure 3. Flow chart of the PSO model.


Figure 4. Representation of solution variable in the hybrid ant stem algorithm.


Figure 5. Flow chart of the PSO.


Figure 6. Comparison of (a) sensitivity, (b) specificity, (c) false positive, and (d) false negative.


Algorithm 1. MFSO feature selection in microarray.

Begin
  for i = 1 ... I do:
    Compute the fitness value of swarm optimization P using an objective function
    Estimate the fitness function based on the swarm objective function
    Repeat the fuzzy set function F_{i−1} == F_i
    Calculate the swarm movement in the objects:
      P_i = SC_Optimum / Σ_{i=1}^{N} SC_i
    Estimate the fitness value of the particle swarm function
      SC_Optimum(t+1) = P_i * SC_Optimum(t)
    Compare the optimal value
    Perform the differentiation process for the MF
      x_ij(t+1) = μ_ij + ϕ(μ_ij(t) − μ_kj(t)) + ω(μ_mi(t) − μ_ij(t)), ω = μ_kj / μ_ij
  End for
  return
End

Table 1. Dataset features.

DatasetSample countGeneLabelsClass samples
ISR116,783IS55
IR55

T2D3321,793DM216
NGT16

Colon cancer591,974Tumor38
Normal23

Leukemia696,943ALL43
AML29

Prostate10611,678Tumor49
Normal53

IS, insulin sensitivity; IR, insulin resistance; DM, diabetes mellitus; NGT, normoglycaemia; ALL, acute lymphocytic leukemia; AML, acute myeloid leukemia.


Table 2. Estimation of features using particle swarm optimization.

Parameters | Value
Delay factor (ρ) | 0.5
Number of swarm | 75
Minimal value (τ0) | 0.02
Error factor (σ) | 1
Csize | 27

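Table 3. Interpretability of the MFSO based on rules.

Dataset | Rule set | Ncorrect | Ncovers | Coverage | Accuracy (%)
ISR | Rule 1 | 5 | 6 | 78 | 94
ISR | Rule 2 | 6 | 8 | 91 | 90
T2D | Rule 1 | 14 | 18 | 83 | 91
T2D | Rule 2 | 9 | 11 | 86 | 89
T2D | Rule 3 | 4 | 17 | 88 | 93
T2D | Rule 4 | 5 | 9 | 72 | 93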

Table 4. Gene rule set with the MFSO.

Dataset | Gene ID | #MF | Labels
T2D | NM_021131.1 | 1 | medium
T2D | NM_0223491.1 | 2 | low, medium
T2D | AW291218 | 1 | low
T2D | BC000229.1 | 1 | low
T2D | AL523575 | 1 | high
T2D | NM_005260.2 | 1 | high
ISR | D32129_f_at | 2 | low, medium
ISR | Z83805_at | 1 | medium
ISR | Y09615_at | 2 | low, high
ISR | X07730_at | 2 | medium, high

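Table 5. Rule set with the MFSO.

Dataset | Rule set | Value of Ncovers | Value of Ncorrect | Value of coverage | Accuracy of rule (%)
ISR | 1 | 8 | 6 | 77 | 94.56
ISR | 2 | 10 | 8 | 90 | 98.73
T2D | 1 | 21 | 17 | 74.63 | 97.64
T2D | 2 | 17 | 11 | 69.84 | 98.74
T2D | 3 | 21 | 13 | 47.32 | 99.04
T2D | 4 | 9 | 9 | 38.94 | 98.74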

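Table 6. Comparison of interpretability based on datasets.

Dataset | Mean coverage (GA) | Mean coverage (HCA) | Mean coverage (MFSO) | Number of variables (GA) | Number of variables (HCA) | Number of variables (MFSO) | Membership function (GA) | Membership function (HCA) | Membership function (MFSO)
T2D | 21.3 | 25.83 | 36.85 | 2.9 | 3.98 | 16.84 | 0.45 | 3.78 | 22.74
ISR | 25.82 | 28.93 | 39.43 | 3.2 | 4.94 | 19.32 | 0.83 | 3.74 | 29.83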

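Table 7. Generalization ability.

Dataset | Approach | Correct (LOOCV) | Incorrect (LOOCV) | Unclassified (LOOCV)
ISR | GA | 84.3 | 5.4 | 10.3
ISR | PSO | 87.8 | 5.9 | 6.3
ISR | HCA | 92.4 | 4.3 | 3.3
ISR | MFSO | 98.2 | 0.8 | 0.8
T2D | GA | 44.11 | 29.41 | 26.48
T2D | PSO | 85.3 | 8.82 | 5.88
T2D | HCA | 94.11 | 2.8 | 3.09
T2D | MFSO | 98.4 | 0.7 | 1.1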

Table 8. Comparison of classification performance.

Dataset | Approach | # IG | Sensitivity | Specificity | False positive | False negative
ISR | MI-PSO | 9 | 0.985 | 0.835 | 0.038 | 0.068
ISR | MI-HCA | 5 | 0.993 | 0.797 | 0.184 | 0.296
ISR | MI-MFSO | 5 | 0.986 | 0.825 | 0.059 | 0.083
T2D | MI-PSO | 10 | 0.983 | 0.873 | 0.048 | 0.63
T2D | MI-HCA | 6 | 0.993 | 0.926 | 0.041 | 0.783
T2D | MI-MFSO | 6 | 0.997 | 0.948 | 0.028 | 0.73

Table 9. Comparison of mean coverage.

Mean coverage (μC) by method:

Dataset | BCGA | PSO | GSA | HCA | MFSO
T2D | 26.4 | 23.6 | 24.6 | 23.73 | 36.82
ISR | 23.7 | 27.2 | 27.3 | 28.42 | 41.75
Colon | 34.67 | 37.93 | 34.9 | 31.35 | 40.94
Leukemia | 32.82 | 39.1 | 37.1 | 33.94 | 44.88
Prostate | 28.63 | 26.4 | 28.4 | 36.72 | 53.94

Table 10. Comparison of variables.

Number of variables (#V) by method:

Dataset | BCGA | PSO | GSA | HCA | MFSO
T2D | 4.2 | 3.8 | 4.3 | 5.38 | 14.63
ISR | 3.8 | 3.9 | 4.9 | 5.16 | 13.94
Colon | 4.2 | 5.3 | 5.1 | 5.53 | 10.94
Leukemia | 4.9 | 5.2 | 5.9 | 5.49 | 9.84
Prostate | 5.1 | 4.1 | 5.6 | 5.28 | 10.35

Table 11. Comparison of membership function.

Mean number of membership functions (μMF) by method:

Dataset | BCGA | PSO | GSA | HCA | MFSO
T2D | 0.11 | 0.9 | 2.3 | 2.13 | 4.08
ISR | 0.6 | 1.3 | 2.2 | 2.28 | 3.98
Colon | 0.1 | 1.1 | 2.1 | 2.23 | 3.45
Leukemia | 0.2 | 1.3 | 2.1 | 2.46 | 3.06
Prostate | 0.4 | 1.2 | 2.3 | 2.69 | 4.67

References

  1. Mabarti, I (2020). Implementation of minimum redundancy maximum relevance (MRMR) and genetic algorithm (GA) for microarray data classification with C4.5 decision tree. Journal of Data Science and Its Applications. 3, 38-47. https://doi.org/10.34818/jdsa.2020.3.37
  2. Dash, R (2021). An adaptive harmony search approach for gene selection and classification of high dimensional medical data. Journal of King Saud University-Computer and Information Sciences. 33, 195-207. https://doi.org/10.1016/j.jksuci.2018.02.013
  3. Haznedar, B, Arslan, MT, and Kalinli, A (2021). Optimizing ANFIS using simulated annealing algorithm for classification of microarray gene expression cancer data. Medical & Biological Engineering & Computing. 59, 497-509. https://doi.org/10.1007/s11517-021-02331-z
  4. Tabares-Soto, R, Orozco-Arias, S, Romero-Cano, V, Bucheli, VS, Rodríguez-Sotelo, JL, and Jimenez-Varon, CF (2020). A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data. PeerJ Computer Science. 6, article no. e270
  5. Kourou, K, Rigas, G, Papaloukas, C, Mitsis, M, and Fotiadis, DI (2020). Cancer classification from time series microarray data through regulatory dynamic Bayesian networks. Computers in Biology and Medicine. 116, article no. 103577
  6. Rostami, M, Forouzandeh, S, Berahmand, K, Soltani, M, Shahsavari, M, and Oussalah, M (2022). Gene selection for microarray data classification via multi-objective graph theoretic-based method. Artificial Intelligence in Medicine. 123, article no. 102228
  7. Houssein, EH, Abdelminaam, DS, Hassan, HN, Al-Sayed, MM, and Nabil, E (2021). A hybrid barnacles mating optimizer algorithm with support vector machines for gene selection of microarray cancer classification. IEEE Access. 9, 64895-64905. https://doi.org/10.1109/ACCESS.2021.3075942
  8. Cathryn, RH, Kumar, SU, Younes, S, Zayed, H, and Doss, CGP (2022). A review of bioinformatics tools and web servers in different microarray platforms used in cancer research. Advances in Protein Chemistry and Structural Biology. 131, 85-164. https://doi.org/10.1016/bs.apcsb.2022.05.002
  9. Alomari, OA, Makhadmeh, SN, Al-Betar, MA, Alyasseri, ZAA, Doush, IA, Abasi, AK, Awadallah, MA, and Zitar, RA (2021). Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowledge-Based Systems. 223, article no. 107034
  10. Alhenawi, EA, Al-Sayyed, R, Hudaib, A, and Mirjalili, S (2022). Feature selection methods on gene expression microarray data for cancer classification: a systematic review. Computers in Biology and Medicine. 140, article no. 105051
  11. Wang, A, Liu, H, Yang, J, and Chen, G (2022). Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data. Computers in Biology and Medicine. 142, article no. 105208
  12. Houssein, EH, Saber, E, Ali, AA, and ad Wazery, YM (2022). Centroid mutation-based search and rescue optimization algorithm for feature selection and classification. Expert Systems with Applications. 191, article no. 116235
  13. Arslan, O, and Karhan, M (2022). Effect of Hilbert-Huang transform on classification of PCG signals using machine learning. Journal of King Saud University-Computer and Information Sciences. 34, 9915-9925. https://doi.org/10.1016/j.jksuci.2021.12.019
  14. Singh, N, and Singh, P (2021). A hybrid ensemble-filter wrapper feature selection approach for medical data classification. Chemometrics and Intelligent Laboratory Systems. 217, article no. 104396
  15. Agarwalla, P, and Mukhopadhyay, S (2022). GENEmops: supervised feature selection from high dimensional biomedical dataset. Applied Soft Computing. 123, article no. 108963
  16. Maldonado, S, Vairetti, C, Fernandez, A, and Herrera, F (2022). FW-SMOTE: a feature-weighted oversampling approach for imbalanced classification. Pattern Recognition. 124, article no. 108511
  17. Dash, R, Dash, R, and Rautray, R (2022). An evolutionary framework based microarray gene selection and classification approach using binary shuffled frog leaping algorithm. Journal of King Saud University-Computer and Information Sciences. 34, 880-891. https://doi.org/10.1016/j.jksuci.2019.04.002
  18. Nazari, E, Aghemiri, M, Avan, A, Mehrabian, A, and Tabesh, H (2021). Machine learning approaches for classification of colorectal cancer with and without feature selection method on microarray data. Gene Reports. 25, article no. 101419
  19. Nosrati, V, and Rahmani, M (2022). An ensemble framework for microarray data classification based on feature subspace partitioning. Computers in Biology and Medicine. 148, article no. 105820
