International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(2): 117-129
Published online June 25, 2023
https://doi.org/10.5391/IJFIS.2023.23.2.117
© The Korean Institute of Intelligent Systems
Peddarapu Rama Krishna and Pothuraju Rajarajeswari
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
Correspondence to: Peddarapu Rama Krishna (peddarapuramakrishna@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Bioinformatics has emerged as a promising field with applications across many biological domains, and microarray data analysis has become a preferred technology for estimating gene expression and diagnosing diseases. However, processing the vast amount of information contained in microarray data poses significant challenges. This study develops a feature selection model for disease diagnosis using microarray datasets. The proposed approach, mutual fuzzy swarm optimization (MFSO), computes mutual information (MI) values for the features of a microarray gene dataset. The computed MI values are then applied to a fuzzy expert system (FES) for sample classification. To improve classification accuracy, fuzzy logic is iteratively estimated for unclassified instances in the microarray sample. The feature selection model employs particle swarm optimization to extract microarray sample features based on the derived MI, with the swarm movement guided by an objective function. The expert system integrates fuzzy logic with human expert knowledge to perform medical diagnoses, focusing on the selected features, and uses if-then rules to estimate the membership values of the dataset features. In simulation, the MFSO model is evaluated using sensitivity, specificity, and ROC values on five microarray datasets, and the results show improved performance compared with existing methods.
Keywords: Microarray, Fuzzy expert system (FES), Particle swarm optimization, Bioinformatics, Feature selection
No potential conflict of interest relevant to this article was reported.
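The MI step described in the abstract can be illustrated with a minimal sketch. It assumes that expression values are first discretized by equal-width binning; the source does not state how discretization is done, and the function names `discretize` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins=3):
    """Equal-width binning of continuous values (an assumed, illustrative choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature, labels):
    """MI in bits between two equal-length sequences of discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y)
        mi += p_xy * log2(count * n / (px[x] * py[y]))
    return mi


# Made-up expression values for one gene across four samples.
gene = discretize([1.0, 2.0, 9.0, 10.0], n_bins=2)
score = mutual_information(gene, ["IS", "IS", "IR", "IR"])
```

A gene whose binned expression separates the two classes perfectly reaches the maximum of 1 bit for a balanced binary label, while a gene that is independent of the label scores 0.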
Flow diagram showing MI computation in the MFSO.
Design of the fuzzy system.
Flow chart of the PSO model.
Representation of solution variable in the hybrid ant stem algorithm.
Flow chart of the PSO.
Comparison of (a) sensitivity, (b) specificity, (c) false positive, and (d) false negative.
Algorithm 1. MFSO feature selection in microarray data.
Begin
  for
    Compute the fitness value of swarm optimization P using an objective function
    Estimate the fitness function based on the swarm objective function
    Repeat the fuzzy set function
    Calculate the swarm movement in the objects:
    Estimate the fitness value of the particle swarm function
    Compare the optimal value
    Perform the differentiation process for the MF
  End for
  return
End
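Algorithm 1 lists the swarm steps without the update equations, so the following is only a sketch of the general idea, not the authors' method. It uses the standard binary PSO with a sigmoid transfer function as a stand-in. The objective (sum of per-feature relevance scores, such as MI values, minus a size penalty) and the parameters `w`, `c1`, `c2`, and `penalty` are assumptions and are not the values in Table 2.

```python
import math
import random


def binary_pso(relevance, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5,
               penalty=0.1, seed=0):
    """Select a feature mask that maximizes an assumed relevance-minus-size objective."""
    rng = random.Random(seed)
    dim = len(relevance)

    def fitness(mask):
        # Assumed objective: reward relevant features, penalize large subsets.
        return sum(r for r, m in zip(relevance, mask) if m) - penalty * sum(mask)

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Swarm movement: inertia plus pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            # Compare against the best values found so far.
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Made-up relevance scores for five features.
scores = [0.9, 0.05, 0.8, 0.02, 0.7]
mask, best = binary_pso(scores, seed=1)
```

The returned `mask` holds a 1 for each selected feature. The fuzzy-set and membership-function steps of Algorithm 1 are not covered by this sketch.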
Table 1. Dataset features.
Dataset | Sample count | Gene | Labels | Class samples |
---|---|---|---|---|
ISR | 11 | 6,783 | IS | 55 |
ISR | 11 | 6,783 | IR | 55 |
T2D | 33 | 21,793 | DM2 | 16 |
T2D | 33 | 21,793 | NGT | 16 |
Colon cancer | 59 | 1,974 | Tumor | 38 |
Colon cancer | 59 | 1,974 | Normal | 23 |
Leukemia | 69 | 6,943 | ALL | 43 |
Leukemia | 69 | 6,943 | AML | 29 |
Prostate | 106 | 11,678 | Tumor | 49 |
Prostate | 106 | 11,678 | Normal | 53 |
IS, insulin sensitivity; IR, insulin resistance; DM, diabetes mellitus; NGT, normoglycaemia; ALL, acute lymphocytic leukemia; AML, acute myeloid leukemia.
Table 2. Estimation of features using particle swarm optimization.
Parameters | Value |
---|---|
Delay factor | 0.5 |
Number of swarm | 75 |
Minimal value | 0.02 |
Error factor | 1 |
Csize | 27 |
Table 3. Interpretability of the MFSO based on rules.
Dataset | Rule set | Ncorrect | Ncovers | Coverage | Accuracy (%) |
---|---|---|---|---|---|
ISR | Rule 1 | 5 | 6 | 78 | 94 |
ISR | Rule 2 | 6 | 8 | 91 | 90 |
T2D | Rule 1 | 14 | 18 | 83 | 91 |
T2D | Rule 2 | 9 | 11 | 86 | 89 |
T2D | Rule 3 | 4 | 17 | 88 | 93 |
T2D | Rule 4 | 5 | 9 | 72 | 93 |
Table 4. Gene rule set with the MFSO.
Datasets | Gene ID | #MF | Labels |
---|---|---|---|
T2D | NM_021131.1 | 1 | medium |
T2D | NM_0223491.1 | 2 | low, medium |
T2D | AW291218 | 1 | low |
T2D | BC000229.1 | 1 | low |
T2D | AL523575 | 1 | high |
T2D | NM_005260.2 | 1 | high |
ISR | D32129_f_at | 2 | low, medium |
ISR | Z83805_at | 1 | medium |
ISR | Y09615_at | 2 | low, high |
ISR | X07730_at | 2 | medium, high |
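The linguistic labels in Table 4 (low, medium, high) can be read as fuzzy membership functions over a gene's expression value. The sketch below is illustrative only: it assumes triangular membership functions on expression values scaled to [0, 1], hypothetical breakpoints, and `min` as the fuzzy AND. The source does not give the shapes, the breakpoints, or the rule consequents, and the sample values are made up.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical breakpoints for expression values scaled to [0, 1].
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(x):
    """Membership degree of x in each linguistic term."""
    return {term: triangular(x, *abc) for term, abc in TERMS.items()}


def rule_strength(sample, antecedents):
    """Firing strength of 'IF gene1 is t1 AND gene2 is t2 ...' with min as AND."""
    return min(fuzzify(sample[gene])[term] for gene, term in antecedents)


# Made-up scaled expression values for two ISR genes from Table 4.
sample = {"D32129_f_at": 0.25, "Z83805_at": 0.4}
rule = [("D32129_f_at", "low"), ("Z83805_at", "medium")]
strength = rule_strength(sample, rule)
```

A rule's firing strength is limited by its weakest antecedent, so a sample is matched strongly only when every listed gene falls well inside its term.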
Table 5. Rule set with the MFSO.
Dataset | Rule set | Value of Ncovers | Value of Ncorrect | Value of Coverage | Accuracy of Rule (%) |
---|---|---|---|---|---|
ISR | 1 | 8 | 6 | 77 | 94.56 |
ISR | 2 | 10 | 8 | 90 | 98.73 |
T2D | 1 | 21 | 17 | 74.63 | 97.64 |
T2D | 2 | 17 | 11 | 69.84 | 98.74 |
T2D | 3 | 21 | 13 | 47.32 | 99.04 |
T2D | 4 | 9 | 9 | 38.94 | 98.74 |
Table 6. Comparison of interpretability based on datasets.
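Dataset | Mean coverage (GA) | Mean coverage (HCA) | Mean coverage (MFSO) | Number of variables (GA) | Number of variables (HCA) | Number of variables (MFSO) | Membership function (GA) | Membership function (HCA) | Membership function (MFSO) |
---|---|---|---|---|---|---|---|---|---|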
T2D | 21.3 | 25.83 | 36.85 | 2.9 | 3.98 | 16.84 | 0.45 | 3.78 | 22.74 |
ISR | 25.8 | 228.93 | 39.43 | 3.2 | 4.94 | 19.32 | 0.83 | 3.74 | 29.83 |
Table 7. Generalization ability.
Dataset | Approach | LOOCV correct (%) | LOOCV incorrect (%) | LOOCV unclassified (%) |
---|---|---|---|---|
ISR | GA | 84.3 | 5.4 | 10.3 |
ISR | PSO | 87.8 | 5.9 | 6.3 |
ISR | HCA | 92.4 | 4.3 | 3.3 |
ISR | MFSO | 98.2 | 0.8 | 0.8 |
T2D | GA | 44.11 | 29.41 | 26.48 |
T2D | PSO | 85.3 | 8.82 | 5.88 |
T2D | HCA | 94.11 | 2.8 | 3.09 |
T2D | MFSO | 98.4 | 0.7 | 1.1 |
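Table 7 reports leave-one-out cross-validation (LOOCV) outcomes in three categories. A minimal sketch of how such a tally could be produced is given below. The `loocv` helper is generic; the nearest-class-mean classifier with an abstention margin is a hypothetical stand-in for the fuzzy expert system, and the data are made up.

```python
def loocv(samples, labels, train, predict):
    """Hold out each sample once; return percentages per outcome category."""
    counts = {"correct": 0, "incorrect": 0, "unclassified": 0}
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        guess = predict(model, samples[i])
        if guess is None:  # the classifier abstained
            counts["unclassified"] += 1
        elif guess == labels[i]:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    n = len(samples)
    return {name: 100.0 * c / n for name, c in counts.items()}


def train_means(xs, ys):
    """Stand-in model: the mean value of each class (one feature only)."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_nearest(means, x, margin=0.5):
    """Return the nearest class, or None when the two closest are within margin."""
    ranked = sorted(means, key=lambda y: abs(x - means[y]))
    if len(ranked) > 1:
        gap = abs(x - means[ranked[1]]) - abs(x - means[ranked[0]])
        if gap < margin:
            return None
    return ranked[0]


# Made-up single-feature data; the last sample lies midway between the classes.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 3.0]
ys = ["IS", "IS", "IS", "IR", "IR", "IR", "IS"]
result = loocv(xs, ys, train_means, predict_nearest)
```

Because each sample is held out exactly once, the three percentages sum to 100 up to rounding.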
Table 8. Comparison of classification performance.
Data set | Approaches | # IG | Sensitivity | Specificity | False positive | False negative |
---|---|---|---|---|---|---|
ISR | MI-PSO | 9 | 0.985 | 0.835 | 0.038 | 0.068 |
ISR | MI-HCA | 5 | 0.993 | 0.797 | 0.184 | 0.296 |
ISR | MI-MFSO | 5 | 0.986 | 0.825 | 0.059 | 0.083 |
T2D | MI-PSO | 10 | 0.983 | 0.873 | 0.048 | 0.63 |
T2D | MI-HCA | 6 | 0.993 | 0.926 | 0.041 | 0.783 |
T2D | MI-MFSO | 6 | 0.997 | 0.948 | 0.028 | 0.73 |
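For reference, the conventional confusion-matrix definitions of the four quantities in Table 8 can be sketched as follows. This is an assumption about how they are defined: the source does not state its formulas, and the tabulated false-negative values are not the complement of sensitivity, so the authors may have used different definitions. The function name `binary_metrics` is hypothetical.

```python
def _ratio(num, den):
    """Safe division that yields NaN when a class is absent."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred, positive):
    """Standard sensitivity, specificity, and error rates for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),          # true positive rate
        "specificity": _ratio(tn, tn + fp),          # true negative rate
        "false_positive_rate": _ratio(fp, fp + tn),  # 1 - specificity
        "false_negative_rate": _ratio(fn, fn + tp),  # 1 - sensitivity
    }


# Made-up labels: 1 marks the positive (disease) class.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(truth, preds, positive=1)
```

Under these definitions, sensitivity and the false-negative rate always sum to 1, as do specificity and the false-positive rate.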
Table 9. Comparison of mean coverage.
Dataset | BCGA | PSO | GSA | HCA | MFSO |
---|---|---|---|---|---|
T2D | 26.4 | 23.6 | 24.6 | 23.73 | 36.82 |
ISR | 23.7 | 27.2 | 27.3 | 28.42 | 41.75 |
Colon | 34.67 | 37.93 | 34.9 | 31.35 | 40.94 |
Leukemia | 32.82 | 39.1 | 37.1 | 33.94 | 44.88 |
Prostate | 28.63 | 26.4 | 28.4 | 36.72 | 53.94 |
Table 10. Comparison of the number of variables (#V).
Dataset | BCGA | PSO | GSA | HCA | MFSO |
---|---|---|---|---|---|
T2D | 4.2 | 3.8 | 4.3 | 5.38 | 14.63 |
ISR | 3.8 | 3.9 | 4.9 | 5.16 | 13.94 |
Colon | 4.2 | 5.3 | 5.1 | 5.53 | 10.94 |
Leukemia | 4.9 | 5.2 | 5.9 | 5.49 | 9.84 |
Prostate | 5.1 | 4.1 | 5.6 | 5.28 | 10.35 |
Table 11. Comparison of the mean number of membership functions.
Dataset | BCGA | PSO | GSA | HCA | MFSO |
---|---|---|---|---|---|
T2D | 0.11 | 0.9 | 2.3 | 2.13 | 4.08 |
ISR | 0.6 | 1.3 | 2.2 | 2.28 | 3.98 |
Colon | 0.1 | 1.1 | 2.1 | 2.23 | 3.45 |
Leukemia | 0.2 | 1.3 | 2.1 | 2.46 | 3.06 |
Prostate | 0.4 | 1.2 | 2.3 | 2.69 | 4.67 |