Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(1): 10-18

Published online March 25, 2024

https://doi.org/10.5391/IJFIS.2024.24.1.10

© The Korean Institute of Intelligent Systems

A Machine Learning Approach to ADHD Diagnosis Using Mutual Information and Stacked Classifiers

Nishant Chauhan and Byung-Jae Choi

Department of Electronic Engineering, Daegu University, Gyeongsan, Korea

Correspondence to :
Byung-Jae Choi (bjchoi@daegu.ac.kr)

Received: November 24, 2023; Revised: March 3, 2024; Accepted: March 12, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Attention-deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental condition in children characterized by impairments in attention, hyperactivity, and impulse control. Despite extensive research, the underlying cause of ADHD remains unclear. Electroencephalography (EEG), a noninvasive method for recording brain activity, is valuable for studying ADHD-related neural patterns. This study explored the potential of EEG data to differentiate between children with ADHD and healthy controls (HC) to enhance diagnostic accuracy. We analyzed EEG recordings from 61 children with ADHD and 60 healthy controls. The EEG data comprised signals from 19 scalp channels. Our primary objective was to develop a machine learning model capable of classifying subjects with ADHD from HC using EEG data as discriminatory features. To select the most relevant features, we utilized mutual information (MI), a measure of the statistical dependence between two variables. The top features were selected based on their minimum MI values, ensuring that they captured meaningful information from both the ADHD and HC groups. Principal component analysis was employed to reduce dimensionality while preserving the essential features, aiming to mitigate computational complexity. The selected features were then used to train ten different classifiers: random forest, multilayer perceptron (MLP), k-nearest neighbors, extra tree classifier, XGBoost, support vector machines, logistic regression, AdaBoost, classification and regression trees, and gradient boosting machines. A stacked classifier was constructed by combining the outputs of all ten individual classifiers, with the MLP acting as a meta-classifier. The stacked classifier outperformed the individual models, achieving an accuracy of 92%. Its precision (91%) and sensitivity (93%) were also higher than those of the individual models, indicating its ability to correctly identify ADHD-positive cases. Furthermore, the specificity of the stacked classifier (93%) was superior, highlighting its improved proficiency in correctly classifying HC. This comprehensive evaluation established the stacked classifier as an effective approach for ADHD classification, surpassing the performance of several standalone models. Our proposed approach offers a noninvasive, objective, and cost-effective method for identifying children with ADHD, potentially enabling earlier diagnosis, intervention, and improved treatment outcomes.

Keywords: EEG, ADHD, Machine learning, Mutual information, PCA, Stacked classifier

1. Introduction

Attention-deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder marked by inattention, hyperactivity, and impulsivity that are inconsistent with an individual’s age and developmental level [1]. The exact cause of ADHD remains unknown; however, several factors, including genetics, brain injury, toxin exposure, and substance abuse during pregnancy, may contribute to its development [2]. Currently, the diagnosis of ADHD often relies on a comprehensive evaluation by a pediatrician, psychiatrist, or psychologist. However, the clinical manifestations of ADHD are subtle and difficult to detect [3]. Imaging techniques such as magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT) scans are not yet reliable for ADHD diagnosis, and even in developed countries, there is ongoing debate among specialists regarding diagnostic criteria. Moreover, the risk of false-positive diagnoses, especially in children, remains high. The diagnostic process for ADHD typically involves gathering a significant amount of data, making it complex, time-consuming, and costly. This extensive data collection is necessary to determine whether ADHD underlies the observed symptoms, as opposed to normal behavioral variations, potential differential diagnoses, or co-occurring conditions [4]. Yet, the process relies heavily on subjective assessments, increasing the likelihood of diagnostic biases and potentially delaying treatment initiation despite the availability of effective ADHD treatments. Therefore, it is imperative to streamline, abbreviate, and standardize the diagnostic process for ADHD. This can be achieved by identifying the most relevant data elements that can accurately predict diagnostic outcomes. Machine learning (ML) techniques offer a promising approach to achieving this goal.

The growing availability of ML models has sparked a surge in interest in their application in the study of psychiatric disorders. ML models are mathematical algorithms capable of identifying intricate patterns in existing datasets. These learned patterns can then be utilized for predictive tasks in new datasets (e.g., patient vs. control participant classification and symptom score prediction), as well as to identify the most critical variables influencing these predictions. They have demonstrated remarkable effectiveness in capturing complex interactions underlying discrete alterations in schizophrenia, Alzheimer’s disease (AD), and autism spectrum disorder (ASD) [5].

One study showcased the effectiveness of electroencephalography (EEG) analysis in differentiating 253 children with ADHD from 67 age-matched controls, using EEG signals recorded during eyes-closed resting-state conditions. Using logistic regression (LR) as a classifier, the authors achieved an impressive accuracy of 87%, sensitivity of 89%, and specificity of 79.6% [6]. Another study [7] explored the use of nonlinear features extracted from EEG recordings during a cognitive attention task for classification. By combining minimum redundancy maximum relevance (mRMR) feature selection with a multilayer perceptron (MLP) with one hidden layer containing five neurons, the researchers successfully distinguished 30 children with ADHD from 30 age-matched controls, achieving an accuracy of 92.28%. A recent nationwide Swedish study employed multiple ML models, including random forest (RF), elastic net, deep neural network, and gradient boosting machine (GBM), to identify significant predictors of ADHD based on the family and medical histories of a large cohort of 238,696 individuals [8]. The best-performing model attained a sensitivity of 71.7% and a specificity of 65.0%. Key risk factors for ADHD in children included having parents with criminal convictions, male sex, having a relative with ADHD, academic difficulties, and learning disabilities.

The potential of event-related potentials (ERPs) as diagnostic tools for ADHD was explored in a study that examined ERPs derived from the prefrontal and inferior parietal regions in 14 children with ADHD and 16 age-matched controls during a spatial Stroop task. Using k-nearest neighbor (KNN) and support vector machine (SVM) classifiers, researchers achieved an impressive 83.33% accuracy with KNN, outperforming the SVM’s accuracy of 56.42% [9]. Beyond enabling subject classification into traditional diagnostic groups, research on ML-aided ADHD diagnosis has also contributed to a deeper understanding of the clinical presentation and heterogeneity of ADHD by facilitating the identification of novel subgroups of participants, which can ultimately enhance diagnostic accuracy [10].

Feature selection is vital in ML, refining model performance by pinpointing the most relevant and informative features, thereby reducing dimensionality, improving interpretability, and mitigating the risk of overfitting. Moreover, ML techniques that can explore the heterogeneities in ADHD (e.g., feature selection and reduction methods) have the potential to not only improve diagnosis, but also contribute to advancements in future research investigating the underlying mechanisms of ADHD by providing more appropriately defined samples.

In our study, we investigated the potential of EEG data to differentiate between children with ADHD and HC to improve diagnostic accuracy. Using mutual information (MI) for feature selection, we identified key features with minimum MI values from both groups. To manage computational complexity, principal component analysis (PCA) was applied for dimensionality reduction while preserving essential features. Ten classifiers (RF, MLP, KNN, extra tree classifier [ETC], XGBoost [XGB], SVM, LR, AdaBoost, classification and regression tree [CART], and gradient boosting machine [GBM]) were trained using the selected features. A stacked classifier, combining outputs from all 10 classifiers with an MLP as the meta-classifier, demonstrated promising results.

The remainder of this paper is organized as follows. Section 2 outlines the materials and methodology employed. Section 3 presents and discusses the results. Finally, Section 4 concludes the paper with a summary of key findings and future directions.

2. Methodology

In this study, we proposed an ML framework that utilizes MI-based feature selection, dimensionality reduction, and stacked classifier ensemble learning to address the challenges of traditional ADHD diagnosis and enhance classification accuracy.

Figure 1 illustrates the architecture of the proposed method for ADHD and healthy controls (HC) classification using a stacked classifier. The framework begins by preprocessing the collected EEG data to ensure quality and consistency. Subsequently, MI is employed to select the most relevant features from the preprocessed EEG signals, eliminating redundant and less informative features. Further refinement of the data representation is achieved through PCA, which reduces the dimensionality of the feature set while preserving essential information. The selected features are then used to train a stacked classifier, an ensemble-learning approach that combines the predictions of multiple individual classifiers. This stacked classifier leverages the strengths of different algorithms to achieve superior performance compared with standalone classifiers. Finally, the trained stacked classifier predicts ADHD and HC labels, and its performance is compared with that of the individual classifiers. This comprehensive approach aims to improve the accuracy and efficiency of ADHD diagnosis, potentially paving the way for improved clinical decision-making and patient outcomes.

2.1 Dataset

This study utilized a validated online EEG dataset available at https://ieee-dataport.org/open-access/eegdata-adhd-control-children [11]. This dataset includes EEG recordings from two distinct groups: 61 children diagnosed with ADHD (48 boys and 13 girls, average age 9.61 ± 1.75 years) and 60 healthy children (50 boys and 10 girls, average age 9.85 ± 1.77 years). EEG data were collected using an SDC24 device equipped with 19 channels, following the 10–20 electrode placement system commonly used in EEG. The 10–20 standard system provides a consistent method for electrode positioning by dividing the scalp into defined regions with labeled points based on standardized distances relative to prominent anatomical landmarks. Electrodes were precisely placed at predetermined locations to ensure uniform and reproducible EEG signal acquisition across diverse brain regions. The “10–20” nomenclature signifies the standardized distances of 10% or 20% between electrode placements, creating a systematic grid for consistent electrode positioning across subjects and research settings. This method ensures methodological rigor in data collection, facilitates meaningful cross-subject and cross-study comparisons, and is crucial in EEG-based ADHD research. ADHD was diagnosed by an experienced child and adolescent psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) criteria [12].

2.2 Preprocessing

EEG signals are susceptible to various artifacts and noise sources, necessitating thorough preprocessing before reliable analysis. The datasets in this study underwent the following preprocessing steps.

The dataset owners employed a customized iteration of Makoto’s preprocessing pipeline adapted for use with EEGLAB functions (version 14.1.1; Delorme & Makeig, 2004) in MATLAB 2018a. For artifact elimination, a band-pass finite impulse response filter covering 0.5 Hz to 48 Hz was applied to the continuous EEG data. The CleanLine plugin was used to suppress line noise. Independent component analysis (ICA) was then employed to decompose the EEG data, and components related to eye blinks and muscle artifacts were manually excluded based on spectral properties, scalp maps, and temporal characteristics. For each subject, the time series was segmented into 1,024-sample (8-second) segments, with variable segment counts owing to task-specific timings. The minimum task duration was 50 seconds for the control group participants, while the maximum duration was 285 seconds for the ADHD participants. The mean segment count was 13.18 (standard deviation = 3.15) in the control group and 16.14 (standard deviation = 6.42) in the ADHD group.
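Although the original pipeline was implemented in EEGLAB/MATLAB, the same steps can be approximated in Python. The following MNE-Python sketch is illustrative only: the file name, the 50 Hz line frequency, and the excluded ICA component indices are assumptions rather than values taken from the dataset documentation.

```python
# Minimal sketch of a comparable preprocessing pipeline in MNE-Python.
import mne

raw = mne.io.read_raw_edf("subject_01.edf", preload=True)  # hypothetical file
raw.filter(l_freq=0.5, h_freq=48.0, method="fir")          # band-pass FIR, 0.5-48 Hz
raw.notch_filter(freqs=50.0)   # line-noise suppression (50 Hz assumed here)

# ICA decomposition; artifact components are excluded after manual inspection
ica = mne.preprocessing.ICA(n_components=19, random_state=42)
ica.fit(raw)
ica.exclude = [0, 1]           # blink/muscle component indices, assumed for illustration
ica.apply(raw)

# Fixed-length 8-second segments (1,024 samples at 128 Hz)
epochs = mne.make_fixed_length_epochs(raw, duration=8.0, preload=True)
```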

2.3 Feature Selection and Dimensionality Reduction

MI quantifies the statistical dependence between two variables. In the context of feature selection for ADHD classification using EEG data, MI helps identify EEG features highly correlated with the target variable (i.e., ADHD diagnosis). This method is powerful for selecting the most relevant and informative features in a dataset. PCA is a dimensionality-reduction technique. It reduces the number of features in a dataset while preserving the most important information. The use of both methods in this study is described below.

2.3.1 Feature selection using MI

MI gauges the dependence between two variables. In this study, MI was used to identify the EEG features that were most informative for distinguishing children with ADHD from healthy controls. Features with high MI values are crucial, as they provide more information regarding the diagnosis of ADHD. The minimum MI values between each EEG feature and the class label (ADHD vs. HC) were computed using the mutual_info_score function in Python’s scikit-learn library.
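As a rough illustration of this step, the sketch below scores each feature against the class label and retains a fixed number of features. The placeholder data, the number of retained features, and the use of mutual_info_classif (a close relative of mutual_info_score that handles continuous features directly) are all assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Placeholder data shaped like the task: rows = subjects/segments,
# columns = EEG features. Replace with the real feature matrix and
# ADHD/HC labels (0 = HC, 1 = ADHD).
rng = np.random.default_rng(0)
X = rng.normal(size=(121, 100))
y = rng.integers(0, 2, size=121)

# One MI estimate per feature with respect to the class label
mi_scores = mutual_info_classif(X, y, random_state=0)

# The paper ranks features by their minimum MI values; sorting ascending
# mirrors that criterion (the more common convention keeps the maximum).
k = 20                                  # number of retained features (assumed)
selected_idx = np.argsort(mi_scores)[:k]
X_selected = X[:, selected_idx]
```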

Finally, topographic maps were generated to visualize the spatial distribution of the most informative EEG features. These maps, created using the minimum norm estimate (MNE) toolbox, provide a visual perspective on how these significant features are distributed across the scalp.
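A hedged sketch of how such a scalp map can be drawn with MNE-Python follows; the 19-channel list is inferred from the electrodes discussed in the text, and the random stand-in scores are placeholders for the per-channel MI values.

```python
import matplotlib.pyplot as plt
import numpy as np
import mne

# Assumed 19-channel 10-20 layout matching the electrodes named in the text
ch_names = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T7", "C3", "Cz",
            "C4", "T8", "P7", "P3", "Pz", "P4", "P8", "O1", "O2"]
info = mne.create_info(ch_names, sfreq=128.0, ch_types="eeg")
info.set_montage("standard_1020")

# Stand-in values; replace with the actual per-channel MI scores
mi_per_channel = np.random.default_rng(0).random(len(ch_names))

fig, ax = plt.subplots()
mne.viz.plot_topomap(mi_per_channel, info, axes=ax, show=False)
ax.set_title("Minimum MI per channel")
plt.show()
```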

Figure 2 displays a topographic comparison of features between ADHD and HC based on minimum MI. The figure shows that several features differ significantly between individuals with ADHD and controls. These features include:

  • T7 (temporal region of the brain): This region is involved in processing auditory and visual information.

  • F3, Fz, and F4 (frontal lobe of the brain): This region is involved in executive functions, including planning, decision-making, and attention.

  • Cz and C4 (central regions of the brain): These regions are involved in motor and sensory processes.

  • P3, P7, Pz, and P4 (parietal lobes of the brain): These regions are involved in spatial processing and attention.

  • T8 (temporal region of the brain): Like T7, this region is involved in processing auditory and visual information.

The figure also illustrates that the direction of the difference in MI between the ADHD and control groups was not always consistent. For instance, MI values were higher in individuals with ADHD than in controls in some cases (e.g., Ept and TB) and lower in others (e.g., P3 and Pz). This suggests that the underlying neural mechanisms of ADHD are complex and that no single feature can reliably distinguish between individuals with ADHD and controls.

2.3.2 Feature reduction using PCA

This process entails reducing the dimensionality of the features chosen through MI. The cumulative variance plot in Figure 3 indicates the proportion of explained variance as the number of components increases, which helps to identify an optimal cutoff. Here, a 95% cutoff threshold guided the selection of the number of components. Subsequently, PCA was applied to transform the original dataset into a new, more compact representation with five dimensions; these five components were selected because they accounted for over 95% of the variance in the brain network data.
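A minimal sketch of this step, continuing from the MI-selected features above, is shown below; passing a fractional n_components to scikit-learn's PCA selects the smallest number of components whose cumulative explained variance reaches that threshold.

```python
import numpy as np
from sklearn.decomposition import PCA

# Keep the smallest number of components explaining >= 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_selected)

# Cumulative explained variance: the data behind a plot like Figure 3
cum_var = np.cumsum(pca.explained_variance_ratio_)
print(X_reduced.shape[1], "components retained")  # five in the paper's data
```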

2.4 Classification

In this study, we employed an ensemble of ten ML classifiers to detect patterns in EEG data and enhance ADHD classification in children. The classifiers encompassed a range of algorithms to ensure a comprehensive exploration of the dataset. RF [13], MLP [14], KNN [15], ETC [16], and XGB [17] represent various decision-based models. Additionally, SVM [18], LR using stochastic gradient descent (SGD) [19], AdaBoost [20], CART [21], and GBM [22] introduce diverse learning approaches. This broad range of classifiers ensures a thorough examination of the EEG data, capturing the nuances and complexities essential for accurate classification of ADHD in children.

Random forest (RF): RF is an ensemble learning technique that builds multiple decision trees during training. Each tree is built using a random subset of features and a subset of the training data. The final prediction is made by aggregating the predictions of all individual trees, often using a majority-voting scheme. This method boosts the robustness and mitigates overfitting by introducing randomness into the model.

Multilayer perceptron (MLP): An MLP is a type of artificial neural network with multiple layers of interconnected nodes (neurons). It can learn complex patterns in data through a process known as backpropagation, in which the network adjusts its internal parameters based on errors in its predictions. MLP is particularly effective for tasks involving nonlinear relationships in data due to its capacity to model intricate decision boundaries.

K-nearest neighbors (KNN): KNN is a lazy-learning algorithm that classifies data points based on the majority class of their k nearest neighbors in the feature space. It operates under the assumption that similar instances in an input space share similar output labels. The choice of k (number of neighbors) influences the sensitivity of the model to local variations in the data.

Extra tree classifier (ETC): ETC is similar to an RF but distinguishes itself by splitting the data using random thresholds for each feature. It builds multiple decision trees and selects the best of these randomly generated splits, thereby introducing additional randomness into the model. This randomness can improve the generalization ability and resilience of the model to noise.

XGBoost (XGB): XGBoost is an optimized gradient boosting library tailored for efficient processing of large datasets. It sequentially builds decision trees, with each subsequent tree aimed at correcting the errors made by the ensemble of the previous trees. XGBoost incorporates regularization terms to prevent overfitting and has become widely popular owing to its speed and performance in various ML tasks.

Support vector machines (SVM): SVMs are supervised learning models that aim to find a hyperplane in a feature space that best separates different classes of data points. SVM excels in high-dimensional spaces and is particularly useful for tasks in which clear decision boundaries are required.

Logistic regression (LR): LR is a linear classification algorithm frequently employed to predict binary outcomes (e.g., ADHD vs. HC). It estimates the probability of a sample belonging to a particular class by finding the linear decision boundary that best separates the data points. The model’s parameters are trained using optimization algorithms such as SGD, which iteratively adjusts the parameters to minimize the error between the predicted probabilities and true class labels.

AdaBoost: AdaBoost is an ensemble-learning method that sequentially combines weak classifiers. Each weak classifier aims to correct the errors of its predecessor, and instances that are misclassified in one round are assigned higher weights in subsequent rounds. This adaptiveness makes AdaBoost robust and adept at handling complex decision-making boundaries.

Classification and regression trees (CART): CART is a decision-tree algorithm utilized for both classification and regression tasks. It recursively divides the data into subsets based on the most discriminative features, forming a tree-like structure in which each leaf node corresponds to a class label or regression value.

Gradient boosting machines (GBM): GBMs are an ensemble technique that constructs decision trees sequentially. Each tree is trained to rectify errors from the ensemble of previously built trees. GBM excels at capturing intricate relationships in data and is often used for regression and classification tasks.

Stacked classifier (meta-classifier): The final layer of the ensemble, referred to as the stacked classifier, is an MLP in this study. It takes the predictions from the 10 base classifiers as input features and learns to make a final prediction, harnessing the diverse information provided by the individual classifiers.

Stacking, also known as stacked ensembling, is an ML approach that combines the predictions of multiple base classifiers to improve overall performance. In this study, ten diverse classifiers were employed, and their outputs were fed into a meta-classifier, specifically an MLP, to form a stacked classifier. This ensemble strategy aims to leverage the strengths of the individual classifiers and enhance predictive accuracy and robustness.
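For concreteness, a minimal scikit-learn sketch of this arrangement follows. The paper does not report hyperparameters, so library defaults are used throughout; the xgboost package, the SGD-based logistic regression via SGDClassifier, and the internal 5-fold stacking split are assumptions.

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

base_learners = [
    ("rf", RandomForestClassifier()),
    ("mlp", MLPClassifier(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("etc", ExtraTreesClassifier()),
    ("xgb", XGBClassifier()),
    ("svm", SVC()),
    ("lr_sgd", SGDClassifier(loss="log_loss")),  # LR trained with SGD
    ("ada", AdaBoostClassifier()),
    ("cart", DecisionTreeClassifier()),          # CART
    ("gbm", GradientBoostingClassifier()),
]

# MLP meta-classifier learns from the base models' cross-validated outputs
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=MLPClassifier(max_iter=1000),
                           cv=5)
stack.fit(X_reduced, y)
```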

2.5 Performance Metrics

It is important to use various metrics when evaluating the performance of a classifier. This is because no single metric can capture all aspects of the classifier performance. In this study, we used the following five performance metrics.

Accuracy: Accuracy measures the overall correctness of the model’s predictions and is calculated as the ratio of correctly predicted instances to total instances.

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN},$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

Precision: The precision evaluates the model’s accuracy in identifying positive instances, indicating the proportion of true-positive predictions among all instances predicted to be positive.

$\text{Precision} = \dfrac{TP}{TP + FP}.$

Sensitivity: Sensitivity assesses the model’s ability to capture all positive instances and is defined as the ratio of true positives to the sum of true positives and false negatives.

$\text{Sensitivity} = \dfrac{TP}{TP + FN}.$

Specificity: This measures the model’s accuracy in identifying negative instances and is calculated as the ratio of true negatives to the sum of true negatives and false positives.

$\text{Specificity} = \dfrac{TN}{TN + FP}.$

F1-score: The F1-score combines precision and sensitivity, providing a balanced metric that considers both false positives and false negatives. It is calculated as the harmonic mean of the precision and sensitivity.

$\text{F1-score} = 2 \times \dfrac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}.$
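As a quick worked example, applying these formulas to the confusion-matrix counts reported later in Figure 5 (56 TP, 5 FP, 5 FN, 55 TN) gives values of roughly 92% across the board:

```python
# Worked example using the confusion-matrix counts from Figure 5
TP, FP, FN, TN = 56, 5, 5, 55

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 111/121 ~= 0.917
precision   = TP / (TP + FP)                    # 56/61  ~= 0.918
sensitivity = TP / (TP + FN)                    # 56/61  ~= 0.918
specificity = TN / (TN + FP)                    # 55/60  ~= 0.917
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # ~= 0.918

print(f"acc={accuracy:.3f} prec={precision:.3f} sens={sensitivity:.3f} "
      f"spec={specificity:.3f} f1={f1:.3f}")
```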

3. Results & Discussion

The performance comparison results of the ten ML classifiers for the ADHD classification task are shown in Figure 4. The stacked classifier exhibited the best performance, with 92% accuracy, 91% precision, 93% sensitivity, 93% specificity, and an F1-score of 92%.

Furthermore, based on the confusion matrix shown in Figure 5, the classifier detected 56 TP, 5 FP, 5 FN, and 55 TN. Five-fold cross-validation was applied, and the average of each performance metric across folds was reported. These results indicate that the stacked classifier potentially learned more complex patterns in the data than the other classifiers.
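A hedged sketch of this evaluation protocol, reusing the stacked model and reduced features from the earlier sketches, might look as follows; the exact fold assignment and averaging procedure of the original study are not reported, so this is illustrative only.

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, classification_report

# Out-of-fold predictions from 5-fold cross-validation
y_pred = cross_val_predict(stack, X_reduced, y, cv=5)

# scikit-learn's binary confusion matrix is laid out [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(classification_report(y, y_pred, target_names=["HC", "ADHD"]))
```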

The XGB and ETC also demonstrated strong performance, achieving accuracies of 88% and 79%, respectively. These classifiers are ensemble-learning methods that combine the predictions of multiple individual classifiers to produce a final prediction. This is one reason why they were able to achieve such good performance.

In contrast, the other classifiers exhibited lower performances, with accuracies ranging from 54% to 79%.

Overall, the results indicate that ensemble learning methods hold promise for ADHD classification. The stacked classifier achieved the best performance, whereas XGB and ETC performed satisfactorily.

Here are some additional observations from the results:

  • The stacked classifier had the highest precision and specificity, indicating that it was very good at correctly identifying both positive and negative ADHD cases.

  • The XGB classifier exhibited the highest sensitivity, indicating its strong ability to detect all cases of ADHD, even if it resulted in false positives.

  • The MLP and SVM classifiers had the lowest performance, suggesting that they were unable to learn complex patterns in the ADHD data as well as the other classifiers.

  • Varied MI differences between ADHD and HC were observed across a variety of brain regions, suggesting that ADHD is a complex neurodevelopmental disorder that affects multiple brain regions.

  • The direction of the MI differences was not always consistent, suggesting that there is no single neural signature for ADHD.

  • The observed MI differences were relatively small, suggesting that ADHD is a heterogeneous disorder with a wide range of presentations.

In conclusion, our findings offer important insights into the neural mechanisms underlying ADHD. The results indicate that MI could be used to develop more accurate diagnostic tools for ADHD, and that a multimodal approach to ADHD diagnosis and treatment may be most effective.

4. Conclusion

This study extensively explored the use of ML techniques for the classification of children with ADHD and HC using EEG data. By employing MI-based feature selection, dimensionality reduction via PCA, and a stacked classifier ensemble, we significantly improved diagnostic accuracy. The robust performance of the individual classifiers, particularly the XGBoost model, highlights the potential of advanced ML algorithms to discern subtle patterns in EEG data associated with ADHD. The stacked classifier exhibits superior accuracy, emphasizing the benefits of combining diverse classifiers to enhance diagnostic outcomes. Our findings contribute to the growing body of literature on ADHD diagnosis, showcasing the effectiveness of ML methodologies and emphasizing the importance of feature selection and ensemble learning strategies. Further research on larger and more diverse datasets is needed to consolidate and extend our findings, with the ultimate goal of refining ADHD diagnosis and advancing our understanding of neurodevelopmental disorders.

This study showcased the potential of utilizing minimum MI-based features and a stacked classifier for ADHD diagnosis. However, the balanced and preprocessed nature of the dataset may limit the generalizability of the findings to real-world clinical settings. Future investigations should evaluate the performance of stacked classifiers on imbalanced datasets that reflect wider patient variations. Furthermore, exploring the integration of different neuroimaging modalities with EEG features could offer valuable avenues for enhancing diagnostic accuracy and deepening our comprehension of ADHD.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Fig. 1.

Proposed stacked classifier-based framework.


Fig. 2.

Topographic maps of minimum mutual information values between EEG features and ADHD diagnosis, comparing ADHD and HC.


Fig. 3.

Cumulative variance explained by principal components after feature selection using MI.


Fig. 4.

Performance comparison of stacked classifier with other individual ML models for ADHD classification.


Fig. 5.

Confusion matrix of stacked classifier for ADHD classification.


  1. Polanczyk, G, De Lima, MS, Horta, BL, Biederman, J, and Rohde, LA (2007). The worldwide prevalence of ADHD: a systematic review and metaregression analysis. American Journal of Psychiatry. 164, 942-948. https://doi.org/10.1176/appi.ajp.164.6.942
  2. Nunez-Jaramillo, L, Herrera-Solis, A, and Herrera-Morales, WV (2021). ADHD: reviewing the causes and evaluating solutions. Journal of Personalized Medicine. 11, article no. 166. https://doi.org/10.3390/jpm11030166
  3. Eigsti, IM, Barkley, RA, and Fletcher, K (2008). The clinical presentation of attention-deficit/hyperactivity disorder in children and adolescents: a review of the diagnostic criteria and evidence-based practices. Journal of Clinical Child and Adolescent Psychology. 47, 977-1013.
  4. Faraone, SV (2005). The scientific foundation for understanding attention-deficit/hyperactivity disorder as a valid psychiatric disorder. European Child & Adolescent Psychiatry. 14, 1-10. https://doi.org/10.1007/s00787-005-0429-z
  5. Arbabshirani, MR, Plis, S, Sui, J, and Calhoun, VD (2017). Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. Neuroimage. 145, 137-165. https://doi.org/10.1016/j.neuroimage.2016.02.079
  6. Magee, CA, Clarke, AR, Barry, RJ, McCarthy, R, and Selikowitz, M (2005). Examining the diagnostic utility of EEG power measures in children with attention deficit/hyperactivity disorder. Clinical Neurophysiology. 116, 1033-1040. https://doi.org/10.1016/j.clinph.2004.12.007
  7. Mohammadi, MR, Khaleghi, A, Nasrabadi, AM, Rafieivand, S, Begol, M, and Zarafshan, H (2016). EEG classification of ADHD and normal children using non-linear features and neural network. Biomedical Engineering Letters. 6, 66-73. https://doi.org/10.1007/s13534-016-0218-2
  8. Garcia-Argibay, M, Zhang-James, Y, Cortese, S, Lichtenstein, P, Larsson, H, and Faraone, SV (2023). Predicting childhood and adolescent attention-deficit/hyperactivity disorder onset: a nationwide deep learning approach. Molecular Psychiatry. 28, 1232-1239. https://doi.org/10.1038/s41380-022-01918-8
  9. Yang, J, Li, W, Wang, S, Lu, J, and Zou, L (2016). Classification of children with attention deficit hyperactivity disorder using PCA and k-nearest neighbors during interference control task. Advances in Cognitive Neurodynamics (V). Singapore: Springer, pp. 447-453. https://doi.org/10.1007/978-981-10-0207-6_61
  10. Karalunas, SL, and Nigg, JT (2020). Heterogeneity and subtyping in attention-deficit/hyperactivity disorder: considerations for emerging research using person-centered computational approaches. Biological Psychiatry. 88, 103-110. https://doi.org/10.1016/j.biopsych.2019.11.002
  11. Nasrabadi, AM, Allahverdy, A, Samavati, M, and Mohammadi, MR (2023). EEG data for ADHD/control children. Available: https://dx.doi.org/10.21227/rzfh-zn36
  12. Morgan, AE, Hynd, GW, Riccio, CA, and Hall, J (1996). Validity of DSM-IV ADHD predominantly inattentive and combined types: relationship to previous DSM diagnoses/subtype differences. Journal of the American Academy of Child & Adolescent Psychiatry. 35, 325-333. https://doi.org/10.1097/00004583-199603000-00014
  13. Breiman, L (2001). Random forests. Machine Learning. 45, 5-32. https://doi.org/10.1023/A:1010933404324
  14. Hinton, GE, and Salakhutdinov, RR (2006). Reducing the dimensionality of data with neural networks. Science. 313, 504-507. https://doi.org/10.1126/science.1127647
  15. Cover, T, and Hart, P (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 13, 21-27. https://doi.org/10.1109/TIT.1967.1053964
  16. Geurts, P, Ernst, D, and Wehenkel, L (2006). Extremely randomized trees. Machine Learning. 63, 3-42. https://doi.org/10.1007/s10994-006-6226-1
  17. Chen, T, and Guestrin, C (2016). XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, pp. 785-794. https://doi.org/10.1145/2939672.2939785
  18. Cortes, C, and Vapnik, V (1995). Support-vector networks. Machine Learning. 20, 273-297. https://doi.org/10.1007/BF00994018
  19. Bottou, L, and LeCun, Y (2003). Large scale online learning. Advances in Neural Information Processing Systems. 16, 213-224.
  20. Freund, Y, and Schapire, RE (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences. 55, 119-139. https://doi.org/10.1006/jcss.1997.1504
  21. Quinlan, JR (1996). Bagging, boosting, and C4.5. Proceedings of the AAAI Conference on Artificial Intelligence, 13, pp. 725-730.
  22. Friedman, JH (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics. 29, 1189-1232.

Nishant Chauhan received his B.S. degree in Computer Science from AKTU, India, in 2012, and M.S. and Ph.D. degrees in Control & Instrumentation from the Department of Electronic Engineering, Daegu University, South Korea, in 2020 and 2023, respectively. His research interests include computational neuroscience, fuzzy logic, image processing, machine learning and deep learning for medical images, brain MRI, and fMRI.

E-mail: nishantsep1090@daegu.ac.kr.

Byung-Jae Choi received his B.S. in Electronic Engineering from Kyungpook National University, Daegu, South Korea, in 1987, and M.S. and Ph.D. degrees in Electrical and Electronic Engineering from KAIST, Daejeon, South Korea, in 1989 and 1998, respectively. He has been a professor at the School of Electronic and Electrical Engineering, Daegu University, South Korea, since 1999, and is the vice president of Daegu University. His current research interests include intelligent control and its applications.

E-mail: bjchoi@daegu.ac.kr.

Article

Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(1): 10-18

Published online March 25, 2024 https://doi.org/10.5391/IJFIS.2024.24.1.10

Copyright © The Korean Institute of Intelligent Systems.

A Machine Learning Approach to ADHD Diagnosis Using Mutual Information and Stacked Classifiers

Nishant Chauhan and Byung-Jae Choi

Department of Electronic Engineering, Daegu University, Gyeongsan, Korea

Correspondence to:Byung-Jae Choi (bjchoi@daegu.ac.kr)

Received: November 24, 2023; Revised: March 3, 2024; Accepted: March 12, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Attention-deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental condition in children characterized by impairments in attention, hyperactivity, and impulse control. Despite extensive research, the underlying cause of ADHD remains unclear. Electroencephalography (EEG), a noninvasive method for recording brain activity, is valuable for studying ADHD-related neural patterns. This study explored the potential of EEG data to differentiate children with ADHD and healthy controls (HC) to enhance diagnostic accuracy. We analyzed EEG recordings from 61 children with ADHD and 60 healthy controls. The EEG data comprised signals from 19 scalp channels. Our primary objective was to develop a machine learning model capable of classifying ADHD subjects with ADHD from HC using EEG data as discriminatory features. To select the most relevant features, we utilized mutual information (MI), a measure of the statistical dependence between two variables. The top features were selected based on their minimum MI values, ensuring that they captured meaningful information from both ADHD and HC groups. Principal component analysis was employed to reduce dimensionality while preserving the essential features, aiming to mitigate computational complexity. The selected features were then used to train ten different classifiers: random forest, multilayer perceptron (MLP), k-nearest neighbors, extra tree classifier, XGBoost, support vector machines, logistic regression, AdaBoost, classification and regression trees, and gradient boosting machines. A stacked classifier was constructed by combining the outputs of all 10 individual classifiers, with the MLP acting as a meta-classifier. The stacked classifier outperformed individual models, achieving an impressive accuracy of 92%. Its precision (91%) and sensitivity (93%) were also higher than those of the individual models, indicating its ability to correctly identify ADHD-positive cases. Furthermore, the specificity of the stacked classifier (93%) was superior, highlighting its improved proficiency in correctly classifying HC. This comprehensive evaluation established the stacked classifier as an effective approach for ADHD classification, surpassing the performance of several standalone models. Our proposed method offers a noninvasive, objective, and cost-effective method for identifying children with ADHD, leading to earlier diagnosis, intervention, and improved treatment outcomes.

Keywords: EEG, ADHD, Machine learning, Mutual information, PCA, Stacked classifier

1. Introduction

Attention-deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder marked by inattention, hyperactivity, and impulsivity, which are inconsistent with an individual’s age and developmental level [1]. The exact cause of ADHD remains unknown; however, several factors, including genetics, brain injury, toxin exposure, and substance abuse during pregnancy, may contribute to its development [2]. Currently, diagnosis of ADHD often relies on comprehensive evaluation by a pediatrician, psychiatrist, or psychologist. However, clinical manifestations of ADHD are subtle and difficult to detect [3]. Imaging techniques such as magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT) scans are not yet reliable for ADHD diagnosis, and even in developed countries, there is ongoing debate among specialists regarding diagnostic criteria. Moreover, the risk of false-positive diagnoses, especially in children, remains high. The diagnostic process for ADHD typically involves gathering a significant amount of data, making it complex, time-consuming, and costly. This extensive data collection is necessary to determine whether ADHD underlies the observed symptoms as opposed to normal behavioral variations, potential differential diagnoses, or co-occurring conditions [4]. Yet, it heavily relies on subjective assessments, increasing the likelihood of diagnostic biases and potentially delaying treatment initiation despite the availability of effective ADHD treatments. Therefore, it is imperative to streamlining, abbreviate and standardize the diagnostic process of ADHD. This can be achieved by identifying the most relevant data elements that can accurately predict the diagnostic outcomes. Machine learning (ML) techniques offer a promising approach to achieve this goal.

The growing availability of ML models has sparked a surge in interest in their application in the study of psychiatric disorders. ML models are mathematical algorithms capable of identifying intricate patterns in existing datasets. These learned patterns can then be utilized for predictive tasks in new datasets (e.g., patient vs. control participant classification and symptom score prediction), as well as to identify the most critical variables influencing these predictions. They have demonstrated remarkable effectiveness in capturing complex interactions underlying discrete alterations in schizophrenia, Alzheimer’s disease (AD), and autism spectrum disorder (ASD) [5].

One study showcased electroencephalography (EEG) analysis effectiveness in differentiating 253 children with ADHD from 67 age-matched controls. These used EEG signals recorded during eyes-closed resting-state conditions. Using logistic regression (LR) as a classifier, they achieved an impressive accuracy of 87%, sensitivity of 89%, and specificity of 79.6% [6]. Another study [7] explored the using nonlinear features extracted from EEG recordings during a cognitive attention task for classification. By combining minimum redundancy maximum relevance (mRMR) feature selection and multilayer perceptron (MLP) with one hidden layer containing five neurons, the researchers successfully distinguished 30 children with ADHD from 30 age-matched controls, achieving an accuracy of 92.28%. A recent nationwide Swedish study employed multiple ML models, including random forest (RF), elastic net, deep neural network, and gradient boosting machine (GBM) to identify significant predictors of ADHD based on the family and medical histories of a large cohort of 238,696 individuals [8]. The best-performing model attained a sensitivity of 71.7% and a specificity of 65.0%. Key risk factors for ADHD in children included having parents with criminal convictions, male sex, having a relative with ADHD, academic difficulties, and learning disabilities.

The potential of event-related potentials (ERPs) as diagnostic tools for ADHD was explored in a study that examined ERPs derived from the prefrontal and inferior parietal regions in 14 children with ADHD and 16 age-matched controls during a spatial Stroop task. Using k-nearest neighbor (KNN) and support vector machine (SVM) classifiers, researchers achieved an impressive 83.33% accuracy with KNN, outperforming the SVM’s accuracy of 56.42% [9]. Beyond enabling subject classification into traditional diagnostic groups, research on ML-aided ADHD diagnosis has also contributed to a deeper understanding of the clinical presentation and heterogeneity of ADHD by facilitating the identification of novel subgroups of participants, which can ultimately enhance diagnostic accuracy [10].

Feature selection is vital in ML, refining model performance by pinpointing the most relevant and informative features, thereby reducing dimensionality, improving interpretability, and mitigating the risk of overfitting. Moreover, ML techniques that can explore the heterogeneities in ADHD (e.g., feature selection and reduction methods) have the potential to not only improve diagnosis, but also contribute to advancements in future research investigating the underlying mechanisms of ADHD by providing more appropriately defined samples.

In our study, we investigated the potential of EEG data to differentiate between children with ADHD and HC to improve diagnostic accuracy. Using mutual information (MI) for feature selection, we identified key features with minimum MI values from both groups. To manage computational complexity, principal component analysis (PCA was applied for dimensionality reduction while preserving essential features. Ten classifiers (RF, MLP, KNN, extra tree classifier [ETC], XGBoost [XGB]), SVM, LR, AdaBoost, classification and regression tree [CART], and gradient boosting machine [GBM]) were trained using the selected features. A Stacked classifier, combining outputs from all 10 classifiers with MLP as the meta-classifier, demonstrated promising results.

The remainder of this paper is organized as follows. Section 2 outlines the materials and methodology employed. Section 3 discusses results comprehensively. Finally, Section 4 concludes the paper with a summary of key findings and future directions.

2. Methodology

In this study, we proposed an ML framework that utilizes MI-based feature selection, dimensionality reduction, and stacked classifier ensemble learning to address the challenges of traditional ADHD diagnosis and enhance classification accuracy.

Figure 1 illustrates the architecture of the proposed method for ADHD and health controls (HC) classification using a stacked classifier. The framework begins by preprocessing collected EEG data to ensure quality and consistency. Subsequently, MI was employed to select the most relevant features from the preprocessed EEG signals, eliminating redundant and less informative features. Further refinement of the data representation is achieved through PCA, which reduced the dimensionality of the feature set while preserving essential information. The selected features are then used to train a stacked classifier, which is an ensemble-learning approach that combines the predictions of multiple individual classifiers. This stacked classifier leverages the strengths of different algorithms to achieve superior performance compared with standalone classifiers. Finally, the trained stacked classifier predicts ADHD and HC, with performance compared to other individual classifiers. This comprehensive approach aims to improve the accuracy and efficiency of ADHD diagnosis, potentially paving the way for improved clinical decision-making and patient outcomes.

2.1 Dataset

This study utilized a validated online EEG dataset available at https://ieee-dataport.org/open-access/eegdata-adhd-control-children [11]. This dataset includes EEG recordings from two distinct groups: 61 children diagnosed with ADHD (48 boys and 13 girls, average age 9.61 ± 1.75 years) and 60 healthy children (50 boys and 10 girls, average age 9.85 ± 1.77 years). EEG data were collected using an SDC24 device equipped with 19 channels followed the 10–20 electrode placement system commonly used in EEG. The 10–20 plus standard system provides a consistent method for electrode positioning by dividing the scalp into defined regions with labeled points based on standardized distances relative to prominent anatomical landmarks. Electrodes were precisely placed at predetermined locations to ensure uniform and reproducible EEG signal acquisition across diverse brain regions. The “10–20” nomenclature signifies the standardized distances of 10% or 20% between electrode placements, creating a systematic grid for consistent electrode positioning across subjects and research settings. This method ensures methodological rigor in data collection, facilitates meaningful cross-subject and cross-study comparisons, and is crucial in EEG-based ADHD research. ADHD was diagnosed by an experienced child and adolescent psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) criteria [12].

2.2 Preprocessing

EEG signals are susceptible to various artifacts and noise sources, necessitating thorough preprocessing before reliable analysis. The datasets in this study underwent the following preprocessing steps.

The dataset owners employed a customized iteration of Makoto’s preprocessing pipeline adapted for use with EEGLab functions (version 14.1.1; Delorme & Makeig, 2004) in MATLAB 2018a. For artifact elimination, a band-pass finite impulse response filter covering 0.5 Hz to 48 Hz was applied to the continuous EEG data. The CleanLine plugin was used to suppress line noise. Independent component analysis (ICA) was then employed to decompose the EEG data, and components related to eye blinks and muscle artifacts were manually excluded based on spectral properties, scalp maps, and temporal characteristics. For each subject, the time-series was segmented into 1,024-sample (8-second) segments, with variable counts owing to task-specific timings. The minimum task duration was 50 seconds for the control group participants, while the maximum duration was 285 seconds for the ADHD participants. The mean segment count were 13.18 (standard deviation = 3.15) in the control group and 16.14 segments (standard deviation = 6.42) in the ADHD group.

2.3 Feature Selection and Dimensionality Reduction

MI quantifies the statistical dependence between two variables. In the context of feature selection for ADHD classification using EEG data, MI helps identify EEG features highly correlated with the target variable (i.e., ADHD diagnosis). This method is powerful for selecting the most relevant and informative features in a dataset. PCA is a dimensionality-reduction technique. It reduces the number of features in a dataset while preserving the most important information. The use of both methods in this study is described below.

2.3.1 Feature selection using MI

MI gauges the dependence between two variables. In this study, MI was used to identify the EEG features that were most informative for distinguishing children with ADHD from healthy controls. Features with high MI values are crucial as they provide more information regarding the diagnosis of ADHD. To calculate the MI, the minimum MI values between each EEG feature and class label (ADHD vs. HC) were computed using the mutual_info_score function in Python’s scikit-learn library.

Finally, topographic maps were generated to visualize the spatial distribution of the most informative EEG features. These maps, created using the minimum norm estimate (MNE) tool-box, provide a visual perspective on how these significant features are distributed across the scalp.

Figure 2 displays a tomographic comparison of features between ADHD and HC based on minimum MI. The figure shows that several features are significantly different between individuals with ADHD and controls. These features include:

  • T7 (Temporal regions of the brain): This region is involved in processing auditory and visual information.

  • F3, Fz, and F4 (Frontal lobe of the brain): This region is involved in executive functions including planning, decision-making, and attention.

  • Cz and C4 (Central regions of the brain): This region is involved in motor and sensory processes.

  • P3, P7, Pz, and P4 (Parietal lobes of the brain): This region is also involved in spatial processing and attention.

  • T8 (Temporal lobe of the brain): This region is involved in processing auditory and visual information.

The figure also illustrates that the direction of the difference in MI between ADHD and control individuals was not always consistent. For instance, the MI between individuals with ADHD and control individuals was higher for the temporal region of the brain in some cases (e.g., Ept and TB) and lower in others (e.g., P3 and Pz). This suggests that the underlying neural mechanisms of ADHD are complex and that there is no single feature that can reliably distinguish between individuals with ADHD and controls.

2.3.2 Feature reduction using PCA

This process entails reducing the dimensionality of the features chosen through MI. The cumulative variance plot in Figure 3 visually indicates the proportion of explained variance with an increasing number of components, which helps to identify an optimal cutoff. Here, the 95% cutoff threshold guided the selection of the number of components. Subsequently, PCA was applied to transform the original dataset into a new, more compact representation with a reduced number of dimensions (five in this instance). Five PCA components were selected as they accounted for over 95% of the variance in the brain network data.

2.4 Classification

In this study, we employed an ensemble of ten ML classifiers to detect patterns in EEG data and enhance ADHD classification in children. The classifiers utilized encompassed a range of algorithms to ensure a comprehensive exploration of the dataset. RF [13], MLP [14], KNN [15], ETC [16], and XGB [17] represent various decision-based models. Additionally, SVM [18], LR using stochastic gradient descent (SGD) [19], AdaBoost [20], CART [21], and GBM [22] have introduced diverse learning approaches. This broad range of classifiers ensures a thorough examination of EEG data, capturing the nuances and complexities essential for accurate classification of ADHD in children.

Random forest (RF): RF is an ensemble learning technique that builds multiple decision trees during training. Each tree is built using a random subset of features and a subset of the training data. The final prediction is made by aggregating the predictions of all individual trees, often using a majority-voting scheme. This method boosts the robustness and mitigates overfitting by introducing randomness into the model.

Multilayer perceptron (MLP): An MLP is a type of artificial neural network with multiple layers of interconnected nodes (neurons). It can learn complex patterns in data through a process known as backpropagation, in which the network adjusts its internal parameters based on errors in its predictions. MLP is particularly effective for tasks involving nonlinear relationships in data due to its capacity to model intricate decision boundaries.

K-nearest neighbors (KNN): KNN is a lazy-learning algorithm that classifies data points based on the majority class of their KNN in the feature space. It operates under the assumption that similar instances in an input space share similar output labels. The choice of k (number of neighbors) influences the sensitivity of the model to local variations in the data.

Extra tree classifier (ETC): ETC is similar to a RF but distinguishes itself by splitting the data using random thresholds for each feature. This builds multiple decision trees and selects the best splits, thereby introducing additional randomness into the model. This randomness can improve the generalization ability and resilience of the model to noise.

XGBoost (XGB): XGBoost is an optimized gradient boosting library tailored for efficient processing of large datasets. It sequentially builds decision trees, with each subsequent tree aimed at correcting the errors made by the ensemble of the previous trees. XGBoost incorporates regularization terms to prevent overfitting and has become widely popular owing to its speed and performance in various ML tasks.

Support vector machines (SVM): SVMs are supervised learning models that aim to find a hyperplane in a feature space that best separates different classes of data points. SVM excels in high-dimensional spaces and is particularly useful for tasks in which clear decision boundaries are required.

Logistic regression (LR): LR is a linear classification algorithm frequently employed to predict binary outcomes (e.g., ADHD vs. HC). It estimates the probability of a sample belonging to a particular class by finding the linear decision boundary that best separates the data points. The model’s parameters are trained using optimization algorithms such as SGD, which iteratively adjusts the parameters to minimize the error between the predicted probabilities and true class labels.

AdaBoost: AdaBoost is an ensemble-learning method that sequentially combines weak classifiers. Each weak classifier aims to correct the errors of its predecessor, and instances that are misclassified in one round are assigned higher weights in subsequent rounds. This adaptiveness makes AdaBoost robust and adept at handling complex decision-making boundaries.

Classification and regression trees (CART): CART is a decision-tree algorithm utilized for both classification and regression tasks. It recursively divides the data into subsets based on the most discriminative features, forming a tree-like structure in which each leaf node corresponds to a class label or regression value.

Gradient boosting machines (GBM): GBMs are an ensemble technique that constructs decision trees sequentially. Each tree is trained to rectify errors from the ensemble of previously built trees. GBM excels at capturing intricate relationships in data and is often used for regression and classification tasks.

Stacked classifier (meta-classifier): The final layer of the ensemble, referred to as the stacked classifier, is a MLP in this study. It takes predictions from 10 base classifiers as input features and learns to make a final prediction, harnessing the diverse information provided by the individual classifiers.

Stacked classifiers, also known as stacked ensembles, is an ML approach that combines the predictions of multiple base classifiers to improve the overall performance. In this study, ten diverse classifiers were employed, and their outputs were fed into a meta-classifier, specifically an MLP, to form a stacked classifier. This ensemble strategy aims to leverage the strengths of individual classifiers and enhance predictive accuracy and robustness.

2.5 Performance Metrics

Because no single metric can capture all aspects of classifier performance, it is important to evaluate a classifier using several complementary metrics. In this study, we used the following five performance metrics.

Accuracy: Accuracy measures the overall correctness of the model’s predictions and is calculated as the ratio of correctly predicted instances to total instances.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Precision: The precision evaluates the model’s accuracy in identifying positive instances, indicating the proportion of true-positive predictions among all instances predicted to be positive.

$$\text{Precision} = \frac{TP}{TP + FP}.$$

Sensitivity: Sensitivity assesses the model’s ability to capture all positive instances and is defined as the ratio of true positives to the sum of true positives and false negatives.

$$\text{Sensitivity} = \frac{TP}{TP + FN}.$$

Specificity: This measures the model’s accuracy in identifying negative instances and is calculated as the ratio of true negatives to the sum of true negatives and false positives.

$$\text{Specificity} = \frac{TN}{TN + FP}.$$

F1-score: The F1-score combines precision and sensitivity, providing a balanced metric that considers both false positives and false negatives. It is calculated as the harmonic mean of the precision and sensitivity.

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}.$$
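These definitions can be collected into a small helper; the sketch below is for illustration only and takes raw confusion-matrix counts as input.

```python
# Computes the five metrics defined above from raw confusion counts.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also known as recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Usage: classification_metrics(tp=..., tn=..., fp=..., fn=...)
```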

3. Results & Discussion

The performance comparison results of the ten ML classifiers for the ADHD classification task are shown in Figure 4. The stacked classifier exhibited the best performance, with 92% accuracy, 91% precision, 93% sensitivity, 93% specificity, and an F1-score of 92%.

Furthermore, based on the confusion matrix shown in Figure 5, the classifier detected 56 TP, 5 FP, 5 FN, and 55 TN. Five-fold cross-validation was applied, and the average of each performance metric was reported. These results suggest that the stacked classifier captured more complex patterns in the data than the other classifiers did.
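As a quick consistency check, plugging these aggregate counts into the accuracy formula of Section 2.5 gives

$$\text{Accuracy} = \frac{56 + 55}{56 + 55 + 5 + 5} = \frac{111}{121} \approx 0.917,$$

in line with the reported 92%; the remaining metrics were averaged over the five folds, so they need not coincide exactly with values derived from these aggregate counts.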

The XGB and ETC classifiers also demonstrated strong performance, achieving accuracies of 88% and 79%, respectively. Both are ensemble-learning methods that combine the predictions of multiple individual learners to produce a final prediction, which partly explains their good performance.

In contrast, the other classifiers exhibited lower performances, with accuracies ranging from 54% to 79%.

Overall, the results indicate that ensemble-learning methods hold promise for ADHD classification: the stacked classifier achieved the best performance, and the XGB and ETC also performed well.

Here are some additional observations from the results:

  • The stacked classifier had the highest precision and specificity, indicating that it was very good at correctly identifying both ADHD-positive cases and HC.

  • The XGB classifier exhibited the highest sensitivity, indicating a strong ability to detect ADHD cases, even at the cost of some false positives.

  • The MLP and SVM classifiers had the lowest performance, suggesting that they were less able than the other classifiers to learn the complex patterns in the ADHD data.

  • Varied MI differences between ADHD and HC were observed across a variety of brain regions, suggesting that ADHD is a complex neurodevelopmental disorder that affects multiple brain regions.

  • The direction of the MI differences was not always consistent, suggesting that there is no single neural signature for ADHD.

  • The observed MI differences were relatively small, suggesting that ADHD is a heterogeneous disorder with a wide range of presentations.

In conclusion, our findings offer important insights into the neural mechanisms underlying ADHD. The results indicate that MI could be used to develop more accurate diagnostic tools for ADHD, and that a multimodal approach to ADHD diagnosis and treatment may be most effective.

4. Conclusion

This study extensively explored the use of ML techniques for the classification of children with ADHD and HC using EEG data. By employing MI-based feature selection, dimensionality reduction via PCA, and a stacked classifier ensemble, we significantly improved diagnostic accuracy. The robust performance of the individual classifiers, particularly the XGBoost model, highlights the potential of advanced ML algorithms to discern subtle patterns in EEG data associated with ADHD. The stacked classifier exhibits superior accuracy, emphasizing the benefits of combining diverse classifiers to enhance diagnostic outcomes. Our findings contribute to the growing body of literature on ADHD diagnosis, showcasing the effectiveness of ML methodologies and emphasizing the importance of feature selection and ensemble learning strategies. Further research on larger and more diverse datasets is needed to consolidate and extend our findings, with the ultimate goal of refining ADHD diagnosis and advancing our understanding of neurodevelopmental disorders.

This study showcased the potential of utilizing minimum MI-based features and a stacked classifier for ADHD diagnosis. However, the balanced and preprocessed nature of the dataset may limit the generalizability of the findings to real-world clinical settings. Future investigations should evaluate the performance of stacked classifiers on imbalanced datasets that reflect wider patient variations. Furthermore, exploring the integration of different neuroimaging modalities with EEG features could offer valuable avenues for enhancing diagnostic accuracy and deepening our comprehension of ADHD.

Figure 1. Proposed stacked classifier-based framework.


Figure 2. Topographic maps of minimum mutual information values between EEG features and ADHD diagnosis, comparing ADHD and HC.


Figure 3. Cumulative variance explained by principal components after feature selection using MI.


Figure 4. Performance comparison of the stacked classifier with other individual ML models for ADHD classification.


Figure 5. Confusion matrix of the stacked classifier for ADHD classification.


