Original Article
International Journal of Fuzzy Logic and Intelligent Systems 2023; 23(4): 482-499

Published online December 25, 2023

https://doi.org/10.5391/IJFIS.2023.23.4.482

© The Korean Institute of Intelligent Systems

Hilbert-Schmidt Independence Criterion Lasso Feature Selection in Parkinson’s Disease Detection System

Wiharto Wiharto , Ahmad Sucipto, and Umi Salamah

Department of Informatics, Universitas Sebelas Maret, Surakarta, Indonesia

Correspondence to: Wiharto Wiharto (wiharto@staff.uns.ac.id)

Received: July 4, 2022; Accepted: December 22, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Parkinson’s disease is a neurological disorder that interferes with human activities. Early detection is needed to facilitate treatment before the symptoms worsen. Earlier detection efforts used patients’ vocal recordings as a comparison with normal subjects. However, detection using vocal recordings still has weaknesses: the voice contains a great deal of information that is not necessarily relevant to a detection system. Previous studies proposed feature selection methods for the detection system, but the proposed methods cannot handle variation in the amount of data, including imbalances between samples and features and between classes. To address these problems, the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) feature selection method is used; its kernel-based feature transformation can produce more relevant features. In addition, the detection system uses the Synthetic Minority Oversampling Technique (SMOTE) to balance the data and several classification methods, namely k-nearest neighbors, support vector machine, and multilayer perceptron, to obtain the best predictive model. HSIC Lasso selects 18 of 45 features with an accuracy of 88.34% on a small sample and 50 of 754 features with an accuracy of 96.16% on a large sample. Compared with previous studies, these results show that HSIC Lasso is more suitable for balanced data with more samples and features.

Keywords: Parkinson’s disease, Early detection, Vocal voice, HSIC Lasso

1. Introduction

Parkinson’s disease is a chronic and progressive neurodegenerative disorder characterized by motor and non-motor dysfunction of the human nervous system [1]. This neurodegenerative disorder occurs in dopamine-producing cells in the substantia nigra of the midbrain [2]. When dopamine-producing cells, also known as dopaminergic cells, degenerate, dopamine levels drop. Dopamine is a neurotransmitter that binds to G protein receptors in the dopaminergic signaling system, which is very important for physiological processes and the balance of activities of the human body [3]. The motor dysfunction of Parkinson’s disease is characterized by muscle weakness and stiffness, slurred speech, and weight loss. Meanwhile, non-motor dysfunction in Parkinson’s disease involves cognitive changes, sleep disturbances, and psychiatric symptoms [4]. As these symptoms worsen, they affect all of a person’s activities and become difficult to treat. Researchers therefore continue to pursue early detection to prevent the progression of Parkinson’s disease and make it easier for clinicians to treat people with the disease.

Previous Parkinson’s disease detection studies have often used patients’ voices as a comparison with normal individuals. Sakar et al. [5] reported that 90% of people with Parkinson’s disease have problems with their voices, and demonstrated this by testing the pronunciation of letters, numbers, and words by people with Parkinson’s disease and by normal individuals. The test results show that the pronunciation of vowels provides more discriminatory information in cases of Parkinson’s disease. This is because the vocal noise that characterizes the disease is heard more clearly in vowel sounds [6]. However, the use of human vocal sounds still has weaknesses in a detection system.

The human voice carries a lot of information that is not necessarily useful for the detection process. Little et al. [7] extracted voice signals based on frequency, duration, and noise. Meanwhile, Sakar et al. [8] transformed voice signals with the wavelet transform and extracted them based on frequency and noise. Both used a machine learning approach, because it shortens the time needed to analyze and observe the development of a disease [9]. Moreover, machine learning makes it easier to automatically process large amounts of data, including attributes of vocal sounds [10]. The extracted sound measures in those studies served as data attributes for the learning process of a machine learning detection system. The results of both studies show that not all attributes make a significant contribution to the learning process. This motivates a method for choosing the vocal extraction attributes that are useful for the detection system.

The process of selecting attributes, or feature selection, is very useful for improving the performance of a detection system. In the same case, Sakar et al. [8] tested the minimum redundancy maximum relevance (MRMR) feature selection method, which uses a Parzen window estimator to calculate feature correlations in the data. However, the Parzen window estimator is unreliable when the number of training samples is small, and it yields less relevant features. Gunduz [11] used the ReliefF method to filter features based on feature scores. However, ReliefF performs poorly on data with unbalanced classes and likewise produces features that are less relevant for the detection system. The feature selection methods applied in previous studies cannot cope with variations in the amount of data, including the ratio of samples to features and the imbalance of samples between Parkinson’s disease and normal subjects. Therefore, a feature selection method is needed that produces relevant features under such variations in the amount of data.

Yamada et al. [12] presented the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) feature selection method, which transforms features into kernel form to obtain more relevant features. The resulting kernel representation has a sparsity that improves the calculation of feature correlation values. Damodaran et al. [13] used HSIC Lasso to select high-dimensional image features and obtained good prediction performance. Therefore, this study tests the HSIC Lasso feature selection method in a Parkinson’s disease detection system, with the expectation of producing vocal features that remain relevant under variations in the amount of data.

This study is related to other research through the methods and materials used in the research process. These links include using the same feature selection, oversampling, and classification methods, or applying them to the same case. The datasets used are also taken from studies with the same case and relatively similar forms of feature extraction. The datasets are from Little et al. [7], Naranjo et al. [14], and Sakar et al. [8]. The research of Little et al. [7] introduced a new technique for analyzing voices in Parkinson’s disease using recurrence and fractal scaling; the results show that recurrence and fractal scaling are effective for this classification task. Naranjo et al. [14] tested replicated voice recordings to differentiate healthy people from people with Parkinson’s disease using a subject-based Bayesian approach; the proposed system is able to distinguish cases even with a small sample. Sakar et al. [8] examined tunable Q-factor wavelet transform feature extraction of voice signals in cases of Parkinson’s disease; the proposed method performs better than previously applied techniques.

The feature selection method used in this study is taken from Yamada et al. [12], who proved an optimal solution for determining kernel-relevant features with HSIC Lasso; the method is effective in classification and regression with thousands of features. Meanwhile, the oversampling and classification used in this study follow research that applied feature selection to the same case and datasets. For oversampling, the synthetic minority oversampling technique (SMOTE) was also applied by Hoq et al. [15], who applied artificial intelligence to Parkinson’s disease by extracting vocal features with principal component analysis and a sparse autoencoder combined with synthetic minority oversampling; the sparse autoencoder gave the better detection performance. For classification, Ouhmida et al. [16] used the k-nearest neighbors (k-NN) method to analyze the performance of machine learning in identifying Parkinson’s disease from speech disorders, together with MRMR and ReliefF feature selection; ReliefF gave the better performance. Abdullah et al. [17] applied machine learning to a Parkinson’s disease diagnosis system based on voice recordings, with Lasso feature selection and support vector machine (SVM) classification; the model outperformed the previous method. Meanwhile, Ramchoun et al. [18] introduced a new approach to optimizing network architecture with the multilayer perceptron classification method; the architecture showed high effectiveness compared to previously applied models.

Feature selection research using HSIC Lasso for the detection of Parkinson’s disease is carried out by building a detection system with a machine learning approach. The system is built in several stages: feature selection, data balancing with oversampling, and classification to build predictive models. The stages are shown in Figure 1. To obtain the optimal prediction model, a hyper-parameter tuning stage is applied to the classification model. The data are divided into two parts, training data and testing data: the prediction model is fitted on the training data, and the remaining testing data are used to evaluate it. The data split follows the evaluation technique used in the related research, so that the results of the prediction model in this study can be compared with that research. The evaluation techniques are a 75% training / 25% testing split for the Naranjo et al. [14] dataset, 10-fold cross-validation for the Little et al. [7] dataset, and leave-one-subject-out cross-validation for the Sakar et al. [8] dataset.
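As a concrete illustration of this pipeline, the following minimal sketch shows the data split stage in Python with scikit-learn. It is not the authors’ code; the feature matrix, labels, and file names are hypothetical placeholders.

```python
# Minimal sketch of the evaluation setup, assuming scikit-learn is
# available. X (features) and y (labels) are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.load("vocal_features.npy")   # hypothetical feature matrix
y = np.load("labels.npy")           # 1 = Parkinson's disease, 0 = normal

# Naranjo et al. [14] protocol: 75% training, 25% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Oversampling and feature selection are fitted on the training split
# only; the tuned classifier is evaluated once on the held-out split.
```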

3.1 Dataset Description

The datasets used in this study were taken from the UCI (University of California, Irvine) machine learning repository. Data were collected by selecting the same case (Parkinson’s disease), relatively similar features (extracted from the voices of people with Parkinson’s disease), and the same sampling method. The datasets that meet these prerequisites are shown in Table 1.

The data acquisition is classified as a physical test, recording the vowel “a” three times per subject, except for the Little et al. [7] dataset, which took six samples per subject. The vowel “a” already represents all vowel pronunciations [7]. Voice capture uses a microphone with the sampling frequency set at 44.1 kHz. The results of the sound extraction are grouped into the feature subsets described in Table 2.

3.2 Oversampling

Hoq et al. [15] used a method called SMOTE to solve the problem of unbalanced data in machine learning. The SMOTE method oversamples the minority class: new samples are synthesized between a minority-class sample and one of its nearest minority-class neighbors, with the neighbor chosen at random. In Figure 2, the synthetic data (red triangles) generated from the minor class (blue triangles) make the two classes balanced.

The datasets of Sakar et al. [8] and Little et al. [7] have a ratio of people with Parkinson’s disease to normal individuals of roughly 3:1, so both datasets are unbalanced. This can cause the predictive model to over-fit one class (a situation in which the model performs too well on the training data) and ignore the minor class. Therefore, the authors test oversampling with SMOTE to add synthetic data to the unbalanced datasets. The method selects neighboring samples from the minor class and balances the data ratio to 1:1. Oversampling is carried out only on the training data, because the resulting synthetic data are not appropriate for evaluating the model [15].
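A minimal sketch of this step with the imbalanced-learn library is shown below; the variable names continue the earlier sketch, and the printed counts are illustrative rather than taken from the paper.

```python
# SMOTE sketch with imbalanced-learn: oversample only the training
# split so that synthetic samples never leak into evaluation.
from collections import Counter
from imblearn.over_sampling import SMOTE

print(Counter(y_train))                 # e.g. an approximate 3:1 imbalance
smote = SMOTE(random_state=42)          # interpolates between minority neighbors
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
print(Counter(y_train_bal))             # classes are now balanced 1:1
```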

3.3 Feature Selection

The least absolute shrinkage and selection operator (Lasso) is an operator used for selection and shrinkage by the smallest absolute values [19]. Lasso optimizes the problem by driving the regression coefficients of irrelevant and/or redundant features (features that have a low correlation with the error function) exactly to, or close to, 0. The Lasso optimization problem is as follows:

$$\text{Lasso} = \min_{\alpha \in \mathbb{R}^d} \frac{1}{2} \left\| y - X\alpha \right\|_2^2 + \lambda \left\| \alpha \right\|_1,$$

where $\alpha = [\alpha_1, \ldots, \alpha_d]'$ is the regression coefficient vector, $\alpha_k$ is the regression coefficient of the $k$-th feature, $\lambda > 0$ is the regularization parameter, $\|\cdot\|_1$ is the $L_1$-norm, $\|\cdot\|_2$ is the $L_2$-norm, and $[\cdot]'$ denotes the matrix transpose. The optimization of the regression coefficients by Lasso is depicted in Figure 3 as a rectangular area, while the ellipse in Figure 3 is the boundary of the least-squares error function.
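To make the selection behavior concrete, the short sketch below fits a standard scikit-learn Lasso and reads off the surviving features. The regularization value is an arbitrary illustration, not one used in the paper.

```python
# Lasso as a feature selector: coefficients of weakly relevant features
# are driven exactly to zero. Sketch with scikit-learn.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X_train)   # standardize features
lasso = Lasso(alpha=0.05)                         # alpha plays the role of lambda
lasso.fit(X_std, y_train)
selected = np.flatnonzero(lasso.coef_)            # indices with nonzero coefficients
print(f"kept {selected.size} of {X_std.shape[1]} features")
```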

Yamada et al. [12] proposed the HSIC Lasso method, an alternative implementation of the nonlinear feature-wise Lasso that obtains sparsity over features by forming Hilbert-Schmidt reproducing kernels. The optimization problem for HSIC Lasso is shown in Eq. (2).

$$\text{HSIC Lasso} = \min_{\alpha \in \mathbb{R}^d} \frac{1}{2} \left\| \bar{L} - \sum_{k=1}^{d} \alpha_k \bar{K}^{(k)} \right\|_{\text{Frob}}^2 + \lambda \left\| \alpha \right\|_1,$$

where $\|\cdot\|_{\text{Frob}}$ is the Frobenius norm, $\bar{K}^{(k)} = \Gamma K^{(k)} \Gamma$ and $\bar{L} = \Gamma L \Gamma$ are centered Gram matrices, $K^{(k)}_{i,j} = K(x_{k,i}, x_{k,j})$ and $L_{i,j} = L(y_i, y_j)$ are Gram matrices, $K(x, x')$ and $L(y, y')$ are kernel functions, $\Gamma = I_n - \frac{1}{n} \mathbf{1}_n \mathbf{1}_n^{\top}$ is the centering matrix, $I_n$ is the $n$-dimensional identity matrix, and $\mathbf{1}_n$ is the $n$-dimensional vector of ones.

In Figure 4, the input and target data are first normalized using a z-score (centering by the mean and scaling by the standard deviation) to facilitate the learning process. HSIC Lasso then uses the Gaussian kernel for the input transformation $K$, and also for the target transformation $L$ when the target is a regression case. If the target is a classification case, that is, the target data type is categorical, the delta kernel is used instead:

$$\text{Gaussian kernel: } K(x, x') = \exp\left( -\frac{(x - x')^2}{2\sigma_x^2} \right), \qquad \text{Delta kernel: } L(y, y') = \begin{cases} \dfrac{1}{n_y}, & \text{if } y = y', \\ 0, & \text{otherwise}, \end{cases}$$

where $\sigma_x$ is the standard deviation of the data and $n_y$ is the number of samples in class $y$. The $\Gamma$ centering matrix is then applied. The last stage is optimization using least angle regression to shrink the regression coefficients; features whose regression coefficient is greater than 0 are the selected features.
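The following NumPy sketch illustrates these transforms: a per-feature Gaussian Gram matrix, a delta Gram matrix for class labels, and double centering with $\Gamma$. It is an illustration of the formulas above, not the authors’ implementation, and the bandwidth choice is an assumption.

```python
# Kernel transforms used by HSIC Lasso: Gaussian kernel for one input
# feature, delta kernel for class labels, and centering with
# Gamma = I_n - (1/n) * 1 * 1^T. Illustrative NumPy only.
import numpy as np

def gaussian_gram(x, sigma):
    """K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for a single feature."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def delta_gram(y):
    """L[i, j] = 1 / n_y if y_i == y_j (n_y = size of that class), else 0."""
    same = (y[:, None] == y[None, :]).astype(float)
    counts = np.bincount(y)            # assumes integer class labels 0, 1, ...
    return same / counts[y][:, None]

def center(G):
    n = G.shape[0]
    gamma = np.eye(n) - np.ones((n, n)) / n
    return gamma @ G @ gamma           # centered Gram matrix: Gamma G Gamma

x_k = X_train[:, 0]                                   # k-th feature column
K_bar = center(gaussian_gram(x_k, x_k.std()))         # centered input Gram
L_bar = center(delta_gram(y_train.astype(int)))       # centered label Gram
```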

Least-angle regression (LARS) is a step-by-step procedure used to simplify the computation of all types of Lasso cases [20]. This method will estimate the Lasso regression coefficient in stages until the feature has a strong correlation with the output, so that in practice it is not necessary to manually adjust the regularization parameter in the HSIC Lasso method.

In each iteration of the LARS algorithm, a calculation is performed on the active feature index set $A \subseteq \{1, 2, \ldots, m\}$, with $m$ the number of features, to obtain the regression coefficients in Eq. (2). The stages of each LARS iteration for estimating the HSIC Lasso regression coefficients are described below:

  • Step 1: Find the correlation vector between the input and the target, which have been transformed into the kernel forms $\bar{K}$ and $\bar{L}$ of Eq. (2). Let $\hat{c}$ be the correlation vector and $\hat{\mu}$ the estimator, initialized as $\hat{\mu} = 0$:

    $$\hat{c} = \bar{K}(\bar{L} - \hat{\mu}).$$

  • Step 2: Determine the highest absolute correlation $\hat{C}$ among the features $j \in A$:

    $$\hat{C} = \max_j \{ |\hat{c}_j| \}.$$

  • Step 3: Compute the quantities $G_A$ and $A_A$, where $\mathbf{1}_A$ is a vector of ones of length $|A|$:

    $$G_A = X_A^{\top} X_A \quad \text{and} \quad A_A = \left( \mathbf{1}_A^{\top} G_A^{-1} \mathbf{1}_A \right)^{-1/2}.$$

  • Step 4: Calculate the equiangular vector $u_A$, which makes an equal angle (always less than 90°) with each column of the data features $X_A$:

    $$u_A = X_A w_A, \quad \text{with} \quad w_A = A_A G_A^{-1} \mathbf{1}_A.$$

  • Step 5: Calculate the inner products $a$ of the features with $u_A$:

    $$a \equiv X^{\top} u_A.$$

  • Step 6: Update the estimator on the active set $A$:

    $$\hat{\mu}_{A^+} = \hat{\mu}_A + \hat{\gamma} u_A,$$

    where

    $$\hat{\gamma} = \min^{+}_{j \in A^c} \left\{ \frac{\hat{C} - \hat{c}_j}{A_A - a_j}, \; \frac{\hat{C} + \hat{c}_j}{A_A + a_j} \right\}.$$

    In Eq. (11), $\min^{+}$ takes the minimum over positive values only, across indices $j$ outside the active set $A$. At the last index, $\hat{\gamma}$ uses the following equation instead:

    $$\hat{\gamma}_m = \frac{\hat{C}_m}{A_m}.$$

After the iterations end, features are selected based on the $\mu$ estimator. In the LARS algorithm, the estimator yields the coefficients $\hat{\mu} = \alpha$; since the feature selection condition in HSIC Lasso is $\alpha > 0$, the selected features satisfy $\hat{\mu} > 0$. The regularization parameter $\lambda$ of HSIC Lasso is taken from the highest correlation value $\hat{C}$ at the last index.
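In practice the full LARS path can be traced with a standard solver; the sketch below uses scikit-learn’s lars_path on the plain standardized features from the earlier sketch. Applying it to the vectorized centered Gram matrices, as HSIC Lasso does, is left implicit here; this is an illustration, not the paper’s implementation.

```python
# LARS sketch: lars_path traces the entire Lasso solution path, so the
# regularization parameter need not be tuned manually (Steps 1-6 above).
import numpy as np
from sklearn.linear_model import lars_path

alphas, active, coefs = lars_path(X_std, y_train.astype(float), method="lasso")
final = coefs[:, -1]                        # coefficients at the end of the path
print("selected features:", np.flatnonzero(final))
```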

In this study, feature selection uses the HSIC Lasso method with parameters B and M: B is the size of the partitions into which the training data are divided, and M is the number of permutations used in the bagging process over HSIC Lasso samples. The authors set B = 10 and M = 5. Dividing the data into several partitions improves the computational performance of the LARS algorithm. In addition, since the output data are categorical, the output kernel in HSIC Lasso is set to the delta kernel. The LARS algorithm is executed automatically after the input and output data are transformed into kernels, and the optimization produces information such as the list of selected feature indexes, the regression coefficient of each feature, and the regularization parameter value.
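A sketch of this step with the open-source pyHSICLasso package follows; the paper does not name its implementation, so this package and its argument names are an assumption of the sketch.

```python
# Hedged sketch using the pyHSICLasso package: select features with the
# block parameters B=10 and M=5 from the text; the delta kernel is used
# for categorical targets in classification mode.
from pyHSICLasso import HSICLasso

hsic = HSICLasso()
hsic.input(X_train_bal, y_train_bal)     # numeric feature matrix + class labels
hsic.classification(50, B=10, M=5)       # top 50 features via block bagging
selected_idx = hsic.get_index()          # indices of the selected features
print(selected_idx)
```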

3.4 Tuning Classification Model

The processed data will be used as training material for prediction models for people with Parkinson’s disease. In machine learning, the data learning stage is contained in the classification process. This study uses several classification methods which are described in the section below.

1) K-nearest neighbors

The basic concept of k-NN is classification based on the nearest neighbors [21]. The k-NN method has two stages: first, determining the nearest neighbors, usually with a distance metric; second, determining the class from those neighbors, with the dominant class among them deciding the predicted class. Figure 5 shows the process of determining the 5 nearest neighbors; the dominant class (blue triangle objects) defines the prediction.
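As a sketch, a scikit-learn k-NN classifier with the settings that Table 9 reports as best could look like this, continuing the earlier variables (not the authors’ code):

```python
# k-NN sketch: predict by the majority class among the nearest training
# samples, as illustrated in Figure 5. Settings follow Table 9.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, weights="distance", metric="euclidean")
knn.fit(X_train_bal[:, selected_idx], y_train_bal)
print(knn.score(X_test[:, selected_idx], y_test))   # accuracy on held-out data
```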

2) Support vector machine

SVM is a classification method that maps input vectors into a higher-dimensional space [22]. SVM forms a hyperplane (a boundary that separates the two classes) with a margin on each side of the separated data (see Figure 6). This classification model has performed well in previous studies, depending on how the parameter settings affect the generalization error (the measure of the algorithm’s accuracy in predicting unseen data).
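A corresponding scikit-learn sketch, again borrowing the Table 9 settings purely for illustration:

```python
# SVM sketch: the kernel implicitly maps inputs to a higher dimension
# where a separating hyperplane is fitted (Figure 6).
from sklearn.svm import SVC

svm = SVC(C=1.0, kernel="poly", gamma="auto", probability=True)
svm.fit(X_train_bal[:, selected_idx], y_train_bal)
```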

3) Multilayer perceptron

The multilayer perceptron (MLP) is a development of artificial neural networks (ANN) that classifies by learning patterns in the data [18]. An MLP has neurons organized in layers, each fully connected to the next (see Figure 7); between the input and output there is at least one hidden layer. The number of input neurons equals the number of input patterns and the number of output neurons equals the number of classes, while the number of hidden layers can be set freely.
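A scikit-learn sketch of the MLP with the best configuration reported in Table 9 (one hidden layer of 400 neurons):

```python
# MLP sketch: one hidden layer of 400 neurons; solver, learning-rate
# schedule, and iteration budget follow Table 9.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(solver="adam", hidden_layer_sizes=(400,),
                    learning_rate="adaptive", learning_rate_init=0.0001,
                    max_iter=3000)
mlp.fit(X_train_bal[:, selected_idx], y_train_bal)
```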

In this study, several classification methods were used to examine the vocal datasets of patients with Parkinson’s disease, with the aim of producing the detection system with the best performance. Each classification method has different hyper-parameters, which the authors tune over the ranges described in Table 3. Tuning is evaluated with 10-fold cross-validation on the area under the ROC curve (AUC). In the scikit-learn package, this setup, which tries all combinations of hyper-parameters, is called grid search cross-validation (GridSearchCV).
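A sketch of the tuning stage for one classifier; the grid mirrors the k-NN row of Table 3, and the stratified folds reflect the equal per-fold class ratios mentioned later in the text:

```python
# GridSearchCV sketch over the Table 3 ranges for k-NN, scored by AUC
# with stratified 10-fold cross-validation.
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": [1, 3, 5],
              "weights": ["uniform", "distance"],
              "metric": ["euclidean", "minkowski"]}
search = GridSearchCV(
    KNeighborsClassifier(), param_grid, scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42))
search.fit(X_train_bal[:, selected_idx], y_train_bal)
print(search.best_params_, search.best_score_)
```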

3.5 Evaluation

Testing compares combinations of the oversampling and/or feature selection methods against models that use neither, evaluated on the classification performance parameters. Testing then proceeds with the predetermined classifiers, whose settings are evaluated on the AUC parameter; the classification model with the best AUC score after hyper-parameter tuning is used for further evaluation. Validation of the prediction models uses 25% testing data, 10-fold cross-validation, or leave-one-subject-out cross-validation, in accordance with the previous research. The best prediction model for each dataset is compared with previous studies that use the same dataset and validation technique.
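Leave-one-subject-out cross-validation can be expressed with scikit-learn’s LeaveOneGroupOut, using subject identifiers as groups; the subject_ids array below is a hypothetical placeholder, since each subject contributes several recordings:

```python
# Leave-one-subject-out CV sketch: every fold holds out all recordings
# of one subject. `subject_ids` is a hypothetical per-sample ID array.
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

logo = LeaveOneGroupOut()
scores = cross_val_score(search.best_estimator_, X[:, selected_idx], y,
                         groups=subject_ids, cv=logo)
print("mean held-out accuracy:", scores.mean())
```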

The results of the prediction model are evaluated based on the confusion matrix, from which classification performance measures are calculated: accuracy, sensitivity, precision, F1 score, and the AUC-ROC curve. These parameters are explained below:

1) Accuracy

Accuracy is the fraction of predictions that are correct over the entire population. The formula for accuracy is shown below:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
2) Sensitivity

Sensitivity, also known as recall, is the fraction of actual positives that are predicted correctly. The formula for sensitivity is shown below:

$$\text{sensitivity} = \frac{TP}{TP + FN}.$$
3) Precision

Precision is the fraction of predicted positives that are actually positive. The formula for precision is shown below:

$$\text{precision} = \frac{TP}{TP + FP}.$$
4) F1 score

The harmonic mean of sensitivity and precision is called the F1 score. The formula for the F1 score is shown below:

$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}.$$
5) AUC-ROC

The area under the receiver operating characteristic curve (AUC-ROC) is a probability curve and performance measure commonly used for binary classification problems. In the ROC curve of Figure 8, the x-coordinate is the false positive rate and the y-coordinate is the true positive rate. The AUC value, which ranges from 0 to 1, is used to analyze classification performance: the closer the AUC is to 1, the better the prediction model.
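All five measures are available in scikit-learn; a sketch, assuming the fitted MLP from the earlier sketches:

```python
# Computing the evaluation measures above with scikit-learn.
from sklearn.metrics import (accuracy_score, recall_score,
                             precision_score, f1_score, roc_auc_score)

y_pred = mlp.predict(X_test[:, selected_idx])
y_prob = mlp.predict_proba(X_test[:, selected_idx])[:, 1]
print("accuracy   :", accuracy_score(y_test, y_pred))
print("sensitivity:", recall_score(y_test, y_pred))
print("precision  :", precision_score(y_test, y_pred))
print("F1 score   :", f1_score(y_test, y_pred))
print("AUC        :", roc_auc_score(y_test, y_prob))
```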

The accuracy parameter and the number of features are used to compare the prediction model with related studies on the same dataset. The AUC score is used to evaluate the tuning process of the predictive model. The ROC curves for both classes, with a value of 1 for people with Parkinson’s disease and 0 for normal individuals, are used to evaluate the performance of the final prediction model.

4.1 Model without Proposed Method

The authors first test the effect of the HSIC Lasso feature selection and SMOTE oversampling methods on the prediction model by evaluating the model without the HSIC Lasso SMOTE method, using the accuracy, sensitivity, precision, F1, and AUC parameters on all selected datasets. The evaluation technique for each dataset follows the related research: 10-fold cross-validation (CV) for the Little et al. [7] dataset, 25% testing data for the Naranjo et al. [14] dataset, and leave-one-subject-out cross-validation (LOSOCV) for the Sakar et al. [8] dataset.

The prediction model on the Little et al. [7] dataset, shown in Table 4, produces good AUC values for the k-NN and MLP classification methods, while the AUC of the SVM method lags by roughly 10%. Judging by the sensitivity and precision parameters, however, the SVM model is not far behind the other classifiers.

The Naranjo et al. [14] dataset shows that all three classification methods produce accuracy below 80%, with MLP as the best classifier in this model (see Table 5). Likewise, the AUC values remain far from 1. This predictive model is still not suitable for a Parkinson’s disease detection system. Here, 25% testing data is used to evaluate the model without HSIC Lasso SMOTE.

For the prediction model using the Sakar et al. [8] dataset, the sensitivity of each model is better than its precision (see Table 6). This dataset has more samples of people with Parkinson’s disease than of normal individuals, which leads to over-fitting on the Parkinson’s disease class.

4.2 Proposed Method on Unbalanced Data

The authors test the unbalanced data in the Little et al. [7] and Sakar et al. [8] datasets to determine the effect of unbalanced data on the prediction model. In both datasets, the normal class has fewer samples than the Parkinson’s disease class. The performance evaluated at this stage is the accuracy, sensitivity, precision, F1, and AUC scores of all the classification models that have been determined.

The prediction model using the k-NN classification in Table 7 shows better results than the model without HSIC Lasso feature selection in Table 4, and its AUC is very close to 1. However, the other classification methods show a slight decrease of about 2% in accuracy. The number of features selected in this dataset is 8 out of 22, a considerably smaller feature set for the HSIC Lasso k-NN model. The selected features are spread1 (variation of the first frequency spread), spread2 (variation of the second frequency spread), DFA (signal fractal scaling exponent), MDVP:Fhi (Hz) (maximum vocal fundamental frequency), MDVP:Fo (Hz) (average vocal fundamental frequency), HNR (harmonics-to-noise ratio), RPDE (nonlinear dynamical complexity measure), and PPE (nonlinear measure of fundamental frequency variation).

The HSIC Lasso prediction model on the Sakar et al. [8] dataset again yields better sensitivity than precision (see Table 8); this dataset is unbalanced, with more people with Parkinson’s disease than normal individuals. The model uses 60 of the 754 features, and its AUC also remains far from 1. Compared with the model without HSIC Lasso, the HSIC Lasso model improves the SVM classifier, while the other classification methods show a slight decrease in accuracy.

4.3 Proposed Method with Oversampling

This study applies oversampling, feature selection, and classifier tuning in stages. Oversampling the unbalanced datasets brings the ratio of people with Parkinson’s disease to normal individuals to 1:1. On the Little et al. [7] dataset, oversampling raises the minor class of normal individuals from 48 to 147 samples, so the training data contain 294 samples in total; likewise, the Sakar et al. [8] dataset goes from 192 samples of normal individuals to 564, bringing the total to 1,128 samples. Oversampling is done only on the training data, so its result depends on the data split of each evaluation technique. In the feature selection stage, the training data are used by the HSIC Lasso method to select features that are relevant and not redundant. Feature selection produces the selected feature indexes and the regression coefficient values that serve as feature weights; unselected features have a regression coefficient of zero.

The HSIC Lasso method, which selects features by correlation, ranks the regression coefficient values when choosing relevant features. From the Little et al. [7] dataset of 22 features, 8 features were selected. The most important feature in this dataset is spread1 (variation of the first frequency spread), followed by spread2 (variation of the second frequency spread), as shown in Figure 9; the other selected features are basic frequency features such as DFA, MDVP:Fhi, MDVP:Fo, HNR, RPDE, and PPE. Compared with the feature selection on the unbalanced data, the selected features in the Little et al. [7] dataset have the same type and number, showing that the data imbalance does not affect the selected features for this dataset.

The selected features consist of four time frequency features and four baseline voice features. The comparison between selected and total features is shown in Figure 10. Most of the discarded features are in the baseline subset, which has a selected-feature ratio of 4:17, while the time frequency subset has a ratio of 4:5. In the Little et al. [7] dataset, the time frequency subset is thus very influential in the prediction model. However, this does not establish that the subset is the most important feature group in vocal signals generally; the finding holds only for the data collected by Little et al. [7].

Meanwhile, in the Naranjo et al. [14] dataset, which has 45 features, the same 18 features were selected in each evaluation with a different classifier. The selected features are shown in Figure 11. The feature with the highest similarity value is HNR_35 (the harmonics-to-noise ratio in the 0–3,500 Hz frequency range), which belongs to the baseline voice feature subset; the remaining features trail HNR_35 by a wide margin.

Of the 18 selected features, 14 belong to the mel frequency subset and four to the baseline voice subset. Figure 12 shows that 12 of the 26 mel frequency features were discarded as irrelevant or redundant, as were 15 of the 19 baseline voice features. The Naranjo et al. [14] dataset thus mostly discards the baseline voice subset, making the mel frequency subset the more important one for the prediction model.

In the Sakar et al. [8] dataset, which has 754 features, 50 features were selected using HSIC Lasso. In contrast to the other data, the Sakar et al. [8] dataset has vocal fold, wavelet transform, and TQWT (tunable Q-factor wavelet transform) feature subsets. The four features with the highest similarity values are tqwt_TKEO_std_dec_12 (standard deviation of the 12th-level Teager-Kaiser energy operator), tqwt_kurtosisValue_dec_26 (kurtosis of the signal distribution at wavelet level 26), tqwt_kurtosisValue_dec_20 (kurtosis of the signal distribution at wavelet level 20), and tqwt_entropy_shannon_dec_12 (Shannon entropy of the 12th-level wavelet transform probability distribution).

As seen in Figure 13, most of the selected features belong to the TQWT subset: 36 of its 432 features are selected. The remaining selected features comprise six mel frequency features, one baseline voice feature, three vocal fold features, one time frequency feature, and three wavelet transform features. The wavelet transform subset has the smallest ratio of selected to total features, 1:182, compared with 1:12 for the TQWT subset. This shows that the Sakar et al. [8] dataset has one especially important feature subset, namely TQWT.

The features chosen in the feature selection stage are then used in the tuning stage of each classification method; the tuning results are shown in Table 9. Performance in tuning is evaluated on the AUC parameter. The setup uses the GridSearchCV method with 10 folds, with the class ratio kept the same in each fold.

Then, the prediction model with the best hyper-parameters in each classification method is used for evaluation using testing data and evaluation techniques according to previous studies. Evaluation uses the parameters of accuracy, sensitivity, precision, F1, and AUC. In addition, there is also an ROC curve to determine a good classification method for the prediction model.

Table 10 shows that the HSIC Lasso SMOTE model produces better performance than the model without oversampling and feature selection. The best accuracy was obtained by the MLP classification method, a difference of 8.08% over the model without HSIC Lasso SMOTE. However, the model with SVM classification yields lower accuracy despite a better AUC. Balancing the data with SMOTE also raises the accuracy of the MLP method by 5.50%, although for the other classification methods SMOTE has no significant effect.

Figure 14 shows good ROC curves for the k-NN and MLP classification methods, especially MLP, whose AUC reaches 99.70%. The AUC of the SVM method is far below the other classifiers. One panel of the figure targets the class of normal individuals (index 0); there, the SVM method tends to predict people with Parkinson’s disease better than normal individuals. The same holds for the panel targeting the class of people with Parkinson’s disease (index 1). These ROC curves indicate that MLP is the best choice for the prediction model.

No oversampling was carried out on the Naranjo et al. [14] dataset, because its classes are already balanced and SMOTE would have no impact. The HSIC Lasso prediction model on the Naranjo et al. [14] dataset performs better than the model without feature selection (see Table 11). The best accuracy was obtained by the MLP classification method, a difference of 8.67% over the model without HSIC Lasso. The k-NN method, by contrast, shows poor results, with performance far below the other classification methods.

However, the ROC curves in Figure 15 show that the AUC values of the three classification methods are not much different. The MLP method has the highest AUC at 88.33%, and its ability to predict the class of people with Parkinson’s disease is the same as for normal individuals. The SVM method follows with an AUC of 86.89%, with comparable ability on both classes, and the k-NN method has an AUC of 86.03%, even though its accuracy is the lowest of the three. This indicates that the MLP method is also the best predictor for the Naranjo et al. [14] dataset.

Meanwhile, Table 12 shows that the HSIC Lasso SMOTE model improves accuracy over the model without HSIC Lasso SMOTE by 15.78% for k-NN, 18.26% for SVM, and 15.06% for MLP, with MLP again the best classification method. The sensitivity and precision values of each classifier are close to one another, showing that all three methods predict both classes in a balanced way. Balancing the data with SMOTE also improves the performance of the prediction model.

The ROC curves of this prediction model show good results for all three classification methods; the highest AUC, 99.22%, belongs to the MLP method (see Figure 16), confirming that MLP is also the best model for the Sakar et al. [8] dataset. Across the three datasets, the prediction model with MLP classification performs well: its sensitivity and precision are close to each other, and its F1 score is fairly high. Compared with the other datasets, the Naranjo et al. [14] dataset has fairly low performance whether or not the HSIC Lasso method is used.

4.4 Discussion

In the preceding experiments, HSIC Lasso provides better performance than the prediction model without it. Moreover, balancing the Little et al. [7] and Sakar et al. [8] datasets with SMOTE strengthens the HSIC Lasso method’s evaluation of the data during feature selection. The authors now compare the HSIC Lasso method with related studies that use the same datasets and feature selection.

Compared with other studies, the HSIC Lasso prediction model produces fewer features and better accuracy on the Little et al. [7] dataset. The feature subset used in this study is contained within those used in the related studies [16, 23, 24] (see Table 13). Although the features used are relatively the same, especially as in Ouhmida et al. [16], classification with the k-NN method produces lower accuracy here, even though the k-NN model was tuned with the best hyper-parameters. The MLP classification method, however, matches or exceeds the accuracy of the other studies. Besides using the same dataset, this comparison rests on the same evaluation, 10-fold cross-validation.

In the Naranjo et al. [14] dataset, the HSIC Lasso method does not produce fewer features than other studies [17, 25] (see Table 14). The features used in those studies are of the same type and are contained in the subset used here. Their method is also the root of HSIC Lasso, namely the Lasso feature selection method. As explained earlier, plain Lasso does not transform data into kernel form, so there is no kernel-induced sparsity to strengthen the correlation between features and output. Nevertheless, for the Naranjo et al. [14] dataset, with 45 features, 240 samples, and balanced classes, the HSIC Lasso model is less suitable for a detection system; even so, the MLP method produces better performance than the other studies, which use fewer features. This dataset does not use the oversampling method, because the data are already balanced and oversampling would have no effect.

The HSIC Lasso selection on the Sakar et al. [8] dataset yields a number of features comparable to related work relative to the total features in the dataset (see Table 15). The related studies do not show in detail which features were selected; however, Sakar et al. [8] report the distribution of features grouped as in Table 2. Table 16 compares the feature distribution of this study with that of the MRMR SVM-RBF model of Sakar et al. [8].

Regardless of the feature distribution, the proposed predictive model yields better performance than the other studies, especially the HSIC Lasso SMOTE MLP method, which exceeds the study of Sakar et al. [8], with the same number of features, by 10.18%. On this dataset, the SMOTE method greatly affects the performance of the prediction model: without SMOTE, the prediction performance is not much different from the other studies. In every model variation tested, HSIC Lasso gives the best accuracy when handling feature selection on very large amounts of data, and in every case tested it reduces features effectively, providing better performance with a minimal number of features.

The proposed method has proven successful in selecting vocal voice features and performs well in a Parkinson’s disease detection system. Compared with the prediction models of related research, the HSIC Lasso method can exceed their accuracy and produce fewer features under some conditions, depending on the data tested in previous studies. The variation of the data includes the number of samples and features; the balance of samples between the two classes also affects the HSIC Lasso calculation when determining relevant, non-redundant features. HSIC Lasso has proven more effective for building predictive models on balanced data with larger numbers of features and samples.

The HSIC Lasso feature selection method is better suited to high-dimensional data: with more dimensions, the data transformed into a Hilbert-Schmidt kernel produce stronger similarity values between the feature data and the output data. A limitation of this study is that it does not consider other aspects of the relation between the data features and the subject’s condition. The authors generalize the subjects’ condition and ignore attributes such as age, gender, and the number of voice recordings per subject. These are potentially important when evaluating human voice data, because voices differ considerably across ages and genders.

No potential conflict of interest relevant to this article was reported.

Fig. 1.

Parkinson’s disease detection system flow.


Fig. 2.

Oversampling process using synthetic minority oversampling technique: (a) unbalanced dataset, (b) adding synthetic data from minority class, and (c) balanced dataset.


Fig. 3.

Lasso regression graph in case area where θ is error function.


Fig. 4.

HSIC Lasso method flow.


Fig. 5.

K-nearest neighbors method overview.


Fig. 6.

Support vector machine method overview.


Fig. 7.

Multilayer perceptron method overview.


Fig. 8.

ROC curve graph.


Fig. 9.

Selected feature scores for the Little et al. [7] dataset.


Fig. 10.

Distribution of selected features for the Little et al. [7] dataset.


Fig. 11.

Selected feature scores for the Naranjo et al. [14] dataset.


Fig. 12.

Distribution of selected features for the Naranjo et al. [14] dataset.


Fig. 13.

Distribution of selected features for the Sakar et al. [8] dataset.


Fig. 14.

ROC graph result: (a) targeting Parkinson’s disease class and (b) targeting normal class, Little et al. [7] dataset.


Fig. 15.

ROC graph result: (a) targeting Parkinson’s disease class and (b) targeting normal class, Naranjo et al. [14] dataset.


Fig. 16.

ROC graph result: (a) targeting Parkinson’s disease class and (b) targeting normal class, Sakar et al. [8] dataset.


Table 1. Details of data used.

Dataset | Number of features | Number of samples | Parkinson’s disease samples | Normal samples
Little et al. [7] | 22 | 195 | 147 | 48
Naranjo et al. [14] | 45 | 240 | 120 | 120
Sakar et al. [8] | 754 | 756 | 564 | 192

Table 2. Feature subset grouping.

Number | Feature subset | Description
I | Voice Baseline | Basic values of sound extraction
II | Vocal Fold | Vocal cords extraction
III | Time Frequency | Sound frequency-time extraction
IV | Mel Frequency | Short-term power spectrum coefficients
V | Wavelet Transform | Feature extraction from wavelet transform
VI | Tunable Q-factor Wavelet Transform | Feature extraction from wavelet transform with Q-factor

Table 3. Hyper-parameter tuning.

Classification | Hyper-parameters
k-NN | n-neighbors: [1, 3, 5], weight: [uniform, distance], metric: [euclidean, minkowski]
SVM | C: [0.1, 0.5, 1.0], kernel: [rbf, poly, sigmoid], gamma: auto
MLP | solver: [lbfgs, sgd, adam], hidden layer: [200, 300, 400], learning rate: [constant, invscaling, adaptive], iterations: 3000, learning rate init: 0.0001

Table 4. Performance test results without feature selection, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 91.28 | 91.28 | 91.50 | 91.36 | 98.20
SVM | 89.23 | 89.23 | 90.57 | 88.08 | 88.27
MLP | 90.76 | 90.76 | 90.57 | 90.55 | 97.03

Table 5. Performance test results without feature selection, Naranjo et al. [14] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 77.00 | 77.00 | 77.00 | 76.99 | 84.17
SVM | 78.67 | 78.67 | 78.89 | 78.75 | 86.98
MLP | 79.67 | 79.67 | 79.61 | 79.61 | 87.63

Table 6. Performance test results without feature selection, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 75.39 | 75.39 | 72.15 | 72.26 | 65.83
SVM | 75.00 | 75.00 | 81.27 | 64.67 | 93.38
MLP | 81.12 | 81.12 | 81.83 | 77.85 | 84.24

Table 7. Performance test results with HSIC Lasso, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 94.87 | 94.87 | 95.10 | 94.93 | 99.16
SVM | 93.33 | 93.32 | 93.90 | 93.47 | 96.71
MLP | 91.28 | 91.28 | 91.14 | 91.18 | 96.72

Table 8. Performance test results with HSIC Lasso, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 87.43 | 87.43 | 88.21 | 86.17 | 89.72
SVM | 87.43 | 87.43 | 87.05 | 87.02 | 90.13
MLP | 87.96 | 87.96 | 87.64 | 87.69 | 91.55

Table 9. Hyper-parameter test results.

Dataset | Classification | Hyper-parameters | AUC (%)
Little et al. [7] | k-NN | n-neighbor: 5, weight: distance, metric: euclidean | 98.13
Little et al. [7] | SVM | C: 1.0, kernel: poly, gamma: auto | 93.59
Little et al. [7] | MLP | solver: adam, hidden layer: 400, learning rate: adaptive, iterations: 3000, learning rate init: 0.0001 | 99.70
Naranjo et al. [14] | k-NN | n-neighbor: 5, weight: distance, metric: euclidean | 86.03
Naranjo et al. [14] | SVM | C: 0.5, kernel: poly, gamma: auto | 86.89
Naranjo et al. [14] | MLP | solver: adam, hidden layer: 400, learning rate: invscaling, iterations: 3000, learning rate init: 0.0001 | 88.33
Sakar et al. [8] | k-NN | n-neighbor: 5, weight: distance, metric: euclidean | 96.59
Sakar et al. [8] | SVM | C: 0.1, kernel: poly, gamma: auto | 97.72
Sakar et al. [8] | MLP | solver: adam, hidden layer: 400, learning rate: invscaling, iterations: 3000, learning rate init: 0.0001 | 99.22

Table 10. Performance test results with HSIC Lasso SMOTE, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 97.61 | 97.61 | 97.67 | 97.61 | 97.61
SVM | 96.93 | 96.93 | 97.04 | 96.93 | 98.63
MLP | 98.84 | 98.57 | 95.68 | 96.57 | 99.70

Table 11. Performance test results with HSIC Lasso SMOTE, Naranjo et al. [14] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 79.84 | 79.83 | 75.85 | 79.84 | 86.03
SVM | 83.33 | 83.33 | 83.37 | 83.32 | 86.89
MLP | 88.34 | 80.00 | 96.00 | 87.27 | 88.33

Table 12. Performance test results with HSIC Lasso SMOTE, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 91.17 | 91.17 | 92.82 | 91.70 | 96.59
SVM | 93.26 | 93.26 | 93.40 | 93.25 | 97.72
MLP | 96.18 | 96.18 | 96.27 | 96.18 | 99.22

Table 13. Comparison with previous studies, Little et al. [7] dataset.

Studies | Method | Number of features | Accuracy (%)
Ouhmida et al. [16] | ReliefF + k-NN | 8 | 97.92
Thapa et al. [23] | Correlation-based + TSVM | 13 | 93.90
Ma et al. [24] | RFE + SVM | 11 | 96.29
This study | SMOTE + HSIC Lasso + k-NN | 8 | 97.61
This study | SMOTE + HSIC Lasso + SVM | 8 | 96.93
This study | SMOTE + HSIC Lasso + MLP | 8 | 98.84

Table 14. Comparison with previous studies, Naranjo et al. [14] dataset.

Studies | Method | Number of features | Accuracy (%)
Abdullah et al. [17] | Lasso + SVM | 10 | 86.90
Naranjo et al. [25] | Correlation-based + Lasso + Bayesian | 10 | 86.20
This study | HSIC Lasso + k-NN | 18 | 75.83
This study | HSIC Lasso + SVM | 18 | 83.33
This study | HSIC Lasso + MLP | 18 | 88.34

Table 15. Comparison with previous studies, Sakar et al. [8] dataset.

Studies | Method | Number of features | Accuracy (%)
Gunduz [11] | ReliefF + VAE | 60 | 91.60
Sakar et al. [8] | MRMR + SVM-RBF | 50 | 86.00
This study | SMOTE + HSIC Lasso + k-NN | 50 | 91.17
This study | SMOTE + HSIC Lasso + SVM | 50 | 93.26
This study | SMOTE + HSIC Lasso + MLP | 50 | 96.18

Table 16. Distribution of Sakar et al. [8] dataset features, by feature subset (I–VI, as in Table 2).

Studies | Feature selection method | I | II | III | IV | V | VI
Sakar et al. [8] | MRMR | 4 | 3 | 2 | 10 | 1 | 30
This study | HSIC Lasso | 1 | 3 | 1 | 6 | 3 | 36

References

1. Jankovic, J (2008). Parkinson’s disease: clinical features and diagnosis. Journal of Neurology, Neurosurgery and Psychiatry. 79, 368-376. https://doi.org/10.1136/jnnp.2007.131045
2. Shafique, H, Blagrove, A, Chung, A, and Logendrarajah, R (2011). Causes of Parkinson’s disease: literature review. Journal of Parkinsonism & Restless Legs Syndrome. 1, 5-7. https://doi.org/10.2147/JPRLS.S37041
3. Klein, MO, Battagello, DS, Cardoso, AR, Hauser, DN, Bittencourt, JC, and Correa, RG (2019). Dopamine: functions, signaling, and association with neurological diseases. Cellular and Molecular Neurobiology. 39, 31-59. https://doi.org/10.1007/s10571-018-0632-3
4. Magrinelli, F, Picelli, A, Tocco, P, Federico, A, Roncari, L, Smania, N, Zanette, G, and Tamburin, S (2016). Pathophysiology of motor dysfunction in Parkinson’s disease as the rationale for drug treatment and rehabilitation. Parkinson’s Disease. 2016, article no. 9832839.
5. Sakar, BE, Isenkul, ME, Sakar, CO, Sertbas, A, Gurgen, F, Delil, S, Apaydin, H, and Kursun, O (2013). Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE Journal of Biomedical and Health Informatics. 17, 828-834. https://doi.org/10.1109/JBHI.2013.2245674
6. Sinder, DJ, Krane, MH, and Flanagan, JL (1998). Synthesis of fricative sounds using an aeroacoustic noise generation model. The Journal of the Acoustical Society of America. 103, 2775. https://doi.org/10.1121/1.421418
7. Little, MA, McSharry, PE, Roberts, SJ, Costello, DA, and Moroz, IM (2007). Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. BioMedical Engineering OnLine. 6, article no. 23.
8. Sakar, CO, Serbes, G, Gunduz, A, Tunc, HC, Nizam, H, and Sakar, BE (2019). A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing. 74, 255-263. https://doi.org/10.1016/j.asoc.2018.10.022
9. Kumar, L, and Singh, DK (2021). Analyzing computational response and performance of deep convolution neural network for plant disease classification using plant leave dataset. Proceedings of 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, pp. 549-553. https://doi.org/10.1109/CSNT51715.2021.9509632
10. Kumar, L, and Singh, DK (2023). A comprehensive survey on generative adversarial networks used for synthesizing multimedia content. Multimedia Tools and Applications. 82, 40585-40624. https://doi.org/10.1007/s11042-023-15138-x
11. Gunduz, H (2021). An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson’s disease classification. Biomedical Signal Processing and Control. 66, article no. 102452. https://doi.org/10.1016/j.bspc.2021.102452
12. Yamada, M, Jitkrittum, W, Sigal, L, Xing, EP, and Sugiyama, M (2014). High-dimensional feature selection by feature-wise kernelized Lasso. Neural Computation. 26, 185-207. https://doi.org/10.1162/NECO_a_00537
13. Damodaran, BB, Courty, N, and Lefevre, S (2017). Sparse Hilbert Schmidt independence criterion and surrogate-kernel-based feature selection for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 55, 2385-2398. https://doi.org/10.1109/TGRS.2016.2642479
14. Naranjo, L, Perez, CJ, Campos-Roca, Y, and Martin, J (2016). Addressing voice recording replications for Parkinson’s disease detection. Expert Systems with Applications. 46, 286-292. https://doi.org/10.1016/j.eswa.2015.10.034
15. Hoq, M, Uddin, MN, and Park, SB (2021). Vocal feature extraction-based artificial intelligent model for Parkinson’s disease detection. Diagnostics. 11, article no. 1076. https://doi.org/10.3390/diagnostics11061076
16. Ouhmida, A, Raihani, A, Cherradi, B, and Terrada, O (2021). A novel approach for Parkinson’s disease detection based on voice classification and features selection techniques. International Journal of Online and Biomedical Engineering. 17, 111-130. https://doi.org/10.3991/ijoe.v17i10.24499
17. Abdullah, AA, Norazman, NN, Ahmad, WKW, Awang, SA, and Jian, FW (2020). Detection of Parkinson’s disease (PD) based on speech recordings using machine learning techniques. Proceedings of 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT), Sakheer, Bahrain, pp. 1-6. https://doi.org/10.1109/3ICT51146.2020.9311991
18. Ramchoun, H, Ghanou, Y, Ettaouil, M, and Janati Idrissi, MA (2016). Multilayer perceptron: architecture optimization and training. International Journal of Interactive Multimedia and Artificial Intelligence. 4, 26-30. http://doi.org/10.9781/ijimai.2016.415
19. Tibshirani, R (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B: Statistical Methodology. 58, 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
20. Efron, B, Hastie, T, Johnstone, I, and Tibshirani, R (2004). Least angle regression. Annals of Statistics. 32, 407-499. https://doi.org/10.1214/009053604000000067
21. Cunningham, P, and Delany, SJ (2022). k-nearest neighbour classifiers: a tutorial. ACM Computing Surveys. 54, article no. 128. https://doi.org/10.1145/3459665
22. Hsu, CW, and Lin, CJ (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks. 13, 415-425. https://doi.org/10.1109/72.991427
23. Thapa, S, Adhikari, S, Ghimire, A, and Aditya, A (2020). Feature selection based twin-support vector machine for the diagnosis of Parkinson’s disease. Proceedings of 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), Kuching, Malaysia, pp. 1-6. https://doi.org/10.1109/R10-HTC49770.2020.9356984
24. Ma, H, Tan, T, Zhou, H, and Gao, T (2016). Support vector machine-recursive feature elimination for the diagnosis of Parkinson disease based on speech analysis. Proceedings of 2016 7th International Conference on Intelligent Control and Information Processing (ICICIP), Siem Reap, Cambodia, pp. 34-40. https://doi.org/10.1109/ICICIP.2016.7885912
25. Naranjo, L, Perez, CJ, Martin, J, and Campos-Roca, Y (2017). A two-stage variable selection and classification approach for Parkinson’s disease detection by using voice recording replications. Computer Methods and Programs in Biomedicine. 142, 147-156. https://doi.org/10.1016/j.cmpb.2017.02.019

Wiharto Wiharto received his Doctoral Degree from Gadjah Mada University, Yogyakarta, Indonesia, in 2017. He received a Bachelor of Electrical Engineering (B.E.) from Telkom University, Bandung, Indonesia, in 1999 and a master’s degree in computer science from Gadjah Mada University, Yogyakarta, Indonesia, in 2004. He is presently working as an assistant professor in the Department of Informatics, Faculty of Mathematics and Natural Sciences, Sebelas Maret University, Surakarta, Indonesia. His experience and areas of interest focus on artificial intelligence, clinical decision support systems, medical image processing, and biomedical data science.

Ahmad Sucipto received a Bachelor of Computer Science from the Faculty of Mathematics and Natural Sciences, Sebelas Maret University, Surakarta, Indonesia, in 2022. His research areas are data mining, artificial intelligence, and clinical decision support systems.

Umi Salamah received her Bachelor’s Degree from the Department of Mathematics, Universitas Sebelas Maret, Indonesia, in 1994. She received her Master’s and Doctoral Degrees from the Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember, Indonesia, in 2002 and 2018, respectively. Her research interests include fuzzy logic and systems, image processing, applied mathematics, and computational sciences.


Human vocal sound carries a great deal of information that is not necessarily useful for the detection process. Little et al. [7] extracted voice signals based on frequency, duration, and noise, while Sakar et al. [8] transformed voice signals with the wavelet transform and extracted them based on frequency and noise. Both used a machine learning approach, because it can shorten the time needed to analyze and observe the development of a disease [9]. Moreover, machine learning makes it easier to automatically process large amounts of data, including vocal sound attributes [10]. The sound extractions in these studies were used as data attributes for the learning process of a machine learning detection system. The detection systems in both studies show that not all attributes make a significant contribution to the learning process. Given this problem, a method is needed to choose the attributes of vocal sound extraction that are useful for the detection system.

The process of selecting attributes, or feature selection, is very useful for improving the performance of the detection system. For the same case, Sakar et al. [8] tested the minimum redundancy maximum relevance (MRMR) feature selection method, which uses a Parzen window estimator to calculate feature correlations in the data. However, the Parzen window estimator is not reliable when the number of training samples is small, which results in less relevant features. Gunduz [11] used the ReliefF method to filter features based on feature scores. However, ReliefF shows poor performance on data with unbalanced classes and likewise produces features that are less relevant for the detection system. The feature selection methods applied in previous studies cannot handle variations in the amount of data, including an imbalance between the number of samples and the number of features, and an imbalance between the Parkinson's disease and normal classes. Therefore, a feature selection method is needed that produces features that remain relevant under such variations in the amount of data.

Yamada et al. [12] presented the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) feature selection method, which transforms features into kernel form to obtain more relevant features. The resulting kernel representation is sparse, which improves the calculation of feature correlation values. Damodaran et al. [13] used HSIC Lasso to successfully select high-dimensional image features and obtained good prediction performance. Therefore, this study tests the HSIC Lasso feature selection method in a Parkinson's disease detection system, which is expected to produce vocal sound features that remain relevant under variations in the amount of data.

2. Related Works

This study is related to other research through the methods and materials used in the research process. These links include using the same feature selection, oversampling, or classification methods, or applying them to the same case. The datasets used were also taken from studies with the same case and relatively similar feature extraction: the datasets of Little et al. [7], Naranjo et al. [14], and Sakar et al. [8]. The research of Little et al. [7] introduced a new technique for analyzing voices in Parkinson's disease based on recurrence and fractal scaling; the results show that recurrence and fractal scaling are effective for voice disorder detection. Naranjo et al. [14] tested replicated voice recordings that differentiate healthy people from people with Parkinson's disease using a subject-based Bayesian approach; the proposed system is able to distinguish cases even with a small sample. Sakar et al. [8] examined tunable Q-factor wavelet transform feature extraction of voice signals in cases of Parkinson's disease; the proposed method shows better performance than the previously applied techniques.

The feature selection method used in this study is taken from Yamada et al. [12], who prove the optimality of the kernel-based solution for determining relevant features with HSIC Lasso; the method is shown to be effective for classification and regression with thousands of features. Meanwhile, the oversampling and classification methods used here are related to research that applies feature selection to the same case and datasets. For oversampling, the synthetic minority oversampling technique (SMOTE) was also applied by Hoq et al. [15], who applied artificial intelligence to Parkinson's disease by extracting vocal sound features using principal component analysis and a sparse autoencoder with synthetic minority oversampling; the sparse autoencoder yielded better performance in the detection system. For classification, Ouhmida et al. [16] used the k-nearest neighbors (k-NN) method to analyze the performance of machine learning in identifying Parkinson's disease based on speech disorders, together with MRMR and ReliefF feature selection; ReliefF gave better performance. Abdullah et al. [17] applied machine learning in a Parkinson's disease diagnosis system based on voice recordings with Lasso feature selection and support vector machine (SVM) classification, producing better performance than the previous method. Meanwhile, Ramchoun et al. [18] introduced a new approach to optimizing the network architecture of the multilayer perceptron classification method, showing high effectiveness compared to previously applied models.

3. Methods

Feature selection research using HSIC Lasso for the detection of Parkinson's disease is carried out by building a detection system with a machine learning approach. The system is built in several stages: feature selection, data balancing with oversampling, and classification to build predictive models, as shown in Figure 1. To obtain the optimal prediction model, hyper-parameter tuning is carried out on the classification model. The data are divided into two parts: training data, used to set up the prediction model, and testing data, used to evaluate it. The data split follows the evaluation technique used in the related research for each dataset, so that the results of the prediction model in this study can be compared with that research: a 75% training / 25% testing split on the Naranjo et al. [14] dataset, 10-fold cross-validation on the Little et al. [7] dataset, and leave-one-subject-out cross-validation on the Sakar et al. [8] dataset.
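
As a minimal sketch of the first two splitting schemes with scikit-learn (the leave-one-subject-out scheme is sketched in Section 3.5), where the arrays X and y are placeholders standing in for the extracted vocal features and class labels:

    import numpy as np
    from sklearn.model_selection import train_test_split, StratifiedKFold

    X = np.random.rand(240, 45)        # placeholder features (Naranjo et al. shape)
    y = np.random.randint(0, 2, 240)   # placeholder labels (1 = Parkinson's disease)

    # Naranjo et al. [14] dataset: 75% training / 25% testing hold-out split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    # Little et al. [7] dataset: 10-fold cross-validation
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X, y):
        pass  # fit on X[train_idx], evaluate on X[test_idx]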

3.1 Dataset Description

The datasets used in this study were taken from the UCI (University of California, Irvine) machine learning repository. Data were collected by selecting the same case (Parkinson's disease), relatively similar features (extracted from the vocal recordings of people with Parkinson's disease), and the same sampling method. The datasets that meet these prerequisites are shown in Table 1.

The data acquisition is classified as a physical test, taking the vowel sound "a" three times per subject, except for the Little et al. [7] dataset, which took six samples per subject. The vowel "a" is taken to represent all vowel pronunciations [7]. Voice capture uses a microphone with the sampling frequency set at 44.1 kHz. The results of the sound extraction are grouped into the feature subsets described in Table 2.

3.2 Oversampling

Hoq et al. [15] used a method called SMOTE to solve the problem of unbalanced data in machine learning. SMOTE oversamples the minority class: each new sample is a synthetic point generated between a minority-class sample and one of its randomly selected nearest minority-class neighbors. In Figure 2, the synthetic data (red triangles) generated from the minority class (blue triangles) make the two classes balanced.

The datasets of Sakar et al. [8] and Little et al. [7] have a roughly 3:1 ratio of Parkinson's disease samples to normal samples, so both datasets are unbalanced. This can cause the predictive model to over-fit (perform too well during training) on one class and ignore the minority class. Therefore, the author tests oversampling using SMOTE to add synthetic data to the unbalanced datasets: the method selects neighboring samples from the minority class and balances the class ratio to 1:1. Oversampling is carried out only on the training data, because synthetic data are not appropriate for evaluating the model [15].
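
A minimal sketch of this balancing step using the imbalanced-learn package, on illustrative data with the same 3:1 imbalance:

    from collections import Counter
    import numpy as np
    from imblearn.over_sampling import SMOTE

    # Illustrative imbalanced training set (3:1, mimicking the PD datasets)
    X_train = np.random.rand(200, 22)
    y_train = np.array([1] * 150 + [0] * 50)

    smote = SMOTE(random_state=0)  # resamples the minority class to a 1:1 ratio
    X_bal, y_bal = smote.fit_resample(X_train, y_train)
    print(Counter(y_train), "->", Counter(y_bal))  # {1: 150, 0: 50} -> {1: 150, 0: 150}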

3.3 Feature Selection

The least absolute shrinkage and selection operator (Lasso) is an operator for selection and shrinkage by the smallest absolute values [19]. Lasso drives the regression coefficients of irrelevant and/or redundant features (features with a low correlation with the error function) to exactly zero or close to zero. The Lasso optimization problem is shown as follows:

\mathrm{Lasso} = \min_{\alpha \in \mathbb{R}^d} \frac{1}{2} \lVert y - X\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1,    (1)

where \alpha = [\alpha_1, \ldots, \alpha_d]' is the regression coefficient vector, \alpha_k is the regression coefficient of the k-th feature, \lambda > 0 is the regularization parameter, \lVert \cdot \rVert_1 is the L1-norm, \lVert \cdot \rVert_2 is the L2-norm, and [\cdot]' denotes the transpose. The Lasso constraint on the regression coefficients is depicted in Figure 3 as a rectangular area, while the ellipse in Figure 3 is the boundary line of the least-squares error function.
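
As a minimal illustration of Eq. (1), the following sketch fits scikit-learn's Lasso on synthetic data in which only two of ten features are relevant; the data and the alpha value (playing the role of \lambda) are illustrative only:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)  # 2 relevant features

    lasso = Lasso(alpha=0.1)  # alpha corresponds to lambda in Eq. (1)
    lasso.fit(X, y)
    print(np.round(lasso.coef_, 3))  # most coefficients are exactly 0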

Yamada et al. [12] proposed the HSIC Lasso method, an alternative implementation of a nonlinear feature-wise Lasso that aims to obtain sparsity in the features by forming a Hilbert-Schmidt reproducing kernel. The optimization problem for HSIC Lasso is shown in Eq. (2):

\mathrm{HSIC\ Lasso} = \min_{\alpha \in \mathbb{R}^d} \frac{1}{2} \Big\lVert \bar{L} - \sum_{k=1}^{d} \alpha_k \bar{K}^{(k)} \Big\rVert_{\mathrm{Frob}}^2 + \lambda \lVert \alpha \rVert_1,    (2)

where \lVert \cdot \rVert_{\mathrm{Frob}} is the Frobenius norm, \bar{K}^{(k)} = \Gamma K^{(k)} \Gamma and \bar{L} = \Gamma L \Gamma are centered Gram matrices, K^{(k)}_{i,j} = K(x_{k,i}, x_{k,j}) and L_{i,j} = L(y_i, y_j) are Gram matrices, K(x, x') and L(y, y') are kernel functions, \Gamma = I_n - \frac{1}{n} 1_n 1_n' is the centering matrix, I_n is the n-dimensional identity matrix, and 1_n is the n-dimensional vector of all ones.

In Figure 4, the input and target data are normalized using a z-score (centering with the mean and scaling with the standard deviation) to facilitate the learning process. HSIC Lasso then uses the Gaussian kernel for the input transformation K and, if the target is a regression case, for the target transformation L as well. If the target is a classification case, i.e., the target data type is a category, the delta kernel is used instead:

\text{Gaussian kernel:}\quad K(x, x') = \exp\!\left(-\frac{(x - x')^2}{2\sigma_x^2}\right), \qquad \text{Delta kernel:}\quad L(y, y') = \begin{cases} 1/n_y, & \text{if } y = y', \\ 0, & \text{otherwise,} \end{cases}    (3)

where \sigma_x is the standard deviation of the data and n_y is the number of samples in class y. The Gram matrices are then centered with the matrix \Gamma. The last stage is optimization using least-angle regression to shrink the regression coefficients; features whose regression coefficients are greater than 0 are the selected features.
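
The following is a small sketch, on illustrative data, of the kernel construction just described: a centered Gaussian-kernel Gram matrix for one input feature and a centered delta-kernel Gram matrix for the class labels. The helper names are ours, not part of any package:

    import numpy as np

    def centered_gram_gaussian(x):
        """Centered Gram matrix of one feature, Gaussian kernel (Eq. (3))."""
        n = len(x)
        sigma = x.std() + 1e-12
        K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
        Gamma = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        return Gamma @ K @ Gamma

    def centered_gram_delta(y):
        """Centered Gram matrix of class labels, delta kernel (Eq. (3))."""
        n = len(y)
        counts = {c: np.sum(y == c) for c in np.unique(y)}
        L = np.array([[1.0 / counts[yi] if yi == yj else 0.0 for yj in y] for yi in y])
        Gamma = np.eye(n) - np.ones((n, n)) / n
        return Gamma @ L @ Gamma

    x = np.random.rand(50)                 # one vocal feature (illustrative)
    y = np.random.randint(0, 2, 50)        # class labels
    K_bar, L_bar = centered_gram_gaussian(x), centered_gram_delta(y)
    # HSIC between the feature and the labels is trace(K_bar @ L_bar), up to scaling
    print(np.trace(K_bar @ L_bar))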

Least-angle regression (LARS) is a step-by-step procedure that simplifies the computation of all types of Lasso problems [20]. It estimates the Lasso regression coefficients in stages, at each stage bringing in the feature most strongly correlated with the output, so that in practice the regularization parameter of the HSIC Lasso method does not need to be tuned manually.

In each iteration of the LARS algorithm, calculations are performed on an active feature index set A, with A \subseteq \{1, 2, \ldots, m\} and m the number of features, to obtain the regression coefficients in Eq. (2). The stages of each LARS iteration for estimating the HSIC Lasso regression coefficients are described below:

  • Step 1: Compute the correlation vector between the input and the target, which have been transformed into the kernel forms \bar{K} and \bar{L} of Eq. (2). Let \hat{c} be the correlation vector and \hat{\mu} the estimator, initialized as \hat{\mu} = 0:

    \hat{c} = \bar{K}'(\bar{L} - \hat{\mu}).    (4)

  • Step 2: Determine the highest absolute correlation \hat{C} over the features j \in A:

    \hat{C} = \max_j \{ |\hat{c}_j| \}.    (5)

  • Step 3: Compute the quantities G_A and A_A, where 1_A is the vector of ones of length |A|:

    G_A = X_A' X_A,    (6)

    A_A = (1_A' G_A^{-1} 1_A)^{-1/2}.    (7)

  • Step 4: Compute the equiangular vector u_A, which makes equal angles, always less than 90°, with the columns of the feature matrix X_A:

    u_A = X_A w_A, \quad \text{with } w_A = A_A G_A^{-1} 1_A.    (8)

  • Step 5: Compute the inner-product vector a of the features with the equiangular vector:

    a \equiv X' u_A.    (9)

  • Step 6: Update the estimator on the active set A in the positive direction:

    \hat{\mu}_{A^+} = \hat{\mu}_A + \hat{\gamma} u_A,    (10)

    where

    \hat{\gamma} = \min^{+}_{j \in A^c} \left\{ \frac{\hat{C} - \hat{c}_j}{A_A - a_j}, \frac{\hat{C} + \hat{c}_j}{A_A + a_j} \right\}.    (11)

    In Eq. (11), \min^+ takes the minimum over positive values only, with the index j running over features outside the active set A. At the last index, \hat{\gamma} instead uses

    \hat{\gamma}_m = \hat{C}_m / A_m.    (12)

After the iterations end, the features are selected based on the estimator \hat{\mu}. In the LARS algorithm, the final estimator gives \hat{\mu} = \alpha; since the feature selection condition in HSIC Lasso is \alpha > 0, the selected features are those with \hat{\mu} > 0. The regularization parameter \lambda of HSIC Lasso is taken from the highest correlation value \hat{C} at the last index.
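
As a sketch of this stagewise behavior, scikit-learn's lars_path traces the Lasso coefficients over the whole regularization path at once, so no manual tuning of \lambda is needed; the data here are synthetic:

    import numpy as np
    from sklearn.linear_model import lars_path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((80, 6))
    y = 2.0 * X[:, 2] - 1.5 * X[:, 4] + 0.1 * rng.standard_normal(80)

    # alphas: lambda values along the path; active: order features enter the model
    alphas, active, coefs = lars_path(X, y, method="lasso")
    print("order in which features enter the model:", active)
    print("coefficients at the end of the path:", np.round(coefs[:, -1], 2))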

In this study, feature selection uses the HSIC Lasso method, which has parameters B and M: B is the size of the partitions into which the training data are divided, and M is the number of permutations for the bagging process over the HSIC Lasso samples. The author sets B = 10 and M = 5. Dividing the data into partitions improves the computational performance of the LARS algorithm. In addition, the kernel for the categorical output data in HSIC Lasso is set to the delta kernel. The LARS algorithm runs automatically after the input and output data are transformed into kernels, and the optimization produces the list of selected feature indices, the regression coefficient of each feature, and the regularization parameter value.
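
A usage sketch with the open-source pyHSICLasso package (an implementation of the method of [12]) might look as follows; the argument names (num_feat, B, M) follow that package's interface and should be checked against its documentation, and the data are placeholders:

    import numpy as np
    from pyHSICLasso import HSICLasso

    X = np.random.rand(150, 45)           # placeholder vocal features
    y = np.random.randint(0, 2, 150)      # placeholder class labels

    hsic = HSICLasso()
    hsic.input(X, y)
    hsic.classification(num_feat=18, B=10, M=5)  # delta kernel for class targets
    print(hsic.get_index())               # indices of the selected features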

3.4 Tuning Classification Model

The processed data are used as training material for prediction models of people with Parkinson's disease. In machine learning, this learning stage is the classification process. This study uses the classification methods described below.

1) K-nearest neighbors

The basic concept of k-NN is classification based on the nearest neighbors [21]. The method has two stages: first determining the nearest neighbors, then determining the class from those neighbors; the dominant class among the neighbors becomes the predicted class. Neighbor selection is usually done by measuring a distance metric. Figure 5 shows the process of determining the 5 nearest neighbors: the dominant class (blue triangle objects) defines the prediction.
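
A minimal sketch of this majority-vote rule on toy data (the predicted class is the majority class among the 5 nearest training samples under Euclidean distance, as in Figure 5):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # class 0 cluster
                        [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])  # class 1 cluster
    y_train = np.array([0, 0, 0, 1, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    knn.fit(X_train, y_train)
    print(knn.predict([[0.95, 1.0]]))  # -> [1], the majority of its 5 neighbors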

2) Support vector machine

SVM is a classification method that maps input vectors into a higher-dimensional space [22]. SVM forms a hyperplane (a boundary that separates the two classes) between the divided data (see Figure 6). This classification model has been tested well in previous studies, although its generalization error (a measure of the algorithm's accuracy in predicting unseen data) depends on the parameter settings.

3) Multilayer perceptron

The multilayer perceptron (MLP) is a development of artificial neural networks (ANN) for classification by learning data patterns [18]. An MLP has neurons organized in layers and connected from one layer to the next (see Figure 7), with at least one hidden layer between the input and output. The size of the input layer equals the number of input patterns and the size of the output layer equals the number of classes, while the number of hidden layers can be set freely.

In this study, several classification methods were used to examine the vocal datasets of patients with Parkinson's disease, in order to produce the detection system with the best performance. Each classification method has different hyper-parameters, which the author tunes over the ranges described in Table 3. The tuning is evaluated using 10-fold cross-validation on the area under the ROC curve (AUC). In the scikit-learn package, this exhaustive search over all hyper-parameter combinations is called grid search cross-validation (GridSearchCV).
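
A sketch of this tuning stage for the MLP grid of Table 3, scored by AUC with 10-fold cross-validation (the data are placeholders):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(200, 18)           # placeholder selected features
    y = np.random.randint(0, 2, 200)

    param_grid = {
        "solver": ["lbfgs", "sgd", "adam"],
        "hidden_layer_sizes": [(200,), (300,), (400,)],
        "learning_rate": ["constant", "invscaling", "adaptive"],
    }
    mlp = MLPClassifier(max_iter=3000, learning_rate_init=0.0001, random_state=0)
    search = GridSearchCV(mlp, param_grid, scoring="roc_auc", cv=10)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 4))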

3.5 Evaluation

Testing compares models that combine oversampling and/or feature selection against models that use neither, evaluated on the classification performance parameters. Tuning is then performed with the chosen classification methods and evaluated on the AUC parameter; the classification model with the best AUC score is used, with its tuned hyper-parameters, for further evaluation. Validation of the prediction models uses 25% held-out testing, 10-fold cross-validation, and leave-one-subject-out cross-validation, in accordance with the corresponding previous studies (a leave-one-subject-out sketch is shown below). The best prediction model for each dataset is compared with previous studies that use the same dataset and validation technique.
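
A sketch of leave-one-subject-out cross-validation with scikit-learn's LeaveOneGroupOut, which holds out all recordings of one subject per fold; the subject IDs here are illustrative:

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.neighbors import KNeighborsClassifier

    X = np.random.rand(30, 10)
    y = np.random.randint(0, 2, 30)
    subjects = np.repeat(np.arange(10), 3)   # 10 subjects, 3 recordings each

    logo = LeaveOneGroupOut()
    scores = []
    for tr, te in logo.split(X, y, groups=subjects):
        model = KNeighborsClassifier(n_neighbors=3).fit(X[tr], y[tr])
        scores.append(model.score(X[te], y[te]))
    print(np.mean(scores))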

The results of the prediction model are evaluated with the confusion matrix, from which classification performance measures such as accuracy, sensitivity, precision, F1 score, and the AUC-ROC curve are calculated. These parameters are explained below:

1) Accuracy

Accuracy is the fraction of all predictions, over the entire population, that are correct. The formula for accuracy is shown below:

\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
2) Sensitivity

Sensitivity, also known as recall, is the fraction of actual positives that are correctly predicted. The formula for sensitivity is shown below:

\mathrm{sensitivity} = \frac{TP}{TP + FN}.
3) Precision

Precision is the fraction of predicted positives that are actually positive. The formula for precision is shown below:

\mathrm{precision} = \frac{TP}{TP + FP}.
4) F1 score

The F1 score is the harmonic mean of sensitivity and precision. The formula for the F1 score is shown below:

F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}.
5) AUC-ROC

The area under the receiver operating characteristic curve (AUC-ROC) is a probability curve and performance measure commonly used for binary classification problems. In the ROC curve of Figure 8, the x-coordinate is the false positive rate and the y-coordinate is the true positive rate. The AUC value, which ranges from 0 to 1, is used to analyze classification performance: the closer the AUC is to 1, the better the prediction model.

The accuracy parameter and the number of features are used to compare the prediction model with related studies on the same dataset. The AUC score is used to evaluate the tuning process, while the ROC curves of both classes (a value of 1 for people with Parkinson's disease and 0 for normal individuals) are used to evaluate the final prediction model.
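
A sketch of computing these metrics with scikit-learn, from illustrative predictions and predicted probabilities:

    import numpy as np
    from sklearn.metrics import (accuracy_score, recall_score,
                                 precision_score, f1_score, roc_auc_score)

    y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
    y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1])
    y_prob = np.array([0.9, 0.8, 0.4, 0.2, 0.1, 0.7, 0.6, 0.95])

    print("accuracy   :", accuracy_score(y_true, y_pred))
    print("sensitivity:", recall_score(y_true, y_pred))
    print("precision  :", precision_score(y_true, y_pred))
    print("F1 score   :", f1_score(y_true, y_pred))
    print("AUC        :", roc_auc_score(y_true, y_prob))   # needs probability scores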

4. Experiments

4.1 Model without Proposed Method

The author tests the effect of the HSIC Lasso feature selection method and the SMOTE oversampling method on the prediction model by first evaluating the model without HSIC Lasso and SMOTE, using the accuracy, sensitivity, precision, F1, and AUC parameters on all the specified datasets. The evaluation technique for each dataset follows the related research: 10-fold cross-validation (CV) for the Little et al. [7] dataset, 25% held-out data testing for the Naranjo et al. [14] dataset, and leave-one-subject-out cross-validation (LOSOCV) for the Sakar et al. [8] dataset.

The prediction model on the Little et al. [7] dataset, shown in Table 4, produces good AUC values for the k-NN and MLP classification methods, while the AUC of the SVM method trails by a fairly large gap of about 10%. Judging from the sensitivity and precision parameters, however, the SVM model is not far from the other classifiers.

The Naranjo et al. [14] dataset shows that all three classification methods produce accuracy below 80%, with MLP as the best classifier in this model (see Table 5). Likewise, the AUC parameter is still far from one, so this predictive model is not yet suitable for a Parkinson's disease detection system. The 25% held-out data testing is used to evaluate the model without HSIC Lasso and SMOTE.

For the prediction model on the Sakar et al. [8] dataset, the sensitivity of each model is better than its precision (see Table 6). This dataset has more data on people with Parkinson's disease than on normal individuals, a condition that indicates over-fitting on the Parkinson's disease class.

4.2 Proposed Method on Unbalanced Data

The author tests the unbalanced data in the Little et al. [7] and Sakar et al. [8] datasets to determine the effect of unbalanced data on the prediction model. In both datasets, the normal class has fewer samples than the Parkinson's disease class. The performance evaluated at this stage is the accuracy, sensitivity, precision, F1, and AUC scores for all the specified classification models.

The prediction model using k-NN in Table 7 shows better results than the model without HSIC Lasso feature selection in Table 4, and its AUC is very close to one; the other classification methods, however, show a slight decrease of about 2% in accuracy. The number of features selected for this dataset is 8 out of 22, which the HSIC Lasso k-NN model turns into slightly better performance. The selected features include spread1 (first frequency spread variation), spread2 (second frequency spread variation), DFA (signal fractal scaling exponent), MDVP:Fhi (Hz) (maximum vocal fundamental frequency), MDVP:Fo (Hz) (average vocal fundamental frequency), HNR (harmonics-to-noise ratio), RPDE (a nonlinear dynamical complexity measure), and PPE (a nonlinear measure of fundamental frequency variation).

Then, Lasso’s HSIC prediction model on the Sakar et al. [8] dataset resulted in a better sensitivity parameter value than the precision parameter value (see Table 8). The dataset is unbalanced data. The number of people with Parkinson’s is more than normal individuals. The features used in this HSIC Lasso model are 60 features out of 754 features. The AUC parameter value in this model is also far from the value 1. When compared with the model without using HSIC Lasso, the HSIC Lasso model shows an increase in the model using the SVM classification method. However, the other classification methods show a slight decrease in the accuracy parameter.

4.3 Proposed Method with Oversampling

This study applies oversampling, feature selection, and classifier tuning in stages. Oversampling the unbalanced datasets brings the ratio of Parkinson's disease to normal samples to 1:1. On the Little et al. [7] dataset, oversampling raises the minority class of normal individuals from 48 to 147 samples, for a total of 294 training samples; likewise, the Sakar et al. [8] dataset goes from 192 normal samples to 564, bringing the total to 1,128 samples. However, oversampling is done only on the training data, so the oversampled counts depend on the data split of each evaluation technique. In the feature selection stage, the training data are used by the HSIC Lasso method to select relevant, non-redundant features, producing the selected feature indices and the regression coefficient values that act as feature selection weights; unselected features have a regression coefficient of zero.

The HSIC Lasso method, which selects features based on correlation values, ranks features by their regression coefficients. From the 22 features of the Little et al. [7] dataset, 8 features were selected. The most important feature is spread1, followed by spread2, as shown in Figure 9, while the other selected features are the basic frequency features DFA, MDVP:Fhi, MDVP:Fo, HNR, RPDE, and PPE. Compared with the feature selection results on the unbalanced data, the selected features in the Little et al. [7] dataset are the same in type and number, showing that the class imbalance does not affect the selected features in this dataset.

The selected features consist of four time frequency features and four baseline voice features. The comparison between selected and total features is shown in Figure 10. Relative to subset size, the selection concentrates on the time frequency subset, with a selected-to-total ratio of 4:5, versus 4:17 for the baseline subset. In the Little et al. [7] dataset, the time frequency feature subset is thus very influential in the prediction model. However, this does not establish that this subset is the most important part of the vocal sound signal in general; it should be underlined that this holds only for the data collected by Little et al. [7].

Meanwhile, in the Naranjo et al. [14] dataset of 45 features, the same 18 features were selected in each evaluation with the different classifiers. The selected features are shown in Figure 11. The feature with the highest similarity score is HNR_35 (the harmonics-to-noise ratio in the 0-3,500 Hz frequency range), which belongs to the baseline voice feature subset; the remaining features score well below HNR_35.

Of the 18 selected features, 14 belong to the mel frequency subset and four to the baseline voice subset. Figure 12 shows that 12 of the 26 mel frequency features were discarded as irrelevant or redundant, as were 15 of the 19 baseline voice features. The Naranjo et al. [14] dataset thus mostly discards the baseline voice subset and makes the mel frequency subset the more important one for the prediction model.

In the Sakar et al. [8] dataset of 754 features, 50 features were selected using HSIC Lasso. In contrast to the other data, the Sakar et al. [8] dataset also has vocal fold, wavelet transform, and TQWT (tunable Q-factor wavelet transform) feature subsets. The four features with the highest similarity values are tqwt_TKEO_std_dec_12 (standard deviation of the 12th-level Teager-Kaiser energy operator), tqwt_kurtosisValue_dec_26 and tqwt_kurtosisValue_dec_20 (kurtosis of the signal distribution at the 26th and 20th wavelet transform levels), and tqwt_entropy_shanon_dec_12 (Shannon entropy of the 12th-level wavelet transform probability distribution).

As seen in Figure 13, most of the selected features belong to the TQWT subset, with 36 of its 432 features selected. The remaining selected features comprise six mel frequency features, one baseline voice feature, three vocal fold features, one time frequency feature, and three wavelet transform features. The wavelet transform subset has the smallest share of selected features, a selected-to-total ratio of 1:182, compared with roughly 1:12 for the TQWT subset. This shows that the Sakar et al. [8] dataset has an important feature subset, namely TQWT.

The features selected in the feature selection stage are used in the tuning stage for each classification method; the results are shown in Table 9. Performance in tuning is evaluated with the AUC parameter, using the GridSearchCV method with 10 folds and the same class ratio in each fold.

The prediction model with the best hyper-parameters for each classification method is then evaluated on the testing data with the evaluation techniques of the previous studies, using the accuracy, sensitivity, precision, F1, and AUC parameters, together with the ROC curve, to determine a good classification method for the prediction model.

Table 10 shows that, on the Little et al. [7] dataset, the HSIC Lasso SMOTE model performs better than the model without oversampling and feature selection. The best accuracy was obtained by the MLP classifier, 8.08% higher than the model without HSIC Lasso SMOTE. The SVM model, however, yields lower accuracy despite better AUC. Balancing the data with SMOTE also raises the accuracy of the MLP classifier by 5.50%, although for the other classifiers SMOTE has no significant effect.

Figure 14 shows good ROC curves for the k-NN and MLP classification methods, especially MLP with an AUC of 99.70%, while the AUC of the SVM method is fairly far from the other classifiers. One panel targets the class of normal individuals (index 0) and the other the class of people with Parkinson's disease (index 1); on both, the SVM method tends to predict people with Parkinson's disease better than normal individuals. These ROC curves indicate that the MLP model is the better choice for the prediction model.

No oversampling was carried out on the Naranjo et al. [14] dataset, because its classes are already balanced, so SMOTE would have no impact. The HSIC Lasso prediction model on this dataset performs better than the model without feature selection (see Table 11). The best accuracy was obtained by the MLP classifier, 8.67% higher than the model without HSIC Lasso. The k-NN method shows poor results, with a performance gap that is quite far from the other classification methods.

Looking at the ROC curves in Figure 15, however, the AUC values of the three classification methods are not far apart. MLP has the highest AUC at 88.33%, and its ability to predict the Parkinson's disease and normal classes is about the same. The SVM method follows with an AUC of 86.89%, and k-NN has an AUC of 86.03% despite having the lowest accuracy among the classifiers. This indicates that the MLP method is also good for the prediction model on the Naranjo et al. [14] dataset.

Meanwhile, Table 12 shows that the HSIC Lasso SMOTE model improves accuracy over the model without HSIC Lasso SMOTE by 15.78% for k-NN, 18.26% for SVM, and 15.06% for MLP, with MLP the best classifier. The sensitivity and precision values of each classification method are close to each other, showing that all three methods predict both classes in a balanced way; balancing the data with SMOTE also improves the performance of the prediction model.

On the ROC curves, the prediction model shows good results for all three classification methods, with MLP highest at an AUC of 99.22% (see Figure 16), again indicating that MLP is a good choice for the Sakar et al. [8] dataset. Across all three datasets, the prediction model with the MLP classifier performs well: sensitivity and precision are close to each other and the F1 score is fairly high. Compared with the other datasets, the Naranjo et al. [14] dataset has fairly low performance whether or not the HSIC Lasso method is used.

4.4 Discussion

From the preceding experiments, HSIC Lasso provides better performance than the prediction model without it. Moreover, using SMOTE to balance the Little et al. [7] and Sakar et al. [8] datasets strengthens the HSIC Lasso method's evaluation of the data in the feature selection process. The author compares the HSIC Lasso method with related studies that use the same datasets and feature selection methods.

Compared to other studies, the HSIC Lasso prediction model produces fewer features and better accuracy on the Little et al. [7] dataset. The feature subset used in this study is contained in the feature subsets used in the related studies [16, 23, 24] (see Table 13). Although the features used are relatively the same, especially compared with Ouhmida et al. [16], the k-NN classification in this study produces lower accuracy even with the best hyper-parameters, whereas the MLP classification method is able to match and exceed the accuracy of the other studies. Besides using the same dataset, this comparison evaluates the predictive models with the same 10-fold cross-validation.

On the Naranjo et al. [14] dataset, the HSIC Lasso method does not produce fewer features than the other studies [17, 25] (see Table 14). The features used in those studies are of the same type and are contained in the subset used here. Their method, plain Lasso, is also the root of the HSIC Lasso method; as explained earlier, Lasso does not transform the data into kernel form, so it lacks the kernel-induced sparsity that strengthens the correlation between features and output. In the Naranjo et al. [14] case of 45 features, 240 samples, and balanced classes, the HSIC Lasso model is therefore less suitable, even though the MLP method still produces better accuracy than the other studies, which use fewer features. This suggests that the HSIC Lasso model on the Naranjo et al. [14] dataset is not well suited for a detection system. No oversampling was used on this dataset, because the data are already balanced and SMOTE would have no effect.

On the Sakar et al. [8] dataset, the HSIC Lasso selection method yields a number of features comparable to other studies relative to the total number of features (see Table 15). The related studies on this dataset do not report in detail which features were selected, but Sakar et al. [8] report the distribution of selected features grouped as in Table 2; Table 16 compares this distribution with that of the MRMR SVM-RBF model of Sakar et al. [8].

Regardless of feature distribution, the proposed predictive model yields better performance than the other studies, especially the HSIC Lasso SMOTE MLP model, which exceeds the study of Sakar et al. [8], with the same number of features, by 10.18%. On this dataset the SMOTE method greatly affects the performance of the prediction model; without SMOTE, the performance is not much different from the other studies. In every model variation tested, HSIC Lasso gives the best accuracy when handling feature selection on very large amounts of data, and in every case tested it reduces the features effectively, providing better performance with a minimal number of features.

5. Conclusion

The proposed method has proven successful in selecting vocal voice features and performs well in a Parkinson's disease detection system. Compared with the prediction models of related research, the HSIC Lasso method can exceed their accuracy and produce fewer features under some conditions, depending on the data tested: the variation in the number of samples and features, and the balance between the two classes, affect the HSIC Lasso calculation of relevant, non-redundant features. HSIC Lasso has proven most effective for building predictive models on balanced data with larger numbers of features and samples.

The HSIC Lasso feature selection method is better suited to high-dimensional data: high-dimensional data transformed into Hilbert-Schmidt kernels produce stronger similarity values between the feature data and the output data. A limitation of this study is that it does not consider other aspects of the relation between the data features and the subject's condition: the author generalizes the subjects and ignores attributes such as age, gender, and the number of voice recordings per subject. These are potentially important when evaluating human vocal voice data, because the voices produced by people of different ages or genders are very different.

Figure 1. Parkinson's disease detection system flow.

Figure 2. Oversampling process using the synthetic minority oversampling technique: (a) unbalanced dataset, (b) adding synthetic data from the minority class, and (c) balanced dataset.

Figure 3. Lasso regression graph in the case area, where θ is the error function.

Figure 4. HSIC Lasso method flow.

Figure 5. K-nearest neighbors method overview.

Figure 6. Support vector machine method overview.

Figure 7. Multilayer perceptron method overview.

Figure 8. ROC curve graph.

Figure 9. Selected feature scores, Little et al. [7] dataset.

Figure 10. Distribution of selected features, Little et al. [7] dataset.

Figure 11. Selected feature scores, Naranjo et al. [14] dataset.

Figure 12. Distribution of selected features, Naranjo et al. [14] dataset.

Figure 13. Distribution of selected features, Sakar et al. [8] dataset.

Figure 14. ROC graph results: (a) targeting the Parkinson's disease class and (b) targeting the normal class, Little et al. [7] dataset.

Figure 15. ROC graph results: (a) targeting the Parkinson's disease class and (b) targeting the normal class, Naranjo et al. [14] dataset.

Figure 16. ROC graph results: (a) targeting the Parkinson's disease class and (b) targeting the normal class, Sakar et al. [8] dataset.

Table 1. Details of data used.

Dataset | Number of features | Number of samples | Parkinson's disease samples | Normal samples
Little et al. [7] | 22 | 195 | 147 | 48
Naranjo et al. [14] | 45 | 240 | 120 | 120
Sakar et al. [8] | 754 | 756 | 564 | 192

Table 2. Feature subset grouping.

Number | Feature subset | Description
I | Voice Baseline | Basic values of sound extraction
II | Vocal Fold | Vocal cord extraction
III | Time Frequency | Sound frequency-time extraction
IV | Mel Frequency | Short-term power spectrum coefficients
V | Wavelet Transform | Feature extraction from the wavelet transform
VI | Tunable Q-factor Wavelet Transform | Feature extraction from the wavelet transform with a Q-factor

Table 3. Hyper-parameter tuning.

Classification | Hyper-parameters
k-NN | n-neighbors: [1, 3, 5], weight: [uniform, distance], metric: [euclidean, minkowski]
SVM | C: [0.1, 0.5, 1.0], kernel: [rbf, poly, sigmoid], gamma: auto
MLP | solver: [lbfgs, sgd, adam], hidden layer: [200, 300, 400], learning rate: [constant, invscaling, adaptive], iterations: 3000, learning rate init: 0.0001

Table 4. Performance test results without feature selection, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 91.28 | 91.28 | 91.50 | 91.36 | 98.20
SVM | 89.23 | 89.23 | 90.57 | 88.08 | 88.27
MLP | 90.76 | 90.76 | 90.57 | 90.55 | 97.03

Table 5. Performance test results without feature selection, Naranjo et al. [14] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 77.00 | 77.00 | 77.00 | 76.99 | 84.17
SVM | 78.67 | 78.67 | 78.89 | 78.75 | 86.98
MLP | 79.67 | 79.67 | 79.61 | 79.61 | 87.63

Table 6. Performance test results without feature selection, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 75.39 | 75.39 | 72.15 | 72.26 | 65.83
SVM | 75.00 | 75.00 | 81.27 | 64.67 | 93.38
MLP | 81.12 | 81.12 | 81.83 | 77.85 | 84.24

Table 7. Performance test results with HSIC Lasso, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 94.87 | 94.87 | 95.10 | 94.93 | 99.16
SVM | 93.33 | 93.32 | 93.90 | 93.47 | 96.71
MLP | 91.28 | 91.28 | 91.14 | 91.18 | 96.72

Table 8. Performance test results with HSIC Lasso, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 87.43 | 87.43 | 88.21 | 86.17 | 89.72
SVM | 87.43 | 87.43 | 87.05 | 87.02 | 90.13
MLP | 87.96 | 87.96 | 87.64 | 87.69 | 91.55

Table 9. Hyper-parameter test results.

Dataset | Classification | Hyper-parameters | AUC (%)
Little et al. [7] | k-NN | n-neighbors: 5, weight: distance, metric: euclidean | 98.13
Little et al. [7] | SVM | C: 1.0, kernel: poly, gamma: auto | 93.59
Little et al. [7] | MLP | solver: adam, hidden layer: 400, learning rate: adaptive, iterations: 3000, learning rate init: 0.0001 | 99.70
Naranjo et al. [14] | k-NN | n-neighbors: 5, weight: distance, metric: euclidean | 86.03
Naranjo et al. [14] | SVM | C: 0.5, kernel: poly, gamma: auto | 86.89
Naranjo et al. [14] | MLP | solver: adam, hidden layer: 400, learning rate: invscaling, iterations: 3000, learning rate init: 0.0001 | 88.33
Sakar et al. [8] | k-NN | n-neighbors: 5, weight: distance, metric: euclidean | 96.59
Sakar et al. [8] | SVM | C: 0.1, kernel: poly, gamma: auto | 97.72
Sakar et al. [8] | MLP | solver: adam, hidden layer: 400, learning rate: invscaling, iterations: 3000, learning rate init: 0.0001 | 99.22

Table 10. Performance test results with HSIC Lasso SMOTE, Little et al. [7] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 97.61 | 97.61 | 97.67 | 97.61 | 97.61
SVM | 96.93 | 96.93 | 97.04 | 96.93 | 98.63
MLP | 98.84 | 98.57 | 95.68 | 96.57 | 99.70

Table 11. Performance test results with HSIC Lasso SMOTE, Naranjo et al. [14] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 79.84 | 79.83 | 75.85 | 79.84 | 86.03
SVM | 83.33 | 83.33 | 83.37 | 83.32 | 86.89
MLP | 88.34 | 80.00 | 96.00 | 87.27 | 88.33

Table 12. Performance test results with HSIC Lasso SMOTE, Sakar et al. [8] dataset (unit: %).

Classification method | Accuracy | Sensitivity | Precision | F1 | AUC
k-NN | 91.17 | 91.17 | 92.82 | 91.70 | 96.59
SVM | 93.26 | 93.26 | 93.40 | 93.25 | 97.72
MLP | 96.18 | 96.18 | 96.27 | 96.18 | 99.22

Table 13. Comparison with a number of previous studies, Little et al. [7] dataset.

Studies | Method | Number of features | Accuracy (%)
Ouhmida et al. [16] | ReliefF + k-NN | 8 | 97.92
Thapa et al. [23] | Correlation-based + TSVM | 13 | 93.90
Ma et al. [24] | RFE + SVM | 11 | 96.29
This study | SMOTE + HSIC Lasso + k-NN | 8 | 97.61
This study | SMOTE + HSIC Lasso + SVM | 8 | 96.93
This study | SMOTE + HSIC Lasso + MLP | 8 | 98.84

Table 14. Comparison with a number of previous studies, Naranjo et al. [14] dataset.

Studies | Method | Number of features | Accuracy (%)
Abdullah et al. [17] | Lasso + SVM | 10 | 86.90
Naranjo et al. [25] | Correlation-based + Lasso + Bayesian | 10 | 86.20
This study | HSIC Lasso + k-NN | 18 | 75.83
This study | HSIC Lasso + SVM | 18 | 83.33
This study | HSIC Lasso + MLP | 18 | 88.34

Table 15. Comparison with a number of previous studies, Sakar et al. [8] dataset.

Studies | Method | Number of features | Accuracy (%)
Gunduz [11] | ReliefF + VAE | 60 | 91.60
Sakar et al. [8] | MRMR + SVM-RBF | 50 | 86.00
This study | SMOTE + HSIC Lasso + k-NN | 50 | 91.17
This study | SMOTE + HSIC Lasso + SVM | 50 | 93.26
This study | SMOTE + HSIC Lasso + MLP | 50 | 96.18

Table 16. Distribution of selected Sakar et al. [8] dataset features.

Studies | Feature selection method | Selected feature subset distribution (I / II / III / IV / V / VI)
Sakar et al. [8] | MRMR | 4 / 3 / 2 / 10 / 1 / 30
This study | HSIC Lasso | 1 / 3 / 1 / 6 / 3 / 36
