International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 205-212
Published online September 25, 2021
https://doi.org/10.5391/IJFIS.2021.21.3.205
© The Korean Institute of Intelligent Systems
Seungyeon Lee^{1*}, Eunji Jo^{1*}, Sangheum Hwang^{2}, Gyeong Bok Jung^{3}, and Dohyun Kim^{1}
^{1}Department of Industrial and Management Engineering, Myongji University, Yongin, Korea
^{2}Department of Industrial and Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Korea
^{3}Department of Physics Education, Chosun University, Gwangju, Korea
Correspondence to: Dohyun Kim (ftgog@mju.ac.kr)
^{*}These authors contributed equally to this work.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep neural networks (DNNs) have recently attracted attention in various areas. Their hierarchical architecture is used to model complex nonlinear relationships in high-dimensional data. DNNs generally require large amounts of data to train millions of parameters, and training a DNN on a small number of high-dimensional observations can result in overfitting. To alleviate this problem, we propose a similarity-based DNN that can effectively reduce the dimensionality of the data. The proposed method uses a kernel function to calculate the pairwise similarities of observations, which serve as the input, and a DNN then explores the nonlinearity based on these similarities. Experimental results show that the proposed method performs effectively on both datasets tested, implying that it can be applied as an alternative when learning from a small number of high-dimensional observations.
Keywords: Deep neural networks (DNNs), Kernel method, Feature extraction, High-dimensional data
In recent years, deep neural networks (DNNs) have attracted attention in the field of machine learning. A DNN has a hierarchical structure with more than one hidden layer between the input layer and the output layer, through which it can model complex nonlinear relationships in high-dimensional data. However, DNNs are often affected by the dimensionality of the features. In particular, when a DNN is trained on a small number of high-dimensional observations, as in sequencing or spectrum analysis, overfitting, low prediction performance, and high computing cost can arise. Dimensionality-reduction algorithms have been proposed to solve these problems. These algorithms can be classified into two categories: feature selection and feature extraction [1].
With feature selection, a subset of features is selected directly without altering the original representation of the features. These algorithms can be broadly classified into similarity-, information-theory-, statistics-, and sparse-learning-based methods [2]. In similarity-based methods, such as the Laplacian score and spectral-analysis-based feature selection using extended Laplacian and Fisher scores, the importance of a feature is assessed by its ability to preserve the similarities between observations. Information-theory-based methods, such as information gain and minimum redundancy maximum relevance, use different heuristic filter criteria to measure the importance of the features [2]; they maximize feature relevance and minimize feature redundancy. Statistics-based methods rely on statistical measures rather than learning algorithms to evaluate the relevance of the features, and features with high redundancy are removed. Finally, sparse-learning-based methods, such as
Feature extraction algorithms combine the original features to create new features. These methods project the original high-dimensional features into a new feature space with low dimensionality [1]. Typical examples of feature extraction algorithms are principal component analysis (PCA) and autoencoders. PCA generates new features as linear combinations of the original features, chosen in a linear subspace so as to maximize the variance of the data [3]. The dimensionality is then reduced by keeping only those extracted features that describe the data well. However, standard PCA is not helpful when the data cannot be expressed well in a linear subspace. Fortunately, standard PCA can be generalized into a nonlinear form using kernel functions [4]. An autoencoder is a deep learning method that reconstructs the original input features at its output to create new nonlinear features. It extracts these nonlinear features from the middle hidden layer, whose dimensionality is smaller than that of the input. Because the model is trained to reconstruct the input data at the output layer, it can obtain features that reflect the important characteristics of the data.
Before a small number of high-dimensional observations is analyzed with a DNN, PCA or an autoencoder is often used to reduce the dimensionality. These approaches can decrease the complexity of the DNN by directly reducing the number of features. However, because the features are extracted without considering the target, the resulting models can exhibit low prediction performance.
In this study, we propose a similarity-based deep neural network for small high-dimensional data, which simplifies the model by using the similarities between observations rather than directly reducing the dimensionality of the features. The proposed method can mitigate the problems of existing DNNs, such as model complexity and run time. The contributions of the proposed method are as follows. 1) Using the similarities between observations, the method can predict labels for a small number of high-dimensional observations without direct access to the features. 2) Through the kernel function, the raw data are projected into the feature space, and the nonlinearity is explored using a DNN. 3) The proposed method effectively reduces the complexity of the model without directly reducing the dimensionality of the features. 4) Using only the similarities between observations, the proposed method achieves good performance.
The remainder of this paper is organized as follows: Section 2 describes the similarity-based method. Section 3 describes the process of the proposed method in detail. In Section 4, the proposed method and the basic DNN model are compared and the results are presented. Finally, Section 5 summarizes the contributions of this study and discusses issues for further research.
Machine learning algorithms often deal with high-dimensional datasets and complex nonlinear models, which lead to high-dimensional optimization problems. A common approach to solving such problems is the primal-dual method, which can be applied to a variety of tasks, including regression. Solving the problem through primal-dual methods significantly increases the computational efficiency in many cases and allows specific characteristics of the problem to be exploited [5]. The support vector machine (SVM) is a representative method based on primal and dual formulations. The primal SVM is a feature-based approach that directly models the data features, whereas the dual SVM is a similarity-based approach that models the data similarities.
In this section, we deal with a feature-based approach (primal problem) using features directly, and a similarity-based approach (dual problem) utilizing data similarities. We then describe the differences between the two approaches.
A training dataset
where
In
The discriminant function can be represented as a dual problem using the dual formulation of the optimization problem. The dual problem focuses only on the inner product between observations and not on the features. In the dual SVM, model
The primal problem models the features, whereas the dual problem models a linear combination of the inner products with all of the training observations. Therefore, when the number of training observations is smaller than the number of features, the dual problem is faster to solve than the primal problem. In addition, unlike the primal problem, the dual representation can express the nonlinearity of the data using a kernel function.
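The relationship between the two views can be checked numerically. The sketch below assumes the standard linear SVM relation in which the primal weight vector is a weighted sum of the training observations; the function names, the coefficients `alphas`, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
# Toy check that a primal (feature-based) and a dual (similarity-based)
# linear discriminant give the same score. All values are made up.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def primal_weights(X, y, alphas):
    """w = sum_i alpha_i * y_i * x_i (standard SVM relation, assumed here)."""
    p = len(X[0])
    w = [0.0] * p
    for x_i, y_i, a_i in zip(X, y, alphas):
        for j in range(p):
            w[j] += a_i * y_i * x_i[j]
    return w

def primal_score(w, b, x):
    """Feature-based score: one weight per feature (p parameters)."""
    return dot(w, x) + b

def dual_score(X, y, alphas, b, x):
    """Similarity-based score: one coefficient per observation (n parameters)."""
    return sum(a_i * y_i * dot(x_i, x) for x_i, y_i, a_i in zip(X, y, alphas)) + b

X = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]  # n = 2 observations, p = 4 features
y = [1, -1]
alphas = [0.5, 0.25]
b = 0.1
x_new = [2.0, 1.0, 0.0, 1.0]

w = primal_weights(X, y, alphas)
print(primal_score(w, b, x_new), dual_score(X, y, alphas, b, x_new))
```

In this toy case the primal model carries four weights and the dual model two coefficients, which mirrors the situation in which the number of observations is smaller than the number of features.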
For real-world problems, it is often necessary to model representations that go beyond linear dependence in the data. A kernel function captures nonlinear data representations and measures the similarities between observations in a high-dimensional feature space, eliminating the need to map the observations explicitly into this space [6]. Many kernel functions are available, such as the linear, polynomial, and radial basis function (RBF) kernels. First, a linear kernel is calculated using the inner product and is expressed as follows:
Second, the Gaussian or RBF kernel is in the form of a radial basis function and is expressed as
Expanding the squared term confirms that the RBF kernel is a valid kernel and shows that its corresponding feature vector is infinite-dimensional [7].
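As a concrete illustration of the two kernels, the following minimal sketch computes the similarity between two observations. The RBF kernel is written with a scale parameter `gamma` multiplying the squared Euclidean distance; this parametrization and the function names are assumptions, since the paper's exact notation is not recoverable from the text here.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the inner product of the two observations."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance).

    `gamma` is an assumed parametrization; other texts write the same
    kernel with a bandwidth sigma, where gamma = 1 / (2 * sigma ** 2).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = [1.0, 2.0]
z = [3.0, 4.0]
print(linear_kernel(x, z))
print(rbf_kernel(x, z, gamma=0.1))
```

Note that the RBF similarity of an observation with itself is always 1 and decays toward 0 as the two observations move apart, whereas the linear kernel is unbounded.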
The kernel SVM is a model that applies the kernel function to the dual SVM to measure the similarity in a high-dimensional feature space [8]. The dual model
Using the dual formulation, the parameter
In the dual problem, through the kernel function,
In this study, we propose a similarity-based DNN. By applying a similarity-based approach to a DNN, the proposed method can effectively reduce the dimensionality of the data when the number of features is much larger than the number of observations.
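The input construction can be sketched as follows: every observation is replaced by its vector of kernel similarities to the training observations, so the number of input nodes equals the number of training observations rather than the number of features. This is a minimal sketch under stated assumptions; the helper names, the `gamma` value, the hidden-layer width, and the toy data are all hypothetical and do not reproduce the paper's architecture.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF similarity with an assumed scale parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def similarity_features(X_train, X, gamma=0.5):
    """Re-express each row of X as its similarities to all training rows."""
    return [[rbf_kernel(x, x_tr, gamma) for x_tr in X_train] for x in X]

def first_layer_weights(n_inputs, n_hidden):
    """Number of weights between the input layer and the first hidden layer."""
    return n_inputs * n_hidden

# Toy data: n = 4 training observations with p = 50 features each.
X_train = [[0.1 * (i + j) for j in range(50)] for i in range(4)]
x_new = [[0.05 * j for j in range(50)]]

S_train = similarity_features(X_train, X_train)  # 4 x 4 input matrix
S_new = similarity_features(X_train, x_new)      # 1 x 4 input for a new observation

print(first_layer_weights(50, 8), "weights with features as input")
print(first_layer_weights(4, 8), "weights with similarities as input")
```

A new observation is handled in the same way as a training observation: it is compared with the stored training set, and the resulting similarity vector is what the network receives.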
With the proposed method, the dimensionality is reduced by using the similarities between observations as input. Each observation is re-expressed as its similarities to all training observations, and these similarities are fed to the input nodes. Figure 1 shows the input layers of the proposed method and a basic DNN model. The proposed method uses the similarities between data as input, whereas the basic DNN uses
The notation for the proposed method is expressed as follows:
Here,
The proposed method conducts a prediction based on the similarity in the high-dimensional feature space obtained using the kernel function. Thus, when applying a small sample of high-dimensional data (i.e.,
The proposed method can effectively reduce the number of input nodes in a DNN by adopting a similarity-based approach. The pairwise similarities of the observations are calculated using the kernel function, and the nonlinearity of the data is then explored by applying a DNN to these similarities. As a result, the proposed method can effectively simplify the model when a small number of high-dimensional samples is used. In addition, the proposed method is advantageous in terms of computational efficiency and running time compared with conventional dimension reduction methods such as PCA and an autoencoder. A PCA must compute
The proposed method reduces the dimensionality through the similarities. However, not all of these similarities are necessarily important. Accordingly, we use a variant of a rectified linear unit (ReLU) based sparse autoencoder [9] to consider only the similarities with important observations, which can be thought of as the same process as finding the support vectors in an SVM. The only difference from the existing method is that the existing method is applied to an autoencoder, whereas the proposed method is applied to a basic DNN. The ReLU-based sparse autoencoder was proposed to select important features by making the input nodes sparse. The algorithm adds a hidden layer with the same number of dimensions as the input layer and connects the nodes in a one-to-one manner (Figure 3). These hidden nodes use the ReLU activation function,
We evaluated the proposed method with two kernels (linear and RBF) by comparing it with the existing method that uses the features as input. The proposed sparse method, which adds a step for selecting important observations, was also compared. Because the input data types of the proposed and existing methods differ, the two models have different structures. Therefore, to minimize the sum of the margin losses, we defined the experimental protocols that are commonly used in the experiments using both methods. In the experiments, we performed classification tasks on two datasets, and all experiments were implemented in PyTorch. We used the Adam optimizer and initialized parameters with a normal distribution
Computational experiments were conducted using two spectral datasets. Both datasets are high-dimensional with a small number of samples (Table 1). The detailed dataset descriptions and statistics are as follows.
The Brain dataset was collected using Raman spectroscopy, which is widely applied to characterize biological tissue. The dataset shows biochemical changes in the hippocampal region of the brain. The vector normalization method was used to correct the baseline of the Raman spectra, and the spectral data of the brain tissue were calculated as the average of 10 measured samples. The dataset consists of 30 observations, with 10 measurements per class (i.e., normal, ischemic, and neuronal nitric oxide synthase [nNOS] inhalation), each with 2,501 frequencies (500–1,750 cm^{−1}) [11].
The BV dataset represents biochemical changes at the molecular level in human MDA-MB-231 breast cancer cells following exposure to bee venom (BV). It consists of Raman spectral data measured at various concentrations. The dataset has 30 observations: 10 observations for each concentration (i.e., 0.7, 1.5, and 3.0 μg/mL). Raman intensities were measured at 2,301 frequencies (600–1,750 cm^{−1}) [12].
After finding the optimal hyperparameters using three-fold cross-validation, we obtained the mean and standard deviation of the accuracy over 100 runs. The results are summarized in Table 2 and Figure 4. As shown in Table 2, the proposed method using the RBF kernel is highly competitive, achieving an accuracy similar to or higher than that of the existing DNN approach. On the BV dataset, the proposed method using similarities as input showed a performance improvement of 2.41% and a 99.4% reduction in the number of parameters compared with the existing DNN method using features as input. On the Brain dataset, the proposed method improved the performance only slightly, by approximately 0.5%, but its standard deviation was lower than that of the existing model, which implies that the proposed method is more stable. These performance improvements are thought to be due to the reduced dimensionality of the input data (and hence the reduced number of model parameters), which effectively reduces the model complexity and makes the model less prone to overfitting.
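The reported percentages can be approximately recovered from the rounded values in Table 2. The short check below assumes that the improvements are relative changes in mean accuracy and that the parameter reduction is measured relative to the feature-based DNN; the paper does not state the formulas, and the function names are illustrative. Because the table values are rounded, the results may differ from the reported figures in the last digit.

```python
def relative_change_percent(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

def reduction_percent(new, old):
    """How much smaller new is than old, in percent of old."""
    return (1.0 - new / old) * 100.0

# Values copied from Table 2 (RBF-kernel proposed method vs. feature-based DNN).
bv_gain = relative_change_percent(0.889, 0.868)
bv_param_reduction = reduction_percent(24_470, 4_304_000)
brain_gain = relative_change_percent(0.985, 0.979)

print(f"BV accuracy gain:       {bv_gain:.2f}%")
print(f"BV parameter reduction: {bv_param_reduction:.2f}%")
print(f"Brain accuracy gain:    {brain_gain:.2f}%")
```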
The experimental results show that the type of kernel used to measure the similarity of the data is important. As listed in Table 2, the performance improved only when the RBF kernel was used, whereas the linear kernel degraded the performance. The linear kernel measures the similarities among the data linearly using the inner product, whereas the RBF kernel computes the similarities implicitly based on infinite-dimensional features. This characteristic allows the RBF kernel to reflect the complicated relations between observations effectively. Therefore, the proposed method using the RBF kernel can be a useful candidate for modeling complicated high-dimensional data.
We conducted additional experiments to select the important observations, which correspond to the proposed sparse method in Table 2. The proposed sparse method is less accurate than the proposed method, but it has the advantage of detecting important observations. Notably, it showed competitive performance even though approximately 15% or more of the similarities (i.e., input nodes) were removed. This method can be a useful alternative when using the similarities with all observations is neither possible nor practical.
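The selection step of the sparse method can be pictured as a one-to-one gating layer placed in front of the network, in which each similarity passes through its own ReLU unit. The sketch below is only an illustration under assumptions: it uses a single scalar weight per input node with no bias, it relies on the similarities being non-negative (as with the RBF kernel), and the names and values are hypothetical. The exact activation and sparsity penalty used in the paper are not recoverable from the text here.

```python
def relu(v):
    """Rectified linear unit."""
    return max(0.0, v)

def gate_similarities(similarities, gate_weights):
    """One-to-one layer: input node i feeds only hidden node i."""
    return [relu(w * s) for s, w in zip(similarities, gate_weights)]

def removed_inputs(gate_weights, tol=1e-8):
    """Indices of input nodes switched off by the gate.

    Assumes non-negative similarities, so any non-positive gate weight
    yields a zero output for that node.
    """
    return [i for i, w in enumerate(gate_weights) if w <= tol]

similarities = [0.5, 0.8, 0.2]   # similarities to three training observations
gate_weights = [1.0, 0.0, -1.0]  # made-up values, as if learned under a sparsity penalty

print(gate_similarities(similarities, gate_weights))
print(removed_inputs(gate_weights))
```

Training observations whose gate is switched off no longer influence the prediction, which is the sense in which the remaining observations play a role comparable to support vectors.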
In this study, a new method was proposed for handling a small number of high-dimensional data samples. Training a DNN with high-dimensional data can result in high model complexity, decreased accuracy, and high computational costs. The proposed similarity-based approach can be used to address such problems. First, the similarities between observations are measured in a high-dimensional feature space by the kernel function, and the nonlinearity of the data is then explored using a DNN. The proposed method reduces the number of input nodes in the DNN and simplifies the model. Experimental results show that the proposed method can be a useful alternative when training a DNN on a small number of high-dimensional samples, and they suggest that the approach may also be used with other machine learning algorithms for such data. In addition, we conducted experiments in which a ReLU-based sparse autoencoder was applied to the proposed method. The model can be further simplified by using only the similarities with important observations, which play a role analogous to the support vectors of an SVM, and the sparse variant is a suitable alternative when only those similarities are to be modeled. In future work, it would be interesting to investigate the relationships between observations and to model the data using such relationships. Applying these findings to regression tasks may also be a fruitful area for future research.
No potential conflict of interest relevant to this article was reported.
Table 2. Classification results for each dataset.
Dataset | Method | Accuracy (mean ± SD) | Number of parameters | Number of input nodes removed
---|---|---|---|---
Brain | DNNs | 0.979 ± 0.064 | 50,880 | 0
Brain | Proposed method (linear kernel) | 0.967 ± 0.000 | 2,490 | 0
Brain | Proposed method (RBF kernel) | 0.985 ± 0.017 | 22,300 | 0
Brain | Proposed sparse method (RBF kernel) | 0.912 ± 0.059 | 1,260 | 4 (20%)
BV | DNNs | 0.868 ± 0.030 | 4,304,000 | 0
BV | Proposed method (linear kernel) | 0.767 ± 0.020 | 96,900 | 0
BV | Proposed method (RBF kernel) | 0.889 ± 0.025 | 24,470 | 0
BV | Proposed sparse method (RBF kernel) | 0.866 ± 0.025 | 3,670 | 3 (15%)
E-mail: sylee1@mju.ac.kr
E-mail: goodji@mju.ac.kr
E-mail: shwang@seoultech.ac.kr
E-mail: gbjung@Chosun.ac.kr
E-mail: ftgog@mju.ac.kr
International Journal of Fuzzy Logic and Intelligent Systems 2021; 21(3): 205-212
Published online September 25, 2021 https://doi.org/10.5391/IJFIS.2021.21.3.205
Copyright © The Korean Institute of Intelligent Systems.
Seungyeon Lee^{1*}, Eunji Jo^{1*}, Sangheum Hwang^{2}, Gyeong Bok Jung^{3}, and Dohyun Kim^{1}
^{1}Department of Industrial and Management Engineering, Myongji University, Yongin, Korea
^{2}Department of Industrial and Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Korea
^{3}Department of Physics Education, Chosun University, Gwangju, Korea
Correspondence to:Dohyun Kim (ftgog@mju.ac.kr)
^{*}These authors contributed equally to this work.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep neural networks (DNNs) have recently attracted attention in various areas. Their hierarchical architecture is used to model complex nonlinear relationships in high-dimensional data. DNNs generally require large numbers of data to train millions of parameters. However, the training of a DNN with a small number of high-dimensional data can result in an overfitting. To alleviate this problem, we propose a similarity-based DNN that can effectively reduce the dimensionality of the data. The proposed method utilizes a kernel function to calculate pairwise similarities of observations as input, and the nonlinearity based on the similarities is then explored using a DNN. Experiment results show that the proposed method performs effectively regardless of the dataset used, implying that it can be applied as an alternative when learning a small number of high-dimensional data.
Keywords: Deep neural networks (DNNs), Kernel method, Feature extraction, High-dimensional data
In recent years, deep neural networks (DNNs) have attracted attention in the field of machine learning. DNNs consist of a hierarchical structure with more than one hidden layer between an input layer and an output layer, through which they can model complex nonlinear relationships in high-dimensional data. However, DNNs are often affected by the dimensionality of the features. In particular, when training DNNs with a small number of high-dimensional data, such as in a sequencing or spectrum analysis, an overfitting, low prediction performance, and high computing cost can arise. Dimensionality-reduction algorithms have been proposed to solve these problems. The algorithms can be classified into two categories: feature selection and feature extraction [1].
With feature selection, a subset of features is directly selected without altering the original representation of the features. These algorithms can be broadly classified into similarity-, information-theory-, statistical-, and sparse-learning-based methods [2]. In the case of similarity-based methods such as the Laplacian score and a spectral analysis based feature selection using expanded Laplacian and Fisher scores, the importance of the features is assessed based on the ability to preserve similarities between observations. In addition, information-theory-based methods such as information gain and minimum redundancy maximum relevance use different heuristic filter criteria to measure the importance of the features [2]. These methods maximize the feature relevance and minimize the feature redundancy. A statistical-based method relies on various statistical measures instead of learning algorithms to evaluate the relevance of the features. In this case, the features with high redundancy are removed. Finally, sparse learning-based methods, such as
Feature extraction algorithms combine the original features to create new features. These methods project the original high-dimensional features into a new feature space with low dimensionality [1]. Typical examples of feature extraction algorithms include a principal component analysis (PCA) and autoencoders. A PCA generates new features represented by linear combinations of the original features. A PCA attempts to create new features in a linear subspace by maximizing the variance of the data [3]. The dimensionality of the features can be reduced by selecting from among the newly extracted features those that properly describe the data. However, a standard PCA is not helpful when the data cannot be expressed well in a linear subspace. Fortunately, a standard PCA can be generalized into a nonlinear form using kernel functions [4]. An autoencoder is a deep learning method that reconstructs the original features of the input in its output to create new nonlinear features. The autoencoder extracts nonlinear features from the middle hidden layer, the dimensions of which are smaller than the input features. The model is learned by reconstructing the input data in the output layer, and thus it can obtain features that reflect the important characteristics of the data.
Before analyzing a small number of high-dimensional data with a DNN, a PCA or an autoencoder is used to reduce the dimensionality. These approaches can decrease the complexity of the DNN by directly reducing the features of the data. However, because the features are extracted without considering the target, the models can exhibit a low prediction performance.
In this study, we propose a similarity-based deep neural network for small high-dimensional data, which simplifies the model by utilizing the similarities between observations rather than directly reducing the dimensionality of the features. The proposed method can mitigate the problems of existing DNNs, such as the model complexity and run time. The following are the contributions of the proposed method. 1) Using the similarities of the observations, the method can predict labels without direct access to the features for a small number of high-dimensional data. 2) Through the kernel function, the raw data are projected in the feature space, and the nonlinearity is explored using the DNNs. 3) The proposed method effectively reduces the complexity of the model without directly reducing the dimensionality of the features. 4) Using only the similarities between observations, the proposed method shows a good performance.
The remainder of this paper is organized as follows: Section 2 describes the similarity-based method. Section 3 describes the process of the proposed method in detail. In Section 4, the proposed method and the basic DNN model are compared and the results are presented. Finally, Section 5 summarizes the contributions of this study and discusses issues for further research.
Machine learning algorithms often deal with high-dimensional datasets and complex nonlinear models, which require an optimization problem of high dimensionality. One of the common approaches to solving this is the primal-dual method, which can also be applied to a variety of tasks, including a regression problem. By solving the problem through primal-dual methods, the computational efficiency significantly increases in many cases, and specific characteristics can be used more efficiently [5]. A support vector machine (SVM) is a representative method based on primal and dual methods. A primal SVM is a feature-based approach that directly models the data features, whereas the dual SVM is a similarity-based approach that models the data similarities.
In this section, we deal with a feature-based approach (primal problem) using features directly, and a similarity-based approach (dual problem) utilizing data similarities. We then describe the differences between the two approaches.
A training dataset
where
In
The discriminant function can be represented as a dual problem using the dual formulation of the optimization problem. The dual problem focuses only on the inner product between observations and not on the features. In the dual SVM, model
The primal problem models the features, whereas the dual problem models the linear combination of the inner product using all of the data. Therefore, when the number of training data is smaller than the number of features, the dual problem is faster than the primal problem. In addition, the dual representation can express the nonlinearity of the data using a kernel function, unlike with the primal problem.
For real-world problems, it is often necessary to model more complex representations beyond the linear dependence of the data. A kernel function captures the nonlinear data representations and can measure the similarities between observations in a high-dimensional feature space, eliminating the need to map observations directly to this space [6]. The kernel function can be applied in many different ways, such as a polynomial kernel and radial basis function (RBF) kernel. First, a linear kernel is calculated using the inner product and is expressed as follows:
Second, the Gaussian or RBF kernel is in the form of a radial basis function and is expressed as
The RBF kernel is confirmed to be a valid kernel by expanding the square term and allows the feature vector corresponding to the RBF kernel to have an infinite number of dimensionalities [7].
The kernel SVM is a model that applies the kernel function to the dual SVM to measure the similarity in a high-dimensional feature space [8]. The dual model
Using the dual formulation, the parameter
In the dual problem, through the kernel function,
In this study, we propose a similarity-based DNN. The proposed method can effectively reduce the dimensionality of the data by adopting a similarity-based approach to a DNN when the number of features is much larger than the number of data.
With the proposed method, the dimensionality can be reduced using the similarities between the data as input. Each observation is newly configured as similarities with the entire training data and similarities enter into input nodes. Figure 1 shows the input layer of the proposed method and a basic DNN model. The proposed method uses the similarities between data as input, whereas the basic DNN uses
The notation for the proposed method is expressed as follows:
Here,
The proposed method conducts a prediction based on the similarity in the high-dimensional feature space obtained using the kernel function. Thus, when applying a small sample of high-dimensional data (i.e.,
This proposed method can effectively reduce the number of input nodes in a DNN by adopting a similarity-based approach. The pairwise similarities of the observations are calculated using the kernel function, and the nonlinearity of the data is then explored through the application of the DNN with the similarities obtained as input. As a result, the proposed method can effectively simplify the model when using a small number of samples of high-dimensional data. In addition, the proposed method is advantageous in terms of computational efficiency and running time compared to conventional dimension reduction methods such as a PCA and an autoencoder. A PCA must compute
The proposed method reduces the dimensionality through the similarities. However, the similarities with all observations may be unimportant. Accordingly, we utilize a variant of a rectified linear unit (ReLU) based sparse autoencoder [9] to consider only similarities with important observations. As the only difference between the proposed and existing methods, the existing method is applied to the autoencoder, whereas the proposed method is applied to a basic DNN. This can be thought of as the same process as finding support vectors in an SVM. The ReLU-based sparse autoencoder was proposed to select important features by making the input nodes sparse. The algorithm adds a hidden layer of the same number of dimensions as the input layer and connects the nodes in a one-to-one manner (Figure 3). These hidden nodes use the ReLU activation function,
We evaluated the proposed methods, combined with the two kernels (linear and RBF), by comparing them with the existing method using features as the input. The proposed sparse method with an additional step for selecting important observations was also compared. Because the input data types of the proposed and existing methods differ, the two models consist of different structures. Therefore, to minimize the sum of the margin losses, we defined the experimental protocols that are commonly used in the experiments using both methods. In the experiments, we applied classification tasks on two datasets, and all experiments were implemented in PyTorch. We used the Adam optimizer and initialized parameters with a normal distribution
Computational experiments were conducted using two spectral datasets. Both datasets are high-dimensional with a small number of samples (Table 1). The detailed dataset descriptions and statistics are as follows.
The Brain dataset was collected using Raman spectroscopy, which is widely applied to characterize the biology organization. The dataset shows biochemical changes in the hippocampal region of the brain. The vector normalization method was used to correct the baseline of the Raman spectra, and the spectral data of the brain tissue were calculated as the average of 10 measured samples. The dataset consists of 30 observations with 10 measurements per class (i.e., normal, ischemic, and neuronal nitric oxide synthase [nNOS] inhalation) with 2,501 frequencies (500–1,750 cm^{−1}) [11].
The BV dataset represents biochemical changes at the molecular level in human MDA-MB-231 breast cancer cells following exposure to bee venom (BV). It consists of Raman spectral data measured at various concentrations. The dataset has 30 observations: 10 observations for each concentration (i.e., 0.7, 1.5, and 3.0 μg/mL). Raman intensities were measured at 2,301 frequencies (600–1,750 cm^{−1}) [12].
After finding the optimal hyper-parameters using three-fold cross-validation, we obtained the mean and standard deviation of the accuracy over 100 runs. The results are summarized in Table 2 and Figure 4. As shown in Table 2, the proposed method using the RBF kernel is highly competitive: it achieves an accuracy similar to or higher than that of the existing DNN approach. On the BV dataset, the proposed method using similarities as the input showed a performance improvement of 2.41% and a 99.4% reduction in the number of parameters compared with the existing DNN method using features as the input. On the Brain dataset, although the proposed method improved the performance only slightly (by approximately 0.5%), its standard deviation was lower than that of the existing model, which implies that the proposed method is more stable. These performance improvements are thought to be due to the reduced dimensionality of the input data (and hence the reduced number of model parameters), which effectively lowers the model complexity and makes the model less prone to overfitting.
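The BV figures quoted above can be checked against Table 2 with a few lines of arithmetic. The sketch below assumes that the reported improvement is a relative gain in accuracy. Because it starts from the rounded accuracies in Table 2, it reproduces the reported 2.41% only approximately.

```python
# Values copied from Table 2 (BV dataset); accuracies are rounded to three decimals.
dnn_accuracy = 0.868
rbf_accuracy = 0.889
dnn_parameters = 4_304_000
rbf_parameters = 24_470

# Relative accuracy gain of the RBF-kernel model over the feature-based DNN
# (assumption: the reported improvement is relative, not in percentage points).
relative_gain = (rbf_accuracy - dnn_accuracy) / dnn_accuracy

# Fraction of parameters removed by switching from features to similarities.
parameter_reduction = 1.0 - rbf_parameters / dnn_parameters

print(f"relative accuracy gain: {relative_gain:.1%}")
print(f"parameter reduction:    {parameter_reduction:.1%}")
```

Under these assumptions the parameter reduction rounds to 99.4%, matching the text, and the relative gain comes out at roughly 2.4%.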
The experimental results show that the type of kernel used to measure the similarity of the data is important. As listed in Table 2, the performance improved only when using the RBF kernel, whereas using the linear kernel degraded the performance. The linear kernel measures the similarities among the data linearly using the inner product, whereas the RBF kernel computes the similarities implicitly in an infinite-dimensional feature space. This characteristic allows the RBF kernel to reflect complicated relations between observations effectively. Therefore, the proposed method using the RBF kernel can be a useful candidate for modeling complicated high-dimensional data.
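To make the difference between the two kernels concrete, the following minimal sketch computes similarity inputs with both. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`linear_kernel`, `rbf_kernel`, `similarity_features`), the toy data, and the bandwidth `gamma` are hypothetical, and the RBF kernel is written in the common form exp(−γ‖x − y‖²), which may be parameterized differently in the original experiments.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); gamma is an assumed bandwidth hyper-parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def similarity_features(samples, reference, kernel):
    """Map each sample to its vector of similarities with the reference observations."""
    return [[kernel(s, r) for r in reference] for s in samples]

# Toy stand-in for spectra: 3 observations with 5 features each (made-up values).
train = [
    [0.1, 0.4, 0.3, 0.0, 0.2],
    [0.2, 0.3, 0.3, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.8, 0.7],
]

linear_inputs = similarity_features(train, train, linear_kernel)
rbf_inputs = similarity_features(
    train, train, lambda x, y: rbf_kernel(x, y, gamma=2.0)
)

# Each observation is now described by 3 similarities instead of 5 features.
print(len(rbf_inputs), len(rbf_inputs[0]))
```

The number of input nodes equals the number of reference observations rather than the number of features, which is where the parameter savings for the 30-observation spectral datasets come from.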
We conducted additional experiments to select the important observations, which correspond to the proposed sparse method in Table 2. The proposed sparse method is less accurate than the proposed method. However, it has the advantage of detecting important observations. Notably, it showed a competitive performance even though approximately 15% or more of the similarities (i.e., input features) were removed. This method can be a useful alternative when using the similarities with all observations is neither possible nor practical.
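The one-to-one ReLU layer behind the proposed sparse method can be sketched as follows. This is a hedged illustration only: the function names and weight values are hypothetical, the bias term is omitted, and the exact formulation and sparsity-inducing training procedure of the ReLU-based sparse autoencoder [9] are not reproduced here. The sketch shows only why an input node drops out: with non-negative similarities (such as RBF values), a node whose one-to-one weight is not positive always outputs zero.

```python
def relu(value):
    """Rectified linear unit."""
    return max(0.0, value)

def one_to_one_relu_layer(similarities, weights):
    """Each hidden node i sees only input i: h_i = ReLU(w_i * s_i)."""
    return [relu(w * s) for w, s in zip(weights, similarities)]

def selected_observations(weights):
    """With non-negative similarities, node i is inactive whenever w_i <= 0."""
    return [i for i, w in enumerate(weights) if w > 0.0]

# Hypothetical learned weights for 5 reference observations (illustrative values).
weights = [0.8, -0.3, 1.2, 0.0, 0.5]
# RBF similarities of one sample to the 5 reference observations (made-up values).
similarities = [0.9, 0.7, 0.1, 0.4, 0.6]

hidden = one_to_one_relu_layer(similarities, weights)
kept = selected_observations(weights)

print(hidden)
print(kept)
```

In this toy case observations 1 and 3 are removed, which is analogous to the "number of input nodes removed" column in Table 2.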
In this study, a new method was proposed for addressing the problems that arise when using a small number of high-dimensional data samples. Training a DNN with high-dimensional data can result in increased model complexity, decreased accuracy, and high computational cost. The proposed similarity-based approach can be used to alleviate such problems. First, the similarities between observations are measured in a high-dimensional feature space by the kernel function, and the nonlinearity of the data is then explored using a DNN. The proposed method reduces the number of input nodes in the DNN and simplifies the model. The experimental results show that the proposed method can be a useful alternative when training a DNN using a small number of samples of high-dimensional data. This implies that the proposed method can be used in a variety of machine learning algorithms for small samples of high-dimensional data. We also conducted additional experiments by applying a ReLU-based sparse autoencoder to the proposed method. The model can be further simplified by using only the similarities with important observations. The resulting sparse method is a suitable alternative when modeling the data by considering only the similarities with important observations, analogous to the support vectors of an SVM. As a future study, it would be interesting to investigate the relationships between observations and to model data using such relationships. Applying these findings to regression tasks may also be a fruitful area for future research.
Figure 1. Input layer of proposed method and basic DNN.
Figure 2. Flowchart of proposed method.
Figure 3. Selection of important observations.
Figure 4. Plot of prediction accuracy.
Table 1. Experimental data.
Dataset | Observations | Features | Classes |
---|---|---|---|
Brain | 30 | 2,501 | 3 |
BV | 30 | 2,301 | 3 |
Table 2. Classification results for each dataset.
Dataset | Method | Accuracy (mean ± SD) | Number of parameters | Number of input nodes removed |
---|---|---|---|---|
Brain | DNNs | 0.979 ± 0.064 | 50,880 | 0 |
Brain | Proposed method (linear kernel) | 0.967 ± 0.000 | 2,490 | 0 |
Brain | Proposed method (RBF kernel) | 0.985 ± 0.017 | 22,300 | 0 |
Brain | Proposed sparse method (RBF kernel) | 0.912 ± 0.059 | 1,260 | 4 (20%) |
BV | DNNs | 0.868 ± 0.030 | 4,304,000 | 0 |
BV | Proposed method (linear kernel) | 0.767 ± 0.020 | 96,900 | 0 |
BV | Proposed method (RBF kernel) | 0.889 ± 0.025 | 24,470 | 0 |
BV | Proposed sparse method (RBF kernel) | 0.866 ± 0.025 | 3,670 | 3 (15%) |