Int. J. Fuzzy Log. Intell. Syst. 2018; 18(2): 120-125
Published online June 25, 2018
https://doi.org/10.5391/IJFIS.2018.18.2.120
© The Korean Institute of Intelligent Systems
Heesung Lee
Department of Railroad Electrical and Electronics Engineering, Korea National University of Transportation, Uiwang-si, Korea
Correspondence to :
Heesung Lee (hslee0717@ut.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes a new feature extraction scheme, combining global and local features. The proposed method uses principal component analysis (PCA) and linear discriminant analysis (LDA) for global property and locality preserving projections (LPP) for local property of the pattern. PCA and LDA are known for extracting the most descriptive ones after projection while LPP is known for preserving the neighborhood structure of the data set. The proposed combing method integrates global and local descriptive information and finds an efficient set of alternatives beyond PCA, LDA and LPP in the parametric space. Further, In order to find optimal parameters, the genetic algorithm (GA) is employed. Experiments are performed with four data sets selected in UCI machine learning repository to show the performance of the proposed algorithm.
Keywords: PCA, LDA, LPP, GA, Feature space, UCI machine learning repository
For general pattern recognition systems, the patterns can be represented as certain features [1]. It is also well-known fact that the irrelevant or redundant features degrade the classification accuracy, so constructing an appropriate feature subset is essential to build a good classification algorithm. Therefore, developing recognition scheme of real world objects or concepts using computational methods, the selection of an appropriate set of features is of considerable importance [2]. In this paper, we propose a new feature extraction scheme combining locality preserving projection with global information. The proposed method uses principal component analysis (PCA) and linear discriminant analysis (LDA) for global property and locality preserving projections (LPP) for local property of the pattern to improve the classification ability. PCA and LDA are well known for preserving the most descriptive ones after projection while LPP is known for preserving the neighborhood structure of the data set [3]. Our combing method can integrate global and local descriptive information and find a richer set of alternatives beyond PCA, LDA and LPP. In order to find the hybrid features adaptively and find optimal parameters, the genetic algorithm (GA) is employed. GAs have been used for many scientific and engineering optimization and search problems [4]. GAs provide an adaptive and robust computational procedure modeled on the mechanics of natural genetic systems [5]. GAs can be considered as a good solution for finding a richer set of alternatives beyond PCA, LDA and LPP in search space, because they are complicated problem in a large search space [6].
We introduce some preliminaries including PCA, LDA, LPP and GA briefly in Section 2. The proposed global and local features combining method based on GA is explained in Section 3. In Section 4, we assess the experimental performance of our proposed system. Finally, the conclusion is drawn in Section 5.
PCA can choose a dimensionality reducing linear projection that maximizes the scatter of all projected samples. Let us assume that an object is represented by the
and we have a set of objects
where each object belongs to one of
where
The objective of PCA is maximizing the determinant of the total scatter matrix of the projected samples as follows:
The new feature vectors
where
The LDA is a linear projection method that maximizes the ratio of determinant of between-class covariance matrix to the determinant of within-class covariance matrix [8]. The within-class scatter matrix
and
where
It has been proven that that this ratio is maximized when the column vectors of the projection matrix
Though the information of features is important, the local structure is also significant in many real-world applications [3, 10]. LPP is proposed for learning a locality preserving subspace and seeks to preserve the intrinsic geometry of the samples and local structure. The objective function of LPP is as follows:
where
In (
where
Finally, we can obtain reduced optimal problem as follows:
The
The GA typically maintains a fixed size population of individuals that represent the set of solution candidates for the optimization problem to be solved. The goodness of each candidate solution is evaluated using a fitness function. The population of the GA evolves with a set of genetic operators being applied. The basic genetic operators are selection, crossover and mutation. In the selection process, some individuals are selected to be copied into a tentative next population. An individual is selected for the next population with the number of copies proportional to the fitness value. The copied individuals are altered to search for a global optimal solution in the crossover and mutation processes. The GA is simple yet provides adaptive and robust optimization methodology. Given below is the basic configuration of the GA [6].
Table 6. Average, the best, and the worst accrucy of the proposed method
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 86.27 (0.679) | 92.16 | 76.47 |
Spambase | 83.13 (0.009) | 83.57 | 82.26 |
SPECTF | 75.37 (0.084) | 83.33 | 63.64 |
Heart | 63.43 (0.026) | 65.67 | 59.70 |
We propose a method which uses PCA and LDA for global property and LPP for local property of the pattern to improve the classification ability. It can integrate global and local descriptive information and find a richer set of alternatives beyond PCA, LDA and LPP. Based on them, we can obtain new features which preserve global and local structure, simultaneously as follows:
where
where
Because with different
For experiments, four databases of the UCI repository [12] are used to test the proposed method. The databases used in the experiments are summarized in Table 1.
In this experiment, we use the genetic parameters given in Table 2. We divide each database into the four subsets. We use three subsets as training and the remaining one subset as test set. We made independent four runs to show the general performance of the algorithms. The training set is used to train classifier and find optimal parameter and the test set is used to evaluate the performance of the proposed method. The fitness function of a chromosome is computed by training accuracy using leave-one-out cross-validation of the training set with nearest neighbor (NN) classifier [13]. We denote the results of previous three methods using only PCA, LDA and LPP in Tables 3, 4, and 5, respectively. We also denote the accuracy of the proposed method in Table 6. In Tables 3
Further, the result of the proposed method is compared with those of the conventional feature selection methods in Figure 2. It can be seen that, in all four databases, the proposed methods show better recognition accuracy than the previous methods. The reason for the better performance of the proposed method might be that proposed method considered global and local structures, simultaneously while conventional methods considered only one aspect structure. Further, optimal parameters can be obtained using GA to improve classification ability.
In this paper, we propose the improved the recognition techniques of real-world objects or concepts using computational methods. In classifier, the selection of an appropriate set of features is of considerable importance. Though PCA, LDA and LPP aim to preserve the global and local structure, respectively, it is more important that local and global structures are considered, simultaneously. Therefore, we have proposed a new feature extraction method which can preserve both extrinsic and intrinsic geometry of structures. To find optimal parameters, GA is used. For these reason, we can obtain feature extraction scheme that can have the more classification ability and use features which have global and local properties. The proposed method was applied to four databases selected in UCI machine learning repository and its effectiveness was demonstrated.
No potential conflict of interest relevant to this article was reported.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP; Ministry of Science, ICT & Future Planning) (No. NRF-2017R1C1B5018408).
No potential conflict of Interest relevant to this article was reported.
E-mail: hslee0717@ut.ac.kr
Int. J. Fuzzy Log. Intell. Syst. 2018; 18(2): 120-125
Published online June 25, 2018 https://doi.org/10.5391/IJFIS.2018.18.2.120
Copyright © The Korean Institute of Intelligent Systems.
Heesung Lee
Department of Railroad Electrical and Electronics Engineering, Korea National University of Transportation, Uiwang-si, Korea
Correspondence to:Heesung Lee (hslee0717@ut.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes a new feature extraction scheme, combining global and local features. The proposed method uses principal component analysis (PCA) and linear discriminant analysis (LDA) for global property and locality preserving projections (LPP) for local property of the pattern. PCA and LDA are known for extracting the most descriptive ones after projection while LPP is known for preserving the neighborhood structure of the data set. The proposed combing method integrates global and local descriptive information and finds an efficient set of alternatives beyond PCA, LDA and LPP in the parametric space. Further, In order to find optimal parameters, the genetic algorithm (GA) is employed. Experiments are performed with four data sets selected in UCI machine learning repository to show the performance of the proposed algorithm.
Keywords: PCA, LDA, LPP, GA, Feature space, UCI machine learning repository
For general pattern recognition systems, the patterns can be represented as certain features [1]. It is also well-known fact that the irrelevant or redundant features degrade the classification accuracy, so constructing an appropriate feature subset is essential to build a good classification algorithm. Therefore, developing recognition scheme of real world objects or concepts using computational methods, the selection of an appropriate set of features is of considerable importance [2]. In this paper, we propose a new feature extraction scheme combining locality preserving projection with global information. The proposed method uses principal component analysis (PCA) and linear discriminant analysis (LDA) for global property and locality preserving projections (LPP) for local property of the pattern to improve the classification ability. PCA and LDA are well known for preserving the most descriptive ones after projection while LPP is known for preserving the neighborhood structure of the data set [3]. Our combing method can integrate global and local descriptive information and find a richer set of alternatives beyond PCA, LDA and LPP. In order to find the hybrid features adaptively and find optimal parameters, the genetic algorithm (GA) is employed. GAs have been used for many scientific and engineering optimization and search problems [4]. GAs provide an adaptive and robust computational procedure modeled on the mechanics of natural genetic systems [5]. GAs can be considered as a good solution for finding a richer set of alternatives beyond PCA, LDA and LPP in search space, because they are complicated problem in a large search space [6].
We introduce some preliminaries including PCA, LDA, LPP and GA briefly in Section 2. The proposed global and local features combining method based on GA is explained in Section 3. In Section 4, we assess the experimental performance of our proposed system. Finally, the conclusion is drawn in Section 5.
PCA can choose a dimensionality reducing linear projection that maximizes the scatter of all projected samples. Let us assume that an object is represented by the
and we have a set of objects
where each object belongs to one of
where
The objective of PCA is maximizing the determinant of the total scatter matrix of the projected samples as follows:
The new feature vectors
where
The LDA is a linear projection method that maximizes the ratio of determinant of between-class covariance matrix to the determinant of within-class covariance matrix [8]. The within-class scatter matrix
and
where
It has been proven that that this ratio is maximized when the column vectors of the projection matrix
Though the information of features is important, the local structure is also significant in many real-world applications [3, 10]. LPP is proposed for learning a locality preserving subspace and seeks to preserve the intrinsic geometry of the samples and local structure. The objective function of LPP is as follows:
where
In (
where
Finally, we can obtain reduced optimal problem as follows:
The
The GA typically maintains a fixed size population of individuals that represent the set of solution candidates for the optimization problem to be solved. The goodness of each candidate solution is evaluated using a fitness function. The population of the GA evolves with a set of genetic operators being applied. The basic genetic operators are selection, crossover and mutation. In the selection process, some individuals are selected to be copied into a tentative next population. An individual is selected for the next population with the number of copies proportional to the fitness value. The copied individuals are altered to search for a global optimal solution in the crossover and mutation processes. The GA is simple yet provides adaptive and robust optimization methodology. Given below is the basic configuration of the GA [6].
Table 6. Average, the best, and the worst accrucy of the proposed method.
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 86.27 (0.679) | 92.16 | 76.47 |
Spambase | 83.13 (0.009) | 83.57 | 82.26 |
SPECTF | 75.37 (0.084) | 83.33 | 63.64 |
Heart | 63.43 (0.026) | 65.67 | 59.70 |
We propose a method which uses PCA and LDA for global property and LPP for local property of the pattern to improve the classification ability. It can integrate global and local descriptive information and find a richer set of alternatives beyond PCA, LDA and LPP. Based on them, we can obtain new features which preserve global and local structure, simultaneously as follows:
where
where
Because with different
For experiments, four databases of the UCI repository [12] are used to test the proposed method. The databases used in the experiments are summarized in Table 1.
In this experiment, we use the genetic parameters given in Table 2. We divide each database into the four subsets. We use three subsets as training and the remaining one subset as test set. We made independent four runs to show the general performance of the algorithms. The training set is used to train classifier and find optimal parameter and the test set is used to evaluate the performance of the proposed method. The fitness function of a chromosome is computed by training accuracy using leave-one-out cross-validation of the training set with nearest neighbor (NN) classifier [13]. We denote the results of previous three methods using only PCA, LDA and LPP in Tables 3, 4, and 5, respectively. We also denote the accuracy of the proposed method in Table 6. In Tables 3
Further, the result of the proposed method is compared with those of the conventional feature selection methods in Figure 2. It can be seen that, in all four databases, the proposed methods show better recognition accuracy than the previous methods. The reason for the better performance of the proposed method might be that proposed method considered global and local structures, simultaneously while conventional methods considered only one aspect structure. Further, optimal parameters can be obtained using GA to improve classification ability.
In this paper, we propose the improved the recognition techniques of real-world objects or concepts using computational methods. In classifier, the selection of an appropriate set of features is of considerable importance. Though PCA, LDA and LPP aim to preserve the global and local structure, respectively, it is more important that local and global structures are considered, simultaneously. Therefore, we have proposed a new feature extraction method which can preserve both extrinsic and intrinsic geometry of structures. To find optimal parameters, GA is used. For these reason, we can obtain feature extraction scheme that can have the more classification ability and use features which have global and local properties. The proposed method was applied to four databases selected in UCI machine learning repository and its effectiveness was demonstrated.
No potential conflict of interest relevant to this article was reported.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP; Ministry of Science, ICT & Future Planning) (No. NRF-2017R1C1B5018408).
Structure of the chromosome.
Correct classification ratio for four databases of the UCI repository.
procedure Genetic Algorithm |
begin |
initialize |
while termination-condition not satisfied do |
begin |
evaluate |
select |
crossover |
mutation |
|
end |
end |
Table 1. Databases used for experimnets.
Database | Number of instances | Number of classes | Number of features |
---|---|---|---|
Sonar | 208 | 3 | 60 |
Spambase | 4601 | 2 | 57 |
SPECTF | 267 | 2 | 44 |
Heart | 270 | 2 | 13 |
Table 2. GA parameters.
Parameter | Value |
---|---|
Crossover rate | 0.6 |
Mutation rate | 0.05 |
Population size | 100 |
Generation | 200 |
Table 3. Average, the best, and the worst accrucy using PCA.
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 82.35 (0.053) | 86.27 | 74.51 |
Spambase | 81.84 (0.004) | 82.35 | 81.39 |
SPECTF | 69.70 (0.081) | 74.24 | 57.58 |
Heart | 62.31 (0.022) | 64.17 | 59.70 |
Table 4. Average, the best, and the worst accrucy using LDA.
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 74.02 (0.079) | 80.40 | 62.75 |
Spambase | 78.37 (0.022) | 81.65 | 77.13 |
SPECTF | 73.11 (0.054) | 75.76 | 65.15 |
Heart | 53.36 (0.056) | 59.70 | 46.27 |
Table 5. Average, the best, and the worst accrucy using LPP.
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 84.80 (0.040) | 90.20 | 80.39 |
Spambase | 82.04 (0.002) | 82.26 | 81.82 |
SPECTF | 73.86 (0.050) | 80.30 | 68.19 |
Heart | 55.97 (0.019) | 58.21 | 53.73 |
Table 6. Average, the best, and the worst accrucy of the proposed method.
Database | Average | Best | Worst |
---|---|---|---|
Sonar | 86.27 (0.679) | 92.16 | 76.47 |
Spambase | 83.13 (0.009) | 83.57 | 82.26 |
SPECTF | 75.37 (0.084) | 83.33 | 63.64 |
Heart | 63.43 (0.026) | 65.67 | 59.70 |
Erdenebayar Urtnasan, Jong-Uk Park, SooYong Lee, and Kyoung-Joung Lee
Int. J. Fuzzy Log. Intell. Syst. -0001; 17(2): 76-81 https://doi.org/10.5391/IJFIS.2017.17.2.76Lae-Jeong Park
Int. J. Fuzzy Log. Intell. Syst. 2010; 10(2): 95-100Sung-Ho Kim,Young-Sam Lee
Int. J. Fuzzy Log. Intell. Syst. 2006; 6(3): 217-222Structure of the chromosome.
|@|~(^,^)~|@|Correct classification ratio for four databases of the UCI repository.