Regression analysis can be used for predictive and descriptive purposes in a variety of business applications. However, existing regression methods such as support vector regression (SVR) have the drawback that it is not easy to derive an explicit function description that expresses the nonlinear relationship between an output variable and input variables. To resolve this issue, this article develops a nonlinear regression algorithm based on particle swarm optimization (PSO), referred to as PSR. The output of PSR provides an explicit function description of the output variable in terms of the input variables. Three PSRs are proposed based on different infeasible-particle update rules. The experimental results show that the proposed approach performs similarly to or slightly better than the existing methods regardless of the data set, implying that it can serve as a useful alternative for obtaining an explicit function description of the output variable and for interpreting which of the original input variables are more important than others in the obtained regression model.
Supervised learning is one of the representative machine learning approaches that attempt to reveal hidden relationships between input data and output data. The main problems of supervised learning are classification and regression. In particular, regression analysis can be used for predictive and descriptive purposes in a variety of business applications. For this reason, many regression methods have been developed, including ordinary least squares (OLS) and support vector regression (SVR).
OLS estimates the unknown parameters of a linear regression model by minimizing the sum of squared differences between the observed and predicted values. As one of the simplest methods, OLS has the advantage that the output variable can be expressed as an explicit function of the input variables. However, to express a nonlinear relationship between the input variables and the output variable, the analyst must manually generate the necessary variables.
SVR, originally proposed by Vapnik et al. in 1997 [1], is one of the most popular supervised learning algorithms for regression analysis. By using kernel tricks to implicitly map inputs to a higher-dimensional space, SVR performs nonlinear regression efficiently and often achieves strong predictive performance. However, it has two problems. First, it is difficult to obtain a clear function representation of the output variable in terms of the input variables, so the relationship between them is not easy to understand. Second, the performance of SVR depends on the selection of several parameters, which directly affects its generalization and regression efficiency.
In this paper, we propose an algorithm that combines the advantages of OLS and SVR using the particle swarm optimization (PSO) method. The chief advantages of this PSO-based algorithm are that it can derive an explicit function description that expresses the nonlinear relationship between the output variable and the input variables without kernel tricks and that, at the same time, it helps to select the optimal parameters for the generalized regression model.
This paper is organized as follows. Section 2 explains the PSO-based nonlinear regression algorithm. Section 3 discusses the performance of the proposed method. Finally, Section 4 draws conclusions.
In this section, we present the signomial function. The signomial function f(x) is a weighted sum of monomials g_{i}(x), each of which is the product of positive variables (x_{1}, …, x_{n}) raised to real-valued exponents (r_{i1}, …, r_{in}) [2]:
f(x) = \sum_{i=1}^{m} \beta_{i} g_{i}(x), \quad g_{i}(x) = \prod_{j=1}^{n} x_{j}^{r_{ij}}
where the coefficients β_{i} are real numbers and m is the number of terms.
Our goal is to obtain the function f(x) by estimating the parameters (β_{i} and r_{i1}, r_{i2}, …, r_{in}) using PSO.
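As a minimal sketch of the model form described above (a weighted sum of monomials, each a product of positive inputs raised to real exponents), the following Python snippet evaluates the monomials for given exponents r_{ij} and then estimates the coefficients by least squares. Fitting the coefficients by least squares once the exponents are fixed is an assumption made for illustration, not a statement of the authors' procedure; the function names `monomials` and `fit_coefficients`, the intercept column, and the toy data are hypothetical.

```python
import numpy as np

def monomials(X, R):
    """Return G with G[s, i] = prod_j X[s, j] ** R[i, j]; X must be positive."""
    # For positive inputs, exp(log(X) @ R.T) equals the product of powers.
    return np.exp(np.log(X) @ R.T)

def fit_coefficients(X, y, R):
    """Least-squares estimate of (intercept, beta_1..beta_m) for fixed exponents R."""
    G = np.column_stack([np.ones(len(X)), monomials(X, R)])
    beta, *_ = np.linalg.lstsq(G, y, rcond=None)
    mse = float(np.mean((G @ beta - y) ** 2))
    return beta, mse

# Toy data generated from y = 2 + 3 * x1^0.5 * x2^-1 (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] ** 0.5 / X[:, 1]

R = np.array([[0.5, -1.0]])          # one term (m = 1), two variables (n = 2)
beta, mse = fit_coefficients(X, y, R)
```

With the true exponents supplied, the fitted coefficients recover the intercept and the term coefficient of the toy model; in a PSO setting, the resulting MSE would serve as the fitness of the particle that encodes `R`.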
In this research, the basic PSO proposed by Kennedy and Eberhart [3] was applied. In 2007, many implementations of the basic PSO were cataloged and defined by Bratton and Kennedy [4]. PSO has been applied to many optimization problems such as the traveling salesman problem [5] and scheduling problems [6]. PSO is also well known as a metaheuristic algorithm that can be utilized to optimize continuous nonlinear problems [7]. For this reason, PSO has also been widely used to select optimal parameters of the support vector machine (SVM) for classification and regression [8–10]. In this paper, we present a new PSO-based supervised learning method for regression problems. To the best of our knowledge, there have been no previous studies on regression problems using PSO.
PSO is an iterative procedure that maintains multiple “particles” (solutions) and their “directions and speeds,” which are referred to simply as particles and velocities, respectively. In this research, PSO was used to estimate r_{ij} for all i, j in
where k is the particle index. The overall PSO procedure is schematized in Figure 1.
In the beginning, multiple particles and their velocities are randomly generated. The initial
In each iteration, each particle and each velocity is updated based on the current particle, current velocity, local best particle, and global best particle. If
where R1 and R2 are uniform [0, 1] random variables. The local best particle is the best particle (solution) over all iterations for particle k, and the global best particle is the best particle (solution) over all iterations for all particles. The local and global best particles are updated when the coefficients and MSE are updated in Figure 1. The coefficients in
When a particle and its velocity are updated, the particle can obey constraint
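The iteration described in this section can be sketched as follows. This is a hedged illustration that assumes the commonly used PSO update with an inertia weight `w` and acceleration constants `c1` and `c2`; these values, the box bounds of [−1, 1], the uniform initial positions, and the clipping used to repair infeasible particles are all assumptions and do not reproduce the three PSR update rules. The initial velocity range of [−0.01, 0.01] follows the initialization pseudo-code. A simple quadratic stands in for the regression MSE so that the block is self-contained.

```python
import numpy as np

def pso_step(pos, vel, local_best, global_best, rng,
             w=0.7, c1=1.5, c2=1.5, lo=-1.0, hi=1.0):
    """One update of all particles (rows of pos) and their velocities."""
    r1 = rng.uniform(0.0, 1.0, size=pos.shape)  # R1 ~ U[0, 1]
    r2 = rng.uniform(0.0, 1.0, size=pos.shape)  # R2 ~ U[0, 1]
    vel = w * vel + c1 * r1 * (local_best - pos) + c2 * r2 * (global_best - pos)
    pos = np.clip(pos + vel, lo, hi)  # assumed repair rule: clip to the box
    return pos, vel

def pso_minimize(objective, dim, n_particles=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))    # assumed initial range
    vel = rng.uniform(-0.01, 0.01, size=(n_particles, dim))  # as in the pseudo-code
    local_best = pos.copy()
    local_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(local_val))
    global_best, global_val = local_best[g].copy(), float(local_val[g])
    for _ in range(n_iter):
        pos, vel = pso_step(pos, vel, local_best, global_best, rng)
        vals = np.array([objective(p) for p in pos])
        improved = vals < local_val            # per-particle best so far
        local_best[improved] = pos[improved]
        local_val[improved] = vals[improved]
        g = int(np.argmin(local_val))
        if local_val[g] < global_val:          # best over all particles so far
            global_best, global_val = local_best[g].copy(), float(local_val[g])
    return global_best, global_val

# Stand-in objective; in PSR the objective would be the MSE of the fitted model.
best, best_val = pso_minimize(lambda r: float(np.sum((r - 0.3) ** 2)), dim=3)
```

In a PSR-style use, each particle would hold the exponents r_{ij} flattened into a vector of length m × n, and `objective` would return the MSE of the corresponding signomial model.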
In this section, we explain how the regression models are determined by the PSRs and then compare their experimental results with those of OLS, multivariate adaptive regression splines (MARS), and SVR. To compare OLS, MARS, SVR, and the three PSRs, the four datasets shown in Table 3 were used. These datasets are commonly used in the machine learning area. Each dataset has 10 train, valid, and test sets, which are used for model training, model validation, and model testing, respectively. For objectivity, various types of data were tested: Abalone has a larger number of observations than the other datasets, and Triazines has a larger number of independent variables.
Because of its random search characteristics, a PSR provides different regression models over multiple runs with the same data. To find better regression models with the PSRs, five replication runs were performed for each dataset. Five replication models were determined by each PSR with the train data. Then, the model providing the minimum MSE on the valid data was selected as the regression model for the PSR. Finally, the selected model was evaluated on the test data. Ten terms (m = 10) were used for the PSR models in
Table 4 compares the average MSEs and their average standard deviations for ten independent runs. For each method, the first row shows the average MSE, and the second row shows the average standard deviation. The bold values in Table 4 are the top 3 results for the 6 methods and test data. The 3 PSRs provided better test results than the other methods for the Abalone and Machine data sets. PSR1 and PSR2 provided better average MSEs for Abalone, Machine, and Triazines than OLS, MARS or SVR. SVR provided the best average MSE for the Auto data. These results show that the PSR methods provide competitive regression models.
Table 5 shows the test MSE error, (Method MSE − Best MSE)/Best MSE, between each method and the best method. PSR1 provided the best MSE for Abalone and Triazines, and PSR3 provided the best MSE for Machine. PSR1 was the best in terms of the average test MSE error. The second- and third-best methods were PSR2 and SVR, respectively. Tables 4 and 5 provide the evidence of the competitiveness of the PSR regression models.
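The test MSE error defined above, (Method MSE − Best MSE)/Best MSE, is a relative gap to the best method on a given dataset. A minimal sketch of the calculation follows; the function name `relative_mse_errors` and the three test MSE values are hypothetical and are not taken from the tables.

```python
def relative_mse_errors(test_mse):
    """Map each method to (MSE - best MSE) / best MSE for one dataset."""
    best = min(test_mse.values())
    return {method: (mse - best) / best for method, mse in test_mse.items()}

# Hypothetical test MSEs for one dataset (illustrative values only).
example = {"A": 2.0, "B": 2.5, "C": 3.0}
errors = relative_mse_errors(example)
```

The best method always receives an error of 0, and averaging a method's errors over the datasets gives the kind of summary reported in the last column of the test MSE error table.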
Figure 2 represents the Table 5 data graphically. Figure 2(a) shows the overall test MSE errors in Table 5, and Figure 2(b) zooms in on the most notable area of Figure 2(a). Most of the methods provided smaller MSE errors for the Abalone dataset than for the other datasets. OLS and MARS provided larger average errors than the other methods. The average MSE errors of PSR1 and PSR2 were smaller than that of SVR. All of the MSE errors of PSR1 and PSR2 were less than 25%, and those of PSR3 were less than 25% except for Triazines (about 34% in Table 5). Most significantly, all of the MSE errors of PSR2 were less than 20%. These results show that PSR1 was the best method in terms of average test MSE error, whereas PSR2 was robust across all of the datasets.
The advantage of the PSR models is their provision of a model description using independent variables. Tables 6
Tables 6
The r_{ij} in Tables 6
PSR1 and PSR3 estimated more r_{ij} to be zero than PSR2, since they have more bounded constraints. For Abalone and Triazines, PSR1 was the best and PSR2 second best. Users can choose their model in consideration of minimum test MSE error (PSR1), robustness (PSR2), or ratio of r_{ij} = 0 (PSR1), according to their specific purpose. All of these models were shown to be competitive with existing methods (i.e., OLS, MARS, and SVR).
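The ratio of r_{ij} = 0 mentioned above can be computed directly from a matrix of estimated exponents. The sketch below counts exact zeros in two rows (terms 7 and 8) copied from the PSR1 model for the first Abalone dataset as printed; the function name `zero_ratio` and the optional tolerance `tol` are hypothetical, and the printed exponents are rounded to two decimals, so this is only an illustration of the calculation.

```python
import numpy as np

def zero_ratio(R, tol=0.0):
    """Return (ratio, count) of exponents with |r_ij| <= tol."""
    R = np.asarray(R, dtype=float)
    count = int(np.sum(np.abs(R) <= tol))
    return count / R.size, count

# Exponents of terms 7 and 8 as printed for the PSR1 Abalone model.
R = [[0, -0.02, 0, -0.23, 0.21, 0.09, -0.37],
     [-0.11, -0.02, 0, 0.49, -0.27, -0.01, 0.04]]
ratio, count = zero_ratio(R)
```

A zero exponent removes the corresponding variable from a term, so a higher ratio indicates a sparser model that is easier to interpret.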
In this paper, we proposed new PSRs to solve the nonlinear regression problem. The PSR methods estimate signomial functions using the PSO method. The PSR regression models offer the benefit of explicit function descriptions that represent the nonlinearity of the given data without kernel tricks.
Also, with the PSR regression models, sparse signomial functions that can be used for prediction purposes as well as interpretation purposes can be obtained. The present experimental results revealed that this approach can detect a sparse signomial nonlinear function; indeed, PSR may be considered as a useful alternative when interpreting which original input variables are more important than others in an obtained regression model. The development of nonlinear sparse regression models using other metaheuristic optimization methods will be an important focus of future studies.
No potential conflict of interest relevant to this article was reported.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2017R1E1A1A01077375).
Table 1. Pseudo-code for initialization
Pseudo-code | |
---|---|
Particle | Step 1: Set |
Velocity | Generate uniform [−0.01, 0.01] and assign it to |
Table 2. Three infeasible-particle update rules
Method | Update rule |
---|---|
PSR1 | If |
PSR2 | If |
PSR3 | Add a penalty to MSE when |
Table 3. Parameters for each dataset
Dataset | Number of independent variables (n) | Number of independent runs | Number of observations (Train) | Number of observations (Valid) | Number of observations (Test) |
---|---|---|---|---|---|
Abalone | 7 | 10 | 2,507 | 835 | 304 |
Auto | 7 | 10 | 304 | 101 | 101 |
Machine | 6 | 10 | 125 | 42 | 42 |
Triazines | 60 | 10 | 112 | 37 | 37 |
Table 4. Average MSEs and their standard deviations for each method and dataset
Method | Statistic | Abalone (Train) | Abalone (Valid) | Abalone (Test) | Auto (Train) | Auto (Valid) | Auto (Test) | Machine (Train) | Machine (Valid) | Machine (Test) | Triazines (Train) | Triazines (Valid) | Triazines (Test) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
OLS | MSE | 4.90 | 4.93 | 5.11 | 10.24 | 11.82 | 12.57 | 2854 | 8388 | 6215 | 0.011 | 0.042 | 0.058 |
OLS | SD | 0.10 | 0.29 | 0.41 | 0.51 | 1.63 | 1.57 | 846 | 6467 | 4488 | 0.001 | 0.009 | 0.043 |
MARS | MSE | 4.45 | 4.53 | 4.66 | 6.97 | 8.72 | 9.36 | 1476 | 11212 | 10710 | 0.013 | 0.026 | 0.023 |
MARS | SD | 0.09 | 0.23 | 0.28 | 0.55 | 0.95 | 1.93 | 446 | 12416 | 15231 | 0.002 | 0.005 | 0.005 |
SVR | MSE | 4.28 | 4.46 | 4.67 | 4.11 | 7.50 | | 1156 | 5600 | 5092 | 0.013 | 0.023 | |
SVR | SD | 0.13 | 0.26 | 0.36 | 0.46 | 1.36 | 1.24 | 250 | 4409 | 4976 | 0.001 | 0.006 | 0.006 |
PSR1 | MSE | 4.42 | 4.38 | | 6.77 | 8.05 | 8.93 | 1324 | 5625 | | 0.013 | 0.021 | |
PSR1 | SD | 0.12 | 0.22 | 0.31 | 0.43 | 1.05 | 1.75 | 243 | 4688 | 3795 | 0.001 | 0.004 | 0.004 |
PSR2 | MSE | 4.36 | 4.38 | | 6.42 | 7.88 | | 680 | 3507 | | 0.012 | 0.020 | |
PSR2 | SD | 0.13 | 0.23 | 0.32 | 0.53 | 1.13 | 1.48 | 204 | 2585 | 4204 | 0.001 | 0.003 | 0.006 |
PSR3 | MSE | 4.43 | 4.39 | | 7.06 | 7.92 | | 1676 | 4484 | | 0.015 | 0.022 | 0.026 |
PSR3 | SD | 0.13 | 0.21 | 0.37 | 0.21 | 1.29 | 1.61 | 446 | 3550 | 2904 | 0.001 | 0.004 | 0.016 |
Table 5. Test MSE error
Method | Abalone | Auto | Machine | Triazines | Average |
---|---|---|---|---|---|
OLS | 0.105856 | 0.748477 | 0.538925 | 1.969327 | 0.840646 |
MARS | 0.008258 | 0.301676 | 1.651818 | 0.187167 | 0.537230 |
SVR | 0.009596 | 0 | 0.260881 | 0.155416 | 0.106473 |
PSR1 | 0 | 0.241871 | 0.047562 | 0 | 0.072358 |
PSR2 | 0.002454 | 0.139036 | 0.152940 | 0.047548 | 0.085495 |
PSR3 | 0.007278 | 0.237266 | 0 | 0.340737 | 0.146320 |
MSE error = (Method MSE – Best MSE) / Best MSE.
Table 6. PSR1 model for first dataset of Abalone
Term index (i) | Coefficient (β_{i}) | Independent variable index (j): 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
1 | −15.5727 | 0.3 | 0.06 | 0.02 | 0.14 | 0.11 | 0.04 | −0.11 |
2 | 3.4747 | 0.15 | −0.03 | −0.17 | −0.2 | 0 | −0.1 | 0.12 |
3 | −4.6892 | 0.26 | −0.18 | 0.03 | 0.01 | −0.06 | 0.05 | −0.3 |
4 | −1.8090 | −0.1 | −0.29 | 0.02 | −0.05 | −0.12 | −0.07 | 0.01 |
5 | 15.6752 | −0.11 | −0.3 | 0.25 | −0.01 | 0.07 | −0.03 | −0.01 |
6 | −3.8564 | −0.03 | 0.27 | 0 | 0.07 | 0.05 | −0.33 | 0.07 |
7 | 7.0208 | 0 | −0.02 | 0 | −0.23 | 0.21 | 0.09 | −0.37 |
8 | 28.3323 | −0.11 | −0.02 | 0 | 0.49 | −0.27 | −0.01 | 0.04 |
9 | −6.8067 | −0.1 | −0.05 | 0.13 | 0.11 | 0.22 | −0.09 | −0.16 |
10 | −14.3795 | −0.37 | −0.29 | 0 | −0.01 | 0.02 | 0.2 | −0.01 |
0 | −3.4941 | - |
Table 7. PSR2 model for first dataset of Auto
Term index (i) | Coefficient (β_{i}) | Independent variable index (j): 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
1 | 5.64E+11 | −0.96 | −1 | −1 | −1 | −1 | −1 | −1 |
2 | −140.3110 | −0.29 | −0.61 | −1 | 1 | 1 | −1 | −0.59 |
3 | 0.08789 | −0.1 | −1 | −1 | 1 | 1 | 1 | 0 |
4 | −0.0291 | 1 | 0.44 | 0.73 | −0.41 | 1 | −0.36 | 0.8 |
5 | 4.31E-06 | 1 | −0.91 | 0.81 | 0.76 | 1 | 1 | −0.75 |
6 | 0.0851 | 0.79 | −0.7 | −1 | 1 | −0.61 | 1 | 0.85 |
7 | 3946841 | −1 | −0.5 | −0.55 | −1 | −1 | 1 | 0.88 |
8 | 0.0008 | 1 | 1 | 0.54 | 0.04 | −0.35 | −0.51 | 1 |
9 | −4714388 | −1 | −1 | −1 | −0.38 | −1 | 0.81 | 1 |
10 | 2.9175 | −0.42 | 1 | −0.45 | −1 | 0.99 | 1 | −1 |
0 | 8.5401 | - |
Table 8. PSR3 model for first dataset of Machine
Term index (i) | Coefficient (β_{i}) | Independent variable index (j): 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
1 | −1145.7200 | −0.73 | 0 | 0 | 0.16 | 0 | 0.01 |
2 | −318.298 | −0.01 | −0.4 | −0.02 | 0.19 | 0.01 | 0.33 |
3 | 0.1497 | 0.83 | 0.02 | −0.01 | 0.02 | 0.02 | 0.05 |
4 | 0.2105 | 0.01 | 0.05 | −0.01 | −0.07 | 0.65 | 0.18 |
5 | 0.7012 | 0.16 | 0.71 | −0.01 | 0.02 | −0.01 | 0 |
6 | 1679.2390 | 0.05 | −0.17 | −0.22 | −0.23 | −0.02 | −0.29 |
7 | −1.8570 | 0.32 | 0.52 | −0.02 | 0.07 | 0.03 | −0.01 |
8 | 0.4133 | 0.47 | 0.06 | 0.32 | −0.01 | 0.03 | −0.06 |
9 | 307.1087 | −0.36 | −0.04 | −0.01 | 0.28 | 0.01 | 0.27 |
10 | −28.4306 | −0.23 | 0.23 | 0 | −0.25 | 0.1 | 0.08 |
0 | 26.2379 | - |
Table 9. Ratio of r_{ij} = 0
Method | Abalone | Auto | Machine | Triazines |
---|---|---|---|---|
PSR1 | 0.294 (1,029) | 0.611 (2,140) | 0.527 (1,582) | 0.952 (28,571) |
PSR2 | 0.012 (41) | 0.034 (115) | 0.018 (54) | 0.467 (14,009) |
PSR3 | 0.119 (415) | 0.192 (671) | 0.113 (338) | 0.906 (27,181) |
E-mail: sonmupsj@kangwon.ac.kr
E-mail: nysong@mju.ac.kr
E-mail: wyyuie@mju.ac.kr
E-mail: ftgog@mju.ac.kr