
## Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2019; 19(4): 307-314

Published online December 25, 2019

https://doi.org/10.5391/IJFIS.2019.19.4.307

© The Korean Institute of Intelligent Systems

## PSR: PSO-Based Signomial Regression Model

SeJoon Park¹, NagYoon Song², Wooyeon Yu², and Dohyun Kim²

¹Division of Energy Resources Engineering and Industrial Engineering, Kangwon National University, Chuncheon, Korea
²Department of Industrial and Management Engineering, Myongji University, Yongin, Korea

Correspondence to: Dohyun Kim (ftgog@mju.ac.kr)

Received: November 11, 2019; Revised: December 10, 2019; Accepted: December 12, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Regression analysis can be used for predictive and descriptive purposes in a variety of business applications. However, successful existing regression methods such as support vector regression (SVR) have the drawback that it is not easy to derive an explicit function description that expresses the nonlinear relationship between an output variable and input variables. To resolve this issue, this article develops PSR, a nonlinear regression algorithm based on particle swarm optimization (PSO). PSR yields an explicit function description of the output variable in terms of the input variables. Three PSRs are proposed, which differ in their infeasible-particle update rules. The experimental results show that the proposed approach performs similarly to, and in several cases slightly better than, the existing methods across the data sets tested. This implies that it is a useful alternative when one needs an explicit function description of the output variable and wants to interpret which of the original input variables are more important than others in the obtained regression model.

Keywords: Signomial regression, Particle swarm optimization, Meta-heuristic optimization

### 1. Introduction

Supervised learning is one of the representative machine learning approaches; it attempts to reveal hidden relationships between input data and output data. Its two main problem types are classification and regression. In particular, regression analysis can be used for predictive and descriptive purposes in a variety of business applications. For this reason, many regression methods have been developed, including ordinary least squares (OLS) and support vector regression (SVR).

OLS estimates the unknown parameters of a linear regression model by minimizing the sum of squared differences between the observed and predicted values. OLS, one of the simplest methods, has the advantage that the output variable can be expressed as an explicit function of the input variables. However, to express a nonlinear relationship between the input variables and the output variable, the analyst must manually generate the necessary transformed variables.

SVR, proposed originally by Vapnik et al. in 1997 [1], is one of the most popular supervised learning algorithms for regression analysis. By using kernel tricks to implicitly map inputs to a higher-dimensional space, SVR performs nonlinear regression efficiently and often achieves strong predictive performance. However, it has two problems. First, it is difficult to obtain a clear function representation of the output variable in terms of the input variables, so the relationship between them is not easy to understand. Second, the performance of SVR depends on the selection of several parameters, which directly affects its generalization and regression efficiency.

In this paper, we propose an algorithm that combines the advantages of OLS and SVR using the particle swarm optimization (PSO) method. The chief advantages of this PSO-based algorithm are that it derives an explicit function description of the nonlinear relationship between the output variable and the input variables without kernel tricks and that, simultaneously, it helps to select the optimal parameters for the generalized regression model.

This paper is organized as follows. Section 2 explains the PSO-based nonlinear regression algorithm. Section 3 discusses the performance of the proposed method. Finally, Section 4 draws conclusions.

### 2.1 PSR Model

In this section, we present the signomial function. The signomial function $f(x)$ is a sum of monomials $g_i(x)$, each of which is a product of positive variables $(x_1, \ldots, x_n)$ raised to real exponents $(r_{i1}, \ldots, r_{in})$ [2]:

$$g_i(x) = x_1^{r_{i1}} x_2^{r_{i2}} \cdots x_n^{r_{in}}, \qquad f(x) = \sum_{i=1}^{m} \beta_i g_i(x), \tag{1}$$

where each coefficient $\beta_i$ is a real number.

Our goal is to obtain the function $f(x)$ by estimating the parameters ($\beta_i$ and $r_{i1}, r_{i2}, \ldots, r_{in}$) using PSO.
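As a small illustration of Eq. (1), the following sketch evaluates a signomial with NumPy. The function name `signomial`, the optional `intercept` argument (motivated by the term index 0 listed in Tables 6–8), and the toy numbers are assumptions made for this example, not part of the original method description.

```python
import numpy as np

def signomial(X, beta, R, intercept=0.0):
    """Evaluate f(x) = intercept + sum_i beta[i] * prod_j x_j ** R[i, j].

    X    : (n_samples, n) array of strictly positive inputs
    beta : (m,) array of real coefficients
    R    : (m, n) array of real exponents r_ij
    """
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    if np.any(X <= 0):
        raise ValueError("signomial terms require strictly positive inputs")
    # Monomials g_i(x) = exp(sum_j r_ij * log x_j), shape (n_samples, m).
    G = np.exp(np.log(X) @ R.T)
    return intercept + G @ np.asarray(beta, dtype=float)

# Toy model (made-up numbers): f(x) = 2 * x1^0.5 * x2^-1 - 3 * x1
X = np.array([[4.0, 2.0], [9.0, 3.0]])
beta = np.array([2.0, -3.0])
R = np.array([[0.5, -1.0], [1.0, 0.0]])
y = signomial(X, beta, R)
```

Working in log space keeps the evaluation vectorized over all samples and terms, and it makes explicit why the inputs must be positive.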

### 2.2 Particle Swarm Optimization

In this research, the basic PSO proposed by Kennedy and Eberhart [3] was applied. In 2007, many implementations of the basic PSO were cataloged and defined by Bratton and Kennedy [4]. PSO has been applied to many optimization problems such as the traveling salesman problem [5] and scheduling problems [6]. PSO is also well known as a metaheuristic algorithm that can be utilized to optimize continuous nonlinear problems [7]. This is why PSO has also been widely used to select optimal parameters of support vector machines (SVM) for classification and regression [8–10]. In this paper, we present a new, PSO-based supervised learning method for regression problems. To the best of our knowledge, there have been no previous studies on regression problems using PSO.

PSO is an iterative procedure maintaining multiple “particles” (solutions) and their “directions and speeds,” which are referred to simply as particles and velocities, respectively. In this research, PSO was used to estimate $r_{ij}$ for all $i, j$ in Eq. (1). The objective of PSO is to find the $r_{ij}$ and the coefficients in Eq. (1) that minimize the mean square error (MSE) of Eq. (1). First, the particle and velocity are defined as follows:

$$\text{Particle: } R_k = (r_{11}^k, r_{12}^k, \ldots, r_{mn}^k), \qquad \text{Velocity: } V_k = (v_{11}^k, v_{12}^k, \ldots, v_{mn}^k), \tag{2}$$

where $k$ is the particle index. The overall PSO procedure is schematized in Figure 1.

In the beginning, multiple particles and their velocities are randomly generated. The initial $r_{ij}^k$ is between −1 and 1, and $\sum_{j=1}^{n} |r_{ij}^k| \le 1$ for all $k$. The initial $v_{ij}^k$ is between −0.01 and 0.01. Table 1 shows how to initialize the $k$th particle and velocity. All $r_{ij}^k$ are initialized term by term, from $i = 1$ to $i = m$. Within each term, the independent variable index $j$ is selected randomly, one at a time. At each selection, a uniform $[-1+\sum_{\forall j}|r_{ij}^k|,\ 1-\sum_{\forall j}|r_{ij}^k|]$ random variable is generated and assigned to $r_{ij}^k$. Because each draw is limited to the budget left by the exponents already assigned in that term, $\sum_{j=1}^{n} |r_{ij}^k|$ remains less than or equal to one.
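The initialization described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation (the paper reports using R): the function names are hypothetical, and the two-decimal rounding is done here by truncating toward zero, an assumption chosen so that rounding can never push a term over its budget.

```python
import numpy as np

def init_particle(m, n, rng):
    """Initialise one particle R (m terms x n variables) following the Table 1 steps."""
    R = np.zeros((m, n))
    for i in range(m):
        # Visit the variable indices of term i in random order.
        for q in rng.permutation(n):
            # Room left under sum_j |r_ij| <= 1 (clamped against float round-off).
            budget = max(0.0, 1.0 - np.abs(R[i]).sum())
            value = rng.uniform(-budget, budget)
            # Assumption: truncate toward zero to two decimals.
            R[i, q] = np.trunc(value * 100) / 100
    return R

def init_velocity(m, n, rng):
    """Initialise one velocity with entries uniform on [-0.01, 0.01]."""
    return rng.uniform(-0.01, 0.01, size=(m, n))

rng = np.random.default_rng(0)
R0 = init_particle(10, 7, rng)
V0 = init_velocity(10, 7, rng)
```

Because later draws in a term only see the budget left by earlier ones, exponents drawn late tend to be small, which is one reason the variable order is randomized within each term.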

In each iteration, each particle and each velocity are updated based on the current particle, current velocity, local best particle, and global best particle. If $R_k^t$ and $V_k^t$ are the particle and velocity at the $t$th iteration, each particle is randomly changed by Eq. (3):

$$V_k^{t+1} = V_k^t + 2 \cdot R_1 \cdot (\text{global best particle} - R_k^t) + 2 \cdot R_2 \cdot (\text{local best particle} - R_k^t), \qquad R_k^{t+1} = R_k^t + V_k^{t+1}, \tag{3}$$

where $R_1$ and $R_2$ are uniform [0, 1] random variables. The local best particle is the best particle (solution) over all iterations for particle $k$, and the global best particle is the best particle (solution) over all iterations for all particles. The local and global best particles are updated when the coefficients and MSE are updated in Figure 1. The coefficients in Eq. (1) and the corresponding MSE are determined for each particle by the least squares method. As shown in Figure 1, if the global best particle does not change over TR iterations, the PSO procedure is terminated. This is the termination rule of the PSO procedure.
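The two computations in this step, fitting the coefficients for fixed exponents and moving a particle by Eq. (3), might be sketched as below. The function names are hypothetical. The intercept column is an assumption suggested by the term index 0 in Tables 6–8, and drawing one scalar $R_1$ and $R_2$ per particle is likewise an assumption, since the text does not say whether they are scalars or per-component draws.

```python
import numpy as np

def fit_coefficients(X, y, R):
    """Least-squares coefficients (intercept first) and training MSE for exponents R."""
    G = np.exp(np.log(X) @ R.T)                  # monomial features, (n_samples, m)
    A = np.column_stack([np.ones(len(X)), G])    # intercept column is an assumption
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = float(np.mean((A @ coef - y) ** 2))
    return coef, mse

def pso_step(R, V, local_best, global_best, rng):
    """Apply Eq. (3) once to a single particle R with velocity V."""
    r1, r2 = rng.uniform(0.0, 1.0, size=2)       # scalar R1 and R2 (assumption)
    V_new = V + 2 * r1 * (global_best - R) + 2 * r2 * (local_best - R)
    return R + V_new, V_new

# Synthetic check: data generated from y = 1 + 2 * x1^0.5 is recovered exactly.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] ** 0.5
coef, mse = fit_coefficients(X, y, np.array([[0.5, 0.0]]))
```

Since the model is linear in the coefficients once the exponents are fixed, PSO only has to search over the exponents, and each fitness evaluation reduces to one linear least-squares solve.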

### 2.3 Three Particle and Velocity Update Methods

When a particle and its velocity are updated, the updated particle must still obey the constraint $\sum_{j=1}^{n} |r_{ij}^k| \le 1$ (Figure 1), which the update of Eq. (3) alone does not guarantee. Three different update methods are proposed in this research, namely PSR1, PSR2, and PSR3, as shown in Table 2. In PSR1, if $\sum_{j=1}^{n} |r_{ij}^k| > 1$, the exponents $r_{ij}^k$ are set to zero one by one until $\sum_{j=1}^{n} |r_{ij}^k| \le 1$. The exponent with the smallest $|r_{ij}^k|$ is zeroed first, so that the less important exponents are removed. In PSR2, the constraint is relaxed to $|r_{ij}^k| \le 1$ for each exponent. Therefore, if $|r_{ij}^k| > 1$, $r_{ij}^k$ is set to 1 or −1. In PSR3, PSO is allowed to keep infeasible particles (solutions) such as $|r_{ij}^k| > 1$. However, this method adds a penalty to the associated MSE that is large enough to prevent an infeasible particle from being selected as the local or global best solution. In the next section, the results of the PSRs and other approaches are compared.
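The three rules can be sketched in a few lines each. This is an interpretation, not the authors' code: the function names and the penalty value `1e6` are hypothetical, and PSR1 is applied here term by term (zeroing the smallest nonzero exponent within each violating term), which is one reading of the arg-min in Table 2.

```python
import numpy as np

def repair_psr1(R):
    """PSR1: in each violating term, zero the smallest nonzero |r_ij| until sum_j |r_ij| <= 1."""
    R = R.copy()
    for row in R:                                  # each row is a view into the copy
        while np.abs(row).sum() > 1.0:
            nonzero = np.flatnonzero(row)
            smallest = nonzero[np.argmin(np.abs(row[nonzero]))]
            row[smallest] = 0.0
    return R

def repair_psr2(R):
    """PSR2: relaxed constraint |r_ij| <= 1, enforced by clipping to [-1, 1]."""
    return np.clip(R, -1.0, 1.0)

def penalized_mse(mse, R, penalty=1e6):
    """PSR3: keep R unchanged, but penalise the MSE if any term has sum_j |r_ij| > 1."""
    infeasible = bool(np.any(np.abs(R).sum(axis=1) > 1.0))
    return mse + penalty if infeasible else mse

R = np.array([[0.6, 0.3, 0.2],     # sum of |r| is 1.1, so this term is infeasible
              [0.5, -0.4, 0.0]])   # sum of |r| is 0.9, so this term is feasible
fixed = repair_psr1(R)
```

PSR1 produces exact zeros as a side effect of the repair, which is consistent with the higher share of zero exponents it shows in Table 9.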

### 3.1 Experimental Design

In this section, we explain how the regression models are determined by the PSRs and then compare their experimental results with those of OLS, multivariate adaptive regression splines (MARS), and SVR. To compare OLS, MARS, SVR, and the three PSRs, the four datasets shown in Table 3 were used. These data sets are commonly used in the machine learning literature. Each data set has 10 independent train, validation, and test partitions, which are used for model training, model validation, and model testing, respectively. To keep the comparison objective, data sets of various types were tested: Abalone has more observations than the other datasets, and Triazines has more independent variables.

A PSR, with its random search characteristics, provides different regression models over multiple runs with the same data. In order to find better regression models, five replication runs were performed for each data set. Five replication models were determined by each PSR with the training data. Then, the model with the minimum MSE on the validation data was selected as the regression model for the PSR. Finally, the selected model was evaluated on the test data. Ten terms (m = 10) were used for the PSR models in Eq. (1). The other parameters were pre-tested with data, as indicated in Table 3. Three numbers of particles were examined: 100, 200, and 300. The termination rule (TR) was also tested with 100, 200, and 300 iterations. Two hundred particles and TR = 200 were selected for the comparison experiment, since increasing them did not improve the overall MSEs. R version 3.4.0 was used for the PSR procedures, which were run on an Intel Core i5-4590 CPU @ 3.30 GHz with 4 GB RAM.
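The replication-and-selection protocol amounts to a small loop. In the sketch below, `fit` and `valid_mse` are hypothetical callables standing in for "run one PSR replication on the training data with a given seed" and "score a fitted model on the validation data"; neither name comes from the paper.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")

def select_model(
    fit: Callable[[int], Model],
    valid_mse: Callable[[Model], float],
    seeds: Sequence[int] = (0, 1, 2, 3, 4),
) -> Tuple[Model, float]:
    """Fit one model per seed and return the one with the lowest validation MSE."""
    fitted = [fit(seed) for seed in seeds]
    scores = [valid_mse(model) for model in fitted]
    best = min(range(len(fitted)), key=scores.__getitem__)
    return fitted[best], scores[best]

# Stand-in example: the "model" is just its seed, scored by a made-up function.
best_model, best_score = select_model(lambda seed: seed, lambda m: (m - 3) ** 2)
```

The test partition plays no part in the selection, so the test MSE of the chosen model remains an unbiased estimate of its out-of-sample error.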

### 3.2 Experimental Results

Table 4 compares the average MSEs and their average standard deviations for ten independent runs. For each method, the first row shows the average MSE, and the second row shows the average standard deviation. The bold values in Table 4 are the top 3 results for the 6 methods and test data. The 3 PSRs provided better test results than the other methods for the Abalone and Machine data sets. PSR1 and PSR2 provided better average MSEs for Abalone, Machine, and Triazines than OLS, MARS or SVR. SVR provided the best average MSE for the Auto data. These results show that the PSR methods provide competitive regression models.

Table 5 shows the test MSE error, (Method MSE − Best MSE)/Best MSE, between each method and the best method. PSR1 provided the best MSE for Abalone and Triazines, and PSR3 provided the best MSE for Machine. PSR1 was the best in terms of the average test MSE error. The second- and third-best methods were PSR2 and SVR, respectively. Tables 4 and 5 provide the evidence of the competitiveness of the PSR regression models.
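The relative error defined above is straightforward to recompute. The sketch below uses the rounded test MSEs as read from Table 4, so its results agree with Table 5 only approximately (Table 5 was presumably computed from unrounded values); the dictionary layout and function name are choices made for this example.

```python
# Rounded test MSEs as read from Table 4 (method -> dataset -> MSE).
TEST_MSE = {
    "OLS":  {"Abalone": 5.11, "Auto": 12.57, "Machine": 6215.0,  "Triazines": 0.058},
    "MARS": {"Abalone": 4.66, "Auto": 9.36,  "Machine": 10710.0, "Triazines": 0.023},
    "SVR":  {"Abalone": 4.67, "Auto": 7.19,  "Machine": 5092.0,  "Triazines": 0.023},
    "PSR1": {"Abalone": 4.62, "Auto": 8.93,  "Machine": 4231.0,  "Triazines": 0.019},
    "PSR2": {"Abalone": 4.63, "Auto": 8.19,  "Machine": 4656.0,  "Triazines": 0.020},
    "PSR3": {"Abalone": 4.66, "Auto": 8.89,  "Machine": 4039.0,  "Triazines": 0.026},
}

def relative_errors(test_mse):
    """Return (method MSE - best MSE) / best MSE for every method and dataset."""
    datasets = list(next(iter(test_mse.values())))
    best = {d: min(scores[d] for scores in test_mse.values()) for d in datasets}
    return {
        method: {d: (scores[d] - best[d]) / best[d] for d in datasets}
        for method, scores in test_mse.items()
    }

errors = relative_errors(TEST_MSE)
# Average relative error per method across the four datasets.
average = {method: sum(e.values()) / len(e) for method, e in errors.items()}
```

Normalizing by the best MSE per dataset puts the four datasets on a common scale, which is what makes the cross-dataset average in the last column of Table 5 meaningful despite MSEs that range from about 0.02 to several thousand.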

Figure 2 represents the Table 5 data graphically. Figure 2(a) shows the overall test MSE errors in Table 5, and Figure 2(b) zooms in on the most notable area of Figure 2(a). Most of the methods provided smaller MSE errors for the Abalone dataset than for the other datasets. OLS and MARS provided larger average errors than the others. The average MSE errors of PSR1 and PSR2 were smaller than that of SVR. All of the MSE errors of PSR1 were less than 25%, and all of those of PSR2 were less than 20%; PSR3 exceeded 25% only for Triazines (about 34% in Table 5). This result shows that PSR1 was the best method for minimizing the average test MSE error, whereas PSR2 was the most robust across the data sets.

The advantage of the PSR models is that they provide an explicit model description in terms of the independent variables. Tables 6–8 show PSR1, PSR2, and PSR3 model examples, respectively. With PSR1, the minimum validation MSE was obtained at the first replication for the first run of Abalone (Table 6); with PSR2, at the third replication for the first run of Auto (Table 7); and with PSR3, at the third replication for the first run of Machine (Table 8). Each PSR model provided the minimum test MSE error. No model for Triazines is provided herein, due to the large number of independent variables.

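Eq. (4) shows the model in Table 6 in mathematical form.

$$
\begin{aligned}
y = {} & -3.4941 \\
& - 15.5727 \cdot x_1^{0.3} \cdot x_2^{0.06} \cdot x_3^{0.02} \cdot x_4^{0.14} \cdot x_5^{0.11} \cdot x_6^{0.04} \cdot x_7^{-0.11} \\
& + 3.4747 \cdot x_1^{0.15} \cdot x_2^{-0.03} \cdot x_3^{-0.17} \cdot x_4^{-0.2} \cdot x_6^{-0.1} \cdot x_7^{0.12} \\
& - 4.6892 \cdot x_1^{0.26} \cdot x_2^{-0.18} \cdot x_3^{0.03} \cdot x_4^{0.01} \cdot x_5^{-0.06} \cdot x_6^{-0.05} \cdot x_7^{-0.3} \\
& - 1.8090 \cdot x_1^{-0.1} \cdot x_2^{-0.29} \cdot x_3^{0.02} \cdot x_4^{-0.05} \cdot x_5^{-0.12} \cdot x_6^{-0.07} \cdot x_7^{0.01} \\
& + 15.6752 \cdot x_1^{-0.11} \cdot x_2^{0.3} \cdot x_3^{0.25} \cdot x_4^{-0.01} \cdot x_5^{0.07} \cdot x_6^{-0.03} \cdot x_7^{-0.01} \\
& - 3.8564 \cdot x_1^{-0.03} \cdot x_2^{0.27} \cdot x_4^{0.07} \cdot x_5^{0.05} \cdot x_6^{-0.33} \cdot x_7^{0.07} \\
& + 7.0280 \cdot x_2^{-0.02} \cdot x_4^{-0.23} \cdot x_5^{0.21} \cdot x_6^{0.09} \cdot x_7^{-0.37} \\
& + 28.3323 \cdot x_1^{-0.11} \cdot x_2^{-0.02} \cdot x_4^{0.49} \cdot x_5^{-0.27} \cdot x_6^{-0.01} \cdot x_7^{0.04} \\
& - 6.8067 \cdot x_1^{-0.1} \cdot x_2^{-0.05} \cdot x_3^{0.13} \cdot x_4^{0.11} \cdot x_5^{0.22} \cdot x_6^{-0.09} \cdot x_7^{-0.16} \\
& - 14.3795 \cdot x_1^{-0.37} \cdot x_2^{-0.29} \cdot x_4^{-0.01} \cdot x_5^{0.02} \cdot x_6^{0.2} \cdot x_7^{-0.01}.
\end{aligned}
\tag{4}
$$

Tables 6–8 give the models obtained by PSR1, PSR2, and PSR3 in tabular form; Eq. (4) writes out one of them explicitly as an example. Variables whose exponent is zero are omitted from the corresponding term.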

The $r_{ij}$ in Tables 6–8 are all between −1 and 1, and $\sum_{j=1}^{n} |r_{ij}| \le 1$ holds for every term in Tables 6 and 8. The regression models differ according to the PSR method. More than half of the $r_{ij}$ are bounded at 1 or −1 in Table 7, but not in Table 6. Some $r_{ij}$ are zero in Tables 6–8. The ratios of $r_{ij} = 0$ for all of the developed models are provided in Table 9. The ratio was calculated as the total number of $r_{ij} = 0$ over the total number of estimated $r_{ij}$. When the number of independent variables increased, the number of $r_{ij} = 0$ also increased. For Abalone, 3,500 $r_{ij}$ (7 independent variables × 10 terms × 5 replications × 10 independent data sets) were estimated for each PSR model. Totals of 3,500, 3,000, and 30,000 $r_{ij}$ were estimated for Auto, Machine, and Triazines, respectively, based on the number of independent variables.
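The ratio reported in Table 9 is a plain count over all estimated exponent matrices. The sketch below shows the calculation on two made-up matrices; the function name and the toy values are assumptions for illustration only.

```python
import numpy as np

def zero_ratio(exponent_matrices):
    """Return (share of zero exponents, number of zero exponents) over all models."""
    zeros = sum(int(np.count_nonzero(np.asarray(R) == 0)) for R in exponent_matrices)
    total = sum(np.asarray(R).size for R in exponent_matrices)
    return zeros / total, zeros

# Two toy models with 2 terms x 3 variables each (made-up exponents).
models = [
    np.array([[0.3, 0.0, -0.1], [0.0, 0.0, 0.5]]),
    np.array([[0.2, 0.1, 0.0], [1.0, -1.0, 0.0]]),
]
ratio, n_zero = zero_ratio(models)

# The same arithmetic applied to the PSR1 Abalone entry of Table 9.
abalone_total = 7 * 10 * 5 * 10        # variables x terms x replications x data sets
psr1_abalone_ratio = 1029 / abalone_total
```

An exact comparison with zero is appropriate here because the repair rules and the two-decimal initialization produce exact zeros rather than merely small values.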

PSR1 and PSR3 estimated more $r_{ij}$ to be zero than PSR2, since their constraints are tighter. For Abalone and Triazines, PSR1 was the best and PSR2 the second best. Users can choose their model according to their specific purpose: minimum test MSE error (PSR1), robustness (PSR2), or a high ratio of $r_{ij} = 0$ (PSR1). All of these models were shown to be competitive with existing methods (i.e., OLS, MARS, and SVR).

### 4. Conclusion

In this paper, we proposed new PSRs to solve the nonlinear regression problem. The PSR methods estimate signomial functions using the PSO method. The PSR regression models offer the benefit of explicit function descriptions that represent the nonlinearity of the given data without kernel tricks.

Also, with the PSR regression models, sparse signomial functions can be obtained that serve both prediction and interpretation purposes. The present experimental results revealed that this approach can detect a sparse signomial nonlinear function; indeed, PSR may be considered a useful alternative when interpreting which original input variables are more important than others in an obtained regression model. The development of nonlinear sparse regression models using other metaheuristic optimization methods will be an important focus of future studies.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2017R1E1A1A01077375).

Fig. 1. PSO procedure.

Fig. 2. Test MSE errors: (a) overall test MSE errors in Table 5 and (b) a zoom-in of the most notable area of Figure 2(a).

Table 1. Pseudo-code for initialization.

Particle:

- Step 1: Set $r_{ij}^k = 0$ for all $i, j$. Set $i = 1$, $S = \{1, 2, 3, \ldots, n\}$.
- Step 2: Randomly select $q$ in $S$.
- Step 3: Generate a uniform $[-1+\sum_{\forall j}|r_{ij}^k|,\ 1-\sum_{\forall j}|r_{ij}^k|]$ random number, round it to two decimal places, and assign it to $r_{iq}^k$.
- Step 4: $S = S - \{q\}$. If $S \neq \emptyset$, go to Step 2.
- Step 5: If $i < m$, set $i = i + 1$ and $S = \{1, 2, 3, \ldots, n\}$, and go to Step 2.

Velocity:

- Generate a uniform [−0.01, 0.01] random number and assign it to $v_{ij}^k$ for all $i, j$.

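Table 2. Three infeasible-particle update rules.

| Method | Update rule |
|---|---|
| PSR1 | If $\sum_{j=1}^{n} \lvert r_{ij}^k \rvert > 1$, set $r_{pq}^k = 0$, where $(p, q) = \arg\min_{(p,q),\ \lvert r_{pq}^k \rvert > 0} \lvert r_{pq}^k \rvert$, until $\sum_{j=1}^{n} \lvert r_{ij}^k \rvert \le 1$. |
| PSR2 | If $r_{ij}^k > 1$, set $r_{ij}^k = 1$; if $r_{ij}^k < -1$, set $r_{ij}^k = -1$. |
| PSR3 | Add a penalty to the MSE when $\sum_{j=1}^{n} \lvert r_{ij}^k \rvert > 1$ for any $k$. |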

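Table 3. Parameters for each dataset.

| Dataset | Number of independent variables (n) | Number of independent runs | Observations (Train) | Observations (Valid) | Observations (Test) |
|---|---|---|---|---|---|
| Abalone | 7 | 10 | 2,507 | 835 | 304 |
| Auto | 7 | 10 | 304 | 101 | 101 |
| Machine | 6 | 10 | 125 | 42 | 42 |
| Triazines | 60 | 10 | 112 | 37 | 37 |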

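Table 4. Average MSEs and their standard deviations for each method and dataset.

| Method | Statistic | Abalone Train | Abalone Valid | Abalone Test | Auto Train | Auto Valid | Auto Test | Machine Train | Machine Valid | Machine Test | Triazines Train | Triazines Valid | Triazines Test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OLS | MSE | 4.90 | 4.93 | 5.11 | 10.24 | 11.82 | 12.57 | 2854 | 8388 | 6215 | 0.011 | 0.042 | 0.058 |
| OLS | SD | 0.10 | 0.29 | 0.41 | 0.51 | 1.63 | 1.57 | 846 | 6467 | 4488 | 0.001 | 0.009 | 0.043 |
| MARS | MSE | 4.45 | 4.53 | 4.66 | 6.97 | 8.72 | 9.36 | 1476 | 11212 | 10710 | 0.013 | 0.026 | 0.023 |
| MARS | SD | 0.09 | 0.23 | 0.28 | 0.55 | 0.95 | 1.93 | 446 | 12416 | 15231 | 0.002 | 0.005 | 0.005 |
| SVR | MSE | 4.28 | 4.46 | 4.67 | 4.11 | 7.50 | **7.19** | 1156 | 5600 | 5092 | 0.013 | 0.023 | **0.023** |
| SVR | SD | 0.13 | 0.26 | 0.36 | 0.46 | 1.36 | 1.24 | 250 | 4409 | 4976 | 0.001 | 0.006 | 0.006 |
| PSR1 | MSE | 4.42 | 4.38 | **4.62** | 6.77 | 8.05 | 8.93 | 1324 | 5625 | **4231** | 0.013 | 0.021 | **0.019** |
| PSR1 | SD | 0.12 | 0.22 | 0.31 | 0.43 | 1.05 | 1.75 | 243 | 4688 | 3795 | 0.001 | 0.004 | 0.004 |
| PSR2 | MSE | 4.36 | 4.38 | **4.63** | 6.42 | 7.88 | **8.19** | 680 | 3507 | **4656** | 0.012 | 0.020 | **0.020** |
| PSR2 | SD | 0.13 | 0.23 | 0.32 | 0.53 | 1.13 | 1.48 | 204 | 2585 | 4204 | 0.001 | 0.003 | 0.006 |
| PSR3 | MSE | 4.43 | 4.39 | **4.66** | 7.06 | 7.92 | **8.89** | 1676 | 4484 | **4039** | 0.015 | 0.022 | 0.026 |
| PSR3 | SD | 0.13 | 0.21 | 0.37 | 0.21 | 1.29 | 1.61 | 446 | 3550 | 2904 | 0.001 | 0.004 | 0.016 |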

Table 5. Test MSE error.

| Method | Abalone | Auto | Machine | Triazines | Average |
|---|---|---|---|---|---|
| OLS | 0.105856 | 0.748477 | 0.538925 | 1.969327 | 0.840646 |
| MARS | 0.008258 | 0.301676 | 1.651818 | 0.187167 | 0.537230 |
| SVR | 0.009596 | 0 | 0.260881 | 0.155416 | 0.106473 |
| PSR1 | 0 | 0.241871 | 0.047562 | 0 | 0.072358 |
| PSR2 | 0.002454 | 0.139036 | 0.152940 | 0.047548 | 0.085495 |
| PSR3 | 0.007278 | 0.237266 | 0 | 0.340737 | 0.146320 |

MSE error = (Method MSE − Best MSE) / Best MSE.

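Table 6. PSR1 model for first dataset of Abalone. Columns $j = 1, \ldots, 7$ give the exponent $r_{ij}$ of independent variable $j$.

| Term index (i) | Coefficient (βi) | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 |
|---|---|---|---|---|---|---|---|---|
| 1 | −15.5727 | 0.3 | 0.06 | 0.02 | 0.14 | 0.11 | 0.04 | −0.11 |
| 2 | 3.4747 | 0.15 | −0.03 | −0.17 | −0.2 | 0 | −0.1 | 0.12 |
| 3 | −4.6892 | 0.26 | −0.18 | 0.03 | 0.01 | −0.06 | 0.05 | −0.3 |
| 4 | −1.8090 | −0.1 | −0.29 | 0.02 | −0.05 | −0.12 | −0.07 | 0.01 |
| 5 | 15.6752 | −0.11 | −0.3 | 0.25 | −0.01 | 0.07 | −0.03 | −0.01 |
| 6 | −3.8564 | −0.03 | 0.27 | 0 | 0.07 | 0.05 | −0.33 | 0.07 |
| 7 | 7.0208 | 0 | −0.02 | 0 | −0.23 | 0.21 | 0.09 | −0.37 |
| 8 | 28.3323 | −0.11 | −0.02 | 0 | 0.49 | −0.27 | −0.01 | 0.04 |
| 9 | −6.8067 | −0.1 | −0.05 | 0.13 | 0.11 | 0.22 | −0.09 | −0.16 |
| 10 | −14.3795 | −0.37 | −0.29 | 0 | −0.01 | 0.02 | 0.2 | −0.01 |
| 0 | −3.4941 | - | - | - | - | - | - | - |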

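Table 7. PSR2 model for first dataset of Auto. Columns $j = 1, \ldots, 7$ give the exponent $r_{ij}$ of independent variable $j$.

| Term index (i) | Coefficient (βi) | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 5.64E+11 | −0.96 | −1 | −1 | −1 | −1 | −1 | −1 |
| 2 | −140.3110 | −0.29 | −0.61 | −1 | 1 | 1 | −1 | −0.59 |
| 3 | 0.08789 | −0.1 | −1 | −1 | 1 | 1 | 1 | 0 |
| 4 | −0.0291 | 1 | 0.44 | 0.73 | −0.41 | 1 | −0.36 | 0.8 |
| 5 | 4.31E-06 | 1 | −0.91 | 0.81 | 0.76 | 1 | 1 | −0.75 |
| 6 | 0.0851 | 0.79 | −0.7 | −1 | 1 | −0.61 | 1 | 0.85 |
| 7 | 3946841 | −1 | −0.5 | −0.55 | −1 | −1 | 1 | 0.88 |
| 8 | 0.0008 | 1 | 1 | 0.54 | 0.04 | −0.35 | −0.51 | 1 |
| 9 | −4714388 | −1 | −1 | −1 | −0.38 | −1 | 0.81 | 1 |
| 10 | 2.9175 | −0.42 | 1 | −0.45 | −1 | 0.99 | 1 | −1 |
| 0 | 8.5401 | - | - | - | - | - | - | - |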

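Table 8. PSR3 model for first dataset of Machine. Columns $j = 1, \ldots, 6$ give the exponent $r_{ij}$ of independent variable $j$.

| Term index (i) | Coefficient (βi) | j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 |
|---|---|---|---|---|---|---|---|
| 1 | −1145.7200 | −0.73 | 0 | 0 | 0.16 | 0 | 0.01 |
| 2 | −318.298 | −0.01 | −0.4 | −0.02 | 0.19 | 0.01 | 0.33 |
| 3 | 0.1497 | 0.83 | 0.02 | −0.01 | 0.02 | 0.02 | 0.05 |
| 4 | 0.2105 | 0.01 | 0.05 | −0.01 | −0.07 | 0.65 | 0.18 |
| 5 | 0.7012 | 0.16 | 0.71 | −0.01 | 0.02 | −0.01 | 0 |
| 6 | 1679.2390 | 0.05 | −0.17 | −0.22 | −0.23 | −0.02 | −0.29 |
| 7 | −1.8570 | 0.32 | 0.52 | −0.02 | 0.07 | 0.03 | −0.01 |
| 8 | 0.4133 | 0.47 | 0.06 | 0.32 | −0.01 | 0.03 | −0.06 |
| 9 | 307.1087 | −0.36 | −0.04 | −0.01 | 0.28 | 0.01 | 0.27 |
| 10 | −28.4306 | −0.23 | 0.23 | 0 | −0.25 | 0.1 | 0.08 |
| 0 | 26.2379 | - | - | - | - | - | - |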

Table 9. Ratio of $r_{ij} = 0$ (total number of $r_{ij} = 0$).

| Method | Abalone | Auto | Machine | Triazines |
|---|---|---|---|---|
| PSR1 | 0.294 (1,029) | 0.611 (2,140) | 0.527 (1,582) | 0.952 (28,571) |
| PSR2 | 0.012 (41) | 0.034 (115) | 0.018 (54) | 0.467 (14,009) |
| PSR3 | 0.119 (415) | 0.192 (671) | 0.113 (338) | 0.906 (27,181) |

### References

1. Drucker, H, Burges, CJ, Kaufman, L, Smola, AJ, and Vapnik, V (1996). Support vector regression machines. Advances in Neural Information Processing Systems. 9, 155-161.
2. Maranas, CD, and Floudas, CA (1997). Global optimization in generalized geometric programming. Computers & Chemical Engineering. 21, 351-369. https://doi.org/10.1016/S0098-1354(96)00282-7
3. Kennedy, J, and Eberhart, R (1995). Particle swarm optimization (PSO). Proceedings of IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942-1948. https://doi.org/10.1109/ICNN.1995.488968
4. Bratton, D, and Kennedy, J (2007). Defining a standard for particle swarm optimization. Proceedings of 2007 IEEE Swarm Intelligence Symposium, Honolulu, HI, pp. 120-127. https://doi.org/10.1109/SIS.2007.368035
5. Wang, KP, Huang, L, Zhou, CG, and Pang, W (2003). Particle swarm optimization for traveling salesman problem. Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), Xian, China, pp. 1583-1585. https://doi.org/10.1109/ICMLC.2003.1259748
6. Akjiratikarl, C, Yenradee, P, and Drake, PR (2007). PSO-based algorithm for home care worker scheduling in the UK. Computers & Industrial Engineering. 53, 559-583. https://doi.org/10.1016/j.cie.2007.06.002
7. Schwaab, M, Biscaia, EC, Monteiro, JL, and Pinto, JC (2008). Nonlinear parameter estimation through particle swarm optimization. Chemical Engineering Science. 63, 1542-1552. https://doi.org/10.1016/j.ces.2007.11.024
8. Bao, Y, Hu, Z, and Xiong, T (2013). A PSO and pattern search based memetic algorithm for SVMs parameters optimization. Neurocomputing. 117. https://doi.org/10.1016/j.neucom.2013.01.027
9. Wang, D, and Wu, X (2008). The optimization of SVM parameters based on PSO. Computer Applications. 28, 134-139.
10. Yuan, X, and Liu, A (2007). The research of SVM parameter selection based on PSO algorithm. Techniques of Automation and Application. 26, 5-8.

SeJoon Park received the Master's degree in Industrial Engineering from the University of Florida in 2005 and the Ph.D. degree from Oregon State University in 2012. Currently, he is an Assistant Professor in the Division of Energy Resources Engineering and Industrial Engineering at Kangwon National University. His research interests include big data analysis and optimization.

E-mail: sonmupsj@kangwon.ac.kr

NagYoon Song received his M.S. degree in Industrial and Management Engineering from Myongji University in 2016. His research interests include anomaly detection, explainable artificial intelligence (XAI), and architecture-search-based deep learning.

E-mail: nysong@mju.ac.kr

Wooyeon Yu is currently a professor of Industrial and Management Engineering at Myongji University. He received his Ph.D. degree in the Department of Industrial and Manufacturing Systems Engineering from Iowa State University. Before he joined Myongji University, he worked as a senior consultant at Samsung SDS. His primary research interests are in material handling, logistics and supply chain management.

E-mail: wyyuie@mju.ac.kr

Dohyun Kim received his M.S. and Ph.D. degrees in industrial engineering from KAIST, Korea, in 2002 and 2007, respectively. Currently, he is an Associate Professor with the Department of Industrial and Management Engineering, Myongji University. His research interests include statistical data mining, deep learning and graph data analysis.

E-mail: ftgog@mju.ac.kr

### Article

#### Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2019; 19(4): 307-314

Published online December 25, 2019 https://doi.org/10.5391/IJFIS.2019.19.4.307

## PSR: PSO-Based Signomial Regression Model

SeJoon Park1, NagYoon Song2, Wooyeon Yu2, and Dohyun Kim2

1Division of Energy Resources Engineering and Industrial Engineering, Kangwon National University, Chuncheon, Korea
2Department of Industrial and Management Engineering, Myongji University, Yongin, Korea

Correspondence to:Dohyun Kim (ftgog@mju.ac.kr)

Received: November 11, 2019; Revised: December 10, 2019; Accepted: December 12, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Abstract

Regression analysis can be used for predictive and descriptive purposes in a variety of business applications. However, the successive existing regression methods such as support vector regression (SVR) have the drawback that it is not easy to derive an explicit function description that expresses the nonlinear relationship between an output variable and input variables. To resolve this issue, developed in this article is a nonlinear regression algorithm using particle swarm optimization (PSO) which is PSR. The output variables of PSR allow to obtain the explicit function description of the output variable using input variables. Three PSRs are proposed based on infeasible-particle update rules. Their experimental results show that the proposed approach performs similarly to and slightly better than the existing methods regardless of the data sets, implying that it can be utilized as a useful alternative when obtaining the explicit function description of the output variable using input variables and interpreting which of the original input variables are more important than others in the obtained regression model.

Keywords: Signomial regression, Particle swarm optimization, Meta-heuristic optimization

### 1. Introduction

Supervised learning is one of the representative machine learning methods that attempt to reveal hidden relationships between input data and output data. The main problems of supervised learning are largely classification and regression. Especially, regression analysis can be used for predictive and descriptive purposes in a variety of business applications. For this reason, many regression methods have been developed, including ordinary least squares (OLS) and support vector regression (SVR).

OLS aims to estimate the unknown parameters in a linear regression model in order to minimize the sum of squared differences between the observed and predicted values. OLS, one of the simplest methods, has the advantage that the output variable can be expressed as a function expression of the input variables. However, to express the nonlinear relationship between the input variables and the output variable, the analyst must manually generate the necessary variables.

SVR, proposed originally by Vapnik et al. in 1997 [1], is one of the most popular supervised learning algorithms widely used in regression analysis. Using kernel tricks to efficiently perform nonlinear regression and implicitly map inputs to higher-dimensional space, SVR’s performance is very superior. However, there are some problems. First, it is difficult to obtain a clear function representation of the output variable using input variables. Therefore, it is not easy to understand the relationship between the output variable and the input variables. Second, the selection of many parameters is important to the performance of the SVR, directly affecting its generalization and regression efficiency.

In this paper, we propose an algorithm that combines the advantages of OLS and SVR using the particle swarm optimization (PSO) method. The chief advantages of this PSO-based algorithm are that it can first derive an explicit function description that expresses the nonlinear relationship between the output variable and the input variables without kernel tricks, and that, simultaneously, it helps to select the optimal parameters for the generalized regression model.

This paper is organized as follows. Section 2 explains PSO-based non-linear regression algorithm. Section 3 discusses the performance of the proposed method. Finally, Section 4 draws conclusions.

### 2.1 PSR Model

In this section, we present the signomial function. The signomial function f(x) is the sum of the monomial gi(x), which is the product of a number of positive variables (x1, …, xn) with exponents of real numbers (ri1, …, rin) [2].

$gi(x)=x1ri1x2ri2⋯xnrin,f(x)=∑i=1mβigi(x),$

where the coefficients βi is the real number.

Our goal is to obtain the function f(x) by estimating the parameters (βi and ri1, ri2, …, rin) using PSO.

### 2.2 Particle Swarm Optimization

In this research, the basic PSO proposed by Kennedy and Eberhart [3] was applied. In 2007, many implementations of the basic PSO were cataloged and defined by Bratton and Kennedy [4]. PSO has been applied to many optimization problems such as the traveling salesman problem [5] and scheduling problems [6]. PSO is also well known as a metaheuristic algorithm that can be utilized to optimize continuous nonlinear problems [7]. This is why PSO also has been widely used to select optimal parameters of support vector machine (SVM) for classification and regression [810]. In this paper, we will present a new, PSO-based supervised learning method for regression problems. To the best of our knowledge, there have been no previous studies on regression problems using PSO.

PSO is an iterative procedure maintaining multiple “particles”( solutions) and their “directions and speeds,” which are referred to simply as particles and velocities, respectively. In this research, PSO was used to estimate rij for all i, j in Eq. (1). The objective of PSO is to find the estimated rij and the coefficients in Eq. (1) in order to minimize the mean square error (MSE) of Eq. (1). First, the particle and velocity is defined as below:

$Particle: Rk=(r11k,r12k,…,rmnk),Velocity: Vk=(v11k,v12k,…,vmnk),$

where k is the particle index. The overall PSO procedure is schematized in Figure 1.

In the beginning, multiple particles and their velocities are randomly generated. The initial $rijk$ is between −1 and 1, and $∑j=1n|rijk|≤1$ for ∀ k. The initial $vijk$ is between −0.01 and 0.01. Table 1 shows how to initialize the k th particle and velocity. All $rijk$ are initialized from i = 1 to i = m term by term. In each term the independent variable index, j, is randomly selected one by one. In each selection, a uniform $[-1+∑∀j|rijk|,1-∑∀j|rijk|]$ random variable is generated and assigned to $rijk$. This allows $∑j=1n|rijk|$ to be less than or equal to one.

In each iteration, each particle and each velocity is updated based on the current particle, current velocity, local best particle, and global best particle. If $Rkt$ and $Vkt$ are the particle and velocity for the tth iteration, each particle is randomly changed by Eq. (3).

$Vkt+1=Vkt+2·R1·(global best particle-Rkt)+2·R2·(local best particle-Rkt),Rkt+1=Rkt+Vkt+1,$

where, R1 and R2 are uniform [0, 1] random variables. The local best particle is the best particle (solution) over all iterations for particle k, and the global best particle is the best particle (solution) over all iterations for all particles. The local and global best particles are updated when the coefficients and MSE are updated in Figure 1. The coefficients in Eq. (1) and its MSE are determined for each particle by the least square method. As shown in Figure 1, if the best global particle is not changed over TR iterations, the PSO procedure is terminated. This is the termination rule of the PSO procedure.

### 2.3 Three Particle and Velocity Update Methods

When a particle and its velocity are updated, the particle can obey constraint $∑j=1n|rijk|≤1$ (Figure 1). Three different update methods are proposed in this research, namely PSR1, PSR2, and PSR3, as shown in Table 2. In PSR1, if $∑j=1n|rijk|>1$, set $rijk=0$ one by one until $∑j=1n|rijk|≤1$. A smaller $|rijk|$ is selected, and set to zero, in order to select a less important $rijk$. In PSR 2, the constraint is relaxed as each $|rijk|≤1$. Therefore, if $|rijk|>1$, set $rijk$ to 1 or −1. In PSR3, PSO is able to keep infeasible particles (solutions) such as $|rijk|>1$. However, this method adds sufficient penalty to the associated MSE to prevent selection of an infeasible particle as the local or global best solution. In the next section, the results of the PSRs and other approaches are compared.

### 3.1 Experimental Design

In this section, the method by which the regression models are determined by the PSRs is explained, and then their respective experimental results are compared with those of OLS, multivariate adaptive regression splines (MARS), and SVR. In order to compare OLS, MARS, SVR, and the three PSRs, the four datasets shown in Table 3 were used. These datasets are commonly used in the machine learning field. Each dataset has 10 sets of training, validation, and test data, which are used for model training, model validation, and model testing, respectively. For objectivity, various types of data were tested. Abalone had a larger number of observations than the other datasets, and Triazines had a larger number of independent variables.

A PSR, with its random search characteristics, provides different regression models over multiple runs with the same data. In order to find better regression models by the PSRs, five replication runs were performed for each dataset. Five replication models were determined by each PSR with the training data. Then, the model providing the minimum MSE on the validation data was selected as the regression model for the PSR. Finally, the selected model was tested on the test data. Ten terms (m = 10) were used for the PSR models in Eq. (1). The other parameters were pre-tested with the data indicated in Table 3. Three numbers of particles were examined: 100, 200, and 300. The termination rule (TR) was also tested with 100, 200, and 300 iterations. Two hundred particles and TR = 200 were selected for the comparison experiment, since increasing them did not improve the overall MSEs. R (version 3.4.0) was used for the PSR procedures, which were performed on an Intel Core i5-4590 CPU @ 3.30 GHz with 4 GB RAM.
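The replication-and-selection step can be sketched as follows. The helper `select_by_validation` and its two callables are hypothetical placeholders for "train one PSR replication" and "compute the MSE on the validation data"; the validation MSE values in the usage example are made up purely to exercise the function.

```python
def select_by_validation(fit, valid_mse, n_replications=5):
    """Train one model per replication and keep the one with the lowest validation MSE.

    fit(seed)        -> a trained model (hypothetical training routine)
    valid_mse(model) -> MSE of that model on the validation data
    """
    models = [fit(seed) for seed in range(n_replications)]
    return min(models, key=valid_mse)


# Toy usage: each "model" is just its seed, with made-up validation MSEs.
toy_valid_mse = {0: 8.4, 1: 7.9, 2: 8.8, 3: 8.1, 4: 8.0}
chosen = select_by_validation(lambda seed: seed, lambda model: toy_valid_mse[model])
print(chosen)
```

Only the selected model is then evaluated on the test data, so the test set plays no part in choosing among replications.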

### 3.2 Experimental Results

Table 4 compares the average MSEs and their average standard deviations over ten independent runs. For each method, the first row shows the average MSE, and the second row shows the average standard deviation. The bold values in Table 4 are the three best test results among the six methods. The three PSRs provided better test results than the other methods for the Abalone and Machine datasets. PSR1 and PSR2 provided better average test MSEs for Abalone, Machine, and Triazines than OLS, MARS, or SVR. SVR provided the best average test MSE for the Auto data. These results show that the PSR methods provide competitive regression models.

Table 5 shows the test MSE error of each method relative to the best method, defined as (Method MSE − Best MSE)/Best MSE. PSR1 provided the best MSE for Abalone and Triazines, and PSR3 provided the best MSE for Machine. PSR1 was the best in terms of the average test MSE error. The second- and third-best methods were PSR2 and SVR, respectively. Tables 4 and 5 provide evidence of the competitiveness of the PSR regression models.
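As a small worked example of this definition, the sketch below recomputes the Auto column from the rounded test MSEs listed in Table 4. The variable names are illustrative, and because the inputs are rounded to two decimals, the results agree with Table 5 only approximately.

```python
# Rounded average test MSEs for the Auto dataset, read from Table 4.
auto_test_mse = {
    "OLS": 12.57, "MARS": 9.36, "SVR": 7.19,
    "PSR1": 8.93, "PSR2": 8.19, "PSR3": 8.89,
}

best = min(auto_test_mse.values())          # SVR has the smallest test MSE for Auto
mse_error = {name: (mse - best) / best for name, mse in auto_test_mse.items()}

for name, err in mse_error.items():
    print(f"{name}: {err:.4f}")
```

The best method on a dataset always has an error of zero, so the average over datasets in Table 5 measures how far a method is, on average, from the best result.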

Figure 2 represents the Table 5 data graphically. Figure 2(a) shows the overall test MSE errors in Table 5, and Figure 2(b) zooms in on the most notable area of Figure 2(a). Most of the methods provided smaller MSE errors for the Abalone dataset than for the other datasets. OLS and MARS provided larger average errors than the other methods. The average MSE errors of PSR1 and PSR2 were smaller than that of SVR. All of the MSE errors of PSR1 and PSR2 were less than 25%, whereas PSR3 exceeded this level only for Triazines (about 34% in Table 5). Most significantly, all of the MSE errors of PSR2 were less than 20%. These results show that PSR1 was the best method in terms of average test MSE error, while PSR2 was robust across all of the datasets.

The advantage of the PSR models is that they provide an explicit model description in terms of the independent variables. Tables 6–8 show PSR1, PSR2, and PSR3 model examples, respectively. With PSR1, the best validation MSE was obtained at the first replication for the first run of Abalone (Table 6); with PSR2, the best validation MSE was obtained at the third replication for the first run of Auto (Table 7); and with PSR3, the best validation MSE was obtained at the third replication for the first run of Machine (Table 8). Each of these PSR methods provided the minimum test MSE error among the three PSRs for its dataset (Table 5). No model for Triazines is provided herein, due to the large number of independent variables.

Eq. (4) shows the model in Table 6 in mathematical form.

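$\begin{aligned} y ={}& -3.4941-15.5727\cdot x_1^{0.3}\cdot x_2^{0.06}\cdot x_3^{0.02}\cdot x_4^{0.14}\cdot x_5^{0.11}\cdot x_6^{0.04}\cdot x_7^{-0.11} \\ &+3.4747\cdot x_1^{0.15}\cdot x_2^{-0.03}\cdot x_3^{-0.17}\cdot x_4^{-0.2}\cdot x_6^{-0.1}\cdot x_7^{0.12} \\ &-4.6892\cdot x_1^{0.26}\cdot x_2^{-0.18}\cdot x_3^{0.03}\cdot x_4^{0.01}\cdot x_5^{-0.06}\cdot x_6^{-0.05}\cdot x_7^{-0.3} \\ &-1.8090\cdot x_1^{-0.1}\cdot x_2^{-0.29}\cdot x_3^{0.02}\cdot x_4^{-0.05}\cdot x_5^{-0.12}\cdot x_6^{-0.07}\cdot x_7^{0.01} \\ &+15.6752\cdot x_1^{-0.11}\cdot x_2^{0.3}\cdot x_3^{0.25}\cdot x_4^{-0.01}\cdot x_5^{0.07}\cdot x_6^{-0.03}\cdot x_7^{-0.01} \\ &-3.8564\cdot x_1^{-0.03}\cdot x_2^{0.27}\cdot x_4^{0.07}\cdot x_5^{0.05}\cdot x_6^{-0.33}\cdot x_7^{0.07} \\ &+7.0280\cdot x_2^{-0.02}\cdot x_4^{-0.23}\cdot x_5^{0.21}\cdot x_6^{0.09}\cdot x_7^{-0.37} \\ &+28.3323\cdot x_1^{-0.11}\cdot x_2^{-0.02}\cdot x_4^{0.49}\cdot x_5^{-0.27}\cdot x_6^{-0.01}\cdot x_7^{0.04} \\ &-6.8067\cdot x_1^{-0.1}\cdot x_2^{-0.05}\cdot x_3^{0.13}\cdot x_4^{0.11}\cdot x_5^{0.22}\cdot x_6^{-0.09}\cdot x_7^{-0.16} \\ &-14.3795\cdot x_1^{-0.37}\cdot x_2^{-0.29}\cdot x_4^{-0.01}\cdot x_5^{0.02}\cdot x_6^{0.2}\cdot x_7^{-0.01}. \end{aligned}$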

Tables 6–8 present the models obtained by PSR1, PSR2, and PSR3, and Eq. (4) writes out one of them in mathematical form as an example.
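To illustrate how such a tabulated model is used for prediction, the following sketch evaluates a signomial of the form $y=\beta_0+\sum_i \beta_i\prod_j x_j^{r_{ij}}$. The function name `signomial_predict` is hypothetical, and for brevity only the intercept and the first two terms of Table 6 are included, so the output is not a prediction of the full PSR1 model. Inputs are assumed to be strictly positive, since fractional and negative exponents are otherwise undefined.

```python
import math


def signomial_predict(x, beta0, betas, exponents):
    """Evaluate beta0 + sum_i beta_i * prod_j x_j ** r_ij for one observation x."""
    total = beta0
    for beta, row in zip(betas, exponents):
        total += beta * math.prod(xj ** r for xj, r in zip(x, row))
    return total


beta0 = -3.4941
betas = [-15.5727, 3.4747]                              # terms 1 and 2 of Table 6
exponents = [
    [0.3, 0.06, 0.02, 0.14, 0.11, 0.04, -0.11],         # r_1j
    [0.15, -0.03, -0.17, -0.2, 0.0, -0.1, 0.12],        # r_2j (zero: x5 drops out)
]

x = [1.0] * 7                                           # every product equals 1 here
print(signomial_predict(x, beta0, betas, exponents))    # beta0 + beta1 + beta2
```

An exponent of zero removes the variable from that term, which is how sparsity in $r_{ij}$ translates into a simpler explicit model.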

The $r_{ij}$ in Tables 6–8 are all between −1 and 1. In Tables 6 and 8, every term satisfies $\sum_{j=1}^{n}|r_{ij}^k|\le 1$. The regression models differ based on the PSR method. More than half of the $r_{ij}$ are bounded at 1 or −1 in Table 7 but not in Table 6. Some $r_{ij}$ are zero in Tables 6–8. The ratios of $r_{ij}=0$ for all of the developed models are provided in Table 9. Each ratio was calculated as the total number of $r_{ij}=0$ divided by the total number of estimated $r_{ij}$. When the number of independent variables increased, the number of $r_{ij}=0$ also increased. For Abalone, 3,500 $r_{ij}$ (7 independent variables × 10 terms × 5 replications × 10 independent datasets) were estimated for each PSR model. Totals of 3,500, 3,000, and 30,000 $r_{ij}$ were estimated for Auto, Machine, and Triazines, respectively, based on the number of independent variables.
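The counts above follow directly from the experimental design, as the short sketch below shows. The helper names `n_estimated` and `zero_ratio` are illustrative; the defaults (10 terms, 5 replications, 10 runs) are taken from the text.

```python
def n_estimated(n_vars, m_terms=10, replications=5, runs=10):
    """Total number of exponents r_ij estimated for one PSR method on one dataset."""
    return n_vars * m_terms * replications * runs


def zero_ratio(particle):
    """Share of exponents that are exactly zero in one model (list of rows)."""
    values = [r for row in particle for r in row]
    return sum(1 for r in values if r == 0) / len(values)


print(n_estimated(7))                        # Abalone and Auto
print(n_estimated(6))                        # Machine
print(n_estimated(60))                       # Triazines
print(round(1029 / n_estimated(7), 3))       # PSR1 on Abalone, as in Table 9
```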

PSR1 and PSR3 estimated more $r_{ij}$ to be zero than PSR2, since their constraints are more restrictive. For Abalone and Triazines, PSR1 was the best and PSR2 the second best in terms of test MSE error (Table 5). Users can choose their model in consideration of minimum test MSE error (PSR1), robustness (PSR2), or ratio of $r_{ij}=0$ (PSR1), according to their specific purpose. All of these models were shown to be competitive with existing methods (i.e., OLS, MARS, and SVR).

### 4. Conclusion

In this paper, we proposed new PSRs to solve the nonlinear regression problem. The PSR methods construct signomial functions using the PSO method. The PSR regression models offer the benefit of explicit function descriptions that represent the nonlinearity of the given data without kernel tricks.

In addition, the PSR regression models yield sparse signomial functions that can be used for both prediction and interpretation. The present experimental results revealed that this approach can detect a sparse signomial nonlinear function; indeed, PSR may be considered a useful alternative when interpreting which original input variables are more important than others in an obtained regression model. The development of nonlinear sparse regression models using other metaheuristic optimization methods will be an important focus of future studies.

### Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2017R1E1A1A01077375).

Figure 1. PSO procedure.


Figure 2. Test MSE errors: (a) overall test MSE errors in Table 5 and (b) a zoom-in of the most notable area of Figure 2(a).


Table 1. Pseudo-code for initialization.

| | Pseudo-code |
|---|---|
| Particle | Step 1: Set $r_{ij}^k=0$ for ∀ i, j. Set i = 1, S = {1, 2, 3, …, n}. |
| | Step 2: Randomly select q in S. |
| | Step 3: Generate a uniform $[-1+\sum_{j}\lvert r_{ij}^k\rvert,\ 1-\sum_{j}\lvert r_{ij}^k\rvert]$ random variable, round it to two decimal places, and assign it to $r_{iq}^k$. |
| | Step 4: S = S − {q}. If S ≠ Ø, go to Step 2. |
| | Step 5: If i < m, set i = i + 1 and S = {1, 2, 3, …, n}, and go to Step 2. |
| Velocity | Generate a uniform [−0.01, 0.01] random variable and assign it to $v_{ij}^k$ for ∀ i, j. |

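Table 2. Three infeasible-particle update rules.

| Method | Update rule |
|---|---|
| PSR1 | If $\sum_{j=1}^{n}\lvert r_{ij}^k\rvert>1$, set $r_{pq}^k=0$, where $(p,q)=\arg\min_{(p,q),\,\lvert r_{pq}^k\rvert>0}\lvert r_{pq}^k\rvert$, until $\sum_{j=1}^{n}\lvert r_{ij}^k\rvert\le 1$. |
| PSR2 | If $r_{ij}^k>1$, set $r_{ij}^k=1$; if $r_{ij}^k<-1$, set $r_{ij}^k=-1$. |
| PSR3 | Add a penalty to the MSE when $\sum_{j=1}^{n}\lvert r_{ij}^k\rvert>1$ for any k. |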

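Table 3. Parameters for each dataset.

| Dataset | Number of independent variables (n) | Number of independent runs | Observations (Train) | Observations (Valid) | Observations (Test) |
|---|---|---|---|---|---|
| Abalone | 7 | 10 | 2,507 | 835 | 304 |
| Auto | 7 | 10 | 304 | 101 | 101 |
| Machine | 6 | 10 | 125 | 42 | 42 |
| Triazines | 60 | 10 | 112 | 37 | 37 |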

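Table 4. Average MSEs and their standard deviations for each method and dataset.

| Method | Statistic | Abalone Train | Abalone Valid | Abalone Test | Auto Train | Auto Valid | Auto Test | Machine Train | Machine Valid | Machine Test | Triazines Train | Triazines Valid | Triazines Test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OLS | MSE | 4.90 | 4.93 | 5.11 | 10.24 | 11.82 | 12.57 | 2854 | 8388 | 6215 | 0.011 | 0.042 | 0.058 |
| | SD | 0.10 | 0.29 | 0.41 | 0.51 | 1.63 | 1.57 | 846 | 6467 | 4488 | 0.001 | 0.009 | 0.043 |
| MARS | MSE | 4.45 | 4.53 | 4.66 | 6.97 | 8.72 | 9.36 | 1476 | 11212 | 10710 | 0.013 | 0.026 | 0.023 |
| | SD | 0.09 | 0.23 | 0.28 | 0.55 | 0.95 | 1.93 | 446 | 12416 | 15231 | 0.002 | 0.005 | 0.005 |
| SVR | MSE | 4.28 | 4.46 | 4.67 | 4.11 | 7.50 | 7.19 | 1156 | 5600 | 5092 | 0.013 | 0.023 | 0.023 |
| | SD | 0.13 | 0.26 | 0.36 | 0.46 | 1.36 | 1.24 | 250 | 4409 | 4976 | 0.001 | 0.006 | 0.006 |
| PSR1 | MSE | 4.42 | 4.38 | 4.62 | 6.77 | 8.05 | 8.93 | 1324 | 5625 | 4231 | 0.013 | 0.021 | 0.019 |
| | SD | 0.12 | 0.22 | 0.31 | 0.43 | 1.05 | 1.75 | 243 | 4688 | 3795 | 0.001 | 0.004 | 0.004 |
| PSR2 | MSE | 4.36 | 4.38 | 4.63 | 6.42 | 7.88 | 8.19 | 680 | 3507 | 4656 | 0.012 | 0.020 | 0.020 |
| | SD | 0.13 | 0.23 | 0.32 | 0.53 | 1.13 | 1.48 | 204 | 2585 | 4204 | 0.001 | 0.003 | 0.006 |
| PSR3 | MSE | 4.43 | 4.39 | 4.66 | 7.06 | 7.92 | 8.89 | 1676 | 4484 | 4039 | 0.015 | 0.022 | 0.026 |
| | SD | 0.13 | 0.21 | 0.37 | 0.21 | 1.29 | 1.61 | 446 | 3550 | 2904 | 0.001 | 0.004 | 0.016 |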

Table 5. Test MSE error.

| Method | Abalone | Auto | Machine | Triazines | Average |
|---|---|---|---|---|---|
| OLS | 0.105856 | 0.748477 | 0.538925 | 1.969327 | 0.840646 |
| MARS | 0.008258 | 0.301676 | 1.651818 | 0.187167 | 0.537230 |
| SVR | 0.009596 | 0 | 0.260881 | 0.155416 | 0.106473 |
| PSR1 | 0 | 0.241871 | 0.047562 | 0 | 0.072358 |
| PSR2 | 0.002454 | 0.139036 | 0.152940 | 0.047548 | 0.085495 |
| PSR3 | 0.007278 | 0.237266 | 0 | 0.340737 | 0.146320 |

MSE error = (Method MSE − Best MSE) / Best MSE.

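Table 6. PSR1 model for the first dataset of Abalone.

| Term index (i) | Coefficient ($\beta_i$) | $r_{i1}$ | $r_{i2}$ | $r_{i3}$ | $r_{i4}$ | $r_{i5}$ | $r_{i6}$ | $r_{i7}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | −15.5727 | 0.3 | 0.06 | 0.02 | 0.14 | 0.11 | 0.04 | −0.11 |
| 2 | 3.4747 | 0.15 | −0.03 | −0.17 | −0.2 | 0 | −0.1 | 0.12 |
| 3 | −4.6892 | 0.26 | −0.18 | 0.03 | 0.01 | −0.06 | 0.05 | −0.3 |
| 4 | −1.8090 | −0.1 | −0.29 | 0.02 | −0.05 | −0.12 | −0.07 | 0.01 |
| 5 | 15.6752 | −0.11 | −0.3 | 0.25 | −0.01 | 0.07 | −0.03 | −0.01 |
| 6 | −3.8564 | −0.03 | 0.27 | 0 | 0.07 | 0.05 | −0.33 | 0.07 |
| 7 | 7.0208 | 0 | −0.02 | 0 | −0.23 | 0.21 | 0.09 | −0.37 |
| 8 | 28.3323 | −0.11 | −0.02 | 0 | 0.49 | −0.27 | −0.01 | 0.04 |
| 9 | −6.8067 | −0.1 | −0.05 | 0.13 | 0.11 | 0.22 | −0.09 | −0.16 |
| 10 | −14.3795 | −0.37 | −0.29 | 0 | −0.01 | 0.02 | 0.2 | −0.01 |
| 0 | −3.4941 | - | - | - | - | - | - | - |

The column $r_{ij}$ gives the exponent of independent variable j in term i.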

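Table 7. PSR2 model for the first dataset of Auto.

| Term index (i) | Coefficient ($\beta_i$) | $r_{i1}$ | $r_{i2}$ | $r_{i3}$ | $r_{i4}$ | $r_{i5}$ | $r_{i6}$ | $r_{i7}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 5.64E+11 | −0.96 | −1 | −1 | −1 | −1 | −1 | −1 |
| 2 | −140.3110 | −0.29 | −0.61 | −1 | 1 | 1 | −1 | −0.59 |
| 3 | 0.08789 | −0.1 | −1 | −1 | 1 | 1 | 1 | 0 |
| 4 | −0.0291 | 1 | 0.44 | 0.73 | −0.41 | 1 | −0.36 | 0.8 |
| 5 | 4.31E-06 | 1 | −0.91 | 0.81 | 0.76 | 1 | 1 | −0.75 |
| 6 | 0.0851 | 0.79 | −0.7 | −1 | 1 | −0.61 | 1 | 0.85 |
| 7 | 3946841 | −1 | −0.5 | −0.55 | −1 | −1 | 1 | 0.88 |
| 8 | 0.0008 | 1 | 1 | 0.54 | 0.04 | −0.35 | −0.51 | 1 |
| 9 | −4714388 | −1 | −1 | −1 | −0.38 | −1 | 0.81 | 1 |
| 10 | 2.9175 | −0.42 | 1 | −0.45 | −1 | 0.99 | 1 | −1 |
| 0 | 8.5401 | - | - | - | - | - | - | - |

The column $r_{ij}$ gives the exponent of independent variable j in term i.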

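Table 8. PSR3 model for the first dataset of Machine.

| Term index (i) | Coefficient ($\beta_i$) | $r_{i1}$ | $r_{i2}$ | $r_{i3}$ | $r_{i4}$ | $r_{i5}$ | $r_{i6}$ |
|---|---|---|---|---|---|---|---|
| 1 | −1145.7200 | −0.73 | 0 | 0 | 0.16 | 0 | 0.01 |
| 2 | −318.298 | −0.01 | −0.4 | −0.02 | 0.19 | 0.01 | 0.33 |
| 3 | 0.1497 | 0.83 | 0.02 | −0.01 | 0.02 | 0.02 | 0.05 |
| 4 | 0.2105 | 0.01 | 0.05 | −0.01 | −0.07 | 0.65 | 0.18 |
| 5 | 0.7012 | 0.16 | 0.71 | −0.01 | 0.02 | −0.01 | 0 |
| 6 | 1679.2390 | 0.05 | −0.17 | −0.22 | −0.23 | −0.02 | −0.29 |
| 7 | −1.8570 | 0.32 | 0.52 | −0.02 | 0.07 | 0.03 | −0.01 |
| 8 | 0.4133 | 0.47 | 0.06 | 0.32 | −0.01 | 0.03 | −0.06 |
| 9 | 307.1087 | −0.36 | −0.04 | −0.01 | 0.28 | 0.01 | 0.27 |
| 10 | −28.4306 | −0.23 | 0.23 | 0 | −0.25 | 0.1 | 0.08 |
| 0 | 26.2379 | - | - | - | - | - | - |

The column $r_{ij}$ gives the exponent of independent variable j in term i.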

Table 9. Ratio of $r_{ij}=0$ (total number of $r_{ij}=0$).

| Method | Abalone | Auto | Machine | Triazines |
|---|---|---|---|---|
| PSR1 | 0.294 (1,029) | 0.611 (2,140) | 0.527 (1,582) | 0.952 (28,571) |
| PSR2 | 0.012 (41) | 0.034 (115) | 0.018 (54) | 0.467 (14,009) |
| PSR3 | 0.119 (415) | 0.192 (671) | 0.113 (338) | 0.906 (27,181) |

### References

1. Drucker, H, Burges, CJ, Kaufman, L, Smola, AJ, and Vapnik, V (1996). Support vector regression machines. Advances in Neural Information Processing Systems. 9, 155-161.
2. Maranas, CD, and Floudas, CA (1997). Global optimization in generalized geometric programming. Computers & Chemical Engineering. 21, 351-369. https://doi.org/10.1016/S0098-1354(96)00282-7
3. Kennedy, J, and Eberhart, R (1995). Particle swarm optimization (PSO). Proceedings of IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942-1948. https://doi.org/10.1109/ICNN.1995.488968
4. Bratton, D, and Kennedy, J (2007). Defining a standard for particle swarm optimization. Proceedings of 2007 IEEE Swarm Intelligence Symposium, Honolulu, HI, pp. 120-127. https://doi.org/10.1109/SIS.2007.368035
5. Wang, KP, Huang, L, Zhou, CG, and Pang, W (2003). Particle swarm optimization for traveling salesman problem. Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), Xi'an, China, pp. 1583-1585. https://doi.org/10.1109/ICMLC.2003.1259748
6. Akjiratikarl, C, Yenradee, P, and Drake, PR (2007). PSO-based algorithm for home care worker scheduling in the UK. Computers & Industrial Engineering. 53, 559-583. https://doi.org/10.1016/j.cie.2007.06.002
7. Schwaab, M, Biscaia, EC, Monteiro, JL, and Pinto, JC (2008). Nonlinear parameter estimation through particle swarm optimization. Chemical Engineering Science. 63, 1542-1552. https://doi.org/10.1016/j.ces.2007.11.024
8. Bao, Y, Hu, Z, and Xiong, T (2013). A PSO and pattern search based memetic algorithm for SVMs parameters optimization. Neurocomputing. 117. https://doi.org/10.1016/j.neucom.2013.01.027
9. Wang, D, and Wu, X (2008). The Optimization of SVM Parameters Based on PSO. Computer Applications. 28, 134-139.
10. Yuan, X, and Liu, A (2007). The research of SVM parameter selection based on PSO algorithm. Techniques of Automation and Application. 26, 5-8.