Original Article

Int. J. Fuzzy Log. Intell. Syst. 2018; 18(2): 111-119

Published online June 25, 2018

https://doi.org/10.5391/IJFIS.2018.18.2.111

© The Korean Institute of Intelligent Systems

Building a HOG Descriptor Model of Pedestrian Images Using GA and GP Learning

Youngwan Cho, and Kisung Seo

Department of Computer Engineering, Seokyeong University, Seoul, Korea

Correspondence to: Kisung Seo (ksseo@skuniv.ac.kr)

Received: May 17, 2018; Revised: June 9, 2018; Accepted: June 18, 2018

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Detecting pedestrians from image features generally requires a reference model against which input images are matched. In approaches that use the histogram of oriented gradients (HOG) as the feature of the pedestrian model, the reference model has usually been trained with a support vector machine (SVM) or an AdaBoost cascade. In this paper, we propose a new approach for matching the HOG features of input images against a reference model and for learning the structure and parameters of that model. The proposed Gaussian scoring method evaluates the degree of feature coincidence on HOG maps that are separated by the angle of the HOG vector. We also propose two approaches for learning the reference model: genetic algorithm (GA) based learning and genetic programming (GP) based learning. The GA and GP are used to search for the best parameters of the gene and for the nonlinear function representing the feature map of the pedestrian model, respectively. We performed experiments on the INRIA person dataset to verify the accuracy and processing time of the proposed method.

Keywords: HOG model, Genetic algorithm, Genetic programming, Learning, Pedestrian detection

1. Introduction

Machine vision has attracted much attention in recent decades because it is important for application areas such as surveillance, car safety, and robotics. Recently, machine vision research has been actively applied to smart vehicles for driving safety and autonomous driving, and this progress has created a growing need for pedestrian detection. This study proposes an approach to pedestrian detection with enhanced accuracy and reduced processing time.

In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. Benenson et al. [1] categorized the main paradigms for pedestrian detection into Viola-Jones variants, HOG+SVM rigid templates, deformable part detectors (DPM), and convolutional neural networks. Viola and Jones [2] recognized objects by combining Haar wavelet-like features; instead of a single classifier, they used a cascade of several classifiers. Viola et al. [3] also built progressively stronger classifiers through boosting for detector training. Dollar et al. [4] noted that the ideas of Viola et al. [3] have served as a foundation for modern detectors. YOLO [6] and Faster R-CNN [7], which apply real-time deep learning techniques, are currently considered the state of the art.

Dalal and Triggs [8] introduced the HOG, which builds a histogram by accumulating gradient magnitudes over gradient angles and recognizes the shape of an object from it. They also used an SVM as the learning machine that applies the HOG features to pedestrian detection. The DPM of Felzenszwalb et al. [9] used the method of Dalal and Triggs [8] as a building block. Zhu et al. [11] sped up the method of Dalal and Triggs [8] by applying the cascade technique to HOG features. The conventional HOG recognizes an object with a model of fixed size, whereas the cascade HOG varies the size of the model and selects the best of the resulting models through boosting.

Since their introduction, variants of HOG features have proliferated, and nearly all modern detectors use them in some form. Maji et al. [12] considered fast approximations of non-linear kernels. Dollar et al. [5] recognized pedestrians by using several channels such as color and edge, unlike the HOG, which uses a gray channel. Felzenszwalb et al. [10] generated a whole model and part models of the pedestrian, determined the similarity to each, and then judged whether the candidate is a pedestrian. Most studies using the HOG, such as those of Dalal and Triggs [8] and Felzenszwalb et al. [10], take a learning-based approach in which discriminative features are extracted from each candidate and then passed to a learning machine or classifier such as an SVM. In this paper, we propose a new approach for matching and training a pedestrian model using the HOG features.

2. HOG Descriptor Model of Pedestrian Images

2.1 Pedestrian Detection

In this paper, pedestrians are detected through a vision sensor by matching against a pedestrian HOG feature model. The detector compares the features of the input image with the pedestrian HOG feature model and judges from their similarity whether the image contains a pedestrian. To measure the similarity, we propose a scoring method based on the Gaussian function. The scoring method separates the HOG features by angle, accumulates the scores calculated at each angle, and obtains the final score.

If the pedestrian HOG feature model is built from a single pedestrian image, the detector may respond only to that specific pedestrian because of characteristics such as clothing, hairstyle, and posture. The model therefore has to be trained to express generalized pedestrian characteristics. We learn the HOG feature model using a genetic algorithm. In addition, we use genetic programming to learn the nonlinear function that calculates the final score used to decide whether a pedestrian is detected.

2.2 Gaussian Scoring

In this research, we divide the HOG feature values, whose angles range from 0° to 360°, by angle and determine the similarity between the model and an input image from the feature values in each angle division. We propose the Gaussian scoring of Eq. (1) to measure the degree of similarity for each angle.

$$\mathrm{Score}(I, M) = e^{-\frac{(I - M)^2}{2(kM)^2}}, \tag{1}$$

where $I$ and $M$ represent the HOG values of an input image and of the model, respectively. Because the spread $kM$ of the Gaussian grows with the model value, the constant $k$ lets larger HOG values score relatively higher than smaller HOG values for the same absolute deviation. The final scoring function for an input image cell is given in Eq. (2).

$$\mathrm{anglescore}(x, y, l) = \frac{\sum_{(i,j)} \mathrm{Score}\big(I(x+i, y+j),\, M(i, j)\big)}{w \cdot h}, \tag{2a}$$

$$\mathrm{Cellscore}(x, y) = \frac{\sum_{l} \mathrm{anglescore}(x, y, l)}{N_l}, \tag{2b}$$

$$\text{if } (\mathrm{Cellscore} > \mathrm{Threshold}) \;\; \text{PedestrianDetect}, \tag{2c}$$

where $w$, $h$, and $N_l$ represent the width and height of the model and the total number of angle divisions, respectively, and $I$ and $M$ in Eq. (2a) are taken on the HOG maps of angle division $l$.
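Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the array layout `(N_l, h, w)`, the default `k = 0.5`, the guard for model values equal to zero, and the function names are all assumptions introduced here.

```python
import numpy as np

def gaussian_score(i_val, m_val, k=0.5):
    """Eq. (1): exp(-(I - M)^2 / (2 (k M)^2)), evaluated elementwise."""
    i_val = np.asarray(i_val, dtype=float)
    m_val = np.asarray(m_val, dtype=float)
    # Assumption: a model value of 0 is replaced by a tiny epsilon so that
    # the division stays finite (the paper does not discuss this case).
    sigma = k * np.maximum(np.abs(m_val), 1e-12)
    return np.exp(-((i_val - m_val) ** 2) / (2.0 * sigma ** 2))

def cell_score(window, model, k=0.5):
    """window, model: arrays of shape (N_l, h, w), one HOG map per angle division.

    Returns (cell score, array of N_l angle scores).
    """
    scores = gaussian_score(window, model, k)   # shape (N_l, h, w)
    angle_scores = scores.mean(axis=(1, 2))     # Eq. (2a): sum over (i, j) / (w * h)
    return angle_scores.mean(), angle_scores    # Eq. (2b): sum over l / N_l

# Usage with synthetic data: a window identical to the model scores 1.
rng = np.random.default_rng(0)
model = rng.random((18, 16, 8)) + 0.1
score, per_angle = cell_score(model, model)
threshold = 0.5                                 # hypothetical value for Eq. (2c)
is_pedestrian = score > threshold
```

Averaging with `mean` is the same as dividing the sums in Eq. (2) by $w \cdot h$ and $N_l$, which keeps the sketch short.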

Applying the scoring-based decision to pedestrian detection requires a large amount of computation, because the exponential function is evaluated for every cell. A lookup table is therefore used to speed up the calculation. The Gaussian scoring function of Eq. (1) can be expressed in terms of the ratio $j$ (not to be confused with the pixel index $j$ of Eq. (2)) as given in Eq. (3).

$$\mathrm{Score}(I, M) = e^{-\frac{(j-1)^2}{2k^2}}, \qquad j = \frac{I}{M}. \tag{3}$$
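The step from Eq. (1) to Eq. (3) is not written out in the text. Assuming $M \neq 0$, it follows from factoring $M^2$ out of the numerator of the exponent:

```latex
\frac{(I - M)^2}{2(kM)^2}
  = \frac{M^2\left(\frac{I}{M} - 1\right)^2}{2k^2M^2}
  = \frac{(j - 1)^2}{2k^2},
  \qquad j = \frac{I}{M}.
```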

As Eq. (3) shows, the value of the Gaussian function is determined solely by the ratio of the HOG value of the input image to that of the reference model, so the function values can be tabulated against the ratio in a lookup table. In addition, as the ratio $j$ becomes very small or very large, the score approaches zero. A score near zero means that the features of the model and the input image are dissimilar, so we treat the score in these cases as 0 to speed up the calculation. The proposed Gaussian scoring is calculated from the ratio $j$ as shown in Eq. (4).

$$\mathrm{Score}(j) = \begin{cases} e^{-\frac{(j-1)^2}{2k^2}}, & \text{if } \frac{1}{n} \le j \le \frac{2n-1}{n}, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

The scoring function of Eq. (4) can be approximated by a lookup table built at discrete values of $j$ together with linear interpolation, as given in Eq. (5),

$$\mathrm{Score}(j) = \mathrm{GauLUT}(j_{index}) + \mathrm{GauLUT}(j_{index}+1) \times (j - j^{*}), \tag{5}$$

where GauLUT represents the Gaussian lookup table array, $j_{index}$ represents the array index corresponding to the ratio value, and $j^{*}$ represents the $j$ value corresponding to $j_{index}$.
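The lookup-table idea of Eqs. (4) and (5) can be sketched as follows. The table size, the values `k = 0.5` and `n = 4`, the handling of a zero model value, and the function names are assumptions made for illustration. The interpolation below is the textbook linear form, which weights the difference between neighboring table entries and normalizes by the grid step; this is one reading of how Eq. (5) is meant to be applied, not a transcription of it.

```python
import math

def build_gauss_lut(k=0.5, n=4, size=257):
    """Sample exp(-(j - 1)^2 / (2 k^2)) on a uniform grid over [1/n, (2n - 1)/n]."""
    lo, hi = 1.0 / n, (2.0 * n - 1.0) / n
    step = (hi - lo) / (size - 1)
    table = [math.exp(-((lo + t * step - 1.0) ** 2) / (2.0 * k * k))
             for t in range(size)]
    return table, lo, hi, step

def lut_score(i_val, m_val, lut):
    """Approximate Gaussian score for input value i_val and model value m_val."""
    table, lo, hi, step = lut
    if m_val <= 0.0:
        return 0.0                      # assumption: an empty model bin scores 0
    j = i_val / m_val                   # ratio of Eq. (3)
    if j < lo or j > hi:
        return 0.0                      # Eq. (4): outside the window the score is 0
    pos = (j - lo) / step               # fractional table position
    idx = min(int(pos), len(table) - 2)
    frac = pos - idx                    # normalized distance from the grid point j*
    return table[idx] + (table[idx + 1] - table[idx]) * frac

lut = build_gauss_lut()
approx = lut_score(1.3, 1.0, lut)
exact = math.exp(-((1.3 - 1.0) ** 2) / (2.0 * 0.5 ** 2))
```

The table is built once, so each score costs one division, one index computation, and one multiply-add instead of a call to the exponential function.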

3.1 Learning of the Model by using GA

GA was first developed by John Holland in 1975 as an intelligent algorithm that solves optimization problems based on Darwin's theory of evolution. It mimics biological evolution and searches for the best genes through crossover, mutation, and selection operations. Many studies have reported that GA and GP provide an efficient and robust alternative for solving complex and highly nonlinear optimization problems through global search. Because establishing and training the reference pedestrian model is a highly nonlinear optimization problem, the GA and GP models and their optimization procedures can be applied to it.

This paper proposes a HOG model training method using GA and applies the model to the detection of general pedestrian characteristics. The training images are taken from the INRIA person dataset: the pedestrian HOG model is trained using 1,216 pedestrian images and 1,218 background images. The genetic configuration of the pedestrian model is shown in Figure 1.

To apply GA to learning the HOG pedestrian detection model, the model image and training images have to be represented as genes. A pedestrian model image is divided into cells as shown in Figure 1. A gene, denoted by P in Figure 1, consists of chromosomes (G), one for each cell of the image.

GA needs to evaluate the fitness of a gene to learn the parameters of the model. To construct the fitness function, we define the average scores of the pedestrian images and of the non-pedestrian images. Each average is obtained by averaging the Gaussian scores of the whole images of the respective class against the GA HOG model, as shown in Figure 2 and Eq. (6).

$$S_P = \frac{1}{N_P}\sum_{i=1}^{N_P} \mathrm{Score}(PD_i, \mathrm{GAModel}), \qquad S_N = \frac{1}{N_N}\sum_{i=1}^{N_N} \mathrm{Score}(ND_i, \mathrm{GAModel}). \tag{6}$$

In Eq. (6), $S_P$ and $S_N$ represent the average scores of the pedestrian images and the non-pedestrian images, $N_P$ and $N_N$ represent the numbers of positive and negative data, and $PD_i$ and $ND_i$ represent the $i$th pedestrian and non-pedestrian image, respectively. We can regard $S_P$ and $S_N$ as measures of how well the GA model matches the pedestrian images and the non-pedestrian images, respectively.

A high value of $S_P$ means that the corresponding GA model represents the general pedestrian characteristics well. A high $S_N$, on the other hand, means that the model can hardly distinguish pedestrian from non-pedestrian characteristics. Therefore, the larger the difference between $S_P$ and $S_N$, the easier it is to distinguish general pedestrian from non-pedestrian characteristics, which is advantageous for detecting pedestrians. However, if the variances of the pedestrian and non-pedestrian scores become larger, the two classes become difficult to separate. Figure 3 shows how the pedestrian detection rate changes with the score variance.

In the graphs of Figure 3, the values of $S_P$ and $S_N$ are 6 and 4, respectively. If pedestrians are recognized based on the red solid lines, the detection rate is 100% in Figure 3(a) but only 28.5% in Figure 3(b). Therefore, to train a GA HOG model that performs well, it is advantageous to reflect the variances of the pedestrian and non-pedestrian scores in the evaluation function. We thus derived the fitness function of Eq. (7) to evaluate the GA HOG model.

$$\mathrm{Fitness} = s\,w_1\frac{S_P}{S_N} + w_2(S_P - S_N) - w_3(v_P + v_N), \qquad s = \begin{cases} 0, & \text{if } S_P < \mathrm{Threshold}, \\ 1, & \text{otherwise}, \end{cases} \tag{7}$$

where $w_1$, $w_2$, and $w_3$ represent weights, and $v_P$ and $v_N$ represent the variances of the scores for pedestrian images and non-pedestrian images, respectively. In Eq. (7), the term $S_P - S_N$ is included because the larger the difference between $S_P$ and $S_N$, the better the pedestrian model. The term $v_P + v_N$ is subtracted because, as explained above, a model with smaller variance separates pedestrian and non-pedestrian images more easily. The term $S_P/S_N$, the ratio of the score on the positive data to the score on the negative data, is included because it is another expression of how differently the model responds to positive and negative data.
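A minimal sketch of the fitness evaluation of Eqs. (6) and (7) is given below. It assumes that the per-image scores against a candidate GA model have already been computed; the weight values, the threshold on $S_P$, the guard against a zero $S_N$, and the function name are placeholders chosen for illustration, since the text does not report them. The switch $s$ is applied to the ratio term only, following Eq. (7) as printed.

```python
import numpy as np

def ga_fitness(pos_scores, neg_scores, w1=1.0, w2=1.0, w3=1.0, sp_threshold=0.3):
    """Fitness of one candidate model from its per-image scores."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    s_p, s_n = pos.mean(), neg.mean()        # Eq. (6): average scores
    v_p, v_n = pos.var(), neg.var()          # score variances
    s = 0.0 if s_p < sp_threshold else 1.0   # switch s of Eq. (7)
    ratio = s_p / max(s_n, 1e-12)            # assumption: guard against S_N = 0
    return s * w1 * ratio + w2 * (s_p - s_n) - w3 * (v_p + v_n)

# Same averages (0.8 and 0.2), different spread: the tighter model ranks higher.
tight = ga_fitness([0.8, 0.8], [0.2, 0.2])
spread = ga_fitness([0.6, 1.0], [0.0, 0.4])
```

In a GA loop this function would be called once per individual in each generation, with the scores produced by Gaussian scoring of the training images against that individual's model.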

3.2 Learning of the Model by using GP

The overall flow of the GP algorithm [13] is similar to that of GA, but the structure of the genes differs. GA has a fixed structure because its genes are bit strings or real-number strings, whereas GP has a variable structure because its genes are expressed as trees. Because of this variable genetic structure, GP has been widely applied to practical optimization tasks such as searching for complex nonlinear functions or designing the system model itself.

The cell score of Eq. (2b) is the average of the angle scores of Eq. (2a). Since we divide the angle range into divisions of 20° in this paper, there are 18 angle scores for each image. In Eq. (2b), all angle scores carry the same weight. However, some angle scores may fit the general pedestrian characteristics better than others.

We can train the cell score function of Eq. (2b) using GP so that the weight of such angle scores increases. Cell score learning uses 19 terminals in total: the 18 angle scores from angle score 0° to angle score 340° and a constant between 0 and 1. The GP function set consists of 11 items: +, |−|, *, /, |sin|, |cos|, avg, Wf1, Wf2, min, and max. The terminals and functions of GP are summarized in Table 1. The weighted-sum functions Wf1 and Wf2 are defined in Eq. (8), where Tree0 and Tree1 denote the two argument subtrees and $\alpha$ is the weighting coefficient.

$$W_{f1} = \alpha \times \mathrm{Tree}_0 + (1 - \alpha) \times \mathrm{Tree}_1, \qquad W_{f2} = (1 - \alpha) \times \mathrm{Tree}_0 + \alpha \times \mathrm{Tree}_1. \tag{8}$$
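The function and terminal sets described above can be illustrated with a small tree evaluator. This is a sketch under stated assumptions, not the authors' GP system: the nested-tuple tree encoding, the value `ALPHA = 0.7`, and the protected division (returning 1 when the divisor is near zero) are choices made here for illustration; the text does not give the value of the weighting coefficient or say how division by zero is handled.

```python
import math

ALPHA = 0.7  # hypothetical weighting coefficient for Wf1 and Wf2

def protected_div(a, b):
    # Assumption: a common GP convention, not stated in the paper.
    return a / b if abs(b) > 1e-9 else 1.0

# name -> (arity, implementation); 11 functions as listed in the text
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "absdiff": (2, lambda a, b: abs(a - b)),
    "mul": (2, lambda a, b: a * b),
    "div": (2, protected_div),
    "abssin": (1, lambda a: abs(math.sin(a))),
    "abscos": (1, lambda a: abs(math.cos(a))),
    "avg": (2, lambda a, b: 0.5 * (a + b)),
    "wf1": (2, lambda a, b: ALPHA * a + (1.0 - ALPHA) * b),
    "wf2": (2, lambda a, b: (1.0 - ALPHA) * a + ALPHA * b),
    "max": (3, max),
    "min": (3, min),
}

def evaluate(tree, angle_scores):
    """Evaluate a GP tree.

    tree is a float constant, an int index 0..17 selecting an angle score,
    or a tuple (function name, child, ...). Constants must be floats so
    that they are not mistaken for terminal indices.
    """
    if isinstance(tree, tuple):
        arity, fn = FUNCTIONS[tree[0]]
        args = [evaluate(child, angle_scores) for child in tree[1:]]
        if len(args) != arity:
            raise ValueError(f"{tree[0]} expects {arity} arguments")
        return fn(*args)
    if isinstance(tree, int):
        return angle_scores[tree]
    return float(tree)

angle_scores = [i / 18.0 for i in range(18)]   # synthetic angle scores
gp_score = evaluate(("wf1", 0, ("max", 3, 9, 17)), angle_scores)
```

A GP run would generate, cross over, and mutate such trees; only the evaluation step is shown here.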

Applying Gaussian scoring to the INRIA person dataset with the learned GA HOG model yields the angle scores as shown in Figure 4, and these are used as GP terminals.

Based on the angle scores obtained as in Figure 4, a GP tree is configured to produce the final score. To evaluate the fitness of a GP tree, we calculate the average score $S_{PGP}$ of the positive images and the average score $S_{NGP}$ of the negative images under the GP model. The GP searches for the tree with the largest difference between $S_{PGP}$ and $S_{NGP}$. The calculation process of $S_{PGP}$ and $S_{NGP}$ is shown in Figure 5 and given in Eq. (9).

$$S_{PGP} = \frac{1}{N_P}\sum_{i=1}^{N_P} \mathrm{GPScore}_i(\mathrm{anglescore}), \qquad S_{NGP} = \frac{1}{N_N}\sum_{i=1}^{N_N} \mathrm{GPScore}_i(\mathrm{anglescore}). \tag{9}$$

The GP fitness is similar to the fitness used in GA HOG model learning. As in the GA model, the larger the difference between $S_{PGP}$ and $S_{NGP}$ and the smaller the variance of the scores, the easier it is to distinguish pedestrian from non-pedestrian characteristics. We therefore designed the fitness function of the GP model to reflect the difference between the average scores and their variances. At detection time, Gaussian scoring of an input image against the learned GA HOG model yields the angle scores, and the final score is calculated by feeding them to the learned GP scoring function. The final decision on whether the input image is a pedestrian image depends on the threshold, so detection performance is affected by the threshold setting. We therefore propose the method of Figure 6 and Eq. (10) for setting an appropriate threshold.

$$P_d = s_P - v_P, \qquad N_u = s_N + v_N, \qquad d = P_d - N_u, \qquad T = P_d - d \cdot \frac{v_P}{v_P + v_N}. \tag{10}$$

$P_d$ and $N_u$ represent the lower positive reference and the upper negative reference, respectively. $P_d$ is obtained by subtracting the standard deviation of the positive scores from their average, and $N_u$ by adding the standard deviation of the negative scores to their average. The difference $d$ between $P_d$ and $N_u$ measures how good the generated classifier is. The threshold $T$ is shifted up or down from the midpoint between $P_d$ and $N_u$ according to the standard deviations of the positive and negative scores: data with a larger standard deviation has a wider score spread, and the threshold is adjusted accordingly.
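The threshold rule of Eq. (10) is short enough to check numerically. The sketch below assumes that the spread terms are population standard deviations, as the description suggests, and falls back to the midpoint when both spreads are zero; that fallback and the function name are assumptions made here.

```python
import statistics

def select_threshold(pos_scores, neg_scores):
    """Return (T, d) following Eq. (10)."""
    s_p = statistics.fmean(pos_scores)
    s_n = statistics.fmean(neg_scores)
    v_p = statistics.pstdev(pos_scores)   # spread of the positive scores
    v_n = statistics.pstdev(neg_scores)   # spread of the negative scores
    p_d = s_p - v_p                       # lower positive reference
    n_u = s_n + v_n                       # upper negative reference
    d = p_d - n_u                         # separation margin
    if v_p + v_n == 0.0:
        return 0.5 * (p_d + n_u), d       # assumption: midpoint when both spreads vanish
    return p_d - d * v_p / (v_p + v_n), d

# Equal spreads place T at the midpoint of P_d = 0.7 and N_u = 0.3.
t_equal, d_equal = select_threshold([0.7, 0.9], [0.1, 0.3])
```

With equal spreads the threshold sits at the midpoint between $P_d$ and $N_u$. As the positive spread grows relative to the negative spread, $T$ moves toward $N_u$, so more of the widely spread positive scores stay above the threshold.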

To verify the performance of the proposed GP-based score learning, GA-based pedestrian HOG model learning, and Gaussian scoring, we performed a pedestrian detection test on the test images of the INRIA person dataset. We used accuracy (ACC), critical success index (CSI), and post agreement (PAG) as the performance indexes for pedestrian image detection. Table 2 shows the division of results that defines the index parameters, and the performance indexes are calculated from these parameters as shown in Eq. (11).

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad \mathrm{PAG} = \frac{TP}{TP + FP}, \qquad \mathrm{CSI} = \frac{TP}{TP + FP + FN}. \tag{11}$$
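The three indexes of Eq. (11) can be computed directly from the four counts of the results division table. The counts in the usage line are made-up numbers for illustration only, not results from the paper, and returning 0.0 for an empty denominator is an assumption.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (ACC, PAG, CSI) from confusion-matrix counts, following Eq. (11)."""
    def ratio(num, den):
        return num / den if den else 0.0   # assumption: 0.0 when undefined

    acc = ratio(tp + tn, tp + fp + tn + fn)  # share of all decisions that are correct
    pag = ratio(tp, tp + fp)                 # share of detections that are correct
    csi = ratio(tp, tp + fp + fn)            # hits over hits, false alarms, and misses
    return acc, pag, csi

# Hypothetical counts: 80 hits, 10 false alarms, 90 correct rejections, 20 misses.
acc, pag, csi = detection_metrics(tp=80, fp=10, tn=90, fn=20)
```

PAG is the quantity usually called precision, and CSI ignores true negatives, which makes it less sensitive than ACC to the number of background images in the test set.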

We verify the pedestrian detection performance of the GA HOG model and GP scoring and compare it with a pedestrian classifier that applies an SVM to HOG features. The parameters used in GA learning are summarized in Table 3, and Figure 7 shows the average score of the best GA HOG model.

We ran GP learning with the parameters given in Table 4, and the best GP result is shown in Figure 8. Table 5 summarizes the comparison of pedestrian detection performance between the conventional SVM and the algorithms proposed in this paper.

In Tables 5 and 6, GS denotes Gaussian scoring, and in Table 6, GU denotes the Gaussian lookup table. The detection accuracy of the GS + GA model was about 4% better than that of SVM (0.890 versus 0.847), and the accuracy of the GS + GA + GP model was about 9% better than that of SVM (0.927 versus 0.847). We also measured the execution time of the proposed methods; the time taken to process a single input image with a resolution of 640 × 480 is summarized in Table 6. All the algorithms proposed in this paper were considerably faster than SVM (148 ms), and the algorithm using the Gaussian lookup table was about 2.3 times faster than conventional GS (19 ms versus 44 ms for the GA model).

This paper proposed a Gaussian scoring (GS) method for matching images and GA/GP-based learning to establish a reference model for pedestrian detection.

In conventional approaches that use HOG features for matching, the HOG feature map consists of vectors representing the magnitude and angle of the gradient at each pixel, and a reference model is trained with learning algorithms such as SVM or the AdaBoost cascade. In this paper, we consider the HOG values separately for each angle division and evaluate the degree of HOG feature coincidence using GS. Evaluating the HOG features for each angle division allowed us to visualize the features and made them easy to analyze. This is a major advantage over the conventional HOG matching technique, which cannot provide a visual representation because of the high dimension of the HOG vector. To reduce the time needed to calculate the Gaussian score for all cells of the input images, we proposed using a Gaussian lookup table. We also proposed GA- and GP-based training methods for the reference model. The GA and GP are used to search for the best parameters of the gene and for the nonlinear function representing the feature map of the pedestrian model, respectively.

We performed experiments on the INRIA person dataset to verify the accuracy and processing time of the proposed method. The experimental results showed that the accuracies of pedestrian detection using the GA model and the GP model were about 4% and 9% higher, respectively, than that of the SVM method. In terms of processing time, the GA model using the Gaussian lookup table performed best and was 7.8 times faster than the SVM model. Future work is to devise a model and training algorithm that recognize pedestrians even in input images where pedestrians appear at various sizes or are partly occluded.

Fig. 1. Genetic configuration of the pedestrian HOG model.

Fig. 2. Calculation of the average scores SP and SN.

Fig. 3. Effect of score variance on pedestrian detection.

Fig. 4. Angle score calculation process.

Fig. 5. GP score average calculation process.

Fig. 6. Threshold setting.

Fig. 7. Best GA model.

Fig. 8. Best GP model.


Table 1. Configuration of GP terminals and functions.

| Node | Arity | Description |
|---|---|---|
| R | 0 | Arbitrary values ranging from 0 to 1 |
| angle score 0~340 | 0 | Score values of each angle |
| \|Sin\| | 1 | Absolute value of Sin function |
| \|Cos\| | 1 | Absolute value of Cos function |
| + | 2 | Sum of two channels |
| \|−\| | 2 | Absolute value of difference between two channels |
| * | 2 | Product of two channels |
| / | 2 | Division of two channels |
| Avg | 2 | Average of two channels |
| Wf1 | 2 | Weighted sum of two channels |
| Wf2 | 2 | Weighted sum of two channels |
| Max | 3 | Maximum value among three channels |
| Min | 3 | Minimum value among three channels |

Table 2. Results division table.

| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actual: Yes | True Positive | False Negative |
| Actual: No | False Positive | True Negative |

Table 3. GA parameter values.

| Parameter | Value |
|---|---|
| Population size | 50 |
| Max generation | 100 |
| Crossover rate | 0.9 |
| Mutation rate | 0.1 |
| Selection method | Tournament (size = 7) |

Table 4. GP parameter values.

| Parameter | Value |
|---|---|
| Population size | 1,000 |
| Max generation | 200 |
| Crossover rate | 0.9 |
| Mutation rate | 0.1 |
| Selection method | Tournament (size = 7) |
| Initial depth | 6–8 |
| Max depth | 17 |
| Initial population | Half and half |

Table 5. Experimental results of pedestrian detection performance.

| Method | ACC | PAG | CSI |
|---|---|---|---|
| SVM | 0.847 | 0.801 | 0.724 |
| GS | 0.743 | 0.752 | 0.592 |
| GS + GA | 0.890 | 0.889 | 0.802 |
| GS + GA + GP | 0.927 | 0.925 | 0.871 |

Table 6. Comparison of execution time.

| Method | Execution time per image |
|---|---|
| GS + GA | 44 ms |
| GS + GA + GP | 53 ms |
| SVM | 148 ms |
| GU + GA | 19 ms |
| GU + GA + GP | 28 ms |

  1. Benenson, R, Omran, M, Hosang, J, and Schiele, B (2014). Ten years of pedestrian detection, what have we learned? Computer Vision - ECCV 2014 Workshops, pp. 613-627.
  2. Viola, P, and Jones, MJ (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE International Conference on Computer Vision and Pattern Recognition, Kauai, HI.
  3. Viola, P, Jones, MJ, and Snow, D (2003). Detecting pedestrians using patterns of motion and appearance. Proceedings of the 9th International Conference on Computer Vision, Nice, France.
  4. Dollar, P, Wojek, C, Schiele, B, and Perona, P (2012). Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence. 34, 743-761.
  5. Dollar, P, Tu, Z, Perona, P, and Belongie, S (2009). Integral channel features. Proceedings of the British Machine Vision Conference, London, UK.
  6. Redmon, J, Divvala, S, Girshick, R, and Farhadi, A (2016). You only look once: unified, real-time object detection. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 779-788.
  7. Ren, S, He, K, Girshick, R, and Sun, J (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 28, 91-99.
  8. Dalal, N, and Triggs, B (2005). Histograms of oriented gradients for human detection. Proceedings of International Conference on Computer Vision and Pattern Recognition, San Diego, CA, pp. 886-893.
  9. Felzenszwalb, P, McAllester, D, and Ramanan, D (2008). A discriminatively trained, multiscale, deformable part model. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, pp. 1-8.
  10. Felzenszwalb, P, Girshick, R, McAllester, D, and Ramanan, D (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 32, 1627-1645.
  11. Zhu, Q, Yeh, MC, Cheng, KT, and Avidan, S (2006). Fast human detection using a cascade of histograms of oriented gradients. Proceedings of International Conference on Computer Vision and Pattern Recognition, New York, NY, pp. 1491-1498.
  12. Maji, S, Berg, AC, and Malik, J (2008). Classification using intersection kernel support vector machine is efficient. Proceedings of International Conference on Computer Vision and Pattern Recognition, Anchorage, AK, pp. 1-8.
  13. Koza, JR (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
  14. Cheon, M, and Lee, H (2017). Vision-based vehicle detection system applying hypothesis fitting. International Journal of Fuzzy Logic and Intelligent Systems. 17, 58-67.
  15. Kim, AR, Rhee, SY, and Jang, HW (2016). Lane detection for parking violation assessments. International Journal of Fuzzy Logic and Intelligent Systems. 16, 13-20.
  16. Choi, H, and Cho, Y (2016). A learning method of evolutionary HOG model for pedestrian detection. Proceedings of 11th International Workshop Series Convergence Works.

Youngwan Cho received the B.S., M.S., and Ph.D. degrees in electronic engineering from Yonsei University, Seoul, Korea, in 1991, 1993, and 1999, respectively. He worked as a Senior Research Engineer in the Control System Group at Samsung Electronics, Seoul, Korea, from 2000 to 2003. He was a visiting scholar at Department of Mechanical Engineering, Michigan State University from 2016 to 2017. He is currently working as an Associate Professor in the Department of Computer Engineering, Seokyeong University, Seoul, Korea. His research interests include fuzzy control theory and applications, intelligent control systems, machine learning, and robotics and automation.

E-mail: ywcho@skuniv.ac.kr


Kisung Seo received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1986, 1988, and 1993, respectively. He became Full-time Lecturer and Assistant Professor of Industrial Engineering in 1993 and 1995 at Seokyeong University, Seoul, Korea. He joined Genetic Algorithms Research and Applications Group (GARAGe) and Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University from 1999 to 2002 as a Research Associate. He was also appointed Visiting Assistant Professor in Electrical & Computer Engineering, Michigan State University from 2002 to 2003. He was a Visiting Scholar at BEACON (Bio/computational Evolution in Action CONsortium) Center, Michigan State University from 2011 to 2012. He is currently Professor of Electronics Engineering, Seokyeong University. His research interests include deep learning, computer vision, evolutionary algorithm, genetic programming, evolutionary robotics, and evolutionary prediction for weather system.

E-mail: ksseo@skuniv.ac.kr


Article

Original Article

Int. J. Fuzzy Log. Intell. Syst. 2018; 18(2): 111-119

Published online June 25, 2018 https://doi.org/10.5391/IJFIS.2018.18.2.111

Copyright © The Korean Institute of Intelligent Systems.

Building a HOG Descriptor Model of Pedestrian Images Using GA and GP Learning

Youngwan Cho, and Kisung Seo

Department of Computer Engineering, Seokyeong University, Seoul, Korea

Correspondence to:Kisung Seo (ksseo@skuniv.ac.kr)

Received: May 17, 2018; Revised: June 9, 2018; Accepted: June 18, 2018

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

For detecting a pedestrian by using features of images, it is generally needed to establish a reference model that is used to match with input images. The support vector machine (SVM) or AdaBoost Cascade method have been generally used to train the reference pedestrian model in the approaches using the histogram of oriented gradients (HOG) as features of the pedestrian model. In this paper, we propose a new approach to match HOG features of input images with reference model and to learn the structure and parameters of the reference model. The Gaussian scoring method proposed in this paper evaluates the degree of feature coincidence with HOG maps divided with angle of the HOG vector. We also propose two approaches for leaning of the reference model: genetic algorithm (GA) based learning and genetic programming (GP) based learning. The GA and GP are used to search the best parameters of the gene and nonlinear function representing feature map of pedestrian model, respectively. We performed experiments to verify the performance of proposed method in terms of accuracy and processing time with INRIA person dataset.

Keywords: HOG model, Genetic algorithm, Genetic programming, Learning, Pedestrian detection

1. Introduction

Machine vision has attracted much attention in recent decades of years because it is important for various application areas such as surveillance, car safety, and robotics. Recently, various researches on machine vision have been actively applied smart vehicles for driving safety and autonomous driving. The recent progresses in smart vehicles have growing needs to be endowed with pedestrian detection. This study aims at propose an approach to pedestrian detection with enhanced accuracy and reduced processing time.

In recent years, the number of approaches to detect pedestrians in monocular images has grown steadily. Benenson et al. [1] categorized the main paradigms for pedestrian detection into Viola-Jones variants, HOG+SVM rigid templates, deformable part detectors (DPM), and convolutional neural networks. Viola and Jones [2] recognized objects by combining Haar wavelet-like features. Instead of recognizing an object using a single classifier, they used several multiple cascade classifiers to recognize the object. Viola et al. [3] also developed a superior classifier progressively through a boosting method for detector training. Dollar et al. [4] evaluated that the ideas of Viola et al. [3] had served as a foundation for modern detectors. YOLO [6] and faster R-CNN [7] are currently considered as the state of the art techniques where real-time deep learning techniques are applied.

Dalal and Triggs [8] created the HOG, which is a method to generate a histogram by accumulating the magnitude in terms of angles of gradient and recognize the shape of the object through it. In addition, they used SVM as a learning machine to utilize the HOG features in detecting pedestrians. The DPM of Felzenszwalb et al. [9] utilized the method of Dalal and Triggs [8] as a building block. Zhu et al. [11] improved the speed of calculation in the study of Dalal and Triggs [8] by applying the cascade technique to HOG features. There is a difference between the conventional HOG and cascade HOG in that the cascade HOG transforms the size of the model variably and selects an excellent model among the converted models through the boosting learning, whereas the conventional HOG recognizes the object by fixing the size of the model.

Since their introduction, the number of variants of HOG features has proliferated greatly with nearly all modern detectors utilizing them in some form. Maji et al. [12] considered fast approximations of non-linear kernels. Dollar et al. [5] recognized pedestrians by utilizing various channels such as color and edge, unlike the HOG using a gray channel. Felzenszwalb et al. [10] determined the similarity between the whole model and partial model of the pedestrian model after their generation and finally judged whether it is a pedestrian or not. In most of studies using the HOG such as studies of Dalal and Triggs [8] and Felzenszwalb et al. [10], pedestrian recognition is carried out by using learning based approach in which discriminative features are extracted from each candidate and then, they are passed through the learning machine or classifier such as SVM. In this paper, we propose a new approach for matching and training of pedestrian model using the HOG features.

2. HOG Descriptor Model of Pedestrian Images

2.1 Pedestrian Detection

In this paper, we detect pedestrians through a vision sensor by matching the input image against a pedestrian HOG feature model. After evaluating the similarity between the features of the input image and the pedestrian HOG feature model, the detector judges whether the input is a pedestrian. To determine the similarity, we propose a scoring method based on the Gaussian function. The scoring method divides the HOG features by angle, accumulates the score values calculated at each angle, and obtains the final score.

If the pedestrian HOG feature model is built from a single pedestrian image, the detector may respond only to that specific pedestrian because of characteristics such as clothing, hair style, and posture. Therefore, it is necessary to train the pedestrian HOG feature model so that it expresses generalized pedestrian characteristics. The HOG feature model is learned using a genetic algorithm. In addition, we use genetic programming to learn the nonlinear function, expressing the reference model, that is used to calculate the final score value for deciding whether a pedestrian is detected.

2.2 Gaussian Scoring

In this research, we divide the HOG feature values, which range from 0° to 360°, by angle and determine the similarity between the model and an input image with respect to the feature values in each divided angle space. This paper proposes a Gaussian scoring to determine the degree of similarity for each angle, as given in Eq. (1).

$$\mathrm{Score}(I, M) = e^{-\frac{(I-M)^2}{2(kM)^2}}, \tag{1}$$

where $I$ and $M$ represent the HOG values of an input image and the model, respectively. The constant $k$ scales the width of the Gaussian with the model value $M$, so that relatively larger HOG values tolerate proportionally larger deviations and receive higher scores than smaller HOG values. The final scoring function for the input image cell is derived as Eq. (2).

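$$\mathrm{anglescore}(x, y, l) = \frac{\sum_{(i,j)} \mathrm{Score}\big(I(x+i, y+j),\, M(i, j)\big)}{w \cdot h}, \tag{2a}$$

$$\mathrm{Cellscore}(x, y) = \frac{\sum_{(l)} \mathrm{anglescore}(x, y, l)}{N_l}, \tag{2b}$$

$$\text{if } (\mathrm{CellScore} > \mathrm{Threshold}) \;\; \text{PedestrianDetect}, \tag{2c}$$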

where $w$, $h$, and $N_l$ represent the width and height of the model and the total number of divided angles, respectively.
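As a rough illustration of Eqs. (1) and (2), the following Python sketch computes the Gaussian score, the per-angle score, and the cell score. The function names, the nested-list layout of the HOG maps (one 2-D map per angle bin, indexed `[row][column]`), and the treatment of zero-valued model entries are assumptions made for this example; the paper does not specify them.

```python
import math

def gaussian_score(i_val, m_val, k):
    """Eq. (1): exp(-(I - M)^2 / (2 (k M)^2))."""
    if m_val == 0:
        # Assumption: Eq. (1) is undefined for M = 0, so only an exact match scores 1.
        return 1.0 if i_val == 0 else 0.0
    return math.exp(-((i_val - m_val) ** 2) / (2.0 * (k * m_val) ** 2))

def angle_score(image_map, model_map, x, y, k):
    """Eq. (2a): mean Gaussian score over the w x h model window placed at (x, y)."""
    h = len(model_map)
    w = len(model_map[0])
    total = 0.0
    for j in range(h):
        for i in range(w):
            total += gaussian_score(image_map[y + j][x + i], model_map[j][i], k)
    return total / (w * h)

def cell_score(image_maps, model_maps, x, y, k):
    """Eq. (2b): mean of the angle scores over the N_l angle bins."""
    scores = [angle_score(im, mm, x, y, k)
              for im, mm in zip(image_maps, model_maps)]
    return sum(scores) / len(scores)

def is_pedestrian(image_maps, model_maps, x, y, k, threshold):
    """Eq. (2c): report a detection when the cell score exceeds the threshold."""
    return cell_score(image_maps, model_maps, x, y, k) > threshold
```

A window whose HOG values equal the model in every angle bin scores exactly 1, and the score falls toward 0 as the values diverge.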

When the scoring-based decision is applied to pedestrian detection, a huge amount of calculation is needed because the exponential function must be evaluated for all cells. Therefore, a lookup table is used to improve the calculation speed. The Gaussian scoring function of Eq. (1) can be expressed in terms of the ratio $j$, as given in Eq. (3).

$$\mathrm{Score}(I, M) = e^{-\frac{(j-1)^2}{2k^2}}, \qquad j = \frac{I}{M}. \tag{3}$$
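The step from Eq. (1) to Eq. (3) can be checked by factoring $M^2$ out of the exponent, assuming $M \neq 0$:

$$\frac{(I-M)^2}{2(kM)^2} = \frac{M^2\left(\frac{I}{M}-1\right)^2}{2k^2M^2} = \frac{(j-1)^2}{2k^2}, \qquad j = \frac{I}{M}.$$

The score therefore depends on $I$ and $M$ only through their ratio, which is what makes a one-dimensional table sufficient.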

Since the value of the Gaussian function is determined by the ratio of the HOG value of the input image to that of the reference model, as shown in Eq. (3), the Gaussian function values can be tabulated against this ratio in a lookup table. In addition, as the ratio $j$ becomes very small or very large, the scoring value approaches zero. A scoring value near zero means that the features of the model and the input image are far from each other in similarity, so we can treat the scoring value in these cases as 0 to improve the calculation speed. The proposed Gaussian scoring is calculated from the ratio $j$ as shown in Eq. (4).

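$$\mathrm{Score}(j) = \begin{cases} e^{-\frac{(j-1)^2}{2k^2}}, & \text{if } \frac{1}{n} \le j \le \frac{2n-1}{n}, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$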

The scoring function in Eq. (4) can be approximated by a lookup table built at discrete values of $j$, combined with linear interpolation, as given in Eq. (5),

$$\mathrm{Score}(j) = \mathrm{GauLUT}(j_{index}) + \mathrm{GauLUT}(j_{index}+1) \times (j - j^*), \tag{5}$$

where GauLUT represents the Gaussian lookup table array, $j_{index}$ represents the array index corresponding to the ratio value, and $j^*$ represents the $j$ value corresponding to $j_{index}$.
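The lookup-table idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table is sampled on a uniform grid over the non-zero range of Eq. (4), the table size and the values of `k` and `n` are chosen by the caller, and the lookup uses the textbook linear interpolation between neighboring entries, which may differ in detail from Eq. (5) as printed. All names are hypothetical.

```python
import math

def build_gauss_lut(k, n, size):
    """Tabulate exp(-(j - 1)^2 / (2 k^2)) on a uniform grid over [1/n, (2n - 1)/n]."""
    j_min = 1.0 / n
    j_max = (2.0 * n - 1.0) / n
    step = (j_max - j_min) / (size - 1)
    table = [math.exp(-((j_min + idx * step - 1.0) ** 2) / (2.0 * k * k))
             for idx in range(size)]
    return table, j_min, j_max, step

def lut_score(j, lut):
    """Approximate the Gaussian score of ratio j from the table."""
    table, j_min, j_max, step = lut
    if j < j_min or j > j_max:
        return 0.0                        # Eq. (4): ratios far from 1 score zero
    pos = (j - j_min) / step
    idx = min(int(pos), len(table) - 2)   # keep idx + 1 inside the table
    frac = pos - idx                      # (j - j*) measured in grid steps
    # Standard linear interpolation between the two neighboring entries.
    return table[idx] + (table[idx + 1] - table[idx]) * frac
```

For example, `lut = build_gauss_lut(0.5, 4, 257)` covers ratios from 0.25 to 1.75, and `lut_score(1.0, lut)` returns a value very close to 1. The exponential is evaluated only while building the table, so each per-cell lookup costs a few arithmetic operations.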

3. Learning of the Pedestrian HOG Model

3.1 Learning of the Model by using GA

GA was first developed by John Holland in 1975 as an intelligent algorithm that solves optimization problems based on Darwin’s theory of evolution. It mimics biological evolution and explores the best genes through crossover, mutation, and selection operations. Many studies have reported that GA and GP provide an efficient and robust alternative for solving complex and highly nonlinear optimization problems by means of global search procedures. The GA and GP models and their optimization procedures can be applied to establishing and training the reference pedestrian model because this is a highly nonlinear optimization problem.

This paper proposes a HOG model training method using GA and applies the model to the detection of general pedestrian characteristics. The INRIA person dataset provides the training images for the HOG model, and the pedestrian HOG model is trained using 1,216 pedestrian images and 1,218 background images. The genetic configuration of the pedestrian model is shown in Figure 1.

In order to apply GA to learn the HOG pedestrian detection model, the model image and training images need to be represented as genes. A pedestrian model image is divided into cells as shown in Figure 1. A gene represented by P in Figure 1 consists of chromosomes (G) whose number is equal to the number of divided cells in the image.

GA needs to evaluate the fitness of each gene to learn the parameters of the model. In order to construct the fitness function, we define the average scores of the pedestrian images and the non-pedestrian images. Each average score is obtained by averaging the Gaussian scores of the whole images of the pedestrian and non-pedestrian sets, respectively, against the GA HOG model, as shown in Figure 2 and Eq. (6).

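$$S_P = \frac{1}{N_P}\sum_{i=1}^{N_P} \mathrm{Score}(PD_i, \mathrm{GAModel}), \qquad S_N = \frac{1}{N_N}\sum_{i=1}^{N_N} \mathrm{Score}(ND_i, \mathrm{GAModel}). \tag{6}$$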

In Eq. (6), $S_P$ and $S_N$ represent the average scores of the pedestrian images and the non-pedestrian images, $N_P$ and $N_N$ represent the numbers of positive and negative data, and $PD_i$ and $ND_i$ represent the $i$th pedestrian and non-pedestrian image, respectively. We can regard $S_P$ and $S_N$ as measures of the degree of matching between the GA model and the pedestrian images and non-pedestrian images, respectively.

Note that a high value of $S_P$ means that the corresponding GA model represents the general pedestrian characteristics well. On the other hand, when $S_N$ is high, the model can hardly distinguish pedestrian from non-pedestrian characteristics. Therefore, the larger the difference between $S_P$ and $S_N$, the easier it is to distinguish between general pedestrian and non-pedestrian characteristics, which is more advantageous for detecting pedestrians. However, if the variances of the pedestrian scores and the non-pedestrian scores become larger, it becomes difficult to separate the pedestrian and non-pedestrian characteristics. Figure 3 shows how the pedestrian detection rate changes with the score variance.

In the graphs of Figure 3, the $S_P$ and $S_N$ values are 6 and 4, respectively. Under the assumption that pedestrians are recognized based on the red solid lines, the detection rate is 100% in Figure 3(a) but 28.5% in Figure 3(b). Therefore, to train a GA HOG model with good performance, it is advantageous to reflect the variances of the pedestrian and non-pedestrian scores in the evaluation function. Thus, we derived the fitness function for evaluating the GA HOG model as shown in Eq. (7).

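$$\mathrm{Fitness} = s\,w_1\frac{S_P}{S_N} + w_2(S_P - S_N) - w_3(v_P + v_N), \qquad s = \begin{cases} 0, & \text{if } S_P < \mathrm{Threshold}, \\ 1, & \text{otherwise}, \end{cases} \tag{7}$$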

where $w_1$, $w_2$, and $w_3$ represent weights, and $v_P$ and $v_N$ represent the variances of the scores for pedestrian images and non-pedestrian images, respectively. In Eq. (7), the term $S_P - S_N$ is included because the larger the difference between $S_P$ and $S_N$, the better the pedestrian model. The term $v_P + v_N$ enters with a negative sign because a model with smaller variance is more advantageous for distinguishing between pedestrian and non-pedestrian images, as explained above. The term $S_P/S_N$, the ratio of the score on the positive data to the score on the negative data, is included because it is another representation of how differently the positive and negative data match the model.
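The fitness of Eqs. (6) and (7) can be sketched in Python as below. The sketch assumes that the per-image scores have already been computed, that the switch $s$ multiplies only the ratio term (one reading of Eq. (7)), that the population variance is intended, and that $S_N > 0$. The function name and argument names are hypothetical.

```python
from statistics import mean, pvariance

def ga_fitness(pos_scores, neg_scores, w1, w2, w3, threshold):
    """Fitness of one candidate model from its per-image scores."""
    s_p = mean(pos_scores)        # S_P in Eq. (6)
    s_n = mean(neg_scores)        # S_N in Eq. (6), assumed to be positive
    v_p = pvariance(pos_scores)   # assumption: population variance
    v_n = pvariance(neg_scores)
    s = 0.0 if s_p < threshold else 1.0
    # Assumption: s gates only the ratio term, as Eq. (7) reads literally.
    return s * w1 * s_p / s_n + w2 * (s_p - s_n) - w3 * (v_p + v_n)
```

With equal means, a model whose scores are more widely spread receives a lower fitness, which mirrors the effect illustrated in Figure 3.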

3.2 Learning of the Model by using GP

The overall flow of the GP algorithm [13] is similar to that of GA, but the structure of the genes differs. While GA has a fixed structure because its genes are bit strings or real-number strings, GP has a variable structure because its genes are expressed as trees. Since the genetic structure of GP is variable, it has been widely applied to practical optimization tasks such as searching for complex nonlinear functions or designing the system model itself.

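$$W_{f1} = \alpha \times \mathrm{Tree}_0 + (1-\alpha) \times \mathrm{Tree}_1, \qquad W_{f2} = (1-\alpha) \times \mathrm{Tree}_0 + \alpha \times \mathrm{Tree}_1. \tag{8}$$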

The cell score is the average of the angle scores defined in Eq. (2a). Since there is one angle score per angle bin, and we divide the angle range into bins of 20° in this paper, there are 18 angle scores for each image. In Eq. (2b), the angle scores are averaged to calculate the cell score, so all angle scores have the same weight. However, there might be specific angle scores that fit the general pedestrian characteristics particularly well.

We can use GP to train the cell score function of Eq. (2b) so that the weight of such angle scores increases. The number of terminals used in cell score learning is 19 in total: the 18 angle score values from angle score 0° to angle score 340°, and an arbitrary constant between 0 and 1. In this paper, the GP function set consists of 11 items: +, |−|, *, /, |sin|, |cos|, avg, Wf1, Wf2, min, and max. The terminals and functions of GP are summarized in Table 1.
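To make the terminal and function sets concrete, the sketch below evaluates a GP tree over the 18 angle scores. Several details are assumptions that the paper does not state: the value of $\alpha$ in Eq. (8), the protected form of division, the encoding of trees as nested tuples, and the terminal naming scheme (`"a40"` for the angle score at 40°).

```python
import math

ALPHA = 0.7  # assumption: the paper does not give the value of alpha in Eq. (8)

def protected_div(a, b):
    # Assumption: return 1.0 for near-zero denominators, a common GP convention.
    return a / b if abs(b) > 1e-9 else 1.0

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "|-|": lambda a, b: abs(a - b),
    "*": lambda a, b: a * b,
    "/": protected_div,
    "|sin|": lambda a: abs(math.sin(a)),
    "|cos|": lambda a: abs(math.cos(a)),
    "avg": lambda a, b: (a + b) / 2.0,
    "Wf1": lambda a, b: ALPHA * a + (1.0 - ALPHA) * b,
    "Wf2": lambda a, b: (1.0 - ALPHA) * a + ALPHA * b,
    "max": lambda a, b, c: max(a, b, c),
    "min": lambda a, b, c: min(a, b, c),
}

def evaluate(tree, angle_scores):
    """Evaluate a tree on 18 angle scores (index 0 is 0 deg, index 17 is 340 deg)."""
    if isinstance(tree, (int, float)):   # constant terminal R in [0, 1]
        return float(tree)
    if isinstance(tree, str):            # angle-score terminal such as "a40"
        return angle_scores[int(tree[1:]) // 20]
    op, *children = tree                 # internal node: (function name, child, ...)
    return FUNCTIONS[op](*(evaluate(child, angle_scores) for child in children))
```

For instance, `("Wf1", "a0", "a40")` weights the 0° score by `ALPHA` and the 40° score by `1 - ALPHA`, whereas the uniform average of Eq. (2b) corresponds to giving every angle score the same weight.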

The angle scores are obtained by Gaussian scoring of the INRIA person dataset against the GA HOG model learned as described above, as shown in Figure 4, and they are used as GP terminals.

Based on the angle score values obtained as in Figure 4, a GP tree is configured to obtain the final score. To evaluate the fitness of a GP tree, the average score $S_P^{GP}$ of the positive images and the average score $S_N^{GP}$ of the negative images under the GP model are calculated. The GP searches for the tree with the largest difference between $S_P^{GP}$ and $S_N^{GP}$. The calculation process of $S_P^{GP}$ and $S_N^{GP}$ is shown in Figure 5, and the equation is given in Eq. (9).

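$$S_P^{GP} = \frac{1}{N_P}\sum_{i=1}^{N_P} \mathrm{GPScore}_i(\mathrm{anglescore}), \qquad S_N^{GP} = \frac{1}{N_N}\sum_{i=1}^{N_N} \mathrm{GPScore}_i(\mathrm{anglescore}). \tag{9}$$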

The GP fitness is similar to the fitness used in GA HOG model learning. As in the GA model, the larger the difference between $S_P^{GP}$ and $S_N^{GP}$ and the smaller the variance of the scores, the easier it is to distinguish between pedestrian and non-pedestrian characteristics. Thus, we designed the fitness function of the GP model to reflect the difference and the variance of the average scores. At detection time, the angle scores are obtained by Gaussian scoring of an input image against the learned GA HOG model, and the final score is calculated by feeding them into the learned GP scoring function. The final decision on whether the input image is a pedestrian image depends on the threshold, so the performance of pedestrian detection is affected by the threshold setting. Therefore, we propose a method for setting an appropriate threshold value, as shown in Figure 6 and Eq. (10).

$$P_d = s_P - v_P, \qquad N_u = s_N + v_N, \qquad d = P_d - N_u, \qquad T = P_d - d \cdot \frac{v_P}{v_P + v_N}. \tag{10}$$

$P_d$ and $N_u$ represent the lower positive and upper negative references, respectively. They are obtained by subtracting the standard deviation from the average positive score and by adding the standard deviation to the average negative score, respectively. The difference between $P_d$ and $N_u$ is denoted by $d$, which can serve as a measure of how good the generated classifier is. The threshold is raised or lowered from the midpoint between $P_d$ and $N_u$ according to the standard deviations of the positive and negative scores: data with a larger standard deviation have a wider score spread, and the threshold $T$ is adjusted accordingly.
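The threshold rule of Eq. (10) can be sketched as follows. Following the description above, the sketch takes $v_P$ and $v_N$ to be standard deviations and uses the population standard deviation, which is an assumption. The function name is hypothetical, and the case $v_P + v_N = 0$ is not handled.

```python
from statistics import mean, pstdev

def detection_threshold(pos_scores, neg_scores):
    """Return (P_d, N_u, d, T) of Eq. (10) from per-image final scores."""
    s_p, s_n = mean(pos_scores), mean(neg_scores)
    v_p, v_n = pstdev(pos_scores), pstdev(neg_scores)  # assumption: population std
    p_d = s_p - v_p                  # lower positive reference
    n_u = s_n + v_n                  # upper negative reference
    d = p_d - n_u                    # margin between the two references
    t = p_d - d * v_p / (v_p + v_n)  # threshold, shifted by the relative spreads
    return p_d, n_u, d, t
```

When both spreads are equal, `t` is the midpoint of `p_d` and `n_u`. With illustrative scores `[5.0, 7.0]` (positive) and `[3.75, 4.25]` (negative), the references are 5.0 and 4.25 and the threshold is 4.4, closer to the negative reference because the positive scores are more widely spread.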

4. Experiments

To verify the performance of the proposed GP-based scoring learning and of the GA-based pedestrian HOG model learning with Gaussian scoring, a pedestrian detection test was performed on the test images of the INRIA person dataset. We used ACC (accuracy), CSI (critical success index), and PAG (post agreement) as the performance indexes for the pedestrian image detection experiment. Table 2 shows the result division table that defines the parameters of the performance indexes, and the indexes are calculated from these parameters as shown in Eq. (11).

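$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad \mathrm{PAG} = \frac{TP}{TP + FP}, \qquad \mathrm{CSI} = \frac{TP}{TP + FP + FN}. \tag{11}$$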
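The three indexes of Eq. (11) are straightforward to compute from the four counts of the result division table. The short Python sketch below is only an illustration; the function name is hypothetical, and the counts in the usage note are made up and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (ACC, PAG, CSI) as defined in Eq. (11)."""
    acc = (tp + tn) / (tp + fp + tn + fn)  # share of all decisions that are correct
    pag = tp / (tp + fp)                   # share of reported detections that are correct
    csi = tp / (tp + fp + fn)              # correct detections over all non-trivial cases
    return acc, pag, csi
```

With hypothetical counts `tp=90, fp=10, tn=85, fn=15`, the function returns ACC = 0.875, PAG = 0.9, and CSI = 90/115. CSI ignores true negatives, so it is not inflated by a large number of easily rejected background windows.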

We verify the pedestrian detection performance of the GA HOG model and GP scoring and compare it with a pedestrian classifier that uses an SVM on HOG features. The parameters used in GA learning are summarized in Table 3, and Figure 7 shows the average score of the best GA HOG model.
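A GA loop using the parameter values of Table 3 might look like the following sketch. Only the population size, number of generations, crossover and mutation rates, and tournament size come from Table 3. The real-valued gene encoding, uniform crossover, Gaussian mutation, and the absence of elitism are assumptions made for illustration, and the `fitness` callable stands in for Eq. (7).

```python
import random

def tournament(population, fitnesses, size, rng):
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda idx: fitnesses[idx])]

def run_ga(fitness, gene_length, pop_size=50, generations=100,
           crossover_rate=0.9, mutation_rate=0.1, tournament_size=7, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(gene_length)]
                  for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for generation in range(generations + 1):
        fitnesses = [fitness(ind) for ind in population]
        for ind, fit in zip(population, fitnesses):
            if fit > best_fit:                 # remember the best model seen so far
                best, best_fit = list(ind), fit
        if generation == generations:          # final population is only evaluated
            break
        children = []
        while len(children) < pop_size:
            a = tournament(population, fitnesses, tournament_size, rng)
            b = tournament(population, fitnesses, tournament_size, rng)
            if rng.random() < crossover_rate:
                # Assumption: uniform crossover between the two parents.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Assumption: Gaussian mutation per value, clipped to stay non-negative.
            child = [max(0.0, v + rng.gauss(0.0, 0.1))
                     if rng.random() < mutation_rate else v
                     for v in child]
            children.append(child)
        population = children
    return best, best_fit
```

In the setting of this paper, each individual would hold the HOG values of the model cells, and `fitness` would score the model against the training images.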

We ran GP learning with the parameters given in Table 4, and the result of the best GP is shown in Figure 8. Table 5 summarizes the comparison of pedestrian detection performance between the conventional SVM and the algorithms proposed in this paper.

In Tables 5 and 6, GS represents the Gaussian scoring, and GU represents the Gaussian lookup table. The detection accuracy of the GS + GA model was about 4 percentage points higher than that of SVM (ACC 0.890 versus 0.847), and that of the GS + GA + GP model was about 8 percentage points higher (0.927 versus 0.847). In addition, the execution time of the proposed methods was measured, and the time taken to process a single input image with a resolution of 640 × 480 is summarized in Table 6. All the algorithms proposed in this paper are much faster than SVM, and with the GA model the algorithm using the Gaussian lookup table was about 2.3 times faster than the one using the conventional GS.
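The speed-up factors follow directly from the per-image times in Table 6:

$$\frac{t_{\mathrm{GS+GA}}}{t_{\mathrm{GU+GA}}} = \frac{44}{19} \approx 2.3, \qquad \frac{t_{\mathrm{GS+GA+GP}}}{t_{\mathrm{GU+GA+GP}}} = \frac{53}{28} \approx 1.9, \qquad \frac{t_{\mathrm{SVM}}}{t_{\mathrm{GU+GA}}} = \frac{148}{19} \approx 7.8.$$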

5. Conclusion

This paper proposed a Gaussian Scoring method for matching images and GA/GP based learning to establish a reference model for pedestrian detection.

In the conventional approaches that use HOG features for matching, the HOG feature map consists of vectors representing the magnitude and angle of the gradient for each pixel, and a reference model is trained with learning algorithms such as SVM and the AdaBoost cascade method. In this paper, we consider the HOG value separately for each divided angle and evaluate the degree of HOG feature coincidence using the GS. Evaluating the HOG feature for each divided angle allows the features to be visualized and makes them easy to analyze. This is a major advantage over the conventional HOG matching technique, which cannot provide a visualized representation because of the large dimension of the HOG vector. To reduce the processing time needed to calculate the Gaussian score for all cells of the input images, we proposed using a Gaussian lookup table. We also proposed GA and GP based training methods for the reference model. The GA and GP are used to search for the best parameters of the gene and for the nonlinear function representing the feature map of the pedestrian model, respectively.

We performed experiments on the INRIA person dataset to verify the performance of the proposed method in terms of accuracy and processing time. The experimental results showed that the accuracies of pedestrian detection using the GA model and the GP model were about 4 and 8 percentage points higher, respectively, than that of the SVM method (ACC 0.890 and 0.927 versus 0.847). In terms of processing time, the GA model using the Gaussian lookup table performed best and was 7.8 times faster than the SVM model. A challenge for future work is to devise a model and training algorithm that can recognize pedestrians in input images where pedestrians appear at various sizes or are partly occluded.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgements

This research was supported by Seokyeong University in 2018.

Figure 1. Genetic configuration of the pedestrian HOG model.

Figure 2. Calculation of the average score SP and SN.

Figure 3. Effect of score variance to detect pedestrian.

Figure 4. Angle score calculation process.

Figure 5. GP score average calculation process.

Figure 6. Threshold setting.

Figure 7. Best GA model.

Figure 8. Best GP model.

Table 1. Configuration of GP terminal and function.

| Node | Arity | Description |
|---|---|---|
| R | 0 | Arbitrary values ranging from 0 to 1 |
| angle score 0~340 | 0 | Score values of each angle |
| \|Sin\| | 1 | Absolute value of Sin function |
| \|Cos\| | 1 | Absolute value of Cos function |
| + | 2 | Sum of two channels |
| \|−\| | 2 | Absolute value of difference between two channels |
| * | 2 | Product of two channels |
| / | 2 | Division of two channels |
| Avg | 2 | Average of two channels |
| Wf1 | 2 | Weighted sum of two channels |
| Wf2 | 2 | Weighted sum of two channels |
| Max | 3 | Maximum value among three channels |
| Min | 3 | Minimum value among three channels |

Table 2. Results division table.

| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actual: Yes | True Positive | False Negative |
| Actual: No | False Positive | True Negative |

Table 3. GA parameter values.

| Parameter | Value |
|---|---|
| Population size | 50 |
| Max generation | 100 |
| Crossover rate | 0.9 |
| Mutation rate | 0.1 |
| Select method | Tournament (size = 7) |

Table 4. GP parameter values.

| Parameter | Value |
|---|---|
| Population size | 1,000 |
| Max generation | 200 |
| Crossover rate | 0.9 |
| Mutation rate | 0.1 |
| Select method | Tournament (size = 7) |
| Initial depth | 6–8 |
| Max depth | 17 |
| Initial population | Half and half |

Table 5. Experimental results of pedestrian detection performance (efficiency indexes).

| Method | ACC | PAG | CSI |
|---|---|---|---|
| SVM | 0.847 | 0.801 | 0.724 |
| GS | 0.743 | 0.752 | 0.592 |
| GS + GA | 0.890 | 0.889 | 0.802 |
| GS + GA + GP | 0.927 | 0.925 | 0.871 |

Table 6. Comparison of execution time.

| Method | Execution time per image (ms) |
|---|---|
| GS + GA | 44 |
| GS + GA + GP | 53 |
| SVM | 148 |
| GU + GA | 19 |
| GU + GA + GP | 28 |

References

  1. Benenson, R, Omran, M, Hosang, J, and Schiele, B (2014). Ten years of pedestrian detection, what have we learned? Computer Vision ECCV 2014 Workshops, pp. 613-627.
  2. Viola, P, and Jones, MJ (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE International Conference on Computer Vision and Pattern Recognition, Kauai, HI.
  3. Viola, P, Jones, MJ, and Snow, D (2003). Detecting pedestrians using patterns of motion and appearance. Proceedings of the 9th International Conference on Computer Vision, Nice, France.
  4. Dollar, P, Wojek, C, Schiele, B, and Perona, P (2012). Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence. 34, 743-761.
  5. Dollar, P, Tu, Z, Perona, P, and Belongie, S (2009). Integral channel features. Proceedings of the British Machine Vision Conference, London, UK.
  6. Redmon, J, Divvala, S, Girshick, R, and Farhadi, A (2016). You only look once: unified, real-time object detection. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 779-788.
  7. Ren, S, He, K, Girshick, R, and Sun, J (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 28, 91-99.
  8. Dalal, N, and Triggs, B (2005). Histograms of oriented gradients for human detection. Proceedings of International Conference on Computer Vision and Pattern Recognition, San Diego, CA, pp. 886-893.
  9. Felzenszwalb, P, McAllester, D, and Ramanan, D (2008). A discriminatively trained, multiscale, deformable part model. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, pp. 1-8.
  10. Felzenszwalb, P, Girshick, R, McAllester, D, and Ramanan, D (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 32, 1627-1645.
  11. Zhu, Q, Yeh, MC, Cheng, KT, and Avidan, S (2006). Fast human detection using a cascade of histograms of oriented gradients. Proceedings of International Conference on Computer Vision and Pattern Recognition, New York, NY, pp. 1491-1498.
  12. Maji, S, Berg, AC, and Malik, J (2008). Classification using intersection kernel support vector machine is efficient. Proceedings of International Conference on Computer Vision and Pattern Recognition, Anchorage, AK, pp. 1-8.
  13. Koza, JR (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
  14. Cheon, M, and Lee, H (2017). Vision-based vehicle detection system applying hypothesis fitting. International Journal of Fuzzy Logic and Intelligent Systems. 17, 58-67.
  15. Kim, AR, Rhee, SY, and Jang, HW (2016). Lane detection for parking violation assessments. International Journal of Fuzzy Logic and Intelligent Systems. 16, 13-20.
  16. Choi, H, and Cho, Y (2016). A learning method of evolutionary HOG model for pedestrian detection. Proceedings of 11th International Workshop Series Convergence Works.
