In this paper, we propose an improved vision-based vehicle detection system that adds a hypothesis fitting (HF) step to the typical vehicle detection system consisting of hypothesis generation (HG) and hypothesis verification (HV) steps. In the HG step, the system generates hypotheses using shadow regions appearing under vehicles. In the HV step, the system verifies whether each hypothesis is a vehicle by extracting feature vectors from the hypothesis and applying a classifier to them. The proposed HF step is conducted between the HG and HV steps and improves vehicle verification performance by adapting the regions of hypotheses. We verify the performance of our proposed HF method on a data set of 4,797 hypotheses in total: 1,606 positive and 3,191 negative hypotheses.
A vision-based vehicle detection system is a component of automatic driving systems and plays a very important role in reducing traffic accidents. Many vision-based vehicle detection systems using a single camera have been proposed; a typical system consists of two steps, hypothesis generation (HG) and hypothesis verification (HV), in order to reduce computational time and achieve real-time processing [1, 2].
The purpose of the HG step is to extract vehicle hypotheses from road images. The results of the HG step can have a considerable effect on the verification accuracy of the HV step, so many methods for hypothesis generation have been studied to extract hypotheses more accurately. The HG step generally uses one of three approaches: the knowledge-based, stereo vision-based, or motion-based method. To detect objects and hypothesize their locations, the knowledge-based method uses characteristic information such as symmetry [3–6], corners [7, 8], shadows [9], edges [10–12], textures [13, 14], and vehicle lights [15–17]. There are two types of stereo vision-based method for detecting vehicles: one uses a disparity map [18], and the other uses Inverse Perspective Mapping [19, 20]. The motion-based method extracts hypotheses from optical flow by obtaining the relative motion of objects: approaching vehicles produce diverging flow, while overtaking vehicles produce converging flow [21, 22].
The input of the HV step is the set of hypothesized vehicles obtained from the HG step, and the HV step verifies whether these candidates are vehicles. Typical HV methods are template-based and appearance-based. Template-based methods calculate the correlation between hypotheses and a predefined pattern [23–25]. In contrast, appearance-based methods regard HV as a classification problem of distinguishing vehicle (positive) from nonvehicle (negative) data: classifier parameters are learned from features extracted from training images containing both vehicle and nonvehicle data [26–28].
In this paper, we improve the vision-based vehicle detection system proposed in [29]. The system applies the knowledge-based HG method, which extracts hypotheses using shadow regions [9, 29]. In the HV step, the system applies an appearance-based method that uses the histogram of oriented gradients (HOG) [30, 31] and HOG symmetry vectors [29] as feature vectors, with total error rate minimization using the reduced model (TER-RM) [29, 32] as the classifier.
To improve the performance of the system, we propose a hypothesis fitting (HF) step in this paper. If hypotheses are incorrectly extracted in the HG step, classification accuracy decreases, and the HOG symmetry extracted from such hypotheses is no longer a good feature for verifying them. The HF step, conducted between the HG and HV steps, adapts the region of each hypothesis generated in the HG step and helps to enhance verification accuracy in the HV step.
We introduce our system briefly in Section 2, describe the HG and HV steps in Sections 3 and 4, respectively, and present our proposed HF step in Section 5. In Section 6, we assess the experimental performance of the proposed system.
In this paper, we propose an improved vision-based vehicle detection system using a single camera, which adds an HF step to the typical vehicle detection system consisting of HG and HV steps.
In the HG step, hypotheses, i.e., candidate vehicles, are extracted from a road image. In the HV step, the hypotheses extracted in the HG step are verified to determine whether or not they are vehicles.
As shown in Figure 1, the HF step is conducted between the HG and HV steps and helps to improve vehicle verification performance by adapting the regions of hypotheses.
The proposed system applies the HG and HV steps introduced in [29].
The proposed system extracts shadow regions to generate hypotheses of vehicle locations from road images. This idea is based on the fact that the shaded area under a vehicle is always darker than the road area surrounding the shadow. Therefore, if the gray level of the paved road is roughly estimated, we can expect the gray level of the shaded area to be below this level.
The system applies the road-area (driving-space) extraction method using edge information introduced in [9]. This method regards the driving space as the lowest central homogeneous area in the image demarcated by edges. After extracting the road area, dark regions, including shaded areas under vehicles, are defined as regions whose intensity is smaller than a threshold value m − 3σ, where m and σ are the average and standard deviation of the frequency distribution of the road pixels. To obtain shadow regions underneath vehicles, horizontal edges between dark areas and road regions are extracted, and hypotheses are determined based on the locations of those edges [9]. Figure 2 briefly shows the hypothesis generation step.
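The m − 3σ thresholding rule can be sketched as follows. This is a minimal sketch assuming a grayscale NumPy image and a precomputed boolean road mask; the road-area extraction itself follows [9] and is not reproduced here.

```python
import numpy as np

def shadow_threshold(road_pixels: np.ndarray) -> float:
    """Gray-level threshold m - 3*sigma, where m and sigma are the mean and
    standard deviation of the road-pixel intensity distribution."""
    m = road_pixels.mean()
    sigma = road_pixels.std()
    return m - 3.0 * sigma

def dark_region_mask(image: np.ndarray, road_mask: np.ndarray) -> np.ndarray:
    """Mark pixels inside the road area that are darker than the threshold;
    these dark regions include the shaded areas under vehicles."""
    t = shadow_threshold(image[road_mask])
    return road_mask & (image < t)
```

The horizontal-edge search between the dark regions and the surrounding road, which yields the final hypothesis boxes, would then operate on this mask.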
In the HV step, the proposed system verifies whether the hypotheses extracted in the HG step are vehicles. To verify a hypothesis, the system extracts feature vectors from the hypothesis image and classifies those vectors.
In the HV step, the proposed system uses two kinds of feature vectors: HOG [30, 31] and HOG symmetry [29]. HOG is one of the most popular features in vision-based object detection, especially human detection. The HOG symmetry vector is composed of numerical elements that represent the symmetrical characteristics of an image.
HOG vectors are extracted in a three-step sequence: gradient computation, orientation binning, and histogram generation [30]. In the gradient computation step, the gradient of an image is obtained by applying two 1-D filters: (−1 0 1) for the horizontal direction and (−1 0 1)^{T} for the vertical direction, yielding an orientation and a magnitude value of the gradient at each pixel. In the orientation binning step, the orientation range is evenly divided into a predefined number of bins, e.g., 6 bins of π/3 each. After the number of orientation bins is determined, a histogram is generated in the histogram generation step by accumulating the magnitude value of each pixel of the image: if the orientation value of a pixel falls within the angle range of a specific bin, the magnitude value of that pixel is accumulated in the corresponding bin. In the proposed system, hypothesis images are resized to 64×64 pixels and divided into four blocks of 32×32 pixels, as shown in Figure 3.
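The three steps above can be sketched as follows. This is an illustrative sketch, not the exact implementation of [30]: the full 0–2π orientation range (matching "6 bins of π/3") and the absence of block normalization are assumptions based on the description above.

```python
import numpy as np

def hog_vector(block: np.ndarray, n_bins: int = 6) -> np.ndarray:
    """One HOG over one block: (-1 0 1) gradient filters, orientation
    binning over [0, 2*pi), magnitude-weighted histogram voting."""
    block = block.astype(float)
    gx = np.zeros_like(block)
    gy = np.zeros_like(block)
    gx[:, 1:-1] = block[:, 2:] - block[:, :-2]   # (-1 0 1), horizontal
    gy[1:-1, :] = block[2:, :] - block[:-2, :]   # (-1 0 1)^T, vertical
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.arctan2(gy, gx) % (2.0 * np.pi)     # orientation in [0, 2*pi)
    width = 2.0 * np.pi / n_bins                 # e.g., pi/3 for 6 bins
    bins = np.minimum((ang / width).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())   # accumulate magnitudes
    return hist

def hypothesis_hogs(img64: np.ndarray, n_bins: int = 8) -> list:
    """Split a resized 64x64 hypothesis into four 32x32 blocks (Figure 3)
    and return HOG1..HOG4."""
    blocks = [img64[:32, :32], img64[:32, 32:],
              img64[32:, :32], img64[32:, 32:]]
    return [hog_vector(b, n_bins) for b in blocks]
```

A vertical intensity edge, for example, produces purely horizontal gradients, so all of its magnitude votes land in the bin containing orientation 0.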
HOG symmetry vectors represent symmetrical characteristics numerically using two different HOG vectors [29]. The HOG vectors extracted from the four blocks of a vehicle hypothesis image show symmetric characteristics: HOG1 and HOG2 are symmetric to each other, as are HOG3 and HOG4, as shown in Figure 4. Therefore, two HOG symmetry vectors are extracted from the upper and lower parts of a hypothesis image, and each element of a HOG symmetry vector represents how similar the corresponding elements of two different HOG vectors are.
Consider a symmetrical image divided into two blocks, with two 8-dimensional HOG vectors extracted from the left and right blocks, as shown in Figure 5.
The HOG symmetry vector
where
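The bin rearrangement of Figure 5 and an element-wise similarity can be sketched as follows. The mirror mapping reproduces the example bins given there (for 8 bins, bin 4 ↔ bin 2, bin 5 ↔ bin 1, bin 6 ↔ bin 8); the min/max ratio used as the per-element similarity is an illustrative placeholder, since the exact element-wise measure is defined in [29].

```python
import numpy as np

def rearrange_mirror(hog: np.ndarray) -> np.ndarray:
    """Reorder bins to match a horizontally mirrored image: 1-based bin i
    maps to bin (n/2 + 2 - i), e.g., 4->2, 5->1, 6->8 for n = 8 (Figure 5)."""
    n = hog.size
    idx = (n // 2 - np.arange(n)) % n            # 0-indexed mirror mapping
    return hog[idx]

def hog_symmetry(hog_left: np.ndarray, hog_right: np.ndarray) -> np.ndarray:
    """Element-wise similarity between a HOG and the mirrored right HOG
    (1 = identical elements). The min/max ratio is a placeholder measure."""
    mirrored = rearrange_mirror(hog_right)
    lo = np.minimum(hog_left, mirrored)
    hi = np.maximum(hog_left, mirrored)
    return np.where(hi > 0, lo / hi, 1.0)
```

For a perfectly symmetric image pair, the left HOG equals the mirrored right HOG and every element of the symmetry vector is 1.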
In the HV step, the system verifies the hypotheses extracted in the HG step. The system extracts the feature vectors, i.e., HOG and HOG symmetry, and applies them to a classifier, TER-RM with data importance, introduced in [29].
TER-RM is a classification method that uses the reduced model [33] to minimize the total error rate (TER) of training data, which is the sum of the false positive (FP) and false negative (FN) rates [32]. TER-RM with data importance applies an importance value to each training datum and minimizes the sum of the FP and FN rates weighted by the importance values [29].
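The unweighted TER criterion can be illustrated as follows; this is a sketch of the error measure only, not of the TER-RM classifier or its importance weighting.

```python
import numpy as np

def total_error_rate(y_true, y_pred) -> float:
    """TER = FP rate + FN rate (the sum, not the average), the quantity
    minimized over the training data by TER-RM [32]."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg = y_true == 0
    pos = y_true == 1
    fp_rate = float(np.mean(y_pred[neg] == 1)) if neg.any() else 0.0
    fn_rate = float(np.mean(y_pred[pos] == 0)) if pos.any() else 0.0
    return fp_rate + fn_rate
```

Because TER sums two rates, it ranges over [0, 2], which is why the TER values reported later exceed the corresponding error rates.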
The reduced model is used to produce more sophisticated separating surfaces than linear discriminant functions, and the model is given by [33]
where
The goal of TER-RM with importance value can be expressed as follows [29]:
where
where
The optimization problem in (
where η is the positive offset value.
The solution of the minimization condition for (
where
In this section, we introduce a hypothesis fitting (HF) method.
In the late afternoon or early morning, vehicles cast relatively long shadows compared to their size. In this situation, the proposed system, which uses a shaded-area-based HG method, cannot extract hypotheses correctly, as shown in Figure 6, which can decrease classification accuracy. To reduce the classification error caused by such incorrectly extracted hypotheses, we apply an HF step to the system. The proposed HF method is conducted between the HG and HV steps and adapts the region of each hypothesis generated in the HG step using the symmetry characteristics of vehicle images.
Generally, edge information is used to extract the symmetry characteristics of vehicles; representatively, Kuehnle [3] and Zielke et al. [4] proposed vehicle detection systems using edge-based symmetry.
However, the proposed HF method uses HOG, which already serves as a feature in the system, instead of edge information to find the symmetry characteristics of vehicles.
This reduces the running time of the system: instead of generating the orientation and magnitude maps of the gradients twice, the maps are shared between the hypothesis fitting step and the feature extraction step.
Figure 7 briefly shows the process of the proposed HF method. First, the method applies the smallest-sized window to the left end of a hypothesis image and extracts a HOG from the window. At the same time, a window of the same size is applied immediately to the right of the first window, and a HOG is also extracted from it.
After the two HOGs are obtained from the left and right windows, the numerical symmetry of those HOGs is calculated. The HOG vector obtained from the right window is rearranged as (
where
After the process is completed, the method finds the location of the symmetry axis with the maximum symmetry value S and adjusts the hypothesis image according to that location.
If the system corrected hypotheses depending only on the maximum symmetry values, without any further conditions, some hypotheses would be adjusted incorrectly, which can worsen classification accuracy because of the decreased resolution of the hypotheses. Therefore, a threshold value is defined, and the system adjusts a hypothesis only when its maximum symmetry value is larger than the threshold. If the maximum symmetry value is smaller than the threshold, the hypothesis is not adjusted, and the system uses the unadjusted (original) hypothesis image in the HV step.
In addition, if a hypothesis has a smaller resolution than a specific threshold value, the proposed HF step is not applied, again because the low resolution of hypothesis images can decrease verification accuracy. These threshold values are determined by trial and error. Figure 8 shows the result of the proposed HF method.
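The sliding-window symmetry search with both guard conditions can be sketched as follows. This is a simplified sketch: `hog_fn` and `sym_fn` stand in for the HOG and HOG-symmetry computations described above, the score is taken as the mean of the symmetry vector, and `s_min` and `min_width` represent the trial-and-error thresholds; the exact scoring S used by the system may differ.

```python
import numpy as np

def fit_hypothesis(img, hog_fn, sym_fn, win_sizes, s_min, min_width):
    """Scan candidate symmetry axes with paired left/right windows, score
    each axis by HOG symmetry, and crop the hypothesis around the best axis
    only when the score exceeds s_min and the image is large enough."""
    _, w = img.shape
    if w < min_width:                       # skip low-resolution hypotheses
        return img
    best_s, best = -1.0, None
    for ws in win_sizes:                    # window sizes, smallest first
        for x in range(0, w - 2 * ws + 1):  # left window at columns [x, x+ws)
            left = img[:, x:x + ws]
            right = img[:, x + ws:x + 2 * ws]
            s = float(np.mean(sym_fn(hog_fn(left), hog_fn(right))))
            if s > best_s:
                best_s, best = s, (x, x + 2 * ws)
    if best_s >= s_min:                     # adjust only on strong symmetry
        x0, x1 = best
        return img[:, x0:x1]
    return img                              # otherwise keep the original
```

Because the orientation/magnitude maps behind `hog_fn` are the same ones used later for feature extraction, they can be computed once and shared, as noted above.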
For the experiments, a total of 4,797 hypotheses (1,606 positive and 3,191 negative) were extracted from road images collected during real on-road driving tests in Seoul, Korea. Gray-level hypothesis images are used, and each image is divided into four blocks (upper left, upper right, lower left, and lower right). A HOG vector is extracted from each block, and two HOG symmetry vectors are obtained from the upper blocks and the lower blocks, respectively. Then, TER-RM with importance values is applied to verify the hypotheses.
Figure 9 shows positive and negative hypotheses extracted by the proposed system. Most of the negative data are guard rails; the rest include structures and buildings near roads, signs on roads, parts of vehicles, and so on.
In this section, the classification results according to the dimension of HOG are compared. 4-, 6-, 8-, and 10-dimensional HOG vectors are extracted from the obtained hypothesis images, and those vectors are applied to the TER-RM with importance classification method.
As mentioned in Section 4, four HOG vectors and two HOG symmetry vectors are obtained from each hypothesis. Table 1 shows the dimension of the feature vector according to the dimension of HOG.
The resulting 24-, 36-, 48-, and 60-dimensional feature vectors (six times the HOG dimension, since four HOG and two HOG symmetry vectors are concatenated) are applied to TER-RM with importance (RM order is 3), and the classification results of these features are compared using 10-fold cross validation.
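The 10-fold cross-validation protocol used for this comparison can be sketched as follows; `train_fn` and `error_fn` are generic placeholders, since TER-RM with importance is not shown here.

```python
import numpy as np

def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Shuffle sample indices and split them into k near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validate(features, labels, train_fn, error_fn, k: int = 10):
    """Return per-fold errors: each fold serves once as the test set while
    the remaining k-1 folds train the classifier."""
    folds = kfold_indices(len(labels), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(features[train], labels[train])
        errors.append(error_fn(model, features[test], labels[test]))
    return errors
```

The averages, best cases, and worst cases reported in Tables 2–5 are summary statistics over the ten per-fold values returned by such a loop.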
Figure 10 and Table 2 show the classification results, i.e., error rates, according to the dimension of HOG (4, 6, 8, and 10). Figure 11 and Table 3 show the TER according to the dimension of HOG.
As shown in Figures 10 and 11 and Tables 2 and 3, when the dimension of HOG is 8, i.e., when 48-dimensional feature vectors are applied, the classification result is better than in the other cases.
Generally, as the dimension of the feature vectors increases, classification accuracy increases. However, too large a feature dimension can sometimes worsen classification test results because of overfitting. For this reason, the error rate of the classification applying 10-dimensional HOG is worse than that applying 8-dimensional HOG. Based on this result, 8-dimensional HOG, i.e., 48-dimensional feature vectors, is applied to the system for hypothesis verification.
In this section, the performance of the proposed HF method introduced in Section 5 is verified. To assess the performance of the method, two data sets are created from the hypotheses mentioned in Section 6.1: the hypothesis fitting method is applied to one of the data sets but not to the other.
Four 8-dimensional HOG vectors and two 8-dimensional HOG symmetry vectors are extracted from each of the two data sets, and the vectors are applied to TER-RM with importance (RM order is 3) to classify the data sets. The classification performance on each data set is assessed by 10-fold cross validation.
Figures 12 and 13 and Tables 4 and 5 show the numerical performance of the 10-fold cross validation (error rate and TER) when the two data sets are applied to the classification method.
According to Figure 12 and Table 4, when the hypothesis fitting method is applied to the data set, the accuracy is on average approximately 1.2% better than when the unfitted data set is applied. As shown in Figure 13 and Table 5, the hypothesis fitting method also leads to a 0.054 lower TER.
In this paper, we proposed an improved vision-based vehicle detection system using a single camera, which adds an HF step to the typical vehicle detection system consisting of the HG and HV steps. The proposed HF step is conducted between the HG and HV steps and helps to improve vehicle verification performance by adapting the regions of hypotheses.
We showed the performance of the proposed system on 4,797 hypotheses, comprising 1,606 positive and 3,191 negative hypotheses.
According to the classification results in Section 6, the proposed hypothesis fitting method leads to approximately 1.2% higher accuracy.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science, ICT & Future Planning [MSIP]) (No. 2017R1C1B5018408).
No potential conflict of interest relevant to this article was reported.
Overview of our proposed system.
Process of the HG step: (a) road region, (b) dark regions, (c) edges of shadows underneath vehicles, (d) extracted hypotheses based on Figure 2(c).
Four HOGs from a hypothesis.
Four HOG vectors are generated from four blocks and two HOG symmetry vectors (C_{1} and C_{2}) are obtained from upper blocks and lower blocks, respectively.
The HOG symmetry of an octagon: bins 4, 5, and 6 of HOG1 and bins 2, 1, and 8 of HOG2 are symmetric to each other, and the HOG symmetry is generated from HOG1 and the rearrangement of HOG2, i.e., HOG2′.
Examples of hypotheses which are incorrectly extracted due to long shadows.
Hypothesis fitting method.
Original hypothesis images and adjusted images by the proposed method.
(a) Positive (vehicle) and (b) negative (non-vehicle) images.
Ten-fold cross-validation result (error rate) according to the dimensions of HOG.
Ten-fold cross-validation result (TER) according to the dimensions of HOG.
Ten-fold cross-validation result (error rate) for the adjusted data set and the unadjusted data set.
Ten-fold cross-validation result (TER) for the adjusted data set and the unadjusted data set.
Dimensions of feature vectors according to the dimensions of HOG
Dimension of HOG | Number of HOGs | Dimension of HOG symmetry | Number of HOG symmetries | Dimension of feature vector |
---|---|---|---|---|
4 | 4 | 4 | 2 | 24 |
6 | 4 | 6 | 2 | 36 |
8 | 4 | 8 | 2 | 48 |
10 | 4 | 10 | 2 | 60 |
Average, the best, and the worst error rate according to the dimensions of HOG
Dimension of HOG | AVG±SD | Best | Worst |
---|---|---|---|
4 | 0.1277±0.0278 | 0.1021 | 0.2021 |
6 | 0.0827±0.0324 | 0.0563 | 0.1750 |
8 | 0.0575±0.0116 | 0.0417 | 0.0792 |
10 | 0.0777±0.0370 | 0.0458 | 0.1750 |
Average, the best, and the worst TER according to the dimensions of HOG
Dimension of HOG | AVG±SD | Best | Worst |
---|---|---|---|
4 | 0.2680±0.0264 | 0.2284 | 0.3150 |
6 | 0.1818±0.0467 | 0.1296 | 0.2688 |
8 | 0.1212±0.0231 | 0.0863 | 0.1573 |
10 | 0.1407±0.0487 | 0.0931 | 0.2658 |
Average, the best, and the worst error rate for the adjusted data set and the unadjusted data set
Data set | AVG±SD | Best | Worst |
---|---|---|---|
Adjusted data set | 0.0575±0.0116 | 0.0417 | 0.0792 |
Unadjusted data set | 0.0696±0.0274 | 0.0333 | 0.1292 |
Average, the best, and the worst TER for the adjusted data set and the unadjusted data set
Data set | AVG±SD | Best | Worst |
---|---|---|---|
Adjusted data set | 0.1212±0.0231 | 0.0863 | 0.1573 |
Unadjusted data set | 0.1753±0.0574 | 0.0855 | 0.2794 |
E-mail: cheonmk@gtec.ac.kr
E-mail: hslee0717@ut.ac.kr