International Journal of Fuzzy Logic and Intelligent Systems 2018; 18(3): 182-189
Published online September 25, 2018
https://doi.org/10.5391/IJFIS.2018.18.3.182
© The Korean Institute of Intelligent Systems
Min-Hyuck Lee and Seokwon Yeom
School of Computer and Communication Engineering, Daegu University, Gyeongsan, Korea
Correspondence to: Seokwon Yeom (yeom@daegu.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Multiple object detection and tracking are essential for video surveillance. A drone, or unmanned aerial vehicle (UAV), is very useful for aerial surveillance because it captures remote scenes efficiently. This paper addresses the detection and tracking of moving vehicles with a UAV, which can capture video sequences above the road at a distance. The detection step consists of frame differencing followed by thresholding, morphological filtering, and the removal of false alarms based on the true size of vehicles. The centroids of the detected areas are taken as measurements for tracking. Tracking is performed with Kalman filtering, which estimates the state of each target from a dynamic state model and the measured positions. In the experiment, three moving cars are captured at a long distance by a drone. Experimental results show that the proposed method reliably detects the moving cars and tracks their dynamic states with good accuracy.
Keywords: UAV/drone imaging, Object detection, Multiple target tracking, Kalman filtering
Multiple object detection and tracking are important for security and surveillance [1]. Recently, the use of unmanned aerial vehicles (UAVs), or drones, for aerial surveillance has increased because they capture remote scenes efficiently [2]. However, the image resolution of objects located far from the camera is low, and blurring and noise can further degrade image quality. Various studies have been conducted on the visual detection and tracking of moving objects. Methods based on background subtraction and frame differencing were studied in [3–8]. Gaussian mixture modeling (GMM) was used to analyze background and target regions in [3–5]. The scale-invariant feature transform (SIFT) was adopted to extract foreground objects from the background scene in [6, 7]. In [8], the background was subtracted under a Gaussian mixture assumption, followed by morphological filtering. In [9], hypothesis fitting was adopted for vehicle detection. Long-range moving objects were detected by background subtraction in [10].
In addition to the above-mentioned research, many studies have been conducted on multiple-target tracking [11–14]. High maneuverability, closely located multiple targets, heavy clutter (false alarms), and low detection probability are often the difficulties to overcome [11]. Kalman filtering provides a solution for estimating the state of a target in real time and is known to be optimal under the assumption of independent Gaussian noise [12, 13]. When multiple measurements are detected in a frame, data association is required to assign the measurements to the established tracks [14].
Vision-based target tracking with UAVs has been researched in [15–22]. Correlation filtering-based tracking was studied in [15]. Tracking of closely located objects was performed with feature matching and multi-inertial sensing data in [16]. Pedestrians were tracked by template matching in [17, 18]. Small animals were tracked with a freely moving camera in [19]. A moving ground target was tracked in areas with dense obstacles using UAVs in [20]. Multiple moving targets were tracked by Kalman filtering and IMM filtering in [21] and [22], respectively. In [23], position and velocity were estimated with multiple-sensor fusion.
In this paper, we address the detection and tracking of multiple moving vehicles captured by a UAV, which records videos above the road from a long distance. To achieve this goal, we integrate two systems: detection based on image processing, which manipulates pixels in the stationary scene, and tracking based on Kalman filtering, which estimates the dynamic states (position and velocity) of targets in the time domain. Object detection is performed through frame differencing, thresholding, morphological filtering, and removal of false alarms based on the minimum size of vehicles. Two frames separated by a constant interval are subtracted and turned into a binary image by thresholding, which represents the difference between the current and past frames. Two morphological operations, erosion and dilation, are applied to the binary image to extract regions of interest (ROIs), which are disjoint candidate areas of moving vehicles. False ROIs are removed using a minimum ROI size derived from the true size of moving vehicles. The centroids of the remaining ROIs are the detection results; they represent the measured positions of targets and are processed in the subsequent tracking stage. Tracking is performed by Kalman filtering to estimate the state of each target. A nearly constant velocity (NCV) model is assumed for the dynamic state of the target [11]. A position gating process excludes measurements outside the validation region, and the nearest measurement-to-track association scheme assigns one measurement to one track [21]. Tracks are initialized by a two-point differencing method with maximum speed gating [11]. In the experiments, three moving cars are captured at a long distance by a UAV whose camera points directly down at the road. Root-mean-squared errors (RMSEs) of position and velocity are obtained for several sampling times.
The remainder of the paper is organized as follows. Object detection is discussed in Section 2. Multiple target tracking is presented in Section 3. Section 4 demonstrates experimental results. The conclusion follows in Section 5.
A block diagram of object detection is shown in Figure 1. First, the frame difference between the current frame and a past frame is obtained. A thresholding step follows to generate a binary image as

$$D_k(x, y) = \begin{cases} 1, & \left| f_k(x, y) - f_{k-d}(x, y) \right| > \tau, \\ 0, & \text{otherwise}, \end{cases}$$

where $f_k(x, y)$ is the intensity of pixel $(x, y)$ in frame $k$, $d$ is the frame-difference interval, and $\tau$ is the threshold. Erosion and dilation are then applied to the binary image as

$$M_k = \left( D_k \ominus S \right) \oplus S,$$

where $S$ is the structuring element, and $\ominus$ and $\oplus$ denote morphological erosion and dilation, respectively; erosion removes isolated noise pixels and dilation restores the surviving regions, yielding disjoint ROIs. ROIs smaller than the minimum size of a vehicle are removed as false alarms. The centroid of each remaining ROI becomes a measurement as

$$\mathbf{z}_i = \frac{1}{\left| R_i \right|} \sum_{(x, y) \in R_i} \begin{bmatrix} x \\ y \end{bmatrix},$$

where $R_i$ is the pixel set of the $i$-th ROI and $\left| R_i \right|$ is the number of pixels in it.
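To make the pipeline concrete, a minimal sketch in Python/OpenCV follows; the threshold `tau`, the structuring-element size, and `min_area` are illustrative placeholders, not the tuned values used in the paper.

```python
import cv2
import numpy as np

def detect_moving_objects(frame_now, frame_past, tau=30,
                          kernel_size=5, min_area=150):
    """Frame differencing -> thresholding -> morphology -> centroid
    extraction. tau, kernel_size, and min_area are illustrative values,
    not the paper's tuned parameters."""
    gray_now = cv2.cvtColor(frame_now, cv2.COLOR_BGR2GRAY)
    gray_past = cv2.cvtColor(frame_past, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_now, gray_past)            # frame difference
    _, binary = cv2.threshold(diff, tau, 255, cv2.THRESH_BINARY)
    # Erosion removes isolated noise pixels; dilation restores the ROIs
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    cleaned = cv2.dilate(cv2.erode(binary, kernel), kernel)
    # Connected components give the disjoint ROIs; small ones are false alarms
    n, _, stats, cents = cv2.connectedComponentsWithStats(cleaned)
    return [tuple(cents[i]) for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
```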
The dynamic state of the target is modeled as an NCV model; the target's maneuvering is captured by the uncertainty of the process noise, which is assumed to follow a Gaussian distribution. The discrete state equation of a target is

$$\mathbf{x}_k = F \mathbf{x}_{k-1} + \mathbf{w}_k, \qquad F = \begin{bmatrix} 1 & \Delta & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

where $\mathbf{x}_k = [x_k \;\; \dot{x}_k \;\; y_k \;\; \dot{y}_k]^T$ is the state (position and velocity) at frame $k$, $\Delta$ is the sampling time, and $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, Q)$ is the process noise. The measurement equation is

$$\mathbf{z}_k = H \mathbf{x}_k + \mathbf{v}_k, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},$$

where $\mathbf{z}_k$ is the measured position (an ROI centroid) and $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, R)$ is the measurement noise.
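As a sketch, the NCV model matrices can be assembled as follows; the white-noise-acceleration form of $Q$ is a common choice and an assumption here, since the paper does not reproduce its exact process noise covariance.

```python
import numpy as np

def ncv_model(dt, q_var=1.0, r_var=1.0):
    """State transition F, measurement H, and noise covariances Q, R
    for a nearly constant velocity model with state [x, vx, y, vy]."""
    F = np.array([[1, dt, 0,  0],
                  [0,  1, 0,  0],
                  [0,  0, 1, dt],
                  [0,  0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    # Discrete white-noise-acceleration block per axis (assumed form)
    q = q_var * np.array([[dt**4 / 4, dt**3 / 2],
                          [dt**3 / 2, dt**2]])
    Q = np.block([[q, np.zeros((2, 2))],
                  [np.zeros((2, 2)), q]])
    R = r_var * np.eye(2)  # measurement noise on the (x, y) centroid
    return F, H, Q, R
```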
The initial state of each target is calculated by the two-point differencing method with maximum speed gating [14]. The initial state of target $t$ is formed from its first two measurements as

$$\hat{\mathbf{x}}^t = \begin{bmatrix} z_x^t(2) \\ \left( z_x^t(2) - z_x^t(1) \right)/\Delta \\ z_y^t(2) \\ \left( z_y^t(2) - z_y^t(1) \right)/\Delta \end{bmatrix},$$

and the corresponding initial covariance matrix follows from the measurement noise variance $\sigma^2$ propagated through the differencing, i.e., per axis,

$$P_0 = \begin{bmatrix} \sigma^2 & \sigma^2/\Delta \\ \sigma^2/\Delta & 2\sigma^2/\Delta^2 \end{bmatrix}.$$

The state is confirmed as the initial state of a track if the following speed gating is satisfied:

$$\sqrt{\dot{x}^2 + \dot{y}^2} \le v_{\max},$$

where $v_{\max}$ is the maximum plausible speed of a vehicle.
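A sketch of this initialization, assuming the standard two-point-differencing covariance shown above (the paper's exact initial covariance is not reproduced); `sigma` and `v_max` are placeholder values.

```python
import numpy as np

def init_track(z1, z2, dt, sigma=1.0, v_max=30.0):
    """Two-point differencing initialization with maximum-speed gating.
    z1, z2: first two (x, y) measurements; sigma: measurement std
    (assumed); v_max: maximum plausible speed in m/s (assumed value).
    Returns (state, covariance) or None if the speed gate fails."""
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    vel = (z2 - z1) / dt
    if np.hypot(vel[0], vel[1]) > v_max:
        return None  # implausibly fast: do not start a track
    x0 = np.array([z2[0], vel[0], z2[1], vel[1]])
    # Standard two-point differencing covariance, one 2x2 block per axis
    p = sigma**2 * np.array([[1, 1 / dt],
                             [1 / dt, 2 / dt**2]])
    P0 = np.block([[p, np.zeros((2, 2))],
                   [np.zeros((2, 2)), p]])
    return x0, P0
```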
The state and covariance predictions are iteratively computed with the initial values as

$$\hat{\mathbf{x}}(k|k-1) = F \hat{\mathbf{x}}(k-1|k-1), \qquad P(k|k-1) = F P(k-1|k-1) F^T + Q,$$

where $\hat{\mathbf{x}}(k|k-1)$ and $P(k|k-1)$ are the predicted state and covariance at frame $k$ given all measurements up to frame $k-1$.
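In code, the prediction step is a direct transcription of these equations, operating on the NumPy arrays built in the earlier sketches:

```python
def predict(x, P, F, Q):
    """Kalman prediction: propagate the state and covariance one step."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```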
Data association is the process of assigning multiple measurements to established tracks. Measurement gating is a preprocessing step that reduces the number of candidate measurements for data association. The gating is performed by chi-square hypothesis testing under the assumption of Gaussian measurement residuals [11]. A measurement $\mathbf{z}$ becomes valid for target $t$ if

$$\boldsymbol{\nu}^T S^{-1} \boldsymbol{\nu} \le \gamma, \qquad \boldsymbol{\nu} = \mathbf{z} - H \hat{\mathbf{x}}^t(k|k-1), \qquad S = H P^t(k|k-1) H^T + R,$$

where $\boldsymbol{\nu}$ is the measurement residual (innovation), $S$ is the innovation covariance, and $\gamma$ is the gate threshold obtained from the chi-square distribution with two degrees of freedom. Among the validated measurements, the nearest one in this normalized distance is assigned to the track.
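A sketch of gating followed by nearest measurement-to-track association; the gate probability of 0.99 is an assumed value, not the paper's threshold.

```python
import numpy as np
from scipy.stats import chi2

def gate_and_associate(z_list, x_pred, P_pred, H, R, gate_prob=0.99):
    """Chi-square gating followed by nearest-neighbor association.
    Returns the validated measurement closest to the prediction in
    normalized (Mahalanobis) distance, or None if the gate is empty."""
    S = H @ P_pred @ H.T + R            # innovation covariance
    S_inv = np.linalg.inv(S)
    gamma = chi2.ppf(gate_prob, df=2)   # gate threshold, 2-D measurement
    best, best_d2 = None, np.inf
    for z in z_list:
        nu = np.asarray(z, float) - H @ x_pred   # residual
        d2 = float(nu @ S_inv @ nu)              # normalized distance
        if d2 <= gamma and d2 < best_d2:
            best, best_d2 = np.asarray(z, float), d2
    return best
```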
The state estimate and the covariance matrix of targets are updated as follows:

$$K = P(k|k-1) H^T S^{-1},$$
$$\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + K \boldsymbol{\nu},$$
$$P(k|k) = \left( I - K H \right) P(k|k-1),$$

where $K$ is the Kalman gain. If no measurement is validated for a track, the predicted state and covariance are used as the estimates for that frame.
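A sketch of the update step, including the coast-on-prediction fallback when no measurement is validated:

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Kalman measurement update; falls back to the prediction when
    no measurement was associated with the track (z is None)."""
    if z is None:
        return x_pred, P_pred                   # coast on the prediction
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    nu = z - H @ x_pred                         # innovation
    x = x_pred + K @ nu
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P
```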
Several metrics are used for performance evaluation: position error, velocity error, and the RMSE of position and velocity. The position error of target $t$ at frame $k$ is

$$e_p^t(k) = \sqrt{ \left( \hat{x}^t(k) - x^t(k) \right)^2 + \left( \hat{y}^t(k) - y^t(k) \right)^2 },$$

where $(\hat{x}^t, \hat{y}^t)$ is the estimated position and $(x^t, y^t)$ is the ground-truth position. The RMSE of position over $K$ frames is

$$\mathrm{RMSE}_p^t = \sqrt{ \frac{1}{K} \sum_{k=1}^{K} e_p^t(k)^2 }.$$

The velocity error is defined analogously from the estimated and ground-truth velocities,

$$e_v^t(k) = \sqrt{ \left( \hat{\dot{x}}^t(k) - \dot{x}^t(k) \right)^2 + \left( \hat{\dot{y}}^t(k) - \dot{y}^t(k) \right)^2 },$$

where the ground truth of the velocity is approximated by differencing the ground-truth positions:

$$\dot{x}^t(k) \approx \frac{x^t(k) - x^t(k-1)}{\Delta}, \qquad \dot{y}^t(k) \approx \frac{y^t(k) - y^t(k-1)}{\Delta}.$$

The RMSE of velocity is obtained as

$$\mathrm{RMSE}_v^t = \sqrt{ \frac{1}{K} \sum_{k=1}^{K} e_v^t(k)^2 }.$$
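A sketch of these metrics; it takes the Kalman position and velocity estimates and differences the ground-truth positions for the velocity reference, as described above.

```python
import numpy as np

def rmse_metrics(est_pos, est_vel, true_pos, dt):
    """Position/velocity RMSE per target. est_pos, est_vel: Kalman
    estimates of shape (K, 2); true_pos: ground-truth positions (K, 2).
    Ground-truth velocity is approximated by finite differences."""
    est_pos = np.asarray(est_pos, float)
    est_vel = np.asarray(est_vel, float)
    true_pos = np.asarray(true_pos, float)
    pos_err = np.linalg.norm(est_pos - true_pos, axis=1)
    rmse_pos = np.sqrt(np.mean(pos_err ** 2))
    true_vel = np.diff(true_pos, axis=0) / dt   # approximated ground truth
    vel_err = np.linalg.norm(est_vel[1:] - true_vel, axis=1)
    rmse_vel = np.sqrt(np.mean(vel_err ** 2))
    return rmse_pos, rmse_vel
```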
A drone (Phantom 4 Advanced) captures three moving vehicles at 30 frames per second. A total of 372 frames are captured over more than 12 seconds. The size of one frame is 4096 × 2160 pixels.
A total of 368 frames, from the 5th frame to the 372nd frame, are processed to detect moving objects, since each frame is differenced with the frame four frames earlier and the first four frames therefore produce no detection output.
Figure 4 shows all centroids for the 368 frames. Table 1 shows the detection rates of the three cars; the average detection rate is around 96%. The low detection rate of Car 3 is mainly due to occlusion by trees and to the background (a white sidewalk) having the same color as the target. A total of 10 false alarms are detected, two of which are caused by split measurements from one car.
The centroids in Figure 4 become the input measurements for tracking. The sampling time Δ in the state equation is initially set to 0.033 s, i.e., every frame is processed; Figure 5 shows the tracking results when Δ = 0.033 s.
Figure 6(a) shows the ground truth of the position of Car 1 from the first to the 155th frame. Figures 6(b) and 6(c) show the position vectors of Cars 2 and 3, from the first to the 289th frame and from the 74th to the 372nd frame, respectively. Figure 7(a)–(c) shows the approximated ground truth of the velocity of the three cars, and Figures 8 and 9 show the position and velocity errors, respectively.
The sampling time is set differently in the following experiments. Figure 10 shows the tracking results with Δ = 0.1 s, that is, measurements are acquired from every third frame of the original data. Cars 1 and 2 are initialized by the 5th and 8th frames, and Car 3 is initialized by the 74th and 77th frames. Cars 1, 2, and 3 are estimated until the 155th, 287th, and 371st frames, respectively; thus, they are estimated for 50, 94, and 99 frames, respectively, including initialization. Figure 11 shows the tracking results with every sixth frame updated (Δ = 0.2 s). Cars 1 and 2 are initialized by the 5th and 11th frames, and Car 3 is initialized by the 77th and 83rd frames. Cars 1, 2, and 3 are estimated for 25, 47, and 49 frames, respectively, until the 155th, 298th, and 371st frames. The relative computational times are 1, 0.51, and 0.34 for Δ = 0.033, 0.1, and 0.2 s, respectively.
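To emulate the longer sampling times, the measurement stream can simply be decimated and Δ scaled accordingly; a minimal sketch with the frame rate and decimation steps from the experiment (the function itself is hypothetical):

```python
FPS = 30  # capture rate used in the experiment

def decimate(measurements, step):
    """Keep every `step`-th frame's measurements to emulate a longer
    sampling time: step=1 -> 0.033 s, step=3 -> 0.1 s, step=6 -> 0.2 s."""
    dt = step / FPS
    return measurements[::step], dt
```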
Table 2 shows the RMSE of position in meters, and Table 3 shows the RMSE of velocity in meters per second, for the three sampling times. The average position RMSE grows only slightly as the sampling time increases, while the computational cost drops substantially.
In this paper, multiple moving objects are captured over a long distance by a UAV. The locations of the objects are detected from the difference between frames of the scene, and three targets are tracked with Kalman filtering. Experimental results show that the proposed algorithm detects and tracks the moving objects with good accuracy. Position and velocity errors are obtained for targets with different maneuvers and for different sampling times. It is shown that the sampling time can be adjusted to build a computationally more efficient system, which is favorable for drone applications because the computational resources on a drone are very limited. This study also suggests new applications for drones, such as traffic control and smart cars, as well as security and defense. Tracking a large number of highly maneuvering targets remains for future study.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant Number: 2017R1D1A3B03031668).
No potential conflict of interest relevant to this article was reported.
Figure 1. Block diagram of moving object detection.
Figure 2. (a) Cars 1 and 2 at the first frame; (b) Cars 1–3 at the 150th frame; (c) Car 3 at the 300th frame. Moving cars are indicated by red circles.
Figure 3. (a) Frame differencing; (b) thresholding; (c) morphological filtering and centroids; (d) segmented regions by ROI windows after false-alarm removal.
Figure 4. Measurements including false alarms for 368 frames.
Figure 5. Tracking results when Δ = 0.033 s: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 6. Ground truth of position: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 7. Approximated ground truth of velocity: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 8. Position error: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 9. Velocity error: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 10. Tracking results when Δ = 0.1 s: (a) Car 1, (b) Car 2, (c) Car 3.
Figure 11. Tracking results when Δ = 0.2 s: (a) Car 1, (b) Car 2, (c) Car 3.
Table 1. Detection rates.

| | Number of frames | Number of detections | Detection rate (%) |
|---|---|---|---|
| Car 1 | 151 | 151 | 100.00 |
| Car 2 | 285 | 282 | 98.95 |
| Car 3 | 299 | 273 | 91.30 |
| Total/average | 735 | 706 | 96.05 |
Table 2. Position RMSE.

| Sampling time (s) | Car 1 (m) | Car 2 (m) | Car 3 (m) | Average (m) |
|---|---|---|---|---|
| 0.033 | 0.885 | 1.619 | 1.567 | 1.345 |
| 0.1 | 0.890 | 1.684 | 1.561 | 1.378 |
| 0.2 | 0.925 | 1.686 | 1.636 | 1.416 |
Table 3. Velocity RMSE.

| Sampling time (s) | Car 1 (m/s) | Car 2 (m/s) | Car 3 (m/s) | Average (m/s) |
|---|---|---|---|---|
| 0.033 | 1.472 | 1.710 | 2.114 | 1.765 |
| 0.1 | 1.377 | 1.983 | 1.844 | 1.734 |
| 0.2 | 1.287 | 2.015 | 2.366 | 1.890 |