

Moving Vehicle Detection and Drone Velocity Estimation with a Moving Drone
International Journal of Fuzzy Logic and Intelligent Systems 2020;20(1):43-51
Published online March 25, 2020
© 2020 Korean Institute of Intelligent Systems.

Donho Nam and Seokwon Yeom

School of Information Convergence Technology, Daegu University, Gyeongsan, 38453, Korea
Correspondence to: Seokwon Yeom
Received September 17, 2019; Revised December 24, 2019; Accepted March 19, 2020.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Small unmanned aerial vehicles can be effectively used for aerial video surveillance. Although the field of view of the camera mounted on the drone is limited, flying drones can expand their surveillance coverage. In this paper, we address the detection of moving targets in urban environments with a moving drone. The drone moves at a constant velocity and captures video clips of moving vehicles such as cars, buses, and bicycles. Moving vehicle detection consists of frame registration and subtraction followed by thresholding, morphological operations and false blob reduction. First, two consecutive frames are registered; the coordinates of the next frame are compensated by a displacement vector that minimizes the sum of absolute difference between the two frames. Second, the next compensated frame is subtracted from the current frame, and the binary image is generated by thresholding. Finally, morphological operations and false alarm removal extract the target blobs. In the experiments, the drone flies at a constant speed of 5.1 m/s at an altitude of 150 m while capturing video clips of nine moving targets. The detection and false alarm rates as well as the receiver operating characteristic curves are obtained, and the drone velocities in the x and y directions are estimated by the displacement vector. The average detection rate ranges from 90% to 97% while the false alarm rate ranges from 0.06 to 0.5. The root mean square error of the speed is 0.07 m/s when the reference frame is fixed, showing the robustness of the proposed method.

Keywords: Drone, Unmanned aerial vehicle, Frame registration, Object detection, Velocity estimation
1. Introduction

Small drones or unmanned aerial vehicles (UAVs) are highly useful for aerial video surveillance. Moreover, moving object detection is essential for identifying harmful threats in advance. Drones can capture video from a distance while hovering at a fixed point or moving from one point to another as programmed. This remote scene capturing is cost effective and does not require highly trained personnel. However, small drones are often limited in battery, memory, computing power, and bandwidth, which poses a challenge to the analysis of high-resolution video sequences in real time.

The importance of automatic video surveillance was highlighted in [1]. Various applications of the aerial surveillance by drones were presented in [2]. In [3], a distant moving object was detected using a background area subtractor. A coarse-to-fine detection scheme was proposed in [4]. In [5], the background was subtracted under a Gaussian mixture assumption, followed by a morphological filter. In [6], various object detection methods were reviewed in three categories: background subtraction, frame difference, and optical flow.

An all-in-one camera-based target detection and positioning system was studied for search and rescue missions in [7]. The detection and tracking of moving vehicles were studied using Kalman and interacting multiple model (IMM) filters in [8] and [9], respectively. The detection and tracking of pedestrians were studied with a drone hovering at a low altitude in [10]. In these past studies, the drone hovered in one position as a stationary sensor, and the field of view was fixed to capture the same scene. The surveillance coverage of the drone depends on its distance to the object and the focal length of the camera. As the distance from the object increases, the drone coverage increases. However, the image quality easily degrades due to blur, noise, and low resolution.

Image registration is the process of matching two or more images of the same scene into the same coordinates. Various image registration methods were surveyed in [11]. Frames from different sensors were registered via modified dynamic time warping in [12]. Coarse-to-fine level image registration was proposed in [13]. Telemetry-assisted registration of frames from a drone was studied in [14]. The speed of small UAVs can be set, but the direction of flight is determined by two waypoints on the map. Thus, the drone velocities in the x and y directions cannot be immediately found. In [15], the drone velocities were estimated using frame registration. In [16] and [17], fuzzy controllers were studied for quadcopter trajectory tracking.

In this paper, we address the detection of moving targets in urban environments with a flying drone. The drone flies from one point to another at a constant speed, covering a wider area than a stationary drone. To compensate for the drone’s movement in the video sequence, frame registration is performed between two consecutive frames. The displacement vector for the next frame is the one that minimizes the sum of absolute difference (SAD) between the two consecutive frames, and it is used to compensate the coordinates of the next frame. Moving objects are then detected in the current frame by frame subtraction and thresholding: the compensated next frame is subtracted from the current frame, and thresholding is performed to generate a binary image. Two morphological operations (erosion and dilation) are sequentially applied to the binary image. The erosion operation removes small clutter, while the dilation operation compensates for erosion in the object areas and connects the segmented areas of one object to generate target blobs. Finally, target blobs smaller than the target size are removed. Meanwhile, the drone velocities in the x and y directions are estimated from the displacement vector obtained by frame registration. This velocity estimation does not depend on the global navigation satellite system; thus, a stand-alone camera-based system can be built. Note that the accuracy of the frame registration can be measured by estimating the drone speed if the ground truth of the drone speed is known. Figure 1 shows a flying drone that detects moving targets on a city road. The contributions of this paper are as follows: (1) the surveillance coverage can be expanded by a flying drone, (2) the drone velocity can be estimated through frame registration, and (3) the approach can speed up the detection process with fewer computational resources because no massive training is required.

In the experiments, a drone flies in a straight line at a speed of 5.1 m/s and captures a video clip at an altitude of 150 m. The drone camera points directly downwards while capturing the video sequences. Nine moving vehicles (6 sedans, 2 buses, and 1 bicycle) are captured for approximately 15 seconds. The two frames used for frame registration are three frames apart, and moving targets are detected every three frames. The detection and false alarm rates are obtained with different minimum blob sizes, and the receiver operating characteristic (ROC) curve is obtained. The average detection rate ranges from 0.9 to 0.97, while the false alarm rate ranges from 0.06 to 0.5. The root mean square error (RMSE) of the speed is 0.07 m/s when the reference frame is set to the first frame for frame registration.

The remainder of the paper is organized as follows: moving object detection and drone velocity estimation are discussed in Section 2. Section 3 presents experimental results and the conclusion follows in Section 4.

2. Frame Registration and Moving Object Detection

Figure 2 shows the block diagram of frame registration and moving object detection. The system consists of two stages: frame registration and object detection. The drone velocity is estimated by the displacement vector. The detailed procedures are described in the next subsections.

2.1 Frame Registration and Moving Object Detection

Two consecutive frames k − L and k are registered using the SAD between them. The SAD is obtained as

$$\mathrm{SAD}_I(p_x, p_y; k) = \sum_{m=1}^{M}\sum_{n=1}^{N} \left| I_{k-L}(m,n) - I_k(m+p_x,\, n+p_y) \right|, \quad k = 1+L,\, 1+2L,\, \ldots,\, K, \tag{1}$$

where Ik is the kth frame; px and py are the translations in the x and y directions, respectively, which are estimated by minimizing the SAD; M and N are the image sizes in the x and y directions, respectively; K is the total number of frames; and L is the frame difference between two consecutive frames. In the experiments, L was set to 3 and K was 454; thus, the reference and target frames in Eq. (1) varied as (I1, I4), (I4, I7), ..., (I451, I454).
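The SAD of Eq. (1) and its minimization over integer shifts can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `sad` and `register`, the search range `max_shift`, and the normalization by the number of overlapping pixels (so that large shifts are not favored merely because fewer pixels overlap) are all assumptions that the text does not specify.

```python
def sad(ref, tgt, px, py):
    """Mean absolute difference between ref(m, n) and tgt(m + px, n + py).

    Images are lists of rows, so a pixel (m, n) is read as img[n][m].
    Only pixels whose shifted coordinates fall inside tgt contribute.
    Dividing by the overlap size is an assumption, not part of Eq. (1).
    """
    rows, cols = len(ref), len(ref[0])
    total, count = 0, 0
    for n in range(rows):          # n indexes y (rows)
        for m in range(cols):      # m indexes x (columns)
            mm, nn = m + px, n + py
            if 0 <= mm < cols and 0 <= nn < rows:
                total += abs(ref[n][m] - tgt[nn][mm])
                count += 1
    return total / count if count else float("inf")


def register(ref, tgt, max_shift):
    """Exhaustive search for the integer shift (px, py) with minimum SAD."""
    best = None
    for py in range(-max_shift, max_shift + 1):
        for px in range(-max_shift, max_shift + 1):
            score = sad(ref, tgt, px, py)
            if best is None or score < best[0]:
                best = (score, px, py)
    return best[1], best[2]
```

An exhaustive search costs on the order of (2 · max_shift + 1)² image passes per frame pair, so a practical system would likely restrict the search window around the expected displacement.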

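The displacement vectors are estimated in the x and y directions by minimizing the SAD as

$$\begin{bmatrix} \hat{p}_{Ix}(k) \\ \hat{p}_{Iy}(k) \end{bmatrix} = \underset{(p_x,\, p_y)}{\arg\min}\; \mathrm{SAD}_I(p_x, p_y; k). \tag{2}$$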


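The subtraction and thresholding processes generate a binary image after the coordinates of the next frame are compensated, as

$$I_B(m,n;k-L) = \begin{cases} 1, & \text{if } \left| I_{k-L}(m,n) - I_k\big(m+\hat{p}_{Ix}(k),\, n+\hat{p}_{Iy}(k)\big) \right| > \theta_T, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$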

where θT is a threshold value set to 85 in the experiments. Two morphological operations, erosion and dilation, were applied to the binary image as


where E and D are the structuring elements for erosion and dilation, respectively [15]; Ez and Dz are the translations of E and D, respectively, so that their origins are located at z; E and D are set to all-ones matrices of size 2 × 2 and 20 × 20, respectively, in the experiments. The erosion process removes very small clutter, and the dilation restores the boundaries of the object areas and connects the fragmented areas from one object. The candidate blobs are the individual regions after the morphological operations. Finally, assuming the true size of the object is known, the false blobs are eliminated as


where Oi is the ith candidate blob and θs is the minimum blob size of the object; θs was set to 380, 400, and 420 in the experiments.
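The detection stage described above (thresholded frame difference, erosion, dilation, and removal of small blobs) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: the structuring-element origin is taken at its top-left corner, blobs are formed with 4-connectivity, a blob is kept when its pixel count is at least θs, and all function names are hypothetical.

```python
from collections import deque


def threshold_difference(cur, nxt, px, py, theta_t):
    """Binary image: 1 where |cur(m, n) - nxt(m + px, n + py)| > theta_t."""
    rows, cols = len(cur), len(cur[0])
    out = [[0] * cols for _ in range(rows)]
    for n in range(rows):
        for m in range(cols):
            mm, nn = m + px, n + py
            if 0 <= mm < cols and 0 <= nn < rows:
                if abs(cur[n][m] - nxt[nn][mm]) > theta_t:
                    out[n][m] = 1
    return out


def erode(img, size):
    """Keep z only if the size x size window anchored at z lies in the foreground."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for n in range(rows - size + 1):
        for m in range(cols - size + 1):
            if all(img[n + j][m + i] for j in range(size) for i in range(size)):
                out[n][m] = 1
    return out


def dilate(img, size):
    """Paint the size x size window anchored at every foreground pixel."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for n in range(rows):
        for m in range(cols):
            if img[n][m]:
                for j in range(min(size, rows - n)):
                    for i in range(min(size, cols - m)):
                        out[n + j][m + i] = 1
    return out


def candidate_blobs(binary):
    """Return 4-connected foreground regions as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, blob = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < rows and 0 <= xx < cols
                                and binary[yy][xx] and not seen[yy][xx]):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
                blobs.append(blob)
    return blobs


def remove_false_blobs(blobs, theta_s):
    """Keep blobs with at least theta_s pixels (the inequality is an assumption)."""
    return [blob for blob in blobs if len(blob) >= theta_s]
```

With the 2 × 2 erosion element, any foreground region that cannot contain a 2 × 2 block vanishes, which is how isolated noise pixels are removed before the much larger 20 × 20 dilation merges the fragments of one vehicle.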

To evaluate the detection performance, three metrics are defined as

$$R_d(t) = \frac{\text{Number of detections of target } t}{\text{Number of frames in which target } t \text{ appears}}, \tag{7}$$

$$R_f = \frac{\text{Number of false alarms}}{\text{Number of frames processed}}, \tag{8}$$

$$R_{eff} = \frac{\text{Number of effective false alarms}}{\text{Number of frames processed}}, \tag{9}$$

where Rd(t) is the detection rate of target t; Rf and Reff are the false alarm and effective false alarm rates, respectively. The number of effective false alarms is defined as the number of false alarms excluding those caused by people. Since we aim to detect objects larger than a certain size, objects that are smaller than the minimum size (such as a moving person) are rejected as false alarms.
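As a worked check of the false alarm metrics, the sketch below applies the two rate definitions to the counts reported later in Section 3.2 and Table 3 (151 processed frames). The pairing of each count with a value of θs is an assumption inferred from the fact that a larger minimum blob size can only remove false alarms; the function and variable names are illustrative.

```python
def detection_rate(num_detections, num_frames_present):
    """R_d(t): detections of a target over the frames in which it appears."""
    return num_detections / num_frames_present


def false_alarm_rate(num_false_alarms, num_frames_processed):
    """R_f or R_eff, depending on which count is passed in."""
    return num_false_alarms / num_frames_processed


FRAMES = 151                                  # frames processed in the experiments
false_alarms = {380: 76, 400: 23, 420: 9}     # assumed pairing with theta_s
people = {380: 64, 400: 16, 420: 5}           # false alarms caused by people

rates = {}
for theta_s, n_fa in false_alarms.items():
    n_eff = n_fa - people[theta_s]            # effective false alarms
    rates[theta_s] = (false_alarm_rate(n_fa, FRAMES),
                      false_alarm_rate(n_eff, FRAMES))

for theta_s, (r_f, r_eff) in rates.items():
    print(f"theta_s={theta_s}: R_f={r_f:.2f}, R_eff={r_eff:.2f}")
```

Rounded to two decimals, these rates span the reported ranges of 0.06 to 0.5 for the false alarm rate and 0.03 to 0.08 for the effective false alarm rate.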

2.2 Drone Velocity Estimation

The velocity estimation of the drone is useful when a high-precision global navigation satellite system is not available. The accuracy of the frame registration can also be verified by the estimate of the drone speed. The following cumulative displacement vector minimizes the cumulative SAD, where the reference frame is set to the first frame, as

$$\mathrm{SAD}_C(p_x, p_y; k) = \sum_{m=1}^{M}\sum_{n=1}^{N} \left| I_1(m,n) - I_k(m+p_x,\, n+p_y) \right|, \quad k = 1+L,\, 1+2L,\, \ldots,\, K, \tag{10}$$

$$\begin{bmatrix} \hat{p}_{Cx}(k) \\ \hat{p}_{Cy}(k) \end{bmatrix} = \underset{(p_x,\, p_y)}{\arg\min}\; \mathrm{SAD}_C(p_x, p_y; k). \tag{11}$$

The velocities of the moving drone in x and y directions are estimated with the cumulative displacement vector as


where d is a metric that converts pixels to meters, and T is the time between two frames. The actual values of d and T used in the experiments are described in the following sections.
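One plausible form of this pixel-to-velocity conversion is given below. It is a sketch under stated assumptions: the displacement accumulated since the first frame is taken to span (k − 1)T seconds, the symbols for the velocity estimates are illustrative names, and the sign of each component depends on the image-axis convention, which the text does not specify.

$$\hat{v}_x(k) = \frac{d\,\hat{p}_{Cx}(k)}{(k-1)\,T}, \qquad \hat{v}_y(k) = \frac{d\,\hat{p}_{Cy}(k)}{(k-1)\,T}, \qquad k = 1+L,\, 1+2L,\, \ldots,\, K.$$

With the values reported in Section 3.1 (d = 0.11 m/pixel, T = 1/30 s), a speed of 5.1 m/s corresponds to 5.1 / (0.11 × 30) ≈ 1.55 pixels per frame, or about 4.6 pixels over L = 3 frames.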

The speed of the drone is calculated as


The accuracy of the speed is evaluated by the RMSE between the ground truth and the estimate as


where sg is the ground truth of the drone speed, which is assumed to be the set speed of 5.1 m/s; Keff is the effective number of frames, defined as (K − 1)/L = 151 in the experiments.
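The chain from cumulative displacement to velocity, speed, and RMSE can be sketched as follows. The function names and the elapsed-time term (k − 1)T are assumptions made for illustration; only d, T, sg, and the RMSE definition over the effective frames come from the text.

```python
import math


def cumulative_velocity(p_c, k, d, T):
    """Velocity (m/s) from a cumulative displacement p_c (pixels) at frame k.

    Assumes the displacement is measured against frame 1, so the elapsed
    time is (k - 1) * T seconds.
    """
    return d * p_c / ((k - 1) * T)


def speed(vx, vy):
    """Speed as the magnitude of the (vx, vy) velocity vector."""
    return math.hypot(vx, vy)


def rmse(estimates, ground_truth):
    """Root mean square error of the speed estimates against the set speed."""
    squared = [(ground_truth - s) ** 2 for s in estimates]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rmse(speeds, 5.1)` over the 151 per-frame speed estimates would give the quantity reported in Table 4, where `speeds` is a hypothetical list built with `speed(cumulative_velocity(...), cumulative_velocity(...))`.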

3. Experimental Results

3.1 Scenario Description

The drone (DJI Phantom 4 Advanced) flew in a straight line at an altitude of 150 m and a constant speed of 5.1 m/s around the Daegu University main gate fountain. The speed was set at the control box by the operator. The drone camera pointed directly toward the ground and captured a video clip of 454 frames for approximately 15 seconds at 30 fps; thus, T in Eqs. (12) and (13) is 1/30 seconds. Nine targets (6 sedans, 2 buses, and 1 bicycle) were captured along with a few pedestrians. Note that this study focused on moving targets that are faster and larger than people; thus, pedestrians were not considered targets of interest. Each frame is 4096 × 2160 pixels and was scaled down to a 2048 × 1080-pixel grayscale image for efficient image processing. One pixel of the frame corresponds to 0.11 m, and the coverage of a single frame is approximately 451 × 238 m. Therefore, d in Eqs. (12) and (13) is 0.11 m/pixel.

Figure 3(a) shows Targets 1–4 at the first frame, Figure 3(b) shows Targets 1–6 at the 54th frame, and Figure 3(c) shows Targets 1–3 and 6–9 at the 131st frame. In Figure 3, the red circles indicate the moving targets. The coverage of each frame continued to move as the drone flew slightly upwards from left to right.

Table 1 shows the characteristics of the nine targets such as the initial and final frame when the targets appear as well as the direction and component of the targets in the video. This analysis was performed manually to identify the target characteristics for the vehicle detection.

3.2 Moving Vehicle Detection

Two consecutive frames were 0.1 seconds (3 frames) apart. After the coordinates of the next frame were compensated, frame subtraction was performed between the current frame and the compensated next frame. Thus, object detection was performed on 151 frames (frames 1, 4, 7, ..., 451). Figure 4(a) shows the results of the detection process of Figure 3(a) when θs is 400. Figures 4(b) and 4(c) show the results of the detection processes of Figures 3(b) and 3(c), respectively. The first columns in Figure 4 are the binary images generated by frame subtraction and thresholding. The second columns are the images after the morphological operations (erosion and dilation). The third columns are the results of the false blob removal. The last columns are the segmented window areas of the original image.

Figures 5(a) to 5(c) show the detection results (bounding boxes) of Figures 3(a) to 3(c), respectively. The numbers of detections in Figures 3(a) to 3(c) are 3, 6, and 6 out of 4, 6, and 7 targets, respectively. No false alarm was detected in Figures 5(a) to 5(c). The bus (Target 1) was missed in Figure 5(a) due to the tree obstruction, and the gray sedan (Target 6) was missed in Figure 5(c) because the car stopped before the traffic signal. It should be noted that when a target stops, there is no change in the target area between the frames.

Table 2 shows the detection rates for θs = 380, 400, and 420. The average detection rate decreases from 0.97 to 0.9 as the minimum blob size increases. The minimum blob size (θs) is determined by several factors, such as the focal length of the camera, the altitude of the drone, and the minimum size of the vehicle. The blob sizes of 380, 400, and 420 pixels correspond to 4.6, 4.84, and 5.08 m², respectively. In particular, the detection rate for Target 9 was low because it moved very slowly and then parked. If the target speed is very low, the blob is not large enough to be detected. Table 3 shows the numbers of false alarms and effective false alarms as well as the false alarm and effective false alarm rates. The number of false alarms is 76, 23, and 9 for θs = 380, 400, and 420, respectively; among them, the number of people is 64, 16, and 5, respectively. The false alarm rates range from 0.06 to 0.5, and the effective false alarm rates range from 0.03 to 0.08. Figure 6 shows the ROC curve for target detection. In this scenario, the maximum number of false alarms is 76, which corresponds to a false alarm rate of 0.5. Therefore, the ROC curve is provided only for false alarm rates below 0.5.

Figure 7 shows an expanded coverage as the drone moves. In Figure 7, the centroids of all the targets are presented in blue circles, including false alarms. The coverage was expanded to 2779 × 1096 pixels, which corresponds to 305 × 120 m. In the upper left part, several people were detected but disappeared with a larger threshold.

3.3 Drone Velocity Estimation

Figure 8(a) and 8(b) show the estimated velocities of the drone in the x and y directions, respectively, as obtained by Eqs. (12) and (13). The instant velocities were obtained with the displacement vectors in Eq. (2). The instant velocity in the y direction could not be obtained because the pixel change was measured as an integer and less than one over a short time. The instant and cumulative speed estimates are shown in Figure 8(c). The straight line in Figure 8(c) is the ground truth of the drone speed.

Table 4 shows the RMSEs of the instant and cumulative speeds. The RMSE of the cumulative speed is around 0.07 m/s, which corresponds to an accuracy of 98.6% relative to the actual speed, while the accuracy of the instant speed is 92.2%.
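The reported percentages appear consistent with defining accuracy as one minus the ratio of the RMSE to the ground-truth speed. This definition is an assumption, since the text does not state it explicitly:

$$\text{accuracy} = \left(1 - \frac{\mathrm{RMSE}}{s_g}\right) \times 100\% = \left(1 - \frac{0.07}{5.1}\right) \times 100\% \approx 98.6\%.$$

Under the same assumption, an accuracy of 92.2% would imply an instant-speed RMSE of roughly 0.40 m/s.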

4. Conclusion

In this study, multiple moving targets were detected by a flying drone. To compensate for the movement of the drone, the frames were registered by SAD. Moving objects were then successfully detected by frame subtraction, morphological operations, and false blob removal. In addition, the drone speed was estimated, and the registration accuracy was verified with a ground truth. The detection rate was as high as 97% and the RMSE of the drone speed was as low as 0.07 m/s. However, if the target moves too slowly, the detection rate may decrease, and blob fragments may be generated if the speed of the target is too high.

This technology can be applied to smart surveillance, such as unmanned security systems or search and rescue missions. Multiple target tracking after object detection remains for future study.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.


This research was supported by the Daegu University Research Grant 2015.

Fig. 1.

Illustration of a flying drone for detection of moving targets.

Fig. 2.

Block diagram of moving object detection.

Fig. 3.

(a) Targets 1–4 at the first frame, (b) Targets 1–6 at the 54th frame, and (c) Targets 1–3 and 6–9 at the 131st frame.

Fig. 4.

Detection process of (a) Figure 3(a), (b) Figure 3(b), and (c) Figure 3(c).

Fig. 5.

Detection results (bounding boxes) of (a) Figure 3(a), (b) Figure 3(b), and (c) Figure 3(c).

Fig. 6.

ROC curve for target detection.

Fig. 7.

All detections including false alarms on the expanded coverage with varying θs as (a) 380, (b) 400, and (c) 420.

Fig. 8.

(a) Velocity in the x direction, (b) velocity in the y direction, and (c) speed.


Table 1

Target characteristics

Target ID   Initial frame   Final frame   Direction   Component
1           1               454           Right       Blue Bus
2           1               454           Right       Black Sedan
3           1               454           Right       White Sedan
4           1               196           Upward      White Sedan
6           130             208           Left        Gray Sedan
7           211             454           Left        Blue Bus
8           295             454           Left        White-black Sedan
9           370             415           Parking     White Sedan

Table 2

Detection rates

Target ID   Number of appearances   θs

Table 3

Number of false alarms and false alarm rates

θs                                  380   400   420
Number of false alarms               76    23     9
Number of effective false alarms     12     7     4

Table 4

Speed RMSE

RMSE (m/s)

  1. Zaheer, Z, Usmani, A, Khan, E, and Qadeer, MA (2016). Aerial surveillance system using UAV. Proceedings of the 13th International Conference on Wireless and Optical Communications Networks (WOCN), Hyderabad, India, pp. 1-7.
  2. Hampapur, A, Brown, L, Connell, J, Pankanti, S, Senior, AW, and Tian, YL (2003). Smart surveillance: applications, technologies and implications. Proceedings of the 2003 Joint Conference of the 4th International Conference on Information, Communications and Signal Processing and the 4th Pacific-Rim Conference on Multimedia, Singapore, pp. 1133-1138.
  3. Chung, W, Kim, Y, Kim, YJ, and Kim, D (2016). A two-stage foreground propagation for moving object detection in a non-stationary. Proceedings of the 13th IEEE International Conference on Advanced Video and Signal Based Surveillance, Colorado Springs, CO, pp. 187-193.
  4. Wu, Y, He, X, and Nguyen, TQ (2017). Moving object detection with a freely moving camera via background motion subtraction. IEEE Transactions on Circuits and Systems for Video Technology. 27, 236-248.
  5. Olugboja, A, and Wang, Z (2016). Detection of moving objects using foreground detector and improved morphological filter. Proceedings of the 3rd International Conference on Information Science and Control Engineering, Beijing, China, pp. 329-333.
  6. Tiwari, M, and Singhai, R (2017). A review of detection and tracking of object from image and video sequences. International Journal of Computational Intelligence Research. 13, 745-765.
  7. Sun, J, Li, B, Jiang, Y, and Wen, CY (2016). A camera-based target detection and positioning UAV system for search and rescue (SAR) purposes. Sensors. 16, article no. 1778.
  8. Lee, MH, and Yeom, S (2018). Detection and tracking of multiple moving vehicles with a UAV. International Journal of Fuzzy Logic and Intelligent Systems. 18, 182-189.
  9. Lee, MH, and Yeom, S (2018). Multiple target detection and tracking on urban roads with a drone. Journal of Intelligent & Fuzzy Systems. 35, 6071-6078.
  10. Yeom, S, and Cho, IJ (2019). Detection and tracking of moving pedestrians with a small unmanned aerial vehicle. Applied Sciences. 9, article no. 3359.
  11. Zitova, B, and Flusser, J (2003). Image registration methods: a survey. Image and Vision Computing. 21, 977-1000.
  12. Uchiyama, H, Takahashi, T, Ide, I, and Murase, H (2008). Frame registration of in-vehicle normal camera with omni-directional camera for self-position estimation. Proceedings of the 3rd International Conference on Innovative Computing Information and Control, Dalian, China.
  13. Tico, M, and Pulli, K (2010). Robust image registration for multi-frame mobile applications. Proceedings of the 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 860-864.
  14. Tzanidou, G, Climent-Perez, P, Hummel, G, Schmitt, M, Stutz, P, Monekosso, DN, and Remagnino, P (2015). Telemetry assisted frame registration and background subtraction in low-altitude UAV videos. Proceedings of the 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, Karlsruhe, Germany, pp. 1-6.
  15. Nam, DH, and Yeom, S (2019). Estimation of drone velocity with sum of absolute difference between multiple frames. Journal of the Institute of Convergence Signal Processing. 20, 171-176.
  16. Rabah, M, Rohan, A, Han, YJ, and Kim, SH (2018). Design of fuzzy-PID controller for quadcopter trajectory-tracking. International Journal of Fuzzy Logic and Intelligent Systems. 18, 204-213.
  17. Tiep, DK, and Ryoo, YJ (2017). An autonomous control of fuzzy-PD controller for quadcopter. International Journal of Fuzzy Logic and Intelligent Systems. 17, 107-113.
  18. Jahne, B (2002). Digital Image Processing. Heidelberg: Springer.

Donho Nam is currently pursuing a B.S. degree in the School of ICT Convergence at Daegu University in South Korea. He has been working on research related to image processing using UAVs. His research interests include intelligent image processing and multiple target detection.


Seokwon Yeom has been a faculty member of Daegu University since 2007. He is now a full professor in the School of ICT Convergence at the same university. He was a visiting scholar at the University of Maryland in 2014 and a director of the Gyeongbuk techno-park specialization center in 2013. He received his Ph.D. in Electrical and Computer Engineering from the University of Connecticut in 2006. He is currently working on projects related to smart surveillance with small drones. His research interests are intelligent processing of image and optical information, machine learning, and target tracking. He has conducted research on multiple target tracking for airborne early warning, three-dimensional image processing with digital holography and integral imaging, photon-counting linear discriminant analysis and nonlinear matched filter, millimeter wave and infrared image analysis, low-resolution object recognition, and aerial surveillance with small unmanned vehicle systems. He has been a member of the editorial board of Applied Sciences since 2019, a board member of the Korean Institute of Intelligent Systems since 2016, and a member of the board of directors of the Korean Institute of Convergence Signal Processing since 2014. He has served as program chair of the ICCCS2015, ISIS2017, iFUZZY2018, and ICCCS2019.