Abstract
Moving object detection is an imperative task in computer vision, where it is primarily used for surveillance applications. With the increasing availability of low-altitude aerial vehicles, new challenges for moving object detection have surfaced, both for academia and industry. In this paper, we propose a new approach that can detect moving objects efficiently and handle parallax cases. By introducing sparse-flow-based parallax handling and downscale processing, we push the boundaries of real-time performance with 16 FPS on limited embedded resources (a five-fold improvement over existing baselines), while performing comparably to, or even improving on, the state of the art on two different datasets. We also present a roadmap for extending our approach to exploit multi-modal data in order to mitigate the need for parameter tuning.
1 Introduction
Ranging from high-altitude Unmanned Aerial Vehicles (UAVs) capable of flying at 65,000 ft to low-altitude miniature drones, and from long-endurance variants to micro air vehicles weighing just a few grams, the UAV industry has gone through a meteoric rise. Owing to their ever-increasing availability in civilian and military sectors alike, UAV variants have been disruptive in the last decade and have consequently found use in several applications, such as disaster relief, precision agriculture, cinematography, cargo delivery, industrial inspection, mapping, military surveillance and air support [1].
Following the industrial attention, the academic community has also contributed to the transformation of UAVs in various aspects, such as aerodynamics, avionics and the various sensory data acquired by these platforms. Slightly different from the remote sensing domain, drone-mounted imagery has paved the way for new research in computer vision (CV). A large number of studies have been reported in object detection [2,3,4,5,6], action detection [7], visual object tracking [8,9,10], object counting [11] and road extraction [12]. In recent years, new datasets [7, 13,14,15,16,17], challenges and dedicated workshops [18, 19] have surfaced to bridge the gap between drone-specific vision problems and their generic versions.
From a practical perspective, low-altitude drones introduce several new problems for CV algorithms. Proneness to sudden platform movements and exposure to environmental conditions arguably affect low-altitude drones in a more pronounced manner than their high-altitude counterparts. Moreover, fast-changing operating altitudes and camera viewpoints result in data with a large diversity, which inherently increases the complexity of virtually any vision problem. The small size of these platforms also imposes severe limits on the computational resources available on-board, which calls for non-trivial engineering solutions [20, 21].
Moving object detection (MOD), primarily used for surveillance purposes, is a long-standing problem in CV and has been the subject of many studies [22,23,24]. The presence of platform motion in drone vision makes it a notoriously hard problem, as platform motion can easily be confused with moving regions/objects. Several solutions addressing the platform motion issue have been reported [25, 26]. Moreover, low-altitude drone footage also suffers from severe motion parallax, which causes objects closer to the camera to move faster than objects further away. Solutions to the motion parallax issue are considered computationally expensive [17, 27,28,29], which makes the problem even harder when on-board processing with (near) real-time performance is a hard constraint.
In this paper, we propose a new approach for moving object detection, primarily optimized for embedded resources for on-board functionality. We make two main contributions. First, we show that performing a large portion of our pipeline at lower resolutions significantly improves runtime performance while keeping accuracy high. Second, we design the matching part of the parallax handling scheme using a simple sparse-flow-based technique, which avoids bottlenecks such as failing to extract features from candidate objects or inferior feature matching. Its sparse nature also contributes further speed-ups, pushing toward real-time performance on embedded platforms.
The paper is organized as follows. In Sect. 2, related work in the literature is reviewed. The proposed approach is explained thoroughly in Sect. 3. Experimental results and their analysis are reported in Sect. 4. We conclude our work by drawing insights and making future recommendations in Sect. 5.
2 Related Work
The research community has contributed considerably to the moving object detection literature over the last few decades. Earlier studies aimed to solve this problem for static cameras, where background subtraction [22] and temporal differencing [30] based solutions slowly transformed into more sophisticated approaches such as background learning via Mixtures of Gaussians, eigenbackgrounds and motion layers [31, 32]. As mobile platforms started to emerge, a new layer of complexity was introduced: ego-motion. The presence of ego-motion renders the approaches devised for static cameras obsolete, as the platform motion is likely to produce quite a few false positives. Moreover, this problem becomes more pronounced when the platform motion is sudden.
A simple method to tackle platform-motion-induced false positives is to perform image alignment as a preprocessing step. By finding the affine/perspective transformation between two consecutive images, one can warp one image onto the other and then perform temporal differencing. Primarily termed “feature-based”, such methods depend on accurate image alignment, where accurate feature keypoint/descriptor computation is imperative [33]. Another family of approaches to handling ego-motion can be referred to as “motion-based”, where motion layers [32] and optical flow [26] techniques are utilized. In cases where the planar surface assumption (if any) does not hold, perspective-transformation-based warping fails to handle motion-parallax-induced false positives. Unlike high-altitude scenarios, motion parallax becomes a severe problem in imagery taken from the ground as well as in low-altitude UAV imagery. There are studies in the literature using various geometric constraints and flow-based solutions which claim to mitigate the effects of motion parallax [27, 34].
Building on the simple solutions outlined above, several high-impact studies have been reported in recent years. Based on their previous study [34], the authors of [35] propose a new method built on the projective structure between consecutive image planes, used in conjunction with the epipolar constraint. This new constraint is useful for detecting moving objects that move in the same direction as the camera, a configuration the epipolar constraint alone fails to detect. Assessing their method on airborne videos, the authors state that abrupt motion or medium-level parallax might be detrimental to its efficacy. The authors of [36] tackle moving object detection for ground robots, using the epipolar constraint along with a motion estimation mechanism in a Bayesian framework to handle degenerate cases (camera and object moving in the same direction). The work reported in [27] handles moving object detection by using epipolar and flow-vector bound constraints, which facilitates parallax handling as well as degenerate cases; the authors estimate the camera pose using the Parallel Tracking and Mapping technique. Similar methods have been reported in [37] and [17], where both algorithms target low-altitude imagery but the latter handles parallax in an optimized manner.
In addition to the feature-based methods mentioned above, motion-based methods have also emerged. In [28], the authors fuse sensory data with imagery to facilitate moving object detection in the presence of ego-motion and motion parallax; by using optical flow in conjunction with the epipolar constraint, they show they can eliminate parallax effects in videos taken from ground vehicles. In the work reported in [38], the authors use a dense-flow-based method, where optical flow and artificial flow are assessed for their orientation and magnitude to find moving objects in aerial imagery. Another flow-based study is [39], where the authors use optical flow information along with a reduced Singular Value Decomposition and image inpainting stages to handle parallax and ego-motion; they present results on sequences taken from aerial and ground vehicles. In [40], the authors use artificial flow and background subtraction together, formulating two scores, an anomaly score and a motion score, where the former facilitates good precision and the latter helps achieve improved recall.
3 Our Approach
In this work, we propose a hybrid moving object detection pipeline which fuses feature-based and optical-flow-based approaches in an efficient manner for near-real-time performance. In addition, we propose many minor improvements throughout the pipeline to increase processing speed as well as detection accuracy. Our proposed pipeline is given in Fig. 1. It is based on well-studied ego-motion compensation and plane-parallax decomposition approaches [17, 28, 34, 35, 41] and is divided into different process lines for ease of understanding.
3.1 Preprocessing and Ego-Motion Compensation
One of the most challenging aspects of moving object detection from a drone is detecting objects of varying sizes from varying altitudes. In a background subtraction and ego-motion compensation based system such as ours, the easiest way to cope with this variation is to vary the time difference between the frames that are compared. Thus, as the very first stage of our pipeline, we implement a dynamic frame buffer that changes its size according to the height measurements read (when available) from the barometric sensor, the speed measurements read from the IMU (Inertial Measurement Unit), and the user's desired detection sensitivity. The size of the buffer, and thus the time \(\varDelta \) between the frames to be processed, increases as the required sensitivity to detect smaller objects (and/or smaller movements) increases. If the camera is known and calibration is possible, we also correct the lens distortion (radial and tangential) before pushing the frames into the buffer.
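The dynamic buffer logic above can be sketched as follows. This is a minimal sketch: the class name, method names and scaling constants are illustrative assumptions, not values from the paper.

```python
from collections import deque

class DynamicFrameBuffer:
    """Ring buffer whose frame gap (delta) grows with altitude and
    shrinks with platform speed, so that slow or small objects still
    produce measurable displacement between the differenced frames.
    Scaling constants below are illustrative, not from the paper."""

    def __init__(self, base_delta=2, max_delta=10):
        self.base_delta = base_delta
        self.max_delta = max_delta
        self.frames = deque(maxlen=max_delta * 2 + 1)

    def delta(self, altitude_m, speed_mps, sensitivity=1.0):
        # Higher altitude -> smaller apparent motion -> larger gap;
        # higher platform speed -> larger apparent motion -> smaller gap.
        d = self.base_delta + sensitivity * altitude_m / 50.0 - speed_mps / 5.0
        return int(min(self.max_delta, max(1, round(d))))

    def push(self, frame):
        self.frames.append(frame)

    def triplet(self, altitude_m, speed_mps, sensitivity=1.0):
        """Return the (t - 2*delta, t - delta, t) frames used by
        three-frame differencing, or None if the buffer is not full enough."""
        d = self.delta(altitude_m, speed_mps, sensitivity)
        if len(self.frames) <= 2 * d:
            return None
        return self.frames[-1 - 2 * d], self.frames[-1 - d], self.frames[-1]
```

In practice, `push` would receive undistorted frames, and `delta` would be re-evaluated each frame from the latest barometer/IMU readings.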
As in the majority of computer vision systems, feature extraction and matching take up a significant portion of our pipeline's runtime and form its bottleneck. Additionally, we argue that calculating the homography between frames at high resolution is not worth the loss in runtime. Therefore, we downscale the input images for feature extraction and matching (using SURF [42]), and then calculate the homographies between frames t, \(t-\varDelta \) and \(t-\varDelta \), \(t-2\varDelta \). However, to detect smaller objects, the rest of the pipeline runs at the original resolution. To achieve this, the homographies calculated at the lower resolution, \(H_d\), are used to estimate the original-resolution homographies \(H_u\) using Eq. 1.
$$H_u = P_{do}\, H_d\, P_{do}^{-1} \qquad \mathrm{(1)}$$

where \(P_{do}\) is the perspective transformation between the downscaled image and the original image.
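The homography rescaling of Eq. 1 can be sketched numerically; this minimal example assumes \(P_{do}\) is a pure scaling that maps downscaled coordinates to original-resolution coordinates:

```python
import numpy as np

def upscale_homography(H_d, scale):
    """Lift a homography estimated on a downscaled image to full
    resolution: H_u = P_do @ H_d @ inv(P_do), where P_do maps
    downscaled coordinates to original coordinates (a pure scaling here)."""
    P_do = np.diag([scale, scale, 1.0])
    return P_do @ H_d @ np.linalg.inv(P_do)

def warp_point(H, p):
    """Apply a homography to a 2D point in homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

As a sanity check, a translation of (3, 4) pixels estimated at half resolution becomes a translation of (6, 8) pixels at full resolution, as expected.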
3.2 Moving Object Detection
The calculated upscale homographies (\(H_u\)) are used for perspective warping (of the original image \(F_o\)) and three-frame differencing. As can be seen in Fig. 2, the current and previous frames are warped onto the center frame separately, and two separate two-frame differences are calculated. These two-frame difference results are then processed with an empirical threshold value, which produces a binary image for each. Morphological operations are used to cancel noise and associate pixels belonging to the same object. The two-frame differences (after thresholding and morphological operations) are joined with a logical AND operation to facilitate three-frame differencing. The resulting three-frame difference is then subjected to a connected component analysis to create the object bounding boxes.
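A stripped-down sketch of the three-frame differencing step, assuming the neighbouring frames have already been warped onto the center frame; the threshold value is an illustrative assumption, and the morphological filtering is omitted for brevity:

```python
import numpy as np

def three_frame_difference(prev_warped, center, curr_warped, thresh=25):
    """prev_warped / curr_warped are the previous and current frames
    already warped onto the center frame (grayscale uint8 arrays).
    Returns a binary mask of pixels that changed in BOTH frame pairs."""
    # Cast to a signed type so the absolute difference does not wrap.
    c = center.astype(np.int16)
    d1 = np.abs(c - prev_warped.astype(np.int16)) > thresh
    d2 = np.abs(c - curr_warped.astype(np.int16)) > thresh
    # The logical AND keeps only regions that moved consistently across
    # both two-frame differences; in the full pipeline, morphological
    # operations are applied before this step to suppress noise.
    return np.logical_and(d1, d2)
```

Connected component analysis on the returned mask would then yield the candidate bounding boxes.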
3.3 Parallax Filtering
Especially for mini UAVs that typically operate under 150 m, parallax can be a significant problem. Without a dedicated algorithm, there may be many false positives due to trees, buildings, etc. In the literature, using geometric constraints has proven to be an effective solution for eliminating parallax regions [17, 27, 28, 35]. In these studies, either features extracted on candidate moving objects are tracked/matched [17, 27] or each candidate pixel is densely tracked/matched [28, 35] in order to apply the geometric constraints. Instead, we propose a fast and efficient hybrid method that only tracks the center locations of the candidate objects using sparse optical flow (via [43]). As can be seen from Table 1, this method provides a significant performance improvement compared to feature-tracking-based methods. After tracking only the center locations of the candidate objects, we apply the epipolar constraint on the tracked locations. As can be seen in Figs. 3 and 4, the benefits of tracking only object centers are twofold: epipolar constraint calculations are significantly reduced, and the requirement of having keypoints/features on a candidate object is removed.
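The sparse epipolar filtering step can be sketched as follows. The tolerance value is an illustrative assumption, and in the actual pipeline the tracked current-frame centers would come from sparse Lucas-Kanade optical flow [43] rather than being supplied directly:

```python
import numpy as np

def epiline_distance(F, p_prev, p_curr):
    """Point-to-epipolar-line distance: how far the tracked current
    point lies from the epiline induced by its previous position."""
    l = F @ np.array([p_prev[0], p_prev[1], 1.0])  # epiline in current image
    x = np.array([p_curr[0], p_curr[1], 1.0])
    return abs(x @ l) / np.hypot(l[0], l[1])

def filter_parallax(F, centers_prev, centers_curr, tol=2.0):
    """Keep only candidates whose tracked center violates the epipolar
    constraint (distance > tol pixels): static 3D structure such as
    trees or buildings satisfies the constraint and is discarded."""
    return [i for i, (p, q) in enumerate(zip(centers_prev, centers_curr))
            if epiline_distance(F, p, q) > tol]
```

Note that a candidate sliding along its own epiline (the degenerate case discussed below) would also be discarded by this test.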
In order to understand the epipolar constraint [44], let \(I_{t-\varDelta }\) and \(I_{t}\) denote two images of a scene (taken by the same camera at different positions in space) at times \(t-\varDelta \) and t, and let P denote a 3D point in the scene. In addition, let \(p_{t-\varDelta }\) be the projection of P on \(I_{t-\varDelta }\), and \(p_{t}\) be the projection of P on \(I_{t}\).
In light of these, a unique fundamental matrix, represented by \(F_{t}^{t-\varDelta }\), that relates image \(I_t\) to \(I_{t-\varDelta }\) can be found, which satisfies

$$\left(p_{t}^{i}\right)^{\top } F_{t}^{t-\varDelta }\, p_{t-\varDelta }^{i} = 0$$

for all corresponding points \(p_{t-\varDelta }^{i}\) and \(p_{t}^{i}\), where i indexes each unique image point. In the case where P is a static point, it satisfies

$$p_{t}^{\top }\, el_{t} = 0, \qquad el_{t} = F_{t}^{t-\varDelta }\, p_{t-\varDelta }, \qquad el_{t-\varDelta } = \left(F_{t}^{t-\varDelta }\right)^{\top } p_{t}$$
where \(el_{t-\varDelta }\) and \(el_{t}\) are the epipolar lines corresponding to \(p_{t}\) and \(p_{t-\varDelta }\), respectively. If P is a static 3D point, \(p_t\) should be located on the epiline \(el_t\) (see Fig. 5a); otherwise, P will not satisfy the epipolar constraint (see Fig. 5b). One exceptional case can occasionally arise, where the point of interest moves along the epilines themselves. This occurs when the camera and the point of interest move in the same direction (i.e. the degenerate case).
If the camera information required for calibration is available, the essential matrix can be used instead of the fundamental matrix for more accurate results, as follows:

$$E = K^{\top } F\, K = \widehat{T} R$$

where K denotes the camera calibration matrix, \(\widehat{T}\) denotes the skew-symmetric translation matrix and R denotes the rotation matrix between the corresponding frames.
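The relation between the essential and fundamental matrices can be checked numerically; this sketch assumes the standard definitions from [44]:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_from_motion(R, t):
    # E = T_hat @ R relates normalized (calibrated) image coordinates.
    return skew(t) @ R

def fundamental_from_essential(E, K):
    # F = K^-T E K^-1, i.e. E = K^T F K for pixel coordinates.
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv
```

Given an estimated R and t between the two frames, the essential matrix can thus replace the fundamental matrix in the epiline computations above.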
4 Experiments
4.1 Datasets
We evaluate our technique rigorously using two different configurations. In the first, we use the well-known VIVID dataset [45], the most commonly used dataset for evaluating moving object detection algorithms, even though it was developed for benchmarking tracking algorithms. VIVID consists of nine sequences, three of which are thermal IR data and the rest RGB. Since it targets tracking, only a single object is annotated (even though multiple moving objects exist), and annotations are available only for every tenth frame. We use a select number of VIVID sequences (egtest01-02-04-05) solely to compare our results with other algorithms.
Our second set of evaluation is performed using the publicly available LAMOD dataset [17]. LAMOD consists of various sequences taken from two publicly available datasets, VIVID and UAV123 [16]. These sequences are hand-annotated from scratch for each moving object present in the scene. Annotations are available for each frame and the dataset provides a large set of adverse effects, such as motion parallax, occlusion, out-of-focus and altitude/viewpoint variation [17].
4.2 Results
Execution Time. The run-time improvements introduced by our approach are primarily twofold: calculating features and homographies at downscale, and sparse optical flow based parallax filtering. We perform our execution time analysis on NVIDIA Jetson TX1 and TX2 modules.
As expected, feature extraction on the downscaled versions introduces significant speed-ups. Going from \(1280 \times 720\) resolution to \(640 \times 360\), downscale processing improves extraction times from 148 to 42 ms and from 113 to 30 ms on TX1 and TX2, respectively. As downscale processing effectively reduces the number of extracted features, this also reflects on the speed of feature matching. Comparing the \(1280 \times 720\) and \(640 \times 360\) versions, matching speed improves roughly by the square of the input size ratio due to brute-force matching: matching times drop from 146 to 8 ms and from 106 to 6 ms (an approximately 1700% improvement) on TX1 and TX2, respectively. Sparse optical flow based parallax handling also introduces considerable execution time gains compared to feature-based parallax handling, as shown in Table 1: TX1 results show an improvement of 20% to 25%, whereas TX2 results show improvements between 18% and 20%.
Table 2 shows a detailed comparison of a recent technique [17] and our approach. A significant improvement of up to 40% is observed for low-resolution inputs, both with and without parallax filtering. For larger input resolutions, improvements are between 200% and 400%.
To support our claim that downscale processing does not lead to significant degradation in accuracy, we also assess our pipeline in full high-resolution operation. We present results for original and downscaled operation against the LAMOD ground truths in Table 4. The results show only a slight decrease in accuracy for downscaled operation compared to high resolution: except for a maximum 6% decrease in recall for egtest02, we do not see any other significant decrease, and in many cases, such as the egtest04 recall and precision values, precision and recall do not change at all.
Accuracy. We first evaluate our proposed approach using the single-object ground truths of the VIVID dataset to compare our performance with other baseline algorithms. We use precision/recall as our metrics and take a minimum of 50% overlap as a correct detection. As all the baseline algorithms reported their results in terms of correct detection ratio and miss detection ratio, we convert these results to precision and recall for a better comparison (miss detection ratio is effectively \(1-precision\), whereas correct detection ratio is equal to precision). We do not report results for parallax handling for sequences EgTest01 and EgTest02, as they do not exhibit parallax effects. Results are shown in Table 3.
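Interpreting "minimum of 50% overlap" as intersection-over-union (a common convention, assumed here), the matching criterion can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def is_correct_detection(pred_box, gt_box, min_overlap=0.5):
    """A prediction counts as correct if it overlaps the ground-truth
    box by at least min_overlap (0.5 in our evaluation)."""
    return iou(pred_box, gt_box) >= min_overlap
```

Precision and recall then follow from counting matched predictions against all predictions and all ground-truth boxes, respectively.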
Our proposed algorithm performs comparably to the other baselines, even surpassing them in several sequences; the EgTest01 and EgTest02 results outperform all others in precision, whereas our precision or recall values are the second best in the remaining sequences. Our method also stands out in that its precision and recall values are close to each other. When we perform parallax handling, the expected reduction in recall is compensated by an increase in precision, practically evening out or improving the final F-score. It must be noted that nearly all baselines are effectively object trackers, which means our algorithm performs quite accurately given that we do not support our detections with a sophisticated tracker.
We then assess our pipeline for multiple moving objects using the LAMOD dataset. We use precision/recall and per-frame precision/recallFootnote 4 (i.e. precision and recall calculated for every frame and then averaged) as our evaluation metrics, where a 50% overlap is considered a detection. As in the previous part of our evaluation, we do not report parallax filtering results for EgTest01 and EgTest02. Exemplary results are visualized in Fig. 6; quantitative results are shown in Table 4.
The results indicate that our proposed algorithm significantly outperforms an existing baseline [17] in all sequences except EgTest05. Parallax filtering introduces considerable gains in precision and modest reductions in recall, as reported before. The reduction in recall is expected, as EgTest04 and EgTest05 contain degenerate cases (i.e. objects and the platform move in the same direction) which our approach currently does not handle; parallax filtering therefore eliminates some true positives.
4.3 Multi-modal Extension
In the previous section, as we use public datasets where no IMU, height measurement or camera information is available, we cannot fully utilise the adaptive algorithm shown in Fig. 1. This means we cannot use lens distortion correction at all, and we can only use a fixed set of parameters (i.e. a fixed buffer size) for all sequences. In order to show how our pipeline works while utilising external sensory data, we present some qualitative results on our in-house captured videos, for which we were able to acquire the relevant IMU and camera parameter information.
Lens Distortion Correction. Lens distortion displaces certain pixels to other locations, radially or tangentially in our case, which directly affects our results (see Fig. 7(b)): during image registration, the displaced pixels are erroneously detected as moving objects. By using radial and tangential distortion coefficients specific to the camera lens, this effect can be corrected, which leads to visible improvements in our performance (see Fig. 7(d)).
Dynamic Frame Buffer. It can be hard to detect slowly moving objects at high altitudes, as their relative displacement in the image is small. This can be alleviated by using the height measurements provided by a barometric sensor and the vehicle speed measurements provided by the IMU: we dynamically change the size of the buffer (namely the distance between the frames to be differenced) linearly with the altitude and speed information. By doing so, we effectively amplify the perceived movement of slowly moving objects, making them far more detectable. Exemplary results shown in Fig. 8(c) and (d) verify this phenomenon and show a visible improvement in recall.
5 Conclusions
In this paper, we propose a new approach to the moving object detection problem for imagery taken from low-altitude aerial platforms. Capable of handling the motion of the platform as well as the detrimental effects of motion parallax, our approach performs parallax handling via sparse optical flow based tracking along with the epipolar constraint, and performs a large portion of the pipeline at lower resolutions. These two changes introduce significant runtime improvements, reaching up to 16 FPS on embedded resources. Moreover, we analyze our approach on two different datasets for single and multiple moving object detection tasks, and observe that our results are either comparable to or better than existing state-of-the-art algorithms. We also outline an advanced pipeline capable of exploiting multi-modal data that might alleviate the need for laborious parameter tuning. As future work, we aim to integrate a light-weight scheme to alleviate the effect of degenerate motion cases. Should a dataset with IMU, height measurements and camera information become publicly available, we aim to assess our approach in a multi-modal setting.
Notes
- 4.
Authors of [17] have reported their results with this metric, therefore we give these results to compare our work.
References
Clarke, R.: Understanding the drone epidemic. Comput. Law Secur. Rev. 30(3), 230–246 (2014)
Zhong, J., Lei, T., Yao, G.: Robust vehicle detection in aerial images based on cascaded convolutional neural networks. Sensors 17(12), 2720 (2017)
Li, F., Li, S., Zhu, C., Lan, X., Chang, H.: Cost-effective class-imbalance aware cnn for vehicle localization and categorization in high resolution aerial images. Remote Sens. 9(5), 494 (2017)
Tijtgat, N., Van Ranst, W., Volckaert, B., Goedemé, T., De Turck, F.: Embedded real-time object detection for a UAV warning system. In: The International Conference on Computer Vision, ICCV 2017, pp. 2110–2118 (2017)
Sommer, L.W., Schuchert, T., Beyerer, J.: Fast deep vehicle detection in aerial images. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 311–319. IEEE (2017)
Stek, T.D.: Drones over Mediterranean landscapes. The potential of small UAV’s (drones) for site detection and heritage management in archaeological survey projects: a case study from Le Pianelle in the Tappino Valley, Molise (Italy). J. Cultural Herit. 22, 1066–1071 (2016)
Barekatain, M., et al.: Okutama-action: an aerial view video dataset for concurrent human action detection. In: 1st Joint BMTT-PETS Workshop on Tracking and Surveillance, CVPR, pp. 1–8 (2017)
Pestana, J., Sanchez-Lopez, J.L., Campoy, P., Saripalli, S.: Vision based GPS-denied object tracking and following for unmanned aerial vehicles. In: 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pp. 1–6. IEEE (2013)
Dang, C.T., Pham, T.B., Truong, N.V., et al.: Vision based ground object tracking using AR drone quadrotor. In: 2013 International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 146–151. IEEE (2013)
Chen, P., Dang, Y., Liang, R., Zhu, W., He, X.: Real-time object tracking on a drone with multi-inertial sensing data. IEEE Trans. Intell. Transp. Syst. 19(1), 131–139 (2018)
Hsieh, M.R., Lin, Y.L., Hsu, W.H.: Drone-based object counting by spatially regularized regional proposal network. In: The IEEE International Conference on Computer Vision (ICCV), vol. 1 (2017)
Kanistras, K., Martins, G., Rutherford, M.J., Valavanis, K.P.: A survey of unmanned aerial vehicles (UAVs) for traffic monitoring. In: 2013 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 221–234. IEEE (2013)
Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. arXiv preprint arXiv:1804.00518 (2018)
Wang, S., et al.: TorontoCity: seeing the world with a million eyes. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3028–3036. IEEE (2017)
Xia, G.S., et al.: DOTA: a large-scale dataset for object detection in aerial images. In: Proceedings of CVPR (2018)
Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
Berker Logoglu, K., et al.: Feature-based efficient moving object detection for low-altitude aerial platforms. In: The IEEE International Conference on Computer Vision (ICCV) Workshops, October 2017
Lam, D., et al.: xView: objects in context in overhead imagery. arXiv preprint arXiv:1802.07856 (2018)
Zhu, P., Wen, L., Bian, X., Ling, H., Hu, Q.: Vision meets drones: a challenge. arXiv preprint arXiv:1804.07437 (2018)
Yu, Q., Medioni, G.: A GPU-based implementation of motion detection from a moving platform (2008)
Kryjak, T., Komorkiewicz, M., Gorgon, M.: Real-time moving object detection for video surveillance system in FPGA. In: 2011 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1–8. IEEE (2011)
Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.S.: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proc. IEEE 90(7), 1151–1163 (2002)
Eveland, C., Konolige, K., Bolles, R.C.: Background modeling for segmentation of video-rate stereo sequences. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 266–271. IEEE (1998)
Zhou, X., Yang, C., Yu, W.: Moving object detection by detecting contiguous outliers in the low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 597–610 (2013)
Suganuma, N., Kubo, T.: Fast dynamic object extraction using stereovision based on occupancy grid maps and optical flow. In: 2011 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pp. 978–983. IEEE (2011)
Rodríguez-Canosa, G.R., Thomas, S., Del Cerro, J., Barrientos, A., MacDonald, B.: A real-time method to detect and track moving objects (DATMO) from unmanned aerial vehicles (UAVs) using a single camera. Remote Sens. 4(4), 1090–1111 (2012)
Kimura, M., Shibasaki, R., Shao, X., Nagai, M.: Automatic extraction of moving objects from UAV-borne monocular images using multi-view geometric constraints. In: International Micro Air Vehicle Conference and Competition, IMAV 2014, Delft, The Netherlands, 12–15 August 2014, Delft University of Technology (2014)
Salgian, G., Bergen, J., Samarasekera, S., Kumar, R.: Moving target indication from a moving camera in the presence of strong parallax. Technical report, DTIC Document (2006)
Dey, S., Reilly, V., Saleemi, I., Shah, M.: Detection of independently moving objects in non-planar scenes via multi-frame monocular epipolar constraint. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 860–873. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_62
Paragios, N., Deriche, R.: Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 22(3), 266–280 (2000)
Joshi, K.A., Thakore, D.G.: A survey on moving object detection and tracking in video surveillance system. Int. J. Soft Comput. Eng. 2(3), 44–48 (2012)
Cao, X., Lan, J., Yan, P., Li, X.: Vehicle detection and tracking in airborne videos by multi-motion layer analysis. Mach. Vis. Appl. 23(5), 921–935 (2012)
Irani, M., Anandan, P.: A unified approach to moving object detection in 2D and 3D scenes. IEEE Trans. Pattern Anal. Mach. Intell. 20(6), 577–589 (1998)
Kang, J., Cohen, I., Medioni, G., Yuan, C.: Detection and tracking of moving objects from a moving platform in presence of strong parallax. In: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 1, pp. 10–17. IEEE (2005)
Yuan, C., Medioni, G., Kang, J., Cohen, I.: Detecting motion regions in the presence of a strong parallax from a moving camera by multiview geometric constraints. IEEE Trans. Pattern Anal. Mach. Intell. 29(9), 1627–1641 (2007)
Kundu, A., Krishna, K.M., Sivaswamy, J.: Moving object detection by multi-view geometric techniques from a single camera mounted robot. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4306–4312, October 2009
Minaeian, S., Liu, J., Son, Y.J.: Effective and efficient detection of moving targets from a UAV’s camera. IEEE Trans. Intell. Transp. Syst. 19, 497–506 (2018)
Castelli, T., Trémeau, A., Konik, H., Dinet, E.: Moving object detection for unconstrained low-altitude aerial videos, a pose-independant detector based on artificial flow. In: 2015 9th International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 42–47. IEEE (2015)
Wu, Y., He, X., Nguyen, T.Q.: Moving object detection with a freely moving camera via background motion subtraction. IEEE Trans. Circuits Syst. Video Technol. 27(2), 236–248 (2017)
Makino, K., Shibata, T., Yachida, S., Ogawa, T., Takahashi, K.: Moving-object detection method for moving cameras by merging background subtraction and optical flow methods. In: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 383–387, November 2017
Ali, S., Shah, M.: COCOA: tracking in aerial imagery. In: Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications III, vol. 6209, p. 62090D. International Society for Optics and Photonics (2006)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to stereo vision (1981)
Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2003)
Collins, R., Zhou, X., Teh, S.K.: An open source tracking testbed and evaluation web site. In: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS 2005), vol. 2, p. 35 (2005)
Hasan, M.: Integrating geometric, motion and appearance constraints for robust tracking in aerial videos (2013)
Babenko, B., Yang, M.H., Belongie, S.: Visual tracking with online multiple instance learning. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 983–990. IEEE (2009)
Grabner, H., Grabner, M., Bischof, H.: Real-time tracking via on-line boosting. In: British Machine Vision Conference, vol. 1, p. 6 (2006)
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lezki, H. et al. (2019). Joint Exploitation of Features and Optical Flow for Real-Time Moving Object Detection on Drones. In: Leal-Taixé, L., Roth, S. (eds) Computer Vision – ECCV 2018 Workshops. ECCV 2018. Lecture Notes in Computer Science(), vol 11130. Springer, Cham. https://doi.org/10.1007/978-3-030-11012-3_8
Print ISBN: 978-3-030-11011-6
Online ISBN: 978-3-030-11012-3