Abstract
We provide a set of generic modifications that improve the execution efficiency of single-shot object detectors by exploiting prior object locations in video sequences. We propose a crop-based method to accelerate object detection: it dynamically generates crop regions from prior information and exploits scene sparsity, so that computational resources are focused on the regions that matter. In contrast to prior work, the crop regions are processed at smaller input resolutions to further reduce the computational load. Execution efficiency increases because the detector no longer has to be executed multiple times at full resolution. Data augmentations are used to train these lower-resolution networks so that they keep the baseline accuracy while reducing inference time. Experiments with two public datasets, UA-DETRAC [13] and UAVDT [2], using the SSD-ML [19] object detection architecture with \(128\times 128\), \(64\times 64\) and \(32\times 32\) input resolutions show a maximum speedup by a factor of 1.7 on UA-DETRAC and 1.6 on UAVDT, at the same level of accuracy as the base method. An extensive set of experiments demonstrates the speed-accuracy trade-off and shows that our method can achieve accuracy comparable to state-of-the-art methods at lower execution time.
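The idea of deriving crop regions from prior object locations can be illustrated with a minimal sketch. The abstract does not specify how crops are generated, so everything below is an assumption made for illustration: the function names (`pad_box`, `crop_regions`, `covered_fraction`), the fixed pixel margin, and the rule of merging overlapping padded boxes are hypothetical and are not the authors' algorithm. The sketch only shows how sparse prior detections can reduce the number of pixels handed to the detector.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def pad_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a box by `margin` pixels on every side, clipped to the frame."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def crop_regions(prior_boxes: List[Box], margin: int,
                 width: int, height: int) -> List[Box]:
    """Hypothetical rule: pad each prior detection, then merge padded boxes
    that overlap, so the returned crop regions are pairwise disjoint."""
    regions: List[Box] = []
    for box in prior_boxes:
        current = pad_box(box, margin, width, height)
        merged = True
        while merged:  # repeat until `current` overlaps no stored region
            merged = False
            for i, other in enumerate(regions):
                if overlaps(current, other):
                    current = union(current, regions.pop(i))
                    merged = True
                    break
        regions.append(current)
    return regions


def covered_fraction(regions: List[Box], width: int, height: int) -> float:
    """Fraction of the frame covered by the (disjoint) crop regions."""
    area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in regions)
    return area / (width * height)


# Made-up detections from a previous frame of a 1000x500 video.
prior = [(100, 100, 150, 150), (160, 110, 200, 160), (800, 300, 850, 340)]
regions = crop_regions(prior, margin=20, width=1000, height=500)
print(regions)
print(covered_fraction(regions, 1000, 500))
```

In this made-up example the two nearby objects end up in one crop and the distant object in another, and together the crops cover only a few percent of the frame. A real system would also need a way to find objects that enter the scene outside the crops, which this sketch does not address.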
This work is funded by the NWO Perspectief program ZERO.
Notes
- 1. Experiments performed with PyTorch v1.8, CUDA and cuDNN 10.2.
References
Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow (2017). https://github.com/matterport/Mask_RCNN
Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 370–386 (2018)
Li, C., Yang, T., Zhu, S., Chen, C., Guan, S.: Density map guided object detection in aerial images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 190–191 (2020)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Ozge Unel, F., Ozkalayci, B.O., Cigla, C.: The power of tiling for small object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
Růžička, V., Franchetti, F.: Fast and accurate object detection in high resolution 4K and 8K video using GPUs. In: 2018 IEEE High Performance extreme Computing Conference (HPEC), pp. 1–7. IEEE (2018)
Sun, X., Wu, P., Hoi, S.C.: Face detection using deep learning: an improved faster RCNN approach. Neurocomputing 299, 42–50 (2018)
Wang, Y., Mao, K., Chen, T., Yin, Y., He, S., Chen, G.: Accelerating real-time object detection in high-resolution video surveillance. Concurr. Comput. Pract. Exp., e6307 (2021)
Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 193, 102907 (2020)
Xu, J., Li, Y., Wang, S.: AdaZoom: adaptive zoom network for multi-scale object detection in large scenes. arXiv preprint arXiv:2106.10409 (2021)
Yang, F., Fan, H., Chu, P., Blasch, E., Ling, H.: Clustered object detection in aerial images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8311–8320 (2019)
Yilmaz, A., Javed, O., Shah, M.: Object tracking: a survey. ACM Comput. Surv. (CSUR) 38(4), 13-es (2006)
Zhang, J., Huang, J., Chen, X., Zhang, D.: How to fully exploit the abilities of aerial image detectors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Zhang, X., Izquierdo, E., Chandramouli, K.: Dense and small object detection in UAV vision based on cascade network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Zwemer, M., Wijnhoven, R.G., et al.: SSD-ML: hierarchical object classification for traffic surveillance. In: 15th International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2020), pp. 250–259. SCITEPRESS-Science and Technology Publications, LDA (2020)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ulker, B., Stuijk, S., Corporaal, H., Wijnhoven, R. (2022). Accelerating Video Object Detection by Exploiting Prior Object Locations. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_55
DOI: https://doi.org/10.1007/978-3-031-06430-2_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06429-6
Online ISBN: 978-3-031-06430-2
eBook Packages: Computer Science, Computer Science (R0)