UAV Multi-object Tracking by Combining Two Deep Neural Architectures

Mazzeo, Pier Luigi; Manica, Alessandro; Distante, Cosimo

doi:10.1007/978-3-031-43148-7_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14233)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

Detecting and tracking multiple objects in unmanned aerial vehicle (UAV) videos is a highly challenging task with a wide range of practical applications. Almost all traditional trackers run into problems on UAV images because camera movement causes viewpoint changes in three dimensions. In this work, we propose a Convolutional Neural Network specialized in multi-object tracking (MOT) for images captured from UAVs. The architecture we introduce is composed of two main blocks: i) an object detection block based on the YOLOv8 architecture; ii) an association block based on the StrongSORT architecture. We investigated different versions of the YOLOv8 architecture with StrongSORT as the association tracker. Experimental results on the VisDrone2019 dataset show that the proposed solution outperforms current state-of-the-art tracking algorithms on UAV videos, reaching 42.03% Multi-Object Tracking Accuracy (MOTA).
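The two-block structure described in the abstract (a detector followed by an association step) and the MOTA metric can be illustrated with a minimal Python sketch. This is not the authors' implementation. The detection block is left out entirely and replaced by hard-coded, hypothetical bounding boxes, and the association block is a simple greedy IoU matcher that stands in for StrongSORT, which additionally relies on appearance and motion cues. The names `GreedyIouTracker`, `iou`, and `mota`, and the threshold value, are assumptions made for illustration. The `mota` helper uses the standard definition, MOTA = 1 - (FN + FP + IDSW) / GT.

```python
from itertools import count

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Association stand-in: greedy IoU matching (hypothetical, NOT StrongSORT)."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold  # assumed value, not from the paper
        self.tracks: dict[int, Box] = {}
        self._ids = count(1)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        # Score every (track, detection) pair and visit the best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned: dict[int, Box] = {}
        used: set[int] = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if tid in assigned or j in used:
                continue
            assigned[tid] = detections[j]
            used.add(j)
        # Unmatched detections start new tracks; unmatched tracks are dropped.
        for j, det in enumerate(detections):
            if j not in used:
                assigned[next(self._ids)] = det
        self.tracks = assigned
        return dict(assigned)


def mota(false_negatives: int, false_positives: int,
         id_switches: int, gt_objects: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if gt_objects <= 0:
        raise ValueError("gt_objects must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / gt_objects


# Hypothetical detector output for two consecutive frames.
frames = [
    [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)],
    [(1.0, 1.0, 11.0, 11.0), (51.0, 50.0, 61.0, 60.0)],
]
tracker = GreedyIouTracker()
results = [tracker.update(dets) for dets in frames]
```

In this toy run both objects move only slightly between frames, so each keeps its identity. In a real pipeline the `frames` list would come from a detector such as YOLOv8, and the error counts passed to `mota` would come from comparing tracker output against ground-truth annotations.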

Acknowledgement

This research was funded in part by Future Artificial Intelligence Research-FAIR CUP B53C220036 30006 grant number PE0000013, and in part by the Ministry of Enterprises and Made in Italy with the grant ENDOR “ENabling technologies for Defence and mOnitoring of the foRests” - PON 2014-2020 FESR - CUP B82C21001750005. The authors would like to thank Mr. Arturo Argentieri from CNR-ISASI Italy for his technical contribution on the multi-GPU computing facilities.

Author information

Authors and Affiliations

ISASI - CNR, Via Monteroni sn, 73100, Lecce, Italy
Pier Luigi Mazzeo & Cosimo Distante
Università del Salento, Via Monteroni sn, 73100, Lecce, Italy
Alessandro Manica

Corresponding author

Correspondence to Pier Luigi Mazzeo.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

About this paper

Cite this paper

Mazzeo, P.L., Manica, A., Distante, C. (2023). UAV Multi-object Tracking by Combining Two Deep Neural Architectures. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_22

DOI: https://doi.org/10.1007/978-3-031-43148-7_22
Published: 05 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer Science, Computer Science (R0)
