Abstract
Research on object detection and tracking has achieved remarkable progress in recent years. Owing to the superior viewing angles and maneuverability of unmanned aerial vehicles (UAVs), UAV-based tracking is also developing rapidly. However, the targets captured by UAVs are tiny, visually similar to one another, and hard to distinguish, which makes multiple-object tracking (MOT) highly challenging. To address these two problems, we propose TFATracking, a comprehensive framework that fully exploits temporal context for UAV tracking. To further demonstrate the effectiveness of the algorithm and promote the development of UAV object tracking, we also present T2UAV, a large-scale, high-diversity benchmark for short-term UAV multi-object tracking. It contains 20 UAV-captured video sequences with more than 12k frames in total and an average length of over 600 frames per video. We conduct a comprehensive performance evaluation of 8 MOT algorithms on the dataset and present a detailed analysis. We will release the dataset for free academic use.
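Benchmarks of this kind typically report the CLEAR MOT accuracy score (MOTA), which penalizes missed targets, false alarms, and identity switches against the total number of ground-truth objects. A minimal sketch, assuming a standard MOTA computation with hypothetical per-frame counts (the function name and values are illustrative, not from the paper):

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    fn = sum(false_negatives)      # missed ground-truth targets per frame
    fp = sum(false_positives)      # spurious detections per frame
    idsw = sum(id_switches)        # identity switches per frame
    gt = sum(num_gt)               # ground-truth objects per frame
    return 1.0 - (fn + fp + idsw) / gt

# Hypothetical three-frame sequence with 20 ground-truth objects per frame:
score = mota([2, 1, 0], [1, 0, 1], [0, 1, 0], [20, 20, 20])
# score == 1 - 6/60 == 0.9
```

A perfect tracker scores 1.0; MOTA can go negative when the error count exceeds the number of ground-truth objects.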
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, X., Zhang, Y. (2022). TFAtrack: Temporal Feature Aggregation for UAV Tracking and a Unified Benchmark. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18906-7
Online ISBN: 978-3-031-18907-4
eBook Packages: Computer Science, Computer Science (R0)