Abstract
Research on object detection and tracking has achieved remarkable progress in recent years. Owing to the superior viewing angles and maneuverability of unmanned aerial vehicles (UAVs), UAV-based tracking is also developing rapidly. However, the targets captured by UAVs are tiny, visually similar to one another, and hard to distinguish, which makes multiple-object tracking (MOT) highly challenging. To address these two problems, we propose TFATracking, a comprehensive framework that fully exploits temporal context for UAV tracking. To further demonstrate the effectiveness of the algorithm and promote the development of UAV object tracking, we also present T2UAV, a large-scale, high-diversity benchmark for short-term UAV multi-object tracking. It contains 20 UAV-captured video sequences with more than 12k frames in total and an average length of over 600 frames per video. We conduct a comprehensive performance evaluation of 8 MOT algorithms on the dataset and present a detailed analysis. We will release the dataset for free academic use.
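Benchmarks of this kind typically report the CLEAR MOT accuracy score (MOTA), which penalizes missed targets, false alarms, and identity switches against the total number of ground-truth objects. A minimal sketch, assuming a standard MOTA computation with hypothetical per-frame counts (the function name and values are illustrative, not from the paper):

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    fn = sum(false_negatives)      # missed ground-truth targets per frame
    fp = sum(false_positives)      # spurious detections per frame
    idsw = sum(id_switches)        # identity switches per frame
    gt = sum(num_gt)               # ground-truth objects per frame
    return 1.0 - (fn + fp + idsw) / gt

# Hypothetical three-frame sequence with 20 ground-truth objects per frame:
score = mota([2, 1, 0], [1, 0, 1], [0, 1, 0], [20, 20, 20])
# score == 1 - 6/60 == 0.9
```

A perfect tracker scores 1.0; MOTA can go negative when the error count exceeds the number of ground-truth objects.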
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, X., Zhang, Y. (2022). TFAtrack: Temporal Feature Aggregation for UAV Tracking and a Unified Benchmark. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18906-7
Online ISBN: 978-3-031-18907-4
eBook Packages: Computer Science, Computer Science (R0)