VisDrone-MOT2020: The Vision Meets Drone Multiple Object Tracking Challenge Results

Conference paper in Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Abstract

The Vision Meets Drone (VisDrone2020) Multiple Object Tracking (MOT) challenge is the third annual UAV MOT evaluation activity organized by the VisDrone team, in conjunction with the European Conference on Computer Vision (ECCV 2020). VisDrone-MOT2020 consists of 79 challenging video sequences, including 56 videos (\(\sim \)24K frames) for training, 7 videos (\(\sim \)3K frames) for validation, and 17 videos (\(\sim \)6K frames) for evaluation. All frames in these sequences are manually annotated with high-quality bounding boxes. Results of 12 participating MOT algorithms are presented and analyzed in detail. The challenge results, video sequences, and the evaluation toolkit are made available at http://aiskyeye.com/. By holding the VisDrone-MOT2020 challenge, we hope to facilitate future research on and applications of MOT algorithms for drone videos.


Notes

  1. https://github.com/ultralytics/yolov5

References

  1. Al-Shakarji, N.M., Bunyak, F., Seetharaman, G., Palaniappan, K.: Multi-object tracking cascade with multi-step data association and occlusion handling. In: AVSS (2018)

  2. Al-Shakarji, N.M., Seetharaman, G., Bunyak, F., Palaniappan, K.: Robust multi-object tracking with semantic color correlation. In: AVSS (2017)

  3. Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. In: ICCV (2019)

  4. Bochinski, E., Eiselein, V., Sikora, T.: High-speed tracking-by-detection without using image information. In: AVSS (2017)

  5. Brasó, G., Leal-Taixé, L.: Learning a neural solver for multiple object tracking. In: CVPR (2020)

  6. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: CVPR (2018)

  7. Chang, Z., et al.: Weighted bilinear coding over salient body parts for person re-identification. Neurocomputing 407, 454–464 (2020)

  8. Chen, B., Deng, W., Hu, J.: Mixed high-order attention network for person re-identification. In: ICCV (2019)

  9. Chen, K., et al.: Hybrid task cascade for instance segmentation. In: CVPR (2019)

  10. Chu, P., Fan, H., Tan, C.C., Ling, H.: Online multi-object tracking with instance-aware tracker and dynamic model refreshment. In: WACV (2019)

  11. Chu, P., Ling, H.: FAMNet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In: ICCV (2019)

  12. Dave, A., Khurana, T., Tokmakov, P., Schmid, C., Ramanan, D.: TAO: a large-scale benchmark for tracking any object. arXiv (2020)

  13. Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv (2020)

  14. Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 375–391. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_23

  15. Evangelidis, G.D., Psarakis, E.Z.: Parametric image alignment using enhanced correlation coefficient maximization. PAMI 30(10), 1858–1865 (2008)

  16. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR (2019)

  17. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)

  18. Girshick, R.: Fast R-CNN. In: ICCV (2015)

  19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  20. Hsieh, M.R., Lin, Y.L., Hsu, W.H.: Drone-based object counting by spatially regularized regional proposal network. In: ICCV (2017)

  21. Keuper, M., Tang, S., Andres, B., Brox, T., Schiele, B.: Motion segmentation & multiple object tracking by correlation co-clustering. PAMI 42(1), 140–153 (2018)

  22. Kim, C., Li, F., Rehg, J.M.: Multi-object tracking with neural gating using bilinear LSTM. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 208–224. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_13

  23. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2(1–2), 83–97 (1955)

  24. Li, J., Wang, J., Tian, Q., Gao, W., Zhang, S.: Global-local temporal representations for video person re-identification. In: ICCV (2019)

  25. Li, J., Zhang, S., Huang, T.: Multi-scale 3D convolution network for video based person re-identification. In: AAAI (2019)

  26. Li, S., Yu, H., Hu, H.: Appearance and motion enhancement for video-based person re-identification. In: AAAI (2020)

  27. Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: deep filter pairing neural network for person re-identification. In: CVPR (2014)

  28. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)

  29. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  30. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  31. Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: CVPRW (2019)

  32. Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv (2016)

  33. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27

  34. Pan, S., Tong, Z., Zhao, Y., Zhao, Z., Su, F., Zhuang, B.: Multi-object tracking hierarchically in visual data taken from drones. In: ICCVW (2019)

  35. Park, E., Liu, W., Russakovsky, O., Deng, J., Li, F.F., Berg, A.: Large Scale Visual Recognition Challenge 2017. http://image-net.org/challenges/LSVRC/2017

  36. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv (2018)

  37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)

  38. Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33

  39. Wang, G., Wang, Y., Zhang, H., Gu, R., Hwang, J.: Exploit the connectivity: multi-object tracking with TrackletNet. In: ACM MM, pp. 482–490 (2019)

  40. Wang, G., Yuan, Y., Chen, X., Li, J., Zhou, X.: Learning discriminative features with multiple granularities for person re-identification. In: ACM MM (2018)

  41. Wang, J., et al.: Deep high-resolution representation learning for visual recognition. PAMI (2020)

  42. Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 193, 102907 (2020)

  43. Wen, L., Du, D., Li, S., Bian, X., Lyu, S.: Learning non-uniform hypergraph for multi-object tracking. In: AAAI, pp. 8981–8988 (2019)

  44. Wen, L., Li, W., Yan, J., Lei, Z., Yi, D., Li, S.Z.: Multiple target tracking based on undirected hierarchical relation hypergraph. In: CVPR (2014)

  45. Wen, L., Zhang, Y., Bo, L., Shi, H., Zhu, R., et al.: VisDrone-MOT2019: the vision meets drone multiple object tracking challenge results. In: ICCVW, pp. 189–198 (2019)

  46. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: ICIP (2017)

  47. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: CVPR (2013)

  48. Wu, Y., He, K.: Group normalization. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_1

  49. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)

  50. Yang, Y., Wen, L., Lyu, S., Li, S.Z.: Unsupervised learning of multi-level descriptors for person re-identification. In: AAAI (2017)

  51. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: A simple baseline for multi-object tracking. arXiv (2020)

  52. Zhao, L., Li, X., Zhuang, Y., Wang, J.: Deeply-learned part-aligned representations for person re-identification. In: ICCV (2017)

  53. Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: CVPR (2013)

  54. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person re-identification. In: ICCV (2019)

  55. Zhou, Q., et al.: Graph correspondence transfer for person re-identification. In: AAAI (2018)

  56. Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. arXiv (2020)

  57. Zhu, J., Yang, H., Liu, N., Kim, M., Zhang, W., Yang, M.-H.: Online multi-object tracking with dual matching attention networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 379–396. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_23

  58. Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., Ling, H.: Vision meets drones: past, present and future. CoRR abs/2001.06303 (2020)

  59. Zhu, P., et al.: VisDrone-VDT2018: the vision meets drone video detection and tracking challenge results. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 496–518. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_29


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61876127 and Grant 61732011, and in part by the Natural Science Foundation of Tianjin under Grant 17JCZDJC30800.

Author information

Correspondence to Pengfei Zhu.

A Descriptions of Submitted Trackers

In this appendix, we summarize the 12 trackers submitted to the VisDrone-MOT2020 challenge, ordered by the submission time of their final results.

A.1 Coarse-to-Fine Multi-Class Multi-Object Tracking (COFE)

Yuhang He, Wentao Yu, Jie Han, Xiaopeng Hong, Xing Wei and Yihong Gong

{hyh1379478,yu1034397129,hanjie1997}@stu.xjtu.edu.cn,

{hongxiaopeng,weixing,ygong}@mail.xjtu.edu.cn

COFE is proposed to track multiple targets of different categories under different scenarios. As shown in Fig. 1, the method contains three major modules: 1) multi-class object detection, 2) coarse-category multi-object tracking, and 3) fine-grained trajectory fine-tuning. First, we use a deep convolutional neural network (DCNN) based object detector [6] to detect targets of interest in the image plane, where each detection is denoted by a bounding box with a class label and a confidence score. Second, we track multiple targets in coarse categories, where fine-grained classes (such as van, bus, and car) are merged into coarse categories (e.g., vehicle). For each coarse category, we perform multi-object tracking by exploiting the appearance and motion information of targets: the appearance feature is extracted with a DCNN feature extractor [54], and the motion pattern of each target is modeled by a Kalman filter. Finally, for each obtained trajectory, we fine-tune its fine-grained class label by simple voting and refine the tracking results by post-processing (i.e., bounding box smoothing).

Fig. 1. The framework of COFE.
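The fine-grained label voting in the third module can be made concrete with a minimal Python sketch; the function name and the unweighted majority vote are our assumptions, since the challenge paper does not publish the COFE code:

```python
from collections import Counter

def vote_trajectory_label(per_frame_labels):
    """Assign one fine-grained class to a whole trajectory by majority
    vote over its per-frame detector labels."""
    return Counter(per_frame_labels).most_common(1)[0][0]

# A vehicle trajectory whose per-frame labels flicker between classes:
print(vote_trajectory_label(["car", "car", "van", "car", "bus"]))  # -> car
```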

A.2 Simple Online Multi-Object Tracker (SOMOT)

Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Bin Dong and Wang Sai

{luozp,yaoyh,xuzy,dongb,wangs}@deepblueai.com

Following the separate detection and embedding paradigm, we build a strong detector based on Cascade R-CNN [6] and an embedding model based on the Multiple Granularity Network (MGN) [40]. For the association step, we build a simple online multi-object tracker inspired by DeepSORT [46] and FairMOT [51]. The detector is Cascade R-CNN [6] pretrained on COCO [29]. For the embedding model, a bag of tricks is used to improve the performance of MGN [40]. For association, we initialize a number of tracklets from the estimated boxes in the first frame. In subsequent frames, we associate boxes to the existing (activated) tracklets according to their distances measured by embedding features, updating the appearance features of the tracklets at each time step to handle appearance variations. Then, unmatched activated tracklets and estimated boxes are associated by their Intersection-over-Union (IoU) distance, and finally inactivated tracklets and estimated boxes are associated by IoU distance as well.
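This cascaded association can be sketched as follows; it is a minimal illustration assuming precomputed cost matrices (embedding distances for the first stage, 1 - IoU for the second) and illustrative gate thresholds, not the SOMOT implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gated_match(cost, gate):
    """Hungarian matching that keeps only pairs whose cost passes the gate."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    unmatched_rows = [r for r in range(cost.shape[0]) if r not in matched_rows]
    unmatched_cols = [c for c in range(cost.shape[1]) if c not in matched_cols]
    return matches, unmatched_rows, unmatched_cols

def two_stage_association(emb_cost, iou_cost, emb_gate=0.4, iou_gate=0.7):
    """Stage 1: embedding distances; stage 2: IoU distance on the leftovers."""
    matches, un_t, un_d = gated_match(emb_cost, emb_gate)
    if un_t and un_d:
        sub = iou_cost[np.ix_(un_t, un_d)]   # restrict to unmatched pairs
        extra, _, _ = gated_match(sub, iou_gate)
        matches += [(un_t[r], un_d[c]) for r, c in extra]
    return matches  # (tracklet_index, detection_index) pairs
```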

A.3 Position-, Appearance- and Size-Aware Tracker (PAS Tracker)

Daniel Stadler, Lars Wilko Sommer and Arne Schumann

daniel.stadler@kit.edu,{lars.sommer,arne.schumann}@iosb.fraunhofer.de

The PAS tracker follows the tracking-by-detection paradigm. As detectors, we train two Cascade R-CNN models [6] with FPN [28] on the VisDrone2020 MOT train and val sets, using ResNeXt-101 [49] and HRNetV2p-W32 [41] as backbones, respectively. Training is performed on randomly sampled image crops (\(608 \times 608\) pixels) with the SSD [30] data augmentation pipeline. To improve the quality of the detections, we use test-time strategies such as horizontal flipping and multi-scale testing. Additionally, we generate category-specific expert models using weights from different epochs and from the two detectors with different backbones. For associating detections, we build a similarity measure that integrates position, appearance and size information of objects. A constant-velocity model is assumed for motion prediction, and a camera motion compensation model based on Enhanced Correlation Coefficient Maximization [15] is also applied. The appearance of an object is represented by a feature vector computed with a re-identification model from [31] based on ResNet-50 [19]. The association of tracks and new detections is solved with the Hungarian method [23]. Additionally, to remove false positive detections in crowded scenarios, a simple filtering approach considering the overlap of existing tracks and new detections is proposed. Finally, we remove short tracks with fewer than 10 frames and small tracks with a mean size of less than 100 pixels, as most of them are false positives.
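The combined similarity measure can be illustrated with the following sketch; the Gaussian position term, the cosine appearance term mapped to [0, 1], the area-ratio size term, and the equal default weights are all our assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def pas_similarity(track, det, w_pos=1.0, w_app=1.0, w_size=1.0):
    """Combine position, appearance and size cues into one similarity score.

    track/det are dicts with keys:
      'center': (x, y) box centre in pixels
      'feat'  : L2-normalised re-identification feature vector
      'size'  : (w, h) box size in pixels
    """
    # Position: Gaussian falloff with centre distance (50 px bandwidth assumed).
    d = np.linalg.norm(np.subtract(track["center"], det["center"]))
    s_pos = np.exp(-d**2 / (2 * 50.0**2))
    # Appearance: cosine similarity of re-id features, mapped to [0, 1].
    s_app = 0.5 * (1.0 + float(np.dot(track["feat"], det["feat"])))
    # Size: ratio of the smaller to the larger box area.
    a_t = track["size"][0] * track["size"][1]
    a_d = det["size"][0] * det["size"][1]
    s_size = min(a_t, a_d) / max(a_t, a_d)
    return (w_pos * s_pos + w_app * s_app + w_size * s_size) / (w_pos + w_app + w_size)
```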

A.4 Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT)

Zhaoze Zhao

hanjie@smail.swufe.edu.cn

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms. Following [46], we integrate appearance information to improve the performance of SORT. This extension makes it possible to track objects through longer periods of occlusion, effectively reducing the number of identity switches. In the spirit of the original framework, much of the computational complexity is placed in an offline pre-training stage, where a deep association metric is learned on a large-scale person re-identification dataset. During online application, measurement-to-track associations are established using nearest-neighbour queries in visual appearance space.
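A minimal sketch of the nearest-neighbour query in appearance space: each track keeps a gallery of past L2-normalised re-identification features, and a detection's association cost to a track is its smallest cosine distance to any gallery entry (the gallery layout and function name are our assumptions):

```python
import numpy as np

def appearance_cost(track_galleries, det_feats):
    """Cost matrix of smallest cosine distances in appearance space.

    track_galleries: list of (n_i, d) arrays of L2-normalised features,
                     one gallery of past appearances per track.
    det_feats:       (m, d) array of L2-normalised detection features.
    Returns a (num_tracks, m) matrix for the assignment step.
    """
    cost = np.empty((len(track_galleries), det_feats.shape[0]))
    for i, gallery in enumerate(track_galleries):
        # cosine distance = 1 - cosine similarity; keep the best gallery match
        cost[i] = (1.0 - gallery @ det_feats.T).min(axis=0)
    return cost
```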

A.5 YOLOv5-Based V-IOU Tracker (YOLO-TRAC)

Zhizhao Duan, Xi Wu, Duo Xu and Zhen Xie

{Duanai,21725018}@zju.edu.cn,wuxi9410@gmail.com,zjutxz@hotmail.com

YOLO-TRAC is a tracking-by-detection framework. We use YOLOv5 (see Note 1) as the detection network and the V-IOU tracker [4] for tracking.
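For reference, here is a minimal sketch of the plain IOU tracker's per-frame greedy association [4]; V-IOU additionally bridges detection gaps with a visual single-object tracker, which we omit, and the data layout is our assumption:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def iou_track_step(tracks, detections, sigma_iou=0.5):
    """One frame of greedy IoU association.

    tracks:     list of dicts with a 'boxes' list (last entry = current box)
    detections: list of (x1, y1, x2, y2) boxes for the current frame
    """
    remaining = list(detections)
    for t in tracks:
        if not remaining:
            break
        best = max(remaining, key=lambda d: iou(t["boxes"][-1], d))
        if iou(t["boxes"][-1], best) >= sigma_iou:
            t["boxes"].append(best)
            remaining.remove(best)
    # Any leftover detection starts a new track.
    tracks += [{"boxes": [d]} for d in remaining]
    return tracks
```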

A.6 An Improved Multi-Object Tracking Method for VisDrone Videos Based on CenterTrack (VDCT)

Shengwen Li and Yanyun Zhao

{2019140337,zyy}@bupt.edu.cn

VDCT is an improvement of CenterTrack, a point-based framework that combines detection and tracking [56]. Its inputs are the current frame, the previous frame, and the tracked objects in the previous frame; its outputs are the displacements of the tracked objects. Our improvements are as follows. (1) Tracked objects that remain unmatched for up to 20 frames are still allowed to associate with objects detected in the current frame, by properly extending the survival time of tracked objects. (2) Because object motion is continuous, its direction rarely changes abruptly between adjacent frames, so we compute the dot product of displacements across adjacent frames to decide whether to associate objects (see the sketch below). (3) We use the NIOU method [34] to perform non-maximum suppression on vehicle objects. (4) We adopt the hierarchical matching strategy of DeepSORT [46] to handle long occlusions. (5) OSNet [54] is used to extract each trajectory's appearance feature and measure its distance to other trajectories; two trajectories are merged if their distance is small enough. Experimental results show the effectiveness of these improvements.
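Improvement (2) can be sketched as a simple direction-consistency gate; the normalisation and the threshold value are our assumptions, since the text only states that the dot product of displacements is used to decide the association:

```python
import numpy as np

def direction_consistent(prev_disp, new_disp, min_cos=0.0):
    """Gate an association on motion-direction continuity.

    prev_disp: the track's displacement over the previous frame pair, (dx, dy)
    new_disp:  the displacement implied by the candidate association, (dx, dy)
    Accept only if the cosine of the turning angle is above the threshold;
    min_cos=0.0 rejects direction reversals.
    """
    p, n = np.asarray(prev_disp, float), np.asarray(new_disp, float)
    denom = np.linalg.norm(p) * np.linalg.norm(n)
    if denom < 1e-6:          # one of the objects barely moved: accept
        return True
    return float(p @ n) / denom >= min_cos
```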

A.7 Cascade R-CNN Based IOU Tracker (Cascade R-CNN+IOU)

Ting Sun and Xingjie Zhao

sunting9999@stu.xjtu.edu.cn,1243273854@qq.com

We use Cascade R-CNN [6] as the detector with four improvements: (1) we use Group Normalization [48] instead of Batch Normalization; (2) we use online hard example mining to select positive and negative samples; (3) we test at multiple scales; (4) we train models with two stronger backbones and ensemble them. Detections are then associated with the IOU tracker [4] (sketched in A.5 above).

A.8 Hybrid Task Cascade Based IOU Tracker (HTC+IOU)

Ting Sun, Xingjie Zhao and Guizhong Liu

sunting9999@stu.xjtu.edu.cn,1243273854@qq.com

We use Hybrid Task Cascade for instance segmentation [9] as the detector, with the same four improvements as in A.7: (1) Group Normalization [48] instead of Batch Normalization; (2) online hard example mining to select positive and negative samples; (3) multi-scale testing; (4) training with two stronger backbones and ensembling them. Detections are then associated with the IOU tracker [4].

A.9 Multi-Object Tracking Based on HRNet (HR-GNN)

Zheng Yang and Kaojin Zhu

151776257@qq.com,1320531351@qq.com

HR-GNN is built on a detector with HRNet [41] as the backbone. Tracking results are then generated by using a graph neural network (GNN) to analyze and associate the detection results.

A.10 Multi-Object Tracking with TrackletNet (TNT)

Haritha V, Melvin Kuriakose, Hrishikesh PS and Linu Shine

vakkatharitha@gmail.com

TNT is based on the work of [39], which merges temporal and appearance information in a unified framework. We learn appearance similarity among tracklets with a graph model, using CNN features and intersection-over-union (IOU) with epipolar constraints to compensate for camera movement between adjacent frames. Finally, the tracklets are clustered into groups, resulting in trajectories with individual object IDs.

A.11 A Simple Baseline for One-Shot Multi-Object Tracking (anchor-free_mot)

Min Yao and Libo Zhang

libo@iscas.ac.cn

The anchor-free_mot method is based on FairMOT [51]. Specifically, we use an encoder-decoder network to extract feature maps. Two simple parallel heads then predict the bounding boxes and re-ID features of the targets, respectively. Notably, targets are represented as points, following the anchor-free object detection paradigm.
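To make the point-based representation concrete, the following NumPy sketch reads candidate object centres off a predicted heatmap, in the spirit of anchor-free, point-based detection; the threshold, the top-k value, and the omission of local peak suppression (real implementations typically apply a 3x3 max-pool first) are simplifications of what FairMOT actually does:

```python
import numpy as np

def topk_centers(heatmap, k=100, score_thr=0.3):
    """Pick up to k candidate object centres from a per-class heatmap.

    heatmap: (H, W) array of centre scores in [0, 1].
    Returns (row, col, score) tuples above the threshold, best first.
    """
    flat = heatmap.ravel()
    k = min(k, flat.size)
    idx = np.argpartition(flat, -k)[-k:]      # unordered top-k candidates
    idx = idx[np.argsort(flat[idx])[::-1]]    # sort by descending score
    w = heatmap.shape[1]
    return [(int(i) // w, int(i) % w, float(flat[i]))
            for i in idx if flat[i] >= score_thr]
```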

A.12 Semantic Color Correlation Tracker (SCTrack)

Noor M. Al-Shakarji, Filiz Bunyak, Guna Seetharaman and Kannappan Palaniappan

{nmahyd,bunyak,palaniappank}@mail.missouri.edu,

gunasekaran.seetharaman@rl.af.mil

SCTrack is a time-efficient detection-based multi-object tracking method. It uses a three-step cascaded data association scheme that combines a fast, spatial-distance-only short-term association, a robust tracklet-linking step using discriminative object appearance models, and an explicit occlusion-handling unit relying not only on tracked objects' motion patterns but also on environmental constraints such as the presence of potential occluders in the scene. Details can be found in [1, 2].
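The three-step cascade can be summarised as a skeleton; the three callables stand in for the components named above, and their (tracks, dets) -> (matches, tracks_left, dets_left) signature is our assumption for illustration:

```python
def sctrack_cascade(tracks, dets, spatial_step, appearance_step, occlusion_step):
    """Skeleton of a three-step cascaded data association in the spirit
    of SCTrack [1, 2]."""
    # Step 1: fast short-term association using spatial distance only.
    matched, tracks, dets = spatial_step(tracks, dets)
    # Step 2: link remaining tracklets with discriminative appearance models.
    linked, tracks, dets = appearance_step(tracks, dets)
    # Step 3: explicit occlusion handling from motion patterns and scene
    # constraints (e.g. the presence of potential occluders).
    recovered, _, _ = occlusion_step(tracks, dets)
    return matched + linked + recovered
```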


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Fan, H. et al. (2020). VisDrone-MOT2020: The Vision Meets Drone Multiple Object Tracking Challenge Results. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12538. Springer, Cham. https://doi.org/10.1007/978-3-030-66823-5_43

  • DOI: https://doi.org/10.1007/978-3-030-66823-5_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66822-8

  • Online ISBN: 978-3-030-66823-5

  • eBook Packages: Computer Science (R0)
