
VisDrone-SOT2020: The Vision Meets Drone Single Object Tracking Challenge Results

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12538)

Abstract

The Vision Meets Drone Single Object Tracking challenge (VisDrone-SOT2020) is the third annual UAV tracking evaluation activity organized by the VisDrone team, in conjunction with the European Conference on Computer Vision (ECCV 2020). This paper presents and discusses in detail the results of the 13 algorithms participating in the VisDrone-SOT2020 Challenge. By using ensembles of different trackers trained on several large-scale datasets, the top performer in VisDrone-SOT2020 achieves better results than its counterparts in VisDrone-SOT2018 and VisDrone-SOT2019. The challenge results, the collected videos, and the evaluation toolkit are available at http://aiskyeye.com/. By holding the VisDrone-SOT2020 challenge, we hope to provide the community with a dedicated platform for developing and evaluating drone-based tracking approaches.


Notes

  1. https://youtube-vos.org/dataset/vos/.

  2. https://davischallenge.org/.

References

  1. Ahn, N., Kang, B., Sohn, K.A.: Efficient deep neural network for photo-realistic image super-resolution. arXiv (2019)

  2. Kristan, M., et al.: The sixth visual object tracking VOT2018 challenge results. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 3–53. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_1

  3. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56

  4. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: ICIP (2016)

  5. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: ICCV (2019)

  6. Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: CVPR (2010)

  7. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: Eco: Efficient convolution operators for tracking. In: CVPR (2017)

  8. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: CVPR (2019)

  9. Danelljan, M., Gool, L.V., Timofte, R.: Probabilistic regression for visual tracking. In: CVPR (2020)

  10. Danelljan, M., Häger, G., Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: BMVC (2014)

  11. Danelljan, M., Hager, G., Shahbaz Khan, F., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: ICCV (2015)

  12. Danelljan, M., Robinson, A., Shahbaz Khan, F., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 472–488. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_29

  13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)

  14. Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 375–391. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_23

  15. Du, D., Wen, L., Qi, H., Huang, Q., Tian, Q., Lyu, S.: Iterative graph seeking for object tracking. TIP 27(4), 1809–1821 (2018)

  16. Du, D., et al.: VisDrone-SOT2019: the vision meets drone single object tracking challenge results. In: ICCVW (2019)

  17. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR (2019)

  18. Fan, H., Ling, H.: Parallel tracking and verifying: a framework for real-time and high accuracy visual tracking. In: ICCV (2017)

  19. Fan, H., Ling, H.: SANet: structure-aware network for visual tracking. In: CVPRW (2017)

  20. Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: CVPR (2019)

  21. Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: a benchmark for higher frame rate object tracking. In: ICCV (2017)

  22. Galoogahi, H.K., Fagg, A., Lucey, S.: Learning background-aware correlation filters for visual tracking. In: ICCV (2017)

  23. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)

  24. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  25. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. TPAMI 37(3), 583–596 (2015)

  26. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. TPAMI (2019)

  27. Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 816–832. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_48

  28. Jung, I., Son, J., Baek, M., Han, B.: Real-time MDNet. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 89–104. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_6

  29. Kristan, M., et al.: A novel performance evaluation methodology for single-target trackers. TPAMI 38(11), 2137–2155 (2016)

  30. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

  31. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR (2019)

  32. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR (2018)

  33. Li, F., Tian, C., Zuo, W., Zhang, L., Yang, M.H.: Learning spatial-temporal regularized correlation filters for visual tracking. In: CVPR (2018)

  34. Li, S., Yeung, D.Y.: Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models. In: AAAI (2017)

  35. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 254–265. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16181-5_18

  36. Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: algorithms and benchmark. TIP 24(12), 5630–5644 (2015)

  37. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  38. Liu, T., Wang, G., Yang, Q.: Real-time part-based visual tracking via adaptive correlation filters. In: CVPR (2015)

  39. Lukezic, A., et al.: CDTB: a color and depth visual object tracking dataset and benchmark. In: ICCV (2019)

  40. Lv, F., Lu, F., Wu, J., Lim, C.: MBLLEN: low-light image/video enhancement using CNNs. In: BMVC (2018)

  41. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: ICCV (2015)

  42. Marvasti-Zadeh, S.M., Khaghani, J., Ghanei-Yakhdan, H., Kasaei, S., Cheng, L.: COMET: context-aware IoU-guided network for small object tracking. arXiv (2020)

  43. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27

  44. Mueller, M., Smith, N., Ghanem, B.: Context-aware correlation filter tracking. In: CVPR (2017)

  45. Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19

  46. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR (2016)

  47. Real, E., Shlens, J., Mazzocchi, S., Pan, X., Vanhoucke, V.: YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: CVPR, pp. 7464–7473 (2017)

  48. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)

  49. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)

  50. Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. TPAMI 36(7), 1442–1468 (2014)

  51. Song, Y., et al.: VITAL: visual tracking via adversarial learning. In: CVPR (2018)

  52. Tao, R., Gavves, E., Smeulders, A.W.: Siamese instance search for tracking. In: CVPR (2016)

  53. Valmadre, J., et al.: Long-term tracking in the wild: a benchmark. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 692–707. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_41

  54. Voigtlaender, P., Luiten, J., Torr, P.H., Leibe, B.: Siam R-CNN: visual tracking by re-detection. In: CVPR (2020)

  55. Wang, G., Luo, C., Xiong, Z., Zeng, W.: SPM-Tracker: series-parallel matching for real-time visual object tracking. In: CVPR (2019)

  56. Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: CVPR (2019)

  57. Wen, L., et al.: VisDrone-SOT2018: the vision meets drone single-object tracking challenge results. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 469–495. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_28

  58. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: CVPR (2013)

  59. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. TPAMI 37(9), 1834–1848 (2015)

  60. Yan, B., Wang, D., Lu, H., Yang, X.: Alpha-Refine: boosting tracking performance by precise bounding box estimation. arXiv (2020)

  61. Yang, G., Ramanan, D.: Volumetric correspondence networks for optical flow. In: NeurIPS (2019)

  62. Ying, Z., Li, G., Ren, Y., Wang, R., Wang, W.: A new low-light image enhancement algorithm using camera response model. In: ICCVW (2017)

  63. Yuan, D., Fan, N., He, Z.: Learning target-focusing convolutional regression model for visual object tracking. Knowl.-Based Syst. (2020)

  64. Zhang, Y., Zhang, J., Guo, X.: Kindling the darkness: a practical low-light image enhancer. In: ACM MM (2019)

  65. Zhou, J., Wang, P., Sun, H.: Discriminative and robust online learning for Siamese visual tracking. In: AAAI (2020)

  66. Zhou, W., Wen, L., Zhang, L., Du, D., Luo, T., Wu, Y.: SiamMan: Siamese motion-aware network for visual tracking. CoRR abs/1912.05515 (2019)

  67. Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., Ling, H.: Vision meets drones: past, present and future. CoRR abs/2001.06303 (2020)

  68. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61876127 and Grant 61732011, and in part by the Natural Science Foundation of Tianjin under Grant 17JCZDJC30800.

Author information

Corresponding author: Pengfei Zhu.

A Descriptions of Submitted Trackers

In this appendix, we summarize the 13 trackers submitted to the VisDrone-SOT2020 Challenge, ordered according to the submission of their final results.

A.1 Strategy and Motion Integrated Long-Term Experts-Version 2 (SMILEv2)

Yuxuan Li, Zhongjian Huang and Biao Wang

liyuxuan_xidian@126.com, huangzj@stu.xidian.edu.cn, biaowang@webank.com

SMILEv2 combines three kinds of basic trackers integrated in our IPIU-tracking framework. In this framework, we can select different trackers in different situations in a semi-automatic way. As shown in Fig. 9, the framework has three parts: a prediction module, a tracking module, and a fix module. For the prediction module, we introduce a Kalman filter and the optical flow method of VCN [61] to provide object motion information and camera motion information, respectively. For the tracking module, we use three trackers: DiMP [5], SiamMask [56], and SORT-MOT [4]. For the fix module, we first obtain the outputs of the prediction and tracking modules and then determine the final result.

Fig. 9. The framework of SMILEv2.
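As a rough illustration of the object-motion part of the prediction module, the sketch below implements a constant-velocity Kalman filter over the box center. This is only an assumed form: the state layout, noise values, and class name are ours, not the authors', and the camera-motion branch based on VCN optical flow is omitted.

```python
# Minimal sketch (not the authors' code): a constant-velocity Kalman filter
# over the box center, as one way the prediction module could supply object
# motion information to the fix module. State = [cx, cy, vx, vy].
import numpy as np

class CenterKalman:
    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])           # state
        self.P = np.eye(4) * 10.0                       # state covariance
        self.F = np.array([[1, 0, 1, 0],                # constant-velocity model
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only the center is observed
        self.Q = np.eye(4) * 0.01                       # process noise (assumed)
        self.R = np.eye(2) * 1.0                        # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                               # predicted center for the next frame

    def update(self, cx, cy):
        z = np.array([cx, cy])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Usage: predict a center prior, run the trackers, then feed back the chosen box.
kf = CenterKalman(cx=320.0, cy=240.0)
prior = kf.predict()        # motion prior handed to the fix module
kf.update(325.0, 238.0)     # correct with the fused tracking result
```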

A.2 Long-Term Tracking with Night-Enhancement and Motion Integrated (LTNMI)

Yuting Yang, Yanjie Gao, Ruiyan Ma and Xin Hou

{ytyang_1,yjgao}@stu.xidian.edu.cn, 3028408083@qq.com, xinhou@webank.com

LTNMI is a combination of ATOM [8], SiamRPN++ [31], Siam R-CNN [54], and DiMP [5]. We combine ATOM and SiamRPN++ to obtain a better result, and our method then sets a lower reliability bound for these two systems under different confidence levels, which makes them more reliable, since different features play different roles in the tracking process according to their reliability. In addition, we improve the prediction in blurred scenes by using the SIFT algorithm to match features. By estimating motion, the regression boxes can keep tracking the target in case of occlusion. When encountering dark or low-resolution scenes, we apply threshold judgment and image brightness enhancement, using the MBLLEN [40] algorithm for low-light enhancement, and then use DiMP to obtain results on the enhanced sequences. Finally, we use Siam R-CNN to recover lost frames: when the overlap between the fused result and the result generated by Siam R-CNN is nearly 95%, we take the Siam R-CNN result because of its more accurate detection bounding box.
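The final fusion rule can be pictured with the minimal sketch below, which only assumes axis-aligned boxes and the roughly 95% overlap threshold mentioned above; the helper names and box format are illustrative, not the authors' code.

```python
# Minimal sketch (assumptions, not the authors' code): keep the Siam R-CNN box
# when it agrees almost exactly with the fused result, since its bounding box
# tends to be more precise.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_final_box(fused_box, siamrcnn_box, overlap_thr=0.95):
    # If both results agree almost exactly, trust the more precise Siam R-CNN box.
    if iou(fused_box, siamrcnn_box) >= overlap_thr:
        return siamrcnn_box
    return fused_box
```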

A.3 Ensemble of Classification and Matching Models with Alpha-Refine for UAV Tracking (ECMMAR)

Shuhao Chen, Zezhou Wang, Simiao Lai, Dong Wang and Huchuan Lu

{shuhaochn,zzwang}@mail.dlut.edu.cn, laisimiao1@gmail.com,

{wdice,lhchuan}@dlut.edu.cn

The ECMMAR tracker is improved from DiMP [5] and SiamRPN++ [31] with an online update module [65]. DiMP performs well in distinguishing distractors, while SiamRPN++ with the re-detection module performs well in re-detecting the target after it disappears due to full occlusion or fast viewpoint changes. The main modifications are: 1) an interactive mechanism developed to handle long-term tracking and improve robustness; 2) multi-scale search regions set to help re-detect the target when full occlusion or fast viewpoint changes happen; 3) a refinement module [60] used to refine the localized bounding box; 4) a low-light image enhancement method [62] employed to deal with low-light scenes; 5) fine-tuning of the SuperDiMP and Alpha-Refine pre-trained models on the VisDrone2020 dataset; 6) motion compensation applied when the camera viewing angle changes greatly; and 7) inertial motion added when both tracker results are unreliable.
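Modification 2) can be illustrated by the short sketch below, which generates multi-scale search regions around the last confident box; the scale factors and function name are assumptions chosen for illustration, not the authors' settings.

```python
# Minimal sketch (an assumption about how multi-scale search regions could be
# generated, not the authors' implementation): expand the search window around
# the last confident box with several scale factors when the target is lost.
def multiscale_search_regions(last_box, img_w, img_h, scales=(2.0, 4.0, 6.0)):
    """last_box: (cx, cy, w, h). Returns a list of (x1, y1, x2, y2) regions."""
    cx, cy, w, h = last_box
    regions = []
    for s in scales:
        rw, rh = w * s, h * s
        x1 = max(0.0, cx - rw / 2)
        y1 = max(0.0, cy - rh / 2)
        x2 = min(float(img_w), cx + rw / 2)
        y2 = min(float(img_h), cy + rh / 2)
        regions.append((x1, y1, x2, y2))
    return regions
```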

A.4 UAV Tracking with Extra Proposals Based on Corrected Velocity Prediction (CVP-superdimp)

Zitong Yi and Yanyun Zhao

{zitong.yi,zyy}@mail.dlut.edu.cn

CVP-superdimp is a robust tracking strategy for UAV tracking, designed especially for the difficult problems of strong camera motion and long-term full occlusion. The base tracker follows [5, 9] and contains two modules: an object classification module based on DiMP and a bounding box regression module based on PrDiMP. Our tracking strategy adds a new velocity prediction module for both short-term and long-term motion, which provides additional high-quality proposals for the tracker to search in the next frame.
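A minimal sketch of how such a velocity prediction module could generate extra proposals is given below; the window lengths and the constant-velocity extrapolation are assumptions, not the authors' exact formulation.

```python
# Minimal sketch (assumed form, not the authors' code): simple short-/long-term
# velocity estimates from the recent track, used to propose extra search
# centers for the next frame.
import numpy as np

def velocity_proposals(centers, short_win=3, long_win=15):
    """centers: list of (cx, cy) for past frames, newest last.
    Returns candidate centers for the next frame."""
    c = np.asarray(centers, dtype=float)
    proposals = [tuple(c[-1])]                      # default: stay at the last position
    for win in (short_win, long_win):
        if len(c) > win:
            v = (c[-1] - c[-1 - win]) / win         # average velocity over the window
            proposals.append(tuple(c[-1] + v))      # constant-velocity extrapolation
    return proposals

# Usage: feed each proposal as an additional search-region center to the base tracker.
print(velocity_proposals([(100, 100), (104, 102), (108, 104), (112, 106)]))
```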

A.5 LTCOMET: Context-Aware IoU-Guided Network for Small Object Tracking (LTCOMET)

Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Li Cheng, Hossein Ghanei-Yakhdan and Shohreh Kasaei

{mojtaba.marvasti,khaghani,lcheng5}@ualberta.ca,hghaneiy@yazd.ac.ir,

kasaei@sharif.edu

To bridge the gap between aerial-view tracking methods and modern trackers, the modified context-aware IoU-guided tracker (LTCOMET) is proposed, which exploits the offline reference proposal generation strategy (same as the COMET tracker [42]), the multitask two-stream network [42], Kindling the Darkness (KinD) [64], and the photo-realistic cascading residual network (PCARN) [1]. The network architecture is the same as [42] but without channel reduction after the multi-scale aggregation and fusion modules (MSAFs). KinD employs a network for light adjustment and degradation removal, which is used to preprocess the target patches in LTCOMET. LTCOMET also employs the generator network of PCARN to recover high-resolution patches of the target and its context from low-resolution ones. Furthermore, the proposed method uses a windowing search strategy when it loses the target. LTCOMET has been trained on a broad range of tracking datasets and exploits various photometric and geometric distortions (i.e., data augmentations) to increase the variability of the target regions.
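The windowing search strategy used after target loss might look like the sketch below, which scans the frame with overlapping windows sized relative to the last known target; the scale and overlap values are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch (an assumption about the windowed search strategy, not the
# authors' code): when the target is lost, scan the frame with overlapping
# windows sized relative to the last known target size and return them as
# candidate search regions.
def sliding_windows(img_w, img_h, target_w, target_h, scale=5.0, overlap=0.5):
    win_w, win_h = target_w * scale, target_h * scale
    step_x = max(1.0, win_w * (1.0 - overlap))
    step_y = max(1.0, win_h * (1.0 - overlap))
    windows = []
    y = 0.0
    while y < img_h:
        x = 0.0
        while x < img_w:
            windows.append((x, y, min(x + win_w, img_w), min(y + win_h, img_h)))
            x += step_x
        y += step_y
    return windows
```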

A.6 Discriminative and Robust Online Learning for Long Term Siamese Visual Tracking (DROL_LT)

Jinghao Zhou, Peng Wang, Haoyang Sun and Zikai Zhang

{jensen.zhoujh,zzkdemail}@gmail.com,{peng.wang,sunhaoyang}@gmail.com

DROL_LT is based on DROL [65]. DROL proposes an online module with an attention mechanism for offline Siamese networks to extract target-specific features under an L2 error, a filter update strategy adaptive to treacherous background noise for discriminative learning, and a template update strategy to handle large target deformations for robust learning. DROL_LT adds two modules to improve DROL in long-term tracking tasks. (1) A detector is added to help DROL recover targets that disappear and reappear many times; RoI Align is used to extract features from the mixed offline feature maps using the bounding boxes provided by the detector. (2) A mechanism is designed to help the tracker decide when to update the online classifier and when to use the detector, based on a set of empirically chosen thresholds.
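The switching mechanism in (2) can be summarized by the sketch below; the two thresholds and the action names are placeholders chosen for illustration, since the actual values are set from experience by the authors.

```python
# Minimal sketch (assumed thresholds and names, not the authors' code): decide,
# from the tracker confidence, whether to update the online classifier or to
# fall back to the detector for re-detection.
def drol_lt_step(score, update_thr=0.7, lost_thr=0.3):
    """Returns which actions to take this frame given the tracker confidence."""
    actions = {"update_online_classifier": False, "run_detector": False}
    if score >= update_thr:
        actions["update_online_classifier"] = True   # confident: safe to update
    elif score < lost_thr:
        actions["run_detector"] = True               # likely lost: try to re-detect
    # in between: keep tracking without updating, to avoid model drift
    return actions
```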

A.7 Discriminative and Robust Online Learning for Long Term Siamese Visual Tracking (DIMP-SiamRPN)

Zhipeng Luo, Penghao Zhang, Yubo Sun and Bin Dong

{luozp,zhangph,sunyb,Dongbin}@deepblueai.com

DIMP-SiamRPN is improved based on PrDiMP [9] and SiamRPN++ [31]. First, we use the number of frames to divide the videos in the challenge set into long-term and short-term videos. The short videos are tested with a hyper-parameter-tuned PrDiMP model to obtain the results. Daytime scenes in the long videos are tested with the SiamRPN++ model, in which we enlarge the instance size by 15 pixels every frame, with an upper limit of 1000 on the search threshold. In addition, when the target appears to be lost, we reset the center of the search scope to the center of the image. Furthermore, we define a make-up strategy to deal with occlusion. For night scenes in the long videos, we divide them into strong-light scenes and dark scenes according to the light intensity and use different inference parameters for each.
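A rough sketch of the routing and search-region schedule described above is given below; the long/short video threshold and the reading of the 1000 upper limit as a cap on the search size are our assumptions, not the authors' configuration.

```python
# Minimal sketch (illustrative values only, not the authors' configuration):
# route videos by length, grow the SiamRPN++ search size frame by frame up to
# an assumed upper bound, and re-center the search when the target seems lost.
def choose_tracker(num_frames, long_video_thr=1000):
    # long_video_thr is an assumed cutoff between short- and long-term videos
    return "SiamRPN++" if num_frames >= long_video_thr else "PrDiMP"

def next_instance_size(current_size, step=15, upper_limit=1000):
    # enlarge the instance (search) size by `step` pixels each frame, capped
    return min(current_size + step, upper_limit)

def search_center(last_center, target_lost, img_w, img_h):
    # when the target seems lost, reset the search center to the image center
    return (img_w / 2.0, img_h / 2.0) if target_lost else last_center
```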

A.8 Discriminative Model Prediction and Accurate Re-detection for Drone Tracking (DiMP_AR)

Xuefeng Zhu, Xiaojun Wu and Tianyang Xu

{xuefeng_zhu95,xiaojun_wu_jnu,tianyang_xu}@163.com

DiMP_AR is based on DiMP [5] with an added re-detection module. We use the DiMP tracker as a local tracker to predict the target state under normal conditions, and RT-MDNet [28] as a verifier to check DiMP's prediction. If the verification score is above a predefined threshold, normal local tracking continues in the next frame; otherwise, the re-detection module is activated. First, a Faster R-CNN detector [48] is used to detect highly likely target candidates in the whole image of the next frame. Then the SiamRPN++ [31] tracker searches the regions around these candidates. When the target is regained, we switch back to local tracking with DiMP.
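The verify-then-re-detect control flow can be sketched as below; the tracker, verifier, and detector objects are placeholders with assumed interfaces, so this only shows the decision logic, not real APIs.

```python
# Minimal sketch (control flow only; tracker/verifier/detector calls are
# placeholder interfaces, not real library APIs): the local-tracking /
# verification / re-detection loop described above.
def track_frame(frame, state, local_tracker, verifier, detector, siamrpn, thr=0.5):
    box = local_tracker.track(frame, state)           # DiMP local prediction
    if verifier.score(frame, box) >= thr:             # RT-MDNet verification
        return box, False                             # normal tracking continues
    # verification failed: re-detection on the whole frame
    candidates = detector.detect(frame)               # Faster R-CNN candidates
    for cand in candidates:
        refined, score = siamrpn.search(frame, cand)  # SiamRPN++ around each candidate
        if score >= thr:
            return refined, True                      # target regained
    return box, False                                 # keep the local prediction as fallback
```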

A.9 Precise Visual Tracking by Re-detection (PrSiamR-CNN)

Zhongzhou Zhang, Lei Zhang, Keyang Wang and Zhenwei He

{zz.zhang,leizhang,wangkeyang,hzw}@cqu.edu.cn

PrSiamR-CNN is modified from the recently proposed state-of-the-art single-object tracker Siam R-CNN [54] by using extra training data from VisDrone-SOT2020.

A.10 Discriminative Model Prediction with Deeper ResNet-101 (DiMP-101)

Liting Lin and Yong Xu

l.lt@mail.scut.edu.cn

DiMP-101 is based on the DiMP [5] model, adopting the deeper ResNet-101 as its backbone. With the higher learning capacity of this feature extraction network, the performance of the tracking algorithm is improved significantly.

A.11 ECO: Efficient Convolution Operators for Tracking (ECO)

Lei Pang

panglei2015@ia.ac.cn

ECO [7] is a discriminative correlation filter based tracker using deep features. The method introduces a factorized convolution operator and a compact generative model of the training sample distribution to reduce the number of model parameters. In addition, it proposes a conservative model update strategy with improved robustness and reduced complexity. More details can be found in [7].

A.12 Target-Focusing Convolutional Regression Tracking (TFCR)

Di Yuan, Nana Fan and Zhenyu He

dyuanhit@gmail.com

TFCR [63] is a target-focusing convolutional regression (CR) model for visual object tracking. The model uses a target-focusing loss function to alleviate the influence of background noise on the response map of the current frame, which effectively improves tracking accuracy. In particular, it effectively balances the disequilibrium of positive and negative samples by reducing some of the effects of the negative samples on the object appearance model.
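One plausible form of such a target-focusing weighting, sketched below, applies a Gaussian focus map to an L2 response-map loss; the exact loss in TFCR [63] may differ, so the weighting scheme and parameters should be treated as assumptions.

```python
# Minimal sketch (one plausible form, not the authors' exact loss): a weighted
# L2 regression loss over the response map that emphasizes positions near the
# target and down-weights background, which also rebalances positive/negative samples.
import numpy as np

def target_focusing_loss(response, label, center, sigma=5.0, bg_weight=0.2):
    """response, label: HxW arrays; center: (row, col) of the target."""
    h, w = response.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    focus = np.exp(-d2 / (2 * sigma ** 2))            # high near the target
    weight = bg_weight + (1.0 - bg_weight) * focus    # background gets a small weight
    return float(np.mean(weight * (response - label) ** 2))
```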

A.13 DDL-Tracker (DDL)

Yong Wang, Lu Ding, Dongjie Zhou and Wentao He

wangyong5@mail.sysu.edu.cn,dinglu@sjtu.edu.cn,13520071811@163.com, weishiinsky@126.com

DDL-tracker employs deep layers to extract features. Meanwhile, a HOG detector is trained online. If the tracking score falls below a threshold, we use the result given by the detector.
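The fallback rule can be written as the one-line sketch below; the threshold value and function name are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (assumed interface, not the authors' code): fall back to the
# online-trained HOG detector whenever the deep-feature tracker's score drops
# below a threshold.
def ddl_select(tracker_box, tracker_score, detector_box, thr=0.4):
    # use the detector's result when tracking confidence is too low
    return detector_box if tracker_score < thr else tracker_box
```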


Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Fan, H., et al. (2020). VisDrone-SOT2020: The Vision Meets Drone Single Object Tracking Challenge Results. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. LNCS, vol. 12538. Springer, Cham. https://doi.org/10.1007/978-3-030-66823-5_44

  • DOI: https://doi.org/10.1007/978-3-030-66823-5_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66822-8

  • Online ISBN: 978-3-030-66823-5