Skip to main content
Log in

Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Robust visual object tracking against occlusions and deformations is still very challenging task. To tackle these issues, existing Convolutional Neural Networks (CNNs) based trackers either fail to handle them or can just run in low speed. In this paper, we present a realtime tracker which is robust to occlusions and deformations based on a Region-based, Multi-Scale Fully Convolutional Siamese Network (R-MSFCN). In the proposed R-MSFCN, the information of regions is extracted separately by the proposition of position-sensitive score maps on multiple convolutional layers. Combining these score maps via adaptive weights leads to accurate location of the target on a new frame. The experiments illustrate that our method outperforms state-of-the-art approaches, and can handle the cases of object deformation and occlusion at about 31 FPS.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Ahuja N, Liu S, Ghanem B, Zhang T (2012) Robust visual tracking via multi-task sparse learning. In: CVPR, pp 2042–2049

  2. Bertinetto L, Valmadre J, Golodetz S, Miksik O, Torr P (2016) Staple: complementary learners for real-time tracking. Comput Sci 38(2):311–323

    Google Scholar 

  3. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr P (2016) Fully-convolutional siamese networks for object tracking. arXiv:1606.09549

  4. Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. TPAMI

  5. Danelljan M, Hager G, Khan FS, Felsberg M (2014) Accurate scale estimation for robust visual tracking. In: BMVC

  6. Danelljan M, Hager G, Khan FS, Felsberg M (2016) Adaptive decontamination of the training set: a unified formulation for discriminative visual tracking. In: CVPR

  7. Danelljan M, Robinson A, Khan FS, Felsberg M (2016) Beyond correlation filters: learning continuous convolution operators for visual tracking. In: ECCV

  8. Hare S, Saffari A, Torr PHS (2016) Struck: structured output tracking with kernels. TPAMI 38(10):263–270

    Article  Google Scholar 

  9. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: ECCV

  10. Henriques JF, Rui C, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. TPAMI 37(3):583–596

    Article  Google Scholar 

  11. Jifeng D, Yi L, Kaiming H, Jian S (2016) R-FCN: object detection via region-based fully convolutional networks. arXiv:1605.06409

  12. Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. TPAMI 34(7):1409–22

    Article  Google Scholar 

  13. Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R (2016) The visual object tracking vot2015 challenge results. In: ICCV, pp 564–586

  14. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25(2):2012

    Google Scholar 

  15. Li Y, Qi H, Dai J, Ji X, Wei Y (2016) Fully convolutional instance-aware semantic segmentation. arXiv preprint arXiv:1611.07709

  16. Liu T, Wang G, Yang Q (2015) Real-time part-based visual tracking via adaptive correlation filters. In: CVPR, pp 4902–4912

  17. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR, pp 3431–3440

  18. Ma C, Yang X, Zhang C, Yang MH (2015) Long-term correlation tracking. In: CVPR, pp 5388–5396

  19. Nam H, Han B (2015) Learning multi-domain convolutional neural networks for visual tracking. arXiv preprint arXiv:1510.07945

  20. Nam H, Baek M, Han B (2016) Modeling and propagating cnns in a tree structure for visual tracking. arXiv:1608.07242

  21. Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: ICCV, pp 1520–1528

  22. Pinheiro PO, Collobert R, Dollar P (2015) Learning to segment object candidates. Comput Sci: 1990–1998

  23. Qi Y, Zhang S, Qin L, Yao H, Huang Q, Lim J, Yang MH (2016) Hedged deep tracking. In: CVPR, pp 4303–4311

  24. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

    Article  Google Scholar 

  25. Ross DA, Lim J, Lin RS, Yang MH (2008) Incremental learning for robust visual tracking. IJCV 77(1):125–141

    Article  Google Scholar 

  26. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. IJCV 115(3):211–252

    Article  MathSciNet  Google Scholar 

  27. Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: CVPR, pp 1420–1429

  28. Wang L, Ouyang W, Wang X, Lu H (2016) Visual tracking with fully convolutional networks. In: ICCV, pp 3119–3127

  29. Wu Y, Lim J, Yang MH (2013) Online object tracking: a benchmark. In: CVPR, pp 2411–2418

  30. Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. TPAMI 37 (9):1–1

    Article  Google Scholar 

  31. Xiang W, Zhou Y (2014) Part-based tracking with appearance learning and structural constrains. In: ICONIP. Springer, Berlin, pp 594–601

  32. Yao R, Shi Q, Shen C, Zhang Y (2013) Part-based visual tracking with online latent structural learning. In: CVPR, pp 2363–2370

  33. Zhang T, Jia K, Xu C, Ma Y, Ahuja N (2014) Partial occlusion handling for visual tracking via robust part matching. In: ICCV, pp 1258–1265

  34. Zhao H, Shi J, Qi X, Wang X, Jia J (2016) Pyramid scene parsing network. arXiv:1612.01105

Download references

Acknowledgments

This work was supported in part by Natural Science Foundation of China (No.61231018), National Science and Technology Support Program (2015BAH31F01) and Program of Introducing Talents of Discipline to University under grant B13043.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Longchao Yang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yang, L., Jiang, P., Wang, F. et al. Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks. Multimed Tools Appl 77, 22131–22143 (2018). https://doi.org/10.1007/s11042-018-5664-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-018-5664-7

Keywords

Navigation