Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks

Yang, Longchao; Jiang, Peilin; Wang, Fei; Wang, Xuan

doi:10.1007/s11042-018-5664-7

Published: 13 April 2018

Volume 77, pages 22131–22143, (2018)
Multimedia Tools and Applications

Longchao Yang¹, Peilin Jiang², Fei Wang¹ & Xuan Wang¹

Abstract

Robust visual object tracking under occlusion and deformation remains a very challenging task. Existing trackers based on Convolutional Neural Networks (CNNs) either fail to handle these conditions or run only at low speed. In this paper, we present a real-time tracker that is robust to occlusions and deformations, based on a Region-based, Multi-Scale Fully Convolutional Siamese Network (R-MSFCN). In the proposed R-MSFCN, region information is extracted separately by introducing position-sensitive score maps on multiple convolutional layers. Combining these score maps via adaptive weights yields an accurate location of the target in a new frame. Experiments show that our method outperforms state-of-the-art approaches and handles object deformation and occlusion at about 31 FPS.
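
The fusion step described in the abstract, combining several score maps with adaptive weights and reading off the target location, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the weights are computed, so the peak-sharpness weighting below is an assumption, and the function names `adaptive_weights` and `locate_target` are hypothetical. The score maps are assumed to be already resampled to a common resolution; the Siamese network and the position-sensitive maps themselves are not reproduced.

```python
import numpy as np


def adaptive_weights(score_maps, eps=1e-8):
    """Assumed weighting (not from the paper): each map is weighted by its
    peak sharpness, (max - mean) / std, normalised so the weights sum to 1."""
    sharpness = np.array(
        [(m.max() - m.mean()) / (m.std() + eps) for m in score_maps]
    )
    # Guard against tiny negative values caused by floating-point rounding.
    sharpness = np.clip(sharpness, 0.0, None)
    total = sharpness.sum()
    if total <= eps:
        # All maps are flat: fall back to uniform weights.
        return np.full(len(score_maps), 1.0 / len(score_maps))
    return sharpness / total


def locate_target(score_maps):
    """Fuse same-sized 2-D score maps with the weights above and return the
    (row, col) of the fused maximum together with the weights used."""
    weights = adaptive_weights(score_maps)
    fused = sum(w * m for w, m in zip(weights, score_maps))
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return (int(row), int(col)), weights


if __name__ == "__main__":
    # Three toy 5x5 maps: a sharp peak, a weaker peak with a distractor,
    # and a flat map standing in for an uninformative (e.g. occluded) region.
    sharp = np.zeros((5, 5))
    sharp[2, 3] = 1.0
    noisy = np.zeros((5, 5))
    noisy[2, 3] = 0.5
    noisy[0, 0] = 0.4
    flat = np.full((5, 5), 0.2)

    position, weights = locate_target([sharp, noisy, flat])
    print("estimated position:", position)
    print("weights:", weights)
```

Under this assumed weighting, a flat map receives a weight close to zero, so a region that carries no evidence barely influences the fused estimate. Any other confidence measure could be substituted in `adaptive_weights` without changing `locate_target`.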

Acknowledgments

This work was supported in part by the Natural Science Foundation of China (No. 61231018), the National Science and Technology Support Program (2015BAH31F01), and the Program of Introducing Talents of Discipline to University under grant B13043.

Author information

Authors and Affiliations

Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, 28 Xianning Road, Xi’an, China
Longchao Yang, Fei Wang & Xuan Wang
School of Software Engineering, Xi’an Jiaotong University, 28 Xianning Road, Xi’an, China
Peilin Jiang

Corresponding author

Correspondence to Longchao Yang.

About this article

Cite this article

Yang, L., Jiang, P., Wang, F. et al. Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks. Multimed Tools Appl 77, 22131–22143 (2018). https://doi.org/10.1007/s11042-018-5664-7

Received: 20 September 2017
Revised: 15 November 2017
Accepted: 14 January 2018
Published: 13 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5664-7
