Abstract
Robust visual object tracking under occlusion and deformation remains a very challenging task. Existing trackers based on Convolutional Neural Networks (CNNs) either fail to handle these cases or run only at low speed. In this paper, we present a real-time tracker that is robust to occlusions and deformations, based on a Region-based, Multi-Scale Fully Convolutional Siamese Network (R-MSFCN). In the proposed R-MSFCN, the information of each region is extracted separately by introducing position-sensitive score maps on multiple convolutional layers. Combining these score maps via adaptive weights yields an accurate location of the target in a new frame. The experiments illustrate that our method outperforms state-of-the-art approaches and can handle object deformation and occlusion at about 31 FPS.
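The fusion step described in the abstract (combining score maps from several layers via adaptive weights, then locating the target) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are computed, so the softmax normalization, the function names `softmax`, `fuse_score_maps`, and `locate_peak`, and the assumption that all maps have already been resampled to a common size are hypothetical choices made only for illustration.

```python
import math


def softmax(values):
    """Turn raw per-map scores into positive weights that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_score_maps(score_maps, logits):
    """Weighted sum of equally sized 2-D score maps (lists of rows).

    Assumption: one raw weight (logit) per map; how the paper derives
    its adaptive weights is not stated in the abstract.
    """
    weights = softmax(logits)
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for w, smap in zip(weights, score_maps):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * smap[r][c]
    return fused


def locate_peak(fused):
    """Return the (row, col) of the highest fused response."""
    positions = ((r, c) for r in range(len(fused)) for c in range(len(fused[0])))
    return max(positions, key=lambda rc: fused[rc[0]][rc[1]])


# Toy maps standing in for responses from a shallow and a deep layer.
shallow = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]]
deep = [[0.2, 0.1, 0.0], [0.1, 0.3, 1.0], [0.0, 0.1, 0.2]]

fused = fuse_score_maps([shallow, deep], logits=[0.0, 2.0])
print(locate_peak(fused))
```

In this toy setup the map with the larger weight dominates the fused response, so changing the logits moves the predicted location from one map's peak to the other's.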
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (No. 61231018), the National Science and Technology Support Program (2015BAH31F01), and the Program of Introducing Talents of Discipline to University under grant B13043.
Cite this article
Yang, L., Jiang, P., Wang, F. et al. Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks. Multimed Tools Appl 77, 22131–22143 (2018). https://doi.org/10.1007/s11042-018-5664-7
DOI: https://doi.org/10.1007/s11042-018-5664-7