Abstract
Deep learning-based methods have recently attracted significant attention in the visual tracking community, leading to an increase in state-of-the-art tracking performance. However, because these methods rely on more complex models, the gain has been accompanied by a decrease in speed. For real-time tracking applications, a careful balance of performance and speed is required. We propose a real-time tracking method based on deep feature fusion, which combines deep learning with kernel correlation filters. First, hierarchical features are extracted from a lightweight pre-trained convolutional neural network. Then, the original features of different levels are fused using canonical correlation analysis. The fused features, together with some of the original deep features, are used in three kernel correlation filters to track the target. An adaptive update strategy, based on dispersion analysis of the correlation filters' response maps, is proposed to improve robustness to target appearance changes. Different update frequencies are adopted for the three filters to adapt to severe appearance changes. We perform extensive experiments on two benchmarks: OTB-50 and OTB-100. Quantitative and qualitative evaluations show that the proposed tracking method performs favorably against some state-of-the-art methods, and even better than algorithms that use complex network models. Furthermore, the proposed algorithm runs faster than 20 frames per second (FPS) and hence achieves near real-time tracking.
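To make the fusion step concrete, the following is a minimal sketch of canonical correlation analysis applied to two feature sets, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the projected features are combined, so summation of the two projections is assumed here, and the function name `cca_fuse`, the parameters `dim` and `reg`, and the (samples × channels) layout of the features are all hypothetical.

```python
import numpy as np


def cca_fuse(X, Y, dim, reg=1e-6):
    """Fuse two feature sets with canonical correlation analysis (sketch).

    X: (n_samples, p) features from one network level (assumed layout).
    Y: (n_samples, q) features from another level (assumed layout).
    dim: number of canonical components to keep (hypothetical parameter).
    reg: small ridge term that keeps the covariances invertible (assumption).
    Returns the fused features (n_samples, dim) and the canonical correlations.
    """
    # Center both feature sets.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized within-set covariances and the between-set covariance.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Sxx_is = inv_sqrt(Sxx)
    Syy_is = inv_sqrt(Syy)

    # The singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is, full_matrices=False)

    # Projection matrices for each feature set.
    Wx = Sxx_is @ U[:, :dim]
    Wy = Syy_is @ Vt.T[:, :dim]

    # Summation fusion of the two projections (assumed fusion rule).
    fused = X @ Wx + Y @ Wy
    return fused, s[:dim]
```

Concatenating `X @ Wx` and `Y @ Wy` instead of summing them is the other common fusion rule; it doubles the fused dimension, which would raise the cost of the correlation filters that consume the result.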
Acknowledgements
This research was supported in part by the National Science Foundation of China (61671365) and the Joint Foundation of Ministry of Education of China (6141A02022344).
Cite this article
Pang, Y., Li, F., Qiao, X. et al. Real-time tracking based on deep feature fusion. Multimed Tools Appl 79, 27229–27255 (2020). https://doi.org/10.1007/s11042-020-09267-w