
Real-time tracking based on deep feature fusion

Published in: Multimedia Tools and Applications

Abstract

Deep learning-based methods have recently attracted significant attention in the visual tracking community, leading to an increase in state-of-the-art tracking performance. However, because these methods rely on more complex models, the gain in accuracy has come with a drop in speed. Real-time tracking applications require a careful balance between performance and speed. We propose a real-time tracking method based on deep feature fusion, which combines deep learning with kernel correlation filters. First, hierarchical features are extracted from a lightweight pre-trained convolutional neural network. Then, original features from different levels are fused using canonical correlation analysis. The fused features, together with some of the original deep features, are fed into three kernel correlation filters to track the target. To improve robustness to target appearance changes, we propose an adaptive update strategy based on a dispersion analysis of the correlation filters' response maps, and different update frequencies are adopted for the three filters to cope with severe appearance changes. We perform extensive experiments on two benchmarks, OTB-50 and OTB-100. Quantitative and qualitative evaluations show that the proposed tracker performs favorably against state-of-the-art methods, even outperforming algorithms that use complex network models. Furthermore, the proposed algorithm runs at more than 20 frames per second (FPS) and therefore achieves near real-time tracking.
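To make the two core ideas of the abstract concrete, the sketch below illustrates (i) CCA-based fusion of features taken from two CNN levels and (ii) a dispersion-gated filter update. This is a minimal illustration and not the authors' implementation: the helper names (fuse_features, psr, should_update), the sum-based fusion of the projected features, the use of the peak-to-sidelobe ratio as the dispersion measure, and the threshold value are all assumptions made for this example.

```python
# Minimal sketch (not the authors' code): CCA feature fusion and a
# dispersion-gated update test for a correlation-filter tracker.
import numpy as np
from sklearn.cross_decomposition import CCA


def fuse_features(shallow, deep, n_components=64):
    """Fuse two CNN feature sets via canonical correlation analysis.

    shallow: array of shape (n_samples, d1), deep: (n_samples, d2).
    n_components must not exceed min(n_samples, d1, d2).
    """
    cca = CCA(n_components=n_components)
    cca.fit(shallow, deep)
    s_proj, d_proj = cca.transform(shallow, deep)
    # Parallel fusion: sum the projected features (concatenation is an
    # alternative fusion scheme).
    return s_proj + d_proj


def psr(response):
    """Peak-to-sidelobe ratio, one common dispersion measure of a
    correlation-filter response map (the paper's exact measure may differ)."""
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - 5):py + 6, max(0, px - 5):px + 6] = False  # exclude peak region
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)


def should_update(response, threshold=6.0):
    """Skip the model update when the response map is too dispersed (low PSR),
    which typically signals occlusion or a drastic appearance change."""
    return psr(response) >= threshold
```

In a tracker loop, a filter's model would only be refreshed on frames where should_update returns True, and each of the three filters could use its own threshold or update interval.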



Acknowledgements

This research was supported in part by the National Science Foundation of China (61671365) and the Joint Foundation of Ministry of Education of China (6141A02022344).

Author information


Corresponding author

Correspondence to Fan Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Pang, Y., Li, F., Qiao, X. et al. Real-time tracking based on deep feature fusion. Multimed Tools Appl 79, 27229–27255 (2020). https://doi.org/10.1007/s11042-020-09267-w
