
An Ensemble of Complementary Models for Deep Tracking

Published in: Cognitive Computation

Abstract

Convolutional neural networks (CNNs) have shown favorable performance on recent tracking benchmark datasets. Some methods extract different levels of features from a pre-trained CNN to deal with various challenging scenarios. Despite these demonstrated successes in visual tracking, using features from a single network can yield suboptimal performance due to the limitations of that architecture itself. We observe that different CNNs usually have complementary characteristics in representing target objects. In this paper, we therefore propose to leverage the complementary properties of different CNNs for visual tracking. The importance of each CNN is identified by jointly inferring the candidate location, the predicted location, and the confidence score, and the prediction scores of all CNNs are adaptively fused to obtain robust tracking performance. Moreover, we introduce an attention mechanism to highlight discriminative features in each CNN. Experimental results on the OTB2013 and OTB2015 datasets show that the proposed method performs favorably against state-of-the-art methods. We conclude that a combination of complementary models tracks objects better in terms of both accuracy and robustness.
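The full method is behind the paywall, but the abstract names two ingredients: per-model weights derived jointly from each CNN's candidate location, predicted location, and confidence score, and a channel-attention mechanism that highlights discriminative features. The sketch below illustrates one plausible reading of that pipeline in NumPy; the function names (channel_attention, fuse_scores), the squeeze-and-excitation-style gating, the Gaussian motion prior, and all parameter values are our own assumptions, not the authors' implementation.

```python
import numpy as np

def channel_attention(feat, reduction=4, seed=0):
    """Squeeze-and-excitation-style channel attention (illustrative only).

    feat: array of shape (C, H, W), a feature map from one CNN.
    Returns the map with each channel reweighted by a sigmoid gate;
    the bottleneck weights here are random stand-ins for trained ones.
    """
    c = feat.shape[0]
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1   # squeeze
    w2 = rng.standard_normal((c, c // reduction)) * 0.1   # excite
    pooled = feat.reshape(c, -1).mean(axis=1)             # global average pool
    hidden = np.maximum(w1 @ pooled, 0.0)                 # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))           # per-channel sigmoid
    return feat * gate[:, None, None]

def fuse_scores(score_maps, confidences, prev_center, sigma=10.0):
    """Adaptively fuse response maps from several CNNs.

    Each model's weight combines its confidence score with how far its
    peak response lies from the previous target center, so unconfident
    or drifting models contribute less to the fused map.
    """
    weights = []
    for smap, conf in zip(score_maps, confidences):
        peak = np.array(np.unravel_index(np.argmax(smap), smap.shape))
        dist = np.linalg.norm(peak - np.asarray(prev_center))
        weights.append(conf * np.exp(-dist**2 / (2.0 * sigma**2)))
    weights = np.asarray(weights)
    weights /= weights.sum() + 1e-12                      # normalize
    fused = sum(w * s for w, s in zip(weights, score_maps))
    return fused, weights

# Toy usage: three "models" each produce a 50x50 response map.
rng = np.random.default_rng(1)
attended = channel_attention(rng.random((16, 50, 50)))
maps = [rng.random((50, 50)) for _ in range(3)]
fused, w = fuse_scores(maps, confidences=[0.9, 0.6, 0.3], prev_center=(25, 25))
print("fusion weights:", np.round(w, 3))
print("fused peak:", np.unravel_index(np.argmax(fused), fused.shape))
```

In this reading, a model whose peak response drifts far from the previous target center, or whose confidence is low, is down-weighted before the responses are summed, which is one simple way to realize the joint inference of candidate location, predicted location, and confidence score that the abstract describes.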




Acknowledgements

This work was supported by the Shenzhen Basic Research Program (No. JCYJ20170817155854115) and the National Natural Science Foundation of China (No. 61976003).

Author information


Corresponding author

Correspondence to Xin Wang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflicts of Interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Kong, Q., Tang, J., Li, C. et al. An Ensemble of Complementary Models for Deep Tracking. Cogn Comput 14, 1096–1106 (2022). https://doi.org/10.1007/s12559-021-09864-3

