Abstract
Convolutional neural networks (CNNs) have shown favorable performance on recent tracking benchmarks. Some methods extract different levels of features from a single pre-trained CNN to handle various challenging scenarios. Despite demonstrated successes in visual tracking, relying on features from the same network may yield suboptimal performance owing to the limitations of that CNN architecture itself. We observe that different CNNs usually have complementary characteristics in representing target objects. In this paper, we therefore propose to leverage the complementary properties of different CNNs for visual tracking. The importance of each CNN is identified by jointly inferring the candidate location, the predicted location, and the confidence score, and the prediction scores of all CNNs are then adaptively fused to obtain robust tracking performance. Moreover, we introduce an attention mechanism to highlight discriminative features within each CNN. Experimental results on the OTB2013 and OTB2015 datasets show that the proposed method performs favorably against state-of-the-art methods. We conclude that a combination of complementary models can track objects more accurately and robustly.
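The abstract describes adaptively fusing the prediction scores of several CNNs according to per-network importance, but gives no implementation details. As a minimal illustrative sketch (not the paper's actual method), the following assumes each CNN produces a 2D response map over the search region together with a scalar confidence score, and fuses the maps with softmax-normalized weights; the function names `fuse_response_maps` and `predict_location` are hypothetical:

```python
import numpy as np

def fuse_response_maps(response_maps, confidences):
    """Fuse per-CNN response maps with softmax-normalized confidence
    weights -- a simplified stand-in for the paper's joint inference
    over candidate location, predicted location and confidence score."""
    conf = np.asarray(confidences, dtype=float)
    weights = np.exp(conf - conf.max())   # subtract max for stability
    weights /= weights.sum()              # weights sum to 1
    fused = sum(w * r for w, r in zip(weights, response_maps))
    return fused, weights

def predict_location(fused_map):
    """Predicted target location = argmax of the fused response map."""
    return np.unravel_index(np.argmax(fused_map), fused_map.shape)
```

A CNN with a higher confidence score thus contributes more to the fused map, so a network that represents the current target well dominates the location estimate while weaker networks still vote.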
Acknowledgements
This work is supported by Shenzhen Basic Research Program (No. JCYJ20170817155854115) and National Natural Science Foundation of China (No. 61976003).
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflicts of Interest
The authors declare that they have no conflict of interest.
Cite this article
Kong, Q., Tang, J., Li, C. et al. An Ensemble of Complementary Models for Deep Tracking. Cogn Comput 14, 1096–1106 (2022). https://doi.org/10.1007/s12559-021-09864-3