Abstract
Most existing tracking methods based on convolutional neural network (CNN) models are too slow for real-time applications despite their excellent tracking accuracy compared with traditional methods. Moreover, CNN tracking solutions are memory intensive and require considerable computational resources. In this paper, we propose a time-efficient and accurate tracking scheme: a feature selection accelerated CNN (FSNet) tracker built on MDNet (Multi-Domain Network). The large number of convolutional operations is a major contributor to the high computational cost of MDNet. To reduce this complexity, we incorporate an efficient mutual-information-based feature selection over the convolutional layers that reduces redundancy in the feature maps. Because tracking is a typical binary classification problem, the redundant feature maps can simply be pruned, with negligible influence on tracking performance. To further accelerate the tracker, a RoIAlign layer is added so that convolution is applied to the entire image rather than separately to each RoI (Region of Interest); the bilinear interpolation used in RoIAlign also reduces misalignment errors for the tracked target. In addition, a new fine-tuning strategy for the fully-connected layers accelerates the online updating process. By combining these strategies, the accelerated network runs at 60 FPS (frames per second) on a GPU, compared with the original MDNet, which runs at 1 FPS, with only a very small loss in tracking accuracy. We evaluated the proposed solution on four benchmarks: OTB50, OTB100, VOT2016 and UAV123. Extensive comparison results verify the superior performance of FSNet.
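To make the two acceleration ideas summarized above concrete, the sketch below illustrates (i) scoring convolutional feature-map channels by their mutual information with the binary target/background label and keeping only the most informative ones, and (ii) extracting per-candidate features with a RoIAlign layer so the backbone convolution runs only once on the whole frame. This is a minimal sketch under stated assumptions, not the paper's implementation: the global-average-pooling summary, the keep ratio, the 3x3 output size, and the function names are illustrative choices.

```python
# Minimal sketch (not the authors' code) of mutual-information-based
# channel pruning and RoIAlign-based candidate feature extraction.
import numpy as np
import torch
from sklearn.feature_selection import mutual_info_classif
from torchvision.ops import roi_align


def select_channels(feature_maps: torch.Tensor,
                    labels: np.ndarray,
                    keep_ratio: float = 0.5) -> np.ndarray:
    """Return indices of the most informative feature-map channels.

    feature_maps: (N, C, H, W) conv features of N candidate samples.
    labels:       (N,) binary labels, 1 = target, 0 = background.
    """
    # Summarize each channel with global average pooling -> (N, C).
    responses = feature_maps.mean(dim=(2, 3)).cpu().numpy()
    # Mutual information between each channel response and the label.
    mi = mutual_info_classif(responses, labels)
    k = max(1, int(keep_ratio * feature_maps.shape[1]))
    # Indices of the k highest-scoring channels (copy() makes them contiguous).
    return np.argsort(mi)[::-1][:k].copy()


def candidate_features(frame_features: torch.Tensor,
                       boxes: torch.Tensor,
                       spatial_scale: float) -> torch.Tensor:
    """Pool features for all candidate boxes from one whole-frame pass.

    frame_features: (1, C, H, W) features of the entire frame.
    boxes:          (K, 5) rows of (batch_index, x1, y1, x2, y2) in image coordinates.
    """
    # Bilinear sampling inside each RoI avoids the quantization
    # misalignment of hard RoI pooling.
    return roi_align(frame_features, boxes, output_size=(3, 3),
                     spatial_scale=spatial_scale, sampling_ratio=2)
```

In such a scheme, pruning would simply index the kept channels (e.g., `feature_maps[:, kept_idx]`) before the fully-connected classifier, which is where the saving in convolutional and fully-connected computation comes from.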
Acknowledgements
This work is supported by the National Key R&D Program of China (2018AAA0101501) and the Science and Technology Project of SGCC (State Grid Corporation of China): Fundamental Theory of Human-in-the-Loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.
Cite this article
Cui, Z., Lu, N. Feature selection accelerated convolutional neural networks for visual tracking. Appl Intell 51, 8230–8244 (2021). https://doi.org/10.1007/s10489-021-02234-4