Abstract
Visual tracking is one of the most challenging problems in computer vision. Most state-of-the-art visual trackers suffer from three problems: discriminative feature representations that lack diversity, coarse object localization, and a limited number of positive samples. In this paper, we propose a multi-view multi-expert region proposal prediction algorithm for tracking that addresses these three problems together in one framework. The proposed algorithm integrates multiple views and exploits multiple sources of information, which effectively mitigates the lack of diversity in the discriminative feature representation. It builds multiple SVM classifier models on expanded bounding boxes and adds a region proposal network module to refine the predicted object location, which alleviates both the coarse localization problem and the shortage of positive samples. We comprehensively evaluate the proposed approach on various benchmark sequences. The results show that combining the advantages of a lightweight region proposal network predictive learning model with multi-view expert groups significantly improves tracking performance, and that the proposed approach outperforms other state-of-the-art visual trackers.










Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful comments on an earlier draft of this paper. This work was supported in part by the National Natural Science Foundation of China under Grant 62072286 and Grant 61572296.
Additional information
Communicated by B. Bao.
About this article
Cite this article
Guo, W., Li, D., Liang, B. et al. Multi-view region proposal network predictive learning for tracking. Multimedia Systems 29, 333–346 (2023). https://doi.org/10.1007/s00530-022-01001-w
DOI: https://doi.org/10.1007/s00530-022-01001-w