Guided MDNet tracker with guided samples

The Visual Computer

Abstract

Visual tracking is the process of estimating the position of an object across a video sequence and plays an important role in automated video processing. Recent work shows that trackers built on deep learning techniques such as convolutional neural networks (CNNs) outperform other state-of-the-art trackers in both accuracy and robustness. The multi-domain convolutional neural network (MDNet) is a deep tracker that uses a CNN to estimate the target in each frame of a video sequence. MDNet handles most tracking challenges well thanks to its combination of offline training and online tracking. The offline training stage captures target representations in the shared layers of the CNN, while the online stage draws a large number of random candidate bounding boxes around the previous target location to estimate the target in the current frame. Once the target is estimated, a fine-tuning step updates the weights of the shared layers of the CNN. The large number of random candidates used for target estimation, together with the large number of random training samples generated for online fine-tuning, makes tracking with MDNet computationally complex and slow. The major contribution of this paper is to feed guided samples, rather than random samples, to the input of the CNN. The proposed tracker also generates fewer but more effective training samples for fine-tuning, which halves the computational complexity of the tracker without much compromise in performance and thus speeds up tracking. The proposed Guided MDNet is evaluated extensively on the ALOV300++, OTB and VOT datasets, and its performance is measured in terms of F-score, one-pass evaluation, robustness and accuracy.
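The abstract does not spell out the guidance mechanism itself, so the following Python sketch is illustrative only: it contrasts MDNet-style random Gaussian sampling around the previous target with a hypothetical guided variant that centres fewer, tighter samples on a motion-predicted location. The function names, the velocity-based guidance and all parameter values are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def random_candidates(prev_box, n=256, pos_scale=0.1, scale_step=1.05):
        # MDNet-style sampling: draw n candidate boxes by Gaussian
        # perturbation of the previous target state (cx, cy, w, h).
        cx, cy, w, h = prev_box
        noise = np.random.randn(n, 4)
        boxes = np.empty((n, 4))
        boxes[:, 0] = cx + pos_scale * w * noise[:, 0]  # perturb centre x
        boxes[:, 1] = cy + pos_scale * h * noise[:, 1]  # perturb centre y
        boxes[:, 2] = w * scale_step ** noise[:, 2]     # mild width change
        boxes[:, 3] = h * scale_step ** noise[:, 3]     # mild height change
        return boxes

    def guided_candidates(prev_box, velocity, n=128, pos_scale=0.05):
        # Hypothetical guided sampling: centre a tighter proposal
        # distribution on a motion-predicted location, so fewer candidates
        # suffice to cover the likely target position.
        cx, cy, w, h = prev_box
        vx, vy = velocity  # e.g. target displacement over the last two frames
        predicted_box = (cx + vx, cy + vy, w, h)
        return random_candidates(predicted_box, n=n, pos_scale=pos_scale)

Scoring half as many, better-placed candidates per frame, and generating correspondingly fewer fine-tuning samples, is what the abstract credits for roughly halving the tracker's computational cost.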


References

  1. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4293–4302 (2016)

  2. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 34(7), 1409–1422 (2012)

  3. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 37(3), 583–596 (2015)

  4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Conference on Neural Information Processing Systems (NIPS) (2012)

  5. Ciresan, D.C., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  6. Kristan, M., Matas, J., et al.: The visual object tracking VOT2015 challenge results. In: ICCV 2015 Workshops (2015)

  7. Zhang, K., Liu, Q., Yang, M.H.: Robust visual tracking via convolutional networks without training. IEEE Trans. Image Process. 25(4) (2016)

  8. Jepson, A., Fleet, D., El-Maraghi, T.: Robust on-line appearance models for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 25(10), 1296–1311 (2003)

  9. Adam, A., Rivlin, E., Shimshoni, I.: Robust fragments based tracking using the integral histogram. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 798–805 (2006)

  10. Briechle, K., Hanebeck, U.D.: Template matching using fast normalized cross correlation. In: Aerospace/Defense Sensing, Simulation, and Controls. International Society for Optics and Photonics, pp. 95–102 (2001)

  11. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: International Joint Conference on Artificial Intelligence (IJCAI), vol. 81, pp. 674–679 (1981)

  12. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental learning for robust visual tracking. Int. J. Comput. Vis. (IJCV) 77(1–3), 125–141 (2008)

  13. Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(7), 1442–1468 (2014)

  14. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 37(3), 583–596 (2015)

  15. Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(7), 1442–1468 (2014)

  16. Babenko, B., Yang, M.-H., Belongie, S.: Visual tracking with online multiple instance learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  17. Kalal, Z., Matas, J., Mikolajczyk, K.: P-N learning: bootstrapping binary classifiers by structural constraints. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  18. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (BMVC) (2014)

  19. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental learning for robust visual tracking. Int. J. Comput. Vis. (IJCV) 77(1–3), 125–141 (2008)

  20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR) (2015)

  21. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: International Joint Conference on Artificial Intelligence (IJCAI), vol. 81, pp. 674–679 (1981)

  22. Jepson, A., Fleet, D., El-Maraghi, T.: Robust on-line appearance models for visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 25(10), 1296–1311 (2003)

  23. Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Convolutional features for correlation filter based visual tracking. In: ICCV 2015 Workshops (2015)

  24. Avidan, S.: Ensemble tracking. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 29(2), 261–271 (2007)

  25. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: European Conference on Computer Vision (ECCV) Workshops, Lecture Notes in Computer Science, vol. 8926, pp. 254–265. Springer (2015)

  26. Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: IEEE International Conference on Computer Vision (ICCV) (2015)

  27. Zhang, J., Ma, S., Sclaroff, S.: MEEM: robust tracking via multiple experts using entropy minimization. In: European Conference on Computer Vision (ECCV), pp. 188–203 (2014)

  28. Kakanuru, S., Rapuru, M.K., Mishra, D., Gorthi, S.S.: Complementary tracker’s fusion for robust visual tracking. In: Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), pp. 51:1–51:8 (2016)

  29. Zhang, K., Liu, Q., Yang, M.H.: Robust visual tracking via convolutional networks without training. IEEE Trans. Image Process. 25(4) (2016)

  30. Danelljan, M., Robinson, A., Khan, F.S., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking (best tracker of VOT2016). arXiv:1608.03773 (2016)

  31. Nam, H., Baek, M., Han, B.: Modeling and propagating CNNs in a tree structure for visual tracking (winner of VOT2016). arXiv:1608.07242 (2016)

  32. Wu, Y., Lim, J., Yang, M.-H.: Online object tracking: a benchmark. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2411–2418 (2013)

  33. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: IEEE International Conference on Computer Vision (ICCV) (2015)

  34. Chu, D.M., Smeulders, A.W.M.: Thirteen hard cases in visual tracking. In: Proceedings of IEEE International Workshop PETS (2010). http://alov300pp.joomlafree.it/

  35. Kristan, M., Matas, J., et al.: A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 38(11), 2137–2155 (2016)

  36. Jung, I., Son, J., Baek, M., Han, B.: Real-time MDNet. In: European Conference on Computer Vision (ECCV) (2018)

  37. Hsu, H., Ding, J.: FasterMDNet: learning model adaptation by RNN in tracking-by-detection based visual tracking. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, pp. 657–660 (2017)

  38. Kristan, M., et al.: The visual object tracking VOT2017 challenge results. In: ICCV 2017 Workshops (2017)

Funding

This study was not funded by any grant.

Author information

Corresponding author

Correspondence to Deepak Mishra.

Ethics declarations

Conflict of interest

All the authors declare that they have no conflict of interest.

Ethical standard

All the authors declare that all the principles of ethical and professional conduct have been followed while preparing the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Venugopal Minimol, P., Mishra, D. & Gorthi, R.K.S.S. Guided MDNet tracker with guided samples. Vis Comput 38, 1135–1149 (2022). https://doi.org/10.1007/s00371-021-02072-y
