A survey on online learning for visual tracking

  • Survey
The Visual Computer

Abstract

Visual object tracking has become one of the most active research topics in computer vision, with growing interest in both commercial development and academic research. Many visual trackers have been proposed in the last two decades. Recent studies of computer vision for dynamic scenes include motion detection, object classification, environment modeling, tracking of moving objects, understanding of object behaviors, object identification, and data fusion from multiple sensors. This paper provides an in-depth overview of recent object tracking research. Object tracking in realistic scenarios often faces challenging problems such as camera motion, occlusion, illumination effects, clutter, and objects of similar appearance. A variety of trackers have been published that combine multiple techniques to solve several visual tracking sub-problems. This paper also reviews the latest research trends in object tracking based on convolutional neural networks, which are receiving growing attention. Finally, the paper discusses the future challenges and research directions for object tracking problems that still need extensive study in the coming years.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) (No. NRF-2018R1D1A3B07044041), by the ITRC (Information Technology Research Center) support program supervised by the IITP (Institute for Information & communications Technology Promotion) (IITP-2020-2015-0-00448), and by the Industrial Technology Innovation Program (No. 20002655), funded by the Korea Government.

Author information

Corresponding author

Correspondence to Nam Kim.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Abbass, M.Y., Kwon, KC., Kim, N. et al. A survey on online learning for visual tracking. Vis Comput 37, 993–1014 (2021). https://doi.org/10.1007/s00371-020-01848-y

  • DOI: https://doi.org/10.1007/s00371-020-01848-y
