Robust Online Visual Tracking with a Single Convolutional Neural Network

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Abstract

Deep neural networks, despite their great success at feature learning in various computer vision tasks, are usually considered impractical for online visual tracking because they require very long training times and a large number of training samples. In this work, we present an efficient and very robust online tracking algorithm that uses a single Convolutional Neural Network (CNN) to learn effective feature representations of the target object over time. Our contributions are threefold. First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation, and thus drift, by accommodating the uncertainty of the model output. Second, we enhance ordinary Stochastic Gradient Descent (SGD) training of the CNN with a temporal selection mechanism that generates positive and negative samples from different time periods. Finally, we propose to update the CNN model in a “lazy” style to speed up the training stage: the network is updated only when a significant appearance change of the object occurs, without sacrificing tracking accuracy. The CNN tracker outperforms all compared state-of-the-art methods in our extensive evaluations on 18 well-known benchmark video sequences.
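
As a rough illustration of how the temporal selection and “lazy” update ideas in the abstract could be combined, here is a minimal sketch of a tracking loop. It is not the authors' implementation: the class, the recency window, the confidence threshold used to detect an appearance change, and the model interface (predict/train) are all assumptions made for this example, and the truncated structural loss is not shown.

```python
# Illustrative sketch only (not the paper's code): a tracking loop that draws
# positives and negatives from different time periods and retrains the CNN
# "lazily", i.e. only when the best detection score signals an appearance change.
import random
from collections import deque

RECENT_WINDOW = 10       # assumed: frames treated as "recent" for positive sampling
UPDATE_THRESHOLD = 0.5   # assumed: a best score below this triggers a model update

class LazyCNNTracker:
    def __init__(self, model):
        self.model = model                 # assumed interface: model.predict(patch), model.train(batch)
        self.samples = deque(maxlen=500)   # stored as (frame_index, patch, label) tuples

    def add_samples(self, frame_index, positives, negatives):
        """Store labelled patches together with the frame they came from."""
        for patch in positives:
            self.samples.append((frame_index, patch, 1))
        for patch in negatives:
            self.samples.append((frame_index, patch, 0))

    def temporal_selection(self, frame_index, batch_size=64):
        """Draw positives from recent frames and negatives preferably from older
        frames, so the two classes cover different time periods."""
        recent = [s for s in self.samples if frame_index - s[0] <= RECENT_WINDOW]
        older = [s for s in self.samples if frame_index - s[0] > RECENT_WINDOW]
        positives = [s for s in recent if s[2] == 1]
        negatives = [s for s in older if s[2] == 0] or [s for s in recent if s[2] == 0]
        half = batch_size // 2
        return (random.sample(positives, min(half, len(positives))) +
                random.sample(negatives, min(half, len(negatives))))

    def step(self, frame_index, candidate_patches, positives, negatives):
        """Track one frame and retrain only when confidence drops (lazy update)."""
        scores = [self.model.predict(p) for p in candidate_patches]
        best = max(range(len(scores)), key=scores.__getitem__)
        self.add_samples(frame_index, positives, negatives)
        if scores[best] < UPDATE_THRESHOLD:            # appearance changed noticeably
            batch = self.temporal_selection(frame_index)
            self.model.train(batch)                    # a few SGD iterations on the batch
        return best                                    # index of the selected candidate
```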

Notes

  1. Two parameters \(r_{\mu }\) and \(r_{\sigma }\) determine the local contrast normalization process. In this work, we use three configurations, i.e., \(\{r_{\mu } = 3, r_{\sigma } = 1\}\), \(\{r_{\mu } = 3, r_{\sigma } = 3\}\), and \(\{r_{\mu } = 5, r_{\sigma } = 5\}\); a sketch of this normalization is given after these notes.

  2. Here we follow the labeling style used in conventional CNN training.

  3. In this paper \(o = 3\), i.e., the bounding box changes in its location and scale.

  4. \(s = h / 32\), where \(h\) is the object's height.
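
As a sketch of the normalization referenced in Note 1: the note does not specify the exact window shape, so the example below assumes \(r_{\mu }\) and \(r_{\sigma }\) are half-widths of square box windows used for the local mean and local standard deviation, respectively; the function name and the epsilon constant are illustrative, not from the paper.

```python
# A minimal sketch of local contrast normalization, NOT the authors' code.
# Assumptions: r_mu and r_sigma are half-widths of square box windows, the
# input is a single-channel float image, and eps only prevents division by zero.
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_normalize(image, r_mu, r_sigma, eps=1e-6):
    """Subtract a local mean and divide by a local standard deviation."""
    image = image.astype(np.float64)
    mean = uniform_filter(image, size=2 * r_mu + 1)            # local mean over a (2*r_mu+1)^2 window
    centered = image - mean
    var = uniform_filter(centered ** 2, size=2 * r_sigma + 1)  # local variance of the centered image
    return centered / np.sqrt(var + eps)

# The three configurations from Note 1, applied to one grayscale patch:
# channels = [local_contrast_normalize(patch, r, s) for r, s in [(3, 1), (3, 3), (5, 5)]]
```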


Author information

Corresponding author

Correspondence to Hanxi Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material (zip 16,402 KB)


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, H., Li, Y., Porikli, F. (2015). Robust Online Visual Tracking with a Single Convolutional Neural Network. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_13

  • DOI: https://doi.org/10.1007/978-3-319-16814-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16813-5

  • Online ISBN: 978-3-319-16814-2

  • eBook Packages: Computer Science, Computer Science (R0)
