DOI: 10.1145/3123266.3123381
research-article

Deep Location-Specific Tracking

Published: 19 October 2017

ABSTRACT

Convolutional Neural Network (CNN) based methods have delivered significant performance gains in visual tracking in recent years. Visual tracking nevertheless remains challenging because objects change unpredictably online, for example through abrupt motion, background clutter, and large deformation. We propose a novel algorithm, Deep Location-Specific Tracking, which decomposes the tracking problem into a localization task and a classification task and trains an individual network for each. The localization network exploits the information in the current frame and provides a specific location to improve the probability of successful tracking. The classification network then finds the target among many examples generated around two locations: the target location in the previous frame and the one estimated by the localization network in the current frame. CNN-based trackers often have a massive number of trainable parameters and are prone to over-fitting to particular object states, which reduces precision or causes tracking drift. We address this problem by learning a classification network based on 1 × 1 convolution and global average pooling. Extensive experiments on popular benchmark datasets show that the proposed tracker achieves competitive results without using additional tracking videos for fine-tuning. The code is available at https://github.com/ZjjConan/DLST
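
The abstract names 1 × 1 convolution and global average pooling as the building blocks of the classification network but gives no layer details. The following is a minimal sketch of how such a head could score one candidate, not the authors' implementation. It assumes a single 1 × 1 layer, two output classes (background and target), and a softmax on top. All function names are hypothetical, and the released code linked above may be organized quite differently.

```python
import math


def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: a per-pixel linear map across channels.

    feature_map: H rows, each a list of W pixels, each a list of C_in floats.
    weights:     C_out rows, each a list of C_in floats.
    bias:        list of C_out floats.
    """
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) + b
             for w_row, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]


def global_average_pool(feature_map):
    """Average each channel over all spatial positions, giving C values."""
    pixels = [pixel for row in feature_map for pixel in row]
    n_channels = len(pixels[0])
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(n_channels)]


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_candidate(feature_map, weights, bias):
    """Return [p_background, p_target] for one candidate (assumed 2 classes)."""
    return softmax(global_average_pool(conv1x1(feature_map, weights, bias)))


def count_parameters(weights, bias):
    """Trainable parameters of the head: C_out * C_in weights plus C_out biases."""
    return sum(len(row) for row in weights) + len(bias)


if __name__ == "__main__":
    # Toy 2x2 feature map with 2 channels; values are illustrative only.
    features = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    bias = [0.0, 0.0]
    print(classify_candidate(features, weights, bias))
    print(count_parameters(weights, bias))
```

In this sketch the parameter count depends only on the channel dimensions, not on the spatial size of the feature map. A fully connected layer over the same features would instead grow with height and width, which is one plausible reading of why the abstract ties this design to reduced over-fitting.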

Published in

MM '17: Proceedings of the 25th ACM International Conference on Multimedia
October 2017
2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

MM '17 paper acceptance rate: 189 of 684 submissions, 28%
Overall acceptance rate: 995 of 4,171 submissions, 24%
