ABSTRACT
Convolutional Neural Network (CNN) based methods have shown significant performance gains in visual tracking in recent years. Visual tracking nevertheless remains a challenging task, because objects undergo many unpredictable changes online, such as abrupt motion, background clutter, and large deformation. We propose a novel algorithm, Deep Location-Specific Tracking, which decomposes the tracking problem into a localization task and a classification task and trains an individual network for each. The localization network exploits the information in the current frame and provides a specific location to improve the probability of successful tracking. The classification network finds the target among many examples generated around two locations: the target location in the previous frame and the location estimated by the localization network in the current frame. CNN-based trackers often have a massive number of trainable parameters and are prone to over-fitting to particular object states, which leads to lower precision or tracking drift. We address this problem by learning a classification network based on 1 × 1 convolution and global average pooling. Extensive experimental results on popular benchmark datasets show that the proposed tracker achieves competitive results without using additional tracking videos for fine-tuning. The code is available at https://github.com/ZjjConan/DLST
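To illustrate why a classifier built from 1 × 1 convolution and global average pooling has few trainable parameters, the following is a minimal sketch in plain Python. It is not the authors' implementation (that is in the repository linked above). The function names, the nested-list feature layout (channels × height × width), the two-class output, and the 512 × 3 × 3 feature size used in the parameter comparison are all assumptions made for illustration.

```python
def conv1x1(features, weights, bias):
    """Apply a 1x1 convolution.

    features: C_in x H x W nested lists (hypothetical feature map of one candidate).
    weights:  C_out x C_in nested lists; bias: list of length C_out.
    Each output pixel is a linear combination of the input channels at that pixel.
    """
    c_in = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[o] + sum(weights[o][c] * features[c][y][x] for c in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weights))
    ]


def global_average_pool(features):
    """Average every channel over its spatial positions, giving one value per channel."""
    return [
        sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        for plane in features
    ]


def classify(features, weights, bias):
    """Score one candidate: 1x1 convolution followed by global average pooling."""
    return global_average_pool(conv1x1(features, weights, bias))


def conv1x1_param_count(c_in, c_out):
    """Weights plus biases of a 1x1 convolution; independent of spatial size."""
    return c_out * c_in + c_out


def fully_connected_param_count(c_in, height, width, c_out):
    """Weights plus biases of a fully connected layer on the flattened feature map."""
    return c_out * c_in * height * width + c_out


# Toy candidate with 2 channels on a 2x2 grid (assumed values).
toy_features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]]
toy_weights = [[1.0, 0.0], [0.5, -1.0]]
toy_bias = [0.0, 1.0]
toy_scores = classify(toy_features, toy_weights, toy_bias)
```

Under these assumptions, a two-class head on 512-channel features needs `conv1x1_param_count(512, 2)` parameters regardless of spatial size, whereas a fully connected layer on a flattened 512 × 3 × 3 map needs `fully_connected_param_count(512, 3, 3, 2)`, nine times as many weights. The smaller head has fewer parameters available to over-fit to a particular object state.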