Abstract
Owing to the inherent lack of training data in visual tracking, recent work in deep learning-based trackers has focused on learning a generic representation offline from large-scale training data and transferring the pre-trained feature representation to a tracking task. Offline pre-training is time-consuming, and the learned generic representation may be either less discriminative for tracking specific objects or overfitted to typical tracking datasets. In this paper, we propose an online discriminative tracking method based on robust feature learning without large-scale pre-training. Specifically, we first design a PCA filter bank-based convolutional neural network (CNN) architecture to learn robust features online with a few positive and negative samples in the high-dimensional feature space. Then, we use a simple soft-thresholding method to produce sparse features that are more robust to target appearance variations. Moreover, we increase the reliability of our tracker using edge information generated from edge box proposals during the process of visual tracking. Finally, effective visual tracking results are achieved by systematically combining the tracking information and edge box-based scores in a particle filtering framework. Extensive results on the widely used online tracking benchmark (OTB-50) with 50 videos validate the robustness and effectiveness of the proposed tracker without large-scale pre-training.
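The first two steps named in the abstract, learning a PCA filter bank from a few samples and sparsifying the responses by soft-thresholding, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes grayscale samples given as 2-D arrays, and the function names, the 5 x 5 filter size, the number of filters, and the threshold `tau` are hypothetical choices made only for illustration.

```python
import numpy as np

def learn_pca_filters(images, k=5, n_filters=4):
    """Learn a PCA filter bank from k x k patches of a few sample images."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())  # remove the patch mean
    X = np.asarray(patches)                   # (num_patches, k*k)
    # Leading eigenvectors of the patch covariance serve as filters.
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_filters]     # keep the largest ones
    return top.T.reshape(n_filters, k, k)

def filter_responses(img, filters):
    """Valid cross-correlation of one image with each filter."""
    n, k, _ = filters.shape
    h, w = img.shape
    out = np.empty((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = img[i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return out

def soft_threshold(x, tau):
    """Shrink values toward zero by tau; entries with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Hypothetical usage: three random 12 x 12 "samples" stand in for real patches.
rng = np.random.default_rng(0)
samples = [rng.standard_normal((12, 12)) for _ in range(3)]
bank = learn_pca_filters(samples)
sparse_features = soft_threshold(filter_responses(samples[0], bank), tau=0.5)
```

Soft-thresholding zeroes small responses and shrinks the rest, which is one simple way to obtain the sparse features the abstract refers to. How these features are scored and combined with edge box proposals inside the particle filter is not specified in the abstract and is left out of the sketch.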
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61572205 and 61175121), the Natural Science Foundation of Fujian Province (2015J01257), the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQN-PY210 and ZQN-YX108), the 2015 Program for New Century Excellent Talents in Fujian Province University, and the Project of Science and Technology Plan of Fujian Province of China (2017H01010065).
Author information
Jun Zhang received the BS degree from the Hunan Institute of Science and Technology, Hunan, China in 2014. She is currently pursuing the ME degree at Huaqiao University, Fujian, China. Her current research interests include computer vision, machine learning, and pattern recognition.
Bineng Zhong received the BS, MS, and PhD degrees in computer science from the Harbin Institute of Technology, Harbin, China in 2004, 2006, and 2010, respectively. From 2007 to 2008, he was a research fellow with the Institute of Automation and the Institute of Computing Technology, Chinese Academy of Sciences, China. Currently, he is an associate professor with the School of Computer Science and Technology, Huaqiao University, Xiamen, China. His current research interests include pattern recognition, machine learning, and computer vision.
Pengfei Wang received the BS degree from the College of Mathematics and Information, China West Normal University, Nanchong, China in 2014. Currently, he is a master's student with the School of Computer Science and Technology, Huaqiao University, Xiamen, China. His current research interests include object tracking, machine learning, and computer vision.
Cheng Wang received the BS degree in software engineering from Xidian University, Xi’an, China in 2002 and the PhD degree in mechanics from Xi’an Jiaotong University, Xi’an, China in 2012. Currently, he is an associate professor at the School of Computer Science and Technology, Huaqiao University, Xiamen, China. His current research interests include signal processing, machine learning, and data mining.
Jixiang Du received the BS and MS degrees in Vehicle Engineering from Hefei University of Technology, Hefei, China in 1999 and 2002, respectively. He received the PhD degree in Pattern Recognition & Intelligent System from the University of Science and Technology of China (USTC), Hefei, China in 2005. He is also the Associate Dean of the College and the Director of the Department of Computer Science and Technology. His current research mainly concerns pattern recognition and machine learning.
Cite this article
Zhang, J., Zhong, B., Wang, P. et al. Robust feature learning for online discriminative tracking without large-scale pre-training. Front. Comput. Sci. 12, 1160–1172 (2018). https://doi.org/10.1007/s11704-017-6281-8
DOI: https://doi.org/10.1007/s11704-017-6281-8