Abstract
Single-object online visual tracking has been an active research topic because of its wide range of applications. In this paper, we propose a new framework, consisting of enhanced tracking and detection learning, together with related approaches to solve this problem. The enhanced tracking part employs two models: an appearance model based on a correlation filter with deep CNN features, and a dynamic model based on an improved pyramid optical flow method. The two models cooperate to describe the object's appearance and capture the target's trajectory, and they also provide training samples for detection learning. The detection learning part employs a cascade classifier and a P-N learning scheme to reinitialize the tracker when model drift occurs. Experiments on several challenging benchmarks show that the proposed method performs comparably to state-of-the-art trackers.
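The appearance model above is built on a correlation filter. As a minimal sketch of that idea only, the block below trains a single-channel, MOSSE-style correlation filter on one grayscale patch by closed-form ridge regression in the Fourier domain, then locates a shifted copy of the patch from the peak of the correlation response. It is not the authors' implementation: their filter operates on deep CNN features, whereas this sketch assumes raw pixel values and omits the cosine window, scale estimation and online model update. The function names (`gaussian_response`, `train_filter`, `detect`) and parameters (`sigma`, `lam`) are hypothetical choices made for illustration.

```python
import numpy as np


# Hypothetical helper names; a single-channel, MOSSE-style sketch, not the
# paper's CNN-feature correlation filter.
def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at (0, 0) with wrap-around,
    so that a peak at index (dy, dx) corresponds to a circular shift (dy, dx)."""
    h, w = shape
    dy = np.minimum(np.arange(h), h - np.arange(h))  # circular distance to row 0
    dx = np.minimum(np.arange(w), w - np.arange(w))  # circular distance to col 0
    d2 = dy[:, None] ** 2 + dx[None, :] ** 2
    return np.exp(-0.5 * d2 / sigma**2)


def train_filter(patch, sigma=2.0, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain:
    H* = (G . conj(F)) / (F . conj(F) + lam), with element-wise products."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)


def detect(filt, patch):
    """Correlate the learned filter with a new patch and return the
    displacement (dy, dx) of the response peak."""
    response = np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    # Indices past the midpoint wrap around to negative displacements.
    if py > h // 2:
        py -= h
    if px > w // 2:
        px -= w
    return int(py), int(px)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((64, 64))  # stand-in for a target patch
    filt = train_filter(template)
    moved = np.roll(template, shift=(5, -3), axis=(0, 1))
    print(detect(filt, moved))  # -> (5, -3)
```

In a tracker, `detect` would be run on a search patch cropped around the previous target position, and the returned displacement would update that position. The example uses a circular shift so that the peak location is exact; real frame-to-frame motion is only approximately circular, which is one reason practical correlation filter trackers apply a cosine window to the patch.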
Acknowledgments
The authors thank Fang Li for her insightful comments and for her help in collecting data, both of which greatly helped us improve the technical content and experiments of this study. This work was partly supported by the National Natural Science Foundation of China (Nos. 61672546 and 61573385) and the Guangzhou Science and Technology Project (No. 201707010127).
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yi, Y., Luo, L. & Zheng, Z. Single online visual object tracking with enhanced tracking and detection learning. Multimed Tools Appl 78, 12333–12351 (2019). https://doi.org/10.1007/s11042-018-6787-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-6787-6