Abstract
Visual object tracking in dynamic environments with severe appearance variations is a significant problem in computer vision. This paper proposes a novel visual tracking algorithm that exploits the multi-level feature learning ability of a stacked denoising autoencoder (SDAE). The SDAE network is trained in two stages: layer-wise pre-training and fine-tuning. In the pre-training stage, a two-layer sparse coding method is used to represent the input image, yielding a multi-level image feature descriptor. In the fine-tuning stage, the connection weights and bias terms used for back-propagation are obtained via a genetic algorithm. A logistic classification layer is added on top of the encoder network to enable tracking within the well-established particle filter framework. Experimental results confirm, both qualitatively and quantitatively, that the proposed method performs well in comparison with eight other state-of-the-art methods.
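The way a logistic layer on top of an encoder can supply particle weights inside a particle filter may be easier to picture with a small example. The following is a minimal sketch under stated assumptions, not the authors' implementation: the encoder weights are random placeholders rather than values obtained by pre-training and genetic-algorithm fine-tuning, the layer sizes, patch size, Gaussian random-walk motion model, and best-particle estimate are illustrative choices, and every function name (`encode`, `confidence`, `particle_filter_step`, `demo`) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(patch, weights, biases):
    """Forward pass through a stack of sigmoid encoder layers."""
    h = patch
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

def confidence(patch, weights, biases, w_out, b_out):
    """Logistic classification layer applied to the encoder output."""
    return float(sigmoid(w_out @ encode(patch, weights, biases) + b_out))

def particle_filter_step(particles, crop, score, motion_std, rng):
    """One predict / weight / resample step over 2-D particle positions."""
    # Predict: diffuse particles with a Gaussian random walk (assumed model).
    moved = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Weight: classifier confidence of the patch at each candidate position.
    w = np.array([score(crop(p)) for p in moved])
    w = w / w.sum()
    # Estimate: take the highest-weight particle (an assumption of this sketch).
    best = moved[np.argmax(w)]
    # Resample: draw particle indices in proportion to the weights.
    idx = rng.choice(len(moved), size=len(moved), p=w)
    return moved[idx], best, w

def demo(seed=0):
    """Run one step on a synthetic image with placeholder (random) weights."""
    rng = np.random.default_rng(seed)
    sizes = [64, 32, 16]  # hypothetical layer widths for 8x8 patches
    weights = [rng.normal(0.0, 0.1, (o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(o) for o in sizes[1:]]
    w_out, b_out = rng.normal(0.0, 0.1, sizes[-1]), 0.0
    image = rng.random((40, 40))

    def crop(p):
        # Clip the top-left corner so the 8x8 patch stays inside the image.
        r, c = np.clip(np.round(p).astype(int), 0, 32)
        return image[r:r + 8, c:c + 8].ravel()

    def score(x):
        return confidence(x, weights, biases, w_out, b_out)

    particles = np.tile([16.0, 16.0], (50, 1))
    return particle_filter_step(particles, crop, score, 2.0, rng)

new_particles, best, w = demo()
print(best, w.max())
```

In a real tracker the placeholder weights would be replaced by the trained network, and the step would be repeated once per frame with the resampled particles carried forward.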
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61672433). The authors thank the reviewers and editors for their valuable comments.
Cite this article
Hua, W., Mu, D., Guo, D. et al. Visual tracking based on stacked Denoising Autoencoder network with genetic algorithm optimization. Multimed Tools Appl 77, 4253–4269 (2018). https://doi.org/10.1007/s11042-017-4702-1
DOI: https://doi.org/10.1007/s11042-017-4702-1