Effective visual tracking by pairwise metric learning
Introduction
Visual tracking is an important topic for human vision cognitive systems [1], and it has various applications, including video surveillance, traffic monitoring, and behavior analysis. Although much progress has been made [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] in the past decades, designing a robust and efficient visual tracker remains challenging due to large appearance variations, illumination changes, background clutter, and partial occlusion.
A typical tracking framework consists of three main parts: feature representation, appearance modeling, and dynamic forecasting [3]. Among them, appearance modeling builds a mathematical model for object identification and plays a key role in visual tracking [18]. From the appearance modeling point of view, distinguishing the target from its background is a basic ability. This trait has been extensively studied in discriminative trackers, e.g., support vector machine (SVM) based methods [6], [19] and boosting-based ones [3], [20]. On the other hand, to achieve stable tracking, the appearance model should also handle visual variations of the target itself. This property is emphasized in generative trackers, e.g., subspace models [2], [4] and template-based ones [21], [22]. Conceptually, discriminative trackers aim to maximize the separability between object and non-object regions, while generative ones concentrate on finding the object under different visual variations.
For robust tracking, a natural approach is to consider both aspects in a hybrid tracker. Zhong et al. [11] proposed a tracking method that combines a discriminative classifier with a sparse generative model. Similar works [23], [24] have also shown that hybrid trackers can outperform purely discriminative or purely generative ones. However, this advantage rests on two independent modules, so the computational cost roughly doubles for the same amount of training data [25].
In this paper, rather than combining the discriminative and generative information of the training data using two separate modules, we exploit both kinds of information in a single object model through pairwise metric learning (PML). PML studies the similarity or dissimilarity of a data pair, and sample pairs are used as training instances. Compared with classical regression/classification, PML has two advantages: (1) a large number of pairwise training instances is easy to obtain, which addresses the shortage of labeled data in visual tracking [26]; (2) PML can exploit the mutual relationship within a training pair and thus tends to achieve better learning performance [27].
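The first advantage can be illustrated with a minimal sketch of how a handful of samples expands into many pairwise training instances. The function name `build_pairs`, the label convention (1 for a pair that should be told apart, 0 for a pair that should look alike), and the placeholder sample names are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations, product

def build_pairs(targets, backgrounds):
    """Return (sample_a, sample_b, label) triples (hypothetical convention).

    label 1: target-background pair, the two samples should be told apart
    label 0: target-target pair, the two samples should look alike
    """
    pairs = [(t, b, 1) for t, b in product(targets, backgrounds)]
    pairs += [(t1, t2, 0) for t1, t2 in combinations(targets, 2)]
    return pairs

# Placeholder sample identifiers; in a tracker these would be feature vectors.
targets = ["t1", "t2", "t3", "t4"]
backgrounds = ["b1", "b2", "b3", "b4", "b5"]

pairs = build_pairs(targets, backgrounds)
print(len(targets) + len(backgrounds), "samples ->", len(pairs), "pairs")
```

With 4 target and 5 background samples this yields 20 target-background pairs and 6 target-target pairs, so 9 labeled samples become 26 training instances; the number of pairs grows roughly with the product of the subset sizes.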
PML has been widely used in different fields, such as document retrieval [26], object classification [28], and recommendation [29]. Inspired by the success of PML in these applications, we apply it to visual tracking. Technically, unlike non-pairwise trackers (e.g., [15], [19], [30]) that use only target and/or background samples as training instances, the proposed tracker uses target-background pairs and target-target pairs. A novel and efficient learning technique, the Extreme Learning Machine (ELM) [31], is used to build the pairwise appearance model. In ELM training, samples from different subsets (target or background) yield different ELM output responses, whereas samples from the same subset yield similar ones. Accordingly, the trained ELM network aims to produce different responses for the two samples of a target-background pair while producing almost the same responses for those of a target-target pair. Thus, the discriminative and generative information of the training data is fully exploited in a single ELM network. Furthermore, to adapt to visual changes during tracking, the online sequential ELM (OS-ELM) [32] is used to update the pairwise appearance model, which leads to a more robust tracking process.
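The idea can be sketched in a few lines of NumPy. This is one plausible reading, not the authors' formulation: it assumes the pairwise constraint is expressed as a desired response gap f(x_i) - f(x_j), set to 1 for target-background pairs and 0 for target-target pairs, with f(x) = h(x)β, a sigmoid hidden layer, and a regularization constant `C`. All function names, sizes, and the synthetic data are hypothetical. The update step follows the standard recursive least-squares form used by OS-ELM.

```python
import numpy as np

def hidden(X, W, b):
    """Random-feature hidden layer h(x) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def pairwise_rows(H_a, H_b, gap):
    """Rows h(x_i) - h(x_j) for every pair (i in a, j in b) and their desired gap."""
    D = (H_a[:, None, :] - H_b[None, :, :]).reshape(-1, H_a.shape[1])
    return D, np.full((D.shape[0], 1), gap)

def fit_batch(D, y, C=100.0):
    """Regularized least squares: beta = (I/C + D^T D)^-1 D^T y. Also returns P."""
    P = np.linalg.inv(np.eye(D.shape[1]) / C + D.T @ D)
    return P @ D.T @ y, P

def os_update(beta, P, D_new, y_new):
    """Recursive least-squares update with a new chunk of pairs (OS-ELM style)."""
    K = np.linalg.inv(np.eye(D_new.shape[0]) + D_new @ P @ D_new.T)
    P = P - P @ D_new.T @ K @ D_new @ P
    beta = beta + P @ D_new.T @ (y_new - D_new @ beta)
    return beta, P

rng = np.random.default_rng(0)
d, L = 8, 20                                   # feature size, hidden nodes (assumed)
W, b = rng.standard_normal((d, L)), rng.standard_normal(L)

def make_rows(targets, backgrounds):
    """Build target-background (gap 1) and target-target (gap 0) training rows."""
    H_t, H_b = hidden(targets, W, b), hidden(backgrounds, W, b)
    D_tb, y_tb = pairwise_rows(H_t, H_b, 1.0)
    D_tt, y_tt = pairwise_rows(H_t, H_t, 0.0)
    return np.vstack([D_tb, D_tt]), np.vstack([y_tb, y_tt])

# Initial frame: synthetic target and background features.
D0, y0 = make_rows(rng.normal(1.0, 0.1, (6, d)), rng.normal(-1.0, 0.5, (6, d)))
beta0, P0 = fit_batch(D0, y0)

# Later frame: update the model with new pairs instead of retraining.
# (Simplification: new targets are paired only with each other here.)
D1, y1 = make_rows(rng.normal(1.1, 0.1, (4, d)), rng.normal(-1.0, 0.5, (4, d)))
beta_seq, _ = os_update(beta0, P0, D1, y1)

# Sanity check: the sequential result matches retraining on all pairs at once.
beta_all, _ = fit_batch(np.vstack([D0, D1]), np.vstack([y0, y1]))
print(np.allclose(beta_seq, beta_all, atol=1e-6))
```

Because the hidden weights `W` and `b` stay fixed, only the output weights are solved for, which is why both training and updating reduce to small linear-algebra operations rather than iterative optimization.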
Several recent trackers [5], [6] also involve the concept of PML. They differ from the proposed method in the following aspects: (1) they concentrate only on the discriminative analysis of target-background pairs and ignore the generative information in target-target pairs, whereas the proposed tracker makes full use of both in a single object model; (2) unlike the existing pairwise methods, the proposed tracker runs efficiently without heavy quadratic programming (QP) [6] or matrix factorization (MF) [5], owing to the fast and effective learning capability of ELM [31].
Section snippets
Preliminary knowledge
To facilitate understanding of the proposed tracker, this section briefly reviews ELM. For a more detailed discussion and analysis, we refer readers to [31], [33]. Note that the differences and relationships between ELM and earlier works have been analyzed in depth in [34].
ELM, proposed by Huang [34], was originally used for training generalized single-hidden-layer feed-forward neural networks (SLFNs) and has recently been extended to the multi-layer case [35].
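For orientation, the standard ELM formulation for an SLFN, as commonly stated in the ELM literature, can be summarized as follows. The notation is chosen here and may differ from the paper's: N training samples, L hidden nodes with randomly generated and fixed parameters (a_i, b_i), activation G, hidden-layer output matrix H, target matrix T, output weights β, and a regularization constant C.

```latex
\begin{align}
f_L(\mathbf{x}) &= \sum_{i=1}^{L} \boldsymbol{\beta}_i \, G(\mathbf{a}_i, b_i, \mathbf{x})
                 = \mathbf{h}(\mathbf{x})\,\boldsymbol{\beta}, \\
\mathbf{H}\boldsymbol{\beta} &= \mathbf{T}, \qquad
\mathbf{H} = \begin{bmatrix} \mathbf{h}(\mathbf{x}_1) \\ \vdots \\ \mathbf{h}(\mathbf{x}_N) \end{bmatrix}, \\
\hat{\boldsymbol{\beta}} &= \mathbf{H}^{\dagger}\mathbf{T}
\quad\text{or, with regularization,}\quad
\hat{\boldsymbol{\beta}} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\right)^{-1}\mathbf{H}^{\top}\mathbf{T}.
\end{align}
```

Since the hidden-node parameters are not tuned, training amounts to solving a linear system for the output weights, where H† denotes the Moore-Penrose pseudoinverse of H.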
Pairwise training
The pipeline of the proposed tracker is shown in Algorithm 1 and Fig. 1. The target samples dynamically collected before the current tth frame capture the different target observations from the 1st frame onward. The current training data consist of two parts: the target samples collected at the tth frame, and the background samples taken far away from the currently estimated object center. And f
Discussions
We note that the contributions of the proposed tracker are twofold: (1) a novel appearance model is presented based on the PML method; (2) the ELM technique is used to improve pairwise learning performance. The novelties are detailed as follows.
Performance evaluation and analysis
In this section, we conduct comprehensive comparisons to evaluate the performance of the proposed approach, named the PMLT tracker. We compare the tracking results of our method with those of seven other algorithms, including the ranking SVM tracker (RSVT) [6], the sparsity collaborative tracker (SCM) [11], the multiple instance learning tracker (MIL) [3], the fragments tracker (Frag) [22], the compressive tracker (CT) [30], the visual tracking decomposition tracker (VTD) [46] and the
Conclusion
In this paper, we have proposed a novel and effective online tracking method based on PML, using the Extreme Learning Machine (ELM). Unlike existing trackers, the proposed method fully considers the discriminative and generative aspects of appearance modeling in a single object model. The fast learning speed of ELM makes the pairwise training efficient. Moreover, we have designed an online sequential update of the appearance model, which results in a more robust tracking
Acknowledgment
This work was supported by the National Natural Science Foundation of China under Grant 61301090, the Beijing Excellent Talent Fund under Grant 2013D009011000001, the National High Technology Research and Development Program of China under Grant 2014AA8012013L, and in part by the Excellent Young Scholars Research Fund of Beijing Institute of Technology under Grant 2013YR0508.
References (49)
- et al., Incremental pairwise discriminant analysis based visual tracking, Neurocomputing (2010)
- et al., Robust visual tracking via multi-graph ranking, Neurocomputing (2015)
- et al., Multi-task l0 gradient minimization for visual tracking, Neurocomputing (2015)
- et al., A survey of appearance models in visual object tracking, ACM Trans. Intell. Systems Technol. (2013)
- et al., Extreme learning machine for ranking: generalization analysis and applications, Neural Netw. (2014)
- et al., Trends in extreme learning machines: a review, Neural Netw. (2015)
- Cognitive processes in eye guidance: algorithms for attention in image processing, Cognit. Comput. (2009)
- et al., Incremental learning for robust visual tracking, Int. J. Comput. Vis. (2008)
- et al., Robust object tracking with online multiple instance learning, IEEE Trans. Patt. Anal. Mach. Intell. (2011)
- et al., Robust visual tracking using l1 minimization, Proceedings of the IEEE International Conference on Computer Vision (2009)
- Robust visual tracking via ranking SVM, Proceedings of the IEEE International Conference on Image Processing
- Robust tracking via weakly supervised ranking SVM, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Object tracking via 2DPCA and l1-regularization, IEEE Signal Process. Lett.
- Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett.
- Online visual tracking via two view sparse representation, IEEE Signal Process. Lett.
- Robust object tracking via sparsity-based collaborative model, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Robust tracking with discriminative ranking lists, IEEE Trans. Image Process.
- Visual tracking via temporally smooth sparse coding, IEEE Signal Process. Lett.
- Visual tracking via constrained incremental non-negative matrix factorization, IEEE Signal Process. Lett.
- Multitask extreme learning machine for visual tracking, Cognit. Comput.
- Support vector tracking, IEEE Trans. Patt. Anal. Mach. Intell.
- Real-time tracking via on-line boosting, Proceedings of the British Machine Vision Conference
- Robust visual tracking and vehicle classification via sparse representation, IEEE Trans. Patt. Anal. Mach. Intell.
- Robust fragments-based tracking using the integral histogram, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Chenwei Deng received the Ph.D. degree in signal and information processing from Beijing Institute of Technology, Beijing, China, in 2009. He is currently a full professor at the School of Information and Electronics, Beijing Institute of Technology, China. He has authored or co-authored over 50 technical papers in refereed international journals and conferences, and co-edited one book. His current research interests include image/video coding, quality assessment, perceptual modeling, and feature representation.
Baoxian Wang received the B.Eng. degree from Northeastern University, China, in 2010. He is currently pursuing the Ph.D. degree in the School of Information and Electronics, Beijing Institute of Technology, Beijing, China. His current research interests include image processing, computer vision, machine learning, and pattern recognition.
Weisi Lin received the Ph.D. degree from King’s College London, London, U.K. He is currently an Associate Professor and Associate Chair (Graduate Studies) with the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include visual quality evaluation and perception-inspired signal modeling. He has published over 270 refereed papers in international journals and conferences. More details are available at http://www.ntu.edu.sg/home/wslin/.
Guang-Bin Huang received the Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore, in 1999. Since May 2001, he has been working as an Assistant Professor and Associate Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His current research interests include machine learning, computational intelligence, and extreme learning machines. He serves as an Associate Editor of Neurocomputing and IEEE Transactions on Cybernetics.
Baojun Zhao received the Ph.D. degree in electromagnetic measurement technology and equipment from Harbin Institute of Technology (HIT), Harbin, China, in 1996. From 1996 to 1998, he was a postdoctoral fellow at Beijing Institute of Technology (BIT), Beijing, China. Since 1998, he has been engaged in teaching and research at the Radar Research Laboratory, BIT. He has authored or co-authored over 100 publications. His main research interests include image/video coding and image recognition.