
Neurocomputing

Volume 286, 19 April 2018, Pages 88-108

Online classification for object tracking based on superpixel

https://doi.org/10.1016/j.neucom.2018.01.069

Abstract

Treating object tracking as a binary classification problem has been explored extensively in recent years, and state-of-the-art classification-based trackers demonstrate better robustness than many other existing trackers. In this paper, we propose a collaborative model that incorporates local and holistic models, corresponding to discriminative and generative models in a tracking-by-classification framework. At the local level, an online Random Forest (RF) classifier is trained to distinguish the superpixels of the object from the background. A series of local superpixels is used to represent the target, so as to adapt to appearance variations. The discriminative model classifies superpixels in the next frame as belonging either to the object or to the background. A confidence map consisting of dependability and stability is formed to measure the probability that each superpixel belongs to the target, as estimated by the classifier. A modified mean-shift algorithm is proposed to operate on the confidence map and find its peak, which gives the position of the target. Meanwhile, a separate component that manages the training set dynamically is employed to control the updating of the RF model. At the global level, the target is represented by covariance matrices of multi-scale bounding boxes. The generative model serves as a protection measure, which effectively reduces drift during tracking. Experimental results demonstrate that our method is effective and performs favorably in comparison to state-of-the-art trackers.

Introduction

Object tracking has been an active research field in computer vision for the past few decades. It plays an important role in many computer vision applications, such as activity analysis [1], intelligent transportation systems [2], and surveillance [3]. Tracking finds a region in the current image that matches the given object. If the matching function takes into account only the object and not the background, it might not be able to correctly distinguish the object from the background, and the tracking might fail.

Certainly, object tracking remains quite challenging, because the appearance of the same object changes under different illumination conditions or viewpoints. Numerous powerful algorithms have been proposed to track objects in video over the past two decades. Yang et al. [4] survey recent progress in visual tracking, including feature descriptors, online learning methods, integration of context information, and sampling methods. Based on object modeling, the majority of trackers can be divided into two categories: discriminative methods [5], [6], [7], [8], [9], [10] and generative methods [11], [12], [13], [14], [15]. Techniques under the generative approach attempt to build a model based on the object's appearance. However, they are not equipped to distinguish the object of interest from background patches, because they consider only object similarity. Consequently, background pixels within a bounding box are inevitably treated as parts of the foreground, thereby introducing a large amount of inaccurate information when updating the appearance model.

Compared with the generative approach, the discriminative approach aims at separating the object from the background. This is evident from several recent developments in the literature [16], [17], [18], which report better tracking results than the generative approach. However, the discriminative approach does not cater for arbitrary object tracking and adaptation to appearance changes. Babenko et al. [18] use multiple-instance learning to train the model with bags of positive examples. Kalal et al. [19] encode rules for additional supervision based on optical flow and a conservative appearance model. Other approaches avoid or delay making hard decisions. Tuzel et al. [20] address distractor-resistant tracking of multiple nearby objects and also pose it as a binary pattern classification problem. Wu et al. [21] propose a regional deep learning tracker that observes the target through multiple sub-regions, each observed by a deep learning model. In contrast to most existing trackers, which exploit only 2D color or gray images to learn the appearance model of the tracked object online, Zhong et al. [22] take a different approach: inspired by the increased popularity of depth sensors, they put more emphasis on 3D context to prevent model drift and handle occlusion. In addition, methods based on correlation filters [23], [24], [25] and methods based on deep learning [26], [27], [28], [29] have achieved excellent performance and are drawing increasing attention. Taking all of this into consideration, we adopt two models that complement each other.

In this paper, we integrate the merits of both models; the main contributions are summarized as follows. Firstly, we propose a collaborative model that incorporates local and holistic models, corresponding to discriminative and generative models in a tracking-by-classification framework. Secondly, we propose an effective and efficient appearance model based on superpixels and an online Random Forest (RF) model that incorporates both foreground and background information. This discriminative model utilizes both the local information of the target and the spatial-temporal cues of the superpixels. Thirdly, we form a confidence map consisting of dependability and stability to measure the probability that each superpixel belongs to the target. This measure provides reliable evidence for target localization and discards superpixels whose positions are far from the target. Fourthly, we propose a modified mean-shift algorithm that operates on the confidence map to locate the target; the algorithm is verified to be faster and more accurate. Finally, we represent the target by covariance matrices of multi-scale bounding boxes at the global level. The generative model serves as a protection measure to reduce drift. Furthermore, we utilize a strategy for dynamically managing training samples to control model updating. In conjunction with the updating mechanism, the managing component, and the protection measure, the proposed algorithm is able to handle complex scenarios. In addition, the target can be tracked by its boundary, and video segmentation is achieved as a by-product of our approach.
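To illustrate the holistic representation, the covariance descriptor of an image region (in the spirit of Tuzel et al. [20]) can be sketched as follows. This is a minimal sketch under assumed choices: the paper's exact per-pixel feature set and its multi-scale bounding boxes may differ, and the function name is ours.

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of a grayscale patch (illustrative sketch).

    Per-pixel features: x, y, intensity, |Ix|, |Iy|
    (spatial coordinates, intensity, absolute first derivatives).
    Returns the 5 x 5 covariance matrix of these features over the patch.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(patch.astype(float))  # derivatives along y and x
    feats = np.stack([xs.ravel().astype(float),
                      ys.ravel().astype(float),
                      patch.ravel().astype(float),
                      np.abs(ix).ravel(),
                      np.abs(iy).ravel()], axis=0)
    # Rows are features, columns are pixels: np.cov yields a 5 x 5 matrix.
    return np.cov(feats)
```

The descriptor is a symmetric positive semi-definite matrix, so candidate regions at multiple scales can be compared with a metric on covariance matrices rather than with raw pixels.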


Related work

Since tracking by superpixel can be considered a kind of part-based method, we review visual tracking from three aspects: part-based tracking, classification-based tracking, and superpixel-based tracking.

Our method

We propose a formulation for jointly learning the RF model and updating it, in conjunction with the updating mechanism, the managing component, and the protection measure, within a tracking-by-classification framework (see Fig. 1).
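The localization step described in the abstract runs a mean shift over the superpixel confidence map. A basic version with a flat circular kernel can be sketched as follows; this is a simplified stand-in for the paper's modified algorithm, and the function name, kernel choice, and parameters are our assumptions.

```python
import numpy as np

def mean_shift_peak(conf, start, bandwidth=10.0, iters=30, tol=0.5):
    """Climb a 2D confidence map to its peak by mean shift (sketch).

    conf:  2D array of per-pixel target confidences.
    start: (x, y) initial guess, e.g. the previous target position.
    """
    h, w = conf.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = start
    for _ in range(iters):
        # Flat circular window of radius `bandwidth` around the current point.
        mask = (xs - x) ** 2 + (ys - y) ** 2 <= bandwidth ** 2
        weights = conf * mask
        total = weights.sum()
        if total <= 0:
            break
        # Shift to the confidence-weighted centroid of the window.
        nx = (xs * weights).sum() / total
        ny = (ys * weights).sum() / total
        shift = np.hypot(nx - x, ny - y)
        x, y = nx, ny
        if shift < tol:  # converged: the centroid stopped moving
            break
    return x, y
```

Starting from the previous frame's position, the window drifts uphill until its weighted centroid coincides with its center, which is the confidence peak, i.e. the estimated target position.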

Feature analysis

As mentioned in Section 3.1.2, we analyze these features and select the best ones for updating our classifier. Because the RF algorithm can rank the importance of variables in a classification model, we can obtain the weight of each feature in our online learning system.

To measure the importance of the jth feature after training, the values of the jth feature are permuted among the training data and the out-of-bag error is computed on this perturbed data set. The importance
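The permutation-based importance described above can be sketched as follows. For brevity, this sketch scores the forest on one fixed evaluation set rather than on each tree's out-of-bag samples as in the classic Breiman scheme the text describes; all function and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def permutation_importance_sketch(clf, X, y, n_repeats=5, seed=0):
    """Importance of feature j = drop in accuracy after permuting column j."""
    rng = np.random.default_rng(seed)
    base = clf.score(X, y)                 # accuracy on unperturbed data
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # permute only the jth feature
            drops.append(base - clf.score(Xp, y))
        imp[j] = np.mean(drops)            # average drop over repeats
    return imp

# Toy data: with shuffle=False the first 3 columns are informative,
# the remaining 3 are noise.
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
imp = permutation_importance_sketch(clf, X, y)
print(imp)
```

Permuting an informative feature destroys its relationship with the labels and raises the error, so informative features receive larger importance scores than noise features; the per-frame ranking can then drive which features update the online classifier.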

Experiments

We conduct experiments on 12 published challenging tracking sequences as well as the visual tracking benchmark [55] to exhibit the advantages of the proposed algorithm. These sequences cover the most challenging factors in visual tracking: scale variation (SV), half or full occlusion (OCC), non-rigid object deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), low resolution (LR) and illumination

Discussion of occlusion handling

We demonstrate the effectiveness of the proposed protection against occlusion and drift using an example (shown in Fig. 18). In this sequence, we aim to track the kitesurfer in the sequence surfing1, starting from the initial tracked object, whose appearance varies significantly due to occlusion and pose change.

In Fig. 18(a), we show a plot of center error for four representative results; the center-error plot describes the tracking results. When the tracking is

Conclusion

In this paper, we have treated tracking as a binary classification problem and have proposed an appearance model that incorporates local and holistic models, corresponding to discriminative and generative models. At the local level, an online RF classifier has been trained to distinguish the superpixels of the object from the background. By using a series of local superpixels to represent the target, our tracker is robust in dealing with occlusion, rotation, and deformation.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61403342, 61273286, U1509207, 61325019, 61603341), the Zhejiang Provincial Natural Science Foundation of China (LY18F030020), and the Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (2017SDSJ03).

Sixian Chan is a Ph.D. candidate in computer science and technology at Zhejiang University of Technology. He received his bachelor's degree from Anhui University of Architecture in 2012. His research interests cover image processing, machine learning, and video tracking. E-mail: [email protected].

References (61)

  • F. Lv et al., Automatic tracking and labeling of human activities in a video sequence, Proceedings of the Sixth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS04), 2004.
  • C. Stauffer et al., Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Anal. Mach. Intell., 2000.
  • X. Mei et al., Robust visual tracking using ℓ1 minimization, Proceedings of the IEEE Twelfth International Conference on Computer Vision, 2009.
  • R.T. Collins, Mean-shift blob tracking through scale space, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003.
  • F. Yang et al., Robust superpixel tracking, IEEE Trans. Image Process., 2014.
  • D.A. Ross et al., Incremental learning for robust visual tracking, Int. J. Comput. Vis., 2008.
  • M.M. Azab et al., New technique for online object tracking-by-detection in video, IET Image Process., 2014.
  • K. Zhang et al., Robust visual tracking via convolutional networks without training, IEEE Trans. Image Process., 2016.
  • S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell., 2004.
  • S. Avidan, Ensemble tracking, IEEE Trans. Pattern Anal. Mach. Intell., 2007.
  • B. Babenko et al., Robust object tracking with online multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell., 2011.
  • Z. Kalal et al., Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell., 2012.
  • O. Tuzel et al., Region covariance: a fast descriptor for detection and classification, Proceedings of the European Conference on Computer Vision, 2006.
  • J.F. Henriques et al., High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 2015.
  • M. Tang et al., Multi-kernel correlation filter for visual tracking, Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • M. Danelljan et al., Beyond correlation filters: learning continuous convolution operators for visual tracking, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
  • J. Gao et al., Deep relative tracking, IEEE Trans. Image Process., 2017.
  • L. Wang et al., Visual tracking with fully convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, 2015.
  • L. Bertinetto et al., Fully-convolutional siamese networks for object tracking, Proceedings of the European Conference on Computer Vision, 2016.
  • Y. Qi et al., Hedged deep tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Xiaolong Zhou received the Ph.D. degree in mechanical engineering from the Department of Mechanical and Biomedical Engineering, City University of Hong Kong, Hong Kong, in 2013. He joined Zhejiang University of Technology, Zhejiang, China in February 2014, where he currently serves as an Associate Professor at the College of Computer Science. From 2015 to 2016, he worked as a Research Fellow at the School of Computing, University of Portsmouth, Portsmouth, UK. He is an IEEE member and an ACM member. He received the T.J. Tarn Best Paper Award at ROBIO2012 and the ICRA2016 CEB Award for Best Reviewers. His research interests include visual tracking, gaze estimation, 3D reconstruction and their applications in various fields. He has authored over 50 peer-reviewed international journal and conference papers. He has served as a Program Committee Member for ROBIO2015, ICIRA2015, SMC2015 and HSI2016. E-mail: [email protected].

Shenyong Chen received the Ph.D. degree in computer vision from City University of Hong Kong, Hong Kong in 2003. He joined Zhejiang University of Technology, Zhejiang, China in February 2004, where he currently serves as a Professor at the College of Computer Science. From 2006 to 2007, he received a fellowship from the Alexander von Humboldt Foundation of Germany and worked at the University of Hamburg, Hamburg, Germany. From 2008 to 2009, he worked as a Visiting Professor at Imperial College London, UK, and in 2012 he worked as a Visiting Professor at the University of Cambridge, UK. He has published over 100 scientific papers in international journals and conferences. He is an IET Fellow and an IEEE Senior Member. He was invited as a nominator for the 2012 Nobel Prize in Physics (by the Nobel Committee). His research interests include computer vision, 3D modeling, and image processing. E-mail: [email protected].
