Pattern Recognition

Volume 45, Issue 3, March 2012, Pages 1050-1060

Correlation-based incremental visual tracking

https://doi.org/10.1016/j.patcog.2011.08.026

Abstract

Generative subspace models such as probabilistic principal component analysis (PPCA) have been shown to be quite effective for visual tracking problems, owing to representational power that can capture the generation process of high-dimensional image data. Recent advances in incremental learning have further made them practical for real-time scenarios. Despite these benefits, PCA-based approaches to visual tracking can be susceptible to noise such as partial occlusion, because they judge candidate compatibility by how well the entire image patch fits the model. In this paper we introduce a novel appearance model that measures the goodness of target matching as the correlation score between partial sub-patches within a target. We incorporate canonical correlation analysis (CCA) into the probabilistic filtering framework in a principled manner, and derive how the correlation score can be evaluated efficiently in the proposed model. We then provide an efficient incremental learning algorithm that updates the CCA subspaces to adapt to new data available from previous tracking results. We demonstrate significant improvements in tracking accuracy on extensive datasets, including the large-scale real-world YouTube celebrity video database as well as a novel video lecture dataset acquired at the British Machine Vision Conference held in 2009; both datasets are challenging due to abrupt changes in pose, size, and illumination conditions.

Highlights

► We introduce a novel tracking method that exploits correlation between sub-patches.
► CCA is incorporated into a probabilistic tracking framework in a principled way.
► An efficient incremental CCA learning algorithm is proposed.
► The approach is more robust to noise than the previous work on real-world videos.

Introduction

Visual tracking is an important problem in computer vision whose goal is to localize the region of an object of interest (e.g., a face) in a stream of video frames. Central to the success of higher-level tasks such as object recognition and activity classification, it has received significant attention in the vision community for decades. However, visual tracking remains challenging due to the intrinsic difficulty of modeling the potentially varying appearance of a target object across video frames, typically caused by dynamic changes in pose and illumination as well as shape deformation for non-rigid objects.

In tackling the visual tracking problem, two essential issues need to be considered: (1) how to model the target appearance (e.g., subspace representation, kernel-weighted histogram), and (2) how to choose (i.e., search for) the best target location among a set of candidates in each video frame (e.g., temporal filtering, gradient search). For instance, [1] used a contour model for target representation while the density of the tracking state (or target location) was estimated and propagated within a probabilistic state-space model, often referred to as particle filtering or Condensation. In [2], a gradient-based search strategy called mean-shift was adopted to find the best-matching target patches represented as color histograms.

The former issue of appearance modeling is known to be a more critical factor than the search strategy, and often significantly affects tracking performance [3], [4], [5], [6]. Developing a robust observation model has thus been a matter of primary interest in recent visual tracking research. Existing appearance models include, among others, view-based models [7], 3D models [8], mixture models [5], kernel representations [9], and Gaussian density based models [10].

In designing an appearance model, the crucial properties a tracker needs to meet are robustness and adaptability to changes in target appearance (e.g., pose, illumination). Recent tracking methods such as the Incremental Visual Tracker (IVT) [6] aim to achieve these goals by incorporating an adaptive appearance model; similar attempts at incremental modeling of appearance changes have been made in [11], [12], [13]. In particular, the IVT represents a target as a low-dimensional subspace that captures the principal components of possible appearance variations, where the subspace is updated adaptively using the image patches tracked in previous frames. Unlike many non-adaptive approaches that employ fixed appearance template models, such as the eigentracking of [7], the IVT alleviates the burden of constructing a target model prior to tracking from a large number of expensive offline data, and tends to yield higher tracking accuracy. More recently, [11] extended the IVT by adding offline target models (so-called visual constraints) to the adaptive one, addressing the IVT's known susceptibility to drift caused by gradual adaptation to non-targets.

The appearance model of the IVT is essentially probabilistic principal component analysis (PPCA) [14], a generative probabilistic model that represents the image patch of the target via a low-dimensional latent subspace. Despite its representational power to capture low-dimensional intrinsic variability in high-dimensional image data, PPCA in visual tracking can be susceptible to noise (e.g., partial occlusion). This is because PPCA judges the goodness of target matching by the compatibility score of the entire image patch with respect to its generation process. In the presence of partial occlusion, for instance, although a considerable portion of the target image patch remains intact, the occluded portion may severely degrade the compatibility score under the PPCA generative model.
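To make this occlusion sensitivity concrete, here is a small toy sketch of our own (not an experiment from the paper), where plain PCA reconstruction error stands in for the PPCA fit: occluding a quarter of the pixels dominates the whole-patch score even though three quarters of the target remains intact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy patches drawn from a low-dimensional subspace, mimicking PPCA's setting.
d, k, n = 64, 3, 500
basis = rng.standard_normal((d, k))
data = rng.standard_normal((n, k)) @ basis.T + 0.01 * rng.standard_normal((n, d))

# Fit the PCA appearance model (mean + top-k eigenvectors), as the IVT does.
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
B = vt[:k].T                                    # d x k orthonormal basis

def whole_patch_error(h):
    """Squared residual of patch h w.r.t. the subspace -- a stand-in for the
    (negative log) whole-patch fit used to weight candidates."""
    c = B.T @ (h - mean)
    return float(np.sum((h - mean - B @ c) ** 2))

clean = rng.standard_normal(k) @ basis.T        # a patch the model explains well
occluded = clean.copy()
occluded[: d // 4] = 5.0                        # occlude a quarter of the pixels

# The occluded quarter dominates the whole-patch score.
print(whole_patch_error(occluded) > 100 * whole_patch_error(clean))
```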

While there have been previous attempts to address the partial occlusion problem (e.g., [12]), in this paper we propose a novel appearance model that judges the goodness of target matching by the correlation score between partial sub-patches within a target image patch. The sub-patches can be chosen a priori, say two half-patches obtained by a vertical or horizontal split of the target (see Fig. 2). The intuition is that the statistical relationship between a pair of partial sub-patches tends to be less affected by environment changes (e.g., illumination variation or occlusion), since the sub-patches undergo the identical random noise processes that govern those changes. Our approach can thus be more robust to noise than PPCA, which only judges how well the entire image patch fits the underlying model.

For a reliable estimation of the correlation score for the high-dimensional image data, we basically consider the canonical correlation analysis (CCA) [15], [16], a low-dimensional dyadic subspace model that captures the maximal correlation between two variates (sub-patches). In order to incorporate the correlation model into a probabilistic filtering framework in a principled manner, we utilize the probabilistic CCA (PCCA), the probabilistic extension of the CCA recently introduced by [17].
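For readers unfamiliar with CCA, the following minimal NumPy sketch (our own illustration of classical, non-probabilistic CCA, not the paper's PCCA) recovers the canonical direction pairs and correlations from an SVD of the whitened cross-covariance; two views sharing a latent signal yield a canonical correlation near one.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Classical CCA: direction pairs maximizing the correlation between two
    views, obtained from an SVD of the whitened cross-covariance."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):                      # symmetric inverse square root
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    # s holds the canonical correlations; columns of A, B the direction pairs.
    return s[:k], Wx @ U[:, :k], Wy @ Vt[:k].T

# Two views sharing a common latent signal z are strongly correlated.
rng = np.random.default_rng(1)
z = rng.standard_normal(1000)
X = np.column_stack([z, rng.standard_normal(1000)])
Y = np.column_stack([z + 0.1 * rng.standard_normal(1000), rng.standard_normal(1000)])
rho, A, B = cca(X, Y)
print(rho[0] > 0.98)
```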

Our major contribution is two-fold. First, we derive how the correlation score can be evaluated efficiently for a given image patch (that is, a particle in the filtering framework); this serves as the essential particle-reweighting step in our PCCA-based filtering model, gauging the relevance of each particle. Second, we provide an efficient incremental algorithm for updating the PCCA subspaces, which avoids the computationally demanding procedure of naively rebuilding the subspaces from scratch. We call our approach correlation-based incremental visual tracking.

In an extensive set of experiments including the large-scale real-world YouTube celebrity video database [11] as well as the novel challenging video lecture dataset obtained from the British Machine Vision Conference held in 2009, we demonstrate that our approach significantly outperforms the IVT in terms of the tracking accuracy.

The paper is organized as follows: in the next section, the formal description of the probabilistic framework for visual tracking is provided. We then briefly review the IVT [6] in Section 3 focusing on the main ingredients, the PPCA subspace model and the incremental learning algorithm. In Section 4, we introduce our approach, the correlation-based incremental visual tracker, with the emphasis on the derivation for the observation likelihood score and the incremental PCCA update algorithm. The experimental evaluation results are provided in Section 5.

Section snippets

Probabilistic framework for visual tracking

In a probabilistic framework [1], object tracking can be posed as an on-line temporal filtering problem in which we aim to estimate P(u_t | F_{0:t}) for t = 1, 2, …. Here F_t is the input image frame and u_t is the tracking state at time t, with the initial state u_0 assumed known a priori, either by manual mark-up or an object detector. The state u_t specifies an image patch in F_t that tightly surrounds the target object to be tracked. A typical choice is to form a square bounding box parameterized by the
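The filtering recursion above is typically realized with a particle filter in the Condensation style. The sketch below is our own toy illustration, not the paper's tracker: the Gaussian bump likelihood and random-walk dynamics are assumptions standing in for the appearance model and the state dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, motion_std=2.0):
    """One predict-reweight-resample step of a bootstrap (Condensation-style)
    particle filter over tracking states u_t, here simply 2-D patch centers."""
    # Predict: propagate each particle through a random-walk dynamics model.
    particles = particles + motion_std * rng.standard_normal(particles.shape)
    # Reweight: score each candidate state against the current frame.
    weights = weights * np.array([likelihood(u) for u in particles])
    weights = weights / weights.sum()
    # Resample proportionally to weight to avoid degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy frame: the target sits at (10, 20); the observation likelihood is a
# Gaussian bump around it (standing in for an appearance-model score).
true_pos = np.array([10.0, 20.0])
likelihood = lambda u: np.exp(-0.5 * np.sum((u - true_pos) ** 2) / 4.0)

particles = rng.uniform(0.0, 30.0, size=(300, 2))
weights = np.full(300, 1.0 / 300)
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, likelihood)
estimate = particles.mean(axis=0)
print(np.linalg.norm(estimate - true_pos) < 2.0)  # the cloud locks onto the target
```

Any appearance model, PPCA or the correlation model of this paper, plugs into such a filter purely through the likelihood function that reweights particles.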

Incremental visual tracking (IVT)

The goal of the IVT [6] is to build and maintain a low-dimensional subspace at each time step that captures the principal variations of the object appearance observed so far. This can be achieved by learning a PCA subspace at each time t, denoted S_t = (m_t, B_t) (we often drop the subscript t for brevity), from the previously tracked images D = {h_0, …, h_{t-1}}. That is, m is the mean vector of D, and B takes as its columns a few major eigenvectors of the covariance matrix estimated from D. In the
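The flavor of this incremental update can be sketched as follows. This is a simplified rendition of the Ross et al. style merge of an existing SVD with a new data block; the function name is ours, and the forgetting factor of the full algorithm is omitted.

```python
import numpy as np

def incremental_pca_update(mean, B, sv, n_old, new_data, k):
    """Fold a block of new samples into an existing PCA model (mean, d x k
    basis B, singular values sv, fitted to n_old samples) without revisiting
    the old data. Simplified sketch; the published algorithm additionally
    supports a forgetting factor to down-weight old observations."""
    m = new_data.shape[0]
    block_mean = new_data.mean(axis=0)
    combined_mean = (n_old * mean + m * block_mean) / (n_old + m)
    # Extra column correcting the scatter for the shift between the two means.
    shift = np.sqrt(n_old * m / (n_old + m)) * (mean - block_mean)
    aug = np.column_stack([B * sv, (new_data - block_mean).T, shift])
    U, s, _ = np.linalg.svd(aug, full_matrices=False)
    return combined_mean, U[:, :k], s[:k], n_old + m

# Sanity check on rank-3 toy data: the incremental result matches batch PCA.
rng = np.random.default_rng(0)
data = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 6))
first, second = data[:20], data[20:]
mu1 = first.mean(axis=0)
U1, s1, _ = np.linalg.svd((first - mu1).T, full_matrices=False)
mean, B, sv, n = incremental_pca_update(mu1, U1[:, :3], s1[:3], 20, second, k=3)

mu_all = data.mean(axis=0)
Ub, _, _ = np.linalg.svd((data - mu_all).T, full_matrices=False)
same_span = np.allclose(B @ B.T, Ub[:, :3] @ Ub[:, :3].T, atol=1e-8)
print(np.allclose(mean, mu_all) and same_span)
```

The cost per update is an SVD of a thin d x (k + m + 1) matrix rather than of the full history, which is what makes the subspace maintenance practical at video rate.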

Correlation-based incremental visual tracking

Our main idea is to partition the image patch h_t into two sub-patches x_t and y_t, and build a subspace that captures the correlation between the two. The sub-patches can be chosen a priori, say two half-patches obtained by a vertical or horizontal split (see Fig. 2). Measuring the intrinsic statistical relationship between these high-dimensional variates can be done effectively with the subspace model called canonical correlation analysis (CCA) [15], [16], where its probabilistic interpretation
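As a rough illustration of this idea (our own toy, not the paper's PCCA likelihood derivation), the sketch below fits a top CCA direction pair on patches whose two halves share a latent appearance, and scores a candidate by the agreement of the canonical projections of its halves: matched halves score higher on average than mismatched ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "patches": the left half x and right half y share a latent appearance z,
# so the two halves are statistically tied together.
n, d = 400, 8                                    # each half-patch: 8 pixels
z = rng.standard_normal((n, 2))
Gx, Gy = rng.standard_normal((2, d)), rng.standard_normal((2, d))
X = z @ Gx + 0.1 * rng.standard_normal((n, d))   # left half-patches
Y = z @ Gy + 0.1 * rng.standard_normal((n, d))   # right half-patches
mx, my = X.mean(axis=0), Y.mean(axis=0)

def inv_sqrt(C, reg=1e-5):
    """Symmetric inverse square root, used for whitening."""
    w, V = np.linalg.eigh(C + reg * np.eye(len(C)))
    return V @ np.diag(w ** -0.5) @ V.T

# Top canonical direction pair (a, b) via SVD of the whitened cross-covariance.
Wx = inv_sqrt((X - mx).T @ (X - mx) / n)
Wy = inv_sqrt((Y - my).T @ (Y - my) / n)
U, s, Vt = np.linalg.svd(Wx @ ((X - mx).T @ (Y - my) / n) @ Wy)
a, b = Wx @ U[:, 0], Wy @ Vt[0]

def corr_score(x, y):
    """Surrogate matching score: agreement of the two canonical variates
    (the paper derives a proper PCCA likelihood; this is only illustrative)."""
    return -abs(a @ (x - mx) - b @ (y - my))

# Matched pairs (halves driven by one latent) score higher on average than
# mismatched pairs whose halves come from different latents.
Z1, Z2 = rng.standard_normal((200, 2)), rng.standard_normal((200, 2))
matched = np.mean([corr_score(u @ Gx, u @ Gy) for u in Z1])
mismatched = np.mean([corr_score(u @ Gx, v @ Gy) for u, v in zip(Z1, Z2)])
print(matched > mismatched)
```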

Evaluation

In this section we empirically demonstrate the performance of the proposed correlation-based tracking method on both synthetic and real-world videos. We deal in particular with face tracking problems, which are of greater importance and popularity than tracking non-face objects due to the intrinsic variability in 3D pose as well as non-rigid facial expression.

We first consider two synthetic settings that simulate the conditions of partial occlusion and gradual illumination

Concluding remarks

In this paper we have proposed a novel correlation-based appearance model in visual tracking problem that can be robust to noise such as partial occlusion. Compared to the existing generative models such as PCA subspaces which only judge how well the entire image fits to the underlying generation process, our model captures the statistical relationship between partial patches within a target, which is shown to be crucial for accurate and robust tracking. In addition, we have introduced an

References (21)

  • K. Hotta

    Adaptive weighting of local classifiers by particle filters for robust tracking

    Pattern Recognition

    (2009)
  • A. Yao et al.

    An incremental Bhattacharyya dissimilarity measure for particle filtering

    Pattern Recognition

    (2010)
  • M. Isard, A. Blake, Contour tracking by stochastic propagation of conditional density, in: European Conference on...
  • D. Comaniciu et al.

    Kernel-based object tracking

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2003)
  • M.J. Black et al.

    A framework for modeling appearance change in image sequences

  • T.F. Cootes et al.

    Active appearance models

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2001)
  • A.D. Jepson et al.

    Robust online appearance models for visual tracking

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2003)
  • D. Ross et al.

    Incremental learning for robust visual tracking

    International Journal of Computer Vision

    (2008)
  • M.J. Black, A.D. Jepson, EigenTracking: robust matching and tracking of articulated objects using a view-based...
  • M. La Cascia et al.

    Reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2000)

Cited by (17)

  • Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

    2015, Pattern Recognition
    Citation Excerpt:

    Liwicki et al. [4] formulated an incremental kernel PCA in Krein space to learn a nonlinear subspace representation for tracking. To account for partial occlusion during tracking, Kim [5] proposed a Canonical Correlation Analysis (CCA)-based tracker which considers the correlations among sub-patches of tracking observations. Mixture models can also be used for tracking, Jepson et al. [6] learned a mixture model to model appearance changes of target object via an online Expectation-Maximization (EM) algorithm.

  • Uncertain canonical correlation analysis for multi-view feature extraction from uncertain data streams

    2015, Neurocomputing
    Citation Excerpt:

    The incremental learning algorithm for CCA subspaces is another example. Kim [42] incorporated CCA into a novel appearance model that measures the goodness of target matching as the correlation score between partial sub-patches within a target, and applied it to deal with the visual tracking problem. Other examples come from community of data stream mining.

  • Object tracking based on an online learning network with total error rate minimization

    2015, Pattern Recognition
    Citation Excerpt:

    Ross et al. [4] improved the concept of online update by introducing principal component analysis and a forgetting factor which enables non-manual initialization. To consider the partial occlusion problem, Kim [32] proposed an incremental learning method based on a canonical analysis among partial image patches. To further address the partial occlusion and illumination problems simultaneously, Choi and Oh [33] proposed a segment based method with histogram of oriented gradient features.

  • Game-theoretical occlusion handling for multi-target visual tracking

    2013, Pattern Recognition
    Citation Excerpt:

    Multi-target visual tracking (MTVT) has emerged as an active research topic in the past two decades due to its widespread applications in many areas, including intelligent surveillance, smart rooms, visual human–computer interfaces, autonomous robotics, augmented reality and video compression. It extends the single target visual tracking [1–5] to a situation where the number of targets is not known and is also varying with time. Moreover, the data association and the mutual occlusion problems make MTVT a challenge.

  • Incremental canonical correlation analysis

    2020, Applied Sciences (Switzerland)

Minyoung Kim received the BS and MS degrees both in Computer Science and Engineering in Seoul National University, South Korea. He earned the PhD degree in Computer Science from Rutgers University in 2008. From 2009 to 2010 he was a postdoctoral researcher at the Robotics Institute of Carnegie Mellon University. He is currently an Assistant Professor in the Department of Electronic and Information Engineering at Seoul National University of Science and Technology in Korea. His primary research interest is machine learning and computer vision. His research focus includes graphical models, motion estimation/tracking, discriminative models/learning, kernel methods, and dimensionality reduction.
