Mean Shift tracking with multiple reference color histograms

https://doi.org/10.1016/j.cviu.2009.12.006

Abstract

The Mean Shift tracker is a widely used tool for robustly and quickly tracking the location of an object in an image sequence using the object’s color histogram. The reference histogram is typically set to that in the target region in the frame where the tracking is initiated. Often, however, no single view suffices to produce a reference histogram appropriate for tracking the target. In contexts where multiple views of the target are available prior to the tracking, this paper enhances the Mean Shift tracker to use multiple reference histograms obtained from these different target views. This is done while preserving both the convergence and the speed properties of the original tracker. We first suggest a simple method to use multiple reference histograms for producing a single histogram that is more appropriate for tracking the target. Then, to enhance the tracking further, we propose an extension to the Mean Shift tracker where the convex hull of these histograms is used as the target model. Many experimental results demonstrate the successful tracking of targets whose visible colors change drastically and rapidly during the sequence, where the basic Mean Shift tracker obviously fails.

Introduction

The target’s color histogram is widely used for visual tracking (e.g. [1], [2], [3], [4], [5]) and, as was shown by Comaniciu et al. [3], [6], tracking using this feature may be performed very quickly via the Mean Shift procedure [7]. This paper extends Comaniciu et al.’s tracker in [3], [6], which will be referred to in this paper by its common name Mean Shift tracker.

The Mean Shift tracker works by searching in each frame for the location of an image region whose color histogram is closest to the reference color histogram of the target. The distance between two histograms is measured using their Bhattacharyya coefficient, and the search is performed by seeking the target location via Mean Shift iterations beginning from the target location estimated in the previous frame (the tracker is outlined in Section 2).
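The histogram comparison described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names are hypothetical, the histograms are assumed to be normalized lists of equal length, and the distance sqrt(1 - rho) follows the definition used in the kernel-based tracking literature.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Return rho = sum_u sqrt(p_u * q_u) for two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def bhattacharyya_distance(p, q):
    """Return sqrt(1 - rho); max() guards against tiny negative rounding."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))

# Toy 4-bin histograms (illustrative values only).
reference = [0.5, 0.5, 0.0, 0.0]
same_colors = [0.5, 0.5, 0.0, 0.0]
disjoint_colors = [0.0, 0.0, 0.5, 0.5]

print(bhattacharyya_distance(reference, same_colors))      # identical histograms
print(bhattacharyya_distance(reference, disjoint_colors))  # no shared bins
```

In this toy case identical histograms give the smallest possible distance, while histograms that share no occupied bins give a coefficient of zero and hence the largest possible distance, regardless of where the candidate region is placed.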

In the Mean Shift tracker, as well as in the other trackers cited previously, the reference color histogram is approximated according to a single view of the target, typically as it appears in the first frame of the sequence. Although using this method for obtaining the reference histogram proved to be very robust in many scenarios, it produces, in many cases, a poor representation of the target, which might result in poor tracking. More seriously, the support of a reference histogram obtained by this method may become non-overlapping with the support of the target’s histogram as it appears in the sequence, usually resulting in target loss. Indeed, for many objects, any viewing direction may be replaced with a different viewing direction where all the object’s colors apparent in the latter view differ from those in the former. An (unscrambled) Rubik Cube is an extreme example of such an object; each side is a different color, and three sides at most are visible from any viewing direction. Major changes in the apparent colors of a target may also result from changes in the actual target’s colors, as when a person puts on or removes a piece of clothing, or as in the case of an alternating street advertisement.

Often, several different views of the target are available prior to the tracking, either from images that were previously acquired (e.g. [8], [9], [10], [11]) or when performing off-line tracking (e.g. [12], [13], [14], [15]). In these contexts, this paper extends the Mean Shift tracker to use multiple reference color histograms. First, we suggest a simple method to combine these histograms into a single histogram that is more appropriate for tracking the target. To enhance the tracking further, we then propose an extension to the Mean Shift tracker in which the convex hull of these histograms is used as the target model. That is, rather than searching for the image region whose color histogram is closest to a single reference histogram, we search for the image region whose color histogram has the minimal distance from the convex hull of several reference histograms.

Time-varying histograms of colors (e.g. [5], [16]) or of other features such as filter responses (e.g. [17]) have been used for target modeling before, and many trackers have modeled the target’s 2D appearance as being time-varying within a subspace (e.g. [8], [9], [18], [19], [20]). In the latter group of trackers, the search for the target (and possibly for additional transformation parameters) is performed by minimizing the distance of its appearance in the current frame from that subspace. This approach is applied here by modeling the target’s color histogram as being a time-varying linear combination of several reference histograms, under the restriction that the mixture coefficients are nonnegative and sum to unity (so that the linear combination will be a histogram mixture).

Section 2 outlines the original Mean Shift tracker [3]. A simple method for combining multiple reference histograms into one is proposed in Section 3. Section 4 describes the proposed extension of the Mean Shift tracker to use the convex hull-based target model. Experimental results are described in Section 5, Section 6 includes a discussion, and a paper summary is provided in Section 7.

Section snippets

The Mean Shift tracker

In this section we outline the Mean Shift tracker described in [3]. The notations used here are similar to those in [3], with minor modifications to suit the subsequent sections.

Combining multiple histograms into one

Sometimes no view of the target yields a reasonable approximation of its circumferential color histogram. An extreme example is presented in Fig. 1. This figure shows the results of the Mean Shift tracker for Sequence I, where a Rubik Cube is tracked. The reference color histogram was set in the first frame, where the visible colors of the cube are orange,

Convex hull-based target model

As different sides of the target face the camera, the target’s histogram changes. To accommodate a time-varying target histogram, we propose to extend the reference target model used by the Mean Shift tracker to include the convex hull of multiple reference histograms obtained from different target views. That is, the target model is approximated as the mixture of M reference histograms

$$\hat{q}(\alpha) = \sum_{v=1}^{M} \alpha_v \hat{q}_v, \qquad \forall v:\ \alpha_v \ge 0, \qquad \sum_{v=1}^{M} \alpha_v = 1,$$

where the mixture proportions $\alpha = \{\alpha_v\}_{v=1,\dots,M}$ vary with time.
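To make the mixture model concrete, the sketch below builds a mixture of reference histograms and, for a fixed candidate histogram p, searches for nonnegative proportions summing to one that maximize the Bhattacharyya coefficient between p and the mixture. The optimization procedure used in the paper is not shown in this excerpt; the Frank-Wolfe ascent here is a substituted, generic technique chosen because its iterates stay on the simplex by construction. All function names and toy values are assumptions made for illustration.

```python
import math

def mixture(alpha, refs):
    """Histogram mixture: bin-wise sum_v alpha_v * q_v."""
    n_bins = len(refs[0])
    return [sum(a * q[u] for a, q in zip(alpha, refs)) for u in range(n_bins)]

def bhattacharyya_coefficient(p, q):
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def fit_mixture(p, refs, iterations=500):
    """Approximately maximize rho(p, mixture(alpha)) over the simplex.

    Illustrative Frank-Wolfe ascent (an assumption, not the paper's method).
    """
    m = len(refs)
    alpha = [1.0 / m] * m  # start in the interior of the simplex
    for k in range(iterations):
        q = mixture(alpha, refs)
        # Partial derivative of sum_u sqrt(p_u * q_u(alpha)) w.r.t. alpha_v;
        # bins where the mixture is empty contribute nothing and are skipped.
        grad = [
            sum(0.5 * math.sqrt(p[u] / q[u]) * ref[u]
                for u in range(len(p)) if q[u] > 0.0)
            for ref in refs
        ]
        best = max(range(m), key=lambda v: grad[v])
        step = 2.0 / (k + 3.0)  # diminishing step, always below 1
        # Move toward the best vertex; weights stay >= 0 and sum to 1.
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[best] += step
    return alpha

# Toy example: two single-color reference views, candidate shows both colors.
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
p = [0.25, 0.75, 0.0]
alpha = fit_mixture(p, refs)
print(alpha, bhattacharyya_coefficient(p, mixture(alpha, refs)))
```

In this toy case the candidate histogram lies inside the convex hull of the two references, so a suitable mixture matches it almost exactly, whereas either reference histogram alone overlaps with only part of the candidate's colors.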

Thus, the

Experimental results

Results of testing the Mean Shift tracker with the convex hull-based target model are presented for seven sequences. All the targets tracked in the experiments were such that their color histogram could not be reasonably modeled from a single view. In all experiments the RGB color space was used. Each color band was equally divided into eight bins, except for Sequence III, where each color band had to be divided into 32 bins because the target’s colors were very similar to the background’s.
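The binning described above can be illustrated as follows. This sketch assumes 8-bit color channels and a joint RGB histogram (so eight bins per band give 8^3 = 512 bins and 32 bins per band give 32^3 = 32768 bins); the excerpt does not state whether the histogram is joint or per band, and the kernel weighting used by the Mean Shift tracker is omitted here. The function names are hypothetical.

```python
def bin_index(r, g, b, bins_per_band=8):
    """Map an 8-bit RGB pixel to a joint histogram bin index.

    Assumes bins_per_band divides 256 so that all bins have equal width.
    """
    width = 256 // bins_per_band
    return ((r // width) * bins_per_band + (g // width)) * bins_per_band + (b // width)

def color_histogram(pixels, bins_per_band=8):
    """Unweighted, normalized joint RGB histogram of (r, g, b) tuples."""
    hist = [0.0] * bins_per_band ** 3
    count = 0
    for r, g, b in pixels:
        hist[bin_index(r, g, b, bins_per_band)] += 1.0
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return [h / count for h in hist]

# Two-pixel toy region: one black pixel and one white pixel.
hist = color_histogram([(0, 0, 0), (255, 255, 255)])
print(len(hist), hist[0], hist[-1])
```

Finer binning separates similar colors into different bins, which is consistent with the use of 32 bins per band when the target's colors are close to the background's, at the cost of a much sparser histogram.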

The

Discussion

There appears to be a resemblance between the problem dealt with in this work and those in the papers by Bajramovic et al. [24] and by Maggio and Cavallaro [25], which also enhance the tracking by employing multiple reference histograms. However, the problems are distinct. Here, we are concerned with the problem of temporal changes in the target’s features (e.g., due to rotations in space), whereas [24] deals with the problem of fusing different types of features in the tracking process. These

Conclusion

While the commonly used Mean Shift tracker [3] has proved to be robust in many tracking scenarios, there are cases where no single view suffices to produce a reference color histogram appropriate for tracking the target.

This paper presented a method for immunizing the Mean Shift tracker against the above problem by using multiple reference color histograms. These histograms are obtained from different target views or for different target states. A simple method for combining these histograms into

References (26)

  • N.S. Peng et al., Mean shift blob tracking with kernel histogram filtering and hypothesis testing, Pattern Recognition Letters (2005)
  • S.J. McKenna et al., Tracking colour objects using adaptive mixture models, Image and Vision Computing (1999)
  • S. Birchfield, Elliptical head tracking using intensity gradients and color histograms, in: Proceedings of the 1998...
  • R.T. Collins, Mean-shift blob tracking through scale space, in: Proceedings of the 2003 IEEE Computer Society...
  • D. Comaniciu et al., Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
  • P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of the 7th European...
  • D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the 2000...
  • D. Comaniciu et al., Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • M.J. Black et al., EigenTracking: robust matching and tracking of articulated objects using a view-based representation, International Journal of Computer Vision (1998)
  • F. De la Torre, C.J.G. Rubio, E. Martinez, Subspace eyetracking for driver warning, in: Proceedings of the 2003...
  • S. Avidan, Support vector tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
  • J. Tu, H. Tao, T. Huang, Online updating appearance generative mixture model for meanshift tracking, in: Proceedings of...
  • J. Sun, W. Zhang, X. Tang, H.-Y. Shum, Bi-directional tracking using trajectory segment analysis, in: Proceedings of...