Mean Shift tracking with multiple reference color histograms

doi:10.1016/j.cviu.2009.12.006

Computer Vision and Image Understanding

Volume 114, Issue 3, March 2010, Pages 400-408

https://doi.org/10.1016/j.cviu.2009.12.006

Abstract

The Mean Shift tracker is a widely used tool for robustly and quickly tracking the location of an object in an image sequence using the object’s color histogram. The reference histogram is typically set to that in the target region in the frame where the tracking is initiated. Often, however, no single view suffices to produce a reference histogram appropriate for tracking the target. In contexts where multiple views of the target are available prior to the tracking, this paper enhances the Mean Shift tracker to use multiple reference histograms obtained from these different target views. This is done while preserving both the convergence and the speed properties of the original tracker. We first suggest a simple method to use multiple reference histograms for producing a single histogram that is more appropriate for tracking the target. Then, to enhance the tracking further, we propose an extension to the Mean Shift tracker where the convex hull of these histograms is used as the target model. Many experimental results demonstrate the successful tracking of targets whose visible colors change drastically and rapidly during the sequence, where the basic Mean Shift tracker obviously fails.

Introduction

The target’s color histogram is widely used for visual tracking (e.g. [1], [2], [3], [4], [5]) and, as was shown by Comaniciu et al. [3], [6], tracking using this feature may be performed very quickly via the Mean Shift procedure [7]. This paper extends Comaniciu et al.’s tracker in [3], [6], which will be referred to in this paper by its common name Mean Shift tracker.

The Mean Shift tracker works by searching in each frame for the location of an image region whose color histogram is closest to the reference color histogram of the target. The distance between two histograms is measured using their Bhattacharyya coefficient, and the search is performed by seeking the target location via Mean Shift iterations beginning from the target location estimated in the previous frame (the tracker is outlined in Section 2).
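The histogram comparison described above can be sketched in a few lines. This is a minimal illustration, not the tracker itself: the function names are hypothetical, histograms are assumed to be normalized lists of equal length, and the distance $\sqrt{1 - \rho}$ is assumed from the kernel-based tracking formulation of Comaniciu et al. [3].

```python
import math


def bhattacharyya_coefficient(p, q):
    """Return rho = sum_u sqrt(p_u * q_u) for two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Return sqrt(1 - rho); assumed form of the distance, see lead-in."""
    rho = bhattacharyya_coefficient(p, q)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - rho))


if __name__ == "__main__":
    reference = [0.5, 0.5, 0.0]
    candidate = [0.0, 0.5, 0.5]
    print(bhattacharyya_coefficient(reference, candidate))
    print(bhattacharyya_distance(reference, candidate))
```

A coefficient of 1 means the two histograms are identical, and a coefficient of 0 means their supports do not overlap, which is the failure case discussed in the next paragraph.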

In the Mean Shift tracker, as well as in the other trackers cited previously, the reference color histogram is approximated according to a single view of the target, typically as it appears in the first frame of the sequence. Although using this method for obtaining the reference histogram proved to be very robust in many scenarios, it produces, in many cases, a poor representation of the target, which might result in poor tracking. More seriously, the support of a reference histogram obtained by this method may become non-overlapping with the support of the target’s histogram as it appears in the sequence, usually resulting in target loss. Indeed, for many objects, any viewing direction may be replaced with a different viewing direction where all the object’s colors apparent in the latter view differ from those in the former. An (unscrambled) Rubik Cube is an extreme example of such an object; each side is a different color, and three sides at most are visible from any viewing direction. Major changes in the apparent colors of a target may also result from changes in the actual target’s colors, as when a person puts on or removes a piece of clothing, or as in the case of an alternating street advertisement.

Often, several different views of the target are available prior to the tracking, either from images that were previously acquired (e.g. [8], [9], [10], [11]) or when performing off-line tracking (e.g. [12], [13], [14], [15]). In these contexts, this paper extends the Mean Shift tracker to use multiple reference color histograms. First, we suggest a simple method to combine these histograms into a single histogram that is more appropriate for tracking the target. To enhance the tracking further, we then propose an extension to the Mean Shift tracker, where the convex hull of these histograms is used as the target model. That is, rather than searching for the image region whose color histogram is closest to a single reference histogram, we search for the image region whose color histogram has the minimal distance from the convex hull of several reference histograms.

Time-varying histograms of colors (e.g. [5], [16]) or of other features such as filter responses (e.g. [17]) have been used for target modeling before, and many trackers have modeled the target’s 2D appearance as being time-varying within a subspace (e.g. [8], [9], [18], [19], [20]). In the latter group of trackers, the search for the target (and possibly for additional transformation parameters) is performed by minimizing the distance of its appearance in the current frame from that subspace. This approach is applied here by modeling the target’s color histogram as being a time-varying linear combination of several reference histograms, under the restriction that the mixture coefficients are nonnegative and sum to unity (so that the linear combination will be a histogram mixture).

Section 2 outlines the original Mean Shift tracker [3]. A simple method for combining multiple reference histograms into one is proposed in Section 3. Section 4 describes the proposed extension of the Mean Shift tracker to use the convex hull-based target model. Experimental results are described in Section 5, Section 6 includes a discussion, and a paper summary is provided in Section 7.

Section snippets

The Mean Shift tracker

In this section we outline the Mean Shift tracker described in [3]. The notations used here are similar to those in [3], with minor modifications to suit the subsequent sections.

Combining multiple histograms into one

Sometimes no view of the target yields a reasonable approximation of its circumferential color histogram. An extreme example is presented in Fig. 1. This figure shows the results of the Mean Shift tracker for Sequence I, where a Rubik Cube is tracked. The reference color histogram was set in the first frame, where the visible colors of the cube are orange,

Convex hull-based target model

As different sides of the target face the camera, the target’s histogram changes. To accommodate a time-varying target histogram, we propose to extend the reference target model used by the Mean Shift tracker to include the convex hull of multiple reference histograms obtained from different target views. That is, the target model is approximated as a mixture of $M$ reference histograms, $\hat{q}(\alpha) = \sum_{v=1}^{M} \alpha_{v} \hat{q}^{v}$, with $\alpha_{v} \geq 0$ for all $v$ and $\sum_{v=1}^{M} \alpha_{v} = 1$, where the mixture proportions $\alpha = \{\alpha_{v}\}_{v=1,\dots,M}$ vary with time.
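The mixture model above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and the one-dimensional grid search for $M = 2$ is a stand-in chosen for simplicity, not the optimization used in the paper (which is not given in this excerpt).

```python
import math


def mixture_histogram(alphas, ref_hists):
    """Return q(alpha) = sum_v alpha_v * q^v, with alpha on the simplex."""
    if len(alphas) != len(ref_hists):
        raise ValueError("need one mixture proportion per reference histogram")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("proportions must be nonnegative and sum to 1")
    n_bins = len(ref_hists[0])
    return [sum(a * h[u] for a, h in zip(alphas, ref_hists))
            for u in range(n_bins)]


def best_two_view_mixture(p, q1, q2, steps=100):
    """Grid search (illustrative only) for the mixture of two reference
    histograms that maximizes the Bhattacharyya coefficient with p."""
    best_alpha, best_rho = 0.0, -1.0
    for i in range(steps + 1):
        a = i / steps
        q = mixture_histogram([a, 1.0 - a], [q1, q2])
        rho = sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))
        if rho > best_rho:
            best_alpha, best_rho = a, rho
    return best_alpha, best_rho


if __name__ == "__main__":
    # Two views with disjoint colors and a candidate showing both.
    view_a, view_b = [1.0, 0.0], [0.0, 1.0]
    candidate = [0.3, 0.7]
    print(best_two_view_mixture(candidate, view_a, view_b))
```

In this toy case neither reference histogram alone matches the candidate well, but a point inside their convex hull matches it exactly, which is the motivation for using the hull as the target model.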

Thus, the

Experimental results

Results of testing the Mean Shift tracker with the convex hull-based target model are presented for seven sequences. All the targets tracked in the experiments were such that their color histogram could not be reasonably modeled from a single view. In all experiments the RGB color space was used. Each color band was equally divided into eight bins, except for Sequence III, where each color band had to be divided into 32 bins because the target’s colors were very similar to the background’s.
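The binning described above can be sketched as follows. This is an illustrative assumption rather than the authors' code: it assumes 8-bit channels and a joint RGB histogram (so 8 bins per band give 8 × 8 × 8 = 512 bins), and it omits the spatial kernel weighting that the Mean Shift tracker applies to pixels. The function name `rgb_histogram` is hypothetical.

```python
def rgb_histogram(pixels, bins_per_band=8):
    """Normalized joint RGB histogram with bins_per_band**3 bins.

    pixels: list of (r, g, b) tuples with integer values in 0..255.
    """
    if 256 % bins_per_band != 0:
        raise ValueError("bins_per_band must divide 256 for equal-width bins")
    if not pixels:
        raise ValueError("need at least one pixel")
    width = 256 // bins_per_band  # number of intensity levels per bin
    hist = [0.0] * bins_per_band ** 3
    for r, g, b in pixels:
        index = ((r // width) * bins_per_band + g // width) * bins_per_band \
            + b // width
        hist[index] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


if __name__ == "__main__":
    region = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (10, 20, 30)]
    coarse = rgb_histogram(region)                  # 8 bins per band
    fine = rgb_histogram(region, bins_per_band=32)  # as for Sequence III
    print(len(coarse), len(fine))
```

Finer binning, as used for Sequence III, separates similar colors into different bins at the cost of a much sparser histogram.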

The

Discussion

There appears to be a resemblance between the problem dealt with in this work and those in the papers by Bajramovic et al. [24] and by Maggio and Cavallaro [25], which also enhance the tracking by employing multiple reference histograms. However, the problems are distinct. Here, we are concerned with the problem of temporal changes in the target’s features (e.g., due to rotations in space), whereas [24] deals with the problem of fusing different types of features in the tracking process. These

Conclusion

While the commonly used Mean Shift tracker [3] has proved to be robust in many tracking scenarios, there are cases where no single view suffices to produce a reference color histogram appropriate for tracking the target.

This paper presented a method for immunizing the Mean Shift tracker against the above problem by using multiple reference color histograms. These histograms are obtained from different target views or for different target states. A simple method for combining these histograms into

References (26)

N.S. Peng et al., Mean shift blob tracking with kernel histogram filtering and hypothesis testing, Pattern Recognition Letters (2005)
S.J. McKenna et al., Tracking colour objects using adaptive mixture models, Image and Vision Computing (1999)
S. Birchfield, Elliptical head tracking using intensity gradients and color histograms, in: Proceedings of the 1998...
R.T. Collins, Mean-shift blob tracking through scale space, in: Proceedings of the 2003 IEEE Computer Society...
D. Comaniciu et al., Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of the 7th European...
D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the 2000...
D. Comaniciu et al., Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
M.J. Black et al., EigenTracking: robust matching and tracking of articulated objects using a view-based representation, International Journal of Computer Vision (1998)
F. De la Torre, C.J.G. Rubio, E. Martinez, Subspace eyetracking for driver warning, in: Proceedings of the 2003...
S. Avidan, Support vector tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
J. Tu, H. Tao, T. Huang, Online updating appearance generative mixture model for meanshift tracking, in: Proceedings of...
J. Sun, W. Zhang, X. Tang, H.-Y. Shum, Bi-directional tracking using trajectory segment analysis, in: Proceedings of...
