Generative-model-based tracking by cluster analysis of image differences

doi:10.1016/S0921-8890(02)00203-8

Robotics and Autonomous Systems

Volume 39, Issues 3–4, 30 June 2002, Pages 181-194

https://doi.org/10.1016/S0921-8890(02)00203-8

Abstract

The EM algorithm is used to track moving objects as clusters of pixels significantly different from the corresponding pixels in a reference image. The underlying cluster model is Gaussian in image space, but not in grey-level difference distribution. The generative model is used to derive criteria for the elimination and merging of clusters, while simple heuristics are used for the initialisation and splitting of clusters. The system is competitive with other tracking algorithms based on image differencing.

Introduction

Many tracking systems include the four stages of image differencing, thresholding of the difference image, morphological filtering, and connected component labelling. These stages are used to identify distinct targets and attribute each image pixel to one of the targets. Further processing stages use this information for detection of target features, Kalman filtering, etc. Several examples of this approach can be found in [4]. The approach can be quite effective, but thresholding and morphological operators involve information loss: this loss is the result of assigning each pixel unambiguously either to the background or to one (and only one) target, ignoring the uncertainty of these assignments.
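For comparison, the four conventional stages can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the systems in [4]: the function names, the 4-connected structuring element, the use of a morphological opening, and the threshold parameter `level` are all assumptions introduced here.

```python
from collections import deque

# 4-connected neighbourhood, used for both morphology and labelling (an assumption).
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def difference_image(frame, reference):
    """Stage 1: absolute grey-level difference with respect to the reference image."""
    return [[abs(f - r) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]


def threshold(diff, level):
    """Stage 2: binary mask, True where the difference exceeds `level`."""
    return [[d > level for d in row] for row in diff]


def _is_set(mask, y, x):
    """True if (y, x) lies inside the image and is set; outside counts as unset."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]


def erode(mask):
    """Keep a pixel only if it and all four of its neighbours are set."""
    return [[mask[y][x] and all(_is_set(mask, y + dy, x + dx) for dy, dx in NEIGHBOURS)
             for x in range(len(mask[0]))] for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if it or any of its four neighbours is set."""
    return [[mask[y][x] or any(_is_set(mask, y + dy, x + dx) for dy, dx in NEIGHBOURS)
             for x in range(len(mask[0]))] for y in range(len(mask))]


def label_components(mask):
    """Stage 4: breadth-first connected component labelling (labels start at 1)."""
    labels = [[0] * len(mask[0]) for _ in mask]
    count = 0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if _is_set(mask, ny, nx) and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def detect_targets(frame, reference, level):
    """Run the four stages; stage 3 is an opening (erosion followed by dilation)."""
    mask = threshold(difference_image(frame, reference), level)
    mask = dilate(erode(mask))
    return label_components(mask)
```

Every pixel in the returned label image belongs either to the background (label 0) or to exactly one target, which is the hard assignment criticised above.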

For most purposes, such an unambiguous assignment is unnecessary: the output required from a tracker is information on how many targets are present and on the approximate location and size of each target. This seems an appropriate task for cluster analysis.

Tracking algorithms based on clustering have been proposed in [6], [14]. The most important innovation described in this paper is that the new method is put on a sound statistical basis by the formulation of a generative model for the clusters. This model underlies not only the EM algorithm used to optimise cluster parameters, but also the criteria used for determining the number of clusters. This number is not fixed over an image sequence, but dynamically updated on the basis of both the cluster parameters and the evidence from the current image. It should also be noted that the generative model does not prescribe Gaussian clusters: this is important, because distributions of grey-level differences in image sequences contain many outliers and are therefore not well approximated by a Gaussian distribution. Finally, it should be noted that previous cluster-tracking algorithms [6], [14] operate without image differencing; as a consequence, they can be used with a moving camera. However, if tracking with a fixed camera, clustering applied to image differences is faster and more reliable, because the background can be treated as a single cluster.

The cluster tracker was developed as a component of a model-based tracking system [12]: clustering applied to a difference image can provide initial estimates of the location and size of a new target, by projecting the centroid and covariance matrix of the corresponding cluster onto the ground plane. It was soon discovered that the cluster-tracking algorithm is remarkably effective on its own, especially considering its conceptual simplicity.
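The centroid and covariance matrix of a cluster can be illustrated as probability-weighted moments of the pixel coordinates. The sketch below is an assumption about how such moments might be computed, not the paper's implementation: the function name and the weighting by per-pixel membership probabilities are introduced here, and the projection onto the ground plane is omitted because it depends on camera calibration that this excerpt does not give.

```python
def weighted_moments(points, probs):
    """Probability-weighted centroid and 2x2 covariance of pixel coordinates.

    points: list of (x, y) pixel coordinates.
    probs:  membership probability of each pixel for one cluster (assumed given).
    """
    total = sum(probs)
    if total <= 0.0:
        raise ValueError("cluster has no probability mass")
    cx = sum(p * x for (x, _), p in zip(points, probs)) / total
    cy = sum(p * y for (_, y), p in zip(points, probs)) / total
    sxx = sum(p * (x - cx) ** 2 for (x, _), p in zip(points, probs)) / total
    syy = sum(p * (y - cy) ** 2 for (_, y), p in zip(points, probs)) / total
    sxy = sum(p * (x - cx) * (y - cy) for (x, y), p in zip(points, probs)) / total
    return (cx, cy), ((sxx, sxy), (sxy, syy))
```

The centroid gives a rough target location and the covariance matrix a rough target extent in image coordinates.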

This paper describes the generative model underlying the tracker (Section 2) and the tracking algorithm itself (Section 3), before presenting results obtained with the PETS2000 image sequences [4] (Section 4). Finally, the strengths and weaknesses of the tracker are briefly reviewed (Section 5).

Section snippets

Generative model

The principle behind the tracking algorithm is simple: a moving target will produce a cluster of pixels in the difference image. The probabilities that a pixel originates from the background cluster or from one of the target clusters can be estimated from the location of the pixel and the value of its grey-level difference with respect to the reference image. Cluster analysis can be used to improve rough initial estimates of cluster parameters.
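The probability estimate described above can be sketched as an application of Bayes' rule to one pixel. The specific densities are not given in this excerpt, so the sketch makes several assumptions: target clusters are bivariate Gaussian in image space (as stated in the abstract), the background is uniform over the `n_pixels` pixels of the image, and the absolute grey-level difference follows an exponential distribution with mean `mu` in every cluster. The dictionary keys and function names are hypothetical.

```python
import math


def gaussian_2d(x, y, mean, cov):
    """Bivariate Gaussian density at (x, y); cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form d^T cov^{-1} d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def exponential(d, mu):
    """Exponential density with mean mu, evaluated at d >= 0 (assumed form)."""
    return math.exp(-d / mu) / mu


def posteriors(x, y, abs_diff, background, targets, n_pixels):
    """Posterior probabilities [background, target 1, target 2, ...] for one pixel."""
    # Background: uniform over the image (assumption), small typical differences.
    joint = [background["weight"] * exponential(abs_diff, background["mu"]) / n_pixels]
    # Targets: Gaussian in image space, larger typical differences.
    for t in targets:
        joint.append(t["weight"]
                     * gaussian_2d(x, y, t["mean"], t["cov"])
                     * exponential(abs_diff, t["mu"]))
    total = sum(joint)
    return [j / total for j in joint]
```

Unlike a thresholded mask, the result keeps the uncertainty of the assignment: a pixel close to a cluster boundary receives intermediate probabilities rather than a hard label.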

Each pixel of the difference image is considered an

Cluster parameter estimation

The parameters are estimated by iterative maximisation of the log-likelihood by the EM clustering algorithm (see [9, Section 2.7.2]): at each iteration, the parameters are re-estimated for each cluster by using the current estimates of the probabilities. For instance, the updated estimate of the average (absolute value) grey-level difference μ_j for cluster j is computed as $\mu_{j}^{(k+1)} = \frac{\sum_{u} |\delta(u)|\, p_{j}^{(k)}(u)}{\sum_{u} p_{j}^{(k)}(u)},$ where the superscripts indicate iteration number.
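Read as a probability-weighted average of the absolute differences, the re-estimation of μ_j for one cluster takes only a few lines. The function and argument names below are illustrative, and the probabilities p_j(u) are assumed to have been computed already from the current parameter estimates.

```python
def update_mean_abs_difference(abs_diffs, probs):
    """Re-estimate mu_j as the probability-weighted mean of |delta(u)|.

    abs_diffs: |delta(u)| for every pixel u.
    probs:     current probabilities p_j(u) that pixel u belongs to cluster j.
    """
    total = sum(probs)
    if total == 0.0:
        raise ValueError("cluster has no probability mass")
    return sum(d * p for d, p in zip(abs_diffs, probs)) / total
```

Pixels that almost certainly belong to another cluster have p_j(u) close to zero and therefore contribute almost nothing to the new estimate.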

Given that the background cluster

Results

The algorithm was tested on the PETS2000 image sequences: ftp://pets.rdg.ac.uk/PETS2000. Movies with the tracking results for the test sequence (with a duration of almost one minute) and the training sequence (with a duration of just over four minutes) are available at http://www.diku.dk/research/published/2001/01-07.html.

The three vehicles and three people visible in the test sequence were detected and tracked until they left the field of view, or until the end of the image sequence. Two birds

Conclusions

The aim of this paper is to illustrate the capabilities and the limitations of an algorithm based only on cluster analysis of grey-level image differences. The focus is not on detection, but on whether detected targets can be successfully tracked. By this criterion, the capabilities of the cluster tracker are comparable to those of more sophisticated tracking systems (see, e.g. [4]). The main limitation is that an algorithm that ignores the history of the clusters, 3D geometry, and any other

Acknowledgements

I am grateful to Ernst Hansen for proving the theorem in Appendix A and to Anthony Worrall, James Ferryman and Tommaso Cotroneo for practical help and moral support.

Arthur E.C. Pece received his first degree in Biological Sciences at the University of Parma, Italy, in 1982, and his Ph.D. in physiology at the University of Alberta, Canada, in 1990. His interest in computer vision originates from research on image coding in the human visual cortex, carried out at the University of Cambridge. He has been a post-doctoral researcher in computer vision at the University of Reading, England, and at the University of Groningen, The Netherlands. He is currently an

References (15)

T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York,...
A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society B (1977)
D.W. Dong et al., Statistics of natural time-varying images, Network (1995)
J.M. Ferryman (Ed.), Proceedings of the First IEEE Workshop on Performance Evaluation in Tracking and Surveillance...
J.H. van Hateren et al., Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex, Proceedings of the Royal Society of London B (1998)
B. Heisele, U. Kreßel, W. Ritter, Tracking non-rigid, moving objects based on color cluster flow, in: Proceedings of...
J. Huang, D. Mumford, Statistics of natural images and models, in: Proceedings of the IEEE Conference on Computer...

There are more references available in the full text version of this article.

