Generative-model-based tracking by cluster analysis of image differences

https://doi.org/10.1016/S0921-8890(02)00203-8

Abstract

The EM algorithm is used to track moving objects as clusters of pixels significantly different from the corresponding pixels in a reference image. The underlying cluster model is Gaussian in image space, but not in grey-level difference distribution. The generative model is used to derive criteria for the elimination and merging of clusters, while simple heuristics are used for the initialisation and splitting of clusters. The system is competitive with other tracking algorithms based on image differencing.

Introduction

Many tracking systems include the four stages of image differencing, thresholding of the difference image, morphological filtering, and connected component labelling. These stages are used to identify distinct targets and attribute each image pixel to one of the targets. Further processing stages use this information for detection of target features, Kalman filtering, etc. Several examples of this approach can be found in [4]. The approach can be quite effective, but thresholding and morphological operators involve information loss: this loss is the result of assigning each pixel unambiguously either to the background or to one (and only one) target, ignoring the uncertainty of these assignments.
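For concreteness, the four conventional stages can be sketched as follows. This is only an illustrative sketch in plain Python on nested lists, not code from any of the systems cited: the function names, the cross-shaped structuring element, the use of a morphological opening as the filtering step, and 4-connectivity are all assumptions made here. Note how every pixel ends up with exactly one hard label, which is the information loss discussed above.

```python
from collections import deque

# Cross-shaped neighbourhood (centre first); an assumption of this sketch.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def difference_image(frame, reference):
    """Stage 1: absolute grey-level difference, pixel by pixel."""
    return [[abs(f - r) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]

def threshold(diff, t):
    """Stage 2: hard decision, 1 = foreground, 0 = background."""
    return [[1 if d > t else 0 for d in row] for row in diff]

def _morph(mask, require_all):
    """Erosion (require_all=True) or dilation (False) with the cross element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[y + dy][x + dx]
                    if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dy, dx in CROSS]
            out[y][x] = int(all(vals)) if require_all else int(any(vals))
    return out

def opening(mask):
    """Stage 3: erosion followed by dilation removes isolated pixels."""
    return _morph(_morph(mask, True), False)

def label_components(mask):
    """Stage 4: 4-connected component labelling by breadth-first search."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in CROSS[1:]:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def segment(frame, reference, t):
    """Run all four stages; returns (label image, number of targets)."""
    mask = opening(threshold(difference_image(frame, reference), t))
    return label_components(mask)
```

Calling `segment(frame, reference, t)` with two equally sized grey-level images and a threshold `t` returns a label image and the number of connected components found.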

For most purposes, such an unambiguous assignment is unnecessary: the output that is required from a tracker is information on how many targets are present and the approximate location and size of the targets. This seems an appropriate task for cluster analysis.

Tracking algorithms based on clustering have been proposed in [6], [14]. The most important innovation described in this paper is that the new method is put on a sound statistical basis by the formulation of a generative model for the clusters. This model underlies not only the EM algorithm used to optimise cluster parameters, but also the criteria used for determining the number of clusters. This number is not fixed over an image sequence, but dynamically updated on the basis of both the cluster parameters and the evidence from the current image. It should also be noted that the generative model does not prescribe Gaussian clusters: this is important, because distributions of grey-level differences in image sequences contain many outliers and are therefore not well approximated by a Gaussian distribution. Finally, it should be noted that previous cluster-tracking algorithms [6], [14] operate without image differencing; as a consequence, they can be used with a moving camera. However, if tracking with a fixed camera, clustering applied to image differences is faster and more reliable, because the background can be treated as a single cluster.

The cluster tracker was developed as a component of a model-based tracking system [12]: clustering applied to a difference image can provide initial estimates of the location and size of a new target, by projecting the centroid and covariance matrix of the corresponding cluster onto the ground plane. It was soon discovered that the cluster-tracking algorithm is remarkably effective on its own, especially considering its conceptual simplicity.

This paper describes the generative model underlying the tracker (Section 2) and the tracking algorithm itself (Section 3), before presenting results obtained with the PETS2000 image sequences [4] (Section 4). Finally, the strengths and weaknesses of the tracker are briefly reviewed (Section 5).

Generative model

The principle behind the tracking algorithm is simple: a moving target will produce a cluster of pixels in the difference image. The probabilities that a pixel originates from the background cluster or from one of the target clusters can be estimated from the location of the pixel and the value of its grey-level difference with respect to the reference image. Cluster analysis can be used to improve rough initial estimates of cluster parameters.
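As a rough illustration of how such probabilities could be computed, the sketch below applies Bayes' rule to one pixel. Only the Gaussian spatial density of target clusters comes from the text; everything else is an assumption made for this example: the exponential density for the absolute grey-level difference, the uniform spatial density of the background over the image area, and the dictionary keys `weight`, `mean`, `cov` and `mu`.

```python
import math

def gaussian2d(u, mean, cov):
    """Bivariate Gaussian density at pixel location u = (x, y)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = u[0] - mean[0], u[1] - mean[1]
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def exponential(abs_delta, mu):
    """ASSUMED density of |grey-level difference| with mean mu."""
    return math.exp(-abs_delta / mu) / mu

def posteriors(u, abs_delta, background, targets, image_area):
    """Posterior probabilities [background, target 1, target 2, ...].

    background: {'weight', 'mu'}; spatial density ASSUMED uniform.
    targets: list of {'weight', 'mean', 'cov', 'mu'} (hypothetical keys).
    """
    joint = [background["weight"]
             * exponential(abs_delta, background["mu"]) / image_area]
    for t in targets:
        joint.append(t["weight"]
                     * gaussian2d(u, t["mean"], t["cov"])
                     * exponential(abs_delta, t["mu"]))
    total = sum(joint)
    return [j / total for j in joint]
```

With a target centred at (50, 50), a pixel near the centre with a large grey-level difference is attributed almost entirely to the target, whereas a distant pixel with a small difference is attributed to the background. The assignments remain soft, so the uncertainty is carried into the parameter estimates instead of being discarded.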

Each pixel of the difference image is considered an

Cluster parameter estimation

The parameters are estimated by iterative maximisation of the log-likelihood by the EM clustering algorithm (see [9, Section 2.7.2]): at each iteration, the parameters are re-estimated for each cluster by using the current estimates of the probabilities. For instance, the updated estimate of the average (absolute value) grey-level difference $\mu_j$ for cluster $j$ is computed as

$$\mu_j^{(k+1)} = \frac{\sum_u |\delta(u)| \, p_j^{(k)}(u)}{\sum_u p_j^{(k)}(u)},$$

where the superscripts indicate iteration number.
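The re-estimation of the average absolute grey-level difference is a probability-weighted mean: the sum over pixels u of |δ(u)| times the current probability that u belongs to cluster j, divided by the sum of those probabilities. A minimal sketch follows; the function name and the flat-list representation of the pixels are assumptions of this example, not part of the paper.

```python
def update_mu(abs_deltas, probs):
    """One M-step update of the mean absolute grey-level difference.

    abs_deltas: |delta(u)| for every pixel u, as a flat list.
    probs: current probabilities p_j(u) that each pixel belongs to
           cluster j, in the same pixel order.
    """
    den = sum(probs)
    if den == 0.0:
        # No pixel supports this cluster, so the estimate is undefined.
        raise ValueError("cluster has zero total probability")
    num = sum(d * p for d, p in zip(abs_deltas, probs))
    return num / den
```

For example, `update_mu([10, 20, 60], [0.5, 0.5, 0.0])` gives 15.0: the third pixel has zero probability of belonging to the cluster and therefore does not influence the estimate.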

Given that the background cluster

Results

The algorithm was tested on the PETS2000 image sequences: ftp://pets.rdg.ac.uk/PETS2000. Movies with the tracking results for the test sequence (with a duration of almost one minute) and the training sequence (with a duration of just over four minutes) are available at http://www.diku.dk/research/published/2001/01-07.html.

The three vehicles and three people visible in the test sequence were detected and tracked until they left the field of view, or until the end of the image sequence. Two birds

Conclusions

The aim of this paper is to illustrate the capabilities and the limitations of an algorithm based only on cluster analysis of grey-level image differences. The focus is not on detection, but on whether detected targets can be successfully tracked. By this criterion, the capabilities of the cluster tracker are comparable to those of more sophisticated tracking systems (see, e.g., [4]). The main limitation is that an algorithm that ignores the history of the clusters, 3D geometry, and any other

Acknowledgements

I am grateful to Ernst Hansen for proving the theorem in Appendix A and to Anthony Worrall, James Ferryman and Tommaso Cotroneo for practical help and moral support.

References (15)

  • T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York,...
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society B (1977)
  • D.W. Dong et al., Statistics of natural time-varying images, Network (1995)
  • J.M. Ferryman (Ed.), Proceedings of the First IEEE Workshop on Performance Evaluation in Tracking and Surveillance...
  • J.H. van Hateren et al., Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex, Proceedings of the Royal Society of London B (1998)
  • B. Heisele, U. Kreßel, W. Ritter, Tracking non-rigid, moving objects based on color cluster flow, in: Proceedings of...
  • J. Huang, D. Mumford, Statistics of natural images and models, in: Proceedings of the IEEE Conference on Computer...
There are more references available in the full text version of this article.

Arthur E.C. Pece received his first degree in Biological Sciences at the University of Parma, Italy, in 1982, and his Ph.D. in physiology at the University of Alberta, Canada, in 1990. His interest in computer vision originates from research on image coding in the human visual cortex, carried out at the University of Cambridge. He has been a post-doctoral researcher in computer vision at the University of Reading, England, and at the University of Groningen, The Netherlands. He is currently an Assistant Research Professor at the University of Copenhagen. His main research interests are generative model-based tracking and sparse image coding.
