Elsevier

Pattern Recognition

Volume 90, June 2019, Pages 377-389

A labeled random finite set online multi-object tracker for video data

https://doi.org/10.1016/j.patcog.2019.02.004

Highlights

  • The proposed filter addresses occlusions and detection loss by exploiting the advantages of both detection-based and track-before-detect (TBD) approaches, improving performance while reducing computational cost.

  • In a single Bayesian recursion, the filter seamlessly integrates state estimation, track management, clutter rejection, and the handling of detection loss and occlusion, together with the prior knowledge that detection losses in the middle of the scene are likely to be due to occlusions.

  • Tracking performance is compared to state-of-the-art algorithms on simulated data and well-known benchmark video datasets.

Abstract

This paper proposes an online multi-object tracking algorithm for image observations using a top-down Bayesian formulation that seamlessly integrates state estimation, track management, and the handling of false positives, false negatives and occlusion into a single recursion. This is achieved by modeling the multi-object state as a labeled random finite set and using the Bayes recursion to propagate the multi-object filtering density forward in time. The proposed filter updates tracks with detections but switches to image data when detection loss occurs, thereby exploiting the efficiency of detection data and the accuracy of image data. Furthermore, the labeled random finite set framework enables the incorporation of prior knowledge that detection losses in the middle of the scene are likely to be due to occlusions. Such prior knowledge can be exploited to improve occlusion handling, especially for long occlusions that can lead to premature track termination in online multi-object tracking. Tracking performance is compared to state-of-the-art algorithms on synthetic data and well-known benchmark video datasets.

Introduction

In a multiple object setting, not only do the states of the objects vary with time, but the number of objects also changes due to objects appearing and disappearing. In this work, we consider the problem of jointly estimating the time-varying number of objects and their trajectories from a stream of noisy images. In particular, we are interested in multi-object tracking (MOT) solutions that compute estimates at a given time using only data up to that time. These so-called online solutions are better suited for time-critical applications.

A critical function of a multi-object tracker is track management, which concerns track initiation/termination and track labeling, i.e., identifying the trajectories of individual objects. Track management is more challenging for online algorithms than for batch algorithms. Usually, track initiation/termination in online MOT algorithms is performed by examining consecutive detections in time [1], [2]. However, false positives generated by the background, compounded by false negatives (including those from object occlusions), can result in false tracks and lost tracks, especially in online algorithms. False negatives also cause track fragmentation in batch algorithms, as reported in [3], [4], [5], [6]. With the exception of the recent network flow techniques [7], track labels are assigned upon track initiation and maintained over time until termination. An online multi-object Bayesian filter that provides systematic track labeling using labeled random finite sets (RFSs) was proposed in [8].

In most video MOT approaches, each image in the data sequence is compressed into a set of detections before a filtering operation is applied to keep track of the objects (including undetected ones). Typically, in the filtering module, motion correspondence or data association is determined first, followed by the application of standard filtering techniques such as Kalman or sequential Monte Carlo filtering [1], [2]. The main advantage of performing detection before filtering is the computational efficiency gained by compressing images into relevant detections. The main disadvantage is the loss of information, in addition to false negatives and false positives, especially in low signal-to-noise ratio (SNR) applications.

Track-before-detect (TBD) is an alternative approach, which bypasses the detection module and exploits the spatio-temporal information directly from the image sequence. The TBD methodology is often required in tracking applications for low SNR image data [9], [10], [11], [12]. In visual tracking applications, perhaps the most well-known TBD MOT algorithm is BraMBLe [13]. Other visual MOT algorithms that can be categorized as TBD include [14], [15], which exploit color-based observation models; [2], [16], which exploit multi-modality of distributions; and [17], which uses multi-Bernoulli random finite set models. While the TBD approach minimizes information loss, it is computationally more expensive. So far, it is not clear how detection and image measurements could be processed simultaneously, in a principled manner, to exploit their complementary advantages.

In this paper, we develop an efficient online MOT algorithm for video data that exploits the advantages of both detection-based and TBD approaches to improve performance while reducing the computational cost. In the visual MOT literature, simultaneous consideration of detections and image features has been proposed in an ad hoc manner [1], [5], and it is not clear how to combine them in a principled way. The innovation of our proposed algorithm is the adaptive update of tracks with detections (for efficiency), or with local regions of the input image (to minimize information loss and improve accuracy). In addition, the proposed visual MOT filter seamlessly integrates state estimation, track management, clutter rejection, and the handling of false negatives and occlusion (which are traditionally separate functionalities) in a single Bayesian recursion.
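The adaptive update described above can be illustrated with a small sketch. This is not the authors' measurement model: the Gaussian detection term, the patch-mean image term, and the names `detection_likelihood`, `local_image_likelihood_ratio` and `track_update_weight` are all assumptions made for illustration. It only shows the switching idea, namely that a track with an associated detection is weighted by the detection, while a track without one falls back to a small image region around its predicted position, so the full image is never scanned.

```python
import math


def detection_likelihood(detection, predicted, sigma=1.0):
    """Isotropic 2-D Gaussian density of a detection around the predicted position
    (an assumed, illustrative detection model)."""
    dx = detection[0] - predicted[0]
    dy = detection[1] - predicted[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def local_image_likelihood_ratio(image, predicted, half_size=1, background=1.0):
    """Toy image term: mean intensity of a small patch around the prediction,
    divided by an assumed background level. Only the patch is read."""
    col, row = int(round(predicted[0])), int(round(predicted[1]))
    rows = range(max(0, row - half_size), min(len(image), row + half_size + 1))
    cols = range(max(0, col - half_size), min(len(image[0]), col + half_size + 1))
    values = [image[r][c] for r in rows for c in cols]
    if not values:
        # Prediction lies outside the image: return an uninformative ratio.
        return 1.0
    return (sum(values) / len(values)) / background


def track_update_weight(predicted, detection, image):
    """Use the detection when one is associated with the track,
    otherwise fall back to the local image region."""
    if detection is not None:
        return "detection", detection_likelihood(detection, predicted)
    return "image", local_image_likelihood_ratio(image, predicted)


# Hypothetical 5x5 image with a bright 3x3 blob in the middle.
image = [[0.0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 2.0

print(track_update_weight((2.0, 2.0), (2.0, 2.0), image))  # detected track
print(track_update_weight((2.0, 2.0), None, image))        # missed detection
```

Positions are given as (x, y), so x indexes columns and y indexes rows of the image. In a real filter these weights would enter the update of each track hypothesis rather than being returned on their own.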

The key technical contribution is a hybrid multi-object measurement model that simultaneously accommodates detections and image observations. Conceptually, this model is a simple generalization of the standard multi-object measurement model [18] and the separable model for image measurement [10]. Such a simple construct, however, enables us to simultaneously exploit the efficiency of the detection-based approach and the accuracy of the TBD-based approach. Specifically, using the labeled RFS framework for multi-object estimation [8], we prove conjugacy of the Generalized Labeled Multi-Bernoulli (GLMB) distributions with respect to the likelihood function of the proposed measurement model. Using this conjugacy result and the labeled RFS estimation formulation [8], we develop an analytic Bayesian MOT filter that avoids processing the entire image, so as to reduce computational costs, while at the same time making use of relevant local information at the image level to reduce the effect of false negatives as well as tracking errors.
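For orientation, the multi-object Bayes recursion referred to here can be written in its standard textbook form. The symbols below are assumptions chosen for this note and are not taken from the paper: $\pi_k$ is the multi-object filtering density at time $k$, $f_{k|k-1}$ the multi-object transition density, $g_k$ the multi-object likelihood of the measurement $y_k$ (detections, image data, or both), and the integrals are set integrals.

```latex
\begin{align}
  \pi_{k|k-1}(X_k) &= \int f_{k|k-1}(X_k \mid X_{k-1}) \, \pi_{k-1}(X_{k-1}) \, \delta X_{k-1}, \\
  \pi_k(X_k) &= \frac{g_k(y_k \mid X_k) \, \pi_{k|k-1}(X_k)}
                     {\int g_k(y_k \mid X) \, \pi_{k|k-1}(X) \, \delta X}.
\end{align}
```

Conjugacy in this setting means that if $\pi_{k|k-1}$ is a GLMB density, then the updated density $\pi_k$ obtained with the likelihood $g_k$ is again a GLMB density, which is what allows the recursion to be carried out analytically.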

Due to the labeled RFS filtering formulation, the proposed MOT filter addresses state estimation, track management, clutter rejection, false negatives and occlusion handling in a single recursion. Generally, an online MOT algorithm would terminate a track that has not been detected over several frames. In many visual MOT applications, however, it is observed that away from designated exit regions such as scene edges, the longer an object is in the scene, the less likely it is to disappear; see for example [19], [20], which exploit these so-called closed-world assumptions. Intuitively, this observation can be used to delay the termination of tracks that have been occluded over an extended period, so as to improve occlusion handling. The labeled RFS framework provides a principled and inexpensive means to exploit this observation for improved occlusion handling.
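One way to picture this closed-world observation is as a survival probability that depends on where a track is and how long it has existed. The sketch below is a hypothetical illustration only: the function name `survival_probability`, the functional form, and every numeric parameter are assumptions, not the model used in the paper.

```python
import math


def survival_probability(position, age, width, height,
                         margin=20.0, p_edge=0.90,
                         p_interior_min=0.95, p_interior_max=0.999,
                         age_scale=30.0):
    """Assumed survival model: tracks near the scene edges (exit regions) may
    disappear readily, while interior tracks become less likely to disappear
    the longer they have been in the scene."""
    x, y = position
    distance_to_edge = min(x, y, width - x, height - y)
    if distance_to_edge < margin:
        return p_edge
    # Saturating growth with track age (in frames) for interior tracks.
    growth = 1.0 - math.exp(-age / age_scale)
    return p_interior_min + (p_interior_max - p_interior_min) * growth


# Hypothetical 640x480 scene.
print(survival_probability((5.0, 100.0), age=50, width=640, height=480))    # near the left edge
print(survival_probability((320.0, 240.0), age=0, width=640, height=480))   # new interior track
print(survival_probability((320.0, 240.0), age=100, width=640, height=480)) # long-lived interior track
```

With a survival probability of this kind, a long-lived track that loses its detections in the middle of the scene keeps a high existence probability for longer, which delays its termination during an extended occlusion.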

The remainder of the paper is structured as follows. The Bayesian filtering formulation of the MOT problem using labeled RFS is given in Section 2, followed by details of the proposed solution in Section 3. Performance evaluation of the proposed MOT filter against state-of-the-art trackers is presented in Section 4, and concluding remarks are given in Section 5.

Section snippets

Bayesian multiple object tracking

This section outlines the RFS framework for MOT that accommodates uncertainty in the number of objects, the states of the objects and their trajectories. The salient feature of this framework is that it admits direct parallels between traditional Bayesian filtering and MOT. The modeling of the multi-object state as an RFS in Section 2.1 enables Bayesian filtering concepts to be directly translated to the multi-object case in Section 2.2. Section 2.3 examines the MOT problem in the presence of

GLMB Filter for tracking with image data

The GLMB filter (with the standard measurement likelihood) is a suitable candidate for online MOT [26], [28]. However, it is designed to handle neither occlusion nor image data. Even though occluded objects share the observations of the occluding objects, this situation is not permitted in the standard multi-object likelihood. Consequently, uncertainties in the states of occluded objects grow, while their existence probabilities quickly diminish to zero, leading to possible hijacking, and

Experimental results

The proposed MOT filter is tested on a simulated TBD application in Section 4.1, and on real video data in Section 4.2.

Conclusion

This paper proposed an efficient online visual MOT algorithm that exploits the advantages of both detection-based and TBD approaches, and seamlessly integrates state estimation, track management, clutter rejection, false negatives and occlusion handling into a single Bayesian recursion. In particular, it has the efficiency of the detection-based approach that avoids updating with the entire image, while at the same time making use of information at the image level by using only small

Acknowledgements

This work was supported by the Australian Research Council through a research grant DP160104662 and the National Strategic Project-Fine particle of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (MSIT), the Ministry of Environment (ME), and the Ministry of Health and Welfare (MOHW) (NRF-2017M3D8A1092022).

References (49)

  • A. Dehghan et al., Target identity-aware network flow for online multiple target tracking, CVPR (2015)
  • B.T. Vo et al., Labeled random finite sets and multi-object conjugate priors, IEEE Trans. Signal Process. (2013)
  • S. Davey et al., Track-before-detect techniques, in: Integrated Tracking, Classification, and Sensor Management: Theory and Applications (2012)
  • B.N. Vo et al., Joint detection and estimation of multiple objects from image observations, IEEE Trans. Signal Process. (2010)
  • F. Papi et al., A particle multi-target tracker for superpositional measurements using labeled random finite sets, IEEE Trans. Signal Process. (2015)
  • F. Papi et al., Generalized labeled multi-Bernoulli approximation of multi-object densities, IEEE Trans. Signal Process. (2015)
  • M. Isard et al., BraMBLe: a Bayesian multiple-blob tracker, Proc. Int. Conf. Comput. Vis. (2001)
  • P. Pérez et al., Color-based probabilistic tracking, Proc. Eur. Conf. Comput. Vis. (2002)
  • A.D.J. Vermaak et al., Maintaining multi-modality through mixture tracking, Proc. Int. Conf. Comput. Vis. (2003)
  • R. Mahler, Statistical multisource-multitarget information fusion, Artech House (2007)
  • S.S. Intille et al., Closed-world tracking, Proc. Int. Conf. Comput. Vis. (1995)
  • R. Mahler, Multitarget Bayes filtering via first-order multitarget moments, IEEE Trans. Aerosp. Electron. Sys. (2003)
  • R. Mahler, Advances in statistical multisource-multitarget information fusion, Artech House (2014)
  • B.N. Vo et al., Sequential Monte Carlo methods for multi-target filtering with random finite sets, IEEE Trans. Aerosp. Electron. Sys. (2005)

Du Yong Kim received the B.E. degree in electrical and electronics engineering from Ajou University, Korea, in 2005. He received the M.S. and Ph.D. degrees from the Gwangju Institute of Science and Technology, Korea, in 2006 and 2011, respectively. As a Postdoctoral Researcher, he worked on statistical signal processing and image processing at the Gwangju Institute of Science and Technology (2011–2012), the University of Western Australia (2012–2014), and Curtin University (2014–2018). He is currently working as a Vice-Chancellor’s Research Fellow at the School of Engineering, RMIT University. His main research interests include Bayesian filtering theory and its applications to machine learning, computer vision, sensor networks, and automatic control.

Ba-Ngu Vo received his Bachelor degrees jointly in Science and Electrical Engineering with first class honors in 1994, and Ph.D. in 1997. He had held various research positions before joining the department of Electrical and Electronic Engineering at the University of Melbourne in 2000. In 2010, he joined the School of Electrical Electronic and Computer Engineering at the University of Western Australia as Winthrop Professor and Chair of Signal Processing. Currently he is Professor and Chair of Signals and Systems in the Department of Electrical and Computer Engineering at Curtin University. Prof. Vo is a recipient of the Australian Research Council’s inaugural Future Fellowship and the 2010 Australian Museum Eureka Prize for Outstanding Science in support of Defence or National Security. His research interests are Signal Processing, Systems Theory and Stochastic Geometry with emphasis on target tracking, robotics, computer vision and space situational awareness. He is best known as a pioneer in the random set approach to multi-object filtering.

Ba-Tuong Vo was born in Perth, Australia, in 1982. He received the B.Sc. degree in applied mathematics and B.E. degree in electrical and electronic engineering (with first-class honors) in 2004 and the Ph.D. degree in engineering (with Distinction) in 2008, all from the University of Western Australia. He is currently an associate professor in the department of electrical and computer engineering at Curtin University and a recipient of an Australian Research Council Fellowship. His primary research interests are in point process theory, filtering and estimation, and multiple object filtering. Dr. Vo is a recipient of the 2010 Australian Museum DSTO Eureka Prize for “Outstanding Science in Support of Defence or National Security”.

Moongu Jeon received the B.S. degree in architectural engineering from Korea University, Seoul, Korea, in 1988 and the M.S. and Ph.D. degrees in computer science and scientific computation from the University of Minnesota, Minneapolis, MN, USA, in 1999 and 2001, respectively. In 2001–2003, he was a Postgraduate Researcher with the University of California Santa Barbara, Santa Barbara, CA, USA, where he worked on optimal control problems, and then, he moved to the National Research Council of Canada, where he worked on the sparse representation of high-dimensional data and the level set methods for image processing until July 2005. In 2005, he joined Gwangju Institute of Science and Technology, Gwangju, Korea, where he is currently a Full Professor with the School of Information and Communications. His current research interests include machine learning, computer vision, and ITSs.
