A labeled random finite set online multi-object tracker for video data

doi:10.1016/j.patcog.2019.02.004

Pattern Recognition

Volume 90, June 2019, Pages 377-389

https://doi.org/10.1016/j.patcog.2019.02.004

Highlights

• The proposed filter addresses occlusions and detection loss by exploiting the advantages of both detection-based and track-before-detect (TBD) approaches, improving performance while reducing the computational cost.
• In a single Bayesian recursion, the filter seamlessly integrates state estimation, track management, clutter rejection, detection loss and occlusion handling, as well as the prior knowledge that detection losses in the middle of the scene are likely to be due to occlusions.
• Tracking performance is compared to state-of-the-art algorithms on simulated data and well-known benchmark video datasets.

Abstract

This paper proposes an online multi-object tracking algorithm for image observations using a top-down Bayesian formulation that seamlessly integrates state estimation, track management, and the handling of false positives, false negatives and occlusion into a single recursion. This is achieved by modeling the multi-object state as a labeled random finite set and using the Bayes recursion to propagate the multi-object filtering density forward in time. The proposed filter updates tracks with detections but switches to image data when detection loss occurs, thereby exploiting the efficiency of detection data and the accuracy of image data. Furthermore, the labeled random finite set framework enables the incorporation of prior knowledge that detection losses in the middle of the scene are likely to be due to occlusions. Such prior knowledge can be exploited to improve occlusion handling, especially for long occlusions that can lead to premature track termination in online multi-object tracking. Tracking performance is compared to state-of-the-art algorithms on synthetic data and well-known benchmark video datasets.

Introduction

In a multiple object setting, not only do the states of the objects vary with time, but the number of objects also changes due to objects appearing and disappearing. In this work, we consider the problem of jointly estimating the time-varying number of objects and their trajectories from a stream of noisy images. In particular, we are interested in multi-object tracking (MOT) solutions that compute estimates at a given time using only data up to that time. These so-called online solutions are better suited for time-critical applications.

A critical function of a multi-object tracker is track management, which concerns track initiation/termination and track labeling, i.e. identifying the trajectories of individual objects. Track management is more challenging for online algorithms than for batch algorithms. Usually, track initiation/termination in online MOT algorithms is performed by examining consecutive detections in time [1], [2]. However, false positives generated by the background, compounded by false negatives (including those from object occlusions), can result in false tracks and lost tracks, especially in online algorithms. False negatives also cause track fragmentation in batch algorithms, as reported in [3], [4], [5], [6]. With the exception of the recent network flow techniques [7], track labels are assigned upon track initiation and maintained over time until termination. An online multi-object Bayesian filter that provides systematic track labeling using labeled random finite sets (RFSs) was proposed in [8].

In most video MOT approaches, each image in the data sequence is compressed into a set of detections before a filtering operation is applied to keep track of the objects (including undetected ones). Typically, in the filtering module, motion correspondence or data association is determined first, followed by the application of standard filtering techniques such as Kalman or sequential Monte Carlo filtering [1], [2]. The main advantage of performing detection before filtering is the computational efficiency gained by compressing images into relevant detections. The main disadvantage is the loss of information, in addition to false negatives and false positives, especially in low signal-to-noise ratio (SNR) applications.

Track-before-detect (TBD) is an alternative approach, which bypasses the detection module and exploits the spatio-temporal information directly from the image sequence. The TBD methodology is often required in tracking applications for low SNR image data [9], [10], [11], [12]. In visual tracking applications, perhaps the most well-known TBD MOT algorithm is BraMBLe [13]. Other visual MOT algorithms that can be categorized as TBD include [14], [15], which exploit color-based observation models; [2], [16], which exploit multi-modality of distributions; and [17], which uses multi-Bernoulli random finite set models. While the TBD approach minimizes information loss, it is computationally more expensive. So far, it is not clear how detection and image measurements could be processed simultaneously, in a principled manner, to exploit their complementary advantages.

In this paper, we develop an efficient online MOT algorithm for video data that exploits the advantages of both detection-based and TBD approaches to improve performance while reducing the computational cost. In the visual MOT literature, simultaneous consideration of detections and image features has been proposed in an ad hoc manner [1], [5], and it is not clear how to combine them in a principled way. The innovation of our proposed algorithm is the adaptive update of tracks with detections (for efficiency), or with local regions of the input image (to minimize information loss and improve accuracy). In addition, the proposed visual MOT filter seamlessly integrates state estimation, track management, clutter rejection, false negatives and occlusion handling (which are traditionally separate functionalities) in a single Bayesian recursion.
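The adaptive update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the isotropic Gaussian detection model, the default parameter values, and the `image_ratio` callback (standing in for a likelihood ratio computed on a local image patch) are all assumptions made for illustration.

```python
import math


def gaussian_pdf_2d(z, mean, var):
    """Isotropic 2-D Gaussian density with per-axis variance `var`."""
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return math.exp(-0.5 * (dx * dx + dy * dy) / var) / (2.0 * math.pi * var)


def track_update_weight(predicted_pos, detection, image_ratio,
                        p_detect=0.9, meas_var=4.0, clutter_density=1e-4):
    """Multiplicative weight for one track under one association hypothesis.

    detection:   (x, y) of the associated detection, or None if it is missed.
    image_ratio: hypothetical callable returning an object-vs-background
                 likelihood ratio from the image patch around a position.
    """
    if detection is not None:
        # Detection available: cheap point-measurement update.
        likelihood = gaussian_pdf_2d(detection, predicted_pos, meas_var)
        return p_detect * likelihood / clutter_density
    # Detection lost: fall back to the local image region only,
    # so the full image never has to be processed.
    return (1.0 - p_detect) * image_ratio(predicted_pos)
```

In this sketch the image is consulted only for tracks that lose their detection, which is where the computational saving over a pure TBD update would come from.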

The key technical contribution is a hybrid multi-object measurement model that simultaneously accommodates detections and image observations. Conceptually, this model is a simple generalization of the standard multi-object measurement model [18] and the separable model for image measurement [10]. Such a simple construct, however, enables us to simultaneously exploit the efficiency of the detection-based approach and the accuracy of the TBD-based approach. Specifically, using the labeled RFS framework for multi-object estimation [8], we prove conjugacy of the Generalized Labeled Multi-Bernoulli (GLMB) distributions with respect to the likelihood function of the proposed measurement model. Using this conjugacy result and the labeled RFS estimation formulation [8], we develop an analytic Bayesian MOT filter that avoids processing the entire image so as to reduce computational costs, while at the same time making use of relevant local information at the image level to reduce the effect of false negatives as well as tracking errors.
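For orientation, the two models being generalized are commonly written as follows. The notation is assumed here from the labeled RFS literature and is not taken from this excerpt: X is a labeled multi-object state with label set L(X), Z = {z_1, ..., z_m} is the detection set, θ maps labels to detection indices (0 meaning a missed detection), p_D is the detection probability, κ is the clutter intensity, and y is the image. The hybrid model itself is defined in the full paper and is not reproduced here.

```latex
% Standard multi-object (detection) likelihood, summed over association maps \theta
g(Z \mid X) \;\propto\; \sum_{\theta \in \Theta(\mathcal{L}(X))}
    \prod_{(x,\ell) \in X} \psi_Z^{(\theta(\ell))}(x,\ell),
\qquad
\psi_Z^{(j)}(x,\ell) =
\begin{cases}
  \dfrac{p_D(x,\ell)\, g(z_j \mid x,\ell)}{\kappa(z_j)}, & j > 0, \\[2ex]
  1 - p_D(x,\ell), & j = 0.
\end{cases}

% Separable image likelihood, valid when object regions in the image do not overlap
g(y \mid X) \;\propto\; \prod_{(x,\ell) \in X} g_y(x,\ell)
```

Both likelihoods factor into per-object terms, which suggests why a model combining them can retain the product structure that a conjugacy argument for GLMB densities relies on.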

Due to the labeled RFS filtering formulation, the proposed MOT filter addresses state estimation, track management, clutter rejection, false negatives and occlusion handling in a single recursion. Generally, an online MOT algorithm would terminate a track that has not been detected over several frames. In many visual MOT applications, however, it is observed that away from designated exit regions such as scene edges, the longer an object is in the scene, the less likely it is to disappear; see for example [19], [20], which exploit these so-called closed-world assumptions. Intuitively, this observation can be used to delay the termination of tracks that have been occluded over an extended period, so as to improve occlusion handling. The labeled RFS framework provides a principled and inexpensive means to exploit this observation for improved occlusion handling.
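One way to encode this closed-world observation is a survival probability that depends on both position and track age. The sketch below is an assumption for illustration only, not the model used in the paper: the rectangular exit margin, the exponential saturation in age, and all parameter values are hypothetical.

```python
import math


def survival_probability(pos, age, frame_size, margin=20.0,
                         p_edge=0.5, p_interior=0.9, p_max=0.999, rate=0.2):
    """Probability that a track survives to the next frame.

    pos:        (x, y) position in pixels.
    age:        number of frames since the track was born.
    frame_size: (width, height) of the image in pixels.
    """
    x, y = pos
    width, height = frame_size
    near_edge = (x < margin or y < margin or
                 x > width - margin or y > height - margin)
    if near_edge:
        # Exit region: objects may genuinely leave the scene here.
        return p_edge
    # Interior: survival rises from p_interior towards p_max with age,
    # so long-lived tracks are terminated more reluctantly.
    return p_interior + (p_max - p_interior) * (1.0 - math.exp(-rate * age))
```

With such a model, a long-lived track that loses its detections in the middle of the scene keeps a high existence probability for longer than a young track or one near an edge, which delays its termination during an extended occlusion.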

The remainder of the paper is structured as follows. The Bayesian filtering formulation of the MOT problem using labeled RFS is given in Section 2, followed by details of the proposed solution in Section 3. Performance evaluation of the proposed MOT filter against state-of-the-art trackers is presented in Section 4, and concluding remarks are given in Section 5.

Section snippets

Bayesian multiple object tracking

This section outlines the RFS framework for MOT that accommodates uncertainty in the number of objects, the states of the objects and their trajectories. The salient feature of this framework is that it admits direct parallels between traditional Bayesian filtering and MOT. The modeling of the multi-object state as an RFS in Section 2.1 enables Bayesian filtering concepts to be directly translated to the multi-object case in Section 2.2. Section 2.3 examines the MOT problem in the presence of

GLMB Filter for tracking with image data

The GLMB filter (with the standard measurement likelihood) is a suitable candidate for online MOT [26], [28]. However, it is designed to handle neither occlusion nor image data. Even though occluded objects share the observations of the occluding objects, this situation is not permitted in the standard multi-object likelihood. Consequently, uncertainties in the states of occluded objects grow, while their existence probabilities quickly diminish to zero, leading to possible hijacking, and

Experimental results

The proposed MOT filter is tested on a simulated TBD application in Section 4.1, and on real video data in Section 4.2.

Conclusion

This paper proposed an efficient online visual MOT algorithm that exploits the advantages of both detection-based and TBD approaches, and that seamlessly integrates state estimation, track management, clutter rejection, false negatives and occlusion handling into a single Bayesian recursion. In particular, it has the efficiency of the detection-based approach that avoids updating with the entire image, while at the same time making use of information at the image level by using only small

Acknowledgements

This work was supported by the Australian Research Council through research grant DP160104662, and by the National Strategic Project-Fine particle of the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT), the Ministry of Environment (ME), and the Ministry of Health and Welfare (MOHW) (NRF-2017M3D8A1092022).

References (49)

L. Zhang et al. Global data association for multi-object tracking using network flows. CVPR (2008)
S. Zhang et al. Multi-target tracking by learning local-to-global trajectory models. Pattern Recognit. (2015)
K. Nummiaro et al. An adaptive color-based particle filter. Image Vis. Comput. (2003)
R. Hoseinnezhad et al. Visual tracking of numerous targets via multi-Bernoulli filtering of image data. Pattern Recognit. (2012)
M. Kristan et al. Closed-world tracking of multiple interacting targets for indoor-sports applications. Comput. Vis. Image Understanding (2009)
X. Shen et al. Adaptive pedestrian tracking via patch-based features and spatial-temporal similarity measurement. Pattern Recognit. (2016)
M.D. Breitenstein et al. Online multiperson tracking-by-detection from a single, uncalibrated camera. PAMI (2011)
K. Okuma et al. A boosted particle filter: multitarget detection and tracking. Proc. Eur. Conf. Comput. Vis. (2004)
J. Berclaz et al. Multiple object tracking using k-shortest paths optimization. PAMI (2011)
A. Milan et al. Continuous energy minimization for multitarget tracking. PAMI (2014)
A. Dehghan et al. Target identity-aware network flow for online multiple target tracking. CVPR (2015)
B.T. Vo et al. Labeled random finite sets and multi-object conjugate priors. IEEE Trans. Signal Process. (2013)
S. Davey et al. Track-before-detect techniques. Integrated Tracking, Classification, and Sensor Management: Theory and Applications (2012)
B.N. Vo et al. Joint detection and estimation of multiple objects from image observations. IEEE Trans. Signal Process. (2010)
F. Papi et al. A particle multi-target tracking for superposional measurements using labeled random finite sets. IEEE Trans. Signal Process. (2015)
F. Papi et al. Generalized labeled multi-Bernoulli approximation of multi-object densities. IEEE Trans. Signal Process. (2015)
M. Isard et al. BraMBLe: a Bayesian multiple-blob tracker. Proc. Int. Conf. Comput. Vis. (2001)
P. Pérez et al. Color-based probabilistic tracking. Proc. Eur. Conf. Comput. Vis. (2002)
A.D.J. Vermaak et al. Maintaining multi-modality through mixture tracking. Proc. Int. Conf. Comput. Vis. (2003)
R. Mahler. Statistical multisource-multitarget information fusion. Artech House (2007)
S.S. Intille et al. Closed-world tracking. Proc. Int. Conf. Comput. Vis. (1995)
R. Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Trans. Aerosp. Electron. Sys. (2003)
R. Mahler. Advances in statistical multisource-multitarget information fusion. Artech House (2014)
B.N. Vo et al. Sequential Monte Carlo methods for multi-target filtering with random finite sets. IEEE Trans. Aerosp. Electron. Sys. (2005)


Du Yong Kim received the B.E. degree in electrical and electronics engineering from Ajou University, Korea, in 2005. He received the M.S. and Ph.D. degrees from the Gwangju Institute of Science and Technology, Korea, in 2006 and 2011, respectively. As a Postdoctoral Researcher, he worked on statistical signal processing and image processing at the Gwangju Institute of Science and Technology (2011–2012), the University of Western Australia (2012–2014), and Curtin University (2014–2018). He is currently working as a Vice-Chancellor’s Research Fellow at the School of Engineering, RMIT University. His main research interests include Bayesian filtering theory and its applications to machine learning, computer vision, sensor networks, and automatic control.

Ba-Ngu Vo received his Bachelor's degrees jointly in Science and Electrical Engineering with first class honors in 1994, and his Ph.D. in 1997. He held various research positions before joining the Department of Electrical and Electronic Engineering at the University of Melbourne in 2000. In 2010, he joined the School of Electrical Electronic and Computer Engineering at the University of Western Australia as Winthrop Professor and Chair of Signal Processing. Currently he is Professor and Chair of Signals and Systems in the Department of Electrical and Computer Engineering at Curtin University. Prof. Vo is a recipient of the Australian Research Council’s inaugural Future Fellowship and the 2010 Australian Museum Eureka Prize for Outstanding Science in support of Defence or National Security. His research interests are Signal Processing, Systems Theory and Stochastic Geometry with emphasis on target tracking, robotics, computer vision and space situational awareness. He is best known as a pioneer in the random set approach to multi-object filtering.

Ba-Tuong Vo was born in Perth, Australia, in 1982. He received the B. Sc. degree in applied mathematics and B.E. degree in electrical and electronic engineering (with first-class honors) in 2004 and the Ph.D. degree in engineering (with Distinction) in 2008, all from the University of Western Australia. He is currently an associate professor in the department of electrical and computer engineering at Curtin University and a recipient of an Australian Research Council Fellowship. His primary research interests are in point process theory, filtering and estimation, and multiple object filtering. Dr. Vo is a recipient of the 2010 Australian Museum DSTO Eureka Prize for “Outstanding Science in Support of Defence or National Security”.

Moongu Jeon received the B.S. degree in architectural engineering from Korea University, Seoul, Korea, in 1988 and the M.S. and Ph.D. degrees in computer science and scientific computation from the University of Minnesota, Minneapolis, MN, USA, in 1999 and 2001, respectively. In 2001–2003, he was a Postgraduate Researcher with the University of California Santa Barbara, Santa Barbara, CA, USA, where he worked on optimal control problems, and then, he moved to the National Research Council of Canada, where he worked on the sparse representation of high-dimensional data and the level set methods for image processing until July 2005. In 2005, he joined Gwangju Institute of Science and Technology, Gwangju, Korea, where he is currently a Full Professor with the School of Information and Communications. His current research interests include machine learning, computer vision, and ITSs.
