Online multi-object tracking via robust collaborative model and sample selection

https://doi.org/10.1016/j.cviu.2016.07.003

Abstract

The past decade has witnessed significant progress in object detection and tracking in videos. In this paper, we present a collaborative model between a pre-trained object detector and a number of single-object online trackers within the particle filtering framework. For each frame, we construct an association between detections and trackers, and treat each detected image region as a key sample, for online update, if it is associated to a tracker. We present a motion model that incorporates the associated detections with object dynamics. Furthermore, we propose an effective sample selection scheme to update the appearance model of each tracker. We use discriminative and generative appearance models for the likelihood function and data association, respectively. Experimental results show that the proposed scheme generally outperforms state-of-the-art methods.

Introduction

Multi-object tracking (MOT) is a challenging vision problem with numerous applications in automatic visual surveillance, behavior analysis, and intelligent transportation systems, to name a few. In the past decade, increasing attention has been paid to detecting and tracking one or more objects in videos. Recent advances in object detection facilitate collaboration between the detection and tracking modules for multi-object tracking (Breitenstein et al., 2009).

Robust multi-object tracking entails solving many challenging problems such as occlusion, appearance variation, and illumination change. A pre-trained object detector robust to appearance variation of one specific class is often used as a critical module of most multi-object tracking methods. Specifically, one detector encodes the generic pattern information about a certain object class (e.g., cars, pedestrians and faces), and one tracker models the appearance of the specific target to maintain the target identity in an image sequence. However, an object detector is likely to generate false positives and negatives, thereby affecting the performance of a tracker in terms of data association and online model update.

In multi-object tracking, offline methods based on global optimization of all object trajectories usually perform better than their online counterparts (Andriyenko, Schindler, 2011, Andriyenko, Schindler, Roth, 2012, Brendel, Amer, Todorovic, 2011, Butt, Collins, 2012, Izadinia, Saleemi, Li, Shah, 2012, Leal-Taixé, Pons-Moll, Rosenhahn, 2011, Shitrit, Berclaz, Fleuret, Fua, 2011, Wu, Thangali, Sclaroff, Betke, 2012, Zamir, Dehghan, Shah, 2012), and an experimental evaluation of recent methods can be found in Leal-Taixé et al. (2015). For instance, Brendel et al. formulated data association as finding the maximum-weight independent set of a graph (Brendel et al., 2011), and Zamir et al. used the generalized minimum clique graph to solve the data association (Zamir et al., 2012). In Butt and Collins (2012), the data association problem is solved by using a sliding window of three frames to generate short tracklets, and in case of inconsistencies, the algorithm uses larger tracklet optimization. The minimum-cost network flow is then used to optimize the overall object trajectories. For real-time applications, online methods (Breitenstein, Reichlin, Leibe, Koller-Meier, Gool, 2009, Okuma, Taleghani, Freitas, Little, Lowe, 2004, Shu, Dehghan, Oreifej, Hand, Shah, 2012, Wu, Tong, Zhang, Lu, 2008) have been developed within the tracking-by-detection framework, where data association between detections and trackers is carried out in an online manner.
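
To make the online association step in tracking-by-detection concrete, the following is a minimal sketch of greedy per-frame matching between tracker boxes and detection boxes. It is not the association method of this paper or of any cited work: the intersection-over-union score, the `min_score` threshold, and the names `iou` and `associate` are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def associate(trackers: List[Box], detections: List[Box],
              min_score: float = 0.3) -> Dict[int, int]:
    """Greedily map tracker index -> detection index, best score first.

    Assumption: overlap is used as the matching score. Pairs scoring
    below min_score are never matched, so trackers and detections may
    remain unassociated.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(trackers)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, ti, di in pairs:
        if score < min_score:
            break  # all remaining pairs score even lower
        if ti in matches or di in used_detections:
            continue  # each tracker and detection is used at most once
        matches[ti] = di
        used_detections.add(di)
    return matches
```

Detections left unmatched by such a step are candidates for new targets or false positives, and unmatched trackers correspond to missed detections or occluded targets; how those cases are handled differs from method to method.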

Table 1 summarizes the multi-object tracking methods that are most related to this work. Online multi-object tracking can be carried out by using a joint state-space model for multiple targets (Duffner, Odobez, 2013, Eiselein, Arp, Patzold, Sikora, 2012, Jin, Mokhtarian, 2007, Maggio, Taj, Cavallaro, 2008, Okuma, Taleghani, Freitas, Little, Lowe, 2004, Vermaak, Doucet, Perez, 2003). For instance, a mixture particle filter has been proposed (Okuma et al., 2004) to compute the posterior probability via the collaboration between an object detector and the proposal distribution of the particle filter. However, the joint state-space tracking methods have high computational complexity. The probability hypothesis density filter (Mahler, 2003) has been incorporated in visual multi-target tracking (Maggio, Piccardo, Regazzoni, Cavallaro, 2007, Maggio, Taj, Cavallaro, 2008) since its time complexity is linear with respect to the number of targets. However, it does not maintain the target identity, and consequently requires an online clustering method to detect the peaks of the particle weights, with data association applied to each cluster.

Numerous online multi-object tracking methods deal with each tracker independently (Breitenstein, Reichlin, Leibe, Koller-Meier, Gool, 2009, Schumann, Bäuml, Stiefelhagen, 2013, Shu, Dehghan, Oreifej, Hand, Shah, 2012, Yang, Lv, Xu, Gong, 2009, Zhang, Presti, Sclaroff, 2012). In Breitenstein et al. (2009), a method based on a particle filter and two human detectors with different features was developed, where the observation model depends on the associated detection, the detector confidence density and the likelihood of appearance. In addition, Shu et al. (2012) introduced a part-based pedestrian detector for online multi-person tracking. This method combines the detection results with the Kalman filter, where data association is performed every frame, and the filter is used when occlusion occurs. Recently, Zhang et al. (2012) used the mean-shift trackers and the Kalman filter for multi-person tracking, where trackers are either weakly or strongly trained. We note that these methods are likely to have low recall as the detector and tracker are not integrated within the same framework.

The degeneracy problem of particle filters (Gordon et al., 1993) has been addressed in several methods (Huang, Djuric, 2004, Jinxia, Yongli, Jingmin, Qian, 2012, Rui, Chen, 2001, Santhoshkumar, Karthikeyan, Manjunath, 2013) with more effective proposal distributions and re-sampling steps. Rui and Chen (2001) used the unscented Kalman filter for generating the proposal distribution, and Han et al. (2011) used a genetic algorithm to increase the diversity of the particles. Recently, the Metropolis-Hastings algorithm has been used to sample particles from associated detections in the tracking-by-detection framework (Santhoshkumar et al., 2013). We note that the above-mentioned methods either do not exploit the collaboration between detectors and trackers (Han, Ding, Hao, Liang, 2011, Rui, Chen, 2001), or do not consider the effect of false positive detections on the trackers (Santhoshkumar et al., 2013).
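
As background for the degeneracy problem, the sketch below shows two standard textbook tools: the effective sample size, N_eff = 1 / sum(w_i^2) over normalized weights, which drops toward 1 when a few particles carry almost all the weight, and systematic re-sampling, which redraws particles in proportion to their weights. This is a generic illustration rather than the re-sampling scheme of any cited method; the function names are hypothetical.

```python
import random
from typing import List, Optional


def effective_sample_size(weights: List[float]) -> float:
    """Return N_eff = 1 / sum(w_i^2) after normalizing the weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


def systematic_resample(weights: List[float],
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return the indices of the particles selected by systematic re-sampling.

    One random offset is drawn, then n evenly spaced positions are swept
    across the cumulative weight distribution.
    """
    rng = rng or random.Random()
    n = len(weights)
    total = sum(weights)
    start = rng.random() / n  # single random offset in [0, 1/n)
    indices = []
    i = 0
    cumulative = weights[0] / total
    for k in range(n):
        position = start + k / n
        # Advance to the particle whose cumulative weight covers this position.
        while position >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices
```

A common heuristic is to re-sample only when N_eff falls below a fraction of the particle count (for example, half); that threshold is a tuning choice and is not taken from this paper.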

An adaptive appearance model is one of the important factors for effective object tracking as it accounts for appearance change (Salti, Cavallaro, Stefano, 2012, Wu, Lim, Yang, 2013). In Okuma et al. (2004), the appearance model is fixed during the tracking process and thus, may result in tracking failure. On the other hand, the trackers are updated with positive samples (Zhang et al., 2012) straightforwardly without differentiating whether they contain noise or not. As multiple objects are likely to be occluded, it is necessary to analyze the samples and reduce the likelihood of including noisy samples for model update. Recently, the appearance models (Shu et al., 2012) have been updated by the detected non-occluded object parts rather than the holistic samples.

In this paper, we propose an online multi-object tracking scheme that uses a robust collaborative model for interaction between a number of single-object trackers with sparse representation-based discriminative classifiers (Wright, Yang, Ganesh, Sastry, Ma, 2009, Zhong, Lu, Yang, 2012) and a pre-trained object detector in the particle filter framework, where every target is tracked independently to avoid the high computational complexity of the joint probability as the number of targets increases. A novel sample selection scheme is used to update each tracker by using key samples with high confidence from the trajectory of an object, where a key sample represents the association between the tracker and a detection at time t. In addition, we present a data association method with partial occlusion handling by using diverse generative models composed of a sparsity-based generative model (Zhong et al., 2012) and a two-dimensional principal component analysis (2DPCA) (Yang et al., 2004) generative model. Finally, we introduce a 2DPCA generative model to re-identify lost targets. Experimental results on benchmark datasets demonstrate that the proposed scheme generally outperforms state-of-the-art methods.
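
The idea of combining object dynamics with associated detections in a particle filter can be illustrated with the rough sketch below, in which some particles are propagated by a random-walk model and the rest are created around the associated detection. It is only a sketch under stated assumptions, not the motion model of this paper: the two-dimensional state, the Gaussian noise, and the parameters `detection_fraction`, `motion_std` and `detection_std` are all hypothetical.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[float, float]  # (x, y) centre of the target


def propose_particles(particles: List[State],
                      detection: Optional[State],
                      detection_fraction: float = 0.3,
                      motion_std: float = 2.0,
                      detection_std: float = 1.0,
                      rng: Optional[random.Random] = None
                      ) -> List[Tuple[State, bool]]:
    """Return (state, created_from_detection) pairs for the next frame.

    Assumption: the input particles are equally weighted (for example,
    just re-sampled), so replacing the last few of them with
    detection-based particles is acceptable for this illustration.
    """
    rng = rng or random.Random()
    n = len(particles)
    # Without an associated detection, every particle follows the dynamics.
    n_new = int(round(detection_fraction * n)) if detection is not None else 0
    proposals: List[Tuple[State, bool]] = []
    # Particles propagated with a random-walk dynamic model.
    for x, y in particles[: n - n_new]:
        state = (x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
        proposals.append((state, False))
    # Particles newly created around the associated detection.
    for _ in range(n_new):
        dx, dy = detection
        state = (dx + rng.gauss(0.0, detection_std), dy + rng.gauss(0.0, detection_std))
        proposals.append((state, True))
    return proposals
```

The boolean flag is kept so that a likelihood function could treat propagated and detection-based particles differently; how such weights should be assigned is not specified by this sketch.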

Overview of the proposed scheme

The proposed multi-object tracking scheme consists of three main components: a pre-trained object detector, a data association module and a number of single-object trackers. Fig. 1 shows the block diagram of the proposed scheme, wherein only one single-object tracker is shown. The object detector is applied on every frame and supports the data association module with a set of detections Dt at time t. The object tracker adopts a hybrid motion model, and a particle filter with a robust

Tracking scheme

Each object tracker is based on the particle filter tracking framework that uses the sparse representations and 2DPCA as the appearance model. We incorporate two measurements from the detector and tracker into the particle filter, and propose a novel collaborative model that directly affects the likelihood function to obtain the posterior estimate of the target location. We construct the appearance model of the target by using discriminative and generative appearance models, for the likelihood

Datasets

We evaluate the tracking performance of the proposed algorithm using seven challenging sequences, namely, the PETS09-S2L1, PETS09-S2L2 (Ferryman, 2009), UCF Parking Lot (UCF-PL) dataset (Shu et al., 2012), Soccer dataset (Wu et al., 2008), Town Center dataset (Benfold and Reid, 2011), and Urban as well as Sunny sequences from LISA 2010 dataset (Sivaraman and Trivedi, 2010), and compare it with that of several state-of-the-art online MOT methods.

The PETS09-S2L1 sequence consists of 799 frames of

Conclusion

In this paper, we have presented a robust collaborative model that enhances the interaction between a pre-trained object detector and a number of single-object online trackers in the particle filter framework. The proposed scheme is based on incorporating the associated detections with the motion model, in addition to the likelihood function providing different weights for the propagated and the newly created particles sampled from the associated detections, providing a reduction on the effect

Acknowledgments

The authors would like to thank Dr. Y. Wu for his helpful discussions and suggestions. They would also like to thank all the authors that made their codes available for comparison of the proposed algorithm with theirs and the anonymous reviewers for their constructive comments and suggestions. M.A. Naiel would like to acknowledge the support from Concordia University to conduct this research. This work is supported by research grants from the Natural Sciences and Engineering Research Council

References (55)

  • H. Han et al.

    An evolutionary particle filter with the immune genetic algorithm for intelligent video target tracking

    Comput. Math. Applicat.

    (2011)
  • F. Poiesi et al.

    Multi-target tracking on confidence maps: an application to people tracking

    Comput. Vis. Image Under.

    (2013)
  • D. Zhang et al.

    (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition

    Neurocomputing

    (2005)
  • A. Andriyenko et al.

    Multi-target tracking by continuous energy minimization

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2011)
  • A. Andriyenko et al.

    Discrete-continuous optimization for multi-target tracking

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2012)
  • S.H. Bae et al.

    Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2014)
  • B. Benfold et al.

    Stable multi-target tracking in real-time surveillance video

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2011)
  • K. Bernardin et al.

    Evaluating multiple object tracking performance: the CLEAR MOT metrics

    J. Image Video Process.

    (2008)
  • M.D. Breitenstein et al.

    Robust tracking-by-detection using a detector confidence particle filter

    Proc. IEEE International Conference on Computer Vision

    (2009)
  • M.D. Breitenstein et al.

    Online multiperson tracking-by-detection from a single, uncalibrated camera

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • W. Brendel et al.

    Multiobject tracking as maximum weight independent set

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2011)
  • A.A. Butt et al.

    Multiple target tracking using frame triplets

    Proc. Asian Conference on Computer Vision

    (2012)
  • N. Dalal et al.

    Histograms of oriented gradients for human detection

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2005)
  • P. Dollár. Piotr’s Image and Video Matlab Toolbox (PMT). https://pdollar.github.io/toolbox/. Last retrieved, January...
  • P. Dollár et al.

    Fast feature pyramids for object detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2014)
  • P. Dollár et al.

    The fastest pedestrian detector in the west

    Proc. British Machine Vision Conference

    (2010)
  • S. Duffner et al.

    Track creation and deletion framework for long-term online multi-face tracking

    IEEE Trans. Image Process.

    (2013)
  • V. Eiselein et al.

    Real-time multi-human tracking using a probability hypothesis density filter and multiple detectors

    Proc. IEEE International Conference on Advanced Video and Signal-Based Surveillance

    (2012)
  • M. Everingham et al.

    The Pascal visual object classes (VOC) challenge

    Int. J. Comput. Vis.

    (2010)
  • J. Ferryman

    In: Proc. IEEE Workshop Performance Evaluation of Tracking and Surveillance

    (2009)
  • D.G. Gomez et al.

    State-driven particle filter for multi-person tracking

    Proc. Advanced Concepts for Intelligent Vision Systems

    (2012)
  • N.J. Gordon et al.

    Novel approach to nonlinear and non-Gaussian Bayesian state estimation

    IEE Proc. F (Radar Signal Process.)

    (1993)
  • Y. Huang et al.

    A hybrid importance function for particle filtering

    IEEE Signal Process. Lett.

    (2004)
  • H. Izadinia et al.

    (MP)2T: multiple people multiple parts tracker

    Proc. European Conference on Computer Vision

    (2012)
  • Y. Jin et al.

    Variational particle filter for multi-object tracking

    Proc. IEEE International Conference on Computer Vision

    (2007)
  • Y. Jinxia et al.

    Research on particle filter based on an improved hybrid proposal distribution with adaptive parameter optimization

    Proc. International Conference on Intelligent Computation Technology and Automation

    (2012)
  • C.-H. Kuo et al.

    Multi-target tracking by on-line learned discriminative appearance models

    Proc. IEEE Conf. on Computer Vision and Pattern Recognition

    (2010)