Elsevier

Image and Vision Computing

Volume 32, Issue 11, November 2014, Pages 930-939

Robust object tracking using least absolute deviation

https://doi.org/10.1016/j.imavis.2014.08.008

Highlights

  • The representation error is modelled as a Laplacian distribution.

  • We derive our new LAD–Lasso model based on a Bayesian MAP estimate.

  • The LAD–Lasso model is robust to outliers.

  • The number of optimisation variables in the new model is greatly reduced.

  • We use the ADMM algorithm to solve the new optimisation problem.

Abstract

Recently, sparse representation has been applied to object tracking, where each candidate target is approximately represented as a sparse linear combination of target templates. In this paper, we present a new tracking algorithm that is faster and more robust than other tracking algorithms based on sparse representation. First, from an analysis of many typical tracking examples with various degrees of corruption, we model the corruption as a Laplacian distribution. Then, a LAD–Lasso optimisation model is proposed based on Bayesian Maximum A Posteriori (MAP) estimation theory. Compared with the L1 Tracker and the APG-L1 Tracker, the number of optimisation variables is greatly reduced; it is equal to the number of target templates, regardless of the dimension of the feature. Finally, we use the Alternating Direction Method of Multipliers (ADMM) to solve the proposed optimisation problem. Experiments on challenging sequences demonstrate that our proposed method performs better than state-of-the-art methods in terms of accuracy and robustness.

Introduction

Object tracking is an important component of many surveillance systems, such as transport systems (e.g., road traffic, airports and harbours), public spaces (e.g., shopping malls and parks), industrial environments, low-altitude rescue and military establishments. More efficient and robust object tracking remains a challenge due to the issues of image noise, complex object motion, partial or full occlusions, drastic illumination and pose changes [1].

Methods based on the particle filter are widely used for tracking. Under the particle filter framework, different image features such as colour [3], [4], [7], shape [2], [13] and image structure [7], [15] can be used to represent the appearance of the object.

Recently, there has been increased interest in sparse representation and its applications in computer vision. Under the particle filter framework, Xue Mei et al. [8], [9] proposed a tracking method based on sparse representation, named the L1 Tracker. To find the tracking target in a frame, each target candidate (particle sample) is approximately expressed as a sparse linear combination of target templates and trivial templates. The sparse representation coefficients of a target candidate are calculated by solving an L1-regularised least squares problem, which is computationally expensive because of the high dimension of the trivial templates.
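For orientation, the L1 Tracker's representation problem is usually written along the following lines. This is a sketch from the general sparse-tracking literature rather than an equation reproduced from this paper; the symbols d (feature dimension), n (number of target templates), a, e^+, e^-, B and c are notation assumed here.

```latex
% y: candidate (d-vector), T: d-by-n target templates,
% I: d-by-d identity whose columns act as trivial templates.
\begin{align*}
y &= \begin{bmatrix} T & I & -I \end{bmatrix}
     \begin{bmatrix} a \\ e^{+} \\ e^{-} \end{bmatrix} = B c, \qquad c \ge 0, \\
\hat{c} &= \arg\min_{c \ge 0} \; \lVert B c - y \rVert_2^2 + \lambda \lVert c \rVert_1 .
\end{align*}
```

Under this notation c has n + 2d entries, so the problem size grows with the feature dimension d, which is the source of the cost described above.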

To further accelerate the L1 Tracker, Hanxi Li et al. [16] use the orthogonal matching pursuit (OMP) algorithm to search for a sparse solution. To reduce the number of particle samples that need to participate in solving the optimisation problems, Xue Mei et al. [10] improve the computation speed of the L1 Tracker with a minimal error bounding strategy, in a method called the BPR-L1 Tracker. The APG-L1 Tracker proposed by Chenglong Bao et al. [24] uses an accelerated proximal gradient (APG) approach to solve a new L1-norm related problem that adds one term to control the energy of the trivial templates. By regularising the representation problem to enforce joint sparsity and learning the representations of all particle samples together, Tianzhu Zhang et al. [34] propose a computationally efficient multi-task sparse learning method that mines correlations among different tasks to obtain better tracking results than learning each task individually.

Some work based on sparse representation is devoted to improving the robustness of tracking. To alleviate the accumulation of errors during self-updating, Baiyang Liu et al. [31] use a static sparse dictionary and a dynamically online-updated basis distribution to model the target appearance. To deal with drastic appearance change, Zhong Wei et al. [32] propose an appearance model exploiting both holistic templates and local representations. Jia Xu et al. [33] develop an appearance model that exploits both partial and spatial information of the target, based on a novel alignment-pooling method. To discriminate the target from the background, Tianzhu Zhang et al. [37] incorporate background templates into the dictionary of the sparse representation and reformulate the tracking problem as an efficient low-rank matrix learning problem.

With regard to the model of the representation error, LSS [36] assumes that the representation error follows a Gaussian–Laplacian distribution. Dong Wang et al. [35], [36] use classic principal component analysis (PCA) to learn an effective appearance model. It should be stressed that there is no sparsity constraint on the representation coefficients in [35], [36]. Consequently, LSS [36] is not within the framework of sparse representation.

In this paper, we propose a new tracking method under the framework of the L1 Tracker that can work more quickly and robustly. Our main contributions include:

  1) The representation error is modelled as a random variable following a Laplacian distribution. The representation error, which indicates corruption or noise, is random and unknown in advance. Thus, accurately modelling the representation error is key to the robustness of tracking. After a detailed analysis of the distribution of the corruption, we find that it is characterised by one spike and a long tail, so we model the representation error as a Laplacian distribution.

  2) Based on the Laplacian representation error model and a sparseness-promoting prior on the representation vector, we derive our new LAD–Lasso model as a Bayesian Maximum A Posteriori (MAP) estimate. The number of optimisation variables in our new model is equal to the number of target templates, regardless of the dimension of the feature. Thus, the computation cost is greatly reduced compared with the L1 Tracker and the APG-L1 Tracker.

  3) After reformulating our proposed optimisation model, we use the Alternating Direction Method of Multipliers (ADMM) to solve our proposed nonsmooth optimisation problem.
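
The step from a Laplacian error model to an LAD–Lasso objective can be sketched as follows. The scale symbols s_e and s_x, the i.i.d. assumption and the Laplacian form of the sparseness-promoting prior are choices made for this illustration; the paper's own derivation and notation are not shown in this excerpt and may differ.

```latex
% Assumed model: y = T x + e with i.i.d. Laplacian errors e_i (scale s_e)
% and an i.i.d. Laplacian prior on the coefficients x_j (scale s_x).
\begin{align*}
y &= T x + e, \qquad
p(e_i) = \tfrac{1}{2 s_e} \exp\!\left(-\tfrac{|e_i|}{s_e}\right), \qquad
p(x_j) = \tfrac{1}{2 s_x} \exp\!\left(-\tfrac{|x_j|}{s_x}\right), \\
\hat{x}_{\mathrm{MAP}} &= \arg\max_x \; p(y \mid x)\, p(x)
  = \arg\min_x \; \tfrac{1}{s_e} \lVert y - T x \rVert_1 + \tfrac{1}{s_x} \lVert x \rVert_1 \\
  &= \arg\min_x \; \lVert y - T x \rVert_1 + \lambda \lVert x \rVert_1,
  \qquad \lambda = s_e / s_x .
\end{align*}
```

In this form the only unknown is x, which has one entry per target template, consistent with the variable count stated in contribution 2.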

We name our new method the LAD (Least Absolute Deviation) Tracker. Experiments on challenging video sequences demonstrate that our method performs well in both computation speed and robustness.

This paper is organised as follows. In Section 2, we briefly review the basic idea of trackers based on sparse representation. Section 3 introduces our LAD Tracker in detail. In Section 4, we analyse robustness and computation cost theoretically, in comparison with other trackers based on sparse representation. In Section 5, we demonstrate the performance of the LAD Tracker through numerous experiments. Section 6 concludes the paper.

Section snippets

Trackers based on sparse representation

In this section, we briefly introduce the framework of trackers based on sparse representation. John Wright and Yi Ma et al. [20] addressed the problem of human face recognition by computing sparse linear representations with respect to a dictionary of different human faces. Xue Mei et al. [8] then extended sparse representation to tracking, in a method named the L1 Tracker. Many later works improve on this method [10], [16], [24]. All of these related trackers

LAD Tracker using least absolute deviation

This section will introduce the details of our proposed LAD Tracker.

As introduced in Section 2, the main purpose of a sparse representation is to estimate the representation vector x efficiently and accurately under the given T and y.
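
One way to estimate x from a given T and y under a least-absolute-deviation objective is sketched below. It minimises ||y - Tx||_1 + lam * ||x||_1 with a standard scaled-form ADMM; the splitting (r = Tx - y, z = x), the function names and the parameters `lam`, `rho` and `n_iter` are assumptions of this sketch, not the paper's reformulation, which is not shown in this excerpt. The toy data are synthetic.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lad_lasso_admm(T, y, lam=0.1, rho=1.0, n_iter=1000):
    """Minimise ||y - T x||_1 + lam * ||x||_1 with scaled-form ADMM.

    Assumed splitting: r = T x - y and z = x, so the objective becomes
    ||r||_1 + lam * ||z||_1 subject to two linear constraints.
    """
    d, n = T.shape
    # The x-update solves a least-squares problem with a fixed n-by-n matrix,
    # so it is inverted once (n is the number of templates, assumed small).
    M = np.linalg.inv(T.T @ T + np.eye(n))
    r, u_r = np.zeros(d), np.zeros(d)   # residual copy and its scaled dual
    z, u_z = np.zeros(n), np.zeros(n)   # coefficient copy and its scaled dual
    for _ in range(n_iter):
        x = M @ (T.T @ (y + r - u_r) + (z - u_z))
        Tx = T @ x
        r = soft_threshold(Tx - y + u_r, 1.0 / rho)
        z = soft_threshold(x + u_z, lam / rho)
        u_r += Tx - y - r
        u_z += x - z
    return z


# Synthetic example: 5 templates, 50-dimensional features, 5 corrupted entries.
rng = np.random.default_rng(0)
T = rng.standard_normal((50, 5))
x_true = np.array([1.0, 0.0, 0.0, 0.5, 0.0])
y = T @ x_true
y[:5] += 10.0  # gross corruption on a few entries

x_lad = lad_lasso_admm(T, y)
x_ls = np.linalg.lstsq(T, y, rcond=None)[0]  # ordinary least squares for comparison
print("LAD-Lasso error:", np.linalg.norm(x_lad - x_true))
print("Least-squares error:", np.linalg.norm(x_ls - x_true))
```

Every update in the loop works with vectors of length d or n and a single precomputed n-by-n matrix, so no variable is introduced per trivial template.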

The main idea of the LAD Tracker is illustrated in Fig. 2. First, from the representation model in Eq. (1), the representation error e will affect the estimate of x and thus will affect the robustness of tracking when corruptions occur. Thus, how to accurately model

Analysis of robustness and computation cost

In this section, we will discuss the robustness and computation cost of our LAD Tracker.
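
As a rough illustration of the cost argument, the snippet below counts optimisation variables per candidate. Only the LAD Tracker count (one variable per target template) is stated in this text; the counts for the L1 Tracker (target templates plus positive and negative trivial templates) and the APG-L1 Tracker (target templates plus one set of trivial templates) are assumptions based on how those methods are commonly formulated, and the sizes used are hypothetical.

```python
def num_variables(n_templates, feature_dim):
    """Count optimisation variables per candidate under the assumed formulations."""
    return {
        # Assumed: target templates + positive and negative trivial templates.
        "L1 Tracker": n_templates + 2 * feature_dim,
        # Assumed: target templates + one set of trivial templates.
        "APG-L1 Tracker": n_templates + feature_dim,
        # Stated in the text: one variable per target template.
        "LAD Tracker": n_templates,
    }


# Hypothetical sizes: 10 target templates, 180-dimensional feature vectors.
counts = num_variables(n_templates=10, feature_dim=180)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Only the LAD Tracker count is independent of `feature_dim`, which is the property the text emphasises.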

Experiments

In this section, we perform some experiments to demonstrate the performance of our LAD Tracker.

We compare the LAD Tracker with seven state-of-the-art trackers: APG-L1 Tracker [24], L1 Tracker [8], CT [14], MIL Tracker [6], IVT [5], MTT [34] and OSP [35]. It is worth mentioning that different trackers are suitable for different sequences. For example, trackers based on tracking-by-detection, such as MIL Tracker and CT, perform better than trackers based on sparse representation when dramatic pose

Conclusions

In this paper, we proposed a new tracking method based on sparse representation. By modelling the corruption as a Laplacian distribution, we proposed a new optimisation model for estimating the representation vector and used an ADMM optimisation algorithm to solve it. Numerous simulation results on challenging sequences demonstrated that our LAD Tracker performs well in terms of accuracy and robustness.

There is still one interesting question about the parameter λ = a/b in our model in Eq. (9). It is an important parameter that

References (37)

  • D. Gabay et al.

    A dual algorithm for the solution of nonlinear variational problems via finite element approximation

    Comput. Math. Appl.

    (1976)
  • Alper Yilmaz et al.

    Object tracking: a survey

    ACM J. Comput. Surv.

    (2006)
  • A. Yilmaz et al.

    Contour-based object tracking with occlusion handling in video acquired using mobile cameras

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2004)
  • Dorin Comaniciu et al.

    Real-time tracking of non-rigid objects using mean shift

    Comput. Vis. Pattern Recognit.

    (2000)
  • Patrick Pérez et al.

    Color-based probabilistic tracking

    Eur. Conf. Comp. Vision

    (2002)
  • David A. Ross et al.

    Incremental learning for robust visual tracking

    Int. J. Comput. Vis.

    (2008)
  • Boris Babenko et al.

    Visual tracking with online multiple instance learning

    Comput. Vis. Pattern Recognit.

    (2009)
  • Kenji Okuma et al.

    A boosted particle filter: multitarget detection and tracking

    Eur. Conf. Comput. Vis.

    (2004)
  • Xue Mei et al.

    Robust visual tracking using ℓ1 minimization

    Int. Conf. Comput. Vis.

    (2009)
  • Xue Mei et al.

    Robust visual tracking and vehicle classification via sparse representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • Xue Mei et al.

    Minimum error bounded efficient ℓ1 tracker with occlusion detection

    Comput. Vis. Pattern Recognit.

    (2011)
  • Stephen P. Boyd et al.

    Convex Optimization

    (2004)
  • M. Sanjeev Arulampalam et al.

    A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking

    IEEE Trans. Signal Process.

    (2002)
  • Michael Isard et al.

    CONDENSATION — Conditional Density Propagation for Visual Tracking

    Int. J. Comput. Vision

    (1998)
  • K. Zhang et al.

    Real-time compressive tracking

    Eur. Conf. Comput. Vis.

    (2012)
  • Artur Loza et al.

    Structural similarity-based object tracking in multimodality surveillance videos

    Mach. Vis. Appl.

    (2009)
  • Hanxi Li et al.

    Real-time visual tracking using compressive sensing

    Comput. Vis. Pattern Recognit.

    (2011)
  • J. Yang et al.

    Alternating direction algorithms for ℓ1-problems in compressive sensing

    SIAM J. Sci. Comput.

    (2011)
This paper has been recommended for acceptance by Ming-Hsuan Yang.