Robust object tracking using least absolute deviation☆
Introduction
Object tracking is an important component of many surveillance systems, such as transport systems (e.g., road traffic, airports and harbours), public spaces (e.g., shopping malls and parks), industrial environments, low-altitude rescue and military establishments. Efficient and robust object tracking remains challenging because of image noise, complex object motion, partial or full occlusions, and drastic illumination and pose changes [1].
Methods based on the particle filter are widely used for tracking. Under the particle filter framework, different image features such as colour [3], [4], [7], shape [2], [13] and image structure [7], [15] can be used to represent the appearance of the object.
Recently, there has been increased interest in sparse representation and its applications in the field of computer vision. Under the particle filter framework, Xue Mei et al. [8], [9] proposed a tracking method based on sparse representation, which was named the L1 Tracker. To find the tracking target in a frame, each target candidate (particle sample) is approximately expressed as a sparse linear combination of some target templates and trivial templates. The sparse representation coefficients of a target candidate are calculated by solving an L1-regularised least squares problem, which is computationally expensive because of the high dimension of the trivial templates.
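As a rough illustration of why the trivial templates make this problem large, the following sketch builds a dictionary from n target templates plus d trivial (identity) templates and solves an L1-regularised least squares problem with plain iterative soft-thresholding (ISTA). The solver, the dimensions, the value of `lam` and all names are assumptions made for illustration; the published L1 Tracker formulation has further details that are not reproduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_regularised_ls(B, y, lam, n_iter=500):
    """Minimise 0.5 * ||B c - y||_2^2 + lam * ||c||_1 with plain ISTA."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2   # 1 / Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

rng = np.random.default_rng(0)
d, n = 64, 5                      # hypothetical feature dimension and template count
T = rng.standard_normal((d, n))
T /= np.linalg.norm(T, axis=0)    # unit-norm target templates (synthetic)
B = np.hstack([T, np.eye(d)])     # dictionary: target templates plus trivial templates

y = T[:, 2].copy()                # candidate equal to template 2 ...
y[:6] += 3.0                      # ... with a few corrupted ("occluded") entries

c = l1_regularised_ls(B, y, lam=0.05)
target_coeffs, trivial_coeffs = c[:n], c[n:]
```

The coefficient vector `c` has n + d entries, so the size of the optimisation problem grows with the feature dimension d rather than with the number of target templates alone.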
To accelerate the L1 Tracker, Hanxi Li et al. [16] use the orthogonal matching pursuit (OMP) algorithm to search for a sparse solution. To reduce the number of particle samples that participate in solving the optimisation problems, Xue Mei et al. [10] improve the computation speed of the L1 Tracker with a minimal error bounding strategy; the resulting method is called the BPR-L1 Tracker. The APG-L1 Tracker proposed by Chenglong Bao et al. [24] uses an accelerated proximal gradient (APG) approach to solve a new L1-norm related problem with one added term that controls the energy of the trivial templates. By enforcing joint sparsity, Tianzhu Zhang et al. [34] propose a computationally efficient multi-task sparse learning method, which learns the representations of all particle samples together and mines correlations among the tasks to obtain better tracking results than learning each task individually.
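For readers unfamiliar with OMP, the greedy idea can be sketched as below. This is a generic textbook version, not the implementation of [16]; the function name, the dictionary `D` and the toy data are assumptions. The toy dictionary has orthonormal columns so that exact recovery is guaranteed in this example.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy OMP: select one atom per step, then refit by least squares."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit on all selected atoms; the new residual is orthogonal to them.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((30, 10)))  # 30 x 10, orthonormal columns
x_true = np.zeros(10)
x_true[1], x_true[7] = 2.0, -1.5
y = D @ x_true

x_hat = omp(D, y, n_nonzero=2)
```

Each step costs one correlation pass and one small least squares fit, which is why a greedy search can be cheaper than solving the full L1-regularised problem.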
Some work based on sparse representation is devoted to improving the robustness of tracking. To alleviate the accumulation of errors during self-updating, Baiyang Liu et al. [31] use a static sparse dictionary and a dynamically online updated basis distribution to model the target appearance. To deal with drastic appearance change, Zhong Wei et al. [32] propose an appearance model exploiting both holistic templates and local representation. Jia Xu et al. [33] develop an appearance model that exploits both partial information and spatial information of the target based on a novel alignment-pooling method. To discriminate the target from the background, Tianzhu Zhang et al. [37] incorporate background templates into the dictionary of sparse representation and reformulate the tracking problem as an efficient low-rank matrix learning problem.
Regarding the model of the representation error, LSS [36] assumes that the representation error follows a Gaussian–Laplacian distribution. Dong Wang et al. [35], [36] use classic principal component analysis (PCA) to learn an effective appearance model. It should be stressed that there is no sparsity constraint on the representation coefficients in [35], [36]. Consequently, LSS [36] does not fall within the framework of sparse representation.
In this paper, we propose a new tracking method under the framework of the L1 Tracker that can work more quickly and robustly. Our main contributions include:
- 1) The representation error is modelled as a random variable following a Laplacian distribution. The representation error, which indicates corruption or noise, is random and unknown in advance. Thus, accurately modelling the representation error is key to the robustness of tracking. After an elaborate analysis of the distribution of the corruption, we find that it is characterised by one spike and a long tail, so we model the representation error with a Laplacian distribution.
- 2) Based on the Laplacian representation error model and a sparseness-promoting prior on the representation vector, we derive our new LAD–Lasso model as a Bayesian Maximum A Posteriori (MAP) estimate. The number of optimisation variables in our new model is equal to the number of target templates, regardless of the dimension of the feature. Thus, the computation cost can be reduced greatly compared with the L1 Tracker and the APG-L1 Tracker.
- 3) After reformulating our proposed optimisation model, we use the Alternating Direction Method of Multipliers (ADMM) to solve the resulting nonsmooth optimisation problem.
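The link between a Laplacian error model, a sparseness-promoting prior and an LAD–Lasso objective can be sketched as follows. The derivation assumes the linear model y = Tx + e with i.i.d. Laplacian errors of scale a and an i.i.d. Laplacian prior of scale b on the entries of x. Assigning a and b in this way is an assumption chosen to be consistent with λ = a/b; the paper's own parameterisation may differ. The symbols d (feature dimension) and n (number of target templates) are introduced here for illustration only.

```latex
\begin{align*}
y &= Tx + e, \qquad
p(e) = \prod_{i=1}^{d} \frac{1}{2a}\exp\!\left(-\frac{|e_i|}{a}\right), \qquad
p(x) = \prod_{j=1}^{n} \frac{1}{2b}\exp\!\left(-\frac{|x_j|}{b}\right), \\
\hat{x}_{\mathrm{MAP}}
  &= \arg\max_{x}\; p(y \mid x)\,p(x)
   = \arg\min_{x}\; \frac{1}{a}\lVert y - Tx\rVert_1 + \frac{1}{b}\lVert x\rVert_1 \\
  &= \arg\min_{x}\; \lVert y - Tx\rVert_1 + \lambda\lVert x\rVert_1,
  \qquad \lambda = \frac{a}{b}.
\end{align*}
```

Under these assumptions the only unknown is x, which has n entries, while the error term enters through an absolute-deviation data fit rather than through additional coefficients.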
We name our new method the LAD (Least Absolute Deviation) Tracker. Experiments on challenging video sequences demonstrate that our method performs well in both computation speed and robustness.
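To make the optimisation step more concrete, here is a minimal ADMM sketch for a generic LAD–Lasso objective of the form ||Tx - y||_1 + λ||x||_1. The variable splitting, the penalty parameter `rho`, the iteration count and the synthetic data are all assumptions of this sketch; the reformulation actually used in the paper is not shown in this excerpt and may differ.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lad_lasso_admm(T, y, lam, rho=1.0, n_iter=500):
    """Approximately minimise ||T x - y||_1 + lam * ||x||_1 by ADMM.

    Splitting (an assumption of this sketch): r = T x - y and z = x,
    with scaled dual variables u and v.
    """
    d, n = T.shape
    M = T.T @ T + np.eye(n)          # n x n system, independent of d
    x, z, v = np.zeros(n), np.zeros(n), np.zeros(n)
    r, u = np.zeros(d), np.zeros(d)
    for _ in range(n_iter):
        # x-update: a small ridge-like least squares problem.
        x = np.linalg.solve(M, T.T @ (y + r - u) + (z - v))
        Tx = T @ x
        # r- and z-updates: closed-form soft-thresholding.
        r = soft_threshold(Tx - y + u, 1.0 / rho)
        z = soft_threshold(x + v, lam / rho)
        # Dual updates accumulate the constraint violations.
        u += Tx - y - r
        v += x - z
    return z

rng = np.random.default_rng(1)
d, n = 100, 4                        # hypothetical sizes
T = rng.standard_normal((d, n))
T /= np.linalg.norm(T, axis=0)       # synthetic unit-norm templates
x_true = np.array([0.0, 1.0, 0.0, 0.0])
y = T @ x_true
y[:15] += 5.0                        # gross corruption on 15% of the entries

x_lad = lad_lasso_admm(T, y, lam=0.1)
x_ls, *_ = np.linalg.lstsq(T, y, rcond=None)
lad_err = np.max(np.abs(x_lad - x_true))
ls_err = np.max(np.abs(x_ls - x_true))
```

In this toy setting the absolute-deviation fit is expected to stay much closer to `x_true` than ordinary least squares, and the only linear system solved per iteration is n × n.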
This paper is organised as follows. Section 2 briefly reviews the basic idea of trackers based on sparse representation. Section 3 introduces our LAD Tracker in detail. Section 4 gives a theoretical analysis of its robustness and computation cost in comparison with other trackers based on sparse representation. Section 5 demonstrates the performance of the LAD Tracker through numerous experiments. Section 6 concludes the paper.
Section snippets
Trackers based on sparse representation
In this section, we briefly introduce the framework of trackers based on sparse representation. John Wright and Yi Ma et al. [20] addressed the problem of human face recognition by computing sparse linear representations with respect to a dictionary of different human faces. Then, Xue Mei et al. [8] extended sparse representation to tracking in a method named the L1 Tracker. Many later works improve on this method [10], [16], [24]. All of these related trackers
LAD Tracker using least absolute deviation
This section will introduce the details of our proposed LAD Tracker.
As introduced in Section 2, the main purpose of a sparse representation is to estimate the representation vector x efficiently and accurately, given T and y.
The main idea of the LAD Tracker is illustrated in Fig. 2. First, from the representation model in Eq. (1), the representation error e affects the estimate of x and thus affects the robustness of tracking when corruptions occur. Thus, how to accurately model
Analysis of robustness and computation cost
In this section, we will discuss the robustness and computation cost of our LAD Tracker.
Experiments
In this section, we perform some experiments to demonstrate the performance of our LAD Tracker.
We compare the LAD Tracker with seven state-of-the-art trackers: APG-L1 Tracker [24], L1 Tracker [8], CT [14], MIL Tracker [6], IVT [5], MTT [34] and OSP [35]. It is worth mentioning that different trackers are suitable for different sequences. For example, trackers based on tracking-by-detection, such as MIL Tracker and CT, perform better than trackers based on sparse representation when dramatic pose
Conclusions
In this paper, we proposed a new tracking method based on sparse representation. By modelling the corruption with a Laplacian distribution, we derived a new optimisation model for estimating the representation vector and used an ADMM optimisation algorithm to solve it. Numerous simulation results on challenging sequences demonstrated that our LAD Tracker performs very well.
There is still one interesting question about the parameter λ = a/b in our model in Eq. (9). It is an important parameter that
References (37)
- et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. (1976)
- et al. Object tracking: a survey. ACM J. Comput. Surv. (2006)
- et al. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Pattern Anal. Mach. Intell. (2004)
- et al. Real-time tracking of non-rigid objects using mean shift. Comput. Vis. Pattern Recognit. (2000)
- et al. Color-based probabilistic tracking. Eur. Conf. Comput. Vis. (2002)
- et al. Incremental learning for robust visual tracking. Int. J. Comput. Vis. (2008)
- et al. Visual tracking with online multiple instance learning. Comput. Vis. Pattern Recognit. (2009)
- et al. A boosted particle filter: multitarget detection and tracking. Eur. Conf. Comput. Vis. (2004)
- et al. Robust visual tracking using ℓ1 minimization. Int. Conf. Comput. Vis. (2009)
- et al. Robust visual tracking and vehicle classification via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. (2011)
- Minimum error bounded efficient ℓ1 tracker with occlusion detection. Comput. Vis. Pattern Recognit.
- Convex Optimization
- A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process.
- CONDENSATION - Conditional Density Propagation for Visual Tracking. Int. J. Comput. Vis.
- Real-time compressive tracking. Eur. Conf. Comput. Vis.
- Structural similarity-based object tracking in multimodality surveillance videos. Mach. Vis. Appl.
- Real-time visual tracking using compressive sensing. Comput. Vis. Pattern Recognit.
- Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM J. Sci. Comput.
☆ This paper has been recommended for acceptance by Ming-Hsuan Yang.