Visual tracking via context-aware local sparse appearance model☆
Introduction
As one of the most fundamental and important topics in pattern recognition and computer vision, visual tracking has received widespread attention [1], [2], [3], [4], [5], [6], [7], [8], [9]. Given only the initial state of an arbitrary object (e.g., a person or a car), visual tracking aims to estimate the size and location of this specific object continuously throughout an image sequence. Despite many breakthroughs in recent years, visual tracking remains a challenging task due to factors such as occlusion, illumination variation, and background clutter.
Recently, a large number of sparse representation based trackers have emerged with demonstrated success [10]. The basic idea of sparse representation in tracking is to represent each target candidate as a linear combination of dictionary atoms under a sparsity constraint. Depending on the representation scheme, sparse representation based appearance models can be classified into global and local patterns.
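The core computation just described — coding a candidate as a sparse combination of dictionary atoms — can be sketched as an ℓ1-regularized least-squares problem. The following is a minimal NumPy illustration using a basic ISTA solver on a random placeholder dictionary; the names (`sparse_code`, `lam`) and the solver choice are ours, not the paper's (whose implementation is in MATLAB).

```python
import numpy as np

def sparse_code(D, y, lam=0.05, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)               # gradient of the smooth term
        z = x - grad / L                       # gradient descent step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
x_true = np.zeros(20); x_true[[3, 7]] = [1.0, -0.5]
y = D @ x_true                                 # candidate = sparse combination of atoms
x_hat = sparse_code(D, y)
print(np.flatnonzero(np.abs(x_hat) > 0.1))     # indices of the recovered active atoms
```

A candidate that is well explained by few dictionary atoms yields a low reconstruction error, which is what the likelihood functions of sparse trackers build on.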
Global sparse trackers [11], [12], [13], [14], [15], [16] adopt the holistic template of a target as the appearance representation. Although good performance is reported, these methods are prone to failure when local appearance changes occur: because candidates are treated as single entities, partial variations cause imprecise similarity measurements between candidates and the object. Compared with global sparse appearance models, local sparse appearance models [17], [18], [19], [20], [21], [22] are more attractive due to their effectiveness in handling partial occlusion. However, local sparse appearance models still suffer from several drawbacks. (1) The differing impacts of individual local patches on the likelihood computation are largely ignored. Since local patches undergo different appearance variations during tracking, it is necessary to distinguish among them rather than treating them equivalently. (2) Local sparse appearance models are less effective in handling background clutter, mainly because they rarely exploit discriminative information. (3) Most local sparse trackers update the dictionary holistically: once the tracking result is updated, all local patches within it are updated as well. If the tracking result contains occluded patches, updating them accumulates errors and contaminates the dictionary; refusing to update, on the other hand, may miss genuine appearance changes and degrade the representational ability of the dictionary.
To address these issues, we propose a context-aware local sparse appearance model for robust visual tracking. First, considering that different local patches should have varying degrees of impact on the likelihood computation of target candidates, we develop a novel impact allocation strategy that adaptively assigns a positive impact factor to each local patch using spatial-temporal context. Specifically, a local patch that is more distinguishable from its spatial context receives a higher impact factor in the final likelihood function. To this end, we separately construct a local object dictionary and a local context dictionary to represent each patch inside the tracking result, which introduces discriminative information to alleviate the drift problem. Furthermore, we exploit temporal context for more robust tracking: historical information from the previous frame helps locate the target more accurately. To the best of our knowledge, few tracking methods integrate spatial-temporal context information into a local sparse appearance model to account for appearance differences among local patches. Extensive experiments demonstrate the effectiveness of this integration.
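One plausible way to realize the idea above — a patch that the object dictionary reconstructs well but the context dictionary does not is more distinguishable from its surroundings — is sketched below. This is an illustrative weighting of our own construction, not the paper's exact formula; plain least squares stands in for the sparse code, and the names (`impact_factors`, `sigma`) are hypothetical.

```python
import numpy as np

def recon_error(D, y):
    """Reconstruction error of patch y under dictionary D (least squares
    stands in for the sparse code to keep the sketch short)."""
    x, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.linalg.norm(y - D @ x))

def impact_factors(patches, D_obj, D_ctx, sigma=1.0):
    """Assign a positive impact factor to each local patch: larger when the
    object dictionary explains the patch better than the context dictionary."""
    w = np.array([np.exp((recon_error(D_ctx, y) - recon_error(D_obj, y)) / sigma)
                  for y in patches])
    return w / w.sum()                          # normalized positive impacts

rng = np.random.default_rng(1)
D_obj = rng.standard_normal((64, 10))           # atoms sampled from the target
D_ctx = rng.standard_normal((64, 10))           # atoms sampled from the surroundings
target_patch = D_obj @ rng.standard_normal(10)  # lies in the object span
clutter_patch = D_ctx @ rng.standard_normal(10) # looks like background
w = impact_factors([target_patch, clutter_patch], D_obj, D_ctx)
print(w)   # the target-like patch receives the larger impact factor
```

The exponential keeps every factor strictly positive, matching the requirement that all patches retain some (possibly small) influence on the likelihood.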
Second, to ensure that all effective appearance changes are captured from the tracking result, we propose a new patch-based dictionary update method. Each local patch is updated independently, with its validity checked using the sparsity concentration index [23] and spatial context. On the one hand, the sparsity concentration index verifies whether the patch to be updated genuinely belongs to the target object; on the other hand, spatial context provides discriminative information to eliminate the influence of the background. In this way, even when most local patches inside the target object undergo heavy occlusion, the tracker can still capture effective appearance changes from the unoccluded patches.
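The sparsity concentration index (SCI) used for this validity check measures how much of a coefficient vector's energy concentrates in a single dictionary block. The sketch below follows the standard SCI definition from sparse representation based classification; the grouping into object and context blocks, and any acceptance threshold, are illustrative rather than the paper's exact procedure.

```python
import numpy as np

def sci(coeffs, groups):
    """Sparsity concentration index of a coefficient vector.

    groups: list of index arrays, one per dictionary block (e.g. the local
    object dictionary vs. the local context dictionary). SCI is 1 when all
    energy lies in one block and 0 when it spreads evenly across blocks.
    """
    k = len(groups)
    total = np.abs(coeffs).sum()
    if total == 0:
        return 0.0
    frac = max(np.abs(coeffs[g]).sum() for g in groups) / total
    return (k * frac - 1) / (k - 1)

obj_idx, ctx_idx = np.arange(0, 10), np.arange(10, 20)
x = np.zeros(20); x[[2, 5]] = [0.8, 0.4]       # energy only in the object block
print(sci(x, [obj_idx, ctx_idx]))               # -> 1.0
x[15] = 0.6                                     # energy leaks into the context block
print(sci(x, [obj_idx, ctx_idx]))               # drops to 1/3
```

A patch would be accepted for update only when its SCI is high and the dominant block is the object dictionary, so occluded or background-like patches are kept out of the dictionary.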
The main contributions are summarized as follows.
- 1. An effective context-aware local sparse appearance model is proposed for robust tracking. The reliability of each local patch is measured throughout the tracking process using spatial-temporal context.
- 2. A novel impact allocation strategy (IAS) is developed to account for appearance differences among local patches. Different local patches are adaptively assigned varying positive impact factors in the likelihood computation of target candidates.
- 3. A patch-based dictionary update scheme is presented to ensure that all effective appearance changes are updated into the dictionary, even if the tracking result is mostly occluded.
Related works
There is an extensive literature on tracking methods; we refer readers to [24], [25], [26] for a thorough overview. Here, we discuss only the works most closely related to ours and the techniques used in this study.
Proposed tracking algorithm
In this section, we first describe the construction of the local object dictionary and the local context dictionary and present the structural local sparse appearance model. Then, we present the novel IAS using spatial-temporal context and analyze it in detail. Finally, the patch-based dictionary update scheme is introduced.
Experimental evaluation
In this section, we describe the experimental methodology and evaluate the performance of the proposed tracking method. All experiments are performed on a PC with an Intel Core i5-5200U CPU (2.2 GHz) and 8 GB of memory. The source code is implemented in MATLAB. All evaluations follow the one-pass evaluation (OPE) criterion.
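Under OPE, the tracker is initialized with the ground-truth box in the first frame and run once through each sequence; a common success measure is the fraction of frames whose bounding-box overlap (intersection over union) with the ground truth exceeds a threshold. A minimal sketch of that metric, with made-up boxes for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap with ground truth exceeds threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)

gt   = [(10, 10, 40, 40), (12, 12, 40, 40), (14, 14, 40, 40)]
pred = [(10, 10, 40, 40), (20, 20, 40, 40), (50, 50, 40, 40)]
print(success_rate(pred, gt))   # only the first frame exceeds 0.5 overlap
```

Sweeping the threshold from 0 to 1 and plotting the success rate yields the standard success plot, whose area under the curve is typically reported.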
Conclusion
In this paper, we propose an effective tracking method based on a context-aware local sparse appearance model. Considering the differences in appearance variation among local patches, we assign varying positive impact factors to them using spatial context, which adds discriminative information to alleviate tracking drift. Moreover, historical information from temporal context is introduced to support more robust tracking. To capture all effective appearance changes, a patch-based dictionary update scheme is presented in which each local patch is updated independently.
Acknowledgements
This work was supported by the National Key R&D Program of China (Grants 2017YFB0202901, 2017YFB0202905). This work was partially supported by the National Natural Science Foundation of China (Nos. 61672215, 91320103), the Special Project on the Integration of Industry, Education and Research of Guangdong Province, China (No. 2012A090300003), and the Science and Technology Planning Project of Guangdong Province, China (No. 2013B090700003). The corresponding author of this paper is Manman Peng.
References (56)
- et al., Robust object tracking based on adaptive templates matching via the fusion of multiple features, J. Visual Commun. Image Represent. (2017)
- et al., Improved kernelized correlation filter tracking by using spatial regularization, J. Visual Commun. Image Represent. (2018)
- et al., Learning adaptively windowed correlation filters for robust tracking, J. Visual Commun. Image Represent. (2018)
- et al., Sparse coding based visual tracking: review and experimental comparison, Pattern Recogn. (2013)
- et al., Structure-aware local sparse coding for visual tracking, IEEE Trans. Image Process. PP (2018)
- et al., Visual tracking with structured patch-based model, Image Vision Comput. (2017)
- et al., Dual-scale structural local sparse appearance model for robust object tracking, Neurocomputing (2017)
- et al., Multi-task structure-aware context modeling for robust keypoint-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al., Online learning and joint optimization of combined spatial-temporal models for robust visual tracking, Neurocomputing (2017)
- et al., Sparse representation combined with context information for visual tracking, Neurocomputing (2017)
- Robust structural sparse tracking, IEEE Trans. Pattern Anal. Mach. Intell. PP
- Structure-aware local sparse coding for visual tracking, IEEE Trans. Image Process. PP
- Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust object tracking via sparse collaborative appearance model, IEEE Trans. Image Process.
- Robust visual tracking via multiple kernel boosting with affinity constraints, IEEE Trans. Circuits Syst. Video Technol.
- Multi-timescale collaborative tracking, IEEE Trans. Pattern Anal. Mach. Intell.
- High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust visual tracking via convolutional networks without training, IEEE Trans. Image Process.
- Robust visual tracking via exclusive context modeling, IEEE Trans. Cybernetics
- Online object tracking with sparse prototypes, IEEE Trans. Image Process.
- Inverse sparse group lasso model for robust object tracking, IEEE Trans. Multimedia
- Robust visual tracking based on product sparse coding, Pattern Recognit. Lett.
- Discriminative tracking using tensor pooling, IEEE Trans. Cybernetics
- Visual tracking using strong classifier and structural local sparse descriptors, IEEE Trans. Multimedia
☆ This paper has been recommended for acceptance by Zicheng Liu.