Visual tracking via context-aware local sparse appearance model☆
Introduction
As one of the most fundamental and important topics in pattern recognition and computer vision, visual tracking has received widespread attention [1], [2], [3], [4], [5], [6], [7], [8], [9]. Given only the initial state of an arbitrary object (e.g., a person or a car), visual tracking aims to estimate the size and location of this specific object continuously throughout an image sequence. Despite many breakthroughs in recent years, visual tracking remains a challenging task due to factors such as occlusion, illumination variation, and background clutter.
Recently, a large number of sparse representation based trackers have emerged with demonstrated success [10]. The basic idea of sparse representation in tracking is to represent each target candidate as a linear combination of dictionary atoms under a sparsity constraint. Depending on the representation scheme, sparse representation based appearance models can be classified into global and local patterns.
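The core computation just described — coding a candidate as a sparse combination of dictionary atoms — can be sketched as an ℓ1-regularized least-squares problem. The following is a minimal NumPy illustration using a basic ISTA solver on a random placeholder dictionary; the names (`sparse_code`, `lam`) and the solver choice are ours, not the paper's (whose implementation is in MATLAB).

```python
import numpy as np

def sparse_code(D, y, lam=0.05, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)               # gradient of the smooth term
        z = x - grad / L                       # gradient descent step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
x_true = np.zeros(20); x_true[[3, 7]] = [1.0, -0.5]
y = D @ x_true                                 # candidate = sparse combination of atoms
x_hat = sparse_code(D, y)
print(np.flatnonzero(np.abs(x_hat) > 0.1))     # indices of the recovered active atoms
```

A candidate that is well explained by few dictionary atoms yields a low reconstruction error, which is what the likelihood functions of sparse trackers build on.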
Global sparse trackers [11], [12], [13], [14], [15], [16] adopt the holistic template of a target as the appearance representation. Although good performance is reported, these methods are prone to failure when local appearance changes occur: because candidates are treated as single entities, partial variations cause imprecise similarity measurements between candidates and the object. Compared with global sparse appearance models, local sparse appearance models [17], [18], [19], [20], [21], [22] are more attractive due to their effectiveness in handling partial occlusion. However, local sparse appearance models still suffer from several drawbacks. (1) The differing impacts of individual local patches on the likelihood computation are largely ignored. Since local patches undergo different appearance variations during tracking, it is necessary to distinguish among them rather than treating them equivalently. (2) Local sparse appearance models are less effective in handling background clutter, mainly because they rarely exploit discriminative information. (3) Most local sparse trackers update the dictionary holistically: once the tracking result is updated, all local patches within it are updated as well. If the tracking result contains occluded patches, updating them accumulates errors and contaminates the dictionary; refusing to update, on the other hand, may miss genuine appearance changes and degrade the representational ability of the dictionary.
To address these issues, we propose a context-aware local sparse appearance model for robust visual tracking. First, considering that different local patches should have varying degrees of impact on the likelihood computation of target candidates, we develop a novel impact allocation strategy that adaptively assigns a positive impact factor to each local patch using spatial-temporal context. Specifically, a local patch that is more distinguishable from its spatial context receives a higher impact factor in the final likelihood function. To this end, we separately construct a local object dictionary and a local context dictionary to represent each patch inside the tracking result, which introduces discriminative information to alleviate the drift problem. Furthermore, we exploit temporal context for more robust tracking: historical information from the previous frame helps locate the target more accurately. To the best of our knowledge, few tracking methods integrate spatial-temporal context information into a local sparse appearance model to account for appearance differences among local patches. Extensive experiments demonstrate the effectiveness of this integration.
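One plausible way to realize the idea above — a patch that the object dictionary reconstructs well but the context dictionary does not is more distinguishable from its surroundings — is sketched below. This is an illustrative weighting of our own construction, not the paper's exact formula; plain least squares stands in for the sparse code, and the names (`impact_factors`, `sigma`) are hypothetical.

```python
import numpy as np

def recon_error(D, y):
    """Reconstruction error of patch y under dictionary D (least squares
    stands in for the sparse code to keep the sketch short)."""
    x, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.linalg.norm(y - D @ x))

def impact_factors(patches, D_obj, D_ctx, sigma=1.0):
    """Assign a positive impact factor to each local patch: larger when the
    object dictionary explains the patch better than the context dictionary."""
    w = np.array([np.exp((recon_error(D_ctx, y) - recon_error(D_obj, y)) / sigma)
                  for y in patches])
    return w / w.sum()                          # normalized positive impacts

rng = np.random.default_rng(1)
D_obj = rng.standard_normal((64, 10))           # atoms sampled from the target
D_ctx = rng.standard_normal((64, 10))           # atoms sampled from the surroundings
target_patch = D_obj @ rng.standard_normal(10)  # lies in the object span
clutter_patch = D_ctx @ rng.standard_normal(10) # looks like background
w = impact_factors([target_patch, clutter_patch], D_obj, D_ctx)
print(w)   # the target-like patch receives the larger impact factor
```

The exponential keeps every factor strictly positive, matching the requirement that all patches retain some (possibly small) influence on the likelihood.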
Second, to ensure that all effective appearance changes are captured from the tracking result, we propose a new patch-based dictionary update method. Each local patch is updated independently, with its validity checked using the sparsity concentration index [23] and spatial context. On the one hand, the sparsity concentration index verifies whether the patch to be updated genuinely belongs to the target object; on the other hand, spatial context provides discriminative information to eliminate the influence of the background. In this way, even when most local patches inside the target object undergo heavy occlusion, the tracker can still capture effective appearance changes from the unoccluded patches.
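The sparsity concentration index (SCI) used for this validity check measures how much of a coefficient vector's energy concentrates in a single dictionary block. The sketch below follows the standard SCI definition from sparse representation based classification; the grouping into object and context blocks, and any acceptance threshold, are illustrative rather than the paper's exact procedure.

```python
import numpy as np

def sci(coeffs, groups):
    """Sparsity concentration index of a coefficient vector.

    groups: list of index arrays, one per dictionary block (e.g. the local
    object dictionary vs. the local context dictionary). SCI is 1 when all
    energy lies in one block and 0 when it spreads evenly across blocks.
    """
    k = len(groups)
    total = np.abs(coeffs).sum()
    if total == 0:
        return 0.0
    frac = max(np.abs(coeffs[g]).sum() for g in groups) / total
    return (k * frac - 1) / (k - 1)

obj_idx, ctx_idx = np.arange(0, 10), np.arange(10, 20)
x = np.zeros(20); x[[2, 5]] = [0.8, 0.4]       # energy only in the object block
print(sci(x, [obj_idx, ctx_idx]))               # -> 1.0
x[15] = 0.6                                     # energy leaks into the context block
print(sci(x, [obj_idx, ctx_idx]))               # drops to 1/3
```

A patch would be accepted for update only when its SCI is high and the dominant block is the object dictionary, so occluded or background-like patches are kept out of the dictionary.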
The main contributions are summarized as follows.
- 1. An effective context-aware local sparse appearance model is proposed for robust tracking. The reliability of each local patch is measured throughout the tracking process using spatial-temporal context.
- 2. A novel impact allocation strategy (IAS) is developed to account for appearance differences among local patches. Different local patches are adaptively assigned varying positive impact factors in the likelihood computation of target candidates.
- 3. A patch-based dictionary update scheme is presented to ensure that all effective appearance changes are updated into the dictionary, even if the tracking result is mostly occluded.
Related works
There is an extensive literature on tracking methods; we refer readers to [24], [25], [26] for a thorough overview. Here, we discuss only the works most closely related to ours and the techniques used in this study.
Proposed tracking algorithm
In this section, we first describe the construction of the local object dictionary and the local context dictionary and present the structural local sparse appearance model. Then, we present the novel IAS using spatial-temporal context and analyze it in detail. Finally, the patch-based dictionary update scheme is introduced.
Experimental evaluation
In this section, we describe the experimental methodology and evaluate the performance of the proposed tracking method. All experiments are performed on a PC with an Intel Core i5-5200U CPU (2.2 GHz) and 8 GB of memory. The source code is implemented in MATLAB. All evaluations follow the one-pass evaluation (OPE) criterion.
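Under OPE, the tracker is initialized with the ground-truth box in the first frame and run once through each sequence; a common success measure is the fraction of frames whose bounding-box overlap (intersection over union) with the ground truth exceeds a threshold. A minimal sketch of that metric, with made-up boxes for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap with ground truth exceeds threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)

gt   = [(10, 10, 40, 40), (12, 12, 40, 40), (14, 14, 40, 40)]
pred = [(10, 10, 40, 40), (20, 20, 40, 40), (50, 50, 40, 40)]
print(success_rate(pred, gt))   # only the first frame exceeds 0.5 overlap
```

Sweeping the threshold from 0 to 1 and plotting the success rate yields the standard success plot, whose area under the curve is typically reported.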
Conclusion
In this paper, we propose an effective tracking method based on a context-aware local sparse appearance model. Considering the differences in appearance variation among local patches, we assign varying positive impact factors to them using spatial context, which adds discriminative information to alleviate tracking drift. Moreover, historical information from temporal context is introduced to support more robust tracking. To capture all effective appearance changes, a patch-based dictionary update scheme is presented in which each local patch is updated independently.
Acknowledgements
This work was supported by the National Key R&D Program of China (Grants 2017YFB0202901, 2017YFB0202905). This work was partially supported by the National Natural Science Foundation of China (Nos. 61672215, 91320103), the Special Project on the Integration of Industry, Education and Research of Guangdong Province, China (No. 2012A090300003), and the Science and Technology Planning Project of Guangdong Province, China (No. 2013B090700003). The corresponding author of this paper is Manman Peng.
References (56)
- et al., Robust object tracking based on adaptive templates matching via the fusion of multiple features, J. Visual Commun. Image Represent. (2017)
- et al., Improved kernelized correlation filter tracking by using spatial regularization, J. Visual Commun. Image Represent. (2018)
- et al., Learning adaptively windowed correlation filters for robust tracking, J. Visual Commun. Image Represent. (2018)
- et al., Sparse coding based visual tracking: review and experimental comparison, Pattern Recogn. (2013)
- et al., Structure-aware local sparse coding for visual tracking, IEEE Trans. Image Process. PP (2018)
- et al., Visual tracking with structured patch-based model, Image Vision Comput. (2017)
- et al., Dual-scale structural local sparse appearance model for robust object tracking, Neurocomputing (2017)
- et al., Multi-task structure-aware context modeling for robust keypoint-based object tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al., Online learning and joint optimization of combined spatial-temporal models for robust visual tracking, Neurocomputing (2017)
- et al., Sparse representation combined with context information for visual tracking, Neurocomputing (2017)
- Robust structural sparse tracking, IEEE Trans. Pattern Anal. Mach. Intell. PP
- Structure-aware local sparse coding for visual tracking, IEEE Trans. Image Process. PP
- Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust object tracking via sparse collaborative appearance model, IEEE Trans. Image Process.
- Robust visual tracking via multiple kernel boosting with affinity constraints, IEEE Trans. Circuits Syst. Video Technol.
- Multi-timescale collaborative tracking, IEEE Trans. Pattern Anal. Mach. Intell.
- High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust visual tracking via convolutional networks without training, IEEE Trans. Image Process.
- Robust visual tracking via exclusive context modeling, IEEE Trans. Cybernetics
- Online object tracking with sparse prototypes, IEEE Trans. Image Process.
- Inverse sparse group lasso model for robust object tracking, IEEE Trans. Multimedia
- Robust visual tracking based on product sparse coding, Pattern Recognit. Lett.
- Discriminative tracking using tensor pooling, IEEE Trans. Cybernetics
- Visual tracking using strong classifier and structural local sparse descriptors, IEEE Trans. Multimedia
☆ This paper has been recommended for acceptance by Zicheng Liu.