Neurocomputing

Volume 160, 21 July 2015, Pages 191-205

Global Coupled Learning and Local Consistencies Ensuring for sparse-based tracking

https://doi.org/10.1016/j.neucom.2014.12.060

Highlights

  • We sparsely represent the object at both global and local levels for tracking, aiming to explore the object's holistic and local information respectively.

  • The global dictionary and classifier are learned jointly in the global part.

  • We define temporal and spatial consistencies among the object patches, and refine the tracking result by enforcing these consistencies.

Abstract

This paper presents a robust tracking algorithm that sparsely represents the object at both global and local levels. Accordingly, the algorithm consists of two complementary parts: a Global Coupled Learning (GCL) part and a Local Consistencies Ensuring (LCE) part. The global part is a discriminative model that exploits the holistic features of the object via an over-complete global dictionary and a classifier; the dictionary and classifier are learned jointly to make the GCL part adaptive. In the LCE part, we explore the object's local features by sparsely coding the object patches over a local dictionary; both temporal and spatial consistencies of the local patches are then enforced to refine the tracking results. Moreover, the GCL and LCE parts are integrated into a Bayesian framework to construct the final tracker. Experiments on fifteen challenging benchmark sequences demonstrate that the proposed algorithm is more effective and robust than ten state-of-the-art trackers.

Introduction

Visual tracking is an indispensable topic in computer vision, owing to its numerous applications in vehicle navigation, surveillance, and human–computer interaction [1], [2]. Although many researchers have worked to construct more effective trackers over the past years [3], [4], [5], [6], [40], [44], tracking remains a challenging problem: only the ground truth in the first frame is available, and the target may undergo pose variation, occlusion, illumination changes, background clutter, etc. All these challenges can cause tracking errors and eventually lead to drift.

To design a robust tracker that can handle the aforementioned challenges, various representation schemes have been introduced into the tracking task, such as pixel-based trackers [7], feature-based trackers (e.g., texture [8], color [9], sparse-based trackers [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23]), description-model-based trackers (e.g., histogram [15], subspace representation [11], [40]), and the multilevel quantization tracker [16]. Among these schemes, sparse representation is widely considered an effective tool for dealing with the aforementioned challenges.
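To make the sparse representation idea concrete: a sample is encoded as a sparse coefficient vector over an over-complete dictionary by solving an ℓ1-regularized least-squares (lasso) problem. The minimal sketch below solves it with ISTA (iterative soft-thresholding); the dictionary size, regularization weight, and iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding operator induced by the l1 penalty
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.05, n_iter=200):
    """Solve min_c 0.5*||x - D c||^2 + lam*||c||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)           # gradient of the quadratic term
        c = soft_threshold(c - grad / L, lam / L)
    return c

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
c_true = np.zeros(50)
c_true[[3, 17]] = [1.0, -0.5]              # a 2-sparse ground-truth code
x = D @ c_true
c = sparse_code(D, x)
print(np.count_nonzero(c))                 # only a few atoms stay active
```

Because the soft-thresholding step sets small coefficients exactly to zero, the recovered code concentrates on a few dictionary atoms, which is what makes the representation robust to partial corruption.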

Among sparse-based trackers, Mei and Ling [10] sparsely represent each object in a space spanned by trivial templates to tackle occlusion and corruption. Jia et al. [17] propose a tracking method based on a structural local sparse appearance model that exploits both local and spatial information of the target. Moreover, Bai et al. [13] model the object as a sparse linear combination of a structured union of subspaces in a basis library. In addition, Hong et al. [18] integrate a multi-task, multi-view sparse learning problem into a particle filter framework, aiming to explore the underlying relationships between different particles and various types of visual features. However, all these works use only foreground samples to construct generative models, ignoring the information from the background.

To incorporate background information, Liu et al. [19] and Wang et al. [20] construct discriminative models based on sparse representation; however, they encode only the local patches of the object and background, losing the holistic information provided by the object. Additionally, Xie et al. [14] utilize the sparse representation of target and background by combining a generative model and a discriminative model, but they encode the object only at the holistic level without taking local information into consideration. Meanwhile, some other algorithms also try to integrate generative and discriminative models for tracking [24], [25], [26], [27]; however, they are not sparse-based and do not exploit the combination of holistic and local features.

To address the above problems, in this paper we integrate the advantages of both discriminative and generative models to exploit the holistic and local information of the object. The proposed algorithm is thus constructed from a global part and a local part. In the global part, we encode holistic information of both object and background via a global dictionary, and the sparse codes are used to train a classifier that roughly distinguishes the target from the background. As to the update scheme, we learn the global dictionary and classifier jointly, instead of updating them as two separate parts as in traditional algorithms [20], [21]. In the local part, we first partition the candidates into patches, then use a local dictionary to encode the patches into sparse codes. Finally, between two consecutive frames, both temporal and spatial consistencies of the patches are enforced to refine the tracking results. The global and local parts are complementary, capturing the holistic and local information of the object respectively, and we integrate them into a Bayesian inference framework to construct the final tracker.
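The joint (coupled) learning of the global dictionary and classifier can be written as a single optimization problem. The exact objective used in the paper is not shown in this excerpt, so the formulation below is an illustrative assumption in the spirit of discriminative dictionary learning (e.g., D-KSVD):

```latex
\min_{D,\,W,\,C}\; \|X - DC\|_F^2 \;+\; \gamma\,\|H - WC\|_F^2 \;+\; \lambda\,\|C\|_1 \;+\; \beta\,\|W\|_F^2
```

Here $X$ stacks the holistic features of the foreground and background samples, $D$ is the over-complete global dictionary, $C$ holds the sparse codes, $H$ is the label matrix, and $W$ is the linear classifier. Minimizing over $D$, $W$, and $C$ simultaneously couples reconstruction and classification, so the dictionary is shaped by the discriminative task rather than updated separately from the classifier.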

The contributions of this paper are as follows:

  • (1)

    We sparsely represent the object at both global and local levels via two complementary parts, which offer novel ways to utilize the object's holistic and local information for tracking.

  • (2)

    To construct an adaptive GCL part, we employ an online algorithm to jointly learn the global dictionary and classifier.

  • (3)

    In the LCE part, we propose a new method to calculate the candidates' local confidences based on the temporal and spatial consistencies among the object patches.
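The temporal-consistency idea behind the local confidence can be sketched as follows: patches whose sparse codes change little between consecutive frames should raise a candidate's confidence. The cosine-similarity measure and simple averaging below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    # Cosine similarity between two sparse code vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def local_confidence(codes_prev, codes_cur):
    """Temporal consistency (illustrative): average similarity between the
    sparse codes of corresponding patches in consecutive frames."""
    sims = [cosine_sim(p, c) for p, c in zip(codes_prev, codes_cur)]
    return sum(sims) / len(sims)

rng = np.random.default_rng(1)
prev = [rng.standard_normal(16) for _ in range(9)]            # 9 patches, 16-d codes
stable = [p + 0.05 * rng.standard_normal(16) for p in prev]   # mild appearance change
drifted = [rng.standard_normal(16) for _ in range(9)]         # unrelated content

print(local_confidence(prev, stable) > local_confidence(prev, drifted))  # True
```

A candidate whose patches drift away from the previous frame's codes thus receives a low confidence and is down-weighted when the tracking result is refined.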

Similar to our work, Zhong et al. [31] propose a sparse-based collaborative model that exploits both holistic and local information of the object. However, our method differs from theirs in both the sparse representation and the dictionary updating algorithm. Moreover, we use two-stage filtering to combine the global and local parts instead of simply multiplying the confidence values of the holistic template and local patches as in [31]; more detailed differences between the two trackers are discussed in Section 2.

The paper is organized as follows. Section 2 briefly discusses related work, and Section 3 presents the details of the proposed tracker. Section 4 presents quantitative and qualitative comparisons between the proposed algorithm and several state-of-the-art trackers. Finally, Section 5 gives conclusions and future work.

Section snippets

Related work

Our work is closely related to two topics: sparse representation and dictionary learning. Many algorithms have been designed around these two topics, and good reviews can be found in [2], [3], [4]. Next, we present the work most relevant to and motivating our paper.

As introduced in Section 1, there are plentiful literatures which bring sparse representation into appearance modeling. Wang et al. [20] encode the local patches inside the object region and concatenate the

Proposed algorithm

In this paper, we sparsely represent the image samples at both global and local levels, then construct two complementary parts based on these two levels; finally, we integrate the two parts into a Bayesian inference framework for tracking. More details are elucidated in the rest of this section.
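The two-stage filtering that combines the global and local parts (as opposed to multiplying their confidences) can be sketched as follows. The screening size, the toy 1-D candidate states, and the confidence functions are hypothetical stand-ins for the paper's GCL classifier score and LCE local confidence.

```python
import numpy as np

def two_stage_select(candidates, global_conf, local_conf, k=5):
    """Two-stage filtering (illustrative): the global part first keeps the
    top-k candidates, then the local part picks the best among them."""
    g = np.array([global_conf(c) for c in candidates])
    top = np.argsort(g)[-k:]                        # stage 1: global screening
    l = np.array([local_conf(candidates[i]) for i in top])
    return int(top[np.argmax(l)])                   # stage 2: local refinement

# Toy example: candidates are 1-D "states"; the true target sits at 0.0
rng = np.random.default_rng(2)
candidates = rng.uniform(-1, 1, 50)
global_conf = lambda c: -abs(c) + 0.1 * rng.standard_normal()  # coarse, noisy score
local_conf = lambda c: -abs(c)                                  # finer local score
best = two_stage_select(candidates, global_conf, local_conf)
print(candidates[best])
```

The design rationale is that the cheap, noisy global score prunes most candidates, while the more reliable local score only has to discriminate among the few survivors.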

Experiments

To evaluate the performance of the proposed tracker, fifteen benchmark sequences are collected for comparison, and their challenges are listed in Table 1. In addition, ten state-of-the-art trackers are compared with the GLT algorithm on the benchmark video sequences: OAB [41], FragTrack [43], IVT [40], L1T [10], NNT [45], MIL [44], ODLSR [20], MTT [22], CT [42], and the Collaborative Tracker (CollaT) [31]. We use the publicly available source codes and keep the original implementations

Conclusions

In this paper, we proposed a novel algorithm that mines the global and local information of the object via two-level sparse representation. In the GCL part, we extracted the samples' holistic features based on a Gaussian pyramid and an over-complete global dictionary, then a classifier was constructed to roughly estimate the locations of the object. In the LCE part, the candidates were partitioned into patches, then we encoded the patches via a local dictionary. Both temporal and spatial

Acknowledgment

The authors would like to thank the anonymous editor and reviewers, whose valuable suggestions have helped to improve the quality of the manuscript. This research has been supported by the National Natural Science Foundation of China (Grant nos. 61402480, U1135005) and the National High Technology Research and Development Program of China (Grant no. 2013AA01A607).

Yehui Yang is currently a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences (CAS). He received the B.S. degree in Automation from the College of Electrical and Information Engineering at Hunan University, Changsha, PRC, in 2011. His research interests include computer vision, machine learning and pattern recognition.

References (45)

  • P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: ECCV, 2002, pp....
  • X. Mei, H. Ling, Robust visual tracking using l1 minimization, in: ICCV, 2009, pp....
  • D. Wang et al.

    Online object tracking with sparse prototypes

    IEEE Trans. Image Process.

    (2013)
  • Z. Hong, X. Mei, D. Tao, Dual-force metric learning for robust distracter-resistant tracker, in: ICCV,...
  • D. Comaniciu et al.

    Kernel-based object tracking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2003)
  • Z. Hong, C. Wang, X. Mei, D. Prokhorov, D. Tao, Tracking using Multilevel Quantizations, in: ECCV,...
  • X. Jia, H. Lu, M.H. Yang, Visual tracking via adaptive structural local sparse appearance model, in: CVPR, 2012, pp....
  • Z. Hong, X. Mei, D. Prokhorov, D. Tao, Tracking via robust multi-task multi-view joint sparse representation, in: ICCV,...
  • B. Liu, L. Yang, J. Huang, P. Meer, L. Gong, C. Kulikowski, Robust and fast collaborative tracking with two stage...
  • Q. Wang, F. Chen, W. Xu, M.H. Yang, Online discriminative object tracking with local sparse representation, in: WACV,...
  • Y. Xie et al.

    Discriminative object tracking via sparse representation and online dictionary learning

    IEEE Trans. Cybern.

    (2014)
  • T. Zhang, B. Ghanem, S. Liu, N. Ahuja, Robust visual tracking via structured multi-task sparse learning, in:...


    Yuan Xie received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2013. He received his master degree in School of Information Science and Technology from Xiamen University, China, in 2010. He is a member of IEEE. His research interests include image processing, computer vision, machine learning and pattern recognition.

    Wensheng Zhang received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2000. He joined the Institute of Software, CAS, in 2001. He is a Professor of Machine Learning and Data Mining and the Director of Research and Development Department, Institute of Automation, CAS. He has published over 32 papers in the area of Modeling Complex Systems, Statistical Machine Learning and Data Mining. His research interests include computer vision, pattern recognition, artificial intelligence and computer human interaction.

    Wenrui Hu received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2014. He received his master degree in School of Optoelectronic Information from Beijing Institute of Technology, China, in 2010. His research interests include image processing, computer vision, machine learning and pattern recognition.

    Yuanhua Tan engaged in the information system construction of oil industry in long term. He is now a Senior Researcher in Karamay Hongyou Software Company, Xinjiang, PRC. His research interests include database, smart city and intelligent system.
