Neurocomputing

Volume 160, 21 July 2015, Pages 191-205

Global Coupled Learning and Local Consistencies Ensuring for sparse-based tracking

https://doi.org/10.1016/j.neucom.2014.12.060

Highlights

  • We sparsely represent the object at both global and local levels for tracking, aiming to explore the object's holistic and local information respectively.

  • The global dictionary and classifier are learned jointly in the global part.

  • We define temporal and spatial consistencies among the object patches, and refine the tracking result by enforcing these consistencies.

Abstract

This paper presents a robust tracking algorithm that sparsely represents the object at both global and local levels. Accordingly, the algorithm consists of two complementary parts: a Global Coupled Learning (GCL) part and a Local Consistencies Ensuring (LCE) part. The global part is a discriminative model that exploits the holistic features of the object via an over-complete global dictionary and a classifier; the dictionary and classifier are learned jointly to make the GCL part adaptive. In the LCE part, we explore the object's local features by sparsely coding the object patches over a local dictionary; both temporal and spatial consistencies of the local patches are then enforced to refine the tracking results. Moreover, the GCL and LCE parts are integrated into a Bayesian framework to construct the final tracker. Experiments on fifteen challenging benchmark sequences demonstrate that the proposed algorithm is more effective and robust than ten state-of-the-art trackers.

Introduction

Visual tracking is an indispensable topic in computer vision, owing to its numerous applications in vehicle navigation, surveillance, and human–computer interaction [1], [2]. Although many researchers have worked to construct more effective trackers over the past years [3], [4], [5], [6], [40], [44], tracking remains a challenging problem: only the ground truth in the first frame is available, and the target may undergo pose variation, occlusion, illumination changes, background clutter, etc. All these challenges can cause tracking errors and eventually lead to drift.

To design a robust tracker that can handle the aforementioned challenges, various representation schemes have been introduced into the tracking task, such as pixel-based trackers [7], feature-based trackers (e.g., texture [8], color [9], sparse-based trackers [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23]), description-model-based trackers (e.g., histogram [15], subspace representation [11], [40]), and the multilevel quantization tracker [16]. Among these schemes, sparse representation is widely considered an effective tool for dealing with the aforementioned challenges.
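To make the sparse representation idea concrete: a sample is encoded as a sparse coefficient vector over an over-complete dictionary by solving an ℓ1-regularized least-squares (lasso) problem. The minimal sketch below solves it with ISTA (iterative soft-thresholding); the dictionary size, regularization weight, and iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding operator induced by the l1 penalty
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.05, n_iter=200):
    """Solve min_c 0.5*||x - D c||^2 + lam*||c||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)           # gradient of the quadratic term
        c = soft_threshold(c - grad / L, lam / L)
    return c

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
c_true = np.zeros(50)
c_true[[3, 17]] = [1.0, -0.5]              # a 2-sparse ground-truth code
x = D @ c_true
c = sparse_code(D, x)
print(np.count_nonzero(c))                 # only a few atoms stay active
```

Because the soft-thresholding step sets small coefficients exactly to zero, the recovered code concentrates on a few dictionary atoms, which is what makes the representation robust to partial corruption.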

Among sparse-based trackers, Mei and Ling [10] sparsely represent each object in a space spanned by trivial templates to tackle occlusion and corruption. Jia et al. [17] propose a tracking method based on a structural local sparse appearance model that exploits both local and spatial information of the target. Moreover, Bai et al. [13] model the object as a sparse linear combination of a structured union of subspaces in a basis library. In addition, Hong et al. [18] integrate a multi-task, multi-view sparse learning problem into a particle filter framework, aiming to explore the underlying relationships between different particles and various types of visual features. However, all these works use only foreground samples to construct generative models, ignoring the information from the background.

To incorporate background information, Liu et al. [19] and Wang et al. [20] construct discriminative models based on sparse representation; however, they encode only the local patches of the object and background, losing the holistic information provided by the object. Additionally, Xie et al. [14] utilize the sparse representation of target and background by combining a generative model and a discriminative model, but they encode the object only at the holistic level without taking local information into consideration. Meanwhile, some other algorithms also try to integrate generative and discriminative models for tracking [24], [25], [26], [27]; however, they are not sparse-based and do not exploit the combination of holistic and local features.

To address the above problems, in this paper we integrate the advantages of both discriminative and generative models to exploit the holistic and local information of the object. The proposed algorithm is thus constructed from a global part and a local part. In the global part, we encode holistic information of both object and background via a global dictionary, and the sparse codes are used to train a classifier that roughly distinguishes the target from the background. As to the update scheme, we learn the global dictionary and classifier jointly, instead of updating them as two separate parts as in traditional algorithms [20], [21]. In the local part, we first partition the candidates into patches, then use a local dictionary to encode the patches into sparse codes. Finally, between two consecutive frames, both temporal and spatial consistencies of the patches are enforced to refine the tracking results. The global and local parts are complementary, capturing the holistic and local information of the object respectively, and we integrate them into a Bayesian inference framework to construct the final tracker.
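The joint (coupled) learning of the global dictionary and classifier can be written as a single optimization problem. The exact objective used in the paper is not shown in this excerpt, so the formulation below is an illustrative assumption in the spirit of discriminative dictionary learning (e.g., D-KSVD):

```latex
\min_{D,\,W,\,C}\; \|X - DC\|_F^2 \;+\; \gamma\,\|H - WC\|_F^2 \;+\; \lambda\,\|C\|_1 \;+\; \beta\,\|W\|_F^2
```

Here $X$ stacks the holistic features of the foreground and background samples, $D$ is the over-complete global dictionary, $C$ holds the sparse codes, $H$ is the label matrix, and $W$ is the linear classifier. Minimizing over $D$, $W$, and $C$ simultaneously couples reconstruction and classification, so the dictionary is shaped by the discriminative task rather than updated separately from the classifier.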

The contributions of this paper are as follows:

  • (1)

    We sparsely represent the object at both global and local levels via two complementary parts, which offer novel ways to utilize the object's holistic and local information for tracking.

  • (2)

    To construct an adaptive GCL part, we employ an online algorithm to jointly learn the global dictionary and classifier.

  • (3)

    In the LCE part, we propose a new method to calculate the candidates' local confidences based on the temporal and spatial consistencies among the object patches.
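The temporal-consistency idea behind the local confidence can be sketched as follows: patches whose sparse codes change little between consecutive frames should raise a candidate's confidence. The cosine-similarity measure and simple averaging below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    # Cosine similarity between two sparse code vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def local_confidence(codes_prev, codes_cur):
    """Temporal consistency (illustrative): average similarity between the
    sparse codes of corresponding patches in consecutive frames."""
    sims = [cosine_sim(p, c) for p, c in zip(codes_prev, codes_cur)]
    return sum(sims) / len(sims)

rng = np.random.default_rng(1)
prev = [rng.standard_normal(16) for _ in range(9)]            # 9 patches, 16-d codes
stable = [p + 0.05 * rng.standard_normal(16) for p in prev]   # mild appearance change
drifted = [rng.standard_normal(16) for _ in range(9)]         # unrelated content

print(local_confidence(prev, stable) > local_confidence(prev, drifted))  # True
```

A candidate whose patches drift away from the previous frame's codes thus receives a low confidence and is down-weighted when the tracking result is refined.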

Similar to our work, Zhong et al. [31] propose a sparse-based collaborative model that exploits both holistic and local information of the object. However, our method differs from theirs in both the sparse representation and the dictionary updating algorithm. Moreover, we use two-stage filtering to combine the global and local parts instead of simply multiplying the confidence values of the holistic template and local patches as in [31]; more detailed differences between the two trackers are discussed in Section 2.

The paper is organized as follows. Section 2 briefly discusses related work, and Section 3 presents the details of the proposed tracker. Section 4 presents quantitative and qualitative comparisons between the proposed algorithm and several state-of-the-art trackers. Finally, Section 5 gives conclusions and future work.

Section snippets

Related work

Our work is closely related to two topics: sparse representation and dictionary learning. Many algorithms have been designed around these two topics, and good reviews can be found in [2], [3], [4]. Next, we present the work most relevant to and motivating our paper.

As introduced in Section 1, there are plentiful literatures which bring sparse representation into appearance modeling. Wang et al. [20] encode the local patches inside the object region and concatenate the

Proposed algorithm

In this paper, we sparsely represent the image samples at both global and local levels, then construct two complementary parts based on these two levels; finally, we integrate the two parts into a Bayesian inference framework for tracking. More details are elucidated in the rest of this section.
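The two-stage filtering that combines the global and local parts (as opposed to multiplying their confidences) can be sketched as follows. The screening size, the toy 1-D candidate states, and the confidence functions are hypothetical stand-ins for the paper's GCL classifier score and LCE local confidence.

```python
import numpy as np

def two_stage_select(candidates, global_conf, local_conf, k=5):
    """Two-stage filtering (illustrative): the global part first keeps the
    top-k candidates, then the local part picks the best among them."""
    g = np.array([global_conf(c) for c in candidates])
    top = np.argsort(g)[-k:]                        # stage 1: global screening
    l = np.array([local_conf(candidates[i]) for i in top])
    return int(top[np.argmax(l)])                   # stage 2: local refinement

# Toy example: candidates are 1-D "states"; the true target sits at 0.0
rng = np.random.default_rng(2)
candidates = rng.uniform(-1, 1, 50)
global_conf = lambda c: -abs(c) + 0.1 * rng.standard_normal()  # coarse, noisy score
local_conf = lambda c: -abs(c)                                  # finer local score
best = two_stage_select(candidates, global_conf, local_conf)
print(candidates[best])
```

The design rationale is that the cheap, noisy global score prunes most candidates, while the more reliable local score only has to discriminate among the few survivors.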

Experiments

To evaluate the performance of the proposed tracker, fifteen benchmark sequences are collected for comparison, and their challenges are listed in Table 1. In addition, ten state-of-the-art trackers are compared with the GLT algorithm on the benchmark video sequences: OAB [41], FragTrack [43], IVT [40], L1T [10], NNT [45], MIL [44], ODLSR [20], MTT [22], CT [42], and the Collaborative Tracker (CollaT) [31]. We use the publicly available source codes and keep the original implementations

Conclusions

In this paper, we proposed a novel algorithm that mines the global and local information of the object via two-level sparse representation. In the GCL part, we extracted the samples' holistic features based on a Gaussian pyramid and an over-complete global dictionary, then a classifier was constructed to roughly estimate the locations of the object. In the LCE part, the candidates were partitioned into patches, then we encoded the patches via a local dictionary. Both temporal and spatial

Acknowledgment

The authors would like to thank the anonymous editor and reviewers, whose valuable suggestions have helped to improve the quality of the manuscript. This research has been supported by the National Natural Science Foundation of China (Grant nos. 61402480, U1135005) and the National High Technology Research and Development Program of China (Grant no. 2013AA01A607).

Yehui Yang is currently a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences (CAS). He received the B.S. degree in Automation from the College of Electrical and Information Engineering at Hunan University, Changsha, PRC, in 2011. His research interests include computer vision, machine learning and pattern recognition.

References (45)

  • P. Pérez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: ECCV, 2002, pp....
  • X. Mei, H. Ling, Robust visual tracking using l1 minimization, in: ICCV, 2009, pp....
  • D. Wang et al.

    Online object tracking with sparse prototypes

    IEEE Trans. Image Process.

    (2013)
  • Z. Hong, X. Mei, D. Tao, Dual-force metric learning for robust distracter-resistant tracker, in: ICCV,...
  • D. Comaniciu et al.

    Kernel-based object tracking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2003)
  • Z. Hong, C. Wang, X. Mei, D. Prokhorov, D. Tao, Tracking using Multilevel Quantizations, in: ECCV,...
  • X. Jia, H. Lu, M.H. Yang, Visual tracking via adaptive structural local sparse appearance model, in: CVPR, 2012, pp....
  • Z. Hong, X. Mei, D. Prokhorov, D. Tao, Tracking via robust multi-task multi-view joint sparse representation, in: ICCV,...
  • B. Liu, L. Yang, J. Huang, P. Meer, L. Gong, C. Kulikowski, Robust and fast collaborative tracking with two stage...
  • Q. Wang, F. Chen, W. Xu, M.H. Yang, Online discriminative object tracking with local sparse representation, in: WACV,...
  • Y. Xie et al.

    Discriminative object tracking via sparse representation and online dictionary learning

    IEEE Trans. Cybern.

    (2014)
  • T. Zhang, B. Ghanem, S. Liu, N. Ahuja, Robust visual tracking via structured multi-task sparse learning, in:...


    Yuan Xie received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2013. He received his master degree in School of Information Science and Technology from Xiamen University, China, in 2010. He is a member of IEEE. His research interests include image processing, computer vision, machine learning and pattern recognition.

    Wensheng Zhang received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2000. He joined the Institute of Software, CAS, in 2001. He is a Professor of Machine Learning and Data Mining and the Director of Research and Development Department, Institute of Automation, CAS. He has published over 32 papers in the area of Modeling Complex Systems, Statistical Machine Learning and Data Mining. His research interests include computer vision, pattern recognition, artificial intelligence and computer human interaction.

    Wenrui Hu received the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences (CAS), in 2014. He received his master degree in School of Optoelectronic Information from Beijing Institute of Technology, China, in 2010. His research interests include image processing, computer vision, machine learning and pattern recognition.

    Yuanhua Tan engaged in the information system construction of oil industry in long term. He is now a Senior Researcher in Karamay Hongyou Software Company, Xinjiang, PRC. His research interests include database, smart city and intelligent system.
