Neurocomputing

Volume 286, 19 April 2018, Pages 121-129

Temporally-adjusted correlation filter-based tracking

https://doi.org/10.1016/j.neucom.2018.01.067

Abstract

Recently, the discriminative correlation filter (DCF) has been widely studied and adopted for the visual object tracking task. Since the convolution operation can be computed efficiently through the fast Fourier transform (FFT), DCF trackers achieve outstanding results while maintaining very high computational performance. Much research effort has been devoted to improving the tracking capability of DCF-based trackers, such as exploring new features or handling scale changes. As visual object tracking is naturally an incremental procedure, DCF trackers inevitably suffer from drifting caused by small errors accumulated during the tracking process. In this paper, we propose a temporally-adjusted correlation filter (TCF) tracking method to effectively address the drifting problem. By taking advantage of temporal information among the previous states of the target, our approach refines the traditional DCF model during the tracking procedure and greatly reduces the risk of drifting. Experimental results on the challenging OTB-2013 and OTB-2015 datasets show that the proposed strategy is very promising.

Introduction

Visual tracking has been widely studied as a fundamental task in computer vision; it aims to estimate the motion of a target given only its location in the starting frame of a video sequence.

Over the past years, tracking-by-detection has proven to be an effective tracking scheme. Its main idea is to learn a discriminative model with machine learning methods to estimate the position of a given target. Correlation filters have been extensively studied in recent years [1], [2], [3], [4], [5] as a successful family of trackers, which aim to learn a filter from previously observed target appearances. With such a filter, the location of the target in a new frame can be estimated from a response map obtained by applying the learned filter to the frame. Since frames arrive as a stream, efficiency is of great importance for visual object tracking. By exploiting the connection between circular sliding windows and convolution operations [1], the filter can be computed efficiently using the fast Fourier transform (FFT). Moreover, the linear system adopted in most correlation filter-based trackers enables them to update the filter incrementally by tracking several intermediate variables. Thus, correlation filter-based methods have been established as an efficient tracking scheme.
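To make the FFT connection concrete, the following sketch (our illustration in Python/NumPy, not from the paper) shows how a correlation response map reduces to an element-wise product in the Fourier domain; the patch and filter here are placeholders.

```python
import numpy as np

def response_map(patch, filt):
    """Correlate a learned filter with an image patch via the FFT.

    Circular correlation in the spatial domain becomes an element-wise
    product in the Fourier domain, which is what makes DCF trackers
    fast: O(MN log MN) instead of a dense spatial search.
    """
    P = np.fft.fft2(patch)
    F = np.fft.fft2(filt)
    # conjugating the filter spectrum turns convolution into correlation
    return np.real(np.fft.ifft2(np.conj(F) * P))

# toy example: a delta filter simply reproduces the patch;
# in a real tracker the target is placed at the peak of the map
patch = np.random.rand(64, 64)
filt = np.zeros((64, 64))
filt[0, 0] = 1.0
r = response_map(patch, filt)
peak = np.unravel_index(np.argmax(r), r.shape)
```

In a tracker, `filt` would be the learned template and the peak coordinate of `r` gives the estimated target translation.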

During the tracking process, the target may undergo illumination changes, occlusions, deformations, and rapid motion, which require trackers to adapt to the varying appearance of the target. Moreover, trackers have to cope with the lack of sufficient samples to train a discriminative classifier. This makes most trackers incremental online algorithms that rely on continuously tracking the target successfully in order to refine the model. Such incremental approaches face a major challenge: they depend heavily on the tracker's previous results instead of the ground truth. When a previous prediction is not accurate enough, drifting may occur, as the error introduced into the incremental model accumulates during the tracking procedure; the region considered to be the target gradually drifts away from the designated target. Fig. 1 demonstrates the drifting phenomenon occurring in the soccer sequence.

Matthews et al. [6] suggest that the drifting problem is mainly caused by the accumulation of small errors from the tracking algorithm during the template updating procedure, and that it can be alleviated by updating the target template conservatively. Some methods [6], [7] employ the initial template from the first frame as an anchor to re-align the target from time to time. However, there is a contradiction: the changing appearance of the target calls for an active template updating scheme, while preventing drift requires a stable template to re-align the tracking region. Note that drifting is not only a problem in visual tracking; it also occurs in other tasks with an incremental updating approach, such as visual Simultaneous Localization and Mapping (SLAM). Thrun [8] shows that small errors in previous frames can significantly influence later estimations.

To address the drifting issue in incremental online tracking, in this paper we propose a temporally-adjusted correlation filter (TCF) scheme. Unlike DCF trackers that estimate the target location from the coordinate of the maximum response, we propose a differentiable target location re-estimation method. Specifically, we introduce a target realignment component, which takes advantage of the Lucas–Kanade framework [9], [10] to realign the target location based on previously tracked frames. Moreover, a bundle-adjustment-like procedure is introduced alongside the realignment approach. The proposed scheme simultaneously re-estimates the motion of the target and refines the template learned from the target's previous appearance. By estimating the target's location accurately, the template can be learned more effectively during tracking. Note that re-estimating the target location causes a small displacement in each frame (the realignment displacement shown in the second row of Fig. 1), while the template learned from the realigned target results in a large displacement between the base tracker and the proposed tracker (the accumulated displacement shown in the first row of Fig. 1). The experimental evaluation demonstrates that our scheme alleviates the drifting problem during tracking and achieves better performance than the reference tracker on the OTB-2013 benchmark with 50 challenging video sequences and the OTB-2015 benchmark with 100 sequences.
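As an illustration of the realignment idea (a minimal sketch, not the authors' implementation), a single translational Lucas–Kanade step linearizes the alignment error around zero displacement and solves a small least-squares system for the shift that best aligns an image to a template:

```python
import numpy as np

def lk_translation_step(template, image):
    """One Gauss-Newton step of translational Lucas-Kanade alignment.

    Linearises image(p + d) ~ image(p) + grad(image) . d and solves the
    normal equations (J^T J) d = J^T (template - image) for the
    displacement d = (dy, dx) that warps `image` toward `template`.
    """
    gy, gx = np.gradient(image.astype(float))
    err = (template - image).ravel()
    J = np.stack([gy.ravel(), gx.ravel()], axis=1)  # Jacobian wrt (dy, dx)
    d, *_ = np.linalg.lstsq(J, err, rcond=None)
    return d  # estimated (dy, dx) displacement

# toy check: a smooth blob shifted by half a pixel in y
ys, xs = np.mgrid[:32, :32]
template = np.exp(-((ys - 16.0) ** 2 + (xs - 16.0) ** 2) / 18.0)
image = np.exp(-((ys - 16.5) ** 2 + (xs - 16.0) ** 2) / 18.0)
d = lk_translation_step(template, image)
```

A full Lucas–Kanade alignment iterates this step, warping the image by the accumulated displacement until convergence; the paper's realignment component builds on that framework [9], [10].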

The rest of this paper is organized as follows: Section 2 reviews recent DCF-based trackers. Section 3 briefly reviews the general DCF tracking approach. Section 4 describes our TCF tracker. Implementation details and comprehensive results are given in Section 5. Section 6 concludes the paper and discusses future work.

Section snippets

Related work

The key to a visual object tracker is to train models that can adapt to the changing appearance of the target. Typically, these trackers use either generative methods [11], [12], [13], [14] or discriminative approaches [1], [15], [16]. Trackers with generative models aim to build representations of the target's appearance. In contrast, discriminative approaches train a classifier to distinguish the target from the surrounding background.

Discriminative correlation filter methods

In this section, we briefly review correlation filter-based tracking methods. The traditional DCF method can be viewed as a learning process that aims to learn a filter $f$ encoding the target's appearance information. Suppose $x \in \mathbb{R}^{M \times N}$ is an image patch centered at the visual target and $y \in \mathbb{R}^{M \times N}$ is the labeled response map, a Gaussian pulse aligned with the target in $x$. The objective is to find a filter which can produce a Gaussian-like pulse indicating the target's position after…
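Under this kind of ridge-regression objective, the filter has a well-known closed-form solution per Fourier coefficient (as in the MOSSE tracker [1]). The sketch below, with an assumed regularization weight `lam`, illustrates it on a synthetic single-channel patch:

```python
import numpy as np

def learn_dcf(x, y, lam=1e-2):
    """Closed-form DCF solution in the Fourier domain (MOSSE-style).

    Minimises ||f * x - y||^2 + lam * ||f||^2, which decouples per
    Fourier coefficient:  F = (Y . conj(X)) / (X . conj(X) + lam).
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def gaussian_label(shape, sigma=2.0):
    """The desired response y: a Gaussian pulse centred on the target."""
    h, w = shape
    ys, xs = np.mgrid[:h, :w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma**2))

x = np.random.rand(32, 32)
y = gaussian_label(x.shape)
F = learn_dcf(x, y)
# applying the learned filter back to the training patch should
# approximately reproduce the Gaussian label
r = np.real(np.fft.ifft2(F * np.fft.fft2(x)))
```

Real DCF trackers such as SRDCF add multi-channel features and a spatial regularizer on top of this basic formulation; the decoupled per-coefficient solve is what keeps training fast.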

Temporally-adjusted correlation filter-based tracker

Our proposed approach can be directly applied to DCF scheme. We build our tracker based on SRDCF tracker [4] aiming for a more general DCF framework.

Since the observations $x_k$ are retrieved sequentially in real-world applications, the filter $f$ is hard to obtain during the tracking procedure. Instead, the filter $f_t$ at time $t$ is calculated at each frame, and the actual objective function is as below:
$$\operatorname*{arg\,min}_{f_t} \sum_{k=1}^{t-1} \alpha_k \left\lVert f_t * F(x_k) - y_k \right\rVert^2 + \omega \cdot \left\lVert f_t \right\rVert^2 .$$
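One common way to realize such a weighted objective without revisiting old frames is to keep running Fourier-domain sums of the filter's numerator and denominator, with exponential weights playing the role of the $\alpha_k$. The sketch below (our illustration, with assumed hyper-parameters `eta` and `omega`, not the paper's values) shows a single-channel version:

```python
import numpy as np

class IncrementalDCF:
    """Running-sum update for a single-channel DCF (a sketch).

    Keeps the Fourier-domain numerator A and denominator B so the
    filter never has to revisit old frames:
        A_t = (1 - eta) * A_{t-1} + eta * Y_t . conj(X_t)
        B_t = (1 - eta) * B_{t-1} + eta * X_t . conj(X_t)
        F_t = A_t / (B_t + omega)
    """

    def __init__(self, eta=0.025, omega=1e-2):
        self.eta, self.omega = eta, omega
        self.A = None
        self.B = None

    def update(self, x, y):
        X, Y = np.fft.fft2(x), np.fft.fft2(y)
        a, b = Y * np.conj(X), X * np.conj(X)
        if self.A is None:  # first frame: no history to blend
            self.A, self.B = a, b
        else:
            self.A = (1 - self.eta) * self.A + self.eta * a
            self.B = (1 - self.eta) * self.B + self.eta * b
        return self.A / (self.B + self.omega)

# usage with an idealised delta label centred on the target
x1 = np.random.rand(32, 32)
y = np.zeros((32, 32))
y[16, 16] = 1.0
tracker = IncrementalDCF()
F1 = tracker.update(x1, y)
resp = np.real(np.fft.ifft2(F1 * np.fft.fft2(x1)))
```

This is the standard intermediate-variable trick mentioned in the introduction: only `A` and `B` are stored, so memory and per-frame cost stay constant as $t$ grows.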

Also, the response map is calculated on the newly captured $t$-th…

Experiment

In this section, we give the implementation details and experimental setup. We evaluate the proposed temporally-adjusted correlation filter (TCF) tracker on the OTB-2013 [31] and OTB-2015 datasets, which include various challenges such as scale variation, fast motion, and in-plane rotation. We compare the proposed tracking algorithm with other representative trackers, including SRDCF [4], SAMF [3], DSST [5], KCF [2], and TGPR [32].

Conclusion

In this paper, we propose a novel temporally-adjusted correlation filter-based tracker which alleviates the drifting problem in the visual object tracking task. The proposed tracker utilizes a differentiable target location re-estimation method to refine the tracking result. To this end, a bundle-adjustment-like procedure is introduced. The proposed tracker can be viewed as a general DCF tracker, and the re-alignment scheme can be adopted by other DCF trackers. By re-estimating the location of…

Acknowledgment

The authors would like to thank the reviewers for their constructive and informative comments. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501).

Wenjie Song is currently a Ph.D. Candidate in College of Computer Science, Zhejiang University. He received his bachelor degree from College of Computer Science, Zhejiang University in 2011. His main research interests include computer vision and machine learning.

References (32)

  • D.S. Bolme et al., Visual object tracking using adaptive correlation filters, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2010)
  • J.F. Henriques et al., High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • Y. Li, J. Zhu, A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration, Springer International...
  • M. Danelljan et al., Learning spatially regularized correlation filters for visual tracking, Proceedings of the IEEE International Conference on Computer Vision (2015)
  • M. Danelljan et al., Discriminative scale space tracking, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
  • I. Matthews et al., The template update problem, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • Z. Jia et al., Target tracking with Bayesian fusion based template matching, Proceedings of the IEEE International Conference on Image Processing (2005)
  • S. Thrun, Robotic mapping: a survey, Exploring Artificial Intelligence in the New Millennium (2002)
  • B.D. Lucas et al., An iterative image registration technique with an application to stereo vision, Proceedings of the Seventh International Joint Conference on Artificial Intelligence (1981)
  • S. Baker et al., Lucas–Kanade 20 years on: a unifying framework, Int. J. Comput. Vis. (2004)
  • J. Kwon et al., Tracking by sampling trackers, Proceedings of the International Conference on Computer Vision (2011)
  • B. Liu et al., Robust tracking using local sparse appearance model and k-selection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2011)
  • C. Bao et al., Real time robust l1 tracker using accelerated proximal gradient approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2012)
  • L. Sevilla-Lara, Distribution fields for tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2012)
  • J.F. Henriques, R. Caseiro, P. Martins, J. Batista, Exploiting the Circulant Structure of Tracking-by-Detection with...
  • S. Hare et al., Struck: structured output tracking with kernels, Proceedings of the IEEE International Conference on Computer Vision (2011)


    Yang Li is currently a Ph.D. Candidate in College of Computer Science, Zhejiang University. He received his bachelor degree in Institute of Software Engineering from East China Normal University in 2011 and master degree in College of Computer Science from Zhejiang University in 2016. His main research interests include computer vision and machine learning.

    Jianke Zhu is an Associate Professor in College of Computer Science at Zhejiang University. He received his Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong. He was a postdoc in BIWI Computer Vision Lab at ETH Zurich. Dr. Zhu’s research interests include computer vision and multimedia information retrieval. He is a senior member of the IEEE.

    Chun Chen is a Professor in College of Computer Science, Zhejiang University. He received his Ph.D. degree in College of Computer Science from Zhejiang University. His research interests include Image Processing, Computer Vision, CSCW and Embedded Systems.