Temporally-adjusted correlation filter-based tracking
Introduction
Visual tracking has been widely studied as a fundamental task in computer vision. It aims to estimate the motion of a target given its location only in the starting frame of a video sequence.
During the past years, tracking-by-detection has proven to be an effective tracking scheme. Its main idea is to learn a discriminative model with machine learning methods and use it to estimate the position of the given target. Correlation filters have been extensively studied in recent years [1], [2], [3], [4], [5] as a successful family of trackers; they learn a filter from previously observed target appearances. With such a filter, the location of the target in a new frame can be estimated from the response map obtained by applying the learned filter to that frame. Since frames arrive as a stream, efficiency is of great importance for visual object tracking. By exploiting the connection between circular sliding windows and convolution operations [1], the filter can be computed efficiently using the Fast Fourier Transform (FFT). Moreover, the linear system adopted by most correlation filter-based trackers allows the filter to be updated incrementally by maintaining several intermediate variables. Thus, correlation filter-based methods have been established as an efficient tracking scheme.
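To make the Fourier-domain machinery concrete, the following minimal sketch (not the authors' implementation; the function names and regularization default are illustrative) learns a single-channel filter in the style of MOSSE and applies it to a search patch:

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Learn a correlation filter in the Fourier domain (MOSSE-style sketch).

    x:   2-D training patch centered on the target.
    y:   desired Gaussian-shaped response map, same shape as x.
    lam: regularization weight (illustrative default).
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Closed-form per-frequency solution of the ridge-regression objective.
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def response_map(F, z):
    """Apply the learned filter to a new search patch z; the estimated
    target position is the peak of the returned response map."""
    return np.real(np.fft.ifft2(F * np.fft.fft2(z)))
```

Because both training and detection reduce to element-wise operations on FFTs, the per-frame cost is dominated by a handful of 2-D transforms.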
During tracking, the target may undergo illumination changes, occlusions, deformations and rapid motions, which require trackers to adapt to varied appearance changes of the target. Moreover, trackers have to cope with the lack of sufficient samples for training a discriminative classifier. This makes most trackers incremental online algorithms that rely on continuously tracking the target successfully in order to refine the model. Such incremental approaches face a major challenge: they depend heavily on the tracker's own previous predictions rather than on ground truth. When a previous prediction is not accurate enough, drifting may occur, as errors introduced into the incremental updates accumulate over the course of tracking. As a result, the region considered to be the target gradually drifts away from the designated target. Fig. 1 illustrates the drifting phenomenon in the soccer sequence.
Matthews et al. [6] suggest that the drifting problem is mainly caused by the accumulation of small errors from the tracking algorithm during template updating, and that it can be alleviated by updating the target template passively. Some methods [6], [7] employ the initial template from the first frame as an anchor to re-align the target from time to time. However, there is an inherent tension: the changing appearance of the target calls for an active template update scheme, while preventing drift requires a stable template to re-align the tracked region. Note that drifting is not only a problem in visual tracking; it also occurs in other tasks with an incremental updating approach, such as visual Simultaneous Localization and Mapping (SLAM). Thrun et al. [8] show that small errors in earlier frames can significantly affect later estimations.
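As a toy illustration of this tension (not the authors' method; `eta` and both helper names are hypothetical), a passive update blends new appearance in slowly, while a similarity score against the first-frame template can serve as an anchor-based drift check:

```python
import numpy as np

def update_template(template, patch, eta=0.02):
    """Passive (conservative) template update: a small learning rate eta
    blends in the new appearance while keeping the template stable."""
    return (1.0 - eta) * template + eta * patch

def anchored_similarity(template, initial_template):
    """Zero-mean normalized correlation with the first-frame template.
    A low score suggests the updated template has drifted."""
    a = template - template.mean()
    b = initial_template - initial_template.mean()
    return float((a * b).sum() /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A small `eta` resists drift but adapts slowly; a large `eta` tracks appearance change but accumulates errors faster, which is exactly the contradiction discussed above.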
To address the drifting issue in incremental online tracking, in this paper we propose a temporally-adjusted correlation filter (TCF) scheme. Unlike DCF trackers, which estimate the target location from the coordinate of the maximum response, we propose a differentiable target location re-estimation method. Specifically, we introduce a target realignment component that leverages the Lucas–Kanade framework [9], [10] to realign the target location based on previously tracked frames. Moreover, a bundle-adjustment-like procedure is introduced alongside the realignment approach. The proposed scheme simultaneously re-estimates the motion of the target and refines the template learned from the target's previous appearance. By estimating the target's location accurately, the template can be learned more effectively during tracking. Note that the re-estimation of the target location causes a small displacement in each frame (the realignment displacement shown in the second row of Fig. 1), while the template learned from the realigned target results in a large displacement between the base tracker and the proposed tracker (the accumulated displacement shown in the first row of Fig. 1). Experimental evaluation demonstrates that our scheme alleviates the drifting problem and achieves better performance than the reference tracker on the OTB-2013 benchmark with 50 challenging video sequences and the OTB-2015 benchmark with 100 sequences.
The rest of this paper is organized as follows: Section 2 reviews recent DCF-based trackers. Section 3 briefly reviews the general DCF tracking approach. Section 4 describes our TCF tracker. Implementation details and comprehensive results are given in Section 5. Section 6 concludes the paper and discusses future work.
Related work
The key to visual object tracking is to train models that can adapt to the changing appearance of the target. Typically, trackers use either generative methods [11], [12], [13], [14] or discriminative approaches [1], [15], [16]. Trackers with generative models build models that represent the appearance of the target. In contrast, discriminative approaches train a classifier to distinguish the target from the surrounding background.
Discriminative correlation filter methods
In this section, we briefly review correlation filter-based tracking methods. The traditional DCF method can be viewed as a learning process that aims to learn a filter f encoding the target's appearance information. Suppose x is an image vector centered on the visual target and y is the labeled response map with a Gaussian pulse aligned with the target in x. The objective is to find a filter which can produce a Gaussian-like pulse indicating the target's position after
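For reference, the standard single-channel DCF objective and its closed-form Fourier-domain solution (the MOSSE formulation [1]; notation may differ slightly from this paper's) can be written as:

```latex
\min_{f}\; \lVert f \star x - y \rVert^{2} + \lambda \lVert f \rVert^{2},
\qquad
\hat{f} = \frac{\hat{x}^{*} \odot \hat{y}}{\hat{x}^{*} \odot \hat{x} + \lambda},
```

where $\star$ denotes circular correlation, $\hat{\cdot}$ the DFT, $^{*}$ complex conjugation, $\odot$ the element-wise product, and $\lambda$ the regularization weight.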
Temporally-adjusted correlation filter-based tracker
Our proposed approach can be applied directly to the DCF scheme. We build our tracker on the SRDCF tracker [4], aiming at a more general DCF framework.
Since the observations xk arrive sequentially in real-world applications, the filter f is difficult to obtain in one batch during tracking. Instead, a filter ft is computed at each frame t, with the objective function given below:
Also, the response map is calculated on the newly captured tth
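The per-frame filter is commonly maintained by keeping the Fourier-domain numerator and denominator as running averages, so no batch re-solve is needed. This is a MOSSE-style sketch under an assumed learning rate `eta`, not the SRDCF optimization itself; all names are illustrative:

```python
import numpy as np

def init_filter(x, y, lam=1e-2):
    """Initialize the two intermediate variables from the first frame."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    A = np.conj(X) * Y         # numerator
    B = np.conj(X) * X + lam   # denominator
    return A, B

def update_filter(A, B, x, y, eta=0.02, lam=1e-2):
    """Incrementally blend in the new frame's statistics; the current
    filter at any time is F = A / B."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    A = (1 - eta) * A + eta * (np.conj(X) * Y)
    B = (1 - eta) * B + eta * (np.conj(X) * X + lam)
    return A, B
```

Updating two arrays per frame keeps memory and computation constant regardless of how many frames have been observed.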
Experiment
In this section, we give the implementation details and experimental setup. We evaluate the proposed Temporally-adjusted Correlation Filter (TCF) tracker on the OTB-2013 dataset [31] and the OTB-2015 dataset, which include various challenges such as scale variation, fast motion, and in-plane rotation. We compare the proposed tracking algorithm with other representative trackers, including SRDCF [4], SAMF [3], DSST [5], KCF [2] and TGPR [32].
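OTB precision and success plots are built from two per-frame measures: center location error and bounding-box overlap. A minimal sketch of both, assuming boxes in (x, y, w, h) format (the helper names are illustrative):

```python
import numpy as np

def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ca = np.array([box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2])
    cb = np.array([box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2])
    return float(np.linalg.norm(ca - cb))

def overlap(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def precision(errors, threshold=20.0):
    """OTB precision: fraction of frames with center error <= threshold."""
    return float(np.mean(np.asarray(errors) <= threshold))
```

The precision plot sweeps the error threshold (20 px is the conventional reported point), while the success plot sweeps an overlap threshold and reports the area under the resulting curve.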
Conclusion
In this paper, we propose a novel temporally-adjusted correlation filter-based tracker that alleviates the drifting problem in visual object tracking. The proposed tracker uses a differentiable target location re-estimation method to refine the tracking result. To this end, a bundle-adjustment-like procedure is introduced. The proposed tracker can be viewed as a general DCF tracker, and the re-alignment scheme can be adopted by other DCF trackers. By re-estimating the location of
Acknowledgment
The authors would like to thank the reviewers for their constructive and informative comments. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501).
References (32)
- D.S. Bolme, J.R. Beveridge, B.A. Draper, Y.M. Lui, Visual object tracking using adaptive correlation filters, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
- J.F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell., 2015.
- Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in: Proceedings of the European Conference on Computer Vision Workshops, Springer International Publishing, 2014.
- M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Learning spatially regularized correlation filters for visual tracking, in: Proceedings of the IEEE International Conference on Computer Vision, 2015.
- M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Discriminative scale space tracking, IEEE Trans. Pattern Anal. Mach. Intell., 2016.
- I. Matthews, T. Ishikawa, S. Baker, The template update problem, IEEE Trans. Pattern Anal. Mach. Intell., 2004.
- Target tracking with Bayesian fusion based template matching, in: Proceedings of the IEEE International Conference on Image Processing, 2005.
- S. Thrun, Robotic mapping: a survey, in: Exploring Artificial Intelligence in the New Millennium, 2002.
- B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 1981.
- S. Baker, I. Matthews, Lucas–Kanade 20 years on: a unifying framework, Int. J. Comput. Vis., 2004.
- J. Kwon, K.M. Lee, Tracking by sampling trackers, in: Proceedings of the International Conference on Computer Vision, 2011.
- B. Liu, J. Huang, L. Yang, C. Kulikowski, Robust tracking using local sparse appearance model and k-selection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
- C. Bao, Y. Wu, H. Ling, H. Ji, Real time robust L1 tracker using accelerated proximal gradient approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
- L. Sevilla-Lara, E. Learned-Miller, Distribution fields for tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.
- S. Hare, A. Saffari, P.H.S. Torr, Struck: structured output tracking with kernels, in: Proceedings of the IEEE International Conference on Computer Vision, 2011.
Wenjie Song is currently a Ph.D. candidate in the College of Computer Science, Zhejiang University. He received his bachelor's degree from the College of Computer Science, Zhejiang University in 2011. His main research interests include computer vision and machine learning.
Yang Li is currently a Ph.D. candidate in the College of Computer Science, Zhejiang University. He received his bachelor's degree from the Institute of Software Engineering, East China Normal University in 2011 and his master's degree from the College of Computer Science, Zhejiang University in 2016. His main research interests include computer vision and machine learning.
Jianke Zhu is an Associate Professor in the College of Computer Science at Zhejiang University. He received his Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong. He was a postdoc in the BIWI Computer Vision Lab at ETH Zurich. Dr. Zhu's research interests include computer vision and multimedia information retrieval. He is a senior member of the IEEE.
Chun Chen is a Professor in the College of Computer Science, Zhejiang University. He received his Ph.D. degree from the College of Computer Science, Zhejiang University. His research interests include image processing, computer vision, CSCW and embedded systems.