Reliable correlation tracking via dual-memory selection model
Introduction
Visual tracking is a fundamental topic in computer vision with numerous applications, ranging from video surveillance, human-machine interaction and robotic services to autonomous driving. It aims to estimate the trajectory of an unknown target in an image sequence, given only the target's initial state. Although significant progress [1], [2], [13], [18], [23], [31] has been achieved over the past decades, designing an efficient and robust tracking algorithm remains challenging due to factors such as target deformation, background clutter and occlusion.
Recently, discriminative correlation filters (DCFs) have been successfully applied to visual tracking and have received extensive attention. In general, DCF-based tracking methods follow the tracking-by-detection framework, in which the training, detection and updating steps are executed sequentially throughout the tracking process. Unlike most other tracking-by-detection trackers, however, DCF-based trackers perform the training and detection steps more efficiently by exploiting the circular (cyclic-shift) sample assumption and the fast Fourier transform (FFT). Moreover, the introduction of approximate dense sampling and high-dimensional features further improves the accuracy of DCF-based tracking methods.
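To make the efficiency argument concrete, the following Python sketch trains and applies a single-channel linear correlation filter in the Fourier domain, in the spirit of the MOSSE formulation ("Visual object tracking using adaptive correlation filters", listed in the references). It is a simplification under stated assumptions: 1-D signals instead of 2-D image patches, no feature channels, no kernel, no cosine window, and a naive O(n^2) DFT standing in for the FFT. All function names are hypothetical, and this is not the tracker proposed in this paper.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform; a stand-in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def gaussian_label(n, peak, sigma=1.0):
    """Desired response: a Gaussian centred at `peak`, wrapped around circularly."""
    labels = []
    for t in range(n):
        d = min(abs(t - peak), n - abs(t - peak))  # circular distance to the peak
        labels.append(math.exp(-0.5 * (d / sigma) ** 2))
    return labels

def train_filter(x, y, lam=1e-4):
    """Closed-form ridge regression over all cyclic shifts of x.

    Returns the conjugated filter in the frequency domain, computed element-wise as
    H* = Y * conj(X) / (X * conj(X) + lam)."""
    X, Y = dft(x), dft(y)
    return [Yk * Xk.conjugate() / (Xk * Xk.conjugate() + lam) for Xk, Yk in zip(X, Y)]

def detect(filter_hat, z):
    """Correlation response of the trained filter on a new sample z."""
    Z = dft(z)
    return [v.real for v in idft([Hk * Zk for Hk, Zk in zip(filter_hat, Z)])]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Usage: train on a random 1-D "patch", then detect it after a cyclic shift of 3.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(16)]
y = gaussian_label(16, peak=5)
filter_hat = train_filter(x, y)
response_same = detect(filter_hat, x)
response_shifted = detect(filter_hat, x[-3:] + x[:-3])  # z[t] = x[t - 3]
print(argmax(response_same), argmax(response_shifted))  # expected: 5 8
```

Because every cyclic shift of the training patch is implicitly a training sample, both training and detection reduce to element-wise operations in the frequency domain, and a displacement of the target appears as a displacement of the response peak.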
However, correlation-filter-based trackers are prone to drift because of their highly adaptive model update schemes, especially when the target is subject to challenging factors such as heavy occlusion and background clutter. Unreliable tracking results contaminate the filter over time, which can lead to tracking failure if not addressed promptly. To mitigate the model drift problem, some researchers [4], [5] design a dynamic learning rate based on the confidence of the current tracking result. However, tracking confidence is not easy to evaluate robustly, and doing so is often infeasible in complex scenarios. Other tracking methods [9], [28] attempt to strengthen the discriminative power of the model by reducing boundary effects. Unfortunately, they generally need to solve a complicated model formulation with a time-consuming optimization procedure, which may limit their use in real-time applications. More recently, a number of works [27], [29] incorporate a redetection scheme to refine unreliable tracking results. However, these methods always trust the redetection result without careful verification; once the redetection result is corrupted, they lose the chance to recover from tracking failure.
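A confidence-based learning rate of the kind mentioned above can be illustrated with a hedged sketch. The specific confidence measures and update rules of [4], [5] are not described here, so this example assumes the peak-to-sidelobe ratio (PSR), introduced with the MOSSE tracker, as the confidence score, and a hypothetical linear ramp between two thresholds (`low`, `high`) whose values are purely illustrative.

```python
import statistics

def psr(response, exclude=2):
    """Peak-to-sidelobe ratio of a 1-D correlation response.

    Samples within `exclude` positions (circularly) of the peak are left out of
    the sidelobe statistics."""
    n = len(response)
    peak = max(range(n), key=response.__getitem__)
    sidelobe = [v for i, v in enumerate(response)
                if min(abs(i - peak), n - abs(i - peak)) > exclude]
    mean = statistics.fmean(sidelobe)
    std = statistics.pstdev(sidelobe)
    return (response[peak] - mean) / (std + 1e-12)  # epsilon avoids division by zero

def adaptive_learning_rate(response, base_rate=0.02, low=4.0, high=10.0):
    """Hypothetical confidence-gated rate: no update below `low`,
    the full `base_rate` above `high`, and a linear ramp in between."""
    weight = (psr(response) - low) / (high - low)
    return base_rate * min(1.0, max(0.0, weight))

sharp = [0.05] * 20
sharp[7] = 1.0                                                # one clear peak
ambiguous = [0.5 + 0.05 * ((i * 7) % 5) for i in range(20)]   # repeated bumps, e.g. clutter
print(adaptive_learning_rate(sharp), adaptive_learning_rate(ambiguous))  # 0.02 0.0
```

The gate freezes the model when the response map is ambiguous, which is exactly where the text notes the difficulty: a single scalar score may misjudge confidence in complex scenes.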
Motivated by the work in [29], we introduce a long-term memory of target appearance to alleviate the model drift problem. Long-term memory retains more historical information about the target appearance and is thus robust to heavy occlusion. Short-term memory, in turn, is an indispensable information source for adapting to rapid appearance changes, and it cannot be replaced by long-term memory. In fact, the two memory patterns are complementary, and cooperation between them is expected to enhance both the adaptivity and the robustness of visual tracking. Fig. 1 illustrates the strengths of trackers with different memories and the effectiveness of combining short-term and long-term memory. Thanks to its short-term memory, the Staple tracker adapts well to large appearance changes in the Skating1 sequence, where the long-term tracker TLD fails. However, when the target undergoes heavy occlusion in the Jogging2 sequence, the long-term tracker TLD performs more robustly than the short-term tracker Staple. By combining short-term and long-term memory, both our tracker and the multi-store tracker (MUSTer) perform favorably compared with Staple and TLD. Like our tracker, MUSTer exploits both short-term and long-term memory to achieve better tracking performance. Despite its demonstrated success, MUSTer is computationally expensive because it performs keypoint matching and tracking and RANSAC estimation based on SIFT descriptors. Moreover, MUSTer has many parameters that must be carefully tuned, which may weaken its generalizability to new datasets.
In this study, we propose a dual-memory selection (DMS) model that alleviates the tracking drift problem by considering both the short-term memory and the long-term memory of target appearance. The dual-memory pattern provides a richer representation of target appearance and enhances both the adaptivity and the robustness of visual tracking. Specifically, the proposed DMS model consists of four components that work collaboratively to form a reliable tracking framework: a short-term tracker, a long-term tracker, a memory evaluation criterion (MEC) and a memory selector. Since long-term memory is robust to heavy occlusion and short-term memory adapts well to rapid appearance changes, we build two correlation-filter-based trackers with short-term memory and long-term memory, respectively. The short-term tracker uses a linear interpolation update model to capture the recent target appearance, whereas the long-term tracker uses a memory-improved update model to maintain a memory of the historical target appearance. Furthermore, because different memory patterns are suited to different challenging factors, a memory selector is desirable for achieving better performance across various tracking scenes; it adaptively selects a reliable memory pattern according to the current challenge. An intuitive approach to memory selection is to rely on an estimate of the current target state. However, it is difficult to distinguish drastic appearance changes from occlusions, because both usually exhibit similar appearance characteristics. To perform memory selection more reliably, we propose a novel MEC based on evaluating the reliability of the trackers with short-term memory and long-term memory. Moreover, by introducing temporal context into the reliability evaluation, we obtain a stable output with temporal continuity.
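The linear interpolation update of the short-term tracker can be written as m_t = (1 - eta) * m_{t-1} + eta * x_t, where eta is the learning rate. The sketch below (hypothetical function names, with 1-D lists standing in for filter coefficients) shows why this update acts as a short-term memory: frame k's contribution decays geometrically as eta * (1 - eta)^(t - k). It contrasts two illustrative rates, 0.2 and 0.01. The memory-improved update model of the long-term tracker is not specified in this section, so the small rate here is only a rough proxy for long-term memory, not the authors' model.

```python
def interpolate(model, sample, rate):
    """Linear interpolation update: m_t = (1 - rate) * m_{t-1} + rate * x_t."""
    return [(1.0 - rate) * m + rate * x for m, x in zip(model, sample)]

def frame_weights(num_frames, rate):
    """Weight each frame carries in the model after num_frames frames.

    Frame 0 initialises the model; frame k >= 1 enters with weight `rate`
    and is multiplied by (1 - rate) at every later update."""
    weights = [(1.0 - rate) ** (num_frames - 1)]
    weights += [rate * (1.0 - rate) ** (num_frames - 1 - k) for k in range(1, num_frames)]
    return weights

for rate in (0.2, 0.01):  # hypothetical "short-term" and "long-term" rates
    recent_share = sum(frame_weights(100, rate)[-10:])
    print(f"rate={rate}: last 10 of 100 frames hold {recent_share:.1%} of the weight")
# prints 89.3% for rate=0.2 and 9.6% for rate=0.01
```

With a large rate, the model is dominated by the last few frames and follows rapid appearance changes but also absorbs occluders quickly; with a small rate, older appearances persist, which is the behavior the long-term tracker is meant to provide.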
Finally, we conduct extensive experiments on the OTB-2013, OTB-2015, VOT2015 and VOT2016 datasets. Compared with various state-of-the-art DCF-based and deep-learning-based tracking algorithms, our tracker achieves superior performance in terms of both accuracy and speed.
The main contributions of this paper can be summarized as follows.
1. An adaptive DMS model is proposed to alleviate the problem of tracking drift. Considering that the short-term memory and long-term memory of target appearance play different roles in addressing various challenges, the DMS model uses a memory selector to adaptively select the most reliable memory pattern according to the current requirement.
2. A novel MEC is developed for memory selection by evaluating the reliability of the trackers with short-term memory and long-term memory. Moreover, introducing temporal context helps produce a more stable motion trajectory with temporal continuity.
3. Extensive experiments on four large-scale benchmarks demonstrate the competitive performance of our tracker compared with other state-of-the-art tracking algorithms.
The remainder of this paper is organized as follows. Section 2 reviews work related to ours. Section 3 describes our method in detail, including the dual-memory selection (DMS) model, the short-term tracker, the long-term tracker and the memory evaluation criterion (MEC). Section 4 presents extensive experimental results with detailed discussion. Finally, Section 5 concludes the paper.
Related works
Several surveys [25], [37] review the recent research progress in visual tracking. In this section, we discuss only the works most closely related to ours, namely, correlation tracking methods, multi-expert tracking methods and deep learning tracking methods.
Our method
In this section, we first introduce the proposed DMS model in Section 3.1, which serves as the overall framework of our method. Then, we establish the short-term tracker and the long-term tracker in Section 3.2 and Section 3.3, respectively. Finally, Section 3.4 elaborates the MEC, which considers stable credibility and discriminability measurements.
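The role of temporal context in memory selection can be sketched as follows. This section does not define the credibility and discriminability measurements, so the example simply assumes that each tracker supplies one reliability score per frame (higher is better), smooths the scores with an exponential moving average as a hypothetical stand-in for temporal context, and selects the memory pattern with the higher smoothed score. `MemorySelector`, its `smoothing` parameter and the input scores are all hypothetical; this is not the MEC itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemorySelector:
    """Hypothetical selector: smooth each tracker's per-frame reliability with an
    exponential moving average and follow the higher smoothed score."""
    smoothing: float = 0.7                 # share of the previous smoothed score kept
    short_score: Optional[float] = None
    long_score: Optional[float] = None

    def _smooth(self, previous, current):
        if previous is None:               # first frame: no temporal context yet
            return current
        return self.smoothing * previous + (1.0 - self.smoothing) * current

    def select(self, short_reliability, long_reliability):
        self.short_score = self._smooth(self.short_score, short_reliability)
        self.long_score = self._smooth(self.long_score, long_reliability)
        return "short" if self.short_score >= self.long_score else "long"

# A one-frame dip in short-term reliability (third frame), then a sustained occlusion.
frames = [(0.9, 0.5), (0.9, 0.5), (0.4, 0.6), (0.9, 0.5),
          (0.3, 0.7), (0.3, 0.7), (0.3, 0.7)]
selector = MemorySelector()
choices = [selector.select(short_rel, long_rel) for short_rel, long_rel in frames]
print(choices)  # ['short', 'short', 'short', 'short', 'short', 'long', 'long']
```

With raw per-frame scores, the one-frame dip would switch to long-term memory for a single frame; the smoothed scores keep the short-term tracker there and switch only once the drop has persisted for two frames, which is the kind of temporally continuous output the text describes.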
Experimental results and analysis
In this section, we first introduce the implementation details of our method, including the experimental environment and parameter settings. Then, we present extensive comparisons with state-of-the-art trackers on the OTB benchmark [37], [38] and the VOT benchmark [20], [21] to demonstrate the superiority of the proposed method. Finally, we give a more detailed analysis of the parameters. A brief description of all evaluated datasets can be found in Table 1.
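For readers unfamiliar with the OTB protocol, its two standard one-pass-evaluation scores can be computed as in the sketch below: precision is the fraction of frames whose predicted box centre lies within 20 pixels of the ground truth, and the success score is the area under the success plot, i.e., the mean fraction of frames whose overlap (IoU) exceeds each threshold in 0, 0.05, ..., 1. This follows our understanding of the commonly used OTB toolkit; the boxes, the function names and the exact inclusive or exclusive comparisons are assumptions, and the VOT measures are not covered.

```python
import math

def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between box centres, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= threshold for e in errors) / len(errors)

def success_auc(pred, gt, steps=21):
    """Mean success rate over overlap thresholds 0, 0.05, ..., 1."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)

# Hypothetical 4-frame example: exact, half-overlapping, lost, exact.
gt = [(0, 0, 10, 10)] * 4
pred = [(0, 0, 10, 10), (5, 0, 10, 10), (30, 30, 10, 10), (0, 0, 10, 10)]
print(precision_at(pred, gt), round(success_auc(pred, gt), 4))  # 0.75 0.5595
```

Precision rewards trackers that stay near the target even with poorly sized boxes, while the success score also penalizes scale errors, which is why OTB results are usually reported with both plots.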
Conclusion
In this paper, we consider both the short-term memory and the long-term memory of target appearance to enhance the adaptivity and robustness of visual tracking, and we propose a DMS model that selects a reliable memory pattern to handle the current tracking challenge. Specifically, we establish a memory tracker for each memory pattern based on DCFs. Furthermore, to perform a robust reliability evaluation for memory selection, an MEC is presented by considering the credibility and
CRediT authorship contribution statement
Guiji Li: Conceptualization, Methodology, Software, Validation, Writing - original draft. Manman Peng: Writing - original draft, Supervision, Project administration. Ke Nai: Validation, Formal analysis. Zhiyong Li: Investigation, Writing - review & editing. Keqin Li: Writing - review & editing.
Declaration of Competing Interest
We declare that we have no conflicts of interest with respect to this work.
Acknowledgements
This work is supported by the National Key R&D Program of China (Grants 2017YFB0202901 and 2017YFB0202905) and the National Natural Science Foundation of China (Grant No. 61906167). The corresponding author of this paper is Manman Peng.
References (50)
- Visual object tracking via enhanced structural correlation filter. Inf. Sci. (2017).
- Masked and dynamic siamese network for robust visual tracking. Inf. Sci. (2019).
- Visual tracking via context-aware local sparse appearance model. J. Visual Commun. Image Represent. (2018).
- Deep visual tracking: review and experimental comparison. Pattern Recognit. (2018).
- Augmenting cascaded correlation filters with spatial-temporal saliency for visual tracking. Inf. Sci. (2019).
- Staple: Complementary learners for real-time tracking. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016.
- Fully-convolutional siamese networks for object tracking. Computer Vision - ECCV 2016 Workshops, Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part II.
- Visual object tracking using adaptive correlation filters. The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13–18 June 2010.
- A structural coupled-layer tracking method based on correlation filters. MultiMedia Modeling - 23rd International Conference, MMM 2017, Reykjavik, Iceland, January 4–6, 2017, Proceedings, Part I.
- Attentional correlation filter network for adaptive visual tracking. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017.
- Accurate scale estimation for robust visual tracking. British Machine Vision Conference, BMVC 2014, Nottingham, UK, September 1–5, 2014.
- Convolutional features for correlation filter based visual tracking. 2015 IEEE International Conference on Computer Vision Workshop, ICCV Workshops 2015, Santiago, Chile, December 7–13, 2015.
- Learning spatially regularized correlation filters for visual tracking. 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7–13, 2015.
- Adaptive color attributes for real-time visual tracking. 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23–28, 2014.
- Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking. IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017.
- Learning background-aware correlation filters for visual tracking. IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017.
- High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell.
- Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans. Industr. Electron.
- Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans. Industr. Inform.
- Online tracking by learning discriminative saliency map with convolutional neural network. Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015.
- Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015.
- Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell.
- Residual LSTM attention network for object tracking. IEEE Signal Process. Lett.
- The visual object tracking VOT2016 challenge results. Computer Vision - ECCV 2016 Workshops, Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part II.
- The visual object tracking VOT2015 challenge results. 2015 IEEE International Conference on Computer Vision Workshop, ICCV Workshops 2015, Santiago, Chile, December 7–13, 2015.