A stable long-term object tracking method with re-detection strategy
Introduction
Object tracking is a fundamental problem in computer vision with numerous applications [28], [31], [34]. Given the initial state (e.g., location and size) of the target in the first frame of a video, a tracking algorithm automatically estimates the state of the target in subsequent frames [33].
In recent years, many classic tracking algorithms have addressed the short-term tracking problem, which covers only a few hundred frames. For example, correlation filter based algorithms such as CSK [13], KCF [14], CN [7], Staple [1], SAMF [17] and BACF [9], and deep learning based algorithms such as FCT [37], TCNN [22], MDNet [24] and GOTURN [12], are very effective for short-term tracking. Practical applications, however, require correct tracking over a longer period, typically 1 to 10 minutes; this is long-term tracking. During such sequences, object occlusion, illumination changes, lens shifts, out-of-plane rotations and object deformations make it difficult to track correctly for a long time, so long-term tracking remains a challenging task [23], [25]. Under out-of-plane rotation, the object shows multiple views whose features the tracker must learn. When the target is occluded or leaves the view for several frames, the tracker loses the appearance information of the target until it reappears. Whether a tracker can recover a target that has disappeared is therefore one of the indicators used to judge a tracking algorithm.
Among tracking algorithms, correlation filter based methods perform very well in short-term tracking thanks to their fast computation, real-time performance [8], [10] and high precision. Moreover, updating the correlation filter online, as in MOSSE [3], makes the tracker more robust. CSK [13] exploits the theory of circulant matrices so that training and detection can be computed quickly with the Fast Fourier Transform. The context-aware correlation filter algorithm [21], which adds background information to the kernel correlation filter framework, enables the correlation filter tracker to handle target rotation. However, the context-aware correlation filter tracker still cannot cope with target disappearance. When the target is occluded or deforms during long-term tracking, it is difficult for the tracker to follow it all the time. There are two ways to address these problems. The first is to extract better features that make the target easier to identify. The second is to judge whether the tracker has failed to locate the target and to add a re-detector to the tracking process: when tracking fails, re-detection is performed to recover the correct target position so that tracking can resume.
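To make the FFT-based formulation concrete, the following is a minimal sketch of a single-channel linear correlation filter in the spirit of MOSSE [3]. It is not the context-aware filter of [21] and not the authors' MATLAB code; the function names, the Gaussian label, the random 32x32 patch and the regularization value `lam` are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at the centre of the patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_filter(patch, label, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts, solved per frequency."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    # Returns the conjugate filter H* so that response = ifft2(F_new * H*).
    return (G * np.conj(F)) / (F * np.conj(F) + lam)


def detect(filter_conj, patch):
    """Correlate a new patch with the filter and return the response peak."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(row), int(col))


rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))          # stand-in for image features
filter_conj = train_filter(template, gaussian_label(template.shape))

# A cyclically shifted copy of the template plays the role of the next frame.
shifted = np.roll(template, shift=(3, -5), axis=(0, 1))
response, peak = detect(filter_conj, shifted)
displacement = (peak[0] - 16, peak[1] - 16)       # offset from the label centre
```

Because every cyclic shift of the patch is an implicit training sample, the whole solve reduces to element-wise operations on Fourier coefficients, and the displacement of the response peak from the label centre gives the estimated target motion.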
In this paper, we combine re-detection with a correlation filter to track targets that may disappear or deform heavily in long-term tracking. We design a discriminative mechanism for re-detection that uses a support vector machine (SVM) to separate the target from the background. During tracking, when the confidence of the target is higher than the threshold, the SVM parameters are updated to learn the latest appearance of the target. When the confidence is lower than the threshold, the SVM classifier is used to discriminate the target for re-detection, and the tracker is re-initialized. Only features extracted from the initial frame and from frames with a high target confidence are used to update the classifier, which aims to increase the robustness of the tracking algorithm.
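The confidence-gated update can be sketched as follows. The text does not say how the SVM is trained, so this example swaps in a plain sub-gradient step on the regularized hinge loss for a linear SVM; the function names, learning rate `lr`, regularization `reg` and the two-dimensional toy features are all assumptions.

```python
def hinge_sgd_step(w, b, x, label, lr=0.1, reg=0.01):
    """One sub-gradient step on the L2-regularized hinge loss (label is +1 or -1)."""
    margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Gradient of the regularizer shrinks the weights on every step.
    w = [wi * (1.0 - lr * reg) for wi in w]
    if margin < 1.0:
        # The sample violates the margin, so the hinge term contributes.
        w = [wi + lr * label * xi for wi, xi in zip(w, x)]
        b = b + lr * label
    return w, b


def update_if_confident(w, b, samples, confidence, threshold, lr=0.1, reg=0.01):
    """Learn from this frame only when the tracking confidence exceeds the threshold."""
    if confidence <= threshold:
        return w, b  # low confidence: keep the classifier untouched
    for x, label in samples:
        w, b = hinge_sgd_step(w, b, x, label, lr, reg)
    return w, b


# Toy frame: one target sample (+1) and one background sample (-1).
samples = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w_conf, b_conf = update_if_confident([0.0, 0.0], 0.0, samples, confidence=0.9, threshold=0.5)
w_low, b_low = update_if_confident([0.0, 0.0], 0.0, samples, confidence=0.2, threshold=0.5)
```

Skipping the update on low-confidence frames keeps occluded or drifted patches out of the training data, which is the point of the gating described above.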
To learn the tracker parameters, conventional methods often take the maximum response in the current frame as the target position and update the tracker with the features inside the target bounding box after every frame. This strategy is unreliable. When an occlusion occurs or a similar object appears, the response map may fluctuate drastically and tracking may fail. Updating the tracker with incorrect, non-target features in this situation degrades the model. To overcome this problem, we introduce re-detection into the correlation tracker and propose a strategy that decides whether the tracker and the re-detector should be updated, which prevents the tracking system from learning wrong features. The work of [20] adopted only the maximum response as the re-detection criterion, which discards the information carried by fluctuations of the response map. We therefore introduce the average peak-to-correlation energy (APCE) criterion [29] to evaluate the degree of fluctuation during tracking and to further determine whether the tracker should be updated or re-initialized by the re-detector.
This work addresses the problem of long-term visual tracking. Our contributions are threefold.
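As a small illustration of why APCE captures fluctuation that the maximum response misses, the sketch below computes APCE as it is commonly defined: the squared difference between the maximum and minimum response, divided by the mean squared difference between every response value and the minimum. The function name, the handling of a flat map and the two toy response maps are assumptions.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # flat map has no peak; returning 0 is a convention chosen here
    return (f_max - f_min) ** 2 / energy


# One sharp peak: the typical shape when the target is clearly visible.
sharp = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]

# Several equally strong peaks: the same maximum response, but an ambiguous map.
ambiguous = [[1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0]]

apce_sharp = apce(sharp)          # close to 9
apce_ambiguous = apce(ambiguous)  # close to 1.8
```

Both maps have a maximum response of 1.0, so a criterion based only on the maximum cannot tell them apart, while APCE drops sharply for the ambiguous map.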
- We propose a stable long-term tracking strategy called CA-CF-SVM, which consists of a context-aware correlation filter based tracker module and an SVM-based re-detector module. With the confidence strategy, the CF tracker provides an accurate tracking result and the SVM is updated accordingly. When the response maps fluctuate heavily, the SVM switches to work as a re-detector and the tracker is re-initialized. In this way, our strategy avoids degradation of the tracking model and keeps tracking stable over the long term, even when occlusion, deformation or a confusing similar object appears.
- To judge the tracking confidence, we use not only the maximum response but also the APCE criterion in the re-detection part. Combining the two criteria helps determine the state of the tracker accurately and in time, which improves the accuracy of the tracking system.
- We evaluate our algorithm on the large-scale OTB benchmark dataset [32] with 50 challenging tracking videos and analyze the effect of different parameter values. To demonstrate the effectiveness of our re-detection module, we combine our SVM-based re-detector with three other correlation filter based trackers: CSK [13], SAMF [17] and BACF [9]. All of the algorithms equipped with the re-detection module outperform their baselines in terms of precision, robustness and efficiency on the specific problem of long-term tracking.
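The contributions above imply that the re-detector is a separable module that can wrap different correlation filter trackers. One way to picture this is the sketch below. The interfaces `BaseTracker` and `Redetector`, the class `RedetectingTracker`, the threshold values and the rule that both criteria must pass are assumptions made for illustration, not the authors' design; the two stand-in classes exist only to exercise the control flow.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


class BaseTracker(Protocol):
    def init(self, frame, box: Box) -> None: ...
    def track(self, frame) -> Tuple[Box, float, float]: ...  # box, max response, APCE


class Redetector(Protocol):
    def update(self, frame, box: Box) -> None: ...
    def detect(self, frame) -> Box: ...


@dataclass
class RedetectingTracker:
    tracker: BaseTracker
    redetector: Redetector
    response_threshold: float  # hypothetical value
    apce_threshold: float      # hypothetical value

    def init(self, frame, box: Box) -> None:
        self.tracker.init(frame, box)
        self.redetector.update(frame, box)  # the initial frame always trains the re-detector

    def step(self, frame) -> Box:
        box, max_response, apce = self.tracker.track(frame)
        # Assumption: a frame is confident only if both criteria pass.
        if max_response >= self.response_threshold and apce >= self.apce_threshold:
            self.redetector.update(frame, box)
            return box
        box = self.redetector.detect(frame)
        self.tracker.init(frame, box)  # restart the tracker at the re-detected box
        return box


@dataclass
class ScriptedTracker:  # stand-in that replays fixed outputs
    outputs: List[Tuple[Box, float, float]]
    inits: List[Box] = field(default_factory=list)

    def init(self, frame, box):
        self.inits.append(box)

    def track(self, frame):
        return self.outputs.pop(0)


@dataclass
class FixedRedetector:  # stand-in that always returns the same box
    answer: Box
    updates: List[Box] = field(default_factory=list)

    def update(self, frame, box):
        self.updates.append(box)

    def detect(self, frame):
        return self.answer


base = ScriptedTracker(outputs=[((10, 10, 20, 20), 0.8, 30.0),
                                ((90, 90, 20, 20), 0.7, 4.0)])
svm = FixedRedetector(answer=(12, 11, 20, 20))
wrapped = RedetectingTracker(base, svm, response_threshold=0.5, apce_threshold=10.0)
wrapped.init(frame=None, box=(8, 9, 20, 20))
results = [wrapped.step(frame=None), wrapped.step(frame=None)]
```

In the second scripted frame the maximum response (0.7) still passes its threshold, but the low APCE triggers re-detection, so the drifted box is neither returned nor used to update the re-detector.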
Section snippets
Related work
Tracking algorithms are mainly divided, according to their principle, into correlation filter algorithms, deep learning algorithms and other algorithms. Deep learning based algorithms such as ADNet [35], CFNet [30] and DeepLMCF [29] are effective and promising for tracking, but the high complexity of deep features leads to very high computing costs. In contrast, the correlation filtering algorithms [11], [26], [36], [38], [39], [40] have the advantages of
Tracking components
The basic idea of our algorithm is to combine a tracker with a re-detector. We evaluate the tracking result by its confidence level. When the confidence level is higher than the threshold, the tracking module works alone. When the confidence level is low, which may be caused by occlusion, out-of-plane rotation, deformation, etc., and the tracker cannot keep following the target, we re-detect the target in the current frame and re-locate its correct position.
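The re-location step can be pictured as scoring candidate windows with the learned classifier and keeping the best one. The sketch below assumes a linear SVM decision function and a precomputed list of candidate boxes with feature vectors; how candidates are generated and which features are used are not specified in this excerpt, so those parts, along with all names and values, are hypothetical.

```python
def svm_score(w, b, features):
    """Linear SVM decision value: positive means 'target', negative 'background'."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b


def redetect(candidates, w, b):
    """Return (box, score) of the best positively classified candidate, or (None, 0.0)."""
    best_box, best_score = None, 0.0
    for box, features in candidates:
        score = svm_score(w, b, features)
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score


# Hypothetical classifier and candidates: (x, y, width, height) with 2-D features.
w, b = [1.0, -1.0], -0.2
candidates = [
    ((0, 0, 20, 20), [0.1, 0.9]),    # background-like
    ((40, 30, 20, 20), [0.9, 0.1]),  # target-like
    ((60, 30, 20, 20), [0.6, 0.3]),  # weakly target-like
]
found_box, found_score = redetect(candidates, w, b)
missing_box, _ = redetect(candidates[:1], w, b)  # only background is visible
```

Returning `None` when no candidate scores above zero lets the caller keep searching in later frames instead of restarting the tracker on background.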
Setups
The tracking code was implemented in MATLAB and run on an Intel i7 processor clocked at 2.50 GHz with 8.00 GB of memory. During tracking, the search window size is determined by the target size given in the initial frame and the ratio of the bounding box to the region of interest. In the experiments, the parameters were set as follows: the ratio of the bounding box to the region of interest is 0.5, the additional regularization factor λ2 for the context-aware module in formula (2) is 25, the
Conclusions
Long-term tracking poses many challenging problems that often cause traditional trackers to fail. We propose an algorithm that uses a support vector machine for re-detection. The support vector machine is updated by learning the target features from frames with high confidence during tracking, and the target is found again by re-detecting it in frames with lower confidence. Our algorithm is suitable for long-term tracking because it is outstanding for
References (40)
- et al. Staple: complementary learners for real-time tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- et al. Fully-convolutional Siamese networks for object tracking. Proceedings of the Lecture Notes in Computer Science (2016)
- et al. Visual object tracking using adaptive correlation filters. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
- et al. Attentional correlation filter network for adaptive visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- et al. Accurate scale estimation for robust visual tracking. Proceedings of the British Machine Vision Conference, Nottingham (2014)
- et al. Discriminative Scale Space Tracking. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- et al. Adaptive color attributes for real-time visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
- et al. Occlusion-aware real-time object tracking. IEEE Trans. Multimed. (2017)
- et al. Learning background-aware correlation filters for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- et al. Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Trans. Image Process. (2016)
- Transfer learning based visual tracking with Gaussian processes regression. Proceedings of the European Conference on Computer Vision
- Learning to track at 100 FPS with deep regression networks. Proceedings of the European Conference on Computer Vision
- Exploiting the circulant structure of tracking-by-detection with kernels. Proceedings of the European Conference on Computer Vision
- High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell.
- Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell.
- A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration. Proceedings of the European Conference on Computer Vision
- Reliable patch trackers: robust visual tracking by exploiting reliable patches. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)
- Discriminative correlation filter tracker with channel and spatial reliability. Int. J. Comput. Vis.
- Long-term correlation tracking. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)