Elsevier

Pattern Recognition Letters

Volume 127, 1 November 2019, Pages 119-127

A stable long-term object tracking method with re-detection strategy

https://doi.org/10.1016/j.patrec.2018.09.017

Highlights

  • We introduce a long-term tracking strategy composed of a CA-CF tracker and an SVM-based re-detector.

  • We propose adopting both the maximum response and the APCE as criteria for promptly judging the confidence of the tracker.

  • We analyze the parameters of our method and compare performance across different parameter values.

  • Our algorithm achieves stable and accurate performance in long-term tracking.

Abstract

In this work, we propose a long-term tracking strategy to deal with occlusion, out-of-plane rotation, and confusing non-target objects. Our tracking system is composed of two parts: the CA-CF tracker, an efficient correlation method for short-term tracking, and the SVM-based re-detector, which prevents the CA-CF tracker from degrading. When the tracker works with high confidence, the CA-CF module ensures an accurate tracking result and the SVM is updated accordingly. When the response maps fluctuate heavily, the SVM switches to work as a re-detector and the tracker is re-initialized. We also propose adopting both the maximum response criterion and the APCE criterion to judge the performance of the tracker promptly. By evaluating our algorithm on the OTB benchmark datasets, we analyze how the results are affected by the parameters of our CA-CF-SVM strategy. The experimental results show that our method achieves a significant improvement over state-of-the-art methods for long-term tracking in both accuracy and robustness.

Introduction

Object tracking is a comprehensive and fundamental problem in computer vision with numerous applications [28], [31], [34]. Given the initial state (e.g., location and size) of the target in the first frame of a video, a tracking algorithm automatically estimates the state of the target in subsequent frames [33].

In recent years, many classic tracking algorithms have targeted the short-term tracking problem, which concerns tracking over a few hundred frames. For example, correlation filter based algorithms such as CSK [13], KCF [14], CN [7], Staple [1], SAMF [17], and BACF [9], and deep learning based algorithms such as FCT [37], TCNN [22], MDNet [24], and GOTURN [12] are very effective for short-term tracking. In practical applications, however, a tracker must track correctly over a much longer period, typically 1 to 10 minutes; this is long-term tracking. During tracking, problems such as object occlusion, illumination changes, lens shifts, out-of-plane rotation, and object deformation are encountered, which make it difficult to track correctly for a long time. Long-term tracking therefore remains a challenging task [23], [25]. Under out-of-plane rotation, the object shows features from multiple viewpoints that the tracker needs to learn. When the target is occluded or leaves the view for several frames, the tracker may lose the appearance information of the target until it reappears. Whether a disappeared target can be successfully tracked again is thus one of the indicators used to judge a tracking algorithm.

Among tracking algorithms, correlation filter based methods perform very efficiently in short-term tracking owing to their fast computation, outstanding real-time performance [8], [10], and high precision. Moreover, updating the correlation filter online, as in MOSSE [3], makes the tracker more robust. CSK [13] exploited the structure of circulant matrices for fast detection and performed its computations with the Fast Fourier Transform. The context-aware correlation filter algorithm [21], which adds background information to the kernel correlation filter framework, enables the correlation filter tracker to handle target rotation. However, the context-aware correlation filter tracker still cannot cope with target disappearance. When the target is occluded or deforms during long-term tracking, it is difficult for the tracker to keep tracking it successfully. There are two ways to address these problems in long-term tracking. The first is to extract better features so that the target is easier to identify. The second is to judge whether the tracker has failed to locate the target and to add a re-detector to the tracking process: when tracking fails, re-detection is performed to recover the correct target position so that tracking can succeed again.
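To make the Fourier-domain idea concrete, here is a minimal single-channel sketch of a linear context-aware correlation filter. It assumes the closed form commonly given for the context-aware framework [21], ŵ = conj(â0)⊙ŷ / (conj(â0)⊙â0 + λ1 + λ2·Σi conj(âi)⊙âi), where a0 is the target patch, ai are surrounding context patches, and y is a Gaussian label. Raw pixels stand in for features, there is no cosine window and no kernel, and the function names are hypothetical; λ2 defaults to 25, the value reported in the setup.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response y: a 2-D Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_ca_filter(target, contexts, y, lam1=1e-4, lam2=25.0):
    """Closed-form linear context-aware filter, computed element-wise in the Fourier domain.

    target:   2-D patch centred on the object (a_0)
    contexts: list of 2-D background patches of the same size (a_1..a_k)
    y:        desired response map
    """
    a0 = np.fft.fft2(target)
    denom = np.conj(a0) * a0 + lam1
    for patch in contexts:
        ai = np.fft.fft2(patch)
        # Context energy in the denominator suppresses the filter's response on background.
        denom = denom + lam2 * np.conj(ai) * ai
    return np.conj(a0) * np.fft.fft2(y) / denom


def detect(w_hat, search_patch):
    """Response map for a new patch; its argmax is the predicted target location."""
    return np.real(np.fft.ifft2(np.fft.fft2(search_patch) * w_hat))


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
contexts = [rng.standard_normal((32, 32)) for _ in range(4)]
w_hat = train_ca_filter(x, contexts, gaussian_label(x.shape))
r = detect(w_hat, np.roll(x, (3, -2), axis=(0, 1)))  # target moved 3 rows down, 2 columns left
peak = tuple(int(v) for v in np.unravel_index(np.argmax(r), r.shape))
```

Because training and detection are element-wise products of FFTs, a cyclic shift of the target shifts the response peak by the same amount, which is what makes this family of trackers fast.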

In this paper, we combine re-detection with a correlation filter to track targets that may disappear or deform heavily during long-term tracking. We design a discriminative re-detection mechanism that uses a support vector machine (SVM) to separate the tracking target from the background. During tracking, when the target confidence is higher than the threshold, the SVM parameters are updated to learn the latest appearance of the target. When the target confidence is lower than the threshold, the SVM classifier is used to discriminate the target for re-detection and the tracker is re-initialized. In our strategy, only features extracted from the initial frame and from frames with high target confidence are used to update the classifier, which aims to increase the robustness of the tracking algorithm.
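The re-detector described above keeps a sample pool built only from the initial frame and high-confidence frames, and scores candidate windows when tracking is lost. The sketch below illustrates that pattern with a linear SVM. The SVM solver is not specified in this excerpt, so a small Pegasos-style subgradient solver (hinge loss, bias folded in as a constant feature) is swapped in to keep the example NumPy-only; the class and method names are hypothetical and feature extraction is omitted.

```python
import numpy as np


def _augment(X):
    """Append a constant 1 so the bias is learned as an extra weight."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.hstack([X, np.ones((X.shape[0], 1))])


class SVMRedetector:
    """Hypothetical linear-SVM re-detector trained with Pegasos-style SGD."""

    def __init__(self, dim, lam=0.1, epochs=20, seed=0):
        self.w = np.zeros(dim + 1)
        self.lam = lam
        self.epochs = epochs
        self.rng = np.random.default_rng(seed)
        self.X, self.y = [], []  # pool: initial frame + confident frames only

    def add_samples(self, positives, negatives):
        """Add target (+1) and background (-1) feature vectors to the pool."""
        for f in positives:
            self.X.append(f)
            self.y.append(1.0)
        for f in negatives:
            self.X.append(f)
            self.y.append(-1.0)

    def fit(self):
        """Minimise lam/2 * ||w||^2 + mean hinge loss over the pool."""
        X, y = _augment(self.X), np.asarray(self.y)
        t = 0
        for _ in range(self.epochs):
            for i in self.rng.permutation(len(y)):
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = y[i] * (X[i] @ self.w) < 1.0  # check margin before the update
                self.w *= 1.0 - eta * self.lam
                if violated:
                    self.w += eta * y[i] * X[i]
        return self

    def redetect(self, candidates):
        """Return the index and SVM score of the best candidate window."""
        scores = _augment(candidates) @ self.w
        best = int(np.argmax(scores))
        return best, float(scores[best])


rng = np.random.default_rng(1)
det = SVMRedetector(dim=5)
det.add_samples(rng.normal(2.0, 1.0, (20, 5)), rng.normal(-2.0, 1.0, (20, 5)))
det.fit()
idx, score = det.redetect(np.array([[-2.0] * 5, [2.0] * 5, [-1.5] * 5]))
```

Restricting the pool to confident frames is the design choice that matters here: a window that is partly occluded never enters the pool as a positive, so the classifier's notion of the target does not drift.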

To learn the tracker parameters, conventional methods often take the location of the maximum response in the current frame as the target position and update the tracker with the features inside the target bounding box in every frame. However, this update strategy is unreasonable. When an occlusion occurs or a similar object appears, the resulting response maps may fluctuate drastically and tracking may fail. In this case, updating the tracker parameters with incorrect, non-target features leads to model degradation. To overcome this problem, we introduce re-detection into the correlation tracker and propose a strategy that determines whether the tracker and the re-detector should be updated, preventing the tracking system from learning wrong features. The work of [20] adopted only the maximum response as the re-detection criterion, which ignores how much the response map fluctuates. Therefore, we introduce the APCE criterion [29] to evaluate the degree of fluctuation during tracking and further determine whether the tracker should be updated or re-initialized by the re-detector.
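APCE (average peak-to-correlation energy) measures how peaked a response map is. The sketch below assumes the definition as used in [29], APCE = |F_max − F_min|² / mean_{w,h}((F_{w,h} − F_min)²), where F is the response map. A single sharp peak gives a large value, while multiple peaks or a noisy map give a smaller one. The guard for a completely flat map is our own addition.

```python
import numpy as np


def apce(response):
    """Average peak-to-correlation energy of a 2-D response map."""
    f_max, f_min = float(response.max()), float(response.min())
    energy = float(np.mean((response - f_min) ** 2))
    if energy == 0.0:  # flat map: no peak at all, treat as zero confidence
        return 0.0
    return (f_max - f_min) ** 2 / energy


sharp = np.zeros((10, 10))
sharp[5, 5] = 1.0            # one clean peak
double = sharp.copy()
double[1, 1] = 1.0           # a distractor of equal height
```

With these two maps, `apce(sharp)` is about 100 and `apce(double)` about 50, even though both have the same maximum response of 1.0. This is the information that a maximum-response-only criterion cannot see.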

This work addresses the problem of long-term visual tracking. Our contributions are threefold.

  • We propose a stable long-term tracking strategy called CA-CF-SVM, which is composed of a context-aware correlation filter based tracker module and an SVM-based re-detector module. With the confidence strategy in place, the CF tracker ensures an accurate tracking result and the SVM is updated accordingly. When the response maps fluctuate heavily, the SVM switches to work as a re-detector and the tracker is re-initialized. In this way, our strategy can avoid degradation of the tracking model and keep tracking stable over the long term even when occlusion, deformation, or a confusing similar object appears.

  • To judge the tracking confidence, we not only use the maximum response but also introduce the APCE criterion into the re-detection module. Combining the two criteria helps determine the state of the tracker promptly and accurately, improving the accuracy of the tracking system.

  • By evaluating our algorithm on the large-scale OTB benchmark dataset [32] with 50 challenging tracking videos, we analyze the effect of different parameter values in our algorithm. To demonstrate the effectiveness of our re-detection module, we combine our SVM-based re-detector with three other correlation filter based trackers: CSK [13], SAMF [17], and BACF [9]. All algorithms with the re-detection module outperform their baseline methods in terms of precision, robustness, and efficiency for the specific problem of long-term tracking.


Related work

Tracking algorithms are mainly divided, by principle, into correlation filter algorithms, deep learning algorithms, and other algorithms. Deep learning based algorithms such as ADNet [35], CFNet [30], and DeepLMCF [29] are effective and promising for tracking, but the high complexity of deep features leads to very large computational costs. By contrast, correlation filtering algorithms [11], [26], [36], [38], [39], [40] have the advantages of

Tracking components

The basic idea of our algorithm is to combine a tracker with a re-detector. We evaluate tracking quality by its confidence level. When the confidence level is higher than the threshold, the tracking module works alone. When the confidence level is low, which may be caused by occlusion, out-of-plane rotation, deformation, etc., and the tracker cannot continue tracking the target, we re-detect the target in the current frame and re-locate its correct position.
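The excerpt does not give the exact thresholds, so the following is only one plausible way to combine the two criteria: a frame counts as confident when both its maximum response and its APCE stay above a fixed fraction of their running means over previous confident frames. The class name and the ratios 0.7 and 0.45 are hypothetical placeholders, not values from this paper.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceGate:
    """Hypothetical combined max-response / APCE confidence test."""
    ratio_max: float = 0.7    # hypothetical: F_max must reach 70% of its history mean
    ratio_apce: float = 0.45  # hypothetical: APCE must reach 45% of its history mean
    max_hist: list = field(default_factory=list)
    apce_hist: list = field(default_factory=list)

    def is_confident(self, f_max: float, apce_value: float) -> bool:
        if not self.max_hist:  # first frame: trust the given initial box
            confident = True
        else:
            mean_max = sum(self.max_hist) / len(self.max_hist)
            mean_apce = sum(self.apce_hist) / len(self.apce_hist)
            confident = (f_max >= self.ratio_max * mean_max
                         and apce_value >= self.ratio_apce * mean_apce)
        if confident:  # only confident frames shape the reference statistics
            self.max_hist.append(f_max)
            self.apce_hist.append(apce_value)
        return confident


gate = ConfidenceGate()
decisions = [gate.is_confident(m, a) for m, a in [(1.0, 100.0), (0.9, 90.0), (0.3, 10.0), (0.9, 20.0)]]
# In a tracking loop: True -> update the CF tracker and the SVM pool;
# False -> run the SVM re-detector and re-initialize the tracker at its best window.
```

The last frame shows why both tests are needed: its maximum response looks healthy, but a collapsed APCE signals a fluctuating, multi-peaked map.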

Setups

The tracking code was implemented in MATLAB and run on an Intel i7 processor clocked at 2.50 GHz with 8.00 GB of memory. During tracking, the search window size is determined by the target size given in the initial frame and the ratio of the bounding box to the region of interest. In the experiments, the parameters were set as follows: the ratio of the bounding box to the region of interest is 0.5, the additional regularization factor λ2 for the context-aware module in formula (2) is 25, the

Conclusions

Long-term tracking poses many challenging problems that often cause traditional trackers to fail. We propose an algorithm that uses a support vector machine for re-detection. The support vector machine is updated by learning target features from frames with high confidence during tracking, and by re-detecting the target in frames with lower confidence, the target can be found again. Our algorithm is suitable for long-term tracking because it is outstanding for

References (40)

  • L. Bertinetto et al.

    Staple: complementary learners for real-time tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2016)
  • L. Bertinetto et al.

    Fully-convolutional Siamese networks for object tracking

    Proceedings of the Lecture Notes in Computer Science

    (2016)
  • D.S. Bolme et al.

    Visual object tracking using adaptive correlation filters

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2010)
  • J. Choi et al.

    Attentional correlation filter network for adaptive visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • M. Danelljan et al.

    Accurate scale estimation for robust visual tracking

    Proceedings of the British Machine Vision Conference, Nottingham

    (2014)
  • M. Danelljan et al.

    Discriminative scale space tracking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)
  • M. Danelljan et al.

    Adaptive color attributes for real-time visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2014)
  • X. Dong et al.

    Occlusion-aware real-time object tracking

    IEEE Trans. Multimed.

    (2017)
  • H.K. Galoogahi et al.

    Learning background-aware correlation filters for visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • J. Shen et al.

    Real-time superpixel segmentation by DBSCAN clustering algorithm

    IEEE Trans. Image Process.

    (2016)
  • J. Gao et al.

    Transfer learning based visual tracking with Gaussian processes regression

    Proceedings of the European Conference on Computer Vision

    (2014)
  • D. Held et al.

    Learning to track at 100 FPS with deep regression networks

    Proceedings of the European Conference on Computer Vision

    (2016)
  • J. Henriques et al.

    Exploiting the circulant structure of tracking-by-detection with kernels

    Proceedings of the European Conference on Computer Vision

    (2012)
  • J. Henriques et al.

    High-speed tracking with kernelized correlation filters

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • T. Joachims

    Making large-scale SVM learning practical

    Technical Report, SFB 475:...
  • Z. Kalal et al.

    Tracking-learning-detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • Y. Li et al.

    A scale adaptive kernel correlation filter tracker with feature integration

    Proceedings of the European Conference on Computer Vision

    (2014)
  • Y. Li et al.

    Reliable patch trackers: robust visual tracking by exploiting reliable patches

    Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)

    (2015)
  • A. Lukežič et al.

    Discriminative correlation filter tracker with channel and spatial reliability

    Int. J. Comput. Vis.

    (2017)
  • C. Ma et al.

    Long-term correlation tracking

    Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)

    (2015)