Information Sciences

Volumes 394–395, July 2017, Pages 232-245

Visual object tracking via enhanced structural correlation filter

https://doi.org/10.1016/j.ins.2017.02.012

Abstract

In this study, we aim to build a robust correlation-based visual object tracking system. Traditional correlation filters for visual tracking search for the most likely position of the target by circularly shifting the search image patch. However, the search image patch needs to be large enough to cover both the object and the background, which makes the algorithm sensitive to changes in the background. To alleviate this problem, we first propose an efficient object-surrounding histogram model to suppress the background. In this model, we build a Bayes classifier based on the initially given object and then apply it to each pixel in subsequent frames. With this model, the original image can be enhanced in order to eliminate the impact of circular shifting. Moreover, we develop a structural correlation filter that consists of both holistic and local object parts. The multiple object parts are adaptively weighted and further aggregated to predict the relative motion from the last frame. We conduct extensive experiments on frequently used benchmarks with 51 video sequences. The experimental results show that the proposed algorithm achieves outstanding performance, especially under heavy occlusion and severe deformation.

Introduction

Visual object tracking has attracted much attention in a wide range of applications such as video surveillance, robotics, and auto-control systems. Given the initial bounding box of an object, the main task of a visual object tracker is to estimate the state of the object in subsequent frames. The main challenges arise from two perspectives. One is to learn the object representation from a single image without any other prior information [28], [32]; the other is to cope with the numerous variations of the object, e.g., scale variation, heavy occlusion, deformation, in- and out-of-plane rotation, and out-of-view [31].

In recent years, a number of algorithms have improved both the accuracy and the robustness of visual tracking [4], [11], [14], [20], [24], [27], [34], [36], [38]. In [20], Kalal et al. proposed a P-N learning algorithm in which a classifier is learned in the initial frame and updated with the tracking result. The algorithm works well in the out-of-view case, which is attributed to its re-detection component. Zhong et al. [38] built a sparsity-based collaborative model that incorporates both a discriminative classifier and a generative model. Benefiting from the local sparse representation, the tracker achieves a high score when dealing with heavy occlusion. Lim et al. [23] developed a refined particle swarm intelligence [18] method for abrupt motion tracking. Ali et al. [2] proposed a novel cellular automaton model [17] atop a set of scene-specific “floor fields” to make tracking in extremely crowded situations tractable. In [14], Hare et al. employed a structured output support vector machine (SVM) to directly predict the changes in the object’s position, which has shown remarkable performance, especially in situations involving abrupt motion. However, each of the above-mentioned algorithms is limited to handling specific challenges [30]. Recently, correlation filter-based trackers have become the most popular algorithms in the visual object tracking community [5], [9], [10], [15], [16], [35]. The correlation filter is sufficiently powerful to discriminate the object from the background because it is trained with a large number of samples. Further, the computation can be made more efficient by transforming it to the Fourier domain. However, correlation filter-based trackers still have disadvantages that need to be addressed.

In this paper, we address two issues related to correlation filter-based trackers. 1) In these algorithms, the kernel ridge regression is trained using thousands of samples generated by circularly shifting the image patch, and the image patch needs to be sufficiently large to cover both the object and the background context, as shown in Fig. 1. As a result, the positive samples contain unnecessary background context information, which promotes drifting, especially when the object is occluded. 2) The samples used to predict the object position in the next frame are also generated by circular shifting. A significantly changed background context may cancel out the contribution of the object when predicting its new state, which significantly weakens the robustness of the tracker to background clutter and appearance variations. A cosine mask window, as used in [16], may alleviate the issue, but it is not enough to handle more challenging tracking tasks.
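To make the role of circular shifting concrete, the following is a minimal sketch of a correlation filter trained and applied in the Fourier domain. It is not the authors' implementation: it uses a single-channel linear ridge regression in place of the kernel ridge regression mentioned above, and the function names, the label width `sigma`, and the regularizer `lam` are illustrative assumptions.

```python
import numpy as np

def gaussian_labels(shape, sigma=2.0):
    """Gaussian-shaped regression target whose peak sits at index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    # Move the peak from the centre to the origin so it encodes a zero shift.
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def train_filter(patch, labels, lam=1e-2):
    """Closed-form linear ridge regression over all circular shifts of patch.

    lam is an assumed regularization weight, not a value from the paper.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(labels)
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def detect(filt, patch):
    """Return the response map and the (dy, dx) shift of its maximum."""
    response = np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Indices past the midpoint wrap around to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return response, (int(dy), int(dx))
```

Because every circular shift of the whole patch acts as a training sample, any background pixels inside the patch are shifted along with the object, which is the source of the two issues described above.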

To overcome these two issues, we propose to improve the correlation-based tracker from the following two perspectives. First, we construct a structural correlation filter in which the correlation filter is applied to both the holistic and local object image patches. The holistic and local correlation filters are adaptively weighted based on the confidence with which each can predict the location. The final prediction depends on the weighted mean response of the correlation filters. When the object is partially occluded, the non-occluded local parts play an important role in predicting the final location. In this way, the spatial structural correlation filter predicts the object position more accurately, especially in situations of occlusion.
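The weighted mean of the holistic and local responses can be sketched as follows. The confidence measure used here (the peak value of each response map) is an assumption made for illustration, since this excerpt does not specify how the confidence is computed. The sketch also assumes all response maps have already been aligned to a common grid; `fuse_responses` is a hypothetical helper name.

```python
import numpy as np

def fuse_responses(responses):
    """Weighted mean of response maps; weights come from each map's peak.

    Using the peak value as confidence is an assumption, not the paper's rule.
    """
    stack = np.stack(responses)                  # shape (n_filters, H, W)
    peaks = stack.reshape(len(responses), -1).max(axis=1)
    peaks = np.clip(peaks, 0.0, None)            # ignore negative peaks
    weights = peaks / (peaks.sum() + 1e-12)      # normalise to sum to 1
    fused = np.tensordot(weights, stack, axes=1) # weighted sum over filters
    return fused, weights
```

With this kind of weighting, a part whose response collapses under occlusion contributes little to the fused map, so the non-occluded parts dominate the location estimate.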

Second, we use a novel object-surrounding histogram model to enhance the object while suppressing the background context. In our model, we apply a color histogram-based Bayes classifier to each pixel of the search image patch. Pixels with a higher probability of belonging to the object are enhanced, and pixels with a lower probability are suppressed. In the object-surrounding histogram model, we consider four background contexts in different directions (left, top, right, and bottom). With the object-surrounding histogram model, the regression model becomes less sensitive to the background context. It also produces a clear boundary between the object and the background, which is very helpful when representing the object with features such as histograms of oriented gradients (HOG) [8]. Furthermore, the color histogram model provides a generative way to evaluate the tracking result, which is important when adaptively updating the correlation filters.
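A per-pixel histogram-based Bayes classifier of this kind can be sketched as below. This is a simplified illustration under stated assumptions: it works on a grayscale image, pools the entire surrounding region into one background histogram rather than modeling the four directional contexts separately, and enhances the image by multiplying each pixel by its object probability. The function names and the bin count are hypothetical.

```python
import numpy as np

def object_probability(image, box, n_bins=16):
    """Per-pixel posterior of belonging to the object, from two histograms.

    image: 2-D uint8 array; box: (top, left, height, width) of the object.
    Assumption: one pooled background region instead of four directions.
    """
    top, left, h, w = box
    bins = (image.astype(np.int64) * n_bins) // 256   # bin index per pixel
    inside = np.zeros(image.shape, dtype=bool)
    inside[top:top + h, left:left + w] = True
    hist_obj = np.bincount(bins[inside], minlength=n_bins).astype(float)
    hist_bg = np.bincount(bins[~inside], minlength=n_bins).astype(float)
    # Bayes rule with count-based priors; bins never observed fall back to 0.5.
    total = hist_obj + hist_bg
    posterior = np.where(total > 0, hist_obj / np.maximum(total, 1.0), 0.5)
    return posterior[bins]

def enhance(image, prob):
    """Scale each pixel by its object probability (an assumed enhancement rule)."""
    return image.astype(float) * prob
```

In a tracker, the histograms would be built once from the initial frame and the resulting lookup table applied to each pixel of later search patches; the sketch builds and applies them on the same image only to stay short.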

The flowchart of our enhanced structural correlation tracker is shown in Fig. 2. As shown there, the background in the enhanced image is evidently suppressed. The enhanced image is further decomposed into one holistic and four local image patches. We apply a correlation filter to each of the image patches. Then, we adaptively weight the responses of the five correlation filters to obtain the final response. The location of the object is predicted from the index of the maximum value in the final response. Another problem for correlation filter-based tracking is scale estimation. Unlike the traditional approach, in which densely scaled object samples are tested, we use a random scaling method to generate candidate search images. The random scaling method is reasonable because, in most cases, the scale variation between adjacent frames is small and smooth. With fewer search proposals, the random scaling method may be much more efficient while achieving competitive performance.
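One way to generate a small set of random scale proposals is sketched below. The sampling distribution (Gaussian around 1), the spread `sigma`, the clipping range, and the choice to always include the unchanged scale are all assumptions made for illustration; this excerpt does not state how the scales are drawn.

```python
import numpy as np

def random_scales(n_proposals, sigma=0.02, rng=None):
    """Draw scale factors near 1.0; the first proposal keeps the last scale.

    sigma and the clipping range [0.9, 1.1] are assumed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    scales = 1.0 + sigma * rng.standard_normal(n_proposals - 1)
    return np.concatenate(([1.0], np.clip(scales, 0.9, 1.1)))
```

Each proposal would then be resized to the filter's input size and evaluated, and the scale whose fused response has the highest peak would be kept.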

Related work

Recently, in the field of visual tracking, there has been much attention on correlation filter-based tracking algorithms owing to the high efficiency of correlation operation when transformed into the Fourier domain. When given an initial object window, a large number of samples with Gaussian-shaped labels are selected to train a correlation filter. Then, the position of the object in the next frame can be predicted by correlating the filter over a search image patch. In [5], Bolme et al. built

Enhanced structural correlation tracking

In this section, we describe the enhanced structural correlation tracking algorithm in detail. The algorithm can be decomposed into four closely associated parts, as presented in the four subsections below. To predict the state of the object in a new frame, we first generate object proposals using the random scaling method. Then, we preprocess each proposal using the object-surrounding histogram model. Finally, we apply the structural correlation filter to predict the position of the object

Experiment setup

To evaluate the effectiveness of the proposed algorithm, we compared it with other state-of-the-art algorithms on a large visual tracking benchmark [30] that contains 51 videos with various challenges.

In a typical correlation-based tracking algorithm, the search window used to train the kernelized ridge regression model needs to be larger than the given object patch. In our implementation, the sizes of the search windows of the holistic layer and the four local parts are set to 2.5 and 1.7 times that

Conclusion

In this paper, we focused on an inherent disadvantage of traditional correlation filter-based tracking algorithms, which is induced by the circular shifting mechanism. We proposed a robust structural correlation filter based on holistic and local parts for handling heavy occlusion and severe deformation. Furthermore, we proposed a novel object-surrounding histogram model to enhance the object while suppressing the background. This significantly improves the performance

Acknowledgment

We would like to thank the reviewers for their time and valuable comments. This work is supported by the National Natural Science Foundation of China (Grant 61371140) and in part by Grants 2015CFA062 and 2015BAA133.

References (38)

  • L. Cehovin et al. Robust visual tracking using an adaptive coupled-layer visual model. IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • P. Chockalingam et al. Adaptive fragments-based tracking of non-rigid objects using level sets. Proceedings of IEEE International Conference on Computer Vision (2009)
  • N. Dalal et al. Histograms of oriented gradients for human detection. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2005)
  • M. Danelljan et al. Accurate scale estimation for robust visual tracking. Proceedings of British Machine Vision Conference (2014)
  • M. Danelljan et al. Adaptive color attributes for real-time visual tracking. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2014)
  • T.B. Dinh et al. Context tracker: exploring supporters and distracters in unconstrained environments. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2011)
  • D. Du et al. Online deformable object tracking based on structure-aware hyper-graph. IEEE Trans. Image Process. (2016)
  • P. Felzenszwalb et al. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • S. Hare et al. Struck: structured output tracking with kernels. Proceedings of IEEE International Conference on Computer Vision (2011)