Visual object tracking via enhanced structural correlation filter
Introduction
Visual object tracking has attracted much attention in a wide range of applications such as video surveillance, robotics, and automatic control systems. Given the initial bounding box of an object, the main task of a visual object tracker is to estimate the state of the object in subsequent frames. The main challenges are twofold. One is to learn the object representation from a single image without any other prior information [28], [32]; the other is to cope with the numerous variations of the object, e.g., scale variation, heavy occlusion, deformation, in-plane and out-of-plane rotation, and out-of-view motion [31].
In recent years, a number of visual tracking algorithms have improved both the accuracy and the robustness of visual tracking [4], [11], [14], [20], [24], [27], [34], [36], [38]. In [20], Kalal et al. proposed a P-N learning algorithm in which a classifier is learned in the initial frame and updated with the tracking result. The algorithm works well in the out-of-view case owing to its re-detection component. Zhong et al. [38] built a sparsity-based collaborative model that incorporates both a discriminative classifier and a generative model. Benefiting from its local sparse representation, the tracker performs well under heavy occlusion. Lim et al. [23] developed a refined particle swarm intelligence [18] method for abrupt motion tracking. Ali et al. [2] proposed a novel cellular automaton model [17] atop a set of scene-specific “floor fields” to make tracking in extremely crowded situations tractable. In [14], Hare et al. employed a structured output support vector machine (SVM) to directly predict the change in the object’s position, which has shown remarkable performance, especially in situations involving abrupt motion. However, each of the above-mentioned algorithms handles only specific aspects of the various challenges [30]. Recently, correlation filter-based trackers have become the most popular algorithms in the visual object tracking community [5], [9], [10], [15], [16], [35]. The correlation filter is sufficiently powerful to discriminate the object from the background because it is trained with a large number of samples. Further, its computational efficiency is increased by transforming to the Fourier domain. However, correlation filter-based trackers still have disadvantages that need to be addressed.
In this paper, we address two issues related to correlation filter-based trackers. 1) As currently employed in these algorithms, the kernel ridge regression is trained using thousands of samples generated by circularly shifting the image patch, and the image patch needs to be sufficiently large to cover both the object and the background context, as shown in Fig. 1. As a result, the positive samples contain unnecessary background context information. This aggravates drift, especially when the object is occluded. 2) The samples used to predict the object position in the next frame are also generated by circular shifting. A significantly changed background context may cancel out the contribution of the object when predicting the new state of the object. This significantly weakens the robustness of the tracker to background clutter and appearance variations. A cosine mask window, as used in [16], may alleviate the issue, but it is not sufficient for more challenging tracking tasks.
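To make the circular-shift training concrete, the following is a minimal sketch rather than the formulation used in this paper: it assumes a single-channel patch, a linear (not kernelized) ridge regression, and a Gaussian-shaped regression target. The names `gaussian_labels`, `train_filter`, `locate`, and the regularizer `lam` are hypothetical. Because every training sample is a circular shift of the whole patch, the background pixels in the patch are carried into every sample, which is the first issue described above.

```python
import numpy as np

def gaussian_labels(shape, sigma=2.0):
    """Gaussian-shaped regression target whose peak sits at index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    # Move the peak from the centre to (0, 0) so that "no shift" maps to the peak.
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def train_filter(patch, labels, lam=1e-2):
    """Closed-form ridge regression over all circular shifts of `patch`.

    Circular shifts are handled implicitly: the solution is an
    element-wise division in the Fourier domain.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(labels)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def locate(filt, search_patch):
    """Correlate the filter with a search patch and return the peak index."""
    response = np.real(np.fft.ifft2(np.fft.fft2(search_patch) * filt))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)
```

In this sketch the peak index returned by `locate` is the circular displacement of the search patch relative to the training patch.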
To overcome these two issues, we propose to improve the correlation filter-based tracker from the following two perspectives. First, we construct a structural correlation filter in which the correlation filter is applied to both the holistic and local object image patches. The holistic and local correlation filters are adaptively weighted based on the confidence with which each can predict the location. The final prediction depends on the weighted mean response of the correlation filters. When the object is partially occluded, the non-occluded local parts play an important role in predicting the final location. In this way, the spatial structural correlation filter predicts the object position more accurately, especially in situations of occlusion.
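The weighted mean response can be sketched as follows. The confidence measure is not specified in this part of the text, so the sketch assumes a peak-to-sidelobe ratio as a stand-in; it also assumes that the holistic and local response maps have already been aligned to a common grid. The function names are hypothetical.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=2):
    """Assumed confidence score: peak height relative to the rest of the map."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    # Exclude a small window around the peak from the sidelobe statistics.
    mask[max(r - exclude, 0):r + exclude + 1,
         max(c - exclude, 0):c + exclude + 1] = False
    sidelobe = response[mask]
    return (response[r, c] - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def fuse_responses(responses):
    """Weight each response map by its confidence and return the weighted mean."""
    conf = np.array([max(peak_to_sidelobe_ratio(r), 0.0) for r in responses])
    weights = conf / (conf.sum() + 1e-8)
    fused = sum(w * r for w, r in zip(weights, responses))
    return fused, weights
```

Under this assumption, a part whose response is flat or noisy (for example, an occluded part) receives a small weight and contributes little to the fused peak.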
Second, we use a novel object-surrounding histogram model to enhance the object while suppressing the background context. In our model, we apply a color histogram-based Bayes classifier to each pixel of the search image patch. Pixels with a higher probability of belonging to the object are enhanced, and pixels with a lower probability are suppressed. In the object-surrounding histogram model, we consider four background contexts in different directions (left, top, right, and bottom). With the object-surrounding histogram model, the regression model is less sensitive to the background context. It also produces a clear boundary between the object and the background, which is very helpful when representing the object with features such as histograms of oriented gradients (HOG) [8]. Furthermore, the color histogram model provides a generative way to evaluate the tracking result, which is important when adaptively updating the correlation filters.
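A per-pixel color-histogram Bayes classifier of this kind might look like the sketch below. It is a simplification under stated assumptions: all pixels outside the object mask are pooled into one background histogram (the four directional contexts are not modelled), the two classes are given equal priors, and the image is 8-bit RGB. The names `quantize`, `object_probability`, and `enhance` are hypothetical.

```python
import numpy as np

def quantize(image, bins=16):
    """Map each 8-bit RGB pixel to a single colour-bin index."""
    q = (image.astype(np.int64) * bins) // 256
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def object_probability(image, obj_mask, bins=16):
    """Posterior probability that each pixel belongs to the object."""
    idx = quantize(image, bins)
    n_bins = bins ** 3
    h_obj = np.bincount(idx[obj_mask], minlength=n_bins).astype(float)
    h_bg = np.bincount(idx[~obj_mask], minlength=n_bins).astype(float)
    # Normalise the histograms to likelihoods (equal priors are an assumption).
    p_obj = h_obj / max(h_obj.sum(), 1.0)
    p_bg = h_bg / max(h_bg.sum(), 1.0)
    posterior = p_obj / (p_obj + p_bg + 1e-12)
    return posterior[idx]

def enhance(image, prob):
    """Scale each pixel by its object probability to suppress the background."""
    return image.astype(float) * prob[..., None]
```

Multiplying by the probability map is only one possible way to enhance the object; the text above does not state the exact operation.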
The flowchart of our enhanced structural correlation tracker is shown in Fig. 2. As the figure shows, the background in the enhanced image is evidently suppressed. The enhanced image is further decomposed into one holistic and four local image patches. We apply a correlation filter to each of the image patches. Then, we adaptively weight the responses of the five correlation filters to obtain the final response. The location of the object is predicted from the index of the maximum value in the final response. Another problem for correlation filter-based tracking is scale estimation. Unlike the traditional approach, in which densely scaled object samples are tested, we use a random scaling method to generate candidate search images. The random scaling method is reasonable because, in most cases, the scale variation between adjacent frames is small and smooth. With fewer search proposals, the random scaling method can be much more efficient while achieving competitive performance.
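The scale and location selection described above can be sketched as follows. The sampling distribution for the random scales is not given here, so the sketch assumes log-normal factors concentrated around 1.0 and always keeps the unscaled candidate; `random_scales`, `select_state`, and `sigma` are hypothetical names.

```python
import numpy as np

def random_scales(n, sigma=0.02, seed=None):
    """Draw n candidate scale factors near 1.0; the first is always 1.0."""
    rng = np.random.default_rng(seed)
    # Log-normal sampling keeps the factors positive and close to 1.0.
    scales = np.exp(rng.normal(0.0, sigma, size=n - 1))
    return np.concatenate(([1.0], scales))

def select_state(scales, responses):
    """Pick the scale whose response map has the highest peak.

    Returns the chosen scale and the (row, col) index of that peak.
    """
    peaks = [r.max() for r in responses]
    k = int(np.argmax(peaks))
    row, col = np.unravel_index(np.argmax(responses[k]), responses[k].shape)
    return float(scales[k]), (int(row), int(col))
```

Here `responses[k]` stands for the final response obtained from the candidate resized by `scales[k]`; a handful of random candidates replaces the dense scale pyramid of the traditional approach.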
Related work
Recently, correlation filter-based tracking algorithms have attracted much attention in the field of visual tracking owing to the high efficiency of the correlation operation when computed in the Fourier domain. Given an initial object window, a large number of samples with Gaussian-shaped labels are selected to train a correlation filter. Then, the position of the object in the next frame can be predicted by correlating the filter over a search image patch. In [5], Bolme et al. built
Enhanced structural correlation tracking
In this section, we describe the enhanced structural correlation tracking algorithm in detail. The algorithm can be decomposed into four closely associated parts, as presented in the four subsections below. To predict the state of the object in a new frame, we first generate object proposals using the random scaling method. Then, we preprocess each proposal using the object-surrounding histogram model. Finally, we apply the structural correlation filter to predict the position of the object
Experiment setup
To evaluate the effectiveness of the proposed algorithm, we compared it with other state-of-the-art algorithms on a large visual tracking benchmark [30] that contains 51 videos with various challenges.
In a typical correlation-based tracking algorithm, the search window used to train the kernelized ridge regression model needs to be larger than the given object patch. In our implementation, the search-window sizes of the holistic layer and the four local parts are set to 2.5 and 1.7 times that
Conclusion
In this paper, we focused on an inherent disadvantage of traditional correlation filter-based tracking algorithms, which is induced by the circular shifting mechanism. We proposed a robust structural correlation filter based on holistic and local parts for handling heavy occlusion and severe deformation. Furthermore, we proposed a novel object-surrounding histogram model to enhance the object while suppressing the background. This significantly improves the performance
Acknowledgment
We would like to thank the reviewers for their time and valuable comments. This work was supported by the National Natural Science Foundation of China (Grant 61371140) and in part by Grants 2015CFA062 and 2015BAA133.
References (38)
- et al., A storage allocation algorithm for outbound containers based on the outer-inner cellular automaton, Inf. Sci. (2014)
- et al., Refined particle swarm intelligence method for abrupt motion tracking, Inf. Sci. (2014)
- et al., Efficient visual tracking using particle filter with incremental likelihood calculation, Inf. Sci. (2012)
- et al., Recent advances and trends in visual tracking: a review, Neurocomputing (2011)
- et al., Online similarity learning for visual tracking, Inf. Sci. (2016)
- et al., Multiple object tracking with partial occlusion handling using salient feature points, Inf. Sci. (2014)
- et al., Floor fields for tracking in high density crowd scenes, Proceedings of the European Conference on Computer Vision (2008)
- et al., Robust object tracking with online multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
- et al., Real time robust l1 tracker using accelerated proximal gradient approach, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2012)
- et al., Visual object tracking using adaptive correlation filters, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2010)
- Robust visual tracking using an adaptive coupled-layer visual model, IEEE Trans. Pattern Anal. Mach. Intell.
- Adaptive fragments-based tracking of non-rigid objects using level sets, Proceedings of IEEE International Conference on Computer Vision
- Histograms of oriented gradients for human detection, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Accurate scale estimation for robust visual tracking, Proceedings of British Machine Vision Conference
- Adaptive color attributes for real-time visual tracking, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Context tracker: exploring supporters and distracters in unconstrained environments, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Online deformable object tracking based on structure-aware hyper-graph, IEEE Trans. Image Process.
- Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell.
- Struck: structured output tracking with kernels, Proceedings of IEEE International Conference on Computer Vision