Visual object tracking via enhanced structural correlation filter

doi:10.1016/j.ins.2017.02.012

Information Sciences

Volumes 394–395, July 2017, Pages 232-245

https://doi.org/10.1016/j.ins.2017.02.012

Abstract

In this study, we aim to build a robust correlation-based visual object tracking system. Traditional correlation filters for visual tracking search for the most likely position of the target by circularly shifting the search image patch. However, the search image patch needs to be large enough to cover both the object and the background, which makes the algorithm sensitive to changes in the background. To alleviate this problem, we first propose an efficient object-surrounding histogram model to suppress the background. In this model, we build a Bayes classifier based on the initially given object and then apply it to each pixel in subsequent frames. With this model, the original image can be enhanced in order to eliminate the impact of circular shifting. Moreover, we develop a structural correlation filter that consists of both holistic and local object parts. The multiple object parts are adaptively weighted and further aggregated to predict the relative motion from the last frame. We conduct extensive experiments on a frequently used benchmark with 51 video sequences. The experimental results show that the proposed algorithm achieves outstanding performance, especially under heavy occlusion and severe deformation.

Introduction

Visual object tracking has attracted much attention in a wide range of applications such as video surveillance, robotics, and auto-control systems. Given the initial bounding box of an object, the main task of a visual object tracker is to estimate the state of the object in subsequent frames. The main challenges are twofold. One is to learn the object representation from a single image without any other prior information [28], [32]; the other is to cope with the numerous variations of the object, e.g., scale variation, heavy occlusion, deformation, in- and out-of-plane rotation, and out-of-view [31].

In recent years, a number of visual tracking algorithms have improved both the accuracy and the robustness of visual tracking [4], [11], [14], [20], [24], [27], [34], [36], [38]. In [20], Kalal et al. proposed a P-N learning algorithm in which a classifier is learned in the initial frame and updated with the tracking result. The algorithm works well in the out-of-view case, which is attributed to its re-detection component. Zhong et al. [38] built a sparsity-based collaborative model that incorporates both a discriminative classifier and a generative model. Benefiting from the local sparse representation in the algorithm, the tracker achieves a high score when dealing with heavy occlusion. Lim et al. [23] developed a refined particle swarm intelligence [18] method for abrupt motion tracking. Ali et al. [2] proposed a novel cellular automaton model [17] atop a set of scene-specific “floor fields” to make tracking in extremely crowded situations tractable. In [14], Hare et al. employed a structured output support vector machine (SVM) to directly predict the changes in the object’s position, which has shown remarkable performance, especially in situations involving abrupt motion. However, each of the above-mentioned algorithms handles only specific tracking challenges [30]. Recently, correlation filter-based trackers have become the most popular algorithms in the visual object tracking community [5], [9], [10], [15], [16], [35]. The correlation filter is sufficiently powerful to discriminate the object from the background because it is trained with a large number of samples. Further, the computation becomes more efficient when it is carried out in the Fourier domain. However, correlation filter-based trackers still have disadvantages that need to be addressed.

In this paper, we address two issues related to correlation filter-based trackers. First, as currently employed in these algorithms, the kernel ridge regression is trained using thousands of samples generated by circularly shifting the image patch, and the image patch needs to be sufficiently large to cover both the object and the background context, as shown in Fig. 1. As a result, the positive samples contain unnecessary background context information, which aggravates drifting, especially when the object is occluded. Second, the samples used to predict the object position in the next frame are also generated by circular shifting. A significantly changed background context may cancel out the contribution of the object when predicting its new state, which significantly weakens robustness to background clutter and appearance variations. A cosine mask window, as used in [16], may alleviate the issue, but it is still not enough to handle more challenging tracking tasks.
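The role of circular shifting can be made concrete with a small sketch. The code below is not the authors' implementation: it uses a linear (not kernelized) ridge regression, and the function names, the Gaussian label width `sigma`, and the regularizer `lam` are illustrative assumptions. It shows why training on every circular shift of a patch reduces to an element-wise division in the Fourier domain, and how a cosine (Hann) window of the kind mentioned above can be built.

```python
import numpy as np

def cosine_window(shape):
    """Hann window that tapers a patch toward zero at its borders."""
    return np.outer(np.hanning(shape[0]), np.hanning(shape[1]))

def gaussian_labels(shape, sigma=2.0):
    """Regression targets: a Gaussian peaked at zero shift (index [0, 0])."""
    dy = np.fft.fftfreq(shape[0]) * shape[0]  # signed shifts 0, 1, ..., -1
    dx = np.fft.fftfreq(shape[1]) * shape[1]
    return np.exp(-(dy[:, None] ** 2 + dx[None, :] ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, labels, lam=1e-2):
    """Ridge regression over all circular shifts of `patch`.

    The sample shifted by (dy, dx) is regressed onto labels[dy, dx].
    Returns the complex conjugate of the filter spectrum, which is the
    form needed at detection time.
    """
    x_hat = np.fft.fft2(patch)
    y_hat = np.fft.fft2(labels)
    return y_hat * np.conj(x_hat) / (x_hat * np.conj(x_hat) + lam)

def detect(h_conj, patch):
    """Filter response for every circular shift of `patch`.

    The index of the maximum is the estimated shift of the content of
    `patch` relative to the training patch (modulo the patch size).
    """
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * h_conj))

# Usage: train on a windowed patch, then locate a circularly shifted copy.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32)) * cosine_window((32, 32))
h_conj = train_filter(patch, gaussian_labels(patch.shape))
response = detect(h_conj, np.roll(patch, (3, -4), axis=(0, 1)))
shift = np.unravel_index(np.argmax(response), response.shape)  # indices wrap around
```

Because every shifted sample contains whatever background lies inside the patch, the background contributes to both the numerator and the denominator of the filter; this is the sensitivity that the two issues above describe.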

To overcome these two issues, we propose to improve the correlation-based tracker from the following two perspectives. First, we construct a structural correlation filter in which the correlation filter is applied to both the holistic and the local object image patches. The holistic and local correlation filters are adaptively weighted based on the confidence with which each can predict the location. The final prediction depends on the weighted mean response of the correlation filters. When the object is partially occluded, the non-occluded local parts play an important role in predicting the final location. In this way, the spatial structural correlation filter predicts the object position more accurately, especially in situations of occlusion.
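A minimal sketch of such a confidence-weighted fusion follows. The text above does not define the confidence measure, so the peak-to-sidelobe ratio used here is an assumption borrowed from common correlation filter practice, not the paper's formula. The function names are hypothetical, and the sketch further assumes that all response maps have already been brought onto a common displacement grid and are larger than the excluded peak window.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=2):
    """Confidence proxy (an assumption): how far the peak stands above the rest."""
    r0, c0 = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    # Exclude a small window around the peak; the remainder is the sidelobe.
    mask[max(r0 - exclude, 0):r0 + exclude + 1,
         max(c0 - exclude, 0):c0 + exclude + 1] = False
    sidelobe = response[mask]
    return (response[r0, c0] - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def fuse_responses(responses):
    """Weighted mean of response maps, with weights proportional to confidence."""
    conf = np.array([max(peak_to_sidelobe_ratio(r), 0.0) for r in responses])
    if conf.sum() == 0.0:
        weights = np.full(len(responses), 1.0 / len(responses))
    else:
        weights = conf / conf.sum()
    fused = np.tensordot(weights, np.stack(responses), axes=1)
    return fused, weights

# Usage: four parts agree on the location, one (occluded) part returns noise.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:16, 0:16]
clean = np.exp(-((yy - 5) ** 2 + (xx - 7) ** 2) / (2 * 1.5 ** 2))
maps = [clean + 0.01 * rng.random((16, 16)) for _ in range(4)]
maps.append(0.3 * rng.random((16, 16)))
fused, weights = fuse_responses(maps)
location = np.unravel_index(np.argmax(fused), fused.shape)
```

In this toy case the noisy map receives the smallest weight, so the fused peak is determined by the parts that still respond sharply, which is the behavior described for partial occlusion.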

Second, we use a novel object-surrounding histogram model to enhance the object while suppressing the background context. In our model, we apply a color histogram-based Bayes classifier to each pixel of the search image patch. Pixels with a higher probability of belonging to the object are enhanced, and pixels with a lower probability are suppressed. In the object-surrounding histogram model, we consider four background contexts in different directions (left, top, right, and bottom). With the object-surrounding histogram model, the regression model becomes less sensitive to the background context. There is also a clear boundary between the object and the background, which is very helpful when representing the object with features such as histograms of oriented gradients (HOG) [8]. Furthermore, the color histogram model provides a generative way to evaluate the tracking result, which is important when adaptively updating the correlation filters.
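The per-pixel classifier can be sketched as follows. This is an illustration under stated assumptions rather than the paper's model: it pools the surrounding into a single histogram instead of keeping the four directional contexts separate, it uses 16 bins per RGB channel, it takes class priors proportional to region size, and it enhances the image by multiplying each pixel by its object probability. All function names and the `(top, left, height, width)` box layout are hypothetical.

```python
import numpy as np

def color_bin_index(image, bins=16):
    """Map each uint8 RGB pixel to a single joint-histogram bin index."""
    q = (image.astype(np.int64) * bins) // 256
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def fit_histograms(image, obj_box, bins=16):
    """Count colors inside the object box and in the surrounding region."""
    idx = color_bin_index(image, bins)
    top, left, height, width = obj_box
    obj_mask = np.zeros(idx.shape, dtype=bool)
    obj_mask[top:top + height, left:left + width] = True
    n_bins = bins ** 3
    h_obj = np.bincount(idx[obj_mask], minlength=n_bins).astype(float)
    h_sur = np.bincount(idx[~obj_mask], minlength=n_bins).astype(float)
    return h_obj, h_sur

def object_probability(image, h_obj, h_sur, bins=16):
    """Bayes rule per pixel: P(object | color) = H_obj / (H_obj + H_sur).

    Raw counts are used, which corresponds to priors proportional to the
    region sizes. Colors never seen during fitting get probability 0.5.
    """
    total = h_obj + h_sur
    prob = np.full(total.shape, 0.5)
    seen = total > 0
    prob[seen] = h_obj[seen] / total[seen]
    return prob[color_bin_index(image, bins)]

def enhance(image, prob):
    """Suppress likely background by scaling each pixel by its probability."""
    return (image.astype(float) * prob[..., None]).astype(np.uint8)

# Usage: a red object on a blue background.
frame = np.zeros((20, 20, 3), dtype=np.uint8)
frame[..., 2] = 255
frame[5:15, 5:15] = (255, 0, 0)
h_obj, h_sur = fit_histograms(frame, (5, 5, 10, 10))
prob = object_probability(frame, h_obj, h_sur)
enhanced = enhance(frame, prob)
```

The same probability map could also serve as the generative check on a tracking result mentioned above, for instance by averaging it over the predicted box, although the paper's exact criterion is not given in this excerpt.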

The flowchart of our enhanced structural correlation tracker is shown in Fig. 2. The background in the enhanced image is evidently suppressed. The enhanced image is further decomposed into one holistic and four local image patches. We apply a correlation filter to each of the image patches and then adaptively weight the responses of the five correlation filters to obtain the final response. The location of the object is predicted from the index of the maximum value in the final response. Another problem for correlation filter-based tracking is scale estimation. Unlike the traditional approach, in which densely scaled object samples are tested, we use a random scaling method to generate candidate search images. The random scaling method is reasonable because, in most cases, the scale variation between adjacent frames is small and smooth. With fewer search proposals, the random scaling method may be much more efficient while achieving competitive performance.
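The last two steps, random scale proposals and reading the location off the response peak, might look like the sketch below. The sampling distribution is not specified in this excerpt, so the log-normal perturbation, the number of candidates `n`, the spread `sigma`, and the rule of keeping the scale with the highest peak are all assumptions, and the function names are hypothetical. The peak conversion assumes a response map whose zero shift sits at index [0, 0], as produced by FFT-based correlation.

```python
import numpy as np

def random_scale_candidates(prev_scale, n=5, sigma=0.02, rng=None):
    """Sample a few scales near the previous one (always including it)."""
    rng = np.random.default_rng() if rng is None else rng
    factors = np.exp(rng.normal(0.0, sigma, size=n - 1))
    return np.concatenate(([prev_scale], prev_scale * factors))

def peak_to_displacement(response):
    """Convert the argmax index to a signed (dy, dx) displacement.

    Indices past the midpoint are interpreted as negative shifts because
    the response is circular.
    """
    height, width = response.shape
    row, col = np.unravel_index(np.argmax(response), response.shape)
    dy = row - height if row > height // 2 else row
    dx = col - width if col > width // 2 else col
    return int(dy), int(dx)

def select_scale(responses_by_scale, scales):
    """Keep the candidate scale whose response map has the highest peak."""
    best = max(range(len(scales)), key=lambda i: responses_by_scale[i].max())
    return float(scales[best]), peak_to_displacement(responses_by_scale[best])

# Usage with synthetic response maps, one per candidate scale.
rng = np.random.default_rng(0)
scales = random_scale_candidates(1.0, n=5, rng=rng)
responses = [0.1 * rng.random((8, 8)) for _ in scales]
responses[2][7, 1] = 1.0  # strongest peak: one row up, one column right
best_scale, displacement = select_scale(responses, scales)
```

With five candidates per frame instead of a dense scale pyramid, the number of filter evaluations drops accordingly, which is the efficiency argument made above.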

Section snippets

Related work

Recently, in the field of visual tracking, there has been much attention on correlation filter-based tracking algorithms owing to the high efficiency of correlation operation when transformed into the Fourier domain. When given an initial object window, a large number of samples with Gaussian-shaped labels are selected to train a correlation filter. Then, the position of the object in the next frame can be predicted by correlating the filter over a search image patch. In [5], Bolme et al. built

Enhanced structural correlation tracking

In this section, we describe the enhanced structural correlation tracking algorithm in detail. The algorithm can be decomposed into four closely associated parts, as presented in the four subsections below. To predict the state of the object in a new frame, we first generate object proposals using the random scaling method. Then, we preprocess each proposal using the object-surrounding histogram model. Finally, we apply the structural correlation filter to predict the position of the object

Experiment setup

To evaluate the effectiveness of the proposed algorithm, we compared it with other state-of-the-art algorithms on a large visual tracking benchmark [30] that contains 51 videos with various challenges.

In a typical correlation-based tracking algorithm, the search window used to train the kernelized ridge regression model needs to be larger than the given object patch. In our implementation, the sizes of the search windows of the holistic layer and the four local parts are set to 2.5 and 1.7 times that

Conclusion

In this paper, we focused on an inherent disadvantage of traditional correlation filter-based tracking algorithms, which is induced by the circular shifting mechanism. We proposed a robust structural correlation filter based on holistic and local parts for handling heavy occlusion and severe deformation. Furthermore, we proposed a novel object-surrounding histogram model to enhance the object while suppressing the background. This significantly improves the performance

Acknowledgment

We would like to thank the reviewers for their time and valuable comments. This work is supported by the National Natural Science Foundation of China (Grant 61371140) and in part by Grants 2015CFA062 and 2015BAA133.

References (38)

W. Hu et al.
A storage allocation algorithm for outbound containers based on the outer-inner cellular automaton
Inf. Sci.
(2014)
M. Lim et al.
Refined particle swarm intelligence method for abrupt motion tracking
Inf. Sci.
(2014)
H. Liu et al.
Efficient visual tracking using particle filter with incremental likelihood calculation
Inf. Sci.
(2012)
H. Yang et al.
Recent advances and trends in visual tracking: a review
Neurocomputing
(2011)
S. Yi et al.
Online similarity learning for visual tracking
Inf. Sci.
(2016)
M.N. Ali et al.
Multiple object tracking with partial occlusion handling using salient feature points
Inf. Sci.
(2014)
S. Ali et al.
Floor fields for tracking in high density crowd scenes
Proceedings of the European Conference on Computer Vision
(2008)
B. Babenko et al.
Robust object tracking with online multiple instance learning
IEEE Trans. Pattern Anal. Mach. Intell.
(2011)
C. Bao et al.
Real time robust l1 tracker using accelerated proximal gradient approach
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(2012)
D.S. Bolme et al.
Visual object tracking using adaptive correlation filters
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(2010)
L. Cehovin et al.
Robust visual tracking using an adaptive coupled-layer visual model
IEEE Trans. Pattern Anal. Mach. Intell.
(2013)
P. Chockalingam et al.
Adaptive fragments-based tracking of non-rigid objects using level sets
Proceedings of IEEE International Conference on Computer Vision
(2009)
N. Dalal et al.
Histograms of oriented gradients for human detection
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(2005)
M. Danelljan et al.
Accurate scale estimation for robust visual tracking
Proceedings of British Machine Vision Conference
(2014)
M. Danelljan et al.
Adaptive color attributes for real-time visual tracking
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(2014)
T.B. Dinh et al.
Context tracker: exploring supporters and distracters in unconstrained environments
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(2011)
D. Du et al.
Online deformable object tracking based on structure-aware hyper-graph
IEEE Trans. Image Process.
(2016)
P. Felzenszwalb et al.
Object detection with discriminatively trained part-based models
IEEE Trans. Pattern Anal. Mach. Intell.
(2010)
S. Hare et al.
Struck: structured output tracking with kernels
Proceedings of IEEE International Conference on Computer Vision
(2011)

