Elsevier

Neurocomputing

Volume 358, 17 September 2019, Pages 119-140

Robust correlation filter tracking with multi-scale spatial view

https://doi.org/10.1016/j.neucom.2019.05.017

Highlights

  • This work proposes a robust correlation filter tracking method with multi-scale spatial view (RCFMSV) that addresses interference such as serious occlusion or severe illumination change in video tracking.

  • This work puts forward a detection model of multi-scale spatial view (DMMSV) which detects whether the target is under interference.

  • This work proposes an on-line location model of multi-scale spatial view (On-line LMMSV) to achieve more robust target positioning.

  • The proposed method is competitive with state-of-the-art methods in tracking performance on the object tracking benchmark.

Abstract

With its extensive applications, visual tracking has become one of the most important research focuses in computer vision. Due to interference such as serious occlusion or severe illumination change, the appearance model of the target tends to vary heavily, posing great challenges to tracking. However, a majority of existing tracking methods have difficulty detecting such interference under a single spatial view, which noticeably degrades tracking performance. In this paper, a robust correlation filter tracking method with multi-scale spatial view (RCFMSV) is proposed, in which a group of multi-scale spatial filters covering view areas of different sizes is established. RCFMSV contains two models. The first is the detection model of multi-scale spatial view (DMMSV), which is responsible for interference detection and exploits the different sensitivities of spatial views at different scales. The second is the on-line location model of multi-scale spatial view (On-line LMMSV), which performs collaborative location by introducing a pre-location step and using the multi-scale spatial view around the target as a reference, yielding more accurate tracking. Extensive tracking experiments have been conducted on the object tracking benchmark, and detailed comparative analysis between the proposed algorithm and state-of-the-art methods has been carried out. The experiments and analysis confirm that the proposed RCFMSV tracking method is competitive with state-of-the-art methods in tracking performance.

Introduction

As one of the most important research focuses in computer vision [1], [2], [3], [4], [5] and big data [6], [7], [8], [9], [10], visual tracking plays a crucial role in applications such as autonomous driving, video surveillance, behavior analysis and video understanding [11], [12].

Over the past decades, researchers have proposed tracking methods that fall into two categories: generative methods and discriminative methods. Specifically, generative methods (e.g., [13], [14], [15], [16], [17], [18], [19], [20]) learn a target appearance model from training images collected from video frames. They reconstruct candidate target image samples with the learned appearance model and designate the candidate sample with the minimal reconstruction error as the new location of the target. The main idea of discriminative methods (e.g., [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]), which regard visual tracking as a binary classification problem, is to train classifiers online with image samples of the target and the background. The classifier is then used to score the candidate samples in the subsequent frame, and the location of the candidate sample with the highest score is set as the target’s new location.

Between these two categories, generative methods adapt well to changes in the target’s appearance model. On the other hand, a reconstruction error must be computed for each candidate sample, leading to higher computational complexity. In addition, when the target is partially occluded, such methods are likely to drift or even fail due to the accumulation of instability. Discriminative tracking methods handle partial occlusion well because they take the background information into account during training. Their shortcomings mainly lie in the difficulty of generating the optimal separating hyperplane when there are relatively few training samples, and in the relatively high computational complexity of the training process.

In recent years, correlation filter based tracking methods (e.g., [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46]) have been proposed and have received much attention because they outperform previous methods in terms of tracking speed and accuracy [47]. However, because the model update is insufficient when the target suffers serious occlusion or severe illumination change, the appearance model of the target easily accumulates noise, leading to drift or even tracking failure. Therefore, correlation filter based tracking methods still have considerable room for performance improvement in the face of such interference as serious occlusion or severe illumination change.

As an important research focus in visual tracking, the issue of how to correctly cope with interference factors such as serious occlusion or severe illumination change still poses great challenges to existing tracking methods. Although correlation filter based tracking methods have addressed this kind of problem to some extent, they still have the following shortcomings:

  • A majority of tracking methods still struggle with interference, especially when the target is subject to serious occlusion or severe illumination change. Consequently, the trackers are highly likely to introduce incorrect image data into the appearance model, causing inaccurate location or even tracking failure.

  • Most tracking methods tend to adopt a single filter to locate the target, which is likely to result in tracking drift and reduced accuracy due to the lack of spatial reference information.

We observe in our research that, when the object suffers interference, spatial views of different scales around the target show different sensitivities. Specifically, the smaller-scale spatial view is more sensitive to the interference, while the larger-scale spatial view is less sensitive. Note that here “the smaller scale” means the smaller viewing window. Fig. 1 illustrates the sampling method of our work. On the other hand, if the tracker has a more adaptive mechanism to realize collaborative location, the tracking task can be performed more precisely. The innovation of our work is originally inspired by this phenomenon. Fig. 2 illustrates the main idea of the proposed method.

In view of the above analysis, we propose a robust correlation filter tracking method with multi-scale spatial view (RCFMSV) whose basic idea is to establish a multi-scale spatial view based on different sampling ranges and to set up corresponding sub-filters. We found that spatial views at different scales show different sensitivities to interference on the target (e.g., serious occlusion, severe illumination change), which can be used to detect the target’s state, especially under serious interference. In addition, the multi-scale spatial view can be combined with pre-positioning and collaborative location to improve the accuracy of the tracker, yielding a better tracking method.
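To make this intuition concrete, the following sketch (ours, not code from the paper) compares the filter confidence of a small-scale view against a large-scale view and flags interference when the small view degrades much faster. The peak-to-sidelobe ratio (PSR) confidence measure, the exclusion window and the threshold are illustrative assumptions rather than the actual DMMSV criterion, which is derived in Section 4.

```python
# Illustrative sketch only: flag interference by comparing confidence across
# spatial views of different scales. The PSR measure and the 0.5 threshold are
# assumptions for illustration, not the paper's DMMSV detection rule.
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio of a correlation response map."""
    peak_val = response.max()
    py, px = np.unravel_index(np.argmax(response), response.shape)
    sidelobe_mask = np.ones(response.shape, dtype=bool)
    sidelobe_mask[max(0, py - 5):py + 6, max(0, px - 5):px + 6] = False  # exclude the peak region
    sidelobe = response[sidelobe_mask]
    return (peak_val - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def is_interfered(resp_small_view, resp_large_view, ratio=0.5):
    """Report interference when the small-scale view's confidence collapses
    relative to the large-scale view's confidence."""
    return psr(resp_small_view) < ratio * psr(resp_large_view)
```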

Different from previous work that adopts multi-scale information mainly for object scale adaptation [48], [49], [50], we exploit the multi-scale spatial view to detect interference and locate the target collaboratively, which is also a contribution of our work.

The main contributions of our work are summarized in the following three points:

  • In response to the issue that current correlation filter based tracking methods fail to make proper use of the multi-scale spatial view around the target to cope effectively with serious interference, we put forward a detection model of multi-scale spatial view (DMMSV) which contains three multi-scale spatial filters with different sampling ranges. This model is able to detect whether the target is seriously interfered with, e.g., by serious occlusion or severe illumination change.

  • Given that current tracking methods generally adopt a single-scale filter, which lacks sufficient consideration of multi-scale spatial information and easily leads to tracking drift, we propose an on-line location model of multi-scale spatial view (On-line LMMSV) which pre-positions the target with the three sub-filters of the multi-scale spatial filter. A collaborative location method is designed on top of this to realize more accurate target positioning.

  • We conduct extensive tracking experiments and make detailed comparative analysis between the proposed RCFMSV algorithm and state-of-the-art tracking methods on the object tracking benchmark. A component analysis has also been carried out to verify the effectiveness of the two models above. The experimental results indicate that the proposed algorithm is competitive with state-of-the-art tracking methods in terms of tracking performance.

The rest of this paper is organized as follows: Section 2 reviews work related to the proposed method. Section 3 briefly reviews the kernelized correlation filters (KCF) tracking method. Section 4 introduces DMMSV together with its derivation in detail. Section 5 gives a detailed introduction to On-line LMMSV. Section 6 explains how to track the object with DMMSV and On-line LMMSV. We present the experimental results and make comparative analysis between our method and state-of-the-art tracking methods in Section 7. The summary of this paper is presented in Section 8.

Section snippets

Related work

This section briefly reviews correlation filter based tracking methods and then discusses the differences between our method and related work.

Preliminary: Kernelized Correlation Filters (KCF) tracking method

The Kernelized Correlation Filters (KCF) [34] tracking method, which models the tracking problem as ridge regression, has been successfully applied to tracking. The goal of KCF is to train a filter $\mathbf{w}$ that minimizes the response error over training samples $\mathbf{x}_i$ and regression targets $y_i$; the optimization problem is expressed as

$$\min_{\mathbf{w}} \sum_i \left( \mathbf{w}^{T} \mathbf{x}_i - y_i \right)^2 + \lambda \left\| \mathbf{w} \right\|^2$$

In the spatial domain, the solution can be obtained as follows:

$$\mathbf{w} = \left( X^{T} X + \lambda I \right)^{-1} X^{T} \mathbf{y}$$

wherein the data matrix $X$ has one sample $\mathbf{x}_i$ per row, and
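As a minimal illustration (not the authors' implementation) of this closed-form training step, the sketch below solves the ridge-regression problem above directly with NumPy; the function names are our own.

```python
# Minimal sketch of the spatial-domain ridge regression underlying KCF-style
# trackers: w = (X^T X + lambda*I)^{-1} X^T y.
import numpy as np

def train_linear_filter(X, y, lam=1e-4):
    """X: (n_samples, n_features) data matrix with one sample per row.
       y: (n_samples,) regression targets (e.g., Gaussian-shaped labels).
       Returns the filter weights w."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def filter_response(w, z):
    """Response of the learned filter to a candidate feature vector z."""
    return w @ z
```

In practice, KCF never forms this inverse explicitly: because the training samples are circular shifts of a base patch, the data matrix is circulant and the same solution can be computed element-wise in the Fourier domain, which is the source of its speed.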

Detection Model of Multi-scale Spatial View (DMMSV)

This section presents a detailed introduction to the DMMSV proposed in our work. Fig. 3 depicts the overall flow chart of the interference detection conducted by DMMSV.

On-line Location Model of Multi-scale Spatial View (On-line LMMSV)

In this section, we introduce in detail how to apply the multi-scale spatial view to locate the target on-line. The overall flow chart of On-line LMMSV is shown in Fig. 5.
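As a rough, hypothetical illustration of collaborative location, the sketch below fuses the pre-location estimates of the three scale-specific sub-filters by a confidence-weighted average of their response peaks. The weighting scheme and the helper name `fuse_prelocations` are our assumptions and stand in for the actual On-line LMMSV rule detailed in this section.

```python
# Hypothetical sketch: fuse per-scale pre-locations by response confidence.
# The weighting scheme is an assumption, not the paper's On-line LMMSV rule.
import numpy as np

def fuse_prelocations(positions, peak_values):
    """positions: list of (x, y) pre-locations, one per sub-filter.
       peak_values: the corresponding response peaks, used as confidences.
       Returns a single fused (x, y) target location."""
    weights = np.asarray(peak_values, dtype=float)
    weights = weights / weights.sum()
    pts = np.asarray(positions, dtype=float)
    return tuple(weights @ pts)
```

Under this kind of fusion, the final location is pulled toward the sub-filter whose response is most confident, so a scale whose view is corrupted by interference contributes less to the estimate.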

Robust tracking with DMMSV and On-line LMMSV

In this section, we elaborate how to track the object with DMMSV and On-line LMMSV. First, we choose the KCF [34] tracker as the baseline of the proposed RCFMSV tracking algorithm. Following the KCF tracker, the tracking process includes two stages: filter training and object positioning. In the filter training stage, the algorithm adopts a circulant matrix to realize fast filter training in the Fourier domain. In the positioning stage, the algorithm uses the trained filter to filter the image patch at the position
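For readers unfamiliar with correlation-filter tracking, the sketch below walks through one training/detection cycle of a simple single-channel linear correlation filter (MOSSE-like) in the Fourier domain. It illustrates the fast element-wise training idea mentioned above, but it is not the kernelized formulation used by KCF or RCFMSV, and all function names are ours.

```python
# Illustrative single-channel correlation-filter cycle in the Fourier domain
# (a MOSSE-like linear filter, not the paper's exact kernelized formulation).
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Gaussian-shaped regression target peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - h // 2) ** 2 + (xs - w // 2) ** 2
    return np.exp(-0.5 * dist2 / sigma ** 2)

def train(patch, label, lam=1e-2):
    """Closed-form filter in the Fourier domain:
    H = (Y * conj(X)) / (X * conj(X) + lam), computed element-wise."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def detect(H, patch):
    """Correlate a new patch with the filter; the shift of the response peak
    from the patch centre gives the estimated target translation."""
    Z = np.fft.fft2(patch)
    response = np.real(np.fft.ifft2(H * Z))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak
```

Because circular correlation diagonalizes under the DFT, both training and detection reduce to element-wise operations on FFTs of the patch, which is what makes this family of trackers fast.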

Experiments

In this section, the robust correlation filter tracking with multi-scale spatial view (RCFMSV) proposed in our work is evaluated through experiments and corresponding analysis, including the presentation of experimental results as well as comparative analysis with other state-of-the-art tracking algorithms.

Conclusion

In this paper, we propose a correlation filter tracking method with multi-scale spatial view (RCFMSV) which performs interference detection and target location by using the characteristics of the multi-scale spatial view around the target. As one of the two models contained in the proposed RCFMSV, DMMSV is used to detect the target’s state and cope with interference such as serious occlusion or severe illumination change by exploiting the different sensitivity of

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant 61471274. The authors declare that they have no competing interests and they would like to thank the anonymous referees and the editor for their helpful comments and suggestions.


References (86)

  • B. Du et al.

    Beyond the sparsity-based target detector: a hybrid sparsity and statistics-based detector for hyperspectral images

    IEEE Trans. Image Process.

    (2016)
  • W. Wang et al.

    Deep visual attention prediction

    IEEE Trans. Image Process.

    (2018)
  • W. Wang et al.

    Video salient object detection via fully convolutional networks

    IEEE Trans. Image Process.

    (2018)
  • W. Wang et al.

    A deep network solution for attention and aesthetics aware photo cropping

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • R. Gao et al.

    A personalized point-of-interest recommendation model via fusion of geo-social information

    Neurocomputing

    (2018)
  • J. Wu et al.

    Hybrid dynamic k-nearest-neighbour and distance and attribute weighted method for classification

    Int. J. Comput. Appl. Technol.

    (2012)
  • J. Wu et al.

    A naive Bayes probability estimation model based on self-adaptive differential evolution

    J. Intell. Inf. Syst.

    (2014)
  • X. Zhu et al.

    Similarity-maintaining privacy preservation and location-aware low-rank matrix factorization for QoS prediction based web service recommendation

    IEEE Trans. Serv. Comput.

    (2018)
  • X.Y. Jing et al.

    Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning

    IEEE Trans. Image Process.

    (2017)
  • X. Zhu et al.

    Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix

    IEEE Trans. Inf. Forensics Secur.

    (2018)
  • J. Ho et al.

    Visual tracking using learned linear subspaces

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2004)
  • W. Hu et al.

    Single and multiple object tracking using Log-Euclidean Riemannian subspace and block-division appearance model

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • D.A. Ross et al.

    Incremental learning for robust visual tracking

    Int. J. Comput. Vis.

    (2008)
  • X. Mei et al.

    Robust visual tracking using l1 minimization

    Proceedings of the IEEE International Conference on Computer Vision

    (2009)
  • B. Liu et al.

    Robust visual tracking using local sparse appearance model and k-selection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • C. Bao et al.

    Real time robust l1 tracker using accelerated proximal gradient approach

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2012)
  • T. Zhang et al.

    Robust visual tracking via structured multi-task sparse learning

    Int. J. Comput. Vis.

    (2013)
  • D. Wang et al.

    Least soft-threshold squares tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • J. Wang et al.

    Online selecting discriminative tracking features using particle filter

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2005)
  • S. Avidan

    Support vector tracking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2004)
  • Z. Zhang et al.

    Proposal generation for object detection using cascaded ranking SVMs

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2011)
  • H. Grabner et al.

    Semi-supervised on-line boosting for robust tracking

    Proceedings of the European Conference on Computer Vision

    (2008)
  • B. Babenko et al.

    Robust object tracking with online multiple instance learning

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • K. Zhang et al.

    Real-time compressive tracking

    Proceedings of the European Conference on Computer Vision

    (2012)
  • H. Sun et al.

    Efficient compressive sensing tracking via mixed classifier decision

    Sci. China-Inf. Sci.

    (2016)
  • S. Hare et al.

    Struck: structured output tracking with kernels

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • Z. Kalal et al.

    Tracking-learning-detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • S. Duffner et al.

    Fast pixelwise adaptive visual tracking of non-rigid objects

    IEEE Trans. Image Process.

    (2017)
  • D.S. Bolme et al.

    Visual object tracking using adaptive correlation filters

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2010)
  • J.F. Henriques et al.

    Exploiting the circulant structure of tracking-by-detection with kernels

    Proceedings of the European Conference on Computer Vision

    (2012)
  • M. Danelljan et al.

    Adaptive color attributes for real-time visual tracking

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • J.F. Henriques et al.

    High-speed tracking with kernelized correlation filters

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • M. Danelljan et al.

    Accurate scale estimation for robust visual tracking

    Proceedings of the British Machine Vision Conference

    (2014)

Yafu Xiao received the B.E. degree from the Mathematics and Computer Science School of Jianghan University, Wuhan, China, in 2011, and the M.E. degree from the Computer School of Hubei University of Technology, Wuhan, China, in 2014. He is currently pursuing the Ph.D. degree with the School of Computer Science, Wuhan University, Wuhan, China. His research interests include visual tracking, face alignment and object recognition.

Jing Li received the Ph.D. degree from Wuhan University, Wuhan, China, in 2006. He is currently a Professor with the School of Computer Science, Wuhan University, Wuhan, China. His research interests include multimedia technology and data mining.

Bo Du received the B.S. and Ph.D. degrees in photogrammetry and remote sensing from the State Key Lab of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China, in 2005 and 2010, respectively. He is currently a "Luojia Talented Young Scholar" Professor appointed by Wuhan University, China, which is the most prestigious chair professor title for young staff in the university. He is also a Professor with the School of Computer Science, Wuhan University. He has more than 50 research papers published in the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (TGRS), the IEEE TRANSACTIONS ON IMAGE PROCESSING (TIP), the IEEE JOURNAL OF SELECTED TOPICS IN EARTH OBSERVATIONS AND APPLIED REMOTE SENSING (JSTARS), and the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (GRSL). Thirteen of them are ESI hot papers or highly cited papers. His major research interests include pattern recognition, hyperspectral image processing, and signal processing. Dr. Du was a recipient of the Distinguished Paper Award from IJCAI 2018, the Best Paper Award of the IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) 2018, and the Champion Award of the IEEE Data Fusion Contest 2018. He received the Best Reviewer awards from the IEEE GRSS for his service to the IEEE JSTARS in 2011 and the ACM Rising Star Awards for his academic progress in 2015. He was the Session Chair for both the International Geoscience and Remote Sensing Symposium 2018/2016 and the 4th IEEE GRSS WHISPERS. He also serves as a Reviewer of 20 Science Citation Index journals, including IEEE TGRS, TIP, JSTARS, and GRSL.

Jia Wu received the Ph.D. degree in computer science from the University of Technology Sydney, Ultimo, NSW, Australia. He is currently an Assistant Professor with the Department of Computing, Faculty of Science and Engineering, Macquarie University, Sydney. Prior to that, he was with the Centre for Artificial Intelligence, University of Technology Sydney. He is also an Honorary Professor in the School of Computer Science, Wuhan University. His current research interests include data mining, computer vision and machine learning. Dr. Wu is an Associate Editor of ACM Transactions on Knowledge Discovery from Data (TKDD). He is the recipient of the Best Paper Award in the Data Science Track (SDM 2018), the Best Student Paper Award (IJCNN 2017), and the Best Paper Candidate Award (ICDM 2014). Since 2009, he has authored or co-authored over 60 refereed journal and conference papers in these areas, such as the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the IEEE TRANSACTIONS ON CYBERNETICS, Pattern Recognition, the International Joint Conference on Artificial Intelligence, the AAAI Conference on Artificial Intelligence, the International Conference on Data Engineering, the International Conference on Data Mining, the SIAM International Conference on Data Mining, and the Conference on Information and Knowledge Management.

Xuefei Li received the Ph.D. degree from Wuhan University, Wuhan, China, in 2007. He is currently an Associate Professor in the School of Computer Science, Wuhan University, Wuhan, China. His current research interests include data mining and multimedia technology.

Jun Chang received the Ph.D. degree from Wuhan University, Wuhan, China, in 2011. He is currently an Assistant Professor in the School of Computer Science, Wuhan University, Wuhan, China. His current research interests include computer vision, large-scale machine learning, and stream data mining.

Yifei Zhou received the M.S. degree in Software Engineering from Wuhan University, Wuhan, China, in 2013. She is currently working toward the Ph.D. degree in the School of Computer Science, Wuhan University. Her main research interests include data mining and pattern recognition.
