Deep discriminative correlation tracking based on adaptive feature fusion

https://doi.org/10.1016/j.micpro.2019.102854

Abstract

The Discriminative Correlation Filter (DCF) is a popular method for addressing the challenges of visual tracking. Recently, the integration of DCF and Convolutional Neural Networks (CNNs) has achieved favorable tracking performance. However, how to make full use of CNN features within the DCF tracking framework remains a challenging problem. In this paper, we design an uncertainty measure for the correlation response map and, based on this measure, propose an adaptive feature fusion strategy to fuse the correlation response maps of different CNN features. We then propose a deep discriminative correlation tracking (DDCT) algorithm based on this adaptive feature fusion strategy. Separate scale correlation filters based on a scale pyramid and HOG features are also introduced to handle scale estimation. Moreover, to address tracking drift caused by severe occlusion or serious appearance changes of the target, we present a new adaptive and selective update mechanism that updates the discriminative correlation filters and the scale correlation filters effectively. Extensive experimental results on the OTB2013, OTB2015 and Temple Color 128 benchmark datasets show that the proposed algorithm achieves better overall performance than state-of-the-art methods.

Introduction

As one of the fundamental problems in computer vision, visual tracking has been widely applied in many fields, such as visual surveillance, traffic monitoring, and human-computer interfaces. During the past decade, many trackers have been proposed to improve tracking performance; they are covered in recent surveys of visual tracking. Although these traditional methods have made much progress, building a robust tracker remains very challenging because of severe occlusion or serious appearance changes of the target.

Recently, with the rapid development of deep learning, deep features based on Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision applications, e.g., object recognition [1], [2], image classification [3], [4], and saliency detection [5], [6]. A deep CNN has a strong capability to learn rich high-level semantic feature representations, which are of great significance in distinguishing objects of different categories. Some recent studies [7], [8], [9] have shown that a multilayer CNN architecture can efficiently capture sophisticated hierarchical features whose properties differ with respect to the tracking problem. The higher layers capture more abstract, semantic features, which are effective at distinguishing objects of different categories and robust to dramatic changes in target appearance. However, if background objects look similar to the target, the high-level features become less effective at telling them apart. The lower layers provide more detailed local features. They are less robust to changes in target appearance, but their detailed representations are very helpful for separating the true target from similar background objects. This observation has motivated some researchers to apply hierarchical convolutional features to improve tracking precision [8], [9].

Besides deep learning tracking methods, the Correlation Filter (CF) [10], [11] has proved to be efficient and effective for visual tracking. As one of the traditional tracking methods, CF-based tracking has attracted considerable attention because the use of fast Fourier transforms makes it computationally efficient. Earlier evaluations demonstrate that trackers based on discriminative correlation filters (DCF) perform relatively better than other traditional trackers. With the rise of deep-learning-based tracking, the integration of deep learning and discriminative correlation filters has achieved state-of-the-art performance compared with other deep learning tracking methods. The CNN provides stable image features, while the discriminative correlation filters serve as the discriminative classifier that produces the tracking results. HCF is a typical DCF-based tracker that exploits features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness.
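The efficiency argument can be made concrete with a minimal sketch of a single-channel correlation filter trained and applied in the Fourier domain. It assumes a MOSSE-style ridge-regression formulation, a Gaussian desired response, and NumPy; it is not the DDCT formulation, and the function names and the values of sigma and lam are illustrative choices that do not come from the paper.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a 2-D Gaussian whose peak is rolled to index (0, 0),
    so that a zero displacement maps to the origin of the response map."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))


def train_filter(patch, label, lam=1e-2):
    """Closed-form ridge regression, solved independently per frequency.
    The result is stored in the conjugated form that multiplies the
    spectrum of a new patch directly."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)


def detect(filter_hat, patch):
    """Correlate a new patch with the filter and return the response map
    together with the estimated (dy, dx) displacement of the target."""
    Z = np.fft.fft2(patch)
    response = np.real(np.fft.ifft2(filter_hat * Z))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Indices past the midpoint correspond to negative (wrapped) shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return response, (int(dy), int(dx))
```

Training and detection each cost a few FFTs and element-wise operations, which is where the speed of this family of trackers comes from. A practical tracker would additionally apply a cosine window and use multi-channel features, both omitted here.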

Although the integration of CNN and DCF improves the performance of visual tracking significantly, these deep-learning-based trackers still face some challenging problems. Firstly, manually presetting the weights used to fuse the features of different CNN layers is usually not optimal. The weights should adapt to different video sequences and even to different frames of the same sequence. Secondly, existing works show a large performance gap between success plots and precision plots. The main reason is that scale estimation is very challenging during tracking. Thirdly, the target is easily lost once serious occlusion occurs, because some traditional update strategies update the model in every frame. Such constant updating tends to introduce background clutter into the positive samples, which causes errors to accumulate until the target is lost.
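The gap between the two plots can be illustrated with the usual benchmark-style metrics. The sketch below assumes boxes in (x, y, w, h) form, a 20-pixel center-error threshold for precision, and 21 evenly spaced overlap thresholds for the success score; these are common benchmark conventions rather than details stated in this text, and all function names are hypothetical.

```python
import math


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_at(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within the threshold."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def success_auc(preds, gts, steps=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

For example, if the target grows from 40 to 100 pixels on a side while a tracker keeps a fixed 40-pixel box on the exact center, `precision_at` stays at its maximum while `success_auc` drops sharply, because only the overlap-based score is sensitive to scale.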

To address the above problems, we design an adaptive feature fusion strategy that fuses the correlation response maps generated from the features of different CNN layers. Based on this strategy, we propose a deep discriminative correlation tracking algorithm with scale adaptation and model update. The main contributions of our work can be summarized as follows:

  • (1) We designed a metric to measure the uncertainty of the correlation response map and proposed an adaptive feature fusion strategy that integrates hierarchical CNN features with the discriminative correlation filter. The integrated tracker adjusts the fusion weights online and obtains more robust tracking results than the baseline trackers.

  • (2) We constructed a scale filter, based on a set of scale correlation filters, to estimate the scale of the target. With this filter, the changing scale of the target is accurately estimated after its center has been located. This design reduces the mutual influence of location errors and scale errors and lowers the computational complexity.

  • (3) We proposed a new adaptive and selective updating mechanism to alleviate model drift. It provides two model updating strategies and adaptively selects between them according to the correlation value of two adjacent frames. This updating mechanism further improves the tracking success rate under some typical challenging conditions.

Extensive experiments are carried out on the OTB2013 [12] (50 challenging videos), OTB2015 [13] (100 challenging videos) and Temple Color 128 [14] (128 challenging videos, TC128 for short) tracking benchmark datasets. The experimental results demonstrate that the proposed tracker achieves outstanding performance against state-of-the-art trackers, most of which were proposed between 2015 and 2017.
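The idea behind contribution (1), weighting each layer's response map by how trustworthy it looks, can be sketched as follows. The uncertainty metric of the paper is not given in this text, so the sketch substitutes the peak-to-sidelobe ratio (PSR) known from earlier correlation filter work as a stand-in confidence score. The function names, the 5-pixel exclusion window, and the normalization of the weights are all assumptions made for illustration, and the response maps are assumed to have been resized to a common shape.

```python
import numpy as np


def peak_to_sidelobe_ratio(response, exclude=5):
    """Stand-in confidence score (not the paper's metric): how far the peak
    rises above the rest of the map, in units of the sidelobe deviation."""
    r = np.asarray(response, dtype=float)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    mask = np.ones(r.shape, dtype=bool)
    half = exclude // 2
    # Exclude a small window around the peak from the sidelobe statistics.
    mask[max(py - half, 0):py + half + 1, max(px - half, 0):px + half + 1] = False
    sidelobe = r[mask]
    return (r[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-12)


def fuse_responses(responses):
    """Weight each response map by its normalized confidence and sum them.
    All maps must share the same shape."""
    scores = np.array([max(peak_to_sidelobe_ratio(r), 0.0) for r in responses])
    if scores.sum() == 0.0:
        weights = np.full(len(responses), 1.0 / len(responses))
    else:
        weights = scores / scores.sum()
    fused = np.tensordot(weights, np.stack(responses), axes=1)
    return fused, weights
```

Because the weights are recomputed from the current response maps, they change from frame to frame, in contrast to manually preset fusion weights. A map with one sharp peak dominates the fused response, while a flat or multi-peaked map contributes little.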

Section snippets

Related work

As a very active topic in computer vision, visual tracking has been researched for decades. Many tracking algorithms have been proposed to resolve its hard problems and improve overall performance. In recent years especially, deep-learning-based trackers have attracted very broad attention and improved performance significantly. However, a comprehensive survey of the visual tracking literature is beyond the scope of this paper. We only briefly review the closely related works according to

Proposed algorithm

Recent advances in visual tracking demonstrate that CNN-based trackers achieve relatively higher performance and that CF-based trackers are remarkably efficient. The integration of the two is very attractive to researchers. However, how to exploit the potential of CNN features as fully as possible is a tough question. In this section, we will introduce the proposed algorithm, which consists of deep discriminative correlation filters learning, feature fusion based on

Experiments

To evaluate the proposed algorithm, we implement the tracker in mixed MATLAB and VC++ on an experimental platform with a CPU (Intel Xeon 2.4 GHz) and a GPU (GTX Titan X). We evaluate the proposed tracker on several recent visual tracking benchmark datasets and compare it with state-of-the-art trackers under one-pass evaluation (OPE). These trackers can be broadly categorized into four classes: (i) CNN-based trackers including CNN-SVM [16] and STCT [17]; (ii)

Discussion

In this paper, by integrating deep discriminative correlation filter learning with a novel fusion strategy, we propose a deep discriminative correlation tracking algorithm based on adaptive feature fusion. To further improve the tracking performance, we design an online fast scale estimation method. Moreover, we present a new adaptive and selective update mechanism to update both the discriminative correlation filters and scale correlation filters. The new update mechanism solves

Declaration of Competing Interest

None.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under grants #61703423, #61773396, and #41601436.

Wangsheng Yu was born in Hunan province, China. He received his M.S. and Ph.D. degrees, both in Communication and Information System, from the Air Force Engineering University (AFEU) in 2010 and 2014, respectively. He is currently a lecturer with the Information and Navigation College, Air Force Engineering University. His research interests include computer vision and image processing.

References (49)

  • Y. Zhang et al.

    Deep neural network for halftone image classification based on sparse auto-encoder

    Eng. Appl. Artif. Intell.

    (2016)
  • L. Zhang et al.

    Robust visual tracking via co-trained kernelized correlation filters

    Pattern Recog.

    (2017)
  • C. Szegedy et al.

    Deep neural networks for object detection

    Proceedings of the NIPS, Lake Tahoe

    (2013)
  • A. Eitel et al.

    Multimodal deep learning for robust RGB-D object recognition

    Proceedings of the IROS, Hamburg

    (2015)
  • J. Wang et al.

    CNN-RNN: A Unified Framework for Multi-label Image Classification

    Proceedings of the CVPR, Las Vegas

    (2016)
  • L.Z. Wang et al.

    Saliency Detection with Recurrent Fully Convolutional Networks

    Proceedings of the ECCV

    (2016)
  • G. Li et al.

    Visual saliency detection based on multiscale deep CNN features

    IEEE Trans. Image Process.

    (2016)
  • B. Hariharan et al.

    Hypercolumns for object segmentation and fine-grained localization

    Proceedings of the CVPR

    (2015)
  • C. Ma et al.

    Hierarchical convolutional features for visual tracking

    Proceedings of the ICCV, Santiago

    (2015)
  • P. Agrawal et al.

    Analyzing the performance of multilayer neural networks for object recognition

    Proceedings of the ECCV

    (2014)
  • D.S. Bolme et al.

    Visual object tracking using adaptive correlation filters

    Proceedings of the CVPR

    (2010)
  • V.N. Boddeti et al.

    Correlation filters for object alignment

    Proceedings of the CVPR

    (2013)
  • Y. Wu et al.

    Online object tracking: a benchmark

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • Y. Wu et al.

    Object tracking benchmark

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • P. Liang et al.

    Encoding color information for visual tracking: Algorithms and benchmark

    IEEE Trans. Image Process.

    (2015)
  • L. Wang, W. Ou-Yang, X. Wang, et al., Visual tracking with fully convolutional networks, Proceedings of the IEEE...
  • S. Hong, T. You, S. Kwak, et al., Online tracking by learning discriminative saliency map with convolutional neural...
  • L.J. Wang, W.L. Ouyang, X.G. Wang, et al., STCT: Sequentially training convolutional networks for visual tracking,...
  • H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking, Proceedings of the IEEE...
  • M. Kristan, R. Pflugfelder, J. Matas, et al., The visual object tracking VOT2015 challenge results, Proceedings of the...
  • J.F. Henriques, R. Caseiro, P. Martins, et al., Exploiting the circulant structure of tracking-by-detection with...
  • M. Danelljan, F.S. Khan, M. Felsberg, et al., Adaptive color attributes for real-time visual tracking, Proceedings of...
  • J.F. Henriques, R. Caseiro, P. Martins, et al., High-speed tracking with kernelized correlation filters, Proceedings of...
  • M. Danelljan, G. Hager, F.S. Khan, et al., Accurate scale estimation for robust visual tracking, Proceedings of the...