Elsevier

Neurocomputing

Volume 500, 21 August 2022, Pages 537-546

Teacher-student knowledge distillation for real-time correlation tracking

https://doi.org/10.1016/j.neucom.2022.05.064

Abstract

The performance of correlation filter (CF) based visual trackers has been greatly improved by pretrained deep convolutional neural networks. However, these networks limit the application scope of CF based trackers because of their high feature dimension, time-consuming feature extraction and large memory footprint. To alleviate this problem, we introduce a teacher-student knowledge distillation framework that produces a lightweight network to speed up CF based trackers. Specifically, we take a deep convolutional neural network pretrained on the image classification task as the teacher network and distill it into a lightweight student network. During the offline distillation training process, we propose an attention transfer loss to ensure that the lightweight student network maintains the feature representation of the large-capacity teacher network. Meanwhile, we propose a correlation tracking loss to transfer the student network from the image classification task to the correlation tracking task, which improves the discriminative ability of the student network. Experiments on OTB, VOT2017 and Temple Color show that, using the learned lightweight network as the feature extractor, a state-of-the-art CF based tracker achieves real-time speed on a single CPU while maintaining almost the same tracking performance.

Introduction

Visual tracking is a fundamental problem in computer vision with applications in many fields: given a target specified in the first frame, the tracker automatically follows it through a changing video sequence. Many methods [1], [2], [3], [4], [35], [54], [55], [56] have been proposed to address problems in visual tracking such as occlusion and slow running speed. Recently, correlation filter (CF) based trackers have attracted wide attention because of their computational efficiency in the Fourier domain. Raw deep convolutional neural networks trained for other tasks are generally used to extract the target feature representation for CF based trackers. Compared with traditional hand-crafted features (e.g., HOG [5]), deep convolutional features represent the target more effectively, and CF based trackers that use them achieve more robust and accurate results on several popular benchmarks [1], [2], [3], [4].

However, this gain in accuracy comes at the cost of a seriously reduced running speed, especially on resource-constrained platforms. The main reasons are: (1) The correlation filter itself takes more time. Because deep convolutional features are designed to cover general objects in large datasets such as ImageNet, they have a high dimension, and the computation time of the correlation filter grows with the feature dimension. (2) Feature extraction takes more time. Extracting the convolutional features of an image requires a large number of convolution operations. Furthermore, trackers that use a raw deep convolutional neural network as the feature extractor require a large amount of memory. For example, the original VGG-M [6] is used as the feature extractor by most CF based trackers [7], [8]; including the fully connected layers, the model size of VGG-M is about 369 MB.

Although GPUs can accelerate trackers to some extent, the practical application scope remains severely limited. In this work, we explore how to improve the running speed of CF based trackers that use raw deep convolutional neural networks. Our goal is to make the improved CF tracker run on a single CPU platform without significantly reducing its performance, thus providing insights into the application scope of CF based trackers. According to our observations, the running speed can be improved in two respects:

  • (1) Reducing the model capacity of the feature extraction network. A smaller-capacity deep convolutional neural network reduces the time consumed by target feature extraction and also reduces the memory occupied by the algorithm;

  • (2) Reducing the dimension of the extracted target features, which reduces the computation time of the correlation filter.
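
Both points can be seen in a minimal single-sample correlation filter. The sketch below is not the ECO tracker or the authors' implementation; it is a generic closed-form filter written with NumPy, and the function names, the label width `sigma`, the regulariser `lam` and the random stand-in features are illustrative assumptions. Training and detection each need one 2-D FFT per feature channel, so the work grows with the feature dimension D.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a Gaussian whose peak is moved to index (0, 0)."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    return np.fft.ifftshift(g)

def train_filter(feat, lam=1e-2):
    """Fit a filter to features of shape (D, H, W); returns its conjugate spectrum."""
    _, h, w = feat.shape
    F = np.fft.fft2(feat, axes=(-2, -1))          # D separate 2-D FFTs
    Y = np.fft.fft2(gaussian_label(h, w))
    energy = np.sum(np.abs(F) ** 2, axis=0)       # summed over the D channels
    return Y[None] * np.conj(F) / (energy[None] + lam)

def detect(filt_conj, feat):
    """Correlation response on new features; the peak gives the displacement."""
    Z = np.fft.fft2(feat, axes=(-2, -1))          # again D FFTs
    return np.real(np.fft.ifft2(np.sum(Z * filt_conj, axis=0)))

def locate(response):
    """Row and column of the response peak (circular shift of the target)."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

rng = np.random.default_rng(0)
template = rng.standard_normal((8, 32, 32))       # hypothetical D = 8 feature map
search = np.roll(template, shift=(3, 5), axis=(1, 2))
print(locate(detect(train_filter(template), search)))
```

In this toy setting the response peak lands at the circular shift applied to the template, (3, 5). Halving D halves the number of FFTs in both `train_filter` and `detect`, which is the effect that point (2) relies on.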

To this end, we introduce a teacher-student knowledge distillation training framework to obtain a lightweight convolutional neural network with a lower feature dimension, shorter feature extraction time and a smaller memory footprint. The lightweight model is then used as the feature extractor to speed up CF based trackers. Specifically, we take a deep convolutional neural network pretrained on the image classification task, namely VGG-M [6], as the teacher network, and design a lightweight convolutional neural network as the student network. In a general knowledge distillation training process, the student network is generated by compressing the teacher network and is applied to the same domain as the teacher network. In this work, the student network and the teacher network belong to two different domains, namely correlation tracking and image classification. To achieve model compression and reduce the differences between domains, we propose two loss functions to guide the training of the student network: the attention transfer loss (AT loss) and the correlation tracking loss (CT loss). The AT loss ensures that the lightweight student network maintains the feature representation of the large-capacity teacher network. The CT loss improves the discriminative ability of the student network and shifts it from the image classification task to the correlation tracking task, narrowing the gap between domains. Meanwhile, to enrich the feature representation of the student network, we carry out the distillation process jointly on shallow, middle and deep convolutional layers.
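
The exact form of the AT loss is not given in this excerpt. As a rough illustration only, the sketch below uses one common activation-based formulation of attention transfer: each feature map is collapsed over its channels into a normalised spatial attention map, and the student is penalised for deviating from the teacher's map. The function names, the exponent `p` and the tensor shapes are assumptions. Because the channels are summed out, the student may have far fewer channels than the teacher, which matches the goal of a lower feature dimension.

```python
import numpy as np

def attention_map(feat, p=2, eps=1e-12):
    """Collapse (C, H, W) activations into an L2-normalised map of length H*W."""
    amap = np.sum(np.abs(feat) ** p, axis=0).ravel()
    return amap / (np.linalg.norm(amap) + eps)

def attention_transfer_loss(student_feat, teacher_feat):
    """Squared L2 distance between the two normalised attention maps."""
    diff = attention_map(student_feat) - attention_map(teacher_feat)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
teacher = rng.standard_normal((96, 13, 13))   # hypothetical teacher layer output
student = rng.standard_normal((16, 13, 13))   # hypothetical student layer output
print(attention_transfer_loss(student, teacher))
```

For joint distillation on several layers, one such term per layer pair (shallow, middle, deep) could simply be summed; any per-layer weighting would be a further assumption.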

After offline training with the teacher-student knowledge distillation framework, we obtain a lightweight feature extraction network with a model size of about 1.3 MB. Compared with the teacher network size of 90 MB (excluding all fully connected layers), the student network is about 69 times smaller. When the trained lightweight student network is combined with a state-of-the-art correlation filter based tracker, namely ECO [7], the tracker achieves real-time running speed (26 FPS) on a CPU platform. Meanwhile, extensive experiments on popular benchmarks show that the proposed method maintains performance close to that of the original ECO.
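
The figures quoted above can be checked with a few lines of arithmetic. The snippet uses only numbers stated in the text (1.3 MB, 90 MB, 369 MB, 26 FPS); the variable names are illustrative.

```python
student_mb = 1.3        # distilled student network
teacher_conv_mb = 90.0  # VGG-M without fully connected layers
teacher_full_mb = 369.0 # VGG-M including fully connected layers
fps = 26.0              # reported CPU speed with ECO

compression = teacher_conv_mb / student_mb
full_ratio = teacher_full_mb / student_mb
frame_budget_ms = 1000.0 / fps

print(f"{compression:.1f}x smaller than the convolutional part of the teacher")
print(f"{full_ratio:.1f}x smaller than the full VGG-M model")
print(f"{frame_budget_ms:.1f} ms available per frame at {fps:.0f} FPS")
```

The ratio of roughly 69 matches the stated compression, and 26 FPS leaves about 38.5 ms per frame for feature extraction and filter computation together.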

We summarize our main contributions as follows:

  • (1) A new teacher-student knowledge distillation training framework is proposed to learn a lightweight network for CF based visual trackers. To train the lightweight network, we propose an attention transfer loss function and a correlation tracking loss function that jointly guide the training process of the student network.

  • (2) We propose to distill the lightweight student network through the attention transfer process and the correlation tracking process jointly on shallow, middle and high-level convolutional layers to enrich the feature representation of the student network.

  • (3) We combine the learned lightweight student network with a state-of-the-art CF based tracker [7]. The evaluation on four popular benchmarks shows that our method improves the running speed of the tracker on a CPU while maintaining almost the same tracking performance.

Section snippets

Related works

In this section, we briefly review work closely related to ours in three areas: correlation filters for visual tracking, real-time visual tracking based on deep learning, and knowledge distillation.

Proposed methods

The framework of the proposed teacher-student knowledge distillation is given in Fig. 1. In the following sub-sections, we introduce its network structure, the attention transfer training process, the correlation tracking training process, and the online correlation filter tracking process with the learned lightweight network.
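
The correlation tracking training process is not detailed in this excerpt. One plausible reading, sketched below under that assumption, is to fit a closed-form correlation filter on the student's features of a template patch, apply it to the features of a search patch, and penalise the difference between the response and a Gaussian centred on the true displacement. All names, `sigma`, `lam` and the random stand-in features are hypothetical, and the NumPy code shows only the forward computation; actual training would need a framework with automatic differentiation.

```python
import numpy as np

def gaussian_label(h, w, center=(0, 0), sigma=2.0):
    """Gaussian response with its peak at `center`, using circular distances."""
    dy = (np.arange(h) - center[0] + h // 2) % h - h // 2
    dx = (np.arange(w) - center[1] + w // 2) % w - w // 2
    return np.exp(-(dy[:, None] ** 2 + dx[None, :] ** 2) / (2.0 * sigma ** 2))

def correlation_tracking_loss(template_feat, search_feat, shift, lam=1e-2):
    """Mean squared error between the filter response and the ideal response."""
    _, h, w = template_feat.shape
    X = np.fft.fft2(template_feat, axes=(-2, -1))
    Z = np.fft.fft2(search_feat, axes=(-2, -1))
    Y = np.fft.fft2(gaussian_label(h, w))              # label used to fit the filter
    energy = np.sum(np.abs(X) ** 2, axis=0)
    filt_conj = Y[None] * np.conj(X) / (energy[None] + lam)
    response = np.real(np.fft.ifft2(np.sum(Z * filt_conj, axis=0)))
    target = gaussian_label(h, w, center=shift)         # where the target really is
    return float(np.mean((response - target) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 24, 24))                    # stand-in student features
z = np.roll(x, shift=(4, 6), axis=(1, 2))               # target moved by (4, 6)
print(correlation_tracking_loss(x, z, shift=(4, 6)))    # consistent label
print(correlation_tracking_loss(x, z, shift=(0, 0)))    # inconsistent label
```

The loss is close to zero when the response peaks at the true displacement and larger otherwise, so minimising it would push the feature extractor toward features that a correlation filter can localise well.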

Experiments

In this section, we first introduce the implementation details. Secondly, the results on OTB2013 [1], OTB2015 [2], VOT2017 [3] and Temple Color [4] demonstrate the effectiveness and robustness of our method. Finally, we conduct ablation experiments to analyze the contribution of each part of the tracker to its performance and the effectiveness of the network structure.

Conclusion

In this work, we propose to use a lightweight feature extraction network to improve the speed of CF based trackers by reducing the time consumed by both feature extraction and correlation filter learning. A highly compressed, lightweight feature extraction network is obtained by compressing and transferring a raw large-capacity teacher network from the image classification task. Extensive experiments show that our training strategy is effective. Although the obtained network is very

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the Project of Guangxi Science and Technology (No. 2022GXNSFDA035079 and GuiKeAD21075030), the National Natural Science Foundation of China (No. 61972167 and 62076214), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, and the Guangxi Talent Highland Project of Big Data Intelligence and Application.

References (56)

  • Y. Wu et al., Online object tracking: A benchmark, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • Y. Wu et al., Object Tracking Benchmark, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • M. Kristan et al., The visual object tracking VOT2017 challenge results
  • P. Liang et al., Encoding Color Information for Visual Tracking: Algorithms and Benchmark, IEEE Trans. Image Process. (2015)
  • P. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • K. Chatfield, K. Simonyan, A. Vedaldi, et al., Return of the devil in the details: Delving deep into convolutional nets,...
  • M. Danelljan, G. Bhat, F. Shahbaz Khan, et al., ECO: Efficient convolution operators for tracking, in: Proceedings of...
  • M. Danelljan et al., Convolutional Features for Correlation Filter Based Visual Tracking
  • D.S. Bolme et al., Visual object tracking using adaptive correlation filters, Twenty-third IEEE Conference on Computer Vision & Pattern Recognition, IEEE (2010)
  • Y. Bo, Z.Q. Ling, Optimal Control for Large-scale Descriptor Systems with Symmetric Circulant Structure, J....
  • J.F. Henriques et al., High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • M. Danelljan et al., Adaptive color attributes for real-time visual tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
  • C. Ma et al., Hierarchical Convolutional Features for Visual Tracking
  • Y. Qi, S. Zhang, L. Qin, et al., Hedged deep tracking, in: Proceedings of the IEEE conference on computer vision and...
  • Y. Li et al., A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration
  • M. Danelljan, G. Häger, F. Khan, et al., Accurate scale estimation for robust visual tracking, in: British Machine...
  • M. Danelljan et al., Learning spatially regularized correlation filters for visual tracking, Proceedings of the IEEE International Conference on Computer Vision (2015)
  • M. Mueller et al., Context-aware correlation filter tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • M. Tang et al., Multi-kernel correlation filter for visual tracking, Proceedings of the IEEE International Conference on Computer Vision (2015)
  • J. Choi et al., Attentional correlation filter network for adaptive visual tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • Y. Li et al., Reliable patch trackers: Robust visual tracking by exploiting reliable patches, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • T. Liu et al., Real-time part-based visual tracking via adaptive correlation filters, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • A. Bibi, M. Mueller, B. Ghanem, Target response adaptation for correlation filter tracking, in: European conference on...
  • Y. Sui et al., Real-time visual tracking: Promoting the robustness of correlation filter learning, European Conference on Computer Vision
  • M. Danelljan et al., Beyond correlation filters: Learning continuous convolution operators for visual tracking
  • H. Nam et al., Learning multi-domain convolutional neural networks for visual tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • Y. Song et al., CREST: Convolutional residual learning for visual tracking, Proceedings of the IEEE International Conference on Computer Vision (2017)
  • B. Li et al., High performance visual tracking with siamese region proposal network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

Qihuang Chen is currently a visiting researcher at Guangxi Normal University, Guilin, China. He received the M.S. degree from the School of Computer Science and Technology, Huaqiao University, in 2020. His research interests include computer vision and machine learning.

Bineng Zhong received the B.S., M.S., and Ph.D. degrees in computer science from the Harbin Institute of Technology, Harbin, China, in 2004, 2006, and 2010, respectively. From 2007 to 2008, he was a Research Fellow with the Institute of Automation and Institute of Computing Technology, Chinese Academy of Sciences. From September 2017 to September 2018, he was a visiting scholar at Northeastern University, Boston, MA, USA. From November 2010 to October 2020, he was a professor with the School of Computer Science and Technology, Huaqiao University, Xiamen, China. Currently, he is a professor with the School of Computer Science and Engineering, Guangxi Normal University, Guilin, China. His current research interests include pattern recognition, machine learning, and computer vision.

Qihua Liang received the B.S. degree in accounting from Xiamen University, Xiamen, China, in 2014. Currently, she is a teacher with the School of Computer Science and Engineering, Guangxi Normal University, Guilin, China. Her current research interests include computer vision and pattern recognition.

Deng Qingyong is an associate professor at the School of Computer Science and Engineering & School of Software, Guangxi Normal University, China. He received his master’s degree in Signal and Information Processing from Xiangtan University, China, in 2009 and his Ph.D. degree from Beijing University of Posts and Telecommunications (BUPT), China, in 2019. He has published more than 30 refereed journal papers in his current research interests, including IoT, AI and wireless networks. He is a member of IEEE and CCF.

Xianxian Li received the Ph.D. degree in computer science and technology from Beihang University, Beijing, China. He is currently a professor with the School of Computer Science and Engineering, Guangxi Normal University. His research interests include machine learning, data security, blockchain and distributed systems.
