Adaptive decision-level fusion and complementary mining for visual object tracking with deeper networks
25 August 2020
Xiaoyan Meng, Yangzhou Chen, Le Xin
Abstract

Multiple region proposal networks (RPNs) have recently been combined with Siamese networks built on deeper backbones for tracking, and have shown excellent accuracy with high efficiency. Although a custom sampling strategy addresses the loss of strict translation invariance caused by network padding in the original ResNet-50, the effect of padding is not removed from the network structure itself, and the multilayer feature fusion is insufficient. To this end, we propose an object tracking framework based on SiamRPN with deeper backbone networks and cascaded RPN (D-CRPN). First, we use cropping-inside residual units to reform ResNet-50, which breaks the spatial invariance restriction and lets us train robust backbone networks for visual tracking. Then, we propose feature transfer blocks to integrate the outputs of multiple blocks in a specific network stage effectively. Finally, to improve the robustness of our tracker, we present a quality measure for the synthetic response maps of the RPN modules and use it to calculate adaptive weights for the linear weighting method. Extensive evaluation on the OTB100, VOT2016, and VOT2018 benchmark datasets demonstrates that the proposed D-CRPN tracker outperforms most state-of-the-art approaches while maintaining real-time tracking speed.
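
The adaptive linear weighting step can be illustrated with a minimal sketch. The abstract does not define the quality measure, so the score below (how far the peak of a response map stands above its mean, in standard deviations) is an assumption chosen for illustration, and the names `quality`, `adaptive_weights`, and `fuse` are hypothetical rather than taken from the paper.

```python
from statistics import mean, pstdev


def quality(response_map, eps=1e-8):
    """Assumed quality score (not the paper's definition):
    peak height above the map mean, in units of standard deviation."""
    values = [v for row in response_map for v in row]
    return (max(values) - mean(values)) / (pstdev(values) + eps)


def adaptive_weights(response_maps):
    """Normalize the per-map quality scores so the weights sum to 1."""
    scores = [quality(m) for m in response_maps]
    total = sum(scores)
    if total == 0.0:
        # All maps are flat: fall back to uniform weights.
        return [1.0 / len(response_maps)] * len(response_maps)
    return [s / total for s in scores]


def fuse(response_maps):
    """Linear weighting: element-wise weighted sum of same-sized maps."""
    weights = adaptive_weights(response_maps)
    rows, cols = len(response_maps[0]), len(response_maps[0][0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, response_maps))
         for c in range(cols)]
        for r in range(rows)
    ]
    return fused, weights


# Two toy 3x3 response maps: one sharply peaked, one diffuse.
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
diffuse = [[0.4, 0.5, 0.4], [0.5, 0.6, 0.5], [0.4, 0.5, 0.4]]
fused_map, fusion_weights = fuse([sharp, diffuse])
```

Under this assumed score, the sharply peaked map receives the larger weight, so a confident RPN output dominates the fused response while a diffuse one contributes less.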

© 2020 SPIE and IS&T 1017-9909/2020/$28.00
Xiaoyan Meng, Yangzhou Chen, and Le Xin "Adaptive decision-level fusion and complementary mining for visual object tracking with deeper networks," Journal of Electronic Imaging 29(4), 043024 (25 August 2020). https://doi.org/10.1117/1.JEI.29.4.043024
Received: 30 April 2020; Accepted: 6 August 2020; Published: 25 August 2020
KEYWORDS
Optical tracking
Mining
Video
Visualization
Detection and tracking algorithms
Feature extraction
Convolution
