Adaptive decision-level fusion and complementary mining for visual object tracking with deeper networks

Xiaoyan Meng; Yangzhou Chen; Le Xin

doi:10.1117/1.JEI.29.4.043024


Journal of Electronic Imaging, Vol. 29, Issue 4, 043024 (August 2020). https://doi.org/10.1117/1.JEI.29.4.043024

Abstract

Multiple region proposal networks (RPNs) have recently been combined with Siamese networks built on deeper backbones for tracking, and have shown excellent accuracy with high efficiency. Although a custom sampling strategy addresses the loss of strict translation invariance caused by network padding in the original ResNet-50, the effect of padding is not eliminated from the network structure itself, and the multilayer feature fusion is insufficient. To this end, we propose an object tracking framework based on SiamRPN with deeper backbone networks and a cascaded RPN (D-CRPN). First, we use cropping-inside residual units to reform ResNet-50, breaking the spatial invariance restriction and training robust backbone networks for visual tracking. Then, feature transfer blocks are proposed to effectively integrate the outputs of multiple blocks within a specific network stage. Finally, to improve the robustness of our tracker, we present a quality measure for the synthetic response maps of the RPN modules and use it to calculate the adaptive weights for the linear weighting method. Extensive evaluation on the OTB100, VOT2016, and VOT2018 benchmark datasets demonstrates that the proposed D-CRPN tracker outperforms most state-of-the-art approaches while maintaining real-time tracking speed.
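The adaptive linear weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not define the quality measure, so the code below substitutes the average peak-to-correlation energy (APCE), a response-map confidence score used elsewhere in the tracking literature; this is an assumption, not the measure proposed in the paper. The function names `apce`, `adaptive_weights`, and `fuse` are hypothetical, and response maps are represented as plain nested lists rather than network outputs.

```python
from typing import List

Map = List[List[float]]  # one 2-D response map


def apce(response: Map) -> float:
    """Stand-in quality score: (max - min)^2 / mean((v - min)^2).

    A single sharp peak scores high; a map with many strong
    secondary responses scores low. Assumed here, not taken
    from the paper.
    """
    values = [v for row in response for v in row]
    lo, hi = min(values), max(values)
    energy = sum((v - lo) ** 2 for v in values) / len(values)
    return (hi - lo) ** 2 / energy if energy > 0 else 0.0


def adaptive_weights(responses: List[Map]) -> List[float]:
    """Normalize the quality scores so the weights sum to 1."""
    scores = [apce(r) for r in responses]
    total = sum(scores)
    if total == 0:
        # Every map is constant: fall back to uniform weights.
        return [1.0 / len(responses)] * len(responses)
    return [s / total for s in scores]


def fuse(responses: List[Map]) -> Map:
    """Linear weighting: element-wise weighted sum of same-sized maps."""
    weights = adaptive_weights(responses)
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [sum(w * r[i][j] for w, r in zip(weights, responses))
         for j in range(cols)]
        for i in range(rows)
    ]
```

Under this stand-in score, a map with one clean peak receives a larger weight than a map whose peak is surrounded by strong distractor responses, so the fused map leans toward the more reliable RPN output.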

Citation

Xiaoyan Meng, Yangzhou Chen, and Le Xin "Adaptive decision-level fusion and complementary mining for visual object tracking with deeper networks," Journal of Electronic Imaging 29(4), 043024 (25 August 2020). https://doi.org/10.1117/1.JEI.29.4.043024

Received: 30 April 2020; Accepted: 6 August 2020; Published: 25 August 2020


JOURNAL ARTICLE
17 PAGES


KEYWORDS

Optical tracking

Mining

Video

Visualization

Detection and tracking algorithms

Feature extraction

Convolution
