Abstract:
In this paper, we propose the novel complementary Siamese networks (CoSNet) for visual tracking, which exploit complementary global and local representations to learn a matching function. Specifically, the proposed CoSNet is two-fold: a global Siamese network (GSNet) and a local Siamese network (LSNet). The GSNet aims to match the target with candidates using a holistic representation. By contrast, the LSNet explores partial object representations for matching. Instead of simply decomposing the object into regular patches in the LSNet, we propose a novel attentional local part network, which automatically generates salient object parts for local representation and adaptively weights each part according to its importance in matching. In CoSNet, the GSNet and LSNet are jointly trained in an end-to-end manner. By coupling two complementary Siamese networks, our CoSNet learns a robust matching function that can effectively handle various appearance changes in visual tracking. Extensive experiments on a large-scale dataset with 100 sequences show that CoSNet outperforms other state-of-the-art trackers.
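The abstract describes fusing a holistic (GSNet) matching score with an attention-weighted sum of local part scores (LSNet). The sketch below is purely illustrative and not the authors' implementation: the function names (`cosnet_score`, `softmax`), the cosine similarity metric, the softmax part weighting, and the equal-weight fusion of the two branches are all assumptions used to make the idea concrete.

```python
import math
import random

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def softmax(logits):
    # Turn unnormalized part-importance scores into weights summing to 1,
    # loosely mirroring the adaptive part weighting described in the abstract.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cosnet_score(target_global, cand_global, target_parts, cand_parts, part_logits):
    # Hypothetical fusion of the two branches: the global branch matches
    # holistic features, the local branch matches attention-weighted parts.
    g = cosine(target_global, cand_global)
    w = softmax(part_logits)
    l = sum(wi * cosine(tp, cp) for wi, tp, cp in zip(w, target_parts, cand_parts))
    return 0.5 * g + 0.5 * l  # assumed equal-weight fusion of the branches

random.seed(0)
t = [random.gauss(0, 1) for _ in range(64)]
other = [random.gauss(0, 1) for _ in range(64)]
parts_t = [t[:32], t[32:]]
score_same = cosnet_score(t, t, parts_t, parts_t, [1.0, 2.0])
score_diff = cosnet_score(t, other, parts_t, [other[:32], other[32:]], [1.0, 2.0])
print(round(score_same, 3))   # a perfect match scores 1.0
print(score_same > score_diff)
```

In a real tracker the feature vectors would come from the learned Siamese branches and the part logits from the attentional local part network; here random vectors merely show that an identical target scores higher than an unrelated candidate.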
Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 12-17 May 2019
Date Added to IEEE Xplore: 17 April 2019