Leveraging Local and Global Cues for Visual Tracking via Parallel Interaction Network | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Although both local and context information are crucial for robust tracking, existing CNN-based and transformer-based methods mainly focus on only one of these aspects. Consequently, the former fail to exploit rich global context information because of their limited receptive field, while the latter are weak at modeling local relationships among neighboring regions. To address this issue, we propose the SiamPIN tracker, based on our Parallel Interaction Network. It consists of two modules, the Global Aggregation Block (GAB) and the Local Process Block (LPB). GAB perceives the global context and captures long-range spatial dependencies through a transformer-based architecture. Meanwhile, LPB extracts local information using a CNN model to retain the detailed appearance information of the target. The two modules are connected consecutively to form a Trans-Conv unit block, which passes the global context information into the local feature extraction procedure, thereby enabling interaction between the global and local information flows. Several such blocks are cascaded so that the model learns to aggregate local and context information interactively. The proposed tracker achieves state-of-the-art performance on six benchmark datasets while maintaining real-time running speed.
Page(s): 1671 - 1683
Date of Publication: 10 October 2022
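
The block structure described in the abstract (a global step followed by a local step, repeated several times) can be illustrated with a toy sketch. This is not the SiamPIN implementation: it assumes a 1-D sequence of feature vectors instead of 2-D feature maps, uses parameter-free self-attention as a stand-in for GAB and a fixed 3-tap smoothing kernel as a stand-in for LPB, and all function names are hypothetical.

```python
import math


def global_aggregation(tokens):
    """Stand-in for GAB (assumption): single-head self-attention with
    identity projections, followed by a residual connection."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for query in tokens:
        # Scaled dot-product score of this position against every position.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed = [
            sum(w * value[d] for w, value in zip(weights, tokens)) / total
            for d in range(dim)
        ]
        out.append([x + m for x, m in zip(query, mixed)])
    return out


def local_process(tokens, kernel=(0.25, 0.5, 0.25)):
    """Stand-in for LPB (assumption): fixed 3-tap convolution along the
    sequence with replicated edges, followed by a residual connection."""
    n = len(tokens)
    dim = len(tokens[0])
    out = []
    for i in range(n):
        left = tokens[max(i - 1, 0)]
        right = tokens[min(i + 1, n - 1)]
        conv = [
            kernel[0] * left[d] + kernel[1] * tokens[i][d] + kernel[2] * right[d]
            for d in range(dim)
        ]
        out.append([x + c for x, c in zip(tokens[i], conv)])
    return out


def trans_conv_unit(tokens):
    """One unit: the global step runs first, so the local step operates on
    features that already carry global context."""
    return local_process(global_aggregation(tokens))


def cascade_units(tokens, num_units=3):
    """Apply several units in sequence (num_units is an illustrative value)."""
    for _ in range(num_units):
        tokens = trans_conv_unit(tokens)
    return tokens
```

In this sketch, changing a distant position alters the output of `global_aggregation` at every position, whereas `local_process` alone only reacts to immediate neighbors. Chaining the two is what lets the local step see global context.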
