Abstract:
Recent studies have demonstrated the superior tracking ability of Transformers in RGBT tracking, owing to their global and dynamic modeling properties. However, existing Transformer-based trackers pay insufficient attention to primary feature information and are susceptible to interference from background information. In addition, they often focus on either modality-shared or modality-specific information, but fail to adequately exploit the two jointly. To address these issues, a sparse trifurcate Transformer aggregation network is proposed in this article to enhance tracking robustness. First, a trifurcate tree structure is designed to obtain both modality-shared and modality-specific information, enabling more powerful feature representations to be learned. Second, a sparse attention mechanism is adopted in the Transformer to focus on important features. Third, to fully mine complementary multimodal information, a confidence-aware aggregation network is designed to generate reliability weights for each modality. Finally, a double-head network is introduced to locate the target. Extensive experimental results on multiple RGBT benchmarks, including GTOT, RGBT210, RGBT234, and LasHeR, verify its superior tracking ability against other advanced trackers.
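The abstract does not detail how the sparse attention mechanism selects important features. As an illustration only, here is a minimal sketch of one common form of sparse attention, top-k attention, where each query attends only to its k highest-scoring keys; the function name and the top-k formulation are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Single-head attention that keeps only the top-k scores per query,
    masking the rest to -inf before softmax so they receive zero weight."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) scaled dot products
    # Threshold: the k-th largest score in each row
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving scores (exp(-inf) -> 0)
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
out, w = topk_sparse_attention(Q, K, V, k=2)
```

The intended effect matches the abstract's motivation: by zeroing attention to low-scoring (likely background) positions, the model concentrates its capacity on the most relevant features.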
Published in: IEEE Transactions on Instrumentation and Measurement (Volume: 73)