Abstract:
Current point-based trackers typically use two branches: a classification branch that predicts candidate target locations and a regression branch that regresses the tracking box. This design can cause spatial misalignment between the two tasks. These trackers also leave two questions largely unexplored: how to define positive and negative samples during training, and how to use explicit border information for accurate box prediction. In this work, we investigate these key issues of point-based trackers and address the resulting limitations. First, we design a novel task-aligned component and a new loss function, named task-aligned loss, to learn the alignment of the classification and regression tasks. Second, we introduce a border alignment (BorderAlign) component in both the classification and regression branches to effectively exploit the border features of the tracking target. Third, we develop an adaptive training sample assignment (ATSA) strategy that divides positive and negative samples based on the statistical characteristics of the tracking object. Finally, we develop a deformable transformer to enhance the representations of search features and explore rich temporal contexts among video frames. Extensive experimental results demonstrate that the proposed tracker achieves state-of-the-art performance on six tracking benchmark datasets.
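The abstract does not specify how ATSA computes its statistics, so the following is only a minimal sketch of one common statistics-based assignment rule, not the paper's method. It assumes that each candidate box is scored by its IoU with the target box and that the positive/negative threshold is the mean plus the standard deviation of those IoUs. The function names `iou` and `assign_samples` and the example boxes are hypothetical.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_samples(candidate_boxes, target_box):
    """Label each candidate as positive (True) or negative (False).

    Assumed rule (not taken from the paper): the threshold adapts to the
    target as mean + standard deviation of the candidate IoUs.
    """
    if not candidate_boxes:
        return [], 0.0
    ious = [iou(box, target_box) for box in candidate_boxes]
    threshold = mean(ious) + pstdev(ious)
    return [value >= threshold for value in ious], threshold


# Hypothetical target and candidate boxes for illustration only.
target = (10, 10, 50, 50)
candidates = [
    (10, 10, 50, 50),    # exact overlap with the target
    (12, 12, 52, 52),    # slightly shifted
    (30, 30, 70, 70),    # partial overlap
    (60, 60, 100, 100),  # no overlap
]
labels, thr = assign_samples(candidates, target)
```

Because the threshold is derived from the candidates' own IoU distribution, it moves with the size and shape of the target instead of being a fixed constant, which is the general idea behind assigning samples adaptively.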
Published in: IEEE Transactions on Multimedia (Volume: 25)