Abstract:
The mainstream tracking-by-detection paradigm for multi-object tracking generally conducts detection first, followed by Re-IDentification (Re-ID) and motion estimation. The predicted boxes are then associated with existing tracks using visual and motion cues. However, challenges such as irregular motion patterns, similar appearances, and frequent occlusions often arise, making object tracking a nontrivial task. In this article, we propose a multi-object tracker based on Spatio-TemporAl Topological (STAT) constraints to address these issues. More specifically, we design the Feature Adaptive Association Module (FAAM) to establish the association between motion and appearance regionally, yielding a complementary combination of appearance and motion features. Among these components, the Appearance Feature Update Module (AFUM) manages the appearance updates of tracked objects by imposing constraints based on spatial location and the degree of object occlusion, while temporal consistency is used to smooth the appearance states of tracks and mitigate the accumulation of appearance noise. Moreover, the Robust Motion Tracking Module (RMTM) is established to reduce the impact of irregular motions and certain unreliable detection results. This module includes a more heavily weighted momentum term to accommodate large motion amplitudes, and it considers low-confidence boxes alongside a stage-wise association strategy for high-confidence boxes. Extensive experiments on DanceTrack and benchmark MOT datasets verify the effectiveness of our STAT tracker, which achieves state-of-the-art results on DanceTrack, a dataset characterized by irregular motion and indistinguishable appearance attributes.
Published in: IEEE Transactions on Multimedia (Volume: 26)