Abstract:
Recently, there has been a surge of interest in one-shot methods for multi-object tracking (MOT). These methods use a single network to produce object detection results and embedding features simultaneously, balancing accuracy and speed. However, it is hard for the backbone network to extract high-quality features in complex scenes, such as those with cluttered backgrounds or occlusions. In addition, most methods rely on the same rules to fuse appearance and motion information during the data association phase, which may fail when the target is briefly obscured or lost. In this work, we propose FSTrack, a novel multi-object tracker that aims to address these challenges. It incorporates feature enhancement and similarity estimation techniques to improve model performance. Specifically, we introduce the efficient channel attention module into the backbone network to facilitate information interaction between channels and enhance representation capability. Furthermore, we propose a novel similarity matrix that combines the appearance distance and the DIoU distance of the target, resulting in superior association accuracy and fewer identity switches. Experimental results on the MOT17 and MOT20 benchmarks demonstrate the superiority of our approach, especially its significantly improved tracking continuity.
Published in: IEEE Signal Processing Letters (Volume: 31)
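
To make the similarity matrix described in the abstract concrete, the following is a minimal sketch of how an appearance distance and a DIoU distance might be combined into one association cost matrix. The abstract does not state the fusion rule, so the weighted sum, the weight `lam`, the use of cosine distance for appearance, and all function names here are illustrative assumptions, not the authors' formulation. The DIoU term itself follows the standard definition: IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box.

```python
import math

def diou_distance(a, b):
    """Return 1 - DIoU for two boxes given as (x1, y1, x2, y2); range [0, 2]."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection and union areas give the plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    penalty = d2 / c2 if c2 > 0 else 0.0
    return 1.0 - (iou - penalty)

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat a zero embedding as uninformative
    return 1.0 - dot / (nu * nv)

def fused_cost_matrix(track_boxes, track_embs, det_boxes, det_embs, lam=0.5):
    """Rows are tracks, columns are detections.

    ASSUMPTION: a fixed weighted sum with hypothetical weight `lam`;
    the paper's actual fusion rule is not given in the abstract.
    """
    return [
        [lam * cosine_distance(te, de) + (1.0 - lam) * diou_distance(tb, db)
         for db, de in zip(det_boxes, det_embs)]
        for tb, te in zip(track_boxes, track_embs)
    ]
```

A matrix built this way could be passed to an assignment solver such as `scipy.optimize.linear_sum_assignment` to match tracks to detections. Unlike plain IoU distance, the DIoU term still separates non-overlapping boxes by how far apart their centers are, which is one plausible reason it helps when a target is briefly lost.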