mmMCL3DMOT: Multi-Modal Momentum Contrastive Learning for 3D Multi-Object Tracking


Abstract:

3D multi-object tracking methods utilize object motion, image, and point cloud information to compute similarities between objects, facilitating cross-frame data association. In this letter, we propose a novel approach called mmMCL3DMOT that computes object appearance similarity via multi-modal momentum contrastive self-supervised learning. We introduce three key techniques. First, a self-supervised training paradigm takes image, point cloud, and existing 3D detection inputs, enabling multi-modal feature extraction without manual annotation. Second, our feature learning combines intra-modal and cross-modal feature correspondences across the image and point cloud modalities, yielding more discriminative features through momentum contrast. Finally, by computing similarity from the multi-modal features and incorporating a robust motion metric, we perform joint cascade reasoning for object association, leading to high-performance 3D MOT. Extensive experiments demonstrate the effectiveness of our method, and our tracker achieves state-of-the-art (SOTA) performance on both the KITTI and nuScenes datasets.
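The momentum-contrast objective the abstract refers to can be sketched as follows. This is a minimal MoCo-style illustration, not the paper's implementation: the function names, the momentum coefficient `m`, and the temperature `tau` are illustrative assumptions. A key encoder is updated as an exponential moving average of the query encoder, and an InfoNCE loss pulls a query feature toward its positive key (e.g. the same object in another modality or frame) and away from negative keys.

```python
import numpy as np

def momentum_update(theta_q, theta_k, m=0.999):
    """EMA update of the key encoder's parameters from the query encoder's."""
    return m * theta_k + (1.0 - m) * theta_q

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query.

    q      : (d,)   query feature (e.g. image-branch embedding of an object)
    k_pos  : (d,)   positive key (e.g. point-cloud-branch embedding, same object)
    k_negs : (N, d) negative keys (embeddings of other objects)
    """
    # cosine similarity via L2 normalization
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_negs = k_negs / np.linalg.norm(k_negs, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k_pos], k_negs @ q]) / tau
    logits -= logits.max()  # numerical stability
    # cross-entropy with the positive in slot 0
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

The loss is small when the positive pair is far more similar than any negative, which is exactly the regime a discriminative cross-modal appearance embedding should reach after training.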
Published in: IEEE Signal Processing Letters (Volume: 31)
Page(s): 1895 - 1899
Date of Publication: 19 July 2024
