MCHFormer: A Multi-Cross Hybrid Former of Point-Image for 3D Object Detection | IEEE Journals & Magazine | IEEE Xplore

Abstract:

A mismatch between local and global information often arises in multimodal data during downscaling transformations, causing localization information to be lost. A Multi-Cross Hybrid Former (MCHFormer) of point-image is proposed for 3D object detection in autonomous driving; it cross-fuses LiDAR and camera data at multiple levels. Specifically, features are first extracted from the voxelized point cloud by a Dual-Stream Feature Extraction (DSFE) network. Local fine-grained area information is integrated into the global feature information, yielding a multi-layer Bird's Eye View (BEV) representation. Meanwhile, the raw coordinates of the points are incorporated into point-wise features through position coding. The point features are then projected onto the image and BEV features to obtain highly coupled multimodal information, aligning the point cloud with the image. Finally, a multi-cross Transformer fuses the unimodal features into a hybrid representation with greater spatial awareness, enabling accurate 3D object detection. Extensive comparative experiments against other State-Of-The-Art (SOTA) algorithms are conducted on the KITTI, nuScenes, and Waymo datasets and in real road scenes. The results show that the proposed algorithm achieves better accuracy and generalization capability and also detects objects accurately in real road scenarios.
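The position-coding and projection steps described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the sinusoidal position code, the 3x4 LiDAR-to-pixel projection matrix, nearest-neighbour sampling, the BEV grid ranges, and all function names (`sinusoidal_position_code`, `project_to_image`, `to_bev_cell`, `gather`, `fuse_point`) are assumptions chosen for illustration. It also simply concatenates the per-point features instead of modelling the multi-cross Transformer.

```python
import math
from typing import List, Optional, Sequence, Tuple

FeatureMap = Sequence[Sequence[Sequence[float]]]  # rows x cols x channels


def sinusoidal_position_code(xyz: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Encode raw (x, y, z) with sin/cos at frequencies 1, 2, 4, ...

    Assumption: a NeRF-style sinusoidal code; the paper's position coding may differ.
    """
    code: List[float] = []
    for c in xyz:
        for k in range(num_freqs):
            code.append(math.sin((2.0 ** k) * c))
            code.append(math.cos((2.0 ** k) * c))
    return code


def project_to_image(xyz: Sequence[float],
                     P: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a LiDAR point to pixel (u, v) with an assumed 3x4 matrix; None if behind the camera."""
    hom = (xyz[0], xyz[1], xyz[2], 1.0)
    u_h, v_h, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
    if w <= 1e-6:
        return None
    return u_h / w, v_h / w


def to_bev_cell(xyz, x_range, y_range, cell) -> Optional[Tuple[int, int]]:
    """Map a point to a (row, col) BEV grid cell; None if it lies outside the grid."""
    x, y = xyz[0], xyz[1]
    if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
        return None
    return int((x - x_range[0]) / cell), int((y - y_range[0]) / cell)


def gather(fmap: FeatureMap, row: int, col: int, channels: int) -> List[float]:
    """Nearest-neighbour lookup; returns zeros when the location is outside the map."""
    if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
        return [float(v) for v in fmap[row][col]]
    return [0.0] * channels


def fuse_point(xyz, point_feat, image_feats: FeatureMap, bev_feats: FeatureMap, P,
               x_range=(0.0, 70.4), y_range=(-40.0, 40.0), cell=0.4,
               num_freqs=4) -> List[float]:
    """Concatenate a point feature, its position code, and the image/BEV features it lands on."""
    img_c, bev_c = len(image_feats[0][0]), len(bev_feats[0][0])
    uv = project_to_image(xyz, P)
    img_f = gather(image_feats, round(uv[1]), round(uv[0]), img_c) if uv else [0.0] * img_c
    rc = to_bev_cell(xyz, x_range, y_range, cell)
    bev_f = gather(bev_feats, rc[0], rc[1], bev_c) if rc else [0.0] * bev_c
    return list(point_feat) + sinusoidal_position_code(xyz, num_freqs) + img_f + bev_f


# Toy setup: LiDAR frame (x forward, y left, z up), pinhole with f=10 and principal point (2, 2).
P = [[2.0, -10.0, 0.0, 0.0], [2.0, 0.0, -10.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
image = [[[float(r), float(c)] for c in range(5)] for r in range(5)]  # 5x5x2 image features
bev = [[[float(10 * r + c)] for c in range(10)] for r in range(10)]    # 10x10x1 BEV features
fused = fuse_point((10.0, 0.0, 0.0), [0.5], image, bev, P,
                   x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=4.0)
print(len(fused), fused[-3:])  # 28 [2.0, 2.0, 25.0]
```

Points that fall behind the camera or outside the BEV grid receive zero-filled image or BEV features in this sketch; a real pipeline would more likely use bilinear sampling and learned fusion.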
Published in: IEEE Transactions on Intelligent Vehicles (Volume: 9, Issue: 1, January 2024)
Page(s): 383 - 394
Date of Publication: 10 October 2023
