Abstract:
3D object detection plays a key role in the perception system of intelligent vehicles. The reliable 3D structural information provided by LiDAR points enables accurate regression of position and pose, but the semantic ambiguity caused by sparse points remains challenging. In this article, a scalable 3D object detection pipeline, CenterSFA, and a series of new modules are proposed to improve detection performance. In contrast to previous point-level fusion models, semantic and geometric cues from images are utilized sequentially in a center-based paradigm. The object centers are accurately predicted with semantic guidance and selectively employed as the basis for feature aggregation and property regression. Specifically, an attention mechanism is used to calculate semantic and spatial similarity, enabling the aggregation of surrounding features for multi-scale objects. An instance-level correlation is established between the camera feature and the BEV feature for cross-modal feature aggregation. Extensive experiments are conducted on the large-scale nuScenes dataset to verify the state-of-the-art performance of the proposed model, especially for occluded objects and far-range detection. On the validation set, the proposed model outperforms the competitive CenterPoint by 10.4% in mAP and 5.4% in NDS, and the representative fusion method MVP by 2.8% in mAP and 1.6% in NDS, indicating its superiority in accurate 3D detection.
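The attention-based aggregation around object centers can be illustrated with a minimal sketch. This is not the CenterSFA implementation: the function name `aggregate_around_center`, the square window, the scaled dot-product semantic score, and the Gaussian distance penalty are all assumptions chosen only to show how semantic and spatial similarity might jointly weight the features surrounding a predicted center in a BEV map.

```python
import numpy as np

def aggregate_around_center(bev, center_rc, radius=2, sigma=1.5):
    """Attention-weighted aggregation of BEV features around one center.

    bev:       (H, W, C) feature map.
    center_rc: (row, col) integer cell of a predicted object center.
    radius, sigma: hypothetical window size and spatial falloff.
    """
    H, W, C = bev.shape
    r0, c0 = center_rc
    query = bev[r0, c0]  # the center feature acts as the query

    # Square window around the center, clipped at the map border.
    rows = np.arange(max(r0 - radius, 0), min(r0 + radius + 1, H))
    cols = np.arange(max(c0 - radius, 0), min(c0 + radius + 1, W))
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    keys = bev[rr, cc].reshape(-1, C)

    # Semantic similarity: scaled dot product (an assumption).
    semantic = keys @ query / np.sqrt(C)
    # Spatial similarity: Gaussian penalty on squared cell distance (an assumption).
    dist2 = ((rr - r0) ** 2 + (cc - c0) ** 2).reshape(-1)
    spatial = -dist2 / (2.0 * sigma ** 2)

    # Softmax over the window, shifted for numerical stability.
    logits = semantic + spatial
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()

    return weights @ keys, weights
```

In this sketch a larger `radius` would let the same routine cover larger objects, which is one simple way to picture aggregation for multi-scale objects; the paper's actual mechanism may differ.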
Published in: IEEE Transactions on Intelligent Vehicles (Volume: 9, Issue: 1, January 2024)