LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

In response to sensor performance degradation under complex environmental conditions and the low detection efficiency that traditional 3D object detection models suffer due to their inherent complexity, this paper proposes a lightweight 3D object detection model based on an improved REDFormer, dubbed "LRCFormer". First, to enhance computational efficiency and mitigate the vanishing-gradient problem, a piecewise linear activation function is introduced to optimize the radar backbone network; this network and the image encoder extract radar features and multi-scale image features, respectively. Second, we propose an improved spatio-temporal encoding fusion module, which replaces the traditional multi-head attention with a single-head attention mechanism and incorporates multi-scale pooling feature extraction into the temporal attention module, improving the efficiency of time-series processing. Furthermore, a multi-scale fusion network replaces the feed-forward network of the original encoder, effectively integrating features of different resolutions. Finally, the detection head performs the 3D object detection task. Experimental results on the nuScenes public dataset show that the model is not only smaller (13.2M fewer parameters than the baseline) but also more accurate than state-of-the-art (SOTA) models, with mean average precision (mAP) improved by 0.7% and the nuScenes detection score (NDS) by 1.8%. In rainy and nighttime scenarios in particular, mAP improves by 2% and 1.9%, and NDS by 0.5% and 4.5%, respectively.
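
To make the abstract's two efficiency ideas concrete, the following is a minimal PyTorch sketch of a single-head attention block combined with multi-scale pooled feature extraction over temporal tokens. The class name, tensor shapes, and the pooling scales (1, 2, 4) are illustrative assumptions for this page, not the authors' released implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleHeadTemporalAttention(nn.Module):
        """Sketch of single-head attention with multi-scale temporal pooling.

        Hypothetical stand-in for the module the abstract describes; the
        real LRCFormer layer is not shown on this page."""

        def __init__(self, dim: int, pool_scales=(1, 2, 4)):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)   # one head, so no head split/merge
            self.proj = nn.Linear(dim, dim)
            self.pool_scales = pool_scales       # assumed scales, not from the paper
            self.fuse = nn.Linear(dim * len(pool_scales), dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim) sequence of temporal BEV tokens
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            y = self.proj(attn @ v)              # single-head attention output

            # Average-pool the token axis at several scales, upsample back,
            # and fuse, mixing context at multiple temporal resolutions.
            t = y.transpose(1, 2)                # (batch, dim, tokens)
            pooled = []
            for s in self.pool_scales:
                p = t if s == 1 else F.interpolate(
                    F.avg_pool1d(t, kernel_size=s, stride=s),
                    size=t.shape[-1], mode="linear", align_corners=False)
                pooled.append(p)
            y = self.fuse(torch.cat(pooled, dim=1).transpose(1, 2))
            return x + y                         # residual connection

    # Usage: two samples, 8 temporal tokens, feature width 256
    block = SingleHeadTemporalAttention(dim=256)
    out = block(torch.randn(2, 8, 256))          # -> shape (2, 8, 256)

Compared with multi-head attention at the same feature width, the single head drops the per-head reshape and merge overhead, which is where the abstract's efficiency claim plausibly comes from; the pooling branch then restores some of the multi-resolution context mixing that multiple heads would otherwise provide.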


Data availability

No datasets were generated or analysed during the current study.


Author information


Contributions

X.H. designed the research methods and experiments, performed data processing and analysis, and participated in discussions and revisions. K.X. was involved in the study design and method selection, assisted in data processing and analysis, and wrote the first draft of the paper. Z.T. participated in the writing and revision of the paper.

Corresponding author

Correspondence to Kunqiang Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, X., Xu, K. & Tian, Z. LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection. SIViP 19, 51 (2025). https://doi.org/10.1007/s11760-024-03595-2


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11760-024-03595-2

Keywords