LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

In response to sensor performance degradation under complex environmental conditions and the low detection efficiency that traditional 3D object detection models suffer due to their inherent complexity, this paper proposes a lightweight 3D object detection model based on an improved REDFormer, dubbed "LRCFormer". First, to enhance computational efficiency and mitigate the vanishing-gradient problem, a piecewise linear activation function is introduced to optimize the radar backbone network; this network and the image encoder extract radar features and multi-scale image features, respectively. Second, we propose an improved spatio-temporal encoding fusion module, which replaces the traditional multi-head attention with a single-head attention mechanism and incorporates multi-scale pooling feature extraction into the temporal attention module, improving the efficiency of time-series processing. Furthermore, a multi-scale fusion network replaces the feed-forward network of the original encoder, effectively integrating features of different resolutions. Finally, the detection head performs the 3D object detection task. Experimental results on the nuScenes public dataset show that the model is not only smaller (13.2M fewer parameters than the baseline) but also more accurate than state-of-the-art (SOTA) models, with mean average precision (mAP) improved by 0.7% and the nuScenes detection score (NDS) by 1.8%. In rainy and nighttime scenarios in particular, mAP improves by 2% and 1.9%, and NDS by 0.5% and 4.5%, respectively.
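
To make the abstract's two efficiency ideas concrete, the following is a minimal PyTorch sketch of a single-head attention block combined with multi-scale pooled feature extraction over temporal tokens. The class name, tensor shapes, and the pooling scales (1, 2, 4) are illustrative assumptions for this page, not the authors' released implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleHeadTemporalAttention(nn.Module):
        """Sketch of single-head attention with multi-scale temporal pooling.

        Hypothetical stand-in for the module the abstract describes; the
        real LRCFormer layer is not shown on this page."""

        def __init__(self, dim: int, pool_scales=(1, 2, 4)):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)   # one head, so no head split/merge
            self.proj = nn.Linear(dim, dim)
            self.pool_scales = pool_scales       # assumed scales, not from the paper
            self.fuse = nn.Linear(dim * len(pool_scales), dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim) sequence of temporal BEV tokens
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            y = self.proj(attn @ v)              # single-head attention output

            # Average-pool the token axis at several scales, upsample back,
            # and fuse, mixing context at multiple temporal resolutions.
            t = y.transpose(1, 2)                # (batch, dim, tokens)
            pooled = []
            for s in self.pool_scales:
                p = t if s == 1 else F.interpolate(
                    F.avg_pool1d(t, kernel_size=s, stride=s),
                    size=t.shape[-1], mode="linear", align_corners=False)
                pooled.append(p)
            y = self.fuse(torch.cat(pooled, dim=1).transpose(1, 2))
            return x + y                         # residual connection

    # Usage: two samples, 8 temporal tokens, feature width 256
    block = SingleHeadTemporalAttention(dim=256)
    out = block(torch.randn(2, 8, 256))          # -> shape (2, 8, 256)

Compared with multi-head attention at the same feature width, the single head drops the per-head reshape and merge overhead, which is where the abstract's efficiency claim plausibly comes from; the pooling branch then restores some of the multi-resolution context mixing that multiple heads would otherwise provide.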


Data availability

No datasets were generated or analysed during the current study.


Author information


Contributions

X.H. designed the research methods and experiments, performed data processing and analysis, and participated in discussions and revisions. K.X. was involved in the study design and method selection, assisted in data processing and analysis, and wrote the first draft of the paper. Z.T. participated in the writing and revision of the paper.

Corresponding author

Correspondence to Kunqiang Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, X., Xu, K. & Tian, Z. LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection. SIViP 19, 51 (2025). https://doi.org/10.1007/s11760-024-03595-2


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11760-024-03595-2

Keywords