
Radar-camera fusion for 3D object detection with aggregation transformer


Abstract

With the rapid development of autonomous driving, monocular 3D object detection has attracted increasing attention as a crucial research topic. However, detection precision is limited by monocular camera sensors, which struggle to capture accurate depth information. To address this challenge, we introduce a novel Aggregation Transformer Network (ATNet) featuring Cross-Attention based Positional Aggregation and Dual Expansion-Squeeze based Channel Aggregation, which adaptively fuses radar and camera data at both the positional and channel levels. Specifically, the Cross-Attention based Positional Aggregation leverages camera-radar information to compute a non-linear attention coefficient that reinforces salient features and suppresses irrelevant ones, while the Dual Expansion-Squeeze based Channel Aggregation integrates radar and camera data adaptively at the channel level. Furthermore, to enhance feature-level fusion, we propose a multi-scale radar-camera fusion strategy that injects radar information into multiple stages of the camera subnet's backbone, improving detection of objects at various scales. Extensive experiments on the widely used nuScenes dataset show that the proposed Aggregation Transformer, when integrated into strong monocular 3D object detection models, delivers promising results compared with existing methods.
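To make the positional-aggregation idea concrete, the following is a minimal PyTorch-style sketch of a cross-attention step that fuses a radar feature map (projected onto the image plane) into a camera feature map. The module name, tensor shapes, single-head design and residual connection are illustrative assumptions for this sketch, not the authors' implementation.

import torch
import torch.nn as nn


class CrossAttentionPositionalAggregation(nn.Module):
    """Fuse a radar feature map into a camera feature map via cross-attention (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)  # queries from camera
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)  # keys from radar
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)  # values from radar
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, cam_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = cam_feat.shape
        q = self.q_proj(cam_feat).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.k_proj(radar_feat).flatten(2)                   # (B, C, HW)
        v = self.v_proj(radar_feat).flatten(2).transpose(1, 2)   # (B, HW, C)

        # Non-linear (softmax) attention coefficients over spatial positions:
        # salient radar positions receive larger weights, irrelevant ones are suppressed.
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)           # (B, HW, HW)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, h, w)   # back to (B, C, H, W)

        # Residual connection keeps the original camera features intact.
        return cam_feat + self.out_proj(fused)


if __name__ == "__main__":
    cam = torch.randn(2, 64, 32, 32)     # camera backbone features at one scale
    radar = torch.randn(2, 64, 32, 32)   # radar features projected to the image plane
    print(CrossAttentionPositionalAggregation(64)(cam, radar).shape)  # (2, 64, 32, 32)

In a multi-scale fusion strategy of the kind described above, one such module would be applied at each selected stage of the camera backbone, with the radar features resized to match that stage's resolution.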


Data availability and access

The data that support the findings of this study are openly available from the nuScenes dataset at https://www.nuscenes.org/nuscenes.
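For readers reproducing the data setup, the sketch below shows one way to access paired camera and radar records with the official nuscenes-devkit (pip install nuscenes-devkit). The dataroot path and the v1.0-mini split are placeholders, and this is independent of the authors' training pipeline.

from os import path as osp

from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import RadarPointCloud

# Load the mini split; the dataroot below is a placeholder for a local copy.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

sample = nusc.sample[0]  # one annotated keyframe
cam_rec = nusc.get('sample_data', sample['data']['CAM_FRONT'])
radar_rec = nusc.get('sample_data', sample['data']['RADAR_FRONT'])

# Camera image path and the paired radar point cloud (x, y, z, velocity, RCS, ...).
image_path = osp.join(nusc.dataroot, cam_rec['filename'])
radar_pc = RadarPointCloud.from_file(osp.join(nusc.dataroot, radar_rec['filename']))
print(image_path, radar_pc.points.shape)  # points has shape (18, num_detections)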


Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62106158; in part by the Research and Development Program of Beijing Municipal Education Commission under Grant KM202210028007; and in part by the R&D Program of Beijing Municipal Education Commission under Grant KZ20231002822.

Author information

Contributions

Conceptualization: Jun Li, Zizhang Wu; Methodology: Jun Li; Formal analysis and investigation: Zizhang Wu; Writing - original draft preparation: Han Zhang; Writing - review and editing: Tianhao Xu.

Corresponding author

Correspondence to Zizhang Wu.

Ethics declarations

Ethical and informed consent for data used

Not applicable. This study was conducted without directly involving human participants, and thus no informed consent was required.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, J., Zhang, H., Wu, Z. et al. Radar-camera fusion for 3D object detection with aggregation transformer. Appl Intell 54, 10627–10639 (2024). https://doi.org/10.1007/s10489-024-05718-1
