Abstract
In recent years, with the continuous development of autonomous driving, monocular 3D object detection has attracted increasing attention as a crucial research topic. However, detection accuracy is limited by monocular camera sensors, which struggle to capture reliable depth information. To address this challenge, we introduce a novel Aggregation Transformer Network (ATNet), featuring Cross-Attention based Positional Aggregation and Dual Expansion-Squeeze based Channel Aggregation. The proposed ATNet adaptively fuses radar and camera data at both the positional and the channel level. Specifically, the Cross-Attention based Positional Aggregation leverages camera-radar information to compute a non-linear attention coefficient that reinforces salient features and suppresses irrelevant ones, while the Dual Expansion-Squeeze based Channel Aggregation integrates radar and camera data adaptively at the channel level. Furthermore, to enhance feature-level fusion, we propose a multi-scale radar-camera fusion strategy that injects radar information into multiple stages of the camera subnet's backbone, improving the detection of objects at various scales. Extensive experiments on the widely used nuScenes dataset show that the proposed Aggregation Transformer, when integrated into strong monocular 3D object detection models, delivers promising results compared with existing methods.
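The abstract describes the two fusion modules only at a high level. As a rough illustration of the kind of design it sketches, the PyTorch snippet below implements (i) a cross-attention block in which camera features serve as queries and radar features as keys and values, and (ii) an expansion-squeeze channel gate over the concatenated modalities. This is a minimal sketch under our own assumptions, not the authors' released code: the class names `PositionalAggregation` and `ChannelAggregation`, the head count, and the expansion ratio are all hypothetical placeholders.

```python
# Illustrative sketch only; all names and hyperparameters are assumptions,
# not the ATNet reference implementation.
import torch
import torch.nn as nn


class PositionalAggregation(nn.Module):
    """Cross-attention fusion: camera features query radar features,
    so the learned attention coefficients reinforce positions where the
    two modalities agree and suppress irrelevant ones."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, cam: torch.Tensor, radar: torch.Tensor) -> torch.Tensor:
        b, c, h, w = cam.shape
        q = cam.flatten(2).transpose(1, 2)     # (B, H*W, C) camera queries
        kv = radar.flatten(2).transpose(1, 2)  # (B, H*W, C) radar keys/values
        fused, _ = self.attn(q, kv, kv)        # non-linear attention weighting
        fused = self.norm(fused + q)           # residual keeps camera evidence
        return fused.transpose(1, 2).reshape(b, c, h, w)


class ChannelAggregation(nn.Module):
    """Expansion-squeeze gating over concatenated camera/radar channels,
    producing adaptive per-channel mixing weights."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # squeeze space
            nn.Conv2d(2 * channels, expansion * channels, 1),  # expand
            nn.ReLU(inplace=True),
            nn.Conv2d(expansion * channels, channels, 1),      # squeeze back
            nn.Sigmoid(),                                      # channel weights
        )

    def forward(self, cam: torch.Tensor, radar: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([cam, radar], dim=1))  # (B, C, 1, 1)
        return w * cam + (1.0 - w) * radar             # adaptive channel blend


if __name__ == "__main__":
    cam = torch.randn(2, 64, 32, 32)    # camera backbone features
    radar = torch.randn(2, 64, 32, 32)  # radar features projected to image plane
    out = ChannelAggregation(64)(PositionalAggregation(64)(cam, radar), radar)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Under the multi-scale fusion strategy the abstract describes, one would presumably instantiate such a block pair at each selected backbone stage of the camera subnet, with `channels` matching that stage's feature width.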
Data availability and access
The data that support the findings of this study are openly available in nuScenes at https://www.nuscenes.org/nuscenes.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62106158; in part by the Research and Development Program of Beijing Municipal Education Commission under Grant KM202210028007; and in part by the R&D Program of Beijing Municipal Education Commission under Grant KZ20231002822.
Author information
Contributions
Conceptualization: Jun Li, Zizhang Wu; Methodology: Jun Li; Formal analysis and investigation: Zizhang Wu; Writing - original draft preparation: Han Zhang; Writing - review and editing: Tianhao Xu.
Ethics declarations
Ethical and informed consent for data used
Not applicable. This study was conducted without directly involving human participants, and thus no informed consent was required.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Zhang, H., Wu, Z. et al. Radar-camera fusion for 3D object detection with aggregation transformer. Appl Intell 54, 10627–10639 (2024). https://doi.org/10.1007/s10489-024-05718-1