Abstract
Recently, multi-modal 3D object detection techniques based on point clouds and images have received increasing attention. However, existing multi-modal feature fusion methods often rely on a single fusion strategy, and methods built on a single point cloud representation also have limitations. For example, voxelization may lose fine-grained information, while 2D images lack depth information, and both can restrict detection accuracy. Therefore, in this work, we propose PCDR-DFF, a novel multi-modal 3D object detection method based on point cloud diversity representation and dual feature fusion, to improve the prediction accuracy of 3D object detection. First, point clouds are projected into the image coordinate system, and multi-level image features corresponding to the point cloud are extracted using a 2D backbone network. Then, the point clouds are jointly represented using graphs and pillars, and their 3D features are extracted using graph neural networks and residual connections. Finally, a dual feature fusion method is designed to improve detection accuracy with the help of a multi-point fusion model and a multi-feature fusion mechanism embedded with a sparse 3D U-Net. Extensive experiments on the KITTI dataset demonstrate the effectiveness and competitiveness of the proposed models in comparison with other methods.
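As a rough illustration of the first step described in the abstract (projecting point clouds into the image coordinate system and pairing each point with image features), the following NumPy sketch may help. It is not the authors' implementation: the function names, the 3 x 4 projection matrix `P`, the assumption that points are already expressed in the camera frame, and the nearest-pixel sampling rule are all hypothetical choices made for illustration.

```python
import numpy as np


def project_to_image(points_xyz, P):
    """Project N x 3 points (assumed to be in the camera frame) to pixels.

    P is a hypothetical 3 x 4 camera projection matrix.
    Returns pixel coordinates (N x 2, as u, v) and per-point depth (N,).
    """
    n = points_xyz.shape[0]
    homogeneous = np.hstack([points_xyz, np.ones((n, 1))])  # N x 4
    cam = homogeneous @ P.T                                  # N x 3
    depth = cam[:, 2]
    # Avoid dividing by zero or negative depth; such points are masked later.
    safe_depth = np.where(depth > 0, depth, 1.0)
    uv = cam[:, :2] / safe_depth[:, None]
    return uv, depth


def sample_image_features(feature_map, uv, depth):
    """Gather the nearest-pixel feature vector for every projected point.

    feature_map is an H x W x C array (for example, one level of a 2D backbone).
    Points behind the camera or outside the image receive zero features.
    Returns (features N x C, validity mask N).
    """
    h, w, c = feature_map.shape
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    valid = (depth > 0) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    features = np.zeros((uv.shape[0], c), dtype=feature_map.dtype)
    features[valid] = feature_map[rows[valid], cols[valid]]
    return features, valid
```

Nearest-pixel lookup is used here only to keep the sketch short; bilinear interpolation, or sampling from several feature levels, would be natural alternatives under the same assumptions.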







Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62102003), the Natural Science Foundation of Anhui Province (2108085QF258), the Anhui Postdoctoral Science Foundation (2022B623), the University Synergy Innovation Program of Anhui Province (GXXT-2021-006, GXXT-2022-038), the Central Guiding Local Technology Development Special Funds (202107d06020001), the Institute of Energy, Hefei Comprehensive National Science Center (21KZS217), and the University-level General Projects of Anhui University of Science and Technology (xjyb2020-04).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
About this article
Cite this article
Xia, C., Li, X., Gao, X. et al. PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion. Neural Comput & Applic 36, 9329–9346 (2024). https://doi.org/10.1007/s00521-024-09561-w
DOI: https://doi.org/10.1007/s00521-024-09561-w