PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion

Xia, Chenxing; Li, Xubing; Gao, Xiuju; Ge, Bin; Li, Kuan-Ching; Fang, Xianjin; Zhang, Yan; Yang, Ke

doi:10.1007/s00521-024-09561-w


Original Article
Published: 01 March 2024

Volume 36, pages 9329–9346 (2024)

Neural Computing and Applications



Abstract

Recently, multi-modal 3D object detection techniques based on point clouds and images have received increasing attention. However, existing multi-modal methods often rely on a single feature fusion strategy, and any single point cloud representation has its own limitations. For example, voxelization may discard fine-grained information, while 2D images lack depth information; both can restrict detection accuracy. Therefore, in this work, we propose PCDR-DFF, a novel multi-modal 3D object detection method based on point cloud diversity representation and dual feature fusion, to improve the prediction accuracy of 3D object detection. First, the point clouds are projected into the image coordinate system, and a 2D backbone network extracts multi-level image features corresponding to the projected points. Then, the point clouds are jointly represented using graphs and pillars, and their 3D features are extracted using graph neural networks and residual connections. Finally, a dual feature fusion method is designed that improves detection accuracy through a well-designed multi-point fusion model and a multi-feature fusion mechanism embedded with a sparse 3D U-Net. Extensive experiments on the KITTI dataset demonstrate the effectiveness and competitiveness of the proposed models in comparison with other methods.
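The first step described in the abstract projects LiDAR points into the image coordinate system and reads multi-level 2D features at the projected locations. It can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes KITTI-style calibration (the devkit's `P2`, `R0_rect` and `Tr_velo_to_cam` matrices), a feature pyramid stored as nested lists with hypothetical per-level strides, and bilinear sampling, which is a common choice that the abstract does not confirm.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def pad_to_4x4(m: Matrix) -> Matrix:
    """Embed a 3x3 or 3x4 matrix in a 4x4 homogeneous matrix (bottom-right = 1)."""
    out = [[0.0] * 4 for _ in range(4)]
    for i, row in enumerate(m):
        for j, value in enumerate(row):
            out[i][j] = value
    out[3][3] = 1.0
    return out


def project_points(points: Sequence[Tuple[float, float, float]], P2: Matrix,
                   R0_rect: Matrix, Tr_velo_to_cam: Matrix,
                   width: int, height: int) -> List[Tuple[int, float, float, float]]:
    """Project LiDAR points to pixels and keep (index, u, v, w) for points that lie
    in front of the camera and inside the image (assumed KITTI-style calibration)."""
    proj = matmul(P2, matmul(pad_to_4x4(R0_rect), pad_to_4x4(Tr_velo_to_cam)))  # 3x4
    hits = []
    for idx, (x, y, z) in enumerate(points):
        hom = (x, y, z, 1.0)
        u_h, v_h, w = (sum(proj[r][c] * hom[c] for c in range(4)) for r in range(3))
        if w <= 1e-6:  # behind (or on) the image plane
            continue
        u, v = u_h / w, v_h / w
        if 0.0 <= u < width and 0.0 <= v < height:
            hits.append((idx, u, v, w))
    return hits


def bilinear_sample(fmap: List[Matrix], u: float, v: float) -> List[float]:
    """Bilinearly interpolate a C x H x W feature map (nested lists) at (u, v)."""
    H, W = len(fmap[0]), len(fmap[0][0])
    u = min(max(u, 0.0), W - 1.0)  # clamp to the valid pixel range
    v = min(max(v, 0.0), H - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = u - x0, v - y0
    out = []
    for f in fmap:  # one channel at a time
        top = f[y0][x0] * (1 - dx) + f[y0][x1] * dx
        bottom = f[y1][x0] * (1 - dx) + f[y1][x1] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def gather_multilevel(pyramid: List[List[Matrix]], strides: Sequence[int],
                      u: float, v: float) -> List[float]:
    """Concatenate features sampled from every level; level l is strides[l] times smaller."""
    feats: List[float] = []
    for fmap, stride in zip(pyramid, strides):
        feats.extend(bilinear_sample(fmap, u / stride, v / stride))
    return feats


# Toy usage: identity extrinsics, so the points are already in camera coordinates.
P2 = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 40.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Tr = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
hits = project_points([(0, 0, 10), (1, 0, 10), (0, 0, -5), (10, 0, 10)], P2, I3, Tr, 100, 80)
```

Sampling at `u / stride` ignores half-pixel alignment. A practical version would also vectorize these loops with a tensor library.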
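The joint graph-and-pillar representation from the second step can be illustrated with two structures: a pillar grid on the x-y plane, in the spirit of PointPillars, and a k-nearest-neighbour graph over the raw points. The abstract does not specify how PCDR-DFF builds either one. This sketch assumes Euclidean k-NN and square pillars, and the grid origin, pillar size and `k` are hypothetical parameters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def pillarize(points: Sequence[Point], x_min: float, y_min: float,
              pillar_size: float) -> Dict[Tuple[int, int], List[int]]:
    """Group point indices by the x-y grid cell (pillar) that contains them."""
    pillars: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y, _z) in enumerate(points):
        key = (math.floor((x - x_min) / pillar_size), math.floor((y - y_min) / pillar_size))
        pillars[key].append(idx)
    return dict(pillars)


def pillar_centroids(points: Sequence[Point],
                     pillars: Dict[Tuple[int, int], List[int]]) -> Dict[Tuple[int, int], Point]:
    """Arithmetic mean of the points in each pillar (usable for per-point offset features)."""
    return {key: tuple(sum(points[i][d] for i in idxs) / len(idxs) for d in range(3))
            for key, idxs in pillars.items()}


def knn_graph(points: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, the indices of its k nearest other points (brute force, O(N^2))."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _dist, j in ranked[:k]])
    return neighbours


points = [(0.1, 0.1, 0.0), (0.3, 0.2, 1.0), (1.5, 0.1, 0.0), (1.6, 0.4, 2.0)]
pillars = pillarize(points, x_min=0.0, y_min=0.0, pillar_size=1.0)
graph = knn_graph(points, k=1)
```

Both structures index the same points. Per-point features computed on the graph can therefore be scattered into pillars, or the reverse, without duplicating the raw data.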
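Graph feature extraction with residual connections can be pictured as a message-passing layer whose output is added back to its input. The sketch below substitutes an EdgeConv-style update, as used in DGCNN: the edge feature `[h_i, h_j - h_i]` passes through a shared linear map and a ReLU, and the results are max-pooled channel-wise over the neighbours. This is an assumed stand-in for the paper's graph neural network, and it uses toy weights rather than learned ones.

```python
from typing import List, Sequence

Vector = List[float]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Dense layer without bias: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def residual_edge_conv(features: Sequence[Vector], neighbours: Sequence[Sequence[int]],
                       weights: Sequence[Sequence[float]]) -> List[Vector]:
    """One EdgeConv-style layer with a residual (skip) connection.

    features:   N x C node features
    neighbours: neighbour indices per node (e.g. from a k-NN graph)
    weights:    C x 2C matrix applied to each edge feature [h_i, h_j - h_i]
    """
    out = []
    for i, h_i in enumerate(features):
        messages = []
        for j in neighbours[i]:
            edge = list(h_i) + [b - a for a, b in zip(h_i, features[j])]
            messages.append(relu(linear(weights, edge)))
        # Channel-wise max over neighbours; zeros for an isolated node.
        agg = [max(col) for col in zip(*messages)] if messages else [0.0] * len(h_i)
        # Residual connection: the output width C must match the input width.
        out.append([a + b for a, b in zip(h_i, agg)])
    return out


updated = residual_edge_conv([[1.0], [3.0]], [[1], [0]], [[0.0, 1.0]])
```

The residual addition preserves each point's input features, so the layer only has to learn a neighbourhood-based correction. For the same reason, the output width must equal the input width in this sketch.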


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (62102003), the Natural Science Foundation of Anhui Province (2108085QF258), the Anhui Postdoctoral Science Foundation (2022B623), the University Synergy Innovation Program of Anhui Province (GXXT-2021-006, GXXT-2022-038), the Central Guiding Local Technology Development Special Funds (202107d06020001), the Institute of Energy, Hefei Comprehensive National Science Center (21KZS217), and the University-level General Projects of Anhui University of Science and Technology (xjyb2020-04).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, 232001, China
Chenxing Xia, Xubing Li, Bin Ge & Xianjin Fang
Institute of Energy, Hefei Comprehensive National Science Center, Hefei, Anhui, China
Chenxing Xia & Ke Yang
Anhui Purvar Bigdata Technology Co. Ltd, Huainan, 232001, China
Chenxing Xia
College of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, Anhui, China
Xiuju Gao
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Kuan-Ching Li
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Xianjin Fang
The School of Electronics and Information Engineering, Anhui University, Hefei, Anhui, China
Yan Zhang


Corresponding author

Correspondence to Xubing Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest regarding this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Xia, C., Li, X., Gao, X. et al. PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion. Neural Comput & Applic 36, 9329–9346 (2024). https://doi.org/10.1007/s00521-024-09561-w


Received: 18 June 2023
Accepted: 24 January 2024
Published: 01 March 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s00521-024-09561-w
