Abstract
With the rapid development of water transportation, ship safety supervision faces increasingly severe pressures and challenges. Precise and efficient detection of ship targets is becoming ever more important, which urgently requires intelligent detection methods to improve shipping management efficiency. However, surveillance video of waterway transportation is often degraded by fog and rain, which impairs object detection performance and reduces management efficiency. Traditional object detection approaches struggle to handle these problems. In this paper, we propose a novel multi-modal information fusion method for multi-object detection in waterway transportation, which introduces LiDAR (Light Detection And Ranging) data to add spatial information and mitigate the interference of fog and rain. The target ROI (Region Of Interest) point cloud and image data are first fused in a pre-fusion stage. This stage efficiently directs the network's attention to the regions with the highest target probability, increasing the target recall rate. The 3D bounding boxes retrieved from the point cloud and the 2D bounding boxes retrieved from the image are then fused in a post-fusion stage to improve target precision and enrich the detection information. Finally, using time synchronization and a spatial transformation matrix, the detection result is transferred into the image coordinate system to create a ship image target with 3D depth information. This technique overcomes the constraints of single-sensor environment perception, adapts to ship target detection in a variety of situations, and is more precise and robust. The superiority of the algorithm is also demonstrated by experiments.
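The final step described above, transferring LiDAR detections into the image coordinate system via a spatial transformation matrix, can be sketched as a standard pinhole projection. The matrices below (extrinsic rotation `R`, translation `t`, intrinsics `K`) are illustrative placeholders, not calibration values from the paper:

```python
import numpy as np

def project_lidar_to_image(points_lidar, K, R, t):
    """Project Nx3 LiDAR points into Nx2 pixel coordinates.

    R, t: hypothetical LiDAR-to-camera extrinsics; K: camera intrinsics.
    """
    pts_cam = points_lidar @ R.T + t   # LiDAR frame -> camera frame
    uvw = pts_cam @ K.T                # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth

# Simple pinhole intrinsics (fx = fy = 500, principal point (320, 240)),
# identity rotation and zero translation for illustration only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

pts = np.array([[1.0, 0.5, 10.0]])     # one point 10 m in front of the camera
uv = project_lidar_to_image(pts, K, R, t)
# u = 500 * 1 / 10 + 320 = 370, v = 500 * 0.5 / 10 + 240 = 265
```

In the actual system, `R`, `t`, and `K` would come from sensor calibration, and time synchronization would pair each point cloud frame with the corresponding video frame before projection.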
Data Availability
To attract more researchers to this field and to promote the research and development of multi-source information fusion for ship supervision, we plan to make our dataset public. The current dataset is not yet complete and the amount of data is small; we have been continuously enriching and improving it, including with data collected at the Haihe Kaiqi Bridge in Tianjin and the Yangtze River Bridge in Nanjing. Once the dataset contains enough data and covers sufficiently diverse environmental scenarios, we will release it.
Funding
This research was supported by the Basic Research Fund of Central-Level Nonprofit Scientific Research Institutes (No. TKS20230203).
Author information
Contributions
All authors contributed to the study conception and design. Funding acquisition and administrative support were provided by Ruixing Ma. Original manuscript writing and language editing were completed by Yong Yin and Ruixin Ma. Data curation and experiments were performed by Jing Chen and Rihao Chang. Review and revision of the manuscript were performed by Rihao Chang and Yong Yin. All authors read and approved the final manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, R., Yin, Y., Chen, J. et al. Multi-modal information fusion for LiDAR-based 3D object detection framework. Multimed Tools Appl 83, 7995–8012 (2024). https://doi.org/10.1007/s11042-023-15452-4