Multi-modal information fusion for LiDAR-based 3D object detection framework

Multimedia Tools and Applications

Abstract

With the rapid development of water transportation, ship safety supervision faces increasingly severe pressures and challenges. Precise and efficient detection of ship targets is becoming more important, and it calls for intelligent detection methods that improve shipping management efficiency. However, surveillance video of waterway transportation is often degraded by fog and rain, which lowers object detection performance and reduces management efficiency. Traditional object detection approaches struggle to handle these problems. In this paper, we propose a novel multi-modal information fusion method for multi-object detection in waterway transportation, which introduces LiDAR (Light Detection And Ranging) data to add spatial information and counter the interference of fog and rain. The target ROI (Region Of Interest) point cloud and the image data are first fused in the pre-fusion stage. This stage directs the network’s attention to the region with the highest target probability, increasing the target recall rate. The 3D bounding boxes from the point cloud and the 2D bounding boxes from the image are then fused in the post-fusion stage to improve target precision and enrich the detection information. Finally, using time synchronization and a space transformation matrix, the detection result is transferred to the image coordinate system to create a ship image target with 3D depth information. This technique overcomes the constraints of single-sensor environment perception, adapts to the detection of ship targets in a variety of situations, and is more precise and robust. Experiments also demonstrate the algorithm’s superiority.
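The last two steps of the abstract (post-fusion of 3D and 2D boxes, then transfer to the image coordinate system) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera model, a hypothetical 4x4 LiDAR-to-camera extrinsic matrix `T_cam_lidar`, a hypothetical 3x3 intrinsic matrix `K`, nearest-timestamp pairing as the time synchronization, and IoU matching as a stand-in for the post-fusion rule, which the abstract does not specify. All function names are illustrative.

```python
import numpy as np

def nearest_frame(t, stamps):
    """Time synchronization (assumed): index of the camera stamp closest to t."""
    return int(np.argmin(np.abs(np.asarray(stamps, dtype=float) - t)))

def project_to_image(points, T_cam_lidar, K):
    """Project (N, 3) LiDAR points to pixels; returns (pixels, depths, mask)."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous (N, 4)
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]                  # camera-frame XYZ
    mask = cam[:, 2] > 1e-6                                 # keep points in front of the camera
    cam = cam[mask]
    uvw = (K @ cam.T).T
    pixels = uvw[:, :2] / uvw[:, 2:3]                       # perspective divide
    return pixels, cam[:, 2], mask

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def late_fuse(boxes2d, corners3d_list, T_cam_lidar, K, thr=0.5):
    """Attach a mean depth to each 2D box matched by a projected 3D box.

    Assumed matching rule: the 2D box with the highest IoU wins if IoU >= thr.
    Returns a list of (index_of_2d_box, mean_depth) pairs.
    """
    results = []
    for corners in corners3d_list:                          # each is an (8, 3) corner array
        pixels, depths, _ = project_to_image(corners, T_cam_lidar, K)
        if len(pixels) == 0:
            continue
        proj = np.concatenate([pixels.min(axis=0), pixels.max(axis=0)])
        ious = [iou(proj, b) for b in boxes2d]
        if ious and max(ious) >= thr:
            results.append((int(np.argmax(ious)), float(depths.mean())))
    return results

if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
    T = np.eye(4)  # assumes the LiDAR and camera frames coincide
    cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (9, 11)], float)
    boxes = [np.array([0, 0, 10, 10.0]), np.array([38, 38, 62, 62.0])]
    print(late_fuse(boxes, [cube], T, K))
```

In this toy setup the cube centred 10 m in front of the camera projects onto the second 2D box, so that box receives a depth estimate while the unmatched box does not. A real system would replace the identity extrinsic with a calibrated one and would likely use a more robust association than greedy IoU.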


Data Availability

To attract more researchers to this field and to promote the research and development of multi-source information fusion for ship supervision, we plan to make our dataset public. The current dataset is still incomplete and small, and we are continuously enriching and improving it, including with data collected at the Haihe KaiQi Bridge in Tianjin and the Yangtze River Bridge in Nanjing. Once the dataset contains enough data and covers sufficiently diverse environmental scenarios, we will release it.


Funding

This research was supported by the Basic Research Fund of Central-Level Nonprofit Scientific Research Institutes (No. TKS20230203).

Author information

Contributions

All authors contributed to the study conception and design. Funding acquisition and administrative support were provided by Ruixin Ma. The original manuscript was written and language-edited by Yong Yin and Ruixin Ma. Data curation and experiments were performed by Jing Chen and Rihao Chang. Review and revision were carried out by Rihao Chang and Yong Yin. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rihao Chang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, R., Yin, Y., Chen, J. et al. Multi-modal information fusion for LiDAR-based 3D object detection framework. Multimed Tools Appl 83, 7995–8012 (2024). https://doi.org/10.1007/s11042-023-15452-4

DOI: https://doi.org/10.1007/s11042-023-15452-4
