Abstract
With the rapid development of water transportation, ship safety supervision faces increasingly severe pressures and challenges. Precise and efficient detection of ship targets is becoming ever more important, which urgently requires intelligent detection methods to improve shipping management efficiency. However, surveillance video of waterway transportation is often degraded by fog and rain, which impairs object detection performance and reduces management efficiency. Traditional object detection approaches struggle to handle these problems. In this paper, we propose a novel multi-modal information fusion method for multi-object detection in waterway transportation, which introduces LiDAR (Light Detection And Ranging) data to add spatial information and mitigate the interference of fog and rain. The target ROI (Region Of Interest) point cloud and image data are first fused in a pre-fusion stage. This stage efficiently directs the network's attention to the regions with the highest target probability, increasing the target recall rate. The 3D bounding boxes retrieved from the point cloud and the 2D bounding boxes retrieved from the image are then fused in a post-fusion stage to improve target precision and enrich the detection information. Finally, using time synchronization and a spatial transformation matrix, the detection result is transferred into the image coordinate system to create a ship image target with 3D depth information. This technique overcomes the constraints of single-sensor environment perception, adapts to ship target detection in a variety of situations, and is more precise and robust. The superiority of the algorithm is also demonstrated by experiments.
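The final step described above, transferring LiDAR detections into the image coordinate system via a spatial transformation matrix, can be sketched as a standard pinhole projection. The matrices below (extrinsic rotation `R`, translation `t`, intrinsics `K`) are illustrative placeholders, not calibration values from the paper:

```python
import numpy as np

def project_lidar_to_image(points_lidar, K, R, t):
    """Project Nx3 LiDAR points into Nx2 pixel coordinates.

    R, t: hypothetical LiDAR-to-camera extrinsics; K: camera intrinsics.
    """
    pts_cam = points_lidar @ R.T + t   # LiDAR frame -> camera frame
    uvw = pts_cam @ K.T                # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth

# Simple pinhole intrinsics (fx = fy = 500, principal point (320, 240)),
# identity rotation and zero translation for illustration only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

pts = np.array([[1.0, 0.5, 10.0]])     # one point 10 m in front of the camera
uv = project_lidar_to_image(pts, K, R, t)
# u = 500 * 1 / 10 + 320 = 370, v = 500 * 0.5 / 10 + 240 = 265
```

In the actual system, `R`, `t`, and `K` would come from sensor calibration, and time synchronization would pair each point cloud frame with the corresponding video frame before projection.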
Data Availability
To attract more researchers to this field and to promote the research and development of multi-source information fusion for ship supervision, we plan to make our dataset public. The current dataset is not yet complete and the amount of data is small; we have been continuously enriching and improving it, including with data collected at the Haihe Kaiqi Bridge in Tianjin and the Yangtze River Bridge in Nanjing. Once the dataset contains enough data and covers sufficiently diverse environmental scenarios, we will release it.
Funding
This research was supported by the Basic Research Fund of Central-Level Nonprofit Scientific Research Institutes (No. TKS20230203).
Author information
Contributions
All authors contributed to the study conception and design. Funding acquisition and administrative support were provided by Ruixing Ma. Original manuscript writing and language editing were completed by Yong Yin and Ruixin Ma. Data curation and experiments were performed by Jing Chen and Rihao Chang. Review and revision of the manuscript were performed by Rihao Chang and Yong Yin. All authors read and approved the final manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, R., Yin, Y., Chen, J. et al. Multi-modal information fusion for LiDAR-based 3D object detection framework. Multimed Tools Appl 83, 7995–8012 (2024). https://doi.org/10.1007/s11042-023-15452-4