
DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds

Published in Applied Intelligence.

Abstract

Semantic segmentation of outdoor point clouds is an important task in computer vision, aiming to classify outdoor point cloud data into different semantic categories. Methods based on the point cloud alone have shortcomings such as incomplete information and difficulty handling incomplete data. This paper proposes a pseudo point cloud method to align images with point clouds. Image features are first extracted by a 2D network; the point cloud is then projected onto the image to obtain the corresponding pixel features, forming the pseudo point cloud. A dual fusion attention mechanism is then designed to fuse the features of the point cloud and the pseudo point cloud, improving the efficiency of the fusion network. Experimental results show that this method outperforms existing methods on the large-scale SemanticKITTI benchmark and achieves third-place performance on the NuScenes benchmark. Code is available at https://github.com/Pdsn5/DFAMNet.
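The projection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intrinsic matrix `K`, the extrinsic transform `T_cam_from_lidar`, nearest-pixel sampling, and the feature-map layout are all assumptions made for the example.

```python
import numpy as np

def build_pseudo_point_cloud(points, K, T_cam_from_lidar, feat_map):
    """Project LiDAR points into the image and gather per-pixel 2D features.

    points           : (N, 3) LiDAR coordinates
    K                : (3, 3) camera intrinsics (assumed pinhole model)
    T_cam_from_lidar : (4, 4) LiDAR-to-camera extrinsic transform
    feat_map         : (H, W, C) 2D-network feature map
    Returns the (N, C) pseudo point cloud and a validity mask.
    """
    N = points.shape[0]
    H, W, C = feat_map.shape
    # Homogeneous LiDAR coordinates -> camera frame
    pts_h = np.hstack([points, np.ones((N, 1))])        # (N, 4)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]     # (N, 3)
    # Keep only points in front of the camera
    in_front = pts_cam[:, 2] > 1e-6
    # Perspective projection onto the image plane
    uvw = (K @ pts_cam.T).T                             # (N, 3)
    u = uvw[:, 0] / np.maximum(uvw[:, 2], 1e-6)
    v = uvw[:, 1] / np.maximum(uvw[:, 2], 1e-6)
    # Nearest-pixel lookup; a point is valid if it lands inside the image
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    valid = in_front & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    # Points without a valid image correspondence get zero features
    pseudo = np.zeros((N, C), dtype=feat_map.dtype)
    pseudo[valid] = feat_map[vi[valid], ui[valid]]
    return pseudo, valid
```

Each 3D point thus carries a paired image feature vector, and the two feature sets can then be fused by the attention mechanism.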


Data availability and access

Data will be made available on request.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61601176.

Author information

Authors and Affiliations

Authors

Contributions

Mingjie Li: Conceptualization, Methodology, Code, Writing – original draft. Gaihua Wang: Conceptualization, Writing, Code, Methodology, Supervision. Minghao Zhu: Conceptualization, Code. Chunzheng Li: Conceptualization, Code. Xuran Pan: Conceptualization, Code. Qian Long: Conceptualization, Writing, Code, Methodology, Supervision.

Corresponding author

Correspondence to Gaihua Wang.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, M., Wang, G., Zhu, M. et al. DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds. Appl Intell 54, 3169–3180 (2024). https://doi.org/10.1007/s10489-024-05302-7

