
DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds

Published in Applied Intelligence.

Abstract

Semantic segmentation of outdoor point clouds is an important task in computer vision, aiming to classify outdoor point cloud data into different semantic categories. Methods based on the point cloud alone have shortcomings such as incomplete information and difficulty handling incomplete data. This paper proposes a pseudo point cloud method to align images with point clouds. Image features are first extracted by a 2D network; the point cloud is then projected onto the image to obtain the corresponding pixel features, forming the pseudo point cloud. A dual fusion attention mechanism is then designed to fuse the features of the point cloud and the pseudo point cloud, improving the efficiency of the fusion network. Experimental results show that this method outperforms existing methods on the large-scale SemanticKITTI benchmark and achieves third-place performance on the NuScenes benchmark. Code is available at https://github.com/Pdsn5/DFAMNet.
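The projection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intrinsic matrix `K`, the extrinsic transform `T_cam_from_lidar`, nearest-pixel sampling, and the feature-map layout are all assumptions made for the example.

```python
import numpy as np

def build_pseudo_point_cloud(points, K, T_cam_from_lidar, feat_map):
    """Project LiDAR points into the image and gather per-pixel 2D features.

    points           : (N, 3) LiDAR coordinates
    K                : (3, 3) camera intrinsics (assumed pinhole model)
    T_cam_from_lidar : (4, 4) LiDAR-to-camera extrinsic transform
    feat_map         : (H, W, C) 2D-network feature map
    Returns the (N, C) pseudo point cloud and a validity mask.
    """
    N = points.shape[0]
    H, W, C = feat_map.shape
    # Homogeneous LiDAR coordinates -> camera frame
    pts_h = np.hstack([points, np.ones((N, 1))])        # (N, 4)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]     # (N, 3)
    # Keep only points in front of the camera
    in_front = pts_cam[:, 2] > 1e-6
    # Perspective projection onto the image plane
    uvw = (K @ pts_cam.T).T                             # (N, 3)
    u = uvw[:, 0] / np.maximum(uvw[:, 2], 1e-6)
    v = uvw[:, 1] / np.maximum(uvw[:, 2], 1e-6)
    # Nearest-pixel lookup; a point is valid if it lands inside the image
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    valid = in_front & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    # Points without a valid image correspondence get zero features
    pseudo = np.zeros((N, C), dtype=feat_map.dtype)
    pseudo[valid] = feat_map[vi[valid], ui[valid]]
    return pseudo, valid
```

Each 3D point thus carries a paired image feature vector, and the two feature sets can then be fused by the attention mechanism.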


Data availability and access

Data will be made available on request.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61601176.

Author information

Authors and Affiliations

Authors

Contributions

Mingjie Li: Conceptualization, Methodology, Code, Writing – original draft. Gaihua Wang: Conceptualization, Writing, Code, Methodology, Supervision. Minghao Zhu: Conceptualization, Code. Chunzheng Li: Conceptualization, Code. Xuran Pan: Conceptualization, Code. Qian Long: Conceptualization, Writing, Code, Methodology, Supervision.

Corresponding author

Correspondence to Gaihua Wang.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, M., Wang, G., Zhu, M. et al. DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds. Appl Intell 54, 3169–3180 (2024). https://doi.org/10.1007/s10489-024-05302-7

