
RPV-CASNet: range-point-voxel integration with channel self-attention network for lidar point cloud segmentation

Published in: Applied Intelligence

Abstract

Maximizing the advantages of different views while mitigating their respective disadvantages in fine-grained segmentation tasks is an important challenge in point cloud multi-view fusion. Traditional multi-view fusion methods overlook two critical problems: (1) the loss of depth and quantization information caused by mapping and voxelization operations, which introduces “anomalies” into the extracted features; and (2) how to account for the large differences in object sizes across views during point cloud learning and fine-tune the fusion efficiency to improve network performance. In this paper, we propose a new algorithm that uses channel self-attention to fuse the range, point and voxel views, abbreviated as RPV-CASNet. RPV-CASNet integrates the three views through an interactive structure (the range-point-voxel cross-adaptive layer, RPVLayer for short) to take full advantage of the differences among them. The RPVLayer contains two key designs: the Feature Refinement Module (FRM) and the Multi-Fine-Grained Feature Self-Attention Module (MFGFSAM). Specifically, the FRM re-infers the representation of points carrying anomalous features and corrects them. The MFGFSAM addresses two challenges: efficiently aggregating tokens from distant regions and preserving multi-scale features within a single attention layer. In addition, we design a Dynamic Feature Pyramid Extractor (DFPE) to extract rich features from spherical range images. Our method achieves mIoU scores of 69.8% on SemanticKITTI and 77.1% on nuScenes.
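The core fusion idea described above can be sketched as channel-wise attention over per-point features gathered from the three views. The sketch below is a minimal NumPy illustration under our own simplifying assumptions (the function name, shapes, and mean-pooled channel descriptors are hypothetical; the actual RPVLayer, FRM, and MFGFSAM are considerably more elaborate): each view contributes a per-point feature map, a channel descriptor is pooled per view, and a softmax across views yields per-channel mixing weights.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention_fuse(range_feat, point_feat, voxel_feat):
    """Fuse per-point features from the range, point and voxel views.

    Each input is an (N, C) array of features already projected back onto
    the N points. A per-channel softmax across the three views decides how
    much each view contributes to every output channel.
    """
    views = np.stack([range_feat, point_feat, voxel_feat])  # (3, N, C)
    # Channel descriptor per view: mean-pool over the points -> (3, C)
    desc = views.mean(axis=1)
    # Per-channel attention weights across views; columns sum to 1.
    weights = softmax(desc, axis=0)                          # (3, C)
    # Broadcast the weights over points and sum the re-weighted views.
    fused = (weights[:, None, :] * views).sum(axis=0)        # (N, C)
    return fused
```

With identical inputs the weights collapse to 1/3 per view and the fusion returns the input unchanged, which is a quick sanity check that the weighting is a convex combination per channel.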




Availability of data and materials

The data generated and analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

Special thanks to Min Zhao and Chenyang Wang for their invaluable help during the experiments. In addition, I would like to thank Zitai Jiang for his insightful discussions. I would also like to thank the Image Engineering and Pattern Recognition Research Laboratory for generously providing the necessary experimental equipment. Finally, I would like to thank all the reviewers who contributed to this paper.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61672305 (Ranked 1/9), "Research on Key Issues of Action Element Modeling and Deep Interaction Relationship Model Inference in Complex Group Behavior".

Author information

Authors and Affiliations

Authors

Contributions

Jiajiong Li: Methodology, Software, Writing original draft and editing; Chuanxu Wang: Conceptualization, Funding acquisition; Chenyang Wang: Resources, Writing review; Min Zhao: Writing review; Zitai Jiang: Writing review.

Corresponding author

Correspondence to Chuanxu Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical and informed consent for data used

We comply with ethical and informed consent for the data used.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, J., Wang, C., Wang, C. et al. RPV-CASNet: range-point-voxel integration with channel self-attention network for lidar point cloud segmentation. Appl Intell 54, 7829–7848 (2024). https://doi.org/10.1007/s10489-024-05553-4
