Abstract
Maximizing the advantages of different views while mitigating their respective disadvantages in fine-grained segmentation tasks is an important challenge in point cloud multi-view fusion. Traditional multi-view fusion methods overlook two critical problems: (1) the loss of depth and quantization information caused by mapping and voxelization operations, which introduces “anomalies” into the extracted features; and (2) how to account for the large differences in object sizes across views during point cloud learning, and how to fine-tune the fusion accordingly to improve network performance. In this paper, we propose a new algorithm, RPV-CASNet, that uses channel self-attention to fuse range, point and voxel representations. RPV-CASNet integrates the three views through an interactive structure, the range-point-voxel cross-adaptive layer (RPVLayer for short), to take full advantage of the differences among them. The RPVLayer contains two key designs: the Feature Refinement Module (FRM) and the Multi-Fine-Grained Feature Self-Attention Module (MFGFSAM). Specifically, the FRM re-infers the representation of points carrying anomalous features and corrects them, while the MFGFSAM addresses two challenges: efficiently aggregating tokens from distant regions and preserving multi-scale features within a single attention layer. In addition, we design a Dynamic Feature Pyramid Extractor (DFPE) to extract rich features from spherical range images. Our method achieves impressive mIoU scores of 69.8% and 77.1% on the SemanticKITTI and nuScenes datasets, respectively.
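To make the fusion idea concrete, below is a minimal, hypothetical sketch of fusing per-point features from the range, point and voxel branches with channel-wise attention weights. The module name, tensor shapes, and the simple sigmoid gating are our illustrative assumptions, not the authors' released RPVLayer implementation.

```python
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    """Illustrative sketch: fuse range/point/voxel features of the same
    N points with per-channel attention weights. Assumes all three
    branches have already been projected back to per-point (N, C) tensors."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one gate per (branch, channel) from the concatenated views.
        self.gate = nn.Sequential(
            nn.Linear(3 * channels, 3 * channels),
            nn.Sigmoid(),
        )

    def forward(self, f_range, f_point, f_voxel):
        stacked = torch.cat([f_range, f_point, f_voxel], dim=-1)  # (N, 3C)
        w = self.gate(stacked)                                    # (N, 3C)
        w_r, w_p, w_v = w.chunk(3, dim=-1)                        # (N, C) each
        # Channel-wise weighted sum of the three views.
        return w_r * f_range + w_p * f_point + w_v * f_voxel


# Usage: fuse 64-channel features for 10,000 points from the three views.
fuse = ChannelAttentionFusion(64)
out = fuse(torch.randn(10000, 64), torch.randn(10000, 64), torch.randn(10000, 64))
print(out.shape)  # torch.Size([10000, 64])
```

The key design point this sketch illustrates is that each channel of each view gets its own fusion weight conditioned on all three views, so the network can, for example, trust voxel features for large structures while favoring point features for fine details.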
Availability of data and materials
The data generated and analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
Special thanks to Min Zhao and Chenyang Wang for their invaluable help during the experiments. We also thank Zitai Jiang for his insightful discussions, and the Image Engineering and Pattern Recognition Research Laboratory for generously providing the necessary experimental equipment. Finally, we thank all the reviewers who contributed to this paper.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 61672305, “Research on Key Issues of Action Element Modeling and Deep Interaction Relationship Model Inference in Complex Group Behavior” (ranked 1/9).
Author information
Contributions
Jiajiong Li: Methodology, Software, Writing original draft and editing; Chuanxu Wang: Conceptualization, Funding acquisition; Chenyang Wang: Resources, Writing review; Min Zhao: Writing review; Zitai Jiang: Writing review.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Ethical and informed consent for data used
We have complied with ethical standards and obtained informed consent for the data used in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Wang, C., Wang, C. et al. RPV-CASNet: range-point-voxel integration with channel self-attention network for lidar point cloud segmentation. Appl Intell 54, 7829–7848 (2024). https://doi.org/10.1007/s10489-024-05553-4