Abstract
Weakly supervised point cloud segmentation has attracted considerable interest recently, primarily because it reduces labor-intensive manual labeling costs. The effectiveness of such methods hinges on their ability to implicitly augment the supervision signals available for training. However, we found that most existing approaches rely on complex modeling, which hinders deployment in resource-constrained scenarios. Our study introduces a novel scene consistency modeling approach that significantly enhances weakly supervised point cloud segmentation in this context. By jointly modeling complete and incomplete scenes, our method improves the quality of the supervision signal while saving resources and easing deployment in practical applications. To achieve this, we first generate an incomplete counterpart of each complete scene using a windowing technique. Next, we feed the complete and incomplete scenes into a shared network encoder and obtain predictions for each scene through two decoders. We enforce semantic consistency between the labeled and unlabeled data in the two scenes by employing cross-entropy and KL-divergence losses. This consistency modeling encourages the network to attend to the same regions in both scenes, capturing local details and effectively increasing the supervision signals. A further advantage of the proposed method is its simplicity and cost-effectiveness: because scene consistency is modeled solely with cross-entropy and KL-divergence losses, the computations remain straightforward. Our experimental evaluations on the S3DIS, ScanNet, and Semantic3D datasets provide further evidence that our method effectively leverages sparsely labeled data and abundant unlabeled data to enhance supervision signals and improve overall model performance.
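To make the pipeline described above concrete, the sketch below shows one plausible training step under our reading of the abstract. It is a minimal PyTorch illustration, not the authors' implementation: the names window_crop and consistency_step, the encoder/decoder handles, the 80% window size, and the equal weighting of the two losses are all hypothetical assumptions.

```python
# Minimal sketch of the scene-consistency training step, as we read it from
# the abstract. All names here (window_crop, consistency_step, the encoder
# and decoder handles) are hypothetical, not the authors' code.
import torch
import torch.nn.functional as F

def window_crop(points, window_ratio=0.8):
    # Simulate the incomplete scene: keep only points whose x/y coordinates
    # fall inside a randomly placed axis-aligned window (an illustrative
    # stand-in for the paper's windowing technique).
    lo = points[:, :2].min(dim=0).values
    hi = points[:, :2].max(dim=0).values
    size = (hi - lo) * window_ratio
    start = lo + torch.rand(2) * (hi - lo - size)
    inside = ((points[:, :2] >= start) & (points[:, :2] <= start + size)).all(dim=1)
    keep = inside.nonzero(as_tuple=True)[0]
    return points[keep], keep

def consistency_step(encoder, dec_full, dec_part, points, labels, labeled_mask):
    # One training step: a shared encoder, two decoders, cross-entropy on the
    # sparsely labeled points, and KL divergence tying together the two
    # scenes' predictions on the points they share.
    part_points, keep = window_crop(points)

    logits_full = dec_full(encoder(points))       # (N, num_classes)
    logits_part = dec_part(encoder(part_points))  # (M, num_classes)

    # Supervised term: cross-entropy on the few labeled points.
    ce = F.cross_entropy(logits_full[labeled_mask], labels[labeled_mask])

    # Consistency term: align predictions on points present in both scenes.
    log_p_full = F.log_softmax(logits_full[keep], dim=-1)
    p_part = F.softmax(logits_part, dim=-1)
    kl = F.kl_div(log_p_full, p_part, reduction="batchmean")

    return ce + kl  # equal weighting is an assumption, not from the paper
```

Sharing one encoder while splitting the decoders is what lets the KL term act as extra supervision: every unlabeled point that survives the window crop contributes a gradient, even though it carries no label.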
Data availability
The S3DIS dataset used in this article is available at http://buildingparser.stanford.edu/dataset.html. The Semantic3D dataset is available at http://www.semantic3d.net/view_dbase.php?chl=1. The ScanNet dataset is available at http://www.scan-net.org/.
References
Chen X, Ma H, Wan J, Li B, Xia T (2017) Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6526–6534
Li Y, Ma L, Zhong Z, Liu F, Chapman MA, Cao D, Li J (2021) Deep learning for lidar point clouds in autonomous driving: A review. IEEE Trans Neural Netw Learn Syst 32(8):3412–3432
Blanc T, El Beheiry M, Caporal C, Masson J-B, Hajj B (2020) Genuage: visualize and analyze multidimensional single-molecule point cloud data in virtual reality. Nature Methods 1–3
Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2020) Nerf: Representing scenes as neural radiance fields for view synthesis. arXiv preprint arXiv:2003.08934
Guo J, Borges PVK, Park C, Gawel AR (2019) Local descriptor for robust place recognition using lidar intensity. IEEE Robot Autom Lett 4(2):1470–1477
Qi C, Su H, Mo K, Guibas LJ (2017) Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 77–85
Qi C, Yi L, Su H, Guibas LJ (2017) Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: Advances in neural information processing systems
Hu Q, Yang B, Xie L, Rosa S, Guo Y, Wang Z, Trigoni A, Markham A (2020) Randla-net: Efficient semantic segmentation of large-scale point clouds. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11105–11114
Dai A, Chang AX, Savva M, Halber M, Funkhouser T, Nießner M (2017) Scannet: Richly-annotated 3d reconstructions of indoor scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5828–5839
Wei J, Lin G, Yap K-H, Hung T-Y, Xie L (2020) Multi-path region mining for weakly supervised 3d semantic segmentation on point clouds. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4384–4393
Xu X, Lee GH (2020) Weakly supervised semantic point cloud segmentation: Towards 10x fewer labels. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13706–13715
Zhang Y, Qu Y, Xie Y, Li Z, Zheng S, Li C (2021) Perturbed self-distillation: Weakly supervised large-scale point cloud semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 15500–15508
Hu Q, Yang B, Fang G, Guo Y, Leonardis A, Trigoni N, Markham A (2022) Sqn: Weakly-supervised semantic segmentation of large-scale 3d point clouds with 1000x fewer labels. In: European conference on computer vision
Li J, Chen BM, Lee GH (2018) So-net: Self-organizing network for point cloud analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9397–9406
Zhang Z, Hua B-S, Yeung S-K (2019) Shellnet: Efficient point cloud convolutional neural networks using concentric shells statistics. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1607–1616
Li G, Muller M, Thabet A, Ghanem B (2019) Deepgcns: Can gcns go as deep as cnns? In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9267–9276
Fan S, Dong Q, Zhu F, Lv Y, Ye P, Wang F-Y (2021) Scf-net: Learning spatial contextual features for large-scale point cloud segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14504–14513
Hou Y, Zhu X, Ma Y, Loy CC, Li Y (2022) Point-to-voxel knowledge distillation for lidar semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8479–8488
Wang Z, Rao Y, Yu X, Zhou J, Lu J (2022) Semaffinet: Semantic-affine transformation for point cloud segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11819–11829
Lai X, Liu J, Jiang L, Wang L, Zhao H, Liu S, Qi X, Jia J (2022) Stratified transformer for 3d point cloud segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8500–8509
Tang L, Zhan Y, Chen Z, Yu B, Tao D (2022) Contrastive boundary learning for point cloud segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8489–8499
Liu L, Zhuang Z, Huang S, Xiao X, Xiang T-Z, Chen C, Wang J, Tan M (2023) Cpcm: Contextual point cloud modeling for weakly-supervised point cloud semantic segmentation. arXiv preprint arXiv:2307.10316
Niu Y, Yin J (2024) Weakly supervised point cloud semantic segmentation with the fusion of heterogeneous network features. Image Vis Comput 142:104916
Unal O, Dai D, Van Gool L (2022) Scribble-supervised lidar semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2687–2697
Zhang Y, Li Z, Xie Y, Qu Y, Li C, Mei T (2021) Weakly supervised semantic segmentation for large-scale point cloud. arXiv preprint arXiv:2212.04744
Liu Z, Qi X, Fu C-W (2021) One thing one click: A self-training approach for weakly supervised 3d semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1726–1736
Wei J, Lin G, Yap K-H, Liu F, Hung T-Y (2021) Dense supervision propagation for weakly supervised semantic segmentation on 3d point clouds. arXiv preprint arXiv:2107.11267
Armeni I, Sax S, Zamir AR, Savarese S (2017) Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105
Hackel T, Savinov N, Ladicky L, Wegner JD, Schindler K, Pollefeys M (2017) Semantic3d.net: A new large-scale point cloud classification benchmark. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences
Lei H, Akhtar N, Mian A (2021) Spherical kernel for efficient graph convolution on 3d point clouds. IEEE Trans Pattern Anal Mach Intell 43(10):3664–3680
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62173045 and 62273054), in part by the Fundamental Research Funds for the Central Universities (Grant No. 2020XD-A04-3), and in part by the Natural Science Foundation of Hainan Province (Grant No. 622RC675).
Author information
Contributions
Yingchun Niu: Conceptualisation, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing original draft. Jianqin Yin: Conceptualisation, Methodology. Chao Qi: Methodology, Software, Validation, Formal analysis, Investigation. Liang Geng: Investigation, Data Curation.
Ethics declarations
Ethical and informed consent for data used
Data in the present study are publicly available, and ethical approval and informed consent were obtained in each original study.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Competing interests
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Niu, Y., Yin, J., Qi, C. et al. Weakly supervised point cloud semantic segmentation based on scene consistency. Appl Intell 54, 12439–12452 (2024). https://doi.org/10.1007/s10489-024-05822-2