
RM3D: Robust Data-Efficient 3D Scene Parsing via Traditional and Learnt 3D Descriptors-Based Semantic Region Merging


Abstract

Existing state-of-the-art methods for 3D point cloud understanding perform well only under full supervision. To the best of our knowledge, no unified framework simultaneously addresses the downstream high-level understanding tasks of both segmentation and detection, especially when labels are extremely limited. This work presents a simple and general framework for point cloud understanding with limited labels. Our first contribution is an extensive comparison of traditional and learnt 3D descriptors for weakly supervised 3D scene understanding, which validates that our adapted traditional PFH-based 3D descriptors generalize well across different domains. Our second contribution is a learning-based region merging strategy driven by the affinity provided by both the traditional/learnt 3D descriptors and the learnt semantics; the merging process takes both low-level geometric and high-level semantic feature correlations into consideration. Experimental results demonstrate that our framework achieves the best performance on the three most important weakly supervised point cloud understanding tasks, namely semantic segmentation, instance segmentation, and object detection, even when only a very limited number of points are labeled. Our method, termed Region Merging 3D (RM3D), achieves superior performance on the ScanNet data-efficient learning online benchmark and four other large-scale 3D understanding benchmarks under various experimental settings, outperforming current state-of-the-art methods by a clear margin on various 3D understanding tasks without complicated learning strategies such as active learning.
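To make the region-merging idea above concrete, the sketch below shows one plausible way to turn a PFH-style geometric descriptor and a predicted semantic class distribution into a pairwise region affinity, and to greedily merge adjacent over-segmented regions with union-find. This is a minimal illustration only, not the authors' implementation: the function names (region_affinity, merge_regions), the weights w_geo/w_sem, and the merge threshold are assumptions made for the example.

```python
import numpy as np

def region_affinity(desc_a, desc_b, sem_a, sem_b, w_geo=0.5, w_sem=0.5):
    """Affinity between two regions, combining a low-level geometric
    descriptor (e.g. an L1-normalised PFH-style histogram) with a
    high-level semantic class distribution predicted by a network."""
    geo_sim = 1.0 - 0.5 * np.abs(desc_a - desc_b).sum()  # histogram similarity in [0, 1]
    sem_sim = float(np.dot(sem_a, sem_b))                # agreement of class posteriors
    return w_geo * geo_sim + w_sem * sem_sim

def merge_regions(descriptors, semantics, adjacency, threshold=0.7):
    """Greedily merge adjacent over-segmented regions whose combined
    geometric/semantic affinity exceeds a threshold (union-find).
    Returns one merged-segment label per input region."""
    n = len(descriptors)
    parent = list(range(n))

    def find(i):                                   # find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:                         # adjacency: pairs of neighbouring regions
        if region_affinity(descriptors[i], descriptors[j],
                           semantics[i], semantics[j]) >= threshold:
            parent[find(i)] = find(j)              # union the two regions

    return np.array([find(k) for k in range(n)])
```

In practice, the over-segmented regions would come from a low-level grouping (e.g. supervoxels), the semantic distributions from a sparsely supervised network, and the affinity weights and threshold would be learnt rather than fixed; that learnt weighting is where the paper's merging strategy differs from this fixed-weight sketch.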



Author information


Corresponding author

Correspondence to Kangcheng Liu.

Additional information

Communicated by Matteo Poggi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (MP4 24,166 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, K. RM3D: Robust Data-Efficient 3D Scene Parsing via Traditional and Learnt 3D Descriptors-Based Semantic Region Merging. Int J Comput Vis 131, 938–967 (2023). https://doi.org/10.1007/s11263-022-01740-3

