
Learning local contextual features for 3D point clouds semantic segmentation by attentive kernel convolution

  • Original article

Abstract

Unlike 2D images, which are represented on regular grids, 3D point clouds are irregular and unordered, so directly applying convolutional neural networks (CNNs) to point clouds is challenging. In this paper, we propose a novel deep neural network named AKNet for point cloud semantic segmentation. The key component of AKNet is the attentive kernel convolution (AKConv), a deformed convolution operation that perceives sufficient local context of 3D scenes. AKConv first constructs Basic Weight Units that are robust to point ordering. Then, to capture more distinctive local features, the convolution kernels of AKConv are associated with Attentive Weight Units obtained by applying a self-attentive function to the Basic Weight Units. Furthermore, 3D point clouds provide rich geometric shape information that helps recognize objects; feeding only raw point features into the convolution could lose this geometric information, so we use augmented features as the input of AKConv. In addition, to preserve semantic information from the encoding layers to the decoding layers, we introduce a backward encoding (BE) mechanism that exploits higher-layer semantic features. We conduct experiments on three large-scale point cloud datasets, and the results demonstrate that AKNet outperforms state-of-the-art (SOTA) networks.
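
Because only the abstract is available on this page, the snippet below is a minimal, hypothetical PyTorch sketch of what an AKConv-style layer could look like. The k-nearest-neighbour grouping, the relative-position feature augmentation, the shared MLP standing in for the Basic Weight Unit, and the softmax scoring standing in for the Attentive Weight Unit are all illustrative assumptions, not the authors' implementation; the backward encoding (BE) mechanism is not covered.

import torch
import torch.nn as nn
import torch.nn.functional as F


def knn_group(xyz, k):
    # Indices of the k nearest neighbours of every point (self included).
    # xyz: (B, N, 3) point coordinates.
    dist = torch.cdist(xyz, xyz)                               # (B, N, N)
    return dist.topk(k, dim=-1, largest=False).indices         # (B, N, k)


class AttentiveKernelConvSketch(nn.Module):
    # Hypothetical AKConv-like layer: permutation-robust "basic" kernel
    # responses are re-weighted by learned attention scores before aggregation.

    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        # Basic Weight Unit (assumed): a shared MLP applied to each augmented
        # neighbour feature, so the result does not depend on point ordering.
        self.basic = nn.Sequential(
            nn.Linear(in_dim + 3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )
        # Attentive Weight Unit (assumed): scores each neighbour from the
        # basic response, acting as a self-attentive function over the kernel.
        self.attn = nn.Linear(out_dim, 1)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) per-point features.
        B = xyz.shape[0]
        idx = knn_group(xyz, self.k)                            # (B, N, k)
        batch = torch.arange(B, device=xyz.device).view(B, 1, 1)
        nbr_xyz = xyz[batch, idx]                               # (B, N, k, 3)
        nbr_feats = feats[batch, idx]                           # (B, N, k, C)
        # Augmented input (assumed): neighbour features plus relative
        # positions, so local geometric shape is not lost.
        rel = nbr_xyz - xyz.unsqueeze(2)                        # (B, N, k, 3)
        basic = self.basic(torch.cat([nbr_feats, rel], dim=-1))  # (B, N, k, D)
        # Softmax attention over the k neighbours emphasises distinctive context.
        scores = F.softmax(self.attn(basic), dim=2)             # (B, N, k, 1)
        return (scores * basic).sum(dim=2)                      # (B, N, D)


if __name__ == "__main__":
    layer = AttentiveKernelConvSketch(in_dim=6, out_dim=32, k=8)
    pts = torch.rand(2, 1024, 3)      # toy batch: 2 clouds, 1024 points each
    fts = torch.rand(2, 1024, 6)      # e.g. colour + normal per point
    print(layer(pts, fts).shape)      # expected: torch.Size([2, 1024, 32])

Running the toy example prints one aggregated local-context feature per point; in a full network, layers of this kind would be stacked in an encoder-decoder, which is where the abstract's BE mechanism would connect encoding and decoding stages.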

Data Availability

The datasets used in this paper are publicly available: Semantic3D [45], S3DIS [46] and SensatUrban [47] can be obtained from their respective providers.

References

  1. Qi, C.R., Chen, X., Litany, O., et al.: ImVoteNet: boosting 3D object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4404–4413 (2020)

  2. Tang, Y., Li, L., Wang, C., et al.: Real-time detection of surface deformation and strain in recycled aggregate concrete-filled steel tubular columns via four-ocular vision. Robotics Comput. Integr. Manuf. 59, 36–46 (2019)

  3. Shao, Y., Tong, G., Peng, H.: Mining local geometric structure for large-scale 3D point clouds semantic segmentation. Neurocomputing 500, 191–202 (2022)

  4. Li, H., Sun, Z.: A structural-constraint 3D point clouds segmentation adversarial method. Vis. Comput. 37(2), 325–340 (2021)

  5. Tateno, K., Tombari, F., Navab, N.: Real-time and scalable incremental segmentation on dense SLAM. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4465–4472 (2015)

  6. Koppula, H. S., Anand, A., Joachims, T., Saxena, A.: Semantic labeling of 3D point clouds for indoor scenes. In: Advances in Neural Information Processing Systems, pp. 244–252 (2011)

  7. Li, R., Zhang, Y., Niu, D., et al.: PointVGG: Graph convolutional network with progressive aggregating features on point clouds. Neurocomputing 429, 187–198 (2021)

  8. Wu, J., Jiao, J., Yang, Q., et al.: Ground-aware point cloud semantic segmentation for autonomous driving. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 971–979 (2019)

  9. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  10. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241 (2015)

  11. Tang, Y., Chen, Z., Huang, Z., et al.: Visual measurement of dam concrete cracks based on U-net and improved thinning algorithm. J. Exp. Mech. 37(2), 209–220 (2022)

  12. Su, H., Maji, S., et al.: Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953 (2015)

  13. Boulch, A., Le Saux, B., et al.: Unstructured point cloud semantic labeling using deep segmentation networks. In: Workshop on 3D Object Retrieval (2017)

  14. Graham, B., Engelcke, M., et al.: 3D semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)

  15. Tchapmi, L., Choy, C., Armeni, I., et al.: Segcloud: semantic segmentation of 3D point clouds. In: Proceedings of the International Conference on 3D Vision, pp. 537–547 (2017)

  16. Klokov, R., Lempitsky, V.: Deep kd-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 863–872 (2017)

  17. Riegler, G., Osman Ulusoy, A., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3577–3586 (2017)

  18. Qi, C.R., Su, H., Mo, K., et al.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)

  19. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the Conference and Workshop on Neural Information Processing Systems, Long Beach, CA, pp. 5099–5108 (2017)

  20. Wang, Y., Sun, Y., Liu, Z., et al.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38(5), 1–12 (2019)

  21. Zhao, H., Jiang, L., Fu, C.W., et al.: Pointweb: enhancing local neighborhood features for point cloud processing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5565–5573 (2019)

  22. Lan, S., Yu, R., Yu, G., et al.: Modeling local geometric structure of 3D point clouds using geo-CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 998–1008 (2019)

  23. Jiang, M., Wu, Y., Zhao, T., et al.: PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation. arXiv preprint arXiv:1807.00652 (2018)

  24. Zhang, Z., Hua, B.S., Yeung, S.K.: ShellNet: efficient point cloud convolutional neural networks using concentric shells statistics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1607–1616 (2019)

  25. Hu, Q., Yang, B., Xie, L., et al.: RandLA-Net: efficient semantic segmentation of large-scale point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11108–11117 (2020)

  26. Fan, S., Dong, Q., Zhu, F., et al.: SCF-Net: learning spatial contextual features for large-scale point cloud segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14504–14513 (2021)

  27. Simonovsky, M., Komodakis, N.: Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 29–38 (2017)

  28. Li, Y., Bu, R., Sun, M., et al.: PointCNN: convolution on X-transformed points. In: Proceedings of the Conference and Workshop on Neural Information Processing Systems, Montreal, Canada, pp. 820–830 (2018)

  29. Wang, L., Huang, Y., Hou, Y., et al.: Graph attention convolution for point cloud semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10296–10305 (2019)

  30. Liu, Y., Fan, B., Xiang, S., et al.: Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8895–8904 (2019)

  31. Wu, W., Qi, Z., Fuxin, L.: Pointconv: deep convolutional networks on 3D point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9621–9630 (2019)

  32. Thomas, H., Qi, C.R., Deschaud, J.E., et al.: Kpconv: flexible and deformable convolution for point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6411–6420 (2019)

  33. Xu, M., Ding, R., Zhao, H., et al.: Paconv: position adaptive convolution with dynamic kernel assembling on point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3173–3182 (2021)

  34. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)

  35. Shu, X., Yang, J., Yan, R., et al.: Expansion-squeeze-excitation fusion network for elderly activity recognition. IEEE Trans. Circuits Syst. Video Technol. 32(8), 5281–5292 (2022)

  36. Shi, W., Du, H., Mei, W., et al.: (SARN) spatial-wise attention residual network for image super-resolution. Vis. Comput. 37(6), 1569–1580 (2021)

  37. Shu, X., Zhang, L., Qi, G.J., et al.: Spatiotemporal co-attention recurrent neural networks for human-skeleton motion prediction. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 3300–3315 (2022)

  38. Kumar, N., Sukavanam, N.: Weakly supervised deep network for spatiotemporal localization and detection of human actions in wild conditions. Vis. Comput. 36(9), 1809–1821 (2020)

  39. Xu, B., Shu, X., Song, Y.: X-invariant contrastive augmentation and representation learning for semi-supervised skeleton-based action recognition. IEEE Trans. Image Process. 31(5), 3852–3867 (2022)

  40. Wang, P., Yao, W.: A new weakly supervised approach for ALS point cloud semantic segmentation. ISPRS J. Photogramm. Remote Sens. 188, 237–254 (2022)

  41. Hu, Q., Yang, B., Fang, G., et al.: Sqn: weakly-supervised semantic segmentation of large-scale 3D point clouds with 1000x fewer labels. arXiv preprint arXiv:2104.04891 (2021)

  42. Thomas, H., Goulette, F., Deschaud, J., et al.: Semantic classification of 3D point clouds with multiscale spherical neighborhoods. In: Proceedings of the International Conference on 3D Vision (3DV), pp. 390–398 (2018)

  43. Landrieu, L., Simonovsky, M.: Large-scale point cloud semantic segmentation with superpoint graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4558–4567 (2018)

  44. Gong, J., Xu, J., Tan, X., et al.: Omni-supervised point cloud segmentation via gradual receptive field component reasoning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11673–11682 (2021)

  45. Hackel, T., Savinov, N., Ladicky, L., Wegner, J.D., et al.: Semantic3D.net: a new large-scale point cloud classification benchmark. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. IV-1/W1, 91–98 (2017)

  46. Armeni, I., Sener, O., Zamir, A.R., Jiang, H., Brilakis, I., Fischer, M., Savarese, S.: 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1534–1543 (2016)

  47. Hu, Q., Yang, B., Khalid, S., et al.: Towards semantic segmentation of urban-scale 3D point clouds: a dataset, benchmarks and challenges. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4977–4987 (2021)

  48. Tatarchenko, M., Park, J., Koltun, V., et al.: Tangent convolutions for dense prediction in 3D. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3887–3896 (2018)

  49. Graham, B., Engelcke, M., Van Der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)

Author information

Corresponding author

Correspondence to Yuyuan Shao.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Key R&D Program of China (Nos. 2019YFB1309905 and 2020YFB1712802).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tong, G., Shao, Y. & Peng, H. Learning local contextual features for 3D point clouds semantic segmentation by attentive kernel convolution. Vis Comput 40, 831–847 (2024). https://doi.org/10.1007/s00371-023-02819-9
