Point attention network for point cloud semantic segmentation

  • Research Paper
Science China Information Sciences

Abstract

We address point cloud semantic segmentation by modeling long-range dependencies with the self-attention mechanism. Existing semantic segmentation models generally focus on local feature aggregation. By contrast, we propose a point attention network (PA-Net) that selectively extracts local features together with their long-range dependencies. We design two complementary attention modules for the point cloud semantic segmentation task; together they adaptively integrate semantic inter-dependencies with long-range dependencies. The point attention module adaptively integrates the local features of the last encoder layer with a weighted sum of long-range dependency features, so that similar features are correlated with each other regardless of the distance between them. Meanwhile, the feature attention module adaptively integrates inter-dependent feature maps among all local features in the last encoder layer. Extensive experiments show that the two attention modules together improve semantic segmentation performance on point clouds. We achieve better semantic segmentation performance on two benchmark point cloud datasets (i.e., S3DIS and ScanNet). In particular, the IoU on 11 semantic categories of S3DIS is significantly boosted.
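
The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection matrices `w_q`, `w_k`, `w_v`, the residual weight `gamma`, the softmax normalization, and the final summation of the two branches are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general idea of attention across all points (long-range dependencies) and attention across feature channels (inter-dependent feature maps).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def point_attention(feats, w_q, w_k, w_v, gamma=1.0):
    """Attention over points (hypothetical formulation).

    feats: (N, C) local features from the last encoder layer.
    Every point attends to all N points, so similar features are
    related regardless of their spatial distance.
    """
    q = feats @ w_q                    # (N, d) queries
    k = feats @ w_k                    # (N, d) keys
    v = feats @ w_v                    # (N, C) values
    attn = softmax(q @ k.T, axis=-1)   # (N, N), each row sums to 1
    # Residual form: local features plus weighted sum over all points.
    return feats + gamma * (attn @ v), attn


def feature_attention(feats, gamma=1.0):
    """Attention over feature channels (hypothetical formulation).

    Builds a (C, C) affinity between feature maps and re-weights
    each channel by a combination of all channels.
    """
    affinity = feats.T @ feats              # (C, C) channel affinity
    attn = softmax(affinity, axis=-1)       # (C, C), each row sums to 1
    # Channel i of the output mixes all channels j with weight attn[i, j].
    return feats + gamma * (feats @ attn.T), attn


# Toy usage with random data; sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
N, C, d = 128, 16, 4
feats = rng.standard_normal((N, C))
w_q = rng.standard_normal((C, d)) * 0.1
w_k = rng.standard_normal((C, d)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

pa_out, pa_attn = point_attention(feats, w_q, w_k, w_v)
fa_out, fa_attn = feature_attention(feats)
fused = pa_out + fa_out   # assumed fusion: element-wise sum of both branches
print(fused.shape)
```

Note that the point attention map has shape (N, N), so its memory cost grows quadratically with the number of points. Applying it only to the last encoder layer, as the abstract states, keeps N small compared with the raw input point count.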

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62032011, 61772257).

Author information

Corresponding authors

Correspondence to Jie Guo or Yanwen Guo.

About this article

Cite this article

Ren, D., Wu, Z., Li, J. et al. Point attention network for point cloud semantic segmentation. Sci. China Inf. Sci. 65, 192104 (2022). https://doi.org/10.1007/s11432-021-3387-7

  • DOI: https://doi.org/10.1007/s11432-021-3387-7
