Abstract
We address point cloud semantic segmentation by modeling long-range dependencies with the self-attention mechanism. Existing semantic segmentation models generally focus on local feature aggregation. In contrast, we propose a point attention network (PA-Net) that selectively extracts local features together with their long-range dependencies. We devise two complementary attention modules for point cloud semantic segmentation, which adaptively integrate semantic inter-dependencies with long-range dependencies. The point attention module combines the local features of the last encoder layer with a weighted sum of long-range dependency features, so that similar features are correlated with each other regardless of the distance between them. The feature attention module adaptively integrates inter-dependent feature maps among all local features in the last encoder layer. Extensive experiments show that the two attention modules together improve semantic segmentation performance on point clouds. PA-Net achieves better semantic segmentation performance on two benchmark point cloud datasets (S3DIS and ScanNet); in particular, the IoU on 11 semantic categories of S3DIS is significantly improved.
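The two modules described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the PA-Net implementation: it assumes a dual-attention formulation in the style of DANet, where a point attention step computes softmax-normalized affinities between every pair of points from query/key/value projections, and a feature attention step computes affinities between channels. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the residual weights `gamma` and `beta` are hypothetical choices made for illustration.

```python
import math
from typing import List

# Hypothetical sketch of DANet-style dual attention on point features.
# This is NOT the PA-Net authors' code; names and shapes are assumptions.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    k, m = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def residual(context: Matrix, feats: Matrix, weight: float) -> Matrix:
    """Return weight * context + feats, element-wise."""
    return [[weight * c + f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(context, feats)]


def point_attention(feats: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                    gamma: float) -> Matrix:
    """Attention over points: each point gathers a weighted sum of all points.

    feats: N x C (one row per point); w_q, w_k: C x C'; w_v: C x C.
    """
    q = matmul(feats, w_q)                        # N x C'
    k = matmul(feats, w_k)                        # N x C'
    v = matmul(feats, w_v)                        # N x C
    attn = softmax_rows(matmul(q, transpose(k)))  # N x N, each row sums to 1
    # Weights come from feature affinity, not from spatial neighborhoods.
    context = matmul(attn, v)                     # N x C
    return residual(context, feats, gamma)


def feature_attention(feats: Matrix, beta: float) -> Matrix:
    """Attention over channels: each feature map becomes a weighted sum of all maps."""
    ft = transpose(feats)                         # C x N
    attn = softmax_rows(matmul(ft, feats))        # C x C channel affinities
    context = transpose(matmul(attn, ft))         # back to N x C
    return residual(context, feats, beta)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # 3 points, 2 channels (toy data)
    print(point_attention(pts, eye, eye, eye, gamma=1.0))
    print(feature_attention(pts, beta=1.0))
```

With `gamma` and `beta` set to 0, both functions return their input unchanged; in self-attention layers such as SAGAN's, the residual weight is a learnable scalar initialized to 0, so the network starts from purely local features and gradually learns how much attention context to add. The point attention matrix is N × N, so its cost grows quadratically with the number of points, which makes such a module far cheaper on the (typically downsampled) features of the last encoder layer than on the full-resolution cloud.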
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62032011 and 61772257).
Cite this article
Ren, D., Wu, Z., Li, J. et al. Point attention network for point cloud semantic segmentation. Sci. China Inf. Sci. 65, 192104 (2022). https://doi.org/10.1007/s11432-021-3387-7