RS-TNet: point cloud transformer with relation-shape awareness for fine-grained 3D visual processing

Published in: Soft Computing

Abstract

Point cloud representation faces the challenge of extracting sufficient semantic information while preserving the spatial structure of sparse point clouds. Benefiting from the Transformer architecture, recent studies have advanced point cloud representation by extracting refined attention features from global context. However, undesired semantic information is still lost in the feature extraction stage. Hence, this paper proposes a novel architecture for 3D point cloud representation, the Relation-Shape Transformer Network (RS-TNet), which addresses this problem while retaining the merits of the relation-shape embedding mechanism, so as to generate rich and robust local semantic features. Specifically, RS-TNet achieves coarse-to-fine-grained semantic coverage by simultaneously integrating global multi-head self-attention and a local Relation-Feature extraction module. Moreover, theoretical analysis demonstrates that RS-TNet explicitly introduces the spatial relations of points by learning underlying shapes, making the extracted features more shape-aware and robust. As a result, the proposed RS-TNet achieves 90.9% class accuracy on ModelNet40 and 85.6% Intersection-over-Union on ShapeNet. Further, ablation experiments verify the effectiveness of RS-TNet in point cloud classification and part segmentation tasks.
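The two branches described in the abstract, global multi-head self-attention over all points and local relation-feature aggregation from spatial relations, can be sketched in NumPy. This is an illustrative toy, not the authors' implementation: the projection matrices are random placeholders for learned weights, and inverse-distance weighting stands in for the paper's learned relation mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(feats, num_heads=4):
    """Global branch: multi-head self-attention over per-point features.

    feats: (N, C) point features, C divisible by num_heads.
    Returns (N, C). Q/K/V projections are random stand-ins for
    learned parameters.
    """
    n, c = feats.shape
    d = c // num_heads
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    q = (feats @ wq).reshape(n, num_heads, d)
    k = (feats @ wk).reshape(n, num_heads, d)
    v = (feats @ wv).reshape(n, num_heads, d)
    # scaled dot-product attention per head: (heads, N, N)
    attn = softmax(np.einsum('nhd,mhd->hnm', q, k) / np.sqrt(d), axis=-1)
    return np.einsum('hnm,mhd->nhd', attn, v).reshape(n, c)

def relation_feature(xyz, feats, k=8):
    """Local branch: aggregate each point's k nearest neighbours,
    weighted by a simple function of the spatial relation (here,
    squared distance), in the spirit of relation-shape learning.
    """
    n = xyz.shape[0]
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)   # (N, N) pairwise
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]                  # k nearest, excluding self
    w = softmax(-d2[np.arange(n)[:, None], idx], axis=1)      # closer => larger weight
    return (w[..., None] * feats[idx]).sum(axis=1)            # (N, C)

xyz = rng.standard_normal((128, 3))        # point coordinates
feats = rng.standard_normal((128, 32))     # per-point features
g = multi_head_self_attention(feats)       # global context
l = relation_feature(xyz, feats)           # local shape-aware context
fused = np.concatenate([g, l], axis=1)     # coarse-to-fine fusion
print(fused.shape)  # (128, 64)
```

Concatenating the two branches mirrors the coarse-to-fine idea: global attention captures long-range context while the relation branch keeps local geometric structure.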


Data availability

Enquiries about data availability should be directed to the authors.


Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 61972030.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by XW, YJ, YZ, YC, BL and SW. The first draft of the manuscript was written by XW and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yi Jin or Yigang Cen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Zeng, Y., Jin, Y. et al. RS-TNet: point cloud transformer with relation-shape awareness for fine-grained 3D visual processing. Soft Comput 27, 1005–1013 (2023). https://doi.org/10.1007/s00500-022-07543-5
