GPSFormer: A Global Perception and Local Structure Fitting-Based Transformer for Point Cloud Understanding

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Despite significant advances in pre-training methods for point cloud understanding, directly capturing intricate shape information from irregular point clouds without relying on external data remains a formidable challenge. To address this problem, we propose GPSFormer, a Global Perception and Local Structure Fitting-based Transformer that learns detailed shape information from point clouds. The core components of GPSFormer are the Global Perception Module (GPM) and the Local Structure Fitting Convolution (LSFConv). Specifically, GPM uses Adaptive Deformable Graph Convolution (ADGConv) to identify short-range dependencies among similar features in the feature space, and Multi-Head Attention (MHA) to learn long-range dependencies across all positions in the feature space, which together enable flexible learning of contextual representations. Inspired by the Taylor series, we design LSFConv, which learns both low-order fundamental information and high-order refinement information from explicitly encoded local geometric structures. Using GPM and LSFConv as fundamental building blocks, we construct GPSFormer, a Transformer that effectively captures both the global and local structures of point clouds. Extensive experiments validate GPSFormer's effectiveness on three point cloud tasks: shape classification, part segmentation, and few-shot learning. The code of GPSFormer is available at https://github.com/changshuowang/GPSFormer.
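
The two branches of GPM can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation (that lives in the repository linked above). The abstract does not say how ADGConv adapts or deforms its graph, so the sketch substitutes a plain DGCNN-style edge convolution over a k-nearest-neighbour graph built in feature space, followed by multi-head self-attention over all points. All function names, the shared weight matrix `w`, the omission of learned Q/K/V projections, and the residual sum that combines the two branches are illustrative assumptions.

```python
import math
import random


def knn_feature_space(feats, k):
    """k nearest neighbours of each point, measured between feature vectors
    (not xyz coordinates); the point itself is excluded."""
    nbrs = []
    for i, fi in enumerate(feats):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(feats) if j != i
        )
        nbrs.append([j for _, j in dists[:k]])
    return nbrs


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def edge_conv(feats, nbrs, w):
    """DGCNN-style edge convolution (a stand-in for ADGConv, not ADGConv itself):
    edge feature [x_i, x_j - x_i] -> shared linear map -> ReLU -> max over neighbours."""
    out = []
    for i, js in enumerate(nbrs):
        msgs = []
        for j in js:
            edge = feats[i] + [b - a for a, b in zip(feats[i], feats[j])]
            msgs.append([max(0.0, v) for v in matvec(w, edge)])
        out.append([max(col) for col in zip(*msgs)])
    return out


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def multi_head_attention(x, num_heads):
    """Scaled dot-product self-attention over all N points, with channels split
    into heads. Simplification (assumption): Q = K = V = x, no learned projections."""
    d = len(x[0])
    assert d % num_heads == 0, "channels must divide evenly into heads"
    dh = d // num_heads
    out = [[0.0] * d for _ in x]
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh
        for i, xi in enumerate(x):
            scores = [sum(a * b for a, b in zip(xi[lo:hi], xj[lo:hi])) / math.sqrt(dh)
                      for xj in x]
            attn = softmax(scores)
            for c in range(lo, hi):
                out[i][c] = sum(a * xj[c] for a, xj in zip(attn, x))
    return out


def global_perception(feats, w, k=4, num_heads=2):
    """Short-range branch (feature-space graph) followed by a long-range branch
    (attention over all positions); the residual sum is a hypothetical choice."""
    local = edge_conv(feats, knn_feature_space(feats, k), w)
    glob = multi_head_attention(local, num_heads)
    return [[a + b for a, b in zip(u, v)] for u, v in zip(local, glob)]


random.seed(0)
N, C, C_OUT = 16, 6, 8
feats = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(N)]
w = [[random.gauss(0.0, 0.5) for _ in range(2 * C)] for _ in range(C_OUT)]  # shared edge weights
y = global_perception(feats, w, k=4, num_heads=2)
print(len(y), len(y[0]))  # 16 8
```

Because the neighbour graph is rebuilt from the features themselves, "neighbours" are points with similar features rather than nearby coordinates. This is how the graph branch captures short-range dependencies in feature space, while attention relates all N positions at once.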

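The Taylor-series idea behind LSFConv can be illustrated with a second-order expansion around a centre point p_i. For an offset d = p_j - p_i, f(p_j) ≈ f(p_i) + grad f(p_i) · d + 0.5 * d^T H d. Here the constant and linear terms carry the low-order (fundamental) information and the quadratic terms carry the high-order (refinement) information. The Python sketch below uses hypothetical names and is not the paper's LSFConv. It explicitly encodes a relative offset into these two groups of monomials and checks that, with coefficients taken from a known quadratic function, the low-order response plus the high-order refinement reproduces the function exactly. In a learned layer the coefficients would presumably be trainable per output channel, with the per-neighbour responses aggregated over the k neighbours. That part is an assumption and is not shown.

```python
def taylor_basis(delta):
    """Explicitly encode a relative offset d = p_j - p_i.

    low:  constant and linear terms -> low-order (fundamental) information
    high: quadratic monomials       -> high-order (refinement) information
    """
    dx, dy, dz = delta
    low = [1.0, dx, dy, dz]
    high = [dx * dx, dy * dy, dz * dz, dx * dy, dx * dz, dy * dz]
    return low, high


def taylor_weights(value, grad, hess):
    """Coefficients that make the basis reproduce the second-order expansion
    T2(d) = f(p_i) + grad . d + 0.5 * d^T H d, for a symmetric Hessian H."""
    w_low = [value] + list(grad)
    w_high = [0.5 * hess[0][0], 0.5 * hess[1][1], 0.5 * hess[2][2],
              hess[0][1], hess[0][2], hess[1][2]]
    return w_low, w_high


def taylor_response(delta, w_low, w_high):
    """Return (low-order response, high-order refinement) for one neighbour."""
    low, high = taylor_basis(delta)
    base = sum(w * t for w, t in zip(w_low, low))
    refine = sum(w * t for w, t in zip(w_high, high))
    return base, refine


# Worked check with a known quadratic "surface" f; gradient and Hessian are analytic.
def f(p):
    x, y, z = p
    return 2.0 + x - 3.0 * y + 0.5 * x * x + x * y


c = (0.2, -0.1, 0.4)                           # centre point p_i
grad = (1.0 + c[0] + c[1], -3.0 + c[0], 0.0)   # df/dx, df/dy, df/dz at c
hess = ((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
w_low, w_high = taylor_weights(f(c), grad, hess)

p = (0.5, 0.3, -0.2)                           # neighbour p_j
base, refine = taylor_response([b - a for a, b in zip(c, p)], w_low, w_high)
print(abs(base + refine - f(p)) < 1e-12)       # True: exact for a quadratic
```

For these values the low-order response alone is 1.71 while f(p_j) is 1.875. The missing 0.165 is exactly the high-order refinement, which illustrates why the quadratic terms add detail that the first-order fit cannot.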

Acknowledgment

This work was supported in part by NTU-DESAY SV Research Program under Grant 2018-0980; and in part by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1, under Grant RG78/21. The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).

Author information

Correspondence to Siew-Kei Lam.

Electronic supplementary material

Supplementary material 1 (PDF, 559 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, C. et al. (2025). GPSFormer: A Global Perception and Local Structure Fitting-Based Transformer for Point Cloud Understanding. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15066. Springer, Cham. https://doi.org/10.1007/978-3-031-73242-3_5

  • DOI: https://doi.org/10.1007/978-3-031-73242-3_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73241-6

  • Online ISBN: 978-3-031-73242-3

  • eBook Packages: Computer Science, Computer Science (R0)
