GPSFormer: A Global Perception and Local Structure Fitting-Based Transformer for Point Cloud Understanding

Wang, Changshuo; Wu, Meiqing; Lam, Siew-Kei; Ning, Xin; Yu, Shangshu; Wang, Ruiping; Li, Weijun; Srikanthan, Thambipillai

doi:10.1007/978-3-031-73242-3_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15066)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Despite the significant advancements in pre-training methods for point cloud understanding, directly capturing intricate shape information from irregular point clouds without reliance on external data remains a formidable challenge. To address this problem, we propose GPSFormer, an innovative Global Perception and Local Structure Fitting-based Transformer, which learns detailed shape information from point clouds with remarkable precision. The core of GPSFormer is the Global Perception Module (GPM) and the Local Structure Fitting Convolution (LSFConv). Specifically, GPM utilizes Adaptive Deformable Graph Convolution (ADGConv) to identify short-range dependencies among similar features in the feature space and employs Multi-Head Attention (MHA) to learn long-range dependencies across all positions within the feature space, ultimately enabling flexible learning of contextual representations. Inspired by the Taylor series, we design LSFConv, which learns both low-order fundamental and high-order refinement information from explicitly encoded local geometric structures. Integrating the GPM and LSFConv as fundamental components, we construct GPSFormer, a cutting-edge Transformer that effectively captures global and local structures of point clouds. Extensive experiments validate GPSFormer’s effectiveness in three point cloud tasks: shape classification, part segmentation, and few-shot learning. The code of GPSFormer is available at https://github.com/changshuowang/GPSFormer.
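
The distinction the abstract draws between short-range dependencies (neighbours that are similar in feature space) and long-range dependencies (attention across all positions) can be illustrated with a minimal sketch. This is not the authors' ADGConv or MHA implementation: the function names `knn_in_feature_space` and `self_attention` are hypothetical, the attention is single-head with identity query/key/value projections, and the toy features are made up for illustration.

```python
import math

def knn_in_feature_space(features, k):
    """For each point, return the indices of its k nearest neighbours,
    measured by Euclidean distance between feature vectors (self excluded)."""
    neighbours = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Single-head scaled dot-product self-attention.
    Assumption: queries, keys and values are the raw features
    (no learned projections), so every point attends to every point."""
    d = len(features[0])
    out = []
    for q in features:
        scores = [
            sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in features
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, features)) for c in range(d)]
        )
    return out

# Toy features: two tight clusters that are far apart in feature space.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]

neighbours = knn_in_feature_space(features, k=1)  # short-range: stays in-cluster
attended = self_attention(features)               # long-range: mixes all points
```

In this toy setting the nearest-neighbour step only links points inside the same cluster, while the attention step produces, for every point, a weighted average over all four points. A learned model would add projections, multiple heads, and the adaptive deformation described in the abstract.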




Acknowledgment

This work was supported in part by NTU-DESAY SV Research Program under Grant 2018-0980; and in part by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1, under Grant RG78/21. The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).

Author information

Authors and Affiliations

Cyber Security Research Center (CYSREN), Nanyang Technological University, Singapore, Singapore
Changshuo Wang, Meiqing Wu, Siew-Kei Lam, Shangshu Yu, Ruiping Wang & Thambipillai Srikanthan
College of Computing and Data Science, Nanyang Technological University, Singapore, Singapore
Siew-Kei Lam & Thambipillai Srikanthan
Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China
Xin Ning & Weijun Li


Corresponding author

Correspondence to Siew-Kei Lam.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (PDF, 559 KB)


Cite this paper

Wang, C. et al. (2025). GPSFormer: A Global Perception and Local Structure Fitting-Based Transformer for Point Cloud Understanding. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15066. Springer, Cham. https://doi.org/10.1007/978-3-031-73242-3_5


DOI: https://doi.org/10.1007/978-3-031-73242-3_5
Published: 29 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73241-6
Online ISBN: 978-3-031-73242-3
eBook Packages: Computer Science, Computer Science (R0)
