Abstract
Point cloud analysis has advanced rapidly, propelled by deep learning, and transformer models have reshaped point cloud learning tasks. However, the quadratic complexity of attention limits scalability to long sequences and strains limited computational resources. In this paper, we introduce PointUltra, a hybrid Mamba-based framework that departs from traditional transformer-based methods. PointUltra employs a linear-complexity algorithm, significantly reducing computational cost while retaining strong global modeling capability. Specifically, we propose UltraX, a spatially coherent traversal serialization strategy that efficiently converts unordered 3D point cloud data into ordered 1D sequences. The serialized data allow the Mamba encoder to effectively extract inter-group global features from each point cloud sample, ensuring comprehensive global analysis. Unlike previous Mamba-based methods, the UltraX serialization strategy overcomes the limitations of random and purely coordinate-based serialization by preserving point adjacency and spatial relationships. Experimental results validate the effectiveness of our approach: 92.3% accuracy on the OBJ-BG variant of ScanObjectNN, 94.2% accuracy on ModelNet40, and 86.8% instance mIoU on ShapeNetPart, while saving about 25% of FLOPs.
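The abstract does not spell out the UltraX traversal, so the following is a minimal illustrative sketch rather than the authors' implementation: it orders a point cloud along a Z-order (Morton) curve, one common spatially coherent serialization, so that spatially nearby points tend to become neighbors in the resulting 1D sequence before it is passed to a sequence encoder such as Mamba. The function names and the choice of curve are assumptions made only for illustration.

```python
import numpy as np

def morton_code(grid: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer voxel coordinates (N, 3) into (N,) Z-order codes."""
    codes = np.zeros(len(grid), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            codes |= ((grid[:, axis] >> b) & 1) << (3 * b + axis)
    return codes

def serialize_points(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Indices that reorder an (N, 3) point cloud into a 1D sequence in which
    spatially adjacent points tend to remain adjacent."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    # Normalize to [0, 1) and quantize each axis onto a 2**bits grid.
    grid = ((points - mins) / (maxs - mins + 1e-9) * (2 ** bits - 1)).astype(np.int64)
    return np.argsort(morton_code(grid, bits))

if __name__ == "__main__":
    pts = np.random.rand(1024, 3).astype(np.float32)  # toy point cloud
    order = serialize_points(pts)
    sequence = pts[order]  # ordered 1D sequence fed to the sequence encoder
    print(sequence.shape)  # (1024, 3)
```

Any space-filling-curve ordering could be substituted here; the point of the sketch is only that the serialization step turns irregular 3D coordinates into a sequence whose neighborhood structure a linear-time state space model can exploit.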
Data availability
No datasets were generated or analyzed during the current study.
Acknowledgements
We thank the anonymous reviewers for their insightful comments.
Funding
This study was partially supported by the Tianshan Talent Training Program in the Autonomous Region, China (Grant Number: 2023TSYCLJ0023).
Author information
Contributions
Bo Liu was responsible for the entire experimental process and the writing of the manuscript. Xin Fan and Zhezhe Zhu handled data collection and analysis. Jialun Lv and Long Yu developed the software tools used in the study. Shengwei Tian supervised the entire research process and reviewed the manuscript. All authors participated in the manuscript review and gave final approval.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, B., Tian, S., Yu, L. et al. PointUltra: ultra-efficient mamba framework for transformative point cloud analysis. J Supercomput 81, 562 (2025). https://doi.org/10.1007/s11227-025-07066-4