PointUltra: ultra-efficient mamba framework for transformative point cloud analysis

Published in The Journal of Supercomputing

Abstract

The field of point cloud analysis has advanced significantly, propelled by deep learning. Transformer models have sparked a revolutionary change in point cloud learning tasks, but their quadratic complexity limits scalability to long sequences and strains limited computational resources. In this paper, we introduce PointUltra, an innovative hybrid framework based on Mamba that diverges from traditional transformer-based methods. PointUltra employs a linear-complexity algorithm, which significantly reduces computational cost while retaining robust global modeling capability. Specifically, we propose UltraX, a spatially coherent traversal serialization strategy that efficiently converts complex 3D point cloud data into ordered 1D sequences. These ordered sequences allow the Mamba encoder to effectively extract intergroup global features from each point cloud sample, ensuring comprehensive global analysis. Unlike previous Mamba methods, whose random or coordinate-based serialization can break locality, our UltraX strategy preserves point adjacency and spatial relationships in point clouds. Experimental results validate the effectiveness of our approach: 92.3% accuracy on the ScanObjectNN OBJ-BG variant, 94.2% accuracy on ModelNet40, and 86.8% instance mIoU on ShapeNetPart, while saving about 25% of FLOPs.
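The core idea of serialization, converting an unordered 3D point set into a 1D sequence whose neighbors tend to be spatially close, can be illustrated with a generic space-filling-curve ordering. The sketch below uses Morton (Z-order) codes as a stand-in; the actual UltraX traversal is not specified in this preview, so both the Morton choice and the function names here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def part1by2(x: np.ndarray) -> np.ndarray:
    """Spread the low 10 bits of each integer so two zero bits separate them."""
    x = x.astype(np.uint64) & 0x3FF
    x = (x ^ (x << 16)) & 0xFF0000FF
    x = (x ^ (x << 8)) & 0x0300F00F
    x = (x ^ (x << 4)) & 0x030C30C3
    x = (x ^ (x << 2)) & 0x09249249
    return x

def morton_serialize(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Return indices ordering an (N, 3) point cloud along a Z-order curve.

    Nearby codes correspond to nearby cells of a 2^bits grid, so sorting by
    code yields a 1D sequence that roughly preserves spatial adjacency --
    the property a Mamba-style sequence encoder relies on.
    """
    # Quantize coordinates onto a [0, 2^bits - 1] integer grid.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    grid = ((points - mins) / (maxs - mins + 1e-9) * (2**bits - 1)).astype(np.uint64)
    # Interleave x, y, z bits into one Morton code per point, then sort.
    code = part1by2(grid[:, 0]) | (part1by2(grid[:, 1]) << np.uint64(1)) | (part1by2(grid[:, 2]) << np.uint64(2))
    return np.argsort(code)
```

A random or purely coordinate-wise (e.g. sort-by-x) ordering, by contrast, can place spatially adjacent points far apart in the sequence, which is the failure mode the abstract attributes to earlier serialization schemes.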


Data availability

No datasets were generated or analyzed during the current study.


Acknowledgements

We thank the anonymous reviewers for their insightful comments.

Funding

This study was partially supported by the Tianshan Talent Training Program in the Autonomous Region, China (Grant Number: 2023TSYCLJ0023).

Author information

Contributions

Bo Liu was responsible for the entire experimental process and for writing the manuscript. Xin Fan and Zhezhe Zhu handled data collection and analysis. Jialun Lv and Long Yu developed the software tools used in the study. Shengwei Tian supervised the entire research process and reviewed the manuscript. All authors participated in the manuscript review and gave final approval.

Corresponding author

Correspondence to Shengwei Tian.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, B., Tian, S., Yu, L. et al. PointUltra: ultra-efficient mamba framework for transformative point cloud analysis. J Supercomput 81, 562 (2025). https://doi.org/10.1007/s11227-025-07066-4
