WalkFormer: 3D mesh analysis via transformer on random walk

  • Original Article
  • Neural Computing and Applications

Abstract

A 3D mesh is a popular representation of 3D shapes. A typical approach to mesh analysis maps the 3D mesh into 1D sequence data via random walk sampling. However, existing random walk-based approaches cannot make full use of attentive regions, which limits their capability for 3D shape analysis. In addition, existing methods process the random walk sequentially in its discovery order, which incurs computational overhead. In this paper, we propose a novel neural framework named WalkFormer, which applies a transformer to random walks to fully exploit the semantic information in a 3D mesh. First, we propose a transformer-based framework that learns semantic information from a random walk over a 3D mesh. Second, to capture the attentive regions of the random walk, our approach extends the multi-head self-attention mechanism to specific 3D mesh analysis tasks, and to establish long-range interactions between vertices in the walk, it adopts a novel relative position encoding module. Thus, both local and global information in the random walk can be captured and learned. Third, we observe that for 3D mesh analysis the sequential processing of the random walk is redundant: unlike previous random walk methods, our approach can be executed in a parallelized manner, which greatly improves computational efficiency. Extensive experiments demonstrate the effectiveness of the proposed method on typical 3D shape analysis tasks.
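
As a rough illustration of the first stage the abstract describes, the Python sketch below builds vertex adjacency from a mesh's triangle faces and samples a fixed-length random walk. This is a minimal sketch rather than the authors' implementation: the function names, the preference for unvisited neighbors, and the backtracking fallback are assumptions for illustration.

```python
# Illustrative random-walk sampling on a triangle mesh (not the authors'
# code): build vertex adjacency from faces, then walk the surface,
# preferring unvisited neighbors so the walk keeps exploring new regions.
import random
from collections import defaultdict

def build_adjacency(faces):
    """Map each vertex index to the set of its neighbors."""
    adj = defaultdict(set)
    for a, b, c in faces:          # each face is a triangle (i, j, k)
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj

def random_walk(adj, length, start=None):
    """Sample a walk of `length` vertices; fall back to already-visited
    neighbors when the walk gets stuck (a simple backtracking rule)."""
    current = start if start is not None else random.choice(list(adj))
    walk, visited = [current], {current}
    while len(walk) < length:
        neighbors = list(adj[current])
        unvisited = [v for v in neighbors if v not in visited]
        current = random.choice(unvisited or neighbors)
        walk.append(current)
        visited.add(current)
    return walk   # 1D vertex sequence, later embedded and fed to the transformer

# Example: two triangles sharing an edge
faces = [(0, 1, 2), (1, 2, 3)]
print(random_walk(build_adjacency(faces), length=6))
```

The abstract's other two ingredients, relative position encoding inside multi-head self-attention and parallel processing of the walk, can be sketched in the same spirit. The layer below adds a learned bias per head and per clipped relative offset, in the spirit of Shaw et al.'s relative position representations; the class name, tensor shapes, and clipping distance are illustrative assumptions, and the paper's actual module may differ.

```python
# Sketch of multi-head self-attention with a learned relative position
# bias over a walk of vertices. All names and the clipping distance
# `max_dist` are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=8, max_dist=32):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # one learned scalar bias per head and per clipped relative offset
        self.rel_bias = nn.Parameter(torch.zeros(num_heads, 2 * max_dist + 1))
        self.max_dist = max_dist

    def forward(self, x):                     # x: (batch, walk_len, dim)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, N, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # relative offsets i - j between walk positions, clipped to range
        idx = torch.arange(N, device=x.device)
        rel = (idx[:, None] - idx[None, :]).clamp(-self.max_dist, self.max_dist)
        attn = attn + self.rel_bias[:, rel + self.max_dist]  # broadcast over batch
        out = F.softmax(attn, dim=-1) @ v                    # all positions at once
        return self.proj(out.transpose(1, 2).reshape(B, N, D))
```

Because the bias depends only on the offset between walk positions, attention over the entire walk is computed in one batched matrix product rather than step by step, which is the parallel execution the abstract contrasts with order-by-order recurrent processing.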

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62072348 and by the Yunnan Province Major Science and Technology Special Plan Project No. 202202AF080004. The numerical calculations in this paper were performed on the supercomputing system at the Supercomputing Center of Wuhan University.

Author information

Corresponding author

Correspondence to Fazhi He.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Q., He, F., Fan, B. et al. WalkFormer: 3D mesh analysis via transformer on random walk. Neural Comput & Applic 36, 3499–3511 (2024). https://doi.org/10.1007/s00521-023-09279-1
