Abstract
A 3D mesh is a popular representation of 3D shapes. For mesh analysis tasks, one typical method is to map 3D mesh data into 1D sequence data with random walk sampling. However, existing random walk-based approaches cannot make full use of attentive regions, which limits the capability of 3D shape analysis. In addition, existing methods process the random walk as sequence data in the discovery order, which results in computational overhead. In this paper, we propose a novel neural framework named WalkFormer, which applies a transformer to a random walk to fully exploit semantic information in a 3D mesh. First, we propose a novel transformer-based framework to learn semantic information from a random walk of a 3D mesh. Second, to capture the attentive regions of the random walk, our approach extends the multi-head self-attention mechanism to specific 3D mesh analysis tasks. To establish the long-range interactions between vertices in the random walk, our approach adopts a novel relative position encoding module. Thus, the local–global information in the random walk can be obtained and learned in our approach. Third, we discover that for 3D mesh analysis, the sequential operations for the random walk sequence are redundant. Different from previous random walk methods, our approach can be executed in a parallelized manner, which greatly improves computational efficiency. Numerous experiments demonstrate the effectiveness of the proposed method on typical 3D shape analysis tasks.






Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Abouelaziz I, Chetouani A, El Hassouni M, Latecki LJ, Cherifi H (2020) 3d visual saliency and convolutional neural network for blind mesh quality assessment. Neural Comput Appl 32(21):16589–16603
Kwon S, Kim BC, Mun D, Han S (2015) Simplification of feature-based 3D CAD assembly data of ship and offshore equipment using quantitative evaluation metrics. Comput Aided Des 59:140–154
Lin B, Wang F, Zhao F, Sun Y (2018) Scale invariant point feature (SIPF) for 3D point clouds and 3d multi-scale object detection. Neural Comput Appl 29(5):1209–1224
Kim BC, Mun D (2014) Feature-based simplification of boundary representation models using sequential iterative volume decomposition. Comput Graph 38:97–107
Wang Y, Horvath I (2013) Computer-aided multi-scale materials and product design. Comput Aided Des 45(1):1–3
Rosen DW, Jeong N, Wang Y (2013) A method for reverse engineering of material microstructure for heterogeneous cad. Comput Aided Des 45(7):1068–1078
Wang W, Cai Y, Wang T (2022) Multi-view dual attention network for 3D object recognition. Neural Comput Appl 34(4):3201–3212
Hanocka R, Hertz A, Fish N, Giryes R, Fleishman S, Cohen-Or D (2019) Meshcnn: a network with an edge. ACM Trans Graph 38(4):1–12
Tang W, He F, Liu Y, Duan Y (2022) MATR: multimodal medical image fusion via multiscale adaptive transformer. IEEE Trans Image Process 31:5134–5149
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26
Zhao T, Chen Q, Kuang Z, Yu J, Zhang W, Fan J (2018) Deep mixture of diverse experts for large-scale visual recognition. IEEE Trans Pattern Anal Mach Intell 41(5):1072–1087
Zhang H, Sun Y, Liu L, Wang X, Li L, Liu W (2020) Clothingout: a category-supervised GAN model for clothing segmentation and retrieval. Neural Comput Appl 32(9):4519–4530
El-Bana S, Al-Kabbany A, Sharkas M (2020) A two-stage framework for automated malignant pulmonary nodule detection in CT scans. Diagnostics 10(3):131
Zhang H, Ji Y, Huang W, Liu L (2019) Sitcom-star-based clothing retrieval for video advertising: a deep learning framework. Neural Comput Appl 31(11):7361–7380
Hussein A, Elyan E, Gaber MM, Jayne C (2018) Deep imitation learning for 3D navigation tasks. Neural Comput Appl 29(7):389–404
El-Bana S, Al-Kabbany A, Sharkas M (2020) A multi-task pipeline with specialized streams for classification and segmentation of infection manifestations in covid-19 scans. PeerJ Comput Sci 6:303
Zhang X, Zhao W, Zhang W, Peng J, Fan J (2022) Guided filter network for semantic image segmentation. IEEE Trans Image Process 31:2695–2709
Wang S, Chen Z, You S, Wang B, Shen Y, Lei B (2022) Brain stroke lesion segmentation using consistent perception generative adversarial network. Neural Comput Appl 34(11):8657–8669
Wu H, He F, Duan Y, Yan X (2022) Perceptual metric-guided human image generation. Integr Comput Aided Eng 29(2):141–151
Hu S-M, Liu Z-N, Guo M-H, Cai J-X, Huang J, Mu T-J, Martin RR (2022) Subdivision-based mesh convolution networks. ACM Trans Graph 41(3):1–16
Lahav A, Tal A (2020) Meshwalker deep mesh understanding by random walks. ACM Trans Graph 39(6):1–13
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Izhak RB, Lahav A, Tal A (2022) Attwalk: attentive cross-walks for deep mesh analysis. In: 2022 IEEE/CVF winter conference on applications of computer vision (WACV), pp 2937–2946. IEEE
Mesika A, Ben-Shabat Y, Tal A (2022) Cloudwalker: random walks for 3D point cloud shape analysis. Comput Graph 106:110–118
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
Shaw P, Uszkoreit J, Vaswani A (2018) Self-attention with relative position representations. In: NAACL-HLT (2), pp 464–468
Ahmed E, Saint A, Shabayek AER, Cherenkova K, Das R, Gusev G, Aouada D, Ottersten B (2018) A survey on deep learning advances on different 3D data representations. arXiv preprint arXiv:1808.01462
Su H, Maji S, Kalogerakis E, Learned-Miller E (2015) Multi-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE international conference on computer vision, pp 945–953
Qi CR, Su H, Nießner M, Dai A, Yan M, Guibas LJ (2016) Volumetric and multi-view CNNs for object classification on 3d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5648–5656
Yang Y, Chen F, Wu F, Zeng D, Ji Y-M, Jing X-Y (2020) Multi-view semantic learning network for point cloud based 3D object detection. Neurocomputing 397:477–485
Qin P, Zhang C, Dang M (2022) Gvnet: Gaussian model with voxel-based 3d detection network for autonomous driving. Neural Comput Appl 34(9):6637–6645
Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652–660
Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. Adv Neural Inf Process Syst 30:5099–5108
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2019) Dynamic graph CNN for learning on point clouds. ACM Trans Graph 38(5):1–12
Wang H, Liu X, Kang W, Yan Z, Wang B, Ning Q (2022) Multi-features guidance network for partial-to-partial point cloud registration. Neural Comput Appl 34(2):1623–1634
Milano F, Loquercio A, Rosinol A, Scaramuzza D, Carlone L (2020) Primal-dual mesh convolutional neural networks. Adv Neural Inf Process Syst 33:952–963
Feng Y, Feng Y, You H, Zhao X, Gao Y (2019) Meshnet: mesh neural network for 3D shape representation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8279–8286
Yi L, Su H, Guo X, Guibas LJ (2017) Syncspeccnn: synchronized spectral CNN for 3D shape segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2282–2290
Kostrikov I, Jiang Z, Panozzo D, Zorin D, Bruna J (2018) Surface networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2540–2548
Such FP, Sah S, Dominguez MA, Pillai S, Zhang C, Michael A, Cahill ND, Ptucha R (2017) Robust spatial filtering with graph convolutional neural networks. IEEE J Sel Top Signal Process 11(6):884–896
Verma N, Boyer E, Verbeek J (2018) Feastnet: Feature-steered graph convolutions for 3d shape analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2598–2606
Lim I, Dielen A, Campen M, Kobbelt L (2018) A simple approach to intrinsic correspondence learning on unstructured 3d meshes. In: Proceedings of the European conference on computer vision (ECCV) workshops, vol. 11131, pp 349–362
Gong S, Chen L, Bronstein M, Zafeiriou S (2019) Spiralnet++: A fast and highly efficient mesh convolution operator. In: Proceedings of the IEEE/CVF international conference on computer vision workshops, pp 4141–4148
Chen Y, Zhao J, Shi C, Yuan D (2020) Mesh convolution: a novel feature extraction method for 3d nonrigid object classification. IEEE Trans Multimed 23:3098–3111
Schult J, Engelmann F, Kontogianni T, Leibe B (2020) Dualconvmesh-net: Joint geodesic and euclidean convolutions on 3d meshes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8612–8622
Lai YK, Hu SM, Martin RR, Rosin PL (2008) Fast mesh segmentation using random walks. In: Proceedings of the 2008 ACM symposium on solid and physical modeling, pp 183–191
Grady L (2006) Random walks for image segmentation. IEEE Trans Pattern Anal Mach Intell 28(11):1768–1783
Schneider L, Niemann A, Beuing O, Preim B, Saalfeld S (2021) Medmeshcnn-enabling meshcnn for medical surface models. Comput Methods Programs Biomed 210:106372
Liu H-TD, Kim VG, Chaudhuri S, Aigerman N, Jacobson A (2020) Neural subdivision. ACM Trans Graph 39(4):124
Guo K, Zou D, Chen X (2015) 3d mesh labeling via deep convolutional neural networks. ACM Trans Graph 35(1):1–12
Singh VV, Sheshappanavar SV, Kambhamettu C (2021) Meshnet++: A network with a face. In: Proceedings of the 29th ACM international conference on multimedia, pp 4883–4891
Wang Y, Xie Y, Fan L, Hu G (2022) Stmg: Swin transformer for multi-label image recognition with graph convolution network. Neural Comput Appl 34(12):10051–10063
Kalyan KS, Rajasekharan A, Sangeetha S (2021) Ammus: a survey of transformer-based pretrained models in natural language processing. arXiv preprint arXiv:2108.05542
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly Sa (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations
Guo M-H, Cai J-X, Liu Z-N, Mu T-J, Martin RR, Hu S-M (2021) Pct: Point cloud transformer. Comput Vis Media 7(2):187–199
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision, pp 213–229
Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) Segformer: simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077–12090
Lin K, Wang L, Liu Z (2021) End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1954–1963
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J (2015) 3d shapenets: A deep representation for volumetric shapes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1912–1920
Lian Z, Godil A, Bustos B, Daoudi M, Hermans J, Kawamura S, Kurita Y, Lavoué G, Van Nguyen H, Ohbuchi R (2011) Shrec’11 track: Shape retrieval on non-rigid 3d watertight meshes. In: 3DOR@ Eurographics, pp 79–88
Wang Y, Asafi S, Van Kaick O, Zhang H, Cohen-Or D, Chen B (2012) Active co-analysis of a set of shapes. ACM Trans Graph 31(6):1–10
Maron H, Galun M, Aigerman N, Trope M, Dym N, Yumer E, Kim VG, Lipman Y (2017) Convolutional neural networks on surfaces via seamless toric covers. ACM Trans Graph 36(4):1–10
Sharp N, Attaiki S, Crane K, Ovsjanikov M (2022) Diffusionnet: discretization agnostic learning on surfaces. ACM Trans Graph 41(3):1–16
Smirnov D, Solomon J (2021) Hodgenet: learning spectral geometry on triangle meshes. ACM Trans Graph 40(4):1–11
Ezuz D, Solomon J, Kim VG, Ben-Chen M (2017) Gwcnn: a metric alignment layer for deep shape analysis. Comput Graph Forum 36(5):49–57
Haim N, Segol N, Ben-Hamu H, Maron H, Lipman Y (2019) Surface networks via general covers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 632–641
Anguelov D, Srinivasan P, Koller D, Thrun S, Rodgers J, Davis J (2005) Scape: shape completion and animation of people. In: ACM SIGGRAPH 2005 papers, pp 408–416
Bogo F, Romero J, Loper M, Black MJ (2014) Faust: dataset and evaluation for 3dD mesh registration. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3794–3801
Vlasic D, Baran I, Matusik W, Popović J (2008) Articulated mesh animation from multi-view silhouettes. In: ACM SIGGRAPH 2008 Papers, pp 1–9
Adobe (2016) Adobe fuse 3D characters. https://www.mixamo.com
Li Y, Bu R, Sun M, Wu W, Di X, Chen B (2018) Pointcnn: convolution on x-transformed points. Adv Neural Inf Process Syst 31:828–838
Su J, Lu Y, Pan S, Wen B, Liu Y (2021) Roformer: enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62072348 and China Yunnan province major science and technology special plan project No. 202202AF080004. The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, Q., He, F., Fan, B. et al. WalkFormer: 3D mesh analysis via transformer on random walk. Neural Comput & Applic 36, 3499–3511 (2024). https://doi.org/10.1007/s00521-023-09279-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-023-09279-1