Abstract
Skeleton-based hand gesture recognition (SHGR) is a challenging task due to the complex articulated topology of the hand. Previous works often learn hand characteristics from a single observation viewpoint, disregarding the contextual information hidden in multiple viewpoints. To resolve this issue, we propose a novel multi-view hierarchical aggregation network for SHGR. First, two-dimensional non-uniform spatial sampling, a novel strategy that forms extrinsic parameter distributions of virtual cameras, is presented to enumerate viewpoints from which to observe hand skeletons. We then apply coordinate transformations to generate multi-view hand skeletons and employ a multi-branch convolutional neural network to extract multi-view features. Furthermore, we design a novel hierarchical aggregation network, comprising a hierarchical attention architecture and global context modeling, to fuse the multi-view features for final classification. Experiments on three benchmark datasets demonstrate that our method is competitive with state-of-the-art approaches.
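To make the viewpoint-generation step concrete, the sketch below shows how a 3D hand-skeleton sequence could be re-observed from several virtual cameras: viewpoints are drawn non-uniformly (here, from a Gaussian around the frontal view), each viewpoint defines an extrinsic rotation, and the joint coordinates are transformed into that camera's frame. This is a minimal illustration under assumed conventions; the azimuth/elevation parameterization, the Gaussian sampling, the 22-joint hand model, and the helper names rotation_matrix and multi_view_skeletons are all hypothetical and not the paper's exact implementation.

```python
import numpy as np

def rotation_matrix(azimuth, elevation):
    """Extrinsic rotation of a virtual camera, built from an azimuth
    (rotation about the y-axis) and an elevation (rotation about the
    x-axis), both in radians."""
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    rot_y = np.array([[ ca, 0.0,  sa],
                      [0.0, 1.0, 0.0],
                      [-sa, 0.0,  ca]])
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0,  ce, -se],
                      [0.0,  se,  ce]])
    return rot_x @ rot_y

def multi_view_skeletons(joints, viewpoints):
    """joints: (T, J, 3) array of 3D hand joints over T frames.
    viewpoints: iterable of (azimuth, elevation) pairs, one per virtual camera.
    Returns a (V, T, J, 3) array with one transformed sequence per viewpoint."""
    views = [joints @ rotation_matrix(az, el).T for az, el in viewpoints]
    return np.stack(views)

# Hypothetical non-uniform sampling: draw viewpoints from a Gaussian
# centred on the frontal view instead of spacing them evenly.
rng = np.random.default_rng(0)
viewpoints = np.stack([rng.normal(0.0, np.pi / 6, size=4),    # azimuths
                       rng.normal(0.0, np.pi / 12, size=4)],  # elevations
                      axis=1)
skeleton = rng.standard_normal((32, 22, 3))   # e.g. 32 frames, 22 hand joints
print(multi_view_skeletons(skeleton, viewpoints).shape)  # -> (4, 32, 22, 3)
```

Each transformed sequence would then feed one branch of the multi-branch CNN before the hierarchical aggregation stage fuses the per-view features.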
Acknowledgements
This work was supported by the Key Research and Development Program of Zhejiang Province under Grant 2022C01064.
Author information
Contributions
Shaochen Li: Conceptualization, Methodology, Writing—Original Draft. Zhenyu Liu: Methodology. Guifang Duan: Methodology, Writing—Review & Editing. Jianrong Tan: Writing—Review & Editing.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Liu, Z., Duan, G. et al. MVHANet: multi-view hierarchical aggregation network for skeleton-based hand gesture recognition. SIViP 17, 2521–2529 (2023). https://doi.org/10.1007/s11760-022-02469-9