Asymmetric multi-branch GCN for skeleton-based sign language recognition

Liu, Yuhong; Lu, Fei; Cheng, Xianpeng; Yuan, Ying

doi:10.1007/s11042-024-18443-1

Published: 15 February 2024

Volume 83, pages 75293–75319, (2024)

Multimedia Tools and Applications

Yuhong Liu¹,
Fei Lu ORCID: orcid.org/0000-0002-7286-6714¹,
Xianpeng Cheng¹ &
…
Ying Yuan¹

Abstract

As the number of hearing-impaired people worldwide grows, sign language recognition (SLR) has attracted extensive attention from researchers. To address problems in current SLR research, such as parameter expansion and unsatisfactory feature extraction, this paper proposes a novel skeleton-based method. The Asymmetric Multi-branch Graph Convolution Network (AM-GCN), composed of a spatial graph convolution and an Asymmetric Multi-branch Temporal Convolution (MTC), is constructed to acquire and process the graph structure. MTC uses multi-branch dilated convolution to expand the receptive field and strengthen information dependence. To extract discriminative spatiotemporal information from a large amount of information, a Spatial and Temporal Fusion Attention (STFA) module is proposed. STFA maintains spatiotemporal consistency and produces a fused attention map, which substantially facilitates spatiotemporal feature learning. Asymmetric Convolution Channel Attention (ACCA) serves as the channel attention. Experiments on a processed dataset obtained by transforming videos confirm the robustness of ACCA to image flipping and rotation. Together, STFA and ACCA form a spatial-temporal-channel attention module that extracts distinguishing features and enhances the model's representation. Finally, the attention module is inserted into AM-GCN to obtain AM-GCN-A, which is evaluated on the WLASL2000, AUTSL, and CSL datasets, where it reaches top-1 accuracies of 57.01%, 96.27%, and 98.20%, respectively. The results are competitive with state-of-the-art methods and demonstrate the effectiveness of the model.
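
The claim that multi-branch dilated convolution expands the receptive field can be illustrated with a small sketch. It is not the authors' MTC implementation: the function names, the shared kernel weights, the dilation rates, and the choice to sum the branch outputs are all assumptions made for illustration. A kernel of size k with dilation d spans d(k - 1) + 1 frames, so running several dilations in parallel covers several temporal scales at once.

```python
from typing import List, Sequence


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D dilated cross-correlation with zero padding.

    Hypothetical helper for illustration; expects an odd kernel size.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = (k // 2) * dilation  # distance from the centre tap to the edge tap
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - half + j * dilation
            if 0 <= idx < len(x):  # taps outside the sequence read zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_branch_temporal(x: Sequence[float], weights: Sequence[float],
                          dilations: Sequence[int]) -> List[float]:
    """Run one branch per dilation and sum the outputs.

    Summation is an assumption here; a real model may fuse branches differently.
    """
    branches = [dilated_conv1d(x, weights, d) for d in dilations]
    return [sum(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4  # single active frame at index 4
    for d in (1, 2, 3):
        print("dilation", d, "spans", effective_kernel(3, d), "frames")
    print(multi_branch_temporal(impulse, [1.0, 1.0, 1.0], (1, 2)))
```

Feeding a single active frame through the branches shows which neighbouring frames each dilation reaches: the larger the dilation, the farther the response spreads for the same number of weights.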

Availability of data and materials

All datasets are publicly available and are cited in the references.

Notes

The main code for the AM-GCN structure is available at https://github.com/LiuyhLinda/Asymmetric_Multi-branch_GCN-main_for_SLR.

References

Otoom M, Alzubaidi MA, Aloufee R (2022) Novel navigation assistive device for deaf drivers. Assist Technol 34(2):129–139
Kusters A (2021) International sign and american sign language as different types of global deaf lingua francas. Sign Lang Stud 21(4):391–426
Cui R, Liu H, Zhang C (2017) Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7361–7369
Pu J, Zhou W, Li H (2019) Iterative alignment network for continuous sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 4165–4174
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423
Vazquez-Enriquez M, Alba-Castro JL, Docío-Fernández L, Rodriguez-Banga E (2021) Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3462–3471
Zhang Z (2012) Microsoft kinect sensor and its effect. IEEE Multimed 19(2):4–10
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence
Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 12026–12035
Cheng K, Zhang Y, Cao C, Shi L, Cheng J, Lu H (2020) Decoupling gcn with dropgraph module for skeleton-based action recognition. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16. Springer, pp 536–553
Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W (2020) Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 143–152
Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785
Vazquez-Enriquez M, Alba-Castro JL, Docío-Fernández L, Rodriguez-Banga E (2021) Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3462–3471
Li D, Rodriguez C, Yu X, Li H (2020) Word-level deep sign language recognition from video: a new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 1459–1469
Sincan OM, Keles HY (2020) Autsl: a large scale multi-modal Turkish sign language dataset and baseline methods. IEEE Access 8:181340–181355
Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive hmm. In: 2016 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3d skeletons as points in a lie group. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 588–595
Badhe PC, Kulkarni V (2015) Indian sign language translator using gesture recognition algorithm. In: 2015 IEEE international conference on computer graphics, vision and information security (CGVIS). IEEE, pp 195–200
Cooper H, Ong E-J, Pugeault N, Bowden R (2012) Sign language recognition using sub-units. J Mach Learn Res 13:2205–2231
Huang J, Zhou W, Li H, Li W (2015) Sign language recognition using 3d convolutional neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6299–6308
Abeje BT, Salau AO, Mengistu AD, Tamiru NK (2022) Ethiopian sign language recognition using deep convolutional neural network. Multimed Tools Appl 81(20):29027–29043
Xiao Q, Zhao Y, Huan W (2019) Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network. Multimed Tools Appl 78:15335–15352
Han X, Lu F, Tian G (2022) Efficient 3d cnns with knowledge transfer for sign language recognition. Multimed Tools Appl 81(7):10071–10090
Pigou L, Van Den Oord A, Dieleman S, Van Herreweghe M, Dambre J (2018) Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int J Comput Vision 126:430–439
Liu T, Zhou W, Li H (2016) Sign language recognition with long short-term memory. In: 2016 IEEE international conference on image processing (ICIP). IEEE, pp 2871–2875
Koller O, Camgoz NC, Ney H, Bowden R (2019) Weakly supervised learning with multi-stream cnn-lstm-hmms to discover sequential parallelism in sign language videos. IEEE Trans Pattern Anal Mach Intell 42(9):2306–2320
Enireddy V, Anitha J, Mahendra N, Kishore G (2023) An optimized automated recognition of infant sign language using enhanced convolution neural network and deep lstm. Multimed Tools Appl 1–23
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Camgoz NC, Koller O, Hadfield S, Bowden R (2020) Sign language transformers: joint end-to-end sign language recognition and translation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 10023–10033
Saunders B, Camgoz NC, Bowden R (2020) Progressive transformers for end-to-end sign language production. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer, pp 687–705
Zhang Y, Wang J, Wang X, Jing H, Sun Z, Cai Y (2023) Static hand gesture recognition method based on the vision transformer. Multimed Tools Appl 1–20
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 597–600
Li C, Wang P, Wang S, Hou Y, Li W (2017) Skeleton-based action recognition using lstm and cnn. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 585–590
Shi L, Zhang Y, Cheng J, Lu H (2020) Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans Image Process 29:9532–9545
Amorim CC, Macêdo D, Zanchettin C (2019) Spatial-temporal graph convolutional networks for sign language recognition. In: Artificial neural networks and machine learning–ICANN 2019: workshop and special sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings 28. Springer, pp 646–657
Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785
Cheng K, Zhang Y, He X, Chen W, Cheng J, Lu H (2020) Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 183–192
Cheng K, Zhang Y, He X, Cheng J, Lu H (2021) Extremely lightweight skeleton-based action recognition with shiftgcn++. IEEE Trans Image Process 30:7333–7348
Tunga A, Nuthalapati SV, Wachs J (2021) Pose-based sign language recognition using gcn and bert. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 31–40
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Sign language recognition via skeleton-aware multi-model ensemble. arXiv:2110.06161
Lee H, Cho J, Kim I-j, Park U (2022) Distance-gcn for action recognition. Pattern Recognition: 6th Asian Conference, ACPR 2021, Jeju Island, South Korea, November 9–12, 2021. Revised Selected Papers, Part I. Springer, pp 170–181
Ke L, Peng K-C, Lyu S (2022) Towards to-at spatio-temporal focus for skeleton-based action recognition. Proceedings of the AAAI conference on artificial intelligence 36:1131–1139
Li R, Meng L (2022) Sign language recognition and translation network based on multi-view data. Appl Intell 52(13):14624–14638
Li C, Xie C, Zhang B, Han J, Zhen X, Chen J (2021) Memory attention networks for skeleton-based action recognition. IEEE Trans Neural Netw Learn Syst 33(9):4800–4814
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 1227–1236
Song Y-F, Zhang Z, Shan C, Wang L (2020) Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition. In: Proceedings of the 28th ACM international conference on multimedia. pp 1625–1633
Cho S, Maqbool M, Liu F, Foroosh H (2020) Self-attention network for skeleton-based human action recognition. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 635–644
Xu W, Ying J, Yang H, Liu J, Hu X (2022) Residual spatial graph convolution and temporal sequence attention network for sign language translation. Multimed Tools Appl 1–25
Song Y-F, Zhang Z, Shan C, Wang L (2022) Constructing stronger and faster baselines for skeleton-based action recognition. IEEE Trans Pattern Anal Mach Intell 45(2):1474–1488
Liu Y, Lu F, Cheng X, Yuan Y, Tian G (2022) Multi-stream gcn for sign language recognition based on asymmetric convolution channel attention. In: 2022 IEEE 17th conference on industrial electronics and applications (ICIEA). IEEE, pp 614–619
Selvaraj P, Nc G, Kumar P, Khapra M (2021) Openhands: making sign language recognition accessible with pose-based pretrained models across languages. arXiv:2110.05877
Song N (2022) Slgtformer: an attention-based approach to sign language recognition. arXiv:2212.10746
Maruyama M, Ghose S, Inoue K, Roy PP, Iwamura M, Yoshioka M (2021) Word-level sign language recognition with multi-stream neural networks focusing on local regions. arXiv:2106.15989
Hu H, Zhou W, Li H (2021) Hand-model-aware sign language recognition. Proceedings of the AAAI conference on artificial intelligence 35:1558–1566
Hu H, Zhao W, Zhou W, Wang Y, Li H (2021) Signbert: pre-training of hand-model-aware representation for sign language recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. pp 11087–11096
Al-Hammadi M, Bencherif MA, Alsulaiman M, Muhammad G, Mekhtiche MA, Abdul W, Alohali YA, Alrayes TS, Mathkour H, Faisal M et al (2022) Spatial attention-based 3d graph convolutional neural network for sign language recognition. Sensors 22(12):4558

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61973187).

Author information

Authors and Affiliations

School of Control Science and Engineering, Shandong University, Qianfoshan Sub-district, Jinan, 250061, Shandong, China
Yuhong Liu, Fei Lu, Xianpeng Cheng & Ying Yuan

Corresponding author

Correspondence to Fei Lu.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Lu, F., Cheng, X. et al. Asymmetric multi-branch GCN for skeleton-based sign language recognition. Multimed Tools Appl 83, 75293–75319 (2024). https://doi.org/10.1007/s11042-024-18443-1

Received: 23 February 2023
Revised: 25 December 2023
Accepted: 24 January 2024
Published: 15 February 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s11042-024-18443-1

Part of a collection:

Track 6: Computer Vision for Multimedia Applications
