Abstract
As the number of hearing-impaired people worldwide grows, sign language recognition (SLR) has attracted extensive attention from researchers. To address problems in current SLR research, such as growth in parameter count and unsatisfactory feature extraction, this paper proposes a novel skeleton-based method. The Asymmetric Multi-branch Graph Convolution Network (AM-GCN), composed of a spatial graph convolution and an Asymmetric Multi-branch Temporal Convolution (MTC), is constructed to acquire and process the graph structure. MTC uses multi-branch dilated convolution to expand the receptive field and strengthen information dependence. To extract discriminative spatiotemporal information effectively from a large amount of information, the Spatial and Temporal Fusion Attention (STFA) module is proposed. STFA maintains spatiotemporal consistency and produces a fused attention map, which substantially facilitates spatiotemporal feature learning. Asymmetric Convolution Channel Attention (ACCA) serves as the channel attention. Experiments on a processed dataset obtained from video transformation confirm the robustness of ACCA to image flipping and rotation. Together, STFA and ACCA form a spatial-temporal-channel attention module that extracts distinguishing features and enhances the model's representation. Finally, the attention module is inserted into AM-GCN to obtain AM-GCN-A, which is evaluated on the WLASL2000, AUTSL, and CSL datasets, where it attains top-1 accuracies of 57.01%, 96.27%, and 98.20%, respectively. These results are competitive with state-of-the-art methods and demonstrate the effectiveness of the model.
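As a rough illustration of why multi-branch dilated convolution widens the temporal receptive field, the sketch below runs one 1D dilated convolution per branch over a toy frame sequence. The kernel size, the dilation rates (1, 2, 3), and all function names are assumptions chosen for illustration; they are not taken from the paper or its repository, and the asymmetric branch design of MTC is not modelled here.

```python
def receptive_field(kernel_size, dilation):
    """Number of consecutive frames spanned by one dilated 1D convolution."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated convolution (cross-correlation) over a list of floats."""
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` frames apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def multi_branch(signal, kernel, dilations):
    """Hypothetical multi-branch layout: one branch per dilation rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    frames = [float(t) for t in range(10)]   # toy per-frame feature
    kernel = [1.0, 1.0, 1.0]                 # assumed kernel size of 3
    for d, out in multi_branch(frames, kernel, dilations=(1, 2, 3)).items():
        print(f"dilation={d} receptive_field={receptive_field(len(kernel), d)} output={out}")
```

With the same three-tap kernel, the branches cover 3, 5, and 7 frames respectively, so combining them mixes short- and long-range temporal context without adding kernel weights.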














Availability of data and materials
All datasets are publicly available and are cited in the references.
Notes
The main code for the AM-GCN architecture is available at https://github.com/LiuyhLinda/Asymmetric_Multi-branch_GCN-main_for_SLR.
References
Otoom M, Alzubaidi MA, Aloufee R (2022) Novel navigation assistive device for deaf drivers. Assist Technol 34(2):129–139
Kusters A (2021) International Sign and American Sign Language as different types of global deaf lingua francas. Sign Lang Stud 21(4):391–426
Cui R, Liu H, Zhang C (2017) Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7361–7369
Pu J, Zhou W, Li H (2019) Iterative alignment network for continuous sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 4165–4174
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423
Vazquez-Enriquez M, Alba-Castro JL, Docío-Fernández L, Rodriguez-Banga E (2021) Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3462–3471
Zhang Z (2012) Microsoft Kinect sensor and its effect. IEEE Multimed 19(2):4–10
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence
Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 12026–12035
Cheng K, Zhang Y, Cao C, Shi L, Cheng J, Lu H (2020) Decoupling gcn with dropgraph module for skeleton-based action recognition. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16. Springer, pp 536–553
Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W (2020) Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 143–152
Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785
Li D, Rodriguez C, Yu X, Li H (2020) Word-level deep sign language recognition from video: a new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 1459–1469
Sincan OM, Keles HY (2020) Autsl: a large scale multi-modal Turkish sign language dataset and baseline methods. IEEE Access 8:181340–181355
Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive hmm. In: 2016 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3d skeletons as points in a lie group. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 588–595
Badhe PC, Kulkarni V (2015) Indian sign language translator using gesture recognition algorithm. In: 2015 IEEE international conference on computer graphics, vision and information security (CGVIS). IEEE, pp 195–200
Cooper H, Ong E-J, Pugeault N, Bowden R (2012) Sign language recognition using sub-units. J Mach Learn Res 13:2205–2231
Huang J, Zhou W, Li H, Li W (2015) Sign language recognition using 3d convolutional neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6299–6308
Abeje BT, Salau AO, Mengistu AD, Tamiru NK (2022) Ethiopian sign language recognition using deep convolutional neural network. Multimed Tools Appl 81(20):29027–29043
Xiao Q, Zhao Y, Huan W (2019) Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network. Multimed Tools Appl 78:15335–15352
Han X, Lu F, Tian G (2022) Efficient 3d cnns with knowledge transfer for sign language recognition. Multimed Tools Appl 81(7):10071–10090
Pigou L, Van Den Oord A, Dieleman S, Van Herreweghe M, Dambre J (2018) Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int J Comput Vision 126:430–439
Liu T, Zhou W, Li H (2016) Sign language recognition with long short-term memory. In: 2016 IEEE international conference on image processing (ICIP). IEEE, pp 2871–2875
Koller O, Camgoz NC, Ney H, Bowden R (2019) Weakly supervised learning with multi-stream cnn-lstm-hmms to discover sequential parallelism in sign language videos. IEEE Trans Pattern Anal Mach Intell 42(9):2306–2320
Enireddy V, Anitha J, Mahendra N, Kishore G (2023) An optimized automated recognition of infant sign language using enhanced convolution neural network and deep lstm. Multimed Tools Appl 1–23
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Camgoz NC, Koller O, Hadfield S, Bowden R (2020) Sign language transformers: joint end-to-end sign language recognition and translation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 10023–10033
Saunders B, Camgoz NC, Bowden R (2020) Progressive transformers for end-to-end sign language production. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer, pp 687–705
Zhang Y, Wang J, Wang X, Jing H, Sun Z, Cai Y (2023) Static hand gesture recognition method based on the vision transformer. Multimed Tools Appl 1–20
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 597–600
Li C, Wang P, Wang S, Hou Y, Li W (2017) Skeleton-based action recognition using lstm and cnn. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 585–590
Shi L, Zhang Y, Cheng J, Lu H (2020) Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans Image Process 29:9532–9545
Amorim CC, Macêdo D, Zanchettin C (2019) Spatial-temporal graph convolutional networks for sign language recognition. In: Artificial neural networks and machine learning–ICANN 2019: workshop and special sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings 28. Springer, pp 646–657
Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785
Cheng K, Zhang Y, He X, Chen W, Cheng J, Lu H (2020) Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 183–192
Cheng K, Zhang Y, He X, Cheng J, Lu H (2021) Extremely lightweight skeleton-based action recognition with shiftgcn++. IEEE Trans Image Process 30:7333–7348
Tunga A, Nuthalapati SV, Wachs J (2021) Pose-based sign language recognition using gcn and bert. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 31–40
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423
Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Sign language recognition via skeleton-aware multi-model ensemble. arXiv:2110.06161
Lee H, Cho J, Kim I-j, Park U (2022) Distance-gcn for action recognition. In: Pattern recognition: 6th Asian conference, ACPR 2021, Jeju Island, South Korea, November 9–12, 2021, Revised Selected Papers, Part I. Springer, pp 170–181
Ke L, Peng K-C, Lyu S (2022) Towards to-at spatio-temporal focus for skeleton-based action recognition. Proceedings of the AAAI conference on artificial intelligence 36:1131–1139
Li R, Meng L (2022) Sign language recognition and translation network based on multi-view data. Appl Intell 52(13):14624–14638
Li C, Xie C, Zhang B, Han J, Zhen X, Chen J (2021) Memory attention networks for skeleton-based action recognition. IEEE Trans Neural Netw Learn Syst 33(9):4800–4814
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 1227–1236
Song Y-F, Zhang Z, Shan C, Wang L (2020) Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition. In: Proceedings of the 28th ACM international conference on multimedia. pp 1625–1633
Cho S, Maqbool M, Liu F, Foroosh H (2020) Self-attention network for skeleton-based human action recognition. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 635–644
Xu W, Ying J, Yang H, Liu J, Hu X (2022) Residual spatial graph convolution and temporal sequence attention network for sign language translation. Multimed Tools Appl 1–25
Song Y-F, Zhang Z, Shan C, Wang L (2022) Constructing stronger and faster baselines for skeleton-based action recognition. IEEE Trans Pattern Anal Mach Intell 45(2):1474–1488
Liu Y, Lu F, Cheng X, Yuan Y, Tian G (2022) Multi-stream gcn for sign language recognition based on asymmetric convolution channel attention. In: 2022 IEEE 17th conference on industrial electronics and applications (ICIEA). IEEE, pp 614–619
Selvaraj P, Nc G, Kumar P, Khapra M (2021) Openhands: making sign language recognition accessible with pose-based pretrained models across languages. arXiv:2110.05877
Song N (2022) Slgtformer: an attention-based approach to sign language recognition. arXiv:2212.10746
Maruyama M, Ghose S, Inoue K, Roy PP, Iwamura M, Yoshioka M (2021) Word-level sign language recognition with multi-stream neural networks focusing on local regions. arXiv:2106.15989
Hu H, Zhou W, Li H (2021) Hand-model-aware sign language recognition. Proceedings of the AAAI conference on artificial intelligence 35:1558–1566
Hu H, Zhao W, Zhou W, Wang Y, Li H (2021) Signbert: pre-training of hand-model-aware representation for sign language recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. pp 11087–11096
Al-Hammadi M, Bencherif MA, Alsulaiman M, Muhammad G, Mekhtiche MA, Abdul W, Alohali YA, Alrayes TS, Mathkour H, Faisal M et al (2022) Spatial attention-based 3d graph convolutional neural network for sign language recognition. Sensors 22(12):4558
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61973187).
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
About this article
Cite this article
Liu, Y., Lu, F., Cheng, X. et al. Asymmetric multi-branch GCN for skeleton-based sign language recognition. Multimed Tools Appl 83, 75293–75319 (2024). https://doi.org/10.1007/s11042-024-18443-1
DOI: https://doi.org/10.1007/s11042-024-18443-1