Asymmetric multi-branch GCN for skeleton-based sign language recognition

Multimedia Tools and Applications

Abstract

With the growing number of hearing-impaired people worldwide, sign language recognition (SLR) has attracted extensive attention from researchers. To address problems in current SLR research, such as parameter growth and inadequate feature extraction, this paper proposes a novel skeleton-based method. The Asymmetric Multi-branch Graph Convolution Network (AM-GCN), composed of a spatial graph convolution and an Asymmetric Multi-branch Temporal Convolution (MTC), is constructed to acquire and process the graph structure. MTC uses multi-branch dilated convolution to enlarge the receptive field and strengthen temporal dependencies. To extract discriminative spatiotemporal information effectively from a large volume of data, a Spatial and Temporal Fusion Attention module (STFA) is proposed. STFA maintains spatiotemporal consistency and produces a fused attention map, which substantially facilitates spatiotemporal feature learning. Asymmetric Convolution Channel Attention (ACCA) serves as the channel attention. Experiments on a processed dataset obtained by transforming the videos confirm the robustness of ACCA to image flipping and rotation. Together, STFA and ACCA form a spatial-temporal-channel attention module that extracts discriminative features and enhances the model's representations. Finally, this attention module is inserted into AM-GCN to obtain AM-GCN-A, which is evaluated on the WLASL2000, AUTSL, and CSL datasets, achieving top-1 accuracies of 57.01%, 96.27%, and 98.20%, respectively. These results are competitive with state-of-the-art methods and demonstrate the effectiveness of the model.
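The abstract describes AM-GCN as a spatial graph convolution followed by a multi-branch dilated temporal convolution (MTC). The sketch below is a minimal, dependency-free illustration of that general pattern, not the authors' implementation (their code is in the repository linked in the Notes). It assumes one scalar feature per joint, a spatial weight fixed to 1, the symmetric adjacency normalization of Kipf and Welling [8], a single kernel shared by all branches, and averaging of the branch outputs; the asymmetric branch design and the STFA/ACCA attention modules are not modelled. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop keeps each joint's own feature
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(frame: List[float], a_hat: Matrix) -> List[float]:
    """Mix one scalar feature per joint with its neighbours (learned weight fixed to 1)."""
    return [sum(w * val for w, val in zip(row, frame)) for row in a_hat]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of consecutive frames covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(seq: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Dilated 1D cross-correlation over time with zero 'same' padding."""
    pad_left = (receptive_field(len(kernel), dilation) - 1) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad_left + i * dilation
            if 0 <= idx < len(seq):  # frames outside the clip count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def multi_branch_temporal_conv(seq: List[float], kernel: List[float],
                               dilations: Sequence[int]) -> List[float]:
    """Run one branch per dilation rate and average the branch outputs."""
    branches = [dilated_conv1d(seq, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def am_gcn_block_sketch(x: Matrix, a_hat: Matrix, kernel: List[float],
                        dilations: Sequence[int]) -> Matrix:
    """x is indexed [frame][joint]: spatial step per frame, then temporal step per joint."""
    spatial = [spatial_graph_conv(frame, a_hat) for frame in x]
    per_joint = [
        multi_branch_temporal_conv([frame[v] for frame in spatial], kernel, dilations)
        for v in range(len(a_hat))
    ]
    return [list(row) for row in zip(*per_joint)]  # transpose back to [frame][joint]


if __name__ == "__main__":
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])  # toy 3-joint chain
    clip = [[0.0, 0.0, 0.0] for _ in range(11)]
    clip[5][0] = 1.0  # a single "event" at frame 5 on joint 0
    for row in am_gcn_block_sketch(clip, a_hat, [1.0, 1.0, 1.0], [1, 2]):
        print([round(v, 3) for v in row])
```

With a kernel of size 3, the dilation-2 branch spans 5 frames while the dilation-1 branch spans 3, so combining branches widens the temporal context without adding kernel taps. This is the receptive-field argument the abstract makes for MTC; the real network learns separate weights per branch and channel.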

Availability of data and materials

All datasets are publicly available; their sources are given in the references.

Notes

  1. The main code for the AM-GCN architecture is available at https://github.com/LiuyhLinda/Asymmetric_Multi-branch_GCN-main_for_SLR.

References

  1. Otoom M, Alzubaidi MA, Aloufee R (2022) Novel navigation assistive device for deaf drivers. Assist Technol 34(2):129–139

  2. Kusters A (2021) International Sign and American Sign Language as different types of global deaf lingua francas. Sign Lang Stud 21(4):391–426

  3. Cui R, Liu H, Zhang C (2017) Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7361–7369

  4. Pu J, Zhou W, Li H (2019) Iterative alignment network for continuous sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 4165–4174

  5. Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423

  6. Vazquez-Enriquez M, Alba-Castro JL, Docío-Fernández L, Rodriguez-Banga E (2021) Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3462–3471

  7. Zhang Z (2012) Microsoft Kinect sensor and its effect. IEEE Multimed 19(2):4–10

  8. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907

  9. Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence

  10. Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 12026–12035

  11. Cheng K, Zhang Y, Cao C, Shi L, Cheng J, Lu H (2020) Decoupling GCN with DropGraph module for skeleton-based action recognition. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16. Springer, pp 536–553

  12. Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W (2020) Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 143–152

  13. Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785

  14. Vazquez-Enriquez M, Alba-Castro JL, Docío-Fernández L, Rodriguez-Banga E (2021) Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3462–3471

  15. Li D, Rodriguez C, Yu X, Li H (2020) Word-level deep sign language recognition from video: a new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 1459–1469

  16. Sincan OM, Keles HY (2020) AUTSL: a large scale multi-modal Turkish sign language dataset and baseline methods. IEEE Access 8:181340–181355

  17. Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive HMM. In: 2016 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  18. Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3D skeletons as points in a lie group. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 588–595

  19. Badhe PC, Kulkarni V (2015) Indian sign language translator using gesture recognition algorithm. In: 2015 IEEE international conference on computer graphics, vision and information security (CGVIS). IEEE, pp 195–200

  20. Cooper H, Ong E-J, Pugeault N, Bowden R (2012) Sign language recognition using sub-units. J Mach Learn Res 13:2205–2231

  21. Huang J, Zhou W, Li H, Li W (2015) Sign language recognition using 3D convolutional neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  22. Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6299–6308

  23. Abeje BT, Salau AO, Mengistu AD, Tamiru NK (2022) Ethiopian sign language recognition using deep convolutional neural network. Multimed Tools Appl 81(20):29027–29043

  24. Xiao Q, Zhao Y, Huan W (2019) Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network. Multimed Tools Appl 78:15335–15352

  25. Han X, Lu F, Tian G (2022) Efficient 3D CNNs with knowledge transfer for sign language recognition. Multimed Tools Appl 81(7):10071–10090

  26. Pigou L, Van Den Oord A, Dieleman S, Van Herreweghe M, Dambre J (2018) Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int J Comput Vision 126:430–439

  27. Liu T, Zhou W, Li H (2016) Sign language recognition with long short-term memory. In: 2016 IEEE international conference on image processing (ICIP). IEEE, pp 2871–2875

  28. Koller O, Camgoz NC, Ney H, Bowden R (2019) Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos. IEEE Trans Pattern Anal Mach Intell 42(9):2306–2320

  29. Enireddy V, Anitha J, Mahendra N, Kishore G (2023) An optimized automated recognition of infant sign language using enhanced convolution neural network and deep LSTM. Multimed Tools Appl 1–23

  30. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  31. Camgoz NC, Koller O, Hadfield S, Bowden R (2020) Sign language transformers: joint end-to-end sign language recognition and translation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 10023–10033

  32. Saunders B, Camgoz NC, Bowden R (2020) Progressive transformers for end-to-end sign language production. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer, pp 687–705

  33. Zhang Y, Wang J, Wang X, Jing H, Sun Z, Cai Y (2023) Static hand gesture recognition method based on the vision transformer. Multimed Tools Appl 1–20

  34. Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 597–600

  35. Li C, Wang P, Wang S, Hou Y, Li W (2017) Skeleton-based action recognition using LSTM and CNN. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 585–590

  36. Shi L, Zhang Y, Cheng J, Lu H (2020) Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans Image Process 29:9532–9545

  37. Amorim CC, Macêdo D, Zanchettin C (2019) Spatial-temporal graph convolutional networks for sign language recognition. In: Artificial neural networks and machine learning–ICANN 2019: workshop and special sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings 28. Springer, pp 646–657

  38. Liang W, Xu X (2021) Skeleton-based sign language recognition with attention-enhanced graph convolutional networks. In: Natural language processing and Chinese computing: 10th CCF international conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part I 10. Springer, pp 773–785

  39. Cheng K, Zhang Y, He X, Chen W, Cheng J, Lu H (2020) Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 183–192

  40. Cheng K, Zhang Y, He X, Cheng J, Lu H (2021) Extremely lightweight skeleton-based action recognition with ShiftGCN++. IEEE Trans Image Process 30:7333–7348

  41. Tunga A, Nuthalapati SV, Wachs J (2021) Pose-based sign language recognition using GCN and BERT. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 31–40

  42. Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Skeleton aware multi-modal sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 3413–3423

  43. Jiang S, Sun B, Wang L, Bai Y, Li K, Fu Y (2021) Sign language recognition via skeleton-aware multi-model ensemble. arXiv:2110.06161

  44. Lee H, Cho J, Kim I-j, Park U (2022) Distance-GCN for action recognition. In: Pattern recognition: 6th Asian conference, ACPR 2021, Jeju Island, South Korea, November 9–12, 2021, Revised selected papers, Part I. Springer, pp 170–181

  45. Ke L, Peng K-C, Lyu S (2022) Towards to-a-T spatio-temporal focus for skeleton-based action recognition. Proceedings of the AAAI conference on artificial intelligence 36:1131–1139

  46. Li R, Meng L (2022) Sign language recognition and translation network based on multi-view data. Appl Intell 52(13):14624–14638

  47. Li C, Xie C, Zhang B, Han J, Zhen X, Chen J (2021) Memory attention networks for skeleton-based action recognition. IEEE Trans Neural Netw Learn Syst 33(9):4800–4814

  48. Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 1227–1236

  49. Song Y-F, Zhang Z, Shan C, Wang L (2020) Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition. In: Proceedings of the 28th ACM international conference on multimedia. pp 1625–1633

  50. Cho S, Maqbool M, Liu F, Foroosh H (2020) Self-attention network for skeleton-based human action recognition. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp 635–644

  51. Xu W, Ying J, Yang H, Liu J, Hu X (2022) Residual spatial graph convolution and temporal sequence attention network for sign language translation. Multimed Tools Appl 1–25

  52. Song Y-F, Zhang Z, Shan C, Wang L (2022) Constructing stronger and faster baselines for skeleton-based action recognition. IEEE Trans Pattern Anal Mach Intell 45(2):1474–1488

  53. Liu Y, Lu F, Cheng X, Yuan Y, Tian G (2022) Multi-stream GCN for sign language recognition based on asymmetric convolution channel attention. In: 2022 IEEE 17th conference on industrial electronics and applications (ICIEA). IEEE, pp 614–619

  54. Selvaraj P, Nc G, Kumar P, Khapra M (2021) OpenHands: making sign language recognition accessible with pose-based pretrained models across languages. arXiv:2110.05877

  55. Song N (2022) SLGTformer: an attention-based approach to sign language recognition. arXiv:2212.10746

  56. Maruyama M, Ghose S, Inoue K, Roy PP, Iwamura M, Yoshioka M (2021) Word-level sign language recognition with multi-stream neural networks focusing on local regions. arXiv:2106.15989

  57. Hu H, Zhou W, Li H (2021) Hand-model-aware sign language recognition. Proceedings of the AAAI conference on artificial intelligence 35:1558–1566

  58. Hu H, Zhao W, Zhou W, Wang Y, Li H (2021) SignBERT: pre-training of hand-model-aware representation for sign language recognition. In: Proceedings of the IEEE/CVF international conference on computer vision. pp 11087–11096

  59. Al-Hammadi M, Bencherif MA, Alsulaiman M, Muhammad G, Mekhtiche MA, Abdul W, Alohali YA, Alrayes TS, Mathkour H, Faisal M et al (2022) Spatial attention-based 3D graph convolutional neural network for sign language recognition. Sensors 22(12):4558

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61973187).

Author information

Corresponding author

Correspondence to Fei Lu.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Lu, F., Cheng, X. et al. Asymmetric multi-branch GCN for skeleton-based sign language recognition. Multimed Tools Appl 83, 75293–75319 (2024). https://doi.org/10.1007/s11042-024-18443-1

  • DOI: https://doi.org/10.1007/s11042-024-18443-1

Keywords