Abstract
Vision-based sign language translation (SLT) has, to some extent, narrowed the communication gap between deaf and hearing people. SLT faces two main obstacles: first, when capturing sign language motion features, it is difficult to overcome shortcomings such as redundant information in gesture features and motion ambiguity; second, when processing sentence-level sign language videos, it is hard to define the alignment between action sequences and lexical sequences. To address these problems, this paper proposes a sign language translation method based on a residual spatial graph convolution network (Res-SGCN) and a temporal attention model. The Res-SGCN module captures the spatial interaction features between sign language skeleton nodes, and the temporal attention network then fuses the temporal information of the spatial feature sequence and aligns it with the predicted vocabulary for translation. Experiments on a public dataset show that the proposed model achieves a word error rate (WER) of 4.17%, outperforming other state-of-the-art sign language translation methods.
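The paper's exact layer definitions are not reproduced in this excerpt, but the three ideas the abstract names can be illustrated concretely. Below is a minimal NumPy sketch, under assumed shapes and a toy 3-joint skeleton: a residual spatial graph convolution step (propagate joint features along skeleton edges, then add the input back through an identity shortcut), a simple temporal attention pooling over frame embeddings, and the word error rate metric used for evaluation. All function names, dimensions, and the adjacency matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, a common
    # choice for graph convolutions over skeleton graphs.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def res_sgcn_layer(H, A_norm, W):
    # Residual spatial graph convolution: mix each joint's features
    # with its skeleton neighbours, then add the input back
    # (identity shortcut; assumes input and output dims match).
    out = np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
    return H + out

def temporal_attention(X, w):
    # Score each frame, softmax over time, return the weighted sum
    # (context vector) and the attention weights.
    scores = X @ w
    scores = scores - scores.max()         # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ X, alpha

def wer(ref, hyp):
    # Word error rate: word-level Levenshtein distance between the
    # reference and hypothesis, divided by the reference length.
    n, m = len(ref), len(hyp)
    dp = list(range(m + 1))                # one-row DP table
    for i in range(1, n + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, m + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,         # deletion
                        dp[j - 1] + 1,     # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[m] / n

# Toy demo: a 3-joint chain skeleton, 10-frame sequence.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_norm = normalize_adjacency(A)
H = rng.standard_normal((3, 8))            # 3 joints x 8 feature dims
W = rng.standard_normal((8, 8)) * 0.1
H2 = res_sgcn_layer(H, A_norm, W)          # same shape as H

seq = rng.standard_normal((10, 8))         # 10 per-frame embeddings
context, alpha = temporal_attention(seq, rng.standard_normal(8))
```

As a sanity check on the metric, `wer("my name is sam".split(), "my name sam".split())` counts one deleted word out of four reference words, i.e. 25%.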
Data Availability statements
All data included in this study are available upon request from the corresponding author. The datasets generated and/or analysed during the current study are available in the CCSL repository: http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62172280. The authors thank the CAS Key Laboratory of Technology in Geospatial Information Processing and Application System (GIPAS), University of Science and Technology of China (USTC), for releasing the CCSL database.
Ethics declarations
Conflict of Interests
No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, W., Ying, J., Yang, H. et al. Residual spatial graph convolution and temporal sequence attention network for sign language translation. Multimed Tools Appl 82, 23483–23507 (2023). https://doi.org/10.1007/s11042-022-14172-5