Sentence2SignGesture: a hybrid neural machine translation network for sign language video generation

Natarajan, B.; Elakkiya, R.; Prasad, Moturi Leela

doi:10.1007/s12652-021-03640-9

Original Research
Published: 26 January 2022

Journal of Ambient Intelligence and Humanized Computing, volume 14, pages 9807–9821 (2023)

Abstract

Neural Machine Translation (NMT) systems have attained a prominent position in language translation tasks. However, they face major challenges in translating new and out-of-vocabulary words. This problem is a major drawback of conventional NMT systems and results in more copied outputs in the translation. In addition, it makes it harder to understand multilingual language structures and word relationships. In this paper, we propose a novel NMT system based on a deep stacked GRU algorithm that addresses these challenges and handles the translation of multilingual sentences efficiently. We developed the proposed model to translate spoken sentences into sign words. The generated sign words (glosses) are mapped to sign gesture images to automate sign gesture video generation using deep generative models. The proposed hybrid NMT model has been evaluated qualitatively and quantitatively on several benchmark sign language datasets. Its improved BLEU score shows that our model outperforms earlier approaches. We also evaluated the proposed model on our self-created Indian sign language corpus (ISL-CSLTR). The final results show better translation quality at minimal processing cost.
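
As a rough illustration of two steps named in the abstract, the sketch below scores a predicted gloss sequence against a reference with a plain, unsmoothed, single-reference BLEU and then looks up one image per gloss. It is not the authors' implementation: the abstract does not state which BLEU variant was used or how glosses are stored, so the function names, the gloss table, and the file paths are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU: geometric mean of clipped n-gram
    precisions for n = 1..max_n, scaled by the brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precision_sum += math.log(matched / total)
    # The brevity penalty punishes candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    return bp * math.exp(log_precision_sum / max_n)


def glosses_to_frames(glosses, gloss_images):
    """Look up the image path for each gloss; unknown glosses are skipped."""
    return [gloss_images[g] for g in glosses if g in gloss_images]


# Hypothetical gloss-to-image table; names and paths are illustrative only.
GLOSS_IMAGES = {
    "I": "signs/i.png",
    "GO": "signs/go.png",
    "SCHOOL": "signs/school.png",
}

reference = ["I", "SCHOOL", "GO"]
predicted = ["I", "SCHOOL", "GO"]
score = sentence_bleu(reference, predicted, max_n=2)
frames = glosses_to_frames(predicted, GLOSS_IMAGES)
```

Short gloss sequences often have no matching 4-grams, which is why the example passes `max_n=2`; published BLEU figures typically rely on a smoothed or corpus-level implementation instead. The lookup only orders still images, whereas the abstract describes deep generative models producing the final video.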

Acknowledgements

The research was funded by the Science and Engineering Research Board (SERB), India under Start-up Research Grant (SRG)/2019–2021 (Grant no. SRG/2019/001338). We would like to thank Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Nandyal, Andhra Pradesh, India for their support. We also thank all the students for their contribution to collecting the sign videos and to the successful completion of the ISL-CSLTR corpus.

Author information

Authors and Affiliations

School of Computing, SASTRA Deemed to be University, Thanjavur, Tamilnadu, 613401, India
B. Natarajan, R. Elakkiya & Moturi Leela Prasad

Corresponding author

Correspondence to R. Elakkiya.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Natarajan, B., Elakkiya, R. & Prasad, M.L. Sentence2SignGesture: a hybrid neural machine translation network for sign language video generation. J Ambient Intell Human Comput 14, 9807–9821 (2023). https://doi.org/10.1007/s12652-021-03640-9

Received: 13 July 2021
Accepted: 01 December 2021
Published: 26 January 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s12652-021-03640-9
