Sentence2SignGesture: a hybrid neural machine translation network for sign language video generation

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Neural Machine Translation (NMT) systems have attained a prominent position in language translation tasks. However, they face major challenges in translating new and out-of-vocabulary words. This is a major drawback of conventional NMT systems, and it leads to more copied words in the translated output. In addition, such systems risk misrepresenting multilingual language structures and word relationships. In this paper, we propose a novel NMT system based on a deep stacked GRU algorithm that addresses these challenges and handles multilingual sentence translation efficiently. The proposed model translates spoken sentences into sign words. The generated sign words (glosses) are mapped to sign gesture images to automate sign gesture video generation using deep generative models. The proposed hybrid NMT model has been evaluated qualitatively and quantitatively on different benchmark sign language datasets. Its improved BLEU score shows that it outperforms earlier approaches. We also evaluated the proposed model on our self-created Indian sign language corpus (ISL-CSLTR). The final results show better translation quality at minimal processing cost.
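The gloss-to-image mapping step mentioned in the abstract can be pictured with a minimal sketch. This is an illustration only, not the authors' implementation: the table `GLOSS_IMAGES`, the file paths, and the function `glosses_to_frames` are hypothetical names, and the real system feeds the mapped images to deep generative models rather than returning file paths. The example glosses come from the sample pair shown on this page ('can you repeat that please' to 'YOU REPEAT PLEASE').

```python
from typing import Dict, List, Tuple

# Hypothetical gloss-to-image table; the paths are placeholders, not from the paper.
GLOSS_IMAGES: Dict[str, str] = {
    "YOU": "signs/you.png",
    "REPEAT": "signs/repeat.png",
    "PLEASE": "signs/please.png",
}


def glosses_to_frames(
    glosses: List[str], table: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (image paths in gloss order, glosses that have no image)."""
    frames: List[str] = []
    missing: List[str] = []
    for gloss in glosses:
        key = gloss.upper()  # glosses are conventionally written in upper case
        if key in table:
            frames.append(table[key])
        else:
            missing.append(key)  # out-of-vocabulary gloss, reported to the caller
    return frames, missing


frames, missing = glosses_to_frames("YOU REPEAT PLEASE".split(), GLOSS_IMAGES)
print(frames)
print(missing)
```

Returning the unmatched glosses separately, instead of silently skipping them, keeps out-of-vocabulary cases visible, which is the failure mode the abstract highlights for conventional NMT systems.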


Figs. 1–11 (images not included). Recoverable caption text: source sentence ‘can you repeat that please’, target sign gloss ‘YOU REPEAT PLEASE’



Acknowledgements

The research was funded by the Science and Engineering Research Board (SERB), India, under Start-up Research Grant (SRG)/2019–2021 (Grant no. SRG/2019/001338). We thank Navajeevan, Residential School for the Deaf, College of Spl. D.Ed & B.Ed, Vocational Centre, and Child Care & Learning Centre, Ayyalurimetta, Nandyal, Andhra Pradesh, India, for their support. We also thank all the students for their contribution to collecting the sign videos and to the successful completion of the ISL-CSLTR corpus.

Author information

Corresponding author

Correspondence to R. Elakkiya.


About this article


Cite this article

Natarajan, B., Elakkiya, R. & Prasad, M.L. Sentence2SignGesture: a hybrid neural machine translation network for sign language video generation. J Ambient Intell Human Comput 14, 9807–9821 (2023). https://doi.org/10.1007/s12652-021-03640-9


  • DOI: https://doi.org/10.1007/s12652-021-03640-9
