Analysis, preparation, and optimization of statistical sign language machine translation

Machine Translation

Abstract

Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarcity of suitable data, and most papers in this area apply only a few well-known techniques that are not adapted to small corpora. In this article, we analyze existing data collections and examine their quality and usability for statistical machine translation. We also present findings on the proper preprocessing of a sign language corpus: introducing sentence end markers, splitting compound words, and handling parallel communication channels. We then focus on optimization procedures tailored to scarce resources, such as scaling factor optimization, alignment optimization, and system combination. All methods are evaluated on two of the largest sign language corpora available.
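To make two of the preprocessing steps named above more concrete, here is a minimal sketch, assuming a lowercased, whitespace-tokenized German source side. The `<eos>` marker token, the function names, and the compound splitter are illustrative assumptions, not the procedure described in the article: the splitter uses a common corpus-frequency heuristic that scores each two-part split by the geometric mean of the parts' frequencies and keeps it only if it beats the unsplit word.

```python
from collections import Counter

# Hypothetical sentence end token; the article's actual symbol is not given here.
SENTENCE_END = "<eos>"


def add_sentence_end_marker(tokens):
    """Return a copy of tokens ending in exactly one sentence end marker."""
    tokens = list(tokens)
    if not tokens or tokens[-1] != SENTENCE_END:
        tokens.append(SENTENCE_END)
    return tokens


def split_compound(word, freq, min_part_len=3):
    """Split word into two corpus-attested parts if that scores higher.

    Each candidate split is scored by the geometric mean of the parts'
    corpus frequencies; the unsplit word is scored by its own frequency.
    """
    best_parts, best_score = [word], freq.get(word, 0)
    for i in range(min_part_len, len(word) - min_part_len + 1):
        left, right = word[:i], word[i:]
        heads = [left]
        if left.endswith("s") and len(left) - 1 >= min_part_len:
            heads.append(left[:-1])  # also try dropping a German linking "s"
        for head in heads:
            f_head, f_right = freq.get(head, 0), freq.get(right, 0)
            if f_head and f_right:
                score = (f_head * f_right) ** 0.5
                if score > best_score:
                    best_parts, best_score = [head, right], score
    return best_parts


def preprocess(sentence, freq):
    """Lowercase, tokenize on whitespace, split compounds, add end marker."""
    tokens = []
    for token in sentence.lower().split():
        tokens.extend(split_compound(token, freq))
    return add_sentence_end_marker(tokens)


# Toy frequency table standing in for counts from a real training corpus.
freq = Counter("wetter bericht wetter regen schauer regen morgen".split())
print(preprocess("Wetterbericht morgen Regenschauer", freq))
# → ['wetter', 'bericht', 'morgen', 'regen', 'schauer', '<eos>']
```

Splitting compounds this way shrinks the vocabulary and reduces unseen words, which matters most on small corpora; the trade-off is that a purely frequency-based splitter can occasionally break words that should stay whole, which is why linguistically informed splitters are often preferred in practice.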

Author information

Correspondence to Daniel Stein.

About this article

Cite this article

Stein, D., Schmidt, C. & Ney, H. Analysis, preparation, and optimization of statistical sign language machine translation. Machine Translation 26, 325–357 (2012). https://doi.org/10.1007/s10590-012-9125-1

  • DOI: https://doi.org/10.1007/s10590-012-9125-1