Abstract
Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarcity of suitable data, and most papers in this area apply only a few well-known techniques that are not adapted to small corpora. In this article, we analyze existing data collections and emphasize their quality and usability for statistical machine translation. We also present findings on the proper preprocessing of a sign language corpus, namely introducing sentence end markers, splitting compound words, and handling parallel communication channels. We then focus on optimization procedures tailored to scarce resources, such as scaling factor optimization, alignment optimization, and system combination. All methods are evaluated on two of the largest sign language corpora available.
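The abstract names two of the preprocessing steps (sentence end markers and compound splitting) without describing them. The following is a minimal sketch of what such steps can look like, not the procedure used in the article. It assumes a common frequency-based heuristic for compound splitting (pick the split whose parts have the highest geometric mean corpus frequency) and a hypothetical end marker token `<s_end>`; the function names and the toy frequency table are likewise illustrative assumptions.

```python
import math


def add_sentence_end_marker(tokens, marker="<s_end>"):
    # Append a (hypothetical) end-of-sentence token unless it is already present.
    if tokens and tokens[-1] == marker:
        return list(tokens)
    return list(tokens) + [marker]


def split_compound(word, counts, min_len=3):
    """Split `word` into parts that all occur in `counts`, keeping the option
    whose parts have the highest geometric mean frequency. The unsplit word
    competes with its own frequency (0 if unseen)."""
    best_parts = [word]
    best_score = counts.get(word, 0)
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if counts.get(head, 0) <= 0:
            continue
        # Recursively split the remainder, then score the whole candidate.
        parts = [head] + split_compound(tail, counts, min_len)
        if any(counts.get(p, 0) <= 0 for p in parts):
            continue
        score = math.exp(sum(math.log(counts[p]) for p in parts) / len(parts))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


def preprocess(sentence, counts):
    # Lowercase, split compounds token by token, then mark the sentence end.
    tokens = []
    for tok in sentence.lower().split():
        tokens.extend(split_compound(tok, counts))
    return add_sentence_end_marker(tokens)


# Toy frequency table (illustrative only, not corpus statistics).
counts = {"regen": 50, "wahrscheinlichkeit": 20, "morgen": 100, "im": 80, "norden": 40}
print(preprocess("Morgen Regenwahrscheinlichkeit im Norden", counts))
```

With the toy table, the unseen compound "regenwahrscheinlichkeit" is split into "regen" and "wahrscheinlichkeit", while the frequent word "morgen" stays whole because no split outscores it. Preferring the geometric mean over, say, the product of frequencies keeps the heuristic from rewarding splits into many short, frequent fragments.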
Cite this article
Stein, D., Schmidt, C. & Ney, H. Analysis, preparation, and optimization of statistical sign language machine translation. Machine Translation 26, 325–357 (2012). https://doi.org/10.1007/s10590-012-9125-1
DOI: https://doi.org/10.1007/s10590-012-9125-1