Elsevier

Computer Speech & Language

Volume 51, September 2018, Pages 110-135

A novel rule based machine translation scheme from Greek to Greek Sign Language: Production of different types of large corpora and Language Models evaluation

https://doi.org/10.1016/j.csl.2018.04.001

Highlights

  • The system produces large, high-quality written GSL glossed corpora.

  • The glossed corpora are used as training data for n-gram Language Models.

  • Language Models for written GSL gloss are missing from the literature.

  • The system achieves a relative BLEU score of 84% for 4-grams, which is very promising.

  • The system accelerates the translation work carried out by professional GSL translators.

Abstract

One of the aims of assistive technologies is to help people with disabilities communicate with others and to give them access to information. As an aid to Deaf people, in this work we present a novel prototype Rule Based Machine Translation (RBMT) system for creating large, high-quality written Greek Sign Language (GSL) glossed corpora from Greek text. In particular, the proposed RBMT system assists the professional GSL translator in speeding up the production of different kinds of GSL glossed corpora. Each glossed corpus is then used to build n-gram Language Models (LMs). With GSL glossed corpora derived from Greek text, we can build, test and evaluate different kinds of Language Models for different kinds of glossed GSL corpora. Note that the system does not require full knowledge of GSL grammar: because it assists a professional human translator, its manually written RBMT rules need to cover only very basic GSL phenomena. It should also be stressed that Language Models for written GSL gloss are missing from the scientific literature, so this work is pioneering in the field. The proposed scheme is evaluated in the weather reports domain, for which 20,284 tokens and 1000 sentences have been produced. Using the BiLingual Evaluation Understudy (BLEU) metric, our prototype RBMT system achieves a relative score of 0.84 (84%) for 4-grams and 0.9 (90%) for 1-grams.

Introduction

Translation helps people overcome linguistic and cultural barriers. However, according to Isabelle and Foster (2006), manual translation is too expensive, and its cost is unlikely to fall enough, for it to be a practical solution to the everyday needs of ordinary people. Machine translation can help break linguistic barriers and make translation affordable to many people. This is especially important for Deaf people, since translation supports communication between the Deaf and hearing communities and helps give Deaf people the same opportunities to access information as everyone else (Porta et al., 2014).

Sign languages (SLs) use a different physical medium from the oral-aural channel of spoken languages. SLs are gestural-visual languages, and this difference in modality places them in a separate branch of language typology. However, many myths about SLs persist. One of the most common and enduring is that sign language is universal; in reality, each country generally has its own native sign language (Edward, 1959, Klima and Bellugi, 1979).

This paper focuses on Greek Sign Language (GSL), which satisfies all the linguistic criteria of a complete human language (Brennan et al., 1984, Croneberg, 1965, Klima and Bellugi, 1975). In particular, GSL has its own grammar and syntax. Under Greek Law 2817/2000, GSL is the official language of the Greek Deaf community, and in 2013 the Greek Deaf Federation published a formal announcement demanding the institutional recognition of GSL. Currently, more than 40,000 people use GSL. Another common myth is that GSL derives from spoken Greek. In reality, SLs do not derive from spoken languages, although they are influenced by them (Stokoe, 1969).

Regarding the fundamental problems of SLs, Porta et al. (2014) observe that most contemporary work on SLs has adopted language theories developed for spoken languages instead of testing new theories. From the point of view of natural language processing, SLs are still under-resourced or low-density languages: little or no specific technology is available for them, and computerized linguistic resources, such as corpora or lexicons, are very scarce.

Another major problem of SLs is the lack of a writing system. Strictly speaking, SLs can only be represented by video, which is why large corpora are lacking. The limitations in composing, editing and reusing SL utterances, and their consequences for Deaf education and communication, have been discussed systematically in the SL literature since the second half of the twentieth century (Efthimiou et al., 2016). Nevertheless, several notational systems exist. The most important include Stokoe notation (Stokoe, 1960), Sutton SignWriting (Sutton, 1995) and HamNoSys (Hamburg Sign Language Notation System) (Prillwitz et al., 1989). SignWriting was conceived primarily as a writing system and has its roots in DanceWriting (Sutton, 1978), a notation for reading and writing dance movements. Stokoe notation and HamNoSys were conceived as phonological transcription systems for SLs, with the same objective as the International Phonetic Alphabet (IPA) for spoken languages. A very promising system is SiGML (Elliott et al., 2004), which represents the 3-D properties of SLs. Last but not least, the “si5s” writing system (Augustus et al., 2013) has been proposed for American Sign Language (ASL).

Furthermore, to the best of the authors’ knowledge, no Language Models currently exist for GSL. To address these problems, this paper proposes an innovative RBMT system that quickly produces large, high-quality glossed GSL corpora. The focus is primarily on syntax, so glosses are used instead of phonological notation. Glossing is a commonly used approach for representing the meaning of signs and the grammatical structure of signed phrases and sentences in text written in another language. However, glossing is not a writing system that SL users could read. For this reason, we propose a gloss system based on the Berkeley Transcription System (for ASL), decorated with Non-Manual Component Sign (NmCs) tag features. The proposed scheme can also produce a simpler version of gloss without NmCs tags, modeled on the practice of the Deaf community and especially of bilingual Deaf people (who know both GSL and spoken Greek), who use a similar written Greek system on social media. The proposed GSL gloss system could be a precursor to a full Machine Translation (MT) system that would eventually produce avatar animation output. Finally, the produced glossed corpus, decorated with different combinations of part-of-speech (POS) tags provided by AUEB's (Athens University of Economics and Business) Greek POS Parser (Koleli, 2011) and with NmCs tags, is used to build statistical n-gram Language Models (LMs), which are then evaluated in detail.
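To make the idea of "different combinations of decorated tags" concrete, the sketch below stores each gloss token with an optional POS tag and a list of NmCs tags, and serializes the same sentence into several corpus variants. The tag syntax (`/POS`, `[nmcs]`), the class name `GlossToken` and the function `render` are hypothetical choices for illustration; the paper's actual gloss notation is not given in this excerpt, and English placeholder glosses stand in for the Greek ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossToken:
    """One gloss token with optional annotations (hypothetical schema)."""
    gloss: str                                     # upper-case gloss label
    pos: Optional[str] = None                      # POS tag from a tagger
    nmcs: List[str] = field(default_factory=list)  # non-manual component tags


def render(tokens: List[GlossToken], with_pos: bool = False,
           with_nmcs: bool = False) -> str:
    """Serialize a glossed sentence as one line of a chosen corpus variant."""
    parts = []
    for tok in tokens:
        text = tok.gloss
        if with_pos and tok.pos:
            text += "/" + tok.pos                  # hypothetical POS marker
        if with_nmcs and tok.nmcs:
            text += "[" + ",".join(tok.nmcs) + "]"  # hypothetical NmCs marker
        parts.append(text)
    return " ".join(parts)


# Hypothetical weather-report sentence (English placeholders for Greek glosses).
sentence = [
    GlossToken("TOMORROW", "ADV"),
    GlossToken("RAIN", "NOUN", ["intensity"]),
    GlossToken("NORTH", "NOUN"),
]
print(render(sentence))                                # TOMORROW RAIN NORTH
print(render(sentence, with_pos=True, with_nmcs=True))
# TOMORROW/ADV RAIN/NOUN[intensity] NORTH/NOUN
```

Keeping the annotations separate from the serialized form means one annotated source can yield the plain, POS-decorated and NmCs-decorated corpora without re-annotation.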

The rest of this paper is organized as follows: Section 2 reviews machine translation systems for SLs, as well as state-of-the-art SL corpora. Section 3 discusses related work and analyzes how the proposed scheme produces different kinds of GSL glossed corpora for training Language Models. Section 4 presents evaluation results, while in Section 5 Language Models are created from different types of corpora with the aid of a human GSL translator. Finally, Section 6 concludes the paper and provides directions for future work.

Section snippets

Fundamentals of GSL

The most important documentation for any language (signed or not) is a reference grammar, which documents the principles governing the construction of words, as well as all kinds of grammatical structures found in the language. For GSL, there are currently some efforts to gather resources, create a dictionary and annotated corpora, and analyze signers’ data derived from the annotated corpora (Efthimiou et al., 2012, Efthimiou and Fotinea, 2007a, Efthimiou and Fotinea,

The proposed RBMT system for Greek-to-GSL translation

The proposed RBMT system follows basic Unification Grammar principles (Carpenter, 2005, Carpenter, 1992, Kay, 1984, Shieber, 2003). Its development combines several tools and technologies: (a) AUEB's POS Parser, which performs morphological annotation of the source corpora, (b) the NLTK (Natural Language Toolkit) 3.0 suite, a free, open-source, community-driven, leading platform
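The core operation of a unification grammar is merging two partial feature structures into one that satisfies both, failing when they assign different values to the same feature. The sketch below shows that operation on nested Python dictionaries; it is a simplified illustration, not the authors' implementation, and it omits reentrancy (structure sharing) and variables, which full unification formalisms support. The feature names (`cat`, `agr`, `num`, `gen`) are hypothetical. NLTK, which the authors list among their tools, offers a fuller implementation in `nltk.featstruct`.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures given as nested dicts.

    Atomic values unify only if they are equal; dicts unify feature by
    feature. Returns the merged structure, or None on a feature clash.
    Simplified sketch: no reentrancy and no variables.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)                      # shallow copy; inputs are not mutated
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:            # clash somewhere below this feature
                    return None
                result[key] = merged
            else:
                result[key] = b_val           # feature only present in b
        return result
    return a if a == b else None              # atomic (or dict vs atom) case


# Hypothetical lexical entry and agreement constraint.
noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "gen": "fem"}}))
# {'cat': 'N', 'agr': {'num': 'sg', 'gen': 'fem'}}
print(unify(noun, {"agr": {"num": "pl"}}))    # None (number clash)
```

In a rule-based transfer step, such a check lets a rule fire only when the features of the source words are compatible with the rule's constraints.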

Evaluation of the proposed RBMT system

Human evaluation remains crucial for properly assessing the quality of MT systems. When the output of an MT system is evaluated, however, the whole process must be taken into account. In our case, several aspects of the proposed RBMT system are evaluated: (a) all stages of development of the transfer rules, (b) accuracy of translation and (c) complexity. It should also be stressed, however, that the lack of a generally accepted writing system for SLs,
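The abstract reports translation accuracy as BLEU scores for 1-grams and 4-grams. The sketch below computes corpus-level BLEU (Papineni et al., 2002) with one reference per sentence: clipped n-gram precisions combined by a uniform geometric mean, times a brevity penalty. We assume that "score for n-grams" means cumulative BLEU up to order n; the exact configuration, tokenization and toolkit used in the paper are not stated in this excerpt. Established implementations such as NLTK's `nltk.translate.bleu_score` add options like smoothing and multiple references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, uniform weights, one reference per hypothesis."""
    clipped = [0] * max_n   # matched n-grams, clipped by reference counts
    totals = [0] * max_n    # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:   # some precision is zero, so the geometric mean is zero
        return 0.0
    log_p = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)


# Hypothetical glossed output vs. a translator's reference gloss.
hyp = [["TOMORROW", "RAIN", "NORTH", "STRONG"]]
ref = [["TOMORROW", "RAIN", "NORTH", "STRONG"]]
print(corpus_bleu(hyp, ref))           # 1.0 for an exact match
print(corpus_bleu(hyp, ref, max_n=1))  # unigram BLEU
```

Setting `max_n=1` gives the unigram variant; the default `max_n=4` gives the standard BLEU-4.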

Statistical Language Models (SLMs)

One of our main future plans is to develop a Statistical Machine Translation System (SMTS). A way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Accordingly, SLMs have been studied extensively in the literature and are a key component of an SMTS. In this section, we investigate whether our GSL gloss corpora could be used in an SMTS. For this purpose, we have created and studied different types of Language
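As a rough illustration of how a glossed corpus becomes a statistical language model, the sketch below trains a bigram model with add-one (Laplace) smoothing and computes perplexity on held-out sentences. This is an assumption-laden simplification: the paper studies higher-order n-grams, and its toolkit and smoothing method are not stated in this excerpt (the reference list includes Katz back-off and Chen et al.'s smoothing study, which describe stronger methods). Add-one smoothing is used here only because it is short. The class name `BigramLM` and the glosses are hypothetical, and unknown test words are not handled specially.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing (sketch)."""

    def __init__(self, sentences):
        self.history = Counter()   # how often each token is followed by another
        self.bigrams = Counter()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev, word):
        """Smoothed P(word | prev)."""
        v = len(self.vocab)
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + v)

    def perplexity(self, sentences):
        """exp of the average negative log-probability per predicted token."""
        log_sum, count = 0.0, 0
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            for prev, word in zip(toks[:-1], toks[1:]):
                log_sum += math.log(self.prob(prev, word))
                count += 1
        return math.exp(-log_sum / count)


# Hypothetical gloss training corpus (English placeholders).
lm = BigramLM([["TOMORROW", "RAIN"], ["TOMORROW", "SUN"]])
print(lm.prob("<s>", "TOMORROW"))             # 0.5
print(lm.perplexity([["TOMORROW", "RAIN"]]))  # about 2.47
```

Comparing perplexities of models trained on the plain, POS-decorated and NmCs-decorated variants of the same corpus is one way to compare corpus types.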

Conclusions and future work

The choice of a particular technology for processing a language (either spoken or signed) is greatly influenced by the density of the language, i.e., the availability of digitally stored resources. Commercial research and development have concentrated on high-density languages. Today GSL, like every other SL, is a low-density or under-resourced language. Because of its modality, acquiring SL data is time-consuming and expensive compared with acquiring spoken or written data.

References (92)

  • A. Braffort et al.

    Toward an annotation software for video of sign language, including image processing tools and signing space modelling

  • M. Brennan et al.

    Words in Hand: A Structural Analysis of the Signs of British Sign Language

    (1984)
  • P. Brown et al.

    A statistical approach to language translation

  • P.F. Brown et al.

    A statistical approach to machine translation

    Comput. Linguist.

    (1990)
  • H. Brugman et al.

    Multimodal annotations in gesture and sign language studies

  • H. Bunke et al.

    Hidden Markov models: Applications in Computer Vision

    (2001)
  • R. Carpenter

    The Logic of Typed Feature Structures (Cambridge Tracts in Theoretical Computer Science)

    (2005)
  • R. Carpenter

    The Logic of Typed Feature Structures

    (1992)
  • J. Chandioux

    Météo: 100 million words later

  • J. Chandioux

    METEO: an operational system for the translation of public weather forecasts

    FBIS Seminar on Machine Translation

    (1976)
  • S.F. Chen et al.

    An empirical study of smoothing techniques for language modeling

  • C. Croneberg

    The linguistic community

  • A.-L. Dimou et al.

    Grammar/prosody modelling in Greek sign language: towards the definition of built-in sign synthesis rules

  • P. Dreuw et al.

    Benchmark databases for video-based automatic sign language recognition

  • T.H. Edward

    The Silent Language

    (1959)
  • E. Efthimiou et al.

    GSLC: creation and annotation of a Greek sign language corpus for HCI

  • E. Efthimiou et al.

    An environment for deaf accessibility to educational content

  • E. Efthimiou et al.

    From grammar-based MT to post-processed SL representations

    Univ. Access Inf. Soc.

    (2016)
  • E. Efthimiou et al.

    Sign language technologies and resources of the dicta-sign project

  • R. Elliott et al.

    An overview of the SiGML notation and SiGMLSigning software system

  • M.L. Forcada et al.

    Apertium: a free/open-source platform for rule-based machine translation

    Mach. Transl.

    (2011)
  • S.-E. Fotinea et al.

    Sign language computer-aided education: exploiting GSL resources and technologies for web deaf communication

  • S.-E. Fotinea et al.

    Generating linguistic content for Greek to GSL conversion

  • J.H. Greenberg

    Some universals of grammar with particular reference to the order of meaningful elements

  • A.B. Grieve-Smith

    English to American Sign Language machine translation of weather reports

  • T. Hanke

    iLex – a tool for sign language lexicography and corpus analysis

  • K. Hengeveld et al.

    The architecture of a functional discourse grammar

  • N. Hoiting et al.

    Transcription as a tool for understanding: the Berkeley transcription system for sign language research (BTS)

  • M.P. Huenerfauth

    American Sign Language Natural Language Generation and Machine Translation Systems

    (2003)
  • M. Huenerfauth

    Generating American Sign Language Classifier Predicates for English-to-ASL Machine Translation

    (2006)
  • W.J. Hutchins et al.

    An Introduction to Machine Translation

    (1992)
  • N. Ide et al.

    XCES: an XML-based encoding standard for linguistic corpora

  • J. Kanis et al.

    Automatic Czech – sign speech translation

  • G. Karypis

    Evaluation of item-based top-N recommendation algorithms

  • S. Katz

    Estimation of probabilities from sparse data for the language model component of a speech recognizer

    IEEE Trans. Acoust. Speech Signal Process.

    (1987)
  • M. Kay

    Functional unification grammar: a formalism for machine translation

Cited by (13)

  • A survey on Sign Language machine translation

    2023, Expert Systems with Applications

    Citation excerpt: This could be used for SLT using the transformation rules proposed by Lozynska, Davydov, Pasichnyk, and Veretennikova (2019). Focused on helping professional translators, Kouremenos et al. (2018) aimed at creating language models for the Greek SL (GSL). In particular, the glosses were derived from spoken language text using an RBMT system.

  • State of the Art of Automation in Sign Language: A Systematic Review

    2023, ACM Transactions on Asian and Low-Resource Language Information Processing

  • Machine translation from text to sign language: a systematic review

    2023, Universal Access in the Information Society

This paper has been recommended for acceptance by Roger K. Moore.