
Bootstrapping the Lexicon Building Process for Machine Translation between ‘New’ Languages

  • Conference paper
Machine Translation: From Research to Real Users (AMTA 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2499)

Abstract

The cumulative effort that has gone over the past few decades into developing linguistic resources, ranging from machine-readable dictionaries to translation systems, is enormous. Such effort is prohibitively expensive for languages outside the (largely) European family. The possibility of building such resources automatically from electronic corpora of such languages is therefore of great interest to those who study these ‘new’ or ‘lesser known’ languages. The main stumbling block to applying these data-driven techniques directly is that most of them require large corpora, which are rarely available for such ‘new’ languages. This paper describes an attempt at setting up a bootstrapping agenda to exploit the scarce corpus resources that may be available at the outset to a researcher concerned with such languages. In particular, it reports on the results of an experiment in using state-of-the-art data-driven techniques to build linguistic resources for Sinhala, a non-European language with virtually no electronic resources.

Work reported herein was carried out at INRIA, France, supported by the European Research Consortium on Informatics and Mathematics (ERCIM).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weerasinghe, R. (2002). Bootstrapping the Lexicon Building Process for Machine Translation between ‘New’ Languages. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_18

  • DOI: https://doi.org/10.1007/3-540-45820-4_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44282-0

  • Online ISBN: 978-3-540-45820-3

  • eBook Packages: Springer Book Archive
