Bootstrapping the Lexicon Building Process for Machine Translation between ‘New’ Languages

Weerasinghe, Ruvan

doi:10.1007/3-540-45820-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2499)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

The cumulative effort that has gone into developing linguistic resources over the past few decades, for tasks ranging from machine-readable dictionaries to translation systems, is enormous. Such effort is prohibitively expensive for languages outside the (largely) European family. The possibility of building such resources automatically from electronic corpora of such languages is therefore of great interest to those involved in studying these ‘new’, ‘lesser known’ languages. The main stumbling block to applying these data-driven techniques directly is that most of them require large corpora, which are rarely available for such ‘new’ languages. This paper describes an attempt at setting up a bootstrapping agenda to exploit the scarce corpus resources that may be available at the outset to a researcher concerned with such languages. In particular, it reports on the results of an experiment using state-of-the-art data-driven techniques to build linguistic resources for Sinhala, a non-European language with virtually no electronic resources.

Work reported herein was carried out at INRIA, France, supported by the European Research Consortium on Informatics and Mathematics (ERCIM).

Author information

Authors and Affiliations

Department of Computer Science, University of Colombo, Sri Lanka
Ruvan Weerasinghe

Editor information

Editors and Affiliations

Microsoft Research, 1 Microsoft Way, Redmond, WA, 98052, USA
Stephen D. Richardson

About this paper

Cite this paper

Weerasinghe, R. (2002). Bootstrapping the Lexicon Building Process for Machine Translation between ‘New’ Languages. In: Richardson, S.D. (ed.) Machine Translation: From Research to Real Users. AMTA 2002. Lecture Notes in Computer Science, vol 2499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45820-4_18

DOI: https://doi.org/10.1007/3-540-45820-4_18
Published: 20 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44282-0
Online ISBN: 978-3-540-45820-3
eBook Packages: Springer Book Archive