A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings Between Languages

Probst, Katharina; Lavie, Alon

doi:10.1007/978-3-540-30194-3_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Included in the following conference series:

Conference of the Association for Machine Translation in the Americas

Abstract

We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. This is shown by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.
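As a rough illustration of the redundancy experiment summarized above, the following Python sketch builds corpora with an increasing cap on examples per structural pattern and counts how many distinct structural mappings each one exposes. The toy pool, the pattern labels, the function names, and the use of "distinct mappings" as the information measure are all assumptions made for illustration; the abstract does not specify the actual corpus construction or measure.

```python
from collections import defaultdict

# Hypothetical toy pool: (source-side structural pattern, observed target-side ordering).
POOL = [
    ("NP -> DET N", "N DET"),
    ("NP -> DET N", "N DET"),
    ("NP -> DET N", "N DET"),
    ("NP -> DET ADJ N", "N ADJ DET"),
    ("NP -> DET ADJ N", "N ADJ DET"),
    ("NP -> DET ADJ N", "ADJ N DET"),
    ("PP -> P NP", "NP P"),
    ("PP -> P NP", "NP P"),
    ("S -> NP VP", "NP VP"),
]


def build_corpus(pool, max_per_pattern):
    """Keep at most max_per_pattern examples of each source pattern, in pool order."""
    kept = []
    seen = defaultdict(int)
    for pattern, mapping in pool:
        if seen[pattern] < max_per_pattern:
            seen[pattern] += 1
            kept.append((pattern, mapping))
    return kept


def distinct_mappings(corpus):
    """Assumed information measure: number of distinct (pattern, ordering) pairs."""
    return len(set(corpus))


def gain_curve(pool, max_redundancy):
    """Return (redundancy cap, corpus size, distinct mappings) for caps 1..max_redundancy."""
    curve = []
    for k in range(1, max_redundancy + 1):
        corpus = build_corpus(pool, k)
        curve.append((k, len(corpus), distinct_mappings(corpus)))
    return curve


if __name__ == "__main__":
    for k, size, info in gain_curve(POOL, 4):
        print(f"cap={k} corpus_size={size} distinct_mappings={info}")
```

In this toy setting the corpus more than doubles in size as the cap rises, while the count of distinct mappings levels off, which is the kind of convergence the abstract describes.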

This research was funded in part by NSF grant number IIS-0121631.

Author information

Authors and Affiliations

Language Technologies Institute, Carnegie Mellon University
Katharina Probst & Alon Lavie

Editor information

Editors and Affiliations

Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, PA, USA
Robert E. Frederking
Intelligence Technology Innovation Center, 20505, Washington, D.C., USA
Kathryn B. Taylor

About this paper

Cite this paper

Probst, K., Lavie, A. (2004). A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings Between Languages. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_24

DOI: https://doi.org/10.1007/978-3-540-30194-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23300-8
Online ISBN: 978-3-540-30194-3
eBook Packages: Springer Book Archive