A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings Between Languages

  • Conference paper
Machine Translation: From Real Users to Research (AMTA 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3265)

Abstract

We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. This is shown by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.
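The convergence argument in the abstract can be illustrated with a toy sketch. This is not the authors' procedure, which is not visible in this preview. It assumes that "information gained" can be approximated by the number of distinct structural patterns seen so far, and the pattern labels and the helper name `cumulative_distinct` are hypothetical.

```python
from typing import Hashable, Iterable, List


def cumulative_distinct(patterns: Iterable[Hashable]) -> List[int]:
    """Return, after each item, how many distinct patterns have been seen so far."""
    seen = set()
    curve = []
    for pattern in patterns:
        seen.add(pattern)
        curve.append(len(seen))
    return curve


# Hypothetical structural patterns (illustrative only, not taken from the paper).
base = ["DET N", "DET ADJ N", "N PP", "V NP", "V NP PP"]

# Add redundancy by repeating the same structures in further passes.
redundant = base * 3

curve = cumulative_distinct(redundant)
print(curve)
```

Under these assumptions the curve rises during the first pass over the diverse patterns and stays flat during the redundant passes, which is the shape of result the abstract describes.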

This research was funded in part by NSF grant number IIS-0121631.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Probst, K., Lavie, A. (2004). A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings Between Languages. In: Frederking, R.E., Taylor, K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_24

  • DOI: https://doi.org/10.1007/978-3-540-30194-3_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23300-8

  • Online ISBN: 978-3-540-30194-3

  • eBook Packages: Springer Book Archive
