Collocation Discovery for Optimal Bilingual Lexicon Development

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1822)

Abstract

The accurate translation of collocations, or multi-word units, is essential for high-quality machine translation. However, many collocations do not translate compositionally and therefore require individual entries in the bilingual lexicon. We present a technique for collocation extraction from large corpora that takes into account the dispersion of the collocations throughout the corpus. Collocations are ranked to reflect more accurately how likely they are to occur in a wide variety of texts; collocations that are specific to a particular text are less useful for lexicon development. Once the collocations are extracted, lexicographers can develop appropriate bilingual lexical entries.
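The abstract does not give the ranking formula, so the following is only an illustrative sketch of the general idea of dispersion-aware ranking, not the authors' method. It assumes a corpus already split into tokenized texts, treats adjacent word pairs as collocation candidates, and scores each pair as its total frequency multiplied by the fraction of texts in which it appears. The function name `rank_bigrams`, the scoring rule, and the toy corpus are all hypothetical.

```python
from collections import Counter


def rank_bigrams(documents):
    """Rank adjacent word pairs by frequency weighted by dispersion.

    documents: list of token lists, one per text.
    Returns (bigram, score) pairs sorted by descending score, where
    score = total frequency * fraction of documents containing the bigram.
    This scoring rule is an assumption made for illustration only.
    """
    total = Counter()     # occurrences of each bigram over the whole corpus
    doc_freq = Counter()  # number of documents in which each bigram occurs
    for tokens in documents:
        bigrams = list(zip(tokens, tokens[1:]))
        total.update(bigrams)
        doc_freq.update(set(bigrams))  # count each document at most once
    n_docs = len(documents)
    scores = {bg: total[bg] * doc_freq[bg] / n_docs for bg in total}
    # Sort by score (highest first), breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus: "blue widget" is frequent but confined to a single text.
docs = [
    "the prime minister met the press".split(),
    "the prime minister spoke".split(),
    "blue widget blue widget blue widget".split(),
]
ranking = rank_bigrams(docs)
for bigram, score in ranking[:3]:
    print(" ".join(bigram), round(score, 3))
```

In this toy corpus "blue widget" has the highest raw frequency (three occurrences) but occurs in only one of three texts, so it ranks below "prime minister", which occurs twice across two texts. That is the behaviour the abstract describes: candidates specific to a single text are pushed down the list.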

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McDonald, S., Turcato, D., McFetridge, P., Popowich, F., Toole, J. (2000). Collocation Discovery for Optimal Bilingual Lexicon Development. In: Hamilton, H.J. (eds) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_11

  • DOI: https://doi.org/10.1007/3-540-45486-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67557-0

  • Online ISBN: 978-3-540-45486-1

  • eBook Packages: Springer Book Archive
