Abstract
The accurate translation of collocations, or multi-word units, is essential for high-quality machine translation. However, many collocations do not translate compositionally and therefore require individual entries in the bilingual lexicon. We present a technique for extracting collocations from large corpora that takes into account how the collocations are dispersed throughout the corpus. Collocations are ranked to reflect more accurately how likely they are to occur in a wide variety of texts; collocations that are specific to a particular text are less useful for lexicon development. Once the collocations have been extracted, lexicographers can develop appropriate bilingual lexical entries.
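To make the idea of dispersion-aware ranking concrete, the following is a minimal sketch and not the authors' method, which the abstract does not specify. It assumes that candidates are adjacent word pairs, that the corpus is already split into tokenized documents, and that dispersion can be approximated by the fraction of documents containing the pair. The function name `rank_collocations` and the scoring formula are hypothetical choices made for illustration only.

```python
from collections import Counter


def rank_collocations(documents):
    """Rank adjacent word pairs by frequency weighted by dispersion.

    `documents` is a list of token lists. The score is an illustrative
    assumption: total frequency multiplied by the fraction of documents
    in which the pair occurs at least once.
    """
    total = Counter()     # corpus-wide frequency of each pair
    doc_freq = Counter()  # number of documents containing each pair
    for tokens in documents:
        counts = Counter(zip(tokens, tokens[1:]))
        total.update(counts)
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    scored = {pair: total[pair] * doc_freq[pair] / n_docs for pair in total}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus: "kick the" is frequent but confined to one document,
# while "take a" has the same raw frequency spread over three documents.
docs = [
    "kick the bucket kick the bucket kick the bucket".split(),
    "take a break".split(),
    "take a break now".split(),
    "take a break soon".split(),
]
ranking = rank_collocations(docs)
for pair, score in ranking[:4]:
    print(" ".join(pair), score)
```

In this toy corpus, "take a" and "kick the" both occur three times, but the pair that appears in three of the four documents is ranked above the pair confined to a single document, which is the behaviour the abstract describes as desirable for lexicon development.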
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McDonald, S., Turcato, D., McFetridge, P., Popowich, F., Toole, J. (2000). Collocation Discovery for Optimal Bilingual Lexicon Development. In: Hamilton, H.J. (eds) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_11
DOI: https://doi.org/10.1007/3-540-45486-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67557-0
Online ISBN: 978-3-540-45486-1
eBook Packages: Springer Book Archive