Abstract
Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute the bulk of common texts and are language-specific, the creation of collocation databases (CBDs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations based on those already present in a CBD, with the help of a WordNet-like thesaurus: if a word A is semantically “similar” to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided that certain conditions are met.
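The inference rule stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data structures, the function name `infer_collocations`, the `accept` predicate, the collocation-type label, and the toy Spanish entries are all assumptions made for illustration. The abstract does not spell out the "certain conditions", so they are represented here by a pluggable filter that accepts everything by default.

```python
from typing import Callable, Dict, Set, Tuple

# (headword, collocation type, collocate); this layout is an assumption
Collocation = Tuple[str, str, str]


def infer_collocations(
    known: Set[Collocation],
    similar: Dict[str, Set[str]],
    accept: Callable[[str, Collocation], bool] = lambda word, source: True,
) -> Set[Collocation]:
    """Propose A + C for every known B + C where A is listed as similar to B.

    `similar` maps a word B to the words A that a thesaurus marks as
    semantically similar to it. `accept` stands in for the unspecified
    conditions that a candidate must satisfy.
    """
    inferred: Set[Collocation] = set()
    for b, ctype, c in known:
        for a in similar.get(b, set()):
            candidate = (a, ctype, c)
            # Skip collocations already in the database or rejected by the filter.
            if candidate not in known and accept(a, (b, ctype, c)):
                inferred.add(candidate)
    return inferred


# Toy data, invented for illustration only.
known = {("periódico", "verb+object", "leer")}   # "leer periódico"
similar = {"periódico": {"diario"}}              # "diario" is similar to "periódico"

new = infer_collocations(known, similar)
rejected = infer_collocations(known, similar, accept=lambda word, source: False)
```

With this toy input, `new` contains the single candidate ("diario", "verb+object", "leer"), which keeps the collocation type of its source. Passing a stricter `accept` function is where any real filtering conditions would go.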
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Bolshakov, I.A., Gelbukh, A. (2002). Heuristics-Based Replenishment of Collocation Databases. In: Ranchhod, E., Mamede, N.J. (eds) Advances in Natural Language Processing. PorTAL 2002. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45433-0_5
DOI: https://doi.org/10.1007/3-540-45433-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43829-8
Online ISBN: 978-3-540-45433-5
eBook Packages: Springer Book Archive