
Heuristics-Based Replenishment of Collocation Databases

  • Conference paper
Advances in Natural Language Processing (PorTAL 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2389)


Abstract

Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations make up a large part of ordinary texts and are language-specific, the creation of collocation databases (CDBs) is important. However, compiling such databases manually is prohibitively expensive. We present heuristics for automatically generating new Spanish collocations from those already present in a CDB, with the help of a WordNet-like thesaurus: if a word A is semantically “similar” to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided that certain conditions are met.
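The inference rule stated in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy database entries, the collocation-type label, the `similar` mapping (standing in for the WordNet-like thesaurus), and the `accept` hook (standing in for the unspecified "certain conditions") are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# A collocation is stored as (base word, collocate, collocation type).
# The entries and the type label below are hypothetical examples.
Collocation = Tuple[str, str, str]

# Hypothetical filter standing in for the unspecified "certain conditions":
# (A, B, C, type) -> keep the candidate?
Condition = Callable[[str, str, str, str], bool]


def accept_all(a: str, b: str, c: str, ctype: str) -> bool:
    """Placeholder condition that accepts every candidate."""
    return True


def infer_collocations(
    cdb: Set[Collocation],
    similar: Dict[str, Set[str]],
    accept: Condition = accept_all,
) -> Set[Collocation]:
    """Propose A + C for every known B + C where A is listed as similar to B.

    Candidates already present in the database are skipped, and each
    remaining candidate must pass the `accept` filter.
    """
    candidates: Set[Collocation] = set()
    for b, c, ctype in cdb:
        for a in similar.get(b, set()):
            candidate = (a, c, ctype)  # same collocate, same collocation type
            if candidate not in cdb and accept(a, b, c, ctype):
                candidates.add(candidate)
    return candidates


# Toy data: "tomar café" and "tomar té" are already known.
cdb = {
    ("café", "tomar", "verb-object"),
    ("té", "tomar", "verb-object"),
}
# Toy stand-in for thesaurus similarity (hypothetical entries).
similar = {
    "café": {"té", "chocolate"},
    "té": {"café"},
}

print(infer_collocations(cdb, similar))
# → {('chocolate', 'tomar', 'verb-object')}
```

Because *té* and *chocolate* are both listed as similar to *café*, the known *tomar café* yields the new candidate *tomar chocolate*, while *tomar té* is not proposed again since it is already in the database. In a fuller system, whatever conditions restrict the transfer would be plugged into `accept`.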




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolshakov, I.A., Gelbukh, A. (2002). Heuristics-Based Replenishment of Collocation Databases. In: Ranchhod, E., Mamede, N.J. (eds) Advances in Natural Language Processing. PorTAL 2002. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45433-0_5


  • DOI: https://doi.org/10.1007/3-540-45433-0_5


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43829-8

  • Online ISBN: 978-3-540-45433-5

  • eBook Packages: Springer Book Archive
