Measuring Non-compositionality of Verb-Noun Collocations Using Lexical Functions and WordNet Hypernyms

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9414)

Abstract

In verb-noun combinations such as draw a conclusion, lend support, and take a step, the verb acquires a meaning different from its typical meaning, which is usually represented by the first sense in WordNet. This makes a correct compositional analysis hard or even impossible. Such non-compositional word combinations are called collocations. The semantic and syntactic properties of collocations can be formalized using lexical functions, a concept of the Meaning-Text Theory. In this paper we carried out two series of experiments on the automatic detection of lexical functions in verb-noun collocations, both using supervised learning methods and WordNet hypernyms. In the first series, we used hypernyms corresponding to the manually annotated WordNet senses of the verbs and nouns in the dataset. In the second series, we used hypernyms corresponding to the typical (first) sense of the verbs. Comparing the results of the two series, we found that supervised learning performed better on some lexical functions in the second case, even though the first sense was not the sense the verbs have in the collocations. This shows that, for such lexical functions, the semantics of the verbs is closer to their typical senses and the non-compositionality of such collocations is therefore weaker. We propose to use the difference in lexical function detection performance between the actual sense and the first sense as a simple measure of the non-compositionality of verb-noun collocations.
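The proposed measure can be illustrated with a minimal sketch. It assumes that each lexical function has one detection score (for example, an F-measure) per experimental series, and that the measure is the actual-sense score minus the first-sense score; the abstract does not fix the sign convention or the metric, so both are assumptions. The function name, the labels `LF_A` and `LF_B`, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def sense_gap(actual: Dict[str, float], first: Dict[str, float]) -> Dict[str, float]:
    """Per lexical function, return the actual-sense score minus the first-sense score.

    Only lexical functions present in both series are compared.
    """
    shared = actual.keys() & first.keys()
    return {lf: actual[lf] - first[lf] for lf in sorted(shared)}


# Placeholder scores for two hypothetical lexical functions (NOT from the paper).
score_actual_sense = {"LF_A": 0.75, "LF_B": 0.50}  # series 1: annotated senses
score_first_sense = {"LF_A": 0.50, "LF_B": 0.75}   # series 2: first (typical) sense

gaps = sense_gap(score_actual_sense, score_first_sense)

for lf, gap in gaps.items():
    # A gap at or below zero means the first sense did at least as well,
    # which the abstract reads as weaker non-compositionality.
    reading = "weaker" if gap <= 0 else "stronger"
    print(f"{lf}: gap = {gap:+.2f} ({reading} non-compositionality)")
```

Under this convention, a lexical function whose detection does not benefit from the manually annotated sense gets a gap at or below zero, matching the abstract's reading that its verbs stay close to their typical senses.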

Notes

  1. Macmillan Dictionary Online, available at http://www.macmillandictionary.com/.

  2. WordNet 3.1, available at http://wordnet.princeton.edu/.

  3. Merriam-Webster Dictionary Online, available at http://www.merriam-webster.com/.

  4. Cambridge Dictionary Online, available at http://dictionary.cambridge.org/.

  5. Corpus of Contemporary American English (COCA), created by Mark Davies, Brigham Young University, available at http://corpus.byu.edu/coca/.

  6. Spanish Verb-Noun Lexical Functions, available at http://148.204.58.221/okolesnikova/index.php?id=lex/ and http://www.gelbukh.com/lexical-functions/.

  7. Spanish WordNet, available at http://www.lsi.upc.edu/~nlp/web/index.php?Itemid=57&id=31&option=com_content&task=view.

  8. WEKA, The University of Waikato Computer Science Department Machine Learning Group, available at http://www.cs.waikato.ac.nz/ml/weka/downloading.html/.

Acknowledgements

This work was partially supported by the Mexican Government: SNI and Instituto Politécnico Nacional, grants SIP 20152095 and SIP 20152100.

Author information

Corresponding author

Correspondence to Olga Kolesnikova.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kolesnikova, O., Gelbukh, A. (2015). Measuring Non-compositionality of Verb-Noun Collocations Using Lexical Functions and WordNet Hypernyms. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science, vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_1

  • DOI: https://doi.org/10.1007/978-3-319-27101-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27100-2

  • Online ISBN: 978-3-319-27101-9

  • eBook Packages: Computer Science, Computer Science (R0)
