Measuring the Compositionality of Arabic Multiword Expressions

  • Conference paper
Soft Computing Applications and Intelligent Systems (M-CAIT 2013)

Abstract

This paper presents a method for measuring the compositionality score of multiword expressions (MWEs). Using Wikipedia (WP) as a lexical resource, MWEs are identified as the titles of Wikipedia articles that consist of more than one word, with no further processing. For semantic representation, the method exploits Wikipedia's hierarchical taxonomy to represent each concept (a single word or a multiword expression) as a feature vector of the WP articles that belong to the concept's categories and sub-categories. The compositionality score of an MWE is then measured with two scores based on semantic similarity: the literality score and the multiplicative composition function score. The method is evaluated by comparing its compositionality scores against a dataset of human judgments for 100 Arabic noun-noun compounds.
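The abstract gives no formulas, so the following is only a minimal sketch of how the two scores might be computed, not the authors' implementation. It assumes each concept is a sparse vector keyed by Wikipedia article title, that semantic similarity is cosine similarity, that the literality score is the average similarity between the MWE and each constituent, and that multiplicative composition is the element-wise product of the constituent vectors. All function names and the toy article titles are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def multiply(u, v):
    """Element-wise product: keeps only the features shared by both vectors."""
    return {k: w * v[k] for k, w in u.items() if k in v}


def literality_score(mwe, first, second):
    """Assumed form: mean similarity of the MWE to each constituent word."""
    return (cosine(mwe, first) + cosine(mwe, second)) / 2


def composition_score(mwe, first, second):
    """Similarity of the MWE to the multiplicative composition of its parts."""
    return cosine(mwe, multiply(first, second))


# Toy feature vectors; keys stand in for WP articles found under each
# concept's categories and sub-categories (hypothetical titles).
mwe = {"Article_A": 1.0, "Article_B": 1.0}
first = {"Article_A": 1.0, "Article_B": 1.0, "Article_C": 1.0}
second = {"Article_A": 1.0, "Article_D": 1.0}

print("literality:", literality_score(mwe, first, second))
print("composition:", composition_score(mwe, first, second))
```

Under these assumptions, an MWE whose vector shares no articles with either constituent scores 0.0 on both measures, which would mark it as non-compositional. How the two scores are weighted or combined into a final compositionality score is not stated in the abstract.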

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saif, A., Ab Aziz, M.J., Omar, N. (2013). Measuring the Compositionality of Arabic Multiword Expressions. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_21

  • DOI: https://doi.org/10.1007/978-3-642-40567-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40566-2

  • Online ISBN: 978-3-642-40567-9

  • eBook Packages: Computer Science, Computer Science (R0)
