Abstract
This paper presents a method for measuring the compositionality of multiword expressions (MWEs). Using Wikipedia (WP) as a lexical resource, the method identifies MWEs as the titles of Wikipedia articles that consist of more than one word, with no further processing. For semantic representation, the method exploits the hierarchical category taxonomy of Wikipedia: each concept (a single word or an MWE) is represented as a feature vector of the WP articles that belong to the concept's categories and sub-categories. The compositionality score of an MWE is then computed from semantic similarity, using literality scores and a multiplicative composition function. The method is evaluated by comparing its compositionality scores against human judgments on a dataset of 100 Arabic noun-noun compounds.
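The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or how the two literality scores are combined, so cosine similarity and a simple average are assumed here, and the article titles in the toy vectors are hypothetical placeholders.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts
    (assumed measure; the abstract only says 'semantic similarity')."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def multiply(u, v):
    """Multiplicative composition: element-wise product of two vectors.
    Only features shared by both vectors keep a non-zero weight."""
    return {k: u[k] * v[k] for k in u.keys() & v.keys()}

def literality_score(mwe, w1, w2):
    """Similarity of the MWE to each constituent, averaged
    (the averaging step is an assumption of this sketch)."""
    return (cosine(mwe, w1) + cosine(mwe, w2)) / 2

def composition_score(mwe, w1, w2):
    """Similarity of the MWE vector to the composed constituent vector."""
    return cosine(mwe, multiply(w1, w2))

# Toy feature vectors: keys stand for WP articles found under each
# concept's categories and sub-categories (hypothetical titles).
mwe_vec = {"article_a": 1, "article_b": 1, "article_c": 1}
w1_vec = {"article_a": 1, "article_b": 1, "article_d": 1}
w2_vec = {"article_b": 1, "article_c": 1, "article_e": 1}

lit = literality_score(mwe_vec, w1_vec, w2_vec)
comp = composition_score(mwe_vec, w1_vec, w2_vec)
print(f"literality={lit:.3f} composition={comp:.3f}")
```

With non-negative weights both scores fall between 0 and 1. A higher value suggests that the meaning of the expression is closer to that of its constituents, which is the quantity the paper compares against human judgments.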
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saif, A., Ab Aziz, M.J., Omar, N. (2013). Measuring the Compositionality of Arabic Multiword Expressions. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_21
DOI: https://doi.org/10.1007/978-3-642-40567-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science, Computer Science (R0)