Abstract
Many methods for measuring the semantic similarity between sentences have been proposed, particularly for English. These methods are restrictive because they usually ignore semantic and syntactic-semantic knowledge such as semantic predicates, thematic roles and semantic classes. Measuring the semantic similarity between Arabic sentences is particularly challenging because of the complex linguistic structure of Arabic and the lack of electronic resources, such as annotated corpora and resources encoding syntactic-semantic knowledge.
In this paper, we propose a method for measuring the similarity of Arabic sentences based on machine learning that takes advantage of LMF-standardized Arabic dictionaries, notably the syntactic-semantic knowledge they contain. We evaluate our proposal by cross-validation on 690 pairs of sentences taken from old Arabic dictionaries designed for human use, such as Al-Wassit and Lissan-Al-Arab. The results are encouraging and show good performance that approximates human intuition.
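As a rough illustration of the evaluation protocol mentioned above, the following minimal sketch splits 690 sentence pairs into cross-validation folds. The abstract does not state the number of folds, the shuffling strategy, or the tooling used, so `k = 10`, the fixed seed, and the helper name `k_fold_indices` are assumptions made only for this example.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: the fold count and seed are illustrative
    choices, not values reported in the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Stride slicing distributes the shuffled indices over k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 690 sentence pairs, as in the abstract; 10 folds is an assumption.
splits = list(k_fold_indices(690, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each sentence pair appears in exactly one test fold, so every pair contributes once to the reported score while the model is always trained on the remaining pairs.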
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wali, W., Gargouri, B., Hamadou, A.B. (2015). Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9329. Springer, Cham. https://doi.org/10.1007/978-3-319-24069-5_15
DOI: https://doi.org/10.1007/978-3-319-24069-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24068-8
Online ISBN: 978-3-319-24069-5
eBook Packages: Computer Science (R0)