Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences

Wali, Wafa; Gargouri, Bilel; Ben Hamadou, Abdelmajid

doi:10.1007/978-3-319-24069-5_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9329)

Abstract

Many methods for measuring the semantic similarity between sentences have been proposed, particularly for English. These methods are restrictive because they usually ignore certain semantic and syntactic-semantic knowledge, such as semantic predicates, thematic roles, and semantic classes. Measuring the semantic similarity between Arabic sentences is particularly challenging because of the complex linguistic structure of Arabic and the lack of electronic resources such as syntactic-semantic knowledge bases and annotated corpora.

In this paper, we propose a method for measuring the similarity of Arabic sentences based on machine learning that takes advantage of LMF-standardized Arabic dictionaries, notably the syntactic-semantic knowledge they contain. We evaluated the proposal by cross-validation on 690 sentence pairs taken from old Arabic dictionaries designed for human use, such as Al-Wassit and Lissan-Al-Arab. The results are very encouraging and show performance that approximates human intuition.
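
As a rough illustration of the evaluation protocol mentioned in the abstract (not the authors' implementation), the sketch below runs k-fold cross-validation over labelled sentence pairs. The word-overlap (Jaccard) feature, the threshold classifier, and every function name are assumptions made for this example; the chapter's actual features draw on LMF dictionary knowledge, which is not reproduced here.

```python
from typing import List, Tuple

# (sentence_a, sentence_b, label), where label 1 means "similar", 0 "dissimilar".
Pair = Tuple[str, str, int]


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap; a stand-in for richer semantic features."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def fit_threshold(train: List[Pair]) -> float:
    """Choose the cut-off on the Jaccard score that maximises training accuracy."""
    scores = [(jaccard(a, b), y) for a, b, y in train]
    best_t, best_acc = 0.5, -1.0
    for t in sorted({s for s, _ in scores}):
        acc = sum((s >= t) == bool(y) for s, y in scores) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validate(pairs: List[Pair], k: int = 10) -> float:
    """Mean held-out accuracy over k folds (fold i holds out every k-th pair)."""
    if not pairs:
        raise ValueError("need at least one labelled pair")
    accuracies = []
    for i in range(k):
        test = pairs[i::k]
        train = [p for j, p in enumerate(pairs) if j % k != i]
        if not test or not train:
            continue  # skip folds that would be empty on very small data sets
        t = fit_threshold(train)
        correct = sum((jaccard(a, b) >= t) == bool(y) for a, b, y in test)
        accuracies.append(correct / len(test))
    if not accuracies:
        raise ValueError("no usable folds; reduce k or add more pairs")
    return sum(accuracies) / len(accuracies)
```

With a real data set, each pair would carry a human similarity judgement, and `jaccard` would be replaced by a feature vector built from lexical, semantic, and syntactic-semantic knowledge before training a proper classifier.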

Author information

Authors and Affiliations

MIR@CL Laboratory, FSEGS, Sfax, Tunisia
Wafa Wali & Bilel Gargouri
MIR@CL Laboratory, ISIMS, Sfax, Tunisia
Abdelmajid Ben Hamadou

Corresponding author

Correspondence to Wafa Wali.

Editor information

Editors and Affiliations

Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Computer Science Department, Universidad Autónoma De Madrid, Madrid, Spain
David Camacho
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Wali, W., Gargouri, B., Ben Hamadou, A. (2015). Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9329. Springer, Cham. https://doi.org/10.1007/978-3-319-24069-5_15

DOI: https://doi.org/10.1007/978-3-319-24069-5_15
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24068-8
Online ISBN: 978-3-319-24069-5
eBook Packages: Computer Science, Computer Science (R0)
