Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences

  • Conference paper
Computational Collective Intelligence

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9329)

Abstract

Many methods for measuring the semantic similarity between sentences have been proposed, particularly for English. These methods are restrictive because they usually ignore semantic and syntactic-semantic knowledge such as semantic predicates, thematic roles, and semantic classes. Measuring the semantic similarity between Arabic sentences is particularly challenging because of the complex linguistic structure of Arabic and the lack of electronic resources, such as sources of syntactic-semantic knowledge and annotated corpora.

In this paper, we propose a method for measuring the similarity of Arabic sentences based on machine learning that takes advantage of LMF-standardized Arabic dictionaries, notably the syntactic-semantic knowledge they contain. We evaluated our proposal by cross-validation on 690 pairs of sentences taken from old Arabic dictionaries designed for human use, such as Al-Wassit and Lissan-Al-Arab. The results are very encouraging and show performance that approximates human intuition.
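
The abstract does not state the number of folds, the features, or the learner used. As a minimal sketch only, assuming 10 folds and a plain token-overlap (Jaccard) score standing in for the dictionary-derived features, a cross-validation split over 690 sentence pairs could look like the following. The function names `jaccard` and `k_fold_indices` are hypothetical and are not taken from the paper.

```python
from typing import Iterator, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap (Jaccard) similarity of two whitespace-tokenised sentences.

    Assumption: used here only as a stand-in feature; the paper's actual
    features come from LMF dictionaries and are not shown in the abstract.
    """
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds over n items."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Assumed setting: 690 sentence pairs, 10 folds (the fold count is a guess).
folds = list(k_fold_indices(690, 10))
```

Each fold then serves once as the held-out test set while a model is fitted on the remaining pairs; in practice the pairs would be shuffled before splitting.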

Author information

Corresponding author

Correspondence to Wafa Wali.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wali, W., Gargouri, B., Ben Hamadou, A. (2015). Supervised Learning to Measure the Semantic Similarity Between Arabic Sentences. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9329. Springer, Cham. https://doi.org/10.1007/978-3-319-24069-5_15

  • DOI: https://doi.org/10.1007/978-3-319-24069-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24068-8

  • Online ISBN: 978-3-319-24069-5

  • eBook Packages: Computer Science, Computer Science (R0)
