“Easy” meta-embedding for detecting and correcting semantic errors in Arabic documents

Multimedia Tools and Applications

Abstract

Word embedding models have enabled major advances in natural language understanding and have achieved state-of-the-art performance on multiple natural language processing tasks. In this paper, we present an original method based on an “easy” meta-embedding to automatically detect and correct real-word errors in Arabic, that is, valid words that are semantically inconsistent with the context of the sentence. Because many Arabic words are lexically close to one another, the risk of this type of error in documents is relatively high compared to other languages. Our method uses three word embedding techniques, SkipGram, FastText and BERT, both individually and in combination, for detection and correction. It checks the semantic affinity of each word with its immediate context in a collocation and with the near context of the sentence. Experiments show that the proposed meta-embedding improves the overall performance of our system.
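
As a rough illustration of the idea (not the paper's implementation, which is not visible in this preview), the sketch below assumes that the “easy” meta-embedding is an average of L2-normalized, zero-padded source vectors, and that semantic affinity is the mean cosine similarity between a word and the other words of the sentence. The lookup tables, the English placeholder words, and the function names `meta_embedding` and `context_affinity` are hypothetical. BERT in particular produces contextual vectors, which are simplified here to a static table.

```python
import numpy as np

# Hypothetical toy lookup tables standing in for SkipGram, FastText and BERT.
# Real models would have hundreds of dimensions and an Arabic vocabulary.
SKIPGRAM = {"drink": [1.0, 0.1], "coffee": [0.9, 0.2],
            "morning": [0.8, 0.3], "coffin": [-1.0, 0.2]}
FASTTEXT = {"drink": [1.0, 0.0, 0.1], "coffee": [0.9, 0.1, 0.0],
            "morning": [0.8, 0.2, 0.1], "coffin": [-0.9, 0.1, 0.3]}
BERT_LIKE = {"drink": [0.9, 0.1, 0.2], "coffee": [1.0, 0.2, 0.1],
             "morning": [0.7, 0.3, 0.2], "coffin": [-0.8, 0.3, 0.1]}
SOURCES = [SKIPGRAM, FASTTEXT, BERT_LIKE]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def meta_embedding(word, sources):
    """Average the L2-normalized, zero-padded vectors of `word` over all sources."""
    dim = max(len(next(iter(table.values()))) for table in sources)
    padded = []
    for table in sources:
        if word in table:
            v = l2_normalize(np.asarray(table[word], dtype=float))
            padded.append(np.pad(v, (0, dim - len(v))))  # pad with trailing zeros
    if not padded:
        raise KeyError(word)
    return np.mean(padded, axis=0)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def context_affinity(index, sentence, sources):
    """Mean cosine similarity between sentence[index] and every other word."""
    target = meta_embedding(sentence[index], sources)
    others = [meta_embedding(w, sources)
              for i, w in enumerate(sentence) if i != index]
    return float(np.mean([cosine(target, o) for o in others]))


good = context_affinity(1, ["drink", "coffee", "morning"], SOURCES)
bad = context_affinity(1, ["drink", "coffin", "morning"], SOURCES)
print(f"affinity(coffee)={good:.3f}  affinity(coffin)={bad:.3f}")
```

A word whose affinity falls well below that of its neighbours would be a candidate error. How the threshold is chosen is left open in this sketch.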

Notes

  1. A single edit operation: insertion of a character, deletion of a character, substitution of a character, or transposition of two adjacent characters.

  2. The morpho-syntactic analyzer segments agglutinated words and provides syntactic information such as part-of-speech tags and lemmas.

  3. Freely available for non-commercial use at sourceforge.net/projects/kacst-acptool/files/
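
The four edit operations listed in note 1 can be sketched as follows. This is a minimal illustration of enumerating words that lie one edit away from a given word and keeping those found in a lexicon. It is not taken from the article. The alphabet string, the tiny lexicon, and the function names `single_edits` and `real_word_candidates` are assumptions made for the example.

```python
# Assumed alphabet: basic Arabic letters, hamza forms included, no diacritics.
ARABIC_LETTERS = "ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"


def single_edits(word, alphabet=ARABIC_LETTERS):
    """Return every string exactly one edit operation away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Insertion of one character at any position.
    additions = {left + c + right for left, right in splits for c in alphabet}
    # Deletion of one character.
    deletions = {left + right[1:] for left, right in splits if right}
    # Substitution of one character by another.
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in alphabet}
    # Transposition of two adjacent characters.
    inversions = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    return (additions | deletions | substitutions | inversions) - {word}


def real_word_candidates(word, lexicon):
    """Keep only the one-edit neighbours that are valid words in `lexicon`."""
    return single_edits(word) & set(lexicon)


# Hypothetical mini-lexicon: heart, dog, before, pen, book.
LEXICON = {"قلب", "كلب", "قبل", "قلم", "كتاب"}
candidates = real_word_candidates("قلب", LEXICON)
print(sorted(candidates))
```

Because every candidate is itself a valid word, a dictionary lookup alone cannot tell which one was intended, which is why context is needed to rank them.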

Author information

Corresponding author

Correspondence to Chiraz Ben Othmane Zribi.

Ethics declarations

Conflict of interest

The author declares no conflicts of interest or competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zribi, C.B.O. “Easy” meta-embedding for detecting and correcting semantic errors in Arabic documents. Multimed Tools Appl 82, 21161–21175 (2023). https://doi.org/10.1007/s11042-023-14553-4

  • DOI: https://doi.org/10.1007/s11042-023-14553-4
