
SPIRAL: SPellIng eRror Parallel Corpus for Arabic Language

  • Conference paper
  • Intelligent Systems and Pattern Recognition (ISPR 2022)

Abstract

Linguistic resources such as corpora have become a fundamental component of natural language processing over the last two decades. They are needed for testing and evaluation and, since the emergence of statistical and machine learning approaches, for system development as well. However, building such resources requires considerable time and effort.

In this paper, we present the steps taken to build SPIRAL, a parallel corpus intended as a resource for research on spelling error detection and correction in Arabic texts. SPIRAL is the result of a study of spelling errors in which we used existing taxonomies of Arabic spelling errors to automatically generate more than 248 million possible misspelled words from 420,000 correctly spelled words.
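The generation step can be illustrated with a minimal sketch. It does not reproduce SPIRAL's taxonomy or the authors' procedure: the alphabet, the confusion sets (hamza forms of alif, taa marbuta vs. haa, alif maqsura vs. yaa) and the five operations (letter confusion plus the classic single-edit deletion, transposition, substitution and insertion) are assumptions chosen for illustration, and `generate_errors` and `parallel_pairs` are hypothetical helper names. Each correct word is expanded into labelled misspellings, giving (erroneous, correct, error type) triples of the kind a parallel corpus pairs together.

```python
"""Sketch: generate labelled single-edit misspellings as (erroneous, correct) pairs.

Assumptions (not taken from SPIRAL): the alphabet, the confusion sets and the
edit operations below are illustrative choices only.
"""
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical alphabet: basic Arabic letters plus hamza forms, taa marbuta, alif maqsura.
ALPHABET = "ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"

# Hypothetical confusion sets for frequent Arabic spelling errors.
CONFUSION_SETS = ("اأإآ", "ةه", "ىي")


def _confusions(ch: str) -> str:
    """Return the letters commonly confused with `ch` (empty string if none)."""
    for group in CONFUSION_SETS:
        if ch in group:
            return group.replace(ch, "")
    return ""


def generate_errors(word: str) -> Dict[str, str]:
    """Map every single-edit misspelling of `word` to an error-type label.

    Passes run in priority order, so a variant reachable by several operations
    keeps the label of the first pass that produced it.
    """
    errors: Dict[str, str] = {}

    def add(variant: str, label: str) -> None:
        # Skip empty strings, the correct word itself, and already-labelled variants.
        if variant and variant != word and variant not in errors:
            errors[variant] = label

    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # 1. Letter confusion (e.g. hamza forms, taa marbuta vs. haa).
    for left, right in splits:
        if right:
            for alt in _confusions(right[0]):
                add(left + alt + right[1:], "confusion")
    # 2. Deletion of one letter.
    for left, right in splits:
        if right:
            add(left + right[1:], "deletion")
    # 3. Transposition of two adjacent letters.
    for left, right in splits:
        if len(right) > 1:
            add(left + right[1] + right[0] + right[2:], "transposition")
    # 4. Substitution by any other letter of the alphabet.
    for left, right in splits:
        if right:
            for c in ALPHABET:
                add(left + c + right[1:], "substitution")
    # 5. Insertion of any letter at any position.
    for left, right in splits:
        for c in ALPHABET:
            add(left + c + right, "insertion")
    return errors


def parallel_pairs(words: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (erroneous, correct, error_type) triples for a list of correct words."""
    for correct in words:
        for wrong, label in sorted(generate_errors(correct).items()):
            yield wrong, correct, label


if __name__ == "__main__":
    for triple in list(parallel_pairs(["مدرسة"]))[:5]:
        print(*triple, sep="\t")
```

For scale, the abstract's figures correspond to an average of more than roughly 590 erroneous forms per correct word (248,000,000 / 420,000 ≈ 590), whereas this single-edit sketch yields a few hundred variants for a five-letter word; reaching larger counts would require richer error classes than the ones assumed here.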


Notes

  1. Download SPIRAL: https://mega.nz/file/sa50BaKL#i9qjD52tt-QzLiKWM-rOQ9XrC1anyOwSOJ_kpzutN7M.

  2. Tools: Ghalatawi (http://ghalatawi.sourceforge.net/index.php?content=english), Arabic Spell checker (http://arabicspellchecker.com/), AyaSpell (http://ayaspell.sourceforge.net/).

  3. https://www.alriyadh.com/.

  4. https://www.okaz.com.sa/.

  5. https://www.bbc.com/arabic.

  6. https://learning.aljazeera.net/en/generallanguage/level/beginner.

  7. https://al-maktaba.org/.

  8. https://farasa.qcri.org/.

  9. Same observation for the prefixes:


Author information


Correspondence to Mohamd Amine Cheragui.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Aichaoui, S.B., Hiri, N., Cheragui, M.A. (2022). SPIRAL: SPellIng eRror Parallel Corpus for Arabic Language. In: Bennour, A., Ensari, T., Kessentini, Y., Eom, S. (eds) Intelligent Systems and Pattern Recognition. ISPR 2022. Communications in Computer and Information Science, vol 1589. Springer, Cham. https://doi.org/10.1007/978-3-031-08277-1_21


  • DOI: https://doi.org/10.1007/978-3-031-08277-1_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08276-4

  • Online ISBN: 978-3-031-08277-1

  • eBook Packages: Computer Science, Computer Science (R0)
