Abstract
Linguistic resources (corpora) have become a fundamental component of natural language processing over the last two decades. They are needed for testing and evaluation and, since the emergence of statistical and machine learning approaches, for system development as well. However, building such resources requires considerable time and effort.
In this paper, we present the steps taken to build SPIRAL, a parallel corpus intended as a resource for research on spelling error detection and correction in Arabic texts. SPIRAL is the result of a study of spelling errors in which we exploited the existing taxonomies of Arabic spelling errors to automatically generate more than 248 million possible erroneous words from 420,000 correctly spelled words.
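The generation step described above can be illustrated with a minimal sketch. The abstract does not list the error taxonomies that were used, so the Python code below substitutes the four classic single-edit operations (deletion, insertion, substitution, and transposition of adjacent letters) as a stand-in. Arabic-specific error classes such as hamza or ta marbuta confusions are not modeled. The names `generate_errors` and `build_parallel_pairs` and the simplified 28-letter `ALPHABET` are hypothetical and chosen for illustration only. Each output pair maps an erroneous form to the correct word it was derived from, which is the shape a parallel spelling-error corpus takes.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical, simplified alphabet: the 28 basic Arabic letters only
# (no hamza variants, ta marbuta, or alif maqsura). Illustration only.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def deletions(word: str) -> Set[str]:
    """Forms with exactly one letter omitted."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def insertions(word: str, alphabet: str = ALPHABET) -> Set[str]:
    """Forms with one extra letter inserted at any position."""
    return {word[:i] + c + word[i:] for i in range(len(word) + 1) for c in alphabet}


def substitutions(word: str, alphabet: str = ALPHABET) -> Set[str]:
    """Forms with one letter replaced by a different letter."""
    return {
        word[:i] + c + word[i + 1:]
        for i in range(len(word))
        for c in alphabet
        if c != word[i]
    }


def transpositions(word: str) -> Set[str]:
    """Forms with two adjacent, distinct letters swapped."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i] != word[i + 1]
    }


def generate_errors(word: str, alphabet: str = ALPHABET) -> Set[str]:
    """All distinct single-edit erroneous forms of `word` (hypothetical stand-in taxonomy)."""
    candidates = (
        deletions(word)
        | insertions(word, alphabet)
        | substitutions(word, alphabet)
        | transpositions(word)
    )
    candidates.discard(word)  # defensive: none of the operations above should reproduce the word
    return candidates


def build_parallel_pairs(
    lexicon: Iterable[str], alphabet: str = ALPHABET
) -> Iterator[Tuple[str, str]]:
    """Yield (erroneous, correct) pairs for every word in a correctly spelled lexicon."""
    for correct in lexicon:
        for erroneous in sorted(generate_errors(correct, alphabet)):
            yield erroneous, correct


if __name__ == "__main__":
    for wrong, right in list(build_parallel_pairs(["كتب"]))[:5]:
        print(wrong, "->", right)
```

For scale, the figures reported in the abstract imply an average of roughly 590 erroneous forms per correct word (248,000,000 / 420,000 ≈ 590.5). The number of forms this sketch produces per word depends on word length and alphabet size, and it is not meant to reproduce the paper's figures.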
Notes
- 2. Tools: Ghalatawi (http://ghalatawi.sourceforge.net/index.php?content=english), Arabic Spell checker (http://arabicspellchecker.com/), AyaSpell (http://ayaspell.sourceforge.net/).
- 9. Same observation for the prefixes:
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Aichaoui, S.B., Hiri, N., Cheragui, M.A. (2022). SPIRAL: SPellIng eRror Parallel Corpus for Arabic Language. In: Bennour, A., Ensari, T., Kessentini, Y., Eom, S. (eds) Intelligent Systems and Pattern Recognition. ISPR 2022. Communications in Computer and Information Science, vol 1589. Springer, Cham. https://doi.org/10.1007/978-3-031-08277-1_21
DOI: https://doi.org/10.1007/978-3-031-08277-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08276-4
Online ISBN: 978-3-031-08277-1
eBook Packages: Computer Science, Computer Science (R0)