Abstract
Reading in a foreign language is difficult, especially when texts are chosen without taking the reader’s skill level into account. Foreign language learners need reading material suited to their reading ability. A basic tool for determining whether a text is appropriate to a reader’s level is the assessment of its readability, a measure that aims to represent the human capacities required to comprehend a given text. Readability prediction is an important part of teaching and learning, for reading in a foreign language as well as in one’s native language, and continues to be a central area of research and practice. In this paper, we present our approach to readability assessment for Modern Standard Arabic (MSA) as a foreign language. Readability prediction is carried out using the Global Language Online Support System (GLOSS) corpus, which was developed for independent learners to improve their foreign language skills and is annotated with the Interagency Language Roundtable (ILR) scale. We also introduce a frequency dictionary, developed to calculate frequency-based features. The approach gives results that surpass the state-of-the-art results for Arabic.
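To make the idea of frequency-based features concrete, here is a minimal Python sketch. It does not reproduce the paper's frequency dictionary or feature set, which the abstract does not specify. The function names, the three features, and the `rare_threshold` parameter are all illustrative assumptions, and the input is assumed to be already tokenized (or lemmatized).

```python
import math
from collections import Counter


def build_frequency_dictionary(reference_tokens):
    """Map each token to its relative frequency in a reference corpus.

    Hypothetical helper: the paper's dictionary may be built differently.
    """
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def frequency_features(text_tokens, freq_dict, rare_threshold=0.01):
    """Compute illustrative frequency-based features for one text.

    rare_threshold is an assumed cut-off, not a value from the paper.
    """
    if not text_tokens:
        return {"mean_log_freq": 0.0, "oov_ratio": 0.0, "rare_ratio": 0.0}

    # Relative frequencies of the tokens that appear in the dictionary.
    known = [freq_dict[t] for t in text_tokens if t in freq_dict]
    out_of_vocabulary = len(text_tokens) - len(known)

    # Average log-frequency of known tokens (lower means rarer vocabulary).
    mean_log = sum(math.log(f) for f in known) / len(known) if known else 0.0
    # Known tokens whose relative frequency falls below the threshold.
    rare = sum(1 for f in known if f < rare_threshold)

    return {
        "mean_log_freq": mean_log,
        "oov_ratio": out_of_vocabulary / len(text_tokens),
        "rare_ratio": rare / len(text_tokens),
    }
```

Feature dictionaries like this could be computed for each text and passed, together with the text's ILR level as the label, to any standard classifier.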
Notes
1. https://gloss.dliflc.edu/. The contents of the MSA corpus have varied somewhat over time.
2. The ILR scale (https://www.languagetesting.com/ilr-scale), developed by the U.S. Federal Government, rates language ability on a scale from 0 to 5: Level 0 (no proficiency); Level 1 (elementary proficiency); Level 2 (limited working proficiency); Level 3 (general occupational proficiency); Level 4 (advanced professional proficiency); and Level 5 (functionally native proficiency). Levels 0+, 1+, 2+, 3+, and 4+ are used when a person’s skills significantly exceed those of a given level but are insufficient to reach the next level.
3. The Waikato Environment for Knowledge Analysis (WEKA) is an open source machine learning software package that contains implementations of various algorithms.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Nassiri, N., Lakhouaja, A., Cavalli-Sforza, V. (2018). Arabic Readability Assessment for Foreign Language Learners. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_49
DOI: https://doi.org/10.1007/978-3-319-91947-8_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)