Abstract
This paper describes a deep learning model that classifies the inherent readability complexity of texts written in European Portuguese. The model was developed using modern natural language processing techniques and features a highly fine-tuned neural network that takes as input both the text itself and multiple metrics computed from it. The classifier was trained on a dataset of texts divided into five CEFR levels, obtaining an accuracy of 73%, a top-2 accuracy of 90% and an adjacent accuracy of 94%.
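The three reported figures measure progressively looser notions of a correct prediction. As a minimal sketch, assuming the usual definitions and assuming the five CEFR levels are encoded as ordinal indices 0 to 4 with the classifier emitting one score per level, the metrics could be computed as below. The function names and example scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Assumption: CEFR levels are encoded as ordinal indices 0..4, so that
# neighbouring indices correspond to neighbouring proficiency levels.


def _argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(y_true: Sequence[int], scores: Sequence[Sequence[float]]) -> float:
    """Share of texts whose top-scoring level equals the true level."""
    hits = sum(1 for t, s in zip(y_true, scores) if _argmax(s) == t)
    return hits / len(y_true)


def top_k_accuracy(y_true: Sequence[int], scores: Sequence[Sequence[float]], k: int = 2) -> float:
    """Share of texts whose true level is among the k highest-scoring levels."""
    hits = 0
    for t, s in zip(y_true, scores):
        ranked = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
        if t in ranked[:k]:
            hits += 1
    return hits / len(y_true)


def adjacent_accuracy(y_true: Sequence[int], scores: Sequence[Sequence[float]]) -> float:
    """Share of texts whose predicted level is at most one level from the true one."""
    hits = sum(1 for t, s in zip(y_true, scores) if abs(_argmax(s) - t) <= 1)
    return hits / len(y_true)


# Four made-up texts with per-level scores (illustrative values only).
y_true = [0, 2, 4, 1]
scores = [
    [0.70, 0.20, 0.05, 0.03, 0.02],  # predicts 0: exact hit
    [0.10, 0.50, 0.30, 0.05, 0.05],  # predicts 1: off by one, true level ranked 2nd
    [0.50, 0.20, 0.15, 0.10, 0.05],  # predicts 0: far from true level 4
    [0.05, 0.10, 0.60, 0.20, 0.05],  # predicts 2: off by one, true level ranked 3rd
]

print(accuracy(y_true, scores))           # 0.25
print(top_k_accuracy(y_true, scores, 2))  # 0.5
print(adjacent_accuracy(y_true, scores))  # 0.75
```

Adjacent accuracy credits a prediction that lands one level away from the true one, so it is never lower than plain accuracy; top-2 accuracy instead asks whether the true level is among the two highest-scoring levels, regardless of how far the top prediction is from it.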
This research was supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Correia, J., Mendes, R. (2021). Neural Complexity Assessment: A Deep Learning Approach to Readability Classification for European Portuguese Corpora. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_30
DOI: https://doi.org/10.1007/978-3-030-91608-4_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)