Neural Complexity Assessment: A Deep Learning Approach to Readability Classification for European Portuguese Corpora

Correia, João; Mendes, Rui

doi:10.1007/978-3-030-91608-4_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

This paper describes a Deep Learning model that classifies the readability complexity of a text in European Portuguese. The model was developed using modern Natural Language Processing techniques and consists of a highly fine-tuned Neural Network that takes as input both the text and multiple metrics computed from it. The classifier was trained on a dataset of texts divided into five CEFR categories, obtaining an accuracy of 73%, a top-2 accuracy of 90%, and an adjacent accuracy of 94%.
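
The three figures reported above measure different things, and a small worked example may help to tell them apart. The sketch below is not the authors' evaluation code. It assumes the usual definitions: accuracy counts exact matches, top-2 accuracy counts cases where the true level is among the two highest-scoring classes, and adjacent accuracy counts predictions that fall within one level of the true one. The function names, the encoding of the five ordered levels as integers 0 to 4, and the toy scores are all illustrative assumptions.

```python
from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the highest score in a row (first index wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of texts whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k_accuracy(y_true: Sequence[int],
                   scores: Sequence[Sequence[float]],
                   k: int = 2) -> float:
    """Fraction of texts whose true level is among the k best-scored levels."""
    hits = 0
    for t, row in zip(y_true, scores):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += t in ranked[:k]
    return hits / len(y_true)


def adjacent_accuracy(y_true: Sequence[int],
                      y_pred: Sequence[int],
                      tolerance: int = 1) -> float:
    """Fraction of predictions at most `tolerance` levels from the truth.

    Only meaningful when the classes are ordered, as CEFR levels are.
    """
    return sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented for illustration): five texts, one per level 0..4,
# each with a score for every one of the five levels.
y_true = [0, 1, 2, 3, 4]
scores = [
    [0.60, 0.20, 0.10, 0.05, 0.05],  # predicted 0, exact match
    [0.50, 0.30, 0.10, 0.05, 0.05],  # predicted 0, true level is second best
    [0.05, 0.10, 0.20, 0.25, 0.40],  # predicted 4, two levels away from 2
    [0.00, 0.10, 0.20, 0.60, 0.10],  # predicted 3, exact match
    [0.10, 0.10, 0.10, 0.40, 0.30],  # predicted 3, true level is second best
]
y_pred = [argmax(row) for row in scores]

print("accuracy:         ", accuracy(y_true, y_pred))
print("top-2 accuracy:   ", top_k_accuracy(y_true, scores, k=2))
print("adjacent accuracy:", adjacent_accuracy(y_true, y_pred))
```

Under these definitions, top-2 and adjacent accuracy can never be lower than plain accuracy, which is consistent with the ordering of the reported figures (73%, 90%, 94%).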

This research was supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Authors and Affiliations

Centro Algoritmi, Departamento de Informática, Universidade do Minho, Braga, Portugal
João Correia & Rui Mendes

Authors

João Correia
Rui Mendes

Corresponding author

Correspondence to Rui Mendes.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Universidad Politecnica de Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Manchester, Manchester, UK
Richard Allmendinger
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Southern University of Science and Technology, Shenzhen, China
Ke Tang
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
University of Minho, Braga, Portugal
Paulo Novais
NOVA University of Lisbon, Lisbon, Portugal
Susana Nascimento

About this paper

Cite this paper

Correia, J., Mendes, R. (2021). Neural Complexity Assessment: A Deep Learning Approach to Readability Classification for European Portuguese Corpora. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_30

DOI: https://doi.org/10.1007/978-3-030-91608-4_30
Published: 23 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)
