Neural Complexity Assessment: A Deep Learning Approach to Readability Classification for European Portuguese Corpora

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2021 (IDEAL 2021)

Abstract

This paper describes a deep learning model that classifies the readability level of a text written in European Portuguese. The model was developed using modern Natural Language Processing techniques and features a highly fine-tuned neural network that takes as input both the text and multiple metrics relating to it. The classifier was trained on a dataset of texts divided into five CEFR categories, obtaining an accuracy of 73%, a top-2 accuracy of 90%, and an adjacent accuracy of 94%.
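The three figures reported in the abstract measure different things, and a small sketch may help tell them apart. The definitions below follow common usage and are assumptions, since the abstract does not define them: top-2 accuracy counts a prediction as correct when the true level is among the two highest-scoring classes, and adjacent accuracy counts it as correct when the predicted level is at most one level away from the true one. The function names, the integer encoding of the five levels as 0 to 4, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of texts whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjacent_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      tolerance: int = 1) -> float:
    """Fraction of texts predicted within `tolerance` levels of the truth.

    Assumes the levels are ordinal and encoded as consecutive integers.
    """
    return sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k_accuracy(y_true: Sequence[int], scores: Sequence[Sequence[float]],
                   k: int = 2) -> float:
    """Fraction of texts whose true level is among the k highest scores."""
    hits = 0
    for t, s in zip(y_true, scores):
        top_k = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        hits += t in top_k
    return hits / len(y_true)


# Toy example: five texts, five ordinal levels encoded as 0..4 (hypothetical data).
y_true: List[int] = [0, 1, 2, 3, 4]
scores: List[List[float]] = [
    [0.60, 0.20, 0.10, 0.05, 0.05],  # predicted 0: exact match
    [0.10, 0.30, 0.50, 0.05, 0.05],  # predicted 2: one level off, truth is 2nd best
    [0.50, 0.10, 0.30, 0.05, 0.05],  # predicted 0: two levels off, truth is 2nd best
    [0.05, 0.05, 0.10, 0.70, 0.10],  # predicted 3: exact match
    [0.40, 0.30, 0.10, 0.10, 0.10],  # predicted 0: four levels off, truth not in top 2
]
y_pred = [argmax(s) for s in scores]

acc = accuracy(y_true, y_pred)
adj = adjacent_accuracy(y_true, y_pred)
top2 = top_k_accuracy(y_true, scores, k=2)
print(f"accuracy={acc:.2f} adjacent={adj:.2f} top-2={top2:.2f}")
```

Under these definitions both relaxed metrics are never lower than plain accuracy, but they reward different near misses: adjacent accuracy depends on the ordering of the levels, while top-2 accuracy depends only on the ranking of the class scores.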

This research was supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Corresponding author

Correspondence to Rui Mendes.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Correia, J., Mendes, R. (2021). Neural Complexity Assessment: A Deep Learning Approach to Readability Classification for European Portuguese Corpora. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_30

  • DOI: https://doi.org/10.1007/978-3-030-91608-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91607-7

  • Online ISBN: 978-3-030-91608-4

  • eBook Packages: Computer Science, Computer Science (R0)
