Revisiting the Readability Assessment of Texts in Portuguese

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6433)

Abstract

The Web Content Accessibility Guidelines (WCAG) 2.0 include, under their principle of comprehensibility, an accessibility requirement related to the level of writing. This requirement states that websites whose texts demand higher reading skills than those of individuals with lower secondary education (fifth to ninth grades in Brazil) should offer an alternative version of the same content. Natural Language Processing technology and research in Psycholinguistics can help automate the task of classifying a text according to its reading difficulty. In this paper, we present experiments to build a readability checker that classifies texts in Portuguese, considering different text genres, domains and reader ages, using naturally occurring texts. More precisely, we classify texts as simple (for 7- to 14-year-olds) or complex (for adults), and address three key research questions: (1) Which machine-learning algorithm produces the best results? (2) Which features are relevant? (3) Do different text genres have an impact on readability assessment?
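As a rough illustration of the simple/complex classification task described in the abstract, the sketch below computes two shallow surface features and applies fixed thresholds. The abstract does not state which features or learning algorithms the paper uses, so everything here is an assumption made for illustration: the function names, the choice of features and the threshold values are hypothetical and are not the authors' method.

```python
import re


def surface_features(text: str) -> dict:
    """Compute two shallow readability features (illustrative only)."""
    # Naive sentence split on final punctuation; an assumed heuristic.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # In Python 3, \w matches Unicode letters, so accented words stay whole.
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
    }


def classify(text: str,
             max_words_per_sentence: float = 15.0,
             max_chars_per_word: float = 5.5) -> str:
    """Label a text 'simple' or 'complex' using hypothetical thresholds."""
    feats = surface_features(text)
    is_simple = (feats["words_per_sentence"] <= max_words_per_sentence
                 and feats["chars_per_word"] <= max_chars_per_word)
    return "simple" if is_simple else "complex"


if __name__ == "__main__":
    print(classify("O gato dorme. O sol brilha."))
    print(classify("Procedimentos administrativos extraordinariamente "
                   "complexos dificultam consideravelmente qualquer "
                   "entendimento."))
```

A trained classifier would normally replace the hand-set thresholds, with the feature dictionary serving as its input vector.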

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scarton, C., Gasperin, C., Aluisio, S. (2010). Revisiting the Readability Assessment of Texts in Portuguese. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_31

  • DOI: https://doi.org/10.1007/978-3-642-16952-6_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16951-9

  • Online ISBN: 978-3-642-16952-6

  • eBook Packages: Computer Science, Computer Science (R0)
