Abstract
The Web Content Accessibility Guidelines (WCAG) 2.0 include, under their principle of comprehensibility, an accessibility requirement related to the reading level of written content. This requirement states that when a website's texts demand more advanced reading skills than individuals with lower secondary education possess (fifth to ninth grades in Brazil), the site should offer these readers an alternative version of the same content. Natural language processing technology and research in psycholinguistics can help automate the task of classifying a text according to its reading difficulty. In this paper, we present experiments on building a readability checker that classifies naturally occurring Portuguese texts, considering different text genres, domains and reader ages. More precisely, we classify texts as simple (for 7- to 14-year-olds) or complex (for adults), and address three key research questions: (1) Which machine-learning algorithm produces the best results? (2) Which features are relevant? (3) Do different text genres have an impact on readability assessment?
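The abstract does not specify which features or algorithms the readability checker uses, so the following is only a minimal, hypothetical sketch of the kind of binary simple/complex classification it describes. It assumes three shallow surface features (words per sentence, characters per word, and the share of words longer than six characters) and a nearest-centroid classifier over z-scored features. The function `extract_features`, the class `NearestCentroidReadability`, and the four toy training texts are illustrative inventions, not the authors' feature set, algorithm or corpus.

```python
import re
from statistics import mean, pstdev


def extract_features(text: str) -> list[float]:
    # Hypothetical shallow features (not the paper's feature set):
    # words per sentence, characters per word, share of words > 6 chars.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so "ção" is matched
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),
        mean(len(w) for w in words),
        sum(len(w) > 6 for w in words) / len(words),
    ]


class NearestCentroidReadability:
    """Illustrative simple/complex classifier: nearest centroid on z-scored features."""

    def fit(self, texts: list[str], labels: list[str]) -> "NearestCentroidReadability":
        rows = [extract_features(t) for t in texts]
        columns = list(zip(*rows))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.stdevs = [pstdev(c) or 1.0 for c in columns]
        scaled = [self._scale(r) for r in rows]
        self.centroids = {}
        for label in sorted(set(labels)):  # sorted for deterministic tie-breaking
            members = [r for r, lab in zip(scaled, labels) if lab == label]
            self.centroids[label] = [mean(c) for c in zip(*members)]
        return self

    def _scale(self, row: list[float]) -> list[float]:
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stdevs)]

    def predict(self, text: str) -> str:
        x = self._scale(extract_features(text))
        # Pick the label whose centroid has the smallest squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])),
        )


# Toy training data (hypothetical, for illustration only).
TRAIN_TEXTS = [
    "O gato subiu na árvore. Ele viu um pássaro. O pássaro voou.",
    "A menina gosta de ler. Ela lê todo dia. O livro é azul.",
    "A implementação de políticas governamentais direcionadas à inclusão digital "
    "demanda investimentos consideráveis em infraestrutura tecnológica e "
    "capacitação profissional.",
    "As diretrizes internacionais de acessibilidade estabelecem requisitos "
    "relacionados à compreensibilidade textual, exigindo versões alternativas "
    "para conteúdos excessivamente complexos.",
]
TRAIN_LABELS = ["simple", "simple", "complex", "complex"]

model = NearestCentroidReadability().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("O cachorro corre no parque. Ele é feliz."))  # simple
```

Nearest centroid is used here only because it fits in a few lines without external libraries; the abstract's first research question is precisely which machine-learning algorithm works best, so a real checker would compare several classifiers and a much richer feature set on an actual corpus.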
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scarton, C., Gasperin, C., Aluisio, S. (2010). Revisiting the Readability Assessment of Texts in Portuguese. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_31
DOI: https://doi.org/10.1007/978-3-642-16952-6_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16951-9
Online ISBN: 978-3-642-16952-6
eBook Packages: Computer Science, Computer Science (R0)