Towards Automatic Content Analysis of Rhetorical Structure in Brazilian College Entrance Essays

  • Conference paper
Artificial Intelligence in Education (AIED 2021)

Abstract

Essay scorers manually check for the presence of required rhetorical categories to evaluate coherence, which is a time-consuming task. Several studies have attempted to automate the identification of rhetorical categories in essays with machine learning. However, existing machine learning models are mostly trained on content features, which can lead to overfitting and hinder model generalizability. This paper therefore proposes a set of content-independent features for identifying rhetorical categories. The best-performing classifier, XGBoost, achieved performance comparable to human annotation and outperformed previous models.
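The abstract does not say how agreement with human annotation was measured. One common choice for comparing a classifier's labels with an annotator's is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch under that assumption; the function name and the category labels ("thesis", "argument", "conclusion") are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences.

    Illustrative only: the paper's actual evaluation protocol is not
    described in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four sentences (not data from the paper).
human = ["thesis", "argument", "argument", "conclusion"]
model = ["thesis", "argument", "conclusion", "conclusion"]
kappa = cohen_kappa(human, model)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so a model-versus-human kappa close to the human-versus-human value is one way to read "comparable to human annotation".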

Notes

  1. https://bit.ly/36LivBB

Author information

Corresponding author

Correspondence to Rafael Ferreira Mello.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mello, R.F., Fiorentino, G., Miranda, P., Oliveira, H., Raković, M., Gašević, D. (2021). Towards Automatic Content Analysis of Rhetorical Structure in Brazilian College Entrance Essays. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12749. Springer, Cham. https://doi.org/10.1007/978-3-030-78270-2_29

  • DOI: https://doi.org/10.1007/978-3-030-78270-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78269-6

  • Online ISBN: 978-3-030-78270-2

  • eBook Packages: Computer Science (R0)
