Abstract
Essay scorers evaluate coherence by manually checking whether the required rhetorical categories are present, which is a time-consuming task. Several studies have attempted to automate the identification of rhetorical categories in essays with machine learning. However, existing models are mostly trained on content features, which can lead to over-fitting and hinder model generalizability. This paper therefore proposes a set of content-independent features for identifying rhetorical categories. The best-performing classifier, XGBoost, achieved performance comparable to human annotation and outperformed previous models.
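The abstract reports that the classifier's output is comparable to human annotation but does not name the agreement metric. A common choice for nominal labels such as rhetorical categories is Cohen's κ (Cohen, 1960), defined as κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The following minimal sketch assumes that metric. The function name `cohens_kappa` and the category labels in the usage lines are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items.

    Hypothetical helper: assumes agreement with human annotation is measured
    with Cohen's kappa, which the abstract does not state explicitly.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every item, so kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Illustrative (made-up) labels for five essay segments.
human = ["introduction", "thesis", "argument", "argument", "conclusion"]
model = ["introduction", "thesis", "argument", "conclusion", "conclusion"]
print(round(cohens_kappa(human, model), 3))  # 0.737
```

In this toy example the raters agree on 4 of 5 segments (p_o = 0.8), chance agreement is 6/25 = 0.24, and κ = 0.56 / 0.76 ≈ 0.737. Correcting for chance is why κ, rather than raw accuracy, is typically used to compare a model against human coders.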
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mello, R.F., Fiorentino, G., Miranda, P., Oliveira, H., Raković, M., Gašević, D. (2021). Towards Automatic Content Analysis of Rhetorical Structure in Brazilian College Entrance Essays. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol. 12749. Springer, Cham. https://doi.org/10.1007/978-3-030-78270-2_29
DOI: https://doi.org/10.1007/978-3-030-78270-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78269-6
Online ISBN: 978-3-030-78270-2
eBook Packages: Computer Science; Computer Science (R0)