Towards Automatic Content Analysis of Rhetorical Structure in Brazilian College Entrance Essays

Mello, Rafael Ferreira; Fiorentino, Giuseppe; Miranda, Péricles; Oliveira, Hilário; Raković, Mladen; Gašević, Dragan

doi:10.1007/978-3-030-78270-2_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12749)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

Essay scorers manually look for the presence of required rhetorical categories to evaluate coherence, which is a time-consuming task. Several attempts to automate the identification of rhetorical categories in essays with machine learning have been reported in the literature. However, existing machine learning algorithms are mostly trained on content features, which can lead to over-fitting and hinder model generalizability. Thus, this paper proposes a set of content-independent features to identify rhetorical categories. The best-performing classifier, XGBoost, achieved performance comparable to human annotation and outperformed previous models.
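
To make the idea of content-independent features concrete, the following is a minimal sketch, not the feature set used in the paper. It describes each sentence by its position and shape rather than by its vocabulary, so the same feature vector would result regardless of the essay topic. The function name and every feature name are hypothetical choices made for illustration.

```python
import re


def content_independent_features(sentence: str, index: int, total: int) -> dict:
    """Describe a sentence by position and shape, ignoring which words it uses.

    All feature names here are illustrative assumptions, not the paper's features.
    """
    tokens = re.findall(r"\w+", sentence)
    n_tokens = len(tokens)
    return {
        # 0.0 for the first sentence of the essay, 1.0 for the last one.
        "relative_position": index / (total - 1) if total > 1 else 0.0,
        "token_count": n_tokens,
        "mean_token_length": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
        "comma_count": sentence.count(","),
        "is_question": sentence.rstrip().endswith("?"),
    }


# A toy three-sentence essay (invented example text).
essay = [
    "Education shapes the future of a country.",
    "However, access remains unequal, and reforms are slow.",
    "Therefore, public investment should be expanded.",
]
features = [
    content_independent_features(sentence, i, len(essay))
    for i, sentence in enumerate(essay)
]
```

Vectors of this kind could then be passed to a classifier such as XGBoost, which the abstract names as the best-performing model. Because no feature depends on specific words, a model trained this way has less opportunity to memorize topic vocabulary.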

Notes

1. https://bit.ly/36LivBB

Author information

Authors and Affiliations

Department of Computing, Universidade Federal Rural de Pernambuco, Recife, Brazil
Rafael Ferreira Mello, Giuseppe Fiorentino & Péricles Miranda
Instituto Federal do Espírito Santo, Vitória, Brazil
Hilário Oliveira
Centre for Learning Analytics, Faculty of Information Technology, Monash University, Melbourne, Australia
Mladen Raković & Dragan Gašević

Corresponding author

Correspondence to Rafael Ferreira Mello.

Editor information

Editors and Affiliations

Technion – Israel Institute of Technology, Haifa, Israel
Ido Roll
Arizona State University, Tempe, AZ, USA
Danielle McNamara
Utrecht University, Utrecht, The Netherlands
Sergey Sosnovsky
London Knowledge Lab, London, UK
Rose Luckin
University of Leeds, Leeds, UK
Vania Dimitrova

About this paper

Cite this paper

Mello, R.F., Fiorentino, G., Miranda, P., Oliveira, H., Raković, M., Gašević, D. (2021). Towards Automatic Content Analysis of Rhetorical Structure in Brazilian College Entrance Essays. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12749. Springer, Cham. https://doi.org/10.1007/978-3-030-78270-2_29

DOI: https://doi.org/10.1007/978-3-030-78270-2_29
Published: 12 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78269-6
Online ISBN: 978-3-030-78270-2
eBook Packages: Computer Science, Computer Science (R0)
