Abstract
In this paper, we propose a data-driven methodology to assess the text complexity of Spanish school texts. We model the problem as a classification task that can be solved with machine learning techniques. We show empirically that the discriminative power of the classifier depends on the school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an importance analysis of the predictors across several complexity levels. Finally, we assess model performance using accuracy and correlation measurements. The proposed model achieves accuracies of 0.7.
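As a rough illustration of the evaluation step mentioned above, the sketch below computes accuracy and a Pearson correlation between true and predicted complexity levels. It is not the authors' implementation: the function names, the integer encoding of levels, and the label values are all hypothetical, and the abstract does not say which correlation measure was used.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted level matches the true level exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up labels for illustration only (assumption: levels encoded as 1..5).
true_levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
pred_levels = [1, 2, 2, 2, 3, 4, 4, 4, 5, 5]

acc = accuracy(true_levels, pred_levels)
r = pearson(true_levels, pred_levels)
print(f"accuracy={acc:.2f} correlation={r:.2f}")
```

Reporting both measures is useful when the classes are ordered: accuracy counts only exact matches, while a correlation also rewards predictions that miss by a single level rather than several.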
Acknowledgments
This research was supported by FONDEF (Chile) under Grant IT17I0051 “Desarrollo de una herramienta computacional para la evaluación automática de textos en el sistema escolar chileno.” (“Development of a computational tool for automatic assessment of Chilean school texts”).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Palma, D., Soto, C., Veliz, M., Riffo, B., Gutiérrez, A. (2020). A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements. In: Ahram, T., Taiar, R., Colson, S., Choplin, A. (eds) Human Interaction and Emerging Technologies. IHIET 2019. Advances in Intelligent Systems and Computing, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-030-25629-6_79
DOI: https://doi.org/10.1007/978-3-030-25629-6_79
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25628-9
Online ISBN: 978-3-030-25629-6
eBook Packages: Engineering, Engineering (R0)