A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements

Palma, Diego; Soto, Christian; Veliz, Mónica; Riffo, Bernardo; Gutiérrez, Antonio

doi:10.1007/978-3-030-25629-6_79


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1018)

Included in the following conference series:

International Conference on Human Interaction and Emerging Technologies


Abstract

In this paper, we propose a data-driven methodology to assess the text complexity of Spanish school texts. We model the problem as a classification task, which can be solved using machine learning techniques. We show empirically that the discriminative power of the classifier depends on the school grade level. Our proposal includes multiple predictors that capture different dimensions of text complexity, such as coherence and cohesion. We provide an importance analysis of the predictors across several complexity levels. Finally, we assess the model performance using accuracy and correlation measurements. The proposed model achieves accuracies of 0.7.
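The evaluation described in the abstract (accuracy, correlation, and how well the classifier discriminates at each grade level) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of Pearson correlation, and the toy labels are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted grade level equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_level_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true grade level."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {level: hits[level] / totals[level] for level in sorted(totals)}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical labels (grade levels 1-4), invented for this example only.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4]

overall = accuracy(y_true, y_pred)              # share of exact matches
by_level = per_level_accuracy(y_true, y_pred)   # one accuracy per grade level
r = pearson(y_true, y_pred)                     # agreement on the ordinal scale

print(overall, by_level, round(r, 3))
```

Reporting accuracy per level alongside the overall figure is one simple way to check whether discriminative power varies with grade level. The correlation adds information that accuracy ignores: a prediction one level off counts as a smaller error than one three levels off.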



Acknowledgments

This research was supported by FONDEF (Chile) under Grant IT17I0051 “Desarrollo de una herramienta computacional para la evaluación automática de textos en el sistema escolar chileno.” (“Development of a computational tool for automatic assessment of Chilean school texts”).

Author information

Authors and Affiliations

University of Concepción, Concepción, Chile
Diego Palma, Christian Soto, Mónica Veliz & Bernardo Riffo
Georgia Southern University, Statesboro, USA
Antonio Gutiérrez


Corresponding author

Correspondence to Diego Palma.

Editor information

Editors and Affiliations

Institute for Advanced Systems Engineering, University of Central Florida, Orlando, FL, USA
Tareq Ahram
Université de Reims Champagne Ardenne, GRESPI, Reims, France
Redha Taiar
Laboratoire Motricité Humaine, Expertise, Sport, Santé, Université Côte d’Azur, Nice Cedex 3, France
Serge Colson
IFMK Niçois, Université Côte d’Azur, Nice Cedex 3, France
Arnaud Choplin


About this paper

Cite this paper

Palma, D., Soto, C., Veliz, M., Riffo, B., Gutiérrez, A. (2020). A Data-Driven Methodology to Assess Text Complexity Based on Syntactic and Semantic Measurements. In: Ahram, T., Taiar, R., Colson, S., Choplin, A. (eds) Human Interaction and Emerging Technologies. IHIET 2019. Advances in Intelligent Systems and Computing, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-030-25629-6_79


DOI: https://doi.org/10.1007/978-3-030-25629-6_79
Published: 25 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25628-9
Online ISBN: 978-3-030-25629-6
eBook Packages: Engineering, Engineering (R0)
