Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets

Moreno, Nelly; Jimenez, Sergio; Baquero, Julia

doi:10.1007/978-3-642-54903-8_47

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we propose an approach for predicting the age of the authors of narrative texts written by children between 6 and 13 years old. The model's features, which are lexical and syntactic (part of speech), were normalized so that the model cannot use text length as a predictor. In addition, the initial features were extended using n-gram representations and combined using machine learning techniques for regression (i.e., SMOreg). The proposed model was tested on collections of texts in Spanish, French and English retrieved from the Internet, obtaining mean absolute errors in the age-prediction task of 1.40, 1.20 and 1.72 years, respectively. Finally, we discuss the usefulness of this model for generating rankings of documents by writing proficiency for each age.
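
The abstract names three ingredients: length-normalized n-gram features, a mean-absolute-error evaluation, and per-age rankings. The following Python sketch illustrates one plausible reading of each. It is not the authors' implementation: the function names are hypothetical, the normalization shown (relative frequency) is only one way to remove the effect of text length, the regression step (SMOreg) is not reproduced, and ranking same-age texts by predicted age is an assumption about how the rankings could be built.

```python
from collections import Counter, defaultdict


def ngram_relative_frequencies(tokens, n=2):
    """Relative frequency of each n-gram in a sequence of words or POS tags.

    Dividing by the total number of n-grams makes the values sum to 1, so a
    longer text does not get larger feature values merely for being longer.
    (Assumed normalization; the paper may use a different one.)
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return {}
    counts = Counter(ngrams)
    return {gram: count / len(ngrams) for gram, count in counts.items()}


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def rank_within_age(documents):
    """Group (doc_id, true_age, predicted_age) triples by true age and sort
    each group by predicted age, highest first.

    Assumption: among children of the same age, a higher predicted age is
    taken as a sign of more advanced writing.
    """
    groups = defaultdict(list)
    for doc_id, true_age, predicted_age in documents:
        groups[true_age].append((predicted_age, doc_id))
    return {
        age: [doc_id for _, doc_id in sorted(items, reverse=True)]
        for age, items in groups.items()
    }


if __name__ == "__main__":
    tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
    print(ngram_relative_frequencies(tags, n=2))
    print(mean_absolute_error([6, 9, 12], [7, 9, 10]))
    print(rank_within_age([("a", 8, 7.5), ("b", 8, 9.1), ("c", 10, 9.0)]))
```

In a full pipeline, the feature dictionaries would be turned into fixed-length vectors and passed to a support-vector regressor trained on the known ages; the error function would then be applied to held-out predictions.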

Author information

Authors and Affiliations

Universidad Nacional de Colombia, Bogotá
Nelly Moreno, Sergio Jimenez & Julia Baquero

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Moreno, N., Jimenez, S., Baquero, J. (2014). Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_47

DOI: https://doi.org/10.1007/978-3-642-54903-8_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)
