Abstract
In this paper, we propose an approach for predicting the age of the authors of narrative texts written by children between 6 and 13 years old. The features of the proposed model, which are lexical and syntactic (part of speech), were normalized to prevent the model from using text length as a predictor. In addition, the initial features were extended using n-gram representations and combined using machine learning techniques for regression (i.e., SMOreg). The proposed model was tested with collections of texts retrieved from the Internet in Spanish, French and English, obtaining mean absolute errors in the age-prediction task of 1.40, 1.20 and 1.72 years, respectively. Finally, we discuss the usefulness of this model for generating rankings of documents by writing proficiency for each age.
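The two ideas the abstract relies on, length-normalized n-gram features and the mean absolute error used for evaluation, can be illustrated with a minimal sketch. The function names, the choice of relative frequency as the normalization, and the toy data are assumptions made for illustration; the abstract does not specify the exact feature definitions, and the regression step (SMOreg) is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def normalized_ngram_features(tokens, max_n=2):
    """Map each 1..max_n-gram to its relative frequency.

    Dividing by the number of n-grams of the same order is one possible
    normalization (an assumption, not necessarily the paper's): it keeps
    feature values from growing with the length of the text.
    """
    features = {}
    for n in range(1, max_n + 1):
        grams = ngrams(tokens, n)
        total = len(grams)
        if total == 0:
            continue  # text shorter than n tokens
        for gram, count in Counter(grams).items():
            features[gram] = count / total
    return features


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


# Toy usage: tokens could equally be part-of-speech tags instead of words.
short_text = "the dog saw the cat".split()
long_text = short_text * 2  # same style, twice the length
short_features = normalized_ngram_features(short_text)
long_features = normalized_ngram_features(long_text)
toy_mae = mean_absolute_error([6, 8, 10], [7, 8, 12])
```

In this toy case the unigram feature for "the" has the same value in the short and the doubled text, which is the property the normalization is meant to provide. The resulting feature dictionaries would then be passed to a regression learner of the reader's choice.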
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moreno, N., Jimenez, S., Baquero, J. (2014). Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_47
DOI: https://doi.org/10.1007/978-3-642-54903-8_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)