Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

In this paper, we propose an approach for predicting the age of the authors of narrative texts written by children between 6 and 13 years old. The features of the proposed model, which are lexical and syntactic (part-of-speech), were normalized so that the model cannot use text length as a predictor. In addition, the initial features were extended using n-gram representations and combined using a machine learning technique for regression (i.e., SMOreg). The proposed model was tested on collections of texts in Spanish, French and English retrieved from the Internet, obtaining mean absolute errors in the age-prediction task of 1.40, 1.20 and 1.72 years, respectively. Finally, we discuss the usefulness of this model for ranking documents by writing proficiency within each age.
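
As an illustration only, the following Python sketch shows one way length-normalized n-gram features and the mean absolute error could be computed. It is not the authors' implementation: the function names, the toy part-of-speech tags and the choice of relative frequencies as the normalization are all assumptions, and the regression step (SMOreg in the paper) is left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def normalized_ngram_features(tokens, n_max=2):
    """Relative frequencies of all n-grams for n = 1..n_max.

    Assumed normalization: each count is divided by the number of n-grams
    of that order in the text, so the values do not grow with text length.
    """
    features = {}
    for n in range(1, n_max + 1):
        grams = ngrams(tokens, n)
        total = len(grams)
        for gram, count in Counter(grams).items():
            features[gram] = count / total
    return features


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages, in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


# Toy POS-tag sequences (hypothetical tags, not the paper's tag set).
short_text = ["DET", "NOUN", "VERB"]
long_text = short_text * 10  # same style, ten times longer

short_features = normalized_ngram_features(short_text, n_max=1)
long_features = normalized_ngram_features(long_text, n_max=1)
```

In this toy case the two texts differ in length by a factor of ten but yield the same unigram features, which is the property the normalization is meant to provide. A regressor trained on such features would then be scored with `mean_absolute_error` against the known ages.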

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreno, N., Jimenez, S., Baquero, J. (2014). Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_47

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
