A Statistical Model for Predicting Child Language Acquisition: Unfolding Qualitative Grammatical Development by Using Logistic Regression Model

  • Conference paper
Studies in Theoretical and Applied Statistics (SIS 2021)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Abstract

Language acquisition is a scientific puzzle still awaiting a theoretical solution. Children seem to acquire their native language in a spontaneous and effortless way and they probably do so by keeping track of the frequency with which language items such as phonemes or parts of speech occur. Advances in data storage, processing and visualization have triggered a growing and fertile interest in analysing language by relying on statistics and quantitative methods. In this paper we propose a multiple logistic regression model to evaluate how different components of language contribute to its acquisition over time. The empirical basis consists of a corpus, which can be considered as a series of statistically representative samples taken at regular time intervals. The aim is to show how quantitative methods can contribute to explaining the creation and development of grammatical categories in first language acquisition.
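
For orientation, a multiple logistic regression has the generic form sketched below. This is the textbook formulation (see, e.g., Hosmer and Lemeshow [9]), not the authors' fitted specification: the symbols $Y$, $x_j$, $\beta_j$ and $k$ are notation assumed here, since the model equation itself is not part of this preview.

```latex
% Generic multiple logistic regression (assumed notation, not the paper's own):
%   Y        binary response
%   x_1..x_k regressors
%   beta_j   coefficients to be estimated
\[
\operatorname{logit}\bigl(\pi(\mathbf{x})\bigr)
  = \ln\frac{\pi(\mathbf{x})}{1-\pi(\mathbf{x})}
  = \beta_0 + \sum_{j=1}^{k}\beta_j x_j,
\qquad
\pi(\mathbf{x}) = P(Y=1\mid\mathbf{x})
  = \frac{1}{1+\exp\bigl(-\beta_0-\sum_{j=1}^{k}\beta_j x_j\bigr)}
\]
```

Under this form, $\exp(\beta_j)$ is the multiplicative change in the odds of $Y=1$ for a one-unit increase in $x_j$ with the other regressors held fixed.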

Notes

  1. Among the seven children in the CoLaJE database we chose Adrien because, from a sampling point of view, his data are more detailed and complete.

  2. Classical linear regression was discarded because it gave poor results during the modelling stage.

  3. We converted the variable from months to years for a better representation of the data.

  4. The IPC of any given French word can be calculated at this link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.

  5. Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.

  6. Before modelling AGE as linear, we tried to create three successive yearly time slots to see how the two other regressors behave when taken separately, but the resulting classification had a lower success rate than the proposed model. We then chose to model it as linear because first language acquisition is a highly non-linear phenomenon and the only certainty linguists have is that, roughly speaking, it develops in a cumulative way over time. We also tried to model the interaction effects between COMPLEX, IPC and CLASS, but the result was less precise than the proposed model: COMPLEX showed a counterintuitive result in which an increase in its value causes a decrease in WPV (models are available on request).

  7. All calculations were performed with Stata version 15.

  8. The AGE on the abscissa refers to the period of the video recordings (see the CoLaJE database [4] for more details).

  9. The Sentence Phonetic Variation Rate (SPVR) is the ratio between the number of phonetic variations (the number of differences detected between “pho” and “mod”) and the total number of words. SPVR takes the value 0% when the child does not make any errors and 100% when the child does not correctly pronounce any of the words contained in the sentence [17].

  10. To get an idea of what values such as 0, 4 or 8 mean, type a word at this link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.

  11. The past participle form in French could be given as an example.

  12. A graphical visualization of this work can be found at this link [17]: http://advanse.lirmm.fr/EMClustering/.

  13. Considering the low number of "OTHER" type words, only the most important classes were taken: CLOSED and OPEN.
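
The SPVR defined in note 9 can be illustrated with a minimal Python sketch. It assumes the “pho” (produced) and “mod” (target) forms of a sentence are already tokenised and aligned word by word, which the note does not specify; the function name `spvr`, its parameter names, and the toy tokens are hypothetical and are not taken from the authors' pipeline.

```python
def spvr(pho_words, mod_words):
    """Sentence Phonetic Variation Rate, as a percentage.

    Counts the words whose produced form ("pho") differs from the
    target form ("mod") and divides by the number of words.
    Assumption: both lists are aligned one-to-one, word by word.
    """
    if len(pho_words) != len(mod_words):
        raise ValueError("pho and mod must contain the same number of words")
    if not mod_words:
        raise ValueError("the sentence must contain at least one word")
    # A word counts as one variation when its two forms are not identical.
    variations = sum(p != m for p, m in zip(pho_words, mod_words))
    return 100.0 * variations / len(mod_words)


# Toy tokens, invented for illustration only (not corpus data).
print(spvr(["sa", "pa"], ["sa", "pa"]))  # no word differs      -> 0.0
print(spvr(["ta", "pa"], ["sa", "pa"]))  # one of two differs   -> 50.0
print(spvr(["ta", "ba"], ["sa", "pa"]))  # every word differs   -> 100.0
```

Comparing whole words keeps the rate within the 0% to 100% range described in the note; a finer-grained count of differences inside each word would need a different denominator.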

References

  1. Ambridge, B., Kidd, E., Rowland, C.F., Theakston, A.: The ubiquity of frequency effects in first language acquisition. J. Child Lang. 42, 239–273. Cambridge University Press (2015)

  2. Briglia, A.: Statistical and computational approaches to first language acquisition. Mining a set of French longitudinal corpora (CoLaJE). Linguistics. Université Paul Valéry Montpellier 3 (France); University of Messina (Italy) (2021)

  3. Briglia, A., Mucciardi, M., Sauvage, J.: Identify the speech code through statistics: a data-driven approach. In: Book of Short Papers SIS (2020)

  4. CoLaJE Corpus: http://colaje.scicog.fr/index.php/corpus (2020)

  5. Colombo, M., Elkin, L., Hartmann, S.: Being realist about Bayes, and the predictive processing theory of mind. Br. J. Philos. Sci. 72(11), 185–220 (2020)

  6. Didirkova, I., Dodane, C., Diwersy, S.: The role of disfluencies in language acquisition and development of syntactic complexity in children. DISS 2019, Budapest, Hungary (2019)

  7. Ferrer i Cancho, R., Solé, R.V.: The small world of human language. Proc. R. Soc. Lond. B 268, 2261–2265 (2001)

  8. Friston, K.: Life as we know it. J. R. Soc. Interface 10 (2013)

  9. Hosmer, D., Lemeshow, S.: Applied logistic regression. Wiley, New York (1989)

  10. Index of Phonetic Complexity. http://igm.univ-mlv.fr/~gambette/iPhocomp/ (2021)

  11. Jakielski, K.: Quantifying phonetic complexity in words: an experimental index. Child Phonology Conference, Cedar Falls, IA (2000)

  12. Lee, H., Gambette, P., Barkat-Defradas, M.: iPhocomp: calcul automatique de l’indice de complexité phonétique de Jakielski. JEP 2014, XXXè édition des Journées d'Etudes sur la Parole, Le Mans, France, pp. 622–630, Actes de la XXXe édition des Journées d'Etudes sur la Parole (2014)

  13. MacNeilage, P.: The frame/content theory of evolution of speech production. Behav. Brain Sci. 21(4), 499–511 (1998)

  14. MacWhinney, B.: The CHILDES project: tools for analysing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah, NJ (2000)

  15. Morgenstern, A., Parisse, C.: The Paris corpus. French Lang. Stud. 22, 7–12. Cambridge (2012)

  16. Mucciardi, M., Pirrotta, G., Briglia, A.: EM Clustering method and first language acquisition. In: Book of Short Papers Models and Learning for Clustering and Classification (2021)

  17. Mucciardi, M., Pirrotta, G., Briglia, A., Sallaberry, A.: Visualizing cluster of words: a graphical approach to grammar acquisition. In: Book of Abstracts and Short Papers CLADAG 2021 (2021)

  18. Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: a Python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2020)

  19. Saffran, J.: Statistical language learning: mechanisms and constraints. Curr. Dir. Psychol. Sci. 12(4), 110–114 (2003)

  20. Sagae, K.: Tracking child language development with neural network language models. Front. Psychol. 12, 674402 (2021)

  21. Sekali, M.: First language acquisition of French grammar (from 10 months to 4 years old). French Lang. Stud. 22, 1–6 (2012)

  22. Szmrecsanyi, B.: On operationalizing syntactic complexity. In: Purnelle, G., Fairon, C., Dister, A. (eds.) Le poids des mots. Proceedings of the 7th International Conference on Textual Data Statistical Analysis, vol. 2. Presses Universitaires de Louvain, Louvain-la-Neuve (2004)

  23. Tomasello, M., Stahl, D.: Sampling children’s spontaneous speech: how much is enough? J. Child Lang. 31, 101–121 (2004)

  24. UD (Universal Dependencies): https://universaldependencies.org (2021)

  25. Yamaguchi, N.: What is a representative language sample for word and sound acquisition? Can. J. Linguist., Univ. Tor. Press. 63(04), 667–685 (2018)

  26. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision, ECCV 2014. Lecture Notes in Computer Science, vol. 8689. Springer (2014)

Author information

Corresponding author

Correspondence to Massimo Mucciardi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Briglia, A., Mucciardi, M., Pirrotta, G. (2022). A Statistical Model for Predicting Child Language Acquisition: Unfolding Qualitative Grammatical Development by Using Logistic Regression Model. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_7
