Abstract
Language acquisition is a scientific puzzle still awaiting a theoretical solution. Children seem to acquire their native language in a spontaneous and effortless way and they probably do so by keeping track of the frequency with which language items such as phonemes or parts of speech occur. Advances in data storage, processing and visualization have triggered a growing and fertile interest in analysing language by relying on statistics and quantitative methods. In this paper we propose a multiple logistic regression model to evaluate how different components of language contribute to its acquisition over time. The empirical basis consists of a corpus, which can be considered as a series of statistically representative samples taken at regular time intervals. The aim is to show how quantitative methods can contribute to explaining the creation and development of grammatical categories in first language acquisition.
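For orientation, the general textbook form of a multiple logistic regression model (as presented, for example, in Hosmer and Lemeshow) is sketched below. This is not the authors' fitted model: the symbols $Y$, $x_j$ and $\beta_j$ are generic placeholders. The notes mention variables named AGE, COMPLEX, IPC, CLASS and WPV, but the exact specification cannot be recovered from this page, so no mapping is assumed here.

```latex
\[
\operatorname{logit} P(Y = 1 \mid x_1, \dots, x_k)
  = \ln \frac{P(Y = 1 \mid x_1, \dots, x_k)}{1 - P(Y = 1 \mid x_1, \dots, x_k)}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```

Each coefficient $\beta_j$ is the change in the log-odds of the outcome for a one-unit increase in $x_j$, holding the other regressors fixed.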
Notes
- 1.
Among the seven children in the CoLaJE database we chose Adrien because, from a sampling point of view, his data are the most detailed and complete.
- 2.
Classical linear regression was discarded because it gave poor results during the modelling stage.
- 3.
We transformed the variable from months to years for a better representation of the data.
- 4.
The Index of Phonetic Complexity (IPC) of any given French word can be calculated at this link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.
- 5.
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.
- 6.
Before modelling AGE as linear, we tried splitting it into three successive yearly time slots to see how the two other regressors behave when taken separately, but the resulting model classified a lower share of cases correctly than the proposed one. We then chose to model AGE as linear: first language acquisition is a highly non-linear phenomenon, and the only certainty linguists have is that, roughly speaking, it develops cumulatively over time. We also tried modelling the interaction effects between COMPLEX, IPC and CLASS, but this turned out to be less precise than the proposed model: COMPLEX showed a counterintuitive result, in which an increase in its value causes a decrease in WPV (models are available on request).
- 7.
All calculations were performed with Stata version 15.
- 8.
AGE on the abscissa (x-axis) refers to the period of the video recordings (see the CoLaJE database [4] for more details).
- 9.
The Sentence Phonetic Variation Rate (SPVR) is the ratio between the number of phonetic variations (the number of differences detected between “pho” and “mod”) and the total number of words. SPVR takes the value 0% when the child makes no errors and 100% when the child pronounces none of the words in the sentence correctly [17].
- 10.
To get an idea of what a value of 0, 4 or 8 means, type a word into the tool at this link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.
- 11.
The past participle form in French could be given as an example.
- 12.
A graphic visualization of this work can be found at this link [17]: http://advanse.lirmm.fr/EMClustering/.
- 13.
Considering the low number of "OTHER" type words, only the most important classes were taken: CLOSED and OPEN.
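The SPVR ratio defined in note 9 can be illustrated with a minimal Python sketch. Several points are assumptions rather than the authors' procedure: "pho" is taken to be the child's transcribed production and "mod" the target form, the two are assumed to be already aligned word by word, and each word that differs counts as one phonetic variation. The function name `spvr` and the sample transcriptions are hypothetical.

```python
def spvr(pho_words, mod_words):
    """Return the Sentence Phonetic Variation Rate as a percentage.

    Assumption: pho_words (child's production) and mod_words (target forms)
    are aligned word by word, and a word counts as one variation whenever
    the two transcriptions differ.
    """
    if len(pho_words) != len(mod_words):
        raise ValueError("pho and mod must contain the same number of words")
    if not mod_words:
        raise ValueError("the sentence must contain at least one word")
    # Count the words whose produced form differs from the target form.
    variations = sum(1 for p, m in zip(pho_words, mod_words) if p != m)
    return 100.0 * variations / len(mod_words)


if __name__ == "__main__":
    # Hypothetical three-word sentence: two of the three words differ.
    pho = ["le", "sa", "dor"]
    mod = ["le", "ʃa", "dɔʁ"]
    print(f"SPVR = {spvr(pho, mod):.1f}%")
```

Under these assumptions the rate is 0% when every word matches its target and 100% when none does, which agrees with the bounds stated in note 9.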
References
Ambridge, B., Kidd, E., Rowland, C.F., Theakston, A.: The ubiquity of frequency effects in first language acquisition. J. Child Lang. 42, 239–273. Cambridge University Press (2015)
Briglia, A.: Statistical and computational approaches to first language acquisition. Mining a set of French longitudinal corpora (CoLaJE). Linguistics. Université Paul Valéry Montpellier 3 (France); University of Messina (Italy) (2021)
Briglia, A., Mucciardi, M., Sauvage, J.: Identify the speech code through statistics: a data-driven approach. In: Book of Short Papers SIS (2020)
CoLaJE Corpus: http://colaje.scicog.fr/index.php/corpus (2020)
Colombo, M., Elkin, L., Hartmann, S.: Being realist about Bayes, and the predictive processing theory of mind. Br. J. Philos. Sci. 72(11), 185–220 (2020)
Didirkova, I., Dodane, C., Diwersy, S.: The role of disfluencies in language acquisition and development of syntactic complexity in children. DISS 2019, Budapest, Hungary (2019)
Ferrer i Cancho, R., Solé, R.V.: The small world of human language. Proc. R. Soc. Lond. B 268, 2261–2265 (2001)
Friston, K.: Life as we know it. J. R. Soc. Interface 10 (2013)
Hosmer, D., Lemeshow, S.: Applied logistic regression. Wiley, New York (1989)
Index of Phonetic Complexity. http://igm.univ-mlv.fr/~gambette/iPhocomp/ (2021)
Jakielski, K.: Quantifying phonetic complexity in words: an experimental index. Child Phonology Conference, Cedar Falls, IA (2000)
Lee, H., Gambette, P., Barkat-Defradas, M.: iPhocomp: calcul automatique de l’indice de complexité phonétique de Jakielski. JEP 2014, XXXè édition des Journées d'Etudes sur la Parole, Le Mans, France, pp. 622–630, Actes de la XXXe édition des Journées d'Etudes sur la Parole (2014)
MacNeilage, P.: The frame/content theory of evolution of speech production. Behav. Brain Sci. 21(4), 499–511 (1998)
MacWhinney, B.: The CHILDES project: tools for analysing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah, NJ (2000)
Morgenstern, A., Parisse, C.: The Paris corpus. French Lang. Stud. 22, 7–12 (2012)
Mucciardi, M., Pirrotta, G., Briglia, A.: EM Clustering method and first language acquisition. In: Book of Short Papers Models and Learning for Clustering and Classification (2021)
Mucciardi, M., Pirrotta, G., Briglia, A., Sallaberry, A.: Visualizing cluster of words: a graphical approach to grammar acquisition. In: Book of Abstracts and Short Papers CLADAG 2021 (2021)
Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: a Python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2020)
Saffran, J.: Statistical language learning: mechanisms and constraints. Curr. Dir. Psychol. Sci. 12(4), 110–114 (2003)
Sagae, K.: Tracking child language development with neural network language models. Front. Psychol. 12, 674402 (2021)
Sekali, M.: First language acquisition of French grammar (from 10 months to 4 years old). French Lang. Stud. 22, 1–6 (2012)
Szmrecsanyi, B.: On operationalizing syntactic complexity. In: Purnelle, G., Fairon C., Dister A. (eds.), Le poids des mots. Proceedings of the 7th International Conference on Textual Data Statistical Analysis, vol. 2. Presses Universitaires de Louvain, Louvain-la-Neuve (2004)
Tomasello, M., Stahl, D.: Sampling children’s spontaneous speech: how much is enough? J. Child Lang. 31, 101–121 (2004)
UD (Universal Dependencies): https://universaldependencies.org (2021)
Yamaguchi, N.: What is a representative language sample for word and sound acquisition? Can. J. Linguist., Univ. Tor. Press. 63(04), 667–685 (2018)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision—ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8689. Springer (2014)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Briglia, A., Mucciardi, M., Pirrotta, G. (2022). A Statistical Model for Predicting Child Language Acquisition: Unfolding Qualitative Grammatical Development by Using Logistic Regression Model. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics . SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_7
DOI: https://doi.org/10.1007/978-3-031-16609-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16608-2
Online ISBN: 978-3-031-16609-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)