A Statistical Model for Predicting Child Language Acquisition: Unfolding Qualitative Grammatical Development by Using Logistic Regression Model

Briglia, Andrea; Mucciardi, Massimo; Pirrotta, Giovanni

doi:10.1007/978-3-031-16609-9_7

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Included in the following conference series:

Convegno della Società Italiana di Statistica

Abstract

Language acquisition is a scientific puzzle still awaiting a theoretical solution. Children seem to acquire their native language in a spontaneous and effortless way and they probably do so by keeping track of the frequency with which language items such as phonemes or parts of speech occur. Advances in data storage, processing and visualization have triggered a growing and fertile interest in analysing language by relying on statistics and quantitative methods. In this paper we propose a multiple logistic regression model to evaluate how different components of language contribute to its acquisition over time. The empirical basis consists of a corpus, which can be considered as a series of statistically representative samples taken at regular time intervals. The aim is to show how quantitative methods can contribute to explaining the creation and development of grammatical categories in first language acquisition.
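
For orientation, the generic form of a multiple logistic regression model is written out below. The notation (a binary outcome y, predictors x_1, ..., x_k and coefficients β_0, ..., β_k) is assumed here for illustration; this preview does not give the chapter's exact specification.

```latex
\[
\operatorname{logit} P(y = 1 \mid x_1, \dots, x_k)
  = \ln \frac{P(y = 1 \mid x_1, \dots, x_k)}{1 - P(y = 1 \mid x_1, \dots, x_k)}
  = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
\]
```

In this form, exp(β_j) is read as the odds ratio associated with a one-unit increase in x_j, with the other predictors held fixed.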

Notes

1. Among the seven children in the CoLaJE database, we chose Adrien because, from a sampling point of view, his data are the most detailed and complete.
2. Classical linear regression was discarded because it gave poor results during the modelling stage.
3. We converted the variable from months to years for a better representation of the data.
4. The Index of Phonetic Complexity (IPC) of any given French word can be calculated at this link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.
5. Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.
6. Before modelling AGE as linear, we tried creating three successive yearly time slots to see how the two other regressors behave when considered separately, but the resulting share of correctly classified cases was lower than that of the proposed model. We then chose to model AGE as linear because, although first language acquisition is a highly non-linear phenomenon, the only certainty linguists have is that, roughly speaking, it develops in a cumulative way over time. We also tried to model the interaction effects between COMPLEX, IPC and CLASS, but this turned out to be less precise than the proposed model: COMPLEX showed a counterintuitive result in which an increase in its value led to a decrease in WPV (models are available on request).
7. All calculations were performed with STATA ver. 15.
8. AGE on the abscissa (x-axis) refers to the period of the video recordings (see the CoLaJE database [4] for more details).
9. The Sentence Phonetic Variation Rate (SPVR) is the ratio between the number of phonetic variations (the number of differences detected between “pho” and “mod”) and the total number of words. SPVR takes the value 0% when the child makes no errors and 100% when the child does not correctly pronounce any of the words in the sentence [17].
10. To get an idea of what values such as 0, 4 or 8 mean, you can enter a word at the link [10]: http://igm.univ-mlv.fr/~gambette/iPhocomp/.
11. The past participle form in French could be given as an example.
12. A graphic visualization of this work can be found at this link [17]: http://advanse.lirmm.fr/EMClustering/.
13. Given the low number of "OTHER" type words, only the two most important classes were retained: CLOSED and OPEN.
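
The SPVR definition in note 9 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the “pho” (produced) and “mod” (target) tiers are already aligned word by word, and that any mismatch between aligned words counts as one phonetic variation. The function name `spvr` and the example words are hypothetical.

```python
def spvr(pho_words, mod_words):
    """Sentence Phonetic Variation Rate, in percent.

    pho_words: words as produced by the child ("pho" tier).
    mod_words: target forms of the same words ("mod" tier).
    Assumption: both lists are aligned word by word.
    """
    if len(pho_words) != len(mod_words):
        raise ValueError("pho and mod must be aligned word by word")
    if not mod_words:
        raise ValueError("sentence must contain at least one word")
    # Count aligned positions where the produced form differs from the target.
    variations = sum(1 for p, m in zip(pho_words, mod_words) if p != m)
    return 100.0 * variations / len(mod_words)


# Made-up example: two of the three words differ from their target form.
example_mod = ["lə", "ʃa", "dɔʁ"]
example_pho = ["lə", "sa", "dɔ"]
example_rate = spvr(example_pho, example_mod)
```

With this word-level counting, a sentence in which every word matches its target yields 0% and one in which no word matches yields 100%, which agrees with the two boundary cases described in note 9.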

References

[1] Ambridge, B., Kidd, E., Rowland, C.F., Theakston, A.: The ubiquity of frequency effects in first language acquisition. J. Child Lang. 42, 239–273. Cambridge University Press (2015)
[2] Briglia, A.: Statistical and computational approaches to first language acquisition. Mining a set of French longitudinal corpora (CoLaJE). Linguistics. Université Paul Valéry Montpellier 3 (France); University of Messina (Italy) (2021)
[3] Briglia, A., Mucciardi, M., Sauvage, J.: Identify the speech code through statistics: a data-driven approach. Book of Short Papers SIS (2020)
[4] CoLaJE Corpus: http://colaje.scicog.fr/index.php/corpus (2020)
[5] Colombo, M., Elkin, L., Hartmann, S.: Being realist about Bayes, and the predictive processing theory of mind. Br. J. Philos. Sci. 72(11), 185–220 (2020)
[6] Didirkova, I., Dodane, C., Diwersy, S.: The role of disfluencies in language acquisition and development of syntactic complexity in children. DISS 2019, Budapest, Hungary (2019)
[7] Ferrer i Cancho, R., Solé, R.V.: The small world of human language. Proc. R. Soc. Lond. B 268, 2261–2265 (2001)
[8] Friston, K.: Life as we know it. J. R. Soc. Interface 10 (2013)
[9] Hosmer, D., Lemeshow, S.: Applied logistic regression. Wiley, New York (1989)
[10] Index of Phonetic Complexity. http://igm.univ-mlv.fr/~gambette/iPhocomp/ (2021)
[11] Jakielski, K.: Quantifying phonetic complexity in words: an experimental index. Child Phonology Conference, Cedar Falls, IA (2000)
[12] Lee, H., Gambette, P., Barkat-Defradas, M.: iPhocomp: calcul automatique de l’indice de complexité phonétique de Jakielski. JEP 2014, XXXè édition des Journées d'Etudes sur la Parole, Le Mans, France, pp. 622–630, Actes de la XXXe édition des Journées d'Etudes sur la Parole (2014)
[13] MacNeilage, P.: The frame/content theory of evolution of speech production. Behav. Brain Sci. 21(4), 499–511 (1998)
[14] MacWhinney, B.: The CHILDES project: tools for analysing talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah, NJ (2000)
[15] Morgenstern, A., Parisse, C.: The Paris corpus. French Lang. Stud. 22, 7–12. Cambridge (2012)
[16] Mucciardi, M., Pirrotta, G., Briglia, A.: EM Clustering method and first language acquisition. In: Book of Short Papers Models and Learning for Clustering and Classification (2021)
[17] Mucciardi, M., Pirrotta, G., Briglia, A., Sallaberry, A.: Visualizing cluster of words: a graphical approach to grammar acquisition. In: Book of Abstracts and Short Papers CLADAG 2021 (2021)
[18] Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: a Python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2020)
[19] Saffran, J.: Statistical language learning: mechanisms and constraints. Curr. Dir. Psychol. Sci. 12(4), 110–114 (2003)
[20] Sagae, K.: Tracking child language development with neural network language models. Front. Psychol. 12, 674402 (2021)
[21] Sekali, M.: First language acquisition of French grammar (from 10 months to 4 years old). French Lang. Stud. 22, 1–6 (2012)
[22] Szmrecsanyi, B.: On operationalizing syntactic complexity. In: Purnelle, G., Fairon, C., Dister, A. (eds.) Le poids des mots. Proceedings of the 7th International Conference on Textual Data Statistical Analysis, vol. 2. Presses Universitaires de Louvain, Louvain-la-Neuve (2004)
[23] Tomasello, M., Stahl, D.: Sampling children’s spontaneous speech: how much is enough? J. Child Lang. 31, 101–121 (2004)
[24] UD (Universal Dependencies): https://universaldependencies.org (2021)
[25] Yamaguchi, N.: What is a representative language sample for word and sound acquisition? Can. J. Linguist., Univ. Tor. Press. 63(04), 667–685 (2018)
[26] Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision, ECCV 2014. Lecture Notes in Computer Science, vol. 8689. Springer (2014)

Author information

Authors and Affiliations

STIH Lab LC Sorbonne Université, Paris, France
Andrea Briglia
Department of Cognitive Science, University of Messina, Messina, Italy
Massimo Mucciardi
University of Messina, Messina, Italy
Giovanni Pirrotta

Corresponding author

Correspondence to Massimo Mucciardi.

Editor information

Editors and Affiliations

Department of Economics and Management, University of Pisa, Pisa, Italy
Nicola Salvati
Department of Economics and Statistics, University of Salerno, Fisciano, Salerno, Italy
Cira Perna
Department of Economics and Management, University of Pisa, Pisa, Italy
Stefano Marchetti
School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia
Raymond Chambers

About this paper

Cite this paper

Briglia, A., Mucciardi, M., Pirrotta, G. (2022). A Statistical Model for Predicting Child Language Acquisition: Unfolding Qualitative Grammatical Development by Using Logistic Regression Model. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_7

DOI: https://doi.org/10.1007/978-3-031-16609-9_7
Published: 15 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16608-2
Online ISBN: 978-3-031-16609-9
eBook Packages: Mathematics and Statistics (R0)
