Abstract
In this contribution, we report on a computational corpus-based study of the semantic evolution of words over time. Although semantic change is complex and does not lend itself easily to analytical treatment, we believe that computational modelling is a crucial tool for studying this phenomenon. The study consists of two parts. In the first, we aim to capture the systemic change of word meanings in an empirical model that is also predictive, and therefore falsifiable. To illustrate the significance of this kind of empirical model, we conduct an experimental evaluation on the Google Books N-gram corpus. The results show that the model captures semantic change effectively and achieves a high degree of accuracy in predicting words' distributional semantics. In the second part, we examine the degree to which the S-curve model, which is generally used to describe the quantitative properties of linguistic changes, applies to lexical semantic change. We use an automatic procedure to extract from the Google Books N-gram corpus the words that have undergone the largest semantic shifts over the past two centuries, and then investigate the significance of the S-curve pattern in the evolution of their frequency. The results suggest that the S-curve pattern does have a degree of generality, especially for frequency rises associated with semantic expansion.
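The second part of the study combines two measurable steps: ranking words by how far their distributional representation moves between two periods, and testing whether a word's frequency rise follows an S-curve. The sketch below illustrates both steps under stated assumptions; it is not the authors' implementation. It assumes that word vectors for the two time slices are already aligned in a common space (for example with orthogonal Procrustes), that cosine distance serves as the shift score, and that the S-curve is tested by linearizing a logistic through the logit transform over a transition window of the rescaled frequency. The function names `rank_by_shift` and `s_curve_fit` and the window bounds (0.05, 0.95) are hypothetical choices made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def rank_by_shift(emb_start, emb_end, top_k=10):
    """Rank words present in both slices by how far their vectors moved.

    Assumption: both dicts map word -> vector in an already-aligned space.
    Words missing from either slice are ignored; ties are broken alphabetically.
    """
    shared = emb_start.keys() & emb_end.keys()
    ranked = sorted(
        shared,
        key=lambda w: (-cosine_distance(emb_start[w], emb_end[w]), w),
    )
    return ranked[:top_k]


def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def s_curve_fit(years, freqs, window=(0.05, 0.95)):
    """Check a frequency rise for a logistic (S-curve) shape.

    The series is rescaled to [0, 1] with its observed min and max, which are
    assumed to approximate the lower and upper plateaus. For a logistic
    x(t) = 1 / (1 + exp(-r (t - t0))), logit(x) = r*t - r*t0 is linear in t,
    so logit(x) is regressed on t for points inside the transition window.
    Returns (rate r, midpoint t0, R^2 of the fit in logit space).
    """
    lo, hi = min(freqs), max(freqs)
    pts = []
    for t, f in zip(years, freqs):
        x = (f - lo) / (hi - lo)
        if window[0] < x < window[1]:
            pts.append((t, math.log(x / (1.0 - x))))
    ts, zs = zip(*pts)
    r, b, r2 = linear_fit(ts, zs)
    return r, -b / r, r2


if __name__ == "__main__":
    # Toy 2-d "embeddings" for two periods (hypothetical data, not corpus results).
    start = {"broadcast": [1.0, 0.0], "table": [0.0, 1.0], "mouse": [1.0, 1.0]}
    end = {"broadcast": [0.0, 1.0], "table": [0.0, 1.0], "mouse": [1.0, 0.9]}
    print(rank_by_shift(start, end, top_k=2))

    # Synthetic frequency series: an exact logistic rise versus a straight-line rise.
    years = list(range(1800, 2001))
    logistic = [1.0 / (1.0 + math.exp(-0.1 * (t - 1900))) for t in years]
    ramp = [float(t - 1800) for t in years]
    print(s_curve_fit(years, logistic))
    print(s_curve_fit(years, ramp))
```

On the toy vectors, "broadcast" is ranked first because its vector rotates the furthest. For an exactly logistic series, `s_curve_fit` recovers a rate close to 0.1 and a midpoint close to 1900 with an R² very near 1, while a straight-line rise yields a lower R². Where to set the R² threshold for calling a frequency curve S-shaped is left to the analyst in this sketch.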
Acknowledgements
This work is supported by the project 2016-147 ANR OPLADYN TAP-DD2016.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Boukhaled, M.A., Fagard, B., Poibeau, T. (2019). The Dynamics of Semantic Change: A Corpus-Based Analysis. In: van den Herik, J., Rocha, A., Steels, L. (eds) Agents and Artificial Intelligence. ICAART 2019. Lecture Notes in Computer Science, vol 11978. Springer, Cham. https://doi.org/10.1007/978-3-030-37494-5_1
DOI: https://doi.org/10.1007/978-3-030-37494-5_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37493-8
Online ISBN: 978-3-030-37494-5
eBook Packages: Computer Science, Computer Science (R0)