Abstract
Music and language are structurally similar, and this structural similarity is often explained by generative processes. This paper describes the recent development of probabilistic generative models (PGMs) for language learning and symbol emergence in robotics. Symbol emergence in robotics aims to develop a robot that can adapt to real-world environments and human linguistic communication, and that acquires language from sensorimotor information alone (i.e., in an unsupervised manner). This is regarded as a constructive approach to symbol emergence systems. To this end, a series of PGMs have been developed, including those for simultaneous phoneme and word discovery, lexical acquisition, object and spatial concept formation, and the emergence of a symbol system. By extending these models, a symbol emergence system, i.e., a multi-agent system in which a symbol system emerges, can also be modeled using PGMs. In this view, symbol emergence can be regarded as collective predictive coding. This paper expands on this idea by combining the theory that “emotion is based on the predictive coding of interoceptive signals” with the notion of symbol emergence systems, and describes a possible hypothesis about the emergence of meaning in music.
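As a concrete illustration of the “collective predictive coding” reading, the sketch below implements a heavily simplified Metropolis-Hastings naming game between two agents. It is not the model used in the paper: the perceptual categories, the Dirichlet-categorical name model, and all constants are hypothetical, and the agents’ learning is reduced to simple count reinforcement.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a
# Metropolis-Hastings naming game between two agents, A and B.
import numpy as np

rng = np.random.default_rng(0)
N_OBJECTS, N_NAMES, N_CATS = 30, 5, 3
ALPHA = 0.1  # Dirichlet smoothing for each agent's name-given-category model

# Hypothetical data: both agents observe the same objects, but each forms its own
# perceptual categories (here: noisy copies of a shared ground-truth labelling).
true_cat = rng.integers(0, N_CATS, size=N_OBJECTS)
cats = {a: np.where(rng.random(N_OBJECTS) < 0.8, true_cat,
                    rng.integers(0, N_CATS, size=N_OBJECTS)) for a in "AB"}
counts = {a: np.zeros((N_CATS, N_NAMES)) for a in "AB"}  # category-name co-occurrences
names = rng.integers(0, N_NAMES, size=N_OBJECTS)         # current shared sign per object

def p_name(agent, obj, name):
    """Smoothed predictive probability of a name given the agent's category of obj."""
    row = counts[agent][cats[agent][obj]]
    return (row[name] + ALPHA) / (row.sum() + ALPHA * N_NAMES)

for _ in range(50):
    for speaker, listener in (("A", "B"), ("B", "A")):
        for o in range(N_OBJECTS):
            # The speaker proposes a sign sampled from its own predictive distribution.
            probs = np.array([p_name(speaker, o, w) for w in range(N_NAMES)])
            proposal = rng.choice(N_NAMES, p=probs / probs.sum())
            # The listener accepts with a Metropolis-Hastings ratio computed from
            # its *own* model only -- no agent ever sees the other's internal state.
            accept = min(1.0, p_name(listener, o, proposal) / p_name(listener, o, names[o]))
            if rng.random() < accept:
                names[o] = proposal
            # Simplified learning step: both agents reinforce the association between
            # their own category and the current shared sign (the original work uses
            # Gibbs-style updates of each agent's internal variables instead).
            for a in "AB":
                counts[a][cats[a][o], names[o]] += 1
```

Even in this reduced form, the listener’s acceptance test depends only on its own predictive distribution, which is what licenses interpreting the game as decentralized, collective inference over a shared latent sign.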
Notes
- 1.
Thus, integrating various forms of predictive coding with PGMs is essential for modeling integrative human cognitive systems. SERKET was proposed as a framework for this purpose [39, 63]. Recently, the idea was extended to the whole-brain PGM, which aims to build a cognitive model covering the entire brain by combining PGMs with anatomical knowledge of brain architecture [67]. This approach is known as the whole-brain architecture approach [73]. Following this idea, the anatomical validity of the above NPB-DAA for spoken language acquisition and of SLAM-based place recognition was also examined from the viewpoint of the brain [53, 58]. (A minimal, hypothetical module-composition sketch in this spirit is given after these notes.)
- 2.
Recently, this idea has been developed into large-scale language models based on Transformers, and their generality and performance have become widely known.
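The sketch below illustrates, under strong simplifying assumptions, the module-composition idea behind SERKET mentioned in Note 1: two small PGM modules (a toy Gaussian mixture over features and a toy categorical word model) are trained jointly by exchanging messages about a shared latent category. The class names, message protocol, and update rules are illustrative only and do not reproduce the SERKET or Neuro-SERKET implementations.

```python
# Illustrative sketch of composing two PGM modules via a shared latent category.
import numpy as np

rng = np.random.default_rng(1)
K = 3          # number of shared latent categories
N_WORDS = 4    # size of the toy vocabulary

class PerceptionModule:
    """Toy Gaussian mixture over sensory features; returns P(category | feature, prior)."""
    def __init__(self, dim=2):
        self.means = rng.normal(size=(K, dim))

    def posterior(self, x, prior):
        # Unit-variance Gaussian log-likelihood combined with the incoming prior message.
        logp = -0.5 * ((x[None, :] - self.means) ** 2).sum(axis=1) + np.log(prior)
        p = np.exp(logp - logp.max())
        return p / p.sum()

    def update(self, xs, resp):
        # One responsibility-weighted M-step for the component means.
        self.means = (resp.T @ xs) / (resp.sum(axis=0)[:, None] + 1e-9)

class WordModule:
    """Toy categorical word model; returns P(category | word) as its outgoing message."""
    def __init__(self):
        self.counts = np.ones((K, N_WORDS))  # Dirichlet(1) pseudo-counts

    def message(self, w):
        p = self.counts[:, w] / self.counts.sum(axis=1)
        return p / p.sum()

    def update(self, w, resp):
        self.counts[:, w] += resp  # accumulate soft counts sent by the other module

# Hypothetical data: 20 feature vectors with co-occurring word indices.
xs = rng.normal(size=(20, 2))
ws = rng.integers(0, N_WORDS, size=20)
perception, word = PerceptionModule(), WordModule()

for _ in range(10):
    resp = np.zeros((len(xs), K))
    for i, (x, w) in enumerate(zip(xs, ws)):
        # The word module's message acts as a prior for the perception module;
        # the resulting posterior is sent back and used to update the word module.
        resp[i] = perception.posterior(x, word.message(w))
        word.update(w, resp[i])
    perception.update(xs, resp)
```

The point of the decomposition is that each module only needs to expose a distribution over the shared latent variable; the internal structure of the other module never has to be known, which is what allows large cognitive models to be assembled from separately developed PGMs.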
References
Akbari, M., Liang, J.: Semi-recurrent CNN-based VAE-GAN for sequential data generation. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2321–2325. IEEE (2018)
Ando, Y., Nakamura, T., Araki, T., Nagai, T.: Formation of hierarchical object concept using hierarchical latent Dirichlet allocation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2272–2279 (2013)
Araki, T., Nakamura, T., Nagai, T., Funakoshi, K., Nakano, M., Iwahashi, N.: Autonomous acquisition of multimodal information for online object concept formation by robots. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1540–1547 (2011). https://doi.org/10.1109/IROS.2011.6048422
Araki, T., Nakamura, T., Nagai, T., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Online learning of concepts and words using multimodal LDA and hierarchical Pitman-Yor Language Model. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1623–1630 (2012). https://doi.org/10.1109/IROS.2012.6385812
Asano, R., Boeckx, C.: Syntax in language and music: what is the right level of comparison? Front. Psychol. 6, 942 (2015)
Atherton, R.P., et al.: Shared processing of language and music: evidence from a cross-modal interference paradigm. Exp. Psychol. 65(1), 40 (2018)
Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460 (2020)
Barrett, L.F., Simmons, W.K.: Interoceptive predictions in the brain. Nat. Rev. Neurosci. 16(7), 419–429 (2015)
Barsalou, L.W.: Perceptual symbol systems. Behav. Brain Sci. 22(04), 1–16 (1999). https://doi.org/10.1017/S0140525X99002149
Berwick, R.C., Beckers, G.J., Okanoya, K., Bolhuis, J.J.: A bird’s eye view of human language evolution. Front. Evol. Neurosci. 4, 5 (2012)
Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, Heidelberg (2006)
Bommasani, R., et al.: On the opportunities and risks of foundation models (2021). https://doi.org/10.48550/ARXIV.2108.07258
Briot, J.P., Hadjeres, G., Pachet, F.D.: Deep learning techniques for music generation–a survey. arXiv preprint arXiv:1709.01620 (2017)
Brown, S.: Are music and language homologues? Ann. N. Y. Acad. Sci. 930(1), 372–374 (2001)
Brown, S., Martinez, M.J., Parsons, L.M.: Music and language side by side in the brain: a pet study of the generation of melodies and sentences. Eur. J. Neurosci. 23(10), 2791–2803 (2006)
Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)
Cangelosi, A., Schlesinger, M.: Developmental Robotics. The MIT Press, Cambridge (2015)
Chandler, D.: Semiotics the Basics. Routledge, Milton Park (2002)
Diéguez, P.L., Soo, V.W.: Variational autoencoders for polyphonic music interpolation. In: 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 56–61 (2020)
Dunbar, E., et al.: The zero resource speech challenge 2017. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 323–330 (2017)
Feld, S., Fox, A.A.: Music and language. Ann. Rev. Anthropol. 23, 25–53 (1994)
Flavell, J.H.: The Developmental Psychology of Jean Piaget. Literary Licensing, LLC (2011)
Friston, K., Moran, R.J., Nagai, Y., Taniguchi, T., Gomi, H., Tenenbaum, J.: World model learning and inference. Neural Netw. 144, 573–590 (2021)
Furukawa, K., Taniguchi, A., Hagiwara, Y., Taniguchi, T.: Symbol emergence as inter-personal categorization with head-to-head latent word. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 60–67 (2022)
Hagiwara, Y., Furukawa, K., Taniguchi, A., Taniguchi, T.: Multiagent multimodal categorization for symbol emergence: emergent communication via interpersonal cross-modal inference. Adv. Robot. 36(5–6), 239–260 (2022)
Hagiwara, Y., Inoue, M., Kobayashi, H., Taniguchi, T.: Hierarchical spatial concept formation based on multimodal information for human support robots. Front. Neurorobot. 12(11), 1–16 (2018)
Hagiwara, Y., Kobayashi, H., Taniguchi, A., Taniguchi, T.: Symbol emergence as an interpersonal multimodal categorization. Front. Robot. AI 6(134), 1–17 (2019). https://doi.org/10.3389/frobt.2019.00134
Hohwy, J.: The Predictive Mind. OUP, Oxford (2013)
Huang, C.Z.A., et al.: Music transformer. arXiv preprint arXiv:1809.04281 (2018)
Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)
Jackendoff, R., Lerdahl, F.: A grammatical parallel between music and language. In: Clynes, M. (ed.) Music, Mind, and Brain, pp. 83–117. Springer, Cham (1982). https://doi.org/10.1007/978-1-4684-8917-0_5
Jiang, J., Xia, G.G., Carlton, D.B., Anderson, C.N., Miyakawa, R.H.: Transformer VAE: a hierarchical model for structure-aware and interpretable music representation learning. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2020, pp. 516–520. IEEE (2020)
Mochihashi, D., Sumita, E.: The infinite Markov model. In: Advances in Neural Information Processing Systems, vol. 20 (2007)
Nakamura, T., Ando, Y., Nagai, T., Kaneko, M.: Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015)
Nakamura, T., Araki, T., Nagai, T., Iwahashi, N.: Grounding of word meanings in LDA-based multimodal concepts. Adv. Robot. 25, 2189–2206 (2012)
Nakamura, T., Nagai, T., Funakoshi, K., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Mutual learning of an object concept and language model based on MLDA and NPYLM. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 600–607 (2014)
Nakamura, T., Nagai, T., Iwahashi, N.: Multimodal object categorization by a robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2415–2420 (2007). https://doi.org/10.1109/IROS.2007.4399634
Nakamura, T., Nagai, T., Iwahashi, N.: Bag of multimodal hierarchical Dirichlet processes: model of complex conceptual structure for intelligent robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3818–3823 (2012). https://doi.org/10.1109/IROS.2012.6385502
Nakamura, T., Nagai, T., Taniguchi, T.: Serket: an architecture for connecting stochastic models to realize a large-scale cognitive model. Front. Neurorobot. 12, 25 (2018)
van Niekerk, B., Nortje, L., Kamper, H.: Vector-quantized neural networks for acoustic unit discovery in the zerospeech 2020 challenge. arXiv preprint arXiv:2005.09409 (2020)
Okanoya, K.: Language evolution and an emergent property. Curr. Opin. Neurobiol. 17(2), 271–276 (2007). https://doi.org/10.1016/j.conb.2007.03.011
Okanoya, K.: Sexual communication and domestication may give rise to the signal complexity necessary for the emergence of language: an indication from songbird studies. Psychon. Bull. Rev. 24(1), 106–110 (2017)
Okanoya, K., Merker, B.: Neural substrates for string-context mutual segmentation: a path to human language. In: Lyon, C., Nehaniv, C.L., Cangelosi, A. (eds.) Emergence of Communication and Language, pp. 421–434. Springer, London (2007). https://doi.org/10.1007/978-1-84628-779-4_22
Okuda, Y., Ozaki, R., Komura, S., Taniguchi, T.: Double articulation analyzer with prosody for unsupervised word and phone discovery. IEEE Trans. Cogn. Dev. Syst. (2022). https://doi.org/10.1109/TCDS.2022.3210751
Peirce, C.S.: Collected Writings. Harvard University Press, Cambridge (1931–1958)
Saffran, J.R., Newport, E.L., Aslin, R.N.: Word segmentation: the role of distributional cues. J. Mem. Lang. 35(4), 606–621 (1996)
Seth, A.K.: Interoceptive inference, emotion, and the embodied self. Trends Cogn. Sci. 17(11), 565–573 (2013)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Shirai, A., Taniguchi, T.: A proposal of an interactive music composition system using Gibbs sampler. In: Jacko, J.A. (ed.) HCI 2011. LNCS, vol. 6761, pp. 490–497. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21602-2_53
Shirai, A., Taniguchi, T.: A proposal of the melody generation method using variable-order Pitman-Yor language model. J. Jpn. Soc. Fuzzy Theory Intell. Inform. 25(6), 901–913 (2013). https://doi.org/10.3156/jsoft.25.901
Sternin, A., McGarry, L.M., Owen, A.M., Grahn, J.A.: The effect of familiarity on neural representations of music and language. J. Cogn. Neurosci. 33(8), 1595–1611 (2021)
Suzuki, M., Matsuo, Y.: A survey of multimodal deep generative models. Adv. Robot. 36(5–6), 261–278 (2022)
Taniguchi, A., Fukawa, A., Yamakawa, H.: Hippocampal formation-inspired probabilistic generative model. Neural Netw. 151, 317–335 (2022)
Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Online spatial concept and lexical acquisition with simultaneous localization and mapping. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 811–818 (2017)
Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Improved and scalable online learning of spatial concepts and language models with mapping. Auton. Robot. 44(6), 927–946 (2020). https://doi.org/10.1007/s10514-020-09905-0
Taniguchi, A., Isobe, S., El Hafi, L., Hagiwara, Y., Taniguchi, T.: Autonomous planning based on spatial concepts to tidy up home environments with service robots. Adv. Robot. 35(8), 471–489 (2021)
Taniguchi, A., Murakami, H., Ozaki, R., Taniguchi, T.: Unsupervised multimodal word discovery based on double articulation analysis with co-occurrence cues. arXiv preprint arXiv:2201.06786 (2022)
Taniguchi, A., Muro, M., Yamakawa, H., Taniguchi, T.: Brain-inspired probabilistic generative model for double articulation analysis of spoken language. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 107–114 (2022)
Taniguchi, A., Taniguchi, T., Inamura, T.: Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences. IEEE Trans. Cogn. Dev. Syst. 8(4), 285–297 (2016)
Taniguchi, A., Taniguchi, T., Inamura, T.: Unsupervised spatial lexical acquisition by updating a language model with place clues. Robot. Auton. Syst. 99, 166–180 (2018)
Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., Asoh, H.: Symbol emergence in robotics: a survey. Adv. Robot. 30(11–12), 706–728 (2016)
Taniguchi, T., Nagasaka, S., Nakashima, R.: Nonparametric Bayesian double articulation analyzer for direct language acquisition from continuous speech signals. IEEE Trans. Cogn. Dev. Syst. 8(3), 171–185 (2016). https://doi.org/10.1109/TCDS.2016.2550591
Taniguchi, T., et al.: Neuro-serket: development of integrative cognitive system through the composition of deep probabilistic generative models. N. Gener. Comput. 38(1), 23–48 (2020)
Taniguchi, T., Nakashima, R., Liu, H., Nagasaka, S.: Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals. Adv. Robot. 30(11–12), 770–783 (2016). https://doi.org/10.1080/01691864.2016.1159981
Taniguchi, T., Sawaragi, T.: Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network. Adv. Robot. 21(10), 1177–1199 (2007)
Taniguchi, T., et al.: Symbol emergence in cognitive developmental systems: a survey. IEEE Trans. Cogn. Dev. Syst. 11, 494–516 (2018)
Taniguchi, T., et al.: A whole brain probabilistic generative model: toward realizing cognitive architectures for developmental robots. Neural Netw. 150, 293–312 (2022)
Taniguchi, T., Yoshida, Y., Taniguchi, A., Hagiwara, Y.: Emergent communication through metropolis-hastings naming game with deep generative models. arXiv preprint arXiv:2205.12392 (2022)
Taniguchi, T., Yoshino, R., Takano, T.: Multimodal hierarchical Dirichlet process-based active perception by a robot. Front. Neurorobot. 12, 22 (2018)
Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: zerospeech 2020 challenge. arXiv preprint arXiv:2005.11676 (2020)
Von Uexküll, J.: A stroll through the worlds of animals and men: a picture book of invisible worlds. Semiotica 89(4), 319–391 (1992)
Vuust, P., Heggli, O.A., Friston, K.J., Kringelbach, M.L.: Music in the brain. Nat. Rev. Neurosci. 23(5), 287–305 (2022)
Yamakawa, H., Osawa, M., Matsuo, Y.: Whole brain architecture approach is a feasible way toward an artificial general intelligence. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9947, pp. 275–281. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46687-3_30
Acknowledgments
This paper was written as a post-proceedings paper for the keynote speech titled “Generative Models for Symbol Emergence based on Real-World Sensory-motor Information and Communication” presented at the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR) 2021. This work was supported by JSPS KAKENHI Grant Numbers JP16H06569 and JP21H04904.