Abstract
Music and language are structurally similar, and this structural similarity is often explained by generative processes. This paper describes the recent development of probabilistic generative models (PGMs) for language learning and symbol emergence in robotics. Symbol emergence in robotics aims to develop a robot that can adapt to real-world environments and human linguistic communication, and that acquires language from sensorimotor information alone (i.e., in an unsupervised manner). This is regarded as a constructive approach to symbol emergence systems. To this end, a series of PGMs have been developed, including those for simultaneous phoneme and word discovery, lexical acquisition, object and spatial concept formation, and the emergence of a symbol system. By extending these models, a symbol emergence system, i.e., a multi-agent system in which a symbol system emerges, can also be modeled using PGMs. In this view, symbol emergence can be regarded as collective predictive coding. This paper expands on this idea by combining the theory that “emotion is based on the predictive coding of interoceptive signals” with the notion of symbol emergence systems, and describes a possible hypothesis about the emergence of meaning in music.
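As a concrete illustration of the “collective predictive coding” reading, the sketch below implements a heavily simplified Metropolis-Hastings naming game between two agents. It is not the model used in the paper: the perceptual categories, the Dirichlet-categorical name model, and all constants are hypothetical, and the agents’ learning is reduced to simple count reinforcement.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a
# Metropolis-Hastings naming game between two agents, A and B.
import numpy as np

rng = np.random.default_rng(0)
N_OBJECTS, N_NAMES, N_CATS = 30, 5, 3
ALPHA = 0.1  # Dirichlet smoothing for each agent's name-given-category model

# Hypothetical data: both agents observe the same objects, but each forms its own
# perceptual categories (here: noisy copies of a shared ground-truth labelling).
true_cat = rng.integers(0, N_CATS, size=N_OBJECTS)
cats = {a: np.where(rng.random(N_OBJECTS) < 0.8, true_cat,
                    rng.integers(0, N_CATS, size=N_OBJECTS)) for a in "AB"}
counts = {a: np.zeros((N_CATS, N_NAMES)) for a in "AB"}  # category-name co-occurrences
names = rng.integers(0, N_NAMES, size=N_OBJECTS)         # current shared sign per object

def p_name(agent, obj, name):
    """Smoothed predictive probability of a name given the agent's category of obj."""
    row = counts[agent][cats[agent][obj]]
    return (row[name] + ALPHA) / (row.sum() + ALPHA * N_NAMES)

for _ in range(50):
    for speaker, listener in (("A", "B"), ("B", "A")):
        for o in range(N_OBJECTS):
            # The speaker proposes a sign sampled from its own predictive distribution.
            probs = np.array([p_name(speaker, o, w) for w in range(N_NAMES)])
            proposal = rng.choice(N_NAMES, p=probs / probs.sum())
            # The listener accepts with a Metropolis-Hastings ratio computed from
            # its *own* model only -- no agent ever sees the other's internal state.
            accept = min(1.0, p_name(listener, o, proposal) / p_name(listener, o, names[o]))
            if rng.random() < accept:
                names[o] = proposal
            # Simplified learning step: both agents reinforce the association between
            # their own category and the current shared sign (the original work uses
            # Gibbs-style updates of each agent's internal variables instead).
            for a in "AB":
                counts[a][cats[a][o], names[o]] += 1
```

Even in this reduced form, the listener’s acceptance test depends only on its own predictive distribution, which is what licenses interpreting the game as decentralized, collective inference over a shared latent sign.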
Notes
- 1.
Thus, integrating various forms of predictive coding with PGMs is essential for modeling integrative human cognitive systems. SERKET was proposed as a framework for this purpose [39, 63]. Recently, the idea was extended to the whole-brain PGM, which aims to build a cognitive model covering the entire brain by combining PGMs with anatomical knowledge of brain architecture [67]. This approach is known as the whole-brain architecture approach [73]. Following this idea, the anatomical validity of the above NPB-DAA for spoken language acquisition and of SLAM-based place recognition was also examined from the viewpoint of the brain [53, 58]. (A minimal, hypothetical module-composition sketch in this spirit is given after these notes.)
- 2.
Recently, this idea has been developed into large-scale language models based on Transformers, and their generality and performance have become widely known.
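The sketch below illustrates, under strong simplifying assumptions, the module-composition idea behind SERKET mentioned in Note 1: two small PGM modules (a toy Gaussian mixture over features and a toy categorical word model) are trained jointly by exchanging messages about a shared latent category. The class names, message protocol, and update rules are illustrative only and do not reproduce the SERKET or Neuro-SERKET implementations.

```python
# Illustrative sketch of composing two PGM modules via a shared latent category.
import numpy as np

rng = np.random.default_rng(1)
K = 3          # number of shared latent categories
N_WORDS = 4    # size of the toy vocabulary

class PerceptionModule:
    """Toy Gaussian mixture over sensory features; returns P(category | feature, prior)."""
    def __init__(self, dim=2):
        self.means = rng.normal(size=(K, dim))

    def posterior(self, x, prior):
        # Unit-variance Gaussian log-likelihood combined with the incoming prior message.
        logp = -0.5 * ((x[None, :] - self.means) ** 2).sum(axis=1) + np.log(prior)
        p = np.exp(logp - logp.max())
        return p / p.sum()

    def update(self, xs, resp):
        # One responsibility-weighted M-step for the component means.
        self.means = (resp.T @ xs) / (resp.sum(axis=0)[:, None] + 1e-9)

class WordModule:
    """Toy categorical word model; returns P(category | word) as its outgoing message."""
    def __init__(self):
        self.counts = np.ones((K, N_WORDS))  # Dirichlet(1) pseudo-counts

    def message(self, w):
        p = self.counts[:, w] / self.counts.sum(axis=1)
        return p / p.sum()

    def update(self, w, resp):
        self.counts[:, w] += resp  # accumulate soft counts sent by the other module

# Hypothetical data: 20 feature vectors with co-occurring word indices.
xs = rng.normal(size=(20, 2))
ws = rng.integers(0, N_WORDS, size=20)
perception, word = PerceptionModule(), WordModule()

for _ in range(10):
    resp = np.zeros((len(xs), K))
    for i, (x, w) in enumerate(zip(xs, ws)):
        # The word module's message acts as a prior for the perception module;
        # the resulting posterior is sent back and used to update the word module.
        resp[i] = perception.posterior(x, word.message(w))
        word.update(w, resp[i])
    perception.update(xs, resp)
```

The point of the decomposition is that each module only needs to expose a distribution over the shared latent variable; the internal structure of the other module never has to be known, which is what allows large cognitive models to be assembled from separately developed PGMs.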
References
Akbari, M., Liang, J.: Semi-recurrent CNN-based VAE-GAN for sequential data generation. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2321–2325. IEEE (2018)
Ando, Y., Nakamura, T., Araki, T., Nagai, T.: Formation of hierarchical object concept using hierarchical latent Dirichlet allocation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2272–2279 (2013)
Araki, T., Nakamura, T., Nagai, T., Funakoshi, K., Nakano, M., Iwahashi, N.: Autonomous acquisition of multimodal information for online object concept formation by robots. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1540–1547 (2011). https://doi.org/10.1109/IROS.2011.6048422
Araki, T., Nakamura, T., Nagai, T., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Online learning of concepts and words using multimodal LDA and hierarchical Pitman-Yor Language Model. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1623–1630 (2012). https://doi.org/10.1109/IROS.2012.6385812
Asano, R., Boeckx, C.: Syntax in language and music: what is the right level of comparison? Front. Psychol. 6, 942 (2015)
Atherton, R.P., et al.: Shared processing of language and music: evidence from a cross-modal interference paradigm. Exp. Psychol. 65(1), 40 (2018)
Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460 (2020)
Barrett, L.F., Simmons, W.K.: Interoceptive predictions in the brain. Nat. Rev. Neurosci. 16(7), 419–429 (2015)
Barsalou, L.W.: Perceptual symbol systems. Behav. Brain Sci. 22(04), 1–16 (1999). https://doi.org/10.1017/S0140525X99002149
Berwick, R.C., Beckers, G.J., Okanoya, K., Bolhuis, J.J.: A bird’s eye view of human language evolution. Front. Evol. Neurosci. 4, 5 (2012)
Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, Heidelberg (2006)
Bommasani, R., et al.: On the opportunities and risks of foundation models (2021). https://doi.org/10.48550/ARXIV.2108.07258
Briot, J.P., Hadjeres, G., Pachet, F.D.: Deep learning techniques for music generation–a survey. arXiv preprint arXiv:1709.01620 (2017)
Brown, S.: Are music and language homologues? Ann. N. Y. Acad. Sci. 930(1), 372–374 (2001)
Brown, S., Martinez, M.J., Parsons, L.M.: Music and language side by side in the brain: a pet study of the generation of melodies and sentences. Eur. J. Neurosci. 23(10), 2791–2803 (2006)
Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)
Cangelosi, A., Schlesinger, M.: Developmental Robotics. The MIT Press, Cambridge (2015)
Chandler, D.: Semiotics the Basics. Routledge, Milton Park (2002)
Diéguez, P.L., Soo, V.W.: Variational autoencoders for polyphonic music interpolation. In: 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 56–61 (2020)
Dunbar, E., et al.: The zero resource speech challenge 2017. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 323–330 (2017)
Feld, S., Fox, A.A.: Music and language. Ann. Rev. Anthropol. 23, 25–53 (1994)
Flavell, J.H.: The Developmental Psychology of Jean Piaget. Literary Licensing, LLC (2011)
Friston, K., Moran, R.J., Nagai, Y., Taniguchi, T., Gomi, H., Tenenbaum, J.: World model learning and inference. Neural Netw. 144, 573–590 (2021)
Furukawa, K., Taniguchi, A., Hagiwara, Y., Taniguchi, T.: Symbol emergence as inter-personal categorization with head-to-head latent word. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 60–67 (2022)
Hagiwara, Y., Furukawa, K., Taniguchi, A., Taniguchi, T.: Multiagent multimodal categorization for symbol emergence: emergent communication via interpersonal cross-modal inference. Adv. Robot. 36(5–6), 239–260 (2022)
Hagiwara, Y., Inoue, M., Kobayashi, H., Taniguchi, T.: Hierarchical spatial concept formation based on multimodal information for human support robots. Front. Neurorobot. 12(11), 1–16 (2018)
Hagiwara, Y., Kobayashi, H., Taniguchi, A., Taniguchi, T.: Symbol emergence as an interpersonal multimodal categorization. Front. Robot. AI 6(134), 1–17 (2019). https://doi.org/10.3389/frobt.2019.00134
Hohwy, J.: The Predictive Mind. OUP, Oxford (2013)
Huang, C.Z.A., et al.: Music transformer. arXiv preprint arXiv:1809.04281 (2018)
Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)
Jackendoff, R., Lerdahl, F.: A grammatical parallel between music and language. In: Clynes, M. (ed.) Music, Mind, and Brain, pp. 83–117. Springer, Cham (1982). https://doi.org/10.1007/978-1-4684-8917-0_5
Jiang, J., Xia, G.G., Carlton, D.B., Anderson, C.N., Miyakawa, R.H.: Transformer VAE: a hierarchical model for structure-aware and interpretable music representation learning. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2020, pp. 516–520. IEEE (2020)
Mochihashi, D., Sumita, E.: The infinite Markov model. In: Advances in Neural Information Processing Systems, vol. 20 (2007)
Nakamura, T., Ando, Y., Nagai, T., Kaneko, M.: Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015)
Nakamura, T., Araki, T., Nagai, T., Iwahashi, N.: Grounding of word meanings in LDA-based multimodal concepts. Adv. Robot. 25, 2189–2206 (2012)
Nakamura, T., Nagai, T., Funakoshi, K., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Mutual learning of an object concept and language model based on MLDA and NPYLM. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 600–607 (2014)
Nakamura, T., Nagai, T., Iwahashi, N.: Multimodal object categorization by a robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2415–2420 (2007). https://doi.org/10.1109/IROS.2007.4399634
Nakamura, T., Nagai, T., Iwahashi, N.: Bag of multimodal hierarchical Dirichlet processes: model of complex conceptual structure for intelligent robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3818–3823 (2012). https://doi.org/10.1109/IROS.2012.6385502
Nakamura, T., Nagai, T., Taniguchi, T.: Serket: an architecture for connecting stochastic models to realize a large-scale cognitive model. Front. Neurorobot. 12, 25 (2018)
van Niekerk, B., Nortje, L., Kamper, H.: Vector-quantized neural networks for acoustic unit discovery in the zerospeech 2020 challenge. arXiv preprint arXiv:2005.09409 (2020)
Okanoya, K.: Language evolution and an emergent property. Curr. Opin. Neurobiol. 17(2), 271–276 (2007). https://doi.org/10.1016/j.conb.2007.03.011
Okanoya, K.: Sexual communication and domestication may give rise to the signal complexity necessary for the emergence of language: an indication from songbird studies. Psychon. Bull. Rev. 24(1), 106–110 (2017)
Okanoya, K., Merker, B.: Neural substrates for string-context mutual segmentation: a path to human language. In: Lyon, C., Nehaniv, C.L., Cangelosi, A. (eds.) Emergence of Communication and Language, pp. 421–434. Springer, London (2007). https://doi.org/10.1007/978-1-84628-779-4_22
Okuda, Y., Ozaki, R., Komura, S., Taniguchi, T.: Double articulation analyzer with prosody for unsupervised word and phone discovery. IEEE Trans. Cogn. Dev. Syst. (2022). https://doi.org/10.1109/TCDS.2022.3210751
Peirce, C.S.: Collected Writings. Harvard University Press, Cambridge (1931–1958)
Saffran, J.R., Newport, E.L., Aslin, R.N.: Word segmentation: the role of distributional cues. J. Mem. Lang. 35(4), 606–621 (1996)
Seth, A.K.: Interoceptive inference, emotion, and the embodied self. Trends Cogn. Sci. 17(11), 565–573 (2013)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Shirai, A., Taniguchi, T.: A proposal of an interactive music composition system using Gibbs sampler. In: Jacko, J.A. (ed.) HCI 2011. LNCS, vol. 6761, pp. 490–497. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21602-2_53
Shirai, A., Taniguchi, T.: A proposal of the melody generation method using variable-order Pitman-Yor language model. J. Jpn. Soc. Fuzzy Theory Intell. Inform. 25(6), 901–913 (2013). https://doi.org/10.3156/jsoft.25.901
Sternin, A., McGarry, L.M., Owen, A.M., Grahn, J.A.: The effect of familiarity on neural representations of music and language. J. Cogn. Neurosci. 33(8), 1595–1611 (2021)
Suzuki, M., Matsuo, Y.: A survey of multimodal deep generative models. Adv. Robot. 36(5–6), 261–278 (2022)
Taniguchi, A., Fukawa, A., Yamakawa, H.: Hippocampal formation-inspired probabilistic generative model. Neural Netw. 151, 317–335 (2022)
Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Online spatial concept and lexical acquisition with simultaneous localization and mapping. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 811–818 (2017)
Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Improved and scalable online learning of spatial concepts and language models with mapping. Auton. Robot. 44(6), 927–946 (2020). https://doi.org/10.1007/s10514-020-09905-0
Taniguchi, A., Isobe, S., El Hafi, L., Hagiwara, Y., Taniguchi, T.: Autonomous planning based on spatial concepts to tidy up home environments with service robots. Adv. Robot. 35(8), 471–489 (2021)
Taniguchi, A., Murakami, H., Ozaki, R., Taniguchi, T.: Unsupervised multimodal word discovery based on double articulation analysis with co-occurrence cues. arXiv preprint arXiv:2201.06786 (2022)
Taniguchi, A., Muro, M., Yamakawa, H., Taniguchi, T.: Brain-inspired probabilistic generative model for double articulation analysis of spoken language. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 107–114 (2022)
Taniguchi, A., Taniguchi, T., Inamura, T.: Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences. IEEE Trans. Cogn. Dev. Syst. 8(4), 285–297 (2016)
Taniguchi, A., Taniguchi, T., Inamura, T.: Unsupervised spatial lexical acquisition by updating a language model with place clues. Robot. Auton. Syst. 99, 166–180 (2018)
Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., Asoh, H.: Symbol emergence in robotics: a survey. Adv. Robot. 30(11–12), 706–728 (2016)
Taniguchi, T., Nagasaka, S., Nakashima, R.: Nonparametric Bayesian double articulation analyzer for direct language acquisition from continuous speech signals. IEEE Trans. Cogn. Dev. Syst. 8(3), 171–185 (2016). https://doi.org/10.1109/TCDS.2016.2550591
Taniguchi, T., et al.: Neuro-serket: development of integrative cognitive system through the composition of deep probabilistic generative models. N. Gener. Comput. 38(1), 23–48 (2020)
Taniguchi, T., Nakashima, R., Liu, H., Nagasaka, S.: Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals. Adv. Robot. 30(11–12), 770–783 (2016). https://doi.org/10.1080/01691864.2016.1159981
Taniguchi, T., Sawaragi, T.: Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network. Adv. Robot. 21(10), 1177–1199 (2007)
Taniguchi, T., et al.: Symbol emergence in cognitive developmental systems: a survey. IEEE Trans. Cogn. Dev. Syst. 11, 494–516 (2018)
Taniguchi, T., et al.: A whole brain probabilistic generative model: toward realizing cognitive architectures for developmental robots. Neural Netw. 150, 293–312 (2022)
Taniguchi, T., Yoshida, Y., Taniguchi, A., Hagiwara, Y.: Emergent communication through metropolis-hastings naming game with deep generative models. arXiv preprint arXiv:2205.12392 (2022)
Taniguchi, T., Yoshino, R., Takano, T.: Multimodal hierarchical Dirichlet process-based active perception by a robot. Front. Neurorobot. 12, 22 (2018)
Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: zerospeech 2020 challenge. arXiv preprint arXiv:2005.11676 (2020)
Von Uexküll, J.: A stroll through the worlds of animals and men: a picture book of invisible worlds. Semiotica 89(4), 319–391 (1992)
Vuust, P., Heggli, O.A., Friston, K.J., Kringelbach, M.L.: Music in the brain. Nat. Rev. Neurosci. 23(5), 287–305 (2022)
Yamakawa, H., Osawa, M., Matsuo, Y.: Whole brain architecture approach is a feasible way toward an artificial general intelligence. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9947, pp. 275–281. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46687-3_30
Acknowledgments
This paper was written as a post-proceedings paper for the keynote speech titled “Generative Models for Symbol Emergence based on Real-World Sensory-motor Information and Communication” presented at the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR) 2021. This work was supported by JSPS KAKENHI Grant Numbers JP16H06569 and JP21H04904.