On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems Based on Probabilistic Generative Models

  • Conference paper
Music in the AI Era (CMMR 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13770)

Abstract

Music and language are structurally similar, and this structural similarity is often explained by generative processes. This paper describes recent developments in probabilistic generative models (PGMs) for language learning and symbol emergence in robotics. Symbol emergence in robotics aims to develop a robot that can adapt to real-world environments and human linguistic communication and acquire language from sensorimotor information alone (i.e., in an unsupervised manner). This is regarded as a constructive approach to symbol emergence systems. To this end, a series of PGMs have been developed, including those for simultaneous phoneme and word discovery, lexical acquisition, object and spatial concept formation, and the emergence of a symbol system. By extending these models, it is shown that a symbol emergence system, i.e., a multi-agent system in which a symbol system emerges, can itself be modeled using PGMs. In this model, symbol emergence can be regarded as collective predictive coding. This paper expands on this idea by combining the theory that “emotion is based on the predictive coding of interoceptive signals” with the notion of symbol emergence systems, and describes a possible hypothesis about the emergence of meaning in music.
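
As a concrete illustration of the claim that symbol emergence can be regarded as collective predictive coding, the following toy sketch plays a Metropolis-Hastings naming game between two agents, in the spirit of the models discussed in the paper [25, 27, 68]. Every concrete choice here (a fixed inventory of three signs, Gaussian toy categories, a simple per-sign Gaussian agent model) is an assumption introduced for illustration only, not the authors' model.

# Toy Metropolis-Hastings naming game between two agents. All modeling
# choices below are illustrative assumptions, not the cited models.
import numpy as np

rng = np.random.default_rng(0)
N_SIGNS, N_OBJECTS, DIM = 3, 60, 2

# Objects belong to one of three latent categories; each agent gets its own
# noisy sensory view of every object.
true_cat = rng.integers(N_SIGNS, size=N_OBJECTS)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
obs_a = centers[true_cat] + rng.normal(0.0, 0.6, (N_OBJECTS, DIM))
obs_b = centers[true_cat] + rng.normal(0.0, 0.6, (N_OBJECTS, DIM))

def sign_likelihood(agent, idx, sign):
    # Unnormalized likelihood of observation idx under the agent's model of
    # `sign`: a unit-variance Gaussian centered on the mean of the
    # observations the agent currently labels with that sign.
    members = agent["obs"][agent["signs"] == sign]
    mu = members.mean(axis=0) if len(members) else np.zeros(DIM)
    return np.exp(-0.5 * np.sum((agent["obs"][idx] - mu) ** 2))

agents = [
    {"obs": obs_a, "signs": rng.integers(N_SIGNS, size=N_OBJECTS)},
    {"obs": obs_b, "signs": rng.integers(N_SIGNS, size=N_OBJECTS)},
]

for it in range(50):
    speaker, listener = agents[it % 2], agents[(it + 1) % 2]
    for idx in range(N_OBJECTS):
        proposal = speaker["signs"][idx]   # the speaker names the object
        current = listener["signs"][idx]
        # Metropolis-Hastings acceptance: the listener judges the proposed
        # sign only by how well it explains its *own* observation.
        ratio = sign_likelihood(listener, idx, proposal) / (
            sign_likelihood(listener, idx, current) + 1e-12
        )
        if rng.random() < min(1.0, ratio):
            listener["signs"][idx] = proposal

agreement = np.mean(agents[0]["signs"] == agents[1]["signs"])
print(f"sign agreement between the two agents: {agreement:.2f}")

Over repeated exchanges, the acceptance step lets each agent revise its sign assignments only when the other's proposal is also predictable from its own observations, which is the sense in which the emergent symbol system can be read as a joint, decentralized inference rather than a convention imposed by either agent.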


Notes

  1. Thus, integrating various forms of predictive coding with PGMs is essential for modeling integrative human cognitive systems. SERKET was proposed as a framework for this purpose [39, 63]. Recently, this idea was extended to a whole-brain PGM, which aims to build a cognitive model covering the entire brain by combining PGMs with anatomical knowledge of brain architecture [67]. This approach is known as the whole-brain architecture approach [73]. Following this idea, the anatomical validity of the above NPB-DAA for spoken language acquisition and of SLAM-based place recognition was also examined from the viewpoint of the brain [53, 58]. (A schematic sketch of this module-composition idea is given after these notes.)

  2. Recently, this idea has been developed into large-scale language models based on transformers, whose generality and performance have become widely known.
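
To make the module-composition idea in Note 1 concrete, the sketch below connects two toy observation modules (a hypothetical "vision" channel and "sound" channel, both invented here for illustration) through a shared latent category variable that they refine by exchanging likelihood messages. This is only a schematic of the decomposition-and-communication principle behind composing PGMs; it does not use or reproduce the SERKET API.

# Schematic sketch of composing two probabilistic modules through a shared
# latent variable. The modules, their Gaussian observation models, and the
# toy data are assumptions made only for illustration.
import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 30  # number of shared categories, number of observations

# Two modalities generated from the same latent category.
z_true = rng.integers(K, size=N)
x_vision = z_true + rng.normal(0.0, 0.3, N)        # toy "visual" feature
x_sound = 2.0 * z_true + rng.normal(0.0, 0.5, N)   # toy "auditory" feature

def module_message(x, z_est, sigma):
    # One module's message over the shared variable z: it re-estimates its
    # per-category means from the current shared assignment and returns the
    # (N, K) matrix of Gaussian likelihoods p(x_n | z_n = k).
    means = np.array([
        x[z_est == k].mean() if np.any(z_est == k) else 0.0 for k in range(K)
    ])
    return np.exp(-0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2)

# Alternate message exchange: neither module sees the other's raw data,
# only its message about the shared latent variable.
z_est = rng.integers(K, size=N)
for _ in range(20):
    post = module_message(x_vision, z_est, 0.3) * module_message(x_sound, z_est, 0.5)
    z_est = post.argmax(axis=1)  # hard update of the shared assignment

# Category labels may be permuted relative to z_true; what matters is that
# the two modules converge on a consistent shared assignment.
print("estimated:", z_est[:10])
print("true:     ", z_true[:10])

In an actual composed model, the exchanged messages would typically be posterior samples or parameters passed between independently implemented inference procedures, which is what allows heterogeneous modules, such as a speech module and a place-recognition module, to be combined without rewriting their internals.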

References

  1. Akbari, M., Liang, J.: Semi-recurrent CNN-based VAE-GAN for sequential data generation. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2321–2325. IEEE (2018)

  2. Ando, Y., Nakamura, T., Araki, T., Nagai, T.: Formation of hierarchical object concept using hierarchical latent Dirichlet allocation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2272–2279 (2013)

  3. Araki, T., Nakamura, T., Nagai, T., Funakoshi, K., Nakano, M., Iwahashi, N.: Autonomous acquisition of multimodal information for online object concept formation by robots. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1540–1547 (2011). https://doi.org/10.1109/IROS.2011.6048422

  4. Araki, T., Nakamura, T., Nagai, T., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Online learning of concepts and words using multimodal LDA and hierarchical Pitman-Yor Language Model. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1623–1630 (2012). https://doi.org/10.1109/IROS.2012.6385812

  5. Asano, R., Boeckx, C.: Syntax in language and music: what is the right level of comparison? Front. Psychol. 6, 942 (2015)

  6. Atherton, R.P., et al.: Shared processing of language and music: evidence from a cross-modal interference paradigm. Exp. Psychol. 65(1), 40 (2018)

  7. Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460 (2020)

  8. Barrett, L.F., Simmons, W.K.: Interoceptive predictions in the brain. Nat. Rev. Neurosci. 16(7), 419–429 (2015)

  9. Barsalou, L.W.: Perceptual symbol systems. Behav. Brain Sci. 22(04), 1–16 (1999). https://doi.org/10.1017/S0140525X99002149

  10. Berwick, R.C., Beckers, G.J., Okanoya, K., Bolhuis, J.J.: A bird’s eye view of human language evolution. Front. Evol. Neurosci. 4, 5 (2012)

  11. Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, Heidelberg (2006)

  12. Bommasani, R., et al.: On the opportunities and risks of foundation models (2021). https://doi.org/10.48550/ARXIV.2108.07258

  13. Briot, J.P., Hadjeres, G., Pachet, F.D.: Deep learning techniques for music generation–a survey. arXiv preprint arXiv:1709.01620 (2017)

  14. Brown, S.: Are music and language homologues? Ann. N. Y. Acad. Sci. 930(1), 372–374 (2001)

  15. Brown, S., Martinez, M.J., Parsons, L.M.: Music and language side by side in the brain: a pet study of the generation of melodies and sentences. Eur. J. Neurosci. 23(10), 2791–2803 (2006)

  16. Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)

  17. Cangelosi, A., Schlesinger, M.: Developmental Robotics. The MIT Press, Cambridge (2015)

  18. Chandler, D.: Semiotics the Basics. Routledge, Milton Park (2002)

  19. Diéguez, P.L., Soo, V.W.: Variational autoencoders for polyphonic music interpolation. In: 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 56–61 (2020)

  20. Dunbar, E., et al.: The zero resource speech challenge 2017. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 323–330 (2017)

  21. Feld, S., Fox, A.A.: Music and language. Ann. Rev. Anthropol. 23, 25–53 (1994)

  22. Flavell, J.H.: The Developmental Psychology of Jean Piaget. Literary Licensing, LLC (2011)

  23. Friston, K., Moran, R.J., Nagai, Y., Taniguchi, T., Gomi, H., Tenenbaum, J.: World model learning and inference. Neural Netw. 144, 573–590 (2021)

  24. Furukawa, K., Taniguchi, A., Hagiwara, Y., Taniguchi, T.: Symbol emergence as inter-personal categorization with head-to-head latent word. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 60–67 (2022)

  25. Hagiwara, Y., Furukawa, K., Taniguchi, A., Taniguchi, T.: Multiagent multimodal categorization for symbol emergence: emergent communication via interpersonal cross-modal inference. Adv. Robot. 36(5–6), 239–260 (2022)

  26. Hagiwara, Y., Inoue, M., Kobayashi, H., Taniguchi, T.: Hierarchical spatial concept formation based on multimodal information for human support robots. Front. Neurorobot. 12(11), 1–16 (2018)

  27. Hagiwara, Y., Kobayashi, H., Taniguchi, A., Taniguchi, T.: Symbol emergence as an interpersonal multimodal categorization. Front. Robot. AI 6(134), 1–17 (2019). https://doi.org/10.3389/frobt.2019.00134

  28. Hohwy, J.: The Predictive Mind. OUP, Oxford (2013)

  29. Huang, C.Z.A., et al.: Music transformer. arXiv preprint arXiv:1809.04281 (2018)

  30. Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)

  31. Jackendoff, R., Lerdahl, F.: A grammatical parallel between music and language. In: Clynes, M. (ed.) Music, Mind, and Brain, pp. 83–117. Springer, Cham (1982). https://doi.org/10.1007/978-1-4684-8917-0_5

  32. Jiang, J., Xia, G.G., Carlton, D.B., Anderson, C.N., Miyakawa, R.H.: Transformer VAE: a hierarchical model for structure-aware and interpretable music representation learning. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 516–520. IEEE (2020)

  33. Mochihashi, D., Sumita, E.: The infinite Markov model. In: Advances in Neural Information Processing Systems, vol. 20 (2007)

  34. Nakamura, T., Ando, Y., Nagai, T., Kaneko, M.: Concept formation by robots using an infinite mixture of models. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2015)

  35. Nakamura, T., Araki, T., Nagai, T., Iwahashi, N.: Grounding of word meanings in LDA-based multimodal concepts. Adv. Robot. 25, 2189–2206 (2012)

  36. Nakamura, T., Nagai, T., Funakoshi, K., Nagasaka, S., Taniguchi, T., Iwahashi, N.: Mutual learning of an object concept and language model based on MLDA and NPYLM. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 600–607 (2014)

  37. Nakamura, T., Nagai, T., Iwahashi, N.: Multimodal object categorization by a robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2415–2420 (2007). https://doi.org/10.1109/IROS.2007.4399634

  38. Nakamura, T., Nagai, T., Iwahashi, N.: Bag of multimodal hierarchical Dirichlet processes: model of complex conceptual structure for intelligent robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3818–3823 (2012). https://doi.org/10.1109/IROS.2012.6385502

  39. Nakamura, T., Nagai, T., Taniguchi, T.: Serket: an architecture for connecting stochastic models to realize a large-scale cognitive model. Front. Neurorobot. 12, 25 (2018)

  40. van Niekerk, B., Nortje, L., Kamper, H.: Vector-quantized neural networks for acoustic unit discovery in the zerospeech 2020 challenge. arXiv preprint arXiv:2005.09409 (2020)

  41. Okanoya, K.: Language evolution and an emergent property. Curr. Opin. Neurobiol. 17(2), 271–276 (2007). https://doi.org/10.1016/j.conb.2007.03.011

  42. Okanoya, K.: Sexual communication and domestication may give rise to the signal complexity necessary for the emergence of language: an indication from songbird studies. Psychon. Bull. Rev. 24(1), 106–110 (2017)

  43. Okanoya, K., Merker, B.: Neural substrates for string-context mutual segmentation: a path to human language. In: Lyon, C., Nehaniv, C.L., Cangelosi, A. (eds.) Emergence of Communication and Language, pp. 421–434. Springer, London (2007). https://doi.org/10.1007/978-1-84628-779-4_22

  44. Okuda, Y., Ozaki, R., Komura, S., Taniguchi, T.: Double articulation analyzer with prosody for unsupervised word and phone discovery. IEEE Trans. Cogn. Dev. Syst. (2022). https://doi.org/10.1109/TCDS.2022.3210751

  45. Peirce, C.S.: Collected Writings. Harvard University Press, Cambridge (1931–1958)

  46. Saffran, J.R., Newport, E.L., Aslin, R.N.: Word segmentation: the role of distributional cues. J. Mem. Lang. 35(4), 606–621 (1996)

  47. Seth, A.K.: Interoceptive inference, emotion, and the embodied self. Trends Cogn. Sci. 17(11), 565–573 (2013)

  48. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)

  49. Shirai, A., Taniguchi, T.: A proposal of an interactive music composition system using Gibbs sampler. In: Jacko, J.A. (ed.) HCI 2011. LNCS, vol. 6761, pp. 490–497. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21602-2_53

  50. Shirai, A., Taniguchi, T.: A proposal of the melody generation method using variable-order Pitman-Yor language model. J. Jpn. Soc. Fuzzy Theory Intell. Inform. 25(6), 901–913 (2013). https://doi.org/10.3156/jsoft.25.901

  51. Sternin, A., McGarry, L.M., Owen, A.M., Grahn, J.A.: The effect of familiarity on neural representations of music and language. J. Cogn. Neurosci. 33(8), 1595–1611 (2021)

  52. Suzuki, M., Matsuo, Y.: A survey of multimodal deep generative models. Adv. Robot. 36(5–6), 261–278 (2022)

  53. Taniguchi, A., Fukawa, A., Yamakawa, H.: Hippocampal formation-inspired probabilistic generative model. Neural Netw. 151, 317–335 (2022)

  54. Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Online spatial concept and lexical acquisition with simultaneous localization and mapping. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 811–818 (2017)

  55. Taniguchi, A., Hagiwara, Y., Taniguchi, T., Inamura, T.: Improved and scalable online learning of spatial concepts and language models with mapping. Auton. Robot. 44(6), 927–946 (2020). https://doi.org/10.1007/s10514-020-09905-0

  56. Taniguchi, A., Isobe, S., El Hafi, L., Hagiwara, Y., Taniguchi, T.: Autonomous planning based on spatial concepts to tidy up home environments with service robots. Adv. Robot. 35(8), 471–489 (2021)

  57. Taniguchi, A., Murakami, H., Ozaki, R., Taniguchi, T.: Unsupervised multimodal word discovery based on double articulation analysis with co-occurrence cues. arXiv preprint arXiv:2201.06786 (2022)

  58. Taniguchi, A., Muro, M., Yamakawa, H., Taniguchi, T.: Brain-inspired probabilistic generative model for double articulation analysis of spoken language. In: IEEE International Conference on Development and Learning (ICDL 2022), pp. 107–114 (2022)

  59. Taniguchi, A., Taniguchi, T., Inamura, T.: Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences. IEEE Trans. Cogn. Dev. Syst. 8(4), 285–297 (2016)

  60. Taniguchi, A., Taniguchi, T., Inamura, T.: Unsupervised spatial lexical acquisition by updating a language model with place clues. Robot. Auton. Syst. 99, 166–180 (2018)

  61. Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., Asoh, H.: Symbol emergence in robotics: a survey. Adv. Robot. 30(11–12), 706–728 (2016)

  62. Taniguchi, T., Nagasaka, S., Nakashima, R.: Nonparametric Bayesian double articulation analyzer for direct language acquisition from continuous speech signals. IEEE Trans. Cogn. Dev. Syst. 8(3), 171–185 (2016). https://doi.org/10.1109/TCDS.2016.2550591

  63. Taniguchi, T., et al.: Neuro-serket: development of integrative cognitive system through the composition of deep probabilistic generative models. N. Gener. Comput. 38(1), 23–48 (2020)

  64. Taniguchi, T., Nakashima, R., Liu, H., Nagasaka, S.: Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals. Adv. Robot. 30(11–12), 770–783 (2016). https://doi.org/10.1080/01691864.2016.1159981

  65. Taniguchi, T., Sawaragi, T.: Incremental acquisition of behaviors and signs based on a reinforcement learning schemata model and a spike timing-dependent plasticity network. Adv. Robot. 21(10), 1177–1199 (2007)

  66. Taniguchi, T., et al.: Symbol emergence in cognitive developmental systems: a survey. IEEE Trans. Cogn. Dev. Syst. 11, 494–516 (2018)

  67. Taniguchi, T., et al.: A whole brain probabilistic generative model: toward realizing cognitive architectures for developmental robots. Neural Netw. 150, 293–312 (2022)

  68. Taniguchi, T., Yoshida, Y., Taniguchi, A., Hagiwara, Y.: Emergent communication through metropolis-hastings naming game with deep generative models. arXiv preprint arXiv:2205.12392 (2022)

  69. Taniguchi, T., Yoshino, R., Takano, T.: Multimodal hierarchical Dirichlet process-based active perception by a robot. Front. Neurorobot. 12, 22 (2018)

  70. Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: zerospeech 2020 challenge. arXiv preprint arXiv:2005.11676 (2020)

  71. Von Uexküll, J.: A stroll through the worlds of animals and men: a picture book of invisible worlds. Semiotica 89(4), 319–391 (1992)

  72. Vuust, P., Heggli, O.A., Friston, K.J., Kringelbach, M.L.: Music in the brain. Nat. Rev. Neurosci. 23(5), 287–305 (2022)

  73. Yamakawa, H., Osawa, M., Matsuo, Y.: Whole brain architecture approach is a feasible way toward an artificial general intelligence. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9947, pp. 275–281. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46687-3_30


Acknowledgments

This paper was written as a post-proceedings paper for the keynote speech titled “Generative Models for Symbol Emergence based on Real-World Sensory-motor Information and Communication” presented at the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR) 2021. This work was supported by JSPS KAKENHI Grant Numbers JP16H06569 and JP21H04904.

Author information

Corresponding author

Correspondence to Tadahiro Taniguchi.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Taniguchi, T. (2023). On Parallelism in Music and Language: A Perspective from Symbol Emergence Systems Based on Probabilistic Generative Models. In: Aramaki, M., Hirata, K., Kitahara, T., Kronland-Martinet, R., Ystad, S. (eds) Music in the AI Era. CMMR 2021. Lecture Notes in Computer Science, vol. 13770. Springer, Cham. https://doi.org/10.1007/978-3-031-35382-6_2

  • DOI: https://doi.org/10.1007/978-3-031-35382-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35381-9

  • Online ISBN: 978-3-031-35382-6
