SOUND OF(F): Contextual Storytelling Using Machine Learning Representations of Sound and Music

Erol, Zeynep; Zhang, Zhiyuan; Özgünay, Eray; LC, Ray

doi:10.1007/978-3-030-95531-1_23

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 422)

Included in the following conference series:

International Conference on ArtsIT, Interactivity and Game Creation

The original version of this chapter was revised: Author name has been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-95531-1_32

Abstract

In dreams, one’s life experiences are jumbled together, so that characters can represent multiple people in one’s life and sounds can run together without sequential order. To show one’s memories in a dream in a more contextual way, we represent environments and sounds using machine learning approaches that take into account the totality of a complex dataset. The immersive environment uses machine learning to computationally cluster sounds into thematic scenes, allowing audiences to grasp the dimensions of complexity in a dream-like scenario. We applied the t-SNE algorithm to collections of music and voice sequences to explore how interactions in immersive space can convert temporal sound data into spatial interactions. We designed both 2D and 3D interactions, as well as headspace and controller interactions, in two case studies, one on segmenting a single work of music and one on a collection of sound fragments, and applied them to a Virtual Reality (VR) artwork about replaying memories in a dream. We found that the machine-learning-generated soundscapes can enrich audiences’ experience of the story without necessarily giving them an understanding of the artwork. This provides a method for experiencing temporal sound sequences spatially through nonlinear exploration in VR.
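
As a rough illustration of how t-SNE can turn a sequence of sound segments into spatial positions, the following minimal sketch embeds per-segment feature vectors into 3D coordinates that a VR scene could use as placement points. It is not the authors' pipeline: the feature vectors are synthetic stand-ins for audio descriptors, and the function name `embed_segments`, the perplexity value, and the rescaling to a unit cube are all assumptions made for this example. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_segments(features, n_components=3, perplexity=5.0, seed=0):
    """Map per-segment feature vectors to 2D or 3D points with t-SNE.

    `features` has shape (n_segments, n_features). The returned coordinates
    are rescaled to [0, 1] on each axis (an assumption of this sketch, so
    that the points fit inside a unit-sized scene volume).
    """
    features = np.asarray(features, dtype=np.float64)
    tsne = TSNE(
        n_components=n_components,
        perplexity=perplexity,  # must be smaller than the number of segments
        init="random",
        random_state=seed,
    )
    coords = tsne.fit_transform(features)
    span = np.ptp(coords, axis=0)
    span[span == 0] = 1.0  # avoid division by zero on a degenerate axis
    return (coords - coords.min(axis=0)) / span


# Hypothetical data: 3 groups of 10 segments, each described by 13 features.
# In practice these would be descriptors computed from audio segments.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 13))
features = np.vstack([c + rng.normal(size=(10, 13)) for c in centers])

positions = embed_segments(features, n_components=3)
print(positions.shape)
```

Segments with similar features tend to land near each other in the embedding, which is what allows a listener to move through related sounds spatially instead of in their original temporal order. Passing `n_components=2` gives the planar variant.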

Change history

07 May 2022
In the original version of this book the name of LC Ray was incorrect, which has now been corrected.

Author information

Authors and Affiliations

Hong Kong University of Science and Technology, Kowloon, Hong Kong
Zeynep Erol & Eray Özgünay
City University of Hong Kong, Kowloon, Hong Kong
Zhiyuan Zhang & Ray LC

Corresponding author

Correspondence to Ray LC.

Editor information

Editors and Affiliations

Karlsruhe University of Applied Sciences, Karlsruhe, Germany
Matthias Wölfel
Baden State Museum, Karlsruhe, Germany
Johannes Bernhardt
Baden State Museum, Karlsruhe, Germany
Sonja Thiel

About this paper

Cite this paper

Erol, Z., Zhang, Z., Özgünay, E., LC, R. (2022). SOUND OF(F): Contextual Storytelling Using Machine Learning Representations of Sound and Music. In: Wölfel, M., Bernhardt, J., Thiel, S. (eds) ArtsIT, Interactivity and Game Creation. ArtsIT 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 422. Springer, Cham. https://doi.org/10.1007/978-3-030-95531-1_23

DOI: https://doi.org/10.1007/978-3-030-95531-1_23
Published: 10 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95530-4
Online ISBN: 978-3-030-95531-1
eBook Packages: Computer Science, Computer Science (R0)
