SOUND OF(F): Contextual Storytelling Using Machine Learning Representations of Sound and Music

  • Conference paper
ArtsIT, Interactivity and Game Creation (ArtsIT 2021)

Abstract

In dreams, life experiences are jumbled together: a single character can stand for several people in the dreamer’s life, and sounds can run together without sequential order. To present memories in a dream in a more contextual way, we represent environments and sounds using machine learning approaches that take into account the totality of a complex dataset. The immersive environment uses machine learning to computationally cluster sounds into thematic scenes, allowing audiences to grasp the dimensions of complexity in a dream-like scenario. We applied the t-SNE algorithm to collections of music and voice sequences to explore how interactions in immersive space can convert temporal sound data into spatial interactions. We designed both 2D and 3D interactions, as well as headspace vs. controller interactions, in two case studies: one on segmenting a single work of music and one on a collection of sound fragments. We applied the approach to a Virtual Reality (VR) artwork about replaying memories in a dream. We found that, through the machine-learning-generated soundscapes, audiences can enrich their experience of the story without necessarily gaining an understanding of the artwork. This provides a method for experiencing temporal sound sequences spatially, through nonlinear exploration in VR.
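As a rough illustration of how t-SNE can turn a temporal sound sequence into spatial positions, the sketch below cuts a signal into fixed-length segments, describes each segment with a simple feature vector, and embeds the segments in 3D. This is not the authors' pipeline: the log band-energy features, the segment length, the perplexity, the synthetic two-tone signal, and the helper names `segment_features` and `embed_segments` are all assumptions made for illustration. Only scikit-learn's `TSNE` and NumPy's FFT routines are real library calls.

```python
import numpy as np
from sklearn.manifold import TSNE


def segment_features(signal, sr, seg_seconds=0.5, n_bands=16):
    """Split a mono signal into fixed-length segments and describe each
    segment by log energies in n_bands equal-width frequency bands.
    (Hypothetical stand-in for a real audio descriptor.)"""
    seg_len = int(sr * seg_seconds)
    n_segs = len(signal) // seg_len
    window = np.hanning(seg_len)
    feats = []
    for i in range(n_segs):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        power = np.abs(np.fft.rfft(seg * window)) ** 2
        bands = np.array_split(power, n_bands)
        feats.append(np.log1p([band.sum() for band in bands]))
    return np.array(feats)


def embed_segments(features, n_components=3, perplexity=10.0, seed=0):
    """Map each segment's feature vector to a low-dimensional coordinate.
    Perplexity must be smaller than the number of segments."""
    tsne = TSNE(n_components=n_components, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(features)


# Synthetic 20-second signal: a low tone followed by a high tone, plus noise.
sr = 8000
t = np.arange(sr * 20) / sr
rng = np.random.default_rng(0)
freqs = np.where(t < 10, 220.0, 1760.0)
signal = np.sin(2 * np.pi * freqs * t) + 0.05 * rng.standard_normal(t.size)

features = segment_features(signal, sr)   # one row per 0.5 s segment
coords = embed_segments(features)         # one 3D position per segment
print(features.shape, coords.shape)
```

Each row of `coords` could serve as the position of one sound segment in a scene, so that segments with similar spectra tend to land near one another regardless of when they occurred in the recording. Setting `n_components=2` would give a flat layout instead.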

Change history

  • 07 May 2022

    In the original version of this book the name of LC Ray was incorrect, which has now been corrected.

Author information

Corresponding author

Correspondence to Ray LC.

Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Erol, Z., Zhang, Z., Özgünay, E., LC, R. (2022). SOUND OF(F): Contextual Storytelling Using Machine Learning Representations of Sound and Music. In: Wölfel, M., Bernhardt, J., Thiel, S. (eds) ArtsIT, Interactivity and Game Creation. ArtsIT 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 422. Springer, Cham. https://doi.org/10.1007/978-3-030-95531-1_23

  • DOI: https://doi.org/10.1007/978-3-030-95531-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95530-4

  • Online ISBN: 978-3-030-95531-1

  • eBook Packages: Computer Science, Computer Science (R0)
