Research article · MM '20 · DOI: 10.1145/3394171.3413894

Scene-Aware Background Music Synthesis

Published: 12 October 2020

ABSTRACT

In this paper, we introduce an interactive background music synthesis algorithm guided by visual content. We adopt a cascading strategy that synthesizes background music in two stages: Scene Visual Analysis and Background Music Synthesis. First, we leverage deep neural networks to analyze the sentiment of the input scene. Second, real-time background music is synthesized by optimizing a cost function that guides the selection and transition of music clips, maximizing the emotional consistency between the visual and auditory channels while preserving musical continuity. In our experiments, we demonstrate that the proposed approach can synthesize dynamic background music for different types of scenarios. We also conduct quantitative and qualitative analyses of the synthesized results on multiple example scenes to validate the efficacy of our approach.
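The abstract describes the second stage only at a high level. As a rough illustration of a cost that trades off emotion consistency against continuity when choosing the next clip, here is a minimal sketch; everything in it (emotion vectors as valence/arousal pairs, tempo as the continuity feature, the weights, and all function names) is an assumption for illustration, not the authors' actual cost function or optimizer.

```python
# Hypothetical sketch (NOT the authors' implementation): greedy, cost-based
# selection of the next music clip. Assumes each clip carries a precomputed
# emotion vector (e.g., valence/arousal) and a tempo used as a simple
# continuity feature at clip boundaries.
import math

def emotion_distance(scene_emotion, clip_emotion):
    """Distance between the scene's and the clip's emotion vectors."""
    return math.dist(scene_emotion, clip_emotion)

def transition_cost(prev_clip, next_clip):
    """Penalty for an abrupt boundary; tempo mismatch is an assumed proxy."""
    if prev_clip is None:  # first clip of the track: nothing to join
        return 0.0
    return abs(prev_clip["tempo"] - next_clip["tempo"]) / 60.0

def select_next_clip(scene_emotion, prev_clip, library,
                     w_emotion=1.0, w_continuity=0.5):
    """Pick the clip minimizing the weighted emotion + continuity cost."""
    def cost(clip):
        return (w_emotion * emotion_distance(scene_emotion, clip["emotion"])
                + w_continuity * transition_cost(prev_clip, clip))
    return min(library, key=cost)

# Toy usage: scene emotion given as (valence, arousal) in [-1, 1].
library = [
    {"name": "calm_loop", "emotion": (0.2, 0.1), "tempo": 80},
    {"name": "tense_loop", "emotion": (-0.6, 0.8), "tempo": 140},
]
print(select_next_clip((0.3, 0.2), None, library)["name"])  # -> calm_loop
```

The paper frames clip selection and transitions as a joint optimization; a greedy per-step choice like the one above is only the simplest possible stand-in for that.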


Supplemental Material

3394171.3413894.mp4 (MP4, 112.9 MB)


Published in

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020 · 4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%
