ABSTRACT
In this paper, we introduce an interactive background music synthesis algorithm guided by visual content. We adopt a cascading strategy that synthesizes background music in two stages: scene visual analysis and background music synthesis. First, we use deep neural networks to analyze the sentiment of the input scene. Second, real-time background music is synthesized by optimizing a cost function that guides the selection and transition of music clips, maximizing both the emotional consistency between the visual and auditory channels and the continuity of the music. Our experiments demonstrate that the proposed approach can synthesize dynamic background music for different types of scenarios. We also conducted quantitative and qualitative analyses of the synthesized results for multiple example scenes to validate the efficacy of our approach.
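To make the second stage more concrete, below is a minimal, hypothetical sketch of cost-driven clip selection, assuming each clip is annotated with a valence-arousal emotion vector and a tempo. The weights, features, and transition penalty are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Hypothetical sketch: choose the next music clip by minimizing a cost that
# trades off emotion consistency with the scene against transition smoothness.
# All names (emotion vectors, tempo feature, weights) are illustrative
# assumptions, not the paper's actual cost function.
import numpy as np

def transition_cost(prev_clip, cand_clip):
    """Penalize abrupt changes between consecutive clips (here: tempo gap)."""
    return abs(prev_clip["tempo"] - cand_clip["tempo"]) / 200.0

def select_next_clip(scene_emotion, prev_clip, library, w_emotion=1.0, w_trans=0.5):
    """Return the library clip minimizing the combined cost."""
    def cost(clip):
        # Emotion term: distance between the scene's predicted emotion
        # (from the visual analysis stage) and the clip's annotation.
        emotion_term = np.linalg.norm(scene_emotion - clip["emotion"])
        # Continuity term: only applies once a previous clip is playing.
        trans_term = transition_cost(prev_clip, clip) if prev_clip else 0.0
        return w_emotion * emotion_term + w_trans * trans_term
    return min(library, key=cost)

# Example usage with toy valence-arousal annotations.
library = [
    {"name": "calm_loop", "emotion": np.array([0.2, -0.3]), "tempo": 80},
    {"name": "tense_loop", "emotion": np.array([-0.5, 0.6]), "tempo": 140},
]
scene_emotion = np.array([0.1, -0.2])  # output of the visual sentiment stage
clip = select_next_clip(scene_emotion, None, library)
print(clip["name"])  # -> "calm_loop"
```

Under this reading, re-evaluating the cost as the scene's predicted sentiment changes is what lets the synthesized soundtrack adapt in real time while avoiding jarring transitions.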
Supplemental Material
Supplementary material for "Scene-Aware Background Music Synthesis", including (i) virtual scene panoramas and (ii) detailed statistical results of the qualitative experiments.