ABSTRACT
In this paper, we introduce an interactive background music synthesis algorithm guided by visual content. We adopt a cascading strategy that synthesizes background music in two stages: scene visual analysis and background music synthesis. First, we use deep neural networks to analyze the sentiment of the input scene. Second, real-time background music is synthesized by optimizing a cost function that guides the selection and transition of music clips, maximizing both the emotional consistency between the visual and auditory channels and the continuity of the music. Our experiments demonstrate that the proposed approach can synthesize dynamic background music for different types of scenarios. We also conducted quantitative and qualitative analyses of the synthesized results for multiple example scenes to validate the efficacy of our approach.
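To make the second stage more concrete, below is a minimal, hypothetical sketch of cost-driven clip selection, assuming each clip is annotated with a valence-arousal emotion vector and a tempo. The weights, features, and transition penalty are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Hypothetical sketch: choose the next music clip by minimizing a cost that
# trades off emotion consistency with the scene against transition smoothness.
# All names (emotion vectors, tempo feature, weights) are illustrative
# assumptions, not the paper's actual cost function.
import numpy as np

def transition_cost(prev_clip, cand_clip):
    """Penalize abrupt changes between consecutive clips (here: tempo gap)."""
    return abs(prev_clip["tempo"] - cand_clip["tempo"]) / 200.0

def select_next_clip(scene_emotion, prev_clip, library, w_emotion=1.0, w_trans=0.5):
    """Return the library clip minimizing the combined cost."""
    def cost(clip):
        # Emotion term: distance between the scene's predicted emotion
        # (from the visual analysis stage) and the clip's annotation.
        emotion_term = np.linalg.norm(scene_emotion - clip["emotion"])
        # Continuity term: only applies once a previous clip is playing.
        trans_term = transition_cost(prev_clip, clip) if prev_clip else 0.0
        return w_emotion * emotion_term + w_trans * trans_term
    return min(library, key=cost)

# Example usage with toy valence-arousal annotations.
library = [
    {"name": "calm_loop", "emotion": np.array([0.2, -0.3]), "tempo": 80},
    {"name": "tense_loop", "emotion": np.array([-0.5, 0.6]), "tempo": 140},
]
scene_emotion = np.array([0.1, -0.2])  # output of the visual sentiment stage
clip = select_next_clip(scene_emotion, None, library)
print(clip["name"])  # -> "calm_loop"
```

Under this reading, re-evaluating the cost as the scene's predicted sentiment changes is what lets the synthesized soundtrack adapt in real time while avoiding jarring transitions.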
Supplemental Material
Supplementary material for "Scene-Aware Background Music Synthesis", including (i) virtual scene panoramas and (ii) detailed statistical results of the qualitative experiments.