Audio Generation from Scene Considering Its Emotion Aspect

Sergio, Gwenaelle Cunha; Lee, Minho

doi:10.1007/978-3-319-46672-9_9

Gwenaelle Cunha Sergio & Minho Lee

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9948)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Scenes, like music, can convey emotion. If so, it may be possible, given an image, to generate music that evokes a similar emotional reaction from users. The challenge lies in how to do that. In this paper, we use the Hue, Saturation and Lightness features of image samples extracted from video excerpts, together with the tempo, loudness and rhythm of audio samples extracted from the same excerpts, to train a group of neural networks, including a Recurrent Neural Network and a Neuro-Fuzzy Network, and obtain an audio signal intended to evoke a similar emotional response in a listener. This work could contribute to the field of Human-Computer Interaction by improving the interaction between computers and humans. Experimental results show that this model effectively produces audio that matches the video and evokes a similar emotion in the viewer.
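
As an illustration of the image-side features named in the abstract, the following is a minimal sketch of how average Hue, Saturation and Lightness values might be computed for one image sample. It is not the authors' implementation: the function name `mean_hsl`, the use of a circular mean for hue, and the input format (a list of 8-bit RGB tuples) are all assumptions made for this example.

```python
import colorsys
import math


def mean_hsl(pixels):
    """Average (hue, saturation, lightness) over 8-bit RGB pixels.

    Hypothetical helper for illustration only. All three returned values
    lie in the range 0..1.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")

    sin_sum = cos_sum = sat_sum = light_sum = 0.0
    for r, g, b in pixels:
        # colorsys returns (hue, lightness, saturation), in that order.
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        # Hue is an angle, so it is averaged on the unit circle
        # (assumption: a plain arithmetic mean would mishandle wrap-around).
        angle = 2.0 * math.pi * h
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
        sat_sum += s
        light_sum += l

    n = len(pixels)
    hue = (math.atan2(sin_sum, cos_sum) / (2.0 * math.pi)) % 1.0
    return hue, sat_sum / n, light_sum / n


if __name__ == "__main__":
    # A tiny synthetic "image" made of two reddish pixels.
    print(mean_hsl([(255, 0, 0), (200, 30, 30)]))
```

A feature vector of this kind, computed per frame, is one plausible input to the networks the abstract describes; how the paper actually samples and aggregates pixels is not stated on this page.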


Notes

1. https://www.youtube.com/user/lindseystomp


Acknowledgments

This work was supported by the Industrial Strategic Technology Development Program (10044009) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea).

Author information

Authors and Affiliations

School of Electronics Engineering, Kyungpook National University, 1370 Sankyuk-Dong, Taegu, 702-701, South Korea
Gwenaelle Cunha Sergio & Minho Lee

Authors

Gwenaelle Cunha Sergio
Minho Lee

Corresponding author

Correspondence to Minho Lee.

Editor information

Editors and Affiliations

The University of Tokyo, Tokyo, Japan
Akira Hirose
Kobe University, Kobe, Japan
Seiichi Ozawa
Okinawa Institute of Science and Technology Graduate University, Onna, Japan
Kenji Doya
Nara Institute of Science and Technology, Ikoma, Japan
Kazushi Ikeda
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Chinese Academy of Sciences, Beijing, China
Derong Liu

About this paper

Cite this paper

Sergio, G.C., Lee, M. (2016). Audio Generation from Scene Considering Its Emotion Aspect. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9948. Springer, Cham. https://doi.org/10.1007/978-3-319-46672-9_9


DOI: https://doi.org/10.1007/978-3-319-46672-9_9
Published: 30 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46671-2
Online ISBN: 978-3-319-46672-9
eBook Packages: Computer Science, Computer Science (R0)
