Abstract
Processing generalized sound events with the purpose of predicting the emotion they might evoke is a relatively young research field. Tools, datasets, and methodologies to address such a challenging task are still under development, far from any standardized format. This work aims to fill this gap by revealing and exploiting potential similarities in the perception of emotions evoked by sound events and by music. To this end we propose (a) the usage of temporal modulation features and (b) a transfer learning module based on an Echo State Network assisting the prediction of valence and arousal measurements associated with generalized sound events. The effectiveness of the proposed transfer learning solution is demonstrated through a thoroughly designed experimental phase employing both sound and music data. The results demonstrate the importance of transfer learning in this field and encourage further research on approaches which address the problem in a cooperative way.
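The Echo State Network component mentioned above can be illustrated with a minimal reservoir-computing sketch: a fixed random reservoir driven by an input feature sequence, with only a linear ridge-regression readout trained to predict the two affective dimensions (valence, arousal). All dimensions, weights, and hyperparameters below are illustrative choices for the example, not the configuration used in the paper; in a transfer setting, the same reservoir would be reused across domains while the readout is refit.

```python
import numpy as np

rng = np.random.default_rng(0)

class ESN:
    """Minimal echo state network with a linear ridge-regression readout."""

    def __init__(self, n_in, n_res=200, spectral_radius=0.9, ridge=1e-6):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the recurrent weights so the spectral radius is < 1,
        # a standard sufficient-in-practice condition for the echo state property.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None

    def _states(self, X):
        """Run the feature sequence X (time, n_in) through the reservoir."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in X:
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.asarray(states)

    def fit(self, X, Y):
        """Train only the readout: W_out = Y^T S (S^T S + lambda I)^-1."""
        S = self._states(X)
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Y).T
        return self

    def predict(self, X):
        return self._states(X) @ self.W_out.T
```

Because the reservoir is fixed and only the readout is learned, transferring from the music domain to generalized sound events amounts to keeping the reservoir and refitting (or adapting) the cheap linear readout on the target data.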
Acknowledgment
The research leading to these results has received partial funding from the European Union HORIZON 2020 Fast Track to Innovation project no. 691131 REMOSIS.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Ntalampiras, S., Potamitis, I. (2017). Emotion Prediction of Sound Events Based on Transfer Learning. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_26
Print ISBN: 978-3-319-65171-2
Online ISBN: 978-3-319-65172-9