Abstract
Processing generalized sound events with the purpose of predicting the emotion they might evoke is a relatively young research field. Tools, datasets, and methodologies to address such a challenging task are still under development, far from any standardized format. This work aims to fill this gap by revealing and exploiting potential similarities in the perception of emotions evoked by sound events and by music. To this end we propose (a) the usage of temporal modulation features and (b) a transfer learning module based on an Echo State Network assisting the prediction of valence and arousal measurements associated with generalized sound events. The effectiveness of the proposed transfer learning solution is demonstrated through a thoroughly designed experimental phase employing both sound and music data. The results demonstrate the importance of transfer learning in this field and encourage further research on approaches which address the problem in a cooperative way.
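The Echo State Network component mentioned above can be illustrated with a minimal reservoir-computing sketch: a fixed random reservoir driven by an input feature sequence, with only a linear ridge-regression readout trained to predict the two affective dimensions (valence, arousal). All dimensions, weights, and hyperparameters below are illustrative choices for the example, not the configuration used in the paper; in a transfer setting, the same reservoir would be reused across domains while the readout is refit.

```python
import numpy as np

rng = np.random.default_rng(0)

class ESN:
    """Minimal echo state network with a linear ridge-regression readout."""

    def __init__(self, n_in, n_res=200, spectral_radius=0.9, ridge=1e-6):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the recurrent weights so the spectral radius is < 1,
        # a standard sufficient-in-practice condition for the echo state property.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None

    def _states(self, X):
        """Run the feature sequence X (time, n_in) through the reservoir."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in X:
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.asarray(states)

    def fit(self, X, Y):
        """Train only the readout: W_out = Y^T S (S^T S + lambda I)^-1."""
        S = self._states(X)
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Y).T
        return self

    def predict(self, X):
        return self._states(X) @ self.W_out.T
```

Because the reservoir is fixed and only the readout is learned, transferring from the music domain to generalized sound events amounts to keeping the reservoir and refitting (or adapting) the cheap linear readout on the target data.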
Acknowledgment
The research leading to these results has received partial funding from the European Union HORIZON 2020 Fast Track to Innovation project no. 691131 REMOSIS.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Ntalampiras, S., Potamitis, I. (2017). Emotion Prediction of Sound Events Based on Transfer Learning. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_26
Print ISBN: 978-3-319-65171-2
Online ISBN: 978-3-319-65172-9