Abstract
The article presents experiments using recurrent neural networks for emotion detection in musical segments based on Russell's circumplex model. A process of feature extraction and of creating sequential data for training networks with long short-term memory (LSTM) units is described. Models were implemented using the WekaDeeplearning4j package, and a number of experiments were carried out on data with different feature sets and varying segmentation. The experiments demonstrated the usefulness of dividing the data into sequences, and showed that recurrent networks are well suited to recognizing emotions in music: their results even exceeded those of an SVM regressor. The author analyzed the effect of the network structure and of the feature set on the results of regressors predicting values on the two axes of the emotion model: arousal and valence.
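The abstract mentions turning extracted audio features into sequential data suitable for LSTM input. The paper does not give the exact procedure, but the general idea of windowing a per-frame feature matrix into fixed-length sequences can be sketched as follows (the function name, sequence length, and hop size are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def make_sequences(features, seq_len, hop):
    """Slice a (time, n_features) feature matrix into overlapping
    fixed-length sequences of shape (n_sequences, seq_len, n_features),
    the input layout expected by an LSTM regressor."""
    n_frames = features.shape[0]
    starts = range(0, n_frames - seq_len + 1, hop)
    return np.stack([features[s:s + seq_len] for s in starts])

# Hypothetical example: 100 feature frames, 6 features per frame
feats = np.random.rand(100, 6)
seqs = make_sequences(feats, seq_len=10, hop=5)
print(seqs.shape)  # (19, 10, 6)
```

Each resulting sequence would then be paired with an arousal or valence target value for regression; the choice of sequence length and hop controls the trade-off between temporal context and the number of training examples.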
Acknowledgments
This research was carried out as part of study no. KSIiSK at the Bialystok University of Technology and financed with funds from the Ministry of Science and Higher Education.
© 2020 Springer Nature Switzerland AG
Cite this paper
Grekow, J. (2020). Static Music Emotion Recognition Using Recurrent Neural Networks. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science(), vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_14
Print ISBN: 978-3-030-59490-9
Online ISBN: 978-3-030-59491-6