Static Music Emotion Recognition Using Recurrent Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12117)

Abstract

The article presents experiments in using recurrent neural networks for emotion detection in musical segments based on Russell's circumplex model. The process of feature extraction and of creating sequential data for training networks with long short-term memory (LSTM) units is described. Models were implemented using the WekaDeeplearning4j package, and a number of experiments were carried out on data with different feature sets and segmentations. The experiments demonstrate the usefulness of dividing the data into sequences and the value of recurrent networks for recognizing emotions in music, with results that even exceeded those of the SVM algorithm for regression. The author analyzed the effect of the network structure and of the set of used features on the results of regressors predicting values on the two axes of the emotion model: arousal and valence.
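The sequence-creation step mentioned in the abstract can be illustrated with a small sketch (hypothetical code, not the author's implementation): per-segment feature vectors extracted from a recording are grouped into fixed-length sequences suitable as input to an LSTM regressor, with one arousal/valence target per sequence. The function name `make_sequences` and the choice of averaging the per-segment annotations into a single target are illustrative assumptions, not details from the paper.

```python
import numpy as np

def make_sequences(features, targets, seq_len):
    """Group per-segment feature vectors into fixed-length sequences.

    features: (n_segments, n_features) array of audio features
    targets:  (n_segments, 2) array of (arousal, valence) annotations
    Returns X of shape (n_seq, seq_len, n_features) and y of shape
    (n_seq, 2), where each sequence's target is the mean annotation
    over its segments (an assumed aggregation strategy).
    """
    n = (len(features) // seq_len) * seq_len  # drop the incomplete tail
    X = features[:n].reshape(-1, seq_len, features.shape[1])
    y = targets[:n].reshape(-1, seq_len, targets.shape[1]).mean(axis=1)
    return X, y

# Toy example: 10 segments, 4 features each, sequences of length 3
feats = np.arange(40, dtype=float).reshape(10, 4)
annos = np.tile([0.5, -0.2], (10, 1))
X, y = make_sequences(feats, annos, seq_len=3)
print(X.shape, y.shape)  # (3, 3, 4) (3, 2)
```

Arrays of this shape (samples, timesteps, features) are the standard input layout for LSTM layers in most deep learning toolkits, including Deeplearning4j, which underlies the WekaDeeplearning4j package used in the paper.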


Notes

  1. http://marsyas.info/downloads/datasets.html.
  2. http://essentia.upf.edu/documentation/algorithms_reference.html.
  3. https://deeplearning4j.org.

References

  1. Aljanaki, A., Yang, Y.H., Soleymani, M.: Developing a benchmark for emotional analysis of music. PLoS One 12(3), e0173392 (2017)

  2. Bachorik, J., Bangert, M., Loui, P., Larke, K., Berger, J., Rowe, R., Schlaug, G.: Emotion in motion: investigating the time-course of emotional judgments of musical stimuli. Music Percept. 26, 355–364 (2009)

  3. Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., Serra, X.: ESSENTIA: an audio analysis library for music information retrieval. In: Proceedings of the 14th International Society for Music Information Retrieval Conference, pp. 493–498 (2013)

  4. Chowdhury, S., Portabella, A.V., Haunschmid, V., Widmer, G.: Towards explainable music emotion recognition: the route via mid-level features. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, pp. 237–243 (2019)

  5. Coutinho, E., Trigeorgis, G., Zafeiriou, S., Schuller, B.: Automatically estimating emotion in music with deep long-short term memory recurrent neural networks. In: Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany (2015)

  6. Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., Moussallam, M.: Music mood detection based on audio and lyrics with deep neural net. In: Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), Paris, France, pp. 370–375 (2018)

  7. Gers, F.A., Schmidhuber, J., Cummins, F.A.: Learning to forget: continual prediction with LSTM. Neural Comput. 12, 2451–2471 (2000)

  8. Grekow, J.: Audio features dedicated to the detection of four basic emotions. In: Saeed, K., Homenda, W. (eds.) CISIM 2015. LNCS, vol. 9339, pp. 583–591. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24369-6_49

  9. Grekow, J.: Music emotion maps in arousal-valence space. In: Saeed, K., Homenda, W. (eds.) CISIM 2016. LNCS, vol. 9842, pp. 697–706. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45378-1_60

  10. Grekow, J.: Human annotation. In: From Content-Based Music Emotion Recognition to Emotion Maps of Musical Pieces. SCI, vol. 747, pp. 13–24. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70609-2

  11. Lang, S., Bravo-Marquez, F., Beckham, C., Hall, M., Frank, E.: WekaDeeplearning4j: a deep learning package for Weka based on Deeplearning4j. Knowl.-Based Syst. 178, 48–50 (2019)

  12. Lu, L., Liu, D., Zhang, H.J.: Automatic mood detection and tracking of music audio signals. Trans. Audio Speech Lang. Proc. 14(1), 5–18 (2006)

  13. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)

  14. Tzanetakis, G., Cook, P.: Marsyas: a framework for audio analysis. Org. Sound 4(3), 169–175 (2000)

  15. Weninger, F., Eyben, F., Schuller, B.: On-line continuous-time music mood regression with deep recurrent neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5412–5416 (2014)

  16. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)

Acknowledgments

This research was realized as part of study no. KSIiSK in the Bialystok University of Technology and financed with funds from the Ministry of Science and Higher Education.

Author information


Correspondence to Jacek Grekow.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Grekow, J. (2020). Static Music Emotion Recognition Using Recurrent Neural Networks. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science, vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_14

  • DOI: https://doi.org/10.1007/978-3-030-59491-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59490-9

  • Online ISBN: 978-3-030-59491-6

  • eBook Packages: Computer Science (R0)
