Abstract
Automatic music emotion recognition (MER) has received increased attention in the areas of music information retrieval and user interface development. Music emotion variation detection (or dynamic MER) also captures temporal changes of emotion, so the emotional content of music is expressed as a time series of valence-arousal predictions. One of the key issues in MER is the extraction of emotion-related characteristics from the audio signal. We propose a deep neural network based solution for mining music emotion-related salient features directly from the raw audio waveform. The proposed architecture stacks a one-dimensional convolutional layer, an autoencoder-based layer with iterative reconstruction, and a bidirectional gated recurrent unit. Tests on the DEAM dataset have shown that, in comparison with other state-of-the-art systems, the proposed solution brings a significant improvement in regression accuracy, notably for the valence dimension. We also show that the proposed iterative reconstruction layer enhances the discriminative properties of the features and further increases regression accuracy.
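The pipeline named in the abstract (raw waveform → 1-D convolution → iterative-reconstruction layer → bidirectional GRU → per-frame valence-arousal regression) can be sketched in plain numpy as below. All layer sizes, the single-GRU-layer setup, and in particular the reading of "iterative reconstruction" (re-encoding each reconstruction a few times and keeping the final latent code) are illustrative assumptions for exposition, not the paper's actual architecture or hyper-parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_frames(x, W, stride):
    """Strided 1-D convolution of raw waveform x with a filter bank W (F, k)."""
    F, k = W.shape
    n = (len(x) - k) // stride + 1
    frames = np.stack([x[i * stride : i * stride + k] for i in range(n)])  # (n, k)
    return np.maximum(frames @ W.T, 0.0)                                   # (n, F), ReLU

def iterative_reconstruction(H, We, Wd, n_iter=3):
    """Encode/decode the features several times, re-encoding each
    reconstruction (an assumed reading of the iterative layer)."""
    R = H
    for _ in range(n_iter):
        Z = np.tanh(R @ We)   # encode
        R = Z @ Wd            # reconstruct
    return Z                  # final latent code as the refined feature

def gru_layer(X, Wz, Wr, Wh):
    """Minimal GRU over a sequence X (T, d_in); each weight acts on [h, x]."""
    T = X.shape[0]
    d = Wz.shape[1]
    h = np.zeros(d)
    out = np.zeros((T, d))
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    for t in range(T):
        hx = np.concatenate([h, X[t]])
        z = sig(hx @ Wz)                                        # update gate
        r = sig(hx @ Wr)                                        # reset gate
        h_tilde = np.tanh(np.concatenate([r * h, X[t]]) @ Wh)   # candidate state
        h = (1 - z) * h + z * h_tilde
        out[t] = h
    return out

# toy dimensions (illustrative only)
n_samples, k, stride, F, d_lat, d_gru = 8000, 256, 128, 32, 16, 8
x = rng.standard_normal(n_samples)                  # stand-in for a raw audio excerpt

H = conv1d_frames(x, rng.standard_normal((F, k)) * 0.1, stride)
Z = iterative_reconstruction(H,
                             rng.standard_normal((F, d_lat)) * 0.1,
                             rng.standard_normal((d_lat, F)) * 0.1)

Wz, Wr, Wh = (rng.standard_normal((d_gru + d_lat, d_gru)) * 0.1 for _ in range(3))
fwd = gru_layer(Z, Wz, Wr, Wh)
bwd = gru_layer(Z[::-1], Wz, Wr, Wh)[::-1]          # bidirectional: reversed pass
states = np.concatenate([fwd, bwd], axis=1)         # (T, 2 * d_gru)

W_out = rng.standard_normal((2 * d_gru, 2)) * 0.1
va = states @ W_out                                 # per-frame (valence, arousal)
print(va.shape)
```

In this toy setup the 8000-sample input yields 61 frames, so the output is one (valence, arousal) pair per frame, matching the time-series prediction format that dynamic MER systems evaluated on DEAM produce.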




Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Orjesek, R., Jarina, R. & Chmulik, M. End-to-end music emotion variation detection using iteratively reconstructed deep features. Multimed Tools Appl 81, 5017–5031 (2022). https://doi.org/10.1007/s11042-021-11584-7