Recognition of emotion in music based on deep convolutional neural network

Sarkar, Rajib; Choudhury, Sombuddha; Dutta, Saikat; Roy, Aneek; Saha, Sanjoy Kumar

doi:10.1007/s11042-019-08192-x

Published: 13 September 2019

Volume 79, pages 765–783, (2020)

Multimedia Tools and Applications

Rajib Sarkar¹,² (ORCID: orcid.org/0000-0002-7498-1628),
Sombuddha Choudhury¹,
Saikat Dutta¹,
Aneek Roy¹ &
Sanjoy Kumar Saha¹

1964 Accesses
44 Citations

Abstract

In music information retrieval, emotion-based classification is an active area of research. Because emotion is a perceptual and subjective concept, the task is challenging, and it is very difficult to design signal-based descriptors that represent emotions. In this work a deep learning network is proposed, and experiments are carried out on the benchmark datasets Soundtracks, Bi-Modal and MER_taffc. Experiments are also carried out with a hand-crafted descriptor consisting of different time-domain and spectral features, linear predictive coding and MFCC-based features. Different classifiers, namely neural network, support vector machine and random forest, are tried. Although the combined feature set with a neural network provides an optimal result for the datasets, the performance of such approaches is in general limited. It is difficult to obtain a consistent feature set that works across classifiers and datasets. To avoid the issue of feature design, a deep learning based approach is followed. A convolutional neural network built around VGGNet and a novel post-processing technique are proposed. The proposed methodology provides a substantial improvement in performance for the datasets. Comparison with other reported works on three different datasets also establishes the superiority of the proposed methodology. The improvement in performance has been substantiated by a Z test.
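The abstract does not say which form of Z test was used. As a rough illustration only, the sketch below assumes a one-sided two-proportion Z test that compares the classification accuracy of a baseline (A) with that of an improved method (B). The function name `two_proportion_z_test` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Return (z, one_sided_p) for H1: accuracy of B > accuracy of A.

    Assumed form of the test: pooled two-proportion Z test.
    correct_*: number of correctly classified clips; n_*: number of test clips.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both methods are equal.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        raise ValueError("pooled accuracy is 0 or 1; the Z statistic is undefined")
    z = (p_b - p_a) / math.sqrt(variance)
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(correct_a=70, n_a=100, correct_b=85, n_b=100)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

A small p-value here would indicate that the accuracy gain is unlikely to be due to chance under the pooled null hypothesis. If both methods are evaluated on the same test clips, a paired test such as McNemar's would be a reasonable alternative to consider.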



Author information

Authors and Affiliations

Department of Computer Science and Engg., Jadavpur University, Kolkata, 700032, India
Rajib Sarkar, Sombuddha Choudhury, Saikat Dutta, Aneek Roy & Sanjoy Kumar Saha
Computer Science Department, Derozio Memorial College, Kolkata, 700136, India
Rajib Sarkar

Corresponding author

Correspondence to Rajib Sarkar.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sarkar, R., Choudhury, S., Dutta, S. et al. Recognition of emotion in music based on deep convolutional neural network. Multimed Tools Appl 79, 765–783 (2020). https://doi.org/10.1007/s11042-019-08192-x

Received: 10 November 2018
Revised: 01 August 2019
Accepted: 06 September 2019
Published: 13 September 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11042-019-08192-x
