
Development of music emotion classification system using convolution neural network

International Journal of Speech Technology

Abstract

Music emotion classification (MEC) is a multidisciplinary research area concerned with perceiving the emotions conveyed by songs and labelling songs with particular emotion classes. MEC systems (MECS) extract features from songs and then categorize the songs by emotion on the basis of those features. In this paper an MECS is proposed that makes use of a Convolutional Neural Network (CNN) by converting music to a visual representation known as a spectrogram. With a CNN, hand-crafted extraction of specific music-signal features is not required to classify the songs. In this work, two MECS are trained and tested on a Hindi song database using a CNN, and a third MECS is developed using a Support Vector Machine (SVM). In the first MECS, spectrograms are obtained using a Hamming window of size 2048 with a noverlap factor of 1024; in the second MECS, spectrograms are obtained using a Hamming window of size 1024 with a noverlap factor of 512. Three combinations of CNN layers are used to classify the songs into four, eight and sixteen classes on the basis of emotional tags. The performance of the MECS designs is analyzed in terms of training accuracy, validation accuracy, training loss and validation loss. Results show that the two CNN-based MECS achieve higher accuracy and lower loss than the third, SVM-based MECS.
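The abstract outlines a two-stage pipeline: songs are rendered as spectrograms (a Hamming window of 2048 samples with noverlap 1024, or 1024 with noverlap 512) and the resulting images are classified by a CNN into four, eight or sixteen emotion classes. The paper itself does not publish code; the following is a minimal sketch of such a pipeline under stated assumptions, using scipy for the spectrogram and Keras for the CNN. The file name song.wav, the 128×128 input size, and the layer stack are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of spectrogram-based music emotion
# classification. Assumptions: a mono WAV input, a 128x128 spectrogram image,
# and an illustrative two-block CNN.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import tensorflow as tf

# --- Spectrogram extraction, first MECS setting: window 2048, noverlap 1024 ---
fs, samples = wavfile.read("song.wav")      # hypothetical mono input file
f, t, Sxx = spectrogram(
    samples, fs=fs,
    window="hamming",
    nperseg=2048,                           # 1024 for the second MECS
    noverlap=1024,                          # 512 for the second MECS
)
log_spec = np.log1p(Sxx)                    # log scale, as in visual spectrograms

# --- A small CNN over fixed-size spectrogram images (assumed 128x128x1) ---
num_classes = 4                             # the paper uses 4, 8 and 16 classes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],                   # reports the accuracy/loss metrics
)                                           # the paper evaluates
```

Resized log-spectrograms would then be fed to model.fit with a validation split, which yields exactly the four quantities the paper evaluates: training accuracy, validation accuracy, training loss and validation loss. An SVM baseline like the paper's third MECS would instead require explicit feature vectors, which is the contrast the abstract draws.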



Author information

Correspondence to Deepti Chaudhary.


Cite this article

Chaudhary, D., Singh, N.P. & Singh, S. Development of music emotion classification system using convolution neural network. Int J Speech Technol 24, 571–580 (2021). https://doi.org/10.1007/s10772-020-09781-0

