
Development of music emotion classification system using convolution neural network

International Journal of Speech Technology

Abstract

Music emotion classification (MEC) is a multidisciplinary research area concerned with perceiving the emotions conveyed by songs and labelling songs with particular emotion classes. MEC systems (MECS) extract features from songs and then categorize the songs by emotion on the basis of those features. In this paper an MECS is proposed that makes use of a Convolutional Neural Network (CNN) by converting music to a visual representation known as a spectrogram. With a CNN, hand-crafted extraction of specific music-signal features is not required to classify the songs. In this work, two MECS are trained and tested on a Hindi song database using a CNN, and a third MECS is developed using a Support Vector Machine (SVM). In the first MECS, spectrograms are obtained using a Hamming window of size 2048 with a noverlap factor of 1024; in the second MECS, spectrograms are obtained using a Hamming window of size 1024 with a noverlap factor of 512. Three combinations of CNN layers are used to classify the songs into four, eight and sixteen classes on the basis of emotional tags. The performance of the MECS designs is analyzed in terms of training accuracy, validation accuracy, training loss and validation loss. Results show that the two CNN-based MECS achieve higher accuracy and lower loss than the third, SVM-based MECS.
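The abstract outlines a two-stage pipeline: songs are rendered as spectrograms (a Hamming window of 2048 samples with noverlap 1024, or 1024 with noverlap 512) and the resulting images are classified by a CNN into four, eight or sixteen emotion classes. The paper itself does not publish code; the following is a minimal sketch of such a pipeline under stated assumptions, using scipy for the spectrogram and Keras for the CNN. The file name song.wav, the 128×128 input size, and the layer stack are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of spectrogram-based music emotion
# classification. Assumptions: a mono WAV input, a 128x128 spectrogram image,
# and an illustrative two-block CNN.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import tensorflow as tf

# --- Spectrogram extraction, first MECS setting: window 2048, noverlap 1024 ---
fs, samples = wavfile.read("song.wav")      # hypothetical mono input file
f, t, Sxx = spectrogram(
    samples, fs=fs,
    window="hamming",
    nperseg=2048,                           # 1024 for the second MECS
    noverlap=1024,                          # 512 for the second MECS
)
log_spec = np.log1p(Sxx)                    # log scale, as in visual spectrograms

# --- A small CNN over fixed-size spectrogram images (assumed 128x128x1) ---
num_classes = 4                             # the paper uses 4, 8 and 16 classes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],                   # reports the accuracy/loss metrics
)                                           # the paper evaluates
```

Resized log-spectrograms would then be fed to model.fit with a validation split, which yields exactly the four quantities the paper evaluates: training accuracy, validation accuracy, training loss and validation loss. An SVM baseline like the paper's third MECS would instead require explicit feature vectors, which is the contrast the abstract draws.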



Author information

Correspondence to Deepti Chaudhary.


Cite this article

Chaudhary, D., Singh, N.P. & Singh, S. Development of music emotion classification system using convolution neural network. Int J Speech Technol 24, 571–580 (2021). https://doi.org/10.1007/s10772-020-09781-0

