
COSMIC: Music emotion recognition combining structure analysis and modal interaction


Abstract

As a common multi-modal information carrier, music frequently delivers emotion through its lyrics and melody. Besides lyrics (text) and melody (audio), the structure of a song is a further indicator of emotion, one that creates strong resonance in listeners. Typically, a pop song is composed of verses and choruses. To improve on existing music emotion recognition models, we first propose a hierarchical model for analyzing music structure. We then develop a cross-modal interaction method that extracts emotional cues from each modality and lets them interact. Finally, we perform music emotion recognition by combining music structure analysis with cross-modal interaction. Extensive experiments on a dataset crawled from Netease Cloud Music demonstrate the effectiveness of both music structure analysis and cross-modal interaction, and the proposed model, COSMIC, achieves state-of-the-art performance on music emotion recognition tasks.
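To make the cross-modal interaction step concrete, the sketch below shows one standard way lyric and melody segment embeddings can attend to each other before a joint emotion classifier. It is a minimal illustration built on off-the-shelf PyTorch multi-head attention; the module names, dimensions, mean-pooling, and concatenation fusion are our assumptions, not the authors' published architecture.

```python
# Minimal sketch of cross-modal interaction between lyric (text) and melody
# (audio) segment embeddings. All names, dimensions, and the fusion strategy
# are illustrative assumptions, not the paper's actual model.
import torch
import torch.nn as nn


class CrossModalInteraction(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4, n_emotions: int = 4):
        super().__init__()
        # Each modality queries the other: text attends to audio and vice versa.
        self.text_to_audio = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_emotions)

    def forward(self, text_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, n_text_segments, dim); audio_emb: (batch, n_audio_segments, dim)
        t, _ = self.text_to_audio(text_emb, audio_emb, audio_emb)  # text enriched by audio
        a, _ = self.audio_to_text(audio_emb, text_emb, text_emb)  # audio enriched by text
        # Pool over segments and fuse the two enriched views for classification.
        fused = torch.cat([t.mean(dim=1), a.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = CrossModalInteraction()
    logits = model(torch.randn(2, 8, 256), torch.randn(2, 8, 256))
    print(logits.shape)  # torch.Size([2, 4])
```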


Data Availability

The experiments conducted in this article used both publicly available datasets and a custom-built dataset. The publicly available datasets used in this study can be accessed through their original sources as cited in the references. The custom-built dataset used in this study was created by the authors and cannot be publicly shared due to potential copyright issues with some of the data sources.

Notes

  1. https://music.163.com/

  2. https://github.com/deezer/spleeter/

  3. https://www.audeering.com/opensmile/
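Notes 2 and 3 point to two off-the-shelf preprocessing tools: Spleeter for source separation and openSMILE for acoustic feature extraction. The snippet below shows a typical way to drive both from Python; the file names and the eGeMAPSv02 feature set are placeholder assumptions, as the paper's exact configuration is not reproduced on this page.

```python
# Hedged usage sketch for the tools in notes 2 and 3: Spleeter splits a track
# into vocal/accompaniment stems, openSMILE extracts acoustic functionals.
# "song.mp3" and the eGeMAPSv02 feature set are placeholders, not the paper's
# verified configuration.
import opensmile
from spleeter.separator import Separator

# Separate the song into two stems: vocals and accompaniment.
separator = Separator("spleeter:2stems")
separator.separate_to_file("song.mp3", "stems/")  # writes stems/song/{vocals,accompaniment}.wav

# Extract one row of utterance-level acoustic functionals from the vocal stem.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("stems/song/vocals.wav")  # pandas DataFrame
print(features.shape)
```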


Author information


Corresponding author

Correspondence to Liang Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Workflow diagram

The full workflow of the proposed COSMIC framework is shown in Fig. 4.

Fig. 4: The workflow diagram of COSMIC (figure not reproduced here).
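Since the diagram itself is not reproduced on this page, the sketch below illustrates the kind of computation behind the structure-analysis stage at the front of the workflow: a generic self-similarity baseline that flags repeated (chorus-like) frames in a feature sequence. This is a common music information retrieval heuristic offered for orientation only, not the paper's hierarchical model.

```python
# Generic self-similarity baseline for spotting repeated (chorus-like)
# sections; a stand-in illustration, not COSMIC's hierarchical analyzer.
import numpy as np


def cosine_ssm(features: np.ndarray) -> np.ndarray:
    """features: (n_frames, n_dims). Returns the cosine self-similarity matrix."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return unit @ unit.T


def repeated_frames(features: np.ndarray, threshold: float = 0.9, band: int = 8) -> np.ndarray:
    """Flags frames highly similar to some frame outside the near-diagonal
    band, a crude proxy for repeated (chorus) material."""
    ssm = cosine_ssm(features)
    n = len(ssm)
    off_diag = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > band
    return ((ssm > threshold) & off_diag).any(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=(20, 12)), rng.normal(size=(20, 12))
    feats = np.vstack([a, b, a])  # section "a" repeats, so its frames are flagged
    print(repeated_frames(feats).astype(int))  # ~ [1]*20 + [0]*20 + [1]*20
```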

Appendix B: Algorithm

The pseudocode for our proposed algorithm is presented as follows.

Algorithm 1: Pseudocode for the COSMIC framework (figure not reproduced here).
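Since the algorithm figure is likewise not reproduced here, the runnable skeleton below traces the four stages the abstract describes: structure analysis, per-segment encoding of lyrics and audio, cross-modal interaction, and emotion prediction. Every component is a deliberately naive placeholder (repetition-based chorus grouping, random-feature encoders, a single cross-attention layer); none of it is the authors' implementation.

```python
# Hedged reconstruction of the COSMIC pipeline at the granularity of the
# abstract. All components are toy stand-ins, not the published algorithm.
from collections import Counter

import torch
import torch.nn as nn


def analyze_structure(lyric_lines):
    # Step 1 (toy): lines that repeat verbatim are grouped as chorus, the
    # rest as verse. The real model is a learned hierarchical analyzer.
    counts = Counter(lyric_lines)
    return {
        "chorus": [l for l in lyric_lines if counts[l] > 1],
        "verse": [l for l in lyric_lines if counts[l] == 1],
    }


def encode_text(lines, dim=256):
    # Step 2 (toy): random embeddings with the expected shape, one vector per
    # line; a real system would use a pretrained text encoder such as BERT.
    return torch.randn(1, max(len(lines), 1), dim)


def encode_audio(n_segments=8, dim=256):
    # Step 2 (toy): random embeddings, one per audio segment.
    return torch.randn(1, n_segments, dim)


def cosmic(lyric_lines, dim=256, n_emotions=4):
    segments = analyze_structure(lyric_lines)
    text_emb = encode_text(segments["chorus"] + segments["verse"], dim)
    audio_emb = encode_audio(dim=dim)
    # Step 3 (toy): one cross-attention pass, text attending to audio.
    attn = nn.MultiheadAttention(dim, 4, batch_first=True)
    enriched, _ = attn(text_emb, audio_emb, audio_emb)
    # Step 4 (toy): pool, fuse, and classify into emotion logits.
    fused = torch.cat([enriched.mean(dim=1), audio_emb.mean(dim=1)], dim=-1)
    return nn.Linear(2 * dim, n_emotions)(fused)


if __name__ == "__main__":
    lines = ["la la la", "first verse line", "la la la", "second verse line"]
    print(cosmic(lines).shape)  # torch.Size([1, 4])
```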

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, L., Shen, Z., Zeng, J. et al. COSMIC: Music emotion recognition combining structure analysis and modal interaction. Multimed Tools Appl 83, 12519–12534 (2024). https://doi.org/10.1007/s11042-023-15376-z

