Multimodal Music Mood Classification by Fusion of Audio and Lyrics

  • Conference paper
MultiMedia Modeling (MMM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8936)

Abstract

Mood analysis of music data has attracted increasing attention in both research and applications in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information. It consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering with discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
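
The sentence-level fusion idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no details of the Hough forest, so a plain majority vote over per-sentence predictions stands in for it, and the function names (`pool_audio`, `fuse_sentences`, `song_mood`), the input layouts, and the `classify` callback are all hypothetical. The sketch assumes each lyric sentence comes with a start and end time and a feature vector, and that audio features are available as timestamped frames.

```python
from collections import Counter
from statistics import mean


def pool_audio(frames, start, end):
    """Average the audio feature vectors of frames with start <= t < end.

    `frames` is a list of (timestamp, feature_vector) pairs (assumed layout).
    Returns None when no frame falls inside the interval.
    """
    selected = [vec for t, vec in frames if start <= t < end]
    if not selected:
        return None
    return [mean(column) for column in zip(*selected)]


def fuse_sentences(sentences, frames):
    """Concatenate each sentence's lyric features with its time-aligned audio features.

    `sentences` is a list of (start, end, lyric_vector) triples (assumed layout).
    Sentences without any aligned audio frame are skipped.
    """
    fused = []
    for start, end, lyric_vec in sentences:
        audio_vec = pool_audio(frames, start, end)
        if audio_vec is not None:
            fused.append(list(lyric_vec) + audio_vec)
    return fused


def song_mood(fused, classify):
    """Majority vote over per-sentence labels (a stand-in for the Hough forest)."""
    if not fused:
        return None
    votes = Counter(classify(vec) for vec in fused)
    return votes.most_common(1)[0][0]
```

Here `classify` is any function that maps one fused sentence vector to a mood label. Fusing per sentence rather than per song keeps the lyric and audio features that occur at the same time together, which is the property the abstract attributes to the time alignment across modalities.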

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xue, H., Xue, L., Su, F. (2015). Multimodal Music Mood Classification by Fusion of Audio and Lyrics. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_3

  • DOI: https://doi.org/10.1007/978-3-319-14442-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14441-2

  • Online ISBN: 978-3-319-14442-9

  • eBook Packages: Computer Science, Computer Science (R0)
