Abstract
Mood analysis of music data has attracted increasing attention in both research and applications in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information. It consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering through discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough forest based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, exploiting the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
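The sentence-level alignment mentioned in the third component can be illustrated with a minimal sketch. It assumes each lyric sentence carries start and end times and that audio features arrive as timestamped frames. Plain concatenation with a mean over frames stands in for the fusion step here; it is not the Hough forest scheme of the paper, and all names (`LyricSentence`, `fuse_sentence_level`, the field names) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LyricSentence:
    start: float            # sentence start time in seconds (hypothetical field)
    end: float              # sentence end time in seconds (hypothetical field)
    features: List[float]   # lyric feature vector for this sentence


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fuse_sentence_level(
    sentences: Sequence[LyricSentence],
    audio_frames: Sequence[Tuple[float, Sequence[float]]],
) -> List[List[float]]:
    """Pair each lyric sentence with the audio frames inside its time span.

    audio_frames holds (timestamp_seconds, feature_vector) pairs.
    Sentences with no overlapping frame are skipped.
    """
    fused = []
    for s in sentences:
        # Keep only the frames whose timestamp falls within the sentence.
        in_span = [vec for t, vec in audio_frames if s.start <= t < s.end]
        if not in_span:
            continue
        # Assumption: concatenate lyric features with the mean audio vector.
        fused.append(list(s.features) + mean_vector(in_span))
    return fused


sentences = [
    LyricSentence(0.0, 2.0, [1.0, 0.0]),
    LyricSentence(2.0, 4.0, [0.0, 1.0]),
]
audio_frames = [(0.5, [0.25, 0.5]), (1.5, [0.75, 1.0]), (2.5, [1.0, 1.0])]
fused = fuse_sentence_level(sentences, audio_frames)
print(fused)
```

Each fused vector could then be passed to any per-sentence classifier, with song-level mood obtained by aggregating the sentence-level predictions.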
© 2015 Springer International Publishing Switzerland
Cite this paper
Xue, H., Xue, L., Su, F. (2015). Multimodal Music Mood Classification by Fusion of Audio and Lyrics. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_3
DOI: https://doi.org/10.1007/978-3-319-14442-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14441-2
Online ISBN: 978-3-319-14442-9
eBook Packages: Computer Science (R0)