Multimodal Music Mood Classification by Fusion of Audio and Lyrics

Xue, Hao; Xue, Like; Su, Feng

doi:10.1007/978-3-319-14442-9_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8936)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Mood analysis of music data has attracted increasing attention in both research and applications in recent years. In this paper, we propose a novel multimodal approach to music mood classification that incorporates audio and lyric information. It consists of three key components: 1) lyric feature extraction with a recursive hierarchical deep learning model, preceded by lyric filtering through discriminative vocabulary reduction and synonymous lyric expansion; 2) saliency-based audio feature extraction; and 3) a Hough forest based fusion and classification scheme that fuses the two modalities at the more fine-grained sentence level, using the time alignment across modalities. The effectiveness of the proposed model is verified by experiments on a real dataset containing more than 3000 minutes of music.
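The sentence-level fusion idea can be illustrated with a minimal Python sketch. It is not the authors' method: the Hough forest is replaced by a plain majority vote over per-sentence predictions, and the `Sentence` type, the feature vectors, the helper names, and the toy classifier are all hypothetical. The sketch only shows how time alignment lets each lyric sentence be paired with the audio frames that occur while it is sung.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (timestamp in seconds, audio feature vector); a hypothetical frame format
Frame = Tuple[float, List[float]]


@dataclass
class Sentence:
    start: float            # sentence start time in seconds
    end: float              # sentence end time in seconds
    lyric_vec: List[float]  # hypothetical lyric feature vector


def mean_audio_in_span(frames: List[Frame], start: float, end: float) -> List[float]:
    """Average the audio vectors whose timestamps fall in [start, end)."""
    picked = [vec for t, vec in frames if start <= t < end]
    if not picked:
        return []
    dim = len(picked[0])
    return [sum(v[i] for v in picked) / len(picked) for i in range(dim)]


def fuse_sentences(sentences: List[Sentence], frames: List[Frame]) -> List[List[float]]:
    """Concatenate each lyric vector with its time-aligned audio mean.

    Sentences with no aligned audio frames are skipped so that all fused
    vectors have the same length.
    """
    fused = []
    for s in sentences:
        audio = mean_audio_in_span(frames, s.start, s.end)
        if audio:
            fused.append(s.lyric_vec + audio)
    return fused


def classify_song(fused: List[List[float]],
                  sentence_classifier: Callable[[List[float]], str]) -> str:
    """Predict a mood per sentence, then majority-vote over the song.

    The vote is a stand-in for the Hough forest fusion named in the abstract.
    """
    votes = Counter(sentence_classifier(vec) for vec in fused)
    return votes.most_common(1)[0][0]


# Toy data: three one-second sentences, one audio frame inside each.
frames = [(0.5, [0.25, 0.5]), (1.5, [0.75, 0.5]), (2.5, [1.0, 0.0])]
sentences = [
    Sentence(0.0, 1.0, [1.0]),
    Sentence(1.0, 2.0, [0.5]),
    Sentence(2.0, 3.0, [-1.0]),
]
fused = fuse_sentences(sentences, frames)


def toy_classifier(vec: List[float]) -> str:
    """Hypothetical rule: positive lyric score means 'happy'."""
    return "happy" if vec[0] > 0 else "sad"


song_mood = classify_song(fused, toy_classifier)
```

Fusing at the sentence level, rather than once per song, keeps local agreement or disagreement between lyrics and audio visible to the classifier, which is the motivation the abstract gives for the fine-grained scheme.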

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Hao Xue, Like Xue & Feng Su

Editor information

Editors and Affiliations

University of Technology, P.O. Box 123, 2007, Sydney, NSW, Australia
Xiangjian He
University of Newcastle, University Dr, Callaghan, 2308, NSW, Australia
Suhuai Luo
University of Technology, P.O. Box 123, 2007, Sydney, NSW, Australia
Dacheng Tao & Muhammad Abul Hasan
National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, P.R. China
Changsheng Xu
Shanghai Jiao Tong University, 800 Dong Chuan Rd, 200240, Shanghai, China
Jie Yang

About this paper

Cite this paper

Xue, H., Xue, L., Su, F. (2015). Multimodal Music Mood Classification by Fusion of Audio and Lyrics. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_3

DOI: https://doi.org/10.1007/978-3-319-14442-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14441-2
Online ISBN: 978-3-319-14442-9
eBook Packages: Computer Science, Computer Science (R0)
