Abstract
Music plays an important role in many people’s lives. When listening to music, we usually choose pieces that best suit our current mood. Automating this choice, however attractive, remains a challenge. To this end, approaches in the literature exploit different kinds of information (audio, visual, social, etc.) about individual music pieces. In this work, we study the task of classifying music into mood categories by integrating information from two domains: audio and semantics. We combine features extracted directly from audio with information about the corresponding tracks’ lyrics using a bi-modal Deep Boltzmann Machine architecture, and we demonstrate the effectiveness of this approach through empirical experiments on the largest music dataset publicly available for research and benchmarking purposes.
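The full bi-modal Deep Boltzmann Machine described in the paper stacks several layers per modality; as a minimal sketch of the underlying idea, the toy example below trains a single Restricted Boltzmann Machine with one-step contrastive divergence on concatenated audio and lyric feature vectors to learn a joint hidden representation. All dimensions, data, and hyperparameters here are illustrative assumptions, not the paper’s actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A single RBM trained with CD-1; a stand-in for one layer of a DBM."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden states, reconstruct, re-infer.
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # Batch-averaged parameter updates.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Purely synthetic "audio" and "lyrics" features for 100 tracks.
audio = rng.random((100, 20))
lyrics = rng.random((100, 30))
joint_input = np.hstack([audio, lyrics])  # naive early fusion of the two modalities

rbm = RBM(n_visible=50, n_hidden=16)
for _ in range(50):
    rbm.cd1_step(joint_input)

# The hidden activations serve as a joint audio+lyrics representation
# that a downstream classifier could map to mood categories.
representation = rbm.hidden_probs(joint_input)
print(representation.shape)  # (100, 16)
```

A true bi-modal DBM would instead give each modality its own hidden layer and join them at a shared top layer, so that one modality can be inferred when the other is missing; the early-fusion RBM above only captures the simplest form of joint representation learning.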
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61332018), the National Department Public Benefit Research Foundation (No. 201510209), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Huang, M., Rong, W., Arjannikov, T., Jiang, N., Xiong, Z. (2016). Bi-Modal Deep Boltzmann Machine Based Musical Emotion Classification. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science(), vol 9887. Springer, Cham. https://doi.org/10.1007/978-3-319-44781-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44780-3
Online ISBN: 978-3-319-44781-0