Abstract
Music plays an important role in many people’s lives. When listening to music, we usually choose pieces that best suit our current mood. Automating this choice, however attractive, remains a challenge. To this end, approaches in the literature exploit different kinds of information (audio, visual, social, etc.) about individual music pieces. In this work, we study the task of classifying music into mood categories by integrating information from two domains: audio and semantics. We combine features extracted directly from audio with information about the corresponding tracks’ lyrics using a bi-modal Deep Boltzmann Machine architecture, and we demonstrate the effectiveness of this approach through empirical experiments on the largest music dataset publicly available for research and benchmarking purposes.
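The full bi-modal Deep Boltzmann Machine described in the paper stacks several layers per modality; as a minimal sketch of the underlying idea, the toy example below trains a single Restricted Boltzmann Machine with one-step contrastive divergence on concatenated audio and lyric feature vectors to learn a joint hidden representation. All dimensions, data, and hyperparameters here are illustrative assumptions, not the paper’s actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A single RBM trained with CD-1; a stand-in for one layer of a DBM."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden states, reconstruct, re-infer.
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # Batch-averaged parameter updates.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Purely synthetic "audio" and "lyrics" features for 100 tracks.
audio = rng.random((100, 20))
lyrics = rng.random((100, 30))
joint_input = np.hstack([audio, lyrics])  # naive early fusion of the two modalities

rbm = RBM(n_visible=50, n_hidden=16)
for _ in range(50):
    rbm.cd1_step(joint_input)

# The hidden activations serve as a joint audio+lyrics representation
# that a downstream classifier could map to mood categories.
representation = rbm.hidden_probs(joint_input)
print(representation.shape)  # (100, 16)
```

A true bi-modal DBM would instead give each modality its own hidden layer and join them at a shared top layer, so that one modality can be inferred when the other is missing; the early-fusion RBM above only captures the simplest form of joint representation learning.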
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61332018), the National Department Public Benefit Research Foundation (No. 201510209), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Huang, M., Rong, W., Arjannikov, T., Jiang, N., Xiong, Z. (2016). Bi-Modal Deep Boltzmann Machine Based Musical Emotion Classification. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science(), vol 9887. Springer, Cham. https://doi.org/10.1007/978-3-319-44781-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44780-3
Online ISBN: 978-3-319-44781-0