Abstract
Music emotion recognition is an important topic in music information retrieval. Many acoustic features are used to train classification or regression models of music emotion. However, these existing features may not be efficient for the classification or regression task, and most works do not explain why the features work for classification. In this work, eight features are extracted to represent the arousal dimension of music emotion, and commonly used statistical learning methods, such as logistic regression and tree-based methods, are applied to interpret which features are important. Shrinkage methods are then applied to feature selection and classification in music emotion recognition for the first time. Our tests show that the proposed approaches are as efficient for feature selection as entropy-based filter methods and better than wrapper methods. Shrinkage methods also produce more continuous, lower-variance models than wrapper methods. We find that the most useful features are the low specific loudness sensation coefficients (low-SONE), root mean square, and loudness-flux. Moreover, shrinkage methods applied to logistic regression classify better than most of the other methods, with an average accuracy of 83.8%.
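To make the idea of shrinkage-based feature selection concrete, the following is a minimal sketch of L1-penalized (lasso-style) logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the function names (`soft_threshold`, `fit_l1_logistic`, `make_toy_data`), the step size, the penalty values, and the synthetic three-feature data set are all assumptions made for illustration, and none of the paper's audio features or data are used. The point it illustrates is that the L1 penalty drives weights to exactly zero as the penalty grows, so selection and classification happen in one fit.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=500):
    """Fit logistic regression with an L1 penalty `lam` on the weights.

    Each iteration takes a gradient step on the mean log-loss, then applies
    soft-thresholding to the weights (the intercept is not penalized).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def make_toy_data(n=200, seed=0):
    """Hypothetical data: only feature 0 carries signal; 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        X.append(row)
        y.append(1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Increasing the penalty shrinks weights; a large enough value zeroes all.
    for lam in (0.01, 0.1, 100.0):
        w, b = fit_l1_logistic(X, y, lam)
        print(lam, [round(wj, 3) for wj in w])
```

Unlike a wrapper method, which adds or drops whole features in discrete steps, the penalty here changes the weights continuously as `lam` varies, which is the sense in which shrinkage models are described as more continuous and lower in variance.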
Acknowledgments
This work is supported by Communication University of China Engineering Project 3132014XNG1429, 2012BAH17B02, and the National Key Science & Technology Pillar Program of China under Grant No. 2012-BAH51F02.
Additional information
Communicated by B. Prabhakaran.
About this article
Cite this article
Zhang, J.L., Huang, X.L., Yang, L.F. et al. Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods. Multimedia Systems 23, 251–264 (2017). https://doi.org/10.1007/s00530-015-0489-y
DOI: https://doi.org/10.1007/s00530-015-0489-y