Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods

  • Regular Paper
  • Multimedia Systems

Abstract

Music emotion recognition is an important topic in music information retrieval. Many acoustic features are used to train classification or regression models of music emotion. However, these existing features may not be efficient for the classification or regression task, and most works do not explain why the features work for classification. In our work, eight features are extracted to represent the arousal dimension of music emotion, and commonly used statistical learning methods, such as logistic regression and tree-based methods, are applied to interpret which features are important. Shrinkage methods are then applied to feature selection and classification in music emotion recognition for the first time. Our tests show that the proposed approaches are as efficient for feature selection as entropy-based filter methods, and better than wrapper methods. The shrinkage methods also produce more continuous, lower-variance models than wrapper methods. We find that the most useful features are the low specific loudness sensation coefficients (low-SONE), root mean square, and loudness-flux. Moreover, shrinkage methods applied to logistic regression perform better for classification than most other methods, reaching an average accuracy of 83.8%.
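
To make the idea of shrinkage-based feature selection concrete, the following is a minimal sketch of logistic regression with an L1 (lasso-style) penalty, fitted by proximal gradient descent. It is not the authors' implementation: the abstract does not state which penalties or solvers were used, the data are synthetic, and the feature names (`signal_a`, `noise_a`, ...), the penalty strength `lam`, the step size, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical feature names: two informative columns, three pure-noise columns.
FEATURES = ["signal_a", "signal_b", "noise_a", "noise_b", "noise_c"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def make_data(n, seed=0):
    """Synthetic binary labels that depend only on the first two features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        score = 2.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=200):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient steps."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= step * grad_b / n  # the intercept is not penalised
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam)
             for j in range(p)]
    return w, b


X, y = make_data(400)
weights, bias = fit_l1_logistic(X, y, lam=0.12)
# Features whose weight survived the shrinkage are the "selected" ones.
selected = [name for name, wj in zip(FEATURES, weights) if wj != 0.0]
print(selected)
```

In this sketch, selection and classification come from the same fit: weights driven exactly to zero drop out of the model, and the surviving weights change gradually as `lam` changes, rather than features being switched in or out one subset at a time as in a wrapper search.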

Acknowledgments

This work is supported by Communication University of China Engineering Project 3132014XNG1429, 2012BAH17B02, and the National Key Science & Technology Pillar Program of China under Grant No. 2012-BAH51F02.

Author information

Corresponding author

Correspondence to Jiang Long Zhang.

Additional information

Communicated by B. Prabhakaran.

About this article

Cite this article

Zhang, J.L., Huang, X.L., Yang, L.F. et al. Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods. Multimedia Systems 23, 251–264 (2017). https://doi.org/10.1007/s00530-015-0489-y

  • DOI: https://doi.org/10.1007/s00530-015-0489-y
