Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods

Zhang, Jiang Long; Huang, Xiang Lin; Yang, Li Fang; Xu, Ye; Sun, Shu Tao

doi:10.1007/s00530-015-0489-y

Regular Paper
Published: 20 October 2015

Multimedia Systems, Volume 23, pages 251–264 (2017)

Abstract

Music emotion recognition is an important topic in music information retrieval. Many acoustic features are used to train classification or regression models of music emotion. However, these existing features may not be efficient for the classification or regression task. Furthermore, most works do not explain why these features work for classification. In our work, eight features are extracted to represent the arousal dimension of music emotion, and several commonly used statistical learning methods, such as logistic regression and tree-based methods, are applied to interpret the important features. Shrinkage methods are then applied to feature selection and classification in music emotion recognition for the first time. Our tests show that the proposed approaches are as efficient for feature selection as entropy-based filter methods, and better than wrapper methods. The shrinkage methods also produce more continuous, lower-variance models than wrapper methods. We find that the most useful features are the low specific loudness sensation coefficients (low-SONE), root mean square, and loudness-flux. Moreover, shrinkage methods applied to logistic regression perform better for classification than most of the other methods, with an average accuracy of 83.8%.
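To make the idea of shrinkage-based feature selection concrete, the following is a minimal sketch of L1-penalised (lasso-style) logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the toy data, the four placeholder features, the penalty strength `lam`, the step size, and all function names are assumptions chosen for illustration, and the data are synthetic rather than the audio features studied in the paper. The point it illustrates is that the L1 penalty can shrink the weights of uninformative features to zero, so selection and classification happen in a single fit.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, clamp small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=400):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is left unpenalised. lam, step and n_iter are
    illustrative defaults, not values taken from the paper.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage from the penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def make_toy_data(n_samples, seed=0):
    """Synthetic data: features 0 and 1 determine the label, 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(row)
        y.append(1 if row[0] + row[1] > 0 else 0)
    return X, y


X, y = make_toy_data(300)
weights, intercept = fit_l1_logistic(X, y, lam=0.1)

# Features whose weight survived the shrinkage are the "selected" ones.
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
predictions = [
    1 if intercept + sum(wj * xj for wj, xj in zip(weights, row)) > 0 else 0
    for row in X
]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

print("weights:", [round(wj, 3) for wj in weights])
print("selected feature indices:", selected)
print("training accuracy:", round(accuracy, 3))
```

Raising `lam` shrinks more weights to zero and yields a sparser model, while lowering it keeps more features. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.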

Acknowledgments

This work is supported by Communication University of China Engineering Project 3132014XNG1429, 2012BAH17B02, and the National Key Science & Technology Pillar Program of China under Grant No. 2012-BAH51F02.

Author information

Authors and Affiliations

School of Computer, Communication University of China, Beijing, China
Jiang Long Zhang, Xiang Lin Huang, Li Fang Yang, Ye Xu & Shu Tao Sun

Corresponding author

Correspondence to Jiang Long Zhang.

Additional information

Communicated by B. Prabhakaran.

About this article

Cite this article

Zhang, J.L., Huang, X.L., Yang, L.F. et al. Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods. Multimedia Systems 23, 251–264 (2017). https://doi.org/10.1007/s00530-015-0489-y

Received: 01 January 2015
Accepted: 03 October 2015
Published: 20 October 2015
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00530-015-0489-y
