Abstract
While organizing music by emotional affect is a natural process for humans, quantifying it empirically remains a very difficult task. Consequently, no acoustic feature (or combination thereof) has emerged as the optimal representation for musical emotion recognition. Because emotion is inherently subjective, determining whether an acoustic feature domain is informative requires evaluation by human subjects. In this work, we perceptually evaluate two of the most commonly used features in music information retrieval: mel-frequency cepstral coefficients and chroma. To identify emotion-informative feature domains, we further explore which musical features are most relevant to perceived emotion, and which acoustic feature domains are most variant or invariant to changes in those features. Finally, using the collected perceptual data, we conduct an extensive computational experiment on emotion prediction accuracy across a large number of acoustic feature domains, investigating pairwise prediction on both a general corpus and a corpus constrained to contain only specific musical feature transformations.
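To make the setup concrete, below is a minimal sketch (not the authors' implementation) of the kind of pipeline the abstract describes: extracting MFCC and chroma features from audio clips and training a pairwise comparator that predicts which of two clips was rated higher on an emotion dimension such as arousal. It assumes the librosa and scikit-learn libraries; the file paths, ratings, and the choice of a linear SVM on feature differences are illustrative assumptions, not details taken from the paper.

# Hedged sketch: MFCC + chroma features and pairwise emotion prediction.
# Assumes librosa and scikit-learn; paths/ratings are hypothetical.
import numpy as np
import librosa
from sklearn.svm import LinearSVC

def clip_features(path):
    # Summarize a clip by the mean of its frame-level MFCC and chroma.
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # timbre-related
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # pitch-class energy
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

def pairwise_dataset(paths, arousal):
    # paths[i] is an audio file; arousal[i] is its (hypothetical) mean
    # rating collected from human subjects.
    feats = [clip_features(p) for p in paths]
    X, y = [], []
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            # Feature difference; label = which clip was rated higher.
            X.append(feats[i] - feats[j])
            y.append(1 if arousal[i] > arousal[j] else 0)
    return np.array(X), np.array(y)

# Usage (with hypothetical data):
#   X, y = pairwise_dataset(paths, arousal)
#   clf = LinearSVC().fit(X, y)   # linear pairwise comparator

Constraining the clip pairs to differ by a single musical transformation (e.g., only a mode or tempo change) is one way to probe which feature domains are variant or invariant to that transformation, in the spirit of the experiment the abstract outlines.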
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Schmidt, E.M., Prockup, M., Scott, J., Dolhansky, B., Morton, B.G., Kim, Y.E. (2013). Analyzing the Perceptual Salience of Audio Features for Musical Emotion Recognition. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41247-9
Online ISBN: 978-3-642-41248-6