Abstract
While organizing music by emotional affect is a natural process for humans, quantifying it empirically remains a very difficult task. Consequently, no acoustic feature (or combination thereof) has emerged as the optimal representation for musical emotion recognition. Because emotion is inherently subjective, determining whether an acoustic feature domain is informative requires evaluation by human subjects. In this work, we perceptually evaluate two of the most commonly used features in music information retrieval: mel-frequency cepstral coefficients and chroma. To identify emotion-informative feature domains, we further explore which musical features are most relevant to perceived emotion, and which acoustic feature domains are most variant or invariant to changes in those features. Finally, using the collected perceptual data, we conduct an extensive computational experiment on emotion prediction accuracy across a large number of acoustic feature domains, investigating pairwise prediction on both a general corpus and a corpus constrained to contain only specific musical feature transformations.
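To make the setup concrete, below is a minimal sketch (not the authors' implementation) of the kind of pipeline the abstract describes: extracting MFCC and chroma features from audio clips and training a pairwise comparator that predicts which of two clips was rated higher on an emotion dimension such as arousal. It assumes the librosa and scikit-learn libraries; the file paths, ratings, and the choice of a linear SVM on feature differences are illustrative assumptions, not details taken from the paper.

# Hedged sketch: MFCC + chroma features and pairwise emotion prediction.
# Assumes librosa and scikit-learn; paths/ratings are hypothetical.
import numpy as np
import librosa
from sklearn.svm import LinearSVC

def clip_features(path):
    # Summarize a clip by the mean of its frame-level MFCC and chroma.
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # timbre-related
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # pitch-class energy
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

def pairwise_dataset(paths, arousal):
    # paths[i] is an audio file; arousal[i] is its (hypothetical) mean
    # rating collected from human subjects.
    feats = [clip_features(p) for p in paths]
    X, y = [], []
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            # Feature difference; label = which clip was rated higher.
            X.append(feats[i] - feats[j])
            y.append(1 if arousal[i] > arousal[j] else 0)
    return np.array(X), np.array(y)

# Usage (with hypothetical data):
#   X, y = pairwise_dataset(paths, arousal)
#   clf = LinearSVC().fit(X, y)   # linear pairwise comparator

Constraining the clip pairs to differ by a single musical transformation (e.g., only a mode or tempo change) is one way to probe which feature domains are variant or invariant to that transformation, in the spirit of the experiment the abstract outlines.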
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Schmidt, E.M., Prockup, M., Scott, J., Dolhansky, B., Morton, B.G., Kim, Y.E. (2013). Analyzing the Perceptual Salience of Audio Features for Musical Emotion Recognition. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41247-9
Online ISBN: 978-3-642-41248-6