Abstract
The purpose of this paper is to study the performance of glottal waveform parameters and the Teager Energy Operator (TEO) in distinguishing binary classes of four emotion dimensions (activation, expectation, power, and valence) using authentic emotional speech. The two feature sets were compared with a 1941-dimensional acoustic feature set comprising prosodic, spectral, and other voicing-related features extracted with the openSMILE toolkit. The comparison highlights the discriminative ability of TEO for the activation and power dimensions, and of glottal parameters for expectation and valence, on authentic speech data. Using the same classification methodology, the TEO and glottal parameters outperformed or performed comparably to the prosodic, spectral, and other voicing-related features (i.e., the openSMILE feature set).
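For readers unfamiliar with the TEO feature, the discrete-time Teager Energy Operator introduced by Kaiser is defined as psi[x(n)] = x(n)^2 - x(n-1)x(n+1). The sketch below is a minimal illustration of that operator applied to a single speech frame; the frame length, toy signal, and summary statistics are assumptions for illustration only, not the authors' actual feature-extraction pipeline.

```python
import numpy as np

def teager_energy(x):
    """Discrete-time Teager Energy Operator (Kaiser):
    psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns the TEO profile for the interior samples of x."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

if __name__ == "__main__":
    # Hypothetical usage: summarize the TEO profile of one voiced frame
    # into simple statistics that could serve as classifier features.
    fs = 16000                                  # assumed sampling rate
    t = np.arange(0, 0.025, 1 / fs)             # a 25 ms frame
    frame = 0.5 * np.sin(2 * np.pi * 200 * t)   # toy "voiced" signal
    psi = teager_energy(frame)
    features = [psi.mean(), psi.std(), psi.max()]  # illustrative statistics only
    print(features)
```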