Automatic Emotion Recognition from Cochlear Implant-Like Spectrally Reduced Speech

  • Conference paper
Ambient Assisted Living and Daily Activities (IWAAL 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8868)

Abstract

In this paper we study the performance of emotion recognition from cochlear implant-like spectrally reduced speech (SRS) using conventional Mel-frequency cepstral coefficient (MFCC) features and a Gaussian mixture model (GMM)-based classifier. The cochlear implant-like SRS of each utterance in the emotional speech corpus is obtained solely from the low-bandwidth subband temporal envelopes of the corresponding original utterance. The resulting utterances carry less spectral information than the originals but retain the information most relevant for emotion recognition. The emotion classes are trained on MFCC features extracted from the SRS signals, and classification is performed using MFCC features computed from the test SRS signals. To evaluate the performance of the SRS-MFCC features, emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. Conventional MFCC, Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC), perceptual linear prediction (PLP), and amplitude modulation cepstral coefficient (AMCC) features extracted from the original signals are used for comparison. Experimental results show that the SRS-MFCC features outperform all other features in terms of emotion recognition accuracy. Average relative improvements over all baseline systems are 1.5% and 11.6% in terms of unweighted average recall and weighted average recall, respectively.
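The SRS construction described above follows the channel-vocoder recipe of cochlear implant simulations: band-pass the signal into a small number of subbands, keep only each band's low-bandwidth temporal envelope, and re-synthesize from those envelopes alone. The sketch below shows one plausible such pipeline in Python with NumPy and SciPy; the channel count, band edges, 50 Hz envelope cutoff, and the use of sine carriers are illustrative assumptions, since the abstract does not give the paper's exact analysis parameters.

```python
# Sketch of cochlear implant-like SRS synthesis: subband temporal envelopes
# modulating sine carriers. All parameter values are assumed for illustration.
import numpy as np
from scipy.signal import butter, sosfilt

def srs_synthesize(x, fs, n_bands=4, f_lo=100.0, f_hi=4000.0, env_cut=50.0):
    x = np.asarray(x, dtype=float)
    # Logarithmically spaced band edges across the speech range (assumed).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    t = np.arange(len(x)) / fs
    # Low-pass filter that keeps only a band's slow temporal envelope.
    env_sos = butter(4, env_cut / (fs / 2), btype="lowpass", output="sos")
    y = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)],
                          btype="bandpass", output="sos")
        band = sosfilt(band_sos, x)
        # Envelope extraction: full-wave rectification + low-pass filtering.
        env = sosfilt(env_sos, np.abs(band))
        # Re-synthesize the band as a sine carrier at its geometric center
        # frequency, modulated by the envelope; fine structure is discarded.
        y += env * np.sin(2.0 * np.pi * np.sqrt(lo * hi) * t)
    # Match the RMS level of the original utterance.
    return y * np.sqrt(np.mean(x ** 2) / (np.mean(y ** 2) + 1e-12))
```

Because only the envelopes survive, the re-synthesized signal is "spectrally reduced": it keeps the slow amplitude modulations that carry much of the paralinguistic information while discarding spectral fine structure.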

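The back-end is likewise straightforward to sketch: extract MFCCs from each SRS utterance, fit one GMM per emotion class, assign a test utterance to the class whose GMM gives the highest average frame log-likelihood, and report unweighted and weighted average recall. In the sketch below, python_speech_features and scikit-learn are stand-ins chosen for illustration, and the mixture count and MFCC settings are assumptions rather than the paper's configuration.

```python
# Sketch of the MFCC-GMM back-end and the UAR/WAR metrics. Library choices
# (python_speech_features, scikit-learn) and parameters are assumptions.
import numpy as np
from python_speech_features import mfcc
from sklearn.mixture import GaussianMixture

def train_class_gmms(utts_by_class, fs, n_mix=32):
    # Fit one diagonal-covariance GMM per emotion class on pooled MFCC frames.
    gmms = {}
    for label, utts in utts_by_class.items():
        feats = np.vstack([mfcc(u, samplerate=fs) for u in utts])
        gmms[label] = GaussianMixture(n_components=n_mix,
                                      covariance_type="diag").fit(feats)
    return gmms

def classify(utt, fs, gmms):
    # score() is the mean per-frame log-likelihood under each class model.
    feats = mfcc(utt, samplerate=fs)
    return max(gmms, key=lambda c: gmms[c].score(feats))

def uar_war(y_true, y_pred, classes):
    # UAR = mean of per-class recalls; WAR = overall accuracy.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls)), float(np.mean(y_true == y_pred))

def relative_improvement(new, base):
    # E.g. the quoted 1.5% / 11.6% figures are averages of (new - base) / base
    # over the baseline systems (interpretation assumed from the abstract).
    return (new - base) / base
```

UAR is the usual headline metric on FAU AIBO because its class distribution is heavily skewed, so per-class recalls are averaged without frequency weighting.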

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Alam, M.J., Attabi, Y., Kenny, P., Dumouchel, P., O’Shaughnessy, D. (2014). Automatic Emotion Recognition from Cochlear Implant-Like Spectrally Reduced Speech. In: Pecchia, L., Chen, L.L., Nugent, C., Bravo, J. (eds) Ambient Assisted Living and Daily Activities. IWAAL 2014. Lecture Notes in Computer Science, vol 8868. Springer, Cham. https://doi.org/10.1007/978-3-319-13105-4_48

  • DOI: https://doi.org/10.1007/978-3-319-13105-4_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13104-7

  • Online ISBN: 978-3-319-13105-4
