Toward Emotional Speaker Recognition: Framework and Preliminary Results

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7701)

Abstract

Besides channel and environmental noise, emotion variability in speech signals has been found to be another important factor that drastically degrades the performance of most speaker recognition systems proposed in the literature. One consideration is how to make the current GMM-UBM system adaptive to emotion variability. We therefore propose a framework named Deformation Compensation (DC) for emotional speaker recognition, which views emotion variability as deformation (a form of distribution distortion in the feature space) and attempts to compensate for it through dynamic modification at the feature, model, and score levels. This paper reports the preliminary results obtained so far, including the proposed Deformation Compensation framework and a preliminary case study on GMM-UBM.
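For readers unfamiliar with the GMM-UBM baseline named in the abstract, the following is a minimal sketch of its usual scoring rule: the average per-frame log-likelihood ratio between a speaker GMM and the universal background model. It is not the authors' Deformation Compensation method, which this page does not specify. The function names `gmm_frame_loglik` and `llr_score`, the diagonal-covariance assumption, and the `(weights, means, variances)` tuple layout are all illustrative assumptions.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM (assumed form).

    X: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    """
    diff = X[:, None, :] - means[None, :, :]                            # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def llr_score(X, speaker, ubm):
    """Average log-likelihood ratio of the speaker model against the UBM."""
    return float(np.mean(gmm_frame_loglik(X, *speaker) - gmm_frame_loglik(X, *ubm)))

# Toy usage with hypothetical one-component, one-dimensional models.
ubm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
speaker = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
frames = np.array([[1.0], [1.0]])
score = llr_score(frames, speaker, ubm)  # positive: frames fit the speaker model better
```

In a system of this kind, emotion variability would shift the frame distribution away from the one the speaker model was trained on, lowering the score for the true speaker; the abstract's feature-, model-, and score-level modifications would act on `X`, on the model parameters, and on the returned score respectively.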




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Y., Chen, L. (2012). Toward Emotional Speaker Recognition: Framework and Preliminary Results. In: Zheng, WS., Sun, Z., Wang, Y., Chen, X., Yuen, P.C., Lai, J. (eds) Biometric Recognition. CCBR 2012. Lecture Notes in Computer Science, vol 7701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35136-5_29

  • DOI: https://doi.org/10.1007/978-3-642-35136-5_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35135-8

  • Online ISBN: 978-3-642-35136-5

  • eBook Packages: Computer Science, Computer Science (R0)
