
A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models

  • Conference paper
  • In: Mining Intelligence and Knowledge Exploration (MIKE 2016)

Abstract

This study examines text-independent speaker recognition in emotional conditions. Both system and source features are used to represent speaker-specific information. At the model level, Gaussian Mixture Models (GMMs), the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and Deep Neural Networks (DNNs) are explored. Experiments are performed on three emotional databases: the German emotional speech database (EMO-DB), IITKGP-SESC: Hindi and IITKGP-SESC: Telugu. The emotions considered are neutral, anger, happy and sad. The results show that the performance of a speaker recognition system trained with clean speech degrades when it is tested with emotional data, irrespective of the feature or model used to build the system. The best results are obtained with score-level fusion of the system-feature and source-feature based systems when speakers are modeled with DNNs.
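As a rough illustration of what score-level fusion means here, the sketch below combines per-speaker scores from two independent systems (one built on system features, one on source features) and picks the speaker with the highest fused score. The abstract does not state the fusion rule, so the min-max normalization, the weighted sum, the default weight of 0.5 and all function names are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one system's per-speaker scores to [0, 1].

    Assumption: the two systems produce scores on different scales
    (e.g. log-likelihoods vs. posteriors), so they are rescaled first.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All speakers scored equally; this system carries no preference.
        return {spk: 0.5 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}


def fuse_scores(
    system_scores: Dict[str, float],
    source_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized scores (hypothetical fusion rule).

    `weight` is the share given to the system-feature scores; the
    remainder goes to the source-feature scores.
    """
    sys_norm = normalize(system_scores)
    src_norm = normalize(source_scores)
    return {
        spk: weight * sys_norm[spk] + (1.0 - weight) * src_norm[spk]
        for spk in sys_norm
    }


def identify(fused: Dict[str, float]) -> str:
    """Return the speaker label with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores for three enrolled speakers on one test utterance.
system_scores = {"spk1": -10.0, "spk2": -12.0, "spk3": -20.0}
source_scores = {"spk1": -5.0, "spk2": -3.0, "spk3": -9.0}

fused = fuse_scores(system_scores, source_scores, weight=0.5)
print(identify(fused))
```

In this toy case the two systems disagree on the top speaker, and the fused decision depends on the weight; in practice such a weight would be tuned on held-out data.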

R. Prasath: Part of this work was carried out while the author was at the Indian Institute of Information Technology (IIIT) Sricity, India.

Acknowledgements

The first author would like to thank the Department of Electronics and Information Technology, Ministry of Communication & IT, Govt. of India, for granting a PhD Fellowship under the Visvesvaraya PhD Scheme.

Corresponding author

Correspondence to K. N. R. K. Raju Alluri.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alluri, K.N.R.K.R., Achanta, S., Prasath, R., Gangashetty, S.V., Vuppala, A.K. (2017). A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models. In: Prasath, R., Gelbukh, A. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2016. Lecture Notes in Computer Science, vol 10089. Springer, Cham. https://doi.org/10.1007/978-3-319-58130-9_7

  • DOI: https://doi.org/10.1007/978-3-319-58130-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58129-3

  • Online ISBN: 978-3-319-58130-9

  • eBook Packages: Computer Science, Computer Science (R0)
