A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models

Alluri, K. N. R. K. Raju; Achanta, Sivanand; Prasath, Rajendra; Gangashetty, Suryakanth V.; Vuppala, Anil Kumar

doi:10.1007/978-3-319-58130-9_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10089)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration

Abstract

The present study focuses on text-independent speaker recognition in emotional conditions. Both system and source features are considered to represent speaker-specific information. At the model level, Gaussian Mixture Models (GMMs), the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and Deep Neural Networks (DNNs) are explored. The experiments are performed using three emotional databases: the German emotional speech database (EMO-DB), IITKGP-SESC: Hindi and IITKGP-SESC: Telugu. The emotions considered are neutral, anger, happy and sad. The results show that the performance of a speaker recognition system trained with clean speech degrades when it is tested with emotional data, irrespective of the feature or model used to build the system. The best results are obtained with score-level fusion of the system-feature-based and source-feature-based systems when speakers are modeled with DNNs.
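The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, the score normalization, or the weights used, so everything below is an assumption: per-speaker scores from each system are min-max normalized and combined with a weighted sum, and the function names (`min_max_normalize`, `fuse_scores`, `identify`) and the example scores are hypothetical.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one system's per-speaker scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All speakers scored equally: this system gives no preference.
        return {spk: 0.5 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}


def fuse_scores(
    system_scores: Dict[str, float],
    source_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted-sum fusion of two systems' scores (assumed rule, not the paper's).

    `weight` is applied to the system-feature scores and `1 - weight`
    to the source-feature scores.
    """
    if system_scores.keys() != source_scores.keys():
        raise ValueError("both systems must score the same set of speakers")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    a = min_max_normalize(system_scores)
    b = min_max_normalize(source_scores)
    return {spk: weight * a[spk] + (1.0 - weight) * b[spk] for spk in a}


def identify(fused: Dict[str, float]) -> str:
    """Return the speaker with the highest fused score."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical scores (e.g. log-likelihoods) for one test utterance.
    system_scores = {"spk1": -42.0, "spk2": -40.0, "spk3": -45.0}
    source_scores = {"spk1": -30.0, "spk2": -33.0, "spk3": -36.0}
    fused = fuse_scores(system_scores, source_scores, weight=0.5)
    print(identify(fused), fused)
```

In this sketch the two systems disagree on the top speaker, and the fused decision depends on the weight; in practice the weight would be tuned on held-out data rather than fixed at 0.5.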

R. Prasath: Part of this work was carried out while the author was at the Indian Institute of Information Technology (IIIT) Sricity, India.

Acknowledgements

The first author would like to thank the Department of Electronics and Information Technology, Ministry of Communication & IT, Govt. of India, for granting a PhD Fellowship under the Visvesvaraya PhD Scheme.

Author information

Authors and Affiliations

Speech and Vision Lab (LTRC), International Institute of Information Technology Hyderabad, Hyderabad, Andhra Pradesh, India
K. N. R. K. Raju Alluri, Sivanand Achanta, Suryakanth V. Gangashetty & Anil Kumar Vuppala
NTNU, Trondheim, Norway
Rajendra Prasath

Corresponding author

Correspondence to K. N. R. K. Raju Alluri.

Editor information

Editors and Affiliations

Norwegian University of Science and Technology, Trondheim, Norway
Rajendra Prasath
Center for Computing Research, CIC, National Polytechnic Institute, IPN, Mexico City, Distrito Federal, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Alluri, K.N.R.K.R., Achanta, S., Prasath, R., Gangashetty, S.V., Vuppala, A.K. (2017). A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models. In: Prasath, R., Gelbukh, A. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2016. Lecture Notes in Computer Science, vol 10089. Springer, Cham. https://doi.org/10.1007/978-3-319-58130-9_7

DOI: https://doi.org/10.1007/978-3-319-58130-9_7
Published: 27 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58129-3
Online ISBN: 978-3-319-58130-9
eBook Packages: Computer Science (R0)
