Abstract
Automatic speaker recognition is a biometric technology that identifies and verifies people using the voice as the discriminating feature. This paper focuses on the feature extraction stage and analyzes its effectiveness in real environments. Feature extraction aims to capture the characteristic space associated with the speaker; Mel features and their linear variant are the most widely used methods. In real conditions, neither the environment in which the speech signal is processed nor the duration of the speech tends to be ideal, so robust techniques are needed to limit the degradation of system effectiveness. Techniques such as Power Normalization, Hilbert Envelope and Medium Duration Modulation are described, analyzed and evaluated.
Notes
1. The meta-data training was done with telephone speech samples from NIST04 and NIST05.
2. See the NIST web site, https://www.nist.gov, for the NIST 2008 evaluation plan, which gives more detail on these conditions.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Campbell, E.L., Hernández, G., Calvo, J.R. (2018). Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_43
DOI: https://doi.org/10.1007/978-3-030-01132-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01131-4
Online ISBN: 978-3-030-01132-1
eBook Packages: Computer Science, Computer Science (R0)