Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment

Campbell, Edward L.; Hernández, Gabriel; Calvo, José Ramón

doi:10.1007/978-3-030-01132-1_43

Edward L. Campbell (ORCID: orcid.org/0000-0002-9382-1208), Gabriel Hernández & José Ramón Calvo

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11047)

Included in the following conference series:

International Workshop on Artificial Intelligence and Pattern Recognition

Abstract

An Automatic Speaker Recognition system is a biometric system that identifies and verifies people, using the voice as a discriminative feature. This paper focuses on the feature extraction stage and analyzes its effectiveness in real environments. Feature extraction aims to capture the characteristic space associated with the speaker; Mel features and their linear variant are the most widely used methods. Under real conditions, neither the environment in which the speech signal is processed nor the duration of the speech tends to be ideal, so robust techniques are needed to limit the degradation of system effectiveness. Techniques such as Power Normalization, Hilbert Envelope and Modulation of Mean Duration are described, analyzed and evaluated.
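
As an illustration of the difference between Mel features and their linear variant, the sketch below compares where the filter centers of the two filterbanks would fall. It is not taken from the paper: the Mel formula with constants 2595 and 700 is one common convention, and the function names, the filter count and the 0 to 4000 Hz band are assumptions chosen only for the example.

```python
import math


def hz_to_mel(f_hz):
    # One common Mel-scale convention (assumed here, not stated in the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def center_frequencies(n_filters, f_min, f_max, scale="mel"):
    """Return the center frequencies (Hz) of n_filters triangular filters.

    scale="mel" spaces the centers evenly on the Mel axis;
    any other value spaces them evenly in Hz (the linear variant).
    """
    if scale == "mel":
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
        step = (hi - lo) / (n_filters + 1)
        return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + step * (i + 1) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 20 filters over a telephone-like 0-4000 Hz band.
    mel = center_frequencies(20, 0.0, 4000.0, scale="mel")
    lin = center_frequencies(20, 0.0, 4000.0, scale="linear")
    for m, l in zip(mel, lin):
        print(f"mel-spaced: {m:8.1f} Hz   linear-spaced: {l:8.1f} Hz")
```

With these settings, every Mel-spaced center lies below the linear-spaced center of the same index, so the Mel filterbank places more of its filters at low frequencies while the linear filterbank gives equal resolution across the whole band.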

Notes

1. The metadata training was done with telephone samples from NIST04 and NIST05.
2. Visit https://www.nist.gov to find the NIST 2008 evaluation plan and learn more about these conditions.

Author information

Authors and Affiliations

Images and Signals Group, CENATAV Research Division, DATYS Enterprise, 7th A Street # 21406, Playa, Havana, Cuba
Edward L. Campbell, Gabriel Hernández & José Ramón Calvo

Corresponding author

Correspondence to Edward L. Campbell.

Editor information

Editors and Affiliations

Universidad de las Ciencias Informáticas, Havana, Cuba
Yanio Hernández Heredia
Universidad de las Ciencias Informáticas, Havana, Cuba
Vladimir Milián Núñez
Universidad de las Ciencias Informáticas, Havana, Cuba
José Ruiz Shulcloper

About this paper

Cite this paper

Campbell, E.L., Hernández, G., Calvo, J.R. (2018). Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_43

DOI: https://doi.org/10.1007/978-3-030-01132-1_43
Published: 22 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01131-4
Online ISBN: 978-3-030-01132-1
eBook Packages: Computer Science, Computer Science (R0)
