Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment

  • Conference paper
Progress in Artificial Intelligence and Pattern Recognition (IWAIPR 2018)

Abstract

An Automatic Speaker Recognition system is a biometric system that identifies and verifies people using the voice as a discriminative feature. This paper focuses on the feature extraction stage and analyzes its effectiveness in real environments. Feature extraction aims to capture the characteristic space associated with the speaker; Mel features and their linear variant are the most widely used methods. Under real conditions, neither the environment in which the speech signal is processed nor the duration of the speech tends to be ideal, so robust techniques are needed to limit the degradation of system effectiveness. Techniques such as Power Normalization, Hilbert Envelope and Modulation of Mean Duration are described, analyzed and evaluated.
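The difference between Mel features and their linear variant can be sketched in a few lines. The block below is an illustration only, not the authors' implementation: it computes cepstral coefficients from one speech frame with a triangular filterbank whose bands are spaced either on the Mel scale or linearly in Hz. All function names, the 8 kHz sample rate, the 256-sample frame, the 24 filters and the 13 coefficients are assumptions chosen for the example, and the Mel formula is the common 2595·log10(1 + f/700) approximation.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common Mel approximation (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def band_edges(n_filters, f_max, scale):
    """Return n_filters + 2 band-edge frequencies in Hz (hypothetical helper)."""
    if scale == "mel":
        # Equal steps on the Mel axis: narrow bands at low frequencies.
        return mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters + 2))
    # Linear variant: equal steps in Hz.
    return np.linspace(0.0, f_max, n_filters + 2)

def triangular_filterbank(n_filters, n_fft, sample_rate, scale):
    edges = band_edges(n_filters, sample_rate / 2.0, scale)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: rises from lo to mid, falls from mid to hi, zero elsewhere.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def cepstral_coefficients(frame, sample_rate, n_filters=24, n_ceps=13, scale="mel"):
    n_fft = frame.size
    # Power spectrum of the Hamming-windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    energies = triangular_filterbank(n_filters, n_fft, sample_rate, scale) @ power
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II of the log filterbank energies.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return basis @ log_energies

# Synthetic 32 ms frame at 8 kHz (telephone bandwidth), for demonstration only.
sample_rate = 8000
t = np.arange(256) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1800.0 * t)

mfcc = cepstral_coefficients(frame, sample_rate, scale="mel")
lfcc = cepstral_coefficients(frame, sample_rate, scale="linear")
```

The only difference between the two feature sets in this sketch is the spacing of the filterbank edges; a real front end would typically add pre-emphasis, framing over the whole utterance, and normalization.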

Notes

  1. The meta-data training was done with telephone samples from NIST04 and NIST05.

  2. The NIST 2008 evaluation plan, available from https://www.nist.gov, gives more detail on these conditions.

Author information

Corresponding author

Correspondence to Edward L. Campbell.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Campbell, E.L., Hernández, G., Calvo, J.R. (2018). Feature Extraction of Automatic Speaker Recognition, Analysis and Evaluation in Real Environment. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_43

  • DOI: https://doi.org/10.1007/978-3-030-01132-1_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01131-4

  • Online ISBN: 978-3-030-01132-1

  • eBook Packages: Computer Science, Computer Science (R0)
