Assessment of a Speaker Recognition System Based on an Auditory Model and Neural Nets

  • Conference paper
Bioinspired Applications in Artificial and Natural Computation (IWINAC 2009)

Abstract

This paper presents a new speaker recognition system based on a model of the human auditory system. The model combines a human nonlinear cochlear filter-bank with Neural Nets.

The efficiency of this system has been tested using a number of Spanish words from the ‘Ahumada’ database as uttered by a native male speaker. These words were fed into the cochlea model and their corresponding outputs were processed with an envelope component extractor, yielding five parameters that convey different auditory sensations (loudness, roughness and virtual tones).
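As a rough illustration of what an envelope extraction stage does to one filter-bank channel, the sketch below half-wave rectifies a signal and smooths it with a one-pole low-pass filter. This is a generic textbook envelope follower, not the extractor used in the paper; the function name `envelope`, the 50 Hz cutoff, and the synthetic test tone are all assumptions made for the example.

```python
import math

def envelope(signal, sample_rate, cutoff_hz=50.0):
    """Half-wave rectify a signal, then smooth it with a one-pole low-pass filter."""
    # Smoothing coefficient for a one-pole low-pass filter (assumed 50 Hz cutoff).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, state = [], 0.0
    for x in signal:
        rectified = max(x, 0.0)              # keep only the positive half-wave
        state += alpha * (rectified - state)  # exponential smoothing
        env.append(state)
    return env

# Hypothetical channel output: a 1 kHz tone whose amplitude doubles halfway through.
sample_rate = 16000
tone = [
    (1.0 if n < 8000 else 2.0) * math.sin(2 * math.pi * 1000 * n / sample_rate)
    for n in range(16000)
]
env = envelope(tone, sample_rate)

# Average envelope level near the end of each half of the signal.
quiet_level = sum(env[7200:8000]) / 800
loud_level = sum(env[15200:16000]) / 800
```

Because the filter is linear, the smoothed level tracks the amplitude of the tone, so `loud_level` comes out at roughly twice `quiet_level`. Parameters such as loudness or roughness would be derived from envelopes like this by further processing that is not shown here.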

Because this process generates large data sets, the use of multivariate statistical methods and Neural Nets was appropriate. A variety of normalization techniques and classifying methods were tested on this biologically motivated feature set.
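To make the idea of testing normalization techniques and classifiers on a feature set concrete, here is a minimal sketch using only the Python standard library. It shows two common normalizations (z-score and min-max) and a nearest-centroid classifier, which is a simple stand-in and not the neural network used in the paper. The five-element feature vectors, the class labels, and all function names are invented for illustration and are not data from the study.

```python
import math
import statistics

def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def minmax_columns(rows):
    """Scale each feature (column) linearly into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # guard constant columns
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def nearest_centroid(train_rows, train_labels, query):
    """Return the label whose class mean is closest to the query vector."""
    groups = {}
    for row, label in zip(train_rows, train_labels):
        groups.setdefault(label, []).append(row)
    centroids = {
        label: [statistics.fmean(c) for c in zip(*members)]
        for label, members in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

# Invented five-parameter feature vectors on very different scales.
features = [
    [60.0, 0.10, 120.0, 0.5, 3.0],
    [62.0, 0.12, 118.0, 0.6, 3.2],
    [75.0, 0.30, 210.0, 1.5, 7.0],
    [77.0, 0.28, 205.0, 1.4, 6.8],
]
labels = ["class_a", "class_a", "class_b", "class_b"]

z_features = zscore_columns(features)
mm_features = minmax_columns(features)
predicted = nearest_centroid(z_features, labels, z_features[1])
```

Normalizing per feature keeps a parameter with a large numeric range from dominating the distance computation, which is one reason to compare several normalization schemes before choosing a classifier.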

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez-Rams, E.A., Garcerán-Hernández, V. (2009). Assessment of a Speaker Recognition System Based on an Auditory Model and Neural Nets. In: Mira, J., Ferrández, J.M., Álvarez, J.R., de la Paz, F., Toledo, F.J. (eds) Bioinspired Applications in Artificial and Natural Computation. IWINAC 2009. Lecture Notes in Computer Science, vol 5602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02267-8_52

  • DOI: https://doi.org/10.1007/978-3-642-02267-8_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02266-1

  • Online ISBN: 978-3-642-02267-8

  • eBook Packages: Computer Science, Computer Science (R0)