A noise-robust auditory modelling front end for voiced speech

  • Part I: Coding and Learning in Biology
  • Conference paper
Artificial Neural Networks — ICANN'97 (ICANN 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1327)

Abstract

We present a method for detecting and displaying the voiced elements of speech, based on the amplitude-modulated pulses that arise from unresolved harmonics of the excitation (fundamental) frequency. The method uses an auditory model consisting of a gammatone filterbank (modelling the basilar membrane), simple rectification (modelling the inner hair cells of the organ of Corti), envelope bandpass filters (modelling some spiral ganglion neuron effects) and amplitude modulation detectors (modelling certain cell populations in the cochlear nucleus). We demonstrate that the model produces a pattern of activity across the spectrum and across time that describes the energy distribution in voiced speech, and that this pattern degrades slowly in the presence of non-speech noise.
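The four stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract gives no parameters, so the filter order, the 80 to 400 Hz envelope band, the half-wave rectifier, and the use of a normalised autocorrelation peak as the "amplitude modulation detector" are all assumptions made for illustration, as are the function names. For brevity the sketch returns one value per channel for the whole signal; applying it frame by frame would give the time axis of the display.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Unit-energy impulse response of a gammatone filter centred on fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)  # conventional bandwidth scaling (assumption)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def bandpass_ir(lo, hi, fs, numtaps=801):
    """Windowed-sinc FIR band-pass (difference of two ideal low-pass filters)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def lowpass(cut):
        return 2 * cut / fs * np.sinc(2 * cut / fs * n)

    return (lowpass(hi) - lowpass(lo)) * np.hamming(numtaps)

def am_strength(env, fs, f0_min=80.0, f0_max=400.0):
    """Stand-in AM detector: peak of the normalised autocorrelation of the
    envelope over lags corresponding to f0_min..f0_max.
    Returns (strength, f0_estimate_in_Hz)."""
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    if ac[0] <= 0:
        return 0.0, 0.0
    ac = ac / ac[0]
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return float(ac[lag]), fs / lag

def front_end(x, fs, centre_freqs, env_band=(80.0, 400.0)):
    """Run every channel through the four stages.
    x must be longer than the band-pass filter (801 samples by default).
    Returns a list of (centre_frequency, strength, f0_estimate)."""
    bp = bandpass_ir(env_band[0], env_band[1], fs)
    out = []
    for fc in centre_freqs:
        bm = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]  # basilar membrane
        rect = np.maximum(bm, 0.0)                          # half-wave rectification
        env = np.convolve(rect, bp, mode="valid")           # envelope band-pass, edges dropped
        strength, f0 = am_strength(env, fs)                 # AM detection
        out.append((fc, strength, f0))
    return out

def harmonic_complex(f0, fs, dur, n_harm):
    """Synthetic stand-in for voiced speech: equal-amplitude cosine harmonics."""
    t = np.arange(int(dur * fs)) / fs
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1))
```

As a usage example, `front_end(harmonic_complex(120.0, 16000, 0.3, 30), 16000, [2500.0])` examines a channel whose passband spans several harmonics of a 120 Hz fundamental. Those harmonics are unresolved there, so the envelope in that channel should beat at the fundamental.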

Editor information

Wulfram Gerstner, Alain Germond, Martin Hasler, Jean-Daniel Nicoud

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smith, L.S. (1997). A noise-robust auditory modelling front end for voiced speech. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020139

  • DOI: https://doi.org/10.1007/BFb0020139

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63631-1

  • Online ISBN: 978-3-540-69620-9

  • eBook Packages: Springer Book Archive
