A noise-robust auditory modelling front end for voiced speech

Smith, Leslie S.

doi:10.1007/BFb0020139

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1327)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

A method is presented for detecting and displaying the voiced elements of speech using the amplitude-modulated pulses produced by unresolved harmonics of the excitation (fundamental) frequency. It uses an auditory model consisting of a gammatone filterbank (modelling the basilar membrane), simple rectification (modelling the inner hair cells of the organ of Corti), envelope bandpass filters (modelling some spiral ganglion neuron effects) and amplitude modulation detectors (modelling certain cell populations in the cochlear nucleus). We demonstrate that it can display a pattern of activity across the spectrum and across time that describes the energy distribution in voiced speech, and that this pattern degrades slowly in the presence of non-speech noise.
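
The processing chain described in the abstract can be illustrated for a single channel with a rough sketch. This is not the chapter's implementation: the fourth-order gammatone impulse response, the ERB bandwidth formula, the 1.019 bandwidth scaling, half-wave rectification, and every function name and parameter value below are assumptions chosen for illustration. The envelope bandpass filter and the amplitude modulation detector are both replaced by one simple stand-in, a projection of the rectified output onto a sinusoid at the candidate fundamental, which yields a modulation index for that channel.

```python
import math

def erb_hz(fc):
    # Assumed bandwidth rule: a commonly used ERB approximation (in Hz).
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.01, order=4):
    # Sampled gammatone impulse response: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).
    b = 1.019 * erb_hz(fc)
    n_taps = int(duration * fs)
    return [((k / fs) ** (order - 1))
            * math.exp(-2.0 * math.pi * b * k / fs)
            * math.cos(2.0 * math.pi * fc * k / fs)
            for k in range(n_taps)]

def fir_filter(x, h):
    # Causal FIR filtering by direct convolution (slow but dependency-free).
    y = []
    for n in range(len(x)):
        kmax = min(n, len(h) - 1)
        y.append(sum(h[k] * x[n - k] for k in range(kmax + 1)))
    return y

def half_wave_rectify(x):
    # Crude stand-in for the rectification stage.
    return [max(v, 0.0) for v in x]

def modulation_index(env, fs, f_mod, start, count):
    # Stand-in AM detector: amplitude of the f_mod component of the
    # rectified signal, divided by its mean level over the same window.
    seg = env[start:start + count]
    w = 2.0 * math.pi * f_mod / fs
    re = sum(v * math.cos(w * n) for n, v in enumerate(seg))
    im = sum(v * math.sin(w * n) for n, v in enumerate(seg))
    mean = sum(seg) / len(seg)
    if mean == 0.0:
        return 0.0
    return 2.0 * math.hypot(re, im) / (len(seg) * mean)

def channel_am(signal, fs, fc, f0, skip_s=0.02):
    # One channel: gammatone filter -> rectify -> modulation index at f0.
    rectified = half_wave_rectify(fir_filter(signal, gammatone_ir(fc, fs)))
    start = int(skip_s * fs)                 # skip the filter's onset transient
    period = int(round(fs / f0))             # samples per modulation period
    count = ((len(signal) - start) // period) * period  # whole periods only
    return modulation_index(rectified, fs, f0, start, count)

def demo():
    fs, f0, fc = 16000, 100.0, 2000.0
    t = [n / fs for n in range(1280)]        # 80 ms of signal
    # Harmonics 18..22 of 100 Hz fall inside one channel near 2 kHz (unresolved).
    voiced = [sum(math.cos(2.0 * math.pi * f0 * k * tt) for k in range(18, 23))
              for tt in t]
    # A single 2 kHz tone has no amplitude modulation at 100 Hz.
    tone = [math.cos(2.0 * math.pi * fc * tt) for tt in t]
    return channel_am(voiced, fs, fc, f0), channel_am(tone, fs, fc, f0)
```

Calling `demo()` returns the modulation index of the channel for the harmonic complex and for the pure tone. Under these assumptions the first value should be large and the second close to zero, which is the contrast a voicing display would exploit. A full front end would repeat this over many channel centre frequencies and over successive time windows.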

Author information

Authors and Affiliations

Department of Computing Science and Mathematics, University of Stirling, FK9 4LA, Stirling, Scotland
Leslie S. Smith

Editor information

Wulfram Gerstner, Alain Germond, Martin Hasler, Jean-Daniel Nicoud

About this paper

Cite this paper

Smith, L.S. (1997). A noise-robust auditory modelling front end for voiced speech. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020139

DOI: https://doi.org/10.1007/BFb0020139
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9
eBook Packages: Springer Book Archive