Abstract
We present a method for detecting and displaying the voiced elements of speech. It exploits the amplitude-modulated pulses produced by unresolved harmonics of the excitation (fundamental) frequency. The method uses an auditory model consisting of a gammatone filterbank (modelling the basilar membrane), simple rectification (modelling the inner hair cells of the organ of Corti), envelope bandpass filters (modelling some spiral ganglion neuron effects), and amplitude modulation detectors (modelling certain cell populations in the cochlear nucleus). We demonstrate that the model produces a pattern of activity across the spectrum and across time that describes the energy distribution in voiced speech, and that this pattern degrades slowly in the presence of non-speech noise.
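The four stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no parameter values, so the filter order (4), the bandwidth factor (1.019), the Glasberg and Moore ERB formula, the 50 to 400 Hz envelope band, and all function names are assumptions made for illustration. In particular, the amplitude modulation detector is replaced here by a plain autocorrelation peak picker, which stands in for the neurally motivated detectors the paper describes.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed: Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4, b=1.019):
    """Unit-energy gammatone impulse response centred on fc (assumed parameters)."""
    t = np.arange(int(round(duration * fs))) / fs
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def bandpass_fft(x, fs, lo, hi):
    """Crude envelope bandpass: zero every FFT bin outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def am_frequency(env, fs, fmin=50.0, fmax=400.0):
    """Stand-in AM detector: strongest autocorrelation lag in the pitch range.

    Returns (modulation frequency in Hz, peak height relative to lag 0).
    """
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag, ac[lag] / ac[0]

def front_end(signal, fs, centre_freqs):
    """Run filterbank -> rectifier -> envelope bandpass -> AM detector per channel."""
    results = []
    for fc in centre_freqs:
        band = np.convolve(signal, gammatone_ir(fc, fs), mode="same")  # basilar membrane
        rect = np.maximum(band, 0.0)                                   # half-wave rectification
        env = bandpass_fft(rect, fs, 50.0, 400.0)                      # envelope bandpass
        results.append(am_frequency(env, fs))                          # AM detection
    return results

def harmonic_complex(f0, fs, duration, fmax):
    """Synthetic voiced-like test signal: equal-amplitude harmonics of f0 up to fmax."""
    t = np.arange(int(round(duration * fs))) / fs
    n_harm = int(fmax // f0)
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1))

if __name__ == "__main__":
    fs = 16000
    x = harmonic_complex(100.0, fs, 0.2, 4000.0)
    for fc, (f_mod, strength) in zip([2000.0, 3000.0], front_end(x, fs, [2000.0, 3000.0])):
        print(f"channel {fc:.0f} Hz: modulation {f_mod:.1f} Hz, strength {strength:.2f}")
```

The channels in the usage example are centred well above the fundamental, where the assumed filter bandwidth spans several harmonics. Those harmonics are unresolved and beat together, so the rectified envelope is modulated at the fundamental frequency, which is the cue the abstract describes.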
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smith, L.S. (1997). A noise-robust auditory modelling front end for voiced speech. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020139
DOI: https://doi.org/10.1007/BFb0020139
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9
eBook Packages: Springer Book Archive