Abstract
We show that data-guided techniques optimized for classifying speech sounds into context-independent phoneme classes yield auditory-like frequency resolution and enhanced sensitivity to modulation frequencies in the 1–15 Hz range. Next, we present a viable recognition paradigm in which the temporal trajectories of spectral energies in individual critical bands are used to estimate the likelihoods of phoneme classes. The relative success of this technique leads to a discussion of the auditory basis of human speech communication. Overall, we argue against a linguistic code based on the spectral envelope in communication by speech.
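The input to the recognition paradigm described above can be illustrated with a minimal sketch. It assumes the speech has already been reduced to a matrix of per-frame critical-band energies; the function name `band_trajectories`, the window parameter `half_span`, the edge-padding rule, and the mean removal are illustrative assumptions, not details given in the abstract. The per-band classifiers that would turn each trajectory into phoneme-class likelihood estimates are not shown.

```python
from typing import List


def band_trajectories(energies: List[List[float]],
                      center: int,
                      half_span: int) -> List[List[float]]:
    """Return one mean-removed temporal trajectory per band around `center`.

    energies[t][b] is the (log) energy of band b at frame t.
    Frames that fall outside the signal repeat the nearest edge frame
    (an assumption made for this sketch).
    """
    n_frames = len(energies)
    n_bands = len(energies[0])
    trajectories = []
    for b in range(n_bands):
        traj = []
        for t in range(center - half_span, center + half_span + 1):
            t_clamped = min(max(t, 0), n_frames - 1)
            traj.append(energies[t_clamped][b])
        mean = sum(traj) / len(traj)
        # Remove the mean so each trajectory describes change over time,
        # not the absolute level in that band.
        trajectories.append([x - mean for x in traj])
    return trajectories


# Toy input: 5 frames, 2 bands (hypothetical values).
energies = [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
trajs = band_trajectories(energies, center=2, half_span=1)
```

Each band is handled independently here, so a trajectory from one band carries no information about the shape of the spectral envelope across bands at any single instant.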
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heřmanský, H. (2001). Human Speech Perception: Some Lessons from Automatic Speech Recognition. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_24
DOI: https://doi.org/10.1007/3-540-44805-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive