Human Speech Perception: Some Lessons from Automatic Speech Recognition

Heřmanský, Hynek

doi:10.1007/3-540-44805-5_24

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

We show that data-guided techniques, optimized to classify speech sounds into context-independent phoneme classes, yield auditory-like frequency resolution and enhanced sensitivity to modulation frequencies in the 1–15 Hz range. We then present a viable recognition paradigm in which the temporal trajectories of spectral energy in individual critical bands are used to estimate the likelihoods of phoneme classes. The relative success of this technique leads to a discussion of the auditory basis of human speech communication. Overall, we argue against a linguistic code based on the spectral envelope in communication by speech.
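
As a rough illustration of the input such a paradigm works on, the sketch below cuts a matrix of per-band log energies into per-band temporal trajectories, one window per band centered on each frame. It is a minimal sketch, not the paper's implementation: the function name `band_trajectories`, the `context` parameter, and the per-window mean removal are assumptions introduced here, and the band-specific classifiers that would turn each trajectory into phoneme-class likelihoods are left out.

```python
from typing import List


def band_trajectories(
    log_energies: List[List[float]], context: int
) -> List[List[List[float]]]:
    """Cut per-band temporal trajectories out of a band-energy matrix.

    log_energies[b][t] is the log energy of critical band b at frame t.
    The result is indexed as result[i][b], where i counts the frames that
    have `context` neighbours on both sides; each entry is a window of
    2 * context + 1 values from band b alone.

    Assumption (not from the source): each window has its own mean
    subtracted so that a constant gain in a band does not change it.
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    n_frames = len(log_energies[0]) if log_energies else 0
    if any(len(band) != n_frames for band in log_energies):
        raise ValueError("all bands must have the same number of frames")

    width = 2 * context + 1
    result = []
    for center in range(context, n_frames - context):
        per_band = []
        for band in log_energies:
            # The window spans time only; no other band is consulted.
            window = band[center - context : center + context + 1]
            mean = sum(window) / width
            per_band.append([value - mean for value in window])
        result.append(per_band)
    return result


# Two hypothetical bands, four frames, one frame of context on each side.
example = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 6.0, 0.0]]
trajectories = band_trajectories(example, context=1)
```

Each `trajectories[i][b]` would then go to a classifier trained for band `b` only, with the per-band outputs combined in a later stage. The point of the layout is that no single vector spans the frequency axis, in contrast to features built from the short-term spectral envelope.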

Author information

Authors and Affiliations

OGI School of Engineering, Oregon Health and Sciences University, Portland, Oregon
Hynek Heřmanský
International Computer Science Institute, Berkeley, California
Hynek Heřmanský

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, University of West Bohemia in Plzeň, Faculty of Applied Sciences, Univerzitní 22, 306-14, Plzeň, Czech Republic
Václav Matoušek, Pavel Mautner, Roman Mouček & Karel Taušer

About this paper

Cite this paper

Heřmanský, H. (2001). Human Speech Perception: Some Lessons from Automatic Speech Recognition. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_24

DOI: https://doi.org/10.1007/3-540-44805-5_24
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive