Human Speech Perception: Some Lessons from Automatic Speech Recognition

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

We show that data-guided techniques optimized for classifying speech sounds into context-independent phoneme classes yield auditory-like frequency resolution and enhanced sensitivity to modulation frequencies in the 1–15 Hz range. Next, we present a viable recognition paradigm in which temporal trajectories of spectral energies in individual critical bands are used to estimate the likelihoods of phoneme classes. The relative success of this technique leads to a discussion of the auditory basis of the human speech communication process. Overall, we argue against a linguistic code based on the spectral envelope in communication by speech.
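As a rough illustration of what "modulation frequencies in the 1–15 Hz range" refers to, the sketch below measures how much of the modulation energy of a single band-energy trajectory falls between 1 and 15 Hz. It is not the paper's method: the function name `modulation_band_fraction`, the 100 frames/s rate, and the synthetic trajectory are all assumptions made for this example, and a plain discrete Fourier transform stands in for whatever data-guided analysis the paper uses.

```python
import cmath
import math


def modulation_band_fraction(trajectory, frame_rate_hz, lo_hz=1.0, hi_hz=15.0):
    """Return the fraction of non-DC modulation energy lying in [lo_hz, hi_hz].

    `trajectory` is a sequence of band energies sampled at `frame_rate_hz`
    (hypothetical input; one value per analysis frame).
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [x - mean for x in trajectory]  # remove the DC component

    in_band = 0.0
    total = 0.0
    # Evaluate the DFT at the positive-frequency bins only.
    for k in range(1, n // 2 + 1):
        freq_hz = k * frame_rate_hz / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= freq_hz <= hi_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


# Synthetic one-second trajectory (assumed 100 frames/s): a 4 Hz modulation
# of amplitude 1.0 plus a 30 Hz modulation of amplitude 0.5.
frame_rate = 100.0
traj = [
    1.0
    + math.sin(2 * math.pi * 4 * t / frame_rate)
    + 0.5 * math.sin(2 * math.pi * 30 * t / frame_rate)
    for t in range(100)
]
fraction = modulation_band_fraction(traj, frame_rate)
```

For this synthetic trajectory the 4 Hz component carries four times the power of the 30 Hz component, so `fraction` comes out at about 0.8. A real trajectory would be the sequence of energies in one critical band taken from successive analysis frames.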

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Heřmanský, H. (2001). Human Speech Perception: Some Lessons from Automatic Speech Recognition. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_24

  • DOI: https://doi.org/10.1007/3-540-44805-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive
