Dealing with Unexpected Words in Automatic Recognition of Speech

  • Conference paper
Text, Speech and Dialogue (TSD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Abstract

Unexpected words attract a listener’s attention. They are information-rich, and getting them right is important for human communication. In the automatic recognition of speech (ASR), words that are not in the machine’s expected lexicon are typically replaced by acoustically similar but nevertheless wrong words. The article discusses the reasons for this undesirable behavior, describes some known examples of how unexpected words are handled in human speech perception and their implications, and proposes an alternative ASR architecture that could alleviate some of the problems with unexpected acoustic inputs. Some published experimental results obtained with this alternative architecture are given.
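The substitution behavior described in the abstract can be illustrated with a toy sketch. This is not the paper's method or its proposed architecture: the three-word lexicon, the phone symbols, and the use of edit distance as a stand-in for acoustic similarity are all assumptions made only for illustration. The point is that a decoder restricted to a closed lexicon must return some in-lexicon word, even when the spoken word is not in the lexicon.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical closed lexicon: word -> phone sequence (illustrative symbols).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
    "sun": ["s", "ah", "n"],
}


def closed_vocabulary_decode(phones, lexicon=LEXICON):
    """Return the lexicon word closest to the observed phones.

    Edit distance is an assumed proxy for acoustic similarity; the
    decoder has no way to answer "none of the above".
    """
    return min(lexicon, key=lambda w: edit_distance(phones, lexicon[w]))


# "cab" is not in the lexicon, yet the decoder still outputs a word.
spoken_cab = ["k", "ae", "b"]
print(closed_vocabulary_decode(spoken_cab))  # prints "cat"
```

In this sketch the out-of-lexicon input "cab" is silently mapped to "cat", the kind of acoustically similar but wrong output the abstract refers to.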

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hermansky, H. (2011). Dealing with Unexpected Words in Automatic Recognition of Speech. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_1

  • DOI: https://doi.org/10.1007/978-3-642-23538-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23537-5

  • Online ISBN: 978-3-642-23538-2

  • eBook Packages: Computer Science, Computer Science (R0)