Part of the book series: Cognitive Technologies ((COGTECH))

Summary

Human-machine interaction in SmartKom is a highly complex task, characterized by natural, spontaneous language, speaker independence, large vocabularies, and background noise. Speech recognition is an integral part of the multimodal dialogue system: it transforms the acoustic input signal into an orthographic transcription of the speaker's utterance. This contribution discusses how to enhance and customize the speech recognizer for the SmartKom applications. Significant improvements were achieved by adapting the speech recognizer to the environment, to the speaker, and to the task. Speech recognition confidence measures were investigated to reject unreliable user input and to detect user input containing unknown words, i.e., words that are not in the vocabulary of the speech recognizer. Finally, we present new ideas for future work.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Berton, A., Kaltenmeier, A., Haiber, U., Schreiner, O. (2006). Speech Recognition. In: Wahlster, W. (ed.) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_6

  • DOI: https://doi.org/10.1007/3-540-36678-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23732-7

  • Online ISBN: 978-3-540-36678-2

  • eBook Packages: Computer Science, Computer Science (R0)
