Summary
Human-machine interaction in SmartKom is a very complex task, characterized by natural, spontaneous language, speaker independence, large vocabularies, and background noise. Speech recognition is an integral part of the multimodal dialogue system: it transforms the acoustic input signal into an orthographic transcription of the speaker's utterance. This contribution discusses how to enhance and customize the speech recognizer for the SmartKom applications. Significant improvements were achieved by adapting the speech recognizer to the environment, to the speaker, and to the task. Speech recognition confidence measures were investigated to reject unreliable user input and to detect user input containing unknown words, i.e., words that are not contained in the vocabulary of the speech recognizer. Finally, we present new ideas for future work.
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Berton, A., Kaltenmeier, A., Haiber, U., Schreiner, O. (2006). Speech Recognition. In: Wahlster, W. (ed.) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_6
DOI: https://doi.org/10.1007/3-540-36678-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23732-7
Online ISBN: 978-3-540-36678-2
eBook Packages: Computer Science (R0)