Abstract
At Delft University of Technology, a project on multimodal interfaces is studying the interaction of speech and lip-reading. A large-vocabulary, speaker-independent speech recognizer for the Dutch language was developed using the Hidden Markov Model Toolkit (HTK) and the Polyphone database of recorded Dutch speech. To make the system more robust to noise, visual cues provided by an automatic lip-reading technique were integrated into the system. In this paper we give an outline of both systems and present the results of experiments.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Wiggers, P., Rothkrantz, L.J.M. (2002). Integration of Speech Recognition and Automatic Lip-Reading. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_28
DOI: https://doi.org/10.1007/3-540-46154-X_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44129-8
Online ISBN: 978-3-540-46154-8
eBook Packages: Springer Book Archive