Abstract
This paper presents our work on lip reading in the Dutch language. The results are based on a new data corpus recorded at 100 Hz by our group. The NDUTAVSC corpus is, to date, the largest corpus built for lip reading in Dutch. To parameterise the input data we use Active Appearance Models (AAM). From the AAM results we define a set of high-level geometric features, which are used to train recognizers for different recognition tasks: fixed-length digit strings, random-length letter strings, random word sequences, fixed-topic continuous speech and random continuous speech. We show that our approach yields substantial improvements over previous results. We also investigate how the high-speed recordings influence recognition performance, and show that at high speech rates higher-speed recordings are necessary.
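The abstract does not list the geometric features themselves. The following is a minimal sketch, not the authors' feature set. It assumes the fitted AAM yields an ordered outer-lip contour of (x, y) landmarks for each video frame. From that contour it derives mouth width, height, height-to-width ratio and enclosed area. The landmark indices (`left`, `right`, `top`, `bottom`) and the feature names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Area enclosed by a closed contour, computed with the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the contour
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def mouth_features(
    outer_lip: Sequence[Point], left: int, right: int, top: int, bottom: int
) -> Dict[str, float]:
    """Hypothetical per-frame geometric features from AAM lip landmarks.

    The index arguments name which contour points are the mouth corners
    and the upper/lower lip midpoints; they are assumptions, not part of
    any published AAM layout.
    """
    width = math.dist(outer_lip[left], outer_lip[right])    # corner to corner
    height = math.dist(outer_lip[top], outer_lip[bottom])   # upper to lower lip
    area = polygon_area(outer_lip)                           # mouth opening size
    return {"width": width, "height": height, "area": area, "aspect": height / width}


if __name__ == "__main__":
    # A diamond-shaped toy contour: corners at x=0 and x=4, lips at y=+1 and y=-1.
    contour = [(0.0, 0.0), (2.0, 1.0), (4.0, 0.0), (2.0, -1.0)]
    print(mouth_features(contour, left=0, right=2, top=1, bottom=3))
```

Applied to every frame, a function like this turns a 100 Hz recording into a sequence of feature vectors that a recognizer can be trained on.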
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chitu, A.G., Driel, K., Rothkrantz, L.J.M. (2010). Automatic Lip Reading in the Dutch Language Using Active Appearance Models on High Speed Recordings. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_33
DOI: https://doi.org/10.1007/978-3-642-15760-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15759-2
Online ISBN: 978-3-642-15760-8
eBook Packages: Computer Science, Computer Science (R0)