Abstract:
In this article, a complete audio-visual speech recognition system suitable for embedded devices is presented. As visual feature extraction algorithms, active shape models (ASM) and the discrete cosine transform (DCT) have been investigated and discussed with respect to an embedded implementation. The audio-visual information integration has also been designed with device limitations in mind. It is well known that the use of visual cues improves recognition results, especially in scenarios with high levels of acoustic noise. We compare the performance of lip reading and conventional noise reduction systems in these degraded scenarios, as well as the combination of both kinds of solutions. Important improvements are obtained, especially for nonstationary background noise such as voice interference, car acceleration, or indicator clicks. For this kind of noise, lip reading outperforms conventional noise reduction technologies.
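The abstract names the DCT as one of the visual front ends. A minimal sketch of how DCT-based visual features are commonly obtained from a mouth region of interest (ROI); the function name, ROI size, and the choice of keeping a 6x6 low-frequency block are illustrative assumptions, not details taken from the paper:

```python
import numpy as np
from scipy.fft import dctn  # 2-D type-II DCT

def dct_mouth_features(roi, k=6):
    """Illustrative DCT visual front end (hypothetical helper):
    take the 2-D DCT of a grayscale mouth ROI and keep the k*k
    lowest-frequency coefficients as a flattened feature vector."""
    roi = np.asarray(roi, dtype=np.float64)
    coeffs = dctn(roi, norm="ortho")   # energy compacts into the top-left corner
    return coeffs[:k, :k].ravel()      # low-frequency block as feature vector

# Example with a dummy 32x48 mouth region (random pixels stand in for an image)
rng = np.random.default_rng(0)
roi = rng.uniform(0, 255, size=(32, 48))
feat = dct_mouth_features(roi)
print(feat.shape)  # (36,)
```

Keeping only the low-frequency block is what makes this attractive for embedded devices: the feature dimension (36 here) is tiny compared with the raw ROI, and the DCT itself has fast fixed-point implementations.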
Published in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
Date of Conference: 23 March 2005
Date Added to IEEE Xplore: 09 May 2005
Print ISBN:0-7803-8874-7