Improved face and feature finding for audio-visual speech recognition in visually challenging environments

Abstract:

Visual information in a speaker's face is known to improve the robustness of automatic speech recognition (ASR). However, most studies in audio-visual ASR have focused on "visually clean" data to benefit ASR in noise. This paper is a follow-up to a previous study that investigated audio-visual ASR in visually challenging environments. It focuses on visual speech front-end processing, and it proposes an improved, appearance-based face and feature detection algorithm that utilizes Gaussian mixture model classifiers. This method is shown to improve the accuracy of face and feature detection, and thus visual speech recognition, over our previously used baseline system. In turn, this translates to improved audio-visual ASR, resulting in a 10% relative reduction of the word error rate in noisy speech.

Published in: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing

Date of Conference: 17-21 May 2004

Date Added to IEEE Xplore: 30 August 2004

Print ISBN: 0-7803-8484-9

Print ISSN: 1520-6149

DOI: 10.1109/ICASSP.2004.1327250

Conference Location: Montreal, QC, Canada
