Noisy audio feature enhancement using audio-visual speech data


Abstract:

We investigate improving automatic speech recognition (ASR) in noisy conditions by enhancing noisy audio features using visual speech captured from the speaker's face. The enhancement is achieved by applying a linear filter to the concatenated vector of noisy audio and visual features, obtained by mean square error estimation of the clean audio features in a training stage. The performance of the enhanced audio features is evaluated on two ASR tasks: a connected digits task and speaker-independent, large-vocabulary, continuous speech recognition. In both cases and at sufficiently low signal-to-noise ratios (SNRs), ASR trained on the enhanced audio features significantly outperforms ASR trained on the noisy audio, achieving for example a 46% relative reduction in word error rate on the digits task at −3.5 dB SNR. However, the method fails to capture the full visual modality benefit to ASR, as demonstrated by its comparison to discriminant audio-visual feature fusion introduced in previous work.
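
The linear enhancement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature types, their dimensions, or whether a bias term is used, so the synthetic data, the dimensions, the bias term, and the function names `fit_linear_enhancer` and `enhance` are all assumptions made for illustration. The sketch fits a linear map from the concatenated noisy audio and visual features to the clean audio features by least squares, which minimizes the mean square error on the training frames.

```python
import numpy as np


def fit_linear_enhancer(noisy_audio, visual, clean_audio):
    """Least-squares fit so that clean ~= [noisy_audio, visual] @ W + b.

    Each argument is a (frames, features) array. The bias b is an
    assumption of this sketch, not something stated in the abstract.
    """
    joint = np.hstack([noisy_audio, visual])
    # Append a column of ones so the bias is estimated together with W.
    design = np.hstack([joint, np.ones((joint.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(design, clean_audio, rcond=None)
    return coeffs[:-1], coeffs[-1]


def enhance(noisy_audio, visual, W, b):
    """Apply the fitted linear filter to concatenated features."""
    return np.hstack([noisy_audio, visual]) @ W + b


# Synthetic stand-in data (hypothetical dimensions, not from the paper).
rng = np.random.default_rng(0)
n_frames, d_audio, d_visual = 2000, 6, 4
clean = rng.normal(size=(n_frames, d_audio))
# Visual features are modeled as a noisy linear view of the clean audio.
visual = clean @ rng.normal(size=(d_audio, d_visual))
visual += 0.5 * rng.normal(size=(n_frames, d_visual))
# Noisy audio is the clean audio plus additive noise.
noisy = clean + 1.5 * rng.normal(size=(n_frames, d_audio))

W, b = fit_linear_enhancer(noisy, visual, clean)
enhanced = enhance(noisy, visual, W, b)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_enhanced = np.mean((enhanced - clean) ** 2)
print(f"MSE of noisy features:    {mse_noisy:.3f}")
print(f"MSE of enhanced features: {mse_enhanced:.3f}")
```

On the training frames the enhanced features cannot have a higher mean square error than the noisy ones, because passing the noisy audio through unchanged is itself one of the linear maps the fit can choose. Whether a lower feature-level error translates into a lower word error rate is an empirical question that the paper addresses and this sketch does not.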
Date of Conference: 13-17 May 2002
Date Added to IEEE Xplore: 07 April 2011
Print ISBN: 0-7803-7402-9
Print ISSN: 1520-6149
Conference Location: Orlando, FL, USA
