Frame-dependent multi-stream reliability indicators for audio-visual speech recognition

Abstract:

We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities, as a means of estimating stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators for each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio- or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a priori knowledge of the utterance noise level.
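
The mapping described in the abstract (four reliability indicators, linearly combined and passed through a sigmoid to give a per-frame stream exponent) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the bias term, the example numbers, and the constraint that the audio and visual exponents sum to one are all assumptions made for illustration; the abstract does not define the indicators themselves, so they are treated here as given inputs.

```python
import math


def audio_exponent(indicators, weights, bias=0.0):
    """Map reliability indicators to an audio stream exponent in (0, 1).

    Computes sigmoid(bias + sum_i weights[i] * indicators[i]).
    The bias term is an assumption, not taken from the abstract.
    """
    if len(indicators) != len(weights):
        raise ValueError("need exactly one weight per indicator")
    z = bias + sum(w * d for w, d in zip(weights, indicators))
    # Numerically stable logistic sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combined_log_likelihood(log_p_audio, log_p_visual, lam_audio):
    """Exponent-weighted two-stream score for one frame and one state.

    Assumes the visual exponent is 1 - lam_audio (exponents sum to one).
    """
    return lam_audio * log_p_audio + (1.0 - lam_audio) * log_p_visual


# Hypothetical values for a single frame: two audio and two visual indicators.
indicators = [1.2, 0.8, 0.3, 0.1]
weights = [0.5, 0.5, -0.5, -0.5]  # illustrative, not estimated weights

lam = audio_exponent(indicators, weights)
score = combined_log_likelihood(-4.0, -6.0, lam)
```

In the paper the sigmoid weights are estimated with maximum conditional likelihood or minimum classification error training; the fixed weights above only stand in for the result of such a procedure.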

Published in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

Date of Conference: 06-10 April 2003

Date Added to IEEE Xplore: 21 May 2003

Print ISBN: 0-7803-7663-3

Print ISSN: 1520-6149

DOI: 10.1109/ICASSP.2003.1198707

Conference Location: Hong Kong, China
