Audio-visual speech recognition in noisy audio environments

Abstract:

It is well known that the visual component of speech can improve recognition rates, mainly in noisy conditions. The main goal of this work is to find a set of visual features that can be used in our audio-visual speech recognition systems. Discrete Cosine Transform (DCT) and Active Appearance Model (AAM) based visual features are extracted from visual speech signals, enhanced by a simplified variant of Hierarchical Linear Discriminant Analysis (HiLDA), and normalized across speakers. The visual features are then combined with standard MFCC audio features using the middle fusion method. The audio-visual recognition results are compared with results from experiments in which the log-spectra minimum mean square error and multiband spectral subtraction methods are used to reduce additive noise in the audio signal.
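
As a rough illustration of the DCT-based visual features mentioned above, the following minimal sketch computes a 2-D DCT of a grayscale mouth region and keeps a few low-frequency coefficients. The abstract does not state the region size, the number of coefficients, or the selection rule, so the function names (`dct2_1d`, `dct2_2d`, `dct_features`), the parameter `n_coeffs`, and the low-frequency ordering are assumptions made for illustration only. The sketch does not cover the AAM features, the HiLDA step, or speaker normalization.

```python
import math


def dct2_1d(x):
    """Orthonormal DCT-II of a 1-D sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct2_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(r) for r in block]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [horizontal freq][vertical freq]; transpose back.
    return [list(r) for r in zip(*cols)]


def dct_features(roi, n_coeffs=10):
    """Return the n_coeffs lowest-frequency DCT coefficients of a region.

    Assumption: coefficients are ordered by u + v (ties broken by the
    vertical index u), which is one common low-frequency ordering and
    not necessarily the selection used in the paper.
    """
    coeffs = dct2_2d(roi)
    h, w = len(coeffs), len(coeffs[0])
    order = sorted(
        ((u, v) for u in range(h) for v in range(w)),
        key=lambda p: (p[0] + p[1], p[0]),
    )
    return [coeffs[u][v] for u, v in order[:n_coeffs]]


if __name__ == "__main__":
    # Hypothetical 4x4 "mouth region" with a simple horizontal gradient.
    roi = [[float(col) for col in range(4)] for _ in range(4)]
    print(dct_features(roi, n_coeffs=5))
```

With an orthonormal DCT, a region of constant intensity yields a single non-zero (DC) coefficient, which is why a small number of low-frequency coefficients can summarize the coarse shape of the mouth region.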

Published in: 2013 36th International Conference on Telecommunications and Signal Processing (TSP)

Date of Conference: 02-04 July 2013

Date Added to IEEE Xplore: 30 September 2013

DOI: 10.1109/TSP.2013.6613979

Conference Location: Rome, Italy
