Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms

Abstract:

Multichannel ASR systems commonly use separate modules to perform speech enhancement and acoustic modeling. In this paper, we present an algorithm that performs multichannel enhancement jointly with acoustic modeling, using a raw-waveform convolutional LSTM deep neural network (CLDNN). We show that the proposed method offers ~5% relative improvement in WER over a log-mel CLDNN trained on multiple channels. Analysis shows that the proposed network learns to be robust to varying angles of arrival for the target speaker, and performs as well as a model that is given oracle knowledge of the true location. Finally, we show that training such a network on inputs captured using multiple (linear) array configurations results in a model that is robust to a range of microphone spacings.

Published in: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Date of Conference: 13-17 December 2015

Date Added to IEEE Xplore: 11 February 2016

DOI: 10.1109/ASRU.2015.7404770

Conference Location: Scottsdale, AZ, USA
