Loading [a11y]/accessibility-menu.js
Factored spatial and spectral multichannel raw waveform CLDNNs | IEEE Conference Publication | IEEE Xplore

Factored spatial and spectral multichannel raw waveform CLDNNs


Abstract:

Multichannel ASR systems commonly separate speech enhancement, including localization, beamforming and postfiltering, from acoustic modeling. Recently, we explored doing ...Show More

Abstract:

Multichannel ASR systems commonly separate speech enhancement, including localization, beamforming and postfiltering, from acoustic modeling. Recently, we explored doing multichannel enhancement jointly with acoustic modeling, where beamforming and frequency decomposition was folded into one layer of the neural network [1, 2]. In this paper, we explore factoring these operations into separate layers in the network. Furthermore, we explore using multi-task learning (MTL) as a proxy for postfiltering, where we train the network to predict "clean" features as well as context-dependent states. We find that with the factored architecture, we can achieve a 10% relative improvement in WER over a single channel and a 5% relative improvement over the unfactored model from [1] on a 2,000-hour Voice Search task. In addition, by incorporating MTL, we can achieve 11% and 7% relative improvements over single channel and unfactored multichannel models, respectively.
Date of Conference: 20-25 March 2016
Date Added to IEEE Xplore: 19 May 2016
ISBN Information:
Electronic ISSN: 2379-190X
Conference Location: Shanghai, China

References

References is not available for this document.