A comparative study of multi-channel processing methods for noisy automatic speech recognition in urban environments | IEEE Conference Publication | IEEE Xplore


Abstract:

For distant speech recognition, multi-channel processing has been shown to significantly improve ASR performance compared with single-channel approaches. However, very little work has been done to provide a comparative evaluation of these approaches, particularly with modern Deep Neural Network (DNN) recognizers. In this paper, we address this gap by evaluating the most recently reported multi-channel methods for distant speech recognition in urban environments using the 3rd CHiME Challenge database. In particular, we analyse the effects of each processing stage: beamforming, adaptive noise cancellation and dereverberation. The back-end processing components are also investigated. We further describe in detail our best-performing system, which combines harmonic to subharmonic ratio (SHR) voice activity detection and correlative beamforming with adaptive channel selection in the front-end, with semi-supervised DNN adaptation and RNN language model rescoring in the back-end. The system achieved relative WER reductions of 60% and 55% on the development set, and 65% and 60% on the test set, for real and simulated data, respectively.
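The results above are reported as relative WER reductions, which measure improvement as a fraction of the baseline error rate rather than as an absolute difference. A minimal sketch of that calculation follows. The function name and the example WER values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical values for illustration only (not taken from the paper):
# a baseline at 30.0% WER and an improved system at 12.0% WER.
baseline = 30.0
improved = 12.0
reduction = relative_wer_reduction(baseline, improved)
print(f"Relative WER reduction: {reduction:.0%}")
```

Under these assumed numbers, an absolute drop of 18 points corresponds to a 60% relative reduction, which is why relative figures need the baseline WER to be interpreted.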
Date of Conference: 20-25 March 2016
Date Added to IEEE Xplore: 19 May 2016
ISBN Information:
Electronic ISSN: 2379-190X
Conference Location: Shanghai, China
