Abstract:
For distant speech recognition, multi-channel processing has been shown to significantly improve ASR performance compared with single-channel approaches. However, little work has been done to provide a comparative evaluation of these approaches, particularly with modern Deep Neural Network (DNN) recognizers. In this paper, we address this problem by evaluating the most recently reported multi-channel methods for distant speech recognition in urban environments, using the 3rd CHiME Challenge database. In particular, we analyse the effect of each processing stage: beamforming, adaptive noise cancellation and dereverberation. The back-end processing components are also investigated. We further describe in detail our best-performing system, which combines harmonic to subharmonic ratio (SHR) voice activity detection and correlative beamforming with adaptive channel selection in the front-end, and semi-supervised DNN adaptation and RNN language model rescoring in the back-end. The system achieved relative WER reductions of 60% and 55% on the development set, and of 65% and 60% on the test set, for the real and simulated data, respectively.
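The gains above are relative WER reductions: the drop in WER is expressed as a fraction of a reference (baseline) system's WER, not as an absolute difference in percentage points. The following minimal Python sketch shows that calculation. The function name `relative_wer_reduction` and the example WER values are hypothetical and are not results from the paper; the abstract does not state which baseline the reductions are measured against.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return (baseline - system) / baseline as a fraction.

    Both WERs must use the same units (e.g. both in percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs in percent, for illustration only (not from the paper).
baseline = 20.0
system = 8.0
print(f"{relative_wer_reduction(baseline, system):.0%}")  # prints "60%"
```

In this example, an absolute drop of 12 percentage points from a 20% baseline corresponds to a 60% relative reduction. The same absolute drop from a higher baseline would give a smaller relative figure.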
Published in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 20-25 March 2016
Date Added to IEEE Xplore: 19 May 2016
Electronic ISSN: 2379-190X