Abstract:
In this paper, we propose a feature adaptation method that combines speech features from multiple microphone channels for robust automatic speech recognition (ASR). The proposed method first transforms the features in all channels using channel-dependent linear transforms, and then sums the channels into one channel for acoustic modeling. The transform parameters are estimated by maximizing the likelihood of the transformed features on a Gaussian mixture model (GMM) trained on clean features. To allow diagonal covariance matrices and hence an efficient estimation algorithm, the likelihood function is evaluated in the cepstral domain, while the transformation is applied in the log Mel filterbank domain. We evaluate the proposed feature adaptation on the 6-channel evaluation data of the CHiME-3 task. Results show that the proposed method with diagonal channel-dependent transforms reduces the word error rate (WER) from 21.05% (best single channel) to 16.96% when a DNN-based acoustic model is used. This result is also slightly better than the 17.60% obtained with minimum variance distortionless response (MVDR) beamforming.
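The pipeline described above can be illustrated with a minimal NumPy sketch. It only evaluates the objective (the average GMM log-likelihood of the combined features in the cepstral domain) and does not implement the parameter estimation. All function names, array shapes, and the choice of an unnormalized type-II DCT are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def dct_matrix(n_ceps, n_mel):
    """Unnormalized type-II DCT matrix (assumed form) mapping log Mel to cepstra."""
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_mel)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_mel)


def combine_channels(feats, diag_transforms):
    """Scale each channel by its own diagonal transform, then sum over channels.

    feats:           (C, T, M) log Mel filterbank features per channel
    diag_transforms: (C, M) diagonal entries of the channel-dependent transforms
    returns:         (T, M) single combined channel
    """
    return np.einsum("ctm,cm->tm", feats, diag_transforms)


def gmm_avg_loglik(ceps, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    ceps: (T, D); weights: (K,); means, variances: (K, D)
    """
    diff = ceps[:, None, :] - means[None, :, :]                # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                          # (T, K)
    log_joint = np.log(weights) + log_comp
    # Log-sum-exp over mixture components for numerical stability.
    mx = log_joint.max(axis=1, keepdims=True)
    frame_ll = mx[:, 0] + np.log(np.exp(log_joint - mx).sum(axis=1))
    return float(frame_ll.mean())


def objective(feats, diag_transforms, dct, weights, means, variances):
    """Transform in the log Mel domain, score in the cepstral domain."""
    combined = combine_channels(feats, diag_transforms)        # (T, M)
    ceps = combined @ dct.T                                    # (T, D)
    return gmm_avg_loglik(ceps, weights, means, variances)
```

In this sketch, an optimizer would adjust `diag_transforms` to increase the value returned by `objective`; which optimizer to use is left open here.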
Date of Conference: 17-20 October 2016
Date Added to IEEE Xplore: 04 May 2017