
Maximum Likelihood Nonlinear Transformations Based on Deep Neural Networks


Abstract:

Feature transformations are commonly used in speech recognition to account for distribution mismatches between the source and target domains (also referred to as covariate shift). Linear (affine) or piecewise linear transformations are typically considered. In this paper, we present deep neural network (DNN) based nonlinear feature transformations estimated under the maximum likelihood criterion. We use the hidden Markov model (HMM) to model speech feature sequences, and the features in each HMM state are assumed to follow a Gaussian mixture model (GMM) distribution. The network is first pre-trained to be close to a linear transformation and then fine-tuned using the gradient descent algorithm. Because of the nonlinearity, the gradients and the partition functions of the GMM-HMM state distributions are evaluated using the Monte Carlo (MC) method based on importance sampling. In addition, a deep stacked architecture is proposed that builds a DNN hierarchically as a series of sub-networks, each of which is itself a nonlinear transformation and can be learned using a block-wise learning strategy. Applications of the proposed nonlinear transformations to speaker/environment adaptation and acoustic modeling in large vocabulary continuous speech recognition tasks show that they outperform the widely used constrained maximum likelihood linear regression (CMLLR).
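
The abstract states that partition functions are evaluated by Monte Carlo importance sampling but gives no details of the estimator. As a generic illustration only, and not the paper's actual procedure, the following minimal sketch estimates the partition function Z = ∫ p̃(x) dx of a toy one-dimensional unnormalized density by drawing samples from a Gaussian proposal q and averaging the weights p̃(x)/q(x). The toy density exp(-x^4), the proposal, and all function names are assumptions chosen so that the result can be compared against a known closed form.

```python
import math
import random


def unnormalized_density(x):
    # Toy unnormalized density (an assumption for illustration, not from the paper).
    # Its integral over the real line is known: 2 * Gamma(5/4).
    return math.exp(-x ** 4)


def proposal_pdf(x, sigma=1.0):
    # Density of the zero-mean Gaussian proposal q(x) with standard deviation sigma.
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_partition_function(num_samples, sigma=1.0, seed=0):
    # Importance sampling: Z = E_q[p~(x) / q(x)], approximated by a sample mean
    # over draws x ~ q.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, sigma)
        total += unnormalized_density(x) / proposal_pdf(x, sigma)
    return total / num_samples


if __name__ == "__main__":
    exact = 2.0 * math.gamma(1.25)
    estimate = estimate_partition_function(100_000)
    print(f"importance sampling estimate: {estimate:.4f}")
    print(f"closed-form value:            {exact:.4f}")
```

The estimate is unbiased as long as the proposal is nonzero wherever the target density is nonzero, and its variance shrinks as the proposal more closely matches the shape of the target.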
Page(s): 2023 - 2031
Date of Publication: 27 July 2016
