Abstract:
This paper investigates modeling nonlinear transformations with deep neural networks (DNNs). Specifically, a DNN is used as a nonlinear mapping function for feature space transformation in HMM acoustic models. The nonlinear transformations are estimated under the sequence-based maximum likelihood criterion. The likelihood partition function is evaluated using a Monte Carlo method based on importance sampling. The DNN is first pre-trained to approximate a linear transformation and then fine-tuned using the gradient descent algorithm. In addition, a deep stacked architecture is proposed that builds a DNN hierarchically as a series of sub-networks, each representing a nonlinear transformation, and a block-wise learning strategy is introduced for it. In LVCSR speaker adaptation experiments, the proposed maximum likelihood nonlinear transformation gives better results than the widely used CMLLR transformation.
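As a rough illustration of the importance sampling step mentioned in the abstract, the following minimal sketch estimates the normalizing constant of a toy one-dimensional density. It is not the paper's estimator: the target `unnormalized_density`, the Gaussian proposal, and the function names are all assumptions chosen so that the true value, sqrt(2*pi), is known and the estimate can be checked.

```python
import math
import random


def unnormalized_density(x: float) -> float:
    """Toy unnormalized target exp(-x^2 / 2); its true normalizer is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)


def proposal_pdf(x: float, sigma: float) -> float:
    """Density of the zero-mean Gaussian proposal with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_partition(n_samples: int, sigma: float = 2.0, seed: int = 0) -> float:
    """Importance sampling estimate of Z = integral of unnormalized_density.

    Draws x_i from the proposal q and averages the weights
    unnormalized_density(x_i) / q(x_i), which is unbiased for Z
    whenever q is nonzero wherever the target is nonzero.
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += unnormalized_density(x) / proposal_pdf(x, sigma)
    return total / n_samples


if __name__ == "__main__":
    print("estimate:", estimate_partition(20000))
    print("exact:   ", math.sqrt(2.0 * math.pi))
```

The proposal is deliberately wider than the target (standard deviation 2 instead of 1) so that the importance weights stay bounded; a proposal narrower than the target would give an estimator with much higher variance.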
Published in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 19-24 April 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4673-6997-8