Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling

Kurata, Gakuto; Kingsbury, Brian

doi:10.21437/Interspeech.2016-725

Neural Network (NN) Acoustic Models (AMs) are usually trained using context-dependent Hidden Markov Model (CD-HMM) states as independent targets. For example, the CD-HMM states A-b-2 (second variant of the beginning state of A) and A-m-1 (first variant of the middle state of A) both correspond to the phone A, and A-b-1 and A-b-2 both correspond to the context-independent HMM (CI-HMM) state A-b, but these relationships are not explicitly modeled. We propose a method that treats some neurons in the final hidden layer, just below the output layer, as dedicated neurons for phones or CI-HMM states by initializing the connections between each dedicated neuron and its corresponding CD-HMM outputs with stronger weights than its connections to other outputs. We obtained 6.5% and 3.6% relative error reductions with a DNN AM and a CNN AM, respectively, on a 50-hour English broadcast news task, and a 4.6% reduction with a CNN AM on a 500-hour Japanese task, in all cases after Hessian-free sequence training. The proposed method changes only the NN parameter initialization and requires no additional computation in NN training or at speech recognition run-time.
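The initialization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `init_output_weights`, the choice of one dedicated neuron per group, the placement of dedicated neurons at the first hidden indices, and the values of `base_scale` and `boost` are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def init_output_weights(cd_to_group, num_hidden, base_scale=0.01, boost=1.0, seed=0):
    """Build an output-layer weight matrix (one row per CD-HMM output).

    cd_to_group[i] is the group (phone or CI-HMM state) of CD-HMM output i.
    Assumption: one hidden unit per group is "dedicated", and dedicated units
    occupy the first len(groups) hidden indices in sorted group order.
    """
    rng = random.Random(seed)
    groups = sorted(set(cd_to_group))
    if len(groups) > num_hidden:
        raise ValueError("need at least one hidden unit per group")
    dedicated = {group: index for index, group in enumerate(groups)}

    weights = []
    for group in cd_to_group:
        # Ordinary small random initialization for every connection (assumed scheme).
        row = [rng.uniform(-base_scale, base_scale) for _ in range(num_hidden)]
        # Stronger weight from the group's dedicated neuron to this CD-HMM output.
        row[dedicated[group]] = boost
        weights.append(row)
    return weights, dedicated


# Grouping by CI-HMM state: strip the variant index from names such as "A-b-2".
cd_states = ["A-b-1", "A-b-2", "A-m-1", "B-b-1"]
ci_state_of = [name.rsplit("-", 1)[0] for name in cd_states]
weights, dedicated = init_output_weights(ci_state_of, num_hidden=8)
```

Grouping by phone instead of CI-HMM state would only change the mapping, for example `name.split("-")[0]` under the same hypothetical naming scheme. Because only the initial weight values differ, the network structure and the training and decoding cost stay the same, as the abstract notes.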

Cite as: Kurata, G., Kingsbury, B. (2016) Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling. Proc. Interspeech 2016, 27-31, doi: 10.21437/Interspeech.2016-725

@inproceedings{kurata16_interspeech,
  author={Gakuto Kurata and Brian Kingsbury},
  title={{Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={27--31},
  doi={10.21437/Interspeech.2016-725}
}