It is difficult to apply well-formulated model-based noise adaptation approaches to Deep Neural Networks (DNNs) due to the lack of interpretability of the model parameters. In this paper, we propose incorporating a generative front-end layer (GFL), parameterised by a Gaussian Mixture Model (GMM), into the DNN. A GFL can be easily adapted to different noise conditions by applying the model-based Vector Taylor Series (VTS) approach to the underlying GMM. We show that incorporating a GFL into the DNN yields a 12.1% relative improvement over a baseline multi-condition DNN. We also show that the proposed system performs significantly better than the noise-aware training method, where per-utterance estimated noise parameters are appended to the acoustic features.
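The core idea of a GMM-parameterised front-end layer can be sketched as a layer that maps each acoustic feature vector to per-component GMM log-posteriors, which then feed the DNN; because the layer's parameters are Gaussian means and variances, a model-based scheme such as VTS can adapt them directly. The function name, shapes, and diagonal-covariance choice below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gfl_forward(x, means, variances, weights):
    """Sketch of a generative front-end layer (GFL): compute GMM
    component log-posteriors for one feature vector.  In a noise-robust
    setup, VTS adaptation would update `means`/`variances` to match the
    noise condition; the DNN above this layer is left unchanged.

    x         : (D,)   acoustic feature vector
    means     : (K, D) component means (diagonal-covariance assumption)
    variances : (K, D) per-dimension variances
    weights   : (K,)   mixture weights summing to 1
    """
    # Per-component diagonal-Gaussian log-likelihoods log N(x; mu_k, sigma_k^2)
    log_lik = -0.5 * (
        np.sum(np.log(2 * np.pi * variances), axis=1)
        + np.sum((x - means) ** 2 / variances, axis=1)
    )
    # Normalise to log-posteriors with a stable log-sum-exp
    log_joint = np.log(weights) + log_lik
    return log_joint - np.logaddexp.reduce(log_joint)

# Toy usage: K=3 components, D=2 feature dimensions
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
weights = np.array([0.5, 0.3, 0.2])
post = np.exp(gfl_forward(np.zeros(2), means, variances, weights))
```

The posteriors `post` form a K-dimensional representation of the input that the DNN's first hidden layer consumes in place of (or alongside) the raw features.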
Cite as: Kundu, S., Sim, K.C., Gales, M.J.F. (2016) Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition. Proc. Interspeech 2016, 2359-2363, doi: 10.21437/Interspeech.2016-760
@inproceedings{kundu16_interspeech,
  author={Souvik Kundu and Khe Chai Sim and Mark J.F. Gales},
  title={{Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2359--2363},
  doi={10.21437/Interspeech.2016-760}
}