It is difficult to apply well-formulated model-based noise adaptation approaches to Deep Neural Networks (DNNs) due to the lack of interpretability of the model parameters. In this paper, we propose incorporating a generative front-end layer (GFL), parameterised by a Gaussian Mixture Model (GMM), into the DNN. A GFL can be easily adapted to different noise conditions by applying the model-based Vector Taylor Series (VTS) approach to the underlying GMM. We show that incorporating a GFL into the DNN yields a 12.1% relative improvement over a baseline multi-condition DNN. We also show that the proposed system performs significantly better than the noise-aware training method, where per-utterance estimated noise parameters are appended to the acoustic features.
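The core idea of a GMM-parameterised front-end layer can be sketched as a layer that maps each acoustic feature vector to per-component GMM log-posteriors, which then feed the DNN; because the layer's parameters are Gaussian means and variances, a model-based scheme such as VTS can adapt them directly. The function name, shapes, and diagonal-covariance choice below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gfl_forward(x, means, variances, weights):
    """Sketch of a generative front-end layer (GFL): compute GMM
    component log-posteriors for one feature vector.  In a noise-robust
    setup, VTS adaptation would update `means`/`variances` to match the
    noise condition; the DNN above this layer is left unchanged.

    x         : (D,)   acoustic feature vector
    means     : (K, D) component means (diagonal-covariance assumption)
    variances : (K, D) per-dimension variances
    weights   : (K,)   mixture weights summing to 1
    """
    # Per-component diagonal-Gaussian log-likelihoods log N(x; mu_k, sigma_k^2)
    log_lik = -0.5 * (
        np.sum(np.log(2 * np.pi * variances), axis=1)
        + np.sum((x - means) ** 2 / variances, axis=1)
    )
    # Normalise to log-posteriors with a stable log-sum-exp
    log_joint = np.log(weights) + log_lik
    return log_joint - np.logaddexp.reduce(log_joint)

# Toy usage: K=3 components, D=2 feature dimensions
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
weights = np.array([0.5, 0.3, 0.2])
post = np.exp(gfl_forward(np.zeros(2), means, variances, weights))
```

The posteriors `post` form a K-dimensional representation of the input that the DNN's first hidden layer consumes in place of (or alongside) the raw features.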
Cite as: Kundu, S., Sim, K.C., Gales, M.J.F. (2016) Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition. Proc. Interspeech 2016, 2359-2363, doi: 10.21437/Interspeech.2016-760
@inproceedings{kundu16_interspeech,
  author={Souvik Kundu and Khe Chai Sim and Mark J.F. Gales},
  title={{Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2359--2363},
  doi={10.21437/Interspeech.2016-760}
}