ISCA Archive Interspeech 2019

Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions

Meet Soni, Sonal Joshi, Ashish Panda

Multi-conditioned training is a state-of-the-art approach to achieving robustness in Automatic Speech Recognition (ASR) systems. This approach works well in practice for seen degradation conditions; however, the performance of such systems remains an issue for unseen degradation conditions. In this work, we consider distortions due to additive noise and channel mismatch. To achieve robustness to additive noise, we propose a parametric generative model for noise signals. By changing the parameters of the proposed generative model, various noise signals can be generated and used to develop a multi-conditioned dataset for ASR system training. The generative model is designed to span the feature space of Mel Filterbank Energies by using band-limited white noise signals as a basis. To simulate channel distortions, we propose to shift the mean of the log spectral magnitude using utterances with estimated channel distortions. Experiments performed on the Aurora 4 noisy speech database show that using noise types generated from the proposed generative model for multi-conditioned training provides a significant performance gain for additive noise in unseen conditions. We compare our results with those from multi-conditioning with various real noise databases, including environmental and other real-life noises.
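The sketch below is a minimal, hypothetical illustration (not the authors' released code) of the two ideas summarized in the abstract: synthesizing a noise signal as a weighted mixture of band-limited white noise "bases", and simulating a channel by shifting the mean of the log spectral magnitude. The band edges, mixture weights, filter order, and the log-magnitude offset are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter, stft, istft

def bandlimited_white_noise(n_samples, low_hz, high_hz, fs=16000, order=4):
    """White noise restricted to [low_hz, high_hz] with a Butterworth band-pass."""
    white = np.random.randn(n_samples)
    b, a = butter(order, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="band")
    return lfilter(b, a, white)

def generate_noise(n_samples, band_edges, weights, fs=16000):
    """Weighted sum of band-limited white noise bases; varying the weights
    (the generative model's parameters) yields different synthetic noise types."""
    noise = np.zeros(n_samples)
    for (lo, hi), w in zip(band_edges, weights):
        noise += w * bandlimited_white_noise(n_samples, lo, hi, fs)
    return noise / (np.max(np.abs(noise)) + 1e-8)

def simulate_channel(speech, log_mag_shift, fs=16000, nperseg=512):
    """Shift the mean of the log spectral magnitude by a per-frequency offset
    (e.g. estimated from channel-distorted utterances), then resynthesize."""
    _, _, Z = stft(speech, fs=fs, nperseg=nperseg)
    log_mag = np.log(np.abs(Z) + 1e-8) + log_mag_shift[:, None]
    Z_shifted = np.exp(log_mag) * np.exp(1j * np.angle(Z))
    _, y = istft(Z_shifted, fs=fs, nperseg=nperseg)
    return y

# Example usage with hypothetical parameters: a noise type emphasizing low
# frequencies, plus a mild spectral tilt as the simulated channel.
fs = 16000
bands = [(100, 1000), (1000, 3000), (3000, 7000)]
noise = generate_noise(2 * fs, bands, weights=[1.0, 0.4, 0.1], fs=fs)
speech = np.random.randn(2 * fs)            # placeholder for a real utterance
shift = np.linspace(0.5, -0.5, 257)         # hypothetical log-magnitude offset (257 = nperseg//2 + 1)
distorted = simulate_channel(speech, shift, fs=fs)
n = min(len(distorted), len(noise))
noisy = distorted[:n] + 0.1 * noise[:n]     # additive noise at a chosen level
```

Varying the band weights (and, more generally, the band edges) gives a family of synthetic noise types for multi-conditioned training, while the per-frequency log-magnitude shift stands in for a convolutive channel mismatch.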


doi: 10.21437/Interspeech.2019-2090

Cite as: Soni, M., Joshi, S., Panda, A. (2019) Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions. Proc. Interspeech 2019, 441-445, doi: 10.21437/Interspeech.2019-2090

@inproceedings{soni19b_interspeech,
  author={Meet Soni and Sonal Joshi and Ashish Panda},
  title={{Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={441--445},
  doi={10.21437/Interspeech.2019-2090}
}