ISCA Archive Interspeech 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS

Yan Deng, Rui Zhao, Zhong Meng, Xie Chen, Bing Liu, Jinyu Li, Yifan Gong, Lei He

The recurrent neural network transducer (RNN-T) has been shown to be comparable with the conventional hybrid model for speech recognition. However, it still struggles in out-of-domain scenarios whose context or vocabulary differs from the training data. In this paper, we explore semi-supervised training that optimizes an RNN-T jointly with a neural text-to-speech (TTS) model so that it generalizes better to new domains using domain-specific text data. We apply the method to two tasks: one with out-of-domain context and the other with a significant number of out-of-vocabulary (OOV) words. The results show that the proposed method significantly improves recognition accuracy on both tasks, yielding 61.4% and 53.8% relative word error rate (WER) reductions, respectively, over a well-trained RNN-T baseline built from 65 thousand hours of training data. We further study the semi-supervised training methodology along three dimensions: 1) which modules of the RNN-T model to update; 2) the impact of using different neural TTS models; and 3) performance when using text of varying relevance to the target domain. Finally, we compare several RNN-T customization methods and conclude that semi-supervised training with neural TTS is comparable and complementary to Internal Language Model Estimation (ILME) or biasing.


doi: 10.21437/Interspeech.2021-1017

Cite as: Deng, Y., Zhao, R., Meng, Z., Chen, X., Liu, B., Li, J., Gong, Y., He, L. (2021) Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS. Proc. Interspeech 2021, 751-755, doi: 10.21437/Interspeech.2021-1017

@inproceedings{deng21_interspeech,
  author={Yan Deng and Rui Zhao and Zhong Meng and Xie Chen and Bing Liu and Jinyu Li and Yifan Gong and Lei He},
  title={{Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={751--755},
  doi={10.21437/Interspeech.2021-1017}
}