ISCA Archive Interspeech 2016

Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors

Hao Zheng, Shanshan Zhang, Liwei Qiao, Jianping Li, Wenju Liu

It is well recognized that accent has a strong impact on automatic speech recognition (ASR) of Mandarin Chinese, so improving performance on accented speech has become a critical issue in this field. Attribute features have proven effective for modelling accented speech, yielding significantly improved accent recognition performance. In this paper, we propose attribute-based i-vectors to improve the performance of a large vocabulary accented Mandarin speech recognition system. The proposed attribute features work especially well when sufficient training data are available. To further improve performance under resource-limited or training-data-mismatched conditions, we also develop Multi-Task Learning Deep Neural Networks (MTL-DNNs) with attribute classification as the secondary task to improve discriminative ability on Mandarin speech. Experiments on the 450-hour Intel accented Mandarin speech corpus demonstrate that the system with attribute-based i-vectors achieves a significant performance improvement over the baseline DNN-HMM system when training data are sufficient. The MTL-DNNs compensate for the weakness of attribute-based i-vectors under data-limited and mismatched conditions and obtain clear CER reductions.
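To illustrate the two ideas summarized above, the sketch below shows a multi-task acoustic model whose input is an acoustic feature vector augmented with an attribute-based i-vector, with senone classification as the primary task and speech-attribute classification as the secondary task. This is only a minimal sketch of the general technique, not the authors' exact architecture: the layer sizes, feature and i-vector dimensions, number of attribute classes, and the secondary-task weight alpha are all assumptions introduced for illustration.

```python
# Minimal sketch (assumed dimensions, not the paper's exact configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLAcousticModel(nn.Module):
    def __init__(self, feat_dim=440, ivector_dim=100,
                 num_senones=6000, num_attributes=21, hidden_dim=1024):
        super().__init__()
        # Shared hidden layers operate on [acoustic features ; i-vector].
        self.shared = nn.Sequential(
            nn.Linear(feat_dim + ivector_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Primary head: senone posteriors for the DNN-HMM decoder.
        self.senone_head = nn.Linear(hidden_dim, num_senones)
        # Secondary head: frame-level attribute classification.
        self.attribute_head = nn.Linear(hidden_dim, num_attributes)

    def forward(self, feats, ivector):
        x = torch.cat([feats, ivector], dim=-1)
        h = self.shared(x)
        return self.senone_head(h), self.attribute_head(h)

def mtl_loss(senone_logits, attr_logits, senone_targets, attr_targets, alpha=0.3):
    # Joint objective: primary cross-entropy plus a down-weighted secondary term.
    return (F.cross_entropy(senone_logits, senone_targets)
            + alpha * F.cross_entropy(attr_logits, attr_targets))
```

At decoding time only the senone head would be used; the attribute head serves purely as an auxiliary training signal, which is the usual way a secondary task regularizes the shared layers in multi-task learning.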


doi: 10.21437/Interspeech.2016-378

Cite as: Zheng, H., Zhang, S., Qiao, L., Li, J., Liu, W. (2016) Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors. Proc. Interspeech 2016, 3454-3458, doi: 10.21437/Interspeech.2016-378

@inproceedings{zheng16b_interspeech,
  author={Hao Zheng and Shanshan Zhang and Liwei Qiao and Jianping Li and Wenju Liu},
  title={{Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3454--3458},
  doi={10.21437/Interspeech.2016-378}
}