Abstract
Because labeled training data are scarce, acoustic models in low-resource speech recognition systems, such as those for Khalkha dialect Mongolian, perform poorly. Transfer learning can alleviate this data sparsity by using knowledge learned in a high-resource source domain to guide the training of the low-resource target-domain model. In this paper, we investigate different transfer learning methods for acoustic modeling in a Khalkha dialect Mongolian ASR system. First, English and the Chahar dialect are used as source domains, and the acoustic models trained on these source domains are used to initialize the parameters of the Khalkha acoustic model. Furthermore, we evaluate the effect of different training strategies, the transferability of different hidden layers, and the impact of the pre-trained model on the transferred model in the Khalkha dialect ASR task. The experimental results show that the best acoustic model is a chain TDNN trained with the weight transfer method using the Chahar dialect as the source domain. Its final WER is 15.67%, a relative reduction of 38% compared with the randomly initialized model.
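The weight transfer idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' Kaldi recipe: it assumes a plain feed-forward stack stored as Python lists, and the layer sizes, output-unit counts, and the names `init_layers`, `shape`, and `transfer_weights` are hypothetical choices made only for illustration. The first `n_transfer` hidden layers are copied from a source-domain model, while the remaining layers, including the output layer whose size depends on the target language, keep their random initialization.

```python
import copy
import random


def init_layers(sizes, seed):
    """Randomly initialize a feed-forward stack as a list of {"W", "b"} layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
        layers.append({"W": weights, "b": [0.0] * n_out})
    return layers


def shape(layer):
    """Return (input_dim, output_dim) of a layer."""
    return (len(layer["W"]), len(layer["W"][0]))


def transfer_weights(source, target, n_transfer):
    """Copy the first n_transfer layers from source; keep the rest from target."""
    if n_transfer < 0 or n_transfer > min(len(source), len(target)):
        raise ValueError("n_transfer is outside the range of available layers")
    result = []
    for i, tgt_layer in enumerate(target):
        if i < n_transfer:
            if shape(source[i]) != shape(tgt_layer):
                raise ValueError(f"layer {i} shapes differ; cannot transfer")
            result.append(copy.deepcopy(source[i]))  # transferred from the source domain
        else:
            result.append(copy.deepcopy(tgt_layer))  # stays randomly initialized
    return result


# Hypothetical sizes: 40-dim input features, two 64-unit hidden layers,
# 120 source-domain output units and 90 target-domain output units.
source_model = init_layers([40, 64, 64, 120], seed=0)  # stands in for a trained source model
target_model = init_layers([40, 64, 64, 90], seed=1)   # randomly initialized target model
khalkha_init = transfer_weights(source_model, target_model, n_transfer=2)
```

Varying `n_transfer` in such a sketch corresponds loosely to studying how transferable the different hidden layers are; the transferred model would then be trained further on target-domain data.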
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Nos. 61563040, 61773224) and the Natural Science Foundation of Inner Mongolia (Nos. 2018MS06006, 2016ZD06).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, L., Bao, F., Wang, Y., Gao, G. (2019). Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_47
DOI: https://doi.org/10.1007/978-3-030-32236-6_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science (R0)