Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer

Shi, Linyan; Bao, Feilong; Wang, Yonghe; Gao, Guanglai

doi:10.1007/978-3-030-32236-6_47

Conference paper
First Online: 30 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Due to the lack of labeled training data, acoustic models perform poorly in low-resource speech recognition systems such as those for Khalkha dialect Mongolian. Transfer learning can alleviate this data sparsity by using knowledge learned in a high-resource source domain to guide the training of the low-resource target-domain model. In this paper, we investigate different transfer learning approaches for acoustic modeling in a Khalkha dialect Mongolian ASR system. First, English and the Chahar dialect are used as source domains, and the acoustic models trained on them are used to initialize the parameters of the Khalkha acoustic model. Furthermore, we examine different training strategies, the transferability of different hidden layers, and the impact of the pre-trained model on the transfer model to validate their effectiveness in the Khalkha dialect ASR task. The experimental results show that the best acoustic model is a chain TDNN trained with the weight transfer method, using the Chahar dialect as the source domain. Its final WER is 15.67%, a relative reduction of 38% compared with the randomly initialized model.
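
As a rough illustration of the weight transfer idea described in the abstract, the following Python sketch initializes a target model by copying the first few hidden layers of a source model and leaving the output layer randomly initialized. It is a toy sketch under stated assumptions, not the authors' chain TDNN setup: the helper names (`init_model`, `transfer_weights`, `relative_wer_reduction`), the layer sizes, and the WER values in the usage lines are all hypothetical and chosen only for illustration.

```python
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialize one fully connected layer as n_out rows of n_in weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def init_model(feat_dim, hidden_dims, num_targets, rng):
    """Build a toy feed-forward acoustic model: hidden layers plus one output layer."""
    dims = [feat_dim] + list(hidden_dims)
    hidden = [init_layer(dims[i], dims[i + 1], rng) for i in range(len(hidden_dims))]
    output = init_layer(dims[-1], num_targets, rng)
    return {"hidden": hidden, "output": output}


def transfer_weights(source, target, num_layers):
    """Copy the first `num_layers` hidden layers from `source` into `target`.

    The output layer is not touched: source and target languages generally
    have different output targets, so it keeps its random initialization.
    """
    if num_layers > min(len(source["hidden"]), len(target["hidden"])):
        raise ValueError("cannot transfer more hidden layers than both models have")
    for i in range(num_layers):
        src, tgt = source["hidden"][i], target["hidden"][i]
        if len(src) != len(tgt) or len(src[0]) != len(tgt[0]):
            raise ValueError(f"shape mismatch in hidden layer {i}")
        # Copy row by row so later fine-tuning of the target cannot alter the source.
        target["hidden"][i] = [row[:] for row in src]
    return target


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical sizes: 40-dim features, three 64-unit hidden layers,
# 120 output targets for the source model and 90 for the target model.
rng = random.Random(0)
source = init_model(40, [64, 64, 64], 120, rng)
target = init_model(40, [64, 64, 64], 90, rng)

# Transfer the two lowest hidden layers; the third stays randomly initialized.
transfer_weights(source, target, num_layers=2)

# Made-up WER values, used only to show how a relative reduction is computed.
print(f"{relative_wer_reduction(25.0, 15.5):.0%}")
```

Varying `num_layers` in such a sketch corresponds loosely to asking how many hidden layers are worth transferring; in a real system the transferred model would then be fine-tuned on the target-language data.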

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61563040, 61773224) and the Natural Science Foundation of Inner Mongolia (Nos. 2018MS06006, 2016ZD06).

Author information

Authors and Affiliations

Inner Mongolian Key Laboratory of Mongolian Information Processing Technology, College of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Linyan Shi, Feilong Bao, Yonghe Wang & Guanglai Gao

Corresponding author

Correspondence to Feilong Bao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Shi, L., Bao, F., Wang, Y., Gao, G. (2019). Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_47

DOI: https://doi.org/10.1007/978-3-030-32236-6_47
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)
