Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Due to the lack of labeled training data, acoustic models perform poorly in low-resource speech recognition systems such as those for Khalkha dialect Mongolian. Transfer learning can alleviate this data sparsity problem by using knowledge learned in a source (high-resource) domain to guide the training of the target (low-resource) model. In this paper, we investigate different transfer learning approaches for acoustic modeling in a Khalkha dialect Mongolian ASR system. First, English and the Chahar dialect are used as source domains, and the acoustic models trained on these source domains are used to initialize the parameters of the Khalkha acoustic model. Furthermore, we examine different training strategies, the transferability of different hidden layers, and the impact of the pre-trained model on the transfer model to validate their effectiveness in the Khalkha dialect ASR task. The experimental results show that the best acoustic model is a chain TDNN trained with the weight transfer method, using the Chahar dialect as the source domain. The final WER is 15.67%, a relative reduction of 38% compared with the randomly initialized model.
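The idea of weight transfer described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not model a chain TDNN: it uses a toy feed-forward network stored as plain Python lists, and every name and size in it (`build_model`, `transfer_weights`, `n_transfer`, the layer dimensions) is a hypothetical choice made only for illustration. It shows one plausible reading of the method: the first few hidden layers of a source-domain model are copied into the target model, while the output layer, whose units differ between languages, keeps its random initialization.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random initialization: small Gaussian weights stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def build_model(n_feats, hidden_dim, n_hidden, n_outputs, rng):
    """Toy feed-forward 'acoustic model': hidden layers plus an output layer."""
    model = {}
    n_in = n_feats
    for i in range(n_hidden):
        model[f"hidden{i + 1}"] = init_layer(n_in, hidden_dim, rng)
        n_in = hidden_dim
    model["output"] = init_layer(n_in, n_outputs, rng)
    return model

def transfer_weights(source, target, n_transfer):
    """Copy the first n_transfer hidden layers from source into target.

    The output layer is never copied, because source and target are assumed
    to have different output units; it keeps its random initialization.
    """
    transferred = []
    for i in range(n_transfer):
        name = f"hidden{i + 1}"
        target[name] = [row[:] for row in source[name]]  # copy each row
        transferred.append(name)
    return transferred

rng = random.Random(0)
# Hypothetical sizes; the paper's actual dimensions are not stated here.
source_model = build_model(n_feats=40, hidden_dim=64, n_hidden=4, n_outputs=300, rng=rng)
target_model = build_model(n_feats=40, hidden_dim=64, n_hidden=4, n_outputs=200, rng=rng)

copied = transfer_weights(source_model, target_model, n_transfer=3)
print(copied)
```

Varying `n_transfer` in this sketch corresponds loosely to the question of how many hidden layers are worth transferring; after the copy, the target model would be fine-tuned on the low-resource data, a step omitted here.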

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61563040, 61773224); Natural Science Foundation of Inner Mongolia (Nos. 2018MS06006, 2016ZD06).

Author information

Corresponding author

Correspondence to Feilong Bao.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Shi, L., Bao, F., Wang, Y., Gao, G. (2019). Research on Khalkha Dialect Mongolian Speech Recognition Acoustic Model Based on Weight Transfer. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_47

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
