Abstract
Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are needed. Speech recognition research has recently improved substantially with the introduction of Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based model obtains a relative improvement of over 50%, making it a new state-of-the-art system in this field.
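The "relative improvement" figure can be read as a relative reduction in an error rate. The abstract does not name the metric, so the sketch below assumes an error rate such as word error rate; the function name and the two input values are hypothetical and serve only to illustrate the arithmetic, not to reproduce the reported results.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the relative error reduction of new_error versus baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates, chosen only to illustrate the calculation.
gmm_error = 0.40  # assumed baseline (GMM-based) error rate
dnn_error = 0.18  # assumed DNN-based error rate
print(f"Relative improvement: {relative_improvement(gmm_error, dnn_error):.1%}")
```

Under this reading, a relative improvement of over 50% means the DNN-based error rate is less than half the error rate of the best GMM-based model.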
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (No. 61263037), the Natural Science Foundation of Inner Mongolia (No. 2014BS0604), and the Program of High-Level Talents of Inner Mongolia University.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, H., Bao, F., Gao, G. (2015). Mongolian Speech Recognition Based on Deep Neural Networks. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_15
DOI: https://doi.org/10.1007/978-3-319-25816-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer Science, Computer Science (R0)