A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters

Yan, Xiaofei; Bao, Feilong; Wei, Hongxi; Su, Xiangdong

doi:10.1007/978-3-319-47674-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10035)

Abstract

In Mongolian, many distinct words, encoded with different codes, share the same presentation form. Because typists usually enter words according to how they look and sometimes cannot tell the codes apart, Mongolian corpora contain many coding errors, which make statistical analysis and retrieval on such corpora very difficult. To solve this problem, this paper proposes a method that merges words with the same presentation form by means of Intermediate characters and then uses the corpus in Intermediate-character form to build a Mongolian language model. Experimental results show that, compared with a model trained on the unprocessed corpus, the proposed method reduces the perplexity and the word error rate of the 3-gram language model by 41% and 30%, respectively. The proposed approach significantly improves the performance of the Mongolian language model and greatly enhances the accuracy of Mongolian speech recognition.
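The pipeline described in the abstract (collapse words that share a presentation form onto one Intermediate-character form, then train a 3-gram model and compare perplexity) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption: the letter mapping in the hypothetical `TO_INTERMEDIATE` table works on a toy Latin transliteration and is not the paper's Intermediate-character table; add-one smoothing is swapped in only to keep the sketch short and is not claimed to be the smoothing the authors used; the two-sentence corpus is invented.

```python
import math
from collections import Counter

# HYPOTHETICAL toy mapping on a Latin transliteration: letters assumed to share
# a presentation form are collapsed onto one "intermediate" symbol.
# This is NOT the paper's actual Intermediate-character table.
TO_INTERMEDIATE = str.maketrans({"u": "o", "ü": "ö", "d": "t"})


def to_intermediate(word: str) -> str:
    """Map a word to its (hypothetical) Intermediate-character form."""
    return word.translate(TO_INTERMEDIATE)


def trigrams(tokens):
    """Yield padded trigrams (w1, w2, w3) for one sentence."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    for i in range(2, len(padded)):
        yield padded[i - 2], padded[i - 1], padded[i]


class AddOneTrigramLM:
    """3-gram model with add-one (Laplace) smoothing, chosen for brevity only."""

    def __init__(self, sentences):
        self.tri = Counter()  # counts of (w1, w2, w3)
        self.ctx = Counter()  # counts of the context (w1, w2)
        self.vocab = {"</s>", "<unk>"}
        for sent in sentences:
            self.vocab.update(sent)
            for gram in trigrams(sent):
                self.tri[gram] += 1
                self.ctx[gram[:2]] += 1

    def prob(self, w1, w2, w3):
        # (c(w1 w2 w3) + 1) / (c(w1 w2) + |V|)
        return (self.tri[(w1, w2, w3)] + 1) / (self.ctx[(w1, w2)] + len(self.vocab))

    def perplexity(self, sentences):
        """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-2}, w_{i-1}))."""
        log_sum, n = 0.0, 0
        for sent in sentences:
            # Out-of-vocabulary words are mapped to <unk>.
            sent = [w if w in self.vocab else "<unk>" for w in sent]
            for w1, w2, w3 in trigrams(sent):
                log_sum += math.log(self.prob(w1, w2, w3))
                n += 1
        return math.exp(-log_sum / n)


# Invented toy corpus: "ulus" and "olus" stand for one word typed with two codes.
train_raw = [["ulus", "ger"], ["olus", "ger"]]
test_raw = [["ulus", "ger"]]
pp_raw = AddOneTrigramLM(train_raw).perplexity(test_raw)

train_norm = [[to_intermediate(w) for w in s] for s in train_raw]
test_norm = [[to_intermediate(w) for w in s] for s in test_raw]
pp_norm = AddOneTrigramLM(train_norm).perplexity(test_norm)

print(f"raw PP = {pp_raw:.3f}, intermediate PP = {pp_norm:.3f}")
```

Merging the two codings pools their counts and shrinks the vocabulary, so the held-out sentence receives higher probability and perplexity drops. The toy only mirrors the direction of the effect reported above, not its magnitude.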

Acknowledgements

This research was partially supported by the China National Natural Science Foundation (Nos. 61263037 and 61563040), the Inner Mongolia Natural Science Foundation (No. 2014BS0604), and the program of high-level talents of Inner Mongolia University.

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Xiaofei Yan, Feilong Bao, Hongxi Wei & Xiangdong Su

Corresponding author

Correspondence to Feilong Bao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Fudan University, Shanghai, China
Xuanjing Huang
Dalian University of Technology, Dalian, China
Hongfei Lin
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

About this paper

Cite this paper

Yan, X., Bao, F., Wei, H., Su, X. (2016). A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science (LNAI), vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_9

DOI: https://doi.org/10.1007/978-3-319-47674-2_9
Published: 10 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science (R0)
