A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2016, CCL 2016)

Abstract

In Mongolian, many words share the same presentation form but are in fact different words with different codes. Because typists usually enter words according to their presentation forms and sometimes cannot distinguish the codes, Mongolian corpora contain many coding errors, which makes statistics and retrieval on such corpora very difficult. To solve this problem, this paper proposes a method that merges words with the same presentation form by means of Intermediate characters and then uses the corpus in Intermediate-character form to build a Mongolian language model. Experimental results show that, compared with a model trained on the unprocessed corpus, the proposed method reduces the perplexity and the word error rate of the 3-gram language model by 41% and 30%, respectively. The proposed approach significantly improves the performance of the Mongolian language model and greatly enhances the accuracy of Mongolian speech recognition.
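The merging step described in the abstract can be illustrated with a minimal sketch. It assumes a toy confusion table in which Latin placeholder letters stand for differently coded Mongolian letters that share a presentation form; the table `INTERMEDIATE`, the helper names, and the sample tokens are all hypothetical and are not the mapping used in the paper.

```python
from collections import Counter

# Hypothetical confusion classes (NOT the paper's table): each key is a
# code-level letter, written as a Latin placeholder, and each value is the
# intermediate character that stands for its shared presentation form.
INTERMEDIATE = {"o": "O", "u": "O", "t": "T", "d": "T"}


def to_intermediate(word: str) -> str:
    """Replace each letter with its intermediate character; unmapped letters pass through."""
    return "".join(INTERMEDIATE.get(ch, ch) for ch in word)


def merge_corpus(tokens):
    """Convert a token sequence into intermediate-character form."""
    return [to_intermediate(tok) for tok in tokens]


# Toy corpus: "oda", "uda" and "uta" play the role of differently coded
# spellings that look identical on the page.
raw = ["oda", "uda", "uta", "bar", "oda"]
merged = merge_corpus(raw)

raw_counts = Counter(raw)        # counts are split across the coding variants
merged_counts = Counter(merged)  # variants collapse into a single entry

print(len(raw_counts), len(merged_counts))
```

In this toy setting the vocabulary shrinks and the counts of the merged form are pooled, which is the kind of sparsity reduction that would plausibly help an n-gram model; the actual gains reported above come from the paper's experiments, not from this sketch.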

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (Nos. 61263037 and 61563040), the Inner Mongolia Natural Science Foundation (No. 2014BS0604), and the High-Level Talents Program of Inner Mongolia University.

Author information

Corresponding author

Correspondence to Feilong Bao.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Yan, X., Bao, F., Wei, H., Su, X. (2016). A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_9

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
