Language Model for Mongolian Polyphone Proofreading

Lu, Min; Bao, Feilong; Gao, Guanglai

doi:10.1007/978-3-319-69005-6_38


Conference paper
First Online: 07 October 2017


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10565)

Abstract

Mongolian text proofreading is a particularly difficult task because of the language's unique polyphonic alphabet, its morphological ambiguity and its agglutinative nature. In addition, encoding errors are currently pervasive in electronic Mongolian corpora, which makes statistical and retrieval research on Mongolian very difficult to carry out. Several conventional approaches have been proposed to solve this problem, but they are limited in that they do not consider polyphone proofreading. In this paper, we address this problem by constructing a large-scale resource and applying an approach based on an n-gram language model. For ease of understanding, we also introduce the architecture of the entire proofreading system, since polyphone proofreading is an important component of it. Experimental results show that our method performs well: polyphone correction accuracy improves by 62% in relative terms, and overall system accuracy improves by 16.1% in relative terms.
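The abstract describes the approach only at a high level. As a rough illustration of how an n-gram language model can choose among polyphonic candidates, the sketch below scores every combination of candidate spellings in a sentence and keeps the most probable one. Everything in it is an assumption made for illustration, not the authors' implementation: the model is a bigram with add-k smoothing (the paper's n-gram order, smoothing method and candidate generation are not described here), the search is exhaustive, and the toy corpus uses Latin-transliterated tokens. The confusion group `["nom", "num"]` is a hypothetical example built on the assumption that o and u share one written form in traditional Mongolian script, so either coding renders identically on screen.

```python
import math
from collections import Counter
from itertools import product

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Bigram LM with add-k smoothing (a simplified stand-in for an n-gram model)."""

    def __init__(self, sentences, k=0.5):
        self.k = k
        self.history = Counter()  # how often each token appears as a left context
        self.bigrams = Counter()  # how often each adjacent token pair appears
        for sent in sentences:
            toks = [BOS] + sent + [EOS]
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        # Possible next tokens: every training word plus the end-of-sentence marker.
        self.vocab = {w for s in sentences for w in s} | {EOS}

    def logprob(self, prev, word):
        numer = self.bigrams[(prev, word)] + self.k
        denom = self.history[prev] + self.k * len(self.vocab)
        return math.log(numer / denom)

    def score(self, sent):
        """Log-probability of a whole sentence, including boundary markers."""
        toks = [BOS] + sent + [EOS]
        return sum(self.logprob(p, w) for p, w in zip(toks[:-1], toks[1:]))


def correct(tokens, confusion, lm):
    """Return the combination of polyphone candidates with the highest LM score.

    Tokens without an entry in `confusion` are kept unchanged. The search is
    exhaustive, which is fine for short sentences; a Viterbi or beam search
    would be needed for long ones.
    """
    options = [confusion.get(t, [t]) for t in tokens]
    return max((list(c) for c in product(*options)), key=lm.score)


# Toy corpus of Latin-transliterated tokens (hypothetical, illustrative only).
corpus = [
    ["bi", "nom", "ongshina"],
    ["chi", "nom", "ongshina"],
    ["bi", "gert", "baina"],
    ["ter", "num", "sum", "avna"],
]
# Hypothetical confusion group: codings assumed to share one written form.
groups = [["nom", "num"]]
confusion = {w: g for g in groups for w in g}

lm = BigramLM(corpus)
print(correct(["bi", "num", "ongshina"], confusion, lm))  # ['bi', 'nom', 'ongshina']
```

Scoring the whole sentence, rather than one word in isolation, lets context on both sides of the polyphonic word contribute: in the toy data, both the left neighbour `bi` and the right neighbour `ongshina` favour `nom`, while in `["ter", "nom", "sum", "avna"]` the context favours `num`.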



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61563040), the Inner Mongolia Natural Science Foundation Major Project (No. 2016ZD06) and the Inner Mongolia Natural Science Fund Project (No. 2017BS0601).

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Min Lu, Feilong Bao & Guanglai Gao


Corresponding author

Correspondence to Feilong Bao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Peking University, Beijing, China
Baobao Chang
Soochow University, Suzhou, China
Deyi Xiong


About this paper

Cite this paper

Lu, M., Bao, F., Gao, G. (2017). Language Model for Mongolian Polyphone Proofreading. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_38


DOI: https://doi.org/10.1007/978-3-319-69005-6_38
Published: 07 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69004-9
Online ISBN: 978-3-319-69005-6
eBook Packages: Computer Science, Computer Science (R0)
