Abstract
Part-of-speech (POS) tagging is a fundamental task in natural language processing that determines the quality of subsequent processing, and the ambiguity of multi-category words directly affects the accuracy of Vietnamese POS tagging. POS tagging for English and Chinese has already achieved good results, but the accuracy of Vietnamese POS tagging still needs improvement. To address this problem, this paper proposes a novel method of Vietnamese POS tagging based on a multi-category word disambiguation model and a part-of-speech dictionary. A multi-category word dictionary and a non-multi-category word dictionary are generated from a Vietnamese dictionary and used to build a POS tagging corpus. From this corpus, 396,946 multi-category words are extracted, and a maximum entropy disambiguation model for Vietnamese parts of speech is constructed using statistical methods. Based on this model, the multi-category words and the non-multi-category words are tagged. Experimental results show that the proposed method outperforms the existing model, which indicates that it is feasible and effective.
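The pipeline summarized above (split the dictionary, look up unambiguous words directly, disambiguate the rest with a maximum entropy model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy dictionary entries, the tag labels, the context features (current, previous, and next word), and all function names are assumptions made for the example, and the maximum entropy classifier is a hand-rolled multinomial logistic regression trained by stochastic gradient ascent.

```python
import math
from collections import defaultdict

# Hypothetical toy POS dictionary: word -> set of possible tags (labels assumed).
POS_DICT = {
    "tôi": {"P"},        # pronoun
    "bóng": {"N"},       # noun
    "hòn": {"Nc"},       # classifier
    "đá": {"N", "V"},    # multi-category: "stone" (N) or "kick" (V)
}

def split_dictionary(pos_dict):
    """Split into multi-category (ambiguous) and non-multi-category entries."""
    multi = {w: t for w, t in pos_dict.items() if len(t) > 1}
    single = {w: next(iter(t)) for w, t in pos_dict.items() if len(t) == 1}
    return multi, single

def features(words, i):
    """Assumed context features: the word itself and its two neighbours."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]

def predict_proba(weights, feats, tags):
    """Softmax over summed feature weights, i.e. the maxent distribution."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train_maxent(samples, tags, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)  # (feature, tag) -> weight
    for _ in range(epochs):
        for feats, gold in samples:
            probs = predict_proba(weights, feats, tags)
            for t in tags:
                grad = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    weights[(f, t)] += lr * grad
    return weights

def tag_sentence(words, multi, single, weights):
    """Dictionary lookup for unambiguous words, maxent for ambiguous ones."""
    tagged = []
    for i, w in enumerate(words):
        if w in single:
            tagged.append(single[w])
        elif w in multi:
            candidates = sorted(multi[w])
            probs = predict_proba(weights, features(words, i), candidates)
            tagged.append(max(candidates, key=probs.get))
        else:
            tagged.append("UNK")  # out-of-dictionary word, not handled here
    return tagged

MULTI, SINGLE = split_dictionary(POS_DICT)

# Two hand-labelled toy contexts for the ambiguous word (assumed data).
TRAIN = [
    (features(["tôi", "đá", "bóng"], 1), "V"),
    (features(["hòn", "đá"], 1), "N"),
]
WEIGHTS = train_maxent(TRAIN, ["N", "V"])

if __name__ == "__main__":
    print(tag_sentence(["tôi", "đá", "bóng"], MULTI, SINGLE, WEIGHTS))
    print(tag_sentence(["hòn", "đá"], MULTI, SINGLE, WEIGHTS))
```

Restricting the classifier to the tags that the dictionary lists for each ambiguous word keeps the decision space small, which is one plausible reason to combine a dictionary with a statistical model. The paper's actual feature set and training procedure may differ.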
Acknowledgment
This work was supported in part by the key project of National Natural Science Foundation of China (Grant No. 61732005) and the National Natural Science Foundation of China (Grant Nos. 61262041, 61562052 and 61472168).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Chen, Z. et al. (2018). Vietnamese Part of Speech Tagging Based on Multi-category Words Disambiguation Model. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_23
DOI: https://doi.org/10.1007/978-3-319-73618-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)