Vietnamese Part of Speech Tagging Based on Multi-category Words Disambiguation Model

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

POS tagging is a fundamental task in natural language processing that determines the quality of subsequent processing, and the ambiguity of multi-category words directly affects the accuracy of Vietnamese POS tagging. POS tagging for English and Chinese already achieves good results, but the accuracy of Vietnamese POS tagging still needs improvement. To address this problem, this paper proposes a novel method for Vietnamese POS tagging based on a multi-category word disambiguation model and a part-of-speech dictionary. A multi-category word dictionary and a non-multi-category word dictionary are generated from a Vietnamese dictionary and used to build a POS tagging corpus. From this corpus, 396,946 multi-category words are extracted, and a maximum entropy disambiguation model for Vietnamese parts of speech is constructed using statistical methods; the multi-category words and the non-multi-category words are then tagged on this basis. Experimental results show that the proposed method achieves higher accuracy than existing models, which indicates that it is feasible and effective.
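
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the tag labels (`P`, `V`, `N`, `Nc`), the context feature templates, the `UNK` fallback and the gradient-ascent training loop are all assumptions made for illustration. It shows the general idea only: split a POS dictionary into multi-category and non-multi-category words, train a small maximum entropy (multinomial logistic) model on the ambiguous words, and tag unambiguous words by dictionary lookup.

```python
import math
from collections import defaultdict

# Toy POS dictionary (hypothetical entries): word -> set of admissible tags.
POS_DICT = {
    "tôi": {"P"}, "cái": {"Nc"}, "việc": {"N"}, "mua": {"V"},
    "bàn": {"N", "V"},  # multi-category: "table" (N) or "to discuss" (V)
}

def split_dictionary(pos_dict):
    """Split into multi-category and non-multi-category dictionaries."""
    multi = {w: t for w, t in pos_dict.items() if len(t) > 1}
    single = {w: next(iter(t)) for w, t in pos_dict.items() if len(t) == 1}
    return multi, single

def features(words, i):
    """Assumed feature templates: current word and its two neighbours."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]

def predict_proba(weights, feats, tags):
    """p(tag | feats) proportional to exp(sum of active feature weights)."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train_maxent(samples, tags, epochs=200, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in samples:
            probs = predict_proba(weights, feats, tags)
            for t in tags:
                grad = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    weights[(f, t)] += lr * grad
    return weights

def tag(words, multi, single, weights):
    """Dictionary lookup for unambiguous words, the model for ambiguous ones."""
    out = []
    for i, w in enumerate(words):
        if w in single:
            out.append(single[w])
        elif w in multi:
            cands = sorted(multi[w])
            probs = predict_proba(weights, features(words, i), cands)
            out.append(max(cands, key=probs.get))
        else:
            out.append("UNK")  # hypothetical fallback for unknown words
    return out

multi, single = split_dictionary(POS_DICT)
# Two hand-tagged toy occurrences of the ambiguous word (sentence, index, tag).
train = [(["tôi", "mua", "cái", "bàn"], 3, "N"), (["tôi", "bàn", "việc"], 1, "V")]
samples = [(features(ws, i), gold) for ws, i, gold in train]
weights = train_maxent(samples, ["N", "V"])
print(tag(["tôi", "bàn", "việc"], multi, single, weights))
```

Restricting the model's candidates to the tags the dictionary allows for each word keeps the disambiguation step small; a realistic system would need a much richer feature set and a properly annotated corpus.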

Acknowledgment

This work was supported in part by the key project of National Natural Science Foundation of China (Grant No. 61732005) and the National Natural Science Foundation of China (Grant Nos. 61262041, 61562052 and 61472168).

Author information

Corresponding author

Correspondence to Guo Jianyi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Chen, Z. et al. (2018). Vietnamese Part of Speech Tagging Based on Multi-category Words Disambiguation Model. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_23

  • DOI: https://doi.org/10.1007/978-3-319-73618-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer Science, Computer Science (R0)
