Abstract
This paper addresses the problem of Vietnamese named entity recognition and classification (VNER) using a bootstrapping algorithm and a rule-based model. The rule-based model relies on contextual rules to provide contextual evidence that a Vietnamese named entity (VNE) belongs to a category. These rules, which exploit the linguistic constraints of a category, are constructed using the bootstrapping algorithm. The bootstrapping algorithm starts with a handful of seed VNEs of a given category and accumulates all contextual rules found around these seeds in a large corpus. These rules are then ranked and used to find new VNEs.
Our experimental corpus is built from about 250,034 online news articles and over 9,000 literary works. Our VNER system covers 27 categories and more than 300,000 recognized and categorized VNEs. The accuracy of the recognition and classification algorithm is about 95%.
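The bootstrapping loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule form (the immediate left or right neighbouring token), the precision-based ranking, the function names `contexts` and `bootstrap`, the parameters `rounds`, `top_rules`, `min_hits` and `min_score`, and the toy corpus are all assumptions made for illustration. The sketch also assumes the text is already word-segmented, so each candidate entity is a single token, and it handles one category at a time.

```python
from collections import defaultdict


def contexts(sentence, i):
    """Return the hypothetical contextual rules around position i."""
    rules = []
    if i > 0:
        rules.append(("L", sentence[i - 1]))   # left-neighbour rule
    if i + 1 < len(sentence):
        rules.append(("R", sentence[i + 1]))   # right-neighbour rule
    return rules


def bootstrap(corpus, seeds, rounds=3, top_rules=2, min_hits=2, min_score=0.5):
    """Grow a set of entities for one category from a few seeds."""
    entities = set(seeds)
    for _ in range(rounds):
        # 1. Accumulate rules: how often each rule fires overall,
        #    and how often it fires around an already known entity.
        hits = defaultdict(int)
        total = defaultdict(int)
        for sent in corpus:
            for i, tok in enumerate(sent):
                for rule in contexts(sent, i):
                    total[rule] += 1
                    if tok in entities:
                        hits[rule] += 1

        # 2. Rank rules by precision on known entities (ties: more hits first)
        #    and keep only the best few (assumed scoring, not the paper's).
        candidates = [
            r for r in hits
            if hits[r] >= min_hits and hits[r] / total[r] >= min_score
        ]
        candidates.sort(key=lambda r: (hits[r] / total[r], hits[r]), reverse=True)
        accepted = set(candidates[:top_rules])

        # 3. Apply the accepted rules to propose new entities.
        new = set()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok not in entities and any(
                    r in accepted for r in contexts(sent, i)
                ):
                    new.add(tok)
        if not new:
            break  # nothing new was found, so further rounds cannot help
        entities |= new
    return entities


# Toy, pre-segmented corpus (illustrative data only).
corpus = [
    ["thành_phố", "Hà_Nội", "đông", "dân"],
    ["thành_phố", "Đà_Nẵng", "rất", "đẹp"],
    ["thành_phố", "Huế", "rất", "cổ"],
    ["tôi", "ăn", "phở", "ngon"],
]
seeds = {corpus[0][1], corpus[1][1]}  # two seed city names
found = bootstrap(corpus, seeds)
```

In this toy run, the left-context rule "thành_phố" (city) is learned from the two seeds and then picks up the third city name, while unrelated tokens such as "phở" are left out. A realistic system would need multi-token entities, richer rule templates, and a more careful ranking function to limit semantic drift.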
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Le Trung, H., Le Anh, V., Le Trung, K. (2014). Bootstrapping and Rule-Based Model for Recognizing Vietnamese Named Entity. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8398. Springer, Cham. https://doi.org/10.1007/978-3-319-05458-2_18
DOI: https://doi.org/10.1007/978-3-319-05458-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05457-5
Online ISBN: 978-3-319-05458-2
eBook Packages: Computer Science, Computer Science (R0)