Abstract
This paper presents a Named Entity Recognition (NER) system for classic Chinese novels of the Ming and Qing dynasties based on Conditional Random Fields (CRFs). An annotated corpus of four influential vernacular novels produced during this period serves as both training and testing data. In the experiment, three novels are used for training and the remaining novel for testing. Three feature sets are proposed for the CRF model: (1) a baseline set consisting of word/POS features and bigrams over different window sizes, (2) dependency heads and dependency relations, and (3) Wikipedia categories. The F-measures for the four books range from 67% to 80%. Experiments show that adding dependency heads and relations, as well as Wikipedia categories, improves the performance of the NER system, and that the third feature set yields a greater improvement than the second.
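To make the three feature sets concrete, the following is a minimal sketch of how per-token features of this kind might be assembled before being passed to a CRF toolkit. It is not the authors' implementation: the token fields (`word`, `pos`, `head`, `rel`), the feature names, the padding symbol, the toy sentence, and the category lookup table are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Token = Dict[str, Any]  # hypothetical record: word, pos, head index, relation


def token_features(sent: List[Token], i: int, window: int = 1,
                   categories: Optional[Dict[str, List[str]]] = None) -> Dict[str, Any]:
    """Build a feature dict for token i, one group per feature set."""
    feats: Dict[str, Any] = {}
    n = len(sent)

    # (1) Baseline: word and POS for every position in the window.
    for off in range(-window, window + 1):
        j = i + off
        inside = 0 <= j < n
        feats[f"w[{off}]"] = sent[j]["word"] if inside else "<PAD>"
        feats[f"pos[{off}]"] = sent[j]["pos"] if inside else "<PAD>"

    # Baseline: word bigrams over adjacent positions in the window.
    for off in range(-window, window):
        left, right = feats[f"w[{off}]"], feats[f"w[{off + 1}]"]
        feats[f"w[{off}]|w[{off + 1}]"] = left + "|" + right

    # (2) Dependency head word and dependency relation (head == -1 means root).
    head = sent[i]["head"]
    feats["dep_head"] = sent[head]["word"] if head >= 0 else "<ROOT>"
    feats["dep_rel"] = sent[i]["rel"]

    # (3) Wikipedia categories, looked up in an assumed word-to-categories table.
    for cat in (categories or {}).get(sent[i]["word"], []):
        feats[f"wiki_cat={cat}"] = True

    return feats


# Toy sentence with made-up annotations, for illustration only.
sentence = [
    {"word": "宝玉", "pos": "nr", "head": 1, "rel": "SBV"},
    {"word": "见", "pos": "v", "head": -1, "rel": "HED"},
    {"word": "黛玉", "pos": "nr", "head": 1, "rel": "VOB"},
]
wiki_cats = {"宝玉": ["fictional characters"]}  # hypothetical category entry

feats = token_features(sentence, 0, window=1, categories=wiki_cats)
```

Keeping each feature set in its own block makes it straightforward to switch the second or third set off and compare against the baseline, which is the kind of comparison the abstract reports.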
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Long, Y., Xiong, D., Lu, Q., Li, M., Huang, CR. (2016). Named Entity Recognition for Chinese Novels in the Ming-Qing Dynasties. In: Dong, M., Lin, J., Tang, X. (eds) Chinese Lexical Semantics. CLSW 2016. Lecture Notes in Computer Science, vol 10085. Springer, Cham. https://doi.org/10.1007/978-3-319-49508-8_34
DOI: https://doi.org/10.1007/978-3-319-49508-8_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49507-1
Online ISBN: 978-3-319-49508-8
eBook Packages: Computer Science (R0)