Named Entity Recognition for Chinese Novels in the Ming-Qing Dynasties

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10085)

Abstract

This paper presents a Named Entity Recognition (NER) system for classic Chinese novels of the Ming and Qing dynasties, built on Conditional Random Fields (CRFs). An annotated corpus of four influential vernacular novels from this period serves as both training and testing data. In the experiment, three novels are used for training and one for testing. Three feature sets are proposed for the CRF model: (1) a baseline set of word/POS and bigram features over different window sizes, (2) dependency head and dependency relation, and (3) Wikipedia categories. The F-measures for the four books range from 67% to 80%. Experiments show that both the dependency features and the Wikipedia categories improve NER performance, with the Wikipedia categories yielding the larger improvement.
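The three feature sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token fields (`word`, `pos`, `head`, `deprel`, `wiki_cats`), the feature names, and the default window size are all assumptions made for illustration. The `leave_one_novel_out` helper likewise assumes that each novel is held out in turn, which the abstract suggests by reporting F-measures for all four books but does not state outright.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

# Hypothetical token record: keys are illustrative, not taken from the paper.
Token = Dict[str, Any]


def token_features(sent: Sequence[Token], i: int, window: int = 2) -> Dict[str, Any]:
    """Build a per-token feature dict covering the three feature sets."""
    feats: Dict[str, Any] = {"bias": 1.0}

    # (1) Baseline: word and POS of every token inside the +/-window context.
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(sent):
            feats[f"w[{off}]"] = sent[j]["word"]
            feats[f"pos[{off}]"] = sent[j]["pos"]
        else:
            feats[f"w[{off}]"] = "<PAD>"  # position falls outside the sentence

    # (1, continued) Word bigrams over adjacent positions inside the window.
    for off in range(-window, window):
        a, b = i + off, i + off + 1
        if 0 <= a and b < len(sent):
            feats[f"w[{off}]|w[{off + 1}]"] = f"{sent[a]['word']}|{sent[b]['word']}"

    # (2) Dependency head word and dependency relation of the current token.
    head = sent[i].get("head")  # index of the head token, or None for the root
    feats["dep_rel"] = sent[i].get("deprel", "NONE")
    feats["head_word"] = sent[head]["word"] if head is not None else "<ROOT>"

    # (3) Wikipedia categories attached to the token, one binary feature each.
    for cat in sent[i].get("wiki_cats", ()):
        feats[f"wiki={cat}"] = True

    return feats


def leave_one_novel_out(
    corpus: Dict[str, List[Any]]
) -> Iterator[Tuple[str, List[Any], List[Any]]]:
    """Yield (held_out_name, train_sentences, test_sentences) for each novel."""
    for held_out in corpus:
        train = [s for name, sents in corpus.items() if name != held_out for s in sents]
        yield held_out, train, corpus[held_out]
```

Each feature dict describes one token; a sequence of such dicts per sentence is the kind of input that CRF toolkits accepting dictionary features expect. Enabling or omitting blocks (2) and (3) gives the three configurations compared in the abstract.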

Author information

Corresponding author

Correspondence to Yunfei Long.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Long, Y., Xiong, D., Lu, Q., Li, M., Huang, C.-R. (2016). Named Entity Recognition for Chinese Novels in the Ming-Qing Dynasties. In: Dong, M., Lin, J., Tang, X. (eds) Chinese Lexical Semantics. CLSW 2016. Lecture Notes in Computer Science, vol 10085. Springer, Cham. https://doi.org/10.1007/978-3-319-49508-8_34

  • DOI: https://doi.org/10.1007/978-3-319-49508-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49507-1

  • Online ISBN: 978-3-319-49508-8

  • eBook Packages: Computer Science, Computer Science (R0)
