Applying Machine Learning to Chinese Entity Detection and Tracking

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity is defined here as the chain of all its mentions in a text, together with the associated attributes. Entity mentions of different types normally exhibit quite different linguistic patterns. Six separate Conditional Random Fields (CRF) models, which incorporate character N-gram and word knowledge features, are built to detect the extent and the head of three types of mentions: named, nominal and pronominal. For each mention type, attributes are identified by Support Vector Machine (SVM) classifiers that take mention heads and their context as classification features. Mentions are then merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
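
As a rough illustration of the character-based sequence labelling described above, the following sketch builds character N-gram features for one position and converts per-character tags into mention spans. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the feature templates, window size or tag scheme, so the function names `char_ngram_features` and `decode_bio`, the `<PAD>` symbol, the window size and the B/I/O tags are all hypothetical.

```python
from typing import Dict, List, Tuple


def char_ngram_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Character unigram and bigram features around position i.

    Hypothetical template: the paper's actual templates are not given in the abstract.
    """
    n = len(chars)

    def at(j: int) -> str:
        # Positions outside the sentence map to a padding symbol.
        return chars[j] if 0 <= j < n else "<PAD>"

    feats: Dict[str, str] = {}
    # Unigrams: one feature per offset in [-window, window].
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    # Bigrams: adjacent character pairs inside the same window.
    for off in range(-window, window):
        feats[f"c[{off}]c[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


def decode_bio(tags: List[str]) -> List[Tuple[int, int]]:
    """Turn per-character B/I/O tags into half-open (start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # close the previous span
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I with no preceding B
                start = i
        else:  # "O" or anything else ends the current span
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    chars = list("北京大学的学生")
    print(char_ngram_features(chars, 0, window=1))
    # Tags as a sequence model might emit them for a named mention.
    print(decode_bio(["B", "I", "I", "I", "O", "O", "O"]))
```

In a setup like the one the abstract outlines, feature dictionaries of this kind would be produced for every character and passed to a CRF toolkit, with one model per combination of mention type and extent or head; which toolkit the authors used is not stated here.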

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qian, D., Li, W., Yuan, C., Lu, Q., Wu, M. (2007). Applying Machine Learning to Chinese Entity Detection and Tracking. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_14

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science (R0)
