
Linguistic Theory Based Contextual Evidence Mining for Statistical Chinese Co-Reference Resolution

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Within a statistical learning framework, this paper examines how traditional linguistic findings on anaphora resolution can guide the mining and organization of contextual features for Chinese co-reference resolution. The main contributions are as follows. (1) To simulate the “syntactic and semantic parallelism” factor, we extract “bags of word form and POS” features and “bag of semes” features from the contexts of the entity mentions and add them to the baseline feature set. (2) Because bags of word forms, POS tags, and semes are too coarse to determine the syntactic and semantic parallelism between two entity mentions, we propose a method for reconstructing contextual features based on semantic similarity computation, so that the reconstructed features better approximate the “syntactic and semantic parallelism preferences” factor in anaphora resolution. (3) We replace the isolated-word-based contextual feature representation with an entity-mention-based one and also enlarge the contextual windows, in order to approximate the “selectional restriction” factor in anaphora resolution. Experiments show that the multi-level contextual features are useful for co-reference resolution, and that the statistical system incorporating these features performs well on the standard ACE datasets.
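
To make the idea of contextual parallelism features more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tokenized, POS-tagged sentence, builds a bag of (word, POS) pairs from a fixed window around each mention, and compares two mentions with an exact overlap score and with a similarity-based score. The function names, the window size, the English toy sentence, and `toy_sim` (a stand-in for a real lexical similarity measure such as one computed from HowNet) are all assumptions made for illustration.

```python
from collections import Counter

def context_bag(tokens, start, end, window=2):
    """Bag of (word, POS) pairs within `window` tokens on each side of tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return Counter(left + right)

def bag_overlap(bag_a, bag_b):
    """Exact-match parallelism: multiset intersection size over multiset union size."""
    inter = sum((bag_a & bag_b).values())
    union = sum((bag_a | bag_b).values())
    return inter / union if union else 0.0

def soft_overlap(bag_a, bag_b, sim):
    """Similarity-based variant: match each item of bag_a to its most similar item in bag_b."""
    if not bag_a or not bag_b:
        return 0.0
    total = sum(count * max(sim(x, y) for y in bag_b) for x, count in bag_a.items())
    return total / sum(bag_a.values())

def toy_sim(x, y):
    """Hypothetical similarity: 1.0 for identical pairs, 0.5 for the same POS tag, else 0.0."""
    if x == y:
        return 1.0
    return 0.5 if x[1] == y[1] else 0.0

# Toy tagged sentence (an assumption for illustration; the paper works on Chinese text).
tokens = [("the", "DT"), ("mayor", "NN"), ("visited", "VBD"), ("the", "DT"),
          ("school", "NN"), ("and", "CC"), ("he", "PRP"), ("visited", "VBD"),
          ("the", "DT"), ("library", "NN")]

bag_a = context_bag(tokens, 0, 2)  # context of the mention "the mayor"
bag_b = context_bag(tokens, 6, 7)  # context of the mention "he"

exact_score = bag_overlap(bag_a, bag_b)
soft_score = soft_overlap(bag_a, bag_b, toy_sim)
print(exact_score, soft_score)
```

In a pairwise statistical classifier, scores like these would be appended to the baseline feature vector for each candidate mention pair. The similarity-based score is asymmetric by design in this sketch, so a real system would need to choose a direction or combine both.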


Author information

Corresponding author

Correspondence to Jun Zhao.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 60372016, 60121302, 60673042, the National High Technology Development 863 Program of China under Grant No. 2006AA01Z144, and the Natural Science Foundation of Beijing under Grant No. 4052027.

About this article

Cite this article

Zhao, J., Liu, FF. Linguistic Theory Based Contextual Evidence Mining for Statistical Chinese Co-Reference Resolution. J Comput Sci Technol 22, 608–617 (2007). https://doi.org/10.1007/s11390-007-9076-9

  • DOI: https://doi.org/10.1007/s11390-007-9076-9
