Abstract
Within a statistical learning framework, this paper examines how traditional linguistic findings on anaphora resolution can guide the mining and organization of contextual features for Chinese co-reference resolution. The main contributions are as follows. (1) To simulate the “syntactic and semantic parallelism” factor, we extract a “bag of word forms and POS tags” feature and a “bag of semes” feature from the contexts of the entity mentions and add them to the baseline feature set. (2) Because bags of word forms, POS tags, and semes are too coarse to determine the syntactic and semantic parallelism between two entity mentions, we propose a method for reconstructing contextual features based on semantic similarity computation, so that the reconstructed features better approximate the anaphora resolution factor of “syntactic and semantic parallelism preferences”. (3) We replace the isolated-word-based contextual feature representation with an entity-mention-based one and enlarge the contextual windows, in order to approximate the “selectional restriction” factor for anaphora resolution. Experiments show that the multi-level contextual features are useful for co-reference resolution and that the statistical system incorporating them performs well on the standard ACE datasets.
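The difference between points (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`context_bag`, `exact_overlap`, `soft_overlap`, `toy_sim`), the window size, the Jaccard measure, the best-match averaging, and the hand-written similarity scores are all assumptions made for illustration, and the example sentence is in English only for readability. The idea shown is that exact bag matching misses related context words, while a similarity-based comparison gives them partial credit.

```python
from typing import Callable, List, Set, Tuple

Token = Tuple[str, str]  # (word form, POS tag)


def context_tokens(tokens: List[Token], start: int, end: int, window: int) -> List[Token]:
    """Tokens within `window` positions on either side of the mention tokens[start:end]."""
    return tokens[max(0, start - window):start] + tokens[end:end + window]


def context_bag(tokens: List[Token], start: int, end: int, window: int = 2) -> Set[Tuple[str, str]]:
    """Bag of word forms and POS tags drawn from the mention's context window."""
    bag: Set[Tuple[str, str]] = set()
    for word, pos in context_tokens(tokens, start, end, window):
        bag.add(("w", word))
        bag.add(("p", pos))
    return bag


def exact_overlap(bag_a: Set, bag_b: Set) -> float:
    """Jaccard overlap of two bags (an assumed measure): only identical items count."""
    union = bag_a | bag_b
    return len(bag_a & bag_b) / len(union) if union else 0.0


def soft_overlap(words_a: List[str], words_b: List[str],
                 sim: Callable[[str, str], float]) -> float:
    """Average, over context words of A, of the best similarity to any context word of B."""
    if not words_a or not words_b:
        return 0.0
    return sum(max(sim(a, b) for b in words_b) for a in words_a) / len(words_a)


# Hypothetical similarity scores; a real system would query a lexical resource.
TOY_SCORES = {
    frozenset({"visited", "toured"}): 0.8,
    frozenset({"Beijing", "Shanghai"}): 0.7,
}


def toy_sim(a: str, b: str) -> float:
    """Toy word similarity: 1.0 for identical words, a table lookup otherwise."""
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)


sentence: List[Token] = [
    ("the", "DT"), ("president", "NN"), ("visited", "VBD"), ("Beijing", "NNP"),
    ("yesterday", "NN"), ("and", "CC"), ("he", "PRP"), ("toured", "VBD"),
    ("Shanghai", "NNP"), ("today", "NN"),
]

# Mention A is "president" (index 1); mention B is "he" (index 6).
bag_a = context_bag(sentence, 1, 2)
bag_b = context_bag(sentence, 6, 7)
words_a = [w for w, _ in context_tokens(sentence, 1, 2, 2)]
words_b = [w for w, _ in context_tokens(sentence, 6, 7, 2)]

exact_score = exact_overlap(bag_a, bag_b)
soft_score = soft_overlap(words_a, words_b, toy_sim)
print(f"exact overlap: {exact_score:.3f}, soft overlap: {soft_score:.3f}")
```

In this toy case the two contexts share no word forms, so the exact match is driven only by the shared POS tags, whereas the similarity-based score also reflects that "visited"/"toured" and "Beijing"/"Shanghai" are related.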
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 60372016, 60121302, 60673042, the National High Technology Development 863 Program of China under Grant No. 2006AA01Z144, and the Natural Science Foundation of Beijing under Grant No. 4052027.
About this article
Cite this article
Zhao, J., Liu, FF. Linguistic Theory Based Contextual Evidence Mining for Statistical Chinese Co-Reference Resolution. J Comput Sci Technol 22, 608–617 (2007). https://doi.org/10.1007/s11390-007-9076-9
DOI: https://doi.org/10.1007/s11390-007-9076-9