Evaluation Campaigns

Anaphora Resolution

Abstract

In this chapter, we give an overview of the major evaluation campaigns (shared tasks) for coreference resolution. In these campaigns, all participants receive the same datasets and annotations and are evaluated on the same test set with the same scoring software, which makes it possible to compare the participating systems. More specifically, we cover the Message Understanding Conference (MUC), the Automatic Content Extraction (ACE) program, SemEval-2010 Task 1, the i2b2-2011 shared task, and the CoNLL-2011 and CoNLL-2012 shared tasks. We discuss the critical issues behind the practice of coreference resolution evaluation, such as the range of mentions defined in the annotation guidelines, the use of gold vs. predicted mentions, the layers of preprocessing information that are provided, and the multiple coreference evaluation measures.
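The evaluation measures mentioned above differ mainly in what they count. As a rough illustration, the following minimal Python sketch implements two of them: the link-based MUC score of Vilain et al. [97], and BLANC [76] in the link formulation that Luo et al. [64] extended to predicted mentions. Representing a partition as a list of Python sets of mention IDs, and every function name here, is our own assumption; this is not the official reference scorer [71], and BLANC's special cases (e.g., a partition with no coreference links at all) are not handled.

```python
from itertools import combinations

# Hypothetical representation: a partition is a list of disjoint, non-empty sets
# of hashable, sortable mention IDs (e.g. strings such as "doc1:3-5").


def _f1(recall, precision):
    """Harmonic mean of recall and precision; 0.0 when both are 0."""
    return 2 * recall * precision / (recall + precision) if recall + precision else 0.0


def muc(key, response):
    """MUC: fraction of the links needed to build each cluster that are recovered."""

    def recovered(gold, pred):
        owner = {m: i for i, cluster in enumerate(pred) for m in cluster}
        found = needed = 0
        for cluster in (c for c in gold if c):
            # Pieces into which `pred` splits this cluster; a mention absent
            # from `pred` counts as a piece of its own.
            pieces = {owner.get(m, ("missing", m)) for m in cluster}
            found += len(cluster) - len(pieces)
            needed += len(cluster) - 1
        return found / needed if needed else 0.0

    recall = recovered(key, response)      # links of the key found in the response
    precision = recovered(response, key)   # same computation with roles swapped
    return recall, precision, _f1(recall, precision)


def _links(partition):
    """Split all mention pairs of ONE partition into coreference / non-coreference links."""
    owner = {m: i for i, cluster in enumerate(partition) for m in cluster}
    coref, non_coref = set(), set()
    for a, b in combinations(sorted(owner), 2):
        (coref if owner[a] == owner[b] else non_coref).add((a, b))
    return coref, non_coref


def blanc(key, response):
    """BLANC: mean of the F-scores on coreference and on non-coreference links.

    Links are computed over each partition's own mentions, so key and response
    need not contain the same mentions.
    """
    key_c, key_n = _links(key)
    resp_c, resp_n = _links(response)
    f_scores = []
    for k, r in ((key_c, resp_c), (key_n, resp_n)):
        common = len(k & r)
        recall = common / len(k) if k else 0.0
        precision = common / len(r) if r else 0.0
        f_scores.append(_f1(recall, precision))
    return sum(f_scores) / 2


if __name__ == "__main__":
    key = [{"a", "b", "c", "d"}]
    response = [{"a", "b"}, {"c", "d"}]
    print(muc(key, response))  # recall 2/3, precision 1.0
    print(blanc(key, response))
```

In the example, the response splits one gold chain into two, so MUC finds two of the three links needed to rebuild it (recall 2/3) while every response link is correct (precision 1.0). MUC gives no credit for singletons, which is one reason the treatment of singletons matters when comparing campaigns.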

Notes

  1.

    The first ACE editions focused on five classes, and later editions added vehicles and weapons.

  2.

    Although OntoNotes was not originally annotated with singletons, they were identified heuristically and added to the dataset used in SemEval so as to make the different datasets as similar as possible. A few non-referential NPs that could not be automatically detected (e.g., expletive pronouns) were unavoidably annotated as singletons in this process. In the Dutch dataset, only singletons for named entities are annotated.

  3.

    Some of the original corpora from which the SemEval datasets were extracted contain coreference annotations for non-NP mentions, but verbal mentions were removed to keep the evaluation campaign simpler.

  4.

    Even though the gold scenario at the CoNLL-2011 and 2012 evaluations provided coreferent mentions only, not all participants exploited this hint to link every given mention: some left mentions unlinked and thus did not achieve 100 % recall for mention detection.

  5.

    ACE also had “diagnostic” tasks where gold mentions were provided.

  6.

    To summarize some of the variations that have been proposed:

    • Bengtson and Roth [8] discard the predicted mentions that have no counterpart in the gold.

    • Stoyanov et al. [92] use B³_all, which retains all predicted mentions, and B³_0, which discards all predicted mentions with no counterpart in the gold.

    • Rahman and Ng [74] only discard the predicted mentions that have no counterpart in the gold and that are singletons.

    • Cai and Strube [15] adjust a system output in three ways: gold mentions with no system counterpart are added as predicted singleton mentions, predicted singleton mentions with no counterpart are removed, and to compute precision, predicted coreferent mentions with no gold counterpart are added as gold singleton mentions.

  7.

    This assumption used to hold for BLANC [76], but no longer does since Luo et al.'s extension [64].

  8.

    Word senses in OntoNotes have a direct one-to-many mapping to WordNet senses.

  9.

    http://conll.github.io/reference-coreference-scorers/

  10.

    The scorer used at SemEval-2010 was not the same version as the one used at the CoNLL-2011 and CoNLL-2012 shared tasks, as the latter incorporated a (buggy) implementation of Cai and Strube's [15] variations.

  11.

    Nominal predicates and appositive phrases fell under the Identity type in the MUC annotation scheme.

  12.

    http://www.itl.nist.gov/iad/mig/tests/ace/

  13.

    http://projects.ldc.upenn.edu/ace/data/

  14.

    The seven ACE entity types are: person (e.g., the President of the U.S.), organization (e.g., University of Tennessee), geopolitical entity (e.g., the people of France), location (e.g., Germany), facility (e.g., the oil refinery), vehicle (e.g., the train), and weapon (e.g., knife).

  15.

    In Chinese, the word count is approximated by multiplying the number of characters by 1.5.

  16.

    We ran the scorer using the head-word relaxed flag, as the original SemEval task did.

  17.

    https://www.i2b2.org/NLP/

  18.

    The corpus guidelines are given as an appendix to the main JAMIA publication [96].

  19.

    http://conll.github.io/reference-coreference-scorers

References

  1. Anick, P., Hong, P., Xue, N., et al.: Coreference resolution for electronic medical records. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  2. Appelt, D.E., Hobbs, J.R., Bear, J., Israel, D., Kameyama, M., Kehler, A., Martin, D., Myers, K., Tyson, M.: SRI International FASTUS system MUC-6 test results and analysis. In: Proceedings of MUC-6, Columbia, pp. 237–248 (1995)

  3. Attardi, G., Rossi, S.D., Simi, M.: TANL-1: coreference resolution by parse analysis and similarity clustering. In: Proceedings of SemEval-2, Uppsala, pp. 108–111 (2010)

  4. Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proceedings of the LREC Workshop on Linguistic Coreference, Granada, pp. 563–566 (1998)

  5. Baldwin, B., Morton, T., Bagga, A., Baldridge, J., Chandraseker, R., Dimitriadis, A., Snyder, K., Wolska, M.: Description of the UPenn CAMP system as used for coreference. In: Proceedings of MUC-7, Fairfax (1998)

  6. Baldwin, B., Reynar, J., Collins, M., Eisner, J., Ratnaparkhi, A., Rosenzweig, J., Sarkar, A., Srinivas: University of Pennsylvania: description of the University of Pennsylvania system used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 177–191 (1995)

  7. Benajiba, Y., Shaw, J.: An SVM-based coreference resolution system based on Philips information extraction. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  8. Bengtson, E., Roth, D.: Understanding the value of features for coreference resolution. In: Proceedings of EMNLP 2008, Honolulu, pp. 294–303 (2008)

  9. Bergsma, S., Lin, D.: Bootstrapping path-based pronoun resolution. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, pp. 33–40 (2006)

  10. Björkelund, A., Farkas, R.: Data-driven multilingual coreference resolution using resolver stacking. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 49–55 (2012)

  11. Björkelund, A., Nugues, P.: Exploring lexicalized features for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 45–50 (2011)

  12. Broscheit, S., Poesio, M., Ponzetto, S.P., Rodríguez, K.J., Romano, L., Uryupina, O., Versley, Y., Zanoli, R.: BART: a multilingual anaphora resolution system. In: Proceedings of SemEval-2, Uppsala, pp. 104–107 (2010)

  13. Cai, J., Mujdricza, E., Hou, Y., Strube, M.: Weakly supervised graph-based coreference resolution for clinical texts. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  14. Cai, J., Mujdricza-Maydt, E., Strube, M.: Unrestricted coreference resolution via global hypergraph partitioning. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 56–60 (2011)

  15. Cai, J., Strube, M.: Evaluation metrics for end-to-end coreference resolution systems. In: Proceedings of SIGDIAL, University of Tokyo, Tokyo, pp. 28–36 (2010)

  16. Chang, K.W., Samdani, R., Rozovskaya, A., Rizzolo, N., Sammons, M., Roth, D.: Inference protocols for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 40–44 (2011)

  17. Chang, K.W., Samdani, R., Rozovskaya, A., Sammons, M., Roth, D.: Illinois-coref: the UI system in the CoNLL-2012 shared task. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 113–117 (2012)

  18. Charton, E., Gagnon, M.: Poly-co: a multilayer perceptron approach for coreference detection. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 97–101 (2011)

  19. Chen, C., Ng, V.: Combining the best of two worlds: a hybrid approach to multilingual coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 56–63 (2012)

  20. Chen, W., Zhang, M., Qin, B.: Coreference resolution system using maximum entropy classifier. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 127–130 (2011)

  21. Chinchor, N.A.: Overview of MUC-7/MET-2. In: Proceedings of the Seventh Message Understanding Conference (MUC-7), Fairfax (1998)

  22. Choi, Y., Cardie, C.: Structured local training and biased potential functions for conditional random fields with application to coreference resolution. In: Proceedings of HLT-NAACL, Rochester, pp. 65–72 (2007)

  23. Culotta, A., Wick, M., Hall, R., McCallum, A.: First-order probabilistic models for coreference resolution. In: Proceedings of HLT/NAACL, Rochester, pp. 81–88 (2007)

  24. Dai, H., Wu, C., Chen, C., et al.: Co-reference resolution of the medical concepts in the patient discharge summaries. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  25. van Deemter, K., Kibble, R.: On coreferring: coreference in MUC and related annotation schemes. Comput. Linguist. 26 (4), 629–637 (2000). Squib

  26. Denis, P., Baldridge, J.: Joint determination of anaphoricity and coreference resolution using integer programming. In: Proceedings of NAACL-HLT 2007, Rochester (2007)

  27. Denis, P., Baldridge, J.: Global joint models for coreference resolution and named entity classification. Procesamiento del Lenguaje Natural 42, 87–96 (2009)

  28. Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., Weischedel, R.: The automatic content extraction (ACE) program – tasks, data, and evaluation. In: Proceedings of LREC 2004, Lisbon, pp. 837–840 (2004)

  29. Fernandes, E., dos Santos, C., Milidiú, R.: Latent structure perceptron with feature induction for unrestricted coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 41–48 (2012)

  30. Fisher, D., Soderland, S., McCarthy, J., Feng, F., Lehnert, W.: Description of the UMass system as used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 127–140 (1995)

  31. Fukumoto, J., Masui, F., Shimohata, M., Sasaki, M.: Oki Electric Industry: description of the Oki system as used for MUC-7. In: Proceedings of MUC-7, Fairfax (1998)

  32. Gaizauskas, R., Wakao, T., Humphreys, K., Cunningham, H., Wilks, Y.: University of Sheffield: description of the LaSIE system as used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 207–220 (1995)

  33. Garigliano, R., Urbanowicz, A., Nettleton, D.J.: University of Durham: description of the LOLITA system as used in MUC-7. In: Proceedings of MUC-7, Fairfax (1998)

  34. Gärtner, M., Björkelund, A., Thiele, G., Seeker, W., Kuhn, J.: Visualization, search, and error analysis for coreference annotations. In: Proceedings of ACL: System Demonstrations, Baltimore, pp. 7–12 (2014)

  35. Glinos, D.: A search based method for clinical text coreference resolution. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  36. Gooch, P.: Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  37. Grishman, R.: The NYU system for MUC-6 or where's the syntax? In: Proceedings of MUC-6, Columbia, pp. 167–175 (1995)

  38. Grishman, R., Sundheim, B.: Design of the MUC-6 evaluation. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia (1995)

  39. Grishman, R., Sundheim, B.: Message understanding conference-6: a brief history. In: Proceedings of COLING, Copenhagen, pp. 466–471 (1996)

  40. Grouin, C., Dinarelli, M., Rosset, S.: Coreference resolution in clinical reports – the LIMSI participation in the i2b2/VA 2011 challenge. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  41. Haghighi, A., Klein, D.: Unsupervised coreference resolution in a nonparametric Bayesian model. In: Proceedings of ACL, Prague, pp. 848–855 (2007)

  42. Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.M., Van Der Vloet, J., Verschelde, J.L.: A coreference corpus and resolution system for Dutch. In: Proceedings of LREC, Marrakech (2008)

  43. Hinote, D., Ramirez, C., Chen, P.: A comparative study of co-reference resolution in clinical text. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  44. Hinrichs, E., Kübler, S., Naumann, K.: A unified representation for morphological, syntactic, semantic and referential annotations. In: Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor (2005)

  45. Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: the 90 % solution. In: Proceedings of HLT/NAACL, pp. 57–60. Association for Computational Linguistics, New York City (2006)

  46. Humphreys, K., Gaizauskas, R., Azzam, S., Huyck, C., Mitchell, B., Cunningham, H., Wilks, Y.: University of Sheffield: description of the LaSIE-II system as used for MUC-7. In: Proceedings of MUC-7, Fairfax (1998)

  47. Irwin, J., Komachi, M., Matsumoto, Y.: Narrative schema as world knowledge for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 86–92 (2011)

  48. Jindal, P., Roth, D.: Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  49. Klein, D., Kummerfeld, J.K., Bansal, M., Burkett, D.: Mention detection: heuristics for the OntoNotes annotations. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 102–106 (2011)

  50. Klenner, M., Tuggener, D.: An incremental model for coreference resolution with restrictive antecedent accessibility. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 81–85 (2011)

  51. Kobdani, H., Schütze, H.: SUCRE: a modular system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 92–95 (2010)

  52. Kobdani, H., Schütze, H.: Supervised coreference resolution with SUCRE. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 71–75 (2011)

  53. Kummerfeld, J.K., Klein, D.: Error-driven analysis of challenges in coreference resolution. In: Proceedings of EMNLP, Seattle, pp. 265–277 (2013)

  54. Lan, M., Zhao, J., Zhang, K., et al.: Comparative investigation on learning-based and rule-based approaches to coreference resolution in clinic domain: a case study in i2b2 challenge 2011 Task 1. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  55. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 28–34 (2011)

  56. Li, B.: Learning to model multilingual unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 129–135 (2012)

  57. Li, X., Wang, X., Liao, X.: Simple maximum entropy models for multilingual coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 83–87 (2012)

  58. Li, X., Wang, X., Qi, S.: Coreference resolution with loose transitivity constraints. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 107–111 (2011)

  59. Lin, D.: University of Manitoba: description of the PIE system used for MUC-6. In: Proceedings of MUC-6, Columbia, pp. 114–126 (1995)

  60. Lin, D.: Using collocation statistics in information extraction. In: Proceedings of MUC-7, Fairfax (1998)

  61. Luo, X.: On coreference resolution performance metrics. In: Proceedings of HLT-EMNLP, Vancouver, pp. 25–32 (2005)

  62. Luo, X.: Coreference or not: a twin model for coreference resolution. In: Proceedings of HLT-NAACL 2007, Rochester, pp. 73–80 (2007)

  63. Luo, X., Ittycheriah, A., Jing, H., Kambhatla, N., Roukos, S.: A mention-synchronous coreference resolution algorithm based on the Bell tree. In: Proceedings of ACL, Barcelona, pp. 21–26 (2004)

  64. Luo, X., Pradhan, S., Recasens, M., Hovy, E.: An extension of BLANC to system mentions. In: Proceedings of ACL, Baltimore, pp. 24–29 (2014)

  65. Màrquez, L., Recasens, M., Sapena, E.: Coreference resolution: an empirical study based on SemEval-2010 shared Task 1. Lang. Resour. Eval. 47 (3), 661–694 (2012)

  66. Martschat, S., Cai, J., Broscheit, S., Mújdricza-Maydt, É., Strube, M.: A multigraph model for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 100–106 (2012)

  67. Mitkov, R.: Towards a more consistent and comprehensive evaluation of anaphora resolution algorithms and systems. In: Proceedings of the Discourse Anaphora and Anaphora Resolution Colloquium (DAARC 2000), Lancaster, pp. 96–107 (2000)

  68. Morgan, R., Garigliano, R., Callaghan, P., Poria, S., Smith, M., Urbanowicz, A., Collingham, R., Costantino, M., Cooper, C., the LOLITA Group: University of Durham: description of the LOLITA system as used in MUC-6. In: Proceedings of MUC-6, Columbia, pp. 71–85 (1995)

  69. Ng, V.: Graph-cut-based anaphoricity determination for coreference resolution. In: Proceedings of NAACL-HLT 2009, Boulder, pp. 575–583 (2009)

  70. Pradhan, S., Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., Weischedel, R.: OntoNotes: a unified relational semantic representation. Int. J. Semant. Comput. 1 (4), 405–419 (2007)

  71. Pradhan, S., Luo, X., Recasens, M., Hovy, E., Ng, V., Strube, M.: Scoring coreference partitions of predicted mentions: a reference implementation. In: Proceedings of ACL, Baltimore, pp. 30–35 (2014)

  72. Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 1–40 (2012)

  73. Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., Xue, N.: CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 1–27 (2011)

  74. Rahman, A., Ng, V.: Supervised models for coreference resolution. In: Proceedings of EMNLP 2009, Suntec, pp. 968–977 (2009)

  75. Recasens, M., Hovy, E.: Coreference resolution across corpora: languages, coding schemes, and preprocessing information. In: Proceedings of ACL, Uppsala, pp. 1423–1432 (2010)

  76. Recasens, M., Hovy, E.: BLANC: implementing the Rand index for coreference evaluation. Nat. Lang. Eng. 17 (4), 485–510 (2011)

  77. Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. In: Proceedings of NAACL-2013, Atlanta, pp. 627–633 (2013)

  78. Recasens, M., Màrquez, L., Sapena, E., Martí, M.A., Taulé, M., Hoste, V., Poesio, M., Versley, Y.: SemEval-2010 Task 1: coreference resolution in multiple languages. In: Proceedings of SemEval-2, Uppsala, pp. 1–8 (2010)

  79. Recasens, M., Martí, M.A.: AnCora-CO: coreferentially annotated corpora for Spanish and Catalan. Lang. Resour. Eval. 44 (4), 315–345 (2010)

  80. Rink, B., Harabagiu, S.: A supervised multi-pass sieve approach for resolving coreference in clinical records. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  81. Rodríguez, K.J., Delogu, F., Versley, Y., Stemle, E., Poesio, M.: Anaphoric annotation of Wikipedia and blogs in the Live Memories corpus. In: Proceedings of LREC, Valletta (poster) (2010)

  82. dos Santos, C.N., Carvalho, D.L.: Rule and tree ensembles for unrestricted coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 51–55 (2011)

  83. Sapena, E., Padró, L., Turmo, J.: RelaxCor: a global relaxation labeling approach to coreference resolution for the SemEval-2 coreference task. In: Proceedings of SemEval-2, Uppsala, pp. 88–91 (2010)

  84. Sapena, E., Padró, L., Turmo, J.: RelaxCor participation in CoNLL shared task on coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 35–39 (2011)

  85. Savova, G.K., Chapman, W.W., Zheng, J., Crowley, R.S.: Anaphoric relations in the clinical narrative: corpus creation. J. Am. Med. Inform. Assoc. 18 (4), 459–465 (2011)

  86. Shou, H., Zhao, H.: System paper for CoNLL-2012 shared task: hybrid rule-based algorithm for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 118–121 (2012)

  87. Sobha, L.D., Pattabhi, R.K.R., Vijay Sundar Ram, R., Malarkodi, C.S., Akilandeswari, A.: Hybrid approach for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland (2011)

  88. Song, Y., Wang, H., Jiang, J.: Link type based pre-cluster pair model for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 131–135 (2011)

  89. Soon, W.M., Ng, H.T., Lim, D.C.Y.: A machine learning approach to coreference resolution of noun phrases. Comput. Linguist. 27 (4), 521–544 (2001)

  90. Stamborg, M., Medved, D., Exner, P., Nugues, P.: Using syntactic dependencies to solve coreferences. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 64–70 (2012)

  91. Stoyanov, V., Babbar, U., Gupta, P., Cardie, C.: Reconciling OntoNotes: unrestricted coreference resolution in OntoNotes with Reconcile. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 122–126 (2011)

  92. Stoyanov, V., Gilbert, N., Cardie, C., Riloff, E.: Conundrums in noun phrase coreference resolution: making sense of the state-of-the-art. In: Proceedings of ACL-IJCNLP, Singapore, pp. 656–664 (2009)

  93. Uryupina, O.: Corry: a system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 100–103 (2010)

  94. Uryupina, O., Moschitti, A., Poesio, M.: BART goes multilingual: the UniTN/Essex submission to the CoNLL-2012 shared task. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 122–128 (2012)

  95. Uryupina, O., Saha, S., Ekbal, A., Poesio, M.: Multi-metric optimization for coreference: the UniTN/IITP/Essex submission to the 2011 CoNLL shared task. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 61–65 (2011)

  96. Uzuner, O., Bodnari, A., Shen, S., Forbush, T., Pestian, J., South, B.R.: Evaluating the state of the art in coreference resolution for electronic medical records. J. Am. Med. Inform. Assoc. 19 (5), 786–791 (2012)

  97. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of MUC-6, Columbia, pp. 45–52 (1995)

  98. Ware, H., Mullet, C., Jagannathan, V., El-Rawas, O.: Machine learning-based coreference resolution of concepts in clinical documents. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  99. Weischedel, R., Hovy, E., Palmer, M., Marcus, M., Belvin, R., Pradhan, S., Ramshaw, L., Xue, N.: OntoNotes: a large training corpus for enhanced processing. In: Olive, J., Christianson, C., McCary, J. (eds.) Handbook of Natural Language Processing and Machine Translation. Springer, New York (2011)

  100. Xiong, H., Liu, Q.: ICT: system description for CoNLL-2012. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 71–75 (2012)

  101. Xiong, H., Song, L., Meng, F., Liu, Y., Liu, Q., Lv, Y.: ETS: an error tolerable system for coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 76–80 (2011)

  102. Xu, Y., Liu, J., Wu, J., Wang, Y., Chang, E.: EHUATUO: a mention-pair coreference system by exploiting document intrinsic latent structures and world knowledge in discharge summaries (Rank 1). In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  103. Xu, R., Xu, J., Liu, J., Liu, C., Zou, C., Gui, L., Zheng, Y., Qu, P.: Incorporating rule-based and statistic-based techniques for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 107–112 (2012)

  104. Yang, H., Willis, A., De Roeck, A., Nuseibeh, B.: A system for coreference resolution in clinical documents. In: Proceedings of the 2011 i2b2/VA/Cincinnati Workshop on Challenges in Natural Language Processing for Clinical Data, Boston (2011)

  105. Yang, Y., Xue, N., Anick, P.: A machine learning-based coreference detection system for OntoNotes. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 117–121 (2011)

  106. Yuan, B., Chen, Q., Xiang, Y., Wang, X., Ge, L., Liu, Z., Liao, M., Si, X.: A mixed deterministic model for coreference resolution. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 76–82 (2012)

  107. Zhang, X., Wu, C., Zhao, H.: Chinese coreference resolution via ordered filtering. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 95–99 (2012)

  108. Zhekova, D., Kübler, S.: UBIU: a language-independent system for coreference resolution. In: Proceedings of SemEval-2, Uppsala, pp. 96–99 (2010)

  109. Zhekova, D., Kübler, S.: UBIU: a robust system for resolving unrestricted coreference. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 112–116 (2011)

  110. Zhekova, D., Kübler, S., Bonner, J., Ragheb, M., Hsu, Y.Y.: UBIU for multilingual coreference resolution in OntoNotes. In: Proceedings of CoNLL-2012: Shared Task, Jeju Island, pp. 88–94 (2012)

  111. Zhou, H., Li, Y., Huang, D., Zhang, Y., Wu, C., Yang, Y.: Combining syntactic and semantic features by SVM for unrestricted coreference resolution. In: Proceedings of CoNLL-2011: Shared Task, Portland, pp. 66–70 (2011)

Acknowledgements

We would like to thank our co-organizers of SemEval-2010 Task 1 (Lluís Màrquez, Emili Sapena, M. Antònia Martí, Mariona Taulé, Véronique Hoste, Massimo Poesio, and Yannick Versley) and the CoNLL-2011/2012 Shared Tasks (Lance Ramshaw, Mitchell Marcus, Martha Palmer, Ralph Weischedel, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang), as well as the organizers of the MUC, ACE and i2b2 evaluation campaigns.

We would also like to thank all the participants. Without their hard work, patience and perseverance, these evaluations would not have happened.

The second author gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022; grant R01LM10090 from the National Library of Medicine; grant IIS-1219142 from the National Science Foundation; and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant number 288024 (LiMoSINe).

Author information

Corresponding author

Correspondence to Marta Recasens.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Recasens, M., Pradhan, S. (2016). Evaluation Campaigns. In: Poesio, M., Stuckardt, R., Versley, Y. (eds) Anaphora Resolution. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47909-4_6

  • DOI: https://doi.org/10.1007/978-3-662-47909-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-47908-7

  • Online ISBN: 978-3-662-47909-4

  • eBook Packages: Computer Science, Computer Science (R0)
