Annotating abstract anaphora

  • Original Paper
Language Resources and Evaluation

Abstract

In this paper, we present first results from annotating abstract (discourse-deictic) anaphora in German. Our annotation guidelines provide linguistic tests for identifying the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus. To date, 100 texts have been annotated according to these guidelines. The annotations show that anaphoric personal and demonstrative pronouns differ with respect to the distance to their antecedents. A semantic analysis reveals that, contrary to suggestions put forward in the literature, referents of anaphors do not tend to be more abstract than the referents of their antecedents.

Notes

  1. In an earlier version of the guidelines (cf. Dipper and Zinsmeister 2009), annotators had to determine the semantic types of the antecedents via reference to a table that listed 10 types of propositional entities, such as event, process, state, fact (see e.g., Asher 1993; Maienborn 2003; Vendler 1967). Annotators had to select an entity on the basis of features like world-dependent, time-dependent, dynamic, telic, and by applying linguistic tests from theoretical semantic work, such as “Is it possible to add frequency adverbials like ‘three times’, or time frames like ‘within one hour’, or time spans like ‘for one hour’?” (Dowty 1979). These tests, however, proved very difficult to apply to naturally occurring sentences. Therefore, the annotation criteria were redesigned as described in the text.

  2. MMAX2: http://www.mmax2.sourceforge.net/.

  3. For the task of classifying abstract anaphor versus the rest, we achieve an inter-annotator agreement of κ = .79 (computed according to Artstein and Poesio 2008).

  4. According to the Kolmogorov-Smirnov test, antecedent lengths for das versus dies do not differ significantly, but they do if compared to es: D = .36, p < .01 (das versus es); D = .36, p < .05 (dies versus es).

  5. Inter-annotator agreement for the task of identifying the antecedent is .40 (observed agreement on exact matches). If only the head verb of the antecedent is considered, agreement improves to .55, and measuring simple overlap yields .84.

  6. According to the Kolmogorov-Smirnov test, distances associated with dies versus es do not differ significantly. Significant differences exist between das versus dies: D = .29, p < .01; and between das versus es: D = .36, p < .01.

  7. For the annotation of semantic types in the NP Replacement Test, we achieved inter-annotator agreements of .75 for head noun classes and .77 for article classes. In the Colon Test, we achieved observed agreements of .70 for noun classes, .92 for article classes, and .86 for verb classes. For evaluating agreement, we treat the first-to-mind choice and the alternative choice as equal candidates. If the annotators agree on any candidate, we consider it to be a match. As a consequence, we can only compute observed agreement but not expected agreement. The same holds for antecedent identification (see footnote 5).

  8. If multiple or ambiguous paraphrases have been annotated, the most concrete type is chosen for antecedents, and the most abstract type for anaphors. This choice gives the Abstractness Hypothesis the best possible chance of being satisfied.
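
The agreement figures in notes 3, 5, and 7 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the representation of an antecedent as a (start, end, head) token triple with an exclusive end, and the use of Cohen's kappa as the chance-corrected coefficient (the paper only states that κ was computed according to Artstein and Poesio 2008). The sketch also shows why the candidate-set scheme of note 7 yields observed agreement only: expected agreement in this formulation requires exactly one label per item and annotator.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign ONE label per item.

    Assumption: Cohen's kappa stands in for the coefficient used in the paper.
    Undefined (division by zero) if expected agreement equals 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def any_candidate_agreement(choices_a, choices_b):
    """Observed agreement when each annotation is a SET of candidates.

    Two annotations match if they share at least one candidate (note 7).
    No expected agreement is computed here.
    """
    matches = sum(bool(set(a) & set(b)) for a, b in zip(choices_a, choices_b))
    return matches / len(choices_a)


def span_agreement(spans_a, spans_b, criterion):
    """Observed agreement on antecedent spans (start, end, head), end exclusive.

    criterion: "exact"   -> identical boundaries
               "head"    -> identical head token position
               "overlap" -> spans share at least one token
    """
    def match(x, y):
        if criterion == "exact":
            return x[:2] == y[:2]
        if criterion == "head":
            return x[2] == y[2]
        if criterion == "overlap":
            return x[0] < y[1] and y[0] < x[1]
        raise ValueError(f"unknown criterion: {criterion}")

    return sum(match(x, y) for x, y in zip(spans_a, spans_b)) / len(spans_a)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    a = ["abstract", "abstract", "other", "other"]
    b = ["abstract", "other", "other", "other"]
    print(cohen_kappa(a, b))

    types_a = [{"event"}, {"fact", "state"}, {"process"}]
    types_b = [{"event", "fact"}, {"state"}, {"state"}]
    print(any_candidate_agreement(types_a, types_b))

    spans_a = [(0, 5, 3), (10, 15, 12), (20, 30, 25)]
    spans_b = [(0, 5, 3), (11, 15, 12), (28, 35, 33)]
    for criterion in ("exact", "head", "overlap"):
        print(criterion, span_agreement(spans_a, spans_b, criterion))
```

On the toy spans, the three criteria become progressively more lenient, which mirrors the pattern reported in note 5 (exact match, then head verb, then simple overlap).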

References

  • Artstein, R., & Poesio, M. (2006). Identifying reference to abstract objects in dialogue. In Proceedings of Brandial (pp. 56–63).

  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics (survey article). Computational Linguistics, 34(4), 555–596.

  • Asher, N. (1993). Reference to abstract objects in discourse. Boston: Kluwer Academic Publishers.

  • Botley, S. (2006). Indirect anaphora: Testing the limits of corpus-based linguistics. International Journal of Corpus Linguistics, 11(1), 73–112.

  • Botley, S., & McEnery, T. (2001). Demonstratives in English: A corpus-based study. Journal of English Linguistics, 29, 7–33.

  • Byron, D. K. (2002). Resolving pronominal reference to abstract entities. In Proceedings of ACL-02 (pp. 80–87).

  • Byron, D. K. (2003). Annotation of pronouns and their antecedents: A comparison of two domains. Technical Report, University of Rochester.

  • Consten, M., & Knees, M. (2005). Complex anaphors—ontology and resolution. In Proceedings of the 15th Amsterdam Colloquium (pp. 65–70).

  • Consten, M., Knees, M., & Schwarz-Friesel, M. (2007). The function of complex anaphors in texts: Evidence from corpus studies and ontological considerations. In Anaphors in text (pp. 81–102). Amsterdam: John Benjamins.

  • Dipper, S., & Zinsmeister, H. (2009). Annotating discourse anaphora. In Proceedings of the ACL-IJCNLP Linguistic Annotation Workshop (LAW III) (pp. 166–169).

  • Dipper, S., & Zinsmeister, H. (2009). Annotation guidelines “discourse-deictic anaphora”. Draft. Universities of Bochum and Konstanz.

  • Dowty, D. (1979). Word meaning and Montague grammar. Dordrecht: Reidel.

  • Eckert, M., & Strube, M. (2000). Dialogue acts, synchronising units and anaphora resolution. Journal of Semantics, 17(1), 51–89.

  • Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.

  • Gundel, J. K., Hedberg, N., & Zacharski, R. (2004). Demonstrative pronouns in natural discourse. In Proceedings of the 5th Discourse Anaphora and Anaphora Resolution Colloquium (DAARC-2004), pp. 81–86.

  • Hedberg, N., Gundel, J. K., & Zacharski, R. (2007). Directly and indirectly anaphoric demonstrative and personal pronouns in newspaper articles. In Proceedings of the 6th Discourse Anaphora and Anaphora Resolution Colloquium (DAARC-2007), pp. 31–36.

  • Hegarty, M., Gundel, J. K., & Borthen, K. (2001). Information structure and the accessibility of clausally introduced referents. Theoretical Linguistics, 27(2–3), 163–186.

  • Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit (MT Summit), pp. 79–86.

  • Kučová, L., & Hajičová, E. (2004). Coreferential relations in the Prague Dependency Treebank. In Proceedings of the 5th Discourse Anaphora and Anaphora Resolution Colloquium (DAARC-2004), pp. 97–102.

  • Lappin, S., & Leass, H. J. (1994). An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4), 535–561.

  • Maienborn, C. (2003). Die logische Form von Kopula-Sätzen. Berlin: Akademie Verlag.

  • Müller, C. (2007). Resolving it, this, and that in unrestricted multi-party dialog. In Proceedings of ACL-07, pp. 816–823.

  • Müller, C. (2008). Fully automatic resolution of ‘it’, ‘this’, and ‘that’ in unrestricted multi-party dialog. Ph.D. thesis, University of Tübingen.

  • Navarretta, C. (2008). Pronominal types and abstract reference in the Danish and Italian DAD corpora. In Proceedings of the 2nd Workshop on Anaphora Resolution (WAR II), pp. 63–71.

  • Navarretta, C., & Olsen, S. (2008). Annotating abstract pronominal anaphora in the DAD project. In Proceedings of LREC-08, pp. 2046–2052.

  • Ng, V. (2010). Supervised noun phrase coreference research: The first 15 years. In Proceedings of ACL-10, pp. 1396–1411.

  • Poesio, M., & Artstein, R. (2008). Anaphoric annotation in the ARRAU corpus. In Proceedings of LREC-08, pp. 1170–1174.

  • Poesio, M., & Modjeska, N. N. (2005). Focus, activation, and this-noun phrases: An empirical study. In Branco, A., McEnery, T., & Mitkov, R. (Eds.), Anaphora processing. Amsterdam/Philadelphia: John Benjamins, pp. 429–442.

  • Pradhan, S. S., Ramshaw, L., Weischedel, R., MacBride, J., & Micciulla, L. (2007). Unrestricted coreference: Identifying entities and events in OntoNotes. In Proceedings of the International Conference on Semantic Computing (ICSC 2007), pp. 446–453.

  • Recasens, M. (2008a). Discourse deixis and coreference: Evidence from AnCora. In Proceedings of the 2nd Workshop on Anaphora Resolution (WAR II), pp. 73–82.

  • Recasens, M. (2008b). Towards coreference resolution for Catalan and Spanish. Master’s thesis, University of Barcelona.

  • Recasens, M., & Martí, M. A. (2009). AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation, 44(4), 315–345.

  • Vendler, Z. (1967). Linguistics in philosophy, Chap. Verbs and time. Ithaca: Cornell University Press, pp. 97–121.

  • Vieira, R., Salmon-Alt, S., & Gasperin, C. (2005). Coreference and anaphoric relations of demonstrative noun phrases in a multilingual corpus. In Branco, A., McEnery, T., & Mitkov, R. (Eds.), Anaphora processing: Linguistic, cognitive and computational modelling. Amsterdam: John Benjamins, pp. 385–403.

  • Webber, B. L. (1988). Discourse deixis: Reference to discourse segments. In Proceedings of ACL-88, pp. 113–122.

  • Webber, B. L. (1991). Structure and ostention in the interpretation of discourse deixis. Language and Cognitive Processes, 6, 107–135.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments, and our student annotators: Iris Bräuning, Christine Rieger, and Melanie Seiß. The work reported here was in part supported by Europäischer Sozialfonds in Baden-Württemberg, by the Rectorate’s Fund of Ruhr-University Bochum, and by the Young Scholar Fund of the University of Konstanz’s Excellence Initiative.

Author information

Corresponding author

Correspondence to Stefanie Dipper.

About this article

Cite this article

Dipper, S., Zinsmeister, H. Annotating abstract anaphora. Lang Resources & Evaluation 46, 37–52 (2012). https://doi.org/10.1007/s10579-011-9160-1

  • DOI: https://doi.org/10.1007/s10579-011-9160-1
