FactBank: a corpus annotated with event factuality

Saurí, Roser; Pustejovsky, James

doi:10.1007/s10579-009-9089-9

Published: 07 May 2009

Volume 43, pages 227–268, (2009)

Language Resources and Evaluation

Abstract

Recent work in computational linguistics points out the need for systems to be sensitive to the veracity or factuality of events as mentioned in text; that is, to recognize whether events are presented as corresponding to actual situations in the world, situations that have not happened, or situations of uncertain interpretation. Event factuality is an important aspect of the representation of events in discourse, but the annotation of such information poses a representational challenge, largely because factuality is expressed through the interaction of numerous linguistic markers and constructions. Many of these markers are already encoded in existing corpora, albeit in a somewhat fragmented way. In this article, we present FactBank, a corpus annotated with information concerning the factuality of events. Its annotation has been carried out from a descriptive framework of factuality grounded on both theoretical findings and data analysis. FactBank is built on top of TimeBank, adding to it an additional level of semantic information.

Notes

The main references for these corpora are: PropBank (Palmer et al. 2005), FrameNet (Baker et al. 1998), RST Corpus (Carlson et al. 2003), Penn Discourse TreeBank (Miltsakaki et al. 2004), GraphBank (Wolf and Gibson 2005), TimeBank (Pustejovsky et al. 2006), MPQA Opinion Corpus (Wiebe et al. 2005).
In this article, the term event will be used in a very broad sense to refer to both processes and states, but also other abstract objects such as propositions, facts, possibilities, etc.
This is distinct from most of the work within truth-conditional semantics, which conceives of modality as independent from the speaker’s perspective (e.g., Kratzer 1991).
Here and throughout the rest of the article, events in the examples will be identified by marking only their verb, noun, or adjective head, together with polarity particles and auxiliaries when deemed necessary. This follows the convention assumed in TimeML, the specification language used to represent event and temporal information in the corpus presented here (Pustejovsky et al. 2006).
Some authors use the term hedging to refer to markers of modality expressing the degree of commitment of the source towards the certainty of a proposition. See, e.g., Clemen (1997).
See Saurí (2008) for a more comprehensive view on the factuality of events and its identification.
The original sentence in this set is (17b), from the British National Corpus.
Furthermore, Nairn et al. (2006), Saurí and Pustejovsky (2007), and Saurí (2008) show that the interaction among all these elements can be modeled in a predictable way.
This is equivalent to the notation < author,izvestiya > in Wiebe’s work. Here, we adopt a reversed representation of the nesting (i.e., the non-embedded source last) because it positions the most direct source of the event at the outmost layer, thus facilitating its reading.
From Rubin (2006, p. 59).
Scalar predications are conceived as collections of predicates P_n, such as <P_j, P_{j−1}, …, P_2, P_1>, where P_n outranks (i.e., is stronger than) P_{n−1} on the relevant scale.
The vowels naming the vertices, which are derived from the Latin verbs affirmo ‘I affirm’ and nego ‘I deny’, reflect this distinction.
Semantically, this can be interpreted as: Val(mod,Val(pol,e))—i.e., the modal value scopes over the polarity value.
This step is applied here only for the purpose of illustrating the complete process, although it should be clear just from the meaning of the sentence that the event change in the original example is presented with some degree of uncertainty.
http://www.timeml.org/site/timebank/timebank.html.
The figures reported here update those reported in previous work (Saurí 2008; Saurí and Pustejovsky 2008).
TimeML has moved towards a stand-off annotation. The example here is embedded for illustration purposes.
It must be pointed out, however, that none of the aforementioned issues are problems from a TimeML perspective, since its goal is not to provide a full-fledged annotation of factuality. Moreover, TimeML has been intentionally conceived of as a surface-based markup, which explains why, for instance, modal auxiliaries are recorded but not interpreted.
For the sake of clarity, the example above provides both the form and the ID for events and sources, but the original FactBank annotation records only the IDs.
We follow here the same approach as TimeML of annotating only heads.
These syntactic functions were obtained from parsing the corpus with the Stanford Parser (de Marneffe et al. 2006b).
As a matter of fact, there was no event judged as such throughout the whole corpus.
Rubin’s approach and ours are not completely equivalent, since she annotates only sentences where there are “explicit markers of certainty”, whereas we assume that factuality is a value affecting all events in text. In addition, her system does not consider polarity as part of the information to identify.
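
The notes above describe three representational choices: scales on which one predicate outranks another, a factuality value in which the modal value scopes over the polarity value, and nested sources written with the non-embedded source last. The following is a minimal Python sketch of those ideas only. The names `Modality`, `Factuality`, and `source_chain`, the three-point scale, and the underscore separator are illustrative assumptions, not the FactBank annotation format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Modality(IntEnum):
    """Hypothetical scale: a higher value outranks (is stronger than) a lower one."""
    POSSIBLE = 1
    PROBABLE = 2
    CERTAIN = 3


@dataclass(frozen=True)
class Factuality:
    """Val(mod, Val(pol, e)): the modal value scopes over the polarity value."""
    modality: Modality  # outer value
    positive: bool      # inner value: True = positive polarity, False = negative

    def label(self) -> str:
        sign = "+" if self.positive else "-"
        return f"{self.modality.name.lower()}({sign})"


def source_chain(sources: list[str]) -> str:
    """Write nested sources with the non-embedded source last.

    `sources` is ordered from the non-embedded source (e.g. the author)
    inward, so it is reversed before joining. The "_" separator is an
    assumption made for this sketch.
    """
    return "_".join(reversed(sources))


if __name__ == "__main__":
    print(Factuality(Modality.PROBABLE, False).label())
    print(source_chain(["author", "izvestiya"]))
```

Using an ordered enumeration keeps the "outranks" relation explicit, so comparisons such as `Modality.CERTAIN > Modality.PROBABLE` follow directly from the scale.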

References

ACE (2008). ACE (Automatic Content Extraction) English annotation guidelines for relations (Version 6.0 – 2008.01.07 ed.). Linguistic Data Consortium. http://www.ldc.upenn.edu/Projects/ACE.
Aikhenvald, A. Y. (2004). Evidentiality. Oxford, England: Oxford University Press.
Andreevskaia, A., & Bergler, S. (2006). Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics, EACL-2006.
Asher, N. (1993). Reference to abstract objects in English. Dordrecht, The Netherlands: Kluwer Academic Press.
Bach, K., & Harnish, R. M. (1979). Linguistic communication and speech acts. Cambridge, Massachusetts, USA: The MIT Press.
Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In 17th International conference on computational linguistics (pp. 86–90).
Bergler, S. (1992). Evidential analysis of reported speech. PhD thesis, Brandeis University.
Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., & Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In Proceedings of AAAI spring symposium on exploring attitude and affect in text.
Biber, D., & Finegan, E. (1989). Styles of stance in English: Lexical and grammatical marking of evidentiality and affect. Text, 9(1), 93–124.
Carlson, L., Marcu, D., & Okurowski, M. E. (2003). Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In J. v. Kuppevelt & R. W. Smith (Eds.), Current and new directions in discourse and dialogue. Springer.
Chafe, W. (1986). Evidentiality in English conversation and academic writing. In W. Chafe & J. Nichols (Eds.), Evidentiality: The linguistic coding of epistemology. Norwood, New Jersey, USA: Ablex Publishing Corporation.
Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the HLT/EMNLP 2005. Vancouver, Canada.
Clemen, G. (1997). The concept of hedging: Origins, approaches and definitions. In R. Markkanen & H. Schröder (Eds.), Hedging and discourse: Approaches to the analysis of a pragmatic phenomenon in academic texts (pp. 235–248). Berlin; New York: Walter de Gruyter.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Condoravdi, C., Crouch, R., van den Berg, M., Everett, J., Stolle, R., Paiva, V., & Bobrow, D. (2001). Preventing existence. In Proceedings of the conference on formal ontologies in information systems (FOIS), Ogunquit, Maine, USA.
Dave, K. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of World Wide Web conference 2003.
de Haan, F. (1997). The interaction of modality and negation: A typological study. New York, USA: Garland.
de Haan, F. (2000). The relation between modality and evidentiality. In R. Müller & M. Reis (Eds.), Modalität und Modalverben im Deutschen. Hamburg, Germany: Helmut Buske Verlag.
de Marneffe, M.-C., MacCartney, B., Grenager, T., Cer, D., Rafferty, A., & Manning, C. D. (2006a). Learning to distinguish valid textual entailments. In Second PASCAL RTE Challenge (RTE-2).
de Marneffe, M.-C., MacCartney, B., & Manning, C. D. (2006b). Generating typed dependency parses from phrase structure parses. In Proceedings of LREC 2006.
Di Eugenio, B., & Glass, M. (2004). The kappa statistic: a second look. Computational Linguistics, 30, 95–101.
Dor, D. (1995). Representations, attitudes and factivity evaluations. An epistemically-based analysis of lexical selection. PhD thesis, Stanford University.
Geurts, B. (1998). Presuppositions and anaphors in attitude contexts. Linguistics and Philosophy, 21, 545–601.
Givón, T. (1993). English grammar. A function-based introduction. Amsterdam, The Netherlands: John Benjamins.
Glanzberg, M. (2003). Felicity and presupposition triggers. In University of Michigan Workshop in Philosophy and Linguistics. Michigan, USA.
Halliday, M. A. K. (1994). An introduction to Functional Grammar (2nd ed.). London, England: Edward Arnold.
Halliday, M. A. K., & Matthiessen, C. M. (2004). An introduction to Functional Grammar. London, England: Hodder Arnold.
Hickl, A., & Bensley, J. (2007). A discourse commitment-based framework for recognizing textual entailment. In Proceedings of the workshop on textual entailment and paraphrasing (pp. 171–176). Prague, Czech Republic.
Hooper, J. B. (1975). On assertive predicates. In J. Kimball (Ed.), Syntax and semantics, IV (pp. 91–124). New York, USA: Academic Press.
Horn, L. R. (1972). On the semantic properties of logical operators in English. PhD thesis, UCLA. Distributed by the Indiana University Linguistics Club in 1976.
Horn, L. R. (1989). A natural history of negation. Chicago, USA: University of Chicago Press.
Huddleston, R. (1984). Introduction to the grammar of English. Cambridge, England: Cambridge University Press.
Karttunen, L. (1970). Implicative verbs. Language, 47, 340–358.
Karttunen, L. (1973). Presuppositions of compound sentences. Linguistic Inquiry, 4(2), 169–193.
Karttunen, L., & Zaenen, A. (2005). Veridicity. In G. Katz, J. Pustejovsky, & F. Schilder (Eds.), Dagstuhl seminar proceedings. Schloss Dagstuhl, Germany. Internationales Begegnungs- und Forschungszentrum (IBFI).
Kiefer, F. (1987). On defining modality. Folia Linguistica, XXI, 67–94.
Kiparsky, P., & Kiparsky, C. (1970). Fact. In M. Bierwisch & K. E. Heidolph (Eds.), Progress in linguistics. A collection of papers (pp. 143–173). The Hague: Mouton.
Koenig, J.-P., & Davis, A. R. (2001). Sublexical modality and the structure of lexical semantics. Linguistics and Philosophy, 24, 71–124.
Kratzer, A. (1991). Modality. In A. von Stechow & D. Wunderlich (Eds.), Semantik: Ein internationales Handbuch der zeitgenoessischen Forschung (pp. 639–650). Berlin, Germany: Walter de Gruyter.
Light, M., Qiu, X. Y., & Srinivasan, P. (2004). The language of Bioscience: Facts, speculations, and statements in between. In BioLINK 2004: Linking biological literature, ontologies, and databases (pp. 17–24).
Lyons, J. (1977). Semantics. Cambridge, England: Cambridge University Press.
Martin, J. R., & White, P. R. R. (2005). Language of evaluation: Appraisal in English. Palgrave Macmillan.
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., & Grishman, R. (2004). The NomBank project: An interim report. In Proceedings of frontiers in corpus annotation workshop. HLT-NAACL.
Miltsakaki, E., Prasad, R., Joshi, A., & Webber, B. (2004). The Penn Discourse TreeBank. In Proceedings of LREC 2004.
Mushin, I. (2001). Evidentiality and epistemological stance. Amsterdam/Philadelphia: John Benjamins.
Nairn, R., Condoravdi, C., & Karttunen, L. (2006). Computing relative polarity for textual inference. In Inference in Computational Semantics, ICoS-5.
Palmer, F. R. (1986). Mood and modality. Cambridge, England: Cambridge University Press.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–105.
Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL, 115–124.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the EMNLP 2002.
Polanyi, L., & Zaenen, A. (2005). Contextual lexical valence shifters. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theories and applications. New York, NY, USA: Springer-Verlag.
Pradhan, S., Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2007). OntoNotes: A unified relational semantic representation. In Proceedings of IEEE international conference on semantic computing, ICSC 2007 (pp. 517–526).
Prasad, R., Dinesh, N., Lee, A., Joshi, A., & Webber, B. (2007). Attribution and its annotation in the Penn Discourse TreeBank. Traitement Automatique des Langues, 47(2), 43–63.
Prasad, R., Dinesh, N., Lee, A., Robaldo, L., Joshi, A., & Webber, B. (2008). The Penn Discourse TreeBank 2.0. In Proceedings of LREC 2008, Marrakesh, Morocco.
Pustejovsky, J., Castano, J., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A., & Katz, G. (2003). TimeML: Robust specification of event and temporal expressions in text. In IWCS-5, fifth international workshop on computational semantics.
Pustejovsky, J., Verhagen, M., Saurí, R., Littman, J., Gaizauskas, R., Katz, G., Mani, I., Knippen, R., & Setzer, A. (2006). TimeBank 1.2. Linguistic Data Consortium (LDC). Philadelphia, PA. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08.
Pustejovsky, J., Knippen, B., Littman, J., & Saurí, R. (2005). Temporal and event information in natural language text. Language Resources and Evaluation, 39(2), 123–164.
Pustejovsky, J., & Rumshisky, A. (2008). Between chaos and structure: Interpreting lexical data through a theoretical lens. Special Issue of International Journal of Lexicography in Memory of John Sinclair, 21(3), 337–355.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London, England: Longman.
Read, J., Hope, D., & Carroll, J. (2007). Annotating expressions of appraisal in English. In Proceedings of the linguistic annotation workshop, Prague. Association for Computational Linguistics, ACL.
Riloff, E., Wiebe, J., & Wilson, T. (2003). Learning subjective nouns using extraction pattern bootstrapping. In Proceedings of the 7th conference on natural language learning (CoNLL 2003).
Rubin, V. L. (2006). Identifying certainty in texts. PhD thesis, Syracuse University.
Rubin, V. L. (2007). Stating with certainty or stating with doubt: Intercoder reliability results for manual annotation of epistemically modalized statements. In Proceedings of the NAACL-HLT 2007.
Rubin, V. L., Liddy, E. D., & Kando, N. (2005). Certainty identification in texts: Categorization model and manual tagging results. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theories and applications. New York, USA: Springer-Verlag.
Saurí, R. (2008). A factuality profiler for eventualities in text. PhD thesis, Brandeis University.
Saurí, R., & Pustejovsky, J. (2007). Determining modality and factuality for text entailment. In Proceedings of the first IEEE international conference on semantic computing, Irvine, CA, USA.
Saurí, R., & Pustejovsky, J. (2008). From structure to interpretation: A double-layered annotation for event factuality. In Proceedings of the second linguistic annotation workshop (The LAW II). LREC 2008, Marrakesh, Morocco.
Saurí, R., Verhagen, M., & Pustejovsky, J. (2006a). Annotating and recognizing event modality in text. In 19th International FLAIRS conference, FLAIRS 2006. The Florida Artificial Intelligence Research Society.
Saurí, R., Verhagen, M., & Pustejovsky, J. (2006b). SlinkET: A partial modal parser for events. In Proceedings of LREC 2006, Genoa, Italy.
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. Boston, MA, USA: McGraw Hill.
Snow, R., & Vanderwende, L. (2006). Effectively using syntax for recognizing false entailment. In Proceedings of HLT-NAACL 2006.
Stoyanov, V., & Cardie, C. (2008). Annotating topics of opinions. In Proceedings of LREC 2008, Marrakech, Morocco. ELDA.
Tatu, M., & Moldovan, D. (2005). A semantic approach to recognizing textual entailment. In Proceedings of HLT/EMNLP (pp. 371–378).
Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th ACL, 417–424.
Van Valin, R. D., & LaPolla, R. J. (1997). Syntax. Structure, meaning and function. Cambridge, England: Cambridge University Press.
Verhagen, M., Stubbs, A., & Pustejovsky, J. (2007). Combining independent syntactic and semantic annotation schemes. In Proceedings of the Linguistic Annotation Workshop (pp. 109–112). ACL. Prague, Czech Republic.
Waugh, L. R. (1995). Reported speech in journalistic discourse: The relation of function and text. Text, 15(1), 129–173.
Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2), 165–210.
Wiebe, J. M. (2000). Learning subjective adjectives from corpora. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI 2000).
Wierzbicka, A. (1987). English speech act verbs. A semantic dictionary. Sydney, Australia: Academic Press.
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005). OpinionFinder: A system for subjectivity analysis. In Proceedings of the HLT/EMNLP 2005 Demonstration Abstracts (pp. 34–35). Vancouver, Canada.
Wilson, T., Wiebe, J., & Hwa, R. (2004). Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI 2004).
Wolf, F., & Gibson, E. (2005). Representing discourse coherence: A corpus-based analysis. Computational Linguistics, 31(2), 249–287.

Acknowledgments

We are very grateful to Marc Verhagen, Toni Badia, Lauri Karttunen, Rick Alterman, Sabine Bergler, Adam Meyers, and Silvia Pareti for their valuable comments and helpful discussion regarding this research. We also want to extend thanks to four anonymous reviewers for their constructive suggestions, which helped improve the original manuscript. All errors and mistakes are the responsibility of the authors. This work has been supported by a grant to Prof. Pustejovsky, NAVAIR Contract No. N61339-06-C-0140.

Author information

Authors and Affiliations

Laboratory for Linguistics and Computation, Computer Science Department, Brandeis University, Waltham, MA, USA
Roser Saurí & James Pustejovsky

Corresponding author

Correspondence to Roser Saurí.

About this article

Cite this article

Saurí, R., Pustejovsky, J. FactBank: a corpus annotated with event factuality. Lang Resources & Evaluation 43, 227–268 (2009). https://doi.org/10.1007/s10579-009-9089-9

Received: 14 August 2008
Accepted: 19 March 2009
Published: 07 May 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10579-009-9089-9
