Skip to main content
Log in

Abstract

Traditional event extraction systems focus mainly on event type identification and event participant extraction based on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms. When transferring to a new domain, we have to build a new event type paradigm and annotate a new corpus from scratch. This kind of conventional event extraction system requires massive human effort, and hence prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system, which extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm in the input corpus, and then proceeds to extract a large number of instance patterns of these events. Subsequently, the system extracts event arguments according to these patterns. By conducting a series of experiments, we demonstrate the good performance of BUEES and compare it to a state-of-the-art Chinese event extraction system, i.e., a supervised event extraction system. Experimental results show that BUEES performs comparably (5% higher F-measure in event type identification and 3% higher F-measure in event argument extraction), but without any human effort.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Ahn, D., 2006. The stages of event extraction. Proc. Workshop on Annotating and Reasoning about Time and Events, p.1-8.

  • Banko, M., Etzioni, O., 2008. The tradeoffs between open and traditional relation extraction. Proc. Annual Meeting on Association for Computational Linguistics, p.28-36.

  • Banko, M., Cafarella, M.J., Soderland, S., et al., 2007. Open information extraction for the Web. Proc. 20th Int. Joint Conf. on Artificial Intelligence, p.2670-2676.

  • Barzilay, R., McKeown, K.R., 2001. Extracting paraphrases from a parallel corpus. Proc. 39th Annual Meeting on Association for Computational Linguistics, p.50-57. [doi:10.3115/1073012.1073020]

  • Chambers, N., Jurafsky, D., 2009. Unsupervised learning of narrative schemas and their participants. Proc. 47th Annual Meeting on Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing, p.602-610.

  • Chambers, N., Jurafsky, D., 2011. Template-based information extraction without the templates. Proc. 49th Annual Meeting on Association for Computational Linguistics, p.976-986.

  • Che, W., Li, Z., Li, Y., et al., 2009. Multilingual dependencybased syntactic and semantic parsing. Proc. 13th Conf. on Computational Natural Language Learning, p.49-54.

  • Chen, Z., Ji, H., 2009. Language specific issue and feature exploration in Chinese event extraction. Proc. Annual Conf. on Association for Computational Linguistics, p.209-212.

  • Chinchor, N., Lewis, D.D., Hirschman, L., 1993. Evaluating message understanding systems: an analysis of the third message understanding conference (MUC-3). Comput. Ling., 19(3):409–449.

    MATH  Google Scholar 

  • Ding, X., Song, F., Qin, B., et al., 2011. Research on typical event extraction method in the field of music. J. Chin. Inform. Process., 25(2):15–20 (in Chinese).

    Google Scholar 

  • Ding, X., Qin, B., Liu, T., 2013. Building Chinese event type paradigm based on trigger clustering. Proc. Int. Joint Conf. on Natural Language Processing, p.311-319.

  • Dong, Z., Dong, Q., 2006. HowNet and the Computation of Meaning. World Scientific Publishing Company, USA.

    Book  Google Scholar 

  • Etzioni, O., Fader, A., Christensen, J., et al., 2011. Open information extraction: the second generation. Proc. 22nd Int. Joint Conf. on Artificial Intelligence, p.3-10.

  • Fader, A., Soderland, S., Etzioni, O., 2011. Identifying relations for open information extraction. Proc. Conf. on Empirical Methods in Natural Language Processing, p.1535-1545.

  • Friedman, J.H., Bentley, J.L., Finkel, R.A., 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 3(3):209–226. [doi:10.1145/355744.355745]

    Article  MATH  Google Scholar 

  • Grishman, R., 1997. Information extraction: techniques and challenges. In: Pazienza, M.T. (Ed.), Information Extraction: a Multidisciplinary Approach to an Emerging Information Technology. Springer Berlin Heidelberg, New York, USA, p.10–27. [doi:10.1007/3-540-63438-X_2]

    Chapter  Google Scholar 

  • Grishman, R., 2001. Adaptive information extraction and sublanguage analysis. Int. Joint Conf. on Artificial Itelligence, Workshop on Adaptive Text Extraction and Mining.

  • Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2001. On clustering validation techniques. J. Intell. Inform. Syst., 17(2-3):107–145. [doi:10.1023/A:1012801612483]

    Article  MATH  Google Scholar 

  • Hasegawa, T., Sekine, S., Grishman, R., 2004. Discovering relations among named entities from large corpora. Proc. 42nd Annual Meeting on Association for Computational Linguistics, Article 415. [doi:10.3115/1218955.1219008]

  • Hirschberg, D.S., 1977. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664–675. [doi:10.1145/322033.322044]

    Article  MathSciNet  MATH  Google Scholar 

  • Hong, Y., Zhang, J., Ma, B., et al., 2011. Using cross-entity inference to improve event extraction. Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, p.1127-1136.

  • Ibrahim, A., Katz, B., Lin, J., 2003. Extracting structural paraphrases from aligned monolingual corpora. Proc. 2nd Int. Workshop on Paraphrasing, p.57-64. [doi:10.3115/1118984.1118992]

  • Ji, H., Grishman, R., 2008. Refining event extraction through cross-document inference. Proc. Association for Computational Linguistics, p.254-262.

  • Lee, C.S., Chen, Y.J., Jian, Z.W., 2003. Ontology-based fuzzy event extraction agent for Chinese e-news summarization. Expert Syst. Appl., 25(3):431–447. [doi:10.1016/S0957-4174(03)00062-9]

    Article  Google Scholar 

  • Liao, S., Grishman, R., 2010. Filtered ranking for bootstrapping in event extraction. Proc. 23rd Int. Conf. on Computational Linguistics, p.680-688.

  • Lin, D., Pantel, P., 2001. DIRT@SBT@discovery of inference rules from text. Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, p.323-328. [doi:10.1145/502512.502559]

  • Liu, T., Ma, J., Zhang, H., et al., 2007. Subdividing verbs to improve syntactic parsing. J. Electron. (China), 24(3):347–352 (in Chinese). [doi:10.1007/s11767-005-0193-8]

    Google Scholar 

  • Mei, J.J., Zhu, Y.M., Gao, Y.Q., et al., 1983. Dictionary of Synonymous Words. Shanghai Dictionary Publishing Press, Shanghai, China (in Chinese).

    Google Scholar 

  • Miller, S., Guinness, J., Zamanian, A., 2004. Name tagging with word clusters and discriminative training. Proc. Conf. of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, p.337-342.

  • Miwa, M., Sætre, R., Kim, J.D., et al., 2010. Event extraction with complex event classification using rich features. J. Bioinform. Comput. Biol., 8(1):131–146. [doi:10.1142/S0219720010004586]

    Article  Google Scholar 

  • Pang, B., Knight, K., Marcu, D., 2003. Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. Proc. Conf. of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, p.102-109. [doi:10.3115/1073445.1073469]

  • Patwardhan, S., Riloff, E., 2006. Learning domain-specific information extraction patterns from the Web. Proc. Workshop on Information Extraction Beyond the Document, p.66-73.

  • Pham, X., Le, M., Ho, B., 2013. A hybrid approach for biomedical event extraction. Proc. Association for Computational Linguistics, p.121-124.

  • Poon, H., Domingos, P., 2008. Joint unsupervised coreference resolution with Markov logic. Proc. Conf. on Empirical Methods in Natural Language Processing, p.650-659.

  • Poon, H., Domingos, P., 2009. Unsupervised semantic parsing. Proc. Conf. on Empirical Methods in Natural Language Processing, p.1-10.

  • Riloff, E., 1996. Automatically generating extraction patterns from untagged text. Proc. AAAI, p.1044-1049.

  • Ritter, A., Mausam, Etzioni, O., et al., 2012. Open domain event extraction from Twitter. Proc. 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, p.1104-1112. [doi:10.1145/2339530.2339704]

  • Rosenfeld, B., Feldman, R., 2006. URES: an unsupervised web relation extraction system. Proc. COLING/ACL on Main Conference Poster Sessions, p.667-674.

  • Schilder, F., 007. Event extraction and temporal reasoning in legal documents. In: Schilder, F., Katz, G., Pustejovsky, J. (Eds.), Annotating, Extracting and Reasoning about Time and Events, p.55-71. [doi:10.1007/978-3-540-75989-8_5]

  • Shinyama, Y., Sekine, S., 2006. Preemptive information extraction using unrestricted relation discovery. Proc. Conf. of the North American Chapter of the Association of Computational Linguistics on Human Language Technology, p.304-311. [doi:10.3115/1220835.1220874]

  • Soderland, S., 1999. Learning information extraction rules for semi-structured and free text. Mach. Learn., 34(1-3):233–272. [doi:10.1023/A:1007562322031]

    Article  Google Scholar 

  • Stevenson, M., Greenwood, M.A., 2005. A semantic approach to IE pattern induction. Proc. 43rd Annual Meeting on Association for Computational Linguistics, p.379-386. [doi:10.3115/1219840.1219887]

  • Sudo, K., Sekine, S., Grishman, R., 2003. An improved extraction pattern representation model for automatic IE pattern acquisition. Proc. 41st Annual Meeting on Association for Computational Linguistics, p.224-231. [doi:10.3115/1075096.1075125]

  • Wagner, W., Schmid, H., im Walde, S.S., 2009. Verb sense disambiguation using a predicate-argument-clustering model. Proc. CogSci Workshop on Distributional Semantics Beyond Concrete Concepts, p.23-28.

  • Wu, F., Weld, D.S., 2010. Open information extraction using Wikipedia. Proc. 48th Annual Meeting of the Association for Computational Linguistics, p.118-127.

  • Yangarber, R., Grishman, R., Tapanainen, P., et al., 2000. Automatic acquisition of domain knowledge for information extraction. Proc. 18th Conf. on Computational Linguistics, p.940-946. [doi:10.3115/992730.992782]

  • Yates, A., Etzioni, O., 2009. Unsupervised methods for determining object and relation synonyms on the web. J. Artif. Intell. Res., 34(1):255–296.

    Google Scholar 

  • Yeh, A., Hirschman, L., Morgan, A., 2002. Background and overview for KDD Cup 2002 task 1: information extraction from biomedical articles. ACM SIGKDD Explor. Newslett., 4(2):87–89. [doi:10.1145/772862.772873]

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ting Liu.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 61133012 and 61472107) and the National Basic Research Program (973) of China (No. 2014CB340503)

A preliminary version was presented at the 6th International Joint Conference on Natural Language Processing, Oct. 14-18, 2013, Japan

ORCID: Xiao DING, http://orcid.org/0000-0002-5838-0320

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ding, X., Qin, B. & Liu, T. BUEES: a bottom-up event extraction system. Frontiers Inf Technol Electronic Eng 16, 541–552 (2015). https://doi.org/10.1631/FITEE.1400405

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1631/FITEE.1400405

Keywords

CLC number

Navigation