Predicting Relevance of Event Extraction for the End User

Huttunen, Silja; Vihavainen, Arto; Du, Mian; Yangarber, Roman

doi:10.1007/978-3-642-28569-1_8

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

We present work on estimating the relevance of the results of an Event Extraction system to the end user’s needs. Our aim is to develop user-oriented measures of the utility of the extracted events, i.e., how useful the factual information found in a document is to the end user. We introduce discourse and lexical features, and build classifiers that learn from users’ ratings of the relevance of the extraction results. Traditional criteria for evaluating the performance of Information Extraction (IE) focus on the correctness of the extracted information, e.g., in terms of recall, precision, and F-measure. We focus instead on a subjective criterion for evaluating the quality of the extracted information: the utility of the results to the end user. To measure utility, we use methods from text mining and linguistic analysis to identify features that are good predictors of the relevance of an event or a document. We report on experiments in two real-world event extraction domains: corporate activities reported in business news, and health threats in news about infectious epidemics.
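
The idea of learning relevance from user ratings can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not name the classifier or the features, so the Bernoulli Naive Bayes model, the feature names (e.g., `event_in_headline`), and the toy ratings below are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

def train_naive_bayes(examples):
    """Train a Bernoulli Naive Bayes model.

    examples: list of (set_of_binary_feature_names, label) pairs.
    """
    label_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocabulary.add(name)
    return {"labels": dict(label_counts),
            "features": feature_counts,
            "vocab": vocabulary}

def predict(model, features):
    """Return the label with the highest log-posterior for a feature set."""
    total = sum(model["labels"].values())
    best_label, best_score = None, -math.inf
    for label, n in model["labels"].items():
        score = math.log(n / total)  # log prior
        for name in model["vocab"]:
            # Laplace-smoothed P(feature present | label)
            p = (model["features"][label][name] + 1) / (n + 2)
            score += math.log(p if name in features else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical user ratings: each extracted event is described by
# invented discourse/lexical features and a relevance label.
ratings = [
    ({"event_in_headline", "has_case_count"}, "relevant"),
    ({"event_in_headline", "known_disease"}, "relevant"),
    ({"has_case_count", "known_disease"}, "relevant"),
    ({"historical_mention"}, "irrelevant"),
    ({"historical_mention", "known_disease"}, "irrelevant"),
    ({"hypothetical_wording"}, "irrelevant"),
]

model = train_naive_bayes(ratings)
print(predict(model, {"event_in_headline", "has_case_count"}))
print(predict(model, {"historical_mention"}))
```

In this sketch, relevance is a property predicted from features of the event and its context, independent of whether the extraction itself was correct, which is the distinction the abstract draws against recall, precision, and F-measure.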

Notes

1. The Pattern Understanding and Learning System (PULS): http://puls.cs.helsinki.fi
2. The so-called “inverted pyramid” principle [3].
3. The PULS system normalizes and unifies variants of disease names and organization names, e.g., Swine Flu with H1N1, and full company names with their acronyms.
4. Note that, on the other hand, it makes less sense to train the classifier on raw data, since raw data is inherently noisier and degrades classifier performance.

References

1. ACE: Automatic content extraction. http://www.nist.gov/speech/tests/ace/ (2004)
2. Bagga, A., Biermann, A.W.: Analyzing the complexity of a domain with respect to an information extraction task. In: Proceedings of the 10th International Conference on Research on Computational Linguistics (ROCLING X), Taipei (1997)
3. Bell, A.: The Language of News Media. Language in Society/Blackwell, Oxford (1991)
4. Bouckaert, R.: Bayesian network classifiers in Weka. Technical Report (2004)
5. Culotta, A., McCallum, A.: Confidence estimation for information extraction. In: Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics, Boston (2004)
6. Cvitas, A.: Information extraction in business intelligence systems. In: MIPRO, 2010 Proceedings of the 33rd International Convention, Opatija, May 2010, pp. 1278–1282
7. Freifeld, C., Mandl, K., Reis, B., Brownstein, J.: HealthMap: global infectious disease monitoring through automated classification and visualization of internet media reports. J. Am. Med. Inf. Assoc. 15(1), 150–157 (2008)
8. Grishman, R., Huttunen, S., Yangarber, R.: Event extraction for infectious disease outbreaks. In: Proceedings of the 2nd Human Language Technology Conference (HLT 2002), San Diego, March 2002
9. Grishman, R., Huttunen, S., Yangarber, R.: Information extraction for enhanced access to disease outbreak reports. J. Biomed. Inf. 35(4), 236–246 (2003)
10. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). http://dx.doi.org/10.1145/1656274.1656278
11. Hirschman, L.: Language understanding evaluations: lessons learned from MUC and ATIS. In: Proceedings of the First International Conference on Language Resources and Evaluation (LREC), Granada, May 1998, pp. 117–122
12. Huttunen, S., Yangarber, R., Grishman, R.: Complexity of event structure in information extraction. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, August 2002
13. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, pp. 338–345. Morgan Kaufmann, San Mateo (1995)
14. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT, Cambridge (1999)
15. Saggion, H., Funk, A., Maynard, D., Bontcheva, K.: Ontology-based information extraction for business intelligence. In: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. ISWC’07/ASWC’07, Busan, pp. 843–856. Springer, Berlin/Heidelberg (2007). http://portal.acm.org/citation.cfm?id=1785162.1785225
16. Steinberger, R., Fuart, F., van der Goot, E., Best, C., von Etter, P., Yangarber, R.: Text mining from the web for medical intelligence. In: Perrotta, D., Piskorski, J., Soulié-Fogelman, F., Steinberger, R. (eds.) Mining Massive Data Sets for Security. IOS Press, Amsterdam (2008)
17. von Etter, P., Huttunen, S., Vihavainen, A., Vuorinen, M., Yangarber, R.: Assessment of utility in Web mining for the domain of public health. In: Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents. Association for Computational Linguistics, Los Angeles, June 2010, pp. 29–37. http://www.aclweb.org/anthology/W10-1105
18. Yangarber, R., Best, C., von Etter, P., Fuart, F., Horby, D., Steinberger, R.: Combining information about epidemic threats from multiple sources. In: Proceedings of the MMIES Workshop, International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, September 2007
19. Yangarber, R., Steinberger, R.: Automatic epidemiological surveillance from on-line news in MedISys and PULS. In: Proceedings of IMED-2009: International Meeting on Emerging Diseases and Surveillance, Vienna (2009)

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Helsinki, Finland
Silja Huttunen, Arto Vihavainen, Mian Du & Roman Yangarber

Corresponding author

Correspondence to Roman Yangarber.

Editor information

Editors and Affiliations

Universite Sorbonne Nouvelle, LATTICE-CNRS, Ecole Normale Superieure, rue d'Ulm 45, Paris, 75005, France
Thierry Poibeau
Information & Communication Technologies, Universitat Pompeu Fabra, C/ Tanger 122-140, Barcelona, 08018, Spain
Horacio Saggion
Institute for Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw, 01-248, Poland
Jakub Piskorski
Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2, Helsinki, 00014, Finland
Roman Yangarber

About this chapter

Cite this chapter

Huttunen, S., Vihavainen, A., Du, M., Yangarber, R. (2013). Predicting Relevance of Event Extraction for the End User. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_8

DOI: https://doi.org/10.1007/978-3-642-28569-1_8
Published: 12 July 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science, Computer Science (R0)
