Abstract
We present work on estimating the relevance of the results of an Event Extraction system to the end user’s needs. Our aim is to develop user-oriented measures of the utility of the extracted events, i.e., how useful the factual information found in a document is to the end user. We introduce discourse and lexical features and build classifiers that learn from users’ ratings of the relevance of the extraction results. Traditional criteria for evaluating the performance of Information Extraction (IE) focus on the correctness of the extracted information, e.g., in terms of recall, precision, and F-measure. We focus instead on a subjective criterion for evaluating the quality of the extracted information: the utility of the results to the end user. To measure utility, we use methods from text mining and linguistic analysis to identify features that are good predictors of the relevance of an event or a document. We report on experiments in two real-world event extraction domains: corporate activities reported in business news, and health threats in news about infectious epidemics.
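The abstract describes classifiers that learn from users’ relevance ratings over discourse and lexical features, but it does not name the learner or list the features. Purely as an illustration, the sketch below trains a Gaussian naive Bayes classifier on two hypothetical features. The first is the relative position of an event’s first mention in the document, motivated by the “inverted pyramid” principle mentioned in note 2. The second is the number of mentions of the event’s key entity. The feature set, the toy data, and the choice of learner are all assumptions and do not reproduce the chapter’s actual setup.

```python
import math
from collections import defaultdict

# Hypothetical feature vector per extracted event (an assumption, NOT the chapter's feature set):
#   [relative position of the event's first mention (0.0 = document start, 1.0 = end),
#    number of mentions of the event's key entity in the document]

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor avoids division by zero on constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, features):
    """Return the label with the highest log-posterior (up to a shared constant)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(features, means, variances))
    return max(model, key=score)

# Toy training data, invented for illustration: 1 = user rated the event relevant.
X = [[0.05, 4], [0.10, 3], [0.00, 5], [0.70, 1], [0.85, 1], [0.60, 2]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)

print(predict(model, [0.08, 4]))  # early, frequently mentioned event -> 1
print(predict(model, [0.80, 1]))  # late, single-mention event -> 0
```

On this toy data, an event mentioned early and repeatedly is classified as relevant, while a late, single-mention event is not. Naive Bayes is used here only because it fits in a few self-contained lines; a discriminative learner such as an SVM could be substituted without changing the feature-extraction side.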
Notes
1. The Pattern Understanding and Learning System: http://puls.cs.helsinki.fi
2. The so-called “inverted pyramid” principle [3].
3. The PULS system normalizes and unifies variants of disease names and organization names, e.g., Swine Flu with H1N1, and company full names with their acronyms.
4. Note, on the other hand, that it makes less sense to train the classifier on raw data, since raw data are inherently noisier, which degrades classifier performance.
References
[1] ACE: Automatic content extraction. http://www.nist.gov/speech/tests/ace/ (2004)
[2] Bagga, A., Biermann, A.W.: Analyzing the complexity of a domain with respect to an information extraction task. In: Proceedings of the 10th International Conference on Research on Computational Linguistics (ROCLING X), Taipei (1997)
[3] Bell, A.: The Language of News Media. Language in Society/Blackwell, Oxford (1991)
[4] Bouckaert, R.: Bayesian network classifiers in Weka. Technical Report (2004)
[5] Culotta, A., McCallum, A.: Confidence estimation for information extraction. In: Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics, Boston (2004)
[6] Cvitas, A.: Information extraction in business intelligence systems. In: MIPRO, 2010 Proceedings of the 33rd International Convention, Opatija, May 2010, pp. 1278–1282
[7] Freifeld, C., Mandl, K., Reis, B., Brownstein, J.: HealthMap: global infectious disease monitoring through automated classification and visualization of internet media reports. J. Am. Med. Inf. Assoc. 15(1), 150–157 (2008)
[8] Grishman, R., Huttunen, S., Yangarber, R.: Event extraction for infectious disease outbreaks. In: Proceedings of the 2nd Human Language Technology Conference (HLT 2002), San Diego, March 2002
[9] Grishman, R., Huttunen, S., Yangarber, R.: Information extraction for enhanced access to disease outbreak reports. J. Biomed. Inf. 35(4), 236–246 (2003)
[10] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). http://dx.doi.org/10.1145/1656274.1656278
[11] Hirschman, L.: Language understanding evaluations: lessons learned from MUC and ATIS. In: Proceedings of the First International Conference on Language Resources and Evaluation (LREC), Granada, May 1998, pp. 117–122
[12] Huttunen, S., Yangarber, R., Grishman, R.: Complexity of event structure in information extraction. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, August 2002
[13] John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, pp. 338–345. Morgan Kaufmann, San Mateo (1995)
[14] Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT, Cambridge (1999)
[15] Saggion, H., Funk, A., Maynard, D., Bontcheva, K.: Ontology-based information extraction for business intelligence. In: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. ISWC’07/ASWC’07, Busan, pp. 843–856. Springer, Berlin/Heidelberg (2007). http://portal.acm.org/citation.cfm?id=1785162.1785225
[16] Steinberger, R., Fuart, F., van der Goot, E., Best, C., von Etter, P., Yangarber, R.: Text mining from the web for medical intelligence. In: Perrotta, D., Piskorski, J., Soulié-Fogelman, F., Steinberger, R. (eds.) Mining Massive Data Sets for Security. OIS, Amsterdam (2008)
[17] von Etter, P., Huttunen, S., Vihavainen, A., Vuorinen, M., Yangarber, R.: Assessment of utility in Web mining for the domain of public health. In: Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents. Association for Computational Linguistics, Los Angeles, June 2010, pp. 29–37. http://www.aclweb.org/anthology/W10-1105
[18] Yangarber, R., Best, C., von Etter, P., Fuart, F., Horby, D., Steinberger, R.: Combining information about epidemic threats from multiple sources. In: Proceedings of the MMIES Workshop, International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, September 2007
[19] Yangarber, R., Steinberger, R.: Automatic epidemiological surveillance from on-line news in MedISys and PULS. In: Proceedings of IMED-2009: International Meeting on Emerging Diseases and Surveillance, Vienna (2009)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Huttunen, S., Vihavainen, A., Du, M., Yangarber, R. (2013). Predicting Relevance of Event Extraction for the End User. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_8
DOI: https://doi.org/10.1007/978-3-642-28569-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science, Computer Science (R0)