Predicting Relevance of Event Extraction for the End User

Multi-source, Multilingual Information Extraction and Summarization

Abstract

We present work on estimating the relevance of the results of an Event Extraction system to the end user’s needs. Our aim is to develop user-oriented measures of the utility of extracted events, i.e., how useful the factual information found in a document is to the end user. We introduce discourse and lexical features, and build classifiers that learn from users’ ratings of the relevance of the extraction results. Traditional criteria for evaluating the performance of Information Extraction (IE) focus on the correctness of the extracted information, e.g., in terms of recall, precision, and F-measure. We focus instead on a subjective criterion for evaluating the quality of the extracted information: the utility of the results to the end user. To measure utility, we use methods from text mining and linguistic analysis to identify features that are good predictors of the relevance of an event or a document. We report on experiments in two real-world event extraction domains: corporate activities reported in business news, and health threats reported in news about infectious epidemics.

Notes

  1. The Pattern Understanding and Learning System: http://puls.cs.helsinki.fi

  2. The so-called “inverted pyramid” principle [3].

  3. The PULS system normalizes and unifies variants of disease names and organization names, e.g., Swine Flu with H1N1, or a company’s full name with its acronym.

  4. Note that, on the other hand, it makes less sense to train the classifier on raw data, since raw data is inherently noisier and degrades classifier performance.

References

  1. ACE: Automatic content extraction. http://www.nist.gov/speech/tests/ace/ (2004)

  2. Bagga, A., Biermann, A.W.: Analyzing the complexity of a domain with respect to an information extraction task. In: Proceedings of the 10th International Conference on Research on Computational Linguistics (ROCLING X), Taipei (1997)

  3. Bell, A.: The Language of News Media. Language in Society/Blackwell, Oxford (1991)

  4. Bouckaert, R.: Bayesian network classifiers in Weka. Technical Report (2004)

  5. Culotta, A., McCallum, A.: Confidence estimation for information extraction. In: Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics, Boston (2004)

  6. Cvitas, A.: Information extraction in business intelligence systems. In: MIPRO, 2010 Proceedings of the 33rd International Convention, Opatija, May 2010, pp. 1278–1282

  7. Freifeld, C., Mandl, K., Reis, B., Brownstein, J.: HealthMap: global infectious disease monitoring through automated classification and visualization of internet media reports. J. Am. Med. Inf. Assoc. 15(1), 150–157 (2008)

  8. Grishman, R., Huttunen, S., Yangarber, R.: Event extraction for infectious disease outbreaks. In: Proceedings of the 2nd Human Language Technology Conference (HLT 2002), San Diego, March 2002

  9. Grishman, R., Huttunen, S., Yangarber, R.: Information extraction for enhanced access to disease outbreak reports. J. Biomed. Inf. 35(4), 236–246 (2003)

  10. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). http://dx.doi.org/10.1145/1656274.1656278

  11. Hirschman, L.: Language understanding evaluations: lessons learned from MUC and ATIS. In: Proceedings of the First International Conference on Language Resources and Evaluation (LREC), Granada, May 1998, pp. 117–122

  12. Huttunen, S., Yangarber, R., Grishman, R.: Complexity of event structure in information extraction. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, August 2002

  13. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, pp. 338–345. Morgan Kaufmann, San Mateo (1995)

  14. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT, Cambridge (1999)

  15. Saggion, H., Funk, A., Maynard, D., Bontcheva, K.: Ontology-based information extraction for business intelligence. In: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference. ISWC’07/ASWC’07, Busan, pp. 843–856. Springer, Berlin/Heidelberg (2007). http://portal.acm.org/citation.cfm?id=1785162.1785225

  16. Steinberger, R., Fuart, F., van der Goot, E., Best, C., von Etter, P., Yangarber, R.: Text mining from the web for medical intelligence. In: Perrotta, D., Piskorski, J., Soulié-Fogelman, F., Steinberger, R. (eds.) Mining Massive Data Sets for Security. IOS, Amsterdam (2008)

  17. von Etter, P., Huttunen, S., Vihavainen, A., Vuorinen, M., Yangarber, R.: Assessment of utility in Web mining for the domain of public health. In: Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents. Association for Computational Linguistics, Los Angeles, June 2010, pp. 29–37. http://www.aclweb.org/anthology/W10-1105

  18. Yangarber, R., Best, C., von Etter, P., Fuart, F., Horby, D., Steinberger, R.: Combining information about epidemic threats from multiple sources. In: Proceedings of the MMIES Workshop, International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, September 2007

  19. Yangarber, R., Steinberger, R.: Automatic epidemiological surveillance from on-line news in MedISys and PULS. In: Proceedings of IMED-2009: International Meeting on Emerging Diseases and Surveillance, Vienna (2009)

Corresponding author

Correspondence to Roman Yangarber.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Huttunen, S., Vihavainen, A., Du, M., Yangarber, R. (2013). Predicting Relevance of Event Extraction for the End User. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_8

  • DOI: https://doi.org/10.1007/978-3-642-28569-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28568-4

  • Online ISBN: 978-3-642-28569-1

  • eBook Packages: Computer Science (R0)
