Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6360)

Abstract

The problem of how to experimentally evaluate information extraction (IE) systems has hardly received a satisfactory solution in the literature. In this paper we propose a novel evaluation model for IE and argue that, among other things, it allows (i) a correct appreciation of the degree of overlap between predicted and true segments, and (ii) a fair evaluation of the ability of a system to correctly identify segment boundaries. We describe the properties of this model, and also present a re-evaluation of the results of the CoNLL’03 and CoNLL’02 Shared Tasks on Named Entity Extraction.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esuli, A., Sebastiani, F. (2010). Evaluating Information Extraction. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_12

  • DOI: https://doi.org/10.1007/978-3-642-15998-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15997-8

  • Online ISBN: 978-3-642-15998-5

  • eBook Packages: Computer Science (R0)