Abstract
The question of how to evaluate information extraction (IE) systems experimentally has so far received hardly any satisfactory answer in the literature. In this paper we propose a novel evaluation model for IE and argue that, among other things, it allows (i) a correct appreciation of the degree of overlap between predicted and true segments, and (ii) a fair evaluation of the ability of a system to correctly identify segment boundaries. We describe the properties of this model, and illustrate them by presenting a re-evaluation of the results of the CoNLL’03 and CoNLL’02 Shared Tasks on Named Entity Extraction.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esuli, A., Sebastiani, F. (2010). Evaluating Information Extraction. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_12
DOI: https://doi.org/10.1007/978-3-642-15998-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15997-8
Online ISBN: 978-3-642-15998-5
eBook Packages: Computer Science, Computer Science (R0)