Abstract
In this chapter we discuss the evaluation of Information Retrieval (IR) systems and, in particular, ImageCLEF, a large-scale evaluation campaign that has produced several publicly accessible resources for evaluating visual information retrieval systems and is the focus of this book. The chapter sets the scene for the book by describing the purpose of system- and user-centred evaluation, the role of test collections, the role of evaluation campaigns such as TREC and CLEF, our motivations for starting ImageCLEF, and the tracks run over its seven years (data, tasks and participants). The chapter also offers insight into the lessons learned and experience gained in organising ImageCLEF, together with a summary of the main highlights.
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Clough, P., Müller, H., Sanderson, M. (2010). Seven Years of Image Retrieval Evaluation. In: Müller, H., Clough, P., Deselaers, T., Caputo, B. (eds) ImageCLEF. The Information Retrieval Series, vol 32. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15181-1_1
Print ISBN: 978-3-642-15180-4
Online ISBN: 978-3-642-15181-1