Reproducibility and Validity in CLEF

Chapter in: Information Retrieval Evaluation in a Changing World

Part of the book series: The Information Retrieval Series (volume 41)

Abstract

In this chapter, we investigate CLEF's contribution to the reproducibility of IR experiments. After discussing the concepts of reproducibility and validity, we show that CLEF has not only produced test collections that can be reused by other researchers, but has also undertaken various efforts to enable reproducibility.
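
To make the notion of a reusable test collection concrete, the following minimal Python sketch (an illustration added here, not taken from the chapter) shows how fixed relevance judgments (qrels) and an archived system run allow any researcher to recompute identical effectiveness scores with the pytrec_eval library; the topic and document identifiers are hypothetical placeholders.

    # A minimal sketch (not from the chapter): re-running an evaluation
    # on a shared test collection with pytrec_eval. Identifiers are
    # hypothetical; in practice they come from the collection's files.
    import pytrec_eval

    # Relevance judgments (qrels): topic -> document -> graded relevance.
    qrels = {
        'topic1': {'doc1': 1, 'doc2': 0, 'doc3': 2},
        'topic2': {'doc1': 0, 'doc4': 1},
    }

    # An archived system run: topic -> document -> retrieval score.
    run = {
        'topic1': {'doc1': 1.5, 'doc2': 0.9, 'doc3': 0.3},
        'topic2': {'doc1': 0.8, 'doc4': 1.1},
    }

    # Because qrels and runs are fixed artifacts, anyone can recompute
    # exactly the same effectiveness scores from them.
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, {'map', 'ndcg'})
    for topic, measures in evaluator.evaluate(run).items():
        print(topic, measures)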

Author information

Correspondence to Norbert Fuhr.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Fuhr, N. (2019). Reproducibility and Validity in CLEF. In: Ferro, N., Peters, C. (eds) Information Retrieval Evaluation in a Changing World. The Information Retrieval Series, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-22948-1_23

  • DOI: https://doi.org/10.1007/978-3-030-22948-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22947-4

  • Online ISBN: 978-3-030-22948-1

  • eBook Packages: Computer Science (R0)
