ABSTRACT
This paper introduces TrecTools, a Python library for assisting Information Retrieval (IR) practitioners with TREC-like campaigns. IR practitioners tasked with activities such as building test collections, evaluating systems, or analysing results from empirical experiments commonly have to resort to using a number of different software tools and scripts, each of which performs an individual functionality; at times they even have to implement ad hoc scripts of their own. TrecTools aims to provide a unified environment for performing these common activities.
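As an illustration of these activities within a single environment, the following minimal sketch loads a system run and the corresponding relevance judgements and computes standard evaluation measures. File names are placeholders, and while the TrecRun, TrecQrel and TrecEval classes shown here follow the library's public interface, exact signatures should be checked against the released version:

    from trectools import TrecQrel, TrecRun, TrecEval

    # Load a TREC-formatted run file and its relevance judgements (qrels)
    run = TrecRun("myrun.txt")      # placeholder path to a run file
    qrels = TrecQrel("qrels.txt")   # placeholder path to a qrels file

    # Evaluate the run against the judgements with standard measures
    evaluation = TrecEval(run, qrels)
    print(evaluation.get_map())                # Mean Average Precision
    print(evaluation.get_ndcg(depth=10))       # NDCG at depth 10
    print(evaluation.get_precision(depth=10))  # Precision at depth 10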
Written in Python, the most popular programming language for data science, TrecTools offers an object-oriented, easily extensible library. Existing systems, e.g., trec_eval, present a considerable barrier to entry when it comes to modifying or extending them. Furthermore, many existing IR measures and tools are implemented independently of one another, in different programming languages. TrecTools seeks to lower this barrier to entry and to unify existing tools, frameworks and activities under one common umbrella. Widespread adoption of a centralised solution for developing, evaluating, and analysing TREC-like campaigns will ease the burden on organisers and provide participants and users with a standard environment for common IR experimental activities.
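For example, rank-fusion techniques that would otherwise require standalone scripts sit alongside evaluation in the same package. A minimal sketch, assuming the fusion module's reciprocal rank fusion helper behaves as in the library's documentation (run file names are placeholders):

    from trectools import TrecRun, fusion

    # Combine two runs with Reciprocal Rank Fusion (Cormack et al., 2009)
    runs = [TrecRun("run_a.txt"), TrecRun("run_b.txt")]  # placeholder paths
    fused_run = fusion.reciprocal_rank_fusion(runs)

    # Write the fused run in TREC format for subsequent evaluation
    fused_run.print_subset("fused_run.txt", topics=fused_run.topics())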
TrecTools is distributed as an open-source library under the MIT license at https://github.com/joaopalotti/trectools.
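The library can also be installed from the Python Package Index:

    pip install trectools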