DOI: 10.1145/3331184.3331399

TrecTools: an Open-source Python Library for Information Retrieval Practitioners Involved in TREC-like Campaigns

Published: 18 July 2019

ABSTRACT

This paper introduces TrecTools, a Python library for assisting Information Retrieval (IR) practitioners with TREC-like campaigns. Practitioners tasked with activities such as building test collections, evaluating systems, or analysing results from empirical experiments commonly have to resort to a number of different software tools and scripts, each of which performs a single function, and at times they even have to implement ad-hoc scripts of their own. TrecTools aims to provide a unified environment for performing these common activities.
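As an illustration of the evaluation activity described above, the following is a minimal sketch using TrecTools' TrecRun, TrecQrel and TrecEval classes. The file paths are placeholders, and the method names shown (get_map, get_precision) are assumptions based on the library's documented interface; they may differ between versions.

    # Minimal evaluation sketch. Assumptions: class and method names follow
    # the project's documented examples; "my_run.txt" and "my_qrels.txt" are
    # placeholder paths for a TREC-format run file and a qrels file.
    from trectools import TrecRun, TrecQrel, TrecEval

    run = TrecRun("my_run.txt")        # system output in standard TREC run format
    qrels = TrecQrel("my_qrels.txt")   # relevance judgements in qrels format

    evaluator = TrecEval(run, qrels)
    print(evaluator.get_map())                 # Mean Average Precision
    print(evaluator.get_precision(depth=10))   # Precision at cut-off 10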

Written in Python, the most popular programming language for Data Science, TrecTools offers an object-oriented, easily extensible library. Existing systems, e.g., trec_eval, present a considerable barrier to entry when it comes to modifying or extending them. Furthermore, many existing IR measures and tools are implemented independently of each other, in different programming languages. TrecTools seeks to lower this barrier to entry and to unify existing tools, frameworks and activities under one common umbrella. Widespread adoption of a centralised solution for developing, evaluating, and analysing TREC-like campaigns will ease the burden on organisers and provide participants and users with a standard environment for common IR experimental activities.

TrecTools is distributed as an open-source library under the MIT license at https://github.com/joaopalotti/trectools.
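To illustrate the "one common umbrella" goal, the sketch below combines two runs with the library's fusion module and evaluates the fused result. The reciprocal_rank_fusion function, and the assumption that its output behaves like any other run, follow the project's documented examples; the run and qrels paths are placeholders.

    # Run-fusion sketch. Assumptions: a fusion module exposing
    # reciprocal_rank_fusion, whose result can be evaluated like an ordinary
    # run; all file paths are placeholders.
    from trectools import TrecRun, TrecQrel, TrecEval, fusion

    run_a = TrecRun("system_a.txt")
    run_b = TrecRun("system_b.txt")

    # Combine the two runs with reciprocal rank fusion (Cormack et al., 2009).
    fused = fusion.reciprocal_rank_fusion([run_a, run_b])

    # Evaluate the fused run with the same classes used for single runs.
    qrels = TrecQrel("my_qrels.txt")
    print(TrecEval(fused, qrels).get_map())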

Published in

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019, 1512 pages
ISBN: 978-1-4503-6172-9
DOI: 10.1145/3331184

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR'19 paper acceptance rate: 84 of 426 submissions (20%). Overall SIGIR acceptance rate: 792 of 3,983 submissions (20%).
