ranxhub: An Online Repository for Information Retrieval Runs

ABSTRACT
ranxhub is an online repository for sharing artifacts derived from the evaluation of Information Retrieval systems. Specifically, it provides a platform for sharing pre-computed runs: the ranked lists of documents retrieved for a specific set of queries by a retrieval model. We also extend ranx, a Python library for the evaluation and comparison of Information Retrieval runs, adding functionalities that integrate ranxhub seamlessly and allow users to compare the results of multiple systems in just a few lines of code. In this paper, we first outline the many advantages and implications that an online repository for sharing runs can bring to the table. Then, we introduce ranxhub and its integration with ranx, showing how simple it is to use. Finally, we discuss some use cases for which ranxhub can be highly valuable for the research community.
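For context on the central concept, a run maps each query to a ranked list of scored documents, and evaluation compares it against relevance judgments (qrels). The following is a minimal plain-Python sketch of that comparison, computing Mean Reciprocal Rank over a toy run; it is illustrative only and does not use the ranx API itself.

```python
# A "run" maps each query ID to {document ID: retrieval score};
# "qrels" (query relevance judgments) mark which documents are relevant.
# This sketch computes Mean Reciprocal Rank (MRR) over a toy run.

def mean_reciprocal_rank(run, qrels):
    """Average of 1/rank of the first relevant document per query."""
    total = 0.0
    for query_id, scored_docs in run.items():
        relevant = qrels.get(query_id, set())
        # Sort by descending score to obtain the ranked list.
        ranking = sorted(scored_docs, key=lambda d: -scored_docs[d])
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(run)

qrels = {"q1": {"d2"}, "q2": {"d5"}}
run = {
    "q1": {"d1": 0.9, "d2": 0.8, "d3": 0.1},  # first relevant doc at rank 2
    "q2": {"d5": 0.7, "d6": 0.3},             # first relevant doc at rank 1
}

print(mean_reciprocal_rank(run, qrels))  # -> 0.75
```

Sharing the run itself, rather than only the final metric, is what lets others re-score it with new metrics or fuse it with other systems without re-running the retrieval model.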