DOI: 10.1145/3539618.3592067

ExaRanker: Synthetic Explanations Improve Neural Rankers

Published: 18 July 2023

Abstract

Recent work has shown that incorporating explanations into the output generated by large language models (LLMs) can significantly enhance performance on a broad spectrum of reasoning tasks. Our study extends these findings by demonstrating the benefits of explanations for neural rankers. By utilizing LLMs such as GPT-3.5 to enrich retrieval datasets with explanations, we trained a sequence-to-sequence ranking model, dubbed ExaRanker, to generate relevance labels and explanations for query-document pairs. The ExaRanker model, fine-tuned on a limited number of examples and synthetic explanations, exhibits performance comparable to models fine-tuned on three times more examples without explanations. Moreover, incorporating explanations imposes no additional computational overhead on the reranking step and allows for on-demand explanation generation. The codebase and datasets used in this study will be available at https://github.com/unicamp-dl/ExaRanker
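To make the method concrete, here is a minimal sketch of how a sequence-to-sequence reranker of this kind can score query-document pairs: the model is fine-tuned to emit a relevance label first and an explanation afterwards, so ranking only needs the logits of the first decoded token, and the explanation can be generated on demand. The sketch assumes a T5-style model loaded with Hugging Face Transformers; the t5-base checkpoint, the prompt wording, and the "true"/"false" label words are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch of an ExaRanker-style seq-to-seq reranker (illustrative, not the
# authors' exact setup). Assumptions: a T5-style checkpoint ("t5-base" here as a
# stand-in for a fine-tuned model), a simple prompt template, and "true"/"false"
# as the relevance label words.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def prompt(query: str, document: str) -> str:
    # Hypothetical template: the model is assumed to answer with a label
    # ("true"/"false") followed by a free-text explanation.
    return f"Is the document relevant to the query? Query: {query} Document: {document}"

def rerank_score(query: str, document: str) -> float:
    """Relevance score from the first decoded token only (no explanation generated)."""
    inputs = tokenizer(prompt(query, document), return_tensors="pt",
                       truncation=True, max_length=512)
    # One decoder step starting from T5's decoder start token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    # Softmax over the two label tokens yields a score in [0, 1].
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()

def explain(query: str, document: str) -> str:
    """Generate label plus explanation on demand; not needed during reranking."""
    inputs = tokenizer(prompt(query, document), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    q = "what causes tides"
    d = "Tides are caused by the gravitational pull of the moon and the sun."
    print(f"score: {rerank_score(q, d):.3f}")
    print(f"explanation: {explain(q, d)}")
```

In a multi-stage pipeline, a scoring function like rerank_score would be applied only to the top candidates returned by a first-stage retriever (e.g., BM25), which is why the label-only scoring path adds no overhead relative to a standard reranker.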

Supplemental Material

MP4 File
Presentation video of ExaRanker - Explanation-Augmented Neural Ranker: a novel method that uses LLMs to generate an explanation-augmented dataset and trains a neural ranker with a sequence-to-sequence strategy, bringing the benefits of LLMs in text processing to the IR field.

Cited By

  • (2024) Leveraging Large Language Models for Improving Keyphrase Generation for Contextual Targeting. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 4349-4357. DOI: 10.1145/3627673.3680093. Online publication date: 21-Oct-2024.
  • (2024) BIDTrainer: An LLMs-driven Education Tool for Enhancing the Understanding and Reasoning in Bio-inspired Design. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1-20. DOI: 10.1145/3613904.3642887. Online publication date: 11-May-2024.

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. explanations
  2. few-shot models
  3. generative models
  4. large language models
  5. multi-stage ranking
  6. synthetic datasets

Qualifiers

  • Short-paper

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 144
  • Downloads (last 6 weeks): 15
Reflects downloads up to 05 Mar 2025
