DOI: 10.1145/3539618.3592067

ExaRanker: Synthetic Explanations Improve Neural Rankers

Published: 18 July 2023

Abstract

Recent work has shown that incorporating explanations into the output generated by large language models (LLMs) can significantly enhance performance on a broad spectrum of reasoning tasks. Our study extends these findings by demonstrating the benefits of explanations for neural rankers. By utilizing LLMs such as GPT-3.5 to enrich retrieval datasets with explanations, we trained a sequence-to-sequence ranking model, dubbed ExaRanker, to generate relevance labels and explanations for query-document pairs. The ExaRanker model, fine-tuned on a limited number of examples and synthetic explanations, exhibits performance comparable to models fine-tuned on three times more examples without explanations. Moreover, incorporating explanations imposes no additional computational overhead on the reranking step and allows for on-demand explanation generation. The codebase and datasets used in this study will be available at https://github.com/unicamp-dl/ExaRanker
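To make the method concrete, here is a minimal sketch of how a sequence-to-sequence reranker of this kind can score query-document pairs: the model is fine-tuned to emit a relevance label first and an explanation afterwards, so ranking only needs the logits of the first decoded token, and the explanation can be generated on demand. The sketch assumes a T5-style model loaded with Hugging Face Transformers; the t5-base checkpoint, the prompt wording, and the "true"/"false" label words are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch of an ExaRanker-style seq-to-seq reranker (illustrative, not the
# authors' exact setup). Assumptions: a T5-style checkpoint ("t5-base" here as a
# stand-in for a fine-tuned model), a simple prompt template, and "true"/"false"
# as the relevance label words.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def prompt(query: str, document: str) -> str:
    # Hypothetical template: the model is assumed to answer with a label
    # ("true"/"false") followed by a free-text explanation.
    return f"Is the document relevant to the query? Query: {query} Document: {document}"

def rerank_score(query: str, document: str) -> float:
    """Relevance score from the first decoded token only (no explanation generated)."""
    inputs = tokenizer(prompt(query, document), return_tensors="pt",
                       truncation=True, max_length=512)
    # One decoder step starting from T5's decoder start token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    # Softmax over the two label tokens yields a score in [0, 1].
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()

def explain(query: str, document: str) -> str:
    """Generate label plus explanation on demand; not needed during reranking."""
    inputs = tokenizer(prompt(query, document), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    q = "what causes tides"
    d = "Tides are caused by the gravitational pull of the moon and the sun."
    print(f"score: {rerank_score(q, d):.3f}")
    print(f"explanation: {explain(q, d)}")
```

In a multi-stage pipeline, a scoring function like rerank_score would be applied only to the top candidates returned by a first-stage retriever (e.g., BM25), which is why the label-only scoring path adds no overhead relative to a standard reranker.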

Supplemental Material

MP4 File
Presentation video of ExaRanker - Explanation-Augmented Neural Ranker: a novel method that uses LLMs to generate an explanation-augmented dataset and trains a neural ranker with a sequence-to-sequence strategy, bringing the benefits of LLMs in text processing to the IR field.

Cited By

  • (2024) Leveraging Large Language Models for Improving Keyphrase Generation for Contextual Targeting. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 4349-4357. DOI: 10.1145/3627673.3680093. Online publication date: 21-Oct-2024.
  • (2024) BIDTrainer: An LLMs-driven Education Tool for Enhancing the Understanding and Reasoning in Bio-inspired Design. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1-20. DOI: 10.1145/3613904.3642887. Online publication date: 11-May-2024.

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. explanations
  2. few-shot models
  3. generative models
  4. large language models
  5. multi-stage ranking
  6. synthetic datasets

Qualifiers

  • Short-paper

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 144
  • Downloads (last 6 weeks): 15
Reflects downloads up to 05 Mar 2025
