DOI: 10.1145/3539618.3592046 · SIGIR Conference Proceedings · Short paper

Query-specific Variable Depth Pooling via Query Performance Prediction

Published: 18 July 2023

Abstract

Due to the massive size of modern document collections, a standard practice in IR evaluation is to construct a 'pool' of candidate relevant documents comprising the top-k documents retrieved by a wide range of different retrieval systems, a process called depth-k pooling. Standard practice sets the depth k to a constant value for every query in the benchmark set. In this paper, however, we argue that the annotation effort can be substantially reduced if the depth of the pool is made a variable quantity for each query, the rationale being that the number of documents relevant to an information need varies widely across queries. Our hypothesis is that a lower depth for queries with few relevant documents, and a higher depth for those with many, can reduce the annotation effort without significantly changing the outcome of IR effectiveness evaluation. We use standard query performance prediction (QPP) techniques to estimate the number of potentially relevant documents for each query, and this estimate then determines the depth of the pool. Experiments on standard test collections demonstrate that the proposed method of employing query-specific variable depths adequately reflects the relative effectiveness of IR systems with a substantially smaller annotation effort.
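The pipeline described in the abstract, estimating a per-query QPP score and using it to set the pool depth before taking the union of each system's top-k results, can be sketched as below. The linear mapping from QPP score to depth and the bounds `d_min`/`d_max` are illustrative assumptions; the paper's exact QPP estimator and calibration are not specified in this abstract.

```python
# Sketch of query-specific variable depth pooling. The linear score-to-depth
# mapping and the d_min/d_max bounds are illustrative assumptions, not the
# paper's exact calibration.

def variable_depths(qpp_scores, d_min=10, d_max=100):
    """Map each query's QPP score to a pool depth in [d_min, d_max]."""
    lo, hi = min(qpp_scores.values()), max(qpp_scores.values())
    span = (hi - lo) or 1.0  # guard against all scores being identical
    return {
        q: round(d_min + (s - lo) / span * (d_max - d_min))
        for q, s in qpp_scores.items()
    }

def build_pool(ranked_lists, depths):
    """Per-query pool: union of each system's top-k documents, k varying by query."""
    pool = {}
    for q, k in depths.items():
        pool[q] = set()
        for system_run in ranked_lists[q]:  # one ranked list of doc ids per system
            pool[q].update(system_run[:k])
    return pool
```

A query whose QPP score suggests few relevant documents thus receives a shallow pool, while a query predicted to have many receives a deep one; only the pooled documents are sent for relevance annotation.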





            Published In

            SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
            July 2023
            3567 pages
            ISBN:9781450394086
            DOI:10.1145/3539618
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Author Tags

            1. IR model evaluation
            2. depth pooling
            3. query performance prediction



            Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
