BertPE: A BERT-Based Pre-retrieval Estimator for Query Performance Prediction

Khodabakhsh, Maryam; Zarrinkalam, Fattane; Arabzadeh, Negar

doi:10.1007/978-3-031-56063-7_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14610)

Included in the following conference series: European Conference on Information Retrieval

Abstract

Query Performance Prediction (QPP) aims to estimate how effectively a query addresses its underlying information need without any relevance judgments. Recent work in this area has used pre-trained neural embedding representations of the query to go beyond the corpus statistics of query terms and capture the semantics of the query. In this paper, we propose a supervised QPP method that uses contextualized neural embeddings to learn query performance directly through fine-tuning. Retrieval models may be evaluated with either sparse or more comprehensive relevance labels, and this disparity poses challenges for prediction. To address them, we introduce a strategy for creating synthetic relevance judgments that enables effective performance prediction regardless of which kind of labels a query is evaluated with. In experiments on four query sets paired with the MS MARCO V1 collection, our approach significantly outperforms state-of-the-art pre-retrieval QPP methods.
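
As a rough illustration of the supervised, pre-retrieval setting described in the abstract, the sketch below regresses a performance score from the query text alone. It is not the authors' model: the hashed bag-of-words `embed` function is a hypothetical stand-in for a contextualized encoder such as BERT, the linear head with a sigmoid output is an assumed simplification of fine-tuning, and the queries and target scores in `TOY_DATA` are invented purely for illustration.

```python
import math
import zlib

DIM = 32  # size of the toy query representation (assumption)


def embed(query):
    """Hypothetical stand-in for a contextualized encoder:
    an L2-normalized hashed bag of words built from the query text only."""
    vec = [0.0] * DIM
    for token in query.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict(weights, bias, query):
    """Predict a performance score in (0, 1) without running retrieval."""
    z = bias + sum(w * x for w, x in zip(weights, embed(query)))
    return 1.0 / (1.0 + math.exp(-z))


def mse(weights, bias, data):
    """Mean squared error between predicted and target scores."""
    return sum((predict(weights, bias, q) - y) ** 2 for q, y in data) / len(data)


def train(data, epochs=200, lr=0.5):
    """Fit the regression head with plain SGD on the squared error."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for query, target in data:
            x = embed(query)
            p = predict(weights, bias, query)
            # d/dz of (p - target)^2, where p = sigmoid(z)
            grad = 2.0 * (p - target) * p * (1.0 - p)
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Invented (query, target score) pairs; in a real setup the targets would be
# per-query effectiveness values computed from relevance judgments.
TOY_DATA = [
    ("what is the capital of france", 0.90),
    ("define photosynthesis", 0.80),
    ("things", 0.10),
    ("how do stuff work", 0.20),
]

if __name__ == "__main__":
    w, b = train(TOY_DATA)
    for q, y in TOY_DATA:
        print(f"{q!r}: target={y:.2f} predicted={predict(w, b, q):.2f}")
```

The predictor never consults a ranked list, which is what makes it pre-retrieval. In a fine-tuning setup the encoder's own weights would be updated together with the regression head, rather than only the head as in this sketch.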

Author information

Authors and Affiliations

Maryam Khodabakhsh (Shahrood University of Technology, Shahrood, Iran)
Fattane Zarrinkalam (University of Guelph, Guelph, ON, Canada)
Negar Arabzadeh (University of Waterloo, Waterloo, ON, Canada)

Corresponding author

Correspondence to Maryam Khodabakhsh.

Editor information

Editors and Affiliations

Nazli Goharian (Georgetown University, Washington, DC, USA)
Nicola Tonellotto (University of Pisa, Pisa, Italy)
Yulan He (King's College London, London, UK)
Aldo Lipani (University College London, London, UK)
Graham McDonald (University of Glasgow, Glasgow, UK)
Craig Macdonald (University of Glasgow, Glasgow, UK)
Iadh Ounis (University of Glasgow, Glasgow, UK)

About this paper

Cite this paper

Khodabakhsh, M., Zarrinkalam, F., Arabzadeh, N. (2024). BertPE: A BERT-Based Pre-retrieval Estimator for Query Performance Prediction. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_27

DOI: https://doi.org/10.1007/978-3-031-56063-7_27
Published: 23 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56062-0
Online ISBN: 978-3-031-56063-7
eBook Packages: Computer Science, Computer Science (R0)
