Abstract
Query Performance Prediction (QPP) aims to estimate how effectively a query addresses the underlying information need, without relying on any relevance judgments. Recent work in this area has used pre-trained neural embedding representations of the query to go beyond the corpus statistics of query terms and capture the query's semantics. In this paper, we propose a supervised QPP method that fine-tunes contextualized neural embeddings to learn query performance directly. Retrieval models can be evaluated with either sparse or more comprehensive relevance labels, and the two can disagree. To address this, we introduce a strategy for creating synthetic relevance judgments that enables effective performance prediction for queries regardless of whether they are evaluated with sparse or more comprehensive labels. Experiments on four query sets over the MS MARCO V1 collection show that our approach significantly outperforms state-of-the-art pre-retrieval QPP methods.
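The abstract does not state which effectiveness measure serves as the learning target or how predictors are compared. The sketch below is an illustration only. It assumes a common QPP setup on MS MARCO: a query's actual performance is its reciprocal rank at cutoff 10 (the per-query component of MRR@10), computed from sparse relevance labels, and a predictor is scored by Kendall's tau-b between predicted and actual per-query values. The function names and the toy data are hypothetical. This is not the BertPE implementation.

```python
from itertools import combinations
from math import sqrt


def reciprocal_rank_at_k(ranking, relevant, k=10):
    """Reciprocal rank of the first relevant item in the top-k; 0.0 if none."""
    for position, doc_id in enumerate(ranking[:k], start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length score lists (O(n^2), tie-aware)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = tied_only_x = tied_only_y = 0
    for i, j in combinations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from every count
        elif dx == 0:
            tied_only_x += 1
        elif dy == 0:
            tied_only_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_only_x)
                 * (concordant + discordant + tied_only_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one list is entirely tied")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical toy data: retrieved rankings and sparse qrels for three queries.
    runs = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"], "q3": ["d5", "d6", "d8"]}
    qrels = {"q1": {"d7"}, "q2": {"d2"}, "q3": {"d0"}}
    predicted = {"q1": 0.55, "q2": 0.90, "q3": 0.10}  # hypothetical QPP scores

    queries = sorted(runs)
    actual = [reciprocal_rank_at_k(runs[q], qrels[q]) for q in queries]
    preds = [predicted[q] for q in queries]
    print(actual)                        # [0.5, 1.0, 0.0]
    print(kendall_tau_b(preds, actual))  # 1.0
```

Tau-b is used here because per-query reciprocal ranks under sparse labels contain many ties (every query with no relevant passage in the top 10 scores 0.0), and tau-b corrects the denominator for them. Pearson or Spearman correlation can be swapped in the same way.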
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khodabakhsh, M., Zarrinkalam, F., Arabzadeh, N. (2024). BertPE: A BERT-Based Pre-retrieval Estimator for Query Performance Prediction. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_27
DOI: https://doi.org/10.1007/978-3-031-56063-7_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56062-0
Online ISBN: 978-3-031-56063-7
eBook Packages: Computer Science, Computer Science (R0)