BertPE: A BERT-Based Pre-retrieval Estimator for Query Performance Prediction

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14610)

Abstract

Query Performance Prediction (QPP) aims to estimate how effectively a query addresses the underlying information need, without relying on any relevance judgments. Recent work in this area has used pre-trained neural embedding representations of the query to go beyond the corpus statistics of query terms and capture query semantics. In this paper, we propose a supervised QPP method that adopts contextualized neural embeddings and learns query performance directly through fine-tuning. Retrieval models may be evaluated with either sparse or more comprehensive labels, and this disparity poses a challenge for performance prediction. To address it, we introduce a strategy for creating synthetic relevance judgments that enables effective performance prediction for queries regardless of which kind of labels they are evaluated with. In experiments on four query sets over the MS MARCO V1 collection, our approach significantly outperforms state-of-the-art pre-retrieval QPP methods.
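
To make the general idea of supervised pre-retrieval QPP concrete, the sketch below treats it as a regression problem: a query representation is mapped to a score in (0, 1) and fitted against per-query effectiveness values. It is only an illustration and not the paper's model. The `embed` function is a toy hashed bag-of-words stand-in for a contextualized encoder such as BERT, the regressor is a single sigmoid unit trained with squared error, and the queries and target scores are hypothetical values invented for the example.

```python
import math
import random

DIM = 16  # size of the toy query representation (assumption)


def embed(query, dim=DIM):
    """Toy stand-in for a contextualized encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in query.lower().split():
        # Deterministic bucket (Python's built-in hash() is salted per process).
        vec[sum(ord(ch) for ch in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict(weights, bias, x):
    """Predicted effectiveness in (0, 1): sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mse(weights, bias, data):
    """Mean squared error between predicted and target effectiveness."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def train(data, dim=DIM, lr=0.5, epochs=200, seed=0):
    """Fit the regressor with plain stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = predict(weights, bias, x)
            # d/dz of (p - y)^2 where p = sigmoid(z)
            grad = 2.0 * (p - y) * p * (1.0 - p)
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Hypothetical (query, per-query effectiveness) pairs, e.g. a metric in [0, 1].
TRAIN = [
    ("what is the capital of france", 0.9),
    ("symptoms of vitamin d deficiency", 0.8),
    ("python", 0.2),
    ("best", 0.1),
]
DATA = [(embed(q), y) for q, y in TRAIN]

INITIAL_LOSS = mse([0.0] * DIM, 0.0, DATA)  # untrained model predicts 0.5 everywhere
W, B = train(DATA)
FINAL_LOSS = mse(W, B, DATA)

if __name__ == "__main__":
    print(f"loss before: {INITIAL_LOSS:.4f}, after: {FINAL_LOSS:.4f}")
    for query, target in TRAIN:
        print(f"{query!r}: target={target}, predicted={predict(W, B, embed(query)):.2f}")
```

In a fine-tuning setup of the kind the abstract describes, the fixed `embed` function would be replaced by a trainable encoder whose parameters are updated together with the regression head. The prediction step needs only the query text, which is what makes the estimator pre-retrieval.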



Author information

Corresponding author

Correspondence to Maryam Khodabakhsh.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Khodabakhsh, M., Zarrinkalam, F., Arabzadeh, N. (2024). BertPE: A BERT-Based Pre-retrieval Estimator for Query Performance Prediction. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_27

  • DOI: https://doi.org/10.1007/978-3-031-56063-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56062-0

  • Online ISBN: 978-3-031-56063-7

  • eBook Packages: Computer Science, Computer Science (R0)
