
Query Exposure Prediction for Groups of Documents in Rankings

  • Conference paper

In: Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14609)

Abstract

The main objective of an Information Retrieval (IR) system is to provide a user with the documents that are most relevant to the user’s query. To do this, modern IR systems typically deploy a re-ranking pipeline, in which a set of documents is retrieved by a lightweight first-stage retrieval process and then re-ranked by a more effective but more expensive model. However, the success of a re-ranking pipeline depends heavily on the performance of the first-stage retrieval, since new documents are not usually identified during the re-ranking stage. Moreover, this can affect the amount of exposure that a particular group of documents, such as the documents from a particular demographic group, receives in the final ranking. For example, the fair allocation of exposure becomes more difficult, or even impossible, if the first-stage retrieval returns too few documents from certain groups, since the number of a group’s documents in the ranking affects the group’s exposure more than the documents’ positions do. With this in mind, it is beneficial to predict the amount of exposure that a group of documents is likely to receive in the results of the first-stage retrieval process, in order to ensure that a sufficient number of documents from each group are included. In this paper, we introduce the novel task of query exposure prediction (QEP). Specifically, we propose the first approach for predicting the distribution of exposure that groups of documents will receive for a given query. Our new approach, called GEP, uses lexical information from individual groups of documents to estimate the exposure that the groups will receive in a ranking. Our experiments on the TREC 2021 and 2022 Fair Ranking Track test collections show that our proposed GEP approach produces exposure predictions that are up to ~40% more accurate than the predictions of suitably adapted existing query performance prediction (QPP) and resource allocation approaches.
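To make the notion of group exposure concrete, the following sketch computes the exposure distribution that each group of documents receives in a single ranking, using a logarithmic position-bias discount. This is an illustrative assumption: the discount function, the document IDs, and the `group_exposure` helper are hypothetical and not taken from the paper, whose exact browsing/exposure model may differ.

```python
import math

def position_exposure(rank: int) -> float:
    """Logarithmic position-bias discount (an illustrative choice,
    similar to NDCG-style discounting). rank is 1-based."""
    return 1.0 / math.log2(rank + 1)

def group_exposure(ranking: list[str], groups: dict[str, str]) -> dict[str, float]:
    """Sum the exposure received by each group's documents in a ranking,
    then normalise so the distribution over groups sums to 1."""
    totals: dict[str, float] = {}
    for rank, doc_id in enumerate(ranking, start=1):
        g = groups.get(doc_id, "unknown")
        totals[g] = totals.get(g, 0.0) + position_exposure(rank)
    z = sum(totals.values())
    return {g: v / z for g, v in totals.items()}

# Hypothetical example: groups A and B each have two documents retrieved.
ranking = ["d1", "d2", "d3", "d4"]
groups = {"d1": "A", "d2": "B", "d3": "A", "d4": "B"}
print(group_exposure(ranking, groups))
```

Note how the sketch reflects the abstract’s observation: under such a discount, a group that contributes more documents to the ranking accumulates more exposure, largely regardless of how those documents are ordered, which is why too few first-stage documents from a group can make fair exposure allocation impossible downstream.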




Corresponding author

Correspondence to Thomas Jaenich.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jaenich, T., McDonald, G., Ounis, I. (2024). Query Exposure Prediction for Groups of Documents in Rankings. In: Goharian, N., et al. (eds.) Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14609. Springer, Cham. https://doi.org/10.1007/978-3-031-56060-6_10


  • DOI: https://doi.org/10.1007/978-3-031-56060-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56059-0

  • Online ISBN: 978-3-031-56060-6

  • eBook Packages: Computer Science; Computer Science (R0)
