Abstract
The main objective of an Information Retrieval (IR) system is to provide a user with the documents that are most relevant to the user’s query. To do this, modern IR systems typically deploy a re-ranking pipeline in which a set of documents is retrieved by a lightweight first-stage retrieval process and then re-ranked by a more effective but expensive model. However, the success of a re-ranking pipeline is heavily dependent on the performance of the first-stage retrieval, since new documents are not usually identified during the re-ranking stage. Moreover, this dependence can limit the amount of exposure that a particular group of documents, such as documents from a particular demographic group, can receive in the final ranking. For example, the fair allocation of exposure becomes more challenging, or even impossible, if the first-stage retrieval returns too few documents from certain groups, since the number of a group’s documents in the ranking affects the group’s exposure more than the documents’ positions do. With this in mind, it is beneficial to predict the amount of exposure that a group of documents is likely to receive in the results of the first-stage retrieval process, in order to ensure that a sufficient number of documents from each group are included. In this paper, we introduce the novel task of query exposure prediction (QEP). Specifically, we propose the first approach for predicting the distribution of exposure that groups of documents will receive for a given query. Our new approach, called GEP, uses lexical information from individual groups of documents to estimate the exposure the groups will receive in a ranking. Our experiments on the TREC 2021 and 2022 Fair Ranking Track test collections show that our proposed GEP approach results in exposure predictions that are up to \(\sim \)40% more accurate than the predictions of suitably adapted existing query performance prediction (QPP) and resource allocation approaches.
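To make the abstract's claim concrete, the following is a minimal sketch of how the exposure received by each group of documents in a ranking can be computed, assuming a standard logarithmic position discount (as commonly used in the fair-ranking literature); the paper's exact exposure model may differ, and the documents, groups, and function names here are illustrative only.

```python
import math
from collections import defaultdict

def group_exposure(ranking, groups):
    """Share of total exposure each group receives in a ranking,
    with the exposure of rank i modelled as 1 / log2(i + 1)."""
    exposure = defaultdict(float)
    for i, doc in enumerate(ranking, start=1):
        exposure[groups[doc]] += 1.0 / math.log2(i + 1)
    total = sum(exposure.values())
    return {g: e / total for g, e in exposure.items()}

# A ranking with three documents from group A and one from group B:
ranking = ["d1", "d2", "d3", "d4"]
groups = {"d1": "A", "d2": "A", "d3": "B", "d4": "A"}
shares = group_exposure(ranking, groups)
```

In this example, group A receives roughly 80% of the exposure while holding 75% of the ranked documents, illustrating the point made above: the number of documents a group contributes to the ranking dominates its exposure, so a first-stage retrieval that under-represents a group largely determines that group's exposure before re-ranking begins.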
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jaenich, T., McDonald, G., Ounis, I. (2024). Query Exposure Prediction for Groups of Documents in Rankings. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14609. Springer, Cham. https://doi.org/10.1007/978-3-031-56060-6_10
Print ISBN: 978-3-031-56059-0
Online ISBN: 978-3-031-56060-6