
Top-K Contextual Bandits with Equity of Exposure

Published: 13 September 2021

Abstract

The contextual bandit paradigm provides a general framework for decision-making under uncertainty. It is theoretically well-defined and well-studied, and many personalisation use-cases can be cast as bandit learning problems. Because this allows for the direct optimisation of utility metrics that rely on online interventions (such as click-through rate (CTR)), the framework has become an attractive choice for practitioners. Historically, the literature on this topic has focused on a one-sided, user-focused notion of utility, largely disregarding the perspective of content providers in online marketplaces (for example, musical artists on streaming services). If this perspective is not properly taken into account, recommendation systems in such environments are known to lead to unfair distributions of attention and exposure, which can directly affect the providers’ income. Recent work has shed light on this issue, and there is now a growing consensus that some notion of “equity of exposure” is preferable in many recommendation use-cases.
We study how the top-K contextual bandit problem relates to issues of disparate exposure, and how this disparity can be minimised. The predominant approach in practice is to greedily rank the top-K items by their estimated utility, which is optimal according to the well-known Probability Ranking Principle. Instead, we introduce a configurable tolerance parameter that defines an acceptable decrease in utility in exchange for a maximal increase in fairness of exposure. We propose a personalised, exposure-aware arm selection algorithm that handles this relevance-fairness trade-off at the user level, as recent work suggests that users’ openness to randomisation varies greatly across the user population. Our model-agnostic algorithm deals with arm selection rather than utility modelling, and can therefore be implemented on top of any existing bandit system with minimal changes. We conclude with a case study on carousel personalisation in music recommendation: empirical observations highlight the effectiveness of the proposed method and show that exposure disparity can be reduced significantly with a negligible impact on user utility.
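The trade-off described above can be illustrated with a small sketch. This is not the paper's algorithm (which is not reproduced on this page), but a hypothetical greedy variant of the same idea: start from the utility-optimal top-K slate, then swap in under-exposed arms as long as the cumulative utility loss stays within a budget controlled by a tolerance parameter. All names here (`select_top_k`, `exposure_deficit`, `alpha`) are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def select_top_k(utilities, exposure_deficit, k, alpha=0.05):
    """Pick K arms, tolerating an alpha-fraction drop in slate utility
    in order to surface under-exposed arms.

    utilities        : estimated reward per arm, shape [n_arms]
    exposure_deficit : how far each arm falls short of its merited
                       exposure (larger = more under-exposed)
    alpha            : tolerated fraction of greedy utility to give up
    """
    # Utility-optimal slate, per the Probability Ranking Principle.
    greedy = np.argsort(-utilities)[:k]
    budget = alpha * utilities[greedy].sum()  # acceptable utility loss

    slate = list(greedy)
    # Consider the most under-exposed arms first.
    candidates = [a for a in np.argsort(-exposure_deficit) if a not in slate]
    for arm in candidates:
        # Swap out the slate item whose removal costs the least utility.
        worst = min(slate, key=lambda a: utilities[a])
        loss = utilities[worst] - utilities[arm]
        if loss <= budget and exposure_deficit[arm] > exposure_deficit[worst]:
            budget -= max(loss, 0.0)
            slate.remove(worst)
            slate.append(arm)
        else:
            break  # no affordable, fairness-improving swap remains
    return slate
```

With `alpha = 0` the sketch reduces to the greedy PRP ranking; raising `alpha` admits progressively more exposure-driven swaps, mirroring the configurable relevance-fairness tolerance the abstract describes.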

Supplementary Material

MP4 File (RecSys2021_Video_PaperB_4K.mp4)
Presentation video




Published In

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021
883 pages
ISBN:9781450384582
DOI:10.1145/3460231

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Fairness
  2. Probabilistic Models

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

RecSys '21: Fifteenth ACM Conference on Recommender Systems
September 27 - October 1, 2021
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%


Article Metrics

  • Downloads (last 12 months): 91
  • Downloads (last 6 weeks): 3
Reflects downloads up to 15 Feb 2025


Cited By

  • (2024) Multi-Objective Recommendation via Multivariate Policy Learning. Proceedings of the 18th ACM Conference on Recommender Systems, 712–721. DOI: 10.1145/3640457.3688132. Online: 8 Oct 2024.
  • (2024) Optimal Baseline Corrections for Off-Policy Contextual Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 722–732. DOI: 10.1145/3640457.3688105. Online: 8 Oct 2024.
  • (2024) Fairness and Transparency in Music Recommender Systems: Improvements for Artists. Proceedings of the 18th ACM Conference on Recommender Systems, 1368–1375. DOI: 10.1145/3640457.3688024. Online: 8 Oct 2024.
  • (2024) On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1222–1233. DOI: 10.1145/3637528.3671687. Online: 25 Aug 2024.
  • (2024) Multi-Task Neural Linear Bandit for Exploration in Recommender Systems. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5723–5730. DOI: 10.1145/3637528.3671649. Online: 25 Aug 2024.
  • (2024) Building Human Values into Recommender Systems: An Interdisciplinary Synthesis. ACM Transactions on Recommender Systems 2, 3, 1–57. DOI: 10.1145/3632297. Online: 5 Jun 2024.
  • (2024) Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1638–1648. DOI: 10.1145/3627673.3679763. Online: 21 Oct 2024.
  • (2024) Can We Trust Recommender System Fairness Evaluation? The Role of Fairness and Relevance. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 271–281. DOI: 10.1145/3626772.3657832. Online: 10 Jul 2024.
  • (2023) Multi-list interfaces for recommender systems: survey and future directions. Frontiers in Big Data 6. DOI: 10.3389/fdata.2023.1239705. Online: 10 Aug 2023.
  • (2023) A Probabilistic Position Bias Model for Short-Video Recommendation Feeds. Proceedings of the 17th ACM Conference on Recommender Systems, 675–681. DOI: 10.1145/3604915.3608777. Online: 14 Sep 2023.
