DOI: 10.1145/3460231.3474247

Pessimistic Reward Models for Off-Policy Learning in Recommendation

Published: 13 September 2021

Abstract

Methods for bandit learning from user interactions often require a model of the reward that a given context-action pair will yield – for example, the probability of a click on a recommendation. This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself. Indeed, when the recommendation policy deployed at data-collection time does not pick its actions uniformly at random, the logged data suffer from a selection bias that can impede effective reward modelling. This in turn makes off-policy learning – the typical setup in industry – particularly challenging.
In this work, we propose and validate a general pessimistic reward-modelling approach for off-policy learning in recommendation. Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to derive a conservative decision rule. We show how this alleviates a well-known decision-making phenomenon known as the Optimiser’s Curse, and draw parallels with existing work on pessimistic policy learning. Leveraging the closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use case. Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance. The merits of our approach are most pronounced in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.
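As an illustration of the idea described above – a minimal sketch under our own assumptions, not the authors' implementation – the following Python snippet fits a Bayesian ridge reward model with its closed-form posterior mean and covariance, and then applies a pessimistic lower-confidence-bound decision rule: it recommends the action whose predicted reward, minus a multiple of its posterior standard deviation, is highest. The function names, the pessimism strength alpha, and the prior/noise hyperparameters are all hypothetical.

# Minimal, self-contained sketch (not the paper's code) of a pessimistic
# reward model: Bayesian ridge regression with closed-form posterior,
# combined with a lower-confidence-bound (LCB) decision rule.
import numpy as np

def fit_bayesian_ridge(X, y, l2=1.0, noise_var=1.0):
    """Closed-form Gaussian posterior over ridge-regression weights.

    Posterior covariance: (X^T X / sigma^2 + l2 * I)^-1
    Posterior mean:       cov @ X^T y / sigma^2
    """
    d = X.shape[1]
    precision = X.T @ X / noise_var + l2 * np.eye(d)
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

def pessimistic_action(mean, cov, candidate_features, alpha=1.0):
    """Pick the action maximising the LCB: predicted reward minus
    alpha times the posterior standard deviation of that prediction."""
    mu = candidate_features @ mean                      # posterior mean reward per action
    var = np.einsum('ij,jk,ik->i', candidate_features, cov, candidate_features)
    return int(np.argmax(mu - alpha * np.sqrt(var)))    # conservative decision rule

# Toy usage: logged (feature, click) pairs and a small candidate action set.
rng = np.random.default_rng(0)
X_logged = rng.normal(size=(500, 8))                    # illustrative context-action features
y_logged = rng.binomial(1, 0.1, size=500).astype(float) # illustrative click labels
mean, cov = fit_bayesian_ridge(X_logged, y_logged)
A = rng.normal(size=(5, 8))                             # feature vectors of 5 candidate actions
print(pessimistic_action(mean, cov, A, alpha=2.0))

With alpha = 0 this reduces to the usual greedy rule on the posterior mean; larger alpha values penalise actions whose predicted reward the model is uncertain about, which is the conservative behaviour the abstract refers to.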

Supplementary Material

MP4 File (RecSys2021_Video_PaperA_4K.mp4)
Presentation video



Published In

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021
883 pages
ISBN: 9781450384582
DOI: 10.1145/3460231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2021


Author Tags

  1. Contextual Bandits
  2. Offline Reinforcement Learning
  3. Probabilistic Models

Qualifiers

  • Research-article
  • Research
  • Refereed limited


Conference

RecSys '21: Fifteenth ACM Conference on Recommender Systems
September 27 - October 1, 2021
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate: 254 of 1,295 submissions (20%)


Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months): 94
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 15 Feb 2025


Citations

Cited By

  • (2024)Unified PAC-Bayesian study of pessimism for offline policy learning with regularized importance samplingProceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence10.5555/3702676.3702680(88-109)Online publication date: 15-Jul-2024
  • (2024)On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender SystemsACM Transactions on Information Systems10.1145/366199642:6(1-26)Online publication date: 19-Aug-2024
  • (2024)Ranking the causal impact of recommendations under collider bias in k-spots recommender systemsACM Transactions on Recommender Systems10.1145/36431392:2(1-29)Online publication date: 14-May-2024
  • (2024)Δ-OPE: Off-Policy Estimation with Pairs of PoliciesProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688162(878-883)Online publication date: 8-Oct-2024
  • (2024)Multi-Objective Recommendation via Multivariate Policy LearningProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688132(712-721)Online publication date: 8-Oct-2024
  • (2024)Optimal Baseline Corrections for Off-Policy Contextual BanditsProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688105(722-732)Online publication date: 8-Oct-2024
  • (2024)CONSEQUENCES --- The 3rd Workshop on Causality, Counterfactuals and Sequential Decision-Making for Recommender SystemsProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3687095(1206-1209)Online publication date: 8-Oct-2024
  • (2024)On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n RecommendationProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671687(1222-1233)Online publication date: 25-Aug-2024
  • (2024)Reinforcing Long-Term Performance in Recommender Systems with User-Oriented Exploration PolicyProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657714(1850-1860)Online publication date: 10-Jul-2024
  • (2024)Ad-load Balancing via Off-policy Learning in a Content MarketplaceProceedings of the 17th ACM International Conference on Web Search and Data Mining10.1145/3616855.3635846(586-595)Online publication date: 4-Mar-2024
