DOI: 10.1145/2740908.2741998

Ad Recommendation Systems for Life-Time Value Optimization

Published: 18 May 2015

Abstract

The main objective in the ad recommendation problem is to find a strategy that, for each visitor of the website, selects the ad with the highest probability of being clicked. This strategy could be computed using supervised learning or contextual bandit algorithms, which treat two visits of the same user as two separate, independent visitors and thus optimize greedily for a single step into the future. Another approach is to use reinforcement learning (RL) methods, which distinguish between two visits of the same user and two different visitors, and thus optimize for multiple steps into the future, i.e., for the life-time value (LTV) of a customer. While greedy methods have been well studied, the LTV approach is still in its infancy, mainly due to two fundamental challenges: how to compute a good LTV strategy, and how to evaluate a solution using historical data to ensure its "safety" before deployment. In this paper, we tackle both of these challenges by proposing to use a family of off-policy evaluation techniques with statistical guarantees about the performance of a new strategy. We apply these methods to a real ad recommendation problem, both for evaluating the final performance and for optimizing the parameters of the RL algorithm. Our results show that our LTV optimization algorithm, equipped with these off-policy evaluation techniques, outperforms the greedy approaches. They also offer fundamental insights into the difference between the click-through rate (CTR) and LTV metrics for performance evaluation in the ad recommendation problem.
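The "safety" check the abstract describes evaluates a new policy on historical data before deployment. A minimal sketch of this idea combines importance sampling with a concentration-inequality lower bound; the high-confidence estimators used in the paper (e.g., those of Thomas et al., 2015) are more refined, and the function names and trajectory format below are illustrative assumptions, not the paper's API:

```python
import math

def importance_sampling_estimate(trajectories, pi_new, pi_behavior):
    """Estimate the return of a new policy from trajectories logged
    under a behavior policy. Each trajectory is a list of
    (state, action, reward) triples; each policy maps (state, action)
    to the probability of taking that action in that state.
    Returns the per-trajectory importance-weighted returns."""
    weighted_returns = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for state, action, reward in traj:
            # Reweight by how much more (or less) likely the new
            # policy is to take the logged action than the logger was.
            weight *= pi_new(state, action) / pi_behavior(state, action)
            ret += reward
        weighted_returns.append(weight * ret)
    return weighted_returns

def hoeffding_lower_bound(samples, value_range, delta=0.05):
    """A (1 - delta)-confidence lower bound on the mean via Hoeffding's
    inequality: deploy the new policy only if this bound exceeds the
    performance of the current policy."""
    n = len(samples)
    mean = sum(samples) / n
    return mean - value_range * math.sqrt(math.log(1.0 / delta) / (2 * n))
```

Because the importance weights can be heavy-tailed, Hoeffding-style bounds are often loose in practice, which is one motivation for the tighter off-policy evaluation techniques the paper studies.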




Published In

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
May 2015, 1602 pages
ISBN: 9781450334730
DOI: 10.1145/2740908

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. ad recommendation
  2. off-policy evaluation
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

WWW '15
Sponsor: IW3C2

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 7

Reflects downloads up to 05 Mar 2025

Cited By
  • (2025) Personalization At Doordash: From Conversion Modeling To Multi-objective Long-term Value Optimization. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pp. 1096-1097. DOI: 10.1145/3701551.3706132. Online publication date: 10-Mar-2025.
  • (2024) Distributed Recommendation Systems: Survey and Research Directions. ACM Transactions on Information Systems, 43(1), pp. 1-38. DOI: 10.1145/3694783. Online publication date: 6-Sep-2024.
  • (2024) A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems, 35(10), pp. 13164-13184. DOI: 10.1109/TNNLS.2023.3280161. Online publication date: Oct-2024.
  • (2024) Cooperative Markov Decision Process model for human–machine co-adaptation in robot-assisted rehabilitation. Knowledge-Based Systems, 291 (111572). DOI: 10.1016/j.knosys.2024.111572. Online publication date: May-2024.
  • (2024) Model-based approaches to profit-aware recommendation. Expert Systems with Applications, 249 (123642). DOI: 10.1016/j.eswa.2024.123642. Online publication date: Sep-2024.
  • (2024) Adversarial Online Reinforcement Learning Under Limited Defender Resources. Network Security Empowered by Artificial Intelligence, pp. 265-301. DOI: 10.1007/978-3-031-53510-9_10. Online publication date: 24-Feb-2024.
  • (2023) Reinforcement Learning for the Face Support Pressure of Tunnel Boring Machines. Geosciences, 13(3) (82). DOI: 10.3390/geosciences13030082. Online publication date: 13-Mar-2023.
  • (2023) Off-policy evaluation in partially observed Markov decision processes under sequential ignorability. The Annals of Statistics, 51(4). DOI: 10.1214/23-AOS2287. Online publication date: 1-Aug-2023.
  • (2023) Progressive Horizon Learning: Adaptive Long Term Optimization for Personalized Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 940-946. DOI: 10.1145/3604915.3608852. Online publication date: 14-Sep-2023.
  • (2023) Reinforced Explainable Knowledge Concept Recommendation in MOOCs. ACM Transactions on Intelligent Systems and Technology, 14(3), pp. 1-20. DOI: 10.1145/3579991. Online publication date: 1-Apr-2023.
