DOI: 10.1145/2740908.2741998

Ad Recommendation Systems for Life-Time Value Optimization

Published: 18 May 2015

Abstract

The main objective in the ad recommendation problem is to find a strategy that, for each visitor of the website, selects the ad with the highest probability of being clicked. This strategy could be computed using supervised learning or contextual bandit algorithms, which treat two visits of the same user as two separate, independent visitors and thus optimize greedily for a single step into the future. Another approach is to use reinforcement learning (RL) methods, which distinguish between two visits of the same user and two different visitors, and thus optimize for multiple steps into the future, i.e., for the life-time value (LTV) of a customer. While greedy methods have been well studied, the LTV approach is still in its infancy, mainly due to two fundamental challenges: how to compute a good LTV strategy, and how to evaluate a solution using historical data to ensure its "safety" before deployment. In this paper, we tackle both of these challenges by proposing to use a family of off-policy evaluation techniques with statistical guarantees about the performance of a new strategy. We apply these methods to a real ad recommendation problem, both for evaluating the final performance and for optimizing the parameters of the RL algorithm. Our results show that our LTV optimization algorithm, equipped with these off-policy evaluation techniques, outperforms the greedy approaches. They also offer fundamental insights into the difference between the click-through rate (CTR) and LTV metrics for performance evaluation in the ad recommendation problem.
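The "safety" check the abstract describes evaluates a new policy on historical data before deployment. A minimal sketch of this idea combines importance sampling with a concentration-inequality lower bound; the high-confidence estimators used in the paper (e.g., those of Thomas et al., 2015) are more refined, and the function names and trajectory format below are illustrative assumptions, not the paper's API:

```python
import math

def importance_sampling_estimate(trajectories, pi_new, pi_behavior):
    """Estimate the return of a new policy from trajectories logged
    under a behavior policy. Each trajectory is a list of
    (state, action, reward) triples; each policy maps (state, action)
    to the probability of taking that action in that state.
    Returns the per-trajectory importance-weighted returns."""
    weighted_returns = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for state, action, reward in traj:
            # Reweight by how much more (or less) likely the new
            # policy is to take the logged action than the logger was.
            weight *= pi_new(state, action) / pi_behavior(state, action)
            ret += reward
        weighted_returns.append(weight * ret)
    return weighted_returns

def hoeffding_lower_bound(samples, value_range, delta=0.05):
    """A (1 - delta)-confidence lower bound on the mean via Hoeffding's
    inequality: deploy the new policy only if this bound exceeds the
    performance of the current policy."""
    n = len(samples)
    mean = sum(samples) / n
    return mean - value_range * math.sqrt(math.log(1.0 / delta) / (2 * n))
```

Because the importance weights can be heavy-tailed, Hoeffding-style bounds are often loose in practice, which is one motivation for the tighter off-policy evaluation techniques the paper studies.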




Published In

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
May 2015, 1602 pages
ISBN: 9781450334730
DOI: 10.1145/2740908

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. ad recommendation
  2. off-policy evaluation
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

WWW '15
Sponsor: IW3C2

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 7

Reflects downloads up to 05 Mar 2025

Cited By
  • (2025) Personalization At Doordash: From Conversion Modeling To Multi-objective Long-term Value Optimization. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pp. 1096-1097. DOI: 10.1145/3701551.3706132. Online publication date: 10-Mar-2025.
  • (2024) Distributed Recommendation Systems: Survey and Research Directions. ACM Transactions on Information Systems, 43(1), pp. 1-38. DOI: 10.1145/3694783. Online publication date: 6-Sep-2024.
  • (2024) A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems, 35(10), pp. 13164-13184. DOI: 10.1109/TNNLS.2023.3280161. Online publication date: Oct-2024.
  • (2024) Cooperative Markov Decision Process model for human–machine co-adaptation in robot-assisted rehabilitation. Knowledge-Based Systems, 291 (111572). DOI: 10.1016/j.knosys.2024.111572. Online publication date: May-2024.
  • (2024) Model-based approaches to profit-aware recommendation. Expert Systems with Applications, 249 (123642). DOI: 10.1016/j.eswa.2024.123642. Online publication date: Sep-2024.
  • (2024) Adversarial Online Reinforcement Learning Under Limited Defender Resources. Network Security Empowered by Artificial Intelligence, pp. 265-301. DOI: 10.1007/978-3-031-53510-9_10. Online publication date: 24-Feb-2024.
  • (2023) Reinforcement Learning for the Face Support Pressure of Tunnel Boring Machines. Geosciences, 13(3) (82). DOI: 10.3390/geosciences13030082. Online publication date: 13-Mar-2023.
  • (2023) Off-policy evaluation in partially observed Markov decision processes under sequential ignorability. The Annals of Statistics, 51(4). DOI: 10.1214/23-AOS2287. Online publication date: 1-Aug-2023.
  • (2023) Progressive Horizon Learning: Adaptive Long Term Optimization for Personalized Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 940-946. DOI: 10.1145/3604915.3608852. Online publication date: 14-Sep-2023.
  • (2023) Reinforced Explainable Knowledge Concept Recommendation in MOOCs. ACM Transactions on Intelligent Systems and Technology, 14(3), pp. 1-20. DOI: 10.1145/3579991. Online publication date: 1-Apr-2023.
