DOI: 10.1145/3640457.3688058 · ACM RecSys Conference Proceedings · Extended Abstract

Off-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)

Published: 08 October 2024

Abstract

Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained with offline Reinforcement Learning. In this work, we describe our custom OPS method and its successful application in Samsung Instant Plays for optimizing ad delivery timing. We propose a custom OPS method because traditional Off-Policy Evaluation (OPE) methods often exhibit enormous variance, leading to unreliable results. We applied our OPS method to initialize policies for our custom pseudo-online training pipeline. The final policy yielded a substantial 49% lift in the number of watched ads while maintaining a similar retention rate.
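For readers unfamiliar with the OPS/OPE distinction, the sketch below illustrates the baseline the abstract argues against: scoring each candidate policy with an ordinary importance-sampling OPE estimate and picking the argmax. This is a minimal illustrative assumption, not the paper's custom method; all function and variable names here are hypothetical. The product of per-step probability ratios is exactly the term whose variance explodes on long trajectories, which motivates a custom OPS approach.

```python
import numpy as np

def is_estimate(trajectories, target_probs, behavior_probs, gamma=0.99):
    """Ordinary importance-sampling OPE estimate of a target policy's value.

    trajectories:   list of per-trajectory reward sequences
    target_probs:   matching per-step action probabilities under the target policy
    behavior_probs: matching per-step action probabilities under the behavior policy
    """
    returns = []
    for rewards, pi, b in zip(trajectories, target_probs, behavior_probs):
        # Trajectory importance weight: a product of per-step ratios.
        # This product is the source of the estimator's high variance.
        ratio = float(np.prod(np.asarray(pi) / np.asarray(b)))
        g = sum(gamma**t * r for t, r in enumerate(rewards))  # discounted return
        returns.append(ratio * g)
    return float(np.mean(returns))

def ops_select(candidates, trajectories, behavior_probs, gamma=0.99):
    """Naive OPS: return the candidate policy with the highest OPE estimate."""
    scores = {name: is_estimate(trajectories, probs, behavior_probs, gamma)
              for name, probs in candidates.items()}
    return max(scores, key=scores.get), scores
```

With noisy per-policy estimates like these, the argmax step tends to favor whichever candidate's estimate happened to be inflated, which is why a selection method robust to that variance is needed in practice.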


Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems
October 2024
1438 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Mobile Games
  2. Off-Policy Evaluation
  3. Reinforcement Learning

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
