ABSTRACT
We study online meta-learners for real-time bid prediction that predict by selecting a single best predictor from several subordinate prediction algorithms, here called "experts". These predictors belong to the family of context-dependent past-performance estimators, which make a prediction only when the instance to be predicted falls within their area of expertise. Within the advertising ecosystem it is very common for contextual information to be incomplete; hence, it is natural for some of the experts to abstain from predicting on some instances. Experts' areas of expertise can overlap, which makes their predictions less suitable for merging; as such, they lend themselves better to the problem of best-expert selection. In addition, their performance varies over time, which gives the expert-selection problem a non-stochastic, adversarial flavor. In this paper we propose probability sampling (via Thompson Sampling) as a meta-learning algorithm that samples from the pool of experts for the purpose of bid prediction. We compare our approach against multiple state-of-the-art algorithms using exploration scavenging on a log of over 300 million ad impressions, and against a baseline rule-based model on production traffic from a leading DSP platform.
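To make the setting concrete, the following is a minimal sketch of Thompson Sampling-based expert selection with abstention, assuming a Beta-Bernoulli reward model: each expert's past performance is tracked with a Beta posterior, only experts whose area of expertise covers the current instance are eligible, and the meta-learner samples one eligible expert per instance. The class name, the binary "success" feedback, and the eligibility interface are illustrative assumptions, not the paper's exact formulation.

```python
import random


class ThompsonExpertSelector:
    """Beta-Bernoulli Thompson Sampling over a pool of experts that may abstain."""

    def __init__(self, n_experts, seed=None):
        self.rng = random.Random(seed)
        # Uniform Beta(1, 1) prior for every expert.
        self.alpha = [1.0] * n_experts  # prior + observed successes
        self.beta = [1.0] * n_experts   # prior + observed failures

    def select(self, eligible):
        """Pick one expert among those not abstaining on this instance.

        Draws a sample from each eligible expert's Beta posterior and
        returns the expert with the largest sampled success probability.
        """
        if not eligible:
            return None  # every expert abstained
        samples = {i: self.rng.betavariate(self.alpha[i], self.beta[i])
                   for i in eligible}
        return max(samples, key=samples.get)

    def update(self, expert, success):
        """Record whether the chosen expert's prediction was acceptable."""
        if success:
            self.alpha[expert] += 1.0
        else:
            self.beta[expert] += 1.0


if __name__ == "__main__":
    # Toy simulation: three experts with different (hidden) accuracies,
    # each randomly abstaining on ~30% of instances.
    sel = ThompsonExpertSelector(3, seed=0)
    env = random.Random(1)
    accuracy = [0.2, 0.5, 0.9]
    counts = [0, 0, 0]
    for _ in range(2000):
        eligible = [i for i in range(3) if env.random() > 0.3]
        chosen = sel.select(eligible)
        if chosen is None:
            continue
        sel.update(chosen, env.random() < accuracy[chosen])
        counts[chosen] += 1
    print(counts)  # the most accurate expert should dominate the selections
```

Because the posterior sampling step only ranges over eligible experts, abstention is handled naturally: an abstaining expert is simply never sampled (and never updated) on that instance, so its posterior reflects only instances inside its area of expertise.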