ABSTRACT
We study online meta-learners for real-time bid prediction that predict by selecting a single best predictor from several subordinate prediction algorithms, here called "experts". These predictors belong to the family of context-dependent past-performance estimators, which make a prediction only when the instance to be predicted falls within their area of expertise. Within the advertising ecosystem it is very common for contextual information to be incomplete; hence, it is natural for some of the experts to abstain from predicting on some instances. Experts' areas of expertise can overlap, which makes their predictions less suitable for merging; as such, they lend themselves better to the problem of best-expert selection. In addition, their performance varies over time, which gives the expert-selection problem a non-stochastic, adversarial flavor. In this paper we propose probability sampling (via Thompson Sampling) as a meta-learning algorithm that samples from the pool of experts for the purpose of bid prediction. We compare our approach against multiple state-of-the-art algorithms using exploration scavenging on a log of over 300 million ad impressions, and against a baseline rule-based model on production traffic from a leading DSP platform.
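To make the setting concrete, the following is a minimal sketch of Thompson Sampling-based expert selection with abstention, assuming a Beta-Bernoulli reward model: each expert's past performance is tracked with a Beta posterior, only experts whose area of expertise covers the current instance are eligible, and the meta-learner samples one eligible expert per instance. The class name, the binary "success" feedback, and the eligibility interface are illustrative assumptions, not the paper's exact formulation.

```python
import random


class ThompsonExpertSelector:
    """Beta-Bernoulli Thompson Sampling over a pool of experts that may abstain."""

    def __init__(self, n_experts, seed=None):
        self.rng = random.Random(seed)
        # Uniform Beta(1, 1) prior for every expert.
        self.alpha = [1.0] * n_experts  # prior + observed successes
        self.beta = [1.0] * n_experts   # prior + observed failures

    def select(self, eligible):
        """Pick one expert among those not abstaining on this instance.

        Draws a sample from each eligible expert's Beta posterior and
        returns the expert with the largest sampled success probability.
        """
        if not eligible:
            return None  # every expert abstained
        samples = {i: self.rng.betavariate(self.alpha[i], self.beta[i])
                   for i in eligible}
        return max(samples, key=samples.get)

    def update(self, expert, success):
        """Record whether the chosen expert's prediction was acceptable."""
        if success:
            self.alpha[expert] += 1.0
        else:
            self.beta[expert] += 1.0


if __name__ == "__main__":
    # Toy simulation: three experts with different (hidden) accuracies,
    # each randomly abstaining on ~30% of instances.
    sel = ThompsonExpertSelector(3, seed=0)
    env = random.Random(1)
    accuracy = [0.2, 0.5, 0.9]
    counts = [0, 0, 0]
    for _ in range(2000):
        eligible = [i for i in range(3) if env.random() > 0.3]
        chosen = sel.select(eligible)
        if chosen is None:
            continue
        sel.update(chosen, env.random() < accuracy[chosen])
        counts[chosen] += 1
    print(counts)  # the most accurate expert should dominate the selections
```

Because the posterior sampling step only ranges over eligible experts, abstention is handled naturally: an abstaining expert is simply never sampled (and never updated) on that instance, so its posterior reflects only instances inside its area of expertise.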