DOI: 10.1145/2783258.2788586
KDD Conference Proceedings · research-article

Real-Time Bid Prediction using Thompson Sampling-Based Expert Selection

Published: 10 August 2015

ABSTRACT

We study online meta-learners for real-time bid prediction that predict by selecting a single best predictor from among several subordinate prediction algorithms, here called "experts". These predictors belong to the family of context-dependent past-performance estimators, which make a prediction only when the instance to be predicted falls within their areas of expertise. Within the advertising ecosystem it is very common for contextual information to be incomplete, so it is natural for some experts to abstain from predicting on some instances. Experts' areas of expertise can overlap, which makes their predictions less suitable for merging; as such, they lend themselves better to the problem of best-expert selection. In addition, their performance varies over time, which gives the expert-selection problem a non-stochastic, adversarial flavor. In this paper we propose probability sampling (via Thompson Sampling) as a meta-learning algorithm that samples from the pool of experts for the purpose of bid prediction. We report performance results comparing our approach to multiple state-of-the-art algorithms using exploration scavenging on a log of over 300 million ad impressions, as well as to a baseline rule-based model on production traffic from a leading DSP platform.
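The selection mechanism the abstract describes can be illustrated with a minimal sketch. Note this is our own simplification, not the paper's exact formulation: the class name, the Beta-Bernoulli reward model, and the success/failure update are all assumptions made for illustration; the paper's experts predict bids, and the handling of abstention here simply restricts the sampling step to eligible experts.

```python
import random

class ThompsonExpertSelector:
    """Thompson Sampling over a pool of abstaining experts (illustrative).

    Each expert keeps a Beta(wins + 1, losses + 1) posterior over its
    probability of producing an acceptable prediction. For each instance,
    we draw one sample from the posterior of every expert that does not
    abstain and follow the expert with the highest sample.
    """

    def __init__(self, n_experts):
        self.wins = [0] * n_experts
        self.losses = [0] * n_experts

    def select(self, eligible):
        """Pick one expert from `eligible`, the indices of experts whose
        area of expertise covers this instance (all others abstain)."""
        samples = {
            i: random.betavariate(self.wins[i] + 1, self.losses[i] + 1)
            for i in eligible
        }
        return max(samples, key=samples.get)

    def update(self, expert, success):
        """Credit (or debit) only the expert whose prediction was used."""
        if success:
            self.wins[expert] += 1
        else:
            self.losses[expert] += 1
```

Because only the selected expert's posterior is updated, exploration happens automatically: an expert with little data has a wide posterior and is occasionally sampled above the current favorite, which also lets the selector recover when expert performance drifts over time.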


Published in

KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


          Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions (20%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
