skip to main content
10.1145/2566486.2567974acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

Machine learning in an auction environment

Published:07 April 2014Publication History

ABSTRACT

We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem and show that efficiency is maximized by making a bid for each advertiser equal to the advertiser's expected value for the advertising opportunity plus a term proportional to the variance in this value divided by the number of impressions the advertiser has received thus far. We then use this result to illustrate that the value of incorporating active exploration into a machine learning system in an auction environment is exceedingly small.

References

  1. D. Agarwal, B.-C. Chen, and P. Elango. Explore/exploit schemes for web content optimization. In Proceedings of the 9th Industrial Conference on Data Mining (ICDM), pages 1--10, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Aghion, P. Bolton, C. Harris, and B. Jullien. Optimal learning by experimentation. Review of Economic Studies, 58(4):621--654, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  3. P. Aghion, M. P. Espinosa, and B. Jullien. Dynamic duopoly with learning through market experimentation. Economic Theory, 3(3):517--539, 1993.Google ScholarGoogle ScholarCross RefCross Ref
  4. N. Anthonisen. On learning to cooperate. Journal of Economic Theory, 107(2):253--287, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  5. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47(2--3):235--256, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Auer, N. Cesa-Bianchi, and P. Fischer. The nonstochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1):48--77, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Babaioff, Y. Sharma, and A. Slivkins. Characterizing truthful multi-armed bandit mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC), pages 79--88, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Banerjee and D. Fudenberg. Word-of-mouth learning. Games and Economic Behavior, 46(1):1--22, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  9. J. S. Banks and R. K. Sundaram. Denumerable-armed bandits. Econometrica, 60(5):1071--1096, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  10. E. Bax, A. Kuratti, P. McAfee, and J. Romero. Comparing predicted prices in auctions for online advertising. International Journal of Industrial Organization, 30(1):80--88, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  11. D. Bergemann and J. Valimkaki. Learning and strategic pricing. Econometrica, 64(5):1125--1149, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  12. D. Bergemann and J. Valimkaki. Market diffusion with two-sided learning. RAND Journal of Economics, 28(4):773--795, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  13. D. Bergemann and J. Valimkaki. Experimentation in markets. Review of Economic Studies, 67(2):213--234, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  14. D. Bergemann and J. Valimkaki. Stationary multi-choice bandit problems. Journal of Economic Dynamics and Control, 25(1):1585--1594, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  15. P. Bolton and C. Harris. Strategic experimentation. Econometrica, 67(2):349--374, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  16. M. Brezzi and T. L. Lai. Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control, 27(1):87--108, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  17. S. Callander. Searching for good policies. American Political Science Review, 105(4):643--662, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  18. N. R. Devanur and S. M. Kakade. The price of truthfulness for pay-per-click auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC), pages 99--106, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Fishman and R. Rob. Experimentation and competition. Journal of Economic Theory, 78(2):299--320, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  20. D. Gale. What have we learned from social learning? European Economic Review, 40(3--5):617--628, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  21. D. Gale and R. W. Rosenthal. Experimentation, imitation, and stochastic stability. Journal of Economic Theory, 84(1):1--40, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  22. J. C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41(2):148--177, 1979.Google ScholarGoogle Scholar
  23. P. Hummel and R. P. McAfee. Machine learning in an auction environment. Google Inc. Typescript, 2013.Google ScholarGoogle Scholar
  24. K. Iyer, R. Johari, and M. Sundarajan. Mean field equilibria of dynamic auctions with learning. Cornell University Typescript, 2013.Google ScholarGoogle Scholar
  25. G. Keller and S. Rady. Optimal experimentation in a changing environment. Review of Economic Studies, 66(3):475--503, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  26. G. Keller and S. Rady. Strategic experimentation with poisson bandits. Theoretical Economics, 5(2):275--311, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  27. G. Keller, S. Rady, and M. Cripps. Strategic experimentation with experimental bandits. Econometrica, 73(1):39--68, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  28. S. Lahaie and R. P. McAfee. Efficient ranking in sponsored search. In Proceedings of the 7th International Workshop on Internet and Network Economics (WINE), pages 254--265, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4--22, 1985.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. A. Lewis. Where's the 'wear-out?' online display ads and the impact of frequency. Massachusetts Institute of Technology Typescript, 2010.Google ScholarGoogle Scholar
  31. S.-M. Li, M. Mahdian, and R. P. McAfee. Value of learning in sponsored search auctions. In Proceedings of the 6th International Workshop on Internet and Network Economics (WINE), pages 294--305, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. L. J. Mirman, L. Samuelson, and A. Urbano. Monopoly experimentation. International Economic Review, 34(3):549--563, 1993.Google ScholarGoogle ScholarCross RefCross Ref
  33. G. Moscarini and L. Smith. The optimal level of experimentation. Econometrica, 69(6):1629--1644, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  34. M. Ostrovsky and M. Schwarz. Reserve prices in internet advertising auctions: A field experiment. Stanford University Typescript, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  35. M. Rothschild. A two-armed bandit theory of market pricing. Journal of Economic Theory, 9(2):185--202, 1974.Google ScholarGoogle ScholarCross RefCross Ref
  36. A. Rusitchini and A. Wolinsky. Learning about variable demand in the long run. Journal of Economic Dynamics and Control, 19(5--7):1283--1292, 1995.Google ScholarGoogle Scholar
  37. K. H. Schlag. Why imitate, and if so how? a boundedly rational approach to multi-armed bandits. Journal of Economic Theory, 78(1):130--156, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  38. B. Strulovici. Learning while voting: Determinant of collective experimentation. Econometrica, 78(3):933--971, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  39. X. Vives. Learning from others: A welfare analysis. Games and Economic Behavior, 20(2):177--200, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  40. M. L. Weitzman. Optimal search for the best alternative. Econometrica, 47(3):641--654, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  41. J. Wortman, Y. Vorobeychik, L. Li, and J. Langford. Maintaining equilibria during exploration in sponsored search auctions. In Proceedings of the 3rd International Workshop on Internet and Network Economics (WINE), pages 119--130, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Machine learning in an auction environment

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      WWW '14: Proceedings of the 23rd international conference on World wide web
      April 2014
      926 pages
      ISBN:9781450327442
      DOI:10.1145/2566486

      Copyright © 2014 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 April 2014

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      WWW '14 Paper Acceptance Rate84of645submissions,13%Overall Acceptance Rate1,899of8,196submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader