ABSTRACT
We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and payoffs are discounted over time. We formulate the optimal solution to this explore/exploit problem as a dynamic programming problem and show that efficiency is maximized by placing, for each advertiser, a bid equal to the advertiser's expected value for the advertising opportunity plus a term proportional to the variance in this value divided by the number of impressions the advertiser has received thus far. We then use this result to illustrate that the value of incorporating active exploration into a machine learning system in an auction environment is exceedingly small.
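The bid rule described above can be illustrated with a minimal Python sketch. It is not the paper's derivation. The Beta posterior over the click-through rate, the names `AdState` and `exploration_bid`, and the proportionality constant `k` are all assumptions made for illustration, since the abstract does not specify them. The guard that treats zero impressions as one is likewise an assumption, added only to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass
class AdState:
    """Belief about one ad, modeled as a Beta posterior over its CTR (an assumption)."""
    value_per_click: float  # advertiser's value for a click
    alpha: float = 1.0      # prior pseudo-count of clicks (hypothetical prior)
    beta: float = 1.0       # prior pseudo-count of non-clicks (hypothetical prior)
    impressions: int = 0    # impressions the ad has received so far

    def update(self, clicked: bool) -> None:
        """Record one impression and whether it produced a click."""
        self.impressions += 1
        if clicked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def expected_value(self) -> float:
        """Expected value of an impression: value per click times posterior mean CTR."""
        return self.value_per_click * self.alpha / (self.alpha + self.beta)

    def value_variance(self) -> float:
        """Variance of the per-impression value under the Beta posterior."""
        a, b = self.alpha, self.beta
        ctr_var = a * b / ((a + b) ** 2 * (a + b + 1.0))
        return self.value_per_click ** 2 * ctr_var


def exploration_bid(ad: AdState, k: float = 1.0) -> float:
    """Expected value plus k * variance / impressions.

    k is a hypothetical proportionality constant; the abstract gives no value.
    Zero impressions are treated as one (an assumption) to avoid dividing by zero.
    """
    n = max(ad.impressions, 1)
    return ad.expected_value() + k * ad.value_variance() / n
```

Setting `k = 0` recovers a purely exploitative bid equal to the expected value. Because the bonus is divided by the impression count, it shrinks as the ad accumulates impressions, which is in the spirit of the abstract's conclusion that active exploration adds little value.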