Deviations of Stochastic Bandit Regret

  • Conference paper
Algorithmic Learning Theory (ALT 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6925)

Abstract

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known in advance by the agent, Audibert et al. (2009) exhibit a policy such that, with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question by extending this negative result to any anytime policy, i.e., any policy that does not require the horizon n as input. The second contribution is the design of anytime robust policies for specific multi-armed bandit problems in which restrictions are placed on the set of possible distributions of the arms.
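
For readers unfamiliar with the baseline policy discussed in the abstract, the following is a minimal sketch of the ucb1 index policy of Auer et al. (2002) [1]. The interface (arms as callables returning rewards in [0, 1]) and the Bernoulli toy arms are illustrative assumptions, not part of the paper.

```python
import math
import random

def ucb1(arms, n):
    """Run the ucb1 index policy of Auer et al. (2002) for n rounds.

    `arms` is a list of callables, each returning a reward in [0, 1]
    when pulled. Returns the number of pulls of each arm.
    """
    k = len(arms)
    counts = [0] * k   # T_i(t): number of pulls of arm i so far
    sums = [0.0] * k   # cumulative reward collected from arm i

    # Initialisation: pull each arm once.
    for i in range(k):
        sums[i] += arms[i]()
        counts[i] += 1

    for t in range(k + 1, n + 1):
        # ucb1 index: empirical mean plus exploration bonus sqrt(2 ln t / T_i).
        index = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
                 for i in range(k)]
        i = max(range(k), key=lambda j: index[j])
        sums[i] += arms[i]()
        counts[i] += 1
    return counts

# Toy two-armed Bernoulli bandit: arm 0 has mean 0.5, arm 1 has mean 0.6.
pulls = ucb1([lambda: float(random.random() < 0.5),
              lambda: float(random.random() < 0.6)], n=10000)
print(pulls)  # the suboptimal arm is pulled O(log n) times in expectation
```

Auer et al. prove that ucb1 achieves expected regret of order log n. The point of the paper above is that this guarantee holds only in expectation: ucb1 lacks the high-probability (at least 1 − 1/n) regret bound of order log n, and the paper's first result shows that no anytime policy can have it.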

References

  1. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002)

  2. Agrawal, R.: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27, 1054–1078 (1995)

  3. Audibert, J.-Y., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science 410(19), 1876–1902 (2009)

  4. Bubeck, S., Munos, R., Stoltz, G., Szepesvári, C.: Online optimization in X-armed bandits. In: Advances in Neural Information Processing Systems, vol. 21, pp. 201–208 (2009)

  5. Babaioff, M., Sharma, Y., Slivkins, A.: Characterizing truthful multi-armed bandit mechanisms: extended abstract. In: Proceedings of the Tenth ACM Conference on Electronic Commerce, pp. 79–88. ACM, New York (2009)

  6. Bergemann, D., Välimäki, J.: Bandit problems. In: The New Palgrave Dictionary of Economics, 2nd edn. Macmillan Press, Basingstoke (2008)

  7. Coquelin, P.-A., Munos, R.: Bandit algorithms for tree search. In: Uncertainty in Artificial Intelligence (2007)

  8. Devanur, N.R., Kakade, S.M.: The price of truthfulness for pay-per-click auctions. In: Proceedings of the Tenth ACM Conference on Electronic Commerce, pp. 99–106. ACM, New York (2009)

  9. Gelly, S., Wang, Y.: Exploration exploitation in Go: UCT for Monte-Carlo Go. In: Online Trading between Exploration and Exploitation Workshop, Twentieth Annual Conference on Neural Information Processing Systems, NIPS 2006 (2006)

  10. Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992)

  11. Kleinberg, R.D.: Nearly tight bounds for the continuum-armed bandit problem. In: Advances in Neural Information Processing Systems, vol. 17, pp. 697–704 (2005)

  12. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)

  13. Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pp. 681–690 (2008)

  14. Lamberton, D., Pagès, G., Tarrès, P.: When can the two-armed bandit algorithm be trusted? Annals of Applied Probability 14(3), 1424–1454 (2004)

  15. Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)

  16. Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability 18(3), 1269–1283 (1990)

  17. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–535 (1952)

  18. Rudin, W.: Real and Complex Analysis, 3rd edn. McGraw-Hill, New York (1986)

  19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salomon, A., Audibert, J.-Y. (2011). Deviations of Stochastic Bandit Regret. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science (LNAI), vol 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_15

  • DOI: https://doi.org/10.1007/978-3-642-24412-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24411-7

  • Online ISBN: 978-3-642-24412-4

  • eBook Packages: Computer Science, Computer Science (R0)
