Algorithms for Adversarial Bandit Problems with Multiple Plays

  • Conference paper
Algorithmic Learning Theory (ALT 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6331)

Abstract

Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the process generating the rewards for actions. In this paper, we extend their theory to the case where k (≥ 1) distinct actions are selected at each time step. As algorithms for this problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1], in addition to two existing algorithms (Exp3 [4] and ComBand [5]), in terms of time and space efficiency and the regret with respect to the best fixed action set. The extension of Exp3, called Exp3.M, performs best on all measures: it runs in O(K(log k + 1)) time and O(K) space, and suffers at most \(O(\sqrt{kTK\log(K/k)})\) regret, where K is the number of possible actions and T is the number of iterations. The regret upper bound we prove for Exp3.M extends the bound proved by Auer et al. for Exp3.
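The abstract's description of Exp3.M — an Exp3-style weight update combined with drawing exactly k distinct arms per round — can be illustrated with a short sketch. The sampling step uses dependent rounding in the spirit of Gandhi et al. [6] to pick a k-subset whose inclusion probabilities match the computed marginals. This is an illustrative simplification, not the paper's algorithm: the function names are invented, the capping step here only clips the marginals (the paper caps the weights themselves, which the regret analysis requires), and the exact update constant may differ from the paper's.

```python
import math
import random

def depround(p):
    """Dependent rounding: round marginals p, with sum(p) = k, to a
    subset of exactly k indices while keeping Pr[i chosen] = p[i]."""
    p = list(p)
    while True:
        frac = [i for i, v in enumerate(p) if 1e-9 < v < 1 - 1e-9]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        alpha = min(1 - p[i], p[j])
        beta = min(p[i], 1 - p[j])
        # Shift mass between i and j so one of them hits 0 or 1;
        # the branch probabilities keep each marginal unbiased.
        if random.random() < beta / (alpha + beta):
            p[i] += alpha
            p[j] -= alpha
        else:
            p[i] -= beta
            p[j] += beta
    return [i for i, v in enumerate(p) if v > 0.5]

def exp3m_round(weights, k, gamma, rewards):
    """One round of a simplified Exp3.M-style step: mix the weight
    distribution with uniform exploration, scale marginals to sum to k,
    draw k arms, then update chosen arms with importance-weighted
    reward estimates. `rewards` is the full reward vector; only the
    chosen arms' entries are observed/used."""
    K = len(weights)
    total = sum(weights)
    p = [k * ((1 - gamma) * w / total + gamma / K) for w in weights]
    # Crude feasibility capping: clip marginals at 1, rescale the rest.
    capped = [False] * K
    while any(p[i] > 1 for i in range(K) if not capped[i]):
        for i in range(K):
            if not capped[i] and p[i] > 1:
                p[i] = 1.0
                capped[i] = True
        rest = [i for i in range(K) if not capped[i]]
        budget = k - (K - len(rest))
        s = sum(p[i] for i in rest)
        for i in rest:
            p[i] *= budget / s
    chosen = depround(p)
    for i in chosen:
        xhat = rewards[i] / p[i]  # unbiased estimate of arm i's reward
        weights[i] *= math.exp(gamma * xhat / K)
    return chosen, weights
```

With k = 1 and no capping ever triggered, this reduces to the familiar Exp3 loop, which is the sense in which the abstract calls Exp3.M an extension of Exp3.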


References

  1. Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory, COLT 2008 (2008)

  2. Agrawal, R., Hegde, M.V., Teneketzis, D.: Multi-armed bandits with multiple plays and switching cost. Stochastics and Stochastics Reports 29, 437–459 (1990)

  3. Anantharam, V., Varaiya, P., Walrand, J.: Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays – Part I: I.i.d. rewards. IEEE Transactions on Automatic Control 32, 968–976 (1986)

  4. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)

  5. Cesa-Bianchi, N., Lugosi, G.: Combinatorial bandits. In: Proceedings of the 22nd Annual Conference on Learning Theory, COLT 2009 (2009)

  6. Gandhi, R., Khuller, S., Parthasarathy, S., Srinivasan, A.: Dependent rounding and its applications to approximation algorithms. Journal of the ACM 53(3), 320–360 (2006)

  7. György, A., Linder, T., Lugosi, G., Ottucsák, G.: The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research 8, 2369–2403 (2007)

  8. Kleinberg, R.: Notes from week 8: Multi-armed bandit problems. CS 683 – Learning, Games, and Electronic Markets (2007), http://www.cs.cornell.edu/courses/cs683/2007sp/lecnotes/week8.pdf

  9. Krein, M., Milman, D.: On extreme points of regular convex sets. Studia Mathematica, 133–138 (1940)

  10. Mahajan, A., Teneketzis, D.: Multi-armed bandit problems. In: Foundations and Applications of Sensor Management, pp. 121–151. Springer, Heidelberg (2007)

  11. Nakamura, A., Abe, N.: Improvements to the linear programming based scheduling of web advertisements. Electronic Commerce Research 5, 75–98 (2005)

  12. Niculescu-Mizil, A.: Multi-armed bandits with betting. In: COLT 2009 Workshop, pp. 133–138 (2009)

  13. Pandelis, D.G., Teneketzis, D.: On the optimality of the Gittins index rule in multi-armed bandits with multiple plays. Mathematical Methods of Operations Research 50, 449–461 (1999)

  14. Song, N.O., Teneketzis, D.: Discrete search with multiple sensors. Mathematical Methods of Operations Research 60, 1–14 (2004)

  15. Uchiya, T., Nakamura, A., Kudo, M.: Adversarial bandit problems with multiple plays. In: The IEICE Technical Report, COMP2009-27 (2009)

  16. Warmuth, M.K., Takimoto, E.: Path kernels and multiplicative updates. Journal of Machine Learning Research, 773–818 (2003)


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Uchiya, T., Nakamura, A., Kudo, M. (2010). Algorithms for Adversarial Bandit Problems with Multiple Plays. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2010. Lecture Notes in Computer Science, vol 6331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16108-7_30

  • DOI: https://doi.org/10.1007/978-3-642-16108-7_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16107-0

  • Online ISBN: 978-3-642-16108-7

  • eBook Packages: Computer Science (R0)
