Abstract
In an online decision problem, an algorithm must at each time step choose one of the feasible decision points without knowing the cost associated with it. An adversary assigns costs to the possible decisions either obliviously or adaptively. The online algorithm naturally attempts to incur as little cost as possible. The difference between the cumulative cost of the online algorithm and that of the best static decision in hindsight is called the regret of the algorithm.
Kalai and Vempala [1] showed that some online decision problems with a linear cost function admit efficient solutions by following the perturbed leader. Their solution requires the costs of all decisions to be observed. Recently there has also been progress in the bandit setting, where only the cost of the selected decision is observed. A regret bound of \(O(T^{2/3})\) over T rounds against an oblivious adversary was first shown by Awerbuch and Kleinberg [2], and later McMahan and Blum [3] showed that a bound of \(O(\sqrt{\ln T}\,T^{3/4})\) is attainable against an adaptive adversary.
In this paper we study Kalai and Vempala’s model from the viewpoint of bandit algorithms. We show that the algorithm of McMahan and Blum attains a regret of \(O(T^{2/3})\) against an oblivious adversary. Moreover, we show a tighter \(O(\sqrt{m\ln m}\,\sqrt{T})\) bound for the expert setting with m experts.
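To make the full-information baseline concrete, the following is a minimal, illustrative sketch of follow-the-perturbed-leader in the expert setting. The function name, the exponential perturbation distribution, and the parameter `epsilon` are our own illustrative choices and not the exact algorithm or tuning analyzed in the paper:

```python
import random

def fpl_expert(costs, epsilon=0.1):
    """Full-information follow-the-perturbed-leader over m experts.

    costs: list of rounds, each a list of m per-expert costs.
    Returns the total cost incurred. Illustrative sketch only.
    """
    m = len(costs[0])
    cumulative = [0.0] * m   # cumulative observed cost of each expert
    total = 0.0
    for round_costs in costs:
        # Perturb the cumulative costs with fresh exponential noise each
        # round, then follow the perturbed leader: the expert whose
        # perturbed cumulative cost is smallest.
        perturbed = [cumulative[i] - random.expovariate(epsilon)
                     for i in range(m)]
        leader = min(range(m), key=lambda i: perturbed[i])
        total += round_costs[leader]
        # Full-information feedback: the costs of all experts are observed.
        for i in range(m):
            cumulative[i] += round_costs[i]
    return total
```

In the bandit setting studied in the paper, only `round_costs[leader]` would be revealed, so the cumulative costs of the other experts must instead be estimated, which is where the extra work of the paper lies.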
References
Kalai, A., Vempala, S.: Efficient algorithms for online decision problems. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 26–40. Springer, Heidelberg (2003)
Awerbuch, B., Kleinberg, R.D.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 45–53. ACM Press, New York (2004)
McMahan, H.B., Blum, A.: Geometric optimization in the bandit setting against an adaptive adversary. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 109–123. Springer, Heidelberg (2004)
Sleator, D., Tarjan, R.: Self-adjusting binary search trees. Journal of the ACM 32, 652–686 (1985)
Hannan, J.: Approximation to Bayes risk in repeated plays. In: Dresher, M., Tucker, A., Wolfe, P. (eds.) Contributions to the Theory of Games, vol. 3, pp. 97–139. Princeton University Press, Princeton (1957)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Fawcett, T., Mishra, N. (eds.) Proceedings of the Twentieth International Conference on Machine Learning, pp. 928–936. AAAI Press, Menlo Park (2003)
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D., Schapire, R.E., Warmuth, M.K.: How to use expert advice. Journal of the ACM 44, 427–485 (1997)
Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Information and Computation 108, 212–261 (1994)
Takimoto, E., Warmuth, M.K.: Path kernels and multiplicative updates. Journal of Machine Learning Research 4, 773–818 (2003)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The non-stochastic multi-armed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)
Flaxman, A.D., Kalai, A.T., McMahan, H.B.: Online convex optimization in the bandit setting: Gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394. ACM Press, New York (2005)
Hutter, M., Poland, J.: Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research 6, 639–660 (2005)
Moore, E.H.: On the reciprocal of the general algebraic matrix (abstract). Bulletin of the American Mathematical Society 26, 394–395 (1920)
Penrose, R.: A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society 51, 406–413 (1955)
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Kujala, J., Elomaa, T. (2005). On Following the Perturbed Leader in the Bandit Setting. In: Jain, S., Simon, H.U., Tomita, E. (eds) Algorithmic Learning Theory. ALT 2005. Lecture Notes in Computer Science(), vol 3734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564089_29
Print ISBN: 978-3-540-29242-5
Online ISBN: 978-3-540-31696-1