Abstract
We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝ^n of feasible points. At each time step t, the online algorithm must select a point x_t ∈ S while, simultaneously, an adversary selects a cost vector c_t ∈ ℝ^n; the algorithm then incurs cost c_t · x_t. Kalai and Vempala show that even if S is exponentially large (or infinite), as long as we have an efficient algorithm for the offline problem (given c ∈ ℝ^n, find x ∈ S minimizing c · x) and the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed x ∈ S in hindsight. The Kalai-Vempala algorithm assumes that the cost vector c_t is revealed to the algorithm after each time step. In the "bandit" version of the problem, the algorithm only observes its incurred cost c_t · x_t. Awerbuch and Kleinberg [2] give an algorithm for the bandit version against an oblivious adversary, and an algorithm that works against an adaptive adversary for the special case of the shortest-path problem; they leave open the problem of handling an adaptive adversary in the general case. In this paper we solve this open problem, giving a simple online algorithm for the bandit problem in the general case against an adaptive adversary. Ignoring a (polynomial) dependence on n, we achieve a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
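For concreteness, the following is a minimal, hypothetical Python sketch of the interaction protocol described in the abstract; it is not the algorithm analyzed in the paper. A learner repeatedly commits to a point x_t ∈ S, an adaptive adversary then fixes a cost vector c_t, and the learner observes only the scalar cost c_t · x_t; regret is measured against the best fixed point in hindsight, found via the offline oracle. The names offline_oracle, run_bandit_protocol, and UniformLearner are illustrative, and the finite S and uniform-sampling learner are placeholders, not the paper's method.

import numpy as np

# Hypothetical illustration of the bandit protocol described above, not the
# paper's algorithm. S is a small finite set only for demonstration; in the
# paper S may be exponentially large or infinite, accessed via the oracle.

def offline_oracle(S, c):
    """Offline problem: given a cost vector c, return argmin_{x in S} c . x."""
    return min(S, key=lambda x: float(np.dot(c, x)))

def run_bandit_protocol(S, adversary, learner, T):
    """Play T rounds; the learner observes only the scalar cost c_t . x_t."""
    total_cost = 0.0
    plays, costs = [], []
    for t in range(T):
        x_t = learner.select(t)          # learner commits to a point in S
        c_t = adversary(t, plays)        # adaptive adversary: may depend on
                                         #   the learner's past choices
        loss = float(np.dot(c_t, x_t))   # bandit feedback: one scalar value
        learner.update(x_t, loss)
        total_cost += loss
        plays.append(x_t)
        costs.append(c_t)
    # Regret is measured against the best *fixed* point in hindsight,
    # which the offline oracle can find from the summed cost vector.
    c_sum = np.sum(costs, axis=0)
    best_fixed_cost = float(np.dot(c_sum, offline_oracle(S, c_sum)))
    return total_cost - best_fixed_cost

class UniformLearner:
    """Placeholder strategy (uniform sampling over S). The paper's algorithm
    instead combines exploration with exploitation via the offline oracle."""
    def __init__(self, S, seed=0):
        self.S = S
        self.rng = np.random.default_rng(seed)
    def select(self, t):
        return self.S[self.rng.integers(len(self.S))]
    def update(self, x_t, loss):
        pass  # a real learner would use the observed loss to estimate costs

if __name__ == "__main__":
    S = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
    # A toy adversary; an adaptive one could inspect `plays` before choosing.
    adversary = lambda t, plays: np.array([1.0, 0.2]) if t % 2 else np.array([0.2, 1.0])
    print("regret vs. best fixed point:",
          run_bandit_protocol(S, adversary, UniformLearner(S), T=200))

Under the paper's assumptions, an algorithm in this setting can guarantee regret of roughly O(T^{3/4} √(ln T)) against an adaptive adversary; the uniform learner above makes no such guarantee and is shown only to fix the interface.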
References
Kalai, A., Vempala, S.: Efficient algorithms for on-line optimization. In: Proceedings of the 16th Annual Conference on Learning Theory (2003)
Awerbuch, B., Kleinberg, R.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th ACM Symposium on Theory of Computing (2004)
Takimoto, E., Warmuth, M.K.: Path kernels and multiplicative updates. In: Proceedings of the 15th Annual Conference on Computational Learning Theory. Lecture Notes in Artificial Intelligence, Springer, Heidelberg (2002)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the Twentieth International Conference on Machine Learning (2003)
Motwani, R., Raghavan, P.: Randomized algorithms. Cambridge University Press, Cambridge (1995)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. Technical Report CMU-CS-03-110, Carnegie Mellon University (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McMahan, H.B., Blum, A. (2004). Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_8
DOI: https://doi.org/10.1007/978-3-540-27819-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1