Abstract
We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝ^n of feasible points. At each time step t, the online algorithm must select a point x_t ∈ S while, simultaneously, an adversary selects a cost vector c_t ∈ ℝ^n; the algorithm then incurs cost c_t · x_t. Kalai and Vempala show that even if S is exponentially large (or infinite), as long as we have an efficient algorithm for the offline problem (given c ∈ ℝ^n, find x ∈ S minimizing c · x) and the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed x ∈ S in hindsight. The Kalai-Vempala algorithm assumes that the cost vector c_t is revealed to the algorithm after each time step. In the "bandit" version of the problem, the algorithm only observes its incurred cost c_t · x_t. Awerbuch and Kleinberg [2] give an algorithm for the bandit version against an oblivious adversary, and an algorithm that works against an adaptive adversary for the special case of the shortest-path problem; they leave open the problem of handling an adaptive adversary in the general case. In this paper we solve this open problem, giving a simple online algorithm for the bandit problem in the general case against an adaptive adversary. Ignoring a (polynomial) dependence on n, we achieve a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
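For concreteness, the following is a minimal, hypothetical Python sketch of the interaction protocol described in the abstract; it is not the algorithm analyzed in the paper. A learner repeatedly commits to a point x_t ∈ S, an adaptive adversary then fixes a cost vector c_t, and the learner observes only the scalar cost c_t · x_t; regret is measured against the best fixed point in hindsight, found via the offline oracle. The names offline_oracle, run_bandit_protocol, and UniformLearner are illustrative, and the finite S and uniform-sampling learner are placeholders, not the paper's method.

import numpy as np

# Hypothetical illustration of the bandit protocol described above, not the
# paper's algorithm. S is a small finite set only for demonstration; in the
# paper S may be exponentially large or infinite, accessed via the oracle.

def offline_oracle(S, c):
    """Offline problem: given a cost vector c, return argmin_{x in S} c . x."""
    return min(S, key=lambda x: float(np.dot(c, x)))

def run_bandit_protocol(S, adversary, learner, T):
    """Play T rounds; the learner observes only the scalar cost c_t . x_t."""
    total_cost = 0.0
    plays, costs = [], []
    for t in range(T):
        x_t = learner.select(t)          # learner commits to a point in S
        c_t = adversary(t, plays)        # adaptive adversary: may depend on
                                         #   the learner's past choices
        loss = float(np.dot(c_t, x_t))   # bandit feedback: one scalar value
        learner.update(x_t, loss)
        total_cost += loss
        plays.append(x_t)
        costs.append(c_t)
    # Regret is measured against the best *fixed* point in hindsight,
    # which the offline oracle can find from the summed cost vector.
    c_sum = np.sum(costs, axis=0)
    best_fixed_cost = float(np.dot(c_sum, offline_oracle(S, c_sum)))
    return total_cost - best_fixed_cost

class UniformLearner:
    """Placeholder strategy (uniform sampling over S). The paper's algorithm
    instead combines exploration with exploitation via the offline oracle."""
    def __init__(self, S, seed=0):
        self.S = S
        self.rng = np.random.default_rng(seed)
    def select(self, t):
        return self.S[self.rng.integers(len(self.S))]
    def update(self, x_t, loss):
        pass  # a real learner would use the observed loss to estimate costs

if __name__ == "__main__":
    S = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
    # A toy adversary; an adaptive one could inspect `plays` before choosing.
    adversary = lambda t, plays: np.array([1.0, 0.2]) if t % 2 else np.array([0.2, 1.0])
    print("regret vs. best fixed point:",
          run_bandit_protocol(S, adversary, UniformLearner(S), T=200))

Under the paper's assumptions, an algorithm in this setting can guarantee regret of roughly O(T^{3/4} √(ln T)) against an adaptive adversary; the uniform learner above makes no such guarantee and is shown only to fix the interface.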
References
Kalai, A., Vempala, S.: Efficient algorithms for on-line optimization. In: Proceedings of the 16th Annual Conference on Learning Theory (2003)
Awerbuch, B., Kleinberg, R.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th ACM Symposium on Theory of Computing (2004)
Takimoto, E., Warmuth, M.K.: Path kernels and multiplicative updates. In: Proceedings of the 15th Annual Conference on Computational Learning Theory. Lecture Notes in Artificial Intelligence, Springer, Heidelberg (2002)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the Twentieth International Conference on Machine Learning (2003)
Motwani, R., Raghavan, P.: Randomized algorithms. Cambridge University Press, Cambridge (1995)
Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. Technical Report CMU-CS-03-110, Carnegie Mellon University (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McMahan, H.B., Blum, A. (2004). Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_8
DOI: https://doi.org/10.1007/978-3-540-27819-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1