Skip to main content

Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary

  • Conference paper
Learning Theory (COLT 2004)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3120))

Included in the following conference series:

Abstract

We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝn of feasible points. At each time step t, the online algorithm must select a point x t ∈ S while simultaneously an adversary selects a cost vector C t ∈ ℝn. The algorithm then incurs cost c t.x t. Kalai and Vempala show that even if S is exponentially large (or infinite), so long as we have an efficient algorithm for the offline problem (given c ∈ ℝn, find x ∈ S to minimize c.x) and so long as the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed x∈ S in hindsight. The Kalai-Vempala algorithm assumes that the cost vectors c t are given to the algorithm after each time step. In the “bandit” version of the problem, the algorithm only observes its cost, c t.x t. Awerbuch and Kleinberg [2] give an algorithm for the bandit version for the case of an oblivious adversary, and an algorithm that works against an adaptive adversary for the special case of the shortest path problem. They leave open the problem of handling an adaptive adversary in the general case. In this paper, we solve this open problem, giving a simple online algorithm for the bandit problem in the general case in the presence of an adaptive adversary. Ignoring a (polynomial) dependence on n, we achieve a regret bound of \(\mathcal{O}(T^{\frac{3}{4}}\sqrt{ln(T)}))\).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Kalai, A., Vempala, S.: Efficient algorithms for on-line optimization. In: Proceedings of the The 16th Annual Conference on Learning Theory (2003)

    Google Scholar 

  2. Awerbuch, B., Kleinberg, R.: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In: Proceedings of the 36th ACM Symposium on Theory of Computing (2004)

    Google Scholar 

  3. Takimoto, E., Warmuth, M.K.: Path kernels and multiplicative updates. In: Proceedings of the 15th Annual Conference on Computational Learning Theory. Lecture Notes in Artificial Intelligence, Springer, Heidelberg (2002)

    Google Scholar 

  4. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 48–77 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  5. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)

    Google Scholar 

  6. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the Twentieth International Conference on Machine Learning (2003)

    Google Scholar 

  7. Motwani, R., Raghavan, P.: Randomized algorithms. Cambridge University Press, Cambridge (1995)

    MATH  Google Scholar 

  8. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. Technical Report CMU-CS-03-110, Carnegie Mellon University (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McMahan, H.B., Blum, A. (2004). Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science(), vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-27819-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22282-8

  • Online ISBN: 978-3-540-27819-1

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics