
Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments

Conference paper
Computational Learning Theory (COLT 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2111)

Abstract

We consider the problem of maximizing the average reward in a controlled Markov environment, which also contains some arbitrarily varying elements. This problem is captured by a two-person stochastic game model involving the reward-maximizing agent and a second player, who is free to use an arbitrary (non-stationary and unpredictable) control strategy. While the minimax value of the associated zero-sum game provides a guaranteed performance level, the fact that the second player’s behavior is observed as the game unfolds opens up the opportunity to improve upon this minimax value whenever the second player is not playing a worst-case strategy. This basic idea has been formalized in the context of repeated matrix games by the classical notion of regret minimization with respect to the Bayes envelope, where an attainable performance goal is defined in terms of the empirical frequencies of the opponent’s actions. This paper presents an extension of these ideas to problems with Markovian dynamics, under appropriate recurrence conditions. The Bayes envelope is first defined in a natural way in terms of the observed state-action frequencies. As this envelope may not be attainable in general, we define a proper convexification thereof as an attainable solution concept. In the specific case of single-controller games, where the opponent alone controls the state transitions, the Bayes envelope itself turns out to be convex and attainable. Some concrete examples are shown to fit in this framework.
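To make the classical notion concrete: in the repeated matrix-game setting that this paper generalizes, the Bayes envelope evaluated at the opponent's empirical action frequencies is simply the best-response value against those frequencies, and regret is the gap between that value and the agent's realized average reward. The sketch below illustrates this baseline computation; the function names and the matching-pennies example are illustrative, not taken from the paper, and none of this captures the Markovian state dynamics that are the paper's actual contribution.

```python
import numpy as np

def bayes_envelope(reward, opponent_freq):
    """Best-response value against the opponent's empirical frequencies:
    max over agent actions a of sum_b reward[a, b] * opponent_freq[b]."""
    return float(np.max(reward @ opponent_freq))

def external_regret(reward, agent_actions, opponent_actions):
    """Average-reward regret relative to the Bayes envelope at the
    opponent's empirical action frequencies."""
    T = len(agent_actions)
    q = np.bincount(opponent_actions, minlength=reward.shape[1]) / T
    avg_reward = np.mean([reward[a, b]
                          for a, b in zip(agent_actions, opponent_actions)])
    return bayes_envelope(reward, q) - avg_reward

# Matching pennies: the agent (row player) wins +1 on a match.
R = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

# If the opponent played action 0 in 70% of rounds, the best fixed
# response (always play 0) earns 0.7 - 0.3 ~= 0.4 per round on average.
q = np.array([0.7, 0.3])
print(bayes_envelope(R, q))
```

Hannan-consistent strategies guarantee that the regret quantity above vanishes asymptotically against any opponent; the paper's question is what the analogous attainable envelope is when actions also drive Markovian state transitions.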




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannor, S., Shimkin, N. (2001). Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_9

  • DOI: https://doi.org/10.1007/3-540-44581-1_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42343-0

  • Online ISBN: 978-3-540-44581-4

  • eBook Packages: Springer Book Archive
